Research Article

High-Efficiency Min-Entropy Estimation Based on Neural Network for Random Number Generators

Na Lv,1,2,3 Tianyu Chen,1,2 Shuangyi Zhu,1,2,3 Jing Yang,4 Yuan Ma,1,2 Jiwu Jing,5 and Jingqiang Lin1,2

1State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
2Data Assurance and Communications Security Research Center, Chinese Academy of Sciences, Beijing 100093, China
3School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100093, China
4School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100093, China
5China Information Technology Security Evaluation Center, Beijing 100085, China

Correspondence should be addressed to Tianyu Chen; chentianyu@iie.ac.cn

Received 30 November 2019; Accepted 22 January 2020; Published 17 February 2020

Academic Editor: Clemente Galdi

Copyright © 2020 Na Lv et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Random number generator (RNG) is a fundamental and important cryptographic element, which has made an outstanding contribution to guaranteeing the network and communication security of cryptographic applications in the Internet age. In reality, if the random numbers used cannot provide sufficient randomness (unpredictability) as expected, these cryptographic applications are vulnerable to security threats and may cause system crashes. Min-entropy is one of the approaches usually employed to quantify the unpredictability. The NIST Special Publication 800-90B adopts the concept of min-entropy in the design of its statistical entropy estimation methods, and the predictive model-based estimators added in the second draft of this standard effectively improve the overall capability of the test suite. However, these predictors have problems of limited application scope and high computational complexity; e.g., they have shortfalls in evaluating random numbers with long dependence and multivariate sample spaces due to their huge time complexity (i.e., high-order polynomial time complexity). Fortunately, there has been increasing attention to using neural networks to model and forecast time series, and random numbers are also a type of time series. In our work, we propose several new and efficient approaches for min-entropy estimation by using neural network technologies and design a novel execution strategy for the proposed entropy estimation to make it applicable to the validation of both stationary and nonstationary sources. Compared with the 90B's predictors officially published in 2018, the experimental results on various simulated and real-world data sources demonstrate that our predictors have a better performance on accuracy, scope of applicability, and execution efficiency. The average execution efficiency of our predictors can be up to 10 times higher than that of the 90B's for a 10^6 sample size with different sample spaces. Furthermore, when the sample space is over 2^2 and the sample size is over 10^8, the 90B's predictors cannot give estimated results, whereas our predictors can still provide accurate results.

1. Introduction

Random number generator (RNG) is a fundamental and important element in modern cryptography, which especially provides a basic guarantee for the security of network and communication systems, as in [1–4]. The output of RNGs, called random numbers, is widely used in a large number of security and cryptographic applications. These applications include the generation of cryptographic keys, initialization vectors in cryptographic algorithms, digital signatures, and nonces and padding values. If the output of RNGs cannot provide sufficient unpredictability as expected, the cryptographic applications would be vulnerable, as in [5–7]. Thus, the necessity of security analysis on RNGs is self-evident; in particular, it is important to evaluate the quality of the entropy source, which is the main source of randomness for RNGs.


At present, in order to guide designers, users, and assessors to analyze the security of RNGs, many research organizations and individuals have provided a number of approaches for testing and evaluating RNGs. These approaches can be roughly divided into two classes: statistical property tests and entropy estimation. Specifically, the statistical property test was proposed first, such as the NIST Special Publication 800-22 [8], AIS 31 [9], the Diehard battery [10], and TestU01 [11], which detect whether the output sequence has obvious statistical defects. Because it only focuses on the statistical properties of the outputs rather than the internal structure and generation principle of RNGs, the statistical property test is a universal (black-box) testing method for various types of generators and is easy to operate. With an in-depth understanding of randomness gained in the past few years, the concept of "entropy" has been proposed to evaluate the security of RNGs. Entropy is a measure of uncertainty that is appropriate to reflect the amount of randomness. Therefore, the criteria of several major standardization organizations recommend adopting entropy to quantify the randomness (unpredictability) of the outputs of RNGs, such as ISO/IEC 18031 [12] and AIS 31 [9]. There are many types of methods for measuring entropy, including Shannon entropy, Renyi entropy, and min-entropy. Min-entropy is a very conservative measure, which reflects the difficulty of guessing the most-likely output of entropy sources [13].

However, the entropy estimation of entropy sources is a very challenging task, because the common assumptions may not be consistent with the real conditions and the distribution of the outputs is unknown. Nowadays, there are two ways to implement entropy estimation: theoretical entropy estimation (stochastic model) and statistical entropy estimation. A theoretical proof for the security of RNGs can be achieved from a suitable stochastic model, as in [14–17]. But the modeling is always difficult and complex, because it is based on the specific structure of an RNG and an appropriate assumption on the entropy source's behavior, and some structures of RNGs still do not have a suitable model [18–21]. In contrast, statistical entropy estimation is still based on the idea of entropy estimation, but it is implemented by means of statistical black-box testing, which has good applicability for evaluating various types of RNGs. Thus, statistical entropy estimation can partly solve the problem that the entropy of some RNGs cannot be quantified by modeling.

The NIST Special Publication 800-90B [22] (called 90B in the text below) is a typical representative of statistical entropy estimation, which is based on min-entropy and specifies how to design and test entropy sources. The final version of the 90B was officially published in January 2018 [23] and replaces the second draft of the 90B published in 2016 [22]. The predictors proposed by Kelsey et al. [24] have a better performance than the other estimators in this standard; they refer to machine learning algorithms that attempt to predict each sample in a sequence and update their internal state based on all observed samples. However, there are some problems in these 90B's predictors. On the one hand, every predictor is designed to perform well only for sources with a certain statistical property, as stated in [24], which constrains the application scope of the predictors. On the other hand, the execution efficiency of these predictors is influenced significantly when the selected sample space is large. Our analysis shows that the time complexity of the 90B's predictors has a high-order polynomial relationship with the size of the sample space. In the released C++ code of the 90B's estimators published in 2018, the bits per symbol of the samples are still limited to between 1 and 8 inclusive in order to prevent too low execution efficiency. Therefore, they are not likely to be well applied to the entropy evaluation of entropy sources with unknown statistical behaviors, multivariate sample spaces, and long-range correlation.

As we know, the output sequences of RNGs are also a type of time series. Fortunately, there has been increasing attention to using neural networks to model and forecast time series, which has been found to be an alternative to various traditional time series models [25–27]. Some specific neural networks are applicable to the prediction of time series via approximating the probability distribution function (PDF), and their time complexity varies linearly with the sample space. Feedforward neural networks (FNNs) and recurrent neural networks (RNNs) are the typical representatives. FNNs are the quintessential deep learning models. In 1991, de Groot and Wurtz [28] presented a detailed analysis of univariate time series forecasting using FNNs for two nonlinear time series. FNNs are used to approximate some PDFs [29]. RNNs are a family of neural networks for processing sequential data, which can also be used for time series forecasting [30–32]. Therefore, it is worthwhile and feasible to study new methods of entropy estimation for RNGs based on neural networks.

1.1. Motivation. In this paper, we aim to propose suitable and efficient predictors for the security evaluation of RNGs, especially for the min-entropy estimation of sequences generated by entropy sources. Since both FNN and RNN are suitable for time series prediction, we design two predictors based on these neural network models. The advantages of the proposed predictors are roughly described as follows. Because the selected neural network models have good universality for the prediction of various types of time series in principle, the designed predictors have wide applicability for the entropy estimation of the output of entropy sources. Moreover, neural network-based predictors have high execution efficiency, which can properly handle the difficulty in evaluating random numbers with long dependence and multivariate sample spaces caused by huge time complexity.

1.2. Contributions. In summary, we make the following contributions:

(i) We are the first to adopt neural network models to design predictors to estimate min-entropy for RNGs, and we propose a suitable execution strategy, which makes our approach applicable to predicting both stationary and nonstationary sequences generated by different entropy sources.

(ii) We conduct a series of experiments to verify the accuracy of our predictors by using many typical simulated sources, where the theoretical entropy can be obtained from the known probability distribution. Additionally, the computational complexity is evaluated theoretically. The results show that our approaches enable the entropy to be estimated with an error of less than 6.55%, while the error is up to 14.65% for the 90B's predictors. The time complexity of our estimation is linear in the sample space, whereas it is a high-order polynomial in the sample space for the 90B.

(iii) We experimentally compare the advantages of our predictors over the 90B's predictors on accuracy, application scope, and execution efficiency. The experimental datasets include several typical real-world data sources and various simulated datasets. The experimental results indicate that our predictors have higher accuracy, higher execution efficiency, and a wider scope of applicability than the 90B's predictors. Furthermore, when the test sample space and sample size keep growing, the execution efficiency of the 90B's predictors becomes too low to estimate the entropy within an acceptable time interval, while our proposed predictors can still calculate the estimated results efficiently.

The rest of the paper is organized as follows. In Section 2, we introduce fundamental definitions about min-entropy, the evolution of the 90B and the estimators (especially the predictors) defined in this criterion, and the two typical neural networks we choose to support our research. In Section 3, we propose two predictors based on neural networks for min-entropy estimation, design an execution strategy, and give the accuracy verification and complexity analysis of our predictors. Furthermore, we apply our predictors to different types of simulated and real-world data sources and compare the advantages of our predictors over the 90B's predictors on accuracy, application scope, and execution efficiency in Section 4. We finally conclude our work in Section 5.

2. Preliminaries

In this section, firstly we introduce the fundamental concept of min-entropy, which is the core mathematical notion and assessment method in our work. Then we introduce the evolution process of the 90B and relevant research work on this criterion. After that, we introduce the estimators defined in the 90B, especially the predictive model-based estimators, which are the focus of this paper. At last, we describe two predictive models based on neural networks, which apply to time series forecasting and contribute to the design of the new predictors in our work.

2.1. Min-Entropy of Entropy Source. The concept of entropy is the core mathematical notion of the 90B, and min-entropy is the assessment method, which is a conservative way to ensure the quality of random numbers in the worst case for some high-security applications, such as the seed of PRNGs. The 90B [22] gives the definition of min-entropy as follows: for an independent discrete random variable X that takes values from the set A = {x_1, x_2, ..., x_k} (k ∈ Z* denotes the size of the sample space) with probability Pr(X = x_i) = p_i (i = 1, 2, ..., k), the min-entropy of the output is

\[ H_{\min} = \min_{1 \le i \le k} \left( -\log_2 p_i \right) = -\log_2 \left( \max_{1 \le i \le k} p_i \right) \quad (1) \]

If X has min-entropy H, then the probability of observing any particular value for X is no greater than 2^{-H}. The maximum possible value for the min-entropy of a random variable with k distinct values is log_2(k), which is attained when the random variable has a uniform probability distribution, namely, p_1 = p_2 = ... = p_k = 1/k.
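As a concrete illustration of equation (1), the following minimal sketch (ours, not part of any reference code) computes the min-entropy of a known discrete distribution and the plug-in estimate obtained from the most-common-value frequency of an observed sample:

```python
import math
from collections import Counter

def min_entropy(probs):
    """Min-entropy H_min = -log2(max_i p_i) of a discrete distribution."""
    return -math.log2(max(probs))

# Uniform distribution over k = 8 symbols: H_min = log2(8) = 3 bits per sample.
print(min_entropy([1 / 8] * 8))           # 3.0

# Near-uniform distribution: one symbol is more likely than the rest.
probs = [0.3] + [0.1] * 7
print(min_entropy(probs))                 # -log2(0.3) ≈ 1.737

# Plug-in estimate from observed samples (most-common-value frequency).
samples = [0, 1, 1, 2, 3, 1, 0, 2]
p_max = max(Counter(samples).values()) / len(samples)
print(-math.log2(p_max))                  # -log2(3/8) ≈ 1.415
```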

For a non-IID source such as a Markov process, Turan et al. provided a calculation method of min-entropy in [22]. A stochastic process {X_i}_{i ∈ N} that takes values from the finite set A defined above is known as a first-order Markov chain if

\[ \Pr\left(X_{m+1} = x_{m+1} \mid X_m = x_m, X_{m-1} = x_{m-1}, \ldots, X_0 = x_0\right) = \Pr\left(X_{m+1} = x_{m+1} \mid X_m = x_m\right) \quad (2) \]

for any m ∈ Z* and all x_0, x_1, ..., x_m, x_{m+1} ∈ A. In a d-th-order Markov process, the transition probabilities have the property that

\[ \Pr\left(X_{m+1} = x_{m+1} \mid X_m = x_m, \ldots, X_0 = x_0\right) = \Pr\left(X_{m+1} = x_{m+1} \mid X_m = x_m, \ldots, X_{m-d+1} = x_{m-d+1}\right) \quad (3) \]

The initial probabilities of the process are p_i = Pr(X_0 = i), and the transition probabilities are p_{ij} = Pr(X_{m+1} = j | X_m = i). The min-entropy of a Markov process of length L is defined as

\[ H_{\min} = -\log_2 \left( \max_{x_1, \ldots, x_L} p_{x_1} \prod_{j=2}^{L} p_{x_{j-1} x_j} \right) \quad (4) \]

The approximate value of min-entropy per sample can be obtained by dividing H_min by L.
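For a first-order chain, the maximum in equation (4) can be found with a simple max-product dynamic program; dividing the result by L gives the per-sample value. The sketch below is ours, and the two-state transition matrix is a made-up example:

```python
import math

def markov_min_entropy(p_init, p_trans, L):
    """Min-entropy of a first-order Markov process of length L (equation (4)).

    p_init[i]     : Pr(X_0 = i)
    p_trans[i][j] : Pr(X_{m+1} = j | X_m = i)
    """
    # best[i] = maximum probability of any length-t path ending in state i
    best = list(p_init)
    for _ in range(L - 1):
        best = [max(best[i] * p_trans[i][j] for i in range(len(best)))
                for j in range(len(best))]
    return -math.log2(max(best))

# Example: 2-state chain that tends to stay in its current state.
p_init = [0.5, 0.5]
p_trans = [[0.9, 0.1],
           [0.2, 0.8]]
L = 128
h = markov_min_entropy(p_init, p_trans, L)
print(h / L)   # min-entropy per sample, well below 1 bit for this sticky chain
```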

2.2. NIST SP 800-90B and Its Entropy Estimation. The 90B is a typical case that evaluates the quality of the entropy source from the perspective of min-entropy. The evolution process of the 90B mainly includes the following three stages. Compared with the second draft, the final version published in January 2018 made some corrections.

(i) The first draft of the 90B [13] was published in August 2012 and included five estimators proposed by Hagerty and Draper [33]. These estimators are the collision test, partial collection test, Markov test, compression test, and frequency test, which are suitable for sources that do not necessarily satisfy the IID assumption. But these estimators give significant underestimates, which was found by Kelsey et al. [24] through experiments.

(ii) Subsequently, the 90B was updated to the second draft [22] in January 2016, and the estimators based on predictors, which were proposed by Kelsey et al. for the first time, were adopted. Compared with the first draft, the second draft has the following main changes. (1) Among the estimators, the partial collection test in the first draft was deleted, the frequency test in the first draft was replaced by the most common value estimator, and two new estimators, the t-tuple estimator and the longest repeated substring estimator, were added. (2) The second important update was the addition of four predictors for entropy estimation. However, the underestimation problem was not solved in the second draft. Zhu et al. [34] proved the underestimation problem for non-IID data through theoretical analysis and experimental validation and proposed an improved method.

(iii) The final official version of the 90B [23] was published in January 2018. In the final 90B, estimators with significant underestimates, such as the collision estimator and compression estimator, are modified to be limited to binary inputs only, which may reduce the overall execution efficiency of min-entropy estimation for nonbinary inputs. In addition, the calculation process and method of key variables for min-entropy estimation are also corrected, such as P'_global and the min-entropy of each predictor.

2.2.1. Execution Strategy of Min-Entropy Estimation in the 90B. The 90B takes the following strategy to estimate the min-entropy. It first checks whether the tested datasets are IID or not. On the one hand, if the tested datasets are non-IID, there are ten estimators, as mentioned in Section 2.2.2, for entropy estimation. Each estimator calculates its own estimation independently; then, among all estimations, the minimum one is selected as the final estimation result for the entropy source. On the other hand, if the tested datasets are considered IID, only the most common value estimator is employed. Finally, it applies restart tests and gives the entropy estimation.

Note that this article only focuses on the analysis of and comparison with the 90B's four predictors; the research on other parts of the 90B is not considered in this study.

2.2.2. Estimators in the 90B. In the final NIST SP 800-90B, there are ten estimators, and each estimator has its own specific characteristics. According to the underlying methods they employ, we divide these estimators into three classes: frequency-based, entropy statistic-based, and predictor-based. The following is a brief introduction of the ten estimators; the details can be found in [23].

(1) Predictor-Based Type Estimators. The following are the four predictors proposed by Kelsey et al. [24] for entropy estimation for the first time. Kelsey et al. utilized several machine learning models that serve as predictors to improve the accuracy of entropy estimation. But these predictors perform well only for specific distributions.

(i) Multi Most Common in Window (MultiMCW) Predictor. This predictor performs well in cases where there is a clear most common value, but that value varies over time.

(ii) Lag Predictor. The Lag subpredictor predicts the value that occurred N samples back in the sequence. This predictor performs well on sources with strong periodic behavior if N is close to the period.

(iii) Multi Markov Model with Counting (MultiMMC) Predictor. The MultiMMC subpredictor predicts the most common value following the previous N-sample string. The range of the parameter N is set from 1 to 16. This predictor performs well on data from any process that can be accurately modeled by an Nth-order Markov model.

(iv) LZ78Y Predictor. This predictor performs well on the sort of data that would be efficiently compressed by LZ78-like compression algorithms.

(2) Other Estimators. Four frequency-based estimators and two entropy statistic-based estimators are described as follows. The min-entropy of the former type is calculated according to the probability of the most-likely output value, and the latter is based on the entropic statistics presented by Hagerty and Draper [33]. Among them, three estimators, namely the Markov estimator, collision estimator, and compression estimator, explicitly state that they only apply to binary inputs in the final 90B published in 2018, which may reduce the execution efficiency for nonbinary inputs, as proved through experiments in Section 4.3.

(i) Most Common Value Estimate. This estimator calculates entropy based on the number of occurrences of the most common value in the input dataset and then constructs a confidence interval for this proportion. The upper bound of the confidence interval is used to estimate the min-entropy per sample of the source.

(ii) Markov Estimate. This estimator computes entropy by modeling the noise source outputs as a first-order Markov model. The Markov estimate provides a min-entropy estimate by measuring the dependencies between consecutive values from the input dataset. This method is only applied to binary inputs.

(iii) T-Tuple Estimate. This method examines the frequency of t-tuples (pairs, triples, etc.) that appear in the input dataset and produces an estimate of the entropy per sample based on the frequency of those t-tuples.

(iv) Longest Repeated Substring Estimate (LRS Estimate). This method estimates the collision entropy (sampling without replacement) of the source based on the number of repeated substrings (tuples) within the input dataset.

(v) Collision Estimate. This estimator calculates entropy based on the mean number of samples needed to see the first collision in a dataset, where a collision is any repeated value. This method is only applied to binary inputs.

(vi) Compression Estimate. This estimator calculates entropy based on how much the tested data can be compressed. This method is also only applied to binary inputs.

2.2.3. Min-Entropy Estimation of the 90B's Predictors. Each predictor in the 90B attempts to predict the next sample in a sequence according to a certain statistical property of the previous samples and provides an estimated result based on the probability of successful prediction. Every predictor consists of a set of subpredictors and chooses the subpredictor with the highest rate of successful predictions to predict the subsequent output. Each predictor calculates the global predictability and local predictability with the upper bound of the 99% confidence interval and then derives the global and local entropy estimations, respectively. Finally, the final entropy estimation for this predictor is the minimum of the global and local entropy estimations.

For estimating the entropy of a given entropy source, each predictor offers a predicted result after testing the outputs produced by the source and provides an entropy estimation based on the probability of successful predictions. After obtaining the estimations from the predictors, the minimum estimation of all the predictors is taken as the final entropy estimation of the entropy source.
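The sketch below (ours) illustrates only the global-predictability part of this calculation: the fraction of correct predictions is turned into an upper confidence bound on the success probability, which in turn gives an entropy estimate. The 90B additionally derives a local predictability from the longest run of correct predictions, which is omitted here, and the exact constants should be taken from the standard rather than from this simplified sketch.

```python
import math

def global_entropy_estimate(correct, total, z=2.576):
    """Entropy bound from a predictor's global success rate.

    An upper confidence bound (99% level, z = 2.576) on the prediction
    probability gives a lower bound on the per-sample min-entropy.
    Simplified from the description of the 90B predictors; the local
    predictability term based on the longest run of correct guesses is
    deliberately left out.
    """
    p_hat = correct / total
    p_upper = min(1.0, p_hat + z * math.sqrt(p_hat * (1 - p_hat) / (total - 1)))
    return -math.log2(p_upper)

# A predictor that guesses 600 of 1000 binary samples correctly.
print(global_entropy_estimate(600, 1000))   # ≈ 0.64 bits per sample
```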

The entropy estimation would be too loose if no predictor were applied to detect the predictable behaviors. But if a set of predictors with different approaches is applied, they can guarantee that the predictor which is the most effective at predicting the entropy source's outputs determines the entropy estimation.

2.3. Two Predictive Models Based on Neural Networks. Next, we introduce the two main neural network models that help us design predictors for entropy estimation: feedforward neural networks (FNNs) and recurrent neural networks (RNNs), respectively.

2.3.1. FNNs. The goal of a feedforward network is to approximate some function f*. For instance, a classifier Y = f*(X) maps an input X to a category Y. A feedforward network describes a mapping Y = f(X; θ) and learns the value of the parameters θ that result in the best function approximation. The principle of the FNN is depicted in Figure 1.

For each time step from t = 1 to t = n (n ∈ Z* denotes the sample size), the FNN applies the following forward propagation equations:

\[ H_t = f\left(b + W X_t\right), \quad Y_t = c + V H_t, \quad \hat{Y}_t = g\left(Y_t\right) \quad (5) \]

The parameters and functions that govern the computation happening in an FNN are described as follows:

(i) X_t is the input at time step t and is a vector composed of the previous inputs (i.e., X_t = [X_{t-k}, ..., X_{t-1}], where k refers to the step of memory).

(ii) H_t is the hidden state at time step t, where the bias vector b and the input-to-hidden weights W are derived via training. The number of hidden layers and the number of hidden nodes per layer are defined before training; they are called hyperparameters in neural networks.

(iii) Y_t is the output at step t, where the bias vector c and the hidden-to-output weights V are derived via training.

(iv) Ŷ_t is our predictive output at time step t, which is a vector of probabilities across our sample space.

(v) The function f(·) is a fixed nonlinear function called the activation function, and the function g(·) is an output function used in the final layer of a neural network. Both functions belong to the hyperparameters, which are defined before training (Section 3.2).

The models are called feedforward because the information flows through the approximate function, from the input X_t, through the internal computations used to update the model that defines f(·), and finally to the output Ŷ_t. Besides, there is no feedback connection; namely, the outputs of the model are not fed back into itself.
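A minimal NumPy sketch (ours; the layer sizes and the random, untrained weights are placeholders) of the forward propagation in equation (5), with a softmax output so that Ŷ_t is a probability vector over the sample space:

```python
import numpy as np

rng = np.random.default_rng(0)
s, k, hidden = 4, 20, 10          # sample space size, step of memory, hidden units

# Parameters (b, W) and (c, V) would normally be learned; random here.
W = rng.normal(scale=0.1, size=(hidden, k))
b = np.zeros(hidden)
V = rng.normal(scale=0.1, size=(s, hidden))
c = np.zeros(s)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fnn_predict(x_window):
    """Equation (5): H_t = tanh(b + W X_t), Y_t = c + V H_t, Yhat_t = softmax(Y_t)."""
    h = np.tanh(b + W @ x_window)          # tanh activation, as chosen for the FNN
    return softmax(c + V @ h)

x_window = rng.integers(0, s, size=k).astype(float)  # previous k observed symbols
print(fnn_predict(x_window))               # probabilities over the s possible next symbols
```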

2.3.2. RNNs. If feedback connections are added to the network, then it is called an RNN. In particular, the RNN records the information that has been calculated so far and uses it for the calculation of the present output. The principle of RNNs is depicted in Figure 2.

For each time step from t = 1 to t = n, the RNN applies the following forward propagation equations:

\[ H_t = f\left(b + W H_{t-1} + U X_t\right), \quad Y_t = c + V H_t, \quad \hat{Y}_t = g\left(Y_t\right) \quad (6) \]

The parameters and functions that govern the computation happening in an RNN are described as follows:

Figure 1: Feedforward neural network (input layer X_t, hidden layer H_t, output layer Y_t).

(i) X_t is the input at time step t and is a one-hot vector. For example, if X_t = 1 and the sample space is S = {0, 1}, then X_t = [0, 1].

(ii) H_t is the hidden state at time step t. It is the "memory" of the network. H_t is calculated based on the previous hidden state H_{t-1} and the input at the current step X_t. b, U, and W denote the bias vector, the input-to-hidden weights, and the hidden-to-hidden weights of the RNN cell, respectively.

(iii) Y_t is the output at step t. c and V denote the bias vector and the hidden-to-output weights, respectively.

(iv) Ŷ_t is our predictive output at time step t, which is a vector of probabilities across our sample space.

(v) Similarly, the function f(·) is an activation function and g(·) is an output function, which are defined before training (Section 3.2).
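A matching sketch (ours, again with placeholder sizes and untrained weights) of the recurrent update in equation (6), using one-hot inputs and the relu activation the authors later select for the RNN (Section 3.2):

```python
import numpy as np

rng = np.random.default_rng(1)
s, hidden = 2, 8                   # binary sample space, hidden state size

U = rng.normal(scale=0.1, size=(hidden, s))        # input-to-hidden
W = rng.normal(scale=0.1, size=(hidden, hidden))   # hidden-to-hidden
V = rng.normal(scale=0.1, size=(s, hidden))        # hidden-to-output
b, c = np.zeros(hidden), np.zeros(s)

def one_hot(x):
    v = np.zeros(s)
    v[x] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(h_prev, x_t):
    """Equation (6): H_t = relu(b + W H_{t-1} + U X_t), Yhat_t = softmax(c + V H_t)."""
    h = np.maximum(0.0, b + W @ h_prev + U @ one_hot(x_t))
    return h, softmax(c + V @ h)

h = np.zeros(hidden)
for x_t in [0, 1, 1, 0, 1]:        # a short binary sequence
    h, y_hat = rnn_step(h, x_t)
print(y_hat)                        # predicted distribution of the next bit
```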

3. Predictors for Min-Entropy Estimation Based on Neural Network

The neural network is able to approximate various PDFs, and the complexity of a neural network increases more slowly (linearly) as the sample space increases. Motivated by [24], we propose two predictive models based on neural networks for min-entropy estimation. Next, we present the execution strategy of our min-entropy estimators, provide the choices of the important hyperparameters, and give an analysis of the accuracy and complexity of our predictive models to prove that our design is feasible.

3.1. Strategy of Our Predictors for Min-Entropy Estimation. The execution strategy of our min-entropy estimator is depicted in Figure 3, which consists of model training and entropy estimation. Both of our proposed predictive models (namely, predictors), which are based on FNN and RNN, respectively, follow the same strategy.

The benefit of this strategy is that it applies not only to stationary sequences generated by entropy sources but also to nonstationary sequences, of which the probability distribution is time-varying. On the one hand, in order that the model can match the statistical behavior of the data source well, we use the whole input dataset to train and continuously update the model. On the other hand, to effectively estimate the entropy of the data source, we use the predictive model to compute the min-entropy only when the predictive model has been updated enough to characterize the statistical behavior of the tested dataset. Specifically, for the testing dataset, which is used for computing the entropy estimation, we preset the testing dataset as a part of the whole observations and utilize a proportion parameter (γ ∈ [0, 1]) to determine its size; namely, the last γ of the inputs are used for computing entropy while the model is still being updated.

The workflow of the min-entropy estimator based on a neural network consists of the following steps (a code sketch of this workflow is given below, after the figures):

(1) Initialization: choose one model (FNN or RNN) and set the hyperparameters of the model.

(2) Input data: input the tested dataset and judge the proportion of the remaining data in the entire tested dataset. If the proportion of the remaining observations is ≤ γ, record the accuracy for predicting the next sample to be observed; else, continue.

(3) Prediction: predict the current output according to the forward propagation equations.

(4) Comparison and loss function: observe the real output and compare it to the predicted value. Then compute the loss function.

(5) Update predictive model: compute the gradient and update the predictive model. If the entire dataset has run out, turn to Step 6; else, repeat Step 2 ~ Step 5.

(6) Calculate the predictive accuracy and probability: obtain the accuracy of the executed predictive model from the last γ observations, including global predictions and local predictions, and compute the corresponding probabilities.

(7) Calculate the min-entropy: calculate the min-entropy from the obtained probability. After obtaining the estimations from the two predictors (FNN and RNN), the minimum entropy of the two predictors is taken as the final entropy estimation (namely, min-entropy) of the tested dataset.

Figure 3: Execution strategy of our predictors for min-entropy estimation based on neural network (flowchart: initialize the predictive model → input the datasets to be observed → predict the next sample → if the unobserved proportion ≤ γ, record the accuracy → comparison and compute the loss function → update the predictive model → when all the samples run out, calculate the prediction probability → output min-entropy).

Figure 2: Recurrent neural network (input layer X_t, hidden layer H_t with recurrent state H_{t-1}, output layer Y_t).
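The following sketch (ours) strings the steps above together for a binary source, using a stand-in predictor with a `fit_step`/`predict` interface rather than the actual FNN/RNN models; it trains on the whole sequence, records prediction accuracy only over the last γ fraction, and converts the success rate into a min-entropy estimate as described in Section 2.2.3 (global term only).

```python
import math
import random
from collections import Counter

class FrequencyPredictor:
    """Stand-in for the FNN/RNN models: always predicts the most frequent symbol."""
    def __init__(self):
        self.counts = Counter()
    def predict(self):
        return self.counts.most_common(1)[0][0] if self.counts else 0
    def fit_step(self, x):
        self.counts[x] += 1                    # "update the predictive model"

def estimate_min_entropy(seq, gamma=0.2, z=2.576):
    model = FrequencyPredictor()
    test_start = int(len(seq) * (1 - gamma))   # last gamma fraction is the testing part
    correct = total = 0
    for i, x in enumerate(seq):
        pred = model.predict()                 # Step 3: predict the current output
        if i >= test_start:                    # Steps 2/6: record accuracy at the end
            correct += pred == x
            total += 1
        model.fit_step(x)                      # Steps 4-5: compare and update
    p_hat = correct / total
    p_upper = min(1.0, p_hat + z * math.sqrt(p_hat * (1 - p_hat) / (total - 1)))
    return -math.log2(p_upper)                 # Step 7: min-entropy from probability

random.seed(0)
seq = [1 if random.random() < 0.7 else 0 for _ in range(100000)]  # biased bits
print(estimate_min_entropy(seq))   # roughly 0.5, slightly below -log2(0.7) ≈ 0.515
```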


Combining the calculation principle of min-entropy in Section 2.1, we can see that a lower bound on the probability of making a correct prediction gives an upper bound on the entropy of the source. In other words, the more predictable a source is, the larger the probability of making correct predictions is, and the less entropy it has. Therefore, a model that is a bad fit for the source or not fully trained will result in inaccurate predictions, a low prediction probability, and a too-high entropy estimation of the source. So models that are a bad fit for the source or not fully trained can give big overestimates, but not underestimates.

Further, we can confirm that adding one more predictor will not do any harm and, conversely, will make the entropy estimation more accurate. From the execution strategy, we can see that if predictors whose models do not match the noise source are used alongside a predictor whose underlying model matches the source's behavior well, then the predictor which matches the source well will determine the final entropy estimation.

3.2. Choices of Important Parameters

3.2.1. Hyperparameters for FNN and RNN. In neural networks, the choices of the models' hyperparameters have significant influences on the computational resources and performance required for training and testing. Therefore, the choices of hyperparameters are crucial to neural networks. Next, we illustrate the choices of some key hyperparameters.

(1) Hidden Layers and Nodes. To comprehensively balance the accuracy and efficiency of our predictors, in this paper, for the FNN model, except for the multivariate M-sequences, we set the number of hidden layers to 2 and the numbers of hidden nodes per layer to 10 and 5, respectively. For the multivariate M-sequences, extensive tests show that the number of hidden nodes per layer must be larger to give better results; by observing the results, we finally set the numbers to 35 and 30, respectively.

(2) Step of Memory. The step of memory determines the number of previous samples used for predicting the current output. Generally speaking, the larger the value, the better the performance. However, the computational resources (memory and runtime) increase as the step of memory grows. In this paper, we set the step of memory to 20 by trading off performance and resources. That is to say, for the FNN, the input at time step t is the previous 20 observed values, and for the RNN, the hidden layer contains 20 unfolded hidden units.
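For clarity, this is how the step of memory turns a raw sequence into supervised training pairs: each input is the previous 20 samples and the target is the next sample. The sketch is ours and is not tied to any particular framework:

```python
def make_windows(seq, k=20):
    """Turn a sequence into (previous-k-samples, next-sample) training pairs."""
    inputs, targets = [], []
    for t in range(k, len(seq)):
        inputs.append(seq[t - k:t])   # X_t = [x_{t-k}, ..., x_{t-1}]
        targets.append(seq[t])        # the value to be predicted
    return inputs, targets

seq = list(range(30))
X, y = make_windows(seq, k=20)
print(len(X), X[0], y[0])   # 10 windows; first window is samples 0..19, target is 20
```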

(3) Loss Function. The loss function refers to the function that measures the difference between the predicted values and the true values. The total loss for a given sequence of x = {x_1, ..., x_n} values paired with a sequence of y = {y_1, ..., y_n} values is then just the sum of the losses over all the time steps. For example, if L_t is the negative log-likelihood of y_t given x_1, ..., x_t, then

\[ L\left(\{x_1, \ldots, x_n\}, \{y_1, \ldots, y_n\}\right) = \sum_t L_t = -\sum_t \log_2 p_{\mathrm{model}}\left(y_t \mid x_1, \ldots, x_t\right) \quad (7) \]

where p_model(y_t | x_1, ..., x_t) is given by reading the entry for y_t from the model's output vector ŷ_t. The models are trained to minimize the cross-entropy between the training data and the models' predictions (i.e., equation (7)), which is equivalent to minimizing the mean squared error (i.e., the average of the squares of the errors or deviations).
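A small sketch (ours) of equation (7): the per-step loss is the negative base-2 logarithm of the probability the model assigned to the symbol that actually occurred, summed over the sequence.

```python
import math

def sequence_loss(predicted_dists, observed):
    """Equation (7): L = -sum_t log2 p_model(y_t | x_1..x_t)."""
    return -sum(math.log2(dist[y_t]) for dist, y_t in zip(predicted_dists, observed))

# Three time steps over a binary alphabet; each entry is the model's output vector.
predicted = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
observed = [0, 0, 1]
print(sequence_loss(predicted, observed))   # 1 + 0.152 + 0.322 ≈ 1.474
```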

(4) Learning Rate. The learning rate is a positive scalar determining the size of the step. To control the effective capacity of the model, we need to set the value of the learning rate in an appropriate range. The learning rate determines how fast the parameter θ moves to its optimum value. If the learning rate is too large, gradient descent can inadvertently increase rather than decrease the training error; namely, the parameters are likely to overshoot the optimal value. However, if the learning rate is too small, the training is not only slower but may become permanently stuck with a high training error. So the learning rate is crucial to the performance of the model.

Based on the above analysis, we pick the learning rate approximately on a logarithmic scale, i.e., the learning rate is taken within the set {0.1, 0.01, 10^-3, 10^-4, 10^-5}. At the beginning of model training, we set the learning rate to a larger value to reach the optimum value faster. Then, as the number of training steps increases, we set smaller values so as not to cross the optimal value. The detailed settings are described in Algorithm 1.

(5) Activation Function. In general, we must use a nonlinear function to describe the features. Most neural networks do so using an affine transformation controlled by learned parameters, followed by a fixed nonlinear function called an activation function. The activation function plays an important role in neural networks. The commonly used activation functions include tanh(·), relu(·), and the sigmoid function, which is defined as σ(·) (i.e., equation (8)) in this paper. Because the sigmoid function saturates easily, which causes the gradient to change slowly during training, it is generally no longer used as an activation function except in the RNN-LSTM (long short-term memory).

\[ \sigma(x) = \frac{1}{1 + e^{-x}} \quad (8) \]

After many attempts (i.e., we compared the efficiency and performance manually by means of exhaustive search), we finally choose tanh(·) and relu(·) as the activation functions for the FNN and RNN, respectively. They can be expressed as

\[ \tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}} \quad (9) \]

Compared with σ(·), tanh(·) is symmetric about the origin. In some cases, this symmetry can give better performance. It compresses the real-valued input to a range of −1 to 1, and the mean of its output is zero, which makes it converge faster than σ(·) and reduces the number of iterations. Therefore, it is suitable as an activation function, and the zero-centered training data contributes to the convergence speed of model training.

\[ \mathrm{relu}(x) = \max(0, x) \quad (10) \]

relu(·) is currently a popular activation function. It is piecewise linear and obtains the activation value with only one threshold. We choose this function based on the following two considerations. On the one hand, it alleviates the vanishing gradient problem of the back propagation through time (BPTT) algorithm, for the reason that the derivative of relu(·) is 1 for positive inputs. On the other hand, it greatly improves the speed of calculation, because it only needs to judge whether the input is greater than 0.

(6) Output Function. The output function is used in the final layer of a neural network model. The predictors for time series are considered as a solution to a multiclass classification problem, so we take softmax(·) as the output function, which can be expressed as

\[ y_i = \mathrm{softmax}\left(z_i\right) = \frac{e^{z_i}}{\sum_{i=1}^{s} e^{z_i}} \quad (11) \]

where s is the size of the sample space and softmax(z_i) denotes the probability that the output is z_i, satisfying \(\sum_{i=1}^{s} y_i = 1\), i.e., the sum of the probabilities of all the outputs is equal to 1. Such networks are commonly trained under a cross-entropy regime (i.e., the loss function mentioned above).

3.2.2. Selection of Testing Dataset Length. To better estimate the entropy of the data source, the length of the testing dataset is very important for the min-entropy estimation of random numbers generated by different types of sources. In reality, most entropy sources are time-varying (namely, nonstationary), which means the probability distribution of the output sequences from the source is changing over time. So the length of the testing dataset shall be adaptive to the type of the source.

Therefore, as described in Section 3.1, we utilize γ to determine the size of the testing dataset. Specifically, in our strategy, for a stationary entropy source, of which the probability distribution of the outputs is not changing over time, the parameter γ is preset to 20%. In contrast, for a nonstationary entropy source, all observation points (namely, γ is 100%) need to serve as the testing dataset.

To verify the reasonableness of the γ value, we compute the root-mean-squared error (RMSE) of the lowest estimations of our predictors over 80 sequences from the following simulated datasets generated by a nonstationary source:

(i) Time-Varying Normal Distribution Rounded to Integers. The samples are subject to a normal distribution and rounded to integer values, but the mean of the distribution moves along a sine curve to simulate a time-varying signal.

The RMSE, i.e.,

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \hat{H}_{\min} - H_{\min} \right)^2 }, \]

refers to the arithmetic square root of the mean of the squares of the errors or deviations for each class of simulated sources. Note that here N indicates the number of test samples, Ĥ_min indicates the estimated result for each sample, and H_min means the theoretical result for each sample. In other words, the smaller the RMSE is, the closer the estimated result is to the theoretical entropy, which indicates that the predictor has better accuracy.
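The RMSE above can be computed directly from the per-sequence estimates and their theoretical values; a trivial sketch of ours:

```python
import math

def rmse(estimated, theoretical):
    """Root-mean-squared error between estimated and theoretical min-entropy values."""
    n = len(estimated)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, theoretical)) / n)

print(rmse([0.95, 1.02, 0.88], [1.0, 1.0, 1.0]))   # ≈ 0.076
```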

As shown in Table 1, we can see that for the time-varying data source, only when γ is 100% (namely, the entire dataset is used for min-entropy estimation) can the predictors give the most accurate results. This means that, when the probability distribution of the data source varies with time, a part of the input dataset cannot represent its overall distribution, so a part of the input dataset cannot accurately give the estimation result of the entire input dataset. Besides, for the stationary sources, it is reasonable that γ is preset to 20%, because the estimated results obtained by our method are very close to the correct (theoretical) entropy of the selected entropy sources, as presented in Section 4.1.

3.3. Evaluation on Our Predictors. In this section, we conduct some experiments on simulated datasets to verify the accuracy of our proposed predictors for min-entropy estimation and compare the experimental results with theoretical results. In addition, we give a theoretical analysis of the complexity of our predictors. Note that, in Section 4, we will apply our predictors to different data sources and provide a comparison of our predictors with the 90B's predictors.

3.3.1. Accuracy Verification. We train our predictive models FNN and RNN on a number of representative simulated data sources (including stationary and nonstationary entropy sources), of which the theoretical entropy can be obtained from the known probability distribution of the outputs. Simulated datasets are produced using the following distribution families adopted in [24].

(1) Simulated Datasets Generated by Stationary Sources

(i) Discrete Uniform Distribution. The samples are equally likely and come from an IID source.

(1) if train_num < train_dataset_size/3 then
(2)     learning_rate ⟵ {0.1, 0.01}
(3) else if train_num < train_dataset_size/1.5 then
(4)     learning_rate ⟵ {0.01, 10^-3, 10^-4}
(5) else
(6)     learning_rate ⟵ {10^-4, 10^-5}
(7) end if

ALGORITHM 1: Setting of learning rate.


(ii) Discrete Near-Uniform Distribution. All samples are equally likely except one, and they come from an IID source. A certain sample has a higher probability than the rest.

(iii) Normal Distribution Rounded to Integers. The samples are subject to a normal distribution and rounded to integer values; they come from an IID source.

(iv) Markov Model. The samples are generated using a d-th-order Markov model; they come from a non-IID source.

(2) Simulated Datasets Generated by Nonstationary Sources. These datasets are the same as those used in Section 3.2.2.

For every class listed above, we generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy by using the predictive models FNN and RNN, respectively. For each dataset, the theoretical min-entropy H_min is derived from the known probability distribution.
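As an illustration of how such simulated datasets can be produced together with their theoretical min-entropy (a sketch of ours; the exact generation parameters used in the paper are not reproduced here), consider the near-uniform family:

```python
import math
import random

def near_uniform_dataset(n, k, p_max):
    """IID near-uniform source: symbol 0 has probability p_max, the rest share 1 - p_max."""
    other = (1 - p_max) / (k - 1)
    population = list(range(k))
    weights = [p_max] + [other] * (k - 1)
    data = random.choices(population, weights=weights, k=n)
    h_min = -math.log2(max(p_max, other))   # theoretical min-entropy per sample
    return data, h_min

random.seed(0)
data, h_min = near_uniform_dataset(n=10**6, k=16, p_max=0.2)
print(h_min)   # -log2(0.2) ≈ 2.32 bits per sample (uniform would give log2(16) = 4)
```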

In Figures 4–9, the abscissa represents the theoretical entropy of the test sample and the ordinate represents the estimated entropy of the test sample. Figure 4 shows the estimated entropy results for the 80 simulated datasets with uniform and near-uniform distributions, respectively. From Figures 4(a) and 4(b), we see that the estimated results given by our two proposed predictive models (FNN and RNN) are almost consistent with the theoretical entropy for both uniform and near-uniform distributions. So the final estimated result, which is the minimum result of the two predictive models, is also basically consistent with the theoretical entropy. Figure 5 shows the estimated entropy results for the 80 simulated datasets with normal distributions and time-varying normal distributions, respectively. From Figures 5(a) and 5(b), we can see that the estimated results given by our two proposed predictive models are close to the theoretical entropy for normal distributions and time-varying normal distributions. According to our execution strategy, here we calculate the min-entropy estimations using the whole input dataset for the time-varying normal distributions.

Figure 6 shows the estimated results for Markov distributions. We can see that both of our predictive models give a number of overestimates when applied to the Markov sources, particularly as the theoretical entropy increases.

Table 2 shows the relative errors (namely, |(Ĥ_min − H_min)/H_min| × 100%) between the theoretical results and the estimated results of FNN and RNN to further reflect the accuracy of the models. Ĥ_min and H_min have the same meaning as in Section 3.2.2. We see that the entropy is estimated with an error of less than 6.02% for FNN and 7% for RNN over the simulated classes.

Based on the above accuracy verification of our predictors with simulated datasets from different distributions, we can be sure that our predictors give almost accurate results, except for the Markov distributions.

3.3.2. Complexity Analysis. To analyze the usability of our predictors in terms of execution efficiency, we derive the following computational complexity through an analysis of the theory and the principle of implementation.

We believe that the computational complexity of entropy estimators used for RNG evaluation mainly comes from the sample space and the sample size. For ease of analysis, we define the following parameters: n is the sample size, which indicates the length of the sample; s is the sample space, which means the number of kinds of symbols in the sample (i.e., s = 8 means there are 8 symbols in the sample and the bit width of each symbol is log_2(8) = 3, such as 010, 110, 111, ...); and k denotes the maximum step of correlation, which is set as a constant in the 90B's predictors (k = 16) and our predictors (k = 20).

Through the analysis of the implementation, the computational complexity of the final 90B's predictors [23] mainly comes from the MultiMMC predictor and is of order O(s^k · n + 2^k · n · log_2(s)), which is mainly linear in n and a k-order polynomial in s. In contrast, the computational complexity of our predictors is of order O(s · n), which is linear in both s and n. It can be seen that the computational complexity of our predictors is much lower than that of the 90B's predictors.

It is important to note that the MultiMMC predictor requires s^k ≪ n; otherwise, this predictor cannot give accurate estimated results statistically. That is to say, as s increases, the MultiMMC predictor requires a larger sample size in order to estimate the entropy accurately.

From the above analysis, we can see that our predictors have lower computational complexity. We will give the experimental proof in Section 4.3.

4. Comparison of Our Predictors with the 90B's Predictors

In this section, a large number of experiments are carried out to evaluate our proposed predictors for entropy estimation from the aspects of accuracy, applicability, and efficiency, by applying our predictors to different simulated data and real-world data. For the experiments mentioned above, we compare the results with those of the final 90B's predictors [23] to highlight the advantages of our work. Similarly, our predictors in these experiments compute an upper bound of the min-entropy estimation at the significance level α = 0.01, which is the same as the 90B's predictors.

4.1. Comparison on Accuracy

4.1.1. Simulated Data. The simulated datasets are produced using the same distribution families as described in Section 3.3.1. Further, we append the following two new distribution families, the pseudorandom sequence and the postprocessing sequence, which are representative and commonly used in reality.

Table 1: Error measures (RMSE) of the final estimations of our predictors for nonstationary sources with different γ values.

γ        0.1      0.2      0.4      0.6      0.8      1
RMSE     0.0911   0.1364   0.0788   0.0817   0.0219   0.0149


Figure 4: Comparison of estimated results obtained from our two predictive models (FNN, RNN) with the theoretical entropy (x-axis: theoretical entropy per sample; y-axis: estimated entropy per sample). Estimations for (a) uniform distributions and (b) near-uniform distributions.

Figure 5: Comparison of estimated results obtained from our two predictive models (FNN, RNN) with the theoretical entropy (x-axis: theoretical entropy per sample; y-axis: estimated entropy per sample). Estimations for (a) normal distributions and (b) time-varying normal distributions.


Figure 6: Comparison of estimated results obtained from our two predictive models (FNN, RNN) with the theoretical entropy for Markov distributions (x-axis: theoretical entropy per sample; y-axis: estimated entropy per sample).

Figure 7: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy (x-axis: theoretical entropy per sample; y-axis: estimated entropy per sample). Estimations for (a) uniform distributions and (b) near-uniform distributions.


(i) M-Sequence. A maximum-length sequence, which is a type of pseudorandom binary sequence ([35]).

(ii) Nonuniform Distribution by Postprocessing Using LFSR. The samples are processed using a linear feedback shift register (LFSR); they come from an IID source ([35]).

For every distribution mentioned above, we also generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy by using our proposed predictors and the final 90B's predictors [23].

Figure 7 shows the estimated min-entropy results for the 80 simulated datasets with uniform distributions and near-uniform distributions, respectively. From Figures 7(a) and 7(b), we see that several of the results obtained from the 90B's predictors are apparently underestimated, which may result from overfitting. Compared with the 90B's predictors, our predictors provide more accurate results.

Figure 8 shows the estimated min-entropy results for normal distributions and time-varying normal distributions, respectively. From Figures 8(a) and 8(b), we can see that the estimated results given by our predictors are close to the theoretical entropy for normal distributions and time-varying normal distributions. However, the lowest entropy estimation results obtained from the 90B's predictors give significant underestimates.

Figure 9 shows the estimated min-entropy results for Markov distributions. We can see that the 90B's predictors almost always give underestimates compared with the theoretical entropy, while the estimated results given by our predictors are much closer to the theoretical entropy than those obtained from the 90B's predictors.

To compare the accuracy of our predictors and the 90B's predictors more clearly, we apply the predictors to the M-sequence and the nonuniform distribution sequence by postprocessing using LFSR, whose theoretical entropy is a known and fixed value.

It is further confirmed that, for higher stages (the maximum step of correlation), the M-sequence and the nonuniform distribution sequence by postprocessing using LFSR are able to pass the NIST SP 800-22 statistical tests [8]. The estimated results are listed in Tables 3 and 4, and the lowest entropy estimations from the 90B's predictors and our predictors for each stage are shown in bold font.

For the M-sequence and the nonuniform distribution by postprocessing using LFSR, the MultiMMC predictor presented in the final 90B gives the most accurate entropy estimation results for stages ≤ 16. However, when the stage of the M-sequence or the nonuniform distribution by postprocessing using LFSR is greater than 16, the MultiMMC predictor cannot give accurate entropy estimation results, because this predictor is parameterized by k ∈ {1, 2, ..., 16} (k is the maximum step of correlation).

Figure 8: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy (x-axis: theoretical entropy per sample; y-axis: estimated entropy per sample). Estimations for (a) normal distributions and (b) time-varying normal distributions.


Perhaps we could set the parameter of the MultiMMC predictor to a wider range to achieve a more accurate estimated result for the higher stage, but the time complexity grows exponentially with the parameter k, as we analyzed in Section 3.3.2. Moreover, the FNN model can also give accurate estimated results even though the stages of the M-sequence and the LFSR are greater than 16. However, the RNN model can give accurate estimated results only when the stage is 8. Therefore, the FNN model is better matched to the M-sequence and the nonuniform distribution by postprocessing using LFSR than the RNN.

We also compute the relative errors of the estimated results from the 90B's predictors and our predictors over 80 sequences from each class of simulated sources. We calculate the relative errors using the min-entropy obtained from the 90B's predictors (the lowest estimation result of the 90B's four predictors) and our predictors (the lowest estimation result of FNN and RNN), respectively. As illustrated in Table 5, for all five classes of simulated sources, the errors of our predictors are lower than those of the 90B's predictors. Specially, our approaches enable the entropy to be estimated with an error of less than 6.55%, while it is up to 14.65% for the 90B's predictors. Overall, this indicates that our proposed predictors have a better performance than the 90B's predictors on accuracy for both stationary sequences and nonstationary sequences, which is consistent with the conclusion drawn from the figures above.
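As a concrete illustration, the relative error reported in Table 5 can be computed as sketched below. The use of the mean absolute relative error over the 80 sequences is our reading of the text rather than a formula stated explicitly, so the exact aggregation is an assumption.

    import numpy as np

    def relative_error(theoretical, estimates):
        """Mean absolute relative error (%) between per-sequence theoretical
        min-entropy values and the corresponding lowest predictor estimates."""
        theoretical = np.asarray(theoretical, dtype=float)
        estimates = np.asarray(estimates, dtype=float)
        return 100.0 * np.mean(np.abs(estimates - theoretical) / theoretical)

    # Final estimate per sequence = minimum over the individual predictors
    # (FNN and RNN for our work; the four predictors for the 90B), e.g.:
    # err_ours = relative_error(h_true, np.minimum(fnn_est, rnn_est))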

From Tables 2–4, we also find that the accuracy of the RNN predictive model is slightly higher than that of the FNN predictive model, except for the cases of the Markov sources, the M-sequence, and the nonuniform distribution by postprocessing using LFSR.

We will further verify the applicability to time-varying sources in Section 4.2. Therefore, through the evaluation of the entropy estimation results on the above simulated datasets, we see that our proposed predictors are superior in accuracy to the 90B's predictors.

4.1.2. Real-World Data. We further apply our predictors to datasets generated by RNGs deployed in the real world. In fact, the theoretical entropy per sample is unknown for these real-world sources, so no error can be computed as for the simulated datasets. However, the estimated results from the predictors presented here can still be compared with those of the 90B's predictors, based on the knowledge that underestimates from the predictors have theoretical bounds.

Datasets of real-world data are produced using the following approaches. The first two are adopted in [24], and the others are commonly used typical RNGs. The estimations of the real-world sources are presented in Table 6.

(i) RANDOM.ORG. This is a service that provides random numbers based on atmospheric noise and is used in [24]. It allows the user to specify the minimum and maximum values that are output. The sequence used here consists of bits.

Figure 9: Comparison of the min-entropy estimations obtained from our proposed predictors and 90B's predictors with the theoretical entropy for Markov distributions.

Table 2: Relative errors of FNN and RNN estimation results.

Simulated data class    FNN (%)   RNN (%)
Uniform                 1.65      1.6
Near-uniform            1.60      1.52
Normal                  1.17      1.08
Time-varying normal     2.12      1.84
Markov                  6.02      7

Table 3: Estimated results for M-sequence (H_min = 0.000).

Stage       8       10      12      14      16      18       20
MultiMCW    0.991   0.996   0.988   0.989   0.993   0.999*   1.000
Lag         1.000   1.000   1.000   1.000   1.000   1.000    1.000
MultiMMC    0.000   0.000   0.000   0.000   0.000   1.000    1.000
LZ78Y       1.000   1.000   1.000   1.000   1.000   1.000    0.997*
FNN         0.000   0.000   0.000   0.000   0.000   0.000    0.000
RNN         0.000   1.048   1.007   1.002   0.996   0.9920   0.9997

Table 4: Estimated results for nonuniform distribution by postprocessing using LFSR (H_min = 0.152).

Stage       8       10      12      14      16      18       20
MultiMCW    0.440   0.595   0.743   0.721   0.998   0.994    0.998
Lag         0.581   0.581   0.680   0.680   0.992   0.994    0.999
MultiMMC    0.151   0.153   0.158   0.181   0.234   0.995    0.996
LZ78Y       0.567   0.995   0.766   0.679   0.997   0.996*   0.994*
FNN         0.151   0.145   0.149   0.147   0.149   0.142    0.144
RNN         0.149   0.947   1.012   0.998   1.012   0.997    0.985



(ii) Ubld.it TrueRNGpro. The TrueRNGpro is a USB random number generator produced by Ubld.it, which provides a steady stream of random numbers through a USB CDC serial port. This entropy source is also used in [24]. The sequence used here consists of bits.

(iii) Linux Kernel Entropy Source. The Linux kernel random generator is used for the generation of a real-world sequence without any processing. The sequence used here is the last bit of each symbol.

(iv) Linux /dev/urandom. The /dev/urandom [6] of Linux is used for the generation of a real-world sequence with strict processing. The sequence used here consists of bits.

(v) Windows RNG. The Windows RNG [5] is used for the generation of a real-world sequence by calling a Crypto API. The sequence used here consists of bits.

As illustrated in Table 6, the lowest entropy estimation for each source is shown in bold font. We see that our predictors perform better than the 90B's predictors, because the lowest entropy estimation is always obtained from our work for each real-world source. Furthermore, for the Linux kernel entropy source, we find that both the Lag and MultiMMC predictors are able to give lower estimation results. This indicates that the Linux kernel entropy source has periodicity and conforms to the Markov model, which is well understood because the randomness of the Linux kernel entropy source comes from human behaviors such as manipulating the mouse and keyboard. In our work, compared with the entropy estimations for the other real-world sources, the FNN fits much better than the RNN for the Linux kernel entropy source, which is consistent with the previous view that the FNN performs well in testing Markov sources.

4.2. Comparison on the Scope of Applicability. After evaluating the accuracy, we further validate the scope of applicability of our proposed predictors and compare it with that of the 90B's predictors. Kelsey et al. [24] stated that each of the 90B's predictors performs well only for a special distribution, as described in Section 2.2.1. To prove that our predictors have better applicability, the following four simulated datasets are generated, each of which is suitable for one predictor employed in the final 90B.

(i) Time-Varying Sources. The probability distribution of the data source varies with time. The MCW predictor predicts the current output according to previous outputs in a short period of time, and thus the MCW predictor performs well on these data sources.

(ii) Periodic Sources. The data source changes periodically. The Lag predictor predicts the value that occurred N samples back in the sequence as the current output, and thus the Lag predictor performs well on sources with strong periodic behavior.

(iii) Markov Sources. The data sources can be modeled by the Markov model. The MultiMMC predictor predicts the current output according to the Markov model, and thus the MultiMMC predictor performs well on data from any process that can be accurately modeled by a Markov model.

(iv) LZ78Y Sources. The data sources can be efficiently compressed by LZ78-like compression algorithms, which suits the LZ78Y predictor well.

For each simulated source above, we generate a set of 10 simulated datasets, each of which contains 10^6 samples, and the min-entropy is estimated by our predictors and the 90B's predictors. The final result for a predictor is the average value of the 10 estimated results corresponding to the 10 simulated datasets for one simulated source.

4.2.1. Time-Varying Sources. First, we generate time-varying binary data which are suitable for the statistical behaviors of the MCW predictor presented in the 90B. Table 7 shows the entropy estimation results for the time-varying data.

As shown in Table 7, symbol gradual(x) (x ∈ [0, 1], the same below) is defined as a simulated source where the probability of output "0" changes gradually from x to 1 − x with time. Symbol period(x) is defined as a simulated source where the probability of output "0" changes periodically with time, and the probability varies from x to 1 − x in one period; the period length is set to 20% of the entire input dataset. Symbol sudden(x) is defined as a simulated source where the probability of output "0" changes suddenly with time, namely, the probability is set to x for the first half of the input dataset and 1 − x for the last half.
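A minimal sketch of how such time-varying binary sources can be simulated is given below. The linear probability ramp for gradual(x) and the triangular sweep for period(x) are our assumptions about the exact time dependence, which the text does not spell out.

    import numpy as np

    def gradual(x, n):
        """Pr(0) moves linearly from x to 1 - x over the whole dataset."""
        p0 = np.linspace(x, 1.0 - x, n)
        return (np.random.random(n) >= p0).astype(np.uint8)

    def period(x, n, period_frac=0.2):
        """Pr(0) sweeps from x to 1 - x and back within each period;
        the period length is period_frac of the dataset (20% in the paper)."""
        plen = int(n * period_frac)
        t = (np.arange(n) % plen) / plen
        p0 = x + (1.0 - 2.0 * x) * (1.0 - np.abs(2.0 * t - 1.0))
        return (np.random.random(n) >= p0).astype(np.uint8)

    def sudden(x, n):
        """Pr(0) is x for the first half of the dataset and 1 - x for the rest."""
        p0 = np.where(np.arange(n) < n // 2, x, 1.0 - x)
        return (np.random.random(n) >= p0).astype(np.uint8)

    samples = gradual(0.2, 10**6)   # e.g., the gradual(0.2) source of Table 7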

In Table 7, the estimation results for the MCW predictor and our work are shown in bold font. We see that the MCW predictor gives the lowest and most accurate entropy estimations for the three types of time-varying data mentioned above, but it gives slight underestimates at gradual(0.2) and period(0.2). It is confirmed that the time-varying sources mentioned above match the statistical behaviors of the MCW predictor. Relatively, we find that our proposed predictive models are all capable of obtaining satisfactory entropy estimations that are close to the correct values. Therefore, it is proved that our proposed predictive models are suitable for the time-varying data mentioned above. Note that we calculate the min-entropy estimate according to the entire dataset rather than the last 20% of the input dataset for these time-varying sources, because the probability distribution varies with time and a part of the input dataset cannot represent the overall distribution of the input dataset.

Table 5: Relative errors of the final estimations of 90B's predictors and our predictors for five classes of simulated sources.

Simulated data class    90B's predictors (%)   Our predictors (%)
Uniform                 4.37                   1.53
Near-uniform            3.47                   1.59
Normal                  6.08                   1.57
Time-varying normal     3.47                   1.72
Markov                  14.65                  6.55


4.2.2. Periodic Sources. Second, we generate periodic data which are suitable for the statistical behaviors of the Lag predictor presented in the 90B. The entropy estimation results for the periodic sequences are given in Table 8. The data source completely obeys the periodic rule, so the correct entropy is zero. The bit width of the samples is traversed from 2 to 8.

As shown in Table 8, the estimation results for the Lag predictor and our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated periodic sources, we confirm that the Lag predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimates, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the entropy estimation of (strongly) periodic data. In addition, the MultiMMC predictor can also give accurate min-entropy estimations. This is reasonable because periodicity is also a form of correlation.

4.2.3. Markov Sources. Next, we generate multivariate M-sequences as Markov sources, which fit the statistical behaviors of the MultiMMC predictor. Specifically, the multivariate M-sequences are composed of multiple M-sequences with different initial states. Due to the determinacy of this type of sequence, the correct entropy is zero. The bit width of the samples is also traversed from 2 to 8. The maximum step of correlation used here is set to 8. Table 9 shows the estimated results for the multivariate M-sequences.

In Table 9, the estimation results for the MultiMMC predictor and our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated Markov sources, we confirm that the MultiMMC predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimations, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the Markov sources.

4.2.4. LZ78Y Sources. Finally, we verify the applicability to the LZ78Y sources. This type of entropy source is difficult to generate by simulation. However, we can still draw the conclusion that our proposed predictive models can be applied to the LZ78Y sources according to the results shown in italic font in Tables 8 and 9, because the periodic data and the Markov sequences are compressible.

4.2.5. Summary on Applicability Scope of Our Predictors. By analyzing the experimental results of the above four specific simulated sources, each of which is oriented towards a certain predictor in the 90B, we conclude that our predictors can provide accurate estimated results of entropy. So the proposed predictors apply to these entropy sources as well as the 90B's predictors do. In addition, compared with the 90B's predictors, our predictors have a better performance on the scope of applicability for testing the datasets with long-range correlation, as presented in Section 4.1.1.

Table 7: Entropy estimates for time-varying data.

Data class     Correct   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
gradual(0.2)   0.6345    0.5290   0.7808   0.7240     0.7790   0.6288   0.6289
gradual(0.3)   0.7437    0.7378   0.9221   0.8416     0.9243   0.7430   0.7460
gradual(0.4)   0.8645    0.8631   0.9786   0.9518     0.9739   0.8648   0.8637
period(0.2)    0.6345    0.5537   0.7428   0.5537     0.7669   0.6205   0.6209
period(0.3)    0.7437    0.7393   0.9218   0.8476     0.9233   0.7377   0.7375
period(0.4)    0.8645    0.8639   0.9767   0.9632     0.9796   0.8653   0.8632
sudden(0.2)    0.3219    0.3203   0.4663   0.3386     0.4484   0.3217   0.3229
sudden(0.3)    0.5146    0.5110   0.5857   0.9984     0.7663   0.5110   0.5119
sudden(0.4)    0.7370    0.7338   0.8699   0.9984     0.9389   0.7339   0.7345

(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)

Table 8: Entropy estimates for periodic sequences.

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.6458   0.0000   0.0000     1.1817   0.0079   0.0000
3           2.3318   0.0000   0.0000     1.5957   0.1315   0.0000
4           2.9147   0.0000   0.0000     1.8016   0.4748   0.0000
5           3.3269   0.0000   0.0000     1.4586   0.8898   0.0000
6           3.9092   0.0000   0.0000     0.8322   3.4944   0.0000
7           4.4908   0.0000   0.0000     0.3973   3.4960   0.0000
8           4.4919   0.0000   0.0000     0.2027   3.5408   0.0000

(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)
*The result in italic font is used to analyze the applicability for the LZ78Y sources.

Table 6: Entropy estimates for real-world sources.

Real-world sources            MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
RANDOM.ORG                    0.9951   0.9963   0.9966     0.9976   0.9802   0.9954
Ubld.it TrueRNGpro            0.9979   0.9955   0.9973     0.9966   0.9934   0.9728
Linux kernel entropy source   0.6173   0.1232   0.1269     0.6164   0.1230   0.3068
Linux /dev/urandom            0.9952   0.9935   0.9990     0.9964   0.9983   0.9911
Windows RNG                   0.9953   0.9986   0.9975     0.9984   0.9833   0.9853

(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)



4.3. Comparison on Execution Efficiency. We implement our predictors and the final 90B's predictors using Python 3.6, and the version of TensorFlow is 1.3.11. All the following tests are conducted on a computer with an Intel Core i7 CPU and 32 GB RAM.

Table 10 shows the mean execution time of our predictors in comparison with that of the final 90B's predictors and the second draft of 90B's predictors. Each experimental result in Table 10 is the average value obtained from 50 repeated experiments. Note that the definitions of the parameters n, s, and k are the same as in Section 3.3.2.

From the listed mean execution time with different scales (n, s) in Table 10, it can be seen that, when n = 10^6, the mean execution time of our predictors is much lower, and increases more slowly with any s, than that of the final 90B's predictors. In other words, the average execution efficiency of our predictors is about 7 to 10 times higher than that of the final 90B's predictors for different sample spaces s when the sample size n is 10^6. In particular, when n = 10^8, the mean execution time of the final 90B's predictors is far more than that of our predictors regardless of the size of the sample space, and it is too long (over three days) to calculate the estimated results in the case s ≥ 2^2.

In terms of the execution efficiency of the 90B's predictors, we also find that the mean execution time of the final 90B's predictors is much higher than that of the second draft of 90B's predictors. Actually, the final 90B's mean execution time is about twice as much as that of the second draft of 90B's. This could be caused by the characteristics of some estimators, which are limited to binary inputs only. The collision estimator, Markov estimator, and compression estimator are only suitable for binary input (0 or 1), as stated in [23]. So, for nonbinary inputs, the 90B's estimators will not only calculate the original symbol entropy but also convert the data into binary input to calculate the bit entropy and finally get the min-entropy. This will greatly increase the mean execution time.

4.4. General Discussion. For most entropy sources that have been tested, the RNN gives more accurate estimations than the FNN. The better accuracy of the RNN predictive model may be due to the following reasons. On the one hand, the RNN adds feedback connections to the network, i.e., it considers not only the relationship between the current output and the previous observations but also the relationship among the previous observations. On the other hand, the RNN one-hot-encodes the training dataset for better forecasting of categorical data. On the contrary, for the Markov sources, the M-sequence, and the nonuniform distribution by postprocessing using LFSR, the current output is only related to the previous observations, which fits the FNN predictive model well, and thus the FNN provides more accurate estimated results.

5. Conclusions and Future Work

Entropy estimation provides a crucial evaluation for the security of RNGs. The predictor serves as a universal sanity check for entropy estimation. In this work, we provide several new approaches to estimate the min-entropy of entropy sources using predictors based on neural networks (i.e., FNN and RNN) for the first time. In particular, we design a novel scheme for the proposed entropy estimation based on neural network models, including the execution strategy and parameter settings. In order to evaluate the quality of the proposed predictors, we collect various types of simulated sources that are stationary or nonstationary, whose correct entropy can be derived from the known probability distribution, and the theoretical result is further verified by experiments on real-world sources. We also compare our method with the predictors defined in the NIST SP 800-90B (published in 2018), which is a commonly used standard for evaluating the validation of entropy sources. Our assessment experiments are carried out in three aspects, namely, accuracy, scope of applicability, and computational complexity. The experimental results demonstrate that the entropy estimations obtained from our proposed predictors are more accurate than those of the 90B's predictors, and our predictors have a remarkably wide scope of applicability.

Table 9: Entropy estimates for multivariate M-sequences.

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.9010   2.0000   0.0000     2.0000   0.0005   0.0000
3           2.9906   3.0000   0.0000     2.4940   0.0000   0.0000
4           3.4037   4.0000   0.0000     4.0000   0.0021   0.0000
5           4.9753   5.0000   0.0000     1.2269   0.0041   0.0000
6           5.3916   6.0000   0.0000     1.2905   0.0394   0.0000
7           5.3916   6.0000   0.0000     1.9881   0.0280   0.0000
8           7.0000   7.0000   0.0000     0.6611   0.8635   0.0000

(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)
*The result in italic font is used to analyze the applicability for the LZ78Y sources.

Table 10: Comparison on execution efficiency of min-entropy estimation of our study and 90B's predictors.

n      s     Final 90B (s)   Old 90B (s)   Our predictors (s)
10^6   2^1   921             561           136
10^6   2^2   1,058           525           138
10^6   2^3   1,109           574           149
10^6   2^4   1,235           598           174
10^6   2^5   1,394           630           190
10^6   2^6   1,683           785           186
10^6   2^7   2,077           938           264
10^6   2^8   2,618           1,298         272
10^8   2^1   52,274          47,936        9,184
10^8   2^2   –               –             9,309
10^8   2^3   –               –             9,385
10^8   2^4   –               –             9,836
10^8   2^5   –               –             10,986
10^8   2^6   –               –             13,303
10^8   2^7   –               –             17,649
10^8   2^8   –               –             20,759


In addition, the computational complexity of ours is obviously lower than that of the 90B's with the growing sample space and sample size in the theoretical analysis. The average execution efficiency of our predictors is about 7 to 10 times higher than that of the 90B's predictors for different sample spaces when the sample size is 10^6. Specially, the 90B's predictors cannot calculate a result due to the huge time complexity when the sample space s is over 2^2 with the parameter of maximum step k = 16 and sample size n = 10^8; relatively, our method is able to provide a satisfactory result for entropy sources with large sample space and long dependence.

Future work aims at designing some specific neural network predictive models of min-entropy estimation for some specific entropy sources. Our future work will also focus on applying this new method to estimate entropy for more application areas, like the randomness sources (sensors and other sources) in mobile terminals.

Data Availability

RANDOM.ORG data used to support the findings of this study can be accessed from https://www.random.org. Ubld.it TrueRNGpro, Linux kernel entropy source, Linux /dev/urandom, and Windows RNG data used to support the findings of this study can be obtained from the relevant listed references.

Disclosure

A preliminary version of this paper appeared under the title "Neural Network Based Min-entropy Estimation for Random Number Generators" in Proc. Security and Privacy in Communication Networks, 14th EAI International Conference, SecureComm 2018, Singapore, August 8–10, 2018 [36]. Dr. Jing Yang participated in this work when she studied in the Chinese Academy of Sciences, and now she works in the China Information Technology Security Evaluation Center, Beijing, China.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2018YFB0804300), the National Natural Science Foundation of China (Nos. 61872357 and 61802396), and the National Cryptography Development Fund of China (No. MMJJ20180113).

References

[1] I. Kanter, Y. Aviad, I. Reidler, E. Cohen, and M. Rosenbluh, "An optical ultrafast random bit generator," Nature Photonics, vol. 4, no. 1, pp. 58–61, 2010.

[2] P. Li, A. Wang, Y. Guo et al., "Ultrafast fully photonic random bit generator," Journal of Lightwave Technology, vol. 36, no. 12, pp. 2531–2540, 2018.

[3] P. Li, K. Li, X. Guo et al., "Parallel optical random bit generator," Optics Letters, vol. 44, no. 10, pp. 2446–2449, 2019.

[4] A. Uchida, K. Amano, M. Inoue et al., "Fast physical random bit generation with chaotic semiconductor lasers," Nature Photonics, vol. 2, no. 12, pp. 728–732, 2008.

[5] L. Dorrendorf, Z. Gutterman, and B. Pinkas, "Cryptanalysis of the random number generator of the Windows operating system," ACM Transactions on Information and System Security, vol. 13, no. 1, pp. 1–32, 2009.

[6] Z. Gutterman, B. Pinkas, and T. Reinman, "Analysis of the Linux random number generator," in Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P 2006), pp. 371–385, Berkeley, CA, USA, May 2006.

[7] M. Vanhoef and F. Piessens, "Predicting, decrypting, and abusing WPA2/802.11 group keys," in Proceedings of the 25th USENIX Security Symposium, pp. 673–688, Austin, TX, USA, August 2016.

[8] A. L. Rukhin, J. Soto, J. R. Nechvatal et al., SP 800-22 Rev. 1a: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, NIST Special Publication, Gaithersburg, MD, USA, 2010.

[9] W. Killmann and W. Schindler, AIS 31: Functionality Classes and Evaluation Methodology for True (Physical) Random Number Generators, Version 3.1, T-Systems GEI GmbH and Bundesamt für Sicherheit in der Informationstechnik (BSI), Bonn, Germany, 2001.

[10] G. Marsaglia, "The Marsaglia random number CDROM including the diehard battery of tests of randomness," 1996, http://www.stat.fsu.edu/pub/diehard.

[11] P. L'Ecuyer and R. J. Simard, "TestU01: a C library for empirical testing of random number generators," ACM Transactions on Mathematical Software, vol. 33, no. 4, 2007.

[12] ISO/IEC JTC 1/SC 27, ISO/IEC 18031: Information Technology–Security Techniques–Random Bit Generation, Berlin, Germany, 2011.

[13] E. Barker and J. Kelsey, "NIST draft special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2012, http://csrc.nist.gov/publications/drafts/800-90/draft-sp800-90b.pdf.

[14] M. Baudet, D. Lubicz, J. Micolod, and A. Tassiaux, "On the security of oscillator-based random number generators," Journal of Cryptology, vol. 24, no. 2, pp. 398–425, 2011.

[15] W. Killmann and W. Schindler, "A design for a physical RNG with robust entropy estimators," in Proceedings of the 10th International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2008), pp. 146–163, Washington, DC, USA, August 2008.

[16] Y. Ma, J. Lin, T. Chen, C. Xu, Z. Liu, and J. Jing, "Entropy evaluation for oscillator-based true random number generators," in Proceedings of the 16th International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2014), pp. 544–561, Busan, South Korea, September 2014.

[17] Y. Ma, J. Lin, and J. Jing, "On the entropy of oscillator-based true random number generators," in Proceedings of the Cryptographers' Track at the RSA Conference, pp. 165–180, Springer, San Francisco, CA, USA, February 2017.

[18] P. Li, J. Zhang, L. Sang et al., "Real-time online photonic random number generation," Optics Letters, vol. 42, no. 14, pp. 2699–2702, 2017.

[19] X. Ma, F. Xu, H. Xu, X. Tan, B. Qi, and H. K. Lo, "Postprocessing for quantum random-number generators: entropy evaluation and randomness extraction," Physical Review A, vol. 87, no. 6, pp. 062327-1–062327-10, 2013.

[20] K. Ugajin, Y. Terashima, K. Iwakawa et al., "Real-time fast physical random number generator with a photonic integrated circuit," Optics Express, vol. 25, no. 6, pp. 6511–6523, 2017.

[21] F. Xu, B. Qi, X. Ma, H. Xu, H. Zheng, and H.-K. Lo, "Ultrafast quantum random number generation based on quantum phase fluctuations," Optics Express, vol. 20, no. 11, pp. 12366–12377, 2012.

[22] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "(Second draft) NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2016, https://csrc.nist.gov/CSRC/media/Publications/sp/800-90b/draft/documents/sp800-90b_second_draft.pdf.

[23] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2018, http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90B.pdf.

[24] J. Kelsey, K. A. McKay, and M. S. Turan, "Predictive models for min-entropy estimation," in Proceedings of the 17th International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2015), pp. 373–392, Saint-Malo, France, September 2015.

[25] S. Aras and I. D. Kocakoç, "A new model selection strategy in time series forecasting with artificial neural networks: IHTS," Neurocomputing, vol. 174, pp. 974–987, 2016.

[26] J. P. Donate, X. Li, G. G. Sanchez, and A. S. de Miguel, "Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm," Neural Computing and Applications, vol. 22, no. 1, pp. 11–20, 2013.

[27] J. C. Luna-Sanchez, E. Gomez-Ramírez, K. Najim, and E. Ikonen, "Forecasting time series with a logarithmic model for the polynomial artificial neural networks," in Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN 2011), pp. 2725–2732, San Jose, CA, USA, 2011.

[28] C. de Groot and D. Wurtz, "Analysis of univariate time series with connectionist nets: a case study of two classical examples," Neurocomputing, vol. 3, no. 4, pp. 177–192, 1991.

[29] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[30] X. Cai, N. Zhang, G. K. Venayagamoorthy, and D. C. Wunsch II, "Time series prediction with recurrent neural networks trained by a hybrid PSO-EA algorithm," Neurocomputing, vol. 70, no. 13–15, pp. 2342–2353, 2007.

[31] A. Jain and A. M. Kumar, "Hybrid neural network models for hydrologic time series forecasting," Applied Soft Computing, vol. 7, no. 2, pp. 585–592, 2007.

[32] J. M. P. Menezes Jr. and G. A. Barreto, "Long-term time series prediction with the NARX network: an empirical evaluation," Neurocomputing, vol. 71, no. 16–18, pp. 3335–3343, 2008.

[33] P. Hagerty and T. Draper, "Entropy bounds and statistical tests," 2012, https://csrc.nist.gov/csrc/media/events/random-bit-generation-workshop-2012/documents/hagerty_entropy_paper.pdf.

[34] S. Zhu, Y. Ma, T. Chen, J. Lin, and J. Jing, "Analysis and improvement of entropy estimators in NIST SP 800-90B for non-IID entropy sources," IACR Transactions on Symmetric Cryptology, no. 3, pp. 151–168, 2017.

[35] A. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, USA, 1996.

[36] J. Yang, S. Zhu, T. Chen, Y. Ma, N. Lv, and J. Lin, "Neural network based min-entropy estimation for random number generators," in Proceedings of the 14th International Conference on Security and Privacy in Communication Networks (SecureComm 2018), pp. 231–250, Singapore, August 2018.


At present, in order to guide designers, users, and assessors to analyze the security of RNGs, many research organizations or individuals have provided a number of approaches for testing and evaluating RNGs. These approaches can be roughly divided into two classes: statistical property tests and entropy estimation. Specifically, the statistical property test was proposed first, such as the NIST Special Publication 800-22 [8], AIS 31 [9], the Diehard battery [10], and TestU01 [11], which detects whether the output sequence has obvious statistical defects. Because it only focuses on the statistical properties of the outputs rather than the internal structure and generation principle of RNGs, the statistical property test is a universal (black-box) testing method for various types of generators, and it is easy to operate. With the in-depth understanding of randomness in the past few years, the concept of "entropy" has been proposed to evaluate the security of RNGs. Entropy is a measure of uncertainty that is appropriate to reflect the amount of randomness. Therefore, the criteria of several major standardization organizations recommend adopting entropy to quantify the randomness (unpredictability) of the outputs of RNGs, such as ISO/IEC 18031 [12] and AIS 31 [9]. There are many types of methods for measuring entropy, including Shannon entropy, Renyi entropy, and min-entropy. Min-entropy is a very conservative measure, which means the difficulty of guessing the most-likely output of entropy sources [13].

However, the entropy estimation of entropy sources is a very challenging task, because the common assumptions may not be consistent with the real conditions and the distribution of the outputs is unknown. Nowadays, there are two ways to implement entropy estimation: theoretical entropy estimation (stochastic model) and statistical entropy estimation. A theoretical proof for the security of RNGs can be achieved from a suitable stochastic model, as in [14–17]. But the modeling is always difficult and complex, because it is always based on the specific structure of an RNG and an appropriate assumption on the entropy source's behavior, and some structures of RNGs still do not have a suitable model [18–21]. Relatively, statistical entropy estimation is still based on the idea of entropy estimation, but it is implemented by means of statistical black-box testing, which has a good applicability for evaluating various types of RNGs. Thus, statistical entropy estimation can partly solve the problem that the entropy of some RNGs cannot be quantified by modeling.

The NIST Special Publication 800-90B [22] (called 90B in the text below) is a typical representative of the statistical entropy estimations, which is based on min-entropy and specifies how to design and test entropy sources. The final version of the 90B was officially published in January 2018 [23] and replaces the second draft of 90B published in 2016 [22]. The predictors proposed by Kelsey et al. [24] have a better performance than the other estimators in this standard and refer to machine learning algorithms that attempt to predict each sample in a sequence and update their internal state based on all observed samples. However, there are some problems in these 90B's predictors. On the one hand, every predictor is designed to perform well only for sources with a certain statistical property, as stated in [24], which constrains the application scope of the predictors. On the other hand, the execution efficiency of these predictors is influenced significantly when the selected sample space is large. It is analyzed that the time complexity of the 90B's predictors has a high-order linear relationship with the size of the sample space. In the released C++ code of the 90B's estimators published in 2018, the bits per symbol of the samples are still limited to between 1 and 8 inclusive in order to prevent too low execution efficiency. Therefore, they are not likely to be well applied to the entropy evaluation of entropy sources with unknown statistical behaviors, multivariate outputs, and long-range correlation.

As we know, the output sequences of RNGs are also a type of time series. Fortunately, there has been increasing attention to using neural networks to model and forecast time series, which has been found to be an alternative method when compared with various traditional time series models [25–27]. Some specific neural networks are applicable to the prediction of time series via approximating the probability distribution function (PDF), and the time complexity varies linearly with the sample space. Feedforward neural networks (FNNs) and recurrent neural networks (RNNs) are the typical representatives. FNNs are the quintessential deep learning models. In 1991, de Groot and Wurtz [28] presented a detailed analysis of univariate time series forecasting using FNNs for two nonlinear time series. FNNs are used to approximate some PDFs [29]. RNNs are a family of neural networks for processing sequential data, which can also be used for time series forecasting [30–32]. Therefore, it is worthwhile and feasible to study new methods of entropy estimation for RNGs based on neural networks.

1.1. Motivation. In this paper, we aim to propose several suitable and efficient predictors for the security evaluation of RNGs, especially for the min-entropy estimation of sequences generated by entropy sources. Since both FNN and RNN are suitable for time series prediction, we design two predictors based on these neural network models. The advantages of the proposed predictors are roughly described as follows. Because the selected neural network models have good universality for the prediction of various types of time series in principle, the designed predictors have wide applicability for the entropy estimation of the output of entropy sources. Moreover, neural network-based predictors have high execution efficiency, which can properly handle the difficulty in evaluating random numbers with long dependence and multivariate outputs due to the huge time complexity.

1.2. Contributions. In summary, we make the following contributions:

(i) We are the first to adopt neural network models to design predictors to estimate min-entropy for RNGs and propose a suitable execution strategy, which makes our approach applicable to predicting both stationary and nonstationary sequences generated by different entropy sources.

(ii) We conduct a series of experiments to verify the accuracy of our predictors by using many typical simulated sources, where the theoretical entropy can be obtained from the known probability distribution. Additionally, the computational complexity is evaluated theoretically. The results show that our approaches enable the entropy to be estimated with an error of less than 6.55%, while the error is up to 14.65% for the 90B's predictors. The time complexity of our estimation has a linear relationship with the sample space, while it is a high-order linear relationship with the sample space for the 90B.

(iii) We experimentally compare the advantages of our predictors over the 90B's predictors on accuracy, application scope, and execution efficiency. The experimental datasets include several typical real-world data sources and various simulated datasets. The experimental results indicate that our predictors have higher accuracy, higher execution efficiency, and a wider scope of applicability than those of the 90B's predictors. Furthermore, when the tested sample space and sample size keep growing, the execution efficiency of the 90B's predictors becomes too low to estimate the entropy within an acceptable time interval, while our proposed predictors can still calculate the estimated results efficiently.

The rest of the paper is organized as follows. In Section 2, we introduce fundamental definitions about min-entropy, the evolution of the 90B and the estimators, especially the predictors, defined in this criterion, and two typical neural networks we choose to support our research. In Section 3, we propose two predictors based on neural networks for min-entropy estimation, design an execution strategy, and give the accuracy verification and complexity analysis of our predictors. Furthermore, we apply our predictors to different types of simulated and real-world data sources and compare the advantages of our predictors over the 90B's predictors on accuracy, application scope, and execution efficiency in Section 4. We finally conclude our work in Section 5.

2. Preliminaries

In this section, we first introduce the fundamental concept of min-entropy, which is the core mathematical idea and assessment method in our work. Then, we introduce the evolution process of the 90B and relevant research work on this criterion. After that, we introduce the estimators defined in the 90B, especially the predictive model-based estimators, which are the focus of this paper. At last, we describe two predictive models based on neural networks which apply to time series forecasting and contribute to the design of the new predictors in our work.

2.1. Min-Entropy of Entropy Source. The concept of entropy is the core mathematical idea of the 90B, and min-entropy is the assessment method, which is a conservative way to ensure the quality of random numbers in the worst case for some high-security applications, such as the seed of PRNGs. The 90B [22] gives the definition of min-entropy as follows: consider an independent discrete random variable X that takes values from the set A = {x_1, x_2, ..., x_k} (k ∈ Z* denotes the size of the sample space) with probability Pr(X = x_i) = p_i (i = 1, 2, ..., k). The min-entropy of the output is

H_{\min} = \min_{1 \le i \le k} \left( -\log_2 p_i \right) = -\log_2 \left( \max_{1 \le i \le k} p_i \right). \quad (1)

If X has min-entropy H, then the probability of observing any particular value for X is no greater than 2^{-H}. The maximum possible value for the min-entropy of a random variable with k distinct values is log_2(k), which is attained when the random variable has a uniform probability distribution, namely, p_1 = p_2 = ... = p_k = 1/k.
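As a simple worked example, the sketch below evaluates equation (1) for a given probability vector, or for the empirical frequencies of an observed sequence. It is a direct transcription of the definition, not one of the 90B estimators.

    import numpy as np
    from collections import Counter

    def min_entropy(probs):
        """H_min = -log2(max_i p_i) for a discrete distribution (equation (1))."""
        return -np.log2(max(probs))

    def empirical_min_entropy(samples):
        """Plug-in estimate: use observed relative frequencies as the p_i."""
        counts = Counter(samples)
        return -np.log2(max(counts.values()) / len(samples))

    print(min_entropy([0.25] * 4))        # uniform over 4 symbols -> 2.0 bits
    print(min_entropy([0.5, 0.3, 0.2]))   # -> -log2(0.5) = 1.0 bit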

For a non-IID source such as a Markov process, Turan et al. provided a calculation method of min-entropy in [22]. A stochastic process {X_i}_{i ∈ N} that takes values from the finite set A defined above is known as a first-order Markov chain if

\Pr\left(X_{m+1} = x_{m+1} \mid X_m = x_m, X_{m-1} = x_{m-1}, \ldots, X_0 = x_0\right) = \Pr\left(X_{m+1} = x_{m+1} \mid X_m = x_m\right) \quad (2)

for any m ∈ Z* and all x_0, x_1, ..., x_m, x_{m+1} ∈ A. In a dth-order Markov process, the transition probabilities have the property that

\Pr\left(X_{m+1} = x_{m+1} \mid X_m = x_m, X_{m-1} = x_{m-1}, \ldots, X_0 = x_0\right) = \Pr\left(X_{m+1} = x_{m+1} \mid X_m = x_m, \ldots, X_{m-d+1} = x_{m-d+1}\right). \quad (3)

The initial probabilities of the process are p_i = Pr(X_0 = i), and the transition probabilities are p_{ij} = Pr(X_{m+1} = j | X_m = i). The min-entropy of a Markov process of length L is defined as

H_{\min} = -\log_2 \left( \max_{x_1, \ldots, x_L} \; p_{x_1} \prod_{j=2}^{L} p_{x_{j-1} x_j} \right). \quad (4)

The approximate value of min-entropy per sample can be obtained by dividing H_min by L.
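Equation (4) asks for the most probable length-L path through the chain. For small state spaces it can be evaluated with a Viterbi-style dynamic program, as sketched below; the function is an illustration of the formula, not the Markov estimator implemented in the 90B, and the example chain is arbitrary.

    import numpy as np

    def markov_min_entropy_per_sample(p_init, p_trans, L):
        """Evaluate equation (4): H_min = -log2(max path probability), then
        divide by L for the per-sample value. p_init[i] = Pr(X_0 = i),
        p_trans[i][j] = Pr(X_{m+1} = j | X_m = i)."""
        log_p = np.log2(np.asarray(p_init, dtype=float))
        log_t = np.log2(np.asarray(p_trans, dtype=float))
        for _ in range(L - 1):
            # best log-probability of a path ending in state j after one more step
            log_p = np.max(log_p[:, None] + log_t, axis=0)
        return -np.max(log_p) / L

    # Example: a biased two-state chain.
    p_init = [0.5, 0.5]
    p_trans = [[0.7, 0.3],
               [0.4, 0.6]]
    print(markov_min_entropy_per_sample(p_init, p_trans, L=128))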

2.2. NIST SP 800-90B and Its Entropy Estimation. The 90B is a typical case that evaluates the quality of an entropy source from the perspective of min-entropy. The evolution process of the 90B mainly includes the following three stages. Compared with the second draft, the final version published in January 2018 has made some corrections.

(i) The first draft of 90B [13] was published in August 2012, which included five estimators proposed by Hagerty and Draper [33]. These estimators are the collision test, partial collection test, Markov test, compression test, and frequency test, which are suitable for sources that do not necessarily satisfy the IID assumption. But these estimators give significant underestimates, which was found by Kelsey et al. [24] through experiments.

(ii) Subsequently, the 90B was updated to the second draft [22] in January 2016, and the estimators based on predictors, which were proposed by Kelsey et al. for the first time, were adopted. Compared with the first draft, the second draft has the following main changes: (1) among these estimators, the partial collection test in the first draft was deleted, the frequency test in the first draft was replaced by the most common value estimator, and two new estimators, including the t-tuple estimator and the longest repeated substring estimator, were added; (2) the second important update was the addition of four predictors for entropy estimation. However, the underestimation problem was not solved in the second draft. Zhu et al. [34] proved the underestimation problem for non-IID data through theoretical analysis and experimental validations and proposed an improved method.

(iii) The final official version of 90B [23] was published in January 2018. In the final 90B, estimators with significant underestimates, such as the collision estimator and the compression estimator, are modified to be limited to binary inputs only, which may reduce the overall execution efficiency of min-entropy estimation for nonbinary inputs. In addition, the calculation process and method of key variables for min-entropy estimation are also corrected, such as P'_global and the min-entropy of each predictor.

2.2.1. Execution Strategy of Min-Entropy Estimation in 90B. The 90B takes the following strategy to estimate the min-entropy. It first checks whether the tested datasets are IID or not. On the one hand, if the tested datasets are non-IID, the ten estimators mentioned in Section 2.2.2 are used for entropy estimation. Each estimator calculates its own estimation independently; then, among all estimations, the minimum one is selected as the final estimation result for the entropy source. On the other hand, if the tested datasets are considered IID, only the most common value estimator is employed. Finally, it applies restart tests and gives the entropy estimation.

Note that this article only focuses on the analysis of and comparison with the 90B's four predictors, and research on the other parts of the 90B is not considered in this study.

2.2.2. Estimators in 90B. In the final NIST SP 800-90B, there are ten estimators, and each estimator has its own specific characteristics. According to the underlying methods they employ, we divide these estimators into three classes: frequency-based estimators, entropy statistic-based estimators, and predictor-based estimators. The following is a brief introduction to the ten estimators; the details can be found in [23].

(1) Predictor-Based Type Estimators. The following are the four predictors proposed by Kelsey et al. [24] for entropy estimation for the first time. Kelsey et al. utilized several machine learning models serving as predictors to improve the accuracy of entropy estimation. But these predictors perform well only for specific distributions.

(i) Multi Most Common in Window (MultiMCW) Predictor. This predictor performs well in cases where there is a clear most common value, but that value varies over time.

(ii) Lag Predictor. The Lag subpredictor predicts the value that occurred N samples back in the sequence. This predictor performs well on sources with strong periodic behavior if N is close to the period.

(iii) Multi Markov Model with Counting (MultiMMC) Predictor. The MultiMMC subpredictor predicts the most common value that followed the previous N-sample string. The range of the parameter N is set from 1 to 16. This predictor performs well on data from any process that can be accurately modeled by an Nth-order Markov model.

(iv) LZ78Y Predictor. This predictor performs well on the sort of data that would be efficiently compressed by LZ78-like compression algorithms.

(2) Other Estimators. Four frequency-based estimators and two entropy statistic-based estimators are described as follows. The min-entropy of the former type of estimator is calculated according to the probability of the most-likely output value, and the latter is based on the entropic statistics presented by Hagerty and Draper [33]. Among them, three estimators, including the Markov estimator, the collision estimator, and the compression estimator, explicitly state that they only apply to binary inputs in the final 90B published in 2018, which may reduce the execution efficiency for nonbinary inputs, as proved through experiments in Section 4.3.

(i) Most Common Value Estimate. This estimator calculates entropy based on the number of occurrences of the most common value in the input dataset and then constructs a confidence interval for this proportion. The upper bound of the confidence interval is used to estimate the min-entropy per sample of the source.

(ii) Markov Estimate. This estimator computes entropy by modeling the noise source outputs as a first-order Markov model. The Markov estimate provides a min-entropy estimate by measuring the dependencies between consecutive values from the input dataset. This method is only applied to binary inputs.

(iii) T-Tuple Estimate. This method examines the frequency of t-tuples (pairs, triples, etc.) that appear in the input dataset and produces an estimate of the entropy per sample based on the frequency of those t-tuples.

(iv) Longest Repeated Substring Estimate (LRS Estimate). This method estimates the collision entropy (sampling without replacement) of the source based on the number of repeated substrings (tuples) within the input dataset.

(v) Collision Estimate. This estimator calculates entropy based on the mean number of samples to see the first collision in a dataset, where a collision is any repeated value. This method is only applied to binary inputs.

(vi) Compression Estimate. This estimator calculates entropy based on how much the tested data can be compressed. This method is also only applied to binary inputs.

2.2.3. Min-Entropy Estimation of 90B's Predictors. Each predictor in the 90B attempts to predict the next sample in a sequence according to a certain statistical property of the previous samples and provides an estimated result based on the probability of successful prediction. Every predictor consists of a set of subpredictors and chooses the subpredictor with the highest rate of successful predictions to predict the subsequent output. Each predictor calculates the global predictability and the local predictability with the upper bound of the 99% confidence interval and then derives the global and the local entropy estimations, respectively. Finally, the final entropy estimation for this predictor is the minimum of the global and the local entropy estimations.

For estimating the entropy of a given entropy source, each predictor offers a predicted result after testing the outputs produced by the source and provides an entropy estimation based on the probability of successful predictions. After obtaining the estimations from the predictors, the minimum estimation over all the predictors is taken as the final entropy estimation of the entropy source.

The entropy estimation will be too loose if no predictor is applied that detects the predictable behaviors. But if a set of predictors with different approaches is applied, they can guarantee that the predictor which is the most effective at predicting the entropy source's outputs determines the entropy estimation.
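The conversion from a prediction count to an entropy value can be illustrated as follows. The 99% confidence upper bound on the global prediction probability is sketched with a standard normal-approximation constant (z = 2.576); the exact formulas in the 90B differ in details (e.g., the local-predictability term derived from the longest run of correct predictions is omitted), so this is only an illustrative sketch, not the standard's algorithm.

    import math

    def global_entropy_estimate(correct, total, z=2.576):
        """Illustration only: take the observed prediction accuracy, move to the
        upper end of a 99% confidence interval, and convert that probability
        bound into a min-entropy bound."""
        p_hat = correct / total
        p_upper = min(1.0, p_hat + z * math.sqrt(p_hat * (1.0 - p_hat) / (total - 1)))
        return -math.log2(p_upper)

    # 612,000 correct predictions out of 1,000,000 samples -> roughly 0.71 bits/sample
    print(global_entropy_estimate(612_000, 1_000_000))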

2.3. Two Predictive Models Based on Neural Networks. Next, we introduce the two main neural network models that help us design predictors for entropy estimation: feedforward neural networks (FNNs) and recurrent neural networks (RNNs).

2.3.1. FNNs. The goal of a feedforward network is to approximate some function f*. For instance, a classifier Y = f*(X) maps an input X to a category Y. A feedforward network describes a mapping Y = f(X; θ) and learns the value of the parameters θ that result in the best function approximation. The principle of the FNN is depicted in Figure 1.

For each time step from t = 1 to t = n (n ∈ Z* denotes the sample size), the FNN applies the following forward propagation equations:

H_t = f(b + W X_t),
Y_t = c + V H_t,
\hat{Y}_t = g(Y_t). \quad (5)

The parameters and functions that govern the computation happening in an FNN are described as follows:

(i) X_t is the input at time step t and is a vector composed of the previous inputs (i.e., X_t = [X_{t-k}, ..., X_{t-1}], where k refers to the step of memory).

(ii) H_t is the hidden state at time step t, where the bias vector b and the input-to-hidden weights W are derived via training. The number of hidden layers and the number of hidden nodes per layer are defined before training; they are called hyperparameters in neural networks.

(iii) Y_t is the output at step t, where the bias vector c and the hidden-to-output weights V are derived via training.

(iv) \hat{Y}_t is the predictive output at time step t, which is a vector of probabilities across the sample space.

(v) The function f(·) is a fixed nonlinear function called the activation function, and the function g(·) is an output function used in the final layer of a neural network. Both functions are hyperparameters, which are defined before training (Section 3.2).

The models are called feedforward because the information flows through the approximated function from the input X_t, through the internal computations used to define f(·), and finally to the output \hat{Y}_t. Besides, there are no feedback connections, namely, no outputs of the model are fed back into the model itself.
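A minimal NumPy sketch of the forward pass in equation (5) is given below. Only the single hidden layer written in equation (5) is shown (our experiments use two hidden layers, as described in Section 3.2), and the layer sizes, tanh activation, and softmax output are illustrative choices, not the exact TensorFlow graph used in the experiments.

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))              # subtract max for numerical stability
        return e / e.sum()

    def fnn_forward(x_window, W, b, V, c):
        """One forward pass of equation (5): H_t = tanh(b + W X_t),
        Y_t = c + V H_t, Y_hat_t = g(Y_t) with g = softmax."""
        h = np.tanh(b + W @ x_window)
        return softmax(c + V @ h)

    k, hidden, n_symbols = 20, 10, 2           # step of memory, hidden nodes, binary source
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(hidden, k))
    b = np.zeros(hidden)
    V = rng.normal(scale=0.1, size=(n_symbols, hidden))
    c = np.zeros(n_symbols)

    x_window = rng.integers(0, 2, size=k).astype(float)   # previous 20 observed bits
    y_hat = fnn_forward(x_window, W, b, V, c)              # probabilities over {0, 1}
    print(y_hat, y_hat.argmax())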

2.3.2. RNNs. If feedback connections are added to the network, it is called an RNN. In particular, the RNN records the information that has been calculated so far and uses it for the calculation of the present output. The principle of RNNs is depicted in Figure 2.

For each time step from t = 1 to t = n, the RNN applies the following forward propagation equations:

H_t = f(b + W H_{t-1} + U X_t),
Y_t = c + V H_t,
\hat{Y}_t = g(Y_t). \quad (6)

The parameters and functions that govern the computation happening in an RNN are described as follows:

Figure 1: Feedforward neural network.


(i) X_t is the input at time step t and is a one-hot vector. For example, if X_t = 1 and the sample space is S = {0, 1}, then X_t = [0, 1].

(ii) H_t is the hidden state at time step t. It is the "memory" of the network: H_t is calculated based on the previous hidden state H_{t-1} and the input at the current step X_t. b, U, and W denote the bias vector, the input-to-hidden weights, and the hidden-to-hidden connection into the RNN cell, respectively.

(iii) Y_t is the output at step t. c and V denote the bias vector and the hidden-to-output weights, respectively.

(iv) \hat{Y}_t is the predictive output at time step t, which is a vector of probabilities across the sample space.

(v) Similarly, the function f(·) is an activation function and g(·) is an output function, which are defined before training (Section 3.2).
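Analogously, a NumPy sketch of one recurrent step of equation (6) is shown below. The one-hot input and softmax output are as described above, relu is used as the activation f (as in Section 3.2), and the weight shapes are illustrative assumptions.

    import numpy as np

    def softmax(z):
        e = np.exp(z - np.max(z))
        return e / e.sum()

    def rnn_step(x_onehot, h_prev, U, W, V, b, c):
        """One step of equation (6): H_t = f(b + W H_{t-1} + U X_t),
        Y_t = c + V H_t, Y_hat_t = g(Y_t), with f = relu and g = softmax."""
        h = np.maximum(0.0, b + W @ h_prev + U @ x_onehot)   # relu activation
        y_hat = softmax(c + V @ h)
        return h, y_hat

    n_symbols, hidden = 2, 16
    rng = np.random.default_rng(1)
    U = rng.normal(scale=0.1, size=(hidden, n_symbols))
    W = rng.normal(scale=0.1, size=(hidden, hidden))
    V = rng.normal(scale=0.1, size=(n_symbols, hidden))
    b, c = np.zeros(hidden), np.zeros(n_symbols)

    h = np.zeros(hidden)
    for bit in [0, 1, 1, 0, 1]:                 # a short toy input sequence
        x = np.eye(n_symbols)[bit]              # one-hot encoding of the sample
        h, y_hat = rnn_step(x, h, U, W, V, b, c)
    print(y_hat)                                # predicted distribution of the next sample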

3. Predictors for Min-Entropy Estimation Based on Neural Network

The neural network is able to approximate various PDFs, and the complexity of a neural network grows more slowly (linearly) as the sample space increases. Motivated by [24], we propose two predictive models based on neural networks for min-entropy estimation. Next, we present the execution strategy of our min-entropy estimators, provide the choices of the important hyperparameters, and give the analysis of the accuracy and complexity of our predictive models to prove that our design is feasible.

3.1. Strategy of Our Predictors for Min-Entropy Estimation. The execution strategy of our min-entropy estimator is depicted in Figure 3, which consists of model training and entropy estimation. Both of our proposed predictive models (namely, predictors), which are based on FNN and RNN, respectively, follow the same strategy.

The benefit of this strategy is that it applies not only to stationary sequences generated by entropy sources but also to nonstationary sequences whose probability distribution is time-varying. On the one hand, in our strategy, in order that the model can match the statistical behavior of the data source well, we use the whole input dataset to train and continuously update the model. On the other hand, to effectively estimate the entropy of the data source, we use the predictive model to compute the min-entropy only when the predictive model has been updated enough to characterize the statistical behavior of the tested dataset. Specifically, for the testing dataset, which is used for computing the entropy estimation, we preset the testing dataset as a part of the whole observations and utilize a proportion parameter (γ ∈ [0, 1]) to determine the size of the testing dataset, namely, the last γ of the inputs are used for computing the entropy while the model is also updating.

The workflow of the min-entropy estimator based on a neural network is listed as the following steps:

(1) Initialization: choose one model (FNN or RNN) and set the hyperparameters of the model.

(2) Input data: input the tested dataset and judge the proportion of the remaining data in the entire tested dataset. If the proportion of the remaining observations is ≤ γ, record the accuracy for predicting the next sample to be observed; else, continue.

(3) Prediction: predict the current output according to the forward propagation equations.

(4) Comparison and loss function: observe the real output and compare it to the predicted value. Then, compute the loss function.

(5) Update predictive model: compute the gradient and update the predictive model. If the entire dataset runs out, turn to Step 6; else, repeat Steps 2–5.

(6) Calculate the predictive accuracy and probability: obtain the accuracy of the executed predictive model from the last γ observations, including global predictions and local predictions, and compute the probability, respectively.

(7) Calculate the min-entropy: calculate the min-entropy from the obtained probability. After obtaining the estimations from the two predictors (FNN and RNN), respectively, the minimum entropy of the two predictors is taken as the final entropy estimation (namely, min-entropy) of the tested dataset.
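The overall strategy of Figure 3 can be condensed into the train-as-you-go loop sketched below. The names model.predict_proba and model.update are hypothetical wrappers around one forward pass and one gradient step of either network, and the final conversion keeps only the plain global accuracy (the confidence bounds and local predictability of Step 6 are omitted for brevity).

    import math

    def estimate_min_entropy(samples, model, k=20, gamma=0.2):
        """Update the model on every sample; record prediction accuracy only
        over the last `gamma` fraction of the data (Figure 3).
        model.predict_proba(window) returns a dict {symbol: probability};
        model.update(window, target) performs one training step."""
        n = len(samples)
        score_from = int(n * (1.0 - gamma))
        correct = total = 0
        for t in range(k, n):
            window, target = samples[t - k:t], samples[t]
            probs = model.predict_proba(window)         # predict the next sample
            if t >= score_from:                         # only the last gamma of the inputs
                total += 1
                correct += int(max(probs, key=probs.get) == target)
            model.update(window, target)                # keep updating the model
        p_hat = correct / total
        return -math.log2(p_hat)                        # global estimate (bounds omitted)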

Figure 3: Execution strategy of our predictors for min-entropy estimation based on neural network.

Figure 2: Recurrent neural network.


Combining this with the calculation principle of min-entropy in Section 2.1, we can see that a lower bound on the probability of making a correct prediction gives an upper bound on the entropy of the source. In other words, the more predictable a source is, the larger the probability of making correct predictions is, and the less entropy it has. Therefore, a model that is a bad fit for the source, or that is not fully trained, will result in inaccurate predictions, a low accurate-prediction probability, and a too-high entropy estimation of the source. So the models that are a bad fit for the source or not fully trained can give big overestimates but not underestimates.

Further, we can confirm that adding one more predictor will not do any harm and, conversely, will make the entropy estimation much more accurate. From the execution strategy, we can see that if all the predictors whose models are not matched to the noise source are used alongside a predictor whose underlying model matches the source's behavior well, then the predictor which matches the source well will determine the final entropy estimation.

3.2. Choices of Important Parameters

3.2.1. Hyperparameters for FNN and RNN. In neural networks, the choices of the models' hyperparameters have significant influences on the computational resources and performance required to train and test. Therefore, the choices of hyperparameters are crucial to neural networks. Next, we illustrate the choices of some key hyperparameters.

(1) Hidden Layers and Nodes Comprehensively balance theaccuracy and efficiency of our predictors in this paper forthe FNNmodel except for the multivariate M-sequences weset the number of hidden layers as 2 and the number ofhidden nodes per layer is 10 and 5 respectively While forthe multivariate M-sequences after extensive tests thenumber of hidden nodes per layer shall be larger to givebetter results By observing the results finally we set thenumbers as 35 and 30 respectively

(2) Step of Memory. The step of memory determines the number of previous samples used for predicting the current output. Generally speaking, the larger the value, the better the performance; however, the computational resources (memory and runtime) increase as the step of memory grows. In this paper, we set the step of memory to 20 by trading off performance against resources. That is to say, for the FNN, the input at time step t is the previous 20 observed values, and for the RNN, the hidden layer contains 20 unfolded hidden units.
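The sketch below shows one straightforward way to build such fixed-length input windows from a raw sample sequence; the function name is ours.

import numpy as np

def make_windows(samples, memory=20):
    # Each input row holds the previous `memory` observations; the target is
    # the sample that immediately follows the window.
    xs = np.array([samples[i:i + memory] for i in range(len(samples) - memory)])
    ys = np.array(samples[memory:])
    return xs, ys

xs, ys = make_windows(list(range(100)), memory=20)
print(xs.shape, ys.shape)   # (80, 20) (80,)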

(3) Loss Function. The loss function measures the difference between the predicted values and the true values. The total loss for a given sequence of values x = {x1, ..., xn} paired with a sequence of values y = {y1, ..., yn} is then just the sum of the losses over all the time steps. For example, if Lt is the negative log-likelihood of yt given x1, ..., xt, then

L(\{x_1, \ldots, x_n\}, \{y_1, \ldots, y_n\}) = \sum_t L_t = -\sum_t \log_2 p_{\mathrm{model}}(y_t \mid x_1, \ldots, x_t),  (7)

where p_model(y_t | x_1, ..., x_t) is given by reading the entry for y_t from the model's output vector \hat{y}_t. The models are trained to minimize the cross-entropy between the training data and the models' predictions (i.e., equation (7)), which, under a Gaussian output model, is equivalent to minimizing the mean squared error (i.e., the average of the squared errors or deviations).
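For instance, the per-sequence loss of equation (7) can be accumulated as follows; probs[t] is the model's predicted probability vector at time t and targets[t] is the index of the observed symbol (names are ours).

import numpy as np

def sequence_loss(probs, targets):
    # Equation (7): sum over t of -log2 p_model(y_t | x_1, ..., x_t).
    return -sum(np.log2(p[y]) for p, y in zip(probs, targets))

print(sequence_loss([np.array([0.9, 0.1]), np.array([0.25, 0.75])], [0, 1]))
# -log2(0.9) - log2(0.75), about 0.567 bits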

(4) Learning Rate. The learning rate is a positive scalar determining the size of the step. To control the effective capacity of the model, we need to set the learning rate within an appropriate range. The learning rate determines how fast the parameter θ moves to its optimum value. If the learning rate is too large, gradient descent can inadvertently increase rather than decrease the training error; namely, the parameters are likely to cross the optimal value. However, if the learning rate is too small, training is not only slower but may become permanently stuck with a high training error. So the learning rate is crucial to the performance of the model.

Based on the above analysis, we pick the learning rate approximately on a logarithmic scale, i.e., the learning rate is taken from the set {0.1, 0.01, 10^{-3}, 10^{-4}, 10^{-5}}. At the beginning of model training, we set a larger learning rate to reach the optimum value faster; then, as training proceeds, we set smaller values to avoid crossing the optimal value. The detailed settings are described in Algorithm 1.
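A plain-Python rendering of this schedule could look like the sketch below; it assumes breakpoints at one third and two thirds of the training set, as read from Algorithm 1 (given later in this section), so treat the exact thresholds as an approximation.

def pick_learning_rates(train_num, train_dataset_size):
    # Larger steps early in training, smaller steps later (cf. Algorithm 1).
    if train_num < train_dataset_size / 3:
        return [0.1, 0.01]
    elif train_num < train_dataset_size / 1.5:
        return [0.01, 1e-3, 1e-4]
    return [1e-4, 1e-5]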

(5) Activation Function. In general, we must use a nonlinear function to describe the features. Most neural networks do so using an affine transformation controlled by learned parameters, followed by a fixed nonlinear function called an activation function. The activation function plays an important role in neural networks. The commonly used activation functions include tanh(·), relu(·), and the sigmoid function, which is defined as σ(·) (i.e., equation (8)) in this paper. Because the sigmoid function saturates easily, which causes the gradient to change slowly during training, it is generally no longer used as an activation function, except in the RNN-LSTM (long short-term memory).

\sigma(x) = \frac{1}{1 + e^{-x}}.  (8)

After many attempts (i.e., we manually compared efficiency and performance by exhaustive search), we finally choose tanh(·) and relu(·) as the activation functions for the FNN and the RNN, respectively. They can be expressed as

\tanh(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}.  (9)

Compared with σ(·), tanh(·) is symmetric about the origin. In some cases, this symmetry can give better performance. It compresses the real-valued input to the range of -1 to 1, and the mean of its output is zero, which makes it converge faster than σ(·) and reduces the number of iterations. This makes it suitable as an activation function, and a zero-centered training dataset contributes to the convergence speed of model training.

\mathrm{relu}(x) = \max(0, x),  (10)

where relu(·) is currently a popular activation function. It is piecewise linear and obtains the activation value with only one threshold. We choose this function based on the following two considerations. On the one hand, it alleviates the vanishing gradient problem of back propagation through time (BPTT) algorithms, because the derivative of relu(·) is 1 for positive inputs. On the other hand, it greatly improves the speed of calculation, because it only needs to judge whether the input is greater than 0.
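For completeness, straightforward NumPy versions of the three activation functions in equations (8)-(10) are given below.

import numpy as np

def sigmoid(x):          # equation (8)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):             # equation (9), numerically equivalent to np.tanh
    return (1.0 - np.exp(-2 * x)) / (1.0 + np.exp(-2 * x))

def relu(x):             # equation (10)
    return np.maximum(0, x)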

(6) Output Function. The output function is used in the final layer of a neural network model. Our predictors treat time series prediction as a multiclass classification problem, so we take softmax(·) as the output function, which can be expressed as

y_i = \mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{i=1}^{s} e^{z_i}},  (11)

where s is the size of the sample space, softmax(z_i) denotes the probability that the output is z_i, and \sum_{i=1}^{s} y_i = 1, i.e., the sum of the probabilities of all the outputs is equal to 1. Such networks are commonly trained under a cross-entropy regime (i.e., the loss function mentioned above).
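A direct implementation of equation (11) is sketched below (the max-subtraction is a standard numerical-stability trick and does not change the result); the output probabilities always sum to 1.

import numpy as np

def softmax(z):
    # Equation (11): y_i = exp(z_i) / sum_j exp(z_j).
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))        # about [0.659 0.242 0.099]
print(softmax(np.array([2.0, 1.0, 0.1])).sum())  # 1.0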

3.2.2. Selection of Testing Dataset Length. To better estimate the entropy of the data source, the length of the testing dataset is very important for the min-entropy estimation of random numbers generated by different types of sources. In reality, most entropy sources are time-varying (namely, nonstationary), which means the probability distribution of the output sequences from the source changes over time. So the length of the testing dataset shall be adaptive to the type of the source.

Therefore, as described in Section 3.1, we utilize c to determine the size of the testing dataset. Specifically, in our strategy, for a stationary entropy source, for which the probability distribution of the outputs is not changing over time, the parameter c is preset to 20%. In contrast, for a nonstationary entropy source, all observation points (namely, c is 100%) need to serve as the testing dataset.

To verify the reasonableness of the c value, we compute the root-mean-squared error (RMSE) of the lowest estimations of our predictors over 80 sequences from the following simulated datasets generated by a nonstationary source:

(i) Time-Varying Normal Distribution Rounded to Integers. The samples are drawn from a normal distribution and rounded to integer values, but the mean of the distribution moves along a sine curve to simulate a time-varying signal.

The RMSE, i.e., \sqrt{(1/N)\sum_{i=1}^{N}(\hat{H}_{\min} - H_{\min})^2}, refers to the arithmetic square root of the mean of the squared errors (deviations) for each class of simulated sources. Note that here N indicates the number of test samples, \hat{H}_{\min} indicates the estimated result for each sample, and H_{\min} is the theoretical result for each sample. In other words, the smaller the RMSE, the closer the estimated result is to the theoretical entropy, which indicates that the predictor has better accuracy.
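The RMSE defined above can be computed directly, for example:

import numpy as np

def rmse(estimated, theoretical):
    # Square root of the mean squared deviation between the estimated and
    # theoretical min-entropy values.
    estimated, theoretical = np.asarray(estimated), np.asarray(theoretical)
    return np.sqrt(np.mean((estimated - theoretical) ** 2))

print(rmse([0.98, 1.01, 0.95], [1.0, 1.0, 1.0]))   # about 0.032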

As shown in Table 1, for the time-varying data source, only when c is 100% (namely, the entire dataset is used for min-entropy estimation) can the predictors give the most accurate results. This means that when the probability distribution of a data source varies with time, a part of the input dataset cannot represent the overall distribution of the input dataset, so a part of the input dataset cannot accurately yield the estimation result for the entire input dataset. Besides, for the stationary sources, it is reasonable that c is preset to 20%, because the estimated results obtained by our method are very close to the correct (theoretical) entropy of the selected entropy source, as presented in Section 4.1.

3.3. Evaluation on Our Predictors. In this section, we conduct experiments on simulated datasets to verify the accuracy of our proposed predictors for min-entropy estimation and compare the experimental results with the theoretical results. In addition, we give a theoretical analysis of the complexity of our predictors. Note that in Section 4, we will apply our predictors to different data sources and provide a comparison of our predictors with the 90B's predictors.

3.3.1. Accuracy Verification. We train our predictive models, FNN and RNN, on a number of representative simulated data sources (including stationary and nonstationary entropy sources), for which the theoretical entropy can be obtained from the known probability distribution of the outputs. Simulated datasets are produced using the following distribution families adopted in [24].

(1) Simulated Datasets Generated by Stationary Sources

(i) Discrete Uniform Distribution. The samples are equally likely and come from an IID source.

(1) if train_num < train_dataset_size/3 then
(2)     learning_rate ← {0.1, 0.01}
(3) else if train_num < train_dataset_size/1.5 then
(4)     learning_rate ← {0.01, 10^{-3}, 10^{-4}}
(5) else
(6)     learning_rate ← {10^{-4}, 10^{-5}}
(7) end if

Algorithm 1: Setting of the learning rate.


(ii) Discrete Near-Uniform Distribution. All samples are equally likely except one, which has a higher probability than the rest; the samples come from an IID source.

(iii) Normal Distribution Rounded to Integers. The samples are drawn from a normal distribution and rounded to integer values; they come from an IID source.

(iv) Markov Model. The samples are generated using a dth-order Markov model; they come from a non-IID source.

(2) Simulated Datasets Generated by Nonstationary Sources. These datasets are the same as those used in Section 3.2.2.

For every class listed above, we generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy using the predictive models FNN and RNN, respectively. For each dataset, the theoretical min-entropy Hmin is derived from the known probability distribution.
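As a worked example of how the theoretical Hmin is obtained from a known distribution, for a near-uniform source over k values whose most likely value has probability p_max, the min-entropy is simply -log2(p_max); the helper name and the example numbers below are ours.

import math

def min_entropy_near_uniform(k, p_max):
    # One symbol occurs with probability p_max; the rest share 1 - p_max equally.
    assert p_max >= 1.0 / k
    return -math.log2(p_max)

print(min_entropy_near_uniform(256, 0.01))   # about 6.64 bits per sample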

In Figures 4-9, the abscissa represents the theoretical entropy of the test sample and the ordinate represents the estimated entropy of the test sample. Figure 4 shows the estimated entropy results for the 80 simulated datasets with uniform and near-uniform distributions, respectively. From Figures 4(a) and 4(b), we see that the estimated results given by our two proposed predictive models (FNN and RNN) are almost consistent with the theoretical entropy for both uniform and near-uniform distributions. So the final estimated result, which is the minimum of the two predictive models' results, is also basically consistent with the theoretical entropy. Figure 5 shows the estimated entropy results for the 80 simulated datasets with normal distributions and time-varying normal distributions, respectively. From Figures 5(a) and 5(b), we can see that the estimated results given by our two proposed predictive models are close to the theoretical entropy for normal distributions and time-varying normal distributions. According to our execution strategy, here we calculate the min-entropy estimations using the whole input dataset for the time-varying normal distributions.

Figure 6 shows the estimated results for Markov distributions. We can see that both of our predictive models give a number of overestimates when applied to the Markov sources, particularly as the theoretical entropy increases.

Table 2 shows the relative errors (namely, |(\hat{H}_{\min} - H_{\min})/H_{\min}| × 100%) between the theoretical results and the estimated results of FNN and RNN, to further reflect the accuracy of the models; \hat{H}_{\min} and H_{\min} have the same meaning as in Section 3.2.2. We see that the entropy is estimated with an error of less than 6.02% for FNN and 7% for RNN over the simulated classes, respectively.

Based on the above accuracy verification of our predictors with simulated datasets from different distributions, we can be confident that our predictors give nearly accurate results, except for the Markov distributions.

3.3.2. Complexity Analysis. To analyze the usability of our predictors in terms of execution efficiency, we derive the following computational complexity through an analysis of the theory and the principle of implementation.

We believe that the computational complexity of entropy estimators used for RNG evaluation mainly comes from the sample space and the sample size. For ease of analysis, we define the following parameters: n is the sample size, which indicates the length of the sample; s is the sample space, which means the number of kinds of symbols in the sample (i.e., s = 8 means there are 8 symbols in the sample and the bit width of each symbol is log2(8) = 3, such as 010, 110, 111, ...); and k denotes the maximum step of correlation, which is set as a constant in the 90B's predictors (k = 16) and in our predictors (k = 20).

Through the analysis of the implementation, the computational complexity of the final 90B's predictors [23] mainly comes from the MultiMMC predictor and is of order O(s^k · n + 2^k · n · log2(s)), which is mainly linear time complexity in n and k-order polynomial time complexity in s. In contrast, the computational complexity of our predictors is of order O(s · n), which is linear time complexity in s and n. It can be seen that the computational complexity of our predictors is much lower than that of the 90B's predictors.

It is important to note that the MultiMMC predictor requires s^k ≪ n; otherwise, this predictor cannot give statistically accurate estimated results. That is to say, when s increases, the MultiMMC predictor requires a larger sample size in order to estimate the entropy accurately.

From the above analysis, we can see that our predictors have lower computational complexity. We will give the experimental proof in Section 4.3.

4. Comparison of Our Predictors with the 90B's Predictors

In this section, a large number of experiments are conducted to evaluate our proposed predictors for entropy estimation in terms of accuracy, applicability, and efficiency, by applying our predictors to different simulated data and real-world data. For the experiments mentioned above, we compare the results with the final 90B's predictors [23] to highlight the advantages of our work. Similarly, our predictors in these experiments compute an upper bound of the min-entropy estimation at the significance level α = 0.01, which is the same as the 90B's predictors.

4.1. Comparison on Accuracy

4.1.1. Simulated Data. The simulated datasets are produced using the same distribution families as described in Section 3.3.1. Further, we append the following two new distribution families, a pseudorandom sequence and a postprocessing sequence, which are representative and commonly used in reality.

Table 1: Error measures of the final estimations of our predictors for nonstationary sources with different c values.

c       0.1      0.2      0.4      0.6      0.8      1
RMSE    0.0911   0.1364   0.0788   0.0817   0.0219   0.0149


[Figure 4: Comparison of the estimated results obtained from our two predictive models (FNN and RNN) with the theoretical entropy. Estimations for (a) uniform distributions and (b) near-uniform distributions. Both panels plot estimated entropy per sample against theoretical entropy per sample.]

[Figure 5: Comparison of the estimated results obtained from our two predictive models (FNN and RNN) with the theoretical entropy. Estimations for (a) normal distributions and (b) time-varying normal distributions. Both panels plot estimated entropy per sample against theoretical entropy per sample.]


[Figure 6: Comparison of the estimated results obtained from our two predictive models (FNN and RNN) with the theoretical entropy for Markov distributions, plotting estimated entropy per sample against theoretical entropy per sample.]

[Figure 7: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy. Estimations for (a) uniform distributions and (b) near-uniform distributions. Both panels plot estimated entropy per sample against theoretical entropy per sample.]


(i) M-Sequence. A maximum-length sequence, which is a type of pseudorandom binary sequence [35].

(ii) Nonuniform Distribution by Postprocessing Using LFSR. The samples are processed using a linear feedback shift register (LFSR) and come from an IID source [35]. (A generation sketch for LFSR-based sequences is given after this list.)
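To make the construction concrete, the sketch below generates an M-sequence with a Fibonacci LFSR. The tap positions shown correspond to one commonly listed primitive polynomial for an 8-stage register and are an assumption of this sketch; they are not necessarily the parameters used in our datasets.

def lfsr_m_sequence(taps, state, length):
    # Fibonacci LFSR: the feedback bit is the XOR of the tapped register bits.
    out = []
    for _ in range(length):
        out.append(state[-1])             # output the last register bit
        fb = 0
        for t in taps:
            fb ^= state[t - 1]            # taps are 1-indexed stage positions
        state = [fb] + state[:-1]         # shift right and insert the feedback bit
    return out

# Example: 8-stage LFSR (x^8 + x^6 + x^5 + x^4 + 1), period 2^8 - 1 = 255.
seq = lfsr_m_sequence(taps=[8, 6, 5, 4], state=[1, 0, 0, 0, 0, 0, 0, 0], length=255)
print(sum(seq))   # an M-sequence of period 255 contains 128 ones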

For every distribution mentioned above, we also generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy using our proposed predictors and the final 90B's predictors [23].

Figure 7 shows the estimated min-entropy results for the 80 simulated datasets with uniform distributions and near-uniform distributions, respectively. From Figures 7(a) and 7(b), we see that several points of the results obtained from the 90B's predictors are apparently underestimated, which may result from overfitting. Compared with the 90B's predictors, our predictors provide more accurate results.

Figure 8 shows the estimated min-entropy results for normal distributions and time-varying normal distributions, respectively. From Figures 8(a) and 8(b), we can see that the estimated results given by our predictors are close to the theoretical entropy for normal distributions and time-varying normal distributions. However, the lowest entropy estimation results obtained from the 90B's predictors give significant underestimates.

Figure 9 shows the estimated min-entropy results for Markov distributions. We can see that the 90B's predictors almost always give underestimates compared with the theoretical entropy, while the estimated results given by our predictors are much closer to the theoretical entropy than those obtained from the 90B's predictors.

To further compare the accuracy of our predictors and the 90B's predictors, we apply them to the M-sequence and the nonuniform distribution sequence obtained by postprocessing using LFSR, whose theoretical entropy is a known and fixed value.

It is further confirmed that the higher-stage (the stage is the maximum step of correlation) M-sequence and nonuniform distribution sequence by postprocessing using LFSR are able to pass the NIST SP 800-22 statistical tests [8]. The estimated results are listed in Tables 3 and 4, and the lowest entropy estimations from the 90B's predictors and our predictors for each stage are shown in bold font.

For the M-sequence and the nonuniform distribution by postprocessing using LFSR, the MultiMMC predictor presented in the final 90B gives the most accurate entropy estimation results for stages ≤ 16. However, when the stage of the M-sequence or of the nonuniform distribution by postprocessing using LFSR is greater than 16, the MultiMMC predictor cannot give an accurate entropy estimation result, because this predictor is parameterized by k ∈ {1, 2, ..., 16} (k is the maximum step of correlation).

[Figure 8: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy. Estimations for (a) normal distributions and (b) time-varying normal distributions. Both panels plot estimated entropy per sample against theoretical entropy per sample.]


Perhaps we could set the parameter of the MultiMMC predictor to a greater range in order to achieve a more accurate estimate for the higher stages, but the time complexity grows exponentially with the parameter k, as we analyzed in Section 3.3.2. Moreover, the FNN model can also give accurate estimated results even when the stages of the M-sequence and the LFSR are greater than 16. However, the RNN model can give accurate estimated results only when the stage is 8. Therefore, the FNN model is better matched to the M-sequence and the nonuniform distribution by postprocessing using LFSR than the RNN.

We also compute the relative errors of the estimated results from the 90B's predictors and our predictors over the 80 sequences from each class of simulated sources. We calculate the relative errors using the min-entropy obtained from the 90B's predictors (the lowest estimation result of the 90B's four predictors) and our predictors (the lowest estimation result of FNN and RNN), respectively. As illustrated in Table 5, for all five classes of simulated sources, the errors of our predictors are lower than those of the 90B's predictors. Specially, our approaches enable the entropy to be estimated with an error of less than 6.55%, while it is up to 14.65% for the 90B's predictors. Overall, this indicates that our proposed predictors have better accuracy than the 90B's predictors for both stationary and nonstationary sequences, which is consistent with the conclusion drawn from the figures above.

From Tables 2-4, we also find that the accuracy of the RNN predictive model is slightly higher than that of the FNN predictive model, except for the cases of the Markov sources, the M-sequence, and the nonuniform distribution by postprocessing using LFSR.

We will further verify the applicability for time-varying sources in Section 4.2. Therefore, through the evaluation of the entropy estimation results on the above simulated datasets, we see that our proposed predictors are superior in accuracy compared with the 90B's predictors.

4.1.2. Real-World Data. We further apply our predictors to datasets that are generated from RNGs deployed in the real world. In fact, the theoretical entropy per sample is unknown for these real-world sources, so no error can be computed as for the simulated datasets. However, the estimated results from the predictors presented here can still be compared to the 90B's predictors, based on the knowledge that underestimates from the predictors have theoretical bounds.

Datasets of real-world data are produced using the following approaches. The first two are adopted in [24], and the others are commonly used, typical RNGs. The estimations for the real-world sources are presented in Table 6.

(i) RANDOM.ORG. This is a service that provides random numbers based on atmospheric noise and is used in [24]. It allows the user to specify the minimum and maximum values that are output. The sequence used here consists of bits.

[Figure 9: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy for Markov distributions, plotting estimated entropy per sample against theoretical entropy per sample.]

Table 2: Relative errors of FNN and RNN estimation results.

Simulated data class      FNN (%)   RNN (%)
Uniform                   1.65      1.6
Near-uniform              1.60      1.52
Normal                    1.17      1.08
Time-varying normal       2.12      1.84
Markov                    6.02      7

Table 3: Estimated results for the M-sequence (Hmin = 0.000).

Stage       8       10      12      14      16      18       20
MultiMCW    0.991   0.996   0.988   0.989   0.993   0.999*   1.000
Lag         1.000   1.000   1.000   1.000   1.000   1.000    1.000
MultiMMC    0.000   0.000   0.000   0.000   0.000   1.000    1.000
LZ78Y       1.000   1.000   1.000   1.000   1.000   1.000    0.997*
FNN         0.000   0.000   0.000   0.000   0.000   0.000    0.000
RNN         0.000   1.048   1.007   1.002   0.996   0.9920   0.9997

Table 4: Estimated results for the nonuniform distribution by postprocessing using LFSR (Hmin = 0.152).

Stage       8       10      12      14      16      18       20
MultiMCW    0.440   0.595   0.743   0.721   0.998   0.994    0.998
Lag         0.581   0.581   0.680   0.680   0.992   0.994    0.999
MultiMMC    0.151   0.153   0.158   0.181   0.234   0.995    0.996
LZ78Y       0.567   0.995   0.766   0.679   0.997   0.996*   0.994*
FNN         0.151   0.145   0.149   0.147   0.149   0.142    0.144
RNN         0.149   0.947   1.012   0.998   1.012   0.997    0.985


(ii) Ubld.it TrueRNGpro. The TrueRNGpro is a USB random number generator produced by Ubld.it that provides a steady stream of random numbers through a USB CDC serial port. This entropy source is also used in [24]. The sequence used here consists of bits.

(iii) Linux Kernel Entropy Source. The Linux kernel random generator is used for the generation of a real-world sequence without any processing. The sequence used here is the last bit of each symbol.

(iv) Linux /dev/urandom. The /dev/urandom [6] of Linux is used for the generation of a real-world sequence with strict processing. The sequence used here consists of bits.

(v) Windows RNG. The Windows RNG [5] is used for the generation of a real-world sequence by calling a Crypto API. The sequence used here consists of bits.

As illustrated in Table 6, the lowest entropy estimation for each source is shown in bold font. We see that our predictors perform better than the 90B's predictors, because the lowest entropy estimation is always obtained from our work for each real-world source. Furthermore, for the Linux kernel entropy source, we find that both the Lag predictor and the MultiMMC predictor are able to give lower estimation results. This indicates that the Linux kernel entropy source has periodicity and conforms to the Markov model, which is well understood because the randomness of the Linux kernel entropy source comes from human behaviors, such as manipulating the mouse and keyboard. In our work, compared with the entropy estimations for other real-world sources, the FNN fits the Linux kernel entropy source much better than the RNN, which is consistent with the previous view that the FNN performs well in testing Markov sources.

4.2. Comparison on the Scope of Applicability. After evaluating the accuracy, we further validate the scope of applicability of our proposed predictors and compare it with that of the 90B's predictors. Kelsey et al. [24] stated that each of the 90B's predictors performs well only for a special distribution, as described in Section 2.2.1. To prove that our predictors have better applicability, the following four types of simulated datasets are generated, each of which is suited to one predictor employed in the final 90B.

(i) Time-Varying Sources. The probability distribution of the data source varies with time. The MCW predictor predicts the current output according to previous outputs within a short period of time, and thus the MCW predictor performs well on these data sources.

(ii) Periodic Sources. The data source changes periodically. The Lag predictor predicts the value that occurred N samples back in the sequence as the current output, and thus the Lag predictor performs well on sources with strong periodic behavior.

(iii) Markov Sources. The data sources can be modeled by a Markov model. The MultiMMC predictor predicts the current output according to the Markov model, and thus it performs well on data from any process that can be accurately modeled by a Markov model.

(iv) LZ78Y Sources. The data sources can be efficiently compressed by LZ78-like compression algorithms, to which the LZ78Y predictor applies well.

For each simulated source above, we generate a set of 10 simulated datasets, each of which contains 10^6 samples, and the min-entropy is estimated by our predictors and the 90B's predictors. The final result for a predictor is the average of the 10 estimated results corresponding to the 10 simulated datasets for one simulated source.

4.2.1. Time-Varying Sources. Firstly, we generate time-varying binary data that suit the statistical behaviors of the MCW predictor presented in the 90B. Table 7 shows the entropy estimation results for the time-varying data.

As shown in Table 7, the symbol gradual(x) (x ∈ [0, 1], the same below) denotes a simulated source in which the probability of outputting "0" changes gradually from x to 1 - x with time. The symbol period(x) denotes a simulated source in which the probability of outputting "0" changes periodically with time, and the probability varies from x to 1 - x in one period; the period length is set to 20% of the entire input dataset. The symbol sudden(x) denotes a simulated source in which the probability of outputting "0" changes suddenly with time; namely, the probability is set to x for the first half of the input dataset and 1 - x for the last half. (A simulation sketch for these sources follows below.)
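The simulated sources above can be produced along the following lines. This is a sketch under the stated definitions: the linear ramp used for gradual(x) and the cosine shape used for period(x) are assumptions about the exact functional form, and the function names are ours.

import numpy as np

def gradual(x, n):
    # Pr(output = 0) moves from x to 1 - x over the whole sequence.
    p0 = np.linspace(x, 1 - x, n)
    return (np.random.rand(n) >= p0).astype(int)

def period(x, n, period_frac=0.2):
    # Pr(output = 0) oscillates between x and 1 - x; one period covers 20% of the data.
    t = np.arange(n) / (n * period_frac)
    p0 = x + (1 - 2 * x) * 0.5 * (1 - np.cos(2 * np.pi * t))
    return (np.random.rand(n) >= p0).astype(int)

def sudden(x, n):
    # Pr(output = 0) is x for the first half and 1 - x for the second half.
    p0 = np.where(np.arange(n) < n // 2, x, 1 - x)
    return (np.random.rand(n) >= p0).astype(int)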

From Table 7, the estimation results for the MCW predictor and for our work are shown in bold font. We see that the MCW predictor gives the lowest and most accurate entropy estimations for the three types of time-varying data mentioned above, but it gives slight underestimates at gradual(0.2) and period(0.2). It is confirmed that the time-varying sources mentioned above match the statistical behaviors of the MCW predictor. Relatively, we find that our proposed predictive models are all capable of obtaining satisfactory entropy estimations that are close to the correct values. Therefore, it is proved that our proposed predictive models are suitable for the time-varying data mentioned above. Note that we calculate the min-entropy estimate on the entire dataset rather than on the last 20% of the input dataset for these time-varying sources, because the probability distribution is varying with time and a part of the input dataset cannot represent the overall distribution of the input dataset.

Table 5: Relative errors of the final estimations of the 90B's predictors and our predictors for five classes of simulated sources.

Simulated data class      90B's predictors (%)   Our predictors (%)
Uniform                   4.37                   1.53
Near-uniform              3.47                   1.59
Normal                    6.08                   1.57
Time-varying normal       3.47                   1.72
Markov                    14.65                  6.55


4.2.2. Periodic Sources. Secondly, we generate periodic data that suit the statistical behaviors of the Lag predictor presented in the 90B. Table 8 gives the entropy estimation results for the periodic sequences. The data source completely obeys the periodic rule, so the correct entropy is zero. The bit width of the samples is traversed from 2 to 8.

As shown in Table 8, the estimation results for the Lag predictor and for our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated periodic sources, we confirm that the Lag predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimates, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the entropy estimation of (strongly) periodic data. In addition, the MultiMMC predictor can also give accurate min-entropy estimations; this is reasonable because periodicity is also a form of correlation.

4.2.3. Markov Sources. Next, we generate multivariate M-sequences as Markov sources, which fit the statistical behaviors of the MultiMMC predictor. Specifically, the multivariate M-sequences are composed of multiple M-sequences with different initial states. Due to the determinacy of this type of sequence, the correct entropy is zero. The bit width of the samples is also traversed from 2 to 8. The maximum step of correlation used here is set to 8. Table 9 shows the estimated results for the multivariate M-sequences.

From Table 9, the estimation results for the MultiMMC predictor and for our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated Markov sources, we confirm that the MultiMMC predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimations, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the Markov sources.

4.2.4. LZ78Y Sources. Finally, we verify the applicability to LZ78Y sources. This type of entropy source is difficult to generate by simulation. However, we can still draw the conclusion that our proposed predictive models can be applied to LZ78Y sources according to the results marked in italic font in Tables 8 and 9, because the periodic data and the Markov sequences are compressible.

4.2.5. Summary on the Applicability Scope of Our Predictors. By analyzing the experimental results on the above four specific simulated sources, each of which is oriented towards a certain predictor in the 90B, we conclude that our predictors can provide accurate entropy estimates. So the proposed predictors apply to these entropy sources as well as the 90B's predictors do. In addition, compared with the 90B's predictors, our predictors have a better performance on the scope of applicability for testing datasets with long-range correlation, as presented in Section 4.1.1.

Table 7: Entropy estimates for time-varying data (90B's predictors: MCW, Lag, MultiMMC, LZ78Y; our work: FNN, RNN).

Data class      Correct   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
gradual(0.2)    0.6345    0.5290   0.7808   0.7240     0.7790   0.6288   0.6289
gradual(0.3)    0.7437    0.7378   0.9221   0.8416     0.9243   0.7430   0.7460
gradual(0.4)    0.8645    0.8631   0.9786   0.9518     0.9739   0.8648   0.8637
period(0.2)     0.6345    0.5537   0.7428   0.5537     0.7669   0.6205   0.6209
period(0.3)     0.7437    0.7393   0.9218   0.8476     0.9233   0.7377   0.7375
period(0.4)     0.8645    0.8639   0.9767   0.9632     0.9796   0.8653   0.8632
sudden(0.2)     0.3219    0.3203   0.4663   0.3386     0.4484   0.3217   0.3229
sudden(0.3)     0.5146    0.5110   0.5857   0.9984     0.7663   0.5110   0.5119
sudden(0.4)     0.7370    0.7338   0.8699   0.9984     0.9389   0.7339   0.7345

Table 8: Entropy estimates for periodic sequences (90B's predictors: MCW, Lag, MultiMMC, LZ78Y; our work: FNN, RNN).

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.6458   0.0000   0.0000     1.1817   0.0079   0.0000
3           2.3318   0.0000   0.0000     1.5957   0.1315   0.0000
4           2.9147   0.0000   0.0000     1.8016   0.4748   0.0000
5           3.3269   0.0000   0.0000     1.4586   0.8898   0.0000
6           3.9092   0.0000   0.0000     0.8322   3.4944   0.0000
7           4.4908   0.0000   0.0000     0.3973   3.4960   0.0000
8           4.4919   0.0000   0.0000     0.2027   3.5408   0.0000
*The results in italic font are used to analyze the applicability for the LZ78Y sources.

Table 6: Entropy estimates for real-world sources (90B's predictors: MCW, Lag, MultiMMC, LZ78Y; our work: FNN, RNN).

Real-world sources            MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
RANDOM.ORG                    0.9951   0.9963   0.9966     0.9976   0.9802   0.9954
Ubld.it TrueRNGpro            0.9979   0.9955   0.9973     0.9966   0.9934   0.9728
Linux kernel entropy source   0.6173   0.1232   0.1269     0.6164   0.1230   0.3068
Linux /dev/urandom            0.9952   0.9935   0.9990     0.9964   0.9983   0.9911
Windows RNG                   0.9953   0.9986   0.9975     0.9984   0.9833   0.9853


4.3. Comparison on Execution Efficiency. We implement our predictors and the final 90B's predictors using Python 3.6, and the version of TensorFlow is 1.3.11. All the following tests are conducted on a computer with an Intel Core i7 CPU and 32 GB RAM.

Table 10 shows the mean execution time of our predictors in comparison with that of the final 90B's predictors and the second draft of the 90B's predictors. Each experimental result in Table 10 is the average value obtained from 50 repeated experiments. Note that the definitions of the parameters n, s, and k are the same as in Section 3.3.2.

From the listed mean execution times with different scales (n, s) in Table 10, it can be seen that when n = 10^6, the mean execution time of our predictors is much lower, and increases more slowly with s, than that of the final 90B's predictors. In other words, the average execution efficiency of our predictors is about 7 to 10 times higher than that of the final 90B's predictors for different sample spaces s when the sample size n is 10^6. In particular, when n = 10^8, the mean execution time of the final 90B's predictors is far more than that of our predictors regardless of the size of the sample space, and it is too long (over three days) to calculate the estimated results in the cases s ≥ 2^2.

In terms of the execution efficiency of the 90B's predictors, we also find that the mean execution time of the final 90B's predictors is much higher than that of the second draft of the 90B's predictors; actually, the final 90B's mean execution time is about twice that of the second draft. This could be caused by the characteristics of some estimators, which are limited to binary inputs only. Because the collision estimator, Markov estimator, and compression estimator are only suitable for binary inputs (0 or 1), as stated in [23], for nonbinary inputs the 90B's estimators not only calculate the original symbol entropy but also convert the input into binary form to calculate the bit entropy and finally obtain the min-entropy. This greatly increases the mean execution time.

4.4. General Discussion. For most of the entropy sources that have been tested, the RNN gives more accurate estimations than the FNN. The better accuracy of the RNN predictive model may be due to the following reasons. On the one hand, the RNN adds feedback connections to the network, i.e., it considers not only the relationship between the current output and the previous observations but also the relationships among the previous observations. On the other hand, the RNN one-hot-encodes the training dataset, which is better for forecasting categorical data. On the contrary, for the Markov sources, the M-sequence, and the nonuniform distribution by postprocessing using LFSR, the current output is only related to the previous observations, which fits the FNN predictive model well, and thus the FNN provides more accurate estimated results.

5. Conclusions and Future Work

Entropy estimation provides a crucial evaluation of the security of RNGs. The predictor serves as a universal sanity check for entropy estimation. In this work, we provide several new approaches to estimate the min-entropy of entropy sources using predictors based on neural networks (i.e., FNN and RNN) for the first time. In particular, we design a novel scheme for the proposed entropy estimation based on neural network models, including the execution strategy and parameter settings. In order to evaluate the quality of the proposed predictors, we collect various types of simulated sources, stationary and nonstationary, whose correct entropy can be derived from the known probability distribution, and the theoretical results are further verified by experiments on real-world sources. We also compare our method with the predictors defined in NIST SP 800-90B (published in 2018), which is a commonly used standard for evaluating the validation of entropy sources. Our assessment experiments are carried out in three aspects, namely, accuracy, scope of applicability, and computational complexity. The experimental results demonstrate that the entropy estimations obtained from our proposed predictors are more accurate than those of the 90B's predictors, and our predictors have a remarkably wider scope of applicability.

Table 9: Entropy estimates for multivariate M-sequences (90B's predictors: MCW, Lag, MultiMMC, LZ78Y; our work: FNN, RNN).

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.9010   2.0000   0.0000     2.0000   0.0005   0.0000
3           2.9906   3.0000   0.0000     2.4940   0.0000   0.0000
4           3.4037   4.0000   0.0000     4.0000   0.0021   0.0000
5           4.9753   5.0000   0.0000     1.2269   0.0041   0.0000
6           5.3916   6.0000   0.0000     1.2905   0.0394   0.0000
7           5.3916   6.0000   0.0000     1.9881   0.0280   0.0000
8           7.0000   7.0000   0.0000     0.6611   0.8635   0.0000
*The results in italic font are used to analyze the applicability for the LZ78Y sources.

Table 10: Comparison of the execution efficiency of min-entropy estimation between our study and the 90B's predictors.

n      s      Final 90B (s)   Old 90B (s)   Our predictors (s)
10^6   2^1    921             561           136
10^6   2^2    1,058           525           138
10^6   2^3    1,109           574           149
10^6   2^4    1,235           598           174
10^6   2^5    1,394           630           190
10^6   2^6    1,683           785           186
10^6   2^7    2,077           938           264
10^6   2^8    2,618           1,298         272
10^8   2^1    52,274          47,936        9,184
10^8   2^2    —               —             9,309
10^8   2^3    —               —             9,385
10^8   2^4    —               —             9,836
10^8   2^5    —               —             10,986
10^8   2^6    —               —             13,303
10^8   2^7    —               —             17,649
10^8   2^8    —               —             20,759


In addition, the theoretical analysis shows that the computational complexity of our method is obviously lower than that of the 90B's as the sample space and sample size grow. The average execution efficiency of our predictors is about 7 to 10 times higher than that of the 90B's predictors for different sample spaces when the sample size is 10^6. Specially, the 90B's predictors cannot produce a result within an acceptable time, owing to the huge time complexity, when the sample space s is over 2^2 with the parameter of maximum step k = 16 and sample size n = 10^8; relatively, our method is able to provide a satisfactory result for entropy sources with large sample space and long dependence.

Future work aims at designing specific neural network predictive models for min-entropy estimation for specific entropy sources. Our future work will also focus on applying this new method to estimate entropy in more application areas, such as the randomness sources (sensors and other sources) in mobile terminals.

Data Availability

RANDOM.ORG data used to support the findings of this study can be accessed from https://www.random.org. Ubld.it TrueRNGpro, Linux kernel entropy source, Linux /dev/urandom, and Windows RNG data used to support the findings of this study can be obtained from the relevant listed references.

Disclosure

A preliminary version of this paper appeared under the title "Neural Network Based Min-entropy Estimation for Random Number Generators" in Proc. Security and Privacy in Communication Networks, 14th EAI International Conference, SecureComm 2018, Singapore, August 8-10, 2018 [36]. Dr. Jing Yang participated in this work when she studied in the Chinese Academy of Sciences, and she now works in the China Information Technology Security Evaluation Center, Beijing, China.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2018YFB0804300), the National Natural Science Foundation of China (Nos. 61872357 and 61802396), and the National Cryptography Development Fund of China (No. MMJJ20180113).

References

[1] I. Kanter, Y. Aviad, I. Reidler, E. Cohen, and M. Rosenbluh, "An optical ultrafast random bit generator," Nature Photonics, vol. 4, no. 1, pp. 58-61, 2010.

[2] P. Li, A. Wang, Y. Guo et al., "Ultrafast fully photonic random bit generator," Journal of Lightwave Technology, vol. 36, no. 12, pp. 2531-2540, 2018.

[3] P. Li, K. Li, X. Guo et al., "Parallel optical random bit generator," Optics Letters, vol. 44, no. 10, pp. 2446-2449, 2019.

[4] A. Uchida, K. Amano, M. Inoue et al., "Fast physical random bit generation with chaotic semiconductor lasers," Nature Photonics, vol. 2, no. 12, pp. 728-732, 2008.

[5] L. Dorrendorf, Z. Gutterman, and B. Pinkas, "Cryptanalysis of the random number generator of the Windows operating system," ACM Transactions on Information and System Security, vol. 13, no. 1, pp. 1-32, 2009.

[6] Z. Gutterman, B. Pinkas, and T. Reinman, "Analysis of the Linux random number generator," in Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P 2006), pp. 371-385, Berkeley, CA, USA, May 2006.

[7] M. Vanhoef and F. Piessens, "Predicting, decrypting, and abusing WPA2/802.11 group keys," in Proceedings of the 25th USENIX Security Symposium, pp. 673-688, Austin, TX, USA, August 2016.

[8] A. L. Rukhin, J. Soto, J. R. Nechvatal et al., SP 800-22 Rev. 1a: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, NIST Special Publication, Gaithersburg, MD, USA, 2010.

[9] W. Killmann and W. Schindler, AIS 31: Functionality Classes and Evaluation Methodology for True (Physical) Random Number Generators, Version 3.1, T-Systems GEI GmbH and Bundesamt für Sicherheit in der Informationstechnik (BSI), Bonn, Germany, 2001.

[10] G. Marsaglia, "The Marsaglia random number CDROM including the diehard battery of tests of randomness," 1996, http://www.stat.fsu.edu/pub/diehard.

[11] P. L'Ecuyer and R. J. Simard, "TestU01: a C library for empirical testing of random number generators," ACM Transactions on Mathematical Software, vol. 33, no. 4, 2007.

[12] ISO/IEC JTC 1/SC 27, ISO/IEC 18031: Information Technology—Security Techniques—Random Bit Generation, Berlin, Germany, 2011.

[13] E. Barker and J. Kelsey, "NIST draft special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2012, http://csrc.nist.gov/publications/drafts/800-90/draft-sp800-90b.pdf.

[14] M. Baudet, D. Lubicz, J. Micolod, and A. Tassiaux, "On the security of oscillator-based random number generators," Journal of Cryptology, vol. 24, no. 2, pp. 398-425, 2011.

[15] W. Killmann and W. Schindler, "A design for a physical RNG with robust entropy estimators," in Proceedings of the 10th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2008, pp. 146-163, Washington, DC, USA, August 2008.

[16] Y. Ma, J. Lin, T. Chen, C. Xu, Z. Liu, and J. Jing, "Entropy evaluation for oscillator-based true random number generators," in Proceedings of the 16th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2014, pp. 544-561, Busan, South Korea, September 2014.

[17] Y. Ma, J. Lin, and J. Jing, "On the entropy of oscillator-based true random number generators," in Proceedings of the Cryptographers' Track at the RSA Conference, pp. 165-180, Springer, San Francisco, CA, USA, February 2017.

[18] P. Li, J. Zhang, L. Sang et al., "Real-time online photonic random number generation," Optics Letters, vol. 42, no. 14, pp. 2699-2702, 2017.

[19] X. Ma, F. Xu, H. Xu, X. Tan, B. Qi, and H. K. Lo, "Postprocessing for quantum random-number generators: entropy evaluation and randomness extraction," Physical Review A, vol. 87, no. 6, pp. 062327-1-062327-10, 2013.

[20] K. Ugajin, Y. Terashima, K. Iwakawa et al., "Real-time fast physical random number generator with a photonic integrated circuit," Optics Express, vol. 25, no. 6, pp. 6511-6523, 2017.

[21] F. Xu, B. Qi, X. Ma, H. Xu, H. Zheng, and H.-K. Lo, "Ultrafast quantum random number generation based on quantum phase fluctuations," Optics Express, vol. 20, no. 11, pp. 12366-12377, 2012.

[22] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "(Second draft) NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2016, https://csrc.nist.gov/CSRC/media/Publications/sp800-90b/draft/documents/sp800-90b_second_draft.pdf.

[23] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2018, http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90B.pdf.

[24] J. Kelsey, K. A. McKay, and M. S. Turan, "Predictive models for min-entropy estimation," in Proceedings of the 17th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2015, pp. 373-392, Saint-Malo, France, September 2015.

[25] S. Aras and İ. D. Kocakoç, "A new model selection strategy in time series forecasting with artificial neural networks: IHTS," Neurocomputing, vol. 174, pp. 974-987, 2016.

[26] J. P. Donate, X. Li, G. G. Sanchez, and A. S. de Miguel, "Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm," Neural Computing and Applications, vol. 22, no. 1, pp. 11-20, 2013.

[27] J. C. Luna-Sanchez, E. Gomez-Ramírez, K. Najim, and E. Ikonen, "Forecasting time series with a logarithmic model for the polynomial artificial neural networks," in Proceedings of the 2011 International Joint Conference on Neural Networks, IJCNN 2011, pp. 2725-2732, San Jose, CA, USA, 2011.

[28] C. de Groot and D. Wurtz, "Analysis of univariate time series with connectionist nets: a case study of two classical examples," Neurocomputing, vol. 3, no. 4, pp. 177-192, 1991.

[29] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[30] X. Cai, N. Zhang, G. K. Venayagamoorthy, and D. C. Wunsch II, "Time series prediction with recurrent neural networks trained by a hybrid PSO-EA algorithm," Neurocomputing, vol. 70, no. 13-15, pp. 2342-2353, 2007.

[31] A. Jain and A. M. Kumar, "Hybrid neural network models for hydrologic time series forecasting," Applied Soft Computing, vol. 7, no. 2, pp. 585-592, 2007.

[32] J. M. P. Menezes Jr. and G. A. Barreto, "Long-term time series prediction with the NARX network: an empirical evaluation," Neurocomputing, vol. 71, no. 16-18, pp. 3335-3343, 2008.

[33] P. Hagerty and T. Draper, "Entropy bounds and statistical tests," 2012, https://csrc.nist.gov/csrc/media/events/random-bit-generation-workshop-2012/documents/hagerty_entropy_paper.pdf.

[34] S. Zhu, Y. Ma, T. Chen, J. Lin, and J. Jing, "Analysis and improvement of entropy estimators in NIST SP 800-90B for non-IID entropy sources," IACR Transactions on Symmetric Cryptology, no. 3, pp. 151-168, 2017.

[35] A. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, USA, 1996.

[36] J. Yang, S. Zhu, T. Chen, Y. Ma, N. Lv, and J. Lin, "Neural network based min-entropy estimation for random number generators," in Proceedings of the 14th International Conference on Security and Privacy in Communication Networks—SecureComm, pp. 231-250, Singapore, August 2018.


stationary and nonstationary sequences generatedby different entropy sources

(ii) We conduct a series of experiments to verify theaccuracy of our predictors by using many typicalsimulated sources where the theoretical entropy canbe obtained from the known probability distribu-tion Additionally the computational complexityare evaluated theoreticallye results show that ourapproaches enable the entropy to be estimated withan error of less than 655 and the error is up to1465 for 90Brsquos predictors e time complexity ofour estimation is a linear relationship with samplespace which is high-order linear relationship withsample space for the 90B

(iii) We experimentally compare the advantages of ourpredictors over 90Brsquos predictors on accuracy ap-plication scope and execution efficiency e ex-perimental datasets include several typical real-world data sources and various simulated datasetse experimental results indicate our predictorshave higher accuracy execution efficiency andwider scope of applicability than those of 90Brsquospredictors Furthermore when the test sample spaceand sample size are continuously growing the ex-ecution efficiency of 90Brsquos predictors becomes toolow to estimate the entropy within an acceptabletime interval while our proposed predictors can stillcalculate the estimated results efficiently

e rest of the paper is organized as follows In Section 2we introduce fundamental definitions about min-entropythe evolution of the 90B and estimators especially predictorsdefined in this criterion and two typical neural networks wechoose to support our research In Section 3 we propose twopredictors based on neural networks for min-entropy esti-mation design an execution strategy and give the accuracyverification and complexity analysis of our predictorsFurthermore we apply our predictors to different types ofsimulated and real-world data sources and compare theadvantages of our predictors over 90Brsquos predictors on ac-curacy application scope and execution efficiency in Section4 We finally conclude our work in Section 5

2 Preliminaries

In this section firstly we introduce the fundamental conceptof min-entropy which is the core mathematical thought andassessment method in our work en we introduce theevolution process of the 90B and relevant research work onthis criterion After that we introduce the estimators definedin the 90B especially the predictive model-based estimatorswhich are the focus of this paper At last we describe twopredictive models based on neural networks which apply totime series forecasting and contribute to the design of newpredictors in our work

21 Min-Entropy of Entropy Source e concept of entropyis the core mathematical thought of the 90B and min-

entropy is the assessment method which is a conservativeway to ensure the quality of random numbers in the worstcase for some high-security applications such as the seed ofPRNGs e 90B [22] gives the definition of min-entropythe min-entropy of an independent discrete random variableX that takes values from the set A x1 x2 xk1113864 1113865 (k isin Zlowastdenotes the size of sample space) with the probabilityPr(X xi) pi(i 1 2 k) e min-entropy of theoutput is

Hmin min1leilek

minus log2pi( 1113857

minus log2 max1leilek

pi( 11138571113888 1113889(1)

IfX hasmin-entropyH then the probability of observingany particular value for X is no greater than 2minus H emaximum possible value for the min-entropy of a randomvariable with k distinct values is log2(k) which is attainedwhen the random variable has a uniform probability dis-tribution namely p1 p2 middot middot middot pk 1k

For the non-IID source such as Markov process Turanet al provided a calculation method of min-entropy in [22]A stochastic process Xi1113864 1113865iisinN that takes values from the finitesetA defined above is known as a first-orderMarkov chain if

Pr Xm+1 xm+11113868111386811138681113868 Xm xm Xmminus 1 xmminus 1 X0 x01113872 1113873

Pr Xm+1 xm+11113868111386811138681113868 Xm xm1113872 1113873

(2)

for any m isin Zlowast and all x0 x1 xm xm+1 isin A In adth-order Markov process the transition probabilities havethe property that

Pr Xm+1 xm+11113868111386811138681113868 Xm xm Xmminus 1 xmminus 1 X0 x01113872 1113873

Pr Xm+1 xm+11113868111386811138681113868 Xm xm Xmminus d+1 xmminus d+11113872 1113873

(3)

e initial probabilities of the process arepi Pr(X0 i) and the transition probabilities arepij Pr(Xm+1 j | Xm i) e min-entropy of a Markovprocess of length L is defined as

Hmin minus log2 maxx1 xL

px11113945

L

j1pxjminus 1xj

⎛⎝ ⎞⎠ (4)

e approximate value of min-entropy per sample can beobtained by dividing Hmin by L

22 NIST SP 800-90B and Its Entropy Estimation e 90B isa typical case that evaluates the quality of the entropy sourcefrom the perspective of min-entropy e evolution processof 90Bmainly includes the following three stages Comparedwith the second draft the final version in January 2018 hasmade some corrections

(i) e first draft of 90B [13] was published in August2012 which included five estimators proposed byHagerty and Draper [33] ese estimators are

Security and Communication Networks 3

collision test partial collection test Markov testcompression test and frequency test which aresuitable for sources that do not necessarily satisfythe IID assumption But these estimators give sig-nificant underestimates which were found by Kelseyet al [24] through experiments

(ii) Subsequently the 90B was updated to the seconddraft [22] in January 2016 and the estimators basedon predictors which were proposed by Kelsey et alfor the first time were adopted Compared with thefirst draft the second draft has the following mainchanges (1) Among these estimators the partialcollection test in the first draft was deleted and thefrequency test in the first draft was replaced by mostcommon value estimator and two new estimatorsincluding t-tuple estimator and longest repeatedsubstring estimator were added (2) e secondimportant update was the addition of four predic-tors for entropy estimation However the under-estimation problem was not solved in the seconddraft Zhu et al [34] proved the underestimationproblem for non-IID data from theoretical analysisand experimental validations and proposed animproved method

(iii) e final official version of 90B [23] was publishedin January 2018 In the final 90B estimators withsignificant underestimates such as collision esti-mator and compression estimator are modified tobe limited to only for binary inputs which mayreduce the overall execution efficiency of min-en-tropy estimation for nonbinary inputs In additionthe calculation process and method of key variablesfor min-entropy estimation are also corrected suchas Pglobalprime min-entropy of each predictor

221 Execution Strategy of Min-Entropy Estimation in 90Be 90B takes the following strategy to estimate the min-entropy It first checks whether the tested datasets are IID ornot On the one hand if the tested datasets are non-IID thereare ten estimators as mentioned in Section 222 for entropyestimation Each estimator calculates its own estimation in-dependently then among all estimations theminimum one isselected as the final estimation result for the entropy sourceOn the other hand if the tested datasets are considered IIDonly the most common value estimator is employed Finallyit applies restart tests and gives the entropy estimation

Note that this article only focuses on the analysis andcomparison with the 90Brsquos four predictors and the researchon other parts of the 90B is not considered in this study

222 Estimators in 90B In the final NIST SP 800-90B thereare ten estimators and each estimator has its own specificcharacteristics According to the underlying methods theyemployed we divide these estimators into three classesfrequency-based type entropy statistic based type andpredictors based typee following is a brief introduction ofthe ten estimators and the details can be found in [23]

(1) Predictor-Based Type Estimators e followings are fourpredictors proposed by Kelsey et al [24] for entropy esti-mation for the first time Kelsey et al utilized several ma-chine learning models served as predictors to improve theaccuracy of entropy estimation But these predictors per-form well only for specific distributions

(i) Multi Most Common in Window (MultiMCW) Predictor. This predictor performs well in cases where there is a clear most common value, but that value varies over time.

(ii) Lag Predictor. The Lag subpredictor predicts the value that occurred N samples back in the sequence. This predictor performs well on sources with strong periodic behavior if N is close to the period.

(iii) Multi Markov Model with Counting (MultiMMC) Predictor. The MultiMMC subpredictor predicts the most common value that followed the previous N-sample string. The range of the parameter N is set from 1 to 16. This predictor performs well on data from any process that can be accurately modeled by an Nth-order Markov model.

(iv) LZ78Y Predictor. This predictor performs well on the sort of data that would be efficiently compressed by LZ78-like compression algorithms.

(2) Other Estimators. Four frequency-based type estimators and two entropy statistic-based type estimators are described as follows. The min-entropy of the former type of estimators is calculated according to the probability of the most-likely output value, and that of the latter is based on the entropic statistics presented by Hagerty and Draper [33]. Among them, three estimators, including the Markov estimator, collision estimator, and compression estimator, explicitly state that they only apply to binary inputs in the final 90B published in 2018, which may reduce the execution efficiency for nonbinary inputs, as proved through experiments in Section 4.3.

(i) Most Common Value Estimate. This estimator calculates entropy based on the number of occurrences of the most common value in the input dataset and then constructs a confidence interval for this proportion. The upper bound of the confidence interval is used to estimate the min-entropy per sample of the source.

(ii) Markov Estimate. This estimator computes entropy by modeling the noise source outputs as a first-order Markov model. The Markov estimate provides a min-entropy estimate by measuring the dependencies between consecutive values from the input dataset. This method is only applied to binary inputs.

(iii) T-Tuple Estimate. This method examines the frequency of t-tuples (pairs, triples, etc.) that appear in the input dataset and produces an estimate of the entropy per sample based on the frequency of those t-tuples.


(iv) Longest Repeated Substring Estimate (LRS Estimate). This method estimates the collision entropy (sampling without replacement) of the source based on the number of repeated substrings (tuples) within the input dataset.

(v) Collision Estimate. This estimator calculates entropy based on the mean number of samples needed to see the first collision in a dataset, where a collision is any repeated value. This method is only applied to binary inputs.

(vi) Compression Estimate. This estimator calculates entropy based on how much the tested data can be compressed. This method is also only applied to binary inputs.

2.2.3. Min-Entropy Estimation of 90B's Predictors. Each predictor in the 90B attempts to predict the next sample in a sequence according to a certain statistical property of the previous samples and provides an estimated result based on the probability of successful prediction. Every predictor consists of a set of subpredictors and chooses the subpredictor with the highest rate of successful predictions to predict the subsequent output. Each predictor calculates the global predictability and local predictability with the upper bound of the 99% confidence interval and then derives the global and the local entropy estimations, respectively. Finally, the final entropy estimation for this predictor is the minimum of the global and the local entropy estimations.

For estimating the entropy of a given entropy source, each predictor offers a predicted result after testing the outputs produced by the source and provides an entropy estimation based on the probability of successful predictions. After obtaining the estimations from the predictors, the minimum estimation of all the predictors is taken as the final entropy estimation of the entropy source.
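As an illustration of this calculation, the following is a minimal Python sketch of the global-predictability part of such an estimate. The constant 2.576 is the z-value for the upper bound of a 99% confidence interval; the local-predictability bound and the iterative calculation in the standard are omitted here, and the function and variable names are ours, not the standard's.

import math

def global_min_entropy(correct, n, num_symbols, alpha=0.01):
    """Upper-bound the global prediction probability with a 99% confidence
    interval and convert it to a min-entropy estimate (local bound omitted)."""
    p_global = correct / n
    if correct == 0:
        p_prime = 1.0 - alpha ** (1.0 / n)       # bound when no prediction succeeded
    else:
        p_prime = min(1.0, p_global + 2.576 * math.sqrt(p_global * (1 - p_global) / (n - 1)))
    p_max = max(p_prime, 1.0 / num_symbols)      # never below the uniform guessing rate
    return -math.log2(p_max)

# Example: 5,300 correct predictions out of 10,000 on a binary source.
h = global_min_entropy(5300, 10000, num_symbols=2)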

The entropy estimation will be too loose if no predictor is applied to detect the predictable behaviors. But if a set of predictors with different approaches is applied, they can guarantee that the predictor which is the most effective at predicting the entropy source's outputs determines the entropy estimation.

2.3. Two Predictive Models Based on Neural Networks. Next, we introduce the two main neural network models that help us design predictors for entropy estimation: feedforward neural networks (FNNs) and recurrent neural networks (RNNs), respectively.

2.3.1. FNNs. The goal of a feedforward network is to approximate some function f*. For instance, a classifier Y = f*(X) maps an input X to a category Y. A feedforward network describes a mapping Y = f(X; θ) and learns the value of the parameters θ that result in the best function approximation. The principle of the FNN is depicted in Figure 1.

For each time step from t = 1 to t = n (n ∈ Z* denotes the sample size), the FNN applies the following forward propagation equations:

$H_t = f(b + W X_t)$,
$Y_t = c + V H_t$,
$\hat{Y}_t = g(Y_t)$.  (5)

The parameters and functions that govern the computation happening in an FNN are described as follows:

(i) X_t is the input at time step t and is a vector composed of the previous inputs (i.e., $X_t = [X_{t-k}, \ldots, X_{t-1}]$, where k refers to the step of memory).

(ii) H_t is the hidden state at time step t, where the bias vector b and the input-to-hidden weights W are derived via training. The number of hidden layers and the number of hidden nodes per layer are defined before training; these are called hyperparameters in neural networks.

(iii) Y_t is the output at step t, where the bias vector c and the hidden-to-output weights V are derived via training.

(iv) $\hat{Y}_t$ is our predictive output at time step t, which would be a vector of probabilities across our sample space.

(v) The function f(·) is a fixed nonlinear function called the activation function, and the function g(·) is an output function used in the final layer of a neural network. Both functions belong to the hyperparameters, which are defined before training (Section 3.2).

The models are called feedforward because the information flows through the approximate function from the input X_t, through the internal computations used to define f(·), and finally to the output $\hat{Y}_t$. Besides, there are no feedback connections; namely, the outputs of the model are not fed back into itself.
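To make the forward propagation in equation (5) concrete, the following is a minimal NumPy sketch with a single hidden layer; the dimensions (20 previous samples, 10 hidden nodes, a binary sample space) are illustrative and not the paper's exact configuration.

import numpy as np

def softmax(z):
    z = z - z.max()                  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def fnn_forward(x_window, W, b, V, c):
    """Equation (5): H_t = tanh(b + W X_t), Y_t = c + V H_t, Y_hat = softmax(Y_t)."""
    h = np.tanh(b + W @ x_window)    # hidden state H_t
    y = c + V @ h                    # pre-activation output Y_t
    return softmax(y)                # probability vector over the sample space

# Illustrative usage on a random binary window of the previous k = 20 samples.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(10, 20)), np.zeros(10)
V, c = rng.normal(size=(2, 10)), np.zeros(2)
probs = fnn_forward(rng.integers(0, 2, size=20).astype(float), W, b, V, c)
prediction = probs.argmax()          # predicted next sample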

2.3.2. RNNs. If feedback connections are added to the network, it is called an RNN. In particular, the RNN records the information that has been calculated so far and uses it for the calculation of the present output. The principle of RNNs is depicted in Figure 2.

For each time step from t = 1 to t = n, the RNN applies the following forward propagation equations:

$H_t = f(b + W H_{t-1} + U X_t)$,
$Y_t = c + V H_t$,
$\hat{Y}_t = g(Y_t)$.  (6)

The parameters and functions that govern the computation happening in an RNN are described as follows:

Figure 1: Feedforward neural network (input layer X_t, hidden layer H_t, output layer Y_t).


(i) X_t is the input at time step t and is a one-hot vector. For example, if X_t = 1 and the sample space S = {0, 1}, then X_t = [0, 1].

(ii) H_t is the hidden state at time step t. It is the "memory" of the network. H_t is calculated based on the previous hidden state H_{t-1} and the input at the current step X_t. b, U, and W denote the bias vector, the input-to-hidden weights, and the hidden-to-hidden connection into the RNN cell, respectively.

(iii) Y_t is the output at step t. c and V denote the bias vector and the hidden-to-output weights, respectively.

(iv) $\hat{Y}_t$ is our predictive output at time step t, which would be a vector of probabilities across our sample space.

(v) Similarly, the function f(·) is an activation function and g(·) is an output function, both of which are defined before training (Section 3.2).
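The following is a minimal NumPy sketch of the recurrence in equation (6) over a sequence of one-hot inputs; the hidden size of 20 and the binary sample space are illustrative assumptions, not the paper's exact configuration.

import numpy as np

def rnn_forward(x_onehots, U, W, V, b, c):
    """Equation (6): H_t = relu(b + W H_{t-1} + U X_t), Y_t = c + V H_t,
    Y_hat_t = softmax(Y_t); returns one probability vector per time step."""
    h = np.zeros(W.shape[0])                       # initial hidden state H_0
    outputs = []
    for x in x_onehots:
        h = np.maximum(0.0, b + W @ h + U @ x)     # relu activation
        y = c + V @ h
        e = np.exp(y - y.max())
        outputs.append(e / e.sum())                # probability over the sample space
    return outputs

# Illustrative usage: 20 hidden units over a binary (one-hot of size 2) sample space.
rng = np.random.default_rng(0)
U, W, V = rng.normal(size=(20, 2)), rng.normal(size=(20, 20)), rng.normal(size=(2, 20))
b, c = np.zeros(20), np.zeros(2)
onehots = [np.eye(2)[bit] for bit in [0, 1, 1, 0, 1]]
probs = rnn_forward(onehots, U, W, V, b, c)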

3. Predictors for Min-Entropy Estimation Based on Neural Network

The neural network is able to approximate various PDFs, and the complexity of the neural network increases more slowly (linearly) as the sample space increases. Motivated by [24], we propose two predictive models based on neural networks for min-entropy estimation. Next, we present the execution strategy of our min-entropy estimators, provide the choices of the important hyperparameters, and give an analysis of the accuracy and complexity of our predictive models to prove that our design is feasible.

3.1. Strategy of Our Predictors for Min-Entropy Estimation. The execution strategy of our min-entropy estimator is depicted in Figure 3, which consists of model training and entropy estimation. Both of our proposed predictive models (namely, predictors), which are based on FNN and RNN, respectively, follow the same strategy.

The benefit of this strategy is that it applies not only to stationary sequences generated by entropy sources but also to nonstationary sequences of which the probability distribution is time-varying. On the one hand, in order that the model can match the statistical behavior of the data source well, we use the whole input dataset to train and continuously update the model. On the other hand, to effectively estimate the entropy of the data source, we use the predictive model to compute the min-entropy only when the predictive model is updated enough to characterize the statistical behavior of the tested dataset. Specifically, for the testing dataset, which is used for computing the entropy estimation, we preset the testing dataset as a part of the whole observations and utilize a proportion parameter (γ ∈ [0, 1]) to determine its size; namely, the last γ of the inputs are used for computing entropy while the model is also updating.

The workflow of the min-entropy estimator based on neural networks is listed as the following steps:

(1) Initialization: choose one model (FNN or RNN) and set the hyperparameters of the model.

(2) Input data: input the tested dataset and judge the proportion of the remaining data in the entire tested dataset. If the proportion of the remaining observations ≤ γ, record the accuracy for predicting the next sample to be observed; else, continue.

(3) Prediction: predict the current output according to the forward propagation equations.

(4) Comparison and loss function: observe the real output and compare it to the predicted value. Then compute the loss function.

(5) Update the predictive model: compute the gradient and update the predictive model. If the entire dataset runs out, turn to Step 6; else, repeat Step 2 ~ Step 5.

(6) Calculate the predictive accuracy and probability: obtain the accuracy of the executed predictive model from the last γ observations, including global predictions and local predictions, and compute the probability, respectively.

(7) Calculate the min-entropy: calculate the min-entropy from the obtained probability. After obtaining the estimations from the two predictors (FNN and RNN), respectively, the minimum entropy of the two predictors is taken as the final entropy estimation (namely, min-entropy) of the tested dataset.
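A simplified Python sketch of this workflow is given below. The structure and names are illustrative: `model` is assumed to expose `predict(window)` and `update(window, target)` methods, and the local-predictability bound and the confidence-interval adjustment used by the 90B are omitted.

import numpy as np

def estimate_min_entropy(samples, model, gamma=0.2, k=20, num_symbols=2):
    """Train the model online over the whole sequence, record prediction accuracy
    on the last gamma fraction, and convert the global accuracy to min-entropy."""
    n = len(samples)
    start_scoring = int(n * (1.0 - gamma))
    correct = total = 0
    for t in range(k, n):
        window = samples[t - k:t]
        predicted = model.predict(window)          # step (3)
        if t >= start_scoring:                     # steps (2)/(6): score the last gamma part
            total += 1
            correct += int(predicted == samples[t])
        model.update(window, samples[t])           # steps (4)/(5): loss + gradient update
    p_global = max(correct / max(total, 1), 1.0 / num_symbols)
    return -np.log2(p_global)                      # step (7)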

Figure 3: Execution strategy of our predictors for min-entropy estimation based on neural network.

Figure 2: Recurrent neural network (input layer X_t, hidden layer H_t with recurrent connection from H_{t-1}, output layer Y_t).


Combining the calculation principle of min-entropy in Section 2.1, we can see that the lower bound on the probability of making a correct prediction gives an upper bound on the entropy of the source. In other words, the more predictable a source is, the larger the probability of making correct predictions is and the less entropy it has. Therefore, a model that is a bad fit for the source or not fully trained will result in inaccurate predictions, a low prediction probability, and a too-high entropy estimation of the source. So models that are a bad fit for the source or not fully trained can give big overestimates, but not underestimates.

Further, we can confirm that adding one more predictor will not do any harm and, conversely, will make the entropy estimation much more accurate. From the execution strategy, we can see that if all the predictors whose models are not matched to the noise source are used alongside a predictor whose underlying model matches the source's behavior well, then the predictor which matches the source well will determine the final entropy estimation.

3.2. Choices of Important Parameters

3.2.1. Hyperparameters for FNN and RNN. In neural networks, the choices of the models' hyperparameters have significant influences on the computational resources and the performance of training and testing. Therefore, the choices of hyperparameters are crucial to neural networks. Next, we illustrate the choices of some key hyperparameters.

(1) Hidden Layers and Nodes. To comprehensively balance the accuracy and efficiency of our predictors, in this paper, for the FNN model, except for the multivariate M-sequences, we set the number of hidden layers as 2, and the numbers of hidden nodes per layer are 10 and 5, respectively. For the multivariate M-sequences, after extensive tests, the number of hidden nodes per layer has to be larger to give better results; by observing the results, we finally set the numbers as 35 and 30, respectively.

(2) Step of Memory. The step of memory determines the number of previous samples used for predicting the current output. Generally speaking, the larger the value, the better the performance. However, the computational resources (memory and runtime) increase as the step of memory grows. In this paper, we set the step of memory as 20 by trading off performance and resources. That is to say, for the FNN, the input at time step t is the previous 20 observed values, and for the RNN, the hidden layer contains 20 unfolded hidden units.
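The following small sketch shows how a sample sequence can be turned into training pairs under a step of memory of k = 20; the function name and the example sequence are illustrative.

import numpy as np

def build_windows(samples, k=20):
    """Produce (X, y) pairs where each X_t is the previous k observations and
    y_t is the next sample to predict; k = 20 matches the chosen step of memory."""
    X, y = [], []
    for t in range(k, len(samples)):
        X.append(samples[t - k:t])
        y.append(samples[t])
    return np.asarray(X, dtype=float), np.asarray(y)

# Example: sliding windows over a short binary sequence.
X, y = build_windows(np.array([0, 1, 1, 0, 1] * 20), k=20)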

(3) Loss Function. The loss function refers to the function that measures the difference between the predicted values and the true values. The total loss for a given sequence of $x = \{x_1, \ldots, x_n\}$ values paired with a sequence of $y = \{y_1, \ldots, y_n\}$ values would then be just the sum of the losses over all the time steps. For example, if $L_t$ is the negative log-likelihood of $y_t$ given $x_1, \ldots, x_t$, then

$L(\{x_1, \ldots, x_n\}, \{y_1, \ldots, y_n\}) = \sum_t L_t = -\sum_t \log_2 p_{\mathrm{model}}(y_t \mid x_1, \ldots, x_t)$,  (7)

where $p_{\mathrm{model}}(y_t \mid x_1, \ldots, x_t)$ is given by reading the entry for $y_t$ from the model's output vector $\hat{y}_t$. The models are trained to minimize the cross-entropy between the training data and the models' predictions (i.e., equation (7)), which is equivalent to minimizing the mean squared error (i.e., the average of the squares of the errors or deviations).
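A minimal sketch of evaluating equation (7) on a sequence of model outputs is given below; the function name is ours and the small epsilon is an assumption added to avoid taking the logarithm of zero.

import numpy as np

def cross_entropy_loss(pred_probs, true_labels):
    """Negative log-likelihood (equation (7)) summed over time steps.
    pred_probs  -- per-step probability vectors \hat{y}_t
    true_labels -- the observed symbols y_t as integer indices"""
    eps = 1e-12                       # avoid log(0)
    return -sum(np.log2(pred_probs[t][y] + eps) for t, y in enumerate(true_labels))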

(4) Learning Rate. The learning rate is a positive scalar determining the size of the step. To control the effective capacity of the model, we need to set the value of the learning rate in an appropriate range. The learning rate determines how fast the parameter θ moves to its optimal value. If the learning rate is too large, gradient descent can inadvertently increase rather than decrease the training error; namely, the parameters are likely to cross the optimal value. However, if the learning rate is too small, the training is not only slower but may become permanently stuck with a high training error. So the learning rate is crucial to the performance of the model.

Based on the above analysis, we pick the learning rate approximately on a logarithmic scale, i.e., the learning rate is taken within the set {0.1, 0.01, 10^-3, 10^-4, 10^-5}. At the beginning of model training, we set the learning rate to a larger value to reach the optimal value faster. Then, as training proceeds, we set a smaller value so as not to cross the optimal value. The detailed settings are described in Algorithm 1.

(5) Activation Function. In general, we must use a nonlinear function to describe the features. Most neural networks do so using an affine transformation controlled by learned parameters, followed by a fixed nonlinear function called an activation function. The activation function plays an important role in neural networks. The commonly used activation functions include tanh(·), relu(·), and the sigmoid function, which is defined as σ(·) (i.e., equation (8)) in this paper. Because the sigmoid function is easy to saturate, which causes the gradient to change slowly during training, it is generally no longer used as an activation function except in RNN-LSTM (long short-term memory) networks.

$\sigma(x) = \dfrac{1}{1 + e^{-x}}$.  (8)

After many attempts (i.e., we compared the efficiency and performance by means of a manual exhaustive search), we finally chose tanh(·) and relu(·) as the activation functions for the FNN and the RNN, respectively. They can be expressed as

$\tanh(x) = \dfrac{1 - e^{-2x}}{1 + e^{-2x}}$.  (9)

Compared with σ(·), tanh(·) is symmetrical about the origin. In some cases, this symmetry can give better


performance. It compresses the real-valued input to a range of −1 to 1, and the mean of its output is zero, which makes it converge faster than σ(·) and reduces the number of iterations. Therefore, it is suitable as an activation function, and the zero-centered training data contribute to the convergence speed of model training.

$\mathrm{relu}(x) = \max(0, x)$,  (10)

where relu(·) is currently a popular activation function. It is piecewise linear and obtains the activation value with only one threshold. We choose this function based on the following two considerations. On the one hand, it alleviates the vanishing gradient problem of back propagation through time (BPTT) algorithms, for the reason that the derivative of relu(·) is 1 for positive inputs. On the other hand, it greatly improves the speed of calculation because it only needs to judge whether the input is greater than 0.

(6) Output Function. The output function is used in the final layer of a neural network model. The predictors for time series are considered as a solution to a multiclass classification problem, so we take softmax(·) as the output function, which can be expressed as

$y_i = \mathrm{softmax}(z_i) = \dfrac{e^{z_i}}{\sum_{i=1}^{s} e^{z_i}}$,  (11)

where s is the size of the sample space, and $\mathrm{softmax}(z_i)$ denotes the probability that the output is $z_i$, satisfying $\sum_{i=1}^{s} y_i = 1$, i.e., the sum of the probabilities of all the outputs is equal to 1. Such networks are commonly trained under a cross-entropy regime (i.e., the loss function mentioned above).

3.2.2. Selection of Testing Dataset Length. To better estimate the entropy of the data source, the length of the testing dataset is very important for min-entropy estimation of random numbers generated by different types of sources. In reality, most entropy sources are time-varying (namely, nonstationary), which means the probability distribution of the output sequences from the source changes over time. So the length of the testing dataset shall be adaptive to the type of the source.

Therefore, as described in Section 3.1, we utilize γ to determine the size of the testing dataset. Specifically, in our strategy, for a stationary entropy source, of which the probability distribution of the outputs is not changing over time, the parameter γ is preset to 20%.

In contrast, for a nonstationary entropy source, all observation points (namely, γ is 100%) need to serve as the testing dataset.

To verify the reasonableness of the γ value, we compute the root-mean-squared error (RMSE) of the lowest estimations of our predictors over 80 sequences from the following simulated datasets generated by a nonstationary source.

(i) Time-Varying Normal Distribution Rounded to Integers. The samples are subject to a normal distribution and rounded to integer values, but the mean of the distribution moves along a sine curve to simulate a time-varying signal.

The RMSE, i.e., $\sqrt{(1/N)\sum_{i=1}^{N}(\hat{H}_{\min} - H_{\min})^2}$, refers to the arithmetic square root of the mean of the squares of the errors or deviations for each class of simulated sources. Note that here N indicates the number of test samples, $\hat{H}_{\min}$ indicates the estimated result for each sample, and $H_{\min}$ means the theoretical result for each sample. In other words, the smaller the RMSE is, the closer the estimated result is to the theoretical entropy, which indicates the predictor has better accuracy.
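For completeness, a one-function sketch of this error measure:

import numpy as np

def rmse(estimates, theoretical):
    """Root-mean-squared error between estimated and theoretical min-entropy values."""
    estimates = np.asarray(estimates, dtype=float)
    theoretical = np.asarray(theoretical, dtype=float)
    return np.sqrt(np.mean((estimates - theoretical) ** 2))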

As shown in Table 1, for the time-varying data source, only when γ is 100% (namely, the entire dataset is used for min-entropy estimation) can the predictors give the most accurate results. This means that, when the probability distribution of the data source varies with time, part of the input dataset cannot represent the overall distribution of the input dataset, so it cannot accurately give the estimation result for the entire input dataset. Besides, for stationary sources, it is reasonable that γ is preset to 20%, because the estimated results obtained by our method are very close to the correct (theoretical) entropy of the selected entropy sources, as presented in Section 4.1.

3.3. Evaluation on Our Predictors. In this section, we conduct some experiments on simulated datasets to verify the accuracy of our proposed predictors for min-entropy estimation and compare the experimental results with the theoretical results. In addition, we give a theoretical analysis of the complexity of our predictors. Note that, in Section 4, we will apply our predictors to different data sources and provide a comparison of our predictors with 90B's predictors.

3.3.1. Accuracy Verification. We train our predictive models FNN and RNN on a number of representative simulated data sources (including stationary and nonstationary entropy sources), of which the theoretical entropy can be obtained from the known probability distribution of the outputs. Simulated datasets are produced using the following distribution families adopted in [24].

(1) Simulated Datasets Generated by Stationary Sources

(i) Discrete Uniform Distribution. The samples are equally likely, and they come from an IID source.

(1) if train_num < train_dataset_size/3 then
(2)     learning_rate ← {0.1, 0.01}
(3) else if train_num < train_dataset_size/1.5 then
(4)     learning_rate ← {0.01, 10^-3, 10^-4}
(5) else
(6)     learning_rate ← {10^-4, 10^-5}
(7) end if

Algorithm 1: Setting of the learning rate.
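A direct Python rendering of the schedule in Algorithm 1 is shown below; the two thresholds (one-third and two-thirds of the training set) are assumptions reconstructed from the garbled listing.

def learning_rate_schedule(train_num, train_dataset_size):
    """Piecewise learning-rate sets following Algorithm 1 (thresholds assumed)."""
    if train_num < train_dataset_size / 3:
        return [0.1, 0.01]
    elif train_num < 2 * train_dataset_size / 3:
        return [0.01, 1e-3, 1e-4]
    else:
        return [1e-4, 1e-5]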


(ii) Discrete Near-Uniform Distribution. All samples are equally likely except one, and they come from an IID source. A certain sample has a higher probability than the rest.

(iii) Normal Distribution Rounded to Integers. The samples are subject to a normal distribution and rounded to integer values, and they come from an IID source.

(iv) Markov Model. The samples are generated using a dth-order Markov model, and they come from a non-IID source.

(2) Simulated Datasets Generated by Nonstationary Sources. These datasets are the same as those used in Section 3.2.2.

For every class listed above, we generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy by using the predictive models FNN and RNN, respectively. For each dataset, the theoretical min-entropy $H_{\min}$ is derived from the known probability distribution.
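As a rough illustration of how such simulated data can be produced, the following sketch generates a near-uniform source and a first-order Markov source; the specific parameter values are illustrative and not the paper's exact settings.

import numpy as np

rng = np.random.default_rng(1)

def near_uniform(n, size, p_max):
    """Near-uniform source: one symbol has probability p_max, the rest share 1 - p_max.
    Its theoretical min-entropy is -log2(p_max) when p_max >= 1/size."""
    probs = np.full(size, (1.0 - p_max) / (size - 1))
    probs[0] = p_max
    return rng.choice(size, size=n, p=probs)

def markov_first_order(n, trans):
    """First-order Markov source defined by a transition matrix `trans` (rows sum to 1)."""
    out = np.empty(n, dtype=int)
    out[0] = 0
    for t in range(1, n):
        out[t] = rng.choice(len(trans), p=trans[out[t - 1]])
    return out

# Example: a 4-symbol near-uniform source with min-entropy -log2(0.4).
samples = near_uniform(10**6, size=4, p_max=0.4)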

In Figures 4–9, the abscissa represents the theoretical entropy of the test sample and the ordinate represents the estimated entropy of the test sample. Figure 4 shows the estimated entropy results for the 80 simulated datasets with uniform and near-uniform distributions, respectively. From Figures 4(a) and 4(b), we see that the estimated results given by our proposed two predictive models (FNN and RNN) are almost consistent with the theoretical entropy for both uniform and near-uniform distributions. So the final estimated result, which is the minimum result of the two predictive models, is also basically consistent with the theoretical entropy. Figure 5 shows the estimated entropy results for the 80 simulated datasets with normal distributions and time-varying normal distributions, respectively. From Figures 5(a) and 5(b), we can see that the estimated results given by our proposed two predictive models are close to the theoretical entropy for normal distributions and time-varying normal distributions. According to our execution strategy, here we calculate the min-entropy estimations using the whole input dataset for the time-varying normal distributions.

Figure 6 shows the estimated results for Markov distributions. We can see that both of our predictive models give a number of overestimates when applied to the Markov sources, particularly as the theoretical entropy increases.

Table 2 shows the relative errors (namely, $|(\hat{H}_{\min} - H_{\min})/H_{\min}| \times 100\%$) between the theoretical results and the estimated results of FNN and RNN to further reflect the accuracy of the models; $\hat{H}_{\min}$ and $H_{\min}$ have the same meaning as in Section 3.2.2. We see that the entropy is estimated with an error of less than 6.02% for FNN and 7% for RNN over the simulated classes, respectively.

Based on the above accuracy verification of our predictors with simulated datasets from different distributions, what we can be sure of is that our predictors give almost accurate results, except for the Markov distributions.

3.3.2. Complexity Analysis. To analyze the usability of our predictors in terms of execution efficiency, we derive the following computational complexity through the analysis of the theory and the principle of implementation.

We believe that the computational complexity of entropy estimators used for RNG evaluation mainly comes from the sample space and sample size. For ease of analysis, we define the following parameters: n is the sample size, which indicates the length of the sample; s is the sample space, which means the number of kinds of symbols in the sample (i.e., s = 8 means there are 8 symbols in the sample and the bit width of each symbol is log2(8) = 3, such as 010, 110, 111, ...); and k denotes the maximum step of correlation, which is set as a constant in 90B's predictors (k = 16) and in our predictors (k = 20).

Through the analysis of the implementation, the computational complexity of the final 90B's predictors [23] mainly comes from the MultiMMC predictor and is of order $O(s^k \cdot n + 2^k \cdot n \cdot \log_2(s))$, which is mainly linear time complexity in n and k-order polynomial time complexity in s. In contrast, the computational complexity of our predictors is of order $O(s \cdot n)$, which is linear time complexity in s and n. It can be seen that the computational complexity of our predictors is much lower than that of the 90B's predictors.

It is important to note that the MultiMMC predictor requires $s^k \ll n$; otherwise, this predictor cannot give accurate estimated results statistically. That is to say, when s increases, the MultiMMC predictor requires a larger sample size in order to estimate the entropy accurately.

From the above analysis, we can see that our predictors have lower computational complexity. We will give the experimental proof in Section 4.3.

4. Comparison of Our Predictors with 90B's Predictors

In this section, a large number of experiments have been conducted to evaluate our proposed predictors for entropy estimation from the aspects of accuracy, applicability, and efficiency by applying our predictors to different simulated data and real-world data. For the experiments mentioned above, we compare the results with the final 90B's predictors [23] to highlight the advantages of our work. Similarly, our predictors in these experiments compute an upper bound of the min-entropy estimation at the significance level α = 0.01, which is the same as 90B's predictors.

4.1. Comparison on Accuracy

4.1.1. Simulated Data. The simulated datasets are produced using the same distribution families as described in Section 3.3.1. Further, we append the following two new distribution families, the pseudorandom sequence and the postprocessing sequence, which are representative and commonly used in reality.

Table 1: Error measures (RMSE) of the final estimations of our predictors for nonstationary sources with different γ values.

γ      0.1      0.2      0.4      0.6      0.8      1
RMSE   0.0911   0.1364   0.0788   0.0817   0.0219   0.0149


Figure 4: Comparison of estimated results obtained from our two predictive models with the theoretical entropy. Estimations for (a) uniform distributions and (b) near-uniform distributions. (Axes: theoretical entropy per sample vs. estimated entropy per sample.)

Figure 5: Comparison of estimated results obtained from our two predictive models with the theoretical entropy. Estimations for (a) normal distributions and (b) time-varying normal distributions. (Axes: theoretical entropy per sample vs. estimated entropy per sample.)


Figure 6: Comparison of estimated results obtained from our two predictive models with the theoretical entropy for Markov distributions. (Axes: theoretical entropy per sample vs. estimated entropy per sample.)

Figure 7: Comparison of the min-entropy estimations obtained from our proposed predictors and 90B's predictors with the theoretical entropy. Estimations for (a) uniform distributions and (b) near-uniform distributions. (Axes: theoretical entropy per sample vs. estimated entropy per sample.)


(i) M-Sequence. A maximum-length sequence, which is a type of pseudorandom binary sequence ([35]).

(ii) Nonuniform Distribution by Postprocessing Using LFSR. The samples are processed using a linear feedback shift register (LFSR), and they come from an IID source ([35]). (A minimal LFSR sketch is given after this list.)
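The following sketch shows how such a sequence could be produced with a small LFSR; the tap positions and register length are illustrative assumptions, and with a primitive feedback polynomial the output is an M-sequence of period 2^stage − 1.

def lfsr_bits(taps, state, n):
    """Fibonacci-style LFSR producing n output bits.
    `taps` are the feedback positions in `state` (the non-zero initial register)."""
    state = list(state)
    out = []
    for _ in range(n):
        out.append(state[-1])                   # output the last register bit
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]         # shift right, insert feedback bit
    return out

# Example: a 4-stage register; taps (0, 3) correspond to a primitive polynomial,
# so the period is 2**4 - 1 = 15.
bits = lfsr_bits(taps=(0, 3), state=(1, 0, 0, 1), n=30)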

For every distribution mentioned above, we also generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy by using our proposed predictors and the final 90B's predictors [23].

Figure 7 shows the estimated min-entropy results for the 80 simulated datasets with uniform distributions and near-uniform distributions, respectively. From Figures 7(a) and 7(b), we see that several points of the results obtained from the 90B's predictors are apparently underestimated, which may result from the overfitting phenomenon. Compared with 90B's predictors, our predictors provide more accurate results.

Figure 8 shows the estimated min-entropy results for normal distributions and time-varying normal distributions, respectively. From Figures 8(a) and 8(b), we can see that the estimated results given by our predictors are close to the theoretical entropy for normal distributions and time-varying normal distributions. However, the lowest entropy estimation results obtained from the 90B's predictors give significant underestimates.

Figure 9 shows the estimated min-entropy results for Markov distributions. We can see that the 90B's predictors almost always give underestimates compared with the theoretical entropy, while the estimated results given by our predictors are much closer to the theoretical entropy than those obtained from 90B's predictors.

To further compare the accuracy of our and 90B's predictors, we apply the predictors to the M-sequence and the nonuniform distribution sequence by postprocessing using LFSR, whose theoretical entropy is a known and fixed value.

It is further confirmed that the higher-stage (the maximum step of correlation) M-sequence and the nonuniform distribution sequence by postprocessing using LFSR are able to pass the NIST SP 800-22 statistical tests [8]. The estimated results are listed in Tables 3 and 4, and the lowest entropy estimations from 90B's predictors and our predictors for each stage are shown in bold font.

For the M-sequence and the nonuniform distribution by postprocessing using LFSR, the MultiMMC predictor presented in the final 90B gives the most accurate entropy estimation results for stage ≤ 16. However, when the stage of the M-sequence and the nonuniform distribution by postprocessing using LFSR is greater than 16, the MultiMMC predictor cannot give an accurate entropy estimation result, because this predictor is parameterized by k ∈ {1, 2, ..., 16} (k is the maximum step of correlation). Perhaps we could set the

Figure 8: Comparison of the min-entropy estimations obtained from our proposed predictors and 90B's predictors with the theoretical entropy. Estimations for (a) normal distributions and (b) time-varying normal distributions. (Axes: theoretical entropy per sample vs. estimated entropy per sample.)


parameter of the MultiMMC predictor to a greater range to achieve a more accurate estimated result for the higher stage, but the time complexity grows exponentially with the parameter k, as we analyzed in Section 3.3.2. Moreover, the FNN model can also give accurate estimated results even though the stages of the M-sequence and LFSR are greater than 16. However, the RNN model can give accurate estimated

results only when the stage is 8. Therefore, the FNN model is better matched to the M-sequence and the nonuniform distribution by postprocessing using LFSR than the RNN.

We also compute the relative errors of the estimated results from 90B's predictors and our predictors over 80 sequences from each class of simulated sources. We calculate the relative errors using the min-entropy obtained from 90B's predictors (the lowest estimation result of 90B's four predictors) and our predictors (the lowest estimation result of FNN and RNN), respectively. As illustrated in Table 5, for all five classes of simulated sources, the errors of our predictors are lower than those of the 90B's predictors. Specially, our approaches enable the entropy to be estimated with an error of less than 6.55%, but it is up to 14.65% for 90B's predictors. Overall, this indicates that our proposed predictors have a better performance than the 90B's predictors on accuracy for both stationary sequences and nonstationary sequences, which is consistent with the conclusion drawn in the figures above.

From Tables 2–4, we also find that the accuracy of the RNN predictive model is slightly higher than that of the FNN predictive model, except for the cases of the Markov sources, the M-sequence, and the nonuniform distribution by postprocessing using LFSR.

We will further verify the applicability to time-varying sources in Section 4.2. Therefore, through the evaluation of the entropy estimation results of the above simulated datasets, we see that our proposed predictors are superior in accuracy compared with the 90B's predictors.

4.1.2. Real-World Data. We further apply our predictors to datasets which are generated from RNGs deployed in the real world. In fact, the theoretical entropy per sample is unknown for these real-world sources, so no error can be computed as for the simulated datasets. However, the estimated results from the predictors presented here can still be compared to the 90B's predictors, based on the knowledge that underestimates from the predictors have theoretical bounds.

Datasets of real-world data are produced using the following approaches. The first two are adopted in [24], and the others are commonly used typical RNGs. The estimations of the real-world sources are presented in Table 6.

(i) RANDOM.ORG. This is a service that provides random numbers based on atmospheric noise and is used in [24]. It allows the user to specify the

Figure 9: Comparison of the min-entropy estimations obtained from our proposed predictors and 90B's predictors with the theoretical entropy for Markov distributions. (Axes: theoretical entropy per sample vs. estimated entropy per sample.)

Table 2: Relative errors of FNN and RNN estimation results.

Simulated data class    FNN (%)   RNN (%)
Uniform                 1.65      1.6
Near-uniform            1.60      1.52
Normal                  1.17      1.08
Time-varying normal     2.12      1.84
Markov                  6.02      7

Table 3: Estimated results for M-sequence (Hmin = 0.000).

Stage       8       10      12      14      16      18       20
MultiMCW    0.991   0.996   0.988   0.989   0.993   0.999*   1.000
Lag         1.000   1.000   1.000   1.000   1.000   1.000    1.000
MultiMMC    0.000   0.000   0.000   0.000   0.000   1.000    1.000
LZ78Y       1.000   1.000   1.000   1.000   1.000   1.000    0.997*
FNN         0.000   0.000   0.000   0.000   0.000   0.000    0.000
RNN         0.000   1.048   1.007   1.002   0.996   0.9920   0.9997

Table 4: Estimated results for nonuniform distribution by postprocessing using LFSR (Hmin = 0.152).

Stage       8       10      12      14      16      18       20
MultiMCW    0.440   0.595   0.743   0.721   0.998   0.994    0.998
Lag         0.581   0.581   0.680   0.680   0.992   0.994    0.999
MultiMMC    0.151   0.153   0.158   0.181   0.234   0.995    0.996
LZ78Y       0.567   0.995   0.766   0.679   0.997   0.996*   0.994*
FNN         0.151   0.145   0.149   0.147   0.149   0.142    0.144
RNN         0.149   0.947   1.012   0.998   1.012   0.997    0.985


minimum and maximum values that are output. The sequence used here consists of bits.

(ii) Ubldit TrueRNGpro. TrueRNGpro is a USB random number generator produced by Ubldit that provides a steady stream of random numbers through a USB CDC serial port. This entropy source is also used in [24]. The sequence used here consists of bits.

(iii) Linux Kernel Entropy Source. The Linux kernel random generator is used for the generation of a real-world sequence without any processing. The sequence used here is the last bit of each symbol.

(iv) Linux /dev/urandom. The /dev/urandom [6] of Linux is used for the generation of a real-world sequence with strict processing. The sequence used here consists of bits.

(v) Windows RNG. Windows RNG [5] is used for the generation of a real-world sequence by calling a Crypto API. The sequence used here consists of bits.

As illustrated in Table 6, the lowest entropy estimation for each source is shown in bold font. We see that our predictors perform better than 90B's predictors, because the lowest entropy estimation is always obtained from our work for each real-world source. Furthermore, for the Linux kernel entropy source, we find that both the Lag and MultiMMC predictors are able to give lower estimation results. This indicates that the Linux kernel entropy source has periodicity and conforms to the Markov model, which is well understood because the randomness of the Linux kernel entropy source comes from human behaviors such as manipulating the mouse and keyboard. In our work, compared with the entropy estimations for the other real-world sources, FNN fits much better than RNN for the Linux kernel entropy source, which is consistent with the previous view that FNN performs well in testing Markov sources.

4.2. Comparison on the Scope of Applicability. After evaluating the accuracy, we further validate the scope of applicability of our proposed predictors and compare it with that of the 90B's predictors. Kelsey et al. [24] stated that each of the 90B's predictors performs well only for a special distribution, as described in Section 2.2.1. To prove that our predictors have better applicability, the following four simulated datasets are generated, each of which is suitable for one predictor employed in the final 90B.

(i) Time-Varying Sources. The probability distribution of the data source varies with time. The MCW predictor

predicts the current output according to the previous outputs in a short period of time, and thus the MCW predictor performs well on these data sources.

(ii) Periodic Sources. The data source changes periodically. The lag predictor predicts the value that occurred N samples back in the sequence as the current output, and thus the lag predictor performs well on sources with strong periodic behavior.

(iii) Markov Sources. The data sources can be modeled by the Markov model. The MultiMMC predictor predicts the current output according to the Markov model, and thus the MultiMMC predictor performs well on data from any process that can be accurately modeled by a Markov model.

(iv) LZ78Y Sources. The data sources can be efficiently compressed by LZ78-like compression algorithms, which suits the LZ78Y predictor well.

For each simulated source above, we generate a set of 10 simulated datasets, each of which contains 10^6 samples, and the min-entropy is estimated by our and 90B's predictors. The final result for a predictor is the average value of the 10 estimated results corresponding to the 10 simulated datasets for one simulated source.

4.2.1. Time-Varying Sources. Firstly, we generate time-varying binary data which suit the statistical behaviors of the MCW predictor presented in the 90B. Table 7 shows the entropy estimation results for time-varying data.

As shown in Table 7, the symbol gradual(x) (x ∈ [0, 1], the same below) is defined as a simulated source in which the probability of outputting "0" changes gradually from x to 1 − x

with time. The symbol period(x) is defined as a simulated source in which the probability of outputting "0" changes periodically with time, and the probability varies from x to 1 − x in one period; the period length is set to 20% of the entire input dataset. The symbol sudden(x) is defined as a simulated source in which the probability of outputting "0" changes suddenly with time; namely, the probability is set to x for the first half of the input dataset and 1 − x for the last half.
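A minimal sketch of generators for these three time-varying binary sources is shown below; the exact waveform of the periodic probability (a sine here) is an assumption made for illustration.

import numpy as np

rng = np.random.default_rng(2)

def gradual(x, n):
    """P(output = 0) moves linearly from x to 1 - x over the whole sequence."""
    p0 = np.linspace(x, 1 - x, n)
    return (rng.random(n) >= p0).astype(int)

def period(x, n, period_len=None):
    """P(output = 0) oscillates between x and 1 - x; one period is 20% of the dataset."""
    period_len = period_len or max(n // 5, 1)
    phase = np.sin(2 * np.pi * np.arange(n) / period_len)
    p0 = 0.5 + (0.5 - x) * phase
    return (rng.random(n) >= p0).astype(int)

def sudden(x, n):
    """P(output = 0) is x for the first half and 1 - x for the second half."""
    p0 = np.where(np.arange(n) < n // 2, x, 1 - x)
    return (rng.random(n) >= p0).astype(int)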

In Table 7, the estimation results for the MCW predictor and our work are shown in bold font. We see that the MCW predictor gives the lowest and most accurate entropy estimations for the three types of time-varying data mentioned above, but it gives slight underestimates at gradual(0.2) and period(0.2). It is confirmed that the time-varying sources mentioned above match the statistical behaviors of the MCW predictor. Relatively, we find that our proposed predictive models are all capable of obtaining satisfactory entropy estimations that are close to the correct values. Therefore, it is proved that our proposed predictive models are suitable for the time-varying data mentioned above. Note that we calculate the min-entropy estimate according to the entire dataset rather than the last 20% of the input dataset for these time-varying sources, because the probability distribution varies with time, so part of the input dataset cannot represent the overall distribution of the input dataset.

Table 5: Relative errors of the final estimations of 90B's predictors and our predictors for five classes of simulated sources.

Simulated data class    90B's predictors (%)   Our predictors (%)
Uniform                 4.37                   1.53
Near-uniform            3.47                   1.59
Normal                  6.08                   1.57
Time-varying normal     3.47                   1.72
Markov                  14.65                  6.55


4.2.2. Periodic Sources. Secondly, we generate periodic data which suit the statistical behaviors of the lag predictor presented in 90B. The following are the entropy estimation results for periodic sequences. The data source completely obeys the periodic rule, so the correct entropy is zero. The bit width of the samples is traversed from 2 to 8.

As shown in Table 8, the estimation results for the lag predictor and our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated periodic sources, we confirm that the lag predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimates, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the entropy estimation of (strongly) periodic data. In addition, the MultiMMC predictor can also give accurate min-entropy estimations. This is reasonable because periodicity is also a form of correlation.

4.2.3. Markov Sources. Next, we generate multivariate M-sequences as Markov sources, which fit the statistical behaviors of the MultiMMC predictor. Specifically, the multivariate M-sequences are composed of multiple M-sequences with different initial states. Due to the determinacy of this type of sequence, the correct entropy is zero. The bit width of the samples is also traversed from 2 to 8. The maximum step of correlation used here is set as 8. Table 9 shows the estimated results for multivariate M-sequences.

In Table 9, the estimation results for the MultiMMC predictor and our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated Markov sources, we confirm that the MultiMMC predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimations, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the Markov sources.

4.2.4. LZ78Y Sources. Finally, we verify the applicability to the LZ78Y sources. This type of entropy source is difficult to generate by simulation. However, we can still draw the conclusion that our proposed predictive models can be applied to the LZ78Y sources according to the results shown in italic font in Tables 8 and 9, because periodic data and Markov sequences are compressible.

4.2.5. Summary on the Applicability Scope of Our Predictors. By analyzing the experimental results for the above four specific simulated sources, each of which is oriented towards a certain predictor in the 90B, we conclude that our predictors can provide accurate estimated results of entropy. So the proposed predictors apply to these entropy sources as well as the 90B's predictors do. In addition, compared with 90B's predictors, our predictors have a better performance on the scope of applicability for testing the

Table 7: Entropy estimates for time-varying data.

Data class     Correct   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
gradual(0.2)   0.6345    0.5290   0.7808   0.7240     0.7790   0.6288   0.6289
gradual(0.3)   0.7437    0.7378   0.9221   0.8416     0.9243   0.7430   0.7460
gradual(0.4)   0.8645    0.8631   0.9786   0.9518     0.9739   0.8648   0.8637
period(0.2)    0.6345    0.5537   0.7428   0.5537     0.7669   0.6205   0.6209
period(0.3)    0.7437    0.7393   0.9218   0.8476     0.9233   0.7377   0.7375
period(0.4)    0.8645    0.8639   0.9767   0.9632     0.9796   0.8653   0.8632
sudden(0.2)    0.3219    0.3203   0.4663   0.3386     0.4484   0.3217   0.3229
sudden(0.3)    0.5146    0.5110   0.5857   0.9984     0.7663   0.5110   0.5119
sudden(0.4)    0.7370    0.7338   0.8699   0.9984     0.9389   0.7339   0.7345
(MCW, Lag, MultiMMC, and LZ78Y are 90B's predictors; FNN and RNN are our work.)

Table 8: Entropy estimates for periodic sequences.

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.6458   0.0000   0.0000     1.1817   0.0079   0.0000
3           2.3318   0.0000   0.0000     1.5957   0.1315   0.0000
4           2.9147   0.0000   0.0000     1.8016   0.4748   0.0000
5           3.3269   0.0000   0.0000     1.4586   0.8898   0.0000
6           3.9092   0.0000   0.0000     0.8322   3.4944   0.0000
7           4.4908   0.0000   0.0000     0.3973   3.4960   0.0000
8           4.4919   0.0000   0.0000     0.2027   3.5408   0.0000
(MCW, Lag, MultiMMC, and LZ78Y are 90B's predictors; FNN and RNN are our work. *The results in italic font in the original are used to analyze the applicability to the LZ78Y sources.)

Table 6: Entropy estimates for real-world sources.

Real-world source             MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
RANDOM.ORG                    0.9951   0.9963   0.9966     0.9976   0.9802   0.9954
Ubldit TrueRNGpro             0.9979   0.9955   0.9973     0.9966   0.9934   0.9728
Linux kernel entropy source   0.6173   0.1232   0.1269     0.6164   0.1230   0.3068
Linux /dev/urandom            0.9952   0.9935   0.9990     0.9964   0.9983   0.9911
Windows RNG                   0.9953   0.9986   0.9975     0.9984   0.9833   0.9853
(MCW, Lag, MultiMMC, and LZ78Y are 90B's predictors; FNN and RNN are our work.)


datasets with long-range correlation, as presented in Section 4.1.1.

4.3. Comparison on Execution Efficiency. We implement our predictors and the final 90B's predictors using Python 3.6, and the version of TensorFlow is 1.13.1. All the following tests are conducted on a computer with an Intel Core i7 CPU and 32 GB RAM.

Table 10 shows the mean execution time of our predictors in comparison with that of the final 90B's predictors and the second draft of 90B's predictors. Each experimental result in Table 10 is the average value obtained from 50 repeated experiments. Note that the definitions of the parameters n, s, and k are the same as in Section 3.3.2.

From the listed mean execution times with different scales (n, s) in Table 10, it can be seen that, when n = 10^6, the mean execution time of our predictors is much lower and increases more slowly with any s than that of the final 90B's predictors. In other words, the average execution efficiency of our predictors is about 7 to 10 times higher than that of the final 90B's predictors for different sample spaces s when the sample size n is 10^6. In particular, when n = 10^8, the mean execution time of the final 90B's predictors is far more than that of our predictors regardless of the size of the sample space and is too long (over three days) to calculate the estimated results in the case s ≥ 2^2.

In terms of the execution efficiency of 90B's predictors, we also find that the mean execution time of the final 90B's predictors is much higher than that of the second draft of 90B's predictors. Actually, the final 90B's mean execution time is about twice as much as the second draft's. This could be caused by the characteristics of some estimators, which are limited to binary inputs only. Because the collision estimator, Markov estimator, and compression estimator are only suitable for binary input (0 or 1), as stated in [23], for nonbinary inputs the 90B's estimators will not only calculate the original symbol entropy but also convert the input into binary to calculate the bit entropy and finally obtain the min-entropy. This greatly increases the mean execution time.

4.4. General Discussion. For most of the entropy sources that have been tested, the RNN gives more accurate estimations than the FNN. The better accuracy of the RNN predictive model

may be due to the following reasons. On the one hand, the RNN adds feedback connections to the network, i.e., it considers not only the relationship between the current output and the previous observations but also the relationship among the previous observations. On the other hand, the RNN one-hot-encodes the training dataset for better forecasting of categorical data. On the contrary, for the Markov sources, the M-sequence, and the nonuniform distribution by postprocessing using LFSR, the current output is only related to the previous observations, which fits the FNN predictive model well, and thus the FNN provides more accurate estimated results.

5. Conclusions and Future Work

Entropy estimation provides a crucial evaluation of the security of RNGs. The predictor serves as a universal sanity check for entropy estimation. In this work, we provide several new approaches to estimate the min-entropy of entropy sources using predictors based on neural networks (i.e., FNN and RNN) for the first time. In particular, we design a novel scheme for the proposed entropy estimation based on neural network models, including the execution strategy and parameter settings. In order to evaluate the quality of the proposed predictors, we collect various types of simulated sources, stationary or nonstationary, whose correct entropy can be derived from the known probability distribution, and the theoretical result is further verified by experiments on real-world sources. We also compare our method with the predictors defined in NIST SP 800-90B (published in 2018), which is a commonly used standard for evaluating the validity of entropy sources. Our assessment experiments are carried out in three aspects, namely, accuracy, scope of applicability, and computational complexity. The experimental results demonstrate that the entropy estimations obtained from our proposed predictors are more accurate than those of the 90B's predictors, and our predictors have a remarkably wider scope of applicability. In addition, the

Table 9: Entropy estimates for multivariate M-sequences.

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.9010   2.0000   0.0000     2.0000   0.0005   0.0000
3           2.9906   3.0000   0.0000     2.4940   0.0000   0.0000
4           3.4037   4.0000   0.0000     4.0000   0.0021   0.0000
5           4.9753   5.0000   0.0000     1.2269   0.0041   0.0000
6           5.3916   6.0000   0.0000     1.2905   0.0394   0.0000
7           5.3916   6.0000   0.0000     1.9881   0.0280   0.0000
8           7.0000   7.0000   0.0000     0.6611   0.8635   0.0000
(MCW, Lag, MultiMMC, and LZ78Y are 90B's predictors; FNN and RNN are our work. *The results in italic font in the original are used to analyze the applicability to the LZ78Y sources.)

Table 10: Comparison on execution efficiency of min-entropy estimation of our study and 90B's predictors.

n      s     Final 90B (s)   Old 90B (s)   Our predictors (s)
10^6   2^1   921             561           136
10^6   2^2   1,058           525           138
10^6   2^3   1,109           574           149
10^6   2^4   1,235           598           174
10^6   2^5   1,394           630           190
10^6   2^6   1,683           785           186
10^6   2^7   2,077           938           264
10^6   2^8   2,618           1,298         272
10^8   2^1   52,274          47,936        9,184
10^8   2^2   —               —             9,309
10^8   2^3   —               —             9,385
10^8   2^4   —               —             9,836
10^8   2^5   —               —             10,986
10^8   2^6   —               —             13,303
10^8   2^7   —               —             17,649
10^8   2^8   —               —             20,759


computational complexity of ours is obviously lower than that of the 90B's as the sample space and sample size grow, according to the theoretical analysis. The average execution efficiency of our predictors is about 7 to 10 times higher than that of the 90B's predictors for different sample spaces when the sample size is 10^6. Specially, the 90B's predictors cannot calculate a result due to the huge time complexity when the sample space s is over 2^2 with the parameter of maximum step k = 16 and sample size n = 10^8; relatively, our method is able to provide a satisfactory result for entropy sources with a large sample space and long dependence.

Future work aims at designing some specific neural network predictive models for min-entropy estimation of some specific entropy sources. Our future work will also focus on applying this new method to estimate entropy in more application areas, like the randomness sources (sensors and other sources) in mobile terminals.

Data Availability

RANDOM.ORG data used to support the findings of this study can be accessed from https://www.random.org. Ubldit TrueRNGpro, Linux kernel entropy source, Linux /dev/urandom, and Windows RNG data used to support the findings of this study can be obtained from the relevant listed references.

Disclosure

A preliminary version of this paper appeared under the title "Neural Network Based Min-entropy Estimation for Random Number Generators" in Proc. Security and Privacy in Communication Networks—14th EAI International Conference, SecureComm 2018, Singapore, August 8–10, 2018 [36]. Dr. Jing Yang participated in this work when she studied at the Chinese Academy of Sciences, and now she works at the China Information Technology Security Evaluation Center, Beijing, China.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2018YFB0804300), the National Natural Science Foundation of China (Nos. 61872357 and 61802396), and the National Cryptography Development Fund of China (No. MMJJ20180113).

References

[1] I. Kanter, Y. Aviad, I. Reidler, E. Cohen, and M. Rosenbluh, "An optical ultrafast random bit generator," Nature Photonics, vol. 4, no. 1, pp. 58–61, 2010.
[2] P. Li, A. Wang, Y. Guo et al., "Ultrafast fully photonic random bit generator," Journal of Lightwave Technology, vol. 36, no. 12, pp. 2531–2540, 2018.
[3] P. Li, K. Li, X. Guo et al., "Parallel optical random bit generator," Optics Letters, vol. 44, no. 10, pp. 2446–2449, 2019.
[4] A. Uchida, K. Amano, M. Inoue et al., "Fast physical random bit generation with chaotic semiconductor lasers," Nature Photonics, vol. 2, no. 12, pp. 728–732, 2008.
[5] L. Dorrendorf, Z. Gutterman, and B. Pinkas, "Cryptanalysis of the random number generator of the Windows operating system," ACM Transactions on Information and System Security, vol. 13, no. 1, pp. 1–32, 2009.
[6] Z. Gutterman, B. Pinkas, and T. Reinman, "Analysis of the Linux random number generator," in Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P 2006), pp. 371–385, Berkeley, CA, USA, May 2006.
[7] M. Vanhoef and F. Piessens, "Predicting, decrypting, and abusing WPA2/802.11 group keys," in Proceedings of the 25th USENIX Security Symposium, pp. 673–688, Austin, TX, USA, August 2016.
[8] A. L. Rukhin, J. Soto, J. R. Nechvatal et al., SP 800-22 Rev. 1a: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, NIST Special Publication, Gaithersburg, MD, USA, 2010.
[9] W. Killmann and W. Schindler, AIS 31: Functionality Classes and Evaluation Methodology for True (Physical) Random Number Generators, Version 3.1, T-Systems GEI GmbH and Bundesamt für Sicherheit in der Informationstechnik (BSI), Bonn, Germany, 2001.
[10] G. Marsaglia, "The Marsaglia random number CDROM including the diehard battery of tests of randomness," 1996, http://www.stat.fsu.edu/pub/diehard.
[11] P. L'Ecuyer and R. J. Simard, "TestU01: a C library for empirical testing of random number generators," ACM Transactions on Mathematical Software, vol. 33, no. 4, 2007.
[12] ISO/IEC JTC 1/SC 27, ISO/IEC 18031: Information Technology–Security Techniques–Random Bit Generation, Berlin, Germany, 2011.
[13] E. Barker and J. Kelsey, "NIST draft special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2012, http://csrc.nist.gov/publications/drafts/800-90/draft-sp800-90b.pdf.
[14] M. Baudet, D. Lubicz, J. Micolod, and A. Tassiaux, "On the security of oscillator-based random number generators," Journal of Cryptology, vol. 24, no. 2, pp. 398–425, 2011.
[15] W. Killmann and W. Schindler, "A design for a physical RNG with robust entropy estimators," in Proceedings of the 10th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2008, pp. 146–163, Washington, DC, USA, August 2008.
[16] Y. Ma, J. Lin, T. Chen, C. Xu, Z. Liu, and J. Jing, "Entropy evaluation for oscillator-based true random number generators," in Proceedings of the 16th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2014, pp. 544–561, Busan, South Korea, September 2014.
[17] Y. Ma, J. Lin, and J. Jing, "On the entropy of oscillator-based true random number generators," in Proceedings of the Cryptographers' Track at the RSA Conference, pp. 165–180, Springer, San Francisco, CA, USA, February 2017.
[18] P. Li, J. Zhang, L. Sang et al., "Real-time online photonic random number generation," Optics Letters, vol. 42, no. 14, pp. 2699–2702, 2017.
[19] X. Ma, F. Xu, H. Xu, X. Tan, B. Qi, and H. K. Lo, "Postprocessing for quantum random-number generators: entropy evaluation and randomness extraction," Physical Review A, vol. 87, no. 6, pp. 062327-1–062327-10, 2013.
[20] K. Ugajin, Y. Terashima, K. Iwakawa et al., "Real-time fast physical random number generator with a photonic integrated circuit," Optics Express, vol. 25, no. 6, pp. 6511–6523, 2017.
[21] F. Xu, B. Qi, X. Ma, H. Xu, H. Zheng, and H.-K. Lo, "Ultrafast quantum random number generation based on quantum phase fluctuations," Optics Express, vol. 20, no. 11, pp. 12366–12377, 2012.
[22] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "(Second draft) NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2016, https://csrc.nist.gov/CSRC/media/Publications/sp/800-90b/draft/documents/sp800-90b_second_draft.pdf.
[23] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2018, http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90B.pdf.
[24] J. Kelsey, K. A. McKay, and M. S. Turan, "Predictive models for min-entropy estimation," in Proceedings of the 17th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2015, pp. 373–392, Saint-Malo, France, September 2015.
[25] S. Aras and İ. D. Kocakoç, "A new model selection strategy in time series forecasting with artificial neural networks: IHTS," Neurocomputing, vol. 174, pp. 974–987, 2016.
[26] J. P. Donate, X. Li, G. G. Sanchez, and A. S. de Miguel, "Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm," Neural Computing and Applications, vol. 22, no. 1, pp. 11–20, 2013.
[27] J. C. Luna-Sanchez, E. Gomez-Ramírez, K. Najim, and E. Ikonen, "Forecasting time series with a logarithmic model for the polynomial artificial neural networks," in Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN 2011), pp. 2725–2732, San Jose, CA, USA, 2011.
[28] C. de Groot and D. Wurtz, "Analysis of univariate time series with connectionist nets: a case study of two classical examples," Neurocomputing, vol. 3, no. 4, pp. 177–192, 1991.
[29] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[30] X. Cai, N. Zhang, G. K. Venayagamoorthy, and D. C. Wunsch II, "Time series prediction with recurrent neural networks trained by a hybrid PSO-EA algorithm," Neurocomputing, vol. 70, no. 13–15, pp. 2342–2353, 2007.
[31] A. Jain and A. M. Kumar, "Hybrid neural network models for hydrologic time series forecasting," Applied Soft Computing, vol. 7, no. 2, pp. 585–592, 2007.
[32] J. M. P. Menezes Jr. and G. A. Barreto, "Long-term time series prediction with the NARX network: an empirical evaluation," Neurocomputing, vol. 71, no. 16–18, pp. 3335–3343, 2008.
[33] P. Hagerty and T. Draper, "Entropy bounds and statistical tests," 2012, https://csrc.nist.gov/csrc/media/events/random-bit-generation-workshop-2012/documents/hagerty_entropy_paper.pdf.
[34] S. Zhu, Y. Ma, T. Chen, J. Lin, and J. Jing, "Analysis and improvement of entropy estimators in NIST SP 800-90B for non-IID entropy sources," IACR Transactions on Symmetric Cryptology, no. 3, pp. 151–168, 2017.
[35] A. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, USA, 1996.
[36] J. Yang, S. Zhu, T. Chen, Y. Ma, N. Lv, and J. Lin, "Neural network based min-entropy estimation for random number generators," in Proceedings of the 14th International Conference on Security and Privacy in Communication Networks—SecureComm, pp. 231–250, Singapore, August 2018.



collision test, partial collection test, Markov test, compression test, and frequency test, which are suitable for sources that do not necessarily satisfy the IID assumption. But these estimators give significant underestimates, which were found by Kelsey et al. [24] through experiments.

(ii) Subsequently, the 90B was updated to the second draft [22] in January 2016, and the estimators based on predictors, which were proposed by Kelsey et al. for the first time, were adopted. Compared with the first draft, the second draft has the following main changes. (1) Among these estimators, the partial collection test in the first draft was deleted, the frequency test in the first draft was replaced by the most common value estimator, and two new estimators, the t-tuple estimator and the longest repeated substring estimator, were added. (2) The second important update was the addition of four predictors for entropy estimation. However, the underestimation problem was not solved in the second draft. Zhu et al. [34] proved the underestimation problem for non-IID data from theoretical analysis and experimental validations and proposed an improved method.

(iii) The final official version of 90B [23] was published in January 2018. In the final 90B, estimators with significant underestimates, such as the collision estimator and the compression estimator, are modified to be limited to only binary inputs, which may reduce the overall execution efficiency of min-entropy estimation for nonbinary inputs. In addition, the calculation process and method of key variables for min-entropy estimation are also corrected, such as P'_global and the min-entropy of each predictor.

2.2.1. Execution Strategy of Min-Entropy Estimation in 90B. The 90B takes the following strategy to estimate the min-entropy. It first checks whether the tested datasets are IID or not. On the one hand, if the tested datasets are non-IID, there are ten estimators, as mentioned in Section 2.2.2, for entropy estimation. Each estimator calculates its own estimation independently; then, among all estimations, the minimum one is selected as the final estimation result for the entropy source. On the other hand, if the tested datasets are considered IID, only the most common value estimator is employed. Finally, it applies restart tests and gives the entropy estimation.

Note that this article only focuses on the analysis and comparison with the 90B's four predictors, and the research on other parts of the 90B is not considered in this study.

2.2.2. Estimators in 90B. In the final NIST SP 800-90B, there are ten estimators, and each estimator has its own specific characteristics. According to the underlying methods they employ, we divide these estimators into three classes: frequency-based type, entropy statistic based type, and predictor-based type. The following is a brief introduction of the ten estimators, and the details can be found in [23].

(1) Predictor-Based Type Estimators. The following are four predictors proposed by Kelsey et al. [24] for entropy estimation for the first time. Kelsey et al. utilized several machine learning models served as predictors to improve the accuracy of entropy estimation, but these predictors perform well only for specific distributions.

(i) Multi Most Common in Window (MultiMCW) Predictor. This predictor performs well in cases where there is a clear most common value, but that value varies over time.

(ii) Lag Predictor. The Lag subpredictor predicts the value that occurred N samples back in the sequence. This predictor performs well on sources with strong periodic behavior if N is close to the period.

(iii) Multi Markov Model with Counting (MultiMMC) Predictor. The MultiMMC subpredictor predicts the most common value following the previous N-sample string. The range of the parameter N is set from 1 to 16. This predictor performs well on data from any process that can be accurately modeled by an Nth-order Markov model.

(iv) LZ78Y Predictor. This predictor performs well on the sort of data that would be efficiently compressed by LZ78-like compression algorithms.

(2) Other Estimators. Four frequency-based type estimators and two entropy statistic-based type estimators are described as follows. The min-entropy of the former type of estimators is calculated according to the probability of the most-likely output value, and the other is based on entropic statistics presented by Hagerty and Draper [33]. Among them, three estimators, the Markov estimator, collision estimator, and compression estimator, explicitly state that they only apply to binary inputs in the final 90B published in 2018, which may reduce the execution efficiency for nonbinary inputs, as proved through experiments in Section 4.3.

(i) Most Common Value Estimate. This estimator calculates entropy based on the number of occurrences of the most common value in the input dataset and then constructs a confidence interval for this proportion. The upper bound of the confidence interval is used to estimate the min-entropy per sample of the source.

(ii) Markov Estimate. This estimator computes entropy by modeling the noise source outputs as a first-order Markov model. The Markov estimate provides a min-entropy estimate by measuring the dependencies between consecutive values from the input dataset. This method is only applied to binary inputs.

(iii) T-Tuple Estimate. This method examines the frequency of t-tuples (pairs, triples, etc.) that appear in the input dataset and produces an estimate of the entropy per sample based on the frequency of those t-tuples.

(iv) Longest Repeated Substring Estimate (LRS Estimate). This method estimates the collision entropy (sampling without replacement) of the source based on the number of repeated substrings (tuples) within the input dataset.

(v) Collision Estimate. This estimator calculates entropy based on the mean number of samples to see the first collision in a dataset, where a collision is any repeated value. This method is only applied to binary inputs.

(vi) Compression Estimate. This estimator calculates entropy based on how much the tested data can be compressed. This method is also only applied to binary inputs.

2.2.3. Min-Entropy Estimation of 90B's Predictors. Each predictor in the 90B attempts to predict the next sample in a sequence according to a certain statistical property of previous samples and provides an estimated result based on the probability of successful prediction. Every predictor consists of a set of subpredictors and chooses the subpredictor with the highest rate of successful predictions to predict the subsequent output. As for each predictor, it calculates the global predictability and the local predictability with the upper bound of the 99% confidence interval and then derives the global and the local entropy estimations, respectively. Finally, the final entropy estimation for this predictor is the minimum of the global and the local entropy estimations.

For estimating the entropy of a given entropy source, each predictor offers a predicted result after testing the outputs produced by the source and provides an entropy estimation based on the probability of successful predictions. After obtaining the estimations from the predictors, the minimum estimation of all the predictors is taken as the final entropy estimation of the entropy source.

The entropy estimation will be too loose if there is no predictor applied to detect the predictable behaviors. But if a set of predictors with different approaches are applied, they can guarantee that the predictor which is the most effective at predicting the entropy source's outputs determines the entropy estimation.
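As an illustration of the global-predictability step described above, the following minimal Python sketch converts a predictor's observed success rate into a min-entropy bound. The 99% confidence bound uses the normal-approximation constant 2.576; the function and variable names are ours, and the local-predictability (longest-run) part of the 90B calculation is omitted here, so this is only a simplified sketch, not the standard's full procedure.

import math

def min_entropy_from_predictions(correct, total, sample_space):
    # Observed global prediction probability of the predictor.
    p_global = correct / total
    # Upper bound of a 99% confidence interval (normal approximation).
    p_upper = p_global + 2.576 * math.sqrt(p_global * (1 - p_global) / (total - 1))
    # Never report more predictability than 1 or less than random guessing.
    p_upper = min(1.0, max(p_upper, 1.0 / sample_space))
    # A larger prediction probability means less min-entropy per sample.
    return -math.log2(p_upper)

# Example: 700 correct predictions out of 1000 over a 4-symbol alphabet.
print(min_entropy_from_predictions(700, 1000, 4))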

2.3. Two Predictive Models Based on Neural Networks. Next, we will introduce two main neural network models to help us design predictors for entropy estimation: feedforward neural networks (FNNs) and recurrent neural networks (RNNs), respectively.

2.3.1. FNNs. The goal of a feedforward network is to approximate some function f*. For an instance, a classifier Y = f*(X) maps an input X to a category Y. A feedforward network describes a mapping Y = f(X; θ) and learns the value of the parameters θ that result in the best function approximation. The principle of FNN is depicted in Figure 1.

For each time step from t = 1 to t = n (n ∈ Z* denotes the sample size), the FNN applies the following forward propagation equations:

H_t = f(b + W X_t),
Y_t = c + V H_t,
Ŷ_t = g(Y_t).    (5)

The parameters and functions that govern the computation happening in a FNN are described as follows:

(i) X_t is the input at time step t and is a vector composed of the previous inputs (i.e., X_t = [X_{t-k}, ..., X_{t-1}], where k refers to the step of memory).

(ii) H_t is the hidden state at time step t, where the bias vector b and the input-to-hidden weights W are derived via training. The number of hidden layers and the number of hidden nodes per layer are defined before training, which are called hyperparameters in neural networks.

(iii) Y_t is the output at step t, where the bias vector c and the hidden-to-output weights V are derived via training.

(iv) Ŷ_t is our predictive output at time step t, which would be a vector of probabilities across our sample space.

(v) The function f(·) is a fixed nonlinear function called activation function, and the function g(·) is an output function used in the final layer of a neural network. Both of the two functions belong to hyperparameters, which are defined before training (Section 3.2).

The models are called feedforward because the information flows through the approximate function from the input X_t, through the internal computations used to update the model to define f(·), and finally to the output Ŷ_t. Besides, there are no feedback connections in which the outputs of the model are fed back into itself.
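To make equation (5) concrete, here is a minimal NumPy sketch of a single forward pass of an FNN predictor with one hidden layer (the paper's FNN uses two hidden layers); the shapes and random weights are illustrative assumptions, not values from the paper.

import numpy as np

def fnn_forward(x_window, W, b, V, c):
    # Hidden layer: tanh activation over an affine transform of the input window.
    h = np.tanh(b + W @ x_window)
    # Output layer pre-activation.
    y = c + V @ h
    # softmax output function: probability vector over the sample space.
    e = np.exp(y - y.max())
    return e / e.sum()

# Hypothetical sizes: k = 20 previous samples, 10 hidden nodes, s = 4 symbols.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(10, 20)), np.zeros(10)
V, c = rng.normal(size=(4, 10)), np.zeros(4)
print(fnn_forward(rng.integers(0, 4, size=20), W, b, V, c))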

2.3.2. RNNs. If the feedback connections are added to the network, then it is called an RNN. In particular, the RNN records the information that has been calculated so far and uses it for the calculation of the present output. The principle of RNNs is depicted in Figure 2.

For each time step from t = 1 to t = n, the RNN applies the following forward propagation equations:

H_t = f(b + W H_{t-1} + U X_t),
Y_t = c + V H_t,
Ŷ_t = g(Y_t).    (6)

The parameters and functions that govern the computation happening in a RNN are described as follows:

Figure 1: Feedforward neural network (input layer X_t, hidden layer H_t, output layer Y_t).


(i) X_t is the input at time step t and is a one-hot vector. For example, if X_t = 1 and the sample space S = {0, 1}, then X_t = [0, 1].

(ii) H_t is the hidden state at time step t. It is the "memory" of the network. H_t is calculated based on the previous hidden state H_{t-1} and the input at the current step X_t. b, U, and W denote the bias vector, input-to-hidden weights, and hidden-to-hidden connection into the RNN cell, respectively.

(iii) Y_t is the output at step t. c and V denote the bias vector and hidden-to-output weights, respectively.

(iv) Ŷ_t is our predictive output at time step t, which would be a vector of probabilities across our sample space.

(v) Similarly, the function f(·) is an activation function and g(·) is an output function, which are defined before training (Section 3.2).
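Analogously, a minimal NumPy sketch of the unrolled forward pass of equation (6), with one-hot inputs as described above; the weight shapes, the relu choice at this level of simplification, and the helper name are our own assumptions.

import numpy as np

def rnn_forward(symbols, U, W, b, V, c, s):
    # Hidden state carried across time steps (the "memory" of the network).
    h = np.zeros(W.shape[0])
    for x in symbols:
        x_onehot = np.eye(s)[x]                        # one-hot encode the current input
        h = np.maximum(0, b + W @ h + U @ x_onehot)    # relu activation
    # Output layer and softmax over the sample space.
    y = c + V @ h
    e = np.exp(y - y.max())
    return e / e.sum()

# Example: predict the next bit of a short alternating binary sequence.
rng = np.random.default_rng(0)
U, W, b = rng.normal(size=(8, 2)), 0.1 * rng.normal(size=(8, 8)), np.zeros(8)
V, c = rng.normal(size=(2, 8)), np.zeros(2)
print(rnn_forward([0, 1, 0, 1, 0, 1], U, W, b, V, c, s=2))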

3. Predictors for Min-Entropy Estimation Based on Neural Network

The neural network is able to approximate various PDFs, and the complexity of a neural network increases more slowly (a linear relationship) as the sample space increases. Motivated by [24], we propose two predictive models based on neural networks for min-entropy estimation. Next, we present the execution strategy of our min-entropy estimators, provide the choices of the important hyperparameters, and give the analysis on the accuracy and complexity of our predictive models to prove that our design is feasible.

3.1. Strategy of Our Predictors for Min-Entropy Estimation. The execution strategy of our min-entropy estimator is depicted in Figure 3, which consists of model training and entropy estimation. Both of our proposed predictive models (namely, predictors), which are based on FNN and RNN, respectively, follow the same strategy.

The benefit of this strategy is that it applies not only to stationary sequences generated by entropy sources but also to nonstationary sequences of which the probability distribution is time-varying. On the one hand, in our strategy, in order that the model can match the statistical behavior of the data source well, we use the whole input dataset to train and continuously update the model. On the other hand, to effectively estimate the entropy of the data source, we use the predictive model to compute the min-entropy only when the predictive model is updated enough to characterize the statistical behavior of the tested dataset. Specifically, for the testing dataset, which is used for computing the entropy estimation, we preset the testing dataset as a part of the whole observations and utilize a proportion parameter (γ ∈ [0, 1]) to determine the size of the testing dataset, namely, the last γ of the inputs are used for computing entropy while the model is also updating.

The workflow of the min-entropy estimator based on neural network is listed as the following steps:

(1) Initialization: choose one model (FNN or RNN) and set the hyperparameters of the model.

(2) Input data: input the tested dataset and judge the proportion of the remaining data in the entire tested dataset. If the proportion of the remaining observations ≤ γ, record the accuracy for predicting the next sample to be observed; else, continue.

(3) Prediction: predict the current output according to the forward propagation equations.

(4) Comparison and loss function: observe the real output and compare it to the predicted value; then compute the loss function.

(5) Update predictive model: compute the gradient and update the predictive model. If the entire dataset runs out, turn to Step 6; else, repeat Step 2 ~ Step 5.

(6) Calculate the predictive accuracy and probability: obtain the accuracy of the executed predictive model from the last γ observations, including global predictions and local predictions, and compute the probability, respectively.

(7) Calculate the min-entropy: calculate the min-entropy by the obtained probability. After obtaining the estimations from the two predictors (FNN and RNN), respectively, the minimum entropy of the two predictors is taken as the final entropy estimation (namely, min-entropy) of the tested dataset.
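The loop below is a simplified Python sketch of steps (2)-(7), assuming a model object exposing predict and update methods; these method names and the scoring details are our own simplification, and the actual implementation additionally tracks local predictability and trains the network by gradient descent on the loss function.

import math

def estimate_min_entropy(data, model, k=20, gamma=0.2, s=2):
    # Only the last gamma fraction of the data is scored; the model keeps
    # training on every sample (predict -> compare -> update).
    start_scoring = int(len(data) * (1 - gamma))
    correct, scored = 0, 0
    for t in range(k, len(data)):
        window, target = data[t - k:t], data[t]
        prediction = model.predict(window)    # assumed method name
        if t >= start_scoring:
            scored += 1
            correct += int(prediction == target)
        model.update(window, target)          # assumed method name: one training step
    p = max(correct / scored, 1.0 / s)        # never below random guessing
    return -math.log2(p)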

Figure 3: Execution strategy of our predictors for min-entropy estimation based on neural network (flowchart: initialize the predictive model → input the datasets to be observed → predict the next sample → compare and compute the loss function → update the predictive model; record the accuracy once the unobserved proportion ≤ γ; when all the samples run out, calculate the prediction probability and output the min-entropy).

Figure 2: Recurrent neural network (input layer X_t, hidden layer H_t with recurrent state H_{t-1}, output layer Y_t).


Combining the calculation principle of min-entropy in Section 2.1, we can see that the lower bound on the probability of making a correct prediction gives an upper bound on the entropy of the source. In other words, the more predictable a source is, the larger the probability of making correct predictions is, and the less entropy it has. Therefore, a model that is a bad fit for the source or not fully trained will result in inaccurate predictions, a low prediction probability, and a too-high entropy estimation of the source. So the models that are a bad fit for the source or not fully trained can give big overestimates but not underestimates.

Further, we can confirm that adding one more predictor will not do any harm and, conversely, will make the entropy estimation much more accurate. From the execution strategy, we can see that if all the predictors whose models are not matched for the noise source are used alongside a predictor whose underlying model matches the source's behavior well, then the predictor which matches the source well will determine the final entropy estimation.

3.2. Choices of Important Parameters

3.2.1. Hyperparameters for FNN and RNN. In neural networks, the choices of models' hyperparameters have significant influences on the computational resource and performance required to train and test. Therefore, the choices of hyperparameters are crucial to neural networks. Next, we illustrate the choices of some key hyperparameters.

(1) Hidden Layers and Nodes. To comprehensively balance the accuracy and efficiency of our predictors, in this paper, for the FNN model, except for the multivariate M-sequences, we set the number of hidden layers as 2, and the numbers of hidden nodes per layer are 10 and 5, respectively. For the multivariate M-sequences, after extensive tests, the number of hidden nodes per layer shall be larger to give better results. By observing the results, we finally set the numbers as 35 and 30, respectively.

(2) Step of Memory. The step of memory determines the number of previous samples used for predicting the current output. Generally speaking, the larger the value, the better the performance. However, the computational resources (memory and runtime) increase as the step of memory grows. In this paper, we set the step of memory as 20 by trading off performance and resource. That is to say, as for the FNN, the input at time step t is the previous 20 observed values, and as for the RNN, the hidden layer contains 20 unfolded hidden units.

(3) Loss Function. The loss function refers to the function that can measure the difference between the predicted values and the true values. The total loss for a given sequence of x = {x_1, ..., x_n} values paired with a sequence of y = {y_1, ..., y_n} values would then be just the sum of the losses over all the time steps. For example, if L_t is the negative log-likelihood of y_t given x_1, ..., x_t, then

L({x_1, ..., x_n}, {y_1, ..., y_n}) = Σ_t L_t = -Σ_t log_2 p_model(y_t | x_1, ..., x_t),    (7)

where p_model(y_t | x_1, ..., x_t) is given by reading the entry for y_t from the model's output vector ŷ_t. The models are trained to minimize the cross-entropy between the training data and the models' predictions (i.e., equation (7)), which is equivalent to minimizing the mean squared error (i.e., the average of the squares of the errors or deviations).
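A direct NumPy rendering of equation (7), in which prob_vectors stands for the per-step output vectors ŷ_t and targets for the observed symbols (these names are ours):

import numpy as np

def sequence_loss(prob_vectors, targets):
    # Sum over time steps of the negative log2-likelihood of each observed
    # symbol under the model's predicted distribution, as in equation (7).
    return -sum(np.log2(p[y]) for p, y in zip(prob_vectors, targets))

# Example: two time steps over a binary sample space.
print(sequence_loss([np.array([0.9, 0.1]), np.array([0.2, 0.8])], [0, 1]))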

(4) Learning Rate. The learning rate is a positive scalar determining the size of the step. To control the effective capacity of the model, we need to set the value of the learning rate in an appropriate range. The learning rate determines how fast the parameter θ moves to its optimum value. If the learning rate is too large, gradient descent can inadvertently increase rather than decrease the training error, namely, the parameters are likely to cross the optimal value. However, if the learning rate is too small, the training is not only slower but may become permanently stuck with a high training error. So the learning rate is crucial to the performance of the model.

Based on the above analysis, we pick the learning rate approximately on a logarithmic scale, i.e., the learning rate is taken within the set {0.1, 0.01, 10^-3, 10^-4, 10^-5}. At the beginning of model training, we set the learning rate as a larger value to reach the optimum value faster. Then, with the number of training iterations increasing, we set a smaller value for not crossing the optimal value. The detailed settings are described in Algorithm 1.

(5) Activation Function. In general, we must use a nonlinear function to describe the features. Most neural networks do so using an affine transformation controlled by learned parameters, followed by a fixed nonlinear function called an activation function. The activation function plays an important role in neural networks. The commonly used activation functions include tanh(·), relu(·), and the sigmoid function (defined as σ(·), i.e., equation (8), in this paper). Because the sigmoid function is easy to saturate, which causes the gradient to change slowly during training, it is generally no longer used as an activation function except in RNN-LSTM (long short term memory).

σ(x) = 1 / (1 + e^{-x}).    (8)

After many attempts (i.e., we compared the efficiency and performance by means of the exhaustive method manually), we finally choose tanh(·) and relu(·) as the activation functions for FNN and RNN, respectively. They can be expressed as

tanh(x) = (1 - e^{-2x}) / (1 + e^{-2x}).    (9)

Compared with σ(·), tanh(·) is symmetrical about the origin. In some cases, this symmetry can give better performance. It compresses the real-valued input to a range of -1 to 1, and the mean of its output is zero, which makes it converge faster than σ(·) and reduces the number of iterations. Therefore, it is suitable as an activation function, and the zero-centered training dataset contributes to the convergence speed of model training.

relu(x) = max(0, x),    (10)

where relu(·) is currently a popular activation function. It is linear and gets the activation value, which requires only one threshold. We choose this function based on the following two considerations. On the one hand, it solves the vanishing gradient problem of back propagation through time (BPTT) algorithms, for the reason that the derivative of relu(·) is 1. On the other hand, it greatly improves the speed of calculation because it only needs to judge whether the input is greater than 0.

(6) Output Function. The output function is used in the final layer of a neural network model. The predictors for time series are considered as a solution to a multiclass classification problem, so we take softmax(·) as the output function, which can be expressed as

y_i = softmax(z_i) = e^{z_i} / Σ_{i=1}^{s} e^{z_i},    (11)

where s is the size of the sample space, and softmax(z_i) denotes the probability that the output is z_i and satisfies Σ_{i=1}^{s} y_i = 1, i.e., the sum of the probabilities of all the outputs is equal to 1. Such networks are commonly trained under a cross-entropy regime (i.e., the loss function mentioned above).

3.2.2. Selection of Testing Dataset Length. To better estimate the entropy of the data source, the length of the testing dataset is very important for min-entropy estimation for random numbers generated by different types of sources. In reality, most entropy sources are time-varying (namely, nonstationary), which means the probability distribution of the output sequences from the source is changing over time. So the length of the testing dataset shall be adaptive to the type of the source.

Therefore, as described in Section 3.1, we utilize γ to determine the size of the testing dataset. Specifically, in our strategy, for the stationary entropy source, of which the probability distribution of the outputs is not changing over time, the parameter γ is preset to 20%. Relatively, for the nonstationary entropy source, all observation points (namely, γ is 100%) need to serve as the testing dataset.

To verify the reasonableness of the γ value, we compute the root-mean-squared error (RMSE) of the lowest estimations of our predictors over 80 sequences from the following simulated datasets generated by a nonstationary source:

(i) Time-Varying Normal Distribution Rounded to Integers. The samples are subject to a normal distribution and rounded to integer values, but the mean of the distribution moves along a sine curve to simulate a time-varying signal.

The RMSE, i.e., sqrt((1/N) Σ_{i=1}^{N} (Ĥ_min - H_min)^2), refers to the arithmetic square root of the mean of the squares of the errors or deviations for each class of simulated sources. Note that here N indicates the number of test samples, Ĥ_min indicates the estimated result for each sample, and H_min means the theoretical result for each sample. In other words, the smaller the RMSE is, the closer the estimated result is to the theoretical entropy, which indicates the predictor has a better accuracy.

As shown in Table 1, we can see that, for the time-varying data source, only when γ is 100% (namely, the entire dataset is used for min-entropy estimation) can the predictors give the most accurate results. This means that, when the probability distribution of data sources is varying with time, a part of the input dataset cannot represent the overall distribution of the input dataset, so a part of the input dataset cannot accurately give the estimation result of the entire input dataset. Besides, for the stationary sources, it is reasonable that γ is preset to 20%, because the estimated results obtained by our method are very close to the correct entropy (in theory) of the selected entropy source, as presented in Section 4.1.

3.3. Evaluation on Our Predictors. In this section, we conduct some experiments on simulated datasets to verify the accuracy of our proposed predictors for the min-entropy estimation and compare the experimental results with theoretical results. In addition, we give a theoretical analysis of the complexity of our predictors. Note that, in Section 4, we will apply our predictors to different data sources and provide the comparison of our predictors with 90B's predictors.

3.3.1. Accuracy Verification. We train our predictive models FNN and RNN on a number of representative simulated data sources (including stationary and nonstationary entropy sources), of which the theoretical entropy can be obtained from the known probability distribution of the outputs. Simulated datasets are produced using the following distribution families adopted in [24].

(1) Simulated Datasets Generated by Stationary Sources

(i) Discrete Uniform Distribution. The samples are equally likely, which come from an IID source.

(1) if train_num < train_dataset_size/3 then
(2)     learning_rate ← {0.1, 0.01}
(3) else if train_num < train_dataset_size/1.5 then
(4)     learning_rate ← {0.01, 10^-3, 10^-4}
(5) else
(6)     learning_rate ← {10^-4, 10^-5}
(7) end if

Algorithm 1: Setting of learning rate.


(ii) Discrete Near-Uniform Distribution. All samples are equally likely except one, which come from an IID source. A certain sample has a higher probability than the rest.

(iii) Normal Distribution Rounded to Integers. The samples are subject to a normal distribution and rounded to integer values, which come from an IID source.

(iv) Markov Model. The samples are generated using a dth-order Markov model, which come from a non-IID source.

(2) Simulated Datasets Generated by Nonstationary Sources. These datasets are the same as those used in Section 3.2.2.

For every class listed above, we generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy by using the predictive models FNN and RNN, respectively. For each dataset, the theoretical min-entropy H_min is derived from the known probability distribution.
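For illustration, a small Python sketch of two of these simulated sources; the specific parameter values (p_star, sigma, and the sine amplitude) are our own examples, not the paper's test settings.

import numpy as np

def near_uniform(n, s, p_star, seed=0):
    # One symbol has probability p_star; the other s-1 symbols share the rest.
    # Theoretical min-entropy is -log2(p_star) when p_star >= 1/s.
    probs = [(1 - p_star) / (s - 1)] * s
    probs[0] = p_star
    return np.random.default_rng(seed).choice(s, size=n, p=probs)

def time_varying_normal(n, sigma=2.0, amplitude=8.0, seed=0):
    # Normal samples rounded to integers, with the mean moving along a sine
    # curve to simulate a nonstationary (time-varying) source.
    rng = np.random.default_rng(seed)
    mean = amplitude * np.sin(2 * np.pi * np.arange(n) / n)
    return np.rint(rng.normal(mean, sigma)).astype(int)

print(near_uniform(10, 4, 0.5), time_varying_normal(10))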

In Figures 4-9, the abscissa represents the theoretical entropy of the test sample, and the ordinate represents the estimated entropy of the test sample. Figure 4 shows the estimated entropy results for the 80 simulated datasets with uniform and near-uniform distributions, respectively. From Figures 4(a) and 4(b), we see that the estimated results given by our proposed two predictive models (FNN and RNN) are almost consistent with the theoretical entropy for both uniform and near-uniform distributions. So the final estimated result, which is the minimum result of the two predictive models, is also basically consistent with the theoretical entropy. Figure 5 shows the estimated entropy results for the 80 simulated datasets with normal distributions and time-varying normal distributions, respectively. From Figures 5(a) and 5(b), we can see that the estimated results given by our proposed two predictive models are close to the theoretical entropy for normal distributions and time-varying normal distributions. According to our execution strategy, here we calculate the min-entropy estimations using the whole input dataset for time-varying normal distributions.

Figure 6 shows the estimated results for Markov distributions; we can see that both of our predictive models give a number of overestimates when applied to the Markov sources, particularly with the theoretical entropy increasing.

Table 2 shows the relative errors (namely, |(Ĥ_min - H_min)/H_min| × 100%) between the theoretical results and the estimated results of FNN and RNN to further reflect the accuracy of the models. Ĥ_min and H_min have the same meaning as in Section 3.2.2. We see that the entropy can be estimated with an error of less than 6.02% for FNN and 7% for RNN for the simulated classes, respectively.

Based on the above accuracy verification of our predictors with simulated datasets from different distributions, we can be sure that our predictors give almost accurate results, except for Markov distributions.

3.3.2. Complexity Analysis. To analyze the usability of our predictors in terms of execution efficiency, we derive the following computational complexity through the analysis of theory and the principle of implementation.

We believe that the computational complexity of entropy estimators used for RNG evaluation mainly comes from the sample space and the sample size. For ease of analysis, we define the following parameters: n is the sample size, which indicates the length of the sample; s is the sample space, which means the number of kinds of symbols in the sample (i.e., s = 8 means there are 8 symbols in the sample and the bit width of each symbol is log2(8) = 3, such as 010, 110, 111, ...); and k denotes the maximum step of correlation, which is set as a constant in 90B's predictors (k = 16) and our predictors (k = 20).

Through the analysis of the implementation, the computational complexity of the final 90B's predictors [23] mainly comes from the MultiMMC predictor and is of order O(s^k · n + 2^k · n · log2(s)), which is mainly linear time complexity of n and k-order polynomial time complexity of s, while the computational complexity of our predictors is of order O(s · n), which is linear time complexity of s and n. It can be seen that the computational complexity of our predictors is much lower than that of the 90B's predictors.

It is important to note that the MultiMMC predictor requires s^k ≪ n; otherwise, this predictor cannot give accurate estimated results statistically. That is to say, when s is increasing, the MultiMMC predictor requires a larger sample size in order to estimate the entropy accurately.

From the above analysis, we can see that our predictors have lower computational complexity. We will give the experimental proof in Section 4.3.

4. Comparison on Our Predictors with 90B's Predictors

In this section, a large number of experiments have been done to evaluate our proposed predictors for entropy estimation from the aspects of accuracy, applicability, and efficiency by applying our predictors to different simulated data and real-world data. For the experiments mentioned above, we compare the results with the final 90B's predictors [23] to highlight the advantages of our work. Similarly, our predictors in these experiments compute an upper bound of the min-entropy estimation at the significance level α = 0.01, which is the same as 90B's predictors.

4.1. Comparison on Accuracy

4.1.1. Simulated Data. The simulated datasets are produced using the same distribution families as described in Section 3.3.1. Further, we append the following two new distribution families, the pseudorandom sequence and the postprocessing sequence, which are representative and commonly used in reality.

Table 1: Error measures of final estimations of our predictors for nonstationary sources with different γ values.

γ      0.1      0.2      0.4      0.6      0.8      1
RMSE   0.0911   0.1364   0.0788   0.0817   0.0219   0.0149


Figure 4: Comparison of estimated results obtained from our two predictive models (FNN and RNN) with the theoretical entropy. Estimations for (a) uniform distributions and (b) near-uniform distributions (axes: theoretical entropy per sample vs. estimated entropy per sample).

Figure 5: Comparison of estimated results obtained from our two predictive models (FNN and RNN) with the theoretical entropy. Estimations for (a) normal distributions and (b) time-varying normal distributions (axes: theoretical entropy per sample vs. estimated entropy per sample).


Figure 6: Comparison of estimated results obtained from our two predictive models with the theoretical entropy for Markov distributions (axes: theoretical entropy per sample vs. estimated entropy per sample).

Figure 7: Comparison of the min-entropy estimation obtained from our proposed predictors and 90B's predictors with the theoretical entropy. Estimations for (a) uniform distributions and (b) near-uniform distributions (axes: theoretical entropy per sample vs. estimated entropy per sample).


(i) M-Sequence. A maximum length sequence, which is a type of pseudorandom binary sequence [35].

(ii) Nonuniform Distribution by Postprocessing Using LFSR. The samples are processed using a linear feedback shift register (LFSR), which come from an IID source [35].

For every distribution mentioned above, we also generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy by using our proposed predictors and the final 90B's predictors [23].
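As background, an M-sequence can be produced by a maximal-length LFSR; the sketch below uses a Fibonacci-style LFSR with an example 16-stage primitive tap set. The tap positions and initial state are illustrative choices, not the paper's test parameters.

def lfsr_sequence(taps, state, n):
    # Fibonacci LFSR: output the last stage, feed back the XOR of the tapped stages.
    out = []
    state = list(state)
    for _ in range(n):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]   # shift right and insert the feedback bit
    return out

# Example: 16-stage LFSR; these taps correspond to a primitive polynomial,
# so the output is an m-sequence of period 2^16 - 1.
bits = lfsr_sequence([16, 14, 13, 11], [1] + [0] * 15, 100)
print(bits[:20])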

Figure 7 shows the estimated min-entropy results for the 80 simulated datasets with uniform distributions and near-uniform distributions, respectively. From Figures 7(a) and 7(b), we see that several points of the results obtained from the 90B's predictors are apparently underestimated, which may result from the overfitting phenomenon. Compared with 90B's predictors, our predictors provide more accurate results.

Figure 8 shows the estimated min-entropy results for normal distributions and time-varying normal distributions, respectively. From Figures 8(a) and 8(b), we can see that the estimated results given by our predictors are close to the theoretical entropy for normal distributions and time-varying normal distributions. However, the lowest entropy estimation results obtained from the 90B's predictors give significant underestimates.

Figure 9 shows the estimated min-entropy results for Markov distributions. We can see that the 90B's predictors almost always give underestimates compared with the theoretical entropy, while the estimated results given by our predictors are much closer to the theoretical entropy than those obtained from 90B's predictors.

To compare the accuracy of our predictors and the 90B's predictors more clearly, we apply the predictors to the M-sequence and the nonuniform distribution sequence by postprocessing using LFSR, whose theoretical entropy is a known and fixed value.

It is further confirmed that the higher-stage (the maximum step of correlation) M-sequence and nonuniform distribution sequence by postprocessing using LFSR are able to pass the NIST SP 800-22 statistical tests [8]. The estimated results are listed in Tables 3 and 4, and the lowest entropy estimations from 90B's predictors and our predictors for each stage are shown in bold font.

For the M-sequence and the nonuniform distribution by postprocessing using LFSR, the MultiMMC predictor presented in the final 90B gives the most accurate entropy estimation results for the stage ≤ 16. However, when the stage of the M-sequence and the nonuniform distribution by postprocessing using LFSR is greater than 16, the MultiMMC predictor cannot give accurate entropy estimation results, because this predictor is parameterized by k ∈ {1, 2, ..., 16} (k is the maximum step of correlation). Perhaps we could set the

Figure 8: Comparison of the min-entropy estimations obtained from our proposed predictors and 90B's predictors with the theoretical entropy. Estimations for (a) normal distributions and (b) time-varying normal distributions (axes: theoretical entropy per sample vs. estimated entropy per sample).


parameter of the MultiMMC predictor to a greater range to achieve a more accurate estimated result for the higher stage, but the time complexity grows exponentially with the parameter k, as we analyzed in Section 3.3.2. Moreover, the FNN model can also give accurate estimated results even though the stages of the M-sequence and LFSR are greater than 16. However, the RNN model can give accurate estimated results only when the stage is 8. Therefore, the FNN model is more matched to the M-sequence and the nonuniform distribution by postprocessing using LFSR than the RNN.

We also compute the relative errors of the estimated results from 90B's predictors and our predictors over 80 sequences from each class of simulated sources. We calculate the relative errors using the min-entropy obtained from 90B's predictors (the lowest estimation result of 90B's four predictors) and our predictors (the lowest estimation result of FNN and RNN), respectively. As illustrated in Table 5, for all five classes of simulated sources, the errors of our predictors are lower than those of the 90B's predictors. Specially, our approaches enable the entropy to be estimated with an error of less than 6.55%, but it is up to 14.65% for 90B's predictors. Overall, this indicates that our proposed predictors have a better performance than 90B's predictors on accuracy for both stationary sequences and nonstationary sequences, which is consistent with the conclusion drawn in the figures above.

From Tables 2-4, we also find that the accuracy of the RNN predictive model is slightly higher than that of the FNN predictive model, except for the cases of the Markov sources, M-sequence, and nonuniform distribution by postprocessing using LFSR.

We will further verify the applicability for time-varying sources in Section 4.2. Therefore, through the evaluation on the entropy estimation results of the above simulated datasets, we see that our proposed predictors are superior in accuracy compared with the 90B's predictors.

4.1.2. Real-World Data. We further apply our predictors to the datasets which are generated from RNGs deployed in the real world. In fact, the theoretical entropy per sample is unknown for these real-world sources, so no error can be computed as for the simulated datasets. However, the estimated results from the predictors presented here can still be compared to the 90B's predictors, based on the knowledge that underestimates from the predictors have theoretical bounds.

Datasets of real-world data are produced using the following approaches. The first two are adopted in [24], and the others are commonly used typical RNGs. The estimations of the real-world sources are presented in Table 6.

(i) RANDOM.ORG. This is a service that provides random numbers based on atmospheric noise and is used in [24]. It allows the user to specify the

Figure 9: Comparison of the min-entropy estimations obtained from our proposed predictors and 90B's predictors with the theoretical entropy for Markov distributions (axes: theoretical entropy per sample vs. estimated entropy per sample).

Table 2: Relative errors of FNN and RNN estimation results.

Simulated data class    FNN (%)   RNN (%)
Uniform                 1.65      1.6
Near-uniform            1.60      1.52
Normal                  1.17      1.08
Time-varying normal     2.12      1.84
Markov                  6.02      7

Table 3: Estimated results for M-sequence (Hmin = 0.000).

Stage      8       10      12      14      16      18       20
MultiMCW   0.991   0.996   0.988   0.989   0.993   0.999*   1.000
Lag        1.000   1.000   1.000   1.000   1.000   1.000    1.000
MultiMMC   0.000   0.000   0.000   0.000   0.000   1.000    1.000
LZ78Y      1.000   1.000   1.000   1.000   1.000   1.000    0.997*
FNN        0.000   0.000   0.000   0.000   0.000   0.000    0.000
RNN        0.000   1.048   1.007   1.002   0.996   0.9920   0.9997

Table 4: Estimated results for nonuniform distribution by postprocessing using LFSR (Hmin = 0.152).

Stage      8       10      12      14      16      18       20
MultiMCW   0.440   0.595   0.743   0.721   0.998   0.994    0.998
Lag        0.581   0.581   0.680   0.680   0.992   0.994    0.999
MultiMMC   0.151   0.153   0.158   0.181   0.234   0.995    0.996
LZ78Y      0.567   0.995   0.766   0.679   0.997   0.996*   0.994*
FNN        0.151   0.145   0.149   0.147   0.149   0.142    0.144
RNN        0.149   0.947   1.012   0.998   1.012   0.997    0.985


minimum and maximum values that are output. The sequence used here consists of bits.

(ii) Ubld.it TrueRNGpro. TrueRNGpro provides a steady stream of random numbers through a USB CDC serial port; it is a USB random number generator produced by Ubld.it. This entropy source is also used in [24]. The sequence used here consists of bits.

(iii) Linux Kernel Entropy Source. The Linux kernel random generator is used for the generation of a real-world sequence without any processing. The sequence used here is the last bit of each symbol.

(iv) Linux /dev/urandom. The /dev/urandom [6] of Linux is used for the generation of a real-world sequence with strict processing. The sequence used here consists of bits.

(v) Windows RNG. Windows RNG [5] is used for the generation of a real-world sequence by calling a Crypto API. The sequence used here consists of bits.

As illustrated in Table 6, the lowest entropy estimation for each source is shown in bold font. We see that our predictors perform better than 90B's predictors because the lowest entropy estimation is always obtained from our work for each real-world source. Furthermore, for the Linux kernel entropy source, we find that both the Lag predictor and the MultiMMC predictor are able to give lower estimation results. It indicates that the Linux kernel entropy source has periodicity and conforms to the Markov model, which is well understood because the randomness of the Linux kernel entropy source comes from human behaviors, such as manipulating the mouse and keyboard. In our work, compared with the entropy estimations for other real-world sources, FNN fits much better than RNN for the Linux kernel entropy source, which is consistent with the previous view that FNN performs well in testing Markov sources.

4.2. Comparison on the Scope of Applicability. After evaluating the accuracy, we further validate the scope of applicability of our proposed predictors and compare it with that of the 90B's predictors. Kelsey et al. [24] stated that each of the 90B's predictors performs well only for a special distribution, as described in Section 2.2.2. To prove that our predictors have better applicability, the following four simulated datasets are generated, each of which is suitable for one predictor employed in the final 90B.

(i) Time-Varying Sources. The probability distribution of the data sources is varying with time. The MCW predictor predicts the current output according to previous outputs in a short period of time, and thus the MCW predictor performs well on these data sources.

(ii) Periodic Sources. The data source changes periodically. The Lag predictor predicts the value that occurred N samples back in the sequence as the current output, and thus the Lag predictor performs well on sources with strong periodic behavior.

(iii) Markov Sources. The data sources can be modeled by the Markov model. The MultiMMC predictor predicts the current output according to the Markov model, and thus the MultiMMC predictor performs well on data from any process that can be accurately modeled by a Markov model.

(iv) LZ78Y Sources. The data sources can be efficiently compressed by LZ78-like compression algorithms, which suits the LZ78Y predictor well.

For each simulated source above, we generate a set of 10 simulated datasets, each of which contains 10^6 samples, and the min-entropy is estimated by our and 90B's predictors. The final result for a predictor is the average value of the 10 estimated results corresponding to the 10 simulated datasets for one simulated source.

4.2.1. Time-Varying Sources. Firstly, we generate time-varying binary data, which is suitable for the statistical behaviors of the MCW predictor presented in the 90B. Table 7 shows the entropy estimation results for time-varying data.

As shown in Table 7, the symbol gradual(x) (x ∈ [0, 1], the same below) is defined as a simulated source in which the probability of output "0" changes gradually from x to 1 - x with time. The symbol period(x) is defined as a simulated source in which the probability of output "0" changes periodically with time, and the probability varies from x to 1 - x in one period; the period length is set to 20% of the entire input dataset. The symbol sudden(x) is defined as a simulated source in which the probability of output "0" changes suddenly with time, namely, the probability is set to x for the first half of the input dataset and to 1 - x for the last half.
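A small Python sketch of the gradual(x) source defined above; the linear drift and the helper name are our own modeling choices.

import numpy as np

def gradual(n, x, seed=0):
    # Binary source whose P(output = 0) drifts linearly from x to 1 - x.
    p0 = np.linspace(x, 1 - x, n)
    u = np.random.default_rng(seed).random(n)
    return (u >= p0).astype(int)   # outputs 0 with probability p0, 1 otherwise

print(gradual(10, 0.2))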

From Table 7, the estimation results for the MCW predictor and our work are shown in bold font. We see that the MCW predictor gives the lowest and most accurate entropy estimations for the three types of time-varying data mentioned above, but it gives slight underestimates at gradual(0.2) and period(0.2). It is confirmed that the time-varying sources mentioned above match the statistical behaviors of the MCW predictor. Relatively, we find that our proposed predictive models are all capable of obtaining satisfactory entropy estimations that are close to the correct values. Therefore, it is proved that our proposed predictive models are suitable for the time-varying data mentioned above. Note that we calculate the min-entropy estimate according to the entire dataset rather than the last 20% of the input dataset for these time-varying sources, because the probability distribution is varying with time and a part of the input dataset cannot represent the overall distribution of the input dataset.

Table 5: Relative errors of the final estimations of 90B's predictors and our predictors for five classes of simulated sources.

Simulated data class    90B's predictors (%)   Our predictors (%)
Uniform                 4.37                   1.53
Near-uniform            3.47                   1.59
Normal                  6.08                   1.57
Time-varying normal     3.47                   1.72
Markov                  14.65                  6.55


4.2.2. Periodic Sources. Secondly, we generate periodic data, which are suitable for the statistical behaviors of the Lag predictor presented in 90B. The following are the entropy estimation results for periodic sequences. The data source completely obeys the periodic rule, so the correct entropy is zero. The bit width of samples is traversed from 2 to 8.

As shown in Table 8, the estimation results for the Lag predictor and our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated periodic sources, we confirm that the Lag predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimates, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the entropy estimation of the (strong) periodic data. In addition, the MultiMMC predictor can also give accurate min-entropy estimations. This is reasonable because periodicity is also a form of correlation.

4.2.3. Markov Sources. Next, we generate multivariate M-sequences as Markov sources, which fit the statistical behaviors of the MultiMMC predictor. Specifically, the multivariate M-sequences are composed of multiple M-sequences with different initial states. Due to the determinacy of this type of sequence, the correct entropy is zero. The bit width of the samples is also traversed from 2 to 8. The maximum step of correlation used here is set as 8. Table 9 shows the estimated results for multivariate M-sequences.

From Table 9, the estimation results for the MultiMMC predictor and our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated Markov sources, we confirm that the MultiMMC predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimations, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the Markov sources.

4.2.4. LZ78Y Sources. Finally, we verify the applicability for the LZ78Y sources. This type of entropy source is difficult to generate by simulation. However, we can still draw the conclusion that our proposed predictive models can be applied to the LZ78Y sources according to the results shown in italic font in Tables 8 and 9, because the periodic data and Markov sequences are compressible.

4.2.5. Summary on Applicability Scope of Our Predictors. By analyzing the experimental results of the above four specific simulated sources, each of which is oriented towards a certain predictor in the 90B, we conclude that our predictors can provide accurate estimated results of entropy. So the proposed predictors apply well to these entropy sources, as do the 90B's predictors. In addition, compared with 90B's predictors, our predictors have a better performance on the scope of applicability for testing the

Table 7: Entropy estimates for time-varying data.

Data class     Correct   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
gradual(0.2)   0.6345    0.5290   0.7808   0.7240     0.7790   0.6288   0.6289
gradual(0.3)   0.7437    0.7378   0.9221   0.8416     0.9243   0.7430   0.7460
gradual(0.4)   0.8645    0.8631   0.9786   0.9518     0.9739   0.8648   0.8637
period(0.2)    0.6345    0.5537   0.7428   0.5537     0.7669   0.6205   0.6209
period(0.3)    0.7437    0.7393   0.9218   0.8476     0.9233   0.7377   0.7375
period(0.4)    0.8645    0.8639   0.9767   0.9632     0.9796   0.8653   0.8632
sudden(0.2)    0.3219    0.3203   0.4663   0.3386     0.4484   0.3217   0.3229
sudden(0.3)    0.5146    0.5110   0.5857   0.9984     0.7663   0.5110   0.5119
sudden(0.4)    0.7370    0.7338   0.8699   0.9984     0.9389   0.7339   0.7345

(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)

Table 8: Entropy estimates for periodic sequences.

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.6458   0.0000   0.0000     1.1817   0.0079   0.0000
3           2.3318   0.0000   0.0000     1.5957   0.1315   0.0000
4           2.9147   0.0000   0.0000     1.8016   0.4748   0.0000
5           3.3269   0.0000   0.0000     1.4586   0.8898   0.0000
6           3.9092   0.0000   0.0000     0.8322   3.4944   0.0000
7           4.4908   0.0000   0.0000     0.3973   3.4960   0.0000
8           4.4919   0.0000   0.0000     0.2027   3.5408   0.0000

*The results in italic font are used to analyze the applicability for the LZ78Y sources.

Table 6: Entropy estimates for real-world sources.

Real-world sources            MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
RANDOM.ORG                    0.9951   0.9963   0.9966     0.9976   0.9802   0.9954
Ubld.it TrueRNGpro            0.9979   0.9955   0.9973     0.9966   0.9934   0.9728
Linux kernel entropy source   0.6173   0.1232   0.1269     0.6164   0.1230   0.3068
Linux /dev/urandom            0.9952   0.9935   0.9990     0.9964   0.9983   0.9911
Windows RNG                   0.9953   0.9986   0.9975     0.9984   0.9833   0.9853

(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)


datasets with long-range correlation, as presented in Section 4.1.1.

4.3. Comparison on Execution Efficiency. We implement our predictors and the final 90B's predictors using Python 3.6, and the version of TensorFlow is 1.3.11. All the following tests are conducted on a computer with an Intel Core i7 CPU and 32 GB RAM.

Table 10 shows the mean execution time of our predictors in comparison with that of the final 90B's predictors and the second draft of 90B's predictors. Each experimental result in Table 10 is the average value obtained from 50 repeated experiments. Note that the definitions of the parameters n, s, and k are the same as in Section 3.3.2.

From the listed mean execution time with different scales (n, s) in Table 10, it can be seen that, when n = 10^6, the mean execution time of our predictors is much lower and increases more slowly with any s than that of the final 90B's predictors. In other words, the average execution efficiency of our predictors is about 7 to 10 times higher than that of the final 90B's predictors for different sample spaces s when the sample size n is 10^6. In particular, when n = 10^8, the mean execution time of the final 90B's predictors is far more than that of our predictors regardless of the size of the sample space, and it is too long (over three days) to calculate the estimated results in the case s ≥ 2^2.

In terms of the execution efficiency of 90B's predictors, we also find that the mean execution time of the final 90B's predictors is much higher than that of the second draft of 90B's predictors. Actually, the final 90B's mean execution time is about twice as much as the second draft of 90B's. This could be caused by the characteristics of some estimators, which are limited to only binary inputs. Because the collision estimator, Markov estimator, and compression estimator are only suitable for binary input (0 or 1), as stated in [23], for nonbinary inputs the 90B's estimators will not only calculate the original symbol entropy but also convert the input into binary to calculate the bit entropy and finally get the min-entropy. This will greatly increase the mean execution time.

4.4. General Discussion. For most of the entropy sources that have been tested, the RNN gives more accurate estimations than the FNN. The better accuracy of the RNN predictive model may be due to the following reasons. On the one hand, the RNN adds feedback connections to the network, i.e., it considers not only the relationship between the current output and the previous observations but also the relationship among the previous observations. On the other hand, the RNN one-hot-encodes the training dataset, which is better for forecasting categorical data. On the contrary, for Markov sources, M-sequences, and nonuniform distributions postprocessed using an LFSR, the current output is only related to the previous observations, which fits the FNN predictive model well, and thus the FNN provides more accurate estimated results.

5. Conclusions and Future Work

Entropy estimation provides a crucial evaluation of the security of RNGs. The predictor serves as a universal sanity check for entropy estimation. In this work, we provide several new approaches to estimate the min-entropy of entropy sources using predictors based on neural networks (i.e., FNN and RNN) for the first time. In particular, we design a novel scheme for the proposed entropy estimation based on neural network models, including the execution strategy and the parameter settings. In order to evaluate the quality of the proposed predictors, we collect various types of simulated sources, either stationary or nonstationary, whose correct entropy can be derived from the known probability distribution, and the theoretical result is further verified by experiments on real-world sources. We also compare our method with the predictors defined in NIST SP 800-90B (published in 2018), which is a commonly used standard for evaluating the validation of entropy sources. Our assessment experiments are carried out in three aspects, namely, accuracy, scope of applicability, and computational complexity. The experimental results demonstrate that the entropy estimations obtained from our proposed predictors are more accurate than those of the 90B's predictors, and our predictors have a remarkably wide scope of applicability. In addition, the

Table 9: Entropy estimates for multivariate M-sequences (90B's predictors: MCW, Lag, MultiMMC, LZ78Y; our work: FNN, RNN).

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.9010   2.0000   0.0000     2.0000   0.0005   0.0000
3           2.9906   3.0000   0.0000     2.4940   0.0000   0.0000
4           3.4037   4.0000   0.0000     4.0000   0.0021   0.0000
5           4.9753   5.0000   0.0000     1.2269   0.0041   0.0000
6           5.3916   6.0000   0.0000     1.2905   0.0394   0.0000
7           5.3916   6.0000   0.0000     1.9881   0.0280   0.0000
8           7.0000   7.0000   0.0000     0.6611   0.8635   0.0000
*The result with italic font is used to analyze the applicability for the LZ78Y sources.

Table 10: Comparison on execution efficiency of min-entropy estimation between our study and the 90B's predictors.

{n, s}         Final 90B (s)   Old 90B (s)   Our predictors (s)
{10^6, 2^1}    921             561           136
{10^6, 2^2}    1,058           525           138
{10^6, 2^3}    1,109           574           149
{10^6, 2^4}    1,235           598           174
{10^6, 2^5}    1,394           630           190
{10^6, 2^6}    1,683           785           186
{10^6, 2^7}    2,077           938           264
{10^6, 2^8}    2,618           1,298         272
{10^8, 2^1}    52,274          47,936        9,184
{10^8, 2^2}    —               —             9,309
{10^8, 2^3}    —               —             9,385
{10^8, 2^4}    —               —             9,836
{10^8, 2^5}    —               —             10,986
{10^8, 2^6}    —               —             13,303
{10^8, 2^7}    —               —             17,649
{10^8, 2^8}    —               —             20,759


computational complexity of ours is obviously lower than that of the 90B's as the sample space and sample size grow, according to the theoretical analysis. The average execution efficiency of our predictors is about 7 to 10 times higher than that of the 90B's predictors for different sample spaces when the sample size is 10^6. Specially, the 90B's predictors cannot calculate a result, due to the huge time complexity, when the sample space s is over 2^2 with the maximum step k = 16 and sample size n = 10^8; in contrast, our method is able to provide a satisfactory result for entropy sources with a large sample space and long dependence.

Future work aims at designing specific neural network predictive models for min-entropy estimation of particular entropy sources. Our future work will also focus on applying this new method to estimate entropy in more application areas, such as the randomness sources (sensors and other sources) in mobile terminals.

Data Availability

RANDOM.ORG data used to support the findings of this study can be accessed from https://www.random.org. Ubld.it TrueRNGpro, Linux kernel entropy source, Linux /dev/urandom, and Windows RNG data used to support the findings of this study can be obtained from the relevant listed references.

Disclosure

A preliminary version of this paper appeared under the title "Neural Network Based Min-entropy Estimation for Random Number Generators" in Proc. Security and Privacy in Communication Networks—14th EAI International Conference, SecureComm 2018, Singapore, August 8–10, 2018 [36]. Dr. Jing Yang participated in this work when she studied at the Chinese Academy of Sciences; she now works at the China Information Technology Security Evaluation Center, Beijing, China.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2018YFB0804300), the National Natural Science Foundation of China (Nos. 61872357 and 61802396), and the National Cryptography Development Fund of China (No. MMJJ20180113).

References

[1] I. Kanter, Y. Aviad, I. Reidler, E. Cohen, and M. Rosenbluh, "An optical ultrafast random bit generator," Nature Photonics, vol. 4, no. 1, pp. 58–61, 2010.
[2] P. Li, A. Wang, Y. Guo et al., "Ultrafast fully photonic random bit generator," Journal of Lightwave Technology, vol. 36, no. 12, pp. 2531–2540, 2018.
[3] P. Li, K. Li, X. Guo et al., "Parallel optical random bit generator," Optics Letters, vol. 44, no. 10, pp. 2446–2449, 2019.
[4] A. Uchida, K. Amano, M. Inoue et al., "Fast physical random bit generation with chaotic semiconductor lasers," Nature Photonics, vol. 2, no. 12, pp. 728–732, 2008.
[5] L. Dorrendorf, Z. Gutterman, and B. Pinkas, "Cryptanalysis of the random number generator of the Windows operating system," ACM Transactions on Information and System Security, vol. 13, no. 1, pp. 1–32, 2009.
[6] Z. Gutterman, B. Pinkas, and T. Reinman, "Analysis of the Linux random number generator," in Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P 2006), pp. 371–385, Berkeley, CA, USA, May 2006.
[7] M. Vanhoef and F. Piessens, "Predicting, decrypting, and abusing WPA2/802.11 group keys," in Proceedings of the 25th USENIX Security Symposium, pp. 673–688, Austin, TX, USA, August 2016.
[8] A. L. Rukhin, J. Soto, J. R. Nechvatal et al., SP 800-22 Rev. 1a: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, NIST Special Publication, Gaithersburg, MD, USA, 2010.
[9] W. Killmann and W. Schindler, AIS 31: Functionality Classes and Evaluation Methodology for True (Physical) Random Number Generators, Version 3.1, T-Systems GEI GmbH and Bundesamt für Sicherheit in der Informationstechnik (BSI), Bonn, Germany, 2001.
[10] G. Marsaglia, "The Marsaglia random number CDROM including the diehard battery of tests of randomness," 1996, http://www.stat.fsu.edu/pub/diehard.
[11] P. L'Ecuyer and R. J. Simard, "TestU01: a C library for empirical testing of random number generators," ACM Transactions on Mathematical Software, vol. 33, no. 4, 2007.
[12] ISO/IEC JTC 1/SC 27, ISO/IEC 18031: Information Technology–Security Techniques–Random Bit Generation, Berlin, Germany, 2011.
[13] E. Barker and J. Kelsey, "NIST draft special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2012, http://csrc.nist.gov/publications/drafts/800-90/draft-sp800-90b.pdf.
[14] M. Baudet, D. Lubicz, J. Micolod, and A. Tassiaux, "On the security of oscillator-based random number generators," Journal of Cryptology, vol. 24, no. 2, pp. 398–425, 2011.
[15] W. Killmann and W. Schindler, "A design for a physical RNG with robust entropy estimators," in Proceedings of the 10th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2008, pp. 146–163, Washington, DC, USA, August 2008.
[16] Y. Ma, J. Lin, T. Chen, C. Xu, Z. Liu, and J. Jing, "Entropy evaluation for oscillator-based true random number generators," in Proceedings of the 16th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2014, pp. 544–561, Busan, South Korea, September 2014.
[17] Y. Ma, J. Lin, and J. Jing, "On the entropy of oscillator-based true random number generators," in Proceedings of the Cryptographers' Track at the RSA Conference, pp. 165–180, Springer, San Francisco, CA, USA, February 2017.
[18] P. Li, J. Zhang, L. Sang et al., "Real-time online photonic random number generation," Optics Letters, vol. 42, no. 14, pp. 2699–2702, 2017.
[19] X. Ma, F. Xu, H. Xu, X. Tan, B. Qi, and H. K. Lo, "Postprocessing for quantum random-number generators: entropy evaluation and randomness extraction," Physical Review A, vol. 87, no. 6, pp. 062327-1–062327-10, 2013.
[20] K. Ugajin, Y. Terashima, K. Iwakawa et al., "Real-time fast physical random number generator with a photonic integrated circuit," Optics Express, vol. 25, no. 6, pp. 6511–6523, 2017.
[21] F. Xu, B. Qi, X. Ma, H. Xu, H. Zheng, and H.-K. Lo, "Ultrafast quantum random number generation based on quantum phase fluctuations," Optics Express, vol. 20, no. 11, pp. 12366–12377, 2012.
[22] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "(Second draft) NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2016, https://csrc.nist.gov/CSRC/media/Publications/sp/800-90b/draft/documents/sp800-90b_second_draft.pdf.
[23] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2018, https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90B.pdf.
[24] J. Kelsey, K. A. McKay, and M. S. Turan, "Predictive models for min-entropy estimation," in Proceedings of the 17th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2015, pp. 373–392, Saint-Malo, France, September 2015.
[25] S. Aras and İ. D. Kocakoç, "A new model selection strategy in time series forecasting with artificial neural networks: IHTS," Neurocomputing, vol. 174, pp. 974–987, 2016.
[26] J. P. Donate, X. Li, G. G. Sanchez, and A. S. de Miguel, "Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm," Neural Computing and Applications, vol. 22, no. 1, pp. 11–20, 2013.
[27] J. C. Luna-Sanchez, E. Gomez-Ramirez, K. Najim, and E. Ikonen, "Forecasting time series with a logarithmic model for the polynomial artificial neural networks," in Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN 2011), pp. 2725–2732, San Jose, CA, USA, 2011.
[28] C. de Groot and D. Wurtz, "Analysis of univariate time series with connectionist nets: a case study of two classical examples," Neurocomputing, vol. 3, no. 4, pp. 177–192, 1991.
[29] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[30] X. Cai, N. Zhang, G. K. Venayagamoorthy, and D. C. Wunsch II, "Time series prediction with recurrent neural networks trained by a hybrid PSO-EA algorithm," Neurocomputing, vol. 70, no. 13–15, pp. 2342–2353, 2007.
[31] A. Jain and A. M. Kumar, "Hybrid neural network models for hydrologic time series forecasting," Applied Soft Computing, vol. 7, no. 2, pp. 585–592, 2007.
[32] J. M. P. Menezes Jr. and G. A. Barreto, "Long-term time series prediction with the NARX network: an empirical evaluation," Neurocomputing, vol. 71, no. 16–18, pp. 3335–3343, 2008.
[33] P. Hagerty and T. Draper, "Entropy bounds and statistical tests," 2012, https://csrc.nist.gov/csrc/media/events/random-bit-generation-workshop-2012/documents/hagerty_entropy_paper.pdf.
[34] S. Zhu, Y. Ma, T. Chen, J. Lin, and J. Jing, "Analysis and improvement of entropy estimators in NIST SP 800-90B for non-IID entropy sources," IACR Transactions on Symmetric Cryptology, no. 3, pp. 151–168, 2017.
[35] A. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, USA, 1996.
[36] J. Yang, S. Zhu, T. Chen, Y. Ma, N. Lv, and J. Lin, "Neural network based min-entropy estimation for random number generators," in Proceedings of the 14th International Conference on Security and Privacy in Communication Networks—SecureComm, pp. 231–250, Singapore, August 2018.



(iv) Longest Repeated Substring Estimate (LRS Estimate). This method estimates the collision entropy (sampling without replacement) of the source based on the number of repeated substrings (tuples) within the input dataset.

(v) Collision Estimate. This estimator calculates entropy based on the mean number of samples observed until the first collision in a dataset, where a collision is any repeated value. This method is only applied to binary inputs.

(vi) Compression Estimate. This estimator calculates entropy based on how much the tested data can be compressed. This method is also only applied to binary inputs.

2.2.3. Min-Entropy Estimation of the 90B's Predictors. Each predictor in the 90B attempts to predict the next sample in a sequence according to a certain statistical property of the previous samples and provides an estimated result based on the probability of successful prediction. Every predictor consists of a set of subpredictors and chooses the subpredictor with the highest rate of successful predictions to predict the subsequent output. Each predictor calculates the global predictability and the local predictability with the upper bound of the 99% confidence interval, and then derives the global and the local entropy estimations, respectively. Finally, the entropy estimation for this predictor is the minimum of the global and the local entropy estimations.

For estimating the entropy of a given entropy source, each predictor offers predicted results while testing the outputs produced by the source and provides an entropy estimation based on the probability of successful predictions. After obtaining the estimations from all the predictors, the minimum estimation over all the predictors is taken as the final entropy estimation of the entropy source.

The entropy estimation will be too loose if no predictor is applied that detects the predictable behavior. But if a set of predictors with different approaches is applied, they guarantee that the predictor which is the most effective at predicting the entropy source's outputs determines the entropy estimation.
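To make the mapping from prediction performance to entropy concrete, the following sketch shows how a min-entropy figure can be derived from a predictor's success rate. It only covers the global predictability with its 99% upper confidence bound; the local predictability based on runs of correct predictions and the exact edge-case rules of SP 800-90B are omitted, so this is an illustration rather than the normative computation.

```python
import math

def entropy_from_predictions(num_correct, num_predictions, sample_space_size):
    """Simplified sketch: min-entropy from a predictor's global success rate,
    using the upper end of a 99% confidence interval (z = 2.576)."""
    p_global = num_correct / num_predictions
    p_upper = min(1.0, p_global + 2.576 * math.sqrt(
        p_global * (1 - p_global) / (num_predictions - 1)))
    # A predictor can always fall back to uniform guessing, so the probability
    # of a correct prediction is floored at 1/s.
    p_max = max(p_upper, 1.0 / sample_space_size)
    return -math.log2(p_max)
```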

2.3. Two Predictive Models Based on Neural Networks. Next, we introduce the two main neural network models that help us design predictors for entropy estimation: feedforward neural networks (FNNs) and recurrent neural networks (RNNs), respectively.

2.3.1. FNNs. The goal of a feedforward network is to approximate some function f*. For instance, a classifier Y = f*(X) maps an input X to a category Y. A feedforward network describes a mapping Y = f(X; θ) and learns the value of the parameters θ that result in the best function approximation. The principle of the FNN is depicted in Figure 1.

For each time step from t = 1 to t = n (n ∈ Z* denotes the sample size), the FNN applies the following forward propagation equations:

H_t = f(b + W X_t),
Y_t = c + V H_t,
Ŷ_t = g(Y_t).        (5)

The parameters and functions that govern the computation happening in an FNN are described as follows:

(i) X_t is the input at time step t and is a vector composed of the previous inputs (i.e., X_t = [X_{t−k}, ..., X_{t−1}], where k refers to the step of memory).

(ii) H_t is the hidden state at time step t, where the bias vector b and the input-to-hidden weights W are derived via training. The number of hidden layers and the number of hidden nodes per layer are defined before training; they are called hyperparameters in neural networks.

(iii) Y_t is the output at step t, where the bias vector c and the hidden-to-output weights V are derived via training.

(iv) Ŷ_t is our predictive output at time step t, which is a vector of probabilities across our sample space.

(v) The function f(·) is a fixed nonlinear function called the activation function, and the function g(·) is an output function used in the final layer of a neural network. Both functions are hyperparameters, which are defined before training (Section 3.2).

The models are called feedforward because the information flows through the approximated function from the input X_t, through the internal computations that define f(·), and finally to the output Ŷ_t. Besides, there are no feedback connections; namely, no outputs of the model are fed back into itself.
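As a minimal illustration of equation (5) with a single hidden layer (the tanh and softmax choices anticipate the hyperparameters of Section 3.2 and are assumptions of this sketch, not the paper's exact code):

```python
import numpy as np

def fnn_forward(X_t, W, b, V, c):
    """One forward pass of the FNN in equation (5).
    X_t holds the k previous samples; W/b are input-to-hidden weights/bias,
    V/c are hidden-to-output weights/bias."""
    H_t = np.tanh(b + W @ X_t)        # hidden state, f(.) = tanh
    Y_t = c + V @ H_t                 # pre-activation output
    expY = np.exp(Y_t - Y_t.max())    # numerically stable softmax, g(.) = softmax
    return expY / expY.sum()          # Y_hat: probabilities over the sample space
```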

2.3.2. RNNs. If feedback connections are added to the network, it is called an RNN. In particular, the RNN records the information that has been computed so far and uses it for the calculation of the present output. The principle of RNNs is depicted in Figure 2.

For each time step from t = 1 to t = n, the RNN applies the following forward propagation equations:

H_t = f(b + W H_{t−1} + U X_t),
Y_t = c + V H_t,
Ŷ_t = g(Y_t).        (6)

The parameters and functions that govern the computation happening in an RNN are described as follows:

Figure 1: Feedforward neural network (input layer X_t, hidden layer H_t, output layer Y_t).


(i) X_t is the input at time step t and is a one-hot vector. For example, if X_t = 1 and the sample space is S = {0, 1}, then X_t = [0, 1].

(ii) H_t is the hidden state at time step t. It is the "memory" of the network: H_t is calculated based on the previous hidden state H_{t−1} and the input at the current step X_t. b, U, and W denote the bias vector, the input-to-hidden weights, and the hidden-to-hidden connection of the RNN cell, respectively.

(iii) Y_t is the output at step t. c and V denote the bias vector and the hidden-to-output weights, respectively.

(iv) Ŷ_t is our predictive output at time step t, which is a vector of probabilities across our sample space.

(v) Similarly, the function f(·) is an activation function and g(·) is an output function, which are defined before training (Section 3.2).
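Analogously, one recurrent step of equation (6) can be sketched as follows (the relu and softmax choices again anticipate Section 3.2 and are assumptions of this sketch):

```python
import numpy as np

def rnn_step(X_t, H_prev, U, W, b, V, c):
    """One recurrent step of equation (6).
    X_t is the one-hot encoded current input; H_prev is the previous hidden state."""
    H_t = np.maximum(0.0, b + W @ H_prev + U @ X_t)   # f(.) = relu
    Y_t = c + V @ H_t
    expY = np.exp(Y_t - Y_t.max())
    Y_hat = expY / expY.sum()                         # g(.) = softmax over the sample space
    return H_t, Y_hat
```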

3. Predictors for Min-Entropy Estimation Based on Neural Network

The neural network is able to approximate various PDFs, and the complexity of a neural network grows more slowly (linearly) as the sample space increases. Motivated by [24], we propose two predictive models based on neural networks for min-entropy estimation. Next, we present the execution strategy of our min-entropy estimators, provide the choices of the important hyperparameters, and give an analysis of the accuracy and complexity of our predictive models to show that our design is feasible.

3.1. Strategy of Our Predictors for Min-Entropy Estimation. The execution strategy of our min-entropy estimator is depicted in Figure 3, which consists of model training and entropy estimation. Both of our proposed predictive models (namely, predictors), which are based on the FNN and the RNN, respectively, follow the same strategy.

The benefit of this strategy is that it applies not only to stationary sequences generated by entropy sources but also to nonstationary sequences whose probability distribution is time-varying. On the one hand, so that the model can match the statistical behavior of the data source well, our strategy uses the whole input dataset to train and continuously update the model. On the other hand, to effectively estimate the entropy of the data source, we use the predictive model to compute the min-entropy only when the predictive model has been updated enough to characterize the statistical behavior of the tested dataset. Specifically, for the testing dataset, which is used for computing the entropy estimation, we preset the testing dataset as a part of the whole observations and utilize a proportion parameter γ ∈ [0, 1] to determine its size; namely, the last γ proportion of the inputs is used for computing entropy while the model keeps updating.

The workflow of the min-entropy estimator based on the neural network consists of the following steps (a code sketch of this loop is given after the list):

(1) Initialization: choose one model (FNN or RNN) and set the hyperparameters of the model.

(2) Input data: input the tested dataset and judge the proportion of the remaining data in the entire tested dataset. If the proportion of the remaining observations is ≤ γ, record the accuracy of predicting the next sample to be observed; otherwise, continue.

(3) Prediction: predict the current output according to the forward propagation equations.

(4) Comparison and loss function: observe the real output and compare it to the predicted value; then compute the loss function.

(5) Update the predictive model: compute the gradient and update the predictive model. If the entire dataset has run out, turn to Step 6; otherwise, repeat Step 2 to Step 5.

(6) Calculate the predictive accuracy and probability: obtain the accuracy of the executed predictive model on the last γ observations, including global predictions and local predictions, and compute the corresponding probabilities.

(7) Calculate the min-entropy: calculate the min-entropy from the obtained probability. After obtaining the estimations from the two predictors (FNN and RNN), the minimum entropy of the two predictors is taken as the final entropy estimation (namely, the min-entropy) of the tested dataset.
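The following is a minimal sketch of that workflow; `model.predict` and `model.update` are hypothetical stand-ins for the FNN/RNN forward pass and gradient update, and the 99% confidence bound of Section 2.2.3 is omitted for brevity.

```python
import math

def estimate_min_entropy(samples, model, sample_space_size, gamma=0.2):
    """Train on the whole sequence, but only score predictions made on the
    last gamma fraction of the samples (Steps 2-7 of the workflow above)."""
    n = len(samples)
    start_scoring = int(n * (1 - gamma))
    correct = scored = 0
    for t in range(1, n):
        predicted = model.predict(samples[:t])     # Step 3: predict the next sample
        if t >= start_scoring:                     # Steps 2/6: score only the last gamma part
            scored += 1
            correct += int(predicted == samples[t])
        model.update(samples[:t], samples[t])      # Steps 4/5: loss and gradient update
    p = max(correct / scored, 1.0 / sample_space_size)
    return -math.log2(p)                           # Step 7: min-entropy from the success rate
```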

Figure 3: Execution strategy of our predictors for min-entropy estimation based on the neural network (flowchart: initialize the predictive model, input the dataset, predict the next sample, compare and compute the loss function, update the model, record the accuracy over the last γ proportion, and output the min-entropy).

Figure 2: Recurrent neural network (input layer X_t, hidden layer H_t fed by the previous state H_{t−1}, output layer Y_t).


Combining this with the calculation principle of min-entropy in Section 2.1, we can see that a lower bound on the probability of making a correct prediction gives an upper bound on the entropy of the source. In other words, the more predictable a source is, the larger the probability of making correct predictions is, and the less entropy it has. Therefore, a model that is a bad fit for the source, or that is not fully trained, will result in inaccurate predictions, a low prediction probability, and a too-high entropy estimation of the source. So models that fit the source badly or are not fully trained can give big overestimates, but not underestimates.

Further, we can confirm that adding one more predictor does no harm and, conversely, makes the entropy estimation more accurate. From the execution strategy, we can see that if predictors whose models do not match the noise source are used alongside a predictor whose underlying model matches the source's behavior well, then the predictor that matches the source well determines the final entropy estimation.

3.2. Choices of Important Parameters

3.2.1. Hyperparameters for FNN and RNN. In neural networks, the choices of the models' hyperparameters have a significant influence on the computational resources and the performance of training and testing. Therefore, the choices of hyperparameters are crucial to neural networks. Next, we illustrate the choices of some key hyperparameters.

(1) Hidden Layers and Nodes. To comprehensively balance the accuracy and efficiency of our predictors, in this paper, for the FNN model, except for the multivariate M-sequences, we set the number of hidden layers to 2, with 10 and 5 hidden nodes per layer, respectively. For the multivariate M-sequences, extensive tests show that the number of hidden nodes per layer must be larger to give better results; by observing the results, we finally set the numbers to 35 and 30, respectively.

(2) Step of Memory. The step of memory determines the number of previous samples used for predicting the current output. Generally speaking, the larger the value, the better the performance. However, the computational resources (memory and runtime) increase as the step of memory grows. In this paper, we set the step of memory to 20 by trading off performance against resources. That is to say, for the FNN, the input at time step t is the previous 20 observed values, and for the RNN, the hidden layer contains 20 unfolded hidden units.
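For example, with this step of memory, the training pairs for the FNN can be built as fixed-length sliding windows over the sample sequence (a simple sketch):

```python
import numpy as np

def build_windows(seq, k=20):
    """Builds (X_t, y_t) pairs: each input is the k = 20 previous samples
    (the step of memory) and the label is the current sample."""
    X, y = [], []
    for t in range(k, len(seq)):
        X.append(seq[t - k:t])
        y.append(seq[t])
    return np.array(X), np.array(y)
```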

(3) Loss Function. The loss function measures the difference between the predicted values and the true values. The total loss for a given sequence of values x = {x_1, ..., x_n} paired with a sequence of values y = {y_1, ..., y_n} is then just the sum of the losses over all the time steps. For example, if L_t is the negative log-likelihood of y_t given x_1, ..., x_t, then

L({x_1, ..., x_n}, {y_1, ..., y_n}) = Σ_t L_t = −Σ_t log_2 p_model(y_t | x_1, ..., x_t),        (7)

where p_model(y_t | x_1, ..., x_t) is given by reading the entry for y_t from the model's output vector ŷ_t. The models are trained to minimize the cross-entropy between the training data and the models' predictions (i.e., equation (7)), which is equivalent to minimizing the mean squared error (i.e., the average of the squares of the errors or deviations).

(4) Learning Rate. The learning rate is a positive scalar determining the size of the step. To control the effective capacity of the model, we need to set the learning rate within an appropriate range. The learning rate determines how fast the parameter θ moves towards its optimum value. If the learning rate is too large, gradient descent can inadvertently increase rather than decrease the training error; namely, the parameters are likely to overshoot the optimal value. However, if the learning rate is too small, training is not only slower but may become permanently stuck with a high training error. So the learning rate is crucial to the performance of the model.

Based on the above analysis, we pick the learning rate approximately on a logarithmic scale, i.e., from the set {0.1, 0.01, 10^−3, 10^−4, 10^−5}. At the beginning of model training, we set the learning rate to a larger value to reach the optimum faster. Then, as the number of training steps increases, we set smaller values to avoid overshooting the optimal value. The detailed settings are described in Algorithm 1.

(5) Activation Function. In general, we must use a nonlinear function to describe the features. Most neural networks do so using an affine transformation controlled by learned parameters, followed by a fixed nonlinear function called an activation function. The activation function plays an important role in neural networks. The commonly used activation functions include tanh(·), relu(·), and the sigmoid function, which is defined as σ(·) (i.e., equation (8)) in this paper. Because the sigmoid function saturates easily, which causes the gradient to change slowly during training, it is generally no longer used as an activation function except in RNN-LSTM (long short-term memory) networks.

σ(x) = 1 / (1 + e^{−x}).        (8)

After many attempts (i.e., comparing efficiency and performance by exhaustive manual search), we finally choose tanh(·) and relu(·) as the activation functions for the FNN and the RNN, respectively. They can be expressed as

tanh(x) = (1 − e^{−2x}) / (1 + e^{−2x}),        (9)

Compared with σ(·), tanh(·) is symmetric about the origin. In some cases, this symmetry can give better performance: it compresses the real-valued input to the range from −1 to 1, and the mean of its output is zero, which makes it converge faster than σ(·) and reduces the number of iterations. Therefore, it is suitable as an activation function, and the zero-centered training data contributes to the convergence speed of model training.

relu(x) = max(0, x),        (10)

where relu(·) is currently a popular activation function. It is piecewise linear and obtains the activation value with only a single threshold. We choose this function based on the following two considerations. On the one hand, it alleviates the vanishing gradient problem of the back propagation through time (BPTT) algorithm, because the derivative of relu(·) is 1 for positive inputs. On the other hand, it greatly improves the speed of calculation, because it only needs to judge whether the input is greater than 0.

(6) Output Function. The output function is used in the final layer of a neural network model. The predictors for time series are treated as a solution to a multiclass classification problem, so we take softmax(·) as the output function, which can be expressed as

y_i = softmax(z_i) = e^{z_i} / Σ_{j=1}^{s} e^{z_j},        (11)

where s is the size of the sample space, and softmax(z_i) denotes the probability of output z_i and satisfies Σ_{i=1}^{s} y_i = 1, i.e., the sum of the probabilities of all the outputs is equal to 1. Such networks are commonly trained under a cross-entropy regime (i.e., the loss function mentioned above).
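As an illustration only (the paper's exact TensorFlow code is not shown, so the layer, loss, and optimizer choices below are assumptions of this sketch), the FNN configuration described above — two hidden tanh layers with 10 and 5 nodes, a softmax output over the s symbols, and a cross-entropy loss — could be assembled as follows:

```python
import tensorflow as tf

def build_fnn(k=20, s=2):
    """Illustrative FNN with the hyperparameters of Section 3.2.1: input of the
    k previous samples, hidden layers of 10 and 5 tanh units, softmax output
    over the s symbols, trained with a cross-entropy loss."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation='tanh', input_shape=(k,)),
        tf.keras.layers.Dense(5, activation='tanh'),
        tf.keras.layers.Dense(s, activation='softmax'),
    ])
    # Plain SGD is used here for simplicity; the paper schedules the learning
    # rate manually as in Algorithm 1.
    model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy')
    return model
```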

3.2.2. Selection of the Testing Dataset Length. To better estimate the entropy of the data source, the length of the testing dataset is very important for the min-entropy estimation of random numbers generated by different types of sources. In reality, most entropy sources are time-varying (namely, nonstationary), which means the probability distribution of the output sequences from the source changes over time. So the length of the testing dataset shall be adapted to the type of the source.

Therefore, as described in Section 3.1, we utilize γ to determine the size of the testing dataset. Specifically, in our strategy, for a stationary entropy source, whose output probability distribution does not change over time, the parameter γ is preset to 20%. Relatively, for a nonstationary entropy source, all observation points (namely, γ is 100%) need to serve as the testing dataset.

To verify the reasonableness of the γ value, we compute the root-mean-square error (RMSE) of the lowest estimations of our predictors over 80 sequences from the following simulated datasets generated by a nonstationary source:

(i) Time-Varying Normal Distribution Rounded to Integers. The samples are subject to a normal distribution and rounded to integer values, but the mean of the distribution moves along a sine curve to simulate a time-varying signal.

The RMSE, i.e., sqrt((1/N) Σ_{i=1}^{N} (Ĥ_min − H_min)^2), is the arithmetic square root of the mean of the squared errors (deviations) for each class of simulated sources. Note that here N indicates the number of test samples, Ĥ_min indicates the estimated result for each sample, and H_min means the theoretical result for each sample. In other words, the smaller the RMSE is, the closer the estimated result is to the theoretical entropy, which indicates that the predictor has better accuracy.
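In code, this error measure is simply:

```python
import numpy as np

def rmse(estimates, theoretical):
    """RMSE between estimated and theoretical min-entropy over N test samples,
    as used to choose the proportion parameter gamma in Table 1."""
    estimates = np.asarray(estimates, dtype=float)
    theoretical = np.asarray(theoretical, dtype=float)
    return float(np.sqrt(np.mean((estimates - theoretical) ** 2)))
```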

As shown in Table 1, we can see that for the time-varying data source, only when γ is 100% (namely, the entire dataset is used for min-entropy estimation) can the predictors give the most accurate results. This means that when the probability distribution of a data source varies with time, a part of the input dataset cannot represent the overall distribution, so a part of the input dataset cannot accurately yield the estimation result for the entire input dataset. Besides, for stationary sources, it is reasonable to preset γ to 20%, because the estimated results obtained by our method are very close to the correct (theoretical) entropy of the selected entropy sources, as presented in Section 4.1.

3.3. Evaluation of Our Predictors. In this section, we conduct experiments on simulated datasets to verify the accuracy of our proposed predictors for min-entropy estimation and compare the experimental results with the theoretical results. In addition, we give a theoretical analysis of the complexity of our predictors. Note that, in Section 4, we will apply our predictors to different data sources and provide a comparison of our predictors with the 90B's predictors.

3.3.1. Accuracy Verification. We train our predictive models, FNN and RNN, on a number of representative simulated data sources (including stationary and nonstationary entropy sources) whose theoretical entropy can be obtained from the known probability distribution of the outputs. Simulated datasets are produced using the following distribution families adopted in [24].

(1) Simulated Datasets Generated by Stationary Sources.

(i) Discrete Uniform Distribution. The samples are equally likely; they come from an IID source.

(1) if train_num < train_dataset_size/3 then
(2)     learning_rate ← {0.1, 0.01}
(3) else if train_num < train_dataset_size/1.5 then
(4)     learning_rate ← {0.01, 10^−3, 10^−4}
(5) else
(6)     learning_rate ← {10^−4, 10^−5}
(7) end if

Algorithm 1: Setting of the learning rate.
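A direct Python rendering of Algorithm 1 might look as follows (assuming the second threshold is two-thirds of the dataset, i.e., train_dataset_size/1.5): larger rates early on, smaller rates later to avoid overshooting the optimum.

```python
def learning_rate_schedule(train_num, train_dataset_size):
    """Candidate learning rates as a function of training progress (Algorithm 1)."""
    if train_num < train_dataset_size / 3:
        return [0.1, 0.01]
    elif train_num < train_dataset_size / 1.5:
        return [0.01, 1e-3, 1e-4]
    else:
        return [1e-4, 1e-5]
```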


(ii) Discrete Near-Uniform Distribution. All samples are equally likely except one, which has a higher probability than the rest; they come from an IID source.

(iii) Normal Distribution Rounded to Integers. The samples are subject to a normal distribution and rounded to integer values; they come from an IID source.

(iv) Markov Model. The samples are generated using a dth-order Markov model; they come from a non-IID source.

(2) Simulated Datasets Generated by Nonstationary Sources. These datasets are the same as those used in Section 3.2.2.

For every class listed above, we generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy by using the predictive models FNN and RNN, respectively. For each dataset, the theoretical min-entropy H_min is derived from the known probability distribution.
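For instance, one of the stationary sources above, together with its theoretical min-entropy, can be produced as follows (a sketch; the 8-symbol alphabet and p_max = 0.3 are arbitrary illustration values):

```python
import numpy as np

def near_uniform_samples(n, s, p_max):
    """Near-uniform source over s symbols: symbol 0 has probability p_max and
    the remaining symbols share the rest. Theoretical min-entropy is
    -log2(p_max) bits per sample (assuming p_max > 1/s)."""
    probs = [p_max] + [(1 - p_max) / (s - 1)] * (s - 1)
    return np.random.choice(s, size=n, p=probs), -np.log2(p_max)

# Example: 10^6 samples over an 8-symbol alphabet
samples, h_min = near_uniform_samples(10**6, 8, 0.3)
```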

In Figures 4–9, the abscissa represents the theoretical entropy of the test sample, and the ordinate represents the estimated entropy. Figure 4 shows the estimated entropy results for the 80 simulated datasets with uniform and near-uniform distributions, respectively. From Figures 4(a) and 4(b), we see that the estimated results given by our two proposed predictive models (FNN and RNN) are almost consistent with the theoretical entropy for both uniform and near-uniform distributions. So the final estimated result, which is the minimum result of the two predictive models, is also basically consistent with the theoretical entropy. Figure 5 shows the estimated entropy results for the 80 simulated datasets with normal distributions and time-varying normal distributions, respectively. From Figures 5(a) and 5(b), we can see that the estimated results given by our two proposed predictive models are close to the theoretical entropy for normal distributions and time-varying normal distributions. According to our execution strategy, here we calculate the min-entropy estimations using the whole input dataset for the time-varying normal distributions.

Figure 6 shows the estimated results for Markov distributions; we can see that both of our predictive models give a number of overestimates when applied to the Markov sources, particularly as the theoretical entropy increases.

Table 2 shows the relative errors (namely, |(Ĥ_min − H_min)/H_min| × 100%) between the theoretical results and the estimated results of the FNN and the RNN, to further reflect the accuracy of the models; Ĥ_min and H_min have the same meaning as in Section 3.2.2. We see that the entropy is estimated with an error of less than 6.02% for the FNN and 7% for the RNN over the simulated classes, respectively.

Based on the above accuracy verification of our predictors on simulated datasets from different distributions, we can be sure that our predictors give almost accurate results, except for the Markov distributions.

3.3.2. Complexity Analysis. To analyze the usability of our predictors in terms of execution efficiency, we derive the following computational complexity through an analysis of the theory and the principle of the implementation.

We believe that the computational complexity of entropy estimators used for RNG evaluation mainly comes from the sample space and the sample size. For ease of analysis, we define the following parameters: n is the sample size, which indicates the length of the sequence; s is the sample space, which means the number of kinds of symbols in the sample (i.e., s = 8 means there are 8 symbols and the bit width of each symbol is log_2(8) = 3, such as 010, 110, 111, ...); and k denotes the maximum step of correlation, which is set as a constant in the 90B's predictors (k = 16) and in our predictors (k = 20).

Through the analysis of the implementation, the computational complexity of the final 90B's predictors [23] mainly comes from the MultiMMC predictor and is of order O(s^k · n + 2^k · n · log_2(s)), which is mainly linear in n and a k-order polynomial in s. In contrast, the computational complexity of our predictors is of order O(s · n), which is linear in both s and n. It can be seen that the computational complexity of our predictors is much lower than that of the 90B's predictors.

It is important to note that the MultiMMC predictor requires s^k ≪ n; otherwise, this predictor cannot give statistically accurate estimated results. That is to say, as s increases, the MultiMMC predictor requires a larger sample size in order to estimate the entropy accurately.
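As a rough illustration of why this matters in practice (the figures here are only back-of-the-envelope): with the 90B parameter k = 16, even a small sample space of s = 2^2 already gives s^k = 4^16 ≈ 4.3 × 10^9 possible Markov contexts, which exceeds a sample size of n = 10^8 by more than an order of magnitude, whereas the s · n term of our predictors grows only to about 4 × 10^8. This is consistent with the execution-time gap reported later in Section 4.3.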

From the above analysis, we can see that our predictors have lower computational complexity. We give the experimental evidence in Section 4.3.

4. Comparison of Our Predictors with the 90B's Predictors

In this section, a large number of experiments are conducted to evaluate our proposed predictors for entropy estimation in terms of accuracy, applicability, and efficiency, by applying them to different simulated data and real-world data. For these experiments, we compare the results with the final 90B's predictors [23] to highlight the advantages of our work. As with the 90B's predictors, our predictors in these experiments compute an upper bound of the min-entropy estimation at the significance level α = 0.01.

4.1. Comparison on Accuracy

4.1.1. Simulated Data. The simulated datasets are produced using the same distribution families as described in Section 3.3.1. Further, we append the following two new distribution families, a pseudorandom sequence and a postprocessed sequence, which are representative and commonly used in reality.

Table 1: Error measures of the final estimations of our predictors for nonstationary sources with different γ values.

γ       0.1      0.2      0.4      0.6      0.8      1
RMSE    0.0911   0.1364   0.0788   0.0817   0.0219   0.0149


Figure 4: Comparison of the estimated results obtained from our two predictive models (FNN, RNN) with the theoretical entropy; estimated entropy per sample versus theoretical entropy per sample. Estimations for (a) uniform distributions and (b) near-uniform distributions.

Figure 5: Comparison of the estimated results obtained from our two predictive models (FNN, RNN) with the theoretical entropy; estimated entropy per sample versus theoretical entropy per sample. Estimations for (a) normal distributions and (b) time-varying normal distributions.


Figure 6: Comparison of the estimated results obtained from our two predictive models (FNN, RNN) with the theoretical entropy for Markov distributions; estimated entropy per sample versus theoretical entropy per sample.

Figure 7: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy; estimated entropy per sample versus theoretical entropy per sample. Estimations for (a) uniform distributions and (b) near-uniform distributions.


(i) M-Sequence. A maximum-length sequence, which is a type of pseudorandom binary sequence [35]. A generator sketch is given below.

(ii) Nonuniform Distribution Postprocessed Using an LFSR. The samples from an IID source are processed using a linear feedback shift register (LFSR) [35].
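The following sketch shows how such a binary M-sequence can be produced with a Fibonacci-style LFSR. The tap positions and seed below are placeholders chosen for illustration; the sequence reaches its maximal period only if the taps correspond to a primitive feedback polynomial.

```python
def msequence(taps, state, length):
    """Fibonacci LFSR: output the last stage, XOR the tapped stages to form the
    feedback bit, and shift it in at the front."""
    out, state = [], list(state)
    for _ in range(length):
        out.append(state[-1])
        fb = 0
        for t in taps:
            fb ^= state[t]
        state = [fb] + state[:-1]
    return out

# Example: an 8-stage LFSR with a nonzero seed (taps are illustrative)
bits = msequence([7, 5, 4, 3], [1, 0, 0, 1, 0, 1, 1, 0], 255)
```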

For each distribution mentioned above, we also generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy by using our proposed predictors and the final 90B's predictors [23].

Figure 7 shows the estimated min-entropy results for the 80 simulated datasets with uniform distributions and near-uniform distributions, respectively. From Figures 7(a) and 7(b), we see that several of the results obtained from the 90B's predictors are apparently underestimated, which may result from overfitting. Compared with the 90B's predictors, our predictors provide more accurate results.

Figure 8 shows the estimated min-entropy results for normal distributions and time-varying normal distributions, respectively. From Figures 8(a) and 8(b), we can see that the estimated results given by our predictors are close to the theoretical entropy for normal distributions and time-varying normal distributions. However, the lowest entropy estimation results obtained from the 90B's predictors give significant underestimates.

Figure 9 shows the estimated min-entropy results for Markov distributions. We can see that the 90B's predictors almost always give underestimates compared with the theoretical entropy, while the estimated results given by our predictors are much closer to the theoretical entropy.

To compare the accuracy of our predictors and the 90B's predictors more clearly, we apply the predictors to the M-sequence and to the nonuniform distribution sequence postprocessed using an LFSR, whose theoretical entropy is a known and fixed value.

It is further confirmed that higher-stage (the maximum step of correlation) M-sequences and nonuniform distribution sequences postprocessed using an LFSR are able to pass the NIST SP 800-22 statistical tests [8]. The estimated results are listed in Tables 3 and 4, and the lowest entropy estimations from the 90B's predictors and from our predictors for each stage are shown in bold font.

For the M-sequence and the nonuniform distribution postprocessed using an LFSR, the MultiMMC predictor presented in the final 90B gives the most accurate entropy estimation results for stages ≤ 16. However, when the stage of the M-sequence or of the nonuniform distribution postprocessed using an LFSR is greater than 16, the MultiMMC predictor cannot give an accurate entropy estimation result, because this predictor is parameterized by k ∈ {1, 2, ..., 16} (k is the maximum step of correlation). Perhaps we could set the

Figure 8: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy; estimated entropy per sample versus theoretical entropy per sample. Estimations for (a) normal distributions and (b) time-varying normal distributions.


parameter of the MultiMMC predictor to a greater range to achieve a more accurate estimated result for higher stages, but the time complexity grows exponentially with the parameter k, as analyzed in Section 3.3.2. Moreover, the FNN model can still give accurate estimated results even when the stage of the M-sequence or LFSR is greater than 16, whereas the RNN model gives accurate estimated results only when the stage is 8. Therefore, the FNN model matches M-sequences and nonuniform distributions postprocessed using an LFSR better than the RNN.

We also compute the relative errors of the estimated results from the 90B's predictors and our predictors over 80 sequences from each class of simulated sources. We calculate the relative errors using the min-entropy obtained from the 90B's predictors (the lowest estimation result of the 90B's four predictors) and from our predictors (the lowest estimation result of the FNN and the RNN), respectively. As illustrated in Table 5, for all five classes of simulated sources, the errors of our predictors are lower than those of the 90B's predictors. Specially, our approaches estimate the entropy with an error of less than 6.55%, whereas it is up to 14.65% for the 90B's predictors. Overall, this indicates that our proposed predictors perform better than the 90B's predictors in terms of accuracy for both stationary and nonstationary sequences, which is consistent with the conclusions drawn from the figures above.

From Tables 2–4, we also find that the accuracy of the RNN predictive model is slightly higher than that of the FNN predictive model, except for the cases of the Markov sources, the M-sequence, and the nonuniform distribution postprocessed using an LFSR.

We will further verify the applicability to time-varying sources in Section 4.2. Therefore, through the evaluation of the entropy estimation results on the above simulated datasets, we see that our proposed predictors are superior in accuracy compared with the 90B's predictors.

4.1.2. Real-World Data. We further apply our predictors to datasets generated by RNGs deployed in the real world. In fact, the theoretical entropy per sample is unknown for these real-world sources, so no error can be computed as for the simulated datasets. However, the estimated results from the predictors presented here can still be compared to the 90B's predictors, based on the knowledge that underestimates from the predictors have theoretical bounds.

Datasets of real-world data are produced using the following approaches. The first two are adopted in [24], and the others are commonly used typical RNGs. The estimations for the real-world sources are presented in Table 6.

(i) RANDOM.ORG. This is a service that provides random numbers based on atmospheric noise and is used in [24]. It allows the user to specify the

Figure 9: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy for Markov distributions; estimated entropy per sample versus theoretical entropy per sample.

Table 2: Relative errors of the FNN and RNN estimation results.

Simulated data class    FNN (%)   RNN (%)
Uniform                 1.65      1.6
Near-uniform            1.60      1.52
Normal                  1.17      1.08
Time-varying normal     2.12      1.84
Markov                  6.02      7

Table 3: Estimated results for the M-sequence (H_min = 0.000).

Stage       8       10      12      14      16      18       20
MultiMCW    0.991   0.996   0.988   0.989   0.993   0.999*   1.000
Lag         1.000   1.000   1.000   1.000   1.000   1.000    1.000
MultiMMC    0.000   0.000   0.000   0.000   0.000   1.000    1.000
LZ78Y       1.000   1.000   1.000   1.000   1.000   1.000    0.997*
FNN         0.000   0.000   0.000   0.000   0.000   0.000    0.000
RNN         0.000   1.048   1.007   1.002   0.996   0.9920   0.9997

Table 4: Estimated results for the nonuniform distribution postprocessed using an LFSR (H_min = 0.152).

Stage       8       10      12      14      16      18       20
MultiMCW    0.440   0.595   0.743   0.721   0.998   0.994    0.998
Lag         0.581   0.581   0.680   0.680   0.992   0.994    0.999
MultiMMC    0.151   0.153   0.158   0.181   0.234   0.995    0.996
LZ78Y       0.567   0.995   0.766   0.679   0.997   0.996*   0.994*
FNN         0.151   0.145   0.149   0.147   0.149   0.142    0.144
RNN         0.149   0.947   1.012   0.998   1.012   0.997    0.985


minimum and maximum values that are output. The sequence used here consists of bits.

(ii) Ubld.it TrueRNGpro. The TrueRNGpro is a USB random number generator produced by Ubld.it that provides a steady stream of random numbers through a USB CDC serial port. This entropy source is also used in [24]. The sequence used here consists of bits.

(iii) Linux Kernel Entropy Source. The Linux kernel random generator is used for the generation of a real-world sequence without any processing. The sequence used here is the last bit of each symbol.

(iv) Linux /dev/urandom. The /dev/urandom device [6] of Linux is used for the generation of a real-world sequence with strict processing. The sequence used here consists of bits.

(v) Windows RNG. The Windows RNG [5] is used for the generation of a real-world sequence by calling a Crypto API. The sequence used here consists of bits.

As illustrated in Table 6, the lowest entropy estimation for each source is shown in bold font. We see that our predictors perform better than the 90B's predictors, because the lowest entropy estimation is always obtained by our work for each real-world source. Furthermore, for the Linux kernel entropy source, we find that both the Lag predictor and the MultiMMC predictor are able to give lower estimation results. This indicates that the Linux kernel entropy source has periodicity and conforms to a Markov model, which is well understood because the randomness of the Linux kernel entropy source comes from human behaviors, such as manipulating the mouse and keyboard. In our work, compared with the entropy estimations for the other real-world sources, the FNN fits the Linux kernel entropy source much better than the RNN, which is consistent with the previous observation that the FNN performs well in testing Markov sources.

4.2. Comparison on the Scope of Applicability. After evaluating the accuracy, we further validate the scope of applicability of our proposed predictors and compare it with that of the 90B's predictors. Kelsey et al. [24] stated that each of the 90B's predictors performs well only for a special distribution, as described in Section 2.2.1. To show that our predictors have better applicability, the following four simulated datasets are generated, each of which is suitable for one predictor employed in the final 90B.

(i) Time-Varying Sources. The probability distribution of the data source varies with time. The MCW predictor predicts the current output according to the previous outputs in a short period of time, and thus the MCW predictor performs well on these data sources.

(ii) Periodic Sources. The data source changes periodically. The lag predictor predicts the value that occurred a fixed number of samples back in the sequence as the current output, and thus the lag predictor performs well on sources with strong periodic behavior.

(iii) Markov Sources. The data sources can be modeled by a Markov model. The MultiMMC predictor predicts the current output according to a Markov model, and thus the MultiMMC predictor performs well on data from any process that can be accurately modeled by a Markov model.

(iv) LZ78Y Sources. The data sources can be efficiently compressed by LZ78-like compression algorithms, which suits the LZ78Y predictor well.

For each simulated source above, we generate a set of 10 simulated datasets, each of which contains 10^6 samples, and the min-entropy is estimated by our predictors and the 90B's predictors. The final result for a predictor is the average of the 10 estimated results corresponding to the 10 simulated datasets of one simulated source.

4.2.1. Time-Varying Sources. Firstly, we generate time-varying binary data that suit the statistical behavior of the MCW predictor presented in the 90B. Table 7 shows the entropy estimation results for the time-varying data.

As shown in Table 7, the symbol gradual(x) (x ∈ [0, 1], the same below) denotes a simulated source in which the probability of outputting "0" changes gradually from x to 1 − x over time. The symbol period(x) denotes a simulated source in which the probability of outputting "0" changes periodically with time, varying from x to 1 − x within one period; the period length is set to 20% of the entire input dataset. The symbol sudden(x) denotes a simulated source in which the probability of outputting "0" changes suddenly: the probability is set to x for the first half of the input dataset and to 1 − x for the last half.
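A possible way to generate these three kinds of time-varying binary data is sketched below; the exact shape of the gradual ramp and of the within-period oscillation is not specified in the text, so the linear and cosine profiles here are assumptions of this sketch.

```python
import numpy as np

def gradual(n, x):
    """gradual(x): probability of '0' moves linearly from x to 1-x over the sequence."""
    p0 = np.linspace(x, 1 - x, n)
    return (np.random.random(n) >= p0).astype(int)

def sudden(n, x):
    """sudden(x): probability of '0' is x for the first half and 1-x for the second half."""
    p0 = np.where(np.arange(n) < n // 2, x, 1 - x)
    return (np.random.random(n) >= p0).astype(int)

def period(n, x, period_len):
    """period(x): probability of '0' oscillates between x and 1-x within each period."""
    phase = (np.arange(n) % period_len) / period_len
    p0 = x + (1 - 2 * x) * 0.5 * (1 - np.cos(2 * np.pi * phase))
    return (np.random.random(n) >= p0).astype(int)
```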

In Table 7, the estimation results for the MCW predictor and for our work are shown in bold font. We see that the MCW predictor gives the lowest and most accurate entropy estimations for the three types of time-varying data mentioned above, though it gives slight underestimates for gradual(0.2) and period(0.2). This confirms that the time-varying sources mentioned above match the statistical behavior of the MCW predictor. Relatively, we find that our proposed predictive models are all capable of obtaining satisfactory entropy estimations that are close to the correct values. Therefore, our proposed predictive models are suitable for the time-varying data mentioned above. Note that we calculate the min-entropy estimate from the entire dataset rather than from the last 20% of the input dataset for these time-varying sources, because the probability distribution varies with time and a part of the input dataset cannot represent its overall distribution.

Table 5: Relative errors of the final estimations of the 90B's predictors and our predictors for five classes of simulated sources.

Simulated data class    90B's predictors (%)   Our predictors (%)
Uniform                 4.37                   1.53
Near-uniform            3.47                   1.59
Normal                  6.08                   1.57
Time-varying normal     3.47                   1.72
Markov                  14.65                  6.55


4.2.2. Periodic Sources. Secondly, we generate periodic data that suit the statistical behavior of the lag predictor presented in the 90B. Table 8 gives the entropy estimation results for the periodic sequences. The data source completely obeys the periodic rule, so the correct entropy is zero. The bit width of the samples is traversed from 2 to 8.

As shown in Table 8, the estimation results for the lag predictor and for our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated periodic sources, we confirm that the lag predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimates, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the entropy estimation of (strongly) periodic data. In addition, the MultiMMC predictor can also give accurate min-entropy estimations, which is reasonable because periodicity is also a form of correlation.

4.2.3. Markov Sources. Next, we generate multivariate M-sequences as Markov sources, which fit the statistical behavior of the MultiMMC predictor. Specifically, the multivariate M-sequences are composed of multiple M-sequences with different initial states. Due to the determinacy of this type of sequence, the correct entropy is zero. The bit width of the samples is also traversed from 2 to 8. The maximum step of correlation used here is set to 8. Table 9 shows the estimated results for the multivariate M-sequences.

As shown in Table 9, the estimation results for the MultiMMC predictor and for our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated Markov sources, we confirm that the MultiMMC predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimations, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the Markov sources.

424 LZ78Y Sources Finally we verify the applicability ofthe LZ78Y sources is type of entropy source is difficult togenerate by simulating However we can still draw theconclusion that our proposed predictive models can beapplied to the LZ78Y sources according to Tables 8 and 9 initalic font Because the periodic data and Markov sequencesare compressible

425 Summary on Applicability Scope of Our PredictorsBy analyzing the experimental results of the above fourspecific simulated sources each of which is oriented towardsa certain predictor in the 90B we have a conclusion that ourpredictors can provide accurate estimated results of entropySo the proposed predictors are well applied to these entropysources as well as the 90Brsquos predictors In addition com-pared with 90Brsquos predictors our predictors have a betterperformance on the scope of applicability for testing the

Table 7: Entropy estimates for time-varying data (MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work).

Data class     Correct   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
gradual(0.2)   0.6345    0.5290   0.7808   0.7240     0.7790   0.6288   0.6289
gradual(0.3)   0.7437    0.7378   0.9221   0.8416     0.9243   0.7430   0.7460
gradual(0.4)   0.8645    0.8631   0.9786   0.9518     0.9739   0.8648   0.8637
period(0.2)    0.6345    0.5537   0.7428   0.5537     0.7669   0.6205   0.6209
period(0.3)    0.7437    0.7393   0.9218   0.8476     0.9233   0.7377   0.7375
period(0.4)    0.8645    0.8639   0.9767   0.9632     0.9796   0.8653   0.8632
sudden(0.2)    0.3219    0.3203   0.4663   0.3386     0.4484   0.3217   0.3229
sudden(0.3)    0.5146    0.5110   0.5857   0.9984     0.7663   0.5110   0.5119
sudden(0.4)    0.7370    0.7338   0.8699   0.9984     0.9389   0.7339   0.7345

Table 8: Entropy estimates for periodic sequences.

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.6458   0.0000   0.0000     1.1817   0.0079   0.0000
3           2.3318   0.0000   0.0000     1.5957   0.1315   0.0000
4           2.9147   0.0000   0.0000     1.8016   0.4748   0.0000
5           3.3269   0.0000   0.0000     1.4586   0.8898   0.0000
6           3.9092   0.0000   0.0000     0.8322   3.4944   0.0000
7           4.4908   0.0000   0.0000     0.3973   3.4960   0.0000
8           4.4919   0.0000   0.0000     0.2027   3.5408   0.0000
*The results in italic font are used to analyze the applicability for the LZ78Y sources.

Table 6: Entropy estimates for real-world sources.

Real-world source             MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
RANDOM.ORG                    0.9951   0.9963   0.9966     0.9976   0.9802   0.9954
Ubld.it TrueRNGpro            0.9979   0.9955   0.9973     0.9966   0.9934   0.9728
Linux kernel entropy source   0.6173   0.1232   0.1269     0.6164   0.1230   0.3068
Linux /dev/urandom            0.9952   0.9935   0.9990     0.9964   0.9983   0.9911
Windows RNG                   0.9953   0.9986   0.9975     0.9984   0.9833   0.9853



4.3. Comparison on Execution Efficiency. We implement our predictors and the final 90B's predictors using Python 3.6, and the version of TensorFlow is 1.3.11. All the following tests are conducted on a computer with an Intel Core i7 CPU and 32 GB RAM.

Table 10 shows the mean execution time of our predictors in comparison with that of the final 90B's predictors and the second draft of 90B's predictors. Each experimental result in Table 10 is the average value obtained from 50 repeated experiments. Note that the definitions of the parameters n, s, and k are the same as in Section 3.3.2.

From the listed mean execution times with different scales (n, s) in Table 10, it can be seen that when n = 10^6, the mean execution time of our predictors is much lower, and increases more slowly with any s, than that of the final 90B's predictors. In other words, the average execution efficiency of our predictors is about 7 to 10 times higher than that of the final 90B's predictors for different sample spaces s when the sample size n is 10^6. In particular, when n = 10^8, the mean execution time of the final 90B's predictors is far greater than that of our predictors regardless of the size of the sample space, and is too long (over three days) to compute the estimated results in the case s ≥ 2^2.

In terms of the execution efficiency of the 90B's predictors, we also find that the mean execution time of the final 90B's predictors is much higher than that of the second draft of 90B's predictors. Actually, the final 90B's mean execution time is about twice that of the second draft of 90B's. This could be caused by the characteristics of some estimators, which are limited to binary inputs: the collision estimator, Markov estimator, and compression estimator are only suitable for binary input (0 or 1), as stated in [23]. So for nonbinary inputs, the 90B's estimators not only calculate the original symbol entropy but also convert the input into binary to calculate the bit entropy, and finally obtain the min-entropy. This greatly increases the mean execution time.

4.4. General Discussion. For most of the entropy sources that have been tested, the RNN gives more accurate estimations than the FNN. The better accuracy of the RNN predictive model may be due to the following reasons. On the one hand, the RNN adds feedback connections to the network, i.e., it considers not only the relationship between the current output and the previous observations but also the relationship among the previous observations. On the other hand, the RNN one-hot-encodes the training dataset, which is better for forecasting categorical data. On the contrary, for the Markov sources, the M-sequence, and the nonuniform distribution postprocessed using an LFSR, the current output is only related to the previous observations, which fits the FNN predictive model well, and thus the FNN provides more accurate estimated results.

5. Conclusions and Future Work

Entropy estimation provides a crucial evaluation of the security of RNGs. The predictor serves as a universal sanity check for entropy estimation. In this work, we provide several new approaches to estimate the min-entropy for entropy sources using predictors based on neural networks (i.e., FNN and RNN) for the first time. In particular, we design a novel scheme for the proposed entropy estimation based on neural network models, including the execution strategy and parameter settings. In order to evaluate the quality of the proposed predictors, we collect various types of simulated sources, either stationary or nonstationary, whose correct entropy can be derived from the known probability distribution, and the theoretical result is further verified by experiments on real-world sources. We also compare our method with the predictors defined in NIST SP 800-90B (published in 2018), which is a commonly used standard for evaluating the validation of entropy sources. Our assessment experiments are carried out in three aspects, namely, accuracy, scope of applicability, and computational complexity. The experimental results demonstrate that the entropy estimations obtained from our proposed predictors are more accurate than those of the 90B's predictors, and our predictors have a remarkably wider scope of applicability. In addition, the computational complexity of ours is obviously lower than that of the 90B's as the sample space and sample size grow, according to the theoretical analysis. The average execution efficiency of our predictors is about 7 to 10 times higher than that of the 90B's predictors for different sample spaces when the sample size is 10^6. Specially, the 90B's predictors cannot calculate a result, due to the huge time complexity, when the sample space s is over 2^2 with the parameter of maximum step k = 16 and sample size n = 10^8; relatively, our method is able to provide a satisfactory result for entropy sources with large sample space and long dependence.

Table 9: Entropy estimates for multivariate M-sequences.

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.9010   2.0000   0.0000     2.0000   0.0005   0.0000
3           2.9906   3.0000   0.0000     2.4940   0.0000   0.0000
4           3.4037   4.0000   0.0000     4.0000   0.0021   0.0000
5           4.9753   5.0000   0.0000     1.2269   0.0041   0.0000
6           5.3916   6.0000   0.0000     1.2905   0.0394   0.0000
7           5.3916   6.0000   0.0000     1.9881   0.0280   0.0000
8           7.0000   7.0000   0.0000     0.6611   0.8635   0.0000
*The results in italic font are used to analyze the applicability for the LZ78Y sources.

Table 10: Comparison on execution efficiency of min-entropy estimation of our study and 90B's predictors.

{n, s}        Final 90B (s)   Old 90B (s)   Our predictors (s)
{10^6, 2^1}   921             561           136
{10^6, 2^2}   1,058           525           138
{10^6, 2^3}   1,109           574           149
{10^6, 2^4}   1,235           598           174
{10^6, 2^5}   1,394           630           190
{10^6, 2^6}   1,683           785           186
{10^6, 2^7}   2,077           938           264
{10^6, 2^8}   2,618           1,298         272
{10^8, 2^1}   52,274          47,936        9,184
{10^8, 2^2}   —               —             9,309
{10^8, 2^3}   —               —             9,385
{10^8, 2^4}   —               —             9,836
{10^8, 2^5}   —               —             10,986
{10^8, 2^6}   —               —             13,303
{10^8, 2^7}   —               —             17,649
{10^8, 2^8}   —               —             20,759



Future work aims at designing specific neural network predictive models for min-entropy estimation for particular entropy sources. Our future work will also focus on applying this new method to estimate entropy in more application areas, such as the randomness sources (sensors and other sources) in mobile terminals.

Data Availability

RANDOM.ORG data used to support the findings of this study can be accessed from https://www.random.org. Ubld.it TrueRNGpro, Linux kernel entropy source, Linux /dev/urandom, and Windows RNG data used to support the findings of this study can be obtained from the relevant listed references.

Disclosure

A preliminary version of this paper appeared under the title "Neural Network Based Min-Entropy Estimation for Random Number Generators" in Proc. Security and Privacy in Communication Networks, 14th EAI International Conference, SecureComm 2018, Singapore, August 8–10, 2018 [36]. Dr. Jing Yang participated in this work when she studied in the Chinese Academy of Sciences, and she now works in the China Information Technology Security Evaluation Center, Beijing, China.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2018YFB0804300), the National Natural Science Foundation of China (Nos. 61872357 and 61802396), and the National Cryptography Development Fund of China (No. MMJJ20180113).

References

[1] I. Kanter, Y. Aviad, I. Reidler, E. Cohen, and M. Rosenbluh, "An optical ultrafast random bit generator," Nature Photonics, vol. 4, no. 1, pp. 58–61, 2010.
[2] P. Li, A. Wang, Y. Guo et al., "Ultrafast fully photonic random bit generator," Journal of Lightwave Technology, vol. 36, no. 12, pp. 2531–2540, 2018.
[3] P. Li, K. Li, X. Guo et al., "Parallel optical random bit generator," Optics Letters, vol. 44, no. 10, pp. 2446–2449, 2019.
[4] A. Uchida, K. Amano, M. Inoue et al., "Fast physical random bit generation with chaotic semiconductor lasers," Nature Photonics, vol. 2, no. 12, pp. 728–732, 2008.
[5] L. Dorrendorf, Z. Gutterman, and B. Pinkas, "Cryptanalysis of the random number generator of the Windows operating system," ACM Transactions on Information and System Security, vol. 13, no. 1, pp. 1–32, 2009.
[6] Z. Gutterman, B. Pinkas, and T. Reinman, "Analysis of the Linux random number generator," in Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P 2006), pp. 371–385, Berkeley, CA, USA, May 2006.
[7] M. Vanhoef and F. Piessens, "Predicting, decrypting, and abusing WPA2/802.11 group keys," in Proceedings of the 25th USENIX Security Symposium, pp. 673–688, Austin, TX, USA, August 2016.
[8] A. L. Rukhin, J. Soto, J. R. Nechvatal et al., SP 800-22 Rev. 1a: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, NIST Special Publication, Gaithersburg, MD, USA, 2010.
[9] W. Killmann and W. Schindler, AIS 31: Functionality Classes and Evaluation Methodology for True (Physical) Random Number Generators, Version 3.1, T-Systems GEI GmbH and Bundesamt für Sicherheit in der Informationstechnik (BSI), Bonn, Germany, 2001.
[10] G. Marsaglia, "The Marsaglia random number CDROM including the diehard battery of tests of randomness," 1996, http://www.stat.fsu.edu/pub/diehard.
[11] P. L'Ecuyer and R. J. Simard, "TestU01: a C library for empirical testing of random number generators," ACM Transactions on Mathematical Software, vol. 33, no. 4, 2007.
[12] ISO/IEC JTC 1/SC 27, ISO/IEC 18031: Information Technology–Security Techniques–Random Bit Generation, Berlin, Germany, 2011.
[13] E. Barker and J. Kelsey, "NIST draft special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2012, http://csrc.nist.gov/publications/drafts/800-90/draft-sp800-90b.pdf.
[14] M. Baudet, D. Lubicz, J. Micolod, and A. Tassiaux, "On the security of oscillator-based random number generators," Journal of Cryptology, vol. 24, no. 2, pp. 398–425, 2011.
[15] W. Killmann and W. Schindler, "A design for a physical RNG with robust entropy estimators," in Proceedings of the 10th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2008, pp. 146–163, Washington, DC, USA, August 2008.
[16] Y. Ma, J. Lin, T. Chen, C. Xu, Z. Liu, and J. Jing, "Entropy evaluation for oscillator-based true random number generators," in Proceedings of the 16th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2014, pp. 544–561, Busan, South Korea, September 2014.
[17] Y. Ma, J. Lin, and J. Jing, "On the entropy of oscillator-based true random number generators," in Proceedings of the Cryptographers' Track at the RSA Conference, pp. 165–180, Springer, San Francisco, CA, USA, February 2017.
[18] P. Li, J. Zhang, L. Sang et al., "Real-time online photonic random number generation," Optics Letters, vol. 42, no. 14, pp. 2699–2702, 2017.
[19] X. Ma, F. Xu, H. Xu, X. Tan, B. Qi, and H. K. Lo, "Postprocessing for quantum random-number generators: entropy evaluation and randomness extraction," Physical Review A, vol. 87, no. 6, pp. 062327-1–062327-10, 2013.
[20] K. Ugajin, Y. Terashima, K. Iwakawa et al., "Real-time fast physical random number generator with a photonic integrated circuit," Optics Express, vol. 25, no. 6, pp. 6511–6523, 2017.
[21] F. Xu, B. Qi, X. Ma, H. Xu, H. Zheng, and H.-K. Lo, "Ultrafast quantum random number generation based on quantum phase fluctuations," Optics Express, vol. 20, no. 11, pp. 12366–12377, 2012.
[22] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "(Second draft) NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2016, https://csrc.nist.gov/CSRC/media/Publications/sp/800-90b/draft/documents/sp800-90b_second_draft.pdf.
[23] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2018, http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90B.pdf.
[24] J. Kelsey, K. A. McKay, and M. S. Turan, "Predictive models for min-entropy estimation," in Proceedings of the 17th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2015, pp. 373–392, Saint-Malo, France, September 2015.
[25] S. Aras and I. D. Kocakoç, "A new model selection strategy in time series forecasting with artificial neural networks: IHTS," Neurocomputing, vol. 174, pp. 974–987, 2016.
[26] J. P. Donate, X. Li, G. G. Sanchez, and A. S. de Miguel, "Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm," Neural Computing and Applications, vol. 22, no. 1, pp. 11–20, 2013.
[27] J. C. Luna-Sanchez, E. Gomez-Ramirez, K. Najim, and E. Ikonen, "Forecasting time series with a logarithmic model for the polynomial artificial neural networks," in Proceedings of the 2011 International Joint Conference on Neural Networks, IJCNN 2011, pp. 2725–2732, San Jose, CA, USA, 2011.
[28] C. de Groot and D. Wurtz, "Analysis of univariate time series with connectionist nets: a case study of two classical examples," Neurocomputing, vol. 3, no. 4, pp. 177–192, 1991.
[29] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[30] X. Cai, N. Zhang, G. K. Venayagamoorthy, and D. C. Wunsch II, "Time series prediction with recurrent neural networks trained by a hybrid PSO-EA algorithm," Neurocomputing, vol. 70, no. 13–15, pp. 2342–2353, 2007.
[31] A. Jain and A. M. Kumar, "Hybrid neural network models for hydrologic time series forecasting," Applied Soft Computing, vol. 7, no. 2, pp. 585–592, 2007.
[32] J. M. P. Menezes Jr. and G. A. Barreto, "Long-term time series prediction with the NARX network: an empirical evaluation," Neurocomputing, vol. 71, no. 16–18, pp. 3335–3343, 2008.
[33] P. Hagerty and T. Draper, "Entropy bounds and statistical tests," 2012, https://csrc.nist.gov/csrc/media/events/random-bit-generation-workshop-2012/documents/hagerty_entropy_paper.pdf.
[34] S. Zhu, Y. Ma, T. Chen, J. Lin, and J. Jing, "Analysis and improvement of entropy estimators in NIST SP 800-90B for non-IID entropy sources," IACR Transactions on Symmetric Cryptology, no. 3, pp. 151–168, 2017.
[35] A. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, USA, 1996.
[36] J. Yang, S. Zhu, T. Chen, Y. Ma, N. Lv, and J. Lin, "Neural network based min-entropy estimation for random number generators," in Proceedings of the 14th International Conference on Security and Privacy in Communication Networks—SecureComm, pp. 231–250, Singapore, August 2018.



(i) Xt is the input at time step t and is a one-hot vector. For example, if Xt = 1 and the sample space S = {0, 1}, then Xt = [0, 1].
(ii) Ht is the hidden state at time step t. It is the "memory" of the network. Ht is calculated based on the previous hidden state Ht−1 and the input at the current step Xt. b, U, and W denote the bias vector, the input-to-hidden weights, and the hidden-to-hidden connection of the RNN cell, respectively.
(iii) Yt is the output at step t. c and V denote the bias vector and the hidden-to-output weights, respectively.
(iv) Ŷt is our predictive output at time step t, which is a vector of probabilities across our sample space.
(v) Similarly, the function f(·) is an activation function and g(·) is an output function, which are defined before training (Section 3.2). (A small numerical sketch of one such step is given below.)
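As a purely illustrative aid, one recurrent step using these symbols can be written in NumPy as follows; taking f = tanh and g = softmax here is an assumption for concreteness, not a statement of the paper's exact implementation.

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, V, b, c):
    """One recurrent step: Ht = f(U·Xt + W·Ht-1 + b), Ŷt = g(V·Ht + c)."""
    h_t = np.tanh(U @ x_t + W @ h_prev + b)   # new hidden state ("memory")
    z = V @ h_t + c
    e = np.exp(z - z.max())                   # softmax over the sample space
    y_hat = e / e.sum()
    return h_t, y_hat
```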

3. Predictors for Min-Entropy Estimation Based on Neural Network

A neural network is able to approximate various PDFs, and the complexity of a neural network increases more slowly (a linear relationship) as the sample space increases. Motivated by [24], we propose two predictive models based on neural networks for min-entropy estimation. Next, we present the execution strategy of our min-entropy estimators, provide the choices of the important hyperparameters, and give an analysis of the accuracy and complexity of our predictive models to prove that our design is feasible.

3.1. Strategy of Our Predictors for Min-Entropy Estimation. The execution strategy of our min-entropy estimator is depicted in Figure 3, which consists of model training and entropy estimation. Both of our proposed predictive models (namely, predictors), which are based on FNN and RNN, respectively, follow the same strategy.

The benefit of this strategy is that it applies not only to stationary sequences generated by entropy sources but also to nonstationary sequences, of which the probability distribution is time-varying. On the one hand, in order that the model can match the statistical behavior of the data source well, we use the whole input dataset to train and continuously update the model. On the other hand, to effectively estimate the entropy of the data source, we use the predictive model to compute the min-entropy only when the predictive model has been updated enough to characterize the statistical behavior of the tested dataset. Specifically, for the testing dataset, which is used for computing the entropy estimation, we preset the testing dataset as a part of the whole observations and utilize a proportion parameter (γ ∈ [0, 1]) to determine the size of the testing dataset; namely, the last γ of the inputs are used for computing entropy while the model is still updating.

The workflow of the min-entropy estimator based on neural networks consists of the following steps:

(1) Initialization: choose one model (FNN or RNN) and set the hyperparameters of the model.
(2) Input data: input the tested dataset and judge the proportion of the remaining data in the entire tested dataset. If the proportion of the remaining observations ≤ γ, record the accuracy for predicting the next sample to be observed; else, continue.
(3) Prediction: predict the current output according to the forward propagation equations.
(4) Comparison and loss function: observe the real output and compare it to the predicted value; then compute the loss function.
(5) Update predictive model: compute the gradient and update the predictive model. If the entire dataset runs out, turn to Step 6; else, repeat Step 2 ~ Step 5.
(6) Calculate the predictive accuracy and probability: obtain the accuracy of the executed predictive model from the last γ observations, including global predictions and local predictions, and compute the probability, respectively.
(7) Calculate the min-entropy: calculate the min-entropy from the obtained probability. After obtaining the estimations from the two predictors (FNN and RNN), respectively, the minimum entropy of the two predictors is taken as the final entropy estimation (namely, min-entropy) of the tested dataset. (A simplified code sketch of this loop is given below.)
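To make the workflow concrete, here is a heavily simplified Python sketch of the train-as-you-go loop. The `model` object with hypothetical `predict_proba`/`update` methods, the window length, and the confidence-bound constant are assumptions, and only the global prediction probability is shown (the paper also uses a local statistic).

```python
import numpy as np

def estimate_min_entropy(samples, model, window=20, gamma=0.2):
    """Sketch of the predictor strategy: train on-line, score the last gamma fraction."""
    n = len(samples)
    start_scoring = int(n * (1.0 - gamma))    # samples after this index are scored
    correct, scored = 0, 0
    for t in range(window, n):
        history = samples[t - window:t]        # previous `window` observations
        probs = model.predict_proba(history)   # hypothetical: distribution over symbols
        if t >= start_scoring:
            scored += 1
            correct += int(np.argmax(probs) == samples[t])
        model.update(history, samples[t])      # hypothetical: one training step
    p_hat = correct / max(scored, 1)
    # upper confidence bound on the prediction probability at alpha = 0.01 (assumed form)
    p_upper = min(1.0, p_hat + 2.576 * np.sqrt(p_hat * (1 - p_hat) / max(scored, 1)))
    return -np.log2(max(p_upper, 1e-6))        # min-entropy per sample
```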

[Figure 3 (flowchart omitted): Execution strategy of our predictors for min-entropy estimation based on neural network. Steps: initialize the predictive model → input the dataset to be observed → predict the next sample → compare and compute the loss function → update the predictive model → record the accuracy once the unobserved proportion ≤ γ → when all samples run out, calculate the prediction probability → output the min-entropy.]

[Figure 2 (diagram omitted): Recurrent neural network. Input layer Xt, hidden layer Ht (fed back from Ht−1), output layer Yt.]


Combining the calculation principle of min-entropy in Section 2.1, we can see that a lower bound on the probability of making a correct prediction gives an upper bound on the entropy of the source. In other words, the more predictable a source is, the larger the probability of making correct predictions is, and the less entropy it has. Therefore, a model that is a bad fit for the source, or not fully trained, will result in inaccurate predictions, a low prediction probability, and a too-high entropy estimate of the source. So models that are a bad fit for the source or not fully trained can give large overestimates, but not underestimates.
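For instance, assuming the standard min-entropy relation H_min = −log2(p): for a 4-symbol source, a predictor that guesses correctly 50% of the time gives −log2(0.5) = 1 bit of min-entropy per sample, while a badly fitted predictor that only reaches the chance level of 25% gives −log2(0.25) = 2 bits, i.e., an overestimate rather than an underestimate.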

Further, we can confirm that adding one more predictor does no harm and, conversely, makes the entropy estimation more accurate. From the execution strategy, we can see that if predictors whose models do not match the noise source are used alongside a predictor whose underlying model matches the source's behavior well, then the predictor that matches the source well will determine the final entropy estimation.

3.2. Choices of Important Parameters

3.2.1. Hyperparameters for FNN and RNN. In neural networks, the choices of the models' hyperparameters have a significant influence on the computational resources and performance required to train and test. Therefore, the choices of hyperparameters are crucial to neural networks. Next, we illustrate the choices of some key hyperparameters.

(1) Hidden Layers and Nodes. To comprehensively balance the accuracy and efficiency of our predictors, in this paper, for the FNN model, except for the multivariate M-sequences, we set the number of hidden layers to 2, with 10 and 5 hidden nodes in the two layers, respectively. For the multivariate M-sequences, extensive tests show that the number of hidden nodes per layer needs to be larger to give good results; by observing the results, we finally set the numbers to 35 and 30, respectively.

(2) Step of Memory. The step of memory determines the number of previous samples used for predicting the current output. Generally speaking, the larger the value, the better the performance. However, the computational resources (memory and runtime) increase as the step of memory grows. In this paper, we set the step of memory to 20 by trading off performance against resources. That is to say, for the FNN, the input at time step t is the previous 20 observed values, and for the RNN, the hidden layer contains 20 unfolded hidden units.
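The paper does not spell out the exact input encoding for the FNN, so the following NumPy sketch shows only one plausible way to build (previous-20-samples, next-sample) training pairs; the flattened one-hot encoding and the toy data are assumptions.

```python
import numpy as np

def make_training_pairs(samples, window=20, sample_space=2):
    """Turn a symbol sequence into (one-hot history, next symbol) training pairs."""
    eye = np.eye(sample_space)
    X, y = [], []
    for t in range(window, len(samples)):
        X.append(eye[samples[t - window:t]].flatten())  # previous 20 symbols, one-hot
        y.append(samples[t])                            # symbol to predict
    return np.array(X), np.array(y)

X, y = make_training_pairs(np.random.randint(0, 2, size=1000))
print(X.shape, y.shape)   # (980, 40) (980,)
```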

(3) Loss Function. The loss function refers to the function that measures the difference between the predicted values and the true values. The total loss for a given sequence of x = {x_1, ..., x_n} values paired with a sequence of y = {y_1, ..., y_n} values is just the sum of the losses over all the time steps. For example, if L_t is the negative log-likelihood of y_t given x_1, ..., x_t, then

L({x_1, ..., x_n}, {y_1, ..., y_n}) = Σ_t L_t = −Σ_t log2 p_model(y_t | x_1, ..., x_t),    (7)

where p_model(y_t | x_1, ..., x_t) is given by reading the entry for y_t from the model's output vector ŷ_t. The models are trained to minimize the cross-entropy between the training data and the models' predictions (i.e., equation (7)), which is equivalent to minimizing the mean squared error (i.e., the average of the squares of the errors or deviations).
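A small NumPy illustration of equation (7) (not tied to the paper's implementation; the toy probabilities are made up):

```python
import numpy as np

def sequence_loss(pred_probs, targets):
    """Negative log2-likelihood of the observed symbols, summed over time steps.

    pred_probs: array of shape (T, s), one predicted distribution per step.
    targets:    array of shape (T,), the observed symbol index at each step.
    """
    eps = 1e-12                                           # guard against log(0)
    picked = pred_probs[np.arange(len(targets)), targets]
    return -np.sum(np.log2(picked + eps))

# toy usage: two time steps over a 4-symbol space
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.25, 0.25, 0.25, 0.25]])
print(sequence_loss(probs, np.array([0, 3])))             # ≈ 0.515 + 2.0
```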

(4) Learning Rate. The learning rate is a positive scalar determining the size of the step. To control the effective capacity of the model, we need to set the value of the learning rate in an appropriate range. The learning rate determines how fast the parameter θ moves to its optimum value. If the learning rate is too large, gradient descent can inadvertently increase rather than decrease the training error; namely, the parameters are likely to overshoot the optimal value. However, if the learning rate is too small, the training is not only slower but may become permanently stuck with a high training error. So the learning rate is crucial to the performance of the model.

Based on the above analysis, we pick the learning rate approximately on a logarithmic scale, i.e., the learning rate is taken from the set {0.1, 0.01, 10^−3, 10^−4, 10^−5}. At the beginning of model training, we set a larger learning rate to reach the optimum faster. Then, as the number of training steps increases, we set smaller values so as not to overshoot the optimal value. The detailed settings are described in Algorithm 1.

(5) Activation Function. In general, we must use a nonlinear function to describe the features. Most neural networks do so using an affine transformation controlled by learned parameters, followed by a fixed nonlinear function called an activation function. The activation function plays an important role in neural networks. The commonly used activation functions include tanh(·), relu(·), and the sigmoid function, which is denoted σ(·) (i.e., equation (8)) in this paper. Because the sigmoid function saturates easily, which causes the gradient to change slowly during training, it is generally no longer used as an activation function except in the RNN-LSTM (long short-term memory).

σ(x) = 1 / (1 + e^−x).    (8)

After many attempts (i.e., we compared the efficiency and performance by means of a manual exhaustive search), we finally choose tanh(·) and relu(·) as the activation functions for the FNN and RNN, respectively. They can be expressed as

tanh(x) = (1 − e^−2x) / (1 + e^−2x).    (9)

Compared with σ(·), tanh(·) is symmetrical about the origin. In some cases, this symmetry can give better performance. It compresses the real-valued input to a range of −1 to 1, and the mean of its output is zero, which makes it converge faster than σ(·) and reduces the number of iterations. Therefore, it is suitable as an activation function, and the zero-centered training data contributes to the convergence speed of model training.

relu(x) = max(0, x),    (10)

where relu(·) is currently a popular activation function. It is piecewise linear and obtains the activation value with only one threshold. We choose this function based on the following two considerations. On the one hand, it alleviates the vanishing gradient problem of back propagation through time (BPTT) algorithms, because the derivative of relu(·) is 1 for positive inputs. On the other hand, it greatly improves the speed of calculation, because it only needs to judge whether the input is greater than 0.

(6) Output Function. The output function is used in the final layer of a neural network model. The predictors for time series are considered as a solution to a multiclass classification problem, so we take softmax(·) as the output function, which can be expressed as

y_i = softmax(z_i) = e^{z_i} / Σ_{j=1}^{s} e^{z_j},    (11)

where s is the size of the sample space and softmax(z_i) denotes the probability that the output is z_i; it satisfies Σ_{i=1}^{s} y_i = 1, i.e., the sum of the probabilities of all the outputs is equal to 1. Such networks are commonly trained under a cross-entropy regime (i.e., the loss function mentioned above).
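Putting the pieces of this subsection together, the following sketch (illustrative only, not the authors' code) builds an FNN predictor with the stated sizes — two hidden layers of 10 and 5 tanh units and a softmax output over the sample space, trained with cross-entropy. The Keras API and the fixed SGD optimizer are assumptions; the staged learning rates of Algorithm 1 would be applied on top of this.

```python
import tensorflow as tf

def build_fnn(sample_space=2, window=20):
    """FNN predictor: 20 past one-hot symbols in, softmax over the sample space out."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation="tanh",
                              input_shape=(window * sample_space,)),
        tf.keras.layers.Dense(5, activation="tanh"),
        tf.keras.layers.Dense(sample_space, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(0.1),
                  loss="sparse_categorical_crossentropy")   # integer symbol labels
    return model

# usage with the training pairs built earlier: model.fit(X, y, batch_size=..., epochs=1)
```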

3.2.2. Selection of Testing Dataset Length. To better estimate the entropy of the data source, the length of the testing dataset is very important for min-entropy estimation of random numbers generated by different types of sources. In reality, most entropy sources are time-varying (namely, nonstationary), which means the probability distribution of the output sequences from the source changes over time. So the length of the testing dataset shall be adaptive to the type of the source.

Therefore, as described in Section 3.1, we utilize γ to determine the size of the testing dataset. Specifically, in our strategy, for a stationary entropy source, of which the probability distribution of the outputs is not changing over time, the parameter γ is preset to 20%. Relatively, for a nonstationary entropy source, all observation points (namely, γ is 100%) need to serve as the testing dataset.

To verify the reasonableness of the γ value, we compute the root-mean-squared error (RMSE) of the lowest estimations of our predictors over 80 sequences from the following simulated datasets generated by a nonstationary source:

(i) Time-Varying Normal Distribution Rounded to Integers. The samples are subject to a normal distribution and rounded to integer values, but the mean of the distribution moves along a sine curve to simulate a time-varying signal (a generation sketch is given below).
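The paper does not give the exact drift parameters, so the following NumPy sketch only illustrates the construction; the amplitude, period, and standard deviation are assumptions.

```python
import numpy as np

def time_varying_normal(n, std=5.0, amplitude=20.0, period=100_000, seed=0):
    """Normal samples rounded to integers, with a mean that follows a sine curve."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    mean = amplitude * np.sin(2 * np.pi * t / period)   # slowly drifting mean
    return np.rint(rng.normal(loc=mean, scale=std)).astype(int)

samples = time_varying_normal(10**6)
```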

The RMSE, i.e., sqrt((1/N) Σ_{i=1}^{N} (Ĥ_min − H_min)^2), refers to the arithmetic square root of the mean of the squares of the errors or deviations for each class of simulated sources. Note that here N indicates the number of test samples, Ĥ_min indicates the estimated result for each sample, and H_min means the theoretical result for each sample. In other words, the smaller the RMSE is, the closer the estimated result is to the theoretical entropy, which indicates that the predictor has better accuracy.

As shown in Table 1, we can see that, for the time-varying data source, only when γ is 100% (namely, the entire dataset is used for min-entropy estimation) can the predictors give the most accurate results. This means that when the probability distribution of a data source varies with time, a part of the input dataset cannot represent the overall distribution of the input dataset, so a part of the input dataset cannot accurately give the estimation result for the entire input dataset. Besides, for the stationary sources, it is reasonable that γ is preset to 20%, because the estimated results obtained by our method are very close to the correct entropy (in theory) of the selected entropy source, as presented in Section 4.1.

3.3. Evaluation on Our Predictors. In this section, we conduct some experiments on simulated datasets to verify the accuracy of our proposed predictors for min-entropy estimation and compare the experimental results with the theoretical results. In addition, we give a theoretical analysis of the complexity of our predictors. Note that, in Section 4, we will apply our predictors to different data sources and compare our predictors with the 90B's predictors.

3.3.1. Accuracy Verification. We train our predictive models, FNN and RNN, on a number of representative simulated data sources (including stationary and nonstationary entropy sources), of which the theoretical entropy can be obtained from the known probability distribution of the outputs. Simulated datasets are produced using the following distribution families adopted in [24].

(1) Simulated Datasets Generated by Stationary Sources
(i) Discrete Uniform Distribution. The samples are equally likely and come from an IID source.

(1) if train_num < train_dataset_size/3 then
(2)   learning_rate ← {0.1, 0.01}
(3) else if train_num < train_dataset_size/1.5 then
(4)   learning_rate ← {0.01, 10^−3, 10^−4}
(5) else
(6)   learning_rate ← {10^−4, 10^−5}
(7) end if

ALGORITHM 1: Setting of the learning rate.


(ii) Discrete Near-Uniform Distribution. All samples are equally likely except one, and they come from an IID source; a certain sample has a higher probability than the rest (an illustrative generation sketch is given after this list).
(iii) Normal Distribution Rounded to Integers. The samples are subject to a normal distribution, rounded to integer values, and come from an IID source.
(iv) Markov Model. The samples are generated using a d-th-order Markov model and come from a non-IID source.
(2) Simulated Datasets Generated by Nonstationary Sources. These datasets are the same as those used in Section 3.2.2.
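As an illustration of how such a source and its reference entropy can be produced (not the authors' generator; the bias value is an assumption):

```python
import numpy as np

def near_uniform(n, s=8, p_max=0.4, seed=0):
    """One symbol has probability p_max; the remaining s-1 symbols share the rest equally."""
    probs = np.full(s, (1.0 - p_max) / (s - 1))
    probs[0] = p_max
    rng = np.random.default_rng(seed)
    samples = rng.choice(s, size=n, p=probs)
    theoretical_min_entropy = -np.log2(p_max)   # H_min = -log2(max probability)
    return samples, theoretical_min_entropy

samples, h = near_uniform(10**6)
print(h)   # -log2(0.4) ≈ 1.32 bits per sample
```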

For every class listed above, we generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy by using the predictive models FNN and RNN, respectively. For each dataset, the theoretical min-entropy H_min is derived from the known probability distribution.

In Figures 4–9, the abscissa represents the theoretical entropy of the test sample and the ordinate represents the estimated entropy of the test sample. Figure 4 shows the estimated entropy results for the 80 simulated datasets with uniform and near-uniform distributions, respectively. From Figures 4(a) and 4(b), we see that the estimated results given by our proposed two predictive models (FNN and RNN) are almost consistent with the theoretical entropy for both uniform and near-uniform distributions. So the final estimated result, which is the minimum result of the two predictive models, is also basically consistent with the theoretical entropy. Figure 5 shows the estimated entropy results for the 80 simulated datasets with normal distributions and time-varying normal distributions, respectively. From Figures 5(a) and 5(b), we can see that the estimated results given by our proposed two predictive models are close to the theoretical entropy for normal distributions and time-varying normal distributions. According to our execution strategy, here we calculate the min-entropy estimations using the whole input dataset for the time-varying normal distributions.

Figure 6 shows the estimated results for Markov distributions. We can see that both of our predictive models give a number of overestimates when applied to the Markov sources, particularly as the theoretical entropy increases.

Table 2 shows the relative errors (namely, |(Ĥ_min − H_min)/H_min| × 100%) between the theoretical results and the estimated results of FNN and RNN, to further reflect the accuracy of the models. Ĥ_min and H_min have the same meanings as in Section 3.2.2. We see that the entropy is estimated with an error of less than 6.02% for FNN and 7% for RNN over the simulated classes, respectively.

Based on the above accuracy verification of our predictors with simulated datasets from different distributions, we can be sure that our predictors give almost accurate results, except for the Markov distributions.

3.3.2. Complexity Analysis. To analyze the usability of our predictors in terms of execution efficiency, we derive the following computational complexity through analysis of the theory and the principle of implementation.

We believe that the computational complexity of entropy estimators used for RNG evaluation mainly comes from the sample space and sample size. For ease of analysis, we define the following parameters: n is the sample size, which indicates the length of the sample; s is the sample space, which means the number of kinds of symbols in the sample (i.e., s = 8 means there are 8 symbols in the sample and the bit width of each symbol is log2(8) = 3, such as 010, 110, 111, ...); and k denotes the maximum step of correlation, which is set as a constant in the 90B's predictors (k = 16) and our predictors (k = 20).

Through the analysis of the implementation, the computational complexity of the final 90B's predictors [23] mainly comes from the MultiMMC predictor and is of order O(s^k · n + 2^k · n · log2(s)), which is linear in n and a k-order polynomial in s. In contrast, the computational complexity of our predictors is of order O(s · n), which is linear in both s and n. It can be seen that the computational complexity of our predictors is much lower than that of the 90B's predictors.

It is important to note that the MultiMMC predictor requires s^k ≪ n; otherwise, this predictor cannot give statistically accurate estimated results. That is to say, as s increases, the MultiMMC predictor requires a larger sample size in order to estimate the entropy accurately.
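As a rough illustration: with 2-bit samples (s = 4) and k = 16, s^k = 4^16 = 2^32 ≈ 4.3 × 10^9, which already dwarfs a test sequence of n = 10^6 samples, whereas the O(s · n) cost of our predictors does not grow with k in this way.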

From the above analysis, we can see that our predictors have lower computational complexity. We will give the experimental proof in Section 4.3.

4. Comparison of Our Predictors with 90B's Predictors

In this section, a large number of experiments are conducted to evaluate our proposed predictors for entropy estimation from the aspects of accuracy, applicability, and efficiency, by applying our predictors to different simulated data and real-world data. For the experiments mentioned above, we compare the results with the final 90B's predictors [23] to highlight the advantages of our work. Similarly, our predictors in these experiments compute an upper bound of the min-entropy estimation at the significance level α = 0.01, which is the same as the 90B's predictors.

4.1. Comparison on Accuracy

4.1.1. Simulated Data. The simulated datasets are produced using the same distribution families as described in Section 3.3.1. Further, we append the following two new distribution families, the pseudorandom sequence and the postprocessing sequence, which are representative and commonly used in reality.

Table 1: Error measures of the final estimations of our predictors for nonstationary sources with different γ values.

γ       0.1      0.2      0.4      0.6      0.8      1
RMSE    0.0911   0.1364   0.0788   0.0817   0.0219   0.0149


[Figure 4 (plots omitted): Comparison of estimated results obtained from our two predictive models with the theoretical entropy. Estimations for (a) uniform distributions and (b) near-uniform distributions. Axes: theoretical entropy per sample vs. estimated entropy per sample; curves: theoretical entropy, FNN, RNN.]

[Figure 5 (plots omitted): Comparison of estimated results obtained from our two predictive models with the theoretical entropy. Estimations for (a) normal distributions and (b) time-varying normal distributions. Axes: theoretical entropy per sample vs. estimated entropy per sample; curves: theoretical entropy, FNN, RNN.]


[Figure 6 (plot omitted): Comparison of estimated results obtained from our two predictive models with the theoretical entropy for Markov distributions. Axes: theoretical entropy per sample vs. estimated entropy per sample; curves: theoretical entropy, FNN, RNN.]

[Figure 7 (plots omitted): Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy. Estimations for (a) uniform distributions and (b) near-uniform distributions. Curves: theoretical entropy, 90B's predictors, our predictors.]


(i) M-Sequence. A maximum-length sequence, which is a type of pseudorandom binary sequence [35].
(ii) Nonuniform Distribution by Postprocessing Using LFSR. The samples come from an IID source and are processed using a linear feedback shift register (LFSR) [35] (an illustrative LFSR sketch is given below).
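For readers unfamiliar with the construction, the following Python sketch shows a small Fibonacci LFSR producing an M-sequence; the 8-stage register and its tap positions are an illustrative choice, not the configuration used in the paper.

```python
def lfsr_bits(taps, state, n):
    """Fibonacci LFSR: return n output bits for the given tap positions and seed state."""
    bits = []
    for _ in range(n):
        bits.append(state[-1])                  # output the last stage
        feedback = 0
        for t in taps:                          # XOR of the tapped stages
            feedback ^= state[t]
        state = [feedback] + state[:-1]         # shift right, insert feedback
    return bits

# 8-stage maximal-length example with taps from a primitive polynomial
# (stage indices 7, 5, 4, 3, counting from 0); the seed must be nonzero.
out = lfsr_bits(taps=[7, 5, 4, 3], state=[1, 0, 0, 1, 0, 1, 1, 0], n=16)
print(out)
```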

For every distribution mentioned above, we also generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy by using our proposed predictors and the final 90B's predictors [23].

Figure 7 shows the estimated min-entropy results for the 80 simulated datasets with uniform distributions and near-uniform distributions, respectively. From Figures 7(a) and 7(b), we see that several of the results obtained from the 90B's predictors are apparently underestimated, which may result from overfitting. Compared with the 90B's predictors, our predictors provide more accurate results.

Figure 8 shows the estimated min-entropy results for normal distributions and time-varying normal distributions, respectively. From Figures 8(a) and 8(b), we can see that the estimated results given by our predictors are close to the theoretical entropy for normal distributions and time-varying normal distributions. However, the lowest entropy estimation results obtained from the 90B's predictors give significant underestimates.

Figure 9 shows the estimated min-entropy results for Markov distributions. We can see that the 90B's predictors almost always give underestimates compared with the theoretical entropy, while the estimated results given by our predictors are much closer to the theoretical entropy than those obtained from the 90B's predictors.

To compare the accuracy of our and the 90B's predictors more clearly, we apply the predictors to the M-sequence and the nonuniform distribution sequence postprocessed using an LFSR, whose theoretical entropy is a known and fixed value.

It is further confirmed that the higher-stage (the maximum step of correlation) M-sequence and the nonuniform distribution sequence postprocessed using an LFSR are able to pass the NIST SP 800-22 statistical tests [8]. The estimated results are listed in Tables 3 and 4, and the lowest entropy estimations from the 90B's predictors and our predictors for each stage are shown in bold font.

For the M-sequence and the nonuniform distribution postprocessed using an LFSR, the MultiMMC predictor presented in the final 90B gives the most accurate entropy estimation results for stage ≤ 16. However, when the stage of the M-sequence or of the nonuniform distribution postprocessed using an LFSR is greater than 16, the MultiMMC predictor cannot give accurate entropy estimation results, because this predictor is parameterized by k ∈ {1, 2, ..., 16} (k is the maximum step of correlation). Perhaps we could set the parameter of the MultiMMC predictor over a greater range to achieve a more accurate estimated result for the higher stages, but the time complexity grows exponentially with the parameter k, as we analyzed in Section 3.3.2. Moreover, the FNN model can also give accurate estimated results even though the stages of the M-sequence and LFSR are greater than 16. However, the RNN model can give accurate estimated results only when the stage is 8. Therefore, the FNN model is more suited to the M-sequence and the nonuniform distribution postprocessed using an LFSR than the RNN.

[Figure 8 (plots omitted): Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy. Estimations for (a) normal distributions and (b) time-varying normal distributions. Curves: theoretical entropy, 90B's predictors, our predictors.]



We also compute the relative errors of the estimated results from the 90B's predictors and our predictors over 80 sequences from each class of simulated sources. We calculate the relative errors using the min-entropy obtained from the 90B's predictors (the lowest estimation result of the 90B's four predictors) and our predictors (the lowest estimation result of FNN and RNN), respectively. As illustrated in Table 5, for all five classes of simulated sources, the errors of our predictors are lower than those of the 90B's predictors. Specially, our approaches enable the entropy to be estimated with an error of less than 6.55%, while it is up to 14.65% for the 90B's predictors. Overall, this indicates that our proposed predictors have a better performance on accuracy than the 90B's predictors for both stationary and nonstationary sequences, which is consistent with the conclusion drawn from the figures above.

From Tables 2–4, we also find that the accuracy of the RNN predictive model is slightly higher than that of the FNN predictive model, except for the cases of the Markov sources, the M-sequence, and the nonuniform distribution postprocessed using an LFSR.

We will further verify the applicability to time-varying sources in Section 4.2. Therefore, through the evaluation of the entropy estimation results on the above simulated datasets, we see that our proposed predictors are superior in accuracy compared with the 90B's predictors.

4.1.2. Real-World Data. We further apply our predictors to datasets generated from RNGs deployed in the real world. In fact, the theoretical entropy per sample is unknown for these real-world sources, so no error can be computed as for the simulated datasets. However, the estimated results from the predictors presented here can still be compared to the 90B's predictors, based on the knowledge that underestimates from the predictors have theoretical bounds.

Datasets of real-world data are produced using the following approaches. The first two are adopted in [24], and the others are commonly used, typical RNGs. The estimations for the real-world sources are presented in Table 6.

(i) RANDOM.ORG. This is a service that provides random numbers based on atmospheric noise and is used in [24]. It allows the user to specify the minimum and maximum values that are output. The sequence used here consists of bits.

[Figure 9 (plot omitted): Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy for Markov distributions. Curves: theoretical entropy, 90B's predictors, our predictors.]

Table 2: Relative errors of the FNN and RNN estimation results.

Simulated data class    FNN (%)   RNN (%)
Uniform                 1.65      1.6
Near-uniform            1.60      1.52
Normal                  1.17      1.08
Time-varying normal     2.12      1.84
Markov                  6.02      7

Table 3: Estimated results for the M-sequence (H_min = 0.000).

Stage       8       10      12      14      16      18      20
MultiMCW    0.991   0.996   0.988   0.989   0.993   0.999*  1.000
Lag         1.000   1.000   1.000   1.000   1.000   1.000   1.000
MultiMMC    0.000   0.000   0.000   0.000   0.000   1.000   1.000
LZ78Y       1.000   1.000   1.000   1.000   1.000   1.000   0.997*
FNN         0.000   0.000   0.000   0.000   0.000   0.000   0.000
RNN         0.000   1.048   1.007   1.002   0.996   0.9920  0.9997

Table 4: Estimated results for the nonuniform distribution postprocessed using an LFSR (H_min = 0.152).

Stage       8       10      12      14      16      18      20
MultiMCW    0.440   0.595   0.743   0.721   0.998   0.994   0.998
Lag         0.581   0.581   0.680   0.680   0.992   0.994   0.999
MultiMMC    0.151   0.153   0.158   0.181   0.234   0.995   0.996
LZ78Y       0.567   0.995   0.766   0.679   0.997   0.996*  0.994*
FNN         0.151   0.145   0.149   0.147   0.149   0.142   0.144
RNN         0.149   0.947   1.012   0.998   1.012   0.997   0.985



(ii) Ubld.it TrueRNGpro. The TrueRNGpro, a USB random number generator produced by Ubld.it, provides a steady stream of random numbers through a USB CDC serial port. This entropy source is also used in [24]. The sequence used here consists of bits.
(iii) Linux Kernel Entropy Source. The Linux kernel random generator is used for the generation of a real-world sequence without any processing. The sequence used here is the last bit of each symbol.
(iv) Linux /dev/urandom. The /dev/urandom [6] of Linux is used for the generation of a real-world sequence with strict processing. The sequence used here consists of bits.
(v) Windows RNG. The Windows RNG [5] is used for the generation of a real-world sequence by calling a Crypto API. The sequence used here consists of bits.

As illustrated in Table 6, the lowest entropy estimation for each source is shown in bold font. We see that our predictors perform better than the 90B's predictors, because the lowest entropy estimation is always obtained from our work for each real-world source. Furthermore, for the Linux kernel entropy source, we find that both the Lag and MultiMMC predictors are able to give lower estimation results. This indicates that the Linux kernel entropy source has periodicity and conforms to the Markov model, which is well understood because the randomness of the Linux kernel entropy source comes from human behaviors such as manipulating the mouse and keyboard. In our work, compared with the entropy estimations for the other real-world sources, FNN fits the Linux kernel entropy source much better than RNN, which is consistent with the previous view that FNN performs well in testing Markov sources.

4.2. Comparison on the Scope of Applicability. After evaluating the accuracy, we further validate the scope of applicability of our proposed predictors and compare it with that of the 90B's predictors. Kelsey et al. [24] stated that each of the 90B's predictors performs well only for a special distribution, as described in Section 2.2.1. To prove that our predictors have better applicability, the following four simulated datasets are generated, each of which is suitable for one predictor employed in the final 90B.

(i) Time-Varying Sources. The probability distribution of the data source varies with time. The MCW predictor predicts the current output according to previous outputs in a short period of time, and thus the MCW predictor performs well on these data sources.
(ii) Periodic Sources. The data source changes periodically. The lag predictor predicts the value that occurred a fixed number of samples back in the sequence as the current output, and thus the lag predictor performs well on sources with strong periodic behavior.
(iii) Markov Sources. The data sources can be modeled by a Markov model. The MultiMMC predictor predicts the current output according to the Markov model, and thus the MultiMMC predictor performs well on data from any process that can be accurately modeled by a Markov model.
(iv) LZ78Y Sources. The data sources can be efficiently compressed by LZ78-like compression algorithms, which suits the LZ78Y predictor well.

For each simulated source above, we generate a set of 10 simulated datasets, each of which contains 10^6 samples, and the min-entropy is estimated by our and the 90B's predictors. The final result for a predictor is the average of the 10 estimated results corresponding to the 10 simulated datasets for one simulated source.

4.2.1. Time-Varying Sources. Firstly, we generate time-varying binary data, which suit the statistical behaviors of the MCW predictor presented in the 90B. Table 7 shows the entropy estimation results for the time-varying data.

As shown in Table 7, the symbol gradual(x) (x ∈ [0, 1], the same below) denotes a simulated source in which the probability of outputting "0" changes gradually from x to 1 − x with time. The symbol period(x) denotes a simulated source in which the probability of outputting "0" changes periodically with time, varying from x to 1 − x in one period; the period length is set to 20% of the entire input dataset. The symbol sudden(x) denotes a simulated source in which the probability of outputting "0" changes suddenly with time; namely, the probability is set to x for the first half of the input dataset and 1 − x for the last half.

From Table 7 the estimation results for MCW predictorand our work are shown in bold font We see that the MCWpredictor gives the lowest and most accurate entropy esti-mations for the three types of time-varying data mentionedabove but it gives a little underestimates at gradual(02) andperiod(02) It is confirmed that the time-varying sourcesmentioned above match with the statistical behaviors of theMCW predictor Relatively we find that our proposedpredictive models are all capable to obtain the satisfiedentropy estimations that are close to the correct valueserefore it is proved that our proposed predictive modelsare suitable for the time-varying datamentioned above Notethat we calculate the min-entropy estimate according to theentire dataset rather than the last 20 of the input dataset forthese time-varying sources Because the probability distri-bution is varying with time the part of the input datasetcannot represent the overall distribution of the input dataset

Table 5 Relative errors of the final estimations of 90Brsquos predictorsand our predictors for five classes of simulated sources

Simulated data class 90Brsquos predictors () Our predictors ()Uniform 437 153Near-uniform 347 159Normal 608 157Time-varying normal 347 172Markov 1465 655

14 Security and Communication Networks

422 Periodic Sources Secondly we generate periodic datawhich are suitable for the statistical behaviors of the lagpredictor presented in 90B e following is entropy esti-mation results for periodic sequences e data source iscompletely obeying the periodic rule so the correct entropyis zero e bit width of samples is traversed from 2 to 8

As shown in Table 8 the estimation results for the lagpredictor and our work are shown in bold font According tothe correct entropy (is equal to 0) of the simulated periodicsources we confirm that the lag predictor is suitable for theentropy estimation of this type of source as expected Rel-atively the RNN can also give the accurate min-entropyestimates ie estimated results are zeros us our pro-posed predictive models are suitable for the entropy esti-mation of the (strong) periodic data In addition theMultiMMC predictor can also give the accurate min-entropyestimations is is reasonable because periodicity is also aform of correlation

423 Markov Sources Next we generate multivariateM-sequences as Markov sources which fit the statisticalbehaviors of the MultiMMC predictor Specifically themultivariate M-sequences are composed of multiple M-se-quences with different initial states Due to the determinacyof this type of sequences the correct entropy is zero e bitwidth of the samples is also traversed from 2 to 8 emaximum step of correlation used here is set as 8 Table 9shows the estimated results for multivariate M-sequences

From Table 9 the estimation results for MultiMMCpredictor and our work are shown in bold font According tothe correct entropy (is equal to 0) of the simulated Markovsources we confirm that theMultiMMC predictor is suitablefor the entropy estimation of this type of source as expectedRelatively the RNN can also give the accurate min-entropyestimations ie estimated results are zeros us our

proposed predictive models are suitable for the Markovsources

424 LZ78Y Sources Finally we verify the applicability ofthe LZ78Y sources is type of entropy source is difficult togenerate by simulating However we can still draw theconclusion that our proposed predictive models can beapplied to the LZ78Y sources according to Tables 8 and 9 initalic font Because the periodic data and Markov sequencesare compressible

425 Summary on Applicability Scope of Our PredictorsBy analyzing the experimental results of the above fourspecific simulated sources each of which is oriented towardsa certain predictor in the 90B we have a conclusion that ourpredictors can provide accurate estimated results of entropySo the proposed predictors are well applied to these entropysources as well as the 90Brsquos predictors In addition com-pared with 90Brsquos predictors our predictors have a betterperformance on the scope of applicability for testing the

Table 7 Entropy estimates for time-varying data

Data class Correct90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNNgradual(02) 06345 05290 07808 07240 07790 06288 06289gradual(03) 07437 07378 09221 08416 09243 07430 07460gradual(04) 08645 08631 09786 09518 09739 08648 08637period(02) 06345 05537 07428 05537 07669 06205 06209period(03) 07437 07393 09218 08476 09233 07377 07375period(04) 08645 08639 09767 09632 09796 08653 08632sudden(02) 03219 03203 04663 03386 04484 03217 03229sudden(03) 05146 05110 05857 09984 07663 05110 05119sudden(04) 07370 07338 08699 09984 09389 07339 07345

Table 8 Entropy estimates for periodic sequences

Bit width90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNN2 16458 00000 00000 11817 00079 000003 23318 00000 00000 15957 01315 000004 29147 00000 00000 18016 04748 000005 33269 00000 00000 14586 08898 000006 39092 00000 00000 08322 34944 000007 44908 00000 00000 03973 34960 000008 44919 00000 00000 02027 35408 00000lowaste result with italic font is used to analyze the applicability for the LZ78Ysources

Table 6 Entropy estimates for real-world sources

Real-world sources90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNNRANDOMORG 09951 09963 09966 09976 09802 09954Ubldit TrueRNGpro 09979 09955 09973 09966 09934 09728Linux kernel entropy source 06173 01232 01269 06164 01230 03068Linuxdevurandom 09952 09935 09990 09964 09983 09911Windows RNG 09953 09986 09975 09984 09833 09853

Security and Communication Networks 15

datasets with long-range correlation as presented in Section411

43 Comparison on Execution Efficiency We implement ourpredictors and the final 90Brsquos predictors using Python 36and the version of TensorFlow is 1311 All the followingtests are conducted on a computer with Intel Core i7 CPUand 32GB RAM

Table 10 shows the mean execution time of our predictors in comparison with that of the final 90B's predictors and the second draft of the 90B's predictors. Each experimental result in Table 10 is the average value obtained from 50 repeated experiments. Note that the definitions of the parameters n, s, and k are the same as in Section 3.3.2.
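As a rough sketch of how such mean execution times can be collected (this is our illustration, not the authors' actual measurement harness; `estimator` stands for any callable that runs one of the predictors on a dataset):

```python
import time
import statistics

def mean_runtime(estimator, dataset, repeats=50):
    """Run an entropy estimator `repeats` times on the same dataset and return the mean time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        estimator(dataset)  # e.g., a wrapper that trains and evaluates the FNN/RNN predictor
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)
```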

From the listed mean execution times for the different scales (n, s) in Table 10, it can be seen that, when n = 10^6, the mean execution time of our predictors is much lower, and increases more slowly with s, than that of the final 90B's predictors. In other words, the average execution efficiency of our predictors is about 7 to 10 times higher than that of the final 90B's predictors for the different sample spaces s when the sample size n is 10^6. In particular, when n = 10^8, the mean execution time of the final 90B's predictors far exceeds that of our predictors regardless of the size of the sample space, and it becomes too long (over three days) to calculate the estimated results when s ≥ 2^2.

In terms of the execution efficiency of the 90B's predictors, we also find that the mean execution time of the final 90B's predictors is much higher than that of the second draft of the 90B's predictors. In fact, the final 90B's mean execution time is about twice that of the second draft. This could be caused by the characteristics of some estimators, which are limited to binary inputs only: the collision estimator, Markov estimator, and compression estimator are only suitable for binary input (0 or 1), as stated in [23]. So, for nonbinary inputs, the 90B's estimators not only calculate the entropy of the original symbols but also convert the input into a bitstring to calculate the bit entropy, and finally obtain the min-entropy from both. This greatly increases the mean execution time.
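The symbol-to-bit conversion mentioned above can be pictured with the following simplified sketch (ours, not the exact 90B procedure; `estimate_min_entropy` stands for any per-sequence estimator, and the final combination shown is just one conservative way to merge the symbol-level and bit-level results):

```python
import numpy as np

def symbols_to_bits(symbols, bits_per_symbol):
    """Expand each symbol into its fixed-width binary representation (most significant bit first)."""
    bits = []
    for x in symbols:
        bits.extend((int(x) >> k) & 1 for k in reversed(range(bits_per_symbol)))
    return np.array(bits)

def combined_estimate(symbols, bits_per_symbol, estimate_min_entropy):
    """Estimate entropy on both the symbol sequence and its bitstring, then take the lower per-symbol value."""
    h_symbol = estimate_min_entropy(symbols)                                  # entropy per symbol
    h_bit = estimate_min_entropy(symbols_to_bits(symbols, bits_per_symbol))   # entropy per bit
    return min(h_symbol, bits_per_symbol * h_bit)
```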

4.4. General Discussion. For most of the entropy sources that have been tested, the RNN gives more accurate estimations than the FNN. The better accuracy of the RNN predictive model may be due to the following reasons. On the one hand, the RNN adds feedback connections to the network, i.e., it considers not only the relationship between the current output and the previous observations but also the relationships among the previous observations. On the other hand, the RNN one-hot-encodes the training dataset, which is better for forecasting categorical data. On the contrary, for the Markov sources, the M-sequence, and the nonuniform distribution postprocessed using an LFSR, the current output is only related to the previous observations, which fits the FNN predictive model well; thus, the FNN provides more accurate estimated results for these sources.
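As an illustration of the one-hot encoding and the fixed memory of previous observations mentioned above (a minimal NumPy sketch with our own variable names; the memory length of 20 follows the step-of-memory setting in Section 3.2.1):

```python
import numpy as np

def one_hot(symbols, sample_space):
    """Map integer symbols in [0, sample_space) to one-hot row vectors."""
    symbols = np.asarray(symbols, dtype=int)
    encoded = np.zeros((len(symbols), sample_space))
    encoded[np.arange(len(symbols)), symbols] = 1
    return encoded

def training_pairs(symbols, memory=20):
    """Build (previous `memory` samples, next sample) pairs used to train the predictive model."""
    inputs = np.array([symbols[i:i + memory] for i in range(len(symbols) - memory)])
    targets = np.asarray(symbols[memory:])
    return inputs, targets
```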

5. Conclusions and Future Work

Entropy estimation provides a crucial evaluation of the security of RNGs, and the predictor serves as a universal sanity check for entropy estimation. In this work, we provide several new approaches to estimate the min-entropy of entropy sources using predictors based on neural networks (i.e., FNN and RNN) for the first time. In particular, we design a novel scheme for the proposed entropy estimation based on neural network models, including the execution strategy and the parameter settings. In order to evaluate the quality of the proposed predictors, we collect various types of simulated sources, both stationary and nonstationary, whose correct entropy can be derived from the known probability distribution; the theoretical results are further verified by experiments on real-world sources. We also compare our method with the predictors defined in NIST SP 800-90B (published in 2018), which is a commonly used standard for evaluating the validity of entropy sources. Our assessment experiments are carried out in three aspects, namely, accuracy, scope of applicability, and computational complexity. The experimental results demonstrate that the entropy estimations obtained from our proposed predictors are more accurate than those of the 90B's predictors, and our predictors have a remarkably wider scope of applicability.

Table 9: Entropy estimates for multivariate M-sequences (MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work).

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.9010   2.0000   0.0000     2.0000   0.0005   0.0000
3           2.9906   3.0000   0.0000     2.4940   0.0000   0.0000
4           3.4037   4.0000   0.0000     4.0000   0.0021   0.0000
5           4.9753   5.0000   0.0000     1.2269   0.0041   0.0000
6           5.3916   6.0000   0.0000     1.2905   0.0394   0.0000
7           5.3916   6.0000   0.0000     1.9881   0.0280   0.0000
8           7.0000   7.0000   0.0000     0.6611   0.8635   0.0000
*The results in italic font (in the original table) are used to analyze the applicability for the LZ78Y sources.

Table 10: Comparison of the execution efficiency of min-entropy estimation between our study and the 90B's predictors.

{n, s}         Final 90B (s)   Old 90B (s)   Our predictors (s)
{10^6, 2^1}    921             561           136
{10^6, 2^2}    1,058           525           138
{10^6, 2^3}    1,109           574           149
{10^6, 2^4}    1,235           598           174
{10^6, 2^5}    1,394           630           190
{10^6, 2^6}    1,683           785           186
{10^6, 2^7}    2,077           938           264
{10^6, 2^8}    2,618           1,298         272
{10^8, 2^1}    52,274          47,936        9,184
{10^8, 2^2}    —               —             9,309
{10^8, 2^3}    —               —             9,385
{10^8, 2^4}    —               —             9,836
{10^8, 2^5}    —               —             10,986
{10^8, 2^6}    —               —             13,303
{10^8, 2^7}    —               —             17,649
{10^8, 2^8}    —               —             20,759


In addition, the computational complexity of our predictors is clearly lower than that of the 90B's as the sample space and sample size grow, according to the theoretical analysis. The average execution efficiency of our predictors is about 7 to 10 times higher than that of the 90B's predictors for the different sample spaces when the sample size is 10^6. In particular, the 90B's predictors cannot produce a result, due to the huge time complexity, when the sample space s exceeds 2^2 with the maximum-step parameter k = 16 and sample size n = 10^8; in contrast, our method is still able to provide satisfactory results for entropy sources with a large sample space and long dependence.

Future work will aim at designing specific neural network predictive models for min-entropy estimation of particular entropy sources. We will also focus on applying this new method to estimate entropy in more application areas, such as the randomness sources (sensors and other sources) in mobile terminals.

Data Availability

The RANDOM.ORG data used to support the findings of this study can be accessed from https://www.random.org. The Ubld.it TrueRNGpro, Linux kernel entropy source, Linux /dev/urandom, and Windows RNG data used to support the findings of this study can be obtained from the relevant listed references.

Disclosure

A preliminary version of this paper appeared under the title "Neural Network Based Min-Entropy Estimation for Random Number Generators" in Proc. Security and Privacy in Communication Networks - 14th EAI International Conference, SecureComm 2018, Singapore, August 8-10, 2018 [36]. Dr. Jing Yang participated in this work when she studied at the Chinese Academy of Sciences; she now works at the China Information Technology Security Evaluation Center, Beijing, China.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2018YFB0804300), the National Natural Science Foundation of China (Nos. 61872357 and 61802396), and the National Cryptography Development Fund of China (No. MMJJ20180113).

References

[1] I. Kanter, Y. Aviad, I. Reidler, E. Cohen, and M. Rosenbluh, "An optical ultrafast random bit generator," Nature Photonics, vol. 4, no. 1, pp. 58–61, 2010.
[2] P. Li, A. Wang, Y. Guo et al., "Ultrafast fully photonic random bit generator," Journal of Lightwave Technology, vol. 36, no. 12, pp. 2531–2540, 2018.
[3] P. Li, K. Li, X. Guo et al., "Parallel optical random bit generator," Optics Letters, vol. 44, no. 10, pp. 2446–2449, 2019.
[4] A. Uchida, K. Amano, M. Inoue et al., "Fast physical random bit generation with chaotic semiconductor lasers," Nature Photonics, vol. 2, no. 12, pp. 728–732, 2008.
[5] L. Dorrendorf, Z. Gutterman, and B. Pinkas, "Cryptanalysis of the random number generator of the Windows operating system," ACM Transactions on Information and System Security, vol. 13, no. 1, pp. 1–32, 2009.
[6] Z. Gutterman, B. Pinkas, and T. Reinman, "Analysis of the Linux random number generator," in Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P 2006), pp. 371–385, Berkeley, CA, USA, May 2006.
[7] M. Vanhoef and F. Piessens, "Predicting, decrypting, and abusing WPA2/802.11 group keys," in Proceedings of the 25th USENIX Security Symposium, pp. 673–688, Austin, TX, USA, August 2016.
[8] A. L. Rukhin, J. Soto, J. R. Nechvatal et al., SP 800-22 Rev. 1a: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, NIST Special Publication, Gaithersburg, MD, USA, 2010.
[9] W. Killmann and W. Schindler, AIS 31: Functionality Classes and Evaluation Methodology for True (Physical) Random Number Generators, Version 3.1, T-Systems GEI GmbH and Bundesamt für Sicherheit in der Informationstechnik (BSI), Bonn, Germany, 2001.
[10] G. Marsaglia, "The Marsaglia random number CDROM including the diehard battery of tests of randomness," 1996, http://www.stat.fsu.edu/pub/diehard.
[11] P. L'Ecuyer and R. J. Simard, "TestU01: a C library for empirical testing of random number generators," ACM Transactions on Mathematical Software, vol. 33, no. 4, 2007.
[12] ISO/IEC JTC 1/SC 27, ISO/IEC 18031: Information Technology–Security Techniques–Random Bit Generation, Berlin, Germany, 2011.
[13] E. Barker and J. Kelsey, "NIST draft special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2012, http://csrc.nist.gov/publications/drafts/800-90/draft-sp800-90b.pdf.
[14] M. Baudet, D. Lubicz, J. Micolod, and A. Tassiaux, "On the security of oscillator-based random number generators," Journal of Cryptology, vol. 24, no. 2, pp. 398–425, 2011.
[15] W. Killmann and W. Schindler, "A design for a physical RNG with robust entropy estimators," in Proceedings of the 10th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2008, pp. 146–163, Washington, DC, USA, August 2008.
[16] Y. Ma, J. Lin, T. Chen, C. Xu, Z. Liu, and J. Jing, "Entropy evaluation for oscillator-based true random number generators," in Proceedings of the 16th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2014, pp. 544–561, Busan, South Korea, September 2014.
[17] Y. Ma, J. Lin, and J. Jing, "On the entropy of oscillator-based true random number generators," in Proceedings of the Cryptographers' Track at the RSA Conference, pp. 165–180, Springer, San Francisco, CA, USA, February 2017.
[18] P. Li, J. Zhang, L. Sang et al., "Real-time online photonic random number generation," Optics Letters, vol. 42, no. 14, pp. 2699–2702, 2017.
[19] X. Ma, F. Xu, H. Xu, X. Tan, B. Qi, and H. K. Lo, "Postprocessing for quantum random-number generators: entropy evaluation and randomness extraction," Physical Review A, vol. 87, no. 6, pp. 062327-1–062327-10, 2013.
[20] K. Ugajin, Y. Terashima, K. Iwakawa et al., "Real-time fast physical random number generator with a photonic integrated circuit," Optics Express, vol. 25, no. 6, pp. 6511–6523, 2017.
[21] F. Xu, B. Qi, X. Ma, H. Xu, H. Zheng, and H.-K. Lo, "Ultrafast quantum random number generation based on quantum phase fluctuations," Optics Express, vol. 20, no. 11, pp. 12366–12377, 2012.
[22] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "(Second draft) NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2016, https://csrc.nist.gov/CSRC/media/Publications/sp800-90b/draft/documents/sp800-90b_second_draft.pdf.
[23] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2018, https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90B.pdf.
[24] J. Kelsey, K. A. McKay, and M. S. Turan, "Predictive models for min-entropy estimation," in Proceedings of the 17th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2015, pp. 373–392, Saint-Malo, France, September 2015.
[25] S. Aras and İ. D. Kocakoç, "A new model selection strategy in time series forecasting with artificial neural networks: IHTS," Neurocomputing, vol. 174, pp. 974–987, 2016.
[26] J. P. Donate, X. Li, G. G. Sanchez, and A. S. de Miguel, "Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm," Neural Computing and Applications, vol. 22, no. 1, pp. 11–20, 2013.
[27] J. C. Luna-Sanchez, E. Gomez-Ramírez, K. Najim, and E. Ikonen, "Forecasting time series with a logarithmic model for the polynomial artificial neural networks," in Proceedings of the 2011 International Joint Conference on Neural Networks, IJCNN 2011, pp. 2725–2732, San Jose, CA, USA, 2011.
[28] C. de Groot and D. Wurtz, "Analysis of univariate time series with connectionist nets: a case study of two classical examples," Neurocomputing, vol. 3, no. 4, pp. 177–192, 1991.
[29] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[30] X. Cai, N. Zhang, G. K. Venayagamoorthy, and D. C. Wunsch II, "Time series prediction with recurrent neural networks trained by a hybrid PSO-EA algorithm," Neurocomputing, vol. 70, no. 13–15, pp. 2342–2353, 2007.
[31] A. Jain and A. M. Kumar, "Hybrid neural network models for hydrologic time series forecasting," Applied Soft Computing, vol. 7, no. 2, pp. 585–592, 2007.
[32] J. M. P. Menezes Jr. and G. A. Barreto, "Long-term time series prediction with the NARX network: an empirical evaluation," Neurocomputing, vol. 71, no. 16–18, pp. 3335–3343, 2008.
[33] P. Hagerty and T. Draper, "Entropy bounds and statistical tests," 2012, https://csrc.nist.gov/csrc/media/events/random-bit-generation-workshop-2012/documents/hagerty_entropy_paper.pdf.
[34] S. Zhu, Y. Ma, T. Chen, J. Lin, and J. Jing, "Analysis and improvement of entropy estimators in NIST SP 800-90B for non-IID entropy sources," IACR Transactions on Symmetric Cryptology, no. 3, pp. 151–168, 2017.
[35] A. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, USA, 1996.
[36] J. Yang, S. Zhu, T. Chen, Y. Ma, N. Lv, and J. Lin, "Neural network based min-entropy estimation for random number generators," in Proceedings of the 14th International Conference on Security and Privacy in Communication Networks—SecureComm, pp. 231–250, Singapore, August 2018.


[15] W Killmann and W Schindler ldquoA design for a physical RNGwith robust entropy estimatorsrdquo in Proceedings of the 10thInternational Workshop Cryptographic Hardware and Em-bedded SystemsmdashCHES 2008 pp 146ndash163 Washington DCUSA August 2008

[16] Y Ma J Lin T Chen C Xu Z Liu and J Jing ldquoEntropyevaluation for oscillator-based true random number gener-atorsrdquo in Proceedings of the 16th International WorkshopCryptographic Hardware and Embedded SystemsmdashCHES2014 pp 544ndash561 Busan South Korea September 2014

[17] Y Ma J Lin and J Jing ldquoOn the entropy of oscillator-basedtrue random number generatorsrdquo in Proceedings of theCryptographersrsquo Track at the RSA Conference pp 165ndash180Springer San Francisco CA USA February 2017

[18] P Li J Zhang L Sang et al ldquoReal-time online photonicrandom number generationrdquo Optics Letters vol 42 no 14pp 2699ndash2702 2017

[19] X Ma F Xu H Xu X Tan B Qi and H K Lo ldquoPost-processing for quantum random-number generators entropyevaluation and randomness extractionrdquo Physical Review Avol 87 no 6 pp 0623271ndash06232710 2013

[20] K Ugajin Y Terashima K Iwakawa et al ldquoReal-time fastphysical random number generator with a photonic

Security and Communication Networks 17

integrated circuitrdquo Optics Express vol 25 no 6pp 6511ndash6523 2017

[21] F Xu B Qi X Ma H Xu H Zheng and H-K Lo ldquoUltrafastquantum random number generation based on quantumphase fluctuationsrdquo Optics Express vol 20 no 11pp 12366ndash12377 2012

[22] M S Turan E Barker J Kelsey K McKay M Baish andM Boyle ldquo(Second draft) NIST special publication 800-90brecommendation for the entropy sources used for random bitgenerationrdquo 2016 httpscsrcnistgovCSRCmediaPublicationssp800-90bdraftdocumentssp800-90b_second_draftpdf

[23] M S Turan E Barker J Kelsey K McKay M Baish andM Boyle ldquoNIST special publication 800-90B recommenda-tion for the entropy sources used for random bit generationrdquo2018 httpnvlpubsnistgovnistpubsSpecialPublicationsNISTSP800-90Bpdf

[24] J Kelsey K A McKay and M S Turan ldquoPredictive modelsfor min-entropy estimationrdquo in Proceedings of the 17th In-ternational WorkshopCryptographic Hardware and EmbeddedSystemsmdashCHES 2015 pp 373ndash392 Saint-Malo FranceSeptember 2015

[25] S Aras and I D Kocakoccedil ldquoA new model selection strategy intime series forecasting with artificial neural networks IHTSrdquoNeurocomputing vol 174 pp 974ndash987 2016

[26] J P Donate X Li G G Sanchez and A S de Miguel ldquoTimeseries forecasting by evolving artificial neural networks withgenetic algorithms differential evolution and estimation ofdistribution algorithmrdquo Neural Computing and Applicationsvol 22 no 1 pp 11ndash20 2013

[27] J C Luna-Sanchez E Gomez-Ramırez K Najim andE Ikonen ldquoForecasting time series with a logarithmic modelfor the polynomial artificial neural networksrdquo in Proceedingsof the 2011 International Joint Conference on Neural NetworksIJCNN 2011 pp 2725ndash2732 San Jose CA USA 2011

[28] C de Groot and D Wurtz ldquoAnalysis of univariate time serieswith connectionist nets a case study of two classical exam-plesrdquo Neurocomputing vol 3 no 4 pp 177ndash192 1991

[29] I Goodfellow Y Bengio and A Courville Deep LearningMIT Press Cambridge MA USA 2016

[30] X Cai N Zhang G K Venayagamoorthy and D C WunschII ldquoTime series prediction with recurrent neural networkstrained by a hybrid PSO-EA algorithmrdquo Neurocomputingvol 70 no 13ndash15 pp 2342ndash2353 2007

[31] A Jain and A M Kumar ldquoHybrid neural network models forhydrologic time series forecastingrdquo Applied Soft Computingvol 7 no 2 pp 585ndash592 2007

[32] J M P Menezes Jr and G A Barreto ldquoLong-term time seriesprediction with the NARX network an empirical evaluationrdquoNeurocomputing vol 71 no 16ndash18 pp 3335ndash3343 2008

[33] P Hagerty and T Draper ldquoEntropy bounds and statisticaltestsrdquo 2012 httpscsrcnistgovcsrcmediaeventsrandom-bit-generation-workshop-2012documentshagerty_entropy_paperpdf

[34] S Zhu Y Ma T Chen J Lin and J Jing ldquoAnalysis andimprovement of entropy estimators in NIST SP 800-90b fornon-IID entropy sourcesrdquo IACR Transactions on SymmetricCryptology no 3 pp 151ndash168 2017

[35] A Menezes P C van Oorschot and S A VanstoneHandbook of Applied Cryptography CRC Press Boca RatonFL USA 1996

[36] J Yang S Zhu T Chen Y Ma N Lv and J Lin ldquoNeuralnetwork based min-entropy estimation for random numbergeneratorsrdquo in Proceedings of the 14th International

Conference Security and Privacy in Communication Net-worksmdashSecureComm pp 231ndash250 Singapore August 2018

18 Security and Communication Networks

Page 8: High-Efficiency Min-Entropy Estimation Based on Neural Network … · 2020. 2. 17. · the outputs of RNGs, such as the ISO/IEC 18031 [12] and AIS 31 [9]. ere are many types of methods

performance. It compresses the real-valued input to the range of −1 to 1, and the mean of its output is zero, which makes it converge faster than σ(·) and reduces the number of iterations. Therefore, it is suitable as an activation function, and the zero-centered training data contributes to the convergence speed of model training.

relu(x) = max(0, x),  (10)

where relu(·) is currently a popular activation function. It is linear and obtains the activation value with only one threshold. We choose this function based on the following two considerations. On the one hand, it mitigates the vanishing gradient problem of the back propagation through time (BPTT) algorithm, because the derivative of relu(·) is 1 for positive inputs. On the other hand, it greatly improves the speed of calculation, because it only needs to judge whether the input is greater than 0.

(6) Output Function. The output function is used in the final layer of a neural network model. The predictors for time series are considered as a solution to a multiclass classification problem, so we take softmax(·) as the output function, which can be expressed as

y_i = softmax(z_i) = e^{z_i} / Σ_{i=1}^{s} e^{z_i},  (11)

where s is the size of the sample space, and softmax(z_i) denotes the probability that the output is z_i, satisfying Σ_{i=1}^{s} y_i = 1, i.e., the sum of the probabilities of all the outputs is equal to 1. Such networks are commonly trained under a cross-entropy regime (i.e., the loss function mentioned above).

3.2.2. Selection of Testing Dataset Length. To better estimate the entropy of the data source, the length of the testing dataset is very important for the min-entropy estimation of random numbers generated by different types of sources. In reality, most entropy sources are time-varying (namely, nonstationary), which means that the probability distribution of the output sequences from the source changes over time. So the length of the testing dataset shall be adaptive to the type of the source.

Therefore, as described in Section 3.1, we utilize c to determine the size of the testing dataset. Specifically, in our strategy, for a stationary entropy source, of which the probability distribution of the outputs does not change over time, the parameter c is preset to 20%. Relatively, for a nonstationary entropy source, all observation points (namely, c is 100%) need to serve as the testing dataset.
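As a rough illustration of how c might control the evaluated portion (an assumption on our part: we take the last c fraction of the prediction records as the testing dataset, 20% for stationary sources and 100% for nonstationary ones), consider the following sketch.

def select_testing_window(num_observations, stationary):
    """Return the index range of observations used as the testing dataset."""
    c = 0.2 if stationary else 1.0          # 20% for stationary, 100% for nonstationary
    start = int(num_observations * (1.0 - c))
    return start, num_observations

# Example: 1,000,000 predicted samples from a stationary source -> last 200,000 used.
print(select_testing_window(10**6, stationary=True))
print(select_testing_window(10**6, stationary=False))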

To verify the reasonableness of the c value, we compute the root-mean-squared error (RMSE) of the lowest estimations of our predictors over 80 sequences from the following simulated datasets generated by a nonstationary source:

(i) Time-Varying Normal Distribution Rounded to Integers. The samples are subject to a normal distribution and rounded to integer values, but the mean of the distribution moves along a sine curve to simulate a time-varying signal.

The RMSE, i.e., sqrt((1/N) Σ_{i=1}^{N} (Ĥ_min − H_min)^2), refers to the arithmetic square root of the mean of the squared errors (deviations) for each class of simulated sources. Note that here N indicates the number of test samples, Ĥ_min indicates the estimated result for each sample, and H_min means the theoretical result for each sample. In other words, the smaller the RMSE is, the closer the estimated result is to the theoretical entropy, which indicates that the predictor has better accuracy.
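A direct NumPy sketch of this RMSE computation (the names and example values are illustrative; the estimates would come from the FNN/RNN predictors and the theoretical values from the known distributions):

import numpy as np

def rmse(estimated, theoretical):
    """Root-mean-squared error between estimated and theoretical min-entropy values."""
    estimated = np.asarray(estimated, dtype=float)
    theoretical = np.asarray(theoretical, dtype=float)
    return np.sqrt(np.mean((estimated - theoretical) ** 2))

# Example with three hypothetical test samples.
print(rmse([0.62, 0.74, 0.87], [0.63, 0.74, 0.86]))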

As shown in Table 1, for the time-varying data source, only when c is 100% (namely, the entire dataset is used for min-entropy estimation) can the predictors give the most accurate results. This means that, when the probability distribution of the data source varies with time, a part of the input dataset cannot represent the overall distribution of the input dataset, so a part of the input dataset cannot accurately give the estimation result for the entire input dataset. Besides, for stationary sources, it is reasonable that c is preset to 20%, because the estimated results obtained by our method are very close to the correct (theoretical) entropy of the selected entropy sources, as presented in Section 4.1.

3.3. Evaluation on Our Predictors. In this section, we conduct some experiments on simulated datasets to verify the accuracy of our proposed predictors for min-entropy estimation and compare the experimental results with the theoretical results. In addition, we give a theoretical analysis of the complexity of our predictors. Note that, in Section 4, we will apply our predictors to different data sources and provide a comparison of our predictors with the 90B's predictors.

3.3.1. Accuracy Verification. We train our predictive models FNN and RNN on a number of representative simulated data sources (including stationary and nonstationary entropy sources), of which the theoretical entropy can be obtained from the known probability distribution of the outputs. Simulated datasets are produced using the following distribution families adopted in [24].

(1) Simulated Datasets Generated by Stationary Sources

(i) Discrete Uniform Distribution. The samples are equally likely and come from an IID source.

(1) if train_num < train_dataset_size/3 then
(2)     learning_rate ← {0.1, 0.01}
(3) else if train_num < train_dataset_size/1.5 then
(4)     learning_rate ← {0.01, 10^−3, 10^−4}
(5) else
(6)     learning_rate ← {10^−4, 10^−5}
(7) end if

Algorithm 1: Setting of the learning rate.
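A small Python sketch of such a staged learning-rate schedule follows; the thresholds and candidate values match Algorithm 1 as reconstructed above, and since the original listing is garbled, the exact breakpoints (one-third and two-thirds of the training set) should be treated as our assumption.

def learning_rate_candidates(train_num, train_dataset_size):
    """Return the candidate learning rates for the current training step."""
    if train_num < train_dataset_size / 3:
        return [0.1, 0.01]                 # early phase: large steps
    elif train_num < train_dataset_size / 1.5:
        return [0.01, 1e-3, 1e-4]          # middle phase: medium steps
    else:
        return [1e-4, 1e-5]                # late phase: fine-tuning

print(learning_rate_candidates(100_000, 1_000_000))   # early phase
print(learning_rate_candidates(900_000, 1_000_000))   # late phase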


(ii) Discrete Near-Uniform Distribution. All samples are equally likely except one, which has a higher probability than the rest; the samples come from an IID source.

(iii) Normal Distribution Rounded to Integers. The samples are subject to a normal distribution and rounded to integer values; they come from an IID source.

(iv) Markov Model. The samples are generated using a dth-order Markov model; they come from a non-IID source.

(2) Simulated Datasets Generated by Nonstationary Sources. These datasets are the same as those used in Section 3.2.2.

For every class listed above, we generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy by using the predictive models FNN and RNN, respectively. For each dataset, the theoretical min-entropy H_min is derived from the known probability distribution.
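For instance, for a discrete (near-)uniform distribution the theoretical min-entropy follows directly from the most likely symbol, H_min = −log2(max_i p_i). A minimal sketch (the probability values below are made up purely for illustration):

import numpy as np

def theoretical_min_entropy(probabilities):
    """Min-entropy of a known discrete distribution: -log2 of the largest probability."""
    p = np.asarray(probabilities, dtype=float)
    assert np.isclose(p.sum(), 1.0)
    return -np.log2(p.max())

# Near-uniform example over s = 4 symbols: one symbol is more likely than the rest.
p = [0.4, 0.2, 0.2, 0.2]
print(theoretical_min_entropy(p))   # -log2(0.4) ≈ 1.32 bits per sample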

In Figures 4–9, the abscissa represents the theoretical entropy of the test sample, and the ordinate represents the estimated entropy of the test sample. Figure 4 shows the estimated entropy results for the 80 simulated datasets with uniform and near-uniform distributions, respectively. From Figures 4(a) and 4(b), we see that the estimated results given by our two proposed predictive models (FNN and RNN) are almost consistent with the theoretical entropy for both uniform and near-uniform distributions. So the final estimated result, which is the minimum result of the two predictive models, is also basically consistent with the theoretical entropy. Figure 5 shows the estimated entropy results for the 80 simulated datasets with normal distributions and time-varying normal distributions, respectively. From Figures 5(a) and 5(b), we can see that the estimated results given by our two proposed predictive models are close to the theoretical entropy for normal distributions and time-varying normal distributions. According to our execution strategy, here we calculate the min-entropy estimations using the whole input dataset for the time-varying normal distributions.

Figure 6 shows the estimated results for Markov distributions; we can see that both of our predictive models give a number of overestimates when applied to the Markov sources, particularly as the theoretical entropy increases.

Table 2 shows the relative errors (namely, |(Ĥ_min − H_min)/H_min| × 100%) between the theoretical results and the estimated results of FNN and RNN, to further reflect the accuracy of the models. Ĥ_min and H_min have the same meaning as in Section 3.2.2. We see that the entropy is estimated with an error of less than 6.02% for FNN and 7% for RNN over the simulated classes, respectively.

Based on the above accuracy verification of our predictors with simulated datasets from different distributions, we can be sure that our predictors give almost accurate results, except for the Markov distributions.

3.3.2. Complexity Analysis. To analyze the usability of our predictors in terms of execution efficiency, we derive the following computational complexity through the analysis of the theory and the principle of implementation.

We believe that the computational complexity of entropy estimators used for RNG evaluation mainly comes from the sample space and the sample size. For ease of analysis, we define the following parameters: n is the sample size, which indicates the length of the sample; s is the sample space, which means the number of kinds of symbols in the sample (i.e., s = 8 means there are 8 symbols in the sample and the bit width of each symbol is log2(8) = 3, such as 010, 110, 111, ...); and k denotes the maximum step of correlation, which is set as a constant in the 90B's predictors (k = 16) and in our predictors (k = 20).

Through the analysis of the implementation, the computational complexity of the final 90B's predictors [23] mainly comes from the MultiMMC predictor and is of order O(s^k · n + 2^k · n · log2(s)), which is mainly linear time complexity in n and k-order polynomial time complexity in s. In contrast, the computational complexity of our predictors is of order O(s · n), which is linear time complexity in s and n. It can be seen that the computational complexity of our predictors is much lower than that of the 90B's predictors.
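To make the asymptotic gap tangible, the following toy calculation simply plugs numbers into the two dominant-term expressions above for a few sample spaces; it is only an illustration of the formulas, not a benchmark of either implementation.

import math

def ops_90b(n, s, k=16):
    # Dominant term of the final 90B's predictors: O(s^k * n + 2^k * n * log2(s)).
    return s**k * n + 2**k * n * math.log2(s)

def ops_ours(n, s):
    # Dominant term of the proposed predictors: O(s * n).
    return s * n

for s in (2, 4, 8):
    n = 10**6
    print(s, f"{ops_90b(n, s):.3e}", f"{ops_ours(n, s):.3e}")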

It is important to note that the MultiMMC predictor requires s^k ≪ n; otherwise, this predictor cannot give statistically accurate estimated results. That is to say, when s increases, the MultiMMC predictor requires a larger sample size in order to estimate the entropy accurately.

From the above analysis, we can see that our predictors have lower computational complexity. We will give the experimental proof in Section 4.3.

4. Comparison of Our Predictors with the 90B's Predictors

In this section, a large number of experiments are conducted to evaluate our proposed predictors for entropy estimation from the aspects of accuracy, applicability, and efficiency, by applying our predictors to different simulated data and real-world data. For the experiments mentioned above, we compare the results with the final 90B's predictors [23] to highlight the advantages of our work. Similarly, our predictors in these experiments compute an upper bound of the min-entropy estimation at the significance level α = 0.01, which is the same as for the 90B's predictors.
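As background for how a predictor's accuracy is turned into such an upper bound: in the 90B-style predictor framework, to our understanding, the observed prediction accuracy is first inflated to a 99% upper confidence bound (matching α = 0.01) before being converted to min-entropy via −log2(·). The sketch below follows that general recipe; the exact constant and formula used by the 90B predictors should be checked against the standard, so treat this as an assumption-laden illustration rather than the normative calculation.

import math

def min_entropy_upper_bound(correct, total, z=2.576):
    """Convert a predictor's hit rate into a conservative min-entropy estimate.

    correct/total is the observed prediction accuracy; z = 2.576 corresponds to a
    99% confidence bound (alpha = 0.01) under a normal approximation.
    """
    p_hat = correct / total
    p_upper = min(1.0, p_hat + z * math.sqrt(p_hat * (1.0 - p_hat) / (total - 1)))
    return -math.log2(p_upper)

# Example: the predictor guessed 520,000 of 1,000,000 binary samples correctly.
print(min_entropy_upper_bound(520_000, 1_000_000))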

4.1. Comparison on Accuracy

4.1.1. Simulated Data. The simulated datasets are produced using the same distribution families as described in Section 3.3.1. Further, we append the following two new distribution families, a pseudorandom sequence and a postprocessing sequence, which are representative and commonly used in reality.

Table 1: Error measures of the final estimations of our predictors for nonstationary sources with different c values.

c      0.1      0.2      0.4      0.6      0.8      1
RMSE   0.0911   0.1364   0.0788   0.0817   0.0219   0.0149


[Figure 4: Comparison of estimated results obtained from our two predictive models (FNN and RNN) with the theoretical entropy. Estimations for (a) uniform distributions and (b) near-uniform distributions. Axes: theoretical entropy per sample (x) versus estimated entropy per sample (y).]

[Figure 5: Comparison of estimated results obtained from our two predictive models (FNN and RNN) with the theoretical entropy. Estimations for (a) normal distributions and (b) time-varying normal distributions. Axes: theoretical entropy per sample (x) versus estimated entropy per sample (y).]


[Figure 6: Comparison of estimated results obtained from our two predictive models (FNN and RNN) with the theoretical entropy for Markov distributions. Axes: theoretical entropy per sample (x) versus estimated entropy per sample (y).]

[Figure 7: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy. Estimations for (a) uniform distributions and (b) near-uniform distributions. Axes: theoretical entropy per sample (x) versus estimated entropy per sample (y).]


(i) M-Sequence. A maximum-length sequence, which is a type of pseudorandom binary sequence ([35]).

(ii) Nonuniform Distribution by Postprocessing Using LFSR. The samples are processed using a linear feedback shift register (LFSR) and come from an IID source ([35]); see the sketch after this list for how such sequences can be generated.
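A minimal sketch of producing an M-sequence with a Fibonacci-style LFSR follows; the register length and tap positions are our own illustrative choices (a maximum-length sequence requires taps corresponding to a primitive polynomial, which [35] covers in detail).

def lfsr_bits(taps, state, length):
    """Generate `length` bits from a Fibonacci LFSR with the given tap positions."""
    bits = []
    for _ in range(length):
        bits.append(state[-1])                      # output the last register bit
        feedback = 0
        for t in taps:
            feedback ^= state[t]                    # XOR of the tapped positions
        state = [feedback] + state[:-1]             # shift right, insert feedback
    return bits

# Degree-4 example: feedback = s0 XOR s3 realizes the primitive polynomial
# x^4 + x^3 + 1, so the output sequence has the maximal period 2^4 - 1 = 15.
seq = lfsr_bits(taps=(0, 3), state=[1, 0, 0, 1], length=30)
print("".join(map(str, seq)))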

For every distribution mentioned above, we also generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy by using our proposed predictors and the final 90B's predictors [23].

Figure 7 shows the estimated min-entropy results for the 80 simulated datasets with uniform distributions and near-uniform distributions, respectively. From Figures 7(a) and 7(b), we see that several points of the results obtained from the 90B's predictors are apparently underestimated, which may result from overfitting. Compared with the 90B's predictors, our predictors provide more accurate results.

Figure 8 shows the estimated min-entropy results for normal distributions and time-varying normal distributions, respectively. From Figures 8(a) and 8(b), we can see that the estimated results given by our predictors are close to the theoretical entropy for normal distributions and time-varying normal distributions. However, the lowest entropy estimation results obtained from the 90B's predictors give significant underestimates.

Figure 9 shows the estimated min-entropy results for Markov distributions. We can see that the 90B's predictors almost always give underestimates compared with the theoretical entropy, while the estimated results given by our predictors are much closer to the theoretical entropy than those obtained from the 90B's predictors.

To compare the accuracy of our and the 90B's predictors more clearly, we apply the predictors to the M-sequence and the nonuniform distribution sequence by postprocessing using LFSR, whose theoretical entropy is a known and fixed value.

It is further confirmed that the higher-stage (the maximum step of correlation) M-sequence and nonuniform distribution sequence by postprocessing using LFSR are able to pass the NIST SP 800-22 statistical tests [8]. The estimated results are listed in Tables 3 and 4, and the lowest entropy estimations from the 90B's predictors and our predictors for each stage are shown in bold font.

For the M-sequence and the nonuniform distribution by postprocessing using LFSR, the MultiMMC predictor presented in the final 90B gives the most accurate entropy estimation results for stage ≤ 16. However, when the stage of the M-sequence or the nonuniform distribution by postprocessing using LFSR is greater than 16, the MultiMMC predictor cannot give accurate entropy estimation results, because this predictor is parameterized by k ∈ {1, 2, ..., 16} (k is the maximum step of correlation). Perhaps we could set the

[Figure 8: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy. Estimations for (a) normal distributions and (b) time-varying normal distributions. Axes: theoretical entropy per sample (x) versus estimated entropy per sample (y).]


parameter of the MultiMMC predictor to a greater range to achieve a more accurate estimated result for the higher stages, but the time complexity grows exponentially with the parameter k, as we analyzed in Section 3.3.2. Moreover, the FNN model can also give accurate estimated results even though the stages of the M-sequence and LFSR are greater than 16. However, the RNN model can give accurate estimated results only when the stage is 8. Therefore, the FNN model is better matched to the M-sequence and the nonuniform distribution by postprocessing using LFSR than the RNN.

We also compute the relative errors of the estimated results from the 90B's predictors and our predictors over 80 sequences from each class of simulated sources. We calculate the relative errors using the min-entropy obtained from the 90B's predictors (the lowest estimation result of the 90B's four predictors) and our predictors (the lowest estimation result of FNN and RNN), respectively. As illustrated in Table 5, for all five classes of simulated sources, the errors of our predictors are lower than those of the 90B's predictors. Specially, our approaches enable the entropy to be estimated with an error of less than 6.55%, while it is up to 14.65% for the 90B's predictors. Overall, this indicates that our proposed predictors have a better performance than the 90B's predictors on accuracy for both stationary and nonstationary sequences, which is consistent with the conclusions drawn from the figures above.

From Tables 2–4, we also find that the accuracy of the RNN predictive model is slightly higher than that of the FNN predictive model, except for the cases of the Markov sources, the M-sequence, and the nonuniform distribution by postprocessing using LFSR.

We will further verify the applicability for time-varying sources in Section 4.2. Therefore, through the evaluation of the entropy estimation results on the above simulated datasets, we see that our proposed predictors are superior in accuracy compared with the 90B's predictors.

4.1.2. Real-World Data. We further apply our predictors to datasets that are generated from RNGs deployed in the real world. In fact, the theoretical entropy per sample is unknown for these real-world sources, so no error can be computed as for the simulated datasets. However, the estimated results from the predictors presented here can still be compared to those of the 90B's predictors, based on the knowledge that underestimates from the predictors have theoretical bounds.

Datasets of real-world data are produced using the following approaches. The first two are adopted in [24], and the others are commonly used typical RNGs. The estimations for the real-world sources are presented in Table 6.

(i) RANDOM.ORG. This is a service that provides random numbers based on atmospheric noise and is used in [24]. It allows the user to specify the

[Figure 9: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy for Markov distributions. Axes: theoretical entropy per sample (x) versus estimated entropy per sample (y).]

Table 2: Relative errors of the FNN and RNN estimation results.

Simulated data class    FNN (%)   RNN (%)
Uniform                 1.65      1.6
Near-uniform            1.60      1.52
Normal                  1.17      1.08
Time-varying normal     2.12      1.84
Markov                  6.02      7

Table 3: Estimated results for the M-sequence (H_min = 0.000).

Stage        8       10      12      14      16      18       20
MultiMCW     0.991   0.996   0.988   0.989   0.993   0.999*   1.000
Lag          1.000   1.000   1.000   1.000   1.000   1.000    1.000
MultiMMC     0.000   0.000   0.000   0.000   0.000   1.000    1.000
LZ78Y        1.000   1.000   1.000   1.000   1.000   1.000    0.997*
FNN          0.000   0.000   0.000   0.000   0.000   0.000    0.000
RNN          0.000   1.048   1.007   1.002   0.996   0.9920   0.9997

Table 4: Estimated results for the nonuniform distribution by postprocessing using LFSR (H_min = 0.152).

Stage        8       10      12      14      16      18       20
MultiMCW     0.440   0.595   0.743   0.721   0.998   0.994    0.998
Lag          0.581   0.581   0.680   0.680   0.992   0.994    0.999
MultiMMC     0.151   0.153   0.158   0.181   0.234   0.995    0.996
LZ78Y        0.567   0.995   0.766   0.679   0.997   0.996*   0.994*
FNN          0.151   0.145   0.149   0.147   0.149   0.142    0.144
RNN          0.149   0.947   1.012   0.998   1.012   0.997    0.985


minimum and maximum values that are output. The sequence used here consists of bits.

(ii) Ubld.it TrueRNGpro. The TrueRNGpro is a USB random number generator produced by Ubld.it, which provides a steady stream of random numbers through a USB CDC serial port. This entropy source is also used in [24]. The sequence used here consists of bits.

(iii) Linux Kernel Entropy Source. The Linux kernel random generator is used for the generation of a real-world sequence without any processing. The sequence used here is the last bit of each symbol.

(iv) Linux /dev/urandom. The /dev/urandom [6] of Linux is used for the generation of a real-world sequence with strict processing. The sequence used here consists of bits.

(v) Windows RNG. The Windows RNG [5] is used for the generation of a real-world sequence by calling a Crypto API. The sequence used here consists of bits.

As illustrated in Table 6, the lowest entropy estimation for each source is shown in bold font. We see that our predictors perform better than the 90B's predictors, because the lowest entropy estimation is always obtained from our work for each real-world source. Furthermore, for the Linux kernel entropy source, we find that both the Lag and MultiMMC predictors are able to give lower estimation results. This indicates that the Linux kernel entropy source has periodicity and conforms to the Markov model, which is well understood because the randomness of the Linux kernel entropy source comes from human behaviors such as manipulating the mouse and keyboard. In our work, compared with the entropy estimations for the other real-world sources, the FNN fits the Linux kernel entropy source much better than the RNN, which is consistent with the previous view that the FNN performs well in testing Markov sources.

4.2. Comparison on the Scope of Applicability. After evaluating the accuracy, we further validate the scope of applicability of our proposed predictors and compare it with that of the 90B's predictors. Kelsey et al. [24] stated that each of the 90B's predictors performs well only for a special distribution, as described in Section 2.2.1. To prove that our predictors have better applicability, the following four simulated datasets are generated, each of which suits one of the predictors employed in the final 90B:

(i) Time-Varying Sources. The probability distribution of the data source varies with time. The MCW predictor predicts the current output according to previous outputs within a short period of time, and thus the MCW predictor performs well on these data sources.

(ii) Periodic Sources. The data source changes periodically. The lag predictor predicts the value that occurred some samples back in the sequence as the current output, and thus the lag predictor performs well on sources with strong periodic behavior.

(iii) Markov Sources. The data sources can be modeled by a Markov model. The MultiMMC predictor predicts the current output according to the Markov model, and thus the MultiMMC predictor performs well on data from any process that can be accurately modeled by a Markov model.

(iv) LZ78Y Sources. The data sources can be efficiently compressed by LZ78-like compression algorithms, which suits the LZ78Y predictor well.

For each simulated source above, we generate a set of 10 simulated datasets, each of which contains 10^6 samples, and the min-entropy is estimated by our predictors and the 90B's predictors. The final result for a predictor is the average of the 10 estimated results corresponding to the 10 simulated datasets for one simulated source.

4.2.1. Time-Varying Sources. Firstly, we generate time-varying binary data that suit the statistical behaviors of the MCW predictor presented in the 90B. Table 7 shows the entropy estimation results for the time-varying data.

As shown in Table 7, the symbol gradual(x) (x ∈ [0, 1], the same below) is defined as a simulated source in which the probability of outputting "0" changes gradually from x to 1 − x over time. The symbol period(x) is defined as a simulated source in which the probability of outputting "0" changes periodically with time, and the probability varies from x to 1 − x in one period; the period length is set to 20% of the entire input dataset. The symbol sudden(x) is defined as a simulated source in which the probability of outputting "0" changes suddenly with time, namely, the probability is set to x for the first half of the input dataset and 1 − x for the last half.

In Table 7, the estimation results for the MCW predictor and our work are shown in bold font. We see that the MCW predictor gives the lowest and most accurate entropy estimations for the three types of time-varying data mentioned above, but it gives slight underestimates at gradual(0.2) and period(0.2). It is confirmed that the time-varying sources mentioned above match the statistical behaviors of the MCW predictor. Relatively, we find that our proposed predictive models are all capable of obtaining satisfactory entropy estimations that are close to the correct values. Therefore, it is proved that our proposed predictive models are suitable for the time-varying data mentioned above. Note that we calculate the min-entropy estimate according to the entire dataset rather than the last 20% of the input dataset for these time-varying sources, because the probability distribution varies with time, and a part of the input dataset cannot represent the overall distribution of the input dataset.

Table 5: Relative errors of the final estimations of the 90B's predictors and our predictors for five classes of simulated sources.

Simulated data class    90B's predictors (%)   Our predictors (%)
Uniform                 4.37                   1.53
Near-uniform            3.47                   1.59
Normal                  6.08                   1.57
Time-varying normal     3.47                   1.72
Markov                  14.65                  6.55


4.2.2. Periodic Sources. Secondly, we generate periodic data that suit the statistical behaviors of the lag predictor presented in the 90B. Table 8 gives the entropy estimation results for the periodic sequences. The data source completely obeys the periodic rule, so the correct entropy is zero. The bit width of the samples is traversed from 2 to 8.

As shown in Table 8, the estimation results for the lag predictor and our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated periodic sources, we confirm that the lag predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimates, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the entropy estimation of (strongly) periodic data. In addition, the MultiMMC predictor can also give accurate min-entropy estimations, which is reasonable because periodicity is also a form of correlation.

4.2.3. Markov Sources. Next, we generate multivariate M-sequences as Markov sources, which fit the statistical behaviors of the MultiMMC predictor. Specifically, the multivariate M-sequences are composed of multiple M-sequences with different initial states. Due to the determinacy of this type of sequence, the correct entropy is zero. The bit width of the samples is also traversed from 2 to 8. The maximum step of correlation used here is set to 8. Table 9 shows the estimated results for the multivariate M-sequences.

From Table 9, the estimation results for the MultiMMC predictor and our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated Markov sources, we confirm that the MultiMMC predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimations, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the Markov sources.

4.2.4. LZ78Y Sources. Finally, we verify the applicability for the LZ78Y sources. This type of entropy source is difficult to generate by simulation. However, we can still draw the conclusion that our proposed predictive models can be applied to the LZ78Y sources according to the marked results in Tables 8 and 9, because the periodic data and Markov sequences are compressible.

4.2.5. Summary on the Applicability Scope of Our Predictors. By analyzing the experimental results on the above four specific simulated sources, each of which is oriented towards a certain predictor in the 90B, we conclude that our predictors can provide accurate entropy estimates. So the proposed predictors apply to these entropy sources as well as the 90B's predictors do. In addition, compared with the 90B's predictors, our predictors have a better performance on the scope of applicability for testing the

Table 7: Entropy estimates for time-varying data.

Data class     Correct   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
gradual(0.2)   0.6345    0.5290   0.7808   0.7240     0.7790   0.6288   0.6289
gradual(0.3)   0.7437    0.7378   0.9221   0.8416     0.9243   0.7430   0.7460
gradual(0.4)   0.8645    0.8631   0.9786   0.9518     0.9739   0.8648   0.8637
period(0.2)    0.6345    0.5537   0.7428   0.5537     0.7669   0.6205   0.6209
period(0.3)    0.7437    0.7393   0.9218   0.8476     0.9233   0.7377   0.7375
period(0.4)    0.8645    0.8639   0.9767   0.9632     0.9796   0.8653   0.8632
sudden(0.2)    0.3219    0.3203   0.4663   0.3386     0.4484   0.3217   0.3229
sudden(0.3)    0.5146    0.5110   0.5857   0.9984     0.7663   0.5110   0.5119
sudden(0.4)    0.7370    0.7338   0.8699   0.9984     0.9389   0.7339   0.7345

Table 8: Entropy estimates for periodic sequences.

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.6458   0.0000   0.0000     1.1817   0.0079   0.0000
3           2.3318   0.0000   0.0000     1.5957   0.1315   0.0000
4           2.9147   0.0000   0.0000     1.8016   0.4748   0.0000
5           3.3269   0.0000   0.0000     1.4586   0.8898   0.0000
6           3.9092   0.0000   0.0000     0.8322   3.4944   0.0000
7           4.4908   0.0000   0.0000     0.3973   3.4960   0.0000
8           4.4919   0.0000   0.0000     0.2027   3.5408   0.0000
*The result in italic font is used to analyze the applicability for the LZ78Y sources.

Table 6: Entropy estimates for real-world sources.

Real-world sources            MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
RANDOM.ORG                    0.9951   0.9963   0.9966     0.9976   0.9802   0.9954
Ubld.it TrueRNGpro            0.9979   0.9955   0.9973     0.9966   0.9934   0.9728
Linux kernel entropy source   0.6173   0.1232   0.1269     0.6164   0.1230   0.3068
Linux /dev/urandom            0.9952   0.9935   0.9990     0.9964   0.9983   0.9911
Windows RNG                   0.9953   0.9986   0.9975     0.9984   0.9833   0.9853


datasets with long-range correlation, as presented in Section 4.1.1.

4.3. Comparison on Execution Efficiency. We implement our predictors and the final 90B's predictors using Python 3.6, and the version of TensorFlow is 1.13.1. All the following tests are conducted on a computer with an Intel Core i7 CPU and 32 GB RAM.

Table 10 shows the mean execution time of our predictors in comparison with that of the final 90B's predictors and the second draft of the 90B's predictors. Each experimental result in Table 10 is the average value obtained from 50 repeated experiments. Note that the definitions of the parameters n, s, and k are the same as in Section 3.3.2.

From the mean execution times listed for different scales (n, s) in Table 10, it can be seen that, when n = 10^6, the mean execution time of our predictors is much lower, and increases more slowly with s, than that of the final 90B's predictors. In other words, the average execution efficiency of our predictors is about 7 to 10 times higher than that of the final 90B's predictors for the different sample spaces s when the sample size n is 10^6. In particular, when n = 10^8, the mean execution time of the final 90B's predictors is far greater than that of our predictors regardless of the size of the sample space, and it is too long (over three days) to calculate the estimated results for the cases with s ≥ 2^2.

In terms of the execution efficiency of the 90B's predictors, we also find that the mean execution time of the final 90B's predictors is much higher than that of the second draft of the 90B's predictors; in fact, the final 90B's mean execution time is about twice that of the second draft. This could be caused by the characteristics of some estimators that are limited to binary inputs only: the collision estimator, the Markov estimator, and the compression estimator are only suitable for binary input (0 or 1), as stated in [23]. So, for nonbinary inputs, the 90B's estimators will not only calculate the original symbol entropy but also convert the input into binary to calculate the bit entropy, and finally obtain the min-entropy. This greatly increases the mean execution time.

4.4. General Discussion. For most of the entropy sources that have been tested, the RNN gives more accurate estimations than the FNN. The better accuracy of the RNN predictive model may be due to the following reasons. On the one hand, the RNN adds feedback connections to the network; i.e., it considers not only the relationship between the current output and the previous observations but also the relationships among the previous observations. On the other hand, the RNN one-hot-encodes the training dataset, which is better suited to forecasting categorical data. On the contrary, for Markov sources, the M-sequence, and the nonuniform distribution by postprocessing using LFSR, the current output is only related to the previous observations, which fits the FNN predictive model well, and thus the FNN provides more accurate estimated results.
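For readers unfamiliar with the one-hot encoding mentioned here, a minimal sketch follows; the mapping itself is standard, but how exactly the paper's RNN consumes these vectors is not specified in this excerpt, so the shapes are illustrative.

import numpy as np

def one_hot(symbols, sample_space):
    """Encode integer symbols 0..s-1 as one-hot vectors of length s."""
    encoded = np.zeros((len(symbols), sample_space), dtype=np.float32)
    encoded[np.arange(len(symbols)), symbols] = 1.0
    return encoded

# Sample space s = 4: symbol 2 becomes [0, 0, 1, 0], and so on.
print(one_hot([2, 0, 3], sample_space=4))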

5. Conclusions and Future Work

Entropy estimation provides a crucial evaluation of the security of RNGs, and the predictor serves as a universal sanity check for entropy estimation. In this work, we provide several new approaches to estimating the min-entropy of entropy sources using predictors based on neural networks (i.e., FNN and RNN) for the first time. In particular, we design a novel scheme for the proposed entropy estimation based on neural network models, including the execution strategy and parameter settings. In order to evaluate the quality of the proposed predictors, we collect various types of simulated sources, stationary or nonstationary, whose correct entropy can be derived from the known probability distribution, and the theoretical results are further verified by experiments on real-world sources. We also compare our method with the predictors defined in NIST SP 800-90B (published in 2018), which is a commonly used standard for evaluating the validation of entropy sources. Our assessment experiments are carried out in three aspects, namely, accuracy, scope of applicability, and computational complexity. The experimental results demonstrate that the entropy estimations obtained from our proposed predictors are more accurate than those of the 90B's predictors, and our predictors have a remarkably wider scope of applicability. In addition, the

Table 9: Entropy estimates for multivariate M-sequences.

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.9010   2.0000   0.0000     2.0000   0.0005   0.0000
3           2.9906   3.0000   0.0000     2.4940   0.0000   0.0000
4           3.4037   4.0000   0.0000     4.0000   0.0021   0.0000
5           4.9753   5.0000   0.0000     1.2269   0.0041   0.0000
6           5.3916   6.0000   0.0000     1.2905   0.0394   0.0000
7           5.3916   6.0000   0.0000     1.9881   0.0280   0.0000
8           7.0000   7.0000   0.0000     0.6611   0.8635   0.0000
*The result in italic font is used to analyze the applicability for the LZ78Y sources.

Table 10: Comparison of the execution efficiency of min-entropy estimation between our study and the 90B's predictors.

n      s     Final 90B (s)   Old 90B (s)   Our predictors (s)
10^6   2^1   921             561           136
10^6   2^2   1,058           525           138
10^6   2^3   1,109           574           149
10^6   2^4   1,235           598           174
10^6   2^5   1,394           630           190
10^6   2^6   1,683           785           186
10^6   2^7   2,077           938           264
10^6   2^8   2,618           1,298         272
10^8   2^1   52,274          47,936        9,184
10^8   2^2   —               —             9,309
10^8   2^3   —               —             9,385
10^8   2^4   —               —             9,836
10^8   2^5   —               —             10,986
10^8   2^6   —               —             13,303
10^8   2^7   —               —             17,649
10^8   2^8   —               —             20,759


computational complexity of our predictors is obviously lower than that of the 90B's as the sample space and sample size grow, according to the theoretical analysis. The average execution efficiency of our predictors is about 7 to 10 times higher than that of the 90B's predictors for different sample spaces when the sample size is 10^6. Specially, the 90B's predictors cannot calculate a result, due to the huge time complexity, when the sample space s is over 2^2 with the parameter of maximum step k = 16 and sample size n = 10^8; relatively, our method is able to provide a satisfactory result for entropy sources with a large sample space and long dependence.

Future work aims at designing specific neural network predictive models for min-entropy estimation for specific entropy sources. Our future work will also focus on applying this new method to estimate entropy in more application areas, such as the randomness sources (sensors and other sources) in mobile terminals.

Data Availability

The RANDOM.ORG data used to support the findings of this study can be accessed from https://www.random.org. The Ubld.it TrueRNGpro, Linux kernel entropy source, Linux /dev/urandom, and Windows RNG data used to support the findings of this study can be obtained from the relevant listed references.

Disclosure

A preliminary version of this paper appeared under the title "Neural Network Based Min-entropy Estimation for Random Number Generators" in Proc. Security and Privacy in Communication Networks, 14th EAI International Conference, SecureComm 2018, Singapore, August 8–10, 2018 [36]. Dr. Jing Yang participated in this work when she studied at the Chinese Academy of Sciences, and she now works at the China Information Technology Security Evaluation Center, Beijing, China.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2018YFB0804300), the National Natural Science Foundation of China (Nos. 61872357 and 61802396), and the National Cryptography Development Fund of China (No. MMJJ20180113).

References

[1] I. Kanter, Y. Aviad, I. Reidler, E. Cohen, and M. Rosenbluh, "An optical ultrafast random bit generator," Nature Photonics, vol. 4, no. 1, pp. 58–61, 2010.

[2] P. Li, A. Wang, Y. Guo et al., "Ultrafast fully photonic random bit generator," Journal of Lightwave Technology, vol. 36, no. 12, pp. 2531–2540, 2018.

[3] P. Li, K. Li, X. Guo et al., "Parallel optical random bit generator," Optics Letters, vol. 44, no. 10, pp. 2446–2449, 2019.

[4] A. Uchida, K. Amano, M. Inoue et al., "Fast physical random bit generation with chaotic semiconductor lasers," Nature Photonics, vol. 2, no. 12, pp. 728–732, 2008.

[5] L. Dorrendorf, Z. Gutterman, and B. Pinkas, "Cryptanalysis of the random number generator of the Windows operating system," ACM Transactions on Information and System Security, vol. 13, no. 1, pp. 1–32, 2009.

[6] Z. Gutterman, B. Pinkas, and T. Reinman, "Analysis of the Linux random number generator," in Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P 2006), pp. 371–385, Berkeley, CA, USA, May 2006.

[7] M. Vanhoef and F. Piessens, "Predicting, decrypting, and abusing WPA2/802.11 group keys," in Proceedings of the 25th USENIX Security Symposium, pp. 673–688, Austin, TX, USA, August 2016.

[8] A. L. Rukhin, J. Soto, J. R. Nechvatal et al., SP 800-22 Rev. 1a: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, NIST Special Publication, Gaithersburg, MD, USA, 2010.

[9] W. Killmann and W. Schindler, AIS 31: Functionality Classes and Evaluation Methodology for True (Physical) Random Number Generators, Version 3.1, T-Systems GEI GmbH and Bundesamt für Sicherheit in der Informationstechnik (BSI), Bonn, Germany, 2001.

[10] G. Marsaglia, "The Marsaglia random number CDROM including the diehard battery of tests of randomness," 1996, http://www.stat.fsu.edu/pub/diehard.

[11] P. L'Ecuyer and R. J. Simard, "TestU01: a C library for empirical testing of random number generators," ACM Transactions on Mathematical Software, vol. 33, no. 4, 2007.

[12] ISO/IEC JTC 1/SC 27, ISO/IEC 18031: Information Technology–Security Techniques–Random Bit Generation, Berlin, Germany, 2011.

[13] E. Barker and J. Kelsey, "NIST draft special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2012, http://csrc.nist.gov/publications/drafts/800-90/draft-sp800-90b.pdf.

[14] M. Baudet, D. Lubicz, J. Micolod, and A. Tassiaux, "On the security of oscillator-based random number generators," Journal of Cryptology, vol. 24, no. 2, pp. 398–425, 2011.

[15] W. Killmann and W. Schindler, "A design for a physical RNG with robust entropy estimators," in Proceedings of the 10th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2008, pp. 146–163, Washington, DC, USA, August 2008.

[16] Y. Ma, J. Lin, T. Chen, C. Xu, Z. Liu, and J. Jing, "Entropy evaluation for oscillator-based true random number generators," in Proceedings of the 16th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2014, pp. 544–561, Busan, South Korea, September 2014.

[17] Y. Ma, J. Lin, and J. Jing, "On the entropy of oscillator-based true random number generators," in Proceedings of the Cryptographers' Track at the RSA Conference, pp. 165–180, Springer, San Francisco, CA, USA, February 2017.

[18] P. Li, J. Zhang, L. Sang et al., "Real-time online photonic random number generation," Optics Letters, vol. 42, no. 14, pp. 2699–2702, 2017.

[19] X. Ma, F. Xu, H. Xu, X. Tan, B. Qi, and H. K. Lo, "Postprocessing for quantum random-number generators: entropy evaluation and randomness extraction," Physical Review A, vol. 87, no. 6, pp. 062327-1–062327-10, 2013.

[20] K. Ugajin, Y. Terashima, K. Iwakawa et al., "Real-time fast physical random number generator with a photonic integrated circuit," Optics Express, vol. 25, no. 6, pp. 6511–6523, 2017.

[21] F. Xu, B. Qi, X. Ma, H. Xu, H. Zheng, and H.-K. Lo, "Ultrafast quantum random number generation based on quantum phase fluctuations," Optics Express, vol. 20, no. 11, pp. 12366–12377, 2012.

[22] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "(Second draft) NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2016, https://csrc.nist.gov/CSRC/media/Publications/sp/800-90b/draft/documents/sp800-90b_second_draft.pdf.

[23] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2018, https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90B.pdf.

[24] J. Kelsey, K. A. McKay, and M. S. Turan, "Predictive models for min-entropy estimation," in Proceedings of the 17th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2015, pp. 373–392, Saint-Malo, France, September 2015.

[25] S. Aras and I. D. Kocakoç, "A new model selection strategy in time series forecasting with artificial neural networks: IHTS," Neurocomputing, vol. 174, pp. 974–987, 2016.

[26] J. P. Donate, X. Li, G. G. Sanchez, and A. S. de Miguel, "Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm," Neural Computing and Applications, vol. 22, no. 1, pp. 11–20, 2013.

[27] J. C. Luna-Sanchez, E. Gomez-Ramirez, K. Najim, and E. Ikonen, "Forecasting time series with a logarithmic model for the polynomial artificial neural networks," in Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN 2011), pp. 2725–2732, San Jose, CA, USA, 2011.

[28] C. de Groot and D. Wurtz, "Analysis of univariate time series with connectionist nets: a case study of two classical examples," Neurocomputing, vol. 3, no. 4, pp. 177–192, 1991.

[29] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[30] X. Cai, N. Zhang, G. K. Venayagamoorthy, and D. C. Wunsch II, "Time series prediction with recurrent neural networks trained by a hybrid PSO-EA algorithm," Neurocomputing, vol. 70, no. 13–15, pp. 2342–2353, 2007.

[31] A. Jain and A. M. Kumar, "Hybrid neural network models for hydrologic time series forecasting," Applied Soft Computing, vol. 7, no. 2, pp. 585–592, 2007.

[32] J. M. P. Menezes Jr. and G. A. Barreto, "Long-term time series prediction with the NARX network: an empirical evaluation," Neurocomputing, vol. 71, no. 16–18, pp. 3335–3343, 2008.

[33] P. Hagerty and T. Draper, "Entropy bounds and statistical tests," 2012, https://csrc.nist.gov/csrc/media/events/random-bit-generation-workshop-2012/documents/hagerty_entropy_paper.pdf.

[34] S. Zhu, Y. Ma, T. Chen, J. Lin, and J. Jing, "Analysis and improvement of entropy estimators in NIST SP 800-90B for non-IID entropy sources," IACR Transactions on Symmetric Cryptology, no. 3, pp. 151–168, 2017.

[35] A. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, USA, 1996.

[36] J. Yang, S. Zhu, T. Chen, Y. Ma, N. Lv, and J. Lin, "Neural network based min-entropy estimation for random number generators," in Proceedings of the 14th International Conference on Security and Privacy in Communication Networks—SecureComm, pp. 231–250, Singapore, August 2018.

Page 9: High-Efficiency Min-Entropy Estimation Based on Neural Network … · 2020. 2. 17. · the outputs of RNGs, such as the ISO/IEC 18031 [12] and AIS 31 [9]. ere are many types of methods

(ii) Discrete Near-Uniform Distribution All samples areequally likely except one which come from an IIDsource A certain sample has a higher probabilitythan the rest

(iii) Normal Distribution Rounded to Integers esamples are subject to a normal distribution androunded to integer values which come from an IIDsource

(iv) Markov Model e samples are generated using adth-order Markov model which come from a non-IID source

(2) Simulated Datasets Generated by Nonstationary Sourcesese datasets are the same as those used in Section 322

For every class listed above we generate a set of 80simulated datasets each of which contains 106 samples andestimate min-entropy by using predictive models FNN andRNN respectively For each dataset the theoretical min-entropy Hmin is derived from the known probabilitydistribution

From Figures 4ndash9 the abscissa in the figure representsthe theoretical entropy of the test sample and the ordinaterepresents the estimated entropy of the test sample Figure 4shows the estimated entropy results for the 80 simulateddatasets with uniform and near-uniform distributions re-spectively From Figures 4(a) and 4(b) we see that the es-timated results given by our proposed two predictive models(FNN and RNN) are almost consistent with the theoreticalentropy for both uniform and near-uniform distributionsSo the final estimated result which is the minimum result ofthe two predictive models is also basically consistent with thetheoretical entropy Figure 5 shows the estimated entropyresults for the 80 simulated datasets with normal distribu-tions and time-varying normal distributions respectivelyFrom Figures 5(a) and 5(b) we can see that the estimatedresults given by our proposed two predictive models areclose to the theoretical entropy with normal distributionsand time-varying normal distributions According to ourexecution strategy here we calculate min-entropy estima-tions using the whole input dataset for time-varying normaldistributions

Figure 6 shows the estimated results of Markov distri-butions we can see that both of our predictive models give anumber of overestimates when applied to the Markovsources particularly with the theoretical entropy increasing

Table 2 shows the relative errors (namely|( 1113954Hmin minus Hmin)Hmin|lowast100) between the theoretical re-sults and the estimated results of FNN and RNN to furtherreflect the accuracy of the models 1113954Hmin and Hmin have thesame meaning as in Section 322 We see that the entropy tobe estimated with an error of less than 602 for FNN and7 for RNN for the simulated classes respectively

Based on the above accuracy verification of our pre-dictors with simulated datasets from different distributionswhat we can be sure is that out predictors can give almostaccurate results except Markov distributions

332 Complexity Analysis To analyze the usability of ourpredictors in terms of execution efficiency we derive thefollowing computational complexity through the analysis oftheory and principle of implementation

We believe that the computational complexity of entropyestimators used for RNG evaluation mainly comes from thesample space and sample size For ease of analysis we definethe following parameter n as the sample size which indicatesthe length of the sample s as the sample space which meansthe kinds of symbols in the sample (ie s 8means there are8 symbols in the sample and the bit width of each symbol islog2(8) 3 such as 010 110 111 ) and k denotes themaximum step of correlation which is set as a constant in90Brsquos predictors (k 16) and our predictors (k 20)

Through the analysis of the implementation, the computational complexity of the final 90B's predictors [23] mainly comes from the MultiMMC predictor and is of order O(s^k · n + 2k · n · log2(s)), which is mainly linear time complexity in n and k-order polynomial time complexity in s. In contrast, the computational complexity of our predictors is of order O(s · n), which is linear in both s and n. It can be seen that the computational complexity of our predictors is much lower than that of the 90B's predictors.

It is important to note that the MultiMMC predictor requires s^k ≪ n; otherwise, this predictor cannot give statistically accurate estimated results. That is to say, as s increases, the MultiMMC predictor requires a larger sample size in order to estimate the entropy accurately.

From the above analysis, we can see that our predictors have lower computational complexity. We provide the experimental proof in Section 4.3.

4. Comparison on Our Predictors with 90B's Predictors

In this section, a large number of experiments have been done to evaluate our proposed predictors for entropy estimation from the aspects of accuracy, applicability, and efficiency, by applying our predictors to different simulated data and real-world data. For the experiments mentioned above, we compare the results with the final 90B's predictors [23] to highlight the advantages of our work. Similarly, our predictors in these experiments compute an upper bound of the min-entropy estimation at the significance level α = 0.01, which is the same as for the 90B's predictors.
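As a rough sketch of how a predictor's accuracy can be turned into an upper-bounded min-entropy estimate at α = 0.01 (this is a simplified normal-approximation bound written by us for illustration; the exact formulas of the 90B and of our implementation may differ in detail):

import math

def min_entropy_from_accuracy(n_correct, n_predictions):
    # Global prediction probability and a 99% upper confidence bound
    # (normal approximation, z about 2.576), then min-entropy = -log2(bound).
    p = n_correct / n_predictions
    z = 2.576
    p_upper = min(1.0, p + z * math.sqrt(p * (1 - p) / (n_predictions - 1)))
    return -math.log2(p_upper)

# e.g., a predictor that guessed 137,000 of 1,000,000 samples correctly
print(f"{min_entropy_from_accuracy(137_000, 1_000_000):.4f} bits per sample")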

4.1. Comparison on Accuracy

4.1.1. Simulated Data. The simulated datasets are produced using the same distribution families as described in Section 3.3.1. Further, we append the following two new distribution families, namely a pseudorandom sequence and a postprocessing sequence, which are representative and commonly used in reality.

Table 1: Error measures of the final estimations of our predictors for nonstationary sources with different c values.

c       0.1      0.2      0.4      0.6      0.8      1
RMSE    0.0911   0.1364   0.0788   0.0817   0.0219   0.0149


Figure 4: Comparison of estimated results obtained from our two predictive models with the theoretical entropy. Estimations for (a) uniform distributions and (b) near-uniform distributions.

Figure 5: Comparison of estimated results obtained from our two predictive models with the theoretical entropy. Estimations for (a) normal distributions and (b) time-varying normal distributions.


Figure 6: Comparison of estimated results obtained from our two predictive models with the theoretical entropy for Markov distributions.

Figure 7: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy. Estimations for (a) uniform distributions and (b) near-uniform distributions.


(i) M-Sequence. A maximum-length sequence, which is a type of pseudorandom binary sequence [35].

(ii) Nonuniform Distribution by Postprocessing Using LFSR. The samples come from an IID source and are postprocessed using a linear feedback shift register (LFSR) [35]; a minimal generation sketch follows this list.
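The following is a minimal sketch of an M-sequence generator based on a Fibonacci LFSR; the 16-stage register, the feedback taps (one maximal-length polynomial), and the initial state are illustrative choices of ours, not the exact parameters used in the paper, and the postprocessing variant additionally feeds IID samples through such a register.

def lfsr_msequence(n_bits, taps=(16, 14, 13, 11), state=0xACE1):
    # Fibonacci LFSR; for these taps the output is a maximal-length
    # (M-) sequence with period 2**16 - 1. 'taps' are 1-indexed positions.
    length = max(taps)
    out = []
    for _ in range(n_bits):
        out.append(state & 1)                      # emit the lowest bit
        fb = 0
        for t in taps:                             # XOR of the tapped bits
            fb ^= (state >> (length - t)) & 1
        state = (state >> 1) | (fb << (length - 1))
    return out

print(lfsr_msequence(32))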

For every distribution mentioned above, we also generate a set of 80 simulated datasets, each of which contains 10^6 samples, and estimate the min-entropy by using our proposed predictors and the final 90B's predictors [23].

Figure 7 shows the estimated min-entropy results for the 80 simulated datasets with uniform distributions and near-uniform distributions, respectively. From Figures 7(a) and 7(b), we see that several points of the results obtained from the 90B's predictors are apparently underestimated, which may result from the overfitting phenomenon. Compared with the 90B's predictors, our predictors provide more accurate results.

Figure 8 shows the estimated min-entropy results for normal distributions and time-varying normal distributions, respectively. From Figures 8(a) and 8(b), we can see that the estimated results given by our predictors are close to the theoretical entropy for both normal distributions and time-varying normal distributions. However, the lowest entropy estimation results obtained from the 90B's predictors give significant underestimates.

Figure 9 shows the estimated min-entropy results for Markov distributions. We can see that the 90B's predictors almost always give underestimates compared with the theoretical entropy, while the estimated results given by our predictors are much closer to the theoretical entropy than those obtained from the 90B's predictors.

To compare the accuracy of our predictors and the 90B's predictors more clearly, we apply the predictors to the M-sequence and to the nonuniform distribution sequence by postprocessing using LFSR, whose theoretical entropy is a known and fixed value.

It is further confirmed that, at the higher stages (the maximum step of correlation), the M-sequence and the nonuniform distribution sequence by postprocessing using LFSR are able to pass the NIST SP 800-22 statistical tests [8]. The estimated results are listed in Tables 3 and 4, and the lowest entropy estimations from the 90B's predictors and our predictors for each stage are shown in bold font.

For the M-sequence and the nonuniform distribution by postprocessing using LFSR, the MultiMMC predictor presented in the final 90B gives the most accurate entropy estimation results for stages ≤ 16. However, when the stage of the M-sequence or of the nonuniform distribution by postprocessing using LFSR is greater than 16, the MultiMMC predictor cannot give an accurate entropy estimation result, because this predictor is parameterized by k ∈ {1, 2, ..., 16} (k is the maximum step of correlation). Perhaps we could set the

Figure 8: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy. Estimations for (a) normal distributions and (b) time-varying normal distributions.


parameter of the MultiMMC predictor to a greater range to achieve a more accurate estimated result for the higher stages, but the time complexity grows exponentially with the parameter k, as we analyzed in Section 3.3.2. Moreover, the FNN model can also give accurate estimated results even though the stages of the M-sequence and the LFSR are greater than 16. However, the RNN model can give accurate estimated

results only when the stage is 8. Therefore, the FNN model is better matched to the M-sequence and the nonuniform distribution by postprocessing using LFSR than the RNN.

We also compute the relative errors of the estimated results from the 90B's predictors and our predictors over the 80 sequences from each class of simulated sources. We calculate the relative errors using the min-entropy obtained from the 90B's predictors (the lowest estimation result of the 90B's four predictors) and our predictors (the lowest estimation result of FNN and RNN), respectively. As illustrated in Table 5, for all five classes of simulated sources, the errors of our predictors are lower than those of the 90B's predictors. Specially, our approaches enable the entropy to be estimated with an error of less than 6.55%, whereas it is up to 14.65% for the 90B's predictors. Overall, this indicates that our proposed predictors have a better performance on accuracy than the 90B's predictors for both stationary and nonstationary sequences, which is consistent with the conclusion drawn from the figures above.

From Tables 2-4, we also find that the accuracy of the RNN predictive model is slightly higher than that of the FNN predictive model, except for the cases of the Markov sources, the M-sequence, and the nonuniform distribution by postprocessing using LFSR.

We will further verify the applicability for time-varying sources in Section 4.2. Therefore, through the evaluation of the entropy estimation results on the above simulated datasets, we see that our proposed predictors are superior in accuracy compared with the 90B's predictors.

4.1.2. Real-World Data. We further apply our predictors to datasets generated by RNGs deployed in the real world. In fact, the theoretical entropy per sample is unknown for these real-world sources, so no error can be computed as for the simulated datasets. However, the estimated results from the predictors presented here can still be compared with those of the 90B's predictors, based on the knowledge that underestimates from the predictors have theoretical bounds.

Datasets of real-world data are produced using the following approaches. The first two are adopted in [24], and the others are commonly used typical RNGs. The estimations of the real-world sources are presented in Table 6.

(i) RANDOM.ORG. This is a service that provides random numbers based on atmospheric noise and is used in [24]. It allows the user to specify the

Figure 9: Comparison of the min-entropy estimations obtained from our proposed predictors and the 90B's predictors with the theoretical entropy for Markov distributions.

Table 2: Relative errors of FNN and RNN estimation results.

Simulated data class     FNN (%)   RNN (%)
Uniform                  1.65      1.6
Near-uniform             1.60      1.52
Normal                   1.17      1.08
Time-varying normal      2.12      1.84
Markov                   6.02      7

Table 3: Estimated results for M-sequence (Hmin = 0.000).

Stage        8       10      12      14      16      18       20
MultiMCW     0.991   0.996   0.988   0.989   0.993   0.999*   1.000
Lag          1.000   1.000   1.000   1.000   1.000   1.000    1.000
MultiMMC     0.000   0.000   0.000   0.000   0.000   1.000    1.000
LZ78Y        1.000   1.000   1.000   1.000   1.000   1.000    0.997*
FNN          0.000   0.000   0.000   0.000   0.000   0.000    0.000
RNN          0.000   1.048   1.007   1.002   0.996   0.9920   0.9997

Table 4: Estimated results for nonuniform distribution by postprocessing using LFSR (Hmin = 0.152).

Stage        8       10      12      14      16      18       20
MultiMCW     0.440   0.595   0.743   0.721   0.998   0.994    0.998
Lag          0.581   0.581   0.680   0.680   0.992   0.994    0.999
MultiMMC     0.151   0.153   0.158   0.181   0.234   0.995    0.996
LZ78Y        0.567   0.995   0.766   0.679   0.997   0.996*   0.994*
FNN          0.151   0.145   0.149   0.147   0.149   0.142    0.144
RNN          0.149   0.947   1.012   0.998   1.012   0.997    0.985


minimum and maximum values that are output. The sequence used here consists of bits.

(ii) Ubld.it TrueRNGpro. The TrueRNGpro is a USB random number generator produced by Ubld.it that provides a steady stream of random numbers through a USB CDC serial port. This entropy source is also used in [24]. The sequence used here consists of bits.

(iii) Linux Kernel Entropy Source. The Linux kernel random generator is used for the generation of a real-world sequence without any processing. The sequence used here consists of the last bit of each symbol (a sketch of this bit extraction is given after this list).

(iv) Linux /dev/urandom. The /dev/urandom [6] of Linux is used for the generation of a real-world sequence with strict processing. The sequence used here consists of bits.

(v) Windows RNG. Windows RNG [5] is used for the generation of a real-world sequence by calling a Crypto API. The sequence used here consists of bits.
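As an illustration of how such a real-world bit sequence might be collected, the sketch below reads raw bytes and keeps the least significant bit of each symbol, as described for the Linux kernel source above; the device path and sample count are our own assumptions, not taken from the paper.

def lsb_sequence(path="/dev/urandom", n_symbols=1_000_000):
    # Read n_symbols bytes from a device/file and keep only the last
    # (least significant) bit of each symbol.
    with open(path, "rb") as f:
        data = f.read(n_symbols)
    return [b & 1 for b in data]

bits = lsb_sequence()
print(sum(bits) / len(bits))  # rough sanity check: should be near 0.5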

As illustrated in Table 6, the lowest entropy estimation for each source is shown in bold font. We see that our predictors perform better than the 90B's predictors, because the lowest entropy estimation is always obtained from our work for each real-world source. Furthermore, for the Linux kernel entropy source, we find that both the Lag and the MultiMMC predictors are able to give lower estimation results. This indicates that the Linux kernel entropy source has periodicity and conforms to the Markov model, which is well understood because the randomness of the Linux kernel entropy source comes from human behaviors such as manipulating the mouse and keyboard. In our work, compared with the entropy estimations for the other real-world sources, the FNN fits the Linux kernel entropy source much better than the RNN, which is consistent with the previous view that the FNN performs well in testing Markov sources.

4.2. Comparison on the Scope of Applicability. After evaluating the accuracy, we further validate the scope of applicability of our proposed predictors and compare it with that of the 90B's predictors. Kelsey et al. [24] stated that each of the 90B's predictors performs well only for a special distribution, as described in Section 2.2.1. To prove that our predictors have better applicability, the following four kinds of simulated datasets are generated, each of which is suited to one of the predictors employed in the final 90B.

(i) Time-Varying Sources. The probability distribution of the data source varies with time. The MCW predictor predicts the current output according to previous outputs in a short period of time, and thus the MCW predictor performs well on these data sources.

(ii) Periodic Sources. The data source changes periodically. The lag predictor predicts the value that occurred a fixed number of samples back in the sequence as the current output, and thus the lag predictor performs well on sources with strong periodic behavior.

(iii) Markov Sources. The data sources can be modeled by the Markov model. The MultiMMC predictor predicts the current output according to the Markov model, and thus it performs well on data from any process that can be accurately modeled by a Markov model.

(iv) LZ78Y Sources. The data sources can be efficiently compressed by LZ78-like compression algorithms, which is exactly the case the LZ78Y predictor targets.

For each simulated source above, we generate a set of 10 simulated datasets, each of which contains 10^6 samples, and the min-entropy is estimated by our predictors and by the 90B's predictors. The final result for a predictor is the average of the 10 estimated results corresponding to the 10 simulated datasets of one simulated source.

4.2.1. Time-Varying Sources. Firstly, we generate time-varying binary data that match the statistical behaviors of the MCW predictor presented in the 90B. Table 7 shows the entropy estimation results for the time-varying data.

As shown in Table 7, the symbol gradual(x) (x ∈ [0, 1], the same below) denotes a simulated source in which the probability of outputting "0" changes gradually from x to 1 − x with time. The symbol period(x) denotes a simulated source in which the probability of outputting "0" changes periodically with time, varying from x to 1 − x within one period; the period length is set to 20% of the entire input dataset. The symbol sudden(x) denotes a simulated source in which the probability of outputting "0" changes suddenly with time, namely, the probability is set to x for the first half of the input dataset and 1 − x for the last half. A minimal simulation sketch is given below.
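The sketch below is our own illustrative code for such time-varying binary sources; generation details beyond the definitions above, such as the random seed, are assumptions.

import numpy as np

def time_varying_bits(kind, x, n=10**6, period_frac=0.2, seed=0):
    # Probability of outputting '0' changes with time:
    #  gradual: moves linearly from x to 1 - x over the whole dataset
    #  period:  sweeps from x to 1 - x within each period (20% of n)
    #  sudden:  switches from x to 1 - x at the midpoint
    t = np.arange(n)
    if kind == "gradual":
        p0 = x + (1 - 2 * x) * t / (n - 1)
    elif kind == "period":
        period = int(period_frac * n)
        p0 = x + (1 - 2 * x) * (t % period) / (period - 1)
    elif kind == "sudden":
        p0 = np.where(t < n // 2, x, 1 - x)
    else:
        raise ValueError(kind)
    rng = np.random.default_rng(seed)
    return (rng.random(n) >= p0).astype(np.uint8)  # '0' occurs with prob p0

bits = time_varying_bits("gradual", 0.2)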

In Table 7, the estimation results for the MCW predictor and for our work are shown in bold font. We see that the MCW predictor gives the lowest and most accurate entropy estimations for the three types of time-varying data mentioned above, although it gives slight underestimates at gradual(0.2) and period(0.2). This confirms that the time-varying sources mentioned above match the statistical behaviors of the MCW predictor. Relatively, we find that our proposed predictive models are all capable of obtaining satisfactory entropy estimations that are close to the correct values. Therefore, our proposed predictive models are suitable for the time-varying data mentioned above. Note that we calculate the min-entropy estimate over the entire dataset rather than over the last 20% of the input dataset for these time-varying sources, because the probability distribution varies with time and a part of the input dataset cannot represent its overall distribution.

Table 5: Relative errors of the final estimations of the 90B's predictors and our predictors for five classes of simulated sources.

Simulated data class     90B's predictors (%)   Our predictors (%)
Uniform                  4.37                   1.53
Near-uniform             3.47                   1.59
Normal                   6.08                   1.57
Time-varying normal      3.47                   1.72
Markov                   14.65                  6.55


4.2.2. Periodic Sources. Secondly, we generate periodic data that match the statistical behaviors of the lag predictor presented in the 90B. The entropy estimation results for periodic sequences are given in Table 8. The data source completely obeys the periodic rule, so the correct entropy is zero. The bit width of the samples is traversed from 2 to 8; a simple generation sketch follows.
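One simple way to simulate such a periodic source with a given bit width is sketched below (the period length and the seed are arbitrary choices of ours, not taken from the paper):

import numpy as np

def periodic_symbols(bit_width, n=10**6, period=64, seed=1):
    # Deterministic periodic symbol sequence over 2**bit_width symbols;
    # its correct min-entropy is 0 because the sequence is fully predictable.
    pattern = np.random.default_rng(seed).integers(0, 2**bit_width, size=period)
    return np.resize(pattern, n)  # tile the fixed pattern to length n

seq = periodic_symbols(bit_width=4)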

As shown in Table 8, the estimation results for the lag predictor and for our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated periodic sources, we confirm that the lag predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimates, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the entropy estimation of (strongly) periodic data. In addition, the MultiMMC predictor can also give accurate min-entropy estimations, which is reasonable because periodicity is also a form of correlation.

4.2.3. Markov Sources. Next, we generate multivariate M-sequences as Markov sources, which fit the statistical behaviors of the MultiMMC predictor. Specifically, the multivariate M-sequences are composed of multiple M-sequences with different initial states. Due to the determinacy of this type of sequence, the correct entropy is zero. The bit width of the samples is also traversed from 2 to 8. The maximum step of correlation used here is set to 8. Table 9 shows the estimated results for the multivariate M-sequences.

In Table 9, the estimation results for the MultiMMC predictor and for our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated Markov sources, we confirm that the MultiMMC predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimations, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the Markov sources.

4.2.4. LZ78Y Sources. Finally, we verify the applicability to the LZ78Y sources. This type of entropy source is difficult to generate by simulation. However, we can still conclude that our proposed predictive models can be applied to the LZ78Y sources according to the results marked in italic font in Tables 8 and 9, because the periodic data and the Markov sequences are compressible.

4.2.5. Summary on the Applicability Scope of Our Predictors. By analyzing the experimental results of the above four specific simulated sources, each of which is oriented towards a certain predictor in the 90B, we conclude that our predictors provide accurate entropy estimates. So the proposed predictors apply to these entropy sources as well as the 90B's predictors do. In addition, compared with the 90B's predictors, our predictors have a better performance on the scope of applicability for testing the

Table 7: Entropy estimates for time-varying data.

Data class      Correct   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
gradual(0.2)    0.6345    0.5290   0.7808   0.7240     0.7790   0.6288   0.6289
gradual(0.3)    0.7437    0.7378   0.9221   0.8416     0.9243   0.7430   0.7460
gradual(0.4)    0.8645    0.8631   0.9786   0.9518     0.9739   0.8648   0.8637
period(0.2)     0.6345    0.5537   0.7428   0.5537     0.7669   0.6205   0.6209
period(0.3)     0.7437    0.7393   0.9218   0.8476     0.9233   0.7377   0.7375
period(0.4)     0.8645    0.8639   0.9767   0.9632     0.9796   0.8653   0.8632
sudden(0.2)     0.3219    0.3203   0.4663   0.3386     0.4484   0.3217   0.3229
sudden(0.3)     0.5146    0.5110   0.5857   0.9984     0.7663   0.5110   0.5119
sudden(0.4)     0.7370    0.7338   0.8699   0.9984     0.9389   0.7339   0.7345
(MCW, Lag, MultiMMC, LZ78Y: 90B's predictors; FNN, RNN: our work.)

Table 8: Entropy estimates for periodic sequences.

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.6458   0.0000   0.0000     1.1817   0.0079   0.0000
3           2.3318   0.0000   0.0000     1.5957   0.1315   0.0000
4           2.9147   0.0000   0.0000     1.8016   0.4748   0.0000
5           3.3269   0.0000   0.0000     1.4586   0.8898   0.0000
6           3.9092   0.0000   0.0000     0.8322   3.4944   0.0000
7           4.4908   0.0000   0.0000     0.3973   3.4960   0.0000
8           4.4919   0.0000   0.0000     0.2027   3.5408   0.0000
(MCW, Lag, MultiMMC, LZ78Y: 90B's predictors; FNN, RNN: our work.)
*The results shown in italic font are used to analyze the applicability for the LZ78Y sources.

Table 6: Entropy estimates for real-world sources.

Real-world sources            MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
RANDOM.ORG                    0.9951   0.9963   0.9966     0.9976   0.9802   0.9954
Ubld.it TrueRNGpro            0.9979   0.9955   0.9973     0.9966   0.9934   0.9728
Linux kernel entropy source   0.6173   0.1232   0.1269     0.6164   0.1230   0.3068
Linux /dev/urandom            0.9952   0.9935   0.9990     0.9964   0.9983   0.9911
Windows RNG                   0.9953   0.9986   0.9975     0.9984   0.9833   0.9853
(MCW, Lag, MultiMMC, LZ78Y: 90B's predictors; FNN, RNN: our work.)


datasets with long-range correlation, as presented in Section 4.1.1.

4.3. Comparison on Execution Efficiency. We implement our predictors and the final 90B's predictors using Python 3.6, and the version of TensorFlow is 1.3.11. All the following tests are conducted on a computer with an Intel Core i7 CPU and 32 GB RAM.

Table 10 shows the mean execution time of our predictors in comparison with that of the final 90B's predictors and of the second draft of the 90B's predictors. Each experimental result in Table 10 is the average value obtained from 50 repeated experiments. Note that the definitions of the parameters n, s, and k are the same as in Section 3.3.2.

From the listed mean execution times with different scales (n, s) in Table 10, it can be seen that when n = 10^6, the mean execution time of our predictors is much lower, and increases more slowly with any s, than that of the final 90B's predictors. In other words, the average execution efficiency of our predictors is about 7 to 10 times higher than that of the final 90B's predictors for the different sample spaces s when the sample size n is 10^6. In particular, when n = 10^8, the mean execution time of the final 90B's predictors far exceeds that of our predictors regardless of the size of the sample space, and it is too long (over three days) to calculate the estimated results in the cases s ≥ 2^2.

In terms of the execution efficiency of the 90B's predictors, we also find that the mean execution time of the final 90B's predictors is much higher than that of the second draft of the 90B's predictors; actually, the final 90B's mean execution time is about twice that of the second draft. This could be caused by the characteristics of some estimators which are limited to binary inputs only. Because the collision estimator, the Markov estimator, and the compression estimator are only suitable for binary input (0 or 1), as stated in [23], for nonbinary inputs the 90B's estimators not only calculate the original symbol entropy but also convert the input into binary form to calculate the bit entropy and finally obtain the min-entropy. This greatly increases the mean execution time.

4.4. General Discussion. For most of the entropy sources which have been tested, the RNN gives more accurate estimations than the FNN. The better accuracy of the RNN predictive model may be due to the following reasons. On the one hand, the RNN adds feedback connections to the network, i.e., it considers not only the relationship between the current output and the previous observations but also the relationship among the previous observations. On the other hand, the RNN one-hot-encodes the training dataset for better forecasting of categorical data. On the contrary, for the Markov sources, the M-sequence, and the nonuniform distribution by postprocessing using LFSR, the current output is only related to the previous observations, which fits the FNN predictive model well, and thus the FNN provides more accurate estimated results.
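As a small illustration of the one-hot encoding mentioned above (a generic sketch, not the exact preprocessing code of the paper):

import numpy as np

def one_hot(symbols, num_classes):
    # One-hot encode a 1-D array of integer symbols for categorical input.
    encoded = np.zeros((len(symbols), num_classes), dtype=np.float32)
    encoded[np.arange(len(symbols)), symbols] = 1.0
    return encoded

print(one_hot(np.array([0, 2, 1]), num_classes=4))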

5. Conclusions and Future Work

Entropy estimation provides a crucial evaluation for the security of RNGs, and the predictor serves as a universal sanity check for entropy estimation. In this work, we provide, for the first time, several new approaches to estimate the min-entropy of entropy sources using predictors based on neural networks (i.e., FNN and RNN). In particular, we design a novel scheme for the proposed entropy estimation based on neural network models, including the execution strategy and parameter settings. In order to evaluate the quality of the proposed predictors, we collect various types of simulated sources, both stationary and nonstationary, whose correct entropy can be derived from the known probability distribution, and the theoretical result is further verified by experiments on real-world sources. We also compare our method with the predictors defined in NIST SP 800-90B (published in 2018), which is a commonly used standard for evaluating the validation of entropy sources. Our assessment experiments are carried out in three aspects, namely, accuracy, scope of applicability, and computational complexity. The experimental results demonstrate that the entropy estimations obtained from our proposed predictors are more accurate than those of the 90B's predictors, and our predictors have a remarkably wider scope of applicability. In addition, the

Table 9: Entropy estimates for multivariate M-sequences.

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.9010   2.0000   0.0000     2.0000   0.0005   0.0000
3           2.9906   3.0000   0.0000     2.4940   0.0000   0.0000
4           3.4037   4.0000   0.0000     4.0000   0.0021   0.0000
5           4.9753   5.0000   0.0000     1.2269   0.0041   0.0000
6           5.3916   6.0000   0.0000     1.2905   0.0394   0.0000
7           5.3916   6.0000   0.0000     1.9881   0.0280   0.0000
8           7.0000   7.0000   0.0000     0.6611   0.8635   0.0000
(MCW, Lag, MultiMMC, LZ78Y: 90B's predictors; FNN, RNN: our work.)
*The results shown in italic font are used to analyze the applicability for the LZ78Y sources.

Table 10: Comparison on the execution efficiency of min-entropy estimation of our study and the 90B's predictors.

n      s      Final 90B (s)   Old 90B (s)   Our predictors (s)
10^6   2^1    921             561           136
10^6   2^2    1,058           525           138
10^6   2^3    1,109           574           149
10^6   2^4    1,235           598           174
10^6   2^5    1,394           630           190
10^6   2^6    1,683           785           186
10^6   2^7    2,077           938           264
10^6   2^8    2,618           1,298         272
10^8   2^1    52,274          47,936        9,184
10^8   2^2    —               —             9,309
10^8   2^3    —               —             9,385
10^8   2^4    —               —             9,836
10^8   2^5    —               —             10,986
10^8   2^6    —               —             13,303
10^8   2^7    —               —             17,649
10^8   2^8    —               —             20,759


computational complexity of our method is, in the theoretical analysis, obviously lower than that of the 90B's as the sample space and sample size grow. The average execution efficiency of our predictors is about 7 to 10 times higher than that of the 90B's predictors for different sample spaces when the sample size is 10^6. Specially, the 90B's predictors cannot calculate a result due to the huge time complexity when the sample space s is over 2^2 with the parameter of maximum step k = 16 and sample size n = 10^8; relatively, our method is able to provide a satisfactory result for entropy sources with a large sample space and long dependence.

Future work will aim at designing specific neural network predictive models for min-entropy estimation for particular entropy sources. Our future work will also focus on applying this new method to estimate entropy in more application areas, such as the randomness sources (sensors and other sources) in mobile terminals.

Data Availability

RANDOM.ORG data used to support the findings of this study can be accessed from https://www.random.org. Ubld.it TrueRNGpro, Linux kernel entropy source, Linux /dev/urandom, and Windows RNG data used to support the findings of this study can be obtained from the relevant listed references.

Disclosure

A preliminary version of this paper appeared under the title "Neural Network Based Min-entropy Estimation for Random Number Generators" in Proc. Security and Privacy in Communication Networks, 14th EAI International Conference, SecureComm 2018, Singapore, August 8-10, 2018 [36]. Dr. Jing Yang participated in this work when she studied at the Chinese Academy of Sciences, and she now works at the China Information Technology Security Evaluation Center, Beijing, China.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2018YFB0804300), the National Natural Science Foundation of China (Nos. 61872357 and 61802396), and the National Cryptography Development Fund of China (No. MMJJ20180113).

References

[1] I. Kanter, Y. Aviad, I. Reidler, E. Cohen, and M. Rosenbluh, "An optical ultrafast random bit generator," Nature Photonics, vol. 4, no. 1, pp. 58–61, 2010.

[2] P. Li, A. Wang, Y. Guo et al., "Ultrafast fully photonic random bit generator," Journal of Lightwave Technology, vol. 36, no. 12, pp. 2531–2540, 2018.

[3] P. Li, K. Li, X. Guo et al., "Parallel optical random bit generator," Optics Letters, vol. 44, no. 10, pp. 2446–2449, 2019.

[4] A. Uchida, K. Amano, M. Inoue et al., "Fast physical random bit generation with chaotic semiconductor lasers," Nature Photonics, vol. 2, no. 12, pp. 728–732, 2008.

[5] L. Dorrendorf, Z. Gutterman, and B. Pinkas, "Cryptanalysis of the random number generator of the Windows operating system," ACM Transactions on Information and System Security, vol. 13, no. 1, pp. 1–32, 2009.

[6] Z. Gutterman, B. Pinkas, and T. Reinman, "Analysis of the Linux random number generator," in Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P 2006), pp. 371–385, Berkeley, CA, USA, May 2006.

[7] M. Vanhoef and F. Piessens, "Predicting, decrypting, and abusing WPA2/802.11 group keys," in Proceedings of the 25th USENIX Security Symposium, pp. 673–688, Austin, TX, USA, August 2016.

[8] A. L. Rukhin, J. Soto, J. R. Nechvatal et al., SP 800-22 Rev. 1a: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, NIST Special Publication, Gaithersburg, MD, USA, 2010.

[9] W. Killmann and W. Schindler, AIS 31: Functionality Classes and Evaluation Methodology for True (Physical) Random Number Generators, Version 3.1, T-Systems GEI GmbH and Bundesamt für Sicherheit in der Informationstechnik (BSI), Bonn, Germany, 2001.

[10] G. Marsaglia, "The Marsaglia random number CDROM including the diehard battery of tests of randomness," 1996, http://www.stat.fsu.edu/pub/diehard.

[11] P. L'Ecuyer and R. J. Simard, "TestU01: a C library for empirical testing of random number generators," ACM Transactions on Mathematical Software, vol. 33, no. 4, 2007.

[12] ISO/IEC JTC 1/SC 27, ISO/IEC 18031: Information Technology – Security Techniques – Random Bit Generation, Berlin, Germany, 2011.

[13] E. Barker and J. Kelsey, "NIST draft special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2012, http://csrc.nist.gov/publications/drafts/800-90/draft-sp800-90b.pdf.

[14] M. Baudet, D. Lubicz, J. Micolod, and A. Tassiaux, "On the security of oscillator-based random number generators," Journal of Cryptology, vol. 24, no. 2, pp. 398–425, 2011.

[15] W. Killmann and W. Schindler, "A design for a physical RNG with robust entropy estimators," in Proceedings of the 10th International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2008), pp. 146–163, Washington, DC, USA, August 2008.

[16] Y. Ma, J. Lin, T. Chen, C. Xu, Z. Liu, and J. Jing, "Entropy evaluation for oscillator-based true random number generators," in Proceedings of the 16th International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2014), pp. 544–561, Busan, South Korea, September 2014.

[17] Y. Ma, J. Lin, and J. Jing, "On the entropy of oscillator-based true random number generators," in Proceedings of the Cryptographers' Track at the RSA Conference, pp. 165–180, Springer, San Francisco, CA, USA, February 2017.

[18] P. Li, J. Zhang, L. Sang et al., "Real-time online photonic random number generation," Optics Letters, vol. 42, no. 14, pp. 2699–2702, 2017.

[19] X. Ma, F. Xu, H. Xu, X. Tan, B. Qi, and H. K. Lo, "Postprocessing for quantum random-number generators: entropy evaluation and randomness extraction," Physical Review A, vol. 87, no. 6, pp. 062327-1–062327-10, 2013.

[20] K. Ugajin, Y. Terashima, K. Iwakawa et al., "Real-time fast physical random number generator with a photonic integrated circuit," Optics Express, vol. 25, no. 6, pp. 6511–6523, 2017.

[21] F. Xu, B. Qi, X. Ma, H. Xu, H. Zheng, and H.-K. Lo, "Ultrafast quantum random number generation based on quantum phase fluctuations," Optics Express, vol. 20, no. 11, pp. 12366–12377, 2012.

[22] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "(Second draft) NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2016, https://csrc.nist.gov/CSRC/media/Publications/sp/800-90b/draft/documents/sp800-90b_second_draft.pdf.

[23] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2018, https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90B.pdf.

[24] J. Kelsey, K. A. McKay, and M. S. Turan, "Predictive models for min-entropy estimation," in Proceedings of the 17th International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2015), pp. 373–392, Saint-Malo, France, September 2015.

[25] S. Aras and I. D. Kocakoç, "A new model selection strategy in time series forecasting with artificial neural networks: IHTS," Neurocomputing, vol. 174, pp. 974–987, 2016.

[26] J. P. Donate, X. Li, G. G. Sanchez, and A. S. de Miguel, "Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm," Neural Computing and Applications, vol. 22, no. 1, pp. 11–20, 2013.

[27] J. C. Luna-Sanchez, E. Gomez-Ramirez, K. Najim, and E. Ikonen, "Forecasting time series with a logarithmic model for the polynomial artificial neural networks," in Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN 2011), pp. 2725–2732, San Jose, CA, USA, 2011.

[28] C. de Groot and D. Wurtz, "Analysis of univariate time series with connectionist nets: a case study of two classical examples," Neurocomputing, vol. 3, no. 4, pp. 177–192, 1991.

[29] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[30] X. Cai, N. Zhang, G. K. Venayagamoorthy, and D. C. Wunsch II, "Time series prediction with recurrent neural networks trained by a hybrid PSO-EA algorithm," Neurocomputing, vol. 70, no. 13–15, pp. 2342–2353, 2007.

[31] A. Jain and A. M. Kumar, "Hybrid neural network models for hydrologic time series forecasting," Applied Soft Computing, vol. 7, no. 2, pp. 585–592, 2007.

[32] J. M. P. Menezes Jr. and G. A. Barreto, "Long-term time series prediction with the NARX network: an empirical evaluation," Neurocomputing, vol. 71, no. 16–18, pp. 3335–3343, 2008.

[33] P. Hagerty and T. Draper, "Entropy bounds and statistical tests," 2012, https://csrc.nist.gov/csrc/media/events/random-bit-generation-workshop-2012/documents/hagerty_entropy_paper.pdf.

[34] S. Zhu, Y. Ma, T. Chen, J. Lin, and J. Jing, "Analysis and improvement of entropy estimators in NIST SP 800-90B for non-IID entropy sources," IACR Transactions on Symmetric Cryptology, no. 3, pp. 151–168, 2017.

[35] A. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, USA, 1996.

[36] J. Yang, S. Zhu, T. Chen, Y. Ma, N. Lv, and J. Lin, "Neural network based min-entropy estimation for random number generators," in Proceedings of the 14th International Conference on Security and Privacy in Communication Networks (SecureComm 2018), pp. 231–250, Singapore, August 2018.


Page 10: High-Efficiency Min-Entropy Estimation Based on Neural Network … · 2020. 2. 17. · the outputs of RNGs, such as the ISO/IEC 18031 [12] and AIS 31 [9]. ere are many types of methods

eoretical entropyFNNRNN

0

1

2

3

4

5

6

7

8Es

timat

ed en

trop

y pe

r sam

ple

1 2 3 4 5 6 7 80eoretical entropy per sample

(a)

eoretical entropyFNNRNN

1 2 3 4 5 6 7 80eoretical entropy per sample

0

1

2

3

4

5

6

7

8

Estim

ated

entr

opy

per s

ampl

e

(b)

Figure 4 Comparison of estimated results obtained from our two predictive models with the theoretical entropy Estimations for(a) uniform distributions and (b) near-uniform distributions

eoretical entropyFNNRNN

1 2 3 4 5 6 70eoretical entropy per sample

0

1

2

3

4

5

6

7

Estim

ated

entr

opy

per s

ampl

e

(a)

eoretical entropyFNNRNN

1 2 3 4 5 6 70eoretical entropy per sample

0

1

2

3

4

5

6

7

Estim

ated

entro

py p

er sa

mpl

e

(b)

Figure 5 Comparison of estimated results obtained from our two predictive models with the theoretical entropy Estimations for (a) normaldistributions and (b) time-varying normal distributions

10 Security and Communication Networks

Theoretical entropyFNNRNN

0

1

2

3

4

5

6

7

Estim

ated

entr

opy

per s

ampl

e

1 2 3 4 5 6 70Theoretical entropy per sample

Markov distributions

Figure 6 Comparison of estimated results obtained from our two predictive models with the theoretical entropy for Markov distributions

eoretical entropy90Brsquos predictorsOur predictors

1 2 3 4 5 6 7 80eoretical entropy per sample

0

1

2

3

4

5

6

7

8

Estim

ated

entr

opy

per s

ampl

e

(a)

eoretical entropy90Brsquos predictorsOur predictors

1 2 3 4 5 6 7 80eoretical entropy per sample

0

1

2

3

4

5

6

7

8

Estim

ated

entr

opy

per s

ampl

e

(b)

Figure 7 Comparison of the min-entropy estimation obtained from our proposed predictors and 90Brsquos predictors with the theoreticalentropy Estimations for (a) uniform distributions and (b) near-uniform distributions

Security and Communication Networks 11

(i) M-Sequence A maximum length sequence which isa type of pseudorandom binary sequence ([35])

(ii) Nonuniform Distribution by Postprocessing UsingLFSR e samples are processed using a linearfeedback shifting register (LFSR) which come froman IID source ([35])

For every distribution mentioned above we also gen-erate a set of 80 simulated datasets each of which contains106 samples and estimate min-entropy by using our pro-posed predictors and final 90Brsquos predictors [23]

Figure 7 shows the estimated min-entropy results for the80 simulated datasets with uniform distributions and near-uniform distributions respectively From Figures 7(a) and7(b) we see that several points of the results obtained fromthe 90Brsquos predictors are apparently underestimated whichmay result from the overfitting phenomenon Comparedwith 90Brsquos predictors our predictors provide more accurateresults

Figure 8 shows the estimated min-entropy results fornormal distributions and time-varying normal distributionsrespectively From Figures 8(a) and 8(b) we can see that theestimated results given by our predictors are close to thetheoretical entropy with normal distributions and time-varying normal distributions However the lowest entropyestimation results obtained from the 90Brsquos predictors givesignificant underestimates

Figure 9 shows the estimated min-entropy results forMarkov distributions We can see that the 90Brsquos predictorsalmost give underestimates compared with the theoreticalentropy while estimated results given by our predictors aremuch closer to the theoretical entropy than those obtainedfrom 90Brsquos predictors

To further obviously compare the accuracy of our and90Brsquos predictors we apply the predictors to the M-sequenceand the non-uniform distribution sequence by post-pro-cessing using LFSR and their theoretical entropy is a knownand fixed value

It is further confirmed that the higher stage (the max-imum step of correlation) M-sequence and nonuniformdistribution sequence by postprocessing using LFSR are ableto pass the NIST SP 800-22 statistical tests [8] e estimatedresults are listed in Tables 3 and 4 and the lowest entropyestimations from 90Brsquos predictors and our predictors foreach stage are shown in bold font

For M-sequence and nonuniform distribution by post-processing using LFSR the MultiMMC predictor presentedin the final 90B gives the most accurate entropy estimationresults for the stage le 16 However when the stage ofM-sequence and nonuniform distribution by postprocessingusing LFSR is greater than 16 the MultiMMC predictorcannot give accurate entropy estimation result because thispredictor is parameterized by k isin 1 2 16 (k is themaximum step of correlation) Perhaps we could set the

eoretical entropy90Brsquos predictorsOur predictors

1 2 3 4 5 6 70eoretical entropy per sample

0

1

2

3

4

5

6

7Es

timat

ed en

trop

y pe

r sam

ple

(a)

eoretical entropy90Brsquos predictorsOur predictors

0

1

2

3

4

5

6

7

Estim

ated

entr

opy

per s

ampl

e1 2 3 4 5 6 70

eoretical entropy per sample

(b)

Figure 8 Comparison of the min-entropy estimations obtained from our proposed predictors and 90Brsquos predictors with the theoreticalentropy Estimations for (a) normal distributions and (b) time-varying normal distributions

12 Security and Communication Networks

parameter of the MultiMMC predictor as a greater range toachieve a more accurate estimated result for the higher stagebut the time complexity grows exponentially with the pa-rameter k as we analyzed in Section 332 Moreover theFNN model can also give accurate estimated results eventhough the stages of M-sequence and LFSR are greater than16 However the RNN model can give accurate estimated

results only when the stage is 8erefore the FNNmodel ismore matched to M-sequence and nonuniform distributionby postprocessing using LFSR than RNN

We also compute the relative errors of estimated resultsfrom 90Brsquos predictors and our predictors over 80 se-quences from each class of simulated sources We calculatethe relative errors using the min-entropy obtained from90Brsquos predictors (the lowest estimation result of 90Brsquos fourpredictors) and our predictors (the lowest estimation re-sult of FNN and RNN) respectively As illustrated inTable 5 it shows that for all five classes of simulatedsources the errors of our predictors are lower than that ofthe 90Brsquos predictors Specially our approaches enable theentropy to be estimated with an error of less than 655but it is up to 1465 for 90Brsquos predictors Overall thisindicates that our proposed predictors have a betterperformance than that of 90Brsquos predictors on accuracy forboth stationary sequences and nonstationary sequenceswhich is consistent with the conclusion drawn in thefigures above

From Tables 2ndash4 we also find that the accuracy of theRNN predictive model is slightly higher than that of theFNN predictive model except for the cases of the Markovsources M-sequence and nonuniform distribution bypostprocessing using LFSR

We will further verify the applicability for time-varyingsources in Section 42 erefore through the evaluation onthe entropy estimation results of the above simulateddatasets we see that our proposed predictors are superior inaccuracy compared with the 90Brsquos predictors

412 Real-World Data We further apply our predictors tothe datasets which are generated from the RNGs deployed inthe real-world In fact the theoretical entropy of per sampleis unknown for these real-world sources so no error can becompared like the simulated datasets for the predictorsHowever the estimated results from the predictors pre-sented here can still be compared to the 90Brsquos predictorsbased on the knowledge that underestimates from thepredictors have theoretical bounds

Datasets of real-world data are produced using thefollowing approaches e first two are adopted in [24] andthe others are commonly used typical RNGs e estima-tions of the real-world sources are presented in Table 6

(i) RANDOMORG is is a service that providesrandom numbers based on atmospheric noise and isused in [24] It allows the user to specify the

Theoretical entropy90Brsquos predictorsOur predictors

1 2 3 4 5 6 70Theoretical entropy per sample

Markov distributions

0

1

2

3

4

5

6

7

Estim

ated

entr

opy

per s

ampl

e

Figure 9 Comparison of the min-entropy estimations obtainedfrom our proposed predictors and 90Brsquos predictors with the the-oretical entropy for Markov distributions

Table 2 Relative errors of FNN and RNN estimation results

Simulated data class FNN () RNN ()Uniform 165 16Near-uniform 160 152Normal 117 108Time-varying normal 212 184Markov 602 7

Table 3 Estimated results for M-sequence (Hmin 0000)

Stage 8 10 12 14 16 18 20MultiMCW 0991 0996 0988 0989 0993 0999 lowast 1000Lag 1000 1000 1000 1000 1000 1000 1000MultiMMC 0000 0000 0000 0000 0000 1000 1000LZ78Y 1000 1000 1000 1000 1000 1000 0997 lowast

FNN 0000 0000 0000 0000 0000 0000 0000RNN 0000 1048 1007 1002 0996 09920 09997

Table 4 Estimated results for nonuniform distribution by post-processing using LFSR (Hmin 0152)

Stage 8 10 12 14 16 18 20MultiMCW 0440 0595 0743 0721 0998 0994 0998Lag 0581 0581 0680 0680 0992 0994 0999MultiMMC 0151 0153 0158 0181 0234 0995 0996LZ78Y 0567 0995 0766 0679 0997 0996lowast 0994lowast

FNN 0151 0145 0149 0147 0149 0142 0144RNN 0149 0947 1012 0998 1012 0997 0985

Security and Communication Networks 13

minimum and maximum values that are outpute sequence used here consists of bits

(ii) Ubldit TrueRNGpro TrueRNGpro provides a steadystream of random numbers through a USB CDCserial port which is a USB randomnumber generatorproduced by Ubldit is entropy source is also usedin [24] e sequence used here consists of bits

(iii) Linux kernel entropy source e Linux kernelrandom generator is used for the generation of areal-world sequence without any processing esequence used here is the last bit of per symbol

(iv) Linuxdevurandomedevurandom [6] of Linuxis used for the generation of a real-world sequencewith strict processing e sequence used hereconsists of bits

(v) Windows RNG Windows RNG [5] is used for thegeneration of a real-world sequence by calling aCrypto APIe sequence used here consists of bits

As illustrated in Table 6 the lowest entropy estimationfor each source is shown in bold font We see that ourpredictors perform better than 90Brsquos predictors because thelowest entropy estimation is always obtained from our workfor each real-world source Furthermore for Linux kernelentropy source we find that both of the predictor Lag andMultiMMC are able to give lower estimation results Itindicates that Linux kernel entropy source has periodicityand conforms to the Markov model which is well under-stood because the randomness of Linux kernel entropysource comes from human behaviors such as manipulatingthe mouse and keyboard In our work compared with theentropy estimations for other real-world sources FNN fitsmuch better than RNN for Linux kernel entropy sourcewhich is consistent with the previous view that FNN per-forms well in testing Markov sources

42 Comparison on the Scope of Applicability After evalu-ating the accuracy we further validate the scope of appli-cability of our proposed predictors and compare them withthat of the 90Brsquos predictors Kelsey et al [24] stated that eachof the 90Brsquos predictors performs well only for a specialdistribution as described in Section 221 To prove ourpredictor has better applicability the following four simu-lated datasets are generated which are suitable for eachpredictor employed in the final 90B

(i) Time-Varying Sourcese probability distribution ofdata sources is varying with timeeMCWpredictor

predicts the current output according to previousoutputs in a short period of time and thus the MCWpredictor performs well in these data sources

(ii) Periodic Sources e data source changes periodi-cally e lag predictor predicts the value that oc-curred samples back in the sequence as the currentoutput and thus the lag predictor performs well onsources with strong periodic behavior

(iii) Markov Sources e data sources can be modeledby the Markov model e MultiMMC predictorpredicts the current output according to theMarkovmodel and thus the MultiMMC predictor performswell on data from any process that can be accuratelymodeled by a Markov model

(iv) LZ78Y Sources e data sources can be efficientlycompressed by LZ78-like compression algorithmswhich applies to the LZ78Y predictor well

For each above simulated source we generate a set of 10simulated datasets each of which contains 106 samples andthe min-entropy is estimated by our and 90Brsquos predictorse final result for a predictor is the average value of 10estimated results corresponding to the 10 simulated datasetsfor one simulated source

421 Time-Varying Sources Firstly we generate the time-varying binary data which is suitable for the statistical be-haviors of the MCW predictor presented in the 90B Table 7shows the entropy estimation results for time-varying data

As shown in Table 7 symbol gradual(x) (x isin [0 1] thesame below) is defined as a simulated source that theprobability of output ldquo0rdquo changes gradually from x to 1 minus x

with time Symbol period(x) is defined as a simulated sourcethat the probability of output ldquo0rdquo changes periodically withtime and the probability varies from x to 1 minus x in one periode period length is set to 20 of the entire input datasetSymbol sudden(x) is defined as a simulated source that theprobability of output ldquo0rdquo changes suddenly with timenamely the probability is set to x for the first half of the inputdataset and 1 minus x for the last half

From Table 7 the estimation results for MCW predictorand our work are shown in bold font We see that the MCWpredictor gives the lowest and most accurate entropy esti-mations for the three types of time-varying data mentionedabove but it gives a little underestimates at gradual(02) andperiod(02) It is confirmed that the time-varying sourcesmentioned above match with the statistical behaviors of theMCW predictor Relatively we find that our proposedpredictive models are all capable to obtain the satisfiedentropy estimations that are close to the correct valueserefore it is proved that our proposed predictive modelsare suitable for the time-varying datamentioned above Notethat we calculate the min-entropy estimate according to theentire dataset rather than the last 20 of the input dataset forthese time-varying sources Because the probability distri-bution is varying with time the part of the input datasetcannot represent the overall distribution of the input dataset

Table 5 Relative errors of the final estimations of 90Brsquos predictorsand our predictors for five classes of simulated sources

Simulated data class 90Brsquos predictors () Our predictors ()Uniform 437 153Near-uniform 347 159Normal 608 157Time-varying normal 347 172Markov 1465 655

14 Security and Communication Networks

422 Periodic Sources Secondly we generate periodic datawhich are suitable for the statistical behaviors of the lagpredictor presented in 90B e following is entropy esti-mation results for periodic sequences e data source iscompletely obeying the periodic rule so the correct entropyis zero e bit width of samples is traversed from 2 to 8

As shown in Table 8 the estimation results for the lagpredictor and our work are shown in bold font According tothe correct entropy (is equal to 0) of the simulated periodicsources we confirm that the lag predictor is suitable for theentropy estimation of this type of source as expected Rel-atively the RNN can also give the accurate min-entropyestimates ie estimated results are zeros us our pro-posed predictive models are suitable for the entropy esti-mation of the (strong) periodic data In addition theMultiMMC predictor can also give the accurate min-entropyestimations is is reasonable because periodicity is also aform of correlation

423 Markov Sources Next we generate multivariateM-sequences as Markov sources which fit the statisticalbehaviors of the MultiMMC predictor Specifically themultivariate M-sequences are composed of multiple M-se-quences with different initial states Due to the determinacyof this type of sequences the correct entropy is zero e bitwidth of the samples is also traversed from 2 to 8 emaximum step of correlation used here is set as 8 Table 9shows the estimated results for multivariate M-sequences

From Table 9 the estimation results for MultiMMCpredictor and our work are shown in bold font According tothe correct entropy (is equal to 0) of the simulated Markovsources we confirm that theMultiMMC predictor is suitablefor the entropy estimation of this type of source as expectedRelatively the RNN can also give the accurate min-entropyestimations ie estimated results are zeros us our

proposed predictive models are suitable for the Markovsources

424 LZ78Y Sources Finally we verify the applicability ofthe LZ78Y sources is type of entropy source is difficult togenerate by simulating However we can still draw theconclusion that our proposed predictive models can beapplied to the LZ78Y sources according to Tables 8 and 9 initalic font Because the periodic data and Markov sequencesare compressible

425 Summary on Applicability Scope of Our PredictorsBy analyzing the experimental results of the above fourspecific simulated sources each of which is oriented towardsa certain predictor in the 90B we have a conclusion that ourpredictors can provide accurate estimated results of entropySo the proposed predictors are well applied to these entropysources as well as the 90Brsquos predictors In addition com-pared with 90Brsquos predictors our predictors have a betterperformance on the scope of applicability for testing the

Table 7 Entropy estimates for time-varying data

Data class Correct90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNNgradual(02) 06345 05290 07808 07240 07790 06288 06289gradual(03) 07437 07378 09221 08416 09243 07430 07460gradual(04) 08645 08631 09786 09518 09739 08648 08637period(02) 06345 05537 07428 05537 07669 06205 06209period(03) 07437 07393 09218 08476 09233 07377 07375period(04) 08645 08639 09767 09632 09796 08653 08632sudden(02) 03219 03203 04663 03386 04484 03217 03229sudden(03) 05146 05110 05857 09984 07663 05110 05119sudden(04) 07370 07338 08699 09984 09389 07339 07345

Table 8 Entropy estimates for periodic sequences

Bit width90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNN2 16458 00000 00000 11817 00079 000003 23318 00000 00000 15957 01315 000004 29147 00000 00000 18016 04748 000005 33269 00000 00000 14586 08898 000006 39092 00000 00000 08322 34944 000007 44908 00000 00000 03973 34960 000008 44919 00000 00000 02027 35408 00000lowaste result with italic font is used to analyze the applicability for the LZ78Ysources

Table 6 Entropy estimates for real-world sources

Real-world sources90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNNRANDOMORG 09951 09963 09966 09976 09802 09954Ubldit TrueRNGpro 09979 09955 09973 09966 09934 09728Linux kernel entropy source 06173 01232 01269 06164 01230 03068Linuxdevurandom 09952 09935 09990 09964 09983 09911Windows RNG 09953 09986 09975 09984 09833 09853

Security and Communication Networks 15

datasets with long-range correlation as presented in Section411

43 Comparison on Execution Efficiency We implement ourpredictors and the final 90Brsquos predictors using Python 36and the version of TensorFlow is 1311 All the followingtests are conducted on a computer with Intel Core i7 CPUand 32GB RAM

Table 10 shows the mean execution time of our pre-dictors in comparison with that of the final 90Brsquos predictorsand the second draft of 90Brsquos predictors Each experimentalresult in Table 10 is the average value obtained from 50repeated experiments Note that the definitions of parametern s and k are the same as in Section 332

From the listedmean execution time with different scales( n s ) in Table 10 it can be seen that when n 106 themean execution time of our predictors is much lower andincreasing slower with any s than that of the final 90Brsquospredictors In other words the average execution efficiencyof our predictors is about 7 to 10 times higher than that ofthe final 90Brsquos predictors for different sample space s whenthe sample size n is 106 In particular when n 108 themean execution time given by final 90Brsquos predictors is farmore than our predictors regardless of the size of samplespace and is too long (over three days) to calculate theestimated results on the case sge 22

In terms of execution efficiency of 90Brsquos predictors wealso find that the mean execution time of the final 90Brsquospredictors is much higher than that of the second draft of90Brsquos predictors Actually the final 90Brsquos mean executiontime is about twice as much as the second draft of 90Brsquosiscould be caused by the characteristics of some estimatorswhich are limited to only for binary inputs Because thecollision estimator Markov estimator and compressionestimator are only suitable for binary input (0 or 1) as statedin [23] So for nonbinary inputs the 90Brsquos estimators will notonly calculate the original symbol entropy but also convert itinto binary input to calculate the bit entropy and finally getthe min-entropy is will greatly increase the mean exe-cution time

4.4. General Discussion. For most of the entropy sources that have been tested, the RNN gives more accurate estimations than the FNN. The better accuracy of the RNN predictive model may be due to the following reasons. On the one hand, the RNN adds feedback connections to the network; i.e., it considers not only the relationship between the current output and the previous observations but also the relationships among the previous observations. On the other hand, the RNN one-hot-encodes the training dataset, which improves the forecasting of categorical data. On the contrary, for Markov sources, M-sequences, and nonuniform distributions produced by postprocessing with an LFSR, the current output is only related to the previous observations, which fits the FNN predictive model well, and thus the FNN provides more accurate estimated results.
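As a concrete illustration of the one-hot encoding step mentioned above, the following NumPy-only sketch builds the (previous observations, next symbol) training pairs that such a recurrent predictor consumes; the window length of 8 is an arbitrary illustrative choice, not the exact configuration used in our experiments.

```python
import numpy as np

def one_hot(sequence, alphabet_size):
    """Map each integer symbol in [0, alphabet_size) to a one-hot row vector."""
    encoded = np.zeros((len(sequence), alphabet_size))
    encoded[np.arange(len(sequence)), sequence] = 1
    return encoded

def make_training_pairs(sequence, alphabet_size, window=8):
    """Build (previous `window` observations -> next symbol) pairs for a
    predictive model trained to forecast the next output."""
    x = one_hot(np.asarray(sequence), alphabet_size)
    inputs = np.stack([x[i:i + window] for i in range(len(sequence) - window)])
    targets = x[window:]
    return inputs, targets  # shapes: (N, window, s) and (N, s)
```

A feedforward predictor can consume the same pairs with the window flattened into a single input vector; the recurrent model additionally exploits the ordering of the previous observations through its feedback connections.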

5 Conclusions and Future Work

Entropy estimation provides a crucial evaluation of the security of RNGs, and the predictor serves as a universal sanity check for entropy estimation. In this work, we provide several new approaches to estimating the min-entropy of entropy sources using predictors based on neural networks (i.e., FNN and RNN) for the first time. In particular, we design a novel scheme for the proposed entropy estimation based on neural network models, including the execution strategy and parameter settings. In order to evaluate the quality of the proposed predictors, we collect various types of simulated sources, both stationary and nonstationary, whose correct entropy can be derived from the known probability distribution; the theoretical result is further verified by experiments on real-world sources. We also compare our method with the predictors defined in NIST SP 800-90B (published in 2018), which is a commonly used standard for evaluating the validity of entropy sources. Our assessment experiments are carried out in three aspects, namely, accuracy, scope of applicability, and computational complexity. The experimental results demonstrate that the entropy estimations obtained from our proposed predictors are more accurate than those of the 90B's predictors, and our predictors have a remarkably wide scope of applicability.

Table 9: Entropy estimates for multivariate M-sequences (MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work).

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.9010   2.0000   0.0000     2.0000   0.0005   0.0000
3           2.9906   3.0000   0.0000     2.4940   0.0000   0.0000
4           3.4037   4.0000   0.0000     4.0000   0.0021   0.0000
5           4.9753   5.0000   0.0000     1.2269   0.0041   0.0000
6           5.3916   6.0000   0.0000     1.2905   0.0394   0.0000
7           5.3916   6.0000   0.0000     1.9881   0.0280   0.0000
8           7.0000   7.0000   0.0000     0.6611   0.8635   0.0000
* The results shown in italic font in the original layout are used to analyze the applicability for the LZ78Y sources.

Table 10: Comparison on execution efficiency of min-entropy estimation of our study and the 90B's predictors. (A dash indicates that no estimated result was obtained because the execution time exceeded three days.)

n      s     Final 90B (s)   Old 90B (s)   Our predictors (s)
10^6   2^1   921             561           136
10^6   2^2   1,058           525           138
10^6   2^3   1,109           574           149
10^6   2^4   1,235           598           174
10^6   2^5   1,394           630           190
10^6   2^6   1,683           785           186
10^6   2^7   2,077           938           264
10^6   2^8   2,618           1,298         272
10^8   2^1   52,274          47,936        9,184
10^8   2^2   -               -             9,309
10^8   2^3   -               -             9,385
10^8   2^4   -               -             9,836
10^8   2^5   -               -             10,986
10^8   2^6   -               -             13,303
10^8   2^7   -               -             17,649
10^8   2^8   -               -             20,759


In addition, the computational complexity of our predictors is clearly lower than that of the 90B's as the sample space and sample size grow, according to the theoretical analysis. The average execution efficiency of our predictors is about 7 to 10 times higher than that of the 90B's predictors for different sample spaces when the sample size is 10^6. In particular, the 90B's predictors cannot calculate a result, due to the huge time complexity, when the sample space s is over 2^2 with the maximum step k = 16 and sample size n = 10^8; in contrast, our method is able to provide a satisfactory result for entropy sources with large sample space and long dependence.

Future work aims at designing specific neural network predictive models for min-entropy estimation of particular entropy sources. Our future work will also focus on applying this new method to estimate entropy in more application areas, such as the randomness sources (sensors and other sources) in mobile terminals.

Data Availability

RANDOM.ORG data used to support the findings of this study can be accessed from https://www.random.org. Ubld.it TrueRNGpro, Linux kernel entropy source, Linux /dev/urandom, and Windows RNG data used to support the findings of this study can be obtained from the relevant listed references.

Disclosure

A preliminary version of this paper appeared under the title "Neural Network Based Min-entropy Estimation for Random Number Generators" in Proceedings of Security and Privacy in Communication Networks, the 14th EAI International Conference, SecureComm 2018, Singapore, August 8-10, 2018 [36]. Dr. Jing Yang participated in this work when she studied at the Chinese Academy of Sciences; she now works at the China Information Technology Security Evaluation Center, Beijing, China.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2018YFB0804300), the National Natural Science Foundation of China (Nos. 61872357 and 61802396), and the National Cryptography Development Fund of China (No. MMJJ20180113).

References

[1] I. Kanter, Y. Aviad, I. Reidler, E. Cohen, and M. Rosenbluh, "An optical ultrafast random bit generator," Nature Photonics, vol. 4, no. 1, pp. 58-61, 2010.
[2] P. Li, A. Wang, Y. Guo et al., "Ultrafast fully photonic random bit generator," Journal of Lightwave Technology, vol. 36, no. 12, pp. 2531-2540, 2018.
[3] P. Li, K. Li, X. Guo et al., "Parallel optical random bit generator," Optics Letters, vol. 44, no. 10, pp. 2446-2449, 2019.
[4] A. Uchida, K. Amano, M. Inoue et al., "Fast physical random bit generation with chaotic semiconductor lasers," Nature Photonics, vol. 2, no. 12, pp. 728-732, 2008.
[5] L. Dorrendorf, Z. Gutterman, and B. Pinkas, "Cryptanalysis of the random number generator of the windows operating system," ACM Transactions on Information and System Security, vol. 13, no. 1, pp. 1-32, 2009.
[6] Z. Gutterman, B. Pinkas, and T. Reinman, "Analysis of the linux random number generator," in Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P 2006), pp. 371-385, Berkeley, CA, USA, May 2006.
[7] M. Vanhoef and F. Piessens, "Predicting, decrypting, and abusing WPA2/802.11 group keys," in Proceedings of the 25th USENIX Security Symposium, pp. 673-688, Austin, TX, USA, August 2016.
[8] A. L. Rukhin, J. Soto, J. R. Nechvatal et al., SP 800-22 Rev. 1a: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, NIST Special Publication, Gaithersburg, MD, USA, 2010.
[9] W. Killmann and W. Schindler, AIS 31: Functionality Classes and Evaluation Methodology for True (Physical) Random Number Generators, Version 3.1, T-Systems GEI GmbH and Bundesamt für Sicherheit in der Informationstechnik (BSI), Bonn, Germany, 2001.
[10] G. Marsaglia, "The Marsaglia random number CDROM including the diehard battery of tests of randomness," 1996, http://www.stat.fsu.edu/pub/diehard.
[11] P. L'Ecuyer and R. J. Simard, "TestU01: a C library for empirical testing of random number generators," ACM Transactions on Mathematical Software, vol. 33, no. 4, 2007.
[12] ISO/IEC JTC 1/SC 27, ISO/IEC 18031: Information Technology - Security Techniques - Random Bit Generation, Berlin, Germany, 2011.
[13] E. Barker and J. Kelsey, "NIST draft special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2012, http://csrc.nist.gov/publications/drafts/800-90/draft-sp800-90b.pdf.
[14] M. Baudet, D. Lubicz, J. Micolod, and A. Tassiaux, "On the security of oscillator-based random number generators," Journal of Cryptology, vol. 24, no. 2, pp. 398-425, 2011.
[15] W. Killmann and W. Schindler, "A design for a physical RNG with robust entropy estimators," in Proceedings of the 10th International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2008), pp. 146-163, Washington, DC, USA, August 2008.
[16] Y. Ma, J. Lin, T. Chen, C. Xu, Z. Liu, and J. Jing, "Entropy evaluation for oscillator-based true random number generators," in Proceedings of the 16th International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2014), pp. 544-561, Busan, South Korea, September 2014.
[17] Y. Ma, J. Lin, and J. Jing, "On the entropy of oscillator-based true random number generators," in Proceedings of the Cryptographers' Track at the RSA Conference, pp. 165-180, Springer, San Francisco, CA, USA, February 2017.
[18] P. Li, J. Zhang, L. Sang et al., "Real-time online photonic random number generation," Optics Letters, vol. 42, no. 14, pp. 2699-2702, 2017.
[19] X. Ma, F. Xu, H. Xu, X. Tan, B. Qi, and H. K. Lo, "Postprocessing for quantum random-number generators: entropy evaluation and randomness extraction," Physical Review A, vol. 87, no. 6, pp. 062327-1-062327-10, 2013.
[20] K. Ugajin, Y. Terashima, K. Iwakawa et al., "Real-time fast physical random number generator with a photonic integrated circuit," Optics Express, vol. 25, no. 6, pp. 6511-6523, 2017.
[21] F. Xu, B. Qi, X. Ma, H. Xu, H. Zheng, and H.-K. Lo, "Ultrafast quantum random number generation based on quantum phase fluctuations," Optics Express, vol. 20, no. 11, pp. 12366-12377, 2012.
[22] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "(Second draft) NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2016, https://csrc.nist.gov/CSRC/media/Publications/sp/800-90b/draft/documents/sp800-90b_second_draft.pdf.
[23] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2018, http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90B.pdf.
[24] J. Kelsey, K. A. McKay, and M. S. Turan, "Predictive models for min-entropy estimation," in Proceedings of the 17th International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2015), pp. 373-392, Saint-Malo, France, September 2015.
[25] S. Aras and I. D. Kocakoç, "A new model selection strategy in time series forecasting with artificial neural networks: IHTS," Neurocomputing, vol. 174, pp. 974-987, 2016.
[26] J. P. Donate, X. Li, G. G. Sanchez, and A. S. de Miguel, "Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm," Neural Computing and Applications, vol. 22, no. 1, pp. 11-20, 2013.
[27] J. C. Luna-Sanchez, E. Gomez-Ramirez, K. Najim, and E. Ikonen, "Forecasting time series with a logarithmic model for the polynomial artificial neural networks," in Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN 2011), pp. 2725-2732, San Jose, CA, USA, 2011.
[28] C. de Groot and D. Wurtz, "Analysis of univariate time series with connectionist nets: a case study of two classical examples," Neurocomputing, vol. 3, no. 4, pp. 177-192, 1991.
[29] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[30] X. Cai, N. Zhang, G. K. Venayagamoorthy, and D. C. Wunsch II, "Time series prediction with recurrent neural networks trained by a hybrid PSO-EA algorithm," Neurocomputing, vol. 70, no. 13-15, pp. 2342-2353, 2007.
[31] A. Jain and A. M. Kumar, "Hybrid neural network models for hydrologic time series forecasting," Applied Soft Computing, vol. 7, no. 2, pp. 585-592, 2007.
[32] J. M. P. Menezes Jr. and G. A. Barreto, "Long-term time series prediction with the NARX network: an empirical evaluation," Neurocomputing, vol. 71, no. 16-18, pp. 3335-3343, 2008.
[33] P. Hagerty and T. Draper, "Entropy bounds and statistical tests," 2012, https://csrc.nist.gov/csrc/media/events/random-bit-generation-workshop-2012/documents/hagerty_entropy_paper.pdf.
[34] S. Zhu, Y. Ma, T. Chen, J. Lin, and J. Jing, "Analysis and improvement of entropy estimators in NIST SP 800-90B for non-IID entropy sources," IACR Transactions on Symmetric Cryptology, no. 3, pp. 151-168, 2017.
[35] A. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, USA, 1996.
[36] J. Yang, S. Zhu, T. Chen, Y. Ma, N. Lv, and J. Lin, "Neural network based min-entropy estimation for random number generators," in Proceedings of the 14th International Conference on Security and Privacy in Communication Networks (SecureComm 2018), pp. 231-250, Singapore, August 2018.


Page 12: High-Efficiency Min-Entropy Estimation Based on Neural Network … · 2020. 2. 17. · the outputs of RNGs, such as the ISO/IEC 18031 [12] and AIS 31 [9]. ere are many types of methods

(i) M-Sequence A maximum length sequence which isa type of pseudorandom binary sequence ([35])

(ii) Nonuniform Distribution by Postprocessing UsingLFSR e samples are processed using a linearfeedback shifting register (LFSR) which come froman IID source ([35])

For every distribution mentioned above we also gen-erate a set of 80 simulated datasets each of which contains106 samples and estimate min-entropy by using our pro-posed predictors and final 90Brsquos predictors [23]

Figure 7 shows the estimated min-entropy results for the80 simulated datasets with uniform distributions and near-uniform distributions respectively From Figures 7(a) and7(b) we see that several points of the results obtained fromthe 90Brsquos predictors are apparently underestimated whichmay result from the overfitting phenomenon Comparedwith 90Brsquos predictors our predictors provide more accurateresults

Figure 8 shows the estimated min-entropy results fornormal distributions and time-varying normal distributionsrespectively From Figures 8(a) and 8(b) we can see that theestimated results given by our predictors are close to thetheoretical entropy with normal distributions and time-varying normal distributions However the lowest entropyestimation results obtained from the 90Brsquos predictors givesignificant underestimates

Figure 9 shows the estimated min-entropy results forMarkov distributions We can see that the 90Brsquos predictorsalmost give underestimates compared with the theoreticalentropy while estimated results given by our predictors aremuch closer to the theoretical entropy than those obtainedfrom 90Brsquos predictors

To further obviously compare the accuracy of our and90Brsquos predictors we apply the predictors to the M-sequenceand the non-uniform distribution sequence by post-pro-cessing using LFSR and their theoretical entropy is a knownand fixed value

It is further confirmed that the higher stage (the max-imum step of correlation) M-sequence and nonuniformdistribution sequence by postprocessing using LFSR are ableto pass the NIST SP 800-22 statistical tests [8] e estimatedresults are listed in Tables 3 and 4 and the lowest entropyestimations from 90Brsquos predictors and our predictors foreach stage are shown in bold font

For M-sequence and nonuniform distribution by post-processing using LFSR the MultiMMC predictor presentedin the final 90B gives the most accurate entropy estimationresults for the stage le 16 However when the stage ofM-sequence and nonuniform distribution by postprocessingusing LFSR is greater than 16 the MultiMMC predictorcannot give accurate entropy estimation result because thispredictor is parameterized by k isin 1 2 16 (k is themaximum step of correlation) Perhaps we could set the

eoretical entropy90Brsquos predictorsOur predictors

1 2 3 4 5 6 70eoretical entropy per sample

0

1

2

3

4

5

6

7Es

timat

ed en

trop

y pe

r sam

ple

(a)

eoretical entropy90Brsquos predictorsOur predictors

0

1

2

3

4

5

6

7

Estim

ated

entr

opy

per s

ampl

e1 2 3 4 5 6 70

eoretical entropy per sample

(b)

Figure 8 Comparison of the min-entropy estimations obtained from our proposed predictors and 90Brsquos predictors with the theoreticalentropy Estimations for (a) normal distributions and (b) time-varying normal distributions

12 Security and Communication Networks

parameter of the MultiMMC predictor as a greater range toachieve a more accurate estimated result for the higher stagebut the time complexity grows exponentially with the pa-rameter k as we analyzed in Section 332 Moreover theFNN model can also give accurate estimated results eventhough the stages of M-sequence and LFSR are greater than16 However the RNN model can give accurate estimated

results only when the stage is 8erefore the FNNmodel ismore matched to M-sequence and nonuniform distributionby postprocessing using LFSR than RNN

We also compute the relative errors of estimated resultsfrom 90Brsquos predictors and our predictors over 80 se-quences from each class of simulated sources We calculatethe relative errors using the min-entropy obtained from90Brsquos predictors (the lowest estimation result of 90Brsquos fourpredictors) and our predictors (the lowest estimation re-sult of FNN and RNN) respectively As illustrated inTable 5 it shows that for all five classes of simulatedsources the errors of our predictors are lower than that ofthe 90Brsquos predictors Specially our approaches enable theentropy to be estimated with an error of less than 655but it is up to 1465 for 90Brsquos predictors Overall thisindicates that our proposed predictors have a betterperformance than that of 90Brsquos predictors on accuracy forboth stationary sequences and nonstationary sequenceswhich is consistent with the conclusion drawn in thefigures above

From Tables 2ndash4 we also find that the accuracy of theRNN predictive model is slightly higher than that of theFNN predictive model except for the cases of the Markovsources M-sequence and nonuniform distribution bypostprocessing using LFSR

We will further verify the applicability for time-varyingsources in Section 42 erefore through the evaluation onthe entropy estimation results of the above simulateddatasets we see that our proposed predictors are superior inaccuracy compared with the 90Brsquos predictors

412 Real-World Data We further apply our predictors tothe datasets which are generated from the RNGs deployed inthe real-world In fact the theoretical entropy of per sampleis unknown for these real-world sources so no error can becompared like the simulated datasets for the predictorsHowever the estimated results from the predictors pre-sented here can still be compared to the 90Brsquos predictorsbased on the knowledge that underestimates from thepredictors have theoretical bounds

Datasets of real-world data are produced using thefollowing approaches e first two are adopted in [24] andthe others are commonly used typical RNGs e estima-tions of the real-world sources are presented in Table 6

(i) RANDOMORG is is a service that providesrandom numbers based on atmospheric noise and isused in [24] It allows the user to specify the

Theoretical entropy90Brsquos predictorsOur predictors

1 2 3 4 5 6 70Theoretical entropy per sample

Markov distributions

0

1

2

3

4

5

6

7

Estim

ated

entr

opy

per s

ampl

e

Figure 9 Comparison of the min-entropy estimations obtainedfrom our proposed predictors and 90Brsquos predictors with the the-oretical entropy for Markov distributions

Table 2 Relative errors of FNN and RNN estimation results

Simulated data class FNN () RNN ()Uniform 165 16Near-uniform 160 152Normal 117 108Time-varying normal 212 184Markov 602 7

Table 3 Estimated results for M-sequence (Hmin 0000)

Stage 8 10 12 14 16 18 20MultiMCW 0991 0996 0988 0989 0993 0999 lowast 1000Lag 1000 1000 1000 1000 1000 1000 1000MultiMMC 0000 0000 0000 0000 0000 1000 1000LZ78Y 1000 1000 1000 1000 1000 1000 0997 lowast

FNN 0000 0000 0000 0000 0000 0000 0000RNN 0000 1048 1007 1002 0996 09920 09997

Table 4 Estimated results for nonuniform distribution by post-processing using LFSR (Hmin 0152)

Stage 8 10 12 14 16 18 20MultiMCW 0440 0595 0743 0721 0998 0994 0998Lag 0581 0581 0680 0680 0992 0994 0999MultiMMC 0151 0153 0158 0181 0234 0995 0996LZ78Y 0567 0995 0766 0679 0997 0996lowast 0994lowast

FNN 0151 0145 0149 0147 0149 0142 0144RNN 0149 0947 1012 0998 1012 0997 0985

Security and Communication Networks 13

minimum and maximum values that are outpute sequence used here consists of bits

(ii) Ubldit TrueRNGpro TrueRNGpro provides a steadystream of random numbers through a USB CDCserial port which is a USB randomnumber generatorproduced by Ubldit is entropy source is also usedin [24] e sequence used here consists of bits

(iii) Linux kernel entropy source e Linux kernelrandom generator is used for the generation of areal-world sequence without any processing esequence used here is the last bit of per symbol

(iv) Linuxdevurandomedevurandom [6] of Linuxis used for the generation of a real-world sequencewith strict processing e sequence used hereconsists of bits

(v) Windows RNG Windows RNG [5] is used for thegeneration of a real-world sequence by calling aCrypto APIe sequence used here consists of bits

As illustrated in Table 6 the lowest entropy estimationfor each source is shown in bold font We see that ourpredictors perform better than 90Brsquos predictors because thelowest entropy estimation is always obtained from our workfor each real-world source Furthermore for Linux kernelentropy source we find that both of the predictor Lag andMultiMMC are able to give lower estimation results Itindicates that Linux kernel entropy source has periodicityand conforms to the Markov model which is well under-stood because the randomness of Linux kernel entropysource comes from human behaviors such as manipulatingthe mouse and keyboard In our work compared with theentropy estimations for other real-world sources FNN fitsmuch better than RNN for Linux kernel entropy sourcewhich is consistent with the previous view that FNN per-forms well in testing Markov sources

42 Comparison on the Scope of Applicability After evalu-ating the accuracy we further validate the scope of appli-cability of our proposed predictors and compare them withthat of the 90Brsquos predictors Kelsey et al [24] stated that eachof the 90Brsquos predictors performs well only for a specialdistribution as described in Section 221 To prove ourpredictor has better applicability the following four simu-lated datasets are generated which are suitable for eachpredictor employed in the final 90B

(i) Time-Varying Sourcese probability distribution ofdata sources is varying with timeeMCWpredictor

predicts the current output according to previousoutputs in a short period of time and thus the MCWpredictor performs well in these data sources

(ii) Periodic Sources e data source changes periodi-cally e lag predictor predicts the value that oc-curred samples back in the sequence as the currentoutput and thus the lag predictor performs well onsources with strong periodic behavior

(iii) Markov Sources e data sources can be modeledby the Markov model e MultiMMC predictorpredicts the current output according to theMarkovmodel and thus the MultiMMC predictor performswell on data from any process that can be accuratelymodeled by a Markov model

(iv) LZ78Y Sources e data sources can be efficientlycompressed by LZ78-like compression algorithmswhich applies to the LZ78Y predictor well

For each above simulated source we generate a set of 10simulated datasets each of which contains 106 samples andthe min-entropy is estimated by our and 90Brsquos predictorse final result for a predictor is the average value of 10estimated results corresponding to the 10 simulated datasetsfor one simulated source

4.2.1. Time-Varying Sources. Firstly, we generate time-varying binary data that match the statistical behaviors of the MCW predictor presented in the 90B. Table 7 shows the entropy estimation results for time-varying data.

As shown in Table 7, symbol gradual(x) (x ∈ [0, 1], the same below) is defined as a simulated source whose probability of outputting "0" changes gradually from x to 1 − x with time. Symbol period(x) is defined as a simulated source whose probability of outputting "0" changes periodically with time, and the probability varies from x to 1 − x in one period; the period length is set to 20% of the entire input dataset. Symbol sudden(x) is defined as a simulated source whose probability of outputting "0" changes suddenly with time, namely, the probability is set to x for the first half of the input dataset and 1 − x for the last half.
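For illustration, the three families of time-varying sources can be simulated as follows; the generator is a sketch under our own implementation choices and is not the exact data-generation script used for Table 7.

```python
import numpy as np

def time_varying_bits(kind, x, n=10**6, seed=0):
    """Generate n bits where Pr[bit = 0] moves between x and 1 - x.

    kind = "gradual": the probability drifts linearly from x to 1 - x.
    kind = "period" : the probability sweeps from x to 1 - x within each
                      period (here the period is 20% of the dataset).
    kind = "sudden" : the probability is x for the first half and
                      1 - x for the second half.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    if kind == "gradual":
        p0 = x + (1 - 2 * x) * t / (n - 1)
    elif kind == "period":
        period = int(0.2 * n)
        p0 = x + (1 - 2 * x) * (t % period) / (period - 1)
    elif kind == "sudden":
        p0 = np.where(t < n // 2, x, 1 - x)
    else:
        raise ValueError(kind)
    return (rng.random(n) >= p0).astype(np.uint8)   # bit is 0 with prob p0

bits = time_varying_bits("gradual", 0.2, n=10**6)
```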

From Table 7, the estimation results for the MCW predictor and our work are shown in bold font. We see that the MCW predictor gives the lowest and most accurate entropy estimations for the three types of time-varying data mentioned above, but it gives slight underestimates at gradual(0.2) and period(0.2). It is confirmed that the time-varying sources mentioned above match the statistical behaviors of the MCW predictor. Relatively, we find that our proposed predictive models are all capable of obtaining satisfactory entropy estimations that are close to the correct values. Therefore, it is proved that our proposed predictive models are suitable for the time-varying data mentioned above. Note that we calculate the min-entropy estimate according to the entire dataset rather than the last 20% of the input dataset for these time-varying sources: because the probability distribution varies with time, a part of the input dataset cannot represent the overall distribution of the whole input dataset.

Table 5: Relative errors of the final estimations of the 90B's predictors and our predictors for five classes of simulated sources.

Simulated data class    90B's predictors (%)   Our predictors (%)
Uniform                 4.37                   1.53
Near-uniform            3.47                   1.59
Normal                  6.08                   1.57
Time-varying normal     3.47                   1.72
Markov                  14.65                  6.55


4.2.2. Periodic Sources. Secondly, we generate periodic data that match the statistical behaviors of the lag predictor presented in the 90B. Table 8 shows the entropy estimation results for periodic sequences. The data source completely obeys the periodic rule, so the correct entropy is zero. The bit width of samples is traversed from 2 to 8.

As shown in Table 8, the estimation results for the lag predictor and our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated periodic sources, we confirm that the lag predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimates, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the entropy estimation of (strongly) periodic data. In addition, the MultiMMC predictor can also give accurate min-entropy estimations. This is reasonable because periodicity is also a form of correlation.

4.2.3. Markov Sources. Next, we generate multivariate M-sequences as Markov sources, which fit the statistical behaviors of the MultiMMC predictor. Specifically, the multivariate M-sequences are composed of multiple M-sequences with different initial states. Due to the determinacy of this type of sequence, the correct entropy is zero. The bit width of the samples is also traversed from 2 to 8. The maximum step of correlation used here is set to 8. Table 9 shows the estimated results for multivariate M-sequences.
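A multivariate M-sequence of this kind can be reproduced by running several copies of one maximal-length LFSR from different initial states and packing their output bits into symbols; the feedback taps and seeds in the sketch below are illustrative assumptions of ours rather than the exact parameters behind Table 9.

```python
def lfsr_bits(taps, state, n):
    """Fibonacci LFSR: return n output bits for the given taps and seed.

    `taps` lists the register stages (1-based) that are XORed to form the
    feedback bit; with a primitive feedback polynomial the output is an
    M-sequence of period 2**len(state) - 1.
    """
    state = list(state)
    out = []
    for _ in range(n):
        out.append(state[-1])                     # output bit
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]                 # shift the register
    return out

def multivariate_m_sequence(bit_width, n, taps=(8, 6, 5, 4)):
    """Pack `bit_width` M-sequences with different seeds into symbols."""
    streams = []
    for k in range(bit_width):
        seed = [((k + 1) >> i) & 1 for i in range(8)]   # distinct nonzero seeds
        streams.append(lfsr_bits(taps, seed, n))
    return [sum(streams[b][i] << b for b in range(bit_width))
            for i in range(n)]

symbols = multivariate_m_sequence(bit_width=4, n=10**6)
```

Since every bit of every symbol is produced deterministically, the correct min-entropy of such a source is zero, which is the reference value used in Table 9.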

From Table 9, the estimation results for the MultiMMC predictor and our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated Markov sources, we confirm that the MultiMMC predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimations, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the Markov sources.

4.2.4. LZ78Y Sources. Finally, we verify the applicability to the LZ78Y sources. This type of entropy source is difficult to generate by simulation. However, because periodic data and Markov sequences are compressible, we can still draw the conclusion, from the results marked in italic font in Tables 8 and 9, that our proposed predictive models can be applied to the LZ78Y sources.

4.2.5. Summary on Applicability Scope of Our Predictors. By analyzing the experimental results of the above four specific simulated sources, each of which is oriented towards a certain predictor in the 90B, we conclude that our predictors can provide accurate estimated results of entropy. So the proposed predictors apply to these entropy sources as well as the 90B's predictors do. In addition, compared with the 90B's predictors, our predictors have a better performance on the scope of applicability for testing datasets with long-range correlation, as presented in Section 4.1.1.

Table 7: Entropy estimates for time-varying data.
(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)

Data class     Correct   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
gradual(0.2)   0.6345    0.5290   0.7808   0.7240     0.7790   0.6288   0.6289
gradual(0.3)   0.7437    0.7378   0.9221   0.8416     0.9243   0.7430   0.7460
gradual(0.4)   0.8645    0.8631   0.9786   0.9518     0.9739   0.8648   0.8637
period(0.2)    0.6345    0.5537   0.7428   0.5537     0.7669   0.6205   0.6209
period(0.3)    0.7437    0.7393   0.9218   0.8476     0.9233   0.7377   0.7375
period(0.4)    0.8645    0.8639   0.9767   0.9632     0.9796   0.8653   0.8632
sudden(0.2)    0.3219    0.3203   0.4663   0.3386     0.4484   0.3217   0.3229
sudden(0.3)    0.5146    0.5110   0.5857   0.9984     0.7663   0.5110   0.5119
sudden(0.4)    0.7370    0.7338   0.8699   0.9984     0.9389   0.7339   0.7345

Table 8: Entropy estimates for periodic sequences.
(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.6458   0.0000   0.0000     1.1817   0.0079   0.0000
3           2.3318   0.0000   0.0000     1.5957   0.1315   0.0000
4           2.9147   0.0000   0.0000     1.8016   0.4748   0.0000
5           3.3269   0.0000   0.0000     1.4586   0.8898   0.0000
6           3.9092   0.0000   0.0000     0.8322   3.4944   0.0000
7           4.4908   0.0000   0.0000     0.3973   3.4960   0.0000
8           4.4919   0.0000   0.0000     0.2027   3.5408   0.0000
*The result with italic font is used to analyze the applicability for the LZ78Y sources.

Table 6: Entropy estimates for real-world sources.
(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)

Real-world sources            MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
RANDOM.ORG                    0.9951   0.9963   0.9966     0.9976   0.9802   0.9954
Ubld.it TrueRNGpro            0.9979   0.9955   0.9973     0.9966   0.9934   0.9728
Linux kernel entropy source   0.6173   0.1232   0.1269     0.6164   0.1230   0.3068
Linux /dev/urandom            0.9952   0.9935   0.9990     0.9964   0.9983   0.9911
Windows RNG                   0.9953   0.9986   0.9975     0.9984   0.9833   0.9853



4.3. Comparison on Execution Efficiency. We implement our predictors and the final 90B's predictors using Python 3.6, and the version of TensorFlow is 1.3.1. All the following tests are conducted on a computer with an Intel Core i7 CPU and 32 GB RAM.

Table 10 shows the mean execution time of our predictors in comparison with that of the final 90B's predictors and the second draft of the 90B's predictors. Each experimental result in Table 10 is the average value obtained from 50 repeated experiments. Note that the definitions of parameters n, s, and k are the same as in Section 3.3.2.

From the listed mean execution times with different scales (n, s) in Table 10, it can be seen that when n = 10^6, the mean execution time of our predictors is much lower, and increases more slowly with any s, than that of the final 90B's predictors. In other words, the average execution efficiency of our predictors is about 7 to 10 times higher than that of the final 90B's predictors for different sample spaces s when the sample size n is 10^6. In particular, when n = 10^8, the mean execution time of the final 90B's predictors is far greater than that of our predictors regardless of the size of the sample space, and is too long (over three days) to calculate the estimated results in the case s ≥ 2^2.

In terms of the execution efficiency of the 90B's predictors, we also find that the mean execution time of the final 90B's predictors is much higher than that of the second draft of the 90B's predictors. Actually, the final 90B's mean execution time is about twice as much as that of the second draft of the 90B. This could be caused by the characteristics of some estimators, which are limited to binary inputs only. Because the collision estimator, Markov estimator, and compression estimator are only suitable for binary input (0 or 1), as stated in [23], for nonbinary inputs the 90B's estimators will not only calculate the original symbol entropy but also convert the input into binary form to calculate the bit entropy and finally obtain the min-entropy. This greatly increases the mean execution time.
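The extra cost can be seen from the shape of the computation: for a nonbinary source the same estimator is run twice, once on the raw symbols and once on their bit expansion. The helper below sketches this under our own simplifications; estimate_entropy_per_symbol is a hypothetical stand-in for a 90B estimator, and the final combination of the two figures is only an approximation of the standard's exact rule.

```python
import numpy as np

def estimate_both(symbols, bits_per_symbol, estimate_entropy_per_symbol):
    """Run a per-symbol entropy estimator on the raw symbols and again on
    their bit expansion (as done for nonbinary inputs); the double pass is
    why nonbinary data roughly doubles the running time."""
    h_symbol = estimate_entropy_per_symbol(symbols)
    # LSB-first bit expansion of every symbol into a flat 0/1 sequence.
    bits = ((symbols[:, None] >> np.arange(bits_per_symbol)) & 1).ravel()
    h_bit = estimate_entropy_per_symbol(bits)
    # Simplified combination: keep the more conservative of the two views.
    return min(h_symbol, bits_per_symbol * h_bit)

# Toy usage with a placeholder estimator (not a real 90B estimator).
sym = np.random.randint(0, 16, size=10**5)
placeholder = lambda s: float(np.log2(len(np.unique(s))))
print(estimate_both(sym, 4, placeholder))
```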

4.4. General Discussion. For most of the entropy sources that have been tested, the RNN gives more accurate estimations than the FNN. The better accuracy of the RNN predictive model may be due to the following reasons. On the one hand, RNN adds feedback connections to the network, i.e., it considers not only the relationship between the current output and the previous observations but also the relationship among the previous observations. On the other hand, RNN one-hot-encodes the training dataset for better forecasting of categorical data. On the contrary, for Markov sources, M-sequences, and nonuniform distributions postprocessed using an LFSR, the current output is only related to the previous observations, which fits the FNN predictive model well, and thus the FNN provides more accurate estimated results.
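To make this concrete, a predictive model of the kind discussed here can be sketched in a few lines with Keras (TensorFlow); the window length, layer sizes, and training settings below are placeholder choices of ours, not the architecture or hyperparameters used in the experiments, and the random stand-in data are only there to keep the example self-contained.

```python
import numpy as np
import tensorflow as tf

def build_dataset(symbols, window, num_classes):
    """Slide a window over the sequence: predict symbol t from the
    previous `window` symbols, one-hot encoding the categorical inputs."""
    X = np.array([symbols[i:i + window]
                  for i in range(len(symbols) - window)])
    y = np.array(symbols[window:])
    return tf.keras.utils.to_categorical(X, num_classes), y

def build_rnn(window, num_classes):
    model = tf.keras.Sequential([
        tf.keras.layers.SimpleRNN(32, input_shape=(window, num_classes)),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Train on the first part of the sequence, measure prediction accuracy on
# the rest, and turn it into a min-entropy figure via -log2(accuracy).
symbols = np.random.randint(0, 2, size=100_000)       # stand-in data
window, num_classes = 8, 2
X, y = build_dataset(symbols, window, num_classes)
split = int(0.8 * len(X))
model = build_rnn(window, num_classes)
model.fit(X[:split], y[:split], epochs=1, batch_size=256, verbose=0)
_, acc = model.evaluate(X[split:], y[split:], verbose=0)
print("estimated min-entropy per sample:", -np.log2(max(acc, 1e-12)))
```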

5. Conclusions and Future Work

Entropy estimation provides a crucial evaluation for the security of RNGs. The predictor serves as a universal sanity check for entropy estimation. In this work, we provide several new approaches to estimate the min-entropy of entropy sources using predictors based on neural networks (i.e., FNN and RNN) for the first time. In particular, we design a novel scheme for the proposed entropy estimation based on neural network models, including the execution strategy and parameter settings. In order to evaluate the quality of the proposed predictors, we collect various types of simulated sources, stationary or nonstationary, whose correct entropy can be derived from the known probability distribution, and the theoretical result is further verified by experiments on real-world sources. We also compare our method with the predictors defined in NIST SP 800-90B (published in 2018), which is a commonly used standard for evaluating the validation of entropy sources. Our assessment experiments are carried out in three aspects, namely, accuracy, scope of applicability, and computational complexity. The experimental results demonstrate that the entropy estimations obtained from our proposed predictors are more accurate than those of the 90B's predictors, and our predictors have a remarkably wide scope of applicability.

Table 9: Entropy estimates for multivariate M-sequences.
(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.9010   2.0000   0.0000     2.0000   0.0005   0.0000
3           2.9906   3.0000   0.0000     2.4940   0.0000   0.0000
4           3.4037   4.0000   0.0000     4.0000   0.0021   0.0000
5           4.9753   5.0000   0.0000     1.2269   0.0041   0.0000
6           5.3916   6.0000   0.0000     1.2905   0.0394   0.0000
7           5.3916   6.0000   0.0000     1.9881   0.0280   0.0000
8           7.0000   7.0000   0.0000     0.6611   0.8635   0.0000
*The result with italic font is used to analyze the applicability for the LZ78Y sources.

Table 10: Comparison on execution efficiency of min-entropy estimation of our study and the 90B's predictors.

(n, s)          Final 90B (s)   Old 90B (s)   Our predictors (s)
{10^6, 2^1}     921             561           136
{10^6, 2^2}     1 058           525           138
{10^6, 2^3}     1 109           574           149
{10^6, 2^4}     1 235           598           174
{10^6, 2^5}     1 394           630           190
{10^6, 2^6}     1 683           785           186
{10^6, 2^7}     2 077           938           264
{10^6, 2^8}     2 618           1 298         272
{10^8, 2^1}     52 274          47 936        9 184
{10^8, 2^2}     —               —             9 309
{10^8, 2^3}     —               —             9 385
{10^8, 2^4}     —               —             9 836
{10^8, 2^5}     —               —             10 986
{10^8, 2^6}     —               —             13 303
{10^8, 2^7}     —               —             17 649
{10^8, 2^8}     —               —             20 759


In addition, the computational complexity of our approach is obviously lower than that of the 90B's with the growing sample space and sample size in the theoretical analysis. The average execution efficiency of our predictors is about 7 to 10 times higher than that of the 90B's predictors for different sample spaces when the sample size is 10^6. Specially, the 90B's predictors cannot calculate a result due to the huge time complexity when the sample space s is over 2^2 with the parameter of maximum step k = 16 and sample size n = 10^8; relatively, our method is still able to provide a satisfactory result for entropy sources with large sample space and long dependence.

Future work aims at designing specific neural network predictive models for min-entropy estimation of specific entropy sources. Our future work will also focus on applying this new method to estimate entropy in more application areas, like the randomness sources (sensors and other sources) in mobile terminals.

Data Availability

RANDOM.ORG data used to support the findings of this study can be accessed from https://www.random.org. Ubld.it TrueRNGpro, Linux kernel entropy source, Linux /dev/urandom, and Windows RNG data used to support the findings of this study can be obtained from the relevant listed references.

Disclosure

A preliminary version of this paper appeared under the title "Neural Network Based Min-entropy Estimation for Random Number Generators" in Proc. Security and Privacy in Communication Networks – 14th EAI International Conference, SecureComm 2018, Singapore, August 8–10, 2018 [36]. Dr. Jing Yang participated in this work when she studied in the Chinese Academy of Sciences, and she now works in the China Information Technology Security Evaluation Center, Beijing, China.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2018YFB0804300), the National Natural Science Foundation of China (Nos. 61872357 and 61802396), and the National Cryptography Development Fund of China (No. MMJJ20180113).

References

[1] I. Kanter, Y. Aviad, I. Reidler, E. Cohen, and M. Rosenbluh, "An optical ultrafast random bit generator," Nature Photonics, vol. 4, no. 1, pp. 58–61, 2010.
[2] P. Li, A. Wang, Y. Guo et al., "Ultrafast fully photonic random bit generator," Journal of Lightwave Technology, vol. 36, no. 12, pp. 2531–2540, 2018.
[3] P. Li, K. Li, X. Guo et al., "Parallel optical random bit generator," Optics Letters, vol. 44, no. 10, pp. 2446–2449, 2019.
[4] A. Uchida, K. Amano, M. Inoue et al., "Fast physical random bit generation with chaotic semiconductor lasers," Nature Photonics, vol. 2, no. 12, pp. 728–732, 2008.
[5] L. Dorrendorf, Z. Gutterman, and B. Pinkas, "Cryptanalysis of the random number generator of the windows operating system," ACM Transactions on Information and System Security, vol. 13, no. 1, pp. 1–32, 2009.
[6] Z. Gutterman, B. Pinkas, and T. Reinman, "Analysis of the linux random number generator," in Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P 2006), pp. 371–385, Berkeley, CA, USA, May 2006.
[7] M. Vanhoef and F. Piessens, "Predicting, decrypting, and abusing WPA2/802.11 group keys," in Proceedings of the 25th USENIX Security Symposium, pp. 673–688, Austin, TX, USA, August 2016.
[8] A. L. Rukhin, J. Soto, J. R. Nechvatal et al., SP 800-22 Rev. 1a: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, NIST Special Publication, Gaithersburg, MD, USA, 2010.
[9] W. Killmann and W. Schindler, AIS 31: Functionality Classes and Evaluation Methodology for True (Physical) Random Number Generators, Version 3.1, T-Systems GEI GmbH and Bundesamt für Sicherheit in der Informationstechnik (BSI), Bonn, Germany, 2001.
[10] G. Marsaglia, "The Marsaglia random number CDROM including the diehard battery of tests of randomness," 1996, http://www.stat.fsu.edu/pub/diehard.
[11] P. L'Ecuyer and R. J. Simard, "TestU01: a C library for empirical testing of random number generators," ACM Transactions on Mathematical Software, vol. 33, no. 4, 2007.
[12] ISO/IEC JTC 1/SC 27, ISO/IEC 18031: Information Technology–Security Techniques–Random Bit Generation, Berlin, Germany, 2011.
[13] E. Barker and J. Kelsey, "NIST draft special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2012, http://csrc.nist.gov/publications/drafts/800-90/draft-sp800-90b.pdf.
[14] M. Baudet, D. Lubicz, J. Micolod, and A. Tassiaux, "On the security of oscillator-based random number generators," Journal of Cryptology, vol. 24, no. 2, pp. 398–425, 2011.
[15] W. Killmann and W. Schindler, "A design for a physical RNG with robust entropy estimators," in Proceedings of the 10th International Workshop on Cryptographic Hardware and Embedded Systems – CHES 2008, pp. 146–163, Washington, DC, USA, August 2008.
[16] Y. Ma, J. Lin, T. Chen, C. Xu, Z. Liu, and J. Jing, "Entropy evaluation for oscillator-based true random number generators," in Proceedings of the 16th International Workshop on Cryptographic Hardware and Embedded Systems – CHES 2014, pp. 544–561, Busan, South Korea, September 2014.
[17] Y. Ma, J. Lin, and J. Jing, "On the entropy of oscillator-based true random number generators," in Proceedings of the Cryptographers' Track at the RSA Conference, pp. 165–180, Springer, San Francisco, CA, USA, February 2017.
[18] P. Li, J. Zhang, L. Sang et al., "Real-time online photonic random number generation," Optics Letters, vol. 42, no. 14, pp. 2699–2702, 2017.
[19] X. Ma, F. Xu, H. Xu, X. Tan, B. Qi, and H. K. Lo, "Postprocessing for quantum random-number generators: entropy evaluation and randomness extraction," Physical Review A, vol. 87, no. 6, pp. 0623271–06232710, 2013.
[20] K. Ugajin, Y. Terashima, K. Iwakawa et al., "Real-time fast physical random number generator with a photonic integrated circuit," Optics Express, vol. 25, no. 6, pp. 6511–6523, 2017.
[21] F. Xu, B. Qi, X. Ma, H. Xu, H. Zheng, and H.-K. Lo, "Ultrafast quantum random number generation based on quantum phase fluctuations," Optics Express, vol. 20, no. 11, pp. 12366–12377, 2012.
[22] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "(Second draft) NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2016, https://csrc.nist.gov/CSRC/media/Publications/sp/800-90b/draft/documents/sp800-90b_second_draft.pdf.
[23] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2018, http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90B.pdf.
[24] J. Kelsey, K. A. McKay, and M. S. Turan, "Predictive models for min-entropy estimation," in Proceedings of the 17th International Workshop on Cryptographic Hardware and Embedded Systems – CHES 2015, pp. 373–392, Saint-Malo, France, September 2015.
[25] S. Aras and I. D. Kocakoç, "A new model selection strategy in time series forecasting with artificial neural networks: IHTS," Neurocomputing, vol. 174, pp. 974–987, 2016.
[26] J. P. Donate, X. Li, G. G. Sanchez, and A. S. de Miguel, "Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm," Neural Computing and Applications, vol. 22, no. 1, pp. 11–20, 2013.
[27] J. C. Luna-Sanchez, E. Gómez-Ramírez, K. Najim, and E. Ikonen, "Forecasting time series with a logarithmic model for the polynomial artificial neural networks," in Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN 2011), pp. 2725–2732, San Jose, CA, USA, 2011.
[28] C. de Groot and D. Wurtz, "Analysis of univariate time series with connectionist nets: a case study of two classical examples," Neurocomputing, vol. 3, no. 4, pp. 177–192, 1991.
[29] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[30] X. Cai, N. Zhang, G. K. Venayagamoorthy, and D. C. Wunsch II, "Time series prediction with recurrent neural networks trained by a hybrid PSO-EA algorithm," Neurocomputing, vol. 70, no. 13–15, pp. 2342–2353, 2007.
[31] A. Jain and A. M. Kumar, "Hybrid neural network models for hydrologic time series forecasting," Applied Soft Computing, vol. 7, no. 2, pp. 585–592, 2007.
[32] J. M. P. Menezes Jr. and G. A. Barreto, "Long-term time series prediction with the NARX network: an empirical evaluation," Neurocomputing, vol. 71, no. 16–18, pp. 3335–3343, 2008.
[33] P. Hagerty and T. Draper, "Entropy bounds and statistical tests," 2012, https://csrc.nist.gov/csrc/media/events/random-bit-generation-workshop-2012/documents/hagerty_entropy_paper.pdf.
[34] S. Zhu, Y. Ma, T. Chen, J. Lin, and J. Jing, "Analysis and improvement of entropy estimators in NIST SP 800-90B for non-IID entropy sources," IACR Transactions on Symmetric Cryptology, no. 3, pp. 151–168, 2017.
[35] A. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, USA, 1996.
[36] J. Yang, S. Zhu, T. Chen, Y. Ma, N. Lv, and J. Lin, "Neural network based min-entropy estimation for random number generators," in Proceedings of the 14th International Conference on Security and Privacy in Communication Networks – SecureComm, pp. 231–250, Singapore, August 2018.



minimum and maximum values that are outpute sequence used here consists of bits

(ii) Ubldit TrueRNGpro TrueRNGpro provides a steadystream of random numbers through a USB CDCserial port which is a USB randomnumber generatorproduced by Ubldit is entropy source is also usedin [24] e sequence used here consists of bits

(iii) Linux kernel entropy source e Linux kernelrandom generator is used for the generation of areal-world sequence without any processing esequence used here is the last bit of per symbol

(iv) Linuxdevurandomedevurandom [6] of Linuxis used for the generation of a real-world sequencewith strict processing e sequence used hereconsists of bits

(v) Windows RNG Windows RNG [5] is used for thegeneration of a real-world sequence by calling aCrypto APIe sequence used here consists of bits

As illustrated in Table 6 the lowest entropy estimationfor each source is shown in bold font We see that ourpredictors perform better than 90Brsquos predictors because thelowest entropy estimation is always obtained from our workfor each real-world source Furthermore for Linux kernelentropy source we find that both of the predictor Lag andMultiMMC are able to give lower estimation results Itindicates that Linux kernel entropy source has periodicityand conforms to the Markov model which is well under-stood because the randomness of Linux kernel entropysource comes from human behaviors such as manipulatingthe mouse and keyboard In our work compared with theentropy estimations for other real-world sources FNN fitsmuch better than RNN for Linux kernel entropy sourcewhich is consistent with the previous view that FNN per-forms well in testing Markov sources

42 Comparison on the Scope of Applicability After evalu-ating the accuracy we further validate the scope of appli-cability of our proposed predictors and compare them withthat of the 90Brsquos predictors Kelsey et al [24] stated that eachof the 90Brsquos predictors performs well only for a specialdistribution as described in Section 221 To prove ourpredictor has better applicability the following four simu-lated datasets are generated which are suitable for eachpredictor employed in the final 90B

(i) Time-Varying Sourcese probability distribution ofdata sources is varying with timeeMCWpredictor

predicts the current output according to previousoutputs in a short period of time and thus the MCWpredictor performs well in these data sources

(ii) Periodic Sources e data source changes periodi-cally e lag predictor predicts the value that oc-curred samples back in the sequence as the currentoutput and thus the lag predictor performs well onsources with strong periodic behavior

(iii) Markov Sources e data sources can be modeledby the Markov model e MultiMMC predictorpredicts the current output according to theMarkovmodel and thus the MultiMMC predictor performswell on data from any process that can be accuratelymodeled by a Markov model

(iv) LZ78Y Sources e data sources can be efficientlycompressed by LZ78-like compression algorithmswhich applies to the LZ78Y predictor well

For each above simulated source we generate a set of 10simulated datasets each of which contains 106 samples andthe min-entropy is estimated by our and 90Brsquos predictorse final result for a predictor is the average value of 10estimated results corresponding to the 10 simulated datasetsfor one simulated source

421 Time-Varying Sources Firstly we generate the time-varying binary data which is suitable for the statistical be-haviors of the MCW predictor presented in the 90B Table 7shows the entropy estimation results for time-varying data

As shown in Table 7 symbol gradual(x) (x isin [0 1] thesame below) is defined as a simulated source that theprobability of output ldquo0rdquo changes gradually from x to 1 minus x

with time Symbol period(x) is defined as a simulated sourcethat the probability of output ldquo0rdquo changes periodically withtime and the probability varies from x to 1 minus x in one periode period length is set to 20 of the entire input datasetSymbol sudden(x) is defined as a simulated source that theprobability of output ldquo0rdquo changes suddenly with timenamely the probability is set to x for the first half of the inputdataset and 1 minus x for the last half

From Table 7 the estimation results for MCW predictorand our work are shown in bold font We see that the MCWpredictor gives the lowest and most accurate entropy esti-mations for the three types of time-varying data mentionedabove but it gives a little underestimates at gradual(02) andperiod(02) It is confirmed that the time-varying sourcesmentioned above match with the statistical behaviors of theMCW predictor Relatively we find that our proposedpredictive models are all capable to obtain the satisfiedentropy estimations that are close to the correct valueserefore it is proved that our proposed predictive modelsare suitable for the time-varying datamentioned above Notethat we calculate the min-entropy estimate according to theentire dataset rather than the last 20 of the input dataset forthese time-varying sources Because the probability distri-bution is varying with time the part of the input datasetcannot represent the overall distribution of the input dataset

Table 5 Relative errors of the final estimations of 90Brsquos predictorsand our predictors for five classes of simulated sources

Simulated data class 90Brsquos predictors () Our predictors ()Uniform 437 153Near-uniform 347 159Normal 608 157Time-varying normal 347 172Markov 1465 655

14 Security and Communication Networks

422 Periodic Sources Secondly we generate periodic datawhich are suitable for the statistical behaviors of the lagpredictor presented in 90B e following is entropy esti-mation results for periodic sequences e data source iscompletely obeying the periodic rule so the correct entropyis zero e bit width of samples is traversed from 2 to 8

As shown in Table 8 the estimation results for the lagpredictor and our work are shown in bold font According tothe correct entropy (is equal to 0) of the simulated periodicsources we confirm that the lag predictor is suitable for theentropy estimation of this type of source as expected Rel-atively the RNN can also give the accurate min-entropyestimates ie estimated results are zeros us our pro-posed predictive models are suitable for the entropy esti-mation of the (strong) periodic data In addition theMultiMMC predictor can also give the accurate min-entropyestimations is is reasonable because periodicity is also aform of correlation

423 Markov Sources Next we generate multivariateM-sequences as Markov sources which fit the statisticalbehaviors of the MultiMMC predictor Specifically themultivariate M-sequences are composed of multiple M-se-quences with different initial states Due to the determinacyof this type of sequences the correct entropy is zero e bitwidth of the samples is also traversed from 2 to 8 emaximum step of correlation used here is set as 8 Table 9shows the estimated results for multivariate M-sequences

From Table 9 the estimation results for MultiMMCpredictor and our work are shown in bold font According tothe correct entropy (is equal to 0) of the simulated Markovsources we confirm that theMultiMMC predictor is suitablefor the entropy estimation of this type of source as expectedRelatively the RNN can also give the accurate min-entropyestimations ie estimated results are zeros us our

proposed predictive models are suitable for the Markovsources

424 LZ78Y Sources Finally we verify the applicability ofthe LZ78Y sources is type of entropy source is difficult togenerate by simulating However we can still draw theconclusion that our proposed predictive models can beapplied to the LZ78Y sources according to Tables 8 and 9 initalic font Because the periodic data and Markov sequencesare compressible

425 Summary on Applicability Scope of Our PredictorsBy analyzing the experimental results of the above fourspecific simulated sources each of which is oriented towardsa certain predictor in the 90B we have a conclusion that ourpredictors can provide accurate estimated results of entropySo the proposed predictors are well applied to these entropysources as well as the 90Brsquos predictors In addition com-pared with 90Brsquos predictors our predictors have a betterperformance on the scope of applicability for testing the

Table 7 Entropy estimates for time-varying data

Data class Correct90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNNgradual(02) 06345 05290 07808 07240 07790 06288 06289gradual(03) 07437 07378 09221 08416 09243 07430 07460gradual(04) 08645 08631 09786 09518 09739 08648 08637period(02) 06345 05537 07428 05537 07669 06205 06209period(03) 07437 07393 09218 08476 09233 07377 07375period(04) 08645 08639 09767 09632 09796 08653 08632sudden(02) 03219 03203 04663 03386 04484 03217 03229sudden(03) 05146 05110 05857 09984 07663 05110 05119sudden(04) 07370 07338 08699 09984 09389 07339 07345

Table 8 Entropy estimates for periodic sequences

Bit width90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNN2 16458 00000 00000 11817 00079 000003 23318 00000 00000 15957 01315 000004 29147 00000 00000 18016 04748 000005 33269 00000 00000 14586 08898 000006 39092 00000 00000 08322 34944 000007 44908 00000 00000 03973 34960 000008 44919 00000 00000 02027 35408 00000lowaste result with italic font is used to analyze the applicability for the LZ78Ysources

Table 6 Entropy estimates for real-world sources

Real-world sources90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNNRANDOMORG 09951 09963 09966 09976 09802 09954Ubldit TrueRNGpro 09979 09955 09973 09966 09934 09728Linux kernel entropy source 06173 01232 01269 06164 01230 03068Linuxdevurandom 09952 09935 09990 09964 09983 09911Windows RNG 09953 09986 09975 09984 09833 09853

Security and Communication Networks 15

datasets with long-range correlation as presented in Section411

43 Comparison on Execution Efficiency We implement ourpredictors and the final 90Brsquos predictors using Python 36and the version of TensorFlow is 1311 All the followingtests are conducted on a computer with Intel Core i7 CPUand 32GB RAM

Table 10 shows the mean execution time of our pre-dictors in comparison with that of the final 90Brsquos predictorsand the second draft of 90Brsquos predictors Each experimentalresult in Table 10 is the average value obtained from 50repeated experiments Note that the definitions of parametern s and k are the same as in Section 332

From the listedmean execution time with different scales( n s ) in Table 10 it can be seen that when n 106 themean execution time of our predictors is much lower andincreasing slower with any s than that of the final 90Brsquospredictors In other words the average execution efficiencyof our predictors is about 7 to 10 times higher than that ofthe final 90Brsquos predictors for different sample space s whenthe sample size n is 106 In particular when n 108 themean execution time given by final 90Brsquos predictors is farmore than our predictors regardless of the size of samplespace and is too long (over three days) to calculate theestimated results on the case sge 22

In terms of execution efficiency of 90Brsquos predictors wealso find that the mean execution time of the final 90Brsquospredictors is much higher than that of the second draft of90Brsquos predictors Actually the final 90Brsquos mean executiontime is about twice as much as the second draft of 90Brsquosiscould be caused by the characteristics of some estimatorswhich are limited to only for binary inputs Because thecollision estimator Markov estimator and compressionestimator are only suitable for binary input (0 or 1) as statedin [23] So for nonbinary inputs the 90Brsquos estimators will notonly calculate the original symbol entropy but also convert itinto binary input to calculate the bit entropy and finally getthe min-entropy is will greatly increase the mean exe-cution time

44 GeneralDiscussion For the most entropy sources whichhave been tested the RNN gives more accurate estimationsthan the FNN Better accuracy of the RNN predictive model

may be due to the following reasons On the one hand RNNadds the feedback connections to the network ie it con-siders not only the relationship between the current outputand the previous observations but also the relationshipamong the previous observations On the other hand RNNone-hot-encodes the training dataset for better forecastingcategorical data On the contrary for Markov sourcesM-sequence and nonuniform distribution by postprocessingusing LFSR the current output is only related to the previousobservations which fits the FNN predictive model well andthus the FNN provides more accurate estimated results

5 Conclusions and Future Work

Entropy estimation provides a crucial evaluation for thesecurity of RNGs e predictor serves as a universal sanitycheck for entropy estimation In this work we provideseveral new approaches to estimate the min-entropy forentropy sources using predictors based on neural networks(ie FNN and RNN) for the first time In particular wedesign a novel scheme for the proposed entropy estimationbased on neural network models including executionstrategy and parameter settings In order to evaluate thequality of the proposed predictors we collect various typesof simulated sources that belong to the stationary or non-stationary whose correct entropy of the source can be de-rived from the known probability distribution and thetheoretical result is further verified by the experiments of thereal-world sources We also compare our method with thepredictors defined in the NIST 800-90B (published in 2018)which is a commonly used standard for evaluating thevalidation of entropy sources Our assessment experimentsare carried out in three aspects namely accuracy scope ofapplicability and computational complexity e experi-mental results demonstrate that the entropy estimationobtained from our proposed predictors are more accuratethan that of the 90Brsquos predictors and our predictors have aremarkably wide scope of applicability In addition the

Table 9 Entropy estimates for multivariate M-sequences

Bit width90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNN2 19010 20000 00000 20000 00005 000003 29906 30000 00000 24940 00000 000004 34037 40000 00000 40000 00021 000005 49753 50000 00000 12269 00041 000006 53916 60000 00000 12905 00394 000007 53916 60000 00000 19881 00280 000008 70000 70000 00000 06611 08635 00000lowaste result with italic font is used to analyze the applicability for the LZ78Ysources

Table 10 Comparison on execution efficiency of min-entropyestimation of our study and 90Brsquos predictors

n s Final 90B (s) Old 90B (s) Our predictors (s)106 211113864 1113865 921 561 136106 221113864 1113865 1 058 525 138106 231113864 1113865 1 109 574 149106 241113864 1113865 1 235 598 174106 251113864 1113865 1 394 630 190106 261113864 1113865 1 683 785 186106 271113864 1113865 2 077 938 264106 281113864 1113865 2 618 1 298 272108 211113864 1113865 52 274 47 936 9 184108 221113864 1113865 mdash mdash 9 309108 231113864 1113865 mdash mdash 9 385108 241113864 1113865 mdash mdash 9 836108 251113864 1113865 mdash mdash 10 986108 261113864 1113865 mdash mdash 13 303108 271113864 1113865 mdash mdash 17 649108 281113864 1113865 mdash mdash 20 759

16 Security and Communication Networks

computational complexity of ours is obviously lower thanthat of the 90Brsquos with the growing sample space and samplesize in theoretical analysis e average execution efficiencyof our predictors is about 7 to 10 times higher than that ofthe 90Brsquos predictors for different sample spaces when thesample size is 106 Specially the 90Brsquos predictors cannotcalculate out a result due to the huge time complexity whenthe sample space s is over 22 with the parameter of maximumstep k 16 and sample size n 108 relatively our method isable to provide a satisfied result towards the entropy sourceswith large sample space and long dependence

Future work is aiming at designing some specific neuralnetwork predictive models for min-entropy estimation forsome specific entropy sources Our future work will alsofocus on applying this new method to estimate entropy formore application areas like the randomness sources (sensorsand other sources) in mobile terminals

Data Availability

RANDOMORG data used to support the findings of this studycan be accessed from httpswwwrandomorg UblditTrueRNGpro Linux kernel entropy source and Linuxdevurandom andWindowsRNGdata used to support the findingsof this study can be obtained from the relevant listed references

Disclosure

A preliminary version of this paper appeared under the titleldquoNeural Network Based Min-entropy Estimation for Ran-dom Number Generatorsrdquo in Proc Security and Privacy inCommunication Networks-14th EAI International Con-ference SecureComm 2018 Singapore August 8ndash10 2018[36] Dr Jing Yang participated in this work when shestudied in Chinese Academy of Sciences and now she worksin China Information Technology Security EvaluationCenter Beijing China

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is work was supported by the National Key RampD Programof China (No 2018YFB0804300) National Natural ScienceFoundation of China (Nos 61872357 and 61802396) andNational Cryptography Development Fund of China (NoMMJJ20180113)

References

[1] I Kanter Y Aviad I Reidler E Cohen and M RosenbluhldquoAn optical ultrafast random bit generatorrdquoNature Photonicsvol 4 no 1 pp 58ndash61 2010

[2] P Li A Wang Y Guo et al ldquoUltrafast fully photonic randombit generatorrdquo Journal of Lightwave Technology vol 36 no 12pp 2531ndash2540 2018

[3] P Li K Li X Guo et al ldquoParallel optical random bit gen-eratorrdquo Optics Letters vol 44 no 10 pp 2446ndash2449 2019

[4] A Uchida K Amano M Inoue et al ldquoFast physical randombit generation with chaotic semiconductor lasersrdquo NaturePhotonics vol 2 no 12 pp 728ndash732 2008

[5] L Dorrendorf Z Gutterman and B Pinkas ldquoCryptanalysis ofthe random number generator of the windows operatingsystemrdquo ACM Transactions on Information and System Se-curity vol 13 no 1 pp 1ndash32 2009

[6] Z Gutterman B Pinkas and T Reinman ldquoAnalysis of thelinux random number generatorrdquo in Proceedings of the 2006IEEE Symposium on Security and Privacy (SampP 2006)pp 371ndash385 Berkeley CA USA May 2006

[7] M Vanhoef and F Piessens ldquoPredicting decrypting andabusing WPA280211 group keysrdquo in Proceedings of the 25thUSENIX Security Symposium pp 673ndash688 Austin TX USAAugust 2016

[8] A L Rukhin J Soto J R Nechvatal et al Sp 800-22 Rev 1a AStatistical Test Suite for Random and Pseudorandom NumberGenerators for Cryptographic Applications NIST SpecialPublication Gaithersburg MD USA 2010

[9] W Killmann and W Schindler AIS 31 Functionality Classesand Evaluation Methodology for True (Physical) RandomNumber Generators Version 31 T-Systems GEI GmbH andBundesamt fur Sicherheit in der Informationstechnik (BSI)Bonn Germany 2001

[10] G Marsaglia ldquoe marsaglia random number CDROM in-cluding the diehard battery of tests of randomnessrdquo 1996httpwwwstatfsuedupubdiehard

[11] P LrsquoEcuyer and R J Simard ldquoTestU01 a C library for em-pirical testing of random number generatorsrdquo ACM Trans-action on Mathematical Software vol 33 no 4 2007

[12] ISOIEC JTC 1SC 27 Berlin Germany ISOIEC 18031Information TechnologyndashSecurity TechniquesndashRandom bitGeneration 2011

[13] E Barker and J Kelsey ldquoNist draft special publication 800-90B recommendation for the entropy sources used forrandom bit generationrdquo 2012 httpcsrcnistgovpublicationsdrafts800-90draft-sp800-90bpdf

[14] M Baudet D Lubicz J Micolod and A Tassiaux ldquoOn thesecurity of oscillator-based random number generatorsrdquoJournal of Cryptology vol 24 no 2 pp 398ndash425 2011

[15] W Killmann and W Schindler ldquoA design for a physical RNGwith robust entropy estimatorsrdquo in Proceedings of the 10thInternational Workshop Cryptographic Hardware and Em-bedded SystemsmdashCHES 2008 pp 146ndash163 Washington DCUSA August 2008

[16] Y Ma J Lin T Chen C Xu Z Liu and J Jing ldquoEntropyevaluation for oscillator-based true random number gener-atorsrdquo in Proceedings of the 16th International WorkshopCryptographic Hardware and Embedded SystemsmdashCHES2014 pp 544ndash561 Busan South Korea September 2014

[17] Y Ma J Lin and J Jing ldquoOn the entropy of oscillator-basedtrue random number generatorsrdquo in Proceedings of theCryptographersrsquo Track at the RSA Conference pp 165ndash180Springer San Francisco CA USA February 2017

[18] P Li J Zhang L Sang et al ldquoReal-time online photonicrandom number generationrdquo Optics Letters vol 42 no 14pp 2699ndash2702 2017

[19] X Ma F Xu H Xu X Tan B Qi and H K Lo ldquoPost-processing for quantum random-number generators entropyevaluation and randomness extractionrdquo Physical Review Avol 87 no 6 pp 0623271ndash06232710 2013

[20] K Ugajin Y Terashima K Iwakawa et al ldquoReal-time fastphysical random number generator with a photonic

Security and Communication Networks 17

integrated circuitrdquo Optics Express vol 25 no 6pp 6511ndash6523 2017

[21] F Xu B Qi X Ma H Xu H Zheng and H-K Lo ldquoUltrafastquantum random number generation based on quantumphase fluctuationsrdquo Optics Express vol 20 no 11pp 12366ndash12377 2012

[22] M S Turan E Barker J Kelsey K McKay M Baish andM Boyle ldquo(Second draft) NIST special publication 800-90brecommendation for the entropy sources used for random bitgenerationrdquo 2016 httpscsrcnistgovCSRCmediaPublicationssp800-90bdraftdocumentssp800-90b_second_draftpdf

[23] M S Turan E Barker J Kelsey K McKay M Baish andM Boyle ldquoNIST special publication 800-90B recommenda-tion for the entropy sources used for random bit generationrdquo2018 httpnvlpubsnistgovnistpubsSpecialPublicationsNISTSP800-90Bpdf

[24] J Kelsey K A McKay and M S Turan ldquoPredictive modelsfor min-entropy estimationrdquo in Proceedings of the 17th In-ternational WorkshopCryptographic Hardware and EmbeddedSystemsmdashCHES 2015 pp 373ndash392 Saint-Malo FranceSeptember 2015

[25] S Aras and I D Kocakoccedil ldquoA new model selection strategy intime series forecasting with artificial neural networks IHTSrdquoNeurocomputing vol 174 pp 974ndash987 2016

[26] J P Donate X Li G G Sanchez and A S de Miguel ldquoTimeseries forecasting by evolving artificial neural networks withgenetic algorithms differential evolution and estimation ofdistribution algorithmrdquo Neural Computing and Applicationsvol 22 no 1 pp 11ndash20 2013

[27] J C Luna-Sanchez E Gomez-Ramırez K Najim andE Ikonen ldquoForecasting time series with a logarithmic modelfor the polynomial artificial neural networksrdquo in Proceedingsof the 2011 International Joint Conference on Neural NetworksIJCNN 2011 pp 2725ndash2732 San Jose CA USA 2011

[28] C de Groot and D Wurtz ldquoAnalysis of univariate time serieswith connectionist nets a case study of two classical exam-plesrdquo Neurocomputing vol 3 no 4 pp 177ndash192 1991

[29] I Goodfellow Y Bengio and A Courville Deep LearningMIT Press Cambridge MA USA 2016

[30] X Cai N Zhang G K Venayagamoorthy and D C WunschII ldquoTime series prediction with recurrent neural networkstrained by a hybrid PSO-EA algorithmrdquo Neurocomputingvol 70 no 13ndash15 pp 2342ndash2353 2007

[31] A Jain and A M Kumar ldquoHybrid neural network models forhydrologic time series forecastingrdquo Applied Soft Computingvol 7 no 2 pp 585ndash592 2007

[32] J M P Menezes Jr and G A Barreto ldquoLong-term time seriesprediction with the NARX network an empirical evaluationrdquoNeurocomputing vol 71 no 16ndash18 pp 3335ndash3343 2008

[33] P Hagerty and T Draper ldquoEntropy bounds and statisticaltestsrdquo 2012 httpscsrcnistgovcsrcmediaeventsrandom-bit-generation-workshop-2012documentshagerty_entropy_paperpdf

[34] S Zhu Y Ma T Chen J Lin and J Jing ldquoAnalysis andimprovement of entropy estimators in NIST SP 800-90b fornon-IID entropy sourcesrdquo IACR Transactions on SymmetricCryptology no 3 pp 151ndash168 2017

[35] A Menezes P C van Oorschot and S A VanstoneHandbook of Applied Cryptography CRC Press Boca RatonFL USA 1996

[36] J Yang S Zhu T Chen Y Ma N Lv and J Lin ldquoNeuralnetwork based min-entropy estimation for random numbergeneratorsrdquo in Proceedings of the 14th International

Conference Security and Privacy in Communication Net-worksmdashSecureComm pp 231ndash250 Singapore August 2018

18 Security and Communication Networks

Page 14: High-Efficiency Min-Entropy Estimation Based on Neural Network … · 2020. 2. 17. · the outputs of RNGs, such as the ISO/IEC 18031 [12] and AIS 31 [9]. ere are many types of methods

minimum and maximum values that are outpute sequence used here consists of bits

(ii) Ubldit TrueRNGpro TrueRNGpro provides a steadystream of random numbers through a USB CDCserial port which is a USB randomnumber generatorproduced by Ubldit is entropy source is also usedin [24] e sequence used here consists of bits

(iii) Linux kernel entropy source e Linux kernelrandom generator is used for the generation of areal-world sequence without any processing esequence used here is the last bit of per symbol

(iv) Linuxdevurandomedevurandom [6] of Linuxis used for the generation of a real-world sequencewith strict processing e sequence used hereconsists of bits

(v) Windows RNG Windows RNG [5] is used for thegeneration of a real-world sequence by calling aCrypto APIe sequence used here consists of bits

As illustrated in Table 6 the lowest entropy estimationfor each source is shown in bold font We see that ourpredictors perform better than 90Brsquos predictors because thelowest entropy estimation is always obtained from our workfor each real-world source Furthermore for Linux kernelentropy source we find that both of the predictor Lag andMultiMMC are able to give lower estimation results Itindicates that Linux kernel entropy source has periodicityand conforms to the Markov model which is well under-stood because the randomness of Linux kernel entropysource comes from human behaviors such as manipulatingthe mouse and keyboard In our work compared with theentropy estimations for other real-world sources FNN fitsmuch better than RNN for Linux kernel entropy sourcewhich is consistent with the previous view that FNN per-forms well in testing Markov sources

42 Comparison on the Scope of Applicability After evalu-ating the accuracy we further validate the scope of appli-cability of our proposed predictors and compare them withthat of the 90Brsquos predictors Kelsey et al [24] stated that eachof the 90Brsquos predictors performs well only for a specialdistribution as described in Section 221 To prove ourpredictor has better applicability the following four simu-lated datasets are generated which are suitable for eachpredictor employed in the final 90B

(i) Time-Varying Sourcese probability distribution ofdata sources is varying with timeeMCWpredictor

predicts the current output according to previousoutputs in a short period of time and thus the MCWpredictor performs well in these data sources

(ii) Periodic Sources e data source changes periodi-cally e lag predictor predicts the value that oc-curred samples back in the sequence as the currentoutput and thus the lag predictor performs well onsources with strong periodic behavior

(iii) Markov Sources e data sources can be modeledby the Markov model e MultiMMC predictorpredicts the current output according to theMarkovmodel and thus the MultiMMC predictor performswell on data from any process that can be accuratelymodeled by a Markov model

(iv) LZ78Y Sources e data sources can be efficientlycompressed by LZ78-like compression algorithmswhich applies to the LZ78Y predictor well

For each above simulated source we generate a set of 10simulated datasets each of which contains 106 samples andthe min-entropy is estimated by our and 90Brsquos predictorse final result for a predictor is the average value of 10estimated results corresponding to the 10 simulated datasetsfor one simulated source

421 Time-Varying Sources Firstly we generate the time-varying binary data which is suitable for the statistical be-haviors of the MCW predictor presented in the 90B Table 7shows the entropy estimation results for time-varying data

As shown in Table 7 symbol gradual(x) (x isin [0 1] thesame below) is defined as a simulated source that theprobability of output ldquo0rdquo changes gradually from x to 1 minus x

with time Symbol period(x) is defined as a simulated sourcethat the probability of output ldquo0rdquo changes periodically withtime and the probability varies from x to 1 minus x in one periode period length is set to 20 of the entire input datasetSymbol sudden(x) is defined as a simulated source that theprobability of output ldquo0rdquo changes suddenly with timenamely the probability is set to x for the first half of the inputdataset and 1 minus x for the last half

From Table 7 the estimation results for MCW predictorand our work are shown in bold font We see that the MCWpredictor gives the lowest and most accurate entropy esti-mations for the three types of time-varying data mentionedabove but it gives a little underestimates at gradual(02) andperiod(02) It is confirmed that the time-varying sourcesmentioned above match with the statistical behaviors of theMCW predictor Relatively we find that our proposedpredictive models are all capable to obtain the satisfiedentropy estimations that are close to the correct valueserefore it is proved that our proposed predictive modelsare suitable for the time-varying datamentioned above Notethat we calculate the min-entropy estimate according to theentire dataset rather than the last 20 of the input dataset forthese time-varying sources Because the probability distri-bution is varying with time the part of the input datasetcannot represent the overall distribution of the input dataset

Table 5 Relative errors of the final estimations of 90Brsquos predictorsand our predictors for five classes of simulated sources

Simulated data class 90Brsquos predictors () Our predictors ()Uniform 437 153Near-uniform 347 159Normal 608 157Time-varying normal 347 172Markov 1465 655

14 Security and Communication Networks

422 Periodic Sources Secondly we generate periodic datawhich are suitable for the statistical behaviors of the lagpredictor presented in 90B e following is entropy esti-mation results for periodic sequences e data source iscompletely obeying the periodic rule so the correct entropyis zero e bit width of samples is traversed from 2 to 8

As shown in Table 8 the estimation results for the lagpredictor and our work are shown in bold font According tothe correct entropy (is equal to 0) of the simulated periodicsources we confirm that the lag predictor is suitable for theentropy estimation of this type of source as expected Rel-atively the RNN can also give the accurate min-entropyestimates ie estimated results are zeros us our pro-posed predictive models are suitable for the entropy esti-mation of the (strong) periodic data In addition theMultiMMC predictor can also give the accurate min-entropyestimations is is reasonable because periodicity is also aform of correlation

4.2.3. Markov Sources. Next, we generate multivariate M-sequences as Markov sources, which fit the statistical behaviors of the MultiMMC predictor. Specifically, the multivariate M-sequences are composed of multiple M-sequences with different initial states. Due to the determinacy of this type of sequence, the correct entropy is zero. The bit width of the samples is also traversed from 2 to 8. The maximum step of correlation used here is set to 8. Table 9 shows the estimated results for the multivariate M-sequences.
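For reference, an M-sequence is the output of a maximal-length LFSR, and a multivariate version can be built by running several copies from different initial states. The sketch below illustrates this; the 16-bit register, the particular primitive polynomial, the seed values, and the way component bits are packed into symbols are illustrative assumptions rather than the configuration used in the experiments.

import numpy as np

def msequence_bits(nbits, state=0xACE1):
    # 16-bit Fibonacci LFSR with feedback polynomial
    # x^16 + x^14 + x^13 + x^11 + 1 (a primitive polynomial), so the
    # output is an M-sequence of period 2^16 - 1. 'state' must be nonzero.
    out = np.empty(nbits, dtype=np.uint8)
    for i in range(nbits):
        out[i] = state & 1                        # output the oldest bit
        fb = ((state >> 0) ^ (state >> 2) ^
              (state >> 3) ^ (state >> 5)) & 1    # taps at stages 16, 14, 13, 11
        state = (state >> 1) | (fb << 15)         # shift and insert feedback
    return out

def multivariate_msequence(bit_width, nbits):
    # Combine 'bit_width' copies of the same M-sequence started from
    # different nonzero initial states, one copy per bit of each symbol.
    seeds = [0xACE1, 0xBEEF, 0x1234, 0x5678, 0x2468, 0x1357, 0x0F0F, 0x7FFF]
    symbols = np.zeros(nbits, dtype=np.int64)
    for k in range(bit_width):
        symbols |= msequence_bits(nbits, state=seeds[k]).astype(np.int64) << k
    return symbols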

In Table 9, the estimation results for the MultiMMC predictor and our work are shown in bold font. According to the correct entropy (equal to 0) of the simulated Markov sources, we confirm that the MultiMMC predictor is suitable for the entropy estimation of this type of source, as expected. Relatively, the RNN can also give accurate min-entropy estimations, i.e., the estimated results are zeros. Thus, our proposed predictive models are suitable for the Markov sources.

4.2.4. LZ78Y Sources. Finally, we verify the applicability to the LZ78Y sources. This type of entropy source is difficult to generate by simulation. However, we can still draw the conclusion that our proposed predictive models can be applied to the LZ78Y sources according to the results in italic font in Tables 8 and 9, because the periodic data and Markov sequences are compressible.
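As an informal way to see that such sources are compressible, the following sketch measures a compression ratio with a general-purpose compressor; note that deflate is LZ77-based, so this is only a rough proxy for LZ78-style predictability, and the function is our own illustration.

import zlib
import numpy as np

def compression_ratio(symbols):
    # Ratios well below 1 indicate redundant, predictable data. For instance,
    # short-period data like that sketched above compresses to a tiny fraction
    # of its size. (The LZ78Y predictor itself builds an LZ78-style dictionary;
    # deflate is only a stand-in for checking compressibility.)
    raw = np.asarray(symbols, dtype=np.uint8).tobytes()
    return len(zlib.compress(raw, 9)) / len(raw)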

4.2.5. Summary on the Applicability Scope of Our Predictors. By analyzing the experimental results for the above four specific simulated sources, each of which is oriented towards a certain predictor in the 90B, we conclude that our predictors can provide accurate entropy estimates. So the proposed predictors apply to these entropy sources as well as the 90B's predictors do. In addition, compared with the 90B's predictors, our predictors have a better performance on the scope of applicability for testing datasets with long-range correlation, as presented in Section 4.1.1.

Table 7: Entropy estimates for time-varying data.

Data class     Correct   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
gradual(0.2)   0.6345    0.5290   0.7808   0.7240     0.7790   0.6288   0.6289
gradual(0.3)   0.7437    0.7378   0.9221   0.8416     0.9243   0.7430   0.7460
gradual(0.4)   0.8645    0.8631   0.9786   0.9518     0.9739   0.8648   0.8637
period(0.2)    0.6345    0.5537   0.7428   0.5537     0.7669   0.6205   0.6209
period(0.3)    0.7437    0.7393   0.9218   0.8476     0.9233   0.7377   0.7375
period(0.4)    0.8645    0.8639   0.9767   0.9632     0.9796   0.8653   0.8632
sudden(0.2)    0.3219    0.3203   0.4663   0.3386     0.4484   0.3217   0.3229
sudden(0.3)    0.5146    0.5110   0.5857   0.9984     0.7663   0.5110   0.5119
sudden(0.4)    0.7370    0.7338   0.8699   0.9984     0.9389   0.7339   0.7345
(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)

Table 8: Entropy estimates for periodic sequences.

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.6458   0.0000   0.0000     1.1817   0.0079   0.0000
3           2.3318   0.0000   0.0000     1.5957   0.1315   0.0000
4           2.9147   0.0000   0.0000     1.8016   0.4748   0.0000
5           3.3269   0.0000   0.0000     1.4586   0.8898   0.0000
6           3.9092   0.0000   0.0000     0.8322   3.4944   0.0000
7           4.4908   0.0000   0.0000     0.3973   3.4960   0.0000
8           4.4919   0.0000   0.0000     0.2027   3.5408   0.0000
(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)
*The results in italic font are used to analyze the applicability for the LZ78Y sources.

Table 6: Entropy estimates for real-world sources.

Real-world source             MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
RANDOM.ORG                    0.9951   0.9963   0.9966     0.9976   0.9802   0.9954
Ubldit TrueRNGpro             0.9979   0.9955   0.9973     0.9966   0.9934   0.9728
Linux kernel entropy source   0.6173   0.1232   0.1269     0.6164   0.1230   0.3068
Linux /dev/urandom            0.9952   0.9935   0.9990     0.9964   0.9983   0.9911
Windows RNG                   0.9953   0.9986   0.9975     0.9984   0.9833   0.9853
(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)



4.3. Comparison on Execution Efficiency. We implement our predictors and the final 90B's predictors using Python 3.6, and the version of TensorFlow is 1.3.11. All the following tests are conducted on a computer with an Intel Core i7 CPU and 32 GB RAM.

Table 10 shows the mean execution time of our predictors in comparison with that of the final 90B's predictors and the second draft of the 90B's predictors. Each experimental result in Table 10 is the average value obtained from 50 repeated experiments. Note that the definitions of the parameters n, s, and k are the same as in Section 3.3.2.
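Such a measurement can be reproduced with a simple harness like the following sketch (our own illustration; here "estimator" stands for any predictor implementation that takes a sample sequence and returns an estimate).

import time
import statistics

def mean_runtime(estimator, dataset, repeats=50):
    # Time one estimator on one dataset 'repeats' times and return the
    # mean wall-clock execution time in seconds.
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        estimator(dataset)
        times.append(time.perf_counter() - start)
    return statistics.mean(times)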

From the listed mean execution times with different scales (n, s) in Table 10, it can be seen that when n = 10^6, the mean execution time of our predictors is much lower, and increases more slowly with s, than that of the final 90B's predictors. In other words, the average execution efficiency of our predictors is about 7 to 10 times higher than that of the final 90B's predictors for different sample spaces s when the sample size n is 10^6. In particular, when n = 10^8, the mean execution time of the final 90B's predictors far exceeds that of our predictors regardless of the size of the sample space, and it is too long (over three days) to calculate the estimated results in the cases s ≥ 2^2.

In terms of the execution efficiency of the 90B's predictors, we also find that the mean execution time of the final 90B's predictors is much higher than that of the second draft of the 90B's predictors. Actually, the final 90B's mean execution time is about twice as much as that of the second draft. This could be caused by the characteristics of some estimators that are limited to binary inputs: the collision estimator, Markov estimator, and compression estimator are only suitable for binary input (0 or 1), as stated in [23]. So, for nonbinary inputs, the 90B's estimators not only calculate the original symbol entropy but also convert the input into a binary sequence to calculate the bit entropy, and finally obtain the min-entropy from both. This greatly increases the mean execution time.
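To illustrate why nonbinary inputs roughly double the work, the sketch below shows the serialization of symbols into a bitstring and the combination of the two assessments. This reflects our reading of [23]; the function names and the exact combination rule are stated here as assumptions, not as the reference implementation.

import numpy as np

def to_bitstring(symbols, bits_per_symbol):
    # Serialize nonbinary samples to a bitstring (most significant bit first),
    # so that the binary-only estimators can be run a second time on it.
    symbols = np.asarray(symbols, dtype=np.int64)
    shifts = np.arange(bits_per_symbol - 1, -1, -1, dtype=np.int64)
    return ((symbols[:, None] >> shifts) & 1).astype(np.uint8).ravel()

def final_estimate(h_original, h_bitstring, bits_per_symbol):
    # Assumed combination rule (our reading of [23]): the per-symbol assessment
    # is the minimum of the symbol-level estimate and the bit-level estimate
    # scaled by the number of bits per symbol.
    return min(h_original, bits_per_symbol * h_bitstring)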

4.4. General Discussion. For most of the entropy sources that have been tested, the RNN gives more accurate estimations than the FNN. The better accuracy of the RNN predictive model may be due to the following reasons. On the one hand, the RNN adds feedback connections to the network, i.e., it considers not only the relationship between the current output and the previous observations but also the relationships among the previous observations. On the other hand, the RNN one-hot-encodes the training dataset, which is better for forecasting categorical data. On the contrary, for Markov sources, M-sequences, and nonuniform distributions postprocessed by an LFSR, the current output is only related to the previous observations, which fits the FNN predictive model well, and thus the FNN provides more accurate estimated results.
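The one-hot encoding step mentioned above can be written compactly as follows (a minimal NumPy sketch of the standard encoding; it is not taken from the authors' code).

import numpy as np

def one_hot(symbols, num_classes):
    # Each sample in [0, num_classes) becomes a binary indicator vector of
    # length num_classes, which is the categorical representation the RNN
    # predictor trains on.
    symbols = np.asarray(symbols, dtype=np.int64)
    encoded = np.zeros((symbols.size, num_classes), dtype=np.float32)
    encoded[np.arange(symbols.size), symbols] = 1.0
    return encoded

# e.g., for 4-bit samples (sample space 2^4): one_hot([3, 0, 15], 16)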

5. Conclusions and Future Work

Entropy estimation provides a crucial evaluation of the security of RNGs, and the predictor serves as a universal sanity check for entropy estimation. In this work, we provide several new approaches to estimating the min-entropy of entropy sources using predictors based on neural networks (i.e., FNN and RNN) for the first time. In particular, we design a novel scheme for the proposed entropy estimation based on neural network models, including the execution strategy and parameter settings. To evaluate the quality of the proposed predictors, we collect various types of simulated sources, both stationary and nonstationary, whose correct entropy can be derived from the known probability distribution, and the theoretical result is further verified by experiments on real-world sources. We also compare our method with the predictors defined in NIST SP 800-90B (published in 2018), which is a commonly used standard for validating entropy sources. Our assessment experiments are carried out in three aspects, namely, accuracy, scope of applicability, and computational complexity. The experimental results demonstrate that the entropy estimations obtained from our proposed predictors are more accurate than those of the 90B's predictors, and our predictors have a remarkably wider scope of applicability.

Table 9: Entropy estimates for multivariate M-sequences.

Bit width   MCW      Lag      MultiMMC   LZ78Y    FNN      RNN
2           1.9010   2.0000   0.0000     2.0000   0.0005   0.0000
3           2.9906   3.0000   0.0000     2.4940   0.0000   0.0000
4           3.4037   4.0000   0.0000     4.0000   0.0021   0.0000
5           4.9753   5.0000   0.0000     1.2269   0.0041   0.0000
6           5.3916   6.0000   0.0000     1.2905   0.0394   0.0000
7           5.3916   6.0000   0.0000     1.9881   0.0280   0.0000
8           7.0000   7.0000   0.0000     0.6611   0.8635   0.0000
(MCW, Lag, MultiMMC, and LZ78Y are the 90B's predictors; FNN and RNN are our work.)
*The results in italic font are used to analyze the applicability for the LZ78Y sources.

Table 10: Comparison on execution efficiency of min-entropy estimation of our study and the 90B's predictors.

{n, s}         Final 90B (s)   Old 90B (s)   Our predictors (s)
{10^6, 2^1}    921             561           136
{10^6, 2^2}    1 058           525           138
{10^6, 2^3}    1 109           574           149
{10^6, 2^4}    1 235           598           174
{10^6, 2^5}    1 394           630           190
{10^6, 2^6}    1 683           785           186
{10^6, 2^7}    2 077           938           264
{10^6, 2^8}    2 618           1 298         272
{10^8, 2^1}    52 274          47 936        9 184
{10^8, 2^2}    —               —             9 309
{10^8, 2^3}    —               —             9 385
{10^8, 2^4}    —               —             9 836
{10^8, 2^5}    —               —             10 986
{10^8, 2^6}    —               —             13 303
{10^8, 2^7}    —               —             17 649
{10^8, 2^8}    —               —             20 759


In addition, by theoretical analysis, the computational complexity of our method is obviously lower than that of the 90B's as the sample space and sample size grow. The average execution efficiency of our predictors is about 7 to 10 times higher than that of the 90B's predictors for different sample spaces when the sample size is 10^6. Specially, the 90B's predictors cannot calculate a result, due to the huge time complexity, when the sample space s is over 2^2 with the parameter of maximum step k = 16 and sample size n = 10^8; relatively, our method is still able to provide a satisfactory result for entropy sources with a large sample space and long dependence.

Future work aims at designing specific neural network predictive models for min-entropy estimation for particular entropy sources. Our future work will also focus on applying this new method to estimate entropy in more application areas, such as the randomness sources (sensors and other sources) in mobile terminals.

Data Availability

RANDOM.ORG data used to support the findings of this study can be accessed from https://www.random.org. Ubldit TrueRNGpro, Linux kernel entropy source, Linux /dev/urandom, and Windows RNG data used to support the findings of this study can be obtained from the relevant listed references.

Disclosure

A preliminary version of this paper appeared under the title "Neural Network Based Min-entropy Estimation for Random Number Generators" in Proc. Security and Privacy in Communication Networks, 14th EAI International Conference, SecureComm 2018, Singapore, August 8-10, 2018 [36]. Dr. Jing Yang participated in this work when she studied at the Chinese Academy of Sciences, and now she works at the China Information Technology Security Evaluation Center, Beijing, China.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Key R&D Program of China (No. 2018YFB0804300), the National Natural Science Foundation of China (Nos. 61872357 and 61802396), and the National Cryptography Development Fund of China (No. MMJJ20180113).

References

[1] I. Kanter, Y. Aviad, I. Reidler, E. Cohen, and M. Rosenbluh, "An optical ultrafast random bit generator," Nature Photonics, vol. 4, no. 1, pp. 58-61, 2010.
[2] P. Li, A. Wang, Y. Guo et al., "Ultrafast fully photonic random bit generator," Journal of Lightwave Technology, vol. 36, no. 12, pp. 2531-2540, 2018.
[3] P. Li, K. Li, X. Guo et al., "Parallel optical random bit generator," Optics Letters, vol. 44, no. 10, pp. 2446-2449, 2019.
[4] A. Uchida, K. Amano, M. Inoue et al., "Fast physical random bit generation with chaotic semiconductor lasers," Nature Photonics, vol. 2, no. 12, pp. 728-732, 2008.
[5] L. Dorrendorf, Z. Gutterman, and B. Pinkas, "Cryptanalysis of the random number generator of the Windows operating system," ACM Transactions on Information and System Security, vol. 13, no. 1, pp. 1-32, 2009.
[6] Z. Gutterman, B. Pinkas, and T. Reinman, "Analysis of the Linux random number generator," in Proceedings of the 2006 IEEE Symposium on Security and Privacy (S&P 2006), pp. 371-385, Berkeley, CA, USA, May 2006.
[7] M. Vanhoef and F. Piessens, "Predicting, decrypting, and abusing WPA2/802.11 group keys," in Proceedings of the 25th USENIX Security Symposium, pp. 673-688, Austin, TX, USA, August 2016.
[8] A. L. Rukhin, J. Soto, J. R. Nechvatal et al., SP 800-22 Rev. 1a: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, NIST Special Publication, Gaithersburg, MD, USA, 2010.
[9] W. Killmann and W. Schindler, AIS 31: Functionality Classes and Evaluation Methodology for True (Physical) Random Number Generators, Version 3.1, T-Systems GEI GmbH and Bundesamt für Sicherheit in der Informationstechnik (BSI), Bonn, Germany, 2001.
[10] G. Marsaglia, "The Marsaglia random number CDROM including the diehard battery of tests of randomness," 1996, http://www.stat.fsu.edu/pub/diehard.
[11] P. L'Ecuyer and R. J. Simard, "TestU01: a C library for empirical testing of random number generators," ACM Transactions on Mathematical Software, vol. 33, no. 4, 2007.
[12] ISO/IEC JTC 1/SC 27, ISO/IEC 18031: Information Technology-Security Techniques-Random Bit Generation, Berlin, Germany, 2011.
[13] E. Barker and J. Kelsey, "NIST draft special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2012, http://csrc.nist.gov/publications/drafts/800-90/draft-sp800-90b.pdf.
[14] M. Baudet, D. Lubicz, J. Micolod, and A. Tassiaux, "On the security of oscillator-based random number generators," Journal of Cryptology, vol. 24, no. 2, pp. 398-425, 2011.
[15] W. Killmann and W. Schindler, "A design for a physical RNG with robust entropy estimators," in Proceedings of the 10th International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2008), pp. 146-163, Washington, DC, USA, August 2008.
[16] Y. Ma, J. Lin, T. Chen, C. Xu, Z. Liu, and J. Jing, "Entropy evaluation for oscillator-based true random number generators," in Proceedings of the 16th International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2014), pp. 544-561, Busan, South Korea, September 2014.
[17] Y. Ma, J. Lin, and J. Jing, "On the entropy of oscillator-based true random number generators," in Proceedings of the Cryptographers' Track at the RSA Conference, pp. 165-180, Springer, San Francisco, CA, USA, February 2017.
[18] P. Li, J. Zhang, L. Sang et al., "Real-time online photonic random number generation," Optics Letters, vol. 42, no. 14, pp. 2699-2702, 2017.
[19] X. Ma, F. Xu, H. Xu, X. Tan, B. Qi, and H. K. Lo, "Postprocessing for quantum random-number generators: entropy evaluation and randomness extraction," Physical Review A, vol. 87, no. 6, pp. 062327:1-062327:10, 2013.
[20] K. Ugajin, Y. Terashima, K. Iwakawa et al., "Real-time fast physical random number generator with a photonic integrated circuit," Optics Express, vol. 25, no. 6, pp. 6511-6523, 2017.
[21] F. Xu, B. Qi, X. Ma, H. Xu, H. Zheng, and H.-K. Lo, "Ultrafast quantum random number generation based on quantum phase fluctuations," Optics Express, vol. 20, no. 11, pp. 12366-12377, 2012.
[22] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "(Second draft) NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2016, https://csrc.nist.gov/CSRC/media/Publications/sp/800-90b/draft/documents/sp800-90b_second_draft.pdf.
[23] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2018, http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90B.pdf.
[24] J. Kelsey, K. A. McKay, and M. S. Turan, "Predictive models for min-entropy estimation," in Proceedings of the 17th International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2015), pp. 373-392, Saint-Malo, France, September 2015.
[25] S. Aras and I. D. Kocakoç, "A new model selection strategy in time series forecasting with artificial neural networks: IHTS," Neurocomputing, vol. 174, pp. 974-987, 2016.
[26] J. P. Donate, X. Li, G. G. Sanchez, and A. S. de Miguel, "Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm," Neural Computing and Applications, vol. 22, no. 1, pp. 11-20, 2013.
[27] J. C. Luna-Sanchez, E. Gomez-Ramirez, K. Najim, and E. Ikonen, "Forecasting time series with a logarithmic model for the polynomial artificial neural networks," in Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN 2011), pp. 2725-2732, San Jose, CA, USA, 2011.
[28] C. de Groot and D. Wurtz, "Analysis of univariate time series with connectionist nets: a case study of two classical examples," Neurocomputing, vol. 3, no. 4, pp. 177-192, 1991.
[29] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.
[30] X. Cai, N. Zhang, G. K. Venayagamoorthy, and D. C. Wunsch II, "Time series prediction with recurrent neural networks trained by a hybrid PSO-EA algorithm," Neurocomputing, vol. 70, no. 13-15, pp. 2342-2353, 2007.
[31] A. Jain and A. M. Kumar, "Hybrid neural network models for hydrologic time series forecasting," Applied Soft Computing, vol. 7, no. 2, pp. 585-592, 2007.
[32] J. M. P. Menezes Jr. and G. A. Barreto, "Long-term time series prediction with the NARX network: an empirical evaluation," Neurocomputing, vol. 71, no. 16-18, pp. 3335-3343, 2008.
[33] P. Hagerty and T. Draper, "Entropy bounds and statistical tests," 2012, https://csrc.nist.gov/csrc/media/events/random-bit-generation-workshop-2012/documents/hagerty_entropy_paper.pdf.
[34] S. Zhu, Y. Ma, T. Chen, J. Lin, and J. Jing, "Analysis and improvement of entropy estimators in NIST SP 800-90B for non-IID entropy sources," IACR Transactions on Symmetric Cryptology, no. 3, pp. 151-168, 2017.
[35] A. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, USA, 1996.
[36] J. Yang, S. Zhu, T. Chen, Y. Ma, N. Lv, and J. Lin, "Neural network based min-entropy estimation for random number generators," in Proceedings of the 14th International Conference on Security and Privacy in Communication Networks (SecureComm 2018), pp. 231-250, Singapore, August 2018.

18 Security and Communication Networks

Page 15: High-Efficiency Min-Entropy Estimation Based on Neural Network … · 2020. 2. 17. · the outputs of RNGs, such as the ISO/IEC 18031 [12] and AIS 31 [9]. ere are many types of methods

422 Periodic Sources Secondly we generate periodic datawhich are suitable for the statistical behaviors of the lagpredictor presented in 90B e following is entropy esti-mation results for periodic sequences e data source iscompletely obeying the periodic rule so the correct entropyis zero e bit width of samples is traversed from 2 to 8

As shown in Table 8 the estimation results for the lagpredictor and our work are shown in bold font According tothe correct entropy (is equal to 0) of the simulated periodicsources we confirm that the lag predictor is suitable for theentropy estimation of this type of source as expected Rel-atively the RNN can also give the accurate min-entropyestimates ie estimated results are zeros us our pro-posed predictive models are suitable for the entropy esti-mation of the (strong) periodic data In addition theMultiMMC predictor can also give the accurate min-entropyestimations is is reasonable because periodicity is also aform of correlation

423 Markov Sources Next we generate multivariateM-sequences as Markov sources which fit the statisticalbehaviors of the MultiMMC predictor Specifically themultivariate M-sequences are composed of multiple M-se-quences with different initial states Due to the determinacyof this type of sequences the correct entropy is zero e bitwidth of the samples is also traversed from 2 to 8 emaximum step of correlation used here is set as 8 Table 9shows the estimated results for multivariate M-sequences

From Table 9 the estimation results for MultiMMCpredictor and our work are shown in bold font According tothe correct entropy (is equal to 0) of the simulated Markovsources we confirm that theMultiMMC predictor is suitablefor the entropy estimation of this type of source as expectedRelatively the RNN can also give the accurate min-entropyestimations ie estimated results are zeros us our

proposed predictive models are suitable for the Markovsources

424 LZ78Y Sources Finally we verify the applicability ofthe LZ78Y sources is type of entropy source is difficult togenerate by simulating However we can still draw theconclusion that our proposed predictive models can beapplied to the LZ78Y sources according to Tables 8 and 9 initalic font Because the periodic data and Markov sequencesare compressible

425 Summary on Applicability Scope of Our PredictorsBy analyzing the experimental results of the above fourspecific simulated sources each of which is oriented towardsa certain predictor in the 90B we have a conclusion that ourpredictors can provide accurate estimated results of entropySo the proposed predictors are well applied to these entropysources as well as the 90Brsquos predictors In addition com-pared with 90Brsquos predictors our predictors have a betterperformance on the scope of applicability for testing the

Table 7 Entropy estimates for time-varying data

Data class Correct90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNNgradual(02) 06345 05290 07808 07240 07790 06288 06289gradual(03) 07437 07378 09221 08416 09243 07430 07460gradual(04) 08645 08631 09786 09518 09739 08648 08637period(02) 06345 05537 07428 05537 07669 06205 06209period(03) 07437 07393 09218 08476 09233 07377 07375period(04) 08645 08639 09767 09632 09796 08653 08632sudden(02) 03219 03203 04663 03386 04484 03217 03229sudden(03) 05146 05110 05857 09984 07663 05110 05119sudden(04) 07370 07338 08699 09984 09389 07339 07345

Table 8 Entropy estimates for periodic sequences

Bit width90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNN2 16458 00000 00000 11817 00079 000003 23318 00000 00000 15957 01315 000004 29147 00000 00000 18016 04748 000005 33269 00000 00000 14586 08898 000006 39092 00000 00000 08322 34944 000007 44908 00000 00000 03973 34960 000008 44919 00000 00000 02027 35408 00000lowaste result with italic font is used to analyze the applicability for the LZ78Ysources

Table 6 Entropy estimates for real-world sources

Real-world sources90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNNRANDOMORG 09951 09963 09966 09976 09802 09954Ubldit TrueRNGpro 09979 09955 09973 09966 09934 09728Linux kernel entropy source 06173 01232 01269 06164 01230 03068Linuxdevurandom 09952 09935 09990 09964 09983 09911Windows RNG 09953 09986 09975 09984 09833 09853

Security and Communication Networks 15

datasets with long-range correlation as presented in Section411

43 Comparison on Execution Efficiency We implement ourpredictors and the final 90Brsquos predictors using Python 36and the version of TensorFlow is 1311 All the followingtests are conducted on a computer with Intel Core i7 CPUand 32GB RAM

Table 10 shows the mean execution time of our pre-dictors in comparison with that of the final 90Brsquos predictorsand the second draft of 90Brsquos predictors Each experimentalresult in Table 10 is the average value obtained from 50repeated experiments Note that the definitions of parametern s and k are the same as in Section 332

From the listedmean execution time with different scales( n s ) in Table 10 it can be seen that when n 106 themean execution time of our predictors is much lower andincreasing slower with any s than that of the final 90Brsquospredictors In other words the average execution efficiencyof our predictors is about 7 to 10 times higher than that ofthe final 90Brsquos predictors for different sample space s whenthe sample size n is 106 In particular when n 108 themean execution time given by final 90Brsquos predictors is farmore than our predictors regardless of the size of samplespace and is too long (over three days) to calculate theestimated results on the case sge 22

In terms of execution efficiency of 90Brsquos predictors wealso find that the mean execution time of the final 90Brsquospredictors is much higher than that of the second draft of90Brsquos predictors Actually the final 90Brsquos mean executiontime is about twice as much as the second draft of 90Brsquosiscould be caused by the characteristics of some estimatorswhich are limited to only for binary inputs Because thecollision estimator Markov estimator and compressionestimator are only suitable for binary input (0 or 1) as statedin [23] So for nonbinary inputs the 90Brsquos estimators will notonly calculate the original symbol entropy but also convert itinto binary input to calculate the bit entropy and finally getthe min-entropy is will greatly increase the mean exe-cution time

44 GeneralDiscussion For the most entropy sources whichhave been tested the RNN gives more accurate estimationsthan the FNN Better accuracy of the RNN predictive model

may be due to the following reasons On the one hand RNNadds the feedback connections to the network ie it con-siders not only the relationship between the current outputand the previous observations but also the relationshipamong the previous observations On the other hand RNNone-hot-encodes the training dataset for better forecastingcategorical data On the contrary for Markov sourcesM-sequence and nonuniform distribution by postprocessingusing LFSR the current output is only related to the previousobservations which fits the FNN predictive model well andthus the FNN provides more accurate estimated results

5 Conclusions and Future Work

Entropy estimation provides a crucial evaluation for thesecurity of RNGs e predictor serves as a universal sanitycheck for entropy estimation In this work we provideseveral new approaches to estimate the min-entropy forentropy sources using predictors based on neural networks(ie FNN and RNN) for the first time In particular wedesign a novel scheme for the proposed entropy estimationbased on neural network models including executionstrategy and parameter settings In order to evaluate thequality of the proposed predictors we collect various typesof simulated sources that belong to the stationary or non-stationary whose correct entropy of the source can be de-rived from the known probability distribution and thetheoretical result is further verified by the experiments of thereal-world sources We also compare our method with thepredictors defined in the NIST 800-90B (published in 2018)which is a commonly used standard for evaluating thevalidation of entropy sources Our assessment experimentsare carried out in three aspects namely accuracy scope ofapplicability and computational complexity e experi-mental results demonstrate that the entropy estimationobtained from our proposed predictors are more accuratethan that of the 90Brsquos predictors and our predictors have aremarkably wide scope of applicability In addition the

Table 9 Entropy estimates for multivariate M-sequences

Bit width90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNN2 19010 20000 00000 20000 00005 000003 29906 30000 00000 24940 00000 000004 34037 40000 00000 40000 00021 000005 49753 50000 00000 12269 00041 000006 53916 60000 00000 12905 00394 000007 53916 60000 00000 19881 00280 000008 70000 70000 00000 06611 08635 00000lowaste result with italic font is used to analyze the applicability for the LZ78Ysources

Table 10 Comparison on execution efficiency of min-entropyestimation of our study and 90Brsquos predictors

n s Final 90B (s) Old 90B (s) Our predictors (s)106 211113864 1113865 921 561 136106 221113864 1113865 1 058 525 138106 231113864 1113865 1 109 574 149106 241113864 1113865 1 235 598 174106 251113864 1113865 1 394 630 190106 261113864 1113865 1 683 785 186106 271113864 1113865 2 077 938 264106 281113864 1113865 2 618 1 298 272108 211113864 1113865 52 274 47 936 9 184108 221113864 1113865 mdash mdash 9 309108 231113864 1113865 mdash mdash 9 385108 241113864 1113865 mdash mdash 9 836108 251113864 1113865 mdash mdash 10 986108 261113864 1113865 mdash mdash 13 303108 271113864 1113865 mdash mdash 17 649108 281113864 1113865 mdash mdash 20 759

16 Security and Communication Networks

computational complexity of ours is obviously lower thanthat of the 90Brsquos with the growing sample space and samplesize in theoretical analysis e average execution efficiencyof our predictors is about 7 to 10 times higher than that ofthe 90Brsquos predictors for different sample spaces when thesample size is 106 Specially the 90Brsquos predictors cannotcalculate out a result due to the huge time complexity whenthe sample space s is over 22 with the parameter of maximumstep k 16 and sample size n 108 relatively our method isable to provide a satisfied result towards the entropy sourceswith large sample space and long dependence

Future work is aiming at designing some specific neuralnetwork predictive models for min-entropy estimation forsome specific entropy sources Our future work will alsofocus on applying this new method to estimate entropy formore application areas like the randomness sources (sensorsand other sources) in mobile terminals

Data Availability

RANDOMORG data used to support the findings of this studycan be accessed from httpswwwrandomorg UblditTrueRNGpro Linux kernel entropy source and Linuxdevurandom andWindowsRNGdata used to support the findingsof this study can be obtained from the relevant listed references

Disclosure

A preliminary version of this paper appeared under the titleldquoNeural Network Based Min-entropy Estimation for Ran-dom Number Generatorsrdquo in Proc Security and Privacy inCommunication Networks-14th EAI International Con-ference SecureComm 2018 Singapore August 8ndash10 2018[36] Dr Jing Yang participated in this work when shestudied in Chinese Academy of Sciences and now she worksin China Information Technology Security EvaluationCenter Beijing China

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is work was supported by the National Key RampD Programof China (No 2018YFB0804300) National Natural ScienceFoundation of China (Nos 61872357 and 61802396) andNational Cryptography Development Fund of China (NoMMJJ20180113)

References

[1] I Kanter Y Aviad I Reidler E Cohen and M RosenbluhldquoAn optical ultrafast random bit generatorrdquoNature Photonicsvol 4 no 1 pp 58ndash61 2010

[2] P Li A Wang Y Guo et al ldquoUltrafast fully photonic randombit generatorrdquo Journal of Lightwave Technology vol 36 no 12pp 2531ndash2540 2018

[3] P Li K Li X Guo et al ldquoParallel optical random bit gen-eratorrdquo Optics Letters vol 44 no 10 pp 2446ndash2449 2019

[4] A Uchida K Amano M Inoue et al ldquoFast physical randombit generation with chaotic semiconductor lasersrdquo NaturePhotonics vol 2 no 12 pp 728ndash732 2008

[5] L Dorrendorf Z Gutterman and B Pinkas ldquoCryptanalysis ofthe random number generator of the windows operatingsystemrdquo ACM Transactions on Information and System Se-curity vol 13 no 1 pp 1ndash32 2009

[6] Z Gutterman B Pinkas and T Reinman ldquoAnalysis of thelinux random number generatorrdquo in Proceedings of the 2006IEEE Symposium on Security and Privacy (SampP 2006)pp 371ndash385 Berkeley CA USA May 2006

[7] M Vanhoef and F Piessens ldquoPredicting decrypting andabusing WPA280211 group keysrdquo in Proceedings of the 25thUSENIX Security Symposium pp 673ndash688 Austin TX USAAugust 2016

[8] A L Rukhin J Soto J R Nechvatal et al Sp 800-22 Rev 1a AStatistical Test Suite for Random and Pseudorandom NumberGenerators for Cryptographic Applications NIST SpecialPublication Gaithersburg MD USA 2010

[9] W Killmann and W Schindler AIS 31 Functionality Classesand Evaluation Methodology for True (Physical) RandomNumber Generators Version 31 T-Systems GEI GmbH andBundesamt fur Sicherheit in der Informationstechnik (BSI)Bonn Germany 2001

[10] G Marsaglia ldquoe marsaglia random number CDROM in-cluding the diehard battery of tests of randomnessrdquo 1996httpwwwstatfsuedupubdiehard

[11] P LrsquoEcuyer and R J Simard ldquoTestU01 a C library for em-pirical testing of random number generatorsrdquo ACM Trans-action on Mathematical Software vol 33 no 4 2007

[12] ISOIEC JTC 1SC 27 Berlin Germany ISOIEC 18031Information TechnologyndashSecurity TechniquesndashRandom bitGeneration 2011

[13] E Barker and J Kelsey ldquoNist draft special publication 800-90B recommendation for the entropy sources used forrandom bit generationrdquo 2012 httpcsrcnistgovpublicationsdrafts800-90draft-sp800-90bpdf

[14] M Baudet D Lubicz J Micolod and A Tassiaux ldquoOn thesecurity of oscillator-based random number generatorsrdquoJournal of Cryptology vol 24 no 2 pp 398ndash425 2011

[15] W Killmann and W Schindler ldquoA design for a physical RNGwith robust entropy estimatorsrdquo in Proceedings of the 10thInternational Workshop Cryptographic Hardware and Em-bedded SystemsmdashCHES 2008 pp 146ndash163 Washington DCUSA August 2008

[16] Y Ma J Lin T Chen C Xu Z Liu and J Jing ldquoEntropyevaluation for oscillator-based true random number gener-atorsrdquo in Proceedings of the 16th International WorkshopCryptographic Hardware and Embedded SystemsmdashCHES2014 pp 544ndash561 Busan South Korea September 2014

[17] Y Ma J Lin and J Jing ldquoOn the entropy of oscillator-basedtrue random number generatorsrdquo in Proceedings of theCryptographersrsquo Track at the RSA Conference pp 165ndash180Springer San Francisco CA USA February 2017

[18] P Li J Zhang L Sang et al ldquoReal-time online photonicrandom number generationrdquo Optics Letters vol 42 no 14pp 2699ndash2702 2017

[19] X Ma F Xu H Xu X Tan B Qi and H K Lo ldquoPost-processing for quantum random-number generators entropyevaluation and randomness extractionrdquo Physical Review Avol 87 no 6 pp 0623271ndash06232710 2013

[20] K Ugajin Y Terashima K Iwakawa et al ldquoReal-time fastphysical random number generator with a photonic

Security and Communication Networks 17

integrated circuitrdquo Optics Express vol 25 no 6pp 6511ndash6523 2017

[21] F Xu B Qi X Ma H Xu H Zheng and H-K Lo ldquoUltrafastquantum random number generation based on quantumphase fluctuationsrdquo Optics Express vol 20 no 11pp 12366ndash12377 2012

[22] M S Turan E Barker J Kelsey K McKay M Baish andM Boyle ldquo(Second draft) NIST special publication 800-90brecommendation for the entropy sources used for random bitgenerationrdquo 2016 httpscsrcnistgovCSRCmediaPublicationssp800-90bdraftdocumentssp800-90b_second_draftpdf

[23] M S Turan E Barker J Kelsey K McKay M Baish andM Boyle ldquoNIST special publication 800-90B recommenda-tion for the entropy sources used for random bit generationrdquo2018 httpnvlpubsnistgovnistpubsSpecialPublicationsNISTSP800-90Bpdf

[24] J Kelsey K A McKay and M S Turan ldquoPredictive modelsfor min-entropy estimationrdquo in Proceedings of the 17th In-ternational WorkshopCryptographic Hardware and EmbeddedSystemsmdashCHES 2015 pp 373ndash392 Saint-Malo FranceSeptember 2015

[25] S Aras and I D Kocakoccedil ldquoA new model selection strategy intime series forecasting with artificial neural networks IHTSrdquoNeurocomputing vol 174 pp 974ndash987 2016

[26] J P Donate X Li G G Sanchez and A S de Miguel ldquoTimeseries forecasting by evolving artificial neural networks withgenetic algorithms differential evolution and estimation ofdistribution algorithmrdquo Neural Computing and Applicationsvol 22 no 1 pp 11ndash20 2013

[27] J C Luna-Sanchez E Gomez-Ramırez K Najim andE Ikonen ldquoForecasting time series with a logarithmic modelfor the polynomial artificial neural networksrdquo in Proceedingsof the 2011 International Joint Conference on Neural NetworksIJCNN 2011 pp 2725ndash2732 San Jose CA USA 2011

[28] C de Groot and D Wurtz ldquoAnalysis of univariate time serieswith connectionist nets a case study of two classical exam-plesrdquo Neurocomputing vol 3 no 4 pp 177ndash192 1991

[29] I Goodfellow Y Bengio and A Courville Deep LearningMIT Press Cambridge MA USA 2016

[30] X Cai N Zhang G K Venayagamoorthy and D C WunschII ldquoTime series prediction with recurrent neural networkstrained by a hybrid PSO-EA algorithmrdquo Neurocomputingvol 70 no 13ndash15 pp 2342ndash2353 2007

[31] A Jain and A M Kumar ldquoHybrid neural network models forhydrologic time series forecastingrdquo Applied Soft Computingvol 7 no 2 pp 585ndash592 2007

[32] J M P Menezes Jr and G A Barreto ldquoLong-term time seriesprediction with the NARX network an empirical evaluationrdquoNeurocomputing vol 71 no 16ndash18 pp 3335ndash3343 2008

[33] P Hagerty and T Draper ldquoEntropy bounds and statisticaltestsrdquo 2012 httpscsrcnistgovcsrcmediaeventsrandom-bit-generation-workshop-2012documentshagerty_entropy_paperpdf

[34] S Zhu Y Ma T Chen J Lin and J Jing ldquoAnalysis andimprovement of entropy estimators in NIST SP 800-90b fornon-IID entropy sourcesrdquo IACR Transactions on SymmetricCryptology no 3 pp 151ndash168 2017

[35] A Menezes P C van Oorschot and S A VanstoneHandbook of Applied Cryptography CRC Press Boca RatonFL USA 1996

[36] J Yang S Zhu T Chen Y Ma N Lv and J Lin ldquoNeuralnetwork based min-entropy estimation for random numbergeneratorsrdquo in Proceedings of the 14th International

Conference Security and Privacy in Communication Net-worksmdashSecureComm pp 231ndash250 Singapore August 2018

18 Security and Communication Networks

Page 16: High-Efficiency Min-Entropy Estimation Based on Neural Network … · 2020. 2. 17. · the outputs of RNGs, such as the ISO/IEC 18031 [12] and AIS 31 [9]. ere are many types of methods

datasets with long-range correlation as presented in Section411

43 Comparison on Execution Efficiency We implement ourpredictors and the final 90Brsquos predictors using Python 36and the version of TensorFlow is 1311 All the followingtests are conducted on a computer with Intel Core i7 CPUand 32GB RAM

Table 10 shows the mean execution time of our pre-dictors in comparison with that of the final 90Brsquos predictorsand the second draft of 90Brsquos predictors Each experimentalresult in Table 10 is the average value obtained from 50repeated experiments Note that the definitions of parametern s and k are the same as in Section 332

From the listedmean execution time with different scales( n s ) in Table 10 it can be seen that when n 106 themean execution time of our predictors is much lower andincreasing slower with any s than that of the final 90Brsquospredictors In other words the average execution efficiencyof our predictors is about 7 to 10 times higher than that ofthe final 90Brsquos predictors for different sample space s whenthe sample size n is 106 In particular when n 108 themean execution time given by final 90Brsquos predictors is farmore than our predictors regardless of the size of samplespace and is too long (over three days) to calculate theestimated results on the case sge 22

In terms of execution efficiency of 90Brsquos predictors wealso find that the mean execution time of the final 90Brsquospredictors is much higher than that of the second draft of90Brsquos predictors Actually the final 90Brsquos mean executiontime is about twice as much as the second draft of 90Brsquosiscould be caused by the characteristics of some estimatorswhich are limited to only for binary inputs Because thecollision estimator Markov estimator and compressionestimator are only suitable for binary input (0 or 1) as statedin [23] So for nonbinary inputs the 90Brsquos estimators will notonly calculate the original symbol entropy but also convert itinto binary input to calculate the bit entropy and finally getthe min-entropy is will greatly increase the mean exe-cution time

44 GeneralDiscussion For the most entropy sources whichhave been tested the RNN gives more accurate estimationsthan the FNN Better accuracy of the RNN predictive model

may be due to the following reasons On the one hand RNNadds the feedback connections to the network ie it con-siders not only the relationship between the current outputand the previous observations but also the relationshipamong the previous observations On the other hand RNNone-hot-encodes the training dataset for better forecastingcategorical data On the contrary for Markov sourcesM-sequence and nonuniform distribution by postprocessingusing LFSR the current output is only related to the previousobservations which fits the FNN predictive model well andthus the FNN provides more accurate estimated results

5 Conclusions and Future Work

Entropy estimation provides a crucial evaluation for thesecurity of RNGs e predictor serves as a universal sanitycheck for entropy estimation In this work we provideseveral new approaches to estimate the min-entropy forentropy sources using predictors based on neural networks(ie FNN and RNN) for the first time In particular wedesign a novel scheme for the proposed entropy estimationbased on neural network models including executionstrategy and parameter settings In order to evaluate thequality of the proposed predictors we collect various typesof simulated sources that belong to the stationary or non-stationary whose correct entropy of the source can be de-rived from the known probability distribution and thetheoretical result is further verified by the experiments of thereal-world sources We also compare our method with thepredictors defined in the NIST 800-90B (published in 2018)which is a commonly used standard for evaluating thevalidation of entropy sources Our assessment experimentsare carried out in three aspects namely accuracy scope ofapplicability and computational complexity e experi-mental results demonstrate that the entropy estimationobtained from our proposed predictors are more accuratethan that of the 90Brsquos predictors and our predictors have aremarkably wide scope of applicability In addition the

Table 9 Entropy estimates for multivariate M-sequences

Bit width90Brsquos predictors Our work

MCW Lag MultiMMC LZ78Y FNN RNN2 19010 20000 00000 20000 00005 000003 29906 30000 00000 24940 00000 000004 34037 40000 00000 40000 00021 000005 49753 50000 00000 12269 00041 000006 53916 60000 00000 12905 00394 000007 53916 60000 00000 19881 00280 000008 70000 70000 00000 06611 08635 00000lowaste result with italic font is used to analyze the applicability for the LZ78Ysources

Table 10 Comparison on execution efficiency of min-entropyestimation of our study and 90Brsquos predictors

n s Final 90B (s) Old 90B (s) Our predictors (s)106 211113864 1113865 921 561 136106 221113864 1113865 1 058 525 138106 231113864 1113865 1 109 574 149106 241113864 1113865 1 235 598 174106 251113864 1113865 1 394 630 190106 261113864 1113865 1 683 785 186106 271113864 1113865 2 077 938 264106 281113864 1113865 2 618 1 298 272108 211113864 1113865 52 274 47 936 9 184108 221113864 1113865 mdash mdash 9 309108 231113864 1113865 mdash mdash 9 385108 241113864 1113865 mdash mdash 9 836108 251113864 1113865 mdash mdash 10 986108 261113864 1113865 mdash mdash 13 303108 271113864 1113865 mdash mdash 17 649108 281113864 1113865 mdash mdash 20 759

16 Security and Communication Networks

computational complexity of ours is obviously lower thanthat of the 90Brsquos with the growing sample space and samplesize in theoretical analysis e average execution efficiencyof our predictors is about 7 to 10 times higher than that ofthe 90Brsquos predictors for different sample spaces when thesample size is 106 Specially the 90Brsquos predictors cannotcalculate out a result due to the huge time complexity whenthe sample space s is over 22 with the parameter of maximumstep k 16 and sample size n 108 relatively our method isable to provide a satisfied result towards the entropy sourceswith large sample space and long dependence

Future work is aiming at designing some specific neuralnetwork predictive models for min-entropy estimation forsome specific entropy sources Our future work will alsofocus on applying this new method to estimate entropy formore application areas like the randomness sources (sensorsand other sources) in mobile terminals

Data Availability

RANDOMORG data used to support the findings of this studycan be accessed from httpswwwrandomorg UblditTrueRNGpro Linux kernel entropy source and Linuxdevurandom andWindowsRNGdata used to support the findingsof this study can be obtained from the relevant listed references

Disclosure

A preliminary version of this paper appeared under the titleldquoNeural Network Based Min-entropy Estimation for Ran-dom Number Generatorsrdquo in Proc Security and Privacy inCommunication Networks-14th EAI International Con-ference SecureComm 2018 Singapore August 8ndash10 2018[36] Dr Jing Yang participated in this work when shestudied in Chinese Academy of Sciences and now she worksin China Information Technology Security EvaluationCenter Beijing China

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is work was supported by the National Key RampD Programof China (No 2018YFB0804300) National Natural ScienceFoundation of China (Nos 61872357 and 61802396) andNational Cryptography Development Fund of China (NoMMJJ20180113)

References

[1] I Kanter Y Aviad I Reidler E Cohen and M RosenbluhldquoAn optical ultrafast random bit generatorrdquoNature Photonicsvol 4 no 1 pp 58ndash61 2010

[2] P Li A Wang Y Guo et al ldquoUltrafast fully photonic randombit generatorrdquo Journal of Lightwave Technology vol 36 no 12pp 2531ndash2540 2018

[3] P Li K Li X Guo et al ldquoParallel optical random bit gen-eratorrdquo Optics Letters vol 44 no 10 pp 2446ndash2449 2019

[4] A Uchida K Amano M Inoue et al ldquoFast physical randombit generation with chaotic semiconductor lasersrdquo NaturePhotonics vol 2 no 12 pp 728ndash732 2008

[5] L Dorrendorf Z Gutterman and B Pinkas ldquoCryptanalysis ofthe random number generator of the windows operatingsystemrdquo ACM Transactions on Information and System Se-curity vol 13 no 1 pp 1ndash32 2009

[6] Z Gutterman B Pinkas and T Reinman ldquoAnalysis of thelinux random number generatorrdquo in Proceedings of the 2006IEEE Symposium on Security and Privacy (SampP 2006)pp 371ndash385 Berkeley CA USA May 2006

[7] M Vanhoef and F Piessens ldquoPredicting decrypting andabusing WPA280211 group keysrdquo in Proceedings of the 25thUSENIX Security Symposium pp 673ndash688 Austin TX USAAugust 2016

[8] A L Rukhin J Soto J R Nechvatal et al Sp 800-22 Rev 1a AStatistical Test Suite for Random and Pseudorandom NumberGenerators for Cryptographic Applications NIST SpecialPublication Gaithersburg MD USA 2010

[9] W Killmann and W Schindler AIS 31 Functionality Classesand Evaluation Methodology for True (Physical) RandomNumber Generators Version 31 T-Systems GEI GmbH andBundesamt fur Sicherheit in der Informationstechnik (BSI)Bonn Germany 2001

[10] G Marsaglia ldquoe marsaglia random number CDROM in-cluding the diehard battery of tests of randomnessrdquo 1996httpwwwstatfsuedupubdiehard

[11] P LrsquoEcuyer and R J Simard ldquoTestU01 a C library for em-pirical testing of random number generatorsrdquo ACM Trans-action on Mathematical Software vol 33 no 4 2007

[12] ISOIEC JTC 1SC 27 Berlin Germany ISOIEC 18031Information TechnologyndashSecurity TechniquesndashRandom bitGeneration 2011

[13] E Barker and J Kelsey ldquoNist draft special publication 800-90B recommendation for the entropy sources used forrandom bit generationrdquo 2012 httpcsrcnistgovpublicationsdrafts800-90draft-sp800-90bpdf

[14] M Baudet D Lubicz J Micolod and A Tassiaux ldquoOn thesecurity of oscillator-based random number generatorsrdquoJournal of Cryptology vol 24 no 2 pp 398ndash425 2011

[15] W Killmann and W Schindler ldquoA design for a physical RNGwith robust entropy estimatorsrdquo in Proceedings of the 10thInternational Workshop Cryptographic Hardware and Em-bedded SystemsmdashCHES 2008 pp 146ndash163 Washington DCUSA August 2008

[16] Y Ma J Lin T Chen C Xu Z Liu and J Jing ldquoEntropyevaluation for oscillator-based true random number gener-atorsrdquo in Proceedings of the 16th International WorkshopCryptographic Hardware and Embedded SystemsmdashCHES2014 pp 544ndash561 Busan South Korea September 2014

[17] Y Ma J Lin and J Jing ldquoOn the entropy of oscillator-basedtrue random number generatorsrdquo in Proceedings of theCryptographersrsquo Track at the RSA Conference pp 165ndash180Springer San Francisco CA USA February 2017

[18] P Li J Zhang L Sang et al ldquoReal-time online photonicrandom number generationrdquo Optics Letters vol 42 no 14pp 2699ndash2702 2017

[19] X Ma F Xu H Xu X Tan B Qi and H K Lo ldquoPost-processing for quantum random-number generators entropyevaluation and randomness extractionrdquo Physical Review Avol 87 no 6 pp 0623271ndash06232710 2013

[20] K Ugajin Y Terashima K Iwakawa et al ldquoReal-time fastphysical random number generator with a photonic

Security and Communication Networks 17

integrated circuitrdquo Optics Express vol 25 no 6pp 6511ndash6523 2017

[21] F Xu B Qi X Ma H Xu H Zheng and H-K Lo ldquoUltrafastquantum random number generation based on quantumphase fluctuationsrdquo Optics Express vol 20 no 11pp 12366ndash12377 2012

[22] M S Turan E Barker J Kelsey K McKay M Baish andM Boyle ldquo(Second draft) NIST special publication 800-90brecommendation for the entropy sources used for random bitgenerationrdquo 2016 httpscsrcnistgovCSRCmediaPublicationssp800-90bdraftdocumentssp800-90b_second_draftpdf

[23] M S Turan E Barker J Kelsey K McKay M Baish andM Boyle ldquoNIST special publication 800-90B recommenda-tion for the entropy sources used for random bit generationrdquo2018 httpnvlpubsnistgovnistpubsSpecialPublicationsNISTSP800-90Bpdf

[24] J Kelsey K A McKay and M S Turan ldquoPredictive modelsfor min-entropy estimationrdquo in Proceedings of the 17th In-ternational WorkshopCryptographic Hardware and EmbeddedSystemsmdashCHES 2015 pp 373ndash392 Saint-Malo FranceSeptember 2015

[25] S Aras and I D Kocakoccedil ldquoA new model selection strategy intime series forecasting with artificial neural networks IHTSrdquoNeurocomputing vol 174 pp 974ndash987 2016

[26] J P Donate X Li G G Sanchez and A S de Miguel ldquoTimeseries forecasting by evolving artificial neural networks withgenetic algorithms differential evolution and estimation ofdistribution algorithmrdquo Neural Computing and Applicationsvol 22 no 1 pp 11ndash20 2013

[27] J C Luna-Sanchez E Gomez-Ramırez K Najim andE Ikonen ldquoForecasting time series with a logarithmic modelfor the polynomial artificial neural networksrdquo in Proceedingsof the 2011 International Joint Conference on Neural NetworksIJCNN 2011 pp 2725ndash2732 San Jose CA USA 2011

[28] C de Groot and D Wurtz ldquoAnalysis of univariate time serieswith connectionist nets a case study of two classical exam-plesrdquo Neurocomputing vol 3 no 4 pp 177ndash192 1991

[29] I Goodfellow Y Bengio and A Courville Deep LearningMIT Press Cambridge MA USA 2016

[30] X Cai N Zhang G K Venayagamoorthy and D C WunschII ldquoTime series prediction with recurrent neural networkstrained by a hybrid PSO-EA algorithmrdquo Neurocomputingvol 70 no 13ndash15 pp 2342ndash2353 2007

[31] A Jain and A M Kumar ldquoHybrid neural network models forhydrologic time series forecastingrdquo Applied Soft Computingvol 7 no 2 pp 585ndash592 2007

[32] J M P Menezes Jr and G A Barreto ldquoLong-term time seriesprediction with the NARX network an empirical evaluationrdquoNeurocomputing vol 71 no 16ndash18 pp 3335ndash3343 2008

[33] P Hagerty and T Draper ldquoEntropy bounds and statisticaltestsrdquo 2012 httpscsrcnistgovcsrcmediaeventsrandom-bit-generation-workshop-2012documentshagerty_entropy_paperpdf

[34] S Zhu Y Ma T Chen J Lin and J Jing ldquoAnalysis andimprovement of entropy estimators in NIST SP 800-90b fornon-IID entropy sourcesrdquo IACR Transactions on SymmetricCryptology no 3 pp 151ndash168 2017

[35] A Menezes P C van Oorschot and S A VanstoneHandbook of Applied Cryptography CRC Press Boca RatonFL USA 1996

[36] J Yang S Zhu T Chen Y Ma N Lv and J Lin ldquoNeuralnetwork based min-entropy estimation for random numbergeneratorsrdquo in Proceedings of the 14th International

Conference Security and Privacy in Communication Net-worksmdashSecureComm pp 231ndash250 Singapore August 2018

18 Security and Communication Networks

Page 17: High-Efficiency Min-Entropy Estimation Based on Neural Network … · 2020. 2. 17. · the outputs of RNGs, such as the ISO/IEC 18031 [12] and AIS 31 [9]. ere are many types of methods

computational complexity of ours is obviously lower thanthat of the 90Brsquos with the growing sample space and samplesize in theoretical analysis e average execution efficiencyof our predictors is about 7 to 10 times higher than that ofthe 90Brsquos predictors for different sample spaces when thesample size is 106 Specially the 90Brsquos predictors cannotcalculate out a result due to the huge time complexity whenthe sample space s is over 22 with the parameter of maximumstep k 16 and sample size n 108 relatively our method isable to provide a satisfied result towards the entropy sourceswith large sample space and long dependence

Future work is aiming at designing some specific neuralnetwork predictive models for min-entropy estimation forsome specific entropy sources Our future work will alsofocus on applying this new method to estimate entropy formore application areas like the randomness sources (sensorsand other sources) in mobile terminals

Data Availability

RANDOMORG data used to support the findings of this studycan be accessed from httpswwwrandomorg UblditTrueRNGpro Linux kernel entropy source and Linuxdevurandom andWindowsRNGdata used to support the findingsof this study can be obtained from the relevant listed references

Disclosure

A preliminary version of this paper appeared under the titleldquoNeural Network Based Min-entropy Estimation for Ran-dom Number Generatorsrdquo in Proc Security and Privacy inCommunication Networks-14th EAI International Con-ference SecureComm 2018 Singapore August 8ndash10 2018[36] Dr Jing Yang participated in this work when shestudied in Chinese Academy of Sciences and now she worksin China Information Technology Security EvaluationCenter Beijing China

Conflicts of Interest

e authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

is work was supported by the National Key RampD Programof China (No 2018YFB0804300) National Natural ScienceFoundation of China (Nos 61872357 and 61802396) andNational Cryptography Development Fund of China (NoMMJJ20180113)

References

[1] I Kanter Y Aviad I Reidler E Cohen and M RosenbluhldquoAn optical ultrafast random bit generatorrdquoNature Photonicsvol 4 no 1 pp 58ndash61 2010

[2] P Li A Wang Y Guo et al ldquoUltrafast fully photonic randombit generatorrdquo Journal of Lightwave Technology vol 36 no 12pp 2531ndash2540 2018

[3] P Li K Li X Guo et al ldquoParallel optical random bit gen-eratorrdquo Optics Letters vol 44 no 10 pp 2446ndash2449 2019

[4] A Uchida K Amano M Inoue et al ldquoFast physical randombit generation with chaotic semiconductor lasersrdquo NaturePhotonics vol 2 no 12 pp 728ndash732 2008

[5] L Dorrendorf Z Gutterman and B Pinkas ldquoCryptanalysis ofthe random number generator of the windows operatingsystemrdquo ACM Transactions on Information and System Se-curity vol 13 no 1 pp 1ndash32 2009

[6] Z Gutterman B Pinkas and T Reinman ldquoAnalysis of thelinux random number generatorrdquo in Proceedings of the 2006IEEE Symposium on Security and Privacy (SampP 2006)pp 371ndash385 Berkeley CA USA May 2006

[7] M Vanhoef and F Piessens ldquoPredicting decrypting andabusing WPA280211 group keysrdquo in Proceedings of the 25thUSENIX Security Symposium pp 673ndash688 Austin TX USAAugust 2016

[8] A L Rukhin J Soto J R Nechvatal et al Sp 800-22 Rev 1a AStatistical Test Suite for Random and Pseudorandom NumberGenerators for Cryptographic Applications NIST SpecialPublication Gaithersburg MD USA 2010

[9] W Killmann and W Schindler AIS 31 Functionality Classesand Evaluation Methodology for True (Physical) RandomNumber Generators Version 31 T-Systems GEI GmbH andBundesamt fur Sicherheit in der Informationstechnik (BSI)Bonn Germany 2001

[10] G Marsaglia ldquoe marsaglia random number CDROM in-cluding the diehard battery of tests of randomnessrdquo 1996httpwwwstatfsuedupubdiehard

[11] P LrsquoEcuyer and R J Simard ldquoTestU01 a C library for em-pirical testing of random number generatorsrdquo ACM Trans-action on Mathematical Software vol 33 no 4 2007

[12] ISOIEC JTC 1SC 27 Berlin Germany ISOIEC 18031Information TechnologyndashSecurity TechniquesndashRandom bitGeneration 2011

[13] E Barker and J Kelsey ldquoNist draft special publication 800-90B recommendation for the entropy sources used forrandom bit generationrdquo 2012 httpcsrcnistgovpublicationsdrafts800-90draft-sp800-90bpdf

[14] M. Baudet, D. Lubicz, J. Micolod, and A. Tassiaux, "On the security of oscillator-based random number generators," Journal of Cryptology, vol. 24, no. 2, pp. 398–425, 2011.

[15] W. Killmann and W. Schindler, "A design for a physical RNG with robust entropy estimators," in Proceedings of the 10th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2008, pp. 146–163, Washington, DC, USA, August 2008.

[16] Y. Ma, J. Lin, T. Chen, C. Xu, Z. Liu, and J. Jing, "Entropy evaluation for oscillator-based true random number generators," in Proceedings of the 16th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2014, pp. 544–561, Busan, South Korea, September 2014.

[17] Y. Ma, J. Lin, and J. Jing, "On the entropy of oscillator-based true random number generators," in Proceedings of the Cryptographers' Track at the RSA Conference, pp. 165–180, Springer, San Francisco, CA, USA, February 2017.

[18] P. Li, J. Zhang, L. Sang et al., "Real-time online photonic random number generation," Optics Letters, vol. 42, no. 14, pp. 2699–2702, 2017.

[19] X. Ma, F. Xu, H. Xu, X. Tan, B. Qi, and H.-K. Lo, "Postprocessing for quantum random-number generators: entropy evaluation and randomness extraction," Physical Review A, vol. 87, no. 6, pp. 0623271–06232710, 2013.

[20] K. Ugajin, Y. Terashima, K. Iwakawa et al., "Real-time fast physical random number generator with a photonic integrated circuit," Optics Express, vol. 25, no. 6, pp. 6511–6523, 2017.

[21] F. Xu, B. Qi, X. Ma, H. Xu, H. Zheng, and H.-K. Lo, "Ultrafast quantum random number generation based on quantum phase fluctuations," Optics Express, vol. 20, no. 11, pp. 12366–12377, 2012.

[22] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "(Second draft) NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2016, https://csrc.nist.gov/CSRC/media/Publications/sp/800-90b/draft/documents/sp800-90b_second_draft.pdf.

[23] M. S. Turan, E. Barker, J. Kelsey, K. McKay, M. Baish, and M. Boyle, "NIST special publication 800-90B: recommendation for the entropy sources used for random bit generation," 2018, http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-90B.pdf.

[24] J. Kelsey, K. A. McKay, and M. S. Turan, "Predictive models for min-entropy estimation," in Proceedings of the 17th International Workshop on Cryptographic Hardware and Embedded Systems—CHES 2015, pp. 373–392, Saint-Malo, France, September 2015.

[25] S. Aras and I. D. Kocakoç, "A new model selection strategy in time series forecasting with artificial neural networks: IHTS," Neurocomputing, vol. 174, pp. 974–987, 2016.

[26] J. P. Donate, X. Li, G. G. Sanchez, and A. S. de Miguel, "Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm," Neural Computing and Applications, vol. 22, no. 1, pp. 11–20, 2013.

[27] J. C. Luna-Sanchez, E. Gomez-Ramírez, K. Najim, and E. Ikonen, "Forecasting time series with a logarithmic model for the polynomial artificial neural networks," in Proceedings of the 2011 International Joint Conference on Neural Networks, IJCNN 2011, pp. 2725–2732, San Jose, CA, USA, 2011.

[28] C. de Groot and D. Wurtz, "Analysis of univariate time series with connectionist nets: a case study of two classical examples," Neurocomputing, vol. 3, no. 4, pp. 177–192, 1991.

[29] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, Cambridge, MA, USA, 2016.

[30] X. Cai, N. Zhang, G. K. Venayagamoorthy, and D. C. Wunsch II, "Time series prediction with recurrent neural networks trained by a hybrid PSO-EA algorithm," Neurocomputing, vol. 70, no. 13–15, pp. 2342–2353, 2007.

[31] A. Jain and A. M. Kumar, "Hybrid neural network models for hydrologic time series forecasting," Applied Soft Computing, vol. 7, no. 2, pp. 585–592, 2007.

[32] J. M. P. Menezes Jr. and G. A. Barreto, "Long-term time series prediction with the NARX network: an empirical evaluation," Neurocomputing, vol. 71, no. 16–18, pp. 3335–3343, 2008.

[33] P. Hagerty and T. Draper, "Entropy bounds and statistical tests," 2012, https://csrc.nist.gov/csrc/media/events/random-bit-generation-workshop-2012/documents/hagerty_entropy_paper.pdf.

[34] S. Zhu, Y. Ma, T. Chen, J. Lin, and J. Jing, "Analysis and improvement of entropy estimators in NIST SP 800-90B for non-IID entropy sources," IACR Transactions on Symmetric Cryptology, no. 3, pp. 151–168, 2017.

[35] A. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, USA, 1996.

[36] J. Yang, S. Zhu, T. Chen, Y. Ma, N. Lv, and J. Lin, "Neural network based min-entropy estimation for random number generators," in Proceedings of the 14th International Conference on Security and Privacy in Communication Networks—SecureComm, pp. 231–250, Singapore, August 2018.
