
1330 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 7, NO. 4, AUGUST 2012

ENF Extraction From Digital Recordings Using Adaptive Techniques and Frequency Tracking

Ode Ojowu, Jr., Student Member, IEEE, Johan Karlsson, Member, IEEE, Jian Li, Fellow, IEEE, and Yilu Liu, Fellow, IEEE

Abstract: A novel forensic tool used for assessing the authenticity of digital audio recordings is known as the electric network frequency (ENF) criterion. It involves extracting the embedded power line (utility) frequency from said recordings and matching it to a known database to verify the time the recording was made, and its authenticity. In this paper, a nonparametric, adaptive, and high resolution technique, known as the time-recursive iterative adaptive approach, is presented as a tool for the extraction of the ENF from digital audio recordings. A comparison is made between this data dependent (adaptive) filter and the conventional short-time Fourier transform (STFT). Results show that the adaptive algorithm improves the ENF estimation accuracy in the presence of interference from other signals. To further enhance the ENF estimation accuracy, a frequency tracking method based on dynamic programming will be proposed. The algorithm uses the knowledge that the ENF is varying slowly with time to estimate with high accuracy the frequency present in the recording.

Index Terms: Audio forensics, dynamic programming, electric network frequency (ENF) criterion, iterative adaptive approach (IAA).

Manuscript received December 02, 2011; revised February 28, 2012; accepted April 18, 2012. Date of publication May 02, 2012; date of current version July 09, 2012. This work was supported in part by the Swedish Research Council. This work made use of Engineering Research Center Shared Facilities supported by the Engineering Research Center Program of the National Science Foundation and DOE under NSF Award Number EEC-1041877 and the CURENT Industry Partnership Program. The work of Y. Liu was supported in part by Award 2009-DN-BX-K233, awarded by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Dinei A. Florencio.

O. Ojowu, Jr., J. Karlsson, and J. Li are with the Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611-6130 USA (e-mail: [email protected]; [email protected]; [email protected]).

Y. Liu is with the Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIFS.2012.2197391

U.S. Government work not protected by U.S. copyright.

I. INTRODUCTION

THE use of digital recorders has become more prevalent in the world today due to the advancement in digital technology and the significant progress made in the field of digital signal processing (DSP). Prior to the increased use of digital recorders, forensic audio analysis relied on different techniques of audio authentication. For instance, the magnetic signatures that are left by the erase, record or play heads on the magnetic tape of analog recorders can be used to verify the authenticity of such recordings.

When it comes to digital recordings, alterations can be made very easily without leaving behind such imprints, because digital recorders produce a recording by converting sound variations to a series of numbers, making authentication of these recordings a lot more difficult [1]. The importance of being able to verify the authenticity of a recording can be seen in litigation cases [2], where digital recordings are brought forward as evidence in a trial. Therefore, more reliable methods of verifying the authenticity of digital recordings need to be researched.

The electric network frequency (ENF) criterion was proposed by Grigoras [2], [3] to address the issue of digital audio authentication. The ENF criterion is based on extracting the utility frequency or ENF from a digital audio recording and matching the extracted frequency estimate to a reference database in order to determine the authenticity and also time of the digital recording. This process is possible because, in some cases, digital recorders (even some battery powered recorders [4]), can pick up the audible sound that is generated by the oscillation of a power grid's alternating current at this frequency. The frequency of oscillation is approximately 60 Hz in the U.S., whereas in Europe it oscillates at approximately 50 Hz. The corresponding harmonics of this frequency might also be present in the digital recording.

The ENF criterion is based on two assumptions [5]. Firstly, the ENF for interconnected networks is the same at all points within the network. Secondly, the frequency varies randomly within a given interconnection, and hence, is not repeatable over a long period of time.

There are three known methods of extracting the ENF over time from a digital recording [2], [3]. They are as follows.

1) Time/frequency domain analysis: This method is based on computing the spectrogram of the signal and visually comparing it to the database.

2) Frequency domain analysis: This method is based on selecting the frequency location corresponding to the maximum amplitude of the power spectrum of segments (frames) of the data after applying a bandpass filter.

3) Time domain analysis: This method is based on measuring the zero crossings of the signal in the time domain after a bandpass filter has been applied to the recording.

Recently in [6], a quadratic interpolation scheme was applied to the frequency domain analysis method to estimate the spectral peak locations (frequencies) more accurately. This reduces the estimation error resulting from the use of a fixed grid size in the spectral estimation process.

Besides the time-domain analysis, the mentioned methods estimate the ENF based on computing the fast Fourier transform (FFT) of overlapping segments (frames) of the data known as the short-time Fourier transform (STFT), which is limited by the tradeoff between time resolution and frequency resolution [7]. Parametric methods such as the frequency selective ESPRIT, which give superior resolution compared to the FFT, can also be used successfully to extract the ENF from one frame to another. However, in the presence of significant interference within a given frame, the parametric methods yield poor frequency estimates because of their sensitivity to an assumed data model.

This paper focuses on two methods of extraction. The first builds upon the frequency domain analysis with quadratic interpolation. However, in place of the FFT, the spectrum is estimated for each segment of the data using a nonparametric and high resolution adaptive algorithm known as the iterative adaptive approach (IAA) [8]. In the presence of interfering signals with frequencies within the range of values the ENF can take on, IAA yields more accurate estimates of the ENF compared to the FFT as a result of the improved spectral resolution and interference suppression capability. The second method involves applying a frequency tracking algorithm based on discrete dynamic programming [9], which takes into account the slowly varying nature of the ENF over time. This tracking algorithm is necessary because, in some frames of the data, the maximum spectral peak might correspond to an interference signal rather than the network frequency signal even within the acceptable ENF limits. The ENF is then estimated inaccurately, which can result in a false diagnosis that the recording in question has been edited.

It is worthwhile to point out that, in order for the proposed methods to work, the ENF must be embedded in the recording, which is not always the case especially in some battery operated recorders [4]. This is certainly a drawback of using the ENF criterion for digital authentication. However, if the ENF is embedded in a digital recording, more reliable methods of extraction need to be sought.

Extraction can also be carried out using the harmonics of the ENF signal for the frequency estimation process. In some cases, the harmonics may give better estimates because of a higher signal-to-interference-and-noise ratio compared to the fundamental frequency.

The remaining sections of this paper are organized as follows. In Section II, the network characteristics and the network frequency database are described. In Section III, the IAA and TRIAA algorithms are described along with the frequency tracking algorithm for ENF extraction. In Section IV, the experimental results based on a set of digital audio recordings are presented. Finally, Section V contains the conclusions drawn from the results.

Notation: Boldface uppercase and lowercase letters are used to denote matrices and vectors, respectively. See Table I for more details on notation.

TABLE I. NOTATIONS

Abbreviations: The abbreviations are presented for easy reference in Table II.

TABLE II. ABBREVIATIONS

II. NETWORK FREQUENCY CHARACTERISTICS AND DATABASE

The frequency at which alternating current is distributed to various customers from power stations corresponds to the utility frequency or ENF. For European and most Asian countries the value of this frequency is 50 Hz, while the value is 60 Hz in North America and several countries in South America. Japan uses both frequencies (50 and 60 Hz) for electricity distribution. This frequency is determined by the speed of rotation of the turbines used to drive the generators at the various power plants [11]. Naturally, the rotation speed is not constant and varies within a certain limit (approximately 0.05 Hz) depending on the amount of load connected to the network and amount of power generated at a given time. Experiments carried out in some European countries [2], [12] have shown that this frequency variation is random and unique within specific geographic locations. This uniqueness in frequency variation within a region, coupled with the fact that network frequency is not repeatable over a long period of time, is what makes the aforementioned ENF criterion possible.

A database of the network frequency is needed in order to match the extracted ENF from a recording for verification. In [2], such a database is created by connecting the sound card of a computer to a transformer which is then connected directly to an ac power outlet. The database currently being built in North America involves deploying several sensors termed frequency disturbance recorders (FDRs), which perform accurate ENF measurements, up to about 0.0005 Hz. The measured data collected by the FDRs is transmitted over the internet to servers, where it can be analyzed and stored in a system termed the information management system (IMS) [13]. This collection forms the frequency monitoring network (FNET).

There are two major interconnections in North America and three minor interconnections. These regions have unsynchronized networks (frequency and phase) and are therefore connected via high voltage direct current lines (HVDC) [14]. The Eastern and Western interconnections form the major interconnections, while the Quebec, Texas and Alaska interconnections form the minor. The Alaska interconnection is isolated, in the sense that it is not connected to any of the other interconnections. It is therefore generally not considered to be part of the North American grid. Fig. 1 shows the distribution of the FDRs in Western, Eastern, Quebec and Texas Interconnections. Frequency measurements collected by the FDRs in these interconnections show that the frequency pattern is different at a given time from one interconnection to another. However, the frequency pattern is unique at different locations within each interconnection [10]. The FNET system, therefore, provides a viable ENF database.

Fig. 1. FDR distribution in North America [10].

III. EXTRACTION ALGORITHMS

A. Frequency Domain Analysis (STFT) [2]

Due to the fact that the ENF varies with time, the extraction process involves analyzing a nonstationary data sequence. STFT is a common method for time-frequency analysis of signals. This analysis assumes the signal of interest is stationary within short time windows (frames); the FFT of the signal is then computed for each frame. The frequency domain analysis [2] method of extraction is based on this idea.

The process involves resampling the audio signal to a lower sampling rate, to reduce the computational complexity of the analysis. A bandpass filter with a narrow bandwidth is applied to the signal with center frequency 50/60 Hz as a preprocessing step. The rest of the analysis is described as follows. Let

$$\{y(n)\} \tag{1}$$

denote the resampled and filtered discrete-time signal. This signal is then split into overlapping frames as shown in Fig. 2, with each frame having length $N$ and a shift from frame to frame of length $S$. Using the frequency domain analysis method, the ENF of the $l$th frame is estimated by finding the frequency that maximizes the spectrum of each frame which is computed using the FFT-based periodogram.

Fig. 2. Segmentation of data for STFT.

In order to get a more accurate estimate of the frequency, quadratic interpolation is used [6], [15]. This interpolation scheme involves fitting a quadratic model of the form

$$\phi(\omega) = a\omega^2 + b\omega + c \tag{2}$$

around the frequency point that maximizes the power spectrum

$$\omega_{k_{\max}} = \arg\max_{\omega_k} \hat{\phi}_l(\omega_k) \tag{3}$$

where $\omega_k = 2\pi k / K$, $k = 0, \ldots, K-1$, corresponds to the frequency grid point of a frequency grid with size $K$, and $\hat{\phi}_l(\omega_k)$ is power spectrum of the $l$th frame.

The value of $\omega$ that maximizes the model (2) is taken as the estimated peak of the spectrum. This value is determined by fitting the model to the highest sample of the power spectrum and the two adjacent points with corresponding frequencies $\omega_{k_{\max}-1}, \omega_{k_{\max}}, \omega_{k_{\max}+1}$. This value of $\omega$ that maximizes the model is

$$\hat{\omega}_l = -\frac{b}{2a} \tag{4}$$

where

$$a = \frac{\hat{\phi}_l(\omega_{k_{\max}+1}) - 2\hat{\phi}_l(\omega_{k_{\max}}) + \hat{\phi}_l(\omega_{k_{\max}-1})}{2\,(2\pi/K)^2} \tag{5}$$

$$b = \frac{\hat{\phi}_l(\omega_{k_{\max}+1}) - \hat{\phi}_l(\omega_{k_{\max}-1})}{2\,(2\pi/K)} - 2a\,\omega_{k_{\max}} \tag{6}$$

The corresponding frequency estimate of the $l$th frame in Hz is given by

$$\hat{f}_l = \frac{\hat{\omega}_l}{2\pi} f_s \tag{7}$$

where $f_s$ is the sampling frequency (in Hertz) of the signal.

The use of STFT will result in a tradeoff between frequency resolution and time resolution. For a given frame length, this tradeoff can be optimized by applying a rectangular window to each frame, which will provide the best spectral resolution at a cost of higher side lobes compared to other spectral windows.

In order to get improved spectral resolution over FFT, one has to resort to using parametric methods or data-dependent (adaptive) nonparametric methods for spectral estimation. Parametric methods, on the one hand, are not robust against data model errors. On the other hand, nonparametric adaptive methods are more robust, since they do not assume a specific parametric data model. Well-known adaptive methods include the Capon algorithm and the amplitude and phase estimation (APES) algorithm. These algorithms also provide higher resolution and lower side lobes than the periodogram. However, these methods are inadequate because they require multiple realizations (snapshots) of the random signal, which is not the case with the current data, as only one snapshot is available for frequency estimation. Spatial smoothing (segmenting and spectral averaging of the data) can be used to improve the spectral estimates of the Capon and APES algorithms in the one-snapshot case; but the cost of doing this will be a degradation in the spectral resolution, which is not desirable. The wavelet transform is also a common tool for time-frequency analysis. Contrary to the STFT, which uses a fixed window size, the wavelet transform uses short windows at high frequencies and longer windows at low frequencies. The wavelet transform is therefore not suitable for our problem because we are interested only in a small range of frequencies.

IAA is a nonparametric data-dependent algorithm based on weighted least squares (WLS), originally presented in [8] for direction of arrival (DOA) estimation in array processing. The IAA algorithm is capable of yielding high resolution and low side lobes even in the case of a single snapshot [8], hence making it suitable for estimating the ENF in the presence of interferences.
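As an illustration of the frequency domain analysis in (2)–(7), the following is a minimal NumPy sketch, not the authors' implementation. The function names (`frame_signal`, `enf_stft`), the search band arguments `f_lo` and `f_hi`, and the use of zero-padding to obtain the frequency grid are assumptions made for the example. The three-point fit is written as an offset in grid bins, which is algebraically the vertex of a parabola through the peak sample and its two neighbours.

```python
import numpy as np

def frame_signal(y, frame_len, shift):
    """Split y into overlapping frames (one per row) of length frame_len."""
    n_frames = 1 + (len(y) - frame_len) // shift
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return y[idx]

def enf_stft(y, fs, frame_len, shift, grid_size, f_lo, f_hi):
    """Estimate one frequency (Hz) per frame from the periodogram peak,
    refined by a three-point quadratic fit. Assumes grid_size >= frame_len."""
    frames = frame_signal(np.asarray(y, dtype=float), frame_len, shift)
    # Rectangular window; zero-padding to grid_size only refines the grid.
    spec = np.abs(np.fft.rfft(frames, n=grid_size, axis=1)) ** 2 / frame_len
    freqs = np.arange(spec.shape[1]) * fs / grid_size
    band = np.flatnonzero((freqs >= f_lo) & (freqs <= f_hi))
    est = np.empty(len(frames))
    for l, phi in enumerate(spec):
        k = band[np.argmax(phi[band])]        # grid maximum inside the search band
        k = min(max(k, 1), len(phi) - 2)      # keep both neighbours in range
        p_m, p_0, p_p = phi[k - 1], phi[k], phi[k + 1]
        denom = p_m - 2.0 * p_0 + p_p
        # Vertex of the parabola through the three samples, in units of bins.
        delta = 0.0 if denom == 0 else 0.5 * (p_m - p_p) / denom
        est[l] = (k + delta) * fs / grid_size
    return est
```

A call such as `enf_stft(y, 441.0, frame_len=4410, shift=441, grid_size=2**16, f_lo=59.5, f_hi=60.5)` would give one estimate per second from 10 s frames; these particular values are illustrative only and are not taken from Table III.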

B. IAA and TRIAA

The ENF can be extracted with high accuracy in the presence of interference using the IAA algorithm for a given frame. The proposed ENF extraction process follows (2)–(7), with the FFT spectral estimate replaced by the IAA spectral estimate. The IAA and TRIAA [16] used for spectral estimation of nonstationary data will be discussed in this section.

The spectral estimation problem can be set up as follows. Let $\mathbf{y} = [y(0), \ldots, y(N-1)]^T$ denote a uniformly sampled stationary data sequence and $\mathbf{A} = [\mathbf{a}(\omega_0), \ldots, \mathbf{a}(\omega_{K-1})]$, where $\mathbf{a}(\omega_k) = [1, e^{j\omega_k}, \ldots, e^{j(N-1)\omega_k}]^T$ corresponds to a steering (frequency) vector, and $\omega_k = 2\pi k/K$ corresponds to a frequency grid point of a frequency grid with size $K$. Also let $\boldsymbol{\alpha} = [\alpha_0, \ldots, \alpha_{K-1}]^T$, with $\alpha_k$ denoting the complex spectral estimates of $\mathbf{y}$ at $\omega_k$. The following data model can be formulated:

$$\mathbf{y} = \mathbf{A}\boldsymbol{\alpha} \tag{8}$$

where the noise contributions of $\mathbf{y}$ are taken into account implicitly [8].

The IAA algorithm solves for the spectral estimates $\alpha_k$ by minimizing the following quadratic cost function in (9) using weighted least squares (WLS):

$$\left\| \mathbf{y} - \alpha_k \mathbf{a}(\omega_k) \right\|^2_{\mathbf{Q}^{-1}(\omega_k)} \tag{9}$$

where

$$\mathbf{Q}(\omega_k) = \mathbf{R} - p_k\, \mathbf{a}(\omega_k)\mathbf{a}^H(\omega_k) \tag{10}$$

$$\mathbf{R} = \mathbf{A}\mathbf{P}\mathbf{A}^H \tag{11}$$

and $\mathbf{P} = \mathrm{diag}(p_0, \ldots, p_{K-1})$, with $p_k$ for $k = 0, \ldots, K-1$, denoting the power estimate at each frequency grid point, given by $p_k = |\alpha_k|^2$. $\mathbf{R}$ (see footnote 1) is the covariance matrix of the data and $\mathbf{Q}(\omega_k)$ is the covariance matrix of the interference and noise, where interference refers to all the signals at frequency grid points other than the current grid point of interest $\omega_k$. Minimizing the cost function in (9) with respect to the $\alpha_k$ for $k = 0, \ldots, K-1$ gives the following solution:

$$\hat{\alpha}_k = \frac{\mathbf{a}^H(\omega_k)\mathbf{Q}^{-1}(\omega_k)\mathbf{y}}{\mathbf{a}^H(\omega_k)\mathbf{Q}^{-1}(\omega_k)\mathbf{a}(\omega_k)} \tag{12}$$

The solution in (12) can be rewritten as

$$\hat{\alpha}_k = \frac{\mathbf{a}^H(\omega_k)\mathbf{R}^{-1}\mathbf{y}}{\mathbf{a}^H(\omega_k)\mathbf{R}^{-1}\mathbf{a}(\omega_k)} \tag{13}$$

using the Woodbury matrix identity (see footnote 2) and (10). This prevents the computation of the interference covariance matrix $\mathbf{Q}(\omega_k)$ for each frequency grid point. Note that the computation of $\hat{\alpha}_k$ requires the knowledge of $\mathbf{R}$ and vice versa. Hence this algorithm is solved in an iterative manner, with the estimate of $\alpha_k$ initialized using the FFT. This iterative algorithm takes about 10 to 15 iterations to converge based on experimental and numerical results.

Footnote 1: [...] for ill-conditioned matrices [17].

Footnote 2: Matrix inversion lemma.

Note also that without accounting for the interference from other frequency grid points (without weighting), minimizing the cost function in (9) for $\mathbf{Q}(\omega_k) = \mathbf{I}$ gives the discrete Fourier transform (DFT) of the signal

$$\hat{\alpha}_k = \frac{\mathbf{a}^H(\omega_k)\mathbf{y}}{N} \tag{14}$$

The IAA algorithm described is used for spectral estimation of stationary data. Analogous to the STFT, the spectral content of a nonstationary data sequence, such as (1), can be estimated using the TRIAA [16]. The signal is split into overlapping frames similar to Fig. 2 and the IAA spectral estimate is computed for each frame. However, to reduce the computational complexity, each subsequent frame after the first frame is initialized with the spectral estimate of the previous frame instead of the FFT-based periodogram as described in the IAA algorithm. The resulting algorithm yields better spectral resolution and lower side lobes than the STFT.

There is still a significant increase in the computational complexity when using the TRIAA algorithm compared to using STFT for spectral estimation. This computational complexity is reduced slightly by reducing the number of iterations in subsequent frames for the TRIAA. This is because convergence of the estimated spectrum will occur in fewer iterations given the current frame is initialized by the spectral estimate of the previous frame. When the dataset is significantly large, the use of this algorithm is still impractical. The bottleneck of the TRIAA algorithm is in the computation of the denominator in (13) for each frame.

In [18] and [19] the Toeplitz structure of the covariance matrix is exploited and the computation of $\mathbf{R}^{-1}$ is performed using the Gohberg-Semencul (GS) factorization of this matrix [7]. Moreover, the denominator is obtained via evaluating a polynomial. This reduces the computational complexity of the denominator in (13) (which is the bottleneck of the IAA algorithm) from [...] to [...] floating point operations (flops) [18] for a given frame, without a loss in performance. The algorithm is termed the Fast IAA (FIAA), which is a significant improvement but still computationally expensive for large datasets. The computational complexity of IAA and FIAA are [...] and [...], respectively, where $N$ is the data length and $K$ is the grid size, with [...].

An approximate algorithm to the IAA algorithm with significantly faster computational time is described in [20] and referred to as the Quasi-Newton IAA (QN-IAA). The QN-IAA algorithm estimates the covariance matrix as if it were from a low-order autoregressive (AR) process of order $M$, where $M \ll N$ with $N$ being the data (frame) length. The inversion of the lower order $M \times M$ covariance matrix is carried out in place of the $N \times N$ matrix $\mathbf{R}$, yielding an approximate solution to the IAA spectral estimate (13) with significant reduction in the computational complexity and just a slight degradation in the resolution. The computational complexity of this algorithm is [...].

The FIAA or QN-IAA can be used in a time-recursive manner for nonstationary data as is the case with the ENF signal. This algorithm reduces the tradeoff between frequency resolution and time-resolution for a given frame length compared to the FFT-based periodogram during the ENF extraction process. The extraction process is the same as the frequency domain analysis (2)–(7) with the FFT-based spectral estimate replaced by that of either of the aforementioned algorithms.

However, even if a good algorithm is used for frequency estimation based on (7), specific frames might be corrupted by interference signals with frequency components within the ENF limits. This could lead to errors in frequency estimation, if the frequency location corresponding to the maximum value of the estimated spectra belongs to an interference signal. A robust method of tracking the ENF that exploits the slowly varying nature of this frequency is needed. The next section describes the proposed frequency tracking algorithm.
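The basic IAA iteration of (11) and (13) for a single frame can be sketched as follows. This is a direct, unoptimized NumPy illustration under stated assumptions rather than the FIAA or QN-IAA implementations cited above: the function name `iaa_spectrum`, the default of 15 iterations (chosen from the 10 to 15 range quoted in the text), and the small diagonal loading term `eps` added for numerical stability are all choices made for the example.

```python
import numpy as np

def iaa_spectrum(y, grid_size, n_iter=15, eps=1e-10):
    """Single-snapshot IAA amplitude estimates on a uniform frequency grid."""
    y = np.asarray(y, dtype=complex)
    n = len(y)
    omega = 2.0 * np.pi * np.arange(grid_size) / grid_size
    A = np.exp(1j * np.outer(np.arange(n), omega))   # columns are steering vectors
    alpha = A.conj().T @ y / n                       # DFT initialization, as in (14)
    for _ in range(n_iter):
        p = np.abs(alpha) ** 2                       # power at each grid point
        R = (A * p) @ A.conj().T                     # R = A diag(p) A^H, as in (11)
        # Assumption: light diagonal loading keeps R well conditioned.
        R += eps * np.trace(R).real / n * np.eye(n)
        Ri_y = np.linalg.solve(R, y)
        Ri_A = np.linalg.solve(R, A)
        num = A.conj().T @ Ri_y                            # a_k^H R^{-1} y
        den = np.einsum('nk,nk->k', A.conj(), Ri_A).real   # a_k^H R^{-1} a_k
        alpha = num / den                                  # update, as in (13)
    return omega, alpha
```

A time-recursive variant in the spirit of TRIAA would pass the `alpha` of the previous frame in as the starting point instead of the DFT and run fewer iterations; that extension is omitted here to keep the sketch short.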

C. Frequency Tracking

A method of estimating the ENF by tracking it from one frame to another is formulated here from a mathematical point of view. The proposed method uses discrete dynamic programming [9] to find a minimum cost path. A cost function as shown in this section is selected which takes into account the slowly varying nature of the actual network frequency. This cost function penalizes significant jumps in frequency from frame to frame and the corresponding path is used to estimate the ENF.

This algorithm involves finding the peak locations from the spectrum of each frame and assigning costs based on the difference between a peak location in one frame and a peak location in another frame. The magnitude of the assigned cost is related to the difference in the frequency from one frame to another. The minimum cost path from the first frame to the last frame is computed to estimate the ENF.

To estimate the number of relevant peaks (sinusoids) in a given frame, a model order selection tool known as the Bayesian Information Criterion (BIC) is used. The BIC for complex sinusoids in noise is given by (refer to [7] and [21] for a full derivation)

[...] (15)

The number of peaks (real sinusoids) is estimated as the minimizing argument of the above BIC criterion. The first term in (15) is a least squares data fitting term, which decreases as the number of estimated peaks increases, where the second term is a penalty term that prevents "overfitting" of the data model. Once the largest peaks and corresponding locations are determined in each frame, the frequency tracking problem is formulated and solved as follows.

Assume that for a given frame $l$, a set of estimated peak locations (frequencies) is denoted by $\mathcal{F}_l$. We would like to find a path $\{f_l\}_{l=1}^{L}$ such that $f_l \in \mathcal{F}_l$ and where the difference $|f_l - f_{l-1}|$ is as small as possible for $l = 2, \ldots, L$. This set corresponds to the estimated ENF over all frames and can be obtained as the minimizing argument in the following optimization problem:

$$\min_{f_l \in \mathcal{F}_l,\; l = 1, \ldots, L} \; \sum_{l=2}^{L} |f_l - f_{l-1}|^2 \tag{16}$$

Calculating this cost using an exhaustive search is impractical. However, using dynamic programming [9] the path that minimizes this cost can be computed recursively and efficiently by minimizing the cost from a given frame $l$, to the last frame, denoted by

$$C_l(f_l) = \min_{f_i \in \mathcal{F}_i,\; i = l+1, \ldots, L} \; \sum_{i=l+1}^{L} |f_i - f_{i-1}|^2 \tag{17}$$

This optimal cost satisfies the recursive equation

$$C_l(f_l) = \min_{f_{l+1} \in \mathcal{F}_{l+1}} \left( |f_{l+1} - f_l|^2 + C_{l+1}(f_{l+1}) \right) \tag{18}$$

which can be calculated for $l = L-1, L-2, \ldots, 1$, with the initialization, $C_L(f_L) = 0$. Note that

$$\min_{f_1 \in \mathcal{F}_1} C_1(f_1) \tag{19}$$

is the cost from the first frame to the last frame and the set $\{f_l\}_{l=1}^{L}$ that minimizes this cost function corresponds to the extracted ENF signal as mentioned previously. Dynamic programming has a computational complexity of $O(L\,\eta_{\max}^2)$, where $L$ corresponds to the total number of frames and $\eta_{\max}$ is the number of spectral peaks in the frame with the maximum number of peaks.

TABLE III. PARAMETERS FOR THE EXPERIMENT
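The backward recursion of (17)–(19) can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function name `track_enf` and the input layout (one array of candidate peak frequencies per frame) are assumptions, and the squared frame-to-frame jump is assumed as the transition cost. Peak picking and the BIC-based choice of how many peaks to keep are taken as already done.

```python
import numpy as np

def track_enf(peaks):
    """peaks: list with one 1-D array of candidate frequencies (Hz) per frame.
    Returns the path that minimizes the sum of squared frame-to-frame jumps."""
    peaks = [np.asarray(p, dtype=float) for p in peaks]
    L = len(peaks)
    cost = [np.zeros(len(p)) for p in peaks]   # cost-to-go, zero at the last frame
    nxt = [None] * L                           # best successor index per candidate
    for l in range(L - 2, -1, -1):             # backward recursion
        jump = (peaks[l][:, None] - peaks[l + 1][None, :]) ** 2
        total = jump + cost[l + 1][None, :]
        nxt[l] = np.argmin(total, axis=1)
        cost[l] = total[np.arange(len(peaks[l])), nxt[l]]
    i = int(np.argmin(cost[0]))                # best starting peak in the first frame
    path = [peaks[0][i]]
    for l in range(L - 1):                     # follow the stored successors forward
        i = int(nxt[l][i])
        path.append(peaks[l + 1][i])
    return np.array(path)
```

Each frame compares every candidate with every candidate of the next frame, so the work grows with the number of frames times the square of the largest candidate count, in line with the complexity stated above.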

D. Matching Extracted ENF to Database

Once the ENF signal has been extracted, a method of matching the estimated signal to the database signal is required. The goal is to find the location/time within the database that is similar in pattern to the extracted ENF. In [6], a method based on minimizing the squared error between the ENF and database is used for automated matching. A method of correlation matching proposed in [22] for short digital recordings (10–15 min) is used in place of this MSE method. The process of correlation matching is described as follows. Assume that the extracted ENF signal and the database signal are given as vectors, with the database signal being the longer of the two. The matching process requires finding the location within the database such that

(20)

where the maximized quantity is the correlation coefficient between the extracted ENF signal and the vector of database values of the same length beginning at the candidate location.

An important point to make is that the maximum correlation coefficient is used here only for matching the estimated ENF to the database and comparing the accuracy (reliability) of the different algorithms presented. Once a match has been made, determining locations of edits to a recording should be based on the differences between the ENF estimate and the database.
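The correlation matching step can be illustrated with a small sliding-window search. This is a sketch only: the helper names `pearson` and `best_match`, the toy database values, and the choice to score a constant segment as zero are assumptions made here, not details from the paper.

```python
import math


def pearson(x, y):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # constant segment: correlation undefined, scored as no match
    return sxy / math.sqrt(sxx * syy)


def best_match(enf, database):
    """Return (offset, correlation) of the database segment that is most
    strongly correlated with the extracted ENF sequence."""
    n = len(enf)
    scores = [pearson(enf, database[l:l + n])
              for l in range(len(database) - n + 1)]
    offset = max(range(len(scores)), key=lambda l: scores[l])
    return offset, scores[offset]


# Hypothetical database (Hz) and an "extracted" ENF taken from offset 3,
# shifted by a constant 0.05 Hz to mimic an estimation offset.
database = [60.00, 60.01, 59.99, 59.97, 60.02, 60.03, 59.98, 60.00, 60.01, 60.02]
enf = [x + 0.05 for x in database[3:8]]
offset, score = best_match(enf, database)
print(offset, round(score, 3))
```

Because the correlation coefficient removes the mean of each segment, the constant shift in the toy estimate does not prevent the search from locating the correct segment.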

IV. EXPERIMENTAL RESULTS

The algorithms presented in the previous section are applied to two different digital audio datasets referred to as “Data1” and “Data2”. The two datasets are recorded simultaneously and, therefore, should contain the same ENF pattern over time. The first dataset (Data1) is acquired by connecting an electric outlet via a voltage divider directly to the internal sound card of a desktop computer, resulting in an ENF signal with a rather high signal-to-interference-and-noise ratio. On the other hand, the second dataset (Data2) is an actual speech recording played from a speaker and picked up by the internal microphone of a laptop computer.

Each of these recordings is originally sampled at 44.1 kHz with 16 bits per sample. Each dataset is resampled to 441 Hz, hence keeping only the fundamental frequency (1st harmonic) and the second and third harmonics of the ENF. A bandpass filter with a narrow bandwidth around the network frequency is applied to the data to eliminate as much interference as possible without distorting the ENF signal. Based on Fig. 2, each dataset is split using the values shown in Table III. This setup results in an ENF estimate every second for a total of 30 min for each dataset.

Fig. 3. Matching extracted ENF to database (Data1—scaled to 60 Hz).

Fig. 4. Matching extracted ENF to database (Data2—scaled to 60 Hz).

An increase in the frame length improves the signal-to-noise ratio of the signal [6] and the spectral resolution at the cost of lower time resolution. Therefore, a larger frame length is used for Data2, which has a weak ENF signal, than for Data1, which has a strong ENF signal.

TABLE IV
CORRELATION COEFFICIENTS OF ALGORITHMS (DATA1)

TABLE V
STANDARD DEVIATION OF ERROR FOR ALGORITHMS (DATA1)

Fig. 3 shows the extracted ENF signal (shifted by 0.05 Hz for illustration purposes) from Data1, matched with the truth obtained from the FDRs, when the dataset has not been altered in any form [using STFT and (7)]. Fig. 4 shows the extracted ENF using the STFT-based method and our proposed method (also shifted for comparison purposes). Tables IV and VI give the maximum correlation coefficient of the various methods for Data1 and Data2, respectively, also when the signals have not been altered. The maximum correlation coefficient values are used to compare the accuracy of the algorithms and hence determine which is more reliable for ENF estimation. We have also included a similar MSE (actually standard deviation) analysis in Tables V and VII for the datasets, where the MSE is computed by averaging the squared difference between the true ENF and the estimated ENF. It is important to point out that the estimated ENF can sometimes have a constant offset [12], [22]. Therefore, the correlation is the preferred measure of accuracy. The datasets used for this experiment do not have such an offset. They have also been made available at http://www.sal.ufl.edu/download.html.
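The remark about a constant offset can be made concrete with a toy calculation. The sketch below is illustrative only: the function names, the sample values, and the 0.05 Hz offset are hypothetical, and the mean-removed deviation is shown purely for contrast; it is not claimed to be the exact quantity reported in the paper's tables.

```python
import math


def rms_error(truth, est):
    """Square root of the averaged squared difference between two sequences."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, est)) / len(truth))


def std_of_error(truth, est):
    """Standard deviation of the error after removing its mean (the offset)."""
    err = [e - t for t, e in zip(truth, est)]
    mean = sum(err) / len(err)
    return math.sqrt(sum((d - mean) ** 2 for d in err) / len(err))


# Hypothetical true ENF (Hz) and an estimate with a 0.05 Hz constant offset
# plus small zero-mean fluctuations.
truth = [60.00, 60.01, 59.99, 59.98, 60.02]
noise = [0.001, -0.002, 0.000, 0.002, -0.001]
est = [t + 0.05 + n for t, n in zip(truth, noise)]

rms = rms_error(truth, est)
std = std_of_error(truth, est)
print(round(rms, 4), round(std, 4))
```

In this toy case the averaged squared difference is dominated by the offset, while a measure that removes the mean (as the correlation coefficient does) reflects only how well the shape of the estimate follows the true ENF.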

A. Data1 Analysis

Fig. 3 shows the extracted third harmonic (180 Hz) of the ENF signal scaled to 60 Hz and matched [using the location corresponding to the maximum correlation (20)] to the actual database frequency obtained from the FDRs. For each of the algorithms used, the third harmonic gave the most accurate results for this dataset, as shown in Table IV. This is because, for a fixed grid size, the estimation error when using the third harmonic is reduced by a factor of three compared to the fundamental frequency. Harmonics with frequencies higher than 180 Hz can be used for the estimation process at the cost of increased computational complexity due to the increased sampling rate. Also from Table IV, it can be seen that both the STFT and TRIAA algorithms produce accurate estimates of the ENF using (7) because of the rather strong ENF signal. The signal at the second harmonic is weak relative to the first and third harmonics, and in a few frames the estimate was inaccurate. However, the frequency tracking algorithm mitigated these inaccuracies successfully by tracking the correct spectral peaks.

The parametric method, frequency-selective ESPRIT (F-ESPRIT) [7], [23], also yields accurate estimates of the ENF for Data1 when the signal model assumes there is only one sinusoid per frame. However, this method and other parametric methods are not appropriate for ENF estimation in the presence of interference, because they are sensitive to model assumptions.

For this dataset, the STFT yields slightly better results compared to the adaptive method (TRIAA). This can be explained by the fact that the periodogram is optimal for estimating spectral lines (sinusoids) in the presence of white noise when they are well resolved [7]. However, when there are interfering signals present, the poor resolution of the periodogram will yield inaccurate estimates, as is the case with Data2, a typical digital recording.
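The factor-of-three reduction noted earlier in this subsection for the third harmonic can be written out as a short bound. The notation is introduced here only for illustration and is not taken from the paper: $\Delta$ is the spacing of the frequency grid, $f_0$ is the true fundamental, $k$ is the harmonic index, and $\hat{f}_k$ is the grid point selected at the $k$th harmonic, assumed to be the grid point nearest to $k f_0$.

```latex
\left| \hat{f}_k - k f_0 \right| \le \frac{\Delta}{2}
\quad\Longrightarrow\quad
\left| \frac{\hat{f}_k}{k} - f_0 \right| \le \frac{\Delta}{2k}
```

With $k = 3$, the worst-case grid error after scaling back to 60 Hz is one third of the error at $k = 1$ for the same $\Delta$. This accounts only for the grid quantization part of the error and says nothing about noise or interference.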

TABLE VI
CORRELATION COEFFICIENTS OF ALGORITHMS (DATA2)

TABLE VII
STANDARD DEVIATION OF ERROR FOR ALGORITHMS (DATA2)

Fig. 5. Power spectrum of one frame (Data2): poor resolution of FFT.

B. Data2 Analysis

For Data2, the second harmonic (120 Hz) is used to estimate the ENF, because the first and third harmonics are too weak to be used for estimation. Table VI shows the maximum correlation coefficient values for the STFT and TRIAA using (7), the frequency tracking algorithm using the spectral peaks of the FFT and IAA, and the parametric method (F-ESPRIT) with one assumed sinusoid. The ENF estimation accuracy is improved using the adaptive method (IAA) because of improved spectral resolution for several frames. Fig. 5 shows a comparison of the spectrum of one frame of Data2, where the poor frequency resolution of the FFT results in a relatively poor estimate of the network frequency compared to the IAA algorithm.

Fig. 4 shows this extracted ENF harmonic using the STFT and (7) matched with the database. From this figure, there are several frames where the ENF is estimated inaccurately, because the frequency corresponding to the maximum spectral peak in those frames does not correspond to the ENF. This can occur if another signal is present with a frequency within the limits of the acceptable range of the ENF, as illustrated in Fig. 6. This figure shows that for both spectral estimation techniques used (IAA, FFT), the ENF harmonic estimate using (7) will be 120 Hz, whereas the true frequency is approximately 119.95 Hz.
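The failure mode described here can be reproduced with a toy frame. The sketch below picks the strongest spectral peak inside an allowed band, in the spirit of the maximum-peak rule the text refers to as (7); the exact form of (7) is not reproduced, and the function name, band limits, and peak powers are hypothetical. Only the 120 Hz and 119.95 Hz values come from the discussion above.

```python
def strongest_peak_in_band(peaks, f_lo, f_hi):
    """Return the frequency of the most powerful peak with f_lo <= f <= f_hi.

    peaks is a list of (frequency_hz, power) pairs for one frame.
    """
    in_band = [(f, p) for f, p in peaks if f_lo <= f <= f_hi]
    if not in_band:
        raise ValueError("no spectral peak inside the allowed band")
    return max(in_band, key=lambda fp: fp[1])[0]


# Hypothetical frame: the ENF harmonic near 119.95 Hz is weaker than an
# interfering component at 120 Hz; a third peak lies outside the band.
frame = [(119.95, 1.0), (120.00, 2.5), (121.40, 4.0)]
estimate = strongest_peak_in_band(frame, f_lo=119.8, f_hi=120.2)
print(estimate)
```

Here the rule returns the interferer rather than the ENF harmonic, because power alone cannot distinguish the two when both fall inside the allowed band. Using information from neighboring frames is what allows a tracking approach to recover the weaker but consistent peak.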


Fig. 6. Power spectrum of one frame (Data2): strong interference signal.

Fig. 7. Extracted ENF via frequency tracking (Data2—scaled to 60 Hz).

This problem can be rectified using the dynamic programming-based frequency tracking algorithm presented.

Fig. 7 shows the spectral peak locations computed using the TRIAA and the corresponding ENF estimate using dynamic programming. The estimate of the network frequency using this tracking algorithm is then matched to the database in Fig. 4, which provides a better match than using (7), as can be seen in that figure, in Fig. 8 (absolute error), and in Table VI.

A few important points should be made. The frequency tracking algorithm uses the peak locations for each frame estimated either by the adaptive algorithm (IAA) or by the FFT. The results show that the estimated ENF is more accurate when the peak locations of IAA are used. This is a result of the inaccurate estimates in some frames caused by the poor resolution of the FFT. Also, all the numbers presented can be improved upon slightly by using the entire dataset (44.1 kHz) for analysis. For example, the STFT maximum correlation of 0.9125 will be improved to 0.9158 without resampling, which may not be worth the increased computational complexity.

Fig. 8. Absolute error of algorithms (Data2): (a) STFT and (b) TRIAA (Track).

V. CONCLUSION

When it comes to digital audio verification, the reliability of the method used for authentication cannot be overemphasized. This paper demonstrates a reliable method of extracting the network frequency from a digital recording when the ENF cannot be extracted from some of the frames using the FFT-based periodogram, either because of poor spectral resolution or because of a stronger interference signal within the frame. These problems were solved by using an iterative adaptive method (IAA), which provides better spectral resolution than the FFT-based approach. Also, a frequency tracking method based on dynamic programming was used for accurate extraction of the ENF even in the presence of strong interference signals within the ENF limits.

From the results presented, the FFT gives slightly better estimates of the network frequency when the signal-to-interference-plus-noise ratio is very high, as is the case with the first dataset. However, in most digital recordings, there will be significant interference from the recorded speech signals and other surrounding sounds that could lead to poor estimation performance using the FFT due to its poor resolution and high sidelobe problems. As the results have shown, the adaptive techniques and frequency tracking method should be adopted for ENF estimation, especially in challenging environments.

ACKNOWLEDGMENT

The opinions, findings, and conclusions or recommendations expressed in this publication/program/exhibition are those of the author(s) and do not necessarily reflect those of the Department of Justice.

REFERENCES

[1] R. C. Maher, “Audio forensic examination—Authenticity, enhancement, and interpretation,” IEEE Signal Processing Mag., vol. 26, pp. 84–94, Mar. 2009.

[2] C. Grigoras, “Digital audio recording analysis: The electric network frequency criterion,” Int. J. Speech Language Law, vol. 12, no. 1, pp. 63–76, 2005.

[3] C. Grigoras, “Applications of ENF criterion in forensic audio, video and telecommunications analysis,” Forensic Sci. Int., vol. 167, pp. 136–176, 2007.


[4] E. B. Brixen, “ENF—Quantification of the magnetic field,” in Proc. AES 33rd Conf., Audio Forensics—Theory and Practice, Denver, CO, Jun. 2008.

[5] E. B. Brixen, “Techniques for the authentication of digital audio recordings,” in Proc. AES 122nd Conv., Vienna, Austria, 2007.

[6] A. J. Cooper, “The electric network frequency (ENF) as an aid to authenticating forensic digital audio recordings—An automated approach,” in Proc. AES 33rd Conf., Audio Forensics—Theory and Practice, Denver, CO, Jun. 2008, pp. 1–6.

[7] P. Stoica and R. L. Moses, Spectral Analysis of Signals. Upper Saddle River, NJ: Prentice-Hall, 2005.

[8] T. Yardibi, J. Li, P. Stoica, M. Xue, and A. B. Baggeroer, “Source localization and sensing: A nonparametric iterative adaptive approach based on weighted least squares,” IEEE Trans. Aerospace Electron. Syst., vol. 46, no. 1, pp. 425–443, Jan. 2010.

[9] U. Jönsson, C. Trygger, and P. Ögren, “Lecture Notes on Optimal Control: Optimization and system theory,” unpublished.

[10] Liu, Z. Yuan, P. N. Markham, R. Conners, and Y. Liu, “Wide-area frequency as a criterion for digital audio recording authentication,” in Proc. IEEE Power Energy Soc. General Meeting, Jul. 2011, pp. 1–7.

[11] D. Rodríguez, J. Apolinário, and L. Biscainho, “Audio authenticity: Detecting ENF discontinuity with high precision phase analysis,” IEEE Trans. Inform. Forensics Security, vol. 5, no. 9, pp. 534–543, Sep. 2010.

[12] M. Kajstura, A. Trawinska, and J. Hebenstreit, “Application of the electrical network frequency (ENF) criterion—A case of a digital recording,” Forensic Sci. Int., vol. 155, pp. 165–171, 2005.

[13] Y. Liu, “A US-wide power systems frequency monitoring network,” in Proc. IEEE Power Systems Conf. Expo., Atlanta, GA, Oct. 29–Nov. 1 2006, pp. 159–166.

[14] N. G. Hingorani, “High-voltage DC transmission—A power electronics workhorse,” IEEE Spectrum, vol. 33, pp. 63–72, Apr. 1996.

[15] J. O. Smith and X. Serra, “PARSHL an analysis/synthesis program for non-harmonic sounds based on sinusoidal representation,” in Proc. Int. Computer Music Conf., San Francisco, CA, 2004.

[16] G. Glentis and A. Jakobsson, “Time-recursive IAA spectral estimation,” IEEE Signal Processing Lett., vol. 18, pp. 111–114, Feb. 2011.

[17] W. Roberts, P. Stoica, J. Li, T. Yardibi, and F. Sadjadi, “Iterative adaptive approaches to MIMO radar imaging,” IEEE J. Select. Topics Signal Process., vol. 4, pp. 5–20, Feb. 2010.

[18] M. Xue, L. Xu, and J. Li, “IAA spectral estimation: Fast implementation using the Gohberg-Semencul factorization,” IEEE Trans. Signal Process., vol. 59, no. 7, pp. 3251–3261, Jul. 2011.

[19] G. Glentis and A. Jakobsson, “Efficient implementation of iterative adaptive approach spectral estimation techniques,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4154–4167, Sep. 2011.

[20] G. Glentis and A. Jakobsson, “Superfast approximative implementation of the IAA spectral estimate,” IEEE Trans. Signal Process., to be published.

[21] P. Stoica, J. Li, and H. He, “Spectral analysis of nonuniformly sampled data: A new approach versus the periodogram,” IEEE Trans. Signal Process., vol. 57, no. 3, pp. 843–858, Mar. 2009.

[22] M. Huijbregtse and Z. Geradts, “Using the ENF criterion for determining the time of recording for short digital audio recordings,” in Proc. 3rd Int. Workshop Computational Forensics, IWCF’09, 2009, vol. 1, pp. 116–124.

[23] J. Gunarsson and T. McKelvey, “High SNR performance analysis of F-ESPRIT,” in Conf. Rec. 38th Asilomar Conf. Signals, Systems Computers, Nov. 2004, vol. 1, pp. 1003–1007.

Ode Ojowu, Jr. (S’11) was born in Zaria, Nigeria, in 1984. He received the B.Sc. and M.Sc. degrees in electrical engineering from Washington University, St. Louis, MO, in 2007. He is currently pursuing a Ph.D. degree with the Department of Electrical Engineering at the University of Florida, Gainesville.

His primary research interests are in the areas of spectral estimation and array signal processing.

Johan Karlsson (S’06–M’09) was born in Stockholm, Sweden, in 1979. He received the M.S. degree in engineering physics and the Ph.D. degree from the Royal Institute of Technology (KTH), Stockholm, Sweden, in 2003 and 2008, respectively. He spent the academic year 2000 to 2001 as an exchange student at Washington University, Saint Louis, MO, and did his master's thesis at the University of Minnesota, Minneapolis.

In Fall 2003, he was a graduate student at the Division of Optimization and Systems Theory, KTH. From 2009 to 2011, he was with Sirius International, Stockholm, Sweden. He is currently working as a Postdoctoral Research Associate in the Department of Computer and Electrical Engineering, University of Florida, Gainesville. His research interests include fundamental limitations in estimation, interpolation, and model reduction for applications in signal processing, control theory, and risk assessment.

Jian Li (S’87–M’91–SM’97–F’05) received the M.Sc. and Ph.D. degrees in electrical engineering from Ohio State University, Columbus, in 1987 and 1991, respectively.

From April 1991 to June 1991, she was an Adjunct Assistant Professor with the Department of Electrical Engineering, Ohio State University. From July 1991 to June 1993, she was an Assistant Professor with the Department of Electrical Engineering, University of Kentucky, Lexington. Since August 1993, she has been with the Department of Electrical and Computer Engineering, University of Florida, Gainesville, where she is currently a Professor. Her current research interests include spectral estimation, statistical and array signal processing and their applications.

Dr. Li is a Fellow of IET. She is a member of Sigma Xi and Phi Kappa Phi. She received the 1994 National Science Foundation Young Investigator Award and the 1996 Office of Naval Research Young Investigator Award. She was an Executive Committee Member of the 2002 International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, May 2002. She was an Associate Editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING from 1999 to 2005, an Associate Editor of the IEEE Signal Processing Magazine from 2003 to 2005, and a member of the Editorial Board of Signal Processing, a publication of the European Association for Signal Processing (EURASIP), from 2005 to 2007. She has been a member of the Editorial Board of Digital Signal Processing—A Review Journal, a publication of Elsevier, since 2006. She is a coauthor of the papers that have received the First and Second Place Best Student Paper Awards at the 2005 and 2007 Annual Asilomar Conference on Signals, Systems, and Computers in Pacific Grove, California. She is a coauthor of the paper that has received the M. Barry Carlton Award for the best paper published in IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS in 2005. She is also a coauthor of the paper that won the Lockheed-Martin Best Student Paper Award at the 2009 SPIE Defense, Security, and Sensing Conference, Orlando, FL, 2009.

Yilu Liu (S’88–M’89–SM’99–F’04) received the B.S. degree from Xian Jiaotong University and the M.S. and Ph.D. degrees from the Ohio State University, Columbus, in 1986 and 1989.

She is currently the Governor’s Chair at the University of Tennessee, Knoxville, and Oak Ridge National Laboratory. Prior to joining UTK/ORNL, she was a Professor at Virginia Polytechnic Institute and State University (Virginia Tech). She led the effort to create the North America power grid monitoring network (FNET) at Virginia Tech which is now operated at UTK and ORNL as GridEye. Her current research interests include power system wide-area monitoring and control, large interconnection level dynamic simulations, electromagnetic transient analysis, and power transformer modeling and diagnosis.

