
Sparse Frequency Extrapolation of Spectrograms

Jabran Akhtar and Karl Erik Olsen

Norwegian Defence Research Establishment (FFI), Box 25, 2027 Kjeller, Norway

Email: [email protected], [email protected]

Abstract—The short-time Fourier transform is a prevalent method used to analyze the frequency composition of a signal as a function of time. To achieve high resolution in frequency, a large sliding window needs to be applied, which degrades the time resolution. This paper proposes the adoption of sparse reconstruction as a means to extrapolate supplementary values in the time domain for each segment. Over short durations a signal's frequency content is likely to contain a limited number of effective frequencies, and a sparse regeneration approach can be advantageous as an extrapolating mechanism. An enlarged number of samples can thus yield spectrograms with high frequency resolution. The capabilities of the proposed techniques are demonstrated on several synthetic and real data signals.

Index Terms—spectrogram, short-time Fourier transform, time-frequency analysis, super-resolution, signal extrapolation

I. INTRODUCTION

The short-time Fourier transform (STFT) is one of the classical and widely used techniques to determine the frequency content of a signal over shorter time intervals. A limited section of a signal is extracted by means of a sliding window, multiplied by a tapering function and transformed to the Fourier domain. Plotting the magnitude of the transform as a function of time provides a spectrogram showing how the frequencies of the signal vary over time [1]. An important aspect in this regard is the frequency resolution of the spectrogram, which in essence is determined by the number of data samples available for the transform. Employing a larger window with a larger set of samples increases the frequency resolution at the expense of poorer time localization. High accuracy in time and high resolution in frequency are therefore generally conflicting goals. Nevertheless, several approaches have been presented in the literature on how to improve upon the traditional STFT by using various alternative time-frequency representations and transformations [2]–[6].

The last couple of years have also seen an increased focus on compressed sensing and sparse reconstruction techniques based on L1-norm optimization [7], [8]. These methods have typically been developed to reconstruct a signal or image where the data acquisition may have been carried out in a compressed or irregular manner. Under certain conditions a sparse reconstruction approach can guarantee perfect recovery even when parts of the data are not available. In many applications, such as audio recordings, sampling and data collection is not really a bottleneck; rather, detailed and accurate signal analysis is the more prominent aspect.

In this paper we propose an alternative utilization of sparse reconstruction techniques for increasing the frequency resolution of the STFT. The Fourier transform with a sliding window is retained as the main transformation component and the frequency resolution is increased by a sparse extrapolation process in the time domain. This provides an alternative resolution enhancement strategy compared to previously proposed approaches.

The motivation for selecting sparse reconstruction in this context comes from the fact that a frequency transform over a restricted number of samples is likely to contain a limited number of preeminent frequency elements and can thus be considered sufficiently sparse. Sparse reconstruction procedures may therefore be employed to extrapolate the dominant signal oscillations appropriately and narrow them down more precisely in the Fourier domain. Another incentive for extrapolation comes from the fact that applying a windowing function weights down data entries at the beginning and end of the sequence. In an enlarged extrapolated data set the extrapolated values are the ones that are heavily scaled down, while the potential of the original data can be utilized fully. The idea of extrapolating each segmented signal to increase its time-frequency resolution has also been proposed earlier, e.g. in [2], [9], using more standard signal modeling and weighted norm approaches. Sparse reconstruction techniques with inter- and extrapolation have recently been applied in phased array antenna [10] and multifunction radar [11] settings to compensate for data gaps and improve overall system performance.

A particular feature often linked with sparse reconstruction is that the obtained results are indeed sparse, i.e. contain a significant number of exact zero values. In order to regenerate and retain the original properties of the spectrogram we additionally propose a merger of the extrapolated data with real data. This has the benefit that noise, less prevalent signal frequencies and other inaccuracies are fully preserved. Several simulated and real world examples are examined to demonstrate the principles introduced in this paper.

II. SPECTROGRAM MODEL

We consider a discrete-time signal $p(t) \in \mathbb{C}^{T \times 1}$, $t = 1, ..., T$, sampled at regular intervals, for which a time-frequency representation through the STFT is desired. For this, a shorter segment $x(t)_k \in \mathbb{C}^{N \times 1}$, $t = 1, ..., N$, with $N$ samples is extracted from within $p(t)$. It is assumed that $k = 1, ..., K$, where $K$ denotes the total number of segmented intervals; the exact value of $K$ depends upon the lengths of the signal and the segmented window alongside the amount of overlap between two consecutive sections. In the remainder of the text we use $x_k \in \mathbb{C}^{N \times 1}$ to denote $x(t)_k$ for any particular value of $k$.

2017 25th European Signal Processing Conference (EUSIPCO)

ISBN 978-0-9928626-7-1 © EURASIP 2017 1175

For further processing $x_k$ is typically multiplied element-wise, designated by $\odot$, with a windowing function $w \in \mathbb{C}^{N \times 1}$, and afterwards it may also be zero-padded. Performing a Fourier transform results in $s(\omega)_k$, expressed as:

$$ s_k = F(w \odot x_k) \in \mathbb{C}^{N \times 1}. \tag{1} $$

$F$ is the discrete Fourier matrix of size $N \times N$, $F_{m,n} = \exp(-2\pi j m n / N)$. The above process is independent across time segments and may be executed through an FFT to make it computationally more efficient. Stacking together all segments we arrive at the STFT matrix:

$$ S(k, \omega) = [s_1 \; ... \; s_K] \in \mathbb{C}^{N \times K}. \tag{2} $$
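The construction in (1) and (2) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `stft_matrix`, the `hop` and `n_fft` parameters and the choice of `np.hanning` as the tapering function are assumptions made for the example, not details fixed by the text at this point.

```python
import numpy as np

def stft_matrix(p, N, hop=1, n_fft=None):
    """Return the STFT matrix S with one column s_k per segment, as in (1)-(2)."""
    p = np.asarray(p, dtype=complex)
    n_fft = N if n_fft is None else n_fft      # n_fft > N zero-pads each segment
    w = np.hanning(N)                          # tapering function w (assumed Hann)
    K = (len(p) - N) // hop + 1                # number of complete segments
    S = np.empty((n_fft, K), dtype=complex)
    for k in range(K):
        x_k = p[k * hop : k * hop + N]         # segment x_k of N samples
        S[:, k] = np.fft.fft(w * x_k, n=n_fft) # s_k = F (w * x_k), element-wise product
    return S

# Usage: a complex tone that sits exactly on bin 4 of a 16-point DFT.
N = 16
t = np.arange(200)
p = np.exp(2j * np.pi * 4 * t / N)
S = stft_matrix(p, N, hop=1)
```

With a hop of one sample the example yields one column per admissible start position, and every column has its largest magnitude in bin 4.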

A. Sparse Extrapolation

Assuming each segmented spectrum $s_k$ contains a relatively limited number of active frequencies, or is sufficiently sparse within a tolerance level, one can attempt to extrapolate it in the time domain. An extrapolation process constructs additional samples from available data, and in this proposition the procedure needs to materialize with respect to the main dominating frequencies, as only the extrapolation of these frequencies can force the spectrum to remain sparse. The optimal solution will thus be the one that maximizes sparsity in frequency while still preserving, to a certain extent, the original signal's integrity, as specified later.

The new regenerated profile for a given segment is specified by $\hat{x}_k \in \mathbb{C}^{L \times 1}$ and the relationship between time domain and frequency domain is, as previously, governed by

$$ \hat{s}_k = F(w \odot \hat{x}_k) \in \mathbb{C}^{L \times 1} \tag{3} $$

where $F$ is now an $L \times L$ Fourier matrix and $w \in \mathbb{C}^{L \times 1}$ is the windowing function. $L$ indicates the length of the entire extrapolated segment, where $L > N$ is chosen freely. For simplicity we presume that

$$ Q = L - N \tag{4} $$

is an even number; it expresses the total number of extrapolated samples. From the original segment, $Q/2$ samples are therefore extrapolated on each end in the time domain.

For the sparse reconstruction process we furthermore define a binary selection matrix $M \in \mathbb{B}^{N \times L}$ by taking an $L \times L$ identity matrix $I_{L \times L}$ and removing the first $Q/2$ and the last $Q/2$ rows. This eliminates the rows for which no samples are available. The purpose of the selection matrix is to extract values from positions that contain data.
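As a small illustration of the selection matrix, the sketch below builds $M$ by slicing an identity matrix and applies it to a toy vector. The helper name `selection_matrix` and the toy sizes are assumptions chosen for the example.

```python
import numpy as np

def selection_matrix(N, L):
    """Binary N x L matrix M: the L x L identity without its first and last Q/2 rows."""
    Q = L - N
    if Q <= 0 or Q % 2:
        raise ValueError("L - N must be a positive even number")
    return np.eye(L, dtype=int)[Q // 2 : L - Q // 2, :]

N, L = 4, 8
M = selection_matrix(N, L)
x_hat = np.arange(L)        # stand-in for an extrapolated segment of length L
print(M @ x_hat)            # prints [2 3 4 5], the N centre samples
```

In practice the same effect is obtained by slicing the vector directly, so $M$ rarely needs to be formed explicitly.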

The extrapolated and regenerated solution should have values comparable to those measured at their respective positions, which can be expressed as

$$ M \hat{x}_k = x_k. \tag{5} $$

With windowing functions incorporated the requirement becomes

$$ M(w \odot \hat{x}_k) = (Mw) \odot x_k, \tag{6} $$

or equivalently

$$ \begin{aligned} M F^{*} F (w \odot \hat{x}_k) &= (Mw) \odot x_k \\ G \hat{s}_k &= \bar{w} \odot x_k, \end{aligned} \tag{7} $$

where $\bar{w} = Mw \in \mathbb{C}^{N \times 1}$ is the truncated tapering function and $G$ is the partial inverse Fourier matrix $G = M F^{*} \in \mathbb{C}^{N \times L}$, with $F^{*} \in \mathbb{C}^{L \times L}$ being the inverse Fourier matrix.
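The relation between the partial inverse Fourier matrix and the measured data can be checked numerically. The sketch below forms the matrices explicitly and also shows a matrix-free equivalent based on the inverse FFT. It assumes that the inverse Fourier matrix carries the $1/L$ normalization (so that it undoes the unnormalized forward transform), uses `np.hanning` as the window, and takes random data as a stand-in for an extrapolated segment; the name `G_apply` is hypothetical.

```python
import numpy as np

N, L = 16, 64
Q = L - N
rng = np.random.default_rng(0)

# Explicit matrices, following the definitions in the text.
n = np.arange(L)
F = np.exp(-2j * np.pi * np.outer(n, n) / L)   # L x L Fourier matrix
F_inv = F.conj().T / L                         # inverse Fourier matrix (1/L scaling assumed)
M = np.eye(L)[Q // 2 : L - Q // 2, :]          # selection matrix
G = M @ F_inv                                  # partial inverse Fourier matrix

def G_apply(s):
    """Matrix-free G s: inverse FFT, then keep the N centre samples."""
    return np.fft.ifft(s)[Q // 2 : L - Q // 2]

# Any length-L segment whose centre equals x satisfies the windowed requirement.
w = np.hanning(L)
x_hat = rng.standard_normal(L) + 1j * rng.standard_normal(L)
x = M @ x_hat                                  # the "measured" centre samples
s_hat = F @ (w * x_hat)                        # spectrum of the windowed long segment
w_bar = M @ w                                  # truncated tapering function

lhs = G @ s_hat
rhs = w_bar * x
```

Here `lhs` and `rhs` agree to numerical precision, and `G_apply(s_hat)` gives the same vector without storing an $N \times L$ matrix, which is the form most iterative solvers would use.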

As the STFT can be presumed to be reasonably sparse for each segment, the optimal sparse solution $\hat{s}_k$ must be found with respect to the frequency domain. Under convex relaxation, the extrapolating regeneration process can therefore be formulated as

$$ \hat{s}_k = \arg\min_{s_k} \| s_k \|_1 \tag{8} $$

$$ \text{s.t.} \quad \| G s_k - \bar{w} \odot x_k \|_2 \le \epsilon \tag{9} $$

where $\epsilon$ is the acceptable error and $\| \cdot \|_1$ indicates the L1 norm. The constraint (9) is a relaxed version of (7) in order to accommodate the presence of noise and other inaccuracies. Together, (8) and (9) form a standard sparse reconstruction problem where the selection of $\epsilon$ determines the nature of the solution. Generally, the tolerance level may be set relative to the average noise floor. A larger value can provide more flexibility in determining a sparse solution, though the solution may then also deviate somewhat from the measured data set. On the other hand, a tighter constraint on $\epsilon$ forces the solution to be closer to the original data, which may retain peculiar properties including, for example, noise.
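To make the optimization concrete, the sketch below solves a closely related problem with plain iterative soft thresholding (ISTA). Note the substitutions: it minimizes the penalized form $\tfrac{1}{2}\|G s - y\|_2^2 + \lambda \|s\|_1$ rather than the $\epsilon$-constrained form above, and it is not the solver used for the results in this paper. The function name `sparse_extrapolate` and the parameters `lam_rel` and `n_iter` are hypothetical, and a larger `lam_rel` plays a role loosely analogous to a larger $\epsilon$.

```python
import numpy as np

def sparse_extrapolate(y, L, lam_rel=0.01, n_iter=500):
    """ISTA for min_s 0.5*||G s - y||_2^2 + lam*||s||_1 with G = M F^{-1}.

    y is the windowed segment of length N; the result s has length L.
    lam_rel (hypothetical knob) sets lam as a fraction of max|G^H y|.
    """
    y = np.asarray(y, dtype=complex)
    N = len(y)
    h = (L - N) // 2                           # Q/2 samples added on each side

    def G(s):                                  # inverse FFT, keep centre samples
        return np.fft.ifft(s)[h : h + N]

    def GH(r):                                 # adjoint: zero-fill, FFT, scale by 1/L
        padded = np.zeros(L, dtype=complex)
        padded[h : h + N] = r
        return np.fft.fft(padded) / L

    lam = lam_rel * np.max(np.abs(GH(y)))
    step = float(L)                            # 1 / largest eigenvalue of G^H G
    s = np.zeros(L, dtype=complex)
    for _ in range(n_iter):
        z = s - step * GH(G(s) - y)            # gradient step on the data term
        mag = np.abs(z)
        shrink = np.maximum(mag - step * lam, 0.0)
        s = z * shrink / np.maximum(mag, 1e-30)  # complex soft threshold
    return s

# Usage: N = 16 windowed samples of a tone on bin 10 of the L = 64 grid.
N, L, b = 16, 64, 10
h = (L - N) // 2
t = np.arange(h, h + N)
w_bar = np.hanning(L)[h : h + N]               # truncated tapering function
y = w_bar * np.exp(2j * np.pi * b * t / L)
s_hat = sparse_extrapolate(y, L)
```

The step size follows from the operator structure: with the $1/L$-normalized inverse transform, $G G^{H} = I/L$, so a step of $L$ is the largest value for which ISTA is guaranteed not to increase the objective. In the example the dominant coefficient of `s_hat` falls on bin 10.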

Stacking together the optimal solutions from each segment provides the regenerated STFT matrix with a greater number of bins in frequency:

$$ \hat{S}(k, \omega) = [\hat{s}_1 \; ... \; \hat{s}_K] \in \mathbb{C}^{L \times K}. \tag{10} $$

We remark that partial Fourier matrices have been well studied in the literature and have been shown to provide capable outcomes in sparse reconstruction applications, and several efficient algorithms have been proposed for solving these types of problems [7], [12], [13].

B. Merged Extrapolation

For general purpose signal analysis a possible drawback with the extrapolated solution (10) is the inherent sparsity. By selecting an appropriate value for $\epsilon$, noise and other inaccuracies can be eliminated from the sparse solution, which is also important as an extrapolation of e.g. noise is typically not desired. Nevertheless, for many algorithms and detailed spectrogram analysis the more subtle fluctuations and alterations within the original noisy signal may still remain of interest. A strategy to alleviate these issues is to first determine an extrapolated solution and then utilize the obtained extrapolated samples only for extensional purposes in the time domain, where the original signal data remains preserved unaltered in the middle.

[Fig. 1: Audio signal: standard spectrogram, $N = 16$. Panel title: "Original (zero padded)"; axes: $k$ vs. Freq; color scale in dB.]

This can be accomplished by transforming $\hat{s}_k$ back to time domain

$$ (w \odot \hat{x}(t)_k) = F^{*} \hat{s}_k \tag{11} $$

which is then merged with the original data, incorporating the tapering function:

$$ (w \odot \tilde{x}(t)_k) = \begin{cases} w \odot x(t - Q/2)_k, & Q/2 < t \le L - Q/2 \\ w \odot \hat{x}(t)_k, & \text{otherwise} \end{cases} \tag{12} $$

where $t$ now runs through $t = 1, ..., L$. The time domain solution accordingly contains the original signal, windowed correspondingly, in the center. A Fourier transform

$$ \tilde{s}_k = F (w \odot \tilde{x}(t)_k) \tag{13} $$

across all segments yields the final merged spectrogram:

$$ \tilde{S}(k, \omega) = [\tilde{s}_1 \; ... \; \tilde{s}_K] \in \mathbb{C}^{L \times K}. \tag{14} $$

The merged spectrogram permits usage of standard filtering, detection and classification algorithms, which may otherwise require modifications for sparse spectrograms.
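The merging step for a single segment can be sketched as follows. The function name `merge_segment` is hypothetical, `np.hanning` is an assumed choice of window, and random values stand in for the output of a sparse solver so that the example stays self-contained.

```python
import numpy as np

def merge_segment(s_hat, x, w):
    """Merged spectrum: windowed original samples in the centre, extrapolated tails outside."""
    L, N = len(s_hat), len(x)
    h = (L - N) // 2                         # Q/2
    merged = np.fft.ifft(s_hat)              # windowed extrapolated segment in time domain
    merged[h : h + N] = w[h : h + N] * x     # overwrite the centre with the windowed original
    return np.fft.fft(merged)                # merged spectrum of length L

# Usage with placeholder data (random values stand in for a solver output).
rng = np.random.default_rng(1)
N, L = 16, 64
h = (L - N) // 2
w = np.hanning(L)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
s_hat = rng.standard_normal(L) + 1j * rng.standard_normal(L)
s_tilde = merge_segment(s_hat, x, w)
```

Transforming `s_tilde` back to the time domain returns exactly the windowed original samples in the centre and the extrapolated values on both sides, which is the property the merged spectrogram relies on.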

III. RESULTS AND DISCUSSION

A. Audio test signal

To demonstrate the principles of the proposed sparse extrapolation approach, a clean audio recording of a male voice sampled at 8 kHz was taken from the freely available NOIZEUS database [14]. The samples were first run through the standard STFT with a window length of 16 samples and a hop size of only 1 sample between segments. This provides a versatile and large trial set of over 22500 STFT time bins. Each segment was tapered with the Hanning window.

The standard spectrogram of the original signal can be seen in figure 1, where each segment was also zero-padded by 48 samples to bring the number of bins in frequency to 64. The limitations of a small window size are nevertheless very noticeable, as the resolution in frequency is not sufficient to clearly separate the various components of the audio signal without a lot of smearing.
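A rough back-of-the-envelope calculation illustrates why zero-padding alone does not remove the smearing. It assumes the common rule of thumb that the main lobe of a Hann window spans about four DFT bins of the unpadded segment length; the symbols $\Delta f_{\text{grid}}$ and $B_{\text{Hann}}$ are introduced here only for this illustration.

```latex
\Delta f_{\text{grid}} = \frac{f_s}{64} = \frac{8000\ \text{Hz}}{64} = 125\ \text{Hz},
\qquad
B_{\text{Hann}}(16) \approx \frac{4 f_s}{16} = \frac{4 \cdot 8000\ \text{Hz}}{16} = 2000\ \text{Hz},
\qquad
B_{\text{Hann}}(64) \approx \frac{4 f_s}{64} = \frac{4 \cdot 8000\ \text{Hz}}{64} = 500\ \text{Hz}.
```

Zero-padding refines the frequency grid to 125 Hz, but each component still occupies a main lobe of roughly 2000 Hz because only 16 samples carry information. A segment with 64 informative samples would narrow that lobe to roughly 500 Hz.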

The spectrogram obtained through the sparse solutions, as per the procedure described in the previous section, with an extrapolation of 24 samples on each side is depicted in figure 2. The sparse reconstruction process was carried out using the SPGL1 [13] algorithm with $\epsilon = 0.05 \| \bar{w} \odot x_k \|$ to sanction a five percent norm deviation from the original extracted signal for each segment. All other parameters, including the length of the sliding window, are kept identical to those of the standard spectrogram. As one can observe in the figure, the major features of the voice sample stand out and are now much more clearly located at specific frequency bands. The solution is also sufficiently sparse for association and audio analysis purposes. Note that the extrapolation process contributes additional integration gain and the power levels are given relative to the standard spectrogram. The merged solution combining real and extrapolated data is given in figure 3; it is no longer sparse but, owing to the extra samples, offers a significant improvement over the original spectrogram in terms of frequency resolution. The benefit of having augmented extrapolated samples is substantial, at more than 10 dB.

[Fig. 2: Sparse extrapolated spectrogram, $N = 16$, $L = 64$. Panel title: "Sparse extrapolation"; axes: $k$ vs. Freq; color scale in dB.]

[Fig. 3: Extrapolated merged spectrogram, $N = 16$, $L = 64$. Panel title: "Merged extrapolated"; axes: $k$ vs. Freq; color scale in dB.]

Audio signals are commonly analyzed with various window lengths. The second case therefore inspects a window length of 64 samples, still with a hop of 1 sample. 64 samples were extrapolated on each side of each segment by the sparse reconstruction process. The original plot, zero-padded to a length of 192 samples, can be seen in figure 4. This resembles the previously extrapolated figure 3, where many of the same frequencies stand out. The sparse extrapolation process (figure 5) correctly splits the various bands and a full discrimination is now possible. This remains the case even for the merged solution of figure 6. Subjectively, the sparse and the merged samples sound very similar to the original recording.

[Fig. 5: Sparse extrapolated spectrogram, $N = 64$, $L = 192$. Axes: $k$ vs. Freq; color scale in dB.]

[Fig. 6: Extrapolated merged spectrogram, $N = 64$, $L = 192$. Panel title: "Merged extrapolated"; axes: $k$ vs. Freq; color scale in dB.]

[Fig. 7: ARMA extrapolated spectrogram, $N = 64$, $L = 192$. Panel title: "ARMA extrapolated"; axes: $k$ vs. Freq; color scale in dB.]

To compare the outcome against more traditional methods, each segment of the signal was extrapolated, on both ends, through two independent ARMA(10,40) processes using Prony's method [1]. A total of 64 additional samples were generated on each side, and the final spectrogram can be seen in figure 7. It can be observed that the model has not been able to divide the major frequency bands as successfully and there is marked leakage. For comparison, the standard spectrogram with 192 sliding window samples can be seen in figure 8.
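For readers unfamiliar with model-based extrapolation, the sketch below shows the general idea with a deliberately simpler stand-in: a pure autoregressive predictor fitted by least squares. It is not Prony's method and not the ARMA(10,40) model used for figure 7; the function name `ar_extrapolate` and its parameters are hypothetical, and the code handles real-valued input only.

```python
import numpy as np

def ar_extrapolate(x, order, n_extra):
    """Extend x by n_extra samples with a least-squares AR(order) predictor."""
    x = np.asarray(x, dtype=float)
    # Each row holds the `order` samples preceding a target, most recent first.
    A = np.array([x[i - order : i][::-1] for i in range(order, len(x))])
    a, *_ = np.linalg.lstsq(A, x[order:], rcond=None)
    out = list(x)
    for _ in range(n_extra):
        past = out[-order:][::-1]            # latest samples, most recent first
        out.append(float(np.dot(a, past)))   # one-step prediction
    return np.array(out)

# Usage: a noise-free sinusoid obeys an exact AR(2) recursion.
n = np.arange(64)
x = np.cos(0.3 * n)
y = ar_extrapolate(x, order=2, n_extra=16)
```

Extrapolation towards earlier times can be obtained by applying the same function to the reversed segment and reversing the result. On noisy data such recursive predictors tend to accumulate error quickly.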

[Figure panel title: "Original"; axes: $k$ vs. Freq; color scale in dB.]


B. Noisy phonocardiogram signal

In order to investigate the performance under more demanding circumstances, a highly noisy simulation of a fetal phonocardiogram (PCG) recording sampled at 1 kHz with an SNR of -15.1 dB was used [15]. The main objective is to demonstrate that sparse reconstruction can also be highly useful in challenging noisy conditions where traditional extrapolation breaks down rapidly.

The original spectrogram, computed with a sliding Hanning window of 24 samples and a hop of 8 samples, is depicted in figure 9. The more abnormal properties as well as the low frequency heartbeats appear in the spectrogram, though the latter are more easily observable in the magnified portion of the plot on the right side. The noise is otherwise quite dominant and the sparse reconstruction process must take that into account, as a highly sparse solution on its own may not be able to capture all activity. A possible choice is therefore $\epsilon = 0.5 \| \bar{w} \odot x_k \|$, which allows for up to a fifty percent norm disparity from each of the original segmented signals. The result of sparse extrapolation with this selection can be seen in figure 10 and the merged solution in figure 11. The outcome is not sparse as some noise is retained, except at positions enclosing high frequency anomalies. Otherwise, the main heartbeat frequencies have clearly been enhanced and narrowed down in frequency. This is further evident in the hybrid spectrogram, where the frequency spread is much more confined.

To realize sparser images the acceptable error can be raised, although this may come at the expense of somewhat reduced sensitivity to the subordinate frequencies due to the very low SNR. Figures 12 and 13 illustrate this in practice, where $\epsilon = 0.95 \| \bar{w} \odot x_k \|$ is applied. The main features of the heartbeats and the abnormalities are nevertheless preserved and remain easily distinguishable, even if the intensity levels are reduced.

Overall, the proposed sparse extrapolation and merger techniques have successfully managed to generate spectrograms with high frequency resolution while using a sliding window of the same length.

IV. CONCLUSION

The short-time Fourier transform is an important tool in many signal processing applications and improving the time and frequency localization is of great interest. In this paper a technique based on sparse extrapolation of signals was proposed for this objective. Each segment of the signal is extrapolated in time in order to retrieve a sparse representation in frequency. This allows for a simple yet effective strategy to attain super-resolution sparse spectrograms. The extrapolated data may further be merged with the original data to obtain non-sparse high-resolution spectrograms. The practicability of this was demonstrated through several examples.

[Fig. 9: PCG: standard spectrogram, $N = 24$. Panels: (a) Full, (b) Magnified; axes: $k$ vs. Freq; color scale in dB.]

[Fig. 10: Sparse extrapolated, $N = 24$, $\epsilon = 0.5 \| \bar{w} \odot x_k \|$. Panels: (a) Full, (b) Magnified; axes: $k$ vs. Freq; color scale in dB.]

[Fig. 11: Extrapolated merged, $N = 24$, $\epsilon = 0.5 \| \bar{w} \odot x_k \|$. Panel title: "Merged extrapolated"; panels: (a) Full, (b) Magnified; axes: $k$ vs. Freq; color scale in dB.]

[Fig. 12: Sparse extrapolated, $N = 24$, $\epsilon = 0.95 \| \bar{w} \odot x_k \|$. Panels: (a) Full, (b) Magnified; axes: $k$ vs. Freq; color scale in dB.]

[Fig. 13: Extrapolated merged, $N = 24$, $\epsilon = 0.95 \| \bar{w} \odot x_k \|$. Panel title: "Merged extrapolated"; panels: (a) Full, (b) Magnified; axes: $k$ vs. Freq; color scale in dB.]

REFERENCES

[1] M. H. Hayes, Statistical Digital Signal Processing and Modeling. New York: Wiley, 1996.

[2] G. Thomas and S. D. Cabrera, “Resolution enhancement in time-frequency distributions based on adaptive time extrapolations,” in Proc. of International Symposium on Time-Frequency and Time-Scale Analysis, 1994, pp. 104–107.

[3] J. Nam, G. Mysore, J. Ganseman, K. Lee, and J. S. Abel, “A super-resolution spectrogram using coupled PLCA,” in Proc. of the 11th Conference of the International Speech Communication Association, 2010.

[4] R. Maleh and F. A. Boyle, “Exploiting spectral leakage for spectrogram frequency super-resolution,” in Proc. of Asilomar Conference on Signals, Systems and Computers, 2013.

[5] M. I. Mandel and Y. S. Cho, “Audio super-resolution using concatenative resynthesis,” in Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015, pp. 1–5.

[6] G. Schamberg, D. Ba, M. Wagner, and T. Coleman, “Efficient low-rank spectrotemporal decomposition using ADMM,” in IEEE Statistical Signal Processing Workshop, 2016.

[7] E. Candès, J. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, vol. 59, pp. 1207–1223, 2006.

[8] E. J. Candès and C. Fernandez-Granda, “Towards a mathematical theory of super-resolution,” Communications on Pure and Applied Mathematics, vol. 67, no. 6, pp. 906–956, 2014.

[9] X. Dong and Z. Zhu, “Digital extrapolation spectral analysis based on ARMA model,” in Proc. of the National Aerospace and Electronics Conference, 1992.

[10] L. Anitori, W. van Rossum, and A. Huizing, “Array aperture extrapolation using sparse reconstruction,” in IEEE Radar Conference, 2015, pp. 237–242.

[11] J. Akhtar and K. E. Olsen, “Formation of range-Doppler maps based on sparse reconstruction,” IEEE Sensors Journal, vol. 16, no. 15, pp. 5921–5926, Aug. 2016.

[12] N. Y. Yu and Y. Li, “Deterministic construction of Fourier-based compressed sensing matrices using an almost difference set,” EURASIP Journal on Advances in Signal Processing, pp. 805–821, Oct. 2013.

[13] E. van den Berg and M. P. Friedlander, “Probing the Pareto frontier for basis pursuit solutions,” SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, 2008.

[14] Y. Hu and P. Loizou, “Subjective evaluation and comparison of speech enhancement algorithms,” in Speech Communication, 2007, pp. 588–601.

[15] M. Cesarelli, M. Ruffo, M. Romano, and P. Bifulco, “Simulation of foetal phonocardiographic recordings for testing of FHR extraction algorithms,” Computer Methods and Programs in Biomedicine, vol. 107, no. 3, pp. 513–523, Sept. 2012.

