
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 12, DECEMBER 2002 2915

Spectrogram Segmentation by Means of Statistical Features for Non-Stationary Signal Interpretation

Cyril Hory, Nadine Martin, and Alain Chehikian

Abstract—Time–frequency representations (TFRs) are suitable tools for nonstationary signal analysis, but their reading is not straightforward for a signal interpretation task. This paper investigates the use of TFR statistical properties for classification or recognition purposes, focusing on a particular TFR: the Spectrogram. From the properties of a stationary process periodogram, we derive the properties of a nonstationary process spectrogram. This leads to transforming the TFR into a local statistical features space from which we propose a method of segmentation. We illustrate our matter with first- and second-order statistics and identify the information they, respectively, provide. The segmentation is operated by a region growing algorithm, which does not require any prior knowledge on the nonstationary signal. The result is an automatic extraction of informative subsets from the TFR, which is relevant for the signal understanding. Examples are presented concerning synthetic and real signals.

Index Terms—χ² distribution law, maximum likelihood, region growing technique, statistical pattern recognition, time–frequency analysis.

I. INTRODUCTION

THIS paper investigates a new method for the interpretation of nonstationary processes. This issue concerns the problem of defining an automatic process to support a decision from the analyzed signal. It is, for instance, the case of fault detection in industrial control but also in many domains of application. The relevant information to be extracted from a nonstationary signal is included in the time evolution of its spectral content. Techniques based on time or frequency representations of the signal are not appropriate to provide such information. Several analysis methods called time–frequency representations (TFRs) have been proposed to represent a signal in a hybrid space [7], [15]. A TFR displays the energy content of a signal along both time and frequency dimensions. The components of the analyzed signal are described in this space by structures called spectral patterns.

In the literature, many approaches have been proposed to design automatic interpretation techniques involving TFRs. Two main classes can be drawn according to the position of the time–frequency (TF) tool in the interpretation method. In the first class, TFRs are fitted toward the objectives of the method. This is the case, for example, of methods based on reassigned TFRs [4], which provide an increased readability, or adaptive

Manuscript received August 2, 2001; revised July 8, 2002. The associate editor coordinating the review of this paper and approving it for publication was Dr. Chong-Yung Chi.

C. Hory and N. Martin are with the Laboratoire des Images et des Signaux (LIS), UMR 5083, CNRS-INP Grenoble, Grenoble, France (e-mail: [email protected]).

A. Chehikian is with the Université Joseph Fourier, Grenoble, France.
Digital Object Identifier 10.1109/TSP.2002.805489

kernels of Cohen distributions [2], [5]. The method we propose in this paper lies within the second class of methods, where the T-F interpretation is considered as a post-processing. In that case, the interpretation task does not have any influence on the performance of the T-F analysis, such as resolution or variance. It is relevant if designed from the inner properties of the TFR. We already proposed in [17] a processing of a TFR based on mathematical morphology tools. This method was efficient for straight spectral pattern segmentation but could not succeed in detecting slowly varying structures because of the use of the gradient function, as shown in [14].

We present here a new method that is adapted to narrowband as well as wideband components. We propose to exploit the TFR statistical properties to extract and characterize spectral patterns with no a priori knowledge on the analyzed signal. The methods mentioned above consider the T-F plane in a global manner. We consider here the nonstationarity described by the spectral patterns in a local approach. The basic principle is to extract local statistical features from the TFR points. Features are selected such that spectral patterns aggregate in the so-called features space. This way, spectral patterns are not characterized by an iso-energy level but by a local T–F coherency. We will see that, in the particular case of the TFR, the variance of the patterns increases with their energy level. Therefore, our approach concerns T–F structures that could not be extracted by a constant thresholding. Furthermore, the proposed segmentation is blind toward the analyzed signal. This is of importance in industrial applications.

The TFR chosen is the spectrogram because its statistical properties have been derived for stationary processes [11], [13]. We extend this study to nonstationary processes and propose a general method of interpretation based on the derived TFR statistical model. Then, we propose a first set of features and study their statistical properties. This provides a description of the features space and allows one to build an appropriate region growing method of segmentation combined with data analysis criteria. Examples on synthetic and natural signals are presented to illustrate the region growing algorithm and validate the efficiency of the method.

II. METHOD

Each location of a TFR is characterized by an energy level called a T–F coefficient. Connected points form regions that define the spectral patterns. Segmenting a TFR consists of deciding whether a coefficient belongs to some deterministic component region or to a noise (or background) region. To perform such a decision, one needs more than the energy level, and this is for two reasons. First, the Heisenberg–Gabor inequality ensures that the energy content of the signal at an instant and frequency lays on a neighborhood of this point in the TFR. Thus, the TFR coefficient cannot fully describe the signal content at this instant and frequency. Second, spectrogram coefficients are randomly corrupted by embedding noise power.

Fig. 1. Overview of the method.

We thus propose to associate other features than the energy level to each location. In order to take into consideration the uncertainty principle, we define features involving sets of spectrogram coefficients. These features are chosen as statistics of the spectrogram coefficients. We associate to each point a cell of neighbor coefficients. Its size must be small with regard to the total number of points in order to ensure a local description of the TFR. A set of features is then computed over each cell. Segmenting the TFR requires a label to be associated with each spectrogram location, namely, noise or signal plus noise. As this segmenting procedure is made difficult by both the Heisenberg–Gabor inequality and the noise corrupting the energy level, we rather propose to transform the spectrogram energy level. In this space, which is referred to as the features space, each T–F location is positioned with respect to the statistical features extracted from the neighboring cell, as displayed in Fig. 1. Clusters are formed in the features space by cells of similar statistical properties. We perform a region growing algorithm that operates a segmentation by associating a common label to connected points in the TFR having the same properties in the features space. The region growing technique is free of tuning parameters. It is known to be stable with respect to noise [1], [3]. Thus, it can be applied to a huge range of signals without specifications. It provides a characterization of the spectral patterns with no a priori knowledge about their situation and orientation in the T-F space but based on their magnitude variations. No adjustment in time is necessary, contrary to most of the existing processings, which require an estimation of the pattern initial time. We choose the features such that they provide descriptive parameters of the T-F structures.
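As a concrete (and purely illustrative) reading of this paragraph, the short Python sketch below extracts the cell of spectrogram coefficients attached to one T-F location. The array layout, the cell half-sizes, and the function name local_cell are our own choices, not the authors'; Section V-A later ties the cell size to the correlation support of the analysis window.

```python
import numpy as np

def local_cell(S, n, k, half_time=3, half_freq=1):
    """Return the cell of spectrogram coefficients around the point (n, k).

    S is a (frequencies x times) array of spectrogram coefficients; the cell
    is a small rectangular neighborhood, clipped at the TFR borders.  The
    half sizes used here are illustrative defaults only.
    """
    k0, k1 = max(k - half_freq, 0), min(k + half_freq + 1, S.shape[0])
    n0, n1 = max(n - half_time, 0), min(n + half_time + 1, S.shape[1])
    return S[k0:k1, n0:n1].ravel()

# Example: a random "spectrogram" stands in for a real one (exponential values
# mimic noise-only coefficients, see Section III).
rng = np.random.default_rng(0)
S = rng.exponential(scale=10.0, size=(64, 128))
cell = local_cell(S, n=40, k=10)
print(cell.size, cell.mean(), cell.std())
```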

These features, as combinations of random variables, are random variables. In the features space, they aggregate as clusters whose location and dispersion are, respectively, measured by their expected value and variance. We propose a theoretical study of these statistical properties, which allows one to foresee the features space position of the spectral patterns.

III. TOWARDS A STATISTICAL INTERPRETATION

We derive in this section the probability density function (PDF) of the spectrogram coefficients of a deterministic sequence embedded in white Gaussian noise. We then propose a local statistical model of the spectrogram.

A. Spectrogram Statistical Properties

Let us consider the signal x(m), which is the sum of a deterministic discrete sequence of samples s(m) and of a white Gaussian process b(m) of zero mean and variance σ²:

x(m) = s(m) + b(m),  with b(m) ~ N(0, σ²).   (1)

The discrete spectrogram S_x(n, k) at time n and frequency k of x(m) is the periodogram of this signal weighted by a window h(m) of N_h samples

S_x(n, k) = (1/N_h) |Σ_m x(m) h(m − n) e^(−j2πkm/N_h)|².   (2)

When h(m) is a boxcar window, which is equal to one over its support and zero elsewhere, the spectrogram coefficients of the white Gaussian noise are known to be central χ² with two degrees of freedom and proportionality parameter σ²/2 [11], [13]

S_b(n, k) ~ (σ²/2) χ²₂,  if k ≠ 0 and k ≠ N_h/2   (3)

and asymptotically if N_h tends toward infinity [13]. It is well known [10] that the χ² PDF is a gamma distribution

p(x) = x^(ν/2 − 1) e^(−x/(2γ)) / (Γ(ν/2) (2γ)^(ν/2)),  x ≥ 0   (4)

where γ is the proportionality parameter, and ν is the number of degrees of freedom.

The sequence x(m) of (1) is a set of independent Gaussian variables of nonzero means s(m) and variance σ². We extend the proof proposed in [13] by considering nonzero means and conclude that S_x(n, k) is a noncentral χ² with two degrees of freedom, noncentral parameter λ(n, k), and proportionality parameter σ²/2

S_x(n, k) ~ (σ²/2) χ²₂(λ(n, k)),  if k ≠ 0 and k ≠ N_h/2.   (5)

See Appendix A for the expression of the noncentral χ² distribution. The central χ² of (3) is a special case of the noncentral χ² with a null noncentral parameter.
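These distributional statements can be checked numerically. The sketch below is our own illustration under explicit assumptions: a boxcar window, a periodogram normalized by the window length so that noise-only coefficients have mean σ², and DC/Nyquist bins excluded; under these assumptions a central χ²₂ scaled by σ²/2 is simply an exponential of mean σ².

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sigma2 = 10.0                      # noise variance
Nh = 256                           # assumed window length
n_frames = 2000

# Periodogram coefficients of white Gaussian noise, one frame per row.
x = rng.normal(0.0, np.sqrt(sigma2), size=(n_frames, Nh))
P = np.abs(np.fft.rfft(x, axis=1)) ** 2 / Nh

# Keep bins away from DC and Nyquist, where the chi2_2 model does not hold.
coeffs = P[:, 1:-1].ravel()

# A central chi2 with 2 degrees of freedom and proportionality sigma2/2
# is an exponential of mean sigma2.
print("empirical mean:", coeffs.mean(), " theoretical:", sigma2)
ks = stats.kstest(coeffs, stats.expon(scale=sigma2).cdf)
print("KS statistic against exponential(sigma2):", ks.statistic)
```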


Fig. 2. Cell description. White squares are central χ² spectrogram coefficients, and black squares are noncentral χ² spectrogram coefficients.

Note that a small number of spectrogram coefficients can have different distributions due to the use of any time window [11]. We do not take them into account in the distribution modeling. The behavior of random variables is fully characterized by their moments. We propose in Appendix A a general expression of the moments about zero of a noncentral χ² distribution. In particular, the expected value and variance of S_x(n, k) are derived from (31):

(6)

Var (7)

They increase linearly with the noncentral parameter λ(n, k). This parameter describes the content of the deterministic signal alone at instant n and frequency k. If the analyzed signal is a nonstationary process, then the nonstationarity to be analyzed is contained in the above moments through the noncentral parameter λ(n, k). The set of noncentral parameters is thus a signature of the nonstationarity.

B. Local Statistical Model

Each cell is a set of N TFR coefficients having χ² PDFs with noncentral parameters λᵢ and variance σ². Thus, the parent variable associated with the cell has a PDF defined by many unknown parameters. As far as we want to define features as statistics of this parent variable, a model of the cell must be defined to reduce the number of unknown parameters.

Consider a cell,¹ in which a fraction p of the N points contains energy of the deterministic signal. Each one of the associated coefficients is a noncentral χ² with its own noncentral parameter λᵢ. As the size of the cell is small with regard to the size of the TFR (which guarantees the local approach), the energy contribution of the deterministic signal can be considered to be varying slowly over the cell coefficients. The noncentral parameters can then be approximated by a common parameter λ, which is the mean of the λᵢ over the cell

(8)

The other coefficients are central χ². Therefore, each coefficient of the cell is a sample of a noncentral χ² random variable with probability p and a sample of a central χ² random variable with probability 1 − p (see Fig. 2). Thanks to the total probability formula, each coefficient of the cell can be considered to be a sample of the parent variable X, whose PDF is a mixture of PDFs:

p_X(x) = p · p_nc(x) + (1 − p) · p_c(x)   (9)

¹Indexes (n, k) are now omitted when dealing with a single cell without confusion.

where p_nc(x) and p_c(x) are the PDFs of the noncentral and central χ² random variables, respectively. Equation (9) is the statistical model we propose to apply to the cell. Under the assumption of (8), the PDF of the parent variable depends on only three unknown parameters: p, λ, and σ². The linearity property of the Fourier transform leads to write the first characteristic function of the parent variable as

(10)

The moments of the parent variable are derived from (10):

(11)

Expressions of the moments of the noncentral and central χ² distributions are derived from (31) of Appendix A,


with the corresponding parameter values for the noncentral and the central cases. We thus have

(12)

where r is the local signal-to-noise ratio over the cell. Note that this interpretation is common in signal theory, where the noncentral parameter is assimilated to a signal-to-noise ratio (SNR).
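To make the role of p and r tangible, the toy sketch below draws cells from the two-component mixture just described. The central coefficients are drawn as exponentials of mean σ² (i.e., σ²/2 times a χ²₂), the signal-bearing ones as scaled noncentral χ²₂, and the link between the local SNR r and the noncentrality (nc = 2r) is our own assumption for illustration only.

```python
import numpy as np
from scipy import stats

def draw_cell(N, p, r, sigma2, rng):
    """Draw N cell coefficients from the two-component mixture model.

    A fraction p of the coefficients carries deterministic energy and is
    drawn as a scaled noncentral chi2 with 2 degrees of freedom; the rest
    are central chi2_2 (exponential).  The mapping nc = 2 * r between the
    local SNR and the noncentrality is an assumption for illustration.
    """
    n_signal = rng.binomial(N, p)
    noise = rng.exponential(scale=sigma2, size=N - n_signal)
    signal = (sigma2 / 2.0) * stats.ncx2.rvs(df=2, nc=2.0 * r,
                                             size=n_signal, random_state=rng)
    return np.concatenate([noise, signal])

rng = np.random.default_rng(2)
for p, r in [(0.0, 0.0), (0.3, 2.0), (0.8, 5.0)]:
    cells = [draw_cell(N=15, p=p, r=r, sigma2=10.0, rng=rng) for _ in range(2000)]
    F1 = np.array([c.mean() for c in cells])
    F2 = np.array([c.std() for c in cells])
    print(f"p={p:.1f} r={r:.1f}  mean(F1)={F1.mean():6.2f}  mean(F2)={F2.mean():6.2f}")
```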

Finally, each cell defined over the T-F space is described by its parent variable. The spectrogram coefficients of the cell are samples of this random variable. Its moments, which are expressed by (12), depend on two parameters:

• p: the ratio of spectrogram coefficients which bear the deterministic component energy;
• r: the SNR over the cell.

The simultaneous variations of these parameters over the whole T-F space are related to the variations of the deterministic component along the time and frequency dimensions. They characterize the magnitude variations of the signal components. We want to extract features whose statistical behavior, which is related to the variations of p and r, provides an obvious discrimination of points belonging to spectral patterns of different magnitude variations. In order to limit the noise effects on the TFR readability, we select as a first feature the expected value of the parent variable

F₁ = E{X}.   (13)

This feature is relevant for characterizing T-F regions of low-energy density variations. We propose to combine this processing with the extraction of a second feature: the standard deviation of the cell

F₂ = √(Var{X}).   (14)

This feature characterizes high-energy density variations over the T-F space.
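A direct way to picture the transformation from the T-F plane to the features space is sketched below; it is our illustration, with a chirp-plus-noise test signal, a 3 × 5 cell (the size used in the simulations of Section IV), and scipy's uniform_filter as a convenient sliding-window implementation of the local mean and standard deviation.

```python
import numpy as np
from scipy import signal
from scipy.ndimage import uniform_filter

# A toy signal: a linear chirp embedded in white Gaussian noise.
fs = 1024.0
t = np.arange(0, 8.0, 1.0 / fs)
rng = np.random.default_rng(3)
x = signal.chirp(t, f0=50.0, f1=300.0, t1=t[-1]) + rng.normal(0.0, 1.0, t.size)

# Spectrogram (squared magnitude of the short-time Fourier transform).
f, tt, S = signal.spectrogram(x, fs=fs, window="hann", nperseg=256,
                              noverlap=128, mode="psd")

# Local features over a cell of 3 x 5 points: F1 = local mean,
# F2 = local standard deviation, from local first/second moments.
cell = (3, 5)
F1 = uniform_filter(S, size=cell, mode="nearest")
F2 = np.sqrt(np.maximum(uniform_filter(S ** 2, size=cell, mode="nearest") - F1 ** 2, 0.0))

# Each T-F location is now a point (F1, F2) in the features space.
print(F1.shape, F2.shape)
```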

IV. STATISTICAL PROPERTIES OF THE FEATURES

In the previous section, features are defined as the expected value and the standard deviation of the parent variables associated with each local cell in the TFR. In this section, we propose estimators of these features. We also give expressions of their first two moments. These statistical properties are necessary for describing the clusters that would be obtained in the Features Space. Moreover, we discuss the influence of the time window on the TFR coefficient distribution, given that the theory derived in the previous section assumes independence of these coefficients.

A. Local Mean

Assuming ergodicity, the feature of (13) is estimated by extracting the empirical mean of the cell

F₁ = (1/N) Σ_{i=1..N} Sᵢ   (15)

where S₁, …, S_N denote the N spectrogram coefficients of the cell.

As we will show in Appendix B, it is an unbiased estimator of E{X}. When the cell contains energy of the deterministic component, the distribution of F₁ is the N-times convolution of distribution (9) with a nonnull noncentral parameter, if spectrogram coefficients are independent. We do not provide the analytical expression of this distribution, but in Appendix B, we derive expressions of the first and second moments of F₁, which are, respectively, order one and two polynomials in p and r. Considering the case of cells containing only noise spectrogram coefficients (p = 0 and r = 0), (35) and (36) of Appendix B take the form

E{F₁} = σ²   (16)

Var{F₁} = σ⁴/N   (17)

as we have already shown in [8]. In this noise-only case, F₁ is a χ² random variable with 2N degrees of freedom and proportionality parameter σ²/(2N), as is proven with a matrix formulation in [11].

B. Coefficients Correlation

In many situations, the T-F space presents a redundancy of information that signifies correlation between T-F coefficients. Let us consider the spectrogram of a white Gaussian process. Its coefficients along the time axis are correlated if the time windows overlap [19], [20]. They are asymptotically uncorrelated along the frequency axis [13], but the use of a weighting window also introduces correlation along the frequency axis [11]. We discuss the deviation from the theoretical PDF of feature F₁ based on a simulation study.

Four white Gaussian process spectrograms are generated with 50% window length overlapping or without overlapping, with a boxcar or a Hanning window. Each spectrogram contains a large number of coefficients. The correlation of the spectrogram coefficients decreases as the time window length increases. The correlation due to the window length is thus assumed to be negligible by using 1024-point windows. Fig. 3 presents, for each spectrogram, the theoretical χ² PDF compared with the histograms of F₁. Fig. 3(a) concerns the spectrogram whose coefficients are uncorrelated. In Fig. 3(b), the correlation is due to the use of a Hanning window and, in Fig. 3(c), to the overlapping. In Fig. 3(d), we combine both sources of correlation. The introduction of correlation induces an increase of the dispersion of the data histogram. Johnson et al. [11] show that the consequence of the use of a Hanning window is a convolution of several χ² PDFs with various proportionality coefficients. This produces a smoothing of the PDF. Time window overlapping has the same incidence on the PDF shape. The histogram smoothing in Fig. 3(d), due to both overlapping and the use of the Hanning window, is not more important than in the other cases because the Hanning window reduces the correlation along the time axis [19], [20].
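The simulation setup can be reproduced along the following lines. This is a sketch under our own choices (signal length, noise variance, scipy.signal.spectrogram as the STFT engine, whose scaling differs from the raw periodogram by a constant factor, and non-overlapping 3 × 5 blocks as a crude stand-in for the cells); the point is only to compare the dispersion of the local mean F₁ across the four configurations.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(4)
sigma2 = 10.0
x = rng.normal(0.0, np.sqrt(sigma2), size=2 ** 18)   # white Gaussian process
nperseg = 1024                                        # 1024-point windows

configs = {
    "boxcar, no overlap":  dict(window="boxcar", noverlap=0),
    "hann,   no overlap":  dict(window="hann",   noverlap=0),
    "boxcar, 50% overlap": dict(window="boxcar", noverlap=nperseg // 2),
    "hann,   50% overlap": dict(window="hann",   noverlap=nperseg // 2),
}

for name, cfg in configs.items():
    _, _, S = signal.spectrogram(x, fs=1.0, nperseg=nperseg,
                                 scaling="spectrum", **cfg)
    # Group the coefficients into 3 (frequency) x 5 (time) blocks and take the
    # block mean as F1, dropping the DC row first.
    nf = ((S.shape[0] - 2) // 3) * 3
    nt = (S.shape[1] // 5) * 5
    blocks = S[1:1 + nf, :nt].reshape(nf // 3, 3, nt // 5, 5)
    F1 = blocks.mean(axis=(1, 3)).ravel()
    print(f"{name}: mean(F1) = {F1.mean():.4f}, var(F1) = {F1.var():.6f}")
```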

Fig. 3. Comparison between histograms of F₁ (+) and the theoretical χ² PDF (dashed lines) for a white Gaussian noise N(0, 10), computed with cells of 3 × 5 points. Each spectrogram is computed with a 1024-point-long time window and without zero padding. The PDF estimated by maximum likelihood is shown in plain lines.

We will see in Section V that our segmentation procedure is controlled by the PDF of the feature F₁ extracted from cells containing only noise energy. The theoretical PDF derived in Section III is not valid in the presence of correlation, which is mostly the case. Therefore, the effect of correlation on the true PDF must be considered. Let us suppose that features F₁ are χ² distributed with unspecified degrees of freedom and proportionality parameter. We show in [9] that the maximum likelihood estimators ν̂ and γ̂ of this χ² distribution are accurately approximated by

(18)

(19)

where the required sample statistics are computed from the spectrogram coefficients. These statistics are sufficient for the number of degrees of freedom and the noncentral parameter. The white Gaussian process variance σ² is then efficiently estimated with a low computation cost. Fig. 3 also shows the central χ² distributions estimated by (18) and (19). Table I gives the means of ν̂ and γ̂ over 100 realizations for each spectrogram configuration and shows that correlation induces a decrease of the number of degrees of freedom.

One can conclude that, whatever the configuration of the spectrogram computation, the noise power can accurately be estimated by considering the measure F₁ taken over a noise cell as a χ² variable with estimated degrees of freedom and proportionality parameter. In the presence of deterministic components, the histogram of F₁ is a mixture of central and noncentral χ² PDFs. A binary classification scheme based on the expectation-maximization algorithm, for instance, might be used to identify data of the mixture. This would lead to a TFR segmentation without characterizing variations of the spectral pattern parameters. We, on the other hand, propose to add a second characterizing feature F₂ to the processing.
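Since the closed-form approximations (18) and (19) are not reproduced above, the sketch below falls back on a generic maximum-likelihood fit of a gamma density, which is equivalent to a χ² law with free degrees of freedom and proportionality parameter, to noise-only F₁ values. scipy.stats.gamma.fit is used here as a stand-in for the paper's low-cost estimators, and the mapping ν̂ = 2a, γ̂ = scale/2, σ̂² = ν̂γ̂ follows our parameterization.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
sigma2, N = 10.0, 15                       # noise variance, cell size (3 x 5)

# Noise-only cells: coefficients ~ exponential(sigma2), F1 = cell mean.
cells = rng.exponential(scale=sigma2, size=(5000, N))
F1 = cells.mean(axis=1)

# ML fit of a gamma density with the location fixed at zero.  A gamma with
# shape a and scale s is a chi2 with nu = 2a degrees of freedom and
# proportionality gamma = s / 2 (our parameterization).
a, loc, s = stats.gamma.fit(F1, floc=0.0)
nu_hat = 2.0 * a
gamma_hat = s / 2.0
sigma2_hat = nu_hat * gamma_hat            # = a * s, estimated noise power

print(f"nu_hat = {nu_hat:.1f} (theory 2N = {2 * N})")
print(f"sigma2_hat = {sigma2_hat:.2f} (true {sigma2})")
```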

TABLE I
Estimation of the number of degrees of freedom of a χ² distribution of a spectrogram local mean F₁ in the case of a white Gaussian noise N(0, 10), and estimation of its power. The theoretical number of degrees of freedom is 2 × N = 30 with N = 3 × 5.

C. Local Standard Deviation

Assuming ergodicity, the feature of (14) is extracted by estimating the standard deviation of the cell

F₂ = √((1/N) Σ_{i=1..N} (Sᵢ − F₁)²).   (20)

The first and second moments of F₂ are given by (40) and (41) of Appendix B. The case p = 0 corresponds to a noise cell. Expressions (40) and (41) then take the form [8]

(21)

Var (22)


Fig. 4. Noise cluster spread. (a) Spectrogram of a white Gaussian process of zero mean and variance σ² = 10 and (b) the associated features space (F₁, F₂).

Equations (16) and (21) show that the noise points form a cluster in the Features Space located around the point (E{F₁}, E{F₂}) determined by the noise variance. Fig. 4 illustrates this result in the case of a white Gaussian process of zero mean and variance σ² = 10. As expected from (16) and (21), the cluster is located around this point.

Let us consider now the case of a deterministic component embedded in a white Gaussian noise. Parameters p and r increase when the cell glides from noise points to a deterministic component pattern. The evolution of E{F₁} and E{F₂} is described in terms of p and r by (35) and (41). A simulated network of curves (E{F₁}, E{F₂}) parametered by p and r is displayed in Fig. 5 to illustrate this evolution. The signature of a deterministic spectral pattern in the Features Space is a curved cluster spread from the noise area to a nonzero p and r area, depending on the pattern maximum magnitude and on the pattern size relative to the cell size. The shape of the cluster depends on the simultaneous variations of p and r. It describes the magnitude variations of the spectral pattern. Consider a spectral pattern with constant local SNR. It is an extreme case of a sharp-edged spectral pattern. When the cell glides through the pattern, the proportion p increases first and then decreases, whereas the local SNR r is constant. Its representation in the Features Space is a cluster following an iso-r curve (plain line).

The local approach allows one to simplify the statistical model of the cell. The derived properties of the features depend on the two characterizing parameters p and r. This provides a description of the Features Space that must be used for both the segmentation of the TFR and the description of the magnitude variations of the extracted spectral patterns.

V. SEGMENTATION IN THE FEATURES SPACE

Before describing the segmentation algorithm, we discuss the choice of the cell size. Examples on simulated and real data are presented to validate the method.

A. Cell Size

Fig. 5. Theoretical grid (E{F₁}, E{F₂}) computed with a noise variance σ² = 10 and a cell of N = 7 × 3 points. The point (9.3, 10) is the noise expected value. The 15 values of parameters r and p are regularly spaced between, respectively, [0, 6] and [0, 1] (+). Circles show Var{F} at r ∈ {0.43, 1.29, 3, 6} and p ∈ {0.07, 0.21, 0.5, 1}.

The way features aggregate in the Features Space depends on the cell size. On one hand, a small cell with regard to the TFR size ensures a local approach. On the other hand, the spread of the Features Space clusters decreases when the cell size increases, since F₁ and F₂ are consistent estimators of the moments. This induces an increased separability of the data in the Features Space. A local characterization by means of a large cell requires large amounts of overlapping and zero padding. The counterpart of overlapping and zero padding is the smoothing of the data due to the increase of the TFR coefficient correlation. We define the cell size as the correlation support of its central point, which depends on the spectrogram configuration: size and form of the time window, overlapping, and zero padding. A compromise is provided in this way between the cell size and the dispersion of the features due to correlation. This choice also permits the characterization of each point by its region of influence.

This correlation is equivalent to the amount of redundancy in the TFR. This redundancy is quantified by the square modulus of the reproducing kernel, which is defined by

(23)

where h is the weighting window of (2) delayed in time and shifted in frequency. It measures the influence of one T–F point on another [15]. We thus define the cell as the set of TFR points

(24)

where ε is a threshold above which correlation is not negligible. As we work on discrete TFRs, the accuracy of this threshold is not critical, and a fixed value provides a good approximation of the correlation support in number of points. The spread of the cell corresponds to the T–F uncertainty of the TFR.
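The cell-size rule can be illustrated by sampling the squared reproducing kernel of the analysis window on the spectrogram grid and thresholding it. The sketch below is our own approximation: the kernel is evaluated only along the time and frequency axes, the threshold 0.1 and the function name correlation_support are arbitrary choices, and the hop size and FFT length are assumed parameters.

```python
import numpy as np
from scipy.signal import get_window

def correlation_support(window, nperseg, hop, nfft, eps=0.1):
    """Approximate the cell extent (time, frequency), in spectrogram grid
    points, from the squared reproducing kernel of the analysis window.

    The kernel value for a shift of tau samples and nu frequency bins is
    |sum_m h(m) h(m - tau) exp(-j 2 pi nu m / nfft)|^2, normalized to 1 at
    the origin; eps is the level below which correlation is neglected.
    """
    h = get_window(window, nperseg)
    norm = float(h @ h) ** 2

    def k2(tau, nu):
        if tau >= nperseg:
            return 0.0
        prod = h[tau:] * h[:nperseg - tau]
        phase = np.exp(-2j * np.pi * nu * np.arange(tau, nperseg) / nfft)
        return abs((prod * phase).sum()) ** 2 / norm

    n_time = 1
    while k2(n_time * hop, 0) >= eps:       # shifts along the time axis
        n_time += 1
    n_freq = 1
    while k2(0, n_freq) >= eps:             # shifts along the frequency axis
        n_freq += 1
    return 2 * n_time - 1, 2 * n_freq - 1   # symmetric support, center included

print(correlation_support("hann", nperseg=1024, hop=512, nfft=1024))
```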

B. Region Growing Algorithm

The full segmentation algorithm is described in Fig. 6. We consider in the following that, in the T–F space, each deterministic signal region is separated from the others by a noise region. Such a signal region is extracted by a mechanism in which a label is associated to one (or more) well-chosen point called a seed. This seed is propagated to the neighborhood, provided that the neighborhood has similar properties to the seed itself. The propagation operates by associating the seed label to contaminated points. Usually, this implies that we have available a similarity criterion between points to be contaminated. In our case, because the noise region properties may be derived from the noise parameter estimates, we choose to propagate until a degree of similarity imposed on the unlabeled points is reached. This criterion induces an implicit definition of contours separating deterministic component patterns from background noise in the TFR.

Figs. 7 and 8 illustrate our matter with synthetic signals. In Fig. 7, the signal is composed of three patterns of similar spread embedded in a white Gaussian noise. The central spectral pattern is smoothed, whereas the two others present sharp edges. The right-hand component is of lower magnitude. In Fig. 8, the signal is composed of a narrowband pattern and a wideband pattern embedded in a white Gaussian noise. During the whole procedure, unlabeled points (that is, points labeled by zero) are assumed to belong to noise regions, and the associated F₁s are considered as having a χ² PDF. Two main steps are iterated until the F₁s of the unlabeled points fit a χ² PDF.

• Step 1) Defining the propagation limit:

Estimation of the noise PDF. At iteration i, the number of degrees of freedom and the noise power are estimated by the ML estimators ν̂ and γ̂ of (18) and (19). The superscript i stands for the ith iteration.

Segmentation limit. In the Features Space, we determine a confidence region that corresponds to the noise region. Assuming a detection error probability P, we define the limit from the estimated noise PDF such that the probability of a noise point falling outside it equals P. Fig. 7(c) shows the limits defined at each one of the three iterations required for that example. Outside this confidence region, we define what we call the working area, where points are candidates for the segmentation. The working area is defined by the limits on F₁ and F₂ and implicitly determines a mask in the TFR, where patterns could be labeled.

Theoretical grid. The theoretical network (E{F₁}, E{F₂}) is computed from (35) and (41) with the current noise estimates [see Fig. 7(c)]. Parameters p and r are regularly spaced in [0, 1] and in an interval of SNR values, respectively.

Fig. 6. Segmentation algorithm.

• Step 2) Propagating the seeds. The propagation operates as a lower level iterative procedure composed of two steps.

Seeds extraction. Seeds have to be selected in the previously defined working area. A histogram of the Features Space data is computed with circular bins whose radius is derived from (36), centered on each sample (E{F₁}, E{F₂}) of the network. Points belonging to the first bin with the highest signal-to-noise ratio r are chosen as seeds of the region [circles of Fig. 7(c)]. This way, seeds are selected in the inner part of a deterministic spectral pattern (high p and high r). A common label is assigned to each one of the seeds belonging to a common bin of the histogram.

Seeds propagation. From seeds previously extracted in the Features Space, the propagation operates in the TFR. Each seed contaminates the candidates among its eight nearest neighbors in the TFR by assigning them the same label. These points contaminate their neighborhood again. The χ² PDF is again estimated without the recently contaminated points. This propagation is validated by means of a Kolmogorov–Smirnov test [16], which controls the adequation of the unlabeled F₁s with this new PDF. If the test is positive, the contamination is accepted, and a new label is assigned. The iteration changes when all the candidates have been tested.

Fig. 7. Segmentation of the spectrogram of a synthetic signal containing three narrowband components embedded in a white Gaussian noise of variance σ² = 19.5. The central spectral pattern is smoothed, whereas the two others present sharpened edges. (a) The spectrogram contains N = 99 × 124 coefficients. (c) The first theoretical grid (-o-) and the limits of propagation (plain lines) computed with P = 0.01 are superimposed on the features space. (b) Extracted regions are represented in (d) a split features space.

The algorithm then returns to step 1, considering as noise the reduced set of unlabeled points. The already-labeled points are considered as seeds and propagate under the new imposed constraints.

The procedure of segmentation stops when the normalized maximum likelihood calculated to estimate the χ² PDF of the unlabeled F₁s converges [Fig. 8(d)]. Note, to conclude, that the only control parameter is the probability of error P. It has a consequence on the required number of iterations but not on the segmentation result.
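The sketch below captures the skeleton of such a seeded propagation, in a deliberately simplified form that is not the authors' full algorithm: a single iteration, seeds taken as the largest values of the local mean F₁, propagation to 8-neighbors gated by a fixed quantile of the fitted noise law, and a Kolmogorov-Smirnov test on the remaining unlabeled values as the final check.

```python
import numpy as np
from scipy import stats

def grow_regions(F1, p_error=0.01, n_seeds=20):
    """One simplified pass of seeded region growing on a map of local means F1.

    Unlabeled points are assumed to be noise; their F1 values are fitted by a
    gamma (scaled chi2) law, a propagation limit is set at the (1 - p_error)
    quantile, seeds are taken at the largest F1 values, and each labeled point
    contaminates those of its 8 neighbors that exceed the limit.  A KS test on
    the remaining unlabeled values checks that they still fit the noise law.
    """
    labels = np.zeros(F1.shape, dtype=int)

    # Noise PDF estimated from the (currently) unlabeled points.
    a, loc, s = stats.gamma.fit(F1[labels == 0], floc=0.0)
    limit = stats.gamma.ppf(1.0 - p_error, a, loc=loc, scale=s)

    # Seeds: the strongest points above the propagation limit.
    flat = np.argsort(F1, axis=None)[::-1][:n_seeds]
    seeds = [np.unravel_index(i, F1.shape) for i in flat if F1.flat[i] > limit]
    frontier = []
    for lab, (i, j) in enumerate(seeds, start=1):
        labels[i, j] = lab
        frontier.append((i, j))

    # Propagation to 8-neighborhoods.
    while frontier:
        i, j = frontier.pop()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ii, jj = i + di, j + dj
                if (0 <= ii < F1.shape[0] and 0 <= jj < F1.shape[1]
                        and labels[ii, jj] == 0 and F1[ii, jj] > limit):
                    labels[ii, jj] = labels[i, j]
                    frontier.append((ii, jj))

    # Check that what is left behaves as noise.
    ks = stats.kstest(F1[labels == 0], stats.gamma(a, loc=loc, scale=s).cdf)
    return labels, ks.pvalue

# Toy example: a noise-only F1 map with a small high-energy patch.
rng = np.random.default_rng(6)
F1 = rng.gamma(shape=15.0, scale=10.0 / 15.0, size=(60, 80))
F1[20:28, 30:42] += 40.0
labels, pval = grow_regions(F1)
print("labeled points:", int((labels > 0).sum()), " KS p-value on residual noise:", round(pval, 3))
```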

The segmentation of the TFR of Fig. 7(a) identifies three spectral patterns separately. The algorithm converges after three iterations. Each iteration is a Features Space scale change. This matches the structure of the Features Space highlighted by the theoretical grid. The third component of the signal is detected during the second iteration. The scale change induces an increase of the resolution and allows computation of the seeds over a denser theoretical network, which detects this focused cluster [see Fig. 7(d)]. This figure highlights the shape of its cluster in the Features Space. It presents an inflexion similar to the one of cluster (1), which is characteristic of sharp-edged spectral patterns (curves of constant r). The final noise power estimation is consistent with the white Gaussian process variance of σ² = 19.5.

The segmentation of the TFR of Fig. 8(a) identifies two different deterministic component spectral patterns in terms of their T–F spread. Both patterns are smoothed, so that the corresponding clusters in the Features Space aggregate to linear curves [see Fig. 8(c)]. The tail of cluster (2) is located in the area of highest r. The variance of the additive Gaussian process is estimated after six iterations as σ̂² = 10.8, for a true value of 10.4.

The analyzed signal of Fig. 9 is an underwater recording of the whistle of a dolphin. The spectrogram contains a succession of straight spectral patterns embedded in a colored noise. The TFR domain was reduced so that the embedding noise can be considered white. The segmentation process identifies three kinds of spectral patterns in terms of their local SNR. The segmentation result of Fig. 9(b) shows that the pattern with label (3) has the highest energy level. Two patterns were extracted with the same label (2) because their magnitude variations are similar. In Fig. 9(c), we present the bins that define the initial seeds. The one concerning label (1) cannot be seen because it is too close to the noise cluster. The result is a segmentation of the TFR without any preprocessing of the TFR data.


Fig. 8. Segmentation of the spectrogram of a synthetic signal containing a sum of three linear chirps and a sum of seven truncated sines embedded in a white Gaussian noise of variance σ² = 10.4 (σ̂² = 10.8). (a) The spectrogram contains 127 × 199 coefficients. (b) Extracted regions (1) and (2) are represented in (c) a split features space. (d) The algorithm has converged after six iterations.

Fig. 9. Whistle of a dolphin. (a) The spectrogram of a dolphin whistle of 400 × 90 coefficients is segmented (b) in four regions with labels ranging from 0 to 3. (c) Initial seeds of regions (2) and (3) are shown on the features space. (d) Histogram of the noise F₁ after segmentation (+) and the estimated χ² PDF (plain line).


VI. CONCLUSION

This paper presents a general method for automatic nonstationary signal interpretation based on TFR local statistical feature extraction. It does not require any prior knowledge about the analyzed signal but exploits the general statistical properties of the chosen TFR. We focus on the spectrogram. We show that spectrogram coefficients of a deterministic signal embedded in an additive white Gaussian noise have noncentral χ² distributions with noncentral parameters equal to the deterministic component spectrogram coefficients. This drives us to choose as extracted features the first- and second-order statistics of the spectrogram coefficients. We show that such features vary randomly with two parameters, which measure a local SNR and the spread of the structures. These parameters describe the Features Space content. They allow the identification of spectral patterns in terms of their magnitude variations and T–F spread. According to this statistical study, we propose a region growing algorithm to segment the TFR. The segmentation process is controlled by the TFR statistical properties. It iteratively leads to an efficient estimation of the noise power and a characterization of the deterministic components' T-F evolution.

The first-order statistic is relevant for the noise characterization, whereas the second-order statistic permits discrimination of the deterministic structures. Work is in progress to include other features, like higher order statistics, in order to obtain an accurate description of the T–F structure content.

APPENDIX A
MOMENTS OF A NONCENTRAL χ² DISTRIBUTION

The expression of the moments of a noncentral χ² random variable is given in [10] without proof. We propose the following one.

Let us consider a set of independent Gaussian variables of nonzero means and common variance. The sum of their squares is, by definition, a noncentral χ² random variable with a noncentral parameter not equal to zero. Its PDF is of the form [10]

(25)

where I_ν, which is the order-ν modified Bessel function of the first kind, is defined by

I_ν(z) = Σ_{m=0..∞} (z/2)^(ν+2m) / (m! Γ(ν + m + 1))   (26)

and Γ is the Gamma function. One can find in [18], for instance, the general expression of the moments of the random variable defined as the square root of a noncentral χ² variable, which has, by definition, a Rice distribution law

(27)

where ₁F₁ is the confluent hypergeometric function defined by

(28)

The noncentral χ² distribution is a generalization of the Rice distribution [10]: the square of a Rice-distributed variable follows a noncentral χ² distribution. Substituting (28) into (27) and noting

(29)

The first ratio under the summation can be expressed as

which leads to

The sum under the derivative is the expansion into the Taylor series of the exponential function around zero:

(30)

Finally, after having expressed the derivative of the required order, the moment of a noncentral χ² variable is given by

(31)

This expression concerns the case where the noncentral parameter is zero (central χ² distribution) as well. The proof is the same as the one presented, with the Rice distribution replaced by a Rayleigh distribution [18].
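Because the closed form (31) is not reproduced above, the sketch below simply cross-checks the low-order moments of a noncentral χ² against a Monte Carlo draw, using scipy's ncx2 distribution with 2 degrees of freedom; applying the proportionality parameter through the scale argument is our own convention.

```python
import numpy as np
from scipy import stats

df, nc, gamma = 2, 3.0, 5.0          # degrees of freedom, noncentrality, proportionality
dist = stats.ncx2(df, nc, scale=gamma)

rng = np.random.default_rng(7)
samples = dist.rvs(size=200_000, random_state=rng)

for order in (1, 2, 3):
    exact = dist.moment(order)        # moment about zero computed by scipy
    mc = np.mean(samples ** order)    # Monte Carlo estimate
    print(f"order {order}: exact {exact:.3f}  monte-carlo {mc:.3f}")
```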

APPENDIX B
MOMENTS OF THE FEATURES

We derive in this Appendix the expressions of the first and second moments of the features F₁ and F₂. These expressions are necessary for the Features Space description.

Consider a set of N independent and identically distributed random variables with samples x₁, …, x_N, and the moments about zero of their parent variable

(32)

The sample moments are unbiased estimators of the related moments [12]

(33)

with variance

Var (34)


The above expressions lead to expressions of the first and second moments of the first feature F₁ of a cell, given the moments (12) of its parent variable

(35)

Var (36)

The derivation of the first and second moments of the second feature is not direct, since F₂ is the square root of the empirical second central moment m₂ of the cell. The statistic m₂ is an asymptotically unbiased estimator of the variance

(37)

where μ_q is the qth moment about the mean of the parent variable. Its variance is given by

Var (38)

As Var{m₂} varies as 1/N and the square-root derivative exists at μ₂, the following approximation holds [12]:

Var{F₂} ≈ Var{m₂} / (4 μ₂)   (39)

Substituting (37) and (38) into the above approximation leads to the expression of the variance of feature F₂

Var (40)

One can finally express the expected value of F₂ and its variance Var{F₂} by replacing the moments obtained from (12)

(41)

where the nonzero coefficients are

These coefficients do not depend on N. The variances (36) and (40) thus vary as 1/N and tend to zero as N tends to infinity. Features F₁ and F₂ are thus consistent estimators of the mean and the standard deviation of the cell.
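The delta-method step (39) can be verified numerically. The sketch below is our Monte Carlo illustration on noise-only cells (exponential coefficients of mean σ²), comparing the empirical variance of F₂ with Var{m₂}/(4μ₂).

```python
import numpy as np

rng = np.random.default_rng(8)
sigma2, N, n_cells = 10.0, 15, 200_000

# Noise-only cells: coefficients ~ exponential(sigma2); F2 = cell standard deviation.
cells = rng.exponential(scale=sigma2, size=(n_cells, N))
m2 = cells.var(axis=1)               # empirical second moment about the mean
F2 = np.sqrt(m2)

# Delta-method prediction: Var{F2} ~= Var{m2} / (4 * mu2), where mu2 is the true
# second central moment of the parent variable (sigma2**2 for an exponential).
mu2 = sigma2 ** 2
prediction = m2.var() / (4.0 * mu2)

print(f"empirical Var(F2) = {F2.var():.3f}   delta-method = {prediction:.3f}")
```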

REFERENCES

[1] R. Adams and L. Bischof, "Seeded region growing," IEEE Trans. Pattern Anal. Machine Intell., vol. 16, pp. 641–647, June 1994.
[2] R. G. Baraniuk and D. L. Jones, "A signal-dependent time-frequency representation: Optimal kernel design," IEEE Trans. Signal Processing, vol. 41, pp. 1589–1601, Apr. 1993.
[3] Y.-L. Chang and X. Li, "Adaptive image region-growing," IEEE Trans. Image Processing, vol. 3, pp. 868–872, Nov. 1994.
[4] E. Chassande-Mottin, P. Flandrin, and F. Auger, "On the statistics of spectrogram reassignment vectors," Multidimen. Syst. Signal Process., vol. 9, pp. 355–362, Oct. 1998.
[5] M. Davy and C. Doncarli, "Optimal kernels of time-frequency representations for signal classification," in Proc. IEEE Int. Symp. Time-Freq. Time-Scale Anal., Pittsburgh, PA, Oct. 1998.
[6] U. Grenander, H. O. Pollak, and D. Slepian, "The distribution of quadratic forms in normal variates: A small sample theory with applications to spectral analysis," J. Soc. Indust. Appl. Math., vol. 7, no. 4, pp. 374–401, 1959.
[7] P. Flandrin, Time-Frequency/Time-Scale Analysis. New York: Academic, 1999.
[8] C. Hory, N. Martin, A. Chehikian, and L. E. Solberg, "Time-frequency space characterization based on statistical criterions," in Proc. EUSIPCO, Tampere, Finland, Sept. 4–8, 2000, pp. 214–217.
[9] C. Hory and N. Martin, "Maximum likelihood noise estimation for spectrogram segmentation control," in Proc. ICASSP, Orlando, FL, May 13–17, 2002.
[10] N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions, 2nd ed. New York: Wiley, 1995, vol. 2.
[11] P. E. Johnson and G. L. Long, "The probability density of spectral estimates based on modified periodogram averages," IEEE Trans. Signal Processing, vol. 47, pp. 1255–1261, May 1999.
[12] M. G. Kendall and A. Stuart, The Advanced Theory of Statistics. London, U.K.: Charles Griffin, 1963, vol. 2.
[13] L. H. Koopmans, The Spectral Analysis of Time Series. New York: Academic, 1974.
[14] B. Leprettre and N. Martin, "Extraction of pertinent subsets from time-frequency representations for detection and recognition purposes," Signal Process., vol. 82, no. 2, pp. 229–238, Feb. 2002.
[15] S. Mallat, A Wavelet Tour of Signal Processing. New York: Academic, 1999.
[16] R. von Mises, Mathematical Theory of Probability and Statistics. New York: Academic, 1964.
[17] V. Pierson and N. Martin, "Watershed segmentation of time-frequency images," in Proc. IEEE Workshop Non-Linear Signal Image Process., Halkidiki, Greece, June 20–22, 1995.
[18] J. G. Proakis, Digital Communications. New York: McGraw-Hill, 1995.
[19] P. D. Welch, "A direct digital method of power spectrum estimation," IBM J. Res. Devel., vol. 5, no. 2, pp. 141–159, Apr. 1961.
[20] P. D. Welch, "The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms," IEEE Trans. Audio Electroacoust., vol. AU-15, pp. 70–73, June 1967.

Cyril Hory received the M.S. degree in applied acoustics from Université du Maine, Le Mans, France, in 1999. He is currently pursuing the Ph.D. degree with the Laboratoire des Images et des Signaux (LIS), Grenoble, France.

Nadine Martin received the Eng. degree in 1980 and the Ph.D. degree in 1984. She is a Director of Research at the National Centre of Scientific Research (CNRS) and the head of GOTA, a team within Laboratory LIS, Grenoble, France. In the signal processing domain, her research interests are the analysis and the interpretation of nonstationary signals. She is now working on time-frequency decision, multipulse modeling, and fault detection. Real vibratory signals are more particularly studied in relation with the mechanical model of the system. She is also directing a project on an automatic spectral analysis system (ASpect TetrAS). She is the author of about 60 papers and communications.

Dr. Martin was co-organizer of the Fourth European Signal Processing Conference (EUSIPCO), of a pre-doctoral course on the recent advances in signal processing (Les Houches 93), of the Sixth French Symposium on Signal and Image Processing (GRETSI'97), and of a special session on diagnostics and signal processing at IEEE-SDEMPED'97.

Alain Chehikian is Professor with the Université Joseph Fourier, Grenoble, France. He was head of the Laboratoire de Traitement d'Images et de Reconnaissance de Formes (LTIRF), Institut National Polytechnique de Grenoble (INPG). His research concerns segmentation, image description, and algorithm-architecture adequation.

