IRCAM Research & Technology Seminar

Change detection for audio signals in real-time

Arnaud Dessein
Institut de Recherche et Coordination Acoustique/Musique

October 26th 2011


What is audio change detection?

Audio change detection, segmentation, novelty detection:
Finding time boundaries, called change-points, which partition a sound signal into homogeneous and continuous temporal regions, called segments, that exhibit inhomogeneities with the adjacent regions.

Temporality:
- Causality principle.
- On-line or real-time setups.
- But also off-line setups.

Homogeneity:
- Intrinsic homogeneity.
- Inhomogeneity with contiguous segments.
- Criterion for homogeneity.

Examples include speech, music, and radio broadcasts [Kemp et al., 2000, Sundaram & Chang, 2000, Foote, 2000].

Figure: Audio change detection.






What do we need?

Approach in many works:
- High-level criteria and automatic classification (e.g., speakers, instruments, speech/non-speech, voiced/unvoiced, speech/music).
- Drawbacks: assumes the existence and knowledge of classes, relies on a potentially fallible classification, requires a large amount of training data.

Other approach with no assumption on the existence of classes:
- Onset detection [Bello et al., 2005].
- Speaker change detection [Siegler et al., 1997, Tritschler & Gopinath, 1999, Delacourt & Wellekens, 2000, Kotti et al., 2008, Grasic et al., 2010].
- Distance between frames, or statistics on the hypothesis of a change.
- Problem-dependent algorithms and heuristics.
- More generic frameworks with CuSum algorithms [Basseville & Nikiforov, 1993].
- Approximations for parameter estimation resulting in practical shortcomings [Omar & Chaudhari, 2005, Cont et al., 2011].

Our approach:
- No assumption on the existence of classes, similarly to the second approach.
- Control on the variation of the information content.
- Real-time constraints.
- Modularity with various types of signals and criteria.






What do we propose?

- Real-time modular change detection scheme.
- Framework of information geometry for exponential families.
- Statistical grounds through sequential generalized likelihood ratio tests.
- Geometrical interpretation through dually flat Bregman geometry.
- Link between distance-based and statistic-based methods in a unified framework.
- Addresses the problem of CuSum approaches for parameter estimation.
- Quantization of each segment with an information-geometric prototype.

Figure: Audio change detection in the framework of information geometry.





Outline

1 Information geometry
    Theoretical background
    Exponential families
    Dually flat Bregman geometry

2 Proposed system

3 Obtained results






What is information geometry?

Statistical differentiable manifold:
Under certain assumptions, a parametric statistical model $S = \{p_\xi : \xi \in \Xi\}$ of probability densities on a measurable set $\mathcal{X}$ forms a differentiable manifold.

Example:

$$p_\xi(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\} \quad \text{for all } x \in \mathcal{X} = \mathbb{R}, \text{ with } \xi = [\mu, \sigma^2] \in \Xi = \mathbb{R} \times \mathbb{R}_{++}.$$

Fisher information metric [Rao, 1945, Chentsov, 1982]:
Under certain assumptions, the Fisher information matrix defines the unique Riemannian metric $g$ on $S$.

Affine connections [Chentsov, 1982, Amari & Nagaoka, 2000]:
Under certain assumptions, the α-connections $\nabla^{(\alpha)}$ for $\alpha \in \mathbb{R}$ are the unique affine connections on $S$.






What are exponential families?

Exponential family [Darmois, 1935, Koopman, 1936, Pitman, 1936]:

$$p_\theta(x) = \exp\left(\theta^\top T(x) - F(\theta) + C(x)\right) \quad \text{for all } x \in \mathcal{X}.$$

Characteristics:
- $\theta$: natural parameters in a non-empty convex open set $\Theta \subseteq \mathbb{R}^d$.
- $F(\theta)$: log-normalizer, smooth strictly convex function on $\Theta$.
- $C(x)$: carrier measure, measurable function on $\mathcal{X}$.
- $T(x)$: sufficient statistic, measurable function on $\mathcal{X}$.
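As an illustration that is not part of the talk, the univariate Gaussian from the earlier example can be written in this canonical form. The sketch below assumes the standard textbook decomposition, with natural parameters θ = (μ/σ², −1/(2σ²)), sufficient statistic T(x) = (x, x²), and a zero carrier term; all function names are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Usual parameterization of the univariate normal density."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def natural_params(mu, sigma2):
    """Map (mu, sigma^2) to natural parameters theta = (mu/sigma^2, -1/(2 sigma^2))."""
    return (mu / sigma2, -1.0 / (2.0 * sigma2))

def sufficient_statistic(x):
    """T(x) = (x, x^2)."""
    return (x, x * x)

def log_normalizer(theta):
    """F(theta) = -theta1^2 / (4 theta2) + 0.5 * log(-pi / theta2)."""
    t1, t2 = theta
    return -t1 * t1 / (4.0 * t2) + 0.5 * math.log(-math.pi / t2)

def carrier(x):
    """C(x) = 0 for this decomposition."""
    return 0.0

def exp_family_pdf(x, theta):
    """p_theta(x) = exp(<theta, T(x)> - F(theta) + C(x))."""
    inner = sum(a * b for a, b in zip(theta, sufficient_statistic(x)))
    return math.exp(inner - log_normalizer(theta) + carrier(x))

# Both parameterizations should give the same density value.
theta = natural_params(1.0, 2.0)
difference = abs(exp_family_pdf(0.3, theta) - gaussian_pdf(0.3, 1.0, 2.0))
```

The two evaluations agree up to floating-point error, which is a quick sanity check when implementing a new family.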





Figure: A taxonomy of probability measures [Nielsen & Garcia, 2009]. The diagram splits probability measures into parametric and non-parametric ones, and parametric ones into exponential and non-exponential families (uniform, Cauchy, Lévy skew α-stable). Other distributions named in the diagram: Bernoulli, binomial, exponential, Poisson, Rayleigh, Gaussian, Gamma, Beta, Weibull, Dirichlet, multinomial.






What is the canonical geometry of exponential families?

- $F$ possesses a conjugate $F^*$, which is a smooth strictly convex function defined by the Legendre-Fenchel transform

$$F^*(\eta) = \sup_{\theta \in \Theta} \left\{\theta^\top \eta - F(\theta)\right\} \quad \text{for all } \eta \in H.$$

- The expectation parameters $\eta$ form another coordinate system of $S$, and we have the relations $\eta = \nabla F(\theta)$ and $\theta = \nabla F^*(\eta)$.
- Link with maximum likelihood estimation through $\eta_{\mathrm{mle}} = \frac{1}{n} \sum_{j=1}^{n} T(x_j)$.
- $(S, g, \nabla^{(1)}, \nabla^{(-1)})$ is a dually flat space in which the natural parameters $\theta$ and the expectation parameters $\eta$ are dual affine coordinate systems.
- It generalizes the self-dual Euclidean geometry, with two dual Bregman divergences $B_F$ and $B_{F^*}$ instead of the self-dual Euclidean distance.

Bregman divergence [Bregman, 1967]:

$$B_\phi(\xi \,\|\, \xi') = \phi(\xi) - \phi(\xi') - (\xi - \xi')^\top \nabla\phi(\xi').$$

- Canonical divergences of dually flat spaces, bijection with exponential families [Amari & Nagaoka, 2000, Banerjee et al., 2005]:

$$D_{KL}(p_\xi \,\|\, p_{\xi'}) = B_F(\theta' \,\|\, \theta) = B_{F^*}(\eta \,\|\, \eta').$$

- Generic algorithms that handle many distances [Dessein & Cont, 2011a].
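The relations above can be checked numerically on a simple family. The following sketch is an illustration that is not part of the talk: it assumes the Poisson family, for which θ = log λ, F(θ) = exp(θ), η = λ and F*(η) = η log η − η, and all names are hypothetical.

```python
import math

def bregman(phi, grad_phi, a, b):
    """Scalar Bregman divergence B_phi(a || b)."""
    return phi(a) - phi(b) - (a - b) * grad_phi(b)

# Poisson family (assumed here for illustration).
F = math.exp            # log-normalizer F(theta) = exp(theta)
grad_F = math.exp       # eta = grad F(theta)

def F_star(eta):
    """Conjugate F*(eta) = eta * log(eta) - eta."""
    return eta * math.log(eta) - eta

grad_F_star = math.log  # theta = grad F*(eta)

def kl_poisson(lam, lam_prime):
    """Closed-form Kullback-Leibler divergence between two Poisson laws."""
    return lam * math.log(lam / lam_prime) - lam + lam_prime

lam, lam_prime = 3.0, 5.0
theta, theta_prime = math.log(lam), math.log(lam_prime)
eta, eta_prime = grad_F(theta), grad_F(theta_prime)

kl = kl_poisson(lam, lam_prime)
b_f = bregman(F, grad_F, theta_prime, theta)              # B_F(theta' || theta)
b_f_star = bregman(F_star, grad_F_star, eta, eta_prime)   # B_F*(eta || eta')

# Maximum likelihood in expectation coordinates: average of T(x_j) = x_j.
samples = [2, 4, 3, 5, 1]
eta_mle = sum(samples) / len(samples)
theta_mle = grad_F_star(eta_mle)
```

The three divergence values coincide up to rounding, and mapping `theta_mle` back through `grad_F` recovers `eta_mle`, which illustrates the duality between the two coordinate systems.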






What is the canonical geometry of exponential families?

Figure: Geometrical viewpoint [Nielsen & Nock, 2009].





Outline

1 Information geometry

2 Proposed system
    General architecture
    Sound descriptors modeling
    Change detection

3 Obtained results






How to segment audio streams?

Architecture:
1. Represent the incoming audio stream with short-time sound descriptors $x_j$.
2. Model the features $x_j$ with probability distributions $p_{\xi_j}$ from a given statistical family.
3. Detect when a change in the parameters $\xi_j$ occurs.

Sequential change detection:
1. Accumulate the incoming observations $x_j$ in a growing window $x = (x_1, \dots, x_n)$.
2. Incrementally try to detect a change at any time $i$ of the window until a change is detected.
3. Discard the observations and start again with an initial window $x = (x_{i+1}, \dots, x_n)$.

Reduces to finding one change-point in a given window $x = (x_1, \dots, x_n)$.

Figure: Segmentation at time t.

Figure: Schema of the general architecture of the system.
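The sequential scheme above can be sketched as a loop over a growing window. This is a naive illustration under stated assumptions, not the system presented in the talk: `segment_stream`, `mean_shift_statistic`, and the threshold value are hypothetical, and the mean-shift score is only a stand-in for whichever test statistic is plugged in.

```python
def mean_shift_statistic(window, i):
    """Stand-in score for a split after the first i samples (hypothetical)."""
    n = len(window)
    mean_left = sum(window[:i]) / i
    mean_right = sum(window[i:]) / (n - i)
    return i * (n - i) / n * (mean_left - mean_right) ** 2

def segment_stream(stream, statistic, threshold, min_size=1):
    """Return the stream indices at which a change is declared."""
    change_points = []
    window = []
    start = 0  # stream index of window[0]
    for x in stream:
        window.append(x)  # step 1: accumulate in a growing window
        n = len(window)
        if n < 2 * min_size:
            continue
        # Step 2: try every admissible change time i in the window.
        candidates = range(min_size, n - min_size + 1)
        best_i = max(candidates, key=lambda i: statistic(window, i))
        if statistic(window, best_i) > threshold:
            change_points.append(start + best_i)
            # Step 3: discard x_1..x_i and restart from x_{i+1}..x_n.
            window = window[best_i:]
            start += best_i
    return change_points

stream = [0.0] * 20 + [5.0] * 20 + [0.0] * 20
detected = segment_stream(stream, mean_shift_statistic, threshold=10.0)
```

This version recomputes every split from scratch at each new sample, so it is quadratic in the window length; it is meant to show the control flow only.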






How to model sounds?

Computation of a sound descriptor $x_j$:
- Fourier or constant-Q transforms for information on the spectral content.
- Mel-frequency cepstral coefficients for information on the timbre.
- Many other possibilities.

Modeling with a probability distribution $p_{\xi_j}$ from a statistical family:
- Categorical distributions.
- Multivariate Gaussian distributions.
- Many other possibilities.

Figure: Sound descriptors modeling.






How to detect a change?

Problem:
Detect one change-point in a given window $x = (x_1, \dots, x_n)$.

Properties of the proposed test statistic:
- Information-geometric interpretation.
- Encompasses statistic-based and distance-based methods.
- Computationally efficient updates when considering the MLEs:

$$\frac{1}{2}\,\mathrm{GLR}_i = i\,F^*(\eta^i_{0\,\mathrm{mle}}) + (n - i)\,F^*(\eta^i_{1\,\mathrm{mle}}) - n\,F^*(\eta_{0\,\mathrm{mle}}).$$








Assumptions:
The samples $x_j$ are drawn independently from a given statistical model $S = \{p_\xi : \xi \in \Xi\}$ and are identically distributed before (resp., after) change.

Usual approach [Basseville & Nikiforov, 1993]:
- Assume that $\xi_0$ and $\xi_1$ before and after change are known:
    - $H_0$: $x_1, \dots, x_n \sim p_{\xi_0}$.
    - $H_1^i$: $x_1, \dots, x_i \sim p_{\xi_0}$, and $x_{i+1}, \dots, x_n \sim p_{\xi_1}$.
- CuSum test can be employed by thresholding the likelihood ratio:

$$\frac{1}{2}\,\mathrm{LR}_i = \log\frac{p(x \mid H_1^i)}{p(x \mid H_0)} = \log\frac{\prod_{j=1}^{i} p_{\xi_0}(x_j) \prod_{j=i+1}^{n} p_{\xi_1}(x_j)}{\prod_{j=1}^{i} p_{\xi_0}(x_j) \prod_{j=i+1}^{n} p_{\xi_0}(x_j)} = \sum_{j=i+1}^{n} \log\frac{p_{\xi_1}(x_j)}{p_{\xi_0}(x_j)}.$$

- $\xi_1$ unknown: CuSum test can still be employed by computing generalized likelihood ratio statistics where we replace $\xi_1$ with $\xi_1^i$.
- $\xi_0$ unknown: the test cannot be written in its simple form anymore.
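When both parameters are known, the sum on the right-hand side above can be accumulated from the end of the window. The sketch below is an illustration under assumptions that are not in the talk: it uses normal densities with known means and a common variance, and the function names are hypothetical.

```python
def log_density_ratio(x, mu0, mu1, sigma2=1.0):
    """log(p_xi1(x) / p_xi0(x)) for normal densities with a common variance."""
    return ((x - mu0) ** 2 - (x - mu1) ** 2) / (2.0 * sigma2)

def half_likelihood_ratios(window, mu0, mu1, sigma2=1.0):
    """Entry i holds the sum of log-ratios over window[i:].

    With 0-based indexing, entry i corresponds to i pre-change samples,
    i.e. to the quantity (1/2) LR_i of the formula above.
    """
    n = len(window)
    ratios = [0.0] * n
    tail = 0.0
    for i in range(n - 1, -1, -1):
        tail += log_density_ratio(window[i], mu0, mu1, sigma2)
        ratios[i] = tail
    return ratios

window = [0.0] * 5 + [2.0] * 5
ratios = half_likelihood_ratios(window, mu0=0.0, mu1=2.0)
best_i = max(range(len(ratios)), key=ratios.__getitem__)
```

On this toy window the statistic peaks at the true change time, after the fifth sample; a detector would compare that maximum with a threshold.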










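Assumptions:
The samples $x_j$ are drawn independently from a given exponential model $S = \{p_\theta : \theta \in \Theta\}$ and are identically distributed before (resp., after) change.

Proposed approach for exponential families [Dessein & Cont, 2011b]:
- $\theta_0$ and $\theta_1$ unknown:
    - $H_0$: $x_1, \dots, x_n \sim p_{\theta_0}$.
    - $H_1^i$: $x_1, \dots, x_i \sim p_{\theta_0^i}$, and $x_{i+1}, \dots, x_n \sim p_{\theta_1^i}$.
- Generalized likelihood ratio now becomes:

$$\begin{aligned}
\frac{1}{2}\,\mathrm{GLR}_i
&= \log\frac{\prod_{j=1}^{i} p_{\theta_0^i}(x_j) \prod_{j=i+1}^{n} p_{\theta_1^i}(x_j)}{\prod_{j=1}^{i} p_{\theta_0}(x_j) \prod_{j=i+1}^{n} p_{\theta_0}(x_j)} \\
&= \sum_{j=1}^{i} \log\frac{p_{\theta_0^i}(x_j)}{p_{\theta_0}(x_j)} + \sum_{j=i+1}^{n} \log\frac{p_{\theta_1^i}(x_j)}{p_{\theta_0}(x_j)} \\
&= \sum_{j=1}^{i} \left((\theta_0^i - \theta_0)^\top T(x_j) - F(\theta_0^i) + F(\theta_0)\right) + \sum_{j=i+1}^{n} \left((\theta_1^i - \theta_0)^\top T(x_j) - F(\theta_1^i) + F(\theta_0)\right) \\
&= i \left(F(\theta_0) - F(\theta_0^i) + (\theta_0^i - \theta_0)^\top \nabla F(\theta^i_{0\,\mathrm{mle}})\right) + (n - i) \left(F(\theta_0) - F(\theta_1^i) + (\theta_1^i - \theta_0)^\top \nabla F(\theta^i_{1\,\mathrm{mle}})\right).
\end{aligned}$$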








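Test statistics:

$$\frac{1}{2}\,\mathrm{GLR}_i = i \left(D_{KL}\left(p_{\theta^i_{0\,\mathrm{mle}}} \,\middle\|\, p_{\theta_0}\right) - D_{KL}\left(p_{\theta^i_{0\,\mathrm{mle}}} \,\middle\|\, p_{\theta_0^i}\right)\right) + (n - i) \left(D_{KL}\left(p_{\theta^i_{1\,\mathrm{mle}}} \,\middle\|\, p_{\theta_0}\right) - D_{KL}\left(p_{\theta^i_{1\,\mathrm{mle}}} \,\middle\|\, p_{\theta_1^i}\right)\right).$$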
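When the unknown parameters are replaced by their maximum likelihood estimates, the subtracted divergences in the statistic above vanish and two KL terms remain, which should match the closed form in terms of $F^*$. The sketch below checks this numerically. It is an illustration under assumptions that are not in the talk: it uses the univariate exponential distribution with mean η, for which F*(η) = −1 − log η, and all names are hypothetical.

```python
import math

def f_star(eta):
    """Conjugate F*(eta) = -1 - log(eta) for the exponential law with mean eta."""
    return -1.0 - math.log(eta)

def grad_f_star(eta):
    """Natural parameter theta = grad F*(eta) = -1 / eta."""
    return -1.0 / eta

def kl(eta, eta_ref):
    """D_KL(p_eta || p_eta_ref) computed as B_F*(eta || eta_ref)."""
    return f_star(eta) - f_star(eta_ref) - (eta - eta_ref) * grad_f_star(eta_ref)

def half_glr(window):
    """Map each split i in 1..n-1 to (KL form, conjugate form) of (1/2) GLR_i."""
    n = len(window)
    total = sum(window)
    eta_all = total / n  # MLE over the whole window (no-change hypothesis)
    out = {}
    left = 0.0
    for i in range(1, n):
        left += window[i - 1]            # running sum of T(x_j) = x_j
        eta0 = left / i                  # MLE before the candidate change
        eta1 = (total - left) / (n - i)  # MLE after the candidate change
        via_kl = i * kl(eta0, eta_all) + (n - i) * kl(eta1, eta_all)
        via_conjugate = i * f_star(eta0) + (n - i) * f_star(eta1) - n * f_star(eta_all)
        out[i] = (via_kl, via_conjugate)
    return out

window = [0.5, 1.5] * 5 + [3.0, 5.0] * 5
stats = half_glr(window)
best_i = max(stats, key=lambda i: stats[i][0])
```

Only running sums of the sufficient statistic are needed to update the estimates, which is what makes the conjugate form cheap to evaluate for every candidate split.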







Outline

1 Information geometry

2 Proposed system

3 Obtained results
    Synthetic data
    Real-world data
    Audio data






Fixed-variance univariate normal densities

Figure: Change detection in fixed-variance univariate normal data. Panels: series, change detection, original parameters, estimated parameters, generalized likelihood ratio, maximum generalized likelihood ratio, computation time.






Univariate exponential densities

Figure: Change detection in univariate exponential data. Panel titles recovered: series, change detection, original parameters, computation time.






Multivariate normal densities

Figure: Change detection in multivariate normal data. Panel titles recovered: series, change detection, original parameters, estimated parameters, computation time.






Well-log

Figure: Segmentation of well-log data. Panel titles recovered: time series, segmentation, computation time.






Daily log-return of the Dow Jones

Figure: Segmentation of the daily log-return of the Dow Jones. Panel titles recovered: time series, segmentation, computation time.






Speech

Figure: Speaker change detection in a speech fragment. Panels: original audio against time (s), and a matrix indexed by frame number on both axes.






Polyphonic music

Figure: Note change detection in a polyphonic music excerpt. Panels: original audio, and a piano roll with pitch (F2 to A5) against time (s).




Conclusion

What we (don’t) have

Summary and perspectives:
- Representations:
    - Many possibilities.
    - Combinations of descriptors.
    - Feature selection.
- Descriptors modeling:
    - Exponential families and Bregman divergences, mixture models.
    - Model selection.
    - Other geometries, divergences, test statistics.
- Temporality of events:
    - Assumption of quasi-stationarity.
    - Non-stationarity modeling.
    - Conditional distributions, linear/non-linear systems.
- Applications:
    - Evaluation on large datasets in audio and other domains.
    - Onset detection, music segmentation, speaker segmentation, etc.
    - First stage in real-time systems for polyphonic music transcription, music similarity analysis, computer-assisted improvisation.







Thank you for your attention! Questions?

This work was supported by a doctoral fellowship from the UPMC (EDITE) and by a grant from the JST-CNRS ICT (Improving the VR Experience).





Bibliography I

Amari, S.-i. & Nagaoka, H. (2000). Methods of information geometry, volume 191 of Translations of Mathematical Monographs. American Mathematical Society.

Banerjee, A., Merugu, S., Dhillon, I. S., & Ghosh, J. (2005). Clustering with Bregman divergences. Journal of Machine Learning Research, 6, 1705–1749.

Basseville, M. & Nikiforov, V. (1993). Detection of abrupt changes: Theory and application. Englewood Cliffs, NJ, USA: Prentice-Hall, Inc.

Bello, J. P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M. B. (2005). A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 1035–1047.

Bregman, L. M. (1967). The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3), 200–217.

Chentsov, N. N. (1982). Statistical decision rules and optimal inference, volume 53 of Translations of Mathematical Monographs. American Mathematical Society.

Cont, A., Dubnov, S., & Assayag, G. (2011). On the information geometry of audio streams with applications to similarity computing. IEEE Transactions on Audio, Speech and Language Processing, 19(4), 837–846.





Bibliography II

Darmois, G. (1935). Sur les lois de probabilités à estimation exhaustive. Comptes Rendus des Séances de l’Académie des Sciences, 200, 1265–1266.

Delacourt, P. & Wellekens, C. J. (2000). DISTBIC: A speaker-based segmentation for audio data indexing. Speech Communication, 32(1–2), 111–126.

Dessein, A. & Cont, A. (2011a). Applications of information geometry to audio signal processing. In 14th International Conference on Digital Audio Effects (DAFx), Paris, France.

Dessein, A. & Cont, A. (2011b). Information-geometric approach to real-time audio change detection. Submitted.

Foote, J. (2000). Automatic audio segmentation using a measure of audio novelty. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), volume 1 (pp. 452–455). New York City, NY, USA.

Grasic, M., Kos, M., & Kacic, Z. (2010). Online speaker segmentation and clustering using cross-likelihood ratio calculation with reference criterion selection. IET Signal Processing, 4(6), 673–685.

Kemp, T., Schmidt, M., Westphal, M., & Waibel, A. (2000). Strategies for automatic segmentation of audio data. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 3 (pp. 1423–1426). Istanbul, Turkey.





Bibliography III

Koopman, B. O. (1936). On distributions admitting a sufficient statistic. Transactions of the American Mathematical Society, 39(3), 399–409.

Kotti, M., Benetos, E., & Kotropoulos, C. (2008). Computationally efficient and robust BIC-based speaker segmentation. IEEE Transactions on Audio, Speech, and Language Processing, 16(5), 920–933.

Nielsen, F. & Garcia, V. (2009). Statistical exponential families: A digest with flash cards.

Nielsen, F. & Nock, R. (2009). Sided and symmetrized Bregman centroids. IEEE Transactions on Information Theory, 55(6), 2882–2904.

Omar, M. K. & Chaudhari, U. (2005). Blind change detection for audio segmentation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 501–504). Philadelphia, PA, USA.

Pitman, E. J. G. (1936). Sufficient statistics and intrinsic accuracy. Mathematical Proceedings of the Cambridge Philosophical Society, 32(4), 567–579.

Rao, C. R. (1945). Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37, 81–91.





Bibliography IV

Siegler, M. A., Jain, U., Raj, B., & Stern, R. M. (1997). Automatic segmentation, classification and clustering of broadcast news audio. In Proceedings of the DARPA Speech Recognition Workshop (pp. 97–99). Chantilly, VA, USA.

Sundaram, H. & Chang, S.-F. (2000). Audio scene segmentation using multiple features, models and time scales. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 4 (pp. 2441–2444). Istanbul, Turkey.

Tritschler, A. & Gopinath, R. A. (1999). Improved speaker segmentation and segments clustering using the Bayesian information criterion. In Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech), volume 2 (pp. 679–682). Budapest, Hungary.

