Introduction | Information geometry | Proposed system | Obtained results | Conclusion
IRCAM Research & Technology Seminar
Change detection for audio signals in real-time
Arnaud Dessein, Institut de Recherche et Coordination Acoustique/Musique
October 26th 2011
[email protected] October 26th 2011 IRCAM Research & Technology Seminar 1/21
What is audio change detection?
Audio change detection, segmentation, novelty detection: finding time boundaries, called change-points, which partition a sound signal into homogeneous and continuous temporal regions, called segments, that exhibit inhomogeneities with the adjacent regions.
Temporality: causality principle; on-line or real-time setups, but also off-line setups.
Homogeneity: intrinsic homogeneity; inhomogeneity with contiguous segments; criterion for homogeneity.
Examples include speech, music, and radio broadcasts [Kemp et al., 2000, Sundaram & Chang, 2000, Foote, 2000].
Figure: Audio change detection.
What do we need?
Approach in many works: high-level criteria and automatic classification (e.g., speakers, instruments, speech/non-speech, voiced/unvoiced, speech/music). Drawbacks: it assumes the existence and knowledge of classes, relies on a potentially fallible classification, and requires a large amount of training data.
Other approach, with no assumption on the existence of classes:
Onset detection [Bello et al., 2005] and speaker change detection [Siegler et al., 1997, Tritschler & Gopinath, 1999, Delacourt & Wellekens, 2000, Kotti et al., 2008, Grasic et al., 2010]. These methods compute distances between frames, or statistics on the hypothesis of a change, and rely on problem-dependent algorithms and heuristics.
More generic frameworks use CuSum algorithms [Basseville & Nikiforov, 1993], but approximations for parameter estimation result in practical shortcomings [Omar & Chaudhari, 2005, Cont et al., 2011].
Our approach:
No assumption on the existence of classes, as in the second approach. Control over the variation of the information content. Real-time constraints. Modularity with various types of signals and criteria.
What do we propose?
Real-time modular change detection scheme.
Framework of information geometry for exponential families.
Statistical grounds through sequential generalized likelihood ratio tests.
Geometrical interpretation through dually flat Bregman geometry.
Link between distance-based and statistic-based methods in a unified framework.
Addresses the problem of CuSum approaches for parameter estimation.
Quantization of each segment with an information-geometric prototype.
Figure: Audio change detection in the framework of information geometry.
Outline
1 Information geometry: Theoretical background, Exponential families, Dually flat Bregman geometry
2 Proposed system
3 Obtained results
What is information geometry?
Statistical differentiable manifold. Under certain assumptions, a parametric statistical model $S = \{p_\xi : \xi \in \Xi\}$ of probability densities on a measurable set $\mathcal{X}$ forms a differentiable manifold.
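Example: $p_\xi(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$ for all $x \in \mathcal{X} = \mathbb{R}$, with $\xi = [\mu, \sigma^2] \in \Xi = \mathbb{R} \times \mathbb{R}_{++}$.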
Fisher information metric [Rao, 1945, Chentsov, 1982]. Under certain assumptions, the Fisher information matrix defines the unique Riemannian metric $g$ on $S$.
Affine connections [Chentsov, 1982, Amari & Nagaoka, 2000].
Under certain assumptions, the $\alpha$-connections $\nabla^{(\alpha)}$ for $\alpha \in \mathbb{R}$ are the unique affine connections on $S$.
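As a numerical illustration of the Fisher information metric for the Gaussian example of this slide, the following sketch (an illustration only, not part of the presented system) estimates the Fisher information matrix $E[\nabla \log p_\xi \, \nabla \log p_\xi^\top]$ by Monte Carlo in the parametrization $\xi = [\mu, \sigma^2]$ and compares it with the closed form $\mathrm{diag}(1/\sigma^2, 1/(2\sigma^4))$. It assumes NumPy; the function names are hypothetical.

```python
import numpy as np


def gaussian_score(x, mu, var):
    """Gradient of log p_xi(x) w.r.t. xi = [mu, var], one row per sample."""
    d_mu = (x - mu) / var
    d_var = -0.5 / var + (x - mu) ** 2 / (2.0 * var ** 2)
    return np.stack([d_mu, d_var], axis=1)


def fisher_information_mc(mu, var, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[score score^T] under p_xi."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, np.sqrt(var), size=n_samples)
    s = gaussian_score(x, mu, var)
    return s.T @ s / n_samples


def fisher_information_exact(mu, var):
    """Closed form diag(1/var, 1/(2 var^2)); it does not depend on mu."""
    return np.diag([1.0 / var, 1.0 / (2.0 * var ** 2)])


print(fisher_information_mc(1.0, 2.0))     # approximately diag(0.5, 0.125)
print(fisher_information_exact(1.0, 2.0))  # exactly diag(0.5, 0.125)
```

The metric $g$ at the point $\xi$ is this matrix; the Monte Carlo version only serves to show that it is an expectation of outer products of scores.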
What are exponential families?
Exponential family [Darmois, 1935, Koopman, 1936, Pitman, 1936].
$$p_\theta(x) = \exp\left(\theta^\top T(x) - F(\theta) + C(x)\right) \quad \text{for all } x \in \mathcal{X}.$$
Characteristics:
$\theta$: natural parameters in a non-empty convex open set $\Theta \subseteq \mathbb{R}^d$.
$F(\theta)$: log-normalizer, a smooth strictly convex function on $\Theta$.
$C(x)$: carrier measure, a measurable function on $\mathcal{X}$.
$T(x)$: sufficient statistic, a measurable function on $\mathcal{X}$.
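To make the canonical form concrete, here is a sketch that writes the univariate Gaussian as an exponential family, using the standard choice $T(x) = (x, x^2)$, $\theta = (\mu/\sigma^2, -1/(2\sigma^2))$, $C(x) = 0$ and $F(\theta) = -\theta_1^2/(4\theta_2) + \frac{1}{2}\log(-\pi/\theta_2)$, and checks it against the usual density. Only the Python standard library is used; the function names are illustrative.

```python
import math


def sufficient_statistic(x):
    """T(x) = (x, x^2) for the univariate Gaussian."""
    return (x, x * x)


def natural_parameters(mu, var):
    """theta = (mu / var, -1 / (2 var)); theta_2 < 0 describes the open convex set Theta."""
    return (mu / var, -1.0 / (2.0 * var))


def log_normalizer(theta):
    """F(theta) = -theta_1^2 / (4 theta_2) + 0.5 * log(-pi / theta_2)."""
    t1, t2 = theta
    return -t1 * t1 / (4.0 * t2) + 0.5 * math.log(-math.pi / t2)


def exp_family_density(x, theta):
    """p_theta(x) = exp(<theta, T(x)> - F(theta) + C(x)), with carrier C(x) = 0."""
    t = sufficient_statistic(x)
    return math.exp(theta[0] * t[0] + theta[1] * t[1] - log_normalizer(theta))


def gaussian_density(x, mu, var):
    """Usual Gaussian density, for comparison."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


theta = natural_parameters(mu=1.0, var=2.0)
for x in (-1.0, 0.0, 2.5):
    print(exp_family_density(x, theta), gaussian_density(x, 1.0, 2.0))  # equal up to rounding
```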
Figure: A taxonomy of probability measures [Nielsen & Garcia, 2009], separating parametric from non-parametric measures, and exponential families (e.g., Gaussian, Bernoulli, binomial, multinomial, Poisson, exponential, gamma, beta) from non-exponential families (e.g., uniform, Cauchy, Lévy skew α-stable).
What is the canonical geometry of exponential families?
$F$ possesses a conjugate $F^\star$, which is a smooth strictly convex function defined by the Legendre-Fenchel transform $F^\star(\eta) = \sup_{\theta \in \Theta} \{\theta^\top \eta - F(\theta)\}$ for all $\eta \in H$. The expectation parameters $\eta$ form another coordinate system of $S$, and we have the relations $\eta = \nabla F(\theta)$ and $\theta = \nabla F^\star(\eta)$.
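For the univariate Gaussian, with the usual parametrization $T(x) = (x, x^2)$ and $\theta = (\mu/\sigma^2, -1/(2\sigma^2))$, the conjugate has the closed form $F^\star(\eta) = -\frac{1}{2}\log\left(2\pi e\,(\eta_2 - \eta_1^2)\right)$ (the negative differential entropy, since $C(x) = 0$), with $\eta = (\mu, \mu^2 + \sigma^2)$. The sketch below checks the relations $\eta = \nabla F(\theta)$ and $\theta = \nabla F^\star(\eta)$ by central finite differences, and the Fenchel-Young equality $F(\theta) + F^\star(\eta) = \theta^\top\eta$ at dual points. It is a standalone illustration; the helper names are hypothetical.

```python
import math


def F(theta):
    """Log-normalizer of the univariate Gaussian in natural parameters."""
    t1, t2 = theta
    return -t1 * t1 / (4.0 * t2) + 0.5 * math.log(-math.pi / t2)


def F_star(eta):
    """Legendre-Fenchel conjugate: negative entropy, with variance eta_2 - eta_1^2."""
    e1, e2 = eta
    return -0.5 * math.log(2.0 * math.pi * math.e * (e2 - e1 * e1))


def num_grad(f, p, h=1e-6):
    """Central finite-difference gradient of f at the 2-vector p."""
    g = []
    for k in range(2):
        up, down = list(p), list(p)
        up[k] += h
        down[k] -= h
        g.append((f(up) - f(down)) / (2.0 * h))
    return g


mu, var = 1.0, 2.0
theta = (mu / var, -1.0 / (2.0 * var))
eta = (mu, mu * mu + var)  # expectation parameters E[T(x)]
print(num_grad(F, theta), eta)       # eta = grad F(theta)
print(num_grad(F_star, eta), theta)  # theta = grad F*(eta)
print(F(theta) + F_star(eta), theta[0] * eta[0] + theta[1] * eta[1])  # Fenchel-Young
```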
Link with maximum likelihood estimation through $\eta_{\mathrm{mle}} = \frac{1}{n}\sum_{j=1}^{n} T(x_j)$.
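In expectation coordinates the maximum likelihood estimate is simply the sample average of the sufficient statistics. A minimal NumPy sketch for the univariate Gaussian, assuming $T(x) = (x, x^2)$ so that $\eta = (\mu, \mu^2 + \sigma^2)$, recovers the usual maximum likelihood mean and (biased) variance; the function names are illustrative.

```python
import numpy as np


def eta_mle(x):
    """eta_mle = (1/n) sum_j T(x_j) with T(x) = (x, x^2)."""
    x = np.asarray(x, dtype=float)
    return np.array([x.mean(), (x ** 2).mean()])


def eta_to_mean_var(eta):
    """Map expectation parameters back to (mu, sigma^2) = (eta_1, eta_2 - eta_1^2)."""
    return eta[0], eta[1] - eta[0] ** 2


rng = np.random.default_rng(0)
x = rng.normal(3.0, 0.5, size=1000)
mu_hat, var_hat = eta_to_mean_var(eta_mle(x))
print(mu_hat, var_hat)  # matches x.mean() and x.var() (maximum likelihood, ddof=0)
```

Because the estimate is an average, it can be updated in constant time when a new observation arrives, which matters for the real-time setting.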
$(S, g, \nabla^{(1)}, \nabla^{(-1)})$ is a dually flat space in which the natural parameters $\theta$ and the expectation parameters $\eta$ are dual affine coordinate systems. It generalizes the self-dual Euclidean geometry, with two dual Bregman divergences $B_F$ and $B_{F^\star}$ instead of the self-dual Euclidean distance.
Bregman divergence [Bregman, 1967].
$$B_\phi(\xi \,\|\, \xi') = \phi(\xi) - \phi(\xi') - (\xi - \xi')^\top \nabla\phi(\xi').$$
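A generic Bregman divergence only needs a strictly convex generator $\phi$ and its gradient. In the sketch below (NumPy, illustrative function names), the generator $\phi(\xi) = \frac{1}{2}\|\xi\|^2$ recovers half the squared Euclidean distance, the self-dual case, while $\phi(\xi) = \sum_k \xi_k \log \xi_k$ gives the generalized Kullback-Leibler divergence on positive vectors, which is not symmetric.

```python
import numpy as np


def bregman(phi, grad_phi, xi, xi_prime):
    """B_phi(xi || xi') = phi(xi) - phi(xi') - <xi - xi', grad phi(xi')>."""
    xi = np.asarray(xi, dtype=float)
    xi_prime = np.asarray(xi_prime, dtype=float)
    return phi(xi) - phi(xi_prime) - np.dot(xi - xi_prime, grad_phi(xi_prime))


def half_sq_norm(v):
    """phi(v) = ||v||^2 / 2, whose Bregman divergence is ||xi - xi'||^2 / 2."""
    return 0.5 * np.dot(v, v)


def half_sq_norm_grad(v):
    return v


def neg_entropy(v):
    """phi(v) = sum_k v_k log v_k on positive vectors (generalized KL divergence)."""
    return np.sum(v * np.log(v))


def neg_entropy_grad(v):
    return np.log(v) + 1.0


a = np.array([0.2, 0.5, 0.3])
b = np.array([0.4, 0.4, 0.2])
print(bregman(half_sq_norm, half_sq_norm_grad, a, b))  # equals 0.5 * ||a - b||^2
print(bregman(neg_entropy, neg_entropy_grad, a, b))    # equals sum(a log(a/b) - a + b)
print(bregman(neg_entropy, neg_entropy_grad, b, a))    # differs: not symmetric in general
```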
Canonical divergences of dually flat spaces, in bijection with exponential families [Amari & Nagaoka, 2000, Banerjee et al., 2005]:
$$D_{\mathrm{KL}}(p_\xi \,\|\, p_{\xi'}) = B_F(\theta' \,\|\, \theta) = B_{F^\star}(\eta \,\|\, \eta').$$
Generic algorithms that handle many distances [Dessein & Cont, 2011a].
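The identity can be checked numerically for two univariate Gaussians. The sketch below uses only the standard library and the usual Gaussian parametrization $T(x) = (x, x^2)$, $\theta = (\mu/\sigma^2, -1/(2\sigma^2))$, $\eta = (\mu, \mu^2 + \sigma^2)$, with $F^\star(\eta) = -\frac{1}{2}\log(2\pi e(\eta_2 - \eta_1^2))$. It compares the closed-form Kullback-Leibler divergence with $B_F(\theta' \| \theta)$ and $B_{F^\star}(\eta \| \eta')$; note the swapped argument order in natural coordinates. Function names are illustrative.

```python
import math


def F(t):
    """Gaussian log-normalizer in natural parameters t = (mu/var, -1/(2 var))."""
    return -t[0] ** 2 / (4.0 * t[1]) + 0.5 * math.log(-math.pi / t[1])


def grad_F(t):
    """eta = grad F(theta) = (mu, mu^2 + var)."""
    return (-t[0] / (2.0 * t[1]), t[0] ** 2 / (4.0 * t[1] ** 2) - 1.0 / (2.0 * t[1]))


def F_star(e):
    """Conjugate (negative entropy) in expectation parameters e = (mu, mu^2 + var)."""
    return -0.5 * math.log(2.0 * math.pi * math.e * (e[1] - e[0] ** 2))


def grad_F_star(e):
    """theta = grad F*(eta)."""
    var = e[1] - e[0] ** 2
    return (e[0] / var, -0.5 / var)


def bregman(f, grad_f, p, q):
    """B_f(p || q) = f(p) - f(q) - <p - q, grad f(q)>."""
    g = grad_f(q)
    return f(p) - f(q) - sum((pk - qk) * gk for pk, qk, gk in zip(p, q, g))


def kl_gauss(mu1, var1, mu2, var2):
    """Closed-form KL(N(mu1, var1) || N(mu2, var2))."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)


(mu1, var1), (mu2, var2) = (0.0, 1.0), (1.5, 3.0)
theta, theta_p = (mu1 / var1, -0.5 / var1), (mu2 / var2, -0.5 / var2)
eta, eta_p = (mu1, mu1 ** 2 + var1), (mu2, mu2 ** 2 + var2)
print(kl_gauss(mu1, var1, mu2, var2))
print(bregman(F, grad_F, theta_p, theta))        # B_F(theta' || theta)
print(bregman(F_star, grad_F_star, eta, eta_p))  # B_F*(eta || eta')
```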
What is the canonical geometry of exponential families?
Figure: Geometrical viewpoint [Nielsen & Nock, 2009].
Outline
1 Information geometry
2 Proposed system: General architecture, Sound descriptors modeling, Change detection
3 Obtained results
How to segment audio streams?
Architecture:
1 Represent the incoming audio stream with short-time sound descriptors $x_j$.
2 Model the features $x_j$ with probability distributions $p_{\xi_j}$ from a given statistical family.
3 Detect when a change in the parameters $\xi_j$ occurs.
Sequential change detection:
1 Accumulate the incoming observations $x_j$ in a growing window $x = (x_1, \ldots, x_n)$.
2 Incrementally try to detect a change at any time $i$ of the window until a change is detected.
3 Discard the observations $x_1, \ldots, x_i$ and start again with the initial window $x = (x_{i+1}, \ldots, x_n)$.
The problem thus reduces to finding one change-point in a given window $x = (x_1, \ldots, x_n)$.
Figure: Segmentation at time t.
Figure: Schema of the general architecture of the system.
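The sequential scheme can be sketched as a loop around a single-change-point test. In this minimal Python sketch, `find_change` is a hypothetical placeholder for any test that returns a change time $i$ (so that the window splits into $x_1, \ldots, x_i$ and $x_{i+1}, \ldots, x_n$) or `None`. The toy test used in the example, a squared mean difference weighted by $i(n-i)/n$ and compared with a fixed threshold (a GLR statistic for a change in mean with known unit variance), is an assumption for illustration and not the test of the proposed system; it is also quadratic in the window length, which a real-time implementation would avoid.

```python
from typing import Callable, Iterable, List, Optional, Sequence

ChangeTest = Callable[[Sequence[float]], Optional[int]]


def segment_stream(stream: Iterable[float], find_change: ChangeTest) -> List[int]:
    """Return absolute change-points (number of samples seen before each change)."""
    window: List[float] = []  # growing window x = (x_1, ..., x_n)
    offset = 0                # absolute index of the first sample in the window
    change_points: List[int] = []
    for x in stream:
        window.append(x)          # 1. accumulate the incoming observation
        i = find_change(window)   # 2. try to detect one change in the window
        if i is not None:
            change_points.append(offset + i)
            offset += i
            window = window[i:]   # 3. discard x_1..x_i, restart with x_{i+1}..x_n
    return change_points


def toy_mean_shift(window: Sequence[float], threshold: float = 10.0) -> Optional[int]:
    """Hypothetical toy test: best split by weighted squared mean difference."""
    n = len(window)
    best_i, best_stat = None, threshold
    for i in range(1, n):
        diff = sum(window[:i]) / i - sum(window[i:]) / (n - i)
        stat = i * (n - i) / n * diff * diff
        if stat > best_stat:
            best_i, best_stat = i, stat
    return best_i


signal = [0.0] * 10 + [5.0] * 10 + [-5.0] * 10
print(segment_stream(signal, toy_mean_shift))  # [10, 20]
```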
How to model sounds?
Computation of a sound descriptor $x_j$:
Fourier or constant-Q transforms for information on the spectral content.
Mel-frequency cepstral coefficients for information on the timbre.
Many other possibilities.
Modeling with a probability distribution $p_{\xi_j}$ from a statistical family:
Categorical distributions.
Multivariate Gaussian distributions.
Many other possibilities.
Figure: Sound descriptors modeling.
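As one possible pairing of a descriptor with a family, the sketch below computes short-time Fourier magnitude spectra and normalizes each frame to sum to one, so that it can be read as the parameter vector of a categorical distribution over frequency bins. The frame size, hop size, Hann window and the small `eps` floor are illustrative assumptions, not settings taken from the presented system; the function names are hypothetical.

```python
import numpy as np


def spectral_frames(signal, frame_size=1024, hop_size=512):
    """Magnitude spectra of Hann-windowed frames, one row per frame."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    frames = np.stack([signal[k * hop_size:k * hop_size + frame_size] * window
                       for k in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def to_categorical(spectra, eps=1e-12):
    """Normalize each spectrum into categorical parameters (positive, summing to one)."""
    spectra = spectra + eps  # avoid division by zero on silent frames
    return spectra / spectra.sum(axis=1, keepdims=True)


sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
descriptors = to_categorical(spectral_frames(audio))
print(descriptors.shape)  # (30, 513)
```

Each row is one observation $x_j$ on the probability simplex; a Gaussian model on MFCC vectors would be an equally valid choice within the same architecture.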
How to detect a change?
Problem: detect one change-point in a given window $x = (x_1, \ldots, x_n)$.
Assumptions
The samples $x_j$ are drawn independently from a given statistical model $S = \{p_\xi : \xi \in \Xi\}$ and are identically distributed before (resp., after) the change.
Usual approach [Basseville & Nikiforov, 1993]: assume that the parameters $\xi_0$ and $\xi_1$ before and after the change are known, and test
$H_0$: $x_1, \ldots, x_n \sim p_{\xi_0}$, against
$H_1^i$: $x_1, \ldots, x_i \sim p_{\xi_0}$ and $x_{i+1}, \ldots, x_n \sim p_{\xi_1}$.
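The CuSum test can be employed by thresholding the likelihood ratio:
$$\frac{1}{2}\mathrm{LR}_i = \log \frac{p(x \mid H_1^i)}{p(x \mid H_0)} = \log \frac{\prod_{j=1}^{i} p_{\xi_0}(x_j) \prod_{j=i+1}^{n} p_{\xi_1}(x_j)}{\prod_{j=1}^{i} p_{\xi_0}(x_j) \prod_{j=i+1}^{n} p_{\xi_0}(x_j)} = \sum_{j=i+1}^{n} \log \frac{p_{\xi_1}(x_j)}{p_{\xi_0}(x_j)}.$$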
$\xi_1$ unknown: the CuSum test can still be employed by computing generalized likelihood ratio statistics, where $\xi_1$ is replaced with its maximum likelihood estimate $\xi_1^i$ on $x_{i+1}, \ldots, x_n$.
$\xi_0$ unknown: the test can no longer be written in its simple form.
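With known $\xi_0$ and $\xi_1$, the statistic $\max_i \frac{1}{2}\mathrm{LR}_i$ need not be recomputed from scratch at each new sample: it obeys the classical CuSum recursion $g_n = \max(0, g_{n-1} + s_n)$ with $s_n = \log p_{\xi_1}(x_n) / p_{\xi_0}(x_n)$, the case $i = n$ (no change) contributing 0. The sketch below checks this equivalence on Gaussian data with known means and unit variance; the parameter values and the threshold `h` are arbitrary assumptions.

```python
import numpy as np


def log_ratio(x, mu0, mu1, var=1.0):
    """s_j = log p_xi1(x_j) / p_xi0(x_j) for Gaussians with known means and variance."""
    return ((x - mu0) ** 2 - (x - mu1) ** 2) / (2.0 * var)


def half_lr_max(s):
    """max over i in {0, ..., n} of sum_{j=i+1}^{n} s_j, evaluated directly."""
    tail_sums = np.concatenate([np.cumsum(s[::-1])[::-1], [0.0]])
    return tail_sums.max()


def cusum(s, h):
    """Recursive CuSum g_n = max(0, g_{n-1} + s_n); stops at the first n with g_n > h."""
    g, history = 0.0, []
    for n, s_n in enumerate(s, start=1):
        g = max(0.0, g + s_n)
        history.append(g)
        if g > h:
            return n, np.array(history)
    return None, np.array(history)


rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(1.0, 1.0, 100)])
s = log_ratio(x, mu0=0.0, mu1=1.0)
alarm, g = cusum(s, h=10.0)
print(alarm)  # alarm time; typically shortly after the change at sample 200
```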
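Information-geometric interpretation: the approach encompasses statistic-based and distance-based methods, and yields computationally efficient updates when the maximum likelihood estimates are used:
$$\frac{1}{2}\mathrm{GLR}_i = i\,F^\star\!\left(\eta^{i}_{0,\mathrm{mle}}\right) + (n-i)\,F^\star\!\left(\eta^{i}_{1,\mathrm{mle}}\right) - n\,F^\star\!\left(\eta_{0,\mathrm{mle}}\right),$$
where $\eta^{i}_{0,\mathrm{mle}}$ and $\eta^{i}_{1,\mathrm{mle}}$ are the maximum likelihood estimates on $x_1, \ldots, x_i$ and $x_{i+1}, \ldots, x_n$, and $\eta_{0,\mathrm{mle}}$ is the one on the whole window.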
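For the univariate Gaussian with unknown mean and variance, $T(x) = (x, x^2)$ and $F^\star(\eta) = -\frac{1}{2}\log(2\pi e(\eta_2 - \eta_1^2))$. Since each $\eta_{\mathrm{mle}}$ is an average of sufficient statistics, prefix sums give all $\frac{1}{2}\mathrm{GLR}_i$ of a window at constant cost per candidate $i$. The sketch below (NumPy; the family, the minimal segment length `min_len` and the synthetic data are assumptions) computes them this way and cross-checks one value against a direct difference of maximized log-likelihoods.

```python
import numpy as np


def f_star(eta1, eta2):
    """Conjugate log-normalizer of the univariate Gaussian (negative entropy)."""
    return -0.5 * np.log(2.0 * np.pi * np.e * (eta2 - eta1 ** 2))


def half_glr(x, min_len=2):
    """(1/2) GLR_i for i = min_len, ..., n - min_len, from prefix sums of T(x) = (x, x^2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    c1 = np.concatenate([[0.0], np.cumsum(x)])       # c1[i] = x_1 + ... + x_i
    c2 = np.concatenate([[0.0], np.cumsum(x ** 2)])  # c2[i] = x_1^2 + ... + x_i^2
    i = np.arange(min_len, n - min_len + 1)
    before = f_star(c1[i] / i, c2[i] / i)                                 # on x_1..x_i
    after = f_star((c1[n] - c1[i]) / (n - i), (c2[n] - c2[i]) / (n - i))  # on x_{i+1}..x_n
    whole = f_star(c1[n] / n, c2[n] / n)                                  # on x_1..x_n
    return i, i * before + (n - i) * after - n * whole


def max_loglik(x):
    """Direct maximized Gaussian log-likelihood, with MLE mean and variance."""
    mu, var = x.mean(), x.var()
    return np.sum(-0.5 * np.log(2.0 * np.pi * var) - (x - mu) ** 2 / (2.0 * var))


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(4.0, 1.0, 40)])  # mean change
idx, stat = half_glr(x)
print(idx[np.argmax(stat)])  # estimated change time, typically 60 or very close
```

In a full detector, the maximum of these statistics would be compared with a threshold inside the sequential loop of the general architecture.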
Information-geometric interpretation.Encompasses statistic and distance-based methods.Computationally efficient updates when considering the mles:
12 GLRi = iF ?(ηi0mle) + (n − i)F ?(ηi1mle)− nF ?(η0mle).
[email protected] October 26th 2011 IRCAM Research & Technology Seminar 12/21
IntroductionInformation geometry
Proposed systemObtained results
Conclusion
General architectureSound descriptors modelingChange detection
How to detect a change?Problem
Detect one change-point in a given window x = (x1, . . . , xn).
Assumptions (exponential families)
The samples $x_j$ are drawn independently from a given exponential family $S = \{p_\theta : \theta \in \Theta\}$, and are identically distributed before the change and, respectively, after the change.
Proposed approach for exponential families [Dessein & Cont, 2011b]: both $\theta_0$ and $\theta_1$ are unknown.
$H_0$: $x_1, \ldots, x_n \sim p_{\theta_0}$.
$H_1^i$: $x_1, \ldots, x_i \sim p_{\theta_0^i}$, and $x_{i+1}, \ldots, x_n \sim p_{\theta_1^i}$.
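The generalized likelihood ratio now becomes the following, where $p_\theta(x) = \exp(\theta^\top T(x) - F(\theta))$ up to a carrier measure that cancels in the ratios, and the last step uses $\nabla F(\theta_{0,\mathrm{mle}}^i) = \frac{1}{i}\sum_{j=1}^{i} T(x_j)$ and $\nabla F(\theta_{1,\mathrm{mle}}^i) = \frac{1}{n-i}\sum_{j=i+1}^{n} T(x_j)$:
$$
\begin{aligned}
\frac{1}{2}\mathrm{GLR}_i
&= \log \frac{\prod_{j=1}^{i} p_{\theta_0^i}(x_j) \prod_{j=i+1}^{n} p_{\theta_1^i}(x_j)}{\prod_{j=1}^{i} p_{\theta_0}(x_j) \prod_{j=i+1}^{n} p_{\theta_0}(x_j)} \\
&= \sum_{j=1}^{i} \log \frac{p_{\theta_0^i}(x_j)}{p_{\theta_0}(x_j)} + \sum_{j=i+1}^{n} \log \frac{p_{\theta_1^i}(x_j)}{p_{\theta_0}(x_j)} \\
&= \sum_{j=1}^{i} \left( (\theta_0^i - \theta_0)^\top T(x_j) - F(\theta_0^i) + F(\theta_0) \right) + \sum_{j=i+1}^{n} \left( (\theta_1^i - \theta_0)^\top T(x_j) - F(\theta_1^i) + F(\theta_0) \right) \\
&= i \left( F(\theta_0) - F(\theta_0^i) + (\theta_0^i - \theta_0)^\top \nabla F(\theta_{0,\mathrm{mle}}^i) \right) + (n - i) \left( F(\theta_0) - F(\theta_1^i) + (\theta_1^i - \theta_0)^\top \nabla F(\theta_{1,\mathrm{mle}}^i) \right).
\end{aligned}
$$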
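The last step can be checked numerically on one exponential family. The sketch below assumes the exponential distribution $p_\lambda(x) = \lambda e^{-\lambda x}$, written with natural parameter $\theta = -\lambda$, sufficient statistic $T(x) = x$ and log-normalizer $F(\theta) = -\log(-\theta)$, so that $\nabla F(\theta) = -1/\theta$ and the MLE satisfies $\nabla F(\theta_{\mathrm{mle}}) =$ sample mean. The data and parameter values are arbitrary; the function names are hypothetical.

```python
import math

# Assumed example family: exponential distribution with theta = -lambda < 0,
# p_theta(x) = exp(theta * x - F(theta)), T(x) = x.


def F(theta):
    return -math.log(-theta)


def grad_F(theta):
    return -1.0 / theta


def log_density(x, theta):
    return theta * x - F(theta)


def half_glr_direct(x, i, theta0, theta0_i, theta1_i):
    """Sum of log-likelihood ratios (second line of the derivation)."""
    before = sum(log_density(v, theta0_i) - log_density(v, theta0) for v in x[:i])
    after = sum(log_density(v, theta1_i) - log_density(v, theta0) for v in x[i:])
    return before + after


def half_glr_closed_form(x, i, theta0, theta0_i, theta1_i):
    """Last line: each sum of T(x_j) replaced by segment size times grad F at the MLE."""
    n = len(x)
    # MLE in natural coordinates: grad F(theta_mle) = sample mean, so theta_mle = -1 / mean.
    theta0_mle_i = -1.0 / (sum(x[:i]) / i)
    theta1_mle_i = -1.0 / (sum(x[i:]) / (n - i))
    return (i * (F(theta0) - F(theta0_i) + (theta0_i - theta0) * grad_F(theta0_mle_i))
            + (n - i) * (F(theta0) - F(theta1_i) + (theta1_i - theta0) * grad_F(theta1_mle_i)))
```

Both functions agree for any split $1 \le i \le n-1$ and any parameter values, since $i\,\nabla F(\theta_{0,\mathrm{mle}}^i)$ is exactly $\sum_{j \le i} T(x_j)$.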
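Test statistics
$$\frac{1}{2}\mathrm{GLR}_i = i \left( D_{\mathrm{KL}}\big(p_{\theta_{0,\mathrm{mle}}^i} \,\|\, p_{\theta_0}\big) - D_{\mathrm{KL}}\big(p_{\theta_{0,\mathrm{mle}}^i} \,\|\, p_{\theta_0^i}\big) \right) + (n - i) \left( D_{\mathrm{KL}}\big(p_{\theta_{1,\mathrm{mle}}^i} \,\|\, p_{\theta_0}\big) - D_{\mathrm{KL}}\big(p_{\theta_{1,\mathrm{mle}}^i} \,\|\, p_{\theta_1^i}\big) \right).$$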
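The rewriting in terms of Kullback-Leibler divergences rests on the standard identity, for two members of the same exponential family, $D_{\mathrm{KL}}(p_{\theta_a} \| p_{\theta_b}) = F(\theta_b) - F(\theta_a) - (\theta_b - \theta_a)^\top \nabla F(\theta_a)$, a Bregman divergence generated by $F$ with swapped arguments. The sketch below checks it on the exponential distribution (an assumed example, $\theta = -\lambda$, $F(\theta) = -\log(-\theta)$) against the textbook closed form $\log(\lambda_a/\lambda_b) + \lambda_b/\lambda_a - 1$; the function names are hypothetical.

```python
import math

# Assumed example family: exponential distribution, natural parameter theta = -lambda.


def F(theta):
    return -math.log(-theta)


def grad_F(theta):
    return -1.0 / theta


def kl_bregman(theta_a, theta_b):
    """D_KL(p_theta_a || p_theta_b) computed as the Bregman divergence B_F(theta_b, theta_a)."""
    return F(theta_b) - F(theta_a) - (theta_b - theta_a) * grad_F(theta_a)


def kl_exponential(lam_a, lam_b):
    """Closed form of D_KL(Exp(lam_a) || Exp(lam_b))."""
    return math.log(lam_a / lam_b) + lam_b / lam_a - 1.0
```

Expanding both divergences with this identity, the difference $D_{\mathrm{KL}}(p_{\theta_m} \| p_{\theta_0}) - D_{\mathrm{KL}}(p_{\theta_m} \| p_{\theta_0^i})$ reduces to $F(\theta_0) - F(\theta_0^i) + (\theta_0^i - \theta_0)^\top \nabla F(\theta_m)$, which is the bracket in the last line of the derivation with $\theta_m = \theta_{0,\mathrm{mle}}^i$.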
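This statistic has an information-geometric interpretation, and it encompasses both statistical and distance-based methods. When the unknown parameters are replaced by their maximum likelihood estimates (MLEs), it admits computationally efficient updates in terms of the expectation parameters and the convex conjugate $F^\star$ of $F$:
$$\frac{1}{2}\mathrm{GLR}_i = i\,F^\star(\eta_{0,\mathrm{mle}}^i) + (n - i)\,F^\star(\eta_{1,\mathrm{mle}}^i) - n\,F^\star(\eta_{0,\mathrm{mle}}),$$
where $\eta_{0,\mathrm{mle}}^i$, $\eta_{1,\mathrm{mle}}^i$ and $\eta_{0,\mathrm{mle}}$ are the averages of $T(x_j)$ over $x_1, \ldots, x_i$, over $x_{i+1}, \ldots, x_n$, and over the whole window.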
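Because each expectation-parameter MLE is an average of sufficient statistics, all the $\frac{1}{2}\mathrm{GLR}_i$ of a window can be obtained from prefix sums of $T(x_j)$ in linear time. A minimal sketch, assuming a univariate family with scalar $T$ and a user-supplied conjugate $F^\star$. The two conjugates shown are our own worked examples: unit-variance normal ($F(\theta) = \theta^2/2$, hence $F^\star(\eta) = \eta^2/2$) and exponential ($F(\theta) = -\log(-\theta)$, hence $F^\star(\eta) = -1 - \log\eta$). The `min_len` guard against very short segments is a hypothetical addition.

```python
import math


def half_glr_all(x, T, F_star, min_len=1):
    """Return {i: GLR_i / 2} for min_len <= i <= n - min_len, using
    GLR_i / 2 = i F*(eta0_i) + (n - i) F*(eta1_i) - n F*(eta0),
    where each eta is the mean of T over the corresponding segment."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + T(v))  # prefix[k] = T(x_1) + ... + T(x_k)
    total = prefix[n]
    whole = n * F_star(total / n)  # term for the MLE on the whole window
    stats = {}
    for i in range(min_len, n - min_len + 1):
        eta0_i = prefix[i] / i
        eta1_i = (total - prefix[i]) / (n - i)
        stats[i] = i * F_star(eta0_i) + (n - i) * F_star(eta1_i) - whole
    return stats


# Worked conjugates (assumed examples).
def normal_unit_var_F_star(eta):
    return 0.5 * eta * eta


def exponential_F_star(eta):
    return -1.0 - math.log(eta)
```

With 10 samples at 0 followed by 10 at 2 under the unit-variance model, the statistic peaks at $i = 10$ with value 10, which matches the divergence form $10 \cdot \tfrac{1}{2} + 10 \cdot \tfrac{1}{2}$ (each segment mean is at distance 1 from the overall mean 1). On constant data every statistic is 0.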
Outline
1 Information geometry
2 Proposed system
3 Obtained results (Synthetic data, Real-world data, Audio data)
Fixed-variance univariate normal densities
(Plot panels: Series and Change detection over samples 1 to 500; Original parameters and Estimated parameters for segments 1 to 3; Maximum generalized likelihood ratio; Computation time; Generalized likelihood ratio matrix over samples 1 to 500.)
Figure: Change detection in fixed-variance univariate normal data.
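The figure does not show how detections are triggered over time. One possible real-time loop, sketched below under our own assumptions and not necessarily the scheme used to produce these results: grow a window from the last detected change-point, compute $\max_i \frac{1}{2}\mathrm{GLR}_i$ with the closed form for a normal model with known standard deviation $\sigma$ (for which $F^\star(\eta) = \eta^2 / (2\sigma^2)$), and when it exceeds a threshold, report the maximizing $i$ and restart the window there. The threshold, $\sigma$, the synthetic means and all function names are hypothetical illustrative choices.

```python
import random


def max_half_glr_normal(window, sigma):
    """Return (i, max_i GLR_i / 2) over 1 <= i <= n - 1 for a normal model with
    known standard deviation sigma, for which F*(eta) = eta^2 / (2 sigma^2)."""
    n = len(window)
    if n < 2:
        return None, 0.0

    def f_star(eta):
        return eta * eta / (2.0 * sigma * sigma)

    total = sum(window)
    whole = n * f_star(total / n)
    best_i, best = None, float("-inf")
    head = 0.0  # running sum of the first i samples of the window
    for i in range(1, n):
        head += window[i - 1]
        stat = i * f_star(head / i) + (n - i) * f_star((total - head) / (n - i)) - whole
        if stat > best:
            best_i, best = i, stat
    return best_i, best


def detect_changes(series, sigma=1.0, threshold=20.0):
    """Grow a window from the last detected change; when the maximum GLR_i / 2
    exceeds the threshold, report the change-point and restart just after it."""
    changes = []
    start = 0
    for end in range(1, len(series) + 1):
        i, stat = max_half_glr_normal(series[start:end], sigma)
        if i is not None and stat > threshold:
            start += i  # absolute index of the first post-change sample
            changes.append(start)
    return changes


def make_series(seed=0, sigma=0.2):
    """Hypothetical synthetic data: three segments of 100 samples with means -1, 1, 0."""
    rng = random.Random(seed)
    means = [-1.0] * 100 + [1.0] * 100 + [0.0] * 100
    return [rng.gauss(m, sigma) for m in means]
```

Here every new sample recomputes the window statistics from scratch, so the per-sample cost grows with the window length; keeping running sums of $T(x_j)$ across samples would be one way to bound it in a real-time setting.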
Univariate exponential densities
(Plot panels: Series and Change detection over samples 1 to 500; Original parameters and Estimated parameters for segments 1 to 3; Maximum generalized likelihood ratio; Computation time; Generalized likelihood ratio matrix over samples 1 to 500.)
Figure: Change detection in univariate exponential data.
Multivariate normal densities
(Plot panels: Series and Change detection over samples 1 to 500; Original parameters and Estimated parameters for segments 1 to 3; Maximum generalized likelihood ratio; Computation time; Generalized likelihood ratio matrix over samples 1 to 500.)
Figure: Change detection in multivariate normal data.
Well-log
(Plot panels: Time series and Segmentation over samples 1 to 1000; Maximum generalized likelihood ratio; Computation time; Estimated parameters per segment.)
Figure: Segmentation of well-log data.
Daily log-return of the Dow Jones
(Plot panels: Time series and Segmentation over the daily samples; Maximum generalized likelihood ratio; Computation time; Estimated parameters per segment.)
Figure: Segmentation of the daily log-return of the Dow Jones.
Speech
(Plot panels: Original audio waveform against time (s), from 0 to 14 s; Generalized likelihood ratio matrix with frame number on both axes.)
Figure: Speaker change detection in a speech fragment.
Polyphonic music
(Plot panels: Original audio waveform; Piano roll of pitch, from F2 to A5, against time (s), from 0 to 35 s.)
Figure: Note change detection in a polyphonic music excerpt.
What we (don’t) have
Summary and perspectives.
Representations. Many possibilities: combinations of descriptors, feature selection.
Descriptors modeling. Exponential families and Bregman divergences, mixture models. Model selection. Other geometries, divergences, test statistics.
Temporality of events. Assumption of quasi-stationarity. Non-stationarity modeling. Conditional distributions, linear/non-linear systems.
Applications. Evaluation on large datasets in audio and other domains. Onset detection, music segmentation, speaker segmentation, etc. First stage in real-time systems for polyphonic music transcription, music similarity analysis, computer-assisted improvisation.
Thank you for your attention! Questions?
This work was supported by a doctoral fellowship from the UPMC (EDITE) and by a grant from the JST-CNRS ICT (Improving the VR Experience).
Bibliography
Amari, S.-i. & Nagaoka, H. (2000). Methods of information geometry, volume 191 of Translations of Mathematical Monographs. American Mathematical Society.
Banerjee, A., Merugu, S., Dhillon, I. S., & Ghosh, J. (2005). Clustering with Bregman divergences. Journal of Machine Learning Research, 6, 1705–1749.
Basseville, M. & Nikiforov, V. (1993). Detection of abrupt changes: Theory and application. Englewood Cliffs, NJ, USA: Prentice-Hall, Inc.
Bello, J. P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M. B. (2005). A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 1035–1047.
Bregman, L. M. (1967). The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3), 200–217.
Chentsov, N. N. (1982). Statistical decision rules and optimal inference, volume 53 of Translations of Mathematical Monographs. American Mathematical Society.
Cont, A., Dubnov, S., & Assayag, G. (2011). On the information geometry of audio streams with applications to similarity computing. IEEE Transactions on Audio, Speech and Language Processing, 19(4), 837–846.
Darmois, G. (1935). Sur les lois de probabilités à estimation exhaustive. Comptes Rendus des Séances de l’Académie des Sciences, 200, 1265–1266.
Delacourt, P. & Wellekens, C. J. (2000). DISTBIC: A speaker-based segmentation for audio data indexing. Speech Communication, 32(1–2), 111–126.
Dessein, A. & Cont, A. (2011a). Applications of information geometry to audio signal processing. In 14th International Conference on Digital Audio Effects (DAFx), Paris, France.
Dessein, A. & Cont, A. (2011b). Information-geometric approach to real-time audio change detection. Submitted.
Foote, J. (2000). Automatic audio segmentation using a measure of audio novelty. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), volume 1 (pp. 452–455). New York City, NY, USA.
Grasic, M., Kos, M., & Kacic, Z. (2010). Online speaker segmentation and clustering using cross-likelihood ratio calculation with reference criterion selection. IET Signal Processing, 4(6), 673–685.
Kemp, T., Schmidt, M., Westphal, M., & Waibel, A. (2000). Strategies for automatic segmentation of audio data. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 3 (pp. 1423–1426). Istanbul, Turkey.
Koopman, B. O. (1936). On distributions admitting a sufficient statistic. Transactions of the American Mathematical Society, 39(3), 399–409.
Kotti, M., Benetos, E., & Kotropoulos, C. (2008). Computationally efficient and robust BIC-based speaker segmentation. IEEE Transactions on Audio, Speech, and Language Processing, 16(5), 920–933.
Nielsen, F. & Garcia, V. (2009). Statistical exponential families: A digest with flash cards.
Nielsen, F. & Nock, R. (2009). Sided and symmetrized Bregman centroids. IEEE Transactions on Information Theory, 55(6), 2882–2904.
Omar, M. K. & Chaudhari, U. (2005). Blind change detection for audio segmentation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 501–504). Philadelphia, PA, USA.
Pitman, E. J. G. (1936). Sufficient statistics and intrinsic accuracy. Mathematical Proceedings of the Cambridge Philosophical Society, 32(4), 567–579.
Rao, C. R. (1945). Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37, 81–91.
Siegler, M. A., Jain, U., Raj, B., & Stern, R. M. (1997). Automatic segmentation, classification and clustering of broadcast news audio. In Proceedings of the DARPA Speech Recognition Workshop (pp. 97–99). Chantilly, VA, USA.
Sundaram, H. & Chang, S.-F. (2000). Audio scene segmentation using multiple features, models and time scales. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 4 (pp. 2441–2444). Istanbul, Turkey.
Tritschler, A. & Gopinath, R. A. (1999). Improved speaker segmentation and segments clustering using the Bayesian information criterion. In Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech), volume 2 (pp. 679–682). Budapest, Hungary.