
THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

Date post: 22-Feb-2016
FONETIK 2008. Kornel Laskowski, Mattias Heldner and Jens Edlund. interACT, Carnegie Mellon University, Pittsburgh PA, USA; Centre for Speech Technology, KTH Stockholm, Sweden. Speaker: Hsiao-Tsung.
Transcript
Page 1: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

FONETIK 2008

Kornel Laskowski, Mattias Heldner and Jens Edlund

interACT, Carnegie Mellon University, Pittsburgh PA, USA

Centre for Speech Technology, KTH Stockholm, Sweden

Speaker: Hsiao-Tsung

Page 2: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

Introduction

While speech recognition systems long ago transitioned from formant localization to spectral (vector-valued) formant representations, prosodic processing continues to rely squarely on a pitch tracker's ability to identify a peak corresponding to the fundamental frequency (F0) of the speaker.

Even if a robust, local, analytic, statistical estimate of absolute pitch were available, applications require a representation of pitch variation, and they go to considerable additional effort to identify a speaker-dependent quantity for normalization.

Page 3: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

The Fundamental Frequency Variation Spectrum

Instantaneous variation in pitch is normally computed by determining a single scalar, the F0, at two temporally adjacent instants and forming their difference.
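
As a point of comparison, the conventional scalar approach described above can be sketched as follows. This is a minimal illustration, not code from the presentation: the function name `delta_f0`, the NaN convention for unvoiced frames, and the 8 ms hop are assumptions made for the example.

```python
import numpy as np

def delta_f0(f0_hz, hop_s):
    """Difference of F0 between temporally adjacent frames, in Hz per second.

    Assumption: unvoiced frames are marked NaN, so any difference that
    touches an unvoiced frame is itself NaN.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    return np.diff(f0) / hop_s

# Hypothetical F0 track from a pitch tracker, one value per frame.
track = np.array([100.0, 102.0, 105.0, np.nan, 110.0])
rates = delta_f0(track, hop_s=0.008)
```

The sketch makes the dependence on the pitch tracker visible: a single missing or wrong F0 value corrupts both differences that involve it.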

Page 4: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

The Fundamental Frequency Variation Spectrum

We propose a vector-valued representation of pitch variation, inspired by vanishing-point perspective.

While the standard inner product between two vectors can be viewed as the summation of pair-wise products with pairs selected by orthonormal projection onto a point at infinity,

F: signal’s spectral content (512-point FFT)

Page 5: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

The Fundamental Frequency Variation Spectrum

the proposed vanishing-point product induces a 1-point perspective projection onto a point at

Page 6: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

The Fundamental Frequency Variation Spectrum

The FFV spectrum is then given by

is undefined over the interval [-T0, +T0]

Page 7: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

The Fundamental Frequency Variation Spectrum

A support for which is continuous over

In practice, we compute using magnitude rather than complex spectra.

Page 8: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

The Fundamental Frequency Variation Spectrum

FL and FR are 512-point Fourier transforms, computed every 8 ms.

However, the discrete transforms FL and FR are in general not defined at the corresponding dilated frequencies.

We resort to linear interpolation using the coefficients
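
The interpolation step can be sketched as below. The slide's interpolation coefficients are not preserved in this transcript, so this is only an illustration under assumptions: the helper name `sample_dilated` is hypothetical, the spectrum is a magnitude array indexed by bin, and positions beyond the last bin are treated as zero.

```python
import numpy as np

def sample_dilated(mag, factor):
    """Sample a magnitude spectrum at the non-integer bins factor * k.

    np.interp performs linear interpolation between the two neighbouring
    integer bins. Assumption: positions past the last bin contribute 0.
    """
    k = np.arange(len(mag), dtype=float)
    return np.interp(factor * k, k, mag, right=0.0)

# Toy spectrum: value 2 * k at bin k.
mag = np.array([0.0, 2.0, 4.0, 6.0])
half = sample_dilated(mag, 0.5)   # reads bins 0, 0.5, 1, 1.5
```

With a factor below 1 every requested position lies inside the original bin range, so no extrapolation is needed.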

Page 9: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

The Fundamental Frequency Variation Spectrum

Energy independent
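
One common way to make a comparison of two spectra independent of energy is to divide their inner product by the product of their norms. The sketch below applies that idea across a range of dilations. The FFV formula itself is missing from this transcript, so the function `normalized_similarity`, the dilation factor `2 ** rho`, and the sign convention for `rho` are all assumptions, not the authors' definition.

```python
import numpy as np

def dilate(mag, factor):
    """Linearly interpolate a magnitude spectrum at bins factor * k (factor <= 1)."""
    k = np.arange(len(mag), dtype=float)
    return np.interp(factor * k, k, mag)

def normalized_similarity(mag_left, mag_right, rhos):
    """Normalized inner product of two magnitude spectra for each dilation rho.

    Assumption: rho is in octaves, and only one of the two spectra is
    dilated, always by a factor <= 1 so that no bin is extrapolated.
    """
    out = np.zeros(len(rhos))
    for i, rho in enumerate(rhos):
        if rho < 0:
            a, b = dilate(mag_left, 2.0 ** rho), mag_right
        else:
            a, b = mag_left, dilate(mag_right, 2.0 ** (-rho))
        denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
        out[i] = np.sum(a * b) / denom if denom > 0 else 0.0
    return out

# Synthetic harmonic comb: Gaussian peaks at multiples of bin 20.
k = np.arange(256)
comb = sum(np.exp(-0.5 * ((k - 20 * h) / 2.0) ** 2) for h in range(1, 12))
rhos = np.array([-0.1, -0.05, 0.0, 0.05, 0.1])
quiet = normalized_similarity(comb, comb, rhos)
loud = normalized_similarity(comb, 3.0 * comb, rhos)
```

In this toy case `quiet` and `loud` agree because the scale factor cancels in the normalization, and the similarity peaks at `rho = 0` because the two spectra have the same harmonic spacing.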

Page 10: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

Filterbank

Rapidly changing

Slowly changing

Page 11: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

Filterbank

Page 12: THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

Discussion

Initial experiments along these lines show that such HMMs, when trained on dialogue data, corroborate research on human turn-taking behavior in conversations.

The FFV representation does not require peak identification, dynamic time warping, median filtering, landmark detection, linearization, or mean pitch estimation and subtraction.

Immediate next steps include fine-tuning the filter banks and the HMM topologies, and testing the results on other tasks where pitch movements are expected to play a role, such as the attitudinal coloring of short feedback utterances, speaker verification, and automatic speech recognition for tonal languages.

