THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

Posted on 22-Feb-2016


THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM. FONETIK 2008. Kornel Laskowski, Mattias Heldner and Jens Edlund; interACT, Carnegie Mellon University, Pittsburgh PA, USA; Centre for Speech Technology, KTH Stockholm, Sweden. Speaker: Hsiao-Tsung.


THE FUNDAMENTAL FREQUENCY VARIATION SPECTRUM

FONETIK 2008

Kornel Laskowski, Mattias Heldner and Jens Edlund

interACT, Carnegie Mellon University, Pittsburgh PA, USA

Centre for Speech Technology, KTH Stockholm, Sweden

Speaker: Hsiao-Tsung

Introduction

While speech recognition systems long ago transitioned from formant localization to spectral (vector-valued) formant representations, prosodic processing continues to rely squarely on a pitch tracker's ability to identify a peak corresponding to the fundamental frequency (f0) of the speaker.

Even if a robust, local, analytic, statistical estimate of absolute pitch were available, applications require a representation of pitch variation, and they go to considerable additional effort to identify a speaker-dependent quantity (such as mean pitch) for normalization.

The Fundamental Frequency Variation Spectrum

Instantaneous variation in pitch is normally computed by determining a single scalar, the f0, at two temporally adjacent instants and forming their difference.
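
For contrast with what follows, the conventional scalar approach can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a pitch track with one f0 value (in Hz) per frame, with 0.0 marking unvoiced frames where the tracker found no peak, and the function name `delta_f0` is hypothetical. The difference is undefined whenever either frame lacks an f0 estimate, which is exactly the dependence on peak identification that the rest of the talk avoids.

```python
import math


def delta_f0(f0_track):
    """Conventional pitch variation: difference of scalar f0 values (Hz)
    at temporally adjacent frames.

    Assumes 0.0 marks an unvoiced frame (no peak found by the pitch
    tracker); the difference is then undefined and returned as NaN.
    """
    deltas = []
    for prev, curr in zip(f0_track, f0_track[1:]):
        if prev > 0.0 and curr > 0.0:
            deltas.append(curr - prev)
        else:
            deltas.append(math.nan)
    return deltas


print(delta_f0([120.0, 125.0, 0.0, 130.0]))  # [5.0, nan, nan]
```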


We propose a vector-valued representation of pitch variation, inspired by vanishing-point perspective.

While the standard inner product between two vectors can be viewed as the summation of pairwise products, with pairs selected by orthonormal projection onto a point at infinity, the proposed vanishing-point product induces a 1-point perspective projection onto a point at …

F: the signal's spectral content (512-point FFT)
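
The geometry behind the name can be checked numerically. The sketch below assumes, as a hypothetical convention not recoverable from the slide, that the vanishing-point product pairs the left spectrum at frequency 2^-ρ·f with the right spectrum at 2^+ρ·f. Drawing the left frequency axis at x = 0 and the right one at x = 1, every line joining such a pair meets the zero-frequency line at the same point, x = 1/(1 − 2^(2ρ)), whatever f is; at ρ = 0 the lines are parallel and that point recedes to infinity, which recovers the pairing of the ordinary inner product. The function name `convergence_point` is illustrative only.

```python
import math


def convergence_point(f, rho):
    """x-coordinate at which the line joining the left-axis point
    (0, 2**-rho * f) and the right-axis point (1, 2**rho * f) crosses
    the zero-frequency line y = 0.

    Returns math.inf when the two points are at equal height (rho == 0),
    i.e. the pairing lines are parallel, as in the ordinary inner product.
    """
    a = 2.0 ** -rho * f  # frequency read from the left spectrum
    b = 2.0 ** rho * f   # frequency read from the right spectrum
    if a == b:
        return math.inf
    # Line: y = a + (b - a) * x, so y = 0 at x = a / (a - b).
    return a / (a - b)


for f in (100.0, 250.0, 1000.0):
    print(f, convergence_point(f, 0.25))  # same x for every f (up to rounding)
print(convergence_point(200.0, 0.0))      # inf
```

Because the intersection does not depend on f, a single dilation ρ defines one vanishing point shared by all frequency pairs, which is what makes the pairing a 1-point perspective projection under this assumed convention.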


The FFV spectrum is then given by

is undefined over the interval [-T0, +T0]


A support for … which is continuous over …

In practice, we compute the FFV spectrum using magnitude rather than complex spectra.
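
A toy computation shows how such a representation can encode pitch change without locating any peak. This is a sketch under stated assumptions, not the authors' implementation: the normalized product form g(ρ) = Σ_f |F_L(2^-ρ f)|·|F_R(2^ρ f)| / sqrt(Σ_f |F_L(2^-ρ f)|² · Σ_f |F_R(2^ρ f)|²), the frequency band, the ρ grid and the synthetic harmonic spectra are all assumptions, and the function names are hypothetical. With nonnegative magnitude spectra, g lies between 0 and 1, and the normalization makes it insensitive to the overall energy of either frame.

```python
import math


def harmonic_magnitude(f0, width=8.0, n_harmonics=12):
    """Synthetic magnitude spectrum (hypothetical test signal): Gaussian
    bumps of the given width (Hz) at the first n_harmonics multiples of f0."""
    def spectrum(f):
        return sum(math.exp(-0.5 * ((f - k * f0) / width) ** 2)
                   for k in range(1, n_harmonics + 1))
    return spectrum


def ffv_spectrum(mag_left, mag_right, rhos, freqs):
    """Assumed FFV form: for each dilation rho, the normalized product of the
    left magnitude spectrum read at 2**-rho * f and the right magnitude
    spectrum read at 2**rho * f, summed over the frequencies in freqs."""
    g = []
    for rho in rhos:
        left = [mag_left(2.0 ** -rho * f) for f in freqs]
        right = [mag_right(2.0 ** rho * f) for f in freqs]
        num = sum(a * b for a, b in zip(left, right))
        den = math.sqrt(sum(a * a for a in left) * sum(b * b for b in right))
        g.append(num / den if den > 0.0 else 0.0)
    return g


rhos = [i * 0.01 for i in range(-20, 21)]        # dilation grid (assumed)
freqs = [float(f) for f in range(60, 1500, 4)]   # Hz, assumed analysis band
left = harmonic_magnitude(120.0)                 # earlier frame
right = harmonic_magnitude(120.0 * 2.0 ** 0.1)   # later frame, pitch rising
g = ffv_spectrum(left, right, rhos, freqs)
best = rhos[g.index(max(g))]
print(round(best, 2))  # 0.05
```

With f0 rising by a factor of 2^0.1, the assumed pairing aligns all harmonics when 2^(2ρ) = 2^0.1, so g peaks at ρ = 0.05; multiplying either spectrum by a constant leaves every value of g unchanged.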


FL and FR are 512-point Fourier transforms, computed every 8 ms.

However, the discrete transforms FL and FR are in general not defined at the corresponding dilated frequencies.

We resort to linear interpolation using the coefficients
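
The interpolation step can be sketched as follows. This assumes a one-sided magnitude spectrum with 257 bins (from a 512-point FFT) and plain linear interpolation between the two neighboring bins, with weights 1 − α and α, where α is the fractional part of the dilated bin position; whether these match the coefficients on the original slide cannot be confirmed, and the function names, the sign convention and the zero value outside the spectrum are assumptions of this sketch.

```python
import math


def interp_at(mag, pos):
    """Linearly interpolate the discrete spectrum `mag` (len >= 2) at a
    fractional bin position `pos`; returns 0.0 outside [0, len(mag) - 1]."""
    if pos < 0.0 or pos > len(mag) - 1:
        return 0.0
    lo = min(math.floor(pos), len(mag) - 2)
    alpha = pos - lo  # weight of the upper neighboring bin
    return (1.0 - alpha) * mag[lo] + alpha * mag[lo + 1]


def dilated_samples(mag, rho, sign, bins):
    """Read `mag` at the dilated positions 2**(sign * rho) * k for each bin
    index k, e.g. sign = -1 for the left spectrum and +1 for the right one
    (the sign convention is an assumption of this sketch)."""
    scale = 2.0 ** (sign * rho)
    return [interp_at(mag, scale * k) for k in bins]


n_bins = 512 // 2 + 1                      # one-sided bins of a 512-point FFT
ramp = [float(k) for k in range(n_bins)]   # toy spectrum: value equals bin index
print(dilated_samples(ramp, 0.5, -1, [0, 4, 8]))  # approx. [0.0, 2.828, 5.657]
print(interp_at(ramp, 300.0))                     # 0.0 (beyond the last bin)
```

A linear ramp is a convenient check because linear interpolation reproduces it exactly: the value read at any in-range position equals the position itself.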


[Figure: filterbank applied to the FFV spectrum; slide labels: energy independent, rapidly changing, slowly changing]
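
Reducing the FFV spectrum to a few features with a filterbank can be sketched as below. Everything specific here is hypothetical: the triangular filter shape, the filter names, and the centers and widths on the ρ axis were chosen only to illustrate a central filter for flat pitch flanked by filters for slowly and rapidly changing pitch; the slide's actual filters are not reproduced. Each filter output is simply a weighted sum of the FFV spectrum values.

```python
import math


def triangular(center, half_width):
    """Triangular weighting function on the rho axis (hypothetical shape)."""
    def weight(rho):
        return max(0.0, 1.0 - abs(rho - center) / half_width)
    return weight


# Hypothetical filter layout: flat pitch in the middle, slow and fast
# changes on either side (positive rho = rising pitch in this sketch).
FILTERS = {
    "flat": triangular(0.0, 0.02),
    "slow_rise": triangular(0.03, 0.02),
    "slow_fall": triangular(-0.03, 0.02),
    "fast_rise": triangular(0.08, 0.04),
    "fast_fall": triangular(-0.08, 0.04),
}


def apply_filterbank(g, rhos, filters=FILTERS):
    """Weighted sum of the FFV values g(rho) under each filter."""
    return {name: sum(w(r) * v for r, v in zip(rhos, g))
            for name, w in filters.items()}


rhos = [i * 0.01 for i in range(-20, 21)]
g_flat = [math.exp(-(r / 0.01) ** 2) for r in rhos]             # peak at rho = 0
g_rising = [math.exp(-((r - 0.08) / 0.01) ** 2) for r in rhos]  # peak at 0.08
out_flat = apply_filterbank(g_flat, rhos)
out_rising = apply_filterbank(g_rising, rhos)
print(max(out_flat, key=out_flat.get))      # flat
print(max(out_rising, key=out_rising.get))  # fast_rise
```

Collapsing each frame's FFV spectrum to a handful of filter outputs like this yields a short feature vector per frame, a size suited to sequence models such as the HMMs mentioned in the discussion.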

Discussion

Initial experiments along these lines show that such HMMs, when trained on dialogue data, corroborate research on human turn-taking behavior in conversations.

The FFV representation does not require peak identification, dynamic time warping, median filtering, landmark detection, linearization, or mean pitch estimation and subtraction.

Immediate next steps include fine-tuning the filter banks and the HMM topologies, and testing the results on other tasks where pitch movements are expected to play a role, such as the attitudinal coloring of short feedback utterances, speaker verification, and automatic speech recognition for tonal languages.