GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval: Pitch Detection and Tracking, Juhan Nam
Transcript
Slide 2
Introduction
- What is music described with?
  - Most musical symbols are notes, which mainly carry pitch information
  - We (our brains) usually memorize music as a melody, that is, a sequence of pitches
Slide 3
Outline
- Introduction
  - Definition of Pitch
  - Information in Pitch
  - Pitch and Harmonicity
- Pitch Detection Algorithms
  - Time-Domain Approaches
  - Frequency-Domain Approaches
  - Psychoacoustic Model Approaches
  - Learning-based Approaches
- Pitch Tracking
- Applications
Slide 4
Definition of Pitch
- Pitch
  - Defined as the "auditory attribute of sound according to which sounds can be ordered on a scale from low to high" (ANSI, 1994)
  - One way of measuring pitch is to find the frequency of a sine wave that a listener matches to the target sound in a psychophysical experiment
  - Pitch is therefore subjective and varies between individuals (e.g., tone-deaf listeners)
- Fundamental Frequency
  - A physical attribute of sound, measured from periodicity
  - Often called F0
- Pitch should therefore be distinguished from F0
  - However, the two are very close for the sounds of interest here (i.e., musical sounds), so "pitch" is often used interchangeably with F0
Slide 5
Information in Pitch
- Music
  - Melody or notes
  - Harmony (when there are multiple notes with different pitches)
  - Size (or register) of musical instruments: bass, cello, violin
- Speech
  - Person: gender, age, identity
  - Context: question, mood, attitude
  - Meaning: e.g., lexical tones in Chinese (Mandarin)
- Others
  - Vocalization of animals (e.g., bird chirps, whale calls): size and type, communication
Slide 6
Pitch and Harmonicity
- Not all sounds have pitch
- Harmonic sounds
  - Regularly spaced harmonic partials
  - Speech or singing voice: vowels
  - Musical instruments: piano*, guitar, strings, woodwinds, brass, organ
- Non-harmonic sounds
  - No harmonic pattern, or irregularly spaced partials
  - Speech or singing voice: consonants
  - Musical instruments: drums, mallet instruments (pitched but not harmonic)
* Inharmonicity in piano
[Figure: vibraphone example. From Klapuri's slides]
Slide 7
Pitch Detection Algorithms
- Taxonomy of algorithms
  - Time-domain approaches
  - Frequency-domain approaches
  - Psychoacoustic model approaches
  - Learning-based approaches
Slide 8
Time-Domain Approach
- Basic ideas
  - Periodicity: x(t) = x(t+T)
  - Measure the similarity (or distance) between two segments of the signal
  - Find the period T that gives the highest similarity (smallest distance)
- Two main approaches
  - Auto-correlation function (ACF): similarity by inner product
  - Average magnitude difference function (AMDF): distance by difference (e.g., L1 or L2 norm)
Slide 9
Auto-Correlation Function (ACF)
- Measures self-similarity as the inner product between the signal and a lagged copy of itself (Sondhi, 1968)
[Figure: singing voice example]
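As a rough illustration of ACF-based pitch detection, the sketch below computes the raw autocorrelation of one frame and picks the lag with the largest value inside a plausible pitch range. The function names, the search range (`fmin`, `fmax`), and the synthetic test tone are assumptions made for this example, not part of the slides.

```python
import numpy as np

def acf(x, max_lag):
    """Raw autocorrelation r[tau] = sum_n x[n] * x[n + tau], tau = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - tau], x[tau:]) for tau in range(max_lag + 1)])

def acf_pitch(x, sr, fmin=80.0, fmax=1000.0):
    """Return sr / lag for the lag that maximizes the ACF within [sr/fmax, sr/fmin]."""
    min_lag = int(sr / fmax)
    max_lag = int(sr / fmin)
    r = acf(x, max_lag)
    lag = min_lag + int(np.argmax(r[min_lag:max_lag + 1]))
    return sr / lag

# Synthetic harmonic tone with F0 = 200 Hz (period of 80 samples at 16 kHz).
sr = 16000
t = np.arange(2048) / sr
x = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in (1, 2, 3))
f0_est = acf_pitch(x, sr)
print(f0_est)
```

Restricting the search to a lag range is one simple way to sidestep the large peak at zero lag; it does not by itself prevent octave errors.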
Slide 10
Auto-Correlation Function (ACF)
[Figure panels: biased auto-correlation; unbiased auto-correlation]
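The difference between the two estimators can be sketched as follows, assuming the common convention in which the biased estimate divides every lag by the frame length N and the unbiased estimate divides by the number of overlapping samples N - tau. The slide's exact normalization is not in the transcript, so treat these definitions as an assumption.

```python
import numpy as np

def acf_biased(x, max_lag):
    """Divide by N at every lag: values taper toward zero as the lag grows."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - tau], x[tau:]) / n for tau in range(max_lag + 1)])

def acf_unbiased(x, max_lag):
    """Divide by the overlap length N - tau: no taper, but noisier at large lags."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - tau], x[tau:]) / (n - tau) for tau in range(max_lag + 1)])

# Sine with a period of 80 samples, five periods long.
x = np.sin(2 * np.pi * np.arange(400) / 80)
r_b = acf_biased(x, 200)
r_u = acf_unbiased(x, 200)
print(r_b[160], r_u[160])  # the biased value at two periods is smaller
```

The taper of the biased estimate favors short lags, which is one source of the zero-lag bias discussed later in the slides.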
Slide 11
Comparison of spectrogram and ACF
[Figure panels: spectrogram (tracking max values); ACF (tracking max values)]
Slide 12
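Interpretation of ACF in Frequency Domain
- By the convolution theorem, the auto-correlation can be computed in the frequency domain, and computed efficiently using the FFT
- Thus, the ACF can be computed as the inverse FFT of the power spectrum (the squared magnitude of the FFT of the signal)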
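A minimal sketch of the FFT route, assuming a real-valued frame and zero-padding to at least 2N - 1 samples so that the circular correlation produced by the FFT matches the linear one. The helper names are illustrative; the direct version is included only to check the result.

```python
import numpy as np

def acf_fft(x, max_lag):
    """ACF as the inverse FFT of the power spectrum (zero-padded to avoid wrap-around)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    n_fft = 1 << (2 * n - 1).bit_length()  # power of two >= 2N - 1
    spectrum = np.fft.rfft(x, n_fft)
    r = np.fft.irfft(np.abs(spectrum) ** 2, n_fft)
    return r[:max_lag + 1]

def acf_direct(x, max_lag):
    """Reference implementation: explicit inner products, O(N * max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - tau], x[tau:]) for tau in range(max_lag + 1)])

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
print(np.allclose(acf_fft(x, 100), acf_direct(x, 100)))
```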
Slide 13
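Interpretation of ACF in Frequency Domain
- This is equivalent to weighting the power spectrum with a cosine whose period along the frequency axis is set by the lag
- The ACF is therefore a simple template-based approach in the frequency domain
  - Positive weights for (harmonic) peaks and negative weights for valleys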
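The template view can be checked numerically. The sketch below assumes the circular ACF of a real frame of length N, for which r(tau) = (1/N) * sum_k |X(k)|^2 * cos(2*pi*k*tau/N); the sine terms cancel because the power spectrum of a real signal is even. Function names are illustrative.

```python
import numpy as np

def acf_cosine_template(x, tau):
    """Weight the power spectrum with a cosine template whose period is N / tau bins."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    power = np.abs(np.fft.fft(x)) ** 2
    template = np.cos(2 * np.pi * np.arange(n) * tau / n)  # +1 at multiples of N/tau, -1 halfway between
    return np.dot(power, template) / n

def circular_acf(x, tau):
    """Circular autocorrelation: sum_n x[n] * x[(n + tau) mod N]."""
    x = np.asarray(x, dtype=float)
    return np.dot(x, np.roll(x, -tau))

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
for tau in (0, 5, 17):
    print(tau, np.isclose(acf_cosine_template(x, tau), circular_acf(x, tau)))
```

The cosine template is smooth rather than peaked, which is why all partials and all valleys receive comparable weight.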
Slide 14
Problems in ACF
- Biased toward the large peak around zero lag
- Not robust to octave errors, particularly toward lower octaves
- The ACF is sensitive to amplitude changes
- Equal weights for all harmonic partials
  - In general, low-numbered harmonic partials are more important in determining pitch
Slide 15
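Average Magnitude Difference Function (AMDF)
- Measures self-similarity by the p-th power of the absolute difference between the signal and a lagged copy of itself, summed over the frame
- In YIN, p is set to 2 (de Cheveigné & Kawahara, 2002)
  - Minimizing this difference amounts to minimizing the negative ACF plus a lag-dependent term
- The AMDF is then normalized by its cumulative mean over lags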
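A sketch of the two quantities, written from the published YIN description rather than from the slide's own equations (which are not in the transcript): the difference function d(tau) = sum_n |x[n] - x[n + tau]|^p, and the cumulative mean normalized version d'(tau) = d(tau) / ((1/tau) * sum_{j=1..tau} d(j)) with d'(0) = 1. Function names and the fixed-window choice are assumptions.

```python
import numpy as np

def difference_function(x, max_lag, p=2):
    """d[tau] = sum_n |x[n] - x[n + tau]|**p over a fixed-length window."""
    x = np.asarray(x, dtype=float)
    w = len(x) - max_lag  # same number of terms at every lag
    return np.array([np.sum(np.abs(x[:w] - x[tau:tau + w]) ** p)
                     for tau in range(max_lag + 1)])

def cumulative_mean_normalized(d):
    """d'[0] = 1, d'[tau] = d[tau] * tau / sum_{j=1..tau} d[j]."""
    d = np.asarray(d, dtype=float)
    d_norm = np.ones_like(d)
    running_sum = np.cumsum(d[1:])
    taus = np.arange(1, len(d))
    d_norm[1:] = d[1:] * taus / np.maximum(running_sum, 1e-12)  # guard against division by zero
    return d_norm

# Sine with a period of 80 samples: the normalized function dips to ~0 at lag 80.
x = np.sin(2 * np.pi * np.arange(1024) / 80)
d = difference_function(x, 200)
d_norm = cumulative_mean_normalized(d)
print(d_norm[0], d_norm[40], d_norm[80])
```

Because the normalized function starts at 1 instead of 0, the trivial minimum at zero lag disappears.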
Slide 16
Average Magnitude Difference Function (AMDF)
[Figure panels: AMDF; normalized AMDF]
Slide 17
Why YIN (AMDF) works better
- Robust to changes in amplitude
  - Taking the difference takes care of amplitude changes, which reduces octave errors
- Zero-lag bias is avoided by the normalized AMDF
- The normalized AMDF allows the use of a fixed threshold
  - Multiple candidates can be chosen and then refined
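The fixed-threshold idea can be sketched as follows: take the first lag at which the normalized difference falls below a threshold, then walk down to the bottom of that dip. This is a simplified reading of YIN that omits parabolic interpolation and the other refinement steps; the threshold value 0.1, the search range, and the function name are assumptions for illustration.

```python
import numpy as np

def yin_pitch(x, sr, fmin=80.0, fmax=1000.0, threshold=0.1):
    """Simplified YIN-style estimate; returns None when no lag falls below the threshold."""
    x = np.asarray(x, dtype=float)
    min_lag = int(sr / fmax)
    max_lag = int(sr / fmin)
    w = len(x) - max_lag
    # Squared difference function (p = 2) over a fixed window.
    d = np.array([np.sum((x[:w] - x[tau:tau + w]) ** 2) for tau in range(max_lag + 1)])
    # Cumulative mean normalization, with d_norm[0] = 1.
    d_norm = np.ones_like(d)
    d_norm[1:] = d[1:] * np.arange(1, max_lag + 1) / np.maximum(np.cumsum(d[1:]), 1e-12)
    below = np.where(d_norm[min_lag:] < threshold)[0]
    if len(below) == 0:
        return None  # could be treated as an unvoiced frame
    tau = min_lag + int(below[0])
    # Descend to the local minimum of the first dip.
    while tau + 1 <= max_lag and d_norm[tau + 1] < d_norm[tau]:
        tau += 1
    return sr / tau

sr = 16000
t = np.arange(2048) / sr
x = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in (1, 2, 3))
print(yin_pitch(x, sr))
```

Taking the first dip below the threshold, rather than the global minimum, is what biases the estimate away from lags at multiples of the true period.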
Slide 18
Example of AMDF (YIN)
Slide 19
Frequency-Domain Approach
- Basic ideas
  - A signal that is periodic in the time domain is harmonic in the frequency domain
  - Measure how harmonic the spectrum is
  - Find the F0 that best explains the harmonic pattern (harmonic partials)
- Template matching
  - Harmonic sieve or spectral template
- Cepstrum
- Harmonic Product Spectrum (HPS)
Slide 20
Harmonic Sieves (or Comb Filtering)
- Use sharp harmonic sieves that pass only the regions around spectral peaks
  - The ACF is similar to this, but its template is not sharp enough
- sigmund~ (Pd) and fiddle~ (Max/MSP) are based on weighted harmonic sieves (Puckette et al., 1998)
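A toy weighted harmonic sieve, to make the idea concrete: for each candidate F0, sample the magnitude spectrum at the candidate's harmonic positions and sum the values with weights that decrease with harmonic number. This is not the algorithm used in sigmund~ or fiddle~; the 1/h weights, the candidate grid, and the function name are assumptions.

```python
import numpy as np

def harmonic_sieve_pitch(x, sr, candidates, n_harmonics=5, n_fft=8192):
    """Score each candidate F0 by a weighted sum of spectral magnitude at its harmonics."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x)), n_fft))
    scores = []
    for f0 in candidates:
        harmonics = f0 * np.arange(1, n_harmonics + 1)
        harmonics = harmonics[harmonics < sr / 2]           # stay below Nyquist
        bins = np.round(harmonics * n_fft / sr).astype(int)  # sieve holes: one bin per harmonic
        weights = 1.0 / np.arange(1, len(bins) + 1)          # favor low-numbered partials
        scores.append(np.dot(weights, mag[bins]))
    return candidates[int(np.argmax(scores))]

sr = 16000
t = np.arange(2048) / sr
x = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in (1, 2, 3))
candidates = np.arange(80.0, 500.0, 1.0)
print(harmonic_sieve_pitch(x, sr, candidates))
```

The decreasing weights matter: with equal weights, a candidate at half the true F0 would collect the same partials and score just as well.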
Slide 21
Spectral Template
- Cross-correlation with an ideal template on a log-scale spectrogram
[Figure from Ellis, E4896 course slides]
Slide 22
Cepstrum (Noll, 1967)
- The real cepstrum is defined as the inverse FFT of the log-magnitude spectrum
- Basic ideas
  - Harmonic partials are periodic in the frequency domain
  - The (inverse) FFT finds this periodicity
- Liftering
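A minimal cepstral pitch sketch, assuming the usual definition of the real cepstrum as the inverse FFT of the log-magnitude spectrum. The window, the small constant added before the logarithm, the quefrency search range, and the pulse-train test signal are all choices made for this example.

```python
import numpy as np

def cepstrum_pitch(x, sr, fmin=80.0, fmax=500.0):
    """Pick the quefrency (in samples) with the largest real-cepstrum value in the pitch range."""
    x = np.asarray(x, dtype=float)
    windowed = x * np.hanning(len(x))
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-10)  # small offset avoids log(0)
    cepstrum = np.fft.ifft(log_mag).real
    q_min = int(sr / fmax)
    q_max = int(sr / fmin)
    q = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return sr / q

# Pulse train with one impulse every 80 samples (200 Hz at 16 kHz): rich in harmonics.
sr = 16000
x = np.zeros(2048)
x[::80] = 1.0
print(cepstrum_pitch(x, sr))
```

A harmonically rich test signal is used on purpose: the cepstral peak relies on many partials forming a clear periodic pattern along the frequency axis.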
Slide 23
Harmonic Product Spectrum (HPS) (Noll, 1969)
- The HPS is obtained by multiplying the original magnitude spectrum by copies of itself decimated by integer factors
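The multiplication of decimated spectra can be sketched as below. The number of decimated copies, the FFT size, the lower frequency limit, and the function name are assumptions for this example.

```python
import numpy as np

def hps_pitch(x, sr, n_products=3, n_fft=8192, fmin=50.0):
    """Multiply the magnitude spectrum by its copies decimated by 2..n_products."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x)), n_fft))
    n_bins = len(mag) // n_products          # bins available in the most decimated copy
    hps = mag[:n_bins].copy()
    for r in range(2, n_products + 1):
        hps *= mag[::r][:n_bins]             # decimation by r moves harmonic r onto the F0 bin
    k_min = int(np.ceil(fmin * n_fft / sr))  # ignore bins below fmin
    k = k_min + int(np.argmax(hps[k_min:]))
    return k * sr / n_fft

sr = 16000
t = np.arange(2048) / sr
x = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in (1, 2, 3))
print(hps_pitch(x, sr))
```

The estimate is quantized to the FFT bin spacing (about 2 Hz here), so zero-padding or interpolation would be needed for finer resolution.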
Slide 24
Auditory Model
[Diagram: input, per-channel HC and ACF stages, "Stabilize & Combine", summary ACF]
- Correlogram
  - Formed by concatenating the ACFs of the individual HC outputs
- Summary ACF
  - Computed by summing the ACF across all channels
  - Its peaks represent periodicity features
- This approach is known to be robust to band-limited noise
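A heavily simplified sketch of the correlogram and summary ACF. It assumes brick-wall FFT bands in place of a cochlear filterbank and half-wave rectification as a crude stand-in for the HC stage; neither choice comes from the slides, and the band edges and function name are illustrative. The test tone contains only harmonics 3 to 5 of 200 Hz, so the fundamental itself is absent.

```python
import numpy as np

def summary_acf(x, sr, band_edges, max_lag):
    """Return (correlogram, summary ACF) for bands defined by consecutive edges in Hz."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    rows = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(spectrum * mask, n)  # crude band-pass channel
        hc = np.maximum(band, 0.0)               # half-wave rectification
        rows.append([np.dot(hc[:n - tau], hc[tau:]) for tau in range(max_lag + 1)])
    correlogram = np.array(rows)                 # shape: (channels, lags)
    return correlogram, correlogram.sum(axis=0)

sr = 16000
t = np.arange(2000) / sr
x = sum(np.sin(2 * np.pi * 200 * k * t) for k in (3, 4, 5))  # missing fundamental
correlogram, summary = summary_acf(x, sr, [100, 300, 500, 700, 900, 1100], max_lag=200)
lag = 16 + int(np.argmax(summary[16:]))
print(correlogram.shape, lag, sr / lag)
```

Each channel alone peaks at multiples of its own partial's period; only the common period lines up across channels, which is why the sum recovers the missing fundamental.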
Slide 25
Example of Auditory Model
[Figure: summary ACF]
Slide 26
Pitch Tracking
- Pitch is usually continuous over time
  - Once a pitch with strong harmonicity is detected in a frame, the following frames form a smooth pitch contour
- Pitch tracking methods
  - Post-processing: first detect pitch frame by frame, then find a continuous path by smoothing
    - Median filtering
    - Dynamic programming (Talkin, 1995)
  - Probabilistic approach: detect multiple pitch candidates in every frame and find the best path
    - Viterbi decoding: probabilistic YIN (Mauch & Dixon, 2014)
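Both post-processing ideas can be sketched in a few lines: a median filter over a frame-wise F0 track, and a small dynamic-programming search that picks one candidate per frame while penalizing jumps. This is neither RAPT nor pYIN; the cost function (negative salience plus a penalty proportional to the jump in octaves) and all names are assumptions for illustration.

```python
import numpy as np

def median_smooth(f0, width=5):
    """Median-filter an F0 track; edges are padded by repeating the end values."""
    f0 = np.asarray(f0, dtype=float)
    half = width // 2
    padded = np.pad(f0, half, mode="edge")
    return np.array([np.median(padded[i:i + width]) for i in range(len(f0))])

def best_path(candidates, salience, jump_penalty=1.0):
    """Choose one candidate per frame; inputs have shape (n_frames, n_candidates)."""
    candidates = np.asarray(candidates, dtype=float)
    salience = np.asarray(salience, dtype=float)
    n_frames, n_cand = candidates.shape
    log_f = np.log2(candidates)
    cost = -salience[0].copy()
    back = np.zeros((n_frames, n_cand), dtype=int)
    for t in range(1, n_frames):
        # transition[i, j]: penalty for moving from candidate i at t-1 to candidate j at t
        transition = jump_penalty * np.abs(log_f[t - 1][:, None] - log_f[t][None, :])
        total = cost[:, None] + transition
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(n_cand)] - salience[t]
    path = np.zeros(n_frames, dtype=int)
    path[-1] = int(np.argmin(cost))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]           # backtrack through stored pointers
    return candidates[np.arange(n_frames), path]

print(median_smooth([200, 200, 400, 200, 200], width=3))
cands = [[200, 400], [200, 400], [200, 400]]
sal = [[1.0, 0.2], [0.5, 0.6], [1.0, 0.2]]
print(best_path(cands, sal))
```

In the second example the middle frame's strongest candidate is an octave too high, but the jump penalty outweighs its small salience advantage, so the path stays at 200 Hz.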
Slide 27
Issues and Challenges
- Voice activity detection (VAD) / singing voice detection (SVAD)
  - Discriminate voiced / unvoiced / silent frames
- Latency: real-time implementation
  - The use of long windows results in a slight delay
  - Post-processing and probabilistic approaches need a larger delay
- Noisy environments
  - Learning-based approaches: NMF or classifiers
  - Active research topic
- Melody transcription
  - Predominant pitch detection
  - Singing voice separation + pitch tracking
  - Active research topic
- Polyphonic pitch
Slide 28
Musical Applications
- Sound modification
  - Time-stretching using PSOLA
  - Auto-Tune: pitch correction or the "T-Pain effect"
- Music performance
  - Tuning musical instruments
  - Pitch-based sound control: e.g., fiddle~
  - Score following and auto-accompaniment
- Query by humming
  - Relative pitch change might be more important
- Singing evaluation (e.g., karaoke) and visualization
[Figure panels: Original; (Variable) Time-Stretched (N. Bryan, 2012)]
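To illustrate why relative pitch change is a natural feature for query by humming, the sketch below converts an F0 sequence to semitones and takes frame-to-frame differences, which makes the contour invariant to transposition. The function names and the 440 Hz reference are assumptions for this example.

```python
import numpy as np

def hz_to_semitones(f0_hz, ref_hz=440.0):
    """Pitch in semitones relative to ref_hz (12 semitones per octave)."""
    return 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / ref_hz)

def relative_contour(f0_hz):
    """Successive pitch intervals in semitones; unchanged when the melody is transposed."""
    return np.diff(hz_to_semitones(f0_hz))

melody = np.array([220.0, 246.94, 261.63, 220.0])
transposed = melody * 1.5  # same tune sung a fifth higher
print(np.allclose(relative_contour(melody), relative_contour(transposed)))
```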
Slide 29
References
- A. de Cheveigné and H. Kawahara, "YIN, a Fundamental Frequency Estimator for Speech and Music," 2002.
- A. Noll, "Cepstrum Pitch Determination," 1967.
- A. Noll, "Pitch Determination of Human Speech by the Harmonic Product Spectrum, the Harmonic Sum Spectrum, and a Maximum Likelihood Estimate," 1969.
- M. Puckette, T. Apel, and D. Zicarelli, "Real-time Audio Analysis Tools for Pd and MSP," 1998.
- M. Sondhi, "New Methods of Pitch Extraction," 1968.
- D. Talkin, "A Robust Algorithm for Pitch Tracking (RAPT)," 1995.
- M. Mauch and S. Dixon, "pYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions," 2014.