GCT634: Musical Applications of Machine Learning Rhythm Transcription Dynamic Programming Graduate School of Culture Technology, KAIST Juhan Nam
Page 1: GCT634: Musical Applications of Machine Learning Rhythm ...mac.kaist.ac.kr/~juhan/gct634/slides/08-rhythm_transcription.pdf · •Rhythm transcription ... -Division (tatum): temporal

GCT634: Musical Applications of Machine Learning
Rhythm Transcription

Dynamic Programming

Graduate School of Culture Technology, KAIST
Juhan Nam

Outline

• Overview of Automatic Music Transcription (AMT)
- Types of AMT Tasks

• Rhythmic Transcription
- Introduction
- Onset Detection
- Tempo Estimation

• Dynamic Programming
- Beat Tracking

Overview of Automatic Music Transcription (AMT)

• Predicting musical score information from audio
- The primary score information is the notes, but they are arranged based on rhythm, harmony and structure
- Equivalent to automatic speech recognition (ASR) for speech signals

[Figure: Model; labels: Onsets, Tempo, Beat, Key, Chord, Structure]

Types of AMT Tasks

• Rhythm transcription
- Onset detection
- Tempo estimation
- Beat tracking

• Tonal analysis
- Key estimation
- Chord recognition

• Timbre analysis
- Instrument identification

• Note transcription
- Monophonic note
- Polyphonic note
- Expression detection (e.g. vibrato, pedal)

• Structure analysis
- Musical structure
- Musical boundary / repetition detection
- Highlight detection

We will mainly focus on these topics!

Overview of AMT Systems

• Acoustic model
- Estimates the target information given input audio (usually a short segment)

• Musical knowledge
- Music theory (e.g. rhythm, harmony), performance (e.g. playability)

• Prior/Lexical model
- Statistical distribution of the score-level music information (e.g. chord progression)

[Figure: block diagram; labels: Acoustic Model, Musical Knowledge, Transcription Model, Prior or Lexical Model, Audio-Level, Score-Level; outputs: Beat, Tempo; Key, Chords; Notes]

Introduction to Rhythm

• Rhythm
- A strong, regular, and repeated pattern of sound
- Distinguishes music from speech

• The most primitive and foundational element of music
- Melody, harmony and other musical elements are arranged on the basis of rhythm

• Humans and rhythm
- Humans have an innate ability to perceive rhythm: heartbeat, walking
- Associated with motor control: dance, labor songs

Introduction to Rhythm

• Hierarchical structure of rhythm
- Beat (tactus): the most prominent level, the foot-tapping rate
- Division (tatum): temporal atom, eighth or sixteenth note
- Measure (bar): the unit of rhythm pattern (and also harmonic changes)

• Notations
- Tempo: beats per minute, e.g. 90 bpm
- Time signature: e.g. 4/4, 3/4, 6/8

[Wikipedia]

Human Perception of Tempo

• McKinney and Moelants (2006)
- Collected tapping data from 40 human subjects
- Initial synchronization delay and anticipation (by tempo estimation)
- Ambiguity in tempo: beat or its division?

[D. Ellis’ e4896 slides]

Overview of Rhythm Transcription Systems

• Consists of several cascaded tasks that detect moments of musical stress (accents) and their regularity

[Figure: block diagram; labels: Onset Detection, Tempo Estimation, Beat Tracking, Musical Knowledge]

Onset Detection

• Identify the starting times of musical events
- Notes, drum sounds

• Types of onsets
- Hard onsets: percussive sounds
- Soft onsets: source-driven sounds (e.g. singing voice, woodwind, bowed strings)

[M. Müller]

Example: Onset Detection

[Figure: waveform (amplitude vs. time [sec], 0 to 6 s) of “Eat (꺼내먹어요)” by Zion.T, annotated with “?”]

Onset Detection Systems

• Onset detection function (ODF)
- Instantaneous measure of temporal change, often called a “novelty” function
- Types: time-domain energy, spectral or sub-band energy, phase difference

• Decision algorithm
- Rule-based approach
- Learning-based approach

[Figure: pipeline; labels: Audio Representations, Onset Detection Function (Feature Extraction), Decision Algorithm (Classifier)]

Onset Detection Function (ODF)

• Types of ODFs
- Time-domain energy
- Spectral or sub-band energy
- Phase difference

Time-Domain Onset Detection

• Local energy
- Onsets usually have high energy
- Effective for percussive sounds

• Various versions
- Frame-level energy

$$ODF(n) = E(n) = \sum_{m=-N}^{N} \left( x(n+m)\, w(m) \right)^2$$

- Half-wave rectification

$$ODF(n) = H(E(n+1) - E(n))$$

$$H(r) = \frac{r + |r|}{2} = \begin{cases} r, & r \ge 0 \\ 0, & r < 0 \end{cases}$$

[Figure: three panels vs. time [sec], 0 to 6 s: waveform (amplitude) and two ODF curves]
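The two time-domain ODFs can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the Hann window, the half-width of 64 samples, the hop of 32 samples, and the toy signal are all choices made here, not values from the slides.

```python
import math

def frame_energy(x, half_width, hop):
    """E(n) = sum over m = -N..N of (x(n+m) * w(m))^2, one value every `hop` samples."""
    size = 2 * half_width + 1
    # Hann window w(m); the window shape is an assumption of this sketch
    w = [0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (size + 1)) for i in range(size)]
    energies = []
    for center in range(half_width, len(x) - half_width, hop):
        frame = x[center - half_width:center + half_width + 1]
        energies.append(sum((s * wi) ** 2 for s, wi in zip(frame, w)))
    return energies

def half_wave(r):
    """H(r) = (r + |r|) / 2: keeps increases, discards decreases."""
    return (r + abs(r)) / 2

def energy_odf(x, half_width=64, hop=32):
    """ODF(n) = H(E(n+1) - E(n))."""
    e = frame_energy(x, half_width, hop)
    return [half_wave(e[n + 1] - e[n]) for n in range(len(e) - 1)]

# Toy signal: silence, then a sinusoidal burst starting at sample 1000
signal = [0.0] * 1000 + [math.sin(0.3 * i) for i in range(1000)]
odf = energy_odf(signal)
peak_frame = max(range(len(odf)), key=odf.__getitem__)
peak_sample = 64 + 32 * peak_frame  # frame index converted back to a sample position
```

With this toy input the ODF is zero during the silence and its largest value falls near the start of the burst.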

Spectral-Based Onset Detection

• Spectral Flux
- Sum of the positive differences between successive frames of the log spectrogram
- The ODF changes depending on the amount of compression $\rho$

$$ODF(n) = \sum_{k=0}^{N-1} H(Y(n+1, k) - Y(n, k))$$

$$Y(n, k) = \log(1 + \rho\,|X(n, k)|), \quad X(n, k): \text{STFT}$$

[Figure: spectrogram (frequency in kHz vs. time [sec]) and the resulting ODF vs. time [sec]]
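As a minimal sketch of the spectral flux above, the function below takes a magnitude spectrogram that is assumed to be already computed (a list of frames, each a list of $|X(n, k)|$ values) and applies the log compression and half-wave rectification. The tiny spectrogram is made up for illustration.

```python
import math

def spectral_flux(mag, rho=1.0):
    """mag[n][k] = |X(n, k)|. Returns ODF(n) for n = 0 .. len(mag) - 2."""
    # Log compression: Y(n, k) = log(1 + rho * |X(n, k)|)
    y = [[math.log1p(rho * v) for v in frame] for frame in mag]
    odf = []
    for prev, cur in zip(y, y[1:]):
        # Half-wave rectification: only increases in each bin contribute
        odf.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return odf

mag = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [1.0, 2.0, 0.5],  # energy appears in every bin: an onset
    [0.8, 1.5, 0.4],  # decay: all differences are negative and are discarded
]
flux_low = spectral_flux(mag, rho=1.0)
flux_high = spectral_flux(mag, rho=100.0)
```

Raising `rho` compresses the dynamic range more strongly, so weak spectral changes contribute relatively more to the ODF.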

Phase Deviation

• The sinusoidal components of a note are continuous while the note is sustained
- An abrupt change in phase means that there may be a new event

Phase continuation (e.g. during the sustain of a single note):

$$\phi_k(n) - \phi_k(n-1) \approx \phi_k(n-1) - \phi_k(n-2)$$

$$\Delta\phi_k(n) = \phi_k(n) - 2\phi_k(n-1) + \phi_k(n-2) \approx 0$$

Deviation from the steady state for all frequency bins:

$$\zeta_p = \frac{1}{N} \sum_{k=1}^{N} |\Delta\phi_k(n)|$$

[D. Ellis’ e4896 slides]
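The phase deviation can be sketched as follows, assuming the STFT phases are already available as `phase[n][k]`. Wrapping the second difference to the principal value is a practical detail added here (phases are only known modulo $2\pi$); the synthetic phases are illustrative.

```python
import math

def princarg(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def phase_deviation(phase):
    """phase[n][k]: phase of bin k at frame n. Returns zeta_p for frames n >= 2."""
    zeta = []
    for n in range(2, len(phase)):
        total = 0.0
        for k in range(len(phase[n])):
            second_diff = phase[n][k] - 2 * phase[n - 1][k] + phase[n - 2][k]
            total += abs(princarg(second_diff))
        zeta.append(total / len(phase[n]))
    return zeta

# Three steady sinusoids (constant phase advance per frame) with a
# phase jump of 1 rad introduced at frame 5 to mimic a new event
advances = [0.3, 1.1, 2.0]
phase = [[princarg(a * n + (1.0 if n >= 5 else 0.0)) for a in advances]
         for n in range(8)]
zeta = phase_deviation(phase)  # zeta[i] corresponds to frame n = i + 2
```

The deviation stays at zero while the phase advances linearly and rises only around the frame where the jump occurs.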

Post-Processing

• DC removal
- Subtract the mean of the ODF

• Normalization
- Scale the level of the ODF

• Low-pass filtering
- Remove small peaks

• Down-sampling
- For data reduction

[Figure: ODF before and after low-pass filtering (solid line)]

(Tzanetakis, 2010)
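The four post-processing steps can be chained as in the sketch below. The moving-average low-pass filter, the peak normalization, and the parameter values are assumptions chosen for brevity; the slides do not specify a particular filter.

```python
def postprocess(odf, smooth=3, factor=2):
    """DC removal, normalization, low-pass filtering, then down-sampling."""
    mean = sum(odf) / len(odf)
    centered = [v - mean for v in odf]                 # DC removal
    peak = max(abs(v) for v in centered) or 1.0
    normalized = [v / peak for v in centered]          # scale to max |value| = 1
    half = smooth // 2
    smoothed = []
    for n in range(len(normalized)):                   # moving-average low-pass
        window = normalized[max(0, n - half):n + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed[::factor]                          # keep every `factor`-th value

raw_odf = [0.0, 0.0, 10.0, 0.0, 0.0, 1.0, 0.0, 0.0]
processed = postprocess(raw_odf)
```

Filtering before down-sampling matters: dropping samples first could discard a narrow peak entirely.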

Onset Decision Algorithm

• Rule-based approach: peak detection rule
- Peaks above a threshold are determined to be onsets
- The threshold is often computed adaptively from the ODF
- Averaging and the median are popular choices for computing the threshold

$$\text{threshold} = \alpha + \beta \cdot \mathrm{median}(ODF), \quad \alpha: \text{offset}, \ \beta: \text{scaling}$$

[Figure: ODF and adaptive threshold (median with window size 5) vs. time [sec]]
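A minimal sketch of the peak detection rule with a median-based adaptive threshold follows. The window size of 5 matches the figure; the local-maximum test and the values of `alpha` and `beta` are assumptions for this example.

```python
from statistics import median

def pick_onsets(odf, alpha=0.0, beta=1.0, window=5):
    """Return frame indices that are local peaks above alpha + beta * median."""
    half = window // 2
    onsets = []
    for n in range(1, len(odf) - 1):
        local = odf[max(0, n - half):n + half + 1]
        threshold = alpha + beta * median(local)
        is_peak = odf[n] > odf[n - 1] and odf[n] >= odf[n + 1]
        if is_peak and odf[n] > threshold:
            onsets.append(n)
    return onsets

example_odf = [0, 1, 0, 0, 8, 1, 0, 0, 0.5, 0, 6, 0, 0]
onsets = pick_onsets(example_odf, alpha=1.0, beta=1.5)
print(onsets)  # [4, 10]
```

The small bumps at frames 1 and 8 are local peaks but stay at or below the threshold, so only the two strong peaks are reported.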

Challenging Issue in Onset Detection: Vibrato

Onset detection using spectral flux

SuperFlux

• A state-of-the-art rule-based onset detection function
- S. Böck and G. Widmer, “Maximum Filter Vibrato Suppression For Onset Detection”, DAFx, 2013

• Step 1: log-spectrogram
- Make harmonic partials have the same depth of vibrato contour

$$Y(n, m) = \log(1 + |X(n, k)| \cdot F(k, m)), \quad X(n, k): \text{STFT}$$

• Step 2: max-filtering
- Take the maximum in a window on the frequency axis
- The vibrato contours become thicker

$$Y^{max}(n, m) = \max(Y(n, m-l : m+l))$$

SuperFlux

[Figure: log-spectrogram and max-filtered log-spectrogram]

SuperFlux

• Step 3: SuperFlux
- Take the difference between frames that are $\mu$ frames apart
- Assumption: the frame rate is high in onset detection (i.e. small hop size)

$$SF^*(n) = \sum_{k=0}^{N-1} H(Y(n+\mu, k) - Y(n, k))$$

$$\mu = \max\left(1, \left\lfloor \frac{N/2 - \min\{n \mid w(n) > r\}}{h} + 0.5 \right\rfloor\right), \quad 0 \le r \le 1$$

where $w$ is the analysis window of length $N$ and $h$ is the hop size.

• Step 4: peak-picking; frame $n$ is selected as an onset if all three conditions hold
- 1) $SF^*(n) = \max(SF^*(n - pre_{max} : n + post_{max}))$
- 2) $SF^*(n) \ge \mathrm{mean}(SF^*(n - pre_{avg} : n + post_{avg})) + \delta$
- 3) $n - n_{\text{previous onset}} > \text{combination width}$
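The effect of the maximum filter on vibrato can be illustrated with a small sketch. It assumes a log-magnitude spectrogram `y[n][m]` is already available and applies the maximum filter to the earlier (reference) frame only, which is one reading of the method; the toy spectrogram, `mu=1`, and `l=1` are illustrative choices.

```python
def max_filter(frame, l=1):
    """Y_max(m) = max(Y(m - l : m + l)) along the frequency axis."""
    return [max(frame[max(0, m - l):m + l + 1]) for m in range(len(frame))]

def superflux(y, mu=1, l=1):
    """y[n][m]: log-magnitude spectrogram. Returns SF*(n) for n = 0 .. len(y) - mu - 1."""
    sf = []
    for n in range(len(y) - mu):
        reference = max_filter(y[n], l)  # widened earlier frame
        sf.append(sum(max(cur - ref, 0.0) for cur, ref in zip(y[n + mu], reference)))
    return sf

# A partial wobbling between bins 2 and 3 (vibrato); a new partial enters at bin 6
y = [
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
]
plain = superflux(y, mu=1, l=0)     # l = 0 disables the max filter: [1.0, 1.0, 2.0]
filtered = superflux(y, mu=1, l=1)  # vibrato suppressed:            [0.0, 0.0, 1.0]
```

Without the filter every wobble of the partial registers as flux; with it, only the new partial at bin 6 remains.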

SuperFlux

[Figure: max-filtered log-spectrogram and peak-picking result]

Tempo Estimation

• Estimate a regular time interval between beats
- Tempo is a global attribute of a song: e.g. bpm or mid-tempo song

• Tempo often changes within a song
- Intentionally: e.g. dramatic effect: Top 10 tempo changes
- Unintentionally: e.g. re-mastering, live performance

• There are also local tempo changes: e.g. rubato

Tempo Estimation Methods

• Auto-Correlation
- Find the periodicity, as used in pitch detection

• Discrete Fourier Transform
- Apply the DFT to the ODF and find the periodicity

• Comb-filter Banks
- Leverage the “oscillating nature” of musical beats

Auto-Correlation

• The auto-correlation function (ACF) is a generic method to detect the periodicity of a signal
- Thus, it can be applied to the ODF to find a dominant period that may correspond to the tempo
- The ACF shows dominant peaks that indicate dominant tempi

[Figure: Onset Detection Function (spectral flux) and its Auto-Correlation, both vs. time [sec]]
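A minimal sketch of ACF-based tempo estimation is given below. The search range of 60 to 240 bpm, the frame rate of 100 frames per second, and the impulse-train ODF are assumptions for the example.

```python
def autocorrelation(odf, max_lag):
    """ACF(lag) = sum over n of odf(n) * odf(n + lag), for lag = 0 .. max_lag."""
    return [sum(a * b for a, b in zip(odf, odf[lag:])) for lag in range(max_lag + 1)]

def estimate_tempo(odf, frame_rate, bpm_min=60.0, bpm_max=240.0):
    """Return the tempo (bpm) whose lag has the largest ACF value."""
    lag_min = int(round(60.0 * frame_rate / bpm_max))  # fastest tempo, shortest lag
    lag_max = int(round(60.0 * frame_rate / bpm_min))  # slowest tempo, longest lag
    acf = autocorrelation(odf, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=acf.__getitem__)
    return 60.0 * frame_rate / best_lag

# Impulse every 50 frames at 100 frames per second, i.e. 0.5 s per beat
pulse_odf = [1.0 if n % 50 == 0 else 0.0 for n in range(1000)]
tempo = estimate_tempo(pulse_odf, frame_rate=100.0)
print(tempo)  # 120.0
```

The ACF of this input also peaks at lag 100 (60 bpm), slightly lower because fewer impulse pairs overlap; on real music such peaks at multiples of the beat period are a source of tempo ambiguity.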

Tempo Estimation Using Tempo Prior

• Tempo is estimated by multiplying the prior with the auto-correlation (observation)
- The auto-correlation corresponds to a likelihood function
- The tempo prior can be calculated from the beat annotations of a dataset
- The distribution is fit well by a log-normal distribution

[Figure: histogram of beats from a dataset (Klapuri, 2003)]

[D. Ellis’ e4896 slides]
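The prior-times-likelihood idea can be sketched as below. The prior here is a Gaussian on the log2 tempo axis (a log-normal shape without normalization); its center of 120 bpm and width of one octave are illustrative assumptions, not values fitted to any dataset.

```python
import math

def tempo_prior(bpm, center_bpm=120.0, sigma_octaves=1.0):
    """Log-normal-shaped weight: a Gaussian on the log2 tempo axis."""
    return math.exp(-0.5 * (math.log2(bpm / center_bpm) / sigma_octaves) ** 2)

def weighted_tempo(acf, frame_rate, **prior_args):
    """acf[lag] is the observation. Returns the bpm that maximizes prior * acf."""
    scores = {}
    for lag in range(1, len(acf)):
        bpm = 60.0 * frame_rate / lag
        scores[bpm] = tempo_prior(bpm, **prior_args) * acf[lag]
    return max(scores, key=scores.get)

# An ambiguous observation: equal ACF peaks at 240, 120 and 60 bpm
ambiguous_acf = [0.0] * 101
for lag in (25, 50, 100):
    ambiguous_acf[lag] = 1.0
chosen = weighted_tempo(ambiguous_acf, frame_rate=100.0)
print(chosen)  # 120.0
```

The observation alone cannot separate the three metrical levels; the prior resolves the tie in favor of the tempo closest to its center.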

Beat Spectrum

• Leverage the repetitive nature of music

• Algorithm
- Step 1: compute the cosine similarity between each pair of frames of the magnitude spectrogram

$$S(i, j) = \frac{V_i \cdot V_j}{\|V_i\|\,\|V_j\|}$$

- Step 2: sum the elements on the diagonals

$$B(l) = \sum_{k} S(k, k+l)$$

(Foote, 2001)
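The two steps can be sketched directly, assuming the magnitude spectra are given as a list of vectors. The three orthogonal toy spectra repeating with a period of 3 frames are illustrative.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def beat_spectrum(frames, max_lag):
    """frames[i] = magnitude spectrum V_i. Returns B(l) for l = 0 .. max_lag."""
    n = len(frames)
    # Step 1: similarity matrix S(i, j)
    s = [[cosine_similarity(frames[i], frames[j]) for j in range(n)] for i in range(n)]
    # Step 2: sum along the l-th diagonal
    return [sum(s[k][k + l] for k in range(n - l)) for l in range(max_lag + 1)]

pattern = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frames = pattern * 4                  # 12 frames repeating every 3 frames
b = beat_spectrum(frames, max_lag=6)  # peaks at lags 0, 3 and 6
```

Peaks of `b` at multiples of 3 reflect the repetition period of the input.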

Beat Spectrum

• A more robust version can be obtained from the 2D auto-correlation of the similarity matrix

$$B(k, l) = \sum_{i,j} S(i, j)\, S(i+k, j+l)$$

• The final beat spectrum is derived by summing over one axis
- The plot shows five beats and a triplet within a beat.

• A “beat spectrogram” can also be obtained from successive beat spectra

[Figure: beat spectrum showing five beats and a triplet within a beat (Foote, 2001)]

Tempogram

• Modeling the onset function using sinusoids as predominant local periodicity (PLP)

• Algorithm
- Step 1: compute the ODF from the half-wave rectified spectral flux
- Step 2: obtain the frequency and phase that provide the maximum correlation with the ODF, and form a local sinusoidal kernel

$$k(m) = w(m - n)\cos(2\pi(\omega m - \phi))$$

- Step 3: accumulate the successive local sinusoidal kernels to form a PLP curve
- Step 4: take the DFT or auto-correlation

(Grosche, 2009)

Tempogram

• Cyclic Tempogram
- Accumulate the tempogram for integer multiples of a tempo (up to four octaves)
- Conceptually similar to “Chromagram”

(Grosche, 2011)

Comb-Filter Banks

• Also called resonant filter banks
- Comb filter equation

$$y(n) = x(n) + \alpha\, y(n - \tau)$$

• Builds up rhythmic evidence (by anticipation?)

(Klapuri, 2006)
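A small bank of such comb filters can be sketched as follows. Choosing the delay by output energy, the feedback gain of 0.9, and the impulse-train input are assumptions of this example; published systems add normalization across delays that is omitted here.

```python
def comb_filter(x, tau, alpha=0.9):
    """y(n) = x(n) + alpha * y(n - tau)."""
    y = []
    for n, sample in enumerate(x):
        feedback = y[n - tau] if n >= tau else 0.0
        y.append(sample + alpha * feedback)
    return y

def resonator_tempo(odf, delays, alpha=0.9):
    """Return the delay whose comb filter output has the most energy."""
    def energy(tau):
        return sum(v * v for v in comb_filter(odf, tau, alpha))
    return max(delays, key=energy)

# Impulse every 50 frames; candidate delays from 25 to 100 frames
beat_odf = [1.0 if n % 50 == 0 else 0.0 for n in range(1000)]
best_delay = resonator_tempo(beat_odf, range(25, 101))
print(best_delay)  # 50
```

When the delay matches the beat period, each new impulse arrives in phase with the fed-back output, so the response keeps growing; mismatched delays only produce decaying echoes.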

Sub-band Resonant Filter Banks

• Algorithm
- A sub-band filter bank as front-end processing
- Parallel ODFs for 6 bands
- 150 resonators for each band, covering all possible tempo values (60 - 240 bpm)
- Pick the delay that provides the highest peak as the tempo

(Scheirer, 1998)

Beat Tracking

• Estimate the position of beats in music
- Usually a subset of detected onsets selected by the tempo

Beat Tracking by the Resonator Model

• Once the resonator model chooses the tempo that returns the highest peaks, the output produces a sequence of resonated peaks
- They correspond to the beats

(Scheirer, 1998)

Beat Tracking by Dynamic Programming

• Find the optimal “hopping” path on music (Ellis, 2007)

$$C(\{t_i\}) = \sum_{i=1}^{N} O(t_i) + \alpha \sum_{i=2}^{N} F(t_i - t_{i-1}, T)$$

- $C(\{t_i\})$: score of the path $\{t_i\}$ (to be maximized)
- $O(t_i)$: onset strength function (i.e. ODF)
- $F(\Delta t, T)$: tempo ($T$) consistency score, e.g. $F(\Delta t, T) = -\left(\log\frac{\Delta t}{T}\right)^2$

Finding the Minimum-Cost-Path

• Naïve approach
- Find all paths from A to K, calculate the cost of each, and choose the path that has the minimum cost.
- As the number of nodes increases, the number of possible paths increases exponentially

[Figure: graph with nodes A to K and numeric edge costs]

Dynamic Programming (DP)

• Observation
- Say the minimum-cost-path passes through a node p
- What is the minimum-cost-path from A to p?
- It is just a sub-path of the minimum-cost-path from A to K.
- Thus, we don’t have to compute the cost from scratch; we can reuse the cost computed for the previous nodes.

Dynamic Programming (DP)

• The minimum cost is computed by the following equation:

$$C_k(j) = O_k(j) + \min_i \{ C_{k-1}(i) + c_{ij} \}$$

- $C_k(j)$: cost up to node $j$
- $O_k(j)$: local cost at node $j$
- $c_{ij}$: transition cost from $i$ to $j$

• The minimum-cost-path can be found by tracing back the computation
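The recursion and the trace-back can be sketched on a small layered graph. The data layout (`local[k][j]` for the local costs, `trans[k][i][j]` for the transition costs into stage `k`) and the numbers are hypothetical; they are not the graph from the slide.

```python
def min_cost_path(local, trans):
    """local[k][j]: local cost O_k(j). trans[k][i][j]: cost c_ij from node i in
    stage k-1 to node j in stage k (trans[0] is unused)."""
    cost = [list(local[0])]
    back = [[None] * len(local[0])]
    for k in range(1, len(local)):
        stage_cost, stage_back = [], []
        for j in range(len(local[k])):
            # C_k(j) = O_k(j) + min_i { C_{k-1}(i) + c_ij }
            candidates = [cost[k - 1][i] + trans[k][i][j]
                          for i in range(len(local[k - 1]))]
            best_i = min(range(len(candidates)), key=candidates.__getitem__)
            stage_cost.append(local[k][j] + candidates[best_i])
            stage_back.append(best_i)
        cost.append(stage_cost)
        back.append(stage_back)
    # Trace back from the cheapest node in the last stage
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total = cost[-1][j]
    path = [j]
    for k in range(len(local) - 1, 0, -1):
        j = back[k][j]
        path.append(j)
    return total, path[::-1]

local = [[0, 0], [1, 5], [2, 0]]
trans = [None, [[1, 1], [4, 0]], [[3, 9], [1, 1]]]
total, path = min_cost_path(local, trans)
print(total, path)  # 6 [1, 1, 1]
```

Each stage only looks at the costs stored for the previous stage, so the work grows with the number of edges rather than with the number of paths.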

Applying DP to Beat Tracking

• To optimize:

$$C(\{t_i\}) = \sum_{i=1}^{N} O(t_i) + \alpha \sum_{i=2}^{N} F(t_i - t_{i-1}, T)$$

- Define $C^*(t)$ as the best score up to time $t$ and compute it for every $t$

$$C^*(t) = O(t) + \max_{\tau} \{ \alpha F(t - \tau, T) + C^*(\tau) \}$$

- Also, store the preceding time $\tau$ that returns the maximum score as $P(t)$

$$P(t) = \arg\max_{\tau} \{ \alpha F(t - \tau, T) + C^*(\tau) \}$$

- At the end of the sequence, trace back $P(t)$, which returns the best path $\{t_i\}$

[Figure: ODF vs. time [sec], with $\tau$, $t$ and $C^*(t)$ marked]
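The recursion for $C^*(t)$ and $P(t)$ can be sketched as below, with the log-squared consistency score $F(\Delta t, T) = -(\log(\Delta t / T))^2$. The predecessor search window (half a period to two periods back), the option of starting a new path with no transition term, `alpha=100`, and the toy ODF are assumptions of this sketch rather than details of a specific published implementation.

```python
import math

def track_beats(odf, period, alpha=100.0):
    """odf[t]: onset strength per frame. period: target beat period T in frames."""
    n = len(odf)
    score = [0.0] * n   # C*(t)
    back = [-1] * n     # P(t); -1 marks the first beat of a path
    for t in range(n):
        best, best_tau = 0.0, -1  # a path may also start at t
        # Candidate predecessors from two periods back up to half a period back
        for tau in range(max(0, t - 2 * period), t - period // 2):
            candidate = score[tau] - alpha * math.log((t - tau) / period) ** 2
            if candidate > best:
                best, best_tau = candidate, tau
        score[t] = odf[t] + best
        back[t] = best_tau
    # Trace back from the best-scoring frame
    t = max(range(n), key=score.__getitem__)
    beats = []
    while t >= 0:
        beats.append(t)
        t = back[t]
    return beats[::-1]

# Beats every 50 frames starting at frame 10, plus a spurious off-beat onset
toy_odf = [0.0] * 500
for frame in range(10, 500, 50):
    toy_odf[frame] = 1.0
toy_odf[135] = 0.8
beats = track_beats(toy_odf, period=50)
```

The off-beat onset at frame 135 is skipped because reaching it from any beat would incur a large tempo-consistency penalty.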

Example of DP to Beat Tracking

References

• E. Scheirer, “Tempo and Beat Analysis of Acoustic Musical Signals”, 1998
• J. Foote and S. Uchihashi, “The Beat Spectrum: A New Approach to Rhythm Analysis”, 2001
• G. Tzanetakis, “Musical Genre Classification of Audio Signals”, 2002
• A. Klapuri, “Analysis of the Meter of Acoustic Musical Signals”, 2006
• P. Grosche and M. Müller, “Computing Predominant Local Periodicity Information In Music Recordings”, 2009
• P. Grosche and M. Müller, “Cyclic Tempogram – A Mid-Level Tempo Representation For Music Signals”, 2010
• D. Ellis, “Beat Tracking by Dynamic Programming”, 2007
• S. Böck and G. Widmer, “Maximum Filter Vibrato Suppression For Onset Detection”, 2013