Date post: 02-Oct-2021
Combined Audio and Video Analysis for Guitar Chord Identification

Alex Hrybyk • [email protected], Advisor: Youngmoo Kim

Electrical & Computer Engineering, Drexel University

Introduction

This research presents a multi-modal approach to automatically identifying guitar chords using audio and video of the performer. Chord identification for stringed instruments carries extra ambiguity, because a single chord or melody may be played in different positions on the fretboard. Preserving this information is important because it indicates the original fingering and the implied “easiest” way to perform the selection. The chord identification system combines analysis of the audio, to determine the general chord scale (e.g. A major, G minor), with video of the guitarist, to determine the chord voicing (e.g. open, barred, inversion), in order to identify the guitar chord accurately.

Video Analysis

While performing, a guitarist may hold the guitar in many different orientations relative to the camera, which makes it difficult to locate the frets. A homography is used to rectify (warp) the original image to fit an ideal fretboard, making the fretboard easy to locate in the image [2].
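The rectification step can be illustrated with a minimal sketch. It assumes that four corner points of the fretboard have already been located in the camera image; the poster does not say how correspondences are found, and the function names and coordinates below are hypothetical. The sketch solves for the 3×3 homography that maps those corners onto an ideal rectangle and applies it to a point.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def estimate_homography(src, dst):
    """Estimate H (with h33 fixed to 1) from four (x, y) -> (x', y') pairs."""
    a, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # x' = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        # y' = (h21 x + h22 y + h23) / (h31 x + h32 y + 1)
        a.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(H, x, y):
    """Map an image point (x, y) to rectified coordinates (x', y')."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    xp = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    yp = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return xp, yp


# Hypothetical fretboard corners in the camera image (pixels) ...
image_corners = [(120, 210), (520, 150), (540, 215), (130, 290)]
# ... and the corners of the ideal, axis-aligned fretboard they should map to.
ideal_corners = [(0, 0), (600, 0), (600, 80), (0, 80)]

H = estimate_homography(image_corners, ideal_corners)
rectified = [warp_point(H, x, y) for x, y in image_corners]
```

Warping every pixel of the frame with `H` (rather than only the corners) would produce the rectified fretboard image; in practice an image library would normally do that resampling.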

Once the fretboard has been rectified and extracted from the image, it can be reduced to its “eigen-chord” components, which are derived from many images in a training set. The various voicings of a chord tend to cluster in the eigen-chord space. By projecting an unknown image into this space, we can determine its voicing by finding the closest voicing centroid computed from the training set.
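The projection and nearest-centroid steps might look like the following sketch. It assumes the eigen-chord basis and the mean training image are already available (for example from principal component analysis, which the “eigen-chord” name suggests but the poster does not state). The tiny 4-pixel images, the basis vectors, and the centroid values are made up for illustration.

```python
import math


def project(image, mean, basis):
    """Project a flattened image onto each eigen-chord basis vector."""
    centered = [p - m for p, m in zip(image, mean)]
    return [sum(b * c for b, c in zip(vec, centered)) for vec in basis]


def nearest_voicing(coords, centroids):
    """Return the voicing whose training centroid is closest to coords."""
    return min(centroids, key=lambda name: math.dist(coords, centroids[name]))


# Hypothetical values: a 4-pixel "image", two orthonormal eigen-chords,
# and one centroid per voicing in the 2-D eigen-chord space.
mean_image = [0.5, 0.5, 0.5, 0.5]
eigen_chords = [
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, 0.5, -0.5],
]
voicing_centroids = {
    "open": (1.0, 0.0),
    "barred": (-1.0, 0.0),
    "inverted": (0.0, 1.0),
}

unknown_image = [1.0, 0.9, 0.1, 0.0]
coords = project(unknown_image, mean_image, eigen_chords)
voicing = nearest_voicing(coords, voicing_centroids)
```

Euclidean distance is used here as the closeness measure; the poster does not specify which distance the system uses.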

MET-lab • Music • Entertainment • Technology • http://music.ece.drexel.edu

References

[1] S. Saito, H. Kameoka, K. Takahashi, T. Nishimoto, and S. Sagayama, “Specmurt analysis of polyphonic music signals,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 16, no. 3, pp. 639–650, February 20

[2] X. Wang and B. Yang, “Automatic image registration based on natural characteristic points and global homography,” in Computer Science and Software Engineering, 2008 International Conference on, vol. 5, Dec. 2008, pp. 1365–1370.

Results

The system that performs best at identifying the overall chord (scale and voicing) combines the strengths of the audio and video analyses. Because Specmurt analysis determined the scale with very high accuracy (98.6%), it was used as a preprocessing step for voicing identification via video.

           Audio Only   Video Only   Combined System
Scale      98.6%        34.8%        98.6%
Voicing    62.0%        94.4%        94.4%
Both       61.1%        32.8%        93.1%
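One possible reading of “preprocessing step” is that the scale found from the audio narrows the set of voicing centroids the video classifier has to choose among. The sketch below illustrates that reading only; the function name, the per-scale centroid table, and all coordinates are hypothetical and are not taken from the poster.

```python
import math


def identify_chord(scale, fret_coords, centroids_by_scale):
    """Combine modalities: scale comes from audio, voicing from video.

    Only the voicing centroids trained for the given scale are considered,
    so the video classifier never confuses voicings across scales.
    """
    voicings = centroids_by_scale[scale]
    voicing = min(voicings, key=lambda v: math.dist(fret_coords, voicings[v]))
    return scale, voicing


# Hypothetical eigen-chord centroids, grouped by the scale they were trained on.
centroids_by_scale = {
    "A major": {"open": (1.0, 0.0), "barred": (-1.0, 0.0)},
    "G minor": {"barred": (0.8, 0.1), "inverted": (0.0, 1.0)},
}

# The same video coordinates resolve to different voicings
# depending on which scale the audio analysis reported.
video_coords = (0.9, 0.1)
chord_if_g_minor = identify_chord("G minor", video_coords, centroids_by_scale)
chord_if_a_major = identify_chord("A major", video_coords, centroids_by_scale)
```

Under this reading, the combined row of the table is plausible: the scale accuracy is inherited from the audio stage and the voicing accuracy from the video stage.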

[Figure: Eigen-chord Space Separated by Voicing. Axes: Eigenchord 1, Eigenchord 2, Eigenchord 3 (each scaled by 10^8). Legend: Barred, Inverted, Open.]

Audio Analysis

When a single note is played, the guitar, like many other instruments, produces natural harmonics (overtones) in addition to the note’s fundamental frequency. When multiple notes are played, the frequency spectrum of the audio appears cluttered, making the fundamental frequencies (the actual notes) hard to locate. Using a technique known as Specmurt analysis [1], the notes of the guitar chord can be extracted from the audio signal.

Specmurt analysis deconvolves the log-scaled frequency spectrum of the signal with a known harmonic structure. The resulting spectrum u(f) contains peaks only at the fundamental frequencies, making it easy to locate the notes being played.

Since deconvolution is difficult to perform directly, we use the time/frequency duality of convolution and multiplication to make finding u(f) much easier, taking the frequency data to the “Specmurt domain”.

[Figure: Fundamental frequency pattern u(f̂) with peaks at f̂1, f̂2, and f̂3; common harmonic structure h(f̂); log-frequency multipitch spectrum v(f̂).]

[Figure: image axes labeled x′ and y′.]
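The deconvolution described above can be sketched on synthetic data. On a log-frequency axis, harmonics sit at fixed offsets from their fundamental, so the observed spectrum v is modeled as the convolution of the fundamental pattern u with a common harmonic structure h. Transforming both turns the convolution into a product, and u is recovered by division. Everything concrete below is an assumption made for illustration: semitone-spaced bins from C2 to B5, a three-harmonic h with made-up amplitudes, a circular (wrap-around) convolution, and a plain DFT written out by hand.

```python
import cmath

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(bin_index, lowest_octave=2):
    """Name of a semitone bin, assuming bin 0 is C2."""
    return f"{NOTE_NAMES[bin_index % 12]}{lowest_octave + bin_index // 12}"


def dft(x, inverse=False):
    """Direct O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_convolve(u, h):
    """v[i] = sum_j u[j] * h[i - j], with indices wrapping around."""
    n = len(u)
    return [sum(u[j] * h[(i - j) % n] for j in range(n)) for i in range(n)]


def specmurt_deconvolve(v, h):
    """Recover u from v = u * h by dividing in the transform domain."""
    V, H = dft(v), dft(h)
    U = [vk / hk for vk, hk in zip(V, H)]  # assumes H has no zeros
    return [c.real for c in dft(U, inverse=True)]


n_bins = 48  # C2 .. B5 in semitone steps (assumed resolution)

# Assumed harmonic structure: fundamental, 2nd harmonic (+12 semitones),
# 3rd harmonic (about +19 semitones), with made-up relative amplitudes.
h = [0.0] * n_bins
h[0], h[12], h[19] = 1.0, 0.5, 0.3

# Synthetic fundamentals for a C#m7 voicing: C#3, E3, G#3, B3.
u_true = [0.0] * n_bins
for i in (13, 16, 20, 23):
    u_true[i] = 1.0

v = circular_convolve(u_true, h)   # cluttered multipitch spectrum
u = specmurt_deconvolve(v, h)      # peaks only at the fundamentals
notes = [note_name(i) for i, amp in enumerate(u) if amp > 0.5]
```

With these synthetic inputs, `notes` contains C#3, E3, G#3, and B3, while the harmonic peaks present in `v` at higher bins are gone. Real recordings are noisier, and h generally has to be estimated rather than assumed, so a practical system would not recover u this cleanly.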

[Figure: Specmurt Piano-roll of C#m7 Jazz Chord. Vertical axis: note name, C2 to B5. Horizontal axis: frame number, 5 to 35. Annotation: Highest Energy.]

[Figure: Specmurt “Piano-Roll” Output. Vertical axis: note name, A2 to A#4. Horizontal axis: time (seconds), 0 to 3.]
