Posted on 23-Jan-2016
Content-based Music Retrieval from Acoustic Input (CBMR)
-2-
Outline
What is CBMR?
Methods
  Signal processing
  Similarity comparison
Experiment results
Demo
Future work
-3-
What is CBMR?
CBMR: Content-based Music Retrieval
Traditional database query: text-based or SQL-based
Our goal: music retrieval by singing/humming
-4-
Related Work
Query by humming by Ghias, Logan and Chamberlin in 1995: autocorrelation pitch detection, 183 songs in database
MELDEX system by the New Zealand Digital Library Project in 1996: Gold/Rabiner algorithm, 800 songs; users sing ‘la’ or ‘ta’ to aid transcription
Karaoke song recognizer by J.F. Wang in 1997: novel pitch detection, 50 songs in database
-5-
Flowchart
On-line processing:
Microphone signal input -> Sampling (11 kHz) and filtering -> Pitch tracking -> Post signal processing -> Mid-level representation
Off-line processing:
Songs database -> MIDI message extraction -> Mid-level representation
Both representations -> Similarity comparison -> Query results (ranked song list)
-6-
[Figure: original wave input, song "小雨中的回憶" (Memories in the Light Rain); 11025 Hz, 8 bits, mono]
-7-
Single Frame
512 points/frame, 340 points overlap
[Figure: zoomed-in view of one frame and the overlap between adjacent frames]
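The framing step can be sketched as follows, assuming the signal is a plain Python list of samples. The frame length (512) and overlap (340) come from the slide; the function name `frame_signal` and the choice to drop a final partial frame are illustrative assumptions.

```python
def frame_signal(x, frame_len=512, overlap=340):
    """Split a 1-D sample sequence into overlapping frames.

    frame_len and overlap follow the slide (512 points/frame, 340 points
    overlap), so consecutive frames start frame_len - overlap = 172 samples
    apart. Trailing samples that cannot fill a whole frame are dropped
    (an assumption; zero-padding would be another option).
    """
    hop = frame_len - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_len")
    return [x[start:start + frame_len]
            for start in range(0, len(x) - frame_len + 1, hop)]
```

With these values the hop is 172 samples, roughly 15.6 ms at 11025 Hz.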
-8-
Pitch Tracking
Range: E2 to C6 (82 Hz to 1047 Hz)
Method: auto-correlation
$$ r(\tau) = \frac{1}{N} \sum_{n=1}^{N} s(n)\, s(n+\tau) $$
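A minimal sketch of auto-correlation pitch tracking for one frame, assuming 11025 Hz input and the E2 to C6 range above. The sum is truncated at the frame edge (only indices with n + τ inside the frame contribute), and the pitch is taken as fs divided by the lag that maximizes r(τ). Both choices, and the names `autocorr` and `pitch_autocorr`, are assumptions, not the authors' code.

```python
import math

def autocorr(frame, lag):
    """r(lag) = (1/N) * sum_n s(n) * s(n + lag), summing only over indices
    where n + lag stays inside the frame (finite-frame assumption)."""
    n_total = len(frame)
    return sum(frame[n] * frame[n + lag] for n in range(n_total - lag)) / n_total

def pitch_autocorr(frame, fs=11025, fmin=82.0, fmax=1047.0):
    """Return fs / best_lag, where best_lag maximizes r(lag) over the lags
    that correspond to fmin..fmax (E2..C6)."""
    min_lag = max(1, int(fs / fmax))               # shortest period considered
    max_lag = min(len(frame) - 1, int(fs / fmin))  # longest period considered
    best_lag = max(range(min_lag, max_lag + 1), key=lambda k: autocorr(frame, k))
    return fs / best_lag
```

At 11025 Hz the search covers lags of roughly 10 to 134 samples, which fits comfortably in a 512-point frame. Because the 1/N normalization is fixed, longer lags are slightly penalized, which favors the fundamental period over its multiples.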
-9-
Auto-correlation without Clipping
-10-
Center Clipping
[Figure: center clipping, panels (a), (b), (c)]
Clipping limits are set to a percentage of the absolute maximum of the auto-correlation data
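The clipping percentage is not given here, so the sketch below takes it as a hypothetical `ratio` argument. It implements classic center clipping: values inside [-C, C] are zeroed and the rest are moved toward zero by C, with C = ratio × max|x| of whatever sequence is passed in. The slide sets the limit relative to the auto-correlation data; the function itself works on any sequence.

```python
def center_clip(x, ratio):
    """Classic center clipping with limit C = ratio * max(|x|).

    `ratio` is a hypothetical stand-in for the slide's percentage.
    Values in [-C, C] become 0.0; values outside are shifted toward 0 by C.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    limit = ratio * max((abs(v) for v in x), default=0.0)
    out = []
    for v in x:
        if v > limit:
            out.append(v - limit)
        elif v < -limit:
            out.append(v + limit)
        else:
            out.append(0.0)
    return out
```

Zeroing the small values removes much of the formant structure, which is what makes the main periodicity peak stand out more clearly.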
-11-
Auto-correlation with Clipping
-12-
Pitch Contour
-13-
Signal Processing
Remove outlier ("violent") points and short notes
Down-sampling and smoothing
Frequency-to-semitone conversion
Semitone: a musical scale based on A440
$$ S = 12 \log_2\left(\frac{f}{440}\right) $$
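A sketch of this post-processing chain, assuming pitch values in Hz. The semitone conversion follows S = 12·log2(f/440). The slide does not say which smoother or down-sampling factor was used, so the running median (which also suppresses isolated "violent" points) and the factor of 2 are assumptions; short-note removal is not shown.

```python
import math
import statistics

def freq_to_semitone(f):
    """S = 12 * log2(f / 440): semitones relative to A440 (0 means A4)."""
    return 12.0 * math.log2(f / 440.0)

def median_smooth(values, width=3):
    """Running median over an odd-width window (assumed smoother).
    Windows are truncated at the edges."""
    half = width // 2
    return [statistics.median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]

def downsample(values, factor=2):
    """Keep every `factor`-th value (hypothetical factor)."""
    return values[::factor]
```

For example, a 1000 Hz spike inside a steady 220 Hz contour becomes a single outlier of about +14.2 semitones, which the 3-point median replaces with the surrounding -12.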
-14-
Pitch Contour (After Smoothing)
-15-
Mid-level Representation
-16-
Mid-level Representation without Rest
-17-
Similarity Comparison
Goal: find the most similar MIDI file
Challenges:
  Tempo variance -> dynamic time warping (DTW)
  Tune variance -> key transposition
-18-
Compare by DTW
[Figure: the wave file and the MIDI file compared by DTW]
-19-
Dynamic Time Warping (DTW)
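[Figure: DTW grid with the test sequence t(i) along i and the reference r(j) along j, showing the search window]
$$ D(i,j) = \mathrm{dist}(i,j) + \min\{\, D(i-2,j-1),\; D(i-1,j-1),\; D(i-1,j-2) \,\} $$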
-20-
DTW (cont.)
dist(i,j) = |t(i) - r(j)|
if (t(i) == Rest && r(j) == Rest) dist(i,j) = 0;
else if (t(i) == Rest || r(j) == Rest) dist(i,j) = restWeight;
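A sketch of the DTW comparison, using the local path constraints D(i,j) = dist(i,j) + min{D(i-2,j-1), D(i-1,j-1), D(i-1,j-2)} (our reading of the DTW slide) together with the rest-aware local distance above. Rests are represented by a hypothetical `REST = None` marker, `rest_weight` defaults to an arbitrary placeholder rather than the authors' value, the path is anchored at both ends, and the search window shown on the slides is omitted for brevity.

```python
import math

REST = None  # hypothetical marker for a rest in a pitch sequence

def note_dist(a, b, rest_weight):
    """Local distance |t(i) - r(j)| with the rest rules from the slide."""
    if a is REST and b is REST:
        return 0.0
    if a is REST or b is REST:
        return rest_weight
    return abs(a - b)

def dtw(t, r, rest_weight=1.0):
    """DTW distance between test sequence t and reference r.

    D(i,j) = dist(i,j) + min(D(i-2,j-1), D(i-1,j-1), D(i-1,j-2)),
    starting at (0, 0) and ending at (len(t)-1, len(r)-1).
    Returns math.inf when the end cell is unreachable under these steps.
    """
    n, m = len(t), len(r)
    D = [[math.inf] * m for _ in range(n)]
    D[0][0] = note_dist(t[0], r[0], rest_weight)
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                D[i - 2][j - 1] if i >= 2 and j >= 1 else math.inf,
                D[i - 1][j - 1] if i >= 1 and j >= 1 else math.inf,
                D[i - 1][j - 2] if i >= 1 and j >= 2 else math.inf,
            )
            if best_prev < math.inf:
                D[i][j] = note_dist(t[i], r[j], rest_weight) + best_prev
    return D[n - 1][m - 1]
```

Because the (i-2, j-1) and (i-1, j-2) steps let one sequence advance twice as fast as the other, a query sung at half speed can still align at zero cost: `dtw([60, 60, 62, 62, 64], [60, 62, 64])` returns 0.0.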
-21-
Example of DTW
[Figure: pitch vectors vec1 and vec2 and the DTW path between them (red)]
-22-
Key Transposition
Mean shift
Binary search in the searching area around the mean: O(N) -> O(log N)
[Figure: the mean and the searching area]
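One way to read "mean shift + binary search" is sketched below: first shift the query by the difference of the two means, then probe shifts at ±step around the current best and halve the step each round, so the number of distance evaluations grows with log(span/min_step) instead of linearly. The placeholder distance `aligned_l1`, the parameters `span` and `min_step`, and the assumption that the distance is roughly unimodal in the shift are all illustrative; in the system described here the distance would be the DTW distance.

```python
from statistics import mean

def aligned_l1(q, r):
    """Hypothetical placeholder distance: mean absolute difference of
    equal-length sequences (the slides use DTW instead)."""
    return sum(abs(a - b) for a, b in zip(q, r)) / len(q)

def best_key_shift(query, ref, dist=aligned_l1, span=4.0, min_step=0.25):
    """Mean shift, then a halving (binary-style) search around it.

    span and min_step are in semitones and are assumed values.
    Returns (best_shift, best_distance).
    """
    center = mean(ref) - mean(query)                 # mean shift
    best = dist([v + center for v in query], ref)
    step = span / 2
    while step >= min_step:
        best_shift = center
        for cand in (center - step, center + step):  # probe both sides
            d = dist([v + cand for v in query], ref)
            if d < best:
                best, best_shift = d, cand
        center = best_shift
        step /= 2                                    # halve the search step
    return center, best
```

With `span=4.0` and `min_step=0.25` the loop runs four rounds, i.e. nine distance evaluations, whereas a linear scan at 0.25-semitone resolution over the same area would need many more.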
-23-
Example of Key Transposition
-24-
Score Function
m: length of match string; n: length of input string; e: DTW distance; A = 0.8, B = 0.6
$$ \mathrm{Score} = A \cdot \frac{m}{n} + (1 - A) \cdot B^{e/n} $$
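A direct transcription of the score, assuming it has the form Score = A·(m/n) + (1 - A)·B^(e/n), which is our reading of the garbled slide formula. The defaults A = 0.8 and B = 0.6 are the slide's values; the name `match_score` is hypothetical.

```python
def match_score(m, n, e, A=0.8, B=0.6):
    """Score = A * (m / n) + (1 - A) * B ** (e / n).

    m: length of the matched string, n: length of the input string,
    e: DTW distance. A longer match raises the first term; since 0 < B < 1,
    a larger DTW distance per input note shrinks the second term.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return A * (m / n) + (1 - A) * B ** (e / n)
```

For instance, a match covering half of a 100-note input with DTW distance 100 scores 0.8·0.5 + 0.2·0.6 = 0.52, and a perfect full-length match scores 1.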
-25-
Experiment Environment
Queries: 290 wave files; wave duration: 5 to 8 sec; wave format: PCM, 11025 Hz, 8 bits, mono
Environment: Celeron 450 with 128 MB RAM, MATLAB 5.3
Database: 493 MIDI files
-26-
Experiment Result (Histogram)
-27-
Experiment Result (Pie)
Rank=1: 82%; Rank=2~3: 5%; Rank=4~10: 2%; Rank=11~50: 5%; Rank=51~100: 3%; Rank>100: 4%
Total time: 4589 sec (15.8 sec per wave)
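The pie charts summarize where the correct song landed in the ranked list for each query. The bookkeeping can be sketched as follows, assuming a hypothetical list `ranks` with one rank per query wave; the bucket boundaries are the ones used in the charts. The per-wave time is the total time divided by the 290 queries (4589 / 290 ≈ 15.8 sec).

```python
def rank_distribution(ranks):
    """Fraction of queries whose correct song fell in each rank bucket."""
    buckets = [("Rank=1", 1, 1), ("Rank=2~3", 2, 3), ("Rank=4~10", 4, 10),
               ("Rank=11~50", 11, 50), ("Rank=51~100", 51, 100),
               ("Rank>100", 101, float("inf"))]
    total = len(ranks)
    return {label: sum(lo <= r <= hi for r in ranks) / total
            for label, lo, hi in buckets}
```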
-28-
Experiment Result (Pie) - With Rest
Rank=1: 81%; Rank=2~3: 4%; Rank=4~10: 4%; Rank=11~50: 6%; Rank=51~100: 2%; Rank>100: 2%
Total time: 7893 sec (27.2 sec per wave)
-29-
How to Accelerate?
Branch and bound: O(N) -> O(log N) using the triangle inequality
d(a,b) + d(b,c) ≥ d(a,c)
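A sketch of triangle-inequality pruning with a single pivot, one simple form of branch and bound. For a pivot p, d(q,x) ≥ |d(q,p) - d(p,x)|, so any candidate whose lower bound already exceeds the best distance found can be skipped without computing d(q,x). The pivot distances d(p,x) would be precomputed off-line. The search is exact only if `dist` is a true metric; DTW does not satisfy the triangle inequality in general, so on DTW this pruning would be approximate. All names here are hypothetical.

```python
def nearest_with_pivot(query, items, dist, pivot):
    """Nearest-neighbour search pruned by the triangle inequality.

    Returns (best_index, best_distance, number_of_full_distance_calls).
    """
    d_qp = dist(query, pivot)
    d_px = [dist(pivot, x) for x in items]   # precomputed off-line in practice
    # Visit candidates in order of increasing lower bound |d(q,p) - d(p,x)|.
    order = sorted(range(len(items)), key=lambda k: abs(d_qp - d_px[k]))
    best_idx, best_d, computed = None, float("inf"), 0
    for k in order:
        if abs(d_qp - d_px[k]) >= best_d:
            break                            # every later bound is at least as large
        d = dist(query, items[k])
        computed += 1
        if d < best_d:
            best_idx, best_d = k, d
    return best_idx, best_d, computed
```

With items [0, 5, 10, 20, 40], pivot 0 and query 9, only one full distance (to 10) is computed before the remaining lower bounds rule everything else out.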
Hierarchical 2-phase comparison: 3/32 sec and 2/32 sec
[Figure: hierarchical search tree with root node 0 (layer 0) and nodes 1 to 39 arranged in layers 1 to 3]
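A generic two-phase search can be sketched as follows: a cheap coarse comparison ranks every song, and an expensive fine comparison re-ranks only a shortlist. We read the 3/32 sec and 2/32 sec labels as the time resolutions of the two phases; that reading, the shortlist size `keep`, and both distance functions are assumptions for illustration.

```python
def two_phase_search(query, songs, coarse_dist, fine_dist, keep=10):
    """Phase 1: rank all songs by a cheap coarse distance and keep the
    `keep` best. Phase 2: re-rank only those with the fine distance.
    Returns song indices, best first."""
    shortlist = sorted(range(len(songs)),
                       key=lambda k: coarse_dist(query, songs[k]))[:keep]
    return sorted(shortlist, key=lambda k: fine_dist(query, songs[k]))
```

The design trades a small risk of dropping the correct song in phase 1 for far fewer fine comparisons; the 2-phase results can be compared against the single-resolution runs to see how large that risk is.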
-30-
Experiment Result (Pie) - 3/32 sec
Rank=1: 59%; Rank=2~3: 8%; Rank=4~10: 7%; Rank=11~50: 11%; Rank=51~100: 6%; Rank>100: 9%
Total time: 2358 sec (8.9 sec per wave)
-31-
Experiment Result (Pie) - 2 Phase
Rank=1: 82%; Rank=2~3: 5%; Rank=4~10: 2%; Rank=11~50: 4%; Rank=51~100: 4%; Rank>100: 3%
Total time: 3006 sec (11.2 sec per wave)
-32-
Error Analysis
MIDI errors
Singing errors
Low pitch
Voice cracks (broken vocalism)
Noise
-33-
Future Work
Reduce computation time:
  Better similarity comparison
  Different comparison unit
  Hardware acceleration
  Better searching algorithm
Steadier pitch tracking algorithm
Noise handling