Retrieval Methods for QBSH (Query By Singing/Humming)

Post on 08-Jan-2016

Retrieval Methods for QBSH (Query By Singing/Humming)

J.-S. Roger Jang (張智星)

jang@mirlab.org

http://mirlab.org/jang

Multimedia Information Retrieval Lab

CSIE Dept, National Taiwan University

Retrieval Methods for QBSH

Goal: Given a query, find the most similar melody in the database.

Challenges:
- Robust pitch tracking for various acoustic inputs
  - Input from a mobile device
  - Input in a noisy karaoke room
- Comparison methods need to deal with:
  - Key variations in users' input (due to gender differences)
  - Tempo variations in users' input
  - Reasonable response time, e.g., 5 seconds

Evaluation of QBSH Methods

Two criteria for evaluating QBSH methods:
- Efficiency: How fast is the system?
  - Can it deal with a database of 100K songs?
- Effectiveness: How accurate is the system?
  - Several performance indices for effectiveness

A Typical Query Result

Performance Indices of Effectiveness in QBSH Methods

Queries always in the database:
- Top-10 recognition rate (RR) for n queries: RR = (1+0+0+1+1+…)/n
- Top-10 mean reciprocal rank (MRR) for n queries: MRR = (1/3+1/inf+1/4+1/2+1/5+…)/n

Queries may not be in the database:
- True positive and true negative rates to deal with the out-of-vocabulary (OOV) problem

Examples of RR and MRR

Specs:
- No. of queries: 10
- Database size: 100
- No OOV: the GT (ground truth) of every query in the set is within the DB

Test result:
- GT ranking: [1 3 8 4 9 21 2 5 8 2]
- Top-5 RR = (1+1+0+1+0+0+1+1+0+1)/10 = 6/10 = 60%
- Top-5 MRR = (1/1+1/3+1/∞+1/4+1/∞+1/∞+1/2+1/5+1/∞+1/2)/10 = 0.2783
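The two indices in this example can be checked with a few lines of code. The following is a minimal sketch, not part of the original slides; the function names `top_k_rr` and `top_k_mrr` are made up for illustration, and a ground-truth rank beyond k is treated as rank ∞ (it contributes 0).

```python
def top_k_rr(ranks, k):
    """Top-k recognition rate: fraction of queries whose ground truth is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def top_k_mrr(ranks, k):
    """Top-k mean reciprocal rank: ranks beyond k count as infinity, i.e. contribute 0."""
    return sum(1.0 / r for r in ranks if r <= k) / len(ranks)


# Ground-truth rankings from the example above (10 queries, no OOV).
gt_ranks = [1, 3, 8, 4, 9, 21, 2, 5, 8, 2]

print("Top-5 RR :", top_k_rr(gt_ranks, 5))
print("Top-5 MRR:", round(top_k_mrr(gt_ranks, 5), 4))
```

With the rankings listed above, this reproduces the 60% recognition rate and the MRR of about 0.2783.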


Types of QBSH Approaches

Categories of approaches to QBSH:
- Histogram/statistics-based
- Note vs. note
  - Edit distance
- Frame vs. note
  - HMM
- Frame vs. frame
  - Linear scaling, DTW, recursive alignment

Linear Scaling (LS)

Concept: Scale the query linearly to match the candidates

Assumption: Uniform tempo variation

Rest handling:
- Cut leading and trailing zeros (silence)
- All the other zeros (rests) are replaced with the previous non-zero pitch

Example: Row, Row, Row Your Boat
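The rest-handling rule can be sketched as follows. This is an illustrative sketch only: the function name `handle_rests` is hypothetical, and it assumes the pitch vector is a list of semitone values in which 0 marks silence or a rest.

```python
def handle_rests(pitch):
    """Trim leading/trailing zeros, then fill interior zeros with the previous non-zero pitch."""
    start, end = 0, len(pitch)
    while start < end and pitch[start] == 0:    # cut leading silence
        start += 1
    while end > start and pitch[end - 1] == 0:  # cut trailing silence
        end -= 1
    trimmed = list(pitch[start:end])
    for i in range(1, len(trimmed)):
        if trimmed[i] == 0:                     # interior rest: hold the previous pitch
            trimmed[i] = trimmed[i - 1]
    return trimmed


print(handle_rests([0, 0, 60, 60, 0, 0, 62, 0, 64, 0]))
```

After trimming, the first element is always non-zero, so every interior rest has a previous pitch to copy.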

Linear Scaling

Scale the query pitch linearly to match the candidates

[Figure: the original input pitch is compressed by 0.5 and 0.75 and stretched by 1.25 and 1.5. Each version, along with the original pitch, is compared with the target pitch in the database (most likely from a MIDI file), and the best match is kept.]

Strength and Weakness of LS

Strength:
- One-shot for dealing with key transposition
- Efficient and effective
- Indexing methods available

Weakness:
- Cannot deal with non-uniform tempo variations

Typical mapping path


Compress or Expand a Pitch Vector

Given a pitch vector y of length m, how do we compress or expand it to length n?

    x2 = interp1(1:m, y, linspace(1, m, n));

Examples:
- m=7, n=13
- m=7, n=9
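For readers without MATLAB, the same linear interpolation can be sketched in plain Python. This is an assumption-laden sketch rather than a port of any library routine: the function name `resample` is hypothetical, and it requires m ≥ 2 and n ≥ 2 so that both endpoints are defined.

```python
def resample(y, n):
    """Linearly interpolate y (length m >= 2) onto n >= 2 evenly spaced points, keeping both endpoints."""
    m = len(y)
    if m < 2 or n < 2:
        raise ValueError("need at least two input points and two output points")
    out = []
    for k in range(n):
        pos = k * (m - 1) / (n - 1)  # fractional 0-based index into y
        lo = min(int(pos), m - 2)    # left neighbour, clamped so lo + 1 stays valid
        frac = pos - lo
        out.append(y[lo] * (1 - frac) + y[lo + 1] * frac)
    return out


y = [0, 1, 2, 3, 4, 5, 6]  # m = 7
print(resample(y, 13))     # expand to n = 13
print(resample(y, 9))      # expand to n = 9
```

For m=7 and n=13 the new sample points fall exactly halfway between the original ones, so the output alternates between original values and midpoints.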


Distance Function for LS

Commonly used distance function for LS: normalized Lp-norm

$$\hat{L}_p(\mathbf{x}) = \left( \frac{|x_1|^p + |x_2|^p + \cdots + |x_n|^p}{n} \right)^{1/p}$$

Characteristics:
- Usually p=1 or 2 for LS
- Normalization to get rid of length variations
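A normalized Lp-norm of this kind divides the sum of |x_i|^p by the vector length n before taking the p-th root, so vectors of different lengths become comparable. The sketch below assumes that definition; the names `normalized_lp` and `lp_distance` are hypothetical.

```python
def normalized_lp(x, p=1):
    """Normalized Lp-norm: ((|x_1|^p + ... + |x_n|^p) / n) ** (1/p)."""
    n = len(x)
    return (sum(abs(v) ** p for v in x) / n) ** (1.0 / p)


def lp_distance(a, b, p=1):
    """Normalized Lp distance between two equal-length pitch vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return normalized_lp([u - v for u, v in zip(a, b)], p)


# A constant 2-semitone error gives the same distance for any length.
print(lp_distance([62] * 5, [60] * 5, p=2))
print(lp_distance([62] * 50, [60] * 50, p=2))
```

Without the division by n, the second distance would grow with the length of the vectors, which would bias the comparison toward short scaled queries.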


Key Transposition in LS

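How do we find the transposed query that has the smallest distance from a database item?

Best transposition:

$$\hat{s} = \arg\min_s L_p(q - s - r)$$

where q is the query, r is the database item, and q - s is the transposed query.

In practice:
- p = 1: $\hat{s} = \operatorname{median}(q - r)$, for which $\operatorname{median}(q) - \operatorname{median}(r)$ is a shortcut (the two are not equal in general)
- p = 2: $\hat{s} = \operatorname{mean}(q - r) = \operatorname{mean}(q) - \operatorname{mean}(r)$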
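The closed-form shifts can be sketched as below, assuming the transposed query is q − s and that q and r are equal-length pitch vectors in semitones. The function names `best_shift` and `transpose` are hypothetical. The second example is made-up data showing that the median of the differences can differ from the difference of the medians.

```python
from statistics import mean, median


def best_shift(q, r, p):
    """Shift s minimizing the Lp distance between (q - s) and r, for p = 1 or p = 2."""
    diffs = [a - b for a, b in zip(q, r)]
    if p == 1:
        return median(diffs)  # the L1 minimizer is the median of the differences
    if p == 2:
        return mean(diffs)    # the L2 minimizer is the mean of the differences
    raise ValueError("only p = 1 and p = 2 are handled in this sketch")


def transpose(q, s):
    """Apply the key shift to the query."""
    return [v - s for v in q]


q = [62, 64, 66, 64]  # query sung 12 semitones above the reference
r = [50, 52, 54, 52]
s = best_shift(q, r, p=2)
print(s, transpose(q, s))

# Median of differences vs. difference of medians on made-up data.
q2, r2 = [0, 0, 10], [0, 5, 5]
print(best_shift(q2, r2, p=1), median(q2) - median(r2))
```

For p = 2 the shift equals mean(q) − mean(r), so the mean of each database item can be computed once in advance.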

Example of Linear Scaling via L1 Norm

linScaling01.m

[Figure, three panels. Top: "Database and input pitch vectors" (database pitch and input pitch, in semitones). Middle: "Database and scaled pitch vectors" (database pitch and scaled pitch, in semitones). Bottom: "Normalized distance" (distance vs. scaling factor, from 0.5 to 1.5).]
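The whole procedure (scale, transpose, measure the normalized distance, keep the best factor) can be sketched in Python as follows. This is not the code of linScaling01.m, which is not shown here. It is a sketch under stated assumptions: the helper names are hypothetical, the scaled query is compared against the beginning of the database item, the key shift is the median (p = 1) or mean (p = 2) of the differences, and the toy data are made up.

```python
from statistics import mean, median


def resample(y, n):
    """Linearly interpolate y onto n evenly spaced points (needs len(y) >= 2 and n >= 2)."""
    m = len(y)
    out = []
    for k in range(n):
        pos = k * (m - 1) / (n - 1)
        lo = min(int(pos), m - 2)
        frac = pos - lo
        out.append(y[lo] * (1 - frac) + y[lo + 1] * frac)
    return out


def linear_scaling(query, target, factors, p=1):
    """Return (distance, factor, shift) for the best scaled and transposed query, or None."""
    best = None
    for f in factors:
        n = int(round(len(query) * f))
        if n < 2 or n > len(target):
            continue  # scaled query would be too short or run past the target
        scaled = resample(query, n)
        diffs = [a - b for a, b in zip(scaled, target[:n])]
        shift = median(diffs) if p == 1 else mean(diffs)  # key transposition
        dist = (sum(abs(d - shift) ** p for d in diffs) / n) ** (1.0 / p)
        if best is None or dist < best[0]:
            best = (dist, f, shift)
    return best


# Toy data: the target is a rising ramp; the query is its first 20 frames
# compressed to 16 frames and sung 5 semitones higher.
target = [60 + 0.5 * i for i in range(40)]
query = [v + 5 for v in resample(target[:20], 16)]
dist, factor, shift = linear_scaling(query, target, [0.5, 0.75, 1.0, 1.25, 1.5])
print(factor, round(shift, 3), round(dist, 6))
```

On this toy input the search should pick the factor 1.25 with a shift close to 5 semitones, since stretching 16 frames by 1.25 restores the original 20 frames.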

Linear Scaling via L1 and L2 Norm

linScaling02.m

[Figure, three panels. Top: "Database and input pitch vectors" (database pitch and input pitch, in semitones). Middle: "Database and scaled pitch vectors" (database pitch, scaled pitch via L1 norm, and scaled pitch via L2 norm, in semitones). Bottom: "Normalized distances via L1 & L2 norm" (distances vs. scaling factor, from 0.5 to 1.5, one curve per norm).]

DTW (Dynamic Time Warping)

About DTW:
- DTW introduction
- DTW for QBSH
- #1 method for task 2 in QBSH/MIREX 2006

RA (Recursive Alignment)

Characteristics:
- Combines characteristics of LS and DTW
- #1 method for task 1 in QBSH/MIREX 2006

A typical mapping path

Modified Edit Distance

Note segmentation

Modified edit distance

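$$
d_{i,j} = \min
\begin{cases}
d_{i-1,j} + w(a_i, \phi) & \text{(deletion)} \\
d_{i,j-1} + w(\phi, b_j) & \text{(insertion)} \\
d_{i-1,j-1} + w(a_i, b_j) & \text{(replacement)} \\
\{ d_{i-k,j-1} + w(a_{i-k+1}, \ldots, a_i, b_j),\ 2 \le k \le i \} & \text{(consolidation)} \\
\{ d_{i-1,j-k} + w(a_i, b_{j-k+1}, \ldots, b_j),\ 2 \le k \le j \} & \text{(fragmentation)}
\end{cases}
$$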
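A dynamic-programming sketch of an edit distance with deletion, insertion, replacement, consolidation and fragmentation is given below. The slides do not define the weight function w, so everything about it here is an assumption made for illustration: notes are (pitch, duration) pairs, deleting or inserting a note costs its duration, replacing a note costs the absolute pitch difference plus the absolute duration difference, and a group of notes is compared as one merged note (duration-weighted mean pitch, summed duration).

```python
def merge(notes):
    """Collapse several (pitch, duration) notes into one: duration-weighted pitch, summed duration."""
    total = sum(d for _, d in notes)
    pitch = sum(p * d for p, d in notes) / total
    return (pitch, total)


def weight(a, b):
    """Hypothetical cost w(a, b); None stands for the empty note."""
    if a is None:
        return b[1]  # insertion: duration of the inserted note
    if b is None:
        return a[1]  # deletion: duration of the deleted note
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def modified_edit_distance(a, b):
    """Edit distance between note sequences a and b with consolidation and fragmentation."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + weight(a[i - 1], None)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + weight(None, b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = min(
                d[i - 1][j] + weight(a[i - 1], None),          # deletion
                d[i][j - 1] + weight(None, b[j - 1]),          # insertion
                d[i - 1][j - 1] + weight(a[i - 1], b[j - 1]),  # replacement
            )
            for k in range(2, i + 1):  # consolidation: last k notes of a map to b_j
                best = min(best, d[i - k][j - 1] + weight(merge(a[i - k:i]), b[j - 1]))
            for k in range(2, j + 1):  # fragmentation: a_i maps to last k notes of b
                best = min(best, d[i - 1][j - k] + weight(a[i - 1], merge(b[j - k:j])))
            d[i][j] = best
    return d[m][n]


one_long = [(60, 1.0)]
two_short = [(60, 0.5), (60, 0.5)]
print(modified_edit_distance(one_long, two_short))
```

With these assumed weights, a long note split into two half-length notes of the same pitch costs nothing, which is the kind of note-segmentation error that consolidation and fragmentation are meant to absorb.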