
Page 1:

Informatik og Matematisk Modellering / Intelligent Signalbehandling

Machine Learning on Sound... how hard can it be?

Audio Information Seminar
Thursday, June 8, 2006
Kaare Brandt Petersen

Page 2:

Agenda

Motivation

The reason it might be hard:
- From data to information
- Features

The good news:
- Computer power and machine learning
- Examples

Conclusions

Page 3:

Motivation

What can we do with audio information?

News archives: Find the grumpy voice in a TV broadcast from a busy street in the Middle East. Search in news archives.

Music: 6 billion friends. Navigating the world landscape of music.

Page 4:

Data

Sound as perceived by humans

and by computers

-0.00076293945313 0.00231933593750 -0.00714111328125 0.00772094726563
0.00076293945313 -0.00772094726563 -0.00900268554688 -0.00527954101563
-0.00076293945313 -0.00231933593750 -0.00714111328125 0.00024414062500
0.01312255859375 0.00650024414063 -0.01052856445313 -0.01089477539063
-0.00305175781250 -0.01052856445313 -0.01089477539063 -0.00305175781250
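The values on this slide look like 16-bit PCM samples scaled to the range [-1, 1): multiplied by 32768 they land very close to whole numbers. That reading is an assumption, since the slide does not state the sample format. A minimal sketch using the first four values:

```python
# First four sample values as printed on the slide.
samples = [-0.00076293945313, 0.00231933593750, -0.00714111328125, 0.00772094726563]

# Assumption (not stated on the slide): 16-bit signed PCM scaled by 2**15.
FULL_SCALE = 2 ** 15


def to_pcm16(x):
    """Map a float sample back to the nearest 16-bit integer code."""
    return round(x * FULL_SCALE)


codes = [to_pcm16(x) for x in samples]

# Distance from each scaled value to its nearest integer, in quantization steps.
residuals = [abs(x * FULL_SCALE - c) for x, c in zip(samples, codes)]

print(codes)
print(max(residuals))
```

If the assumption holds, the residuals are tiny, which is another way of saying that the computer sees nothing but a long list of small integers.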

[ Beeps ]

- "There's the television"

[ Music - violins ]

[ Steps ]
- "It's all right there"
- "All right there!"

- "Look. Listen. Kneel. Pray"
- "Commercials!"

[ Male voice - indoor ]

Dialogue / Sound events

12 Monkeys, movie from 1995

Page 5:

Data

Is the data-to-information translation really necessary?

1) Query by signal processing [ humans learn how computers think ]

2) Query by information [ computers learn how humans think ]

3) Query by example [ various approaches ]

[Figure: example queries "happy jazz" and "ZCR < 198" sent to an archive]
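To make "query by signal processing" concrete, here is a minimal sketch of a query like "ZCR < 198" run against a toy archive. The slide gives no unit for the threshold, so zero crossings per second is an assumption, and the archive, its clip names, and the 8 kHz sample rate are all hypothetical.

```python
import math


def zero_crossings_per_second(signal, sample_rate):
    """Count sign changes between neighbouring samples, scaled to one second."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))
    return crossings * sample_rate / len(signal)


def sine(freq_hz, sample_rate=8000, seconds=1.0, phase=0.1):
    """Synthetic test tone; the small phase offset avoids samples exactly at zero."""
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate + phase) for i in range(n)]


# Hypothetical archive: clip name -> samples.
archive = {"low_tone": sine(50), "high_tone": sine(440)}

# Threshold taken from the slide; its unit is assumed to be crossings per second.
ZCR_LIMIT = 198

hits = [name for name, clip in archive.items()
        if zero_crossings_per_second(clip, 8000) < ZCR_LIMIT]
print(hits)
```

The query works, but only for a user who already knows what a zero crossing rate is, which is the point of "humans learn how computers think".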

Page 6:

Data

Going from 5 million real numbers to "Opera"

Bridging the gap: From data to information

Constructing sound features the right way

Information

Meaning

Context

Page 7:

Features

Many short-time features

- Zero crossing rate
- Spectral flatness
- Spectral bandwidth
- Spectral centroids
- Spectral rolloff
- Spectral flux
- Energy
- ...

- Mel Frequency Cepstral Coefficients (MFCC) [Foote97, Rabiner93]
- Real Cepstral Coefficients (RCC)
- Linear Prediction Coefficients (LPC)
- Wavelets
- Gamma-tone filterbanks
- Sone / Bark
- Chroma features
- ...

[Figure: 12 Monkeys sound clip. Panels: Waveform, Spec, ZCR, Sp-Flatness, Sp-Bandwidth, Sp-Centroid, MFCC 1, MFCC 2-7, Chroma]
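A few of the short-time features listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the code behind the figure: the frame size, hop size, and test tone are hypothetical, the DFT is a naive one that is only practical for short frames, and the definitions (mean-square energy, magnitude-weighted centroid, flatness as geometric over arithmetic mean of the power spectrum) are common textbook variants.

```python
import cmath
import math


def frames(signal, size, hop):
    """Split a signal into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def short_time_features(frame, sample_rate):
    """Energy, zero crossings, spectral centroid and flatness for one frame."""
    n = len(frame)
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    energy = sum(x * x for x in frame) / n
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    centroid = sum(f * m for f, m in zip(freqs, mags)) / (sum(mags) or 1e-12)
    power = [m * m + 1e-12 for m in mags]  # small floor keeps log() finite
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    flatness = geometric / (sum(power) / len(power))
    return {"energy": energy, "zcr": zcr, "centroid": centroid, "flatness": flatness}


# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(256)]
features = [short_time_features(f, SAMPLE_RATE) for f in frames(signal, 64, 32)]
print(len(features), features[0])
```

For a pure tone the centroid sits at the tone frequency and the flatness is close to zero; noise-like frames push the flatness towards one.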

Page 8:

Features

Aggregating short-time features

Audio clip = data cloud

Distribution of values
- Basic statistics [Wold96]
- Histograms and vector quantization [Foote97]
- Gaussian Mixture Models [Auc02]
- K-means clustering [Logan01]
- Anchors by Neural Networks [Beren03]

Temporal modelling
- SVD of e.g. spectrogram [Gu04]
- AR-coefficients [Meng05]
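The simplest aggregation in the list, basic statistics, can be sketched as follows: every frame contributes one point to the data cloud, and the clip is summarised by the mean and variance of each feature dimension. The feature values below are made up for illustration, and the layout of the summary vector (all means, then all variances) is an arbitrary choice.

```python
from statistics import fmean, pvariance


def aggregate(frame_features):
    """Collapse per-frame feature vectors into [means..., variances...]."""
    columns = list(zip(*frame_features))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means + variances


# Hypothetical clip: three frames, two features per frame.
clip = [[0.1, 2.0], [0.3, 2.0], [0.2, 2.0]]
summary = aggregate(clip)
print(summary)
```

Note that this summary ignores the order of the frames, which is what the temporal modelling approaches are meant to address.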

Page 9:

Features

What we are trying to do: From data to information

Data:
-0.00076293945313 0.00231933593750 -0.00714111328125 0.00772094726563
0.00076293945313 -0.00772094726563 -0.00900268554688 -0.00527954101563
-0.00076293945313 -0.00231933593750 -0.00714111328125 0.00024414062500
0.01312255859375 0.00650024414063 -0.01052856445313 -0.01089477539063

Low-level features:
ZCR, Spectral, MFCC, Chroma, Sone/Bark, RCC, LPC, ...

High-level features:
Basic stats, GMM, K-means, Anchors, AR coeff, SVD, HMM, ...

Information:
"Rough", "Deep", "Sparky", "Broad", "Melancholic", "Majestic", "Jazz", "Rock", ...

Page 10:

Features

Music similarity example

"Shape of my heart", Backstreet Boys, 2000

"That's the way it is", Celine Dion, 2000

"Cantaloop", Us3, 1993

"The limitations observed in this paper (...) suggests that the usual route to timbre similarity may not be the optimal one" [Auc04]

Page 11:

The bad news

Sound data is far from the information

Not all features are useful

It is not obvious what the information labels should be

Page 12:

The good news

- Computer power

- Signal processing: strong development in signal processing and machine learning in general

- Large amounts of data

- Increased interest in sound and music processing

Page 13:

Example: Genre estimation

Genre estimation by temporal integration

Peter Ahrendt, Anders Meng [Meng05]

Processing: Sound -> MFCC -> AR
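The last step of the chain, fitting AR coefficients to a feature trajectory, can be sketched as below. This is a generic univariate Yule-Walker fit solved with the Levinson-Durbin recursion, applied to one coefficient track at a time; it is not claimed to be the exact formulation used in [Meng05]. The trajectory is synthetic and stands in for one MFCC coefficient over time, and the model order and noise level are hypothetical.

```python
import random


def autocorr(x, max_lag):
    """Biased autocovariance estimates for lags 0..max_lag."""
    mean = sum(x) / len(x)
    c = [v - mean for v in x]
    return [sum(c[t] * c[t - lag] for t in range(lag, len(c))) / len(c)
            for lag in range(max_lag + 1)]


def ar_coefficients(x, order):
    """Yule-Walker AR fit: x[t] is predicted as sum_k a[k] * x[t-1-k]."""
    r = autocorr(x, order)
    a = []
    err = r[0]  # prediction error variance, updated at every order
    for m in range(order):
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        k_m = acc / err  # reflection coefficient for order m + 1
        a = [a[k] - k_m * a[m - 1 - k] for k in range(m)] + [k_m]
        err *= 1 - k_m * k_m
    return a, err


# Synthetic stand-in for one MFCC coefficient over time: an AR(1) process.
rng = random.Random(0)
trajectory = [0.0]
for _ in range(5000):
    trajectory.append(0.8 * trajectory[-1] + rng.gauss(0.0, 1.0))

coeffs, noise_var = ar_coefficients(trajectory, order=2)
print(coeffs, noise_var)
```

The fitted coefficients (and optionally the residual variance) then serve as the clip-level feature vector, so the temporal structure of the frames is kept rather than averaged away.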

Page 14:

Example: Genre estimation

Genre estimation by temporal integration + kernel methods

Jeronimo Arenas-Garcia, Tue Lehn-Schiøler, Kaare Brandt Petersen [ArGa06]

Processing: Sound -> MFCC -> AR -> KOPLS

Btw: A data harvesting tool coming up - ISMIR 2006

Page 15:

Example: Source separation

Spectrogram modelling with sparse NTF2D

Morten Mørup, Mikkel Schmidt [Mørup06]

W = time-frequency patterns
H = time, amplitude, pitch
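The idea of factorising a spectrogram into patterns W and activations H can be illustrated with plain non-negative matrix factorisation (NMF) using the standard multiplicative updates for squared error. This is a much simpler relative of the sparse NTF2D model on the slide, swapped in for illustration only: it has no sparsity term, no tensor structure, and no shifts in time or pitch. The toy "spectrogram" and all names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frob_error(v, w, h):
    """Squared Frobenius distance between V and W*H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorise V (freq x time) as W (freq x rank) times H (rank x time)."""
    rng = random.Random(seed)
    n_freq, n_time = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + 0.1 for _ in range(n_time)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[r][t] * num[r][t] / (den[r][t] + eps) for t in range(n_time)]
             for r in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[f][r] * num[f][r] / (den[f][r] + eps) for r in range(rank)]
             for f in range(n_freq)]
    return w, h


# Toy magnitude spectrogram: two spectral patterns active at different times.
pattern_a, pattern_b = [1.0, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.5]
act_a, act_b = [1.0, 1.0, 0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0, 1.0, 0.5]
v = [[pattern_a[f] * act_a[t] + pattern_b[f] * act_b[t] for t in range(6)]
     for f in range(4)]

w, h = nmf(v, rank=2)
print(frob_error(v, w, h))
```

Each column of W is a spectral pattern and each row of H says when, and how strongly, that pattern is active; multiplying one column by its row gives the spectrogram of one estimated source.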

[Figure: spectrograms of the original (mixed) signal and of the separated sources (harp, flute). Axes: Time [s], Frequency [Hz]]

Page 16:

Example: CNN

Translating a CNN news broadcast

Kasper Jørgensen, Lasse Mølgaard, Lars Kai Hansen [Jorg06]

Music or speech?
Sound -> MFCC, STE, SpF, ZCR -> mean/var

Speaker change detection
Sound -> MFCC -> VQ

Speech recognition
Sphinx 4 (Carnegie Mellon)

Page 17:

Conclusions

It is hard:
- Sound data is far from the information
- Good features are hard to find

but machine learning is catching up:
- Examples: Genre, Source separation, CNN-translation

Page 18:

References

[Wold96] Wold, E.; Blum, T.; Keislar, D. & Wheaton, J., "Content-based Classification, Search, and Retrieval of Audio", IEEE Multimedia, 1996, 3, 27-36
[Foote97] Foote, J., "Content-based retrieval of music and audio", Multimedia Storage and Archiving Systems II, Proc. of SPIE, 1997, 3229, 138-147
[Logan01] Logan and Salomon, "A music similarity function based on signal analysis", ICME 2001
[Beren03] Berenzweig, Ellis and Lawrence, "Anchorspace for classification and similarity measurement of music", ICME 2003
[Rabiner93] Rabiner, L. & Juang, B.H., "Fundamentals of Speech Recognition", Prentice-Hall, 1993
[Gu04] Gu, Lu, Cai and Zhang, "Dominant Feature vector based audio similarity measure", Proceedings of the Pacific Rim Conference on Multimedia, PCM, 2004
[Tza02] Tzanetakis and Cook, "Musical Genre Classification of Audio Signals", IEEE Transactions on Speech and Audio Processing, 2002, 10, 293-302
[Auc02] Aucouturier and Pachet, "Music Similarity Measures: What's the use?", ISMIR 2002
[Meng05] Anders Meng, Peter Ahrendt and Jan Larsen, "Improving Music Genre Classification by Short-Time Feature Integration", ICASSP, 2005
[Auc04] Aucouturier and Pachet, "Improving Timbre Similarity: How high is the sky?", JNRSAS, 2004
[Mørup06] "Sparse Non-negative Tensor Factor Double Deconvolution (SNTF2D) for multi channel time-frequency analysis", submitted to JMLR 2006
[ArGa06] "Reduced Kernel Orthonormal Partial Least Squares", submitted for NIPS 2006
[Jorg06] Kasper Jørgensen, Lasse Mølgaard, Lars Kai Hansen, "Unsupervised speaker change detection for broadcast news segmentation", EUSIPCO 2006

