Audio: Generation & Extraction

Charu Jaiswal

Source: fidler/teaching/2015/slides/CSC2523/...


Music Composition – which approach?

• Feedforward NNs can't store information about the past (or keep track of position in a song)
• RNNs used as single-step predictors struggle with composition, too
  • Vanishing (or exploding) gradients: error flowing back through time decays or grows exponentially
  • The network can't deal with long-term dependencies
• But music is all about long-term dependencies!
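A toy calculation makes the vanishing/exploding point concrete. This is a sketch under a simplifying assumption: a scalar linear recurrence h_t = w * h_{t-1}, ignoring activation derivatives. An error signal sent back T steps is multiplied by w at every step, so it scales as w^T. The function name is illustrative only.

```python
def backprop_factor(w: float, steps: int) -> float:
    """Scale factor an error signal picks up after flowing back `steps`
    time steps through the toy recurrence h_t = w * h_{t-1}.
    (Real RNNs also multiply by activation derivatives at each step.)"""
    return w ** steps


for w in (0.9, 1.0, 1.1):
    print(f"w={w}: after 50 steps the error is scaled by {backprop_factor(w, 50):.3g}")
```

Over 50 steps, w = 0.9 shrinks the error to about 0.005 while w = 1.1 inflates it to about 117; only w = 1.0 keeps it constant.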


Music

• Long-term dependencies define style:
  • Dependencies spanning bars and notes contribute to metrical and phrasal structure
• How do we introduce structure at multiple levels?
  • Eck and Schmidhuber → LSTM


Why LSTM?

• Designed to obtain constant error flow through time
  • Protects the error from perturbations
• Uses linear units to overcome the decay problems of RNNs
• Input gate: protects the flow from perturbation by irrelevant inputs
• Output gate: protects other units from perturbation by irrelevant memory contents
• Forget gate: resets the memory cell when its content is obsolete (added later by Gers, Schmidhuber & Cummins, 2000)

Hochreiter & Schmidhuber, 1997
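A minimal NumPy sketch of one LSTM time step, using the modern formulation with input, forget, and output gates. The weight layout (W, U, b stacked as input/forget/output/candidate) and the function names are illustrative assumptions, not the exact architecture used by Eck and Schmidhuber.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked (by assumption) as [input gate, forget gate, output gate, candidate]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])           # input gate: how much new input to write
    f = sigmoid(z[H:2 * H])       # forget gate: how much old memory to keep
    o = sigmoid(z[2 * H:3 * H])   # output gate: how much memory to expose
    g = np.tanh(z[3 * H:4 * H])   # candidate cell input
    c = f * c_prev + i * g        # linear self-connection ("constant error carousel")
    h = o * np.tanh(c)
    return h, c


# Usage: run a randomly initialised cell over 5 random input frames
rng = np.random.default_rng(0)
D, H = 3, 2
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for _ in range(5):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
print(h, c)
```

Because the cell state is updated additively (f * c_prev + i * g), the error can flow back unchanged whenever f is near 1 and i is near 0, which is what the gates are there to control.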


Data Representation

Chords: (encoding figure not recoverable)

Notes: (encoding figure not recoverable)

• Only quarter notes
• No rests
• Training melodies written by Eck
• Dataset of 4096 segments

Eck and Schmidhuber, 2002
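One way to picture this kind of input is a binary piano roll with one row per quarter-note step. The pitch lists and function names below are hypothetical placeholders, not the paper's actual note ranges or encoding; the sketch only shows why "quarter notes only, no rests" means every melody row has exactly one active unit.

```python
import numpy as np

# Hypothetical pitch vocabularies for illustration only.
CHORD_PITCHES = ["C3", "D3", "E3", "F3", "G3", "A3", "B3"]
MELODY_PITCHES = ["C4", "D4", "E4", "F4", "G4", "A4", "B4", "C5"]


def encode_melody(notes):
    """One row per quarter-note step, one column per pitch.
    With no rests, each row has exactly one active unit."""
    roll = np.zeros((len(notes), len(MELODY_PITCHES)), dtype=np.int8)
    for t, name in enumerate(notes):
        roll[t, MELODY_PITCHES.index(name)] = 1
    return roll


def encode_chord(chord_notes, n_steps):
    """A chord is several simultaneously active units, held for n_steps."""
    row = np.zeros(len(CHORD_PITCHES), dtype=np.int8)
    for name in chord_notes:
        row[CHORD_PITCHES.index(name)] = 1
    return np.tile(row, (n_steps, 1))


melody = encode_melody(["C4", "E4", "G4", "E4"])
chords = encode_chord(["C3", "E3", "G3"], n_steps=4)
inputs = np.concatenate([chords, melody], axis=1)  # one input vector per time step
print(inputs)
```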


Experiment 1 – Learning Chords

• Objective: show that an LSTM can learn/represent chord structure in the absence of melody
• Network:
  • 4 cell blocks with 2 cells each, fully connected to each other and to the input
  • Output layer fully connected to all cells and to the input layer
• Training & testing: predict the probability of a note being on or off
  • Network predictions, passed through a decision threshold, are used as input for the ensuing time steps
  • CAVEAT: outputs are treated as statistically independent. This is untrue! (Issue #1)
• Result: generated chord sequences
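The generation procedure can be sketched as a free-running loop. `toy_predict` and `CHORDS` are hypothetical stand-ins for the trained network and its output units (the toy model simply cycles between two made-up chords). Thresholding each unit separately is exactly where the independence assumption enters.

```python
import numpy as np

# Hypothetical two-chord vocabulary over 6 output units.
CHORDS = np.array([[1, 0, 1, 0, 1, 0],
                   [0, 1, 0, 1, 0, 1]], dtype=float)


def toy_predict(frame, state):
    """Stand-in for the trained LSTM: returns per-unit 'on' probabilities
    (0.9 for notes of the next chord, 0.1 otherwise) and a new state."""
    step = 0 if state is None else state
    probs = 0.8 * CHORDS[(step + 1) % 2] + 0.1
    return probs, step + 1


def generate(predict, seed, n_steps, threshold=0.5):
    """Threshold each output unit independently and feed the resulting
    binary frame back in as the next input."""
    frames = [np.asarray(seed, dtype=float)]
    state = None
    for _ in range(n_steps):
        probs, state = predict(frames[-1], state)
        frames.append((probs > threshold).astype(float))
    return np.stack(frames[1:])


print(generate(toy_predict, CHORDS[0], 4))
```

Since every unit is switched on or off by its own probability, nothing prevents the network from emitting note combinations that never co-occur in training; modelling the joint distribution would require sampling units conditionally on each other.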


Experiment 2 – Learning Melody and Chords

• Can an LSTM learn chord & melody structure, and use these structures for composition?
• Network:
  • Difference from Exp. 1: chord cell blocks have recurrent connections to themselves and to the melody; melody cell blocks are recurrently connected only to the melody
• Training: predict the probability of a note being on or off
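The restricted connectivity can be expressed as a boolean mask over the recurrent weights. This assumes a block-level reading of the slide (row block i receives from column block j) and hypothetical block counts; the original network's exact wiring may differ.

```python
import numpy as np


def recurrent_mask(n_chord_blocks, n_melody_blocks):
    """mask[i, j] is True if block i receives a recurrent connection from block j.
    Assumed reading: chord blocks listen to chord and melody blocks;
    melody blocks listen only to melody blocks."""
    n = n_chord_blocks + n_melody_blocks
    chord = np.arange(n) < n_chord_blocks
    melody = ~chord
    mask = np.zeros((n, n), dtype=bool)
    mask[np.ix_(chord, chord)] = True    # chord  <- chord
    mask[np.ix_(chord, melody)] = True   # chord  <- melody
    mask[np.ix_(melody, melody)] = True  # melody <- melody
    return mask


# Usage: zero the disallowed recurrent weights (e.g. after each update)
mask = recurrent_mask(4, 4)
U = np.random.default_rng(0).normal(size=mask.shape) * mask
print(mask.astype(int))
```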


Sample composition

• Training set: http://people.idsia.ch/~juergen/blues/train.32.mp3

• Chord + melody sample: http://people.idsia.ch/~juergen/blues/lstm_0224_1510.32.mp3


Issues

• No objective way to judge the quality of compositions
• Repetition of and similarity to the training set
• Notes considered to be independent
• Limited to quarter notes and no rests
• Uses symbolic representations (modified sheet notation) → how could it handle real-time performance music (MIDI or audio)?
  • Would allow interaction (live improvisation)


Audio Extraction (source separation)

• How do we separate sources?
• Engineering approach: decompose the mixed audio signal into a spectrogram and assign each time-frequency element to a source
  • Ideal binary mask: each element is attributed to the source with the largest magnitude in the source spectrograms
  • This is then used to estimate a reference separation
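A minimal sketch of the ideal binary mask, assuming the source magnitude spectrograms are already available as a NumPy array of shape (n_sources, n_freq, n_frames). With two sources (vocal and non-vocal) this is simply "vocal magnitude larger than non-vocal magnitude". Ties go to the first source here, an arbitrary choice; the function names are illustrative.

```python
import numpy as np


def ideal_binary_masks(source_mags):
    """One binary mask per source: each time-frequency element is assigned
    to the source with the largest magnitude at that element."""
    source_mags = np.asarray(source_mags)
    winner = np.argmax(source_mags, axis=0)  # index of the loudest source per bin
    return np.stack([winner == k for k in range(source_mags.shape[0])]).astype(float)


def apply_masks(mixture_stft, masks):
    """Masked copies of the (possibly complex) mixture spectrogram, one per source."""
    return masks * mixture_stft[np.newaxis]


# Usage with tiny 2x2 "spectrograms"
vocal = np.array([[3.0, 1.0], [0.0, 2.0]])
accomp = np.array([[1.0, 2.0], [1.0, 1.0]])
masks = ideal_binary_masks([vocal, accomp])
estimates = apply_masks(vocal + accomp, masks)
print(masks[0])
```

Because every bin goes to exactly one source, the masked estimates always sum back to the mixture.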


DNN Approach

• Dataset: 63 pop songs (50 for training)
• Binary mask computation: compare the magnitudes of the vocal and non-vocal spectrograms and assign the mask a '1' where the vocal has the greater magnitude


DNN

• Trained a feed-forward DNN to predict binary masks for separating the vocal and non-vocal signals of a song
• Each spectrogram window was unpacked into a vector
• Probabilistic binary mask: testing used a sliding window, and the model output gave binary-mask predictions for each window position
• Confidence threshold (alpha): the vocal binary mask M_v is 1 where the predicted probability exceeds alpha
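The test-time procedure can be sketched as follows. Assumptions: the window hop is one frame, overlapping predictions are averaged into one probability per time-frequency bin, and `predict_window` is a hypothetical callable standing in for the trained DNN (it must return one probability per element of the unpacked window).

```python
import numpy as np


def windows(spec, width):
    """Unpack each `width`-frame window of an (n_freq, n_frames)
    spectrogram into one flat vector, using a hop of one frame."""
    n_freq, n_frames = spec.shape
    return np.stack([spec[:, t:t + width].reshape(-1)
                     for t in range(n_frames - width + 1)])


def probabilistic_mask(predict_window, spec, width):
    """Slide the predictor over the spectrogram and average the
    overlapping per-element predictions into one probability per bin."""
    n_freq, n_frames = spec.shape
    total = np.zeros(spec.shape, dtype=float)
    count = np.zeros(spec.shape, dtype=float)
    for t, vec in enumerate(windows(spec, width)):
        pred = np.asarray(predict_window(vec)).reshape(n_freq, width)
        total[:, t:t + width] += pred
        count[:, t:t + width] += 1
    return total / count


def vocal_mask(prob_mask, alpha):
    """Binary vocal mask M_v: 1 where the averaged probability exceeds alpha."""
    return (prob_mask > alpha).astype(float)
```

Raising alpha keeps only bins the model is confident are vocal, trading missed vocal energy against leakage from the accompaniment; that trade-off is what the next slide sweeps.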


Separation of sources using DNN


Separation quality as a function of alpha

(plot of SIR, SDR, and SAR versus alpha not recoverable)

• SIR (red) = signal-to-interference ratio
• SDR (green) = signal-to-distortion ratio
• SAR (blue) = signal-to-artefact ratio
• SAR and SIR can be interpreted as energetic equivalents of the positive hit rate (SAR) and the false positive rate (SIR)
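To make the three metrics concrete, here is a simplified version of the BSS Eval decomposition in which the only allowed distortion of the target is a gain (plain orthogonal projections). The full toolkit (e.g. `mir_eval.separation.bss_eval_sources`) also allows short time-invariant filters, so its numbers will differ; treat this as an illustration of the definitions, not a replacement. The function name is hypothetical.

```python
import numpy as np


def bss_metrics(estimate, target, others):
    """Simplified SDR/SIR/SAR in dB for one estimated source.
    estimate, target: 1-D signals; others: list of the remaining true sources."""
    estimate = np.asarray(estimate, dtype=float)
    target = np.asarray(target, dtype=float)
    sources = np.vstack([target] + [np.asarray(o, dtype=float) for o in others])
    # Component of the estimate explained by the target alone
    s_target = (estimate @ target) / (target @ target) * target
    # Projection of the estimate onto the span of all true sources
    coeffs, *_ = np.linalg.lstsq(sources.T, estimate, rcond=None)
    p_all = sources.T @ coeffs
    e_interf = p_all - s_target   # energy leaking in from other sources
    e_artif = estimate - p_all    # energy not explained by any source

    def db(num, den):
        return 10 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

    sdr = db(s_target, e_interf + e_artif)
    sir = db(s_target, e_interf)
    sar = db(s_target + e_interf, e_artif)
    return sdr, sir, sar


# Usage: target plus 10% of another source plus a small artefact
sdr, sir, sar = bss_metrics([1.0, 0.1, 0.01, 0.0],
                            target=[1.0, 0.0, 0.0, 0.0],
                            others=[[0.0, 1.0, 0.0, 0.0]])
print(sdr, sir, sar)
```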


Like-to-like Comparison

(plot not recoverable)

• Plots mean SAR as a function of mean SIR for both models
• The DNN provides ~3 dB better SAR overall for a given mean SIR: ~5 dB for vocal signals and only a small advantage for non-vocal signals
• The DNN seems to have biased its learning toward making good predictions via correct positive identification of vocal sounds


Critique of Paper + Next Steps

• The DNN seems to have biased its learning toward making good predictions via correct positive identification of vocal sounds
• Only a small advantage to using a DNN vs. the traditional approach
• Expand the dataset

