Polyphonic Music Transcription Using A Dynamic Graphical Model

Barry Rafkind

E6820 Speech and Audio Signal Processing

Wednesday, March 9th, 2005

Presentation Outline

• Motivating Music Transcription

• My Project Proposal

• Project Timeline

Motivating Music Transcription

• Given a musical recording, we wish to obtain a MIDI score for:
  – Performance (convert MIDI to a music score)
  – Analysis (evaluate intonation or the number of missed or incorrect notes; useful for music education)
  – Comparison with other music (copyright infringement / search)
  – Replay on MIDI synthesizers (use different instruments / change settings / overlay tracks, etc.)

Recent Previous Work

• Multi-Instrument Musical Transcription Using A Dynamic Graphical Model, Michael Jordan, 2004

• Automatic Transcription of Piano Music, Christopher Raphael, 2002, Univ. of Massachusetts, Amherst

• Polyphonic Pitch Extraction, Graham Poliner, E6820 Speech & Audio Signal Processing, Spring 2004

• Many, many, many more… Try searching Google for PDF documents with keywords: music transcription





My Project Proposal

• Jordan presents a multi-instrument transcription system capable of listening to a recording in which two or more instruments are playing, and identifying both the notes that were played and the instruments that played them. The system models two musical instruments, each capable of playing at most one note at a time.

• My Goal : implement and improve upon Jordan’s Dynamic Graphical Model (DGM) approach.

• Whereas he made assumptions about how to model each instrument, I want to let the system learn what to look for by starting with a general model.

• Jordan uses a reduced set of states and parameters for efficiency. I will try to use a larger model if possible.

• Dynamic Graphical Model (DGM) - what is it?

My Project Proposal

Hidden State Variables Correspond to Discrete Set of Allowable Intensity and Pitch Values

• Key Points in Jordan’s Approach
  – Use of a note-event timbre model that includes both a spectral model (in frequency) and a dynamic intensity-versus-time model (a “time envelope model”).
  – We will perform inference on the DGM (using the Viterbi algorithm) to compute the path of maximum posterior probability and so find explicit note-on events (note locations).
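
The inference step can be illustrated with a generic Viterbi decoder over a small discrete state space. This is only a sketch under stated assumptions, not Jordan's implementation: the function name `viterbi`, the table layout, and the two-state toy example (0 = silence, 1 = note sounding) are all hypothetical.

```python
import math

def viterbi(log_init, log_trans, log_obs):
    """Return the most probable hidden-state path.

    log_init[s]     : log P(state_0 = s)
    log_trans[r][s] : log P(state_t = s | state_{t-1} = r)
    log_obs[t][s]   : log P(observation_t | state_t = s)
    """
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_obs)):
        new_score, pointers = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[best] + log_trans[best][s] + log_obs[t][s])
            pointers.append(best)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def note_onsets(path, silent_state=0):
    """Frames where the path moves from silence into a sounding state."""
    return [t for t in range(1, len(path))
            if path[t - 1] == silent_state and path[t] != silent_state]

# Toy example (assumed numbers): 0 = silence, 1 = note sounding.
log = math.log
toy_init = [log(0.5), log(0.5)]
toy_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
toy_obs = [[log(0.9), log(0.1)], [log(0.8), log(0.2)],
           [log(0.1), log(0.9)], [log(0.2), log(0.8)]]
toy_path = viterbi(toy_init, toy_trans, toy_obs)  # [0, 0, 1, 1]
toy_onsets = note_onsets(toy_path)                # [2]
```

In a full system the state would combine pitch and intensity for each instrument, so the tables would be much larger; the recursion itself stays the same.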

My Project Proposal

Intensity Transition Model for Violin

My Project Proposal

Intensity Transition Model for Piano

My Project Proposal

General Intensity Transition Model

My Project Proposal

Pitch Transition Model

• Build a pitch-state conditional probability distribution as a function of both the previous pitch state and the previous intensity state.

• Transition probabilities are also based on Shepard's pitch helix, which defines a psychoacoustic distance between pitches.
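
One way such a transition table could be built is sketched below. Everything specific here is an assumption made for illustration: the helix geometry (`radius`, `rise_per_octave`), the exponential decay with distance, and the rule that a sounding note (previous intensity above zero) mostly keeps its pitch. The slides do not give Jordan's actual parameterization.

```python
import math

def helix_point(midi_pitch, radius=1.0, rise_per_octave=1.0):
    """Place a MIDI pitch on a helix: chroma sets the angle, pitch height the z value."""
    angle = 2.0 * math.pi * (midi_pitch % 12) / 12.0
    return (radius * math.cos(angle),
            radius * math.sin(angle),
            rise_per_octave * midi_pitch / 12.0)

def helix_distance(p, q):
    """Euclidean distance between two pitches on the (assumed) helix."""
    return math.dist(helix_point(p), helix_point(q))

def pitch_transition_row(prev_pitch, prev_intensity, pitches,
                         stay_prob=0.95, scale=1.0):
    """Return P(pitch_t = q | prev_pitch, prev_intensity) for each q in pitches.

    prev_pitch must be one of `pitches` so that the row sums to 1.
    """
    weights = [math.exp(-helix_distance(prev_pitch, q) / scale) for q in pitches]
    total = sum(weights)
    by_distance = [w / total for w in weights]
    if prev_intensity > 0:
        # Assumed rule: while a note is sounding, the pitch rarely changes.
        return [stay_prob * (1.0 if q == prev_pitch else 0.0) + (1.0 - stay_prob) * d
                for q, d in zip(pitches, by_distance)]
    # After silence, the next pitch is drawn according to helix distance alone.
    return by_distance

pitches = list(range(48, 85))            # C3..C6, an arbitrary range
row_after_silence = pitch_transition_row(60, 0, pitches)
row_while_sounding = pitch_transition_row(60, 3, pitches)
```

With these assumed settings an octave leap is closer on the helix than a tritone leap, so it receives more probability, which is the kind of behavior the helix is meant to capture.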

My Project Proposal

Observation Model - explains the sound

• Model the spectrum of a harmonic musical signal as a series of narrow bump functions that are harmonically spaced.

• That is, conditional on the fundamental frequency Pitch(t) of the musical signal, we model the spectrum as consisting of a series of bump functions located at integer multiples of Pitch(t).

• Each bump function is given a scale parameter alpha(n) that can depend on Pitch(t).

• The motivation for this is that the relative spectral content of an instrument can depend on what pitch is being played.

• The intensity envelope at time t scales all of the harmonics.
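
The bullets above can be read as: spectrum(f) = Intensity(t) x sum over n of alpha(n) x bump(f - n x Pitch(t)). A minimal sketch follows, assuming a Gaussian shape for each bump and a fixed bump width in Hz. Both choices, and the name `harmonic_spectrum`, are illustrative assumptions; the slides do not specify the bump shape.

```python
import math

def harmonic_spectrum(freqs_hz, pitch_hz, intensity, alphas, bump_width_hz=10.0):
    """Model spectrum: intensity-scaled sum of bumps at integer multiples of the pitch.

    alphas[n-1] is the scale parameter alpha(n) of the n-th harmonic.
    """
    spectrum = []
    for f in freqs_hz:
        value = 0.0
        for n, alpha in enumerate(alphas, start=1):
            center = n * pitch_hz
            # Gaussian bump (an assumption) centered on the n-th harmonic.
            value += alpha * math.exp(-0.5 * ((f - center) / bump_width_hz) ** 2)
        spectrum.append(intensity * value)
    return spectrum

freqs = [10.0 * k for k in range(200)]    # 0 Hz .. 1990 Hz in 10 Hz steps
alphas_a4 = [1.0, 0.5, 0.25]              # hypothetical harmonic weights
quiet = harmonic_spectrum(freqs, 440.0, 1.0, alphas_a4)
loud = harmonic_spectrum(freqs, 440.0, 2.0, alphas_a4)
```

To let alpha(n) depend on pitch, `alphas` could be looked up from a per-pitch table before the call; the intensity argument then scales every harmonic together, as the last bullet describes.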


My Project Proposal

Evaluation Metrics

• Note Error Rate (based on “minimum edit distance” in speech) = 100 x ( Insertions + Substitutions + Deletions ) / Total Number of Notes in Score. We want to minimize this.

• Dixon Success Score = 100 x Correct Notes / ( Correct + False Positives + Deletions ). We want to maximize this.
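
Both metrics are simple to compute once the notes are aligned. The sketch below assumes each note is reduced to a comparable value (here a MIDI pitch number) and uses the standard Levenshtein recursion, whose minimum cost equals Insertions + Substitutions + Deletions. The function names are hypothetical, and matching on pitch alone (ignoring timing) is a simplification.

```python
def edit_distance(reference, transcribed):
    """Minimum number of insertions, substitutions and deletions
    needed to turn `transcribed` into `reference`."""
    prev = list(range(len(transcribed) + 1))
    for i, ref_note in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_note in enumerate(transcribed, start=1):
            cost = 0 if ref_note == hyp_note else 1
            curr.append(min(prev[j] + 1,          # deletion (note missing from transcription)
                            curr[j - 1] + 1,      # insertion (extra transcribed note)
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def note_error_rate(reference, transcribed):
    """100 x (I + S + D) / number of notes in the reference score."""
    return 100.0 * edit_distance(reference, transcribed) / len(reference)

def dixon_score(correct, false_positives, deletions):
    """100 x Correct / (Correct + False Positives + Deletions)."""
    return 100.0 * correct / (correct + false_positives + deletions)

score_notes = [60, 62, 64, 65]    # reference score (MIDI pitches)
output_notes = [60, 63, 64]       # one substitution, one deletion
ner = note_error_rate(score_notes, output_notes)   # 50.0
dixon = dixon_score(correct=8, false_positives=1, deletions=1)  # 80.0
```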


Project Timeline

Seven Weeks Left

3/14 - Collect MIDI Data + Convert to WAV Audio, Understand DGM

3/21 - Start building / understanding graphical models

3/28 - Continue building / understanding graphical models

4/04 - Finish building / understanding graphical models

4/11 - Evaluate Results / Fix Bugs

4/18 - Try new data / Fix bugs. Begin Preparing Final Presentation.

4/25 - Finish Preparing Final Presentation

4/27 - Final Presentation in Class
