Developing a Reverberation Plugin with the Aid of a Genetic Algorithm


THE GENEVERB PLUGIN

Sound Analysis, Synthesis and Processing
Politecnico di Milano

Como, AA 2012–2013

Students:

Lorenzo Monni Sau
Francesco Tedeschi

Professors:

Augusto Sarti

Paolo Bestagini
Antonio Canclini


Contents

1 The RackAFX Tool
  1.1 The Philosophy behind RackAFX
  1.2 Building the DLL
  1.3 Creation of the Plugin
  1.4 The GUI
  1.5 Processing Audio
  1.6 Destruction

2 Anatomy of the generalized Feedback Delay Network framework
  2.1 Origins of FDN Structure
  2.2 The Feedback Matrix
  2.3 General Structure

3 Implementation of FDN reverb as a RackAFX plugin
  3.1 Delay line
  3.2 Low-Pass Filter Class
  3.3 Feedback Delay Line class
  3.4 Geneverb class

4 Implementation of the Genetic Algorithm in Matlab
  4.1 The Genetic Algorithm
  4.2 Contextualization of GA to the FDN
  4.3 Implementation in Matlab
  4.4 Further Changes in the Algorithm
  4.5 Behaviour of the Algorithm

5 Conclusion: Possible Improvements


Abstract

The aim of the Geneverb project is to model a natural-sounding reverberation with a perceptual approach that approximates the behaviour of convolution with real impulse responses, without the computational burden of the latter. Our starting point was the white paper "Generating Matrix Coefficients for Feedback Delay Networks Using Genetic Algorithm" by Michael Chemistruck, Kyle Marcolini and Will Pirkle, published as Convention Paper n. 8795 by the Audio Engineering Society in late 2012. The reference tool for the perceptual approach is the Feedback Delay Network (FDN), in the formulation of Michael Gerzon and, later, of Stautner and Puckette. To make the FDN perceptually convincing as a reverberator it is fundamental to set the feedback matrix coefficients properly, and this issue is the bridge between the perceptual approach and the physical characteristics of the simulated room. To reach an optimal result a genetic evolutionary algorithm was used, making it possible to approximate the response obtained by convolution with impulse responses of real acoustic environments. We then searched for a good audio programming environment suited to Microsoft Windows. After a look at the VST audio plugin format we found in RackAFX a plugin framework suitable for our objective. The fundamental idea behind RackAFX is to provide a platform for rapid prototyping of real-time processing plugins. The heart of RackAFX is a C++ API that lets the developer concentrate only on the audio processing part of the plugin, offering a pre-built GUI and automatic compatibility with DirectX and other OS-specific audio platforms. A deeper development of the GUI could be done with JUCE, a cross-platform C++ library for building audio plugins and GUIs that is compatible with RackAFX. Microsoft Visual Studio 2012 was used as the IDE for C++ programming. Matlab was used as a helper for sound analysis and as a computational tool for writing and testing the genetic algorithm and studying its behaviour. Images related to the RackAFX framework in this report are taken from the book [5].

Page 4: Developing a Reverberation Plugin with the aim of a Genetic Algorithm

Chapter 1

The RackAFX Tool

1.1 The Philosophy behind RackAFX

RackAFX is written in C++ and leverages both the paradigms of object-oriented programming and the structure of other audio processing APIs on the market, which makes it developer-friendly. The RackAFX API specifies a base class called CPlugIn from which all plug-ins are derived. When starting a new plugin project, RackAFX writes the C++ code for a blank plug-in by creating a class derived from CPlugIn. Changes in the graphical user interface lead to automatic changes in the code generated by RackAFX, which updates the variables bound to the GUI's controls. We need to implement five functions in RackAFX to create a plug-in:

1. Constructor

2. Destructor

3. prepareForPlay()

4. processAudioFrame()

5. userInterfaceChange()

The real-time audio processing loop is exemplified in figure 1.1. The workflow then proceeds by switching back and forth between RackAFX and Visual Studio: RackAFX is used to maintain the GUI, whereas Visual Studio is used to write and compile the signal processing code. Figure 1.2 shows a snapshot of the basic RackAFX GUI.
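To give a rough picture of what these five functions look like together, here is a minimal sketch of a derived plugin class. It is only an orientation aid, not the code RackAFX actually generates: the class name, the exact signatures (return types, calling convention) and the missing __stdcall decorations are assumptions, and the RackAFX base-class headers are assumed to be included.

// Hypothetical sketch of a RackAFX-style plugin class exposing the five
// required functions; names and signatures are illustrative, not verbatim.
class CGeneverb : public CPlugIn
{
public:
    CGeneverb()                        // 1. constructor: initUI() first, then members
    {
        initUI();                      // predefined call that builds the GUI controls
        // initialize plugin state here
    }

    virtual ~CGeneverb() {}            // 2. destructor: release dynamic buffers

    virtual bool prepareForPlay()      // 3. per-play initialization
    {
        // sample rate, channel count and bit depth are valid at this point
        return true;
    }

    virtual bool processAudioFrame(float* pInputBuffer, float* pOutputBuffer,
                                   UINT uNumInputChannels, UINT uNumOutputChannels)
    {
        pOutputBuffer[0] = pInputBuffer[0];   // 4. default pass-through processing
        if (uNumOutputChannels == 2)
            pOutputBuffer[1] = (uNumInputChannels == 2) ? pInputBuffer[1] : pInputBuffer[0];
        return true;
    }

    virtual bool userInterfaceChange(int nControlIndex)
    {
        // 5. react to GUI control changes that need extra processing
        return true;
    }
};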


Figure 1.1: Audio processing workflow in RackAFX

Figure 1.2: Snapshot of the GUI of RackAFX


Figure 1.3: Lifecycle of a plugin in RackAFX

1.2 Building the DLL

When a new plugin project is created, RackAFX generates a pass-through plug-in by default; the plugin developer then overwrites the pass-through code with his own processing algorithm. After a successful build it is possible to use RackAFX to test and debug the plug-in. The client needs to handle four basic operations during the lifecycle of a plugin component:

1. Creation of the plugin;

2. Maintaining the UI;

3. Playing and processing audio through the plug-in;

4. Destruction of the plug-in.

The process is briefly illustrated in figure 1.3.

1.3 Creation of the Plugin

When a plugin is loaded in RackAFX we actually pass the system a path to the DLL we have created. RackAFX uses an operating system call to load the DLL into its process space. Once the DLL is loaded, RackAFX first runs a compatibility test, then requests a pointer to the creation method called createObject(). It uses this pointer to call the method, and the DLL returns a newly created instance of the plug-in cast to the CPlugIn* base class type. From that point on, the RackAFX client can call any of the base class methods on the created object. The constructor is where all the variables are initialized. The first line of code in the constructor is predefined and calls initUI(), a method that handles the creation and setup of the GUI controls and that should not be modified.
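As an illustration of the DLL side of this handshake, the following sketch shows a factory function of the kind described above. The function name createObject() and its behaviour come from the text; the export decoration and the exact signature used by RackAFX are assumptions and may differ from the real API.

// Hypothetical sketch of the DLL-side factory: RackAFX obtains a pointer to
// createObject() and calls it to get a new plugin instance. The exact export
// macro/signature used by RackAFX may differ.
extern "C" __declspec(dllexport) CPlugIn* createObject()
{
    CPlugIn* pPlugin = new CGeneverb();  // derived plugin instance
    return pPlugin;                      // returned as the base class pointer type
}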


Figure 1.4: Workflow of the GUI

1.4 The GUI

RackAFX automatically adds member variables to the header file of the derived plugin class, one for each element created in the GUI, so each slider or button group controls one variable in the plugin code. Each control object in the GUI must be set up with minimum, maximum and initial values, as well as the variable name and data type. As the user moves a control, RackAFX calculates the variable's new value and delivers it to the plug-in automatically, updating it in real time. In some cases it is necessary to perform more processing in addition to just changing the control variable; in that case RackAFX also calls the userInterfaceChange() method of the plug-in derived class, as pointed out in figure 1.4.
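A hedged sketch of how such extra processing is typically handled follows; it assumes userInterfaceChange() receives the index of the control that changed, and the control index and the cooking helper are illustrative, not the actual Geneverb values.

// Hypothetical sketch: when a GUI control needs more than a plain variable
// update, userInterfaceChange() recomputes the dependent internal state.
bool CGeneverb::userInterfaceChange(int nControlIndex)
{
    switch (nControlIndex)
    {
        case 0:                 // e.g. the delay-time slider (index is an assumption)
            cookVariables();    // recompute the delay length in samples, etc.
            break;
        default:
            break;
    }
    return true;
}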

1.5 Processing Audio

When a user loads an audio file and hits the play button, a two-step sequence of events occurs. First, the client calls prepareForPlay() on the plugin, where the plugin performs any last initializations it needs before audio begins streaming. prepareForPlay() is one of the most important functions to deal with. The plug-in has variables declared in it that contain information about the currently selected audio file:

int m_nNumWAVEChannels;

int m_nSampleRate;

int m_nBitDepth;

Just prior to calling prepareForPlay(), the client sets these values on the plug-in object. The reason this is done at this point is that the user can load audio files with varying channel counts (mono or stereo), sample rates, and bit depths at any time; thus, this is a per-play method. Many algorithms require these values to be known before certain things can be created or initialized; almost all filtering plug-ins, for instance, require the sample rate in order to calculate their parameters correctly. After prepareForPlay() returns, audio begins streaming.


While audio streams, the client repeatedly calls processAudioFrame(), passing it the input and output buffer pointers. This continues until the user hits Pause or Stop.

1.6 Destruction

When the user unloads the DLL, either manually or by loading another plug-in, the client first deletes the plug-in object from memory, which calls the destructor. Any dynamically allocated variables or buffers need to be deleted here. After destruction, the client unloads the DLL from the process space.
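A minimal sketch of the cleanup described above, assuming the delay buffer was allocated with new[] in prepareForPlay(); the member names are illustrative assumptions, not the actual Geneverb code.

// Hypothetical destructor sketch: release dynamically allocated buffers so the
// DLL can be unloaded without leaking memory.
CGeneverb::~CGeneverb()
{
    if (m_pBuffer)
    {
        delete [] m_pBuffer;   // buffer allocated in prepareForPlay()
        m_pBuffer = NULL;
    }
}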


Chapter 2

Anatomy of the generalized Feedback Delay Network framework

2.1 Origins of FDN Structure

In 1982 Stautner and Puckette introduced a new artificial reverb structure based on a set of delay lines interconnected in a feedback loop by means of a matrix. More recently these structures have been called Feedback Delay Networks (FDN). The Feedback Delay Network of Stautner and Puckette was obtained as a vector generalization of the recursive comb filter:

y(n) = x(n−m) + g ∗ y(n−m)

in which the single delay line of m samples has been replaced by a set of delay lines of different lengths, and the feedback gain g has been replaced by a feedback matrix G. Figure 2.1 shows the structure of this FDN.

Figure 2.1: FDN Structure

Jean-Marc Jot proposed to use certain classes of unitary matrices, allowing a more efficient implementation. Furthermore, he showed how to control the location of the poles of the FDN in such a way as to set the desired decay time at different frequencies. Driven by perceptual aspects, Jot introduced an important design criterion: all the modes in a given frequency range should decay at the same rate, in order to avoid the persistence of isolated resonances in the tail of the reverberation. This is not what happens in real rooms, where different resonance frequencies can be affected in different ways by the wall absorption.

2.2 The Feedback Matrix

The choice of the feedback matrix is critical to the overall sound and to the computational requirements of the reverberator. Many matrices can produce a lossless prototype, such as the diagonal matrix, which is equivalent to a parallel comb filter network. Stautner and Puckette used the following unitary matrix:

A = (1/√2) ·
[  0   1   1   0 ]
[ −1   0   0  −1 ]
[  1   0   0  −1 ]
[  0   1  −1   0 ]

because of its interesting properties when used in a multi-channel output system. However, a significant improvement can be obtained by using a unitary matrix with no null coefficients: this produces maximum echo density while requiring the same number of delay lines. One such matrix is taken from the class of Householder matrices, derived with the following formula:

A_N = J_N − (2/N) · u_N · u_N^T
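As an illustration of building such a feedback matrix in code, the following C++ sketch constructs an N×N Householder-type matrix. It assumes J_N is the identity and u_N the all-ones vector, which is one common convention; treat this as an assumption rather than the exact matrix construction used in the paper.

#include <vector>

// Hypothetical sketch: build an N x N Householder-type feedback matrix
// A = J - (2/N) * u * u^T, assuming J = identity and u = [1, 1, ..., 1]^T.
std::vector<std::vector<float>> makeHouseholderMatrix(int N)
{
    std::vector<std::vector<float>> A(N, std::vector<float>(N, 0.0f));
    const float scale = 2.0f / static_cast<float>(N);

    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            A[i][j] = (i == j ? 1.0f : 0.0f) - scale;   // J_ij - (2/N) * u_i * u_j

    return A;
}

Every coefficient of this matrix is non-zero, which is exactly the property the text asks for to maximize echo density.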

2.3 General Structure

An FDN is built starting from N delay lines, each t_i = m_i · T_s seconds long, where T_s = 1/f_s is the sampling interval. The FDN is completely described by the following transfer function:

H(z) = c^T [D(z^(−1)) − A]^(−1) b + d

The diagonal matrix D(z) is called the delay matrix, and A is the feedback matrix. The stability properties of an FDN are entirely ascribed to the feedback matrix: the fact that ‖A‖ⁿ decays exponentially with n ensures that the whole structure is stable, and this is why we choose a unitary matrix, which preserves the energy of the signal and ensures marginal stability. Looking at the transfer function we can notice that the poles of the FDN are the solutions of the equation:

det[A − D(z^(−1))] = 0.

A unitary matrix places the poles on the unit circle. This choice leads to the construction of a lossless prototype, but it is not the only allowed choice. The zeros of the transfer function can likewise be found as the solutions of:

det[A − (1/d) b c^T − D(z^(−1))] = 0

In practice, once we have constructed a lossless FDN prototype, we must insert attenuation coefficients and filters into the feedback loop. For instance, following the indications of Jot, we can cascade every delay line with a gain g_i = α^(m_i): with this choice of the attenuation coefficients, all the poles are contracted by the same factor α.


In order to obtain a faster decay at higher frequencies, as happens in real enclosures, we must cascade the delay lines with lowpass filters. If the attenuation coefficients g_i are replaced by lowpass filters, we can still obtain a local smoothness of decay times at the various frequencies by satisfying the same condition with g_i and α made frequency dependent. Another critical set of parameters is given by the lengths of the delay lines: several authors suggested using sample lengths that are mutually coprime, in order to minimize the collision of echoes in the impulse response.
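To make the attenuation rule concrete, here is a small C++ sketch that derives per-line gains from a single contraction factor α and the delay lengths m_i, following the g_i = α^(m_i) criterion above; the function and variable names are illustrative assumptions.

#include <cmath>
#include <vector>

// Hypothetical sketch: compute per-delay-line attenuation gains g_i = alpha^(m_i),
// so that every pole of the lossless prototype is contracted by the same factor alpha.
std::vector<float> computeLineGains(const std::vector<int>& delaysInSamples, float alpha)
{
    std::vector<float> gains;
    gains.reserve(delaysInSamples.size());

    for (int m : delaysInSamples)
        gains.push_back(std::pow(alpha, static_cast<float>(m)));   // g_i = alpha^m_i

    return gains;
}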


Chapter 3

Implementation of FDN reverb as a RackAFX plugin

3.1 Delay line

The first important element in the puzzle of the FDN is the delay line. There will be several delay lines in the processing framework, fed by the feedback matrix, but we first need an object class that defines a single delay line, so that we can then declare as many delay objects as needed, depending on the size of the network. The Digital Delay Module (CDDLModule) we implemented has the following characteristics: it implements an n-sample, user-controllable delay line of up to 2 seconds, with the delay time given in milliseconds. We implemented a mono version in a single module, meaning that it takes a mono signal as input and generates a mono signal as output; in order to implement a stereo effect in the plug-in we instantiate two module objects, one for the left channel and one for the right channel. In the interface of the CDDLModule class we add the cooked variable DelayInSamples. The cooked delay time is a float rather than an integer number of samples, because we allow the use of fractional delays. In the constructor we initialize the variables. The formula to compute the delay time in samples from the delay time in milliseconds is:

DelayInSamples = D_mSec · (sampleRate / 1000).
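For example, assuming a sample rate of 44.1 kHz, a delay of 100 ms corresponds to 100 · 44100/1000 = 4410 samples.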

Since we need to use this plug-in as a module in the FDN implementation, it is good design practice to define a cooking function that handles the work of recalculating all the plug-in's variables, regardless of which ones actually changed. This function is simply:

void CDDLModule::cookVariables()

{

m_fDelayInSamples = m_fDelay_ms*((float)m_nSampleRate/1000.0);

}

It is called in the constructor, in the prepareForPlay() method and in the userInterfaceChange() method. For a delay line we need the following variables:

• A float* pointing to a buffer of samples (holding the delayed samples to be recalled by the feedback line);


• An integer read index to read in the buffer;

• An integer write index to write in the buffer;

• An integer that is the size of the buffer in samples.

The delay line buffer is created dynamically and destroyed by the destructor at the end of execution. The problem is that we don't yet know what the sample rate will be; we won't know it until the user loads a file and begins playing it. Just before RackAFX calls the prepareForPlay() function, it sets the sample rate in the plug-in's m_nSampleRate variable. Therefore, we have to dynamically create and flush the buffer each time prepareForPlay() is called. In the constructor, we set the buffer pointer to NULL as a flag indicating that it is uninitialized, and we zero the buffer size and the read and write index values. To initialize the buffer with 0.0 we use the memset function, which flushes the buffer of data; we need to do this each time prepareForPlay() is called, so we don't play out old data at the onset. Since we are going to flush and reset the buffer in several places in the code, it is also convenient to wrap this in a function:

void CDDLModule::resetDelay()

{

if(m_pBuffer)

memset(m_pBuffer, 0, m_nBufferSize*sizeof(float));

m_nWriteIndex = 0;

m_nReadIndex = 0;

}
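For completeness, here is a hedged sketch of the corresponding buffer allocation in prepareForPlay(), assuming the 2-second maximum delay stated above; the exact Geneverb code and signature may differ.

// Hypothetical sketch: (re)allocate the delay buffer once the sample rate is
// known, sized for the 2-second maximum delay, then flush it and cook variables.
bool CDDLModule::prepareForPlay()
{
    m_nBufferSize = 2 * m_nSampleRate;      // 2 seconds of audio at the current rate

    if (m_pBuffer)
        delete [] m_pBuffer;                // discard buffer from a previous file

    m_pBuffer = new float[m_nBufferSize];

    resetDelay();                           // zero the buffer and reset the indices
    cookVariables();                        // recompute the delay length in samples

    return true;
}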

The setup of the read and write indices depends on the delay time. There is one minor detail to deal with, which arises because we use the delay line in a read-then-write fashion: if the user has chosen 0.00 ms of delay, the write pointer and the read pointer are the same. This also occurs if the user selects the maximum delay value, since we want to read the oldest sample before overwriting it. So we need to check whether there is no delay at all and handle that case accordingly. The code of the central processAudioFrame() method is the following:

bool __stdcall CDDLModule::processAudioFrame(float* pInputBuffer,

float* pOutputBuffer, UINT uNumInputChannels, UINT uNumOutputChannels)

{

// Do LEFT (MONO) Channel

// Read the Input

float xn = pInputBuffer[0];

// Read the output of the delay at m_nReadIndex

float yn = m_pBuffer[m_nReadIndex];

// if delay < 1 sample, interpolate between input x(n) and x(n-1)

if(m_nReadIndex == m_nWriteIndex && m_fDelayInSamples < 1.00)

{

// interpolate x(n) with x(n-1), set yn = xn

yn = xn;

}


// Read the location ONE BEHIND yn at y(n-1)

int nReadIndex_1 = m_nReadIndex - 1;

if(nReadIndex_1 < 0)

nReadIndex_1 = m_nBufferSize-1; // m_nBufferSize-1 is last location

// get y(n-1)

float yn_1 = m_pBuffer[nReadIndex_1];

// interpolate: (0, yn) and (1, yn_1) by the amount fracDelay

float fFracDelay = m_fDelayInSamples - (int)m_fDelayInSamples;

// lin interp: x1, x2, y1, y2, x

float fInterp = dLinTerp(0, 1, yn, yn_1, fFracDelay); // interp frac between them

// if zero delay, just pass the input to output

if(m_fDelayInSamples == 0)

yn = xn;

else

yn = fInterp;

// write the input to the delay

if(!m_bUseExternalFeedback)

m_pBuffer[m_nWriteIndex] = xn + m_fFeedback*yn; // normal

else

m_pBuffer[m_nWriteIndex] = xn + m_fFeedbackIn; // external feedback sample


// create the wet/dry mix and write to the output buffer

// dry = 1 - wet

pOutputBuffer[0] = m_fWetLevel*yn + (1.0 - m_fWetLevel)*xn;

// increment the pointers and wrap if necessary

m_nWriteIndex++;

if(m_nWriteIndex >= m_nBufferSize)

m_nWriteIndex = 0;

m_nReadIndex++;

if(m_nReadIndex >= m_nBufferSize)

m_nReadIndex = 0;

// Mono-In, Stereo-Out (AUX Effect)

if(uNumInputChannels == 1 && uNumOutputChannels == 2)

pOutputBuffer[1] = pOutputBuffer[0]; // copy MONO!

// DDL Module is MONO - just do a copy here too

// Stereo-In, Stereo-Out (INSERT Effect)


if(uNumInputChannels == 2 && uNumOutputChannels == 2)

pOutputBuffer[1] = pOutputBuffer[0]; // copy MONO!

return true;

}

3.2 Low-Pass Filter Class

In order to emulate the high-frequency decay of a real reverberant space, and to avoid audible artifacts caused by the increase of high-frequency density due to the feedback network, we have to insert a low-pass filter into the FDN. We implemented the COnePoleLPF module, a simple one-pole low-pass filter whose pole is specified by a double variable computed by the genetic algorithm. Unlike CDDLModule, this module does not need a buffer keeping memory of past samples, so the prepareForPlay() function is not needed. The filter state is initialized to 0 before setting the proper value. Output samples are computed by the formula:

outputSample = inputSample · (1 − gain) + gain · z1

where gain is the filter coefficient (m_fLPF_g) and z1 (m_fLPF_z1) is the one-sample state holding the previous output. The simple implementation of the processAudio() function is the following:

bool COnePoleLPF::processAudio(float* pInput, float* pOutput)

{

// one-pole lowpass: y(n) = (1 - g)*x(n) + g*y(n-1)

float yn_LPF = *pInput*(1.0 - m_fLPF_g) + m_fLPF_g*m_fLPF_z1;

//float yn_LPF = *pInput*(m_fLPF_g) + (1.0 - m_fLPF_g)*m_fLPF_z1; // this just reverses the slider

// underflow check

if(yn_LPF > 0.0 && yn_LPF < FLT_MIN_PLUS) yn_LPF = 0;

if(yn_LPF < 0.0 && yn_LPF > FLT_MIN_MINUS) yn_LPF = 0;

// form fb & write

m_fLPF_z1 = yn_LPF;

*pOutput = yn_LPF;

// all OK

return true;

}

3.3 Feedback Delay Line class

Using the previous basic blocks, CDDLModule and COnePoleLPF, we can now build up the single feedback delay line and then the full feedback delay network.


Figure 3.1: Delay line with absorption filter

Figure 3.1 shows the block representation of the single feedback delay line implemented in the CFDLModule class. Following the implementation choices explained in the paper [1], we use a four-line feedback delay network arranged as in figure 2.1, where each line contains a delay and an absorption filter as in figure 3.1. For each line a CDDLModule and a COnePoleLPF object are instantiated; those modules are initialized with values retrieved from the main Geneverb class, which interfaces with the RackAFX GUI. The setDelayVariable() function is in charge of this initialization. The actual processing of the signal happens in the processAudioFrame() function; here is an example of the computation for the first feedback delay line:

x1 = fInputSample*m_fb1 + m_fmatrix.data[0][0]*m_fbout1 +

m_fmatrix.data[0][1]*m_fbout2 + m_fmatrix.data[0][2]*m_fbout3 +

m_fmatrix.data[0][3]*m_fbout4;

m_Delay1.processAudioFrame(&x1,&f_delayOut1,1,1);

m_Lpf1.processAudio(&f_delayOut1,&m_fbout1);

Obviously the x2, x3 and x4 samples are computed in the same way as the result of the audio processing of lines 2, 3 and 4. The contributions of the four lines are then summed and weighted together with the dry signal by the wet/dry coefficients:

yn = m_fdry*fInputSample + m_fc1*m_fbout1 + m_fc2*m_fbout2 +

m_fc3*m_fbout3 + m_fc4*m_fbout4;

A final stage, called diffusion, simulates the wash of late reflections; another delay network is required after the initial reverberation process to address this. The diffusion stage is composed of N different-length delay lines, all shorter than 300 samples, whose outputs are summed. In our case we chose N = 6 with delay lengths that are mutually coprime, to avoid intermodulation effects. The diffusion module is fed with the output of the feedback delay network, and its result is set as the output of the CFDLModule.
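A hedged sketch of such a diffusion stage follows: six parallel short delay lines whose outputs are summed. The specific coprime lengths, class name and buffer layout are illustrative assumptions, not the values used in Geneverb.

// Hypothetical diffusion-stage sketch: six parallel delay lines shorter than
// 300 samples, with mutually coprime lengths, outputs summed into one sample.
class CDiffusion
{
public:
    CDiffusion()
    {
        // illustrative coprime (prime) lengths, all < 300 samples
        static const int lengths[6] = { 113, 151, 179, 199, 233, 283 };
        for (int i = 0; i < 6; ++i)
        {
            m_nLength[i] = lengths[i];
            m_nWrite[i]  = 0;
            for (int n = 0; n < 300; ++n)
                m_fBuffer[i][n] = 0.0f;
        }
    }

    float process(float xn)
    {
        float sum = 0.0f;
        for (int i = 0; i < 6; ++i)
        {
            int readIdx = (m_nWrite[i] + 1) % m_nLength[i];  // oldest sample in the line
            sum += m_fBuffer[i][readIdx];                    // sum the delayed outputs
            m_fBuffer[i][m_nWrite[i]] = xn;                  // write the new input
            m_nWrite[i] = (m_nWrite[i] + 1) % m_nLength[i];  // advance and wrap
        }
        return sum;
    }

private:
    float m_fBuffer[6][300];
    int   m_nLength[6];
    int   m_nWrite[6];
};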

3.4 Geneverb class

This is the main class, where the parameters are read from the RackAFX GUI and the input and output buffers are managed. In order to allow a stereo input track for our plug-in we instantiate two CFDLModule objects, one for each channel. Once the samples are read from the input buffer, the feedback delay lines take charge of all the audio processing; the result is then copied to the output buffer.

// process the left and right feedback delay lines

m_FDL_left.processAudioFrame(&fInputLeft,&fOutputLeft);


m_FDL_right.processAudioFrame(&fInputRight,&fOutputRight);

pOutputBuffer[0] = fOutputLeft;

// Mono-In, Stereo-Out (AUX Effect)

if(uNumOutputChannels == 2)

pOutputBuffer[1] = fOutputRight;


Chapter 4

Implementation of the Genetic Algorithm in Matlab

4.1 The Genetic Algorithm

As a search algorithm, the Genetic Algorithm is often used in applications where a solution is known to exist, but deriving it mathematically is incredibly difficult. It is a guess-and-check algorithm revolving around the idea of "survival of the fittest." The Genetic Algorithm proceeds as follows:

• Starting Phase: take an initial set of examples, i.e. a population, where each example is called an individual.

• Evaluation Phase: assign an evaluation parameter to each individual, using a fitness function.

– The fitness function is calculated from the distance between the performance parameter of an individual and the desired outcome.

• Selection Phase: after every individual in the population has been evaluated, the population is split into two parts: the lower-performing part is discarded, while the higher-performing part is used to refill the population using a mating algorithm.

• Mating Phase: pick two individuals – to be considered as parents – and combine them in some way to produce a new individual (child).

– Parents are randomly picked from the top performers to produce children until the population is back to its original size.

• Iterations: the genetic algorithm runs for many cycles (generations), with the idea that the individuals should produce increasingly better results with each additional generation.

4.2 Contextualization of GA to the FDN

In the case of the search for the best coefficients of the Feedback Delay Network to synthesize a desired impulse response, an individual could be the set of coefficients of the feedback matrix, together with the gain coefficients of the low-pass filters after the delay lines. This is what the white paper of Will Pirkle suggests, but a wiser choice could also embed in an individual the vector containing the delay in samples of each delay line, since these are fundamental parameters for modelling a proper FDN. The first generation of individuals could be a random generation consisting of unitary matrices and gain coefficients between zero and one. After the generation phase we have to provide an effective way to calculate and compare the resulting impulse responses of the systems defined by every individual. One way is to compare the average of each sample over the entire signal, but this method does not produce the expected correlation between the sound of the reverb and the total error. An audio envelope detector is therefore used as the comparison function for the study. Envelopes have associated attack and release times, which define how closely the envelope traces the audio signal.

Figure 4.1: Similarity measure between two envelope functions

To determine fitness, the envelope of the synthetic impulse response is compared to that of the user-defined impulse response. The difference at each sample creates a delta value ε, and the ε values are summed; the individual with the lowest total delta is deemed the best. Then comes the selection phase, in which a proper percentage of the best individuals in the population is chosen. To mate, two individuals from this top portion are chosen at random. Each component of their coefficient matrices is averaged to create the child coefficient matrix, and each LPF cutoff frequency is also averaged to create the child LPF cutoff frequencies. The child is added to the set, and the process is repeated until the population is back to its original size. During mating there is always a possibility of mutation, i.e. the altering of an individual from its initial state, which creates an entirely new individual. If a mutation produces better results, it is promoted into the top 35% and used for mating; if not, it is discarded and has no further effect on the population. Every component of the coefficient matrix has a 10% chance of mutation: if a component is flagged for mutation, it is assigned a random value between −1 and +1, used instead of the average of the parents' components. LPF cutoff frequencies have a 5% chance of being mutated; mutations set the LPF cutoff frequency to a random value between 5 and 15 kHz. As with any other implementation of the genetic algorithm, there is always the possibility of degeneration. For the purposes of this study, degeneration is defined to have occurred when at least 75% of the population has the same delta. If degeneration is found, every individual except the best-ranked one is discarded, and the set is refilled with new, randomly generated individuals using the same algorithm that initially seeded the population.
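As an illustration of the mating step just described, here is a minimal C++ sketch that averages the parents' feedback matrices and LPF cutoffs component by component, with the per-component mutation chances given above. The data layout, types and names are assumptions for illustration, not the actual Matlab code.

#include <cstdlib>

// Hypothetical mating sketch for a 4x4 FDN individual: child components are the
// average of the parents', with a 10% mutation chance per matrix component
// (random value in [-1, +1]) and a 5% chance per LPF cutoff (random 5-15 kHz).
struct Individual
{
    float matrix[4][4];   // feedback matrix coefficients
    float lpfCutoff[4];   // one low-pass cutoff per delay line, in Hz
};

static float frand(float lo, float hi)
{
    return lo + (hi - lo) * (static_cast<float>(std::rand()) / RAND_MAX);
}

Individual mate(const Individual& a, const Individual& b)
{
    Individual child;

    for (int i = 0; i < 4; ++i)
    {
        for (int j = 0; j < 4; ++j)
        {
            if (frand(0.0f, 1.0f) < 0.10f)
                child.matrix[i][j] = frand(-1.0f, 1.0f);                 // mutated component
            else
                child.matrix[i][j] = 0.5f * (a.matrix[i][j] + b.matrix[i][j]);
        }

        if (frand(0.0f, 1.0f) < 0.05f)
            child.lpfCutoff[i] = frand(5000.0f, 15000.0f);               // mutated cutoff
        else
            child.lpfCutoff[i] = 0.5f * (a.lpfCutoff[i] + b.lpfCutoff[i]);
    }

    return child;
}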


4.3 Implementation in Matlab

The Matlab implementation of the genetic algorithm can be regarded as a convenient way to get a first glimpse of how the algorithm behaves, what impulse responses it may produce, and with what level of similarity to the desired impulse response. The drawback of using Matlab is the huge computational effort required to run such an algorithm with a proper starting population size and number of iterations. A significant improvement could be obtained by automating the process of setting the coefficients in the RackAFX plugin to obtain the corresponding impulse response, and therefore by rewriting the algorithm as a set of C++ classes and methods. Before writing the algorithm in Matlab we needed to redefine the FDN topology so as to get the impulse responses as output vectors of a function, taking the impulse as input. This is done in the FDN_rev function, a modified version of an original script presented with the book Digital Audio Effects (DAFX). In the algorithm we need to produce random unitary matrices, a process carried out in the unit_matr function, which leverages the properties of the QR decomposition to obtain only unitary matrices. The seed_p function creates the initial population with random generations of unitary matrix coefficients and low-pass filter gain coefficients. This function in turn uses the fitness_delta function to calculate the delta coefficients that measure the similarity between two different impulse responses. To provide this delta we calculate the envelope of each impulse response in the audio_envelope function, which uses the method of envelope detection explained in the paper of Pirkle. It is well worth noting that there are several approaches to designing a software envelope detector. The method proposed by Pirkle uses a half-wave rectifier followed by an LPF whose input is scaled by the attack or release coefficient, depending on whether the signal is rising above the current envelope output or falling below it. The formulas for the attack and release coefficients are as follows:

τ_a = e^(TC / (attackTime_ms · SampleRate · 0.001))

τ_r = e^(TC / (releaseTime_ms · SampleRate · 0.001))

where TC = log(0.01).

The fitness delta is the parameter taken as the reference to select the best individuals of the population, in the genetic_selec function. Finally, the mating function repopulates the individuals and brings the population back to its original size, as previously explained in the description of the genetic algorithm. The genetic_algorithm function collects all these steps and iterates for the desired number of times, taken as an input argument of the function.
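A hedged C++ sketch of this attack/release envelope follower is given below, assuming the half-wave-rectifier-plus-smoothing scheme described above; it mirrors the description and the formulas for τ_a and τ_r, not the exact Matlab audio_envelope code.

#include <cmath>
#include <vector>

// Hypothetical envelope-detector sketch: half-wave rectification followed by a
// one-pole smoother whose coefficient is the attack or release time constant,
// chosen according to whether the signal is rising above or falling below the
// current envelope value. TC = log(0.01), as in the formulas above.
std::vector<float> audioEnvelope(const std::vector<float>& x,
                                 float attackMs, float releaseMs, float sampleRate)
{
    const float TC   = std::log(0.01f);
    const float tauA = std::exp(TC / (attackMs  * sampleRate * 0.001f));
    const float tauR = std::exp(TC / (releaseMs * sampleRate * 0.001f));

    std::vector<float> env(x.size(), 0.0f);
    float current = 0.0f;

    for (size_t n = 0; n < x.size(); ++n)
    {
        float rectified = x[n] > 0.0f ? x[n] : 0.0f;          // half-wave rectifier
        float coeff     = (rectified > current) ? tauA : tauR; // attack or release
        current = coeff * (current - rectified) + rectified;   // one-pole smoothing
        env[n]  = current;
    }
    return env;
}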

4.4 Further Changes in the Algorithm

Here is a list of some changes that appear in the final version of the Matlab algorithm:

• Audio Envelope: we implemented a different version of the envelope follower, since the one explained in the paper did not perform well in all cases, probably due to a wrong setting of the attack and release times. We came up with an envelope that wraps around the attack peak value of the IR, follows the pattern of the IR with a 100-sample moving average, and crosses the release value.

• Feedback Delay Network: we inserted a pre-delay input argument to defer the peak of the IR, placing it in correspondence with the attack time of the desired IR. This parameter accounts for the distance the direct sound must travel from the source to the listener/microphone position in the room.


• Genetic Algorithm: throughout the algorithm we keep track of the best delta value found at each iteration and plot the corresponding best IR.

• Mating and Seed Population: we added the delay-line lengths in samples, generated randomly within a maximum delay limit.

4.5 Behaviour of the Algorithm

The execution of the whole genetic algorithm in Matlab takes a huge amount of time on our PC if we set proper values for the population size and the number of iterations, so it was almost impossible to carry out a fully satisfying evaluation of the results. We extracted some data with 200 and 300 individuals as the starting population and a small number of iterations, skipping the optional mutation phase. A better optimization of the algorithm would require rewriting it in C++ or another language with better performance than Matlab. The results showed that, with a population of a few hundred individuals, the best delta tends to converge to a value that cannot be improved upon after about five iterations of the algorithm (in some cases three iterations are sufficient for convergence). An example of the IRs extracted from the algorithm is depicted in figure 4.2.

Figure 4.3 shows a typical set of deltas of the refilled population, in which mating generated the best individuals (the values in the left part of the chart), whereas the best individuals of the previous population have the delta values shown on the right side of the chart. We can infer from this behaviour that the algorithm tends to reach a limit value of similarity, described by the delta, and is then unable to provide further improvements. The early convergence may be caused by the physical limits of the FDN structure, or by the use of a poorly sensitive type of envelope follower (in the example, the one based on the Hilbert transform was used).


Figure 4.2: Sequence of impulse responses corresponding to the individual with the best delta at each iteration: (a) IR at the end of the first iteration; (b) second iteration; (c) third iteration; (d) fourth iteration; (e) fifth iteration; (f) desired IR.


Figure 4.3: Deltas of the refilled population


Chapter 5

Conclusion: Possible Improvements

Setting up a perceptual reverberation with a genetic algorithm is an experimental approach that may not reach optimal results; nevertheless, many directions of development are possible starting from this project. We could add many other parameters of the FDN model, including the size of the matrix, the delay values of the diffusion chain, and so on. Furthermore, we could carry out the search for the best coefficients separately for the left and right channels, enhancing the stereo configuration. The measure of similarity between the impulse responses could also involve the frequency response and the phase response. The current graphical user interface is just a draft and must be redesigned with the right controls for the user. In the audio processing path the algorithm could use different matrices, depending on the desired room size.


Bibliography

[1] Michael Chemistruck, Kyle Marcolini, and Will Pirkle (2012), Generating Matrix Coefficients for Feedback Delay Networks Using Genetic Algorithm, http://www.aes.org/e-lib/browse.cfm?elib=16537.

[2] Udo Zölzer (ed.) (2002), Digital Audio Effects, http://www.dafx.de/.

[3] Jasmin Frenette (2000), Reducing Artificial Reverberation Algorithm Requirements Using Time-Variant Feedback Delay Networks, http://www.music.miami.edu/programs/Mue/Research/jfrenette/.

[4] Michael Affenzeller, Stephan Winkler, Stefan Wagner, and Andreas Beham (2009), Genetic Algorithms and Genetic Programming, http://books.google.it/books/about/Genetic_Algorithms_and_Genetic_Programmi.html?id=EkELtZAXViEC&redir_esc=y.

[5] Will Pirkle (2012), Designing Audio Effect Plug-Ins in C++, http://www.willpirkle.com/.
