277049642 Harmonising Chorales


Harmonising Chorales in the Style of Johann Sebastian Bach

Moray Allan


Master of Science
School of Informatics
University of Edinburgh
2002

Abstract

This dissertation describes a chorale harmonisation system which uses Hidden Markov Models. We use a standard data set of chorale harmonisations composed by Johann Sebastian Bach. This data set provides a large number of stylistically similar harmonisations, and is freely available in a machine-readable format. We divide the data into training and test sets, and compare the predictive power of various models, as measured by cross-entropy, the negative log likelihood per symbol. Using Hidden Markov Models we create a harmonisation system which learns its harmonic rules by example, without a pre-programmed knowledge base. We assume that we only need to take into account short-term dependencies in the local context. However, we generate globally probable harmonisations, rather than choosing the locally most likely outcome at each decision. The results produced by the system show that pre-programmed harmonic rules are not necessary for automatic harmonisation. Statistical observation of training examples provides the harmonic knowledge needed to generate reasonable chorale harmonisations.

Acknowledgements

I would like to thank my supervisor, Chris Williams, for his ideas and encouragement while I worked on this project. I would also like to thank my parents for proof-reading this dissertation.

Declaration

I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.

(Moray Allan)

Table of contents

1 Introduction
2 Background
  Musical background
    Harmonisation
    Chorales
  Previous work
    Constraint-based systems
    Genetic algorithms
    Sequence prediction
    Neural networks
  Goals for a harmonisation system
  Predictability and cross-entropy
3 Data
  Bach's chorales
  Working with the chorales
  Data format
  Training and test data
4 Sequence prediction
  Theory
  Application
  Results and discussion
5 Hidden Markov Models
  Theory
  Hidden Markov Models as generative systems
    Viterbi algorithm
    Sampling
  Application
6 Final harmonisation model
  Building a harmonisation model
  The model
    Harmonic skeleton
    Chord skeleton
    Ornamentation
  Example audio files
7 Conclusions and future work
  Conclusions
  Future work
Appendix
Bibliography

Chapter 1 Introduction

This dissertation investigates automatic harmonisation of chorales. This task can be described as follows: given a line of music, can we automatically create three further lines of music which will sound pleasant when played simultaneously with the original melody? We will examine how we can use an existing set of chorale harmonisations to get a machine to learn how to harmonise chorales, and how we can use the same data to evaluate the quality of the harmonisations it produces.

Even music students who have no wish to become composers are asked to compose simple pieces of music to show their understanding of the harmonic language of Western classical music. These exercises in composition often include writing harmonisations of chorale melodies. This task is seen as open enough to allow a student's skill to be judged, but constrained enough that it is not free composition. It is not originality that is being looked for, but an understanding of the basic `rules' of harmonisation, codifications of aesthetic preferences. A machine learning approach to this task attempts to build as good a model as possible from example harmonisations. This model can then be used to predict likely harmonisations given the melodic and harmonic context, and by iterating across many contexts we can create a harmonisation for an entire melody.

In this dissertation we will use chorale harmonisations by Johann Sebastian Bach as our examples. These provide a relatively large set of harmonisations by a single composer; they are freely available, and well understood by music theorists.


After a discussion of various model types, a final system will be described which uses Hidden Markov Models to provide significant enhancements over previous harmonisation models, by taking into account probabilities over an entire chorale rather than making decisions based only on the local context.

Chapter 2 provides the background to the dissertation, including information about the musical background. A brief explanation is given of some music theory necessary to understand the task of harmonisation, and the origins of the chorale form are related. An overview is provided of previous machine-learning research relevant to chorale harmonisation: several approaches have been taken in the past, including constraint-based systems, genetic algorithms, sequence prediction methods, and neural networks.

Chapter 3 describes the data with which we will be working, the surviving chorale harmonisations of Johann Sebastian Bach. Various advantages of this data set are identified to justify its choice, notably that the data set was already freely available in an annotated machine-readable format. An example of a chorale harmonisation in this format is included, and the annotations provided by the edition are explained. Reasons are given for the division of the data into separate sets of training and test data.

Chapters 4 and 5 examine alternative ways in which we can create models of the chorale harmonisation data. In Chapter 4 sequence prediction using Markov models is described and applied to the data, while Chapter 5 describes the use of Hidden Markov Models. In both chapters an explanation is given of the assumptions made by the model type, and of the mathematics needed to build the models. The predictive power of the various models is compared.

Chapter 6 describes a harmonisation model which was built taking into account the results of the previous chapters, and discusses the results which this final model produces. Some examples of chorale harmonisations generated by the model are included.

Chapter 7 asks what conclusions we can draw from the preceding material. Suggestions for future work are given, including possible enhancements to the model described in Chapter 6.

Chapter 2 Background

This chapter provides an overview of the background to the work described in this dissertation. After a brief explanation of some music history and music theory which motivate the activity of harmonisation, we will explore some of the previous work relating to automatic harmonisation.

2.1 Musical background

2.1.1 Harmonisation

Since the Middle Ages, much of Western music has been polyphonic, with two or more musical voices being heard at the same time. In church choirs, for example, singers came to make intentional deviations from the set tune when they found that this improved the overall effect of the music. Later, composers of written music began to write complex polyphonic music which needed careful planning in advance. By the eighteenth century a system of rules had developed, dictating what combinations of notes were allowed to be played at the same time or in succession. These rules are still relevant to much music today: while the forms of music have changed, the underlying system has seen few changes.

This tonal system of music classifies tunes according to a system of keys. In a melody line, a single pitch is the centre, to which the line must always return. Similarly, a single chord is made the centre of the polyphonic texture, to which the lines must come together.

Notes are seen as equivalent to those of half or double their frequency, and can be used to perform the same functions. The intervening pitches are quantised, with eight notes (including both ends) allowable in any given key as we move from a note to the note of double its frequency; this interval is therefore known as an `octave'. These eight notes are placed unevenly, since they do not originate as an evenly spaced sequence but are generated by ratios of frequencies. After the 2:1 ratio of the octave, notes in a 3:2 ratio are viewed as those which go best together: the note whose frequency is in a ratio of 3:2 to the fundamental note is the fifth in the scale. A ratio of 4:3 gives rise to the fourth note in the scale. One possible tuning for the remaining notes can be generated just by multiplying these ratios, or in practical terms by tuning the remaining notes on a keyboard instrument from these first three notes, repeating the ratios up and down the keyboard.
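The tuning procedure described above, deriving the remaining notes by repeatedly applying the 3:2 ratio and folding each result back into one octave, can be sketched in Python with exact fractions. This is an illustrative sketch only: the function names are our own, and the result is what is usually called Pythagorean tuning, one of several possible ratio-based tunings.

```python
from fractions import Fraction

OCTAVE = Fraction(2)
FIFTH = Fraction(3, 2)


def fold_into_octave(ratio: Fraction) -> Fraction:
    """Apply octave equivalence: halve or double until 1 <= ratio < 2."""
    while ratio >= OCTAVE:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def pythagorean_major_scale() -> list[Fraction]:
    """Frequency ratios (relative to the tonic) of an eight-note major scale.

    The seven distinct notes come from stacking fifths, from one fifth below
    the tonic (which folds to the 4:3 fourth) up to five fifths above it.
    """
    notes = sorted(fold_into_octave(FIFTH ** k) for k in range(-1, 6))
    return notes + [OCTAVE]  # the octave itself is the eighth note


ratios = pythagorean_major_scale()
steps = [upper / lower for lower, upper in zip(ratios, ratios[1:])]
print([str(r) for r in ratios])  # ['1', '9/8', '81/64', '4/3', '3/2', '27/16', '243/128', '2']
print([str(s) for s in steps])   # ['9/8', '9/8', '256/243', '9/8', '9/8', '9/8', '256/243']
```

The successive step ratios come out as 9:8 and 256:243 in the order large, large, small, large, large, large, small: these uneven gaps are the ones traditionally called tones and semitones.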

An alternative view sees the eight notes as a subset of thirteen notes in geometric progression, so that each note is in the same ratio to the next, $1:\sqrt[12]{2}$. This view had not yet taken hold in the eighteenth century (though it is the system generally used today), but it highlights the unevenness of the gaps between the eight notes. Traditionally the larger gaps have been classified as tones and the smaller ones as semitones. In a `major' key the gaps between the eight notes run tone, tone, semitone, tone, tone, tone, semitone. The seven distinct notes in an octave are named A to G. For historical reasons relating to much older arrangements from which the tonal system developed, the most natural note from which to start a major scale is C, since the scale derived from any other note will stray outside the notes named so far. These in-between notes are referred to indirectly, as the `flat' (♭) or `sharp' (♯) version of the note above or below. If we use ratios to tune a scale, rather than the artificial `equal temperament' given by working with twelfth roots, then C sharp, for example, is not the same note as D flat, nor is C flat the same note as B. `Minor' keys use the pattern of gaps tone, semitone, tone, tone, semitone, tone, tone, so that the key of A minor, for example, gives the same notes as C major. However, minor keys are more complex than major ones, in that the sixth and seventh notes may be sharpened in certain contexts.

Once this system of keys is established for monophonic melodies, the idea of keys can be extended to polyphony. If we want to choose notes to fit with a melody note, we will be choosing these from the scale in that note's key. So a basic C major chord is the chord containing the notes C, E, and G, adding the third and the fifth notes in the major scale from C. The different notes in an octave are also given names according to the harmonic functions that they play, so that, for example, the first note of the scale may be called the `tonic', the fifth the `dominant', and the seventh the `leading note'.

As is hinted at by the above, a great deal of analysis has been associated with the tonal system. Not only has the system been deeply analysed, and everything given (often several different) names, but individual pieces of music can be analysed in great detail on multiple levels. Harmonic analysis may be performed over the several-hour span of an opera, or over a single phrase of music, specifying the exact implications of each note. Indeed, the system itself encourages such analysis; put another way, many composers thought about the harmonic functions of the notes they wrote in the system's own terms.

Into this framework certain rules are introduced when students are taught about harmonisation. These additional rules also clearly represent codifications of aesthetic preferences, rather than natural laws, but some of them pre-date the tonal system. A famous example is the rule which says that `parallel fifths' are not allowed. That is, if one voice is sounding the note a fifth above the note of another voice, then when these voices move to their next notes the relationship between them must change: they must not arrive at another fifth.
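The tone and semitone patterns given above for major and minor keys, and the construction of a chord from the first, third and fifth notes of a scale, can be checked with a short Python sketch. To stay brief it counts in equal-tempered semitones (a tone is two semitones) and spells every in-between note as a sharp, so it deliberately ignores the distinction between, for example, C sharp and D flat; the names `NOTE_NAMES`, `scale` and `triad` are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Gaps between successive notes of the eight-note scale: tone = 2 semitones, semitone = 1.
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]
MINOR_STEPS = [2, 1, 2, 2, 1, 2, 2]  # without the optional sharpened sixth and seventh


def scale(tonic, steps):
    """The seven distinct note names of a key, starting from its tonic."""
    position = NOTE_NAMES.index(tonic)
    names = [tonic]
    for step in steps[:-1]:  # the final step only returns to the tonic an octave up
        position += step
        names.append(NOTE_NAMES[position % 12])
    return names


def triad(scale_notes, degree=1):
    """Chord on a scale degree: that note plus the notes a third and a fifth above."""
    root = degree - 1
    return [scale_notes[(root + k) % 7] for k in (0, 2, 4)]


c_major = scale("C", MAJOR_STEPS)
a_minor = scale("A", MINOR_STEPS)
print(c_major)                       # ['C', 'D', 'E', 'F', 'G', 'A', 'B']
print(set(a_minor) == set(c_major))  # True
print(triad(c_major))                # ['C', 'E', 'G']
```

Calling `scale("G", MAJOR_STEPS)` gives F# as its seventh note, which illustrates why a major scale starting anywhere other than C needs one of the in-between notes.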

2.1.2 Chorales

Since the sixteenth century the music of the Lutheran church had been centred on the `chorale'. Chorales were hymns, poetic words set to music. A famous early example is Martin Luther's chorale `Ein' feste Burg ist unser Gott' (`A Mighty Fortress Is Our God').

Chorales at first had only relatively simple single melodic lines, but composers soon began to arrange more complex music to accompany the original melody. This chorale-based music was expressed in various different forms, some intended for performance by a trained choir, going beyond what could be expected from congregational singing.

The music with which we will be concerned here generally represents a rather straightforward treatment of the chorales, but one which still offered many compositional possibilities. The chorale tune is taken unchanged, and three other musical parts are created alongside it, supporting it and each other. To be interesting, the added lines of music should not fit too easily with the melody, but should not clash with it too much either, and should come together with it at the end. These needs can be expressed in terms of consonant and dissonant harmonies: a dissonance will improve the music, if it is resolved into a pleasant consonance.

This type of chorale harmonisation allows us to apply machine learning techniques to the harmonisation problem in a way that is somewhat detached from problems of melodic invention. By dealing with a much narrower field than free composition we can limit the variables we need to deal with, and come much closer to a precisely-defined problem. These reasons explain why chorale harmonisation is so often used to train and to test students, and why it is an appropriate arena in which to investigate machine learning methods.

2.2 Previous work

2.2.1 Constraint-based systems

Even while Bach was still composing chorales, music theorists were catching up with musical practice by writing treatises to explain and to teach harmonisation. Two famous examples, Rameau's Treatise on Harmony (Rameau, 1722) and the Gradus ad Parnassum by Fux (1725), show how musical style was systematised and formalised into sets of rules. This seemed, and still seems, quite reasonable given the rules' presumed basis in acoustic facts, although clearly the rules remain only explanatory aids, describing a single aesthetic: they are the rules of a style rather than of all possible musics.

This emphasis on systematisation leads to the idea of automatically-composed music. Once a piece of music has started it can seem that the notes themselves, or rather the rules we impose on them, are in control. The events which follow `must' come to fulfil the musico-syntactic form that has begun. Can we therefore construct detailed enough rules to spell out this necessity, and draw it out as a mathematical conclusion? We can encode our rules as constraints, and use constraint-satisfaction techniques to search for a solution which satisfies their requirements.

Pachet and Roy (2001) provide a good overview of constraint-based harmonisation systems. As an example, one early system (Schottstaedt, 1989) takes rules from Fux and assigns them penalties of different sizes, according to the seriousness of each rule being broken. This system then conducts a modified best-first search to produce harmonisations. Using standard constraint-satisfaction techniques for harmonisation is problematic because the space and time needs of the solver tend to rise extremely quickly with the length of the piece. For example, the system developed by Tsang and Aitken (1991) uses 70 megabytes of storage to find a harmonisation for a melody 11 notes long. Pachet and Roy (1995) demonstrate that space and time needs can be dramatically reduced by dividing the problem up into simpler subtasks: they propose a two-part system, where individual notes must meet constraints to make chords, while these chords must meet constraints to make the finished harmonisation.

A similar approach to a related problem is described by Löthe (2000). The task proposed here is to compose minuets in the early classical style. This task is divided into simpler subtasks, with the overall structure of the piece planned first, then the harmonic shape and melody of individual phrases, then the bass. Each subtask can then be represented as a search problem. Löthe lists the various sources from which domain knowledge may be acquired: period literature, modern harmonic theories, interviews with experts, manual analysis of examples, computer analysis, computer composition experiments, and interviews with composers. Some of these are included as possible sources of `implicit knowledge' which is not included in the relevant literature. The approach to automatic harmonisation that will be proposed below effectively integrates knowledge acquisition into the overall task, by directly using statistics acquired from a data set to construct the model that will be used in finding solutions to harmonisation problems.
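To make the combination of weighted rule penalties and best-first search concrete, here is a minimal Python sketch. It is not Schottstaedt's system: it only chooses a bass note under each melody note, and its two rules, their weights, the consonance set and the candidate range are all hypothetical. Because every penalty is non-negative, always expanding the cheapest partial harmonisation first (uniform-cost search) guarantees that the first complete harmonisation reached has the lowest total penalty.

```python
import heapq

# Hypothetical rule weights: more serious violations cost more.
DISSONANCE_PENALTY = 100  # bass forms a dissonant interval with the melody
LEAP_PENALTY = 10         # bass leaps by more than a perfect fifth (7 semitones)

# Intervals above the bass, in semitones modulo the octave, treated as consonant:
# unison/octave, minor and major thirds, perfect fifth, minor and major sixths.
CONSONANT = {0, 3, 4, 7, 8, 9}


def step_penalty(melody_note, bass_note, previous_bass):
    """Penalty for placing bass_note under melody_note (MIDI note numbers)."""
    penalty = 0
    if (melody_note - bass_note) % 12 not in CONSONANT:
        penalty += DISSONANCE_PENALTY
    if previous_bass is not None and abs(bass_note - previous_bass) > 7:
        penalty += LEAP_PENALTY
    return penalty


def harmonise_bass(melody, candidates):
    """Best-first (uniform-cost) search for the bass line with least total penalty."""
    frontier = [(0, ())]  # heap of (penalty so far, bass notes chosen so far)
    expanded = set()      # (position, last bass note) states already expanded
    while frontier:
        cost, basses = heapq.heappop(frontier)
        position = len(basses)
        if position == len(melody):
            return list(basses), cost
        last = basses[-1] if basses else None
        if (position, last) in expanded:
            continue  # a route at least as cheap to this state was already expanded
        expanded.add((position, last))
        for bass in candidates:
            new_cost = cost + step_penalty(melody[position], bass, last)
            heapq.heappush(frontier, (new_cost, basses + (bass,)))
    return None, None  # only reached if there are no candidates at all


bass_line, total_penalty = harmonise_bass([72, 71, 72], range(48, 61))
print(bass_line, total_penalty)
```

Because these toy rules only look one note back, each (position, last bass note) state needs to be expanded only once; without the `expanded` set the frontier would grow exponentially with the length of the melody, which is the space and time problem noted above.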

2.2.2 Genetic algorithms

Genetic algorithms are a relatively recent method (Holland, 1975) which can be used to find solutions to tasks where conventional constraint-satisfaction techniques would take too long or require too much temporary storage space. Several systems have applied genetic algorithms to harmonisation, for example McIntyre (1994). However, Phon-Amnuaisuk and Wiggins (1999) are reserved in their assessment of genetic algorithms as applied to this problem. They perform a direct comparison with an ordinary rule-based system, and conclude that the performance of each system is related to the amount of knowledge encoded in it rather than to the particular technique it uses. In their comparison the ordinary rule-based system actually performs much better, and they argue that this is because it possesses implicit control knowledge which the system based on the genetic algorithm lacks.

Towsey et al. (2001) also discuss the use of genetic algorithms in automatic composition. They suggest that a better fitness function would be obtained by measuring statistics about a population against the same statistics calculated for a collection of training data. They propose various potential features, each a simple numerical measure of elements of a piece, in terms of its pitches, tonalities, contours, rhythms, and repetitive patterns. Principal components analysis is suggested as a method for uncovering relevant features, and for providing target values. Their analysis here is inconclusive, but this is almost certainly because they use only 36 training melodies, composed over a period of about five hundred years, rather than because this sort of statistical analysis would not be helpful.
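Towsey et al.'s suggestion, scoring a candidate by how closely its feature statistics match those of the training data, might look like the following Python sketch. The two features (pitch range and the proportion of stepwise moves) and the plain Euclidean distance are our own hypothetical choices, not the features or weighting Towsey et al. use; a real fitness function would at least normalise the features, and might select them with principal components analysis as mentioned above.

```python
import math
from statistics import mean


def features(melody):
    """Two simple numerical measures of a melody given as MIDI note numbers."""
    moves = [abs(b - a) for a, b in zip(melody, melody[1:])]
    pitch_range = max(melody) - min(melody)
    stepwise_share = sum(1 for m in moves if 1 <= m <= 2) / len(moves)
    return (pitch_range, stepwise_share)


def target_features(training_melodies):
    """Mean value of each feature over the training collection."""
    per_feature = zip(*(features(m) for m in training_melodies))
    return tuple(mean(values) for values in per_feature)


def fitness(candidate, target):
    """Higher is better: minus the distance from the training statistics."""
    return -math.dist(features(candidate), target)


training = [[60, 62, 64, 65, 67], [67, 65, 64, 62, 60]]
target = target_features(training)
smooth, jumpy = [60, 62, 64, 62, 60], [60, 72, 60, 72, 60]
print(fitness(smooth, target) > fitness(jumpy, target))  # True
```

A genetic algorithm would use such a `fitness` to decide which members of its population survive; with only a handful of training melodies, as in the example, the target values themselves are unreliable, which echoes the data-size problem noted above.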

2.2.3 Sequence prediction

Conklin and Witten (1995) create new chorale melodies using probabilistic finite-state grammars. They train their system on a sample of 95 Bach chorale melodies. (The chorale melodies were in fact written by many different composers over a long period, but we can perhaps assume that Bach picked melodies to harmonise which appealed to his own compositional sensibility.) Conklin and Witten create a `multiple viewpoint system' to make predictions that can be used to analyse existing melodies or to generate new ones. This system combines different `viewpoints', each a simple model of a property of the musical sequence. These properties include, for example, the start time of an event, its pitch, duration, key signature, time signature, its position in the bar, its interval from the first event in the bar, and its interval from the first event in the phrase.

Conklin and Witten's system is similar to some of the systems described above in that it divides a complex task into simpler subtasks. The previous systems made a division into a number of subtasks to be performed sequentially, to deal with the complex constraint requirements of harmonisation. In contrast, this system deals with the difficult long-term relationships in melodies by constructing a large number of simple models to be used in parallel, rather than trying to get a single model to learn the effects of both the closer and the more distant melodic relationships. Multiple viewpoint systems use weighted linear combinations of Markov models of different orders; alternatively we could work with the products of the models which we want to use together. Hinton (1999) describes the use of such `products of experts'. He explains how individual expert models can be combined using `logarithmic opinion pools', where we look at the average of appropriately weighted log probabilities (since the sum of the logarithms of some numbers is equal to the logarithm of their product). Hinton advocates this approach instead of using weighted arithmetic means because these products of experts are much more efficient in high-dimensional spaces.

One application of probabilistic finite-state grammars to harmonisation is described by Ponsford et al. (1999). The data set used here is a selection of 84 saraband dances, by 15 different seventeenth-century French composers. An automatically annotated corpus is used to train Markov models using contexts of different lengths, and the weighted sum of the probabilities assigned by these models is used to predict harmonic movement. The longest contexts considered contain four symbols: longer contexts not only exponentially increase the size of the model, but offer progressively less benefit. The predictions made by a Markov model effectively suggest what fragment of the training data is to be repeated; in musical grammars long fragments are unlikely to recur, and we generally want to avoid having obvious quotations from the training data in our generated output.

Ponsford et al. create new pieces first by random generation from their models, and secondly by selecting randomly-generated pieces which match a prepared template. Using templates gives better results, but the great majority of randomly generated pieces will not match the template and so will have to be discarded, and another attempt made. Without templates the most likely piece is simply two identical major chords. Ponsford et al. note that even with the longer context of four symbols, the cadences are poor: genuine pieces tend to use formulaic sequences of notes in closing, which their models fail to produce.
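Both ways of combining Markov models of different orders mentioned above, a weighted arithmetic mean of their predictions and Hinton's logarithmic opinion pool, can be compared in a short Python sketch. The symbol sequences, the equal weights and the add-one smoothing are hypothetical choices for illustration; neither Ponsford et al. nor Conklin and Witten necessarily estimate or weight their models this way.

```python
import math
from collections import Counter, defaultdict


def train_markov(sequences, order):
    """Count which symbol follows each context of `order` preceding symbols."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[tuple(seq[i - order:i])][seq[i]] += 1
    return counts


def predict(counts, context, alphabet):
    """Add-one smoothed next-symbol distribution for one context."""
    seen = counts.get(tuple(context), Counter())
    total = sum(seen.values()) + len(alphabet)
    return {s: (seen[s] + 1) / total for s in alphabet}


def linear_pool(experts, weights):
    """Weighted arithmetic mean of the experts' probabilities."""
    return {s: sum(w * p[s] for p, w in zip(experts, weights)) for s in experts[0]}


def log_pool(experts, weights):
    """Logarithmic opinion pool: weighted mean of log probabilities, renormalised."""
    scores = {s: math.exp(sum(w * math.log(p[s]) for p, w in zip(experts, weights)))
              for s in experts[0]}
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}


# Hypothetical harmonic-symbol sequences (T = tonic, S = subdominant, D = dominant).
chorales = [list("TDTSDT"), list("TSDTDT"), list("TDSDTT")]
alphabet = sorted({s for seq in chorales for s in seq})
history = list("TS")
orders = (0, 1, 2)
experts = [predict(train_markov(chorales, k), history[len(history) - k:], alphabet)
           for k in orders]
weights = [1 / len(orders)] * len(orders)
print(linear_pool(experts, weights))
print(log_pool(experts, weights))
```

The weights should sum to one for the linear pool to remain a probability distribution, whereas the log pool is renormalised explicitly. Because the log pool multiplies the experts' probabilities, a symbol that any one expert considers very unlikely is penalised strongly; the linear pool averages such disagreements away.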

2.2.4 Neural networks

Hild et al. (1992) use neural networks to harmonise chorales. The system described here divides harmonisation into three subtasks: first a `harmonic skeleton' is built, then a `chord skeleton' is instantiated from this, then `ornamentation' is added to the notes of the chords. The harmonic skeleton predictor is a neural network, trained for each beat on a context of the previous three harmonies, the previous, current, and next melody notes, the position in the bar, and a stress marker. Final ornamentation is predicted by another neural network, also trained for each beat in the examples on an appropriate representation of context. The step between these, however, where chords are chosen to instantiate more general harmonies, includes constraint satisfaction. Parallel fifths, for example, are penalised by this Knowledge Engineering aspect of the system, so that they will be filtered out when the best chord is chosen from all those compatible with the pre-decided harmony.
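The chord-skeleton step described above, in which parallel fifths are penalised so that they are filtered out when the best chord is chosen, might look like the following Python sketch. Here the penalty is simply a hard filter; the candidate voicings, the smoothness score and the representation (MIDI note numbers, soprano first) are hypothetical, and Hild et al.'s actual constraint checks are more extensive. Following the rule as stated in Section 2.1.1, any two voices that are a fifth apart (including compound fifths) and both move to notes a fifth apart again are rejected.

```python
from itertools import combinations

PERFECT_FIFTH = 7  # semitones; compound fifths (19, 31, ...) reduce to 7 modulo 12


def is_fifth(a: int, b: int) -> bool:
    return abs(a - b) % 12 == PERFECT_FIFTH


def has_parallel_fifths(previous, current) -> bool:
    """True if two voices a fifth apart both move and arrive a fifth apart again."""
    for i, j in combinations(range(len(previous)), 2):
        both_move = previous[i] != current[i] and previous[j] != current[j]
        if both_move and is_fifth(previous[i], previous[j]) and is_fifth(current[i], current[j]):
            return True
    return False


def choose_chord(previous, candidates, score):
    """Return the best-scoring candidate voicing that avoids parallel fifths."""
    allowed = [c for c in candidates if not has_parallel_fifths(previous, c)]
    return max(allowed, key=score) if allowed else None


# Voicings as MIDI note numbers, ordered soprano, alto, tenor, bass.
c_major = [72, 67, 64, 48]    # C5 G4 E4 C3
d_minor_a = [74, 69, 65, 50]  # D5 A4 F4 D3
d_minor_b = [74, 65, 62, 50]  # D5 F4 D4 D3


def smoothness(chord):
    """Hypothetical score: prefer small total movement from the previous chord."""
    return -sum(abs(a - b) for a, b in zip(c_major, chord))


print(choose_chord(c_major, [d_minor_a, d_minor_b], smoothness))  # [74, 65, 62, 50]
```

Without the filter the first D minor voicing would win, since its voices move 7 semitones in total against 8; it is rejected because the alto and bass move from G4 over C3 to A4 over D3, a compound fifth apart both times.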

2.3 Goals for a harmonisation system

If we take the system developed by Hild et al. as an example, we can ask in what directions it would be interesting to develop it. We already know that Knowledge Engineering approaches can work quite well, but can a less supervised system succeed at chorale harmonisation? Hild et al.'s system is promising in that, like Ponsford et al.'s system, it breaks harmonisation down into simpler subtasks, but here it is unclear how much the Knowledge Engineering aspect is relied upon. Can we adequately represent the complexities of chorale harmonisation by iterating subtasks, or is it really the constraint system that is doing the work?

Of course, no system we produce will truly be `unsupervised', since such a system would need to be capable of considering all possible models, of all kinds, for its training data and choosing the very best. However, in the following chapters we will keep the aim of trying to construct a simple, relatively unsupervised system. To achieve this, decisions made by the model will be based as much as possible on statistics learnt from the training data rather than on programmed-in biases. At the same time, so that we can plausibly obtain good results from a relatively small amount of training data, we will retain some amount of supervision by imposing an unchangeable division into subtasks. In the limited domain of chorale harmonisation this kind of supervision will prevent invalid chorales, but still allow the system to learn from a composer's individual style, something which is prevented when harmonic rules are provided in advance.

Each decision in Hild et al.'s system only looks for a local maximum: later iterations cannot go back and challenge decisions already made. The system is pursuing a `greedy search', with no backtracking. Ideally we would like to be able to find the harmonisation which has the maximum probability over the whole chorale, rather than the local maximum for a particular attribute. Ponsford et al. noted the problems their system had in predicting cadences, and although they blame this on the length of the context used by their system, if we look for the global maximum then we should be able to predict cadences even with a short context. By looking for the global maximum our system will end up planning ahead, choosing the note now that will put it in the best position for making later choices.

The rest of this dissertation will investigate creating a chorale harmonisation system, following these aims: it should be `less supervised', avoiding pre-written rules about harmony; it should be simple, making decisions based on short-term models which look at the local context; and it should be capable of being used to find globally probable harmonisations, rather than `greedily' choosing the local maximum at each decision.
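The contrast between greedy decisions and a globally most probable harmonisation can be shown with a toy Hidden Markov Model whose hidden states are chords and whose observations are melody notes. In this Python sketch all probabilities are made up for illustration and the function names are our own: `greedy_decode` commits to the locally best chord at each step, as in a search without backtracking, whereas `viterbi` uses dynamic programming to maximise the probability of the whole sequence.

```python
def greedy_decode(observations, states, start, trans, emit):
    """Pick the locally most probable state at each step, without backtracking."""
    path, prob = [], 1.0
    for t, obs in enumerate(observations):
        scores = {s: (start[s] if t == 0 else trans[path[-1]][s]) * emit[s][obs]
                  for s in states}
        best = max(scores, key=scores.get)
        prob *= scores[best]
        path.append(best)
    return path, prob


def viterbi(observations, states, start, trans, emit):
    """Globally most probable state sequence, found by dynamic programming."""
    # best[s] = (probability, path) of the most probable path ending in state s
    best = {s: (start[s] * emit[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        best = {
            s: max(((p * trans[prev][s] * emit[s][obs], path + [s])
                    for prev, (p, path) in best.items()),
                   key=lambda item: item[0])
            for s in states
        }
    prob, path = max(best.values(), key=lambda item: item[0])
    return path, prob


# Hypothetical two-chord model: T (tonic) and D (dominant) emitting melody notes.
STATES = ["T", "D"]
START = {"T": 0.5, "D": 0.5}
TRANS = {"T": {"T": 0.9, "D": 0.1}, "D": {"T": 0.4, "D": 0.6}}
EMIT = {"T": {"C": 0.6, "G": 0.4}, "D": {"C": 0.3, "G": 0.7}}
melody = ["G", "C"]

print(greedy_decode(melody, STATES, START, TRANS, EMIT))
print(viterbi(melody, STATES, START, TRANS, EMIT))
```

With these numbers the greedy search chooses D for the first note G (0.35 against 0.2) and then T, for a joint probability of about 0.084; the Viterbi algorithm instead finds T, T with probability about 0.108, because starting on D leaves only less probable continuations. This is the kind of planning ahead described above, on a very small scale.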

2.4 Predictability and cross-entropy

Before we investigate different systems it is worth asking how we can compare them, and what measure we can use to back up our feeling that a system is performing well or badly. Hild et al. state of their system that, `An audience of music professionals judged the performance [...] to be on the level of an improvising organist.' While this evaluation is easy to understand at an immediate level, it is hard to know what it actually means. To start with, what did these musicians think was the general standard of improvising organists: are they giving the system a great commendation, or is this a tactful way of saying that it makes mistakes? It would be preferable if we could conduct a more precise evaluation than this.

One way of evaluating a predictive system is to feed in an unseen harmonisation by Bach and ask how likely the system thinks it is. In general, a better predictor will allocate the unseen harmonisation a higher probability; averaging over a test data set should smooth out the noise and make this more reliable. We can compare the relative performance of systems by comparing `perplexity': given the appropriate contexts, which system allocates higher probabilities to the events which actually occur, and so is less `perplexed' by them? The precise measure we will use, taken from information theory, is `cross-entropy', or the negative log likelihood per symbol (Shannon, 1948). To estimate the cross-entropy, we iterate over some test data, summing the negative base-two logarithm of the probability of each symbol given our model; finally, we divide by the number of symbols traversed. A better model of some data will assign a greater probability to the events which actually occur in the data, and will therefore have a lower cross-entropy.
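The estimate described above can be written as $H = -\frac{1}{N}\sum_{i=1}^{N} \log_2 P(s_i \mid c_i)$, where $s_i$ is the $i$th of the $N$ test symbols and $c_i$ its context; the corresponding perplexity is $2^H$. A minimal Python sketch, taking as input the probabilities a model assigned to the symbols that actually occurred (the two lists below are made up for illustration):

```python
import math


def cross_entropy(probabilities):
    """Negative log likelihood per symbol, in bits.

    `probabilities` holds the probability the model assigned to each symbol
    that actually occurred in the test data, given its context.
    """
    return -sum(math.log2(p) for p in probabilities) / len(probabilities)


def perplexity(probabilities):
    """2 raised to the cross-entropy: an effective number of choices per symbol."""
    return 2 ** cross_entropy(probabilities)


model_a = [0.5, 0.25, 0.5, 0.125]  # hypothetical probabilities from two models
model_b = [0.25, 0.25, 0.25, 0.25]
print(cross_entropy(model_a), cross_entropy(model_b))  # 1.75 2.0
```

Model A, which on average gives the observed symbols higher probabilities, has the lower cross-entropy (1.75 bits per symbol against 2.0), and so is judged the better model of this data.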

Chapter 3 Data

This chapter provides a description of the chorale harmonisation data with which we will be working, and of the specific machine-readable edition which we will use. An example of a chorale harmonisation in the textual format provided by this edition is given, and the annotations included in the files are explained. Reasons are given for the division of the data into separate sets of training and test data.

3.1 Bach's chorales

The data set we will be using is the surviving chorale harmonisations of Johann Sebastian Bach. This has long been a standard data set among musicians, and more recently has been used in music-based machine learning work. Some of the chorales were published during Bach's lifetime, but most were edited after his death by his son Carl Philipp Emanuel. They are available in several printed editions, for example Riemenschneider (1941), and also in various freely-available electronic editions. The edition used here (Bach, 1998) includes the chorales not only in a MIDI format suitable for playback using a sequencer, but also in a textual file format, annotated with bar numbers, phrase markings, and harmonic symbols. The chorale data set is relatively large, containing 384 chorale harmonisations.

Since all the harmonisations are by the same composer we can reasonably assume that a single model might be able to explain them all. Bach's chorales are relatively well understood, and although, for example, some chorale harmonisations composed for special occasions are especially complex, they do seem to share a coherent musical style with the simpler harmonisations.

In some cases Bach wrote more than one harmonisation of the same chorale melody. It could be argued that we should remove these as duplicates, but this sort of interference with the data does not seem justified. At the least, a chorale melody which appears several times is almost certainly more typical, so even if we end up effectively counting it twice we are only giving it a justifiable weighting.

These different harmonisations of the same melodies should also remind us that we should not attempt or expect to generate harmonisations precisely the same as Bach's. Even a perfect model would need external data to achieve this: for example, for what occasion was the chorale harmonisation originally composed?

Some important data that we will not take into account are the words of the chorales. Each chorale melody remains associated with the words of the hymn to which it originally belonged, and research has shown that at times Bach illustrates the meaning of the words in his harmonisation of the relevant fragment of music. It would be an interesting enhancement of a system like this to learn the emotional content of words in the chorale texts and to make this feature available to a generative model, but the system described here makes no attempt to do this.

3.2 Working with the chorales

The textual edition of the chorales was chosen for use for several reasons. Whereas MIDI is a binary data format, and cannot easily be read directly by humans, the textual format is easy to read. Computer packages for music notation often make some attempt to read in MIDI files and to present them in a score notation, but such transcription is hard to perform well, since the MIDI format was not designed to provide the necessary information. Moreover, the textual format is probably easier for non-musicians to read than a score format. The textual format is obviously easy to extend with additional data, and the files used here do in fact include data

Figure 3.1: Chorale K11, BWV 26.6, `Ach wie nichtig, ach wie flüchtig'

Choralname = bch011
Anzahl Stimmen = 4
Tonart = A-moll
Takt = 4/4
Tempo = 100
Notentextausgabe in 16tel-Schritten:
PHRASE
SOPRAN
HARMONIK

Figure 3.2: Beginning of data file for chorale K11, BWV 26.6, `Ach wie nichtig, ach wie flüchtig'

Figure 3.3: Frequency of the Nth most common harmonic symbol

in MIDI format, it would be possible to tune the conversion program, configuring for example the number of beats per minute that we wish to use, and to produce versions in the textual format without having to re-enter all the data manually. A further program (chorale2lilypond.pl) was written to produce output in GNU Lilypond format for typesetting in score notation.

In total 81 different harmonic symbols appear in the annotations of all the chorale harmonisations. The most common harmonic symbols are `T', a tonic chord, and `D', a dominant chord. The least common harmonic symbols are `VTp5', `Vd5' and `SS5', which each occur only once in the entire set of harmonisations. As figure 3.3 shows, the harmonic symbol frequencies follow a Zipf-like curve of the kind often found in natural language statistics. The data is sparse, and we cannot have any confidence that we will produce a good model of the least frequent occurrences.

To ensure that all the programs used the same interpretation of the data files, I wrote the necessary parsing routines and included them in a reusable Perl module (Chorale.pm).
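The curve in figure 3.3 comes from counting how often each harmonic symbol occurs and then sorting the counts. The following is a minimal sketch of that count in Python. It is not the Perl code of Chorale.pm. The toy symbol list is invented for illustration and is not drawn from the chorale data.

```python
from collections import Counter


def rank_frequency(symbols):
    """Return (rank, symbol, count) triples, most frequent symbol first.

    Ties are broken alphabetically so that the ordering is deterministic.
    """
    counts = Counter(symbols)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, symbol, n) for rank, (symbol, n) in enumerate(ordered, start=1)]


def singletons(symbols):
    """Symbols seen exactly once: the sparse tail of the distribution."""
    return sorted(s for s, n in Counter(symbols).items() if n == 1)


# Invented toy annotation sequence, not taken from the real data set.
toy = ["T", "D", "T", "S", "T", "D", "Tp", "D", "T"]
for rank, symbol, n in rank_frequency(toy):
    print(rank, symbol, n)
```

Plotting count against rank on log-log axes is the usual way to see the Zipf-like shape.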

3.4 Training and test data

Since chorales in major and minor keys are known to exhibit different harmonic behaviour, two separate groups of training and test data were needed. Once the chorales had been categorised by their major or minor key, four data sets were used for each category: a training set containing 40% of the available chorales, and three test sets each containing 20% of the available chorales.

Using several test data sets in this way allowed methods to be tested fairly on fresh test data after they had been optimised on another set of validation data. The third test data set was not used until the final system had been prepared, so that not even accidental optimisation against it could take place.

To remove any bias from the original ordering of the data, chorales were randomly allocated to these sets. The allocations made are listed in the appendix.
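The random allocation can be sketched as follows. This is a hypothetical Python helper and not the procedure actually used. The `allocate` name, the fixed seed and the rounding of set sizes are all assumptions. The helper would be called once for the major-key chorales and once for the minor-key chorales.

```python
import random


def allocate(chorale_ids, seed=0):
    """Randomly split chorales into a 40% training set and three 20% test sets.

    A fixed seed makes the allocation reproducible, so it can be listed and reused.
    """
    items = list(chorale_ids)
    random.Random(seed).shuffle(items)
    n = len(items)
    cut1 = round(0.4 * n)
    cut2 = cut1 + round(0.2 * n)
    cut3 = cut2 + round(0.2 * n)
    return {
        "train": items[:cut1],
        "test1": items[cut1:cut2],  # validation data while methods are tuned
        "test2": items[cut2:cut3],  # fresh data for testing the tuned methods
        "test3": items[cut3:],      # held back until the final system is ready
    }


# Hypothetical identifiers in the style of the data files (e.g. bch011).
major_ids = [f"bch{i:03d}" for i in range(1, 51)]
sets = allocate(major_ids, seed=1)
print({name: len(ids) for name, ids in sets.items()})
```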

Chapter 4

Sequence prediction

This chapter shows how we can create Markov models of properties of chorale harmonisations. A description of the mathematical basis for various kinds of Markov model is followed by a comparison of the predictive power of different models of our data set.

4.1 Theory

Sequence prediction methods allow us to model, for example, states which vary over time. We can pick already-observed events which we think will affect what will happen next, and build a model that predicts what will happen next from these observed variables.

Given a record of the weather over a period, we might hope to find patterns relating the weather on a day to the weather on the previous day, or on the previous few days. If we built a model of these patterns, we could then use it to predict the next day's weather, given our knowledge about the weather on the previous days. A very simple model might use only two states, `sunny' and `rainy'. Perhaps our model would say that if it is sunny today, there is a 70% probability it will also be sunny tomorrow, and that if it is rainy today there is only a 10% chance it will be sunny tomorrow. Since there are only two states in our model, and since the probabilities of all possible states must sum to 1, there must equivalently be a 30% chance that a sunny day will be followed by a rainy one, and a 90% chance that a rainy day will be followed by another rainy one. We can represent these probabilities as a matrix:

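$$
\begin{pmatrix} P(y_t = \text{sunny}) & P(y_t = \text{rainy}) \end{pmatrix}
=
\begin{pmatrix} P(y_{t-1} = \text{sunny}) & P(y_{t-1} = \text{rainy}) \end{pmatrix}
\begin{pmatrix} 0.7 & 0.3 \\ 0.1 & 0.9 \end{pmatrix}
$$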
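Applying the matrix equation repeatedly gives the forecast for each later day. The Python sketch below does this with plain lists. The function names are illustrative only.

```python
# Transition matrix: row = today's weather, column = tomorrow's weather.
# State order: 0 = sunny, 1 = rainy.
TRANSITION = [
    [0.7, 0.3],
    [0.1, 0.9],
]


def step(dist, transition=TRANSITION):
    """One application of the row-vector update p_t = p_{t-1} A."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def forecast(dist, days, transition=TRANSITION):
    """Probability of each state `days` steps after the starting distribution."""
    for _ in range(days):
        dist = step(dist, transition)
    return dist


print(forecast([1.0, 0.0], 1))   # starting from a sunny day: [0.7, 0.3]
print(forecast([1.0, 0.0], 2))   # roughly [0.52, 0.48]
print(forecast([1.0, 0.0], 50))  # close to [0.25, 0.75]
```

The last line shows why long-range forecasts from such a chain carry little information. Whatever the starting day, the distribution converges to the same stationary values, the solution of $\pi = \pi A$. Here that is 25% sunny and 75% rainy.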

We can use this equation iteratively to work forwards from a day which we know was sunny or rainy, and work out the probability of each future day being in either state. The model which this equation represents is a first-order Markov chain, since it only uses the state at one preceding time step to make its predictions.

In the case of the weather we would not expect the model's results to be reliable many days ahead, and in fact for most sequences of events such a simple model will be less and less useful the further ahead we try to look. However, for our purposes here we need only be concerned with short-term prediction, and with how much a sequence generated by our model looks like a genuine sequence. We can use the short-term predictions to calculate the likelihood of sequences from our data set according to our model. When we generate new sequences we do not need to predict Bach's actual harmonisation, only something that is like a Bach harmonisation.

In general, for mutually-exclusive events $S_0$ to $S_M$, a first-order Markov model will take this form:

$$
\begin{pmatrix} P(y_t = S_0) & P(y_t = S_1) & \cdots & P(y_t = S_M) \end{pmatrix}
=
\begin{pmatrix} P(y_{t-1} = S_0) & P(y_{t-1} = S_1) & \cdots & P(y_{t-1} = S_M) \end{pmatrix}
\begin{pmatrix}
P(y_t = S_0 \mid y_{t-1} = S_0) & P(y_t = S_1 \mid y_{t-1} = S_0) & \cdots & P(y_t = S_M \mid y_{t-1} = S_0) \\
P(y_t = S_0 \mid y_{t-1} = S_1) & P(y_t = S_1 \mid y_{t-1} = S_1) & \cdots & P(y_t = S_M \mid y_{t-1} = S_1) \\
\vdots & \vdots & \ddots & \vdots \\
P(y_t = S_0 \mid y_{t-1} = S_M) & P(y_t = S_1 \mid y_{t-1} = S_M) & \cdots & P(y_t = S_M \mid y_{t-1} = S_M)
\end{pmatrix}
$$

Figure 4.1: Markov chains of orders 1, 2, and 3

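This model makes the assumption that $y_t$ is conditionally independent of $y_{t-2}, y_{t-3}, \ldots, y_0$, given $y_{t-1}$:

$$
P(y_t = S_{i_t} \mid y_{t-1} = S_{i_{t-1}}, y_{t-2} = S_{i_{t-2}}, \ldots, y_0 = S_{i_0})
= P(y_t = S_{i_t} \mid y_{t-1} = S_{i_{t-1}})
$$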

If we want to create a model for some training data, we can use maximum likelihood estimates of these probabilities:

$$
P(y_t = S_j \mid y_{t-1} = S_i)
= \frac{P(y_t = S_j, y_{t-1} = S_i)}{P(y_{t-1} = S_i)}
\approx \frac{\operatorname{freq}(y_t = S_j, y_{t-1} = S_i)}{\operatorname{freq}(y_{t-1} = S_i)}
$$
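These maximum likelihood estimates are simply normalised pair counts. The Python sketch below is illustrative and is not the author's Perl library; names such as `train_first_order` are hypothetical. It counts a symbol as a predecessor only when a successor follows it, so each row of estimates sums to 1.

```python
from collections import Counter, defaultdict


def train_first_order(sequences):
    """MLE of P(y_t = b | y_{t-1} = a) = freq(a followed by b) / freq(a as predecessor)."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[prev][cur] += 1
    model = {}
    for prev, successors in pair_counts.items():
        total = sum(successors.values())
        model[prev] = {cur: n / total for cur, n in successors.items()}
    return model


# Invented toy harmonic-symbol sequences.
model = train_first_order([["T", "D", "T"], ["T", "S", "D", "T"]])
print(model["T"])  # {'D': 0.5, 'S': 0.5}
print(model["D"])  # {'T': 1.0}
```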

If we instead claim that we also need to take into account $y_{t-2}$, we get a second-order Markov chain; if we claim that we need to take into account $y_{t-1}$, $y_{t-2}$ and $y_{t-3}$, we get a third-order Markov chain. These three models are represented graphically in figure 4.1. Markov chains are very well-understood models, and have a long history of use in modelling aspects of language. Markov himself used them to model sequences of letters found in a text (Markov, 1913).

We need not take into account only the previous values of the variable which we are trying to predict. If we can observe other relevant variables, then we can include these in our model. Figure 4.2 shows a first-order Markov model where the next state of a variable also depends on some external context $c_t$. We can easily include this context in our maximum likelihood framework:

$$
P(y_t = S_k \mid y_{t-1} = S_i, c_t = V_j)
\approx \frac{\operatorname{freq}(y_t = S_k, y_{t-1} = S_i, c_t = V_j)}{\operatorname{freq}(y_{t-1} = S_i, c_t = V_j)}
$$

Figure 4.2: Markov model with additional context
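Adding context changes only what the counts are keyed on. In this hypothetical Python sketch the context $c_t$ is assumed to be the melody note sounding at step $t$. Both the note names and the harmonic symbols are invented toy data.

```python
from collections import Counter, defaultdict


def train_with_context(sequences, contexts):
    """MLE of P(y_t | y_{t-1}, c_t) from parallel sequences of symbols and context values.

    contexts[n][t] is the context value c_t observed alongside sequences[n][t].
    """
    counts = defaultdict(Counter)
    for seq, ctx in zip(sequences, contexts):
        if len(seq) != len(ctx):
            raise ValueError("each step needs exactly one context value")
        for t in range(1, len(seq)):
            counts[(seq[t - 1], ctx[t])][seq[t]] += 1
    model = {}
    for key, successors in counts.items():
        total = sum(successors.values())  # freq(y_{t-1} = S_i, c_t = V_j)
        model[key] = {y: n / total for y, n in successors.items()}
    return model


# Invented toy data: harmonic symbols with the melody note on each beat.
harmonies = [["T", "D", "T"], ["T", "D", "Tp"]]
melodies = [["C", "B", "C"], ["E", "D", "C"]]
model = train_with_context(harmonies, melodies)
print(model[("D", "C")])  # {'T': 0.5, 'Tp': 0.5}
```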

While these estimates will maximise the likelihood of the training data with respect to our model, they may prove problematic when we try to apply them to new data. When the frequencies of items in a data set follow a Zipf-like curve, we are likely to come across items in unseen data which were not present in our training data. Indeed, the longer the sequences we consider, or the more additional context we use, the more likely it becomes that we will come across something which had a zero frequency in our training data.

We can mitigate this problem by, for example, constructing several Markov models of different orders and using them together. We can `smooth' the probabilities by interpolating between the predictions of these models, with non-negative weights that sum to 1:

$$
P_{\text{smooth}}(y_t = S_k \mid y_{t-1} = S_j, y_{t-2} = S_i)
= \lambda_0 P(y_t = S_k)
+ \lambda_1 P(y_t = S_k \mid y_{t-1} = S_j)
+ \lambda_2 P(y_t = S_k \mid y_{t-1} = S_j, y_{t-2} = S_i)
$$
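The interpolated estimate is a weighted sum of three lookups. The Python sketch below assumes that each model's probability for the observed symbol has already been computed. It picks the weights by a coarse grid search over held-out data. The grid search is an illustrative assumption, since the text does not say how the weights were fitted.

```python
import math
from itertools import product


def interpolate(p0, p1, p2, weights):
    """l0*P(S_k) + l1*P(S_k | S_j) + l2*P(S_k | S_j, S_i).

    The weights must be non-negative and sum to 1 for the result to be a probability.
    """
    l0, l1, l2 = weights
    if min(weights) < 0 or abs(l0 + l1 + l2 - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return l0 * p0 + l1 * p1 + l2 * p2


def choose_weights(held_out, steps=10):
    """Grid-search the weights that maximise held-out log-likelihood.

    held_out holds one (p0, p1, p2) triple per held-out event: the probability
    each of the three models assigned to the symbol that actually occurred.
    """
    best, best_ll = None, -math.inf
    for a, b in product(range(steps + 1), repeat=2):
        if a + b > steps:
            continue
        weights = (a / steps, b / steps, (steps - a - b) / steps)
        probs = [interpolate(p0, p1, p2, weights) for p0, p1, p2 in held_out]
        if min(probs) <= 0.0:
            continue  # these weights give some held-out event zero probability
        ll = sum(math.log(p) for p in probs)
        if ll > best_ll:
            best, best_ll = weights, ll
    return best


# The second-order model never saw the second event, so it cannot be used alone.
held_out = [(0.2, 0.4, 0.9), (0.2, 0.4, 0.0), (0.2, 0.4, 0.8)]
print(choose_weights(held_out))
```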

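Appropriate values for the weights $\lambda_k$ can be calculated using a validation set of held-out data. A simpler method which can also work well is to `back off' to a model which assumes fewer dependencies when we find an item with a zero frequency (Katz, 1987; Chen and Goodman, 1998). For example:

$$
P_{\text{backoff}}(y_t = S_k \mid y_{t-1} = S_j, y_{t-2} = S_i) =
\begin{cases}
P(y_t = S_k \mid y_{t-1} = S_j, y_{t-2} = S_i) & \text{if } A; \\
P(y_t = S_k \mid y_{t-1} = S_j) & \text{if } \lnot A \land B; \\
P(y_t = S_k) & \text{if } \lnot A \land \lnot B,
\end{cases}
$$

where

$$
A : \operatorname{freq}(y_t = S_k, y_{t-1} = S_j, y_{t-2} = S_i) > 0, \qquad
B : \operatorname{freq}(y_t = S_k, y_{t-1} = S_j) > 0.
$$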
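The backing-off rule becomes a loop from the longest context down to the empty one. In this hypothetical Python sketch, `models[n]` maps a length-n history to the maximum likelihood distribution seen after that history. A symbol is therefore present only if its frequency in that context was non-zero. Unlike Katz's method, this simple version does not discount the higher-order estimates. As a result, the backed-off values for a given history need not sum exactly to 1.

```python
def backoff_prob(symbol, history, models):
    """Back off from the longest available context to shorter ones.

    history is ordered oldest first, e.g. (y_{t-2}, y_{t-1}).
    models[n] maps a length-n history tuple to {symbol: MLE probability}.
    """
    for n in range(len(history), -1, -1):
        context = tuple(history[len(history) - n:])
        distribution = models[n].get(context, {})
        if symbol in distribution:  # freq(context, symbol) > 0 in training
            return distribution[symbol]
    return 0.0  # never seen at all; additive smoothing addresses this case


# Invented toy models of orders 0, 1 and 2.
models = {
    0: {(): {"T": 0.5, "D": 0.3, "S": 0.2}},
    1: {("D",): {"T": 0.9, "Tp": 0.1}},
    2: {("T", "D"): {"T": 1.0}},
}
print(backoff_prob("T", ("T", "D"), models))   # 1.0, from the second-order model
print(backoff_prob("Tp", ("T", "D"), models))  # 0.1, from the first-order model
print(backoff_prob("S", ("T", "D"), models))   # 0.2, from the zeroth-order model
```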

Backing off relies on the assumption that models which assume more dependencies are better models of the underlying processes, so that a model which assumes more dependencies should be used in preference to a model which assumes fewer dependencies.

Smoothing across models like this still does not help us with individual events in unseen data which were not present in our training set. To deal with this, we need to smooth the conditional probabilities $P(y_t = S_i \mid C)$ over all possible $i$ for each context $C$. One simple possibility is `additive smoothing': adding some value $\delta$ to all the observed counts and renormalising (Lidstone, 1920; Nivre, 2000). This modifies our probability estimate as follows, with $N$ the number of observed states $S_j$:

$$
P(y_t = S_j \mid y_{t-1} = S_i)
= \frac{\operatorname{freq}(y_t = S_j, y_{t-1} = S_i) + \delta}{\operatorname{freq}(y_{t-1} = S_i) + \delta (N + 1)}
$$
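Here is a Python sketch of the additive estimate for one context. The value of $\delta$ is an assumption, since the text does not give one. Because the denominator contains $\delta(N + 1)$, the N observed states receive slightly less than all of the probability mass. The remainder equals the share that one extra, unobserved state would receive.

```python
from collections import Counter


def additive_smoothing(successor_counts, states, delta=0.5):
    """(freq(y_t = S_j, y_{t-1} = S_i) + delta) / (freq(y_{t-1} = S_i) + delta * (N + 1)).

    successor_counts: Counter of the states seen after S_i in training.
    states: the N observed states S_j.
    """
    n = len(states)
    denominator = sum(successor_counts.values()) + delta * (n + 1)
    return {s: (successor_counts[s] + delta) / denominator for s in states}


counts = Counter({"D": 3, "S": 1})  # invented: what followed one symbol in training
probs = additive_smoothing(counts, ["T", "D", "S"], delta=1.0)
print(probs)                      # {'T': 0.125, 'D': 0.5, 'S': 0.25}
print(1.0 - sum(probs.values()))  # 0.125 left over for an unseen state
```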

To calculate the cross-entropy (the negative average log-likelihood per symbol) for a Markov model, we can simply iterate over the test data, summing $-\log P(y_t \mid C)$ for whatever context $C$ we are using. Once we reach the end, we divide by the number of symbols we have traversed.
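The cross-entropy calculation is a single pass over the test data. This Python sketch accepts any smoothed model as a function `prob(symbol, history)`, which is a hypothetical interface. It uses base-2 logarithms so the result is in bits per symbol. The base is an assumption, because the text does not state it.

```python
import math


def cross_entropy(test_sequences, prob, order=1):
    """Negative average log2-likelihood per symbol.

    prob(symbol, history) must never return 0, which is why smoothing is needed.
    history holds up to `order` preceding symbols, oldest first.
    """
    total, count = 0.0, 0
    for seq in test_sequences:
        for t, symbol in enumerate(seq):
            history = tuple(seq[max(0, t - order):t])
            total -= math.log2(prob(symbol, history))
            count += 1
    return total / count


def uniform(symbol, history):
    return 0.25  # a model that knows nothing, over four equally likely symbols


print(cross_entropy([["T", "D", "S", "T"], ["D", "T"]], uniform))  # 2.0
```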

4.2 Application

The simplest sequence prediction model we can apply to the harmonisation problem is a first-order Markov chain over the harmonic symbols with which our data set is annotated. We can also produce higher-order Markov chain models of the harmonic sequence, and use these models in combination. These models will not be useful generative models, since they take no account of the melody line in making their choices: they will produce the same sequences whatever the melody. However, they do allow us to look at the intrinsic predictability of the harmonic symbols.

Another very simple, but more powerful, model uses the melody notes to predict the harmonic symbols. Again, Markov models of different orders can be used in combination. Whereas we can obviously use only the preceding harmonic symbols, we can use the current melody note. These models therefore have additional information about the harmonic behaviour at the current step, compared with models which use only the harmonic symbols themselves.

A third possibility is to use both notes from the melody and the preceding harmonic symbols.

The chorale data files are already organised as sequential data. For these models we only need to look at two columns of the data (`Sopran' and `Harmonik'), and we are only interested in every fourth line in these columns. Because the files are written in sixteenth-note steps, these lines fall on the beats of the music, and they are the only lines on which harmonic annotations are given. When a file is read in, special symbols are added to the beginnings and ends of the sequences. This means that we do not, for example, need to maintain a separate matrix of initial symbol probabilities, since the usual training method will produce appropriate probabilities for the transitions from the beginning-of-sequence marker. Chorales are transposed into C major or C minor as they are read in. We therefore do not have to create a separate model for each key, and can use all our data to create overall models of the major and minor key harmonisations. Events which continue over multiple time steps are represented by their plain symbols when they begin. At subsequent time steps they are represented by modified forms of their symbols, with a prefix added to show continuation.

I wrote a library of routines to train and use Markov models, and used it to create these models from the chorale data training set. The results are described below. We will use backing off to smooth across models of different orders, and additive smoothing to smooth frequencies to account for unseen data.
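The reading-in steps for the harmonic column can be sketched as follows. Everything about the input encoding here is an assumption made for illustration: each row is one sixteenth-note step, a string starts a new event, and `None` means the previous event is still sounding. The boundary markers `<s>` and `</s>` and the continuation prefix `-` are also hypothetical choices, not the symbols used in the original programs. Transposition to C is not shown.

```python
BOS, EOS = "<s>", "</s>"  # hypothetical beginning/end-of-sequence markers
HOLD = "-"                # hypothetical prefix marking a continued event


def beat_symbols(rows, steps_per_beat=4):
    """Keep one symbol per beat, mark held events, and add boundary markers.

    rows: one entry per sixteenth-note step; a string starts a new event and
    None means the previous event continues (an assumed encoding).
    """
    out = [BOS]
    current = None
    for t, row in enumerate(rows):
        onset = row is not None
        if onset:
            current = row
        if t % steps_per_beat == 0:  # only every fourth line falls on a beat
            if current is None:
                raise ValueError("sequence starts with a continuation")
            out.append(current if onset else HOLD + current)
    out.append(EOS)
    return out


rows = ["T", None, None, None, None, None, None, None, "D", None, "S", None]
print(beat_symbols(rows))  # ['<s>', 'T', '-T', 'D', '</s>']
```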

4.3 Results and discussion

The various models described above allow us to examine the relationship between the harmonic symbols with which the data is annotated and the melody notes in the soprano line. By comparing the predictive power of these models we can discover how useful different pieces of contextual information are.

To investigate how useful contexts of different sizes would be, I trained Markov models on the major and minor training data using contexts of up to eight symbols in length. I then iterated over the test data using each model in turn, noting how many sequences were encountered which had not been seen in the training data. Table 4.1 shows the results for models predicting the next harmonic symbol from the preceding harmonic symbols. Table 4.2 shows the results for models predicting it from the melodic context, and table 4.3 the results for models using both harmonic and melodic context. In each case there are two sets of results, since the chorales in major and minor keys are treated separately.

Context length   Unseen (major)   Unseen (minor)
3                1456             1304
4                1950             1702
5                2216             1888
6                2349             1984
7                2397             2018
8                2425             2031

Table 4.1: Number of unseen sequences for various lengths of harmonic context, and proportion of all sequences unseen: major (left) and minor (right)

Table 4.1 shows the number of such unseen sequences for models predicting the next harmonic symbol from the preceding harmonic symbols. It also shows the proportion of all sequences encountered which were unseen, for contexts from zero to eight symbols long. When a context of zero length is used, we only ask whether individual symbols have been seen in the training set. As we increase the context length and look at longer sequences, we become more and more likely to find sequences which were not present in the training data. For these models the number of symbols in the context equals the number of quarter-note beats we take into account in making our predictions. The maximum contexts considered here, of length eight, therefore represent, for example, two bars of music in common time. We can see that the proportion of test sequences which our models have seen in the training data drops off quickly as the context length increases.
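Counting unseen sequences only needs sets of n-grams. In this Python sketch, a context of length k is taken to mean k preceding symbols plus the symbol being predicted, which matches the description of the zero-length case above. The function names and toy data are illustrative.

```python
def ngrams(seq, n):
    """All contiguous subsequences of length n, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def unseen_sequences(train, test, context_len):
    """Number and proportion of test (context + symbol) sequences absent from training."""
    n = context_len + 1
    seen = {gram for seq in train for gram in ngrams(seq, n)}
    test_grams = [gram for seq in test for gram in ngrams(seq, n)]
    unseen = sum(1 for gram in test_grams if gram not in seen)
    proportion = unseen / len(test_grams) if test_grams else 0.0
    return unseen, proportion


train = [["T", "D", "T", "S"]]  # invented toy data
test = [["T", "D", "S"]]
for k in range(3):
    print(k, unseen_sequences(train, test, k))
```

Even in this tiny example the proportion of unseen sequences rises with the context length.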

Table 4.2 shows the equivalent results for models predicting the next harmonic symbol from the melodic context. The models using the melodic context have a consistently lower proportion of unseen sequences than the models which use the preceding harmonic symbols. This is understandable: there are more than eighty different harmonic symbols in the data set, but we do not expect the melody notes to vary by much more than an octave. Taking this into consideration, the proportion of unseen harmonic contexts actually rises fairly slowly with the context length. This shows that even for the longer lengths of context some sequences are repeated across different chorales.

Context length   Unseen (major)   Unseen (minor)
4                1444             1072
5                1777             1327
6                1932             1486
7                2040             1579
8                2096             1618

Table 4.2: Number of unseen sequences for various lengths of melodic context, and proportion of all sequences unseen: major (left) and minor (right)

Table 4.3 shows the number of unseen sequences for models using both harmonic and melodic context. These results count unseen combinations of harmonic symbols and melody notes, so it is not surprising that the proportion of unseen sequences rises more quickly than when we take the harmonic symbols or melody notes by themselves. These models use compound symbols in their contexts, with each contextual unit encoding a combination of a harmonic symbol and a melody note. A context four symbols long for these models is therefore comparable to a context eight symbols long for the previous models.

Context length   Unseen (major)   Unseen (minor)
2                1436             1225
3                2022             1688
4                2283             1902

Table 4.3: Number of unseen sequences for various lengths of harmonic and melodic context, and proportion of all sequences unseen: major (left) and minor (right)

Using the same set of models, we can see how this falling off in sequence coverage affects the quality of our predictions for the different contexts. Table 4.4 shows how the models based on harmonic symbols perform. Table 4.5 shows the performance of the models which base their predictions on melody notes, and table 4.6 the performance of the models which use both melody notes and harmonic symbols.

In table 4.4 we can see how using different numbers of the previous harmonic symbols affects the quality of our predictions for the next harmonic symbol. As in all these tables, a model of order zero uses a zero-length context: it makes predictions based on the overall probabilities of different symbols, without taking the context into account at all. If we use individual models, then as we increase the length of the context the models very quickly perform worse than the model of order zero, which the table shows as a higher cross-entropy value. The models which perform worse than the model of order zero are suffering from `sparse data': there are too many unseen sequences for them to make useful predictions consistently. Smoothing with models of lower orders overcomes this problem, since the predictions of the lower order models can be used when the higher order models come across an unseen sequence. We can see that smoothing with models of lower order greatly improves the predictive power of our Markov models. If we use smoothed models the cross-entropy falls as we increase the model order, since higher order models are used only where they can make useful predictions. The tables show contexts up to eight symbols long. By that length the higher order models are used only rarely, and the cross-entropy values show only very small improvements over the models restricted to shorter contexts.

Model order   Single model (major)   Single model (minor)
0             4.31                   4.64
1             3.80                   3.74
2             4.37                   4.14
3             6.78                   6.70
4             10.0                   10.5
5             12.4                   13.0
6             13.7                   14.0
7             14.3                   14.6
8             14.5                   14.7

Table 4.4: Cross-entropies for models predicting harmonic symbols from previous harmonic symbols: major and minor

Model order   Single model (major)   Single model (minor)
0             4.31                   4.64
1             3.20                   3.38
2             3.46                   3.32
3             4.39                   3.88
4             6.82                   5.68
5             9.27                   7.70
6             10.6                   9.23
7             11.4                   10.2
8             11.9                   10.5

Table 4.5: Cross-entropies for models predicting harmonic symbols from melody notes: major and minor

Table 4.5 shows that the melody notes appear to be more useful than the harmonic symbols as contextual information for predicting the next harmonic symbol, since the models which use them have lower cross-entropies. However, it is important to note that the final melody note in our context is the one played on the beat whose harmonic symbol we want to predict. A context of harmonic symbols can only run up to the previous beat. Otherwise we would have no prediction left to make, because the final element of our context would be what we were trying to find. There are generally several possible ways of continuing from the preceding harmonies, so the models based on harmonic symbols can only guess which direction will be taken. The models which use the melody notes, by contrast, have the possibilities narrowed down to those compatible with the final melody note in their context.

Model order   Single model (major)   Single model (minor)
0             4.31                   4.64
1             3.36                   3.10
2             7.34                   7.20
3             11.3                   11.4
4             13.5                   13.5

Table 4.6: Cross-entropies for models predicting harmonic symbols from melody notes and previous harmonic symbols: major and minor

Even lower cross-entropies in table 4.6 show that the best model of the data uses both the melody notes and the preceding harmonic symbols. This model consistently performs best, even taking into account that, for example, the first-order model here combines two symbols in its context.

Chapter 5

Hidden Markov Models

This chapter describes Hidden Markov Models, and explains how we can apply them to chorale harmonisation. We will see that even a simple Hidden Markov Model benefits from its ability to `plan': we can make predictions which take into account the probability of the entire sequence of which they are a part, and can find the globally most probable sequence.

5.1 Theory

This section introduces Hidden Markov Models, and relates some useful results. A useful longer treatment of Hidden Markov Models is provided by Rabiner (1989).

Instead of working directly from observed states, Hidden Markov Models assume that observed events occur because of underlying hidden states. The graphical form of a Hidden Markov Model is shown in figure 5.1. An ordinary Markov assumption is made concerning the transition probabilities between the hidden states; here we will use a first-order model, such that we assume:

$$
P(s_t = S_{q_t} \mid s_{t-1} = S_{q_{t-1}}, s_{t-2} = S_{q_{t-2}}, \ldots, s_0 = S_{q_0})
= P(s_t = S_{q_t} \mid s_{t-1} = S_{q_{t-1}})
$$

However, we now also need emission probabilities to model how the observed event results from the hidden state. We make a similar assumption that:

$$
P(y_t = Y_{i_t} \mid s_t = S_{q_t}, \ldots, s_0 = S_{q_0}, y_{t-1} = Y_{i_{t-1}}, \ldots, y_0 = Y_{i_0})
= P(y_t = Y_{i_t} \mid s_t = S_{q_t})
$$

Figure 5.1: Hidden Markov Model

Thus the probability of a particular state and observed event given the preceding state is:

$$
P(y_t = Y_k, s_t = S_j \mid s_{t-1} = S_i)
= P(y_t = Y_k \mid s_t = S_j) \, P(s_t = S_j \mid s_{t-1} = S_i)
\tag{5.1}
$$

To find the probability of a particular sequence of observed events $y_0 = Y_{i_0}, y_1 = Y_{i_1}, \ldots, y_T = Y_{i_T}$, we can sum over all possible state sequences. We define $\alpha_t(j)$ as the probability of seeing the observed events $y_0$ to $y_t$ of the sequence and finishing in state $j$:

$$
\alpha_t(j) = P(y_0 = Y_{i_0}, y_1 = Y_{i_1}, \ldots, y_t = Y_{i_t}, s_t = S_j)
$$

We can then use these forward probabilities to find the sequence probability by induction:

$$
\begin{aligned}
\alpha_0(j) &= P(s_0 = S_j) \, P(y_0 = Y_{i_0} \mid s_0 = S_j); \\
\alpha_t(j) &= \Big[ \sum_k \alpha_{t-1}(k) \, P(s_t = S_j \mid s_{t-1} = S_k) \Big] \, P(y_t = Y_{i_t} \mid s_t = S_j);
\end{aligned}
$$

$$
P(y_0 = Y_{i_0}, y_1 = Y_{i_1}, \ldots, y_T = Y_{i_T}) = \sum_j \alpha_T(j)
\tag{5.2}
$$
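The forward recursion takes only a few lines. In this Python sketch, states and observations are integer indices. For the harmony model these would stand for harmonic symbols and melody notes. The two-state model is invented for illustration. A brute-force sum over every state sequence is included to check the recursion on short inputs.

```python
import math
from itertools import product


def forward(obs, start, trans, emit):
    """alpha[t][j] = P(y_0..y_t, s_t = j).

    start[j] = P(s_0 = j); trans[k][j] = P(s_t = j | s_{t-1} = k);
    emit[j][y] = P(y_t = y | s_t = j).
    """
    n = len(start)
    alpha = [[start[j] * emit[j][obs[0]] for j in range(n)]]
    for y in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[k] * trans[k][j] for k in range(n)) * emit[j][y]
                      for j in range(n)])
    return alpha


def sequence_probability(obs, start, trans, emit):
    """P(y_0..y_T): the sum of the final forward variables."""
    return sum(forward(obs, start, trans, emit)[-1])


def brute_force_probability(obs, start, trans, emit):
    """The same quantity, by summing over every possible state sequence."""
    total = 0.0
    for path in product(range(len(start)), repeat=len(obs)):
        p = start[path[0]] * emit[path[0]][obs[0]]
        p *= math.prod(trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
                       for t in range(1, len(obs)))
        total += p
    return total


# Invented toy model: two hidden states, two observation symbols.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]
OBS = [0, 1, 0]
print(sequence_probability(OBS, START, TRANS, EMIT))     # about 0.10893
print(brute_force_probability(OBS, START, TRANS, EMIT))  # the same, up to rounding
```

For sequences as long as a chorale these products underflow quickly. Practical implementations rescale each $\alpha_t$ or work with log probabilities, and Rabiner (1989) discusses scaling.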

5.2 Hidden Markov Models as generative systems

We would like to be able to use Hidden Markov Models to generate new harmonisations, given information about chorale melodies. This generative system can also be described in terms of classification: we wish to label each time step in the melody with an appropriate harmony. We can achieve this by treating the melody notes as observed symbols, emitted by underlying harmonies. Finding an appropriate harmonisation is therefore a question of finding an appropriate state sequence to explain the observed events. Two possible approaches are described below: maximum a posteriori (MAP) estimation using the Viterbi algorithm, and sampling from the conditional probability of the states given the outputs.

5.2.1 Viterbi algorithm

If we have a sequence of events which we are viewing as outputs from a Hidden Markov Model, how can we find the most likely state sequence? We do not just want to find the most likely state at each time step. Choosing the local maximum at an individual time step might bring us into a state where the rest of the sequence has very low probability. Instead, we want to find the sequence which is globally most probable. Hidden Markov Models have the advantage that we can indeed easily find the globally most probable state sequence, using the Viterbi algorithm (Viterbi, 1967).

For a particular sequence of observed events $y_0 = Y_{i_0}, y_1 = Y_{i_1}, \ldots, y_T = Y_{i_T}$, we can define $\delta_t(j)$ as the maximum probability of any individual state sequence which produces the observed events $y_0$ to $y_t$ of the sequence and finishes in state $j$. We can then use induction to find the maximum probability of any individual state sequence which produces the entire observed sequence. So that we know which state sequence has this maximal probability, we instantiate variables $\psi_t(j)$ to record the states used along each partial maximum-probability path. We find the maximum-probability individual state sequence producing the observed events:

$$
\begin{aligned}
\delta_0(j) &= P(s_0 = S_j) \, P(y_0 = Y_{i_0} \mid s_0 = S_j); \qquad \psi_0(j) = 0; \\
\delta_t(j) &= \max_k \big[ \delta_{t-1}(k) \, P(s_t = S_j \mid s_{t-1} = S_k) \big] \, P(y_t = Y_{i_t} \mid s_t = S_j); \\
\psi_t(j) &= \operatorname*{argmax}_k \big[ \delta_{t-1}(k) \, P(s_t = S_j \mid s_{t-1} = S_k) \big]; \\
P^* &= \max_j \delta_T(j).
\end{aligned}
$$

Then we work backwards to extract the state sequence in question, $s_0 = S_{q_0}, s_1 = S_{q_1}, \ldots, s_T = S_{q_T}$:

$$
q_T = \operatorname*{argmax}_j \delta_T(j); \qquad q_t = \psi_{t+1}(q_{t+1}), \quad t = T-1, \ldots, 0.
$$
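Below is a direct Python transcription of the Viterbi recursion and back-trace. States and observations are integer indices, and the two-state model is invented for illustration. The list `back` stores $\psi_t(j)$ for $t \geq 1$.

```python
def viterbi(obs, start, trans, emit):
    """Most probable state sequence for obs, and its joint probability.

    start[j] = P(s_0 = j); trans[k][j] = P(s_t = j | s_{t-1} = k);
    emit[j][y] = P(y_t = y | s_t = j).
    """
    n = len(start)
    delta = [start[j] * emit[j][obs[0]] for j in range(n)]
    back = []  # back[t - 1][j] holds psi_t(j)
    for y in obs[1:]:
        psi, new_delta = [], []
        for j in range(n):
            best_k = max(range(n), key=lambda k: delta[k] * trans[k][j])
            psi.append(best_k)
            new_delta.append(delta[best_k] * trans[best_k][j] * emit[j][y])
        back.append(psi)
        delta = new_delta
    q = max(range(n), key=lambda j: delta[j])  # q_T = argmax_j delta_T(j)
    best_p = delta[q]
    path = [q]
    for psi in reversed(back):  # q_t = psi_{t+1}(q_{t+1})
        q = psi[q]
        path.append(q)
    path.reverse()
    return path, best_p


# Invented toy model: two hidden states, two observation symbols.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]
print(viterbi([0, 1, 0], START, TRANS, EMIT))  # path [0, 1, 0], probability about 0.046656
```

For long sequences the products should be replaced by sums of log probabilities. This leaves the max and argmax operations unchanged.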

5.2.2 Sampling

We can also generate random state sequences according to the probability distribution of our model. Recall that $\alpha_{t-1}(j)$ is the probability of seeing the observed events $y_0$ to $y_{t-1}$ of a sequence and finishing in state $j$. Using it, we can calculate the probability of seeing those events, finishing in any state, and then transitioning to state $k$ at the next step:

$$
P(y_0 = Y_{i_0}, y_1 = Y_{i_1}, \ldots, y_{t-1} = Y_{i_{t-1}}, s_t = S_k)
= \sum_j \alpha_{t-1}(j) \, P(s_t = S_k \mid s_{t-1} = S_j)
$$

We can use this to calculate $\rho_t(j \mid k)$. This is the probability that we are in state $S_j$ at time $t-1$, given the observed event sequence $y_0 = Y_{i_0}, y_1 = Y_{i_1}, \ldots, y_{t-1} = Y_{i_{t-1}}$ and given that we will be in state $S_k$ at time $t$:

$$
\rho_t(j \mid k)
= P(s_{t-1} = S_j \mid y_0 = Y_{i_0}, \ldots, y_{t-1} = Y_{i_{t-1}}, s_t = S_k)
= \frac{\alpha_{t-1}(j) \, P(s_t = S_k \mid s_{t-1} = S_j)}{\sum_l \alpha_{t-1}(l) \, P(s_t = S_k \mid s_{t-1} = S_l)}
$$

To instantiate a state sequence $s_0 = S_{v_0}, s_1 = S_{v_1}, \ldots, s_T = S_{v_T}$, we first choose the final state by sampling from its probability distribution according to our model:

$$
P(s_T = S_j \mid y_0 = Y_{i_0}, y_1 = Y_{i_1}, \ldots, y_T = Y_{i_T}) = \frac{\alpha_T(j)}{\sum_l \alpha_T(l)}
$$

Once we have chosen $v_T$, so that the final state is $s_T = S_{v_T}$, we can use the variables $\rho_t(j \mid k)$ to move back through the sequence:

$$
P(s_{t-1} = S_j \mid y_0 = Y_{i_0}, y_1 = Y_{i_1}, \ldots, y_T = Y_{i_T}, s_t = S_{v_t}) = \rho_t(j \mid v_t)
$$
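Backward sampling needs only the forward variables. In this Python sketch the forward pass is repeated so that the block stands alone. `random.Random.choices` normalises its weights, so the denominators of $\rho_t$ and of the final-state distribution never need to be computed explicitly. The toy model is invented for illustration.

```python
import random


def forward(obs, start, trans, emit):
    """alpha[t][j] = P(y_0..y_t, s_t = j)."""
    n = len(start)
    alpha = [[start[j] * emit[j][obs[0]] for j in range(n)]]
    for y in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[k] * trans[k][j] for k in range(n)) * emit[j][y]
                      for j in range(n)])
    return alpha


def sample_states(obs, start, trans, emit, rng=None):
    """Draw one state sequence s_0..s_T from P(states | observations)."""
    if rng is None:
        rng = random.Random()
    n = len(start)
    alpha = forward(obs, start, trans, emit)
    # Final state: P(s_T = j | y_0..y_T) is proportional to alpha_T(j).
    v = rng.choices(range(n), weights=alpha[-1])[0]
    path = [v]
    for t in range(len(obs) - 1, 0, -1):
        # rho_t(j | v) is proportional to alpha_{t-1}(j) * P(s_t = v | s_{t-1} = j).
        weights = [alpha[t - 1][j] * trans[j][v] for j in range(n)]
        v = rng.choices(range(n), weights=weights)[0]
        path.append(v)
    path.reverse()
    return path


START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]
rng = random.Random(0)
draws = [tuple(sample_states([0, 1, 0], START, TRANS, EMIT, rng)) for _ in range(10000)]
print(draws.count((0, 1, 0)) / len(draws))  # close to 0.046656 / 0.10893, about 0.43
```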

5.3 Application

In principle we would like to use the hidden states, transition probabilities and emission probabilities which give the best model of the data. Here we will train a Hidden Markov Model directly by taking data annotations as the hidden states and using maximum likelihood estimation to calculate the conditional probabilities we need.

For example, if we treat the harmonic symbols with which our data set is annotated as hidden states, we can find the best sequence of harmonic symbols according to our model by finding the Viterbi path. As well as being useful mathematically, it makes sense musically to think of harmonies as underlying a piece, and to think of the melody as being emitted at each time step from the underlying harmonic state.


To compare a Hidden Markov Model of the relationship between the melody and harmonic symbols with our earlier Markov models, we want to calculate the cross-entropy (negative average log likelihood per symbol). We need to calculate the probability of a test harmonisation $s_0 = S_{v_0}, s_1 = S_{v_1}, \ldots, s_T = S_{v_T}$ according to our model, given a test melody $y_0 = Y_{i_0}, y_1 = Y_{i_1}, \ldots, y_T = Y_{i_T}$:

$$
P(s_0 = S_{v_0}, \ldots, s_T = S_{v_T} \mid y_0 = Y_{i_0}, \ldots, y_T = Y_{i_T})
= \frac{P(s_0 = S_{v_0}, \ldots, s_T = S_{v_T}, y_0 = Y_{i_0}, \ldots, y_T = Y_{i_T})}{P(y_0 = Y_{i_0}, \ldots, y_T = Y_{i_T})}
$$

We can use equation 5.2 to find the probability of the event sequence according to our model. We can use equation 5.1 iteratively, starting from $P(s_0 = S_{v_0}) \, P(y_0 = Y_{i_0} \mid s_0 = S_{v_0})$, to find the joint probability of a state sequence and event sequence. Since $\log \frac{a}{b} = \log a - \log b$, we can find the log conditional probability of the harmonisation given the melody by subtracting the log probability of the melody from the log joint probability. The cross-entropy is the negated sum of these log conditional probabilities, divided by the number of observed symbols we have traversed.

I wrote a library of routines to train and use Hidden Markov Models, and used it to create these models from the chorale data training set. Functions from the ordinary Markov model library are used where possible. For example, the same function can be used for maximum likelihood estimation of transition probabilities with either model type.
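The conditional log-probability calculation can be sketched in Python as below, using base-2 logarithms; the base is an assumption. The melody probability is computed in log space by rescaling the forward variables at each step, which avoids underflow on chorale-length sequences. The scaling approach and the toy model are illustrative choices, not necessarily what the original library did. Every probability used must be non-zero, so in practice the maximum likelihood estimates would need smoothing.

```python
import math


def log_joint(states, obs, start, trans, emit):
    """log2 P(s_0..s_T, y_0..y_T), applying equation 5.1 step by step."""
    lp = math.log2(start[states[0]]) + math.log2(emit[states[0]][obs[0]])
    for t in range(1, len(obs)):
        lp += math.log2(trans[states[t - 1]][states[t]])
        lp += math.log2(emit[states[t]][obs[t]])
    return lp


def log_evidence(obs, start, trans, emit):
    """log2 P(y_0..y_T) by the forward recursion, rescaling alpha at every step."""
    n = len(start)
    alpha = [start[j] * emit[j][obs[0]] for j in range(n)]
    total = sum(alpha)
    log_p = math.log2(total)
    alpha = [a / total for a in alpha]
    for y in obs[1:]:
        alpha = [sum(alpha[k] * trans[k][j] for k in range(n)) * emit[j][y]
                 for j in range(n)]
        total = sum(alpha)
        log_p += math.log2(total)
        alpha = [a / total for a in alpha]
    return log_p


def conditional_cross_entropy(pairs, start, trans, emit):
    """Bits per symbol over (harmonisation, melody) pairs: -sum log2 P(states | obs) / count."""
    total, count = 0.0, 0
    for states, obs in pairs:
        total -= log_joint(states, obs, start, trans, emit) - log_evidence(obs, start, trans, emit)
        count += len(obs)
    return total / count


# Invented toy model: two hidden states, two observation symbols.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]
print(conditional_cross_entropy([([0, 1, 0], [0, 1, 0])], START, TRANS, EMIT))
```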

5.4 Results and discussion

We train Hidden Markov Models by maximum likelihood estimation from our training data, with melody notes as the observed events and the harmonic symbol annotations from our data set as the hidden states. These models give the cross-entropy values shown in table 5.1. These cross-entropy figures are higher than our results for smoothed Markov models, which suggests that the smoothed high-order Markov models provided a better model of the data. However, the real benefit of Hidden Markov Models is that the form of the model allows us to take the whole sequence into account when making predictions.

To demonstrate this we will generate sequences of harmonic symbols using two methods. The `local maxima' method takes the locally most likely outcome at each step, which is equivalent to generation using a simple Markov model. The `global maximum' method finds the globally most likely sequence, using the Viterbi algorithm. Comparing the cross-entropy values of the sequences generated by the two approaches shows how much overall sequence probability we gain by using a Hidden Markov Model and the Viterbi algorithm as described above.

Cross-entropy (major)   Cross-entropy (minor)

Table 5.1: Cross-entropies for Hidden Markov Models treating the melody notes as observed events and the harmonic symbols as hidden states

Method           Cross-entropy (major)   Cross-entropy (minor)
Local maxima     1.20
Global maximum   0.84

Table 5.2: Cross-entropies for sequences of harmonic symbols generated by taking the locally most likely outcome at each step (local maxima), and by finding the globally most likely sequence using the Viterbi algorithm (global maximum)

Table 5.2 shows the values we obtain for these two methods of sequence generation. The Viterbi algorithm clearly finds considerably more probable sequences than we obtain by taking the local probability maximum for each decision. It is quite normal for our generated sequences to have higher probabilities under our model than Bach's harmonisations, as they do here. In fact, this is always the case for the Viterbi path. It is by definition the most probable sequence, so its probability must be greater than or equal to the probability of the `true' sequence from the data set.
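The difference between the two generation methods shows up on a tiny invented model where the locally best first choice leads into a poor continuation. This Python sketch is illustrative only. `greedy` stands in for the `local maxima' method by choosing, at each step, the state that maximises the transition and emission probability given the previous choice. That is one reading of the method, not the original code. `viterbi` finds the global maximum.

```python
def joint_probability(path, obs, start, trans, emit):
    """P(s_0..s_T, y_0..y_T) for one given state path."""
    p = start[path[0]] * emit[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
    return p


def greedy(obs, start, trans, emit):
    """Local maxima: commit to the best-looking state at each step in turn."""
    n = len(start)
    path = [max(range(n), key=lambda j: start[j] * emit[j][obs[0]])]
    for y in obs[1:]:
        prev = path[-1]
        path.append(max(range(n), key=lambda j: trans[prev][j] * emit[j][y]))
    return path


def viterbi(obs, start, trans, emit):
    """Global maximum: the most probable whole state sequence."""
    n = len(start)
    delta = [start[j] * emit[j][obs[0]] for j in range(n)]
    back = []
    for y in obs[1:]:
        psi = [max(range(n), key=lambda k: delta[k] * trans[k][j]) for j in range(n)]
        delta = [delta[psi[j]] * trans[psi[j]][j] * emit[j][y] for j in range(n)]
        back.append(psi)
    path = [max(range(n), key=lambda j: delta[j])]
    for psi in reversed(back):
        path.append(psi[path[-1]])
    return path[::-1]


# Invented toy model: state 0 looks better at first, but state 1 explains the second note.
START = [0.5, 0.5]
TRANS = [[0.5, 0.5], [0.0, 1.0]]
EMIT = [[0.6, 0.1, 0.3], [0.4, 0.6, 0.0]]
OBS = [0, 1]
for method in (greedy, viterbi):
    path = method(OBS, START, TRANS, EMIT)
    print(method.__name__, path, joint_probability(path, OBS, START, TRANS, EMIT))
```

The greedy path has lower joint probability, and therefore higher cross-entropy, than the Viterbi path. Table 5.2 reports a difference in the same direction.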

It could turn out that our globally most likely sequences were boring, sticking too closely to the most likely states where the sequences in the data set strayed further afield. However, in almost all cases, sequences generated by taking the local maximum at each step will be more guilty of this. Looking for the global maximum enables us to `plan', taking a more unusual and therefore lower-probability path now if it will raise the overall probability. Even if our global maximum sequences are too probable, it is still better to use a Hidden Markov Model than an ordinary Markov model. We can use the sampling method described in section 5.2.2 to generate less likely sequences in an informed manner, still taking into account an entire state sequence rather than looking at single decisions independently.

Chapter 6

Final harmonisation model

This chapter describes a harmonisation model which takes into account the results of the previous chapters. We divide the task of harmonisation into three subtasks. Each subtask is discussed, and examples of chorale harmonisations generated by the overall model are given.

6.1 Building a harmonisation model

The previous harmonisation systems discussed in Chapter 2 suggest that we will be more successful if we divide the harmonisation task into multiple subtasks. Ideally we would conduct these subtasks in parallel, forming a `product of experts' (Hinton, 1999) which efficiently categorises the space of possible chorale harmonisations. However, under certain conditions we can justify conducting the subtasks one after the other, feeding the results of one subtask into the next.

This kind of serialisation of subtasks will only succeed if no subtask before the final one precludes any valid harmonisations, and if no subtask produces results that will finally generate an invalid harmonisation. We need to be able to reach all valid harmonisations in the state space, and we need to make sure we do not produce results at one stage which are useless later. For example, if our harmonisation subtask never generates certain harmonies, then these will never appear in the final harmonisations. Similarly, we need to ensure that our harmonisation subtask generates output from which a valid harmonisation can later be created, since not all sequences of harmonies will have allowable instantiations as notes. Of course, since we are working with probabilities rather than rules, the system will only judge different harmonisations as more or less probable, rather than as valid or invalid. This does not, however, affect the underlying issue.

6.2 The model

Following Hild et al. (1992), we will divide the task of harmonisation into three subtasks. First we will build a `harmonic skeleton', labelling each beat with a harmonic symbol. Secondly we will build a `chord skeleton' by filling in notes. These notes should fit with the harmonic symbols and also form coherent lines of music in themselves. Third and finally comes `ornamentation', as we fill in notes off the beat to improve each of the three additional lines of music we have added to harmonise the original melody.
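The serial arrangement of the three subtasks can be sketched as a simple composition of stages. This is an architectural sketch only. The three stage functions are hypothetical placeholders for the Hidden Markov Models described in the following subsections. The toy stand-ins at the bottom are invented so that the example runs: a constant tonic label, fixed intervals below the melody, and unornamented beats. Representing pitches as MIDI note numbers is also an assumption.

```python
from typing import Callable, Dict, List, Tuple

Chord = Tuple[int, int, int]  # assumed: (alto, tenor, bass) MIDI pitches for one beat
HarmonicStage = Callable[[List[int]], List[str]]
ChordStage = Callable[[List[str], List[int]], List[Chord]]
OrnamentStage = Callable[[List[int], List[str]], List[List[int]]]


def harmonise(melody: List[int], harmonic_stage: HarmonicStage,
              chord_stage: ChordStage, ornament_stage: OrnamentStage) -> Dict[str, list]:
    """Run the three subtasks in series, feeding each stage's output into the next."""
    skeleton = harmonic_stage(melody)              # subtask 1: one harmonic symbol per beat
    chords = chord_stage(skeleton, melody)         # subtask 2: three added notes per beat
    if not len(skeleton) == len(chords) == len(melody):
        raise ValueError("each stage must produce exactly one item per beat")
    result: Dict[str, list] = {"harmonic_skeleton": skeleton, "chords": chords}
    for i, voice in enumerate(("alto", "tenor", "bass")):
        line = [chord[i] for chord in chords]
        # Subtask 3: each added line is ornamented separately, four sixteenth-note steps per beat.
        result[voice] = ornament_stage(line, skeleton)
    return result


# Toy stand-ins so the sketch runs; a real system would decode each stage with an HMM.
def toy_harmonic_stage(melody: List[int]) -> List[str]:
    return ["T"] * len(melody)


def toy_chord_stage(skeleton: List[str], melody: List[int]) -> List[Chord]:
    return [(m - 5, m - 8, m - 12) for m in melody]


def toy_ornament_stage(line: List[int], skeleton: List[str]) -> List[List[int]]:
    return [[note] * 4 for note in line]


if __name__ == "__main__":
    out = harmonise([72, 71, 72], toy_harmonic_stage, toy_chord_stage, toy_ornament_stage)
    print(out["chords"][0], out["bass"][0])  # (67, 64, 60) [60, 60, 60, 60]
```

Because each stage sees only the previous stage's output, any harmony that the first stage never produces is unreachable downstream. This is the serialisation condition discussed in section 6.1.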

6.2.1 Harmonic skeleton

We have already discussed possible ways of finding an optimal sequence of harmonic states for a given melody in Chapters 4 and 5. We will use the Hidden Markov Model created in Chapter 5 to solve the first subtask in our harmonisation system.

By using this model we treat the notes of the melody as an observation sequence `emitted' by the hidden harmonic states. This makes sense in musical as well as mathematical terms. A harmonic symbol represents a set of possible chords, and we can view the melody note as one of many possible instantiations of the underlying harmonic state of the piece at that instant.

Some sequences of harmonic symbols would have no valid instantiation as chords. However, any prohibitions that we might list are short-term, affecting only adjacent harmonies. Therefore we can justify making our decision on a sequence of harmonic symbols a separate subtask, since our model ought to take this kind of prohibition into account when weighing up the probabilities of transitions between harmonic states.
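The sketch below shows one way the transition and emission probabilities of such a model could be obtained from labelled examples. It counts harmonic-symbol transitions and (harmonic symbol, melody note) pairs in training chorales, then applies add-λ (Lidstone) smoothing. This is one plausible estimation scheme, assumed for illustration; it is not necessarily the one used in Chapter 5. The symbol names `T`, `S` and `D` are also assumptions, as is the representation of melody notes as pitch classes relative to the key.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Table = Dict[Hashable, Dict[Hashable, float]]


def estimate_hmm(examples: Sequence[Tuple[List[str], List[int]]],
                 states: List[str], symbols: List[int], lam: float = 0.5) -> Tuple[Table, Table]:
    """Estimate transition and emission tables from (harmonic symbols, melody notes) pairs."""
    trans_counts: Counter = Counter()
    emit_counts: Counter = Counter()
    for harmonies, melody in examples:
        for prev, cur in zip(harmonies, harmonies[1:]):
            trans_counts[prev, cur] += 1          # harmonic state -> next harmonic state
        for harmony, note in zip(harmonies, melody):
            emit_counts[harmony, note] += 1       # harmonic state "emits" the melody note

    def smooth(counts: Counter, rows: List, cols: List) -> Table:
        # Add lam to every cell so that unseen events keep a small non-zero probability.
        table = {}
        for r in rows:
            total = sum(counts[r, c] for c in cols) + lam * len(cols)
            table[r] = {c: (counts[r, c] + lam) / total for c in cols}
        return table

    return smooth(trans_counts, states, states), smooth(emit_counts, states, symbols)


if __name__ == "__main__":
    examples = [(["T", "D", "T"], [0, 2, 0]), (["T", "S", "D", "T"], [4, 5, 7, 0])]
    trans, emit = estimate_hmm(examples, ["T", "S", "D"], list(range(12)))
    print(round(trans["T"]["D"], 4), emit["T"][0])  # 0.4286 0.35
```

Once estimated, these tables can be passed directly to a Viterbi decoder over the harmonic states.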

6.2.2 Chord skeleton

This subtask requires a note to be decided at each beat of the melody for each of the three voices added in our harmonisations. We will use another Hidden Markov Model here. The harmonic symbols decided by the previous subtask will now be treated as an observation sequence, and we will generate chords as a sequence of hidden states. This model aims to `recover' the fully filled-out chords for which the harmonic symbols are a shorthand.

To encode chords into hidden states for a Hidden Markov Model, we need to produce single symbols which represent the relationships between the four musical lines at a particular time step. We can do this by using an augmented harmonic symbol. It not only shows the harmonic category of a chord, but also represents the chord as a set of intervals from a bass note. Figure 6.1 shows an example of this encoding. Given a sequence of these chord symbols and a chorale melody line, we can unambiguously construct a sequence of full chords.

We can separate out producing a sequence of chords as a subtask, since from any valid sequence of harmonies we will be able to build a sequence of chords that will be valid input for the ornamentation stage. In fact, because we are using a Hidden Markov Model, this subtask might be able to work around problems in its input. The model can ignore input for which it finds no suitable chords, if doing so increases the overall probability of the state sequence given the input sequence.
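Here is a minimal sketch of this kind of encoding. It assumes that pitches are MIDI note numbers and that the augmented symbol stores the interval in semitones of each upper voice above the bass. The exact format used in the dissertation may differ, and `T` is a placeholder harmonic label. The soprano's interval above the bass is part of the symbol, so knowing the melody note fixes the bass, and hence every other voice.

```python
from typing import NamedTuple, Tuple

ChordSymbol = Tuple[str, Tuple[int, int, int]]


class Chord(NamedTuple):
    soprano: int
    alto: int
    tenor: int
    bass: int


def encode(chord: Chord, harmonic_label: str) -> ChordSymbol:
    """Augmented harmonic symbol: harmonic category plus intervals (semitones) above the bass."""
    b = chord.bass
    return harmonic_label, (chord.soprano - b, chord.alto - b, chord.tenor - b)


def decode(symbol: ChordSymbol, melody_note: int) -> Chord:
    """Rebuild the full chord from the symbol and the soprano (melody) pitch."""
    _, (soprano_int, alto_int, tenor_int) = symbol
    bass = melody_note - soprano_int
    return Chord(melody_note, bass + alto_int, bass + tenor_int, bass)


if __name__ == "__main__":
    c_major = Chord(soprano=76, alto=72, tenor=67, bass=48)   # E5, C5, G4, C3
    symbol = encode(c_major, "T")
    print(symbol)               # ('T', (28, 24, 19))
    print(decode(symbol, 78))   # Chord(soprano=78, alto=74, tenor=69, bass=50)
```

The intervals are kept as full semitone distances rather than reduced modulo the octave. This preserves the spacing of the voices, which is what makes reconstruction unambiguous. The same property means that the symbol says nothing about the absolute register of the chord.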

6.2.3 Ornamentation

This subtask allows additional notes to be added into the three musical lines we have created alongside the melody. For example, we may want to add in faster notes to lessen any large jumps in pitch. This `ornamentation' was created separately for each line of music.

Figure 6.1: Example of chord encoding

The data format, as described in an earlier chapter, resolves all rhythms onto sixteenth-note steps, with four of these steps making up each beat. `Ornamentation' was encoded by listing the interval from the first of these notes to the note played on each of the four steps. With this representation, any transposition of a musical phrase is encoded in the same way. A compound symbol was used as the observed state, made up of the notes on the current and next beats and the current harmonic symbol. The next note was included because the short notes that this subtask adds are meant to fill out the movement between the two notes that frame them.
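The ornamentation encoding can be sketched as follows, assuming MIDI pitch numbers and exactly four sixteenth-note steps per beat. `encode_ornament` lists the interval from the first step to each step, so transposed figures map to the same symbol. `observation` builds the compound observed symbol from the current and next beat notes and the current harmonic symbol. All function names here are hypothetical.

```python
from typing import List, Tuple

STEPS_PER_BEAT = 4  # sixteenth-note steps in one beat


def encode_ornament(steps: List[int]) -> Tuple[int, ...]:
    """Intervals (semitones) from the first step of the beat to each of its four steps."""
    if len(steps) != STEPS_PER_BEAT:
        raise ValueError("expected one pitch per sixteenth-note step")
    first = steps[0]
    return tuple(p - first for p in steps)


def decode_ornament(symbol: Tuple[int, ...], beat_note: int) -> List[int]:
    """Instantiate an ornament symbol starting from the chord-skeleton note of the beat."""
    return [beat_note + interval for interval in symbol]


def observation(current_note: int, next_note: int, harmonic_symbol: str) -> Tuple[int, int, str]:
    """Compound observed symbol: notes on the current and next beats plus the current harmony."""
    return current_note, next_note, harmonic_symbol


if __name__ == "__main__":
    # A quaver C4 followed by a quaver D4, filling in the step from C4 up to E4.
    print(encode_ornament([60, 60, 62, 62]))  # (0, 0, 2, 2)
    print(encode_ornament([62, 62, 64, 64]))  # (0, 0, 2, 2)
    print(observation(60, 64, "T"))           # (60, 64, 'T')
```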

While it could be argued that the faster notes which we add in ornamentation should be created along with the main notes in a line of music, treating all ornamentations of a chord as different would greatly increase the size of the model required. Not only would this slow down the calculation of the most likely state sequence exponentially, but it would also make our data proportionately more sparse. Similarly, it would be preferable for our model to take into account the interactions between ornamentation in different voices. However, the Hidden Markov Model we are using does not cope well with the sparse probability distribution we would then have to work with. The problems that might be caused by treating each voice independently are lessened because we include the current harmonic symbol in our model.

6.3 Results and discussion

Table 6.1 shows the cross-entropy for each subtask in the final model, measured using a held-out set of data. Since the `ornamentation' subtask used a separate model for each line of music, the separate cross-entropy values are shown as well as the overall value. The total cross-entropy, with all three subtasks working together, is also shown.

Subtask                  Cross-entropy
Harmonic skeleton        2.80
Chord skeleton           13.3
Ornamentation            16.0
    alto                 5.57
    tenor                …
    bass                 4.24
Total (all subtasks)     32.1

Table 6.1: Cross-entropies on test set 3, a held-out set, for each subtask

Comparing the cross-entropy values, we can see that the ornamentation is least predictable, then the chord skeleton, then the harmonic skeleton. There are many possible solutions for ornamentation, so we would not expect to be able to predict the actual ornamentation with great accuracy. Our model has only taken into account a few of the features which lead to the selection of a particular ornamentation. We can also see that by separating out the generation of a harmonic skeleton we have given ourselves a relatively predictable way of creating the framework for our harmonisations.
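The `Total' row of Table 6.1 is, to the precision shown, the sum of the three subtask rows: 2.80 + 13.3 + 16.0 = 32.1. This is what we would expect under two conditions. First, each cross-entropy is measured per beat of the same held-out chorales. Second, the probability of a complete harmonisation is the product of the probabilities assigned by the three subtasks, since the logarithm turns that product into a sum. The sketch below checks this with invented per-beat probabilities, assuming cross-entropy is measured in bits (base-2 logarithms).

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float]) -> float:
    """Negative log2 likelihood per symbol, in bits."""
    return -sum(math.log2(p) for p in probs) / len(probs)


if __name__ == "__main__":
    # Invented probabilities that two serial subtasks assign to the same two beats.
    stage_a = [0.5, 0.25]
    stage_b = [0.125, 0.5]
    joint = [a * b for a, b in zip(stage_a, stage_b)]
    print(cross_entropy(stage_a), cross_entropy(stage_b), cross_entropy(joint))  # 1.5 2.0 3.5
    print(round(2.80 + 13.3 + 16.0, 1))  # 32.1, the total in Table 6.1
```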

Figures 6.2 and 6.3 show typical output of the first subtask, which constructs a `harmonic skeleton'. We can see, for example, that the final notes of both melodies are all assigned the harmonic symbol `T'. This shows that they have been labelled as tonic chords, which will make the ends of the harmonisations sound complete.

Figure 6.2: Example harmonic skeleton: `Dank sei Gott in der Höhe', melody of chorale K54, BWV 287

Figure 6.4 shows a less successful harmonic skeleton: here the harmonies appear to be stuck in some unusual states. If several harmonic symbols are equally likely given the melody note, the Hidden Markov Model will pick the state sequence which contains the highest-probability transitions. This behaviour usually works well. Here, however, high-probability transitions between unusual states have led to a long sequence of such states being produced. To reduce this problem we could make our decisions conditional on additional features. For example, if we used a higher-order Hidden Markov Model, the longer context would allow the model to learn that certain harmonic symbols come at the beginning of a sequence leading back to more frequent harmonies. The first-order model we are using here only learns the immediate consequences of events.
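A standard way to obtain such a higher-order model, while keeping the same first-order decoding machinery, is to treat each pair of consecutive harmonic symbols as a single hidden state. The sketch assumes that a second-order transition table `p2[(a, b)][c]` = P(c | a, b) is available; how it is estimated is left open. Each pair state emits according to its most recent symbol. The symbol `X`, standing for an unusual harmony, is hypothetical.

```python
from itertools import product
from typing import Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]


def expand_second_order(states: List[Hashable],
                        p2: Dict[Pair, Dict[Hashable, float]]):
    """Rewrite P(c | a, b) as a first-order transition table over pair states (a, b) -> (b, c)."""
    pairs: List[Pair] = list(product(states, repeat=2))
    trans: Dict[Pair, Dict[Pair, float]] = {}
    for a, b in pairs:
        # Only pairs that start with the shared symbol b are reachable from (a, b).
        trans[a, b] = {(x, c): (p2[a, b][c] if x == b else 0.0) for x, c in pairs}
    return pairs, trans


def pair_emission(emit: Dict[Hashable, Dict[Hashable, float]], pair: Pair, obs: Hashable) -> float:
    """A pair state emits the melody note according to its most recent harmonic symbol."""
    return emit[pair[1]][obs]


if __name__ == "__main__":
    states = ["T", "D", "X"]
    p2 = {pair: {c: 1 / 3 for c in states} for pair in product(states, repeat=2)}
    # After two unusual harmonies in a row, strongly prefer returning to the tonic.
    p2["X", "X"] = {"T": 0.8, "D": 0.15, "X": 0.05}
    pairs, trans = expand_second_order(states, p2)
    print(len(pairs), trans["X", "X"]["X", "T"])  # 9 0.8
```

The price of this construction is that N harmonic symbols become N² pair states.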

Figure 6.3: Example harmonic skeleton: `Schaut, ihr Sünder! Ihr macht mir große Pein', melody of K303, BWV 408

Figure 6.4: Example harmonic skeleton showing problems: `Ach Gott, vom Himmel sieh' darein', melody of K6, BWV 77.6

Figure 6.5 shows example output from the second subtask, which constructs a `chord skeleton', for the same melody whose harmonic skeleton was shown in Figure 6.2. The model selects whole chords from its training data to instantiate the harmonic symbols decided by the first subtask. Even so, we can see that reasonable lines of music have been generated: the notes tend not to jump by excessive intervals, but to move in acceptable patterns. Since the model works with whole chords, the lines of music do not cross over or move too far apart.

Figure 6.5: Example chord skeleton: harmonisation of `Dank sei Gott in der Höhe', melody of chorale K54, BWV 287

Figure 6.6: Example chord skeleton: harmonisation of `Wie schön leuchtet der Morgenstern', melody of chorale K377, BWV 36.(2).4

Figure 6.6 gives another example, in which we can see problems with the lines of music staying outside their expected ranges. In the first bar both the alto and bass lines are unexpectedly low. This can happen because the encoding used in our model gives no opportunity for ranges to be learnt, since chords are considered only as transposable sets of intervals. To address this problem we would need to introduce some way for the probability of a chord to be judged according to the absolute pitches of its notes, not only the transposed intervals.

Figure 6.7 shows example output from the final `ornamentation' subtask, for the same melody whose harmonic skeleton was shown in Figure 6.2 and whose chord skeleton was shown in Figure 6.5. We can see that the combination of the three subtasks has resulted in a reasonable harmonisation. The added ornamentation makes the lines of music flow better, and makes the passage between different harmonies more graceful. Figure 6.8 shows an example with an unfortunate, jarring combination of ornamentation in bar 8 (the third bar on the second line). Our model makes no attempt to prevent unpleasant combinations of ornamentation being placed vertically together on the same beat; instead it relies on bad vertical combinations being heard as allowable passing dissonances. However, considering the simple way we model ornamentation, the results seem surprisingly good.

Figures 6.9, 6.10, and 6.11 show some example harmonisations generated by the system for melodies from the set of held-out data. Possible enhancements to the system, and other suggestions for future work, are discussed in Chapter 7.

6.4 Example audio files

Example audio files, as well as the source code of the programs used, can be downloaded from http://www.srcf.ucam.org/~mma29/2002/harmony/.

Figure 6.7: Example illustrating ornamentation: harmonisation of `Dank sei Gott in der Höhe', melody of chorale K54, BWV 287

Figure 6.8: Example illustrating ornamentation: harmonisation of `In allen meinen Taten', melody of chorale K211, BWV 367

Figure 6.9: Example harmonisation: `Erstanden ist der heilige Christ', melody of chorale K85, BWV 306

Figure 6.10: Example harmonisation: `Für deinen Thron tret' ich hiermit', melody of chorale K132, BWV 327

Figure 6.11: Example harmonisation: `Wir Christenleut'', melody of chorale K380, BWV 110.7

Chapter 7. Conclusions and future work

[...] according to the local context, and allows us to keep our models relatively small, since the short-term dependencies work together to provide a long-term probability distribution.

Perhaps most significantly, the system described in Chapter 6 shows that a pre-programmed knowledge base is not necessary for chorale harmonisation. The system was given some supervision, since the problem was divided into three subtasks. However, the models were not given any harmonic knowledge beyond that contained in their training data.

Our system is able to `plan' like the constraint-based systems described in Chapter 2, but here knowledge acquisition has been integrated into the overall task. Like previous sequence prediction systems, the system learns its `rules' by statistical observation of example harmonisations. The Hidden Markov Model, however, uses the probabilities it learns in a more considered fashion, taking whole sequences into account rather than making independent decisions on individual symbols.

7.2 Future work

Two simple enhancements to the system described in Chapter 6 would almost certainly increase its performance without too great a computational cost. First, we could make our decisions conditional on the position in the bar of music. Music theory recognises a hierarchy of stressed beats within the bar, and harmonic movement should correlate with these stresses. Secondly, we could take into account the probability of a note given a model of the distribution of pitches for each voice. Since our system only considers the intervals within chords, it does not take into account the natural range of each voice. Modelling pitch distributions in this way would introduce the concept of voice range as a statistical distribution, rather than enforcing it by adding predecided `knowledge' to the system. This enhancement would also lead to a more general improvement in the harmonisations produced. By preventing the bass note from descending too far, it would keep the musical lines closer together, which is more pleasing in itself. It would also lead to more interesting harmonies, as the chords seek solutions to the greater constraints on their movements.
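The second enhancement could take roughly the following form. A smoothed distribution over absolute pitches is estimated separately for each voice from the training chorales, and a candidate chord is scored by adding the log-probabilities of its voices' pitches. This is an illustrative sketch, not part of the system described in Chapter 6. The add-λ smoothing, the MIDI pitch range 0 to 127 and the example pitch lists are all assumptions. In a full system this score would be added to the chord skeleton model's log-probability for the same chord.

```python
import math
from collections import Counter
from typing import Dict, Iterable

MIDI_PITCHES = range(128)


def fit_pitch_model(pitches: Iterable[int], lam: float = 0.1) -> Dict[int, float]:
    """Add-lambda smoothed distribution over MIDI pitches for one voice."""
    counts = Counter(pitches)
    total = sum(counts.values()) + lam * len(MIDI_PITCHES)
    return {p: (counts[p] + lam) / total for p in MIDI_PITCHES}


def range_log_score(chord: Dict[str, int], models: Dict[str, Dict[int, float]]) -> float:
    """Sum of each voice's log-probability under that voice's pitch model."""
    return sum(math.log(models[voice][pitch]) for voice, pitch in chord.items())


if __name__ == "__main__":
    models = {
        "alto": fit_pitch_model([64, 65, 67, 69, 67, 64, 62]),
        "bass": fit_pitch_model([48, 50, 52, 53, 55, 48, 43, 45]),
    }
    usual = {"alto": 65, "bass": 48}
    too_low = {"alto": 55, "bass": 36}
    print(range_log_score(usual, models) > range_log_score(too_low, models))  # True
```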

The simple sequence prediction models described in Chapter 4 could be enhanced in various ways. More complex methods of smoothing different orders of model would be likely to improve their performance. `Probabilistic Suffix Automata' might offer an alternative solution (Ron et al., 1994). Probabilistic Suffix Automata have a variable memory length that depends on the context: a tree of conditioning events is grown during training. In music they could allow a model, for example, to learn to use a longer context to deal with cadences.
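The idea of a memory length that depends on the context can be illustrated with a simplified variable-order predictor. It counts continuations after every context up to a maximum length. When predicting, it backs off to the longest suffix of the history that has been seen often enough. This is a sketch in the spirit of Probabilistic Suffix Automata, not Ron et al.'s learning algorithm. That algorithm decides which longer contexts to add by comparing their next-symbol distributions with those of their shorter suffixes. `max_order`, `min_count` and the example sequences are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple

Context = Tuple[Hashable, ...]


def train_contexts(sequences: Sequence[Sequence[Hashable]],
                   max_order: int) -> Dict[Context, Counter]:
    """Count next-symbol frequencies after every context of length 0..max_order."""
    table: Dict[Context, Counter] = defaultdict(Counter)
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for k in range(min(i, max_order) + 1):
                table[tuple(seq[i - k:i])][symbol] += 1
    return table


def predict(table: Dict[Context, Counter], history: Sequence[Hashable],
            max_order: int, min_count: int = 2) -> Tuple[Context, Dict[Hashable, float]]:
    """Predict from the longest suffix of the history seen at least min_count times."""
    for k in range(min(max_order, len(history)), -1, -1):
        context = tuple(history[len(history) - k:])
        counts = table.get(context)
        if counts is not None and sum(counts.values()) >= min_count:
            total = sum(counts.values())
            return context, {s: c / total for s, c in counts.items()}
    return (), {}


if __name__ == "__main__":
    table = train_contexts([["T", "S", "D", "T"], ["T", "D", "T"], ["S", "D", "T"]], max_order=2)
    print(predict(table, ["T", "S", "D"], 2))  # (('S', 'D'), {'T': 1.0})
    print(predict(table, ["D", "T"], 2))       # (('T',), {'S': 0.5, 'D': 0.5})
```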

We saw that using harmonic symbols and melody notes together gave the best model. Using a model adapted to deal with several sources of information might increase performance further. We might, for example, use a multiple viewpoint system like Conklin and Witten (1995). Alternatively, a logarithmic opinion pool (Hinton, 1999) would allow us to model a high-dimensional space efficiently. With such a `product of experts' we might, for example, factorise the probability of a chord given the previous chord as a product of separately modelled probabilities. These would cover the individual notes following the notes in the previous chord, the intervals in the chord following the intervals in the previous chord, and other relationships.
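A logarithmic opinion pool of this kind multiplies the experts' probabilities for each candidate and renormalises over the candidates. In the sketch below the two experts and their numbers are invented. One stands for a model of the individual notes following the previous chord, and the other for a model of the chord's interval structure. Each gives a distribution over the same three hypothetical candidate chords `A`, `B` and `C`. The optional weights (exponents) are a common generalisation; Hinton's (1999) product of experts corresponds to all weights being equal to one.

```python
from typing import Dict, Hashable, List, Optional


def product_of_experts(experts: List[Dict[Hashable, float]],
                       weights: Optional[List[float]] = None) -> Dict[Hashable, float]:
    """Logarithmic opinion pool: renormalised (weighted) product of expert distributions."""
    if weights is None:
        weights = [1.0] * len(experts)
    candidates = set(experts[0]).intersection(*experts[1:])
    scores = {}
    for c in candidates:
        score = 1.0
        for expert, w in zip(experts, weights):
            score *= expert[c] ** w
        scores[c] = score
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}


if __name__ == "__main__":
    note_expert = {"A": 0.5, "B": 0.4, "C": 0.1}
    interval_expert = {"A": 0.2, "B": 0.6, "C": 0.2}
    pooled = product_of_experts([note_expert, interval_expert])
    print(max(pooled, key=pooled.get))  # B
```

The note expert alone prefers `A`. The pooled model prefers `B`, the candidate that both experts find reasonably likely, which is the characteristic behaviour of a product of experts.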

Ghahramani and Jordan (1996) and Saul and Jordan (1998) show how we can use mixture distributions to create factorial Hidden Markov Models which provide an efficient representation of large state spaces. However, such a model would nevertheless have a much higher time complexity than the models used in this dissertation. If we were ready to accept a larger model, we could directly extend our system to a higher order, using a longer context.
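The cost of a longer context can be made explicit with standard complexity results; these are not figures measured in this dissertation. Consider a first-order Hidden Markov Model with N hidden states and a sequence of length T. Viterbi decoding examines every pair of consecutive states at each step. An order-k model, rewritten as a first-order model over k-tuples of states, has N^k states, each with N possible successors:

```latex
\begin{align*}
  \text{first-order HMM:}\quad & O(T\,N^{2}) \\
  \text{order-}k\text{ HMM, expanded to } N^{k} \text{ tuple states:}\quad & O(T\,N^{k}\cdot N) = O(T\,N^{k+1})
\end{align*}
```

For a hypothetical vocabulary of N = 50 harmonic symbols, moving from first to second order raises the work per time step from 50^2 = 2,500 to 50^3 = 125,000 state transitions.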

Another approach that would allow us to make use of additional features for harmonisation would be to use a log-linear model. McCallum et al. (2000) proposed Maximum Entropy Markov Models, which are similar to Hidden Markov Models but use log-linear probability distributions. Lafferty et al. (2001) note that Maximum Entropy Markov Models suffer from `label bias', preferring hidden states which can only be followed by a few other states. They describe Conditional Random Fields, which do not suffer from this problem. Their results show that Maximum Entropy Markov Models can perform worse than standard Hidden Markov Models, while on the same data the Conditional Random Field performs a little better. Conditional Random Fields could allow us to create a better model of chorale harmonisation, taking account of more dependencies in our data, but it is not yet clear how they can be trained efficiently. Lafferty et al. used their optimal Maximum Entropy Markov Model parameters to initialise a Conditional Random Field model. Even so, the remaining Conditional Random Field training took ten times more iterations than the Maximum Entropy Markov Model training had taken in total.
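The label-bias problem arises because a Maximum Entropy Markov Model normalises its log-linear distribution separately for each previous state. The sketch below computes P(s_t | s_{t-1}, o_t) in this locally normalised way, using hypothetical binary features, weights and state names. Consider a state `X` that has only one permitted successor. The model assigns that successor probability 1 whatever the observation is, so the observation cannot count against the path. A Conditional Random Field avoids this by normalising over whole label sequences instead.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple

Feature = Tuple[Hashable, ...]


def features(prev: str, obs: int, cur: str) -> List[Feature]:
    """Hypothetical binary features: the transition pair and the (observation, state) pair."""
    return [("trans", prev, cur), ("emit", obs, cur)]


def memm_next_state_probs(prev: str, obs: int, successors: Dict[str, List[str]],
                          weights: Dict[Feature, float],
                          feature_fn: Callable[[str, int, str], List[Feature]] = features
                          ) -> Dict[str, float]:
    """Locally normalised log-linear distribution P(s_t | s_{t-1}, o_t) over allowed successors."""
    scores = {s: sum(weights.get(f, 0.0) for f in feature_fn(prev, obs, s))
              for s in successors[prev]}
    top = max(scores.values())                      # subtract the max for numerical stability
    exp_scores = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exp_scores.values())                    # normaliser depends only on (prev, obs)
    return {s: e / z for s, e in exp_scores.items()}


if __name__ == "__main__":
    successors = {"T": ["T", "S", "D"], "X": ["D"]}
    weights = {("emit", 7, "D"): 2.0, ("emit", 0, "T"): 2.0}
    print(memm_next_state_probs("X", 0, successors, weights))  # {'D': 1.0}
    print(memm_next_state_probs("X", 7, successors, weights))  # {'D': 1.0}
```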

All the models we have considered aim to generate reasonable harmonisations of the chorale melodies, without any further constraints of form. Bach's own harmonisations, however, reflect more specific intentions: for example, some are intentionally complex, and others intentionally simple. If we could quantify this kind of intent, then we could train aspects of our harmonisation system on more specific groups of examples. As long as we ensured that enough data was available to train each aspect of the system, the end results should be improved by using more specific, and therefore more homogeneous, training data in this way. As an alternative to labelling the chorales according to types, we could use a text categorisation approach (Yang and Liu, 1999) to label the chorales automatically according to their similarities and differences. A still more sophisticated harmonisation system would take into account the words of the chorales as it decided upon their harmonic movements, since Bach's own harmonisations are affected by the properties of the texts as well as of the melodies.

Appendix

Allocation of chorales to data sets

Training set (major):
205 177 257 281 117 266 375 258 351 237 301 362 292 361 313 19 159 69 110 245 218 75 234 285 78 130 64 304 219 128 356 119 104 158 360 89 86 275 364 239 97 338 279 294 286 363 3 357 162 335 293 270 314 242 276 236 290 282 269 67 88 109 76 378 306 359 23 118 353 339 268 238 352 247 241 77 92 212 246 233

Test set 1 (major):
4 98 291 13 150 254 134 179 103 365 334 220 215 366 146 123 261 296 80 272 99 51 277 12 298 61 90 139 175 154 47 193 135 8 127 315 17 100 228 163 108

Test set 2 (major):
309 155 54 211 30 24 243 46 101 74 235 107 131 189 136 147 20 82 312 273 181 227 208 389 271 115 240 221 244 377 259 176 203 1 267 263 256 284 262 329

Test set 3 (major):
91 354 87 230 209 153 376 341 114 151 58 85 178 278 152 62 68 14 355 132 340 255 66 323 60 113 358 192 231 280 214 289 204 9 283 129 202 295 194 102 81

Training set (minor):
106 18 141 160 217 206 53 311 388 37 195 39 125 171 333 156 33 373 232 10 96 5 31 63 65 223 342 308 48 249 45 253 229 83 22 387 372 200 174 274 251 93 196 32 382 332 168 310 34 172 343 287 164 49 248 328 148 379 70 2 145 226 305 161 28 327 142 57 381 41 222 190

Test set 1 (minor):
173 224 26 140 260 316 16 126 300 250 144 124 73 137 71 79 347 186 344 21 198 349 11 384 207 197 40 52 84 318 302 185 337 322 38 165

Test set 2 (minor):
303 187 6 15 182 350 325 169 252 143 42 43 265 348 371 25 367 170 330 183 345 55 370 324 35 383 331 116 7 374 368 167 122 94 111 120

Test set 3 (minor):
29 138 50 386 380 288 210 27 36 317 225 216 166 385 320 264 188 112 201 319 326 59 56 95 346 72 149 105 307 157 321 369 184 180 299 44

Bibliography

Bach, J. S. (1998). Chorale harmonisations, in a computer-readable edition. ftp://i11ftp.ira.uka.de/pub/neuro/dominik/midifiles/bach.zip.

Burke, S. M. (2000). MIDI::Simple. Available from the Comprehensive Perl Archive Network, http://www.perl.com/CPAN/.

Chen, S. F. and Goodman, J. (1998). An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Harvard University.

Conklin, D. and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. Journal of New Music Research, 24:51–73.

Fux, J. J. (1725). Gradus ad Parnassum. Vienna.

Ghahramani, Z. and Jordan, M. I. (1996). Factorial Hidden Markov Models. In Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E., editors, Advances in Neural Information Processing Systems, volume 8, pages 472–478. The MIT Press.

Hild, H., Feulner, J., and Menzel, W. (1992). HARMONET: A neural net for harmonizing chorales in the style of J. S. Bach. In Lippman, R., Moody, J., and Touretzky, D., editors, Advances in Neural Information Processing 4 (NIPS 4), pages 267–274. Morgan Kaufmann.

Hinton, G. E. (1999). Products of Experts. In Proceedings of the Ninth International Conference on Artificial Neural Networks (Proc ICANN 99), volume 1, pages 1–6.

Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press.

Katz, S. M. (1987). Estimation of probabilities from sparse data for the language model component of a speech recogniser. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-35:400–401.

Lafferty, J., McCallum, A., and Pereira, F. (2001). Conditional Random Fields: probabilistic models for segmenting and labeling sequence data. In Machine Learning: Proceedings of the Eighteenth International Conference on Machine Learning (ICML-2001), pages 282–289.

Lidstone, G. J. (1920). Note on the general case of the Bayes-Laplace formula for inductive or a posteriori probabilities. Transactions of the Faculty of Actuaries, 8:182–192.

Löthe, M. (2000). Knowledge-based composition of classical minuets by a computer. In Proceedings of the AISB Symposium on Artificial Intelligence and Cultural Creativity, Birmingham.

Markov, A. A. (1913). An example of statistical investigation in the text of `Eugene Onyegin' illustrating coupling of `tests' in chains. Proceedings of the Academy of Sciences, St. Petersburg, 7(VI):153–162.

McCallum, A., Freitag, D., and Pereira, F. (2000). Maximum Entropy Markov Models for information extraction and segmentation. In Machine Learning: Proceedings of the Seventeenth International Conference (ICML-2000), pages 591–598, Stanford, California.

McIntyre, R. A. (1994). Bach in a box: The evolution of four-part baroque harmony using the genetic algorithm. In Proceedings of the IEEE Conference on Evolutionary Computation.

Nivre, J. (2000). Sparse data and smoothing in statistical part-of-speech tagging. Journal of Quantitative Linguistics, 7(1):1–17.

Pachet, F. and Roy, P. (1995). Mixing constraints and objects: a case study in automatic harmonization. In TOOLS Europe '95, pages 119–126. Prentice-Hall.

Pachet, F. and Roy, P. (2001). Musical harmonization with constraints: A survey. Constraints, 6(1):7–19.

Phon-Amnuaisuk, S. and Wiggins, G. A. (1999). The four-part harmonisation problem: a comparison between genetic algorithms and a rule-based system. In Proceedings of the AISB'99 Symposium on Musical Creativity.

Ponsford, D., Wiggins, G., and Mellish, C. (1999). Statistical learning of harmonic movement. Journal of New Music Research.

Rabiner, L. R. (1989). A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–285.

Rameau, J.-P. (1722). Traité de l'Harmonie réduite à ses principes naturels. Paris.

Riemenschneider, A. (1941). 371 Harmonized Chorales and 69 Chorale Melodies with Figured Bass. G. Schirmer, Inc.

Ron, D., Singer, Y., and Tishby, N. (1994). The power of amnesia. In Advances in Neural Information Processing Systems, volume 6, pages 176–183. Morgan Kaufmann.

Saul, L. K. and Jordan, M. I. (1998). Mixed memory Markov models: decomposing complex stochastic processes as mixtures of simpler ones. Machine Learning, 37:75–87.

Schottstaedt, B. (1989). Automatic species counterpoint. Technical Report, Stanford University CCRMA.

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27:379–423 and 623–656.

Towsey, M., Brown, A., Wright, S., and Diederich, J. (2001). Towards melodic extension using genetic algorithms. Educational Technology and Society, 4(2).

Tsang, C. P. and Aitken, M. (1991). Harmonizing music as a discipline of constraint logic programming. In Proc. ICMC 1991, pages 61–64, Montreal.

Viterbi, A. J. (1967). Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13:260–267.

Yang, Y. and Liu, X. (1999). A re-examination of text categorization methods. In Proceedings of ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99), pages 42–49.
