QUANTIFYING PIANIST STYLE - AN INVESTIGATION OF
PERFORMER SPACE AND EXPRESSIVE GESTURES FROM AUDIO
RECORDINGS
Submitted in partial fulfillment of the requirements for the
Master of Music in Music Technology
in the Department of Music and Performing Arts Professions
The Steinhardt School
New York University
Advisor: Prof. Juan Pablo Bello
Cheng-i Wang
February 2013
© Copyright by Cheng-i Wang 2013
All Rights Reserved
Acknowledgement
I would like to express my great appreciation to Dr. Juan Bello, my thesis advisor, for
his valuable and constructive suggestions during my course of study in this program and
throughout the development of this thesis. His willingness to share his wisdom, knowledge
and time so generously has been very much appreciated. I would also like to thank Dr.
Agnieszka Roginska for her support, suggestions and guidance in keeping my progress on
the right track and on schedule. My grateful thanks are extended to Dr. Kenneth Peacock,
Dr. John Gilbert, Prof. Dafna Naphtali and Prof. Tom Beyer for their generous academic
advice. I would like to thank all the members of the MARL research group, who
constantly inspired and motivated me with constructive discussions and ideas. I would also
like to thank Mr. Justin Mathew, Mr. Andrew Madden and Mr. Donald Bosley for being
such positive influences and companions for the past two years. I would like to thank my
family for their support and understanding. Finally, I would like to thank Ms. Fanning Chi
for her unconditional support and encouragement throughout my study.
Contents

Acknowledgement

1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Outline

2 Approach
  2.1 The Dataset
    2.1.1 The CHARM Mazurka project
    2.1.2 Pre-processing
  2.2 Features
    2.2.1 Method
    2.2.2 Results
  2.3 Structures of Performers
    2.3.1 Method
    2.3.2 Results
  2.4 Feature Refinement
    2.4.1 Method
    2.4.2 Experiments
    2.4.3 Results

3 Discussion
  3.1 Discussion
    3.1.1 Dataset
    3.1.2 Approaches
      3.1.2.1 Beat-level Features
      3.1.2.2 Similarity Measurement
      3.1.2.3 Feature Refinement
    3.1.3 Results
  3.2 Future Work
  3.3 Conclusion

Bibliography
List of Tables

1.1 Topics under the study of music performance
1.2 Subareas of measurements of performance
1.3 Mechanisms for explaining music performance
1.4 Computational models of expressive music performance
2.1 Stats for the recordings with metadata
2.2 Beat-by-beat features
2.3 Query list for feature selection
2.4 Top five feature combinations by correct return rate for L2-norm
2.5 Top five feature combinations by correct return rate for cosine distance
2.6 Union set of performers
3.1 Section bar numbers - op.63 no.3
3.2 Comparison between Ashkenazy and Shebanova from a historical perspective
3.3 Comparison between Rubinstein and Milkina from a historical perspective
List of Figures

1.1 Illustration of the performer structure problem
1.2 Diagram of approaches
2.1 Visualization of metadata - Mazurka op.17 no.4 by Ashkenazy
2.2 Normalized dynamics of selected pairs of performers
2.3 Normalized 2nd order derivatives of duration of selected pairs of performers
2.4 Normalized Hall
2.5 Histogram of returns
2.6 Hmp, summarization matrix from mutual proximity
2.7 Normalized query/search returns using mutual proximity
2.8 Histogram of returns (mutual proximity)
2.9 Envelopes of normalized second order derivatives of duration
2.10 Reconstruction of normalized dynamics with fitted polynomials
2.11 Reconstruction of normalized 2nd order derivatives of duration with fitted polynomials
2.12 Comparison of reconstructed curves
2.13 Normalized Hmp-refined using mutual proximity
2.14 Histogram of returns (mutual proximity & feature refinement)
3.1 Comparison of {d̂, t̂″} against the score, section A1 of Chopin Op. 63 no.3
3.2 Comparison of {d̂, t̂″} against the score, section A2 of Chopin Op. 63 no.3
3.3 Comparison of {d̂, t̂″} against the score, section B1 of Chopin Op. 63 no.3
3.4 Comparison of {d̂, t̂″} against the score, section B2 of Chopin Op. 63 no.3
3.5 Comparison of {d̂, t̂″} against the score, section C of Chopin Op. 63 no.3
3.6 Comparison of {d̂, t̂″} against the score, section D of Chopin Op. 63 no.3
3.7 Difference between Aas and Arm in section C, op.63 no.3
3.8 Comparison of {d̂, t̂″} against the score, section A3 of Chopin Op. 63 no.3
3.9 Comparison of {d̂, t̂″} against the score, section A4 of Chopin Op. 63 no.3
3.10 Comparison of {d̂, t̂″} against the score, section A5 of Chopin Op. 63 no.3
3.11 Hierarchical clustering from Hmp
3.12 Normalized Hmp using {t̂, t̂′, t̂″}
3.13 Normalized Hmp using {t̂}
Chapter 1
Introduction
1.1 Background
Music, in the context of Western classical music, consists of three main human components: composers, performers and audiences. Music can also be viewed as an activity, and the corresponding components become composition, performance and listening (Sloboda, 1985). Composers carry out ideas through their compositions, and document these compositions in musical scores. The role of the performer is not just that of a transmitter relaying musical information from the composer to the audience, but that of an interpreter who is responsible for re-creating or creating the evolving structure of the music being played (Rink et al., 2011). Performances are then conveyed to audiences in live concerts or through recordings.
Whether a piece of music is appreciated in a live concert environment or through recordings, performers have a deep influence on what the music sounds like by means of expressive performance. Expressive performance refers to the phenomenon in which performers try to express the intrinsic affect of the composition by varying the aspects of performance under their control. Performances of one piece of music differ from performer to performer, and across different renditions of the same piece by one performer. This difference reflects the fact that each performer has their own way of realizing the composition, and that each realization by the same performer is different. Well-known performers are praised for their ability to execute their aesthetic interpretations with precision and elegance, and to differentiate themselves from other performers (Sloboda, 2000). It is also an established fact that, whether intended consciously by the performers or not, one of the effects of expressive performance is to convey the grouping structure of the composition. Moreover, experienced performers make greater use of expressive variations to enhance the communication of grouping structures than less experienced performers (Sloboda, 1983).
A conclusion can be drawn from the previous discussion: performers play a very important role in the communication of music from composers to audiences. Thus the study of music performance and performers is one of the keys to understanding the mechanisms of music. Studies related to music performance can be traced back to the 18th century (Gabrielsson, 2003). Empirical studies of music performance started around 1900 and focused on timing in musical performance. Seashore's textbook on music psychology, Psychology of Music (Seashore, 1967), can be seen as a summary of the research in the first half of the 20th century.
Table 1.1: Topics under the study of music performance
  Introduction
  Performance planning
  Sight reading
  Improvisation
  Feedback in performance
  Motor processes in performance
  Measurements of performance
  Models of music performance
  Psychological and social factors
  Performance evaluation

Table 1.2: Subareas of measurements of performance
  Timing & Dynamics
  Structure
  Tempo
  Ritardando
  Asynchronization
  Perceptual Effects
  Intonation & Vibrato
  Conductor
  Intention

In the second half of the 20th century, multiple topics under the study of performance began to be investigated; these topics are listed in Table 1.1 (Gabrielsson, 2003). The measurement of performance data was the dominant topic in performance research (Gabrielsson, 2003). A list of the subareas of performance measurement is provided in Table 1.2. Under the topic of measurements of performance, timing and dynamics are the most emphasized subareas, besides intonation and vibrato, which focus on the singing voice and the string family. Piano and keyboard instruments were the main focus in
the study of timing and measurement. Several studies conducted by Repp confirmed and revealed some of the tendencies and phenomena of music performance (Repp, 1990, 1995, 1996, 1997, 1998b,c,a). Some of the findings are listed below:
• Experts and graduate students have similar group-average timing patterns and individual consistency, but experts show much more individual expressivity.

• It is strongly suggested that “articulation” or “touch”, a performance attribute that is very hard to measure and define, also accounts for a large portion of expressive performance, besides timing and dynamics.

• An “average” performance, gathered across students' performances, received the highest score in aesthetic quality but was weak in individuality.

• Some timing and dynamics patterns were extracted by analyzing a collection of performances of Chopin's Etude in E Major, but only a few performers conformed to the patterns.
This research accumulated a large amount of measurement data, which naturally led to the development of models of performance. The study of models of music performance aims at finding rules or patterns that explain and summarize the phenomena extracted from the collected measurement data. Most of the research in the late 20th century considered combining different mechanisms to explain music performance. A list of mechanisms is provided in Table 1.3 (Gabrielsson, 2003). Details regarding the related research up to the 21st century can be found in (Gabrielsson, 2003).
Table 1.3: Mechanisms for explaining music performance
  Listener's learned expectation
  Psycho-acoustical/perceptual factors
  Motor constraints
  Musical structure

Table 1.4: Computational models of expressive music performance
  Model                        Description
  The KTH model                Rule-based system. Analysis-by-synthesis
  The Todd model               Direct link between structure and performance. Analysis-by-measurement
  The Mazzola model            Mathematical music theory. No empirical evaluation
  The machine learning model   Data-driven. Data mining techniques

Besides distinguishing research by the “mechanisms considered”, the “approaches used” provide another way to categorize research. From the beginning of the 21st century in particular, computational or quantitative models have been gaining more and more attention.
Computational models of expressive music performance embody mathematical models which define relationships between variables provided by measured data. Four prominent computational models are listed in Table 1.4 (Widmer and Goebl, 2004). The machine learning model takes advantage of the improvements in hardware power and algorithms developed over the past decade. IMP/ML@OFAI (the Intelligent Music Processing and Machine Learning Group of the Austrian Research Institute for Artificial Intelligence) conducted a series of investigations into the understanding and visualization of expressive piano performance with machine learning approaches (Dixon et al., 2002; Flossmann et al., 2009, 2010; Goebl et al., 2004; Grachten and Widmer, 2007; Grachten et al., 2008; Grachten and Widmer, 2011; Madsen and Widmer, 2006; Pampalk et al., 2003; Stamatatos and Widmer, 2002; Widmer, 2001; Widmer et al., 2003; Widmer and Goebl, 2004; Widmer and Zanon, 2004). Efforts were made to accomplish tasks such as automatic pianist recognition, expressive gesture visualization and artificial piano performance. Successful results were presented in the latter two areas, but no significant progress on automatic pianist recognition was reported.
Research on the relationship between audio recordings and musicology has also investigated the subject of pianist style in a quantitative fashion. The CHARM Mazurka project (Sapp, 2007, 2008) is a result of such investigation. A large number of recordings, together with metadata about performances of Chopin's mazurkas, were collected, and research motivated by finding relationships between performers was conducted.
1.2 Motivation
Computational models based on data-driven approaches make very few assumptions about the data, and extract patterns or information from the data objectively. The construction of an objective framework for expressive music performance based on data-driven approaches therefore provides a tool that has been absent in musical research. With such a framework, musicologists could verify historical and theoretical claims about music performance, and researchers in music cognition or perception could gain further understanding of topics such as expert performance, the mechanisms of interpretation, and expressive variation.
Data-driven approaches rely on the existence of a certain amount of data. In the context of studying expressive music performance, taking piano as an example, the data could either be MIDI recordings (Goebl et al., 2005) from expert performers or audio recordings of performances. MIDI recordings make it possible to analyze the synchronization between the two hands as well as articulation, two tasks that are very difficult given only audio recordings. Although MIDI recordings present themselves as a better choice than audio recordings in terms of the information carried, their availability is a serious problem for data-driven approaches: the amount of MIDI recordings is far smaller than that of audio recordings, making MIDI recordings unsuitable for data-driven performance analysis.
A large part of previous work in computational modeling of expressive music performance has focused on forging low-level representations out of surface-level musical events, such as timings and dynamics (Sapp, 2007; Widmer et al., 2003). The relationship between the surface and its underlying rendering mechanism is a dynamic process which evolves with time (Rink et al., 2011). It has been reported that expert performers usually perform with individuality and do not conform to the average performance (Sloboda, 2000). This finding agrees with the common perception that performers have their own “style” of comprehending, interpreting and performing music. The goal of this thesis is to devise a framework for verifying how this phenomenon reflects itself in audio recordings, from the perspective of signal processing and information retrieval. The goal is two-fold: how recordings are structured with performers as labels, and which features are piece-invariant.
Figure 1.1: Illustration of the performer structure problem
Structure of Performers
Since there is neither any evidence showing that each performer is individually distinguishable, nor any ground truth about how performers should be classified, the structure of how performers relate to each other in the space defined by features extracted from performance measurements should be investigated in an unsupervised manner. This structural problem can be explained using Figure 1.1. In the three subplots of Figure 1.1, every circle represents a different performer and the assigned color represents the 'true' grouping of the performers. Each subplot illustrates one possible realization of the grouping of performers out of many possibilities, and the true distribution is unknown to observers. In other words, the assumption that each performer has their own individual style, and can hence be separated individually in some feature space, should be challenged (Stamatatos and Widmer, 2002; Widmer and Zanon, 2004). This thesis aims at proposing a framework for investigating this issue.
Piece-invariant Features
Performances of different pieces by the same performer cannot be grouped together directly, since the feature sequences representing each performance have different lengths for different pieces. In order to tackle this issue and facilitate the understanding of music performance, piece-invariant features have to be devised. Piece-invariant features should reflect the characteristics of the performers instead of the pieces themselves. To the best of the author's knowledge, there has not yet been any success in devising such features.
1.3 Outline
The workflow is as follows: first, each performance is represented by a vector constructed from performance measurements in the dataset; then a nearest neighbor search is applied to each piece, given a set of queries among the vectors representing performances. A similarity measurement matrix is then constructed by aggregating the normalized query/return counts for each piece. Pairs of query/return in the similarity measurement matrix with high values are selected for feature refinement. The goal of feature refinement is to find characteristics separating groups of performers, and then to design hand-crafted features based on those characteristics. This feature refinement step can be thought of as an iterative process for improving the features representing performances; in this thesis only one iteration of the refinement process is studied. The results of the similarity measurement and the feature refinement are evaluated with musical interpretations.

Figure 1.2: Diagram of approaches

A diagram of the framework is provided in Figure 1.2. The outline of this thesis is as follows: Chapter 2 describes the dataset and the approaches. Discussions, evaluations and future work are covered in Chapter 3.
Chapter 2
Approach
2.1 The Dataset
For the study of expressive performance analysis, the piano is considered a suitable instrument, since the performance attributes that a pianist can control are relatively simple and more easily quantified than those of the brass, wind and string families (Widmer et al., 2003). There is also a large number of piano recordings available, which makes the instrument suitable for data-driven approaches. The advantage of studying expressive performance with Chopin's mazurkas lies in their structure: borrowed from folk models, it makes the mazurkas well structured and repetitive (Rink et al., 2011). As a result, comparing different parts of a piece becomes an easier task.
2.1.1 The CHARM Mazurka project
The dataset is an archive of recordings and metadata of Chopin's mazurkas from the Mazurka Project (http://mazurka.org.uk) conducted at CHARM (the Centre for the History and Analysis of Recorded Music, UK) (Sapp, 2007). The collection has 2,919 recorded performances of 49 mazurkas. Each mazurka has, on average, over 50 renditions by different performers. The recordings collected in the dataset span the period from 1902 to 2006. A complete discography of the recordings can be accessed from the project website.
Besides the collection of recordings, metadata come with the dataset. Two kinds of metadata are available for part of the recordings: the duration in seconds and the dynamics at each beat location. The beat locations are annotated in a semi-automated fashion, and the dynamics are then calculated given these locations. Details of the approach can be found in (Sapp, 2007). From the metadata, the durations and tempos of each beat, and the average values across different performances of the same mazurka, are then derived. The plot in Figure 2.1 shows the beat-level metadata provided in the dataset.

Figure 2.1: Visualization of metadata - Mazurka op.17 no.4 by Ashkenazy
2.1.2 Pre-processing
In order to investigate the quantitative relationships between performances, only recordings with metadata attached are included in the analysis. Since not every recording with beat locations has dynamics information, additional dynamics information is calculated in the same manner mentioned in (Sapp, 2007). In Table 2.1, the numbers in parentheses represent the quantity of dynamics entries added during this research.

Table 2.1: Stats for the recordings with metadata
              # of performances   # of beats within the piece
  Op.17 No.4  62 (32)             396
  Op.24 No.2  64                  360
  Op.30 No.2  33                  192
  Op.63 No.3  82                  228
  Op.68 No.3  49 (27)             180
  Total #     290                 81444
(Numbers in parentheses represent additional entries by this thesis.)

Table 2.2: Beat-by-beat features
            Without normalization   With normalization
  Duration  t, t′, t″               t̂, t̂′, t̂″
  Dynamics                          d̂, d̂′, d̂″
Given the beat-by-beat durations and dynamics of the mazurka performances, basic pre-processing is applied to expand the raw beat-level information: derivatives and normalization are applied to the raw data to expand the number of beat-level features. Duration values are denoted by t and dynamics by d. First and second order derivatives are denoted by ′ and ″. Normalization to zero mean and unit standard deviation, denoted by ˆ, is applied across each performance. Table 2.2 lists the features used. It should be noted that, since the recordings span more than a century, the dynamics data are always normalized, due to the varying dynamic range of recording technology across this period.
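As a sketch of this expansion (the function names and the padding convention are illustrative assumptions, not taken from the thesis code), the derivatives and the normalization can be computed as follows:

    import numpy as np

    def expand(x):
        """Given a per-beat sequence x (durations t or dynamics d), return
        (x, x', x'') using discrete differences, padded with the first value
        so that all three sequences keep the original length."""
        x = np.asarray(x, dtype=float)
        d1 = np.diff(x, n=1, prepend=x[0])
        d2 = np.diff(x, n=2, prepend=[x[0], x[0]])
        return x, d1, d2

    def normalize(x):
        """Zero mean and unit standard deviation: the hat operator of Table 2.2."""
        return (x - np.mean(x)) / np.std(x)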
Since the metadata are stored in Excel files, a Python module was implemented to access, parse and process the metadata from them. The module is available at https://bitbucket.org/ciwang/exper.
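As an illustration, a loader along the following lines could be used; the column names 'duration' and 'dynamics' are a hypothetical simplification, since the layout of the actual spreadsheets may differ.

    import pandas as pd

    def load_beats(path, sheet=0):
        """Read one performance's beat-level metadata from an Excel sheet."""
        df = pd.read_excel(path, sheet_name=sheet)
        return df['duration'].to_numpy(), df['dynamics'].to_numpy()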
2.2 Features
In order to analyze each performance computationally, a numerical representation appropriate for representing a performance must be devised before the analysis is carried out.
2.2.1 Method
With the beat-level measurements extracted from the dataset, it is still unknown which combination of measurements forms the best feature vector to represent a performance. In order to filter out a combination to start working with, an exhaustive search over the space of measurement combinations is implemented, together with a criterion for ranking the combinations. Similar to the evaluation conducted in (Sapp, 2008), since the goal is to devise a feature set that can quantify a pianist's style, one way to devise the criterion is to treat the feature sequence of each performance as a point in a high-dimensional space and check whether performances by the same performer are closer to each other than to those by others. A list of query recordings, shown in Table 2.3, is constructed by choosing performances recorded more than once by the same performer in each of the five mazurkas. Each performance listed in Table 2.3 is used as a query, and a nearest neighbor search returns the three nearest performances.
Table 2.3: Query list for feature selection
  Op. 17, No. 4: Rubinstein1939, Rubinstein1952, Rubinstein1966, Horowitz1971,
    Horowitz1985, Czerny1949, Czerny1949b, Rosenthal1930, Rosenthal1931,
    Rosenthal1931b, Rosenthal1931c, Rosenthal1931d, Uninsky1932, Uninsky1971,
    Zak1937, Zak1951
  Op. 24, No. 2: Rubinstein1939, Rubinstein1952, Rubinstein1966, Richter1960,
    Richter1961, Garcia2007, Garcia2007b
  Op. 30, No. 2: Rubinstein1939, Rubinstein1952, Rubinstein1966
  Op. 63, No. 3: Rubinstein1939, Rubinstein1952, Rubinstein1966, Tsong1993, Tsong2005
  Op. 68, No. 3: Rubinstein1938, Rubinstein1952, Rubinstein1966, Friedman1923, Friedman1930
Two metrics, the L2-norm and cosine distance, were used. The query itself is excluded during the search. If any one of the returns is by the same performer as the query, the query is considered correct; otherwise it is incorrect. If the piece has N beats and the feature vector consists of M measurements, then each performance is a vector of length N × M. An exhaustive search over all the combinations of features in Table 2.2 is conducted. For each measurement combination, the average correct return rate across all queries is calculated, and the combination with the highest correct rate is chosen.
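The evaluation loop can be sketched as follows; the data structures (a dict of per-measurement arrays, a performer label per recording, and the query indices of Table 2.3) are assumptions for illustration, not the thesis code.

    from itertools import combinations
    import numpy as np

    def best_combination(features, performers, queries, k=3):
        """features: dict mapping measurement name -> (n_recordings, n_beats) array.
        Scores every non-empty measurement subset by the rate at which a k-nearest-
        neighbor search returns a recording by the same performer."""
        best_rate, best_combo = 0.0, None
        for r in range(1, len(features) + 1):
            for combo in combinations(features, r):
                X = np.hstack([features[m] for m in combo])  # rows of length N*M
                hits = 0
                for q in queries:
                    dist = np.linalg.norm(X - X[q], axis=1)  # L2-norm to all others
                    dist[q] = np.inf                         # exclude the query itself
                    nearest = np.argsort(dist)[:k]
                    hits += any(performers[i] == performers[q] for i in nearest)
                rate = hits / len(queries)
                if rate > best_rate:
                    best_rate, best_combo = rate, combo
        return best_rate, best_combo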
Table 2.4: Top five feature combinations by correct return rate for L2-norm
  Feature combination   Correct return rate
  d̂, t̂″                 97%
  d̂, t̂′, t″             94%
  d̂, t̂                  91%
  d̂, t̂′                 91%
  d̂, d̂′, t̂              88%

Table 2.5: Top five feature combinations by correct return rate for cosine distance
  Feature combination   Correct return rate
  d̂, t̂″                 97%
  d̂′, t̂, t″             94%
  d̂′, d̂′, t̂′            94%
  d̂, t′, t̂′             92%
  t̂, d̂′, t̂″             88%
2.2.2 Results
The top five results for both metrics are listed in Table 2.4 and Table 2.5. Multiple combinations return the same percentage, but only the combinations with the fewest features are kept in the results. The combination of normalized dynamics and normalized second derivatives of duration gives the best correct return rate. The implications and musical explanation of this feature combination are discussed in Chapter 3. The result is then fed to the next stage to form distance matrices. In Figure 2.2 and Figure 2.3, performances are plotted with the resulting feature set {d̂, t̂″}.
Figure 2.2: Normalized dynamics of selected pairs of performers
Figure 2.3: Normalized 2nd order derivatives of duration of selected pairs of performers
2.3 Structures of Performers
Since each piece has a different number of beats, a straightforward comparison between performance feature sequences of different pieces is unachievable. The relationships between performers within each piece are also unknown. An approach to tackle these two problems is proposed in the following.
2.3.1 Method
With the feature set {d̂, t̂″} from Section 2.2, another query/search task can be used to derive a similarity measurement between pairs of performers. Only the union of the performers of the five mazurkas is considered at this stage, in order to obtain a more general structure across the different pieces. The union set S, with 19 performers, is shown in Table 2.6. The steps are as follows:

1. For each piece, a 19 × 19 matrix Hm is created, with m ∈ {17, 24, 30, 63, 68} representing each piece by its opus number.

2. For each piece m, each performer si in S is used as a query to return the three nearest neighbors sj, i ≠ j, from S using the L2-norm (cosine distance yields almost the same returns). Each of the three cells (i, j) in Hm is then incremented by 1.
Table 2.6: Union set of performers
                Op.17 No.4  Op.24 No.2  Op.30 No.2  Op.63 No.3  Op.68 No.3  Total Counts
  Ashkenazy     1           1           1           1           1           5
  Biret         1           1           1           1           1           5
  Brailowsky    1           1           1           1           1           5
  Chiu          1           1           1           1           1           5
  Cortot        1           1           1           1           1           5
  Fliere        1           1           1           1           1           5
  Francois      1           1           1           1           1           5
  Hatto         1           2           1           2           2           8
  Indjic        1           1           1           1           1           5
  Luisada       1           2           1           1           1           6
  Lushtak       1           1           1           1           1           5
  Magaloff      1           1           1           1           1           5
  Milkina       1           1           1           1           1           5
  Mohovich      1           1           1           1           1           5
  Rangell       1           1           1           1           1           5
  Rubinstein    3           3           3           3           3           15
  Shebanova     1           1           1           1           1           5
  Smith         1           1           1           1           1           5
  Uninsky       1           1           1           2           1           6
  Total Counts  21          23          21          23          22          110
(The number in each cell is the number of performances of each piece by each performer.)
3. A summarizing matrix Hall is obtained by summing all the Hm matrices together; each row in Hall (representing a query and all its returns) is then normalized to unit sum, to balance the differing query counts between performers. The value in cell (i, j) of Hall can be viewed as a (non-symmetric) similarity measure between performer i and performer j.
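A sketch of steps 1-3 is given below; for brevity it assumes one feature matrix per piece with rows ordered by the performers in S (in the actual data some performers contribute several recordings per piece, which the bookkeeping would have to account for).

    import numpy as np

    def summarize_pieces(pieces, n=19, k=3):
        """pieces: list of (n, N*M) feature arrays, one per mazurka."""
        H_all = np.zeros((n, n))
        for X in pieces:
            D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise L2
            np.fill_diagonal(D, np.inf)             # never return the query itself
            for i in range(n):
                for j in np.argsort(D[i])[:k]:      # three nearest neighbors
                    H_all[i, j] += 1                # accumulate H_m into H_all
        return H_all / H_all.sum(axis=1, keepdims=True)  # rows to unit sum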
Hall and the histogram of returns by different performers are shown in Figure 2.4 and Figure 2.5. Observe in Figure 2.4 and Figure 2.5 that some performers dominate the return results in certain mazurkas, such as Milkina in op. 17 no. 4, Uninsky in op. 24 no. 2, and Biret in op. 30 no. 2. Milkina has the highest return counts in op. 17 no. 4, op. 63 no. 3, op. 68 no. 3, and in total. The non-uniform distribution in the return histogram suggests that the 'hubness' of the feature space is relatively strong. Hubness refers to the phenomenon in which, in a high-dimensional space, some points are close to every other point in the space because of the high dimensionality and the features used to construct the space, not because of the characteristics of the points themselves (Flexer et al., 2012). This is a well-known issue in music-similarity applications such as recommendation and search. In the context of this thesis, it is possible that some specific performers possess a style that is similar, or in a sense close, to the average of most of the performers. But since there is no evidence supporting this point of view, a remedy to ease the 'hubness' of the feature space should be applied, to see whether it improves the outcome of the query/search task.
Figure 2.4: Normalized Hall
Figure 2.5: Histogram of returns
The word 'improve' here means that the ideal distribution of the return-count histogram would be uniform, yet peaks still appear in Hall. Mutual proximity (Schnitzer et al., 2011) transforms the distance between a point x and every other point yi in the space into the probability that yi is the nearest neighbor of x, denoted by P(yi → x), by assuming that the distribution of the distances from yi to x is Gaussian. The distance between points x and y is then recalculated as P(y → x) × P(x → y). It has been shown that replacing the original distance matrix with mutual proximity eases the hubness of the feature space (Schnitzer et al., 2011; Flexer et al., 2012). Since the mutual proximities are themselves values with probabilistic properties, the query/search step can be skipped, and the five per-piece mutual proximity matrices can be multiplied element-wise (equivalently, their logarithms summed) to produce a 19 × 19 summarization matrix of pair-wise similarity measurements. In order to apply mutual proximity, steps 2 and 3 are replaced by the following steps; the resulting summarization matrix is denoted by Hmp:

• For each piece m, calculate the 19 × 19 L2-norm pairwise distance matrix Dm, then transform Dm from L2-norm into the mutual proximity matrix Dmp.

• Calculate Hmp as Hmp(i, j) = Σ_{n ∈ m} log Dnp(i, j), where n runs over the five pieces.
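A minimal sketch of this transform under the Gaussian assumption of Schnitzer et al. (2011) is given below; the function name and the aggregation over pieces are illustrative, and edge cases such as the zero diagonal are ignored.

    import numpy as np
    from scipy.stats import norm

    def mutual_proximity(D):
        """Replace each distance D[i, j] with P(j -> i) * P(i -> j), modeling
        the distances from each point as a Gaussian (row-wise mean and std)."""
        mu, sd = D.mean(axis=1), D.std(axis=1)
        # sf[i, j] = probability that a random point lies farther from i than j does
        sf = norm.sf(D, loc=mu[:, None], scale=sd[:, None])
        return sf * sf.T

    # H_mp(i, j) = sum over the five pieces of log D_mp(i, j); `piece_distances`
    # stands for a hypothetical list of the five per-piece L2-norm distance matrices:
    # H_mp = sum(np.log(mutual_proximity(D_m)) for D_m in piece_distances)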
2.3.2 Results
Hmp is plotted in Figure 2.6. Query/search results using mutual proximity are shown in Figure 2.7 and Figure 2.8.
Figure 2.6: Hmp, summarization matrix from mutual proximity
Figure 2.7: Normalized query/search returns using mutual proximity
Figure 2.8: Histogram of returns (mutual proximity)
Comparing them with Figure 2.4 and Figure 2.5, it is clear that the distribution of returns becomes flatter after applying mutual proximity, while both Hmp and the normalized query/return results retain their peaks with greater clarity.
2.4 Feature Refinement
To further investigate how to derive piece-invariant features, the characteristics that differentiate performer groups from each other are examined. The results of this examination are used to refine hand-crafted features.
2.4.1 Method
The goal of feature refinement is to find features capable of separating performers but not pieces. By observing the similarity matrix in Figure 2.7, pairs of query/return showing high similarity with each other are selected as groups of performers to be examined. The criterion of the examination is to find characteristics that separate group from group. Feature refinement is then conducted to achieve a more compact representation of performances.
2.4.2 Experiments
Two pairs of performers are selected as groups to be examined and compared. The first pair is Rubinstein and Milkina and the second is Ashkenazy and Shebanova; they will be referred to as Arm and Aas respectively. Aas shows symmetric peaks in Hmp, which means that each performer appeared as a return multiple times given the other performer as the query. Although the similarity between Rubinstein and Milkina is not symmetric, the pair is chosen because of its strong similarity values given Rubinstein as the query and Milkina as the return.
Refinement based on d̂
Observing the dynamics curves in Figure 2.2, it is obvious that the shapes of d̂ are similar within pairs of performers but dissimilar between pairs. To model these shapes, the dynamics curves are first segmented into sections according to a form analysis of the respective score. Each section is then modeled by polynomial fitting, and the coefficients of the fitted polynomials of each section are taken as the new representation of d̂.
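The section-wise fit can be sketched as follows; 'sections' stands for the (start, end) beat indices obtained from the form analysis (cf. Table 3.1), and the names are illustrative.

    import numpy as np

    def fit_sections(curve, sections, order=1):
        """Fit a polynomial of the given order to each section of a beat-level
        curve and return the concatenated coefficients as the refined feature."""
        coeffs = []
        for start, end in sections:
            x = np.arange(end - start)
            coeffs.append(np.polyfit(x, curve[start:end], order))
        return np.concatenate(coeffs)  # order 1 gives (a, b) per section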
Refinement based on t̂″
For t̂″, the initial observation of Figure 2.3 suggested that the curves exhibit semi-oscillating behavior. It was determined, however, that although the curves do exhibit semi-oscillating behavior, this is not the factor that separates groups of performers from each other. Further investigation of Figure 2.3 suggested that it is the envelope of the curves (Figure 2.9) that separates groups of performers from each other. The envelopes are obtained by full-wave rectification followed by a moving average of window length 3. To model the shape of the envelope, the same approach as for d̂ is adopted; the resulting coefficients of the fitted polynomials of each section become the new representation of t̂″.
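The envelope computation described above amounts to the following (window length 3 as in the text; names are illustrative):

    import numpy as np

    def envelope(t2, win=3):
        """Full-wave rectification followed by a moving average of length win."""
        return np.convolve(np.abs(t2), np.ones(win) / win, mode='same')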
Figure 2.9: Envelopes of normalized second order derivatives of duration
Coefficients of fitted polynomial as features
For the polynomial fitting, an appropriate polynomial order has to be chosen. To choose it, the 1st through 8th orders are tested using the same criterion used for choosing the feature combination in Section 2.2. The best result is 89%, with 1st order for both d̂ and t̂″, which means that the fitted polynomials for each section of d̂ and t̂″ have the form y = ax + b. The reconstructed curves (actually straight lines with slope a and offset b) are plotted in Figure 2.10 and Figure 2.11. A plot comparing the reconstructed curves of Arm and Aas is shown in Figure 2.12.
A further question about the refined feature is: to what extent does the sequential relationship between sections influence the ability to separate groups of performers? If the sequential relationship can be ignored, then it is possible to design piece-invariant features by summarizing over the sequential features. To summarize the refined features without taking sequential relationships into account, the average and standard deviation of the coefficients of each order across the sections are calculated. The resulting feature for each performance is a vector of length (# of orders + 1) × 2 (mean & standard deviation) × 2 (d̂ & t̂″). All 64 combinations of polynomial fitting orders for d̂ and t̂″ from 1 to 8 are tested, using the same criterion as in Section 2.2.1. (The metric used is the L2-norm; since summarizing with the means and standard deviations of the coefficients implies modeling the coefficients with a mixture of Gaussians, an altered KL-divergence measuring the similarity between two mixtures was also tried, but yielded a correct rate of only 11%.) The best combination of orders is still 1st for both d̂ and t̂″, with a correct rate of 44%.
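The order-independent summary can be sketched as follows, assuming the per-section coefficients from the fitting step are stacked into (n_sections, order + 1) arrays:

    import numpy as np

    def summarize(coeffs_d, coeffs_t2):
        """Mean and standard deviation of each coefficient across sections,
        concatenated for d-hat and t-hat''; the resulting length is
        (# of orders + 1) x 2 x 2, independent of the piece."""
        stats = lambda c: np.concatenate([c.mean(axis=0), c.std(axis=0)])
        return np.concatenate([stats(coeffs_d), stats(coeffs_t2)])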
Figure 2.10: Reconstruction of normalized dynamics with fitted polynomials
Figure 2.11: Reconstruction of normalized 2nd order derivatives of duration with fitted polynomials
Figure 2.12: Comparison of reconstructed curves
2.4.3 Results
The similarity measurements using the refined features (the coefficients of the fitted polynomials) are plotted in Figure 2.13 and Figure 2.14. In short, the refined features broaden the histogram of returns (Figure 2.14) more than applying mutual proximity alone, and more nearly symmetric pairs of performers appear in Figure 2.13.
Figure 2.13: Normalized Hmp-refined using mutual proximity
Figure 2.14: Histogram of returns (mutual proximity & feature refinement)
Chapter 3
Discussion
3.1 Discussion
3.1.1 Dataset
The most critical issue affecting the whole thesis concerns the size of the dataset. The Mazurka Project has 2,919 recordings of Chopin mazurkas in its collection, and several research projects in the area of music information retrieval have taken advantage of it (Bello, 2009, 2011; Bello et al., 2012; Nieto et al., 2012). From the perspective of expressive performance, however, only 290 recordings come with metadata. On average, each of the 5 mazurkas has 30 ~ 60 different renderings by different performers, meaning that most of the performers have 1 ~ 4 performances across the 5 pieces. If we consider the task of quantifying pianist style in the context of classification problems (Stamatatos, 2001;
Stamatatos and Widmer, 2002; Widmer and Zanon, 2004), the dataset size for each label (in this case, the performers) is very small (1 ~ 15), which makes it very difficult to adopt pattern recognition techniques to extract meaningful timing/dynamics patterns or discrimination functions for each of the labels. To make things worse, since performances by the same performer are spread over the 5 mazurkas most of the time, it is very difficult to marginalize out the effect of the composition itself. Not only is pattern recognition very difficult given this dataset, but the validation of any findings from this dataset is itself a non-trivial topic.
3.1.2 Approaches
3.1.2.1 Beat-level Features
Examples of the beat-level features {d̂, t̂″} are plotted in Figure 2.2 and Figure 2.3. The examples are performances from Arm and Aas. The shapes of the dynamics curves of the members within each pair are closer than the shapes between pairs. The differences in dynamics shapes between Arm and Aas clearly display different phrasing strategies between these two pairs of performers. To further investigate how {d̂, t̂″} reflect themselves in characterizing the performances, zoomed-in inspections of each section of Op. 63 no. 3 by Arm and Aas are conducted. The inspections are done by comparing {d̂, t̂″} to the score section by section, and then observing their commonalities and differences in detail. The bar numbers corresponding to the sections obtained by form analysis are provided in Table 3.1.
Table 3.1: Section bar numbers - op.63 no.3
  Section   Bar numbers (bar/beat)
  A1        0/3 ~ 8/2
  A2        8/3 ~ 16/1
  B1        16/2 ~ 24/2
  B2        24/3 ~ 32/3
  C         33/1 ~ 40/3
  D         41/1 ~ 48/3
  A3        49/1 ~ 56/3
  A4        57/1 ~ 64/3
  A5        65/1 ~ 76/1
Section A1 & A2
In Figure 3.1 and Figure 3.2, sections A1 and A2 are plotted against {d̂, t̂″} respectively. The first obvious difference between Aas and Arm is at the beginning of the piece and is marked by a yellow box on d̂. Arm begins the piece with a more powerful dynamic and then gradually becomes softer towards the second bar, while Aas begins the piece softly and gradually gets louder towards the end of the first bar. The second difference is from bar 3 to bar 4 and is marked by a purple box on t̂″. From bar 3 to bar 4, Arm exhibits oscillatory behavior in t̂″ while Aas is smoother during the two bars. t̂″ defines a measure of the shape of each group of three points along the curve. The oscillation of Arm shows that the shape of each group of three beats represented by t̂″ changes back and forth between concave and convex at each beat, meaning that during these two bars the duration of each beat changes radically and the direction of change also changes frequently.
Figure 3.1: Comparison of {d̂, t̂″} against the score, section A1 of Chopin Op. 63 no.3

Figure 3.2: Comparison of {d̂, t̂″} against the score, section A2 of Chopin Op. 63 no.3

Figure 3.3: Comparison of {d̂, t̂″} against the score, section B1 of Chopin Op. 63 no.3
In section A2, there is also a difference at the beginning of the section in terms of dynamics, annotated by a yellow box. From beat 3 to 6, Aas is louder relative to the pick-up notes, while Arm stays relatively constant until beat 6.
Section B1 & B2
In Figure 3.3 and Figure 3.4, sections B1 and B2 are plotted against {d̂, t̂″} respectively. For the B sections, although not significantly distinguishable, the d̂ of each group is similar to the other performance in the group but different from the other group. Despite the difference between Arm and Aas in d̂, the d̂ of both groups has a smooth arc shape. The arc shape represents a general phrasing strategy in which the phrase starts softly, gradually increases in volume, and then becomes softer again at the end of the phrase.
Figure 3.4: Comparison of {d̂, t̂″} against the score, section B2 of Chopin Op. 63 no.3
One specific observation about d̂ concerns the diminuendo mark at beats 5 to 7 in section B1 (marked by a yellow box): the d̂ of both Arm and Aas actually becomes louder during those three beats, instead of following the dynamics notation.
Section C & D
In Figure 3.5 and Figure 3.6, sections C and D are plotted against {d̂, t̂″} respectively. For the C section, from beat 8 to 16 (marked by a yellow box), the t̂″ of Arm shows more obvious oscillatory behavior than that of Aas. This difference can be understood as a difference between Arm and Aas in the treatment of local phrase endings. The ∧-∨-∧ shape of Arm means that toward the end of the first sub-phrase in section C (bar 4) the speed slows down, to mark the end of the phrase, and then speeds up at the beginning of the second sub-phrase (bar 5).
Figure 3.5: Comparison of {d̂, t̂″} against the score, section C of Chopin Op. 63 no.3

Figure 3.6: Comparison of {d̂, t̂″} against the score, section D of Chopin Op. 63 no.3
Figure 3.7: Difference between Aas and Arm in section C, op.63 no.3
On the contrary, Aas does not display a drastic change, which implies an interpretation treating the whole section as one phrase. A supplementary plot of t̂ for beats 8 to 16 is provided in Figure 3.7.
In section D, the d̂ of both Aas and Arm acts in correspondence with the crescendo markers placed from beat 15 to 20 (marked with a yellow box).
Figure 3.8: Comparison of {d̂, t̂″} against the score, section A3 of Chopin Op. 63 no.3
For t̂″, the phrasing behavior of Arm towards the middle of the section (the ending of the first sub-phrase and the beginning of the second, from beats 9 to 14) is similar to that in section C.
Section A3 & A4
In Figure 3.8 and Figure 3.9, sections A3 and A4 are plotted against {d̂, t̂″} respectively. The effect of the diminuendo mark at the beginning of the section is not prominent for either group, as can be observed in Figure 3.8. On beats 9 and 10 (section A4), the effect of the crescendo mark does not appear in Figure 3.9. The drastic fluctuation in t̂″ towards the end of section A4 (beats 14 to 21, marked by a yellow box), shared by both Aas and Arm, shows the agreement between the performers on how to approach the sub-phrase attached to the main phrase in this section.
Figure 3.9: Comparison of {d̂, t̂″} against the score, section A4 of Chopin Op. 63 no.3
The d̂ of the two groups are similar within each group but far apart between the groups.
Section A5
The last section of op.63 no.3 is plotted in Figure 3.10 against {d̂, t̂″}. In A5, both groups agree on building up the dynamics toward the end of the piece. The fluctuation in t̂″ at the end of the piece has the same shape as the fluctuation at the end of A4; in fact, the rhythmic patterns of the two endings of A4 and A5 are the same.
Figure 3.10: Comparison of {d̂, t̂″} against the score, section A5 of Chopin Op. 63 no.3
Summary
In general, the d̂ of Aas and Arm are similar within each group but deviate between the groups. The inspection of these two groups of performers shows that the dynamics markings of this piece are often violated by these four performers. Comparing the two groups, it can be said that Arm uses more timing variation than Aas, as the plots show that the magnitudes of t̂″ of Arm are stronger than those of Aas the majority of the time. These variations in timing are sometimes reflected in the emphasis of short phrases which, by contrast, does not appear in Aas.
Comparing t̂ and t̂″, the observations suggest that t̂″ is highly connected to the behavior of t̂ but exaggerates the subtleties of performances, making it a better discriminator between performances.
3.1.2.2 Similarity Measurement
Since there is no ground truth against which to compare the similarity measures obtained in Section 2.3, a qualitative approach is adopted to examine the results. Cases with significant connections are examined in the following discussions.
The Hatto Hoax
The strongest similarity between pairs of performers is the pair consisting of Hatto and Indjic. In fact, their beat-level features are almost identical, which is not a surprising discovery, since it was already discussed in (Cook and Sapp, 2007): it was reported and confirmed that the recordings in the mazurka collection attributed to Hatto were actually performed by Indjic. This finding in the similarity measures does not provide any new insight about the style of performers, but rather serves as a sanity check that the approach reflects basic numerical relations between performances.
Vladimir Ashkenazy and Tatiana Shebanova
The other pair of interest is Ashkenazy and Shebanova, denoted by Aas. They both returned as the nearest performance to each other in op. 24, no. 2 and op. 63, no. 3, two of the five queries in the query/search task (Section 2.3). Aas is also picked for historical reasons: Table 3.2 summarizes their relationship from a historical perspective. Although they received their education in different eras, institutional influences may still have had an impact in shaping their performance styles, as suggested by the similarity measurements.

Table 3.2: Comparison between Ashkenazy and Shebanova from a historical perspective
                        Life dates    Nationality   Institutional education
  Vladimir Ashkenazy    1937 ~        Russian       Central Music School / Moscow Conservatory
  Tatiana Shebanova     1953 ~ 2011   Russian       Central Music School / Moscow Conservatory

Table 3.3: Comparison between Rubinstein and Milkina from a historical perspective
                        Life dates    Nationality   Institutional education
  Arthur Rubinstein     1887 ~ 1982   Polish        None
  Nina Milkina          1919 ~ 2006   Russian       None
Arthur Rubinstein and Nina Milkina
The pair Arm has stronger similarity measures than Aas. This pair has no obvious historical relations between its members; their historical backgrounds are provided in Table 3.3, and no obvious connection can be ascertained from them. Their performances of op.63 no.3 are compared to those of Aas using d̂ and t̂″ in Section 3.1.2.1.
Hierarchical clustering using Hmp
Since Hmp contains pair-wise similarity measurements for the performers included, hierarchical clustering can be applied to examine possible groupings of the performers. In Figure 3.11, a hierarchical clustering using complete linkage and a cut-off of 7.5 is displayed. Hatto was excluded, since her presence blocked Indjic from the other performers. It can be seen that both Aas and Arm are grouped together under these settings. Other groups were also formed by the clustering, such as Lushtak and Fliere, Uninsky and Magaloff, and Smith and Biret. Among these groups, Uninsky and Magaloff are of particular interest, because Uninsky has been described as “... greatly reminiscent of Nikita Magaloff”. Due to the scope of this thesis, however, the groups besides Aas and Arm were not studied.
Some may still argue that these findings are rather arbitrary and that the relationships between performers were superimposed to force the rationale of the findings. This similarity measurement is also sensitive to the feature set chosen for the nearest neighbor search; two examples using the normalized timing features {t̂, t̂′, t̂″} and {t̂} are shown in Figure 3.12 and Figure 3.13. Dominant pairs of peaks such as Arm still hold in these two examples, but the distribution of similarity values between pairs of performers changes to a certain degree. Although not explored in this thesis, some tests on the query/search task showed that the similarity measurement is also sensitive to the metric used.
3.1.2.3 Feature Refinement
The performance on the query/return task (Section 2.2.1) using the refined features dropped from 97% to 89%, and further to 44% when using the averages and standard deviations of the coefficients (Section 2.4.2).
Figure 3.11: Hierarchical clustering from Hmp
Figure 3.12: Normalized Hmp using {t̂, t̂′, t̂″}
Figure 3.13: Normalized Hmp using {t̂}
The degradation of performance on the query/return task using the refined features reflects the fact that the current approach still falls short of expectations. The attempt to extract generalizable discriminators from only two groups is obviously too optimistic. Despite these evident shortcomings, the feature refinement process proposed in this thesis still provides a framework for future development. In the absence of a sufficient number of training samples, this framework provides a way to formalize the study of pianist style that allows experimentation with different algorithms at various stages of the analysis. By observing Figure 2.10, Figure 2.11 and Figure 2.12, and comparing the original curves to the curves reconstructed from the fitted coefficients, one explanation for the degradation on the query/return task can be deduced. For d̂ in op.63 no.3, all reconstructed curves of B1 and B2 from both groups are very close to each other compared to other sections (as can be observed in Figure 2.10 and Figure 2.12), but this does not reflect the difference between Aas and Arm seen when the original curves are compared. This agreement between certain segments under the fitted coefficients might be the reason for the degradation of performance on the query/return task. Further discussion is provided in the next section.
3.1.3 Results
The goal of this thesis is to devise a framework for quantifying performer style from audio recordings and to construct piece-invariant features. The results can be discussed in two parts: the framework and the results themselves.
The framework proposed in this thesis proved robust enough for investigating the implicit structures in the performance space defined by features extracted from audio recordings. The results of the approaches can be assessed either qualitatively or quantitatively. The steps taken in the proposed framework avoid the problem of transforming performances of different pieces, with various lengths, into feature vectors of the same dimensionality. Through the search/query task implemented in Section 2.3, the performance feature space of each piece is explored independently, and the distances between performances are summarized to form an abstract description of how similar performers are to each other given the recordings. Although the framework is intended for the eventual use of unsupervised machine learning techniques, human judgement was necessary to compensate for the minimal size of the available dataset.
In comparison to the scape plot representation proposed in (Sapp, 2007, 2008), the similarity measures devised in Section 2.3 cannot show relationships at different time scales between different performances of the same piece. The similarity measures do, however, enable the visualization of relations between performers across all the pieces under consideration.
With regard to the study of pianist style, one assertion that can be made from this thesis is that there are still no generalizable rules or features that can separate performers from one another. From Section 3.1.2.1, it can be concluded that even similar performances disagree with each other in many aspects, and that differences between groups of performers are not consistent even across a single piece. It can be proposed that the differences between groups do not transfer across different pieces. This reasoning is in line with the argument made in (Rink et al., 2011), which suggested that the connection between performance surface events and the underlying structure is neither straightforward nor simple. Arguments and viewpoints from (Cook and Everist, 1999; Shaffer, 1981; Sloboda, 1985) provide some insights into the discussion of using performance surface events as features, which is what has been done in this thesis.
From the viewpoints of music theory (Cook and Everist, 1999) and psychology (Sloboda, 1985), the structure embedded in the composition plays a very important role in the relationship between the composition and the performance. Internally (within a performance), local decisions have to be made to solve problems arising from the physical constraints of the instrument or the note arrangements as the piece unfolds, while at the same time decisions about higher-level structures are made to reflect the architecture of the composition as perceived by the performer. From an external view (comparing one performance to another), the composition itself has structural ambiguities, since groups, patterns or hierarchies can be formed by various musical units (melodies, harmonies, rhythms, note sets, etc.) and their combinations, and these ambiguities offer different choices of performance interpretation. Thus at the instant of a performance, both hierarchical and temporal decisions are made to shape the outcome of the performance.
To link the above discussion to the study of pianist style, it can be said that the “style” of a pianist may be realized in any of the aspects mentioned above, from local decisions to structural interpretations, from dynamic decisions to hierarchical considerations. The “style” in this context should thus be the attributes that are relatively consistent between performances.
Placing the approaches taken in this thesis into the context discussed above, the beat-level features (Section 2.2) reflect only local decisions made dynamically; through the similarity measurements and feature refinement (Sections 2.3 and 2.4), higher-level features that summarize these local decisions, based on segmentation derived from form analysis, are then extracted from the beat-level features. The poor performance of the refined features can then be explained as follows: the summarization of local decisions did not extract structural or hierarchical information about the performance, but instead lost information during the refinement process. The choice of segmentation derived from form analysis also ignored the fact that different performers may perceive the structure of the piece differently, so the summarization was compromised by irrelevant information. From an information retrieval point of view, everything inevitably has to be built from raw observations, and since the beat-level features in this thesis are fairly effective, it is reasonable to continue along the current path and place more emphasis on how to combine multi-level features. The other issue in the approaches taken is the lack of consideration of temporal evolution, that is, how the performance develops dynamically as it unfolds.
Measurements of temporal evolution should include both indications of how the same musical units are treated in different contexts within the same piece, and how the use of performance gestures changes over the course of the performance (Flossmann et al., 2010). Neither of these indicators was considered in the features used in this study, so there is also no information about how each performance evolved differently. The difficulties facing these considerations are the quantization of performance gestures and the relationship of such gestures to their corresponding musical events. Further discussion of these issues is beyond the scope of this thesis and can be found in (Goebl et al., 2004, 2005; Madsen and Widmer, 2006; Pampalk et al., 2003; Widmer, 2001); a first rough illustration of what such indicators might look like is nonetheless sketched below.
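The function name, inputs, and choice of statistics in this sketch are hypothetical and are not drawn from the cited studies; it merely illustrates, under assumed inputs, one indicator of each kind.

import numpy as np

def temporal_evolution_indicators(tempo_curve, repeat_spans):
    # tempo_curve: 1-D array of beat-level tempo values for one performance.
    # repeat_spans: (start, end) beat indices of two occurrences of the same
    # musical unit, assumed to be of equal length, e.g. [(0, 24), (48, 72)].
    (s1, e1), (s2, e2) = repeat_spans
    a, b = tempo_curve[s1:e1], tempo_curve[s2:e2]
    # Indicator 1: how consistently the same unit is rendered in two contexts.
    consistency = np.corrcoef(a, b)[0, 1]
    # Indicator 2: drift in local tempo variability between the two halves of
    # the performance, a crude proxy for how gesture use changes over time.
    half = len(tempo_curve) // 2
    drift = (np.std(np.diff(tempo_curve[half:]))
             - np.std(np.diff(tempo_curve[:half])))
    return consistency, drift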
3.2 Future Works
Expansion of Dataset
Given the discussion in Section 3.1.1, it is crucial to expand the dataset in order to improve the validity of the approaches proposed in this thesis. Some efforts have been made to improve automatic beat detection for recordings with varying tempo (Grosche et al., 2010; Wu et al., 2011), but in order to obtain robust estimates of expressive subtleties, semi-automatic approaches with manual corrections are still needed at this stage.
Features
One crucial piano performance attribute missing throughout the thesis is articulation. Not only is articulation a very important attribute that performers often manipulate in order to achieve musical expression, but it might also be more relevant to personal style than the other two attributes, timing and dynamics, considered in this thesis, which often reflect more about the structure of the piece or its phrase boundaries. When only audio recordings are available, extracting piano articulation is a non-trivial task and would require background investigation into the acoustics of the piano as well as accurate automatic piano transcription.
In terms of the timing and dynamics features, beat- or bar-level features are still too short to effectively capture performance information related to longer time spans. Thus, more creative ways of deriving performance features from timing and dynamics data should be investigated in order to account for the nature of expressive performance. Removing the dataset's average performance from the raw performance measurements could be a starting point for emphasizing the influence of the performer while at the same time minimizing the influence of the piece.
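As a minimal sketch of this idea, assuming the beat-level tempo values of all performances of one piece are aligned into a single matrix (an assumed data layout, not the thesis's actual format), the per-beat average across performers could be removed as follows:

import numpy as np

def remove_average_performance(tempo_curves):
    # tempo_curves: (n_performances, n_beats) array of beat-level tempo
    # values for one piece, beat-aligned across performances.
    mean_curve = tempo_curves.mean(axis=0)   # the "average performance"
    residuals = tempo_curves - mean_curve    # performer-specific deviations
    # Optionally scale by the per-beat spread so that beats where everyone
    # agrees do not dominate; the epsilon avoids division by zero.
    return residuals / (tempo_curves.std(axis=0) + 1e-9)

The residual curves could then replace the raw measurements as input to feature extraction, so that what remains reflects the performer more than the piece.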
Structure of Performers
Given the results of this thesis, further studies could be conducted in two directions. The first would take advantage of the results displayed in Figure 2.6, the summarization matrix of Hm. The second would investigate each pair of performers more deeply to find possible groupings or structures within each piece.
Feature Refinement
Although the goal of the feature refinement process is to make the framework flexible enough to support improvements based on previous similarity measurement results, only one iteration of feature refinement was carried out in this thesis. The iteration count was limited mainly because there remain unexplored parameters in the approaches taken to improve the original features. For example, in the stage where polynomial fitting is used to describe the shape of the curves, one parameter that could be explored is the segmentation. Instead of using sections obtained from an analysis of the musical form, shorter segmentations such as 2 or 4 bars could be used. The advantage of a shorter, uniform segmentation is that more detail can be captured and the weight of each section becomes uniform. Taking this idea further, analyzing the behavior of shorter segments becomes the study of individual expressive gestures, which a number of previous studies have examined (Goebl et al., 2004; Grachten et al., 2008; Grachten and Widmer, 2011; Madsen and Widmer, 2006; Rink et al., 2011; Stamatatos, 2001; Stamatatos and Widmer, 2002; Widmer and Zanon, 2004). Building on the work in this thesis, instead of forming different expressive gesture clusters for each performance as in (Rink et al., 2011), or the same gesture clusters for the whole performance set as in (Goebl et al., 2004), expressive gestures for pairs of performers could be generated based on the results from Sections 2.3 and 2.4. Another way of treating segmentation differently is to allow variable-length segments: criteria for phrase boundaries based on performance variance (Todd, 1985) could be implemented to segment each performance individually, and the lengths of the segments and their distribution could then be used as an additional performance attribute alongside the segments themselves. A sketch of the fixed-length variant appears below.
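The fixed-length variant could look like the following sketch, which fits a low-order polynomial to the beat-level tempo curve of each segment and uses the coefficients as a gesture descriptor; the segment length, polynomial order, and the assumption of 3 beats per bar (the mazurkas are in 3/4) are illustrative choices, not the thesis's actual settings.

import numpy as np

def gesture_features(tempo_curve, beats_per_bar=3, bars_per_segment=4, order=2):
    # tempo_curve: 1-D array of beat-level tempo values for one performance.
    seg_len = beats_per_bar * bars_per_segment
    n_segments = len(tempo_curve) // seg_len
    x = np.linspace(0.0, 1.0, seg_len)   # normalized time within a segment
    coeffs = []
    for k in range(n_segments):
        segment = tempo_curve[k * seg_len:(k + 1) * seg_len]
        coeffs.append(np.polyfit(x, segment, order))  # (order + 1,) coefficients
    return np.vstack(coeffs)             # shape: (n_segments, order + 1)

Because every segment contributes the same number of coefficients, the segments are weighted uniformly, and the rows of the returned matrix could be clustered into gesture vocabularies per performer pair as suggested above.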
3.3 Conclusion
Future directions stemming from this thesis can be summarized under two topics: the study of expressive gestures and the temporal structure of music performance. The study of expressive gestures can be seen as an extension of the feature refinement used in this thesis. By finding meaningful performance features that either discriminate performers from each other or explain the phenomena involved, the relationship between performance and composition could be explored further.
In parallel with the study of expressive gestures, the temporal structure of music performance should also be examined. The connection between the score and the rendered performance does not remain static; the performance evolves as the piece reveals itself to the performers and the audience (Rink et al., 2011). In order to understand the mechanics of music performance, it is crucial to take this time-variant nature of performance into account, as discussed in Section 3.1. Information-theoretic approaches primarily concerned with how dynamic processes can be described by quantifiable models (Abdallah and Plumbley, 2009) are well suited to the study of the temporal structure of music performance.
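As one hedged illustration of this direction, the sketch below quantizes a tempo curve into a small symbol alphabet, estimates a first-order Markov model from it, and reports the surprisal of each transition. This is only loosely in the spirit of the information dynamics of Abdallah and Plumbley (2009), not a reimplementation of their models, and all names are hypothetical.

import numpy as np

def surprisal_profile(tempo_curve, n_bins=8):
    # Quantize tempo values into n_bins symbols via empirical quantiles.
    edges = np.quantile(tempo_curve, np.linspace(0, 1, n_bins + 1)[1:-1])
    symbols = np.digitize(tempo_curve, edges)   # values in 0 .. n_bins - 1
    # First-order transition model with add-one smoothing.
    trans = np.ones((n_bins, n_bins))
    for a, b in zip(symbols[:-1], symbols[1:]):
        trans[a, b] += 1
    trans /= trans.sum(axis=1, keepdims=True)
    # Surprisal (negative log-probability) of each observed transition, in
    # bits; peaks mark moments where the performance departs from its habits.
    return np.array([-np.log2(trans[a, b])
                     for a, b in zip(symbols[:-1], symbols[1:])])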
In conclusion, the main task facing the study of pianist style is how to separate performance attributes from piece-wise attributes given audio recordings. Two issues have to be dealt with before this main task can be addressed: the extraction of performance measurements (Grosche et al., 2010) and the grouping of pianists. The latter issue was investigated in this thesis. A framework was suggested to enable the comparison of pairwise performer similarities across different pieces. Two things were examined in this framework: first, the features used to group performances by the same performer together; second, the relationships between performers revealed by the similarity measurements. Different sets of features were derived from the low-level features, and evaluation results were reported. The differences between groups of performers were examined qualitatively through musical assessment.
Bibliography
Abdallah, S. and Plumbley, M. (2009). Information dynamics: patterns of expectation and
surprise in the perception of music. Connection Science, 21(2-3):89–117.
Bello, J. (2011). Measuring structural similarity in music. IEEE Transactions on Audio, Speech, and Language Processing, 19(7):2013–2025.
Bello, J., Grosche, P., Müller, M., and Weiss, R. (2012). Analyzing and visualizing repeti-
tive structures in music recordings.
Bello, J. P. (2009). Grouping recorded music by structural similarity. In Proc. ISMIR, pages 531–536.
Cook, N. and Everist, M. (1999). Rethinking music. Oxford University Press, USA.
Cook, N. and Sapp, C. (2007). Purely coincidental? Joyce Hatto and Chopin's mazurkas. Royal Holloway, Univ. of London, London, UK.
Dixon, S., Goebl, W., and Widmer, G. (2002). The performance worm: Real time visualisation of expression based on Langner's tempo-loudness animation. In Proceedings of the International Computer Music Conference (ICMC 2002).
Flexer, A., Schnitzer, D., and Schlüter, J. (2012). A MIREX meta-analysis of hubness in audio music similarity. In 13th International Society for Music Information Retrieval Conference (ISMIR).
Flossmann, S., Grachten, M., Niedermayer, B., and Widmer, G. (2010). The Magaloff project: An interim report. Journal of New Music Research, 39(4):369–377.
Flossmann, S., Grachten, M., and Widmer, G. (2009). Expressive performance rendering: Introducing performance context. In Proceedings of the SMC, pages 155–160.
Gabrielsson, A. (2003). Music performance research at the millennium. Psychology of
music, 31(3):221–272.
Goebl, W., Dixon, S., De Poli, G., Friberg, A., Bresin, R., and Widmer, G. (2005). 'Sense' in expressive music performance: Data acquisition, computational studies, and models. In Cirotteau, D., editor, Sound to Sense, Sense to Sound: A State-of-the-Art, version 0.1. Logos, Berlin.
Goebl, W., Pampalk, E., and Widmer, G. (2004). Exploring expressive performance trajectories: Six famous pianists play six Chopin pieces.
Grachten, M., Goebl, W., Flossmann, S., and Widmer, G. (2008). Phase-plane visualizations of gestural structure in expressive timing. In Proceedings of the Fourth Conference on Interdisciplinary Musicology.
Grachten, M. and Widmer, G. (2007). Towards phrase structure reconstruction from expressive performance data. In Proceedings of the International Conference on Music Communication Science, pages 56–59.
Grachten, M. and Widmer, G. (2011). Explaining musical expression as a mixture of basis
functions. In Proceedings of the 8th Sound and Music Computing Conference (SMC
2011).
Grosche, P., Müller, M., and Sapp, C. (2010). What makes beat tracking difficult? A case study on Chopin mazurkas. In Proceedings of the 11th International Conference on Music Information Retrieval (ISMIR), Utrecht, Netherlands, pages 649–654.
Madsen, S. and Widmer, G. (2006). Exploring pianist performance styles with evolutionary
string matching. International Journal on Artificial Intelligence Tools, 15(04):495–513.
Nieto, O., Humphrey, E., and Bello, J. (2012). Compressing music recordings into audio summaries. In 13th International Society for Music Information Retrieval Conference (ISMIR).
Pampalk, E., Goebl, W., and Widmer, G. (2003). Visualizing changes in the structure of
data for exploratory feature selection. In Proceedings of the ninth ACM SIGKDD inter-
national conference on Knowledge discovery and data mining, pages 157–166. ACM.
Repp, B. (1990). Patterns of expressive timing in performances of a Beethoven minuet by nineteen famous pianists. The Journal of the Acoustical Society of America, 88:622.
Repp, B. (1995). Quantitative effects of global tempo on expressive timing in music per-
formance: Some perceptual evidence. Music Perception, pages 39–57.
Repp, B. (1996). The dynamics of expressive piano performance: Schumann's “Träumerei” revisited. The Journal of the Acoustical Society of America, 100:641.
Repp, B. (1997). The aesthetic quality of a quantitatively average music performance: Two
preliminary experiments. Music Perception, pages 419–444.
Repp, B. (1998a). The detectability of local deviations from a typical expressive timing
pattern. Music Perception, pages 265–289.
Repp, B. (1998b). A microcosm of musical expression. I. Quantitative analysis of pianists' timing in the initial measures of Chopin's Etude in E major. The Journal of the Acoustical Society of America, 104:1085.
Repp, B. (1998c). Variations on a theme by Chopin: Relations between perception and production of timing in music. Journal of Experimental Psychology: Human Perception and Performance, 24(3):791.
Rink, J., Spiro, N., and Gold, N. (2011). Motive, gesture, and the analysis of performance.
New Perspectives on Music and Gesture, pages 267–92.
Sapp, C. (2007). Comparative analysis of multiple musical performances. In Proceedings
of the International Conference on Music Information Retrieval (ISMIR), pages 497–
500.
Sapp, C. (2008). Hybrid numeric/rank similarity metrics for musical performance analysis.
In Bello, J. P., Chew, E., and Turnbull, D., editors, ISMIR, pages 501–506.
Schnitzer, D., Flexer, A., Schedl, M., and Widmer, G. (2011). Using mutual proximity
to improve content-based audio similarity. In Proc. of the 12th Int. Conf. for Music
Information Retrieval (ISMIR-2011).
Seashore, C. (1967). Psychology of music. Dover Publications.
Shaffer, L. H. (1981). Performances of Chopin, Bach, and Bartók: Studies in motor programming. Cognitive Psychology, 13(3):326–376.
Sloboda, J. (1983). The communication of musical metre in piano performance. The
quarterly journal of experimental psychology, 35(2):377–396.
Sloboda, J. (1985). The musical mind: The cognitive psychology of music. Clarendon Press, Oxford.
Sloboda, J. A. (2000). Individual differences in music performance. Trends in Cognitive Sciences, 4(10):397–403.
Stamatatos, E. (2001). A computational model for discriminating music performers. In Proceedings of the MOSART Workshop on Current Research Directions in Computer Music, pages 65–69.
Stamatatos, E. and Widmer, G. (2002). Music performer recognition using an ensemble of simple classifiers. In Proceedings of the 15th European Conference on Artificial Intelligence (ECAI 2002), pages 335–339. IOS Press.
Todd, N. (1985). A model of expressive timing in tonal music. Music Perception, pages
33–57.
Widmer, G. (2001). Machine discoveries: A few simple, robust local expression principles.
Journal of New Music Research, 31:37–50.
Widmer, G., Dixon, S., Goebl, W., Pampalk, E., and Tobudic, A. (2003). In search of the Horowitz factor. AI Magazine, 24(3):111.
Widmer, G. and Goebl, W. (2004). Computational models of expressive music perfor-
mance: The state of the art. Journal of New Music Research, 33:203–216.
Widmer, G. and Zanon, P. (2004). Automatic recognition of famous artists by machine. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004).
Wu, F., Lee, T., Jang, J., Chang, K., Lu, C., and Wang, W. (2011). A two-fold dynamic
programming approach to beat tracking for audio music with time-varying tempo. In
Proc. ISMIR.