BASS PLAYING STYLE DETECTION BASED ON HIGH-LEVEL FEATURES AND PATTERN SIMILARITY

Jakob Abeßer, Fraunhofer IDMT, Ilmenau, Germany

([email protected])

Paul Bräuer, Piranha Musik & IT

Berlin, Germany

Hanna Lukashevich, Gerald Schuller, Fraunhofer IDMT, Ilmenau, Germany

ABSTRACT

In this paper, we compare two approaches for automatic classification of bass playing styles, one based on high-level features and another one based on similarity measures between bass patterns. For both approaches, we compare two different strategies: classification of patterns as a whole and classification of all measures of a pattern with a subsequent accumulation of the classification results. Furthermore, we investigate the influence of potential transcription errors on the classification accuracy, which tend to occur when real audio data is analyzed. We achieve best classification accuracy values of 60.8% for the feature-based classification and 68.5% for the classification based on pattern similarity, using a taxonomy consisting of 8 different bass playing styles.

1. MOTIVATION

Melodic and harmonic structures have often been studied in the field of Music Information Retrieval. In genre discrimination tasks, however, mainly timbre-related features have achieved satisfying results to the present day. The authors assume that bass patterns and playing styles are missing complements. Bass provides central acoustic features of music as a social phenomenon, namely its territorial range and simultaneous bodily grasp. These qualities come in different forms, which is what defines musical genres to a large degree. Western popular music, with its worldwide influence on other styles, is based upon compositional principles of its classical roots, harmonically structured around the deepest note. African styles also often use tonal bass patterns as a ground structure, while Asian and Latin American styles traditionally prefer percussive bass sounds. In contrast to the melody (which can easily be interpreted in "cover versions" of different styles), the bass pattern most often carries the main harmonic information as well as a central part of the rhythmic and structural information.

A more detailed stylistic characterization of the bass instrument within music recordings will inevitably improve classification results in genre and artist classification tasks. Within the field of Computational Ethnomusicology (CE) [19], the automatic detection of the playing styles of the participating instruments such as the bass constitutes a meaningful approach to unravel the fusion of different musical influences within a song. This holds true for many contemporary music genres and especially for those with a global music background.

The remainder of this paper is organized as follows. After outlining the goals and challenges in Sec. 2 and Sec. 3, we provide a brief overview of related work in Sec. 4. In Sec. 5, we introduce novel high-level features for the analysis of transcribed bass lines. Furthermore, we propose different classification strategies, which we apply and compare later in this paper. We introduce the used data set and describe the performed experiments in Sec. 6. After the results are discussed, we conclude this paper in Sec. 7.

2. GOALS

The goal of this publication is to compare different approaches for automatic playing style classification. For this purpose, we aim at comparing different classification approaches based on common statistical pattern recognition algorithms as well as on the similarity between bass patterns. In both scenarios, we want to investigate the applicability of an aggregated classification based on the sub-patterns of an unknown pattern.

3. CHALLENGES

The extraction of score parameters such as note pitch and onset from real audio recordings requires reliable automatic transcription methods, which nowadays are still error-prone when it comes to analyzing multi-timbral and polyphonic audio mixtures [4, 13]. This drawback impedes a reliable extraction of high-level features that are designed to capture important rhythmic and tonal properties for a description of an instrumental track. This is one problem addressed in our experiments. Another general challenge is the translation of musical high-level terms such as syncopation, scale, or pattern periodicity into parameters that are automatically retrievable by algorithms. Information regarding micro-timing, which is by its nature impossible to encompass in a score [9], is left out.


4. PREVIOUS APPROACHES

In recent years, the use of score-based high-level features has become more popular for tasks such as automatic genre classification. To derive a score-based representation from real audio recordings, various automatic transcription algorithms have been proposed so far. The authors of [18], [13], and [4] presented algorithms to transcribe bass lines. Musical high-level features allow capturing different properties from musical domains such as melody, harmony, and rhythm [1, 3, 10, 11]. Bass-related audio features were used for genre classification in [18], [1], and [17].

An excellent overview of existing approaches for the analysis of expressive music performance and artist-specific playing styles is provided in [23] and [24]. In [7], different melodic and rhythmic high-level features are extracted before the performed melody is modeled with an evolutionary regression tree model. The authors of [15] also used features derived from the onset, inter-onset-interval, and loudness values of note progressions to quantify the performance style of piano players in terms of their timing, articulation, and dynamics. To compare different performances in terms of rhythmic and dynamic similarity, the authors of [14] proposed a numerical method based on the correlation at different timescales.

5. NOVEL APPROACH

5.1 Feature extraction

In this paper, we use 23 multi-dimensional high-level features that capture various musical properties for the tonal and rhythmic description of bass lines. The feature vector consists of 136 dimensions in total. The basic note parameters, which we investigate in this paper, are the absolute pitch Θ_P, the loudness Θ_V, the onset Θ_O^[s] and Θ_O^[M], and the duration Θ_D^[s] and Θ_D^[M] of each note. The indices [s] and [M] indicate that both the onset and the duration of a note can be measured in seconds as well as in lengths of measures. All these parameters are extracted from symbolic MIDI files by using the MIDI Toolbox for MATLAB [5].

Afterwards, further advanced note parameters are derived before features are extracted. From the pitch differences ΔΘ_P between adjacent notes in semitones, we obtain vectors containing the interval directions ΔΘ_P^(D) (being either ascending, constant, or descending) and the pitch differences in terms of functional interval types ΔΘ_P^(F). To derive the functional type of an interval, we map its size to a maximum absolute value of 12 semitones or one octave by using the modulo 12 operation in case it is larger than one octave upwards or downwards (12 semitones). Then each interval is assigned to a functional interval type (prime, second, third etc.) according to well-known music principles. In addition to the high-level features presented in [1], we use various additional features related to tonality and rhythm in this paper, which are explained in the following subsections.
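
To illustrate this mapping, the following Python sketch derives interval directions and octave-reduced functional interval types from a list of MIDI pitches; the helper name and the exact label set (the text above only lists "prime, second, third etc.") are illustrative assumptions rather than the paper's implementation.

```python
# Sketch of the interval direction / functional interval type mapping
# described above; names and the exact label set are illustrative.
FUNCTIONAL_TYPES = [
    "prime", "minor second", "major second", "minor third", "major third",
    "fourth", "tritone", "fifth", "minor sixth", "major sixth",
    "minor seventh", "major seventh", "octave",
]

def interval_features(pitches):
    """Return interval directions and functional interval types for adjacent notes."""
    directions, functional = [], []
    for prev, cur in zip(pitches[:-1], pitches[1:]):
        delta = cur - prev                            # pitch difference in semitones
        directions.append((delta > 0) - (delta < 0))  # +1 ascending, 0 constant, -1 descending
        size = abs(delta)
        if size > 12:                                 # fold intervals larger than one octave
            size %= 12
        functional.append(FUNCTIONAL_TYPES[size])
    return directions, functional

print(interval_features([40, 47, 45, 45, 31]))
```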

Features related to tonality

We derive features to measure if a certain scale is applied in a bass pattern. Therefore, we take different binary scale templates for natural minor (which includes the major scale), harmonic minor, melodic minor, pentatonic minor (a subset of natural minor which also includes the pentatonic major scale), blues minor, whole tone, whole tone half tone, arabian, minor gypsy, and hungarian gypsy [21] into account. Each scale template consists of 12 values representing all semitones of an octave. The value 1 is set for all semitones that are part of the scale, the value 0 for those that are not. All notes within a given pattern that are related to a certain scale are accumulated by adding their normalized note loudness values Θ_V / Θ_V,max, with Θ_V,max being the maximum note loudness in a pattern. The same is done for all notes that are not contained in the scale. The ratio of both sums is calculated over all investigated scales and over all 12 possible cyclic shifts of the scale template. This cyclic shift is performed to cope with each possible root note position. The maximum ratio value over all shifts is determined for each scale template and used as a feature value, which measures the presence of each considered scale. We obtain the relative frequencies p_i of all possible values in the vector that contains the interval directions (ΔΘ_P^(D)) as well as the vector that contains the functional interval types (ΔΘ_P^(F)) and use them as features to characterize the variety of different pitch transitions between adjacent notes.
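
A minimal sketch of this loudness-weighted scale-template feature, shown for a single template (natural minor), could look as follows; the function name and the guard against an empty out-of-scale sum are illustrative assumptions.

```python
import numpy as np

# Sketch of the loudness-weighted scale-template feature described above,
# shown for a single template; names are illustrative.
NATURAL_MINOR = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0])  # semitone mask

def scale_presence(pitches, loudness, template):
    """Maximum in-scale vs. out-of-scale loudness ratio over all 12 cyclic shifts."""
    loud = np.asarray(loudness, dtype=float) / np.max(loudness)  # normalize by pattern maximum
    chroma = np.asarray(pitches) % 12
    best = 0.0
    for shift in range(12):                      # try every possible root note position
        mask = np.roll(template, shift)[chroma]  # 1 if the note lies in the shifted scale
        in_sum = float(np.sum(loud * mask))
        out_sum = float(np.sum(loud * (1 - mask)))
        best = max(best, in_sum / max(out_sum, 1e-9))
    return best

# Example: a C minor bass line with one out-of-scale note, equal loudness
print(scale_presence([36, 38, 39, 43, 46, 49], [100] * 6, NATURAL_MINOR))
```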

Features related to rhythm

Syncopation embodies an important stylistic means in different music genres. It represents the accentuation of weak beats of a measure instead of an accentuation of a neighboring strong beat that usually would be emphasized. To detect syncopated note sequences within a bass-line, we investigate different temporal grids in terms of equidistant partitionings of single measures. For instance, for an eighth-note grid, we map all notes inside a measure towards one of eight segments according to their onset position inside the measure. In a 4/4 time signature, these segments correspond to all 4 quarter notes (on-beats) and their off-beats in between. If at least one note is mapped to a segment, it is associated with the value 1, otherwise with 0. For each grid, we count the presence of the following segment sequences: (1001), (0110), (0001), or (0111). These sequences correspond to sequences of alternating on-beat and off-beat accentuations that are labeled as syncopations. The ratios between the number of syncopation sequences and the number of segments are applied as features for the rhythmical grids 4, 8, 16, and 32.
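
The following sketch illustrates the syncopation feature for a single grid; since the text does not specify how the four segment sequences are searched, a sliding window of length four over each measure's binary segment vector is assumed here.

```python
# Sketch of the syncopation feature for one temporal grid (names illustrative).
# The sliding-window search over the binary segment vector is an assumption.
SYNCOPATION_PATTERNS = {"1001", "0110", "0001", "0111"}

def syncopation_ratio(onsets_in_measures, grid=8):
    """onsets_in_measures: list of lists of note onsets in measure units (0 <= t < 1)."""
    hits, total_segments = 0, 0
    for onsets in onsets_in_measures:
        segments = [0] * grid
        for t in onsets:                               # map each onset to a grid segment
            segments[min(int(t * grid), grid - 1)] = 1
        binary = "".join(map(str, segments))
        hits += sum(binary[i:i + 4] in SYNCOPATION_PATTERNS
                    for i in range(grid - 3))          # count syncopation sequences
        total_segments += grid
    return hits / total_segments

# One measure of an eighth-note grid with a note landing only on the last off-beat
print(syncopation_ratio([[0.0, 0.1875, 0.4375, 0.9375]], grid=8))
```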

We calculate the ratio Θ_D^[M](k) / ΔΘ_O^[M](k) between the duration value of the k-th note in measure lengths and the inter-onset interval between the k-th note and its succeeding note. Then we derive the mean and the variance of this value over all notes as features. A high or low mean value indicates whether notes are played legato or staccato. The variance over all ratios captures the variation between these two types of rhythmic articulation within a given bass pattern. To measure if notes are mostly played on on-beats or off-beats, we investigate the distribution of notes towards the segments in the rhythmical grids as explained above for the syncopation feature. For example, the segments 1, 3, 5, and 7 are associated with on-beat positions for an eighth-note grid and a 4/4 time signature. Again, this ratio is calculated over all notes, and mean and variance are taken as feature values. As additional rhythmic properties, we derive the frequencies of occurrence of all commonly used note lengths from half notes to 64th notes, each in its normal, dotted, and triplet version. In addition, the relative frequencies of all note-note, note-break and break-note sequences over the complete pattern are taken as features.
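
The articulation descriptor (note duration over inter-onset interval) might be sketched as follows; the function name and the example values are illustrative.

```python
import numpy as np

# Sketch of the articulation descriptor: ratio of note duration to the
# inter-onset interval towards the next note (both in measure units).
def articulation_features(onsets, durations):
    onsets = np.asarray(onsets, dtype=float)
    durations = np.asarray(durations, dtype=float)
    ioi = np.diff(onsets)                     # inter-onset intervals
    ratio = durations[:-1] / ioi              # the last note has no successor
    return ratio.mean(), ratio.var()          # high mean -> legato, low mean -> staccato

# Five consecutive notes a quarter note apart, played rather staccato
print(articulation_features([0.0, 0.25, 0.5, 0.75, 1.0],
                            [0.10, 0.10, 0.10, 0.10, 0.10]))
```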

5.2 Classification based on statistical pattern recognition

We investigate the applicability of the well-established Support Vector Machine (SVM) using the Radial Basis Function (RBF) as kernel, combined with a preceding feature selection using the Inertia Ratio Maximization using Feature Space Projection (IRMFSP), as a baseline experiment. The feature selection is applied to choose the most discriminative features and thus to reduce the dimensionality of the feature space prior to the classification. Therefore, we calculate the high-level features introduced in Sec. 5.1 for each bass pattern, which results in a 136-dimensional feature space. Details on both the SVM and the IRMFSP can be found for instance in [1].
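
A rough sketch of such a classification chain using scikit-learn is given below. IRMFSP is not available in scikit-learn, so a generic univariate filter (SelectKBest) stands in for it; the SVM parameters and the dummy data are placeholders rather than the paper's exact setup.

```python
# Sketch of the baseline classifier chain (feature selection + RBF-SVM).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(320, 136))      # 320 patterns x 136 feature dimensions (dummy data)
y = np.repeat(np.arange(8), 40)      # 8 playing styles, 40 patterns each

clf = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=80),    # stand-in for the IRMFSP feature selection
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)
scores = cross_val_score(clf, X, y, cv=20)   # 20-fold cross validation
print(scores.mean(), scores.std())
```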

5.3 Classification based on pattern similarity

In this paper, we apply two different kinds of pattern similarity measures, pairwise similarity measures and similarity measures based on the Levenshtein distance. To compute similarity values between patterns, the values of the onset vector Θ_O^[M] and the absolute pitch vector Θ_P are simply converted into character strings. In the latter case, we initially subtract the minimum value of Θ_P for each pattern separately to remain independent from pitch transpositions. This approach can of course be affected by potential outliers, which do not belong to the pattern.

5.3.1 Similarity measures based on the Levenshtein distance

The Levenshtein distance D_L offers a metric for the computation of the similarity of strings [6]. It measures the minimum number of edits in terms of insertions, deletions, and substitutions that are necessary to convert one string into the other. We use the Wagner-Fischer algorithm [20] to compute D_L and derive a similarity measure S_L between two strings of length l_1 and l_2 from

S_L = 1 − D_L / D_L,max .    (1)

The lengths l_1 and l_2 correspond to the number of notes in both patterns. D_L,max equals the maximum value of l_1 and l_2. In the experiments, we use the rhythmic similarity measure S_L,R and the tonal similarity measure S_L,T derived from the Levenshtein distance between the onset Θ_O^[M] and the pitch Θ_P strings as explained in the previous section. Furthermore, we investigate

S_L,RT,Max = S_L,R if S_L,R ≥ S_L,T, else S_L,T    (2)

and

S_L,RT,Mean = (S_L,R + S_L,T) / 2    (3)

by using the maximum and the arithmetic mean of S_L,R and S_L,T as aggregated similarity measures.
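
For illustration, a small Python sketch of the Wagner-Fischer computation of D_L and the derived similarity S_L is given below; the helper names are not from the paper.

```python
# Sketch of the Levenshtein-based similarity S_L = 1 - D_L / D_L,max.
# The Wagner-Fischer dynamic-programming table is computed explicitly;
# sequences may hold any symbols (e.g. stringified onsets or pitches).
def levenshtein(a, b):
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                            # deletion
                          d[i][j - 1] + 1,                            # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))   # substitution
    return d[len(a)][len(b)]

def levenshtein_similarity(a, b):
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

# Tonal similarity of two transposition-normalized pitch sequences
p1 = [0, 7, 12, 7, 0, 7, 12, 7]
p2 = [0, 7, 12, 7, 0, 5, 10, 5]
print(levenshtein_similarity(p1, p2))   # 1 - 3/8 = 0.625
```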

5.3.2 Pairwise similarity measures

In general, we derive a pairwise similarity measure

S_P = (1/2) (N_n,m / N_n + N_m,n / N_m) .    (4)

N_n,m denotes the number of notes in pattern n for which at least one note in pattern m exists that has the same absolute pitch value (for the similarity measure S_P,T) or onset value (for the similarity measure S_P,R). N_m,n is defined vice versa. By applying the constraint that both onset and absolute pitch need to be equal in Eq. 4, we obtain the measure S_P,RT. Furthermore, we derive the aggregated similarity measures S_P,RT,Max and S_P,RT,Mean analogous to Eq. 2 and Eq. 3.
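
A compact sketch of Eq. 4 could look as follows; representing notes as (onset, pitch) tuples and selecting the compared attribute via a key function is an illustrative choice, not the paper's implementation.

```python
# Sketch of the pairwise similarity measure of Eq. 4 (names illustrative).
# key selects which note attribute must match: onset (S_P,R), pitch (S_P,T),
# or both (S_P,RT).
def pairwise_similarity(notes_n, notes_m, key):
    """notes_*: lists of (onset, pitch) tuples; key maps a note to the compared value."""
    set_m = {key(x) for x in notes_m}
    set_n = {key(x) for x in notes_n}
    n_nm = sum(key(x) in set_m for x in notes_n)   # notes of n with a match in m
    n_mn = sum(key(x) in set_n for x in notes_m)   # notes of m with a match in n
    return 0.5 * (n_nm / len(notes_n) + n_mn / len(notes_m))

a = [(0.0, 36), (0.25, 43), (0.5, 36), (0.75, 43)]
b = [(0.0, 36), (0.5, 36), (0.625, 38), (0.75, 43)]
s_r = pairwise_similarity(a, b, key=lambda n: n[0])    # rhythmic (onset)
s_t = pairwise_similarity(a, b, key=lambda n: n[1])    # tonal (pitch)
s_rt = pairwise_similarity(a, b, key=lambda n: n)      # onset and pitch
print(s_r, s_t, s_rt, max(s_r, s_t), 0.5 * (s_r + s_t))
```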

6. EVALUATION

6.1 Data-set

We assembled a novel dataset from instructional bass literature [12, 21], which consists of bass patterns from the 8 genres Swing (SWI), Funk (FUN), Blues (BLU), Reggae (REG), Salsa & Mambo (SAL), Rock (ROC), Soul & Motown (SOU), and Africa (AFR), a rather general term which here signifies Sub-Saharan popular music styles [16]. For each genre, 40 bass-lines of 4 measures length have been stored as symbolic audio data in MIDI files. Initial listening tests revealed that in this data set, which was assembled and categorized by professional bass players, a certain amount of stylistic overlap and misclassification between genres such as Blues and Swing or Soul & Motown and Funk occurs. The overlap is partly inherent to the approach of the data sets, which treat all examples of a style (e.g. Rock) as homogeneous although the sets include typical patterns of several decades. In some features, early Rock patterns might resemble early Blues patterns more than they resemble late patterns of their own style [22]. Thus, the data set will be extended further and revised by educated musicologists for future experiments.

6.2 Experiments & Results

6.2.1 Experiment 1 - Feature-based classification

As described in Sec. 5.2, we performed a baseline experiment that consists of the IRMFSP for choosing the best N = 80 features and the SVM as classifier.


      AFR   BLU   FUN   MOT   REG   ROC   SAL   SWI
AFR  66.2   5.9   2.0   8.8  10.8   0.0   6.4   0.0
BLU   0.0  46.1   0.0  22.4   0.0  11.8   3.9  15.7
FUN   7.4   4.2  72.8   1.4  10.6   3.6   0.0   0.0
MOT   2.0   2.9   6.9  51.6   4.6  21.8  10.3   0.0
REG  21.0   0.0   4.2  10.6  49.4   8.3   6.5   0.0
ROC   2.6   0.0   0.0  10.7   0.0  70.4  16.2   0.0
SAL  25.0   0.0   1.2   5.6   6.7  14.0  47.5   0.0
SWI   0.0  17.6   0.0   0.0   0.0   0.0   0.0  82.4

Figure 1. Exp. 1 - Confusion matrix for the feature-based pattern-wise classification (rows: correct bass playing style, columns: classified bass playing style; all values given in %). Mean classification accuracy is 60.8% with a standard deviation of 2.4%.

The parameter N has been determined to perform best in previous tests on the data-set. A 20-fold cross validation was applied to determine the mean and standard deviation of the classification accuracy. For a feature extraction and classification based on complete patterns, we achieved 60.8% accuracy with a standard deviation of 2.4%. The corresponding confusion matrix is shown in Fig. 1. It can be seen that the best classification results were achieved for the styles Funk, Rock, and Swing. Strong confusions between Blues and Motown or Swing, between Motown and Rock, between Reggae and Africa, as well as between Salsa and Africa can be identified. These confusions support the musicological assessment of the data-set given in Sec. 6.1. In addition, they coincide with historical relations between the styles in Africa, the Caribbean, and Latin America, as well as relations within North America, as is common musicological knowledge [8].

As a second classification strategy, we performed the feature extraction and classification based on sub-patterns. Therefore, we divided each pattern within the test set into N = 4 sub-patterns of one measure length. It was ensured that no sub-patterns of patterns in the test set were used as training data. After all sub-patterns were classified, the estimated playing style for the corresponding test set pattern was derived from a majority decision over all sub-pattern classifications. In case of multiple winning classes, a random decision was applied between the winning classes. For the accumulated measure-wise classification, we achieved only 56.4% accuracy. Thus, this approach did not improve the classification accuracy. We assume that the majority of the applied high-level features, which are based on different statistical descriptors (see Sec. 5.1 for details), cannot provide an appropriate characterization of the sub-patterns, which themselves only consist of 6 to 9 notes on average.
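
The majority decision with random tie-breaking could be sketched as follows (illustrative names only).

```python
import random
from collections import Counter

# Sketch of the majority decision over sub-pattern (measure-wise) class
# estimates, with a random choice among tied winning classes.
def majority_decision(subpattern_labels, rng=random):
    counts = Counter(subpattern_labels)
    best = max(counts.values())
    winners = [label for label, count in counts.items() if count == best]
    return rng.choice(winners)

print(majority_decision(["FUN", "SOU", "FUN", "ROC"]))   # -> "FUN"
print(majority_decision(["FUN", "SOU", "SOU", "FUN"]))   # random tie-break
```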

6.2.2 Experiment 2 - Pattern Similarity

This experiment is based on a leave-one-out cross-validation scheme and thus consists of N = 320 evaluation steps according to the 320 patterns in the data-set. Within each evaluation step, the current pattern P_k is used as test data while all remaining patterns P_l with l ≠ k are used as training data.

      AFR   BLU   FUN   MOT   REG   ROC   SAL   SWI
AFR  57.4   2.1   6.4  17.0   6.4   0.0   8.5   2.1
BLU   4.2  50.0   4.2  18.8   2.1   6.3   4.2  10.4
FUN   4.4   6.7  62.2  11.1   2.2   6.7   4.4   2.2
MOT   0.0   0.0   0.0  95.1   0.0   0.0   2.4   2.4
REG   4.7   0.0   7.0  11.6  65.1   7.0   2.3   2.3
ROC   0.0   4.7   0.0  14.0   0.0  69.8   2.3   9.3
SAL   6.8   4.5   4.5   6.8   4.5   0.0  68.2   4.5
SWI   0.0  12.5   0.0   7.5   0.0   0.0   0.0  80.0

Figure 2. Exp. 2 - Confusion matrix for the best similarity-based configuration (measure-wise classification using the S_P,RT,Max similarity measure; rows: correct bass playing style, columns: classified bass playing style; all values given in %). Mean classification accuracy is 68.5% with a standard deviation of 3.1%.

We derive the class estimate c_k of P_k from the class label c_l of the best-fitting pattern P_l as

c_k = c_l  ⇔  l = arg max_l S_k,l    (5)

with S_k,m representing the similarity measure between P_k and P_m in the given case. As in Sec. 6.2.1, if multiple patterns have the same (highest) similarity, we perform a random decision among these candidates. This experiment is performed for all similarity measures introduced in Sec. 5.3.
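
A sketch of this pattern-wise nearest-neighbour decision (Eq. 5) with random tie-breaking is given below; names are illustrative.

```python
import random

# Sketch of the pattern-wise classification of Eq. 5: the test pattern takes
# the class label of the most similar training pattern; ties are broken randomly.
def classify_by_similarity(test_pattern, training_set, similarity, rng=random):
    """training_set: list of (pattern, label); similarity: callable on two patterns."""
    scores = [(similarity(test_pattern, p), label) for p, label in training_set]
    best = max(s for s, _ in scores)
    candidates = [label for s, label in scores if s == best]
    return rng.choice(candidates)

train = [(("a",), "SWI"), (("b",), "FUN")]
print(classify_by_similarity(("a",), train,
                             similarity=lambda x, y: float(x == y)))  # -> "SWI"
```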

Exp. 2a: Pattern-wise classification. The basic approach for a pattern-based classification is to use each pattern of 4 measures length as one item to be classified.

Exp. 2b: Accumulated measure-wise classification. Bass patterns are often structured in a way that the measure, or the part of the measure, which precedes the pattern repetition is altered rhythmically or tonally and thus often varies greatly from the rest of the pattern. These figures separating or introducing a pattern repetition are commonly referred to as pickups or upbeats, meaning that they do not vary or overlap the following pattern repetition, which starts on the first beat of the new measure. A pattern-wise classification as described above thus might overemphasize differences in the last measure, because the patterns are compared over their complete length. Hence, we investigate another decision aggregation strategy in this experiment.

As described in Sec. 6.2.1, we divide each bass pattern into sub-patterns of one measure length each. Within each fold k, we classify each sub-pattern SP_k,l of the current test pattern P_k separately. At the same time, we ensure that only sub-patterns of the other patterns P_i with i ≠ k are used as training set for the current fold. To accumulate the classification results in each fold, we add all similarity values S_k,l between each sub-pattern SP_k,l and their assigned winning pattern(s) P_k,l,win. The summation is done for each of the 8 genres separately. The genre that achieves the highest sum is considered the winning genre.
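
The following sketch is one possible reading of this accumulation strategy: for each sub-pattern, the similarity values towards its winning training sub-pattern(s) are summed per genre, and the genre with the highest sum wins. Names and the tie-breaking are illustrative assumptions.

```python
import random
from collections import defaultdict

# Sketch of the accumulated measure-wise strategy (interpretation of the text above).
def accumulate_measure_wise(subpatterns, training_set, similarity, rng=random):
    """subpatterns: sub-patterns of the test pattern; training_set: (sub-pattern, genre) pairs."""
    genre_sums = defaultdict(float)
    for sp in subpatterns:
        scores = [(similarity(sp, tp), genre) for tp, genre in training_set]
        best = max(s for s, _ in scores)
        for s, genre in scores:
            if s == best:                  # winning training sub-pattern(s) for this sub-pattern
                genre_sums[genre] += s
    best_sum = max(genre_sums.values())
    winners = [g for g, s in genre_sums.items() if s == best_sum]
    return rng.choice(winners)

train = [(("m1",), "FUN"), (("m2",), "ROC"), (("m1",), "FUN")]
print(accumulate_measure_wise([("m1",), ("m2",)], train,
                              similarity=lambda a, b: float(a == b)))  # -> "FUN"
```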

As depicted in Fig. 3, the proposed accumulated measure-wise classification strategy led to higher classification accuracy values (blue bars) in comparison to a pattern-wise classification (red bars). This approach can be generalized and adapted to patterns of arbitrary length.


Figure 3. Mean classification accuracy results for experiment 2 (mean accuracy in % for the similarity measures S_P,R, S_P,T, S_P,RT, S_P,RT,Mean, S_P,RT,Max, S_L,R, S_L,T, S_L,RT,Max, and S_L,RT,Mean; accumulated measure-wise vs. pattern-wise classification).

Figure 4. Exp. 3 - Mean classification accuracy vs. percentage ε of pattern variation for the similarity measures S_P,R, S_P,T, and S_P,RT,Max (dotted line - pattern-wise similarity, solid line - accumulated measure-wise similarity).

The similarity measure S_P,RT,Max clearly outperforms the other similarity measures by over 10 percentage points of accuracy. The corresponding confusion matrix is shown in Fig. 2. We therefore assume that it is beneficial to use similarity information based on both the pitch and the onset similarity of bass patterns. For the pattern-wise classification, it can be seen that similarity measures based on tonal similarity generally achieve lower accuracy results in comparison to measures based on rhythmic similarity. This might be explained by the frequently occurring tonal variation of patterns according to the given harmonic context, such as a certain chord or a changed key in different parts of a song. The most remarkable result in the confusion matrix is the very high accuracy of 95.1% for the Motown genre.

6.2.3 Experiment 3 - Influence of pattern variations

For the extraction of bass patterns from audio recordings, two potential sources of error exist. In most music genres, the dominant bass patterns are subject to small variations throughout a music piece. An automatic system might recognize the basic pattern or a variation of the basic pattern. Furthermore, automatic music transcription systems are prone to errors in terms of incorrect pitch, onset, and duration values of the notes. Both phenomena directly have a negative effect on the computed high-level features. We therefore investigate the achievable classification accuracy dependent on the percentage of notes with erroneous note parameters.

We simulate the mentioned scenarios by manipulating a random selection of ε percent of all notes from each unknown pattern and vary ε from 0% to 50%. The manipulation of a single note consists of either a modification of the onset Θ_O^[M] by a randomly chosen difference −0.25 ≤ ΔΘ_O^[M] ≤ 0.25 (which corresponds to a maximum shift distance of one beat for a 4/4 time signature), a modification of the absolute pitch Θ_P by a randomly chosen difference −2 ≤ ΔΘ_P ≤ 2 (which corresponds to a maximum distance of 2 semitones), or a simple deletion of the current note from the pattern. Octave pitch errors that often appear in automatic transcription algorithms were not considered because of the mapping of each interval to a maximum size of one octave as described in Sec. 5.1. Insertions in terms of additional notes, which are not part of the pattern, will be taken into account in future experiments.
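
The note manipulation described above might be simulated as follows; representing notes as (onset, pitch) tuples and the uniform sampling of the onset and pitch deviations are illustrative assumptions.

```python
import random

# Sketch of the simulated transcription errors: a fraction eps of the notes is
# either shifted in onset (+/- 0.25 measures), shifted in pitch (+/- 2 semitones),
# or deleted. Notes are (onset, pitch) tuples in measure units / MIDI pitch.
def perturb_pattern(notes, eps, rng=random):
    notes = list(notes)
    n_manipulate = round(eps * len(notes))
    for idx in rng.sample(range(len(notes)), n_manipulate):
        onset, pitch = notes[idx]
        action = rng.choice(["onset", "pitch", "delete"])
        if action == "onset":
            notes[idx] = (onset + rng.uniform(-0.25, 0.25), pitch)
        elif action == "pitch":
            notes[idx] = (onset, pitch + rng.randint(-2, 2))
        else:
            notes[idx] = None                 # mark the note for deletion
    return [n for n in notes if n is not None]

pattern = [(0.0, 36), (0.25, 43), (0.5, 36), (0.75, 43)]
print(perturb_pattern(pattern, eps=0.5))
```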

As depicted in Fig. 4, the accuracy curves of the three different pairwise similarity measures S_P,R, S_P,T, and S_P,RT,Max fall to about 40% for a transcription error rate of 50%. Interestingly, the pattern-wise classification based on S_P,R seems to be more robust to transcription errors above 15% in comparison to the accumulated measure-wise classification, even though it has a lower accuracy rate under the assumption of a perfect transcription.

6.2.4 Comparison to the related work

A comparison of the achieved results to related work is not directly feasible. On the one hand, this is caused by the fact that different data sets have been utilized. Tsunoo et al. [18] reported an accuracy of 44.8% for the GTZAN data set^1 while using only bass-line features. On the other hand, the performance of bass-line features alone was not always stated. The work of Tsuchihashi et al. [17] showed an improvement of classification accuracy from 53.6% to 62.7% when applying bass-line features complementary to other timbre and rhythm features, but the results of genre classification with only bass features were not reported.

^1 G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293-302, 2002.

7. CONCLUSIONS & OUTLOOK

In this paper, different approaches for the automatic detection of playing styles from score parameters were compared. These parameters can be extracted from symbolic audio data (e.g. MIDI) or from real audio data by means of automatic transcription. For the feature-based approach, a best result of 60.8% accuracy was achieved using a combination of feature selection (IRMFSP) and classifier (SVM) and a pattern-wise classification. Regarding the classification based on pattern similarity, we achieved 68.5% accuracy using the combined similarity measure S_P,RT,Max and a measure-wise aggregation strategy based on the classification of sub-patterns. The random baseline is 12.5%. This approach outperformed the common approach of classifying the complete pattern at once.

For analyzing real-world audio recordings, further musical aspects such as micro-timing, tempo range, the applied plucking & expression styles [2], as well as the interaction with other participating instruments need to be incorporated into an all-embracing style description of a specific instrument in a music recording. The results of experiment 3 emphasize the need for a well-performing transcription system for a high-level classification task such as playing style detection.

8. ACKNOWLEDGEMENTS

This work has been partly supported by the German research project GlobalMusic2One^2 funded by the Federal Ministry of Education and Research (BMBF-FKZ: 01/S08039B). Additionally, the Thuringian Ministry of Economy, Employment and Technology supported this research by granting funds of the European Fund for Regional Development to the project Songs2See^3, enabling transnational cooperation between Thuringian companies and their partners from other European regions.

^2 see http://www.globalmusic2one.net
^3 see http://www.idmt.de/eng/researchtopics/songs2see.html

9. REFERENCES

[1] J. Abeßer, H. Lukashevich, C. Dittmar, and G. Schuller. Genre classification using bass-related high-level features and playing styles. In Proc. of the Int. Society for Music Information Retrieval Conference (ISMIR), Kobe, Japan, 2009.

[2] J. Abeßer, H. Lukashevich, and G. Schuller. Feature-based extraction of plucking and expression styles of the electric bass guitar. In Proc. of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2010.

[3] P. J. Ponce de Leon and J. M. Inesta. Pattern recognition approach for music style identification using shallow statistical descriptors. IEEE Transactions on Systems, Man and Cybernetics - Part C: Applications and Reviews, 37(2):248-257, March 2007.

[4] C. Dittmar, K. Dressler, and K. Rosenbauer. A toolbox for automatic transcription of polyphonic music. In Proc. of Audio Mostly, 2007.

[5] Tuomas Eerola and Petri Toiviainen. MIDI Toolbox: MATLAB Tools for Music Research. University of Jyvaskyla, Jyvaskyla, Finland, 2004.

[6] D. Gusfield. Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge, UK, 1997.

[7] A. Hazan, M. Grachten, and R. Ramirez. Evolving performance models by performance similarity: Beyond note-to-note transformations. In Proc. of the Int. Symposium, 2006.

[8] Ellen Koskoff, editor. The Garland Encyclopedia of World Music - The United States and Canada. Garland Publishing, New York, 2001.

[9] Gerhard Kubik. Zum Verstehen afrikanischer Musik. Lit Verlag, Wien, 2004.

[10] C. McKay and I. Fujinaga. Automatic genre classification using large high-level musical feature sets. In Proc. of the Int. Symposium on Music Information Retrieval (ISMIR), 2004.

[11] C. McKay and I. Fujinaga. jSymbolic: A feature extractor for MIDI files. In Int. Computer Music Conference (ICMC), pages 302-305, 2006.

[12] H.-J. Reznicek. I'm Walking - Jazz Bass. AMA, 2001.

[13] M. P. Ryynanen and A. P. Klapuri. Automatic transcription of melody, bass line, and chords in polyphonic music. Computer Music Journal, 32:72-86, 2008.

[14] C. S. Sapp. Hybrid numeric/rank similarity metrics for musical performance analysis. In Proc. of the Int. Symposium on Music Information Retrieval (ISMIR), pages 501-506, 2008.

[15] E. Stamatatos and G. Widmer. Automatic identification of music performers with learning ensembles. Artificial Intelligence, 165:37-56, 2005.

[16] Ruth M. Stone, editor. The Garland Encyclopedia of World Music - Africa, volume 1. Garland Publishing, New York, 1998.

[17] Y. Tsuchihashi, T. Kitahara, and H. Katayose. Using bass-line features for content-based MIR. In Proc. of the Int. Conference on Music Information Retrieval (ISMIR), Philadelphia, USA, pages 620-625, 2008.

[18] E. Tsunoo, N. Ono, and S. Sagayama. Musical bass-line clustering and its application to audio genre classification. In Proc. of the Int. Society for Music Information Retrieval Conference (ISMIR), Kobe, Japan, 2009.

[19] George Tzanetakis, Ajay Kapur, W. Andrew Schloss, and Matthew Wright. Computational ethnomusicology. Journal of Interdisciplinary Music Studies, 1(2):1-24, 2007.

[20] Robert A. Wagner and Michael J. Fischer. The string-to-string correction problem. Journal of the ACM (JACM), 21(1):168-173, 1974.

[21] Paul Westwood. Bass Bible. AMA, 1997.

[22] Peter Wicke. Handbuch der populären Musik: Geschichte, Stile, Praxis, Industrie. Schott, Mainz, 2007.

[23] G. Widmer, S. Dixon, W. Goebl, E. Pampalk, and A. Tobudic. In search of the Horowitz factor. AI Magazine, 24:111-130, 2003.

[24] G. Widmer and W. Goebl. Computational models of expressive music performance: The state of the art. Journal of New Music Research, 33(3):203-216, 2004.


