Groundwork for a Resource in Computational Hearing for Extended String Techniques
Amy V. Beeston1 and Mark A. C. Summers2
1Department of Computer Science, 2Department of Music, University of Sheffield, UK
{a.beeston, m.summers}@sheffield.ac.uk
Introduction
• Extended techniques (ETs) are now mainstream in contemporary music.
• ETs are variable by nature and can be problematic to perform consistently.
• Datasets addressing variability in instrumental performance are rare.
• A fruitful integration of acoustic instruments and signal-processing technology is desirable [cf. 1–5].
• Human listeners are sensitive to context: we adapt to our environment and perceive 'interesting' variation in a signal [6–7].
• Machine listeners often rely on cues that vary unintentionally.
• We examine variability in recording strategy and performance iteration in order to enhance machine listening for live instrumental performance.
References
1. M Parker (2007). Proc. Verband Deutscher Tonmeister Symposium.
2. MW Young (2007). Proc. ICMC, 508–511.
3. D Van Nort, J Braasch & P Oliveros (2009). Proc. SMC, 131–135.
4. W Hsu (2010). Leonardo Music J. 20, 33–39.
5. PA Tremblay & D Schwarz (2010). Proc. NIME, 15–18.
6. AJ Watkins (2005). J. Acoust. Soc. Am. 118 (1), 249–262.
7. CE Stilp, JM Alexander, M Kiefte & KR Kluender (2010). Atten. Percept. Psychophys. 72 (2), 470–480.
8. P Strange & A Strange (2001). The Contemporary Violin: Extended Performance Techniques. University of California Press, Berkeley.
9. B Turetzky (1989). The Contemporary Contrabass. 2nd ed. University of California Press, Berkeley.
10. Praat – http://www.praat.org
11. G Peeters, BL Giordano, P Susini, N Misdariis & S McAdams (2011). J. Acoust. Soc. Am. 130 (5), 2902–2916.
12. A Francis (2004). Business Mathematics and Statistics. 6th ed., 150–155. Thomson Learning, London.
13. JM Grey & JW Gordon (1978). J. Acoust. Soc. Am. 63 (5), 1493–1500.
Methods
1. Selection of techniques
• Small number of ETs for viola da gamba selected from survey [cf. 8–9].
• Fixed pitch (A3), loudness (RMS) and duration (2 seconds).
• Sound produced on 6 strings, with 5 bowing techniques.
• Click track and notated score to aid performance.
• 6 iterations (repetitions) of each technique.
2. Selection of microphones
• Recordings were made at The University of Sheffield Sound Studios in an acoustically isolated room (volume 34.7 m³).
• Three 'close' microphones and one 'far' microphone were used.
• Signals recorded to the control room via an RME Fireface 800 audio interface.
2 Methods
In the pilot study described below, a prototype corpus was used to examine the variation naturally arising in normal and extended performance techniques due to (i) the recording conditions and (ii) iteration of the technique by the performer. This section describes four main operations undertaken to gather data appropriate to the task: selection of performance techniques; selection of microphones and their placement; sample extraction and storage; automatic annotation with timbral descriptors.
2.1 Selection of Techniques
The current study draws its sound material from an ongoing project documenting the sound world of the viola da gamba. An instrument-specific list of techniques (normal and extended) has been compiled, informed by the performing background of one of the present authors (MS) with cross-reference to other surveys of extended techniques on string instruments [7], [8], [17]. A list of 90 individual techniques serves as the basis for the corpus.
A small number of these techniques have been picked for illustrative analyses in Section 3. Firstly, we fix the pitch, loudness and duration (as in typical timbre studies), and examine bowing this pitch normally on six different strings. Secondly, we use a single string to examine the effect of different bowing techniques.
2.2 Selection of Microphones and their Placement
Recordings were made in an acoustically isolated room in the University of Sheffield Sound Studios (volume 34.7 m³). Two walls were covered with heavy felt curtains, and there was an upright piano on another wall. The player sat in one corner pointing diagonally towards a 'far' room microphone at a distance of 3.6 meters. Three further 'close' microphones were placed on or near the instrument as described in Table 1.
The signal arriving at each microphone was recorded via an RME Fireface 800 audio interface connected to a MacBook in an adjoining control studio, running Audacity software [19]. Two DPA microphones were directly attached to the instrument itself, and represent the highest signal-to-noise ratio practicably
Table 1. Description of microphones selected, their directional characteristics and placement in regard to the instrument and room.
Microphone      Direction  Proximity  Placement
DPA 4060        omni       close      below bridge, under highest (1st) string
DPA 4060        omni       close      below bridge, under middle (4th) string
Neumann KM184   cardioid   close      0.1 m in front of instrument's bridge
Neumann KM184   cardioid   far        3.6 m distant to front, raised 1.8 m
3. Sample extraction
• Individual samples extracted from the long audio recordings.
• 24 audio files for each technique (6 iterations × 4 microphones).
• Two-stage process of segmentation:
  1. Start/stop times of bow movement marked in a Praat TextGrid [10].
  2. TextGrid read in Matlab to excise samples and equalise RMS level.
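The second, excise-and-equalise stage of the segmentation above can be sketched as follows. This is a Python illustration rather than the study's actual Matlab code; the interval list stands in for start/stop times read from the Praat TextGrid, and the function name and target RMS level are our own assumptions.

```python
import numpy as np

def excise_and_equalise(signal, sr, intervals, target_rms=0.1):
    """Cut marked bow strokes from a long recording and scale each
    excised sample to a common RMS level. `intervals` holds
    (start, stop) times in seconds, as would be read from a TextGrid."""
    samples = []
    for start, stop in intervals:
        seg = signal[int(start * sr):int(stop * sr)].astype(float)
        rms = np.sqrt(np.mean(seg ** 2))
        if rms > 0:
            seg = seg * (target_rms / rms)  # equalise loudness across samples
        samples.append(seg)
    return samples
```

Applying this per microphone channel yields the 24 level-equalised files per technique described above.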
4. Timbral annotation
• Automatic annotation achieved using the Timbre Toolbox [11].
• To match human audition, we reason that the best parameters should capture the 'interesting' variation.
• Thus a small variance is desired for unimportant changes in recording strategy and for unintended changes in performance repetition.
• Peeters et al. stress the importance of parameters capturing the central tendency and temporal variability of spectro-temporal properties, the temporal envelope and the periodicity of the signal [11].
• We inspect variation according to the first and last of these, using spectral centroid and spectral flatness measures.
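A minimal stand-in for these two descriptors, computed per frame of the STFT power spectrum, might look as follows. This is a sketch, not the Timbre Toolbox implementation; the function name and frame parameters are our assumptions.

```python
import numpy as np

def frame_descriptors(signal, sr, frame_len=1024, hop=512):
    """Per-frame spectral centroid (Hz) and spectral flatness of the
    STFT power spectrum, as simplified stand-ins for the Timbre
    Toolbox descriptors."""
    win = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, 1 / sr)
    centroids, flatness = [], []
    for i in range(0, len(signal) - frame_len + 1, hop):
        spec = np.abs(np.fft.rfft(signal[i:i + frame_len] * win)) ** 2
        total = spec.sum()
        if total <= 0:
            continue  # skip silent frames
        # centroid: power-weighted mean frequency ('centre of gravity')
        centroids.append(np.sum(freqs * spec) / total)
        # flatness: geometric mean / arithmetic mean of the power spectrum
        flatness.append(np.exp(np.mean(np.log(spec + 1e-12)))
                        / np.mean(spec + 1e-12))
    return np.array(centroids), np.array(flatness)
```

A pure tone yields a centroid near its frequency and flatness near zero; broadband noise yields a much higher flatness, matching the descriptor definitions used in the study.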
5. Variability measure
• Human variability was measured with a relative, dimensionless measure [12]: the quartile coefficient of dispersion (QCD).
• First, the median and inter-quartile range (IQR = Q3 − Q1) were derived for individual audio samples by time-varying, frame-based analysis methods [11].
• QCD quantifies the quartile deviation (IQR/2) as a percentage of the median:

  QCD = (IQR / 2) × (100 / median)

• Stable parameters result in low QCD values (close to zero).
• A high QCD value implies a high degree of variability.
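In code, the QCD of a frame-based parameter track reduces to a few lines. This is a sketch under stated assumptions: the function name and the use of np.percentile are ours, and it presumes a non-zero median.

```python
import numpy as np

def qcd(values):
    """Quartile coefficient of dispersion: the quartile deviation
    (IQR / 2) expressed as a percentage of the median. Values near
    zero indicate a stable parameter; high values, variability.
    Assumes the median is non-zero."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return (q3 - q1) / 2 * 100 / median
```

For example, qcd([8, 9, 10, 11, 12]) gives 10.0, while a constant track gives 0.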
1. Recording strategy
• The open string (2) showed a high centre of gravity or 'brightness', especially when recorded by the DPAs.
• The three 'close' microphones recorded consistently lower values than the 'far' microphone for the stopped strings (3–7).
2. Performance iteration
• Extended techniques were unstable and resulted in higher QCD scores.
• The standard bowing techniques were more consistent throughout the duration of the sound and achieved lower QCD values.
Discussion
• Much work has been done in recent years to extract control parameters from audio signals in live performance; however, signal variability arising from recording strategy and performance iteration is typically unreported.
• Two perceptually correlated parameters were used to quantify variation attributable to the microphone setup and to human reproducibility for a range of normal and extended performance techniques.
• ETs were found to contain more inherent variation than normal bowing.
[Figure: spectral centroid median (Hz, 200–1000) vs. string number (2–7); series: DPA string 1, DPA string 4, Neumann close, Neumann far.]
Mean and standard error of the spectral centroid median of the Short-Term Fourier Transform (STFT) power spectrum for six iterations of standard bowing of the pitch A3 on strings 2 to 7.
[Figure: spectral flatness QCD (0–40) vs. technique (01–10); legend: 01 = str 2, col legno tratto; 02 = str 2, alf; 03 = str 2, sul pont; 04 = str 2, sul tast; 05 = str 5, bow; 06 = str 2, bow; 07 = str 3, bow; 08 = str 4, bow; 09 = str 6, bow; 10 = str 7, bow.]
Ten versions of the pitch A3 ranked according to the QCD derived from the spectral flatness of the STFT power spectrum. Mean and standard error incorporate the three close microphone positions that might be used in performance.
Workflow
Timbral parameters
• Spectral centroid – indicates centre of mass – correlates with brightness [13].
• Spectral flatness – 1 if noisy (flat spectrum), 0 if tonal (peaky spectrum) [11].
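The two extremes of the flatness measure are easy to verify on toy spectra. The helper below is our own illustration of the geometric-to-arithmetic mean ratio, not the implementation from [11].

```python
import numpy as np

def spectral_flatness(power_spectrum):
    """Geometric mean over arithmetic mean of a power spectrum:
    approaches 1 for a flat (noise-like) spectrum and 0 for a
    peaky (tonal) one."""
    p = np.asarray(power_spectrum, dtype=float) + 1e-12  # guard against log(0)
    return np.exp(np.mean(np.log(p))) / np.mean(p)

print(spectral_flatness(np.ones(512)))   # flat spectrum -> approx 1
peak = np.zeros(512)
peak[5] = 1.0
print(spectral_flatness(peak))           # single peak -> approx 0
```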