8/3/2019 Wa Ms Thesis
Investigation of Voice Stage Support: Subjective Preference Test
Using an Auralization System for Self-Voice
by
Cheuk Wa Yuen
A Thesis Submitted to the Graduate
Faculty of Rensselaer Polytechnic Institute
in Partial Fulfillment of the
Requirements for the degree of
MASTER OF SCIENCE IN BUILDING SCIENCES,
CONCENTRATION IN ARCHITECTURAL ACOUSTICS
Approved:
_________________________________________Professor Paul T. Calamia, Thesis Adviser
_________________________________________Professor Ning Xiang, Ph.D.
Rensselaer Polytechnic Institute
Troy, New York
June 2007 (For Graduation August 2007)
Copyright 2007
by
Cheuk Wa Yuen
All Rights Reserved
CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGMENT
ABSTRACT
1. Introduction
1.1 Aim of the Thesis
1.2 Historical Review
1.2.1 Stage Acoustics and Support
1.2.2 Previous Research on Subjective Preferences in Stage Acoustics
1.2.3 Self-Voice Perception
1.2.4 Experimental Setup in Previous Self-voice Auralization and Related Sound Field Simulation
1.3 Thesis Outline
2. Self-voice Auralization System: Design and Implementation
2.1 Experimental Design Concept
2.2 Measurement Setup
2.3 Binaural Real-time Auralization System
2.3.1 System Overview
2.3.2 Implementation of Direct Air Conduction Modeling
2.3.3 Implementation of Indirect Air Conduction Modeling
2.3.4 Implementation of Headphone Equalization
2.4 BRIR Acquisition System
2.5 Experimental Procedures in Subjective Test
2.5.1 Test Subject Conditioning
2.5.2 Use of Dramatic Text in Study of Actors
2.5.3 Verifying the Consistency of Self-voice Stimuli by Monitoring the Pace of Speech
2.6 HATS Verification Tests
2.6.1 Binaural Microphones
2.6.2 Artificial Mouth
2.7 Subjective Test on Naturalness of the Auralization System
2.7.1 Evaluation on Naturalness of CIL Filter Delay Time
2.7.2 Evaluation on Naturalness of CIL Filter Level
2.8 Discussion
3. Subjective Preference Tests on Stage Acoustic Conditions for Actors
3.1 Introduction
3.2 Impulse Response Acquisition
3.3 Subjective Test Design
3.3.1 Preference ratings from paired comparison
3.3.2 Test procedures
3.4 Paired Comparison Test on Stage Locations
3.4.1 Preference study of stage locations when head orientation is look center
3.4.2 Preference study of stage locations when head orientation is look left
3.4.3 Preference study of stage locations when head orientation is look right
3.4.4 Discussion on Stage Location Preference
4. DISCUSSIONS
4.1 Reflections on subjective preferences on stage acoustic conditions
4.2 Accuracy in subjective testing
4.3 Potential ways of improving voice stage support in proscenium theaters
LIST OF TABLES
Table 1. Direct AC filter settings (Rane PE-15)
Table 2. CIL filter settings (digital parametric equalizer on 02R)
Table 3. Headphone compensation filter settings (02R master output)
Table 4. Delay times in the evaluation test of naturalness of the Direct AC insertion loss compensation (CIL) filter. *The current system has a processing delay of 0.14 ms with a setting of 0.01 ms on the DN716. **Pörschmann's tested delay times were based on taps of a 48 kHz Tucker-Davis DSP system, and are represented here in milliseconds for convenience of comparison.
LIST OF FIGURES
Figure 1. Components of perception of self-voice
Figure 2. Earthworks M30 omni-directional microphone
Figure 3. Countryman B3 miniature omni-directional microphone
Figure 4. Transfer function from Earthworks M30 to Countryman B3
Figure 5. Binaural self-voice auralization system block diagram
Figure 6. Test subject with microphone and pop filter
Figure 7. MRP-to-ERP transfer function & Direct AC filter using PE-15 parametric equalizer (1/3-octave smoothing)
Figure 8. Setup for measuring the headphone's insertion loss using an isolation tube
Figure 9. Isolation tube used in insertion loss measurement
Figure 10. Waves IR1 Convolution Reverb, loaded with a 3-second unit sample sequence
Figure 11. Impulse response trimming before importing to IR-1
Figure 12. Headphone response and compensation filter (02R master output)
Figure 13. Frontal plane section of HATS, showing binaural microphones and the related fittings (adapted from manufacturer's manual)
Figure 14. Median plane section of HATS, showing the artificial mouth
Figure 15. Binaural room impulse response acquisition system block diagram, showing how the binaural ears and artificial mouth of the HATS are connected
Figure 16. Example plot of effective duration of the running autocorrelation function of two recordings of the same text at different paces, showing the use of (τe)min as a temporal reference for monitoring subjective testing
Figure 17. Verification test of HATS in anechoic chamber at General Electric Laboratory (NY)
Figure 18. Frequency response comparison between HATS binaural microphones
Figure 19. Voice directivity of HATS: (a) horizontal plane, (b) vertical plane, in 4 octave bands (250 Hz, 500 Hz, 1 kHz & 2 kHz)
Figure 20. Comparison of voice directivity, in 3 octave bands (500 Hz, 1 kHz & 2 kHz)
Figure 21. On-axis frequency response of HATS artificial mouth. Overlays of no smoothing and 1/6-octave smoothing
Figure 22. Frequency response of B&K Artificial Mouth Type 4128C (adapted from manufacturer's datasheet)
Figure 23. MRP-to-ERP transfer function of HATS (1/3-octave smoothing)
Figure 24. Averaged frequency response of MRP-to-ERP (Direct AC) of 18 human subjects. The grey area marks the standard deviation. (Adapted from Pörschmann [2000])
Figure 25. Subjective evaluation of naturalness of delay time in Direct AC auralization. Mean score and error rate of all subjects (95% confidence)
Figure 26. Architectural plan of the main space at the RPI Playhouse. Dimensions in inches. Blue lines are dimensional guides. (CAD drawing courtesy of RPI Building Management)
Figure 27. Stage locations where BRIRs were measured. Dashed line labeled "CL" is the center line of the stage across the proscenium
Figure 28. Top view of the HATS showing 3 different head orientations in binaural room impulse response acquisition
Figure 29. Preference scores of different stage locations when head orientation is "look center" (Conditions - A: DSC, B: DSR, C: CSC, D: CSR). Normalized scores of 13 individual subjects A-M (blue bar graph) and overall average score of all subjects (red bar graph)
Figure 30. Interaural cross-correlation functions in 100 ms intervals for conditions A-D when head orientation is "look center"
Figure 31. VSS plot of binaural ears in four conditions (A: DSC, B: DSR, C: CSC, D: CSR) when head orientation is "look center"
Figure 32. Preference scores of different stage locations when head orientation is "look left" (Conditions - A: DSC, B: DSR, C: CSC, D: CSR). Normalized scores of 13 individual subjects A-M (blue bar graph) and overall average score of all subjects (red bar graph)
Figure 33. Interaural cross-correlation functions in 100 ms intervals for conditions A-D when head orientation is "look left"
Figure 34. VSS plot of binaural ears in four conditions (A: DSC, B: DSR, C: CSC, D: CSR) when head orientation is "look left"
ACKNOWLEDGMENT
I am grateful to Professor Paul Calamia for his willingness to share his knowledge and wisdom. I would also like to thank Dr. Ning Xiang for his insights and meticulous training in laboratory work and research, and Dr. Jonas Braasch for an enjoyable class in psychoacoustics.
This research would not have been possible without the help of Mr. David Larson and his generosity in lending me the Brüel & Kjær head and torso simulator. My gratitude also goes to Mr. Bob Hedeen at General Electric Laboratory (NY) for letting me use the anechoic chamber for numerous acoustical measurements.
Thunderous applause goes to all participating actors in this research. Speaking in an isolated environment without interaction with other actors or an audience was the most difficult experience for the artists. Your patience and concentration were professional. Without your support, there would be no study of voice stage support.
My study in the United States was made possible by the support of the prestigious Sir Edward Youde Memorial Fund Fellowship in Hong Kong. I hereby send my dedication to the late Sir Edward Youde. His wife Pamela Youde's continuous encouragement means a lot to me. Respect also goes to all officers of the fellowship council, especially Ms. Carnelia Fung.
I sincerely thank all my mentors and the incredible educators from whom I learned so much throughout the years at Rensselaer Polytechnic Institute, the California Institute of the Arts and the Hong Kong Academy for Performing Arts.
Last but not least, I thank my family for their continuous support. This thesis is dedicated to my parents, my late grandmother Fung Yau Hau, and my brother Cheuk Chi Yuen, who is recovering from a speech disorder after a stroke in summer 2006. His speech therapy sessions of repetitive reading are an irony in light of my research.
Not everything that can be counted counts, and not everything that counts
can be counted. [4]
Albert Einstein
ABSTRACT
The human voice plays an integral role in dramatic art. The performance of singers and actors, who perceive their voice through their ears as well as through bone conduction, is highly related to the acoustic condition they are in. Due to the proximity of the sound source and the spectral difference of transmission through the skull as compared to air, a support condition different from that for musical instrumentalists is needed. This thesis aims at initiating a standardization of methodology in subjective preference testing for voice stage support in order to collect more data for statistical analysis. A proposal of an acquisition/auralization system for self-voice and a set of subjective test procedures are presented. The subjective evaluation of the system is compared to previous designs reported in the literature, and the implementation is validated. A small playhouse has been measured and auralized using the system described, and subjective-preference tests have been conducted with 13 professionally trained actors. Their preferred stage-acoustic conditions (in relation to locations on stage and head orientations) are reported. The results show potential directions for further investigations and identify the necessary concerns in developing an objective parameter for voice stage support.
1. Introduction

In the course of theater history, from classical Greek drama to Shakespearean plays to Ibsen's naturalistic plays to 20th-century Broadway rock musicals, the human voice has always been an integral part of the dramatic art. The success of this art largely relies on how well the audience understands the words voiced by the actors. This rule has not changed for more than 2,300 years since the days of the Lycurgan Theater of Dionysus in Greece (the first great permanent theater in recorded history) [1].
While most contemporary architects and acousticians focus on auditorium acoustics in the design of performance spaces, the special acoustical needs of musicians, singers and actors are often less emphasized. Although acoustic shells have been developed for concert halls and have achieved a certain degree of success, opera houses and theaters do not receive the same attention. Stage performers are left to adapt to the acoustics of the space as best they can. [2] In many cases, performers find it difficult to hear themselves or each other intelligibly and thus fail to achieve their best tonality; in extreme cases, they fail to attain pitch accuracy and coherency, individually or in ensemble. This may result in a less-than-satisfactory performance. Performer-audience communication is then not achieved, which ultimately affects how the audience rates the performance and possibly the acoustics of the venue. It is strongly suggested that stage acoustics demands as much attention as auditorium acoustics deserves. It is logical that optimal stage acoustics is fundamental to a good overall rating of the acoustics in a performance space. (Visual appeal also plays a role in the audience's rating of the acoustics, but it is outside the scope of this thesis.)
There is currently no parameter in international standards quantifying stage acoustics. Among all acoustical parameters widely used in the industry, only one is generally accepted as a means of quantifying the ease of listening and performing on stage: Support (ST1, ST2), first proposed by A.C. Gade in 1989, which is intended to measure the contribution of early reflections to the sound from the musician's own instrument. [3]
Gade's proposal is, however, limited to instrumentalists. For singers and actors (or "voice performers," the consistent terminology for the rest of the thesis), whose instrument is the human voice, Support cannot be applied directly because of the influence of bone conduction in the perception of self-voice. Moreover, Support fails to address the frequency spectrum and orientations of early reflections, and the directions of late reverberation, which might be determining factors as well. The practicality of using a single parameter to represent voice performers' preferred stage acoustic condition remains uncertain.
But one thing is clear: whether the goal is to propose a new acoustical parameter or to validate the effectiveness of a stage acoustics design, subjective preference testing is the only viable means of solving the problem. Every human being is unique, and our preferences for a certain acoustic condition remain highly subjective and may vary enormously. The preferred stage acoustic condition depends on one's own voice quality and auditory behavior.
A new study in stage acoustics, called Voice Stage Support (VSS), is proposed to investigate auditory feedback on stage for professional voice performers. It thus excludes normal speech communication among the general population. It is not the objective of this thesis to comprehensively define VSS or to devise a new parameter experimentally. It is rather an initiation of groundwork in promoting the study of this uniquely different field in opera house and theater acoustics, which involves acoustical design, psychoacoustics and performance psychology.
One may argue that generalizing auditory preferences for the entire human population is tremendously difficult, if not impossible; it remains a challenge for acousticians.
From error to error, one discovers the entire truth.
- Sigmund Freud
1.1 Aim of the Thesis

As discussed above, in order to study Voice Stage Support (VSS), subjective preference tests are inevitable. There are some prominent difficulties in conducting such tests. In statistical analysis, the key to success is a large number of samples in the population. Thus, it takes a long time for any single researcher to acquire enough data for analysis and arrive at a convincing conclusion. This is particularly difficult for VSS because professional voice performers constitute only a very small portion of the human population. It will take the efforts of numerous studies before any comprehensive theory of VSS can be accomplished. The more subjective data collected, the better the development of the study.
Unlike most auditory experiments, which involve external stimuli (sound sources outside one's body), the perception of self-voice strictly requires one's own vocalization to generate the sound stimuli for the test. This demands real-time auralization of auditory scenes and precludes the use of pre-recorded and pre-processed test stimuli.
The demand for real-time auralization implies a system with low propagation delay (or processing delay). Similar previous acoustical studies often required specialized digital signal processing (DSP) equipment, which means very few facilities are equipped to repeat such tests and generate compatible results. However, there are now alternatives, since digital audio signal processing has become more widely available and affordable in the professional audio industry.
The argument here is that an easily obtainable, reproducible and repeatable setup for real-time auralization of self-voice would greatly promote this field of study by enabling more researchers who have access to professional singers and actors to conduct such tests, thus enlarging the sample base in the aggregation of compatible data for long-term statistical analysis.
The first objective of this thesis is to verify the reliability of a more accessible real-time auralization setup, compared to previous experimental systems found in the literature. It is also a step toward standardizing psychoacoustical experimental procedures involving real-time self-voice auralization and the respective data acquisition, both in terms of hardware setup and subject conditioning, for the purpose of voice stage support study. This thesis includes a subjective evaluation of the stage in a 200-seat playhouse. Various stage locations and head orientations are compared using the proposed auralization setup and procedures.
1.2 Historical Review

This section briefly covers the issues related to the thesis. It first summarizes the field of stage acoustics and support for performers, followed by the differences and difficulties of support for voice performers as compared to musicians (sections 1.2.1 & 1.2.2). The psychophysics of self-voice perception is introduced in section 1.2.3. Previous subjective preference tests and their auralization setups are reviewed in section 1.2.4.
1.2.1 Stage Acoustics and Support

Stage acoustics can be defined as the study of the acoustic conditions where the performers are located in a performance. On many occasions, performers are located in a stage house or a stage volume which is spatially distinct (yet not isolated) from the physical volume of the auditorium. This is particularly obvious in proscenium theaters and opera houses. In other settings, such as theater-in-the-round or multi-functional/modular theaters, the separation between stage and auditorium acoustic space is less distinct, and the two may overlap. Whatever the setting, performers demand a certain condition so that they can perform comfortably.
Stage support usually refers to the amount of auditory feedback from one's own instrument, which enables performers to hear themselves with ease so that they do not need to force the instrument to develop the tone. In A.C. Gade's pioneering work [5], it is translated into an objective parameter, SUPPORT, which includes three measures of energy ratios (ST1, ST2 and STlate) in the sound field:
ST1 = 10 log10 [ E(20, 100 ms) / E(0, 10 ms) ]

ST2 = 10 log10 [ E(20, 200 ms) / E(0, 10 ms) ]

STlate = 10 log10 [ E(100, ∞ ms) / E(0, 10 ms) ]
After several applications and analyses, these were later revised by Gade [6] as:

STearly = 10 log10 [ E(20, 100 ms) / E(0, 10 ms) ]

STtotal = 10 log10 [ E(20, 1000 ms) / E(0, 10 ms) ]

STlate = 10 log10 [ E(100, 1000 ms) / E(0, 10 ms) ]

where E(t1, t2) stands for the time integral of the squared pressure signal of an impulse response between the time limits given in the parentheses. In the above definitions, t = 0 is the arrival time of the direct sound. Units are in dB.
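As an illustration, Gade's revised measures can be computed from a measured impulse response along the following lines. This is a minimal sketch, not part of Gade's specification: the function name is mine, and locating the direct-sound arrival by the peak of the impulse response is a simplifying assumption.

```python
import numpy as np

def support_measures(ir, fs, t0=None):
    """Compute Gade's revised Support measures from an impulse response.
    ir: pressure impulse response sampled at fs (Hz), measured 1 m from
    the source; t0: sample index of the direct-sound arrival (defaults
    to the peak of |ir|, an assumption for this sketch)."""
    if t0 is None:
        t0 = int(np.argmax(np.abs(ir)))

    def E(a_ms, b_ms):
        # time integral of the squared pressure between a_ms and b_ms after t0
        a = t0 + int(a_ms * fs / 1000)
        b = t0 + int(b_ms * fs / 1000)
        return np.sum(ir[a:b] ** 2)

    ref = E(0, 10)                                   # direct sound, 0-10 ms
    st_early = 10 * np.log10(E(20, 100) / ref)
    st_total = 10 * np.log10(E(20, 1000) / ref)
    st_late = 10 * np.log10(E(100, 1000) / ref)
    return st_early, st_total, st_late
```

Since the integration intervals of STearly and STlate are subsets of that of STtotal, the sketch necessarily yields STearly ≤ STtotal and STlate ≤ STtotal for any impulse response.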
SUPPORT has been applied in various studies of acoustics for performers [7][8][9], and generally agrees with performers' subjective preferences. Nevertheless, a few points attract our attention. Firstly, Gade's setup specifies that the measurement microphone be positioned one meter (roughly the maximum distance between the player's ears and his instrument) in front of the sound source. Secondly, a single microphone is used in the measurement.
Gade reported that STearly unexpectedly succeeded in describing the ease of hearing other musicians rather than serving its intended purpose [6]. Although its reliability has yet to be ascertained over a longer period of time, some acoustical consultants have been using it as a parametric guideline.
However closely related to performers' support, it is not applicable in the case of singers and actors, because the instrument concerned, the human voice, is in close proximity to the ears, and there exists a fundamental difference between the perception of self-voice and that of any other musical instrument.
1.2.2 Previous Research on Subjective Preferences in Stage Acoustics

All previous research indicated musicians' preference (including instrumentalists and singers) for early reflections in support of their performance. Marshall and Meyer, in 1985, reported that singers prefer a strong presence of reverberation, while early reflections were only weakly preferred. [10]
Noson, later in 2000, reported that singers preferred longer reflection delay times than musicians due to the masking effect of bone-conducted sound inside the head [11]. He also discovered that the melisma singing style (non-plosive, non-fricative syllables) resulted in a shift in the preferred delay time of reflections [12]. This indicated that subjective preference depends on the content of the sound source. Noson's work also showed that singers' subjective preference for the delay time of a single reflection is proportional to the minimum effective duration (τe)min of the running autocorrelation function (ACF) of the sound source. This is in direct agreement with Ando's previous research on audiences' subjective preferences in concert halls [13]. Ando's proposal is thus believed to be applicable to musicians and singers as well.
Noson's work strongly supports the view that the unique nature of self-voice perception is the most significant factor contributing to a different preference pattern for voice performers as compared to instrumentalists.
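The effective duration τe used by Noson and Ando can be estimated from a signal frame roughly as sketched below. This is a simplified illustration, not the exact procedure from the literature: the function name and threshold handling are mine, and the ACF decay envelope is approximated by a running maximum over later lags rather than the regression-line fit to the envelope peaks that the literature uses.

```python
import numpy as np

def effective_duration(frame, fs, max_lag_ms=200.0):
    """Approximate tau_e: the lag (in ms) at which the envelope of the
    normalized autocorrelation function decays below 0.1 (-10 dB).
    Simplification: the envelope is taken as the running maximum of |ACF|
    over all later lags, instead of a regression line fitted to the peaks."""
    frame = frame - np.mean(frame)
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    acf = acf / acf[0]                             # normalize so phi(0) = 1
    n = int(fs * max_lag_ms / 1000.0)
    mag = np.abs(acf[:n])
    env = np.maximum.accumulate(mag[::-1])[::-1]   # monotone decay envelope
    below = np.nonzero(env < 0.1)[0]
    return below[0] * 1000.0 / fs if below.size else max_lag_ms
```

Intuitively, a noisy or fast-changing signal (rapid speech) decorrelates quickly and yields a short τe, while a sustained periodic signal (a held vowel or melisma) stays self-similar and yields a long τe, which is why pace of speech can be monitored through this measure.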
1.2.3 Self-Voice Perception

Perception of self-voice is constituted by air conduction (sound waves propagating from mouth to ear) and bone conduction (vibrations from the voice organs to the ear inside the human head).
The air conduction path consists mainly of the diffraction of sound coming out of the mouth opening, across the surface of the head and into the ear canal. It also includes all transmissions of vibrations of the vocal tissues from the surface of the head into the air,
and back to the ear canal. However, this latter component is believed to make a negligible contribution to our hearing [14].
The role of bone conduction was not well understood until Georg von Békésy [15], in 1949, identified bone conduction and air conduction as the sound paths pertinent to perceiving one's self-voice. Estimations derived from his various investigations show that the perceived loudness of bone conduction is of the same order of magnitude as that of air conduction. According to a more contemporary study of bone conduction by Stenfelt and Goode [16], the bone conduction path can be divided into four components: (1) sound radiation into the ear canal, (2) inertial motion of the middle ear ossicles, (3) inertial motion of fluid in the cochlea, and (4) compression and expansion of the bone encapsulating the cochlea.
On most occasions, the natural human voice is heard within an acoustic environment. With the inclusion of the acoustic space, the air conduction path can be further divided into two: direct air conduction (mouth-to-ear) and indirect air conduction (specular reflections from boundaries in the acoustic environment).
Hence, the paths constituting the perception of self-voice can be summarized as:
Direct Air Conduction (Direct AC) - from mouth to ear
Bone Conduction (BC) - through the skull
Indirect Air Conduction (Indirect AC) - reflections of the voice off room boundaries
*Direct AC, BC and Indirect AC are used throughout this thesis to denote the above auditory paths.
Their relationship is represented graphically, in a simplified fashion, in Figure 1.
Figure 1. Components of perception of self-voice
The spectral characteristics of the above pathways can be identified with human subjects. For Direct AC, this is usually obtained by measuring the transfer function between the sound pressure at microphones placed at the mouth reference point (MRP) and the ear reference point (ERP) in an anechoic environment, in which human subjects recite a selection of words effectively covering the vocal frequency range, as demonstrated by Pörschmann [17] as well as Williams and Barnes [18]. For BC, direct measurement cannot be applied. It is determined by measuring the masked threshold of pure tones or narrow-band noise while the air-conducted sound is removed (or highly attenuated) [17]. In general, the threshold increases as frequency rises. Nevertheless, in Stenfelt's research [19], it was found that sensitivity in loudness perception is higher in bone conduction than in air conduction, and this trend becomes progressively more drastic as listening level increases, suggesting that the loudness contour of bone conduction is different from that of air conduction. (The air conduction loudness contour refers to the Fletcher and Munson curves of 1933. [20]) To determine Indirect AC, a method similar to that for Direct AC can be used with human subjects. Binaural receivers can be fitted in the subjects' ear canals. By
measuring the impulse response between the source at the MRP and the binaural receivers in the ears, the transfer function of the room can be determined. The Indirect AC can then be isolated by properly removing the direct sound from the impulse response.
For Direct AC and BC, an average result can be collected from a group of subjects in the laboratory. For Indirect AC, however, this requires bringing a large number of subjects to each acoustic condition being examined (i.e., different concert halls and different stage positions), which is impractical in most cases. An alternative approach is discussed in Section 2.1 of this thesis.
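The direct-sound removal step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the guard-window length are mine, the direct sound is located by the peak of the impulse response, and a hard cut is used where a practical implementation would apply a short fade to avoid a discontinuity.

```python
import numpy as np

def isolate_indirect_ac(ir, fs, guard_ms=5.0):
    """Remove the direct sound from a mouth-to-ear room impulse response,
    leaving only the room reflections (the Indirect AC component).
    guard_ms: window after the direct-sound peak that is zeroed out; an
    assumed value, which must end before the first reflection arrives."""
    out = ir.astype(float).copy()
    t0 = int(np.argmax(np.abs(out)))               # direct-sound arrival
    out[: t0 + int(guard_ms * fs / 1000.0)] = 0.0  # zero direct sound
    return out
```

The choice of guard window is geometry-dependent: it must be long enough to cover the direct sound (and any microphone-body scattering around it) but shorter than the earliest boundary reflection for the measured position.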
1.2.4 Experimental Setup in Previous Self-voice Auralization and Related SoundField Simulation
In the subjective evaluation of a sound field for singers or actors, owing to the use
of self-voice as the sound stimulus, one must create an auralization setup capable of
reproducing (1) the direct mouth-to-ear air conduction (if that path is obstructed by
the reproduction system) and (2) the convolution product of the live signal and the
impulse response of the sound field under test, all in real time.
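These two required components can be sketched offline as the sum of two signal paths. The following is only a minimal illustration of the idea, not the setup of any of the cited studies; the function and variable names are assumptions, and a real system must perform both operations in real time with low latency.

```python
import numpy as np

def auralize_self_voice(voice, fs, direct_ir, room_ir, direct_delay_s):
    """Offline sketch of the two paths a self-voice auralization system
    must reproduce: (1) the direct mouth-to-ear path, modeled here as a
    short propagation delay plus a filter, and (2) the room response,
    i.e. the live voice convolved with a room impulse response."""
    delay = int(round(direct_delay_s * fs))
    direct = np.convolve(voice, direct_ir)              # mouth-to-ear filter
    direct = np.concatenate([np.zeros(delay), direct])  # propagation delay
    room = np.convolve(voice, room_ir)                  # reverberant path
    out = np.zeros(max(len(direct), len(room)))
    out[:len(direct)] += direct
    out[:len(room)] += room
    return out
```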
At the source pickup end, there is no consistency among previous experiments. A
microphone is usually placed in front of the mouth, but the microphone type and
microphone-to-mouth distance vary greatly between studies. In Marshall and Meyer's
setup [10], a cardioid microphone is placed at 0.5 m, pointing directly at the mouth;
in Noson's setup [11], a small headset microphone (no polar pattern specified) is
located 10 cm in front of and 5 cm below the mouth; and in Pörschmann's experiment [21],
a Sennheiser KE4 omni-directional miniature microphone is positioned precisely at the
mouth reference point (MRP), 40 mm in front of the lips, with a holding device attached
to the headphone's harness.
At the sound reproduction end, there are generally two different approaches: (1)
spatially distributed loudspeakers for reproduction of delayed reflections, as found in
Marshall and Noson; and (2) open-back circumaural headphones with compensation filters
for binaural sound-field simulation, as found in Pörschmann. The two reproduction
approaches have their pros and cons. Using loudspeakers inherently creates a possible
acoustic feedback path: a delayed reflection is picked up by the microphone and further
reflections are then regenerated through the system. This leads to unintended stimuli
and ultimately affects the accuracy of the subjective test. The advantage is that
subjects are free of any body attachment. However, a loudspeaker system requires
comparatively more space and is usually not portable.
The headphone system, on the contrary, is less demanding of laboratory space and is
fairly portable and easy to set up. The disadvantage of a headphone system is the
necessity of implementing a compensation filter for the direct mouth-to-ear air
conduction path because, even when open-back circumaural headphones are used, the
headphone enclosures inevitably impose an insertion loss between the mouth and the
ears; high frequencies are usually attenuated. Moreover, the compensated (filtered)
signal must be delayed before reaching the headphones so that it is in sync with the
natural air conduction, to avoid a comb-filtering effect. Pörschmann [21] has shown
that such a reproduction scheme can achieve a certain degree of naturalness in a
virtual auditory environment. Another issue with headphones is the occlusion effect
and the return of radiation by the human head. The occlusion effect refers to the
accentuation of low-frequency sensitivity when the ear canal is obstructed; details
can be found in the literature by Tonndorf [22] and Dean [23]. Open-back headphones
can minimize this effect and have been accepted and used in experiments, provided
that there is enough padding between the headphone hardware and the test subject.
1.3 Thesis Outline
In Chapter 2, the proposed self-voice auralization system dedicated to the investigation
of voice stage support is introduced. It includes the binaural impulse response
acquisition system, the binaural auralization system and the subjective preference test
procedures. A validation test with human subjects, obtaining subjective ratings of the
naturalness of the system, is also reported; the results were compared with evaluations
of the setups used in previous studies. The proposed system was then used to investigate
the stage acoustic conditions of a playhouse. Chapter 3 presents the results and
analysis of actors' subjective preferences for various stage locations and head
orientations. Chapter 4 discusses the experimental results, followed by suggestions
for future work in the field of voice stage support.
2. Self-voice Auralization System: Design and Implementation
2.1 Experimental Design Concept

As discussed in section 1.2.3, using human subjects to obtain averaged Indirect AC data
is impractical when many acoustic spaces and conditions have to be examined. Portability
and repeatability were therefore the first criteria of the current design. To achieve
them, a dummy head was proposed to substitute for the human head in the measurement and
acquisition process, with an artificial mouth representing the human voice source. Dunn
and Farnsworth [24] showed that a person's own voice can be modeled by a source at the
opening of the mouth; a similar approach has been taken and examined by Bozzoli,
Viktorovitch and Farina [25].
The design consists of three basic components:
- Binaural Room Impulse Response (BRIR) Acquisition
- Binaural Real-time Auralization
- Experimental Procedures for Test Subjects
Since the experimental design was logically driven by the implementation of auralizing
the conduction paths, the design of the real-time auralization is described first
(section 2.3), followed by the BRIR acquisition (section 2.4) and the experimental
procedures (section 2.5).
2.2 Measurement Setup
In this thesis, the acoustical measurement system was Electronic & Acoustic System
Evaluation & Response Analysis (EASERA) v1.0.60 software running on a Pentium M based
PC, with a Sound Devices USB Pre (USB-powered audio interface) for audio input/output.
The sampling rate was set at 48 kHz and the bit depth at 16 bits. The excitation signal
was a pink-weighted sine sweep with 1 pre-send and 3 averages, customized in EASERA,
unless otherwise stated.
Two measurement microphones were used: an Earthwork M30 and a Countryman B3, both
omni-directional (Figure 2 & Figure 3). The M30 was chosen for its high sound-pressure
capability, whereas the B3 was chosen for its compact size, for measurement positions
the M30 cannot reach. Their transfer functions were first measured in order to
compensate for their difference in frequency characteristics when they were used
simultaneously. A pink sine sweep was produced by a Yamaha MSP5 2-way studio monitor
loudspeaker at a distance of 1 m in front of the microphones. The microphones' outputs
were recorded using the EASERA setup above, and the transfer function between them was
then obtained for use as an equalization function in subsequent calculations. Figure 4
shows the microphones' transfer function.
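The equalization function can be understood as a regularized spectral ratio between simultaneous recordings of the two microphones. The sketch below illustrates that idea only; it is not EASERA's internal procedure, and the function name and regularization constant are assumptions.

```python
import numpy as np

def relative_transfer_function(ref, test, fs, eps=1e-10):
    """Estimate the transfer function from a reference microphone (e.g.
    the M30) to a second microphone (e.g. the B3) recorded simultaneously
    from the same source, as a frequency-domain spectral ratio.

    ref, test : equal-length time-domain recordings at sampling rate fs.
    Returns (frequencies in Hz, magnitude in dB).
    """
    n = len(ref)
    REF = np.fft.rfft(ref, n)
    TEST = np.fft.rfft(test, n)
    # Spectral division with a small regularization term so bins where
    # the reference spectrum is near zero do not blow up.
    H = TEST * np.conj(REF) / (np.abs(REF) ** 2 + eps)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mag_db = 20.0 * np.log10(np.abs(H) + eps)
    return freqs, mag_db
```

In subsequent measurements the resulting magnitude curve would simply be added (in dB) to the B3's readings to refer them back to the M30.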
Figure 2 Earthwork M30 omni-directional microphone
Figure 3 Countryman B3 miniature omni-directional microphone
Unless otherwise stated, the loudspeaker used as the excitation source is an artificial
mouth in a dummy head; the detailed structure of the dummy head is described in section
2.4. All measurements were conducted in a hemi-anechoic chamber unless otherwise stated.
Figure 4 Transfer function from Earthwork M30 to Countryman B3
2.3 Binaural Real-time Auralization System

2.3.1 System Overview
In this research, only Direct AC and Indirect AC need to be implemented: since the human
subject's own voice is used as the sound stimulus in real-time auralization, the
bone-conduction component is produced naturally inside the subject's head. The auralization
system used a topology of two separate paths to model the direct air conduction (Direct
AC) and indirect air conduction (Indirect AC). All auralizations were conducted in a
hemi-anechoic chamber.
Figure 5 Binaural self-voice auralization system block diagram
Figure 5 shows the system block diagram. The setup used an Earthwork M30
omni-directional microphone to pick up the subject's voice. It was positioned at the
mouth reference point MRP (80 mm from the lips), separated from the subject's mouth by
a metal-grille pop filter mounted 40 mm from the diaphragm so as to eliminate microphone
diaphragm excursion caused by plosive sounds. Figure 6 shows the relationship between
the subject's mouth, the pop filter and the microphone. The microphone signal was split
into two and connected to input channels 1 & 2 (Ch1 & Ch2) on a Yamaha 02R digital
mixing console, with identical and repeatable gain settings using the step-gain control
on the pre-amplifiers. The gain setting was optimized to achieve a peak at -10 dBFS
using a microphone calibrator producing a 1 kHz sine tone at 105 dBA. The resulting
line-level signals were routed to two paths, Path 1 & Path 2, modeling the Direct AC
and Indirect AC respectively.
Figure 6 Test subject with microphone and pop filter.
(Path 1) Through the channel-insert-send before the A/D stage on the Ch1, the pre-
amplified signal was connected to a Klark Teknik DN-716 single-channel digital delay
unit (with built-in 16-bit A/D & D/A conversion) cascaded with a Rane PE-15 4-band
analog parametric equalizer. The analog output was returned to the channel-insert-return
of Ch1, going into the A/D conversion stage on the 02R. The delay unit and parametric
equalizer were used to model the mouth-to-ear propagation delay and transfer function
respectively. Their implementations are further described in section 2.3.2. The 02R on-
board digital equalizer on Ch1 was used as the compensation filter for the insertion loss
introduced by the auralization headphones. Details are described in section 2.3.4.
(Path 2) Through Ch2, the signal was A/D converted and digitally routed, via an optical
connection (TOSLINK) to a Digidesign TDM MIX digital audio workstation (Motorola
DSP-based PCI mixing engine running in a Macintosh dual processor 500MHz G4 com-
puter) using a Digidesign ADAT Bridge digital interface. The workstation was running
ProTools audio software with Waves IR1 (dual-channel convolution reverberation plug-
in) through which a BRIR can be loaded and convolved with the incoming signal. The
output (convolved) signal was returned to the ADAT IN (TAPE IN 1) on the 02R mixer
via the ADAT Bridge digitally. The setup described was used to model the Indirect AC,
also called the room response. The convolution implementation is further described in
section 2.3.3.
Both returns from Path 1 and Path 2 were internally routed to the 02R's main stereo
output in the digital domain.
The stereo output of the 02R was connected to a Samson HP-5 headphone amplifier
driving a pair of Audio-Technica ATH-A700 open-back headphones. A compensation
filter was implemented using the on-board equalizer on the 02R stereo output channel to
remedy the frequency anomalies induced by the headphones; it is described in section
2.3.4.
The A/D & D/A conversions in the 02R and DN-716 are all 16-bit, 48 kHz, and each
conversion stage introduces a processing delay of 0.02 ms. The processing delay of
Path 1 measured 0.14 ms (A/D & D/A conversion and the filter network in the DN-716,
plus the conversion stages in the 02R) with the DN-716 at its lowest setting of 0.01 ms,
whereas the processing delay of Path 2 measured 11.74 ms (A/D & D/A conversion plus the
11.6 ms latency of IR1) with IR1 engaged and loaded with a 3-second-long unit-sample
sequence. All levels were set at unity gain during the delay measurement.
2.3.2 Implementation of Direct Air Conduction Modeling
In Path 1, which is designed to model the Direct AC, the MRP-to-ERP transfer function
(measured in section 2.6.2.3) is approximated using the PE-15 four-band parametric
equalizer, the DN-716 digital delay and the internal equalizer on the 02R.
2.3.2.1 Determining the PE-15 filter setting

The frequency response of the MRP-to-ERP impulse response was approximated using an
analog parametric filter. The precise filter settings were determined by overlaying the
transfer function of the PE-15 against the MRP-to-ERP magnitude-spectrum plot. Using the
Live mode in EASERA, the MRP-to-ERP plot was pre-loaded; pink noise was fed to the PE-15
at line level and its output was connected directly back to EASERA to obtain a live
magnitude-spectrum plot while adjusting the PE-15 settings. Figure 7 shows an overlay
magnitude-spectrum plot of the MRP-to-ERP impulse response and the determined PE-15
filter settings (see Table 1). The plot was generated in Matlab.
Figure 7 MRP-to-ERP Transfer Function & Direct AC Filter using PE-15 parametric equalizer (1/3
octave smoothing)
Table 1. Direct AC Filter Settings (Rane PE-15)
Direct AC Filter Band 1 Band 2 Band 3 Output Level
Gain (dB) +4.0 -5.5 -8.0 -18.0
Frequency (Hz) 90 800 7k -
Q 1.2 0.26 0.45 -
2.3.2.2 Determining the DN-716 delay time

The initial arrival time of the MRP-to-ERP impulse response was implemented using a
digital delay line. The mean MRP-to-ERP propagation delay in humans is 300 μs (0.3 ms),
as reported by Pörschmann [17]. Thus, subtracting the 0.14 ms processing delay, the
delay time to be inserted is 0.16 ms (corresponding to a panel display of 0.17 ms on
the DN-716). The MRP-to-ERP transfer function measurement is described in section
2.6.2.3, and various delay times are evaluated in section 2.7.1.
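The subtraction above can be captured in a small helper (hypothetical, for illustration only): the delay dialed into the unit is the target mouth-to-ear propagation delay minus the electrical path's own processing latency.

```python
def direct_ac_delay_ms(propagation_ms=0.30, processing_ms=0.14):
    """Delay to insert in the Direct AC path so that the total electrical
    delay matches the mean MRP-to-ERP propagation delay. Defaults are the
    values from the text: 0.30 ms mean propagation delay minus the
    0.14 ms measured processing delay of Path 1."""
    if processing_ms > propagation_ms:
        # The electrical path is already too slow; no inserted delay can fix it.
        raise ValueError("processing latency exceeds the target delay")
    return propagation_ms - processing_ms
```

With the default values, `round(direct_ac_delay_ms(), 2)` gives 0.16, matching the 0.16 ms inserted into the DN-716.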
2.3.2.3 Determining the 02R parametric equalizer setting
The headphones used in auralization introduced an insertion loss in the Direct AC path.
As a result, Path 1 essentially functions as Direct AC modeling and compensation of in-
sertion loss (CIL) induced by the headphones. The CIL filter was implemented using the
digital parametric equalizer on Channel 1 in the 02R mixer.
Two microphones, the M30 and B3, were first calibrated for identical gain and then used
to measure the insertion loss of the headphones, as shown in Figure 8.
Figure 8 Setup for measuring the headphone's insertion loss using an isolation tube.
Figure 9 Isolation tube used in insertion loss measurement
An isolation tube (see Figure 9) was built to measure the insertion loss of the
headphones. The tube was 300 mm in length and 250 mm in diameter. It had a 50 mm thick
soft fiberglass outer shell with a thin layer of cotton lining the inner wall. The
headphone was carefully mounted to the tube opening and sealed with rubber to close any
air gaps. A Yamaha MSP5 2-way loudspeaker was used to generate the measurement signal,
while an M30 microphone was positioned close to the headphone enclosure outside the
tube and a B3 microphone was mounted 10 mm from the headphone transducer inside the
tube. The transfer function was recorded using EASERA. The inverted magnitude-spectrum
plot represents the compensation filter.
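The inversion step can be sketched as follows. The gain clamp is an added practical safeguard (an assumption, not part of the thesis procedure) so that deep measurement notches do not demand extreme equalizer boost:

```python
import numpy as np

def compensation_filter_db(measured_db, max_boost_db=12.0):
    """Invert a measured magnitude response (in dB) to obtain a
    compensation-filter target: 'the inverted magnitude-spectrum plot
    represents the compensation filter'. The clamp limits how much
    boost or cut the equalizer is asked to provide."""
    comp = -np.asarray(measured_db, dtype=float)
    return np.clip(comp, -max_boost_db, max_boost_db)
```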
The internal digital parametric equalizer in the 02R was used to approximate the
compensation filter response using an adjustment method similar to that described above
for the PE-15 (section 2.3.2.1). To ensure unity gain through the 02R during filter
adjustment, a sine tone was fed from EASERA and split into two: one branch was routed
back to EASERA channel 1, and the other was passed through the 02R and returned to
EASERA channel 2. The CIL filter implemented in the 02R is shown in Table 2.
Table 2. CIL filter settings (digital parametric equalizer on 02R)
CIL filter Band 1 Band 2 Band 3 Band 4
Gain (dB) +2.0 +3.0 - -
Frequency (Hz) 4k 10k - -
Q 0.2 0.1 - -
2.3.3 Implementation of Indirect Air Conduction Modeling
In Path 2, which modeled Indirect AC, a real-time binaural convolution was applied
using the Waves IR1 Convolution Reverb plug-in (see Figure 10). In order to time-align
correctly, the room impulse response to be convolved was first trimmed to eliminate the
direct sound. The length of the trim was determined by the propagation delay in Path 2
which measured 11.74ms (see Figure 11). A Hann window was applied to the trimmed
impulse response before importing to IR-1. A shortcoming resulting from this latency is
the inability to reproduce the portion of the room response between the Direct AC and
the early sound field up to 11.74 ms (approximately 12 feet of travel distance, on the
order of the height of a 6-foot-tall person); this portion may include diffraction from
the subject's own body and the first back-scattered sound from the floor or other
nearby boundaries. Nevertheless, the focus of the current research is stage acoustics,
which seldom involves boundaries in close proximity (at least not in the case of this
thesis). Also, there is no direct specular reflection path between the mouth and the
floor, and the back-scattered rays from the floor were assumed to have minimal
influence on the perception of self-voice.
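The trimming step can be sketched as below. The 11.74 ms trim length is the measured Path 2 latency from the text; the length and placement of the window are assumptions, since the thesis does not state exactly how the Hann window was applied (a rising half-Hann fade-in at the cut point is used here).

```python
import numpy as np

def trim_brir(brir, fs=48000, latency_ms=11.74, fade_ms=1.0):
    """Trim a measured BRIR so that the portion reproduced by the
    convolver begins where the real-time path's processing latency ends,
    discarding the direct sound. A short half-Hann fade-in smooths the
    truncation point to avoid an audible click."""
    start = int(round(latency_ms * 1e-3 * fs))
    trimmed = np.array(brir[start:], dtype=float)
    n_fade = min(int(round(fade_ms * 1e-3 * fs)), len(trimmed))
    # Rising half of a Hann window, from 0 up toward 1.
    fade = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_fade) / max(n_fade, 1)))
    trimmed[:n_fade] *= fade
    return trimmed
```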
Figure 10 Waves IR1 Convolution Reverb, loaded with a 3-second unit sample sequence
Figure 11 Impulse response trimming before importing to IR-1
2.3.4 Implementation of Headphone Equalization
The frequency response of the headphones was measured in a hemi-anechoic chamber with
an M30 microphone positioned 10 mm in front of the headphone's transducer, and the
impulse response was recorded using EASERA. The internal digital parametric equalizer
on the 02R was used to approximate the headphone compensation filter using the method
described in section 2.3.2.1. The headphone's response is plotted against the inverted
compensation filter in Figure 12, and the filter settings are shown in Table 3.
Figure 12 Headphone response and compensation filter (02R master output)
Table 3. Headphone compensation filter settings (02R master output)
Headphone
compensation
Band 1 Band 2 Band 3 Band 4
Gain (dB) +8.0 +2.5 -4.0 -2.5
Frequency (Hz) 40 185 1.7k 9.1k
Q 0.1 0.3 1.0 1.2
2.4 BRIR Acquisition System
In order to achieve repeatability in binaural acquisition, a head simulator (sometimes
called a dummy head) was used. For the particular interest of this thesis, the sound
source and receivers correspond to the human mouth and ears; thus a loudspeaker and
microphones were installed inside the dummy head. The heart of the design was a Brüel &
Kjær Type 5930 head and torso simulator (HATS). The head geometry theoretically
represents an average of human head physical features, in compliance with ITU-T Rec.
P.58, IEC 60959 and ANSI S3.36-1985. It was retrofitted with a 50 mm diameter
loudspeaker unit inside the mouth cavity as an artificial mouth. The microphones mounted
inside the HATS were Brüel & Kjær Type 4010 omni-directional transducers, with the
grilles of the capsules aligned with the openings of the ear canals to act as binaural
receivers (see Appendix for the free-field microphone specifications). The structure of
the HATS is shown in Figure 13 and the position of the artificial mouth is illustrated
in Figure 14. Detailed dimensional information on the HATS can be obtained from the
Brüel & Kjær website (www.bksv.com).
Figure 13 Frontal plane section of HATS, showing binaural microphones and the related
fittings. (Adapted from manufacturer's manual)
Figure 14 Median plane section of HATS, showing the artificial mouth
To validate the representativeness of the HATS, a series of verification tests was
conducted to examine the binaural microphone characteristics, the artificial mouth
frequency response and the MRP-to-ERP transfer function (see section 2.6).
In BRIR acquisition, the HATS was supported by a microphone stand such that its height
measured 5 ft 7 in (67 inches, approximately 1.7 m), which is about the mean height of
men and women aged 20 to 74 as reported by the U.S. Department of Health and Human
Services in 2004 [26]. The binaural microphones inside the HATS were connected to the
EASERA measurement system during acquisition. Before data acquisition, their gain
settings were optimized to -10 dBFS using a microphone calibrator producing a 1 kHz
sine tone at 105 dBA. Since the binaural microphones cannot easily be removed from the
HATS for calibration, and any such repetitive preparation could introduce microphone
position misalignment, a compromise approach was adopted: the calibrator was positioned
as close to each binaural microphone as possible while remaining on axis.
The artificial mouth was driven by a Samson Servo 170 power amplifier which has
a published linear frequency response between 20Hz and 20kHz. Figure 15 shows the
BRIR acquisition system block diagram.
Figure 15 Binaural room impulse response acquisition system block diagram, showing how the bin-
aural ears and artificial mouth of the HATS are connected
As described in section 2.3.3, the binaural room impulse response was trimmed such
that the direct sound (including any contribution from internal (bone) conduction and
direct air conduction) in the BRIR was not included in the convolution.
2.5 Experimental Procedures in Subjective Test
2.5.1 Test Subject Conditioning

In the current research it was inevitable to use the test subjects' own voices in
real-time auralization; the sound stimuli are thus highly variable and may lead to
erroneous results owing to variance in the subjects' conditions, both physiological
(e.g., vocal fatigue) and psychological (e.g., personal emotion). In order to minimize
these experimental errors, a set of experimental procedures for human subjects was
adapted and modified from the method used by Jónsdóttir et al. [27]. In each
measurement, subjects were asked to read a piece of text at least twice before
subjective scoring. For each subject, the same set of tests was repeated 6 times within
a period of 21 days; three of the trials were held in the morning/midday, and the other
three in the late afternoon/early evening. Before each experimental session, subjects
were asked to warm up their voices to performing condition (which takes about 10-20
minutes), and subjects confirmed they were under no influence of drugs or alcohol.
These procedures were expected to lessen the impact of subjects' individual conditions
over a period of time.
2.5.2 Use of Dramatic Text in the Study of Actors

The objective of the thesis is to investigate voice support in theaters, and so the
subjects were professionally trained actors. Instead of sentences from the Harvard
Psychoacoustic Sentence Lists (often recommended for psychoacoustic research on
speech), a short edited excerpt from Shakespeare's play Hamlet was chosen for its
well-known dramatic expression and its inclusion of most vowels and consonants in
English.
To be, or not to be; that is the question;
To die, to sleep, no more;
and by a sleep to say we end the heart-ache
and the thousand natural shocks.
The reason for using a dramatic text was that actors seldom read sentences without
literal meaning; the Harvard sentences are far from reality and were considered
unrepresentative of acting in theater. The second argument, particular to this thesis,
was that actors always know what they are going to say (as they have rehearsed before
the performance), and because the current research concerns self-perception, there is
no issue of intelligibility of unexpected words or vowel sounds from an unknown sound
source.
Subjects were given a sample recording of the text prior to each test session to get
accustomed to the rhythm and speed of the speech, and the entire test was recorded and
analyzed for pace afterwards.
2.5.3 Verifying the Consistency of Self-voice Stimuli by Monitoring the Pace of Speech

A method of pace analysis was developed with reference to Ando's work on the
relationship between subjective preference and objective parameters. Ando found that
the minimum effective duration (τe)min of the running autocorrelation function (ACF)
of the sound source is proportional to the most preferred delay of a single reflection
[28]. Ando's results suggest that the faster the tempo of the stimuli, the lower the
resulting (τe)min and thus the shorter the preferred delay of the first reflection. In
the current thesis, this parameter was used as a reference for the pace of speech.
The ACF is defined by:

    Φp(τ) = lim(T→∞) (1/2T) ∫ from −T to +T of p′(t) p′(t + τ) dt

where p′(t) = p(t) * s(t), and s(t) corresponds to ear sensitivity, chosen as the
impulse response of an A-weighting filter, as suggested by Ando.

The normalized ACF is expressed as:

    φp(τ) = Φp(τ) / Φp(0)

The effective duration of the envelope of the normalized ACF is denoted τe, defined by
the initial 10 dB of the ACF decay (also called the ten-percent decay), obtained by
linear regression.

τe is obtained at 100 ms intervals of the running source signal, and the minimum value
is denoted (τe)min. Figure 16 shows the values of τe against elapsed time for two
speech recordings. The (τe)min for samples 1 and 2 is 0.39 s and 0.37 s respectively,
indicating that a slight variation in the pace of speech does not change the overall
temporal characteristics.
Figure 16 Example plot of the effective duration of the running autocorrelation
function for two recordings of the same text at different paces, showing the use of
(τe)min as a temporal reference for monitoring subjective testing.
From Figure 16, a slight shift of the peaks can be observed, which indicates the
different paces of speech in the two samples (red is faster and blue is slower).
Through a number of trials, it was shown that (τe)min can be used as a robust
quantifier for verifying the speech pace in subjective testing.

By calculating (τe)min on each trial, the pace was controlled by maintaining a
deviation of less than 5% from the (τe)min value of the sample recording. This
procedure ensured that all subjects gave their preference scores under the same
conditions, within a fixed tolerance.
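A simplified computation of τe and (τe)min following the description above can be sketched as follows. The A-weighting pre-filter used in the thesis is omitted for brevity, and the analysis frame length is an assumption (the thesis states only the 100 ms interval), so the numbers this sketch produces are illustrative only.

```python
import numpy as np

def effective_duration(frame, fs, max_lag_ms=200.0):
    """Effective duration tau_e of one signal frame: fit a straight line
    (linear regression) to 10*log10|normalized ACF| over its initial
    decay, and return the lag in seconds at which the fit reaches -10 dB."""
    frame = np.asarray(frame, dtype=float) - np.mean(frame)
    n = len(frame)
    # FFT-based autocorrelation, zero-padded to avoid circular wrap-around.
    nfft = 1 << (2 * n - 1).bit_length()
    spec = np.fft.rfft(frame, nfft)
    acf = np.fft.irfft(spec * np.conj(spec))[:n]
    acf = acf / acf[0]                                # phi(0) = 1
    max_lag = min(int(max_lag_ms * 1e-3 * fs), n - 1)
    lags = np.arange(1, max_lag) / fs
    level_db = 10.0 * np.log10(np.abs(acf[1:max_lag]) + 1e-12)
    decay = level_db >= -10.0                         # initial decay only
    if decay.sum() < 2:
        return lags[0]
    slope, intercept = np.polyfit(lags[decay], level_db[decay], 1)
    return (-10.0 - intercept) / slope                # fit crosses -10 dB

def tau_e_min(signal, fs, frame_s=2.0, hop_s=0.1):
    """Minimum tau_e over a running analysis (100 ms hop as in the text;
    the 2 s frame length is an assumption)."""
    hop, frame_n = int(hop_s * fs), int(frame_s * fs)
    starts = range(0, len(signal) - frame_n + 1, hop)
    return min(effective_duration(signal[s:s + frame_n], fs) for s in starts)
```

In use, (τe)min of each trial recording would be compared against the (τe)min of the sample recording and accepted if it deviates by less than 5%.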
A total of 15 subjects participated in this project, but 2 of them were dropped because
they did not manage to attend all required tests. The final 13 subjects included 6
Caucasians, 2 Asians, 3 Black Americans and 2 Latin Americans.
All subjective tests in this thesis followed the above scheme unless otherwise stated.
2.6 HATS Verification Tests

All evaluation measurements of the HATS were conducted in an anechoic chamber at the
General Electric Laboratory, Niskayuna, NY (see Figure 17). The volume of the chamber
was 5100 cu. ft. and the background noise was rated at under 20 dBA. Room temperature
and humidity were measured before and after each experimental session and showed
negligible variation.
Figure 17 Verification Test of HATS in anechoic chamber at General Electric Laboratory (NY).
2.6.1 Binaural Microphones
The microphones at the two ears of the HATS were first calibrated using a sine-tone
calibrator (1 kHz at 105 dBA) to achieve identical gain. A dodecahedron loudspeaker was
positioned 1 meter away from the HATS, directly in front of the mouth opening. The
frequency responses were averaged over 3 measurements taken with different orientations
of the dodecahedron loudspeaker, to minimize any potential error caused by its
directional characteristics in the near field.
Figure 18 shows a frequency response overlay of the two binaural microphones.
Figure 18 Frequency response comparison between HATS binaural microphones
The result showed acceptable differences between the two binaural microphones; the
peaks and notches are due to the head-related transfer function and the pinnae of the
HATS.
2.6.2 Artificial Mouth
In BRIR acquisition, the loudspeaker unit inside the HATS was used to generate
excitation signals in the HATS mouth cavity; it is thus called the artificial mouth.
Since no specifications of this retrofitted artificial mouth were readily available,
its directivity, on-axis response, and MRP-to-ERP transfer function were measured.
2.6.2.1 Directivity

The directivity measurements were conducted at 15-degree resolution in the horizontal
plane (full 360 degrees) and the vertical plane (from -45 to 135 degrees). The HATS
remained stationary throughout the measurement session while the microphone position
was manually adjusted for each measurement.
Figure 19 Voice directivity of HATS (a) horizontal plane (b) vertical plane, in 4
octave bands (250 Hz, 500 Hz, 1 kHz & 2 kHz)
The directivity was plotted using Matlab and compared with data from previous
artificial heads and with mean values of human voices, as shown in Figure 20.
Figure 20 Comparison of voice directivity, in 3 octave bands (500 Hz, 1 kHz & 2 kHz)
The current HATS was found to be similar to the mean human values reported in the
literature. Although this does not directly imply higher reliability of the
experimental results, it does suggest that the HATS is a good representation of the
human voice source.
2.6.2.2 On-axis Frequency Response
The on-axis frequency response of the HATS was measured at 1m directly in front
of the artificial mouth. Figure 21 shows the frequency response. It was rather erratic,
which may have resulted from the construction of the HATS: the HATS itself has inherent
resonance characteristics, and the head cavity is not damped by any material.
The prominent peaks in the high-mid frequency range were believed to be related to the
resonant frequencies of the HATS.
Figure 21 On-axis frequency response of HATS artificial mouth. Overlays of no-smoothing and
1/6-octave smoothing.
Two resonant peaks were observed at 6.6 kHz and 7.6 kHz, roughly corresponding to
wavelengths of 0.05 m and 0.045 m. They were believed to be related to the position of
the loudspeaker unit with respect to the HATS internal cavity.
The current artificial mouth response was compared to another commercially available
model, B&K Type 4128C artificial mouth (see Figure 22).
Figure 22 Frequency response of B&K Artificial Mouth Type 4128C (adapted from
manufacturer's datasheet)
The frequency response of a loudspeaker unit coupled to the HATS is expected to display
anomalies due to interaction with the physical geometry; a flat frequency response is
hardly achievable in such a device. The assumption here is that, given the similar
voice directivity analyzed in section 2.6.2.1, the artificial mouth is acceptable for
the current study.
2.6.2.3 MRP-to-ERP Transfer Function

Two microphones were used in this measurement: the Earthwork M30 omni-directional
microphone and the Countryman B3 miniature omni-directional microphone. The two
microphones were first calibrated by measuring their transfer function (see section
2.2), which was used as an equalization function in the measurement.

The M30 and B3 were placed at the MRP and ERP respectively. The B3 capsule was
suspended so that there was no physical contact with the HATS, to eliminate any
internal vibration transmission from the artificial mouth to the microphone via the
HATS surface. The transfer function was measured in EASERA and compensated with the
equalization described above. The result is shown in Figure 23.
Figure 23 MRP-to-ERP transfer function of HATS (1/3-octave smoothing)
The above plot was smoothed and zoomed in order to compare with the averaged results
for human subjects in Pörschmann's study, as shown in Figure 24.
Figure 24 Averaged frequency response of MRP-to-ERP (Direct AC) for 18 human subjects.
The grey area marks the standard deviation. (Adapted from Pörschmann [2000])
The HATS frequency response was considered to fall roughly within the average human
values, except that it was slightly lower in the frequency range between 300 Hz and
2 kHz.
2.7 Subjective Test on Naturalness of the Auralization System

Considering the variation among human heads, the validity of the auralization setup
can be proven by subjective tests. The aim of this evaluation test is to find the
optimal delay time and filter implementation for Direct AC, as explained in sections
2.3.2.1 and 2.3.2.2. In the evaluation of the system's naturalness, Indirect AC (as
stated in section 1.2.3) was ignored.
2.7.1 Evaluation on Naturalness of CIL Filter Delay Time
The delay time implemented for Direct AC represents the propagation delay from the mouth opening to the ear canal entrance. It is critical in the auralization process because the delayed voice reproduced by the headphones is acoustically combined with the direct sound traveling from the mouth through the open-back headphones before entering the subject's ear canal. An improper delay time would induce perceivable echoes or comb-filtering effects. Since everyone's head geometry and facial features differ, the appropriate delay time may vary. It was assumed that the comb filters are inaudible if the separation in arrival time is short enough that the lowest null frequency of the comb filter lies beyond the audible spectrum. For instance, a time delay of 20 microseconds would produce comb filtering starting at a frequency of 25 kHz.
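The relationship between the delay and the first comb-filter null follows from destructive interference at half a period. A minimal sketch of this arithmetic (illustrative Python, not part of the original test software):

```python
def first_null_hz(delay_s: float) -> float:
    """First null of the comb filter formed when a signal is summed
    with a copy of itself delayed by delay_s seconds: destructive
    interference first occurs when the delay equals half a period,
    so f_null = 1 / (2 * delay)."""
    return 1.0 / (2.0 * delay_s)

# A 20-microsecond delay pushes the first null to about 25 kHz,
# beyond the audible range; the 0.14 ms system delay puts it near
# 3.6 kHz, well within it.
print(first_null_hz(20e-6))    # ~25000 Hz
print(first_null_hz(0.14e-3))  # ~3571 Hz
```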
This test aimed to find the most natural and representative delay time to use in auralization without compromising the perceptual naturalness of self-voice. The tests were conducted with 13 subjects following the procedures stated in section 2.5, using the auralization setup described in section 2.1.
During the test, subjects were exposed to a random sequence of delay times comprising 3 repetitions of the 8 delay settings. The random sequence was generated in MATLAB individually for each test set. For each setting, subjects were asked to read the given text in full twice, once with headphones and once without. Subjects were allowed to repeat reading and listening until they were ready to compare the sound of their own voice with headphones against that without headphones (the reference). The subjects then rated the degree of similarity on a 7-point category scale from 1 to 7 (1 being very dissimilar and 7 being very similar) for the given setting and notified the experimenter via the microphone before changing to the next delay setting.
The aims of this evaluation test were to find the optimal delay time for the CIL filter for auralization and to validate the effectiveness of the current system by comparison with results of previous experiments. Thus, the delay times evaluated were
specifically chosen to match those found in the literature in order to make a direct comparison (see Table 4).
Table 4. Delay times in the evaluation test of naturalness of the Direct AC insertion loss compensation filter (CIL filter). *The current system has a processing delay of 0.14 ms with a setting of 0.01 ms on the DN-716. **Pörschmann's tested delay times were based on taps of a 48 kHz Tucker-Davis DSP system; they are expressed here in milliseconds for ease of comparison.

Direct AC Delay Time        0.14ms   0.3ms     0.47ms    0.63ms     0.97ms     1.63ms     2.30ms     2.97ms
DN-716 Delay Setting*       0.01ms   0.17ms    0.34ms    0.50ms     0.84ms     1.50ms     2.17ms     2.84ms
Pörschmann's Test [2001]**  -        0.3ms     0.47ms    0.63ms     0.97ms     1.63ms     2.30ms     2.97ms
                                     (0 taps)  (8 taps)  (16 taps)  (32 taps)  (64 taps)  (96 taps)  (128 taps)
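The Pörschmann row of Table 4 is consistent with a fixed 0.3 ms baseline (the nominal MRP-to-ERP propagation delay) plus the tap delay of a 48 kHz DSP. A quick sketch of that conversion (hypothetical helper written for illustration, assuming exactly this interpretation of the taps):

```python
def taps_to_delay_ms(taps: int, fs_hz: int = 48000, base_ms: float = 0.3) -> float:
    """Total Direct AC delay in milliseconds: an assumed 0.3 ms
    baseline plus taps/fs of added digital delay."""
    return base_ms + 1000.0 * taps / fs_hz

# Reproduces the bottom row of Table 4 when rounded to 2 decimals.
for taps in (0, 8, 16, 32, 64, 96, 128):
    print(taps, round(taps_to_delay_ms(taps), 2))
```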
The same test was repeated 6 times for each subject, yielding 144 data points per subject. All results were combined and compared with Pörschmann's published results, as shown in Figure 25.
Figure 25 Subjective evaluation of naturalness of delay time in Direct AC auralization. Mean scores of all subjects with 95% confidence intervals.
The comparison shows that the current evaluation exhibits a preference trend similar to that observed in Pörschmann's results: subjects tended to prefer lower delay times. The current experiment included one additional delay setting (0.14 ms), which is less than the nominal MRP-to-ERP propagation delay of 0.3 ms. Interestingly, the most preferred delay in the current setup was 0.14 ms. This might result from the difference in the definition of MRP in the current research (MRP80, 80 mm from the lips) as compared to Pörschmann's (MRP40, 40 mm from the lips).
2.7.2 Evaluation on Naturalness of CIL Filter Level
This test aimed to find the sound level of the CIL-filtered signal at the headphones that achieves the best realism. The tests were conducted with 13 subjects following the procedures stated in section 2.5, using the auralization setup described in section 2.1.
Six levels were under test: -Inf, -6 dB, -3 dB, 0 dB, +3 dB and +6 dB. Here 0 dB indicates unity gain of the CIL filter, and -Inf refers to the absence of the filter compensation path in the auralization. Positive dB values represent a gain in the filtered compensation, whereas negative values represent an attenuation.
During the test, subjects were exposed to a random sequence of sound levels comprising 3 repetitions of the 6 settings. The random sequence was generated in MATLAB individually for each test set. For each setting, subjects were asked to read the given text in full twice, once with headphones and once without. Subjects were allowed to repeat reading and listening until they were ready to compare the sound of their own voice with headphones against that without headphones (the reference). The subjects then rated the degree of similarity on a 7-point category scale from 1 to 7 (1 being very dissimilar and 7 being very similar) for the given setting and notified the experimenter via the microphone before changing to the next CIL-filter level setting.
The same test was repeated 6 times for each subject, yielding 108 data points per subject. All results were combined and compared with Pörschmann's published results, as shown in Figure 26.
The results show that the current evaluation has an overall lower score in all settings but displays a trend similar to that observed in Pörschmann's setup. Subjects agreed that the nominal level setting for the CIL filter is most natural, confirming the successful implementation.
Figure 26 Subjective evaluation of naturalness of CIL filter level in Direct AC auralization. Mean scores of all subjects with 95% confidence intervals.
2.8 Discussion
Overall, the current auralization setup gave satisfactory results in the evaluation of Direct AC naturalness. The CIL filter level was determined to be at unity gain, and the delay setting was chosen to be 0.01 ms on the DN-716 (resulting in 0.14 ms of total delay).
3. Subjective Preference Tests on Stage Acoustic Conditions for Actors
3.1 Introduction
The focus of the current study is voice stage support in proscenium theaters. This type of theater design creates a special acoustical situation: a proscenium theater is characterized by the separation of a stage house from the main auditorium by a large opening called the proscenium. The two acoustic volumes (stage and auditorium) can have very different acoustical properties depending on the design. In general, the stage house has a high ceiling for counterweight fly systems, and thick curtains hang above and beside the stage area to mask off-stage areas, lighting instruments and unused scenery pieces. The stage is often not specifically designed for acoustical purposes, whereas the auditorium is usually designed to optimize the audience's aural experience.
The interaction of these two volumes is known in architectural acoustics as the coupled-space phenomenon. Considering actors' locations and movements on stage, they are mostly within the stage volume. As they move closer to the audience and reach the proscenium or the forestage, they enter the aperture of the coupled spaces where the stage volume and the auditorium meet. Unlike an audience member, actors are constantly moving and turning on stage while acting, so their experience of the sound field changes drastically from moment to moment. As a first step of research in this field, the current study investigates acoustic conditions on stage under the assumption that the actor is not moving. Actors sometimes speak from a fairly fixed location for a sustained period, for instance when delivering monologues. The most common positions are the center of the forestage and the central area of the main stage. These two positions were studied in this research, and two more positions off to the side were chosen as well. Ideally, more positions would be studied, but the limiting factor was the time required for each subject to go through all acoustic conditions without vocal fatigue.
Figure 27 Architectural plan of the main space at the RPI Playhouse. Dimensions in inches; blue lines are dimensional guides. (CAD drawing courtesy of RPI Building Management)
3.2 Impulse Response Acquisition

Binaural room impulse responses were collected when the playhouse was unoccupied. The auditorium of the 200-seat playhouse has an area of approximately 2650 sq. ft. (246.1 sq. m.) and a volume of around 39,750 cu. ft. (1125.5 cu. m.). The stage is located opposite the playhouse entrance, separated by a proscenium measuring 31.6 ft by 14 ft (9.6 m by 4.2 m). The stage level is 5 feet above the auditorium floor; the stage has an area of roughly 1260 sq. ft. (117 sq. m.) and a volume of 25,200 cu. ft. (713.6 cu. m.). Figure 27 shows the detailed dimensions. All seats were removed and the stage was cleared during the measurement sessions. The stage was set to a standard configuration of masking flats and borders hung above the stage.
Measurements were taken at four stage locations using the setup described in section 2.4. The relative positions of the stage locations are shown in Figure 28. The acronyms used stand for down stage center (DSC), down stage right (DSR), center stage center (CSC) and center stage right (CSR). It should be noted that "stage right" is defined from the actor's perspective when he or she is facing the audience. The four stage locations were 3 meters apart from each other.
Figure 28 Stage locations where BRIRs were measured. The dashed line labeled "CL" is the center line of the stage across the proscenium.
At each stage position, the HATS was adjusted manually to three different head orientations with the aid of a rotation-angle guide marked on the top of the HATS. The relative angles of the head rotations are shown in Figure 29. The height of the HATS was
maintained at 5 feet 7 inches (about 1.7 meters) above the stage floor, as described in section 2.4.
Figure 29 Top view of the HATS showing 3 different head orientations in binaural room impulse
response acquisition.
Because of the background noise level at the playhouse, a 21.8-second pink swept-sine excitation signal was used to achieve a better signal-to-noise ratio (SNR). The average SNR across all measurements was 52 dB. Three averages were taken for each measurement.
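Synchronous averaging of repeated measurements improves SNR by 10·log10(N) dB when the noise is uncorrelated between repeats; with the three averages used here that is roughly 4.8 dB on top of the sweep's own noise rejection. A small check of this figure (illustrative Python; the uncorrelated-noise assumption is stated, not measured):

```python
import math

def averaging_snr_gain_db(n_averages: int) -> float:
    """SNR gain from synchronously averaging n measurements with
    uncorrelated noise: the signal sums coherently while the noise
    power sums incoherently, for a net 10*log10(n) dB improvement."""
    return 10.0 * math.log10(n_averages)

print(round(averaging_snr_gain_db(3), 1))  # about 4.8 dB
```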
3.3 Subjective Test Design

3.3.1 Preference ratings from paired comparison

In section 2.7, a 7-category rating method was chosen in order to generate results compatible with previously published data. Such absolute ratings, however, suffer from judgment uncertainty between subjects. Paired comparison helps reduce both absolute and relative judgment errors. In this chapter, different combinations of acoustic conditions were presented in pairs to the subjects in randomized order. In each of the paired comparison tests in this chapter, there were four acoustic conditions, giving six pair combinations: A-B, A-C, A-D, B-C, B-D and C-D. Each pair was presented twice, once in forward order and once in reversed order, giving a total of 12 randomized pair-conditions for each subject in each test.
(The author first proposed 4 repetitions of each pair, resulting in 24 test conditions, but test subjects reported vocal fatigue and loss of concentration after a certain period of time (usually 30 minutes). Thus, in order to balance the reliability of the test results against the subjects' comfort, the count was reduced to 12 pair-conditions in the stage preference testing.)
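The randomized sequence just described (every unordered pair twice, once per presentation order, shuffled per test set) was generated in MATLAB; an equivalent sketch in Python, for illustration only:

```python
import itertools
import random

def paired_comparison_sequence(conditions, seed=None):
    """All unordered pairs of conditions, each appearing once in
    forward and once in reversed presentation order, shuffled."""
    pairs = []
    for a, b in itertools.combinations(conditions, 2):
        pairs.append((a, b))  # forward order, e.g. A-B
        pairs.append((b, a))  # reversed order, e.g. B-A
    random.Random(seed).shuffle(pairs)
    return pairs

seq = paired_comparison_sequence(["A", "B", "C", "D"], seed=1)
print(len(seq))  # 12 pair-conditions, matching the test design
```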
3.3.2 Test procedures
In each comparison, the pair of (Indirect AC) conditions was pre-loaded into preset A and preset B in the IR-1 (see Figure 10). The subjects could compare the pair by asking the experimenter to A/B-swap the presets, and they were allowed to spend as much time as they wanted in each preset (condition). After auditioning both, the subjects were asked, "Which did you prefer?" The preferred condition was scored as +1 and the other as -1. Preference scores were summed and normalized: a score of 1.0 indicates complete unanimity of preference for the acoustic condition, 0.0 means an equal number of positive and negative scores, and -1.0 means complete agreement on a negative preference.
After each complete set of paired comparisons, the subjects were allowed to rest for 5 minutes before the next set of tests began. All other procedures follow section 2.5.
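The scoring scheme above reduces to a normalized mean of the ±1 votes; a minimal sketch (illustrative Python):

```python
def normalized_preference(votes):
    """Normalize +1/-1 preference votes for one condition to [-1, 1]:
    1.0 = unanimous preference, 0.0 = evenly split,
    -1.0 = unanimous negative preference."""
    return sum(votes) / len(votes)

print(normalized_preference([+1, +1, -1, +1]))  # 0.5
print(normalized_preference([-1, -1]))          # -1.0
```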
3.4 Paired Comparison Test on Stage Locations
The paired comparison tests were divided into three groups, each representing one head orientation: "look center", "look left" and "look right". Four stage locations were studied in each group, and the preference scores were obtained and analyzed.
3.4.1 Preference study of stage locations when head orientation is "look center"

3.4.1.1 Result

The preference scores for each subject were averaged over the 6 test sessions. Scores were obtained for 4 conditions (A, B, C and D) corresponding to the 4 stage locations (DSC, DSR, CSC and CSR) respectively. The results for all 13 subjects are plotted individually in Figure 30, together with the average of all subjects' scores in each condition.
The results show agreement across all subjects on the preference for Condition A (DSC, down stage center) and Condition C (CSC, center stage center), with Condition A slightly more preferred than Condition C in the overall average. Another point of agreement across all subjects is the negative preference for Condition D (CSR).
Figure 30 Preference scores for different stage locations when head orientation is "look center" (Conditions - A: DSC, B: DSR, C: CSC, D: CSR). Normalized scores of 13 individual subjects A-M (blue bars) and overall average score of all subjects (red bars)
3.4.1.2 Analysis

In interviews after the experiment, the test subjects' common feedback concerned the lateralization of the sound decay and the change in the sense of envelopment. Some subjects asked during the test sessions whether the headphone volume was unbalanced between the two channels. Subjects expressed an inclination toward spatially balanced and enveloping sound fields. The interaural cross-correlation function (IACF) is commonly used in analyzing the impact of side reflections and the subjective preference for room width. The IACF can also be used to visualize the lateralization of a running sound source.
The IACF is defined as:

\[
\mathrm{IACF}_t(\tau) = \frac{\int_{t_1}^{t_2} p_L(t)\, p_R(t+\tau)\, dt}{\left[\int_{t_1}^{t_2} p_L^2(t)\, dt \,\int_{t_1}^{t_2} p_R^2(t)\, dt\right]^{1/2}}
\]
where $p_L$ and $p_R$ refer to the sound pressures at the entrances to the left and right ear canals. The maximum possible value of the IACF is one, reached when both signals are identical. The variable $\tau$ accounts for
the time difference between the two ears and is varied over a range from -1 to +1 ms relative to the first arrival [29].
The IACF for each condition was calculated in 100-ms intervals and plotted in Figure
31.
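For reference, the IACF defined above can be evaluated at integer-sample lags as follows (a minimal pure-Python sketch; the windowing into 100-ms intervals used for Figure 31 is omitted):

```python
import math

def iacf(p_left, p_right, fs, max_lag_ms=1.0):
    """Normalized interaural cross-correlation at integer-sample lags
    within +/- max_lag_ms, following the definition in the text.
    p_left and p_right are equal-length pressure sample sequences."""
    n = len(p_left)
    denom = math.sqrt(sum(x * x for x in p_left) *
                      sum(x * x for x in p_right))
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            num = sum(p_left[i] * p_right[i + lag] for i in range(n - lag))
        else:
            num = sum(p_left[i - lag] * p_right[i] for i in range(n + lag))
        result[lag] = num / denom
    return result

# Identical left/right signals are perfectly correlated at zero lag.
r = iacf([1.0, 0.5, -0.2], [1.0, 0.5, -0.2], fs=48000, max_lag_ms=0.0)
print(r[0])  # 1.0
```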
Figure 31 Interaural cross-correlation functions in 100-ms intervals for conditions A-D when head orientation is "look center".
In the IACF plots above, the spike at time = 0 ms indicates the highest correlation between the two ears at the onset of the impulse response. As the sound decays, the correlation rapidly drops to around zero, suggesting that the early reverberant field (0 to 400 ms) was fairly diffuse in all four conditions. The correlation rises toward the late reverberant field (after 400 ms). It should be noted that this rise in the IACF does not produce a prominent peak across the binaural sound field. The rise is apparent in both
Conditions A and C, which suggests a correlation between the subjective preference scores and the behavior of the late reverberant field. The late reverberation in the playhouse was believed to be contributed by reflections from the back wall of the auditorium. The absence of a peak in the late IACF implies diffuseness of the late reflections and a sense of envelopment.
Furthermore, the energy decay was examined. Because, in acquiring the impulse response, the sound source (artificial mouth) and the receivers (binaural ears) were in extremely close proximity, conventional reverberation time calculations cannot be applied directly to describe the subjective sensation of the decay. It is also unknown how humans perceive the reverberation time of the same acoustic space differently when listening to their own voice versus other sound sources. As a result, an alternative parameter, Voice Stage Support (VSS), was proposed to analyze the energy decay:
\[
\mathrm{VSS}_{t_i} = 10 \log_{10} \frac{\int_{t_i-90}^{t_i+10} p^2(t)\, dt}{\int_{0}^{10} p^2(t)\, dt}
\]

where $t_i$ marks successive 100-ms intervals (times in milliseconds).
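A direct implementation of the proposed VSS parameter might look like the following (a minimal pure-Python sketch under the definition above; the sample-rate handling and window-edge conventions are assumptions):

```python
import math

def band_energy(p, fs, t0_ms, t1_ms):
    """Energy of the pressure signal p between t0_ms and t1_ms."""
    i0 = max(0, int(round(t0_ms * 1e-3 * fs)))
    i1 = min(len(p), int(round(t1_ms * 1e-3 * fs)))
    return sum(x * x for x in p[i0:i1])

def vss_db(p, fs, t_i_ms):
    """Voice Stage Support at time t_i (ms): energy in the 100-ms
    window [t_i - 90, t_i + 10] ms relative to the direct-sound
    (first 10 ms) reference energy, expressed in dB."""
    ref = band_energy(p, fs, 0.0, 10.0)
    win = band_energy(p, fs, t_i_ms - 90.0, t_i_ms + 10.0)
    return 10.0 * math.log10(win / ref)

# Toy check: a constant signal has 10x the reference energy in any
# full 100-ms window, i.e. VSS = +10 dB.
fs = 1000
p = [1.0] * fs
print(round(vss_db(p, fs, 100.0), 1))  # 10.0
```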
Since the energy of the direct sound from the artificial mouth to the ears is constant, it is taken as the reference of initial energy. The energy ratio was calculated for every 100-ms interval after the initial 10 ms. The results of VSS of