
The Sound of Reality

Simulated spatial acoustics in modern game worlds

MARKUS BOGREN ERIKSSON

Bachelor of Science Thesis Stockholm, Sweden 2010


Bachelor’s Thesis in Media Technology (15 ECTS credits)
at the Degree Programme in Media Technology
Royal Institute of Technology, year 2010
Supervisor at CSC was Jenny Sundén
Examiner was Daniel Pargman

TRITA-CSC-E 2010:150
ISRN-KTH/CSC/E--10/150--SE
ISSN-1653-5715

Royal Institute of Technology
School of Computer Science and Communication
KTH CSC
SE-100 44 Stockholm, Sweden
URL: www.csc.kth.se

The Sound of Reality

Simulated spatial acoustics in modern game worlds

Abstract

Spatial acoustics constitute a major part of our experience of the world. The ability to make sense of our environment depends on spatial awareness, and hearing plays a major role in this process. Yet in videogames, the acoustic information available to us is reduced. This cripples our hearing, leaving spatial awareness dependent on vision alone.

This thesis examines the problems surrounding the simulation of spatial acoustics in videogames. The purpose of the thesis is to provide Starbreeze, a Swedish game development studio, with guidelines on how to simulate spatial acoustics in order to create even more immersive gaming experiences. The primary question that this thesis tries to answer is how the gaming experience is affected by the use of simulated spatial acoustics.

To answer this question, eight people were interviewed about their experience playing a set of videogames. The games used were “Call of Duty: Modern Warfare 2”, “Battlefield: Bad Company 2”, “Halo 3” and two games created especially for the investigation. The games all differed in the complexity of their simulated spatial acoustics.

The results show that games that use simulated spatial acoustics are perceived as more realistic and more immersive than games with less simulated spatial acoustics. The player’s ability to localize sound sources is however not affected.


Acknowledgements

I would like to thank my supervisor at Starbreeze, Carl Vikman, for taking interest in this thesis project and for guiding me through the world of game sound. I would like to thank Johan Althoff and the rest of the people at Starbreeze. I would like to thank Stefan Strandberg at DICE, Anders Friberg at KTH and all the people who participated in the interviews.

I would also like to thank my supervisor at KTH, Jenny Sundén, for her great help, support and feedback.

I am very grateful for my family and friends who helped me during the work on this thesis.

Markus, August 2010

Contents

1. Introduction
1.1. Purpose
1.2. Problem definition
1.3. Methodology
1.4. Terminology
1.5. Delimitation
1.6. Structure of the thesis
2. Theory
2.1. What is sound?
2.1.1. The sound wave
2.1.2. Sound levels and decibel
2.1.3. Spreading losses
2.1.4. Atmospheric absorption
2.1.5. Sound in reflective spaces
2.1.6. Absorption
2.2. How do we hear sound?
2.2.1. Outer ear
2.2.2. Middle ear
2.2.3. Inner ear
2.3. How do we localize sound sources?
2.3.1. Time cues
2.3.2. Amplitude cues
2.3.3. Amplitude cues in multichannel audio systems
2.3.4. Head movements and the cone of confusion
2.3.5. Influence of vision
2.3.6. Distance perception
2.3.7. The Precedence effect
2.3.8. Room perception
2.4. Current technology in the gaming industry
2.4.1. Spatialization
2.4.2. Room acoustics
2.4.3. Occlusions and obstructions
2.4.4. Sound cues not used in game worlds
2.4.5. Realism vs. Aesthetics
3. Investigation
3.1. Hypothesis
3.2. Methodology
3.3. Stimuli
3.3.1. Game session 1
3.3.2. Game session 2
3.4. Participants
3.5. Interviews
3.6. Results
3.6.1. Immersion
3.6.2. Realism
3.6.3. Localization
3.7. Reliability and validity
3.7.1. Comparison between games
3.7.2. Participants
3.7.3. Terminology
3.7.4. Sources outside the gaming industry
4. Analysis
4.1. Immersion, Realism, Localization
4.2. I can’t hear any difference, but it sounds better
4.3. Dynamic sound environments
4.4. Room size vs. absorption
4.5. Proposal
4.5.1. Directional parameter for early reflections
4.5.2. Dynamic reverb parameters
4.5.3. Propagation
4.5.4. Directional echoes
4.5.5. Spatialization
4.5.6. HRTF
4.5.7. Causality
5. Conclusion


1. Introduction

We are constantly surrounded by sounds. Wherever we go, at all times, sounds are a major part of our experience of the world. In order to make sense of the world, we depend on spatial awareness to guide us through the vast amount of impressions that bombard us from every direction.

Hearing is crucial for our spatial awareness. While our eyes are directed at single objects, our ears take in our entire surroundings, constantly analysing them to provide us with a mental picture of everything that surrounds us. While your eyes are reading this text, your ears and brain are continuously gathering and processing the information provided by thousands of sound waves. Without raising your eyes from this sheet of paper, you are still able to remain aware of the environment that surrounds you: What kind of space are you in? What size is it? Are there people around you? What are they doing?

This is possible because of the extraordinary abilities of our hearing organs: the ears and the brain. These work constantly and subconsciously, without interruption, from several weeks before we are born until we die. Our ears are precision instruments capable of perceiving frequencies over a range of ten octaves, whereas the equivalent visual range is one octave. The loudest sound we can experience without damaging our ears has a thousand million times more power than the weakest. This ratio is ten thousand times greater than the visual equivalent. [18] Our brain possesses incredible computational power, capable of analysing sounds with great speed and precision. Together, our hearing organs can perform tricks that no human technology can mimic, and it is all possible thanks to information carried by sound waves.

Sounds do not only provide us with information about their source, but also about the space through which they have travelled and the materials with which they have interacted before reaching our ears. Every reflection, and every meter travelled, changes the character of the sound wave. It is the propagation of sound waves that provides the information used for creating spatial awareness, not the original sounds themselves. This is what spatial acoustics is all about: the propagation of sound from the sound source to the listener.

So how do we use these abilities when playing videogames? The short answer is: We don’t. The game worlds in the majority of modern games are not created to allow us to make full use of our hearing sense.

The graphical aspects of videogames have become incredibly advanced since the introduction of videogames to the consumer market in the early seventies. [8] The visual realism has since become astounding, and the technologies for graphical simulation are still rapidly moving forward. Sound, on the other hand, has been left in the dark. The simulation of spatial acoustics in games has not developed much over the last ten years, at least nowhere near as much as computer graphics.

The visual representation of the game world is based on the reflection of light, just like in the real world. We do not only see objects that themselves emit light. That would mean that in a candle-lit room, we would only see the flames of the candles and not the surrounding surfaces that the candles illuminate. Such a scenario might seem ridiculous, but this is in fact the way that sound is simulated in many contemporary games. Sound sources are placed within the game world, but the reflections of the sound that they emit, the spatial acoustics, are mostly ignored. While our eyes marvel at the visual part of the game world, our ears are still stuck in the candle-lit room.

What would happen if we “turned on the light switch”? What would happen if we added an additional sense to the experience of gaming? What effects would it have on our experience of the game world?

This thesis strives to shed light on the problems surrounding the simulation of spatial acoustics in games. I will investigate how spatial acoustics affect the gaming experience, and I will investigate possible developments of technologies for simulating spatial acoustics.


1.1. Purpose

The overall aim of this thesis is to provide Starbreeze with guidelines on how to develop their audio engine to create gaming experiences that are as immersive as possible through realistic acoustic environments.

1.2. Problem definition

• How is the gaming experience affected by the acoustic realism of the game?
• How should the current technologies for simulating spatial acoustics be developed in order to create as realistic acoustic environments as possible?
• How does the human auditory system perceive spatial acoustics?

1.3. Methodology

To answer the first question, eight interviews were conducted. The interviews consisted of two gaming sessions where the interviewees were asked to play three contemporary videogames as well as four game levels created solely for this survey. After playing each game, the interviewees were asked to rate specific parts of the game and to answer a few questions about specific aspects of the gaming experience.

The second question was answered through discussions with game sound designers and through working with current game sound design tools.

The third question was answered through literature studies.

1.4. Terminology

Here follow a few terms that appear frequently throughout this report. These should not be considered the general definitions of the terms, but rather the definitions used in this thesis.

Simulated Spatial Acoustics

“Acoustics” is the science of sound. The word “spatial” implies that something is related to, occupies, or has the character of space. Spatial acoustics refers to how sounds are affected by the space through which they travel from the sound source to the listener. Simulated spatial acoustics refers to how the environment’s effect on the sound is simulated within the game world.

Throughout this report, “simulated spatial acoustics” is frequently abbreviated “SSA”.

Game worlds

A game world is the artificial world in which a game takes place. It might be the streets of ancient Jerusalem in Assassin’s Creed, the sunken city of Rapture in BioShock or the Mushroom Kingdom of Super Mario Bros.

CHAPTER 1. INTRODUCTION

3

Sound and sound sources

When you hear a sound, for example a shot from a gun, the loud “BANG” that reaches your ear is the sound, and the gun is the sound source. It is important to distinguish between a sound and a sound source, since spatial acoustics means that the sound is changed by the space between the sound source and your ear.

Realism

A player will perceive a game as realistic if the game world is close to what the player expects the real world would be like in a similar situation. This means that realism is a subjective property, and that different people might perceive the realism of the same game differently, depending on their real-world experience of the played scenario.

Immersion

Immersion is a state of consciousness in which the person perceives himself to be inside the game world rather than in the real world. The more immersive a game is, the more the player feels that he is part of the game and the game world.

Realism and immersion are not directly related, though realism seems to be an important condition for immersion.

Technologies and models

When speaking of “technologies” for simulating spatial acoustics, I mean a general technique, algorithm, method or group of methods that can be used to simulate spatial acoustics.

When speaking of “models”, I mean a specific technique, algorithm or method that a game developer uses to simulate spatial acoustics by using one or several technologies.

The Player

When speaking of a general player, i.e. someone who is playing a game, I will refer to this player as male. I am fully aware that many gamers are female, and it is not my intention to reinforce the prejudice that games are played mostly by males. The generalization is made solely for the simplicity of the text.

1.5. Delimitation

I will in this thesis focus on games in the first-person shooter genre. The primary reason for this is that in all games in this genre, the player experiences the game world through the eyes and ears of the character that he is playing.

Not only does this point of view place greater demands on realism and immersion, but it also eliminates some problems that would arise in a third person game. One such problem is that the player of a third person game does not view the world through the eyes of the played character but rather through a camera following the character.

The reason for focusing on shooters is that these games tend to contain a large number of audio events that the player needs to process, unlike other first-person games such as adventure and point-and-click games.

I have also decided to focus on surround sound. The major reason for this is that an absolute necessity for realistic spatial acoustics is that the sound surrounds the player from every direction. As 5.1 and similar surround systems only surround the player in the horizontal plane, speakers above and below the player would be desirable. This is, however, very impractical, and such systems do not exist on the consumer market. Since the majority of events in an FPS game take place in the horizontal plane, one can also argue that horizontal surround is enough to stimulate the player’s sense of immersion.

1.6. Structure of the thesis

This thesis consists of five chapters, of which this first chapter serves as an introduction to the rest of the thesis.

Chapter 2 answers the question of how the human auditory system perceives spatial acoustics. The chapter covers the theory needed for understanding the problems and results discussed in the thesis. I will explain the basic physics of sound, the biology of human hearing, and the psychoacoustics of sound localization. This chapter also covers the current state of technology for simulating spatial acoustics in games.

Chapter 3 covers the investigation. I will present my hypothesis and describe the methodology of the investigation. I will describe the stimuli, the participants and the structure of the interviews. The results of the investigation are presented at the end of this chapter.

Chapter 4 covers the analysis of the results of the investigation and the proposal for further development of simulated spatial acoustics.

Chapter 5 consists of a short conclusion of the thesis.


2. Theory

2.1. What is sound?

A sound is basically a travelling wave of fluctuating pressure in the medium through which the wave is travelling. The medium can be solid, liquid or gas, but for the rest of this chapter, if nothing else is stated, we will assume that the medium is air.

A sound usually originates from the motion or vibration of an object. The motion of the object is transferred to the surrounding air as variations in air pressure. The molecules in the air are squeezed closer together (compression) and pulled further apart (rarefaction) than they normally are. It is this fluctuating pressure that we define as sound waves.

2.1.1. The sound wave

There are some important characteristics of a sound wave that we need to understand in order to understand acoustics.

To describe these characteristics, we will look at one of the simplest types of sound, the sine wave, illustrated in Figure 1.

Figure 1: The sound wave

One full cycle of a sound wave consists of a half cycle of compressed molecules and a half cycle of rarefied molecules.

The frequency (f) of the wave is defined by how many times per second the wave repeats itself. Frequency is specified in hertz (Hz) where 1 Hz = 1 cycle per second.

The amplitude (A) is defined as the amount of pressure variation about the mean. A sound that we perceive as loud will usually have a higher amplitude than a weak sound. (Although “loud” is a subjective term, and the loudness of a sound is not always directly related to the amplitude.)

The phase is the portion of the cycle through which the wave has advanced relative to a fixed point in time. Phase is measured in degrees, where 0° is the reference point and 360° is the point where the wave has gone through a full cycle from the reference point. Phase is not important when observing single sound waves, but it is important when looking at the relationship between two different sound waves.

The time it takes in seconds for a wave to complete a full cycle is called the period (T):

T = 1/f

The distance in meters that the sound wave travels over one cycle is called the wavelength (λ):

λ = c/f

The speed with which sound travels through air (c) depends on several factors. The speed increases with increasing temperature, and is also to some extent dependent on particle density and humidity. The speed of sound in a gas can be calculated with

c = √(γp₀/ρ₀)

where c is the speed of sound, γ is the ratio of heat capacities, p₀ is the equilibrium pressure and ρ₀ is the equilibrium density. At 0°C, this gives the speed of sound in air a value of 331.5 m/s. [7]
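
As a quick numerical check, here is a minimal Python sketch of the formula above, assuming standard values for dry air at 0°C (γ ≈ 1.402, p₀ = 101 325 Pa, ρ₀ ≈ 1.293 kg/m³):

```python
import math

def speed_of_sound(gamma=1.402, p0=101325.0, rho0=1.293):
    """Speed of sound in a gas: c = sqrt(gamma * p0 / rho0).
    Defaults are approximate values for dry air at 0 degrees C."""
    return math.sqrt(gamma * p0 / rho0)

print(round(speed_of_sound(), 1))  # ~331.5 m/s, matching the value quoted above
```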

2.1.2. Sound levels and decibel

Magnitudes of sound are usually specified in terms of intensity (I). Intensity is the amount of sound energy transmitted per second through a unit area (W/m²). Experimentally, one can measure the intensity:

I = dP/dA

where dP is the acoustic power interacting with an infinitesimal detector with the area dA.

Magnitudes can also be specified using sound pressure (p). Sound pressure is described as the deviation in pressure from the ambient pressure, caused by the acoustical energy of the sound wave:

p = pT − p₀

where p is the sound pressure, pT is the total pressure and p₀ is the ambient pressure. The most common unit for pressure is Newton per square meter (N/m²), but one might also use Pascal (Pa):

1 Pa = 1 N/m²

For a medium such as air there is a simple relationship between the pressure variations of a sound wave and the acoustic intensity: the intensity is proportional to the square of the pressure. [11]

I ∝ p²

The human ear can perceive a huge range of sound pressures, from 20 µPa to 200 Pa. Because of this large range, it is convenient to use relative, logarithmic measurements to express sound levels. Therefore, for describing sound levels, we use the decibel (dB) scale.

The decibel scale describes a physical quantity (e.g. intensity or pressure) relative to a specified reference quantity. In acoustics, the reference levels for intensity (Iref) and pressure (pref) have been defined as the minimum intensity/pressure that the human ear can perceive at around 1-3 kHz in a noise-free environment. The reference levels for intensity and pressure are:

Iref = 10⁻¹² W/m² and pref = 2 × 10⁻⁵ Pa

When the magnitude of sound is specified in decibels, it is customary to use the terms sound pressure level (SPL) and sound intensity level (SIL):

SIL (dB) = 10 log(I/Iref)

SPL (dB) = 20 log(p/pref)

This is what we generally mean when we say that “the loudness of a certain sound was 80 dB”.

Since intensity is proportional to the square of the pressure, we can use decibels to express ratios of pressure as well as ratios of intensity. This is because the difference in decibels will be the same whether we use intensity or pressure to do the calculations. [10]

Difference in decibels = 10 log(I₁/I₂) = 10 log(p₁²/p₂²) = 20 log(p₁/p₂)

It is useful to note that a tenfold increase in sound pressure is equivalent to 20 dB, and a doubling in pressure is equivalent to 6 dB.
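
These relationships are easy to verify numerically. A minimal sketch using the reference levels defined above:

```python
import math

P_REF = 2e-5   # reference pressure: 20 uPa
I_REF = 1e-12  # reference intensity: 10^-12 W/m^2

def spl(p):
    """Sound pressure level (dB) for a sound pressure p in Pa."""
    return 20 * math.log10(p / P_REF)

def sil(i):
    """Sound intensity level (dB) for an intensity i in W/m^2."""
    return 10 * math.log10(i / I_REF)

print(spl(2e-5))   # 0 dB: the threshold of hearing
print(spl(4e-5))   # ~6 dB: doubling the pressure adds 6 dB
print(spl(2e-4))   # 20 dB: a tenfold pressure increase adds 20 dB
print(spl(200.0))  # 140 dB: the 200 Pa upper limit mentioned above
```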

2.1.3. Spreading losses

In acoustics, a free field is an environment in which there are no reflections. This is something humans seldom experience since even outdoors, in open spaces, the ground will reflect and absorb sound. The closest thing to a free field that humans might experience is high above the ground (bungee jumping or hang-gliding) or on an open field where the ground absorbs reflections, like snow or high grass. Free field can also be experienced inside an anechoic chamber. An anechoic chamber is an artificial environment created within a large room containing large amounts of absorbing material that reduces reflections to a minimum, creating a virtual free field.

In a free field, all sound generated by a sound source is radiated outwards and no sound is reflected back. As a result, the sound intensity decreases with the square of the distance from the sound source. Sound intensity at a specific distance is easily calculated with

I = P/(4πr²)

where P is the acoustic power in Watts and r is the distance in meters from the sound source. In logarithmic form, the relationship between sound pressure level and distance may be written

SPL = LW − 20 log(r) − 11

where LW is the sound power level of the source in dB.

This equation shows that a doubling in distance from a sound source will result in a reduction of 6 dB in sound level.
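
A small sketch of the free-field relationships above, confirming the 6 dB drop per doubling of distance:

```python
import math

def free_field_intensity(P, r):
    """Intensity (W/m^2) at distance r (m) from a point source of
    acoustic power P (W) in a free field: I = P / (4 * pi * r^2)."""
    return P / (4 * math.pi * r ** 2)

def level_change(r1, r2):
    """Change in sound level (dB) when moving from distance r1 to r2."""
    return 10 * math.log10(free_field_intensity(1.0, r2) /
                           free_field_intensity(1.0, r1))

print(round(level_change(1.0, 2.0), 1))   # -6.0 dB per doubling of distance
print(round(level_change(1.0, 10.0), 1))  # -20.0 dB per tenfold distance
```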


2.1.4. Atmospheric absorption

As sound travels through air, a proportion of the sound energy is converted to heat through heat conduction, shear viscosity and molecular relaxation.

The atmospheric absorption becomes noticeable at high frequencies and long distances, meaning that air acts as a low-pass filter at long range.

The absorption coefficient (or attenuation coefficient) αm specifies the sound level attenuation per meter for a specific frequency (dB/m). (Figure 2)

Exact calculation of αm is complex, as it involves not only frequency but also the relaxation frequencies associated with the vibration of nitrogen and oxygen molecules, the molar concentration of water vapour in the atmosphere, and local and reference pressure and temperature. Exact calculations will therefore not be covered here. (Detailed information on this subject can be found in [11] and [7].)

Figure 2: Absorption coefficient (αm) over frequency for 20 degrees C and 75% relative humidity
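
As a rough illustration of this low-pass behaviour, the sketch below applies a per-frequency attenuation over distance. The coefficient values are order-of-magnitude illustrations only (in the spirit of Figure 2, not read from it); exact values depend on temperature and humidity, as noted above:

```python
# Illustrative absorption coefficients in dB/m; not measured data.
ALPHA_M = {125: 0.0004, 1000: 0.005, 4000: 0.03, 10000: 0.14}

def atmospheric_loss(frequency_hz, distance_m):
    """Total atmospheric attenuation (dB) over a given distance."""
    return ALPHA_M[frequency_hz] * distance_m

for f in sorted(ALPHA_M):
    print(f, "Hz:", round(atmospheric_loss(f, 100), 1), "dB over 100 m")
# High frequencies lose far more energy with distance:
# over long ranges, the air effectively low-pass filters the sound.
```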


2.1.5. Sound in reflective spaces

As mentioned before, humans seldom experience sound in a free field, but rather in reflective spaces. A reflective space is an environment completely or partly surrounded by surfaces that reflect sound, e.g. indoors, or in an urban environment where sound is reflected by the ground and by surrounding buildings.

When a sound source radiates sound in an enclosed reflective space, a proportion of the sound energy will be absorbed by the air and by the surrounding surfaces, but most of the sound will be reflected back into the environment. Some energy will actually be transmitted through the surfaces, but for now we will consider this part of the absorption. The reflection is specular, which means that the angle of incidence θi is equal to the angle of reflection θr (Figure 3). (Sound wave propagation is in fact much more complex, but will not be explained in detail here. For more information, see [7].)

Figure 3: The angle of incidence

As a sound impulse is emitted in an enclosed space, the sound will radiate from the source in all directions, assuming the sound source is omnidirectional. The sound will be reflected by the surface of the walls, floor, ceiling and other objects within the space. The reflected sound will in turn also be reflected as it hits the boundaries of the space. This will continue until all sound energy is absorbed by the air or by the surrounding surfaces, or transmitted through the boundaries. (We will come back to surface absorption later in this chapter).

The time it takes for all sound to be absorbed depends on the size of the space and the materials of the enclosing surfaces. The persistence of reflected sound in a space is called reverberation.

When describing the acoustical characteristics of a room it is common to talk about reverberation time. Reverberation time, RT60, is defined as the time required for the sound level of a continuous sound to drop by 60 dB once it has stopped. It can be calculated by

RT60 = 0.161·V/(S·ā)

where RT60 is the reverberation time, V is the volume of the room and S is the total surface area of the room. ā is called the average Sabine absorptivity of the room and is defined by

ā = A/S

where A is the total sound absorption, which is the sum of all absorption from all surfaces in the room. [7] The reverberation time also varies over different frequencies as the absorption in the air and in the surface materials varies depending on frequency.


2.1.6. Absorption

As mentioned before, when a sound wave hits a surface, part of the sound energy will be reflected and part of the energy will be absorbed by the material. The absorbed energy is transformed to heat. This is something that is familiar to most people. For example, the differences between different environments become very apparent when we move between a bedroom and a bathroom. A bedroom is (usually) filled with many “soft” materials that absorb the sound energy to a higher extent than a room covered with “hard” materials like the tile in a bathroom. This will result in more absorption and a shorter reverberation time in the bedroom than in the bathroom.

The most common measurement of absorption is the absorption coefficient α. Unlike atmospheric absorption, where αm indicates the dB decrease per meter, the material absorption coefficient α is a ratio of absorbed to incident sound energy. This means that if the absorption coefficient of a material is α = 0.70, then 70% of the incident sound energy will be absorbed.

Table 1: Absorption coefficients for some common materials. Data from [7] and [11].

Material 125 Hz 250 Hz 500 Hz 1 kHz 2 kHz 4 kHz

Painted wall 0.10 0.08 0.05 0.03 0.03 0.03

Coarse concrete 0.36 0.44 0.31 0.29 0.39 0.25

Smooth concrete 0.10 0.05 0.06 0.07 0.09 0.08

Smooth brick 0.03 0.03 0.03 0.04 0.05 0.07

Glass 0.05 0.03 0.02 0.02 0.03 0.02

Heavy drapery 0.14 0.35 0.55 0.72 0.70 0.65

Ceramic tile 0.01 0.01 0.01 0.01 0.02 0.02

Heavy carpet 0.02 0.06 0.14 0.35 0.60 0.65

Upholstered seats 0.19 0.37 0.56 0.67 0.61 0.59

Occupied seats 0.39 0.57 0.80 0.94 0.92 0.87

Soil 0.15 0.25 0.40 0.55 0.60 0.60

Grass 0.11 0.26 0.60 0.69 0.92 0.99

Water surface 0.01 0.01 0.01 0.01 0.02 0.03

Acoustic tile 0.10 0.25 0.55 0.65 0.65 0.60
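
Combining the Sabine formula from the previous section with the coefficients in Table 1, a small sketch can estimate reverberation times for the bedroom/bathroom contrast described above. The room dimensions are assumed purely for illustration:

```python
def rt60(volume, surfaces):
    """Sabine reverberation time: RT60 = 0.161 * V / A, where
    A = sum(area_i * alpha_i) is the total absorption of the room.
    volume in m^3; surfaces as (area_m2, absorption_coefficient) pairs."""
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume / total_absorption

# A hypothetical 5 x 4 x 2.5 m room at 1 kHz, using Table 1 values:
walls_and_ceiling = 2 * (5 * 2.5) + 2 * (4 * 2.5) + 5 * 4  # 65 m^2
floor = 5 * 4                                              # 20 m^2
volume = 5 * 4 * 2.5                                       # 50 m^3

carpeted = [(walls_and_ceiling, 0.03), (floor, 0.35)]  # painted walls, heavy carpet
tiled = [(walls_and_ceiling, 0.03), (floor, 0.01)]     # same walls, ceramic tile

print(round(rt60(volume, carpeted), 2))  # ~0.9 s
print(round(rt60(volume, tiled), 2))     # ~3.7 s: the "hard" room rings far longer
```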


2.2. How do we hear sound?

What we generally refer to as “The Ear” is only one piece of the intricate system that is the human auditory system (Figure 3). The purpose of the auditory system is to convert the sounds of the ambient field into nerve signals that are processed and interpreted by the brain.

As we shall see, the purpose of the auditory system is not only to transfer the sounds to the brain, but also to enhance the level of the sounds and to add additional information. This additional information is very important to our ability to localize sound sources and to interpret the sound environment.

Figure 3: The Ear [Illustration from University of Washington Medical Centre]

2.2.1. Outer ear

The outer ear consists of the pinna and the ear canal. It has two main purposes: the first is to efficiently capture the sound energy and transfer it from the ambient field, through the ear canal, to the eardrum. The second is to add directional information. Other advantages of the outer ear and ear canal are that they protect the eardrum and allow the inner ear to be placed closer to the brain, thereby reducing travel time for nerve signals.

Over most of the range of frequencies, the sound pressure becomes higher in the ear canal than in the free field. The amplification is a result of resonances in the ear canal and in the cavities of the pinna. The cone-like form of the external ear also acts as an acoustical horn that helps amplify the incoming sounds.

The amplification of the external ear also contributes to what we call “directional sensitivity”, a skill made possible by the fact that different frequencies are amplified differently depending on the angle of the incoming sound. This is an important part of our ability to localize sounds, and it will be described further in the next section.


2.2.2. Middle ear

The middle ear consists of the eardrum, the malleus (hammer), the incus (anvil), and the stapes (stirrup). The footplate of the stapes is connected to the oval window which forms the entrance to the inner ear.

The purpose of the middle ear is to transfer the sound pressure variations of the air in the ear canal to pressure variations in the fluids in the cochlea.

The malleus, incus and stapes act as “levers” to match the impedance of the air to that of the fluids in the cochlea. The area ratio between the eardrum and the smaller oval window also helps produce a perfect match between the impedances (for frequencies around 1 kHz).

2.2.3. Inner ear

The inner ear consists of the cochlea and the vestibular system.

The purpose of the inner ear is to convert the sound pressure variations in the cochlea to electrical activity in the nerve cells. The conversion occurs in transduction cells, called inner hair cells, and is transferred to neurons in the auditory nerve. These nerve signals are then transferred through the auditory nerve to the central nervous system and to the brain where they are interpreted as sounds.

(More detailed information about the ear can be found in [11].)


2.3. How do we localize sound sources?

In an evolutionary perspective, the ability to localize sound sources has been critical for our survival. We need to be able to accurately determine the direction of objects to seek or avoid, and we need a direction in which to direct our visual attention.

The sounds we hear, or rather the way we hear them, provide the brain with several pieces of information about the direction and distance to sound sources. They also give us information about the nature of our surrounding environment.

There are two primary types of cues that enable localization: time cues and amplitude cues.

2.3.1. Time cues

When a sound source is located to one side of the head, the sound will reach the ears at different times due to the difference in distance to the right and left ear. This time difference is called the Interaural Time Difference (ITD) and is directly related to the angle of incidence θ of the sound (Figure 4):

ITD = r(θ + sin θ)/c

where r is the radius of the head and c is the speed of sound.

Figure 4: Interaural Time Difference

ITD is primarily registered at the start and end of a sound, but the brain can also distinguish ITDs in continuous sounds like sine waves. This is possible because the ear can perceive the phase of the wave, which means that the ITD corresponds to the phase difference between the ears. This is only effective for low-frequency sounds whose wavelengths are greater than the distance between the ears. For higher frequency sounds the phase difference is misleading, partly because the hair cells in the inner ear indicate phase more randomly at higher frequencies. Another reason is that for a high frequency wave, there may be many cycles of phase difference between the ears, and the brain has no way to determine which cycle at the later ear corresponds to a given cycle at the first ear. Determination of ITD through phase differences starts to become ambiguous at around 750 Hz. This is why time cues are more important for lower frequency sounds. For sounds above approximately 1 kHz, amplitude cues become more important.
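
A quick numerical sketch of the ITD formula above, assuming a typical head radius of about 8.75 cm:

```python
import math

def itd(theta_deg, r=0.0875, c=343.0):
    """Interaural time difference (s): ITD = r * (theta + sin(theta)) / c,
    with head radius r (m), speed of sound c (m/s) and azimuth theta."""
    theta = math.radians(theta_deg)
    return r * (theta + math.sin(theta)) / c

print(round(itd(90) * 1e6))  # ~656 microseconds for a source directly to one side
print(round(itd(10) * 1e6))  # ~89 microseconds: small angles give small ITDs
```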


2.3.2. Amplitude cues

When a sound reaches us from the side of the head, there will not only be time differences between the ears, but also differences in sound intensity. This is due to the fact that the sound has had to pass the head to reach the obscured ear. For sound at high frequencies, the head acts as a barrier because the size of the head is large relative to the wavelength of the sound. This will result in a higher sound level in the closest ear than in the furthest ear. This difference in sound level is called Interaural Intensity Difference (IID). IIDs are negligible below about 500 Hz, but can be as large as 20 dB at high frequencies.

The spectral content of the sound also varies. The shape of the pinna gives rise to reflections and resonances that change the spectrum of the sound depending on its incoming direction. Reflections off the shoulders and body also modify the spectral content to some extent.

2.3.2.1. HRTF

The combination of these effects means that the spectral content of the sound changes in a specific way before reaching the eardrum. This change is described by a Head-Related Transfer Function (HRTF) that is unique for every sound source position around the head, including different elevations and front-back positions.

Figure 5: HRTFs at different angles of incidence. [5]

Figure 5 illustrates HRTFs for different angles of incidence on both ears of a test subject. It becomes clear that the HRTFs include several dips and peaks, particularly in the higher frequencies. Research has found so-called “directional bands”, which are frequencies that appear boosted for specific directions. [3] For example: frequencies around 8 kHz seem to relate to overhead perception, 300-600 Hz and 3-6 kHz relate to frontal perception, and frequencies around 1200 Hz and 12 kHz relate to rear perception.

Since the shape and size of the pinna differ from person to person, the HRTFs also differ. They could be compared with fingerprints in that they are unique, but still share many common properties. Extensive research has been done to characterize human HRTFs and to find which features are most important for spatial perception. [3] There is some evidence that such generalization is possible, but it has been shown that people localize better using their own HRTFs. [15]

The human brain seems to be highly adaptable to changes in HRTF, i.e. we do not lose our ability to localize sound sources when we, for instance, cover parts of our ears or change the shape of our pinna. Experiments have been conducted in which a test subject was given another person’s HRTF by feeding a processed audio signal directly into the ear canal. In this situation, the test subject’s ability to localize sounds was at first reduced significantly. As the experiment continued, the test subject adapted to the foreign HRTF and regained the localization ability. [12] It has also been shown that localization through HRTFs is not congenital but rather something that is trained at a young age.

While time cues are considered binaural, amplitude cues can be considered both binaural and monaural, as the HRTF is registered by each ear individually. It is probable that spectral differences between the ears are important for binaural localization, but monaural cues are also possible when only information from one ear is present.

It should also be noted that HRTFs are superimposed on the natural spectrum of the sound. It is therefore hard to understand how the brain separates the natural spectral characteristics from those added by the HRTF, especially when considering monaural localization. There is some evidence that the brain is capable of comparing HRTFs with stored patterns, and that the directional bands become increasingly important in monaural localization. [10] Monaural cues are also likely to be more detectable with head movements, since these allow the brain to track changes in spectral characteristics. [14]
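
In practice, HRTF-based rendering for headphones amounts to convolving a dry signal with a measured head-related impulse response (HRIR) pair, the time-domain counterpart of the HRTF. A minimal sketch, assuming hrir_left and hrir_right are measured HRIR arrays for the desired direction (taken from a public HRTF database, for example):

```python
import numpy as np

def binauralize(mono, hrir_left, hrir_right):
    """Render a mono signal binaurally by convolving it with the HRIR
    pair of the desired source direction; returns an (N, 2) array."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)
```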

2.3.3. Amplitude cues in multichannel audio systems

When listening to sounds in multichannel audio systems such as 2.0 stereo or 5.1 surround systems, localization is a result of amplitude differences between the speakers. When a sound is played from two speakers simultaneously, it will be perceived as originating from an angle between the speakers determined by the level difference between them. A level difference of approximately 18 dB will cause the sound to be perceived as originating from the speaker with the higher output. A level difference of 0 dB will cause the sound to be perceived as originating from an angle midway between the speakers.
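
As a sketch of how such amplitude panning can be computed, the snippet below uses a common constant-power pan law. This is one standard choice, not necessarily the law used by any particular game engine or mixing console:

```python
import math

def constant_power_pan(pan):
    """Stereo amplitude panning: pan runs from -1.0 (left) to +1.0 (right);
    returns (left_gain, right_gain). Constant power keeps the perceived
    loudness roughly constant as the source moves across the image."""
    angle = (pan + 1.0) * math.pi / 4.0  # map pan to 0 .. pi/2
    return math.cos(angle), math.sin(angle)

l, r = constant_power_pan(0.0)           # centred source
print(round(20 * math.log10(l / r), 1))  # 0.0 dB: heard midway between speakers
l, r = constant_power_pan(0.9)           # panned hard right
print(round(20 * math.log10(r / l), 1))  # ~22 dB: heard at the right speaker
```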


2.3.4. Head movements and the cone of confusion

Localization cues are also provided by head movements or movements of the sound source.

When we look closely at ITDs, we realise that they only provide the brain with information on the angle of incidence to a sound source. ITDs do not pinpoint the sound source to a single point in space, but rather to an angle relative to the listener.

For example, if the angle of incidence is 60° to our left, the sound source could be located at any point in space that corresponds to the same angle to the interaural axis. This is sometimes referred to as the “Cone of confusion”. (Figure 6)

Figure 6: The cone of confusion

In other words, any sound source located on the surface of the cone would produce the exact same ITD. Ambiguities related to the cone of confusion may be resolved by head movements since the changes in ITD produced by even small head movements provide the brain with enough information to pinpoint the source.

As mentioned before, head movements may also add additional information to information provided by IIDs.

2.3.5. Influence of vision

When perceiving our surroundings, we tend to place objects in a specific point in space rather than in a point relative to our head’s position and direction. Moving our heads will produce changes in ITDs and IIDs for sounds coming from a stationary sound source, but our perception of the source’s position in space will be the same. Information about the sound’s localization and information about the head’s position is combined to arrive at a constant perception. Vision plays an important role in this process. Experiments have been done where a test subject’s perception of the location of an object became heavily distorted when the visual information was ambiguous.[14]

Another influence of vision is that sounds that do not have any apparent source inside the visual field are often perceived as originating from behind the listener. This may stem from the fact that humans rely more on sight than hearing when gathering information about events in the visual field. Hearing is more important when localizing objects outside the visual field, and an “invisible” sound might therefore be perceived as originating from the rear.


2.3.6. Distance perception

Besides directional localization we also need to be able to determine the distance to sound sources. Distance cues are given by a number of factors depending on the environment.

If we consider sound in a free field, the distance will affect the level and spectral content of the sound. The sound level will decrease with longer distance due to spreading losses. The spectral content of the sound will change in such a way that there will be less high frequency content in distant sounds due to air absorption.

In reflective spaces we are also provided with cues from reflections. If we compare two sound sources at different distances, the one further away will be more reverberant, since the ratio of direct sound to reverberation decreases with distance. The time differences between the direct and reflected sound also provide distance cues. Reverberation time and the timing of early reflections also give the brain enough information to determine an approximate size of the room. This indicates boundaries beyond which sound sources could not reasonably be located.

Judgement of absolute distances is inaccurate compared to directional localization, and errors around 20% are not uncommon for unfamiliar sound sources, especially in non-reflective environments, which do not provide the additional information from reflections. Relative distances are much easier to determine, both in reflective and non-reflective spaces.

2.3.7. The Precedence effect

In a reflective space, only some of the sound arrives by a direct path from the sound source. A great deal of the sound reaches our ears after one or several reflections (as described later in this chapter). Each of these reflections will have a different angle of incidence, different distance travelled and therefore a different apparent source. Yet the human brain has no difficulty localizing sound sources in reflective spaces.

This is because the human brain considers only the characteristics of the first wave front when determining the location of the source. All sounds that arrive shortly after the direct sound are recognised as reflections of the direct sound and are more or less ignored in the localization process.

We are not even aware of these subsequent reflections, as they are perceptually fused with the first wave front into a single sound, even if the subsequent wave fronts have a level up to 10 dB higher than the direct sound. This phenomenon is called the precedence effect, or “the law of the first wave front”.

The precedence effect varies greatly depending on several factors, such as the nature of the sound and even the experience of the listener. (A good summary of the precedence effect is given in [10].) For now it should be noted that the maximum time interval over which fusion takes place is not the same for all sounds. Single clicks have a critical limit of about 5 ms, while more complex sounds can have a limit as long as 40 ms. The limit for speech and music is close to 50 ms. Reflections that arrive later than the critical limit will be perceived as reverberation if their level has decayed proportionally over time. A late reflection might instead be perceived as an echo if its sound level is high relative to other reflections and the reverberation.

Even though early reflections might not always be consciously perceived due to the precedence effect, the information carried by these wave fronts affects the perception of the source as well as the perceived nature of our surrounding environment.


2.3.8. Room perception

Reflections in enclosed spaces result in what a listener experiences as an “ambient” sound field. To explain this, let’s imagine an enclosed reflective room with a sound source and a listener. (Figure 7)

As the sound is emitted, it will radiate out from the source as an omnidirectional sound wave. The first wave front to reach the listener is called the Direct sound. This is always the sound that travels the shortest distance between the source and the listener, while not being reflected on the way. Shortly after this, the listener will be reached by Early reflections. These are wave fronts that have been reflected once or twice on the surrounding surfaces before reaching the listener. More reflections continue to reach the listener, and as these become denser and more diffuse, they will be perceived as reverberation. Eventually, no sound energy will remain in the room.

Figure 7: Reflections in an enclosed space. [17]

Even though we as listeners are not aware of the individual wave fronts reaching our ears, due to the precedence effect, our brains use the information from these reflections to interpret our surroundings. The sensation of spaciousness, envelopment and room impression comes solely from the information carried by these reflections, interpreted unconsciously by our brains.

When inside a reflective space, the brain receives enough information to assess the size and composition of the room with high accuracy. This information is provided by the reflections and reverberation from sounds propagating through the space. The reverberation time and the spectral content of the reverb give the brain enough information to accurately estimate the size and composition of our surroundings. This happens instantly and unconsciously, which is why many of us are oblivious to this remarkable skill. It is easier to consciously distinguish reflections when the time difference between the direct sound and the reflections (reverberation) is large, for example in a church, a concert hall or even a large bathroom.

Early reflections have been found to greatly influence our sense of size and space of a room while reverberation contributes more to a sense of spaciousness.


2.3.8.1. Echoes

A single reflection can also be perceived as an echo. In order for this to happen, some conditions have to be met:

The reflected sound has to travel a distance that is neither too long nor too short. If the distance is too short, the sound will be fused together with the direct sound or masked by other reflections. If the distance is too long, the sound level of the reflected sound will be too low to be audible. In order for the reflected sound to be loud enough to be audible, the reflective surface must be fairly smooth and relatively big. Additional surfaces that help reflect the sound also increase the chance of creating an echo. A quiet environment also increases the chances of hearing echoes.


2.4. Current technology in the gaming industry

So how do game developers simulate spatial acoustics?

Three things make this question difficult to answer. Firstly, practically every game development studio has its own way of handling sound. The tools used vary, as most studios develop and maintain their own sets of tools, including the game engine and sound engine. Some audio development tools are made by third-party developers, but even when studios use the same tools, the knowledge, resources and ambition greatly influence the result.

Secondly, there is a lack of available information. There is very little literature covering the gaming industry, let alone the sound production part of the industry. Some literature can be found on the subject but these sources cover more general areas.

Thirdly, the rate at which the technology changes is relatively high. Even though the audio production tools might not change at the same rate as their graphical counterparts, the technological development still holds a high pace, and new technologies become outdated in a relatively short time.

To describe the current technology of simulated spatial acoustics in the gaming industry, I have chosen to focus on the Swedish game development studios Starbreeze and DICE. The information in this section comes from interviews with sound designers and sound programmers at these two studios.

DICE has produced many games in the well-known Battlefield series. The reason why DICE is interesting is that they work more with SSA than any other game developer (known to me).

Starbreeze has produced two games in the well-known Riddick series, as well as “The Darkness”. Even though Starbreeze’s SSA is not as complex as DICE’s, their ambition is high when it comes to spatial acoustics. Their sound designers have been able to provide a great deal of information, both on current technology and on upcoming development.

2.4.1. Spatialization

2.4.1.1. Direction

When playing through a level of a game, the game engine knows the position of every object in the level. This includes the player, enemies, team-mates, objects and other sound sources. This means that the game engine can calculate the direction and distance to every sound source relative to the player. This information is passed to the sound engine, and the sounds are played from the correct direction relative to the player’s current position.

Directional localization in surround systems is achieved through amplitude cues, i.e. through amplitude differences between speakers, as described in section 2.3.3.
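
As an illustration of the kind of computation involved, here is a simplified 2D sketch; real engines work in three dimensions, and all names below are hypothetical:

```python
import math

def direction_and_distance(player_pos, player_forward, source_pos):
    """Angle (degrees, clockwise from the player's facing direction,
    normalized to -180..180) and distance from the player to a sound
    source. Positions and the forward vector are (x, y) tuples."""
    dx = source_pos[0] - player_pos[0]
    dy = source_pos[1] - player_pos[1]
    distance = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dx, dy) -
                         math.atan2(player_forward[0], player_forward[1]))
    return (angle + 180.0) % 360.0 - 180.0, distance

print(direction_and_distance((0, 0), (0, 1), (3, 3)))  # (45.0, ~4.24): ahead-right
```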

2.4.1.2. Distance

Distance to sound sources is handled in a similar way, but with one big difference: even though the game engine knows the distance to a specific sound source, the playback level of the sound is rarely linear relative to the distance to the source.

The sound designers assign several properties to a sound source when placing it in a level. These properties affect the sound’s characteristics at varying distances from the player. This distance-related behaviour is created to simulate distance cues such as spreading losses and atmospheric absorption.


Each sound source is given a value indicating at what distance from the player the sound will become audible (maximum distance). When the player is within a specified distance (minimum distance) the sound will be at “full” level. (“Full” level is not the same as maximum level but rather a value specified in the mixing process.) The sound level for the distances between minimum and maximum distance is specified by a function of sound level decrease per distance unit. This function is decided by the sound designers for individual sounds.
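To make this behaviour concrete, the following is a minimal sketch of such a distance attenuation function, written in C++. All names are illustrative assumptions, and the power curve merely stands in for whatever rolloff function the sound designers specify; it is not taken from any actual engine.

```cpp
#include <cmath>

// Hypothetical per-source distance properties, as assigned by a sound designer.
struct SoundSource {
    float minDistance;  // within this distance the sound plays at "full" level
    float maxDistance;  // beyond this distance the sound is inaudible
    float fullLevel;    // "full" level as specified in the mixing process (0..1)
    float rolloffExp;   // shape of the designer-chosen rolloff curve (1.0 = linear)
};

// Gain for a source at the given distance from the player.
float DistanceGain(const SoundSource& s, float distance) {
    if (distance <= s.minDistance) return s.fullLevel;
    if (distance >= s.maxDistance) return 0.0f;
    // Normalized position between min and max distance, shaped by the rolloff curve.
    float t = (distance - s.minDistance) / (s.maxDistance - s.minDistance);
    return s.fullLevel * std::pow(1.0f - t, s.rolloffExp);
}
```

In such a scheme, an FM radio could be given a small maximum distance while a guard’s footsteps get a much larger one, which is the kind of prioritization discussed below.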

It is worth noting that directional localization is very close to what we experience in “the real world”, and should therefore be considered realistic. In the game world, as in the real world, we expect to find the source of a sound in the direction from which we heard the sound. We also expect a sound to come from the same direction as its source, assuming, of course, that we are using a surround sound system. Distance localization, on the other hand, takes a step away from realism. Why is that?

One reason is the limit to the number of sounds that can be played simultaneously. The number is limited not only by the processing capabilities of the console, but also by the human brain. In large, action-filled scenes (the kind that constitute the major part of a normal fps-game), the large number of sounds would quickly become overwhelming if every sound within hearing range of the player were audible. Just imagine a crowded battlefield with numerous gunshots, explosions, movement, voices etc. Exposure to such a sound environment would be very tiring and would most certainly not make for a pleasant gaming experience.

Such unwanted results are in large part avoided by assigning distance properties to sound sources, as described above. This allows certain sounds to be filtered out when the player is not in the immediate vicinity of their sources. It also gives sound designers greater control of the soundscape, as it allows for prioritization between sounds. For example, an FM radio might only be heard within a couple of meters, but an enemy guard’s footsteps might be heard over a greater distance. This is justified because the guard is of much higher importance to the player.

2.4.2. Room acoustics

Besides the spatial positioning of sounds in surround sound systems, room acoustics is the area of spatial acoustics that has been most widely embraced by game developers. The most common simulation of room acoustics is to add reverberation to the sounds of a game.

2.4.2.1. Reverberation

To understand how reverberation is simulated, we must take a quick look at how audio is processed by the sound engine:

Every sound that is played in a game is processed by one or several Digital Signal Processors, or DSPs. The audio processing of a DSP might be compression, equalization or any other audio effect such as distortion, pitch shifting, or reverberation. This process is similar to the sends/returns of a mixing console. By sending a sound through a reverb DSP, artificial reflections and reverberation are added to the original sound. This adds a perceived, simulated environment to the sound, an environment whose perceived characteristics depend on the properties of the reverb DSP.
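As an illustration of this send/return flow, here is a toy sketch in C++. The single comb filter is only a stand-in for a real reverb algorithm, and all names are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Toy "reverb DSP": a single feedback comb filter standing in for a real
// parametric or convolution reverb.
class ReverbDSP {
public:
    ReverbDSP(std::size_t delaySamples, float feedback)
        : buffer_(delaySamples, 0.0f), feedback_(feedback) {}

    // Process one input sample and return one wet output sample.
    float Process(float in) {
        float out = buffer_[pos_];
        buffer_[pos_] = in + out * feedback_;  // feed the output back
        pos_ = (pos_ + 1) % buffer_.size();
        return out;
    }

private:
    std::vector<float> buffer_;
    float feedback_;
    std::size_t pos_ = 0;
};

// Send/return, as on a mixing console: the dry signal is sent to the reverb
// at a send level and the wet signal is mixed back in at a return level.
float ProcessVoice(float dry, ReverbDSP& reverb, float send, float ret) {
    return dry + reverb.Process(dry * send) * ret;
}
```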

The reverb used at Starbreeze is a parametric reverb. This means that several different parameters of the reverb can be adjusted to simulate room acoustics that closely match the artificial environment. The sound designers can adjust 45 different parameters, such as the level and mix of early reflections and reverb, equalization values and even the number of early reflections. With this tool it is possible to simulate a wide range of sound environments of different size, shape and material. DICE uses a convolution reverb, which means that the parameters are not set manually but are calculated beforehand from audio recordings made in the environments that the reverb is supposed to mimic. There are advantages and disadvantages with both kinds of reverb, but these will not be discussed here as the method for creating SSA is the same in both cases. The only relevant difference between parametric and convolution reverbs is how the parameters are set.


As the player moves through a level, the parameters of the reverb DSP do not change gradually with the characteristics of the space that currently surrounds the player. Instead, the sound designers use several different predefined reverb presets (Large Hall, Small Room, Outdoor Alley, Concrete Pipe, etc.) so that at least one of these presets matches the player’s current environment. The position of the player defines which reverb preset is used. When creating a level, the sound designers place several “reverb-boxes” inside the level, and each box is assigned a specific reverb preset. If the player is inside a box, all sound will pass through the assigned reverb DSP. If the player is inside two boxes, all sound will pass through both assigned DSPs, but this is avoided if possible.

The reverb-boxes are placed so that they cover every area in a level. In simple levels, each room could basically have one reverb-box with a corresponding reverb preset that matches the size of the room. Most levels, however, are not that simple, and in complex rooms of irregular shape, many boxes have to be created.

Two factors limit the use of this technique. The first is the time available for sound designers to create and maintain the reverb-boxes. If a game studio had unlimited time and resources to spend on this, a level could practically contain a separate reverb-box for every cubic meter of the level. Such resources are obviously not realistic, and even if they were, the second factor also prevents such a method: when the player exits one reverb-box and enters another, the parameters are temporarily cross-faded. This can result in unwanted artefacts, especially when mixing reverbs with big differences in parameter settings.
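The reverb-box lookup itself can be pictured with a small sketch. All type and preset names here are hypothetical; the actual Starbreeze data structures are not public.

```cpp
#include <string>
#include <vector>

struct Vec3 { float x, y, z; };

// A designer-placed reverb-box: an axis-aligned volume with an assigned preset.
struct ReverbBox {
    Vec3 min, max;
    std::string preset;  // e.g. "Large Hall", "Small Room", "Concrete Pipe"

    bool Contains(const Vec3& p) const {
        return p.x >= min.x && p.x <= max.x &&
               p.y >= min.y && p.y <= max.y &&
               p.z >= min.z && p.z <= max.z;
    }
};

// Presets of every box the player is currently inside. With more than one
// match, the engine cross-fades between the presets, which is where the
// artefacts mentioned above can appear.
std::vector<std::string> ActivePresets(const std::vector<ReverbBox>& boxes,
                                       const Vec3& player) {
    std::vector<std::string> active;
    for (const ReverbBox& b : boxes)
        if (b.Contains(player)) active.push_back(b.preset);
    return active;
}
```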

2.4.3. Occlusions and obstructions

As mentioned before, when playing through a level of a game, the game engine knows the position of every object within the level. However, the sound engine does not have any information about what lies between the player and a sound source. Structures such as walls, doors, pillars, etc. are invisible to the sound engine and can therefore not be taken into account in its calculations. This can produce unwanted results, such as sounds that are clearly audible through walls, floors and ceilings.

Starbreeze solves this problem with the use of scripting. Scripting means that specific instructions are given to the game engine when specific criteria are met. For example, to add the sound of a door opening, the sound is scripted to play when the player is close to the door and presses “the open door button” (the graphical animation of the door opening is probably also scripted to be triggered by the same action). Scripting can simulate occlusion by adding triggers that turn the sound source on or off depending on the location of the player. This on/off quantization, however, is far from how sound propagation works in the real world. It is also a time-consuming task for the sound designers.
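A minimal sketch of such a trigger follows, assuming a spherical trigger volume for simplicity (the actual scripting system is not C++, so this is only an illustration of the idea):

```cpp
#include <cmath>

// Hypothetical trigger volume: the sound source assigned to it is only
// audible while the player is inside the sphere. This is the crude on/off
// quantization described above, not real sound propagation.
struct Trigger {
    float cx, cy, cz;  // centre of the trigger volume
    float radius;
};

bool SourceAudible(const Trigger& t, float px, float py, float pz) {
    float dx = px - t.cx, dy = py - t.cy, dz = pz - t.cz;
    return std::sqrt(dx * dx + dy * dy + dz * dz) <= t.radius;
}
```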

2.4.3.1. Ray-casting

DICE solves the same problem using a different technique called ray-casting, which allows the sound engine to check if there are any structures between the source and the player. The following is a simplified explanation of ray-casting in general:

Several “rays” are emitted from the sound source towards the player with slightly different trajectories. Depending on whether the rays reach the player or not, the sound engine gets information about whether the sound source is open, occluded, obstructed or excluded.

If all the rays reach the player, this means that the sound source is fully open with no obstacles between it and the player. The sound should then be played without any processing.

If no rays reach the player, the source is considered to be occluded. This could mean that the sound source is located behind a wall or another structural obstacle that separates the source from the player. The sound should therefore not be played. In some cases, however, it might be desirable to play the sound anyway, but processed with effects such as a low-pass filter. This simulates the acoustic effect where the low-frequency content of a sound penetrates an obstacle but the high-frequency content is absorbed by the material in the obstacle.

If some but not all rays reach the player, the source is considered to be obstructed. The source might be in the same space as the player, but an obstacle is located in the direct path of the sound. In this situation, the direct sound might be handled as if occluded (processed with a low-pass filter) while the early reflections and reverberation are handled as open. A result where only some rays reach the player can also occur when the source is in a different area than the player but the direct path is open, for example in the next room but visible through the door. The sound source is then considered to be excluded. The direct sound might now be handled as open while the early reflections and reverberation are handled as obstructed.
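The classification can be summarized with a small sketch under the same simplifying assumptions as the description above; the ray-cast query itself is left out, and the names are mine, not DICE’s:

```cpp
// Result of the ray-cast test between a sound source and the player.
enum class PathState { Open, Occluded, Obstructed, Excluded };

// raysArrived: how many of the emitted rays reached the player.
// directPathOpen: whether the straight line between source and player is clear.
PathState Classify(int raysCast, int raysArrived, bool directPathOpen) {
    if (raysArrived == raysCast) return PathState::Open;      // no obstacles at all
    if (raysArrived == 0)        return PathState::Occluded;  // e.g. behind a wall
    // Some rays arrive: excluded if the direct path is open (source in another
    // space but visible, e.g. through a doorway), otherwise obstructed.
    return directPathOpen ? PathState::Excluded : PathState::Obstructed;
}
```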

2.4.4. Sound cues not used in game worlds

2.4.4.1. HRTF

Amplitude cues in the form of HRTFs are not commonly used in games. Some games have tried to implement HRTFs, but the technique has not been widely adopted. The benefits of HRTFs do not seem to outweigh the disadvantages, which are the uniqueness of individual HRTFs and the high risk of distortion. There are also problems related to speaker setups.

2.4.4.2. Reflections

None of the games included in this research has any way of handling reflections. Even though reverb DSPs allow for simulated room acoustics by creating simulated reflections, these reflections are created randomly and their directional characteristics cannot be controlled. Out of the 45 reverb parameters available to the sound designers at Starbreeze, none controls any directional properties.

2.4.5. Realism vs. Aesthetics

Game developers and sound designers sometimes have to make design decisions where realistic simulations are incompatible with an aesthetically enjoyable gaming experience. It might not always be desirable to design games to be perfectly realistic. For example: realistically simulated gunfights might be exhilarating, but how far should we go to simulate the sensation of being shot?

One example of how this applies to sound design is game sequences where large numbers of sounds may become stressful and tiring for the player (as previously discussed in section 2.4.1.2).

Another example is when sounds, if realistically simulated, would be so loud that exposure to the sound could be harmful. In a game sequence where a grenade explodes at the feet of the player, a perfectly realistic simulation would probably hurt the player’s ears, or at least make the gaming experience less pleasant.

Yet another example is when certain sounds are enhanced to make the gameplay easier or more enjoyable. A good example is the one discussed in section 2.4.1.2, where a guard’s footsteps are played unrealistically loud in order to make it easier for the player to attack or pass unnoticed.

Such design problems are an important and interesting part of the sound design of a game.

Chapter 3

Investigation


3. Investigation

3.1. Hypothesis

To recap, the purpose of this thesis is to provide Starbreeze with guidelines on how to develop their audio engine to be able to create as immersive gaming experiences as possible by creating realistic acoustic environments. To achieve this, I wanted to find out how the gaming experience is affected by the acoustic realism of a game. Before conducting the investigation, I needed to define closely what constituted the gaming experience and what should be considered “acoustic realism”.

In chapter 2, I identified the most common and most advanced technologies that are used in the gaming industry to simulate certain aspects of spatial acoustics. These aspects were: spatialization of sounds, room acoustics, and handling of occlusions and obstructions.

Spatial placement of sound sources is used in every modern fps-game. This technology is well developed and it is difficult to distinguish any differences in how game developers use it. I therefore decided not to focus on this technology.

Handling of occlusions and obstructions on the other hand is used in very few modern games. This, together with the fact that I did not have access to such technologies, made it hard to include this aspect of SSA in the investigation. Therefore, the investigation came to focus on the technique of simulating room acoustics.

Considering “gaming experience”, I reasoned that the use of SSA might affect the gaming experience in the following ways:

• The player would find the game more immersive.

• The player would find the game more realistic.

• The player would find it easier to locate sound sources.

This reasoning lead to the following hypothesis:

Including simulated room acoustics in a game will affect the gaming experience in such a way that the game is perceived as more immersive and more realistic, and localization of sound sources becomes easier.

The investigation described in this chapter was conducted to verify or falsify that hypothesis.

3.2. Methodology

The investigation consisted of eight structured interviews. During the interviews, the interviewees were asked to play three contemporary videogames and two game levels created solely for the interviews. The part of the interview where the three contemporary games were played is from now on called “Game session 1” and the part where the game levels were played is called “Game session 2”. The interviewees were asked to rate the contemporary games and the game levels on different aspects of the sound. The ratings and the interviews constitute the results, which are outlined at the end of this chapter.


3.3. Stimuli

3.3.1. Game session 1

In Game session 1, the participants got to play three games for 10-15 minutes each. After each game, the participant was asked to rate the game on different aspects of the sound. The participant was also asked to answer a few questions about their experience of the game and its sound. This was repeated for every game, and after all three games, some additional questions were asked.

The three games included in the survey were Halo 3, Call of Duty: Modern Warfare 2 and Battlefield: Bad Company 2. They were chosen because they each include a different level of complexity in their SSA: Modern Warfare 2 includes no detectable SSA, Halo 3 includes some SSA, and Battlefield: Bad Company 2 includes more SSA than any other game on the market (known to me). All three games are first-person shooters. They also take place in similar environments, at least in the sense that the environments alternate frequently between spaces that should have different acoustical properties. This is opposed to, for instance, Bioshock, which takes place exclusively indoors, and Crysis, which takes place mostly outdoors.

3.3.1.1. Halo 3

Halo 3 takes place in the 26th century, where war rages between humans and an alliance of alien races. The player takes the role of a cybernetically enhanced supersoldier as he fights for the survival of the human race. The settings of the game are futuristic environments, both on Earth and on other planets. The game contains both a campaign and multiplayer modes.

Turning to the sound of Halo 3, there is a clear difference between the reverberation outdoors and indoors. Some weapons have spatial characteristics in their sounds, but these lie within the original sound file. This becomes very clear when you switch between certain weapons whose sounds seem to have been recorded in differently sized rooms. Unlike in Modern Warfare and Bad Company, the weapon and player sounds are not as prominent in Halo.

3.3.1.2. Modern Warfare 2

Modern Warfare 2 is a first-person shooter that takes place in the present day. The player takes on different roles throughout the campaign, but the major part is played as a member of a special military task force. The storyline centres on a military conflict between America and Russia. The progression of this conflict takes the player on different missions in various locations around the world. This means that the game includes many different environments, such as urban America, Russian mountains, South American ghettos, oil rigs, etc.

The sound in Modern Warfare 2 contains no SSA. When moving between different environments, it becomes clear that the sounds are not affected by the surroundings in any way. Differentiation between environments is instead made, to some extent, through ambience and music.

3.3.1.3. Battlefield Bad Company 2

Bad Company 2 takes place in the present day, where a military special operations team travels around the world in search of a secret weapon. Similar to Modern Warfare, the missions are carried out in a large variety of environments, e.g. urban, desert, mountain, jungle, etc.

The game contains several instances of SSA. The weapon sounds vary depending on the environment in which the weapon is fired. The sound team at DICE has recorded all the weapons in a number of different environments, e.g. urban, enclosed, open, etc. When a weapon is fired in the game, the engine plays the sound file that matches the player’s current surroundings. Another sound is also played in conjunction with the initial sound of the shot. This is the ”tail”, which is basically a recording of the environment’s response to the initial sound. For example, in outdoor environments, the “tail” consists of a distinctive echo. The “core sound” and the “tail” are merged and both are then processed in the sound engine, where a reverb is added. This reverb also differs depending on the space in which the player fires his gun (as explained in section 2.4.2). This means that every shot the player hears is in fact a merge of three sounds, each the result of a separate process whose purpose is to simulate spatial acoustics.
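A sketch of this layering follows, with purely illustrative file names and an environment classification that merely stands in for however the engine actually categorizes the player’s surroundings:

```cpp
#include <string>

enum class Environment { Urban, Enclosed, Open };

// The two recorded layers of a shot; the third layer, the reverb, is added
// by the sound engine after these are merged.
struct ShotLayers {
    std::string core;  // environment-specific recording of the shot itself
    std::string tail;  // recording of the environment's response to the shot
};

ShotLayers SelectShotLayers(Environment env) {
    switch (env) {
        case Environment::Urban:    return {"rifle_core_urban.wav",    "rifle_tail_urban.wav"};
        case Environment::Enclosed: return {"rifle_core_enclosed.wav", "rifle_tail_enclosed.wav"};
        default:                    return {"rifle_core_open.wav",     "rifle_tail_open.wav"};
    }
}
```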

3.3.2. Game session 2

In Game session 2, the interviewees played four short game levels created exclusively for the survey. These game levels were created using Starbreeze’s level creation tools. The levels A1, A2, B1, and B2 were created in pairs (A1/A2 and B1/B2) so that the levels of each pair were identical except for differences in the complexity of the SSA. After playing each pair of levels, the interviewee was asked to rate the levels on different aspects of the sound and to answer a few questions about his experience of sound in the levels.

3.3.2.1. Level A

Level A consisted of 7 rectangular rooms of 5 different sizes: tiny, small, medium, large and huge (Figure 8). The dimensions of the rooms are shown in the table below (the tiny room, a corridor, is described separately). The dimensions are given in units that correspond to approximately one meter in the game.

Table 2: Room dimensions.

Room     Width-x   Width-y   Height
Small    3         4         2
Medium   5         5         3
Large    12        8         4
Huge     14        18,5      8,5

The tiny room was in the shape of an 8-meter-long corridor that was just high enough to allow the player to walk without crouching. The corridor had 90-degree turns so that there was a maximum of 2,25 meters of visibility at any point inside the corridor. The purpose of this construction was to give the space a very small, almost claustrophobic, character.


Figure 8: Structural overview of level A

The first and the last room were medium sized and there was only one way to navigate through the rooms. The path led the player through increasingly larger rooms, with the exception of the tiny room (the corridor) between the medium and the large room. The rooms were connected by door-shaped openings.

There were one to three targets in each room. The targets consisted of a bull’s-eye printed on the wall or the ceiling. When the player fired at a target, a sound was played (similar to the sound of breaking glass). The purpose of the targets was to provide the player with a sound source that did not originate from the player himself.

3.3.2.2. Difference between A1 and A2

The two levels were identical except for the implementation of the reverb-boxes.

In A1, only one reverb-box was used, covering the entire level. This reverb-box contained a reverb that approximately matched the medium room in size and character.

In A2, every room had its own reverb-box and each room was assigned a reverb that matched the specific room in size and character. Since the reverb at Starbreeze did not handle parameters such as room size or reverberation time, the reverb settings could not be set according to calculations of the room characteristics. Instead, this was done using the same method that the Starbreeze sound designers use when designing levels: The designer simply chose a reverb setting that he thought matched the size and structural character of the room.


3.3.2.3. Level B

Level B consisted of 4 rooms, identical in size and shape. Each room was decorated and designed to look like a room that people often come across in everyday life, rooms in which the room acoustics are very specific and easily recognizable:

Bedroom:

Heavily furnished with a bed, a sofa and two bookshelves. The walls were covered with wallpaper and several pieces of heavy, decorative fabric. The floor was covered by a heavy carpet. The reverb preset had a very short reverberation time that was barely noticeable. The reverb preset also had a moderate low-pass filter to simulate the absorption of high frequencies by the materials in the room.

Living room:

Moderately furnished with a couch, two armchairs and two coffee tables. The floor and ceiling were made of wood and the walls were covered with wallpaper. The reverb preset was similar to that of the bedroom, but with a longer reverberation time and a less prominent low-pass filter.

Cellar:

All walls, floor and ceiling were made of concrete. The room was empty except for a few tires, buckets and miscellaneous debris. The reverberation time was long with no spectral processing.

Bathroom:

All walls, the floor and the ceiling were made of tile. The room was completely empty except for a few urinals and showers placed on the walls to give the visual impression of a large public bathroom or “gym shower”. The reverberation time was the longest of all the rooms and the reverberation was clearly audible.

All the rooms had the same dimensions (5 × 5 × 2,25 units). Each room was connected to two other rooms by a small door-shaped opening.

3.3.2.4. Differences between B1 and B2

In B1, only one reverb-box was used, covering the entire level. This reverb-box contained a reverb that approximately matched all rooms in size, but not in character. The same reverb was used as in A1.

In B2, each room had a separate reverb-box with a separate reverb. Each reverb was carefully chosen to match the size and character of the specific room.

3.4. Participants

8 participants were interviewed: 6 male and 2 female, all aged between 21 and 35. 5 of the participants were employees of Starbreeze and 3 were business students at Uppsala University. Their gaming experience varied, with average playtimes ranging from 2 to 20 hours per week. Playing games was a hobby for all participants and everybody had played games for several years. The participants’ gaming interest was not limited to first-person shooters or to any specific platform. None of the participants worked with sound design at Starbreeze or anywhere else. Overall, all participants can be considered average gamers within the target group of the gaming industry.


3.5. Interviews

The interviews were conducted in one of the two game rooms at the Starbreeze office. These were controlled and carefully set up listening environments. Lighting and sound volume were adjusted to resemble an ordinary gaming situation. The participants were seated on a couch in the centre of the listening area, straight in front of the screen. 4 of the interviews were conducted in a small room with a large flat-screen TV. The other 4 interviews were conducted in a large room with a large projector screen. Both audio systems were calibrated for playing videogames with surround sound.

Before each interview started, the participant was asked to answer a few background questions. The participant was then introduced to the setup and purpose of the investigation. He or she was told that the investigation focused on game sound in general, not specifically spatial acoustics. The interview then started with Game session 1.

In conjunction with playing each game in Game session 1, the participant was asked to rate each game on four specific aspects:

• How he perceived the general quality of the sound in the game.

• How realistic he perceived the sound to be.

• How easy it was to immerse himself in the game.

• How easy it was to locate sound sources in the game.

Each aspect was rated on a scale from 0 to 10 where 0 was the lowest (i.e. the worst possible sound quality, the least possible realism) and 10 was the highest (i.e. very easy to immerse oneself in the game, very easy to locate sound sources).

After this, the participant was asked to explain his scores. This was followed by a short, structured interview about how the participant experienced the game and its sound. This was repeated for all games. The purpose of this approach was to get measurable data on how the participants experienced the named aspects of the game, and at the same time capture the more open-ended input that can emerge during an interview.

After playing all games, the participant was asked some questions about their general opinion of the importance of sound in games.

Game session 2 followed where the participant played the four shorter levels (A1, A2, B1, and B2). After each pair of levels, the participant was asked to rate the levels on:

• How realistic he perceived the sound to be.

• How easy it was to immerse himself in the game.

• How much he thought that the sound contributed to the experience of playing the level.

Finally, the participant was asked a few questions about the experience of playing the levels and whether he had noticed any difference between them.

After formally ending the interview, some of the interviewees were keen to stay and continue discussing games, sound, and SSA. Some valuable and interesting information emerged from these casual conversations.

In total, each interview took about an hour.


3.6. Results

The overall results indicate that the game with the most complex SSA (Battlefield) received the highest scores in all areas. The game with the least SSA (Modern Warfare), on the other hand, did not receive the lowest scores. The lowest scores were given to the game with the intermediate level of SSA (Halo).

In the course of the investigation, however, it became clear that the low scores of Halo had nothing to do with SSA. The main reason given in the interviews was the overall aesthetics and quality of the sound in Halo. The lower quality of the sound in Halo, together with the close similarity of the two other games (Battlefield and Modern Warfare), makes comparison of SSA between the three games difficult.

Also, there were very few interviews where the interviewee actually came into contact with the SSA of Halo. The most prominent SSA is the contrast between outdoor and indoor spaces, and only a few interviewees came across such an area. It would therefore not be correct to consider Halo the intermediate game in terms of complexity in SSA.

Battlefield and Modern Warfare, on the other hand, are very similar in almost every aspect, not only in the overall quality of the sound, but also in the gameplay and setting of the two games. One could easily say that one of the biggest differences between these games is the complexity of SSA.

I am therefore going to focus on Battlefield and Modern Warfare when presenting the results of the investigation.

3.6.1. Immersion

The results show that simulated spatial acoustics (SSA) has a large impact on immersion, in that more complex SSA results in a higher level of immersion.

The game with the most complex SSA (Battlefield) received an average score of 9,57 while the game with no SSA (Modern Warfare) received an average score of 9,07. The interviews also confirm that the higher scores were clearly related to the interviewees’ perception of the SSA.

Pontus stated that he found the immersion to be twice as high in Battlefield as in Modern Warfare and that this was largely because of the sound. Artur also stated that the sound was the major reason for him giving Battlefield the highest score on immersion. Alexander stated that he found the immersion in Battlefield to be better, specifically because of the differences between the sounds outdoors and indoors.

The only time the sound was mentioned as a contributing factor in Modern Warfare concerned the sound of a helicopter circling above the player, which is a simple form of SSA (spatialization). The music and radio communication of Modern Warfare were also mentioned as positive factors in the discussion around immersion. I should however remind the reader that these sounds are not a form of spatial acoustics, since the music is non-diegetic and the voices of the radio communication are located inside the player’s head and thereby not affected by the surrounding environment.

The scores from Game session 2 also confirmed the results. Level A1 received a score of 6,14 while level A2 (with complex room acoustics) received a score of 7,46. Level B1 received a score of 5,64 while level B2 received a score of 6,63. This clearly indicates that the participants experienced a higher level of immersion in the levels with the more complex SSA, and this was also confirmed in the interviews. Both Pontus and Sabine stated that the immersion was much greater in level A2. Some of the participants also stated that they found the general quality of the sounds to be better in A2 and B2, but without mentioning the SSA or even acknowledging any difference between the levels.


3.6.2. Realism

The results show that SSA has a large impact on the perceived realism of a game, in that more complex SSA results in a higher level of perceived realism.

Battlefield received a score of 9,43 and Modern Warfare received a score of 8,86.

The interviews confirmed that the SSA was somewhat connected to the realism, but not as strongly as to immersion. More of the participants mentioned non-spatial aspects of the sound as the reason for high realism. The individual sounds of the weapons were mentioned by several participants.

However, a few participants named SSA as an important factor for realism. Artur stated that the fact that the sound was different depending on where you were affected the perceived realism in Battlefield. Alexander also stated that the contrast between sounds indoors and outdoors was the reason he thought the sound was better in Battlefield than in Modern Warfare.

The interviews also pointed to a close correlation between realism and immersion: realism seems to be important in allowing the player to immerse himself in the game.

The scores from Game session 2 also indicate that SSA leads to a higher perceived realism. Level A1 received a score of 5,50 while level A2 received a score of 7,36. Level B1 received a score of 5,71 while B2 received a score of 7,13.

3.6.3. Localization

The results show that the ability to localize sounds does not seem to be affected by SSA. Battlefield received a score of 8,57 and Modern Warfare received a score of 8,21. Even though the higher score of Battlefield could indicate that more SSA results in better localization, the difference between the scores is relatively small. More importantly, the interviews give no indication that the SSA is a reason for the higher score of Battlefield. The answers vary greatly on the subject of localization. One participant found it easier to locate sound sources in Battlefield, three found it easier in Modern Warfare and two did not experience any difference. None mentioned any SSA when discussing localization. The determining factors rather seemed to be the complexity of the level, i.e. the number of sound sources and their distribution. One participant, however, Pontus, who had previous experience of playing Battlefield in multiplayer mode, claimed that localization is easier in multiplayer mode and that this is partly due to the simulated echoes in the levels.

3.7. Reliability and validity

Some questions surfaced during the course of this investigation, concerning its validity and reliability. I would like to discuss these areas before continuing to the analysis.

3.7.1. Comparison between games

Is it valid to compare games that are different in many ways other than the SSA? Is it possible that the player’s perception of the SSA is affected by other aspects of a game?

Yes, this is possible. Differences between the games in areas other than the SSA certainly lower the validity of the investigation. However, I do not believe that this has had any substantial influence on the results.


One reason for this is the fact that Battlefield and Modern Warfare are very similar in several key areas. The use of SSA is in fact one of their biggest differences, especially when considering the sound.

Another reason is that Game session 2 bypasses this problem completely, since the levels are identical except for the SSA. If the results from the two game sessions had been ambiguous, one might be compelled to question the method. But this is not the case.

3.7.2. Participants

Are 8 interviews enough? Does the fact that the majority of the participants were employed in the gaming industry affect the results?

Conducting more interviews would certainly have added to the reliability of the investigation, though I believe that 8 interviews were sufficient. The amount of new information received from each interview began to saturate after 5-6 interviews, and little new information came out of the last two interviews.

I do not believe that interviewing employees of the gaming industry affected the results. The participants were carefully chosen not to have any previous experience with sound design or any similar area of game development. The results also confirm that there were no big differences between the answers of the participants employed in the gaming industry and those of the business students.

3.7.3. Terminology

Some terms might be interpreted differently by different participants, especially less familiar terms like “immersion” and the inherently subjective “realism”.

This is always a risk, especially as the interviews were conducted in two languages (English and Swedish). The risk of misunderstandings was reduced as much as possible by carefully explaining the meaning of important terms when necessary.

3.7.4. Sources outside the gaming industry

Is it enough to only investigate the gaming industry? Other areas, such as auralization, might provide valuable information, especially when looking into possible technological development.

I believe that delimiting the investigation to the gaming industry was necessary. It would be a far bigger undertaking to also look into other areas. Continuing research would certainly benefit from investigating adjacent sciences, but this will have to wait for other researchers.

Chapter 4

Analysis


4. Analysis

4.1. Immersion, Realism, Localization

It is clear that the gaming experience is affected by the acoustic realism of the game. If the game includes simulated spatial acoustics, the players will perceive the game as more immersive and more realistic. The ability to localize sound sources is, however, not affected by the game’s acoustic realism, at least not for the level of realism implemented in the games included in this investigation.

4.2. I can’t hear any difference, but it sounds better

Another important result of the investigation is that players tend to perceive a game as more immersive and realistic without necessarily realizing that the enhanced experience might be a result of SSA. In several of the interviews, the interviewees stated that they found the sounds to be “better”, but could not pinpoint the exact reason. Specifically, the weapon sounds were described as ”deeper”, ”wider” and ”more dynamic” in levels with simulated room acoustics (Battlefield, A2, B2), while in games with no simulated room acoustics (parts of Halo, A1, B1) the sounds were described as “weak” and “wimpy” and lacking “depth” and “push”. To understand how this relates to room acoustics, and therefore SSA, one must understand how sounds such as weapon sounds behave in reflective spaces.

When a weapon is fired in an enclosed space, the high sound levels and the resulting reverberation, or ”response”, of the space constitute by far the major part of the experience of the sound. The sound of a weapon fired in an open space is just a transient, while the sound of a weapon fired in an enclosed space fills the room, builds up in level, and has a relatively long decay time. A gun fired in an open field might sound like little more than a large firecracker, while the same gun fired in an enclosed space might be quite deafening. The sole reason for this difference is the reflections and reverberation of the reflective space.

It can be argued that even in the levels with no simulated room acoustics, a “small” reverberation is still added to the sound. This is true, but using only one reverb preset is very restrictive. In a game that uses only one single reverb preset, the characteristics of this reverb must be so discreet that it never differs too much from the visual characteristics of any environment in the game. In games that use multiple reverbs, on the other hand, it is possible to use “larger” reverbs (that are easier for a player to notice) in large spaces, and “small” reverbs in small spaces, without any single reverb being too different from the visual character of its space.

4.3. Dynamic sound environments

Apart from finding specific sounds to sound “better” in levels with SSA, several interviewees also found the levels as a whole to sound “better”. A common opinion was that levels with SSA were more dynamic. In some cases, the interviewees did not notice any differences in SSA but still noticed the difference in dynamics. In other cases, the interviewees specifically named the difference between the reverbs of different rooms to be the reason for the dynamic environment. This was also named as a reason for greater realism and immersion.

This observation also indicates that SSA might improve the quality of the sound without the players noticing the SSA specifically.


4.4. Room size vs. absorption

When it comes to room size and the absorption of materials within a room, the results show that simulating room size is more important than simulating absorbing materials. This is probably because it is easier for a player to distinguish differences in reverberation time than differences in the spectral character of the reverberation. As explained in chapter 2, absorption also has a large effect on reverberation time, but the effect is not as prominent as that of room size. The investigation showed that it was only in the extreme rooms in B2 that the players noticed any difference in room acoustics, namely in the bathroom and, in some cases, the bedroom. Such extreme environments are not common in game worlds, though they do exist. And even where they exist, it is more probable that the player, in the midst of an intense fight, will take notice of the room size, not the interior design.


4.5. Proposal

In this section I am going to present my proposal for further development of the Starbreeze sound engine. The proposal is based on the results of the investigation and the work I did at Starbreeze in connection with the writing of this thesis.

During the creation of the game levels used in “Game session 2”, I spent many hours working with the Starbreeze engine and their technique for simulating reverbs (I will from now on refer to that technique as the “reverb-box model”). Though this did not make me an expert in any way, it provided me with information about the limitations and possibilities of the system. The employees of Starbreeze also provided me with great help, both in learning the system and in answering my many questions.

The results of the investigation also provide strong motivation for developing SSA. They clearly show that the realism of the spatial acoustics in a game positively affects the perceived realism and the experienced immersion. This in turn enhances the gaming experience.

4.5.1. Directional parameter for early reflections

As we learned in chapter 2, large parts of the information on which we base our spatial perception come from early reflections. Starbreeze’s current reverb-box model surely includes early reflections in the reverb, but the spatial distribution of these early reflections is random, and the simulation is therefore not spatially accurate. This means that ambiguities might occur between the visual and the aural information.

By implementing parameters controlling the directional properties of early reflections, the realism of the SSA could be improved greatly.

Directional parameters for early reflections would mean that the early reflections of the reverb could be distributed more heavily toward one (or several) specified directions. This would still be a crude simplification of wave propagation, but it would most probably be an improvement in realism compared to omnidirectional early reflections.
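As a sketch of what such a parameter could look like, the gain of each early reflection could be weighted by how well its direction of arrival aligns with a designer-specified (or automatically chosen) dominant direction. This is my own illustration, not an existing Starbreeze parameter:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Gain for an early reflection arriving from reflDir, given the dominant
// direction mainDir (both unit vectors). A higher focus value clusters the
// reflections more tightly around the dominant direction; focus = 0 gives
// the current omnidirectional behaviour.
float DirectionalGain(const Vec3& reflDir, const Vec3& mainDir, float focus) {
    float alignment = 0.5f * (1.0f + Dot(reflDir, mainDir));  // map -1..1 to 0..1
    return std::pow(alignment, focus);
}
```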

If directional parameters were implemented in the current reverb-box model, separate reverb-boxes containing only early reflections could be placed where early reflections would be particularly prominent, for example close to large walls or pillars, or in corners. Other possibilities arise if the reverb had non-static parameters (which I will propose in the next section).

One problem with such a development, apart from the time and resources it would require, is that it could be difficult to measure the result directly. A comparison between a model with and one without directional parameters would most probably not give results where people notice the early reflections as a distinct positive difference. It is more likely that the model with directional parameters would be perceived as generally more realistic and more immersive, similar to the results of the investigation covered in this report. It is also probable that localization would be easier if there were no longer ambiguities between the visual and the aural information. Directional parameters would also facilitate the use of echoes (which I will explain later in this chapter).


4.5.2. Dynamic reverb parameters

The current reverb-box model gives good results, but it has a few problems.

One problem is the transitions between reverb presets. When a player moves between different reverb-boxes, the parameter values become temporarily mixed, which can create unwanted sound artefacts. These artefacts can be very apparent to the player and might disrupt the immersion. The risk of this problem occurring also increases when the difference between the two reverb presets is big, i.e. when the environment is very dynamic.

Another problem is that the process of creating and adjusting reverb presets is time-consuming for the sound designer. It is the available resources that decide the complexity of the room acoustics (how many reverb-boxes should be used?), and the desired complexity should always be high when striving to create realistic environments. In the current reverb-box model, the complexity is relative to the time spent creating reverb-boxes and tweaking reverb presets. This can create situations where the acoustic complexity, and thereby the acoustic quality, of the game is compromised by the available resources of the project. This becomes unsustainable in a business where the consumer is constantly demanding increased quality.

A room acoustics model with non-static parameters would solve both these problems while also creating acoustical environments of greater complexity and dynamics. The sound environment would also become more causal as it would respond immediately to the player’s movements. The idea is that instead of using reverb-boxes and reverb presets with separate and static parameters, only one reverb DSP would be used. The parameters of this reverb would change as the player moves through spaces with different acoustical characteristics.

This would require some form of active ”scanning” of the surrounding space. Such scanning could be done using ray-casting. Rays emanating from the player in different directions could supply the sound engine with information about the distance to surrounding boundaries and thereby the size of the space. Further development could also include material recognition, which would provide information about the absorptivity of the space. This could, for example, be done by utilizing the textures or physical properties of the walls. A single reverb DSP, provided with information about room size and absorptivity, would be enough to simulate very accurate room acoustics.
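A sketch of the idea follows, where `CastRay` is a stub standing in for the engine’s actual ray-cast query, and the mapping from mean hit distance to reverberation time is a deliberately crude assumption of my own:

```cpp
#include <numeric>
#include <vector>

struct ReverbParams {
    float reverbTimeSeconds;
    float absorption;  // would come from material recognition in a fuller model
};

// Stub for the engine's ray-cast: distance from the player to the nearest
// surface along direction number i of a precomputed fan of directions.
float CastRay(int i) { return 5.0f + static_cast<float>(i % 4); }

// Estimates reverb parameters from the surrounding geometry (assumes rayCount > 0).
ReverbParams EstimateReverb(int rayCount) {
    std::vector<float> hits;
    hits.reserve(rayCount);
    for (int i = 0; i < rayCount; ++i) hits.push_back(CastRay(i));
    float mean = std::accumulate(hits.begin(), hits.end(), 0.0f) / rayCount;
    // Crude mapping: larger spaces get longer reverberation. A real model
    // would also derive the absorption term from material data.
    return ReverbParams{0.1f * mean, 0.2f};
}
```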

If directional parameters for early reflections were also implemented, the direction could automatically be set so that the early reflections were more prominent from the direction of the nearest wall.

There are two obvious problems with such a model. One is that the Starbreeze reverb-box model would have to undergo extensive changes. The other is the “scanning” itself. Ray-casting is expensive in terms of computational power, and several ray-casts would be needed frequently in order to minimize artefacts. Many frequent casts would give a better result than fewer, less frequent casts, but a proper number would have to be decided upon to strike a good balance between accuracy and performance.

4.5.3. Propagation

The possibility of handling simulated wave propagation, such as occlusions, obstructions and exclusions, without scripting would be a big step towards realistic SSA. Such models are already used by several development studios; DICE is only one example. Since work on this thesis began, sound programmers at Starbreeze have started developing the sound engine to allow for the handling of occlusions. Such development, and further development of models related to ray-casting, will always be beneficial in the long run.


4.5.4. Directional echoes

Echoes are a common feature of our everyday environment. Echoes are also strongly connected to the visual part of our perception of the world. When we hear an echo in the real world, there is (almost) always a clear visual source of that echo, for example a mountainside or a large building. Also, it is not uncommon, even for people with no acoustical education, to expect echoes in certain situations. Yet echoes are a part of SSA that has not been simulated in any game world (that I have come across prior to or during the work on this thesis).

Simulation of echoes could affect the gaming experience positively in several ways, not only by creating more realistic acoustic environments. Echoes can make us more aware of our surroundings, not only aurally but also visually, as we instinctively scan the environment for a possible source of the echo. Echoes also increase the dynamics of an environment, as they create distinctions between areas where an echo is audible and the surrounding areas. Echoes also add to the sense of causality (Figure 9).

One way of simulating echoes would be to add separate parameters for echoes to the current reverb. The delay time would have to be separated from the delay time of early reflections, and a directional parameter would be necessary. If the current reverb-box model is used, separate reverb-boxes containing only echoes could be placed in positions where an echo would be audible. If a model with dynamic parameters and ray-casting were used, echoes could occur when the right criteria were met, for example: a ray hitting a wall with an angle of incidence of 0 degrees would result in an echo with a delay time relative to the distance to the wall.
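A sketch of that criterion follows, assuming the speed of sound in air (roughly 343 m/s) and a commonly cited fusion threshold of about 50 ms, below which the reflection would blend with the direct sound rather than be heard as a distinct echo:

```cpp
// Echo delay in seconds for a reflecting wall `distance` meters away,
// based on the round trip of the sound. Returns a negative value if the
// reflection arrives too early to be perceived as a distinct echo.
float EchoDelaySeconds(float distance) {
    const float speedOfSound = 343.0f;           // m/s, approximate in air
    float delay = 2.0f * distance / speedOfSound;
    return (delay >= 0.05f) ? delay : -1.0f;     // ~50 ms fusion threshold
}
```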

It is possible that the simulation of echoes requires a different model than the one used for simulating room acoustics. Simulated wave propagation would be too expensive in terms of computational resources. Ray-casting could be a possible solution, but this might also prove too expensive in situations where a large number of sound sources are present. Once the computational resources of games allow for ray-casts in almost every direction from the player and other sound sources, this would also be a possible way of simulating echoes. Until those kinds of resources are available to sound designers (my guess is that this will not happen in the next 3-5 years), another way of simulating echoes needs to be applied.

Figure 9: The perceived location of a sound source can be altered by echoes.


4.5.5. Spatialization

The way that sound sources are placed in the 3-dimensional space of the game world is more than satisfactory and does not need any further development.

The only thing that differs between spatial sound in the game world and in the real world is that in the real world, sounds are localized in both the horizontal and the vertical plane, whereas in the game world, sounds can only be placed in the horizontal plane. Though vertical placement in game worlds would certainly be a step towards reality, I believe that the benefits of developing this technology would be small. Although the technology for vertical localization exists today (the technology is the same as for horizontal placement), the constraints are more practical than economical or technical. All that is needed for vertical localization is one speaker above and one below the listener, but such a speaker setup would be very impractical. I would also argue that since the gaming industry takes much of its aesthetics from the movie industry, vertical localization will not be needed in games until the technique has become widely adopted in movie theatres and in consumer-standard home theatre audio systems.

4.5.6. HRTF

I do not recommend developing HRTF processing.

Even though HRTFs play an important role in localization in the real world, I do not believe that the gaming industry would benefit from implementing HRTFs in the near future. Accurate simulations of HRTFs would of course be beneficial, but I believe that the problems with HRTFs outweigh the benefits.

The identified problem areas are:

The uniqueness of HRTFs between individuals means that spatialization using HRTFs would give different results for every single player. One solution to this problem would be to further identify and adapt common characteristics, such as directional bands, so that one audio process fits all players. Extensive research has already been done in this area [3], but further research by the scientific community is still needed. Another solution could be to allow for individual calibration of the audio processing to match a player’s personal HRTF. This solution would probably give a better result for individual players, but it would also require extensive research.

Another problem with HRTFs is the risk of distortion that appears when several speakers play audio whose spectral content is similar but not identical. To avoid distortion, the listening environment must be well controlled and the player must be seated in the “sweet spot” at all times. (The “sweet spot” is the listening position for which the audio system has been calibrated.)

There are also problems with how HRTFs work in speaker versus headphone listening. When using HRTFs in a speaker setup, problems arise because the sound from every speaker reaches both ears. The level difference and the precedence effect ensure that the perceived location of the sound is accurate, but there is no isolation between the left and right ears’ HRTFs. Headphones, on the other hand, do achieve the necessary isolation between the ears, but headphone listening is not a surround sound speaker setup, since headphones only have two speakers.


4.5.7. Causality

The interviews indicated that causality (“cause and effect”) might play a part in how realistic and immersive a player perceives a game to be. Several interviewees mentioned that some objects did not make any sound when hit or shot. Such situations conflict with causality, as the player’s action has no effect in the game world. This should of course be avoided if possible.

A comparison can be made to room acoustics, where the action of entering a large room has the effect that the sound becomes more reverberant. This is the only situation where causality is applied in contemporary SSA, and such situations were mentioned in several interviews as important for the perceived realism and immersion. The sense of acoustic causality could be an important part of acoustic realism, and by developing more complex room acoustics, simulated wave propagation and echoes, the sense of causality could be improved.

Chapter 5

Conclusion


5. Conclusion

The hypothesis of this investigation was that simulated room acoustics in a game would affect the gaming experience in such a way that the game would be perceived as more immersive and more realistic, and that localization of sound sources would be easier.

The results clearly confirm that a player perceives a game as more realistic and more immersive if the game includes simulated spatial acoustics. However, the results could not verify that the ability to localize sounds is affected by spatial acoustics.

The results also showed that players tend to perceive the game as more immersive and more realistic, without necessarily realizing that the enhanced experience might be the result of SSA. Non-spatial attributes of individual sounds were perceived to be of higher quality in games with SSA. The game world was also perceived as more dynamic. Results also indicate that room size is more important than absorption when simulating room acoustics and that SSA affects the sense of causality.

Several suggestions were made for further development of SSA. These were:

• Implementation of directional parameters for early reflections.

• Implementation of dynamic reverb parameters.

• Increased control of wave propagation.

• Implementation of directional echoes.


References

[1] ALTMAN, R. 1992. Sound theory, sound practice. Routledge. New York. ISBN: 99-1238732-5

[2] BÉKÉSY, G. 1960. Experiments in hearing. McGraw-Hill. New York. ISBN: 0-07-004324-8

[3] BLAUERT, J. 1996. Spatial Hearing: The Psychophysics of Human Sound Localization. MIT Press. Cambridge. ISBN: 0-262-02413-6

[4] BRIDGETT, R. Hollywood sound: Part Two. http://www.gamasutra.com/features/20050930/bridgett_01.shtml. Date of access 2010-07-12

[5] CHENG, C. I. & WAKEFIELD, G. H. 1999. Spatial frequency response surfaces: an alternative visualization tool for head-related transfer functions (HRTFs). Proceedings of the Acoustics, Speech, and Signal Processing, 1999. Vol. 02. ISBN: 0-7803-5041-3

[6] DAVIS, G. & JONES, R. 1990. Sound reinforcement handbook. Hal Leonard Corporation. Milwaukee. ISBN: 0-88188-900-8

[7] KINSLER, E. & FREY, A. & COPPENS, A. & SANDERS, J. 2000. Fundamentals of acoustics. John Wiley & Sons Inc. New York. ISBN: 0-471-84789-5

[8] COLLINS, K. 2008. Game Sound: an introduction to the history, theory, and practice of video game music and sound design. MIT Press. Cambridge. ISBN: 978-0-262-03378-7

[9] MENSHNIKOV, A. Modern Audio Technologies In Games. http://ixbtlabs.com/articles2/sound-technology/. Date of access 2010-07-12

[10] MOORE, C.J. 1997. An introduction to the psychology of hearing. Academic Press. London.

ISBN: 0-12-505627-3

[11] ROSSING, T (Editor). 2007. Springer Handbook of Acoustics. Springer. New York. ISBN: 0-387-30446-0

[12] RUMSEY, F. 2001. Spatial Audio. Focal Press. Oxford. ISBN: 0-240-51623-0

[13] VORLÄNDER, M.2008. Auralization : fundamentals of acoustics, modelling, simulation, algorithms and acoustic virtual reality. Springer. Berlin. ISBN: 978-3-540-48829-3

[14] WALLACH, H. 1940. The role of head movements and vestibular and visual cues in sound localization, Journal of Experimental Psychology. Vol. 27, No. 4, p.339-368

[15] WENZEL, E. & ARRUDA, M. & KISTLER, D.J. & WIGHTMAN, F.L. 1993. Localization using non-individualized head-related transfer functions. Journal of the Acoustical Society of America. Vol. 94, No. 1, p. 111-123, ISBN: 94(1), 111-123

[16] ZWICKER, E & FASTL, H. 1999. Psychoacoustics: facts and models. Springer. Berlin. ISBN: 3-540-65063-6

[17] http://www.torgny.biz/Recording%20sound_1.htm Date of access 2010-08-10

[18] http://juliantreasure.blogspot.com/2009/08/your-incredible-ears.html Date of access 2010-08-12

46

Games

Army of Two: The 40th Day. Electronic Arts. EA Montreal. 2010

Assassin’s Creed 2. Ubisoft. Ubisoft Montreal. 2010

Battlefield 2142. Electronic Arts. EA Digital Illusions CE. 2006

Battlefield: Bad Company 2. Electronic Arts. EA Digital Illusions CE. 2010

Bionic Commando. Capcom. GRIN. 2009

BioShock 2. 2K Games. 2K Marin. 2010

Call of Duty: Modern Warfare 2. Activision. Infinity Ward. 2009

Crysis. Electronic Arts. Crytek Frankfurt. 2007

Gears of War 2. Microsoft. Epic Games. 2008

Half-Life 2. Sierra Entertainment. Valve Corporation. 2004

Halo 3. Microsoft. Bungie. 2007

Just Cause 2. Square Enix. Avalanche Studios. 2010

Killzone 2. Sony Computer Entertainment. Guerrilla Games. 2009

The Chronicles of Riddick: Escape from Butcher Bay. Vivendi Games. Starbreeze Studios. 2004

The Chronicles of Riddick: Assault on Dark Athena. Atari. Starbreeze Studios. 2009

Wanted: Weapons of Fate. Warner Bros. Interactive Entertainment. GRIN. 2009


Appendix


Interview questions

Halo 3

How is the general quality of the sound?
How realistic is the sound?
How easy is it to immerse yourself in the game?
How easy is it to localize sound sources?

Modern Warfare

How is the general quality of the sound?
How realistic is the sound?
How easy is it to immerse yourself in the game?
How easy is it to localize sound sources?

Battlefield

How is the general quality of the sound?
How realistic is the sound?
How easy is it to immerse yourself in the game?
How easy is it to localize sound sources?

Questions after each game:

What do you think about the sound in the game?

Did you feel that you could immerse yourself in the game?

Was the sound realistic?

Was it easy to localize sound sources?

Did the sound affect how you perceived your surroundings?

Questions after all games:

Do you think that it is important that a game is realistic?

Do you think that it is important that the sounds match the environments?

Do you think that it is important to be able to localize sound sources?

What is the most important aspect of sound in games?


Level A1

How realistic is the sound?
How easy is it to immerse yourself in the game?
How much does the sound contribute to the perception of the environment?

Level B1

How realistic is the sound?
How easy is it to immerse yourself in the game?
How much does the sound contribute to the perception of the environment?

Level A2

How realistic is the sound?
How easy is it to immerse yourself in the game?
How much does the sound contribute to the perception of the environment?

Level B2

How realistic is the sound?
How easy is it to immerse yourself in the game?
How much does the sound contribute to the perception of the environment?

Questions after each level:

How did you experience the difference?

How big was the difference?

How did it affect the experience?


Results

Game session 1

[Chart: General sound quality. Score (0-10) for participants 1-8 and average; series: MW, BF]

[Chart: Realism. Score (0-10) for participants 1-8 and average; series: MW, BF]

[Chart: Immersion. Score (0-10) for participants 1-8 and average; series: MW, BF]

[Chart: Localization. Score (0-10) for participants 1-8 and average; series: MW, BF]

Game session 2

[Chart: Realism. Score (0-10) for participants 1-8 and average; series: A1, A2]

[Chart: Immersion. Score (0-10) for participants 1-8 and average; series: A1, A2]

[Chart: Realism. Score (0-10) for participants 1-8 and average; series: B1, B2]

[Chart: Immersion. Score (0-10) for participants 1-8 and average; series: B1, B2]


