
Surround From Stereo

David Griesinger

Lexicon

[email protected]

www.world.std.com/~griesngr


Main Message

• Two-channel audio is ubiquitous and obsolete.

– Reproducing two channels through multiple loudspeakers in the listening room (or a car) is a big improvement, even if it is poorly done.

• Fully automatic two-channel-to-surround processors are widely available as consumer products.

– We need to make a processor that works well.

• Operator-controlled two-channel-to-surround processors are an important component in the re-mix process for creating discrete recordings from multichannel or two-channel masters.

– The better we can make this component, the better the ultimate product.

• The design of a processor that works well can teach us how to make better discrete recordings.

– If a machine can make a better surround mix than the typical sound engineer, perhaps we have something to learn!

– The design of a superior two-channel-to-surround processor requires an understanding of room acoustics and human perception. This knowledge can be applied to recording (and processing) technique.

Introduction

• Two-channel sound reproduction (stereo) was introduced about 50 years ago.

– The improvement in emotional involvement provided by stereo over mono is compelling and easy to demonstrate.

– Since then the basic principles of stereo reproduction have not changed.

• Improvements in S/N or frequency response are welcome, but not compelling.

• It seems likely that CDs replaced LPs largely because they are more convenient, although the improvement in sound quality was also easily heard.

• The improvements in envelopment and sound stage provided by multichannel sound are easy to appreciate and demonstrate – but they are seldom compelling.

– Current surround recordings improve the reproduction of the original hall, but are otherwise very similar to a two channel recording.

– It takes an innovative and risky mix to show what surround sound can do emotionally, and these mixes are rare.

• The major market for surround recordings in the future will probably be for playback in automobiles.

– Automobile playback probably requires at least a five-to-seven channel conversion as part of the playback system.

– This playback venue can reveal problems with common mixing techniques.

Home playback of surround will include video.

• Standard video and DVD are also obsolete.

• Standard NTSC video was also introduced 50 years ago, and has changed very little.

– Improvements in color rendition and S/N are welcome, but result in little change in emotional involvement.

– Standard video and DVD have insufficient resolution and (usually) too small a screen to fully engage human visual perception.

– Competent directors compensate with frequent close-ups, fast editing, etc.

• The result is distracting and ultimately boring, as the director is always forcing us to watch a particular aspect of the show.

• At best such a presentation is good for one viewing only.

– Videos of music performances show the heads or fingers of particular performers, forcing us to listen to only one musical line.

• The intelligent interplay between musical lines (the essence of Western music) is lost.

What is the Solution?

• We need to upgrade both the audio and the video if we want to create a product that is more compelling than current standards.

– Translation: We need to develop a product that will generate substantial sales.

• High-definition video combined with multichannel audio gives this opportunity.

– The audio can be made backwards compatible through an active downmixer, as we will see in this talk.

– The video is backwards compatible if the high-def image is cropped and scanned. And this is becoming routine.

High-Definition Demo

• Brahms F minor Piano Quintet

– Performed by the faculty of the Point-Counter-Point Summer camp.

– Video is high-definition (with some artifacts.)

– Audio is two channel, single microphone pick-up.

– Played here (after post production) with two-channel to five-channel processing.

Five channel to Two channel Encoding

• It is widely believed that it is impossible to automatically mix a five-channel recording into a two-channel recording.

– Standard methods of downmixing have several serious errors in balance.

– These errors can be analyzed and corrected with an active downmixer.

• The results can be surprising. Very often the downmixed 5 channel recording is better than the manually mixed two channel recording.

Desired properties of an active downmixer

• Most importantly, the effective energy of each signal source should be preserved in the two-channel output.

– There is no passive mixer that can achieve this goal.

• Next, the two-channel mix should reproduce inner voices in the same positions as the 5-channel mix.

– This also requires an active mixer, with compensation for errors in the sine/cosine pan law.

– We also must be careful to preserve the stereo width of decorrelated signals applied to the rear inputs.

• Finally, we can make some adjustments to the mix based on dynamic analysis of the input signals.

– Rear signals that are mostly reverberation should be attenuated in the mix by 3dB.

– Relatively low-level rear signals that have prominent onsets can be briefly brought up in the mix, to give them a better chance of being heard and properly decoded.

Center channel energy

• The balance between the center vocal and the surrounding instruments is critical.

– Downmixing this essential component is difficult, as different engineers can mix it in different ways.

• The usual center downmix mixes the center equally into both output channels with an attenuation of 3dB.

• This mix works well if the vocals are entirely phantom (no center channel) or hard center (no phantom).

• However, it is very common to mix the vocals equally in all front channels.

• When this configuration is downmixed the vocals will be more than 2dB too strong unless some correction is made.

– Likewise an instrument panned half-way between center and left or right will have about a 2dB amplitude error.

– How can we correct for this common error?
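The size of these errors can be checked with a few lines of arithmetic. The sketch below is only illustrative. It assumes that the loudness of the five-channel original is proportional to the sum of the channel powers, and that an instrument half-way between center and left is fed equally to those two channels. The function name is made up for this example.

```python
import math

CENTER_GAIN = math.sqrt(0.5)  # the usual -3dB center downmix coefficient

def downmix_error_db(left, center, right):
    """Level error (dB) of the passive front downmix for one source.

    left, center, right are the amplitudes of the same source in the
    three front channels. Assumption: the loudness of the original is
    proportional to the sum of the channel powers.
    """
    power_in = left**2 + center**2 + right**2
    out_left = left + CENTER_GAIN * center
    out_right = right + CENTER_GAIN * center
    power_out = out_left**2 + out_right**2
    return 10.0 * math.log10(power_out / power_in)

phantom_only = downmix_error_db(1.0, 0.0, 1.0)   # no center channel
hard_center = downmix_error_db(0.0, 1.0, 0.0)    # no phantom
all_front = downmix_error_db(1.0, 1.0, 1.0)      # vocal in all front channels
half_left = downmix_error_db(CENTER_GAIN, CENTER_GAIN, 0.0)
```

Under these assumptions the first two cases come out at 0dB, the all-front vocal at about +2.9dB, and the half-left instrument at about +2.3dB.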

Rear channel energy

• The same problem occurs in the rear channels.

• The basic operation for the rear channels is to mix the left rear input (x0.91) to the left output, while also mixing it (x-0.38) to the right output.

– The result is to preserve the loudness of a discrete rear input.

• But when the two rear inputs are driven in phase, the result is a 2dB to 3dB extra increase in the loudness of the output.

– In a passive encoder it is not possible to have the correct balance for discrete rear inputs while also having the correct balance when the rear inputs are driven in phase.

– All available passive encoders (Dolby Surround, etc.) attenuate the rear inputs by 3dB to compensate for this error.

– This makes discrete rear inputs (left rear only) 3dB too weak.

• In addition there is a directional error as the signals pan from left rear only to left and right rear together.
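The rear energy problem can be reproduced numerically. This is a sketch under a stated assumption, not the actual encoder: the right output is assumed to carry the opposite sign for both rear inputs, so that a discrete left rear input appears as (+0.91, -0.38) and a discrete right rear input as (+0.38, -0.91). The 90 degree phase shift networks are left out, since they do not change the powers.

```python
import math

A, B = 0.91, 0.38  # rear mix coefficients quoted on the slide

def encode_rear(left_rear, right_rear):
    """Passive rear encode for coherent rear amplitudes.

    Assumption: the right output is inverted relative to the left, so a
    left-rear-only input of 1.0 gives (0.91, -0.38).
    """
    out_left = A * left_rear + B * right_rear
    out_right = -(B * left_rear + A * right_rear)
    return out_left, out_right

def level_change_db(left_rear, right_rear):
    """Output power relative to input power, in dB."""
    out_left, out_right = encode_rear(left_rear, right_rear)
    power_in = left_rear**2 + right_rear**2
    power_out = out_left**2 + out_right**2
    return 10.0 * math.log10(power_out / power_in)

discrete = level_change_db(1.0, 0.0)  # left rear only
in_phase = level_change_db(1.0, 1.0)  # both rear inputs driven together
```

With this sign convention the discrete input is preserved to within about 0.1dB, while the in-phase case comes out roughly 2.2dB too loud.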

A complaint

• This problem has vexed me from the beginning of my work on matrix audio.

• Early L7 encoders measured the phase coherence between the front and rear channels, and adjusted mix coefficients on an ad-hoc basis.

– The result was adequate for the rear energy problem, but the panning errors remained.

– Panning errors were also problematic in the front, particularly when vocals were placed in multiple channels.

• It is really very difficult to determine how the mix engineer has placed the vocals, and yet you must do so to correctly downmix the piece.

– Mix engineers that put the vocals in all five channels are really asking for trouble.

– And if they add random time delays to the center channel the result becomes impossible to downmix at all.

• (The difference these delays make to the sound is almost entirely imaginary.)

Solution:

• But there is a solution! Elegant, simple, and thoroughly obvious.

• We can correct the output signal based on an energy comparison.

– We measure the energy of the input signals, and compare this to the energy of the output signals.

• It is important to frequency contour the measurement to correspond to human loudness sensitivity.

– The difference signal can be used in a simple feedback loop to correct the amplitudes of the mix.

– As an added advantage it is possible with this technique to correct for panning errors, and for the difference between the sine/cosine pan law and subjective pan positions.

– (patent applied for)
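The energy comparison can be pictured as a small feedback loop. The sketch below is an assumption-laden illustration, not the patented method: it uses a single broadband energy figure with no loudness weighting, a single correction gain, and an arbitrary loop rate. The energy values are hypothetical.

```python
import math

def settle_gain(energy_in, energy_out_uncorrected, steps=40, rate=0.5):
    """Iterate a correction gain until output energy matches input energy.

    rate is the fraction of the remaining dB error removed per step
    (a hypothetical loop constant).
    """
    gain = 1.0
    for _ in range(steps):
        energy_out = gain**2 * energy_out_uncorrected
        error_db = 10.0 * math.log10(energy_in / energy_out)
        # Move the amplitude gain part of the way toward zero error.
        gain *= 10.0 ** (rate * error_db / 20.0)
    return gain

# Hypothetical measurement: the passive mix delivers 5.83 units of
# energy where the input carried 3.0.
gain = settle_gain(energy_in=3.0, energy_out_uncorrected=5.83)
corrected_energy = gain**2 * 5.83
```

Each pass removes a fixed fraction of the remaining error in dB, so the loop settles smoothly instead of jumping to the final gain in one step.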

Encoder Block Diagram

Here you can see the basic structure of the active encoder. Note the center is mixed to the output with two variable coefficients, ml, mr, (usually mr=ml= -3dB).

The rear channels are mixed with the basic .91 and -.38 ratio to the outputs, but they are also attenuated by the coefficients mi and ms, to preserve the correct output power under all conditions. (usually mi = 1, ms = 0.)

The use of 90 degree phase shift networks is common in all encoders, as it allows arbitrary pans from front to rear.

Surround from Two channel

• Two choices:

– We can create a whole new mix – repositioning the musical forces, adding reverberation, etc.

• This type of upmix requires the thoughtful participation of a skilled operator.

• Ideally these operations are done by the musical copyright holder, for distribution on CD or DVD.

– Or we can preserve the original recording as much as possible, while using multichannel reproduction to:

• Enlarge the listening area

• Increase envelopment

• Transform the listening room to a larger, more comfortable space.

– This type of upmix should be fully automatic, and can be part of a consumer playback system.

• Automatic processing is particularly beneficial in small spaces, such as cars.

– This talk will concentrate on automatic processing.

• We want to reproduce the original recording, while improving the listening area and the overall impression.

• In this talk we will deliberately limit ourselves by not discussing adding anything to the original, such as reverberation or early reflections. Potentially at least, the original recording can be reconstructed from the converted outputs.

The default

• The MOST AUDIBLE difference between the various currently available 2-5 processors is what they do when there is NO detectable direction in the input sound.

– When there is no detectable directional cue we reproduce the sound in a default condition.

– The default condition is the most common condition in most music, and how we treat this case is of utmost importance.

• If the default behavior of the processor can improve the acoustic performance of the original without altering the original sound stage, we will have succeeded.

• Our major goal is to “do no harm” to the original mix, while improving listener envelopment and increasing the useable listening area.

What does “default” mean?

• In the default condition the input channels are decorrelated – in other words there is no common signal between the two channels.

– An example might be an orchestral recording with violins on one side and cellos on the other.

– Or any complex mix with a lot of reverberation or high left/right separation.

• In a recording mastered for two-loudspeaker reproduction the default playback condition is normal stereo – a phantom image spread between the front two speakers.

– Two channel stereo reproduction allows the listener to identify the direction of the sound sources through the (buried) amplitude cues in the original recording.

• If we are unsure about what to do in converting this recording to multichannel, our best option is to preserve the stereo image.

• This seems obvious, but it is not standard practice

Standard 2-to-4 decoders

• The usual default condition of multichannel decoders is Dolby Surround:

• The center and rear channels are as loud as the main channels, causing the sound image to move strongly toward the center of the room.

• Image width is reduced by about ½.

• The envelopment in the room is also reduced since the rear energy is added entirely in the medial plane (so there is no lateral sound energy.)

Standard 5 channel decoders

• The most obvious extension of Dolby Surround to 5 channels uses a type of matrix to supply the left and right rear speakers.

• If the input signals contain a buried sound that is encoded to the rear, this matrix will send the signal predominantly to the appropriate speaker.

• However, when there is NO buried encoded signal (which is almost always the case) this matrix results in a center channel that is too strong, and rear channels that are out of phase!

Antiphase rear channels

• The tendency of the rear channels to be out of phase is an inherent problem with phase/amplitude encoding and decoding.

– As we pan a signal from left to left surround the input signal is positive phase in the left input, and negative phase in the right input.

– If we pan from right to right surround the signal is positive in the right channel, and negative in the left.

• So what happens when we want to be fully to the rear? Should the input signals be negative on the left or the right?

– If we choose to always make the right channel with negative phase, then in the default condition the loudspeakers will be out of phase.

• The best solution to this problem is to incorporate a variable phase shift network in the rear channels, that will actively flip the phase when there is a sound that is strongly in the rear.

Examples – a “music” decoder

X-Y plot of the rear outputs of a popular “music” decoder when the inputs are driven by uncorrelated pink noise. The correlation coefficient is -0.25

Example – a “film” decoder

The problem can be corrected to some degree by reversing the phase of the right rear channel. The rear channels are now decorrelated – but pans from right front to right rear might be a little peculiar. The correlation coefficient is +0.25.

Example – a decorrelated default

It is possible to design a decoder where the default results in decorrelated rear channels. The result is sonically superior. The correlation coefficient is 0.
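The correlation coefficients quoted on these three slides are ordinary normalized cross-correlations of the two rear outputs. A minimal sketch of the measurement follows. It uses seeded white Gaussian noise in place of pink noise and does not model any decoder, so it only shows the two reference cases of 0 and -1.

```python
import math
import random

def correlation(x, y):
    """Normalized correlation coefficient of two equal-length signals."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the example is repeatable
left_rear = [rng.gauss(0.0, 1.0) for _ in range(20000)]
right_rear = [rng.gauss(0.0, 1.0) for _ in range(20000)]

decorrelated = correlation(left_rear, right_rear)          # close to 0
antiphase = correlation(left_rear, [-s for s in left_rear])  # -1
```

A decoder with the -0.25 or +0.25 default described above would land between these two reference cases.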

Block diagram for a decorrelated default

If we create the rear channels by delaying and frequency contouring the front channels, the full separation in the input channels is maintained in the rear.

This results in higher envelopment around the listeners and a more comfortable sound.

The frequency contouring is vital – and contains shelving as well as rolloff.

The effect is highly audible, particularly in a small listening room or an automobile.

However a decorrelated default makes the decoder design more complex, as there is no inherent cancellation of a center speaker in the rear channels
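The default path described on this slide can be sketched as a delay followed by a filter, applied to each front channel independently so that left/right separation carries through to the rear. The delay length and the one-pole rolloff below are placeholder choices, and the shelving filter that the slide calls vital is omitted to keep the example short.

```python
def rear_from_front(front, delay_samples, smoothing):
    """Derive a rear channel from one front channel.

    delay_samples and smoothing (0..1, one-pole low-pass coefficient)
    are hypothetical values; a practical design would add shelving.
    """
    delayed = [0.0] * delay_samples + list(front)
    rear = []
    state = 0.0
    for x in delayed[:len(front)]:
        state += smoothing * (x - state)  # simple high-frequency rolloff
        rear.append(state)
    return rear

impulse = [1.0] + [0.0] * 9
rear_left = rear_from_front(impulse, delay_samples=3, smoothing=0.5)
```

Because the right rear would be built the same way from the right front only, nothing from the opposite channel leaks in, which is also why a center signal is not cancelled in the rear by this path.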

Decorrelated rear steering

• The next most audible difference between current decoders is their behavior when playing surround recordings where the original rear channels were decorrelated. For example,

– Crowd noise in music CDs

– Orchestra sounds in the rear on a film

– Backup chorus in music CDs

• When these recordings are encoded to two channels these decorrelated signals result in an output with net negative correlation.

• It is desirable to decode these signals such that the original decorrelation is restored.

• Most available decoders do poorly on this test, and the result is highly audible – (particularly in small rooms)

Examples: Decorrelated rear steering

[Figure panels: “Film” decoder (Corr ~0.5); “Music” decoder; Decorrelated Decoder, “Music” and “Film”]

Listening Examples

• Decorrelated rear steering during crowd noise in “Hotel California”

– Note high positive or negative correlation leads to a flat, two-dimensional sound

• Example with pink noise

• (must play from CD)

How does a decoder work with directional signals?

• Once we have a great default, we need to consider sounds which have a distinct direction:

• We need to detect the direction of these sounds

• And then we need to adjust mix coefficients to direct these sounds to the appropriate speakers.

• Frequency contouring of the rear channels is essential, and must be made to depend on the degree of steering.

– A high frequency rolloff is important when reproducing sounds in the front, but it should vanish smoothly when signals move rear.

– In addition a shelving filter at about 300Hz allows the rear speakers to reproduce bass frequencies while not drawing attention to themselves.

• This filter should also disappear when signals move rear.

• The type of frequency contouring used in upmixing is also very useful in making discrete recordings!

Block diagram of a 5 channel decoder

This diagram does not include shelving and rolloff filters, which are essential in a practical design.

The matrix elements in a decoder designed for maximum decorrelation

• The output of the decoder is entirely determined by the way the ten matrix elements depend on l/r and c/s

• We can graph the surface formed by the matrix element on the l/r c/s plane

• By symmetry we need to graph only 5 of the 10 matrix elements

© Lexicon - a Harman International Company

Matrix elements in a decorrelated decoder

[Figure labels: Left Input, Right Input; Center Output, Left Front Output, Left Rear Output]

Left input to Left Front output element (LFL) inverted back to front to reveal the trough at left rear

•The peak in front keeps loudness correct for center signals

The Right input to Left front output matrix element (LFR)

•The peak in front keeps loudness correct for mild front steering

The Left input to Left Rear output matrix element (LRL)

•The ridge in the rear keeps separation high as sound moves back

The Right input to Left Rear output matrix element (LRR)

•Note the ridge along the back - which keeps separation high in the rear

The Left input to Center output matrix element (CL)

•Note the rapid increase in level as the steering moves forward. This preserves stereo separation while making a hard center

[Figure panels: Decorrelated Decoder; Conventional Decoder]

•This element shows particular attention to the center of the plane.

•These surfaces are defined by the edges, not the center.

For directional signals we must:

• 1. Analyze the original recording to determine the directions of the original instruments.

• 2. Adjust the mixing parameters of our processor to reproduce these sounds through a different loudspeaker arrangement in precisely the same positions.

Original recording analysis

• We must look for cues in the original recording that will allow us to determine the directions of all the instruments.

• We have to determine these directions quickly enough and accurately enough to mimic the properties of human localization.

– If we can closely approximate human hearing, the perceived results will be flawless.

• Human sound perception is based on detecting “Sound Events”.

– So our processor needs a “Sound Event Detector” with accurate estimations of probable sound direction.

• Human perception uses several directional cues:

– Direct localization of sound events through amplitude and time differences between the two ears.

• These localization cues are mostly supplied through amplitude panning in a recording.

– Overall “center of gravity” localization, done through level differences (for left-right) combined with head rotation (for front-back)

• This is how we can be aware that a group of instruments are behind us, even if there are no discrete sound events.

• If we determine that the original was surround encoded (Logic 7 or Dolby Surround) we need to correctly determine rear directions.

Amplitude Panning

• Nearly all current recordings employ amplitude panning.

– Phantom images are moved from one speaker location to another by varying the relative amplitude between the speakers.

• As a consequence the intended position of a sound source can be detected if we compare the amplitudes of the two input channels.

• Both applications require that we know the perceptual result of various amplitudes.

– We need to know the “pan-law”

The sine/cosine pan-law

• The most common pan-law in use is the sine/cosine pan-law:

• Left_output = cos(p)*input

• Right_output = sin(p)*input

• Where p is assumed to vary from 0 degrees (full left) to 90 degrees (full right), with center at p = 45 degrees. Note that then:

• (Left_output)^2 + (Right_output)^2 = (input)^2

Sine-Cosine drawing

If the sine-cosine pan law is accurate, and we set p=22.5 degrees, we should hear a sound image half-way between center and left.
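The pan law is short enough to write out directly. The sketch below follows the equations above; only the function and variable names are invented for the example.

```python
import math

def sine_cosine_pan(sample, p_degrees):
    """Return (left_output, right_output) for pan angle p in degrees.

    p = 0 is full left, p = 45 is center, p = 90 is full right.
    """
    p = math.radians(p_degrees)
    return math.cos(p) * sample, math.sin(p) * sample

center = sine_cosine_pan(1.0, 45.0)
half_left = sine_cosine_pan(1.0, 22.5)

# Total power is the same at every pan position.
half_left_power = half_left[0] ** 2 + half_left[1] ** 2

# Level difference between the channels at the half-left position.
half_left_difference_db = 20.0 * math.log10(half_left[0] / half_left[1])
```

At p = 22.5 degrees the gains are about 0.92 and 0.38, a level difference of roughly 7.7dB. Whether that difference is actually heard half-way between center and left is the question the slide raises.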

How does a surround decoder work?

• Compatibility with Dolby Surround requires a phase/amplitude decoder

– thus direction is determined by evaluating |Lin|/|Rin| (lr) and |Lin+Rin|/|Lin-Rin| (cs).

• Stereo compatibility requires that the front localization roughly follows a sine/cosine pan law.

– Thus for strongly steered signals

• Lin = cos(A)

• Rin = sin(A)

– As angle A varies from 0 to 90 degrees the sound pans from left front to right front

• Strongly steered rear signals can be encoded by allowing angle A to increase from 90 degrees to 180 degrees.

The left/right and center/surround signals

• Define l/r = arctan(|Lin|/|Rin|) - 45 degrees

• Define c/s = arctan(|Lin+Rin|/|Lin-Rin|) - 45 degrees

– When there is no dominant direction in the input signal, all the levels l/r and c/s ~=0

• A signal cannot be both left and center at the same time! Thus the sum of |l/r| and |c/s| is bounded.

– |l/r| + |c/s| <= 45 degrees
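The two steering signals can be computed from short blocks of samples. This is a minimal sketch that takes the level |x| to be the RMS value of a block, which is an assumption; a real decoder would use smoothed level detectors. `atan2` replaces the plain arctangent of a ratio so that a silent channel does not cause a division by zero.

```python
import math

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def steering_angles(left, right):
    """Return (l/r, c/s) in degrees for two equal-length sample lists."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    # atan2(num, den) equals arctan(num/den) for non-negative levels
    # and also copes with a zero denominator.
    lr = math.degrees(math.atan2(rms(left), rms(right))) - 45.0
    cs = math.degrees(math.atan2(rms(total), rms(diff))) - 45.0
    return lr, cs

tone = [math.sin(0.1 * n) for n in range(1000)]
silence = [0.0] * 1000

left_only = steering_angles(tone, silence)          # (45, 0)
center = steering_angles(tone, tone)                # (0, 45)
rear = steering_angles(tone, [-s for s in tone])    # (0, -45)
```

Each of the three test signals sits at a corner of the allowed region, where one steering signal is at its 45 degree limit and the other is zero.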


The l/r and c/s signals are bounded

• l/r and c/s are not independent for strongly steered signals, where Lin = cos(A), Rin = sin(A)

– in this case c/s ~= A, l/r ~= 45 - A

• Allowed values for l/r and c/s fall within a diamond in the l/r c/s plane.

• A circularly panned signal will produce l/r and c/s values which lie on the boundary of the diamond.
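The diamond boundary can be verified by sweeping the encoding angle A through a full front-to-rear pan. The sketch below applies the definitions of l/r and c/s to Lin = cos(A), Rin = sin(A); the function name is invented for the example.

```python
import math

def steered_angles(a_degrees):
    """Return (l/r, c/s) in degrees for Lin = cos(A), Rin = sin(A)."""
    a = math.radians(a_degrees)
    lin, rin = math.cos(a), math.sin(a)
    lr = math.degrees(math.atan2(abs(lin), abs(rin))) - 45.0
    cs = math.degrees(math.atan2(abs(lin + rin), abs(lin - rin))) - 45.0
    return lr, cs

# A = 0..90 pans across the front, A = 90..180 pans through the rear.
boundary = [steered_angles(a) for a in range(0, 181, 5)]
on_diamond = all(abs(abs(lr) + abs(cs) - 45.0) < 1e-9 for lr, cs in boundary)

left_of_center = steered_angles(30.0)  # (15, 30)
```

Every point of the sweep satisfies |l/r| + |c/s| = 45 degrees. Between full left and center (A from 0 to 45 degrees) the values are exactly c/s = A and l/r = 45 - A.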

A circularly panned noise signal as seen by a phase/amplitude decoder

•Music signals will fall inside this boundary

A histogram of a typical stereo piece (first 30 seconds of Jennifer Warnes)

•Note the bulk of the power is not steered (uncorrelated), but it moves from the middle to the front

Sound event direction detection, newer circuit

• Jennifer Warnes – Bird on a wire, whole song

A Histogram of an encoded 5-channel piece (first 30 seconds of Boyz II Men)

•Note the extensive use of the rear directions. Here we need full stereo width in the front and in the rear!

A Histogram of a classical 5-channel piece (30 seconds of 1812 recorded by Eargle)

•Note the music has no net left/right bias. It is stereo, but the voices are mostly in the rear.

Event Detection

• A MAJOR problem in the design of a two-channel to multichannel processor is determining the true direction of the incoming sounds.

– When the recording includes only a single sound there is no problem.

– But most music is a complex mix of many sounds, with reverberation. Reverberation provides a random directional signal that can easily confuse the processor.

• An event detector uses information in the amplitude envelope to search for loudness patterns that represent notes in music or syllables in speech

• Depending on the rise-time and fall-time of the detected sound events, the time constants of the sound processor are adjusted to maximize the speed with which sounds can be re-directed to the correct positions, while NOT rapidly steering on more continuous music.

Accommodation in sound perception

• All modes of human perception emphasize transients.

– A perception which is sustained tends to drop from consciousness.

– This loss of perception of sustained events can happen quite early in the detection physiology.

• It does not need higher neural processing.

• We can more accurately detect event direction if we can ignore continuous signals.

– For example, consider a strong continuous vocal line in the center, with accompanying sound events on either side.

• We can build a differential level-ratio detector, that will accommodate to continuous sounds, while correctly identifying the direction of transient syllabic events.

Differential direction detection

• Conventional direction detection relies on the level in the input channels

• Define l/r = arctan(|Lin|/|Rin|) - 45 degrees

• Define c/s = arctan(|Lin+Rin|/|Lin-Rin|) - 45 degrees

• An essential feature of a differential detector is the realization that if we use the input power instead of the input level it is possible to accurately detect the direction of a buried transient event.

– Power is additive: if we subtract any constant power from any input, the power change during a transient event accurately reflects the direction of the transient event.

• The following equations give identical results to the above:

• l/r = arctan(sqrt(|Lin^2|/|Rin^2|)) - 45 degrees

• c/s = arctan(sqrt(|(Lin+Rin)^2|/|(Lin-Rin)^2|)) - 45 degrees

Now let the input power adapt to constant signals

• (Lin_adapt)^2 = Lin^2 - av(Lin^2)

• (Rin_adapt)^2 = Rin^2 - av(Rin^2)

• (Similarly for the sum and difference signals)

• If we detect directional angles using the adapted signals, transient events which occur during constant signals can be accurately localized.

• This scheme requires that we accurately detect the start of transient events. In the absence of such an event the unadapted directions must be used.

– Or the direction detection becomes unacceptably noisy.
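A small numeric example shows what the adaptation buys. It is a sketch with invented numbers: a steady center vocal with amplitude 1 in each input, plus an uncorrelated left-only transient of power 1, so that the powers simply add. The running averages are assumed to have settled on the vocal alone before the transient arrives.

```python
import math

def angles_from_powers(p_left, p_right, p_sum, p_diff):
    """l/r and c/s in degrees from the four measured powers."""
    lr = math.degrees(math.atan2(math.sqrt(p_left), math.sqrt(p_right))) - 45.0
    cs = math.degrees(math.atan2(math.sqrt(p_sum), math.sqrt(p_diff))) - 45.0
    return lr, cs

def adapt(power, average_power):
    """Subtract the long-term average power, never going below zero."""
    return max(power - average_power, 0.0)

keys = ("left", "right", "sum", "diff")
# Hypothetical powers: vocal only (long-term average) ...
average = {"left": 1.0, "right": 1.0, "sum": 4.0, "diff": 0.0}
# ... and vocal plus the left-only transient.
during = {"left": 2.0, "right": 1.0, "sum": 5.0, "diff": 1.0}

unadapted = angles_from_powers(*(during[k] for k in keys))
adapted = angles_from_powers(*(adapt(during[k], average[k]) for k in keys))
```

Without adaptation the transient reads as only about 10 degrees left and 21 degrees toward center. With the averages subtracted it reads as l/r = 45, c/s = 0, which is full left.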

Onset Detection

• We can use the amplitude envelope of the signals – particularly after adaptation – to find the onsets of individual events

• The sensitivity of this process must adapt to the nature of the music. Pop music tends to have many buried significant events, while classical music does not.

– Decoding classical music as if it were pop results in many errors in directional panning, which can be audible.

Amplitude envelope of reverberant speech

Anechoic speech with added reverberation equivalent to a small hall at about 20 meters.

Analysis into 1/3 octave bands, followed by envelope detection.

Green = envelope

Yellow = edge detection

This slide shows how the steeply rising edges of the amplitude envelope can be used to trigger a syllable onset detector.
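A rising-edge trigger of this kind can be sketched in a few lines. The version below is an illustration only: it works on an envelope that has already been detected, and the 6dB rise threshold and three-sample lookback are arbitrary placeholders for the sensitivity that, as noted earlier, must adapt to the music.

```python
import math

def detect_onsets(envelope, rise_db=6.0, lookback=3):
    """Return indices where the envelope has risen by at least rise_db
    compared with its value lookback samples earlier."""
    floor = 1e-12  # avoids taking the log of zero during silence
    onsets = []
    armed = True
    for n in range(lookback, len(envelope)):
        earlier = max(envelope[n - lookback], floor)
        current = max(envelope[n], floor)
        rise = 20.0 * math.log10(current / earlier)
        if rise >= rise_db:
            if armed:
                onsets.append(n)
                armed = False  # report each rising edge only once
        else:
            armed = True
    return onsets

# Two hypothetical syllables separated by a quieter gap.
envelope = [0.1] * 10 + [1.0] * 10 + [0.1] * 10 + [1.0] * 10
onsets = detect_onsets(envelope)
```

Raising `rise_db` makes the detector ignore smaller edges, which is the kind of adjustment that would suit classical material better than pop.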

The advantage of human hearing

• Human hearing has the advantage of being frequency selective.

– The event separation process takes into account the frequency content of the incoming signals, and can separate them based on this information.

– Most surround processors are broadband systems. We want to do “almost as well” as human hearing using only phase/amplitude information to both separate events and to determine direction.

• In spite of this disadvantage a system based on broadband event detection works remarkably well.

Examples of event detection

[Figure labels: Left, Right, Center, Rear, Left Surround, Right Surround]

This slide plays an MPEG movie showing the instantaneous output of the event detector playing music and test signals of various types.

Testing event detection

A test signal which consists of a tone that alternates from left to right while panning first to the center, and then to the rear.

We then add an interfering constant tone to the center with increasing level.

[Figure segments: Front pan; Rear pan; Pan with center tone; Center tone louder; Center tone louder]

Analysis of test results

[Figure traces: Left Front, Right Front, Left Rear, Right Rear, Center]

Output of an accommodating decoder in the “film” setting to the pan tones with no interfering center signal

Output with a constant center signal at -7dB level

[Figure traces: Left Front, Right Front, Left Rear, Right Rear, Center]

Output with a constant center signal at equal level


[Figure traces: Left Rear, Right Rear, Center]

Output with a constant center signal at +7dB level with an accommodating decoder


[Figure traces: Left Rear, Right Rear, Center]

Comparison at 0dB center level

Accommodating decoder output. Note how the first 80ms or so of each tone burst tends to accurately show the intended direction of the sound. The accuracy degrades as the sound continues, but this is not audible.

Commercial decoder output with no accommodation. Note there is no left/right difference in the rear, and there is a high level in the front speakers during rear panning.

Comparisons with plots

We can plot the amplitude graphs as polar diagrams, using the center of gravity of the sound to determine the probable perceived direction. This graph is from a commercial non-adapting decoder, with no constant interfering signal.

Note the first two plots expand the front axis to +-90 degrees. Each pan position has been divided into four equal sections. These plots show very good performance.

Accommodation vs non-accommodation – equal interference and signal

An accommodating detector (in this case in film mode) catches the true direction of the first wavefront.

Standard detectors cause all signals to be brought toward the center. This is the “film” setting. (These plots show the first 80ms of the tone bursts.)

A better plot – no interference (rear)

It is more revealing to plot the direction of the output by finding the angle of the center of gravity. We plot the result as a circle whose radius depends on the degree of coherence of the image. If the sound comes from all around, the radius is large, and the center moves to the center of the plot. All phantom images (such as the full rear position) will have large circles, indicating the sound comes from more than one speaker. (The minimum circle radius is limited to 0.05 in these drawings so they remain visible.)

With no interference the two decoders perform similarly. It can also be seen that the non-accommodating decoder puts signals panned only partly to the rear further to the rear than might be expected. (Careful observers may notice that the accommodating drawing has 7 positions, the non-accommodating one has 6.)
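One way to compute such a plot is to treat each loudspeaker as a unit vector weighted by its output power. The sketch below is an assumption about how the drawings might be produced, not the method actually used for these slides: the speaker azimuths are hypothetical, and the spread measure (one minus the length of the summed vector) is just one plausible choice for the circle radius.

```python
import math

# Hypothetical speaker azimuths in degrees (0 = front, positive = left).
SPEAKERS = {
    "center": 0.0,
    "left_front": 30.0,
    "right_front": -30.0,
    "left_rear": 110.0,
    "right_rear": -110.0,
}

def center_of_gravity(powers):
    """Return (angle in degrees, spread) for a dict of speaker powers.

    spread is 0 when all power comes from one speaker and grows toward 1
    as the power is distributed around the listener.
    """
    total = sum(powers.values())
    x = sum(p * math.cos(math.radians(SPEAKERS[k])) for k, p in powers.items()) / total
    y = sum(p * math.sin(math.radians(SPEAKERS[k])) for k, p in powers.items()) / total
    angle = math.degrees(math.atan2(y, x))
    spread = 1.0 - math.hypot(x, y)
    return angle, spread

single = center_of_gravity({"left_front": 1.0})
full_rear = center_of_gravity({"left_rear": 0.5, "right_rear": 0.5})
```

A sound from one speaker gives zero spread at that speaker's angle, while the full rear phantom image gives an angle of 180 degrees with a large spread, matching the large circles described above.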

[Figure panels: a non-accommodating “film” decoder; an accommodating “film” decoder]

A better plot – interference -7dB

With an interfering signal the positions in the non-accommodating decoder are brought toward the front, with the large circle radius indicating the sound comes from many speakers at the same time. These sounds are heard as quite diffuse.

The performance of the accommodating decoder actually improves somewhat over the previous slide.

[Figure panels: a non-accommodating “film” decoder; an accommodating “film” decoder]

A better plot – equal interference and signal

With equal interference and signal the accommodating decoder degrades somewhat, but the performance is still very good. The non-accommodating detector throws the rear signals increasingly into the front.

[Figure panels: non-accommodating detector (first 80ms); accommodating detector (first 80ms)]

A better rear plot – +7dB interference

[Figure panels: a non-accommodating decoder; an accommodating decoder]

Neither decoder works perfectly for rear signals at this level of interference from the front, but the accommodating detector does better. At least the front left and right signals are correctly reproduced.

Front positions – no interference

[Figure panels: a non-accommodating “film” decoder; an accommodating “film” decoder]

Both decoders perform well with no interference. Notice that this non-accommodating “film” decoder widens the front soundstage by putting some of the left-only and right-only signals into the rear speakers as well as in the front.

Front positions – -7dB interference

[Figure panels: a non-accommodating “film” decoder; an accommodating “film” decoder]

With a -7dB interfering signal both decoders still perform well. Notice that the non-accommodating decoder still has some leakage to the rear. The leakage continues into the first few positions of the front pan.

(The limit on the minimum circle radius prevents us from seeing the circles between center and left become larger as they are being reproduced by two speakers instead of one. We can see that the localization outside the sweet spot will be good. This is due to the close angular spacing of the speakers.)

Front positions – equal interference and signal

[Figure panels: a non-accommodating “film” decoder; an accommodating “film” decoder]

With equal interfering tone and signal the performance of the non-accommodating decoder degrades markedly for full-left and full-right signals, which are drawn toward the center, and are reproduced through all loudspeakers.

The accommodating decoder works well, still displaying the minimum circle radius for all positions.

Front positions – +7dB interference

Panels: a non-accommodating “film” decoder; an accommodating “film” decoder

With high interference the non-accommodating decoder draws all the signals into the center, and reproduces them through all loudspeakers. The exact center however is exclusively in the center speaker.

The accommodating decoder continues to have high left-right separation, and nearly minimum circle radius.

Front Positions – “Music” vs “Film”

Panels (no interference): a non-accommodating decoder; an accommodating decoder

It is possible to improve the imaging of a decoder (in the sweet spot only) by reducing the level of the center speaker, and reducing the front steering. The front image is then mostly phantom, and there are fewer steering artifacts. This setting is typically called the “music” setting of the decoder.

With an accommodating decoder it is not necessary to reduce the front steering to prevent artifacts. We deliberately blend only the true center position into the left and right fronts. This treatment of vocals is frequently used in discrete surround mixes. This setting can reduce any audible pumping of the decoder when vocals and background are nearly equal in level.

Front Positions “Music” -7dB interference

Panels: a non-accommodating decoder; an accommodating decoder

With -6dB interference the two decoders behave much the same as in the previous slide.

Front Positions – Music + interference at 0dB


The non-accommodating decoder shows leakage into the rear channels, which increases the circle radius for left front and right front. The accommodating decoder keeps the leakage low.

Front Positions – Music + interference at +7dB


The leakage in the non-accommodating decoder increases at this level of interference, while the accommodating detector continues to work well.

Some Math

• The circular burst plots make an interesting graphic, but making them involved a number of guesses. It is not obvious how to determine just where a sound will be perceived when it is emitted from several loudspeakers. The math that follows might be useful, but I think the “better” graphs might be ultimately more useful, as they more accurately reflect the performance of the decoder when the listener is not in the exact center between the speakers.

• The angle of the sound source for front panning is calculated by summing the sound powers from the different loudspeakers, in order to find a “center of gravity” of the sound.

• If we let av_front_left = sqrt(pressure_from_left_front^2); (and so on)

• ra = 360/(2*pi);

• l_angle = 45-ra*atan((av_front_right+0.71*av_center)/(av_front_left+0.71*av_center));

• This yields an angle for the center of gravity of the front three speakers. The correction factor of 0.71 for the center level results in a good match to a sin/cosine pan law. No correction for the measured pan laws is included so far.
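As a quick check of the front-panning expression, the following sketch wraps it in an anonymous function and evaluates a few simple cases. The name front_angle and the test levels are illustrative assumptions and are not part of the original code; the parentheses are balanced so that atan receives the ratio of the right-weighted sum to the left-weighted sum.

```matlab
% Sketch: center-of-gravity angle of the three front speakers.
% front_angle and the input levels below are hypothetical names/values.
ra = 360/(2*pi);                      % radians to degrees
front_angle = @(fl, c, fr) 45 - ra*atan((fr + 0.71*c) ./ (fl + 0.71*c));

ang_left   = front_angle(1, 0, 0);    % left speaker only
ang_center = front_angle(0, 1, 0);    % center speaker only
ang_right  = front_angle(0, 0, 1);    % right speaker only (ratio is Inf)
ang_mixed  = front_angle(1, 1, 0);    % equal level in left and center
fprintf('%6.1f %6.1f %6.1f %6.1f\n', ang_left, ang_center, ang_right, ang_mixed);
```

With equal levels in left and center the result falls close to half-way between the two speakers, which is what the 0.71 correction is meant to achieve.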

More math

• We can then write a similar expression for the sum for front panning of all five speakers.

• l_angle_late = 120-2.667*ra*atan((av_front_right+0.8*av_center+av_rear_right+(.534*av_front_left)+(rear_leak*av_rear_left))/(av_front_left+0.8*av_center+av_rear_left+(.534*av_front_right)+(rear_leak*av_rear_right)));

• rear_leak is a fudge factor.

rear_leak = 0;
% for low rear levels, use the full value with no leak
if av_rear_left < .71*av_front_left
    rear_leak = .303;
% for intermediate levels use an intermediate leak, that fades out when the rear is 3dB stronger than front
elseif av_rear_left < 1.41*av_front_left
    rear_leak = .303*(2 - av_rear_left/av_front_left)/(2-.71);
end
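The five-speaker expression can be sanity-checked in the same way. This is a minimal sketch under two assumptions: the scale factor in front of atan is taken to be 2.667 (240/90, which maps the 0 to 90 degree range of atan onto +120 to -120 degrees), and the parentheses are balanced so that atan receives a single ratio. The test levels are hypothetical.

```matlab
% Sketch: five-speaker angle with the rear_leak fudge factor.
% Assumed scale factor 2.667 = 240/90; test levels are hypothetical.
ra = 360/(2*pi);
av_front_left = 1; av_front_right = 0; av_center = 0;   % front left only
av_rear_left  = 0; av_rear_right  = 0;

rear_leak = 0;
if av_rear_left < .71*av_front_left
    rear_leak = .303;                 % low rear level
elseif av_rear_left < 1.41*av_front_left
    rear_leak = .303*(2 - av_rear_left/av_front_left)/(2-.71);
end

num = av_front_right + 0.8*av_center + av_rear_right ...
    + .534*av_front_left + rear_leak*av_rear_left;
den = av_front_left + 0.8*av_center + av_rear_left ...
    + .534*av_front_right + rear_leak*av_rear_right;
l_angle_late = 120 - 2.667*ra*atan(num/den);
fprintf('front left only: %.2f degrees\n', l_angle_late);
```

Under these assumptions a front-left-only signal lands very close to 45 degrees, which suggests the .534 cross term was chosen to place the front speakers correctly on the 240 degree scale.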

Math for the better display

% find the back-front center of gravity

f_grav_cnt_fb(angl,i) = (cos(front_ang)*av_fr_al1(angl,i))^2 +(cos(front_ang)*av_fr_ar1(angl,i))^2 +av_ctr_a1(angl,i)^2 -((-cos(rear_ang)*av_rear_al1(angl,i))^2+(-cos(rear_ang)*av_rear_ar1(angl,i))^2);

f_total_power(angl,i) = av_fr_al1(angl,i)^2+av_fr_ar1(angl,i)^2+av_ctr_a1(angl,i)^2+av_rear_al1(angl,i)^2+av_rear_ar1(angl,i)^2;

f_grav_cnt_fb(angl,i) = f_grav_cnt_fb(angl,i)/f_total_power(angl,i);

if f_grav_cnt_fb(angl,i) < 0
    f_grav_cnt_fb(angl,i) = -sqrt(-f_grav_cnt_fb(angl,i));
else
    f_grav_cnt_fb(angl,i) = sqrt(f_grav_cnt_fb(angl,i));
end

Matlab code for finding the center of gravity. front_ang is the angle of the front speakers from the center (45 degrees) and rear_ang is the angle of the rear speakers (120 degrees). av_fr_al1 is the average power from the front left speaker during the time of index i, av_fr_ar1 is the average power from the front right speaker, and so on. angl is the index for the particular angle, and i is the index for a short time period within that angle.


% find the left-right center of gravity

f_grav_cnt_lr(angl,i) = (sin(front_ang)*av_fr_al1(angl,i))^2+(sin(rear_ang)*av_rear_al1(angl,i))^2 -((sin(rear_ang)*av_rear_ar1(angl,i))^2+(sin(front_ang)*av_fr_ar1(angl,i))^2 );

f_grav_cnt_lr(angl,i) = f_grav_cnt_lr(angl,i)/f_total_power(angl,i);

if f_grav_cnt_lr(angl,i) < 0
    f_grav_cnt_lr(angl,i) = -sqrt(-f_grav_cnt_lr(angl,i));
else
    f_grav_cnt_lr(angl,i) = sqrt(f_grav_cnt_lr(angl,i));
end
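The two center-of-gravity calculations can be collected into one short sketch for a single time frame. It assumes the speaker angles are given in degrees (hence cosd and sind), and the speaker ordering and the example levels are illustrative choices rather than anything taken from the original code.

```matlab
% Sketch: signed-root center of gravity for one time frame.
% Angles in degrees are an assumption; levels are hypothetical.
front_ang = 45;  rear_ang = 120;
% levels: [front left, front right, center, rear left, rear right]
lev = [1 0 0 0 0];                        % front left only

total_power = sum(lev.^2);
fb = ((cosd(front_ang)*lev(1))^2 + (cosd(front_ang)*lev(2))^2 + lev(3)^2 ...
     - ((cosd(rear_ang)*lev(4))^2 + (cosd(rear_ang)*lev(5))^2)) / total_power;
lr = ((sind(front_ang)*lev(1))^2 + (sind(rear_ang)*lev(4))^2 ...
     - ((sind(front_ang)*lev(2))^2 + (sind(rear_ang)*lev(5))^2)) / total_power;

% signed square root, same effect as the if/else blocks
fb = sign(fb)*sqrt(abs(fb));
lr = sign(lr)*sqrt(abs(lr));
fprintf('front-back %.3f, left-right %.3f\n', fb, lr);
```

A front-left-only signal ends up on the unit circle at 45 degrees, so its plotted circle would shrink to the minimum radius.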

Math for the better display

First find an average center of gravity for each angle, by averaging over the index i. Then find the polar angle of the center of gravity and the plot radius. The constant ra = 360/(2*pi). Note the step which limits the minimum radius.

for b=1:angstps
    av_fba(b) = 0;
    av_lra(b) = 0;
    for p = 1:nvarsp
        av_fba(b) = av_fba(b) + f_grav_cnt_fb(b,p);
        av_lra(b) = av_lra(b) + f_grav_cnt_lr(b,p);
    end
    av_fba(b) = av_fba(b)/nvarsp;
    av_lra(b) = av_lra(b)/nvarsp;
    av_plt_ang(b) = 90-atan(abs(av_fba(b)./abs(av_lra(b))))*ra;
    radius(b) = sqrt(av_fba(b)^2+av_lra(b)^2);
    av_circ_sz(b) = 1 - radius(b);
    if av_circ_sz(b) < 0.05
        av_circ_sz(b) = 0.05;
    end
end

nvarsp is the number of time steps for the index i (the loop variable p). If we want to look only at the first half of the burst, we use a smaller value for nvarsp.

The minimum radius for display can be changed here.
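The averaging step can also be written without explicit loops. The sketch below is one possible vectorized form; the small 2-by-3 arrays are hypothetical stand-ins for f_grav_cnt_fb and f_grav_cnt_lr, with rows indexing the pan angle and columns the time step.

```matlab
% Sketch: vectorized form of the averaging loop.
% The arrays are made-up stand-ins, not results from the decoder.
ra = 360/(2*pi);
f_grav_cnt_fb = [0.7 0.7 0.7; 0.5 0.4 0.6];   % rows: angles, columns: time steps
f_grav_cnt_lr = [0.7 0.7 0.7; 0.1 0.0 0.2];
nvarsp = 3;                                    % use fewer columns for the early part

av_fba = mean(f_grav_cnt_fb(:, 1:nvarsp), 2);
av_lra = mean(f_grav_cnt_lr(:, 1:nvarsp), 2);
av_plt_ang = 90 - atan(abs(av_fba)./abs(av_lra))*ra;
radius = sqrt(av_fba.^2 + av_lra.^2);
av_circ_sz = max(1 - radius, 0.05);            % limit the minimum circle radius
disp([av_plt_ang radius av_circ_sz]);
```

Using max for the clamp keeps the minimum display radius in one place, so it is easy to change.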


Now plot the results.

x = 2*pi*(1:50)/(50);
circx = sin(x);
circy = cos(x);
n = ((1:50)/25)-1;
zip = zeros(size(n));
% angstps = number of angular steps: 6 or 7 in these examples.
colorval = ['y','r','b','g','m','c','y'];

figure
hold on
for b=1:angstps
    x = circx*av_circ_sz(b)+av_fba(b);
    y = circy*av_circ_sz(b)+av_lra(b);
    plot(x,y,colorval(b)),axis([-1.1 1.1 -1.1 1.1])
    x = circx*av_circ_sz_r(b)+av_fba_r(b);
    y = circy*av_circ_sz_r(b)+av_lra_r(b);
    plot(x,y,colorval(b)),axis([-1.1 1.1 -1.1 1.1])
    plot(n,zip,'w') % mark the center
    plot(zip,n,'w') % mark the center
    plot(n,n,'w') % mark the front
    plot(n,-n,'w')
    plot(-n,n,'w')
    plot(-n,-n,'w')
end
xlabel('front_back')
ylabel('left_right')
title('front positions')

Sound redirection

• Once a sound event is detected, the processor must adjust its outputs so the sound is perceived as coming from the correct direction.

– This process is not trivial. We can detect the electrical direction of a sound. We need to know its perceived direction.

– Two channel panning and three channel panning can be perceived quite differently.

• So we need to study sound panning, both to understand how to detect sound directions in a recording, and how to reproduce them.

Sine-Cosine drawing

If the sine-cosine pan law is accurate, and we set p=22.5 degrees, we should hear a sound image half-way between center and left.

What actually happens:

The actual position of a speaking (or singing) voice when p = 22.5 is closer to the left speaker – where we would expect p=15 to be.

Sine Law, Tangent Law

• If g1 = the gain of the left channel, and g2 = the gain of the right channel, then:

• Tangent law: Apparent position = arctan(tan(45)*(g1-g2)./(g1+g2));

The tangent law is equivalent to the sine/cosine law as described in the previous slide.

• Sine law: Apparent position = arcsin(sin(45)*(g1-g2)./(g1+g2));

The sine law predicts image positions that are even further from the positions actually heard (with speech) than those predicted by the sine/cosine law.

From: V. Pulkki and M. Karjalainen, “Localization of Amplitude-Panned Virtual Sources, Part 1: Stereophonic Panning,” J. Audio Eng. Soc., vol. 49, pp. 739-752 (2001).
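The two laws can be compared numerically. This sketch assumes a sine/cosine pan with left gain g1 = cos(p) and right gain g2 = sin(p), p in degrees, so that p = 0 is full left and p = 45 is center; that gain convention is an assumption carried over from the three channel pan pot described later in these slides.

```matlab
% Sketch: tangent and sine law predictions for a sine/cosine pan.
% Assumes g1 = cos(p), g2 = sin(p), with p in degrees.
p  = [0 15 22.5 45];
g1 = cosd(p);  g2 = sind(p);
r  = (g1 - g2)./(g1 + g2);

tangent_pos = atand(tand(45)*r);   % works out to 45 - p for these gains
sine_pos    = asind(sind(45)*r);   % smaller angles than the tangent law
disp([p; tangent_pos; sine_pos]);
```

Because (cos p - sin p)/(cos p + sin p) = tan(45 - p), the tangent law returns exactly 45 - p, which is the sense in which it is equivalent to the sine/cosine law. At p = 22.5 the sine law gives about 17 degrees, a narrower image than either the tangent law or the position actually heard.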

Pan law accuracy matters!

• With two channel recordings pan-pots can be adjusted by ear, and panning errors are no big deal in practice.

• But say we want to make a surround recording with a 3 channel front image:

– We would like to duplicate the mix in two channels using the same panner settings.

– If our pan-laws were accurate, we could do this easily.

Conversion of two channel to multichannel

• We also want to automatically upmix a two channel recording to a three channel, five channel, or seven channel recording.

• To do this we must determine the true apparent direction of a sound source in the two channel recording, and then duplicate this position in the multichannel recording.

– It is not fair to cheat by ignoring the center channel!!!

• To do this we need ACCURATE pan laws.

Panning conversion

• Our standard decoder is a two channel to seven channel upmixer.

– It includes a sound event detector which separates the incoming sound into separate events, and determines the intended direction of each event.

– The output matrix is then adjusted to place that sound event as closely as possible to the intended position

• The output pans use only the two loudspeakers that are closest to the intended position.

• We fully use the center channel (in film mode.)

• As the sound event detector improved we noticed that the perceived front image was consistently narrower than the two channel original.

– Either the three channel front pan-law or the two channel pan law (the sine/cosine law) had to be inaccurate.

– So… time to learn something new.
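The idea of panning only between the two loudspeakers closest to the intended position can be sketched as follows. This is not the decoder's actual output matrix: the five-speaker layout, the sine/cosine law used within each pair, and all variable names are assumptions for illustration, and the target is assumed to lie within the range of the listed speaker angles.

```matlab
% Sketch: pan a source between the two loudspeakers nearest to it.
% The layout and the sine/cosine pair law are assumptions.
spk_ang = [-120 -45 0 45 120];          % degrees, positive to the left
target  = 22.5;                         % intended direction (hypothetical)

% first speaker at or beyond the target, never the first in the list
hi = max(find(spk_ang >= target, 1, 'first'), 2);
lo = hi - 1;                            % its neighbour on the other side
frac = (target - spk_ang(lo)) / (spk_ang(hi) - spk_ang(lo));

gains = zeros(size(spk_ang));
gains(lo) = cosd(90*frac);              % constant-power sine/cosine pair
gains(hi) = sind(90*frac);
disp(gains);
```

For a target of 22.5 degrees only the center and front left speakers receive signal, at equal level, and the total power stays at one.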

The three channel pan law

• We can test three channel pans by using a sine/cosine pan pot to sweep between the center channel and left or right.

• Left_output = cos(p)*input

• Center_output = sin(p)*input

• This time as p varies from 0 to 90 degrees the sound image will pan from left to center. When p = 45 degrees, we might expect the perceived image to be half-way between center and left.

• This is exactly what we find.

• So the panning error is in the two-channel pan law.
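A minimal sketch of the center-to-left test pan, with p in degrees and a unit input assumed. It only evaluates the two gains and confirms that the pan is constant-power; the variable names are illustrative.

```matlab
% Sketch: sine/cosine pan between the left and center speakers.
% Unit input and the variable names are assumptions.
p = 0:15:90;                      % pan setting in degrees
left_out   = cosd(p);             % p = 0 is full left
center_out = sind(p);             % p = 90 is full center
total_power  = left_out.^2 + center_out.^2;
mid_level_db = 20*log10(cosd(45));   % level of each speaker at p = 45
disp([p; left_out; center_out]);
```

At p = 45 both speakers are about 3 dB below full level, and the summed power is the same at every setting.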

The physics of two-channel panning

The pressure at each ear is the sum of the direct sound pressure from one speaker and the diffracted sound pressure from the other.

These two signals interfere with each other, producing a highly frequency dependent signal.
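A very simplified model makes the frequency dependence visible. In this sketch each ear receives the near speaker directly plus the far speaker delayed and attenuated by diffraction around the head. The delay tau and the attenuation a are placeholder values chosen for illustration, not measured head data, and real diffraction is itself frequency dependent.

```matlab
% Sketch: interference of direct and diffracted sound at each ear.
% tau and a are placeholder values, not measured head data.
tau = 0.25e-3;                  % assumed extra delay of the far speaker, seconds
a   = 0.7;                      % assumed diffraction attenuation (flat with frequency)
p   = 22.5;                     % sine/cosine pan setting, degrees
gl  = cosd(p);  gr = sind(p);   % left and right speaker gains

f = 100:100:8000;               % frequency in Hz
left_ear  = gl + a*gr*exp(-1i*2*pi*f*tau);   % direct left + diffracted right
right_ear = gr + a*gl*exp(-1i*2*pi*f*tau);   % direct right + diffracted left
iid_db = 20*log10(abs(left_ear)./abs(right_ear));
fprintf('level difference ranges from %.1f to %.1f dB\n', min(iid_db), max(iid_db));
```

Even with a fixed pan setting, the level difference between the ears in this toy model swings by several dB across frequency, which is the kind of behavior the slide attributes to interference.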

Consequences of panning physics

• A two channel pan is entirely different from the localization process in natural hearing.

– Localization always depends on the interaural time delay (ITD) and the interaural intensity difference (IID).

– In natural hearing the ITD and IID vary due to head shadowing alone.

• Between 500Hz and 1500Hz the ear relies increasingly on IID rather than ITD, and the precise phase of the ear signals becomes inaudible.

– In a two channel pan, ITD and IID vary due to INTERFERENCE.

– The interference is entirely PHYSICAL. It happens at all frequencies, even HF.

• Thus the phase relationship of the signals remains important at all frequencies, including HF.

The frequency transmission of the pinnae and middle ear

From: B. C. J. Moore, B. R. Glasberg and T. Baer, “A model for the prediction of thresholds, loudness and partial loudness,” J. Audio Eng. Soc., vol. 45, pp. 224-240 (1997).

The intensity of nerve firings is concentrated in the frequency range of human speech signals, about 700Hz to 4kHz. With a broad-band source, these frequencies will dominate the apparent direction.

Past History

• We discovered that the apparent position of a sound source is highly influenced by its expected position and its past history.

– Expectation of azimuth and elevation is particularly important in sound recording.

– Localization in a natural environment is almost always dominated by the visual field, or a memory of a visual field.

• In panning experiments we can alter the bandwidth or the band frequency of a known source type, like speech. This change of frequency mix will drastically alter the ITD and IID.

– And yet a source which appears to be located at a particular position with one mix of frequencies will remain in that position when the frequency mix is changed.

– This is because the brain expects the sound to remain in the same place.

• Alternating the presentation from left to right by switching the speaker channels breaks this hysteresis.

– Thus the subject is asked to estimate the width between sound images which alternate left and right, rather than the position of images presented consistently on one side or the other.

Apparent width of broadband sources

• Broadband sources are consistently perceived as narrow in width.

– But when we analyze them in critical frequency bands we find their apparent position varies over a wide angle.

– The brain must make sense of these conflicting directional cues.

• The neurological process of separating sound into streams assigns a “best guess” position to the entire stream, rather than separating the perception into several streams in different directions.

– Since the brain expects most sources to have a narrow spatial extent, the “best guess” position is applied to all the conflicting frequency bands, and the source is perceived as sharply localized.

• Once the brain has assigned a direction to a source stream, it is quite reluctant to change it.

Pan law tests with third octave filtered speech

• Start with a broadband speech segment that alternates between half-left and half-right using a sine/cosine pan (p = 22.5 and p=67.5 degrees)

• Ask a subject to tell you the angular width between the two apparent positions.

• Now filter the speech into third-octave bands, and ask the same question for each band.
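For reference, third-octave band centers and edges can be generated as in the sketch below. It uses base-2 bands centered on 1 kHz, and the range from 125 Hz to 8 kHz is an assumption; the slides do not state which bands were used. The sketch only computes the band frequencies and does not perform any filtering.

```matlab
% Sketch: base-2 third-octave band centers and edges around 1 kHz.
% The band range (125 Hz to 8 kHz) is an assumption.
k  = -9:9;
fc = 1000 * 2.^(k/3);          % center frequencies in Hz
f_lo = fc * 2^(-1/6);          % lower band edges
f_hi = fc * 2^(1/6);           % upper band edges
disp(round([f_lo; fc; f_hi]'));
```

Adjacent bands share an edge, so the set covers the range without gaps or overlap.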

Results at High Frequencies (ILD + ITD)

Apparent position of 1/3 octave filtered speech as a function of frequency.

Note the sine-cosine law is accurate below 600Hz, but the position of the broadband source is strongly pulled away from the center by the increased angle of frequencies above 1kHz.

Conclusions about panning

• Images from broadband speech and music sound sources are consistently perceived as wider than would be predicted by a sine/cosine pan law.

• The discrepancy can be explained by the dominance of frequencies between 700Hz and 4kHz in human hearing.

• The hearing mechanism appears to simply weight the apparent position of each frequency band by the intensity of nerve firings in that band when assigning an azimuth to a particular sound stream.

• The expected position of a sound stream, and the past history of its position, will have a strong (and usually dominant) influence on perception.
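The weighting idea in the third conclusion can be written as a simple weighted average. Every number in this sketch is a made-up placeholder chosen only to show the arithmetic; none of it is measured data, and the variable names are hypothetical.

```matlab
% Sketch: intensity-weighted average of per-band apparent positions.
% All numbers are made-up placeholders, not measured data.
band_hz  = [250 500 1000 2000 4000];
band_ang = [22 23 28 33 31];        % hypothetical apparent angle per band, degrees
weight   = [0.5 0.8 1.0 1.0 0.9];   % hypothetical relative nerve-firing intensity

stream_ang = sum(weight.*band_ang)/sum(weight);
fprintf('weighted stream azimuth: %.1f degrees\n', stream_ang);
```

If the bands between 700Hz and 4kHz carry the largest weights, the single azimuth assigned to the stream is pulled toward the wider angles found in those bands.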

Listening tests

Conclusions

• Designing two channel to multichannel decoders can become a bit of a consuming passion.

– The results can be surprisingly good.

• The tools shown in this workshop are applicable to a decoder with any number of output channels.

– It is possible to decode the front into five channels instead of three, with a substantial improvement in localization accuracy outside the sweet spot.

• The essential features of a good decoder include careful attention to the center channel and to the decorrelation and frequency contouring of the rear channels.

– This same attention can and should be paid when a discrete mix is made.

• It is possible (but sometimes difficult) to make a discrete mix that is clearly superior to an automatic two-to-five conversion.

– Comparing the original discrete mix to the same mix after downmixing and upmixing can be quite revealing, as the down-mixed and up-mixed mix is often superior to the original.

– Analysis of the reasons for this observation can lead to better mixing technique.
