
HIGH DYNAMIC RANGE SIMULTANEOUS SIGNAL COMPOSITING, APPLIED TO AUDIO

Ryan Janzen and Steve Mann

University of Toronto

ABSTRACT

High Dynamic Range (HDR) compositing is well established in the field of image processing, where a sequence of differently-exposed images of the same scene is combined to overcome the limited dynamic range of ordinary cameras.

We extend this technique to audio. Rather than acquiring samples separated by time or space, as is done in HDR image processing, we propose to perform simultaneous sampling of the same input signal, using differently-gained versions of the same HDR signal fed into separate analog to digital converters (ADCs). An HDR audio signal is thus sampled by merging a set of low dynamic range (LDR) samplings of the original HDR input signal. We optimize the choice of LDR input gains to achieve as high a dynamic range as possible for a desired sampling accuracy.

Index Terms— high dynamic range compositing, simultaneous HDR compositing, composited dynamic range (CDR), HDR audio, CDR audio

1. INTRODUCTION AND PRIOR ART

In digital photography, sequentially capturing and combining different images of the same scene is an established area of research [1, 2, 3, 4, 5, 6, 7].

Cameras have a limited dynamic range, but it is possible to capture a high dynamic range (HDR) scene in an HDR image, by combining a series of low dynamic range (LDR) images, each with different exposures [1, 2]. An overexposed image is saturated in bright regions of the scene, but captures dark areas well. On the other hand, an underexposed image has its response cut off at (or near) 0 in dark regions, but captures bright areas well. By properly merging a series of differently exposed images, it is possible to capture an HDR scene that cannot be captured accurately by one exposure alone.

We have also applied HDR compositing to RADAR imaging, to distinguish strong reflections (such as from large ships) from weak reflections (e.g. from a small iceberg fragment), when both reflections are received simultaneously. Under these circumstances, other existing methods of handling dynamic range, such as STC (Sensitivity Time Control [8]), which adjusts receiver input gain over time, were ineffective.

[Figure 1 diagram: a real-world HDR phenomenon is captured by a sensor; LDR exposures are then combined by HDR compositing, either sequentially (time-separated exposures) or simultaneously.]
Fig. 1: “Sequential HDR” vs. proposed “Simultaneous HDR”

2. SIMULTANEOUS HDR AUDIO COMPOSITING

HDR photography and video typically use time-separated exposures (subject to ghosting problems when the subject matter is in motion [7]) or spatially separated sensors (e.g. multiple CCD arrays with beamsplitters). However, we propose parallel simultaneous samplings of an input signal from a single acoustic sensor.

To the best of our knowledge, this may be the first publication where HDR compositing is applied to audio. We focus on extreme dynamic ranges which are beyond the sampling capability of a given analog to digital converter (ADC).

Acoustic HDR compositing may have applications in biomedical pulsed ultrasound [9], and in research on water-hammer effects [10, 11], where a very strong acoustic impulse occurs periodically and very weak sounds need to be sensed as well. Another application is capturing sound from a wearable microphone adjacent to a person’s mouth, where we wish to also capture more distant (i.e. quiet) voices or ambient sounds in the room, including while the wearer is speaking: a situation that previously known methods such as AGC (Automatic Gain Control) cannot handle.

Due to the limited dynamic range of conventional audio recorders, there is an unfortunate common need to adjust the gain of a recorder (either manually or by AGC, in one or more stages) depending on the sound level being recorded [12].

Instead, it would be far superior if one could simply press “record”, without any saturation or SQNR problems, over a wide dynamic range. Furthermore, we believe that audio mixing boards [13] should be HDR-capable, so that only one gain control is required per channel, rather than a separate gain control at every stage of amplifier for each channel.



[Figure 2 block diagram: the audio input passes through electronic preprocessing (gains g1, g2, g3, with the signal clipped prior to each ADC), producing quiet, medium and loud exposures; each is quantized and rescaled, weighted by its certainty function, and the results are merged by HDR compositing into the HDR output. The example illustrates a 3-bit ADC with 8 levels of quantization (2’s complement signed numbers).]
Fig. 2: HDR input compositing system: illustrated for the simple example of 3-bit A-to-D conversion.

For conventional or extreme dynamic range inputs, we would wish to have no distortion other than at the final output stage, so the operator does not need to constantly worry about saturation (gain too high) and quantization (gain too low) for each of the separate stages of processing.

HDR audio compositing could serve as a low cost input solution using existing input sampling hardware, with simple gain circuitry added. This method captures a high dynamic range (HDR) audio signal using a set of simultaneous conventional low dynamic range (LDR) samplings of the same signal, gained differently for each ADC. Fig. 2 shows the conceptual structure of the system.
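To make the structure of Fig. 2 concrete, the following is a minimal simulation sketch (Python; not from the paper, and the gains, bit depth and test signal are illustrative assumptions) of acquiring differently-gained LDR exposures of one input: gain, clip to the ADC range, quantize, and rescale back to a common scale.

    import numpy as np

    def ldr_exposure(x, gain, n_bits=3, v_max=1.0):
        # One LDR "exposure": gain the signal, clip prior to the ADC, quantize,
        # then divide the gain back out so all exposures share one amplitude scale.
        step = 2.0 * v_max / (2 ** n_bits)             # quantization step of the ADC
        y = np.clip(gain * x, -v_max, v_max - step)    # analog clipping before the ADC
        q = np.round(y / step) * step                  # uniform quantization
        return q / gain

    # Three simultaneous exposures of a 60 dB crescendo, gains in a geometric sequence
    t = np.linspace(0.0, 1.0, 40000)
    x = 0.001 * 10 ** (3.0 * t) * np.sin(2 * np.pi * 200 * t)
    exposures = [ldr_exposure(x, g) for g in (1.0, 8.0, 64.0)]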

[Figure 3 plot: three differently-gained exposures (gains g1, g2, g3) of the same crescendo, annotated with the useable dynamic range, the extended dynamic range with reduced SQNR, the gain factor g_∆, the safety margin α_SM, and the points where exposures 1, 2 and 3 saturate; here M = 3 and g_∆ = 8.]
Fig. 3: Wyckoff set consisting of three exposures of the same input signal: a sinusoid which crescendos over a 60 dB range. This simple example illustrates a geometric series of gains, causing the exposures to be evenly spaced (logarithmically) across a chosen dynamic range.

3. AUDIO “EXPOSURES”, CERTAINTY FUNCTIONS

Analogously to image exposures in photography, we combine a Wyckoff set [2, 14] of several LDR audio exposures, each with a different gain before the ADC. See Fig. 3.

We use sensors and ADCs which are ordinarily used for LDR audio sampling, and thus are required to be linear (but quantized) up to a maximum measurable quantity, after which the response may become nonlinear. (Otherwise, stray frequencies would be added to the signal during normal LDR sampling.) Thus, we bypass the comparametric analysis [4] that is often needed when compositing HDR images.

Each captured exposure signal is most useful near, but just short of, the ADC’s maximum. The signal thus overpowers quantization noise as much as possible without saturating. We quantify this in a certainty function (similar to a certainty function in HDR imaging [1]), which is used to weight each exposure when combining samples. We design a certainty function as follows: (1) for ADC output level 0, the certainty function is small but nonzero (ε), to recognize that quantization error overpowers the signal, and the only information gained is the fact that the signal is between ADC levels ±1; (2) affine increase starting from ADC level 0, representing improving SQNR where the signal overpowers the quantization noise; (3) ramp-down near ±ADC maximum, to prevent sudden switchover to another exposure and thus prevent sudden glitches in the case of imperfect gain or bias calibration; (4) at ±ADC maximum, certainty drops to zero to reject saturated signals (no information can be gained as to how large the signal is beyond the ADC limits).
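As one concrete realization of the four criteria above (a sketch, not the paper’s exact function; the breakpoint ramp_down and the piecewise-linear shape are assumptions):

    import numpy as np

    def certainty(y, y_max=1.0, eps=0.02, ramp_down=0.9):
        # Certainty of one exposure, given its ADC-domain samples y:
        # small but nonzero (eps) at level 0, affine increase with |y|,
        # ramp-down near full scale, and zero at |y| = y_max (saturation).
        a = np.abs(y) / y_max
        c = np.where(a < ramp_down,
                     eps + (1.0 - eps) * a / ramp_down,
                     (1.0 - a) / (1.0 - ramp_down))
        return np.clip(c, 0.0, 1.0)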

Unlike HDR imaging, the result of applying the certainty function is now a time-varying certainty signal. Fig. 2 shows the process of combining LDR exposures, using a separate time-varying certainty signal for each exposure. Later, we will see (in Fig. 7) the total certainty fluctuating, depending on the total information available from each exposure.
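A sketch of the certainty-weighted merge itself (again an assumed implementation, reusing the hypothetical ldr_exposure() and certainty() from the earlier sketches; the exposures are assumed to have been rescaled to a common amplitude scale):

    def composite(exposures, certainties, eps=1e-12):
        # Certainty-weighted average of rescaled LDR exposures -> one HDR signal.
        num = sum(c * x for c, x in zip(certainties, exposures))
        den = sum(certainties) + eps      # time-varying total certainty
        return num / den

    # Each certainty is evaluated on ADC-domain samples (exposure times its gain).
    gains = (1.0, 8.0, 64.0)
    certs = [certainty(g * e) for g, e in zip(gains, exposures)]
    hdr = composite(exposures, certs)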


4. OCCUPYING A DYNAMIC RANGE: RANGE DENSITY FUNCTIONS

We now examine HDR time-varying signals more closely. Let us say “quantity” q when we refer to the instantaneous value (voltage, current, etc.) of a signal x(t). Then, “amplitude” A(t) strictly refers to the envelope (peak or RMS) of the quantities taken on by x(t) in one cycle of a waveform of period T (e.g. values of x(τ) in the interval τ ∈ [t − T/2, t + T/2]). Thirdly, “dynamic range” refers to the ratio between the maximum amplitude and minimum amplitude of the signal (which occur at different times).

We propose a Range Density Function (RDF) which represents the proportion of the entire signal which occupies each given quantity in the range of the signal. This RDF can be approximated by taking a histogram of the signal quantity (as time progresses), and can be approached in the limit with asymptotically increasing histogram resolution. Each bin of a histogram (between quantities a and b) is related to the RDF as follows:

\int_a^b R_x(q)\,dq = \mathrm{Histogram}\left[a \le x(t) \le b\right] = \frac{1}{t_2 - t_1}\int_{t_1}^{t_2} u(x(\tau) - a)\; u(b - x(\tau))\; d\tau   (1)

where u(·) is the Heaviside step function, and the histogram is found by observing x(t) between times t_1 and t_2.
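As a numerical illustration of Eqn. 1 (a sketch; the bin count and test signal are arbitrary choices, not the paper’s):

    import numpy as np

    def rdf_histogram(x, bins=200):
        # Estimate the RDF of sampled x(t): the fraction of time spent in each
        # quantity bin, normalized so the density integrates to 1 over q.
        counts, edges = np.histogram(x, bins=bins)
        widths = np.diff(edges)
        density = counts / (counts.sum() * widths)
        centers = 0.5 * (edges[:-1] + edges[1:])
        return centers, density

    # For a sinusoid, the estimate approaches 1/(pi*sqrt(1 - q^2)) as bins increase.
    t = np.linspace(0.0, 1.0, 200000, endpoint=False)
    q, R = rdf_histogram(np.sin(2 * np.pi * 50 * t))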

The RDF of a function is a deterministic analogue to the Probability Density Function (PDF) of a statistical random variable, and should obey the properties R_x(q) ≥ 0 and \int_{-\infty}^{\infty} R_x(q)\,dq = 1. For a periodic waveform (with period T), we can determine the RDF directly:

R_x(q) = \frac{1}{T} \sum_{\{t \,|\, x(t) = q\}} \frac{1}{|x'(t)|}   (2)

for any values of q at which x(t) is differentiable and x'(t) is nonzero¹. If x(t) holds steady at quantity q⋆, a delta measure occurs at R_x(q⋆). For example, this occurs twice for a square wave: R_{SQUARE}(q) = \frac{1}{2}\delta(q-1) + \frac{1}{2}\delta(q+1). A sine wave leads to two asymptotes: R_{SINE}(q) = \frac{1}{\pi}\,\frac{1}{\sqrt{1-q^2}}, for q ∈ [−1, 1]. See Fig. 4.
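As a worked check of Eqn. 2 (added here for illustration; not in the original text): for x(t) = \sin(2\pi t/T) there are two times per period with x(t) = q, each with |x'(t)| = \frac{2\pi}{T}\sqrt{1-q^2}, so

R_{SINE}(q) = \frac{1}{T}\cdot 2\cdot \frac{T}{2\pi\sqrt{1-q^2}} = \frac{1}{\pi\sqrt{1-q^2}},

recovering the expression above.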

In the frequency domain, the most pure waveform is the complex exponential, and a uniform distribution is created by a sinc function. In the amplitude domain, the most pure waveform is the square wave, and a uniform distribution is created by a sawtooth wave.² (See Fig. 4.)

¹This calculation is analogous to finding the PDF of a function of a random variable having a uniform distribution over the phases [0, 2π].
²As well, an inverse sawtooth or triangular pattern would also suffice.
³This is a multidimensional optimization problem, for future work.

[Figure 4 panels: x(t) = sinusoid with its RDF R_x(q); y(t) = sawtooth with its RDF R_y(q); z(t) = exponential crescendo with its RDF R_z(q) and, on a log-amplitude axis, its DRDF R_A(log a) over log(A_z(t)).]
Fig. 4: RDF (range density function). Note that RDF is an observed distribution, which varies for each observation or realization of a random process, unlike the PDF (probability density function), which is a statistical distribution. In the third example, even though the RDF at each given amplitude is still that of a sinusoid, a different overall RDF is created when the sinusoid is modulated with an exponential crescendo. This pattern gives a uniform DRDF (when measured logarithmically).

We further propose a Dynamic Range Density Function (DRDF) which represents the proportion of the entire signal which occupies each given amplitude in the dynamic range of the signal. The signals in Figs. 3 (top plot) and 4 (bottom plot) have a uniform DRDF in a logarithmic sense, due to the ramp from minimum to maximum amplitude. This straight line (when viewed on a log plot) represents an exponential crescendo in amplitude.
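By analogy with the RDF sketch above, the DRDF can be estimated from the signal’s amplitude envelope; the sliding-RMS envelope and window length below are assumptions, not specified by the paper:

    import numpy as np

    def drdf_histogram(x, period_samples, bins=100):
        # Estimate the DRDF: the distribution of the signal's log-amplitude
        # envelope, using a sliding RMS over roughly one waveform period.
        kernel = np.ones(period_samples) / period_samples
        env = np.sqrt(np.convolve(x ** 2, kernel, mode="valid"))
        counts, edges = np.histogram(np.log10(env), bins=bins)
        widths = np.diff(edges)
        density = counts / (counts.sum() * widths)
        return 0.5 * (edges[:-1] + edges[1:]), density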

5. CHOOSING GAIN SPACING OF M EXPOSURES

We would like to carefully choose the relationship between the M exposures that will be combined together to sample an HDR signal. We examine the case where the signal is detected by a single sensing element, and each exposure is a gained version of this sensed signal. We thus want to carefully choose the spacing between the gains g_m of each exposure (“exposure packing”).

To make HDR sampling robust for a variety of situations, we space the exposures in a geometric sequence to equally cover all amplitudes available within the given dynamic range:

g_m = g_\Delta^{\,m-1}\, g_1 \quad \text{for } m = 1, \ldots, M   (3)

where g_∆ is the gain factor separating each exposure.
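For example (a sketch with illustrative values; g_1, g_∆ and M are not prescribed here), Eqn. 3 gives:

    # Geometric gain sequence (Eqn. 3): g_m = g_delta**(m - 1) * g_1
    g_1, g_delta, M = 0.25, 8.0, 4
    gains = [g_delta ** (m - 1) * g_1 for m in range(1, M + 1)]   # [0.25, 2.0, 16.0, 128.0]
    D_EP = g_delta ** M    # exposure-packing dynamic range, (g_M / g_1) * g_delta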

If the input signal’s dynamic range is not occupied uniformly (biased DRDF or RDF), then an uneven spacing of the exposures may be optimal.³ However, we attempt here to use a stringent set of requirements to create a generalized HDR system with demands on the full dynamic range (as represented by uniform DRDF and RDF).

We position the first exposure to capture the strongest expected input signal A_{0,MAX} entirely within the sampling range A_{ADC,MAX}, less a safety margin α_SM:

g_1 \cdot A_{0,\mathrm{MAX}} < A_{\mathrm{ADC,MAX}}\,(1 - \alpha_{\mathrm{SM}})   (4)

Define an “exposure packing” dynamic range, D_EP, as the dynamic range spanned by the M exposures, from the minimum amplitude desired to be sensed by the most-gained exposure, to the maximum amplitude sensed by the least-gained exposure:

D_EP = HDR exposure packing dynamic range = \frac{g_M}{g_1}\, g_\Delta = g_\Delta^M
D_0 = dynamic range of original signal = \frac{A_{0,\mathrm{MAX}}}{A_{0,\mathrm{MIN}}}
Q_ADC = quantization ratio = \frac{\text{maximum quantity limit of ADC}}{\text{quantization step of ADC}}
g_∆ = gain factor, creating a geometric sequence of gains
M = number of exposures in the Wyckoff set

We hypothesize that the exposure packing may be best chosen with relation to some function of the quantization noise, the HDR signal’s dynamic range, and the number of exposures:

g_\Delta = \sqrt[M]{D_{\mathrm{EP}}} = f(Q_{\mathrm{ADC}}, D_0, M)   (5)
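A small numeric sketch of these definitions and Eqn. 4 (the 16-bit ADC and the signal levels are assumptions; Q_ADC is taken as the full-scale amplitude divided by one quantization step):

    # Quantization ratio of a signed 16-bit ADC: Q_ADC = full-scale limit / one step
    n_bits = 16
    Q_ADC = 2 ** (n_bits - 1)                      # 32768
    # Eqn. 4: upper bound on the first-exposure gain, with safety margin alpha_SM
    A_0_max, A_ADC_max, alpha_SM = 10.0, 1.0, 0.1
    g_1_bound = A_ADC_max * (1.0 - alpha_SM) / A_0_max   # choose g_1 below this value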

Initially, we can devise some rough, approximate constraints. These will be compared against test results. To contain D_0 within successive exposures, and to maintain precision greater than one quantization step when graduating from one exposure to the next, we must have:

D_0 < \left(D_{\mathrm{EP}} = g_\Delta^M\right) < Q_{\mathrm{ADC}}^M   (6)

This constraint defines a triangular region of the D_0 vs. D_EP plane, as drawn in Fig. 5. Again, this rough guideline is only a hypothesis. Interestingly, it is indeed observed/confirmed by our tests (e.g. Fig. 8). We created an automated routine for rebuilding and testing the exposures while varying their packing (Figs. 6 and 8). Even though it is desirable to increase D_0 as much as possible, Fig. 8 reveals a tradeoff against reconstruction error. Therefore, we may need to retreat back from the D_EP = Q_ADC^M edge of the triangle by an exposure overlap factor α_EO ≡ log(Q_ADC)/log(g_∆) − 1, as desired in this tradeoff. Therefore, we choose a number of exposures:

M \ge \frac{\log D_0}{\log Q_{\mathrm{ADC}}}\,(\alpha_{\mathrm{EO}} + 1)   (7)

and space them out evenly within a dynamic range less than that available from Q_ADC, but more than the dynamic range of the original signal. An equal compromise would be:

g_\Delta = \sqrt[M]{D_{\mathrm{EP}}} = 10^{\frac{\log_{10} D_0 \,+\, M \log_{10} Q_{\mathrm{ADC}}}{2M}}   (8)

which has D_EP positioned between the bounds in Eqn. 6 (at their geometric mean). Alternatively, Fig. 6 demonstrates that g_∆ can be optimized computationally for a specific RDF and DRDF.
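Putting Eqns. 7 and 8 together as a sketch (the 200 dB target and 16-bit ADCs echo the scenario of Fig. 6; the overlap factor here is an assumed value):

    import math

    D_0 = 10 ** (200 / 20)      # 200 dB input dynamic range, as an amplitude ratio
    Q_ADC = 2 ** 15             # quantization ratio of a signed 16-bit ADC
    alpha_EO = 0.5              # chosen exposure overlap factor

    # Eqn. 7: minimum number of exposures (here M = 4)
    M = math.ceil(math.log(D_0) / math.log(Q_ADC) * (alpha_EO + 1))

    # Eqn. 8: gain factor placing D_EP at the geometric mean of D_0 and Q_ADC**M
    g_delta = 10 ** ((math.log10(D_0) + M * math.log10(Q_ADC)) / (2 * M))
    D_EP = g_delta ** M         # satisfies D_0 < D_EP < Q_ADC**M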

To conclude: By increasing the number of exposures beyond the bare minimum, we can spread and overlap the exposures to increase the coverage of quantities near (but not quite saturating at) the extrema of each exposure, thus increasing the compositing certainty.

[Figure 5: log–log sketch of D_EP (HDR exposure packing dynamic range) versus D_0 (dynamic range of original signal), showing the gains g1…g4, Q_ADC, the hypothesized region for safe HDR, and one possible choice of (D_EP, D_0).]
Fig. 5: The proposed constraints on the input signal and the exposure packing form a triangular region. In this example, M = 4 exposures. In this visualization, D_0 and D_EP are treated as changeable quantities, and the sequence of gains stretch or contract in tandem along the axes. Contracting the gain sequence allows the exposures to overlap. We are working here with gains in a geometric sequence, as was explained in the text. (Note that, in this visualization, one should be wary of imagining a crescendo input signal along these axes, as was done earlier, since here (a) the axes represent different possible dynamic ranges, and (b) the gains are arranged such that the weakest signal is to be sampled by g_4 ∝ g_∆^3.) Returning now to the practicalities of choosing the exposure spacing: alternatively, instead of varying (D_EP, D_0), we can take them as given, and instead choose the hardware, i.e. M and Q_ADC, to allow the (D_EP, D_0) point to fall comfortably within the bounds of the triangle.

Fig. 6: Sampling a 200 dB dynamic range signal using four 16-bit ADCs: Optimization of the gain separation between the conditioned signals fed into each ADC input. Each ADC on its own cannot accurately capture the input signal across its entire dynamic range; the mean normalized errors from each ADC’s exposure are included for comparison. The data were produced by a computationally-generated exponential crescendo input followed by HDR compositing. The gains were varied according to Eqns. 3 and 4. This graph will end up being a slice of a later graph (Fig. 8), since D_EP = g_∆^M.


[Figure 7 panels: the input signal; sampling gives an imperfect representation; the recovered HDR signal; reduced error from composited LDR exposures.]
Fig. 7: The proposed algorithm, after sampling an HDR computational test signal with M = 4 exposures, compositing the result. This simple example, for clarity, has the sampling encumbered by very coarse quantization steps of 0.05 (visible on the 2nd plot, upon close inspection). Under these harsh conditions, the quantization error from one of the exposures is compared to the reduced error after reconstruction. We show the reconstructed output error signal (pink), along with the error of one of the exposures (blue), for clarity. Even when subjected to severe quantization in this test, the reduction in error is visible, as the algorithm consolidates information from all four exposures.

6. EVALUATING THE SYSTEM IN OPERATION

We evaluated the proposed system first using a low-cost PC stereo sound card (M = 2). The system was able to sample a 101 dB dynamic range test signal⁴ (which is beyond what the device can capture on its own), producing a 21 dB reduction in THD, as compared to feeding the one signal into one input, as would otherwise conventionally be done.

This initial prototype successfully proved the ability to expand the dynamic range of a mediocre audio capture device, without the time-varying artifacts associated with AGC. The audio capture device was a Realtek ALC880 (16-bit resolution, 44.1 kHz sampling).

The prototype used a simple circuit to gain each exposure, and electronically saturate the highly gained exposures to avoid damaging the ADCs. This circuit was constructed with series-connected fast recovery diodes, with two parallel chains to limit both polarities, thus limiting the signals to approximately ±1.4 V just before entry into the ADCs.

⁴According to Eqn. 7, M = 2 exposures is sufficient for this signal, with 78% exposure overlap. The test signal was generated by a hardware signal generator, traversing the dynamic range in 16 logarithmically-equal steps.

An important consideration was “Dynamage Range”: the ratio between the amplitude leading to damaging the sensor or ADC, and the amplitude of the smallest detectable signal. For example, when we used a hydrophone to listen to water flow in a pipe, with high gain, we needed to ensure that a strong impulse produced by the water-hammer effect would not damage the hydrophone or ADC.

Additional simulations tested operation with M = 4 gained inputs. Fig. 7 shows how our technique operates on a computer-generated input signal. Examining the graph titled “error signals normalized”: the reduced error signal after HDR compositing is plotted in the purple trace, reduced from the blue trace, which illustrates how Exposure 1, alone, was only useful for the largest amplitudes. (This figure shows very coarse quantization for illustration purposes.)

Fig. 8 verifies operation over a wide variation in D_0 and D_EP. A triangle wave crescendo was used as an input, in order to test under full coverage of the dynamic range. That is, the triangular waveform gave a uniform RDF within each period, and the exponential crescendo gave a uniform DRDF (full, uniform amplitude coverage of the dynamic range).
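Such a test signal can be sketched as follows (the sample rate, duration, frequency and 100 dB span are illustrative assumptions, not the paper’s exact parameters):

    import numpy as np

    fs, dur, f0 = 48000, 10.0, 100.0                        # sample rate, length, triangle frequency
    t = np.arange(int(fs * dur)) / fs
    tri = 2.0 * np.abs(2.0 * ((f0 * t) % 1.0) - 1.0) - 1.0  # triangle wave: uniform RDF per period
    env = 10.0 ** (-5.0 + 5.0 * t / dur)                    # exponential crescendo spanning 100 dB
    x = env * tri                                            # approximately uniform DRDF (log scale)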


Fig. 8: Verification of HDR reconstruction, and optimization of synchronized exposures across a space of possible dynamic ranges. The hypothesized constraints are evidenced in the triangular region, visible with reduced error. While we wish to permit as wide an input dynamic range (D_0) as possible, it is clear that there is a tradeoff against reconstruction error. Therefore, as was explained in the text, we can retreat D_0 and D_EP back from the edge of this triangle, or otherwise increase M, as desired to reduce reconstruction error.

7. CONCLUSION

We proposed a new technique to capture a high dynamic range (HDR) signal, by compositing simultaneous, differently-gained low dynamic range (LDR) samplings of it. HDR audio compositing is a low cost solution, using existing sampling hardware, with the addition of simple input gain circuitry.

This work may have applications in RADAR, SONAR, biomedical ultrasound, and high dynamic range audio recording. Extreme dynamic range events, in particular, can be sensed by geophones (for solid vibrations, such as glass and steel breakage), hydrophones (for underwater recordings such as for research on the water-hammer effect), or microphones (for gas environments, e.g. a headset microphone sensing both the user’s voice as well as distant quiet sounds).

8. REFERENCES

[1] S. Mann, “Compositing multiple pictures of the same scene,” in Proc. Imaging Science and Tech. Conf., 1993, pp. 50–52.

[2] S. Mann and R.W. Picard, “Being ‘undigital’ with digital cameras: Extending dynamic range by combining differently exposed pictures,” in Proc. IS&T, May 7–11 1995, pp. 422–428.

[3] P.E. Debevec and J. Malik, “Recovering high dynamic range radiance maps from photog.,” SIGGRAPH, pp. 369–378, 1997.

[4] S. Mann, “Comparametric equations with practical applications in quantigraphic image processing,” IEEE Trans. Image Processing, vol. 9, no. 8, pp. 1389–1406, August 2000.

[5] M.A. Robertson, S. Borman, and R.L. Stevenson, “Estimation-theoretic approach to dynamic range enhancement using mult. exposures,” J. Electronic Img., vol. 12(2), pp. 219–228, 2003.

[6] S.B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High dynamic range video,” ACM Trans. Graphics (Proc. SIGGRAPH 2003), vol. 22(3), pp. 319–325, 2003.

[7] E.A. Khan, O. Akyuz, and E. Reinhard, “Ghost removal in high dynamic range images,” Proc. ICIP, pp. 2005–8, 2006.

[8] H. Meikle, Modern Radar Systems, Artech House, 2008.

[9] K. Nightingale, M.S. Soo, R. Nightingale, and G. Trahey, “Acoustic radiation force impulse imaging: in vivo demonstration of clinical feasibility,” Ultrasound Med. Biol., vol. 28, pp. 227–235, 2002.

[10] S. Mann, R. Janzen, J. Huang, M. Kelly, J. Ba, and A. Chen, “User-interfaces based on the water-hammer effect,” in Proc. TEI, 2011, pp. 1–8.

[11] R. Janzen and S. Mann, “Arrays of water jets as user interfaces: Detection...of flow by listening to turbulence signatures using hydrophones,” in Proc. ACMMM, 2007, pp. 505–8.

[12] J. Eargle, Handbook of Recording Engineering, NY: Springer, 2005.

[13] R. Izhaki, Mixing Audio, Elsevier Science, 2011.

[14] C.W. Wyckoff, “An experimental extended response film,” Tech. Rep. No. B-321, Edgerton, Germeshausen & Grier, Inc., Boston, Massachusetts, March 1961.

