Audio Watermarking Techniques

Hyoung Joong Kim
Department of Control and Instrumentation Engineering
Kangwon National University
Chunchon 200-701, Korea

[email protected]

Abstract

This paper surveys audio watermarking schemes. The state of the art of current watermarking schemes and their implementation techniques is briefly summarized. The schemes are classified into five categories: quantization scheme, spread-spectrum scheme, two-set scheme, replica scheme, and self-marking scheme. Advantages and disadvantages of each scheme are also discussed. In addition, synchronization schemes are surveyed.

1 Introduction

Audio watermarks are special signals embedded into digital audio. These signals are extracted by detection mechanisms and decoded. Audio watermarking schemes rely on the imperfections of the human auditory system. However, the human ear is much more sensitive than the other sensory organs, so good audio watermarking schemes are difficult to design (Kim et al. 2003).

Even though current watermarking techniques are far from perfect, audio watermarking schemes have been applied widely during the last decade, and they have become quite sophisticated in terms of robustness and imperceptibility (Bender et al. 1996) (Cox et al. 2002) (Cox and Miller 2002). Robustness and imperceptibility are both important requirements of watermarking, although the two conflict with each other.

Figure 1: Typical audio watermarking schemes. Blind watermarking covers the quantization (s[k] = Q(x[k] + d)), spread-spectrum (s[k] = x[k] + w[k]), two-set (s[k] = x[k] + d), replica (s[k] = x[k] + x[k − d]), and self-marking schemes; non-blind watermarking forms the other branch.

Non-blind watermarking schemes are theoretically interesting but not very useful in practice, since they require double the storage capacity and double the communication bandwidth for watermark detection. Of course, non-blind schemes may be useful as a copyright verification mechanism in a copyright dispute (and may even be necessary; see (Craver et al. 1998) on inversion attacks). On the other hand, a blind watermarking scheme can detect and extract watermarks without using the unwatermarked audio. Therefore, it requires only half the storage capacity and half the bandwidth compared with a non-blind watermarking scheme. Hence, only blind audio watermarking schemes are considered in this chapter. Needless to say, blind watermarking methods need self-detection mechanisms for detecting watermarks without the unwatermarked audio.

This paper presents basically five audio watermarking schemes (see Figure 1). The first is quantization-based watermarking, which quantizes sample values to distinguish valid sample values from invalid ones. The second is the spread-spectrum method, based on the similarity between the watermarked audio and a pseudo-random sequence. The third is the two-set method, based on differences between two or more sets, which includes the patchwork scheme. The fourth is the replica method, which uses a close copy of the original audio and includes the replica modulation scheme. The last is the self-marking scheme. Of course, many more schemes and their variants are available. For example, time-base modulation (Foote and Adcock 2002) is theoretically interesting; however, it is a non-blind watermarking scheme. An audio watermarking scheme that encodes compressed audio data (Nahrstedt and Qiao 1998) does not embed a real watermarking signal into raw audio. Furthermore, no psychoacoustic model is available in the compressed domain to enable adjustment of the watermark to ensure inaudibility.

Synchronization is important for detecting watermarks, especially when the audio is attacked. Most audio watermarking schemes are position-based, i.e., watermarks are embedded into specific positions and detected from those positions. Thus, a shift in positions caused by an attack makes such detection schemes fail. The main purpose of synchronization schemes is to find the shifted positions. Several synchronization schemes are surveyed in this article. In audio watermarking, the time-scaling or pitch-scaling attack is one of the most difficult attacks to manage. A brief idea for handling these attacks, proposed by (Tachibana et al. 2001), is summarized.

2 Quantization Method

A scalar quantization scheme quantizes a sample value x and assigns a new value to the sample based on the quantized sample value. In other words, the watermarked sample value y is represented as follows:

\[
y = \begin{cases} q(x, D) + D/4 & \text{if } b = 1 \\ q(x, D) - D/4 & \text{otherwise} \end{cases} \qquad (1)
\]

Figure 2: A simple quantization scheme. A sample value x is mapped to the anchor q(x, D) on a grid of step D; the quantized value for bit 1 lies D/4 above the anchor and the quantized value for bit 0 lies D/4 below it.

where q(·) is a quantization function and D is a quantization step. The quantization function q(x, D) is given as follows:

\[ q(x, D) = [x/D] \cdot D, \]

where [x] rounds x to the nearest integer. The concept of the simplest quantization scheme in Equation (1) is illustrated in Figure 2. A sample value x is quantized to q(x, D), the black circle (•), which we call the anchor. If the watermarking bit b is 1, the anchor is moved to the white circle (◦); otherwise, it is moved to the cross (×), which stands for the watermarking bit 0. For example, let D be 8 and x be 81. Then q(81, 8) = 80. If b = 1, then y = 82; otherwise, y = 78. As shown in the figure, the distance between anchors is D.

Detection is the inverse process of embedding. The detection process is summarized as follows:

\[
b = \begin{cases} 1 & \text{if } 0 < y - q(x, D) < D/4 \\ 0 & \text{if } -D/4 < y - q(x, D) < 0 \end{cases}
\]

This scheme is simple to implement and is robust against noise as long as the noise magnitude stays below D/4. In other words, if the additive noise is larger than D/4, the quantized value is perturbed so much that the detector misinterprets the watermarking bit. The robustness can be enhanced if dither modulation (Chen and Wornell 1999) is used. This scheme is formulated as follows:

\[ y_m = q(x + d_m, D) - d_m, \]

Figure 3: A typical embedder of the spread-spectrum watermarking scheme. The message b(n) is modulated by a pseudo-random sequence r(n), shaped by a watermark shaping filter driven by a psycho-acoustic model and power spectrum estimation of the audio signal s(n) (the optional part), scaled to w(n), and added to s(n) to produce the watermarked audio x(n).

where m is an index and d_m is the m-th dither value. For example, let d_1 = 2, d_2 = 0, x = 8, and D = 4. Then y_1 = 10 and y_2 = 8. The detection procedure estimates the distances and detects the watermark index as follows:

\[
m = \begin{cases} 1 & \text{if } e(y, d_1) < e(y, d_2) \\ 2 & \text{if } e(y, d_2) < e(y, d_1) \end{cases} \qquad (2)
\]

where e(y, d_j) = ‖y − q(y + d_j, D) + d_j‖. From Equation (2) it is possible to detect the watermark index. In the above example, e(y_1, d_1) = 0 and e(y_1, d_2) = 2, so it is clear that y_1 is close to d_1. Similarly, y_2 is close to d_2. This procedure can be extended to dither vectors.
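
As a concrete illustration, a minimal NumPy sketch of the basic scheme in Equation (1) and its dithered variant might look as follows. The quantizer rounds halves upward so that the worked examples above (D = 8, x = 81, and d_1 = 2, d_2 = 0, x = 8, D = 4) come out exactly as in the text, and the detector re-derives the anchor from the received value y, since a blind detector has no access to x. The function names and the 0-based dither indices are illustrative choices, not part of the original schemes.

```python
import numpy as np

def q(x, D):
    """Quantizer q(x, D) = [x/D] * D, rounding halves upward to match the text's examples."""
    return np.floor(x / D + 0.5) * D

def qim_embed(x, bit, D=8):
    """Basic scheme of Equation (1): place the sample D/4 above or below the anchor."""
    return q(x, D) + D / 4 if bit == 1 else q(x, D) - D / 4

def qim_detect(y, D=8):
    """Recover the bit from the sign of y - q(y, D) (the anchor is re-derived from y)."""
    return 1 if y - q(y, D) > 0 else 0

def dither_embed(x, m, dithers, D=4):
    """Dither modulation: y_m = q(x + d_m, D) - d_m."""
    return q(x + dithers[m], D) - dithers[m]

def dither_detect(y, dithers, D=4):
    """Pick the index whose dither re-quantizes y with the smallest error e(y, d_m)."""
    errors = [abs(y - (q(y + d, D) - d)) for d in dithers]
    return int(np.argmin(errors))

if __name__ == "__main__":
    # Example from the text: D = 8, x = 81 -> y = 82 (b = 1) or y = 78 (b = 0).
    print(qim_embed(81, 1), qim_embed(81, 0))      # 82.0 78.0
    print(qim_detect(82), qim_detect(78))          # 1 0
    # Dither example from the text: d_1 = 2, d_2 = 0, x = 8, D = 4 -> y_1 = 10, y_2 = 8.
    dithers = [2.0, 0.0]
    y1 = dither_embed(8, 0, dithers)
    y2 = dither_embed(8, 1, dithers)
    print(y1, y2)                                  # 10.0 8.0
    print(dither_detect(y1, dithers), dither_detect(y2, dithers))  # 0 1 (0-based indices)
```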

3 Spread-Spectrum Method

The spread-spectrum watermarking scheme is an example of the correlation method: it embeds a pseudo-random sequence and detects the watermark by calculating the correlation between the pseudo-random noise sequence and the watermarked audio signal. The spread-spectrum scheme is the most popular scheme and has been studied well in the literature (Boney et al. 1996) (Cox et al. 1996) (Cvejic et al. 2001) (Kirovski and Malvar 2001) (Kim 2000) (Lee and Ho 2000) (Seok et al. 2002) (Swanson et al. 1998). This method is easy to implement, but it has some serious disadvantages: it requires time-consuming psycho-acoustic shaping to reduce audible noise, and it is susceptible to the time-scale modification attack. (Of course, the use of psychoacoustic models is not limited to spread-spectrum techniques.) The basic idea of this scheme and its implementation techniques are described below.

3.1 Basic Idea

This scheme spreads a pseudo-random sequence across the audio signal. The wideband noise can be spread into either the time-domain signal or a transform-domain signal, no matter what transform is used. Frequently used transforms include the DCT (Discrete Cosine Transform), DFT (Discrete Fourier Transform), and DWT (Discrete Wavelet Transform). The binary watermark message v = {0, 1}, or its equivalent bipolar variable b = {−1, +1}, is modulated by a pseudo-random sequence r(n) generated by means of a secret key. Then the modulated watermark w(n) = b r(n) is scaled according to the required energy of the audio signal s(n). The scaling factor α controls the trade-off between robustness and inaudibility of the watermark. The modulated watermark w(n) is equal to either r(n) or −r(n), depending on whether v = 1 or v = 0. The modulated signal is then added to the original audio to produce the watermarked audio x(n) such that

x(n) = s(n) + αw(n).

The detection scheme uses linear correlation. Because the pseudo-random sequence r(n) is known and can be regenerated by means of the secret key, watermarks are detected by using the correlation between x(n) and r(n) such as

\[ c = \frac{1}{N} \sum_{i=1}^{N} x(i)\, r(i), \qquad (3) \]

where N denotes the length of the signal. Equation (3) yields the correlation sum of two components as follows:

Figure 4: A typical preprocessing block for the detector of the spread-spectrum watermarking scheme. The watermarked audio x(n) is filtered and the result is correlated with the pseudo-random sequence r(n) to produce the correlation value c.

\[ c = \frac{1}{N} \sum_{i=1}^{N} s(i)\, r(i) + \frac{1}{N} \sum_{i=1}^{N} \alpha b\, r^2(i). \qquad (4) \]

Assume that the first term in Equation (4) is almost certain to have a small magnitude. If the two signals s(n) and r(n) were independent, the first term would vanish; however, this is not always the case. Thus, the watermarked audio is preprocessed as shown in Figure 4 in order to make this assumption valid. One possible solution is to filter s(n) out of x(n). Preprocessing methods include high-pass filtering (Hartung and Girod 1998) (Haitsma et al. 2000), linear predictive coding (Seok et al. 2002), and filtering with a whitening filter (Kim 2000).

Such preprocessing allows the second term in Equation (4) to have a much larger magnitude and the first term to nearly vanish. If the first term has a magnitude similar to or larger than the second term, the detection result will be erroneous. Based on a hypothesis test using the correlation value c and a predefined threshold τ, the detector outputs

\[
m = \begin{cases} 1 & \text{if } c > \tau \\ 0 & \text{if } c \le \tau \end{cases}
\]

A typical value of τ is 0. The detection threshold has a direct effect on both the false positive and false negative probabilities. A false positive is a type of error in which the detector incorrectly determines that a watermark is present in unwatermarked audio. On the other hand, a false negative is a type of error in which the detector fails to detect a watermark in watermarked audio.

3.2 Pseudo-Random Sequence

A pseudo-random sequence has statistical properties similar to those of a truly random signal, but it can be exactly regenerated with knowledge of privileged information (see Section 2.1). A good pseudo-random sequence has good correlation properties such that any two different sequences are almost mutually orthogonal. Thus, the cross-correlation value between them is very low, while the auto-correlation value is moderately large.

The most popular pseudo-random sequence is the maximum length sequence (also known as the M-sequence). This sequence is a binary sequence r(n) = {0, 1} of length N = 2^m − 1, where m is the size of the linear feedback shift register. It has very nice auto-correlation and cross-correlation properties. If we map the binary sequence r(n) = {0, 1} into the bipolar sequence r(n) = {−1, +1}, the auto-correlation of the M-sequence is given as follows:

\[
\frac{1}{N} \sum_{i=0}^{N-1} r(i)\, r(i-k) = \begin{cases} 1 & \text{if } k = 0 \\ -1/N & \text{otherwise} \end{cases} \qquad (5)
\]

The M-sequences have two disadvantages. First, the length of an M-sequence, called the chip rate, is strictly limited to 2^m − 1. Thus, it is impossible to get, for example, nine-chip sequences. The length of typical pseudo-random sequences is 1,023 (Cvejic et al. 2001) or 2,047. There is always a possibility to trade off the length of the pseudo-random sequence against robustness, and very short sequences such as length 7 are also used (Liu et al. 2002). Second, the number of different M-sequences is also limited once the size m is determined. It has been shown that the M-sequence is not secure in terms of cryptography. Thus, not all pseudo-random sequences are M-sequences. Sometimes a non-binary, and consequently real-valued, pseudo-random sequence r(n) ∈ R with Gaussian distribution (Cox et al. 1996) is used. A non-binary chaotic sequence (Bassia et al. 2001) is also

used. As long as they are non-binary, their correlation characteristics are very nice. However, since we have to use integer sequences (e.g., αr(n) rounded to integers) due to finite precision, the correlation properties become less promising.
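
For illustration, an M-sequence can be generated with a simple Fibonacci LFSR, as sketched below; the primitive polynomial x^4 + x + 1 and the register seed are illustrative choices, and the bipolar mapping reproduces the auto-correlation of Equation (5).

```python
import numpy as np

def m_sequence(m=4, taps=(4, 1), seed=1):
    """One period (2**m - 1 chips) of an M-sequence from a Fibonacci LFSR.
    `taps` are the exponents of a primitive polynomial (here x^4 + x + 1)."""
    state = [(seed >> i) & 1 for i in range(m)]    # non-zero initial register
    chips = []
    for _ in range(2 ** m - 1):
        chips.append(state[-1])                    # output the last register stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]            # shift and insert the feedback bit
    return np.array(chips)

def circular_autocorrelation(r, k):
    """(1/N) * sum_i r(i) r(i-k) for a bipolar sequence, using a circular shift."""
    return float(np.mean(r * np.roll(r, k)))

if __name__ == "__main__":
    r = 2.0 * m_sequence(m=4) - 1.0                # map {0, 1} -> {-1, +1}
    print(len(r))                                  # 15 chips
    print(circular_autocorrelation(r, 0))          # 1.0
    print(circular_autocorrelation(r, 3))          # -1/15, as in Equation (5)
```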

3.3 Watermark Shaping

A pseudo-random sequence or noise carelessly added to an audio signal can cause unpleasant audible sound, whatever watermarking scheme is used. Thus, simply reducing the strength α of the pseudo-random sequence cannot be the final solution. Because human ears are very sensitive, especially when the sound energy is very low, even a very small amount of noise with a small value of α can be heard. Moreover, a small α makes the spread-spectrum scheme less robust. One solution to ensure inaudibility is watermark shaping based on the psycho-acoustic model (Arnold and Schilz 2002) (Bassia et al. 2001) (Boney et al. 1996) (Cvejic et al. 2001) (Cvejic and Seppanen 2002). Interestingly enough, watermark shaping can also enhance robustness, since we can increase the strength α as much as possible as long as the noise stays below the masking margin.

Psycho-acoustic models for audio compression exploit frequency and temporal masking effects to ensure inaudibility by shaping the quantization noise according to the masking threshold. The psycho-acoustic model depicts the human auditory system as a frequency analyzer with a set of 25 bandpass filters (also known as critical bands). The intensity, expressed in decibels [dB], required for a single sound to be heard in the absence of another sound is known as the quiet curve (Cvejic et al. 2001) or threshold of audibility (Rossing et al. 2002). Figure 5 shows the quiet curve. In this case, the threshold in quiet is equal to the so-called minimum masking threshold. However, the masking effect can raise the minimum masking threshold. A sound lying in the frequency or temporal neighborhood of another sound affects the perception of the neighboring sound, a phenomenon known as masking. The sound that does the masking is called the masker, and the sound that is masked is called the maskee. The psycho-acoustic model analyzes the input signal s(n) in order to calculate the minimum masking threshold T. Figure 6

Figure 5: A typical curve for masking: sound pressure level (dB) versus frequency (Hz, roughly 20 Hz to 20,000 Hz), showing the threshold of audibility (quiet curve) and the masking curve. Noise below the solid line or bold line is inaudible; the bold line is moved upward by taking masking effects into consideration.

Figure 6: An example of noise shaping: power spectral density (dB) versus frequency (Hz), showing the original audio signal, an audible watermark signal (dotted line), and the inaudible watermark signal (broken line) obtained after shaping.

shows inaudible and audible watermark signals. The audible watermark signal can be transformed into an inaudible signal by applying watermark shaping based on the psycho-acoustic model. The frequency masking procedure is given as follows:

1. Calculate the power spectrum.

2. Locate the tonal (sinusoid-like) and non-tonal (noise-like) components.

3. Decimate the maskers to eliminate all irrelevant maskers.

4. Compute the individual masking thresholds.

5. Determine the minimum masking threshold in each subband.

This minimum masking threshold defines the frequency response of the shaping filter, which shapes the watermark. The filtered watermark signal is scaled in order to keep the watermark noise below the masking threshold. The shaped signal below the masking threshold is hardly audible. In addition, the noise energy of the pseudo-random sequence can be increased as much as possible in order to maximize robustness; the noise is inaudible as long as the noise power is below the masking threshold T. Temporal masking effects are also utilized for watermark shaping.
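
The sketch below is only a crude stand-in for the five-step procedure above: instead of a real psycho-acoustic model, it derives a per-bin "masking threshold" from the smoothed magnitude spectrum of the host frame lowered by a fixed margin, and scales the watermark spectrum down wherever it would exceed that estimate. The function name, smoothing kernel, and margin are assumptions made purely for illustration.

```python
import numpy as np

def shape_watermark(frame, wm, margin_db=12.0):
    """Crudely shape one watermark frame below an estimated masking threshold.

    The 'threshold' is the host frame's magnitude spectrum smoothed over
    neighbouring bins and lowered by margin_db; a real system would derive it
    from a full psycho-acoustic model (tonal/non-tonal maskers, critical bands).
    """
    S = np.fft.rfft(frame)
    W = np.fft.rfft(wm)
    magnitude = np.abs(S)
    kernel = np.ones(9) / 9.0                               # simple spectral smoothing
    smoothed = np.convolve(magnitude, kernel, mode="same")
    threshold = smoothed * 10.0 ** (-margin_db / 20.0)
    wm_magnitude = np.abs(W) + 1e-12
    gain = np.minimum(1.0, threshold / wm_magnitude)        # attenuate bins above the threshold
    return np.fft.irfft(W * gain, n=len(frame))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1024
    host = np.sin(2 * np.pi * 440 * np.arange(n) / 44100)   # a 440 Hz test tone
    noise = rng.choice([-1.0, 1.0], size=n)                 # raw bipolar watermark
    shaped = shape_watermark(host, noise)
    print(np.max(np.abs(noise)), np.max(np.abs(shaped)))    # shaped noise is far weaker
```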

Watermark shaping is a time-consuming task, especially when we try to exploit the masking effects frame by frame in real time, because the watermark shaping filter coefficients are computed from the psycho-acoustic model. In this case, we have to use the Fourier transform and inverse Fourier transform and follow the five steps described above. Needless to say, the detection rate then increases, since the robustness of the watermark increases. Because this is too time-consuming, a watermark shaping filter computed from the quiet curve can be used instead. Since this filter exploits the minimum noise level, it is not optimal in terms of the watermark strength α, which results in a strong reduction of robustness.

Of course, instead of maximizing the masking threshold, we can increase the length of the pseudo-random sequence for robustness. However, this reduces the embedding message capacity.

Figure 7: Seven example waveforms for sinusoidal modulation watermarking. Courtesy of Dr. Zheng Liu.

3.4 Sinusoidal Modulation

Another solution is sinusoidal modulation, based on the orthogonality between sinusoidal signals (Liu et al. 2002). Sinusoidal modulation utilizes the orthogonality between sinusoidal signals with different frequencies:

\[
\frac{1}{N} \sum_{i=0}^{N-1} \sin\!\left(\frac{2\pi i m}{N}\right) \sin\!\left(\frac{2\pi i n}{N}\right) = \begin{cases} 1 & \text{if } m = n \\ 0 & \text{otherwise} \end{cases}
\]

Based on this property, the sinusoidally modulated watermark can be generated by adding sinusoids with different frequencies, weighted by a pseudo-random sequence (Liu et al. 2002), as follows:

\[ w(n) = \sum_{i=0}^{N-1} b_i\, \alpha_i \sin(2\pi f_i n). \]

Note that the watermark signal modulated by the elements b_i of the pseudo-random sequence keeps the same correlation characteristics as the pseudo-random sequence in Equation (5). The coefficient b_i is a bipolar pseudo-random sequence, and α_i is a scaling factor for the

Figure 8: Just noticeable differences for sinusoidal modulation (amplitude versus frequency, showing the minimum inaudible amplitude and the just noticeable difference).

i-th sinusoidal component with frequency f_i. For example, Figure 7 shows seven waveforms for the sinusoidally modulated watermarking scheme; seven sinusoids are linearly combined with different b_i coefficients. This sinusoidal modulation method has the following advantages. First, watermark embedding and detection can be done simply in the time domain, so its embedding complexity is relatively low. Second, the length of the pseudo-random sequence is very short. Third, the embedded sinusoids always start from zero and end at zero, which minimizes the chance of block noise.

Of course, this scheme also needs psychoacoustic adaptation for inaudibility. However, since the sinusoids are quite few in number, the just noticeable difference (see Figure 8) for them can be determined in the frequency domain by audibility experiments.
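
A minimal sketch of sinusoidal modulation is shown below. The frame length, the seven frequencies (chosen so each sinusoid completes an integer number of periods in the frame, keeping the components orthogonal), and the scaling factors are illustrative, and no psychoacoustic adjustment is applied.

```python
import numpy as np

def sinusoidal_watermark(n_samples, freqs, b, alphas, fs=44100):
    """Sum of sinusoids weighted by a bipolar sequence b and per-frequency scales alpha_i."""
    t = np.arange(n_samples) / fs
    w = np.zeros(n_samples)
    for f, bi, ai in zip(freqs, b, alphas):
        w += bi * ai * np.sin(2 * np.pi * f * t)
    return w

def detect_bit(x, f, fs=44100):
    """Correlate with one sinusoid; the sign of the correlation recovers b_i."""
    t = np.arange(len(x)) / fs
    return np.sign(np.mean(x * np.sin(2 * np.pi * f * t)))

if __name__ == "__main__":
    n, fs = 4410, 44100                                       # a 0.1 s frame
    freqs = [fs / n * k for k in (10, 20, 30, 40, 50, 60, 70)]  # 7 mutually orthogonal sinusoids
    b = [1, -1, 1, 1, -1, -1, 1]                              # bipolar pseudo-random coefficients
    alphas = [0.01] * 7
    w = sinusoidal_watermark(n, freqs, b, alphas, fs)
    host = np.random.default_rng(0).normal(0.0, 0.1, n)
    x = host + w
    print([int(detect_bit(x, f, fs)) for f in freqs])         # ideally recovers b
```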

4 Two-Set Method

A blind watermarking scheme can be devised by making two sets different. For example, if the two sets are found to be different, we can conclude that a watermark is present. Such decisions are made by hypothesis tests, typically based on the difference of means between the two sets. Making two sets of audio blocks have different energies can also be a good solution for blind watermarking. Patchwork (Arnold 2000) (Bender et al. 1996) (Yeo and Kim 2003) also belongs to this category. Of course, depending on the application we can exploit the differences between two sets or more.

4.1 Patchwork Scheme

The original patchwork scheme embeds a special statistic into the original signal (Bender et al. 1996). The two major steps of the scheme are: (i) choose two patches pseudo-randomly, and (ii) add a small constant value d to the samples of one patch A and subtract the same value d from the samples of the other patch B. Mathematically speaking,

\[ a_i^* = a_i + d, \qquad b_i^* = b_i - d, \]

where a_i and b_i are samples of the patchwork sets A and B, respectively. Thus, the original sample values have to be slightly modified. The detection process starts with the subtraction of the sample values between the two patches. Then E[a* − b*], the expected value of the difference of the sample means, is used to decide whether the samples contain watermark information or not, where a* and b* are the sample means of the individual samples a_i* and b_i*, respectively. Since two patches are used rather than one, the scheme can detect the embedded watermarks without the original signal, which makes it a blind watermarking scheme.

Patchwork has some inherent drawbacks. Note that

E[a∗ − b∗] = E[(a + d) − (b − d)] = E[a − b] + 2d,

where a and b are the sample means of the individual samples a_i and b_i, respectively. The patchwork scheme assumes that E[a* − b*] = 2d, based on the prior assumption that random sampling ensures the expected values are equal, i.e., E[a − b] = 0. However, the actual difference of sample means, a − b, is not always zero in practice. Although the distribution of the random variable a* − b* is shifted to the right as shown in Figure 9, a probability of wrong detection still remains (see the area smaller than 0 in the watermarked distribution). The performance of the patchwork scheme depends on the distance between the two sample means and on d, which affects inaudibility. Furthermore, the patchwork scheme was originally designed for images.

The original patchwork scheme has been applied to the spatial-domain image (Bender et al. 1996) (or, equivalently, time-domain in audio) data. However,

Figure 9: A comparison of the unwatermarked and watermarked distributions of the mean difference (the unwatermarked distribution is centered at 0, the watermarked distribution at 2d).

time-domain embedding is vulnerable even to weak attacks and modifications. Thus, the patchwork scheme can also be implemented in the transform domain (Arnold 2000) (Bassia et al. 2001) (Yeo and Kim 2003). These implementations have enhanced the original patchwork algorithm. First, the mean and variance of the sample values are computed in order to detect the watermarks. Second, the new algorithms assume that the distribution of the sample values is normal. Third, they try to decide the value d adaptively.

The Modified Patchwork Algorithm (MPA) (Yeo and Kim 2003) is described below:

1. Generate two sets A = {a_i} and B = {b_i} randomly. Calculate the sample means \(\bar{a} = N^{-1} \sum_{i=1}^{N} a_i\) and \(\bar{b} = N^{-1} \sum_{i=1}^{N} b_i\), respectively, and the pooled sample standard error
\[
S = \sqrt{\frac{\sum_{i=1}^{N}(a_i - \bar{a})^2 + \sum_{i=1}^{N}(b_i - \bar{b})^2}{N(N-1)}}.
\]

2. The embedding function presented below introduces an adaptive value change,
\[
\begin{cases} a_i^* = a_i + \operatorname{sign}(\bar{a} - \bar{b})\, \sqrt{C}\, S / 2 \\ b_i^* = b_i - \operatorname{sign}(\bar{a} - \bar{b})\, \sqrt{C}\, S / 2 \end{cases} \qquad (6)
\]

Figure 10: A comparison of the unwatermarked and watermarked distributions of the mean difference for the modified patchwork algorithm (the unwatermarked distribution is centered at 0, the watermarked distribution has modes at −d and d).

where C is a constant and "sign" is the sign function. This function makes the larger-valued set larger and the smaller-valued set smaller, so that the distance between the two sample means is always greater than d = \(\sqrt{C}\, S\), as shown in Figure 10.

3. Finally, replace the selected elements a_i and b_i by a_i^* and b_i^*.

Since the embedding function (6) introduces relative distance changes between the two sets, a natural test statistic for deciding whether or not a watermark is embedded should concern the distance between the means of A and B. The decoding process is as follows:

1. Calculate the test statistic
\[
T^2 = \frac{(\bar{a} - \bar{b})^2}{S^2}.
\]

2. Compare T^2 with the threshold τ and decide that a watermark is embedded if T^2 > τ, and that no watermark is embedded otherwise.
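
The MPA embedding and detection steps above can be sketched as follows; the constant C, the threshold τ, and the synthetic "audio" samples are illustrative choices rather than values from the paper.

```python
import numpy as np

def mpa_embed(a, b, C=16.0):
    """Modified patchwork embedding (Equation 6): push the set means apart by sqrt(C)*S."""
    a_mean, b_mean = a.mean(), b.mean()
    n = len(a)
    S = np.sqrt((np.sum((a - a_mean) ** 2) + np.sum((b - b_mean) ** 2)) / (n * (n - 1)))
    shift = np.sign(a_mean - b_mean) * np.sqrt(C) * S / 2.0
    return a + shift, b - shift

def mpa_detect(a, b, tau=6.0):
    """Test statistic T^2 = (mean(a) - mean(b))^2 / S^2 compared with a threshold tau."""
    a_mean, b_mean = a.mean(), b.mean()
    n = len(a)
    S2 = (np.sum((a - a_mean) ** 2) + np.sum((b - b_mean) ** 2)) / (n * (n - 1))
    T2 = (a_mean - b_mean) ** 2 / S2
    return T2 > tau, T2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.normal(0.0, 0.2, size=10000)     # stand-in for audio samples
    a, b = samples[:5000], samples[5000:]          # two pseudo-randomly chosen patches
    print(mpa_detect(a, b))                        # unwatermarked: T^2 usually well below tau
    a_w, b_w = mpa_embed(a, b, C=16.0)
    print(mpa_detect(a_w, b_w))                    # watermarked: T^2 >= C, so it is detected
```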

The multiplicative patchwork scheme (Yeo and Kim 2003) provides a new way of patchwork embedding. Most embedding schemes are additive, of the form x = s + αw, while multiplicative embedding schemes have the form x = s(1 + αw). Additive schemes shift the mean, while multiplicative schemes change the variance, and the detection scheme exploits this fact.

4.2 Amplitude Modification

This method embeds a watermark by changing the energies of two or three blocks. The energy of a block of length N is defined and calculated as

\[ E = \sum_{i=1}^{N} |s(i)|. \]

The energy is high when the amplitude of the signal is large. Assume that two consecutive blocks are used to embed a watermark. We can make the two blocks A and B have the same or different energies by modifying the amplitude of each block. Let E_A and E_B denote the energies of blocks A and B, respectively. If E_A ≥ E_B + τ, then, for example, we conclude that the watermark message m = 0 is embedded. If E_A ≤ E_B − τ, then we conclude that the watermark message m = 1 is embedded. Otherwise, no watermark is embedded.

However, this method has a serious problem. Assume that block A has much more energy than block B; if the watermark message to be embedded is 0, there is no problem at all. Otherwise, we have to force the opposite energy relation between the blocks. When the original energy gap is wide, the resulting artifact becomes obvious and unnaturally noticeable: this scheme can, unfortunately, turn a "forte" part into a "piano" part, or vice versa. The problem can be moderated by using three blocks (Lie and Chang 2001) or more; with multiple blocks, such artifacts can be reduced slightly by distributing the burden across the other blocks.
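
A minimal sketch of the two-block energy scheme is given below. The way the two gains are computed (rescaling both blocks toward a common target energy) is one illustrative choice among many, and τ is an arbitrary margin.

```python
import numpy as np

def block_energy(block):
    """Energy of a block as defined above: the sum of absolute sample values."""
    return float(np.sum(np.abs(block)))

def embed_bit(block_a, block_b, bit, tau):
    """Rescale two consecutive blocks so their energies encode one bit.
    bit 0: E_A >= E_B + tau;  bit 1: E_A <= E_B - tau."""
    ea, eb = block_energy(block_a), block_energy(block_b)
    target = (ea + eb) / 2.0
    if bit == 0:
        ga, gb = (target + tau) / ea, (target - tau) / eb
    else:
        ga, gb = (target - tau) / ea, (target + tau) / eb
    return block_a * ga, block_b * gb

def detect_bit(block_a, block_b, tau):
    """Compare the block energies; return None when no decision can be made."""
    ea, eb = block_energy(block_a), block_energy(block_b)
    if ea >= eb + tau:
        return 0
    if ea <= eb - tau:
        return 1
    return None

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 0.1, 1024)
    b = rng.normal(0.0, 0.1, 1024)
    tau = 5.0
    a1, b1 = embed_bit(a, b, 1, tau)
    print(detect_bit(a1, b1, tau))      # 1
    a0, b0 = embed_bit(a, b, 0, tau)
    print(detect_bit(a0, b0, tau))      # 0
```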

5 Replica Method

The original signal itself can be used as an audio watermark; echo hiding is a good example. Replica modulation

Figure 11: Kernels for echo hiding. The original signal appears as a unit impulse at time 0, and the echo signal as an impulse of amplitude a at delay offset d.

also embeds part of the original signal, in the frequency domain, as a watermark. Thus, replica modulation embeds a replica, i.e., a properly modulated version of the original signal, as the watermark. The detector can generate the same replica from the watermarked audio and calculate the correlation. The most significant advantage of this method is its high immunity to synchronization attacks.

5.1 Echo Hiding

Echo hiding embeds data into an original audio signal by introducing an echo in the time domain such that

x(n) = s(n) + αs(n − d). (7)

For simplicity, a single echo is added above (see Figure 11); however, multiple echoes can also be added (Bender et al. 1996). Binary messages are embedded by echoing the original signal with one of two delays, either a d_0-sample delay or a d_1-sample delay. Extraction of the embedded message involves detecting the delay d. The autocepstrum or cepstrum detects the delay d: cepstral analysis produces impulses repeated every d samples. The magnitude of the impulses representing the echoes is small relative to the original audio; the solution to this problem is to take the auto-correlation of the cepstrum (Gruhl et al. 1996). A double echo (Oh et al. 2001) such as

x(n) = s(n) + αs(n − d) − αs(n − d − ∆).

can reduce the perceptual signal distortion and enhance robustness. The typical value of ∆ is less than

three or four samples. Echo hiding is usually imperceptible and sometimes makes the sound richer. Synchronization methods frequently adopt this method for coarse synchronization. A disadvantage of echo hiding is its high complexity, due to the cepstrum or autocepstrum computation during detection. On the other hand, anybody can detect the echo without any prior knowledge; in other words, it provides a clue for a malicious attack, which is another disadvantage of echo hiding. Blind echo removal is partially successful (Petitcolas et al. 1998). Time-spread echo (Ko et al. 2002) can reduce the possibility of such attacks. Another way of evading a blind attack is auto-correlation modulation (Petrovic et al. 1999), which obtains the watermark signal w(n) from the echoed signal x(n) in Equation (7); this method is elaborated further in replica modulation. A double echo hiding scheme (Kim and Choi 2003)

x(n) = s(n) + αs(n − d) + αs(n + d),

is now available. The virtual echo s(n + d) violates causality; however, it is possible to embed virtual echoes by delaying the echo-embedding process by d samples. These twin echoes make the cepstrum peak higher than a single echo with the same echo strength α. Thus, double echoes can enhance the detection rate due to the higher peak, or enhance imperceptibility by reducing α accordingly.
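
A minimal single-echo embedder and cepstrum-based detector might look as follows. The delays, echo strength, and the use of white noise as a stand-in host are illustrative; a practical detector would take the auto-correlation of the cepstrum as noted above, but the raw cepstral peak suffices for this synthetic example.

```python
import numpy as np

def embed_echo(s, delay, alpha=0.4):
    """Single-echo embedding: x(n) = s(n) + alpha * s(n - delay), as in Equation (7)."""
    x = s.copy()
    x[delay:] += alpha * s[:-delay]
    return x

def cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.abs(np.fft.fft(x)) + 1e-12
    return np.real(np.fft.ifft(np.log(spectrum)))

def detect_delay(x, d0, d1):
    """Decide which of two candidate delays carries the larger cepstral peak."""
    c = cepstrum(x)
    return d0 if c[d0] > c[d1] else d1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.normal(0.0, 0.3, size=8192)     # stand-in for an audio frame
    d0, d1 = 100, 150                       # the two message delays (illustrative values)
    x = embed_echo(s, d1, alpha=0.4)        # embed bit "1" using delay d1
    print(detect_delay(x, d0, d1))          # 150
```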

5.2 Replica Modulation

Replica modulation (Petrovic 2001) is a novel watermarking scheme that embeds a replica, i.e., a modified version of the original signal. The three replica modulation methods are the frequency-shift, phase-shift, and amplitude-shift schemes. The frequency-shift method transforms s(n) into the frequency domain, copies a fraction of the low-frequency components in a certain range (for example, from 1 kHz to 4 kHz), modulates them (by moving them 20 Hz, for example, with a proper scaling factor), inserts them back among the original components (to cover the range from 1020 Hz to 4020 Hz), and transforms back to the time domain to generate the watermark signal w(n). Since the frequency components are shifted and added in the frequency domain, we call this a "frequency-domain echo"

to contrast it with the "time-domain echo", the case where the replica is obtained by a time-shift of the original (or a portion of it). Such a modulated signal w(n) is a replica. This replica can be used as a carrier in much the same manner as the PN sequence in spread-spectrum techniques. Thus, the watermarked signal has the following form:

x(n) = s(n) + αw(n).

As long as the components are invariant against modifications, the replica in the frequency domain can be regenerated from the watermarked signal: the watermark signal w(n) is obtained from the watermarked signal x(n) by processing it according to the embedding process. Then the correlation between x(n) and w(n) is computed as follows:

\[ c = \frac{1}{N} \sum_{i=1}^{N} s(i)\, w(i) + \frac{1}{N} \sum_{i=1}^{N} \alpha w(i)\, w(i) \qquad (8) \]

to detect the watermark. As long as we use a frequency band whose lower cut-off is much larger than the frequency shift, and the correlation is computed over an integer number of frequency-shift periods, the correlation between s(n) and w(n) in Equation (8) is very small. On the other hand, the spectrum of the product w(n)w(n) has a strong DC component; thus c contains a term equal to the mean value of w(n)w(n), i.e., it contains the scaled auxiliary signal in the last term of Equation (8).
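
A rough sketch of the frequency-shift replica idea is shown below. The band (1-4 kHz) and the 20 Hz shift are the example values from the text, the shift is rounded to whole FFT bins, and no preprocessing or psycho-acoustic shaping is applied; the function names and test signal are assumptions for illustration only.

```python
import numpy as np

def replica_watermark(s, fs, band=(1000.0, 4000.0), shift_hz=20.0):
    """Frequency-shift replica: copy the 1-4 kHz band, shift it up by ~20 Hz, return it as w(n)."""
    N = len(s)
    S = np.fft.rfft(s)
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)
    shift_bins = int(round(shift_hz * N / fs))
    band_idx = np.where((freqs >= band[0]) & (freqs <= band[1]))[0]
    W = np.zeros_like(S)
    W[band_idx + shift_bins] = S[band_idx]     # the shifted replica of the band
    return np.fft.irfft(W, n=N)

def correlate_with_replica(x, fs):
    """Regenerate the replica from the received signal and correlate, as in Equation (8)."""
    w = replica_watermark(x, fs)
    return float(np.mean(x * w))

if __name__ == "__main__":
    fs = 44100
    rng = np.random.default_rng(0)
    s = rng.normal(0.0, 0.2, size=fs)              # one second of noise as a stand-in host
    x = s + 0.1 * replica_watermark(s, fs)         # watermarked signal, alpha = 0.1
    print(correlate_with_replica(x, fs))           # noticeably larger for the marked signal...
    print(correlate_with_replica(s, fs))           # ...than for the unmarked one
```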

Note that the frequency shift is just one way to generate a replica. A combination of frequency-shift, phase-shift, and amplitude-shift makes replica modulation more difficult for a malicious attacker to derive a clue from, and makes the correlation value between s(n) and w(n) even smaller. The main advantage in comparison with PN sequences is that chip synchronization is not needed during detection, which makes replica modulation immune to synchronization attacks. When an attacker makes a jitter attack (e.g., cuts out a small portion of audio and splices the signal) against PN-sequence techniques, synchronization is a must. On the contrary, replica modulation is free from synchronization problems, since the replica and the original give the same correlation before and after cutting and

Figure 12: The concept of the time-scale modification watermarking scheme: (a) original signal; (b) time-scale modified signal. Messages, either bit "0" or bit "1", can be embedded by changing the slopes between two successive extrema (a gentle slope embeds bit "1", a steep slope embeds bit "0").

splicing. Of course, time-scaling attacks can affect bit and packet synchronization, but this is a much smaller problem than chip synchronization. Pitch-scaling (Shin et al. 2002) is a variant of replica modulation which leaves the length of the audio unchanged while the harmonics are expanded or contracted accordingly.

6 Self-Marking Method

The self-marking method embeds a watermark by leaving self-evident marks in the signal. This method embeds a special signal into the audio, or changes the signal shape in the time domain or frequency domain. The time-scale modification method (Mansour and Tewfik 2001) and many schemes based on salient features (Wu et al. 2000) belong to this category. A clumsy self-marking method, for example embedding a peak in the frequency domain, is prone to attack since it is easily noticeable.

6.1 Time-Scale Modification

Time-scale modification is a challenging attack, and it can also be used for watermarking (Mansour and Tewfik 2001). Time-scale modification refers to the process of either compressing or expanding the time scale of audio. The basic idea of time-scale modification watermarking is to change the time scale between two extrema (a successive maximum and minimum pair) of the audio signal (see Figure 12). The interval between two extrema is partitioned into N segments of equal amplitude. We can change the slope of the signal in certain amplitude interval(s) according to the bits we want to embed, which changes the time scale. For example, a steep slope and a gentle slope stand for bits "0" and "1", respectively, or vice versa. An advanced time-scale modification watermarking scheme (Mansour and Tewfik 2001) can survive the time-scale modification attack.

6.2 Salient Features

Salient features are special, noticeable signals to the embedder, but look like ordinary signal content to an attacker. They may be either natural or artificial; in either case they must be robust against attacks. So far such features have been extracted or created empirically. Salient features can be used especially for synchronization, or for watermarking that is robust, for example, against the time-scale modification attack.

7 Synchronization

Watermark detection starts by aligning the watermarked block with the detector; losing synchronization causes false detection. Time-scale or frequency-scale modification makes the detector lose synchronization, so the most serious and malicious attack is probably desynchronization. All watermarking algorithms assume that the detector is synchronized before detection. Brute-force search is computationally infeasible, so we need fast and exact synchronization algorithms. Some watermarking schemes, such as replica modulation or echo hiding, are rather robust against certain types of desynchronization attacks; such schemes can be used as a baseline method for coarse synchronization. A synchronization code can be used to locate the onset of the watermarked block.

However, designing a refined synchronization scheme is not simple: clever attackers also try to devise sophisticated methods for desynchronization, so the synchronization scheme should be both robust against attacks and fast. There are two synchronization problems. The first is to align the starting point of a watermarked block. This applies to attacks such as cropping or inserting redundancy; for example, a garbage clip can be added to the beginning of the audio intentionally or unintentionally. Some MP3 encoders unintentionally add around 1,000 samples, which makes an innocent decoder fail to detect the exact watermarks. The second is time-scale and frequency-scale modification, done intentionally by malicious attackers or unintentionally by audio systems (Petrovic et al. 1999), which is very difficult to cope with in either case. Time-scale modification is a time-domain attack that periodically adds fake samples to the target audio or periodically deletes samples (Petitcolas et al. 1998), or uses sophisticated time-scaling schemes (Arfib 2002) (Dutilleux 2002) to preserve pitch; the audio length may thus be increased or decreased. On the other hand, frequency-scale modification (or pitch-scaling) adjusts frequencies and then applies time-scale modification to keep the length unchanged. This attack can be implemented by sophisticated audio signal processing techniques (Arfib 2002) (Dutilleux 2002). Aperiodic modification is even more difficult to manage.

There are many audio features, such as brightness, zero-crossing rate, pitch, beat, frequency centroid, and so on. Some of them can be used for synchronization as long as such features are invariant under attacks. Feature analysis has been studied well in the speech processing literature, while very few studies are available for audio processing.

Recently, a precise synchronization scheme that is efficient and reliable against time-scaling and pitch-scaling attacks has been presented (Tachibana et al. 2001). For robustness, this scheme calculates and manipulates the magnitudes of segmented areas in the time-frequency plane using short-term DFTs. The detector correlates the magnitudes with a pseudo-random array that corresponds to two-dimensional areas in the time-frequency plane. The purpose of the 2-D array is to detect the watermark when at least one plane of information survives, under the assumption that attacking the watermarks in both planes at the same time is not feasible. Manipulating magnitudes (which is similar to amplitude modification) is useful since magnitudes are less affected than phases under attack. This scheme is useful against time-scaling and pitch-scaling attacks and defends quite well against them.

7.1 Coarse Alignment

Fine alignment is the final goal of synchronization. However, such alignment is not simple, so coarse synchronization is needed to locate possible positions quickly and effectively. Once such positions are identified, fine synchronization mechanisms are used for exact synchronization. A coarse alignment scheme should therefore be simple and fast.

The combination of energy and zero-crossings is a good example of a coarse alignment scheme. The total energy and the number of zero-crossings of each block are calculated, with a sliding window used to define a block. If the two measures meet predefined criteria, we can conclude that the block is close to the target block for synchronization. Such a conclusion is drawn from the assumption that the energy and the number of zero-crossings are invariant. For example, a block with low energy and a large number of zero-crossings may be a good clue. The number of zero-crossings is closely related to frequency content: a large number of zero-crossings implies that the audio contains high-frequency components. The energy computation is simple to implement: taking the absolute value of each sample and summing them gives the energy of the block, and counting the number of sign changes from positive to negative and vice versa gives the number of zero-crossings.
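
A minimal sketch of such a coarse-alignment search is given below; the block length, hop size, feature tolerances, and the synthetic test signal are all illustrative choices.

```python
import numpy as np

def block_features(block):
    """Energy (sum of absolute values) and zero-crossing count of one block."""
    energy = float(np.sum(np.abs(block)))
    signs = np.sign(block)
    zero_crossings = int(np.sum(signs[:-1] * signs[1:] < 0))
    return energy, zero_crossings

def coarse_align(audio, block_len, energy_range, zc_range, hop=256):
    """Slide a window and return start positions whose features fall inside the given ranges."""
    hits = []
    for start in range(0, len(audio) - block_len + 1, hop):
        e, z = block_features(audio[start:start + block_len])
        if energy_range[0] <= e <= energy_range[1] and zc_range[0] <= z <= zc_range[1]:
            hits.append(start)
    return hits

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(0.0, 0.05, size=44100)
    # a louder, higher-frequency burst pretends to be the watermarked block
    t = np.arange(2048)
    audio[30000:32048] += 0.3 * np.sin(2 * np.pi * 0.23 * t)
    e, z = block_features(audio[30000:32048])       # features measured at embedding time
    print(coarse_align(audio, 2048, (0.8 * e, 1.2 * e), (0.8 * z, 1.2 * z)))
    # candidate start positions clustered around 30000
```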

Echo hiding can also be used for coarse synchronization: if evidence of an echo is found, the block is near the synchronization point. Unfortunately, echo detection is considerably costly in terms of computational complexity. Replica modulation is rather robust against desynchronization attacks.

7.2 Synchronization Code

The synchronization code in the time domain based on the Barker code (Huang et al. 2002) is a notable idea. The Barker code (with bit length 12, for example, given as "111110011010") can be used for synchronization since it has a special auto-correlation function. To embed the code, this method sets the lowest 13 bits of a sample to "1100000000000" when the embedded message is "1", and to "0100000000000" otherwise, regardless of the sample value. For example, a 16-bit sample value "1000000011111111" is changed forcibly into "1001100000000000" to embed message "1" in the time domain. This method is claimed to achieve the best performance in resisting additive noise while keeping sufficient inaudibility.
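
The bit-forcing step of this synchronization code can be illustrated as follows, treating samples as unsigned 16-bit patterns as in the example above; the helper names are ours, not from (Huang et al. 2002).

```python
SYNC_ONE = 0b1100000000000    # lowest 13 bits used when the message bit is "1"
SYNC_ZERO = 0b0100000000000   # lowest 13 bits used when the message bit is "0"

def force_low_bits(sample, message_bit):
    """Overwrite the lowest 13 bits of a 16-bit sample pattern, as in the example above."""
    pattern = SYNC_ONE if message_bit == 1 else SYNC_ZERO
    return (sample & ~0x1FFF) | pattern       # keep the top 3 bits, replace the rest

def read_low_bits(sample):
    """Recover the message bit (bit 12 distinguishes the two forced patterns)."""
    return 1 if (sample >> 12) & 1 else 0

if __name__ == "__main__":
    sample = 0b1000000011111111               # the 16-bit example value from the text
    marked = force_low_bits(sample, 1)
    print(format(marked, "016b"))             # 1001100000000000, as in the text
    print(read_low_bits(marked))              # 1
```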

7.3 Salient Point Extraction

Salient point extraction without changing the original signal (Wu et al. 2000) is also a good scheme. The basic idea is to extract salient points at locations where the audio signal energy climbs quickly to a peak value. This approach works well for simple audio clips played by a few instruments. However, the scheme has two disadvantages for more complex audio clips. First, the overall energy variation becomes ambiguous for complex audio in which many musical instruments play together, and the stability of the salient points decreases. Second, it is difficult to define appropriate thresholds for all pieces of music: a high threshold value is suitable for audio with sharp energy variation, but the same value applied to complex audio would yield very few salient points. Thus, audio content analysis (Wu et al. 2000) parses complex audio into several simpler pieces so that the stability of the salient points can be improved and the same threshold can be applied to all audio clips.

In order to avoid such complex operations, special shaping of the audio signal is also useful for coarse synchronization. This approach intentionally modifies the signal shape to create salient points that are sufficiently invariant under malicious modifications. For example, choosing a fast-climbing signal portion

Figure 13: The concept of redundant-chip coding for two sequences A and B: (a) exact match (15 matches); (b) one chip off (3 matches); (c) one chip off with the chip rate extended by 3 (15 matches). The right figure is an extended version of the center figure with a chip rate of 3; correlation is calculated only at the areas marked with dotted lines.

and marking it with a special sawtooth shape is an example. Such artificial marking may generate audible high-frequency noise, but careful shaping can reduce the noise to a hardly audible level.

7.4 Redundant-Chip Coding

A pseudo-random sequence is a good tool for watermarking. As mentioned, correlation is effective for detecting the watermark as long as perfect synchronization is achieved: when the pseudo-random sequence is exactly aligned, its correlation approaches Equation (5). Figure 13-(a) depicts perfect synchronization between two copies of a 15-chip pseudo-random sequence (not an M-sequence in this example); its normalized auto-correlation is 1. However, if the sequences are misaligned by one chip, as shown in (b), the auto-correlation falls to −3/15. This problem can be solved by redundant-chip coding (Kirovski and Malvar 2001). Figure 13-(c) shows the chip rate expanded by 3; now a misalignment of one chip does not matter. During the detection phase, only the central sample of each expanded chip is used for computing the correlation; the central chips are marked by broken lines in Figure 13-(c). By using such redundant-chip encoding with expansion by R chips, correct detection is possible up to a misalignment of ⌊R/2⌋ chips. Of course, this method enhances robustness at the cost of embedding capacity.
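
A small sketch of redundant-chip coding with R = 3 is given below; it shows that a one-sample misalignment still yields a full correlation when only the central sample of each expanded chip is used. The sequence itself is a generic bipolar sequence, not an M-sequence, matching the figure's example.

```python
import numpy as np

def expand_chips(sequence, R=3):
    """Repeat every chip R times (redundant-chip coding)."""
    return np.repeat(sequence, R)

def correlate_central(received, sequence, R=3):
    """Correlate using only the central sample of each expanded chip."""
    central = received[R // 2::R][:len(sequence)]
    return float(np.mean(central * sequence))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.choice([-1.0, 1.0], size=15)          # a 15-chip bipolar sequence
    tx = expand_chips(seq, R=3)
    rx = np.roll(tx, 1)                             # simulate a one-sample misalignment
    print(correlate_central(rx, seq, R=3))          # still 1.0: within floor(R/2) = 1 chip
    print(float(np.mean(np.roll(seq, 1) * seq)))    # typically much smaller without redundancy
```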

7.5 Beat-Scaling Transform

The beat, the salient periodicity of a music signal, is one of the fundamental characteristics of audio. A serious change of beat can spoil the music, so the beat must be almost invariant under attacks. In this context, the beat can be a very important marker for synchronization. The beat-scaling transform (Kirovski and Attias 2002) can be used to enable synchronicity between the watermark detector and the location of the watermark in an audio clip.

The beat-scaling transform method calculates the average beat period in the clip and identifies the location of each beat as accurately as possible. Next, the audio clip is scaled (i.e., stretched or shortened) such that the length of each beat period is constant and equal to the average beat period, rounded to the nearest multiple of a certain block of samples. The scaled clip is watermarked and then scaled back to its original tempo. As long as the beat remains unchanged, watermarks can be detected from the scaled beat periods. Beat detection algorithms are presented in (Goto and Muraoka 1999) (Scheirer 1998); of course, in this case the synchronization relies on the accuracy of the beat detection algorithm.

8 Conclusions

Available studies on audio watermarking are far fewer than those on image or video watermarking. However, during the last decade audio watermarking studies have also increased considerably, and they have contributed much to the progress of audio watermarking technology. This paper surveyed those papers and classified them into five categories: the quantization scheme, the spread-spectrum scheme, the two-set scheme, the replica scheme, and the self-marking scheme. The quantization scheme is not very robust against attacks, but it is easy to implement. The spread-spectrum scheme requires psycho-acoustic adaptation to embed inaudible noise, and this adaptation is rather time-consuming (although, of course, most audio watermarking schemes need psychoacoustic modelling for inaudibility). Another disadvantage of the spread-spectrum scheme is its difficulty with synchronization. On the other hand, the replica method is effective for synchronization, although echo hiding is vulnerable to attack; replica modulation (Petrovic 2001) is more secure than echo hiding. Among the two-set schemes, the modified patchwork algorithm (Yeo and Kim 2003) is very much elaborated. The self-marking method can be used especially for synchronization, or for watermarking that is robust, for example, against the time-scale modification attack. These five seminal lines of work have improved watermarking schemes remarkably; however, more sophisticated technologies are required and are expected to be achieved in the next decade. Synchronization schemes are also very important, and this article has briefly surveyed the basic ideas for synchronization.

Acknowledgments

This work was supported in part by the Brain Korea 21 Project, Kangwon National University. The authors appreciate Prof. D. Ghose of the Indian Institute of Science for his comments. The authors also appreciate Dr. Rade Petrovic of Verance Inc., Mr. Michael Arnold of Fraunhofer Gesellschaft, and Dr. Fabien A. P. Petitcolas of Microsoft for their kind personal communications and reviews. The authors also appreciate Taehoon Kim, Kangwon National University, for implementing various schemes and providing useful information.

References

Arfib, D., Keiler, F., and Zölzer, U. (2002), "Time-frequency Processing," in DAFX: Digital Audio Effects, edited by U. Zölzer, John Wiley and Sons, pp. 237-297.

Arnold, M. (2000), "Audio watermarking: features, applications and algorithms," IEEE International Conference on Multimedia and Expo, vol. 2, pp. 1013-1016.

Arnold, M. (2001), "Audio Watermarking: Burying information in the data," Dr. Dobb's Journal, vol. 11, pp. 21-28.

Arnold, M., and Schilz, K. (2002), "Quality evaluation of watermarked audio tracks," SPIE Electronic Imaging, vol. 4675, pp. 91-101.

Bassia, P., Pitas, I., and Nikolaidis, N. (2001), "Robust audio watermarking in the time domain," IEEE Transactions on Multimedia, vol. 3, pp. 232-241.

Bender, W., Gruhl, D., Morimoto, N., and Lu, A. (1996), "Techniques for data hiding," IBM Systems Journal, vol. 35, pp. 313-336.

Boeuf, J., and Stern, J.P. (2001), "An analysis of one of the SDMI audio watermarks," Proceedings: Information Hiding, pp. 407-423.

Boney, L., Tewfik, A. H., and Hamdy, K. N. (1996), "Digital watermarks for audio signal," International Conference on Multimedia Computing and Systems, Hiroshima, Japan, pp. 473-480.

Chen, B., and Wornell, G.W. (1999), "Dither modulation: A new approach to digital watermarking and information embedding," Proceedings of the SPIE: Security and Watermarking of Multimedia Contents, vol. 3657, pp. 342-353.

Cox, I.J., Kilian, J., Leighton, F.T., and Shamoon, T. (1996), "Secure Spread Spectrum Watermarking for Multimedia," IEEE Trans. Image Processing, vol. 6, pp. 1673-1687.

Cox, I.J., Miller, M.I., and Bloom, J.A. (2002), Digital Watermarking, Morgan Kaufmann Publishers.

Cox, I.J., and Miller, M.I. (2002), "The first 50 years of electronic watermarking," Journal of Applied Signal Processing, vol. 2, pp. 126-132.

Craver, S. A., Memon, N., Yeo, B.-L., and Yeung, M. M. (1998), "Resolving Rightful Ownerships with Invisible Watermarking Techniques: Limitations, Attacks, and Implications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 4, pp. 573-586.

Craver, S. A., Wu, M., Liu, B., Stubblefield, A., Swartzlander, B., Wallach, D. S., Dean, D., and Felten, E. W. (2001), "Reading between the lines: Lessons from the SDMI challenge," USENIX Security Symposium.

Craver, S., Liu, B., and Wolf, W. (2002), "Detectors for echo hiding systems," Information Hiding, Lecture Notes in Computer Science, vol. 2578, pp. 247-257.

Cvejic, N., Keskinarkaus, A., and Seppanen, T. (2001), "Audio watermarking using m-sequences and temporal masking," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, pp. 227-230.

Cvejic, N., and Seppanen, T. (2002), "Improving audio watermarking scheme using psychoacoustic watermark filtering," IEEE International Conference on Signal Processing and Information Technology, Cairo, Egypt, pp. 169-172.

Dutilleux, P., de Poli, C., and Zölzer, U. (2002), "Time-frequency Processing," in DAFX: Digital Audio Effects, edited by U. Zölzer, John Wiley and Sons, pp. 201-236.

Foote, J., and Adcock, J. (2002), "Time base modulation: A new approach to watermarking audio and images," e-print.

Goto, M., and Muraoka, Y. (1999), "Real-time beat tracking for drumless audio signals," Speech Communication, vol. 27, nos. 3-4, pp. 331-335.

Gruhl, D., Lu, A., and Bender, W. (1996), "Echo Hiding," Pre-Proceedings: Information Hiding, Cambridge, UK, pp. 295-316.

Haitsma, J., van der Veen, M., Kalker, T., and Bruekers, F. (2000), "Audio watermarking for monitoring and copy protection," ACM Multimedia Workshop, Marina del Rey, California, pp. 119-122.

Hartung, F., and Girod, B. (1998), "Watermarking of uncompressed and compressed video," Signal Processing, vol. 66, pp. 283-301.

Hsieh, C.-T., and Tsou, P.-Y. (2002), "Blind cepstrum domain audio watermarking based on time energy features," IEEE International Conference on Digital Signal Processing, vol. 2, pp. 705-708.

Huang, J., Wang, Y., and Shi, Y. Q. (2002), "A blind audio watermarking algorithm with self-synchronization," IEEE International Conference on Circuits and Systems, vol. 3, pp. 627-630.

Kim, H. (2000), "Stochastic model based audio watermark and whitening filter for improved detection," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 1971-1974.

Kim, H.J., Choi, Y.H., Seok, J., and Hong, J. (2003), "Audio watermarking techniques," Intelligent Watermarking Techniques: Theory and Applications, World Scientific Publishing (to appear).

Kim, H.J., and Choi, Y.H. (2003), "A novel echo hiding algorithm," IEEE Transactions on Circuits and Systems for Video Technology (to appear).

Kirovski, D., and Malvar, H. (2001), "Robust spread-spectrum audio watermarking," IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, pp. 1345-1348.

Kirovski, D., and Attias, H. (2002), "Audio watermark robustness to desynchronization via beat detection," Information Hiding, Lecture Notes in Computer Science, vol. 2578, pp. 160-175.

Ko, B.-S., Nishimura, R., and Suzuki, Y. (2002), "Time-spread echo method for digital audio watermarking using PN sequences," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 2001-2004.

Lee, S.K., and Ho, Y.S. (2000), "Digital audio watermarking in the cepstrum domain," IEEE Transactions on Consumer Electronics, vol. 46, no. 3, pp. 744-750.

Lie, W.-N., and Chang, L.-C. (2001), "Robust and high-quality time-domain audio watermarking subject to psychoacoustic masking," IEEE International Symposium on Circuits and Systems, vol. 2, pp. 45-48.

Liu, Z., Kobayashi, Y., Sawato, S., and Inoue, A. (2002), "A robust audio watermarking method using sine function patterns based on pseudo-random sequences," Proceedings of the Pacific Rim Workshop on Digital Steganography 2002, pp. 167-173.

Mansour, M. F., and Tewfik, A. H. (2001), "Time-scale invariant audio data embedding," International Conference on Multimedia and Expo.

Mansour, M. F., and Tewfik, A. H. (2001), "Audio watermarking by time-scale modification," International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1353-1356.

Nahrstedt, K., and Qiao, L. (1998), "Non-invertible watermarking methods for MPEG video and audio," ACM Multimedia and Security Workshop, Bristol, U.K., pp. 93-98.

Oh, H.O., Seok, J.W., Hong, J.W., and Youn, D.H. (2001), "New echo embedding technique for robust and imperceptible audio watermarking," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1341-1344.

Petitcolas, F.A.P., Anderson, R.J., and Kuhn, M.G. (1998), "Attacks on copyright marking systems," Information Hiding, Lecture Notes in Computer Science, vol. 1525, pp. 218-238.

Petrovic, R., Winograd, J.M., Jemili, K., and Metois, E. (1999), "Data hiding within audio signals," International Conference on Telecommunications in Modern Satellite, Cable, and Broadcasting Service, vol. 1, pp. 88-95.

Petrovic, R. (2001), "Audio signal watermarking based on replica modulation," International Conference on Telecommunications in Modern Satellite, Cable, and Broadcasting Service, vol. 1, pp. 227-234.

Rossing, T.D., Moore, F.R., and Wheeler, P.A. (2002), The Science of Sound, 3rd ed., Addison-Wesley, San Francisco.

Seok, J., Hong, J., and Kim, J. (2002), "A novel audio watermarking algorithm for copyright protection of digital audio," ETRI Journal, vol. 24, pp. 181-189.

Scheirer, E. (1998), "Tempo and beat analysis of acoustic musical signals," Journal of the Acoustical Society of America, vol. 103, pp. 588-601.

Shin, S., Kim, O., Kim, J., and Choi, J. (2002), "A robust audio watermarking algorithm using pitch scaling," IEEE International Conference on Digital Signal Processing, pp. 701-704.

Swanson, M., Zhu, B., Tewfik, A., and Boney, L. (1998), "Robust audio watermarking using perceptual masking," Signal Processing, vol. 66, pp. 337-355.

Tachibana, R., Shimizu, S., Kobayashi, S., and Nakamura, T. (2001), "An audio watermarking method robust against time- and frequency-fluctuation," Proceedings of the SPIE: Security and Watermarking of Multimedia Contents, vol. 4314, pp. 104-115.

Wu, C.-P., Su, P.-C., and Kuo, C.-C. J. (2000), "Robust and efficient digital audio watermarking using audio content analysis," Security and Watermarking of Multimedia Contents, SPIE, vol. 3971, pp. 382-392.

Wu, M., Craver, S. A., Felten, E. W., and Liu, B. (2001), "Analysis of attacks on SDMI audio watermarks," IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1369-1372.

Yeo, I.-K., and Kim, H.J. (2003), "Modified patchwork algorithm: A novel audio watermarking scheme," IEEE Transactions on Speech and Audio Processing, vol. 11 (to appear).

Yeo, I.-K., and Kim, H.J. (2003), "Generalized patchwork algorithm for image watermarking scheme," ACM Multimedia Systems (to appear).
