
Hadoop sensordata part2

Date post: 29-Nov-2014
Upload: joaquin-vanschoren
Description:
Terabyte-scale sensor network data analysis using MapReduce/Hadoop
Transcript
Page 1: Hadoop sensordata part2

Convolution

• Amount of overlap between 2 functions as they are translated

• In practice: discrete (sampled) signal s, short response function r (kernel)

[Excerpt from “nr3”, page 642, Chapter 13. Fourier and Spectral Applications]

[Figure panels: signal $s_j$, response function $r_k$, and convolution $(r*s)_j$; each horizontal axis runs from 0 to $N-1$]

Figure 13.1.2. Convolution of discretely sampled functions. Note how the response function for negative times is wrapped around and stored at the extreme right end of the array $r_k$.

Example: A response function with $r_0 = 1$ and all other $r_k$’s equal to zero is just the identity filter. Convolution of a signal with this response function gives identically the signal. Another example is the response function with $r_{14} = 1.5$ and all other $r_k$’s equal to zero. This produces convolved output that is the input signal multiplied by 1.5 and delayed by 14 sample intervals.

Evidently, we have just described in words the following definition of discrete convolution with a response function of finite duration $M$:

$$(r * s)_j \equiv \sum_{k=-M/2+1}^{M/2} s_{j-k}\, r_k \qquad (13.1.1)$$

If a discrete response function is nonzero only in some range $-M/2 < k \le M/2$, where $M$ is a sufficiently large even integer, then the response function is called a finite impulse response (FIR), and its duration is $M$. (Notice that we are defining $M$ as the number of nonzero values of $r_k$; these values span a time interval of $M - 1$ sampling times.) In most practical circumstances the case of finite $M$ is the case of interest, either because the response really has a finite duration, or because we choose to truncate it at some point and approximate it by a finite-duration response function.
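The definition (13.1.1) can be tried out directly. The following is a minimal sketch, not code from the slides: it assumes a causal kernel (indices k = 0..M-1 rather than the symmetric range used in the excerpt) and treats the signal as zero outside its range. The class and method names are made up for illustration, and the delay example uses $r_2 = 1.5$ instead of $r_{14}$ to keep the arrays short.

```java
final class DirectConvolution {
    /**
     * Direct discrete convolution: out[j] = sum over k of s[j - k] * r[k],
     * for k = 0..r.length-1, with s taken as zero for negative indices.
     */
    static double[] convolve(double[] s, double[] r) {
        double[] out = new double[s.length];
        for (int j = 0; j < s.length; j++) {
            double acc = 0.0;
            for (int k = 0; k < r.length; k++) {
                int i = j - k;
                if (i >= 0) {
                    acc += s[i] * r[k];
                }
            }
            out[j] = acc;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] s = {1, 2, 3, 4, 5};
        double[] identity = {1.0};          // r_0 = 1: identity filter
        double[] delayed = {0.0, 0.0, 1.5}; // r_2 = 1.5: scale by 1.5, delay by 2 samples
        System.out.println(java.util.Arrays.toString(convolve(s, identity)));
        System.out.println(java.util.Arrays.toString(convolve(s, delayed)));
    }
}
```

The identity kernel returns the signal unchanged, and the second kernel returns the signal multiplied by 1.5 and shifted right by two samples, mirroring the two examples in the excerpt.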

The discrete convolution theorem is this: If a signal $s_j$ is periodic with period $N$, so that it is completely determined by the $N$ values $s_0, \ldots, s_{N-1}$, then its discrete convolution with a response function of finite duration $N$ is a member of the discrete Fourier transform pair,

[Excerpt from “nr3”, page 602, Chapter 12. Fast Fourier Transform]

$$h(at) \iff \frac{1}{|a|}\, H\!\left(\frac{f}{a}\right) \qquad \text{time scaling} \quad (12.0.5)$$

$$\frac{1}{|b|}\, h\!\left(\frac{t}{b}\right) \iff H(bf) \qquad \text{frequency scaling} \quad (12.0.6)$$

$$h(t - t_0) \iff H(f)\, e^{2\pi i f t_0} \qquad \text{time shifting} \quad (12.0.7)$$

$$h(t)\, e^{-2\pi i f_0 t} \iff H(f - f_0) \qquad \text{frequency shifting} \quad (12.0.8)$$

With two functions $h(t)$ and $g(t)$, and their corresponding Fourier transforms $H(f)$ and $G(f)$, we can form two combinations of special interest. The convolution of the two functions, denoted $g * h$, is defined by

$$g * h \equiv \int_{-\infty}^{\infty} g(\tau)\, h(t - \tau)\, d\tau \qquad (12.0.9)$$

Note that $g * h$ is a function in the time domain and that $g * h = h * g$. It turns out that the function $g * h$ is one member of a simple transform pair,

$$g * h \iff G(f)\, H(f) \qquad \text{convolution theorem} \quad (12.0.10)$$

In other words, the Fourier transform of the convolution is just the product of the individual Fourier transforms.

The correlation of two functions, denoted $\mathrm{Corr}(g, h)$, is defined by

$$\mathrm{Corr}(g, h) \equiv \int_{-\infty}^{\infty} g(\tau + t)\, h(\tau)\, d\tau \qquad (12.0.11)$$

The correlation is a function of $t$, which is called the lag. It therefore lies in the time domain, and it turns out to be one member of the transform pair:

$$\mathrm{Corr}(g, h) \iff G(f)\, H^*(f) \qquad \text{correlation theorem} \quad (12.0.12)$$

[More generally, the second member of the pair is $G(f)H(-f)$, but we are restricting ourselves to the usual case in which $g$ and $h$ are real functions, so we take the liberty of setting $H(-f) = H^*(f)$.] This result shows that multiplying the Fourier transform of one function by the complex conjugate of the Fourier transform of the other gives the Fourier transform of their correlation. The correlation of a function with itself is called its autocorrelation. In this case (12.0.12) becomes the transform pair

$$\mathrm{Corr}(g, g) \iff |G(f)|^2 \qquad \text{Wiener-Khinchin theorem} \quad (12.0.13)$$

The total power in a signal is the same whether we compute it in the time domain or in the frequency domain. This result is known as Parseval’s theorem:

$$\text{total power} \equiv \int_{-\infty}^{\infty} |h(t)|^2\, dt = \int_{-\infty}^{\infty} |H(f)|^2\, df \qquad (12.0.14)$$

Frequently one wants to know “how much power” is contained in the frequency interval between $f$ and $f + df$. In such circumstances, one does not usually distinguish between positive and negative $f$, but rather regards $f$ as varying from 0 (“zero frequency” or D.C.) to $+\infty$. In such cases, one defines the one-sided power spectral density (PSD) of the function $h$ as

$$P_h(f) \equiv |H(f)|^2 + |H(-f)|^2 \qquad 0 \le f < \infty \qquad (12.0.15)$$

Page 2: Hadoop sensordata part2

Convolution

• Amount of overlap between 2 functions as they are translated

• In practice: discrete (sampled) signal s, short response function r (kernel)

[Figure panels: signal, kernel, convolution]

Page 3: Hadoop sensordata part2

Convolution

• Width of kernel defines smoothing strength

Page 4: Hadoop sensordata part2

Convolution

• Width of kernel defines smoothing strength

[Figure panels: signal, kernel 1, convolution 1, kernel 2, convolution 2]

Page 5: Hadoop sensordata part2

Convolution

• Width of kernel defines smoothing strength

• Quite fast (O(N*M) for a signal of N samples and a kernel of M samples), but not fast enough

[Figure panels: signal, kernel 1, convolution 1, kernel 2, convolution 2]
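To make the effect of kernel width concrete, here is a small sketch that assumes a flat (moving-average) kernel; the slides do not say which kernel shapes were used, and the class and method names are illustrative. Only positions where the kernel fully overlaps the signal are computed, so the output is shorter than the input.

```java
final class KernelWidthDemo {
    /** Moving average of width m over the fully overlapping positions only. */
    static double[] boxcarSmooth(double[] s, int m) {
        double[] out = new double[s.length - m + 1];
        for (int j = 0; j < out.length; j++) {
            double acc = 0.0;
            for (int k = 0; k < m; k++) {
                acc += s[j + k];   // m additions per output sample: O(N*M) in total
            }
            out[j] = acc / m;
        }
        return out;
    }

    /** Largest absolute value: a crude measure of how much fluctuation survives smoothing. */
    static double maxAbs(double[] x) {
        double best = 0.0;
        for (double v : x) {
            best = Math.max(best, Math.abs(v));
        }
        return best;
    }

    public static void main(String[] args) {
        double[] s = new double[20];
        for (int i = 0; i < s.length; i++) {
            s[i] = (i % 2 == 0) ? 1.0 : -1.0;   // sample-to-sample jitter
        }
        System.out.println("width 3: " + maxAbs(boxcarSmooth(s, 3)));
        System.out.println("width 5: " + maxAbs(boxcarSmooth(s, 5)));
    }
}
```

On this alternating test signal the wider kernel leaves a smaller residual amplitude, at the price of more work per output sample.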


Page 7: Hadoop sensordata part2

Reduce

Map


Page 9: Hadoop sensordata part2

Reduce

Map Map Map Map Map

Page 10: Hadoop sensordata part2

Reduce Reduce Reduce Reduce Reduce

Map Map Map Map Map


Page 12: Hadoop sensordata part2

Reduce Reduce Reduce Reduce Reduce

Map Map Map Map Map

Build windows (one per mapper)


Page 14: Hadoop sensordata part2

Reduce Reduce Reduce Reduce Reduce

Map Map Map Map Map

Build windows (one per mapper)

Shuffle

Page 15: Hadoop sensordata part2

Reduce Reduce Reduce Reduce Reduce

Map Map Map Map Map

Build windows (one per mapper)

Shuffle

Convolute Convolute Convolute Convolute Convolute


Page 18: Hadoop sensordata part2

Convolution in Hadoop

• Wrap-around problem

[Excerpt from “nr3”, page 644, Chapter 13. Fourier and Spectral Applications]

[Figure panels: response function with extents m+ and m−; sample of original function; convolution with a spoiled region at each end and an unspoiled region in between]

Figure 13.1.3. The wraparound problem in convolving finite segments of a function. Not only must the response function wrap be viewed as cyclic, but so must the sampled original function. Therefore, a portion at each end of the original function is erroneously wrapped around by convolution with the response function.

[Figure panels: response function with extents m+ and m−; original function followed by zero padding; convolution with regions labelled “spoiled but irrelevant”, “unspoiled”, and “not spoiled because zero”]

Figure 13.1.4. Zero-padding as solution to the wraparound problem. The original function is extended by zeros, serving a dual purpose: When the zeros wrap around, they do not disturb the true convolution; and while the original function wraps around onto the zero region, that region can be discarded.

Page 19: Hadoop sensordata part2

Convolution in Hadoop

• Wrap-around problem

  • Ignore spoiled regions

  • Mirror the sequence (works well in our case)

  • Zero-padding

[Figures 13.1.3 and 13.1.4 from “nr3”, page 644, shown again]
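The difference between mirroring and zero-padding is easiest to see on a constant signal. The sketch below is an illustration under stated assumptions: a centered moving-average kernel of odd width, a symmetric reflection that repeats the edge sample, and class and method names invented for this example. It is not the deck's implementation.

```java
final class EdgePadding {
    /** Extends s by pad samples on each side, either with zeros or with a mirrored copy of the ends. */
    static double[] pad(double[] s, int pad, boolean mirror) {
        int n = s.length;
        double[] out = new double[n + 2 * pad];   // Java zero-initialises arrays, so zero padding is the default
        System.arraycopy(s, 0, out, pad, n);
        if (mirror) {
            for (int i = 0; i < pad; i++) {
                out[pad - 1 - i] = s[i];           // reflect the start of the sequence
                out[pad + n + i] = s[n - 1 - i];   // reflect the end of the sequence
            }
        }
        return out;
    }

    /** Centered moving average of odd width m; the output has the same length as s. */
    static double[] smooth(double[] s, int m, boolean mirror) {
        int half = m / 2;
        double[] p = pad(s, half, mirror);
        double[] out = new double[s.length];
        for (int j = 0; j < s.length; j++) {
            double acc = 0.0;
            for (int k = 0; k < m; k++) {
                acc += p[j + k];
            }
            out[j] = acc / m;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] s = {5, 5, 5, 5, 5, 5};
        System.out.println(java.util.Arrays.toString(smooth(s, 3, false)));
        System.out.println(java.util.Arrays.toString(smooth(s, 3, true)));
    }
}
```

With zero padding the first and last outputs are pulled toward zero, while mirroring keeps the constant level right up to the edges, which may be why mirroring suits slowly varying sensor signals.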

Page 20: Hadoop sensordata part2

Convolution in Hadoop

• Data split problem: windowing

• ‘Overlap-convolute’

Page 21: Hadoop sensordata part2

Convolution in Hadoop

• Data split problem: windowing

• ‘Overlap-convolute’

Map(window)

[Diagram: window boundaries at timestamp1, timestamp2, timestamp3]

Page 22: Hadoop sensordata part2

Convolution in Hadoop

• Data split problem: windowing

• ‘Overlap-convolute’

Map(window): Mapper1 emits to windows 1, 2; Mapper2 emits to windows 1, 2, 3; Mapper3 emits to windows 2, 3

[Diagram: window boundaries at timestamp1, timestamp2, timestamp3]

Page 23: Hadoop sensordata part2

Convolution in Hadoop

• Data split problem: windowing

• ‘Overlap-convolute’

Map(window): Mapper1 emits to windows 1, 2; Mapper2 emits to windows 1, 2, 3; Mapper3 emits to windows 2, 3

Reduce(convolute)

[Diagram: window boundaries at timestamp1, timestamp2, timestamp3]

Page 24: Hadoop sensordata part2

Convolution in Hadoop

• Data split problem: windowing

• ‘Overlap-convolute’

Map(window): Mapper1 emits to windows 1, 2; Mapper2 emits to windows 1, 2, 3; Mapper3 emits to windows 2, 3

Reduce(convolute): emit only unpolluted data

[Diagram: window boundaries at timestamp1, timestamp2, timestamp3]
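The overlap-convolute idea can be checked without a cluster. The sketch below is plain Java with no Hadoop dependency: a loop stands in for the map and reduce phases, the kernel is assumed to have odd length and to be centered, and all names are hypothetical. In a real job the mapper would presumably emit each sample under its own window key and under the neighbouring window keys, which is what the extended window here imitates.

```java
final class OverlapConvolute {
    /** Centered convolution: out[j] = sum over k = -h..h of s[j - k] * r[k + h], with s zero outside its range. */
    static double[] convolveSame(double[] s, double[] r) {
        int h = r.length / 2;
        double[] out = new double[s.length];
        for (int j = 0; j < s.length; j++) {
            double acc = 0.0;
            for (int k = -h; k <= h; k++) {
                int i = j - k;
                if (i >= 0 && i < s.length) {
                    acc += s[i] * r[k + h];
                }
            }
            out[j] = acc;
        }
        return out;
    }

    /** Splits s into windows, convolves each window separately, and keeps only the unpolluted core. */
    static double[] windowedConvolve(double[] s, double[] r, int windowLen) {
        int h = r.length / 2;
        double[] out = new double[s.length];
        for (int start = 0; start < s.length; start += windowLen) {
            int len = Math.min(windowLen, s.length - start);
            // "Map": build the window plus h overlapping samples from each neighbour.
            double[] ext = new double[len + 2 * h];
            for (int p = 0; p < ext.length; p++) {
                int i = start - h + p;
                if (i >= 0 && i < s.length) {
                    ext[p] = s[i];
                }
            }
            // "Reduce": convolve the window, then drop the h polluted samples at each end.
            double[] conv = convolveSame(ext, r);
            System.arraycopy(conv, h, out, start, len);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] s = {1, 4, 2, 8, 5, 7, 3, 6, 9, 0, 2};
        double[] r = {0.25, 0.5, 0.25};
        System.out.println(java.util.Arrays.toString(convolveSame(s, r)));
        System.out.println(java.util.Arrays.toString(windowedConvolve(s, r, 4)));
    }
}
```

Both calls should print the same values, because every emitted sample had all the neighbours it needed inside its extended window.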

Page 25: Hadoop sensordata part2

Convolution in Hadoop

• Data split problem: windowing

• ‘Convolute-add’

Page 26: Hadoop sensordata part2

Convolution in Hadoop

• Data split problem: windowing

• ‘Convolute-add’

Map (convolute with 0-padding)

[Diagram: each data chunk padded with 0s on both sides]

Page 27: Hadoop sensordata part2

Convolution in Hadoop

• Data split problem: windowing

• ‘Convolute-add’

Map (convolute with 0-padding)

Reduce (add): add values in overlapping regions

[Diagram: output regions A, A+B, B, B+C, C]
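Convolute-add can be sketched the same way. In this illustration (names and chunk sizes are assumptions, and a loop again stands in for the map and reduce phases), each chunk is convolved on its own as if surrounded by zeros, so its result is longer than the chunk; the tails that spill into the neighbouring chunk's range are then added together, which corresponds to the A+B and B+C regions on the slide.

```java
final class ConvoluteAdd {
    /** Full linear convolution: output length is s.length + r.length - 1 (implicit zero padding). */
    static double[] convolveFull(double[] s, double[] r) {
        double[] out = new double[s.length + r.length - 1];
        for (int i = 0; i < s.length; i++) {
            for (int k = 0; k < r.length; k++) {
                out[i + k] += s[i] * r[k];
            }
        }
        return out;
    }

    /** Convolves each chunk independently and sums the partial results where they overlap. */
    static double[] chunkedConvolve(double[] s, double[] r, int chunkLen) {
        double[] out = new double[s.length + r.length - 1];
        for (int start = 0; start < s.length; start += chunkLen) {
            int end = Math.min(start + chunkLen, s.length);
            // "Map": convolve one chunk with zero padding.
            double[] part = convolveFull(java.util.Arrays.copyOfRange(s, start, end), r);
            // "Reduce": add the partial result at its position in the output.
            for (int p = 0; p < part.length; p++) {
                out[start + p] += part[p];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] s = {1, 4, 2, 8, 5, 7, 3, 6, 9, 0};
        double[] r = {0.5, 0.3, 0.2};
        System.out.println(java.util.Arrays.toString(convolveFull(s, r)));
        System.out.println(java.util.Arrays.toString(chunkedConvolve(s, r, 4)));
    }
}
```

This works because convolution is linear: the signal is the sum of its zero-padded chunks, so the convolution of the signal is the sum of the chunk convolutions.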

Page 28: Hadoop sensordata part2

Hint: Keep mappers alive

• Hadoop kills a mapper that reports no progress for too long, which can happen when it spends a long time in a single loop (e.g. during long convolutions)

• Do this in large loops:

• if (loopcount % 1000 == 0) { context.progress(); }
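As a sketch of where such a call might sit inside a long convolution loop, the example below takes the progress callback as a plain `Runnable` so that it runs without Hadoop on the classpath. The class name, the `every` parameter, and the causal-kernel convolution are assumptions made for illustration; inside a real mapper the callback would be something like `() -> context.progress()`.

```java
final class HeartbeatLoop {
    /**
     * Causal convolution that calls heartbeat.run() once every `every` inner-loop iterations,
     * so that a long-running task keeps signalling that it is alive.
     */
    static double[] convolveWithHeartbeat(double[] s, double[] r, int every, Runnable heartbeat) {
        double[] out = new double[s.length];
        long loopcount = 0;
        for (int j = 0; j < s.length; j++) {
            for (int k = 0; k < r.length && k <= j; k++) {
                out[j] += s[j - k] * r[k];
                loopcount++;
                if (loopcount % every == 0) {
                    heartbeat.run();   // in a Hadoop mapper: context.progress()
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        double[] s = new double[5000];
        double[] r = {0.25, 0.5, 0.25};
        convolveWithHeartbeat(s, r, 1000, () -> calls[0]++);
        System.out.println("heartbeats sent: " + calls[0]);
    }
}
```

Reporting only every N iterations keeps the overhead of the progress call negligible compared with the arithmetic.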

Page 29: Hadoop sensordata part2

Even faster: Fourier Transform

• Converts signal from time domain to frequency domain

[Figure panels: stress sensor (time domain); Fourier transform (frequency domain, axis f)]

Page 30: Hadoop sensordata part2

Discrete Fourier Transform

• Converts signal from time domain to frequency domain

[Figure panels: vibration sensor (time domain); Fourier transform (frequency domain)]

Page 31: Hadoop sensordata part2

DFT for convolution

• Convolution theorem: Fourier transform of convolution is product of individual Fourier transforms

• Discrete convolution theorem:

$$\sum_{k=-N/2+1}^{N/2} s_{j-k}\, r_k \iff S_n R_n \qquad (13.1.2)$$

• Conditions:

  • Signal periodic: 0-padding (see above)

  • Signals of same length: Pad response function with 0s

[Excerpt from “nr3”, page 643, Section 13.1 Convolution and Deconvolution Using the FFT]

Here $S_n$ $(n = 0, \ldots, N-1)$ is the discrete Fourier transform of the values $s_j$ $(j = 0, \ldots, N-1)$, while $R_n$ $(n = 0, \ldots, N-1)$ is the discrete Fourier transform of the values $r_k$ $(k = 0, \ldots, N-1)$. These values of $r_k$ are the same as for the range $k = -N/2+1, \ldots, N/2$, but in wraparound order, exactly as was described at the end of §12.2.

13.1.1 Treatment of End Effects by Zero Padding

The discrete convolution theorem presumes a set of two circumstances that are not universal. First, it assumes that the input signal is periodic, whereas real data often either go forever without repetition or else consist of one nonperiodic stretch of finite length. Second, the convolution theorem takes the duration of the response to be the same as the period of the data; they are both $N$. We need to work around these two constraints.

The second is very straightforward. Almost always, one is interested in a response function whose duration $M$ is much shorter than the length of the data set $N$. In this case, you simply extend the response function to length $N$ by padding it with zeros, i.e., define $r_k = 0$ for $M/2 \le k \le N/2$ and also for $-N/2+1 \le k \le -M/2+1$. Dealing with the first constraint is more challenging. Since the convolution theorem rashly assumes that the data are periodic, it will falsely “pollute” the first output channel $(r * s)_0$ with some wrapped-around data from the far end of the data stream $s_{N-1}, s_{N-2}$, etc. (See Figure 13.1.3.) So, we need to set up a buffer zone of zero-padded values at the end of the $s_j$ vector, in order to make this pollution zero. How many zero values do we need in this buffer? Exactly as many as the most negative index for which the response function is nonzero. For example, if $r_{-3}$ is nonzero while $r_{-4}, r_{-5}, \ldots$ are all zero, then we need three zero pads at the end of the data: $s_{N-3} = s_{N-2} = s_{N-1} = 0$. These zeros will protect the first output channel $(r * s)_0$ from wraparound pollution. It should be obvious that the second output channel $(r * s)_1$ and subsequent ones will also be protected by these same zeros. Let $K$ denote the number of padding zeros, so that the last actual input data point is $s_{N-K-1}$.

What now about pollution of the very last output channel? Since the data now end with $s_{N-K-1}$, the last output channel of interest is $(r * s)_{N-K-1}$. This channel can be polluted by wraparound from input channel $s_0$ unless the number $K$ is also large enough to take care of the most positive index $k$ for which the response function $r_k$ is nonzero. For example, if $r_0$ through $r_6$ are nonzero, while $r_7, r_8, \ldots$ are all zero, then we need at least $K = 6$ padding zeros at the end of the data: $s_{N-6} = \ldots = s_{N-1} = 0$.

To summarize: we need to pad the data with a number of zeros on one end equal to the maximum positive duration or maximum negative duration of the response function, whichever is larger. (For a symmetric response function of duration $M$, you will need only $M/2$ zero pads.) Combining this operation with the padding of the response $r_k$ described above, we effectively insulate the data from artifacts of undesired periodicity. Figure 13.1.4 illustrates matters.


Page 32: Hadoop sensordata part2

Discrete Fourier Transform

• Computed with the FFT, the DFT is O(N log N)

• In Hadoop:

  • Modification of Parallel-FFT

• Convolution:

  • MR-DFT

  • Take product of both FTs

  • inverse MR-DFT
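The three convolution steps (transform both inputs, multiply, transform back) can be sketched on a single machine as follows. This is not the MR-DFT or Parallel-FFT code the slides refer to: it is a plain recursive radix-2 FFT, with both inputs zero-padded to a common power-of-two length that is at least N + M - 1 so that no wrap-around can occur. Class and method names are illustrative.

```java
final class FftConvolution {
    /** Recursive radix-2 FFT on (re, im); length must be a power of two. sign = -1 forward, +1 inverse (unscaled). */
    static void fft(double[] re, double[] im, int sign) {
        int n = re.length;
        if (n == 1) {
            return;
        }
        int half = n / 2;
        double[] evRe = new double[half], evIm = new double[half];
        double[] odRe = new double[half], odIm = new double[half];
        for (int i = 0; i < half; i++) {
            evRe[i] = re[2 * i];
            evIm[i] = im[2 * i];
            odRe[i] = re[2 * i + 1];
            odIm[i] = im[2 * i + 1];
        }
        fft(evRe, evIm, sign);
        fft(odRe, odIm, sign);
        for (int k = 0; k < half; k++) {
            double ang = sign * 2.0 * Math.PI * k / n;
            double wr = Math.cos(ang), wi = Math.sin(ang);
            double tr = wr * odRe[k] - wi * odIm[k];   // twiddle factor times odd part
            double ti = wr * odIm[k] + wi * odRe[k];
            re[k] = evRe[k] + tr;
            im[k] = evIm[k] + ti;
            re[k + half] = evRe[k] - tr;
            im[k + half] = evIm[k] - ti;
        }
    }

    /** Linear convolution of s and r via the convolution theorem. */
    static double[] convolve(double[] s, double[] r) {
        int outLen = s.length + r.length - 1;
        int n = 1;
        while (n < outLen) {
            n <<= 1;                                   // zero-pad to a power of two >= N + M - 1
        }
        double[] sRe = java.util.Arrays.copyOf(s, n), sIm = new double[n];
        double[] rRe = java.util.Arrays.copyOf(r, n), rIm = new double[n];
        fft(sRe, sIm, -1);
        fft(rRe, rIm, -1);
        for (int i = 0; i < n; i++) {                  // pointwise complex product S_n * R_n
            double pr = sRe[i] * rRe[i] - sIm[i] * rIm[i];
            double pi = sRe[i] * rIm[i] + sIm[i] * rRe[i];
            sRe[i] = pr;
            sIm[i] = pi;
        }
        fft(sRe, sIm, +1);                             // inverse transform, still unscaled
        double[] out = new double[outLen];
        for (int i = 0; i < outLen; i++) {
            out[i] = sRe[i] / n;                       // apply the 1/n scaling of the inverse
        }
        return out;
    }

    public static void main(String[] args) {
        double[] s = {1, 2, 3, 4, 5};
        double[] r = {0.5, 0.25, 0.25};
        System.out.println(java.util.Arrays.toString(convolve(s, r)));
    }
}
```

Up to floating-point rounding, the result should agree with a direct O(N*M) convolution; the padding plays the role of the zero buffer described in the excerpt on end effects.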

Page 33: Hadoop sensordata part2

Segmentation

Windowing Windowing Windowing Windowing Windowing

Shuffle

Convolute G’, G’’, G’’’ (one per reducer)

Emit zero-crossings

Page 34: Hadoop sensordata part2

Segmentation

1st, 2nd, 3rd degree derivatives

[Figure panels: signal, convolution, segmentation]
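One way to read the segmentation slides is that the signal is convolved with derivatives of a Gaussian and that sign changes of the result mark segment boundaries. The sketch below covers only the first derivative (G’) and rests on several assumptions: an unnormalised kernel, evaluation only where the kernel fully overlaps the signal, and hypothetical class and method names. The slides do not specify these details.

```java
final class ZeroCrossingSegmentation {
    /** Samples of G'(x) = -x / sigma^2 * exp(-x^2 / (2 sigma^2)) for x = -h..h (unnormalised). */
    static double[] gaussianDerivative(double sigma, int h) {
        double[] g = new double[2 * h + 1];
        for (int x = -h; x <= h; x++) {
            g[x + h] = -x / (sigma * sigma) * Math.exp(-(x * x) / (2.0 * sigma * sigma));
        }
        return g;
    }

    /**
     * Convolves s with the odd-length kernel r at fully overlapping positions and returns
     * the indices where the result changes sign relative to the previous position.
     */
    static java.util.List<Integer> zeroCrossings(double[] s, double[] r) {
        int h = r.length / 2;
        java.util.List<Integer> crossings = new java.util.ArrayList<>();
        double prev = 0.0;
        for (int j = h; j < s.length - h; j++) {
            double d = 0.0;
            for (int k = -h; k <= h; k++) {
                d += s[j - k] * r[k + h];
            }
            if (j > h && ((prev > 0 && d <= 0) || (prev < 0 && d >= 0))) {
                crossings.add(j);   // smoothed slope changed sign between j - 1 and j
            }
            prev = d;
        }
        return crossings;
    }

    public static void main(String[] args) {
        double[] s = new double[41];
        for (int i = 0; i < s.length; i++) {
            s[i] = -Math.abs(i - 20.5);   // a single peak between samples 20 and 21
        }
        System.out.println(zeroCrossings(s, gaussianDerivative(2.0, 6)));
    }
}
```

Convolving with G’ gives the slope of the Gaussian-smoothed signal, so its zero crossings sit at the smoothed signal's local extrema; higher derivatives would pick out inflection points and similar features in the same way.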

Page 35: Hadoop sensordata part2

Segmentation

[Figure panels: signal, convolution, segmentation]

