How does the iPhone store and communicate data reliably?
Answer:
MODULE 4: Communication, Storage, and Physical Limits of the iPhone
Limits on information transfer
Latency
Channels of communication
Bandwidth
Noise and thresholding
Signal-to-noise ratio
Error detection and correction
Parity
Hamming distance
Hamming code
Channel capacity & Shannon-Hartley Law
[Image: Toshiba TSB3234X68354TWNA1 64 GB flash memory (iFixit)]
Limits on Information Transfer
In general, there are three classes of limits that govern the
transmission of information. We will look at examples of each.
Latency – delay in information transmission.
Bandwidth – spectral range of information at source and within
channel.
Signal/Noise Ratio – extent to which the information is corrupted by
outside sources.
For the future (and past) plumbers in the class, we can use the
analogy of a pipe: The latency is the length of the pipe. The
bandwidth is the diameter of the pipe. The SNR is the porosity of
the pipe.
Latency
Latency is the job security term for time lag
Message transmitted at t = 0
o When is it received? Delay = latency
o Remember: t = d/v
o In general, we're talking about EM waves with v = c (but not always)
EX: iPhone conversation between Washington DC and Tokyo.
Cable at ground level: d = 11,000 km
o Assume v = c = 3·10⁵ km/s; t ≈ 36 ms one way
Link through geosynchronous satellite: d = 76,000 km
o t ≈ 255 ms
Earth-Voyager 1: 19 h 36 min 38 s (one way)
The delay experienced in conversation is actually twice the one-way figure, because each person waits to hear the other before replying.
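As a quick numeric check, a minimal Python sketch of t = d/v (the 36 ms figure above is 36.7 ms before rounding):

    # Latency t = d / v for the two links above, with v = c
    C_KM_PER_S = 3e5  # speed of light, km/s

    for label, d_km in [("DC-Tokyo ground cable", 11_000),
                        ("Geosynchronous satellite link", 76_000)]:
        one_way_ms = d_km / C_KM_PER_S * 1000
        print(f"{label}: one-way {one_way_ms:.1f} ms, "
              f"round trip {2 * one_way_ms:.1f} ms")
    # satellite: ~253 ms one way, close to the 255 ms quoted above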
Let's listen to *thetruth.wav, *thetruth072.wav (72 ms delay) and *thetruth510.wav (510 ms delay).
Unfortunately, latency limits are often absolute. Not much can be
done to improve latency aside from moving the transmitter and
receiver closer together.
Latency over communication channels
In many electronic systems (including the iPhone), the transmitter
converts a voltage or current signal into an electromagnetic wave
that travels through wires, a fiber or space – these are the
channels
Channels
o Air / space
Free
Susceptible to noise
Limited capacity
Need for antennae – cellular structures
Inverse square law: signal power dies off as 1/r²
o Wires
Expensive for long distances
Can partially shield against interference
Good capacity for short distances, poor for long
Interface to transmitter and receiver is cheap and easy
o Glass
Medium itself is cheap, but significant costs for long
distances (right of way, trenching, etc.)
Speed of EM Waves
The lower limit on latency is generally set by "speed of light"
considerations.
We often naïvely use 3·10¹⁰ cm/s as the speed of propagation, but this depends on the material in the channel…
o c = 1/√(με)
In a vacuum,
o μ₀ = 1.26·10⁻⁸ Wb/(A·cm) (permeability)
o ε₀ = 8.85·10⁻¹⁴ F/cm (permittivity)
o Here c = 1/√(μ₀ε₀) = 3·10¹⁰ cm/s
Not in a vacuum, ε = κε₀
o where κ is the dielectric constant:
SiO₂: κ = 3.8
Air: κ = 1
Fiberglass: κ = 4.1 to 7.2
Water: κ = 80
Polyethylene: κ = 2.3
Then c = 1/√(μ₀κε₀) = c₀/√κ
Consider a cable constructed with polyethylene dielectric
o c = c₀/√κ = c₀/√2.3 = 0.66c₀
In an integrated circuit using SiO2, we have
o c = c₀/√κ = c₀/√3.8 = 0.51c₀
o For a 3 GHz CPU, each clock cycle is 0.33 ns
o In one clock cycle, the signal travels only about 5 cm
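A minimal sketch of the c₀/√κ calculation and the distance-per-clock-cycle estimate, using the values above:

    import math

    C0_CM_PER_S = 3e10  # speed of light in vacuum, cm/s

    def speed_in_medium(kappa):
        """c = c0 / sqrt(kappa) for a non-magnetic dielectric."""
        return C0_CM_PER_S / math.sqrt(kappa)

    c_sio2 = speed_in_medium(3.8)      # ~0.51 c0 in SiO2
    cycle_s = 1 / 3e9                  # one clock period of a 3 GHz CPU, ~0.33 ns
    print(f"{c_sio2 / C0_CM_PER_S:.2f} c0; "
          f"{c_sio2 * cycle_s:.1f} cm traveled per clock cycle")  # ~5.1 cm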
EX: Disk drive
[Image: disk drive removal in original iPod]
A decent disk drive might spin at 7200 rpm
If the head is on the correct track, average latency = time
necessary for disk to complete ½ revolution.
Period of revolution is the reciprocal of rpm or rps
In seconds, latency = (1/2)·(60/rpm) (since there are 60 s in a minute)
For 7200 rpm disk, this latency is about 4 ms
At 1 GHz clock speed, this is four million clock cycles!
o Wasted time waiting for the disk to spin!
There's also the seek time, needed for the head to move to the right track, on the order of a few milliseconds.
An SSD is hundreds of times faster: no moving parts
DRAM is even faster
Solution: put frequently used data in RAM and cache
This explains why "virtual memory" is slow.
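The rotational-latency arithmetic above, as a small sketch:

    rpm = 7200
    latency_s = 0.5 * 60 / rpm   # half a revolution, in seconds
    clock_hz = 1e9               # 1 GHz CPU
    print(f"latency = {latency_s * 1e3:.2f} ms "
          f"= {latency_s * clock_hz:,.0f} clock cycles")  # 4.17 ms, ~4 million cycles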
Bandwidth Limits
Bandwidth refers to the range of frequencies characterizing the
signal or channel.
The range of frequencies of the source is the message bandwidth,
W.
EX: Audio information can be perceived by the human ear from 20 Hz to 20,000 Hz. W ≈ 20 kHz.
EX: Color can be perceived by the human eye from λ = 430 nm to 700 nm (these are the wavelengths of ROYGBIV). Since f = c/λ, we have a frequency range of 430·10¹² Hz (at 700 nm) to 700·10¹² Hz (at 430 nm), i.e., 430 THz to 700 THz. W = 270 THz.
A communication channel can only pass a range of frequencies –
this is the channel bandwidth.
o A typical communication channel is a Linear Time-Invariant
system
o So each frequency has a gain, and typically the gain is non-
zero for only a limited range of frequencies
Bandwidth limits result in signal attenuation (job security for
decreased amplitude or power) and changes in waveform.
Consider the product of cosines passed through a low-pass filter…
[Figure: lower left is the inverse transform after application of the filter at lower right]
Let's increase the frequency of the "square wave"
[Figures: 300 Hz square wave; 500 Hz square wave]
[Figure: distortion and attenuation as frequency is increased; 1 kHz square wave]
Finite bandwidth limits the maximum switching speed in a digital system (e.g., from 0 to 1)
The undefined region in between is called the noise margin
[Figures: ideal logic transition vs. noisy logic transition, with logic-high, logic-low, and undefined regions]
Due to finite bandwidth, the transition time is not instantaneous
Transition time is the time needed to switch states (from 1 to 0 or
vice versa)
Decrease in amplitude (attenuation) will affect thresholding
Smoothed waveform from frequency cutoff will affect transition
time
Your iPhone’s ARM processor is limited by this transition time!
[Figure: actual logic transition, showing logic high, logic low, the undefined region, and the output logic transition time]
Noise
All signals are corrupted by noise.
Noise = non-message part of signal
Noise comes from many sources:
o Power lines (hum)
o Motors and electrical appliances (radiating EM waves)
o Fluorescent lights
o Radio, TV, radar transmitters
o Computers, monitors, printers
o Cell phones, cordless phones
y(j) = x(j) + n(j)
[Audio/figures: hum; hiss]
Noise is random, so we can't predict it (deterministically). But we
can make statistical observations.
For discrete random variables (the result of a random
experiment), we can talk about the probability of any given value:
o X = the number showing on a die
o P(X=3) = 1/6
But noise takes values from a continuous set, for example the set
of real numbers.
For continuous random variables, the probabilities are represented
by their distribution:
o The height of the graph at any point x is proportional to the probability that the value is around x
The probability of being between a and b is equal to the area under the curve between a and b:
P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
The most common noise distribution is normal (Gaussian). The spread of the distribution is determined by its variance σ²; for zero-mean noise,
f(x) = (1/(σ√(2π))) e^(−x²/(2σ²))
CDF: cumulative distribution function:
For a given distribution, CDF(x) is the probability that the value is smaller than or equal to x.
Error rate in digital systems such as the iPhone is related to the signal-to-noise ratio:
SNR = Signal power / Noise power = P_S / P_N
Power of signal with amplitude A: P_S = A²
Noise power: P_N = σ² (variance)
To improve fidelity of message, either
o Boost signal level
o Decrease noise level
To boost the signal, you need a more powerful transmitter or to be located closer to the antenna
To decrease noise, you need improved shielding, coding, or filtering
Powerful solution = Code information digitally
o Use thresholding to decrease effects of noise
Digital Example
Information is encoded as a sequence of ones and zeros (more on
this later)
We start with a continuous signal – a voltage, for example (such
as speech from a microphone or the voltage from a switch)
During the transmission process (or elsewhere) this signal gets
corrupted by random noise
[Figure: signal corrupted by random noise, SNR = 2 (2 to 1)]
Recall that data is a sequence of ones and zeros
(e.g., 1 0 1 0 1 0 1 0 1 0 1 0)
We need to sample the analog (voltage) signal at regular
intervals to obtain the 1-0 signal
After sampling, we might obtain 1.05, -0.06, 1.07, 0.09, 1.20,
0.01
Then, we threshold: if the number (the voltage) is above 0.5, we call it a '1'; if it is below 0.5, we call it a '0'
Then, we have 1, 0, 1, 0, 1, 0
[Figure: threshold operation mapping voltages to "0" and "1"; sampling + thresholding restores the original data]
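A minimal sketch of the sampling + thresholding step, using the sample values above:

    samples = [1.05, -0.06, 1.07, 0.09, 1.20, 0.01]  # noisy sampled voltages
    bits = [1 if v > 0.5 else 0 for v in samples]    # threshold at 0.5
    print(bits)  # [1, 0, 1, 0, 1, 0] -- the original data, restored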
The success of thresholding depends on the severity of the noise,
measured by the SNR.
[Figure: effect of increasing noise, σ = 0.02, 0.05, 0.1, 0.2, 0.5, 1.0]
What is RMS? For a constant signal with voltage amplitude v, the power is v². But what if the signal is not constant, such as a sinusoid? Then we need to find the average power:
P_S = average of v² = (1/T) ∫₀ᵀ v² dt
But we like saying that the power is equal to the square of the
voltage amplitude, so we define the RMS (root-mean-square)
voltage as:
v_RMS = √(average of v²) = √((1/T) ∫₀ᵀ v² dt)
EX: sinusoidal signal v(t) = V_M cos(ωt)
Remember: T = 2π/ω; cos(a)cos(b) = ½(cos(a + b) + cos(a − b))
So, v²_RMS = ½V_M²; v_RMS = V_M/√2 for a sinusoid.
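A quick numeric check of v_RMS = V_M/√2 (a sketch that samples one full period finely rather than doing the integral):

    import math

    def rms(samples):
        """Root-mean-square of a list of samples."""
        return math.sqrt(sum(v * v for v in samples) / len(samples))

    VM, N = 1.0, 10_000
    one_period = [VM * math.cos(2 * math.pi * k / N) for k in range(N)]
    print(rms(one_period), VM / math.sqrt(2))  # both ~0.7071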
The residential 120 V is RMS. Actual voltage = 170 cos(2π·60t)
o Has the same power as a constant (DC) 120 V
[Image credit: homepower.com]
Similarly, if we were to measure a voltage level of noise, we would not have a constant level. We need an average voltage. And since that voltage might go negative and positive (like a sinusoid), we need a measure of the average energy, not just the average signal height (which could be zero). The computation for noise is more complicated, but it turns out that the RMS value of noise is n_RMS = σ (the standard deviation).
Then we can write the SNR as SNR = v²_RMS / n²_RMS
Consider the following signal…
[Figure: signal + noise; v(t) = 1·cos(t), v_RMS = 1/√2 ≈ 0.707, n_RMS = 0.05]
SNR = 0.5/0.0025 = 200
SNR is often expressed in decibels (dB).
o SNR (dB) = 10 log₁₀(SNR), SNR = 10^(SNR (dB)/10)
For example:
SNR      SNR (dB)
0.01     −20 dB
1        0 dB
2        3 dB (classic in engineering)
10       10 dB
1000     30 dB
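The two conversion formulas above, as one-line helpers (a sketch):

    import math

    def snr_to_db(snr):
        return 10 * math.log10(snr)

    def db_to_snr(db):
        return 10 ** (db / 10)

    print(snr_to_db(2))    # ~3.01 dB, the "classic" 3 dB
    print(db_to_snr(30))   # 1000.0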
Does SNR matter to the iPhone? Does it affect the “bars” at the
top of the iPhone?
Probability of error goes down with increased SNR…
[Figure: probability of error vs. SNR (dB)]
For both analog and digital signals, fidelity improves with higher
SNR
Signal is unusable at low SNR
[Figure: signal fidelity vs. SNR (dB)]
Some signals are inherently low SNR. Example: ultrasound image
of mouse heart (SNR is about 2, SNR (dB) is about 3).
To improve SNR, you have two choices.
o Increase S power, which is usually difficult without moving
closer to transmitter. Remember the inverse square law we
mentioned in the first week of class.
The power density (W/cm²) of your flashlight beam will die off as 1/d².
Same with the power of your cell phone signal
(solution: place more antennae)
o Decrease N power
Let's discuss two noise reduction techniques (that do not involve
filtering).
Shielding prevents noise in the form of EM
waves from inducing currents.
Can prevent external induction of current by wrapping the wire in
a tube called a shield.
Another approach is called signal differencing.
Noise is acquired during transmission via long cable.
To implement the differencing method, we first generate differential signals: ½s(t) and −½s(t)
We receive: ½s(t) + n(t) and −½s(t) + n(t)
After inverting the second signal (and adding the two together), we have
½s(t) + n(t) − (−½s(t) + n(t)) = s(t)
[Diagram: transmitted signal s(t) picks up noise n(t); received signal is s(t) + n(t)]
[Diagram: generate differential signals s(t)/2 and −s(t)/2 from s(t)]
[Diagram: both lines pick up the same noise n(t); received signals are 0.5s(t) + n(t) and −0.5s(t) + n(t)]
What does the signal differencing method depend upon?
How to ensure similar noise signals?
o One method: use twisted pair cabling
[Diagram: the receiver inverts −0.5s(t) + n(t) and adds it to 0.5s(t) + n(t), recovering s(t)]
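A toy sketch of signal differencing; it assumes, as in the twisted-pair picture, that both lines pick up exactly the same noise:

    import random

    s = [0.0, 1.0, -0.5, 0.8]                  # transmitted signal samples
    n = [random.gauss(0, 0.3) for _ in s]      # common-mode noise, same on both lines

    line_a = [0.5 * si + ni for si, ni in zip(s, n)]   #  s/2 + n
    line_b = [-0.5 * si + ni for si, ni in zip(s, n)]  # -s/2 + n

    recovered = [a - b for a, b in zip(line_a, line_b)]  # noise cancels exactly
    print(recovered)  # equals s, up to floating-point rounding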
Question: how does the iPhone remove background noise?
How many microphones does it have?
[Figure: differential signaling in chips]
Spectrum of White Noise
The spectrum of the noise is the Fourier transform of the noise.
(more in ECE Fun III if you take it…)
For white noise, the average squared magnitude (over M noise
images) of the spectrum is constant over all frequencies (flat
spectrum, hence white):
On the average, white noise contains all frequencies in equal
amounts. Like white light.
Listen to *whitenoise.wav
[Figure: white noise waveform and its flat average spectrum]
White noise is only an approximate model. It is a way of saying
that the spectrum of the true signal IC(x) will have added to it a
broadband signal that has a lot of high frequencies in it:
[Figure: spectrum of the signal I_C(x) and spectrum of the noise N_C(x), which add]
• Here, the objective of signal enhancement is to remove as much
of the high-frequency noise spectrum as possible while
preserving as much of the image spectrum as possible
• This is generally accomplished by applying a low-pass filter of
fairly wide bandwidth (since images are themselves fairly
wideband)
Digital White Noise
• We make a similar model for digital additive white noise, where the added term is a digital white noise image.
• On the average (!) the elements of the noise image will be zero.
Noise Smoothing - Average Filter
• If we replace each pixel in the noisy image by the average of its
local neighbors within an M x M window:
[Figure: 3 x 3 averaging window applied to the noisy image f]
If we are biased toward a certain set of frequencies, we have "colored noise": the spectrum is not flat.
In this case, "pink noise" is made with a 1/f filter.
Listen to *pinknoise.wav
Compare to *Athabaska_River.wav
Pink noise is common in electronic circuits.
[Figure: pink noise waveform and its 1/f spectrum]
Error Detection and Correction
If the world is so noisy, why don't I have more errors on my iPhone?
Answer: the iPhone uses coding to detect or correct errors
Not limited to digital systems:
There is natural error-correction in language:
o I w-nt t- li-e f-re-er; so fa-, s- g-od!
o Th-s -s wh- pe-p-e can rea- my ha-dwr-ti-g!
There is natural error-correction in DNA:
o Codons GCT, GCC, GCA, GCG all produce Alanine
Credit cards: Let the card number be xₙxₙ₋₁…x₂x₁. Then x₁ is the check digit, determined such that
∑(i=1,3,…) x_i + ∑(i=2,4,…) L(x_i)
is a multiple of 10, where L(i) = sum of digits of 2i
Called Luhn's algorithm. Also used for IMEI. (See the sketch after this list.)
ISBNs also have check digits
What is common to all of these?
Redundancy!
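A minimal sketch of a Luhn check (the test number below is a standard example, not a real card):

    def luhn_valid(number: str) -> bool:
        """True if the digit string passes Luhn's check."""
        total = 0
        for i, ch in enumerate(reversed(number)):
            d = int(ch)
            if i % 2 == 1:                   # every second digit from the right
                d = sum(divmod(2 * d, 10))   # digit sum of 2d (2d is at most 18)
            total += d
        return total % 10 == 0

    print(luhn_valid("79927398713"))  # True  (standard Luhn test number)
    print(luhn_valid("79927398710"))  # False (wrong check digit)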
Types of error in information storage and transfer
We generally have to deal with two types of errors:
Substitution errors: bits may be flipped:
o 1000101 → 1010001
o this → thus
o This type of error occurs both in communication and
storage
Erasure errors: a bit (or hard drive) may be “erased”
o 1000101 → 10?0?01
o this → th?s
o This type of error occurs in storage: damaged sector or
hard drive
A substitution channel may substitute symbols with a given probability, and an erasure channel may erase symbols with a given probability
If the type of error is not given, we generally mean substitution
error.
A simple method used in computers for sixty years: Parity Bits
Suppose we send: 0101
With noise, we receive 0111
We have no indication that an error has occurred
Even parity: add a one or zero so that the total number of ones is
even.
For 0101 we append a 0 since it has two ones. We then transmit
01010, but we receive 01110
The checksum on 01110 is the sum modulo 2, which is 1
If we get a one as the result, we know we have an error (and we
can ask to retransmit).
Obviously, this only works if the error is a single bit (or an odd
number of bits). This method assumes that a single bit error is
most likely.
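A minimal sketch of even parity and the checksum test described above:

    def add_even_parity(bits):
        """Append a bit so the total number of ones is even."""
        return bits + [sum(bits) % 2]

    def checksum(word):
        """Sum modulo 2; a result of 1 means an error was detected."""
        return sum(word) % 2

    sent = add_even_parity([0, 1, 0, 1])   # -> [0, 1, 0, 1, 0]
    received = [0, 1, 1, 1, 0]             # one bit flipped by noise
    print(checksum(sent), checksum(received))  # 0 1 -> error detected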
Principle: with parity, we are adding redundant information.
o Redundancy allows us to reconstruct the message from an
imperfect copy
o How do we add redundancy?
o How much is enough?
o How can we measure the improvement?
With a binary code, we need to know which bit is incorrect – then we can immediately correct it
Reliability in data centers
Google, Facebook, Amazon, etc., manage data centers with
millions of hard drives
Hard drives may fail and lose data. The users’ documents
and photos could be lost forever!
From a 2013 study (Rashmi et al) of a Facebook
warehouse:
o Hundreds of petabytes (1 PB = 1000 TB) of data
o Growing at a few petabytes per week
o Thousands of machines each storing 24-36 TB of data
o Median of 50 nodes unavailable at a given time
How to protect data:
o Buy reliable hard drives: expensive and still unreliable
o Buy reliable but super expensive hard drives and back
up data: even more expensive
o Use error-correcting codes
EX 1: Replication with two extra copies (triple redundancy)
EX 2: Single Parity: group hard drives into sets of 3 drives.
Add a fourth drive that stores the parity bits:
Drive 1 = 1011110100…
Drive 2 = 1111001001…
Drive 3 = 0110010001…
Drive 4 = 0010101100…
o If any drive is lost, we can recover its data.
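A sketch of the single-parity recovery, using the drive contents above as 10-bit integers:

    d1 = 0b1011110100
    d2 = 0b1111001001
    d3 = 0b0110010001
    d4 = d1 ^ d2 ^ d3          # parity drive: 0b0010101100, as above

    rebuilt_d2 = d1 ^ d3 ^ d4  # pretend drive 2 failed
    print(rebuilt_d2 == d2)    # True: the lost data is recovered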
EX 3: Reed-Solomon code: 𝑘 data drives and 𝑟 parity
drives. Data can be recovered from any 𝑘 drives
o Better protection
o More flexible design parameters.
In the Facebook warehouse, frequently accessed data is
protected by replication and less frequently accessed data
uses Reed-Solomon with 𝑘 = 10, 𝑟 = 4.
o 40% more storage used
o Out of a group of 14 drives, if any 4 fail, we won’t lose
any data.
Double Redundancy Code
Suppose we are sending a single message bit m.
o Let e = error probability for any bit
o Let c be a check bit
o If c = m, then it is a double
redundancy code.
Possible to receive 4 different
messages:
00, 01, 10, 11
p(0 errors) = (1 − e)(1 − e) [assuming bits are statistically independent]
o Good outcome
p(1 error) = (1 − e)e + e(1 − e)
o Error, but detected
p(2 errors) = e²
o Error, undetected
With double redundancy, the undetected error probability has dropped from e to e².
However, no error correction is possible with this code (why not?)
[Diagram: code space with valid words 00 and 11; received words 01 and 10 are illegal]
Analogy: If you want to know what time it is, you need three
clocks, not two (why?)
Triple Redundancy Code
Send three 0’s or three 1’s for each message bit.
Single bit errors can be corrected using the majority rule:
001, 010, 100 become 0
110, 101, 011 become 1
What is the undetected error rate?
p(0 errors) = (1 − e)³
p(1 error) = 3(1 − e)²e (these are correctable)
p(2 errors) = 3(1 − e)e² (these are not corrected – why?)
p(3 errors) = e³ (these are undetected)
[Diagram: cube of 3-bit words; 000, 001, 010, 100 decode to 0, and 111, 110, 101, 011 decode to 1]
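A minimal sketch of majority-rule decoding and the resulting error probabilities:

    def decode3(word):
        """Majority rule over a 3-bit repetition word."""
        return 1 if sum(word) >= 2 else 0

    print(decode3([0, 0, 1]), decode3([1, 0, 1]))  # 0 1 -- single errors corrected

    e = 0.01                  # bit error rate
    p2 = 3 * (1 - e) * e**2   # two errors: majority decodes wrongly
    p3 = e**3                 # three errors: wrong and undetected
    print(p2 + p3)            # = 3e^2 - 2e^3, far smaller than e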
Another way to decode:
o Declare error if illegal word is received
Observation: if we don’t care to implement correction, we can
detect 2-bit errors and then the undetected error probability is
solely e3
Remember, if we have a message of n bits, the probability of
having x errors is given by the binomial distribution.
P_n(x) = C(n, x) eˣ(1 − e)ⁿ⁻ˣ
[Diagram: same cube; 000 decodes to 0, 111 decodes to 1, and every other word declares an error]
where C(n, x) = n!/(x!(n − x)!) is the number of combinations of n things taken x at a time, and e is the fixed error rate for one bit
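As a sketch, the binomial formula with Python's math.comb:

    from math import comb

    def p_errors(n, x, e):
        """Probability of exactly x errors in an n-bit word (bit error rate e)."""
        return comb(n, x) * e**x * (1 - e)**(n - x)

    e = 0.01
    print(p_errors(3, 2, e))  # 3(1-e)e^2, the 2-error case above
    print(p_errors(3, 3, e))  # e^3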
Observation: adding check digits make the code words “more
different” from each other:
o 0 and 1 differ by one bit
o 00 and 11 differ by two bits
o 000 and 111 differ by three bits
So, the more different the codewords, the lower the probability of error.
How do we make them more different?
o Increase the Minimum Hamming distance
o Hamming distance is the number of places where two words
(group of bits of the same length) differ
o Another way to think about it: how many bits do we need to
flip in one word to make it like the other
[Diagrams: triple redundancy code, minimum Hamming distance = 3; parity check code, minimum Hamming distance = 2]
Suppose we design a coding scheme: if we want to detect up to K errors per word, we must have a minimum Hamming distance of d_min = K + 1 between the code words
Why? Suppose we have two code words with d = K. So, flipping
K bits will make one equal the other.
o If we have K errors, then we decode the incorrect string and
error goes undetected
o If d = K + 1, then K errors can never turn one code word into another; the received word still differs from every code word in at least one bit, so the error is detected
Similarly, if we want to correct J errors per word, we must have d_min = 2J + 1 between code words.
The double redundancy code could detect one error: 𝐾 + 1 = 2,
𝐾 = 1
The triple redundancy code could detect two errors: 𝐾 + 1 = 3,
𝐾 = 2. Or it could correct one error: 2𝐽 + 1 = 3, 𝐽 = 1.
To correct L erasures, need minimum Hamming distance L+1.
Science of Information Fundamental Tenet VIII:
A Hamming distance (between code words) of 𝐾 + 1 is
needed to detect K errors per word, and a Hamming distance
of 2𝐽 + 1 is needed to correct J errors per word.
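A minimal sketch of the Hamming distance, checked against the two codes above:

    def hamming_distance(a, b):
        """Number of positions where two equal-length words differ."""
        return sum(x != y for x, y in zip(a, b))

    print(hamming_distance("000", "111"))  # 3: triple redundancy (correct J = 1)
    print(hamming_distance("00", "11"))    # 2: parity check (detect K = 1)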
Of course, adding redundancy decreases the rate at which we can
transmit information.
Suppose the channel can support N bits/s
o Using just the message bits, we can send N bit / s with error
rate e
o Using the triple redundancy code, we can send only N/3 bits/s, but the undetected error rate drops to 3e² − 2e³
o So there is a tradeoff between transfer rate and error control
It is possible to get good error control with more favorable
transfer rates by using more sophisticated codes
EX: Block Code
Idea: Use multiple parity checks to locate the position of the error.
o The (7,4) Hamming Code is an example
o 7 total bits: 4 message bits, 3 check bits
o m1, m2, m3, m4 are the message bits
o c1, c2, c3 are the check bits
o 2⁷ = 128 possible bit strings
o 2⁴ = 16 are valid
Matrix multiplication review:
Let
A = [a11 a12 a13
     a21 a22 a23
     a31 a32 a33
     a41 a42 a43]
and
x = [x1
     x2
     x3]
We have
Ax = [x1·a11 + x2·a12 + x3·a13
      x1·a21 + x2·a22 + x3·a23
      x1·a31 + x2·a32 + x3·a33
      x1·a41 + x2·a42 + x3·a43]
A useful way to think about multiplication: Ax is a combination of the columns of A:
Ax = x1·[a11, a21, a31, a41]ᵀ + x2·[a12, a22, a32, a42]ᵀ + x3·[a13, a23, a33, a43]ᵀ
EX:
x = [0, 1, 0]ᵀ ⇒ Ax = [a12, a22, a32, a42]ᵀ
x = [1, 1, 0]ᵀ ⇒ Ax = [a11 + a12, a21 + a22, a31 + a32, a41 + a42]ᵀ
Question:
We know that
A = [1 3 4
     2 2 4
     5 3 8
     3 4 7]
that x is a 3 × 1 vector whose elements are 0/1, and that
Ax = [4, 4, 8, 7]ᵀ.
What is x?
What if we knew x contains only one 1?
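A brute-force sketch of the question above: try all eight 0/1 vectors (sums here are over the integers):

    from itertools import product

    A = [[1, 3, 4],
         [2, 2, 4],
         [5, 3, 8],
         [3, 4, 7]]
    target = [4, 4, 8, 7]

    for x in product([0, 1], repeat=3):
        Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        if Ax == target:
            print(x)  # (0, 0, 1) and (1, 1, 0) both work;
                      # only (0, 0, 1) has a single 1, which makes the answer unique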
Back to error-correcting codes:
Review: 8-bit parity check code:
o Encoding:
Message bits: 𝑚1, 𝑚2, … ,𝑚7
Parity bit 𝑐1 = 𝑚1 ⊕ 𝑚2 ⊕ …⊕ 𝑚7
Transmit [𝑚1, 𝑚2, … ,𝑚7, 𝑐1]
o Checking:
𝑚1 ⊕ 𝑚2 ⊕ …⊕ 𝑚7 ⊕ 𝑐1 = 0
(7,4) Hamming code:
o Encoding:
Message bits: 𝑚1, 𝑚2, 𝑚3, 𝑚4
Parity bit 1: 𝑐1 = 𝑚1 ⊕ 𝑚2 ⊕ 𝑚4
Parity bit 2: 𝑐2 = 𝑚1 ⊕ 𝑚3 ⊕ 𝑚4
Parity bit 3: 𝑐3 = 𝑚2 ⊕ 𝑚3 ⊕ 𝑚4
Transmit [𝑚1, 𝑚2, 𝑚3, 𝑚4, 𝑐1, 𝑐2, 𝑐3]
o Checking:
𝑚1 ⊕ 𝑚2 ⊕ 𝑚4 ⊕ 𝑐1 = 0
𝑚1 ⊕ 𝑚3 ⊕ 𝑚4 ⊕ 𝑐2 = 0
𝑚2 ⊕ 𝑚3 ⊕ 𝑚4 ⊕ 𝑐3 = 0
Example: message word = m1m2m3m4 = 1011
c1 = 1 ⊕ 0 ⊕ 1 = 0
c2 = 1 ⊕ 1 ⊕ 1 = 1
c3 = 0 ⊕ 1 ⊕ 1 = 0
x₁₀₁₁ = [1, 0, 1, 1, 0, 1, 0]ᵀ
At the receiver, we check if the parities are correct:
m1 ⊕ m2 ⊕ m4 ⊕ c1 = 1 ⊕ 0 ⊕ 1 ⊕ 0 = 0
m1 ⊕ m3 ⊕ m4 ⊕ c2 = 1 ⊕ 1 ⊕ 1 ⊕ 1 = 0
m2 ⊕ m3 ⊕ m4 ⊕ c3 = 0 ⊕ 1 ⊕ 1 ⊕ 0 = 0
Works fine if there are no errors, but what if there are?
Can we find which bit had an error, if there was one?
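A minimal sketch of the (7,4) encoder defined above:

    def hamming74_encode(m):
        """Encode message bits [m1, m2, m3, m4] into [m1..m4, c1, c2, c3]."""
        m1, m2, m3, m4 = m
        c1 = m1 ^ m2 ^ m4
        c2 = m1 ^ m3 ^ m4
        c3 = m2 ^ m3 ^ m4
        return [m1, m2, m3, m4, c1, c2, c3]

    print(hamming74_encode([1, 0, 1, 1]))  # [1, 0, 1, 1, 0, 1, 0], matching x_1011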
Claim: the Hamming (7,4) code can correct up to 1 error or
detect up to 2 errors.
o One way to prove this is to write out all 16 valid
codewords and check their pairwise distances.
o Better way: the parity check matrix! The basis for most error correction.
We can compute these in matrix form using the parity check
matrix, which also has the advantage of identifying which bit is
erroneous if there is one
Want to write these equations
𝑚1 ⊕ 𝑚2 ⊕ 𝑚4 ⊕ 𝑐1 = 0
𝑚1 ⊕ 𝑚3 ⊕ 𝑚4 ⊕ 𝑐2 = 0
𝑚2 ⊕ 𝑚3 ⊕ 𝑚4 ⊕ 𝑐3 = 0
as a matrix equation of the form
Hx = 0
where
x = [m1, m2, m3, m4, c1, c2, c3]ᵀ
We let
H = [1 1 0 1 1 0 0
     1 0 1 1 0 1 0
     0 1 1 1 0 0 1]
We then find
s = [s1, s2, s3]ᵀ = Hx
We want s1 = 0, meaning the first parity equation is satisfied:
s1 = 1·m1 ⊕ 1·m2 ⊕ 0·m3 ⊕ 1·m4 ⊕ 1·c1 ⊕ 0·c2 ⊕ 0·c3 = m1 ⊕ m2 ⊕ m4 ⊕ c1 = 0
The product 𝑠 = 𝐻𝑥 is called the syndrome. Because it tells us if
there’s anything wrong!
Let's test this:
s = Hx₁₀₁₁ = [1 1 0 1 1 0 0
              1 0 1 1 0 1 0
              0 1 1 1 0 0 1] · [1, 0, 1, 1, 0, 1, 0]ᵀ
= [1, 1, 0]ᵀ + [0, 1, 1]ᵀ + [1, 1, 1]ᵀ + [0, 1, 0]ᵀ = [0, 0, 0]ᵀ
All parities correct!
Transmission error:
x₁₀₁₁ = [1, 0, 1, 1, 0, 1, 0]ᵀ, received x = [1, 1, 1, 1, 0, 1, 0]ᵀ
Find the syndrome:
s = Hx = [1, 1, 0]ᵀ + [1, 0, 1]ᵀ + [0, 1, 1]ᵀ + [1, 1, 1]ᵀ + [0, 1, 0]ᵀ = [1, 0, 1]ᵀ
There's an error somewhere! But where?
Transmission error (example 2):
x₁₀₁₁ = [1, 0, 1, 1, 0, 1, 0]ᵀ, received x = [1, 0, 1, 0, 0, 1, 0]ᵀ
Find the syndrome:
s = Hx = [1, 1, 0]ᵀ + [0, 1, 1]ᵀ + [0, 1, 0]ᵀ = [1, 1, 1]ᵀ
There's an error somewhere! But where?
x₁₀₁₁ = [1, 0, 1, 1, 0, 1, 0]ᵀ, received x = [1, 1, 1, 1, 0, 1, 0]ᵀ
x = x₁₀₁₁ + e, with error vector e = [0, 1, 0, 0, 0, 0, 0]ᵀ
s = Hx = H(x₁₀₁₁ + e) = Hx₁₀₁₁ + He = He
He = [1, 0, 1]ᵀ (the second column of H)
If the error is in the i-th bit, then the syndrome will be the i-th column of H.
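A sketch of syndrome decoding with the H above: compute s = Hx over GF(2), and if s matches column i, flip bit i:

    H = [[1, 1, 0, 1, 1, 0, 0],
         [1, 0, 1, 1, 0, 1, 0],
         [0, 1, 1, 1, 0, 0, 1]]

    def syndrome(x):
        """s = Hx over GF(2); each entry is one parity check."""
        return [sum(h * xi for h, xi in zip(row, x)) % 2 for row in H]

    def correct(x):
        """Flip the bit whose column of H equals the syndrome."""
        s = syndrome(x)
        if s == [0, 0, 0]:
            return x                                  # no error detected
        cols = [[row[j] for row in H] for j in range(7)]
        i = cols.index(s)
        return x[:i] + [1 - x[i]] + x[i + 1:]

    received = [1, 1, 1, 1, 0, 1, 0]   # x_1011 with bit 2 flipped
    print(syndrome(received))          # [1, 0, 1] = column 2 of H
    print(correct(received))           # [1, 0, 1, 1, 0, 1, 0] restored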
The (7,4) Hamming code with the following parity check matrix
H = [1 1 0 1 1 0 0
     1 0 1 1 0 1 0
     0 1 1 1 0 0 1]
can correct any single error because all its columns are different.
The minimum Hamming distance of the (7,4) Hamming code is 3.
With four parity bits → ሺ15,11ሻ Hamming code
[1 1 1 0 0 0 1 1 1 0 1 1 0 0 0
 1 0 0 1 1 0 1 1 0 1 1 0 1 0 0
 0 1 0 1 0 1 1 0 1 1 1 0 0 1 0
 0 0 1 0 1 1 0 1 1 1 1 0 0 0 1]
The (2ʳ − 1, 2ʳ − 1 − r) Hamming code has r parity bits and 2ʳ − 1 − r message bits, with minimum distance 3.
A parity check code can correct t errors if the sums of every set of at most t columns are all different.
EX: parity check matrix for quintuple redundancy code (can
transmit 00000 and 11111):
H = [1 1 0 0 0
     1 0 1 0 0
     1 0 0 1 0
     1 0 0 0 1]
The sum of every 2 columns is different, so it can correct two
errors.
The sum of the first 2 columns and the last 3 columns is the
same, so it cannot correct 3 errors.
For the (7,4) Hamming Code:
o Signaling rate decreases to 4/7 of the uncoded rate
o Can correct 1 error
d_min = 2 × 1 + 1 = 3
o Can detect up to 2 errors
d_min = 2 + 1 = 3
o Works well when errors are far apart
o Not good for burst errors (e.g., scratch on CD)
Can use interleaving to improve performance
o Record first bits of four different words, then second bits,
then third bits, then fourth bits
o If a burst error occurs, we may be able to recover (since the burst doesn't change an entire word)
To generalize, if we have d data bits and p parity bits, the
following condition must hold for the Hamming Code:
d + p + 1 ≤ 2ᵖ
To create a Hamming code for d data bits and p parity bits, first you have to generate the column vector that contains the data bits and the parity bits. Then, you have to construct an H matrix that essentially checks the parity bits.
Channel Capacity
Suppose we have a channel that has probability of error 𝑒
Let X be the input to the channel, which is random from our point
of view
We see Y which is a noisy version of X
How much information per transmission can be carried by the
channel? This is the channel capacity 𝐶
o Total information output of the channel = H(Y)
o This includes both information about X and about the noise.
o How much information about X is in Y?
Let H_X(Y) be the amount of information left in Y if we know X. This is the amount of useless information in Y.
H_X(Y) = e log₂(1/e) + (1 − e) log₂(1/(1 − e)) = h(e)
o Useful information = H(Y) − H_X(Y)
[Diagram: binary symmetric channel; inputs 0 and 1 pass through correctly with probability 1 − e and are flipped with probability e]
o Best case scenario: H(Y) = 1 ⇒ C = 1 − h(e)
Shannon showed that we can transmit C bits of information per channel use with arbitrarily small error, but not more than C
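A sketch of C = 1 − h(e) for the binary symmetric channel:

    import math

    def h(e):
        """Binary entropy, in bits."""
        if e in (0, 1):
            return 0.0
        return e * math.log2(1 / e) + (1 - e) * math.log2(1 / (1 - e))

    def bsc_capacity(e):
        return 1 - h(e)

    print(bsc_capacity(0.0))   # 1.0: noiseless channel
    print(bsc_capacity(0.11))  # ~0.5 bit per transmission
    print(bsc_capacity(0.5))   # 0.0: output says nothing about the input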
More generally, suppose the output is equal to input plus noise
(input is not necessarily 0 or 1)
Assume the bandwidth is B and signal-to-noise ratio S/N
How much information can the channel carry?
o Observation: There is a tradeoff between bandwidth and
SNR.
o Can we get away with reduced power given increased
bandwidth?
Science of Information Fundamental Tenet IX
(The Shannon-Hartley Law):
Given a channel with bandwidth B and signal-to-noise ratio
S/N, the channel capacity (C) is
C = B log₂(1 + S/N)
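A closing sketch of the Shannon-Hartley law; the 3 kHz / 30 dB figures below are a hypothetical telephone-grade example, not from the slides:

    import math

    def shannon_capacity(bandwidth_hz, snr):
        """C = B log2(1 + S/N), in bits per second (S/N is a ratio, not dB)."""
        return bandwidth_hz * math.log2(1 + snr)

    snr = 10 ** (30 / 10)               # 30 dB -> S/N = 1000
    print(shannon_capacity(3000, snr))  # ~29,900 bits per second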