How does the iPhone store and communicate data reliably?
Answer:
MODULE 4: Communication, Storage, and Physical Limits of the iPhone
Limits on information transfer
Latency
Channels of communication
Bandwidth
Noise and thresholding
Signal-to-noise ratio
Error detection and correction
Parity
Hamming distance
Hamming code
Channel capacity & Shannon-Hartley Law
[Image: Toshiba TSB3234X68354TWNA1 64 GB flash memory (iFixit)]
Limits on Information Transfer
In general, there are three classes of limits that govern the
transmission of information. We will look at examples of each.
Latency – delay in information transmission.
Bandwidth – spectral range of information at source and within
channel.
Signal/Noise Ratio – extent to which the information is corrupted by
outside sources.
For the future (and past) plumbers in the class, we can use the
analogy of a pipe: The latency is the length of the pipe. The
bandwidth is the diameter of the pipe. The SNR is the porosity of
the pipe.
Latency
Latency is the job security term for time lag
Message transmitted at t = 0
o When is it received? Delay = latency
o Remember: t = d/v
o In general, we're talking about EM waves with v = c (but not always)
EX: iPhone conversation between Washington DC and Tokyo.
Cable at ground level: d = 11,000 km
o Assume v = c = 3·10⁵ km/s; t ≈ 36 ms one way
Link through geosynchronous satellite: d = 76,000 km
o t ≈ 255 ms
Earth-Voyager 1: 19 h 36 min 38 s (one way)
The delay experienced in conversation is actually twice the one-way figure, because each person waits to hear the other before replying.
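As a quick numeric check, a minimal Python sketch of t = d/v (the 36 ms figure above is 36.7 ms before rounding):

    # Latency t = d / v for the two links above, with v = c
    C_KM_PER_S = 3e5  # speed of light, km/s

    for label, d_km in [("DC-Tokyo ground cable", 11_000),
                        ("Geosynchronous satellite link", 76_000)]:
        one_way_ms = d_km / C_KM_PER_S * 1000
        print(f"{label}: one-way {one_way_ms:.1f} ms, "
              f"round trip {2 * one_way_ms:.1f} ms")
    # satellite: ~253 ms one way, close to the 255 ms quoted above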
Let's listen to *thetruth.wav, *thetruth072.wav (72 ms delay) and *thetruth510.wav (510 ms delay).
Unfortunately, latency limits are often absolute. Not much can be
done to improve latency aside from moving the transmitter and
receiver closer together.
Latency over communication channels
In many electronic systems (including the iPhone), the transmitter
converts a voltage or current signal into an electromagnetic wave
that travels through wires, a fiber or space – these are the
channels
Channels
o Air / space
Free
Susceptible to noise
Limited capacity
Need for antennae – cellular structures
Inverse square law: signal power dies off as 1/r²
o Wires
Expensive for long distances
Can partially shield against interference
Good capacity for short distances, poor for long
Interface to transmitter and receiver is cheap and easy
o Glass
Medium itself is cheap, but significant costs for long
distances (right of way, trenching, etc.)
Speed of EM Waves
The lower limit on latency is generally set by "speed of light"
considerations.
We often naïvely use 3·10¹⁰ cm/s as the speed of propagation, but this depends on the material in the channel…
o c = 1/√(με)
In a vacuum,
o μ₀ = 1.26·10⁻⁸ Wb/(A·cm) (permeability)
o ε₀ = 8.85·10⁻¹⁴ F/cm (permittivity)
o Here c = 1/√(μ₀ε₀) = 3·10¹⁰ cm/s
Not in a vacuum, ε = κε₀
o where κ is the dielectric constant:
SiO₂: κ = 3.8
Air: κ = 1
Fiberglass: κ = 4.1 to 7.2
Water: κ = 80
Polyethylene: κ = 2.3
Then c = 1/√(μ₀κε₀) = c₀/√κ
Consider a cable constructed with polyethylene dielectric
o c = c₀/√κ = c₀/√2.3 = 0.66c₀
In an integrated circuit using SiO2, we have
o c = c₀/√κ = c₀/√3.8 = 0.51c₀
o For a 3 GHz CPU, each clock cycle is 0.33 ns
o In one clock cycle, the signal travels only about 5 cm
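A minimal sketch of the c₀/√κ calculation and the distance-per-clock-cycle estimate, using the values above:

    import math

    C0_CM_PER_S = 3e10  # speed of light in vacuum, cm/s

    def speed_in_medium(kappa):
        """c = c0 / sqrt(kappa) for a non-magnetic dielectric."""
        return C0_CM_PER_S / math.sqrt(kappa)

    c_sio2 = speed_in_medium(3.8)      # ~0.51 c0 in SiO2
    cycle_s = 1 / 3e9                  # one clock period of a 3 GHz CPU, ~0.33 ns
    print(f"{c_sio2 / C0_CM_PER_S:.2f} c0; "
          f"{c_sio2 * cycle_s:.1f} cm traveled per clock cycle")  # ~5.1 cm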
EX: Disk drive
[Image: disk drive removal in original iPod]
A decent disk drive might spin at 7200 rpm
If the head is on the correct track, average latency = time
necessary for disk to complete ½ revolution.
Period of revolution is the reciprocal of rpm or rps
In seconds, latency = (1/2)·(60/rpm) (since there are 60 s in a minute)
For 7200 rpm disk, this latency is about 4 ms
At 1 GHz clock speed, this is four million clock cycles!
o Wasted time waiting for the disk to spin!
There's also the seek time, needed for the head to move to the right track, on the order of a few milliseconds.
An SSD is hundreds of times faster: no moving parts
DRAM is even faster
Solution: put frequently used data in RAM and cache
This explains why "virtual memory" is slow.
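The rotational-latency arithmetic above, as a small sketch:

    rpm = 7200
    latency_s = 0.5 * 60 / rpm   # half a revolution, in seconds
    clock_hz = 1e9               # 1 GHz CPU
    print(f"latency = {latency_s * 1e3:.2f} ms "
          f"= {latency_s * clock_hz:,.0f} clock cycles")  # 4.17 ms, ~4 million cycles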
Bandwidth Limits
Bandwidth refers to the range of frequencies characterizing the
signal or channel.
The range of frequencies of the source is the message bandwidth,
W.
EX: Audio information can be perceived by the human ear from 20 Hz to 20,000 Hz. W ≈ 20 kHz.
EX: Color can be perceived by the human eye from λ = 430 nm to 700 nm (these are the wavelengths of ROYGBIV). Since f = c/λ, we have a frequency range of 430·10¹² Hz (at 700 nm) to 700·10¹² Hz (at 430 nm), i.e., 430 THz to 700 THz. W = 270 THz.
A communication channel can only pass a range of frequencies –
this is the channel bandwidth.
o A typical communication channel is a Linear Time-Invariant
system
o So each frequency has a gain, and typically the gain is non-
zero for only a limited range of frequencies
Bandwidth limits result in signal attenuation (job security for
decreased amplitude or power) and changes in waveform.
Consider the product of cosines passed through a low-pass filter…
[Figure: lower left is the inverse transform after application of the filter at lower right]
Let's increase the frequency of the "square wave"
[Figures: 300 Hz square wave; 500 Hz square wave]
[Figure: distortion and attenuation as frequency is increased; 1 kHz square wave]
Finite bandwidth limits the maximum switching speed in a digital system (e.g., from 0 to 1)
The undefined region in between is called the noise margin
[Figures: ideal logic transition vs. noisy logic transition, with logic-high, logic-low, and undefined regions]
Due to finite bandwidth, the transition time is not instantaneous
Transition time is the time needed to switch states (from 1 to 0 or
vice versa)
Decrease in amplitude (attenuation) will affect thresholding
Smoothed waveform from frequency cutoff will affect transition
time
Your iPhone’s ARM processor is limited by this transition time!
[Figure: actual logic transition, showing logic high, logic low, the undefined region, and the output logic transition time]
Noise
All signals are corrupted by noise.
Noise = non-message part of signal
Noise comes from many sources:
o Power lines (hum)
o Motors and electrical appliances (radiating EM waves)
o Fluorescent lights
o Radio, TV, radar transmitters
o Computers, monitors, printers
o Cell phones, cordless phones
y(j) = x(j) + n(j)
[Audio/figures: hum; hiss]
Noise is random, so we can't predict it (deterministically). But we
can make statistical observations.
For discrete random variables (the result of a random
experiment), we can talk about the probability of any given value:
o X = the number showing on a die
o P(X=3) = 1/6
But noise takes values from a continuous set, for example the set
of real numbers.
For continuous random variables, the probabilities are represented
by their distribution:
o The height of the graph at any point x is proportional to the probability that the value is around x
The probability of being between a and b is equal to the area under the curve between a and b:
P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
The most common noise distribution is normal (Gaussian). The spread of the distribution is determined by its variance σ²; for zero-mean noise,
f(x) = (1/(σ√(2π))) e^(−x²/(2σ²))
CDF: cumulative distribution function:
For a given distribution, CDF(x) is the probability that the value is smaller than or equal to x.
Error rate in digital systems such as the iPhone is related to the signal-to-noise ratio:
SNR = Signal power / Noise power = P_S / P_N
Power of signal with amplitude A: P_S = A²
Noise power: P_N = σ² (variance)
To improve fidelity of message, either
o Boost signal level
o Decrease noise level
To boost the signal, you need a more powerful transmitter or to be located closer to the antenna
To decrease noise, you need improved shielding, coding, or filtering
Powerful solution = Code information digitally
o Use thresholding to decrease effects of noise
Digital Example
Information is encoded as a sequence of ones and zeros (more on
this later)
We start with a continuous signal – a voltage, for example (such
as speech from a microphone or the voltage from a switch)
During the transmission process (or elsewhere) this signal gets
corrupted by random noise
[Figure: signal corrupted by random noise, SNR = 2 (2 to 1)]
Recall that data is a sequence of ones and zeros
(e.g., 1 0 1 0 1 0 1 0 1 0 1 0)
We need to sample the analog (voltage) signal at regular
intervals to obtain the 1-0 signal
After sampling, we might obtain 1.05, -0.06, 1.07, 0.09, 1.20,
0.01
Then, we threshold: if the number (the voltage) is above 0.5, we call it a '1'; if it is below 0.5, we call it a '0'
Then, we have 1, 0, 1, 0, 1, 0
[Figure: threshold operation mapping voltages to "0" and "1"; sampling + thresholding restores the original data]
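A minimal sketch of the sampling + thresholding step, using the sample values above:

    samples = [1.05, -0.06, 1.07, 0.09, 1.20, 0.01]  # noisy sampled voltages
    bits = [1 if v > 0.5 else 0 for v in samples]    # threshold at 0.5
    print(bits)  # [1, 0, 1, 0, 1, 0] -- the original data, restored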
The success of thresholding depends on the severity of the noise,
measured by the SNR.
[Figure: effect of increasing noise, σ = 0.02, 0.05, 0.1, 0.2, 0.5, 1.0]
What is RMS? For a constant signal with voltage amplitude v, the power is v². But what if the signal is not constant, such as a sinusoid? Then we need to find the average power:
P_S = average of v² = (1/T) ∫₀ᵀ v² dt
But we like saying that the power is equal to the square of the
voltage amplitude, so we define the RMS (root-mean-square)
voltage as:
v_RMS = √(average of v²) = √((1/T) ∫₀ᵀ v² dt)
EX: sinusoidal signal v(t) = V_M cos(ωt)
Remember: T = 2π/ω; cos(a)cos(b) = ½(cos(a + b) + cos(a − b))
So, v²_RMS = ½V_M²; v_RMS = V_M/√2 for a sinusoid.
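A quick numeric check of v_RMS = V_M/√2 (a sketch that samples one full period finely rather than doing the integral):

    import math

    def rms(samples):
        """Root-mean-square of a list of samples."""
        return math.sqrt(sum(v * v for v in samples) / len(samples))

    VM, N = 1.0, 10_000
    one_period = [VM * math.cos(2 * math.pi * k / N) for k in range(N)]
    print(rms(one_period), VM / math.sqrt(2))  # both ~0.7071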
The residential 120 V is RMS. Actual voltage = 170 cos(2π·60t)
o Has the same power as a constant (DC) 120 V
[Image credit: homepower.com]
Similarly, if we were to measure a voltage level of noise, we would not have a constant level. We need an average voltage. And since that voltage might go negative and positive (like a sinusoid), we need a measure of the average energy, not just the average signal height (which could be zero). The computation for noise is more complicated, but it turns out that the RMS value of noise is n_RMS = σ (the standard deviation).
Then we can write the SNR as SNR = v²_RMS / n²_RMS
Consider the following signal…
[Figure: signal + noise; v(t) = 1·cos(t), v_RMS = 1/√2 ≈ 0.707, n_RMS = 0.05]
SNR = 0.5/0.0025 = 200
SNR is often expressed in decibels (dB).
o SNR (dB) = 10 log₁₀(SNR), SNR = 10^(SNR (dB)/10)
For example:
SNR      SNR (dB)
0.01     −20 dB
1        0 dB
2        3 dB (classic in engineering)
10       10 dB
1000     30 dB
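The two conversion formulas above, as one-line helpers (a sketch):

    import math

    def snr_to_db(snr):
        return 10 * math.log10(snr)

    def db_to_snr(db):
        return 10 ** (db / 10)

    print(snr_to_db(2))    # ~3.01 dB, the "classic" 3 dB
    print(db_to_snr(30))   # 1000.0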
Does SNR matter to the iPhone? Does it affect the “bars” at the
top of the iPhone?
Probability of error goes down with increased SNR…
[Figure: probability of error vs. SNR (dB)]
For both analog and digital signals, fidelity improves with higher
SNR
Signal is unusable at low SNR
[Figure: signal fidelity vs. SNR (dB)]
Some signals are inherently low SNR. Example: ultrasound image
of mouse heart (SNR is about 2, SNR (dB) is about 3).
To improve SNR, you have two choices.
o Increase S power, which is usually difficult without moving
closer to transmitter. Remember the inverse square law we
mentioned in the first week of class.
The power density (W/cm²) of your flashlight beam will die off as 1/d².
Same with the power of your cell phone signal
(solution: place more antennae)
o Decrease N power
Let's discuss two noise reduction techniques (that do not involve
filtering).
Shielding prevents noise in the form of EM
waves from inducing currents.
Can prevent external induction of current by wrapping the wire in
a tube called a shield.
Another approach is called signal differencing.
Noise is acquired during transmission via long cable.
To implement the differencing method, we first generate differential signals: ½s(t) and −½s(t)
We receive: ½s(t) + n(t) and −½s(t) + n(t)
After inverting the second signal (and adding the two together), we have
½s(t) + n(t) − (−½s(t) + n(t)) = s(t)
[Diagram: transmitted signal s(t) picks up noise n(t); received signal is s(t) + n(t)]
[Diagram: generate differential signals s(t)/2 and −s(t)/2 from s(t)]
[Diagram: both lines pick up the same noise n(t); received signals are 0.5s(t) + n(t) and −0.5s(t) + n(t)]
What does the signal differencing method depend upon?
How to ensure similar noise signals?
o One method: use twisted pair cabling
[Diagram: the receiver inverts −0.5s(t) + n(t) and adds it to 0.5s(t) + n(t), recovering s(t)]
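A toy sketch of signal differencing; it assumes, as in the twisted-pair picture, that both lines pick up exactly the same noise:

    import random

    s = [0.0, 1.0, -0.5, 0.8]                  # transmitted signal samples
    n = [random.gauss(0, 0.3) for _ in s]      # common-mode noise, same on both lines

    line_a = [0.5 * si + ni for si, ni in zip(s, n)]   #  s/2 + n
    line_b = [-0.5 * si + ni for si, ni in zip(s, n)]  # -s/2 + n

    recovered = [a - b for a, b in zip(line_a, line_b)]  # noise cancels exactly
    print(recovered)  # equals s, up to floating-point rounding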
Question: how does the iPhone remove background noise?
How many microphones does it have?
[Figure: differential signaling in chips]
Spectrum of White Noise
The spectrum of the noise is the Fourier transform of the noise.
(more in ECE Fun III if you take it…)
For white noise, the average squared magnitude (over M noise
images) of the spectrum is constant over all frequencies (flat
spectrum, hence white):
On the average, white noise contains all frequencies in equal
amounts. Like white light.
Listen to *whitenoise.wav
[Figure: white noise waveform and its flat average spectrum]
White noise is only an approximate model. It is a way of saying
that the spectrum of the true signal IC(x) will have added to it a
broadband signal that has a lot of high frequencies in it:
[Figure: spectrum of the signal I_C(x) and spectrum of the noise N_C(x), which add]
• Here, the objective of signal enhancement is to remove as much
of the high-frequency noise spectrum as possible while
preserving as much of the image spectrum as possible
• This is generally accomplished by applying a low-pass filter of
fairly wide bandwidth (since images are themselves fairly
wideband)
Digital White Noise
• We make a similar model for digital additive white noise, where the added term is a digital white noise image.
• On the average (!) the elements of the noise image will be zero.
Noise Smoothing - Average Filter
• If we replace each pixel in the noisy image by the average of its
local neighbors within an M x M window:
[Figure: 3 x 3 averaging window applied to the noisy image f]
If we are biased toward a certain set of frequencies, we have "colored noise": the spectrum is not flat.
In this case, "pink noise" is made with a 1/f filter.
Listen to *pinknoise.wav
Compare to *Athabaska_River.wav
Pink noise is common in electronic circuits.
[Figure: pink noise waveform and its 1/f spectrum]
Error Detection and Correction
If the world is so noisy, why don't I have more errors on my iPhone?
Answer: the iPhone uses coding to detect or correct errors
Not limited to digital systems:
There is natural error-correction in language:
o I w-nt t- li-e f-re-er; so fa-, s- g-od!
o Th-s -s wh- pe-p-e can rea- my ha-dwr-ti-g!
There is natural error-correction in DNA:
o Codons GCT, GCC, GCA, GCG all produce Alanine
Credit cards: Let the card number be xₙxₙ₋₁…x₂x₁. Then x₁ is the check digit, determined such that
∑(i=1,3,…) x_i + ∑(i=2,4,…) L(x_i)
is a multiple of 10, where L(i) = sum of digits of 2i
Called Luhn's algorithm. Also used for IMEI. (See the sketch after this list.)
ISBNs also have check digits
What is common to all of these?
Redundancy!
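A minimal sketch of a Luhn check (the test number below is a standard example, not a real card):

    def luhn_valid(number: str) -> bool:
        """True if the digit string passes Luhn's check."""
        total = 0
        for i, ch in enumerate(reversed(number)):
            d = int(ch)
            if i % 2 == 1:                   # every second digit from the right
                d = sum(divmod(2 * d, 10))   # digit sum of 2d (2d is at most 18)
            total += d
        return total % 10 == 0

    print(luhn_valid("79927398713"))  # True  (standard Luhn test number)
    print(luhn_valid("79927398710"))  # False (wrong check digit)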
Types of error in information storage and transfer
We generally have to deal with two types of errors:
Substitution errors: bits may be flipped:
o 1000101 → 1010001
o this → thus
o This type of error occurs both in communication and
storage
Erasure errors: a bit (or hard drive) may be “erased”
o 1000101 → 10?0?01
o this → th?s
o This type of error occurs in storage: damaged sector or
hard drive
A substitution channel may substitute symbols with a given probability, and an erasure channel may erase symbols with a given probability
If the type of error is not given, we generally mean substitution
error.
A simple method used in computers for sixty years: Parity Bits
Suppose we send: 0101
With noise, we receive 0111
We have no indication that an error has occurred
Even parity: add a one or zero so that the total number of ones is
even.
For 0101 we append a 0 since it has two ones. We then transmit
01010, but we receive 01110
The checksum on 01110 is the sum modulo 2, which is 1
If we get a one as the result, we know we have an error (and we
can ask to retransmit).
Obviously, this only works if the error is a single bit (or an odd
number of bits). This method assumes that a single bit error is
most likely.
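A minimal sketch of even parity and the checksum test described above:

    def add_even_parity(bits):
        """Append a bit so the total number of ones is even."""
        return bits + [sum(bits) % 2]

    def checksum(word):
        """Sum modulo 2; a result of 1 means an error was detected."""
        return sum(word) % 2

    sent = add_even_parity([0, 1, 0, 1])   # -> [0, 1, 0, 1, 0]
    received = [0, 1, 1, 1, 0]             # one bit flipped by noise
    print(checksum(sent), checksum(received))  # 0 1 -> error detected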
Principle: with parity, we are adding redundant information.
o Redundancy allows us to reconstruct the message from an
imperfect copy
o How do we add redundancy?
o How much is enough?
o How can we measure the improvement?
With a binary code, we need to know which bit is incorrect – then we can immediately correct it
Reliability in data centers
Google, Facebook, Amazon, etc., manage data centers with
millions of hard drives
Hard drives may fail and lose data. The users’ documents
and photos could be lost forever!
From a 2013 study (Rashmi et al) of a Facebook
warehouse:
o Hundreds of petabytes (1 PB = 1000 TB) of data
o Growing at a few petabytes per week
o Thousands of machines each storing 24-36 TB of data
o Median of 50 nodes unavailable at a given time
How to protect data:
o Buy reliable hard drives: expensive and still unreliable
o Buy reliable but super expensive hard drives and back
up data: even more expensive
o Use error-correcting codes
EX 1: Replication with two extra copies (triple redundancy)
EX 2: Single Parity: group hard drives into sets of 3 drives.
Add a fourth drive that stores the parity bits:
Drive 1 = 1011110100…
Drive 2 = 1111001001…
Drive 3 = 0110010001…
Drive 4 = 0010101100…
o If any drive is lost, we can recover its data.
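A sketch of the single-parity recovery, using the drive contents above as 10-bit integers:

    d1 = 0b1011110100
    d2 = 0b1111001001
    d3 = 0b0110010001
    d4 = d1 ^ d2 ^ d3          # parity drive: 0b0010101100, as above

    rebuilt_d2 = d1 ^ d3 ^ d4  # pretend drive 2 failed
    print(rebuilt_d2 == d2)    # True: the lost data is recovered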
EX 3: Reed-Solomon code: 𝑘 data drives and 𝑟 parity
drives. Data can be recovered from any 𝑘 drives
o Better protection
o More flexible design parameters.
In the Facebook warehouse, frequently accessed data is
protected by replication and less frequently accessed data
uses Reed-Solomon with 𝑘 = 10, 𝑟 = 4.
o 40% more storage used
o Out of a group of 14 drives, if any 4 fail, we won’t lose
any data.
Double Redundancy Code
Suppose we are sending a single message bit m.
o Let e = error probability for any bit
o Let c be a check bit
o If c = m, then it is a double
redundancy code.
Possible to receive 4 different
messages:
00, 01, 10, 11
p(0 errors) = (1 − e)(1 − e) [assuming bits are statistically independent]
o Good outcome
p(1 error) = (1 − e)e + e(1 − e)
o Error, but detected
p(2 errors) = e²
o Error, undetected
With double redundancy, the undetected error probability has dropped from e to e².
However, no error correction is possible with this code (why not?)
[Diagram: code space with valid words 00 and 11; received words 01 and 10 are illegal]
Analogy: If you want to know what time it is, you need three
clocks, not two (why?)
Triple Redundancy Code
Send three 0’s or three 1’s for each message bit.
Single bit errors can be corrected using the majority rule:
001, 010, 100 become 0
110, 101, 011 become 1
What is the undetected error rate?
p(0 errors) = (1 − e)³
p(1 error) = 3(1 − e)²e (these are correctable)
p(2 errors) = 3(1 − e)e² (these are not corrected – why?)
p(3 errors) = e³ (these are undetected)
[Diagram: cube of 3-bit words; 000, 001, 010, 100 decode to 0, and 111, 110, 101, 011 decode to 1]
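A minimal sketch of majority-rule decoding and the resulting error probabilities:

    def decode3(word):
        """Majority rule over a 3-bit repetition word."""
        return 1 if sum(word) >= 2 else 0

    print(decode3([0, 0, 1]), decode3([1, 0, 1]))  # 0 1 -- single errors corrected

    e = 0.01                  # bit error rate
    p2 = 3 * (1 - e) * e**2   # two errors: majority decodes wrongly
    p3 = e**3                 # three errors: wrong and undetected
    print(p2 + p3)            # = 3e^2 - 2e^3, far smaller than e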
Another way to decode:
o Declare error if illegal word is received
Observation: if we don’t care to implement correction, we can
detect 2-bit errors and then the undetected error probability is
solely e3
Remember, if we have a message of n bits, the probability of
having x errors is given by the binomial distribution.
P_n(x) = C(n, x) eˣ(1 − e)ⁿ⁻ˣ
[Diagram: same cube; 000 decodes to 0, 111 decodes to 1, and every other word declares an error]
where C(n, x) = n!/(x!(n − x)!) is the number of combinations of n things taken x at a time, and e is the fixed error rate for one bit
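As a sketch, the binomial formula with Python's math.comb:

    from math import comb

    def p_errors(n, x, e):
        """Probability of exactly x errors in an n-bit word (bit error rate e)."""
        return comb(n, x) * e**x * (1 - e)**(n - x)

    e = 0.01
    print(p_errors(3, 2, e))  # 3(1-e)e^2, the 2-error case above
    print(p_errors(3, 3, e))  # e^3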
Observation: adding check digits make the code words “more
different” from each other:
o 0 and 1 differ by one bit
o 00 and 11 differ by two bits
o 000 and 111 differ by three bits
So, the more different the codewords, the lower the probability of error.
How do we make them more different?
o Increase the Minimum Hamming distance
o Hamming distance is the number of places where two words
(group of bits of the same length) differ
o Another way to think about it: how many bits do we need to
flip in one word to make it like the other
[Diagrams: triple redundancy code, minimum Hamming distance = 3; parity check code, minimum Hamming distance = 2]
Suppose we design a coding scheme: if we want to detect up to K errors per word, we must have a minimum Hamming distance of d_min = K + 1 between the code words
Why? Suppose we have two code words with d = K. So, flipping
K bits will make one equal the other.
o If we have K errors, then we decode the incorrect string and
error goes undetected
o If d = K + 1, then K errors can never turn one code word into another; the received word still differs from every code word in at least one bit, so the error is detected
Similarly, if we want to correct J errors per word, we must have d_min = 2J + 1 between code words.
The double redundancy code could detect one error: 𝐾 + 1 = 2,
𝐾 = 1
The triple redundancy code could detect two errors: 𝐾 + 1 = 3,
𝐾 = 2. Or it could correct one error: 2𝐽 + 1 = 3, 𝐽 = 1.
To correct L erasures, need minimum Hamming distance L+1.
Science of Information Fundamental Tenet VIII:
A Hamming distance (between code words) of 𝐾 + 1 is
needed to detect K errors per word, and a Hamming distance
of 2𝐽 + 1 is needed to correct J errors per word.
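A minimal sketch of the Hamming distance, checked against the two codes above:

    def hamming_distance(a, b):
        """Number of positions where two equal-length words differ."""
        return sum(x != y for x, y in zip(a, b))

    print(hamming_distance("000", "111"))  # 3: triple redundancy (correct J = 1)
    print(hamming_distance("00", "11"))    # 2: parity check (detect K = 1)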
Of course, adding redundancy decreases the rate at which we can
transmit information.
Suppose the channel can support N bits/s
o Using just the message bits, we can send N bit / s with error
rate e
o Using the triple redundancy code, we can send only N/3 bits/s, but the undetected error rate drops to 3e² − 2e³
o So there is a tradeoff between transfer rate and error control
It is possible to get good error control with more favorable
transfer rates by using more sophisticated codes
EX: Block Code
Idea: Use multiple parity checks to locate the position of the error.
o The (7,4) Hamming Code is an example
o 7 total bits: 4 message bits, 3 check bits
o m1, m2, m3, m4 are the message bits
o c1, c2, c3 are the check bits
o 2⁷ = 128 possible bit strings
o 2⁴ = 16 are valid
Matrix multiplication review:
Let
A = [a11 a12 a13
     a21 a22 a23
     a31 a32 a33
     a41 a42 a43]
and
x = [x1
     x2
     x3]
We have
Ax = [x1·a11 + x2·a12 + x3·a13
      x1·a21 + x2·a22 + x3·a23
      x1·a31 + x2·a32 + x3·a33
      x1·a41 + x2·a42 + x3·a43]
A useful way to think about multiplication: Ax is a combination of the columns of A:
Ax = x1·[a11, a21, a31, a41]ᵀ + x2·[a12, a22, a32, a42]ᵀ + x3·[a13, a23, a33, a43]ᵀ
EX:
x = [0, 1, 0]ᵀ ⇒ Ax = [a12, a22, a32, a42]ᵀ
x = [1, 1, 0]ᵀ ⇒ Ax = [a11 + a12, a21 + a22, a31 + a32, a41 + a42]ᵀ
Question:
We know that
A = [1 3 4
     2 2 4
     5 3 8
     3 4 7]
that x is a 3 × 1 vector whose elements are 0/1, and that
Ax = [4, 4, 8, 7]ᵀ.
What is x?
What if we knew x contains only one 1?
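A brute-force sketch of the question above: try all eight 0/1 vectors (sums here are over the integers):

    from itertools import product

    A = [[1, 3, 4],
         [2, 2, 4],
         [5, 3, 8],
         [3, 4, 7]]
    target = [4, 4, 8, 7]

    for x in product([0, 1], repeat=3):
        Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        if Ax == target:
            print(x)  # (0, 0, 1) and (1, 1, 0) both work;
                      # only (0, 0, 1) has a single 1, which makes the answer unique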
Back to error-correcting codes:
Review: 8-bit parity check code:
o Encoding:
Message bits: 𝑚1, 𝑚2, … ,𝑚7
Parity bit 𝑐1 = 𝑚1 ⊕ 𝑚2 ⊕ …⊕ 𝑚7
Transmit [𝑚1, 𝑚2, … ,𝑚7, 𝑐1]
o Checking:
𝑚1 ⊕ 𝑚2 ⊕ …⊕ 𝑚7 ⊕ 𝑐1 = 0
(7,4) Hamming code:
o Encoding:
Message bits: 𝑚1, 𝑚2, 𝑚3, 𝑚4
Parity bit 1: 𝑐1 = 𝑚1 ⊕ 𝑚2 ⊕ 𝑚4
Parity bit 2: 𝑐2 = 𝑚1 ⊕ 𝑚3 ⊕ 𝑚4
Parity bit 3: 𝑐3 = 𝑚2 ⊕ 𝑚3 ⊕ 𝑚4
Transmit [𝑚1, 𝑚2, 𝑚3, 𝑚4, 𝑐1, 𝑐2, 𝑐3]
o Checking:
𝑚1 ⊕ 𝑚2 ⊕ 𝑚4 ⊕ 𝑐1 = 0
𝑚1 ⊕ 𝑚3 ⊕ 𝑚4 ⊕ 𝑐2 = 0
𝑚2 ⊕ 𝑚3 ⊕ 𝑚4 ⊕ 𝑐3 = 0
Example: message word = m1m2m3m4 = 1011
c1 = 1 ⊕ 0 ⊕ 1 = 0
c2 = 1 ⊕ 1 ⊕ 1 = 1
c3 = 0 ⊕ 1 ⊕ 1 = 0
x₁₀₁₁ = [1, 0, 1, 1, 0, 1, 0]ᵀ
At the receiver, we check if the parities are correct:
m1 ⊕ m2 ⊕ m4 ⊕ c1 = 1 ⊕ 0 ⊕ 1 ⊕ 0 = 0
m1 ⊕ m3 ⊕ m4 ⊕ c2 = 1 ⊕ 1 ⊕ 1 ⊕ 1 = 0
m2 ⊕ m3 ⊕ m4 ⊕ c3 = 0 ⊕ 1 ⊕ 1 ⊕ 0 = 0
Works fine if there are no errors, but what if there are?
Can we find which bit had an error, if there was one?
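A minimal sketch of the (7,4) encoder defined above:

    def hamming74_encode(m):
        """Encode message bits [m1, m2, m3, m4] into [m1..m4, c1, c2, c3]."""
        m1, m2, m3, m4 = m
        c1 = m1 ^ m2 ^ m4
        c2 = m1 ^ m3 ^ m4
        c3 = m2 ^ m3 ^ m4
        return [m1, m2, m3, m4, c1, c2, c3]

    print(hamming74_encode([1, 0, 1, 1]))  # [1, 0, 1, 1, 0, 1, 0], matching x_1011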
Claim: the Hamming (7,4) code can correct up to 1 error or
detect up to 2 errors.
o One way to prove this is to write out all 16 valid
codewords and check their pairwise distances.
o Better way: the parity check matrix! The basis for most error correction.
We can compute these in matrix form using the parity check
matrix, which also has the advantage of identifying which bit is
erroneous if there is one
Want to write these equations
𝑚1 ⊕ 𝑚2 ⊕ 𝑚4 ⊕ 𝑐1 = 0
𝑚1 ⊕ 𝑚3 ⊕ 𝑚4 ⊕ 𝑐2 = 0
𝑚2 ⊕ 𝑚3 ⊕ 𝑚4 ⊕ 𝑐3 = 0
as a matrix equation of the form
Hx = 0
where
x = [m1, m2, m3, m4, c1, c2, c3]ᵀ
We let
H = [1 1 0 1 1 0 0
     1 0 1 1 0 1 0
     0 1 1 1 0 0 1]
We then find
s = [s1, s2, s3]ᵀ = Hx
We want s1 = 0, meaning the first parity equation is satisfied:
s1 = 1·m1 ⊕ 1·m2 ⊕ 0·m3 ⊕ 1·m4 ⊕ 1·c1 ⊕ 0·c2 ⊕ 0·c3 = m1 ⊕ m2 ⊕ m4 ⊕ c1 = 0
The product 𝑠 = 𝐻𝑥 is called the syndrome. Because it tells us if
there’s anything wrong!
Let's test this:
s = Hx₁₀₁₁ = [1 1 0 1 1 0 0
              1 0 1 1 0 1 0
              0 1 1 1 0 0 1] · [1, 0, 1, 1, 0, 1, 0]ᵀ
= [1, 1, 0]ᵀ + [0, 1, 1]ᵀ + [1, 1, 1]ᵀ + [0, 1, 0]ᵀ = [0, 0, 0]ᵀ
All parities correct!
Transmission error:
x₁₀₁₁ = [1, 0, 1, 1, 0, 1, 0]ᵀ, received x = [1, 1, 1, 1, 0, 1, 0]ᵀ
Find the syndrome:
s = Hx = [1, 1, 0]ᵀ + [1, 0, 1]ᵀ + [0, 1, 1]ᵀ + [1, 1, 1]ᵀ + [0, 1, 0]ᵀ = [1, 0, 1]ᵀ
There's an error somewhere! But where?
Transmission error (example 2):
x₁₀₁₁ = [1, 0, 1, 1, 0, 1, 0]ᵀ, received x = [1, 0, 1, 0, 0, 1, 0]ᵀ
Find the syndrome:
s = Hx = [1, 1, 0]ᵀ + [0, 1, 1]ᵀ + [0, 1, 0]ᵀ = [1, 1, 1]ᵀ
There's an error somewhere! But where?
x₁₀₁₁ = [1, 0, 1, 1, 0, 1, 0]ᵀ, received x = [1, 1, 1, 1, 0, 1, 0]ᵀ
x = x₁₀₁₁ + e, with error vector e = [0, 1, 0, 0, 0, 0, 0]ᵀ
s = Hx = H(x₁₀₁₁ + e) = Hx₁₀₁₁ + He = He
He = [1, 0, 1]ᵀ (the second column of H)
If the error is in the i-th bit, then the syndrome will be the i-th column of H.
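A sketch of syndrome decoding with the H above: compute s = Hx over GF(2), and if s matches column i, flip bit i:

    H = [[1, 1, 0, 1, 1, 0, 0],
         [1, 0, 1, 1, 0, 1, 0],
         [0, 1, 1, 1, 0, 0, 1]]

    def syndrome(x):
        """s = Hx over GF(2); each entry is one parity check."""
        return [sum(h * xi for h, xi in zip(row, x)) % 2 for row in H]

    def correct(x):
        """Flip the bit whose column of H equals the syndrome."""
        s = syndrome(x)
        if s == [0, 0, 0]:
            return x                                  # no error detected
        cols = [[row[j] for row in H] for j in range(7)]
        i = cols.index(s)
        return x[:i] + [1 - x[i]] + x[i + 1:]

    received = [1, 1, 1, 1, 0, 1, 0]   # x_1011 with bit 2 flipped
    print(syndrome(received))          # [1, 0, 1] = column 2 of H
    print(correct(received))           # [1, 0, 1, 1, 0, 1, 0] restored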
The (7,4) Hamming code with the following parity check matrix
H = [1 1 0 1 1 0 0
     1 0 1 1 0 1 0
     0 1 1 1 0 0 1]
can correct any single error because all its columns are different.
The minimum Hamming distance of the (7,4) Hamming code is 3.
With four parity bits → ሺ15,11ሻ Hamming code
[1 1 1 0 0 0 1 1 1 0 1 1 0 0 0
 1 0 0 1 1 0 1 1 0 1 1 0 1 0 0
 0 1 0 1 0 1 1 0 1 1 1 0 0 1 0
 0 0 1 0 1 1 0 1 1 1 1 0 0 0 1]
The (2ʳ − 1, 2ʳ − 1 − r) Hamming code has r parity bits and 2ʳ − 1 − r message bits, with minimum distance 3.
A parity check code can correct t errors if the sums of every set of at most t columns are all different.
EX: parity check matrix for quintuple redundancy code (can
transmit 00000 and 11111):
H = [1 1 0 0 0
     1 0 1 0 0
     1 0 0 1 0
     1 0 0 0 1]
The sum of every 2 columns is different, so it can correct two
errors.
The sum of the first 2 columns and the last 3 columns is the
same, so it cannot correct 3 errors.
For the (7,4) Hamming Code:
o Signaling rate decreases to 4/7 of the uncoded rate
o Can correct 1 error
d_min = 2 × 1 + 1 = 3
o Can detect up to 2 errors
d_min = 2 + 1 = 3
o Works well when errors are far apart
o Not good for burst errors (e.g., scratch on CD)
Can use interleaving to improve performance
o Record first bits of four different words, then second bits,
then third bits, then fourth bits
o If a burst error occurs, we may be able to recover (since the burst doesn't change an entire word)
To generalize, if we have d data bits and p parity bits, the
following condition must hold for the Hamming Code:
d + p + 1 ≤ 2ᵖ
To create a Hamming code for d data bits and p parity bits, first you have to generate the column vector that contains the data bits and the parity bits. Then, you have to construct an H matrix that essentially checks the parity bits.
Channel Capacity
Suppose we have a channel that has probability of error 𝑒
Let X be the input to the channel, which is random from our point
of view
We see Y which is a noisy version of X
How much information per transmission can be carried by the
channel? This is the channel capacity 𝐶
o Total information output of the channel = H(Y)
o This includes both information about X and about the noise.
o How much information about X is in Y?
Let H_X(Y) be the amount of information left in Y if we know X. This is the amount of useless information in Y.
H_X(Y) = e log₂(1/e) + (1 − e) log₂(1/(1 − e)) = h(e)
o Useful information = H(Y) − H_X(Y)
[Diagram: binary symmetric channel; inputs 0 and 1 pass through correctly with probability 1 − e and are flipped with probability e]
o Best case scenario: H(Y) = 1 ⇒ C = 1 − h(e)
Shannon showed that we can transmit C bits of information per channel use with arbitrarily small error, but not more than C
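A sketch of C = 1 − h(e) for the binary symmetric channel:

    import math

    def h(e):
        """Binary entropy, in bits."""
        if e in (0, 1):
            return 0.0
        return e * math.log2(1 / e) + (1 - e) * math.log2(1 / (1 - e))

    def bsc_capacity(e):
        return 1 - h(e)

    print(bsc_capacity(0.0))   # 1.0: noiseless channel
    print(bsc_capacity(0.11))  # ~0.5 bit per transmission
    print(bsc_capacity(0.5))   # 0.0: output says nothing about the input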
More generally, suppose the output is equal to input plus noise
(input is not necessarily 0 or 1)
Assume the bandwidth is B and signal-to-noise ratio S/N
How much information can the channel carry?
o Observation: There is a tradeoff between bandwidth and
SNR.
o Can we get away with reduced power given increased
bandwidth?
Science of Information Fundamental Tenet IX
(The Shannon-Hartley Law):
Given a channel with bandwidth B and signal-to-noise ratio
S/N, the channel capacity (C) is
C = B log₂(1 + S/N)
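A closing sketch of the Shannon-Hartley law; the 3 kHz / 30 dB figures below are a hypothetical telephone-grade example, not from the slides:

    import math

    def shannon_capacity(bandwidth_hz, snr):
        """C = B log2(1 + S/N), in bits per second (S/N is a ratio, not dB)."""
        return bandwidth_hz * math.log2(1 + snr)

    snr = 10 ** (30 / 10)               # 30 dB -> S/N = 1000
    print(shannon_capacity(3000, snr))  # ~29,900 bits per second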