
Computer Mathematics

Week 6: Coding systems and error detection

Department of Mechanical and Electrical System Engineering

last week

[Figure: computer block diagram showing the Central Processing Unit (CU, PC, IR, ALU, registers, PSR, DR, AR), Random Access Memory, the address bus and data bus, and the Input/Output Controller with the Universal Serial Bus, PCI bus, and devices (mouse, keyboard, GPU, audio, HDD, SSD, network).]

coding theory

• source coding

information theory concept

• information content

binary codes

• numbers

• text

variable-length codes

• UTF-8

compression

• Huffman’s algorithm

this week



[Figure: computer block diagram showing the CPU (CU, PC, IR, ALU, registers, PSR, DR, AR), Random Access Memory, the address bus and data bus, and devices (mouse, keyboard, GPU, audio, HDD, SSD, network).]

coding theory

• channel coding

• Hamming distance

error detection

• motivation

• parity

• cyclic redundancy checks

error correction

• motivation

• block parity

• Hamming codes

channel coding — motivations

channel coding protects against data corruption

while being transmitted

• drop outs, dead zones

• electromagnetic interference

[Figure: a transmitter sends ... 1 0 ... and the receiver gets ... 0 1 ...]

while being stored

• device failure

• ionising radiation

[Figure: a grid of stored 0 and 1 bits.]

channel coding — approach

add information to a message, allowing corruption to be detected

recover uncorrupted data by

• retransmission
  – receiver asks for the same data again

• redundancy
  – repetition: multiple copies of the same data are transmitted/stored, or
  – encoding: additional bits allow small errors to be identified and corrected

the medium can affect the choice of mechanism, for example

• networks: retransmission is always possible

• DVD/Blu-ray: retransmission is never possible

error detection using repetition

send each bit once: 0 or 1 (for example, 0 is sent and 1 is received)

• no error detection or correction

send each bit twice: 00 or 11 (for example, 00 is sent and 01 or 10 is received)

• single bit error detection

• no error correction

send each bit three times: 000 or 111 (for example, 000 is sent and 001, 010 or 100 is received)

• double bit error detection

• single bit error correction
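a minimal Python sketch of the three-times repetition scheme, assuming bits are held in strings of '0' and '1'; the names encode_repeat and decode_repeat are made up for this illustration, and majority voting is one simple way to do the correction

```python
def encode_repeat(bits, n=3):
    """Repeat every bit n times: '10' becomes '111000' when n = 3."""
    return "".join(b * n for b in bits)


def decode_repeat(received, n=3):
    """Majority-vote each group of n received bits back to a single bit."""
    groups = [received[i:i + n] for i in range(0, len(received), n)]
    return "".join("1" if g.count("1") > n // 2 else "0" for g in groups)


sent = encode_repeat("10")       # '111000'
damaged = "110000"               # one bit flipped in the first group
print(sent, decode_repeat(damaged))
```

with one flipped bit per group the vote still recovers the original data; with two flipped bits in the same group the vote picks the wrong value, which is why only single bit errors can be corrected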



Hamming distance

• the number of bits of difference between two valid codes

valid codes      Hamming distance   error detection   error correction
0, 1             1                  0 bits            0 bits
00, 11           2                  1                 0
000, 111         3                  2                 1
0000, 1111       4                  3                 1
00000, 11111     5                  4                 2
(in general)     d                  d − 1             ⌊(d − 1)/2⌋
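the distances above can be checked with a short Python sketch; hamming_distance and code_distance are illustrative names, and codes are assumed to be equal-length strings of '0' and '1'

```python
from itertools import combinations


def hamming_distance(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def code_distance(codes):
    """Smallest Hamming distance between any two valid codes."""
    return min(hamming_distance(a, b) for a, b in combinations(codes, 2))


for codes in (["0", "1"], ["00", "11"], ["000", "111"], ["00000", "11111"]):
    d = code_distance(codes)
    print(codes, "d =", d, "detects", d - 1, "corrects", (d - 1) // 2)
```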

sending each bit twice (as 00 or 11)

• repeats the data verbatim (obviously), but also

• guarantees an even number of 1s are transmitted per bit of information

we can generalise this last idea...

parity

parity

mathematics: the property of an integer with respect to being odd or even

computing: the state of being odd or even used to detect errors in binary-coded data

the parity of a block of bits is

• even, if there is an even number of 1s

• odd, if there is an odd number of 1s

0000 even parity
0001 odd parity
0010 odd parity
0011 even parity
0100 odd parity

consider a single bit error in a block of bits

• either a 0 changes to a 1

• or a 1 changes to a 0

01010101 even parity
01011101 odd parity

in either case, the number of 1s in the block changes by 1

• the parity changes (from odd to even, or from even to odd)
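as a small Python sketch (the function name parity is chosen only for illustration), the parity of a block is just a count of its 1s:

```python
def parity(bits):
    """Return 'even' or 'odd' according to the number of 1s in a bit string."""
    return "even" if bits.count("1") % 2 == 0 else "odd"


for block in ("0000", "0001", "0011", "01010101", "01011101"):
    print(block, parity(block), "parity")
```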


parity bits

idea: add a redundant bit to each block of data that forces it to have even parity; e.g.,

• if the block of data has even parity, add a 0 to the end

• if the block of data has odd parity, add a 1 to the end

(or you could choose to force odd parity, it doesn’t matter)

0000 0 even parity
0001 1 even parity
0010 1 even parity
0011 0 even parity
0100 1 even parity
     ↑ parity bit

recall: changing a single bit changes the parity

• the Hamming distance between valid codes is 2

⇒ parity can detect 1-bit errors (d − 1 = 1), but cannot correct errors (⌊(d − 1)/2⌋ = 0)
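a minimal Python sketch of even parity bits, with a simulated error; add_parity_bit, parity_ok and flip are hypothetical helper names, and blocks are assumed to be strings of '0' and '1'

```python
def add_parity_bit(data):
    """Append an even parity bit: the sum of the data bits, modulo 2."""
    return data + str(data.count("1") % 2)


def parity_ok(block):
    """A block carrying an even parity bit must hold an even number of 1s."""
    return block.count("1") % 2 == 0


def flip(block, i):
    """Simulate a transmission error by inverting the bit at index i."""
    return block[:i] + ("0" if block[i] == "1" else "1") + block[i + 1:]


sent = add_parity_bit("0010")        # '00101'
one_error = flip(sent, 1)            # '01101': the parity check fails
two_errors = flip(one_error, 3)      # '01111': the parity check passes again
print(parity_ok(sent), parity_ok(one_error), parity_ok(two_errors))
```

the last case shows the limit of a distance-2 code: a second flipped bit restores even parity, so the damage goes unnoticed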

advantages

• trivial to implement (add the bits, modulo 2)

• can trade between block size and reliability

disadvantages

• longer blocks have higher chance of undetectable double bit errors

• error recovery requires retransmission

let’s fix both of those...

error detection for larger blocks — checksums

a checksum is a number calculated from the content of some data

• if the content changes, the checksum also changes

they can be used to ‘sign’ data, proving the content is authentic (or uncorrupted)

• on reception, verify that the checksum is correct

• cryptographic checksums are also called ‘digests’, and are the basis of ‘digital signatures’

popular checksums for digitally signing large blocks of data:

• md5 (e.g., the ‘md5’ program on Mac, or ‘md5sum’ on Linux)

• sha256 (e.g., ‘shasum -a 256’ on Linux and Mac)

popular checksum for detecting changes in smaller blocks of data

• CRC-32 (e.g., the ‘cksum’ program on Linux and Mac)

• CRC = Cyclic Redundancy Check
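the same checksums are available from Python's standard library (zlib.crc32, hashlib.md5, hashlib.sha256); a short sketch, where the message bytes are made up for the example:

```python
import hashlib
import zlib

message = b"123456789"
damaged = b"123456780"            # last byte altered

# CRC-32 as computed by zlib, printed as 8 hexadecimal digits
print(f"{zlib.crc32(message):08x}")
print(zlib.crc32(message) == zlib.crc32(damaged))   # False: the change is detected

# longer digests, as used for large blocks of data
print(hashlib.md5(message).hexdigest())
print(hashlib.sha256(message).hexdigest())
```

command-line tools may use a different CRC variant from zlib, so compare checksums only when both sides use the same program or library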


creating a cyclic redundancy check code

an n-bit CRC is the remainder after dividing the data by a large number

• an (n + 1)-bit divisor is known to both sender and receiver

• the dividend consists of the original message data bits

• n additional bits, added to the right of the data, are initially all 0
  – this is where the remainder will appear

then a long division is performed; repeatedly:

• align the leftmost 1 of the divisor with the leftmost 1 of the dividend (data)

• ‘subtract’ the divisor from the dividend, using modulo-2 arithmetic on individual bits (that is, exclusive-or, with no borrows)

(this will always change the leftmost 1 in the dividend to a 0)

until the data bits of the dividend contain only 0s

the n additional bits to the right of the dividend now contain the ‘remainder’

• this is your n-bit CRC

• transmit it along with the message
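the long division above can be sketched in Python as follows; crc_remainder is an illustrative name, data and divisor are assumed to be strings of '0' and '1', and the divisor is assumed to start with a 1

```python
def crc_remainder(data, divisor):
    """Modulo-2 long division of data followed by n zero bits, by divisor.

    Returns the n-bit remainder, where n = len(divisor) - 1.
    """
    n = len(divisor) - 1
    bits = [int(b) for b in data] + [0] * n
    div = [int(b) for b in divisor]
    for i in range(len(data)):       # only data positions can hold a leading 1
        if bits[i]:                  # align the divisor with this leftmost 1
            for j, d in enumerate(div):
                bits[i + j] ^= d     # modulo-2 'subtraction' is exclusive-or
    return "".join(map(str, bits[-n:]))


print(crc_remainder("10101110", "101"))
```

scanning the data from left to right and applying the divisor at every 1 that is found gives the same result as repeatedly re-aligning with the leftmost 1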


verifying a cyclic redundancy check code

when the message is received

• repeat the CRC process described above, but

• initialise the n additional bits with the received CRC code

• when the algorithm finishes, if the data is undamaged, the n CRC bits will all be 0

using a 3-bit divisor 101, a 2-bit CRC for 10101110 is

transmitter

  10101110 00
  101
  00001110 00
      101
  00000100 00
       101
  00000001 00
         1 01
  00000000 01

receiver

  10101110 01
  101
  00001110 01
      101
  00000100 01
       101
  00000001 01
         1 01
  00000000 00 ok

receiver (corrupted)

  10100110 01
  101
  00000110 01
       101
  00000011 01
        10 1
  00000001 11
         1 01
  00000000 10 bad!
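a compact Python sketch of the receiver's check, this time holding the bits in integers; mod2_remainder is a hypothetical helper, and the two received values are the data bits followed by the 2-bit CRC from the example with divisor 101

```python
def mod2_remainder(value, divisor):
    """Remainder of modulo-2 (exclusive-or) long division on integers."""
    while value.bit_length() >= divisor.bit_length():
        # align the divisor with the leftmost 1 of the value, then 'subtract'
        value ^= divisor << (value.bit_length() - divisor.bit_length())
    return value


DIVISOR = 0b101
undamaged = 0b10101110_01     # data 10101110 followed by CRC 01
corrupted = 0b10100110_01     # the same message with one data bit flipped

for received in (undamaged, corrupted):
    remainder = mod2_remainder(received, DIVISOR)
    print(f"{received:010b}", "ok" if remainder == 0 else "bad!")
```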


cyclic redundancy checks in communications

CRC-32 is a 32-bit cyclic redundancy check code

• the data validation code is a 32-bit checksum

• the checksum is based on a cyclic algorithm (a kind of long division, of the data by a 33-bit divisor)

• it is added to the data redundantly (it expands the message without adding new information)

CRCs are popular because they are

• simple to implement in hardware, including for serial streams of bits

• very good at detecting errors caused by electromagnetic noise

CRC-32 is used in Ethernet networks to detect corrupted packets

error correction — block parity

parity tells us if a word of data has an error

in a sequence of words

               ↓ even parity bit
      01010000 0
      01100001 1
      01110010 0
      01101001 0
? →   01100100 0
      01111001 1

• we know which word is corrupted, but

• we do not know which bit in the word is corrupted

thinking two-dimensionally

• we know the row

• we do not know the column

to identify the column, use a parity word

                            ↓ even parity bit
                   01010000 0
                   01100001 1
                   01110010 0
                   01101001 0
               ? → 01100100 0
                   01111001 1
even parity word → 00100111 0
                      ↑
                      ?

• a group of parity bits working on columns, not rows

• identifies which column contains the corrupted bit

• the corrupted bit is where the flagged row and the flagged column cross; inverting it corrects the error

the parity bits provide lateral parity

the parity words provide longitudinal parity, for a few data words at a time
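a minimal Python sketch of block parity correction, using the six words and parity values from the example; the function names are made up for the illustration, and the final line rests on the observation that the words look like ASCII character codes

```python
def even_parity_bit(bits):
    """Parity bit that gives the bits plus the parity bit an even number of 1s."""
    return str(bits.count("1") % 2)


def parity_word(words):
    """Longitudinal parity: one even parity bit per column."""
    return "".join(even_parity_bit("".join(col)) for col in zip(*words))


def correct_block(words, row_parity, col_parity):
    """Locate and invert a single corrupted data bit, if there is one."""
    bad_rows = [y for y, w in enumerate(words) if even_parity_bit(w) != row_parity[y]]
    bad_cols = [x for x, (a, b) in enumerate(zip(parity_word(words), col_parity)) if a != b]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        y, x = bad_rows[0], bad_cols[0]
        flipped = "0" if words[y][x] == "1" else "1"
        words = words[:y] + [words[y][:x] + flipped + words[y][x + 1:]] + words[y + 1:]
    return words


received = ["01010000", "01100001", "01110010", "01101001", "01100100", "01111001"]
fixed = correct_block(received, "010001", "00100111")
print(fixed[4])
print(bytes(int(w, 2) for w in fixed).decode("ascii"))
```

this sketch only repairs a single flipped data bit; an error in a parity bit itself, or several errors in one block, would need extra handling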


Hamming codes

block parity considers data as a 2-dimensional array of bits with (x, y) coordinates

if the bit at (x, y) is corrupted, we can identify it because

• its row’s lateral parity bit will be wrong, telling us y

• its column’s longitudinal parity bit will be wrong, telling us x

instead of (x, y) addressing within an array of bits, we could number them

• then use parity bits to identify the number of a bit that is corrupted

Hamming codes

number the bits, starting from 1

any bit whose number is a power of 2 is a parity bit, pn (numbered 2^n)

each parity bit pn checks all bits whose binary numbering includes 2^n

• for example, bit 9 is numbered 1001 in binary, and so

• it is included in the parity calculations of p3 and p0 (because 2^3 + 2^0 = 9)

message:      p0 p1  0 p2  1  1  1 p3  0  1  0  0
bit number:    1  2  3  4  5  6  7  8  9 10 11 12
in binary:     1  0  1  0  1  0  1  0  1  0  1  0   ← 1 means this bit is included in p0
               0  1  1  0  0  1  1  0  0  1  1  0   ← 1 means this bit is included in p1
               0  0  0  1  1  1  1  0  0  0  0  1   ← 1 means this bit is included in p2
               0  0  0  0  0  0  0  1  1  1  1  1   ← 1 means this bit is included in p3
encoded:       0  1  0  1  1  1  1  1  0  1  0  0

using even parity:

p0 = 001100 (bits 1, 3, 5, 7, 9, 11)
p1 = 101110 (bits 2, 3, 6, 7, 10, 11)
p2 = 11110  (bits 4, 5, 6, 7, 12)
p3 = 10100  (bits 8, 9, 10, 11, 12)

(the pattern can be extended to the right, for as many data and parity bits as needed)
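a Python sketch of this encoding, assuming the data bits arrive as a string of '0' and '1' in left-to-right order; hamming_encode is an illustrative name, not a library function

```python
def hamming_encode(data):
    """Place data bits around even parity bits at positions 1, 2, 4, 8, ..."""
    code = {}                          # bit number (from 1) -> bit value
    pos, remaining = 1, [int(b) for b in data]
    while remaining:
        if pos & (pos - 1):            # not a power of 2: a data position
            code[pos] = remaining.pop(0)
        pos += 1
    total = pos - 1
    p = 1
    while p <= total:
        # parity bit p covers every bit whose number has p set in binary
        code[p] = sum(v for k, v in code.items() if k & p) % 2
        p <<= 1
    return "".join(str(code[k]) for k in range(1, total + 1))


print(hamming_encode("01110100"))
```

for the eight data bits of the example this prints the twelve encoded bits shown in the table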

Hamming codes

example:

bit number:       1  2  3  4  5  6  7  8  9 10 11 12
stored data:      0  1  0  1  1  1  1  1  0  1  0  0
retrieved data:   0  1  0  1  1  0  1  1  0  1  0  0

check for correct (even) parities:

p0 = 001100 (bits 1, 3, 5, 7, 9, 11)    ok
p1 = 100110 (bits 2, 3, 6, 7, 10, 11)   wrong
p2 = 11010  (bits 4, 5, 6, 7, 12)       wrong
p3 = 10100  (bits 8, 9, 10, 11, 12)     ok

the incorrect parities are p2 and p1, so the corrupted bit is number 2^2 + 2^1 = 4 + 2 = 6
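the check can be sketched in Python: add up the numbers of the parity bits that fail, and invert the bit with that number; hamming_correct is a made-up name, and the sketch assumes at most one corrupted bit

```python
def hamming_correct(code):
    """Return (corrected code, number of the corrupted bit, or 0 if none)."""
    bits = [int(b) for b in code]
    wrong = 0
    p = 1
    while p <= len(bits):
        # even parity must hold over every bit whose number includes p
        if sum(bits[k - 1] for k in range(1, len(bits) + 1) if k & p) % 2:
            wrong += p
        p <<= 1
    if 0 < wrong <= len(bits):
        bits[wrong - 1] ^= 1           # invert the corrupted bit
    return "".join(map(str, bits)), wrong


print(hamming_correct("010110110100"))
```

for the retrieved data in the example this reports bit 6 and returns the stored data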


channel coding in action

redundant arrays of independent disks (RAID) for server data storage

• repetition: RAID 1

• parity: RAID 3, RAID 5

ECC (error correcting code) RAM for server memory

• uses Hamming codes plus one additional parity bit

• 100% reliably performs single error correction, double error detection (SECDED)

• essential, if you care at all about losing 1 bit per hour per gigabyte of memory!

next week



[Figure: computer block diagram showing the CPU (CU, PC, IR, ALU, registers, PSR, DR, AR), Random Access Memory, the address bus and data bus, and devices (mouse, keyboard, GPU, audio, HDD, SSD, network).]

the mathematics of logic circuits

• the foundation of all digital design

Boolean logic

• when 0 and 1 represent true and false

Boolean algebra

• Boolean functions

• canonical forms

simplification of Boolean expressions

• de Morgan’s laws

homework

practice generating and checking parity bits

• write a Python program to generate parity bits
  – simulate some errors in the data
  – detect the errors by checking the parity

practice block parity

• extend your Python program to correct single-bit errors

practice generating small CRC codes

• write a Python program to generate a CRC for strings

• using the same divisor, compare your result to somebody else’s result

ask about anything you do not understand

• from any of the classes so far this semester (or the lecture notes)

• it will be too late for you to try to catch up later!

• I am always happy to explain things differently and practice examples with you

glossary

channel coding — a form of encoding that adds redundant data to a message to allow detection and/or correction of message corruption.

checksum — a number that characterises the content of a block of data. The checksum is included with the data during transmission, and verified during reception.

corruption — damage to a message during storage or communication, causing the values of bits to change.

Cyclic Redundancy Check — a checksum based on modular division, especially good at detecting long sequences of corrupted bits as can occur in network communication.

error correction — identifying the specific bit that was corrupted, allowing it to be repaired.

error detection — identifying the presence of a corrupted bit, allowing retransmission of the message to be requested.

even parity — a property of a block of data in which the number of 1s is even.

even parity bit — a bit added to a block of data that guarantees the data will have even parity.

Hamming code — a parity-based encoding of data that allows a corrupted bit to be identified and corrected.

Hamming distance — the number of bits that must be changed to convert a valid message into another valid message.

lateral parity — parity that works ‘horizontally’ across a row of bits.

longitudinal parity — parity that works ‘vertically’ down a column of bits.

medium — anything that can carry a message. Media include copper wire for electrical messages, glass fibre for optical messages, space for electromagnetic radio messages, etc.

odd parity — a property of a block of data in which the number of 1s is odd.

odd parity bit — a bit added to a block of data that guarantees the data will have odd parity.

packet — in a computer network, a single unit of transmission (and retransmission, in the case of error).

parity — the quality of being even or odd.

redundant — data that is added to a message that does not increase the information content of the message. Redundant data adds the ability to detect or correct errors within the message content.

repetition — transmission (or storage) of the same information multiple times.

retransmission — transmission of a message that has already been transmitted, but was found to contain errors when received.

Date post:	06-Jul-2020