Erasure Correcting Codes
In The Real World
Udi Wieder
Incorporates presentations made by Michael Luby and Michael Mitzenmacher.
Based on:
Practical Loss-Resilient Codes. Michael Luby, Amin Shokrollahi, Dan Spielman, Volker Stemann. STOC '97
Analysis of Random Processes Using And-Or Tree Evolution. Michael Luby, Amin Shokrollahi. SODA '98
LT Codes. Michael Luby. FOCS 2002
Online Codes. Petar Maymounkov
Probabilistic Channels
[Figure: two channel diagrams. The binary erasure channel delivers each bit (0 or 1) intact with probability 1-p and replaces it with an erasure symbol '?' with probability p. The binary symmetric channel delivers each bit intact with probability 1-p and flips it (0 to 1, 1 to 0) with probability p.]
[Figure: transmission pipeline. Content → Encoding → Transmission (with losses) → Received encoding → Decoding → Content.]
Erasure Codes
[Figure: a message of n packets is encoded into cn packets; any subset of ≥ n received packets suffices to decode the original n.]
Performance Measures
Time overhead: the time to encode and decode, expressed as a multiple of the encoding length.
Reception efficiency: the ratio of the number of packets in the message to the number of packets needed to decode. Optimal is 1.
Known Codes
Random linear codes (Elias): a linear code of minimum distance d can correct any pattern of d-1 or fewer erasures. Random linear codes achieve the capacity of the channel with high probability, i.e., they can be used to transmit over the erasure channel at any rate R < 1-p. Decoding time O(n³): unacceptable.
Reed-Solomon codes: optimal reception efficiency with probability 1. Encoding and decoding in quadratic time (about one minute to encode 1 MB).
Tornado Codes
Practical Loss-Resilient Codes. Michael Luby, Amin Shokrollahi, Dan Spielman, Volker Stemann (1997)
Analysis of Random Processes Using And-Or Tree Evolution. Michael Luby, Amin Shokrollahi (1998)
Low-density parity-check codes were introduced in the early 60's by Gallager and have been reinvented many times.
Encoding Process
[Figure: bipartite graph with message bits a-l on the left and check bits on the right. Each check bit is the XOR of its neighboring message bits (e.g., one check bit equals e ⊕ b ⊕ a). The construction cascades: the check bits of one bipartite graph serve as the message bits of the next.]
The time to encode is proportional to the number of edges.
Standard Loss-Resilient Code
Length of message: k
Decoding Rule
Given the value of a check bit and all but one of the message bits on which it depends, set the missing message bit to the XOR of the check bit and its known message bits.
Once a message bit is recovered, XOR it into all of its check-bit neighbors, then delete the message bit and all edges to which it belongs from the graph.
Decoding ends (successfully) when all edges are deleted.
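The rule is a one-line XOR. A minimal sketch (the helper name is mine, not from the slides):

```python
# Tornado-style decoding rule: a check bit whose message-bit neighbors
# are all known except one determines the missing bit by XOR.

def recover_missing_bit(check_value, known_neighbor_bits):
    """Return the erased message bit: check bit XOR all known neighbors."""
    missing = check_value
    for b in known_neighbor_bits:
        missing ^= b
    return missing

# Check bit c = a ^ b ^ e with a=1, b=0, e=1 gives c = 0.
# If b is erased: b = c ^ a ^ e = 0 ^ 1 ^ 1 = 0.
assert recover_missing_bit(0, [1, 1]) == 0
```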
Decoding Process
[Figure: example run over two slides. Message bits a-h with several erased ('?'); check bits b, g ⊕ b, h ⊕ g ⊕ e, h ⊕ g ⊕ e ⊕ b. A check bit with a single missing neighbor recovers it (e.g., check b recovers b, after which check g ⊕ b releases g), and the process repeats until all erased bits are recovered.]
Regular Graphs
[Figure: a 3-6 regular bipartite graph. Every message bit has degree 3 and every check bit degree 6; the edges are wired through a random permutation.]
3-6 Regular Graph Analysis
Consider an edge between a left (message) node and a right (check) node, let δ be the erasure probability, and let
x = Pr[message bit at the end of an edge is not recovered].
A check bit can release a message bit only when its 5 other neighbors are all recovered:
Pr[all 5 other neighbors recovered] = (1-x)^5.
A message bit remains unrecovered if it was erased and neither of its 2 other check bits can release it:
x' = δ · (1 - (1-x)^5)^2.
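The recursion can be iterated numerically. A quick sketch (the erasure probabilities 0.40 and 0.48 are illustrative; the decoding threshold of the 3-6 graph is roughly 0.429):

```python
# Density evolution for the 3-6 regular graph:
# x' = delta * (1 - (1 - x)^5)^2, starting from x = delta.

def evolve(delta, iters=2000):
    """Fraction of unrecovered message bits after `iters` iterations."""
    x = delta
    for _ in range(iters):
        x = delta * (1.0 - (1.0 - x) ** 5) ** 2
    return x

# Below the threshold the fraction tends to 0; above it, it stalls
# at a nonzero fixed point.
print(evolve(0.40))   # essentially 0
print(evolve(0.48))   # bounded away from 0
```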
Decoding to Completion (sketch)
Most message bits are roots of tree-like neighborhoods, so the recursion above applies to them.
Concentration results (an edge-exposure martingale) prove that all but a small fraction of the message bits are decoded with high probability.
The remaining bits are decoded due to expansion: the original graph is a good expander on small sets. If every set of size s and average degree a has more than as/2 neighbors, then a unique neighbor exists and decoding continues.
Efficiency
Encoding time (sec), 1K packets

  size     Reed-Solomon   Tornado
  250 KB        4.6         0.06
  500 KB       19           0.12
  1 MB         93           0.26
  2 MB        442           0.53
  4 MB       1717           1.06
  9 MB       6994           2.13
  16 MB     30802           4.33
Decoding time (sec), 1K packets

  size     Reed-Solomon   Tornado
  250 KB        2.06        0.06
  500 KB        8.4         0.09
  1 MB         40.5         0.14
  2 MB        199           0.19
  4 MB        800           0.40
  9 MB       3166           0.87
  16 MB     13829           1.75

Rate = 0.5
Erasure probability = 0.5
Implementation = ?
LT Codes
LT Codes. Michael Luby (2002)
'Rateless' codes: a different model of transmission.
The sender sends a potentially infinite sequence of encoding symbols; the relevant time complexity is the average time to generate one encoding symbol.
Erasures are independent of content.
The receiver may decode once enough symbols have been received; reception efficiency measures how many are enough.
This is the 'Digital Fountain' approach.
Applications
Unreliable channels: in Tornado codes a small rate implies big graphs and therefore a lot of memory (proportional to the size of the encoding).
Multi-source download: downloading from different servers requires no coordination. Efficient exchange of data between users requires a small rate at the source.
Multicast without feedback (say, over the internet): rateless codes are the natural notion.
Trivial Examples - Repetition
Each time unit, send a random symbol of the code.
Advantage: encoding complexity O(1).
Disadvantage: by the coupon-collector bound, k' = k ln(k/δ) code symbols are needed to cover all k content symbols with failure probability at most δ.
Example:
k = 100,000, δ = 10⁻⁶
Reception overhead = 2400% (terrible)
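The overhead figure follows directly from the bound. A quick check with the slide's numbers:

```python
import math

# Coupon collector: k' = k * ln(k/delta) random symbols cover all k
# content symbols with failure probability at most delta.
k, delta = 100_000, 1e-6
k_prime = k * math.log(k / delta)
overhead = k_prime / k               # reception overhead factor
print(round(overhead, 1))            # 25.3, i.e. roughly 2400% overhead
```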
Trivial Examples - Reed-Solomon
Each time unit, send an evaluation of the message polynomial at a random point.
Advantage: decoding is possible as soon as k symbols are received.
Disadvantage: large time complexity for encoding and decoding.
Parameters of LT Codes
Encoding time complexity: O(ln n) per symbol.
Decoding time complexity: O(n ln n).
Reception overhead: asymptotically zero (unlike Tornado codes).
Failure probability: very small (smaller than Tornado).
LT Encoding
[Figure: animation over three slides, k = 100,000. To generate an encoding symbol: choose a degree d from the degree distribution (the slides show a table with, e.g., Pr[d = 1] = 0.055); choose d random content symbols; XOR them (for d = 1 the content symbol is simply copied); insert a header identifying the chosen symbols, and send. The three slides show examples with d = 2, d = 1, and d = 4.]
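A minimal sketch of one LT encoding step (the degree distribution below is a toy placeholder, not the distribution from the slides):

```python
import random

def lt_encode_symbol(content, degree_dist, rng=random):
    """Choose a degree, choose that many distinct content symbols, and
    XOR them. Returns (neighbor indices, value); the indices play the
    role of the header inserted before sending."""
    degrees, weights = zip(*degree_dist)
    d = rng.choices(degrees, weights=weights)[0]
    idx = rng.sample(range(len(content)), d)
    value = 0
    for i in idx:
        value ^= content[i]          # d = 1 reduces to copying a symbol
    return idx, value

content = [3, 1, 4, 1, 5, 9, 2, 6]
toy_dist = [(1, 0.1), (2, 0.5), (3, 0.3), (4, 0.1)]
idx, value = lt_encode_symbol(content, toy_dist)
```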
LT Encoding Properties
Encoding symbols are generated independently of each other; any number of them can be generated on the fly.
Reception overhead is independent of the loss pattern: the success of the decoding process depends only on the degree distribution of the received encoding symbols, and the degree distribution of received symbols is the same as the degree distribution of generated symbols.
1. Collect enough encoding symbols and set up the graph between encoding symbols and the content symbols to be recovered.
2. Identify an encoding symbol of degree 1. STOP if none exists.
3. Copy the value of the encoding symbol into its unique neighbor; XOR the value of the newly recovered content symbol into its encoding-symbol neighbors and delete the edges emanating from the content symbol.
4. Go to step 2.
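The four steps can be sketched as a peeling decoder. The data layout (each encoding symbol as an (indices, value) pair) is my assumption, not from the slides:

```python
def lt_decode(k, symbols):
    """Peel: repeatedly find a degree-1 encoding symbol, copy its value
    into its unique neighbor, XOR the recovered value out of every
    encoding symbol touching it, and delete those edges. Returns the
    content list, with None where recovery failed."""
    content = [None] * k
    work = [[set(idx), val] for idx, val in symbols]
    progress = True
    while progress:                              # step 4: repeat
        progress = False
        for s in work:
            if len(s[0]) == 1:                   # step 2: degree 1
                j = next(iter(s[0]))
                content[j] = s[1]                # step 3: copy value
                for t in work:                   # step 3: XOR out, delete edges
                    if j in t[0]:
                        t[0].discard(j)
                        t[1] ^= content[j]
                progress = True
    return content

# Content [1, 0, 1, 1] encoded by four symbols:
symbols = [([0], 1), ([0, 1], 1), ([1, 2], 1), ([2, 3], 0)]
assert lt_decode(4, symbols) == [1, 0, 1, 1]
```

If no degree-1 symbol remains before everything is recovered, decoding stops short, which is exactly the "ripple becomes empty" failure discussed below.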
Releasing an Encoding Symbol
[Figure: an encoding symbol of degree i is released when the x-th content symbol is recovered: i-2 of its neighbors lie among the first x-1 recovered content symbols, one neighbor is the x-th recovered symbol, and the remaining neighbor is one of the k-x unrecovered content symbols, which the encoding symbol can now recover.]
The Ripple
Definition: at each decoding step, the ripple is the set of encoding symbols that have been released at some previous decoding step but whose one remaining content symbol has not yet been recovered.
[Figure: x recovered and k-x unrecovered content symbols; the encoding symbols in the ripple point into the unrecovered set. A collision occurs when two ripple symbols point to the same content symbol.]
Successful Decoding
Decoding succeeds iff the ripple never becomes empty.
Ripple small → small chance of encoding-symbol collisions → small reception overhead, but the risk of the ripple becoming empty due to random fluctuations is large.
Ripple large → large chance of encoding-symbol collisions → large reception overhead, but the risk of the ripple becoming empty due to random fluctuations is small.
LT codes idea: control the release of encoding symbols over the entire decoding process so that the ripple is never empty but never too large. Very few encoding-symbol collisions → very little reception overhead.
Release Probability
Definition: the release probability q(i,x) for degree-i encoding symbols at decoding step x is the probability that an encoding symbol of degree i is released at step x.
Proposition:
For i = 1: q(1,x) = 1 for x = 0 and q(1,x) = 0 for all x > 0.
For i > 1 and x = i-1, ..., k-1:

  q(i,x) = [ i(i-1)(k-x) · ∏_{j=0}^{i-3} (x-1-j) ] / ∏_{j=0}^{i-1} (k-j)

and q(i,x) = 0 otherwise.
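The formula can be sanity-checked numerically with exact rational arithmetic. Since every degree-i encoding symbol is released at exactly one step, q(i,·) should sum to 1 (the formula here is my reconstruction of the garbled slide equation):

```python
from fractions import Fraction
from math import prod

def q(i, x, k):
    """Release probability of a degree-i encoding symbol at step x."""
    if i == 1:
        return Fraction(1 if x == 0 else 0)
    if not (i - 1 <= x <= k - 1):
        return Fraction(0)
    num = i * (i - 1) * (k - x) * prod(x - 1 - j for j in range(i - 2))
    return Fraction(num, prod(k - j for j in range(i)))

k = 30
for i in range(1, k + 1):
    # released at exactly one decoding step
    assert sum(q(i, x, k) for x in range(k)) == 1
```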
Release Probability (illustration)
[Figure: the release event annotated with the quantities in the formula: i-2 neighbors among the first x-1 recovered content symbols, the x-th recovered symbol releasing the encoding symbol, and one remaining neighbor among the k-x unrecovered content symbols.]
Release Distributions for Specific Degrees
[Figure: plots of q(i,x) over x for i = 2, 3, 4, 10, 20, with k = 1000.]
Overall Release Probability
Definition: at each decoding step x, r(x) is the overall probability that an encoding symbol is released at step x, with respect to a specific degree distribution p(·).
Proposition: r(x) = Σ_i p(i) · q(i,x).
Uniform Release Question
Question: is there a degree distribution such that the overall release distribution is uniform over x?
Why interesting? One encoding symbol would be released for each content symbol decoded. The ripple will tend to stay small → minimal reception overhead. The ripple will tend not to become empty → decoding will succeed.
Uniform Release Answer: YES!
Ideal Soliton Distribution:
  p(1) = 1/k
  p(i) = 1/(i(i-1)) for all i = 2, ..., k
[Figure: the Ideal Soliton Distribution for k = 1000.]
A Simple Way to Choose from the Ideal SD
Choose A uniformly from the interval [0,1).
If A ≥ 1/k then degree = ⌈1/A⌉; else degree = 1.
[Figure: the interval [0,1) with marks at 1/k, ..., 1/6, 1/5, 1/4, 1/3, 1/2, 1. The subinterval [1/i, 1/(i-1)) has length 1/(i(i-1)) and maps to degree i; the subinterval [0, 1/k) maps to degree 1.]
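The interval rule is two lines of code. A sketch, with a frequency check that the mapping really reproduces the distribution:

```python
import math
import random

# With A uniform in [0,1): A < 1/k maps to degree 1 (probability 1/k);
# otherwise ceil(1/A) = i exactly when A is in [1/i, 1/(i-1)), an
# interval of length 1/(i(i-1)), matching the Ideal Soliton weights.

def sample_ideal_soliton(k, rng=random):
    a = rng.random()                  # A uniform in [0, 1)
    if a < 1.0 / k:
        return 1                      # p(1) = 1/k
    return math.ceil(1.0 / a)         # p(i) = 1/(i(i-1)), i = 2..k
```

For example, roughly half of all samples should have degree 2, since p(2) = 1/2.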
Ideal SD Theorem
Theorem: with the Ideal Soliton Distribution, the overall release distribution is exactly uniform, i.e., r(x) = 1/k for all x = 0, ..., k-1.
[Figure: the overall release distribution for the Ideal SD, k = 1000, flat at 1/k.]
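The theorem can be verified exactly for a small k, combining the Ideal Soliton weights with the release-probability formula (the q(i,x) formula is my reconstruction of the earlier garbled slide equation):

```python
from fractions import Fraction
from math import prod

def q(i, x, k):
    """Release probability of a degree-i encoding symbol at step x."""
    if i == 1:
        return Fraction(1 if x == 0 else 0)
    if not (i - 1 <= x <= k - 1):
        return Fraction(0)
    num = i * (i - 1) * (k - x) * prod(x - 1 - j for j in range(i - 2))
    return Fraction(num, prod(k - j for j in range(i)))

def p(i, k):
    """Ideal Soliton Distribution."""
    return Fraction(1, k) if i == 1 else Fraction(1, i * (i - 1))

k = 20
for x in range(k):
    r = sum(p(i, k) * q(i, x, k) for i in range(1, k + 1))
    assert r == Fraction(1, k)       # exactly uniform release
```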
In Expected Value ...
Optimal recovery with respect to the Ideal SD:
- Receive exactly k encoding symbols.
- Exactly one encoding symbol is released before any decoding step, recovering one content symbol.
- At each decoding step a content symbol is recovered; it releases exactly one new encoding symbol, which in turn recovers exactly one more content symbol.
- The ripple size is always exactly 1.
Performance analysis: no reception overhead; average degree
  Σ_{i=1}^{k} i·p(i) = 1/k + H(k-1) ≈ ln(k).
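The average-degree sum telescopes: i·p(i) = 1/(i-1) for i ≥ 2, so the total is 1/k + H(k-1), about ln(k) plus Euler's constant. A direct check:

```python
from math import log

# Average degree under the Ideal Soliton Distribution:
# sum_{i=1}^{k} i * p(i) = 1/k + sum_{i=2}^{k} 1/(i-1) = 1/k + H(k-1).
k = 100_000
avg = 1.0 / k + sum(1.0 / (i - 1) for i in range(2, k + 1))
print(avg)   # about ln(k) + 0.577, roughly 12.09 for k = 100,000
```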
When Taking Into Account Random Fluctuations ...
The Ideal Soliton Distribution fails miserably: because of variance, actual behavior does not match expected behavior. The ripple is very likely to become empty, so decoding fails with very high probability (even with high reception overhead).
Robust Soliton Distribution Design
Need to ensure that the ripple never empties.
At the beginning of the decoding process: ISD: the ripple is not large enough to withstand random fluctuations. RSD: boost p(1) = c/√k so that the expected ripple size at the beginning is c·√k.
At the end of the decoding process: ISD: the expected rate of adding to the ripple is not large enough to compensate for collisions towards the end of the decoding process, when the ripple is large relative to the number of unrecovered content symbols. RSD: boost p(i) for higher degrees i so that the expected ripple growth at the end of the decoding process is higher.
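The two boosts can be written down concretely. The sketch below follows the Robust Soliton construction from Luby's LT Codes paper, with S = c·ln(k/δ)·√k; the parameter values c = 0.1 and δ = 0.5 are illustrative choices, not from the slides:

```python
import math

# Robust Soliton Distribution: rho is the Ideal Soliton part; tau adds
# the degree-1 boost (S/(i*k) for small i) and a spike at degree k/S.

def robust_soliton(k, c=0.1, delta=0.5):
    S = c * math.log(k / delta) * math.sqrt(k)
    pivot = int(round(k / S))
    # index = degree; index 0 is an unused placeholder
    rho = [0.0] + [1.0 / k] + [1.0 / (i * (i - 1)) for i in range(2, k + 1)]
    tau = [0.0] * (k + 1)
    for i in range(1, pivot):
        tau[i] = S / (i * k)
    if 1 <= pivot <= k:
        tau[pivot] = S * math.log(S / delta) / k
    beta = sum(rho) + sum(tau)           # normalizing constant
    return [(rho[i] + tau[i]) / beta for i in range(k + 1)]

mu = robust_soliton(10_000)
assert abs(sum(mu) - 1.0) < 1e-9
```

Note that the boosted mu[1] is much larger than the Ideal Soliton's p(1) = 1/k, which is what keeps the ripple alive at the start of decoding.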
LT Codes – Bottom Line
Using the Robust Soliton Distribution:
The number of symbols needed to recover the data with probability 1-δ is k + O(√k · ln²(k/δ)).
The average degree of an encoding symbol is O(ln(k/δ)).
Online Codes
Online Codes. Petar Maymounkov
We are out of time