Correcting Localized Deletions Using Guess & Check Codes
Salim El Rouayheb
Joint work with Serge Kas Hanna and Hieu Nguyen
Rutgers University
55th Annual Allerton Conference on Communication, Control, and Computing
Motivation
• Our motivation: file synchronization, e.g. Dropbox
[Figure: Alice holds 10101010…, Bob holds 110010…]
• Recent application: DNA-based storage
• Deletions were first studied by Varshamov-Tenengolts ('65) and Levenshtein ('66)
• Deletions: 10101010 (transmitted) → 110010 (received)
Localized Deletions
• Motivation: file synchronization, e.g. Dropbox
[Figure: localized edits]
Previous Work on Deletions
Ø Unrestricted deletions
• Information-theoretic approach: [Gallager '61], [Dobrushin '67]; lower and upper bounds on the capacity: [Mitzenmacher and Drinea '06], [Diggavi et al. '07], [Kanoria and Montanari '13], [Venkataramanan et al. '13] …
• Code constructions and fundamental limits: [Varshamov and Tenengolts '65], [Levenshtein '66], [Schulman and Zuckerman '99], [Helberg and Ferreira '02], [Cullina and Kiyavash '14], [Gabrys et al. '16], [Brakensiek et al. '16], [Thomas et al. '17] …
• Recent file synchronization algorithms: [Yazdi and Dolecek '14], [Venkataramanan et al. '15], [Sala et al. '17] …
Ø Bursty deletions
• File synchronization: [Ma et al. '11]
• Code constructions: [Levenshtein '67], [Cheng et al. '14], [Schoeny et al. '17]
• Existence of codes for the localized model with w = 3, 4
Model and Contribution
Ø $\delta \le w$ deletions localized in a window of size $w$
Example string: 101010111000100100010110101110000 01010001100111111000010 ($\delta$ deletions shown in red on the slide)
Ø Our assumptions: 1) positions of the deletions are independent of the codeword; 2) information message is uniform iid
Ø Contribution: Explicit codes with deterministic polynomial time encoding and decoding that can correct localized deletions whp
• Logarithmic redundancy: $n - k = c \log k + w + 1$
• Polynomial time encoding and decoding
• Asymptotically vanishing probability of decoding failure
• Can be generalized to multiple windows
Guess & Check (GC) Codes
Ø Hard problem for $w = n$
Ø [Schoeny et al. '17]: existence of codes for $w = 3, 4$
GC Codes Example
[Kas Hanna and El Rouayheb, ISIT '17]
• Encoding the message of length k = 16: 1110000000110001
Ø Blocks of log k = 4 bits mapped to GF(17) symbols: 1110 0000 0011 0001 → 14 0 3 1
Ø (6,4) MDS code: 14 0 3 1 → 14 0 3 1 1 0 (2 MDS parities)
• Assume that the deletions (in red) affect only one systematic block: 1110000000110001 (16 bits) → 1110000000001 (13 bits)
• Decoding: deletions occur in one of these windows (one guess per block)
Ø Guess 1: ? 12 0 1, decoded as 5 12 0 1
Ø Guess 2: 14 3 0 1
Ø Guess 3: 14 0 3 1
Ø Guess 4: 14 0 0 4
Ø Each guess is decoded using the 1st parity and then checked with the 2nd parity (0)
Ø Guess 3 recovers the transmitted message
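The guessing procedure on this slide can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the parity rule p_j = sum_i j^(i-1) u_i mod 17 is inferred from the slide's numbers (it reproduces the parities 1 and 0), and the names `to_symbols`, `parity`, `encode` and `gc_decode` are hypothetical.

```python
Q = 17      # field size GF(17), as on the slide
BLOCK = 4   # log2(k) bits per block for k = 16

def to_symbols(bits):
    """Split a bit string into BLOCK-bit chunks and read each as an integer."""
    return [int(bits[i:i + BLOCK], 2) for i in range(0, len(bits), BLOCK)]

def parity(symbols, j):
    """Assumed parity rule: p_j = sum_i j^(i-1) * u_i mod Q (j = 1 is a plain sum)."""
    return sum(pow(j, i, Q) * u for i, u in enumerate(symbols)) % Q

def encode(message):
    """Systematic symbols followed by two parities."""
    u = to_symbols(message)
    return u + [parity(u, 1), parity(u, 2)]

def gc_decode(received, p1, p2, num_blocks=4):
    """Try every block as the location of all deletions; keep guesses that pass p2."""
    deleted = num_blocks * BLOCK - len(received)
    survivors = []
    for g in range(num_blocks):
        # Bits before the guessed block and after its shortened remainder.
        left = to_symbols(received[:g * BLOCK])
        right = to_symbols(received[g * BLOCK + BLOCK - deleted:])
        # Treat the guessed block as an erasure and recover it from the 1st parity.
        missing = (p1 - sum(left) - sum(right)) % Q
        candidate = left + [missing] + right
        # Check the candidate with the 2nd parity.
        if parity(candidate, 2) == p2:
            survivors.append((g + 1, candidate))
    return survivors
```

Under these assumptions, `gc_decode("1110000000001", 1, 0)` leaves a single surviving guess, Guess 3 with symbols 14 0 3 1.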
Generalizing to Any Window Position
• Same encoding with one extra parity
Ø Blocks of log k bits mapped to GF(17) symbols: 1110 0100 0011 0001 → 14 4 3 1
Ø (7,4) MDS code: 14 4 3 1 → 14 4 3 1 5 8 12 (3 MDS parities)
• Assume that 3 deletions (in red) affect systematic bits within a window of w = log k = 4 bits: 1110010000110001 (16 bits) → 1100000110001 (13 bits)
• Decoding: a window of size log k can affect at most 2 consecutive blocks
Ø Guess 1: ? 3 1 (first two blocks treated as erased), decoded as 14 4 3 1
Ø Guess 2: 12 7 2 1
Ø Guess 3: 12 1 11 15
Ø Each guess is decoded using the 1st & 2nd parities and then checked with the 3rd parity (12)
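A window that straddles a block boundary turns each guess into two erasures, which the first two parities can resolve as a 2x2 linear system over GF(17). The sketch below illustrates this; it assumes the parity rule p_j = sum_i j^(i-1) u_i mod 17, which is inferred from the slide's numbers (it reproduces 5, 8 and 12) rather than stated on the slide, and `decode_window` is a hypothetical name.

```python
Q = 17      # field size GF(17), as on the slide
BLOCK = 4   # log2(k) bits per block for k = 16

def to_symbols(bits):
    """Split a bit string into BLOCK-bit chunks and read each as an integer."""
    return [int(bits[i:i + BLOCK], 2) for i in range(0, len(bits), BLOCK)]

def parity(symbols, j):
    """Assumed parity rule: p_j = sum_i j^(i-1) * u_i mod Q."""
    return sum(pow(j, i, Q) * u for i, u in enumerate(symbols)) % Q

def decode_window(received, p1, p2, p3, num_blocks=4):
    """Guess that blocks g, g+1 hold the window; solve with p1, p2; check with p3."""
    deleted = num_blocks * BLOCK - len(received)
    survivors = []
    for g in range(num_blocks - 1):
        left = to_symbols(received[:g * BLOCK])
        right = to_symbols(received[(g + 2) * BLOCK - deleted:])
        # Known symbols, keyed by their block index in the original message.
        known = dict(enumerate(left))
        known.update({g + 2 + i: u for i, u in enumerate(right)})
        # Subtract known contributions: x + y = r1 and 2^g x + 2^(g+1) y = r2.
        r1 = (p1 - sum(known.values())) % Q
        r2 = (p2 - sum(pow(2, i, Q) * u for i, u in known.items())) % Q
        a, b = pow(2, g, Q), pow(2, g + 1, Q)
        y = (r2 - a * r1) * pow((b - a) % Q, -1, Q) % Q  # modular inverse, Python 3.8+
        x = (r1 - y) % Q
        candidate = left + [x, y] + right
        if parity(candidate, 3) == p3:
            survivors.append((g + 1, candidate))
    return survivors
```

With the received string 1100000110001 and parities 5, 8, 12, the three guesses decode to 14 4 3 1, 12 7 2 1 and 12 1 11 15, and only the first passes the third-parity check.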
Recovering the MDS parities
Ø How to recover the MDS parity symbols at the decoder?
1110000000110001 (systematic bits) 00010000 (MDS parities)
Ø Trivial solution: repeat the parity bits with a $(\delta + 1)$ repetition code
1110000000110001 (systematic bits) 0000000000001111 0…0 (repeated MDS parities)
Ø Better solution: insert a buffer between systematic and MDS parity bits
• Buffer: $w$ zeros + a single one ($w + 1$ bits)
1110000000110001 (systematic bits) 00001 (buffer) 00010000 (MDS parities)
• Deletions cannot affect both systematic and parity bits simultaneously
• If parity bits get affected → simply output the first $k$ bits. Else apply Guess & Check decoding
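The two codeword layouts on this slide can be compared with a short Python sketch. It only builds the bit strings; the function names are hypothetical, and the 8 parity bits are taken verbatim from the slide's example.

```python
def with_repetition(systematic, parity_bits, delta):
    """Trivial layout: every parity bit is repeated (delta + 1) times."""
    return systematic + "".join(bit * (delta + 1) for bit in parity_bits)

def with_buffer(systematic, parity_bits, w):
    """Buffer layout: w zeros and a single one separate the two parts."""
    return systematic + "0" * w + "1" + parity_bits

systematic = "1110000000110001"   # k = 16 systematic bits from the slide
parity_bits = "00010000"          # MDS parity bits as shown on the slide

repeated = with_repetition(systematic, parity_bits, delta=3)
buffered = with_buffer(systematic, parity_bits, w=4)

# Overhead beyond the systematic and parity bits themselves.
extra_repetition = len(repeated) - len(systematic) - len(parity_bits)
extra_buffer = len(buffered) - len(systematic) - len(parity_bits)
```

For this example the repetition layout adds 24 extra bits (3 copies of each of the 8 parity bits), while the buffer adds only w + 1 = 5 bits.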
When does decoding fail?
• Encoding the message of length k = 16: 1100010001110010
Ø Blocks of log k bits mapped to GF(17) symbols: 1100 0100 0111 0010 → 12 4 7 2
Ø (6,4) MDS code: 12 4 7 2 → 12 4 7 2 8 13 (2 MDS parities)
• Assume that the deletions (in red) affect only one systematic block: 1100010001110010 (16 bits) → 0010001110010 (13 bits)
• Decoding
Ø Guess 1: ? 4 7 2, decoded as 12 4 7 2
Ø Guess 2: 2 14 7 2
Ø Guess 3: 2 3 1 2
Ø Guess 4: 2 3 9 11
Ø Each guess is decoded using the 1st parity and then checked with the 2nd parity (13)
• Decoding failure
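To see why this example fails, the four decoded guesses can be checked against both parities. The sketch below reads the first block 1100 as the integer 12 and assumes the parity rule p_j = sum_i j^(i-1) u_i mod 17, which is inferred from the slide's parities 8 and 13 rather than stated on the slide.

```python
Q = 17  # field size GF(17), as on the slide

def parity(symbols, j):
    """Assumed parity rule: p_j = sum_i j^(i-1) * u_i mod Q."""
    return sum(pow(j, i, Q) * u for i, u in enumerate(symbols)) % Q

# Decoded symbol vectors of the four guesses listed on the slide.
guesses = {
    1: [12, 4, 7, 2],
    2: [2, 14, 7, 2],
    3: [2, 3, 1, 2],
    4: [2, 3, 9, 11],
}
p1, p2 = 8, 13  # the two MDS parities of the transmitted message

# Every guess satisfies p1 by construction; the 2nd parity is the actual check.
passing = [g for g, s in guesses.items()
           if parity(s, 1) == p1 and parity(s, 2) == p2]
```

Under this assumed parity rule, Guesses 1 and 4 both pass the check, so the decoder is left with two different candidate strings and cannot decide between them.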
Simulations – Decoding Failure
• Simulation results for: $w = \log k$, $\delta = \log k - 1$, $c = 3, 4$ MDS parities
[Figure: probability of failure versus $k$ (message length in bits); $10^6$ iterations]
Main Results
Theorem 1 (One window): Guess & Check (GC) codes can correct in polynomial time up to $\delta \le w = O(\log k)$ localized deletions, where $m \log k < w < (m + 1) \log k$ for some integer $m \ge 0$. Let $c > m + 2$ be a constant integer.
Ø Redundancy: $c \log k + w + 1$
Ø Encoding complexity is $O(k \log k)$, decoding complexity is $O(k^3 / \log k)$
Ø Probability of decoding failure: $\Pr(F) \le k^{m+4-c} / \log k$
Sketch of Theorem 2 ($z > 1$ windows):
Ø Redundancy: $c(zw + 1) \log k$
Ø Encoding complexity is $O(k \log k)$, decoding complexity is $O(k^{z+2})$
Ø Probability of decoding failure: $\Pr(F) \le k^{z(m+4)-c}$
Sketch of Theorem 3 [ISIT '17] (Unrestricted deletions):
Ø Redundancy: $c(\delta + 1) \log k$
Ø Encoding complexity is $O(k \log k)$, decoding complexity is $O(k^{\delta+2} / \log^{\delta} k)$
Ø Probability of decoding failure: $\Pr(F) = O(k^{2\delta-c} / \log^{\delta} k)$
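The redundancy expressions for one window (c log k + w + 1) and for unrestricted deletions (c(δ + 1) log k) can be compared numerically. The sketch below simply evaluates the two formulas; the parameter values are illustrative assumptions, not figures from the talk.

```python
from math import log2

def redundancy_one_window(k, w, c):
    """Redundancy in bits for one window of size w: c*log2(k) + w + 1."""
    return c * log2(k) + w + 1

def redundancy_unrestricted(k, delta, c):
    """Redundancy in bits for delta unrestricted deletions: c*(delta + 1)*log2(k)."""
    return c * (delta + 1) * log2(k)

# Illustrative parameters (assumed): k = 1024 message bits, 8 deletions, c = 3.
k, w, c = 1024, 8, 3
localized = redundancy_one_window(k, w, c)
unrestricted = redundancy_unrestricted(k, delta=w, c=c)
```

For these values the localized code needs 39 redundant bits, against 270 bits when the same number of deletions may fall anywhere in the message.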
Test GC Codes Online
Ø C++ & Python codes are available on GitHub
GitHub repository: https://github.com/serge-k-hanna/GC
Ø Test the codes online using the Jupyter notebook
Go to: https://try.jupyter.org/
Upload the notebook files from http://eceweb1.rutgers.edu/csi/software.html
Ø For more details: http://eceweb1.rutgers.edu/csi/software.html
Simulations – Running Time
[Figure: running time curves for the following implementations]
• C++: Early termination with precomputing
• Python: All cases without precomputing
• Python: All cases with precomputing
• Python: Early termination with precomputing
Decoding Failures: What Happened?
• Example for one deletion: 16-bit message 0000100011110110
Ø (6,4) MDS encoding over GF(17): 0 8 15 6 12 5 (2 parities)
Ø Suppose 14th bit gets deleted, decoding:
• Guess 1: 8 4 7 10
• Guess 4: 0 8 15 6
Ø Guesses 1 & 4 satisfy the 2 parities
• Decoding failure: more than one possible guess, different decoded strings
• Probability of decoding failure for a given string: combinatorial problem that depends on the string and deletion position
• Proof approach: assume message is uniform iid, average over all possible messages
Decoding Failure – 1 Deletion
• Assume WLOG that Guess 1 is correct, observe the output of the decoder at a wrong Guess $i \ne 1$
• Decoding fails if the decoded string is in A
• Lemma: at most 2 different transmitted sequences can lead to the same decoded string in any Guess $i \ne 1$
• $\Pr(\text{decoding failure in guess } i \ne 1) = \Pr(\text{decoded string is in } A)$
[Figure: mapping from the set of all transmitted k-bit strings to the set of all GC decoder outputs; B is the set satisfying the first parity, A is the set satisfying all $c$ parities. Fixed: guess, deletion, first parity]
Proof of Pr(F) for One Deletion
[Derivation of the bound on Pr(F); steps labeled: union bound, lemma, subspace cardinality]
k : length of message
Yi : string decoded in Guess i
c : number of parities
q : field size
A : set satisfying all c parities
B : set satisfying first parity
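The three labeled steps suggest a chain of inequalities of the following shape. This is a hedged reconstruction, not the slide's own derivation: it assumes at most $k/\log k$ guesses, that $A \subseteq B$ with each extra parity cutting the set by a factor $q$, and that $q = k$.

```latex
\begin{align*}
\Pr(F) &\le \sum_{i \ne 1} \Pr(Y_i \in A)
        && \text{(union bound)} \\
       &\le \sum_{i \ne 1} 2\,\frac{|A|}{|B|}
        && \text{(lemma: at most 2 preimages per decoded string)} \\
       &= \sum_{i \ne 1} \frac{2}{q^{\,c-1}}
        && \text{(subspace cardinality: } |A|/|B| = q^{-(c-1)}\text{)} \\
       &< \frac{k}{\log k} \cdot \frac{2}{q^{\,c-1}}
        = \frac{2}{k^{\,c-2} \log k}
        && \text{(taking } q = k\text{)}
\end{align*}
```

Under these assumptions the bound decays as $k^{2-c}/\log k$, so it vanishes for any constant $c \ge 2$.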
Claim
Ø Claim 1 (one deletion): at most 2 different transmitted strings can lead to the same decoded string in any wrong guess
Ø Claim 2 ($\delta$ deletions): a constant number of different transmitted strings can lead to the same decoded string in any wrong guess
[Figure: mapping from the set of all transmitted k-bit strings to the set of all GC decoder outputs; B is the set satisfying the first parity. Fixed: guess, deletion position, first parity]
Claim - Example
Ø Claim 1 (one deletion): at most 2 different transmitted strings can lead to the same decoded string in any wrong guess
Ø Example (encoding in GF(16))
• $u_1$ = 0000000000000000, symbols 0 0 0 0, parity 0
• $u_2$ = 0010000000000010, symbols α 0 0 α, parity 0
Ø 3rd bit deleted; Guess: deletion occurred in 4th block
• Received from $u_1$: 000000000000000
• Received from $u_2$: 000000000000010
• Decoded in both cases: 0000000000000000
Ø $u_3$ = 1111111111111111 and $u_4$ = 0010000000000000
Ø Two conditions: (1) Symmetry constraint; (2) Algebraic linear constraint
Claim
Ø Suppose 3rd bit is deleted, guess: deletion occurred in 4th block
Received bits: b1b2b4b5 | b6b7b8b9 | b10b11b12b13 | b14b15b16
Symbol 1 = 8b1 + 4b2 + 2b4 + b5, Symbol 2 = 8b6 + 4b7 + 2b8 + b9, Symbol 3 = 8b10 + 4b11 + 2b12 + b13, Symbol 4 = ? (erasure), all in GF(17)
Ø How many different messages can lead to the same decoded string?
Ø Symmetry constraint: same decoded string ⇒ same bit values at positions of symbols 1, 2 and 3
Ø Bits which can be different: b14, b15, b16 and b3 (deleted bit)
Ø Algebraic constraint: erasure is decoded using first parity
$4 b_{14} + 2(b_3 + b_{15}) + b_{16} = p_1$, with $b_{14}, b_{15}, b_{16}, b_3 \in GF(2)$ and $p_1 \in GF(17)$
Ø Equation has at most 2 solutions
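The count of solutions can be confirmed by brute force over the four free bits. The sketch below evaluates the left-hand side 4·b14 + 2·(b3 + b15) + b16 modulo 17 for all 16 bit assignments and tallies how many assignments reach each value of p1; the helper name is hypothetical.

```python
from collections import Counter
from itertools import product

Q = 17  # field size GF(17), as on the slide

def solution_counts():
    """Count bit assignments (b3, b14, b15, b16) for each value of the left-hand side."""
    counts = Counter()
    for b3, b14, b15, b16 in product((0, 1), repeat=4):
        lhs = (4 * b14 + 2 * (b3 + b15) + b16) % Q
        counts[lhs] += 1
    return counts

counts = solution_counts()
max_solutions = max(counts.values())
```

No value of p1 is reached by more than two assignments, which matches the claim of at most 2 solutions.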
Application to File Synchronization
• Interactive synchronization algorithm by [Venkataramanan et al. '15]
Ø Isolate single deletions, use VT codes
Ø Modification: isolate $\delta$ or fewer deletions, use GC codes
• Gain: (1) fewer communication rounds, (2) lower communication cost
[Figures: up to 75% improvement; up to 15% improvement; N = 1000 iterations]
Summary
• Guess & Check Codes for localized deletions
Ø Explicit code construction with logarithmic redundancy
Ø Deterministic polynomial time encoding and decoding
Ø Asymptotically vanishing probability of decoding failure
Ø For single or multiple windows
• Open problems
Ø Capacity of deletion channel with localized deletions?
Ø Codes for adversarial localized deletions
Ø And of course, "unrestricted" deletion capacity and codes are still open problems