Noisy Group Testing and Boolean Compressed Sensing
Venkatesh Saligrama, Boston University
Collaborator: George Atia
Outline
Examples …
Problem Setup
– Noiseless Problem
  • Average error
  • Worst-case error
  • Approximate reconstruction
– Noisy Problem
  • Additive Noise (False Alarms)
  • Dilution Effect (Misses)
Background Material
Information-Theoretic Analysis: Achievability & Converse
– Tradeoffs between #tests, #defectives, noise, …
What is Group Testing?
– Few soldiers in a large population have a disease
– Detect by testing pooled blood samples
• Compressed Sensing in the 1940s!!
Applications
– DNA screening (clones) [Du-Hwang'00]
– Streaming: telephone calls [Gilbert-Strauss'08]
– Compressed sensing [Muthukrishnan'05]
– Congested IP links [Nguyen-Thiran'07]
– Spectrum violation in cognitive radio [Atia-S-S'08]: MAC-layer model of packet drops
[Figure: MAC-layer packet-drop pattern over time: medium, few, large, few, medium, …]
The Group Testing Problem
Given:
– N items: R1, R2, R3, …, RN ∈ {0,1}
– K defectives: Rn = 1 for n ∈ S ⊂ {1, 2, …, N}
Test matrix: C = [Xmn]
– Xmn = 1: put nth item in pool (test) m
Output Ym: mth test outcome, +ve or −ve
– Channel (noiseless): Y = C ⊙ R, i.e. each Ym is the OR of the Rn included in test m
[Figure: example pooling of items A, B, C, D into tests]
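A minimal sketch of this noiseless measurement model (the variable names and the Bernoulli(p) random design below are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def boolean_tests(C, R):
    """Noiseless group testing: Y_m = OR over n of (X_mn AND R_n)."""
    # C: T x N binary test matrix, R: length-N binary defectivity vector
    return (C @ R > 0).astype(int)

# Illustrative instance: N items, K defectives, T pools, Bernoulli(p) design
rng = np.random.default_rng(0)
N, K, T = 100, 5, 60
p = 1.0 / K                                   # design probability (assumption)
C = (rng.random((T, N)) < p).astype(int)      # random test matrix [X_mn]
R = np.zeros(N, dtype=int)
R[rng.choice(N, size=K, replace=False)] = 1   # K defective items
Y = boolean_tests(C, R)                       # vector of test outcomes
```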
Illustrative Example
[Figure: binary test matrix with items 1 … N as rows and tests 1 … T as columns; the outcome vector Y is the entrywise OR of the defective items' rows, e.g. Y = 0 0 1 0 1 1 0 0 0 0 1 0 1 0 0]
Non-Adaptive vs Adaptive
– Algorithm: Greedy (matching algorithm)
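The slides do not spell out the greedy matching decoder; as a stand-in, here is a sketch of the simplest noiseless rule (often called COMP): any item that appears in a negative test cannot be defective, so only items confined to positive tests remain candidates.

```python
import numpy as np

def comp_decode(C, Y):
    """Keep an item only if every test that contains it came out positive."""
    in_negative_test = ((C == 1) & (Y[:, None] == 0)).any(axis=0)
    return np.where(~in_negative_test)[0]     # candidate defective set

# In the noiseless model the output always contains the true defectives;
# with enough tests it typically contains nothing else.
# est = comp_decode(C, Y)                     # C, Y from the earlier sketch
```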
Noiseless Problem
Given sample size N and number of defectives K:
– Find a test matrix with
  • Small misclassification error
  • #Tests = T minimum
Misclassification
– Defectives indexed by S
– Decoder: indicator function
– Pointwise error
– Other errors
Problem Statements (Noiseless)
Our Focus: Does a design matrix exist?
– Average error is small
– Worst-case error is small
– Asymptotic: error approaches zero with large K and N
– Distortion (distance function)
Noisy Case 1: Additive Noise
– Wm ~ Bernoulli(q), m = 1, 2, …, T
Motivation:
• False alarms of tests
• Background losses (wireless)
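A sketch of the additive-noise observation, assuming (as the false-alarm motivation suggests) that the noise ORs spurious positives into the noiseless outcomes:

```python
import numpy as np

def additive_noise_tests(C, R, q, rng):
    """Y = (C OR-composed-with R) OR W, with W_m ~ Bernoulli(q) i.i.d. false alarms."""
    Y_clean = (C @ R > 0).astype(int)               # noiseless outcomes
    W = (rng.random(Y_clean.shape) < q).astype(int) # false-alarm noise
    return Y_clean | W
```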
Noisy Case 2: Dilution Effect
Situation when an item is diluted in the pool
[Channel diagram: 0 → 0; 1 → 1 with probability 1−s; 1 → 0 with probability s]
• Dilution effects in blood tests or DNA screening
• Probabilistic adversarial transmission (Asilomar '08)
• Link losses [Nguyen-Thiran'07]
Dilution of an allowable transmission pattern:
– Allowable transmission pattern: 0 1 1 0 1 0 0 0 1 0 …
– Each 1 is kept with probability 1−s and diluted (flipped to 0) with probability s
– Actual transmission pattern: 0 0 1 0 1 0 0 0 0 0 …
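A sketch of the dilution model matching the pattern above: each 1 in an item's allowable transmission pattern independently survives with probability 1−s before the pools are OR-ed (illustrative, following the Z-channel picture):

```python
import numpy as np

def dilution_tests(C, R, s, rng):
    """Each participating item is independently diluted out of a pool w.p. s."""
    survive = rng.random(C.shape) < (1 - s)   # Z-channel on the matrix entries
    C_actual = C * survive                    # actual transmission pattern
    return (C_actual @ R > 0).astype(int)
```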
Problem
Does there exist a matrix C such that
– Misclassification error is small (averaged over noise)
– Worst-case, average-case, distortion, asymptotics
Prior Work: Noiseless Cases
Adaptive and non-adaptive group testing (Du-Hwang'2000)
Non-Adaptive:
• Superimposed codes (Kautz-Singleton'64)
• Deterministic designs (Dyachkov-Rykov'83)(Ruszinko'94)(Erdos'85)(Nguyen'88)(Porat'08)
• Random designs (Dyachkov'76,'82)(Sebo'85)(Macula'96)
• Compressed sensing and approximate identification (Gilbert'08)
• Two-stage disjunctive testing (Berger-Levenshtein'2002)
Our Approach / Contribution
Random coding perspective
– Information-theoretic relationship
  • Misclassification vs. existence of a random matrix
– Mutual information formula for different problems
  • Meets existing bounds
– Extensions to new problems: noisy cases
Main Result
Theorem (average error → 0 asymptotically) if:
– Sufficiency:
– Necessity:
Notation:
– Xmn generated i.i.d. Bernoulli(p)
– X(K): the collection associated with the K defective items
– X(i): a subset of i defective items in K (those that are mis-classified)
– Necessity: Fano bound
Avg Error Interpretation
– # ways i of the K items are mis-classified
– Amount of information if K−i items are revealed
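The sufficiency condition itself did not survive extraction; as a reference point, a hedged reconstruction of the standard random-coding bound that these two quantities describe (in the spirit of the Atia-Saligrama analysis) is:

\[
T \;\gtrsim\; \max_{1 \le i \le K} \frac{\log\!\big[\binom{K}{i}\binom{N-K}{i}\big]}{I\big(X^{(i)};\, Y \,\big|\, X^{(K-i)}\big)}
\]

where the numerator counts the ways exactly i of the K items can be mis-classified and the denominator is the information a test provides once the remaining K−i defectives are revealed.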
Noiseless Case Computation
– I(X(1); Y | X(K−1)) = (1−p)^(K−1) · H(p)
– since Prob(X(K−1) = 0) = (1−p)^(K−1): a single mis-classified item affects the outcome only when none of the other K−1 defectives is in the pool
Noiseless: average Pe
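A small numeric check of this quantity (illustrative values, not from the slides): evaluate (1−p)^(K−1)·H(p) over p and form the i = 1 term of the bound above.

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def info_i1(p, K):
    """I(X(1); Y | X(K-1)) = (1-p)^(K-1) * H(p) for the noiseless OR channel."""
    return (1 - p) ** (K - 1) * h2(p)

K, N = 5, 100
ps = np.linspace(0.01, 0.99, 99)
best = ps[np.argmax(info_i1(ps, K))]          # maximizing design probability p
# i = 1 term of the bound: K*(N-K) ways to mis-classify exactly one defective
print(best, np.log2(K * (N - K)) / info_i1(best, K))
```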
Typical Set Decoding
[Figure: for each i = 1, 2, …, K, a candidate sub-collection X(i) together with X(K−i) passes through the channel to produce Y]
– 1 error typical, 2 errors typical, …
Main Difficulty:
– Channel coding: another codeword being typical is independent of the test output Y
– Here: another collection can be overlapping with X(K), the true collection
Errors:
• True collection not typical
• Another collection typical
P(Ei): probability that a set which differs from the true coalition in exactly i users is jointly typical
Summary of Results
N items; K defectives; T pools/tests
– Noiseless: average Pe
– Noiseless: max Pe (exact reconstruction); compare with CS
– With distortion (approximate reconstruction)
– Additive noise (false alarms)
– Dilution effects
Conclusions
– Random coding analysis of group testing
– Mutual information expression
  • Easy to compute
– Extensions to noisy cases
Identity through interference fingerprints
– MAC-layer model of packet drops
[Figure: primary observation y(t): packet-drop pattern over time (medium, few, large, few, medium, …); discrimination step followed by identification step: violation? identify culprits]
– Our data: collision / no collision in the different time slots, e.g. 1 0 0 1 1 0 0 1 1 1 0 0 0 1 0 …
Atia, Sahai and Saligrama, Dyspan '08
Atia, Saligrama and Sahai, Asilomar '08
Theorem: For constant time-till-conviction Tc and a sparse number of culprits (K), we can support N users with throughput of order p·N (i.e., fixed utilization).