Channel-Independent Viterbi Algorithm (CIVA) for DNA Sequencing
Xiaohua (Edward) Li
Department of Electrical and Computer Engineering
State University of New York at Binghamton
Outline
• Introduction
• CIVA
• Use CIVA for base-calling
• Simulations
• Conclusions
Introduction: DNA sequencing
• DNA sequencing (base-calling)
• Procedure
– template, PCR, electrophoresis, gel image, trace file
– Base-caller
Introduction: Base-caller
• Base-calling: detect DNA base sequence
• Approaches
– Manual reading, automated by heuristic knowledge
– Image processing with signal models (ABI, Phred)
– Deconvolution with communication (ISI) signal model, e.g., MLSE, MAP
Proposed Method: CIVA
• Our method: with ISI model, robust to signal irregularity
• Difficulty comes from irregular trace signal
– Amplitude and position jitter
– Short signal, limited samples, yet time-varying
• Solution: CIVA
– joint symbol/position optimization
– without channel estimation
CIVA: Basic Idea
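\[
\begin{bmatrix}
x_n & \cdots & x_{n-P} \\
\vdots & & \vdots \\
x_{n-M} & \cdots & x_{n-P-M}
\end{bmatrix}
=
\begin{bmatrix}
h_0 & \cdots & h_L & & \\
& \ddots & & \ddots & \\
& & h_0 & \cdots & h_L
\end{bmatrix}
\begin{bmatrix}
s_n & \cdots & s_{n-P} \\
\vdots & & \vdots \\
s_{n-M-L} & \cdots & s_{n-P-M-L}
\end{bmatrix}
+ \mathbf{V}(n)
\]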
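• List all possible symbol matrices S(n)
• Find a probe for each possible S(n)
• Use all probes to determine S(n) from X(n)
Noiseless case: \( \mathbf{X}(n) = \mathbf{H}\,\mathbf{S}(n) \)
\( \mathbf{S}_i(n)\,\mathbf{g}_i = \mathbf{0}; \quad \mathbf{S}_j(n)\,\mathbf{g}_i \neq \mathbf{0},\ j \neq i \)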
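The noiseless probe test can be sketched numerically. This is only an illustration under stated assumptions: NumPy, a hypothetical 2-tap channel, the sizes L=1, M=1, P=3, binary (+1/-1) symbols, and helper names (`channel_matrix`, `symbol_matrix`, `probe_for`) that do not come from the slides. The probe is taken as a null-space vector of the candidate symbol matrix, so it is computed without looking at the channel taps.

```python
import numpy as np

def channel_matrix(h, M):
    """(M+1) x (L+M+1) Toeplitz matrix built from taps h_0..h_L."""
    L = len(h) - 1
    H = np.zeros((M + 1, L + M + 1))
    for r in range(M + 1):
        H[r, r:r + L + 1] = h
    return H

def symbol_matrix(s, L, M, P):
    """S(n) with entry (r, c) equal to s_{n-r-c}; s[d] stores s_{n-d}."""
    return np.array([[s[r + c] for c in range(P + 1)]
                     for r in range(L + M + 1)], dtype=float)

def probe_for(S):
    """Unit vector g with S g = 0 (exists here because S has more columns than rows)."""
    _, _, Vh = np.linalg.svd(S)
    return Vh[-1]

h = [1.0, 0.5]                      # hypothetical channel, never used to build the probe
L, M, P = 1, 1, 3
H = channel_matrix(h, M)

S_i = symbol_matrix([1, -1, 1, 1, -1, 1], L, M, P)   # candidate that was actually sent
S_j = symbol_matrix([1, 1, 1, -1, -1, 1], L, M, P)   # a different candidate
g_i = probe_for(S_i)

res_match = np.linalg.norm(H @ S_i @ g_i)   # ||X(n) g_i|| when S(n) = S_i
res_other = np.linalg.norm(H @ S_j @ g_i)   # ||X(n) g_i|| when S(n) = S_j
print(res_match, res_other)
```

In this particular example the residual vanishes (up to rounding) only for the matching candidate, which is the property the probes rely on.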
CIVA: Properties
• CIVA: a trellis searching algorithm where metrics are calculated by probes
• Properties
– Near optimal even for ill-conditioned channels
– No channel estimation, channel independent
– High computational complexity
• Applications
– Direct application: systems with simple signaling and short channels, e.g., GSM, sensor networks, base-calling
– Future: more applications with complexity reduction
CIVA for Base-calling
• Model trace signal with communication system
• Channel effect introduces ISI: \( \mathbf{X}(n) = \mathbf{H}\,\mathbf{B}(n) + \mathbf{V}(n) \)
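A toy illustration of how ISI arises in this model, assuming NumPy, one base per sample, a hypothetical 2-tap pulse `h` shared by all four trace channels, and a C, A, G, T channel order. None of these choices is taken from the slides; real traces have many samples per base and irregular pulses.

```python
import numpy as np

ORDER = "CAGT"   # assumed order of the four trace channels

def synth_trace(bases, h):
    """Filter the one-hot base sequence of each channel with the pulse h."""
    b = np.zeros((len(bases), 4))
    for n, base in enumerate(bases):
        b[n, ORDER.index(base)] = 1.0
    # x_n = sum_k h_k b_{n-k}, applied channel by channel
    cols = [np.convolve(b[:, k], h)[:len(bases)] for k in range(4)]
    return np.stack(cols, axis=1)

trace = synth_trace("CAG", h=[1.0, 0.4])
print(trace)
```

Each row of `trace` mixes the current base with a scaled copy of the previous one, which is the interference the detector has to undo.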
Symbol Matrix Structure
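Symbol matrix:
\[
\mathbf{B}(n) =
\begin{bmatrix}
\mathbf{b}_n & \cdots & \mathbf{b}_{n-P} \\
\vdots & & \vdots \\
\mathbf{b}_{n-M-L} & \cdots & \mathbf{b}_{n-P-M-L}
\end{bmatrix}
\]
Each symbol has 5 values:
\[
\mathbf{b}_n =
\begin{bmatrix} b_n^{C} \\ b_n^{A} \\ b_n^{G} \\ b_n^{T} \end{bmatrix}
\in
\left\{
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix},
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\right\}
\]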
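The five-valued symbols and the block symbol matrix can be sketched as follows. Assumptions that are not in the slides: NumPy, the C, A, G, T component order, the character "-" standing for the all-zero symbol, and the helper names `encode_bases` and `block_symbol_matrix`.

```python
import numpy as np

ORDER = "CAGT"   # assumed component order; "-" (hypothetical) marks the all-zero symbol

def encode_bases(seq):
    """Map each character to a 4-vector: a unit vector for a base, zeros for '-'."""
    b = np.zeros((len(seq), 4))
    for n, ch in enumerate(seq):
        if ch != "-":
            b[n, ORDER.index(ch)] = 1.0
    return b

def block_symbol_matrix(b, n, L, M, P):
    """B(n): block (r, c) holds the column vector b_{n-r-c}."""
    rows = [[b[n - r - c].reshape(4, 1) for c in range(P + 1)]
            for r in range(L + M + 1)]
    return np.block(rows)

b = encode_bases("CAGT-")
B_n = block_symbol_matrix(b, n=4, L=1, M=1, P=1)
print(b)
print(B_n.shape)
```

With L=M=P=1 the block matrix stacks three block rows of two symbols each, so it covers the four most recent symbols.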
Probe Construction
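For each symbol matrix, find a probe: \( \mathbf{B}_i \rightarrow \mathbf{G}_i \)
A probe usually contains several testing vectors: \( \mathbf{G}_i = \{\mathbf{g}_{i1}, \ldots, \mathbf{g}_{ik}\} \)
Probes satisfy metric conditions:
\[
f(\mathbf{B}_j, \mathbf{G}_i)
\begin{cases}
= 0, & \text{if } j = i \\
\neq 0, & \text{if } j \neq i
\end{cases}
\]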
Probe Construction Example
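Assume we have four symbol matrices:
\[
\mathbf{B}_i \in \{[0\ 0],\ [0\ 1],\ [1\ 0],\ [1\ 1]\}
\]
Find testing vectors for each symbol matrix (\( \mathbf{B}_i \mathbf{g}_i = \mathbf{0} \)):
\[
\mathbf{g}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},\quad
\mathbf{g}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\quad
\mathbf{g}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix},\quad
\mathbf{g}_3 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.
\]
Construct probes (* denotes nonzero testing):
\[
\mathbf{G}_0 = \{\mathbf{g}_0, \mathbf{g}_1, \mathbf{g}_2, \mathbf{g}_3\},\quad
\mathbf{G}_1 = \{\mathbf{g}_0^{*}, \mathbf{g}_1, \mathbf{g}_2^{*}, \mathbf{g}_3^{*}\},
\]
\[
\mathbf{G}_2 = \{\mathbf{g}_0^{*}, \mathbf{g}_1^{*}, \mathbf{g}_2, \mathbf{g}_3^{*}\},\quad
\mathbf{G}_3 = \{\mathbf{g}_0^{*}, \mathbf{g}_1^{*}, \mathbf{g}_2^{*}, \mathbf{g}_3\}
\]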
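The pattern of zero and nonzero tests in this example can be checked mechanically. The sketch below assumes NumPy and the concrete values B_0..B_3 = [0 0], [0 1], [1 0], [1 1] and g_0..g_3 = [1 1], [1 0], [0 1], [1 -1] (transposed); the choice of g_0 is arbitrary, since B_0 annihilates every vector.

```python
import numpy as np

B = [np.array([0, 0]), np.array([0, 1]), np.array([1, 0]), np.array([1, 1])]
g = [np.array([1, 1]), np.array([1, 0]), np.array([0, 1]), np.array([1, -1])]

# zero_test[i][m] is True when B_i g_m = 0, i.e. g_m is a zero test inside probe G_i
zero_test = [[int(B_i @ g_m) == 0 for g_m in g] for B_i in B]

for i, row in enumerate(zero_test):
    labels = ["g%d" % m if is_zero else "*g%d" % m for m, is_zero in enumerate(row)]
    print("G%d = {%s}" % (i, ", ".join(labels)))
```

Every probe other than G_0 ends up with exactly one zero test and three nonzero tests, marked with `*` in the printout.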
Trellis Metric Calculation
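For each received sample matrix \( \mathbf{X}(n) \), calculate metrics for every probe \( \mathbf{G}_i \):
\[
f[\mathbf{X}(n), \mathbf{G}_i] = \sum_{\mathbf{g}_m \in \mathbf{G}_i} f[\mathbf{X}(n), \mathbf{g}_m]
\]
where
\[
f[\mathbf{X}(n), \mathbf{g}_m] =
\begin{cases}
\|\mathbf{X}(n)\,\mathbf{g}_m\|, & \text{if } \mathbf{B}_i \mathbf{g}_m = \mathbf{0} \\
\|\mathbf{X}(n)\,\mathbf{g}_m\|^{-1}, & \text{if } \mathbf{B}_i \mathbf{g}_m \neq \mathbf{0}
\end{cases}
\]
Example:
\[
f[\mathbf{X}(n), \mathbf{G}_1] = \|\mathbf{X}(n)\,\mathbf{g}_1\| + \sum_{m=0,2,3} \|\mathbf{X}(n)\,\mathbf{g}_m\|^{-1}
\]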
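A small sketch of the probe metric for the four-matrix example. Assumptions made here and not confirmed by the slides: NumPy, a zero-test cost equal to the norm of X(n) g_m, a nonzero-test cost equal to the inverse of that norm, a scalar channel gain of 2 and a small fixed perturbation standing in for noise. The inverse-norm cost depends on signal scale, which is why the gain is chosen larger than 1 in this toy case.

```python
import numpy as np

B = [np.array([[0.0, 0.0]]), np.array([[0.0, 1.0]]),
     np.array([[1.0, 0.0]]), np.array([[1.0, 1.0]])]
g = [np.array([1.0, 1.0]), np.array([1.0, 0.0]),
     np.array([0.0, 1.0]), np.array([1.0, -1.0])]

def probe_metric(X, B_i, tests, eps=1e-12):
    """Sum of per-vector costs for the probe built around B_i."""
    total = 0.0
    for g_m in tests:
        r = np.linalg.norm(X @ g_m)
        if np.allclose(B_i @ g_m, 0.0):
            total += r                    # zero test: residual should vanish
        else:
            total += 1.0 / max(r, eps)    # nonzero test: residual should stay large
    return total

# Hypothetical received matrix: gain 2 times B_2, plus a small perturbation
X = 2.0 * B[2] + np.array([[0.05, -0.03]])
metrics = [probe_metric(X, B_i, g) for B_i in B]
best = int(np.argmin(metrics))
print(metrics, best)
```

In this example the smallest metric belongs to the probe of B_2, the matrix that was used to generate X.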
CIVA Trellis Search
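For a sequence of received sample matrices, find the optimal probe matrices:
\[
\{\mathbf{G}(n)\} = \arg\min \sum_n f[\mathbf{X}(n), \mathbf{G}(n)]
\]
Symbol sequence is determined by probe sequence.
CIVA realizes this procedure by trellis search:
\[
\Gamma_j(n) = \min_i \{\Gamma_i(n-1) + f[\mathbf{X}(n), \mathbf{G}_l]\}
\]
where \( \mathbf{G}_l \) is the probe on the branch from state i to state j.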
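The trellis recursion is a standard minimum-cost path search, sketched below with NumPy. The branch costs are random placeholders standing in for the probe metrics, and the function name `trellis_search`, the three-state trellis and the five steps are hypothetical choices made only to keep the example small; a brute-force enumeration is included as a cross-check.

```python
import itertools
import numpy as np

def trellis_search(cost):
    """cost[n, i, j]: branch metric at step n for moving from state i to state j."""
    N, K, _ = cost.shape
    gamma = np.zeros(K)                      # accumulated metrics, any start state allowed
    back = np.zeros((N, K), dtype=int)       # best predecessor of each state at each step
    for n in range(N):
        cand = gamma[:, None] + cost[n]      # cand[i, j] = gamma_i(n-1) + branch cost
        back[n] = np.argmin(cand, axis=0)
        gamma = cand.min(axis=0)
    j = int(np.argmin(gamma))
    path = [j]
    for n in range(N - 1, -1, -1):           # trace the survivor path backwards
        j = int(back[n, j])
        path.append(j)
    path.reverse()
    return path, float(gamma.min())

rng = np.random.default_rng(0)
cost = rng.random((5, 3, 3))                 # placeholder branch metrics
path, best_cost = trellis_search(cost)

# Brute-force check over all 3**6 state sequences
brute_cost = min(
    sum(cost[n, p[n], p[n + 1]] for n in range(5))
    for p in itertools.product(range(3), repeat=6)
)
print(path, best_cost, brute_cost)
```

The search keeps one survivor per state at each step, so its cost grows linearly with sequence length, whereas the brute-force check grows exponentially.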
Special Consideration for DNA Trace Signal
• Amplitude jitter
– solved inherently
• Limited trace samples and time varying
– fast convergence of CIVA
• Timing jitter
– looking for best timing for each sample
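One way to picture the timing search is a small window scan around the nominal sample position, keeping the position with the lowest metric. The sketch below is an assumption-laden toy: the function `best_timing`, the window size, and the stand-in metric (negative amplitude, i.e. simple peak picking) are all hypothetical; in CIVA the metric would be the probe-based one.

```python
def best_timing(trace, nominal, max_shift, metric):
    """Return the sample index near `nominal` whose metric is smallest."""
    lo = max(0, nominal - max_shift)
    hi = min(len(trace) - 1, nominal + max_shift)
    return min(range(lo, hi + 1), key=lambda t: metric(trace[t]))

trace = [0.1, 0.3, 0.9, 0.4, 0.1]           # toy one-channel trace
t_best = best_timing(trace, nominal=1, max_shift=1, metric=lambda x: -x)
print(t_best)
```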
Simulations: Experiment 1
• A trace file with reference bases from Staden Package
• Normalize trace, find approximate base interval, apply CIVA with M=P=1 (2-tap channel, 25 trellis states, 125 transition paths)
• Results: less than 3% error compared with reference
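The trellis size quoted above is consistent with one possible reading, which is an interpretation and not stated on the slide: with five symbol values, a state that remembers two consecutive symbols gives 25 states, and a transition that spans three consecutive symbols gives 125 paths. The "-" label for the all-zero symbol is hypothetical.

```python
from itertools import product

SYMBOLS = ["C", "A", "G", "T", "-"]               # five symbol values
states = list(product(SYMBOLS, repeat=2))         # two remembered symbols per state
transitions = list(product(SYMBOLS, repeat=3))    # three symbols per transition
print(len(states), len(transitions))
```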
Simulations: Experiment 1
• Two zoom-in sections
– #1. with confident base detections
– #2. with undetermined N
Simulations: Experiment 2
• A low-quality gel image from Prof. S. Gal
• Scan the gel image to obtain the trace signal
Simulations: Experiment 2
• Apply CIVA for base-calling
• A zoom-in section
Conclusions
• CIVA algorithm proposed for DNA sequence base-calling
• Robust to signal irregularity with affordable computational complexity
• Experiments show positive performance
• More experiments are required for evaluation