Indexation, Retrieval and Detection Techniques for Spoken Term Detection
Doğan Can
Boğaziçi University, Department of Electrical & Electronics Engineering
BUSIM Lab
January 26, 2010
Outline
- Introduction
- Lattice Indexation/Search Framework
  - Preliminaries
  - Spoken Utterance Retrieval with Factor Transducer
  - 2-Pass Spoken Term Detection with Factor Transducer
  - Spoken Term Detection with Modified Factor Transducer
  - Spoken Term Detection with Timed Factor Transducer
  - Experimental Results
- Retrieval
  - Query Forming and Expansion for Phonetic Search
  - Experimental Results
- Thresholding for Spoken Term Detection
  - Global Thresholding
  - Term Weighted Value Based Term Specific Thresholding
  - Score Distribution Based Term Specific Thresholding
  - Experimental Results
- Application: Sign Dictionary
2 / 45
Application: Sign Dictionary
3 / 45
[Embedded media: demo.mov (video/quicktime)]
Comparison of Speech Retrieval Tasks
Spoken Document Retrieval vs. Spoken Utterance Retrieval vs. Spoken Term Detection

Task | Query | Relation          | Return (Text Analogue)
SDR  | long  | lexical+semantic  | documents (relevant pages)
SUR  | short | inclusion (exact) | utterances (sentences)
STD  | short | exact match       | occurrences (positions)
4 / 45
Challenges of the Spoken Term Detection Task

- Aim: Open vocabulary search
  Reference: "Taipei night view"
- Challenge: Unreliable transcriptions
  ASR output: "tie bay light view"
  1. High error rate of one-best transcripts
     Alternative transcriptions: [tie bay [light 0.6, night 0.4] view]
  2. Out-of-vocabulary (OOV) queries
     Phonetic search: /t ay b ey n ay t v iy w/
  3. Boost in false alarms due to 1 and 2
5 / 45
Challenges of the Spoken Term Detection Task

- Aim: Open vocabulary search
  Reference: "Taipei night view"
- Challenge: Unreliable transcriptions
  ASR output: "tie bay light view"
  1. High error rate of one-best transcripts
     → Efficient Indexing and Search Framework for STD
  2. Out-of-vocabulary (OOV) queries
     → Utilizing Weighted OOV Query Pronunciations
  3. Boost in false alarms due to 1 and 2
     → Exploiting Score Statistics for STD
5 / 45
Previous Work in the Field

How to Index and Search Lattices
- General Indexation of Weighted Automata [Saraclar and Sproat, 2004; Allauzen et al., 2004]
- Position Specific Posterior Lattices (PSPL) [Chelba and Acero, 2005]
- Time-based Merging for Indexing (TMI) [Zhou et al., 2006]

How to Alleviate the OOV Issue
- Search on sub-word decoding [Saraclar and Sproat, 2004; Siohan and Bacchiani, 2005; Mamou et al., 2007]
- Search on the sub-word representation of word decoding [Chaudhari and Picheny, 2007]
- Phonetic query expansion [Li et al., 2000]
6 / 45
Anatomy of a Spoken Term Detection (STD) System

[Block diagram: INDEXING — ASR transcribes the Speech Database into ASR Output, which is compiled into an Index. RETRIEVAL — the user's Query is preprocessed and passed to the Search Engine, which consults the Index. DETECTION — each result whose score is larger than τ is returned; otherwise it is omitted.]
7 / 45
Notation & Definitions: Semirings

Definition
A system (K, ⊕, ⊗, 0̄, 1̄) is a semiring if:
- (K, ⊕, 0̄) is a commutative monoid with identity element 0̄;
- (K, ⊗, 1̄) is a monoid with identity element 1̄;
- ⊗ distributes over ⊕;
- 0̄ is an annihilator for ⊗: for all a ∈ K, a ⊗ 0̄ = 0̄ ⊗ a = 0̄.

Semiring      | Set (K)    | ⊕     | ⊗ | 0̄  | 1̄
Boolean B     | {0, 1}     | ∨     | ∧ | 0  | 1
Probability R | R+         | +     | × | 0  | 1
Log L         | R ∪ {+∞}   | ⊕log  | + | +∞ | 0
Tropical T    | R ∪ {+∞}   | min   | + | +∞ | 0
Tropical T′   | R ∪ {−∞}   | max   | + | −∞ | 0

where a ⊕log b = −log(e^−a + e^−b)
8 / 45
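The log and tropical semiring operations from the table can be sketched in a few lines of Python; this is a minimal illustration of the definitions, not code from the presentation.

```python
import math

# Weights are stored as negative log probabilities, as usual for the log
# and tropical semirings (0̄ = +inf, 1̄ = 0 in both).

def log_plus(a, b):
    """⊕ of the log semiring: a ⊕log b = -log(e^-a + e^-b)."""
    if a == math.inf:
        return b
    if b == math.inf:
        return a
    # Stable form: min(a, b) - log(1 + e^-|a-b|)
    return min(a, b) - math.log1p(math.exp(-abs(a - b)))

def log_times(a, b):
    """⊗ of the log semiring is ordinary addition."""
    return a + b

def tropical_plus(a, b):
    """⊕ of the tropical semiring is min (Viterbi best path)."""
    return min(a, b)

# Two alternative paths with probabilities .6 and .4 sum to probability 1
# in the log semiring, i.e. weight -log(1) = 0.
w = log_plus(-math.log(0.6), -math.log(0.4))
```

The same pair of paths under the tropical semiring would instead keep only the better score, −log 0.6.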
Notation & Definitions: Weighted Finite-State Automata

Definition
A weighted finite-state transducer T over a semiring K is an 8-tuple T = (Σ, Δ, Q, I, F, E, λ, ρ):
- Σ : input alphabet; e.g. Σ = {m, e}
- Δ : output alphabet; e.g. Δ = {h, a, v}
- Q : set of states;
- I ⊆ Q : set of initial states;
- F ⊆ Q : set of final states;
- E ⊆ Q × (Σ ∪ {ε}) × (Δ ∪ {ε}) × K × Q : set of transitions;
- λ : I → K : initial weight function;
- ρ : F → K : final weight function.

[Example transducer: states qa/1, qb, qc/1 with arcs m:h/.5, e:a/.5, e:a/.7, e:v/.3 and ε:ε/1.]
9 / 45
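The 8-tuple can be written out concretely for the small example transducer on this slide. The Python container names are mine, and the arc endpoints are my assumption since the figure layout did not survive extraction.

```python
# Illustrative rendering of T = (Σ, Δ, Q, I, F, E, λ, ρ) for the example
# transducer of this slide. Arc endpoints are assumed, not taken from the
# (lost) figure.

EPS = "<eps>"

Sigma = {"m", "e"}       # input alphabet Σ
Delta = {"h", "a", "v"}  # output alphabet Δ
Q = {"qa", "qb", "qc"}   # states
I = {"qa"}               # initial states
F = {"qc"}               # final states
E = {                    # transitions: (src, input, output, weight, dst)
    ("qa", "m", "h", 0.5, "qb"),
    ("qa", "e", "a", 0.5, "qb"),
    ("qb", "e", "a", 0.7, "qc"),
    ("qb", "e", "v", 0.3, "qc"),
    ("qb", EPS, EPS, 1.0, "qc"),
}
lam = {"qa": 1.0}        # initial weight function λ
rho = {"qc": 1.0}        # final weight function ρ

# Sanity check: every transition draws its labels from Σ ∪ {ε} and Δ ∪ {ε}.
assert all(i in Sigma | {EPS} and o in Delta | {EPS}
           for _, i, o, _, _ in E)
```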
Notation & Definitions: Factor Automaton

Definition
Given two strings u, v ∈ Σ*, v is a factor (substring) of u if u = xvy for some x, y ∈ Σ*. More generally, v is a factor of L ⊆ Σ* if v is a factor of some u ∈ L.

Definition
The factor automaton F(u) of u is the minimal deterministic finite-state acceptor recognizing exactly Xu, the set of factors of u.

Definition
Similarly, the factor automaton F(A) of an automaton A is the minimal deterministic finite-state acceptor recognizing exactly XA, the set of factors of the strings recognized by A.
10 / 45
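The definition of Xu translates directly into a one-line enumeration; this helper is my own illustration, not part of the talk.

```python
# Enumerate X_u, the set of factors (substrings) of u, directly from the
# definition u = x v y with x, y ∈ Σ*.

def factors(u):
    """All factors of u, including the empty string and u itself."""
    return {u[i:j] for i in range(len(u) + 1)
                   for j in range(i, len(u) + 1)}

# Over Σ = {a, b}: "ab" is a factor of "aab" ("aab" = "a" + "ab" + ""),
# while "ba" is not.
X = factors("aab")  # {"", "a", "b", "aa", "ab", "aab"}
```

The factor automaton F(u) is the minimal DFA accepting exactly this set.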
Indexing WFSA for Spoken Utterance Retrieval

[Example lattice: states 0 → 1 → 2 with arc good/1, followed by evening/.6 or morning/.4.]

Setup: For each speech utterance ui, i = 1, ..., n,
- a weighted automaton Ai over Σ and L; e.g. the word lattice output by ASR.

Objective: Create a full index to directly search for any factor of these automata; e.g. "evening", "good morning", etc.

Notes:
- different from classical indexation: the input data is uncertain
- must make use of the weights
11 / 45
Factor Transducer Construction: Toy Example — Factor Selection

[Toy lattice: states 0, 1, 2 with arcs lose:ε/.4 and find:ε/.6 from 0 to 1 and yourself:ε/1 from 1 to 2; new states s and e are connected to every lattice state by ε:ε/1 arcs out of s and ε:i/1 arcs into e.]

1. Replace each transition (p, a, w, q) by (p, a, ε, w, q)
2. Create a new state s ∉ Qi and make s the unique initial state
3. Create a new state e ∉ Qi and make e the unique final state
4. Create a new transition (s, ε, ε, d[q], q) for each state q ∈ Qi
5. Create a new transition (q, ε, i, f[q], e) for each state q ∈ Qi
12 / 45
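The five construction steps can be sketched directly on this toy lattice. The arc-tuple representation and helper names are my own; d[q] and f[q] are the forward and backward probabilities of each state, which here are all 1 since the branch weights .6 and .4 sum to one.

```python
# Factor-selection steps 1-5 on the toy lattice
# (find/.6 | lose/.4) yourself/1. Illustrative sketch only.

EPS = "<eps>"

states = [0, 1, 2]
arcs = [(0, "find", 0.6, 1), (0, "lose", 0.4, 1),
        (1, "yourself", 1.0, 2)]
utt_id = 1  # index i of this utterance in the database

# Step 1: keep the word on the input side, emit epsilon on the output.
factor_arcs = [(p, a, EPS, w, q) for (p, a, w, q) in arcs]

# Steps 2-3: fresh unique initial and final states.
s, e = "s", "e"

# Step 4: arc s --ε:ε/d[q]--> q for every lattice state q, where d[q]
# is the forward probability of reaching q (all 1 in this lattice).
d = {0: 1.0, 1: 1.0, 2: 1.0}
factor_arcs += [(s, EPS, EPS, d[q], q) for q in states]

# Step 5: arc q --ε:i/f[q]--> e for every state q, where f[q] is the
# backward probability from q and the output label is the utterance index.
f = {0: 1.0, 1: 1.0, 2: 1.0}
factor_arcs += [(q, EPS, utt_id, f[q], e) for q in states]
```

Any path from s to e now reads a factor of the lattice on its input side and emits the utterance index on its output side.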
Factor Transducer Construction: Toy Example — Optimization

1. Weighted ε-removal
2. Weighted determinization over L
3. Weighted minimization over L, by viewing Ti as an acceptor

Before: [the factor-selection transducer with states 0, 1, 2, s, e; arcs lose:ε/.4, find:ε/.6, yourself:ε/1, plus the added ε:ε/1 and ε:i/1 arcs.]

After: [a compact transducer with states 0, 1, 2, 3; arcs lose:ε/.4, find:ε/.6, two yourself:ε/1 arcs, and ε:i/1 arcs into the final state.]
13 / 45
Full Factor Transducer Construction & Search

Full Factor Transducer: given Ti for each Ai, i = 1, ..., n:
1. Take the union, U = ⋃i Ti, i = 1, ..., n
2. Weighted ε-removal, determinization (and minimization)
3. Define T as the transducer obtained after sorting the input labels

Search:
1. The user query X can be any weighted automaton (regular expressions!)
2. Compose X with T on the input side and project the result onto the output labels: P = Π2(X ◦ T)
3. Weighted ε-removal + pruning, or n-shortest paths + sorting with the shortest path algorithm
14 / 45
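Per utterance, the score that P = Π2(X ◦ T) assigns to a query is the expected count of the query under the lattice's probability distribution. A brute-force check by path enumeration makes this concrete; the toy lattices mirror the example on the next slide, and the helper names are mine.

```python
# Expected count of a query in a lattice: sum over paths of
# path probability times the number of occurrences on that path.

def expected_count(paths, query):
    """paths: list of (probability, word list) covering the lattice."""
    total = 0.0
    for prob, words in paths:
        occ = sum(1 for w in words if w == query)  # single-word query
        total += prob * occ
    return total

# Lattice 1: the single path "a a" with probability 1.
lattice1 = [(1.0, ["a", "a"])]
# Lattice 2: first word is b (.6) or a (.4), second word is a.
lattice2 = [(0.6, ["b", "a"]), (0.4, ["a", "a"])]

c1 = expected_count(lattice1, "a")  # 2.0
c2 = expected_count(lattice2, "a")  # 0.6*1 + 0.4*2 = 1.4
```

The factor transducer computes exactly these numbers without enumerating paths, via the semiring operations of composition.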
Spoken Utterance Retrieval with Factor Transducer [Allauzen et al., 2004]

Database:
1. "a a": states 0 → 1 → 2 with arcs a/1, a/1
2. "[b .6, a .4] a": states 0 → 1 → 2 with first arc b/.6 or a/.4, second arc a/1

Query: a single-arc automaton accepting "a" with weight 1

Index: [the factor transducer over both lattices, with input arcs a:ε/1, b:ε/1, a:ε/1 and output arcs ε:1/2, ε:2/1.4, ε:2/.6, ε:1/1, ε:2/.4, ε:2/.6 encoding (utterance ID, expected count) pairs.]
15 / 45
Spoken Utterance Retrieval with Factor Transducer [Allauzen et al., 2004]

Database:
1. "a a"
2. "[b .6, a .4] a"

Query: "a"

Results: [composition output with arcs a:ε/1, ε:1/2, ε:2/1.4]

(Utterance ID, Expected Count):
1. (1, 2)
2. (2, 1.4)
15 / 45
2-pass STD with Factor Transducer [Parlak and Saraclar, 2008; Can et al., 2009]

Procedure
- For each query:
  - Obtain (utterance ID, expected count) pairs (1st pass)
  - For each utterance with expected count > τ:
    - Align the query with the utterance → time interval (2nd pass) [Parlak and Saraclar, 2008]
    - Align the query with the lattice → time interval (2nd pass) [Can et al., 2009]
  - Return (utterance ID, time interval, expected count) triplets

Problems
- The 2nd pass takes time → slow
- Multiple occurrences of a query in the same utterance contribute to the same expected count
- Ideal for Spoken Utterance Retrieval, but not so for Spoken Term Detection
16 / 45
Modified Factor Transducer Construction: Toy Example — Factor Selection

[Toy lattice with state times L[0] = 0, L[1] = 1, L[2] = 3; arcs lose:0-1/.4, find:0-1/.6, yourself:1-3/1; new states s and e are connected to every lattice state by ε:ε/1 and ε:i/1 arcs.]

1. Replace each transition (p, a, w, q) by (p, a, Li[p]-Li[q], w, q)
2. Create a new state s ∉ Qi and make s the unique initial state
3. Create a new state e ∉ Qi and make e the unique final state
4. Create a new transition (s, ε, ε, d[q], q) for each state q ∈ Qi
5. Create a new transition (q, ε, i, f[q], e) for each state q ∈ Qi
17 / 45
Factor Transducer vs. Modified Factor Transducer (After Optimization)

Factor Transducer: [states 0-3; arcs lose:ε/.4, find:ε/.6, two yourself:ε/1 arcs, and ε:i/1 arcs into the final state.]

Modified Factor Transducer: [same topology; arcs lose:0-1/.4, find:0-1/.6, two yourself:1-3/1 arcs carrying time intervals on the output side, and ε:i/1 arcs into the final state.]
18 / 45
Spoken Term Detection with Modified Factor Transducer [Can et al., 2009]

Database:
1. "a a": arcs a:0.1-1/1, then a:1-1.8/.6 or a:1-1.9/.4
2. "[b .6, a .4] a": arcs b:0.2-1/.6 or a:0.1-1/.4, then a:1-1.9/1

Index: [modified factor transducer with input arcs a:0-1/1, a:1-2/1, b:0-1/1 and output arcs ε:1/1, ε:2/.4, ε:2/1, ε:2/.6 encoding (utterance ID, posterior) information per interval.]
19 / 45
Spoken Term Detection with Modified Factor Transducer [Can et al., 2009]

CLUSTERING: overlapping time intervals are merged.

Database:
1. "a a": arcs a:0.1-1/1, then a:1-1.9/.6 or a:1-1.9/.4
2. "[b .6, a .4] a": arcs b:0.2-1/.6 or a:0.1-1/.4, then a:1-1.9/1

Index: [same index as on the previous frame]
19 / 45
Spoken Term Detection with Modified Factor Transducer [Can et al., 2009]

QUANTIZATION: time intervals are quantized.

Database:
1. "a a": arcs a:0-1/1, then a:1-2/.6 or a:1-2/.4
2. "[b .6, a .4] a": arcs b:0-1/.6 or a:0-1/.4, then a:1-2/1

Index: [same index as on the previous frame]
19 / 45
Spoken Term Detection with Modified Factor Transducer [Can et al., 2009]

Database:
1. "a a": arcs a:0-1/1, then a:1-2/.6 or a:1-2/.4
2. "[b .6, a .4] a": arcs b:0-1/.6 or a:0-1/.4, then a:1-2/1

Query: a single-arc automaton accepting "a" with weight 1

Index: [same index as on the previous frames]
19 / 45
Spoken Term Detection with Modified Factor Transducer [Can et al., 2009]

Database: (as above)   Query: "a"

Results: [composition output with input arcs a:0-1/1, a:1-2/1 and output arcs ε:1/1, ε:2/.4, ε:2/1, ε:1/1]

(Utterance ID, Time Interval, Posterior Probability):
1. (1, 0-1, 1)
2. (1, 1-2, 1)
3. (2, 0-1, .4)
4. (2, 1-2, 1)
19 / 45
1-pass STD with Modified Factor Transducer [Can et al., 2009]

Procedure
- For each query:
  - Obtain (utterance ID, time interval, posterior probability) triplets
  - Return triplets with posterior probability > τ

Highlights
- No 2nd pass → fast
- No multiple occurrence problem: every distinct interval leads to a separate index entry → overlapping intervals are clustered
- Time interval mismatches → common paths are reduced → larger index → time intervals are quantized

Problems
- The index is non-deterministic!
20 / 45
Timed Factor Transducer Construction: Toy Example — Factor Selection

[Toy lattice with state times L[0] = 0, L[1] = 1, L[2] = 3; arcs lose:1/.4,0,0, find:1/.6,0,0, yourself:1/1,0,0; new states s and e are connected by ε:ε arcs carrying the state time in the T component (e.g. ε:ε/1,1,0 and ε:ε/1,3,0) and ε:i arcs carrying it in the T′ component (e.g. ε:i/1,0,1 and ε:i/1,0,3).]

1. Replace each arc weight w ∈ L with {w, 1̄, 1̄} ∈ L × T × T′ (recall 1̄ = 0 in the tropical semirings)
2. Create a new state s ∉ Qi and make s the unique initial state
3. Create a new state e ∉ Qi and make e the unique final state
4. Create a new arc (s, ε, ε, {d[q], Li[q], 1̄}, q) for q ∈ Qi
5. Create a new arc (q, ε, i, {f[q], 1̄, Li[q]}, e) for q ∈ Qi
21 / 45
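The triple weight lives in the product semiring L × T × T′: ⊗ is componentwise + in all three components, so along a path the posterior multiplies (as a negative log) while the start and end times placed on the s/e arcs are carried through; ⊕ is (⊕log, min, max), so merging alternative paths for the same factor log-adds the posteriors and keeps the widest time interval. A minimal sketch, with the class name my own:

```python
import math

# Product semiring L × T × T' for the timed factor transducer:
# a weight is (-log posterior, begin time, end time).

class TFTWeight:
    def __init__(self, neglog, begin, end):
        self.neglog, self.begin, self.end = neglog, begin, end

    def otimes(self, other):
        """⊗: componentwise + (log-times, tropical-times)."""
        return TFTWeight(self.neglog + other.neglog,
                         self.begin + other.begin,
                         self.end + other.end)

    def oplus(self, other):
        """⊕: (⊕log, min, max) — log-add posteriors, widen the interval."""
        a, b = self.neglog, other.neglog
        neglog = min(a, b) - math.log1p(math.exp(-abs(a - b)))
        return TFTWeight(neglog,
                         min(self.begin, other.begin),
                         max(self.end, other.end))

ONE = TFTWeight(0.0, 0.0, 0.0)  # 1̄: posterior 1, zero time offsets
```

Two overlapping occurrences with posteriors .6 and .4 over intervals 0-1.0 and 0.2-0.8 thus merge into a single posterior-1 hit over 0-1.0, which is exactly the clustering behavior described on the following slides.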
Factor Transducer vs. Timed Factor Transducer (After Optimization)

Factor Transducer: [states 0-3; arcs lose:ε/.4, find:ε/.6, two yourself:ε/1 arcs, and ε:i/1 arcs into the final state.]

Timed Factor Transducer: [same topology; arcs lose:1/.4,0,1, find:1/.6,0,1, yourself:1/1,1,3, yourself:1/1,0,2, and ε:i/1,0,0 arcs into the final state.]
22 / 45
Spoken Term Detection with Timed Factor Transducer

Database:
1. "a b", times [0.1, 1, 1.8]: arcs a:a/1, b:b/1
2. "b a", times [0.2, 1, 1.9]: arcs b:b/1, a:a/1

Index: [timed factor transducer with input arcs a:1/1,.1,1, b:1/1,0,.8, a:1/1,0,.9, b:1/1,.2,1 and output arcs ε:1/1,0,0, ε:2/1,.9,.9, ε:1/1,.8,.8, ε:2/1,0,0.]
23 / 45
Spoken Term Detection with Timed Factor Transducer

CLUSTERING

Database:
1. "a b", times [0.1, 1, 1.8]: arcs a:1/1, b:1/1
2. "b a", times [0.2, 1, 1.9]: arcs b:1/1, a:1/1

Index: [same index as on the previous frame]
23 / 45
Spoken Term Detection with Timed Factor Transducer

Database:
1. "a b", times [0.1, 1, 1.8]
2. "b a", times [0.2, 1, 1.9]

Query: a single-arc automaton accepting "b" with weight 1/1,1,1

Index: [same index as on the previous frames]
23 / 45
Spoken Term Detection with Timed Factor Transducer

Database: (as above)   Query: "b"

Results: [composition output with arcs b:1/1,.2,1, ε:2/1,0,0 and ε:1/1,.8,.8]

(Utterance ID, Time Interval, Posterior Probability):
1. (1, 1-1.8, 1)
2. (2, 0.2-1, 1)
23 / 45
1-pass STD with Timed Factor Transducer

Procedure
- For each query:
  - Obtain (utterance ID, time interval, posterior probability) triplets
  - Return triplets with posterior probability > τ

Highlights
- No 2nd pass → fast
- No multiple occurrence problem: every distinct interval leads to a separate index entry → overlapping intervals are clustered
- No time interval mismatch problem → efficient optimization (almost deterministic)
24 / 45
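The detection stage of the procedure above reduces to a threshold filter over the retrieved triplets. The function name and the example triplets are illustrative, and τ is a free parameter:

```python
# Detection: keep (utterance ID, time interval, posterior) triplets whose
# posterior probability exceeds the threshold τ.

def detect(results, tau):
    return [r for r in results if r[2] > tau]

# Example results in the shape produced by the timed factor transducer.
results = [(1, (1.0, 1.8), 1.0),
           (2, (0.2, 1.0), 0.4)]
hits = detect(results, tau=0.5)  # only the posterior-1.0 hit survives
```

How to choose τ (globally or per term) is the subject of the thresholding section of this talk.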
Index Size vs. Beam Width
BUTBN-R data-set, > 160 hours

[Plot: index size (in MB) vs. beam width (1-10) for the Timed Factor Transducer, Modified Factor Transducer and Factor Transducer.]
25 / 45
Search Time vs. Beam Width
BUTBN-R data-set, > 160 hours; R-IV query-set: 4400 IV terms

[Plots: total search time (in seconds) vs. beam width (1-10), for the Factor Transducer (2-stage) and, on a separate axis, for the Timed and Modified Factor Transducers.]
26 / 45
Per Query Search Time w.r.t. Query Length
BUTBN-R data-set, > 160 hours; R-IV query-set: 4400 IV terms

[Plots: average search time (in ms) vs. beam width (1-10) for query lengths 1-4, comparing the Timed and Modified Factor Transducers.]
27 / 45
Per Result Search Time vs. Query Length
BUTBN-R data-set, > 160 hours; R-IV query-set: 4400 IV terms

[Plot: average search time (in ms) vs. query length (0-4) at beam width 4, comparing the Timed and Modified Factor Transducers.]
28 / 45
Summary

- WFST-based indexing provides a fast, mathematically sound retrieval solution for the STD task.
- The Modified Factor Transducer is disk-space friendly but non-deterministic.
- The Timed Factor Transducer is "almost deterministic" → search time linear in the query length.
29 / 45
Query Forming for Phonetic Search

Motivation: to search for OOV queries

Preparation
- Convert word/subword lattices to phonetic lattices
- Build a phonetic index

How to search for OOVs?
- The orthographic form (text) is available; we need the phonetic form (pronunciation)
- Use a letter-to-sound (L2S) system to obtain likely pronunciations
- Use multiple pronunciations to search for OOV queries
30 / 45
L2S Pronunciations

L2S System
I n-gram model over (letter, phone) pairs
I Scores have a wide dynamic range due to the conditional independence assumption
I Pointless to use L2S scores as they are

Unweighted L2S Pronunciations
1. Obtain weighted pronunciations from the L2S transducer
2. Pick n-best alternatives and remove weights
3. Search: Compose the unweighted automaton representing the alternatives with the phonetic index

31 / 45
Weighted L2S Pronunciations (Query: Taipei)

1. Obtain weighted pronunciations from the L2S transducer
   [/t ay b ey/ .5, /t ay p ey/ .05, /d ay b ey/ .005, ...]
2. Pick n-best alternatives to prevent false alarms
   [/t ay b ey/ .5, /t ay p ey/ .05]  (n = 2)
3. Scale the weights with the query length
   [/t ay b ey/ ⁶√.5 ≈ .9, /t ay p ey/ ⁶√.05 ≈ .6]  (query length = 6)
4. Normalize the scaled weights to obtain posterior scores
   [/t ay b ey/ .6, /t ay p ey/ .4]
5. Search: Compose the weighted automaton representing the alternatives with the phonetic index

32 / 45
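The scaling and normalization steps above can be sketched in a few lines of Python. This is a minimal illustration; the function name and the dictionary layout are assumptions of the sketch, not the system's actual interface:

```python
def weight_l2s_pronunciations(pronunciations, n_best, query_length):
    """Turn raw L2S scores into posterior-like weights over pronunciations.

    `pronunciations` maps a phone string to its raw L2S probability;
    names and the example numbers below are illustrative only.
    """
    # Step 2: keep only the n-best alternatives to limit false alarms.
    top = sorted(pronunciations.items(), key=lambda kv: kv[1], reverse=True)[:n_best]
    # Step 3: flatten the dynamic range by taking the query-length-th root.
    scaled = {p: w ** (1.0 / query_length) for p, w in top}
    # Step 4: normalize the scaled weights into posterior scores.
    total = sum(scaled.values())
    return {p: w / total for p, w in scaled.items()}

weights = weight_l2s_pronunciations(
    {"t ay b ey": 0.5, "t ay p ey": 0.05, "d ay b ey": 0.005},
    n_best=2, query_length=6)
# 0.5 ** (1/6) ≈ 0.89 and 0.05 ** (1/6) ≈ 0.61, normalizing to ≈ .6 / .4
```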
Experiment I - Reference Lexicon (Reflex) Pronunciations
MSTD data-set, MSTD-OOV query-set: 1290 OOVs, phonetic indexes

Actual Term Weighted Value (subwords obtained by pruning a phone n-gram model)

Data                      P(FA)    P(Miss)  ATWV
Word 1-best               .00001   .770     .215
Word Consensus Nets       .00002   .687     .294
Word Lattices             .00002   .657     .322
Fragment 1-best           .00001   .680     .306
Fragment Consensus Nets   .00003   .584     .390
Fragment Lattices         .00003   .485     .484

33 / 45
Experiment II - ATWV vs N-best L2S Pronunciations
MSTD data-set, MSTD-OOV query-set: 1290 OOVs, phonetic indexes

[Figure: ATWV vs. number of L2S pronunciations N (1-10) for fragment and word lattices, each with weighted and unweighted L2S pronunciations; horizontal reference lines mark the Reflex baselines (.484 for fragment lattices, .322 for word lattices)]

34 / 45
Combined DET Plot for Weighted L2S Pronunciations
MSTD data-set, MSTD-OOV query-set: 1290 OOVs, phonetic indexes

[Figure: combined DET plot (miss probability vs. false alarm probability) for weighted letter-to-sound 1-5 best pronunciations over fragment lattices]

1-best: MTWV=0.334, ATWV=0.372
2-best: MTWV=0.354, ATWV=0.422
3-best: MTWV=0.352, ATWV=0.440
4-best: MTWV=0.339, ATWV=0.447
5-best: MTWV=0.316, ATWV=0.451

Maximum Term Weighted Value (with global thresholding) peaks at 2-best; Actual Term Weighted Value (with term-specific thresholding, β = 1000) keeps improving through 5-best.

35 / 45
Summary

I Lattice indexes perform better than CN indexes in the OOV retrieval task.
I Phone indexes generated from sub-word (fragment) lattices represent OOVs better.
I Using multiple pronunciations from the L2S system improves the performance, particularly when they are properly weighted.

36 / 45
Global Thresholding

[Figure: normalized histogram of posterior scores for an example query, with incorrect/correct class distributions and their EM estimates]

I Pick a global threshold θ for all query terms
I Apply binary thresholding
I Vary θ for different operating points

No term-specific behavior, no joint processing of candidates, hence poor performance!

37 / 45
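Applying a single global threshold is trivial; a short sketch, where the flat list of (term, utterance, score) tuples is a hypothetical layout for illustration:

```python
def global_threshold(results, theta):
    """Keep candidates whose posterior score reaches the one global θ.

    `results` is a hypothetical flat list of (term, utterance, score)
    tuples; sweeping `theta` traces out the operating points.
    """
    return [r for r in results if r[2] >= theta]

hits = global_threshold([("taipei", "utt1", 0.9),
                         ("taipei", "utt2", 0.3),
                         ("view", "utt3", 0.7)], theta=0.5)
```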
Term Weighted Value (TWV) [NIST, 2006]

TWV = 1 − (1/Q) ∑_{k=1}^{Q} { Pmiss(qk) + β PFA(qk) }

Pmiss(qk) = 1 − C(qk)/R(qk),   PFA(qk) = (A(qk) − C(qk)) / (T − C(qk))

Q      Number of queries
R(qk)  Number of occurrences of query qk
A(qk)  Total number of retrieved documents for qk
C(qk)  Number of correctly retrieved documents for qk
T      Total duration of the speech archive
β      Cost of false alarms relative to hits

38 / 45
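The metric above is a direct sum over per-query counts. A minimal sketch, where the (R, A, C) tuple layout and the toy counts are assumptions of the example:

```python
def term_weighted_value(queries, T, beta=1000.0):
    """TWV from per-query counts, following the definition above.

    Each entry of `queries` is a (R, A, C) tuple: true occurrences,
    retrieved candidates, and correct retrievals for one query term;
    T is the total duration of the speech archive.
    """
    total = 0.0
    for R, A, C in queries:
        p_miss = 1.0 - C / R
        p_fa = (A - C) / (T - C)
        total += p_miss + beta * p_fa
    return 1.0 - total / len(queries)

# Two toy queries over a 100000-second archive.
twv = term_weighted_value([(10, 8, 8), (5, 6, 4)], T=100000.0)
```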
TWV Based Term Specific Thresholding [Miller et al., 2007]

V̂hit(qk) = 1 / R̂(qk),   ĈFA(qk) = β / (T − R̂(qk))

θ̂(qk) = ĈFA(qk) / (ĈFA(qk) + V̂hit(qk))

V̂hit(qk)  Expected value of a hit for qk
ĈFA(qk)   Expected cost of a false alarm for qk
R̂(qk)     Expected count of occurrences of qk
θ̂(qk)     Optimal threshold for qk maximizing TWV in the expected sense

I Term specific expected counts → Term specific thresholds
I Vary β for different operating points

Only the sum of individual scores affects the threshold!

39 / 45
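The threshold formula above is a one-liner once the expected count is known. A sketch with illustrative argument names; the numbers are invented:

```python
def twv_threshold(expected_count, T, beta=1000.0):
    """Term-specific threshold maximizing TWV in the expected sense.

    `expected_count` plays the role of R-hat(qk), the sum of the
    posterior scores of all candidates for the term; T is the archive
    duration.
    """
    v_hit = 1.0 / expected_count          # expected value of a hit
    c_fa = beta / (T - expected_count)    # expected cost of a false alarm
    return c_fa / (c_fa + v_hit)

theta_rare = twv_threshold(10.0, 100000.0)      # rarely hypothesized term
theta_common = twv_threshold(100.0, 100000.0)   # frequently hypothesized term
```

Note the qualitative behavior: rarer terms (smaller expected count) get a lower threshold, so each of their candidates is more likely to be kept.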
Exploiting Score Distributions [Manmatha et al., 2001, Can and Saraclar, 2009]

[Figure: normalized histogram of posterior scores for an example query, with incorrect/correct class distributions and their EM estimates]

I Scores follow exponential-like distributions
I Model both classes (0, 1) with exponential distributions:

  p0(y) = λ0 e^{−λ0 y}
  p1(y) = λ1 e^{−λ1 (1−y)}

I Model all candidates as a mixture of exponentials

  p(y) = π0 p0(y) + (1 − π0) p1(y)

I Use EM to estimate the parameters (λ0, λ1, π0)

40 / 45
Computing Term Specific Thresholds

Cost Scheme

C = [ 0  1
      α  0 ]

where α is a user-defined parameter specifying the cost of false alarms relative to hits.

I Estimate mixture parameters → each component ∼ a class, mixture weights ∼ priors
I For k = 1, . . . , Q, the Bayes-optimal threshold θ̂(qk) is given by:

θ̂(qk) = [ λ̂1(qk) + log(λ̂0(qk)/λ̂1(qk)) + log(π̂0(qk)/π̂1(qk)) + log α ] / [ λ̂0(qk) + λ̂1(qk) ]

I Different operating points can be achieved by changing α.

41 / 45
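The closed form above follows from equating the α-weighted class posteriors at the boundary. A sketch with invented parameter values; the test of the decision-boundary property confirms the algebra:

```python
import math

def sd_threshold(lam0, lam1, pi0, alpha=1.0):
    """Bayes-optimal threshold for the two-exponential score model.

    Solves alpha * pi0 * p0(theta) = pi1 * p1(theta), which yields the
    closed form on the slide; the parameter values below are invented.
    """
    pi1 = 1.0 - pi0
    return (lam1 + math.log(lam0 / lam1) + math.log(pi0 / pi1)
            + math.log(alpha)) / (lam0 + lam1)

theta = sd_threshold(10.0, 5.0, pi0=0.8)
```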
STD Evaluation Metrics

Precision-Recall Curves
Precision and recall are the most popular IR evaluation metrics. Given a set of queries qk, k = 1, . . . , Q, let
R(qk) be the number of segments in the collection that are related to the query qk,
A(qk) be the total number of retrieved segments and
C(qk) be the number of correctly retrieved segments.

Precision = (1/Q) ∑_{k=1}^{Q} C(qk)/A(qk),   Recall = (1/Q) ∑_{k=1}^{Q} C(qk)/R(qk)

ROC Curves
These curves use NIST’s PMiss and PFA definitions for STD.

42 / 45
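These are macro averages, i.e. per-query ratios averaged over the query set. A short sketch with made-up counts in the slide's (R, A, C) notation:

```python
def macro_precision_recall(per_query):
    """Macro-averaged precision and recall over a query set.

    `per_query` holds one (R, A, C) count tuple per query, in the
    notation of the slide; the counts below are invented examples.
    """
    Q = len(per_query)
    precision = sum(C / A for R, A, C in per_query) / Q
    recall = sum(C / R for R, A, C in per_query) / Q
    return precision, recall

precision, recall = macro_precision_recall([(10, 8, 8), (5, 10, 4)])
```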
Precision-Recall Comparison
BUTBN-R data-set, R-IV query-set, lattice beam = 4

[Figure: recall vs. precision curves for Global Thresholding, TWV Based TST and Score Distribution Based TST]

43 / 45
ROC Comparison
BUTBN-R data-set, R-IV query-set, lattice beam = 4

[Figure: PMiss vs. PFA curves for Global Thresholding, TWV Based TST and Score Distribution Based TST]

44 / 45
Summary

I Exploiting score distributions leads to a viable term-specific thresholding method
I SD-TST optimizes the precision metric → superior to TWV-TST over a large interval of precision values
I TWV-TST optimizes the false alarm metric → much better ROC performance

45 / 45
Publications

Can, D., Cooper, E., Ghoshal, A., Jansche, M., Khudanpur, S., Ramabhadran, B., Riley, M., Saraclar, M., Sethy, A., Ulinski, M., and White, C. (2009a). Web derived pronunciations for spoken term detection. In SIGIR, pages 83–90.

Can, D., Cooper, E., Sethy, A., White, C., Ramabhadran, B., and Saraclar, M. (2009b). Effect of pronunciations on OOV queries in spoken term detection. In Proc. ICASSP, pages 3957–3960.

Can, D. and Saraclar, M. (2009). Score distribution based term specific thresholding for spoken term detection. In Proceedings of NAACL-HLT 2009, pages 269–272, Boulder, Colorado. Association for Computational Linguistics.

1 / 10
References I

Allauzen, C., Mohri, M., and Saraclar, M. (2004). General indexation of weighted automata: application to spoken utterance retrieval. In Proc. HLT-NAACL.

Can, D., Cooper, E., Sethy, A., White, C., Ramabhadran, B., and Saraclar, M. (2009). Effect of pronunciations on OOV queries in spoken term detection. In Proc. ICASSP, pages 3957–3960.

Can, D. and Saraclar, M. (2009). Score distribution based term specific thresholding for spoken term detection. In Proceedings of NAACL-HLT 2009, pages 269–272, Boulder, Colorado. Association for Computational Linguistics.

2 / 10
References II

Chaudhari, U. V. and Picheny, M. (2007). Improvements in phone based audio search via constrained match with high order confusion estimates. In Proc. of ASRU.

Chelba, C. and Acero, A. (2005). Position specific posterior lattices for indexing speech. In Proc. of ACL.

Li, Y. C., Lo, W. K., Meng, H. M., and Ching, P. C. (2000). Query expansion using phonetic confusions for Chinese spoken document retrieval. In Proc. of IRAL.

Mamou, J., Ramabhadran, B., and Siohan, O. (2007). Vocabulary independent spoken term detection. In Proc. of ACM SIGIR.

3 / 10
References III

Manmatha, R., Rath, T., and Feng, F. (2001). Modeling score distributions for combining the outputs of search engines. In SIGIR ’01, pages 267–275, New York, NY, USA. ACM.

Miller, D. R. H., Kleber, M., Kao, C., Kimball, O., Colthurst, T., Lowe, S. A., Schwartz, R. M., and Gish, H. (2007). Rapid and accurate spoken term detection. In Proc. Interspeech.

NIST (2006). The spoken term detection (STD) 2006 evaluation plan. http://www.itl.nist.gov/iad/mig/tests/std/.

Parlak, S. and Saraclar, M. (2008). Spoken term detection for Turkish Broadcast News. In Proc. ICASSP.

4 / 10
References IV

Saraclar, M. and Sproat, R. (2004). Lattice-based search for spoken utterance retrieval. In Proc. HLT-NAACL.

Siohan, O. and Bacchiani, M. (2005). Fast vocabulary independent audio search using path based graph indexing. In Proc. of Interspeech.

Zhou, Z. Y., Yu, P., Chelba, C., and Seide, F. (2006). Towards spoken-document retrieval for the internet: lattice indexing for large-scale web-search architectures. In Proc. of HLT-NAACL.

5 / 10
Notation & Definitions
Product Semiring

Definition
For two partially-ordered semirings A = (A, ⊕A, ⊗A, 0A, 1A) and B = (B, ⊕B, ⊗B, 0B, 1B), the product semiring over A × B is

A × B = (A × B, ⊕×, ⊗×, (0A, 0B), (1A, 1B))

where ⊕× and ⊗× are component-wise operators:

(a1, b1) ⊕× (a2, b2) = (a1 ⊕A a2, b1 ⊕B b2),
(a1, b1) ⊗× (a2, b2) = (a1 ⊗A a2, b1 ⊗B b2).

The natural order over A × B, given by

((a1, b1) ≤× (a2, b2)) ⇔ (a1 ⊕A a2 = a1 and b1 ⊕B b2 = b1),

is a partial order, even if A and B are totally-ordered.

6 / 10
Notation & Definitions
Lexicographic Semiring

Definition
For two partially-ordered semirings A = (A, ⊕A, ⊗A, 0A, 1A) and B = (B, ⊕B, ⊗B, 0B, 1B), the lexicographic semiring over A × B is

A ∗ B = (A × B, ⊕∗, ⊗∗, (0A, 0B), (1A, 1B))

where ⊕∗ is a lexicographic priority operator

(a1, b1) ⊕∗ (a2, b2) =
  (a1, b1 ⊕B b2)  if a1 = a2
  (a1, b1)        if a1 = a1 ⊕A a2 ≠ a2
  (a2, b2)        if a1 ≠ a1 ⊕A a2 = a2

and ⊗∗ is a component-wise multiplication operator.
A ∗ B is totally-ordered when A and B are:

((a1, b1) ≤∗ (a2, b2)) ⇔ (a1 = a1 ⊕A a2 ≠ a2) or (a1 = a2 and b1 = b1 ⊕B b2)

7 / 10
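As an illustration (not part of the original slides), ⊕∗ and ⊗∗ can be written down concretely for the tropical (min, +) semiring paired with itself, where a = a ⊕ b reduces to a ≤ b:

```python
def lex_plus(x, y):
    """⊕∗ over a pair of tropical (min, +) semirings.

    In the tropical semiring a ⊕ b = min(a, b), so the slide's
    condition a1 = a1 ⊕ a2 ≠ a2 reduces to a1 < a2.
    """
    (a1, b1), (a2, b2) = x, y
    if a1 == a2:
        return (a1, min(b1, b2))   # tie on the first component
    return x if a1 < a2 else y     # otherwise the first component decides

def lex_times(x, y):
    """⊗∗ is component-wise; tropical ⊗ is ordinary addition."""
    return (x[0] + y[0], x[1] + y[1])
```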
Notation & Definitions
Weighted Finite-State Automata

Definition
I Given a transition e ∈ E:
  I p[e] : its previous state,
  I n[e] : its next state,
  I i[e] : its input label,
  I o[e] : its output label,
  I w[e] : its weight.
I A path π = e1 · · · ek ∈ E∗ satisfies n[e_{i−1}] = p[e_i], i = 2, . . . , k.
I We extend p, n, i, o, w to paths:
  I p[π] = p[e1],
  I n[π] = n[ek],
  I i[π] = i[e1] · · · i[ek],
  I o[π] = o[e1] · · · o[ek],
  I w[π] = w[e1] ⊗ · · · ⊗ w[ek].

[Figure: example transducer with states qa, qb and qc (final, weight 1), and transitions m:h/.5, e:a/.7, e:v/.3]

8 / 10
Factor Transducer
Definition
For a finite-state automaton Ai, i = 1 . . . n, the factor transducer of Ai is defined as the weighted finite-state transducer Ti for which:

⟦Ti⟧(x, i) = − log(E_{Pi}[Ci(x)]) ∈ T

where x ∈ X_{Ai} and E_{Pi}[Ci(x)] is the expected count of x in Ai.
For each state q ∈ Qi,
d[q]: shortest distance from Ii to q (− log of the forward probability),
f [q]: shortest distance from q to Fi (− log of the backward probability).

d[q] = ⊕log_{π ∈ P(Ii, q)} (λi(p[π]) + w[π]),   f [q] = ⊕log_{π ∈ P(q, Fi)} (w[π] + ρi(n[π]))

− log(E_{Pi}[Ci(x)]) = ⊕log_{i[π] = x} (d[p[π]] + w[π] + f [n[π]])

9 / 10
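The final combination above can be illustrated with a toy computation in the log semiring; the helper names and the flat arc list are assumptions of this sketch, not the indexing framework's API:

```python
import math

def log_add(a, b):
    """⊕ in the log semiring: -log(exp(-a) + exp(-b))."""
    if math.isinf(a):
        return b
    if math.isinf(b):
        return a
    m = min(a, b)
    return m - math.log1p(math.exp(-abs(a - b)))

def neg_log_expected_count(arcs, d, f):
    """-log expected count of a factor x from its matching arcs.

    `arcs` lists (p, n, w) for every transition labeled x, with
    weights as negative log probabilities; `d` and `f` map states to
    the forward and backward shortest distances d[q] and f[q].
    """
    total = math.inf
    for p, n, w in arcs:
        total = log_add(total, d[p] + w + f[n])
    return total

# Two arcs labeled x, each carrying posterior mass 0.25:
# the expected count is 0.5, so the result is -log(0.5) = log(2).
arcs = [("q0", "q1", -math.log(0.25)), ("q2", "q3", -math.log(0.25))]
count = neg_log_expected_count(arcs, {"q0": 0.0, "q2": 0.0},
                               {"q1": 0.0, "q3": 0.0})
```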
EM Parameter Updates
I Model all candidates as a mixture of exponentials

  p(y) = π0 p0(y) + (1 − π0) p1(y)

I Use EM to estimate the parameters (λ0(qk), λ1(qk) and π0(qk)) given the candidate scores (yk,n, n = 1, . . . , Nk) of a query term qk.
I First compute

  P(j | yk,n) = π̂j(qk) pj(yk,n) / p(yk,n),   j = 0, 1,  n = 1, . . . , Nk

I Then update

  λ̂0(qk) = ∑n P(0 | yk,n) / ∑n P(0 | yk,n) yk,n,
  λ̂1(qk) = ∑n P(1 | yk,n) / ∑n P(1 | yk,n)(1 − yk,n),
  π̂j(qk) = (1/Nk) ∑n P(j | yk,n).

10 / 10
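The E- and M-steps above can be sketched directly in Python. The starting values, iteration count and toy scores are ad hoc assumptions of the sketch:

```python
import math

def em_exponential_mixture(scores, iters=50):
    """EM for the two-exponential mixture above.

    p0 models the incorrect class, p1 the correct class; `scores` is a
    plain list of posterior scores in (0, 1) for one query term, and
    the starting values below are ad hoc.
    """
    l0, l1, pi0 = 5.0, 5.0, 0.5
    for _ in range(iters):
        # E-step: responsibility of the incorrect class for each score.
        resp0 = []
        for y in scores:
            a = pi0 * l0 * math.exp(-l0 * y)
            b = (1.0 - pi0) * l1 * math.exp(-l1 * (1.0 - y))
            resp0.append(a / (a + b))
        # M-step: the weighted update equations from the slide.
        s0 = sum(resp0)
        l0 = s0 / sum(r * y for r, y in zip(resp0, scores))
        l1 = (len(scores) - s0) / sum((1 - r) * (1 - y)
                                      for r, y in zip(resp0, scores))
        pi0 = s0 / len(scores)
    return l0, l1, pi0

# Four low (incorrect-looking) and two high (correct-looking) scores.
l0, l1, pi0 = em_exponential_mixture([0.05, 0.1, 0.08, 0.12, 0.9, 0.95])
```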