  • Indexation, Retrieval and Detection Techniques for Spoken Term Detection

    Doğan Can

    Boğaziçi University, Department of Electrical & Electronics Engineering

    BUSIM Lab

    January 26, 2010

  • Outline

    Introduction

    Lattice Indexation/Search Framework
      Preliminaries
      Spoken Utterance Retrieval with Factor Transducer
      2-Pass Spoken Term Detection with Factor Transducer
      Spoken Term Detection with Modified Factor Transducer
      Spoken Term Detection with Timed Factor Transducer
      Experimental Results

    Retrieval
      Query Forming and Expansion for Phonetic Search
      Experimental Results

    Thresholding for Spoken Term Detection
      Global Thresholding
      Term Weighted Value Based Term Specific Thresholding
      Score Distribution Based Term Specific Thresholding
      Experimental Results


  • Application: Sign Dictionary

    [Embedded video: demo.mov (QuickTime)]

  • Comparison of Speech Retrieval Tasks
    Spoken Document Retrieval vs. Spoken Utterance Retrieval vs. Spoken Term Detection

          Query   Relation            Return (Text Analogue)
    SDR   long    lexical+semantic    documents (relevant pages)
    SUR   short   inclusion (exact)   utterances (sentences)
    STD   short   exact match         occurrences (positions)

  • Challenges of the Spoken Term Detection Task

    - Aim: Open vocabulary search
      Reference: “Taipei night view”
    - Challenge: Unreliable transcriptions
      ASR Output: “tie bay light view”

    1. High error rate of one-best transcripts
       Alternative transcriptions: [tie bay [light 0.6, night 0.4] view]
    2. Out-Of-Vocabulary queries
       Phonetic search: /t ay b ey n ay t v iy w/
    3. Boost in false alarms due to 1 and 2


  • Challenges of the Spoken Term Detection Task: Proposed Solutions

    - Aim: Open vocabulary search (Reference: “Taipei night view”)
    - Challenge: Unreliable transcriptions (ASR Output: “tie bay light view”)

    1. High error rate of one-best transcripts
       → Efficient Indexing and Search Framework for STD
    2. Out-Of-Vocabulary queries
       → Utilizing Weighted OOV Query Pronunciations
    3. Boost in false alarms due to 1 and 2
       → Exploiting Score Statistics for STD

  • Previous Work in the Field

    How to Index and Search Lattices
    - General Indexation of Weighted Automata [Saraclar and Sproat, 2004; Allauzen et al., 2004]
    - Position Specific Posterior Lattices (PSPL) [Chelba and Acero, 2005]
    - Time-based Merging for Indexing (TMI) [Zhou et al., 2006]

    How to Alleviate the OOV Issue
    - Search on sub-word decoding [Saraclar and Sproat, 2004; Siohan and Bacchiani, 2005; Mamou et al., 2007]
    - Search on the sub-word representation of word decoding [Chaudhari and Picheny, 2007]
    - Phonetic query expansion [Li et al., 2000]

  • Anatomy of a Spoken Term Detection (STD) System

    [Block diagram. INDEXING: Speech Database → ASR → ASR Output → Index.
    RETRIEVAL: User → Query → Preprocess → Search Engine (over the Index).
    DETECTION: is the score larger than τ? yes → Return, no → Omit.]

  • Outline (current section: Lattice Indexation/Search Framework)

  • Notation & Definitions: Semirings

    Definition: A system (K, ⊕, ⊗, 0̄, 1̄) is a semiring if:
    - (K, ⊕, 0̄) is a commutative monoid with identity element 0̄;
    - (K, ⊗, 1̄) is a monoid with identity element 1̄;
    - ⊗ distributes over ⊕;
    - 0̄ is an annihilator for ⊗: for all a ∈ K, a ⊗ 0̄ = 0̄ ⊗ a = 0̄.

                     Set (K)      ⊕       ⊗   0̄    1̄
    Boolean     B    {0, 1}       ∨       ∧   0    1
    Probability R    R+           +       ×   0    1
    Log         L    R ∪ {+∞}     ⊕log    +   +∞   0
    Tropical    T    R ∪ {+∞}     min     +   +∞   0
    Tropical    T′   R ∪ {−∞}     max     +   −∞   0

    where a ⊕log b = −log(e⁻ᵃ + e⁻ᵇ)
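    The log and tropical ⊕ operations map directly onto code. A minimal sketch
    (mine, not from the talk) of the two semirings over -log probabilities:

        import math

        # Log semiring: a (+) b = -log(e^-a + e^-b); (x) is +; 0-bar = +inf; 1-bar = 0.
        def log_plus(a: float, b: float) -> float:
            if math.isinf(a):
                return b
            if math.isinf(b):
                return a
            m = min(a, b)
            return m - math.log1p(math.exp(-abs(a - b)))  # numerically stable log-add

        # Tropical semiring: (+) is min; (x) is +; 0-bar = +inf; 1-bar = 0.
        def trop_plus(a: float, b: float) -> float:
            return min(a, b)

        def times(a: float, b: float) -> float:  # (x) is + in both semirings
            return a + b

        w1, w2 = 0.7, 1.2         # -log probabilities of two parallel paths
        print(log_plus(w1, w2))   # total mass of both paths (log semiring)
        print(trop_plus(w1, w2))  # weight of the single best path (tropical)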

  • Notation & Definitions: Weighted Finite-State Automata

    Definition: A weighted finite-state transducer T over a semiring K is an
    8-tuple T = (Σ, Δ, Q, I, F, E, λ, ρ):
    - Σ : input alphabet; e.g. Σ = {m, e}
    - Δ : output alphabet; e.g. Δ = {h, a, v}
    - Q : set of states;
    - I ⊆ Q : set of initial states;
    - F ⊆ Q : set of final states;
    - E ⊆ Q × (Σ ∪ {ε}) × (Δ ∪ {ε}) × K × Q : set of transitions;
    - λ : I → K : initial weight function;
    - ρ : F → K : final weight function.

    [Example transducer with states qa/1, qb, qc/1 and arcs
    m:h/.5, e:a/.5, e:a/.7, e:v/.3, ε:ε/1]

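    For concreteness, the 8-tuple can be written down directly. A minimal
    sketch (my own encoding; the arc endpoints and initial/final weights are
    guessed from the figure residue, not stated in the talk):

        from dataclasses import dataclass

        EPS = "<eps>"  # the epsilon label

        @dataclass
        class Arc:
            src: str       # p[e]: previous state
            ilabel: str    # i[e]: input label, in Sigma or eps
            olabel: str    # o[e]: output label, in Delta or eps
            weight: float  # w[e]: weight, here in the probability semiring
            dst: str       # n[e]: next state

        @dataclass
        class WFST:
            arcs: list     # E: the transition set
            initial: dict  # lambda: initial state -> weight
            final: dict    # rho: final state -> weight

        # The example from the slide: states qa/1, qb, qc/1.
        T = WFST(
            arcs=[
                Arc("qa", "m", "h", 0.5, "qb"),
                Arc("qa", "e", "a", 0.5, "qb"),
                Arc("qb", "e", "a", 0.7, "qc"),
                Arc("qb", "e", "v", 0.3, "qc"),
                Arc("qc", EPS, EPS, 1.0, "qa"),  # the eps:eps/1 arc (placement guessed)
            ],
            initial={"qa": 1.0},
            final={"qa": 1.0, "qc": 1.0},
        )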

  • Notation & Definitions: Factor Automaton

    Definition: Given two strings u, v ∈ Σ*, v is a factor (substring) of u if
    u = xvy for some x, y ∈ Σ*. More generally, v is a factor of L ⊆ Σ* if v
    is a factor of some u ∈ L.

    Definition: The factor automaton F(u) of u is the minimal deterministic
    finite-state acceptor recognizing exactly X_u, the set of factors of u.

    Definition: Similarly, the factor automaton F(A) of A is the minimal
    deterministic finite-state acceptor recognizing exactly X_A, the set of
    factors of the strings recognized by A.


  • Indexing WFSA for Spoken Utterance Retrieval

    [Example word lattice: 0 --good/1--> 1 --evening/.6 | morning/.4--> 2]

    Setup: For each speech utterance u_i, i = 1, ..., n,
    - a weighted automaton A_i over Σ and L; i.e. the word lattice output by ASR.

    Objective: Create a full index to directly search for any factor of these
    automata, e.g. “evening”, “good morning”, etc.

    Notes:
    - Different from classical indexation: the input data is uncertain
    - Must make use of the weights

  • Factor Transducer Construction: Toy Example (Factor Selection)

    [Lattice with states 0, 1, 2 and arcs lose:ε/.4, find:ε/.6, yourself:ε/1,
    augmented with a new initial state s (ε:ε/1 arcs into 0, 1, 2) and a new
    final state e (ε:i/1 arcs out of 0, 1, 2)]

    1. Replace each transition (p, a, w, q) by (p, a, ε, w, q)
    2. Create a new state s ∉ Q_i and make s the unique initial state
    3. Create a new state e ∉ Q_i and make e the unique final state
    4. Create a new transition (s, ε, ε, d[q], q) for each state q ∈ Q_i
    5. Create a new transition (q, ε, i, f[q], e) for each state q ∈ Q_i

    Here d[q] and f[q] denote the shortest-distance weights (over L) from the
    initial state to q and from q to the final states, and i is the utterance
    index emitted as the output label. A sketch of these steps in code follows.
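    A minimal sketch of steps 1-5, assuming an acyclic lattice whose states
    are numbered in topological order and whose weights are probabilities (so
    d[q] and f[q] become forward/backward sums); all names are mine:

        # Arcs: (src, label, weight, dst), sorted by source state.
        def factor_selection(n_states, arcs, initial, finals, utt_id):
            EPS = "<eps>"
            # d[q]: total probability of paths from the initial state to q.
            d = [0.0] * n_states
            d[initial] = 1.0
            for p, a, w, q in arcs:                 # topological order
                d[q] += d[p] * w
            # f[q]: total probability of paths from q to a final state.
            f = [0.0] * n_states
            for q in finals:
                f[q] = 1.0
            for p, a, w, q in reversed(arcs):       # reverse topological order
                f[p] += w * f[q]

            s, e = "s", "e"                         # steps 2-3: new endpoints
            out = [(p, a, EPS, w, q) for p, a, w, q in arcs]              # step 1
            out += [(s, EPS, EPS, d[q], q) for q in range(n_states)]      # step 4
            out += [(q, EPS, utt_id, f[q], e) for q in range(n_states)]   # step 5
            return out

        # Toy lattice from the slide: find/.6 and lose/.4 from 0 to 1,
        # then yourself/1 from 1 to 2.
        arcs = [(0, "find", 0.6, 1), (0, "lose", 0.4, 1), (1, "yourself", 1.0, 2)]
        for arc in factor_selection(3, arcs, initial=0, finals={2}, utt_id=1):
            print(arc)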

  • Factor Transducer Construction: Toy Example (Optimization)

    1. Weighted ε-removal
    2. Weighted determinization over L
    3. Weighted minimization over L, by viewing T_i as an acceptor

    Before: [the s/e construction above, with its ε:ε/1 and ε:i/1 arcs]

    After: [4-state transducer: 0 --lose:ε/.4 | find:ε/.6--> 1
    --yourself:ε/1--> 2, with ε:i/1 arcs into the final state 3 and a direct
    yourself:ε/1 path]

  • Full Factor Transducer Construction & Search

    Full Factor Transducer: Given T_i for each A_i, i = 1, ..., n:
    1. Take the union, U = ⋃_i T_i
    2. Weighted ε-removal, determinization (and minimization)
    3. Define T as the transducer obtained after sorting input labels

    Search:
    1. The user query X can be any weighted automaton! (regular expressions)
    2. Compose X with T on the input side and project the result onto the
       output labels: P = Π₂(X ∘ T)
    3. Weighted ε-removal + pruning, or n-shortest paths + sorting with a
       shortest path algorithm

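    For a plain string query, composition followed by output projection
    reduces to walking the index on its input side and then reading off the
    ε:i final arcs. A rough sketch (mine), assuming an ε-removed index in the
    probability semiring, stored as (src, ilabel, olabel, weight, dst) arcs:

        from collections import defaultdict

        EPS = "<eps>"

        def search(index_arcs, start_states, query):
            by_src = defaultdict(list)
            for arc in index_arcs:
                by_src[arc[0]].append(arc)

            # frontier: state -> accumulated weight after a query prefix
            frontier = {q: 1.0 for q in start_states}
            for sym in query:
                nxt = defaultdict(float)
                for q, w in frontier.items():
                    for _, il, _, aw, dst in by_src[q]:
                        if il == sym:
                            nxt[dst] += w * aw
                frontier = nxt

            results = defaultdict(float)  # utterance id -> expected count
            for q, w in frontier.items():
                for _, il, ol, aw, _ in by_src[q]:
                    if il == EPS and ol != EPS:  # an eps:i arc to the final state
                        results[ol] += w * aw
            return dict(results)

        # Hand-made miniature mirroring the next slide's result for query "a":
        toy = [
            (0, "a", EPS, 1.0, 1),
            (1, EPS, 1, 2.0, 9),    # eps:1/2   -> (utterance 1, count 2)
            (1, EPS, 2, 1.4, 9),    # eps:2/1.4 -> (utterance 2, count 1.4)
        ]
        print(search(toy, start_states={0}, query=["a"]))  # {1: 2.0, 2: 1.4}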

  • Spoken Utterance Retrieval with Factor Transducer [Allauzen et al., 2004]

    Database:
    1. “a a”:            0 --a/1--> 1 --a/1--> 2
    2. “[b .6, a .4] a”: 0 --b/.6 | a/.4--> 1 --a/1--> 2

    Query: 0 --a/1--> 1

    Index: [6-state factor transducer with arcs a:ε/1, b:ε/1, a:ε/1, a:ε/1
    and final arcs ε:1/2, ε:2/1.4, ε:2/.6, ε:1/1, ε:2/.4, ε:2/.6]

    Results: 0 --a:ε/1--> 1, with final arcs ε:1/2 and ε:2/1.4

    (Utterance ID, Expected Count):
    1. (1, 2)
    2. (2, 1.4)

  • 2-pass STD with Factor Transducer [Parlak and Saraclar, 2008; Can et al., 2009]

    Procedure
    - For each query:
      - Obtain (utterance ID, expected count) pairs (1st pass)
      - For each utterance with expected count > τ:
        - Align the query with the utterance → time interval (2nd pass)
          [Parlak and Saraclar, 2008]
        - Align the query with the lattice → time interval (2nd pass)
          [Can et al., 2009]
      - Return (utterance ID, time interval, expected count) triplets

    Problems
    - The 2nd pass takes time → slow
    - Multiple occurrences of a query in the same utterance contribute to the
      same expected count.
    - Ideal for Spoken Utterance Retrieval; not so for Spoken Term Detection

  • Modified Factor Transducer Construction: Toy Example (Factor Selection)

    [State time labels: L[0] = 0, L[1] = 1, L[2] = 3. Lattice with states
    0, 1, 2 and arcs lose:0-1/.4, find:0-1/.6, yourself:1-3/1, augmented with
    a new initial state s (ε:ε/1 arcs) and a new final state e (ε:i/1 arcs)]

    1. Replace each transition (p, a, w, q) by (p, a, L_i[p]-L_i[q], w, q)
    2. Create a new state s ∉ Q_i and make s the unique initial state
    3. Create a new state e ∉ Q_i and make e the unique final state
    4. Create a new transition (s, ε, ε, d[q], q) for each state q ∈ Q_i
    5. Create a new transition (q, ε, i, f[q], e) for each state q ∈ Q_i

  • Factor Transducer vs. Modified Factor Transducer (After Optimization)

    Factor Transducer:
    [0 --lose:ε/.4 | find:ε/.6--> 1 --yourself:ε/1--> 2, with ε:i/1 arcs into
    the final state 3 and a direct yourself:ε/1 path]

    Modified Factor Transducer:
    [same topology, with time intervals as output labels: lose:0-1/.4,
    find:0-1/.6, yourself:1-3/1, ε:i/1, yourself:1-3/1, ε:i/1]

  • Spoken Term Detection with Modified Factor Transducer [Can et al., 2009]

    Database (raw time intervals):
    1. “a a”:            0 --a:0.1-1/1--> 1 --a:1-1.8/.6 | a:1-1.9/.4--> 2
    2. “[b .6, a .4] a”: 0 --b:0.2-1/.6 | a:0.1-1/.4--> 1 --a:1-1.9/1--> 2

    CLUSTERING merges overlapping intervals:
    1. “a a”:            0 --a:0.1-1/1--> 1 --a:1-1.9/.6 | a:1-1.9/.4--> 2

    QUANTIZATION snaps the intervals to a coarse grid:
    1. “a a”:            0 --a:0-1/1--> 1 --a:1-2/.6 | a:1-2/.4--> 2
    2. “[b .6, a .4] a”: 0 --b:0-1/.6 | a:0-1/.4--> 1 --a:1-2/1--> 2

    Query: 0 --a/1--> 1

    Index: [7-state transducer with arcs a:0-1/1, a:1-2/1, b:0-1/1, a:1-2/1,
    a:1-2/1 and final arcs ε:1/1, ε:2/.4, ε:2/1, ε:1/1, ε:2/.6, ε:1/1,
    ε:2/.4, ε:2/.6]

    Results: [sub-transducer with arcs a:0-1/1, a:1-2/1 and final arcs
    ε:1/1, ε:2/.4, ε:2/1, ε:1/1]

    (Utterance ID, Time Interval, Posterior Probability):
    1. (1, 0-1, 1)
    2. (1, 1-2, 1)
    3. (2, 0-1, .4)
    4. (2, 1-2, 1)

  • 1-pass STD with Modified Factor Transducer [Can et al., 2009]

    Procedure
    - For each query:
      - Obtain (utterance ID, time interval, posterior probability) triplets
      - Return triplets with posterior probability > τ

    Highlights
    - No 2nd pass → fast
    - No multiple occurrence problem: every distinct interval leads to another
      index entry → overlapping intervals are clustered
    - Time interval mismatches → common paths are reduced → larger index →
      time intervals are quantized

    Problems
    - The index is non-deterministic!

  • Timed Factor Transducer Construction: Toy Example (Factor Selection)

    [State time labels: L[0] = 0, L[1] = 1, L[2] = 3. Lattice with states
    0, 1, 2 and arcs lose:1/(.4,0,0), find:1/(.6,0,0), yourself:1/(1,0,0),
    augmented with a new initial state s (arcs ε:ε/(1,0,0), ε:ε/(1,1,0),
    ε:ε/(1,3,0)) and a new final state e (arcs ε:i/(1,0,0), ε:i/(1,0,1),
    ε:i/(1,0,3))]

    1. Replace each arc weight w ∈ L with (w, 1̄, 1̄) ∈ L × T × T′
    2. Create a new state s ∉ Q_i and make s the unique initial state
    3. Create a new state e ∉ Q_i and make e the unique final state
    4. Create a new arc (s, ε, ε, (d[q], L_i[q], 1̄), q) for q ∈ Q_i
    5. Create a new arc (q, ε, i, (f[q], 1̄, L_i[q]), e) for q ∈ Q_i

  • Factor Transducer vs. Timed Factor Transducer (After Optimization)

    Factor Transducer:
    [0 --lose:ε/.4 | find:ε/.6--> 1 --yourself:ε/1--> 2, with ε:i/1 arcs into
    the final state 3 and a direct yourself:ε/1 path]

    Timed Factor Transducer:
    [same topology with weight triples: lose:1/(.4,0,1), find:1/(.6,0,1),
    yourself:1/(1,1,3), ε:i/(1,0,0), yourself:1/(1,0,2), ε:i/(1,0,0)]

  • Spoken Term Detection with Timed Factor Transducer

    Database (state times in brackets):
    1. “a b”, [0.1, 1, 1.8]: 0 --a:a/1--> 1 --b:b/1--> 2
    2. “b a”, [0.2, 1, 1.9]: 0 --b:b/1--> 1 --a:a/1--> 2

    After CLUSTERING:
    1. “a b”: 0 --a:1/1--> 1 --b:1/1--> 2
    2. “b a”: 0 --b:1/1--> 1 --a:1/1--> 2

    Query: 0 --b/(1,1,1)--> 1

    Index: [6-state transducer with arcs a:1/(1,.1,1), b:1/(1,0,.8),
    a:1/(1,0,.9), b:1/(1,.2,1) and final arcs ε:1/(1,0,0), ε:2/(1,.9,.9),
    ε:1/(1,0,0), ε:2/(1,0,0), ε:1/(1,.8,.8), ε:2/(1,0,0)]

    Results: 0 --b:1/(1,.2,1)--> 1, with final arcs ε:2/(1,0,0) and
    ε:1/(1,.8,.8)

    (Utterance ID, Time Interval, Posterior Probability):
    1. (1, 1-1.8, 1)
    2. (2, 0.2-1, 1)

  • 1-pass STD with Timed Factor Transducer

    Procedure
    - For each query:
      - Obtain (utterance ID, time interval, posterior probability) triplets
      - Return triplets with posterior probability > τ

    Highlights
    - No 2nd pass → fast
    - No multiple occurrence problem: every distinct interval leads to another
      index entry → overlapping intervals are clustered
    - No time interval mismatch problem → efficient optimization
      (almost deterministic)

  • Index Size vs. Beam Width
    BUTBN-R data-set, > 160 hours

    [Plot: index size in MB (0-4500) vs. lattice beam width (1-10) for the
    Timed Factor Transducer, Modified Factor Transducer, and Factor Transducer]

  • Search Time vs. Beam Width
    BUTBN-R data-set, > 160 hours; R-IV query-set: 4400 IV terms

    [Two plots of total search time in seconds vs. beam width (1-10): the
    2-stage Factor Transducer (400-1200 s) vs. the Timed and Modified Factor
    Transducers (5-30 s)]

  • Per Query Search Time w.r.t. Query Length
    BUTBN-R data-set, > 160 hours; R-IV query-set: 4400 IV terms

    [Four plots of average search time in ms vs. beam width (1-10), for query
    lengths 1-4, comparing the Timed and Modified Factor Transducers]

  • Per Result Search Time vs. Query Length
    BUTBN-R data-set, > 160 hours; R-IV query-set: 4400 IV terms

    [Plot: average search time in ms vs. query length (0-4) at beam width 4,
    comparing the Timed and Modified Factor Transducers]

  • Summary

    - WFST-based indexing provides a fast, mathematically sound retrieval
      solution for the STD task.
    - The Modified Factor Transducer is disk-space friendly but
      non-deterministic.
    - The Timed Factor Transducer is “almost deterministic” → search time is
      linear in the query length.

  • Outline (current section: Retrieval)

  • Query Forming for Phonetic Search

    Motivation: to search for OOV queries

    Preparation
    - Convert word/subword lattices to phonetic lattices
    - Build a phonetic index

    How to search for OOVs?
    - The orthographic form (text) is available; we need the phonetic form
      (pronunciation)
    - Use a letter-to-sound (L2S) system to obtain likely pronunciations
    - Use multiple pronunciations to search for OOV queries

  • L2S Pronunciations

    L2S System
    - n-gram model over (letter, phone) pairs
    - Scores have a wide dynamic range due to the conditional independence
      assumption
    - Pointless to use the L2S scores as they are

    Unweighted L2S Pronunciations
    1. Obtain weighted pronunciations from the L2S transducer
    2. Pick the n-best alternatives and remove the weights
    3. Search: compose the unweighted automaton representing the alternatives
       with the phonetic index

  • Weighted L2S Pronunciations (Query: Taipei)

    1. Obtain weighted pronunciations from the L2S transducer
       [/t ay b ey/ .5, /t ay p ey/ .05, /d ay b ey/ .005, ...]
    2. Pick the n-best alternatives to prevent false alarms
       [/t ay b ey/ .5, /t ay p ey/ .05]  (n = 2)
    3. Scale the weights with the query length
       [/t ay b ey/ .5^(1/6) ≈ .9, /t ay p ey/ .05^(1/6) ≈ .6]  (query length = 6)
    4. Normalize the scaled weights to obtain posterior scores
       [/t ay b ey/ .6, /t ay p ey/ .4]
    5. Search: compose the weighted automaton representing the alternatives
       with the phonetic index
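    Steps 2-4 are easy to make concrete. A minimal sketch (mine, not from the
    talk) of the length-scaling and normalization:

        def weight_pronunciations(pron_scores, n_best, query_length):
            # pron_scores: list of (pronunciation, L2S probability), best first
            top = pron_scores[:n_best]                                  # step 2
            scaled = [(p, w ** (1.0 / query_length)) for p, w in top]   # step 3
            z = sum(w for _, w in scaled)
            return [(p, w / z) for p, w in scaled]                      # step 4

        prons = [("t ay b ey", 0.5), ("t ay p ey", 0.05), ("d ay b ey", 0.005)]
        print(weight_pronunciations(prons, n_best=2, query_length=6))
        # [('t ay b ey', ~0.59), ('t ay p ey', ~0.41)] -- the [.6, .4] on the slide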

  • Experiment I: Reference Lexicon (Reflex) Pronunciations
    MSTD data-set, MSTD-OOV query-set: 1290 OOVs, phonetic indexes
    (subwords obtained by pruning a phone n-gram model)

    Actual Term Weighted Value:

    Data                      P(FA)     P(Miss)   ATWV
    Word 1-best               .00001    .770      .215
    Word Consensus Nets       .00002    .687      .294
    Word Lattices             .00002    .657      .322
    Fragment 1-best           .00001    .680      .306
    Fragment Consensus Nets   .00003    .584      .390
    Fragment Lattices         .00003    .485      .484


  • Experiment II: ATWV vs. N-best L2S Pronunciations
    MSTD data-set, MSTD-OOV query-set: 1290 OOVs, phonetic indexes

    [Plot: ATWV (0.2-0.5) vs. N (0-10) for Fragment/Word Lattices with
    Weighted/Unweighted L2S Pronunciations; reference lines at .484
    (Fragment Lattices + Reflex) and .322 (Word Lattices + Reflex)]

  • Combined DET Plot for Weighted L2S Pronunciations
    MSTD data-set, MSTD-OOV query-set: 1290 OOVs, phonetic indexes

    [DET plot: miss probability vs. false alarm probability for weighted
    letter-to-sound 1-5 best pronunciations over fragment lattices]

    1-best: MTWV = 0.334, ATWV = 0.372
    2-best: MTWV = 0.354, ATWV = 0.422
    3-best: MTWV = 0.352, ATWV = 0.440
    4-best: MTWV = 0.339, ATWV = 0.447
    5-best: MTWV = 0.316, ATWV = 0.451

    Maximum Term Weighted Value with global thresholding peaks at 2-best;
    Actual Term Weighted Value with term specific thresholding (β = 1000)
    keeps improving through 5-best.

  • Summary

    - Lattice indexes perform better than CN indexes in the OOV retrieval task.
    - Phone indexes generated from sub-word (fragment) lattices represent OOVs
      better.
    - Using multiple pronunciations from the L2S system improves the
      performance, particularly when they are properly weighted.

  • Outline (current section: Thresholding for Spoken Term Detection)

  • Global Thresholding

    [Normalized histogram of posterior scores for an example query, with the
    incorrect/correct class distributions and their EM estimates; a threshold
    splits the score axis into Reject and Accept regions]

    - Pick a global threshold θ for all query terms
    - Apply binary thresholding
    - Vary θ for different operating points

    No term specific behavior, no joint processing of candidates, hence poor
    performance!

  • Term Weighted Value (TWV) [NIST, 2006]

    TWV = 1 − (1/Q) Σ_{k=1..Q} { P_miss(q_k) + β · P_FA(q_k) }

    P_miss(q_k) = 1 − C(q_k)/R(q_k),   P_FA(q_k) = (A(q_k) − C(q_k)) / (T − C(q_k))

    Q        Number of queries
    R(q_k)   Number of occurrences of query q_k
    A(q_k)   Total number of retrieved documents for q_k
    C(q_k)   Number of correctly retrieved documents for q_k
    T        Total duration of the speech archive
    β        Cost of false alarms relative to hits
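    The metric is simple to compute from per-term counts. A minimal sketch
    (mine) directly following the definition above:

        def twv(terms, T, beta):
            # terms: list of dicts with R (true occurrences), A (retrieved),
            # C (correctly retrieved) per query term; T: archive duration.
            total = 0.0
            for t in terms:
                p_miss = 1.0 - t["C"] / t["R"]
                p_fa = (t["A"] - t["C"]) / (T - t["C"])
                total += p_miss + beta * p_fa
            return 1.0 - total / len(terms)

        # e.g. one term: 10 true occurrences, 9 retrieved, 8 correct, 10h archive
        print(twv([{"R": 10, "A": 9, "C": 8}], T=36000, beta=1000))  # ~0.77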

  • TWV Based Term Specific Thresholding [Miller et al., 2007]

    V̂_hit(q_k) = 1 / R̂(q_k),   Ĉ_FA(q_k) = β / (T − R̂(q_k))

    θ̂(q_k) = Ĉ_FA(q_k) / ( Ĉ_FA(q_k) + V̂_hit(q_k) )

    V̂_hit(q_k)   Expected value of a hit for q_k
    Ĉ_FA(q_k)    Expected cost of a false alarm for q_k
    R̂(q_k)       Expected count of occurrences of q_k
    θ̂(q_k)       Optimal threshold for q_k, maximizing TWV in the expected sense

    - Term specific expected counts → term specific thresholds
    - Vary β for different operating points

    Only the sum of the individual scores affects the threshold!

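    A minimal sketch (mine) of the threshold above; R̂(q_k) is the sum of the
    posterior scores retrieved for the term:

        def twv_threshold(expected_count, T, beta=1000.0):
            v_hit = 1.0 / expected_count        # expected value of a hit
            c_fa = beta / (T - expected_count)  # expected cost of a false alarm
            return c_fa / (c_fa + v_hit)

        print(twv_threshold(expected_count=100.0, T=36000))  # ~0.74, frequent term
        print(twv_threshold(expected_count=2.0, T=36000))    # ~0.05, rare term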

  • Exploiting Score Distributions [Manmatha et al., 2001; Can and Saraclar, 2009]

    [Normalized histogram of posterior scores for an example query, with the
    incorrect/correct class distributions and their EM estimates]

    - Scores follow exponential-like distributions
    - Model both classes (0, 1) with exponential distributions:

      p_0(y) = λ_0 e^{−λ_0 y}
      p_1(y) = λ_1 e^{−λ_1 (1−y)}

    - Model all candidates as a mixture of exponentials:

      p(y) = π_0 p_0(y) + (1 − π_0) p_1(y)

    - Use EM to estimate the parameters (λ_0, λ_1, π_0)

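    A minimal EM sketch (mine) for this mixture, assuming scores y in (0, 1]
    and ignoring the truncation of the exponentials to the unit interval:

        import math
        import random

        def em_exponential_mixture(scores, iters=50):
            lam0, lam1, pi0 = 10.0, 10.0, 0.5      # rough initial guesses
            for _ in range(iters):
                # E-step: posterior probability that each score is a false alarm
                g0 = []
                for y in scores:
                    a = pi0 * lam0 * math.exp(-lam0 * y)
                    b = (1 - pi0) * lam1 * math.exp(-lam1 * (1 - y))
                    g0.append(a / (a + b))
                # M-step: weighted maximum-likelihood rates and prior
                n0 = sum(g0)
                n1 = len(scores) - n0
                lam0 = n0 / sum(g * y for g, y in zip(g0, scores))
                lam1 = n1 / sum((1 - g) * (1 - y) for g, y in zip(g0, scores))
                pi0 = n0 / len(scores)
            return lam0, lam1, pi0

        # Synthetic check: mix draws from the two classes and re-estimate.
        random.seed(0)
        ys = [min(random.expovariate(8.0), 1.0) for _ in range(300)]         # class 0
        ys += [1.0 - min(random.expovariate(5.0), 1.0) for _ in range(200)]  # class 1
        print(em_exponential_mixture(ys))  # roughly (8, 5, 0.6)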

  • Computing Term Specific Thresholds

    Cost Scheme

    C = [ 0  1 ]
        [ α  0 ]

    where α is a user defined parameter specifying the cost of false alarms
    relative to hits.

    - Estimate the mixture parameters → each component ~ a class, the mixture
      weights ~ the priors
    - For k = 1, ..., Q, the Bayes-optimal threshold θ̂(q_k) is given as:

      θ̂(q_k) = [ λ̂_1(q_k) + log(λ̂_0(q_k)/λ̂_1(q_k)) + log(π̂_0(q_k)/π̂_1(q_k)) + log α ]
               / ( λ̂_0(q_k) + λ̂_1(q_k) )

    - Different operating points can be achieved by changing α.
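    Plugging the EM estimates into the formula above (a sketch, continuing
    the previous snippet):

        import math

        def bayes_threshold(lam0, lam1, pi0, alpha):
            pi1 = 1.0 - pi0
            num = lam1 + math.log(lam0 / lam1) + math.log(pi0 / pi1) + math.log(alpha)
            return num / (lam0 + lam1)

        print(bayes_threshold(lam0=8.0, lam1=5.0, pi0=0.6, alpha=1.0))  # ~0.45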

  • STD Evaluation Metrics

    Precision-Recall Curves
    Precision and recall are the most popular IR evaluation metrics. Given a
    set of queries q_k, k = 1, ..., Q, let
    - R(q_k) be the number of segments in the collection that are related to
      the query q_k,
    - A(q_k) be the total number of retrieved segments, and
    - C(q_k) be the number of correctly retrieved segments.

    Precision = (1/Q) Σ_{k=1..Q} C(q_k)/A(q_k),   Recall = (1/Q) Σ_{k=1..Q} C(q_k)/R(q_k)

    ROC Curves
    These curves use NIST’s P_Miss and P_FA definitions for STD.

  • Precision-Recall Comparison
    BUTBN-R data-set, R-IV query-set, lattice beam = 4

    [Plot: recall vs. precision for Global Thresholding, TWV Based TST, and
    Score Distribution Based TST]

  • ROC Comparison
    BUTBN-R data-set, R-IV query-set, lattice beam = 4

    [Plot: P_Miss vs. P_FA (10⁻⁶ to 10⁻³) for Global Thresholding, TWV Based
    TST, and Score Distribution Based TST]

  • Summary

    - Exploiting score distributions leads to a viable term specific
      thresholding method
    - SD-TST optimizes the precision metric → superior to TWV-TST over a large
      interval of precision values
    - TWV-TST optimizes the false alarm metric → has much better ROC
      performance

  • Publications

    Can, D., Cooper, E., Ghoshal, A., Jansche, M., Khudanpur, S.,
    Ramabhadran, B., Riley, M., Saraclar, M., Sethy, A., Ulinski, M., and
    White, C. (2009a). Web derived pronunciations for spoken term detection.
    In SIGIR, pages 83-90.

    Can, D., Cooper, E., Sethy, A., White, C., Ramabhadran, B., and
    Saraclar, M. (2009b). Effect of pronunciations on OOV queries in spoken
    term detection. In Proc. ICASSP, pages 3957-3960.

    Can, D. and Saraclar, M. (2009). Score distribution based term specific
    thresholding for spoken term detection. In Proc. NAACL-HLT 2009, pages
    269-272, Boulder, Colorado. Association for Computational Linguistics.

  • References

    Allauzen, C., Mohri, M., and Saraclar, M. (2004). General indexation of
    weighted automata: application to spoken utterance retrieval. In Proc.
    HLT-NAACL.

    Can, D., Cooper, E., Sethy, A., White, C., Ramabhadran, B., and
    Saraclar, M. (2009). Effect of pronunciations on OOV queries in spoken
    term detection. In Proc. ICASSP, pages 3957-3960.

    Can, D. and Saraclar, M. (2009). Score distribution based term specific
    thresholding for spoken term detection. In Proc. NAACL-HLT 2009, pages
    269-272, Boulder, Colorado. Association for Computational Linguistics.

    Chaudhari, U. V. and Picheny, M. (2007). Improvements in phone based
    audio search via constrained match with high order confusion estimates.
    In Proc. ASRU.

    Chelba, C. and Acero, A. (2005). Position specific posterior lattices for
    indexing speech. In Proc. ACL.

    Li, Y. C., Lo, W. K., Meng, H. M., and Ching, P. C. (2000). Query
    expansion using phonetic confusions for Chinese spoken document retrieval.
    In Proc. IRAL.

    Mamou, J., Ramabhadran, B., and Siohan, O. (2007). Vocabulary independent
    spoken term detection. In Proc. ACM SIGIR.

    Manmatha, R., Rath, T., and Feng, F. (2001). Modeling score distributions
    for combining the outputs of search engines. In SIGIR ’01, pages 267-275,
    New York, NY, USA. ACM.

    Miller, D. R. H., Kleber, M., Kao, C., Kimball, O., Colthurst, T.,
    Lowe, S. A., Schwartz, R. M., and Gish, H. (2007). Rapid and accurate
    spoken term detection. In Proc. Interspeech.

    NIST (2006). The spoken term detection (STD) 2006 evaluation plan.
    http://www.itl.nist.gov/iad/mig/tests/std/

    Parlak, S. and Saraclar, M. (2008). Spoken term detection for Turkish
    Broadcast News. In Proc. ICASSP.

    Saraclar, M. and Sproat, R. (2004). Lattice-based search for spoken
    utterance retrieval. In Proc. HLT-NAACL.

    Siohan, O. and Bacchiani, M. (2005). Fast vocabulary independent audio
    search using path based graph indexing. In Proc. Interspeech.

    Zhou, Z. Y., Yu, P., Chelba, C., and Seide, F. (2006). Towards
    spoken-document retrieval for the internet: lattice indexing for
    large-scale web-search architectures. In Proc. HLT-NAACL.

  • Notation & Definitions: Product Semiring

    Definition: For two partially-ordered semirings A = (A, ⊕_A, ⊗_A, 0̄_A, 1̄_A)
    and B = (B, ⊕_B, ⊗_B, 0̄_B, 1̄_B), the product semiring over A × B is

    A × B = (A × B, ⊕_×, ⊗_×, (0̄_A, 0̄_B), (1̄_A, 1̄_B))

    where ⊕_× and ⊗_× are component-wise operators:

    (a1, b1) ⊕_× (a2, b2) = (a1 ⊕_A a2, b1 ⊕_B b2)
    (a1, b1) ⊗_× (a2, b2) = (a1 ⊗_A a2, b1 ⊗_B b2)

    The natural order over A × B, given by

    ((a1, b1) ≤_× (a2, b2)) ⇔ (a1 ⊕_A a2 = a1 and b1 ⊕_B b2 = b1),

    is a partial order, even if A and B are totally ordered.

  • Notation & Definitions: Lexicographic Semiring

    Definition
    For two partially-ordered semirings A = (A, ⊕A, ⊗A, 0A, 1A) and
    B = (B, ⊕B, ⊗B, 0B, 1B), the lexicographic semiring over A × B is:

    A ∗ B = (A × B, ⊕∗, ⊗∗, 0A × 0B, 1A × 1B)

    where ⊕∗ is a lexicographic priority operator

    (a1, b1) ⊕∗ (a2, b2) =
      (a1, b1 ⊕B b2)  if a1 = a2,
      (a1, b1)        if a1 = a1 ⊕A a2 ≠ a2,
      (a2, b2)        if a1 ≠ a1 ⊕A a2 = a2,

    and ⊗∗ is the component-wise multiplication operator.
    A ∗ B is totally-ordered when A and B are:

    ((a1, b1) ≤∗ (a2, b2)) ⇔ (a1 = a1 ⊕A a2 ≠ a2)
                             or (a1 = a2 and b1 = b1 ⊕B b2)

    7 / 10
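    A minimal sketch of ⊕∗ in Python, assuming the first component
    semiring is totally ordered by its own ⊕ (x dominates y exactly when
    plus_a(x, y) == x); the function names are illustrative.

    def lex_plus(plus_a, plus_b):
        def plus(x, y):
            a1, b1 = x
            a2, b2 = y
            if a1 == a2:                     # tie on the first component:
                return (a1, plus_b(b1, b2))  # resolve with the second
            if plus_a(a1, a2) == a1:         # x wins on the first component
                return (a1, b1)
            return (a2, b2)                  # y wins on the first component
        return plus

    # With tropical (min) in both components:
    plus = lex_plus(min, min)
    print(plus((1.0, 5.0), (2.0, 0.0)))  # (1.0, 5.0): first component decides
    print(plus((1.0, 5.0), (1.0, 3.0)))  # (1.0, 3.0): tie broken by the second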


  • Notation & Definitions: Weighted Finite-State Automata

    Definition
    I Given a transition e ∈ E:

    I p[e] : its previous state,
    I n[e] : its next state,
    I i[e] : its input label,
    I o[e] : its output label,
    I w[e] : its weight.

    I A path π = e1 · · · ek ∈ E∗ satisfies n[ei−1] = p[ei], i = 2, . . . , k.

    I We extend p, n, i, o, w to paths:
    I p[π] = p[e1],
    I n[π] = n[ek],
    I i[π] = i[e1] · · · i[ek],
    I o[π] = o[e1] · · · o[ek],
    I w[π] = w[e1] ⊗ · · · ⊗ w[ek].

    [Figure: two example WFSTs; left: a single arc from state qa to state qb
    labeled m:h/.5; right: states qa, qb and a final state qc/1, with arcs
    labeled m:h/.5, e:a/.7 and e:v/.3]

    8 / 10

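    To make the path definitions concrete, a minimal Python sketch; the
    5-tuple transition encoding (p, n, i, o, w) is illustrative, not a
    format from any WFST toolkit, and ⊗ is instantiated as + as in the
    tropical/log semirings.

    def path_weight(path, otimes=lambda a, b: a + b):
        # A valid path satisfies n[e_{i-1}] == p[e_i] for i = 2, ..., k.
        for prev, cur in zip(path, path[1:]):
            assert prev[1] == cur[0], "transitions are not connected"
        w = path[0][4]
        for e in path[1:]:
            w = otimes(w, e[4])
        return w

    def path_labels(path):
        # i[pi] and o[pi]: concatenations of input and output labels.
        return [e[2] for e in path], [e[3] for e in path]

    # A path through the figure's transducer: qa --m:h/.5--> qb --e:a/.7--> qc
    pi = [("qa", "qb", "m", "h", 0.5), ("qb", "qc", "e", "a", 0.7)]
    print(path_weight(pi))  # 1.2
    print(path_labels(pi))  # (['m', 'e'], ['h', 'a'])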

  • Factor Transducer

    Definition
    For a finite-state automaton Ai, i = 1 . . . n, the factor transducer
    of Ai is defined as the weighted finite-state transducer Ti for which:

    ⟦Ti⟧(x, i) = − log(E_Pi[Ci(x)]) ∈ T

    where x ∈ XAi and E_Pi[Ci(x)] is the expected count of x in Ai.
    For each state q ∈ Qi,
    d[q]: shortest distance from Ii to q (− log of forward prob.),
    f[q]: shortest distance from q to Fi (− log of backward prob.).

    d[q] = ⊕log_{π∈P(Ii,q)} (λi(p[π]) + w[π]),
    f[q] = ⊕log_{π∈P(q,Fi)} (w[π] + ρi(n[π]))

    − log(E_Pi[Ci(x)]) = ⊕log_{π: i[π]=x} (d[p[π]] + w[π] + f[n[π]])

    9 / 10

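    The shortest-distance expressions above translate directly into a
    forward-backward computation over a lattice. A minimal sketch,
    assuming an acyclic lattice whose arcs are already sorted in
    topological order of their source states; the arc encoding is
    illustrative, not a library format, and x is a one-label factor.

    import math

    def log_add(a, b):
        # -log(exp(-a) + exp(-b)): the oplus of the log semiring.
        if a == math.inf:
            return b
        if b == math.inf:
            return a
        m = min(a, b)
        return m - math.log1p(math.exp(-abs(a - b)))

    def expected_count(states, arcs, initial, final, x):
        # arcs: (src, dst, label, -log prob), in topological order of src.
        d = {q: math.inf for q in states}
        f = {q: math.inf for q in states}
        d[initial] = 0.0
        f[final] = 0.0
        for src, dst, lab, w in arcs:             # forward pass: d[q]
            d[dst] = log_add(d[dst], d[src] + w)
        for src, dst, lab, w in reversed(arcs):   # backward pass: f[q]
            f[src] = log_add(f[src], w + f[dst])
        c = math.inf
        for src, dst, lab, w in arcs:             # sum over arcs with i[e] = x
            if lab == x:
                c = log_add(c, d[src] + w + f[dst])
        return c  # -log of the expected count of x

    arcs = [(0, 1, "night", -math.log(0.4)),
            (0, 1, "light", -math.log(0.6)),
            (1, 2, "view", 0.0)]
    print(math.exp(-expected_count([0, 1, 2], arcs, 0, 2, "night")))  # ~0.4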

  • EM Parameter Updates

    I Model all candidate scores as a mixture of exponentials

    p(y) = π0 p0(y) + (1 − π0) p1(y)

    I Use EM to estimate the parameters (λ0(qk), λ1(qk) and π0(qk)) given
      the candidate scores yk,n, n = 1, . . . , Nk, of a query term qk.

    I First compute the posteriors

    P(j|yk,n) = π̂j(qk) pj(yk,n) / p(yk,n),    j = 0, 1,  n = 1, . . . , Nk

    I Then update

    λ̂0(qk) = Σn P(0|yk,n) / Σn P(0|yk,n) yk,n,

    λ̂1(qk) = Σn P(1|yk,n) / Σn P(1|yk,n) (1 − yk,n),

    π̂j(qk) = (1/Nk) Σn P(j|yk,n).

    10 / 10
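    A minimal sketch of this EM loop, assuming p0(y) = λ0 exp(−λ0 y)
    (false alarms, mass near 0) and p1(y) = λ1 exp(−λ1 (1 − y)) (hits,
    mass near 1), which is the model the closed-form updates above imply;
    the initial values and iteration count are illustrative.

    import math

    def em_exponential_mixture(scores, lam0=5.0, lam1=5.0, pi0=0.5, iters=50):
        for _ in range(iters):
            # E-step: P(0 | y) for every candidate score y.
            post0 = []
            for y in scores:
                p0 = pi0 * lam0 * math.exp(-lam0 * y)
                p1 = (1.0 - pi0) * lam1 * math.exp(-lam1 * (1.0 - y))
                post0.append(p0 / (p0 + p1))
            # M-step: the closed-form updates from the slide.
            n0 = sum(post0)
            n1 = len(scores) - n0
            lam0 = n0 / sum(g * y for g, y in zip(post0, scores))
            lam1 = n1 / sum((1 - g) * (1 - y) for g, y in zip(post0, scores))
            pi0 = n0 / len(scores)
        return lam0, lam1, pi0

    # Candidate scores of one query term: low ~ false alarms, high ~ hits.
    print(em_exponential_mixture([0.05, 0.1, 0.2, 0.85, 0.9, 0.95]))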
