
Bang-Xuan Huang Department of Computer Science & Information Engineering

Date post: 31-Dec-2015

A word graph algorithm for large vocabulary continuous speech recognition

Stefan Ortmanns, Hermann Ney, Xavier Aubert

Bang-Xuan Huang

Department of Computer Science & Information Engineering

National Taiwan Normal University


Outline

• Introduction

• Review of the integrated method

• Word graph method

• Experiment result


Introduction

• Why is LVCSR difficult?

• This paper describes a method for the construction of a word graph (or lattice) for large vocabulary, continuous speech recognition.

• The advantage of a word graph is that the final search can be carried out at the word level using a more complicated language model.

• The difficulty in efficiently constructing a good word graph is the following: the start time of a word depends in general on the predecessor words.

• Integrated method: word-conditioned lexical tree search algorithm

- bigram

- trigram
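As an illustration of the "final search at the word level" mentioned above, here is a minimal sketch of rescoring a small word graph with a longer-span language model. The graph, the scores, and the function names (`toy_trigram_logprob`, `best_sentence`) are all made up for this example; the language model is a toy stand-in, not a trained trigram.

```python
import math
from collections import defaultdict

# Hypothetical word graph: (start_time, end_time, word, acoustic_log_score).
EDGES = [
    (0, 3, "the", -4.0),
    (0, 3, "a", -4.5),
    (3, 7, "cat", -6.0),
    (3, 7, "cap", -5.8),
    (7, 9, "sat", -3.0),
]

def toy_trigram_logprob(word, history):
    """Made-up stand-in for log p(word | two-word history)."""
    favoured = {("<s>", "<s>"): "the", ("<s>", "the"): "cat", ("the", "cat"): "sat"}
    return math.log(0.6) if favoured.get(history) == word else math.log(0.1)

def best_sentence(edges, lm_logprob, final_time):
    """Viterbi search over the graph; a state is (time, two-word LM history)."""
    by_start = defaultdict(list)
    for edge in edges:
        by_start[edge[0]].append(edge)

    # best[(time, history)] = (log score, words so far)
    best = {(0, ("<s>", "<s>")): (0.0, [])}
    # Edges only go forward in time, so start times can be processed in order.
    for t in sorted(by_start):
        for (time, hist), (score, words) in list(best.items()):
            if time != t:
                continue
            for _, end, word, acoustic in by_start[t]:
                new_score = score + acoustic + lm_logprob(word, hist)
                key = (end, (hist[1], word))
                if key not in best or new_score > best[key][0]:
                    best[key] = (new_score, words + [word])

    finals = [v for (time, _), v in best.items() if time == final_time]
    return max(finals, key=lambda item: item[0])

score, words = best_sentence(EDGES, toy_trigram_logprob, final_time=9)
print(words, round(score, 3))
```

In this toy graph the acoustically best path contains "cap", but the language model scores move the decision to "the cat sat". The search runs over a handful of edges rather than over the acoustic signal.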


Review of the integrated method (1/4)


Review of the integrated method (2/4)

• where σ_v^max(t, s) is the optimum predecessor state for the hypothesis (t, s) and predecessor word v.

• q(x_t, s | σ) is the product of the transition and emission probabilities of the hidden Markov models used for the context-dependent or context-independent phonemes.

• At word boundaries, we have to find the best predecessor word v for each word w. For this purpose, we define:

• where the state S_w denotes the terminal state of word w in the lexical tree.
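The two steps described in the bullets above can be sketched in code. This is a minimal sketch that assumes the usual form of the word-conditioned recursion: Q_v(t, s) = max over σ of q(x_t, s | σ) · Q_v(t-1, σ), the back pointer B_v(t, s) inherited from σ_v^max(t, s), and H(w; t) = max over v of p(w | v) · Q_v(t, S_w). Scores are kept in the log domain, so products become sums. The function names and data layout are hypothetical.

```python
NEG_INF = float("-inf")

def tree_step(Q_prev, B_prev, log_q):
    """One time frame of the recursion inside the tree copy of one predecessor v.

    Q_prev[sigma]:   log score of the best path ending in state sigma at t-1
    B_prev[sigma]:   start time (back pointer) of that path
    log_q[s][sigma]: log q(x_t, s | sigma), transition times emission
    """
    n = len(Q_prev)
    Q, B = [NEG_INF] * n, [None] * n
    for s in range(n):
        # sigma_max is the optimum predecessor state for hypothesis (t, s).
        sigma_max = max(range(n), key=lambda sig: log_q[s][sig] + Q_prev[sig])
        score = log_q[s][sigma_max] + Q_prev[sigma_max]
        if score > NEG_INF:
            Q[s] = score
            B[s] = B_prev[sigma_max]  # start time is inherited along the best path
    return Q, B

def word_end_recombination(Q_end, log_bigram):
    """Best predecessor at a word boundary.

    Q_end[v]:      log Q_v(t, S_w) for the word w under consideration
    log_bigram[v]: log p(w | v)
    Returns (log H(w; t), best predecessor word v).
    """
    best_v = max(Q_end, key=lambda v: log_bigram[v] + Q_end[v])
    return log_bigram[best_v] + Q_end[best_v], best_v

# Toy three-state example with made-up numbers.
Q, B = tree_step(
    [0.0, -1.0, NEG_INF],
    [5, 3, None],
    [[-1.0, NEG_INF, NEG_INF], [-2.0, -0.5, NEG_INF], [NEG_INF, -1.0, -0.3]],
)
print(Q, B)
```

Carrying the back pointer B along with the score is what later makes the word boundary available at the word end without a separate traceback inside the tree.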


Review of the integrated method (3/4)

• To propagate the path hypothesis into the lexical tree hypotheses or to start them up if they do not exist yet, we have to pass on the score and the time index before processing the hypotheses for time frame t:

• Garbage collection

• Pruning techniques and language model look-ahead
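The start-up step in the first bullet above can be sketched as follows. The sketch assumes the usual initialisation Q_v(t-1, s=0) = H(v; t-1) and B_v(t-1, s=0) = t-1, with state 0 as the tree root; the dictionary layout and the name `start_up_trees` are hypothetical.

```python
def start_up_trees(trees, H_prev, t):
    """Initialise the root state of each tree copy before processing frame t.

    trees:  dict v -> {"Q": {state: log score}, "B": {state: start time}}
    H_prev: dict v -> log H(v; t-1), best score of a path ending in word v at t-1
    """
    for v, score in H_prev.items():
        # Create the tree copy for predecessor v if it does not exist yet.
        tree = trees.setdefault(v, {"Q": {}, "B": {}})
        tree["Q"][0] = score   # pass on the score ...
        tree["B"][0] = t - 1   # ... and the time index, i.e. the word boundary
    return trees

trees = {"the": {"Q": {4: -7.0}, "B": {4: 2}}}
start_up_trees(trees, {"the": -12.0, "cat": -15.0}, t=10)
print(trees)
```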


Review of the integrated method (4/4)

• Extension to trigram language models


Word graph method

• Fundamental problem of word graph construction: hypothesizing a word w and its ending time t, how can we find a limited number of “most likely” predecessor words v?


• Word-pair approximation: for each path hypothesis, the position of the word boundary between the last two words is independent of the other words of this path hypothesis.
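Under this approximation, the boundary between v and w depends only on the pair (v, w) and the end time t, so it is enough to record one boundary and one word score per word pair at each frame. The sketch below assumes the boundary is read from the back pointer, τ = B_v(t, S_w), and that the word score is h(w; τ, t) = Q_v(t, S_w) / H(v; τ). The `Edge` record and the function name are hypothetical.

```python
from collections import namedtuple

Edge = namedtuple("Edge", "pred word start end log_score")

def record_word_ends(t, word_ends, H, edges):
    """Store one word-graph edge per active word pair (v, w) at frame t.

    word_ends: dict (v, w) -> (Q, tau), with Q = log Q_v(t, S_w) and
               tau = B_v(t, S_w), the boundary between v and w
    H:         dict (v, tau) -> log H(v; tau), best path ending in v at tau
    """
    for (v, w), (Q, tau) in word_ends.items():
        # In the log domain the division Q / H(v; tau) is a subtraction;
        # what remains is the score of w alone over the interval (tau, t].
        edges.append(Edge(v, w, tau, t, Q - H[(v, tau)]))
    return edges

edges = record_word_ends(
    20,
    {("the", "cat"): (-30.0, 12), ("a", "cat"): (-33.0, 11)},
    {("the", 12): -18.0, ("a", 11): -19.5},
    [],
)
print(edges)
```

The same word w ending at t can therefore appear with different start times, one per predecessor, which is the dependence on the predecessor word that the introduction points out.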


Word graph generation algorithm


• Word-conditioned Search

– A virtual tree copy is explored for each active LM history

– More complicated in HMM state manipulation (dependent on the LM complexity)

• Time-conditioned Search

– A virtual tree copy is entered at each time frame by the word-end hypotheses ending at that time

– More complicated in LM-level recombination
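To make the contrast concrete: in word-conditioned search a hypothesis is indexed by (predecessor word v, state s), so the word-end recombination is a maximum over v only. In time-conditioned search a hypothesis is indexed by (start time τ, state s), and the predecessor has to be optimised at the word end together with τ. The sketch below assumes the recombination H(w; t) = max over τ and v of H(v; τ) · p(w | v) · h(w; τ, t), in the log domain; all names and numbers are hypothetical.

```python
def time_conditioned_recombination(w, h, H, log_bigram):
    """Best log score of a path ending in word w at the current frame.

    h:          dict tau -> log score of w over (tau, t], from the tree copy started at tau
    H:          dict tau -> {v: log score of the best path ending in v at tau}
    log_bigram: dict (v, w) -> log p(w | v)
    Returns (log score, best predecessor v, best start time tau).
    """
    best = (float("-inf"), None, None)
    for tau, word_score in h.items():
        # The tree copy started at tau does not know its predecessor word,
        # so every v that ended at tau has to be tried here.
        for v, path_score in H[tau].items():
            total = path_score + log_bigram[(v, w)] + word_score
            if total > best[0]:
                best = (total, v, tau)
    return best

result = time_conditioned_recombination(
    "cat",
    {10: -8.0, 12: -7.0},
    {10: {"the": -20.0, "a": -21.0}, 12: {"the": -23.0}},
    {("the", "cat"): -1.0, ("a", "cat"): -3.0},
)
print(result)
```

The double loop over τ and v is the extra language-model-level work; in exchange, the number of tree copies no longer grows with the language model history.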


• Garbage collection

– For large-vocabulary recognition, it is essential to keep the storage costs as low as possible.

– To reduce the memory requirements, back pointers and traceback arrays are used; the traceback arrays record the decision about the best predecessor word for each word start-up.

– Many traceback entries become obsolete during the search because the hypotheses that referred to them have been pruned. To remove these obsolete entries from the traceback arrays, we apply a garbage collection (purging) method.

– In principle, this garbage collection can be performed every time frame, but to reduce the overhead it is sufficient to perform it at regular intervals, say every 50th time frame.
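A minimal sketch of such a purge, assuming a mark-and-sweep scheme: every traceback entry reachable from a back pointer of a still-active hypothesis is marked, and everything else is dropped. The slides do not spell out the purge procedure, so the scheme, the data layout, and the names (`purge_traceback`, `maybe_purge`, `GC_INTERVAL`) are assumptions.

```python
GC_INTERVAL = 50  # frames between purges, the value quoted above

def purge_traceback(traceback, active_back_pointers):
    """Keep only the entries reachable from active hypotheses.

    traceback: dict entry_id -> (word, end_time, predecessor_entry_id or None)
    active_back_pointers: entry ids still referenced by active tree hypotheses
    """
    reachable = set()
    for entry in active_back_pointers:
        # Follow the predecessor chain until the sentence start
        # or an entry that has already been marked.
        while entry is not None and entry not in reachable:
            reachable.add(entry)
            entry = traceback[entry][2]
    return {i: rec for i, rec in traceback.items() if i in reachable}

def maybe_purge(t, traceback, active_back_pointers):
    """Purge only every GC_INTERVAL-th frame to limit the overhead."""
    if t % GC_INTERVAL == 0:
        return purge_traceback(traceback, active_back_pointers)
    return traceback

traceback = {
    0: ("<s>", 0, None),
    1: ("the", 5, 0),
    2: ("a", 5, 0),
    3: ("cat", 12, 1),
    4: ("cap", 12, 2),
}
print(sorted(maybe_purge(50, traceback, {3})))
```

A real decoder would typically compact an array in place and remap the back pointers; the dictionary copy here only keeps the example short.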


• Pruning techniques and language model look-ahead
