LABORATOIRE D’INFORMATIQUE
CERI, 339 Chemin des Meinajariès
BP 1228, 84911 AVIGNON CEDEX 09
Tél. +33 (0)4 90 84 35 09, Fax. +33 (0)4 90 84 35 01
[email protected], www.lia.univ-avignon.fr
The SPEERAL Decoder
NOCERA Pascal
Laboratoire d’Informatique d’Avignon
AGROPARC
BP 1228, 84911 AVIGNON Cedex 9
Tel : 04.90.84.35.07
E-mail : [email protected]
FLAVOR workshop
The SPEERAL System
Stochastic approach
Find the best hypothesis among all the possible hypotheses with the A* algorithm.
$$\hat{w} = \arg\max_{w} P(w/X) = \arg\max_{w} P(X/w)\,P(w)$$
[Figure: Acoustic Parameters X → Decoder → word sequence w (example: « CE SOIR »)]
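The decision rule above can be illustrated with a few lines of Python. This is only a sketch: the two candidate word sequences and their probabilities are made up, and a real decoder searches a huge hypothesis space rather than enumerating it. Scores are kept in the log domain so that the product P(X/w) P(w) becomes a sum.

```python
import math

def decode(hypotheses, acoustic_logp, lm_logp):
    """Return the hypothesis w that maximizes log P(X/w) + log P(w)."""
    return max(hypotheses, key=lambda w: acoustic_logp[w] + lm_logp[w])

# Hypothetical scores for one fixed observation sequence X.
acoustic_logp = {"ce soir": math.log(0.020), "seize oies": math.log(0.025)}
lm_logp = {"ce soir": math.log(0.010), "seize oies": math.log(0.0001)}

best = decode(list(acoustic_logp), acoustic_logp, lm_logp)
```

Here the second hypothesis has the better acoustic score, but the language model prior makes "ce soir" the overall winner.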
The acoustic models provide P(X/w); the linguistic models provide P(w).
Acoustic Models
Hidden Markov Models
Gaussian Mixture Models
Contextual Models (Phonemes)
[Figure: three-state HMM with states S1, S2, S3]
Acoustic Model Toolkit
Parameterization program
Text to phone program
Alignment program
HMM learning program
Supervised and unsupervised Model Adaptation
– MLLR
– MAP
– Structural Model Space Transformation
Linguistic Models
Stochastic Language Models
– N-grams
– Class based language models
$$P(W_1^n) = \prod_{i=1}^{n} P(w_i/h)$$
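As a rough illustration of the product over P(w_i/h), the sketch below scores a sentence with an n-gram model where h is taken to be the n-1 preceding words. The probability table, the `<s>` padding symbol and the function name are assumptions made for the example; there is no smoothing or back-off, so an unseen n-gram raises a `KeyError`.

```python
import math

def sentence_logprob(words, ngram_prob, n=3):
    """Sum log P(w_i/h) over the sentence, with h = the n-1 previous words."""
    padded = ["<s>"] * (n - 1) + list(words)
    total = 0.0
    for i in range(n - 1, len(padded)):
        history = tuple(padded[i - n + 1:i])
        total += math.log(ngram_prob[(history, padded[i])])
    return total

# Hypothetical trigram probabilities.
ngram_prob = {
    (("<s>", "<s>"), "ce"): 0.1,
    (("<s>", "ce"), "soir"): 0.5,
}
logp = sentence_logprob(["ce", "soir"], ngram_prob)
```

With these made-up values the sentence probability is 0.1 × 0.5 = 0.05.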
Linguistic Model Toolkit
Text Normalization Tools
Language Model Training
– CMU toolkit
– SRI toolkit
– AT&T toolkit
Language Model Compilation
Lexicon Compilation
Standard A* algorithm
« best-first » search algorithm
– Extend the best path to generate new candidates
– Assign a score F(x) = g(x) + h(x) to every explored path
• g(x) combines Language Model and acoustic scores
• h(x) estimates the probability of the best extension
– Keep the list of explored paths as a priority queue
– When the best path reaches ‘end’ then stop
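The steps listed here can be sketched as a generic A* over a small graph. This is not the SPEERAL implementation: the graph, the node names and the two heuristics are invented for the example, and scores are written as costs to minimize (for instance negative log-probabilities) rather than probabilities to maximize.

```python
import heapq

def a_star(start, goal, successors, h):
    """Best-first search on F(x) = g(x) + h(x), with costs to minimize.

    successors(node) returns (next_node, step_cost) pairs; h(node) must
    never overestimate the remaining cost for the result to be optimal.
    Returns (cost, path, number_of_expansions), or None if no path exists.
    """
    queue = [(h(start), 0.0, [start])]      # entries are (F, g, path)
    best_g = {}
    expansions = 0
    while queue:
        f, g, path = heapq.heappop(queue)   # extend the best path first
        node = path[-1]
        if node == goal:                    # best path reaches 'end': stop
            return g, path, expansions
        if node in best_g and best_g[node] <= g:
            continue                        # already expanded at least as cheaply
        best_g[node] = g
        expansions += 1
        for nxt, cost in successors(node):
            g2 = g + cost
            heapq.heappush(queue, (g2 + h(nxt), g2, path + [nxt]))
    return None

# Hypothetical toy graph: the cheapest route is start -> b -> end (cost 5).
graph = {"start": [("a", 1.0), ("b", 4.0)], "a": [("end", 5.0)],
         "b": [("end", 1.0)], "end": []}
exact_h = {"start": 5.0, "a": 5.0, "b": 1.0, "end": 0.0}

uninformed = a_star("start", "end", lambda n: graph[n], lambda n: 0.0)
informed = a_star("start", "end", lambda n: graph[n], lambda n: exact_h[n])
```

Both calls return the same path; with h(x) = 0 the search expands three nodes, while with the exact remaining cost it expands only the two nodes on the best path.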
Standard A* algorithm (2/2)
Requires an admissible heuristic function
– h(x) never overestimates the true remaining cost of the path (the more accurate the better).
Sample heuristics
– h(x) = 0
• Uniform-cost search (breadth-first when all steps cost the same)
– h(x) = true remaining cost (i.e. F(x) never changes)
• Deterministic search
The SPEERAL System
Language model
– Stochastic n-gram LM (n=3)
Lexical, phonetic and acoustic knowledge source
– Acoustic model (HMM, …)
– Decoding vocabulary (lexicon)
– Input signal → Phoneme lattice
• ( p, beg, end, sc ) with score sc = P(X_{beg..end}/p)
+ …/…
Sounding function h
Remaining path estimation
– Acoustic score only
– Computed with a backward Viterbi, during the phoneme lattice generation
Heuristic admissibility
– Underestimates the remaining cost: no LM information
– Cannot be the true cost (lack of LM information)
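One way to picture this acoustic-only estimate is a backward dynamic-programming pass over a finished phoneme lattice. The sketch below is an assumption-laden simplification: the slides say the estimate is computed during lattice generation, whereas here the lattice is given up front; arcs are (p, beg, end, sc) tuples with `end` exclusive and `sc` a log-score; and the arcs and scores are invented.

```python
from collections import namedtuple

Arc = namedtuple("Arc", "p beg end sc")   # end is exclusive in this sketch

def backward_scores(arcs, n_frames):
    """h[t] = best acoustic log-score of any arc sequence covering t..n_frames."""
    neg_inf = float("-inf")
    h = [neg_inf] * (n_frames + 1)
    h[n_frames] = 0.0                      # nothing left to explain at the end
    by_start = {}
    for arc in arcs:
        by_start.setdefault(arc.beg, []).append(arc)
    for t in range(n_frames - 1, -1, -1):  # sweep backward in time
        for arc in by_start.get(t, []):
            h[t] = max(h[t], arc.sc + h[arc.end])
    return h

# Hypothetical lattice over 8 frames.
arcs = [
    Arc("s", 0, 3, -4.0), Arc("z", 0, 3, -5.0),
    Arc("w", 3, 5, -2.0), Arc("a", 5, 8, -3.0),
    Arc("o", 3, 8, -6.5),
]
h = backward_scores(arcs, n_frames=8)
```

Because no language model penalty is included, h[t] is an optimistic bound on what any completion starting at frame t can score, which is the log-score counterpart of underestimating the remaining cost.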
Lexicon
Prefix-tree organization
– Widely applied
– Compact representation
• search effort occurs at word beginnings
W1 : p1p2p3
W2 : p1p3
W3 : p2p1
[Figure: lexicon prefix tree]
root
├─ p1
│  ├─ p2 ─ p3 : W1
│  └─ p3 : W2
└─ p2 ─ p1 : W3
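The three-word lexicon of this slide can be turned into a prefix tree with a few lines of Python. The nested-dictionary layout and the function names are choices made for this sketch, not a description of the decoder's internal data structure.

```python
def build_prefix_tree(lexicon):
    """lexicon maps word -> phoneme list; returns a nested-dict prefix tree."""
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node["children"].setdefault(p, {"children": {}, "words": []})
        node["words"].append(word)   # the word identity is attached at its last phoneme
    return root

def count_nodes(node):
    """Number of nodes in the tree, root included."""
    return 1 + sum(count_nodes(c) for c in node["children"].values())

lexicon = {"W1": ["p1", "p2", "p3"], "W2": ["p1", "p3"], "W3": ["p2", "p1"]}
tree = build_prefix_tree(lexicon)
```

W1 and W2 share the node for their common first phoneme p1, so the tree needs six phoneme nodes where a flat list of pronunciations holds seven phonemes.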
Search space
Phoneme lattice
Concatenation of lexical trees
[Figure: search space grown from the sentence beginning by concatenating copies of the lexicon tree (W1, W2, W3) after each word]
LM look-ahead
Word anticipation
– n is a lexicon node
– w_n is any leaf (i.e. word) of the sub-tree starting at n
• P(n / … w_{i-2} w_{i-1}) = Part_LM(n, w_{i-2} w_{i-1})
• Part_LM(n, w_{i-2} w_{i-1}) = max_{w_n} [P(w_n / w_{i-2} w_{i-1})]
Paths leading to improbable words are penalized early
[Figure: lexicon prefix tree with leaves W1, W2, W3]
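The Part_LM definition can be made concrete for one fixed history w_{i-2} w_{i-1}. In the sketch below a lexicon node is identified by its phoneme prefix, which is a simplification chosen for the example, and the word probabilities are invented.

```python
def lm_lookahead(lexicon, word_prob):
    """Part_LM for every lexicon node, for one fixed history.

    A node is identified by its phoneme prefix; its value is the maximum
    P(w / history) over all words whose pronunciation starts with that prefix.
    """
    part_lm = {}
    for word, phones in lexicon.items():
        for depth in range(len(phones) + 1):
            node = tuple(phones[:depth])
            part_lm[node] = max(part_lm.get(node, 0.0), word_prob[word])
    return part_lm

lexicon = {"W1": ["p1", "p2", "p3"], "W2": ["p1", "p3"], "W3": ["p2", "p1"]}
word_prob = {"W1": 0.6, "W2": 0.3, "W3": 0.1}   # hypothetical P(w / w_{i-2} w_{i-1})
part_lm = lm_lookahead(lexicon, word_prob)
```

A path entering the p2 branch at the root is scored with 0.1 straight away, long before the leaf W3 is reached, which is how improbable words get penalized early.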
Start-synchronous tree
Asynchronous search
– The search processes the same part (lexicon) with a different history.
With start-synchronous capabilities
– Most advanced path can be reused when encountered twice.
• For each frame x, the lexicon starting at x is stored.
• Only the deepest nodes (or leaves) are stored.
Principle (1/5)
[Figure: lexicon trees started at frame 0 and at frame t. Captions: « Deepest lexicon nodes at frame 0 », « Deepest lexicon nodes at frame t ». Word labels: W1, W2, W3]
Principle (2/5)
[Figure: animation step 2, lexicon trees at frame 0 and frame t; word labels W1, W3]
Principle (3/5)
[Figure: animation step 3, lexicon trees at frame 0 and frame t; word labels W1, W2, W3]
Principle (4/5)
[Figure: animation step 4, lexicon trees at frame 0 and frame t; word labels W1, W2, W3]
Principle (5/5) ….
[Figure: animation step 5, lexicon trees at frame 0, frame t and frame t+n; word labels W1, W2, W3]
Search space pruning
Optimization
– If two candidates end with the same 3 words, only the best is kept.
Cut
– Short candidates are dropped when they fall too far behind the deepest candidate.
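Both pruning rules can be sketched on a list of candidates. Several details are assumptions of this example rather than facts from the slides: a candidate is a (score, end_frame, words) tuple with higher scores being better, the distance to the deepest candidate is measured in frames, the end frame is included in the recombination key, and `max_lag` is a hypothetical threshold.

```python
def prune(candidates, max_lag):
    """Apply the two pruning rules to (score, end_frame, words) candidates."""
    if not candidates:
        return []
    # Optimization: keep only the best candidate per (end frame, last 3 words).
    best = {}
    for cand in candidates:
        score, end, words = cand
        key = (end, tuple(words[-3:]))
        if key not in best or score > best[key][0]:
            best[key] = cand
    survivors = list(best.values())
    # Cut: drop candidates lagging too far behind the deepest one.
    deepest = max(end for _, end, _ in survivors)
    return [c for c in survivors if deepest - c[1] <= max_lag]

# Hypothetical candidates.
candidates = [
    (-10.0, 50, ["a", "b", "c", "d"]),
    (-12.0, 50, ["x", "b", "c", "d"]),   # same last 3 words, worse score
    (-3.0, 10, ["a"]),                   # far behind the deepest candidate
    (-11.0, 45, ["a", "b", "e"]),
]
kept = prune(candidates, max_lag=20)
```

With these values the second candidate is merged into the first and the third is cut, leaving the first and fourth.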
ASR Output
– 1-best hypothesis
– N-best hypotheses
– word graph
Applications
– Transcription
– Question answering
– Named entity extraction
– Information Retrieval
– Call-type classification
– …
French Broadcast News Campaign
ESTER
[Figure: processing chain for a broadcast news show (1h long): Acoustic Segmentation, Speaker Segmentation, Speech transcription (using Acoustic models and Language models), Information Extraction]
System Description
Acoustic Models :
• 10k contextual HMMs
• 3.6k states
• 230k Gaussians
Lexicon : 65K words
Language model combination :
• (Le Monde 87-02, 0.41)
• (Le Monde 02-03, 0.24)
• (ESTER, 0.35)
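The three numbers sum to 1, which suggests they are mixture weights. Assuming a linear interpolation of the three component models (the slide does not name the combination scheme), the combined probability of a word could be computed as below; the component probabilities are invented for the example.

```python
def interpolate(component_probs, weights):
    """Linear interpolation: P(w/h) = sum over models k of lambda_k * P_k(w/h)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(weights[name] * component_probs[name] for name in weights)

# Weights taken from the slide; their use as interpolation weights is an assumption.
weights = {"Le Monde 87-02": 0.41, "Le Monde 02-03": 0.24, "ESTER": 0.35}
# Hypothetical probabilities of one word given one history under each model.
component_probs = {"Le Monde 87-02": 0.010, "Le Monde 02-03": 0.020, "ESTER": 0.040}
p = interpolate(component_probs, weights)
```

With these made-up inputs the combined probability is 0.41 × 0.010 + 0.24 × 0.020 + 0.35 × 0.040 = 0.0229.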
Results and Demonstration
WER: 25 % (10 × RT, i.e. ten times real time)
Demonstration on TV