Applying the Pronunciation Lexicon Specification to ASR & TTS

Patrizio Bergallo

Monday, August 20, 2007
SpeechTEK ASTS - Advances in Text-to-Speech Processing


Agenda

• Loquendo Today

• Introduction to PLS
  – Reference Scenario
  – Pronunciation Lexicons
  – International Phonetic Alphabet

• Overview of PLS
  – How does TTS use PLS?
  – How does ASR use PLS?

• Examples of Use

• Latest Improvements


Loquendo Today

• Global company of the Telecom Italia group, leader in Europe and South America in the Speech Technologies market

• Company founded in 2001 from Telecom Italia Labs, benefiting from know-how gained from more than 30 years of research experience

• Complete set of Multilingual speech technologies on a wide spectrum of devices; 25 patents, 50 voices and 20 languages

• Full support for international standards (MRCPv1/v2, VoiceXML 2.0/2.1, CCXML, SSML, SRGS, SISR)

• Company ready for challenging future scenarios: Multimodality, Security

• 100 employees, and displayed strong growth throughout 2007

• HQ in Turin, Offices in US, Spain, Germany and France, and a Worldwide Network of Partners


Reference Scenario

• Many speech applications need to specify pronunciation for words and phrases
  – Surnames, locations, company names
  – Acronyms
  – Names in specific contexts (restaurants, sports, movie titles, etc.)
  – Foreign words, mixed languages

• Pronunciation is critical both for TTS and ASR
  – Improves reading of prompts by TTS
  – Improves ASR performance

• VoiceXML 2.0/2.1 applications are the reference scenario
  – Prompts are based on SSML 1.0 (or in future SSML 1.1)
  – Recognition grammars are based on SRGS 1.0


Pronunciation Lexicons

• Pronunciation Lexicon
  – A mapping between words (or short phrases), their written representations, and their pronunciations suitable for use by an ASR engine or a TTS engine

• Pronunciation lexicons are not only useful for voice browsers
  – They have also proven effective mechanisms to support accessibility for the differently able as well as greater usability for all users
  – They are used to good effect in screen readers and user agents supporting multimodal interfaces

• The W3C Pronunciation Lexicon Specification (PLS) Version 1.0 is designed to enable interoperable specification of pronunciation lexicons


Pronunciation Lexicon Specification

• W3C specification status
  – Second Last Call Working Draft (26 October, 2006)
  – Currently the Implementation Report Plan and the Disposition of Comments are under development (all public comments were addressed)
  – Candidate Recommendation expected 3Q07

[Figure: diagram legend - part of first version of the Speech Interface Framework (Larson, 2000); W3C Recommendation; W3C Last Call Working Draft]


International Phonetic Alphabet

• Pronunciation is represented by a phonetic alphabet
  – Standard phonetic alphabets
    • International Phonetic Alphabet (IPA)
  – Well-known phonetic alphabets
    • SAMPA - ASCII based (simple to write)
    • Pinyin (Mandarin Chinese), JEITA (Japanese), etc.
  – Proprietary phonetic alphabets

• International Phonetic Alphabet (IPA)
  – Created by the International Phonetic Association (active since 1896), a collaborative effort by all the major phoneticians around the world
  – Universally agreed system of notation for sounds of languages
  – Covers all languages
  – Requires Unicode to write it
  – Normatively referenced by PLS


Overview of PLS

• A PLS document is a container (<lexicon>) of several lexical entries (<lexeme>)

• Each lexical entry contains
  – One or more spellings (<grapheme>)
  – One or more pronunciations (<phoneme>) or substitutions (<alias>)

• Each PLS document is related to a single unique language (xml:lang)

• SSML 1.0 and SRGS 1.0 documents can reference one or more PLS documents

• Current version doesn’t include morphological, syntactic and semantic information associated with pronunciations
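As a rough illustration of the container structure described above, the sketch below walks a small PLS document with Python's standard xml.etree.ElementTree module and builds a grapheme-to-pronunciation table. The function name load_lexicon and the (kind, value) tuple layout are assumptions made for this example, not something PLS defines; the embedded lexicon is a trimmed illustration with one <phoneme> entry and one <alias> entry.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Small illustrative lexicon: one <phoneme> entry and one <alias> entry.
PLS_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Sepulveda</grapheme>
    <phoneme>səˈpʌlvɪdə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""


def load_lexicon(xml_text):
    """Return (language, table); table maps grapheme -> [(kind, value), ...].

    Hypothetical helper: entries are kept in document order, and every
    pronunciation of a lexeme is attached to each of its graphemes.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    language = root.get(f"{{{XML_NS}}}lang")  # one language per document
    table = {}
    for lexeme in root.findall(f"{{{PLS_NS}}}lexeme"):
        graphemes = []
        entries = []
        for child in lexeme:
            kind = child.tag.rsplit("}", 1)[-1]  # strip the namespace part
            text = (child.text or "").strip()
            if kind == "grapheme":
                graphemes.append(text)
            elif kind in ("phoneme", "alias"):
                entries.append((kind, text))
        for grapheme in graphemes:
            table.setdefault(grapheme, []).extend(entries)
    return language, table


language, lexicon = load_lexicon(PLS_DOC)
print(language)
for grapheme, entries in lexicon.items():
    print(grapheme, entries)
```

A flat dictionary is only one possible in-memory form; a real ASR or TTS engine would typically compile the lexicon into its own internal representation.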


PLS Example

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon2007@@@@/pls.xsd"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Sepulveda</grapheme>
    <phoneme>səˈpʌlvɪdə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>


How does TTS use PLS?

• SSML 1.0

<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" … xml:lang="en-US">
  <lexicon uri="http://www.example.com/SSMLexample.pls"/>
  The title of the movie is: "La vita è bella" (Life is beautiful),
  which is directed by Benigni.
</speak>

• PLS 1.0

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>La vita è bella</grapheme>
    <phoneme>ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Benigni</grapheme>
    <phoneme>bɛˈniːnji</phoneme>
  </lexeme>
</lexicon>
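One way to picture the effect of the lexicon on this prompt is as if each matching grapheme had been wrapped in an inline SSML <phoneme> element. The Python sketch below does exactly that with a regular expression. It is only an illustration under stated assumptions: the function name apply_lexicon is hypothetical, matching is case-sensitive with simple word-boundary checks, and real TTS engines apply the lexicon inside their own text-analysis stage rather than by rewriting markup.

```python
import re

# Grapheme -> IPA pairs copied from the PLS 1.0 lexicon above.
LEXICON = {
    "La vita è bella": "ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə",
    "Benigni": "bɛˈniːnji",
}


def apply_lexicon(text, lexicon):
    """Wrap every lexicon grapheme found in text in an SSML <phoneme> element."""
    if not lexicon:
        return text
    # Try longer graphemes first so multi-word entries win over shorter ones.
    keys = sorted(lexicon, key=len, reverse=True)
    alternatives = "|".join(re.escape(key) for key in keys)
    # Lookarounds keep a grapheme from matching inside a longer word.
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")

    def mark(match):
        grapheme = match.group(0)
        return f'<phoneme alphabet="ipa" ph="{lexicon[grapheme]}">{grapheme}</phoneme>'

    return pattern.sub(mark, text)


prompt = ('The title of the movie is: "La vita è bella" (Life is beautiful), '
          'which is directed by Benigni.')
marked = apply_lexicon(prompt, LEXICON)
print(marked)
```

The printed prompt carries the two lexicon pronunciations inline, while every other word is left to the engine's default pronunciation rules.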


How does ASR use PLS?

• SRGS 1.0

<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" … xml:lang="en-US" root="movies" mode="voice">
  <lexicon uri="http://www.example.com/SRGSexample.pls"/>
  <rule id="movies" scope="public">
    <one-of>
      <item>Terminator 2: Judgment Day</item>
      <item>Pluto's Judgement Day</item>
    </one-of>
  </rule>
</grammar>

• PLS 1.0

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>judgment</grapheme>
    <grapheme>judgement</grapheme>
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
  </lexeme>
</lexicon>


Examples of Use

• Multiple pronunciations for the same orthography

• Multiple orthographies

• Homophones

• Homographs

• Acronyms, Abbreviations, etc.


Multiple pronunciations for the same orthography

• Multiple pronunciations are represented by more than one <phoneme> or <alias> element

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>ˈnjuːtən</phoneme>
    <phoneme>ˈnuːtən</phoneme>
  </lexeme>
</lexicon>


Multiple orthographies

• Alternative textual representations for the same word or phrase are represented by more than one <grapheme> inside the same <lexeme>

• All the pronunciations given within the <lexeme> apply to each and every <grapheme> within the <lexeme>

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="ja">
  <lexeme>
    <grapheme>nihongo</grapheme>
    <grapheme>日本語</grapheme>
    <grapheme>にほんご</grapheme>
    <phoneme>ɲihoŋo</phoneme>
  </lexeme>
</lexicon>


Homophones

• Words with the same pronunciation but different meanings are represented as different lexemes

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>cede</grapheme>
    <phoneme>siːd</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>seed</grapheme>
    <phoneme>siːd</phoneme>
  </lexeme>
</lexicon>


Homographs (1/2)

• Words with the same spelling but pronounced in different ways are represented using the role attribute of the <lexeme> element

• This mechanism allows for the referencing of defined taxonomies of word classes (part of speech, meaning, etc.)

<lexicon version="1.0"
    xmlns:claws="http://www.example.com/claws7tags"
    alphabet="x-myorganization-pinyin" xml:lang="zh-CN">
  <lexeme role="claws:VV0"> <!-- base form of lexical verb -->
    <grapheme>处</grapheme>
    <phoneme>chu3</phoneme> <!-- pinyin string is: "chǔ" in 处罚 处置 -->
  </lexeme>
  <lexeme role="claws:NN"> <!-- common noun, neutral for number -->
    <grapheme>处</grapheme>
    <phoneme>chu4</phoneme> <!-- pinyin string is: "chù" in 处所 妙处 -->
  </lexeme>
</lexicon>


Homographs (2/2)

<speak version="1.1"
    xmlns:claws="http://www.example.com/claws7tags"
    xml:lang="zh-CN">
  <lexicon uri="http://www.example.com/lexicon.pls"
      type="application/pls+xml" xml:id="mylex"/>
  <lookup ref="mylex">
    他这个人很不好相<w role="claws:VV0">处</w>。
    此<w role="claws:NN">处</w>不准照相。
  </lookup>
</speak>

• SSML 1.1 will support the role attribute

• Currently PLS doesn’t define/mandate any taxonomy

• PLS generally defines role values as qualified names (QNames)
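The role-based disambiguation above can be pictured as a filtered lookup. The Python sketch below is a minimal illustration, assuming that role values are compared as literal strings (so the lexicon and the SSML document must use the same prefix) and that a lookup with no matching role falls back to all pronunciations of the grapheme; that fallback is an assumption of this example, and the function name pronounce is hypothetical.

```python
# (grapheme, role, pronunciation) triples taken from the homograph lexicon
# in Homographs (1/2); role holds the raw attribute value.
ENTRIES = [
    ("处", "claws:VV0", "chu3"),  # base form of lexical verb
    ("处", "claws:NN", "chu4"),   # common noun
]


def pronounce(grapheme, role=None):
    """Return the pronunciations of grapheme, narrowed by role when possible."""
    candidates = [entry for entry in ENTRIES if entry[0] == grapheme]
    if role is not None:
        # A role attribute may list several names separated by white space.
        matching = [entry for entry in candidates if role in entry[1].split()]
        if matching:
            candidates = matching
    return [pronunciation for _, _, pronunciation in candidates]


print(pronounce("处", "claws:VV0"))
print(pronounce("处", "claws:NN"))
print(pronounce("处"))
```

A fuller implementation would resolve each prefix to its namespace URI before comparing, since two documents may bind different prefixes to the same taxonomy.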


Acronyms, Abbreviations, etc.

• Pronunciations expressed as a sequence of other orthographies (acronyms, abbreviations, etc.) are represented by the <alias> element

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
  <lexeme>
    <grapheme>101</grapheme>
    <alias>one hundred and one</alias>
  </lexeme>
</lexicon>


Latest Improvements

• The W3C Last Call Working Draft stage allows public comments to be addressed
  – Large majority were clarifications
  – New functionalities were deferred to a future version of the PLS specification

• Major clarifications were about
  – <alias> recursion
  – Multiple pronunciations

• Changes are subject to formal approval by the Working Group

• Next Steps
  – PLS 1.0 is very close to Candidate Recommendation stage
  – SSML 1.1 will provide more complete support for PLS 1.0


<alias> recursion

• Pronunciations of the <alias> element contents MUST be generated by the processor, using pronunciations described by the <phoneme> element of any constituent graphemes in the PLS document, and without invoking recursive access to the PLS document on the <alias> elements of any constituent graphemes

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>GNU</grapheme>
    <alias>GNU is Not Unix</alias>
    <phoneme>gəˈnuː</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Unix</grapheme>
    <grapheme>UNIX</grapheme>
    <alias>a multiplexed information and computing service</alias>
    <phoneme>ˈjuːnɪks</phoneme>
  </lexeme>
</lexicon>

GNU is pronounced: gəˈnuː is Not ˈjuːnɪks
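The non-recursive rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the alias is split on white space and that words without a <phoneme> entry are passed through unchanged for the engine's default pronunciation; the dictionary layout and the function name expand_alias are hypothetical.

```python
# Entries copied from the GNU/Unix lexicon above.
LEXICON = {
    "GNU": {"alias": "GNU is Not Unix", "phoneme": "gəˈnuː"},
    "Unix": {"alias": "a multiplexed information and computing service",
             "phoneme": "ˈjuːnɪks"},
    "UNIX": {"alias": "a multiplexed information and computing service",
             "phoneme": "ˈjuːnɪks"},
}


def expand_alias(grapheme):
    """Expand the <alias> of grapheme exactly once, without recursion."""
    words = LEXICON[grapheme]["alias"].split()
    expanded = []
    for word in words:
        entry = LEXICON.get(word)
        if entry is not None and "phoneme" in entry:
            # Use the constituent's <phoneme>; its own <alias> is never followed.
            expanded.append(entry["phoneme"])
        else:
            # No lexicon pronunciation: leave the orthography for the engine.
            expanded.append(word)
    return " ".join(expanded)


print(expand_alias("GNU"))  # prints: gəˈnuː is Not ˈjuːnɪks
```

Because the alias of a constituent is never consulted, the self-referential entry for GNU cannot cause an infinite expansion.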


Multiple pronunciations (1/2)

• ASR
  – If more than one pronunciation for a given <lexeme> is specified, an ASR processor MUST consider each of them as valid pronunciations for the <grapheme>

• TTS
  – If more than one pronunciation for a given <lexeme> is specified, a TTS processor MUST use the first one in document order that has the prefer attribute set to "true"
  – If none of the pronunciations has prefer set to "true", the TTS processor MUST use the first one in document order unless the TTS processor is documented as having a method of selecting pronunciations, in which case the processor MUST use any one of the pronunciations
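The selection rules above can be sketched as two small Python functions. This is an illustration only: the (kind, value, prefer) tuple layout, the function names, and the optional selector callback (standing in for a processor's documented selection method) are assumptions of this example. The sample data mirrors a lexeme for "lead" with an <alias> and a <phoneme> that both set prefer to "true".

```python
# Pronunciations of one lexeme in document order: (kind, value, prefer).
LEAD = [
    ("alias", "led", True),
    ("phoneme", "liːd", True),
]


def asr_pronunciations(entries):
    """ASR: every listed pronunciation is a valid way to say the grapheme."""
    return [(kind, value) for kind, value, _ in entries]


def tts_pronunciation(entries, selector=None):
    """TTS: first entry with prefer set, else selector's choice, else first."""
    for kind, value, prefer in entries:
        if prefer:
            return kind, value
    if selector is not None:
        # Stand-in for a documented, processor-specific selection method.
        return selector(entries)
    kind, value, _ = entries[0]
    return kind, value


print(asr_pronunciations(LEAD))
print(tts_pronunciation(LEAD))
```

With this data the ASR function returns both entries, while the TTS function returns the alias, because it is the first entry in document order with prefer set.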


Multiple pronunciations (2/2)

• An ASR processor will recognize both pronunciations, whereas a TTS processor will only use the first one (because it is the first in document order that has prefer set to "true").

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0" … alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>lead</grapheme>
    <alias prefer="true">led</alias>
    <phoneme prefer="true">liːd</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>led</grapheme>
    <phoneme>led</phoneme>
  </lexeme>
</lexicon>


References

• PLS 1.0 Second Last Call Working Draft (26 October, 2006)
  – http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/

• Voice Browser Activity Page (VoiceXML, SSML, SRGS, …)
  – http://www.w3.org/Voice/

• International Phonetic Association
  – http://www.arts.gla.ac.uk/IPA/

• VoiceXML Forum
  – http://www.voicexml.org/


Final Remarks

THANK YOU

• For more information please
  – Visit Loquendo’s booth #509
  – Keep an eye on: www.loquendo.com
  – Contact us: [email protected]

