
SPIRE-SST: An automatic web-based self-learning tool for syllable stress tutoring (SST) to the second language learners

Chiranjeevi Yarra^{1,a}, Anand P A^{2}, Kausthubha N K^{1}, Prasanta Kumar Ghosh^{1,b}

^1 Electrical Engineering, Indian Institute of Science (IISc), Bangalore-560012, India
^2 National Institute of Technology Karnataka (NITK), Surathkal-575025, India

{^a chiranjeeviy, ^b prasantg}@iisc.ac.in

Abstract

Correct stress placement on the syllables in a word or word group is important in spoken communication; incorrect syllable stress, typically produced by second language (L2) learners, can therefore result in miscommunication. In this demo, we present SPIRE-SST, a tool that teaches correct stress patterns in a self-learning manner. The proposed tool can thus also benefit learners who have no access to effective training methods. To this end, we design a front-end with self-explanatory instructions that can be easily followed by the user. Using the front-end, learners submit their audio to the back-end and view the corresponding feedback. In the back-end, we divide the learner's audio into syllable segments and detect each syllable as stressed or unstressed. Using these stress markings, we compute a score representing the stress quality in comparison with the ground-truth stress markings and send it to the front-end as feedback. We also send, as additional feedback, a set of three features obtained by comparing the expert's and the learner's audio, which we expect to be useful for correcting pronunciation errors.

1. Introduction

In language learning, localized pronunciation errors, typically made by second language (L2) learners, can be minimized with correct usage of syllable stress patterns [1]. Incorrect stress patterns often result in miscommunication [2]. Many works have been proposed to detect syllable stress automatically for the purpose of automatic language training [3]; however, the training tools developed from those algorithms are limited. For the benefit of L2 learning, we design SPIRE-SST^1, a tool that trains L2 learners in the correct usage of syllable stress in an automated way. The proposed tool automatically assesses the learner's stress patterns with respect to the ground-truth (referred to as the expert's) stress patterns and provides feedback. Moreover, it has been shown that such online tools benefit learners for whom effective training methods are not easily accessible [4]. In this demo, we present the SPIRE-SST tool. To the best of our knowledge, no similar online tool is available.

2. Proposed architecture

The architecture of the proposed web-based tool is shown in Figure 1. It has two major components – the front-end (user interface) and the back-end (web server). The front-end runs at the learner's location and the back-end is hosted at our location. The front-end and the back-end communicate over the Internet.

Figure 1: Architecture of the proposed web-based tool

The learner can access SPIRE-SST using electronic devices such as desktops, laptops, mobiles, and tablets. In addition, these devices need to be connected to a microphone for recording the learner's voice.

^1 https://spire.ee.iisc.ac.in/SPIRE-SST

Once a learner logs in, the microphone is controlled by SPIRE-SST according to the learner's input until he/she logs out. We discuss the front-end and the back-end in more detail in the following sub-sections.

2.1. Front-end

In order to train the learners, the proposed front-end provides the following three main functionalities – 1) submission of the learner's voice, 2) practice by listening to the expert, and 3) viewing the learner's performance/feedback. After the learner logs in, these three functionalities can be accessed by clicking the respective buttons – 1) Submit the recording, 2) Listen to expert, and 3) Know your performance.

2.1.1. Submit the recording

Figure 2a shows an exemplary screen that appears for the first function in SPIRE-SST. On this screen, we provide a stimulus to read and four buttons to control the interface – 1) Submit the recording, 2) Previous, 3) Next, and 4) a button with a microphone symbol. On clicking the microphone symbol, the learner can start recording his/her voice; at the same time, the microphone symbol is replaced with a stop symbol, as shown in the figure. On clicking the stop button, the recording is stopped and the stop symbol is replaced with the microphone symbol. After the recording is stopped, a play button appears below the stimulus, as shown in the figure. By clicking the play button, the recorded voice can be listened to, so the learner can verify his/her recording before submitting it for analysis. If desired, the voice can be re-recorded and listened to until the expected recording is achieved.

On clicking 'Submit the recording', the most recently recorded voice is sent to the back-end, and a score representing the stress quality is displayed once it is received from the back-end. The black dotted rectangular box in the figure encloses the window that displays the quality score. In the meantime, the 'Submit the recording' button is replaced with the buttons 'Listen to expert' and 'Know your performance'. The learner can then choose to view the detailed feedback from the functionalities associated with those two buttons, or move to the previous/next stimulus by clicking the 'Previous'/'Next' button.

2.1.2. Listen to expert

Figure 2b shows an exemplary screen that appears on clicking 'Listen to expert'. Using this, the learner can correct syllable stress errors in his/her pronunciation by following the pronunciation of the expert. On this screen, we provide a play button to listen to the expert's pronunciation and display the syllable transcriptions (ARPAbet format) of the expert's pronunciation with stress markings (the stressed syllable is in bold). In addition, we show three bar-graphs containing syllable-specific information. In the bar-graphs, we indicate the values of the following three parameters – 1) syllable duration, 2) average loudness in the syllable segment, and 3) peak loudness in the syllable segment. Each bar in a bar-graph indicates the parameter value for the expert's voice in one syllable. These parameters are typically used for computing the features for detecting syllable stress and have been shown to influence syllable stress significantly [1, 3]. Thus, we display these parameters during the practice.
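As an illustration only, the following is a minimal Python sketch of how these three syllable-level parameters could be computed from a waveform, assuming the syllable boundaries (in seconds) are already available from the force-alignment; RMS and peak amplitude are used as simple stand-ins for average and peak loudness, since the paper does not specify the exact loudness measure.

import numpy as np

def syllable_parameters(samples, sample_rate, syllable_boundaries):
    # Per-syllable parameters shown in the bar-graphs of Figure 2b.
    # `syllable_boundaries` is a list of (start_sec, end_sec) pairs from the
    # force-alignment; RMS and peak amplitude are illustrative proxies for
    # average and peak loudness.
    params = []
    for start_sec, end_sec in syllable_boundaries:
        seg = samples[int(start_sec * sample_rate):int(end_sec * sample_rate)]
        seg = np.asarray(seg, dtype=np.float64)
        params.append({
            "duration": end_sec - start_sec,                      # 1) syllable duration
            "avg_loudness": float(np.sqrt(np.mean(seg ** 2))),    # 2) average loudness (RMS)
            "peak_loudness": float(np.max(np.abs(seg))),          # 3) peak loudness
        })
    return params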

Figure 2: An illustration of the three main functionalities in SPIRE-SST

Displaying these parameters during practice helps learners to adapt their pronunciation according to the parameter values and achieve stress markings similar to those of the expert. Further, they can view the parameter values of their own pronunciation using the 'Know your performance' button.

2.1.3. Know your performance

Figure 2c illustrates the function associated with the 'Know your performance' button. On clicking it, the expert's bar-graph in Figure 2b (highlighted with a thick black rectangular box) is replaced with the learner's performance, as shown in Figure 2c. In the learner's performance, we display the syllables (transcriptions) in the learner's pronunciation, which are estimated using the force-alignment process. Among all the syllables, the one estimated as stressed is indicated in boldface. In addition, we display a color bar with red and green color blocks, whose length equals the number of syllables in the expert's pronunciation. In this color bar, each color block indicates the learner's performance on one syllable: red indicates a mismatch between the stress markings in the pronunciations of the learner and the expert, and green indicates no mismatch. Further, clicking on a color block shows three bar-graphs, which display the values of the three parameters for the respective syllable, as shown in the figure. In each bar-graph, we show the parameter values for every syllable from both the expert's and the learner's pronunciation. Using this information, we assume that learners can identify the mismatches with the expert's pronunciation and can train themselves towards an expert-like pronunciation.
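To make the feedback structure concrete, here is a hypothetical Python sketch of how the per-syllable feedback for this view could be assembled; the dictionary keys and the helper's inputs are illustrative assumptions, not the tool's actual data format.

def build_performance_feedback(expert_syllables, learner_syllables):
    # Each input is a list of per-syllable dicts with keys 'label' (ARPAbet
    # transcription), 'stressed' (bool) and 'parameters' (duration, average
    # loudness, peak loudness). Keys and structure are illustrative only.
    feedback = []
    for expert, learner in zip(expert_syllables, learner_syllables):
        stress_match = expert["stressed"] == learner["stressed"]
        feedback.append({
            "syllable": expert["label"],
            "color": "green" if stress_match else "red",  # color block in the color bar
            "expert": expert["parameters"],                # bar-graph values (expert)
            "learner": learner["parameters"],              # bar-graph values (learner)
        })
    return feedback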

2.2. Back-end

Given a learner's audio, in order to obtain the syllable transcriptions and their stress markings, we perform force-alignment on the audio using an automatic speech recognition (ASR) toolkit and estimate the phoneme transcriptions and their boundaries. From these transcriptions, we obtain the syllable transcriptions using automatic syllabification software, from which the syllable boundaries follow. Following this, for each syllable, we estimate the stress marking as well as a score representing the confidence in that estimate. Further, these stress markings are color-encoded in the color bar for each syllable: we use red when the stress marking in the learner's pronunciation does not match that in the expert's pronunciation, and green otherwise.
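Schematically, and only as a sketch, the back-end analysis of one recording could be organised as below; `force_align`, `syllabify_phonemes`, and `detect_stress` are hypothetical callables standing in for the ASR force-alignment, the automatic syllabification, and the stress-detection algorithm of [1]. They are passed in as arguments rather than defined here because their actual interfaces are not part of this description.

def analyse_recording(audio_path, transcript, expert_stress,
                      force_align, syllabify_phonemes, detect_stress):
    # 1) Force-align the audio against the stimulus text to obtain phoneme
    #    transcriptions and their time boundaries.
    phonemes, phoneme_boundaries = force_align(audio_path, transcript)

    # 2) Group phonemes into syllables; syllable boundaries follow directly
    #    from the phoneme boundaries.
    syllables, syllable_boundaries = syllabify_phonemes(phonemes, phoneme_boundaries)

    # 3) Estimate a stressed/unstressed marking and a confidence score for
    #    each syllable.
    stress_markings, confidences = detect_stress(audio_path, syllable_boundaries)

    # 4) Color-encode each syllable: red on a mismatch with the expert's
    #    stress marking, green otherwise.
    colors = ["green" if learner == expert else "red"
              for learner, expert in zip(stress_markings, expert_stress)]

    return syllables, stress_markings, confidences, colors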

2.2.1. Stress quality score computation

Further, for the entire audio, we compute a score representing the stress quality. For this, we consider the scores belonging to every syllable, obtained from the algorithm in [1]. Let $S_E(i)$ and $S_L(i)$ be the scores corresponding to the expert and the learner for the $i$-th syllable in a set of $N$ syllables in the expert's pronunciation. Here, we assume that $N$ as well as the syllable transcriptions are the same in both the expert's and the learner's pronunciation; in case of any mismatch, we indicate to the learner that the pronunciation is incorrect. Using the syllable-level scores, we compute the score for the entire stimulus as

\[ \frac{1}{N}\sum_{i=1}^{N}\left(1 - \frac{\min\left(S_E(i)-S_L(i),\,1\right)}{S_E(i)}\right), \]

such that 1 represents the highest quality and 0 the least.
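A direct Python implementation of this stimulus-level score, assuming the per-syllable term reads 1 − min(S_E(i) − S_L(i), 1)/S_E(i) as in the expression above and that the per-syllable confidence scores are available as equal-length lists, could be:

def stress_quality_score(expert_scores, learner_scores):
    # expert_scores[i] and learner_scores[i] are the per-syllable confidence
    # scores S_E(i) and S_L(i); 1 indicates the highest stress quality and
    # 0 the least, following the expression in Section 2.2.1.
    n = len(expert_scores)
    total = 0.0
    for s_e, s_l in zip(expert_scores, learner_scores):
        total += 1.0 - min(s_e - s_l, 1.0) / s_e
    return total / n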

3. Demonstration

In order to demonstrate SPIRE-SST, for the force-alignment process we use the Kaldi speech recognition toolkit [5], the P2TK syllabifier [6], and a lexicon containing pronunciations for each word. We use JavaScript and HTML for the front-end and Node.js for the back-end [7]. We set up the server using a LAMP (Linux, Apache, MySQL, PHP) stack on the Ubuntu 14.04 LTS operating system. At the back-end, we obtain the stress markings and the confidence scores for both the learner and the expert by following the work proposed by Yarra et al. [1], which we implement in the Python programming language. We consider a set of 204 stimuli taken from material used for spoken English training [2]. We divide the stimuli into four parts, which are made available as four lessons. We obtain the expert's audio by recording the stimuli from a voice-over artist proficient in British English spoken communication.
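As one purely illustrative way of bridging the Node.js back-end and the Python analysis component (the paper does not describe the actual integration), the Python side could expose a small command-line wrapper that the server spawns per submission and whose JSON output Node.js parses; the script arguments and output fields below are hypothetical.

import argparse
import json

def main():
    # Hypothetical CLI wrapper around the Python stress-analysis component.
    parser = argparse.ArgumentParser(
        description="SPIRE-SST stress analysis (illustrative wrapper)")
    parser.add_argument("--audio", required=True,
                        help="path to the learner's recording")
    parser.add_argument("--stimulus-id", required=True,
                        help="identifier of the practised stimulus")
    args = parser.parse_args()

    # In a real deployment this would run the force-alignment, syllabification
    # and stress detection, then compute the quality score; here we only emit
    # a placeholder result so the script runs end to end.
    result = {"stimulus_id": args.stimulus_id,
              "audio": args.audio,
              "score": None}
    print(json.dumps(result))  # Node.js reads this from stdout

if __name__ == "__main__":
    main()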

4. Conclusion

We present a web-based tool, named SPIRE-SST, that tutors L2 learners in syllable stress. We design the front-end of SPIRE-SST with JavaScript and HTML, and the back-end with Node.js and the Python programming language. Using this tool, we provide feedback by showing the learner's stress quality score with respect to the expert as an assessment measure, as well as the three parameters of the expert's and the learner's pronunciation as corrective measures. Further investigations are required to measure the effectiveness of the proposed tool and to analyze the sufficiency of the feedback parameters in the self-learning process.

5. Acknowledgement

We thank Pratiksha Trust for their support and Chetak R M of Maiyas Enterprises for his support in developing SPIRE-SST.

6. References

[1] C. Yarra, O. D. Deshmukh, and P. K. Ghosh, "Automatic detection of syllable stress using sonority based prominence features for pronunciation evaluation," IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5845–5849, 2017.

[2] J. D. O'Connor, Better English Pronunciation. Cambridge University Press, 1980.

[3] L. Ferrer, H. Bratt, C. Richey, H. Franco, V. Abrash, and K. Precoda, "Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems," Speech Communication, vol. 69, pp. 31–45, 2015.

[4] M. Kam, A. Kumar, S. Jain, A. Mathur, and J. Canny, "Improving literacy in rural India: Cellphone games in an after-school program," International Conference on Information and Communication Technologies and Development (ICTD), pp. 139–149, 2009.

[5] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz et al., "The Kaldi speech recognition toolkit," IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2011.

[6] J. Tauberer, "P2TK automated syllabifier," available at https://sourceforge.net/p/p2tk/code/HEAD/tree/python/syllabify/, last accessed on 10-03-2018.

[7] M. Cantelon, M. Harter, T. Holowaychuk, and N. Rajlich, Node.js in Action. Manning Publications, 2017.
