
Post on 20-Dec-2015




Developing SynctoLearn, an Automatic Video and Script Synchronization Tool, for Language Learners

Howard Chen and Berlin Chen

National Taiwan Normal University

Authentic Video and ESL Learners

Videos have been widely used in foreign language teaching for a long time.

Currently, there is increasing attention to authentic video clips since there are many interesting and exciting video clips available on the Internet.

These authentic videos should be very useful for foreign language learners. However, learners may still need some support while watching these authentic learning materials.

Input Modification in SLA

Some argue that the design of the pedagogical materials should be informed by theory such as the interactionist SLA theory, which suggests that input modification can help comprehension.

Based on our observations, without captions and other supporting devices, intermediate-level ESL learners often have great difficulty comprehending these videos because of the fast speech rate and the many unknown vocabulary items.

Captions as a Key Support for Learners

Although plenty of video clips are available, they often do not have captions because most of them are targeted at native speakers.

Captions for videos are commonly believed to be able to facilitate listening comprehension (Chapelle, 1997).

If captions can be added to some of these videos, students are more likely to understand the content and pick up new vocabulary items in the target language. However, adding captions manually is a time-consuming task.

A Recent Study Comparing Scripts and Captions

Grgurovic and Hegelheimer (2007) conducted a study comparing learners’ use of both scripts and captions.

They investigated whether subtitles or transcripts are more effective in providing modified input to learners.

A multimedia listening activity containing a video of an academic lecture was designed to offer help in the form of target language subtitles (captions) and lecture transcripts in cases of comprehension breakdowns.

The results indicate that participants interacted with the subtitles more frequently and for longer periods of time than with the transcript.

Examples of Online Video and Scripts

VOA news

CNN news

Students need to view the video and read the


An automatic video and script synchronization system called SynctoLearn

To help language teachers and students to make better use of a wide variety of authentic videos, we developed an automatic video and script synchronization system called SynctoLearn.

We used videos and scripts taken from the VOA (Voice of America) web site. The synchronization system was developed mainly with the help of speech recognition technologies. The system was first trained on VOA videos and scripts to build a triphone acoustic model of VOA news. The HTK (Hidden Markov Model Toolkit) from Cambridge University was then used to run the forced alignment procedure. Through this alignment procedure, we obtained time-stamped VOA videos.
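To make the idea of a time-stamped script concrete, here is a minimal Python sketch of reading word-level alignment output. It assumes the aligner writes one "start end word" line per word, with times in HTK's 100-nanosecond label units. The function name parse_alignment and the sample lines are hypothetical and are not taken from the SynctoLearn code.

```python
from typing import List, NamedTuple

HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in units of 100 ns


class TimedWord(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    word: str


def parse_alignment(lines: List[str]) -> List[TimedWord]:
    """Convert 'start end word' label lines into timed words (in seconds)."""
    words = []
    for line in lines:
        parts = line.split()
        # Skip blank lines, file headers, and end-of-file markers.
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        start, end, word = int(parts[0]), int(parts[1]), parts[2]
        words.append(TimedWord(start / HTK_UNITS_PER_SECOND,
                               end / HTK_UNITS_PER_SECOND, word))
    return words


# Hypothetical sample of aligner output for three words.
sample = [
    "0 3500000 THIS",
    "3500000 5000000 IS",
    "5000000 12000000 NEWS",
    ".",
]
timed = parse_alignment(sample)
```

Each TimedWord then carries the start and end time of one script word in seconds, which is the information a player needs to highlight or replay that word.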


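Forced Alignment

VOA Corpus: the wave audio extracted from the rm files by software

Speech feature vector sequence: extracted as MFCC features with the HCopy tool in HTK

VOA transcription

OOV Removal: before recognition, each news script must be preprocessed to filter out punctuation marks and OOV (Out Of Vocabulary) words

Tri-phone acoustic model: trained by 庭瑋 on the VOA corpus

Forced alignment: performed with the HVite tool in HTK

Transcription with time boundary

PS. HTK Toolkit: speech recognition software developed at Cambridge University

PS. MFCC (Mel-Frequency Cepstral Coefficient): a speech feature parameter commonly used in speech recognition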
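The script preprocessing step (removing punctuation and OOV words before alignment) can be sketched in Python as follows. This is only an illustration: the function name clean_script, the upper-casing of words, and the tiny vocabulary are assumptions, and the real system would use the pronunciation dictionary that accompanies its acoustic model.

```python
import string
from typing import List, Set


def clean_script(text: str, vocabulary: Set[str]) -> List[str]:
    """Strip punctuation, then drop words that are not in the vocabulary."""
    kept = []
    for token in text.split():
        # Remove leading/trailing punctuation; upper-casing is an assumption
        # about how the dictionary entries are written.
        word = token.strip(string.punctuation).upper()
        if word and word in vocabulary:
            kept.append(word)
    return kept


vocab = {"THE", "PRESIDENT", "SPOKE", "TODAY"}  # hypothetical dictionary
print(clean_script("The president spoke in Taipei today.", vocab))
# ['THE', 'PRESIDENT', 'SPOKE', 'TODAY']
```

Note that words filtered out this way receive no time stamp of their own, so a player would have to attach them to a neighbouring aligned word when displaying the full script.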

A SynctoLearn Server

With the help of this automatic synchronization engine, anyone can upload videos and scripts to a SynctoLearn system and obtain automatically captioned videos. In addition to VOA videos, we also uploaded many videos and scripts from CNN Student News to the server and found that the SynctoLearn system can synchronize them accurately.
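One simple way to turn word-level time stamps into on-screen captions is to group consecutive words into short cues. The Python sketch below does this with a fixed number of words per cue. The source does not describe how SynctoLearn segments its captions, so the grouping rule, the name build_cues, and the default of eight words are all assumptions.

```python
from typing import List, Tuple

TimedWord = Tuple[float, float, str]  # (start_seconds, end_seconds, word)
Cue = Tuple[float, float, str]        # (start_seconds, end_seconds, caption text)


def build_cues(words: List[TimedWord], max_words: int = 8) -> List[Cue]:
    """Group consecutive timed words into caption cues of at most max_words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(word for _, _, word in chunk)
        # A cue starts with its first word and ends with its last word.
        cues.append((chunk[0][0], chunk[-1][1], text))
    return cues
```

A production system would more likely break cues at sentence boundaries or pauses, but the fixed-size rule is enough to show how captions follow directly from the alignment.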

Video-viewing System with Automatic Captions

In addition to the core automatic synchronization engine, some other useful options for viewing videos are also provided. When students watch a video, the script is synchronized with the audio/video by default. Nevertheless, students can also choose to turn off the captions (synchronized text) and watch the video without them.

This option can encourage students not to rely on the scripts: once their listening abilities reach a higher level, they can turn off the captions. In addition, because the videos and scripts are time-stamped, students can click on any word in the script and the video will be (re)played from that specific word. This convenient playback function helps students quickly recover what they missed while viewing the video. These options might be useful for vocabulary learning and listening comprehension.
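The click-to-replay and caption-highlighting behaviour follows directly from the word-level time stamps. The Python sketch below shows the two lookups involved; the data layout and the function names seek_time and word_at are hypothetical and do not describe the actual SynctoLearn player.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (start_seconds, end_seconds, word): a hypothetical aligned script
Script = List[Tuple[float, float, str]]


def seek_time(script: Script, word_index: int) -> float:
    """Playback position to jump to when the word at word_index is clicked."""
    return script[word_index][0]


def word_at(script: Script, t: float) -> Optional[int]:
    """Index of the word being spoken at time t, or None during silence."""
    starts = [start for start, _, _ in script]
    i = bisect_right(starts, t) - 1  # last word starting at or before t
    if i >= 0 and t < script[i][1]:
        return i
    return None


script = [(0.0, 0.35, "THIS"), (0.35, 0.5, "IS"), (0.8, 1.5, "NEWS")]
print(seek_time(script, 2))  # 0.8
print(word_at(script, 0.4))  # 1
print(word_at(script, 0.6))  # None
```

Clicking a word maps to seek_time, while word_at lets the player highlight the current word as the video plays; the binary search keeps the lookup fast even for long scripts.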

Demo of SynctoLearn-VOA

Demo of SynctoLearn-CNN Student News

User Feedback on This System

Based on the survey results from two groups of ESL students who used this system for several months, we found that most students enjoyed watching the synchronized video clips generated by SynctoLearn.

Most students (85%) felt very satisfied or satisfied with this new tool. They felt more comfortable and confident with the support of this synchronization tool. In addition, students indicated that they particularly liked two options: hiding the captions, and replaying any video segment by clicking on a word in the script. With automated captions, students had more opportunities to learn new words and their pronunciations. They could also better understand the video content.

Suggestions for Improvement

However, there were some problems with this prototype system. Students suggested that the screen size and the quality of the VOA video could be improved; they expected to see a larger video with higher resolution. They also recommended that the display of captions be improved.

The Future Development

Based on these encouraging results of using SynctoLearn on VOA/CNN videos, we can further extend the learning content to other types of English videos and scripts and fix the problems identified by students. More and more video clips and scripts are available on the Internet, and these materials can be synchronized automatically with the same technologies. Similar synchronization technologies can also be adapted to process videos and texts in other languages. We expect that the automatic synchronization system can help more language learners improve their listening abilities and learn more vocabulary items.

Procedures of Preparing the Alignment

1. Download video clips and use Flash Video Encoder to convert them into flv format


2. Use audio converting software to extract audio (wav files) from video clips

Sample Rate: 16 kHz

Bits: 16 bits

Channel: Mono

Bitrate: 256.0 kb/sec
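The bitrate in these settings follows from the other three values: 16,000 samples per second × 16 bits × 1 channel = 256,000 bits per second. The short Python sketch below reproduces that arithmetic and uses the standard wave module to check that an extracted file matches the target format. The function names are hypothetical.

```python
import wave


def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def wav_matches_target(path: str) -> bool:
    """True if the WAV file is 16 kHz, 16-bit, mono."""
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == 16000
                and wav.getsampwidth() == 2  # 2 bytes = 16 bits
                and wav.getnchannels() == 1)


print(pcm_bitrate(16000, 16, 1) / 1000)  # 256.0 (kb/sec, with 1 kb = 1000 bits)
```

Running wav_matches_target on each extracted file before the next step is a cheap way to catch stereo or 44.1 kHz audio that slipped through the conversion.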


3. Use Adobe Audition to convert wav files into pcm files

Run Batch Processing

Add wav files

Change destination format

Sample Rate: 16000 Hz

Channels: Mono

Resolution: 16 bit

Run Batch
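For readers without Adobe Audition, the same conversion can be approximated in a few lines of Python. This sketch assumes that the "pcm file" expected by the server is headerless raw PCM, that is, the WAV sample data without its RIFF header. The source does not specify the exact format, and the function name wav_to_pcm is hypothetical.

```python
import wave


def wav_to_pcm(wav_path: str, pcm_path: str) -> int:
    """Write the raw sample data of a WAV file to pcm_path; return bytes written."""
    with wave.open(wav_path, "rb") as wav:
        # Refuse input that is not 16 kHz, 16-bit (2 bytes per sample), mono.
        fmt = (wav.getframerate(), wav.getsampwidth(), wav.getnchannels())
        if fmt != (16000, 2, 1):
            raise ValueError("expected 16 kHz, 16-bit, mono audio")
        frames = wav.readframes(wav.getnframes())
    with open(pcm_path, "wb") as pcm:
        pcm.write(frames)
    return len(frames)
```

Calling wav_to_pcm in a loop over a folder of wav files would play the role of the batch run in Audition.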

4. Upload the flv file, the subtitle text (txt file), and the pcm file onto the website (

The final outcome

Thanks for your attention

Questions and Discussions

Taiwan Normal University