Turtlebot Poster_Summer 2016

Implementing Task-Oriented Dialogues on Turtlebot 2
Mahima Ghale, Caitlin Coggins, Rebecca Kim, Raeesa Mehjabeen

Interactive Computing Research Lab
Mount Holyoke College, Department of Computer Science

Professor Heather Pon-Barry

Abstract

Turtlebot 2 is a service robot that should be able to perform tasks for its users. The goal for this summer was to enable it to deliver items or guide a visitor to a room. To make this possible, the main focus was on behavior and speech recognition, which allows users to ask the TurtleBot for help rather than typing instructions into a computer.

Figure 1. Turtlebot 2, named Navi, in the Interactive Computing Research Lab (ICRL)

Text to Speech (TTS)

Text to speech (TTS) is a speech synthesizer that converts text input into speech output. Google TTS was used because its voice output flows smoothly and sounds the most human-like of all the systems tried during this summer's research.
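The poster does not say which client library was used to reach Google TTS; a minimal sketch with the gTTS Python package (an assumption, not necessarily the team's setup) shows what the synthesis step looks like:

```python
# Minimal sketch of synthesizing speech with Google TTS via the gTTS
# package. Assumes gTTS is installed and an MP3 player is available;
# this illustrates the idea, not the exact pipeline used on Navi.
from gtts import gTTS
import subprocess

def say(text: str, out_path: str = "utterance.mp3") -> None:
    """Convert `text` to speech with Google TTS and play it aloud."""
    tts = gTTS(text=text, lang="en")   # request synthesis from Google
    tts.save(out_path)                 # write the returned MP3 to disk
    # mpg123 is one common command-line MP3 player on Ubuntu/ROS machines.
    subprocess.run(["mpg123", "-q", out_path], check=True)

if __name__ == "__main__":
    say("Hello, I am Navi. Which room would you like to go to?")
```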

Figure 4. The process of running Google TTS on the Turtlebot

Future Work

Future work for this research involves improving speech recognition by using acoustic modeling in Pocketsphinx or switching to Kaldi, and improving input audio quality, for example by placing the Kinect on top of the Turtlebot. Dialogues can be made more natural by finding ways to signal to the user (with LEDs, a beep, etc.) when the Turtlebot is ready to listen, and by using mixed-initiative interaction and varying the patterns in the dialogue. Localization and navigation will need to be refined by customizing the SLAM algorithm so that the Turtlebot can recover from sudden obstacles quickly and efficiently.
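As one concrete route to the "ready to listen" signal, the Kobuki base under Turtlebot 2 exposes LED topics. A hedged sketch, using the standard kobuki_msgs interface rather than anything described on the poster:

```python
# Hedged sketch: turn a base LED green while Navi is listening.
# The topic and message come from the standard Kobuki stack shipped
# with Turtlebot 2; wiring it up this way is our assumption.
import rospy
from kobuki_msgs.msg import Led

def signal_listening(pub: rospy.Publisher, listening: bool) -> None:
    """Turn LED 1 green while listening, off otherwise."""
    led = Led()
    led.value = Led.GREEN if listening else Led.BLACK
    pub.publish(led)

if __name__ == "__main__":
    rospy.init_node("listening_indicator")
    pub = rospy.Publisher("/mobile_base/commands/led1", Led, queue_size=1)
    rospy.sleep(0.5)          # give the publisher time to connect
    signal_listening(pub, True)
```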



Navigation, Mapping, and Localization

For Navi to be able to go to specific rooms, it must create a map (mapping), be able to read the map, keep track of its position in the map (localization), and calculate a path to the desired destination (navigation). For this purpose, we used a ROS package called turtlebot_navigation, which implements the SLAM (Simultaneous Localization and Mapping) algorithm.

Figure 5. The map of the Interactive Computing Research Lab (ICRL), created by driving Navi around through teleoperation, using the keyboard to control where it moves.
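While SLAM builds the map, a human drives the robot; each keystroke in a teleoperation tool ultimately becomes a geometry_msgs/Twist velocity command. A minimal sketch of that command stream (the topic name is the usual Turtlebot 2 teleop input, an assumption rather than the team's exact configuration):

```python
# Minimal sketch of the velocity commands behind keyboard teleoperation.
# The topic name is the conventional Turtlebot 2 teleop input; treat it
# as an assumption, not the team's documented setup.
import rospy
from geometry_msgs.msg import Twist

rospy.init_node("simple_teleop_step")
pub = rospy.Publisher("/cmd_vel_mux/input/teleop", Twist, queue_size=1)

cmd = Twist()
cmd.linear.x = 0.2    # drive forward at 0.2 m/s
cmd.angular.z = 0.0   # no rotation

rate = rospy.Rate(10)  # the base expects a steady stream of commands
while not rospy.is_shutdown():
    pub.publish(cmd)
    rate.sleep()
```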

The Kinect's 3D sensors detect walls and anything else they register as an obstacle, and these detections are saved as a map. During the research, several places inside the lab were marked with room numbers for convenience. Given a map of the environment and Navi's initial position, the turtlebot_navigation package calculates a path to reach the desired destination.
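turtlebot_navigation plans through the standard move_base action server, so sending Navi to a saved room can be sketched as an actionlib goal. The coordinates below are placeholders, not actual ICRL room locations:

```python
# Sketch: ask move_base (the planner used by turtlebot_navigation) to
# drive to a saved room location. Coordinates are hypothetical.
import actionlib
import rospy
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

rospy.init_node("go_to_room")
client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
client.wait_for_server()

goal = MoveBaseGoal()
goal.target_pose.header.frame_id = "map"     # pose is given in the map frame
goal.target_pose.header.stamp = rospy.Time.now()
goal.target_pose.pose.position.x = 3.2       # hypothetical room location
goal.target_pose.pose.position.y = -1.5
goal.target_pose.pose.orientation.w = 1.0    # face along the map x-axis

client.send_goal(goal)
client.wait_for_result()
```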

Dialogues

A task-oriented dialogue (conversation) was developed based on the information Navi requires in order to perform a task. To make the conversation as unconstrained as possible, meaning that users do not have to follow an exact script to converse with Navi, word spotting and regular expressions (regex) were adopted. A regex, which can find a particular word or group of words in a phrase or sentence, enables Navi to understand the user as long as a keyword is found in the user's utterance. This allows the user to answer Navi's questions freely, without having to follow a dialogue script. The conversation was converted into a Turtlebot-readable format using GraphML, an XML representation of a graph containing nodes and edges.

Figure 2. A part of the GraphML from the Turtlebot's dialogue. The yellow boxes are nodes (the Turtlebot's speech) and the thin arrows with text labels are edges (keywords from the user's speech).
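A sketch of the word-spotting idea over such a graph: load the dialogue from GraphML and follow whichever outgoing edge's keyword appears in the user's utterance. Reading the file with networkx and storing keywords in a "keywords" edge attribute are illustrative assumptions; the poster does not name the team's parser or attribute names.

```python
# Sketch of keyword spotting over a GraphML dialogue graph. Assumes the
# GraphML declares a directed graph and that each edge carries a
# "keywords" attribute such as "deliver|delivery|bring" (our convention).
import re
import networkx as nx

graph = nx.read_graphml("dialogue.graphml")  # nodes: Navi's speech; edges: keywords

def next_node(graph: nx.DiGraph, current: str, utterance: str):
    """Return the successor whose edge keyword appears in the utterance."""
    for _, successor, data in graph.out_edges(current, data=True):
        pattern = r"\b(?:" + data["keywords"] + r")\b"
        if re.search(pattern, utterance, re.IGNORECASE):
            return successor
    return None  # no keyword spotted; Navi could re-ask the question

# Example: "Could you bring this to room 205?" would follow a
# "deliver|delivery|bring" edge out of the current node.
```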

[System diagram: ASR, TTS, Task Graph]



Automatic Speech Recognition (ASR)

Automatic Speech Recognition (ASR) is the process by which a computer translates a person's speech into text. Several ASR engines were tried, including Rospeex, Pocketsphinx, and Google ASR.

Rospeex is a Robot Operating System (ROS) package. While simple to install and use, Rospeex provided the worst recognition of all the ASR systems tested. The package is closed source, so there is no way to improve its recognition.

Pocketsphinx is an open-source, speaker-independent, continuous speech recognition engine. Although more challenging to install and use, Pocketsphinx has much better recognition quality than Rospeex. Users can fine-tune Pocketsphinx by creating a new dictionary, which lists the pronunciations of the words the TurtleBot can recognize. A grammar also makes it easier to determine which words from the dictionary were spoken.

Google ASR is a closed-source, online ASR system that converts audio to text. It returns several candidate transcripts that may correspond to the audio input, along with a confidence level.
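The poster does not say how Google ASR was called; one common route from Python is the SpeechRecognition package, whose recognize_google call can return all candidate transcripts with a confidence score. A hedged sketch, not necessarily the team's client:

```python
# Hedged sketch of getting n-best transcripts plus confidence from
# Google ASR via the SpeechRecognition package; one common way to call
# the service, not necessarily the one used on Navi.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:          # e.g. the Kinect microphone array
    audio = recognizer.listen(source)

result = recognizer.recognize_google(audio, show_all=True)
# show_all=True returns Google's raw response: a dict whose "alternative"
# list holds candidate transcripts (or an empty list if nothing was heard).
alternatives = result["alternative"] if isinstance(result, dict) else []
for alt in alternatives:
    print(alt.get("confidence", "n/a"), alt["transcript"])
```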

Figure 3. Several scripts are needed to run Pocketsphinx.
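For the dictionary and language-model tuning described above, a sketch using the pocketsphinx Python bindings is shown below; the file names are placeholders, and the team's actual scripts (Figure 3) may be organized quite differently:

```python
# Sketch: continuous recognition with Pocketsphinx, pointing the decoder
# at a custom dictionary (word pronunciations) and language model.
# File names are placeholders, not the team's actual files.
from pocketsphinx import LiveSpeech

speech = LiveSpeech(
    lm="navi.lm",        # language model over the phrases Navi expects
    dic="navi.dict",     # pronunciations of the words Navi can recognize
)
for phrase in speech:    # yields one hypothesis per detected utterance
    print(phrase)
```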

Kinect

Kinect is a Microsoft sensor add-on for the Xbox gaming console. It consists of a microphone array, 3D depth sensors, and an RGB camera.

Figure 5. Kinect, with its labeled parts, used for ASR as well as for mapping and navigation


Acknowledgements

We would like to thank Professor Heather Pon-Barry for providing us with the opportunity to work on this project, the Clare Boothe Luce Fund and the Mount Holyoke LYNK Fund for providing the necessary funding, and the Computer Science Department for constant help and support. We would also like to thank Joydeep and his team in the AMRL at the University of Massachusetts for helping us set up the Turtlebot.