
MITSUBISHI ELECTRIC RESEARCH LABORATORIES
http://www.merl.com

Automatic Evaluation of End-to-End Dialog Systems with Adequacy-Fluency Metrics

d’Haro, Luis Fernando; Banchs, Rafael; Hori, Chiori; Li, Haizhou

TR2018-195 March 14, 2019

Abstract

End-to-End dialog systems are gaining interest due to recent advances in deep neural networks and the availability of large human-human dialog corpora. However, despite being of fundamental importance for systematically improving the performance of this kind of system, automatic evaluation of the generated dialog utterances is still an unsolved problem. Indeed, most of the proposed objective metrics show low correlation with human evaluations. In this paper, we evaluate a two-dimensional evaluation metric that is designed to operate at the sentence level and that considers the syntactic and semantic information carried by the answers generated by an end-to-end dialog system with respect to a set of references. The proposed metric, when applied to outputs generated by the systems participating in track 2 of the DSTC-6 challenge, shows a higher correlation with human evaluations (up to 12.8% relative improvement at the system level) than the best of the alternative state-of-the-art automatic metrics currently available.

Special issue on DSTC6 in Computer Speech and Language

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Copyright © Mitsubishi Electric Research Laboratories, Inc., 2019
201 Broadway, Cambridge, Massachusetts 02139

