An Ensemble Dialogue System for Facts-Based Sentence Generation
Ryota Tanaka, Akihide Ozeki, Shugo Kato, Akinobu Lee
Graduate School of Engineering, Nagoya Institute of Technology, Japan
Track 2 Oral Session: Sentence Generation
Introduction
- Neural network-based dialogue models have several problems:
  - Uninformative responses such as "I don't know"
  - Responses inconsistent with real-world facts
- Combining multiple facts-based models allows responses to be more informative and diverse.
Pre-processing of Facts
1. Categorize facts into 2 types based on HTML tags
   - Subject Facts: enclosed by <h> or <title> tags
   - Description Facts: enclosed by <p> tags, or not enclosed by any tags
2. Select facts highly related to the context
Example (fact categorization):
  Facts:
    1  <title> Justice League </title>
    2  From Wikipedia, the free encyclopedia
    3  For other uses, see Justice League
    4  <h1> Story </h1>
    ...
    N  <p> The film was ... </p>
  Subject Facts:
    1  Justice League
    2  Story
    ...
    K' Cast
  Description Facts:
    1  From Wikipedia, the free encyclopedia
    2  For other uses, see Justice League
    ...
    L' The film was ...
Pre-processing of Facts (cont.)
2. Select facts highly related to the context
   - Select 10 facts using the cosine similarity between the word2vec representations of each fact and the context (a minimal sketch follows the example below)
Example (fact selection):
  Context: "Justice League filming wraps in London."
  Subject Facts:
    1  Justice League
    2  Superhero
    ...
    K  Spiderman
  Description Facts:
    1  Justice League is one of the most ...
    2  Justice League is a 2017 American superhero ...
    ...
    L  The film was ...
  The facts most similar to the context (by cosine similarity) are selected.
  * This study uses K = L = 10
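A minimal sketch of this selection step, assuming averaged word2vec vectors per sentence (the helper names and the dict-style word-vector lookup are illustrative, not the authors' code):

    import numpy as np

    def embed(tokens, w2v, dim=300):
        """Average the word2vec vectors of the known tokens (zero vector if none are known)."""
        vecs = [w2v[t] for t in tokens if t in w2v]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b) / denom if denom else 0.0

    def select_facts(context, facts, w2v, k=10):
        """Keep the k facts most similar to the context (K = L = 10 in this study)."""
        c = embed(context.split(), w2v)
        ranked = sorted(facts, key=lambda f: cosine(c, embed(f.split(), w2v)), reverse=True)
        return ranked[:k]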
Ensemble Dialogue System
- Dialogue system combining 3 proposed modules:
  - Memory-augmented HRED (MHRED)
  - Sentence selection module with Facts Retrieval (FR)
  - Reranker
- Select the final response by feeding all the candidates to the Reranker
[System diagram: the MHRED generates candidates from the dialogue data and facts, the FR retrieves candidates from a facts DB built from the subject and description facts, and the Reranker selects the final response.]
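A rough sketch of how the three modules fit together at inference time (the module interfaces below are assumptions made for illustration, not the authors' code):

    def respond(context, facts, mhred, fr, reranker):
        """Gather candidates from the generator and the retriever, then let the reranker choose."""
        candidates = []
        candidates += mhred.generate(context, facts)   # generation-based candidates
        candidates += fr.retrieve(context, facts)      # retrieval-based candidates (up to 10)
        # The reranker scores every candidate; the highest-scoring one becomes the final response.
        return max(candidates, key=lambda cand: reranker.score(context, cand))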
Memory-augmented HRED (MHRED)
- Generate a response conditioned on the previous context and facts
- MHRED = HRED [Serban et al., 15] + MemN2N [Sukhbaatar et al., 15]
[Model diagram: the MHRED is composed of a Hierarchical Recurrent Encoder (HRE), a Facts Encoder, and a Decoder.]
Facts Encoder
- Select facts to be injected into responses
- Map facts to a continuous representation
[Facts Encoder diagram: inputs are the paragraph, sub-header, header, and title facts, together with the final hidden state of the HRE; a sketch follows below.]
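A single-hop, numpy-only sketch of a MemN2N-style Facts Encoder; using the HRE final hidden state as the attention query and the projection names A and C follow the standard MemN2N formulation and are assumptions, not the authors' exact model:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def facts_encoder(hre_state, fact_embeddings, A, C):
        """
        hre_state       : final hidden state of the HRE (context summary), shape (d,)
        fact_embeddings : one vector per selected fact, shape (n_facts, d)
        A, C            : input / output memory projections, shape (d, d)
        Returns a continuous facts representation that conditions the decoder.
        """
        keys = fact_embeddings @ A                  # input memory
        values = fact_embeddings @ C                # output memory
        attention = softmax(keys @ hre_state)       # relevance of each fact to the context
        return values.T @ attention                 # weighted sum over fact memories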
Sentence Selection with Facts Retrieval (FR)
1. Construct DB
2. Response Selection

[DB construction example: the context "Justice League filming wraps in London." and its response "I think it is better than Spiderman." are stored in the DB as a <Query, Response> pair.]
Sentence Selection with Facts Retrieval (FR) (cont.)
2. Response Selection
   - Search the DB entries with words shared by the subject facts and the stored responses (duplicate words)
   - Output up to 10 responses (a word-overlap sketch follows the example below)

[Response selection example: subject facts (1 Justice League, 2 Superhero, ..., K Spiderman) give the duplicate words "Spiderman", "Marvel", and "Justice League"; searching the entries hits responses such as "I think it is better than Spiderman.", "I love Marvel movie.", and "Justice League is directed by Snyder."]
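A word-overlap sketch of the retrieval step; treating "duplicate words" as tokens shared between the subject facts and a stored response is an assumption for illustration:

    def retrieve(subject_facts, qr_db, max_responses=10):
        """
        subject_facts : subject-fact strings, e.g. ["Justice League", "Superhero", ...]
        qr_db         : list of (query, response) pairs built from the dialogue data
        Returns up to 10 stored responses that share a word with the subject facts.
        """
        fact_words = {w.lower() for fact in subject_facts for w in fact.split()}
        hits = []
        for _, response in qr_db:
            if fact_words & {w.lower() for w in response.split()}:   # "Hit!!"
                hits.append(response)
            if len(hits) == max_responses:
                break
        return hits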
Reranker (1/2)
- The Reranker sorts the candidates by taking all the outputs of the MHRED and FR, then selects the best response
- Classify whether a candidate is a "positive" or "negative" response using XGBoost
Reranker (2/2)
- Dataset (created by ourselves from the distributed dataset)
  - Positive examples (44,449 pairs): context-response pairs selected with a high response score in the dialogue dataset
  - Negative examples (44,449 pairs): context-response pairs generated from the positive examples by randomly changing sentence length, order, or topic
- Features (a toy sketch follows below)
  - Candidate: length, fluency, etc.
  - Last utterance - candidate pair: word similarity, n-gram similarity, etc.
  - Context - candidate pair: topic similarity
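A toy sketch of the reranker using xgboost's scikit-learn-style API; the three features below stand in for the full feature set (fluency and topic similarity are omitted), and the training arrays are assumed to be built from the positive/negative pairs described above:

    import xgboost as xgb

    def features(context, last_utterance, candidate):
        """Simplified candidate / last-utterance / context features."""
        cand = set(candidate.split())
        last = set(last_utterance.split())
        ctx = set(context.split())
        return [
            len(candidate.split()),                 # candidate length
            len(cand & last) / max(len(last), 1),   # word overlap with the last utterance
            len(cand & ctx) / max(len(cand), 1),    # crude context / topic overlap
        ]

    clf = xgb.XGBClassifier(n_estimators=100, max_depth=4)
    # clf.fit(X_train, y_train)  # X_train / y_train: features and 1/0 labels of the positive/negative pairs

    def rerank(context, last_utterance, candidates):
        """Sort candidates by the predicted probability of being a 'positive' response."""
        feats = [features(context, last_utterance, c) for c in candidates]
        scores = clf.predict_proba(feats)[:, 1]
        return [c for _, c in sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)]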
Experiments
- DSTC7 Dataset
  - Dialogue: Reddit (2011/01-2017/11); Train: 832,908 dialogs, Dev: 40,932 dialogs, Test: 13,440 dialogs
  - Facts: articles extracted from web sites such as Wikipedia
- Evaluation Metrics
  - Automatic
    - NIST, BLEU, METEOR: word-overlap metrics
    - div [Li et al., 15]: diversity metric
  - Human (5-point Likert scale)
    - Appropriateness: conversationally appropriate and relevant to the previous turns
    - Informativeness: informative response that is relevant to the user input and has potential utility
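The div metric of [Li et al., 15] is usually reported as distinct-n, the number of unique n-grams divided by the total number of generated n-grams; a minimal version:

    def distinct_n(responses, n=1):
        """distinct-n over a list of generated responses."""
        ngrams = [tuple(tokens[i:i + n])
                  for response in responses
                  for tokens in [response.split()]
                  for i in range(len(tokens) - n + 1)]
        return len(set(ngrams)) / len(ngrams) if ngrams else 0.0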
(Submitted) Models for Comparison
- S2S: Seq2seq [Vinyals et al., 15]
- HRED: HRED [Serban et al., 16]
- HRED-F
- MHRED-F
- MHRED-F5-R
- MHRED-F15-R
- Ensemble: proposed model

F: Facts Penalty, R: Reranker, 5 / 15: beam width
Automatic Evaluation
- Ensemble performs better than the other models
  - Combining multiple systems is effective
- MHRED-F performs notably well on the diversity score
  - It generates diverse responses on the new domain
- Introducing the Reranker improves the word-overlap scores
  - The Reranker tends to select natural responses close to human ones
Human Evaluation
- Official baselines
  - Baseline (constant): only "I don't know what you mean.", a valid response to any context
  - Baseline (random): random sampling from the dialogue data; responses are usually highly fluent because they are human-made
- Our model beats the baseline models
Examples (MHRED)
Context:
  UserA: TIL Seinfeld wasn't actually a "show about nothing," it was a show about how comedians get their material.
  UserB: They never really had an episode where someone read a book to themselves for 22 minutes. That would be a show "about nothing".
Responses from the compared models:
  - that is that ?
  - And then had one episode and it it that was it was the show
  - Georges show up in episode aired in a show (Ensemble (MHRED))
Examples (FR)
Context:
  UserA: There is enough water in Lake Superior to flood the entire landmasses of North and South America to a depth of 1 foot. It contains over 3 quadrillion gallons of fresh water.
  UserB: The world's largest by volume and deepest lake is located in southern Russia. Lake Baikal.
Responses from the compared models:
  - I'm not true
  - We are there. I don't really count, but it's not
  - Lake Baikal volume. So the answer is yes a bit more than the all Great Lakes, and a little under twice the volume of Lake Superior. (Ensemble (FR))
Example of Reranking
Context:
  UserA: TIL that Kyoto, the former capital of Japan, just means "capital city" and Tokyo means "eastern capital"
  UserB: I only just noticed that Tokyo and Kyoto are anagrams.
Reranked candidates (Model / Response / Rank):
  MHRED / I think Tokyo Godzilla, but as well Kyoto. / 1
  MHRED / They also have been a lot of Tokyo as the Tokyo are they have the same as well. the Kyoto is the only one. / 2
  FR / Villages aren't cities. / worst
Conclusion
- Proposed an ensemble dialogue system with facts
  - Consists of the MHRED, FR, and Reranker
  - Generates more diverse and informative responses than a single model
- Future Work
  - Extend to end-to-end learning of the multiple systems simultaneously