An Ensemble Dialogue System for Facts-Based Sentence Generation
Ryota Tanaka, Akihide Ozeki, Shugo Kato, Akinobu Lee
Graduate School of Engineering, Nagoya Institute of Technology, Japan
Track 2 Oral Session: Sentence Generation
Introduction
- Neural network-based dialogue models have several problems:
  - Uninformative responses such as "I don't know"
  - Responses inconsistent with real-world facts
- Combining multiple facts-based models allows responses to be more informative and diverse.
Pre-processing of Facts
1. Categorize facts into 2 types based on HTML tags
   - Subject Facts: enclosed by <h> or <title> tags
   - Description Facts: enclosed by <p> tags, or not enclosed by any tags
2. Select facts highly related to the context
Example (fact categorization):
  Facts:
    1  <title> Justice League </title>
    2  From Wikipedia, the free encyclopedia
    3  For other uses, see Justice League
    4  <h1> Story </h1>
    ...
    N  <p> The film was ... </p>
  Subject Facts:
    1  Justice League
    2  Story
    ...
    K' Cast
  Description Facts:
    1  From Wikipedia, the free encyclopedia
    2  For other uses, see Justice League
    ...
    L' The film was ...
Pre-processing of Facts (cont.)
2. Select facts highly related to the context
   - Select 10 facts using the cosine similarity between the word2vec representations of each fact and the context (a minimal sketch follows the example below)
Example (fact selection):
  Context: "Justice League filming wraps in London."
  Subject Facts:
    1  Justice League
    2  Superhero
    ...
    K  Spiderman
  Description Facts:
    1  Justice League is one of the most ...
    2  Justice League is a 2017 American superhero ...
    ...
    L  The film was ...
  The facts most similar to the context (by cosine similarity) are selected.
  * This study uses K = L = 10
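A minimal sketch of this selection step, assuming averaged word2vec vectors per sentence (the helper names and the dict-style word-vector lookup are illustrative, not the authors' code):

    import numpy as np

    def embed(tokens, w2v, dim=300):
        """Average the word2vec vectors of the known tokens (zero vector if none are known)."""
        vecs = [w2v[t] for t in tokens if t in w2v]
        return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b) / denom if denom else 0.0

    def select_facts(context, facts, w2v, k=10):
        """Keep the k facts most similar to the context (K = L = 10 in this study)."""
        c = embed(context.split(), w2v)
        ranked = sorted(facts, key=lambda f: cosine(c, embed(f.split(), w2v)), reverse=True)
        return ranked[:k]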
Ensemble Dialogue System
- Dialogue system combining 3 proposed modules:
  - Memory-augmented HRED (MHRED)
  - Sentence selection module with Facts Retrieval (FR)
  - Reranker
- Select the final response by feeding all the candidates to the Reranker
[System diagram: the MHRED generates candidates from the dialogue data and facts, the FR retrieves candidates from a facts DB built from the subject and description facts, and the Reranker selects the final response.]
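A rough sketch of how the three modules fit together at inference time (the module interfaces below are assumptions made for illustration, not the authors' code):

    def respond(context, facts, mhred, fr, reranker):
        """Gather candidates from the generator and the retriever, then let the reranker choose."""
        candidates = []
        candidates += mhred.generate(context, facts)   # generation-based candidates
        candidates += fr.retrieve(context, facts)      # retrieval-based candidates (up to 10)
        # The reranker scores every candidate; the highest-scoring one becomes the final response.
        return max(candidates, key=lambda cand: reranker.score(context, cand))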
Memory-augmented HRED (MHRED)
- Generate a response conditioned on the previous context and facts
- MHRED = HRED [Serban et al., 15] + MemN2N [Sukhbaatar et al., 15]
[Model diagram: the MHRED is composed of a Hierarchical Recurrent Encoder (HRE), a Facts Encoder, and a Decoder.]
Facts Encoder
- Select facts to be injected into responses
- Map facts to a continuous representation
[Facts Encoder diagram: inputs are the paragraph, sub-header, header, and title facts, together with the final hidden state of the HRE; a sketch follows below.]
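A single-hop, numpy-only sketch of a MemN2N-style Facts Encoder; using the HRE final hidden state as the attention query and the projection names A and C follow the standard MemN2N formulation and are assumptions, not the authors' exact model:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def facts_encoder(hre_state, fact_embeddings, A, C):
        """
        hre_state       : final hidden state of the HRE (context summary), shape (d,)
        fact_embeddings : one vector per selected fact, shape (n_facts, d)
        A, C            : input / output memory projections, shape (d, d)
        Returns a continuous facts representation that conditions the decoder.
        """
        keys = fact_embeddings @ A                  # input memory
        values = fact_embeddings @ C                # output memory
        attention = softmax(keys @ hre_state)       # relevance of each fact to the context
        return values.T @ attention                 # weighted sum over fact memories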
Sentence Selection with Facts Retrieval (FR)
1. Construct DB
2. Response Selection

[DB construction example: the context "Justice League filming wraps in London." and its response "I think it is better than Spiderman." are stored in the DB as a <Query, Response> pair.]
Sentence Selection with Facts Retrieval (FR) (cont.)
2. Response Selection
   - Search the DB entries with words shared by the subject facts and the stored responses (duplicate words)
   - Output up to 10 responses (a word-overlap sketch follows the example below)

[Response selection example: subject facts (1 Justice League, 2 Superhero, ..., K Spiderman) give the duplicate words "Spiderman", "Marvel", and "Justice League"; searching the entries hits responses such as "I think it is better than Spiderman.", "I love Marvel movie.", and "Justice League is directed by Snyder."]
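A word-overlap sketch of the retrieval step; treating "duplicate words" as tokens shared between the subject facts and a stored response is an assumption for illustration:

    def retrieve(subject_facts, qr_db, max_responses=10):
        """
        subject_facts : subject-fact strings, e.g. ["Justice League", "Superhero", ...]
        qr_db         : list of (query, response) pairs built from the dialogue data
        Returns up to 10 stored responses that share a word with the subject facts.
        """
        fact_words = {w.lower() for fact in subject_facts for w in fact.split()}
        hits = []
        for _, response in qr_db:
            if fact_words & {w.lower() for w in response.split()}:   # "Hit!!"
                hits.append(response)
            if len(hits) == max_responses:
                break
        return hits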
Reranker (1/2)
- The Reranker sorts the candidates by taking all the outputs of the MHRED and FR, then selects the best response
- Classify whether a candidate is a "positive" or "negative" response using XGBoost
Reranker (2/2)
- Dataset (created by ourselves from the distributed dataset)
  - Positive examples (44,449 pairs): context-response pairs selected with a high response score in the dialogue dataset
  - Negative examples (44,449 pairs): context-response pairs generated from the positive examples by randomly changing sentence length, order, or topic
- Features (a toy sketch follows below)
  - Candidate: length, fluency, etc.
  - Last utterance - candidate pair: word similarity, n-gram similarity, etc.
  - Context - candidate pair: topic similarity
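A toy sketch of the reranker using xgboost's scikit-learn-style API; the three features below stand in for the full feature set (fluency and topic similarity are omitted), and the training arrays are assumed to be built from the positive/negative pairs described above:

    import xgboost as xgb

    def features(context, last_utterance, candidate):
        """Simplified candidate / last-utterance / context features."""
        cand = set(candidate.split())
        last = set(last_utterance.split())
        ctx = set(context.split())
        return [
            len(candidate.split()),                 # candidate length
            len(cand & last) / max(len(last), 1),   # word overlap with the last utterance
            len(cand & ctx) / max(len(cand), 1),    # crude context / topic overlap
        ]

    clf = xgb.XGBClassifier(n_estimators=100, max_depth=4)
    # clf.fit(X_train, y_train)  # X_train / y_train: features and 1/0 labels of the positive/negative pairs

    def rerank(context, last_utterance, candidates):
        """Sort candidates by the predicted probability of being a 'positive' response."""
        feats = [features(context, last_utterance, c) for c in candidates]
        scores = clf.predict_proba(feats)[:, 1]
        return [c for _, c in sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)]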
Experiments
- DSTC7 Dataset
  - Dialogue: Reddit (2011/01-2017/11); Train: 832,908 dialogs, Dev: 40,932 dialogs, Test: 13,440 dialogs
  - Facts: articles extracted from web sites such as Wikipedia
- Evaluation Metrics
  - Automatic
    - NIST, BLEU, METEOR: word-overlap metrics
    - div [Li et al., 15]: diversity metric
  - Human (5-point Likert scale)
    - Appropriateness: conversationally appropriate and relevant to the previous turns
    - Informativeness: informative response that is relevant to the user input and has potential utility
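The div metric of [Li et al., 15] is usually reported as distinct-n, the number of unique n-grams divided by the total number of generated n-grams; a minimal version:

    def distinct_n(responses, n=1):
        """distinct-n over a list of generated responses."""
        ngrams = [tuple(tokens[i:i + n])
                  for response in responses
                  for tokens in [response.split()]
                  for i in range(len(tokens) - n + 1)]
        return len(set(ngrams)) / len(ngrams) if ngrams else 0.0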
(Submitted) Models for Comparison
- S2S: Seq2seq [Vinyals et al., 15]
- HRED: HRED [Serban et al., 16]
- HRED-F
- MHRED-F
- MHRED-F5-R
- MHRED-F15-R
- Ensemble: proposed model

F: Facts Penalty, R: Reranker, 5 / 15: beam width
Automatic Evaluation
- Ensemble performs better than the other models
  - Combining multiple systems is effective
- MHRED-F performs notably well on the diversity score
  - It generates diverse responses on the new domain
- Introducing the Reranker improves the word-overlap scores
  - The Reranker tends to select natural responses close to human ones
Human Evaluation
- Official baselines
  - Baseline (constant): only "I don't know what you mean.", a valid response to any context
  - Baseline (random): random sampling from the dialogue data; responses are usually highly fluent because they are human-made
- Our model beats the baseline models
Examples (MHRED)
Context:
  UserA: TIL Seinfeld wasn't actually a "show about nothing," it was a show about how comedians get their material.
  UserB: They never really had an episode where someone read a book to themselves for 22 minutes. That would be a show "about nothing".
Responses from the compared models:
  - that is that ?
  - And then had one episode and it it that was it was the show
  - Georges show up in episode aired in a show (Ensemble (MHRED))
Examples (FR)
Context:
  UserA: There is enough water in Lake Superior to flood the entire landmasses of North and South America to a depth of 1 foot. It contains over 3 quadrillion gallons of fresh water.
  UserB: The world's largest by volume and deepest lake is located in southern Russia. Lake Baikal.
Responses from the compared models:
  - I'm not true
  - We are there. I don't really count, but it's not
  - Lake Baikal volume. So the answer is yes a bit more than the all Great Lakes, and a little under twice the volume of Lake Superior. (Ensemble (FR))
Example of Reranking
Context:
  UserA: TIL that Kyoto, the former capital of Japan, just means "capital city" and Tokyo means "eastern capital"
  UserB: I only just noticed that Tokyo and Kyoto are anagrams.
Reranked candidates (Model / Response / Rank):
  MHRED / I think Tokyo Godzilla, but as well Kyoto. / 1
  MHRED / They also have been a lot of Tokyo as the Tokyo are they have the same as well. the Kyoto is the only one. / 2
  FR / Villages aren't cities. / worst
Conclusion
- Proposed an ensemble dialogue system with facts
  - Consists of the MHRED, FR, and Reranker
  - Generates more diverse and informative responses than a single model
- Future Work
  - Extend to end-to-end learning of the multiple systems simultaneously