Date posted: 06-Jan-2017
Category: Data & Analytics
Uploaded by: ogushi-masaya
Chat Bot Made by Chainer
(Chainer is a neural network framework)
PyCon JP 2016
Masaya Ogushi
Attention
I will not show any mathematical formulas.
If you already understand machine learning models, I recommend reading the papers listed on the last page.
Agenda
Self Introduction
Dialogue Value
Character of the Bot
System
Future Plan
[Figure: chat bot system overview. Components: Choose the Topic; Answer Candidates; Neural Network 1 and Neural Network 2 (each with Understand Contents, Control Dialogue, Generate Answer); Question; Question Answer.]
Self Introduction
Name: Masaya Ogushi
@SnowGushiGit
PORT Inc.
Web Development, Research and Development team
Tech-Circle staff
Machine learning, natural language processing, crawler development, automatic infrastructure construction, parallel processing, search functions
We're Hiring!!
https://www.theport.jp/recruit/information/
Free consulting for your job search
https://port-recruitment.jp/
Dialogue Value
Continuity
Interactivity
New user experiences
Dialogue Value: Continuity
The bot can use information from earlier in the conversation.
User: I really love playing tennis.
Chat Bot: That sounds great.
User: Yeah, well, I'm looking for a part-time job. Do you know any good ones?
Chat Bot: Well, a friend of mine owns a sports shop and is looking for help.
(The bot finds part-time job candidates.)
Dialogue Value: Interactivity
The bot can react to new information.
Chat Bot: I found some delicious sweets.
User: Don't say such things while I'm on a diet.
Chat Bot: This food is good for losing weight.
User: Really???
Dialogue Value: New User Experiences
Example: character
Dialogue Value
Continuity and interactivity: need dialogue data
New user experiences: possible to achieve even without dialogue data
Character of the Bot
Which character looks smart?
Which character would you like to talk with?
Which character has the biggest gap between looks and personality?
Each character says: I read "Statistical Machine Translation."
Character of the Bot
User: Do you know any good part-time jobs?
Chat Bot: I recommend Matsuya.
User: No, something more suited to me.
Chat Bot: How about a sweets shop?
User: Sounds good.
Character is very important.
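[Figure: how the user reacts to the Chat Bot's character]
Bot suggestions: a food shop; maybe sweets are better.
User reactions: "Looks funny." "Uhhh... he only thinks about food." "You understand it better than I expected."
Labels: learning, allowance, surprise.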
Character of the Bot
A character improves the new user experience.
It lowers the expected value of the answer and makes the bot easier to talk to.
Image of the icon
Conversation of the bot
Cognitive science:
Expected value of the answer: high. Ease of talking: low.
Expected value of the answer: low. Ease of talking: high.
Both characters say: "Feel free to talk to me."
Talk it
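Character of the Bot
Preparing sentences for each character is costly.
We have to make small changes to the same sentence.
*The example uses different levels of politeness in Japanese, the nuances of which are hard to translate into English.
あなたは食べたパンの数を把握されていますか? (very polite: "Are you aware of how many loaves of bread you have eaten?")
お前は食ったパンの数を覚えているか? (rough: "Do you remember how many loaves of bread you've eaten?")
あなたは食べたパンの数を覚えているの? (casual: "Do you remember how many loaves of bread you have eaten?")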
Character of the Bot
We need many scenario writers.
We would like to change the character without changing the contents.
Plain sentence: "Do you remember how many loaves of bread you have eaten?"
Add the character (woman, fat man, steward) to produce the three Japanese variants shown earlier.
Character of the Bot: NeuralStoryTeller
NeuralStoryTeller adds a character to plain sentences. The slide's example adds romantic elements.
[Figure: NeuralStoryTeller example]
It can be applied to a variety of situations if we prepare sentences for each character.
I'm sorry. I could not implement this character function.
System
System Architecture
The dialogue interface is Slack.
Conversation data is prepared from Twitter.
Pre-training uses Wikipedia data and the Dialogue Breakdown Collection.
The topic is chosen using WordNet and the Wikipedia Entity Vector.
The dialogue model is made with Chainer.
The question-and-answer function uses Elasticsearch.
System: Choose the Topic
The contents of a conversation change depending on who you are talking to.
To a boyfriend: "Which is more important to you, me or your job?"
To a younger sister: "Please tell me where you bought your clothes."
To a father: "Please give me some money."
System: Choose the Topic
WordNet
WordNet is a data set in which words are grouped into sets of cognitive synonyms, each expressing a distinct concept.
Example: Scottish Fold, black cat, and orange cat all belong to the concept "cat".
There are many concepts: 57,238. It is difficult to prepare data for all of them.
System: Grouping the concepts
How to group:
Map the concepts into a space.
Group them by distance.
[Figure: several "cat" concepts and "tiger" placed in the concept space]
System: Mapping the concept space
Are two words close?
We cannot tell the distance by comparing the words themselves.
We have to map the words into a space in which distance can be measured.
System: Mapping the concept space
Entity linking: map a keyword to the knowledge space.
[Figure: Facebook and Twitter mapped into the knowledge space under SNS. Are they close?]
System: Choose the Topic
Map the concepts to the knowledge space with the Japanese Wikipedia Entity Vector!
It gives vector representations of words and of Wikipedia (knowledge).
(A Wikipedia article is called an entity.)
System: Choose the Topic
Each synonym gets its vector from the Wikipedia Entity Vector.
cat: [0.2, 0.3, 0.4, …]
dog: [0.3, 0.4, 0.5, …]
System: Measuring the concept distance
Choose an appropriate distance measure for the mapped space.
If we choose the wrong measure, the result is misleading: yellow looks close, but light blue is actually closer than yellow.
I used cosine similarity to measure the distance between vectors.
System: Choose the Topic
Each synonym gets its vector from the Wikipedia Entity Vector, and the vectors are compared by cosine similarity.
Cat: [0.2, 0.3, 0.4, …]
Dog: [0.3, 0.4, 0.5, …]
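The cosine similarity between two word vectors can be sketched in a few lines of plain Python. This is a minimal illustration, not the talk's actual code: the function name `cosine_similarity` is hypothetical, and the three-element vectors are the truncated values from the slide (the slide elides the remaining dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


# Truncated illustrative vectors taken from the slide.
cat = [0.2, 0.3, 0.4]
dog = [0.3, 0.4, 0.5]
similarity = cosine_similarity(cat, dog)
print(similarity)
```

A value close to 1 means the two vectors point in almost the same direction, whatever their lengths.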
There are still many concepts.
System: Choose the Topic
Add words that are unknown to WordNet from the Wikipedia Entity Vector.
Example: the WordNet concept "cat" contains "black cat" and "white cat". "Calico cat" is in the Wikipedia Entity Vector and is close to them in cosine similarity, so it is added to the concept as an unknown word.
System: Choose the Topic
Calculate the average vector of each concept.
Cat: black cat [0.2, 0.3, 0.4, …], white cat [0.1, 0.3, 0.…], … → average vector
Dog: Shiba [0.1, 0.3, 0.4, …], Tosa [0.1, 0.2, 0.…], … → average vector
System: Choose the Topic
If the average vectors of two concepts are close, group the concepts together.
(Same Cat and Dog example: compare the two average vectors and group the concepts when they are close.)
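A minimal sketch of the averaging and grouping steps, under stated assumptions: the greedy grouping rule, the `threshold` value, the function names, and all vector values are hypothetical choices for illustration. The slides do not say which grouping procedure was actually used.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def average_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def group_concepts(concept_vectors, threshold):
    """Greedy grouping (an assumption): a concept joins the first group
    whose representative average vector is similar enough."""
    groups = []  # list of (representative_vector, [concept names])
    for name, vectors in concept_vectors.items():
        avg = average_vector(vectors)
        for representative, names in groups:
            if cosine_similarity(representative, avg) >= threshold:
                names.append(name)
                break
        else:
            groups.append((avg, [name]))
    return [names for _, names in groups]


# Hypothetical 3-dimensional word vectors per concept.
concepts = {
    "cat": [[0.2, 0.3, 0.4], [0.1, 0.3, 0.4]],
    "dog": [[0.1, 0.3, 0.4], [0.1, 0.2, 0.4]],
    "money": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
}
groups = group_concepts(concepts, threshold=0.9)
print(groups)
```

With these made-up numbers, "cat" and "dog" end up in one group because their average vectors are nearly parallel, while "money" stays separate.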
There are still many concepts (20,000 concepts).
System: Choose the Topic
Choose the concepts that have over 1,000 words. Such concepts are easy to match against a phrase.
Cat: black cat, white cat, …
Dog: Shiba, Tosa, …
Bird: swan, duck, …
Koala: koala
Choosing the concepts leaves 76 concepts.
(Note: I did not use all of the Wikipedia Entity Vectors.)
System: Choose the Topic
How to choose the dialogue:
Choose the concept by the word match rate.
Input: "Where can I buy cute clothes?"
Boyfriend: cool, nice guy, …
Younger sister: cute, clothes, …
Father: money, gentle, …
Calculate the word match rate between the input and the words of each concept.
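The topic choice can be sketched as follows. The exact definition of "word match rate" is not given on the slide, so this sketch assumes it is the fraction of input words found in a concept's word set. The function names and the whitespace tokenizer are also assumptions; Japanese input would need a morphological analyzer instead of `split()`.

```python
def word_match_rate(utterance_words, concept_words):
    """Fraction of utterance words that appear in the concept's word set."""
    if not utterance_words:
        return 0.0
    hits = sum(1 for word in utterance_words if word in concept_words)
    return hits / len(utterance_words)


def choose_topic(utterance, topics):
    """Return the topic name with the highest word match rate."""
    words = utterance.lower().replace("?", " ").split()
    return max(topics, key=lambda name: word_match_rate(words, topics[name]))


# Concept word sets from the slide example (truncated there with ":").
topics = {
    "boyfriend": {"cool", "nice", "guy"},
    "younger_sister": {"cute", "clothes"},
    "father": {"money", "gentle"},
}
chosen = choose_topic("Where can I buy cute clothes ?", topics)
print(chosen)
```

Here two of the six input words ("cute", "clothes") match the younger sister's word set, so that topic, and the dialogue model trained for it, is chosen.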
Understand Contents
Control Dialogue
Generate the Answer
All of these parts are made with neural networks. (Note: it might be the best way.)
System: Why neural networks?
I will explain how to apply neural networks to natural language processing.
Value of the Neural Network
Expression
Continuity
Focus
System: Expression
Map natural language to a vector space using a bag of words (prepare a dictionary and count the words that appear in it).
Expression level, from low to high: word, phrase, sentence.
Dictionary: I, show, am, me, your, you, …, when, are
Data: "I am Shota"
I: 1, 0, 0, 0, 0, 0, …, 0, 0
am: 0, 0, 1, 0, 0, 0, …, 0, 0
Shota: 0, 0, 0, 0, 0, 0, …, 0, 0
It is rate time to use it, but over the million wordsData
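A bag-of-words vector can be sketched with the standard library alone. The dictionary below is the truncated one from the slide (the entries elided by "…" are left out), and the function name `bag_of_words` is hypothetical.

```python
from collections import Counter


def bag_of_words(tokens, dictionary):
    """Count how often each dictionary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in dictionary]


# Truncated dictionary from the slide; a real one has far more entries.
dictionary = ["I", "show", "am", "me", "your", "you", "when", "are"]
vector = bag_of_words(["I", "am", "Shota"], dictionary)
print(vector)  # [1, 0, 1, 0, 0, 0, 0, 0]
```

Words missing from the dictionary ("Shota" in this truncated example) contribute nothing, and word order is lost, which is the limitation the talk turns to next.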
A bag of words only considers individual words.
System: Expression
Deep learning is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships.
Expression level, from low to high: word, phrase, sentence.
Distributed representations of the words in a vector space, learned by deep learning, for the data "I am Shota":
I: 0.5, 0.0, 1.0, 1.0, 0.3, 0.0
am: 0.5, 0.0, 1.0, 1.0, 0.0, 0.0
Shota: 0.5, 0.0, 1.0, 0.5, 0.3, 0.0
System: Continuity
Example input: 太郎 さん こんにちは ("Hello, Taro-san")
Phrases are important for continuity. A recurrent neural network can take this continuity into account.
System: Focus
Example input: 太郎 さん こんにちは ("Hello, Taro-san")
Focus is important for picking out the important phrases. An attention model (a neural network) considers which words to focus on.
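The idea of focus can be illustrated with a small dot-product attention sketch, in the spirit of the attention papers listed in the references. It is not the talk's Chainer model: the hidden states, the query vector, and the function names are all hypothetical values chosen so the example is easy to follow.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Dot-product attention: weight each encoder state by its match with the query."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical 2-dimensional encoder states for the three input tokens.
tokens = ["太郎", "さん", "こんにちは"]
encoder_states = [[0.1, 0.2], [0.0, 0.1], [0.9, 0.8]]
query = [1.0, 1.0]  # hypothetical decoder state
weights, context = attend(query, encoder_states)
focus = tokens[weights.index(max(weights))]
print(focus, weights)
```

The token with the largest weight is the one the decoder "focuses" on at this step; the context vector is the weighted mix of all encoder states.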
System: Value of the Neural Network
Expression
Continuity
Focus
System: How to implement the neural network
This is the dialogue model.
Encoder input: 太郎 さん こんにちは ("Hello, Taro-san")
Decoder output: こんにちは <EOS> ("Hello", followed by the end-of-sequence token)
Map the phrases to a neural network space.
The middle layer expresses the neural network space.
Input: 太郎 さん こんにちは
One-hot input for 太郎: 太郎: 1, さん: 0, こんにちは: 0, …
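Learn continuously from the phrases.
[Figure: one recurrent step for the input 太郎 さん こんにちは. The input さん and the hidden layer from the 太郎 step (the past value is copied) pass through a transform matrix into the hidden layer. The output layer is a one-hot vector (0 0 0 0 1 … 0) for こんにちは.]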
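One recurrent step, where the previous hidden layer is copied forward and combined with the current word, can be sketched like this. It is a plain-Python illustration of a simple (Elman-style) recurrent layer, not the talk's Chainer code: the weight values, the hidden size of 2, and the function names are hypothetical, and a trained model would learn the weights.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_step(x, h_prev, w_xh, w_hh):
    """h_t = tanh(W_xh x_t + W_hh h_(t-1))"""
    from_input = matvec(w_xh, x)
    from_past = matvec(w_hh, h_prev)
    return [math.tanh(a + b) for a, b in zip(from_input, from_past)]


vocab = ["太郎", "さん", "こんにちは"]


def one_hot(word):
    return [1.0 if entry == word else 0.0 for entry in vocab]


# Hypothetical fixed weights: hidden size 2, vocabulary size 3.
w_xh = [[0.5, -0.3, 0.8], [0.1, 0.7, -0.2]]
w_hh = [[0.4, 0.0], [0.0, 0.4]]

h = [0.0, 0.0]
history = []
for word in vocab:
    h = rnn_step(one_hot(word), h, w_xh, w_hh)  # the past hidden value is reused
    history.append(h)
print(history)
```

Because each step reuses the previous hidden layer, the hidden state for さん differs from what it would be if さん were the first word, which is how the network carries continuity.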
The input sentence is reversed.
The first word is the most important.
Input: 太郎 さん こんにちは
The forward information and the reverse information are combined in the encoder (the slide's term is "convolution").
The decoder generates phrases from the combined information.
Encoder input: 太郎 さん こんにちは
Decoder output: こんにちは
Continuity is also considered while generating the phrases.
Decoder output: こんにちは <EOS>
System: The value of this model
Expression
Continuity
Focus
Question and Answer Function
How do we know whether an utterance is a question?
The decision is very simple: is there a question mark (?)?
If you are interested in detecting questions, I recommend reading the paper below.
Li, Baichuan, et al. "Question identification on twitter." Proceedings of the 20th ACM international conference on Information and knowledge management. ACM, 2011.
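The question-mark rule is simple enough to sketch directly. The function names and the two route labels are hypothetical, and checking the full-width "？" as well is an added assumption for Japanese text; the slide only mentions "?".

```python
def is_question(utterance):
    """Rule from the slide: an utterance is a question if it has a question mark.
    The full-width mark is an extra assumption for Japanese input."""
    return "?" in utterance or "？" in utterance


def route(utterance):
    """Send questions to the question-answer function, everything else to the dialogue model."""
    return "question_answer" if is_question(utterance) else "dialogue_model"


print(route("Where can I buy cute clothes ?"))
print(route("Please tell me where I can find cute clothes"))
```

The second sentence asks for the same information but has no question mark, so this rule sends it to the dialogue model. That is the kind of case the recommended paper addresses.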
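Example: "Where can I buy cute clothes?" contains a question mark, so it goes to the Question Answer function.
"Please tell me where I can find cute clothes" has no question mark, so it goes to Neural Network 1 (Understand Contents, Control Dialogue, Generate Answer).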
Future Plan
Prepare enough tests: there is not enough test code yet.
Evaluation: F-measure.
Apply the latest Chainer: I hear the Trainer function is good.
Combine the rule-based approach and the neural network.
NeuralStoryTeller: add the character.
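For reference, the F-measure named in the plan can be computed as below. The talk does not say what will be evaluated, so the counts in the example are hypothetical and the function name is an assumption.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """F-measure: weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 correct detections, 2 false alarms, 4 misses.
f1 = f_measure(8, 2, 4)
print(f1)
```

With these counts, precision is 0.8 and recall is about 0.67, so the F1 score lies between them, closer to the lower value.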
Conclusion
WordNet is a concept data set.
It is possible to find other data that express concepts.
Words are mapped to a vector space using the Wikipedia Entity Vector.
We can make vector spaces using our own data set.
Hybrid function (neural network and rule-based).
Please search GitHub for "Chainer Slack Twitter".
Please give me a star.
I prepared a Docker container.
Please search for "Docker hub Chainer-Slack-Twitter-Dialogue".
Reference
• Chainerで学習した対話用のボットをSlackで使用+Twitterから学習データを取得してファインチューニン (Using a dialogue bot trained with Chainer on Slack, with training data from Twitter for fine-tuning). http://qiita.com/GushiSnow/items/79ca7deeb976f50126d7
• WordNet. http://nlpwww.nict.go.jp/wn-ja/
• 日本語 Wikipedia エンティティベクトル (Japanese Wikipedia Entity Vector). http://www.cl.ecei.tohoku.ac.jp/~m-suzuki/jawiki_vector/
• PAKUTASO. https://www.pakutaso.com/
• Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
• Rush, Alexander M., Sumit Chopra, and Jason Weston. "A neural attention model for abstractive sentence summarization." arXiv preprint arXiv:1509.00685 (2015).
• Tech Circle #15 Possibility Of BOT. http://www.slideshare.net/takahirokubo7792/tech-circle-15-possibility-of-bot
• Generating Stories about Images. https://medium.com/@samim/generating-stories-about-images-d163ba41e4ed#.h80qhbd54
• 二つの文字列の類似度 (Similarity between two strings). http://d.hatena.ne.jp/ktr_skmt/20111214/1323835913
• Li, Baichuan, et al. "Question identification on twitter." Proceedings of the 20th ACM international conference on Information and knowledge management. ACM, 2011.
• 音源:スカイウォーキング (Music: Skywalking). http://dova-s.jp/bgm/download5052.html
• 音源:get into the rhythm (Music: get into the rhythm). http://dova-s.jp/bgm/download5145.html
• 構文解析 (Syntactic parsing). http://qiita.com/laco0416/items/b75dc8689cf4f08b21f6