Copyright © 2015 Japan Society of Kansei Engineering. All Rights Reserved.
International Journal of Affective Engineering Vol.15 No.2 (Special Issue) pp.125-134 (2016) doi: 10.5057/ijae.IJAE-D-15-00030
J-STAGE Advance Published Date: 2015.12.22
Received: 2015.06.01 / Accepted: 2015.10.29
Special Issue on ISASE 2015
ORIGINAL ARTICLE
A Text-based Automatic Waka Generation System using Kansei
Ming YANG and Masafumi HAGIWARA
Graduate School of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan
Abstract: In this paper, we propose an automatic Waka generation system that uses a custom database and is based on texts given by the user. The proposed text-based system is more compatible with the Waka form, which improves the consistency and logicality of the generated poems. Kansei information is also considered, so that the poems are natural and closer to the emotions the user wants to express. Interactive generation experiments show that the proposed system can generate Waka poems that satisfy stylistic and grammatical requirements. The poems also carry meanings and emotions related to the original text, together with some poeticness.
Keywords: Waka Generation, Interactive Genetic Algorithm, Kansei Information
1. INTRODUCTION
In the 21st century, artificial intelligence (AI) has become one of the most active research fields, attracting rapidly growing attention. As its theories have developed and specialized, its applications have spread to a great number of areas. Beyond engineering and industry, AI has also been connected with literature and the arts, including poetry, a popular literary form. AI is expected to compose poems as well as humans do. To realize this, researchers have made a series of attempts, beginning with the first computer-created German poem by T. Lutz in 1959 [1].
The most obvious feature of a poem is its stylistics, which led to pattern-based approaches to automatic poem generation. The earliest method fills designed patterns with words, and the RACTER and PROSE [2] systems are representative works. However, such systems have an inherent defect: the quality of the generated poems depends heavily on the pattern design, which limits the flexibility of creation. Afterwards, CBR (Case-Based Reasoning), an approach based on known instances, appeared, and the ASPERA [3] and COLIBRI [4] systems came out. This method can improve generation quality, but it is still constrained by its algorithm design and is difficult to optimize. Introducing the rapidly developing genetic algorithm into poem generation was therefore a creative step [5], and it has been used in many studies, such as the MCGONAGALL [6] system by H. Manurung. This relatively mature research also defined the following 3 required elements of a poem: meaningfulness, grammaticality and poeticness. As Manurung also noted in another paper [7], the difficulties in generating poems that meet these 3 requirements are: (1) achieving unity and interdependency between grammar and semantics; (2) the need for a very rich supply of resources to support generation; (3) evaluating the output objectively. Moreover, human poem composition is a process that starts with emotions, which results in poems with rich emotional features. At present, however, AI still cannot reach this level, let alone achieve artistic value.
On the other hand, most existing research on poem generation is based on English poetry, and studies on East Asian languages such as Japanese and Chinese are limited in number. For Japanese poetry in particular, although there have been many achievements in poetic analysis, text mining, database construction, computer-assisted composition [8-11], etc., research on automatic generation is insufficient. Because of its long history, Japanese poetry has a systematic theory of composition, including many rules and restrictions on stylistics and content. In addition, poetic expression in Japanese differs greatly from ordinary expression, which makes Japanese poetry much more difficult to generate automatically than English poetry, whose stylistics are relatively free.
Thus, to remedy some of the deficiencies of existing poem generation systems and contribute to the development of Japanese poem generation, we build on previous computer-based research on Japanese poetry and propose a new automatic Waka generation system with the following distinctive features. (1) Text-based. Most existing poem generation systems start from one or more independent keywords, so the generated poems have weak internal semantic and logical relations [2]. By comparison, using a text given by the user as the content material for generation can compensate for this weakness, because a text is more internally consistent. Correspondingly, we selected Waka, a traditional genre of Japanese poetry, as the research object rather than the more popular Haiku, because Haiku is a concise genre that demands a high degree of summarization and connotative beauty. Since texts contain much more information than discrete keywords, it may be difficult to generate highly correlated Haiku from a text. Moreover, because of its greater length, Waka is more compatible with both narrative and depictive texts. (2) Usage of a custom database.
Although available databases of Japanese poems are scarce, there are enough paper-based dictionaries with the required contents. We therefore digitized these dictionaries and several open databases on the Internet to construct a scalable database for Japanese poem generation that provides sufficient resources. (3) Introduction of a Kansei system. Since automatically generated poems are inherently weak in emotional expression, this research introduces an existing Kansei system into the generation and evaluation processes to build a correspondence between the poem and the original text. This makes the generated poems more natural and closer to the emotions the user would like to express.
(4) Usage of an interactive genetic algorithm. Because evaluation in the arts is subjective, a computer cannot yet complete all the tasks on its own, and human subjective judgment is still needed. As a solution, this research introduces an interactive genetic algorithm.
Regarding potential applications, Waka is a concentrated expression of Japanese culture, emotion and aesthetics. This research could therefore help cross-cultural communication by supporting the learning and analysis of these elements of Japanese thought. It may also contribute to the design of conversational AI by adding the ability to handle emotional and cultural information.
Following this introduction, Chap. 2 explains the proposed Waka generation system, Chap. 3 presents the experimental results, and Chap. 4 concludes this paper.
2. WAKA GENERATION SYSTEM
2.1 Outline
The proposed system consists of 3 connected phases. Figure 1 is a flowchart of its running procedure and data flow.
A run starts when the user inputs an original text into the proposed system. In the first phase, Text Analysis, the system extracts the main information from the text, including keywords, case grammar and context information. The extracted keywords are then assigned Kansei tags and become the input to the next phase, while the case grammar and context are stored for the generation process. In the second phase, Database and Retrieval, the system searches the database for pieces of literary vocabulary whose meanings and Kansei tags are similar to those of the keywords; these pieces are the materials for generating poems. The third phase, Generation and Evaluation, is based on an interactive genetic algorithm and is further divided into an automatic part and an interactive part. In the automatic part, the proposed system combines the materials (the literary pieces) with the thread (the case grammar and context) according to the stylistic rules of Waka poetry to generate poems, and it evaluates them automatically. This evolutionary loop repeats until a sufficient number of eligible poems have been generated. In the interactive part, the system incorporates the subjective evaluation of several items by human experimenters and keeps producing new generations until the set number of generations is reached. Finally, the run ends with one selected Waka poem as output. Sec. 2.2-2.4 explain the procedures and methods of the 3 phases.
2.2 Text Analysis
As shown in Figure 2, text analysis is divided into 4 levels, which are performed in sequence. To keep the subsequent procedures running smoothly, the input text must satisfy some restrictions. It must be written in modern Japanese in the plain form, be between 80 and 200 kanas long, and contain at most 5 sentences.
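A minimal sketch of the input check follows. It assumes that the kana reading of the text has already been obtained (for example, from MeCab's reading output), that every hiragana, katakana or long-vowel character counts as one kana, and that sentences end with 「。」, 「！」 or 「？」. The function names are hypothetical, and the plain-form requirement is not checked.

```python
import re

# Hiragana, katakana and the long-vowel mark ー.
# Assumption: every such character counts as one kana.
KANA = re.compile(r"[\u3041-\u3096\u30A1-\u30FA\u30FC]")

def count_kana(reading: str) -> int:
    """Number of kana characters in a reading string."""
    return len(KANA.findall(reading))

def count_sentences(text: str) -> int:
    """Number of non-empty sentences, split on Japanese sentence-final punctuation."""
    return len([s for s in re.split(r"[。！？]", text) if s.strip()])

def is_valid_input(text: str, reading: str, min_kana: int = 80,
                   max_kana: int = 200, max_sentences: int = 5) -> bool:
    """Check the 80-200 kana length limit and the 5-sentence limit."""
    return (min_kana <= count_kana(reading) <= max_kana
            and count_sentences(text) <= max_sentences)
```

For example, a 6-sentence text is rejected even when its reading has 100 kanas.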
Figure 1: System structure and data flow
Figure 2: Text analysis
(1) Morphological Analysis
This level segments a sentence into words and converts each word to its dictionary form. It also recognizes the pronunciation and part of speech (POS) of each word. MeCab [12] is a representative, highly efficient tool for Japanese morphological analysis, and the system uses it directly. Keyword extraction is then performed on its output. Termex [13] can extract keywords directly from a paragraph, but only nouns. As a complement, the proposed system also uses Chakoshi [14], an online corpus tool based on Aozora Bunko [15]. Compared with the Balanced Corpus of Contemporary Written Japanese [16], Aozora Bunko, whose contents are mainly literary works, is more suitable for the proposed system. With this tool, the system obtains the frequency of every non-noun word across the whole of Aozora Bunko. By simplifying the TF-IDF [17] formula, we obtain the following formula,
(1)
If R > 1/1000, the word is regarded as a keyword. The final keyword set combines the words selected by Termex and those selected by Chakoshi. Some of the keywords are then assigned Kansei tags. The standard for determining the Kansei of a word is the Emotional Expression Dictionary [18], whose emotion system divides the Kansei of verbal expressions into the 10 categories shown in Table 1.
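The keyword merge and Kansei tagging can be sketched as follows. Because the body of Eq. (1) is not reproduced here, R is taken as precomputed input. The small `kansei_dict` is a hypothetical stand-in for the Emotional Expression Dictionary, and only the 10 tags of Table 1 are accepted.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# The 10 Kansei categories of Table 1.
KANSEI_CATEGORIES = {"喜", "好", "怒", "厭", "哀", "昂", "怖", "安", "恥", "驚"}
R_THRESHOLD = 1 / 1000  # a non-noun becomes a keyword when R > 1/1000

def select_keywords(termex_nouns: Iterable[str],
                    r_values: Dict[str, float],
                    kansei_dict: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Merge Termex nouns with non-nouns whose R exceeds the threshold, then attach Kansei tags.

    r_values maps each non-noun to its R from Eq. (1), computed elsewhere (assumption).
    kansei_dict is a hypothetical stand-in for the Emotional Expression Dictionary.
    """
    keywords = list(dict.fromkeys(termex_nouns))  # keep order, drop duplicates
    for word, r in r_values.items():
        if r > R_THRESHOLD and word not in keywords:
            keywords.append(word)
    tagged = []
    for word in keywords:
        tag = kansei_dict.get(word)
        # Only some keywords receive a tag; unknown or invalid tags become None.
        tagged.append((word, tag if tag in KANSEI_CATEGORIES else None))
    return tagged
```

Only keywords found in the dictionary receive a tag, which matches the statement that "some of the keywords" are tagged.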
(2) Syntactic Analysis
This level recognizes the hierarchical structure of a sentence, including its phrase structure and dependency relations. We tried to use CaboCha, a MeCab-based dependency analyzer, for this level. However, although CaboCha is excellent at phrase structure analysis, its dependency recognition did not meet our needs. We therefore designed a simple method that determines the dependency relations in a sentence from the POS relationship between 2 Japanese words. Table 2 lists examples of the grammar rules considered. Because this procedure is modularized, it can be replaced if a more effective method becomes available.
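The rule-based dependency decision can be sketched for a subset of the Table 2 patterns. The `Chunk` representation, the coarse POS labels and the default of -1 when no rule fires are all assumptions made for illustration. The A<aux.>B pattern is omitted because its exact condition is not spelled out in the text.

```python
from dataclasses import dataclass

DECLINABLE = {"verb", "adj", "aux"}  # hypothetical coarse POS labels

@dataclass
class Chunk:
    text: str            # surface form of the phrase
    pos: str             # POS of its head word: "sub", "verb", "adj", "aux" or "conj"
    particle: str = ""   # trailing postposition, e.g. "は" or "の"
    comma: bool = False  # whether the chunk is followed by "、"

def relation(a: Chunk, b: Chunk) -> int:
    """Return 1 if A modifies B and -1 otherwise, using a subset of Table 2."""
    # Type = -1 patterns: AはB, Aへ・から・まで・たりB, Aと、B, A、<conj.>、B
    if a.particle in ("は", "へ", "から", "まで", "たり"):
        return -1
    if a.particle == "と" and a.comma:
        return -1
    if "conj" in (a.pos, b.pos):
        return -1
    # Type = 1 patterns: A(sub.)のB(sub.), AとB(v.), A(dec.)B(sub.)
    if a.particle == "の" and a.pos == "sub" and b.pos == "sub":
        return 1
    if a.particle == "と" and b.pos == "verb":
        return 1
    if a.pos in DECLINABLE and not a.particle and b.pos == "sub":
        return 1
    return -1  # assumption: no modifying relation when no rule fires
```

For instance, 青い followed by 海 matches A(dec.)B(sub.) and yields 1.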
(3) Semantic Analysis
This level recognizes the role of each substantive (noun, pronoun, etc.) in a sentence, which is also called case grammar. This research does not perform a deep classification of case grammar. Instead, cases are classified by the postposition that follows the substantive, as shown in Table 3. Deep case [19] analysis could extract more semantic information, but it requires an external corpus, which would make the system much more complicated.
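Classifying cases by the following postposition reduces to a lookup over adjacent tokens. The sketch below assumes (surface, POS) pairs whose POS field uses MeCab IPAdic's top-level labels (名詞 for nouns, 助詞 for particles). It ignores other uses of the same characters, such as で as a copula form.

```python
from typing import List, Tuple

# Postposition -> case type, transcribed from Table 3.
CASE_BY_POSTPOSITION = {"が": "GA", "に": "NI", "を": "WO", "で": "DE", "も": "GA or WO", "と": "TO"}

def tag_cases(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Assign a case to every noun that is directly followed by a case postposition."""
    cases = []
    for (word, pos), (nxt, nxt_pos) in zip(tokens, tokens[1:]):
        if pos == "名詞" and nxt_pos == "助詞" and nxt in CASE_BY_POSTPOSITION:
            cases.append((word, CASE_BY_POSTPOSITION[nxt]))
    return cases
```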
(4) Context Analysis
Context refers to inter-sentence relations, including conjunction and correspondence. In Japanese NLP, there have been only a few achievements in context processing, and these discuss specific cases rather than general principles. This suggests that context information is quite difficult to extract and apply. However, some context information is expressed through particular marker expressions. These markers are the main objects we focus on to extract inter-sentence relation information, as shown in Table 4.
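The marker extraction can be sketched as naive suffix matching on each clause, trying longer markers first so that なかった wins over た. Two simplifications are our own: clauses are split at 「、」 and 「。」, and only the clause ending is checked. Both will mislabel some sentences (for example, a case particle で at a clause end is read as Reason).

```python
import re

# Marker -> relation, transcribed from Table 4.
CONTEXT_MARKERS = {
    "た": "Past", "なかった": "Past & Negation",
    "ば": "Hypothesis", "たら": "Hypothesis", "なら": "Hypothesis",
    "なければ": "Hypothesis & Negation", "なかったら": "Hypothesis & Negation",
    "ない": "Negation", "なく": "Negation", "なくて": "Negation", "ず": "Negation",
    "そうだ": "Conjecture", "ようだ": "Conjecture", "らしい": "Conjecture",
    "ている": "State", "てある": "State", "ておく": "State",
    "で": "Reason", "ので": "Reason", "ため": "Reason", "から": "Reason",
    "けど": "Contradiction", "けれど": "Contradiction", "が": "Contradiction",
    "のに": "Contradiction", "ても": "Concession",
}
# Longer markers first, so that e.g. なかった is preferred over た.
MARKERS_BY_LENGTH = sorted(CONTEXT_MARKERS, key=len, reverse=True)

def detect_context(sentence: str) -> list:
    """Return the relation signalled at the end of each clause, if any."""
    relations = []
    for clause in re.split(r"[、。]", sentence):
        for marker in MARKERS_BY_LENGTH:
            if clause.endswith(marker):
                relations.append(CONTEXT_MARKERS[marker])
                break
    return relations
```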
Through these 4 levels of text analysis, the proposed system extracts the main information from the original text. The keywords enter the next phase one by one as input, while the case grammar and context information are retained for generation.
Table 1: Kansei categories
喜 Happy 好 Fond
怒 Angry 厭 Disgusted
哀 Sad 昂 Excited
怖 Afraid 安 Relaxed
恥 Embarrassed 驚 Amazed
Table 2: Examples of custom grammar rules
Pattern (Type = -1) | Pattern (Type = 1)
AはB | A(sub.)のB(sub.)
Aへ・から・まで・たりB | AとB(v.)
A、<conj.>、B | A<aux.>B
Aと、B | A(dec.)B(sub.)
A/B: word or phrase; Type = -1: no modifying relation between A and B; Type = 1: A modifies B; ( ): POS of A/B; < >: auxiliary word; sub.: substantive word; dec.: declinable word
Table 3: Postposition and case type
が GA Case (Agent)
に NI Case (Patient, Location, Goal, Source, etc.)
を WO Case (Patient)
で DE Case (Agent, Instrument, Location, Cause, etc.)
も GA Case or WO Case (Agent or Patient)
と TO Case (Patient)
Table 4: Context information
た Past
なかった Past & Negation
ば・たら・なら Hypothesis
なければ・なかったら Hypothesis & Negation
ない・なく・なくて・ず Negation
そうだ・ようだ・らしい Conjecture
ている・てある・ておく State
で・ので・ため・から Reason
けど・けれど・が・のに Contradiction
ても Concession
2.3 Database and Retrieval
Classical Japanese poems, represented by Haiku and Waka, differ from modern English poems with their free stylistics: they use a language style that combines traditional Japanese grammar with literary vocabulary. The original text input by the user, however, follows modern Japanese grammar, which differs from Waka in many respects. Therefore, to generate poems closer to the classical Japanese style, we need a process that transforms ordinary modern Japanese words into traditional literary words. This requires a literary vocabulary database and a retrieval procedure.
Regarding the database structure, the Haiku creation support system Hitch-Haiku [20] proposed a well-formed design. It divides the verbal resources for Haiku composition into several sub-databases and assigns each one a weight according to its frequency and importance in Haiku. Since Waka and Haiku share much of their vocabulary, we reconstructed a database for Waka generation based on that of the Hitch-Haiku system. It consists of 5 sub-databases.
(1) Synonym and Association
Using synonyms as alternative expressions is a feasible way to enrich the ways a word can be expressed. Here, Japanese WordNet [21] provides synonyms, if any, for an input noun. Given the language style of Waka, synonyms written in katakana (loanwords) are discarded. For association, the Associative Concept Dictionary [22] provides various nouns associated with an input noun, called the stimulus. Only two types of association are used here: attribute concepts (adjectives describing the stimulus) and action concepts (verbs related to the stimulus). Because synonyms and associations have similar functions and matching methods, they are merged into one sub-database in the proposed system.
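Building the merged synonym-and-association entries for one noun can be sketched as follows. The input shapes are assumptions: a plain list of WordNet synonyms, and (word, concept_type) pairs whose type labels "attribute" and "action" are hypothetical names for the two concept types. The katakana test treats a word made only of characters in U+30A0 to U+30FF (which includes ー) as a loanword.

```python
import re
from typing import List, Tuple

KATAKANA_ONLY = re.compile(r"^[\u30A0-\u30FF]+$")
ALLOWED_ASSOCIATIONS = {"attribute", "action"}  # hypothetical labels for the two concept types

def synonym_association_entries(noun: str, synonyms: List[str],
                                associations: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge non-katakana synonyms with attribute/action associations for one stimulus noun."""
    entries = [(s, "synonym") for s in synonyms
               if s != noun and not KATAKANA_ONLY.match(s)]  # drop katakana loanwords
    entries += [(w, t) for w, t in associations if t in ALLOWED_ASSOCIATIONS]
    return entries
```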
(2) Kago
Kago is the vocabulary of Waka, which includes several types [23]. In the proposed system, however, the term refers specifically to the literary vocabulary of traditional Japanese, the core vocabulary for Waka generation. Unfortunately, no Kago database or digitized dictionary was available for direct import. We therefore typed the contents of a paper dictionary by K. Seryo [24] into our database for research use only. The matching method searches for the Kago corresponding to a given word or its kana form. The Kago category also contains a special type called Makura-Kotoba: fixed phrases, usually 5 kanas long, that modify words and enhance the sense of rhythm [23]. A Makura-Kotoba is mostly placed in the first phrase of a Waka, and different keywords are associated with different Makura-Kotoba.
Table 5 shows the design of the data table and an example record. The original dictionary contains no Kansei information, so we added it to the database manually according to the Emotional Expression Dictionary; the same applies to the other sub-databases. Currently, this sub-database contains 3,954 records.
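The Kago lookup by word or kana form can be sketched with an in-memory record list. The `KagoRecord` fields loosely mirror Table 5. The class and function names are hypothetical, and a real system would query the database instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class KagoRecord:
    original: str                                  # modern word, e.g. 太陽
    kana: str                                      # its reading, e.g. たいよう
    kago: List[str] = field(default_factory=list)  # literary equivalents
    kansei: Optional[str] = None                   # Kansei tag added manually

def find_kago(records: List[KagoRecord], word: str, kana: Optional[str] = None) -> List[str]:
    """Collect the Kago of every record that matches the word itself or its kana form."""
    found = []
    for r in records:
        if r.original == word or (kana is not None and r.kana == kana):
            found.extend(r.kago)
    return found
```

Matching on the kana form alone can pull in homophones (大洋 is also read たいよう), so the kana path trades precision for coverage.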
(3) Kigo
Kigo are seasonal words, such as names of animals, plants and human activities, that add a seasonal feeling to a poem; they originated in Haiku. Kigo fall into 5 categories: spring, summer, autumn, winter and New Year. Our database uses data from an online Kigo database [25] in the form shown in Table 6, and it contains 2,737 records. If the original text contains a season name or a word clearly related to a particular season (e.g., Sakura is a seasonal word for spring), the proposed system selects the corresponding Kigo by category.
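Selecting Kigo by the season cued in the text can be sketched as two lookups. `SEASON_CUES` is a small hypothetical table; only the 桜 to spring example comes from the text. Kigo records are (kigo, kana, season) tuples in the spirit of Table 6.

```python
from typing import Dict, List, Tuple

# Hypothetical season cues; the text's own example is 桜 (Sakura) -> spring.
SEASON_CUES: Dict[str, str] = {
    "春": "春", "桜": "春", "夏": "夏", "秋": "秋", "紅葉": "秋",
    "冬": "冬", "雪": "冬", "正月": "新年",
}

def select_kigo(keywords: List[str],
                kigo_records: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Return the Kigo records (kigo, kana, season) whose season is cued by the keywords."""
    seasons = {SEASON_CUES[k] for k in keywords if k in SEASON_CUES}
    return [rec for rec in kigo_records if rec[2] in seasons]
```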
(4) Idiom
Japanese has a very large number of idioms, many of which condense episodes with a clear logic and rich sentiment. Using idioms effectively in Waka makes the expressed meaning more profound and produces better literary effects. However, idioms vary in length. For this reason, we selected idioms with lengths above 14 kanas, whose minimal phrases are no longer than 7 kanas, from an open online idiom database [26] and imported them into our database. Since the explanation of each idiom in the original database is a paragraph of text, we manually extracted some representative words from each explanation. Table 7 shows the data table design. At present, the sub-database contains 1,399 records. For matching, the keywords in each sentence are compared with the explanation words of each idiom record through the following equation,
(2)
where h is the number of matched words and t is the total number of explanation words. A larger value of S_i means that the idiom matches the meaning of the original sentence better, and only idioms with S_i > 50 are retained.
Table 5: Examples of Kago sub-database
Original | Kana | Kago
太陽 (The sun) | たいよう | あまつひ・まめのひ【天日】 はくじつ【白日】 *たかひかる【高光る】
Type1 | Type2 | Type3 | Type4
自然 (Nature) | 天文・気候・地理 (Astronomy, Weather, Geography) | 日 (The sun) | /
Table 6: Example of Kigo sub-database (the first wind of New Year)
Kigo | Kana | Season | Type1 | Type2 | POS | Kansei
初風 (First wind) | はつかぜ | 新年 (New year) | 天文 (Astronomical) | 気候 (Climate) | 名詞 (n.) | 喜 (Happy)
Table 7: Data table design for Idiom sub-database
Idiom | Kana | Explanation | Kansei
朝の露 (Dew in the morning) | あしたのつゆ | 短い 儚い (Short, Ephemeral) | 悲 (Sad)
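The idiom filter can be sketched under one loudly stated assumption. Because the body of Eq. (2) is not reproduced here, the sketch takes S_i to be the percentage of explanation words matched, 100·h/t, which is at least consistent with the threshold of 50 and with the definitions of h and t. The function name and input shapes are hypothetical.

```python
from typing import List, Tuple

def match_idioms(keywords: List[str], idioms: List[Tuple[str, List[str]]],
                 threshold: float = 50) -> List[Tuple[str, float]]:
    """Keep the idioms whose explanation words are well covered by a sentence's keywords.

    ASSUMPTION: S_i = 100 * h / t, where h is the number of matched explanation words
    and t is the total number of explanation words (Eq. (2) itself is not shown).
    """
    present = set(keywords)
    kept = []
    for idiom, explanation in idioms:
        t = len(explanation)
        h = sum(1 for w in explanation if w in present)
        s = 100 * h / t if t else 0.0
        if s > threshold:  # strictly greater, as in "S_i > 50"
            kept.append((idiom, s))
    return kept
```

Under this reading, matching one of the two explanation words of 朝の露 gives exactly 50 and is rejected, while matching both gives 100 and is kept.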
(5) Onomatopoeia
Onomatopoeia is a class of words that imitate the sounds, movements or behaviors of things [27]. In Japanese, onomatopoeia is used in various situations to express rich meanings, but mostly in everyday language. It also appears in Waka, but usually in special expressions rather than common ones [23]. The proposed system therefore includes a bidirectional conversion between onomatopoeia and ordinary words. To realize it, we chose the Japanese Onomatopoeia Dictionary [27], which contains more than 2,700 items, as a resource and typed its items into the data table shown in Table 8. As in the idiom sub-database, the explanation of each item has been rewritten as discrete words. First, all records are classified into 2 types: Wago, whose etymology is traditional Japanese, and Kango, whose etymology is Chinese. When a Wago onomatopoeia appears in the original text, the proposed system converts it into its explanation words by direct matching. Conversely, suitable Kango onomatopoeia are found as materials for Waka generation by matching words in the original text against the explanation words in the records. The formula for this matching is
(3)
where h is the number of matched words and t is the total number of explanation words. A larger value of S_ok means that the Kango onomatopoeia matches the meaning of the original sentence better, and only those with S_ok > 30 are retained.
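Both directions of the onomatopoeia conversion are sketched below. Wago items are expanded into their explanation words by direct lookup. Kango items are scored against the sentence's words, where we again assume, since Eq. (3) is not reproduced, that S_ok is the percentage 100·h/t. The function names and input shapes are hypothetical.

```python
from typing import Dict, List, Tuple

def expand_wago(tokens: List[str], wago: Dict[str, List[str]]) -> List[str]:
    """Replace each Wago onomatopoeia token with its explanation words (direct matching)."""
    out = []
    for tok in tokens:
        out.extend(wago.get(tok, [tok]))
    return out

def find_kango(words: List[str], kango: List[Tuple[str, List[str]]],
               threshold: float = 30) -> List[Tuple[str, float]]:
    """Score Kango onomatopoeia by explanation-word overlap.

    ASSUMPTION: S_ok = 100 * h / t (Eq. (3) itself is not shown); keep S_ok > threshold.
    """
    present = set(words)
    hits = []
    for ono, explanation in kango:
        t = len(explanation)
        h = sum(1 for w in explanation if w in present)
        score = 100 * h / t if t else 0.0
        if score > threshold:
            hits.append((ono, score))
    return hits
```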
In the proposed system, any sub-database can be extended simply by importing new resources in the same predefined format. The more records the database contains, the richer the materials for generation, which may improve the quality of the generated poems. However, a larger database also makes the matching process more time-consuming.
2.4 Generation and Evaluation
Through the above 2 phases, the materials (the literary pieces) and the thread (the case grammar and context) are now ready. The last phase is an evolutionary loop that consists mainly of generation and evaluation by an interactive genetic algorithm. Its procedure is described as follows.
(1) Chromosome Design
A Waka poem consists of 5 phrases of 5, 7, 5, 7 and 7 kanas, respectively. Because the phrases are short, we require each phrase to contain one or two kernel words. In addition, depending on the kernel words, a prefix or suffix may modify them in Waka. The chromosome shown in Figure 3 is therefore designed to represent the primitive contents of a Waka poem.
In this chromosome, the 1st and 3rd phrases each use 4 genes, while each of the other phrases uses 6 genes. One additional gene indicates the use of a Makura-Kotoba. A chromosome thus consists of 27 integer-coded genes (4 + 6 + 4 + 6 + 6 + 1).
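The gene count can be checked, and a chromosome split into its parts, with a few lines of code. Figure 3's actual gene order is not reproduced here, so the sketch assumes the Makura-Kotoba gene comes first, followed by the 5 phrases in order. `split_chromosome` is a hypothetical helper.

```python
from typing import List, Tuple

PHRASE_LENGTHS = [5, 7, 5, 7, 7]    # kanas per phrase
GENES_PER_PHRASE = [4, 6, 4, 6, 6]  # 4 genes for the 5-kana phrases, 6 for the 7-kana ones
CHROMOSOME_LENGTH = 1 + sum(GENES_PER_PHRASE)  # +1 Makura-Kotoba gene -> 27

def split_chromosome(genes: List[int]) -> Tuple[int, List[List[int]]]:
    """Split a chromosome into its Makura-Kotoba gene and the genes of each phrase.

    Assumption: the Makura-Kotoba gene is first, then phrases 1-5 in order.
    """
    if len(genes) != CHROMOSOME_LENGTH:
        raise ValueError(f"expected {CHROMOSOME_LENGTH} genes, got {len(genes)}")
    phrases, pos = [], 1
    for n in GENES_PER_PHRASE:
        phrases.append(genes[pos:pos + n])
        pos += n
    return genes[0], phrases
```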
(2) Content Determination
Under the input restrictions above, the original text has at most 5 sentences. A Waka poem, however, is much shorter than the original text, so not all sentences can be used as content. The system therefore first decides at random which sentences to use (between 2 and 5 of them). When fewer than 5 sentences are used, it must also decide which phrases are assigned to each sentence. In Waka poetry, the first 3 phrases form the Kami-no-Ku and the last 2 form the Shimo-no-Ku, so the 3rd and 4th phrases cannot be assigned to the same sentence. In addition, when the 1st and 2nd phrases are assigned to the same sentence, the 1st phrase may be replaced by a Makura-Kotoba.
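The sentence selection and phrase assignment can be sketched as follows. Two assumptions are ours and are not stated in the text: the chosen sentences keep their original order, and each chosen sentence receives a contiguous block of phrases. The Kami-no-Ku/Shimo-no-Ku rule then becomes a mandatory sentence boundary before the 4th phrase.

```python
import random
from typing import List, Tuple

def assign_phrases(num_sentences: int, rng: random.Random) -> Tuple[List[int], bool]:
    """Pick 2-5 sentences (num_sentences >= 2) and map the 5 Waka phrases onto them.

    Returns the sentence index of every phrase and whether a Makura-Kotoba
    may replace the 1st phrase (phrases 1 and 2 share a sentence).
    """
    k = rng.randint(2, min(5, num_sentences))
    sentences = sorted(rng.sample(range(num_sentences), k))
    # A new sentence always starts at the 4th phrase (index 3), so that the
    # 3rd and 4th phrases never share a sentence; the other k - 2 starts are random.
    starts = {3} | set(rng.sample([1, 2, 4], k - 2))
    assignment, idx = [], 0
    for phrase in range(5):
        if phrase in starts:
            idx += 1
        assignment.append(sentences[idx])
    return assignment, assignment[0] == assignment[1]
```

Because exactly k - 1 sentence boundaries are drawn and one of them is fixed before the 4th phrase, every chosen sentence receives at least one phrase.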
Table 8: Examples of Onomatopoeia sub-database
Ono. | Type1 | Type2 | Type3 | Explanation | Kansei
ほかほか | 自然 (Nature) | 温度 (Temperature) | 暑い 温かい (Hot, Warm) | 体 食べ物 温かい (Body, Food, Warm) | 安 (Relaxed)
Ono. | Type1 | Kana / Explanation | Kansei
烈烈 | 漢語 (Kango) | れつれつ / 激しい勢い 盛ん (Intense, Vigorous) | 昂 (Excited)
Figure 3: Chromosome design for genetic algorithm
The next step is to determine the content of every individual. First, an initial population of n individuals is generated at random. In the chromosome, the Makura-Kotoba gene takes the value -1 (Makura-Kotoba not used) or a non-negative integer M (the serial number of an available Makura-Kotoba). Each kernel-word gene takes the value -1 (not enabled) or a positive integer K (the serial number of an available literary piece). Each prefix or suffix gene takes the value -1 (not enabled) or 1 (enabled). Every individual must also satisfy some basic restrictions. (a) A Makura-Kotoba and the 1st phrase cannot be enabled at the same time. If the Makura-Kotoba gene is M, all genes in the 1st phrase are regarded as -1. (b) If only one kernel word in a phrase is enabled, it must be the first one. (c) Given how often prefixes and suffixes appear in Waka poems, at most 2 prefixes and suffixes in total can be used in each phrase. If a generated result violates these restrictions, the offending part is regenerated until it complies.
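Random initialization with regeneration on violation can be sketched as follows. Because Figure 3 is not reproduced, the gene layout within a phrase is a hypothetical one: 2 kernel-word genes followed by the prefix/suffix genes. The `makura_allowed` flag ties in with content determination, where a Makura-Kotoba is only possible when phrases 1 and 2 share a sentence.

```python
import random
from typing import List

GENES_PER_PHRASE = [4, 6, 4, 6, 6]
N_KERNEL_GENES = 2  # hypothetical layout: 2 kernel-word genes, then prefix/suffix genes

def random_phrase(n_genes: int, n_pieces: int, rng: random.Random) -> List[int]:
    """Draw one phrase, redrawing until restrictions (b) and (c) hold."""
    while True:
        kernels = [rng.choice([-1, rng.randint(1, n_pieces)]) for _ in range(N_KERNEL_GENES)]
        affixes = [rng.choice([-1, 1]) for _ in range(n_genes - N_KERNEL_GENES)]
        # (b) plus the one-or-two-kernel rule: the first kernel gene must be enabled.
        if kernels[0] == -1:
            continue
        # (c): at most 2 prefixes/suffixes per phrase.
        if affixes.count(1) > 2:
            continue
        return kernels + affixes

def random_individual(n_makura: int, n_pieces: int, rng: random.Random,
                      makura_allowed: bool = True) -> List[int]:
    """Build a 27-gene individual: the Makura-Kotoba gene followed by the 5 phrases."""
    makura = -1
    if makura_allowed and n_makura > 0:
        makura = rng.choice([-1, rng.randrange(n_makura)])  # -1 or serial number M >= 0
    genes = [makura]
    for i, n in enumerate(GENES_PER_PHRASE):
        if i == 0 and makura != -1:
            genes += [-1] * n  # (a): the Makura-Kotoba replaces the 1st phrase
        else:
            genes += random_phrase(n, n_pieces, rng)
    return genes
```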
(3) Generation
At this point, each individual is still a sequence of numerical values. A series of processes is needed to convert it into a Waka poem that meets the stylistic requirements.
Type recognition of verbs. Modern and traditional Japanese grammar differ, and verb classification and inflection differ in most cases. The system must therefore determine which traditional-grammar type each verb belongs to. It obtains the required information by querying the online dictionary Weblio Kogo [28].
Determination of inflection. The inflection of a declinable word (verb, adjective, adjective-verb, auxiliary verb, etc.) is determined from both the POS of the following word and the meaning. Here, we consider 3 types of verb connection: Rentai (followed by a substantive word), Renyo (followed by a declinable word, or at the end of a phrase) and Shushi (at the end of the whole poem). The extracted context information also attaches a meaning to some declinable words, which we likewise classify into 3 types: Kako (past tense), Hitei (negation) and Futsu (others). Moreover, to make the meaning natural, one more verb form is considered: Izen-Katei, which expresses the order of actions in Waka. Table 9 shows the possible inflections in the system, using a Shimo-Nidan verb as an example.
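Choosing an inflection then reduces to computing the connection type and looking up the (meaning, connection) pair. The sketch hard-codes the Table 9 forms of 流れる. A real system would derive them from the verb type obtained from Weblio Kogo, which is not modelled here. Pairs that Table 9 does not list, such as Kako-Renyo, return None.

```python
from typing import Optional

# Forms of the Shimo-Nidan verb 流れる, transcribed from Table 9.
TABLE_9 = {
    ("Futsu", "Shushi"): "流る", ("Futsu", "Rentai"): "流れる", ("Futsu", "Renyo"): "流れ",
    ("Kako", "Shushi"): "流れき", ("Kako", "Rentai"): "流れし",
    ("Hitei", "Shushi"): "流れず", ("Hitei", "Rentai"): "流れぬ", ("Hitei", "Renyo"): "流れず",
}
IZEN_KATEI = "流るれば"

def connection(next_pos: Optional[str], end_of_poem: bool) -> str:
    """Connection type from what follows the verb (next_pos None = end of phrase)."""
    if end_of_poem:
        return "Shushi"
    if next_pos == "sub":  # a substantive word follows
        return "Rentai"
    return "Renyo"         # a declinable word follows, or the phrase ends

def inflect(meaning: str, next_pos: Optional[str], end_of_poem: bool = False,
            izen_katei: bool = False) -> Optional[str]:
    """Pick the form of 流れる; meaning is "Kako", "Hitei" or "Futsu"."""
    if izen_katei:
        return IZEN_KATEI
    return TABLE_9.get((meaning, connection(next_pos, end_of_poem)))
```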
Judgment of phrase length. Because literary pieces vary in length, the current materials may exceed the standard phrase length after inflection. Such individuals cannot produce appropriate Waka poems. The proposed system therefore tests every phrase of each individual. If an over-length phrase is found, the individual is tagged "F" (failed) and is excluded from the subsequent steps.
Adding fillers. For individuals without over-length phrases, fillers are added to bring each phrase to its standard length, according to the genes of the individual and the case grammar extracted earlier. The first step adds prefixes and suffixes where their genes are enabled. This is a rule-based process based on reference [24]. For certain words, a corresponding prefix or suffix is used, as in the examples shown in Table 10. If no corresponding rule exists, nothing is added. The second step adds postpositions after the kernel words according to the case each word takes in the case grammar. The available postpositions are listed in Table 11. If the length requirement is still not met, the interjection "よ" is added to a 1st or 3rd phrase that contains only a single noun, or an exclamation such as "かも" or "や" is added at the end of the poem.
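The length handling can be sketched in a few lines. The sketch adds postpositions by case, then "よ" or an end-of-poem exclamation only while the phrase is still short, and finally tags the individual "F", "S" or "P". Several simplifications are ours. Kernel words are given directly as kana. A lone kernel word is assumed to be a noun. The prefix/suffix rules of Table 10 are omitted. Small kana such as ょ are assumed not to count as separate units.

```python
from typing import List, Optional, Tuple

PHRASE_LENGTHS = [5, 7, 5, 7, 7]
# Case -> postposition from Table 11 (first listed form only).
POSTPOSITION = {"GA": "が", "WO": "を", "GA/WO": "も", "DE": "で",
                "NI": "に", "TO": "と", "NOUN-NOUN": "の"}
SMALL_KANA = set("ゃゅょぁぃぅぇぉ")  # assumption: merged with the preceding kana

def mora_count(kana: str) -> int:
    return sum(1 for ch in kana if ch not in SMALL_KANA)

def build_phrase(words: List[Tuple[str, Optional[str]]], index: int, is_last: bool) -> str:
    """words: (kana, case) pairs of one phrase's kernel words; returns the filled phrase."""
    text = "".join(kana + POSTPOSITION.get(case, "") for kana, case in words)
    target = PHRASE_LENGTHS[index]
    if index in (0, 2) and len(words) == 1 and mora_count(text) < target:
        text += "よ"  # interjection after a lone noun in the 1st or 3rd phrase
    gap = target - mora_count(text)
    if is_last and gap >= 2:
        text += "かも"  # exclamation at the end of the poem
    elif is_last and gap == 1:
        text += "や"
    return text

def tag_individual(phrases: List[str]) -> str:
    """'F' if any phrase is too long, 'P' if all phrases fit exactly, otherwise 'S'."""
    counts = [mora_count(p) for p in phrases]
    if any(c > t for c, t in zip(counts, PHRASE_LENGTHS)):
        return "F"
    return "P" if counts == PHRASE_LENGTHS else "S"
```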
Table 9: Examples of inflections of a Shimo-Nidan verb
Original 流れる
Futsu-Shushi 流る
Futsu-Rentai 流れる
Futsu-Renyo 流れ
Kako-Shushi 流れき
Kako-Rentai 流れし
Hitei-Shushi 流れず
Hitei-Rentai 流れぬ
Hitei-Renyo 流れず
Izen-Katei 流るれば
Table 10: Examples of prefix and suffix
Kernel Word | Prefix | Kernel Word | Suffix
明かり (Light) | 薄~ (Dim~) | 親 (Parent) | ~がかり (Depend on~)
紅 (Red) | 真~ (Dark~) | 雨 (Rain) | ~がち (~ continually)
Table 11: Postposition
Case Type | Postposition
GA Case | が
WO Case | を
GA Case or WO Case | も
DE Case | で
NI Case | に・にも
TO Case | と・とも
Modification between nouns | の
(4) Automatic Evaluation
Once the individuals' genes have been turned into basic poem forms, the quality of these poems must be evaluated. First, the computer performs automatic evaluation based on quantified indices. The quality of a poem is indicated by its fitness value, which is computed from the following aspects.
Length of phrase. Although over-length individuals have been screened out, some individuals may still contain phrases whose lengths do not meet the standard after this series of processes. Such individuals are tagged "S" (successfully generated) and can still be used for genetic operations. An individual whose phrases are all eligible is tagged "P" (perfect individual).
Balance of POS. An individual that contains too many kernel words of the same POS can hardly express a meaning completely and fluently. The proposed system therefore keeps only those individuals in which declinable words make up 33%-60% of the kernel words and declinable and substantive words together make up at least 60%. All other individuals are tagged "F".
Usage of each sub-database. To ensure that varied materials are used, fitness scores are assigned according to the number of pieces from each sub-database in an individual, as shown in Table 12.
Usage of fillers. Fillers make the expression more varied and the meaning of a poem more natural. The scoring rules are shown in Table 13.
Usage of Kansei information. To keep the emotional expression consistent while allowing natural variation, the proposed system counts the types and frequency of Kansei over all kernel words and scores them according to the rules shown in Table 14.
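The automatic fitness can be sketched by combining the POS-balance filter with the scores of Tables 12-14. Several interpretations are our assumptions. A Kansei "change" is taken to be a difference between consecutive tagged kernel words. A count of zero types or changes scores 0. The "same Kansei" bonus of 10 is applied per kernel word. The `Kernel` record is a hypothetical container.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Kernel:
    source: str                        # original, synonym, association, kago, kigo, idiom, kango_ono
    pos: str                           # "sub" (substantive), "dec" (declinable) or "other"
    kansei: Optional[str] = None       # Kansei tag of the literary piece, if any
    orig_kansei: Optional[str] = None  # Kansei of the matching word in the original sentence

PER_PIECE = {"original": 1, "synonym": 1, "association": 5, "kago": 10}        # Table 12
ONCE_OR_MORE = {"kigo": (15, -10), "idiom": (10, -30), "kango_ono": (15, 10)}  # Table 12, as listed

def pos_balanced(kernels: List[Kernel]) -> bool:
    """Declinable words 33%-60% of kernels; declinable + substantive at least 60%."""
    n = len(kernels)
    dec = sum(k.pos == "dec" for k in kernels) / n
    sub = sum(k.pos == "sub" for k in kernels) / n
    return 0.33 <= dec <= 0.60 and dec + sub >= 0.60

def by_count(n: int, one_or_two: int, three_plus: int) -> int:
    """Table 14 range score; 0 when n == 0 (assumption)."""
    return 0 if n == 0 else (one_or_two if n <= 2 else three_plus)

def fitness(kernels: List[Kernel], n_affix: int, n_postposition: int,
            has_interjection: bool, has_exclamation: bool) -> int:
    if not kernels or not pos_balanced(kernels):
        return 0  # tagged "F"
    counts = Counter(k.source for k in kernels)
    score = sum(PER_PIECE.get(src, 0) * c for src, c in counts.items())
    for src, (once, more) in ONCE_OR_MORE.items():
        if counts[src] == 1:
            score += once
        elif counts[src] > 1:
            score += more
    score += 5 * n_affix + 3 * n_postposition                      # Table 13
    score += 5 * int(has_interjection) + 3 * int(has_exclamation)  # Table 13
    score += 10 * sum(1 for k in kernels if k.kansei and k.kansei == k.orig_kansei)  # Table 14
    tags = [k.kansei for k in kernels if k.kansei]
    score += by_count(len(set(tags)), 20, -30)
    changes = sum(1 for x, y in zip(tags, tags[1:]) if x != y)
    score += by_count(changes, 10, -25)
    return score
```

In the example used for testing (one Kago, one original word, one Kigo and one association, 1 affix, 2 postpositions and an interjection), the pieces contribute 31, the fillers 16 and the Kansei rules 40, giving 87.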
After these aspects have been tallied, the fitness value of every individual is computed, and individuals tagged "F" are given a fitness of 0. However, if too many individuals are tagged "F", there will not be enough building blocks for genetic operations. Therefore, if more than (1-α%)∙n individuals are tagged "F", where n is the population size, the system repeats procedures (2)-(4) on these individuals until at least α%∙n individuals are tagged "S" or "P".
(5) Genetic Operations
After the initial poem population has been generated, the proposed system performs a series of genetic operations on it to obtain better poem individuals. (a) Selection. The α%∙n individuals with the highest fitness values in the current population are selected. (b) Crossover. These individuals are divided into α%∙(n/2) pairs for 2-point crossover, which selects 1-4 consecutive whole phrases and exchanges them between the two parents. The α%∙n best individuals are retained, and all remaining vacancies in the population are filled by crossover. (c) Mutation. Each kernel word in every individual mutates with probability pm, that is, it is replaced by another literary piece related to the same word or sentence in the original text. After these operations, the proposed system repeats the evolutionary loop of (4)-(5) over successive generations until enough individuals have fitness above a set threshold. The loop then stops, and the system proceeds to interactive evaluation.
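One generation of these operations can be sketched as follows. Several details are our assumptions and are not stated in the text. The Makura-Kotoba gene travels with the 1st phrase during crossover, which keeps restriction (a) intact. Kernel-word genes are taken to be the first 2 genes of each phrase. Mutation draws any random piece instead of one related to the same word or sentence. Only crossover offspring are mutated, so the elite survive unchanged. Parent pairs are sampled at random from the elite.

```python
import random
from typing import List, Sequence, Tuple

GENES_PER_PHRASE = [4, 6, 4, 6, 6]

def _phrase_spans() -> List[Tuple[int, int]]:
    """Gene slice of each phrase; the Makura-Kotoba gene (index 0) travels with phrase 1."""
    spans, start = [], 0
    for i, n in enumerate(GENES_PER_PHRASE):
        end = start + n + (1 if i == 0 else 0)
        spans.append((start, end))
        start = end
    return spans

SPANS = _phrase_spans()
# Hypothetical layout: the first 2 genes of each phrase (after the Makura gene) are kernel words.
KERNEL_POS = [s + (1 if i == 0 else 0) + k for i, (s, _) in enumerate(SPANS) for k in range(2)]

def phrase_crossover(a: List[int], b: List[int], rng: random.Random) -> Tuple[List[int], List[int]]:
    """2-point crossover that swaps 1-4 consecutive whole phrases between two parents."""
    first = rng.randrange(5)
    last = rng.randrange(first, min(first + 4, 5))
    lo, hi = SPANS[first][0], SPANS[last][1]
    return a[:lo] + b[lo:hi] + a[hi:], b[:lo] + a[lo:hi] + b[hi:]

def mutate(ind: List[int], pm: float, n_pieces: int, rng: random.Random) -> List[int]:
    """Replace each enabled kernel word with probability pm (here: any random piece)."""
    out = list(ind)
    for p in KERNEL_POS:
        if out[p] != -1 and rng.random() < pm:
            out[p] = rng.randint(1, n_pieces)
    return out

def next_generation(pop: Sequence[List[int]], fitness: Sequence[float], alpha: float,
                    pm: float, n_pieces: int, rng: random.Random) -> List[List[int]]:
    """Keep the alpha*n best individuals (alpha as a fraction) and refill with mutated offspring."""
    n = len(pop)
    n_elite = max(2, round(alpha * n))
    ranked = sorted(range(n), key=lambda i: fitness[i], reverse=True)
    elite = [list(pop[i]) for i in ranked[:n_elite]]
    children: List[List[int]] = []
    while len(elite) + len(children) < n:
        a, b = rng.sample(elite, 2)
        children.extend(mutate(c, pm, n_pieces, rng) for c in phrase_crossover(a, b, rng))
    return elite + children[: n - len(elite)]
```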
(6) Interactive Evaluation and Regeneration
In this process, human experimenters are asked to evaluate subjective items that are beyond the power of the computer. As listed in Table 15, these items are weighted according to their importance. The proposed system then sums the automatic and interactive scores of each individual and produces a new generation of poem individuals, mainly through genetic operations almost identical to those described above. This process also loops, ending when the number of generations reaches the set termination parameter m. Finally, the individual with the highest fitness value is selected as the final output, which completes one run of the system.
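Combining the two parts can be sketched with the Table 15 weights. The text does not say how an experimenter's judgment maps onto a weight, so we assume each experimenter rates each item on a 0-1 scale. We also assume that ratings are averaged over experimenters before weighting, and that the interactive score is simply added to the automatic fitness.

```python
from statistics import mean
from typing import Dict, List, Sequence

# Weights of the subjective items in Table 15.
WEIGHTS = {
    "grammar": 45, "meaning_consistence": 45, "logicality": 45,
    "accordance_with_text": 30, "rhythm_and_rhyme": 30,
    "artistic_conception": 60, "emotional_expression": 60,
}

def interactive_score(ratings: List[Dict[str, float]]) -> float:
    """Weighted sum of item ratings averaged over experimenters (assumed 0-1 scale)."""
    return sum(w * mean(r[item] for r in ratings) for item, w in WEIGHTS.items())

def final_choice(population: Sequence, auto_fitness: Sequence[float],
                 ratings: Sequence[List[Dict[str, float]]]):
    """Return the individual with the highest automatic + interactive score."""
    totals = [a + interactive_score(r) for a, r in zip(auto_fitness, ratings)]
    return population[max(range(len(population)), key=totals.__getitem__)]
```

Under these assumptions the interactive part contributes at most 315 points, so it can outweigh differences in automatic fitness.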
Table 12: Fitness score for usage of each sub-database
Original word or synonym 1 for each
Associative word 5 for each
Kago 10 for each
Kigo (only once) 15
Kigo (more than once) -10
Idiom (only once) 10
Idiom (more than once) -30
Kango-Onomatopoeia (only once) 15
Kango-Onomatopoeia (more than once) 10
Table 13: Fitness score for usage of fillers
Prefix or Suffix 5 for each
Postposition 3 for each
Interjection 5 if exists
Exclamation 3 if exists
Table 14: Fitness score for usage of Kansei information
Kansei of a kernel word is the same as that of the corresponding word in the original sentence 10
Number of Kansei types in poem (1-2) 20
Number of Kansei types in poem (≥3) -30
Number of Kansei Changes in poem (1-2) 10
Number of Kansei Changes in poem (≥3) -25
Table 15: Fitness score for interactive evaluation items
Grammar 45
Meaning Consistence 45
Logicality 45
Accordance with Text 30
Rhythm and Rhyme 30
Aesthetics: Artistic Conception 60
Aesthetics: Emotional Expression 60
3. EXPERIMENT
To examine whether the system and its approach can successfully generate valid Waka poems, we carried out generation experiments, whose details and results are described below.
The text used in the experiments is a paragraph from Kokoro, a novel written by Soseki Natsume in 1914 and made publicly available by Aozora Bunko [15], as shown in Figure 4. This descriptive text, 119 kana long, depicts a beach scene and conveys the narrator's disgust toward the crowd. Using this text, we ran the system several times and recorded the outputs. The parameters used in the experiments are shown in Table 16. The subjects in the interactive evaluation were 10 university students majoring in informatics.
From the generated results, we selected the two best poems, shown in Figure 5. Tables 17 and 18 give the index results of the experiments as normalized fitness scores. These scores indicate the performance of the proposed system, particularly through the scores of the best individuals. Table 17 shows the overall scores. Table 18 breaks them down into three items corresponding to the three required elements of poetry defined by Manurung: (a) meaningfulness, comprising meaning consistency, logicality and accordance with the text; (b) grammaticality, comprising only grammar; (c) poeticness, comprising rhythm and rhyme, artistic conception and emotional expression. The score of each item is normalized separately.
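The grouping of the Table 15 items into Manurung's three elements can be made concrete. Summing the weights gives the maximum attainable score of each element:

- meaningfulness: 45 + 45 + 30 = 120
- grammaticality: 45
- poeticness: 30 + 60 + 60 = 150

The total is 315. The Python sketch below assumes that each item is awarded between 0 and its weight. It also assumes that an element's normalized score is the points awarded divided by that maximum. The paper does not spell out its normalization, so this is our assumption, and the function name is hypothetical.

```python
from typing import Dict, List

# Item weights from Table 15
WEIGHTS: Dict[str, int] = {
    "Grammar": 45,
    "Meaning Consistency": 45,
    "Logicality": 45,
    "Accordance with Text": 30,
    "Rhythm and Rhyme": 30,
    "Aesthetics: Artistic Conception": 60,
    "Aesthetics: Emotional Expression": 60,
}

# Grouping into Manurung's three required elements, as listed in the text
GROUPS: Dict[str, List[str]] = {
    "(a) Meaningfulness": ["Meaning Consistency", "Logicality", "Accordance with Text"],
    "(b) Grammaticality": ["Grammar"],
    "(c) Poeticness": [
        "Rhythm and Rhyme",
        "Aesthetics: Artistic Conception",
        "Aesthetics: Emotional Expression",
    ],
}


def normalized_by_element(awarded: Dict[str, float]) -> Dict[str, float]:
    """Normalize awarded points per element by its maximum attainable score.

    ASSUMPTION: each item is scored in [0, weight]; missing items count as 0.
    """
    result: Dict[str, float] = {}
    for element, items in GROUPS.items():
        maximum = sum(WEIGHTS[item] for item in items)
        earned = sum(awarded.get(item, 0.0) for item in items)
        result[element] = earned / maximum
    return result
```

As an illustration only, awarding 45, 45 and 21 points to the three meaningfulness items yields 111 / 120 = 0.925 under this sketch.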
In light of these indices, the example poems in Figure 5 can be discussed with respect to the three items above.

(a) The overall meaning expressed by the poems is slightly vaguer than that of the original text. Nevertheless, the poems do reproduce concrete scenes related to those described in the text.

(b) In terms of stylistics and traditional Japanese grammar, the generated poems are fully valid.

(c) Regarding emotional expression, the text conveys a sense of disgust. The first example expresses an emotion similar to that of the text. In the second, apart from a few words, the overall emotion is not as unified as in the text. From an aesthetic viewpoint, vagueness of meaning leaves room for imagination, given the nature of Waka poetry. This corresponds to the beauty of connotation, an essential concept in Japanese poetry.

Admittedly, there is no widely recognized quantitative standard for evaluating poeticness, so the selection of the “best” individuals may be somewhat subjective. Further research is therefore needed on this point, such as collaborative studies with experts in Waka poetry. Their specialized knowledge could improve both the reliability of the interactive evaluation and the precision with which the evaluation items are defined.
Figure 4: Testing text
Table 16: Experiment parameters
n (Size of population) 200
pm (Probability of mutation) 5%
α% (Starting parameter) 15%
m (Termination parameter) 4
Figure 5: Examples of the best results of the experiments
Table 17: Index results of experiments (overall)
Experiment No. 1 2 3 Average
Best Individual 0.561 0.700 0.725 0.662
Worst Individual 0.470 0.521 0.563 0.518
Average Score 0.510 0.572 0.605 0.562
Table 18: Index results of experiments (itemized)
Experiment No. 1 2 3 Average
(a) Best Individual 0.925 0.950 0.950 0.942
(a) Average Score 0.624 0.685 0.638 0.649
(b) Best Individual 1.0 1.0 1.0 1.0
(b) Average Score 0.776 0.760 0.792 0.776
(c) Best Individual 0.920 0.920 0.840 0.890
(c) Average Score 0.675 0.729 0.647 0.684
A Text-based Automatic Waka Generation System using Kansei
4. CONCLUSION
In this paper, we proposed an automatic Waka generation system with a custom database that generates poems from text given by the user. The proposed system consists of 3 connected phases. These phases employ:

- leveled text analysis
- Kansei information processing
- a literary vocabulary database
- a genetic algorithm
- both automatic and interactive evaluation

Through a series of generation experiments, we found that the system can generate Waka poems that satisfy the stylistic and grammatical requirements of Waka poetry. The generated poems also carry meanings and emotions related to the original text and show a certain degree of poeticness.
In the future, we will mainly focus on refining the indices of Kansei information and expanding the capacity of the database to improve the generation results. We will also investigate the feasibility of applying the proposed approach to other genres of poetry.
REFERENCES
1. C.O. Hartman; Virtual Muse: Experiments in Computer Poetry, Wesleyan University Press, Middletown, USA, 1996.
2. C. Zhou, W. You and X.J. Ding; Genetic Algorithm
and Its Implementation of Automatic Generation of
Chinese SONGCI, Journal of Software, 21(3),
pp.427-437, 2010.
3. P. Gervás; An Expert System for the Composition of
Formal Spanish Poetry, Knowledge-Based Systems,
14(3), pp.181-188, 2001.
4. B. Díaz-Agudo, P. Gervás and P.A. González-Calero;
Poetry Generation in COLIBRI, Advances in Case-
Based Reasoning, Springer Berlin Heidelberg,
pp.73-87, 2002.
5. H.E. Gruber and S.N. Davis; Inching Our Way up
Mount Olympus: the Evolving-Systems Approach
to Creative Thinking, The Nature of Creativity,
Contemporary Psychological Perspectives, Cambridge
University Press, Cambridge, pp.243-269, 1988.
6. H. Manurung; An Evolutionary Algorithm Approach
to Poetry Generation, The University of Edinburgh,
Edinburgh, 2004.
7. H. Manurung, G. Ritchie and H. Thompson; Towards
a Computational Model of Poetry Generation, The
University of Edinburgh, Edinburgh, 2000.
8. M. Yamamoto and J. Rokui; Semantic Information
Extraction from Waka Poems Using Decision Rule
of Rough Set, IEICE Technical Report, 105(682),
pp.73-78, 2006.
9. M. Takeda, T. Fukuda, I. Nanri and M. Yamasaki;
Discovering Characteristic Patterns from Classical
Japanese Poem Database, Journal of Information
Processing Society of Japan, 40(3), pp.783-795,
1999.
10. R. Yoshioka; The Development of the Kigo-database
and the Outline of the “Haiku Entry and Appreciation
System”, National Institute for Educational Policy
Research of Japan, 2006.
11. M. Suzuki, G. Hattori, C. Ono, N. Takada and
N. Minagawa; Recommendation of Kigo for Support
to Create Haiku: Photo-Haiku Communication,
IEICE Technical Report, 111(273), pp.7-10, 2011.
12. T. Kudo; MeCab: Yet another Part-of-Speech and
Morphological Analyzer, 2005.
http://mecab.sourceforge.net/.
13. H. Nakagawa, A. Maeda and H. Kojima; Termex,
University of Tokyo, 2003.
http://gensen.dl.itc.u-tokyo.ac.jp/index.html.
14. J. Fukada; Chakoshi, Purdue University, 2008.
http://tell.cla.purdue.edu/chakoshi/public.html.
15. Aozora Bunko, http://www.aozora.gr.jp/.
16. Balanced Corpus of Contemporary Written Japanese,
Center for Corpus Development, NINJAL, 2009.
http://www.ninjal.ac.jp/corpus_center/bccwj/.
17. G. Salton, A. Wong and C.S. Yang; A Vector Space
Model for Automatic Indexing, Communications of
the ACM, 18(11), pp.613-620, 1975.
18. A. Nakamura; Emotional Expression Dictionary, Tokyodo Press, Tokyo, 1993.
19. M. Motoki, Y. Shimazu and N. Takahashi; Deep Case
Analysis Using a Layered Neural Network, Journal
of Information Processing Society of Japan, 36(11),
pp.2597-2610, 1995.
20. N. Tosa, H. Obara, M. Minoh and S. Matsuoka;
Hitch-Haiku: Japanese Haiku Poem Creation Support
System by Computer, Academic Center for Comput-
ing and Media Studies, Kyoto University, 62(2),
pp.247-255, 2008.
21. Japanese WordNet, National Institute of Information
and Communications Technology of Japan, 2009.
http://nlpwww.nict.go.jp/wn-ja/.
22. J. Okamoto and S. Isizaki; Evaluating a Method of
Extracting Important Sentences Using Distance between
Entries in an Associative Concept Dictionary, Journal
of Natural Language Processing, 10(5), pp.139-151,
2003.
23. Books Iituka; Introduction to Grammar of Tanka, Books Iituka, Tokyo, 1994.
24. K. Seryo; Dictionary for Looking up Traditional
Vocabulary by Modern Vocabulary, Sanseido Press,
Tokyo, 2007.
25. Kigo Database, International Research Center for
Japanese Studies, http://www.nichibun.ac.jp/graphic-
version/dbase/kigo/index.html.
26. T. Kurogo; Kurogo-Style Idiom Dictionary, 1999.
http://www.geocities.jp/tomomi965/index2.html.
27. M. Ono; Japanese Onomatopoeia Dictionary,
Shogakukan Press, Tokyo, 2007.
28. Weblio Kogo Dictionary, http://kobun.weblio.jp/.
29. S. Natsume; Kokoro, translated by E. McClellan, Gateway Editions, Washington, 1985.
Ming YANG
Ming Yang received his M.E. degree from Keio University, Japan, in 2015. His research concerns natural language processing and Kansei information. In particular, he is interested in Japanese language processing and its potential applications in literature.
Masafumi HAGIWARA (Member)
Masafumi Hagiwara received his B.E., M.E. and Ph.D. degrees in electrical engineering from Keio University, Yokohama, Japan, in 1982, 1984 and 1987, respectively. Since 1987 he has been with Keio University, where he is now a Professor. From 1991 to 1993, he was a visiting scholar at Stanford University. His awards include:

- the IEEE Consumer Electronics Society Chester Sall Award in 1990
- the Author Award from the Japan Society of Fuzzy Theory and Systems in 1996
- the Technical Award and Paper Awards from the Japan Society of Kansei Engineering in 2003, 2004 and 2014
- the Best Research Award from the Japanese Neural Network Society in 2013

He is a member of IEICE, IPSJ, JSAI, SOFT, IEE of Japan, the Japan Society of Kansei Engineering, JNNS and IEEE (Senior Member). His research interests include neural networks, fuzzy systems, and affective engineering. He is currently the president of the Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT).