
A Text-based Automatic Waka Generation System using Kansei


ORIGINAL ARTICLE

Special Issue on ISASE 2015

Ming YANG and Masafumi HAGIWARA

Graduate School of Science and Technology, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan

Received: 2015.06.01 / Accepted: 2015.10.29

J-STAGE Advance Published Date: 2015.12.22

International Journal of Affective Engineering Vol.15 No.2 (Special Issue) pp.125-134 (2016)

doi: 10.5057/ijae.IJAE-D-15-00030

Copyright © 2015 Japan Society of Kansei Engineering. All Rights Reserved.

Abstract: In this paper, we propose an automatic Waka generation system with a custom database, based on texts given by the user. The proposed text-based system is better suited to Waka poems and improves the consistency and logicality of the generated poems. Kansei information is also considered, to make the poems natural and closer to the emotions the user wants to express. Interactive generation experiments show that the proposed system can generate Waka poems that satisfy stylistic and grammatical requirements. The poems also carry meanings and emotions related to the original text, along with some poeticness.

Keywords: Waka Generation, Interactive Genetic Algorithm, Kansei Information

1. INTRODUCTION

In the 21st century, artificial intelligence (AI) is an active research field attracting rapidly increasing attention. Along with the development and specialization of its theories, its effects and applications have spread over a great number of areas. Beyond engineering and industry, AI has been connected with literature and the arts, including poetry, a popular literary form. AI is expected to compose poems as well as humans do. To realize this, researchers have conducted a series of trials, starting with the first computer-created German poem by T. Lutz in 1959 [1]. The most noticeable feature of a poem is its stylistics, which led to the pattern-based approach to automatic poem generation. The earliest method fills designed patterns with words; the RACTER and PROSE [2] systems are representative works. However, this kind of system has an inherent defect: the quality of the generated poems depends heavily on the pattern design, resulting in a lack of creative flexibility. Afterwards, an approach named CBR (Case-Based Reasoning), based on known instances, appeared, and the ASPERA [3] and COLIBRI [4] systems followed. This method can improve generation quality, but it is still constrained by its algorithm design and is difficult to optimize. Thus, the introduction of the rapidly developing genetic algorithm into poem generation was a creative step [5], and it is used in many studies, such as the MCGONAGALL [6] system by H. Manurung. This relatively mature research also defined the following 3 required elements of a poem: meaningfulness, grammaticality and poeticness.

As Manurung also mentioned in another paper [7], what makes it difficult to generate poems meeting the above 3 requirements is: (1) the difficulty of unifying grammar and semantics and handling their interdependency; (2) the very rich supply of resources needed to support generation; (3) the objective evaluation of output. Besides, human poem composition is a process that starts with emotions, which results in poems with rich emotional features. At present, this level is still out of reach for AI, let alone artistic value.

On the other hand, most existing research on poem generation is based on English poetry. Studies based on East Asian languages, such as Japanese and Chinese, are limited in number. For Japanese poetry in particular, although there have been many achievements in poetic analysis, text mining, database construction, computer-assisted composition [8-11], etc., research on automatic generation is insufficient. Because of its long tradition, Japanese poetry has a systematic theory of composition, including many rules and restrictions on stylistics and content. Besides, there is a large difference between poetic expression and ordinary expression in Japanese, which makes Japanese poetry much more difficult to generate automatically than English poetry, whose stylistics is relatively free.

Thus, to remedy the deficiencies of existing poem systems to some degree and to contribute to the field of Japanese poem generation, we propose, on the basis of previous computer-based research on Japanese poetry, a new automatic Waka generation system with the following distinctive features. (1) Text-based. Most existing poem generation systems start from one or more independent keywords, so the generated poems have weak internal semantic and logical relations [2]. By comparison, texts given by the user, used as content material for generation, can compensate for this weakness because of their stronger consistency. Correspondingly, we selected Waka, a traditional genre of Japanese poetry, as the research object rather than the popular Haiku, because Haiku is a concise genre that demands a high degree of summarization and connotative beauty. Since texts contain much more information than discrete keywords, it may be difficult to generate a highly correlated Haiku from a text. Besides, because of its greater length, Waka is more compatible with both narrative and descriptive texts. (2) Usage of a custom database. Although available databases of Japanese poems are scarce, there are enough paper dictionaries with the required contents. Thus, we digitized these dictionaries and combined them with several open databases on the Internet to construct a scalable database for Japanese poem generation, providing sufficient resources. (3) Introduction of a Kansei system. Since automatically generated poems are inherently weak in emotional expression, this research introduces an existing Kansei system into the generation and evaluation process to build a correspondence between the poem and the original text. This makes the generated poems more natural and closer to the emotions the user would like to express. (4) Usage of an interactive genetic algorithm. Because evaluation in the arts is subjective, it remains impossible for a computer to complete all the tasks, and subjective human judgment is still needed. As a solution, an interactive genetic algorithm is introduced in this research.

As for potential applications, since Waka is a concentrated expression of Japanese culture, emotions and aesthetics, this research could contribute to learning and analyzing these elements of Japanese thought in the field of cross-cultural communication. It may also contribute to the design of conversational AI by adding the ability to deal with emotional and cultural information.

Following this introduction, the proposed Waka generation system is explained in Chap. 2. In Chap. 3, the experimental results are shown, and Chap. 4 concludes this paper.

2. WAKA GENERATION SYSTEM

2.1 Outline

The proposed system consists of 3 connected phases. Figure 1 shows a flowchart of its procedures and data flow.

A run starts when the user inputs an original text into the proposed system. In the first phase, Text Analysis, the system extracts the main information from the text, including keywords, case grammar and context information. The extracted keywords are then assigned Kansei tags and become the input to the next phase, while the case grammar and context are stored for the generation process. In the second phase, Database and Retrieval, the system searches the database for literary vocabulary pieces whose meanings and Kansei tags are similar to those of the keywords, as materials for generating poems. The third phase, Generation and Evaluation, is based on an interactive genetic algorithm and is divided into an automatic part and an interactive part. In the automatic part, the proposed system combines the materials (the literary pieces) with the thread (the case grammar and context) according to the stylistic rules of Waka poetry to generate poems, and performs automatic evaluation. This evolutionary loop repeats until a sufficient number of eligible poems have been generated. In the interactive part, the system takes into account several items of subjective evaluation by human experimenters and continues to produce new generations until the set number of generations is reached. Finally, the run ends with one selected output Waka poem. In the following Sec. 2.2-2.4, the procedures and methods of the 3 phases are explained.

2.2 Text Analysis

As shown in Figure 2, text analysis is divided into 4 levels performed in sequence. So that the subsequent procedures run smoothly, some restrictions are defined for the input text: it should be written in modern Japanese plain form, with a length between 80 and 200 kana and at most 5 sentences.

Figure 1: System structure and data flow

Figure 2: Text analysis


(1) Morphological Analysis

The purpose is to separate a sentence into independent words and return them to dictionary form. The pronunciation and part of speech (POS) of each word are recognized at the same time. MeCab [12] is a representative, highly efficient tool for Japanese morphological analysis, and it is used directly in the system. Keyword extraction is performed on this basis. Termex [13] can extract keywords directly from a paragraph, but only nouns. To complement it, the proposed system also uses Chakoshi [14], an online corpus tool based on Aozora Bunko [15]. Compared with the Balanced Corpus of Contemporary Written Japanese [16], Aozora Bunko, whose main contents are literary works, is more suitable for the proposed system. Using this tool, the system can obtain the frequency of every word other than nouns in the whole of Aozora Bunko. By simplifying the TF-IDF [17] formula, we obtain the following formula,

(1)

If R > 1/1000, the word is regarded as a keyword. The final keyword set consists of the keywords selected by Termex together with those selected through Chakoshi. Afterwards, some of the keywords are assigned Kansei tags. The standard for determining the Kansei of words is the Emotional Expression Dictionary [18], in whose emotion system the Kansei of verbal expressions is divided into the 10 categories shown in Table 1.

(2) Syntactic Analysis

This analysis recognizes the hierarchical structure of a sentence, including phrase structure and dependency relationships. We first tried CaboCha, a dependency analyzer based on MeCab. However, although CaboCha is excellent at phrase structure analysis, its dependency recognition results did not meet our needs. Thus, based on the POS relationship between 2 words in Japanese, we designed a simple method to determine the dependency relationships in a sentence. Examples of the grammar rules considered are listed in Table 2. Since this procedure is modularized, the current method can be replaced if a more effective one becomes available.

(3) Semantic Analysis

This analysis recognizes the role of each substantive (noun, pronoun, etc.) in a sentence, which is also called case grammar. This research does not use a deep classification of case grammar; instead, cases are classified by the postposition that follows the substantive. Table 3 shows these categories. Deep case [19] analysis could extract more semantic information, but it requires an external corpus, which would make the system much more complicated.
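The postposition-based classification can be sketched as a simple lookup. The mapping below is copied from Table 3; the token format and the POS labels "substantive" and "postposition" are hypothetical names chosen for this illustration, not the output format of the authors' system.

```python
# Postposition-to-case lookup following Table 3 of the paper.
CASE_BY_POSTPOSITION = {
    "が": "GA",
    "に": "NI",
    "を": "WO",
    "で": "DE",
    "も": "GA or WO",
    "と": "TO",
}


def assign_cases(tokens):
    """Pair each substantive with the case of the postposition that follows it.

    `tokens` is a list of (surface, pos) tuples. The POS labels used here are
    assumptions of this sketch.
    """
    cases = []
    for (surface, pos), (next_surface, next_pos) in zip(tokens, tokens[1:]):
        if pos == "substantive" and next_pos == "postposition":
            case = CASE_BY_POSTPOSITION.get(next_surface)
            if case is not None:
                cases.append((surface, case))
    return cases


tokens = [("猫", "substantive"), ("が", "postposition"),
          ("魚", "substantive"), ("を", "postposition"),
          ("食べる", "declinable")]
print(assign_cases(tokens))
```

A flat lookup like this keeps the step free of any external corpus, which matches the stated reason for not performing deep case analysis.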

(4) Context Analysis

Context is the relation between sentences, including conjunction and correspondence. In Japanese NLP, there have been only a few achievements in context processing, and these are discussions of particular cases rather than general principles. This indicates that context information is quite difficult to extract and apply. However, some context information is expressed through certain markers, and these are the main objects we focus on to extract inter-sentence relation information, as shown in Table 4.
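One possible way to apply such markers is longest-suffix matching on a clause, so that なかった is read as "Past & Negation" rather than just "Past". The sketch below uses only a subset of Table 4 and works on raw strings. Both choices are simplifying assumptions, and a real system would more likely match on the output of the morphological analysis.

```python
# Clause-final markers and their context labels, a subset of Table 4.
CONTEXT_MARKERS = {
    "なかった": "Past & Negation",
    "た": "Past",
    "なければ": "Hypothesis & Negation",
    "なかったら": "Hypothesis & Negation",
    "たら": "Hypothesis",
    "なら": "Hypothesis",
    "ない": "Negation",
    "らしい": "Conjecture",
    "ている": "State",
    "ても": "Concession",
}


def context_of(clause):
    """Return the label of the longest marker that ends `clause`, or None."""
    for marker in sorted(CONTEXT_MARKERS, key=len, reverse=True):
        if clause.endswith(marker):
            return CONTEXT_MARKERS[marker]
    return None


print(context_of("雨が降らなかった"))
print(context_of("雨が降っても"))
```

Checking longer markers first matters because several entries in Table 4 are suffixes of others.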

Through the 4 levels of text analysis, the proposed system extracts the main information from the original text. The keywords enter the next phase one by one as input, while the case grammar and context information are reserved for generation.

Table 1: Kansei categories

喜 Happy 好 Fond

怒 Angry 厭 Disgusted

哀 Sad 昂 Excited

怖 Afraid 安 Relaxed

恥 Embarrassed 驚 Amazed

Table 2: Examples of custom grammar rules

Pattern (Type = -1) | Pattern (Type = 1)
AはB | A(sub.)のB(sub.)
Aへ・から・まで・たりB | AとB(v.)
A、<conj.>、B | A<aux.>B
Aと、B | A(dec.)B(sub.)

A/B: Word or phrase
Type = -1: No modifying relation between A and B
Type = 1: A modifies B
( ): POS of A/B
< >: Auxiliary word
Sub.: substantive word
Dec.: declinable word

Table 3: Postposition and case type

Postposition | Case type
が | GA case (Agent)
に | NI case (Patient, Location, Goal, Source, etc.)
を | WO case (Patient)
で | DE case (Agent, Instrument, Location, Cause, etc.)
も | GA case or WO case (Agent or Patient)
と | TO case (Patient)

Table 4: Context information

Marker | Context
た | Past
なかった | Past & Negation
ば・たら・なら | Hypothesis
なければ・なかったら | Hypothesis & Negation
ない・なく・なくて・ず | Negation
そうだ・ようだ・らしい | Conjecture
ている・てある・ておく | State
で・ので・ため・から | Reason
けど、けれど、が、のに | Contradiction
ても | Concession


2.3 Database and Retrieval

As is widely known, classical Japanese poems, represented by Haiku and Waka, differ from modern English poems with their free stylistics: they use a language style that combines traditional Japanese grammar with literary vocabulary. The original text input by the user, however, follows modern Japanese grammar and therefore differs greatly from Waka poems in many respects. To generate poems closer to the classical Japanese style, we need a process that transforms ordinary modern Japanese words into traditional literary words; a literary vocabulary database and a retrieval procedure are therefore necessary.

Regarding the design of the database structure, the research on Hitch-Haiku [20], a Haiku creation support system, has proposed a well-formed idea. It classified the verbal resources for Haiku composition into several sub-databases and assigned each one a weight according to its frequency and importance in Haiku poems. Since Waka and Haiku share much vocabulary, we reconstructed a database for Waka generation on the basis of the Hitch-Haiku database, comprising 5 sub-databases.

(1) Synonym and Association

Using synonyms as alternative expressions is a feasible way to enrich the variety of expression for a word. Here, Japanese WordNet [21] is used to provide synonyms for an input noun, if any exist. In consideration of the language style of Waka, synonyms written in Katakana (loanwords) are discarded. As for association, the Associative Concept Dictionary [22] is used to provide various associated words for an input noun, called the stimulus. In this part, only two types of association are used: attribute concepts (adjectives describing the stimulus) and action concepts (verbs related to the stimulus). Because synonyms and associations are similar in function and matching method, they are merged into one sub-database in the proposed system.

(2) Kago

Kago is the vocabulary of Waka and includes several types [23]. In the proposed system, however, it refers specifically to the literary vocabulary of traditional Japanese, which is the core vocabulary for Waka generation. Unfortunately, no Kago database or digitized dictionary is available for direct import. We therefore typed the contents of a paper dictionary by K. Seryo [24] into our database, for research use only. The matching method is to search for the Kago corresponding to a given word or its kana form. Furthermore, the Kago category contains a special type named Makura-Kotoba, a set of fixed phrases, usually 5 kana long, used to modify other words and enhance the sense of rhythm [23]. Makura-Kotoba is mostly placed in the first phrase of a Waka, and each keyword is associated with different ones.

The data table design and an example record are shown in Table 5. Kansei information does not exist in the original dictionary, so we added it to the database manually according to the Emotional Expression Dictionary; the same applies to the other sub-databases. At present, this sub-database contains 3,954 records.

(3) Kigo

Kigo are words for animals, plants, human activities, etc. that add a seasonal feeling to poems, originally in Haiku. Kigo fall into 5 categories: spring, summer, autumn, winter and new year. Our database uses data from an online Kigo database [25]; an example is shown in Table 6. It contains 2,737 records. If the original text contains a season name or a word obviously related to a certain season (e.g. Sakura is a seasonal word for spring), the proposed system selects the corresponding Kigo by category.

Table 5: Examples of Kago sub-database

Original | Kana | Kago
太陽 (The sun) | たいよう | あまつひ・まめのひ【天日】 はくじつ【白日】 *たかひかる【高光る】

Type1 | Type2 | Type3 | Type4
自然 (Nature) | 天文・気候・地理 (Astronomy, Weather, Geography) | 日 (The sun) | /

Table 6: Example of Kigo sub-database (the first wind of New Year)

Kigo | Kana | Season | Type1 | Type2 | POS | Kansei
初風 (First wind) | はつかぜ | 新年 (New year) | 天文 (Astronomical) | 気候 (Climate) | 名詞 (n.) | 喜 (Happy)

Table 7: Data table design for Idiom sub-database

Idiom | Kana | Explanation | Kansei
朝の露 (Dew in the morning) | あしたのつゆ | 短い 儚い (Temporal, Ephemeral) | 悲 (Sad)

(4) Idiom

Japanese has a large number of idioms, and many of them are condensed from episodes with logicality and rich sentiment. Effective use of idioms in Waka makes the expression of meaning more profound, resulting in better literary effects. However, idioms vary in length. For this reason, we selected those with a length above 14 kana whose minimal phrases are no longer than 7 kana from an open online idiom database [26] and imported them into our database. Since the explanation of each idiom in the original database is a paragraph of text, we manually extracted some representative words for the explanation. The data table design is shown in Table 7. At present, it contains 1,399 records. As for matching, the keywords contained in each sentence are matched against the explanation words of each idiom record, through the following equation,

(2)

where h is the number of matched words and t is the total number of explanation words. A larger value of Si means that the idiom matches the meaning of the original sentence better, and only idioms with Si > 50 are retained.
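A minimal sketch of this matching step follows. It assumes the score is the percentage of explanation words that are matched, S = 100 * h / t, which is consistent with the definitions of h and t and with a threshold of 50 but is not confirmed by the text. The function names and the dictionary layout are hypothetical.

```python
def match_score(keywords, explanation_words):
    """Percentage of explanation words that also occur among the keywords.

    Assumed form: S = 100 * h / t, where h is the number of matched words and
    t is the total number of explanation words.
    """
    if not explanation_words:
        return 0.0
    keyword_set = set(keywords)
    h = sum(1 for word in explanation_words if word in keyword_set)
    return 100.0 * h / len(explanation_words)


def select_idioms(keywords, idioms, threshold=50.0):
    """Keep the idioms whose score is strictly above the threshold."""
    return [name for name, words in idioms.items()
            if match_score(keywords, words) > threshold]


# Explanation words taken from the example record in Table 7.
idioms = {"朝の露": ["短い", "儚い"]}
print(select_idioms(["命", "短い", "儚い"], idioms))
print(select_idioms(["短い"], idioms))
```

Under this assumption, an idiom with two explanation words is kept only when both are matched, because a single match gives exactly 50, which does not exceed the threshold.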

(5) Onomatopoeia

Onomatopoeia is a collection of words that imitate the sounds, movements or behaviors of things [27]. In Japanese, onomatopoeia is used in various situations to express rich meanings, but mostly in daily language. It also appears in Waka, but usually in more specialized expressions rather than the common ones [23]. Thus, the proposed system includes a process that converts between onomatopoeia and ordinary words in both directions. To realize it, we chose the Japanese Onomatopoeia Dictionary [27], which includes more than 2,700 items, as the resource and typed the items into the data table shown in Table 8. As in the idiom sub-database, the explanation of each item has been rewritten as discrete words. First, all records are classified into 2 types: Wago, whose etymology is traditional Japanese, and Kango, whose etymology is Chinese. When a Wago onomatopoeia appears in the original text, the proposed system converts it to its explanation words by direct matching. Conversely, by matching words in the original text against the explanation words in the records, suitable Kango onomatopoeia are found as material for Waka generation. The formula for this matching is

(3)

where h is the number of matched words and t is the total number of explanation words. A larger value of Sok means that the Kango onomatopoeia matches the meaning of the original sentence better, and only those with Sok > 30 are retained.

In the proposed system, every sub-database can be extended simply by importing new resources in the same defined format. The more records the database contains, the richer the materials for generation, which may improve the quality of the generated poems. However, enlarging the database also increases the time consumed by the matching process.

2.4 Generation and Evaluation

Through the above 2 phases, both the materials (the literary pieces) and the thread (the case grammar and context) are ready. The last phase is an evolutionary loop consisting mainly of generation and evaluation, carried out by an interactive genetic algorithm. It is described step by step as follows.

(1) Chromosome Design

A Waka poem consists of 5 phrases with lengths of 5, 7, 5, 7 and 7 kana, respectively. Because these phrases are short, we imposed the restriction that each phrase must contain one or two kernel words. In addition, depending on the kernel word, a prefix and a suffix may be used to modify it in Waka. A chromosome is therefore designed to represent the primitive contents of a Waka poem, as shown in Figure 3.

In this chromosome, the 1st and 3rd phrases each use 4 genes, while each of the other phrases uses 6 genes. Furthermore, one gene indicates the use of Makura-Kotoba. Thus, a chromosome consists of 27 genes with integer coding.
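The gene count can be checked with a short sketch: 4 + 6 + 4 + 6 + 6 = 26 phrase genes plus 1 Makura-Kotoba gene gives 27. The flat list layout, with the Makura-Kotoba gene at index 0, is an assumption made for illustration, since the actual gene order is defined only in Figure 3.

```python
GENES_PER_PHRASE = (4, 6, 4, 6, 6)   # phrases 1..5, as stated in the text
MAKURA_GENES = 1                      # one gene for Makura-Kotoba
CHROMOSOME_LENGTH = MAKURA_GENES + sum(GENES_PER_PHRASE)


def phrase_slices():
    """Return one slice per phrase into a flat chromosome list.

    Placing the Makura-Kotoba gene at index 0 is an assumption of this sketch.
    """
    slices, start = [], MAKURA_GENES
    for count in GENES_PER_PHRASE:
        slices.append(slice(start, start + count))
        start += count
    return slices


def empty_chromosome():
    """All genes disabled, using -1 as the 'not used' value."""
    return [-1] * CHROMOSOME_LENGTH


print(CHROMOSOME_LENGTH)
print(phrase_slices())
```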

(2) Content Determination

According to the above restrictions, the original text has at most 5 sentences. However, since a Waka poem is much shorter than the original text, it is often not possible to use all the sentences as content. Therefore, the system first decides randomly which sentences, from 2 to 5 in number, will be used. When the number is less than 5, it must also be determined which phrases are assigned to each sentence. In Waka poetry, the first 3 phrases belong to Kami-no-Ku and the last 2 phrases belong to Shimo-no-Ku, which means that the 3rd and 4th phrases cannot be assigned to the same sentence. Besides, when the 1st and 2nd phrases are assigned to the same sentence, the 1st may be replaced by Makura-Kotoba.
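The phrase assignment can be sketched as choosing boundaries between consecutive phrases, with the boundary after the 3rd phrase always present. The assumption that each sentence receives a contiguous run of phrases in order is ours; the text states only the Kami-no-Ku / Shimo-no-Ku restriction.

```python
import random


def assign_phrases(num_sentences, rng=random):
    """Split phrases 1..5 into `num_sentences` contiguous groups.

    A boundary after phrase 3 is always present, so the 3rd and 4th phrases
    never share a sentence. Contiguous groups are an assumption of this sketch.
    """
    if not 2 <= num_sentences <= 5:
        raise ValueError("num_sentences must be between 2 and 5")
    optional = [1, 2, 4]                  # boundaries after phrases 1, 2 and 4
    extra = rng.sample(optional, num_sentences - 2)
    boundaries = sorted([3] + extra)
    groups, start = [], 1
    for end in boundaries + [5]:
        groups.append(list(range(start, end + 1)))
        start = end + 1
    return groups


print(assign_phrases(2))
print(assign_phrases(4, random.Random(0)))
```

With 2 sentences the split is forced to phrases 1-3 and 4-5; with 5 sentences every phrase gets its own sentence.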

Table 8: Examples of Onomatopoeia sub-database

Ono. | Type1 | Type2 | Type3 | Explanation | Kansei
ほかほか | 自然 (Nature) | 温度 (Temperature) | 暑い温かい (Hot, Warm) | 体 食べ物 温かい (Body, Food, Warm) | 安 (Relaxed)

Ono. | Type1 | Kana | / | Explanation | Kansei
烈烈 | 漢語 (Kango) | れつれつ | / | 激しい勢い 盛ん (Intense, Vigorous) | 昂 (Excited)

Figure 3: Chromosome design for genetic algorithm


The next step is to determine the content of every individual. First, an initial population of n individuals is generated randomly. In the chromosome, the gene representing Makura-Kotoba can take the value -1 (Makura-Kotoba is not used) or a non-negative integer M (the serial number of an available Makura-Kotoba). The genes representing kernel words can take the value -1 (not enabled) or a positive integer K (the serial number of an available literary piece). The genes indicating the use of a prefix or suffix can take the value -1 (not enabled) or 1 (enabled). However, every individual must satisfy some basic restrictions. (a) Makura-Kotoba and the 1st phrase cannot be enabled at the same time: if the gene for Makura-Kotoba is M, all genes in the 1st phrase are regarded as -1. (b) In each phrase, if only one kernel word is enabled, it must be the first one. (c) Considering the frequency of prefixes and suffixes in Waka poems, at most 2 prefixes and suffixes can be used in each phrase. If a generated result does not meet these requirements, the offending part is regenerated until it does.

(3) Generation

At this point, every individual is still a set of numerical values and requires a series of processes to be converted into a Waka poem that meets the stylistic requirements.

Type recognition of verbs. Because of the discrepancy between modern and traditional Japanese grammar, verb classification and inflection differ in most respects. Thus, it must be determined which type a verb belongs to in traditional grammar. By querying the online dictionary Weblio Kogo [28], the proposed system obtains the required information.

Determination of inflection. The inflection of a declinable word (verb, adjective, adjective-verb, auxiliary verb, etc.) should be determined by considering both the POS of the following word and the meaning. Here, we consider 3 types of verb connection: Rentai (followed by a substantive word), Renyo (followed by a declinable word, or at the end of a phrase), and Shushi (at the end of the whole poem). On the other hand, the extracted context information attaches meaning to some declinable words. Accordingly, we also classify them into 3 types: Kako (past tense), Hitei (negation) and Futsu (others). Moreover, to make the meaning natural, one more type for verbs, Izen-Katei, which expresses the order of actions in Waka, is also considered. Table 9 shows the possible inflections in the system, using a Simo-Nidan verb as an example.
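The combination of meaning type and connection type can be sketched as a table lookup. The forms below are copied from Table 9 for the verb 流れる; the function names and the POS label "substantive" are hypothetical, and the sketch covers only this one verb rather than a general inflection engine.

```python
# Inflected forms of the Simo-Nidan verb 流れる, copied from Table 9.
NAGARERU_FORMS = {
    ("Futsu", "Shushi"): "流る",
    ("Futsu", "Rentai"): "流れる",
    ("Futsu", "Renyo"): "流れ",
    ("Kako", "Shushi"): "流れき",
    ("Kako", "Rentai"): "流れし",
    ("Hitei", "Shushi"): "流れず",
    ("Hitei", "Rentai"): "流れぬ",
    ("Hitei", "Renyo"): "流れず",
}


def connection_type(next_pos, is_poem_end):
    """Pick the connection type from the position and the following POS."""
    if is_poem_end:
        return "Shushi"
    if next_pos == "substantive":
        return "Rentai"
    return "Renyo"   # followed by a declinable word, or at the end of a phrase


def inflect(meaning, next_pos=None, is_poem_end=False):
    """Return the listed form, or None if Table 9 has no such combination."""
    key = (meaning, connection_type(next_pos, is_poem_end))
    return NAGARERU_FORMS.get(key)


print(inflect("Kako", next_pos="substantive"))
print(inflect("Hitei", is_poem_end=True))
```

Returning None for an unlisted combination such as Kako with Renyo leaves the decision to the caller instead of guessing a form.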

Judgment of phrase length. Since literary pieces differ in length, the current materials may exceed the standard phrase length after inflection. Such individuals cannot generate appropriate Waka poems. The proposed system tests every phrase of each individual. If an over-length phrase is found, the individual is tagged as “F”, meaning that it is a failed individual that will not be processed in the subsequent steps.

Adding fillers. For individuals without over-length phrases, fillers are added so that the phrases reach the standard length, according to the genes of the individual and the previously extracted case grammar. The first step is to add prefixes and suffixes, depending on whether the corresponding genes are enabled. This is a rule-based process following reference [24]. For certain words, a corresponding prefix or suffix is used, as in the examples shown in Table 10. If no corresponding rule exists, nothing is added. The second step is to add postpositions after kernel words according to the case, given by the case grammar, to which each word belongs. The available postpositions are listed in Table 11. If the length requirement is still not met, the interjection “よ” is used in a 1st or 3rd phrase that contains only a single noun, or an exclamation such as “かも” or “や” is used at the end of the poem.

Table 9: Examples of inflections of a Simo-Nidan verb

Original 流れる Kako-Rentai 流れし

Futsu-Shushi 流る Hitei-Shushi 流れず

Futsu-Rentai 流れる Hitei-Rentai 流れぬ

Futsu-Renyo 流れ Hitei-Renyo 流れず

Kako-Shushi 流れき Izen-Katei 流るれば

Table 10: Examples of prefix and suffix

Kernel Word Prefix Kernel Word Suffix

明かり(Light) 薄~(Dim~) 親(Parent) ~がかり(Depend on~)

紅(Red) 真~(Dark~) 雨(Rain) ~がち(~ continually)

Table 11: Postposition

Case Type Postposition

GA Case が

WO Case を

GA Case or WO Case も

DE Case で

NI Case に・にも

TO Case と・とも

Modification between nouns の

(4) Automatic Evaluation

After fundamentally formed poems are generated from the individual genes, their quality should be evaluated. First, automatic evaluation by computer is performed based on quantified indices. The quality of a poem is indicated by its fitness value, which is computed from the following aspects.

Length of phrase. Although over-length individuals are screened out, there may still be some individuals with a phrase whose length does not meet the standard after this series of processing. However, such individuals, tagged as “S” (successfully generated), can still be used for genetic operations. An individual whose phrases are all eligible is tagged as “P”, representing a perfect individual.
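The three length tags can be summarized in one small function. This is a minimal sketch under two assumptions: the standard form is taken to be 5-7-5-7-7 kana, and each phrase is given as a kana string whose length is its character count. The name `length_tag` is hypothetical.

```python
# Assumed standard Waka form: five phrases of 5-7-5-7-7 kana.
STANDARD_LENGTHS = (5, 7, 5, 7, 7)


def length_tag(phrases, standard=STANDARD_LENGTHS):
    """Return "F", "S" or "P" for an individual with exactly five phrases.

    "F": at least one phrase is over-length.
    "P": every phrase has exactly the standard length.
    "S": no phrase is over-length, but at least one is too short.
    """
    lengths = [len(p) for p in phrases]
    if any(n > limit for n, limit in zip(lengths, standard)):
        return "F"
    if all(n == limit for n, limit in zip(lengths, standard)):
        return "P"
    return "S"
```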

Balance of POS. If an individual contains too many kernel words of the same POS, it can hardly express its meaning completely and fluently. Thus, the proposed system selects only individuals in which the percentage of declinable words is in the range of 33%-60% and declinable and substantive words together account for at least 60%. The others are tagged as “F”.
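The POS balance rule can be written as a simple predicate. In this sketch the percentages are assumed to be taken over the kernel words of one individual, and the labels "declinable", "substantive" and "other" as well as the function name are hypothetical. Integer arithmetic is used to avoid rounding problems at the 33% and 60% boundaries.

```python
def passes_pos_balance(pos_tags):
    """Check the POS balance rule for the kernel words of one individual.

    pos_tags is a list of labels such as "declinable", "substantive" or "other".
    Returns False for an empty list.
    """
    total = len(pos_tags)
    if total == 0:
        return False
    n_declinable = pos_tags.count("declinable")
    n_substantive = pos_tags.count("substantive")
    # Declinable words must make up 33%-60% of the kernel words.
    declinable_ok = 33 * total <= 100 * n_declinable <= 60 * total
    # Declinable and substantive words together must reach at least 60%.
    combined_ok = 100 * (n_declinable + n_substantive) >= 60 * total
    return declinable_ok and combined_ok
```

An individual for which this predicate is false would be tagged as “F”.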

Usage of each sub-database. To ensure that various materials are used, a fitness score is assigned according to the number of pieces from each sub-database in an individual, as shown in Table 12.

Usage of fillers. Fillers make the expression more varied and lead to a more natural meaning of the poem. The score rules are shown in Table 13.
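The score rules of Tables 12 and 13 can be sketched as two functions. The key names and function names are hypothetical, and it is assumed that a sub-database or filler that is not used contributes 0.

```python
def database_score(counts):
    """Score sub-database usage according to Table 12.

    counts maps a sub-database name (hypothetical keys) to the number of
    pieces from it in one individual.
    """
    score = 1 * counts.get("original_or_synonym", 0)
    score += 5 * counts.get("associative", 0)
    score += 10 * counts.get("kago", 0)
    # (score when used exactly once, score when used more than once)
    once_or_more = {
        "kigo": (15, -10),
        "idiom": (10, -30),
        "kango_onomatopoeia": (15, 10),
    }
    for name, (once, more) in once_or_more.items():
        n = counts.get(name, 0)
        if n == 1:
            score += once
        elif n > 1:
            score += more
    return score


def filler_score(n_affixes, n_postpositions, has_interjection, has_exclamation):
    """Score filler usage according to Table 13."""
    score = 5 * n_affixes + 3 * n_postpositions
    if has_interjection:
        score += 5
    if has_exclamation:
        score += 3
    return score


example = {"original_or_synonym": 2, "associative": 1, "kago": 1, "kigo": 1, "idiom": 2}
print(database_score(example))            # 2 + 5 + 10 + 15 - 30 = 2
print(filler_score(1, 3, True, False))    # 5 + 9 + 5 = 19
```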

Usage of Kansei information. To keep the emotional expression consistent while allowing natural variation, the proposed system counts the types and frequency of each kind of Kansei over all kernel words and scores them according to the rules shown in Table 14.
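One possible reading of Table 14 is sketched below. Table 14 leaves several details open, so the following are assumptions: the 10 points are given for each kernel word whose Kansei matches that of its source word, a "change" is counted whenever two consecutive kernel words carrying Kansei have different labels, and zero types or zero changes score 0. The labels used in the example are placeholders.

```python
def kansei_score(poem_kansei, source_kansei):
    """Score Kansei usage under one assumed reading of Table 14.

    poem_kansei: Kansei label of each kernel word in poem order (None if none).
    source_kansei: Kansei label of the corresponding word in the original text.
    """
    # 10 points per kernel word that keeps the Kansei of its source word.
    score = 10 * sum(
        1 for p, s in zip(poem_kansei, source_kansei) if p is not None and p == s
    )
    labels = [k for k in poem_kansei if k is not None]
    n_types = len(set(labels))
    if 1 <= n_types <= 2:
        score += 20
    elif n_types >= 3:
        score -= 30
    # A change is a pair of consecutive labels that differ.
    n_changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    if 1 <= n_changes <= 2:
        score += 10
    elif n_changes >= 3:
        score -= 25
    return score


poem = ["disgust", "disgust", "joy", None]
source = ["disgust", "joy", "joy", None]
print(kansei_score(poem, source))  # 20 (matches) + 20 (2 types) + 10 (1 change) = 50
```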

After these aspects are tallied, the fitness value of every individual is computed. The fitness of individuals with tag “F” is set to 0. However, if the filtering leaves too many individuals with tag “F”, there will not be enough building blocks for genetic operations. Thus, if there are more than (1-α%)∙n “F” individuals, where n is the size of the population, the system repeats procedures (2)-(4) on these individuals until the number of “S” and “P” individuals is at least α%∙n.
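As a worked example with the values of Table 16 (n = 200, α% = 15%), α%∙n = 30 and (1-α%)∙n = 170, so regeneration is triggered when more than 170 individuals are tagged “F”. The check can be sketched as follows; the function names are hypothetical and α is passed as an integer percentage to keep the comparison exact.

```python
def needs_regeneration(tags, alpha_percent):
    """True if more than (1 - alpha%) * n individuals are tagged "F"."""
    n = len(tags)
    return 100 * tags.count("F") > (100 - alpha_percent) * n


def usable_count(tags):
    """Number of individuals tagged "S" or "P"."""
    return sum(1 for t in tags if t in ("S", "P"))


tags = ["F"] * 171 + ["S"] * 29
print(needs_regeneration(tags, 15))  # True: 171 > 170
print(usable_count(tags))            # 29, below the required 30
```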

(5) Genetic Operations

After the primary poem population is generated, the proposed system performs a series of genetic operations on it to obtain better poem individuals. (a) Selection. In the current population, the α%∙n individuals with the highest fitness values are selected. (b) Crossover. These individuals are divided into α%∙(n/2) pairs to perform 2-point crossover, which selects 1-4 consecutive whole phrases and exchanges them between the two individuals. Apart from the α%∙n optimal individuals, which are retained, all the other vacant positions are filled by individuals generated by crossover. (c) Mutation. For any kernel word in every individual, mutation occurs with probability pm. Mutation replaces the kernel word with another literary piece related to the same word or sentence in the original text. After these operations, the proposed system runs the evolutionary loop of (4)-(5) to produce successive generations of populations until enough individuals have fitness higher than a set threshold. Then the loop stops, and the system turns to the interactive evaluation.
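The three operations can be sketched on a simplified representation in which an individual is a list of five phrases, each holding a single kernel word. This representation, the function names, and the `alternatives` mapping are assumptions made for illustration; the real individuals also carry genes for fillers.

```python
import random


def select(population, fitness, k):
    """Return the k individuals with the highest fitness values."""
    return sorted(population, key=fitness, reverse=True)[:k]


def crossover(parent_a, parent_b, rng):
    """2-point crossover: exchange 1-4 consecutive whole phrases."""
    n = len(parent_a)                  # 5 phrases per individual
    width = rng.randint(1, n - 1)      # number of phrases to exchange
    start = rng.randint(0, n - width)  # index of the first exchanged phrase
    end = start + width
    child_a = parent_a[:start] + parent_b[start:end] + parent_a[end:]
    child_b = parent_b[:start] + parent_a[start:end] + parent_b[end:]
    return child_a, child_b


def mutate(individual, alternatives, pm, rng):
    """Replace each kernel word with probability pm, if an alternative exists.

    alternatives maps a kernel word to other literary pieces related to the
    same source word or sentence (hypothetical structure).
    """
    result = []
    for word in individual:
        options = alternatives.get(word)
        if options and rng.random() < pm:
            result.append(rng.choice(options))
        else:
            result.append(word)
    return result
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing parameter settings.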

(6) Interactive Evaluation and Regeneration

In this process, human experimenters are asked to evaluate some subjective items that are beyond the power of the computer. As listed in Table 15, these items are assigned different weights according to their importance. After this evaluation, the proposed system sums the scores of the automatic part and the interactive part for each individual, and regenerates a new generation of poem individuals, mainly through genetic operations that are almost the same as those performed before. This process is also a loop and ends when the number of generations reaches the set termination parameter m. Finally, the poem individual with the highest fitness value is selected as the final output. This completes an entire run of the system.
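The combination of the two score parts can be sketched as follows, using the weights of Table 15. How the evaluators' judgments are mapped to scores is not specified here, so this sketch assumes that each item receives a rating between 0 and 1 that is multiplied by its weight. The key names and function names are hypothetical.

```python
# Weights of the interactive evaluation items, from Table 15.
WEIGHTS = {
    "grammar": 45,
    "meaning_consistence": 45,
    "logicality": 45,
    "accordance_with_text": 30,
    "rhythm_and_rhyme": 30,
    "artistic_conception": 60,
    "emotional_expression": 60,
}


def interactive_score(ratings):
    """Weighted sum of the item ratings (each assumed to lie in [0, 1])."""
    return sum(WEIGHTS[item] * ratings[item] for item in WEIGHTS)


def total_fitness(automatic_score, ratings):
    """Sum of the automatic part and the interactive part."""
    return automatic_score + interactive_score(ratings)


full = {item: 1.0 for item in WEIGHTS}
print(interactive_score(full))  # 315.0, the sum of all weights
```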

Table 12: Fitness score for usage of each sub-database

Original word or synonym 1 for each

Associative word 5 for each

Kago 10 for each

Kigo (only once) 15

Kigo (more than once) -10

Idiom (only once) 10

Idiom (more than once) -30

Kango-Onomatopoeia (only once) 15

Kango-Onomatopoeia (more than once) 10

Table 13: Fitness score for usage of fillers

Prefix or Suffix 5 for each Interjection 5 if exists

Postposition 3 for each Exclamation 3 if exists

Table 14: Fitness score for usage of Kansei information

Kansei of a kernel word is the same as that of the corresponding word in the original sentence 10

Number of Kansei types in poem (1-2) 20

Number of Kansei types in poem (≥3) -30

Number of Kansei Changes in poem (1-2) 10

Number of Kansei Changes in poem (≥3) -25

Table 15: Fitness score for interactive evaluation items

Grammar 45

Meaning Consistence 45

Logicality 45

Accordance with Text 30

Rhythm and Rhyme 30

Aesthetics: Artistic Conception 60

Aesthetics: Emotional Expression 60


3. EXPERIMENT

To examine whether the system and its approach can successfully generate eligible Waka poems, we carried out generation experiments, whose details and results are described below.

The text used in the experiments is a paragraph from Kokoro, a novel written by Soseki Natsume in 1914 and made publicly available by Aozora Bunko, as shown in Figure 4. This descriptive text, 119 kana in length, describes a beach scene and conveys the narrator’s disgust at the crowd. Using it, we ran the system several times and recorded the output. The parameters set in the experiments are shown in Table 16. For the interactive evaluation, the evaluators were 10 university students specializing in informatics.

From the generation results, we selected 2 of the best poems, shown in Figure 5. The index results of the experiments are shown in Tables 17 and 18, which indicate the performance of the proposed system by normalized fitness scores, especially the scores of the best individuals. Table 17 shows the overall scores, while Table 18 subdivides them into 3 items, the 3 required elements of a poem defined by Manurung: (a) Meaningfulness, including meaning consistence, logicality and accordance with text; (b) Grammaticality, including only the sub-item of grammar; (c) Poeticness, including rhythm and rhyme, artistic conception and emotional expression. The score of each item is normalized separately.
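The itemized normalization can be illustrated with the weights of Table 15 grouped into the 3 items. The normalization procedure is not spelled out in the text, so this sketch assumes that each item score is divided by the maximum attainable in its group (120, 45 and 150, respectively). The raw scores in the example are hypothetical values and are not taken from the experiments.

```python
# Table 15 weights grouped into the 3 items (a), (b) and (c).
GROUPS = {
    "meaningfulness": {"meaning_consistence": 45, "logicality": 45,
                       "accordance_with_text": 30},
    "grammaticality": {"grammar": 45},
    "poeticness": {"rhythm_and_rhyme": 30, "artistic_conception": 60,
                   "emotional_expression": 60},
}


def normalized_group_scores(raw):
    """Divide the raw score of each group by the group's maximum (assumed rule).

    raw maps each evaluation item to its raw score, between 0 and its weight.
    """
    result = {}
    for group, weights in GROUPS.items():
        maximum = sum(weights.values())
        result[group] = sum(raw[item] for item in weights) / maximum
    return result


# Hypothetical raw scores for one individual.
raw = {"meaning_consistence": 45, "logicality": 40, "accordance_with_text": 26,
       "grammar": 45, "rhythm_and_rhyme": 30, "artistic_conception": 54,
       "emotional_expression": 54}
print(normalized_group_scores(raw))
```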

Considering what the indices suggest, the example poems in Figure 5 can be discussed according to the above 3 items. (a) The general meaning expressed by the poems is slightly vaguer than that of the original text. However, the poems do reproduce concrete scenes related to those described in the text. (b) In terms of stylistics and traditional Japanese grammar, the generated poems are completely eligible. (c) In terms of emotional expression, the text suggests a kind of disgust. The first example has an emotion similar to that of the text, but in the second one, apart from some words, the general emotion is not unified as it is in the text. From the aesthetic point of view, because of the features of Waka poetry, vagueness of meaning leaves room for imagination, which corresponds to the beauty of connotation, an essential concept in Japanese poetry. Admittedly, there is no widely recognized quantitative standard for evaluating poeticness, so the evaluation used to select the “best” individuals may be somewhat subjective. Therefore, further research is required to improve on this point, such as cooperative studies with experts in Waka poetry. With specialized knowledge, both the reliability of the interactive evaluation and the precision of the definitions of the evaluation items could be improved.

Figure 4: Testing text

Table 16: Experiment parameters

n (Size of population) 200

pm (Probability of mutation) 5%

α% (Starting parameter) 15%

m (Termination parameter) 4

Figure 5: Examples of best results of experiment

Table 17: Index results of experiments (overall)

Experiment No. 1 2 3 Average

Best Individual 0.561 0.700 0.725 0.662

Worst Individual 0.470 0.521 0.563 0.518

Average Score 0.510 0.572 0.605 0.562

Table 18: Index results of experiments (itemized)

Experiment No. 1 2 3 Average

(a) Best Individual 0.925 0.950 0.950 0.942

(a) Average Score 0.624 0.685 0.638 0.649

(b) Best Individual 1.0 1.0 1.0 1.0

(b) Average Score 0.776 0.76 0.792 0.776

(c) Best Individual 0.920 0.920 0.840 0.890

(c) Average Score 0.675 0.729 0.647 0.684


4. CONCLUSION

In this paper, we proposed an automatic Waka generation system with a custom database, based on text given by the user. The proposed system has 3 connected phases, with methods including leveled text analysis, Kansei information processing, a literary vocabulary database, a genetic algorithm, and both automatic and interactive evaluation. Through a series of generation experiments, we found that this system can generate Waka poems that satisfy the stylistic and grammatical requirements of Waka poetry. These poems also have meanings and emotions related to the original text and, to some degree, poeticness.

In the future, we will mainly focus on refining the indices of Kansei information and expanding the capacity of the database to improve the generation results. In addition, the feasibility of applying the proposed approach to other genres of poetry will also be investigated.

REFERENCES

1. C.O. Hartman; Virtual Muse: Experiments in

Computer Poetry, Wesleyan University Press,

Middletown, USA, 1996.

2. C. Zhou, W. You and X.J. Ding; Genetic Algorithm

and Its Implementation of Automatic Generation of

Chinese SONGCI, Journal of Software, 21(3),

pp.427-437, 2010.

3. P. Gervás; An Expert System for the Composition of

Formal Spanish Poetry, Knowledge-Based Systems,

14(3), pp.181-188, 2001.

4. B. Díaz-Agudo, P. Gervás and P.A. González-Calero;

Poetry Generation in COLIBRI, Advances in Case-

Based Reasoning, Springer Berlin Heidelberg,

pp.73-87, 2002.

5. H.E. Gruber and S.N. Davis; Inching Our Way up

Mount Olympus: the Evolving-Systems Approach

to Creative Thinking, The Nature of Creativity,

Contemporary Psychological Perspectives, Cambridge

University Press, Cambridge, pp.243-269, 1988.

6. H. Manurung; An Evolutionary Algorithm Approach

to Poetry Generation, The University of Edinburgh,

Edinburgh, 2004.

7. H. Manurung, G. Ritchie and H. Thompson; Towards

a Computational Model of Poetry Generation, The

University of Edinburgh, Edinburgh, 2000.

8. M. Yamamoto and J. Rokui; Semantic Information

Extraction from Waka Poems Using Decision Rule

of Rough Set, IEICE Technical Report, 105(682),

pp.73-78, 2006.

9. M. Takeda, T. Fukuda, I. Nanri and M. Yamasaki;

Discovering Characteristic Patterns from Classical

Japanese Poem Database, Journal of Information

Processing Society of Japan, 40(3), pp.783-795,

1999.

10. R. Yoshioka; The Development of the Kigo-database

and the Outline of the “Haiku Entry and Appreciation

System”, National Institute for Educational Policy

Research of Japan, 2006.

11. M. Suzuki, G. Hattori, C. Ono, N. Takada and

N. Minagawa; Recommendation of Kigo for Support

to Create Haiku: Photo-Haiku Communication,

IEICE Technical Report, 111(273), pp.7-10, 2011.

12. T. Kudo; MeCab: Yet another Part-of-Speech and

Morphological Analyzer, 2005.

http://mecab.sourceforge.net/.

13. H. Nakagawa, A. Maeda and H. Kojima; Termex,

University of Tokyo, 2003.

http://gensen.dl.itc.u-tokyo.ac.jp/index.html.

14. J. Fukada; Chakoshi, Purdue University, 2008.

http://tell.cla.purdue.edu/chakoshi/public.html.

15. Aozora Bunko, http://www.aozora.gr.jp/.

16. Balanced Corpus of Contemporary Written Japanese,

Center for Corpus Development, NINJAL, 2009.

http://www.ninjal.ac.jp/corpus_center/bccwj/.

17. G. Salton, A. Wong and C.S. Yang; A Vector Space

Model for Automatic Indexing, Communications of

the ACM, 18(11), pp.613-620, 1975.

18. Akira Nakamura; Emotional Expression Dictionary,

Tokyodo Press, Tokyo, 1993.

19. M. Motoki, Y. Shimazu and N. Takahashi; Deep Case

Analysis Using a Layered Neural Network, Journal

of Information Processing Society of Japan, 36(11),

pp.2597-2610, 1995.

20. N. Tosa, H. Obara, M. Minoh and S. Matsuoka; Hitch-Haiku: Japanese Haiku Poem Creation Support System by Computer, Academic Center for Computing and Media Studies, Kyoto University, 62(2), pp.247-255, 2008.

21. Japanese WordNet, National Institute of Information

and Communications Technology of Japan, 2009.

http://nlpwww.nict.go.jp/wn-ja/.

22. J. Okamoto and S. Isizaki; Evaluating a Method of

Extracting Important Sentences Using Distance between

Entries in an Associative Concept Dictionary, Journal

of Natural Language Processing, 10(5), pp.139-151,

2003.

23. Books Iituka; Introduction to Grammar of Tanka, Books Iituka, Tokyo, 1994.


24. K. Seryo; Dictionary for Looking up Traditional

Vocabulary by Modern Vocabulary, Sanseido Press,

Tokyo, 2007.

25. Kigo Database, International Research Center for

Japanese Studies, http://www.nichibun.ac.jp/graphic-

version/dbase/kigo/index.html.

26. T. Kurogo; Kurogo-Style Idiom Dictionary, 1999.

http://www.geocities.jp/tomomi965/index2.html.

27. M. Ono; Japanese Onomatopoeia Dictionary,

Shogakukan Press, Tokyo, 2007.

28. Weblio Kogo Dictionary, http://kobun.weblio.jp/.

29. S. Natsume, E. McClellan; Translation of: Kokoro,

Gateway Editions, Washington, 1985.

Ming YANG

Ming Yang received his M.E. degree from Keio University, Japan, in 2015. His research is related to natural language processing and Kansei information. In particular, he is interested in Japanese language processing and its potential applications in literature.

Masafumi HAGIWARA (Member)

Masafumi Hagiwara received his B.E., M.E. and Ph.D. degrees in electrical engineering from Keio University, Yokohama, Japan, in 1982, 1984 and 1987, respectively. Since 1987 he has been with Keio University, where he is now a Professor. From 1991 to 1993, he was a visiting scholar at Stanford University. He received the IEEE Consumer Electronics Society Chester Sall Award in 1990, the Author Award from the Japan Society of Fuzzy Theory and Systems in 1996, the Technical Award and Paper Awards from the Japan Society of Kansei Engineering in 2003, 2004 and 2014, and the Best Research Award from the Japanese Neural Network Society in 2013. He is a member of IEICE, IPSJ, JSAI, SOFT, IEE of Japan, Japan Society of Kansei Engineering, JNNS and IEEE (Senior member). His research interests include neural networks, fuzzy systems, and affective engineering. He is now the president of the Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT).

