
Mapping Out Narrative Structures and Dynamics Using Networks and Textual Information

Semi Min and Juyong Park
Graduate School of Culture Technology, KAIST, Daejeon, Korea 34141

Human communication is often executed in the form of a narrative, an account of connected events composed of characters, actions, and settings. A coherent narrative structure is therefore a requisite for a well-formulated narrative, be it fictional or nonfictional, for informative and effective communication, opening up the possibility of a deeper understanding of a narrative by studying its structural properties. In this paper we present a network-based framework for modeling and analyzing the structure of a narrative, which is further expanded by incorporating methods from computational linguistics to utilize the narrative text. Modeling a narrative as a dynamically unfolding system, we characterize its progression via the growth patterns of the character network, and use sentiment analysis and topic modeling to represent the actual content of the narrative in the form of interaction maps between characters with associated sentiment values and keywords. This is a network framework advanced beyond the simple occurrence-based one most often used until now, allowing one to utilize the unique characteristics of a given narrative to a high degree. Given the ubiquity and importance of narratives, such an advanced network-based representation and analysis framework may lead to a more systematic modeling and understanding of narratives for social interactions, expression of human sentiments, and communication.

I. INTRODUCTION

Recent advances in quantitative methodologies for the modeling and analysis of large-scale heterogeneous data have enabled novel understanding of various complex systems from the social, technological, and biological domains [1]. The field of application is also rapidly expanding, now including the traditional academic fields of cultural studies and the humanities. This is allowing researchers to obtain novel answers to long-standing problems by finding complex patterns that were previously hidden. Recent examples include high-throughput analyses of language and literature based on massive digitization of books (e.g., Project Gutenberg [2] and Google Books) and proliferation of social media [1, 3], emergent processes in cultural history [4], and scientific analysis of art [5].

A theoretical data modeling and analysis framework that has attracted attention for cultural studies is networks [4, 6]. Network science attempts to understand the structure and behavior of a complex system from the connection and interaction patterns between its components [7–10]. Owing to its flexibility as a modeling framework, network science has led to a novel understanding of many systems, not only those easily recognizable as networks such as the World Wide Web [11, 12] and the Internet [13], but also those that have been extensively studied in non-network contexts such as biological systems or social organizations [14, 15].

In this paper we propose a network science-based framework for a cultural system that is ubiquitous in society and boasts a long history of study, but that we believe can still benefit from one: narratives. Narratives (or stories) are important in that they are the most common way in which we communicate and recount our experiences. The connection between networks and narratives can also be seen in the very definition of the word: The New Oxford American Dictionary, for instance, defines narrative as “a spoken or written account of connected events.” This suggests that using networks may help us understand how the various building blocks of narratives are woven into a coherent structure for effective delivery of messages and arousal of emotions. This way of thinking about narratives is closely related to an interesting recent movement in literary studies named “distant reading,” proposed by Moretti [16–18]. Distant reading is an approach to literature based on processing large amounts of literary data to devise and construct general “models” of narratives to understand them as a class, in contrast to reading each work very closely (hence the term “distant”) to understand it. A model constructed through reduction and abstraction, the reasoning goes, would enable us to grasp the general underlying structures and patterns of a class of complex objects called narrative, as an X-ray machine would allow us to understand the general skeletal features of the human body.

To many of us this way of thinking is familiar as the very principle of research in the natural sciences: To understand a system, one collects data and performs statistical analysis based on abstract models to gain an understanding of the general characteristics of the system. A model of a system has the following characteristics. An abstract representation or notion of a system, a model necessarily excludes some features of the system it is representing. A random exclusion of features, of course, is unlikely to result in a useful model; it is important to make a judicious choice on which features to retain and which to exclude so that the model incorporates the important or essential features of the system. Of course, it is very difficult to know beforehand which is the best choice of features. One practical starting point can be a common description of the system by people, since such a description is already a type of mental representation which can be viewed as a model, however rudimentary. The network model of a narrative, from this perspective, appears to be sensible and immediately understandable; in many instances when we recount a story, we focus prominently on the characters and their relationships. Take the Star Wars movie franchise, for instance, the top grossing space opera in modern times [19, 20]. Once the generic physical setting of “a galaxy far, far away” is presented, the story progresses via the characters’ actions, adventures, and relationships and interactions with others; that Leia Organa and Luke Skywalker are twins plays a crucial role in their fate, and the revelation via “I am your father” is perhaps the most memorable scene in the narrative. In addition to these individual relationships, group-level relationships are important as well for the story, such as the Empire versus the Rebel Alliance, the dark side versus the light side of the Force, etc. Examples abound from history: the story of Oedipus, which precedes Star Wars in terms of shocking familial revelation; and Dexter, a favorite American TV show of one of the authors, a series of episodes that portray the titular character navigating his social world of sibling, family, and rival criminals [50].

arXiv:1604.03029v1 [cs.CL] 24 Mar 2016

These examples all function as empirical bases for approaching narratives from the network modeling framework. One of the earliest models proposed for narratives in the distant reading philosophy introduced above, in fact, was networks. Moretti applied the network framework to Shakespeare’s Hamlet for detecting specific regions in the plot, and performed many experiments such as extracting specific nodes in the network of characters to observe changes and make comparisons between different networks. Other network-based studies of narrative include the study of the community structure of the character network in Victor Hugo’s Les Miserables [21], the social networks of characters based on conversation in 19th-century British novels [22], networks of mythologies and sagas [23–25], and more recently, a technique for dialog detection in novels applied to writer J. K. Rowling’s Harry Potter series [26]. While these serve to demonstrate the scientific community’s interest in network-based understanding of narratives, most works are limited to the study of the static topological properties of networks found in the said stories, when we know that narratives are essentially dynamically progressing entities, and the text itself is a source of much information that can be used extensively. Given the wide range of analytical and computational tools that constitute network science, we believe there is much opportunity for further studies of narratives using networks that take into account such essential aspects. This paper is intended to be one such attempt inspired by those works, hopefully laying out potential future directions in utilizing network science and computational linguistics for understanding the dynamics of narratives in a systematic manner.

II. MATERIALS AND METHODS

A. Material: Victor Hugo’s Les Miserables

We analyze Victor Hugo’s novel Les Miserables using the methods introduced in this paper. Set around the popular uprising in Paris in 1832 CE, Les Miserables is known for its vivid depiction of the conditions of the tumultuous times and insight into the human psyche via multiple intersecting plots involving richly developed characters [27]. Its main plot follows the fugitive Jean Valjean’s trajectory, which shows him transforming into a force for good while being constantly haunted by his criminal past. During his journey he interacts with many characters, some helpful and friendly, and others antagonistic and hostile. The most important characters of the novel include the following:

• Fantine: A young woman abandoned with daughter Cosette early in the novel. She later leaves Cosette in the care of the Thenadiers, who then abuse the child. Fantine is rescued by Valjean when Javert arrests her on a charge of assaulting a man.

• Cosette: Fantine’s daughter, later adopted by Valjean. Under Valjean’s care she grows into a beautiful woman, and falls in love with Marius.

• Marius: A young man associated with the “Friends of the ABC (Les Amis de l’ABC in French),” a group of revolutionaries. He is critically wounded at the barricade, but is rescued by Valjean. He later marries Cosette.

• Javert: A police inspector in a relentless pursuit of Valjean. After being rescued by Valjean at the barricades and realizing the immorality of the old French system he has served loyally, he commits suicide.

• Thenadier: A wretched man who abuses young Cosette. A lifetime schemer of robbery, fraud, and murder, he conspires to rob Valjean until Marius stops him, and gets arrested by Javert.

B. Method: Interacting Timelines and Network Construction

The widely accepted essential building blocks of a narrative are characters (also called agents or actants), events, and the causal or temporal relationships that weave them together [28, 29]. An interrelated sequence composed of those elements is called a plot, which may be viewed as the backbone of a narrative. A narrative may also be broken down into formal units such as acts, scenes, chapters, etc. [30]. Historically there have been many attempts to establish a general form of narrative structure, of which a well-known example is Aristotle’s three-act plot structure theory. It states that Act One presents the central theme and questions, followed by Acts Two and Three that present major turning points and conclusion. Variant forms exist such as the four-act structure theory [31, 32].

While these have existed for a long time and been widely applied, we find it difficult to imagine that there is an a priori reason for all narratives to consist of three or four parts. Then, can we deduce the structure of a narrative from the narrative itself? It appears that the increasing availability of narrative texts in digital format and analytical methods for data analysis offer an opportunity for a new look at narrative structures, and the formulation of a flexible framework that can properly capture the complexity of a given narrative.

An interesting pair of concepts helpful for picturing the content of a narrative, which can serve as the basis for a formalism, was given by Propp [33] who, while trying to establish a symbolic notation-based formalism for Russian folktales, proposed that narrative content consists of two layers that he labeled the fabula and the sjuzet. The fabula refers to the entire world that contains the narrative, while the sjuzet refers to those elements of that world explicitly presented to the audience. For instance, if the narrative is depicting a man dining with his family in his home, the sjuzet comprises the man and his family (the characters), the act of dining (the event), and his home (the place), while the fabula is all of the above plus the rest of the story world such as the man’s colleagues at work, their concurrent actions and whereabouts, etc. The sjuzet therefore can be considered the part of the story world currently under observation, and the rest of the fabula the part that “operates” in the background. Each component of the fabula may or may not become sjuzet (explicitly presented to the audience) at another point in the narrative, but they are nevertheless indispensable for the consistency of the story world and future plot development via implicit action.

We start by representing a narrative as a set of character timelines, basically the record of a character’s appearances in the narrative. The point of appearance is marked in narrative units which can be scenes, chapters, etc., as shown in Fig. 1. In our paper we follow the convention used in the construction of the character network from Victor Hugo’s Les Miserables in Ref. [21]: two characters are connected in the network when they appear in the same narrative unit. An interaction defined in this fashion would be more general than direct conversations, as it could include a common experience or shared space in addition to a conversation. The narrative format can also present some practical issues in defining an interaction. In a play or movie script, for instance, it would be much easier to identify a conversation between characters as an interaction, which would be more explicit and narrower in scope. An online resource named moviegalaxies provides a collection of social networks of characters in hundreds of movies built in this way, along with static network properties such as the diameters and clustering coefficients [34]. Using this narrower definition in a novel is potentially problematic: it is difficult to detect conversations in a novel (though some advances have been recently made [26]), but more fundamentally it would miss non-verbal interactions which exist abundantly in a novel. For this reason, it is difficult to state at this point which would be a better approach. Perhaps a comparison study could be illuminating, although it is out of the scope of this work [18]. The rest of this paper is dedicated to exploring what the character network based on Fig. 1 can tell us about the narrative structure and how it progresses. The methodology will be demonstrated using the English translation of Victor Hugo’s Les Miserables [35], although it will be clear that the formalism itself is applicable to any comparable narrative. Our choice of Hugo’s work is based on its stature as a classic known for a set of richly developed characters [27], familiarity in network science [21], and the free availability of the complete text on Project Gutenberg. Using the original French version would be ideal, but we point out that the network construction according to Fig. 1 is unaffected, and the wider availability of advanced computational linguistic tools for the English language does provide advantages for incorporating the text for enriched analyses that will be demonstrated in the latter part of the work.
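The co-appearance convention described above can be illustrated with a minimal sketch. The chapter casts below are hypothetical toy data, not taken from the novel's actual chapters, and the function name `build_character_network` is ours; the edge weight (number of shared units) is an optional extra that the unweighted network simply ignores.

```python
from itertools import combinations

# Hypothetical toy data: narrative unit (e.g. chapter number) -> characters in it.
appearances = {
    1: {"Myriel"},
    2: {"Myriel", "Valjean"},
    3: {"Valjean", "Javert", "Fantine"},
    4: {"Valjean", "Cosette"},
}

def build_character_network(appearances):
    """Link two characters whenever they appear in the same narrative unit."""
    nodes = set()
    edges = {}  # frozenset({a, b}) -> number of units the pair shares
    for unit in sorted(appearances):
        cast = appearances[unit]
        nodes |= cast
        for a, b in combinations(sorted(cast), 2):
            pair = frozenset((a, b))
            edges[pair] = edges.get(pair, 0) + 1
    return nodes, edges

nodes, edges = build_character_network(appearances)
print(len(nodes), "characters,", len(edges), "edges")
```

Switching from chapters to coarser units (Books, Volumes) only requires changing the keys of `appearances`; the construction itself is unchanged.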

In Fig. 1 (B) we show the narrative units in Les Miserables on several levels. From top to bottom, they are the Chapters (colored according to their Sentiment Polarity Index defined in Sec. II C 1), Books (groups of Chapters), Sequences (groups of Books), and Volumes (even larger groups of Books). All but the Sequences, whose definition is given later in Sec. III 4, are by the author’s designation. It is reasonable to assume that the author intended each unit to represent a theme or subplot. The five Volumes of Les Miserables, for instance, are titled “Fantine,” “Cosette,” “Marius,” “The Idyll in the Rue Plumet and the Epic in the Rue St. Denis,” and “Jean Valjean,” indicating their central character or plot. Since the different narrative units offer varying degrees of resolution of the narrative, one may again choose the one that is most useful for one’s purposes. For our goal of studying the complexity of Les Miserables, however, the five Volumes appear too few; we therefore choose to work with the Chapters (of which there are 365) for most purposes, and the Sequence in a later analysis.

1. Network Topology and Growth Patterns

From the network of characters built based on the Interacting Timelines of Fig. 1 (A) we can measure various static network properties. But a narrative is essentially a dynamical system that unfolds in time; what interests a reader is how the story is told in time, not necessarily the final, static network of characters. We need to study how the network grows over time and what we can learn about the narrative from it. This is because the network growth is essentially coupled to the narrative flow.

[Figure 1. Panel A: Narrative Units, Interacting Character Timelines, and Character Network Construction (labels: Characters, Narrative Units, Character Timelines, Network Construction; example character: Valjean). Panel B: narrative units of Les Miserables shown by Book, Sequence (1 to 21), and Volume (1 to 5).]

FIG. 1: Interacting Timeline Framework for Network Modeling of Narratives. (A) Construction of the character network from a narrative. We represent the narrative as a set of character timelines, the record of appearances of the characters in narrative units (e.g. chapters, scenes, etc.). An interaction can be defined as co-appearance in a narrative unit. (B) A narrative unit is not unique. One may use the author’s designation (i.e. the Volumes, Books, or Chapters in a novel) or define a new one such as the Sequence based on the unit-to-unit continuity of character compositions (defined in Sec. III 4). The narrative units in Victor Hugo’s Les Miserables are shown here, from the finest (Chapters, top) to the coarsest (Volumes, bottom).

Starting from an empty network at the beginning of the narrative, the network grows as new characters are introduced and interact with others. In this sense, we can say that the temporal growth of the network is intimately connected to the concept of the so-called narrative stages. A common classification of narrative stages includes Exposition, Rising Action, Climax, Falling Action, Resolution, etc., named according to their role and nature [36]. For example, the Exposition stage introduces the characters and the space they inhabit. Once the motives and allegiances of the characters are presented, in the Rising Action the characters begin to struggle against each other until all conflicts are resolved through the later stages.

We study the network growth pattern on two levels. First, on the aggregate level, we measure the growth of the number of nodes n and edges m of the network. Second, on the individual character level, we measure two values, appearance a (the number of chapters in which a character makes an appearance) and degree k of the characters.
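As a rough illustration of these four quantities, the sketch below tracks n and m chapter by chapter and then tallies a and k per character. The chapter casts are hypothetical toy data and the function names are ours; this is one straightforward way to compute the quantities, not necessarily the authors' implementation.

```python
from itertools import combinations

# Hypothetical toy narrative: the cast of each chapter, in narrative order.
chapters = [
    {"Myriel"},
    {"Myriel", "Valjean"},
    {"Valjean", "Javert", "Fantine"},
    {"Valjean", "Cosette"},
]

def growth_curves(chapters):
    """Cumulative number of nodes n and edges m after each chapter."""
    seen_nodes, seen_edges = set(), set()
    n_curve, m_curve = [], []
    for cast in chapters:
        seen_nodes |= cast
        seen_edges |= {frozenset(pair) for pair in combinations(cast, 2)}
        n_curve.append(len(seen_nodes))
        m_curve.append(len(seen_edges))
    return n_curve, m_curve

def appearance_and_degree(chapters):
    """Appearance a (chapters featuring a character) and degree k (distinct co-appearing characters)."""
    appearance, neighbours = {}, {}
    for cast in chapters:
        for name in cast:
            appearance[name] = appearance.get(name, 0) + 1
            neighbours.setdefault(name, set()).update(cast - {name})
    degree = {name: len(others) for name, others in neighbours.items()}
    return appearance, degree

n_curve, m_curve = growth_curves(chapters)
appearance, degree = appearance_and_degree(chapters)
print(n_curve, m_curve)
print(appearance["Valjean"], degree["Valjean"])
```

Chapters in which n jumps while m grows little would correspond to the introduction of new characters, which is the kind of signature the aggregate curves are meant to expose.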

C. Method: Sentiment Analysis and Topic Modeling

An analysis focused solely on the network topology leaves out an essential component of a narrative, the text. This is important because a narrative is in essence much more than a record of who-meets-whom; in the form of text, a narrative contains the details that can vary significantly between interactions [32, 33]. In Les Miserables, for instance, the nature of Valjean’s relationships to different characters that is at the center of its drama (at the same time a savior and protector to Cosette, and a fugitive criminal to Javert) is wholly missing in the simple appearance-based network. This means that leveraging the actual text of the narrative may lead to a richer and more proper understanding of the narrative, which we pursue using tools developed in computational linguistics. Here we utilize two: The first tool is Sentiment Analysis, which identifies the positive and negative sentimental qualities of a text and allows us to study the sentimental states of character relationships and the build-up and the resolution of tension in the narrative. The second tool is Topic Modeling, which identifies the topics inside the novel and allows us to associate the characters with the topics at different points in the narrative that define the characters’ states, and to quantify the impact of events on the characters.

[Figure 2. Panel A: Sentiment Analysis (positive words: Admire, Happy, Love; negative words: Hate, Pain, Sad, …; scale: Sentiment Polarity Index (SPI) from −1 to +1). Panel B: Topic Modeling (layers D, W, and T; Topic 1 and Topic 2 with example words Conflict, Museum, Violence, War, Exhibition, Military). Panel C: Chapter Sentiments in Les Misérables (horizontal axis: Chapters; vertical axis: Sentiment Polarity Index). Annotated chapters: Ch. 12: Myriel, a virtuous man, is introduced. Ch. 22: Valjean nearly drowns. Ch. 31: Fantine goes on a picnic with friends. Ch. 51: Fantine falls into misery. Ch. 81: Writer describes Battle of Waterloo. Ch. 131: Writer muses on religion and cloisters. Ch. 259: Cosette and Marius fall in love. Ch. 261: Cosette and Marius part.]

FIG. 2: Sentiment Analysis and Topic Modeling of Narratives. (A) The principle of sentiment analysis. Words associated with positive or negative sentiments contribute towards the Sentiment Polarity Index (SPI) of the text ranging from −1 (most negative) to +1 (most positive). (B) The principle of topic modeling. Clusters of words detected from a set of texts that tend to appear together are identified as the topics. (C) SPIs of the chapters of Les Miserables. Vertical gray bars indicate the 21 Sequences of Les Miserables. Each sequence is colored according to the sign of the mean SPI of its constituent chapters (blue for positive, and red for negative). We compare the SPI and content for eight chapters in the narrative: Positive chapters depict uplifting characters or events (e.g., introduction of Myriel, a man of great character in Chapter 12) and happy events (e.g., Fantine going on a picnic, Cosette and Marius falling in love, etc.), while negative chapters depict pain and suffering (e.g., Valjean nearly drowning, Fantine in misery, war, lovers parting, etc.)

1. Sentiment Analysis

Sentiment Analysis, also called Mood Analysis or Opinion Mining, is a technique for determining the sentimental qualities of a given text based on the words it contains. Its origin can be traced back to an attempt in the 1990s to translate written reviews of products into numerical rating scores: To this day it is common to produce a numerical Sentiment Polarity Index (SPI) of a given text that shows its positive or negative quality. Basically it counts the words of known positive or negative sentimental states in a text to produce the SPI. For instance, words such as “admire,” “happy,” and “love” contribute to the text’s positive sentiment, whereas “hate,” “pain,” and “sad” would contribute to its negative sentiment (see Fig. 2 (A)). We note an interesting connection to the Western literary tradition of the generic division of drama into comedy and tragedy, often stylized using two masks: the laughing one that represents Thalia, the Muse of comedy in Greek and Roman mythology, and the weeping one that represents Melpomene, the Muse of tragedy, also shown in Fig. 2 (A). Here we use the LIWC (Linguistic Inquiry and Word Count) [37] program, one of several available [38], to determine the SPI of the chapters of Les Miserables. LIWC actually returns two separate values, π ≥ 0 and ν ≥ 0, for the positive and negative sentiments for the input text, which we combine, for convenience, into a single SPI variable

\[ \sigma \equiv \log_{10}\left( \frac{\pi + 1}{\nu + 1} \right). \tag{1} \]

Defined in this way, σ > 0 when the text is net positive (π > ν), σ = 0 when neutral (π = ν), and σ < 0 when net negative (π < ν). We now have a set of values Σ = {σ_1, …, σ_c}, where c = 365 is the number of chapters in Les Miserables.
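The sign behaviour of Eq. (1) is easy to verify numerically. In the minimal sketch below, `pos` and `neg` stand for the two non-negative values π and ν; the function name `spi` and the sample inputs are ours and do not come from any LIWC output.

```python
import math

def spi(pos, neg):
    """Sentiment Polarity Index of Eq. (1): log10((pos + 1) / (neg + 1))."""
    if pos < 0 or neg < 0:
        raise ValueError("positive and negative scores must be non-negative")
    return math.log10((pos + 1) / (neg + 1))

# Illustrative inputs only: net positive, neutral, and net negative texts.
for pos, neg in [(9.0, 0.0), (2.0, 2.0), (0.0, 9.0)]:
    print(pos, neg, round(spi(pos, neg), 3))
```

Adding 1 to both numerator and denominator keeps the index finite when a text has no negative (or no positive) words.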

From Σ = {σ} we can compute SPIs of the characters and character pairs using the timeline framework in Fig. 1 (A). If a character α, for instance, has appeared in Chapters 1, 2, and 100, we define Σ[α] = {σ_1, σ_2, σ_100} to be the SPI set of α, from which we can calculate quantities such as the character’s average SPI, σ[α] = (σ_1 + σ_2 + σ_100)/3. The SPI of a character pair is similar: if two characters α and β have co-appeared in Chapters 2 and 100, for instance, their average SPI is σ[α, β] = (σ_2 + σ_100)/2.
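The example in the preceding paragraph can be reproduced with a short sketch. The chapter SPI values below are made up for illustration, and the names `sigma`, `timeline`, `character_spi`, and `pair_spi` are ours; returning `None` for pairs that never co-appear is our choice, since the text does not say how such pairs are treated.

```python
from statistics import mean

# Hypothetical chapter SPIs (chapter number -> sigma); values are illustrative only.
sigma = {1: 0.30, 2: -0.10, 100: 0.20}

# Character timelines: the chapters in which each character appears.
timeline = {"alpha": {1, 2, 100}, "beta": {2, 100}}

def character_spi(name):
    """Average SPI over the chapters in which the character appears."""
    return mean(sigma[ch] for ch in timeline[name])

def pair_spi(a, b):
    """Average SPI over the chapters in which both characters co-appear."""
    shared = timeline[a] & timeline[b]
    return mean(sigma[ch] for ch in shared) if shared else None

print(round(character_spi("alpha"), 4))
print(round(pair_spi("alpha", "beta"), 4))
```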

2. Topic Modeling

Our second example of incorporating textual information for network-based narrative study is topic modeling. We will see that it allows us to determine the “topical state” of a character at any given point in the narrative and map out a detailed picture of interaction between characters. Topic modeling is a method for extracting clusters of correlated keywords from a set of documents that can be identified as separate “topics” of the texts. The basic idea is presented in Fig. 2 (B) through a tripartite network composed of three layers of nodes: D of documents, W of words, and T of topics. The goal is to find T, essentially “bags of words” appearing often together in documents, from the text data consisting of D and W. Many studies have reported the success of topic modeling in identifying word sets that match the human understanding of groups of texts, and its practical applicability to problems like word sense induction [39–41].

Here we employ Non-Negative Matrix Factorization (NNMF), famously used for identifying distinguishable parts in images [42–45]. It decomposes the word–document TF-IDF (Term Frequency–Inverse Document Frequency) matrix M (dim(M) = |W| × |D|) into QH, the product of two matrices Q and H such that dim(Q) = |W| × |T| and dim(H) = |T| × |D|. The number of topics |T| is an input parameter typically set to be smaller than |W| and |D| [51]. We can then interpret the matrix Q = {qij} as the association strength between word i and topic j, and H = {hjk} as that between topic j and chapter k. Using the framework of Fig. 1 (A) we can again define the character-topic association strength tαk between character α and topic k as follows:

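$$t_{\alpha k} \equiv \frac{\sum_{j \in C_\alpha} h_{kj}}{\sum_{k' \in \mathcal{T}} \sum_{j \in C_\alpha} h_{k'j}}, \qquad (2)$$

where Cα is the set of chapters in which character α appears, and the first index of h denotes the topic and the second the chapter, as in H above. It is the normalized sum of all topic–chapter associations from the chapters featuring character α. We used the scikit-learn Python machine learning package to perform NNMF.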
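Eq. (2) can be illustrated with a minimal sketch that takes the topic–chapter matrix H as given. The matrix values and the function name `character_topic_association` are hypothetical. In practice H would come from a factorization such as `sklearn.decomposition.NMF`; note that if the rows of the input matrix are documents, `fit_transform` returns the document–topic factor, i.e. the transpose of H in the notation used here.

```python
def character_topic_association(H, chapters):
    """Eq. (2): share of each topic in the chapters featuring a character.

    H[k][j] is the strength between topic k and chapter j (the |T| x |D|
    factor). `chapters` is C_alpha as 0-based column indices.
    """
    per_topic = [sum(row[j] for j in chapters) for row in H]
    total = sum(per_topic)
    if total == 0:
        return [0.0 for _ in per_topic]
    return [value / total for value in per_topic]

# Hypothetical 3-topic, 4-chapter matrix H (not data from the paper).
H = [
    [0.8, 0.1, 0.0, 0.3],
    [0.1, 0.6, 0.2, 0.0],
    [0.0, 0.2, 0.9, 0.4],
]

# A character appearing in the first, second, and fourth chapters.
t_alpha = character_topic_association(H, chapters=[0, 1, 3])
```

Because of the normalization, the entries of `t_alpha` sum to one, so they can be read as the relative weight of each topic in the character's chapters.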


III. RESULTS

3. Network Topology and Narrative Structure

This section is a summary of our previous work [47]. We start by constructing the network of characters based on Fig. 1. The final network of Les Miserables contains 63 characters after very minor ones are excluded. Drawing an edge between two characters if they have appeared in a chapter together results in m = 504 edges. The network is shown in Fig. 3. In it, 25.8% of the character pairs are connected, the mean geodesic length is 1.85, the network diameter is 4 (between the pair of Babet and Geborand, and 17 other pairs of relatively minor characters), and the clustering coefficient is 0.77 [52].
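The construction described above, one undirected edge per pair of characters sharing at least one chapter, can be sketched as follows. The cast lists and function names are hypothetical; the final line only checks that the reported n = 63 and m = 504 are consistent with the quoted 25.8% of connected pairs.

```python
from itertools import combinations

def build_network(chapter_cast):
    """Return (nodes, edges) of the co-appearance network.

    chapter_cast maps a chapter number to the set of characters in it.
    An undirected edge joins two characters sharing at least one chapter.
    """
    nodes, edges = set(), set()
    for cast in chapter_cast.values():
        nodes.update(cast)
        # Sorting gives each undirected pair one canonical orientation.
        for a, b in combinations(sorted(cast), 2):
            edges.add((a, b))
    return nodes, edges

def density(n, m):
    """Fraction of the n(n-1)/2 possible pairs that are connected."""
    return m / (n * (n - 1) / 2)

# Toy cast lists (hypothetical, for illustration only).
chapter_cast = {
    1: {"Myriel", "Valjean"},
    2: {"Valjean", "Javert", "Fantine"},
    3: {"Valjean", "Javert"},
}
nodes, edges = build_network(chapter_cast)

# Consistency check on the figures reported in the text: 504 / 1953.
reported = density(63, 504)
```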

Based on the distinction between different stages in a narrative, we can assume that n and m would not simply increase linearly in time but nonlinearly, in accordance with the nature of the stages. In Fig. 4 we show the growth of n and m along the narrative time measured in chapters. As expected, the growth is not linear, especially for the number of nodes n. After the first batch of characters is introduced at the beginning of the narrative, there are specific points in the narrative where many new characters are introduced simultaneously (marked S1, S2, and S3 in Fig. 4), which suggests that they are Exposition stages. An inspection of the actual story confirms this:

• Stage S1 : Fantine’s friends are introduced as her happy days are depicted.

• Stage S2 : Valjean’s former fellow prison inmates testify during the trial of the fake Valjean.

• Stage S3 : “The Friends of ABC” (young progressives) are introduced, shown debating various social issues of the day.

There is also a stretch of chapters (S4) where the network shows little growth. This part largely coincides with Volume 2 (“Cosette”) of the novel, composed of chapters that contain no narrative progression (i.e. the author digresses to discuss the battle of Waterloo, religion, the vagrant children of Paris, etc.) or that show no network growth, being mainly about Valjean and Cosette’s flight from the pursuit of Thenardier while avoiding people in general. Finally, near the end of the narrative at S5, it is the number of edges m that leads the growth of the network while n shows little increase. This pattern, in which new edges are created between existing nodes without the addition of new ones, implies a convergence of the characters into a common environment: this part in fact describes the scene at the barricade where nearly all major characters (who have been introduced before) converge.
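The growth curves n(t) and m(t) discussed above can be obtained by accumulating nodes and edges chapter by chapter. The sketch below is an illustration under stated assumptions: the cast lists are a toy narrative, not data from the novel, and `growth_curves` is a hypothetical helper name.

```python
from itertools import combinations

def growth_curves(chapter_casts):
    """Cumulative node and edge counts after each chapter.

    chapter_casts is a list of character sets in narrative order.
    """
    nodes, edges = set(), set()
    n_t, m_t = [], []
    for cast in chapter_casts:
        nodes |= cast
        # frozenset makes each pair an unordered, hashable edge.
        edges |= {frozenset(pair) for pair in combinations(cast, 2)}
        n_t.append(len(nodes))
        m_t.append(len(edges))
    return n_t, m_t

# Toy narrative (hypothetical): an exposition-like chapter, a digression
# with no characters, then chapters that reconnect known characters.
casts = [{"A", "B"}, {"C", "D", "E"}, set(), {"A", "C"}, {"B", "D", "E"}]
n_t, m_t = growth_curves(casts)
```

In this toy run n jumps at the second chapter (an exposition-like burst), both curves are flat across the empty chapter (a digression), and only m grows in the last two chapters, which is the convergence signature described for S5.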

In Fig. 5 we show the appearance a and degree k of the individual characters. The final histogram of a is shown in Fig. 5 (A). It has a skewed distribution, with many characters appearing in a handful of chapters and a few characters appearing in many chapters. For instance, Marius appears in 122 chapters, Valjean in 121, and Cosette in 97, whereas the mean and the median are 19.3 and 9 respectively, nearly an order of magnitude smaller than the values for the most frequent characters. In Fig. 5 (C) we show the temporal growth of each character’s cumulative appearance. Although Marius and Valjean are similar in total appearances (122 and 121, respectively), how these values are reached is very different. Valjean first appears at the beginning of the novel, then with regularity until there is a noticeable absence between chapters 160 and 233 (indicated by a plateau). During Valjean’s absence, Marius, making his first appearance in chapter 170, takes center stage in the novel and appears in almost every chapter until he overtakes Valjean in appearance. This is a direct reflection of the structure of Les Miserables: the first part is mainly about Valjean (with Marius absent), the second part is mainly about Marius (with Valjean absent), and the final part features both as major characters. The degree k (Fig. 5 (B) and (D)), on the other hand, differs in interesting ways from a. The three highest-degree nodes are Valjean (k = 43), Cosette (k = 41), and Javert (k = 39), whereas Marius is down to k = 34. The degree therefore captures the nature of the social sphere around a character that appearance alone cannot tell: Valjean is a well-travelled character linking many different spheres of the story world, whereas Marius associates with a narrow pool of characters (namely the young fellow rebels) and his love interest Cosette.
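The distinction between appearance a and degree k can be made concrete with a small sketch. The cast lists are hypothetical and chosen only so that the two measures disagree; `appearance_and_degree` is an illustrative helper, not code from the paper.

```python
from collections import Counter, defaultdict

def appearance_and_degree(chapter_casts):
    """Return (appearance, degree) dictionaries keyed by character.

    appearance counts the chapters a character is in; degree counts the
    distinct characters met in at least one chapter.
    """
    appearance = Counter()
    neighbors = defaultdict(set)
    for cast in chapter_casts:
        for name in cast:
            appearance[name] += 1
            neighbors[name] |= cast - {name}
    degree = {name: len(neighbors[name]) for name in appearance}
    return dict(appearance), degree

# Toy data (hypothetical): one pair meets repeatedly, while another
# character appears less often but meets more distinct people.
casts = [
    {"Marius", "Cosette"},
    {"Marius", "Cosette"},
    {"Marius", "Cosette"},
    {"Valjean", "Javert"},
    {"Valjean", "Fantine"},
]
appearance, degree = appearance_and_degree(casts)
```

Here the character with the most appearances has the smaller degree, the same kind of discrepancy the text notes between Marius and Valjean.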

4. Sentiment Analysis and Narrative Progression

The chapter sentiments are shown in Fig. 2 (C). We also study how the sentiments and content of chapters match: Positive chapters tend to depict uplifting characters (e.g. Myriel, a virtuous man) and events (e.g., Fantine going on a picnic, Cosette and Marius falling in love, etc.), whereas negative chapters depict pain and suffering (e.g., Valjean nearly drowning, Fantine in misery, war, lovers parting, etc.). We also see alternating clusters of positive and negative chapters, indicating a certain pattern of emotional fluctuations. This is reminiscent of an interpretation of narrative as a metaphor for life that fluctuates between contradictory states of harmony and peace, and tension and fear [46]. We also note that the average chapter SPI is σ = 0.06 ± 0.01, i.e. net positive. We believe this is an example of the so-called “Pollyanna effect” referring to a universal positivity bias in human language [3].

FIG. 3: The Character Network of Les Miserables and Its Community Structure. [Network diagram; node labels omitted.] The node radius is proportional to its degree (the number of its neighbors). The network shows many common characteristics of a social network, such as the small-world property and community structure. The node color indicates the community to which it belongs (we identify seven, labeled I to VII), while the edge color indicates the sign of the cosentiment of the character pair (blue for positive, red for negative), defined and discussed further in Sec. III 4.

FIG. 4: Growth Patterns of the Character Networks. [Plot of the number of nodes n and the number of edges m against narrative time in chapters, with stages S1 to S5 marked.] The number of characters n and the number of edges m in Les Miserables grow in a nonlinear fashion, indicating that different stages in narratives contribute differently to the network growth via character introduction or formation of new connections.

FIG. 5: Centralities of Characters in Les Miserables. [Panels: (A) Appearance Histogram, (B) Degree Histogram, (C) Cumulative Appearance, (D) Cumulative Degree; the horizontal axis of (C) and (D) is narrative time in chapters.] Histograms of (A) appearance and (B) unweighted degrees of the characters of Les Miserables. The histograms are relatively skewed, with some characters having high values and many having small values. The three most frequently appearing characters are Marius (122), Valjean (121), and Cosette (97), while the three highest-degree characters are Valjean (43), Cosette (41), and Javert (39), and the three highest-weighted-degree characters are Valjean (5203), Marius (4148), and Cosette (3977). The discrepancies indicate the differences in the characteristics of their social networks. (C) and (D) are the growths of these quantities for each character, showing the differing points at which the characters are actively depicted.

We show the sentiments σ for select characters and pairs in Fig. 6. In Fig. 6 (A) we show ten characters, five major (frequently appearing) and five minor (infrequently appearing), for comparison. While their average values are positive (due to the Pollyanna effect), the joyless Javert is more negative than other main characters such as Marius, Valjean, and Cosette. Nevertheless, major characters experience a wider range of SPIs than the minor ones, which we believe indicates their sentimental complexity. In the figure we see that Valjean appears frequently in both positive and negative chapters, showing his role as the carrier of varying sentimental states, in contrast to short-lived minor ones. In Fig. 6 (B) we show the SPIs of a number of character pairs. Valjean understandably shows a higher average SPI when with his adoptive daughter Cosette than with his archnemesis Javert, although the wide range of SPIs again indicates the sentimental complexity of the leading character pairs. The Pollyanna effect still holds here; in general, the average SPI of character pairs (dotted line) is positive at σ0 = 0.07. Therefore it is sensible to define the cosentiment of a character pair (α, β) to be σ[α, β] − σ0. This quantity was already used for edge colors in Fig. 3. We can also use this to study the sentimental states within and between communities, shown in Fig. 6 (C). In the figure, the diagonal elements show the fractions of positive and negative edges inside the communities, whereas the off-diagonal elements show those between two communities. The circle radius indicates the logarithm of the number of edges. Communities II and VI are in general the most negative inside, showing the harsh and tragic nature of the common experiences of the prisoners and revolutionaries. In contrast, Communities V and VII are the most positive inside. Between communities, II and VI are the most negative, due to Javert’s presence at the tragic barricade scene with the revolutionaries.
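The cosentiment and the community-level fractions of positive edges can be sketched as follows. This is a minimal illustration: the pair SPIs and the community assignment are invented, the baseline is taken as the plain mean over the listed pairs, and the function names are hypothetical.

```python
def cosentiment(pair_spi, baseline):
    """Pair SPI relative to the average pair SPI (sigma_0)."""
    return {pair: value - baseline for pair, value in pair_spi.items()}

def positive_fraction(cos, community, c1, c2):
    """Fraction of positive edges between communities c1 and c2.

    Use c1 == c2 for the diagonal (intra-community) entries. Returns
    None when no edge connects the two communities.
    """
    signs = [
        value > 0
        for (a, b), value in cos.items()
        if {community[a], community[b]} == {c1, c2}
    ]
    return sum(signs) / len(signs) if signs else None

# Hypothetical pair SPIs and community labels (not data from the paper).
pair_spi = {("A", "B"): 0.20, ("A", "C"): 0.01,
            ("B", "C"): -0.10, ("C", "D"): 0.15}
community = {"A": "I", "B": "I", "C": "II", "D": "II"}

sigma_0 = sum(pair_spi.values()) / len(pair_spi)
cos = cosentiment(pair_spi, sigma_0)

inside_I = positive_fraction(cos, community, "I", "I")
between = positive_fraction(cos, community, "I", "II")
```

Subtracting the baseline matters: the pair ("A", "C") has a slightly positive raw SPI but a negative cosentiment, because it lies below the average pair.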

We now study the sentimental qualities of the network and how they change along the narrative progression. This is shown in Fig. 7, where each panel corresponds to a Sequence of the novel, first introduced in Fig. 1 (B). The definition and rationale for the Sequence are as follows: Sometimes a plot or a storyline may span multiple consecutive narrative units, which makes it reasonable to bundle them into a larger one. To achieve this we need to determine the similarity between subsequent narrative units. One possibility, which we use here, is the character composition; consecutive units belonging to the same or highly similar plots are likely to contain similar characters. Specifically, starting from the 40 Books of Les Miserables (excluding eight that contain no characters), we bundle the consecutive ones whose characters are similar above a prescribed threshold. Using the cosine similarity (although others such as the Jaccard index may be used) and setting the threshold to be the average similarity (0.49) between consecutive book pairs, we end up with the 21 Sequences shown in Fig. 7. We also show the fraction of negative and positive edges.
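The bundling procedure can be sketched as below. Two details are assumptions not stated in the text: each Book is represented as a binary character-membership vector (so the cosine similarity reduces to the overlap divided by the geometric mean of the cast sizes), and a pair is merged only when its similarity is strictly above the mean. The cast sets are hypothetical, and Books without characters are assumed to have been removed beforehand.

```python
import math

def cosine(a, b):
    """Cosine similarity of two character sets as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def bundle(books):
    """Group consecutive books whose cast similarity exceeds the mean
    similarity over all consecutive pairs.

    Returns (sequences, threshold); sequences are lists of book indices.
    Requires at least two books.
    """
    sims = [cosine(x, y) for x, y in zip(books, books[1:])]
    threshold = sum(sims) / len(sims)
    sequences = [[0]]
    for i, s in enumerate(sims, start=1):
        if s > threshold:
            sequences[-1].append(i)   # continue the current Sequence
        else:
            sequences.append([i])     # start a new Sequence
    return sequences, threshold

# Hypothetical casts of five consecutive books.
books = [{"V", "M"}, {"V", "M", "F"}, {"F", "J"}, {"P", "E"}, {"P", "E", "C"}]
sequences, threshold = bundle(books)
```

Swapping `cosine` for a Jaccard index, as the text notes is possible, would only change the similarity function; the bundling loop stays the same.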

The correlation between sentimental fluctuations and narrative flow is perhaps best understood from Fig. 7 by studying Marius and his revolutionary friends. When they are first introduced in Sequence 8, the sentiment is overwhelmingly positive, reflecting the air of optimism from their cause. Such initial positivity is not long-lived, however, as they have to struggle with their adversaries in subsequent Sequences 11, 14, and 15. After they overcome these challenges they briefly regain their positive sentiment (Sequence 16), but then are thrust into the most tragic and climactic circumstances (Sequences 17–20), which show high negativity. Finally, at the end of the novel (Sequence 21) the resolution is reached, showing a highly positive sentiment. The fluctuations between positive and negative in this fashion are known to be by design [46].

5. Topic Modeling and Mapping Interaction Dynamics via Topical States

We set |T| = 50. The results for all 50 are given in Fig. 8. The keywords (the strongest ones are in bold) tell us that the topics are often about the characters (e.g. T1, T2, and T3), places (e.g. T11, T20, and T25), or events (e.g. T7, T22, and T42). The character–topic associations {tαk} are visualized in Fig. 9 (A) for Valjean and Marius, scaled so that the strongest topic fills the space between the two circles. The five strongest topics for each character are T1, T4, T3, T2, and T7 for Valjean, and T2, T1, T3, T4, and T14 for Marius. From Fig. 8 we see that they are about themselves and related characters or actions (valjean, escape, marius, eponine, etc.). We can also use them to identify topics associated with the communities by summing up the tαk over the chapters that contain two or more of the members of the community, which are shown in Fig. 9 (B). The topics shown are relevant to multiple members of the group, for instance, characters from inside the community (e.g., T1 and T4 for Community I) or outside (e.g., T2 and T29 for Community I), or the events or places, for instance T41 (the trial) for Community II of Javert and Valjean’s fellow prison inmates.

We now introduce an interesting use of topics for representing narrative dynamics. An impactful event in a person’s life is one that brings about significant changes in the person’s state. This means that even in a narrative, if one could define a character’s state at a given point, one could measure the impact or significance of an event by comparing the states from before and after the event. We use topic modeling to do exactly this, by interpreting tαk as the topical state of the character. The idea is straightforward: Since an associated topic indicates the actions, events, interactions, etc. taking place in the character’s presence, it can be understood as telling us the situation or the state of the character. While Fig. 9 (A) shows the topical states averaged over the entire novel, we can define a character’s topical state at a given point in the narrative by obtaining the topical associations from the corresponding chapter(s). As an example, we now study the impact that the interactions between Marius and Valjean have on the characters’ states. For simplicity we consider Valjean and Marius to be interacting largely two times in Les Miserables, prompting us to partition the novel into the following four phases:

1. Phase I (Chapters 1 to 233): Before the first interaction. Valjean and Marius lead separate lives.

2. Phase II (Chapters 234 to 266): The first interaction takes place. Marius falls in love with Cosette, causing Valjean to become anxious about losing her.

3. Phase III (Chapters 272 to 295): Valjean is absent from the narrative, so no interaction takes place. Marius parts from Cosette, then joins the revolutionaries at the barricade.

4. Phase IV (Chapters 296 to the end of the narrative): The second interaction takes place. Marius gets injured at the barricade, then is rescued by Valjean. Cosette and Marius marry. Valjean dies.

Our strategy now is to observe the changes in characters’ states tαk. We then use them to understand the details of the interaction dynamics. First, the changes in tαk for the characters at the end of each phase are shown in Fig. 10 (A), obtained by subtracting the tαk immediately


FIG. 6: Sentiments of Characters, Character Pairs, and Communities. [Panels: (A) Sentiment Polarity Indices of Individual Characters, with the SPIs of the chapters featuring Valjean plotted against chapter number below; (B) Sentiment Polarity Indices of Character Pairs, with the SPIs of the chapters featuring both Valjean and Javert below; (C) Intra- and Inter-community Cosentiments for Communities I to VII, showing the fractions of positive and negative edges.] (A) Sentiment Polarity Indices (SPIs) for the characters of Les Miserables. The yellow boxes indicate the SPI ranges of 50% of the chapters around each character’s median (25% below, 25% above). The leading characters (higher in the plot) feature a wider range of SPIs than the marginal ones (lower in the plot), reflecting their role in the sentimental fluctuations of the narrative. The SPIs of the chapters in which Valjean appears are shown below. (B) SPIs for character pairs. Valjean indeed shows a higher SPI when together with his protegee Cosette than with his pursuer Javert, although SPIs for leading characters again show a wide range. (C) The intra- and inter-community cosentiments. Communities II and VI are in general the most negative inside, due to the fact that prisoners and revolutionaries share difficult and tragic experiences (harsh prison terms and death at the barricade). Communities V and VII are the most positive inside. Between communities, II and VI are the most negative, due to Javert’s presence at the barricade with the revolutionaries.

before the interactions from that immediately after. At the end of Phase I, Valjean is most strongly associated with T1, T5, T21, T47, and T29, whereas Marius is most strongly associated with T2, T14, T8, T32, and T37, which represent their trajectories up to that point according to Fig. 8. They share no common topics, as expected from the lack of any interaction up to that point; in fact, the correlation between their {tαk} is negative at −0.20 ± 0.01. At the


FIG. 7: Network Snapshots Showing Sentiments and Narrative Flow. [Twenty-one network snapshots grouped by Volume 1 to Volume 5; node labels omitted. Legend: positive cosentiment, negative cosentiment, fraction of positive edges, fraction of negative edges.] Snapshots of character networks in the 21 Sequences of Les Miserables. Edges are colored according to the cosentiment between the characters. The fractions of positive and negative edges are indicated in each snapshot, along with the summary of major plots in the Sequence. The sentimental fluctuations often reflect the buildup of drama, tension, and resolution.

Plot summaries printed in the snapshots, with the Sequence number where it is recoverable:

1: Myriel is a positive influence on Valjean by forgiving him when he steals his silverware.
2: Fantine enjoys her life with her friends, but falls into a miserable one. Valjean saves her from Javert.
3: Valjean reveals his true identity to save a man. Fantine dies.
4: Valjean fakes his own death by falling into the ocean, and buries his treasure in Montfermeil.
5: Valjean saves Cosette from Thenardier and flees to Paris. Pursued by Javert, they take refuge in the Petit-Picpus convent.
6: Gillenormand, Marius’ grandfather, is introduced.
7: Marius, estranged from his grandfather for his liberal views, leaves his family.
8: Marius befriends the Friends of ABC, young revolutionaries who have hopes for a better France.
9: Marius befriends Mabeuf, who helps him find a job.
10: Marius develops feelings for a woman he sees at the Luxembourg Gardens (who later turns out to be Cosette).
(number not recoverable): Thenardier attacks Valjean, who is saved by Marius and later turns out ...
12: Valjean worries that he might lose Cosette, who has feelings for Marius.
13: Cosette fears for her safety when a stranger follows her in the garden (who turns out to be Marius).
14: Thenardier and his ilk escape from the prison with the aid of Gavroche.
15: Valjean decides to leave Paris. Marius wants to marry Cosette, but his grandfather suggests he make her his mistress. Thenardier attempts to attack Valjean again, but is thwarted by Eponine.
16: The revolutionaries build barricades and are filled with optimism.
(number not recoverable): Marius, despondent upon finding Cosette gone, joins the barricade.
18: Marius escapes a near death with help from Eponine, who dies.
19: Gavroche delivers a letter from Marius to Cosette, which is intercepted by Valjean.
(number not recoverable): Valjean saves Marius at the barricade. A massacre takes place.
(number not recoverable): The bloodshed is over. Cosette and Marius marry. Valjean dies in peace.


Topic  Keywords
1      valjean, jean, lantern, light, perceive
2      marius, father, say, letter, know
3      enjolras, barricade, gun, combeferre, insurgent
4      cosette, say, toussaint, garden, happy
5      bishop, magloire, madame, monseigneur, say
6      thenardier, man, franc, doll, madame
7      revolution, louis, king, philippe, france
8      gillenormand, theodule, grandfather, mademoiselle, aunt
9      child, mother, father, little, girl
10     english, wellington, cuirassier, battle, road
11     rue, street, du, des, calvaire
12     javert, police, inspector, mayor, spy
13     wall, door, grating, large, house
14     bench, girl, marius, luxembourg, young
15     love, soul, heart, god, destiny
16     jondrette, leblanc, benefactor, door, say
17     bed, sleep, candlestick, open, bishop
18     social, undermine, civilization, society, human
19     fauchelevent, coffin, digger, grave, cemetery
20     sewer, paris, subterranean, bruneseau, city
21     madeleine, sur, mayor, town, cart
22     waterloo, napoleon, wellington, battle, blucher
23     convent, nun, prioress, mother, holy
24     barricade, insurrection, paris, saint, faubourg
25     garden, night, flower, forest, grass
26     fantine, franc, marguerite, thenardiers, victurnien
27     paris, city, make, rome, populace
28     gavroche, montparnasse, elephant, pistol, bahorel
29     sister, fantine, simplice, madeleine, doctor
30     tholomyes, favourite, blachevelle, fantine, dahlia
31     horse, maire, tilbury, scaufflaire, say
32     courfeyrac, bahorel, portress, tragedy, coffer
33     infinite, infinity, philosophy, god, i
34     cloister, convent, monasticism, monastery, community
35     sand, water, fontis, foot, ship
36     gamin, titi, arab, pear, paris
37     grantaire, enjolras, laigle, joly, bossuet
38     et, qu, je, la, le
39     woman, dandy, old, man, spur
40     mur, droit, rue, polonceau, picpus
41     president, district, attorney, champmathieu, jury
42     revolt, war, insurrection, uprising, riot
43     brujon, babet, montparnasse, eponine, claquesous
44     esprit, luc, century, vulgar, age
45     mabeuf, plutarque, book, flora, old
46     hucheloup, rue, wine, barricade, shop
47     boulatruelle, forest, treasure, road, mender
48     let, arab, doctor, populace, indignantly
49     slang, language, le, historian, word
50     vi, plank, eighty, voice, shovelful

FIG. 8: Complete List of Topics for Les Miserables. 50 Topics of Les Miserables found via Non-Negative Matrix Factorization (NNMF). Strongly associated keywords are also listed (the strongest keywords are in bold in the original figure). The topics are frequently about the characters (e.g. T1, T2, and T3), places (e.g. T11, T20, and T25), or events (e.g. T7, T22, and T42).

end of Phase II, after their first interaction, the correlation increases to 0.42 ± 0.01, showing that an interaction works to correlate the character states. At the end of Phase III (no interaction) it decreases again slightly to 0.33±0.001. At the end of Phase IV, where they interact again for the final time and quite extensively, it reaches its highest value of 0.70 ± 0.01. These show that an interaction functions to assimilate the characters’ states, and an inspection of the changes ∆tαk provides us with more detail of these assimilation dynamics. For simplicity, we again focus on the five topics (for each character) that gain the most in strength after each phase, shown in Fig. 10 (A). After the first interaction, we find that the five such topics for Valjean are T4, T2, T1, T7, and T25, whereas for Marius they are T1, T4, T25, T7, and T45. When we compare the strongly associated topics from before and after the interactions, we find two kinds of change. First, there are some topics that we can interpret as having been transferred from one character to the other. An example is T2 (marius), the strongest one with Marius before Phase II, which gains the most for Valjean after. The same goes for T1 (valjean), this time from Valjean to Marius. Second, there are topics that have entered the characters’ states exogenously, i.e. those that were not strongly associated with either character. They represent new common experiences or interests that occur during the interactions: T4 (cosette), T7 (revolution), and T25 (garden) are such cases. They again reflect the story accurately: Cosette becomes the focal point of both characters, as a new love interest for Marius that causes severe anxiety to Valjean. Some topics enter only one character’s state, such as T45 (mabeuf), which is about a character Mabeuf who shares his story with Marius at the barricade but has little to do with Valjean; Valjean’s topical state indeed has a near-zero component of T45. Next, during Phase III, T11 (rue), T24 (barricade), T46 (hucheloup), T42 (revolt), and T28 (gavroche) gain the most strength for Marius, reflecting the events and the characters he experiences during that time. Valjean is absent. Finally, during Phase IV, T3 (enjolras), T2, T28 (gavroche), T24 (barricade), and T1 gain the most strength with Valjean, whereas topics T3, T1, T28, T35 (sand), and T12 (javert) gain the most strength with Marius. Note how the directionality of T28 and T24 from Marius to Valjean reflects the actual way things happen between the characters: Gavroche (T28), a friend of Marius’, carries a letter from Marius to Valjean that motivates Valjean to join the barricade (T24) in search of Marius. Our discussion here about topic transfers and entry can be systematically visualized as in Fig. 10 (B) on top of the basic interaction timeline first introduced in Fig. 1, showing that the textual information indeed allows us to construct a much more detailed picture of an interaction than a simple occurrence-based network construction.
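The two measurements used in this analysis, the correlation between two characters' topical states and the topics that gain the most strength across a phase, can be sketched as follows. The text does not name the correlation coefficient; Pearson's is assumed here. The four-topic state vectors are hypothetical, and the helper names are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length state vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def top_gains(before, after, k):
    """Indices of the k topics whose strength grows the most."""
    delta = [b - a for a, b in zip(before, after)]
    order = sorted(range(len(delta)), key=lambda i: delta[i], reverse=True)
    return order[:k]

# Hypothetical topical states over four topics (index 0: the first
# character's own topic, index 1: the second character's own topic).
first_before = [0.7, 0.0, 0.2, 0.1]
second_before = [0.0, 0.8, 0.1, 0.1]
first_after = [0.4, 0.2, 0.3, 0.1]
second_after = [0.2, 0.4, 0.3, 0.1]

r_before = pearson(first_before, second_before)
r_after = pearson(first_after, second_after)
gains_first = top_gains(first_before, first_after, k=2)
```

In this toy case the correlation moves from negative to positive after the interaction, and the topic that gains the most for the first character is the second character's own topic, which is the kind of change read as a topic transfer in the text.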

The results provided in this section show that the story of a narrative can be identified, quantified, analyzed, and visualized by making use of appropriate analytical and computational tools. We believe they demonstrate the benefits and opportunities of approaching traditional subjects such as narrative from a novel perspective that allows us to find new patterns and gain a richer understanding not readily available previously.

[Figure 9. Panel (A): Strongly Associated Topics for Valjean and Marius. Topic 1 keywords: valjean, jean, lantern, light, perceive, escape, old, wall, day, galley, patrol, come, life, place. Topic 2 keywords: marius, father, say, letter, know, man, come, address, eponine, tell, think, promise, baron, hand. Panel (B): Strongly Associated Topics for Communities I–VII, each labeled with a list of five topics.]

FIG. 9: Topics Associated with Valjean, Marius, and The Communities. (A) Topics strongly associated with Valjean (left) and Marius (right). The topic-character association strengths are scaled so that the largest value fills the space between the two circles. Topic T1 is the most relevant to Valjean, while Topic T2 is the most relevant to Marius. They contain the respective character names as the strongest keywords, but also contain words closely related to each character. (B) Topics strongly associated with the communities in Fig. 2. The topics can be about the characters inside the community, or even from the outside as long as they are sufficiently associated with multiple members of the community. For instance, T2 (marius), T29 (sister, fantine), and T19 (fauchelevent) are strongly associated with Community I, although the characters belong to other communities. The topics can also be the events involving the community members, for instance T41 (the trial – attorney, jury) for Community II composed of Javert and Valjean’s fellow prison inmates.

IV. DISCUSSIONS AND CONCLUSIONS

In this paper we proposed a network-based framework for modeling a narrative by focusing on the characters and their interactions. We started by representing a narrative as a set of interacting character timelines, from which we constructed a growing character network. To legitimize our approach it was necessary to understand how the character network topology and dynamics reflected the narrative structure correctly. We found that character centralities captured the role and the nature of the social spheres of characters in the narrative, while the temporal growth of the network showed distinct phases with differing patterns of increasing nodes or edges, depending on whether the narrative was focusing on isolated characters (stagnant growth), expanding the story world by introducing new characters (growth led by number of nodes), or bringing existing characters together in the build-up to the resolution (growth led by number of edges).
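The three growth patterns named above can be illustrated with a small Python sketch that labels each step of a growing network from its cumulative node and edge counts. The decision rule, the `min_growth` threshold, and the sample counts are hypothetical choices made for this illustration; they are not the criteria used in the analysis summarized here.

```python
def classify_growth(nodes, edges, min_growth=1):
    """Label each step between consecutive snapshots of a growing network.

    nodes, edges: cumulative numbers of characters and interactions.
    min_growth: assumed threshold; a step with fewer new nodes plus new
    edges than this is treated as stagnant.
    """
    labels = []
    for i in range(1, len(nodes)):
        new_nodes = nodes[i] - nodes[i - 1]
        new_edges = edges[i] - edges[i - 1]
        if new_nodes + new_edges < min_growth:
            labels.append("stagnant")   # story dwells on isolated characters
        elif new_nodes >= new_edges:
            labels.append("node-led")   # story world expands with new characters
        else:
            labels.append("edge-led")   # existing characters converge
    return labels

# Hypothetical cumulative counts at six points in a narrative.
nodes = [2, 2, 5, 9, 9, 10]
edges = [1, 1, 3, 6, 12, 20]
phases = classify_growth(nodes, edges)
```

A real analysis would likely smooth the counts over windows of narrative time before labeling, since single steps are noisy; the sketch omits that for brevity.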

An important characteristic of well-written drama is that it evokes emotion in the reader, which in the western literary tradition is conventionally represented by the generic division of drama into comedy and tragedy. This had an interesting connection to a modern computational methodology called sentiment analysis. We found that many characters, especially the central ones, showed significant fluctuations of sentiments during the narrative flow, acting as the carriers of mood and emotions of the narrative. This was true of character relationships as well, and we showed how the sentimental fluctuations correlated with the narrative progression that showed detectable patterns of dramatic tension build-up and resolution.

[Figure 10. Panel (A): Major Topics for Valjean and Marius, shown after Phases I–IV for Marius and after Phases I, II, and IV for Valjean (absent during Phase III). Panel (B): Timeline of Interaction between Valjean and Marius across Phases I–IV, with the first and second interactions marked; sequence ranges labeled along the timeline: 1-5, 7-11; 12-13, 15; 16-17; 18-21. Legend: timeline of a character; topic transfer between characters; topic input from outside; newly incorporated topics during the interaction.]

FIG. 10: Mapping Out Interactions Diagrammatically As Dynamic Topic Exchange. Events lead to character transformation, which we quantify via the character’s topical states. With respect to the interaction between Valjean and Marius, we divide Les Miserables into four phases. (A) The net changes in the topical states of the characters at the end of each phase, quantified by the differences ∆tαk. After Phase II, T2 (the strongest topic for Marius before) shows a sharp increase for Valjean. Likewise, T1 (Valjean’s strongest topic before) shows a sharp increase for Marius. T4, T7, and T25 increase for both characters, while T45 increases only for Marius. (B) Diagrammatic representation of the changes of Marius’ and Valjean’s topical states as ‘topic transfers’ during each phase. Topics can be exchanged between characters (e.g., T1 and T2 during Phase II) or enter either character’s topical state exogenously (e.g., T4, T7, T25, and T45). The directions can also reflect those of actual story elements: during Phase IV (Chapters 296–365), Valjean, prompted by a letter from Marius, joins the barricade. This is directly reflected in the transfer of topics T24 and T28 from Marius to Valjean.

Finally, we used topic modeling as a way to define the state of a character via the topics (keywords) with which they are associated at various points in the narrative. This allowed us to trace quantitatively the changes in characters’ states, and quantify and map out the details of an event or an interaction between characters. We also demonstrated that the flow of topics between characters can reflect the actual story in interesting ways, providing us with a way to systematically represent the patterns of character interactions that previously resided in the text of the narrative.
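Note [51] describes the topic decomposition as an approximate nonnegative factorization of a matrix M into Q and H, with the reconstruction error measured by the squared Frobenius norm. A minimal sketch of such a factorization, using the multiplicative update rules of Lee and Seung [43], might look as follows. The toy matrix, the number of topics, the iteration count, and the reading of rows as text units and columns as words are all assumptions of this example, not the settings used in the paper.

```python
import numpy as np

def nmf(M, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Approximate M by Q @ H with nonnegative Q and H.

    Uses multiplicative updates for the squared Frobenius objective.
    eps is a small constant added to denominators to avoid division by zero.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = M.shape
    Q = rng.random((n_rows, n_topics)) + eps
    H = rng.random((n_topics, n_cols)) + eps
    for _ in range(n_iter):
        H *= (Q.T @ M) / (Q.T @ Q @ H + eps)
        Q *= (M @ H.T) / (Q @ H @ H.T + eps)
    return Q, H

def reconstruction_error(M, Q, H):
    """Squared Frobenius norm of the difference between M and Q @ H."""
    return float(np.linalg.norm(M - Q @ H, ord="fro") ** 2)

# Hypothetical 4 x 4 count matrix with two obvious blocks:
# rows stand for text units, columns for words (an assumed layout).
M = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [4.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 2.0, 6.0],
])

Q, H = nmf(M, n_topics=2)
error = reconstruction_error(M, Q, H)
```

Because the updates only multiply by nonnegative ratios, Q and H stay nonnegative throughout, which is what makes the rows of one factor readable as keyword weights of a topic.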

We believe that our paper presents a wide range of ideas for studying narrative structures that merit further exploration using the methods of network science, data analysis, and computational linguistics. Looking further, representing a narrative as a dynamically unfolding system of character networks and interactions also sets the stage for using theories and tools for understanding dynamical systems, not only networks. Advances in this area have practical implications as well, such as improved algorithms for computer-assisted writing and storytelling, which no doubt can benefit from a more robust understanding of the patterns of character relationships and interactions. Given the ubiquity and importance of narratives, we hope that future developments based on our work will be beneficial for a wide range of fields including literature, communication, and storytelling.

Acknowledgments

The authors would like to thank Kyungyeon Moon, Wonjae Lee, and Bong Gwan Jun for helpful comments. This work was supported by the National Research Foundation of Korea (NRF-20100004910 and NRF-2013S1A3A2055285), BK21 Plus Postgraduate Organization for Content Science, and the Digital Contents Research and Development program of MSIP (R0184-15-1037, Development of Data Mining Core Technologies for Real-time Intelligent Information Recommendation in Smart Spaces).

[1] J.-B. Michel, Y. K. Shen, A. P. Aiden, A. Veres, M. K. Gray, J. P. Pickett, D. Hoiberg, D. Clancy, P. Norvig, J. Orwant, et al., Science 331, 176 (2011).

[2] Project Gutenberg, URL https://www.gutenberg.org/.

[3] P. S. Dodds, E. M. Clark, S. Desu, M. R. Frank, A. J. Reagan, J. R. Williams, L. Mitchell, K. D. Harris, I. M. Kloumann, J. P. Bagrow, et al., Proceedings of the National Academy of Sciences 112, 2389 (2015).

[4] M. Schich, C. Song, Y.-Y. Ahn, A. Mirsky, M. Martino, A.-L. Barabasi, and D. Helbing, Science 345, 558 (2014).

[5] D. Kim, S.-W. Son, and H. Jeong, Scientific Reports 4, 7370 (2014).

[6] D. Park, A. Bae, M. Schich, and J. Park, EPJ Data Science 4, 1 (2015).

[7] M. Newman, Networks: An Introduction (Oxford University Press, 2010).

[8] R. Albert and A.-L. Barabasi, Reviews of Modern Physics 74, 47 (2002).

[9] D. Easley and J. Kleinberg, Networks, Crowds, and Markets: Reasoning About a Highly Connected World (Cambridge University Press, 2010).

[10] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques (Elsevier, New York, 2011).

[11] L. A. Adamic and B. A. Huberman, Science 287, 2115 (2000).

[12] R. Albert, H. Jeong, and A.-L. Barabasi, Nature 401, 130 (1999).

[13] J. H. Choi, G. A. Barnett, and B.-S. Chon, Global Networks 6, 81 (2006).

[14] S. P. Borgatti and P. C. Foster, Journal of Management 29, 991 (2003).

[15] V. Grimm, E. Revilla, U. Berger, F. Jeltsch, W. M. Mooij, S. F. Railsback, H.-H. Thulke, J. Weiner, T. Wiegand, and D. L. DeAngelis, Science 310, 987 (2005).

[16] F. Moretti, New Left Review 81, 80 (2011).

[17] F. Moretti, Distant Reading (Verso, New York, 2013).

[18] Pamphlets by Stanford Literary Lab, URL https://litlab.stanford.edu/pamphlets/.

[19] Box office mojo, URL http://www.boxofficemojo.com/.

[20] The Numbers, URL http://www.the-numbers.com/.

[21] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, 026113 (2004).

[22] D. K. Elson, N. Dames, and K. R. McKeown, in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (Association for Computational Linguistics, 2010), pp. 138–147.

[23] P. M. Carron and R. Kenna, EPL 99, 28002 (2012).

[24] P. Mac Carron and R. Kenna, The European Physical Journal B 86, 1 (2013).

[25] D. Kydros, P. Notopoulos, and G. Exarchos, International Journal of Humanities and Arts Computing 9, 115 (2015).

[26] M. C. Waumans, T. Nicodeme, and B. Hugues, PLoS ONE 10, e0126470 (2015).

[27] A. Welsh, Nineteenth-Century Fiction 33, 8 (1978).

[28] S. Rimmon-Kenan, Narrative Fiction: Contemporary Poetics (Routledge, London, 2003).

[29] M. Bal and C. V. Boheemen, Narratology: Introduction to the Theory of Narrative (University of Toronto Press, 2009).

[30] H. P. Abbott, The Cambridge Introduction to Narrative (Cambridge University Press, 2008).

[31] S. Field, Screenplay: The Foundations of Screenwriting (Delta, New York, 2007).

[32] C. Vogler, The Writer’s Journey (Michael Wiese Productions, Seattle, 2007).


[33] V. Propp, Morphology of the Folktale (University of Texas Press, Austin, Texas, 2010).

[34] Moviegalaxies, URL http://www.moviegalaxies.com.

[35] Les Miserables, URL http://www.gutenberg.org/files/135/135-h/135-h.htm.

[36] G. Freytag, Freytag’s Technique of the Drama: An Exposition of Dramatic Composition and Art (Scholarly Press, 1896).

[37] Y. R. Tausczik and J. W. Pennebaker, Journal of Language and Social Psychology 29, 24 (2010).

[38] P. Goncalves, M. Araujo, F. Benevenuto, and M. Cha, in Proceedings of the first ACM conference on Online social networks (ACM, 2013), pp. 27–38.

[39] D. Jurgens and K. Stevens, in Proceedings of the ACL 2010 System Demonstrations (2010), pp. 30–35.

[40] T. Van de Cruys and M. Apidianaki, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (2011), pp. 1476–1485.

[41] K. Stevens, P. Kegelmeyer, D. Andrzejewski, and D. Buttler, in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (Association for Computational Linguistics, 2012), pp. 952–961.

[42] D. D. Lee and H. S. Seung, Nature 401, 788 (1999).

[43] D. D. Lee and H. S. Seung, in Advances in Neural Information Processing Systems (2001), pp. 556–562.

[44] W. Xu, X. Liu, and Y. Gong, in Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval (ACM, 2003), pp. 267–273.

[45] Y. Zhao and G. Karypis, Machine Learning 55, 311 (2004).

[46] R. McKee, Substance, Structure, Style, and the Principles of Screenwriting (HarperCollins, New York, 1997).

[47] S. Min and J. Park, in Complex Networks VII: Studies in Computational Intelligence (2016), pp. 257–266.

[48] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications (Cambridge University Press, 1994).

[49] P. V. Marsden, Annual Review of Sociology pp. 435–463 (1990).

[50] As a mature-rated show, the titular character’s identity and the overarching plot of the drama may be discomforting and too cruel for some readers to describe here; we refer the interested reader to appropriate sources for more information.

[51] The decomposition is also approximate in practice, i.e., M ≃ QH, with the difference between M and QH (called the reconstruction error) measured by the squared Frobenius norm.

[52] Although our network appears denser than typical social networks [48, 49], this is likely due to the fact that most characters of the novel are involved in some common plot while the rest of the story world is pushed into the background.

