Discovering hidden motifs in ISIS jihadi texts by using text mining
1) Project abstract
The Islamic State of Iraq and Sham (ISIS) has become a growing worldwide concern due to
its brutal attacks and unconventional recruitment strategy. Emerging in the middle of the
Syrian civil war, ISIS has distinguished itself from other jihadi groups in the region,
including the Al-Qaeda-affiliated Al-Nusra Front. It has been observed that ISIS is not only
successful in managing multicultural manpower, but also, compared to other groups, highly
capable of attracting foreign fighters through its improved “netwar” skills.
Earlier studies demonstrated how the group operates and manages human resources.
They have also shown how the organization disseminates propaganda videos, images, and
texts over the Internet by adopting social media platforms. Due to its persistent online
presence, ISIS remains a global threat not only to the people of the Middle East, but also
to Western youth, many of whom have already chosen to fight for ISIS. Whereas propaganda
videos and images have been analyzed for propaganda techniques, a systematic analysis of
textual content is still lacking. Therefore, we plan to examine ISIS texts to discover
propagandist elements that can have an impact on Western individuals. We will use three
different data sources: the ISIS periodical Dabiq, and Twitter and Web postings of
supporters. Using text mining, natural language processing, and topic modeling
algorithms, we aim to extract features from those texts and interpret our findings in light
of the existing literature, news, and religious resources. Hence, we will be able to learn
what types of propaganda ISIS adopts, which incidents can be linked to textual content, and
how the group frames Islamic motifs.
2) Description of project
a) Literature review
The Islamic State of Iraq and Sham (ISIS) emerged as a rebel organization in the
Syrian war, which is considered the most socially mediated conflict in history [1]. In
pursuit of a Caliphate, ISIS is not only at war with the Syrian and Iraqi governments, but
also in competition with other jihadist groups such as the Al-Qaeda-affiliated Nusra Front [2].
This ambition for territorial expansion has resulted in extreme violence on the ground and
spread fear throughout the region.
In addition to the struggle on the physical battlefield, there has been a battle in the
digital field, in which ISIS has striven to distribute its message in different forms, such as
the beheading videos of two American journalists in orange jumpsuits [3]. Such “theater of
terror” has long been used to intimidate particular audiences [4], but disseminating that
message through social media and jihadi forums has become increasingly common since the
death of Osama bin Laden [5]. Whereas earlier means of propaganda made terrorist
organizations dependent on mainstream media, as Al Qaeda depended on Al Jazeera, current
social media platforms have eliminated such dependency [6] and opened room for both
propaganda and recruitment.
As jihadists are encouraged to show up on the digital battlefield [5], the extensive
online presence of ISIS demonstrates how widely the new media paradigm has been accepted
among its supporters. Indeed, being resilient and acting in small, dispersed groups on the
Internet, ISIS seems to coordinate and conduct campaigns successfully, as anticipated by the
concept of “netwar” introduced by Ronfeldt and Arquilla [7]. The authors deliberately
distinguish netwar from cyberwar: netwar relies on societal-level involvement of people in a
conflict, whereas cyberwar involves military forces [8].
Although it is a major security threat to Western countries, ISIS still needs
manpower to reach its goals. In addition to local fighters in the region, the organization
seeks to recruit foreign fighters, including European citizens. In the literature, the term
foreign fighter is defined, with slight differences, by David Malet and Thomas Hegghammer.
While Malet defines foreign fighters as “non-citizens of conflict states who join insurgencies
during civil conflicts” [9], Hegghammer gives the following definition: “an agent who (1)
has joined, and operates within the confines of, an insurgency, (2) lacks citizenship of the
conflict state or kinship links to its warring factions, (3) lacks affiliation to an official
military organization, and (4) is unpaid” [10]. In light of these two definitions, in this
study we refer to any individual who leaves his or her home country to join ISIS as a
foreign fighter.
Whereas Al-Qaeda in Iraq failed to manage foreign recruits, ISIS behaves like a
multinational corporation that successfully recruits foreign fighters, including those with
good jobs in the West [11], and integrates multicultural manpower [12]. The group first
separates foreigners from natives and assesses their skills in order to allocate human
resources appropriately. For example, if a foreign recruit has social media skills but lacks
combat skills, a job is assigned accordingly [13]. In addition to these technically talented
people, wives and young female supporters help keep the recruitment cycle running with
their linguistic skills in the back office. Consequently, ISIS succeeds in both attracting and
employing foreign fighters, mostly due to its persistent online presence, which is in turn
sustained by this manpower [6, 14].
Studies have shown that ISIS has an effective propaganda machine, which not only
publicizes its brutality, but also demonstrates members’ familiarity with the Western
lifestyle [2], as illustrated in Figure 1. Furthermore, its discourse includes themes such
as governance and justice, which are presented in English with well-chosen music and
sound legitimate to Western youth [15]. As a result, even women have fled to Syria [16].
However, the reasons for leaving home vary among foreign fighters and cannot be
explained by a single factor. While some are motivated by millennial-apocalyptic
promises [17], others may be searching for identity or wish to impress their local
community, and they are not necessarily all alienated socio-economic underperformers.
In fact, some participants, especially those coming from the UK, were reported to have
wealthier backgrounds [2].
Figure 1. A French fighter in Syria holds a jar of Nutella¹, symbolizing familiarity with the
Western lifestyle.
While there is no single reason for a foreign fighter to support or join ISIS, the
paradigm shift from mainstream media to online social media platforms has undeniably
played a major role in attracting young people [18]. In particular, the increasingly
popular Twitter, which shields users through anonymity [19], allows unregulated
conversation, in contrast to earlier Internet forums that could be policed [20]. Thus,
information from the battleground is spread across the globe through disseminator
accounts, which are mostly fluent in both Arabic and English and take time to interact
with their followers. Official accounts, on the other hand, which are the source of the
message and primarily tweet in Arabic [19], are not actively involved in conversations
with followers.
Earlier work on ISIS has addressed several questions about its modus operandi,
financial resources, and structure [21]. It has also been shown how effective ISIS has
been at recruitment through E-Jihad methods [22]. Quiggle further highlighted that ISIS
disseminators are excellent narrators and that the symbols in their videos are chosen very
carefully to convey the desired message to enemies and to members of the group [23].
Al-Khateeb and Agarwal investigated how these beheading videos were distributed across
Twitter and found that bot accounts were also used to disperse the content [24]. Ingram
and Colas studied the ISIS periodical Dabiq, another propaganda outlet targeting
English-speaking audiences [25-27]. These authors discussed the structure, content, and
potential purposes of Dabiq, but did not use any text mining approach.
¹ https://twitter.com/GuyVanVlierden/status/479582210413305856
As for automated text analysis, Vergani and Bliuc analyzed the first eleven issues of
Dabiq using Linguistic Inquiry and Word Count (LIWC) [28] to count words associated with
categories such as achievement, affiliation, and power [29]. Ghajar-Khosravi et al.
examined the tweets posted by 14 ISIS “fangirls”, a term introduced for female
sympathizers of ISIS [30], by counting occurrences of selected words that were assumed to
capture the sentiment of the tweets [31]. Ashcroft et al. worked on classifying tweets to
distinguish jihadi texts, but they did not focus specifically on ISIS [32].
b) Contribution
The main contribution of this project will be a comprehensive automated analysis of ISIS
texts in English that contain propaganda elements and ultimately aim to recruit Western
youth. Earlier studies, which mostly focus on videos and images, only roughly categorized
ISIS texts to report propaganda elements [15]. Very few studies have performed text
mining on ISIS-related texts [30, 31], and those studies were limited to either word
frequencies or numbers of Twitter users. For a comprehensive textual analysis, we will
apply a series of text mining and natural language processing (NLP) techniques to
various ISIS texts (Dabiq, Twitter, Web), which can help us uncover the propaganda motifs.
c) Purpose and Problem Statement
Tweets and other short links on the Internet are often used merely to refer individuals to
the actual message, which may sometimes be a longer article rather than a video or image,
and that content is tailored for the purpose of propaganda. ISIS turns out to be effective
not only in disseminating its message, but also in tailoring the content so that the group
seems more appealing than any alternative to those who dare to leave their homes. This
tailoring stage calls for reverse engineering that can identify patterns in ISIS jihadi texts
such as Dabiq. Therefore, using text mining techniques, we plan to investigate how the
content is built and which Islamic symbols or discourses are exploited. In doing so, our
study will serve a productive strategy [33], which makes use of the information that
violent extremists share with others rather than censoring the content.
d) Objectives
I. To determine how propagandist texts exploit Islamic values or other universal
values to attract foreign fighters, especially Western youth. To achieve this, we
plan to systematically go through the ISIS periodical Dabiq first and derive a word
distribution that will tell us which values/symbols are emphasized to strengthen the
propaganda. For instance, do they use “justice” more than “war” to frame their
message?
II. To assess the extent to which fear and violence are used to carry their messages. It is
already known that ISIS mostly uses videos or images showing extreme violence
for propaganda. However, the group is also effective in using the blogosphere, where
supporters post longer texts and promote them via Twitter or justpaste.it². Likewise,
Dabiq has rich textual content even though it also reserves space for images. We aim
to detect terms related to violence in ISIS texts and identify any co-occurrence of
those terms with words from the holy scriptures.
III. To identify the holy texts by which they are most inspired. Since ISIS also has to
keep the group cohesive and attract local fighters, its narratives mostly include
either verses from the Qur’an or hadiths, which are the sayings of the Prophet
Mohammed. Our goal is to reveal the pattern of these holy words to understand
how ISIS interprets Islam and how it wants Islam to be perceived. We will then be
able to see how this interpretation aligns with other Islamic schools or sects.
IV. To understand which religious groups or communities they target. ISIS ideology
strongly rejects even other Islamic schools, and the group suppresses individuals
once it invades a new territory. During its territorial expansion for the Islamic
Caliphate, local people have been the primary target of its brutal attacks, and
other incidents such as the Paris attacks have also been attributed to the same
group. Any other country might be under the same threat, which places the group
in the spotlight as a growing global concern. For these reasons, we see value in
examining ISIS digital documents to find out whether past targets or possible
future incidents are mentioned in the textual content. By applying natural language
processing (NLP) tools [34], we expect to recognize locations and other faith or
religious groups mentioned in the text.
² https://justpaste.it/
V. To integrate different temporal data sources to reveal any possible correlation
between the group’s attacks and its narratives. Thus far, we have obtained digitized
versions of all Dabiq issues [35], and we are in the process of crawling Twitter and
the blogosphere to collect postings that promote ISIS. With these various data
sources, we plan to perform an integrated data analysis to determine whether the
content of the datasets is aligned in time. Specifically, we plan to employ a topic
modeling approach [36], which can summarize the topics/themes of the documents.
Based on the similarities, we hope to elucidate the factors that led to attacks
conducted by the group.
VI. To analyze the sentiments in ISIS jihadi texts. The primary goal of a propagandist
jihadi text is to influence the audience, and the Web is a domain with plenty of
opinion- and emotion-laden postings [37]. Such directional or opinionated text is a
strong factor in shaping perceptions [38]. By using sentiment analysis tools
developed to measure the polarity of a sentence or document, we expect to learn
how the authors embed emotions in the text to change readers’ decisions.
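Objective I asks, for example, whether “justice” is used more than “war”. Raw counts favor longer issues, so one way to compare terms is a rate per 10,000 tokens. The sketch below is a minimal illustration, not the project’s implementation. It assumes each issue is available as a plain-text string. The function names and toy sentences are hypothetical and are not drawn from Dabiq.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rate_per_10k(text: str, terms: list[str]) -> dict[str, float]:
    """Occurrences of each term per 10,000 tokens, so issues of different length are comparable."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens)
    return {t: counts[t] * 10_000 / total if total else 0.0 for t in terms}


if __name__ == "__main__":
    # Toy stand-ins for two issues; real input would be the digitized Dabiq text.
    issues = {
        "issue_A": "Justice will prevail. The war continues, but justice is our call.",
        "issue_B": "War, war and more war.",
    }
    for name, text in issues.items():
        print(name, rate_per_10k(text, ["justice", "war"]))
```

The base of 10,000 only makes the numbers easier to read. Comparing the terms within an issue, or one term across issues, gives the same ranking with any base.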
e) Research Methodology
In our study, we adopt a series of text mining and natural language processing (NLP)
techniques to perform a wide-ranging analysis of ISIS texts. By applying these
algorithms appropriately to the various datasets, we plan to achieve the objectives listed
above. We summarize our methodological work in Figure 2 and explain it in the
following paragraphs.
Figure 2. Aspects of the proposed research, along with the techniques and the outcomes
Our initial analysis will start with a thorough exploration of the Dabiq issues, which
have been published since July 2014, shortly after ISIS declared the Caliphate. We will
first examine the frequencies of the words used in the articles authored by ISIS
volunteers. Next, we will examine word co-occurrence to determine whether any groups of
words are intentionally used together in the same sentence, paragraph, or article. To this
end, we will use N-gram models [39], which consider sequences of N consecutive words.
Thus, we will learn how often particular sequences of N words recur to highlight a
message. The same approach can also help us predict terms that are likely to follow a
given sequence of terms. The third approach captures co-occurrences or associations
regardless of word order. It reveals sets of terms that frequently appear together in an
article, which is also expected to give further insight into authorship. To accomplish
this, we will employ frequent itemset and association rule mining. Together, these three
angles are expected to help us address objectives I and II, since they will determine the
frequency of terms and their co-occurrence/association with extremism- or faith-related
words.
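The N-gram and frequent-itemset steps can be illustrated in a few lines. The sketch below is a simplification under stated assumptions:

- It counts N-grams over a token list.
- It mines association rules only between pairs of terms, by brute force. This is the two-item case of what Apriori-style frequent itemset mining does at scale.
- Each “transaction” is assumed to be the set of terms in one sentence, paragraph, or article.
- The support and confidence thresholds are illustrative, not values chosen by the project.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Return all n-grams (tuples of n consecutive tokens) in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Mine association rules X -> Y between single terms.

    support(X, Y) = fraction of transactions containing both X and Y;
    confidence(X -> Y) = count(X, Y) / count(X).
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is not frequent enough to form a rule
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    tokens = "the land of the caliphate and the land of hijrah".split()
    print(Counter(ngrams(tokens, 2)).most_common(3))
    paragraphs = [{"justice", "law"}, {"justice", "law", "war"}, {"war"}, {"justice", "law"}]
    print(pair_rules(paragraphs))
```

Dropping the adjacency requirement is what separates the itemset view from the N-gram view. In the example, “justice” and “law” are counted together whenever they share a paragraph, wherever they occur in it.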
In Dabiq, we observe that ISIS cites various verses or hadiths within the text and
blends these passages with its own words to convey a propagandist message. Using the
above techniques, we will focus on these quoted holy texts and compare the occurrence
and co-occurrence of their words with those of the whole article. Further, we plan to
perform a contextual evaluation of the verses to determine whether the ideas they are cited
for are actually dominant in their original chapters of the Qur’an. Specifically, we plan to
implement a topic modeling approach [36]. This approach models each text as a weighted
(probabilistic) mixture of topics, each topic being a probability distribution over words,
and produces groups of words that represent a topic or theme. By examining the generated
topics, we will detect whether a cited verse has been deliberately taken out of its original
context. This comparative analysis will help us achieve objective III, which aims to
demonstrate how ISIS interprets Islam and frames the religion for propaganda purposes.
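The topic model cited as [36] is Latent Dirichlet Allocation (LDA). The sketch below makes the “mixture of topics” idea concrete by fitting LDA with collapsed Gibbs sampling, a standard inference method for LDA. Blei et al. [36] used variational inference instead, and a maintained library implementation would normally be preferred for real corpora. The hyperparameters (`alpha`, `beta`, `iters`) and the function names are illustrative assumptions.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (theta, phi, vocab): theta[d][t] is the topic mixture of document d,
    and phi[t][v] is the probability of word vocab[v] under topic t.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    ndt = [[0] * k for _ in docs]            # tokens of document d assigned to topic t
    ntw = [[0] * n_words for _ in range(k)]  # tokens of word w assigned to topic t
    nt = [0] * k                             # total tokens assigned to topic t

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = [rng.randrange(k) for _ in doc]
        for w, z in zip(doc, zs):
            ndt[d][z] += 1
            ntw[z][index[w]] += 1
            nt[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, z = index[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                ndt[d][z] -= 1
                ntw[z][wi] -= 1
                nt[z] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndt[d][t] + alpha) * (ntw[t][wi] + beta) / (nt[t] + n_words * beta)
                    for t in range(k)
                ]
                z = rng.choices(range(k), weights=weights)[0]
                assign[d][i] = z
                ndt[d][z] += 1
                ntw[z][wi] += 1
                nt[z] += 1

    theta = [[(ndt[d][t] + alpha) / (len(doc) + k * alpha) for t in range(k)]
             for d, doc in enumerate(docs)]
    phi = [[(ntw[t][v] + beta) / (nt[t] + n_words * beta) for v in range(n_words)]
           for t in range(k)]
    return theta, phi, vocab


def top_words(phi, vocab, n=5):
    """The n most probable words of each topic, similar to the word lists in Table 1."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]
```

Under this sketch, a cited verse and its original chapter could each be represented by a row of `theta`. A large gap between the two mixtures would flag a verse worth checking for being quoted out of context.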
Using further NLP tools that perform named-entity recognition based on the sequence
of words, we will first identify entities such as person, community, and group names, as
well as locations. Among those entities, previously targeted ones will be labeled as the
positive group. Then, we will revisit the text and use N-gram models to inspect the words
in the neighborhoods of all discovered entities, regardless of their group label. In doing
so, we expect to discover patterns of wording around the positive group. Further, we will
extend the neighborhood to sentences and paragraphs, which will allow us to measure the
expressed opinion using sentiment analysis. The resulting features from these two
approaches will be quantified and used to check for any discrepancies or similarities
between past and potential targets, as set out in objective IV.
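The neighborhood step can be sketched as follows. A real pipeline would use a trained tagger such as the one in [34]. Here a tiny hypothetical gazetteer stands in for it and marks entity mentions. The words within a fixed window around each mention are then counted separately for the positive (previously targeted) group and for the remaining entities. The entity list, the window size, and the example sentence are all assumptions.

```python
from collections import Counter

# Hypothetical gazetteer standing in for a trained named-entity recognizer;
# entries and labels are illustrative only.
GAZETTEER = {"paris": "LOCATION", "rome": "LOCATION", "baghdad": "LOCATION"}


def entity_contexts(tokens, gazetteer=GAZETTEER, window=2):
    """Count the words within `window` positions on either side of each entity mention."""
    contexts = {}
    for i, tok in enumerate(tokens):
        if tok in gazetteer:
            neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            contexts.setdefault(tok, Counter()).update(neighbors)
    return contexts


def split_by_label(contexts, positive):
    """Pool context counts for the positive (previously targeted) group and for the rest."""
    pos, rest = Counter(), Counter()
    for entity, counts in contexts.items():
        (pos if entity in positive else rest).update(counts)
    return pos, rest


if __name__ == "__main__":
    tokens = "the operation in paris was praised and rome was named next".split()
    pos, rest = split_by_label(entity_contexts(tokens), positive={"paris"})
    print("shared context words:", set(pos) & set(rest))
```

Words that appear around both past and potential targets are the kind of overlap objective IV is after. Widening `window` to a sentence or paragraph would feed the sentiment step instead.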
To achieve the goals highlighted in objective V, we will follow the same procedure for
the Web content and report our content-specific findings. However, because Web data are
more dynamic, we need to consider the timing of events such as the release of beheading
videos and attacks. At this stage, we plan to use an extension of topic modeling,
dynamic topic modeling [40], which tracks how topics evolve over time. Table 1
illustrates our preliminary findings from a topic modeling experiment that we conducted
on Dabiq issues. We inferred the topics/themes by inspecting the groups of words
returned by the model. Similarly, dynamic topic modeling will enable us to observe how
topics change over time. Hence, we expect to measure the correspondence among the
content of Dabiq issues, the propaganda conducted on the Web, and real incidents within
particular time frames. Since topic models quantify associations as probabilities, the
similarity between narratives can also be measured to detect any correlation.
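Jensen-Shannon divergence between topic mixtures is one way to turn “similarity between narratives” into a number. This choice is our assumption, since no specific measure is fixed here. The sketch assumes that, for each time window (for example, a month), the per-document topic mixtures of Dabiq and of the Web postings have already been averaged into one distribution per source over the same topics. The window keys and values below are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for non-overlapping distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def similarity_by_window(source_a, source_b):
    """1 - JSD for every time window present in both sources (1 means identical mixtures)."""
    shared = sorted(source_a.keys() & source_b.keys())
    return {w: 1 - js_divergence(source_a[w], source_b[w]) for w in shared}


if __name__ == "__main__":
    dabiq = {"2015-01": [0.6, 0.3, 0.1], "2015-02": [0.2, 0.5, 0.3]}
    web = {"2015-01": [0.5, 0.4, 0.1], "2015-03": [0.1, 0.1, 0.8]}
    print(similarity_by_window(dabiq, web))  # only "2015-01" appears in both sources
```

Unlike plain KL divergence, the Jensen-Shannon form is symmetric and stays finite when one source assigns zero weight to a topic. This matters when comparing a monthly magazine with noisy Web postings.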
Table 1. Topics obtained from preliminary analysis of Dabiq issues.
Topic 1 (Middle East/politics) | Topic 2 (Jihadi discourse) | Topic 3 (Administrative)
Iraq                           | Allah                      | Iraq
Military                       | State                      | Hijrah
Syrian                         | Islamic                    | Dabiq
Iran                           | Muslims                    | Land
Political                      | War                        | Authority
Syria                          | Muslim                     | Remaining
Council                        | Soldiers                   | Tribes
War                            | People                     | Ummah
Since we hypothesize that the language used in jihadi texts follows a customized style
that can influence the audience emotionally, we plan to utilize sentiment analysis as
indicated in objective VI. Hence, we will obtain a score for each text that quantifies its
emotions on a scale. By using these scores and the topics obtained earlier, we will
observe which emotions are chosen to convey a particular propaganda theme. As a result,
we will be able to draw conclusions about authorship styles from the sentiment scores
and about the types of black propaganda by categorizing the discovered themes.
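A lexicon-based polarity score is one simple way to obtain a score per text. It is a stand-in here, since no specific sentiment tool has been chosen. The sketch assumes a tiny hypothetical lexicon; a real study would use a validated resource. It also assumes each document has already been assigned its dominant topic, for instance the largest entry of its LDA topic mixture.

```python
from collections import defaultdict

# Hypothetical toy lexicon (+1 positive, -1 negative); not a validated resource.
LEXICON = {"glory": 1, "victory": 1, "mercy": 1, "humiliation": -1, "fear": -1, "punishment": -1}


def polarity(tokens, lexicon=LEXICON):
    """Score in [-1, 1]: (positive - negative) / (positive + negative); 0 if no lexicon hits."""
    pos = sum(1 for t in tokens if lexicon.get(t, 0) > 0)
    neg = sum(1 for t in tokens if lexicon.get(t, 0) < 0)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def mean_polarity_by_topic(docs, dominant_topic, lexicon=LEXICON):
    """Average polarity of the documents, grouped by each document's dominant topic label."""
    groups = defaultdict(list)
    for tokens, topic in zip(docs, dominant_topic):
        groups[topic].append(polarity(tokens, lexicon))
    return {topic: sum(scores) / len(scores) for topic, scores in groups.items()}
```

Normalizing by the number of lexicon hits, rather than by document length, keeps short tweets and long articles on the same [-1, 1] scale. The trade-off is that texts with only one emotive word receive extreme scores.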
f) Timeline
Our project timeline is illustrated in Figure 3.
Figure 3. Timeline
g) Limitations
Our first limitation is that text data often include noise that needs to be filtered out. We
included a preprocessing step in our timeline to clean our data, but this may also cause
some information loss. Although this limitation applies to the whole text mining field, we
see value in emphasizing this point. Second, our research is limited to texts in English,
whereas it has been reported that most ISIS texts are in Arabic [15]. We are currently
exploring tools that can enable us to incorporate Arabic content; therefore, this remains a
future direction for our study.
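The trade-off between noise removal and information loss can be seen in a minimal cleaning function. The rules below are assumptions for illustration, not the project’s actual preprocessing:

- drop URLs,
- drop user mentions,
- optionally drop hashtags,
- remove words on a tiny illustrative stopword list.

```python
import re

# Tiny illustrative stopword list; a real pipeline would use a fuller list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are"}


def clean(text: str, drop_hashtags: bool = True) -> list[str]:
    """Lowercase, strip URLs and @mentions, handle hashtags, and remove stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)          # user mentions
    if drop_hashtags:
        text = re.sub(r"#\w+", " ", text)      # remove the hashtag and its word
    else:
        text = text.replace("#", " ")          # keep the word, drop only the '#'
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    post = "Read the new issue https://example.com/x @someone #Dabiq"
    print(clean(post))                       # the hashtag word is lost
    print(clean(post, drop_hashtags=False))  # the hashtag word survives
```

Hashtags often carry exactly the signal a study like this needs, such as a campaign or magazine name. An aggressive cleaning rule that discards them is a concrete case of the information loss described in the limitations.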
h) Participant Roles
The PI has extensive experience in the field of text mining, with several publications,
and will be employing the aforementioned algorithms and procedures. The PI will also
train a student assistant and guide him/her throughout the data collection and data
analysis stages. The PI will finalize and verify the findings and hold regular meetings
with the Co-Investigators.
Two external Co-Investigators from two different institutions will take part in the
project and contribute as follows:
• The first Co-Investigator has expertise in terrorism and criminal justice, with
years of experience and several publications. This person will interpret and
validate our findings from the data analysis.
• The second Co-Investigator is highly experienced in such algorithms and in
programming. We will benefit from this person’s expertise, and the
computational tasks in Figure 2 will also be shared among us.
The student will work under the supervision of the PI and will be required to
attend weekly meetings to report on the progress of data collection and analysis.
The rest of his/her time will be allocated to literature review and to writing assigned
sections of a potential conference/journal paper on which the student will be listed as a
co-author.
i) Professional Outcomes
• This research is expected to produce an exemplary work of computational social
science that brings together computer science, criminal justice, and terrorism studies.
• Through this interdisciplinary work, in which computer science applications will
address research questions in terrorism, the PI will gain experience in a new field
that can bring external funding opportunities.
• We will submit our findings with the aim of publishing a minimum of two papers in
peer-reviewed conferences or journals.
• The student assistant will gain new skills and experience not only in data analysis,
but also in programming. The student will be exposed to a real research
environment in which s/he is expected to publish papers.
j) Future Plans
Since the PI has experience in using the aforementioned techniques, this project will provide
the opportunity to apply that knowledge to the field of terrorism. After gaining insight into
this new field, our next goal is to collect and integrate Arabic text, which will complement
the current work in various respects. With these research aims in mind, we plan to seek
new funding opportunities from national institutions such as the Department of Defense
(DOD).
k) Quality of Life/Economic Development
Although there may not be a direct contribution to the UM-Flint community/campus, we
believe that this study can be informative for undergraduate students, since ISIS targets
a similar age group. More broadly, we hope to contribute to improved national security³ by
addressing the research questions above. Moreover, a new kind of interdisciplinary
research would be introduced to the UM-Flint community, which may stimulate more
inter-departmental collaboration between Computer Science and Criminal Justice faculty.
³ NSF Grant Proposal Guide; Chapter II, Section C.2.d.i, paragraph 3.
3) Bibliography
1. Lynch, M., D. Freelon, and S. Aday, Syria’s socially mediated civil war. United States Institute of Peace, 2014. 91(1): p. 1-35.
2. Gates, S. and S. Podder, Social Media, Recruitment, Allegiance and the Islamic State. Perspectives on Terrorism, 2015. 9(4).
3. Lee, D., James Foley: Extremists battle with social media, in BBC News. 2014.
4. Weichert, S.A., Gabriel Weimann: Terror on the internet. The new arena, the new challenges. Publizistik, 2007. 52(1): p. 130-131.
5. Prucha, N., Online Territories of Terror: Utilizing the Internet for Jihadist Endeavors. ORIENT IV, 2011: p. 46.
6. Klausen, J., Tweeting the jihad: Social media networks of Western foreign fighters in Syria and Iraq. Studies in Conflict & Terrorism, 2015. 38(1): p. 1-22.
7. Ronfeldt, D. and J. Arquilla, Networks, netwars and the fight for the future. First Monday, 2001. 6(10).
8. Arquilla, J. and D. Ronfeldt, The advent of netwar (revisited). Networks and netwars: The future of terror, crime, and militancy, 2001. 1382: p. 1.
9. Malet, D., Why Foreign Fighters?: Historical Perspectives and Solutions. Orbis, 2010. 54(1): p. 97-114.
10. Hegghammer, T., The rise of Muslim foreign fighters: Islam and the globalization of jihad. 2011.
11. Bloom, M., Constructing Expertise: Terrorist Recruitment and “Talent Spotting” in the PIRA, Al Qaeda and ISIS. Studies in Conflict and Terrorism, 2016.
12. Mapping Militant Organizations. [cited 2016 Feb 25]; Available from: http://web.stanford.edu/group/mappingmilitants/cgi-bin/groups/view/1.
13. Lake, E., Foreign Recruits are Islamic State’s Cannon Fodder. Bloomberg View, 2015.
14. Berger, J., The Metronome of Apocalyptic Time: Social Media as Carrier Wave for Millenarian Contagion. Perspectives on Terrorism, 2015. 9(4).
15. Zelin, A.Y., Picture Or It Didn’t Happen: A Snapshot of the Islamic State’s Official Media Output. Perspectives on Terrorism, 2015. 9(4).
16. Frankl, A., Moths to a Flame: Why Are Young British Women Drawn to The Islamic State? Journal of Promotional Communications, 2016. 4(1).
17. Musselwhite, M., ISIS & Eschatology: Apocalyptic Motivations Behind the Formation and Development of the Islamic State. 2016.
18. Lieber, P.S. and P.J. Reiley, Countering ISIS’s Social Media Influence. Special Operations Journal, 2016. 2(1).
19. Carter, J.A., S. Maher, and P.R. Neumann, #Greenbirds: Measuring importance and influence in Syrian foreign fighter networks. 2014.
20. Berger, J., War on error. Foreign Policy, 2014.
21. Arango, T., et al., How ISIS Works, in New York Times. 2014.
22. ISIS recruits fighters through powerful online campaign, in CBS News. 2014.
23. Quiggle, D., The ISIS Beheading Narrative. Small Wars Journal, 2015 [cited 2016 Feb 25]; Available from: http://smallwarsjournal.com/jrnl/art/the-isis-beheading-narrative.
24. Al-Khateeb, S. and N. Agarwal, Examining Botnet Behaviors for Propaganda Dissemination: A Case Study of ISIL’s Beheading Videos-Based Propaganda. in IEEE International Conference on Data Mining Workshop (ICDMW) 2015. 2015. Atlantic City, NJ, USA.
25. Colas, B., What Does Dabiq Do? ISIS Hermeneutics and Organizational Fractures within Dabiq Magazine. Studies in Conflict & Terrorism, 2016. 0(0): p. 1-18.
26. Ingram, H.J., An Analysis of Inspire and Dabiq: Lessons from AQAP and Islamic State’s Propaganda War. Studies in Conflict and Terrorism, 2016.
27. Ingram, H.J., An analysis of Islamic State’s Dabiq magazine. Australian Journal of Political Science, 2016. 51(3): p. 458-477.
28. Tausczik, Y.R. and J.W. Pennebaker, The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. 2010.
29. Vergani, M. and A.-M. Bliuc, The evolution of the ISIS’ language: a quantitative analysis of the language of the first year of Dabiq magazine. 2015.
30. Huey, L. and E. Witmer, #IS_Fangirl: Exploring a New Role for Women in Terrorism. Journal of Terrorism Research, 2016. 7(1).
31. Ghajar-Khosravi, S., et al., Quantifying Salient Concepts Discussed in Social Media Content: A Case Study using Twitter Content Written by Radicalized Youth. Journal of Terrorism Research, 2016. 7(2).
32. Ashcroft, M., et al., Detecting Jihadist Messages on Twitter. in 2015 European Intelligence and Security Informatics Conference (EISIC). 7-9 Sept. 2015. IEEE.
33. Neumann, P.R., Countering online radicalization in America. 2012: Bipartisan Policy Center.
34. Finkel, J.R., T. Grenager, and C. Manning, Incorporating non-local information into information extraction systems by Gibbs sampling. in Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. 2005. Association for Computational Linguistics.
35. Clarion Project. [cited 2016 Feb 20]; Available from: http://www.clarionproject.org/news/islamic-state-isis-isil-propaganda-magazine-dabiq.
36. Blei, D.M., A.Y. Ng, and M.I. Jordan, Latent Dirichlet allocation. Journal of Machine Learning Research, 2003. 3: p. 993-1022.
37. Abbasi, A., H. Chen, and A. Salem, Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums. ACM Transactions on Information Systems (TOIS), 2008. 26(3): p. 12.
38. Picard, R.W., Affective computing. 1997: MIT Press, Cambridge.
39. Cavnar, W.B. and J.M. Trenkle, N-gram-based text categorization. in Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval. 1994. p. 161-175.
40. Blei, D.M. and J.D. Lafferty, Dynamic topic models. in Proceedings of the 23rd International Conference on Machine Learning. 2006. ACM.