Short Text Coherence Hypothesis
Article in Journal of Quantitative Linguistics, August 2016. DOI: 10.1080/09296174.2016.1142328
Short Text Coherence Hypothesis
Sylvia Poulimenou, Sofia Stamou, Sozon Papavlasopoulos, Marios Poulos
Laboratory of Information Technologies, Faculty of Information Science and Informatics, Ionian University, Ioannou Theotoki 72, Corfu
Corresponding author: [email protected]
Abstract: In this paper we experimentally study, within quantitative linguistics, the degree to which the length of a short text affects its comprehensiveness and readability. Quantitative linguistics focuses mainly on the analysis of large text collections, and one of the major scientific theories in use is the Menzerath-Altmann law. Here we attempt to define a quantitative analysis framework for short texts consisting of approximately one or two sentences, since such texts are considered very important in many scientific fields. To achieve this aim, a coherence statistical testing process over three variables was created for short texts and evaluated experimentally and statistically. The statistical results showed that short text coherence, comprehensiveness and readability are fully achieved in short texts of 14 words, when the three predetermined variables are associated, and vice versa. To prove this hypothesis, the theory of the Vector Space Model and Kendall's Coefficient of Concordance were used. The assessment of the statistical results concluded that the hypothesis can be fully met for a number of cases with probability p > 99%. The experiment used short texts in the English language, but language proved to be irrelevant: a smaller-scale experiment with short texts in German confirmed the hypothesis, showing that the proposed model can be applied to all short texts regardless of their linguistic origin.
Keywords: Short Text Processing, Vector Space Model, Lexical Coherence
1. Introduction
In quantitative linguistics theory, the lexical coherence of texts with respect to word distribution is considered a very important scientific field. According to Carstens (2001), in text linguistics coherence is defined as the ways in which the components of the sentences of a text, i.e. the words we actually hear and use, are mutually connected (grammatically and lexically). Halliday and Hasan (1976) describe cohesion as a semantic relation between one element in the text and some other element that is crucial to its interpretation. In addition, as mentioned by Fahnestock (1983), coherence derived from correct text composition is a crucial element, so that ideas can flow smoothly throughout the text and readability can remain high enough for the reader to comprehend the text's meaning. According to Richards et al. (1992), readability means how easily written materials can be read and understood; this depends on several factors, including the average length of sentences, the number of new words contained, and the grammatical complexity of the language used in a passage. Moreover, the concept of readability is associated with the concept of comprehensiveness. Sparks (2012) mentions that discourse comprehension involves building meaning from extended segments of language, and that successfully comprehending larger units of text and discourse requires making inferences to connect ideas both within and across local and global discourse contexts.
As mentioned in Eroglu (2013), linguistic organization in texts can be observed where the Menzerath-Altmann (MA) law holds (Altmann, 1980). The MA law is considered a fundamental law in quantitative linguistics, through which one can observe the relationship between the size of the whole and the size of its parts in language, according to Baixeries et al. (2013). In short, according to the MA law as stated in Eroglu (2013), the longer a linguistic construct, the shorter its constituents, where a construct is considered to be the whole and a constituent a part of the whole. The MA law is a basic and important law in quantitative linguistics, whose main focus is the statistical analysis of large texts. In particular, as mentioned by Hebek (2002), the MA law does not apply to extremely short texts, where a short text can be considered a sentence or a complex sentence. However, short texts are considered very important in many scientific fields, mainly in online communication and e-commerce, but also in quick searches on the internet. According to Ge Song et al. (2014), classifying short texts is a big challenge because their limited number of words cannot represent either the feature space or the actual relationships between words and documents. Because short texts have a small word length, as Xiaojun Quan (2009) explains, similarity measures cannot be applied successfully due to the lack of word co-occurrence or shared context.
Journal of Quantitative Linguistics (Taylor & Francis Group), in press, Volume 22, Issue 3. Acceptance date: 5 Dec. 2014.
It is well known that people use sentences in order to communicate with each other successfully, and certain parameters need to be taken into consideration, since correct sentence construction is quite an important issue. For successful communication, a sentence must have a reasonable average length, so that it is neither confusing nor complicated. Thus, the lexical coherence of the sentence has been introduced empirically by Kornai (2008), who observes that in journalistic prose the median sentence length is above 15 words. Moreover, Titelov (1992) mentions that during the 1980s there was scientific interest in the sentence in the context of syntactic phenomena. Furthermore, the complexity of sentence meaning depends on the length of the sentence. According to Cutts (2009), the average sentence length should lie between 15 and 20 words in order to maintain readability, although this average cannot always be achieved. Taskar et al. (2004), in their work on discriminative parsing and dynamic-programming techniques, conduct their experiments with a restriction of sentence length equal to or less than 15 words. Other scientific fields interested in sentence length, in research concerning memory, are cognitive psychology and neuroscience. Baddeley (2003), in his three-part model of working memory, encountered problems in the interaction with long-term memory, where the limitation of 15 words per sentence is mentioned again. In their experiment, Daneman and Carpenter (1980) find that it takes about 5 seconds for a person to read a sentence. In addition, Anderson et al. (2001), in their study concerning sentence memory, where they describe various models of memory, observed that it takes the brain a few hundred milliseconds to process a word. The same study refers to an experiment conducted by Zimny (1987) in which a word-by-word presentation procedure was used, at 300 ms per word. Therefore, from the above-mentioned studies one can conclude that a typical sentence contains approximately 16 to 17 words. Taking everything into account, coherence analysis in short texts, and especially in sentences, can be considered a very important scientific field to be explored. The lexical coherence of sentences has been empirically observed and placed between 15 and 20 words per sentence.
The aim of this paper is to establish a statistical hypothesis that corroborates the empirical observations regarding the coherence of short texts or of a sentence. In that way, the gap left by the MA law with respect to short texts can be filled, and this study can set the cornerstone of short text analysis.
To implement the above, we tracked variables considered crucial to text coherence. Text coherence was examined in terms of the impact of each constituent on the construct. That correlation was made feasible through the use of three variables, and the hypothesis was tested through Kendall's coefficient of concordance. The conclusions that emerged from this methodology introduce an innovation in the field of computational linguistics, because the experimental linguistic observations about short text stability around 15 constituents are verified, demonstrating statistically the correlation of the short text through those three variables. This paper is divided into the following parts:
a) Methodology, where the algorithm is presented fully in detail, along with its statistical evaluation.
b) Experimental part, which is implemented via the application of the algorithm on a wide sample of short texts.
2. Method
2.1 Vector Space Model
According to Salton, Wong and Yang (1975), documents can be represented as vectors in order to index them and find their degree of similarity. As noted by Raghavan and Wong (1986), vectors are quite useful since they obey basic axioms and algebraic rules. The Vector Space Model (VSM) is used in several scientific fields such as information filtering and relevancy ranking. According to Turney and Pantel (2010), with the expanding use of the VSM for semantic tasks in language processing, excellent results can be obtained. The VSM is an algebraic model which makes possible the representation of any text object (term), such as a document, sentence, clause, phrase, word or morpheme. The VSM representation can be analyzed into three steps.
In the first step, the content-bearing terms (typically words or short phrases) are extracted, creating the document indexing. This indexing is executed via two alternative families of methods, linguistic and non-linguistic. The linguistic methods are based on gathering function words of high and low frequency which are reflected semantically in the document. On the other hand, non-linguistic methods are based on different indexing procedures such as probabilistic indexing and automatic indexing.
In the second step, the indexed terms are weighted according to their relevance to the user in a possible retrieval procedure. Term weighting has been applied by testing the sensitivity and specificity of the search, where specificity is related to precision and sensitivity to recall. There are three dominant term-weighting variables,
which are related to the term frequency, the collection frequency and the length normalization. These three variables are combined, typically multiplicatively, to produce the resulting term weight.
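As an illustration of how the three weighting components interact, the sketch below uses the classic tf-idf instantiation (term frequency times inverse collection frequency, followed by unit-length normalization). The function name and the toy three-document collection are invented for the example; the paper itself does not prescribe this exact scheme.

```python
import math

def tfidf_weights(doc_terms, collection):
    """Weight each distinct term of a document by term frequency times
    inverse collection frequency, then length-normalize the result."""
    n_docs = len(collection)
    weights = {}
    for term in set(doc_terms):
        tf = doc_terms.count(term)                      # term frequency
        df = sum(1 for d in collection if term in d)    # collection frequency
        weights[term] = tf * math.log(n_docs / df)      # rarer terms weigh more
    # length normalization: scale the weight vector to unit length
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {t: w / norm for t, w in weights.items()}

docs = [["short", "text", "coherence"],
        ["short", "sentence", "length"],
        ["text", "length", "analysis"]]
w = tfidf_weights(docs[0], docs)
print(w)   # "coherence" outweighs "short" and "text", which recur elsewhere
```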
As an example, the algebraic expression of a text in the VSM can be defined by using the following equation:

V_j = (w_{1,j}, w_{2,j}, ..., w_{t,j})    (1)

where j stands for the number of the constituents of a text and t represents the number of weights (variables) which are defined by the model.
Finally, in the third step, the text is ranked according to a similarity measure with respect to the query. Similarity in the VSM is estimated using connective variables based on the normalized inner product between the text vector and the query vector, where constituent overlap indicates similarity. The most common similarity measure is the cosine, which measures the angle between the text vector and the query vector.
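The cosine measure of this third step is simply the normalized inner product; a minimal sketch, with the two vectors invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two term vectors: the inner product
    divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

text_vec  = [2.0, 1.0, 0.0, 1.0]
query_vec = [1.0, 1.0, 0.0, 0.0]
print(round(cosine(text_vec, query_vec), 4))   # 0.866
```

Identical vectors give a cosine of 1 (zero angle); vectors with no shared constituents give 0.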
2.2 The Basis of the Algorithm
The variable extraction is based on VSM theory. In more detail, in the first step (section 2.1) we select a short text as the construct, which is a sentence or a complex sentence, and we consider the word as the constituent of the construct. Subsequently, according to the second step (see section 2.1), a non-linguistic approach is adopted, because all the constituents of the construct are indexed by a procedure which depends on their order position. In detail, each constituent obtains a particular weighting independently of its number of appearances in the construct, so t = j. In our case, the three (3) dominant term-weighting variables are replaced by the variables of the following vector:

w_j = (i_j, s_j, k_j)    (2)

where i_j is the order position of the constituent in the short text, s_j is its number of characters, and k_j is an encoding measure, which for reasons of simplification is defined by the ASCII encoding procedure (Poulos, Papavlasopoulos & Chrissikopoulos, 2006). Equation (1) is then transformed into equation (3):

V_j = (w_{1,j}, w_{2,j}, ..., w_{j,j})    (3)
For normalization reasons, the vector V_j is replaced by the equivalent unit vector:

F_j = V_j / ||V_j||    (4)

Furthermore, the vector V_s is the resultant of the F_j and from now on will be addressed as the short text vector (see figure 1):

V_s = Σ_j F_j    (5)
2.3 Similarity Criterion
The degree of correlation between F_j and V_s can be extracted by equation (6), and specifically by the factor r_j, the deviation angle derived from the normalized inner product between the short text vector V_s and the query vector F_j, according to step 3 (see section 2.1). This procedure also follows the general consideration of document similarity theory (Harispe et al. 2013):

cos(r_j) = (F_j · V_s) / (||F_j|| ||V_s||)    (6)
Fig. 1 The depiction of the constituent vectors (F1, F2) and the resultant vector V_s in the Euclidean plane

2.4 Transformation of Variables

A vector U_j is adopted instead of the vector w_j = (i_j, s_j, k_j) (see equation 7). The reason for this replacement is found in semantic theory (Harispe et al. 2013): the variable r_j expresses, through the deviation angle, the degree to which each constituent influences the resultant vector. The variable i_j of w_j is therefore replaced by the variable r_j of U_j. This replacement was made deliberately, since the factor r_j is considered very important:

U_j = (r_j, s_j, k_j)    (7)

2.5 Statistic Foundation

The statistical corroboration of this transformation is achieved by testing whether the three (3) variables (r, s, k) are associated. This is possible by applying Kendall's Coefficient of Concordance (Zar, 1999), which is demonstrated in the formulas below. For each constituent j, the ranks of the three variables are summed:

R_j = rank(r_j) + rank(s_j) + rank(k_j)    (8)

the mean rank sum is:

R̄ = (1/n) Σ_{j=1}^{n} R_j    (9)

and the coefficient of concordance is:

F = [12 Σ_{j=1}^{n} (R_j − R̄)²] / [M²(n³ − n)]    (10)

where M represents the number of variables being correlated (in this case M = 3) and n stands for the number of constituents in the short text.
In this case the statistical control is based on the chi-square test, which examines whether the three variables are associated with each other, in order to define the number of constituents that influence the participating short text. The null hypothesis H0 indicates that the three variables are not associated, and the alternative hypothesis HA indicates the exact opposite. The chi-square statistic is defined by:

χ_r² = M(n − 1)F    (11)
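A minimal code sketch of equations (8) to (11), under the assumption that the ranks are supplied ready-made and ignoring the tie correction (which a full treatment would add, since the ranks in Table 1 contain ties):

```python
def kendall_w(rank_data):
    """Kendall's coefficient of concordance F for M variables over n
    constituents, plus the chi-square statistic chi2 = M * (n - 1) * F.
    `rank_data` holds M equal-length lists of ranks, one per variable."""
    M = len(rank_data)
    n = len(rank_data[0])
    # eq (8): rank sum R_j over the M variables for each constituent j
    R = [sum(var[j] for var in rank_data) for j in range(n)]
    R_bar = sum(R) / n                        # eq (9): mean rank sum
    S = sum((r - R_bar) ** 2 for r in R)      # squared deviations
    F = 12 * S / (M * M * (n ** 3 - n))       # eq (10)
    chi2 = M * (n - 1) * F                    # eq (11)
    return F, chi2

# Perfect agreement between the three rankings gives F = 1
F, chi2 = kendall_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
print(F, chi2)   # 1.0 9.0
```

With the values reported for the worked example (M = 3, n = 38, F = 0.8023), equation (11) gives χ_r² = 3 × 37 × 0.8023 ≈ 89.06, the value shown in Table 1.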
The value of the chi-square cumulative probability is extracted by using n − 1 degrees of freedom and is calculated by equation (12):

P = 1 / (2^{(n−1)/2} Γ((n−1)/2)) ∫_{χ_r²}^{∞} t^{(n−1)/2 − 1} e^{−t/2} dt    (12)

This is a one-tailed test, because we search for the appropriate constituent number for a short text. The null hypothesis H0 is then accepted when P ≤ 0.001, and consequently HA is satisfied when P > 0.001.
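Equation (12) can be evaluated without a statistics package. The sketch below makes two assumptions: that the intended probability is the upper-tail integral (which is consistent in order of magnitude with the P = 3.5055e-06 reported in section 3.1 for χ_r² = 89.06 and 37 degrees of freedom), and that plain trapezoid integration over a truncated tail is accurate enough:

```python
import math

def chi2_tail(x, df, upper=None, steps=20000):
    """Upper-tail probability of the chi-square distribution (eq. 12):
    integrates the density from x to a point where the e^(-t/2) factor
    has made the remainder negligible, using the trapezoid rule."""
    if upper is None:
        upper = x + 200.0
    c = 1.0 / (2 ** (df / 2) * math.gamma(df / 2))
    h = (upper - x) / steps
    total = 0.0
    for i in range(steps + 1):
        t = x + i * h
        f = c * t ** (df / 2 - 1) * math.exp(-t / 2)
        total += f / 2 if i in (0, steps) else f
    return total * h

# df = 2 has the closed form P = exp(-x/2), a quick sanity check:
print(round(chi2_tail(2.0, 2), 4))   # 0.3679
# the worked example of section 3.1: chi-square = 89.06 with 37 df
print(chi2_tail(89.06, 37))          # on the order of 1e-6
```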
3. Experimental Part
The experiment consists of the following steps:
1. First, a sample was gathered in order to apply the algorithm. The data set of this research consists of 100 short texts extracted from abstracts of scientific articles. The scientific articles were acquired in PDF form from the Directory of Open Access Journals (DOAJ), all coming from the field of biomedicine. Using a common browser, the DOAJ website was visited and all 100 articles were downloaded into a local folder.
2. Next, 100 short texts were selected, each with a length of more than 25 constituents. Each short text was processed up to a length of 25 constituents and no further, for the reasons below:
- Since this algorithm is based on the VSM, there is a limitation concerning long texts: their representation cannot be successful, because at such lengths few similarity variables can be found, according to Salton (1975).
- Another reason for choosing the limit of 25 constituents resulted from the experimental procedure, which showed that above the 25th constituent the results did not change substantially.
- Finally, the limitation applies to the whole sample, since the selected sample must be homogeneous in order to derive valid and objective conclusions on a general level.
Finally, the compilation of the algorithm and the implementation process were carried out using MATLAB.
3.1 Implementation Algorithm and Statistical Process
In this section the implementation of the proposed algorithm is presented using the following short text as an example (see Table 1):
"Many theorists have suggested that working memory capacity plays a crucial role in reading comprehension; however, traditional measures of short-term memory, like digit span and word span, are either not correlated or only weakly correlated with reading ability."
In the implementation part, the variable data (r, s, k) are extracted by equations 2-6 (see figure 7). In the Kendall's Coefficient of Concordance procedure, the variable data (r, s, k) are ranked and the rank sums R_j are estimated for each constituent by equations 8-10.
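The ranking step assigns tied values their mid-rank (visible in Table 1, where, for example, every 4-character word shares rank 12.0). This can be sketched with the following illustrative helper (not the authors' code):

```python
def midranks(values):
    """Rank values in ascending order, giving tied values the average
    of the positions they occupy (1-based mid-ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

print(midranks([4, 9, 4, 9, 4]))   # [2.0, 4.5, 2.0, 4.5, 2.0]
```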
For example, the constituent "Many", the first word of the above example, corresponds to a vector with a deviation angle of r = 1.6435 relative to the resultant vector of the short text; s equals 4 (the number of characters), and k, according to the ASCII encoding, corresponds to 405. In the same way, the values for all 38 constituents of the short text are extracted, as one can see in Table 1.
After the process analyzed above, the chi-square value with 38 − 1 = 37 degrees of freedom is calculated by equation (11), and from this value the cumulative probability P = 3.5055e-06 is obtained through equation (12). Finally, by using the one-tailed probability test with P ≤ 0.001 for H0, the null hypothesis is accepted, indicating that there is no association among the three variables (r, s, k).
Table 1. Algorithm implementation, Kendall's Coefficient of Concordance, decision on H0 (an example short text)

 j    r data   rank    s data   rank    k data   rank    Sum of Rj
 1    1.6435   28       4       12.0     405      8.0    48.0
 2    1.6699   29       9       32.5     997     33.0    94.5
 3    1.3755   23       4       12.0     420      9.0    44.0
 4    1.5487   26       9       32.5     971     32.0    90.5
 5    1.1232   16       4       12.0     433     11.0    39.0
 6    1.3378   22       7       25.5     769     28.0    75.5
 7    1.1818   18       6       21.0     665     22.0    61.0
 8    1.2430   21       8       30.0     846     30.0    81.0
 9    0.8526   11       5       18.0     553     19.0    48.0
10    4.1013   37       1        1.0      97      1.0    39.0
11    0.9320   12       7       25.5     739     26.0    63.5
12    0.2012    4       4       12.0     434     12.5    28.5
13    1.6753   30       2        3.0     215      3.0    36.0
14    0.6862    7       7       25.5     730     24.5    57.0
15    1.1718   17      13       38.0    1402     38.0    93.0
16    0.6565    6       8       30.0     812     29.0    65.0
17    0.9587   13      11       37.0    1179     37.0    87.0
18    0.5982    5       8       30.0     869     31.0    66.0
19    3.3125   35       2        3.0     213      2.0    40.0
20    0.6884    8      10       35.0    1045     34.0    77.0
21    0.0927    1       7       25.5     709     23.0    49.5
22    1.2065   20       4       12.0     421     10.0    42.0
23    0.7047    9       5       18.0     529     18.0    45.0
24    1.3804   24       4       12.0     434     12.5    48.5
25    2.8707   33       3        6.0     307      5.0    44.0
26    1.5666   27       4       12.0     444     14.5    53.5
27    1.4494   25       5       18.0     478     17.0    60.0
28    3.3433   36       3        6.0     312      6.0    48.0
29    0.8056   10       6       21.0     641     20.0    51.0
31    3.3023   34       3        6.0     337      7.0    47.0
32    0.1112    3      10       35.0    1061     35.5    73.5
33    6.3095   38       2        3.0     225      4.0    45.0
34    2.4095   31       4       12.0     450     16.0    59.0
35    1.1958   19       6       21.0     653     21.0    61.0
36    0.1046    2      10       35.0    1061     35.5    72.5
37    2.8506   32       4       12.0     444     14.5    58.5
38    1.1168   15       7       25.5     730     24.5    65.0

F = 0.8023; χ_r² = 89.06; P = 3.5055e-06; H0 is accepted for P ≤ 0.001
3.2 Iteration Procedure
Using the same short-text example, the algorithm is applied iteratively with j = 3:1:38 words. In other words, the original short text is segmented into short texts of various lengths, from 3 constituents up to 38, and the procedure of Section 3.1 is executed for each of these short texts. The chi-square test is then applied thirty-five times (see Figure 2), and the cumulative probabilities are estimated together with the control of hypothesis testing (see Figure 3).
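The loop itself can be sketched as follows. This is a minimal sketch of the bookkeeping only: `concordance_test` is a hypothetical stand-in for the Section 3.1 test (its internals are not reproduced here), and the exact loop bounds are an assumption based on the j = 3:1:38 description.

```python
# Sketch of the iteration procedure: apply a per-segment test to each
# prefix of the word list, from j_min constituents up to the full length.

def iterate_lengths(words, concordance_test, j_min=3):
    """Return {j: test result} for prefixes of `words`, j = j_min .. len(words),
    mirroring the j = 3:1:38 iteration over segment lengths."""
    return {j: concordance_test(words[:j])
            for j in range(j_min, len(words) + 1)}

# Usage with a dummy stand-in for the concordance test: a 38-word text
# yields one result per segment length j = 3 .. 38.
words = [f"w{i}" for i in range(1, 39)]
results = iterate_lengths(words, lambda segment: len(segment))
print(len(results), min(results), max(results))  # -> 36 3 38
```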
Fig. 2. The chi-square distribution function according to the iterative procedure on the segmented short text.
Fig. 3. The cumulative probabilities and the control of hypothesis testing.
3.3 Iteration Procedure in the Data Set
The iteration procedure for each short text, over a range of 3 up to 25 constituents, is executed twenty-two times in total. The procedure is run on a sample of 100 data sets, i.e. 2,200 calculations in total. The chi-square test of the data set is then represented (see Figure 4), and the cumulative probabilities and the control of hypothesis testing are presented in Figures 5 and 6, respectively.
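The data-set loop can be organised as below. This is a hedged sketch of the aggregation step only: `pvalues_by_text` is a hypothetical structure holding the per-length P values already computed for each of the 100 short texts, and the significance level follows the paper's P < 0.001.

```python
# Sketch of the data-set iteration: for each constituent count j, count
# how many of the sampled short texts reject H0 at the chosen level.

ALPHA = 0.001  # significance level used throughout the paper

def rejection_counts(pvalues_by_text, j_min=3, j_max=25):
    """pvalues_by_text: list of dicts mapping length j -> P value,
    one dict per short text. Returns {j: number of rejections at ALPHA}."""
    return {
        j: sum(1 for pv in pvalues_by_text if j in pv and pv[j] < ALPHA)
        for j in range(j_min, j_max + 1)
    }

# Toy usage with fabricated P values for two texts (illustration only).
toy = [{3: 0.5, 4: 1e-6}, {3: 1e-4, 4: 0.2}]
print(rejection_counts(toy, j_min=3, j_max=4))  # -> {3: 1, 4: 1}
```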
Fig. 4. The cumulative probabilities and the control of hypothesis testing.
Fig. 5. Data set of the probability cumulative distribution.
Fig. 6. The cumulative probabilities and the control of hypothesis testing.
Assessing the above statistical results, we divide them into three parts and make the following observations:
1. As can be seen from the distribution in Figure 4 (the cumulative probabilities and the control of hypothesis testing via the experimental function), coherence is observed for the whole sample of 100 short texts up to a sentence length of 14 constituents. From 14 constituents onwards, a lack of coherence is observed in the experimental function.
2. According to the experiment as represented in Figure 5 (data set of the probability cumulative distribution), all probabilities estimated using Equation 11 reject the null hypothesis for a short-text length equal to 14 constituents, with a probability p
7 21 0 7
7 22 0 7
6 23 0 6
6 24 0 6
3 25 0 3
3 26 0 3
2 27 0 2
We present an example derived from Table 2 in the following sentence:
"Die Computerindustrie hätte nach Michael Levitt einen Teil des Nobelpreises für Chemie 2013 verdient, denn ihre Forschungs- und Entwicklungsleistung hatte zu drastisch höheren Rechengeschwindigkeiten geführt (siehe Tabelle)."
(In English: according to Michael Levitt, the computer industry would have deserved a share of the 2013 Nobel Prize in Chemistry, because its research and development effort had led to drastically higher computing speeds (see table).)
Fig. 7. The German case of the cumulative probabilities and the control of hypothesis testing.
Fig. 8. Data set of the probability cumulative distribution from ten (10) German short texts.
[Figures 7 and 8 plot Cumulative Probabilities and P Values, respectively, against the Number of Words (3 to 30).]
As can be seen in Table 2 and in Figures 7 and 8, the results of the experiments conducted with German short texts are in agreement with the main large-scale experiment carried out at the beginning of the experimental part. This shows that the constituents of a short text in the proposed model can be considered as morphs: the linguistic origin of the short text is disregarded, and therefore the language of the text is not significant. More specifically, according to Figures 7 and 8, the cumulative probability begins to decline drastically between the 15th and 16th word, where p
the possible relation between the number of variables and the number of constituents in extended fields (such as biology) should be considered as an axis of the system's coherence and as the upcoming scientific priority.