Short Text Coherence Hypothesis
Article in Journal of Quantitative Linguistics, August 2016. DOI: 10.1080/09296174.2016.1142328
Short Text Coherence Hypothesis
Sylvia Poulimenou, Sofia Stamou, Sozon Papavlasopoulos, Marios Poulos
Laboratory of Information Technologies, Faculty of Information Science and Informatics, Ionian University, Ioannou Theotoki 72, Corfu
Corresponding author: [email protected]
Abstract: In this paper we experimentally study, within quantitative linguistics, the degree to which the length of a short text affects its comprehensiveness and readability. Quantitative linguistics focuses mainly on the analysis of large text collections, and one of the major scientific theories in use is the Menzerath-Altmann law. Here we attempt to define a quantitative analysis framework for short texts consisting of approximately one or two sentences, since such texts are considered very important in many scientific fields. To achieve this aim, a coherence statistical testing process over three variables was created for short texts and evaluated experimentally and statistically. The statistical results showed that short text coherence, comprehensiveness and readability are fully achieved in short texts of 14 words, when the three predetermined variables are associated, and vice versa. To prove this hypothesis, the theory of the Vector Space Model and Kendall's Coefficient of Concordance were used. The assessment of the statistical results concluded that the hypothesis can be fully met for a number of cases with probability p > 99%. The experiment used short texts in the English language, but language proved to be irrelevant: a smaller-scale experiment with short texts in German confirmed the hypothesis, showing that the proposed model can be applied to all short texts regardless of their linguistic origin.
Keywords: Short Text Processing, Vector Space Model, Lexical Coherence
1. Introduction
In quantitative linguistics theory, the lexical coherence of texts with respect to word distribution is considered a very important scientific field. According to Carstens (2001), in text linguistics coherence is defined as the ways in which the components of the sentences of a text, i.e. the words we actually hear and use, are mutually connected (grammatically and lexically). Halliday and Hasan (1976) describe cohesion as a semantic relation between one element in the text and some other element that is crucial to its interpretation. In addition, as mentioned by Fahnestock (1983), coherence derived from correct text composition is a crucial element, so that ideas can flow smoothly throughout the text and readability can remain high enough for the reader to comprehend the text's meaning. According to Richards et al. (1992), readability means how easily written materials can be read and understood; this depends on several factors, including the average length of sentences, the number of new words contained, and the grammatical complexity of the language used in a passage. Moreover, the concept of readability is associated with the concept of comprehensiveness. Sparks (2012) mentions that discourse comprehension involves building meaning from extended segments of language, and that successfully comprehending larger units of text and discourse requires making inferences to connect ideas both within and across local and global discourse contexts.
As mentioned in Eroglu (2013), linguistic organization in texts can be observed where the Menzerath-Altmann (MA) law holds (Altmann, 1980). The MA law is considered a fundamental law in quantitative linguistics, through which one can observe the relationship between the size of the whole and the size of its parts in language, according to Baixeries et al. (2013). In short, according to the MA law as stated in Eroglu (2013), the longer a linguistic construct, the shorter its constituents, where a construct is considered to be the whole and a constituent a part of the whole. The MA law is a basic and important law in quantitative linguistics, whose main focus is the statistical analysis of large texts. In particular, as mentioned by Hebek (2002), the MA law does not apply to extremely short texts, where a short text can be considered a sentence or a complex sentence. However, short texts are considered very important in many scientific fields, mainly in online communication and e-commerce, but also in quick searches on the internet. According to Ge Song et al. (2014), classifying short texts is a big challenge because their limited number of words cannot represent either the feature space or the actual relationships between words and documents. Because short texts have a small word length, as Xiaojun Quan (2009) explains, similarity measures cannot be applied successfully due to the lack of word co-occurrence or shared context.
Journal of Quantitative Linguistics (Taylor & Francis Group), in press, Volume 22, Issue 3. Acceptance date: 5 Dec. 2014.
It is well known that people use sentences in order to communicate with each other successfully, and certain parameters need to be taken into consideration, since correct sentence construction is quite an important issue. For successful communication, a sentence must have a reasonable average length, so that it is neither confusing nor complicated. Thus, the lexical coherence of the sentence has been introduced empirically by Kornai (2008), who observes that in journalistic prose the median sentence length is above 15 words. Moreover, Titelov (1992) mentions that during the 1980s there was scientific interest in the sentence in the context of syntactic phenomena. Furthermore, the complexity of sentence meaning depends on the length of the sentence. According to Cutts (2009), the average sentence length should lie between 15 and 20 words in order to maintain readability, although this average cannot always be achieved. Taskar et al. (2004), in their work on discriminative parsing and dynamic-programming techniques, conduct their experiments with a restriction of sentence length equal to or less than 15 words. Other scientific fields interested in sentence length, in research concerning memory, are cognitive psychology and neuroscience. Baddeley (2003), in his three-part model of working memory, encountered problems in the interaction with long-term memory, where the limitation of 15 words per sentence is mentioned again. In their experiment, Daneman and Carpenter (1980) find that it takes about 5 seconds for a person to read a sentence. In addition, Anderson et al. (2001), in their study concerning sentence memory, where they describe various models of memory, observed that it takes the brain a few hundred milliseconds to process a word. The same study refers to an experiment conducted by Zimny (1987) in which a word-by-word presentation procedure was used, at 300 ms per word. Therefore, from the above-mentioned studies one can conclude that a typical sentence contains approximately 16 to 17 words. Taking everything into account, coherence analysis in short texts, and especially in sentences, can be considered a very important scientific field to be explored. The lexical coherence of sentences has been empirically observed and placed between 15 and 20 words per sentence.
The aim of this paper is to establish a statistical hypothesis that corroborates the empirical observations regarding the coherence of short texts or of a sentence. In that way, the gap left by the MA law with respect to short texts can be filled, and this study can set the cornerstone of short text analysis.
To implement the above, we tracked variables considered crucial to text coherence. Text coherence was examined in terms of the impact of each constituent on the construct. That correlation was made feasible through the use of three variables, and the hypothesis was tested through Kendall's coefficient of concordance. The conclusions that emerged from this methodology introduce an innovation in the field of computational linguistics, because the experimental linguistic observations about short text stability around 15 constituents are verified, demonstrating statistically the correlation of the short text through those three variables. This paper is divided into the following parts:
a) Methodology, where the algorithm is presented fully in detail, along with its statistical evaluation.
b) Experimental part, which is implemented via the application of the algorithm on a wide sample of short texts.
2. Method
2.1 Vector Space Model
According to Salton, Wong and Yang (1975), documents can be represented as vectors in order to index them and find their degree of similarity. As noted by Raghavan and Wong (1986), vectors are quite useful since they obey basic axioms and algebraic rules. The Vector Space Model (VSM) is used in several scientific fields such as information filtering and relevancy ranking. According to Turney and Pantel (2010), with the expanding use of the VSM for semantic tasks in language processing, excellent results can be obtained. The VSM is an algebraic model which makes possible the representation of any text object (term), such as a document, sentence, clause, phrase, word or morpheme. The VSM representation can be analyzed into three steps.
In the first step, the content-bearing terms (typically words or short phrases) are extracted, creating the document indexing. This indexing is executed via two alternative families of methods, linguistic and non-linguistic. The linguistic methods are based on gathering function words of high and low frequency which are reflected semantically in the document. On the other hand, non-linguistic methods are based on different indexing procedures such as probabilistic indexing and automatic indexing.
In the second step, the indexed terms are weighted according to their relevance to the user in a possible retrieval procedure. Term weighting has been applied by testing the sensitivity and specificity of the search, where specificity is related to precision and sensitivity to recall. There are three dominant term-weighting variables,
which are related to the term frequency, the collection frequency and the length normalization. These three variables are combined, typically multiplicatively, to produce the resulting term weight.
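As an illustration of how the three weighting components interact, the sketch below uses the classic tf-idf instantiation (term frequency times inverse collection frequency, followed by unit-length normalization). The function name and the toy three-document collection are invented for the example; the paper itself does not prescribe this exact scheme.

```python
import math

def tfidf_weights(doc_terms, collection):
    """Weight each distinct term of a document by term frequency times
    inverse collection frequency, then length-normalize the result."""
    n_docs = len(collection)
    weights = {}
    for term in set(doc_terms):
        tf = doc_terms.count(term)                      # term frequency
        df = sum(1 for d in collection if term in d)    # collection frequency
        weights[term] = tf * math.log(n_docs / df)      # rarer terms weigh more
    # length normalization: scale the weight vector to unit length
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {t: w / norm for t, w in weights.items()}

docs = [["short", "text", "coherence"],
        ["short", "sentence", "length"],
        ["text", "length", "analysis"]]
w = tfidf_weights(docs[0], docs)
print(w)   # "coherence" outweighs "short" and "text", which recur elsewhere
```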
As an example, the algebraic expression of a text in the VSM can be defined by using the following equation:

V_j = (w_{1,j}, w_{2,j}, ..., w_{t,j})    (1)

where j stands for the number of the constituents of a text and t represents the number of weights (variables) which are defined by the model.
Finally, in the third step, the text is ranked according to a similarity measure with respect to the query. Similarity in the VSM is estimated using connective variables based on the normalized inner product between the text vector and the query vector, where constituent overlap indicates similarity. The most common similarity measure is the cosine, which measures the angle between the text vector and the query vector.
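The cosine measure of this third step is simply the normalized inner product; a minimal sketch, with the two vectors invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two term vectors: the inner product
    divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

text_vec  = [2.0, 1.0, 0.0, 1.0]
query_vec = [1.0, 1.0, 0.0, 0.0]
print(round(cosine(text_vec, query_vec), 4))   # 0.866
```

Identical vectors give a cosine of 1 (zero angle); vectors with no shared constituents give 0.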
2.2 The Basis of the Algorithm
The variable extraction is based on VSM theory. In more detail, in the first step (section 2.1) we select a short text as the construct, which is a sentence or a complex sentence, and we consider the word as the constituent of the construct. Subsequently, according to the second step (see section 2.1), a non-linguistic approach is adopted, because all the constituents of the construct are indexed by a procedure which depends on their order position. In detail, each constituent obtains a particular weighting independently of its number of appearances in the construct, so t = j. In our case, the three (3) dominant term-weighting variables are replaced by the variables of the following vector:

w_j = (i_j, s_j, k_j)    (2)

where i_j is the order position of the constituent in the short text, s_j is its number of characters, and k_j is an encoding measure, which for reasons of simplification is defined by the ASCII encoding procedure (Poulos, Papavlasopoulos & Chrissikopoulos, 2006). Equation (1) is then transformed into equation (3):

V_j = (w_{1,j}, w_{2,j}, ..., w_{j,j})    (3)
For normalization reasons, the vector V_j is replaced by the equivalent unit vector:

F_j = V_j / ||V_j||    (4)

Furthermore, the vector V_s is the resultant of the F_j and from now on will be addressed as the short text vector (see figure 1):

V_s = Σ_j F_j    (5)
2.3 Similarity Criterion
The degree of correlation between F_j and V_s can be extracted by equation (6), and specifically by the factor r_j, the deviation angle derived from the normalized inner product between the short text vector V_s and the query vector F_j, according to step 3 (see section 2.1). This procedure also follows the general consideration of document similarity theory (Harispe et al. 2013):

cos(r_j) = (F_j · V_s) / (||F_j|| ||V_s||)    (6)
Fig. 1 The depiction of the constituent vectors (F1, F2) and the resultant vector V_s in the Euclidean plane

2.4 Transformation of Variables

A vector U_j is adopted instead of the vector w_j = (i_j, s_j, k_j) (see equation 7). The reason for this replacement is found in semantic theory (Harispe et al. 2013): the variable r_j expresses, through the deviation angle, the degree to which each constituent influences the resultant vector. The variable i_j of w_j is therefore replaced by the variable r_j of U_j. This replacement was made deliberately, since the factor r_j is considered very important:

U_j = (r_j, s_j, k_j)    (7)

2.5 Statistic Foundation

The statistical corroboration of this transformation is achieved by testing whether the three (3) variables (r, s, k) are associated. This is possible by applying Kendall's Coefficient of Concordance (Zar, 1999), which is demonstrated in the formulas below. For each constituent j, the ranks of the three variables are summed:

R_j = rank(r_j) + rank(s_j) + rank(k_j)    (8)

the mean rank sum is:

R̄ = (1/n) Σ_{j=1}^{n} R_j    (9)

and the coefficient of concordance is:

F = [12 Σ_{j=1}^{n} (R_j − R̄)²] / [M²(n³ − n)]    (10)

where M represents the number of variables being correlated (in this case M = 3) and n stands for the number of constituents in the short text.
In this case the statistical control is based on the chi-square test, which examines whether the three variables are associated with each other, in order to define the number of constituents that influence the participating short text. The null hypothesis H0 indicates that the three variables are not associated, and the alternative hypothesis HA indicates the exact opposite. The chi-square statistic is defined by:

χ_r² = M(n − 1)F    (11)
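A minimal code sketch of equations (8) to (11), under the assumption that the ranks are supplied ready-made and ignoring the tie correction (which a full treatment would add, since the ranks in Table 1 contain ties):

```python
def kendall_w(rank_data):
    """Kendall's coefficient of concordance F for M variables over n
    constituents, plus the chi-square statistic chi2 = M * (n - 1) * F.
    `rank_data` holds M equal-length lists of ranks, one per variable."""
    M = len(rank_data)
    n = len(rank_data[0])
    # eq (8): rank sum R_j over the M variables for each constituent j
    R = [sum(var[j] for var in rank_data) for j in range(n)]
    R_bar = sum(R) / n                        # eq (9): mean rank sum
    S = sum((r - R_bar) ** 2 for r in R)      # squared deviations
    F = 12 * S / (M * M * (n ** 3 - n))       # eq (10)
    chi2 = M * (n - 1) * F                    # eq (11)
    return F, chi2

# Perfect agreement between the three rankings gives F = 1
F, chi2 = kendall_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
print(F, chi2)   # 1.0 9.0
```

With the values reported for the worked example (M = 3, n = 38, F = 0.8023), equation (11) gives χ_r² = 3 × 37 × 0.8023 ≈ 89.06, the value shown in Table 1.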
The value of the chi-square cumulative probability is extracted by using n − 1 degrees of freedom and is calculated by equation (12):

P = 1 / (2^{(n−1)/2} Γ((n−1)/2)) ∫_{χ_r²}^{∞} t^{(n−1)/2 − 1} e^{−t/2} dt    (12)

This is a one-tailed test, because we search for the appropriate constituent number for a short text. The null hypothesis H0 is then accepted when P ≤ 0.001, and consequently HA is satisfied when P > 0.001.
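Equation (12) can be evaluated without a statistics package. The sketch below makes two assumptions: that the intended probability is the upper-tail integral (which is consistent in order of magnitude with the P = 3.5055e-06 reported in section 3.1 for χ_r² = 89.06 and 37 degrees of freedom), and that plain trapezoid integration over a truncated tail is accurate enough:

```python
import math

def chi2_tail(x, df, upper=None, steps=20000):
    """Upper-tail probability of the chi-square distribution (eq. 12):
    integrates the density from x to a point where the e^(-t/2) factor
    has made the remainder negligible, using the trapezoid rule."""
    if upper is None:
        upper = x + 200.0
    c = 1.0 / (2 ** (df / 2) * math.gamma(df / 2))
    h = (upper - x) / steps
    total = 0.0
    for i in range(steps + 1):
        t = x + i * h
        f = c * t ** (df / 2 - 1) * math.exp(-t / 2)
        total += f / 2 if i in (0, steps) else f
    return total * h

# df = 2 has the closed form P = exp(-x/2), a quick sanity check:
print(round(chi2_tail(2.0, 2), 4))   # 0.3679
# the worked example of section 3.1: chi-square = 89.06 with 37 df
print(chi2_tail(89.06, 37))          # on the order of 1e-6
```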
3. Experimental Part
The experiment consists of the following steps:
1. First, a sample was gathered in order to apply the algorithm. The data set of this research consists of 100 short texts extracted from abstracts of scientific articles. The scientific articles were acquired in PDF form from the Directory of Open Access Journals (DOAJ), all coming from the field of biomedicine. Using a common browser, the DOAJ website was visited and all 100 articles were downloaded into a local folder.
2. Next, 100 short texts were selected, each with a length of more than 25 constituents. Each short text was processed up to a length of 25 constituents and no further, for the reasons below:
- Since this algorithm is based on the VSM, there is a limitation concerning long texts: their representation cannot be successful, because at such lengths few similarity variables can be found, according to Salton (1975).
- Another reason for choosing the limit of 25 constituents resulted from the experimental procedure, which showed that above the 25th constituent the results did not change substantially.
- Finally, the limitation applies to the whole sample, since the selected sample must be homogeneous in order to derive valid and objective conclusions on a general level.
Finally, the compilation of the algorithm and the implementation process were carried out using MATLAB.
3.1 Implementation Algorithm and Statistical Process
In this section the implementation of the proposed algorithm is presented using the following short text as an example (see Table 1):
"Many theorists have suggested that working memory capacity plays a crucial role in reading comprehension; however, traditional measures of short-term memory, like digit span and word span, are either not correlated or only weakly correlated with reading ability."
In the implementation part, the variable data (r, s, k) are extracted by equations 2-6 (see figure 7). In the Kendall's Coefficient of Concordance procedure, the variable data (r, s, k) are ranked and the rank sums R_j are estimated for each constituent by equations 8-10.
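The ranking step assigns tied values their mid-rank (visible in Table 1, where, for example, every 4-character word shares rank 12.0). This can be sketched with the following illustrative helper (not the authors' code):

```python
def midranks(values):
    """Rank values in ascending order, giving tied values the average
    of the positions they occupy (1-based mid-ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks

print(midranks([4, 9, 4, 9, 4]))   # [2.0, 4.5, 2.0, 4.5, 2.0]
```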
For example, the constituent "Many", the first word of the above example, corresponds to a vector with a deviation angle of r = 1.6435 relative to the resultant vector of the short text; s equals 4 (the number of characters), and k, according to the ASCII encoding, corresponds to 405. In the same way, the values for all 38 constituents of the short text are extracted, as one can see in Table 1.
After the process analyzed above, the chi-square value with 38 − 1 = 37 degrees of freedom is calculated by equation (11), and from this value the cumulative probability P = 3.5055e-06 is obtained through equation (12). Finally, by using the one-tailed probability test with P ≤ 0.001 for H0, the null hypothesis is accepted, indicating that there is no association among the three variables (r, s, k).
Table 1. Algorithm implementation, Kendall's Coefficient of Concordance, decision on H0 (an example short text)

 j    r data   rank    s data   rank    k data   rank    Sum of Rj
 1    1.6435   28       4       12.0     405      8.0    48.0
 2    1.6699   29       9       32.5     997     33.0    94.5
 3    1.3755   23       4       12.0     420      9.0    44.0
 4    1.5487   26       9       32.5     971     32.0    90.5
 5    1.1232   16       4       12.0     433     11.0    39.0
 6    1.3378   22       7       25.5     769     28.0    75.5
 7    1.1818   18       6       21.0     665     22.0    61.0
 8    1.2430   21       8       30.0     846     30.0    81.0
 9    0.8526   11       5       18.0     553     19.0    48.0
10    4.1013   37       1        1.0      97      1.0    39.0
11    0.9320   12       7       25.5     739     26.0    63.5
12    0.2012    4       4       12.0     434     12.5    28.5
13    1.6753   30       2        3.0     215      3.0    36.0
14    0.6862    7       7       25.5     730     24.5    57.0
15    1.1718   17      13       38.0    1402     38.0    93.0
16    0.6565    6       8       30.0     812     29.0    65.0
17    0.9587   13      11       37.0    1179     37.0    87.0
18    0.5982    5       8       30.0     869     31.0    66.0
19    3.3125   35       2        3.0     213      2.0    40.0
20    0.6884    8      10       35.0    1045     34.0    77.0
21    0.0927    1       7       25.5     709     23.0    49.5
22    1.2065   20       4       12.0     421     10.0    42.0
23    0.7047    9       5       18.0     529     18.0    45.0
24    1.3804   24       4       12.0     434     12.5    48.5
25    2.8707   33       3        6.0     307      5.0    44.0
26    1.5666   27       4       12.0     444     14.5    53.5
27    1.4494   25       5       18.0     478     17.0    60.0
28    3.3433   36       3        6.0     312      6.0    48.0
29    0.8056   10       6       21.0     641     20.0    51.0
31    3.3023   34       3        6.0     337      7.0    47.0
32    0.1112    3      10       35.0    1061     35.5    73.5
33    6.3095   38       2        3.0     225      4.0    45.0
34    2.4095   31       4       12.0     450     16.0    59.0
35    1.1958   19       6       21.0     653     21.0    61.0
36    0.1046    2      10       35.0    1061     35.5    72.5
37    2.8506   32       4       12.0     444     14.5    58.5
38    1.1168   15       7       25.5     730     24.5    65.0

F = 0.8023; χ_r² = 89.06; P = 3.5055e-06; H0 is accepted for P ≤ 0.001
3.2 Iteration Procedure
Using the same short-text example, the algorithm is applied iteratively with j = 3:1:38 words. In other words, the original short text is segmented into short texts of various lengths, from 3 constituents up to 38, and the procedure of Section 3.1 is executed for each of these short texts. The chi-square test is then applied thirty-five times (see Figure 2), and the cumulative probabilities are estimated together with the control of hypothesis testing (see Figure 3).
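The loop itself can be sketched as follows. This is a minimal sketch of the bookkeeping only: `concordance_test` is a hypothetical stand-in for the Section 3.1 test (its internals are not reproduced here), and the exact loop bounds are an assumption based on the j = 3:1:38 description.

```python
# Sketch of the iteration procedure: apply a per-segment test to each
# prefix of the word list, from j_min constituents up to the full length.

def iterate_lengths(words, concordance_test, j_min=3):
    """Return {j: test result} for prefixes of `words`, j = j_min .. len(words),
    mirroring the j = 3:1:38 iteration over segment lengths."""
    return {j: concordance_test(words[:j])
            for j in range(j_min, len(words) + 1)}

# Usage with a dummy stand-in for the concordance test: a 38-word text
# yields one result per segment length j = 3 .. 38.
words = [f"w{i}" for i in range(1, 39)]
results = iterate_lengths(words, lambda segment: len(segment))
print(len(results), min(results), max(results))  # -> 36 3 38
```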
Fig. 2. The chi-square distribution function according to the iterative procedure on the segmented short text.
Fig. 3. The cumulative probabilities and the control of hypothesis testing.
3.3 Iteration Procedure in the Data Set
The iteration procedure for each short text, over a range of 3 up to 25 constituents, is executed twenty-two times in total. The procedure is run on a sample of 100 data sets, i.e. 2,200 calculations in total. The chi-square test of the data set is then represented (see Figure 4), and the cumulative probabilities and the control of hypothesis testing are presented in Figures 5 and 6, respectively.
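The data-set loop can be organised as below. This is a hedged sketch of the aggregation step only: `pvalues_by_text` is a hypothetical structure holding the per-length P values already computed for each of the 100 short texts, and the significance level follows the paper's P < 0.001.

```python
# Sketch of the data-set iteration: for each constituent count j, count
# how many of the sampled short texts reject H0 at the chosen level.

ALPHA = 0.001  # significance level used throughout the paper

def rejection_counts(pvalues_by_text, j_min=3, j_max=25):
    """pvalues_by_text: list of dicts mapping length j -> P value,
    one dict per short text. Returns {j: number of rejections at ALPHA}."""
    return {
        j: sum(1 for pv in pvalues_by_text if j in pv and pv[j] < ALPHA)
        for j in range(j_min, j_max + 1)
    }

# Toy usage with fabricated P values for two texts (illustration only).
toy = [{3: 0.5, 4: 1e-6}, {3: 1e-4, 4: 0.2}]
print(rejection_counts(toy, j_min=3, j_max=4))  # -> {3: 1, 4: 1}
```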
Fig. 4. The cumulative probabilities and the control of hypothesis testing.
Fig. 5. Data set of the probability cumulative distribution.
Fig. 6. The cumulative probabilities and the control of hypothesis testing.
Assessing the above statistical results, we divide them into three parts and make the following observations:
1. As can be seen from the distribution in Figure 4 (the cumulative probabilities and the control of hypothesis testing via the experimental function), coherence is observed for the whole sample of 100 short texts up to a sentence length of 14 constituents. From 14 constituents onwards, a lack of coherence is observed in the experimental function.
2. According to the experiment as represented in Figure 5 (data set of the probability cumulative distribution), all probabilities estimated using Equation 11 reject the null hypothesis for a short-text length equal to 14 constituents, with a probability p
7 21 0 7
7 22 0 7
6 23 0 6
6 24 0 6
3 25 0 3
3 26 0 3
2 27 0 2
We present an example derived from Table 2 in the following sentence:
"Die Computerindustrie hätte nach Michael Levitt einen Teil des Nobelpreises für Chemie 2013 verdient, denn ihre Forschungs- und Entwicklungsleistung hatte zu drastisch höheren Rechengeschwindigkeiten geführt (siehe Tabelle)."
(In English: according to Michael Levitt, the computer industry would have deserved a share of the 2013 Nobel Prize in Chemistry, because its research and development effort had led to drastically higher computing speeds (see table).)
Fig. 7. The German case of the cumulative probabilities and the control of hypothesis testing.
Fig. 8. Data set of the probability cumulative distribution from ten (10) German short texts.
[Figures 7 and 8 plot Cumulative Probabilities and P Values, respectively, against the Number of Words (3 to 30).]
As can be seen in Table 2 and in Figures 7 and 8, the results of the experiments conducted with German short texts are in agreement with the main large-scale experiment carried out at the beginning of the experimental part. This shows that the constituents of a short text in the proposed model can be considered as morphs: the linguistic origin of the short text is disregarded, and therefore the language of the text is not significant. More specifically, according to Figures 7 and 8, the cumulative probability begins to decline drastically between the 15th and 16th word, where p
the possible relation between the number of variables and the number of constituents in extended fields (such as biology) should be considered as an axis of the system's coherence and as the upcoming scientific priority.