8/10/2019 Metodo Automatizado Analisis Contenido Psicoterapia
This article was downloaded by: [Universidad De Concepcion]
On: 06 October 2014, At: 18:42
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Psychotherapy Research
Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/tpsr20
Automated method of content analysis: A device for psychotherapy process research
Sergio Salvatore, Alessandro Gennaro, Andrea Francesco Auletta, Marco Tonti & Mariangela Nitti
Department of Pedagogy, Psychology, and Teaching Science, University of Salento, Lecce, Italy
Published online: 16 Jan 2012.
To cite this article: Sergio Salvatore, Alessandro Gennaro, Andrea Francesco Auletta, Marco Tonti & Mariangela Nitti (2012) Automated method of content analysis: A device for psychotherapy process research, Psychotherapy Research, 22:3, 256-273, DOI: 10.1080/10503307.2011.647930
To link to this article: http://dx.doi.org/10.1080/10503307.2011.647930
PLEASE SCROLL DOWN FOR ARTICLE
Taylor & Francis makes every effort to ensure the accuracy of all the information (the "Content") contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.
This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions
Automated method of content analysis: A device for psychotherapy process research
SERGIO SALVATORE, ALESSANDRO GENNARO, ANDREA FRANCESCO AULETTA, MARCO TONTI, & MARIANGELA NITTI
Department of Pedagogy, Psychology, and Teaching Science, University of Salento, Lecce, Italy
(Received 4 October 2010; revised 4 November 2011; accepted 28 November 2011)
Abstract
This work presents a computer-aided method of content analysis applicable to verbatim transcripts of psychotherapy: the Automated Co-occurrence Analysis for Semantic Mapping (ACASM). ACASM performs a context-sensitive analysis aimed at mapping the meanings of the text through a trans-theoretical procedure. The paper presents the method and tests its validity. To this end, we compared ACASM with independent blind human coders on two content-analysis tasks: (a) estimating the semantic similarity between two utterances; (b) the semantic classification of a set of utterances. Results show that: (a) ACASM's estimates of semantic similarity are consistent with the corresponding estimates provided by coders; (b) inter-coder agreement and coder-ACASM agreement on the semantic classification task are of the same magnitude. These results lead to the conclusion that the content analysis produced by ACASM is indistinguishable from that performed by human coders.
Keywords: qualitative research methods; technology in psychotherapy research and training; content analysis; meaning
Introduction
Consistent with Freud's definition of psychotherapy as "the talking cure", psychotherapy process research has, since its very beginning, focused on the communicative exchange unfolding within sessions. Many methods of process analysis have been developed for investigating this exchange (e.g., Colli & Lingiardi, 2009; Dahl, Kächele, & Thomä, 1988; Dimaggio & Semerari, 2004; Gonçalves, Matos, & Santos, 2009; Greenberg & Pinsof, 1986; Luborsky & Crits-Christoph, 1990; Mergenthaler, 1996a; Perry, 1991; Salvatore, Gelo, Gennaro, Manzo, & Al Radaideh, 2010). A good proportion of these methods are based on verbatim transcripts of sessions, either exclusively or together with other kinds of data (e.g., data concerning non-verbal behaviour). Consequently, improving the efficacy and efficiency of methods of textual analysis should be considered a major task for psychotherapy process research (Mergenthaler, 1996b). This study intends to contribute to this development by presenting a bottom-up automated method of content analysis of texts.
Semantic Analysis: Top-Down Versus Bottom-Up Methods
The method presented in this study belongs to the family of methods focusing on the semantic level of text (henceforth: semantic analysis). These methods are aimed at mapping the content of the text, namely the meaning it conveys. Semantic analysis is essential for psychotherapy process research. Psychotherapy is an exchange of meanings (Angus & McLeod, 2004; Dimaggio & Semerari, 2004; Hermans & Hermans-Jansen, 1995; McNamee & Gergen, 1992; Salvatore et al., 2010; Salvatore & Venuleo, 2008; Santos, Gonçalves, Matos, & Salvatore, 2009); therefore, it is hard to deepen our understanding of it without taking into account the content of what patient and therapist say.
Within semantic analysis, it is worth distinguishing between top-down and bottom-up methods. Top-down methods are based on pre-defined coding systems according to which units of text are categorized. The Core Conflictual Relationship Theme (Luborsky & Crits-Christoph, 1990), the Defence Mechanism Rating Scale (Perry, 1991), the
Correspondence concerning this article should be addressed to Alessandro Gennaro, University of Salento, Department of Pedagogy, Psychology, and Teaching Science, Via Stampacchia, Lecce, 73100 Italy. Email: [email protected]
Psychotherapy Research, May 2012; 22(3): 256-273
ISSN 1050-3307 print/ISSN 1468-4381 online © 2012 Society for Psychotherapy Research
http://dx.doi.org/10.1080/10503307.2011.647930
Collaborative Interactions Scale (Colli & Lingiardi, 2009), and the Innovative Moments Coding System (Gonçalves, Ribeiro, Mendes, Matos, & Santos, 2011; Gonçalves, Ribeiro, Matos, Santos, & Mendes, 2010) are examples of top-down semantic methods. In general terms, they consist of a repertoire of content categories working as a coding system and a set of rules for applying the categories to the text. Bottom-up methods pursue the same aim of mapping the meaning of the text, but they do not adopt a pre-defined coding system. Rather, following the logic of Grounded Theory (Glaser & Strauss, 1967; Rennie, 2000), these methods start from the text and define the coding categories together with the mapping of the textual content, through an iterative interpretative procedure. Task analysis (Greenberg & Pascual-Leone, 2001; Pascual-Leone, Greenberg, & Pascual-Leone, 2009) is an example of this iterative way of working. It starts from a set of theoretical assumptions that are deliberately used to guide the extraction of sequences of change events. In turn, the observed sequences can lead to the modification of the original theoretical assumptions and therefore to further observations.
The Contextuality of Meaning: Implications for Semantic Analysis
The meaning of a linguistic sign (a word, a sentence) is inherently dynamic and contextual (Salvatore, 2011, 2012; Valsiner, 2007; for a discussion of this general tenet in the field of psychotherapy, see Gennaro, Al-Radaideh, Gelo, Manzo, Nitti, & Salvatore, 2010; Greenberg & Pinsof, 1986; Salvatore et al., 2010; Salvatore, Gennaro, Auletta, Grassi, & Rocco, 2011). It is not a fixed, pre-established content (e.g., an idea, an image, a concept) held in the sign itself; rather, it emerges from the way linguistic signs combine with each other in the contingency of talk (Linell, 2009; Salvatore & Valsiner, 2011; Wittgenstein, 1953/1958). Thus, understanding the meaning of the sign a means mapping which other signs a occurs with in the specific context of its use.
This pragmatic, dynamic, and contextual definition of meaning provides a way to appreciate the inherent multidimensionality and fuzziness of meaning. In the concrete circumstances of communication, signs always occur within an array of connections with many other signs; therefore, meaning depends on how the interpreter selects some of these connections as pertinent, leaving others in the background. In sum, meaning is not in the text, but in the constructive, hermeneutic relationship between text and interpreter.
Semantic analysis of text, therefore, cannot be performed by applying context-blind coding rules (namely, "if word x occurs, then content A has occurred"); rather, it requires the inferential reconstruction of the linguistic and/or extra-linguistic context of the text. In other words, the specific interconnections that words create within that particular text must be taken into account: word x in connection with words y and z means A, but in connection with words m and n it means B.
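To make the contrast concrete, here is a minimal Python sketch of the two kinds of rule. The words, the co-occurrence conditions, and the content labels are hypothetical and chosen only for illustration; ACASM itself does not rely on hand-written rules of this kind.

```python
# Hypothetical coding rules, invented only to illustrate the contrast.

# Context-blind rule: "if word x occurs, then content A has occurred".
BLIND_RULES = {"cold": "PHYSICAL_SENSATION"}

# Context-sensitive rules: (word x, words that must co-occur with it, content).
CONTEXT_RULES = [
    ("cold", {"winter", "weather"}, "PHYSICAL_SENSATION"),
    ("cold", {"mother", "distant"}, "EMOTIONAL_DISTANCE"),
]


def code_blind(words: list[str]) -> set[str]:
    """Assign contents by looking at each word in isolation."""
    return {BLIND_RULES[w] for w in words if w in BLIND_RULES}


def code_contextual(words: list[str]) -> set[str]:
    """Assign a content only when the word occurs with all its required co-occurring words."""
    present = set(words)
    return {
        content
        for word, context, content in CONTEXT_RULES
        if word in present and context <= present
    }


utterance = "my mother was cold and distant".split()
print(code_blind(utterance))       # {'PHYSICAL_SENSATION'}
print(code_contextual(utterance))  # {'EMOTIONAL_DISTANCE'}
```

The same word "cold" receives a different content depending on the words it combines with, which is the property that context-blind rules cannot capture.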
Thus far, automated procedures of semantic analysis have not proved able to account effectively for the contextuality of meaning, and this has prevented the spread of such procedures within psychotherapy research. As a result, the semantic methods adopted in psychotherapy research are currently based on human judgment. Yet the use of human coders raises several methodological, metric, and organizational problems that considerably constrain the heuristic potential of this kind of method.
First of all, semantic analysis is usually very labour-intensive and time-consuming: it requires many people and many hours of work. This hinders the generalization of semantic methods across cases and researchers. We consider the methodological fragmentation of contemporary process research to be related to this constraint: the work required to develop the competence for applying a coding system (and to reach satisfactory agreement among coders) entails a level of commitment that often only the group of researchers who developed the coding system itself can sustain.
Secondly, coders' inferences always carry an irreducible subjective component, which inevitably lowers reliability and therefore the power of semantic methods to reveal significant relationships. At the same time, in semantic analysis the problem of reliability cannot be considered merely in terms of measurement error; rather, it reflects the inherent multidimensionality of meaning: the variability among coders stems from the fact that the text is open to many different levels of interpretation. Consequently, increasing the reliability of semantic analysis requires clarifying and sharing the hermeneutic criteria according to which coders reduce the multidimensionality of meaning; in this way, a specific semantic map of the text is constructed. In accordance with this perspective, much effort has gone into making coding rules clearer and more specific and into requiring coders to use procedures of consensual validation (Lambert & Ogles, 2009; Lutz & Hill, 2009); yet,
Automated content analysis in process research 257
given the high level of inference inherently implied in these methods, these solutions cannot fully solve the problem. Above all, they make semantic methods even more labour-intensive and time-consuming.
The above considerations lead us to conclude that an alternative is worth pursuing: the development of bottom-up procedures of semantic analysis based on explicit, invariant coding rules and yet able to take the contextuality of meaning into account. Procedures of this kind would be a highly significant contribution to the growth of psychotherapy process research. On the one hand, they would allow semantic analyses to be implemented automatically. On the other hand, they would provide a shared ground that supports and constrains the human inferential judgments that remain (at least to date) indispensable, so as to increase inter-coder agreement as well as the comparability of textual analyses.
Purpose of the Study and Hypothesis
This study presents an automated bottom-up procedure of semantic analysis, Automated Co-occurrence Analysis for Semantic Mapping (ACASM), and provides a first test of its validity. ACASM constructs a map of the text in terms of the thematic nuclei active in it. It works through invariant, transparent, yet context-sensitive procedures, defined in terms of computational algorithms. Owing to these characteristics, ACASM's procedures are: (a) implementable through automated routines carried out by computer; (b) reliably reproducible across analyses and analysts; (c) able to produce a valid representation of the textual data (Lancia, 2002).
The current paper pursues two complementary aims. First, the ACASM method is presented together with an example of its application to a psychotherapy case. Second, an initial empirical test of ACASM's validity is performed. As concerns the latter, we adopt a Turing-like criterion of validity (for similar logic, see Rosenberg, Schnurr, & Oxmann, 1990; Steinbach, Karypis, & Kumar, 2000). Following this criterion, ACASM can be considered a valid semantic method if and only if the analyses it produces cannot be distinguished from those produced by expert human coders. We adopted this criterion because, in bottom-up semantic analysis, there is no external, objective normative criterion against which to evaluate the validity of the analysis in absolute terms. Meaning is multidimensional, and therefore any text permits many representations of its semantic content. Consequently, we assumed that for an automated bottom-up procedure of semantic analysis to be considered valid, it has to produce a map of the text whose level of agreement with the maps produced by expert coders is comparable with the level of agreement that coders show with each other.
Our hypothesis is that ACASM passes the Turing-like test of validity.
Method
ACASM's Conceptual Framework
ACASM is an example of a bottom-up method of semantic analysis, because it does not start with a pre-established repertoire of thematic contents according to which the units of analysis are classified. Rather, the repertoire of thematic contents working as a coding system is produced by the analysis itself.
ACASM belongs to a set of methods focused on the co-occurrence of words (Carli & Paniccia, 2007; Lancia, 2002; Reinert, 1986), that is, on the way words combine with each other within the same unit of analysis into which the text is segmented (generally, the unit of analysis consists of an utterance or a group of a few utterances). The co-occurrence of words is taken as the criterion of similarity for clustering the units of text: units holding the same co-occurring words are considered similar and are therefore grouped together. The rationale is that a set of co-occurring words marks a specific thematic content (also called a thematic nucleon). Therefore, units that have a certain set of co-occurring words in common share the thematic content marked by that set. In this way, the procedure of semantic analysis is able to provide a fine-grained semantic representation, coding each unit of analysis in terms of a specific content, namely the one marked by the set of co-occurring words according to which the unit has been clustered.
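The similarity criterion can be illustrated with a minimal Python sketch, assuming each unit of text has already been reduced to a set of lemmas. Counting shared lemmas pairwise is a simplification: as described in Step 4 of the procedure, ACASM works on the full ECU/lemma matrix through correspondence and cluster analysis rather than on raw pairwise counts. The units below are hypothetical.

```python
from itertools import combinations


def shared_lemmas(unit_a: set[str], unit_b: set[str]) -> set[str]:
    """Lemmas co-occurring in both units: the candidate marker of a shared content."""
    return unit_a & unit_b


def rank_pairs_by_cooccurrence(units: list[set[str]]) -> list[tuple[int, int, set[str]]]:
    """Rank unit pairs by how many lemmas they share (most similar first)."""
    pairs = [
        (i, j, shared_lemmas(a, b))
        for (i, a), (j, b) in combinations(enumerate(units), 2)
    ]
    return sorted(pairs, key=lambda p: (-len(p[2]), p[0], p[1]))


units = [
    {"work", "boss", "angry"},
    {"boss", "work", "tired"},
    {"mother", "sister", "angry"},
]
for i, j, common in rank_pairs_by_cooccurrence(units):
    print(i, j, sorted(common))
# 0 1 ['boss', 'work']
# 0 2 ['angry']
# 1 2 []
```

Units 0 and 1 share the pair "work"/"boss", which in this logic would mark a common thematic content.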
From a conceptual point of view, the reference to the co-occurrence of words within the same unit of analysis can be considered a way of taking into account the linguistic level of the contextuality of meaning, namely the level consisting of the way words are combined within the text.
ACASM's Procedure of Analysis
ACASM is performed through invariant algorithms implemented automatically by ad hoc software (e.g., Alceste, T-LAB) on the basis of analysis parameters established by the researcher.
258 S. Salvatore et al.
We adopted the procedure implemented by the software T-LAB (Lancia, 2002), version T-LAB PRO_XL2. T-LAB PRO_XL2 is able to analyse textual data in various languages (English, Italian, Spanish, Portuguese, German).
ACASM is implemented in four steps, which together take about one hour of work and can be performed by a single researcher (the size of the textual dataset affects the duration of the procedure only marginally).
Step 1. Segmentation of transcripts. ACASM works on the textual dataset (henceforth: corpus) as defined by the researcher according to the aim of the study. The corpus may consist of the verbatim transcript of the patient's and/or the therapist's talk, concerning all sessions or only a sample of them. ACASM divides the corpus into units of analysis, each called an elementary context unit (ECU). An ECU consists of a group of a few contiguous utterances.
The division of the text into ECUs has to strike a balance between two dialectically linked requirements: interpretability and specificity. On the one hand, segments have to be long enough to be interpretable in terms of thematic content. On the other hand, the longer the segments are, the more likely it is that a segment cannot be associated with a specific thematic content. The point of equilibrium between interpretability and specificity is an empirical issue (it varies with the language). After a series of trials and simulations, we have (to date) defined the following criterion for the English language: (a) each ECU begins with the character immediately following the last character of the previous ECU; (b) each ECU ends with the first punctuation mark (".", "!", or "?") occurring after the 250th character from its first character (i.e., punctuation marks occurring before the 250th character are not considered for closing the ECU); (c) in any case, an ECU's length must not exceed 500 characters; therefore, if no punctuation mark has occurred, the ECU ends with the last word that falls within this limit.
Note that the criterion is expressed in terms of characters. This is because ACASM's algorithm adopts the character as its basic computational unit: lexical units are defined as strings of characters enclosed between two blank characters. Nevertheless, previous application of this criterion to psychotherapy transcripts (Salvatore et al., 2010) has shown that it yields units of text endowed with semantic meaningfulness.
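Rules (a)-(c) can be expressed as a short routine. The following Python function is a minimal sketch of the criterion as described here, not T-LAB's implementation; it assumes that words are separated by single spaces and treats only ".", "!", and "?" as closing punctuation.

```python
def segment_ecus(text: str, min_chars: int = 250, max_chars: int = 500,
                 marks: str = ".!?") -> list[str]:
    """Split text into contiguous ECUs following rules (a)-(c)."""
    ecus = []
    start = 0
    n = len(text)
    while start < n:
        limit = min(start + max_chars, n)
        end = None
        # Rule (b): close the ECU at the first punctuation mark found after
        # the first `min_chars` characters of the unit.
        for i in range(start + min_chars, limit):
            if text[i] in marks:
                end = i + 1
                break
        if end is None:
            if n - start <= max_chars:
                # The remaining text fits within the limit: it is the last ECU.
                end = n
            elif text[limit] == " ":
                # Rule (c): the limit falls exactly at the end of a word.
                end = limit
            else:
                # Rule (c): cut after the last whole word within the limit.
                cut = text.rfind(" ", start, limit)
                end = cut if cut > start else limit
        ecus.append(text[start:end])
        # Rule (a): the next ECU starts right after the last character of this one.
        start = end
    return ecus


sample = ("word " * 60).strip() + ". " + "alpha " * 200
print([len(u) for u in segment_ecus(sample)])  # [300, 498, 498, 205]
```

Because of rule (a), concatenating the ECUs reproduces the original text exactly, which is a convenient sanity check when experimenting with other thresholds.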
Step 2. Selection of the lexical forms and construction of the dictionary. Depending on its size, a textual corpus can hold several thousand lexical forms. Lexical forms play the role of variables in the ACASM procedure (see Step 3). Consequently, they need to be reduced to a number compatible with the constraints of the subsequent multidimensional analysis (see Step 4), which requires reducing the dispersion of the data matrix.
This task is performed through two sequential
sub-steps.
Firstly, the procedure singles out all the lexical forms present in the text and categorizes them according to the lemma they belong to. A lemma is the citation form (namely, the headword) used in a language dictionary to refer to a lexeme (i.e., a set of word forms having the same lexical root and meaning). For example, word forms such as "go", "goes", "going", and "went" have "go" as their lemma; "child" and "children" have "child" as their lemma. The output of this sub-step is the list of lemmas present in the textual corpus.
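A minimal Python sketch of this sub-step follows. T-LAB relies on its own language-specific lemmatization resources; here a tiny hand-made lookup table (hypothetical and far from complete) stands in for them, and tokens are obtained by extracting runs of letters and apostrophes.

```python
import re

# Hypothetical, tiny lookup table standing in for a real lemmatization dictionary.
LEMMA_TABLE = {
    "goes": "go", "going": "go", "went": "go",
    "children": "child",
    "was": "be", "is": "be", "were": "be",
}


def tokenize(text: str) -> list[str]:
    """Split a text unit into lower-cased word forms."""
    return re.findall(r"[a-z']+", text.lower())


def lemmatize(tokens: list[str]) -> list[str]:
    """Map each word form to its lemma; unknown forms are kept as they are."""
    return [LEMMA_TABLE.get(t, t) for t in tokens]


def lemma_list(ecus: list[str]) -> list[str]:
    """Output of the sub-step: the sorted list of lemmas present in the corpus."""
    return sorted({lemma for ecu in ecus for lemma in lemmatize(tokenize(ecu))})


print(lemma_list(["I went home.", "Kate was still there."]))
# ['be', 'go', 'home', 'i', 'kate', 'still', 'there']
```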
The second sub-step is the selection of a subset of lemmas from this list. This subset constitutes the dictionary on which the subsequent analysis is based. To this end, 10% of the whole list of lemmas is selected. The selected lemmas are the most frequent ones, except that the 5% highest-frequency lemmas are excluded from the ACASM dictionary. The exclusion is motivated by the fact that the higher the frequency of a lemma, the less it contributes to discriminating among ECUs: high-frequency lemmas (e.g., words like "and", "to", "of") tend to be present in too many ECUs and therefore enter too many co-occurrence patterns. This exclusion criterion was determined through preliminary empirical approximation; however, it is consistent with the lexical-statistical logic underlying several methods of textual analysis (Bolasco, 1999).
It is worth noting that, because of the high frequency of the most commonly used words, the 10% of lemmas included in the ACASM dictionary corresponds to the level of text coverage considered acceptable in the literature, namely about 70-85% of all occurrences, depending on the size of the textual corpus (Bolasco, 1999; Lancia, 2002).
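The selection rule can be sketched in Python as follows. The description can be read in more than one way; this sketch assumes that lemmas are ranked by frequency, the top 5% are skipped, and the next 10% of the whole list form the dictionary, a reading consistent with Table II (726 of 7258 lemmas retained). Alphabetical tie-breaking and the rounding of the cut-offs are further assumptions. A coverage function is included for inspecting the share of occurrences the dictionary accounts for.

```python
from collections import Counter


def build_dictionary(lemma_tokens: list[str], keep_share: float = 0.10,
                     skip_share: float = 0.05) -> list[str]:
    """Skip the `skip_share` most frequent lemmas, then keep the next `keep_share` of the list."""
    freq = Counter(lemma_tokens)
    # Rank lemmas by decreasing frequency (alphabetical order breaks ties).
    ranked = sorted(freq, key=lambda lemma: (-freq[lemma], lemma))
    skip = round(len(ranked) * skip_share)
    keep = round(len(ranked) * keep_share)
    return ranked[skip:skip + keep]


def coverage(lemma_tokens: list[str], dictionary: list[str]) -> float:
    """Share of all lemma occurrences (tokens) that belong to the dictionary."""
    vocab = set(dictionary)
    return sum(token in vocab for token in lemma_tokens) / len(lemma_tokens)


# Toy corpus: lemma "l00" occurs 40 times, "l01" 39 times, ..., "l39" once.
tokens = [f"l{k:02d}" for k in range(40) for _ in range(40 - k)]
dictionary = build_dictionary(tokens)
print(dictionary)  # ['l02', 'l03', 'l04', 'l05']
print(round(coverage(tokens, dictionary), 3))
```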
Step 3. Digital representation of the text. The segmentation of the original text into ECUs and the identification of the lemmas active in the corpus allow the text to be transformed into a digital matrix representing the distribution of lemmas across ECUs (in binary terms: present/absent). The matrix displays all ECUs in rows and the lemmas in columns;
the value 1 in the generic cell $x_{ij}$ represents the presence of the $j$-th lemma in the $i$-th ECU, and the value 0 its absence (Table I).
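A minimal Python sketch of this step, assuming each ECU has already been reduced to its lemmas and the dictionary is given as an ordered list. Applied to the two ECUs of Table I, it reproduces that table's rows.

```python
def ecu_lemma_matrix(ecu_lemmas: list[list[str]], dictionary: list[str]) -> list[list[int]]:
    """Binary ECU x lemma matrix: x[i][j] = 1 if lemma j occurs in ECU i, else 0."""
    column = {lemma: j for j, lemma in enumerate(dictionary)}
    matrix = []
    for lemmas in ecu_lemmas:
        row = [0] * len(dictionary)
        for lemma in lemmas:
            if lemma in column:  # lemmas outside the dictionary are ignored
                row[column[lemma]] = 1
        matrix.append(row)
    return matrix


dictionary = ["I", "go", "home", "Kate", "be", "still", "there"]
ecus = [["I", "go", "home"], ["Kate", "be", "still", "there"]]
for row in ecu_lemma_matrix(ecus, dictionary):
    print(row)
# [1, 1, 1, 0, 0, 0, 0]
# [0, 0, 0, 1, 1, 1, 1]
```

Repeated occurrences of a lemma within the same ECU still yield 1, since the matrix records presence/absence rather than counts.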
Step 4. Identification of clusters of ECUs/co-occurring lemmas and classification of the ECUs. A Cluster Analysis (CA; Aldenderfer & Blashfield, 1984) is applied to the matrix. Note that the CA incorporates a preliminary Multidimensional Lexical Correspondence Analysis, which transforms the binary variables of the original data matrix into continuous classificatory dimensions. The Cluster Analysis groups the ECUs using the co-occurrence of lemmas as the criterion of similarity: the higher the number of lemmas shared by two ECUs, the higher the probability that they are grouped in the same cluster. Therefore, each cluster obtained is ultimately a set of utterances (i.e., of ECUs) that share many lemmas. According to this criterion of similarity, ACASM considers a given cluster the marker of a thematic content that is active in the text and semantically characterizes the ECUs grouped in that cluster (see the section Semantic Interpretation of the ACASM Output below). The number of clusters into which the text is partitioned is defined by an iterative algorithm: the clustering procedure stops when further partitions no longer produce a significant improvement in the inter/intra-cluster ratio, which means that increasing the number of clusters does not produce an appreciable gain in information.
A complementary output of the Cluster Analysis is the assignment of each ECU to the cluster with which it has the highest index of association. In this way, each ECU is labelled with its most representative cluster, that is, with one of the thematic contents extracted by the Cluster Analysis. (Table III shows, in English translation, the most representative ECUs of the 14 clusters defined in the case analysed in the current study, together with their interpretation.)
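The logic of Step 4 can be illustrated with a simplified, self-contained Python sketch. It is not T-LAB's algorithm: it skips the correspondence-analysis stage and clusters the binary rows directly with k-means (with a deterministic farthest-point initialisation). As a stopping rule it uses the explained-variance share, between-cluster sum of squares divided by total sum of squares, which is a monotone transform of the inter/intra-cluster ratio, and stops when adding a cluster raises it by less than a threshold; the threshold (0.10) and the toy data are assumptions. Each ECU is assigned to its nearest centroid, which here plays the role of the cluster with the highest association.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean_vector(rows):
    """Column-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def assign(rows, centroids):
    """Assign each row (ECU) to the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(r, centroids[c])) for r in rows]


def kmeans(rows, k, iters=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(rows, centroids)
        new = []
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            new.append(mean_vector(members) if members else centroids[c])
        if new == centroids:  # converged
            break
        centroids = new
    return assign(rows, centroids), centroids


def explained_share(rows, labels, centroids):
    """Between-cluster sum of squares divided by the total sum of squares."""
    grand = mean_vector(rows)
    total = sum(sq_dist(r, grand) for r in rows)
    within = sum(sq_dist(r, centroids[lab]) for r, lab in zip(rows, labels))
    return (total - within) / total if total > 0 else 0.0


def choose_partition(rows, max_k=10, min_gain=0.10):
    """Increase k until an extra cluster adds less than `min_gain` explained share."""
    n_distinct = len({tuple(r) for r in rows})
    best, prev_share = (1, [0] * len(rows)), None
    for k in range(2, min(max_k, n_distinct) + 1):
        labels, centroids = kmeans(rows, k)
        share = explained_share(rows, labels, centroids)
        if prev_share is not None and share - prev_share < min_gain:
            break
        best, prev_share = (k, labels), share
    return best


rows = [
    [1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
]
print(choose_partition(rows))  # (2, [0, 0, 0, 1, 1, 1, 0])
```

In the toy data a third cluster would isolate the single variant row, but the gain in explained share is below the threshold, so the two-cluster partition is kept.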
It is also worth noting that, although ACASM's computational rules (i.e., the operative criteria according to which the text is segmented, lemmatized, and thematically clustered) are invariant, they can be modified according to the researcher's aim. For instance, a researcher interested in analysing the patient's feelings concerning the marital couple might find it useful to distinguish two lemmas for any word denoting a feeling: one for the word when associated with the marital couple and the other for the word when used outside that domain.
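Such a modification can be sketched as a preprocessing step applied to each ECU before the matrix is built. In this minimal Python sketch, "associated with the marital couple" is approximated as co-occurring in the same ECU with a couple-related lemma; both word lists and the "@couple"/"@other" suffixes are hypothetical.

```python
# Hypothetical word lists; a real study would derive them from its own aims.
FEELING_LEMMAS = {"anger", "fear", "joy", "guilt"}
COUPLE_MARKERS = {"husband", "wife", "marriage", "partner"}


def split_feeling_lemmas(ecu_lemmas: list[str]) -> list[str]:
    """Give feeling lemmas a couple-specific or generic identity depending on their ECU."""
    in_couple_context = bool(COUPLE_MARKERS & set(ecu_lemmas))
    suffix = "@couple" if in_couple_context else "@other"
    return [lemma + suffix if lemma in FEELING_LEMMAS else lemma for lemma in ecu_lemmas]


print(split_feeling_lemmas(["anger", "husband", "dinner"]))  # ['anger@couple', 'husband', 'dinner']
print(split_feeling_lemmas(["anger", "boss", "work"]))       # ['anger@other', 'boss', 'work']
```

After this step, "anger@couple" and "anger@other" enter the ECU/lemma matrix as two distinct columns, so the clustering can separate the two uses.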
Semantic Interpretation of the ACASM Output
The interpretation is provided by the researcher. Since each cluster represents a subset of ECUs sharing lemmas that tend to co-occur in the same utterances, it can be understood as a thematic nucleon made up of a set of words whose aggregation reflects the shared presence of certain semantic traits (Lancia, 2005). It is worth noting that the words composing the set may have various kinds and degrees of semantic relationship among them (e.g., they may be synonymous, as in "much" and "a lot"; antonymous, as in "good" and "bad"; functionally connected, as in "car" and "trip"; and so forth). The interpretation of the content of the set is based on the identification of this network of semantic relationships.
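Interpretation remains the researcher's task, but it can be supported by a list of the lemmas that most characterize each cluster. The Python sketch below ranks lemmas by a 2 x 2 chi-square statistic (lemma present/absent x ECU inside/outside the cluster), keeping only over-represented lemmas. The choice of statistic is an assumption, since this section does not state which index is used, and the data are a toy example.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def characteristic_lemmas(matrix, labels, dictionary, cluster, top=5):
    """Lemmas most over-represented in `cluster`, ranked by chi-square."""
    scores = []
    for j, lemma in enumerate(dictionary):
        inside = [row[j] for row, lab in zip(matrix, labels) if lab == cluster]
        outside = [row[j] for row, lab in zip(matrix, labels) if lab != cluster]
        a, b = sum(inside), len(inside) - sum(inside)     # present / absent inside
        c, d = sum(outside), len(outside) - sum(outside)  # present / absent outside
        if a * d > b * c:  # keep only lemmas relatively more frequent inside the cluster
            scores.append((chi_square_2x2(a, b, c, d), lemma))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [lemma for _, lemma in scores[:top]]


dictionary = ["work", "boss", "mother", "sister"]
matrix = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]
labels = [0, 0, 1, 1]
print(characteristic_lemmas(matrix, labels, dictionary, cluster=0))  # ['work', 'boss']
print(characteristic_lemmas(matrix, labels, dictionary, cluster=1))  # ['mother', 'sister']
```

The researcher would then read such lists, together with the most representative ECUs, to reconstruct the network of semantic relationships that defines each thematic nucleon.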
Characteristics of ACASM
Before concluding the presentation of ACASM, it is worth pointing out three distinctive characteristics of the method.
1. Though the process of human comprehension of texts is a highly debated issue (Kintsch, 1988; Landauer & Dumais, 1997; Visetti & Cadiot, 2002), in general terms one can assume that human bottom-up semantic analysis requires the implementation of two basic complementary functions. Firstly, semantic analysis consists of evaluating the semantic similarity between the units of analysis (e.g., groups of words, utterances, groups of utterances, and so on) into which the text is segmented. Thus, utterances considered to have a similar semantic content are grouped together, and this leads to the identification of a semantic/thematic nucleon. For instance, utterances concerning "trouble at work", "conflicts within the family", and "health issues" can be clustered in terms of their sharing the content: undesirable,
Table I. A hypothetical example of digital representation of the text "I went home. Kate was still there" in terms of the ECU/lemma matrix
ECU / Lemmas            I   Go   Home   Kate   Be   Still   There
I went home             1   1    1      0      0    0       0
Kate was still there    0   0    0      1      1    1       1
problematic events. Secondly, semantic analysis implies an operation of categorization: utterances are attributed to the semantic nucleon that is most representative of their content. ACASM performs the same two basic functions as human coders' bottom-up semantic analysis. It does so through context-sensitive computational rules, namely the multidimensional analysis of the distribution of co-occurrences across ECUs.
2. We do not claim that ACASM's parameters and computational rules are the same as those used by human coders. On this point we keep an open position, though some studies suggest that human comprehension of text is also based on computational rules similar to multidimensional analysis (Landauer & Dumais, 1997; Visetti & Cadiot, 2002). What we maintain is that, given their context-sensitivity, ACASM's computational rules are functionally equivalent to human coders' procedures: ACASM reproduces the same basic functions of human coders' bottom-up semantic analysis, namely the evaluation of semantic similarity and classification.
3. ACASM is assumed to be functionally equivalent to a model of human bottom-up semantic analysis based on common sense, namely to a human coder interpreting the textual content guided by no specific theoretical criterion, but by the basic cultural and linguistic competence through which she/he communicates, understands, and interprets in daily life (Garfinkel, 1967; Valsiner, 2007).
Data Source
The present study concerns a sample of verbatim transcripts extracted from a good-outcome, Italian-speaking, 124-session psychotherapy (the Katja case). Katja received Cognitive-Constructivist Therapy for Narcissistic Disorder (Dimaggio, Hermans, & Lysaker, 2010; Semerari, Dimaggio, Nicolò, Procacci, & Carcione, 2007). The treatment lasted three and a half years; according to several independent analyses, Katja's therapy was considered a good-outcome therapy (for details, see the review by Nicolò & Salvatore, 2007). The good outcome was maintained at one-year follow-up (Dimaggio & Semerari, 2001).
Analysis was performed on the transcripts of 48 sessions of the third and last stage of the psychotherapy (from session 74 to session 121, corresponding to the last year and a half of psychotherapy; note that the last three sessions were left out because people other than the therapeutic dyad took part in them). We decided to concentrate our analysis on the last part of the psychotherapy because one can expect the patient-therapist talk to undergo a process of specialization in the use of words: certain combinations of words become progressively more probable while others become progressively less probable; empirical evidence supporting this hypothesis on the same case is provided by Salvatore, Tebaldi, and Potì (2009). Therefore, given that our analysis is a first test of the validity of the method, we preferred to focus on the last portion of the clinical dialogue, where patterns of co-occurrence should be more differentiated and therefore more efficiently distinguishable into clusters.
Following a dialogical clinical approach (Gennaro et al., 2010), the whole transcript of the sessions, encompassing both patient and therapist talk, was included in the analysis.
Design
Analysis of the thematic contents and their temporal evolution. First, we applied the ACASM procedure to the textual corpus and interpreted the clusters obtained in terms of their thematic content. Second, in order to take into account the temporal evolution of the thematic contents, we divided the period of therapy analysed into three sub-periods (sub-period A, sessions 74-89; sub-period B, sessions 90-105; sub-period C, sessions 106-121). The incidence of each thematic
Table II. Descriptive parameters of the textual corpus subjected to analysis (Katja case)
Descriptive parameter                                     Amount
Sessions                                                  48
Number of elementary context units (ECUs) in the text     5548
Number of ECUs clustered                                  5054
Number of occurrences in the text (tokens)                146673
Number of lemmas in the text (types)                      7258
Number of lemmas in the analysis                          726
Number of clusters extracted                              14
Table III. Katja case's ACASM output: the clusters' most representative ECUs and their semantic interpretation
Thematic interpretation / ECU

1. Own vs. others' point of view
or so so then on Sunday morning, so see the time he poor thing was trying to give me that freedom I took advantage of it in an extreme way but because he was giving me the freedom exactly but in his opinion it was not permission from his view point he just took a decision yes from my point of view, it was (74; 136.562)
It's just that I have to be able not to care about it or at least I have to understand his Katja's view point both or if I understand my view point and the other person's view point then I say that the other person is right (83; 81.255)
Yes exactly, probably, and this is from your viewpoint, we know, we know each other quite well, and from this point of view you have to take the responsibility and if you look sincerely at your thoughts you can't think that it's the other person that have to notice it, here you have to discipline yourself, you know, you feel it, two points (79; 73.475)
I don't want to be balanced no, it's just that I also understand my viewpoint, now the difficulty is understanding both of them, let's say mediating so as to perform actions that somehow are good for yourself without making you feel too guilty or anyway, without knowing my viewpoint (83; 54.308)

2. Differences in perspectives
it's clear that changing this perspective changes the way of seeing the defects of others, and of yourself and all, and also the relationship and this is a change in the vision of yourself, of Katja and of the relationship, and the general vision of your issues in other things, the vision of yourself (75; 61.268)
This inner torment continually between the choice and continuing to have that perspective which however was not confined to the view of other things but really of emotions and feelings linked to... and it's what we were saying about challenging the choice each two or three seconds because if I have another perspective linked to a different sensation (92; 24.638)
it's not like buying a Ferrari because it's one thing to buy a car and another to buy a Ferrari, in sum between one million six hundred and the thirty million that a Ferrari costs there is a huge difference, but between one million six hundred and thirty million which is the price of a car there's also a difference, but it's always less than that between a million six hundred and a Ferrari (86; 24.107)
it's true in the sense that as it were, your story is like that, but with dad I understand, yes I agree, but what we were saying last time, it's one thing if someone doesn't understand me, I had some problems too, then when I had problems I wasn't able to explain myself, I mean, when I explain myself, when I say one plus one equals two and then if you want to do as you like it's as she said (114; 23.364)

3. Concerns for relational problems
I mean I tried to explain some things to him, to tell him after that episode of the bloodiness and so... I actually see that it's all pointless, all quite pointless, but he insists on a specific topic that is the daughter who needs to be treated, no, no, not on the topic of the daughter that needs to be treated (109; 148.387)
Yes, the couple doesn't work but not only the couple doesn't work, I don't work and neither do you, that means it becomes a false happiness because children are absolutely are like pet animals so they feel what happens (98; 92.132)
in those aspects you are highlighting, the relationship with your parents, now it seems silly, you are highlighting some daughter-like aspects, that is, those are aspects related to dependence, the car, upkeep, and obviously in this position you feel bad, as soon I'm in this position I can't stand it (94; 90.888)
I'm calming down but if you assume a more stable identity people trust it too much and thus you get bored because nothing happens (87; 88.162)

4. Exchange of presents
A. and I gave our parents some presents, for my mother we bought a pair of shoes, we gave the same present each other, more or less the same gift... because between us we never give any gifts... how come between Alberta and me? (75; 126.676)
his, mine that is because when I went to get it he was very kind, he said here it is, I wanted to give it to you for your birthday, it's an engagement gift, it's nice; he was very kind to try to connect the gift with what you are feeling (77; 110.891)
But silly example if someone knows that on my birthday I like to receive flowers, I tell you this the first year, the second year, do I have to tell you the third year? Or do I say please will you give me some flowers? Ah, what did you give them to me for? (98; 79.356)
in this hour we did some window shopping do you like this bag? I said, yes, nice, sure, well, I was thinking he was going to give me a present and so, you know, that was my birthday present (121; 68.016)

5. Experience of feelings in the relationship
more than attacked I felt misunderstood but also not respected yes is it more important to be not respected or to be misunderstood? Well, I think that it's a result, that is, maybe misunderstood in the sense that there is no effort to relate to someone else and therefore to understand them and try to respect what has been understood (79; 50.439)
You are imagining yourself in our work? And you are saying: I'm imagining, let's do a meta-thought, right, ok let's make a meta-thought and you are saying there is the risk that I could be here, pay attention exactly to the way you described yourself before (118; 31.874)
moustache I think that I will need his help to continue my therapy, that is the therapy goes on by itself, but we need his help for the therapy. Let's think how it could be presented, and then actually also the sense of our proposal, where he can help us or where I could convince him to come? (116; 23.848)
we are talking about a sensation of emptiness which you are describing eh but the silence of last time in the way you described it to me may help me to understand... that is I'm trying to visualize the scene, the inner scene in this moment it's more or less like this, there is an inner feeling about something inside that, isn't there? (100; 16.484)

6. Experience of difficulty in the relationship
I mean even if I had some difficulties understanding those signals of attention, that doesn't concern these episodes, it's about something else, different circumstances, but not these episodes even if I have some difficulties reading the attention signals that are given eh, anyway I don't care about it (98; 88.81)
guys eh of course broadly speaking your difficulty is to admit to yourself that you're involved in a relationship, somehow it was really I still have difficulties, but if this is the difficulty, as I said, have more (119; 88.767)
maybe it's a difficulty related to being able to live inside the world of others, able to move, you see? In the world, in general even without the relationship that's under way it's a difficulty in directing this energy which is anyway activated, that is it's somewhere (100; 83.576)
I'm trying to check with you and with everyone and this is a difficulty that also belongs to you: P: the difficulty is being afraid that the emotions could be too big to be constrained, controlled or anyway felt, I don't know (118; 42.205)

7. Work activities
I'm leaving again and that's all, anyway on Wednesday I decided to take a day off because that friend I study with, lucky her, passed the written part of the Police entrance exam to become an officer and now she has the oral exam and she said obviously I want to become a magistrate (111; 52.298)
so I don't have any working identity simply because I don't work because I'm doing that public exam so it's pointless because I have or I don't have difficulties, that is, the issue of working identity doesn't exist, if it arises when I'm working the difficulty or the limit will exist but at the moment it doesn't (94; 43.806)
the following days I wanted to sleep in the morning but I couldn't either on Tuesday or Wednesday because I had to go to work because there was something to do, so I didn't relax on Tuesday or Wednesday but in the afternoon I did my stuff, I had a wax, I went out with a friend of mine that's (89; 42.793)
another thing that I must say now I realized that I was led astray by you insisting so much on making choices without following your nature, which meant working three times harder S but visibly working hard for my magistrate's exam I thought you were referring to that kind of fatigue in the sense that it's one thing to (96; 38.153)

8. Account of negative feelings
Well, some things happened so you couldn't not link together the frame of mind with what happened V well, but you could have connect the frame of mind to what happened in general with your dad, which probably created your frame of mind, a basic sadness, yes but it was also, you see, (104; 189.949)
it was due to the suffering that I feel each time I meet or I hear my dad, the frame of mind it's that frame of mind, I didn't cry because I'm narcissistic, but anyway sometimes I shed a tear, yeah, very often, but this is not the point because anyway what this meeting gave you (115; 125.425)
it's a period that it seems to me that I've been living with this struggle for ages, fighting, improving myself, but I don't enjoy myself, I've had enough! It's a drag, I get bored, laughter, I don't understood your frame of mind K., I'm sorry my frame of mind (79; 110.369)
because obviously I'll become bad, I'll be bad, I'll be bad, I don't know, I don't feel as if I'm bad, but at times when he says he's sorry, it seems to me that he expresses his upset, but others see it as being bad (88; 100.742)

9. Tolerance of negative feelings
And so you also have to accept all the consequences on you, this lack of sensibility from your dad, that is if I accept that others can make mistakes, I also accept, I don't understand I don't think it's so easy for you to accept that people in general could be more or less sensitive could have a different degrees of sensitivity, why can't these ones? (114; 56.695)
I can't do it, I can't feel this connection, that is to do it mentally, I can do that, I can do it, anyway you don't want to accept that it's an aspect of you, yes but I accept it, but what would it mean in concrete terms? Accepting that I'm working _t hard, I accept that, I feel it, but I can't understand which part is holding off at a distance, and at a distance from what? (87; 39.952)
Yes, it's like saying that there is an aspect of loving, of taking care, I mean that's ok, accepting that it has a limit, accepting that there is a degree of suffering on the part of someone else that we can't do anything about, no? And accepting that a degree of guilt feeling, ok could I suggest something you could read? (84; 37.911)
going into a closed agency has no sense because it makes me waste time that I could use differently doing more interesting things, so yes it troubles me, ok, so you are telling me that now you are able to manage your troubles in a more natural way, that is we can say (93; 32.784)

10. Leisure
This? No ah, I don't have to but the first, the first moment _no, this Saturday and Sunday he has An and I don't but Monday is a holiday yes Monday is a holiday and we will be together, but it doesn't count as a weekend, Monday really I will not make it weigh, this Saturday and Sunday he is with An, and Katja? (105; 72.368)
anyway, it's better than before, that's all, then it's always the same struggle with money he complained all winter and he still goes on I won't be able to have any holidays, my god, my god, my god, but in fact he goes away every weekend, now he's leaving for ten days I don't know where in the mountains and then maybe he'll go to his relatives in France, down in YY (97; 38.19)
alone, and yes laugh he is going to the gym, poor him, yes nice I like him, no it's something I like, he's nice, then slowly slowly in the following days, in the following days, in the following days uh. Wednesday and that's all because on Thursday we've decided to go away not to go away (77; 29.37)
we had a weekend alone on 4 September because An went to OO to a child's birthday party, so we now at 4 October, and October has 4 weekends and I say see if you can take one, good, otherwise anyway I'm living a life where I get up every day at 8.00 including Saturdays and Sundays (102; 24.718)
ah, no no, so I had to phone to get information the day after this marriage was too much, the day after the marriage we had to leave for two days in NN but the weather wasn't good on Sunday and so the ferries weren't leaving (97; 22.734)

11. Adherence to others' expectations
relatively it's _not that I'm leaving but goodbye and thank you from a view point obviously I mean goodbye and thank you it's not directed to my parents, goodbye and thank you no, no of course not the fact of being a daughter, not the fact of being daughter, of being maintained it's one of the aspects connected to being a daughter, but not the whole thing I imagine, no, it's a very important part of it (94; 94.047)
there isn't a dialogue so it's impossible and so he needs to be surrounded by people or someone to say yes, yes, the partner or whoever say yes, of course he's smart, yes, he's good, he's good, but what's he good at? (103; 88.427)
in the sense that she has some shortcomings of her own but she is nice in the relationship because she doesn't smother, she isn't, anyway she's good, while my dad isn't, not at all, so he's there, so that's why I think _ that more or less_ there is a better balance form that point of view, then obviously until I start working and earning, goodbye and thank you, well (94; 85.956)
my goals, so that if he wants to give me something yes, no, but more than good or bad which anyway is all relative, it's the fact that I felt at ease which is much harder than being good or not, a person may be good but feel like this and that's a quality that, you've seen I think, that comes quite naturally to me, no? (91; 77.94)

12. Refusal of dependency
and also it's hard, hard in the sense that everything is hard, a hard perspective the way it's managed yes, my dad, then it depends on the period because when I want to see him often, certainly in this period (continues less certain) the less I see him the better I feel, but I went to his home, I saw the lights, I saw a whole difficult period with dad, (94; 134.537)
maybe I'm wrong or maybe it's because I'm used to it for so long, I never depended on you, you depended on me the difference is maybe quite considerable if it will make you more feel better in a few years you will depend on us, we'll be equal, I really don't think so, I'd prefer to shoot myself rather than depend on you (120; 108.038)
Also very long periods when I was happy. I was calm, I felt good with that person name period and period, I can also forget name, I remember him, well, name, ok... (98; 82.972)
content was calculated as the percentage of ECUs
associated with it.
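The incidence computation described above can be sketched as follows. This is a minimal illustration, not the ACASM software: it assumes each clustered ECU is available as a (session, cluster) pair and that each percentage is taken over the clustered ECUs of the same sub-period. The function names and the toy data are hypothetical; only the session ranges come from the text.

```python
from collections import Counter

# Sub-period boundaries as stated in the Design section (sessions 74-121).
SUB_PERIODS = {"A": range(74, 90), "B": range(90, 106), "C": range(106, 122)}


def sub_period(session: int) -> str:
    """Return the sub-period label (A, B or C) for a session number."""
    for label, sessions in SUB_PERIODS.items():
        if session in sessions:
            return label
    raise ValueError(f"session {session} is outside the analysed period")


def incidence_by_sub_period(ecus):
    """ecus: iterable of (session, cluster) pairs, one per clustered ECU.

    Returns {sub_period: {cluster: percentage of that sub-period's ECUs}}.
    The denominator (ECUs of the same sub-period) is an assumption.
    """
    counts = {label: Counter() for label in SUB_PERIODS}
    for session, cluster in ecus:
        counts[sub_period(session)][cluster] += 1
    result = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        result[label] = {c: 100 * n / total for c, n in counter.items()} if total else {}
    return result


# Hypothetical toy input: five ECUs with their session and cluster.
toy = [(74, "Leisure"), (75, "Leisure"), (80, "Work activities"),
       (95, "Leisure"), (121, "Work activities")]
print(incidence_by_sub_period(toy))
```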
Analysis of validity. We translated the Turing-like criterion of validity (see section "Purpose of the Study and Hypothesis") into two complementary hypotheses, each concerning one of the two basic functions implemented by bottom-up semantic analyses: evaluation of similarity and classification (see section "Characteristics of ACASM", point 1). More specifically, we expect to find that: (a) ACASM's evaluation of the similarity of ECUs is consistent with the evaluation of semantic similarity produced by blind expert human coders (hypothesis 1); (b) ACASM's classification of ECUs is consistent with the classifications provided by blind human coders on the basis of the ECUs' semantic content (hypothesis 2). Hypothesis 2 is only exploratory, being expressed in terms of confirmation of the null hypothesis.
In order to test these two hypotheses, we subjected the ACASM output to the following two analyses, based on judgments performed by independent, blind, expert human coders.
Table III (Continued)
Thematic interpretation / ECU

12. Refusal of dependency (continued)
I was thinking of the wheel breaking, things more like this getting stuck in the middle of the road yes yes, but also an aggression could happen no, no, usually those episodes can happen to girls alone at night, I know usually, it happens I know, actually it's always happening, I've got some girl friends who always get someone to take them home into the house, oh well, (76; 77.202)

13. Attitude towards the other
it's not important that it's good for you or not but knowing that if it hurt you or it's good for you it's in your hands yes, it certainly is, but as I was saying before, it's not that as you said now it hurt you or it's not good for you, it's the same, it's like two levels before, it's not that you don't know there's a level below a level, which, I mean, (89; 84.752)
Yes, he is a good guy, he understands you, he's improved, he really loves you, but you, I mean, are you in love or not? I don't answer myself and he doesn't give any answer and I don't answer myself and if I want to give an answer yes but not yes, but it isn't an answer, it's an attempt at an answer (98; 73.914)
because they are not equal on me because for sure I feel better because for sure I know that before the answer might have been dictated by an aggressive attitude and so the answer was aggressive, and now it's not like that anymore, that is, the answer is the answer and that's all, that is it's not linked to something that I do, it's his choice and thus if one feels like that eh, (102; 64.278)
that, it's always what we said last time, that it seems obvious to me to make certain requests and wait for certain answers and instead different ones arrive, I don't know, about the photo, on the furniture, then uh (101; 52.244)

14. To communicate
I still don't have it, ok, I won't read them, I won't read them if you want to wait it would be better then I will also let you read other things that I'm writing, probably, in fact I will also ask you for advice about things, about as it were, about the relevance of what I've written to what you've experienced here (103; 93.922)
obviously in some way the fact of being noticed the two things are exchanged, you wear something of someone else no? The goal is to be noticed to write a story R: together but belonging to someone else, not to me, that is I didn't have to write it was her that needed to write and asked me please, please (87; 68.722)
because I need to be noticed because I have to write, I need to be noticed by that writer because I have to write I don't know, a biography, or something, in sum he had to write something and I was trying to say no but, (56.995)
And what did he write on the card, uh... he wrote well now I don't remember exactly the words it was like you are my grumbling love but you are wonderful, I'd never change you with anyone else, that's all. I think it could correspond to reality, what was on the card? (89; 46.646)

Note. Translated from the original Italian. The first number in parentheses indicates the session in which the ECU occurred; the second is a measure of the ECU's representativeness of the corresponding cluster (chi-square metric).

Analysis 1. Association of ACASM's Assessment of Similarity and Human Coders' Measure of Semantic Similarity
The aim of this comparison is to estimate the consistency between the human coders' evaluation of the semantic similarity of ECUs and the evaluation of similarity provided by ACASM. To this end, we adopted the following five-step procedure.
First, we selected 70 ECUs, the five most representative ones from each of the 14 clusters defined by applying ACASM to the corpus. As the criterion of representativeness we adopted the chi-square-derived parameter computed by Cluster Analysis for each ECU (the output of the Cluster Analysis is reported in the Results section; see also Tables II and III). This parameter is based on the number of a cluster's words co-occurring within the ECU: the more of the cluster's words the ECU contains, the more representative the ECU is of that cluster.
Second, two blind coders (PhD students) with experience in content analysis for psychosocial research were separately asked to evaluate the semantic similarity of the 2415 pairs of ECUs produced by combining the 70 selected ECUs (each ECU was compared with all the others; the number of pairs is given by the formula k(k − 1)/2, where k = number of elements = 70; therefore 70 × 69/2 = 2415 pairs). Consistent with the commonsense criteria of coding (see section "Characteristics of ACASM", point 3), we chose coders without clinical expertise and did not provide them with any specific, theory-oriented semantic rules or criteria for coding. The coders received 2 hours of preliminary training, aimed at clarifying the task. They were informed that the ECUs had been extracted from the verbatim transcript of a psychotherapy and asked to use a 5-point Likert scale, from 1 (very different thematic content) to 5 (same thematic content). No further information on the aim of the task was provided; the coders were blind to the ECUs' ACASM cluster membership. The ECUs were presented in random order, the same for both coders. In this way, 2415 similarity judgments were obtained from each coder. It is worth noting that we did not implement any consensus procedure, of the kind often adopted in semantic analysis to increase inter-coder convergence (e.g., Stiles, Elliott, Llewelyn, Firth-Cozens, Margison, Shapiro & Hardy, 1990). Thus, the comparison between ACASM and the human coders is limited to the basic level of functioning of semantic analysis; it does not encompass the post-coding process of increasing reliability.
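The pair count above can be checked by enumerating the pairs directly. In this sketch the ECU identifiers are hypothetical placeholders (five per cluster, 14 clusters), and shuffling the pairs with one fixed seed is only an assumed way of producing a single random order shared by both coders.

```python
import random
from itertools import combinations


def n_pairs(k: int) -> int:
    """Number of unordered pairs among k items: k(k - 1)/2."""
    return k * (k - 1) // 2


# Hypothetical identifiers for the 70 selected ECUs (5 per cluster x 14 clusters).
ecu_ids = [f"C{cluster:02d}-{rank}" for cluster in range(1, 15) for rank in range(1, 6)]

# Every ECU paired with every other one, each unordered pair exactly once.
pairs = list(combinations(ecu_ids, 2))

# One random presentation order shared by both coders (the seed is arbitrary).
presentation_order = pairs[:]
random.Random(2012).shuffle(presentation_order)

print(len(ecu_ids), len(pairs), n_pairs(len(ecu_ids)))  # 70 2415 2415
```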
Third, in order to make the matrix thus obtained suitable for parametric analysis, the Likert scores were transformed into a metric scale following the procedure proposed by Ciavolino and Dahlgaard (2009), which is based on the probability associated with the relative frequency of each level of similarity.
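The section does not spell out the Ciavolino and Dahlgaard (2009) procedure, so the sketch below substitutes a common normal-scores transformation as an illustrative assumption: each Likert level is mapped to the inverse-normal value of the midpoint of its cumulative relative-frequency interval. It shows the general idea of turning relative frequencies of ordinal levels into metric values; it is not claimed to be the cited method, and the function name is hypothetical.

```python
from collections import Counter
from statistics import NormalDist


def likert_to_normal_scores(ratings):
    """Map ordinal ratings to z-scores from their relative frequencies.

    Assumed normal-scores rule (a stand-in for the cited procedure): level j
    receives inv_cdf(F(j-1) + p_j / 2), i.e. the inverse-normal of the midpoint
    of its cumulative probability interval.
    """
    n = len(ratings)
    freq = Counter(ratings)
    scores = {}
    cumulative = 0.0
    for level in sorted(freq):
        p = freq[level] / n
        scores[level] = NormalDist().inv_cdf(cumulative + p / 2)
        cumulative += p
    return [scores[r] for r in ratings]


# Hypothetical ratings from one coder: mostly 1, as in the skewed data described.
print(likert_to_normal_scores([1, 1, 1, 2, 3]))
```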
Fourth, we calculated an ACASM rate of similarity for all 2415 pairs of ECUs, using the Euclidean distance as ACASM's measure of similarity between two ECUs. To understand this parameter, consider that each ECU corresponds to a point in the multidimensional factorial space resulting from the multidimensional lexical correspondence analysis performed as the first step of the Cluster Analysis procedure (see above, Method section, ACASM step 4). The Euclidean distance is the metric distance between two points in this space: the closer the two points, the smaller the Euclidean distance and the more similar the ECUs they represent (Lancia, 2002). In formal terms, the distance between every pair of ECUs was calculated as:
$$d(P, Q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2} = \sqrt{\sum_{k=1}^{n} (p_k - q_k)^2}$$
with $P = (p_1, p_2, \dots, p_n)$ and $Q = (q_1, q_2, \dots, q_n)$ representing the coordinates, in the n-dimensional factorial space, of the two generic ECUs whose distance is computed. In our analysis, we used the first 10 factorial dimensions defined by the multidimensional lexical correspondence analysis applied to the corpus (i.e., n = 10).
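The distance formula above translates directly into code. This sketch assumes each ECU's factorial coordinates are already available as a list of floats (computing them requires the correspondence analysis, which is not reproduced here); the coordinates in the example are hypothetical, and only the first n = 10 dimensions enter the distance.

```python
import math


def ecu_distance(p, q, n_dims=10):
    """Euclidean distance between two ECUs using only their first n_dims
    factorial coordinates (n = 10 in the analysis described above)."""
    if len(p) < n_dims or len(q) < n_dims:
        raise ValueError("each ECU needs at least n_dims coordinates")
    return math.sqrt(sum((pk - qk) ** 2 for pk, qk in zip(p[:n_dims], q[:n_dims])))


# Hypothetical coordinates on 12 factorial dimensions; only the first 10 are used.
ecu_a = [0.3, -1.2, 0.8, 0.0, 0.5, -0.4, 0.1, 0.9, -0.7, 0.2, 5.0, 5.0]
ecu_b = [0.3, -1.2, 0.8, 0.0, 0.5, -0.4, 0.1, 0.9, -0.7, 0.2, -5.0, -5.0]
print(ecu_distance(ecu_a, ecu_b))  # 0.0: the two ECUs differ only beyond dimension 10
```

On Python 3.8 and later, `math.dist(p[:10], q[:10])` computes the same value.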
Finally, we compared the values of Euclidean distance (as ACASM's measure of dissimilarity) with the human coders' judgments of semantic similarity. The comparisons were performed on all 2415 pairs of ECUs for each coder. Given the structure of the ECU sample (five ECUs for each of the 14 clusters), most pairs of ECUs had a low level of similarity. Consequently, most of the 2415 pairs were rated 1 by both coders (coder A: 91% of judgments were 1; M = 1.1085, SD = .37506, kurtosis = 18.216, skewness = 4.042; coder B: 77.8% of judgments were 1; M = 1.3102, SD = .65622, kurtosis = 4.304, skewness = 2.203). For this reason, we adopted a nonparametric index of correlation, Spearman's rho. According to the first hypothesis, we expect to find a significant negative correlation between the Euclidean distance and the human coders' evaluations of similarity; the correlation is expected to be negative because the Euclidean distance is a measure of dissimilarity rather than similarity. Moreover, we expect this level of correlation to be indistinguishable from the level of association between the two coders.
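Because most ratings are tied at 1, a Spearman coefficient here has to handle ties. The sketch below computes rho as the Pearson correlation of average ranks, which is the standard tie-corrected definition. The distances and ratings are hypothetical and merely show the expected negative sign; the helper names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (handles ties)."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: ACASM distances and one coder's 1-5 ratings for six ECU pairs.
distances = [0.4, 0.9, 1.3, 1.8, 2.2, 2.5]
ratings = [4, 3, 1, 1, 1, 1]
print(round(spearman_rho(distances, ratings), 3))  # negative: larger distance, lower rated similarity
```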
Analysis 2. Level of Agreement Between ACASM and Human Coders' Classifications of ECUs
This analysis aims to compare the ACASM classification with classifications based on human interpretation of the semantic content of the ECUs. It is based on the same set of 70 ECUs adopted for analysis 1. The ECUs were arranged in random order, the same for all coders, so that their order of presentation would not be related to cluster membership. Three blind coders, different from those involved in the previous analysis yet similar in level and type of competence (i.e., PhD students skilled in content analysis for psychosocial research and lacking clinical expertise), were separately asked to group the 70 ECUs into 14 groups of five ECUs on the basis of their thematic similarity. We specified 14 groups in order to make the human coders' classifications directly comparable with ACASM's. Also in this case, the coders were given 2 hours of preliminary training to make the task clear to them, and were informed that the ECUs had been extracted from the verbatim transcript of a psychotherapy. No further information on the aim of the task was provided; the coders were blind to the ECUs' ACASM cluster membership. As in analysis 1, no theory-oriented semantic criterion of classification was provided to the coders, and no consensus procedure was implemented.
Finally, Cohen's κ inter-coder agreement was calculated for the four classifications (i.e., the three carried out by the coders and the one produced by ACASM); thus, we calculated six Cohen's κ values: three for the coders against each other and three for each coder against ACASM.
According to the second operative hypothesis, we expect to find that the level of ACASM–human coder agreement is at least of the same degree as the level of agreement between human coders.
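The section does not say how group labels from different classifications were matched before computing κ: a coder's "group 3" need not correspond to ACASM's cluster 3. One label-free option, assumed here purely for illustration, is to compute κ over all ECU pairs, coding each pair as "same group" (1) or "different group" (0) in each classification. The sketch shows the standard Cohen's κ formula plus this hypothetical pairwise coding; the partitions in the example are invented.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from the two raters' marginal label frequencies.
    expected = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n ** 2
    return (observed - expected) / (1 - expected)


def co_membership(partition):
    """For a partition given as one group label per ECU, return 1 for each
    pair of ECUs placed in the same group and 0 otherwise (assumed coding)."""
    return [int(g1 == g2) for g1, g2 in combinations(partition, 2)]


# Hypothetical: two classifications of six ECUs into three groups, labelled differently.
acasm = [1, 1, 2, 2, 3, 3]
coder = ["b", "b", "a", "c", "c", "a"]
print(round(cohen_kappa(co_membership(acasm), co_membership(coder)), 3))
```

With 70 ECUs this pairwise coding yields 2415 binary judgments per classification, the same pairs used in analysis 1.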
Results
Interpretation of the Thematic Contents and Their Incidence
The application of ACASM to the corpus (see Table II for descriptive statistics) produced 5548 ECUs and a list of 7258 lemmas, from which we sampled 726 lemmas following the procedure described above (see step 2 of the ACASM procedure). The Cluster Analysis (step 4 of the ACASM procedure) was therefore performed on a matrix of 5548 ECUs (rows) × 726 lemmas (columns). Cluster Analysis was able to group 5054 out of 5548 ECUs (91.095%; see Table II) and identified 14 clusters as the optimal partition. A sample of the most representative ECUs for each cluster, together with the clusters' interpretation in terms of thematic content, is provided in Table III. Table IV shows the number of ECUs grouped in each cluster. Exchange of presents (10.74%), Differences in perspectives (9.36%), Adherence to others' expectations (8.71%), Leisure (8.43%), and Tolerance of negative feelings (8.23%) are the most frequent clusters/thematic contents, while the least frequent are Own vs. others' point of view (4.45%), Experience of difficulty in the relationship (4.71%), Concerns for relational problems (5.05%), and Account of negative feelings (5.1%).
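The percentages above follow from the Table IV counts: each cluster's share is its ECU count divided by the 5054 clustered ECUs, and coverage is 5054 out of the 5548 ECUs in Table II. The sketch simply recomputes these figures from the reported counts; the variable names are illustrative.

```python
# ECU counts per cluster, as reported in Table IV.
CLUSTER_COUNTS = {
    "Own vs. others' point of view": 225,
    "Differences in perspectives": 473,
    "Concerns for relational problems": 255,
    "Exchange of presents": 543,
    "Experience of feelings in the relationship": 353,
    "Experience of difficulty in the relationship": 238,
    "Work activities": 332,
    "Account of negative feelings": 258,
    "Tolerance of negative feelings": 416,
    "Leisure": 426,
    "Adherence to others' expectations": 440,
    "Refusal of dependency": 380,
    "Attitude towards the other": 405,
    "To communicate": 310,
}
TOTAL_ECUS = 5548  # all ECUs in the corpus (Table II)

clustered = sum(CLUSTER_COUNTS.values())
coverage = 100 * clustered / TOTAL_ECUS
shares = {name: round(100 * n / clustered, 2) for name, n in CLUSTER_COUNTS.items()}

print(clustered, round(coverage, 3))
for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{pct:6.2f}%  {name}")
```

The counts sum to 5054, every recomputed share matches Table IV, and the coverage matches the reported 91.095% up to rounding.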
The frequency of the 14 clusters changes significantly across the three sub-periods (χ² = 132.684, df = 26, p < .001). Nevertheless, visual inspection of the distribution of the clusters shows that all clusters are spread across the three sub-periods, that is, each of them occurs in every sub-period (see Figure 1).
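The test above is a chi-square test of independence on a clusters × sub-periods contingency table, whose degrees of freedom are (14 − 1)(3 − 1) = 26, matching the reported df. The per-sub-period counts are not given in this section, so the sketch below runs on invented counts and only illustrates how the statistic and df are obtained. The function name is hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    of observed counts (rows = clusters, columns = sub-periods)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: 14 clusters x 3 sub-periods (A, B, C).
toy_table = [[20 + i, 25, 30 - i] for i in range(14)]
stat, dof = chi_square_independence(toy_table)
print(dof)  # 26, i.e. (14 - 1) x (3 - 1)
print(round(stat, 2))
```

For real data, SciPy's `scipy.stats.chi2_contingency` returns the same statistic together with the p-value.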
Table IV. Partition of ECUs in the clusters
Cluster/thematic content                          Number of ECUs   Percentage
1. Own vs. others' point of view                             225        4.45%
2. Differences in perspectives                               473        9.36%
3. Concerns for relational problems                          255        5.05%
4. Exchange of presents                                      543       10.74%
5. Experience of feelings in the relationship                353        6.98%
6. Experience of difficulty in the relationship              238        4.71%
7. Work activities                                           332        6.57%
8. Account of negative feelings                              258        5.10%
9. Tolerance of negative feelings                            416        8.23%
10. Leisure                                                  426        8.43%
11. Adherence to others' expectations                        440        8.71%
12. Refusal of dependency                                    380        7.52%
13. Attitude towards the other                               405        8.01%
14. To communicate                                           310        6.13%

Analysis of validity. As concerns analysis 1, ACASM's measure of similarity (the Euclidean distance) and the human coders' judgments of thematic similarity were significantly correlated in both cases (ACASM–coder A: ρ = −.125, p < .01; ACASM–coder B: ρ = −.121, p < .01). The correlation between the two coders (coder A–coder B: ρ = .162, p < .01) is of the same magnitude as the coder–ACASM correlations.
Table V shows Cohen's κ measures of inter-coder agreement for the classification of the 70 ECUs into 14 partitions (analysis 2). The six κ values are quite similar in magnitude, all lying between .338 and .427 (according to Landis & Koch, 1977, this corresponds to a fair to moderate level of agreement). The levels of agreement between human coders and between human coders and ACASM substantially overlap: the average κ for the agreement between coders is .383 (SD = .034), and the average κ for the agreement between human coders and ACASM is .378 (SD = .045). The highest κ (.427) concerns the agreement between coder 3 and ACASM.
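The summary statistics above can be recomputed from the six κ values in Table V. This sketch uses the sample standard deviation (n − 1 denominator), which matches the reported SDs, and labels each κ with the Landis and Koch (1977) benchmark bands (0.00–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, 0.81–1.00 almost perfect). The helper name is illustrative.

```python
from statistics import mean, stdev

# Cohen's kappa values from Table V.
KAPPA_CODERS = {("Coder 1", "Coder 2"): 0.400, ("Coder 1", "Coder 3"): 0.407,
                ("Coder 2", "Coder 3"): 0.344}
KAPPA_ACASM = {"Coder 1": 0.338, "Coder 2": 0.369, "Coder 3": 0.427}


def landis_koch(kappa):
    """Verbal benchmark for a kappa value (Landis & Koch, 1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


groups = {"coder vs coder": list(KAPPA_CODERS.values()),
          "coder vs ACASM": list(KAPPA_ACASM.values())}
for name, values in groups.items():
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    print(f"{name}: mean = {mean(values):.3f}, SD = {stdev(values):.3f}")
print({v: landis_koch(v) for values in groups.values() for v in values})
```

Rounded to three decimals, this gives a mean of .384 (SD .035) for the coder pairs, which the text reports truncated as .383 (SD .034), and .378 (SD .045) for the coder–ACASM pairs. The values .338, .344, .369 and .400 fall in the "fair" band; .407 and .427 fall in the "moderate" band.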
Figure 1. Distribution of the thematic contents in the three sub-periods of psychotherapy.

Table V. Cohen's κ between coders and ACASM classification
            Coder 2   Coder 3   ACASM
Coder 1      .400      .407      .338
Coder 2                .344      .369
Coder 3                          .427

Discussion
ACASM mapped the content of the transcripts in terms of 14 clusters, each of them interpretable in terms of thematic content. From a quantitative standpoint, all thematic contents prove to be specific, in the sense that every cluster encompasses only a limited portion of the therapeutic exchange (the most frequent thematic content covers about 10% of the classified text), but none is marginal (no cluster represents less than about 4.5% of the classified text). Moreover, although the overall distribution of thematic contents changes significantly over time, all thematic contents are present in a non-marginal way in all three sub-periods. Taken together, these results lead us to conclude that each of the 14 thematic contents mapped by ACASM represents a systematic semantic area of the clinical exchange analysed: a line of discourse that is present to varying degrees across sessions but runs through the whole period of treatment analysed.
Interestingly enough, the most frequent thematic contents concern the account of positive circumstances associated with the patient's experience of relational engagement (Exchange of presents, Differences in perspectives, Adherence to others' expectations, Leisure) and/or with her inner states and feelings (Tolerance of negative feelings), while the least frequent refer to negative issues, in terms of either negative feelings (Account of negative feelings) or relational disengagement (Concerns for relational problems, Experience of difficulty in the relationship, Own vs. others' point of view). Moreover, some thematic contents seem to be stable across the three sub-periods into which the period of therapy examined has been divided, in particular Differences in perspectives, Exchange of presents, Leisure, Experience of difficulty in the relationship, and Own vs. others' point of view.
Considering that the period analysed consists of the last year of a three-and-a-half-year, good-outcome therapy, this result lends itself to being interpreted as a marker of the positive evolution of the therapeutic dialogue, namely of the fact that, in the final segment of the psychotherapy, patient and therapist focused on the patient's more positive personal and relational experiences, leaving conflictual and problematic issues partially in the background. Needless to say, given the illustrative purpose of the analysis, this interpretation has to be considered in merely descriptive terms, namely as a picture of the content of the clinical dialogue between Katja and her therapist that is consistent with the good outcome of the psychotherapy.
As concerns the analysis of ACASM's validity, the findings are consistent with both of the hypotheses we tested.
Analysis 1. Evaluation of Similarity
The results of analysis 1 show that ACASM provides a measure of the similarity between units of text (in ACASM terms: ECUs) that is associated with the evaluation of thematic similarity provided by two blind coders with average experience in semantic analysis. More specifically, we found a significant negative correlation between ACASM's measure of the distance between pairs of ECUs (Euclidean distance, which grows as two ECUs become less similar) and the human coders' evaluation of thematic similarity. Hence, ACASM's way of representing the relationship of (dis)similarity among units of text tends to agree with that produced by human coders.
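To make the direction of this relationship concrete: a Euclidean distance is large when two ECUs are dissimilar, whereas the coders rate similarity, so agreement shows up as a negative rank correlation. The sketch below assumes, purely for illustration, that each ECU is represented as a binary vector of lemma presence/absence and that coders rate similarity on an ordinal 1 to 5 scale; the vectors and ratings are invented toy values, not data from this study. Spearman's ρ is computed as the Pearson correlation of average ranks, which handles tied values.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical ECUs as lemma presence/absence vectors (toy data).
ecus = {
    "A": [1, 1, 0, 0, 1],
    "B": [1, 1, 0, 0, 0],
    "C": [0, 0, 1, 1, 0],
    "D": [0, 1, 1, 1, 0],
}
# Hypothetical coder ratings of thematic similarity (1 = unrelated, 5 = same theme).
ratings = {("A", "B"): 5, ("A", "C"): 1, ("A", "D"): 2,
           ("B", "C"): 1, ("B", "D"): 2, ("C", "D"): 4}

pairs = list(combinations(sorted(ecus), 2))
distances = [euclidean(ecus[p], ecus[q]) for p, q in pairs]
scores = [ratings[(p, q)] for p, q in pairs]
rho = spearman_rho(distances, scores)
print(f"Spearman rho (distance vs. rated similarity) = {rho:.3f}")
```

In this toy case, pairs the "coder" rates as similar have small distances, so ρ comes out strongly negative; the much weaker correlations reported in this study have the same sign logic.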
The level of correlation is not high in either comparison (in absolute value, ρ = .125 and ρ = .150); yet it is similar to that between the two coders (ρ = .162). We are led to think that this rather low level of correlation depends on two convergent factors. First, the structure of the data may have played a role: as observed, the distribution of the similarity evaluations inevitably had limited variability, which inherently depresses correlation coefficients. Second, the limited agreement between the two coders reflects the data-driven, bottom-up logic of the task given to them. Each coder was asked to evaluate the thematic similarity between ECUs without being given any further indication of the criterion of similarity to be used. Therefore, the coders' low level of agreement could reflect the multidimensionality of semantic content: two utterances may be thematically similar from one point of view but different from many others. Take, for example, the following two sentences:
(1) We hope to be able to convince the readers of the utility of ACASM.
(2) We hope to be able to enjoy ourselves with ACASM.
If one considers them from the perspective that both concern a wish related to ACASM, they are quite similar; on the other hand, if one considers the content of the desire, they can be considered quite different, with (1) oriented to a third party (the readers) and (2) to the subject of the sentence; moreover, (1) concerns the scientific evaluation of ACASM, (2) its use, and so on.
Obviously, bottom-up methods of semantic analysis can be endowed with constraints that increase the level of agreement among coders, in accordance with the specific aim of the analysis. The same can be done with ACASM, for instance by working on the choice of lemmas selected for analysis. Yet, given that extending the comparison with human coders to this further level of ACASM's functioning would have required a different design, in keeping with the initial aim of this study we decided not to include these further constraints, limiting the analysis to the extent of potential agreement at the level of basic data-driven, bottom-up analysis.
Analysis 2. Classification Task
The results of analysis 2 confirm the picture provided by analysis 1 from the complementary point of view of the classification task. Here we found that the agreement between the ACASM classification and the coders' classifications is of the same extent as the agreement among human coders who are expert in content analysis for psychosocial research. At the same time, however, analysis 2 also shows that the extent of agreement among classifications, regardless of whether they are performed by human coders or by ACASM, is rather low: from fair to moderate. This double finding requires some comment.
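For readers less familiar with the index, Cohen's κ compares the observed proportion of units that two raters place in the same category, p_o, with the proportion expected by chance from each rater's marginal category frequencies, p_e: κ = (p_o - p_e)/(1 - p_e). The sketch below implements this textbook definition for two nominal classifications of the same units. The ten toy labels are invented for illustration, and the sketch assumes that the category labels of the two classifications have already been aligned to a common set of names (how clusters were matched across classifiers is not described in this section).

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two nominal classifications of the same units."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of units given the same category by both raters.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example: ten units, three categories (hypothetical labels).
coder = ["x", "x", "x", "y", "y", "y", "z", "z", "z", "z"]
acasm = ["x", "x", "y", "y", "y", "z", "z", "z", "z", "x"]
print(f"kappa = {cohens_kappa(coder, acasm):.3f}")
```

Here the two toy classifications agree on 7 of 10 units, but because 34% agreement would be expected by chance from the marginals alone, κ is noticeably lower than the raw 70%.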
Preliminarily, in order to appreciate it, one has to take into account the very large number of degrees of freedom associated with the classification task at stake. As a matter of fact, the probability of ordering 70 ECUs (n) into 14 groups (g) of five items (k) is:
$$P(n,g,k) = \left(\frac{k!\,(n-k)!}{n!}\right)^{g} = \left(\frac{5!\,(70-5)!}{70!}\right)^{14} \approx 6.91 \times 10^{-100}.$$
This infinitesimal probability of chance agreement can be considered an assessment of the difficulty of semantic classification tasks; and in the current study the classification task is a rather simple example compared with those usually addressed in semantic analysis. Thus, even if the level of agreement is not high in absolute terms, it becomes more appreciable once one takes into account that it was reached in a task involving a very high level of uncertainty.
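The value of P(n, g, k) can be checked with exact rational arithmetic. The sketch below evaluates the formula exactly as written above, using `math.comb` (Python 3.8+) for n!/(k!(n - k)!) and `fractions.Fraction` so that nothing underflows before the final conversion to a float. It reproduces this formula only and makes no claim about other ways of counting partitions.

```python
import math
from fractions import Fraction

def p_chance(n: int, g: int, k: int) -> Fraction:
    """Evaluate P(n, g, k) = (k! (n - k)! / n!) ** g exactly."""
    # k! (n - k)! / n! is the reciprocal of the binomial coefficient C(n, k).
    return Fraction(1, math.comb(n, k)) ** g

p = p_chance(n=70, g=14, k=5)
print(math.comb(70, 5))   # ways of choosing 5 ECUs out of 70
print(f"{float(p):.3e}")  # the formula's value in scientific notation
```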
Needless to say, the coders might have made some mistakes in classifying the ECUs; yet, given their level of expertise, measurement error could at best only marginally explain the modest level of Cohen's κ. Just as for the evaluation of similarity, in the classification task the partial divergence among coders also needs to be considered in the light of the multidimensionality of meaning. Texts do not hold a pre-established, fixed meaning; rather, they define the constraints within which the reader constructs the interpretation (Eco, 1979). Hence, no ECU has a single true meaning able to define normatively which is the right classification and, complementarily, to qualify all other classifications as errors. On the contrary, any unit of text is open to a multiplicity of interpretations. Consequently, the divergence among classifications that we found depends on the fact that coders may classify the ECUs according to a plurality of hermeneutic criteria, each grounded on a certain component of the meaning at stake and made pertinent by the coder's specific point of view and interpretative plan (Salvatore, 2011). In sum, the fair-to-moderate level of agreement has to be considered in the light of the inherent interpretative autonomy of the coder. Nevertheless, we recognize that our results do not allow us to exclude the alternative interpretation, namely that the fair-to-moderate level of agreement (as well as the low level of correlation shown by analysis 1) is a matter of measurement error. Further analyses are required to reach a conclusive statement on this point.
From a complementary point of view, the similarity of the levels of agreement among the three pairs of coders provides food for thought. In order to interpret this aspect of the results, one has to take into account that coders were asked to classify the ECUs in terms of commonsense (see section: Design). One can thus conclude that the convergence among coders reflects the fact that they share some implicit semantic criteria rooted in their common cultural-linguistic membership. Incidentally, this statement is not contradicted by the fact that the agreement documented by analysis 2 is only fair to moderate. This is so because commonsense guides the interpretation of texts in a variable way: according to their semantic, syntactic, and lexical characteristics, some units of text are more conventionalized (Bartlett, 1932), and thus more sensitive to the influence of commonsense, while others are less affected by this semantic attractor (Rommetveit, 1992; Valsiner, 2007). To summarize, we consider the agreement between the classifications performed by the independent coders to be the effect of the commonsense ground they share, which guides them to converge with each other. On the other hand, the intermediate extent of the agreement shows that this common ground put some constraint on the interpreters' autonomy, on the more conventionalized part of the text, but did not cancel it.
The homogeneity of the levels of agreement between the set of the three inter-coder comparisons and the set of the three coder-ACASM comparisons allows us to draw the following double conclusion. Firstly, as expected by hypothesis 2, the ACASM classification reaches a level of agreement with those carried out by human coders that is consistent with the level of agreement the coders are able to reach with each other. In the final analysis, this means that, as the Turing-like criterion requires, an external observer blind to the nature of the classifier could not distinguish among the four classifications (the three provided by coders and the one by ACASM). Hence, analysis 2 shows that ACASM satisfies the Turing-like criterion as far as the classification task is concerned. Secondly, the level of agreement between ACASM and coders is comparable to the level of agreement that human coders reach with each other on the basis of the commonsense competence they share as members of
a given cultural-linguistic community. Therefore, although this does not necessarily mean that ACASM performs the same job carried out by human coders on the basis of commonsense (i.e., computational equivalence), it means that it does a job at least quantitatively equivalent to it (i.e., functional equivalence).
Methodological Limits of the Study
Before concluding, some major limits of our study have to be underlined, in order to clarify how the results discussed above should be interpreted.
Firstly, two issues concerning the design have to be highlighted. On the one hand, the comparison between human coders and ACASM is based on a non-random sample of units of analysis: we selected the units of text in accordance with the ACASM output, sampling the most representative ECUs for each cluster defined by the automated method. We adopted this sampling strategy in order to reduce a potential source of variability and to focus the comparison on the parts of text that are most clearly and reliably interpretable from the perspective of the ACASM output. On the other hand, in order to make the human and ACASM classifications homogeneous, and therefore immediately comparable, we asked coders to classify the units of text into the same number of classes as those produced by the automated method (14). We recognize that these two choices weakened the Turing-like criterion, because they made the terms of the comparison (i.e., the ACASM output and the human coders' performance) non-independent. Thus, even though the design adopted might have improved the reliability and power of the analysis, it did so at the cost of reducing its external validity: our study leaves open the question of whether the indistinguishability between the performance of human coders and that of ACASM would have been retained had a random sample of units of text been used and no constraint been placed on the number of classes human coders could adopt.
Secondly, we compared ACASM and human coders only on the two basic functions of similarity and classification. Yet human coders perform such functions on the basis of a preliminary operation: selecting the pertinent part of the text. In order to code, human coders first have to select the relevant parts of the text, thereby defining the units of analysis to be coded. And the output of any semantic analysis evidently depends strongly on how (in terms of which criteria) pertinentization is carried out. For instance, according to the Narrative Process Coding System (NPCS; Angus, Levitt, & Hardtke, 1999; Angus & Hardtke, 1994; Angus, Hardtke, & Levitt, 1996), coders take thematic nuclei (in the terminology of the method: content areas) as the unit of analysis; once the units of analysis have been constructed, they code them in terms of narrative categories (External Narrative Process Sequences, Internal Narrative Process Sequences, and Reflexive Narrative Process Sequences). Consider also methods such as the Core Conflictual Relational Theme (CCRT; Luborsky & Crits-Christoph, 1990) and the Innovative Moments Coding System (IMCS; Gonçalves et al., 2009; Gonçalves et al., 2010), whose coding systems are applied only after the selection of the units of text considered pertinent (Narrative Episodes in CCRT; Innovative Moments in IMCS). As concerns
ACASM, it adopts a data-driven, bottom-up procedure of pertinentization, as implemented by step 1 of the method. According to this procedure, all of the text is selected, and pertinentization concerns the length of the segments of text. However, the non-selective, data-driven character of ACASM's procedure of pertinentization does not mean that it is a neutral operation. Rather, through its specific way of segmenting, ACASM constructs a particular version of the textual corpus (e.g., a partition into groups of sentences) as the object of coding: its thematic map cannot but reflect, and move within, the limits defined by such a version. Consequently, we have to conclude that the validity of our comparison between human coders and ACASM is limited to a model of a human coder adopting ACASM's version of the text as the object of coding. However, we do not consider this limitation a reason for invalidating the results of the current study. Meaning does not have its own length and place in the text: one may segment units of analysis at many levels of granularity (words, sentences, groups of sentences, as well as larger partitions of texts) and nonetheless create a version of the text that is semantically interesting. Thus, there is no preferential way of pertinentizing: any system of coding entails a definition of the units of analysis, in accordance with its aim and theoretical framework as well as with the computational requirements for implementing it.
Consequently, further studies have to verify whether the ability of ACASM to satisfy the Turing-like test within the constraints of the current study is the basis for a more general capability of ACASM to provide a thematic map that is meaningful in itself and usable for clinical purposes, also in integration with other methods. As concerns the latter point, we already have some promising evidence: Nitti, Ciavolino, Salvatore, and Gennaro (2010) applied ACASM as the first phase of a more articulated method (Discourse Flow Analysis) aimed at analysing the
way contents connect with each other within the communicational flow of the psychotherapy. In so doing, they were able to show that the way contents are related to each other changes through the psychotherapy process, and that this change is a valid marker for discriminating the clinical quality of sessions.
Conclusion
This study has presented ACASM, an automated method of data-driven, bottom-up semantic analysis, and provided a first test of its validity. The results show that ACASM produces a meaningful, systematic map of the thematic content of verbatim psychotherapy transcripts that is consistent with the one produced by expert human coders.
Needless to say, this study is just a first step in the
direction of ACASM validat