From spending to understanding: Analyzing customers by their spending behavior
Philipp E. Otto *, Greg B. Davies, Nick Chater, Henry Stott
Department of Psychology, University College London, 26 Bedford Way, London WC1H 0AP, UK
Article info
Keywords:
Lifestyles
Direct marketing
Customer relations management
Segmentation
Abstract
In customer segmentation, a common strategy is to use individual differences as a predictor of future
behavior. Recent advances in data management in large financial institutions give an unprecedented
and potentially powerful source of data for identifying such differences. We show that spending data
can substantially help target the direct marketing of financial products, and constitutes new
information, not captured by demographics. In particular, a systematic combination of this independent
source and more traditional measures can enhance the predictive power of marketing research and
improve the relationship with customers as illustrated in a direct mailing selection method which
substantially raises response rates.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
Spending in general, and especially shopping, has considerable informative potential as it carries an expression of people’s preferences. In pursuing our wishes, we display various purchase behaviors differing in sort, frequency, and variability. The improved storage and processing of transactional data by large financial institutions makes it possible to analyze these differences in detail. Existing research in this field mainly concentrates on purchasing frequency, retention, or customer loyalty (e.g., Eriksson and Vaghult, 2000; Stern and Hammond, 2004; for a critical comment see Reinartz and Kumar, 2002). In this article, a psychometric approach is adopted which examines the underlying consumption styles as differences in financial behavior. Based on a rich set of automatically processed and readily available data in personal financial services, a new differentiation method is introduced which extracts differences in financial behavior directly corresponding to observed behavioral data.
Customer segmentation is widely used in marketing, where different predictive characteristics like “attitudes”, “lifestyles”, “psychographics”, or “purchasing involvement” have been adopted (Gould, 1997; Hustad and Pessemier, 1974; Lockshin et al., 1997; Pernica, 1974; Plummer, 1974; Slama and Tashchian, 1985).¹ We focus on the understanding of the individual customer and propose different dimensions which can be used as a general-purpose tool for improving customer relations. The method proposed here differentiates between customers by using directly observed behavior. A promising concept in this context is that of psychometric factors which account for differences in financial behavior. The records of manifested behavior are analyzed to extract the underlying personal financial characteristics, which represent the main individual differences. The advantage of this direct, behaviorally based analysis is that it is independent of additionally gathered data and, thus, can supplement information on attitudes, interests, or demographic data.
In what follows, we first describe the underlying data source and the data sample employed. Second, we outline our method of behavioral analysis. Third, we report the advantages of the derived method in a direct-targeting application, followed by the concluding discussion and outlook.
2. Research background and data set
Today, behavioral data is being accumulated in large quantities by corporations and government, but is often not applied effectively. For example, in designing coupon programs, Rossi et al. (1996) have shown that the largely neglected purchase history can be highly valuable for improving the profitability of direct marketing. The importance of categorized purchases is further supported on the household level by Ainslie and Rossi (1998) as well as Bucklin and Gupta (1992). Customer information is often not processed systematically, and hence its full potential is not exploited.
In this paper, we used the data of a financial services retail institution with sophisticated records of customers’ regular
Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/jretconser
Journal of Retailing and Consumer Services
0969-6989/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jretconser.2008.04.001
* Corresponding author. Tel.: +49 3355534 2239; fax: +49 3355534 2390.
E-mail address: [email protected] (P.E. Otto).
¹ Lesser and Hughes (1986) provide a generalizability test for psychographic market segments. For an early critical overview see, for example, Wells (1975).
Journal of Retailing and Consumer Services 16 (2009) 10–18
spending behavior. This pre-recorded information was aggregated and made usable through standard statistical procedures. The proposed procedure involves low running costs, uses widely available statistical techniques (which can be applied automatically), and can serve a variety of marketing purposes.
2.1. Data description
The processed source data consisted of debit transactions made within the different payment mechanisms (Fig. 1). These data are available at an individual level on the customer information warehouse alongside other personal information such as demographics, credit scores, lifestyle variables, etc. All recorded transactions are evaluated on the basis of the British merchant Standard Industry Classification (SIC). This information allows a separation into different types of spending behaviors. The transactions are separated into 370 different debit categories. This initial data classification is completely automated, and carried out by the bank’s existing systems. The predefined categories allow an evaluation of individual spending behavior, and provide behaviorally meaningful data by enabling a characterization of individuals according to what they spend their money on, how much they spend, and how spending in the different areas is distributed over time. In the following analysis we focus on the spending frequency and the amount of money spent in the different debit categories.
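As a rough illustration of the two quantities the analysis focuses on, the following minimal sketch counts transactions (spending frequency) and sums amounts per customer and debit category. The record layout, customer identifiers, and category names are hypothetical and not taken from the bank's systems.

```python
from collections import defaultdict

# Hypothetical records: (customer_id, debit_category, amount_in_pounds).
transactions = [
    ("c1", "Supermarkets", 42.10),
    ("c1", "Supermarkets", 17.35),
    ("c1", "Petrol", 30.00),
    ("c2", "Petrol", 25.50),
]

def aggregate(records):
    """Return {customer: {category: [transaction_count, total_amount]}}."""
    summary = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
    for customer, category, amount in records:
        cell = summary[customer][category]
        cell[0] += 1        # spending frequency
        cell[1] += amount   # amount spent
    return summary

profile = aggregate(transactions)
for customer, categories in profile.items():
    for category, (count, total) in categories.items():
        print(customer, category, count, round(total, 2))
```

Keeping counts and amounts side by side mirrors the text: the counts feed the later clustering step, while the amounts feed the factor analysis.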
It is important, however, to stress the inevitably partial nature of the available data as only customer data passing through this specific bank are considered. Hence, possible transactions with other financial services providers are not captured. We addressed this problem by evaluating only customers who predominantly bank with one institution. This potentially neglects behavioral variations of people who are more flexible in the use of financial providers. A more adequate consideration of this bias is only possible when customer information is shared by different institutions (Lin et al., 2003). The chosen method of data analysis also proves to be robust against missing data (Kamakura and Wedel, 2000). The considered information is further restricted to informative transactions only. Within the recorded transactions the cash retrievals (ATM) and some of the other transactions which do not classify as specific purpose transactions are not followed up. The categorized transactions constitute 74% of the total number of transactions.
2.2. Sample description
For computational ease in the analysis, the total customer base of 20 million individuals was reduced. Initially only “active customers” were selected, where “active” is defined as those customers who have both a credit card and a debit card with the financial institution and who show at least one transaction on each within the last 3 months. About half of the customers fit this specification. From these approximately 10 million active customers, a sample of 300,000 was randomly selected. In the analysis below, we use the aggregated annual transactions in different categories; nonetheless, an examination of the daily data shown in Fig. 2 illustrates that there are also significant weekly and seasonal patterns, which are not further considered here, but which might be useful in future work.
The ages of customers in the sample range from 18 to 99 years. The age distribution, together with the amounts spent, is shown in Fig. 3. In addition, the definition of active customers influences the representativeness of the sample used.²
The selected data provide a substantial record of differences in purchasing behavior for a specific sample of 300,000 customers.
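The selection rule described above can be sketched as a filter followed by a random draw. This is a minimal sketch under stated assumptions: the field names (`has_credit_card`, `credit_tx_last_3m`, and so on) are hypothetical, and the toy customer base stands in for the bank's data.

```python
import random

def is_active(customer):
    """Active: holds both cards and used each at least once in the last 3 months."""
    return (
        customer["has_credit_card"]
        and customer["has_debit_card"]
        and customer["credit_tx_last_3m"] >= 1
        and customer["debit_tx_last_3m"] >= 1
    )

def draw_sample(customers, size, seed=0):
    """Randomly draw up to `size` active customers without replacement."""
    active = [c for c in customers if is_active(c)]
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return rng.sample(active, min(size, len(active)))

# Toy customer base: every second customer lacks recent credit card activity.
customers = [
    {"id": i, "has_credit_card": True, "has_debit_card": True,
     "credit_tx_last_3m": i % 2, "debit_tx_last_3m": 3}
    for i in range(20)
]
sample = draw_sample(customers, size=5)
print(sorted(c["id"] for c in sample))
```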
3. Behavioral differentiation
Using the data of financial services institutions allows individual differentiation on multiple purchasing events which leaves aside specific shopping characteristics such as brand switching, and focuses on more general drivers guiding the variation in overall behavior. The aim was to reduce the mass of behavioral data into a limited number of useful and manageable factors to be employed for providing a better understanding of the individual customer, which can be used in specific marketing campaigns, thereby promoting individualized services in the private financial sector.
3.1. Data aggregation
Our aim was to develop a readily automated statistical method for aggregating spending data and not to rely on intuitive
Fig. 1. Debit channel usage frequency. Debit Card 29%; Credit Card 18%; Direct Debit 18%; ATM 16%; Cheque 12%; Standing Order 4%; Counter Transaction 2%; Electronic Bill Payment 1%; Electronic Transfer 0.1%.
Fig. 2. Volatility of credit card spending. (Line chart of mean spend per day, axis from £10 to £30, March 2002 to January 2003; annotated dates: 28th March Good Friday, 25th May Bank Holiday Weekend, 21st December Last Weekend Before Christmas.)
² Generally, the sample is representative of UK adults. However, as only credit card holders with a regular spending pattern with one provider are included, parts of the total population have been left aside. Therefore, the following observations of spending behavior are restricted to these customers only and are to be interpreted within these limitations. For example, the average annual income in the sample, £38,000, is slightly above the average income of the total UK population, £34,000.
judgments about the relationship between different categories of spending. The first step consisted of deriving a suitable level of aggregation for the spending data. On the one hand, it appeared necessary that the expenditure categories were sufficiently aggregated in order to enable useful comparisons across individuals, to prevent the analysis from being swamped by noise from very small expense categories, and to make the analysis tractable. On the other hand, there needed to be a sufficient number of expense categories to ensure that spending behavior could be differentiated across individuals.
We therefore grouped the initial 370 categories into larger categories. To do this, we undertook a cluster analysis of the 370 debit categories into 32 new spending classes. Thus, similar debit categories are grouped together, forming more or less homogeneous groups of spending incidents depending on the data. To achieve a specified number of homogeneous clusters we applied the k-means method (MacQueen, 1967), which generates different solutions based on the number of clusters specified. A hierarchical clustering method was not used because a specific number of spending clusters was required. The k-means clustering analysis was based on the correlation of the number of transactions within the different categories and searches for the lowest deviations from the means. The number of transactions, rather than the amount spent, was used so that the analysis does not depend on category-specific pound values and reflects every single action.
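The idea of k-means, assigning each item to its nearest cluster mean and recomputing the means until nothing changes, can be sketched as follows. This is a plain batch (Lloyd-style) variant on toy two-dimensional profiles, not necessarily the exact procedure or software used in the study; the data points and the choice to start from the first k points are illustrative assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]

def k_means(points, k, iterations=100):
    """Batch k-means; starts from the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:  # converged: no item changed cluster
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_vector(members)
    return assignment, centroids

# Hypothetical category profiles (e.g. two summary features per debit category).
profiles = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.8], [5.2, 4.8]]
assignment, centroids = k_means(profiles, k=2)
print(assignment, centroids)
```

Because the result depends on the starting centroids, practical implementations usually try several initializations; the distances of items to their centroid, as reported in Table 1, then help to judge how stable a solution is.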
One advantage of k-means clustering is that distance information for the items and clusters becomes readily available (Table 1). This simplifies the understanding and interpretation of the cluster results. Outliers and central categories can be easily determined and explanations for discrepancies sought. In cases where the reason for the behavioral similarity is not immediately obvious, further investigation into the categories could prove useful in understanding the dependencies between the categories. For example, the grouping of Stockbrokers, Investment, Department of Social Security (DSS), and Rent initially seemed counter-intuitive. However, once we understand that the data underlying Rent relate more to commercial rent than to private rent, and that DSS largely consists of National Insurance payments on the part of small businesses, then the grouping makes much more sense, and can be taken to reflect the spending behavior of small businesses or entrepreneurs. Besides the clusters’ interpretability, the heterogeneity or stability is of empirical importance. The distance of each item from its centroid (cluster mean) and the distances between the centroids themselves are good indicators of the clusters’ stability. The clusters vary greatly and have strong overlaps with each other, often with single outliers distorting the cluster solution. The 32 spending clusters provide broader classes of spending behavior which can be applied to further analysis.
3.2. Data interpretation
For a deeper understanding of the individual differences in financial behavior, an abstraction method is needed to find the underlying differences in the purchasing characteristics. Factor analysis is a common statistical technique in psychometric tests to determine the fundamental dimensions of differences within observed data. This method was used to compress variables into a limited number of factors which account for these differences. The derived factors are orthogonal, which means that scores on each factor are uncorrelated and, in that sense, independent of each other. Each factor thus reflects a different behavioral aspect. The underlying aim here is to evaluate spending behaviors and to find the dimensions on which to differentiate between customers. The personal diagnostic factors are differentially dependent on specific behaviors and therefore describe different aspects of the overall behavior. They are seen as the underlying dimensions of behavioral variation, presumed to reflect individual differences and thereby a propensity for a specific behavior (Fishbein and Ajzen, 1975). The results of the 32 derived spending clusters provided the starting point for the factor analysis, where we considered the individual amount spent in each cluster. The correlation matrix of the spending clusters based on the amount spent in the different clusters is given in Appendix A.
It is desirable to use a small number of factors whilst explaining as much variance as possible. To determine the optimal number of factors we first generated all 32 possible factors in a principal component analysis. A measure for selecting a useful number of factors is the eigenvalue of the factors, which measures the importance of a factor by giving an estimation of the variance explained by that factor in a given data set. A common heuristic is to keep all factors with an eigenvalue of at least one (Kaiser, 1970). In our final solution, seven factors whose eigenvalues are clearly above one were selected. This limit was chosen because only strong, clearly interpretable factors are useful, and factors eight to ten, though slightly above one, were not directly interpretable (Table 2). A “scree test” (Cattell, 1966) points in the same direction. For a critical review and test of the selection of the correct number of factors, see Cattell (1965) and Hakstian et al. (1982).
In the next step the factors were rotated and made more distinct. The initial factor solution takes the variance between the
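The eigenvalue heuristic can be checked against the rounded eigenvalues reported in Table 2. The sketch below assumes, as is usual for a principal component analysis of a correlation matrix, that the total variance equals the number of input variables (32); that assumption and the variable names are ours, and the rounding of the published eigenvalues limits the precision of the result.

```python
# Rounded eigenvalues of the first ten principal components (Table 2).
eigenvalues = [3.93, 1.61, 1.24, 1.13, 1.08, 1.03, 1.01, 1.01, 1.00, 1.00]
N_VARIABLES = 32  # assumption: total variance of 32 standardized spending clusters

# Kaiser heuristic: keep components with an eigenvalue of at least one.
kaiser_retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev >= 1.0]

# Share of total variance per component, and cumulatively for the seven retained.
share = [ev / N_VARIABLES for ev in eigenvalues]
cumulative_seven = sum(share[:7])

print(kaiser_retained)
print(round(100 * cumulative_seven, 1), "% of variance in the first seven components")
```

On these rounded values the heuristic alone would admit all ten listed components, which is why the text adds interpretability and the scree test as further criteria for stopping at seven.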
Fig. 3. Age distribution of the 300,000 sample. (x-axis: age, 18 to 98; plotted series: number of customers and average £ spent per customer per year; axis from 0 to 2500.)
Table 1
Description of the derived spending clusters

| No. | Spending cluster (in order of avg. member distance from centroid) | No. of members | Debits (£ million) | RMS (%)ᵃ | Nearest cluster | PCA7ᵇ | FA7ᶜ |
|---|---|---|---|---|---|---|---|
| 1 | Catalogue Shopping | 1 | .18 | | 28 | .619 | .225 |
| 2 | Loan Repayments | 2 | 100 | 6.2 | 31 | −.072 | .418 |
| 3 | Subscriptions | 2 | 20 | 6.3 | 28 | .009 | .865 |
| 4 | Home Maintenance | 3 | 14 | 5.9 | 29 | .151 | .505 |
| 5 | Household Bills | 6 | 308 | 5.3 | 9 | .146 | .455 |
| 6 | Petrol & DIY | 3 | 385 | 6.0 | 20 | .205 | .405 |
| 7 | Children & Graduates | 3 | 6.7 | 6.3 | 29 | .681 | .593 |
| 8 | Specialist Holidays | 3 | 16 | 6.4 | 29 | .918 | .014 |
| 9 | Mortgage & Assurance | 4 | 906 | 6.4 | 5 | −.098 | .283 |
| 10 | Education | 5 | 17 | 6.2 | 29 | .114 | .229 |
| 11 | Pensions & Insurance | 5 | 44 | 6.2 | 29 | .403 | .505 |
| 12 | Leisure–Luxury | 4 | 229 | 6.6 | 22 | .009 | .587 |
| 13 | Charity | 8 | 5.6 | 6.1 | 29 | .150 | .232 |
| 14 | Television | 4 | 46 | 6.6 | 29 | −.011 | .391 |
| 15 | Retail–Other | 6 | 17 | 6.3 | 28 | .185 | .302 |
| 16 | Health | 6 | 47 | 6.3 | 29 | .057 | .216 |
| 17 | Services–Financial | 7 | 276 | 6.2 | 31 | .432 | .202 |
| 18 | Retail–Food & Drink | 6 | 38 | 6.4 | 29 | −.006 | .102 |
| 19 | Services–Commercial | 6 | 13 | 6.4 | 29 | .555 | .525 |
| 20 | Retail–General | 11 | 247 | 6.2 | 24 | .052 | .409 |
| 21 | Services–Other | 9 | 26 | 6.2 | 29 | −.079 | .167 |
| 22 | Leisure–Creative | 14 | 116 | 6.1 | 26 | .023 | .363 |
| 23 | International Travel | 4 | 167 | 6.8 | 30 | .079 | .291 |
| 24 | Retail–Clothing & Home | 13 | 926 | 6.3 | 20 | .092 | .486 |
| 25 | Car purchase & Running Costs | 8 | 99 | 6.4 | 29 | −.062 | .158 |
| 26 | Leisure–Intellectual | 12 | 2 | 6.3 | 22 | .558 | .021 |
| 27 | Leisure–Sports | 10 | 27 | 6.3 | 29 | .112 | .145 |
| 28 | Services–Professional | 5 | 80 | 6.8 | 31 | .005 | .873 |
| 29 | Investment & Self-Employed | 4 | 93 | 7.0 | 31 | .572 | .063 |
| 30 | Travel & Cash | 8 | 112 | 6.8 | 31 | .065 | .454 |
| 31 | Payment Cards | 8 | 519 | 6.8 | 29 | .013 | .362 |
| 32 | Career Specific | 10 | 64 | 6.7 | 29 | 1.398 | .042 |

This 32 k-means cluster solution is based on the transactions in the 370 debit categories.
ᵃ RMS describes the root mean square of the cluster standard deviation across the categories.
ᵇ PCA7 are the estimated communalities, as the sum of squared loadings across factors, for the seven first principal components of the initial factor solution.
ᶜ FA7 are the estimated communalities, as the sum of squared loadings across factors, for the seven orthogonal factors of the equamax rotated factor solution.
Table 2
Initial factor solution

| | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 |
|---|---|---|---|---|---|
| Eigenvalue | 3.93 | 1.61 | 1.24 | 1.13 | 1.08 |
| Highest 1 | Retail–Clothing & Home .68 | Household Bills .20 | Television .50 | Financial Services .42 | Loan Repayments .26 |
| Highest 2 | Leisure–Luxury .66 | Mortgage & Assurance .17 | Loan Repayments .50 | Leisure–Intellectual .35 | Home Maintenance .20 |
| Highest 3 | Retail–General .59 | Leisure–Luxury .16 | Household Bills .40 | Services–Commercial .34 | Retail–Other .20 |
| Highest 4 | Leisure–Creative .58 | International Travel .15 | Mortgage & Assurance .27 | Travel & Cash .23 | Services–Commercial .19 |
| Highest 5 | Payment Cards .57 | Retail–Clothing & Home .14 | Financial Services .21 | Payment Cards .21 | Leisure–Intellectual .19 |
| Highest 6 | Petrol & DIY .46 | Car purchase & Run. Costs .12 | Petrol & DIY .17 | Investment & Self-empl. .16 | Petrol & DIY .15 |
| Highest 7 | International Travel .44 | Travel & Cash .12 | Catalogue Shopping .13 | Loan Repayments .16 | Financial Services .14 |
| Lowest 3 | Career Specific .06 | Retail–General −.13 | International Travel −.20 | Retail–Clothing & Home −.23 | Household Bills −.32 |
| Lowest 2 | Children & Graduates .06 | Subscriptions −.81 | Leisure–Luxury −.32 | Home Maintenance −.38 | Charity −.38 |
| Lowest 1 | Catalogue Shopping .01 | Services–Professional −.82 | Travel & Cash −.41 | Petrol & DIY −.39 | Pensions & Insurance −.63 |

| | Factor 6 | Factor 7 | Factor 8 | Factor 9 | Factor 10 |
|---|---|---|---|---|---|
| Eigenvalue | 1.03 | 1.01 | 1.01 | 1.00 | 1.00 |
| Highest 1 | Home Maintenance .50 | Career Specific .73 | Services–Commercial .45 | Children & Graduates .55 | Catalogue Shopping .56 |
| Highest 2 | Services–Other .34 | Specialist Holidays .35 | Retail–Other .44 | Education .46 | Investment & Self-empl. .38 |
| Highest 3 | Financial Services .28 | Investment & Self-empl. .24 | Catalogue Shopping .43 | Career Specific .30 | Leisure–Intellectual .15 |
| Highest 4 | Leisure–Intellectual .25 | Financial Services .19 | Career Specific .12 | Home Maintenance .20 | Home Maintenance .13 |
| Highest 5 | Services–Commercial .24 | Leisure–Intellectual .18 | Charity .11 | Mortgage & Assurance .14 | Career Specific .12 |
| Highest 6 | Investment & Self-empl. .22 | Children & Graduates .13 | Retail–Food & Drink .11 | Services–Other .13 | Services–Other .11 |
| Highest 7 | Pensions & Insurance .18 | Services–Commercial .12 | Services–Other .11 | Charity .08 | Television .08 |
| Lowest 3 | Children & Graduates −.24 | Education −.17 | Children & Graduates −.19 | Pensions & Insurance −.12 | Services–Commercial −.25 |
| Lowest 2 | Television −.25 | Mortgage & Assurance −.17 | Investment & Self-empl. −.32 | Leisure–Intellectual −.14 | Children & Graduates −.42 |
| Lowest 1 | Catalogue Shopping −.26 | Loan Repayments −.23 | Leisure–Intellectual −.39 | Specialist Holidays −.47 | Specialist Holidays −.43 |

The 10 principal components with their variance explained measured by their eigenvalue. For each factor the seven highest and the three lowest standardized factor loadings of the spending classes are shown.
input variables into account and not the differences between the factors themselves. In order to derive comparable factors, which explain a higher proportion of variance, a factor rotation method has to be used. We wanted to have more than one explanatory factor, where the factors themselves are highly distinct according to the input variables; therefore an equamax rotation was applied (Landahl, 1938), which is a standard optimization method of orthogonally rotating the factors according to the data fit. Through this process the factors’ differences in explained variance are decreased and we obtain high factor loadings for only a few variables on each factor, rendering the factors more distinct from each other and making them directly interpretable. The estimated communalities of the different spending clusters, defined as the sum of squared loadings across factors, are shown in Table 1 for the principal component analysis with seven factors and for the rotated factor analysis. As can be expected, through the rotation higher communalities are obtained. In general, the higher the factor loading of the spending cluster, the more important that specific variable is for that factor. The loadings of the spending clusters determine the factor and are used for the factor interpretation. The shaded standardized factor loadings in Table 3 show the categories that were most important for the factor interpretation. The first factor, for example, is highly dependent on the debit categories Leisure–Luxury, Travel & Cash, International Travel, and Payment Cards and is therefore called LEISURE & TRAVEL. All the factors received labels as they appear to capture specific behavioral characteristics, though these labels are subjective interpretations. Together the factors describe a substantial amount of the variance in the underlying data, with the first two as the main dividers (see Table 3).
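The definition of a communality used above, the sum of squared loadings of one spending cluster across the retained factors, is easy to state in code. In this minimal sketch only the value .93 is taken from the paper (the loading of Subscriptions on the second rotated factor in Table 3); the full seven-element loading vector is hypothetical, because the paper reports only the largest and smallest loadings per factor.

```python
def communality(loadings):
    """Sum of squared factor loadings of one variable across factors."""
    return sum(loading ** 2 for loading in loadings)

# Contribution of the single dominant loading reported for Subscriptions.
dominant_only = communality([0.93])

# Hypothetical full loading vector: small loadings on the other six factors.
hypothetical = communality([0.05, 0.93, -0.02, 0.01, 0.03, 0.00, -0.04])

print(round(dominant_only, 4), round(hypothetical, 4))
```

The dominant loading alone contributes about .865, which appears consistent with the FA7 communality listed for Subscriptions in Table 1.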
The factor analysis, which was used to find regularities in the individual differences, revealed the underlying dimensions of buying behavior. The seven generated factors systematically represent various characteristics of spending behavior and therefore reflect seven dimensions of personal financial differences. As the factors describe different aspects of the individual behavior, they can be used to differentiate customers on these dimensions. Every customer can be assigned an individual score on each factor by multiplying their percentage of the amount spent in each of the derived spending clusters by that cluster’s loading on the factor. Summed over all spending clusters, these products yield the factor score. The factor score stands for the degree of a specific behavioral aspect (described by that factor) which can be attributed to that individual or group of individuals. For example, the individual factor score for factor one (LEISURE & TRAVEL) is highly dependent on a person’s spending on leisure goods and travel; thus people with high spending here receive a high score. By contrast, people who instead spend their money on loan repayments and home maintenance are described by a low or negative score on this factor.
The seven spending dimensions can be applied in a multitude of ways. One possibility is to segment the customer base according to the specific purchasing likelihood. An example for loan products is given in the next section. In principle, however, the described method could serve any marketing strategy where individual spending differences are of importance and correlate with the behavior of interest.
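The scoring rule just described can be sketched as a weighted sum. The loadings below are the Factor 1 values reported in Table 3 for six spending clusters; clusters whose loadings are not reported are treated as zero, spending shares are expressed as fractions rather than percentages, and both customers are hypothetical. All three choices are assumptions made for illustration.

```python
# Factor 1 (Leisure & Travel) loadings as reported in Table 3 (partial list).
factor1_loadings = {
    "Leisure–Luxury": 0.68,
    "Travel & Cash": 0.65,
    "International Travel": 0.51,
    "Payment Cards": 0.41,
    "Loan Repayments": -0.10,
    "Home Maintenance": -0.18,
}

def factor_score(spend_by_cluster, loadings):
    """Sum over clusters of (share of total spend) x (factor loading)."""
    total = sum(spend_by_cluster.values())
    return sum(
        (amount / total) * loadings.get(cluster, 0.0)  # unreported loadings count as 0
        for cluster, amount in spend_by_cluster.items()
    )

# Two hypothetical customers with annual spend in pounds per cluster.
traveller = {"Leisure–Luxury": 500, "Travel & Cash": 300, "International Travel": 200}
homeowner = {"Loan Repayments": 600, "Home Maintenance": 400}

print(round(factor_score(traveller, factor1_loadings), 3))
print(round(factor_score(homeowner, factor1_loadings), 3))
```

The first customer receives a clearly positive score and the second a negative one, matching the verbal description of the factor.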
4. Insights and usefulness
The main question then is what the dimensions of spending behavior tell us besides the already known and frequently used personal characteristics like demographic information or “lifestyle variables”. What additional explanatory value do they provide and, perhaps more importantly, how can these insights be used in customer relation management (CRM) or marketing in general?
Table 3
Equamax rotated factor solution

| | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 | Factor 6 | Factor 7 |
|---|---|---|---|---|---|---|---|
| Label | Leisure & Travel | General | Maintenance | Regular Payments | Risk & Social | Service Orientation | Future Orientation |
| Eigenvalue | 2.08 | 2.07 | 1.60 | 1.50 | 1.35 | 1.17 | 1.12 |
| Highest 1 | Leisure–Luxury .68 | Services–Professional .93 | Home Maintenance .68 | Television .62 | Pensions & Insurance .69 | Services–Commercial .72 | Children & Graduates .75 |
| Highest 2 | Travel & Cash .65 | Subscriptions .93 | Petrol & DIY .59 | Loan Repayments .60 | Household Bills .46 | Retail–Other .53 | Education .43 |
| Highest 3 | International Travel .51 | Retail–General .38 | Retail–Clothing & Home .44 | Household Bills .44 | Charity .45 | Financial Services .39 | Mortgage & Assurance .22 |
| Highest 4 | Payment Cards .41 | Leisure–Creative .21 | Retail–General .37 | Mortgage & Assurance .41 | Health .37 | Services–Other .24 | Payment Cards .16 |
| Highest 5 | Retail–Clothing & Home .40 | Retail–Clothing & Home .19 | Leisure–Creative .34 | Car purchase & Run. Costs .27 | Education .19 | Payment Cards .19 | Career Specific .15 |
| Highest 6 | Leisure–Creative .39 | Payment Cards .17 | Services–Other .32 | Payment Cards .24 | Retail–Clothing & Home .18 | Travel & Cash .13 | Household Bills .13 |
| Highest 7 | Leisure–Sports .35 | Leisure–Luxury .16 | Mortgage & Assurance .18 | Catalogue Shopping .22 | Leisure–Luxury .17 | Leisure–Luxury .12 | Retail–Clothing & Home .12 |
| Lowest 3 | Career Specific −.07 | Specialist Holidays .00 | Pensions & Insurance −.08 | Home Maintenance −.10 | Leisure–Intellectual −.10 | Children & Graduates −.03 | Specialist Holidays −.08 |
| Lowest 2 | Loan Repayments −.10 | Financial Services .00 | Children & Graduates −.09 | Career Specific −.11 | Children & Graduates −.15 | Car purchase & Run. Costs −.06 | Pensions & Insurance −.09 |
| Lowest 1 | Home Maintenance −.18 | Career Specific −.01 | Catalogue Shopping −.09 | Charity −.12 | Loan Repayments −.20 | Investment & Self-Employed −.19 | Catalogue Shopping −.39 |

The seven derived financial personality factors with their assigned naming and the variance explained measured by their eigenvalue. For each factor the seven highest and the three lowest standardized factor loadings of the spending classes are shown, representing the weight of this variable for the respective factor.
In a first step it has to be shown that the seven spending dimensions do not simply align with demographic information, which is usually applied in the domain of targeting or individualized services. Table 4 shows that this is not the case, and that the Pearson correlation with standard demographic measures like sex and age is in general relatively low, although substantial correlations exist for some product usages. Therefore, we conclude that additional information is provided by this type of spending analysis, allowing us to better differentiate between customers. Although the relation between the different personal variables needs further investigation, one obvious advantage of the present approach is that it outputs continuous variables, rather than distinct categories, and these variables can then be flexibly integrated with other statistical information for a wide range of purposes.
To illustrate how to apply these factors, we provide an example. The factors can be used to optimize the targeting
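The entries of Table 4 are Pearson correlations between factor scores and other customer variables. As a reminder of how such a coefficient is computed, here is a minimal sketch on hypothetical data; the ages and factor scores are invented for illustration and do not reproduce any value in the table.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)

# Hypothetical customers: age and score on one debit factor.
ages = [23, 35, 47, 58, 66, 71]
factor_scores = [0.02, 0.10, 0.05, 0.15, 0.12, 0.20]

r = pearson(ages, factor_scores)
print(round(r, 2))
```

A coefficient near zero indicates that the factor carries information not captured by the demographic variable, which is the argument made for most cells of Table 4.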
Fig. 4. Loan holdings for debit factors 4 and 5. (Two panels: Factor 4 (Regulars), factor scores from 0.0 to 0.6, and Factor 5 (Risk & Social), factor scores from −0.2 to 0.3; y-axis: percentage of customers; series: personal loan and no personal loan.)
Table 4
Debit factor correlation

| | Factor 1: Leisure & Travel | Factor 2: General | Factor 3: Maintenance | Factor 4: Regulars | Factor 5: Risk & Social | Factor 6: Service Orientation | Factor 7: Future Orientation |
|---|---|---|---|---|---|---|---|
| General demographics | | | | | | | |
| Age | −.07 | .01 | .08 | .04 | .37 | −.18 | −.14 |
| Sexᵃ | −.11 | −.12 | −.12 | .05 | −.08 | .03 | .02 |
| Spendingᵇ | .01 | −.12 | −.18 | −.04 | −.09 | .15 | .17 |
| Debitsᶜ | .06 | −.07 | .06 | −.02 | −.05 | .07 | .15 |
| Product usage | | | | | | | |
| Credit Card | .22 | .09 | .04 | −.20 | −.06 | .07 | .00 |
| Debit Card | .09 | −.02 | −.01 | −.10 | −.12 | .09 | .01 |
| Direct Debit | −.11 | −.19 | −.24 | .09 | −.04 | .12 | .22 |
| Overdraft | .06 | −.01 | .02 | −.02 | .03 | −.01 | .05 |
| Loan | −.11 | −.06 | −.11 | .15 | −.17 | .05 | −.07 |
| Pension | −.04 | −.04 | −.02 | .03 | −.04 | .00 | .04 |
| Saving Online | .02 | .00 | −.05 | −.03 | −.03 | .05 | .02 |
| Saving General | .01 | .00 | −.03 | −.06 | .00 | .00 | .01 |
| Funds | .00 | .00 | .00 | −.03 | .05 | −.04 | −.02 |
| Mortgage | −.07 | −.09 | −.06 | .12 | −.01 | −.01 | .15 |

The usage of the products is captured by the number of entries representing the holding of the product in the cases of “Loan”, “Pension”, “Saving Online”, “Saving General”, “Funds”, and “Mortgage”. “Credit Card”, “Debit Card”, “Direct Debit”, and “Overdraft” usages are described in terms of amounts.
ᵃ The gender is coded 0 for female and 1 for male.
ᵇ “Spending” is the total amount spent in the last year.
ᶜ “Debits” is the total number of outgoing transactions in the last year.
Fig. 5. Response rates for standard and debit factor model. (Standard mail sample, 203K: response rate 0.196%, 398 individuals. Overlap, 20K: response rate 0.341%, 68 individuals. Debit factor mail sample, 203K: response rate shown as “?”.)
method for products with a high factor correlation. A simple method to improve the likelihood of a specific behavior is to use the expenditure database for a cut-off-based segmentation. Those debit factors are used which best distinguish customers with respect to the criterion of interest. In the case of loan holdings, factors four and five are the most predictive. Fig. 4 shows the distribution of loan holdings for the factor values of these two factors for the 300,000 sample. If a specific number of customers is desired, the cut-off can be set accordingly. The differentiation value of the two factors is visualized by an exemplary cut-off. Initially, about half of the customers, namely those who score highly on the fourth factor (F4 ≥ .22), are selected. Subsequently this number of customers is further decreased according to their score on the fifth factor (F5 ≤ .13). The two factors were selected according to their high correlation with the targeted behavior. Restricting the customer base with regard to these two factor scores raises the share of customers holding a loan from 1% to 9%. The first selection criterion leaves 166,000 customers, 3% of whom hold a loan. The final selection results in 45,000 customers, of whom 9% hold a loan.
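The two-step cut-off can be sketched as two successive filters. The thresholds are the ones quoted in the text (factor four at least .22, factor five at most .13); the customer records and field names are hypothetical. The last two lines simply multiply out the group sizes and rounded percentages given above, so the resulting head counts are approximations implied by those figures rather than reported values.

```python
F4_MIN = 0.22  # keep customers scoring at least this on factor 4 (Regulars)
F5_MAX = 0.13  # then keep those scoring at most this on factor 5 (Risk & Social)

def select(customers):
    """Hierarchical cut-off selection: factor 4 first, then factor 5."""
    step1 = [c for c in customers if c["f4"] >= F4_MIN]
    step2 = [c for c in step1 if c["f5"] <= F5_MAX]
    return step1, step2

def loan_rate(group):
    """Share of customers in the group who hold a loan."""
    return sum(c["has_loan"] for c in group) / len(group)

# Hypothetical customers with factor scores and loan indicator.
customers = [
    {"id": 1, "f4": 0.30, "f5": 0.05, "has_loan": True},
    {"id": 2, "f4": 0.25, "f5": 0.20, "has_loan": False},
    {"id": 3, "f4": 0.10, "f5": 0.05, "has_loan": False},
    {"id": 4, "f4": 0.22, "f5": 0.13, "has_loan": False},
    {"id": 5, "f4": 0.40, "f5": -0.10, "has_loan": True},
]
step1, step2 = select(customers)
print([c["id"] for c in step1], [c["id"] for c in step2], round(loan_rate(step2), 2))

# Head counts implied by the rounded figures in the text.
holders_after_step1 = round(166_000 * 0.03)
holders_after_step2 = round(45_000 * 0.09)
print(holders_after_step1, holders_after_step2)
```

If a fixed mailing size is required, the thresholds can instead be chosen as quantiles of the factor score distributions.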
This straightforward hierarchical selection method has been used in a first implementation of the debit factors in direct mailing, both to improve the mailshot selection and to optimize the mailshot size for acquiring new loan customers. In a sample-independent test, adding the debit factors nearly doubled the response rate relative to a selection based only on demographic and lifestyle data, from 0.196% to 0.341% (Fig. 5; see footnote 3). This substantial uplift in the response rate is assumed to generalize to the whole debit factor mail sample. Alternatively, the debit factors can be used to optimize the size of the standard mail sample.
In both cases all available customer information is used in a logistic regression to select the most responsive mailing sample. This is the standard procedure for model building in financial services. Both selection models share financial behavior, product holdings, risk scores, and household information as predictors. They are derived in the same way and trained on past mailings. All data preceding the mailing are used, but only the alternative model includes the debit factors. The uplift in the response rate from this additional information is substantial: achieving the same number of responses with the standard model would mean doubling the mailing size, which would add a cost of approximately £100K (assuming £0.50 per mail). Thus, on economic grounds alone the debit factors achieve a considerable gain, in addition to reducing the “annoyance of the customer” caused by additional mail.
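The cost argument can be traced with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the standard model's response rate would stay constant if its mailing were enlarged, which the text does not state.

```python
def required_mailing_factor(rate_standard: float, rate_improved: float) -> float:
    """How many times more mail the standard model needs for the same responses.

    Assumes the standard model's response rate does not change with mailing size.
    """
    return rate_improved / rate_standard


def extra_mailing_cost(mail_size: int, factor: float, cost_per_mail: float) -> float:
    """Cost of the additional mail implied by scaling the mailing by `factor`."""
    return mail_size * (factor - 1.0) * cost_per_mail


# Response rates reported for the standard and the debit factor selection.
observed = required_mailing_factor(0.00196, 0.00341)
print(f"observed uplift factor: {observed:.2f}")

# Rounding "nearly doubled" up to a factor of 2, as the text does,
# for a 203K mailing at 0.50 GBP per mail:
print(f"extra cost at factor 2: GBP {extra_mailing_cost(203_000, 2.0, 0.50):,.0f}")
```

With a factor of 2 the extra cost is 203,000 × £0.50 = £101,500, which matches the approximate £100K quoted in the text.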
This illustrates how the factors can be used to substantially improve the effectiveness of a direct marketing campaign. The method outlined is consistent with the idea of one-reason decision making (Gigerenzer and Goldstein, 1999), but it leaves room for improvement and only exemplifies how the debit factors can be used. Further work might investigate the temporal as well as interregional stability of the factors. Other methodological issues and the various advances in the field of segmentation have also not been investigated in full detail (for a summary see Kamakura, 2002; Wind, 1978). Therefore, the real value of the factor-based differentiations of spending behavior described here and the full range of potential applications remain to be explored. The main result is that a two-step approach, in which the first step is a systematic understanding of customer behavior and the second step applies this understanding to the specific practical problem at hand (e.g., direct mail), can substantially change and improve the efficacy of CRM in service industries, although long-term effects resulting from a better understanding of the customers’ needs could be the more prominent.
Similar results could be achieved for other customer segments. The regression results for saving products, shown in Table 5, indicate that the factors also have predictive value in the saving domain. The debit factors add information beyond demographic information and might be worth considering for a variety of behavioral differences.
Table 5
Segmentation for saving products

                               Pension                      Mortgage                     General saving
Predictor                      Estimate (SE)    t-Value     Estimate (SE)    t-Value     Estimate (SE)    t-Value
Factor 1: Leisure & Travel     -27.64 (7.97)    -3.47**     -5474 (825)      -6.64***    988.5 (252)      3.91***
Factor 2: General              -8.79 (11.20)    -.78        -9154 (1159)     -7.90***    25.4 (355.5)     .70
Factor 3: Maintenance          -23.90 (8.21)    -2.91*      5377 (850)       6.33***     -821.4 (26.8)    -3.15**
Factor 4: Regulars             -4.76 (1.89)     -.44        -25275 (1127)    -22.43***   -2484 (345.7)    -7.19***
Factor 5: Risk & Social        -58.86 (1.64)    -5.53***    2153 (1101)      1.95        812.3 (337.9)    2.40
Factor 6: Service Orientation  -9.09 (13.86)    -6.50***    22604 (1434)     15.76***    25.60 (439.9)    .06
Factor 7: Future Orientation   234.2 (2.99)     11.15***    -62573 (2173)    -28.8***    885.6 (666.5)    1.33
Age                            -.345 (.0395)    -8.74***    37.41 (4.09)     9.15***     15.91 (1.25)     12.68***
Sex a                          21.37 (1.07)     2.03***     -539.2 (11.4)    -4.88***    -19.22 (33.88)   -.57
Spending b                     .001 (.00003)    21.53***    -.33 (.003)      -103.52***  .013 (.0098)     13.06***
Debits c                       .013 (.00369)    3.57***     6.31 (.382)      16.50***    -.634 (.117)     -5.40***
Intercept                      23.63 (4.78)     4.94        10777 (495)      21.77       -172.3 (151.9)   -1.13

Multiple regression of the different saving products on amounts (negative amounts in the case of mortgages). Values in parentheses are standard errors of the respective parameter estimates.
* p < .01; ** p < .001; *** p < .0001
a The gender is coded 0 for female and 1 for male.
b “Spending” is the total amount spent in the last year.
c “Debits” is the total number of outgoing transactions in the last year.
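As a reading aid for Table 5, each t-value is the parameter estimate divided by its standard error, and the stars follow the thresholds given in the table note. The sketch below recomputes a few printed entries. The function names are hypothetical, and the two-sided p-value uses a standard normal approximation, which is an assumption made here for the large sample rather than something the text specifies.

```python
import math


def t_value(estimate: float, std_error: float) -> float:
    """t-statistic of a regression coefficient."""
    return estimate / std_error


def two_sided_p(t: float) -> float:
    """Two-sided p-value under a standard normal approximation (assumption)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def stars(t: float) -> str:
    """Significance stars using the thresholds from the Table 5 note."""
    p = two_sided_p(t)
    if p < 0.0001:
        return "***"
    if p < 0.001:
        return "**"
    if p < 0.01:
        return "*"
    return ""


# (estimate, standard error) pairs as printed in Table 5.
examples = {
    "Factor 1, Pension": (-27.64, 7.97),
    "Factor 3, Pension": (-23.90, 8.21),
    "Factor 1, General saving": (988.5, 252.0),
    "Factor 5, General saving": (812.3, 337.9),
}
for name, (est, se) in examples.items():
    t = t_value(est, se)
    print(f"{name}: t = {t:.2f}{stars(t)}")
```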
3 The alternative direct mail selection method takes the debit factors into account in addition to the existing predictors, with spending averaged over the past year. This information is used for the next month’s direct mailing. All customers approached are new customers who do not hold a loan with the provider. Therefore, existing loan repayments are not used as an indicator of a possible inclination to purchase a loan. The response rate is the percentage of people who purchase a loan within 2 months after the mailshot.
Table A1
Spearman correlations between spending clusters (lower triangle; column numbers refer to the numbered spending clusters in the rows)

   Spending cluster                1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
 1 Catalogue Shopping
 2 Loan Repayments               .15
 3 Subscriptions                 .06  .04
 4 Home Maintenance              .05  .11  .07
 5 Household Bills               .18  .30  .08  .18
 6 Petrol & DIY                  .18  .37  .07  .22  .48
 7 Children & Graduates          .05  .09  .02  .07  .12  .15
 8 Specialist Holidays           .21  .02  .02 -.06  .02 -.02  .00
 9 Mortgage & Assurance          .11  .20  .04  .15  .36  .34  .10  .02
10 Education                     .08  .05 -.02  .02  .08  .05  .01 -.02  .02
11 Pensions & Insurance          .08  .03  .03  .07  .07  .07  .03  .03  .05  .03
12 Leisure–Luxury                .15  .18  .03  .11  .23  .27  .09  .07  .19  .03  .05
13 Charity                       .25  .21  .05  .11  .24  .32  .09  .11  .21  .09  .06  .26
14 Television                    .16  .21  .05  .18  .30  .32  .14  .04  .23  .03  .07  .26  .26
15 Retail–Other                  .18  .27  .06  .12  .54  .40  .13  .05  .27  .06  .05  .25  .24  .36
16 Health                        .08  .14  .04  .06  .15  .13  .04  .02  .08  .02  .03  .07  .07  .09  .15
17 Financial Services            .13  .20  .03  .09  .28  .35  .09  .03  .22  .03  .04  .16  .19  .18  .23
18 Retail–Food & Drink           .14  .14  .06  .10  .19  .22  .07  .02  .11  .01  .10  .14  .15  .16  .18
19 Services–Commercial           .16  .08  .04  .05  .10  .10  .02  .10  .06  .08  .06  .09  .14  .09  .10
20 Retail–General                .12  .25  .05  .15  .26  .44  .12 -.01  .25  .03  .04  .24  .28  .21  .24
21 Services–Other                .23  .14  .10  .17  .28  .23  .07  .02  .11  .07  .16  .11  .20  .18  .20
22 Leisure–Creative              .04  .04  .00  .03  .03  .06  .02 -.01  .03  .08  .02  .03  .08  .05  .03
23 International Travel          .13  .27  .04  .14  .35  .51  .14 -.05  .22  .04  .09  .21  .25  .25  .32
24 Retail–Clothing & Home        .35  .22  .08  .08  .30  .27  .08  .16  .13  .04  .11  .18  .26  .23  .29
25 Car Purchase & Running Costs  .18  .34  .08  .22  .52  .55  .14  .01  .32  .06  .07  .24  .26  .34  .45
26 Leisure–Intellectual          .12  .22  .04  .15  .31  .34  .11  .01  .23  .02  .05  .21  .22  .25  .28
27 Leisure–Sports                .05  .10  .02  .05  .16  .14  .05 -.01  .09  .01  .03  .08  .07  .11  .18
28 Services–Professional         .06  .06  .05  .04  .04  .10  .01  .03  .05  .00  .00  .06  .10  .04  .03
29 Investment & Self-Employed    .12  .18  .04  .13  .30  .31  .12  .03  .21  .03  .05  .17  .18  .26  .27
30 Travel & Cash                 .09  .16  .05  .15  .33  .28  .09 -.05  .18  .03  .09  .14  .14  .19  .24
31 Payment Cards                 .21  .28  .09  .11  .47  .41  .11  .06  .22  .06  .08  .23  .23  .28  .45
32 Career Specific               .08  .09  .08  .21  .11  .17  .06 -.03  .09  .04  .09  .07  .12  .18  .10

   Spending cluster               16   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31
17 Financial Services            .08
18 Retail–Food & Drink           .07  .12
19 Services–Commercial           .03  .07  .07
20 Retail–General                .08  .22  .15  .09
21 Services–Other                .12  .13  .21  .14  .11
22 Leisure–Creative              .00  .03  .03  .03  .05  .05
23 International Travel          .10  .24  .21  .07  .41  .20  .06
24 Retail–Clothing & Home        .10  .18  .24  .18  .16  .43  .05  .24
25 Car Purchase & Running Costs  .15  .28  .20  .09  .32  .24  .04  .38  .27
26 Leisure–Intellectual          .08  .19  .15  .07  .24  .14  .05  .25  .19  .32
27 Leisure–Sports                .06  .09  .08  .02  .07  .11  .02  .11  .13  .15  .14
28 Services–Professional         .01  .05  .03  .03  .10  .01  .01  .05  .03  .07  .05 -.01
29 Investment & Self-Employed    .07  .18  .12  .06  .19  .13  .03  .22  .16  .31  .21  .09  .03
30 Travel & Cash                 .08  .14  .15  .06  .17  .22  .03  .24  .18  .29  .18  .10  .02  .16
31 Payment Cards                 .18  .24  .29  .09  .19  .28  .04  .31  .38  .42  .28  .24  .03  .22  .25
32 Career Specific               .08  .08  .13  .05  .10  .22  .05  .12  .13  .18  .11  .06  .01  .10  .13  .15
5. Implications
Transaction data is being gathered ever more rapidly and stored in sophisticated data warehouses at great cost. The present work aims to show how simple and readily automated methods can be applied to such large-scale data, adding significant commercial value both by increasing commercial understanding and by supporting practical applications. We expect that this and similar methods of analysis for large behavioral databases will become increasingly important.
We believe that it will be crucial to take careful account of the individual customer in this type of work. On the one hand, data protection and information control have been raised as issues for public policy and legislation (Goodwin, 1991; Milne, 2000; Phelps et al., 2000). On the other hand, the role and potential of personal data in CRM is substantial (Godin, 1999; Milne and Boza, 1999). Only if behavioral data is used to the customer’s benefit can real improvement of data-based customer services be achieved; and only then can the storage of such data be readily justified both to regulators and to customers themselves.
Transactional data can be seen as a valuable source for improving customer understanding in addition to existing customer evaluation methods (e.g., Gould, 1997). As demonstrated, transactional data can not only be easily transformed into useful information for marketing purposes, but can also help to build psychological models that provide a better understanding of the customer in general. This can be seen as a method of systematically putting an understanding of the customer first, using data drawn from their own behavior, thus emphasizing the key moment for building and maintaining useful customer warehouses. By using dimensions rather than segments, we want to promote a development toward individual-specific relations that enable services relating directly to individuals and their demands. The method outlined shows that a better understanding of specific behavioral aspects helps to focus and improve marketing initiatives in concrete settings. Aggregating behavior and deriving generalizations can therefore be seen as a valuable source for improving CRM.
The new technological possibilities demand a new way of thinking and new ways of marketing, which go hand in hand with the improvement of analytical and statistical methods. Only on the basis of a fundamental understanding of the accessible data, and with adequate methods at hand, can we provide reliable resources for coping with the changing demands in personal services and finally reach the land beyond targeting alone: the delivery of products and information that is personalized for each customer.
6. Appendix A
Spending cluster Spearman correlation (see Table A1).
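For readers who want to reproduce this kind of matrix on their own data, a Spearman correlation is the Pearson correlation of the rank-transformed values. The sketch below is a minimal pure-Python version with average ranks for ties. The function names and the toy spending figures are hypothetical and are not taken from the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation of two equally long sequences."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up yearly spending of five customers in two spending clusters.
petrol_diy = [120.0, 340.0, 560.0, 800.0, 950.0]
household_bills = [500.0, 610.0, 700.0, 900.0, 700.0]
print(f"Spearman correlation: {spearman(petrol_diy, household_bills):.2f}")
```

Applying `spearman` to every pair of spending clusters would fill a lower-triangular matrix of the same shape as Table A1.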
References
Ainslie, A., Rossi, P.E., 1998. Similarities in choice behavior across product categories. Marketing Science 17 (2), 91–106.
Bucklin, R.E., Gupta, S., 1992. Brand choice, purchase incidence, and segmentation: an integrated modeling approach. Journal of Marketing Research 29 (May), 201–215.
Cattell, R.B., 1965. Factor analysis: an introduction to essentials II. The role of factor analysis in research. Biometrics 21 (2), 405–435.
Cattell, R.B., 1966. The scree test for the number of factors. Multivariate Behavioral Research 1, 245–276.
Eriksson, K., Vaghult, A.L., 2000. Customer retention, purchasing behavior, and relationship substance in professional services. Industrial Marketing Management 29, 363–372.
Fishbein, M., Ajzen, I., 1975. Belief, Attitude, Intention, and Behavior. Addison-Wesley, Reading, MA.
Gigerenzer, G., Goldstein, D., 1999. Betting on one good reason: the take the best heuristic. In: Gigerenzer, G., Todd, P.M., The ABC Research Group (Eds.), Simple Heuristics that Make Us Smart. Oxford University Press, New York, pp. 75–95.
Godin, S., 1999. Permission Marketing: Turning Strangers into Friends and Friends into Customers. Simon & Schuster, New York.
Goodwin, C., 1991. Privacy: recognition of a consumer right. Journal of Public Policy and Marketing 12 (1), 106–119.
Gould, S.J., 1997. The use of psychographics by advertising agencies: an issue of value and knowledge. In: Kahle, L.R., Chiagouris, L. (Eds.), Values, Lifestyles, and Psychographics. LEA, New Jersey, pp. 217–229.
Hakstian, A.R., Rogers, W.T., Cattell, R.B., 1982. The behavior of number-of-factors rules with simulated data. Multivariate Behavioral Research 17 (2), 193–219.
Hustad, T.P., Pessemier, E.A., 1974. The development and application of psychographic life style and associated activity and attitude measures. In: Wells, W.D. (Ed.), Life Style and Psychographics. AMA, Chicago, pp. 31–70.
Kaiser, H.F., 1970. A second generation Little Jiffy. Psychometrika 35 (4), 401–415.
Kamakura, W.A., 2002. Introduction to the special issue on market segmentation. International Journal of Research in Marketing 19, 181–183.
Kamakura, W.A., Wedel, M., 2000. Factor analysis and missing data. Journal of Marketing Research 37 (4), 490–498.
Landahl, H.D.A., 1938. Centroid orthogonal transformation. Psychometrika 3, 219–223.
Lesser, J.A., Hughes, M.A., 1986. The generalizability of psychographic market segments across geographic locations. Journal of Marketing 50 (1), 18–27.
Lin, Q.-Y., Chen, Y.-L., Chen, J.-S., Chen, Y.-C., 2003. Mining inter-organizational retailing knowledge for an alliance formed by competitive firms. Information and Management 40, 431–442.
Lockshin, L.S., Spawton, A.L., Macintosh, G., 1997. Using product, brand, and purchasing involvement for retail segmentation. Journal of Retailing and Consumer Services 4 (3), 171–183.
MacQueen, J.B., 1967. Some methods for classification and analysis of multivariate observations. In: LeCam, L.M., Neyman, J. (Eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Cambridge University Press, London, pp. 282–297.
Milne, G.R., 2000. Privacy and ethical issues in database/interactive marketing and public policy: a research framework and overview of the special issue. Journal of Public Policy and Marketing 19 (1), 1–6.
Milne, G.R., Boza, M.-E., 1999. Trust and concern in consumers’ perceptions of marketing information management practices. Journal of Interactive Marketing 13 (1), 5–24.
Pernica, J., 1974. The second generation of market segmentation studies: an audit of buying motivations. In: Wells, W.D. (Ed.), Life Style and Psychographics. AMA, Chicago, pp. 279–313.
Phelps, J., Nowak, G., Ferrell, E., 2000. Privacy concerns and consumer willingness to provide personal information. Journal of Public Policy and Marketing 19 (1), 27–41.
Plummer, J.T., 1974. The concept and application of life style segmentation. Journal of Marketing 38 (1), 33–37.
Reinartz, W., Kumar, V., 2002. The mismanagement of customer loyalty. Harvard Business Review 80 (7), 86–94.
Rossi, P.E., McCulloch, R.E., Allenby, G.M., 1996. The value of purchase history data in target marketing. Marketing Science 15 (4), 321–340.
Slama, M.E., Tashchian, A., 1985. Selected socioeconomic and demographic characteristics associated with purchasing involvement. Journal of Marketing 49, 72–82.
Stern, P., Hammond, K., 2004. The relation between customer loyalty and purchase incidence. Marketing Letters 15 (1), 5–19.
Wells, W.D., 1975. Psychographics: a critical review. Journal of Marketing Research 12 (May), 196–213.
Wind, Y., 1978. Issues and advances in segmentation research. Journal of Marketing Research 15 (August), 317–337.