
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2017

Algorithm that creates product combinations based on customer data analysis

An approach with Generalized Linear Models and Conditional Probabilities

ENKHZUL UYANGA

LIDA WANG

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ENGINEERING SCIENCES

Degree Projects in Applied Mathematics and Industrial Economics
Degree Programme in Industrial Engineering and Management
KTH Royal Institute of Technology, year 2017
Supervisor at eleven AB: Joachim Ramberg
Supervisors at KTH: Henrik Hult, Pontus Braunerhjelm
Examiner at KTH: Henrik Hult

TRITA-MAT-K 2017:17
ISRN-KTH/MAT/K--17/17--SE

Royal Institute of Technology
School of Engineering Sciences
KTH SCI
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci

Abstract

This bachelor’s thesis is a combined study in applied mathematical statistics and industrial engineering and management, carried out to develop an algorithm that creates product combinations based on customer data analysis for eleven AB. On the mathematical side, generalized linear modelling, combinatorics and conditional probabilities were applied to create sales prediction models, to generate potential combinations and to calculate the conditional probabilities of the combinations being purchased. From an industrial engineering and management perspective, a SWOT analysis was used to identify which factors can enhance sales.

Based on the regression analysis, the study showed that the considered variables, namely sales price, brand, rating, purchase country, purchase month and how new the product is, affected the sales amounts of the products. The algorithm takes the barcode of a product as input and checks whether the corresponding product type satisfies the requirements on predicted sales amount and conditional probability. It then returns a list of possible product combinations that fulfil the recommendations.

Keywords: E-commerce; Customer data; Generalized linear model; Conditional probability; Product combination; SWOT; Algorithm.

Summary (Sammanfattning)

Algorithm that creates product combinations based on customer data analysis: An approach with Generalized Linear Models and Conditional Probabilities

This bachelor’s thesis is a combined study of applied mathematical statistics and industrial engineering and management, carried out in order to develop an algorithm that creates product combinations based on customer data analysis for eleven AB. In the mathematical part, generalized linear models, combinatorics and conditional probabilities were applied to create prediction models for sales amounts, generate potential combinations and calculate the conditional probabilities of the combinations being purchased. A SWOT analysis was used to identify which factors can increase sales from an industrial engineering and management perspective.

Based on the regression analysis, the study has shown that the considered variables, which were sales prices, brands, sales countries, sales months and how new the products are, affected the sales amounts of the products. The algorithm receives the barcode of a product as input and checks whether the corresponding product type meets the requirements for predicted sales amount and conditional probability. The algorithm returns a list of all possible product combinations that fulfil the recommendations.

Keywords: E-commerce; Customer data; Generalized linear model; Conditional probability; Product combination; SWOT; Algorithm

Acknowledgements

We would like to express our gratitude to our supervisor, Professor Henrik Hult at the Department of Mathematics, for his great support, inspiration and valuable advice during the project. In addition, we would like to thank our supervisor, Professor Pontus Braunerhjelm at the Department of Industrial Economics and Management, for his help and suggestions during the project.

We would also like to extend our warmest thanks to eleven AB, and especially Joachim Ramberg, the CTO of eleven, for providing us with the subject, data and continuous feedback.

Contents

1 Introduction
1.1 Background
1.2 Purpose
1.3 Problem definition
1.4 Scope

2 Theory
2.1 Generalized linear model
2.1.1 Link functions
2.1.2 Maximum likelihood estimation
2.2 Poisson distribution
2.3 Variable selection and model evaluation
2.3.1 Wald test and confidence intervals
2.3.2 AIC and BIC
2.4 Multicollinearity
2.5 Combinatorics
2.6 Conditional probability
2.7 Recommender systems
2.8 SWOT-analysis

3 Methods
3.1 About the implementation
3.2 Part I: Analysis and model development by GLM
3.2.1 Data collection
3.2.2 Model building
3.2.3 Variable selection
3.2.4 Residual analysis
3.2.5 Multicollinearity
3.2.6 Recommendations
3.2.7 Sales performance predictions
3.3 Part II: Conditional probability matrices
3.3.1 Generating product combinations
3.3.2 Generating conditional probability matrices
3.4 Algorithm formulation

4 Results
4.1 Part I
4.1.1 Sales performance prediction models
4.1.2 Recommendations for each product type
4.1.3 Selection of product types
4.2 Part II
4.2.1 Product combinations
4.2.2 Conditional probability matrices
4.3 Algorithm
4.4 SWOT chart

5 Discussion
5.1 Interpretation of the results
5.2 Marketing strategy analysis
5.3 Shortcomings
5.4 Comparison to earlier studies

6 Conclusion

7 Further research

8 References

Chapter 1

Introduction

1.1 Background

As the e-commerce market grows rapidly, online sales of beauty products are growing as well. According to Euromonitor International, global consumers spent $24 billion online on beauty and personal care products in 2015, which equals 6.5% of the total market and is double the figure for 2010 (Weinswig, 2016). Industry actors use campaigns, offers, personalized services, smart promotions and similar measures, which intensifies competition within the market. Understanding customers’ needs is therefore becoming increasingly important for companies that want to keep their competitive advantage.

Advances in information storage give e-commerce companies valuable information about their customers, if it is interpreted correctly. Seasonal variations, the extremely short clockspeeds of beauty products and market dynamism make it difficult to predict customers’ buying behaviour without a mathematical analysis of large amounts of data.

The thesis is a collaboration between the authors and eleven AB, a certified distributive online reseller located in Sweden. eleven AB offers a wide range of beauty products in the categories makeup, perfume, skincare and haircare. Since its establishment in 2004 in Västervik, Sweden, the company has successfully expanded into the neighbouring markets of Finland and Norway. Today, eleven AB is one of Scandinavia’s leading beauty stores and one of Sweden’s oldest online resellers of beauty products.

eleven started with a vision of "providing fantastic beauty products to its customers, no matter where they live" (eleven AB, 2017). Therefore, unique shipping alternatives, such as home delivery, Volvo In-car Delivery and Collect In Store, are offered. In addition to these shipping alternatives, eleven offers free standard shipping on all orders. To avoid orders whose value falls short of the standard shipping cost, offering good product combinations can encourage customers to buy more, which increases the average order value. As of today, eleven AB offers a few product combinations to its customers. The combinations are created separately by employees, who take current trends and campaigns into account (Ramberg, 2017). Most of the current product combinations consist of the same product types, which have proved not to sell particularly well.

Sales are affected not only by different offers but also by the effectiveness of the marketing strategy. A good marketing strategy will benefit eleven AB through increased sales and a more sustainable competitive advantage.

The thesis is relevant to both eleven AB and its customer base. If the product combinations match customers’ tastes and needs, customers can buy a package for a better price than by purchasing each product separately, whether from eleven or from other market competitors. eleven AB will hopefully be able to increase its average order value.

Among similar earlier studies, recommender systems (RS) are widely used in the e-commerce market to give each customer personalized recommendations of other products. This both enriches the online experience and increases revenues significantly (Schafer, Konstan and Riedl, 1999). Although ideas and concepts from these basic approaches are studied in order to simplify the development of the formula, the product combinations are developed based on all existing factual customer data. Furthermore, whereas an RS tries to predict user preferences and make recommendations that should interest a particular customer, the product combinations have to consider the overall likelihood of customers buying a specific combination of two or more products. Unlike a recommender system, the algorithm derived in this study is therefore not personalized.

1.2 Purpose

The purpose of this thesis is to find, through regression analysis, the significant factors or variables that lead to the purchase of different products. The variables considered are sales price, rating, brand, purchase month, purchase country and how new or old the product is. Combinations are made of products with high sales predictions. Conditional probabilities are then applied to create matrices that describe the purchase probabilities of product combinations consisting of two and three products.

Based on the analysis, an algorithmic specification is developed that creates product combinations of two or three different products. The algorithm should take a barcode as input and return the barcodes of other products that, together with the input product, make an optimal product package for a specific month.

Furthermore, a marketing strategy analysis is presented by applying marketing theories and models. Some recommendations are given in order to improve the sales and profitability of the company.


The algorithm, together with the marketing strategy recommendations, is intended to be used by the responsible staff to optimize the average order value of eleven AB.

1.3 Problem definition

To accomplish the purpose of the thesis, the following central questions will be answered:

• Which factors are most significant for sales statistics of certain product types?

• Which product types have the highest sales amount during specific months?

• Is it possible to create product combinations of two or three different products from the same and from different product categories?

• If so, what is the conditional probability of a customer buying the second (and the third) product of the combination, given the first product as input?

In addition to the mathematically formulated questions above, the following question will be addressed from a strategic perspective:

• How can better understanding of the market and its competitors increase sales?

1.4 Scope

This study creates combinations of two and three products. The combinations have no repetition, that is, each combination consists of different product types. It is worth noting that only the three main product categories are studied in this thesis. A SWOT analysis is applied in order to identify internal and external factors that have positive or negative impacts on eleven AB’s sales.

Due to a Confidential Disclosure Agreement (CDA) between the authors and eleven AB, the study uses anonymized product names. Upper-case Latin letters refer to product categories. Product types that belong to a certain product category are referred to by the corresponding upper-case letter followed by a numeral. Products that belong to a certain product type are referred to by the corresponding upper-case letter, the corresponding numeral and a lower-case Latin letter. For instance, A1a is product a, which belongs to product type A1, which in turn belongs to product category A.
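As a small illustration of this naming convention, the Python sketch below splits a code such as A1a into its category, product type and product. It assumes that a category is a single upper-case letter, a type numeral is one or more digits and a product is a single lower-case letter; the names `ProductCode` and `parse_product_code` are hypothetical and not part of the thesis.

```python
import re
from typing import NamedTuple


class ProductCode(NamedTuple):
    category: str      # upper-case letter, e.g. "A"
    product_type: str  # letter followed by numeral, e.g. "A1"
    product: str       # full code, e.g. "A1a"


# Assumed pattern: one upper-case letter, one or more digits, one lower-case letter.
_CODE = re.compile(r"([A-Z])(\d+)([a-z])")


def parse_product_code(code: str) -> ProductCode:
    """Split an anonymized product code into its three hierarchy levels."""
    match = _CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid product code: {code!r}")
    letter, numeral, _ = match.groups()
    return ProductCode(category=letter, product_type=letter + numeral, product=code)


print(parse_product_code("A1a"))
# ProductCode(category='A', product_type='A1', product='A1a')
```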


Chapter 2

Theory

2.1 Generalized linear model

A generalized linear model (GLM) is a generalization of ordinary linear regression in which the response variable is not required to be normally distributed. Instead, the distribution of the response variable must belong to the exponential family, which includes the normal, Poisson, binomial, exponential and gamma distributions.

The general form of the distribution of the exponential family can be expressed as:

$$f(y_i, \theta_i, \phi) = \exp\left\{\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + h(y_i, \phi)\right\}$$

where $y_i$ is the response variable, $\theta_i$ is the natural location parameter and $\phi$ is a scale parameter.

For members of the exponential family, expectation and variance can be represented as:

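$$E(y_i) = \mu_i = \frac{db(\theta_i)}{d\theta_i}$$

$$\operatorname{Var}(y_i) = \frac{d^2 b(\theta_i)}{d\theta_i^2}\,a(\phi) = \frac{d\mu_i}{d\theta_i}\,a(\phi)$$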

(Montgomery, Peck and Vining 2012, p.450)
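As a worked illustration of these formulas (our own derivation, not taken from the cited source), the Poisson distribution used later in this thesis can be written in exponential-family form:

$$f(y_i, \theta_i, \phi) = \frac{\mu_i^{y_i}e^{-\mu_i}}{y_i!} = \exp\{y_i\ln\mu_i - \mu_i - \ln(y_i!)\}$$

so that $\theta_i = \ln\mu_i$, $b(\theta_i) = e^{\theta_i}$, $a(\phi) = 1$ and $h(y_i, \phi) = -\ln(y_i!)$. The expectation and variance formulas then give

$$E(y_i) = \frac{db(\theta_i)}{d\theta_i} = e^{\theta_i} = \mu_i, \qquad \operatorname{Var}(y_i) = \frac{d^2 b(\theta_i)}{d\theta_i^2}\,a(\phi) = e^{\theta_i} = \mu_i,$$

which agrees with the equal mean and variance of the Poisson distribution and shows that the log link $\ln(\mu_i) = \theta_i$ is its natural (canonical) link.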

2.1.1 Link functions

The basic concept of a GLM is to develop a linear model by using an appropriate link function for the specific response distribution. Assume that the expected value of the response variable can be written as:

$$E(y_i) = \mu_i.$$


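Let $\eta_i$ be the linear predictor. The function $g$ that relates the mean of the response variable to the linear predictor is called the link function and is defined by:

$$g(\mu_i) = \eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} = \mathbf{x}_i'\boldsymbol{\beta}$$

where $\beta_0$ is the intercept and

$$x_{ij} = \begin{cases} 1 & \text{if } \beta_j \text{ is used in cell } i,\\ 0 & \text{if } \beta_j \text{ corresponds to a reference cell.} \end{cases}$$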

Thus, the expected value of the response variable is:

$$E(y_i) = g^{-1}(\eta_i) = g^{-1}(\mathbf{x}_i'\boldsymbol{\beta}).$$

Table 2.1 shows the most common distributions with link functions in a GLM.

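| Distribution | Link name | Link function |
|---|---|---|
| Normal | Identity | $g(\mu_i) = \mu_i$ |
| Exponential | Inverse | $g(\mu_i) = 1/\mu_i$ |
| Gamma | Inverse | $g(\mu_i) = 1/\mu_i$ |
| Poisson | Log | $g(\mu_i) = \ln(\mu_i)$ |
| Binomial | Logit | $g(\mu_i) = \ln\left(\frac{\mu_i}{1-\mu_i}\right)$ |

Table 2.1: Link functions for the Generalized Linear Model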

(Montgomery, Peck and Vining 2012, p.451)
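To make Table 2.1 concrete, the following minimal Python sketch pairs each link function $g$ with its inverse $g^{-1}$, which maps a linear predictor $\eta_i$ back to the mean $\mu_i = E(y_i)$. The dictionary `LINKS` and the helper `mean_from_linear_predictor` are illustrative names of our own.

```python
import math
from typing import Callable, Dict, Tuple

# (link g, inverse link g^{-1}) for the rows of Table 2.1
LINKS: Dict[str, Tuple[Callable[[float], float], Callable[[float], float]]] = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}


def mean_from_linear_predictor(link: str, eta: float) -> float:
    """E(y_i) = g^{-1}(eta_i) for the chosen link."""
    _, inverse = LINKS[link]
    return inverse(eta)


# Poisson with log link: eta = 0.5 gives mu = e^0.5
print(round(mean_from_linear_predictor("log", 0.5), 4))  # 1.6487
```

For the Poisson models used in this thesis only the log link is needed; the other rows are included to show that the same structure covers every distribution in the table.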

2.1.2 Maximum likelihood estimation

Maximum likelihood is the standard method for estimating the regression coefficients in a GLM. The basic idea of maximum likelihood estimation is to construct a likelihood function of the coefficients $\beta_i$ and to choose the estimates that make the observed values most probable. The likelihood function can be written as:

$$L(\mathbf{y}, \boldsymbol{\theta}, \phi) = \prod_{i=1}^{n} f(y_i, \theta_i, \phi).$$

Since the logarithm is a monotonically increasing function, the log-likelihood attains its maximum at the same point as the likelihood itself and can therefore be used instead:

$$\ln L(\mathbf{y}, \boldsymbol{\theta}, \phi) = \sum_{i=1}^{n} \ln f(y_i, \theta_i, \phi).$$

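The maximum-likelihood estimators $\hat{\beta}_i$ are obtained by solving the equations

$$\frac{\partial \ln L}{\partial \beta_i} = 0, \quad i = 0, 1, \ldots, k.$$

For a GLM these equations generally have no closed-form solution and are solved iteratively, for example by iteratively reweighted least squares.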


Thus, the fitted generalized linear model will be given by:

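$$\hat{y}_i = g^{-1}(\mathbf{x}_i'\hat{\boldsymbol{\beta}}).$$

For the Poisson distribution with the log link, the fitted generalized linear model becomes:

$$\hat{y}_i = e^{\mathbf{x}_i'\hat{\boldsymbol{\beta}}}$$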

(Montgomery, Peck and Vining 2012, p.444-446)
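Because the score equations $\partial \ln L / \partial \beta_i = 0$ cannot be solved in closed form for the Poisson model, they are solved numerically. The NumPy sketch below uses Newton-Raphson, which for the canonical log link coincides with iteratively reweighted least squares; R’s `glm(..., family = poisson)` fits the same kind of model. The function name `fit_poisson_glm` and the synthetic data are our own assumptions and do not reproduce the thesis data or code.

```python
import numpy as np


def fit_poisson_glm(X: np.ndarray, y: np.ndarray,
                    n_iter: int = 25, tol: float = 1e-10) -> np.ndarray:
    """Maximum-likelihood fit of a Poisson GLM with log link by Newton-Raphson.

    X must already contain a column of ones for the intercept beta_0.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                # fitted means g^{-1}(x_i' beta)
        score = X.T @ (y - mu)               # gradient of ln L
        info = X.T @ (X * mu[:, None])       # Fisher information X' W X, W = diag(mu)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:       # stop once the update is negligible
            break
    return beta


# Synthetic data with known coefficients (beta_0, beta_1) = (0.5, 0.3)
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
true_beta = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson_glm(X, y)
print(beta_hat)  # approximately [0.5, 0.3]
```

At convergence the score $\mathbf{X}'(\mathbf{y} - \hat{\boldsymbol{\mu}})$ is numerically zero, which is exactly the condition $\partial \ln L / \partial \beta_i = 0$ stated above.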

2.2 Poisson distribution

The probability mass function of the Poisson distribution with mean µ is:

$$p_X(k) = \frac{\mu^k}{k!}e^{-\mu}, \quad k = 0, 1, 2, \ldots$$

A discrete Poisson-distributed random variable is often denoted by $X \in \mathrm{Po}(\mu)$, and the mean and variance of $X$ are both equal to $\mu$, i.e. $E(X) = \operatorname{Var}(X) = \mu$.

The Poisson distribution is a discrete probability distribution that gives the probability of a given number of events occurring randomly within a fixed interval of time or space. That events occur randomly in a time interval means that they may occur at any time and independently of each other. The assumption of independence implies that the number of events occurring in one period does not affect the number of events occurring in a later period. More generally, the Poisson distribution can also be used to model the number of events occurring in other specified intervals, such as distance, area or volume (Blom et al. 2016, p.180, 182).
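A quick numerical check of the probability mass function and of the property $E(X) = \operatorname{Var}(X) = \mu$ can be written with the Python standard library only. The value $\mu = 2$ is an arbitrary choice for illustration, and the sum is truncated at $k = 99$, beyond which the remaining probability mass is negligible.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Po(mu): mu^k e^{-mu} / k!."""
    if k < 0:
        return 0.0
    return mu ** k * math.exp(-mu) / math.factorial(k)


mu = 2.0
probs = [poisson_pmf(k, mu) for k in range(100)]  # tail beyond k = 99 is negligible
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print(round(poisson_pmf(0, mu), 4), round(mean, 6), round(var, 6))  # 0.1353 2.0 2.0
```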

2.3 Variable selection and model evaluation

2.3.1 Wald test and confidence intervals

The Wald test is used to determine whether the explanatory variables in a model are significant and is often called a "significance test". The principle of the test is to fit the regression model without restrictions and then evaluate whether the results agree with the hypothesis within the range of sampling variability (Greene 2012, p.155).

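The simplest case of the Wald test is a test on an individual model coefficient. The hypotheses are:

$$H_0: \beta_j = 0, \qquad H_1: \beta_j \neq 0,$$

and the test statistic for the null hypothesis is:

$$Z_0 = \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)}.$$

Under $H_0$, $Z_0$ is approximately standard normal, so $H_0$ is rejected at significance level $\alpha$ if $|Z_0| > Z_{\alpha/2}$.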


The Wald test can also be used to construct confidence intervals on individual regression coefficients in the linear predictor. An approximate $100(1-\alpha)\%$ confidence interval on the $j$-th model coefficient is:

$$I_{\beta_j} = \left[\hat{\beta}_j - Z_{\alpha/2}\operatorname{se}(\hat{\beta}_j),\ \hat{\beta}_j + Z_{\alpha/2}\operatorname{se}(\hat{\beta}_j)\right]$$

where $\alpha$ is the significance level; commonly used significance levels are 0.10, 0.05 and 0.01.

If the confidence interval $I_{\beta_j}$ does not include zero, the null hypothesis that the model coefficient is zero should be rejected at significance level $\alpha$ (Montgomery, Peck and Vining 2012, p.436-437).
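The Wald statistic, its two-sided p-value and the confidence interval can be computed directly from an estimate and its standard error. The sketch below uses Python’s `statistics.NormalDist`; the estimate 0.30 and standard error 0.10 are hypothetical numbers, not results from this thesis.

```python
from statistics import NormalDist


def wald_test(beta_hat: float, se: float, alpha: float = 0.05):
    """Wald statistic, two-sided p-value and 100(1 - alpha)% confidence interval."""
    z0 = beta_hat / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    ci = (beta_hat - z_crit * se, beta_hat + z_crit * se)
    return z0, p_value, ci


# Hypothetical coefficient estimate and standard error
z0, p, (lo, hi) = wald_test(beta_hat=0.30, se=0.10)
print(round(z0, 2), round(p, 4), round(lo, 3), round(hi, 3))
# 3.0 0.0027 0.104 0.496
```

Since the interval excludes zero, the hypothetical coefficient would be judged significant at $\alpha = 0.05$, consistent with $p < 0.05$.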

2.3.2 AIC and BIC

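The Akaike information criterion (AIC) measures the relative quality of a model compared with other models fitted to the same data. Letting $L$ denote the maximized likelihood of a specific model, the AIC is defined as:

$$\mathrm{AIC} = -2\ln(L) + 2p$$

where $p$ is the number of parameters in the model, and the model with the lowest AIC is preferred.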

The difference in AIC between a full and a reduced model can be used to decide whether to include or exclude an explanatory variable. If the AIC of the full model is greater than that of the reduced model, the variable should be excluded. However, if the AIC of the full model is smaller than that of the reduced model, one cannot conclude that the variable must be included, since the AIC does not give a definitive answer about which model is best; complementary tests are therefore needed.

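The Bayesian information criterion (BIC) is another criterion for model selection and is closely related to the AIC. The BIC is defined as:

$$\mathrm{BIC} = -2\ln(L) + p\ln(n)$$

where $n$ is the number of observations, and the model with the lowest BIC is preferred. Since $\ln(n) > 2$ when $n \geq 8$, the BIC penalizes additional parameters more heavily than the AIC (Montgomery, Peck and Vining 2012, p.336).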
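The two criteria are straightforward to compute from a model’s maximized log-likelihood. In the sketch below the log-likelihoods, parameter counts and sample size are hypothetical values chosen to show that the criteria can disagree when comparing a full and a reduced model.

```python
import math


def aic(log_lik: float, p: int) -> float:
    """Akaike information criterion: -2 ln(L) + 2p."""
    return -2.0 * log_lik + 2 * p


def bic(log_lik: float, p: int, n: int) -> float:
    """Bayesian information criterion: -2 ln(L) + p ln(n)."""
    return -2.0 * log_lik + p * math.log(n)


# Hypothetical log-likelihoods for a full and a reduced model on n = 500 observations
n = 500
full = {"log_lik": -1200.0, "p": 6}
reduced = {"log_lik": -1201.5, "p": 5}
for name, m in (("full", full), ("reduced", reduced)):
    print(name, aic(m["log_lik"], m["p"]), round(bic(m["log_lik"], m["p"], n), 2))
# full 2412.0 2437.29
# reduced 2413.0 2434.07
```

Here the AIC slightly favours the full model, while the BIC, with its heavier penalty $p\ln(n)$, favours the reduced one. This is one reason why such criteria are best combined with complementary tests, such as the Wald test.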

2.4 Multicollinearity

Multicollinearity occurs when the regressors in a multiple regression model are nearly linearly dependent. The presence of multicollinearity among the regressors results in large variances and covariances for the least-squares estimators of the regression coefficients.


There are several ways of detecting multicollinearity, one of which is eigensystem analysis of the matrix $\mathbf{X}'\mathbf{X}$ (in correlation form). If there are one or more small eigenvalues $\lambda$, near-linear dependencies exist among the columns of $\mathbf{X}$. The eigensystem analysis can also be summarized by the condition number of $\mathbf{X}'\mathbf{X}$, defined as:

$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}}$$

In general, if $\kappa$ is less than 100, there is no serious problem with multicollinearity. If $\kappa$ is between 100 and 1000, it implies moderate to strong multicollinearity. If $\kappa$ exceeds 1000, the multicollinearity problem is severe (Montgomery, Peck and Vining 2012, p.285-298).
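The condition number can be computed from the eigenvalues of $\mathbf{X}'\mathbf{X}$ after centring each regressor and scaling it to unit length, so that $\mathbf{X}'\mathbf{X}$ is in correlation form. The NumPy sketch below is illustrative only: `condition_number` is our own helper, and the regressors are synthetic, with `x3` constructed as an almost exact copy of `x1`.

```python
import numpy as np


def condition_number(X: np.ndarray) -> float:
    """kappa = lambda_max / lambda_min of X'X with X in correlation form."""
    Z = X - X.mean(axis=0)                     # centre each regressor
    Z = Z / np.linalg.norm(Z, axis=0)          # scale each column to unit length
    eigenvalues = np.linalg.eigvalsh(Z.T @ Z)  # Z'Z is symmetric, so eigvalsh applies
    return float(eigenvalues.max() / eigenvalues.min())


rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.01 * rng.normal(size=200)   # nearly a copy of x1

kappa_ok = condition_number(np.column_stack([x1, x2]))
kappa_bad = condition_number(np.column_stack([x1, x2, x3]))
print(kappa_ok)   # close to 1: no serious problem
print(kappa_bad)  # far above 1000: severe multicollinearity
```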

2.5 Combinatorics

For any integer $n \geq 0$ and any real numbers $x$ and $y$, the binomial theorem states that:

$$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k$$

Because of the binomial theorem, the numbers $\binom{n}{k}$ are called binomial coefficients. The term combination is sometimes used to indicate a subset or an unordered collection of distinct items. Thus, $\binom{n}{k}$ denotes the number of $k$-element subsets of an $n$-element set and can be computed as:

$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$$

The key features of this kind of combination are that order does not matter and repeated elements are not allowed (Mazur 2010, p.9).
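In practice, the $\binom{n}{k}$ unordered combinations without repetition can be enumerated with Python’s `itertools.combinations`, and counted with `math.comb`. The five product types below are hypothetical placeholders in the anonymized naming scheme.

```python
from itertools import combinations
from math import comb

# Hypothetical product types that passed the sales-prediction step
product_types = ["A1", "A2", "B1", "B3", "C2"]

pairs = list(combinations(product_types, 2))    # unordered, no repetition
triples = list(combinations(product_types, 3))

print(len(pairs), comb(5, 2))    # 10 10
print(len(triples), comb(5, 3))  # 10 10
print(pairs[:3])                 # [('A1', 'A2'), ('A1', 'B1'), ('A1', 'B3')]
```

Because `combinations` never repeats an element and ignores order, it produces exactly the two- and three-product combinations without repetition described in the scope of this study.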

2.6 Conditional probability

Given two events $A$ and $B$ with $P(A) > 0$, the conditional probability of $B$ occurring given that $A$ occurs is defined as:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

(Gut 2009, p.5). Based on this definition, conditional probability extends naturally to the case where an event is conditioned on several events, for example:

$$P(C \mid A \cap B) = \frac{P(A \cap B \cap C)}{P(A \cap B)}, \quad P(A \cap B) > 0.$$
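One simple way to estimate these probabilities is to use relative frequencies in historical orders. The sketch below does this for a small set of hypothetical orders, each given as the set of product types bought together. It illustrates the definitions only; it is not necessarily how the conditional probability matrices in this thesis are computed. The names `prob` and `cond_prob` are our own.

```python
from typing import List, Set

# Hypothetical orders, each the set of product types bought together
orders: List[Set[str]] = [
    {"A1", "B1"},
    {"A1", "B1", "C2"},
    {"A1"},
    {"B1", "C2"},
    {"A1", "C2"},
    {"A1", "B1", "C2"},
]


def prob(*types: str) -> float:
    """Relative frequency of orders that contain all the given product types."""
    return sum(set(types) <= order for order in orders) / len(orders)


def cond_prob(target: str, *given: str) -> float:
    """P(target | given_1 and ... and given_m) = P(target and given) / P(given)."""
    p_given = prob(*given)
    if p_given == 0:
        raise ValueError("conditioning event has zero probability")
    return prob(target, *given) / p_given


print(cond_prob("B1", "A1"))        # 3 of the 5 orders with A1 also contain B1
print(cond_prob("C2", "A1", "B1"))  # 2 of the 3 orders with A1 and B1 also contain C2
```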


2.7 Recommender systems

Basic theories

Recommender systems are widely used in the e-commerce market to give each customer personalized recommendations of other products. The recommended products can, for example, be physical goods, films, music, articles, social tags and services. The system enriches the online experience, increases the conversion rate and affects revenues positively (Schafer, Konstan and Riedl, 1999).

Theoretically, recommender systems are a "spectrum of systems describing any system that provides individualization of the recommendation results and leads to a procedure that helps users in a personalized way to interesting or useful objects in a large space of possible options" (Lampropoulus and Tsihrintzis 2015, p.1). A recommender system helps its user deal with information overload by filtering out the most appropriate and valuable information for that specific user.

To make recommendations, personal information about user preferences is needed in order to predict the user’s rating of items the user has not interacted with before. There are three methods of collecting knowledge about user preferences: the implicit, the explicit and the mixing approach. The implicit approach does not require any active involvement from the user and is based on recording user behaviour; a typical example of implicit rating is historical purchase data. The explicit approach is based on asking users to specify their preference for particular items. Lastly, the mixing approach is a combination of the previous two.

There are two main approaches to designing a recommender system: content-based methods and collaborative methods. The first, content-based learning, builds a user profile by monitoring user actions and/or examining the objects the user has evaluated. By assuming that a user’s preferences remain unchanged over time, one can predict the user’s future actions from past behaviour; in other words, all the information stored about the user is used to customize the services offered. The main assumption of collaborative filtering, by contrast, is that similar users prefer similar items. This method relies entirely on interest ratings from users and can be divided into two branches: model-based and memory-based. Model-based algorithms use statistical and machine-learning techniques to make predictions based on the underlying data. Memory-based methods can be further divided into two classes: user-based and item-based. User-based collaborative systems compute user-user similarities by matching the user against a database of other users with similar interests; items that those users have bought but that are unknown to the specific user are then recommended to the specific user. Item-based collaborative systems, on the other hand, match a specific item against a database of other items. This approach is thus based on item relations rather than user relations, and the final prediction is based on similarities between items that have been rated by a common user.

A common flaw of collaborative methods is data sparsity. Since users are not obliged to rate every item, the rating data is sparse, which leads to unreliable results. There are, however, different techniques for reducing sparsity, of which Default Voting, which assigns a default rating to items a user has not rated, is the simplest.
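The item-based idea can be sketched with implicit purchase data: items are compared through the columns of a user-item matrix, here with cosine similarity, and the most similar item is suggested. The matrix, item codes and the helper `most_similar` are hypothetical; cosine similarity is one common choice of similarity measure and is not taken from the sources cited above.

```python
import numpy as np

# Hypothetical implicit data: rows are users, columns are items, 1 = purchased
items = ["A1a", "A2b", "B1a", "C2c"]
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
], dtype=float)

norms = np.linalg.norm(R, axis=0)
S = (R.T @ R) / np.outer(norms, norms)   # item-item cosine similarity
np.fill_diagonal(S, 0.0)                 # ignore each item's similarity to itself


def most_similar(item: str) -> str:
    """Return the item whose purchase pattern is most similar to the given item."""
    j = items.index(item)
    return items[int(np.argmax(S[j]))]


print(most_similar("A1a"))  # A2b
```

Because the similarities are computed between items rather than users, the same matrix `S` can be reused for every customer, which mirrors the item relations described for item-based collaborative systems.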

Additional studies

Many different methods, approaches and algorithms for recommender systems have been derived in earlier studies. One of the main purposes of further research in this area is to find better ways of handling common problems such as sparsity, user-interface issues, information loss and the overall effectiveness of the system. Among the research questions, those of particular interest for this study are shifts in customer demand or interest, and how recommender systems can be combined with the strategic operations of an organization.

Regardless of whether a customer’s future actions are predicted from his or her past actions or the assumption that ’similar users prefer similar items’ is used, neither approach considers how customers’ interests shift over time. When this interest shift is not considered, the reliability of the user ratings and the quality of the final recommendations are negatively affected. Zhimin Cheng, Yi Jiang and Yao Zhao (2010) proposed an effective collaborative filtering algorithm that uses a time-based weighting function, so that the final recommendations reflect changes in user interest over time. Time-based weighting makes it possible to put more weight on recent ratings when matching a user to other users. The basic idea is that the algorithm assigns each user rating a time interval, where a larger interval indicates an earlier rating time. Using the proposed weighting function, these rating time intervals can be used to strengthen the contribution of recent ratings to the prediction and weaken the contribution of earlier ones.

By combining marketing strategy with the essential parts of recommender systems, a better recommendation environment can be achieved that takes both customer satisfaction and the e-commerce company’s profits into consideration. A study by Hsiao-Fan Wang and Cheng-Ting Wu (2010) considered two marketing strategies, a win-win strategy and a maximal-profit strategy, in a case study with a recommender system inspired by collaborative filtering. The win-win strategy not only matches the user’s taste but also enhances the supplier’s profit. The maximal-profit strategy recommends products based on profit maximization. When the recommendation process used only the supplier’s viewpoint, an optimization problem arose in which the goal was to maximize profit over a set of items that satisfy the user’s preferences and budget. The study showed that the maximal-profit strategy brought the highest income to the suppliers but did not provide good service to the customers, since their desires were satisfied only passively. When the win-win strategy was used, a bi-objective programming problem arose: maximizing both the suppliers’ profit and the customers’ satisfaction under the same conditions as in the other strategy. The experiment with the win-win strategy showed that the service level decreases as the profit gain increases, but also that it was possible to maintain a specific recommendation performance while increasing the profit gain.

2.8 SWOT-analysis

SWOT analysis, where SWOT stands for Strengths, Weaknesses, Opportunities and Threats, is a process in which internal and external inhibitors and enhancers of performance are identified. The analysis can be applied to organizations, people, business ideas, products, departments, etc. Based on the analysis, these organizational influences can, for instance, help stakeholders decide what future actions to take from a strategic perspective (Silber et al. 2010, p.115-116).

Strength: an internal enhancer that gives an advantage over others and sets the organization apart from its competitors.

Weakness: an internal inhibitor that stops the organization from performing at its optimum level.

Opportunity: an external enhancer of performance that can be pursued or exploited to gain benefit and competitive advantage.

Threat: an external inhibitor of performance that has the potential to reduce accomplishments and harm the organization.


Chapter 3

Methods

3.1 About the implementation

Concepts from recommender systems are used in this study. Historical sales statistics and product ratings are used as data in this project, which can be seen as a mixing approach to collecting knowledge about eleven AB’s customers. The study aims to generate product combinations and their probabilities of being purchased based on customers’ preferences, assuming, as in the content-based method, that the preferences remain unchanged over time.

The rest of this chapter is divided into two main parts. The purpose of the first part is to develop a generalized linear model (GLM) for each product type, in order to identify the significant variables that lead to a purchase and to predict sales amounts. The significant variables help guide the user of the algorithm in deciding exactly which products should later be included in the combinations. The predictions are needed to identify the best-selling products during different time intervals. The purpose of the second part is to test whether the conditional probabilities of the combinations are high enough; in other words, the conditional probabilities are necessary for determining the feasibility of the proposed combinations.

The reason for this implementation, where models are applied as a first step instead of directly calculating combination probabilities, is that the final algorithm should be applicable to all currently existing and prospective products. Through modelling, a predicted sales amount is used rather than the factual sales amount. This makes it possible to include products with insufficient factual sales statistics in the product combinations by assigning them to the model of their product type. Similarly, prospective products can be flexibly included in the models when launched. Furthermore, modelling makes it easier to update the underlying sales data, for instance with more recent data. The prediction models are also intended to give eleven AB’s purchasing department a better overview for upcoming product buy-ins and pricing.


The algorithm specifies the exact steps from receiving a barcode as input to offering barcodes as possible combinations. It can therefore be seen as an overall summary of the methods used in the study, placed in a clear context.

The statistical software R is the main tool of this study. For each section in the following parts of Methods, a brief summary is given of which functions are used to obtain the required results, in which format the data is stored, etc. In addition, Python and Microsoft Excel are used as supplementary tools to sort some of the data.

3.2 Part I: Analysis and model development by GLM

3.2.1 Data collection

The data was obtained with the assistance of eleven AB's CTO and consisted of four main data categories: order/sales data, product data, group data and customer data. The order data contained information about each order and consisted of variables such as order ID, product ID, currency, purchase date, purchased quantity, customer ID and shipping cost. The sales data used in this project contained all orders during the period from 1 January 2014 to 31 December 2016. The product data contained information about each product and consisted of variables such as product ID, barcode, brand, rating, sales price and launch date. The group data contained information about the different product types and consisted of variables such as product ID, barcode and group titles of the products. The customer data contained information about the customers and consisted of the variables customer ID, country and city.

The hierarchical structure of the products and their groupings used in this study needs to be clarified. Each product belongs to a product type, and each product type belongs to a product category. For example, a 'red lipstick of brand X' belongs to the product type 'lipstick', and the product type 'lipstick' belongs to the product category 'makeup'. The product type 'lipstick' has a certain product ID and can contain many different products with different barcodes; in other words, the barcoded product is the bottom element of the structure. The top element, the product category, can contain different product types, as 'lipstick' and 'mascara' both belong to 'makeup'.

The main data collections could be merged into one with the help of the barcodes, product IDs, customer IDs and order IDs that link the files together. Merging produced a final data file with detailed information about each order. Some data points were deleted beforehand because they had empty values for order ID and sales price.
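The merge step can be sketched in Python as a sequence of key lookups. This is a minimal sketch, not the authors' R code. The record layout, the field names (order_id, product_id, customer_id, barcode, sales_price) and the data are hypothetical. The sketch also assumes that a row is dropped when either the order ID or the sales price is empty, and that orders link to products by product ID, products to groups by barcode and orders to customers by customer ID.

```python
def merge_data(orders, products, groups, customers):
    """Join order rows with product, group and customer data.

    orders:    list of dicts, one per order line (hypothetical field names)
    products:  product_id -> dict with barcode, brand, sales_price, ...
    groups:    barcode -> group title (product type)
    customers: customer_id -> dict with country, city
    """
    merged = []
    for order in orders:
        product = products.get(order.get("product_id"))
        customer = customers.get(order.get("customer_id"))
        # Drop rows that cannot be linked or that lack an order ID.
        if not order.get("order_id") or product is None or customer is None:
            continue
        # Drop rows whose product has no sales price (assumed cleaning rule).
        if product.get("sales_price") is None:
            continue
        row = {**order, **product, **customer}
        row["group"] = groups.get(product["barcode"])
        merged.append(row)
    return merged


orders = [
    {"order_id": "o1", "product_id": "p1", "customer_id": "c1", "quantity": 2},
    {"order_id": "", "product_id": "p1", "customer_id": "c1", "quantity": 1},
    {"order_id": "o2", "product_id": "p2", "customer_id": "c1", "quantity": 1},
]
products = {
    "p1": {"barcode": "7350001", "brand": "X", "sales_price": 149.0},
    "p2": {"barcode": "7350002", "brand": "Y", "sales_price": None},
}
groups = {"7350001": "lipstick", "7350002": "mascara"}
customers = {"c1": {"country": "SE", "city": "Stockholm"}}

rows = merge_data(orders, products, groups, customers)
print(len(rows), rows[0]["group"], rows[0]["country"])  # 1 lipstick SE
```

Only the first order survives. The second has an empty order ID, and the third refers to a product without a sales price.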

The final data was scoped so that only three product categories, A, B and C, were included in the subsequent study. The data was scoped in order to focus on the main product categories and to reduce the very large amount of data that had to be handled during the project. The final algorithm developed in this study can also be used for the other product categories, which were not included in the steps of developing the algorithm.

The scoped data was manipulated and transformed into the following two types of variables:

Response variable Explanation

Sales quantity the total number of each sold product type during a set time interval

Explanatory variables Explanation

Sales price the ordinary sales price of each product

Brand the brand name of the product

Rating the numeric rating on a scale from 1 to 5 points for each product, or 'NA' if the product has no rating

How new a measure of how new or old the product is, calculated as the number of days between the purchase date and the launch date

Purchase country the country where the purchase was completed

Purchase month the month when the purchase was completed

Table 3.1: The response and explanatory variables with explanations.

In R, the data is stored as a data frame where the rows correspond to the observations, i.e. the orders, and the columns correspond to the variables.

3.2.2 Model building

According to Introduction to Linear Regression Analysis (Montgomery, Peck and Vining 2012, p. 444), the Poisson distribution is often used for modelling the relationship between observed counts and potentially useful regressors or predictor variables. Thus, treating the sales statistics for every product as count data, GLMs with a Poisson distribution seemed appropriate and were applied to build the initial models for each product type.
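R's glm() with family = poisson fits such a model by iteratively reweighted least squares (IRLS). With the log link, each IRLS step solves a weighted least-squares problem with weights μ and working response η + (y − μ)/μ. The following plain-Python sketch makes each step visible. It is not the authors' code. The design matrix layout (an intercept column plus 0/1 dummy columns for the non-reference groups) and the toy data are assumptions.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_poisson_glm(X, y, max_iter=25, tol=1e-10):
    """Fit log(E[y]) = X beta for count data y by IRLS and return beta."""
    p = len(X[0])
    mu = [yi + 0.1 for yi in y]            # starting values, as in R's poisson() family
    eta = [math.log(m) for m in mu]
    beta = [0.0] * p
    for _ in range(max_iter):
        # Working response; the weights equal mu for the log link (Var(y) = mu).
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(m * row[i] * row[j] for m, row in zip(mu, X)) for j in range(p)]
                for i in range(p)]
        xtwz = [sum(m * row[i] * zi for m, row, zi in zip(mu, X, z)) for i in range(p)]
        new_beta = solve(xtwx, xtwz)
        eta = [sum(x * b for x, b in zip(row, new_beta)) for row in X]
        mu = [math.exp(e) for e in eta]
        converged = max(abs(nb - b) for nb, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Toy data: intercept column plus one dummy (for example 1 = "other brands").
X = [[1, 0]] * 3 + [[1, 1]] * 3
y = [8, 10, 12, 18, 20, 22]
beta = fit_poisson_glm(X, y)
print([round(b, 4) for b in beta])  # [2.3026, 0.6931]
```

The group means are 10 and 20. The fitted intercept is therefore log(10), and the dummy coefficient is log(2), the log of the ratio between the groups. This is how the coefficients in Table 4.1 should be read.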

Initial grouping of data

The data for each explanatory variable was divided into several groups. The main intention behind the grouping was to better capture each variable's character and to test for differences between the groups in the overall evaluation. The factors taken into consideration when grouping the variables were:

• The number of observations in each group had to be large enough and similarly distributed so that the GLM estimation would be stable.

• The groups had to be as homogeneous as possible in sales performance, so that the performance did not vary too much within the groups.

Explanatory variables Group 1 Group 2 Group 3 Group 4 Unit

Sales price <100 100-300 >300 - SEK

Brand Premium Others - - -

Rating NA 1-2 3 4-5 -

How new <1 1-2 2-3 >3 year

Purchase country SE FI NO Others -

Purchase month 1-2, 9-10 3-5 6-8 11-12 month

Table 3.2: The initial grouping of data, where NA stands for not available.

Table 3.2 shows the initial grouping of data. The sales price was divided into three groups, with 100 and 300 SEK as boundaries, based on the price statistics of all products in general. Products priced under 100 SEK were relatively cheap, whereas products priced over 300 SEK were quite expensive in comparison to the others. For the brand variable, eleven AB had an existing list of all premium brands, so the remaining brands naturally belonged to the group of other brands. Since the ratings 1 and 2, as well as the ratings 4 and 5, tended to have a similar influence on sales performance, each pair was merged into one group. The majority of the products had no rating, which was resolved by creating an NA group. For the How new variable, the boundaries were chosen based on whether the products were newly launched, relatively new, relatively old or old. Since Sweden (SE), Finland (FI) and Norway (NO) are the main markets of eleven AB, they were kept as separate groups, while the remaining purchase countries were merged into a single group, 'Others'. To account for seasonal variation, the purchase month variable was divided into four groups, where the months 3-5, 6-8 and 11-12 corresponded to spring, summer and winter respectively. January, February, September and October were merged into the same group because these months seemed to be the least active of the whole year for the beauty e-commerce market.

Model evaluation

The Wald test was used to check whether the chosen groupings led to significant estimators β_i. The result is expressed as the significance level of each group of each explanatory variable.

Coefficients:

Estimate Std. Error z value Pr(>|z|)

Intercept 0.001098 0.113156 0.010 0.99226

salesprice100-300 0.628124 0.053344 11.775 < 2e-16 ***

salesprice≥300 0.027535 0.057623 0.478 0.63276

brandothers 0.305168 0.031202 9.780 < 2e-16 ***

rating1-2 0.185946 0.055286 3.363 0.00077 ***

rating4-5 -0.100352 0.033484 -2.997 0.00273 **

hownew1-2 0.035756 0.045394 0.788 0.43088

hownew2-3 -0.294372 0.049087 -5.997 2.01e-09 ***

hownew>3 0.051804 0.044463 1.165 0.24398

countryNO 0.163335 0.108044 1.512 0.13060

countrySE 2.481706 0.086608 28.654 < 2e-16 ***

countryOthers -0.150151 0.125921 -1.192 0.23309

month3-5 0.728554 0.043885 16.601 < 2e-16 ***

month6-8 0.813184 0.043909 18.520 < 2e-16 ***

month11-12 -0.232227 0.056488 -4.111 3.94e-05 ***

Table 3.3: The Wald test for product type A37, where the significance codes follow R's convention: '***' for p < 0.001, '**' for p < 0.01, '*' for p < 0.05, '.' for p < 0.1 and ' ' otherwise.

Table 3.3 shows the significance levels for product type A37 under the initial grouping of data. They indicate that the initial grouping was not good enough, since several groups of How new (1-2 and >3 years) and Purchase country (NO and Others) were not significant at all. In fact, the same problem with the grouping of How new and Purchase country occurred for over one fourth of the product types, which implied that new groupings were needed.
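The z value and p-value columns in Table 3.3 can be recomputed from the Estimate and Std. Error columns. The Wald statistic is z = β̂_i / SE(β̂_i), and the two-sided p-value is P(|Z| > |z|) for a standard normal Z. The sketch below is illustrative Python rather than the authors' R code. The function names are hypothetical. The significance-code thresholds are those of R's summary() output.

```python
import math


def wald_test(estimate, std_error):
    """Two-sided Wald test of H0: beta_i = 0, as reported by summary() for a glm."""
    z = estimate / std_error
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def signif_code(p):
    """R's significance codes: '***' < 0.001 < '**' < 0.01 < '*' < 0.05 < '.' < 0.1."""
    for threshold, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p < threshold:
            return code
    return " "


# Two rows from Table 3.3 (product type A37, initial grouping).
for name, est, se in [("salesprice>=300", 0.027535, 0.057623),
                      ("hownew2-3", -0.294372, 0.049087)]:
    z, p = wald_test(est, se)
    print(f"{name}: z = {z:.3f}, p = {p:.3g} {signif_code(p)}")
```

For these two rows, the sketch gives z ≈ 0.478 with p ≈ 0.633 and z ≈ −5.997 with p ≈ 2.01e-09, in line with the table.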

New grouping of data

Based on the Wald test obtained with the initial grouping of data, the explanatory variables How new and Purchase country were regrouped as follows:


Explanatory variables Group 1 Group 2 Group 3 Group 4 Unit

Sales price <100 100-300 >300 - SEK

Brand Premium Others - - -

Rating NA 1-2 3 4-5 -

How new ≤2 >2 - - year

Purchase country SE DKFINO Others - -

Purchase month 1-2, 9-10 3-5 6-8 11-12 month

Table 3.4: The new grouping of data, where NA stands for not available and DKFINO represents the country codes for Denmark, Finland and Norway.

New model evaluation

Coefficients:

Estimate Std. Error z value Pr(>|z|)

Intercept 2.80679 0.06616 42.422 < 2e-16 ***

salesprice100-300 1.02609 0.05186 19.787 < 2e-16 ***

salesprice≥300 0.33093 0.05678 5.828 5.60e-09 ***

brandothers 0.30863 0.03099 9.959 < 2e-16 ***

rating1-2 -0.24263 0.05346 -4.538 5.67e-06 ***

rating4-5 -0.20255 0.03322 -6.097 1.08e-09 ***

hownew>2 -0.09764 0.03186 -3.065 0.00218 **

countryOthers -2.89240 0.09970 -29.011 < 2e-16 ***

countryDKFINO -2.21855 0.05389 -41.170 < 2e-16 ***

month3-5 0.75875 0.04384 17.309 < 2e-16 ***

month6-8 0.81355 0.04384 18.558 < 2e-16 ***

month11-12 -0.25541 0.05644 -4.525 6.03e-06 ***

Table 3.5: The Wald test for product type A37 after the new grouping, where the significance codes follow R's convention: '***' for p < 0.001, '**' for p < 0.01, '*' for p < 0.05, '.' for p < 0.1 and ' ' otherwise.

Table 3.5 shows the significance levels for product type A37 based on the new grouping. According to Table 3.5, all of the explanatory variables have become significant, which means that the new grouping is better than the initial one. In fact, the overall significance of the variables How new and Purchase country improved notably, since only very few product types had low significance after the final grouping.

In R, the functions combinelevels(), difftime(), cut() and as.Date() are used to group the data. The functions aggregate(), assign() and paste() are used to manipulate and sort the data in preparation for the GLM fitting. Finally, the function summary() is used to produce summaries, including the Wald test, of the results obtained from the model fitting function glm().
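The final grouping in Table 3.4 can be expressed as a set of small classification functions. This Python sketch is not the authors' R code, which used cut(), difftime() and as.Date(). Several boundary choices are assumptions: a price of exactly 300 SEK falls in the 100-300 group, a year is 365.25 days, and ratings are integers from 1 to 5, or None when missing.

```python
from datetime import date


def price_group(price):
    """Sales price groups in SEK; exactly 300 is assumed to belong to 100-300."""
    if price < 100:
        return "<100"
    return "100-300" if price <= 300 else ">300"


def brand_group(brand, premium_brands):
    """Premium if the brand is on eleven AB's premium list, otherwise Others."""
    return "Premium" if brand in premium_brands else "Others"


def rating_group(rating):
    """Ratings are assumed to be integers 1-5, or None when the product has no rating."""
    if rating is None:
        return "NA"
    if rating <= 2:
        return "1-2"
    return "3" if rating == 3 else "4-5"


def how_new_group(purchase_date, launch_date):
    """Years between launch and purchase, with a year taken as 365.25 days (assumption)."""
    years = (purchase_date - launch_date).days / 365.25
    return "<=2" if years <= 2 else ">2"


def country_group(code):
    """SE on its own, Denmark/Finland/Norway merged into DKFINO, the rest into Others."""
    if code == "SE":
        return "SE"
    return "DKFINO" if code in {"DK", "FI", "NO"} else "Others"


def month_group(month):
    """Seasonal month groups; January, February, September and October are merged."""
    for months, label in (((3, 4, 5), "3-5"), ((6, 7, 8), "6-8"), ((11, 12), "11-12")):
        if month in months:
            return label
    return "1-2,9-10"


print(how_new_group(date(2016, 5, 1), date(2013, 9, 1)), month_group(9), country_group("FI"))
# >2 1-2,9-10 DKFINO
```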


3.2.3 Variable selection

A few product types had low or no significance for all groups of some specific explanatory variables. In these cases, the variable selection criteria AIC and BIC were used to decide whether to include or exclude the variables of low significance.

Product type AIC (without) AIC (full model) BIC (without) BIC (full model)

A26 207.6061 210.66 221.51 229.78

A35 579.86 555.204 600.72 585.332

Table 3.6: AIC and BIC for product types A26 and A35, comparing the full model with a reduced model without the explanatory variable Purchase month.

Product type AIC (without) AIC (full model) BIC (without) BIC (full model)

A26 207.58 210.66 219.75 229.78

A35 567.71 555.204 588.57 585.332

Table 3.7: AIC and BIC for product types A26 and A35, comparing the full model with a reduced model without the explanatory variable Rating.

Table 3.6 shows that product type A26 had a better model, i.e. lower AIC and BIC, when the variable Purchase month was excluded. This result can be interpreted as the month or season not mattering for the expected sales performance. In contrast, the variable should be kept for product type A35.

Table 3.7 shows that product type A26 had a better model when the variable Rating was excluded. This can be interpreted as rating not mattering for the expected sales performance. Alternatively, the available data may not have been sufficient to model a significant rating effect, since, as mentioned earlier, the majority of the products had no rating data. On the other hand, product type A35 had a better model when the variable was included.

When a product type had a better reduced model without the explanatory variable Purchase month, possibly among others, a decision had to be made whether to keep the product type for the continued study. A reduced model could be the result of insufficient data, which leads to insignificant variables that can affect the final prediction model negatively. In particular, if the Purchase month variable was removed, a prediction model for each month could not be obtained. Therefore, product types with this problem had to be removed in order to secure the significance of the final models.

In R, the functions BIC() and AIC() are used to compare models containing different numbers of explanatory variables.
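The comparisons in Tables 3.6 and 3.7 follow the usual rule that the model with the lower criterion value is preferred. The criteria are AIC = 2k − 2 log L and BIC = k log n − 2 log L, where k is the number of estimated parameters and n the number of observations. This sketch is not the authors' R code. It assumes that the log-likelihood log L of each fitted model is already available; in R, logLik() returns it for a glm object.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * log_likelihood


def prefer_reduced(criterion_reduced, criterion_full):
    """Lower AIC/BIC is better, so keep the reduced model only if it scores lower."""
    return criterion_reduced < criterion_full


# Table 3.6 (without Purchase month): (reduced AIC, full AIC, reduced BIC, full BIC)
table_3_6 = {"A26": (207.6061, 210.66, 221.51, 229.78),
             "A35": (579.86, 555.204, 600.72, 585.332)}
for ptype, (aic_r, aic_f, bic_r, bic_f) in table_3_6.items():
    print(ptype, prefer_reduced(aic_r, aic_f), prefer_reduced(bic_r, bic_f))
```

Both criteria prefer the reduced model for A26 and the full model for A35, which matches the reading of Table 3.6 given above.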


3.2.4 Residual analysis

Figure 3.1: Residuals vs. Fitted values plot

Figure 3.2: Scale-Location plot

Figure 3.1 shows the relationship between the residuals and the predicted values for the model of product type B3. The residuals are not equally spread around the horizontal 0-line; instead, the data points deviate more and more from the dashed line as the predicted values become larger. This indicates that the variance is not constant and that the relationship between the response variable and the regressors is non-linear. Thus, a model that does not assume normally distributed errors seems appropriate.

Figure 3.2 shows the relationship between the square root of the absolute standardized deviance residuals and the predicted values. The figure carries information similar to Figure 3.1; the only difference between the plots is the quantity shown on the y-axis. In Figure 3.2, the red line is fairly linear, which implies a proportional relationship between the standardized deviance residuals and the predicted values. This supports the assumption of a Poisson distribution in the GLM, since the variance of a Poisson distribution equals its mean.

Figures 3.1 and 3.2 show some observations, e.g. 2, 22, 204 and 258, that stand out from the cluster and can be considered potential outliers. However, they do not necessarily indicate a problem: after these potential outliers were deleted, the new plots showed no large differences from the old ones. These observations were therefore retained, and no further action was needed.

The same plots for all other prediction models showed similar tendencies, which suggests that the Poisson assumption is reasonable.

In R, the method plot() is used to create the residual plots.
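The deviance residuals behind Figures 3.1 and 3.2 can be sketched as follows for a Poisson model. This is an illustration, not the authors' code, and the counts and fitted means are hypothetical. R's Scale-Location plot additionally divides each residual by sqrt(1 − h_ii), where h_ii is the leverage of observation i; that step is omitted here.

```python
import math


def poisson_deviance_residual(y, mu):
    """sign(y - mu) * sqrt(2 * (y*log(y/mu) - (y - mu))), with y*log(y/mu) = 0 when y = 0."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    d = 2.0 * (term - (y - mu))
    # max() guards against tiny negative values caused by floating-point rounding.
    return math.copysign(math.sqrt(max(d, 0.0)), y - mu)


def scale_location_value(residual):
    """y-axis quantity of a Scale-Location plot: sqrt(|residual|)."""
    return math.sqrt(abs(residual))


observed = [0, 4, 10, 30]          # hypothetical counts
fitted = [2.0, 4.0, 5.0, 20.0]     # hypothetical fitted means from a Poisson GLM
residuals = [poisson_deviance_residual(y, m) for y, m in zip(observed, fitted)]
print([round(r, 3) for r in residuals])
print([round(scale_location_value(r), 3) for r in residuals])
```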

3.2.5 Multicollinearity

Based on the theory of multicollinearity, there are several ways of detecting near-linear dependencies among the regressors, of which the condition number was applied in this analysis. The condition number of the correlation matrix for product type B3 turned out to be 4.659913, which implied that there was no serious problem with multicollinearity. In fact, the condition number for every prediction model was less than 100, the level below which multicollinearity is generally not considered a serious problem. Therefore, it was unnecessary to deal with multicollinearity.

In R, the functions model.matrix(), nrow(), ncol() and cor() are used to create the correlation matrix. The functions eigen(), max() and min() are used to calculate the condition number, i.e. the ratio of the largest to the smallest eigenvalue.
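The condition number used here is κ = λ_max / λ_min, the ratio of the largest to the smallest eigenvalue of the correlation matrix of the regressors. The sketch below mirrors the R steps (cor(), then eigen(), max() and min()) in plain Python. A cyclic Jacobi eigenvalue routine takes the place of eigen(). The sketch assumes that the intercept column of the model matrix is excluded, because a constant column has no defined correlation. The dummy columns are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of regressor columns (intercept column excluded)."""
    def centred(col):
        mean = sum(col) / len(col)
        return [v - mean for v in col]
    cs = [centred(c) for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cs]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj) for cj, nj in zip(cs, norms)]
            for ci, ni in zip(cs, norms)]


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):      # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):      # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def condition_number(corr):
    eigenvalues = jacobi_eigenvalues(corr)
    return max(eigenvalues) / min(eigenvalues)


# Hypothetical 0/1 dummy columns for three regressors.
x1 = [0, 1, 0, 1, 1, 0, 1, 0]
x2 = [1, 1, 0, 0, 1, 0, 0, 1]
x3 = [1, 1, 0, 0, 1, 0, 0, 0]
kappa = condition_number(correlation_matrix([x1, x2, x3]))
print(round(kappa, 3), "serious multicollinearity" if kappa > 100 else "no serious problem")
```

Here x3 is strongly correlated with x2 (r ≈ 0.77), yet κ stays near 10, well below the threshold of 100 used above.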

3.2.6 Recommendations

The significant factors that had the most positive influence on the predicted sales performance were saved as a 'Recommendation' for each product type. The recommendations can be used to guide the user on exactly which characteristics a product of that type should have in order to optimise the predicted sales amount.


The recommendations were obtained by analysing the coefficients of each product type. Normally, the group with the largest coefficient within each variable was chosen as the optimal group of that explanatory variable. If the percentage difference in predicted sales amount between two groups was lower than X%, both groups were considered optimal. In this study, X was set to 50. This requirement can of course be changed based on the algorithm user's preferences and needs.

In R, the recommendation for each product type is stored as a new variable, adding a column to the data frame consisting of the product types.
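The X% rule can be sketched as follows. Because the model is log-linear, exp(β_i − β_max) is the ratio between the predicted sales amount of group i and that of the group with the largest coefficient, all other variables held fixed. Comparing one minus this ratio with X therefore gives the percentage difference described above. The sketch assumes that the difference is measured relative to the larger prediction; it is not the authors' R code. It uses the A1 coefficients from Table 4.1 for illustration only, and the resulting groups are not taken from the thesis.

```python
import math


def optimal_groups(coefficients, x_percent=50.0):
    """Return the optimal groups of one explanatory variable.

    coefficients: group -> estimated coefficient (0.0 for the reference group,
    None when it could not be estimated, like the NA entries in Table 4.1).
    A group is optimal if its predicted sales are less than x_percent lower than
    those of the group with the largest coefficient (assumed reading of the X% rule).
    """
    estimated = {g: b for g, b in coefficients.items() if b is not None}
    best = max(estimated.values())
    optimal = []
    for group, beta in estimated.items():
        # Ratio of predicted sales between this group and the best: exp(beta - best) <= 1.
        shortfall = 1.0 - math.exp(beta - best)
        if 100.0 * shortfall < x_percent:
            optimal.append(group)
    return optimal


# Coefficients of product type A1 from Table 4.1.
a1 = {
    "Sales price": {"<100": 0.0, "100-300": 0.4164126, ">300": -1.6926008},
    "Brand": {"Premium": 0.0, "Others": 1.2352715},
    "Rating": {"NA": 0.0, "1-2": None, "3": None, "4-5": 0.6278300},
    "How new": {"<=2": 0.0, ">2": 1.0867559},
}
for variable, coefs in a1.items():
    print(variable, optimal_groups(coefs))
```

For the sales price of A1, for example, the <100 group is predicted to sell about 34% less than the 100-300 group. It is therefore kept alongside it when X = 50.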

3.2.7 Sales performance predictions

Based on the recommendations, sales performance predictions were calculated for Sweden, the reference country group, whose coefficient is 0. When two or more groups were optimal within the same explanatory variable, the average of their coefficient values was used. By rounding the predicted values to integers, a predicted sales amount for each season was obtained for each product type. Dividing by the number of months in each season gave an average amount per month. Afterwards, one could clearly see which product types within the product categories A, B and C had the highest predicted sales amount during each month.

In R, basic mathematical operations and the method round() are used.
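The prediction step can be sketched as follows. The sketch assumes that the GLM is fitted to seasonal sales counts, so that exp(η) is a seasonal amount. The function, its arguments and the coefficients in the example are hypothetical. Rounding the monthly average a second time is also an assumption, since the text only states that the seasonal prediction is rounded.

```python
import math


def predicted_monthly_sales(intercept, optimal_coefficients, season_coefficient, months_in_season):
    """Predicted average monthly sales for Sweden (country coefficient 0).

    optimal_coefficients: one list per explanatory variable with the coefficients of
    its optimal groups; several optimal groups are averaged, as described above.
    """
    eta = intercept + season_coefficient
    eta += sum(sum(group) / len(group) for group in optimal_coefficients)
    season_total = round(math.exp(eta))              # rounded predicted sales for the season
    return round(season_total / months_in_season)    # assumed: monthly average rounded again


# Hypothetical coefficients: two optimal price groups (averaged) and one brand group.
monthly = predicted_monthly_sales(
    intercept=math.log(100.0),
    optimal_coefficients=[[math.log(2.0), 0.0], [0.0]],
    season_coefficient=0.0,
    months_in_season=3,
)
print(monthly)  # 47
```

Averaging log(2) and 0 gives a factor of √2, so the season total is round(100·√2) = 141 and the monthly average is 141 / 3 = 47.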

3.3 Part II: Conditional probability matrices

3.3.1 Generating product combinations

Based on the prediction models for each product type obtained in Part I, a total of n top-selling product types from the product categories A, B and C were chosen for each season. In other words, the a best-selling product types from category A, the b best-selling from B and the c best-selling from C were chosen, giving a + b + c = n product types. The values of a, b and c can be chosen based on how high the user wants to set the 'most-selling percentile'. In this case, an upper percentile of fifty percent was chosen, that is, the upper half of the product types within each category, ranked by predicted sales amount, was chosen.

The chosen product types were then used to create combinations without repetition, consisting of two and three product types respectively. The number of unique two-product-type combinations for each season was $\binom{n}{2}$, and the number of unique three-product-type combinations for each season was $\binom{n}{3}$.

The combinations were both inter- and intra-product-category combinations. However, each combination only contained different product types, based on the assumption that the probability of a customer buying a two- or three-product combination consisting of the same product type was low. Therefore, such combinations were not of interest in this study, as mentioned in the introduction.

In R, the function combn() is used to generate the combinations. The n top-selling product types are saved in an array with dimensions 1×n. The combinations are saved as matrices with dimensions 2×$\binom{n}{2}$ and 3×$\binom{n}{3}$, where the columns are the different combinations and the rows are the product types included in the combinations.
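In Python, itertools.combinations plays the role of R's combn(). The sketch below illustrates the selection and generation step under stated assumptions; it is not the authors' code. The per-category predictions are hypothetical. When 0.5 times the category size is not an integer, the number of product types kept is rounded up.

```python
import math
from itertools import combinations


def top_share(predictions, share=0.5):
    """Product types in the upper `share` of one category, ranked by predicted sales.

    The number kept is rounded up (assumption; the text only states 0.5 * count).
    """
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    return ranked[:math.ceil(share * len(ranked))]


def seasonal_combinations(predictions_by_category, share=0.5):
    """Choose a + b + c = n top product types and list all 2- and 3-type combinations."""
    chosen = []
    for predictions in predictions_by_category.values():
        chosen.extend(top_share(predictions, share))
    pairs = list(combinations(chosen, 2))    # C(n, 2) combinations without repetition
    triples = list(combinations(chosen, 3))  # C(n, 3) combinations without repetition
    return chosen, pairs, triples


# Hypothetical predicted monthly sales for one season.
season = {
    "A": {"A1": 40, "A2": 310, "A3": 220, "A4": 15},
    "B": {"B1": 90, "B2": 120, "B3": 300, "B4": 60},
    "C": {"C1": 500, "C2": 80, "C3": 75, "C4": 12},
}
chosen, pairs, triples = seasonal_combinations(season)
print(chosen)                    # ['A2', 'A3', 'B3', 'B2', 'C1', 'C2']
print(len(pairs), len(triples))  # 15 20
```

Because the chosen types from all categories are pooled before combining, both inter- and intra-category combinations appear, for example ('A2', 'A3') and ('A2', 'B3').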

3.3.2 Generating conditional probability matrices

To create conditional probability matrices for the unique two- and three-product-type combinations obtained in Section 3.3.1 for each season, the original historical sales data was analysed once again. The main purpose of using the historical sales data was to validate the unique combinations obtained from the prediction models in Part I. That is, it was checked whether those combinations had actually occurred historically and whether there were customers who preferred those kinds of product combinations.

From the historical sales data, the number of occurrences of each combination in the combination lists of Section 3.3.1 and the quantity of each product type were calculated. This was done by checking and counting which products were purchased by the same customer within the same order ID. It is worth noting that customers could purchase the same products several times, which could be distinguished through different order IDs. Customers can also buy the same products in the same order, but such cases were not considered, since combinations consisting of the same products were out of scope.

Thus, the probability of customers purchasing a specific product in a specific month, for example product A1, or purchasing products A1 and B1, respectively A1, B1 and C1, was calculated as follows:

P(A1) = (sales amount of product A1) / (total number of order IDs)

P(A1 ∩ B1) = (sales amount of the product combination of A1 and B1) / (total number of order IDs)

P(A1 ∩ B1 ∩ C1) = (sales amount of the product combination of A1, B1 and C1) / (total number of order IDs)

Note that the sales amounts were counted per season, following the month groups. The average per month was then calculated based on the number of months in each group.
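The counting can be sketched as follows. Each order ID contributes at most once to each product type and to each combination it contains, and dividing by the number of order IDs gives the probabilities above. This is a hypothetical Python sketch with made-up orders, not the authors' R code. It counts order IDs rather than purchased quantities, which is one possible reading of 'sales amount', and it leaves out the seasonal split and the monthly averaging.

```python
from collections import Counter
from itertools import combinations


def combination_counts(orders):
    """Count, per order ID, every single product type and every 2- and 3-type combination.

    orders: order ID -> list of purchased product types. Repeats of the same type within
    one order are counted once, since same-type combinations are out of scope.
    """
    counts = Counter()
    for product_types in orders.values():
        distinct = sorted(set(product_types))
        for size in (1, 2, 3):
            for combo in combinations(distinct, size):
                counts[frozenset(combo)] += 1
    return counts


def probability(counts, n_orders, *product_types):
    """P(A1), P(A1 ∩ B1), ...: share of order IDs containing all the given types."""
    return counts[frozenset(product_types)] / n_orders


# Hypothetical orders for one season.
orders = {"o1": ["A1", "B1", "C1"], "o2": ["A1", "B1"], "o3": ["A1", "A1"], "o4": ["B1"]}
counts = combination_counts(orders)
print(probability(counts, len(orders), "A1"), probability(counts, len(orders), "A1", "B1"))
# 0.75 0.5
```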

Combining the equations above with the theory of conditional probability, the individual conditional probabilities of two- and three-product-type combinations can be written as:


$$P(B1 \mid A1) = \frac{P(A1 \cap B1)}{P(A1)}$$

$$P(C1 \mid A1 \cap B1) = \frac{P(A1 \cap B1 \cap C1)}{P(A1 \cap B1)}$$

where P(B1|A1) is the conditional probability of purchasing product B1 given that the customer has already purchased product A1, and P(C1|A1 ∩ B1) is the conditional probability of purchasing product C1 given that the product combination of A1 and B1 has already been purchased.

When calculating the conditional probability of a three-product combination, several aspects had to be considered. First, there had to be enough data for the result to be mathematically reliable. Therefore, a requirement Y of at least 15 counts of the combination A1 ∩ B1 ∩ C1 was set. Second, the probability of products A1 and B1 being purchased together had to be high enough for the probability of C1 being purchased to be worth computing. Therefore, both P(B1|A1) and P(A1|B1) had to exceed a requirement Z, set to 0.3, before the next calculation could take place.

In R, the function array() is used to create matrices with products or product combinations as column and row names. Using several for-loops and basic arithmetic operations, every cell of a matrix is filled with the corresponding conditional probability.
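The conditional probabilities and the requirements Y and Z can be sketched as follows. Nested dictionaries stand in for the R matrices built with array(). This is a hypothetical sketch, not the authors' code. The counts are made up, 'minimum 15 counts' is read as at least 15, and 'over' 0.3 is read as strictly greater than 0.3.

```python
from collections import Counter


def conditional(counts, target, given):
    """P(target | given) = count(given ∪ {target}) / count(given); 0 if given never occurs."""
    base = counts[frozenset(given)]
    return counts[frozenset(given) | {target}] / base if base else 0.0


def three_product_probability(counts, a, b, c, y_min=15, z_min=0.3):
    """P(c | a ∩ b), or None if requirement Y or Z is not met."""
    if counts[frozenset({a, b, c})] < y_min:       # requirement Y: enough data
        return None
    if conditional(counts, b, {a}) <= z_min or conditional(counts, a, {b}) <= z_min:
        return None                                # requirement Z on P(B1|A1) and P(A1|B1)
    return conditional(counts, c, {a, b})


def pair_matrix(counts, row_types, column_types):
    """Two-product matrix as nested dicts: matrix[row][column] = P(row | column)."""
    return {r: {col: conditional(counts, r, {col}) for col in column_types if col != r}
            for r in row_types}


# Hypothetical counts of order IDs per product type and combination.
counts = Counter({
    frozenset({"A1"}): 40, frozenset({"B1"}): 30, frozenset({"C1"}): 25,
    frozenset({"A1", "B1"}): 30, frozenset({"A1", "C1"}): 20, frozenset({"B1", "C1"}): 20,
    frozenset({"A1", "B1", "C1"}): 20,
})
print(three_product_probability(counts, "A1", "B1", "C1"))
print(pair_matrix(counts, ["B1", "C1"], ["A1"]))
```

With these counts, P(B1|A1) = 0.75 and P(A1|B1) = 1.0 both exceed Z. The triple occurs 20 ≥ 15 times, so P(C1|A1 ∩ B1) = 20/30 ≈ 0.67 is returned.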

3.4 Algorithm formulation

Based on the initial data, the algorithm creates sales prediction models for the desired product types with the help of GLM with a Poisson distribution. The sales-enhancing variable groups are classified as recommendations for each product type. Sales amounts are predicted for each month and product type. Combinations of two and three product types are generated from the best-selling product types of each season, and the conditional probability matrices are then generated. Based on the user's probability requirements, the algorithm suggests combinations including the input product.

The requirement variables needed by the algorithm are explained and summarised below.


Requirement variable X (Section 3.2.6)

When to use: When deciding recommendations based on regression coefficient values. If the second largest coefficient within a variable has a value close to the largest, both groups should be included.

When setting a requirement: Check whether the percentage difference between the predicted sales amounts of two groups within a variable is less than X%.

Value used in this study: 50

Requirement variables a, b, c (Section 3.3.1)

When to use: When deciding how many top-selling product types from each product category should be included for each month when generating combinations.

When setting a requirement: A 'most-selling percentile' should be selected to choose a specific number of product types among the highest ranked in predicted sales amount for each category.

Value used in this study: a = 0.5 × number of product types in category A; b = 0.5 × number of product types in category B; c = 0.5 × number of product types in category C.

Requirement variable Y (Section 3.3.2)

When to use: When deciding whether there is enough data on P(A1 ∩ B1 ∩ C1) to calculate the conditional probability for three products.

When setting a requirement: The three-product combination must have occurred at least Y times for the result to be mathematically reliable.

Value used in this study: 15

Requirement variable Z (Section 3.3.2)

When to use: When deciding whether P(B1|A1) and P(A1|B1) are high enough to calculate the conditional probability for three products.

When setting a requirement: Both must exceed a certain level to ensure that the calculated probability of C1 being purchased is reliable.

Value used in this study: 0.3

Table 3.8: The different requirement variables needed for the algorithm.


Chapter 4

Results

4.1 Part I

4.1.1 Sales performance prediction models

The final model can be expressed as:

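$$\log(y) = \beta_0 + \sum_{i=1}^{12} x_i \beta_i,$$

where y is the predicted sales amount and each x_i is an indicator equal to 1 if the observation belongs to the group associated with β_i and 0 otherwise.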

To make the formula more reader-friendly, the final model can be written as:

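$$
\begin{aligned}
\log(\text{predicted sales amount}) = \beta_0
&+ \begin{cases} 0, & \text{if sales price} < 100 \\ \beta_1, & \text{if sales price 100-300} \\ \beta_2, & \text{if sales price} > 300 \end{cases}
+ \begin{cases} 0, & \text{if premium brand} \\ \beta_3, & \text{if other brands} \end{cases} \\
&+ \begin{cases} 0, & \text{if rating NA} \\ \beta_4, & \text{if rating 1-2} \\ \beta_5, & \text{if rating 3} \\ \beta_6, & \text{if rating 4-5} \end{cases}
+ \begin{cases} 0, & \text{if new} \\ \beta_7, & \text{if old} \end{cases} \\
&+ \begin{cases} 0, & \text{if country SE} \\ \beta_8, & \text{if country DKFINO} \\ \beta_9, & \text{if country others} \end{cases}
+ \begin{cases} 0, & \text{if month 1-2, 9-10} \\ \beta_{10}, & \text{if month 3-5} \\ \beta_{11}, & \text{if month 6-8} \\ \beta_{12}, & \text{if month 11-12} \end{cases}
\end{aligned}
$$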

Table 4.1 shows the coefficients and reference groups of product types A1, A3, B3, B6, C6 and C7. In total, 89 product types were modelled in the same way.


A1 A3 B3 B6 C6 C7

Intercept β0 1.7468035 5.34711268 4.841939 6.076943 3.43216 7.452706

Sales price <100 ref 0 0 0 0 0 0

Sales price 100-300 β1 0.4164126 0.48178607 1.222029 -0.114579 0.84161 -0.740294

Sales price >300 β2 -1.6926008 -0.07297118 -3.375762 -2.945554 -3.61049 -3.827561

Premium Brand ref 0 0 0 0 0 0

Other brands β3 1.2352715 0.14443347 1.040213 -0.192243 0.68234 1.325882

Rating NA ref 0 0 0 0 0 0

Rating 1-2 β4 NA -0.73942075 -2.119507 -1.636227 -3.21259 -3.725096

Rating 3 β5 NA -1.21466017 -0.160187 -0.911495 -1.50872 -3.764935

Rating 4-5 β6 0.6278300 0.62030802 0.140401 1.142131 1.04570 -2.119104

How new ≤ 2 ref 0 0 0 0 0 0

How new >2 β7 1.0867559 0.21598123 0.670811 0.185135 0.92079 -0.592148

Country SE ref 0 0 0 0 0 0

Country DKFINO β8 -2.0689644 -1.86755191 -2.462620 -1.926567 -2.30330 -1.829067

Country others β9 -3.0197942 -3.81171389 -4.599171 -4.813035 -4.55466 -4.245881

Month 1-2,9-10 ref 0 0 0 0 0 0

Month 3-5 β10 0.4685151 -0.42065135 -0.272584 -0.227164 -0.30863 -0.249008

Month 6-8 β11 0.9445195 -0.49300602 -0.338957 -0.346635 -0.30497 -0.216541

Month 11-12 β12 -0.5008866 -0.36292340 -0.402817 -0.431793 -0.70143 -0.441308

Table 4.1: The coefficients and reference groups of product types A1, A3, B3, B6, C6 and C7, where NA stands for not available.


4.1.2 Recommendations for each product type

Product type Recommendation

A23 Price between 100-300/Does not depend on brand/Rating between 4-5/Old

A6 Price between 100-300/Premium brands/Rating between 4-5/Old

A31 Price lower than 100/Other brands/No rating/New

A26 Price between 100-300/Does not depend on brand/No rating/Old

A37 Price lower than 100/Other brands/Rating between 4-5/Old

Table 4.2: Showing the recommendations of five most selling product types in category A.

Table 4.2 shows the recommendations for the five best-selling product types of category A. For instance, if A23 is included in a combination, the user should choose a product that costs between 100 and 300 SEK, has a rating between 4 and 5 and was launched more than 2 years ago, while the brand of the product does not matter. Whenever a combination contains a certain product type, its recommendation should be followed when choosing the specific products.

4.1.3 Selection of product types

With the help of the created models and the recommendations, a sales prediction was calculated for each product type and month.

Months 1-2, 9-10 Months 3-5 Months 6-8 Months 11-12

A1 42 88 142 50

A3 227 198 185 315

A6 458 384 411 559

Table 4.3: Showing the predicted average sales amount per month for product types A1, A3 and A6.

Similarly, predicted sales amounts were calculated for all product types of categories A, B and C. Based on the predictions, the best-selling product types for each season were chosen for the further study of combinations. Table 4.4 shows the five product types with the highest predicted sales amount for each season.


Months 1-2, 9-10 Months 3-5 Months 6-8 Months 11-12

Top 1 C7 C7 C7 C7

Top 2 B19 C9 C9 C9

Top 3 C9 B19 B19 B19

Top 4 B22 B22 B22 B22

Top 5 B3 B3 B30 B29

Table 4.4: The top five best-selling product types for specific months.

4.2 Part II

4.2.1 Product combinations

Combination 1 Combination 2 Combination 3 Combination 4 Combination 5

Product type A5 A8 B3 C7 B6

Product type A3 B19 C14 B9 B13

Table 4.5: An extract of unique combinations of two products for months 1-2, 9-10.

Table 4.5 shows an extract of the unique two-product-type combinations in matrix form. For example, the two-product combination of 'foundation' and 'mascara' is recommended during January to February and September to October. The combination of 'gel nail polish' and 'mini nail polish' is recommended during spring (March to May), and the combination of 'foundation' and 'eyeliner' during summer (June to August). Finally, the combination of 'nail file' and 'gel nail polish' is recommended during winter (November to December).

Combination 1 Combination 2 Combination 3 Combination 4 Combination 5

Product type A20 A31 A5 A31 A21

Product type B16 A9 A41 C6 B19

Product type A23 A6 B9 B11 B11

Table 4.6: An extract of unique combinations of three products for months 1-2, 9-10.

Table 4.6 shows an extract of the unique three-product-type combinations in matrix form. For example, the three-product combination consisting of 'rouge', 'contouring' and 'highlighter' is recommended during January to February and September to October. The combination of 'face mask', 'skin problem (body)' and 'skin problem (face)' is recommended during spring (March to May), and the combination of 'sports, health & travel', 'bronzer' and 'sunless tanning' during summer (June to August). Finally, the combination of 'eye shadow', 'bronzer' and 'highlighter' is recommended during winter (November to December).

Observe that the examples above are not based on Table 4.5 and Table 4.6.

4.2.2 Conditional probability matrices

A1 A1 ∩B1

B1 P(B1|A1) -

C1 P(C1|A1) P(C1|A1 ∩B1)

Table 4.7: The structure of the conditional probability matrix.

Table 4.7 shows the structure of the conditional probability matrix, where each cell denotes the conditional probability of the product in the row given the product or product combination in the column. In total, 8 different conditional probability matrices were obtained.

A5 A31 A21 A15 A10

A17 0.0198660714 0.0195089873 0.024049651 0.2402221970 0.0113759480

A29 0.0696428571 0.0427444103 0.038789760 0.0192721914 0.0755687974

A3 0.2292410714 0.1030249890 0.050426687 0.0252805804 0.0731310943

A36 0.0290178571 0.0241122315 0.036462374 0.2647092166 0.0213976165

A22 0.0046875000 0.0032880316 0.003878976 0.0015871216 0.0487540628

Table 4.8: An extract of the conditional probability matrix of two-product combinations for months 1-2, 9-10.

Table 4.8 shows an extract of the conditional probability matrix of two-product combinations. For instance, the conditional probability of purchasing 'Bronzer' given that a customer has already purchased 'After sun' is 0.056291391 for June to August.


A5 ∩ A31 A5 ∩ A21 A31 ∩ A8 A31 ∩ A40 A21 ∩ A32

B30 0.020100503 0.000 0.005882353 0.00000000 0.07692308

B12 0.040201005 0.025 0.111764706 0.05405405 0.12820513

B29 0.030150754 0.000 0.035294118 0.00000000 0.00000000

B25 0.105527638 0.100 0.088235294 0.08108108 0.20512821

B8 0.035175879 0.000 0.023529412 0.02702703 0.07692308

Table 4.9: An extract of the conditional probability matrix of three-product combinations for months 1-2, 9-10.

Table 4.9 shows an extract of the conditional probability matrix of three-product combinations. For instance, the conditional probability of purchasing 'face mask' given that a customer has already purchased 'Quick fix' and 'Anti-age' is 0.054662379 for November to December.

Observe that the examples above are not based on Table 4.8 and Table 4.9.

4.3 Algorithm

When a barcode and the desired number of products in the package (2 or 3) are given as input, the algorithm checks whether the predicted sales amount of the corresponding product type is high enough for the product to be included in combinations. If not, an answer stating that there are no optimal combinations with this product is returned to the user. If the product type has a high sales prediction, the algorithm searches for the different combinations that include this product type. Of all possible combinations, those that fulfil the requirement on the conditional probability that the other products will be purchased, given that the input product is purchased, are presented to the user. The algorithm returns a list of product type combinations, with their respective products, that fulfil the recommendations. For each product type combination, the conditional probability of it being purchased and an estimate of the average total value of the combination are given.
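The steps above can be sketched in Python as follows. This is a hypothetical illustration, not the authors’ implementation. The following are all assumptions:

- the dictionaries type_of_barcode, predicted_sales, avg_price, p2 and p3;
- the thresholds min_sales and min_prob;
- the rule that partner product types must also pass the sales check.

For three-product packages, the sketch combines one entry from each kind of matrix in Section 4.2.2 using the chain rule P(B ∩ C | A) = P(B | A) · P(C | A ∩ B). The thesis does not state exactly which probability it reports, so this is also an assumption. Mapping product types back to individual products (barcodes) is omitted.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Recommendation:
    product_types: tuple  # input product type first, then the added types
    probability: float    # estimated P(added types bought | input type bought)
    average_value: float  # sum of the average prices of the included types


def combination_probability(input_type, others, p2, p3):
    """P(all types in others are bought | input_type is bought).

    p2[(row, given)] = P(row | given)
    p3[(row, (g1, g2))] = P(row | g1 and g2), with (g1, g2) sorted.
    For two added types the chain rule is used:
    P(B and C | A) = P(B | A) * P(C | A and B).
    """
    prob = p2.get((others[0], input_type), 0.0)
    if len(others) == 2:
        given = tuple(sorted((input_type, others[0])))
        prob *= p3.get((others[1], given), 0.0)
    return prob


def recommend(barcode, k, type_of_barcode, predicted_sales, avg_price,
              p2, p3, min_sales, min_prob):
    """Return qualifying k-type combinations containing the input product's type."""
    if k not in (2, 3):
        raise ValueError("a package must contain 2 or 3 products")
    input_type = type_of_barcode[barcode]
    # Step 1: reject inputs whose product type is not predicted to sell well.
    if predicted_sales.get(input_type, 0.0) < min_sales:
        return []  # "no optimal combinations with this product"
    # Step 2: candidate partner types (assumed to need the same sales level).
    partners = sorted(t for t, s in predicted_sales.items()
                      if t != input_type and s >= min_sales)
    results = []
    # Step 3: enumerate every combination that contains the input type.
    for others in combinations(partners, k - 1):
        prob = combination_probability(input_type, others, p2, p3)
        # Step 4: keep combinations that meet the probability requirement.
        if prob >= min_prob:
            combo = (input_type,) + others
            value = sum(avg_price[t] for t in combo)
            results.append(Recommendation(combo, prob, value))
    return sorted(results, key=lambda r: r.probability, reverse=True)


# Toy inputs: the p2 values are taken from column A5 of Table 4.8; the barcode,
# predicted sales and average prices are invented for illustration.
type_of_barcode = {"0001": "A5", "0002": "A22"}
predicted_sales = {"A5": 120.0, "A17": 80.0, "A3": 95.0, "A22": 5.0}
avg_price = {"A5": 150.0, "A17": 99.0, "A3": 199.0, "A22": 50.0}
p2 = {("A17", "A5"): 0.0198660714, ("A3", "A5"): 0.2292410714,
      ("A22", "A5"): 0.0046875}
recs = recommend("0001", 2, type_of_barcode, predicted_sales, avg_price,
                 p2, {}, min_sales=50.0, min_prob=0.05)
for r in recs:
    print(r.product_types, r.probability, r.average_value)
```

Sorting by probability is one possible presentation order. Sorting by average_value instead would favour larger orders, which is the trade-off discussed in Section 5.4.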


4.4 SWOT chart

Strengths:

• Different shipping alternatives
• Free standard shipping
• Personalized service: engravings
• Active campaigns
• New and trendy products launch fast
• Student discount
• Sponsoring of fashion/beauty events
• Beauty blog/magazine
• Active on social media (Facebook, YouTube, Instagram, Twitter and Spotify)

Weaknesses:

• Few premium brands
• Personalized service available only for specific brands
• Different shipping alternatives not so well adapted among customers
• Not known to be specialists in a certain area of products

Opportunities:

• More sponsoring of public events
• More focus on the affiliate program
• Campaigns related to the physical stores
• Widen the shipping alternatives and adapt the existing ones more to the customers
• Annual campaigns for different holidays and special occasions
• Campaigns related to viral trends or beauty phenomena (e.g. organic cosmetics and South Korean beauty products)

Threats:

• Not having a better competitive advantage over the competitors
• Lower barriers to market entry

Table 4.10: The SWOT chart for eleven AB.

Table 4.10 presents a SWOT chart for eleven AB. Note that the SWOT analysis is based on the authors’ own ideas and interpretation of the organization; it is not based on factual statistics or surveys.


Chapter 5

Discussion

5.1 Interpretation of the results

In total, 89 prediction models were developed for the different product types. The tests of overall significance of the variable groups and the residual analysis gave good results. Furthermore, multicollinearity was low. These promising results are probably a consequence of the large number of observations. The recommendations were specific to each product type. A brief interpretation of each variable and its recommendations is given below.

Sales price: The three different groups of sales prices fitted the product types of categories A, B and C well. The price group 100-300 tended to sell best for categories A and B. This can be explained by most products in these categories having sales prices within this range; furthermore, products in these categories within this price range most likely match customers’ expectations. For category C, prices lower than 100 sold best, for the same reasons applied to category C.

Brand: The two different groups of brands fitted the product types of categories A, B and C well. The other (non-premium) brands sold best in all categories. A likely reason is that the number of premium brands is low compared to the number of other brands.

Rating: The four different groups of ratings fitted the product types of categories A, B and C well. High ratings (4-5) had a very positive influence on the sales amount, while low ratings (1-2) had a negative impact, more negative than No rating. This can be explained by low ratings actively giving customers a negative impression of a product, while products with no ratings leave a neutral impression. The same trend was observed for all categories.

How new: The two different groups of how new the product is fitted the product types of categories A, B and C well. How new a product should be seemed to differ between the product categories. For categories B and C, newer products had a more positive influence on the sales amount, whereas for category A older products were more popular. These results can be explained by the characteristic differences between the categories: categories B and C are more trend-related, and their number of new launches per time interval is higher than for product types in category A.

Purchase country: The three different groups of purchase countries fitted the product types of categories A, B and C well. Sweden was clearly the biggest and main market. The high significance of the groupings also indicates that the markets of Denmark, Finland and Norway are homogeneous in sales performance.

Purchase month: The four different groups of purchase months fitted the product types of categories A, B and C well. Sales amounts had seasonal variations; for instance, sun protection products dominated the summer season. On the other hand, many product types had a constantly high sales amount throughout the year.
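For a regression, grouped variables such as the four purchase-month groups are usually entered as indicator (dummy) variables. The sketch below shows one way to do this, under three assumptions:

- the groups coincide with the four periods used for the combinations in Section 4.2.1 (Jan-Feb with Sep-Oct, Mar-May, Jun-Aug, Nov-Dec);
- the first group is the reference level;
- the group names are hypothetical.

```python
# Hypothetical encoding of the four purchase-month groups; the periods follow
# the seasonal combinations in Section 4.2.1, the group names are invented.
MONTH_GROUPS = {
    "Jan-Feb & Sep-Oct": {1, 2, 9, 10},
    "Mar-May": {3, 4, 5},
    "Jun-Aug": {6, 7, 8},
    "Nov-Dec": {11, 12},
}
REFERENCE_GROUP = "Jan-Feb & Sep-Oct"  # assumed baseline level (no dummy of its own)


def month_group(month):
    """Return the purchase-month group of a month number (1-12)."""
    for name, months in MONTH_GROUPS.items():
        if month in months:
            return name
    raise ValueError(f"not a month number: {month}")


def month_dummies(month):
    """0/1 indicators for the non-reference groups, in MONTH_GROUPS order."""
    group = month_group(month)
    return [int(name == group) for name in MONTH_GROUPS if name != REFERENCE_GROUP]


print(month_group(7), month_dummies(7))    # a summer month
print(month_group(10), month_dummies(10))  # reference group: all zeros
```

Dropping the reference group avoids perfect collinearity with the intercept. Each remaining coefficient then measures the effect of its group relative to the reference months.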

When the recommendation for a certain product says No rating, the rating of the product does not matter. Note that this can occur either because non-rated products sell better or because none of the products had ratings to begin with, so that this recommendation had to be given. Similarly, Price lower than 100 can mean either that cheaper products of the given product type sell better or that all products of that type are priced under 100. These interpretations should be taken into account when using the recommendations.

The study is based on actual sales statistics covering three years. The generalized linear model with a Poisson distribution proved to fit the data, so the results can be considered reliable. However, the various requirement thresholds and the calculations with average values could have reduced the reliability of the final results.
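The sketch below fits a Poisson GLM with log link, E[y] = exp(Xβ), by Newton-Raphson in plain Python. Because the log link is canonical, this is the same as Fisher scoring. It is a minimal illustration of the model class, not the authors’ code. It assumes that the design matrix starts with an intercept column followed by 0/1 dummies for the variable groups, and the toy data are invented. In practice a statistics library would normally be used, for example the GLM class in statsmodels with a Poisson family.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson_glm(X, y, max_iter=100, tol=1e-10):
    """Maximum-likelihood coefficients of a Poisson GLM with log link.

    X: rows of covariates, first column = 1 (intercept); y: observed counts.
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))  # start at the overall mean (needs sum(y) > 0)
    for _ in range(max_iter):
        mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
        # Score X^T (y - mu) and Fisher information X^T diag(mu) X.
        score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(row[j] * row[k] * mi for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Toy data: a reference group (dummy 0) and one other group (dummy 1).
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
y = [2, 4, 9, 15]
beta = fit_poisson_glm(X, y)
print([round(b, 6) for b in beta], round(math.exp(beta[1]), 6))
```

With a single dummy, the fit reproduces the group means, here 3 and 12. The value exp(β1) = 4 is the multiplicative effect of the group on expected sales. That is one way to read statements such as "ratings 4-5 had a very positive influence" quantitatively.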

Furthermore, other approaches to the same problem could give different results. Importantly, the total price offered for a final combination, and whether customers are allowed to choose between different products when putting a combination together, will greatly affect the sales amount of the combinations. In other words, the marketing strategy will affect the actual outcome of the mathematical predictions.

5.2 Marketing strategy analysis

Strengths:

The different shipping alternatives are unique and make it easier for customers to reach the order values required to use the shipping offers for free. Free standard shipping, in turn, increases the conversion rate of visitors to customers. The personalized service, engravings of specific products, gives the products further value. eleven AB almost always runs different campaigns, such as sales and special offers, which undoubtedly enhance sales. Notably, a student discount is offered on all standard orders. The product range is wide, and new and trendy products from the existing brands are often launched on the site very quickly. The company has an affiliate program and has sponsored or cooperated with different public events, which helps build the brand. Furthermore, many social media channels are used, which enables direct contact with customers and keeps them updated.

Weaknesses:

One can easily see that campaigns in the form of special offers and personalized services are available only for some premium brands. The premium brands get a lot of marketing, while the impact of many other brands is underestimated and may not be optimally exploited. Even though the shipping alternatives are unique and different, they tend not to be well adapted to the customers. Another weakness of the marketing is that the company does not yet possess a competitive advantage from being a specialist in a certain beauty area, unlike, for instance, Lyko, which specializes in haircare.

Opportunities:

As for the opportunities, eleven AB is considering more sponsoring of public events in order to increase its visibility. For instance, being featured in an event’s brochure, appearing on an event’s official website with eleven’s logo or having an item in the goodie bags are all good ways to highlight its sponsorships, which in turn will improve eleven’s brand recognition. Furthermore, eleven should focus more on its affiliate programs. Besides the current affiliate marketing channel, i.e. blogs, it is worth expanding to other platforms such as YouTube. By collaborating with well-known YouTubers, both eleven as a company and its products will gain more visibility and wider publicity. In addition, there are a number of opportunities with regard to campaign effectiveness. An obvious advantage of physical stores over online retailers is that customers can try products before buying them. Thus, for customers who prefer to try products first, more physical stores and campaigns related to the physical stores are needed. The campaigns should also be related to trendy products and brands; for example, South Korean and organic cosmetics and skincare products have become more popular in recent years. By doing frequent market research, eleven can keep track of the latest market trends and customer demands. It would also be beneficial to create annual campaigns around holidays and occasions such as Valentine’s Day and Black Friday. Last but not least, eleven should widen its shipping alternatives and adapt the existing ones more to its customers.

Threats:


The biggest threat that eleven faces comes from its competitors in the local market, such as Lyko and Kicks. As mentioned before, Lyko is more specialized in haircare products, while Kicks was established much earlier and enjoys higher popularity than eleven. Compared to its competitors, eleven does not have a stronger competitive position with regard to either prices or products. Another potential threat is the lower entry barriers to the beauty e-commerce market, since the costs of developing and providing e-commerce services are relatively low. Thus, eleven will face more intense competition in a rapidly evolving industry such as the beauty e-commerce market.

Discussion:

From the SWOT analysis, one can see that some points appear as strengths as well as weaknesses and opportunities. The interpretation is that the company should continue its current marketing strategy while taking steps to improve and strengthen it further. In particular, eleven should optimize its shipping alternatives and adapt them better to its customers. eleven has long made great efforts to offer different campaigns, but the company should pay more attention to the selection of campaigns and focus on fewer of them, concentrating on some big annual events in order to meet customers’ high expectations during these periods. Moreover, it is important for eleven to keep developing its affiliate programs, as mentioned in the opportunities section. Since customers are the cornerstone of eleven’s success, it is essential that the company continuously strives to better understand and fulfil customers’ demands. Even though new entrants have been identified as a threat to eleven, the impact is not as severe as it may seem: eleven already has a leading position in the Scandinavian market and is therefore less vulnerable to potential new competitors.

5.3 Shortcomings

Overall, the initial data fitted the Poisson distribution well, but a few models did not fit the distribution perfectly. The final grouping of the variables led to a large improvement in the overall significance of the coefficient estimates. Unfortunately, a very few coefficients remained insignificant, which could have affected the reliability of the prediction models negatively. A better grouping of the variables might have been found.
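One common way to check how well a Poisson model fits is the Pearson dispersion statistic. This is not necessarily the check used in this study. The statistic is the sum of squared Pearson residuals (y − μ)²/μ divided by the residual degrees of freedom n − p. Values close to 1 are consistent with the Poisson assumption, while values clearly above 1 indicate overdispersion. In the minimal sketch below, mu holds the fitted means of a model with n_params coefficients, and the numbers are invented.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by the residual degrees of freedom (n - p)."""
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    df = len(y) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / df


# Toy counts and fitted means from a two-parameter model (invented numbers).
print(round(pearson_dispersion([2, 4, 9, 15], [3, 3, 12, 12], n_params=2), 3))
```

If the statistic were far above 1 for some product types, a quasi-Poisson or negative binomial model would be a natural alternative for those models.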

The original data had many non-sampling errors. Errors in data acquisition, for instance orders and products with a negative or zero sales price, were fixed by deleting them. However, it cannot be guaranteed that all errors of this type were handled, and any remaining ones may have affected the results negatively. The hierarchical structure of the products also had errors; for instance, the same product type name could belong to two different product categories. In this study, such cases were handled by simply taking only one of these product types into consideration. However, in order to model all product types fully, this kind of error must be handled better, starting from a better hierarchical structure of the initial data. Another problem was insufficient observations for some product types or combinations. This was handled by requiring a minimum number of observations in order to obtain reliable mathematical results, which may have led to many possible combinations being disregarded and should be improved in later studies.
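The two data-cleaning steps described above can be sketched as follows. The record layout (dicts with product_type and price fields) and the threshold min_observations are hypothetical. For simplicity, the sketch drops individual order lines with a non-positive price rather than whole orders or products.

```python
from collections import Counter


def clean_order_lines(lines, min_observations):
    """Hypothetical sketch of the two cleaning steps.

    lines: list of dicts with at least "product_type" and "price" keys.
    1. Lines with a zero or negative sales price are treated as acquisition
       errors and removed.
    2. Product types with fewer than min_observations remaining lines are
       removed, so that every modelled type has enough data.
    """
    valid = [row for row in lines if row["price"] > 0]
    counts = Counter(row["product_type"] for row in valid)
    return [row for row in valid if counts[row["product_type"]] >= min_observations]


lines = [
    {"order_id": 1, "product_type": "A5", "price": 150.0},
    {"order_id": 1, "product_type": "A3", "price": 0.0},     # zero price: error
    {"order_id": 2, "product_type": "A5", "price": 120.0},
    {"order_id": 3, "product_type": "A17", "price": -10.0},  # negative price: error
    {"order_id": 3, "product_type": "A17", "price": 99.0},
]
cleaned = clean_order_lines(lines, min_observations=2)
print([(row["order_id"], row["product_type"]) for row in cleaned])
```

The example also shows the side effect mentioned above: A17 is discarded entirely once its erroneous line is removed, because too few observations remain.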

The various requirements used in the study can be hard to set, and requirements that are not well reasoned can affect the final results negatively.

This study did not include several factors that in reality strongly affect sales statistics, for example campaigns, sales, viral trends, the number of clicks on each product and the number of times each product was placed in a shopping basket. By taking these factors into consideration, a better algorithm could have been developed.

Finally, the main assumption of the study was that users’ preferences remain unchanged over time. In fact, preferences and demands in the beauty e-commerce market change, just as they do in the retail sector as a whole. Therefore, results that are reliable today cannot be expected to remain reliable for long, and the underlying sales data should be updated continuously.

5.4 Comparison to earlier studies

Studies of recommender systems are the only similar studies we were able to identify. Despite the similarities, the existing differences limit the value of direct comparisons. Therefore, it is only relevant to compare the main ideas and approaches from a general point of view.

As discussed earlier, interest and demand within the beauty e-commerce market change over time. This shift in interest should be taken into account when assessing the reliability of the results of this study. The algorithm derived in this study does not actively handle shifts in customer interest; as recommended, the customer data on which the algorithm is based should therefore be regularly updated with more recent data. On the other hand, unlike in recommender systems, the interests of a whole customer base shift more slowly and are more stable than those of an individual. Therefore, actively handling such shifts might be less important here than in recommender systems.

As for combining marketing strategy with product combinations, recommendations are given in the marketing strategy analysis. Since the algorithm gives both the average order value and the conditional probability of the combination being purchased, the user can, together with the final price of the combination, weigh the importance of a higher order value against a higher probability. However, unlike in recommender systems, changes in satisfaction or service level are hard for the algorithm to determine, since it does not operate on individual customers. Therefore, it is hard to determine how the active marketing strategy will affect the actual average order values resulting from the combinations.
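One simple way to weigh order value against purchase probability is to rank combinations by their expected value, that is, the conditional probability multiplied by the final price of the combination. This is offered only as an illustrative assumption, not as a rule from this study. As the paragraph above notes, it ignores how the marketing strategy itself changes the probabilities. The combination names, probabilities and prices below are invented.

```python
def rank_by_expected_value(candidates):
    """Sort (name, probability, final_price) tuples by probability * final_price.

    The score is the expected revenue of offering the combination to a customer
    who buys the input product, assuming the probability does not depend on
    the price offered (a strong simplification).
    """
    return sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)


candidates = [
    ("A5 + A3", 0.23, 349.0),        # likely but cheaper
    ("A5 + A17 + A3", 0.05, 900.0),  # valuable but unlikely
    ("A5 + A36", 0.10, 600.0),
]
for name, p, price in rank_by_expected_value(candidates):
    print(name, round(p * price, 2))
```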


Chapter 6

Conclusion

The purpose of this thesis was to create product combinations based on customer data analysis. This thesis provided a combined study of applied mathematical statistics and marketing strategy analysis. Returning to the target questions, the corresponding answers can be summarized as follows:

• Which factors are most significant for sales statistics of certain product types?

Based on the regression analysis, all the defined factors (sales price, brand, rating, purchase month, purchase country and how new or old the product is) have been shown to be significant for sales statistics. However, within each variable group there are specific characteristics that are most significant for a certain product type; these have been presented in Section 5.1.

• Which product types have the highest sales amount during specific months?

It can be concluded that product type C7 has the highest sales amount throughout the year (see Table 4.4 in Section 4.1.3). Most of the other top products appear more than once in the table, which also indicates that seasonal variation has little impact on top-selling products.

• Is it possible to create product combinations of two or three different products from the same or different product categories?

The study shows that combinatorics can indeed be used to combine two or three products from the same or from different product categories (see Section 4.2.1).

• If so, what is the conditional probability of a customer buying the second (and the third) product of the combination, given the first product as input?


The conditional probability of a second or a third product being purchased can be obtained from the conditional probability matrices developed in Section 4.2.2.

• How can better understanding of the market and its competitors increase sales?

By applying a SWOT analysis, strengths, weaknesses, opportunities and threats have been identified for eleven AB. As a result, the company should maintain its current marketing strategy and improve it further by adapting better to market changes and to its customers.


Chapter 7

Further research

Throughout the study, one recurring problem was the structure of the initial data. Developing a better system for organizing the data is therefore recommended for further research. It would not only help to build better models but also help the company itself to clarify all its information.

This thesis was based on the assumption that user preferences remain unchanged, but in a rapidly evolving industry such as e-commerce, it is hard to predict customer behaviour and how much it may affect sales performance. Thus, taking behavioural changes into account would be interesting for further research. In addition, further research comparing how different marketing strategies affect the actual sales performance of the algorithm is also of interest.


Chapter 8

References

Blom, Gunnar, Enger, Jan, Englund, Gunnar, Grandell, Jan and Holst, Lars. 2016. Sannolikhetsteori och statistikteori med tillämpningar. 5th ed. Studentlitteratur AB.

Chen, Zhimin, Jiang, Yi and Zhao, Yao. 2010. A Collaborative Filtering Recommendation Algorithm Based on User Interest Change and Trust Evaluation. International Journal of Digital Content Technology and its Applications: Volume 4, Number 9.

eleven AB. 2017. Om eleven. eleven.se. https://eleven.se/om-eleven-p6.html (Accessed 2017.04.28).

Greene, William H. 2012. Econometric Analysis. 7th ed. England: Pearson Education Limited.

Gut, Allan. 2009. An Intermediate Course in Probability. 2nd ed. New York: Springer Science+Business Media.

Lampropoulos, Aristomenis S. and Tsihrintzis, George A. 2015. Machine Learning Paradigms: Application in Recommender Systems. Switzerland: Springer International Publishing.

Mazur, David R. 2010. Combinatorics: A Guided Tour. USA: The Mathematical Association of America.

Montgomery, Douglas C., Peck, Elizabeth A. and Vining, G. Geoffrey. 2012. Introduction to Linear Regression Analysis. 5th ed. Hoboken, New Jersey: John Wiley & Sons, Inc.

Ramberg, Joachim (CTO, eleven AB). 2017. Interview, 27 Jan.

Schafer, J. Ben, Konstan, Joseph and Riedl, John. 1999. Recommender systems in e-commerce. University of Minnesota.

Silber, Kenneth H., Foshay, Wellesley R., Watkins, Ryan, Leigh, Doug, Moseley, James L. and Dessinger, Joan C. 2010. Handbook of Improving Performance in the Workplace. Vol. 2. International Society for Performance Improvement.

Wang, Hsiao-Fan and Wu, Cheng-Ting. 2010. A strategy-oriented operation module for recommender systems in E-commerce. Computers & Operations Research 39 (2012): 1837-1849.

Weinswig, Deborah. 2016. Deep Dive: Global Beauty E-commerce - A Highly Attractive Market. Fung Global Retail & Technology.


