Linda Schumacher Informs 2016

An Investigation of Cluster Analysis of Retail Stores to Improve Predictive Modeling of Sales

Linda Schumacher
INFORMS Annual Meeting 2016

Business Problem

Retail stores
◦ Continental US, ~2000 stores
◦ Many products at each location

Predict sales
◦ Classification problem: sold/unsold
◦ Model per product category (pooling)

Assortment planning

Business Problem

Assortment planning: the right product at the right store
◦ Avoid out-of-stock
◦ Avoid non-productive inventory

Use predictions to select products per store
Increase sales

Approach

Predictive modeling: classification
◦ Pool SKUs by product category

Segment stores: cluster analysis
◦ Group like stores together based on customer needs
◦ Define sister stores

Cluster Analysis - Introduction

Unsupervised machine learning: discover natural groupings
◦ Items within a group are alike
◦ Groups are different from each other
◦ Alike/different is judged by distance

R (open source)
◦ Statistical software
◦ Variety of packages/libraries
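To make "judged by distance" concrete, here is a minimal Python sketch (the talk itself works in R; Python is used here only for illustration). It assumes Euclidean distance, which the slides do not specify, and the three store profiles are hypothetical values.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical store profiles (illustrative values, not from the talk).
store_a = [1.0, 0.0, 2.0]
store_b = [1.0, 0.0, 3.0]
store_c = [4.0, 4.0, 2.0]

print(euclidean(store_a, store_b))  # 1.0 -> a and b are alike
print(euclidean(store_a, store_c))  # 5.0 -> a and c are different
```

Under this measure, stores a and b would tend to land in the same group, while store c would not.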

Cluster Analysis

Methods
Centroid-based: k-means
Hierarchical
Probabilistic/fuzzy
Mixture models
◦ Expectation maximization
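Of the methods listed, the centroid-based one is the simplest to sketch. The following is a minimal Python illustration of k-means (Lloyd's algorithm), not the R code used in the talk. The function name `kmeans`, the toy points, and the choice of starting centroids are all assumptions made for the example.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: two visibly separated groups (hypothetical values).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centers = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the result depends on the starting centroids, which is one reason library implementations typically run several random starts.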

Cluster Analysis

Considerations
Input data prep: scaling required; transformations may be beneficial
Number of clusters (K/G)
◦ WSS, gap test, BIC
Quality of clustering solution
◦ Metrics
◦ Business purposes
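Two of these considerations, scaling and the within-cluster sum of squares (WSS), can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the talk's R code: scaling is taken to mean z-scores computed with the population standard deviation (R's `scale()` uses the sample standard deviation), and the data and cluster labels are hypothetical.

```python
import statistics


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]  # population SD (assumption)
    return [
        [(x - m) / s if s else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]


def wss(rows, labels):
    """Total within-cluster sum of squared distances to each cluster mean."""
    total = 0.0
    for k in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == k]
        center = [sum(c) / len(members) for c in zip(*members)]
        total += sum((x - c) ** 2 for r in members for x, c in zip(r, center))
    return total


# Hypothetical data: the second column is 1000x the first, so it would
# dominate any distance calculation if left unscaled.
raw = [[1, 1000], [2, 2000], [3, 3000], [4, 4000]]
scaled = standardize(raw)

wss_one = wss(scaled, [0, 0, 0, 0])  # everything in a single cluster
wss_two = wss(scaled, [0, 0, 1, 1])  # split into two clusters
print(round(wss_one, 3), round(wss_two, 3))
```

Plotting WSS against the number of clusters and looking for the point where the decrease levels off is the usual way this quantity is used to choose K.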

Cluster Analysis

Input Data
Customers need to purchase specific products based on the type of project.
For each of the top J project types, estimate the number of projects near the store.

Store   Modular   Log   Marble
1       100       5     20
2       80        4     22
3       4         200   2
4       7         150   0

Classification

Build Models and Evaluate
A base classification model
Two approaches
◦ Build a classification model per cluster and calculate an overall weighted average metric.
◦ Derive new cluster average sales variables, then build one predictive model with all variables in the base model plus the new derived cluster average variables.
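The bookkeeping behind the two approaches can be sketched in Python (illustration only; the talk's models were built in R). Two points are assumptions: the overall metric is weighted by the number of observations in each cluster, which the slides do not state, and the record keys `cluster`, `sales`, and `cluster_avg_sales` are hypothetical names.

```python
from collections import defaultdict


def weighted_average_metric(per_cluster):
    """Combine per-cluster metrics; per_cluster holds (metric, n_observations)."""
    total_n = sum(n for _, n in per_cluster)
    return sum(metric * n for metric, n in per_cluster) / total_n


def add_cluster_average(stores, value_key="sales", new_key="cluster_avg_sales"):
    """Return copies of the store records with the cluster mean of value_key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for s in stores:
        sums[s["cluster"]] += s[value_key]
        counts[s["cluster"]] += 1
    return [{**s, new_key: sums[s["cluster"]] / counts[s["cluster"]]} for s in stores]


# Approach 1: one model per cluster, then one overall number.
overall = weighted_average_metric([(0.80, 100), (0.90, 300)])
print(overall)

# Approach 2: derive a cluster-average variable for a single pooled model.
stores = [
    {"store": 1, "cluster": "A", "sales": 10.0},
    {"store": 2, "cluster": "A", "sales": 20.0},
    {"store": 3, "cluster": "B", "sales": 40.0},
]
enriched = add_cluster_average(stores)
print([s["cluster_avg_sales"] for s in enriched])  # [15.0, 15.0, 40.0]
```

The second approach keeps all stores in one training set, so each model sees more data than a per-cluster model would.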

Classification

Dependent variable
◦ Sold/unsold over a 1-year period

Independent variables
◦ Store-level sales: most recent year, previous year
◦ …

Modeling in R

Tree-based classification models
◦ Decision tree: rpart (recursive partitioning for classification)
◦ Boosted trees: C5.0

R caret package (Classification And REgression Training)
◦ Split data into train and test sets
◦ Pre-processing
◦ Model tuning/training
◦ Make predictions
◦ Calculate results metrics
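The first workflow step, splitting into train and test sets, is usually done so that both sets keep roughly the same sold/unsold mix. Below is a minimal Python sketch of such a stratified split. It is not caret's implementation; the function name `stratified_split`, the 75% training fraction, and the toy labels are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=42):
    """Return (train_idx, test_idx), sampling within each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


# Hypothetical outcome labels: 8 sold, 4 unsold.
y = ["sold"] * 8 + ["unsold"] * 4
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx))  # 9 3
```

Sampling within each class keeps a rare outcome from ending up entirely in one of the two sets.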

Evaluate Predictive Results - Metrics

Accuracy
Sensitivity
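Both metrics come from the confusion matrix: accuracy is the share of all predictions that are correct, and sensitivity is the share of actual positives that are predicted as positive. The Python sketch below uses these standard definitions; treating "sold" as the positive class is an assumption, and the actual and predicted labels are hypothetical.

```python
def confusion_counts(actual, predicted, positive="sold"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fp, tn, fn):
    """True positives divided by all actual positives."""
    return tp / (tp + fn)


# Hypothetical test-set results.
actual = ["sold", "sold", "sold", "unsold", "unsold", "sold", "unsold", "unsold"]
predicted = ["sold", "sold", "unsold", "unsold", "sold", "sold", "unsold", "unsold"]

counts = confusion_counts(actual, predicted)
print(counts)                # (3, 1, 3, 1)
print(accuracy(*counts))     # 0.75
print(sensitivity(*counts))  # 0.75
```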

Consider Two Approaches
Compare Results

Base model
◦ One model with the original input data

Many models
◦ One per cluster
◦ Weighted average overall

One model
◦ With additional derived average clustered sales variables

Conclusion

The jury is still out.
◦ Stability over other samples
◦ Improve the predictive models: classifier tuning, modify the number of clusters, add other variables to the clustering

References: R Caret Package

Kuhn, M. (2013). Predictive Modeling with R and the caret Package useR! 2013. Retrieved November 10, 2016, from https://www.r-project.org/nosvn/conferences/useR-2013/Tutorials/kuhn/user_caret_2up.pdf

Predictive modelling fun with the caret package. (2014). Retrieved November 10, 2016, from https://www.r-bloggers.com/predictive-modelling-fun-with-the-caret-package/

Kuhn, M. (n.d.). The caret Package. Retrieved November 10, 2016, from https://topepo.github.io/caret/index.html


References: Information Pooling

Ali, O. G., Sayin, S., Van Woensel, T., & Fransoo, J. (2007, October 30). Pooling information across SKUs for demand forecasting with data mining. Retrieved November 14, 2016, from https://www.researchgate.net/profile/Tom_Van_Woensel/publication/228762017_Pooling_information_across_SKUs_for_demand_forecasting_with_data_mining/links/02e7e5249c1f1c11ae000000.pdf The authors evaluate different methods of forecasting demand and consider pooling SKUs by category.

References: R Clustering


Oksanen, J. (2014, January 26). Cluster Analysis: Tutorial with R. Retrieved November 10, 2016, from http://cc.oulu.fi/~jarioksa/opetus/metodi/sessio3.pdf The author presents R code to perform Hierarchic and Fuzzy Clustering.

Quick-R. (n.d.). Retrieved November 10, 2016, from http://www.statmethods.net/advstats/cluster.html Quick-R site is very good for brief overview of creating clustering code in R.

Raftery et al. (2012, June). Mclust Version 4 for R: Normal Mixture Modeling for Model ... Retrieved November 10, 2016, from http://www.stat.washington.edu/research/reports/2012/tr597.pdf The authors present both mixture model explanations with accompanying R code from Mclust R package.


Linda Schumacher

Education
◦ Stanford: Graduate Certificate in Data Mining and Applications
◦ Oklahoma State: Graduate Certificate in Business Data Mining
◦ University of Illinois: MS in Computer Science with Thesis
◦ Stevens Institute: BS in Computer Science (Applied Math, OR)

Contact Information
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/linda-schumacher-9b1b214

Slides will be available on www.slideshare.net



Date post:	11-Jan-2017
Category:	Data & Analytics
Upload:	linda-schumacher