Date posted: 11-Jan-2017
Category: Data & Analytics
Uploaded by: linda-schumacher
1
An Investigation of Cluster Analysis of Retail Stores
To Improve Predictive Modeling of Sales
Linda Schumacher
INFORMS Annual Meeting 2016
2
Retail stores
◦ Continental US, ~2000 stores
◦ Many products at each location
Predict sales
◦ Classification problem: sold/unsold
◦ One model per product category (pooling)
Assortment planning
Business Problem
3
Assortment planning: the right product at the right store
◦ Avoid out-of-stocks
◦ Avoid non-productive inventory
Use predictions to select products per store
Increase sales
Business Problem
4
Predictive modeling: classification
◦ Pool SKUs by product category
Segment stores: cluster analysis
◦ Group like stores together based on customer needs
◦ Define sister stores
Approach
5
Unsupervised machine learning: discover natural groupings
◦ Items within a group are alike
◦ Groups are different from each other
◦ Alike/different is judged by distance
R, open source
◦ Statistical software
◦ Variety of packages/libraries
Cluster Analysis - Introduction
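To make "judged by distance" concrete, here is a minimal sketch of Euclidean distance between two store profiles. It is written in plain Python for illustration (the talk itself used R), and the store vectors are hypothetical numbers, not data from the study.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical store profiles (illustrative values only).
store_a = [3.0, 4.0]
store_b = [0.0, 0.0]
store_c = [3.0, 5.0]

d_ab = euclidean(store_a, store_b)  # stores a and b are far apart
d_ac = euclidean(store_a, store_c)  # stores a and c are close together
print(d_ab, d_ac)
```

Under this measure, store_a is more "alike" store_c than store_b, so a distance-based method would tend to group a and c together.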
6
Methods
Centroid-based: k-means
Hierarchical
Probabilistic/fuzzy
Mixture models
◦ Expectation maximization
Cluster Analysis
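As an illustration of the centroid-based method, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the code used in the talk, which relied on R packages; the function names, the caller-supplied starting centroids, and the sample points are all assumptions made for the example.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centroids.append(old)  # leave an empty cluster where it is
        else:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids

def kmeans(points, init_centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    # Within-cluster sum of squares: total squared distance to assigned centroids.
    wss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, wss

# Hypothetical two-dimensional points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids, wss = kmeans(points, init_centroids=[points[0], points[2]])
print(labels, centroids, wss)
```

The starting centroids are passed in explicitly to keep the sketch deterministic; practical implementations usually pick them at random and rerun several times.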
7
Considerations
Input data prep
◦ Scaling required
◦ Transformations may be beneficial
Number of clusters (K/G)
◦ WSS, gap test, BIC
Quality of clustering solution
◦ Metrics
◦ Business purposes
Cluster Analysis
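Scaling matters because distance-based clustering is dominated by whichever variable has the largest numeric range. The sketch below standardizes each column to z-scores in plain Python, using the sample standard deviation. The function name and the input rows are hypothetical, and the talk does not say which scaling method was actually used.

```python
import statistics

def zscore_columns(rows):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [
        # A constant column has sd 0; map it to 0.0 instead of dividing by zero.
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in rows
    ]

# Hypothetical input: the second variable is 100 times larger than the first.
rows = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = zscore_columns(rows)
print(scaled)
```

After scaling, both columns contribute equally to any distance computed between rows.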
8
Input data
Customers need to purchase specific products based on the type of project
For each project type, estimate the number of the top J projects near the store

Store  Modular  Log  Marble
1      100      5    20
2      80       4    22
3      4        200  2
4      7        150  0
Cluster Analysis
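One way to read the example table on this slide is as a vector of project counts per store, from which a "sister store" can be found as the nearest other store. The sketch below does this in plain Python. The values are as read from the slide's table (the row layout is a reconstruction from the flattened text), and treating the sister store as the single nearest neighbor on unscaled counts is an assumption of this example, not a definition given in the talk.

```python
import math

# Project counts per store as read from the slide's example table
# (columns: Modular, Log, Marble). Row layout is reconstructed.
stores = {
    1: (100, 5, 20),
    2: (80, 4, 22),
    3: (4, 200, 2),
    4: (7, 150, 0),
}

def sister_store(store_id, table):
    """Return the id of the nearest other store by Euclidean distance."""
    target = table[store_id]
    others = [s for s in table if s != store_id]
    return min(others, key=lambda s: math.dist(target, table[s]))

# Assumed definition: each store's sister is its nearest neighbor.
sisters = {s: sister_store(s, stores) for s in stores}
print(sisters)
```

Stores 1 and 2 pair up (mostly Modular projects), as do stores 3 and 4 (mostly Log projects), which matches the grouping the table suggests by eye.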
9
Build models and evaluate
A base classification model
Two approaches
◦ Build a classification model per cluster and calculate an overall weighted-average metric
◦ Derive new cluster-average sales variables, then build one predictive model with all variables in the base model plus the new derived cluster-average variables
Classification
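The two approaches can be sketched in plain Python as follows. The first function combines per-cluster metrics into one overall figure; weighting by the number of cases in each cluster is an assumption, since the slides do not state the weights. The second function derives a cluster-average sales variable for each store. All names, keys, and numbers here are hypothetical.

```python
from collections import defaultdict

def weighted_average_metric(per_cluster):
    """Combine (n_cases, metric) pairs, one per cluster model, weighted by n_cases."""
    total = sum(n for n, _ in per_cluster)
    if total == 0:
        raise ValueError("no cases to average over")
    return sum(n * m for n, m in per_cluster) / total

def add_cluster_average(rows, cluster_key="cluster", value_key="sales",
                        new_key="cluster_avg_sales"):
    """Return copies of rows with the mean of value_key within each cluster added."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        sums[r[cluster_key]] += r[value_key]
        counts[r[cluster_key]] += 1
    return [{**r, new_key: sums[r[cluster_key]] / counts[r[cluster_key]]} for r in rows]

# Approach 1: hypothetical accuracies from two per-cluster models.
overall = weighted_average_metric([(100, 0.80), (300, 0.60)])

# Approach 2: hypothetical store rows gain a derived cluster-average variable.
store_rows = [
    {"store": 1, "cluster": "A", "sales": 10.0},
    {"store": 2, "cluster": "A", "sales": 20.0},
    {"store": 3, "cluster": "B", "sales": 5.0},
]
enriched = add_cluster_average(store_rows)
print(overall, enriched)
```

In the second approach the enriched rows would feed a single model alongside the base model's variables, so the cluster information enters as a feature rather than as a partition of the data.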
10
Dependent variable
◦ Sold/unsold over a 1-year period
Independent variables
◦ Store-level sales: most recent year, previous year
◦ …
Classification
11
Tree-based classification models
◦ Decision tree: rpart, recursive partitioning for classification
◦ Boosted trees: C5.0
R caret package (Classification And Regression Training)
◦ Split data into train and test sets
◦ Pre-processing
◦ Model tuning/training
◦ Make predictions
◦ Calculate results metrics
Modeling in R
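The first step of the workflow above, splitting data into train and test sets, can be sketched in plain Python as a stratified split that keeps the sold/unsold mix similar in both sets. This is an illustration only, not the caret code used in the talk; the function name, the 75% training fraction, and the seed are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.75, seed=42):
    """Return (train_indices, test_indices), sampling within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

# Hypothetical outcome labels: 8 sold, 4 unsold.
labels = ["sold"] * 8 + ["unsold"] * 4
train_idx, test_idx = stratified_split(labels)
print(train_idx, test_idx)
```

Splitting within each class matters for a sold/unsold outcome because an unstratified split can leave the rarer class under-represented in the test set.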
12
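Accuracy
◦ Share of all predictions that are correct
Sensitivity
◦ Share of actual positives that are predicted as positive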
Evaluate Predictive Results - Metrics
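Both metrics follow from the confusion-matrix counts: accuracy is (TP + TN) / (TP + FP + TN + FN) and sensitivity is TP / (TP + FN). The plain-Python sketch below computes them. Treating "sold" as the positive class is an assumption of this example, and the label vectors are hypothetical.

```python
def confusion_counts(actual, predicted, positive="sold"):
    """Count true/false positives and negatives for a binary outcome."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def sensitivity(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn)

# Hypothetical labels; "sold" is assumed to be the positive class.
actual = ["sold", "sold", "sold", "unsold", "unsold"]
predicted = ["sold", "sold", "unsold", "unsold", "sold"]
tp, fp, tn, fn = confusion_counts(actual, predicted)
acc = accuracy(tp, fp, tn, fn)
sens = sensitivity(tp, fn)
print(acc, sens)
```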
13
Consider two approaches and compare results against the base model
Base model: one model with the original input data
◦ Many models, one per cluster: weighted average overall
◦ One model with additional derived average clustered sales variables
14
The jury is still out.
◦ Stability over other samples
◦ Improve the predictive models: classifier tuning, modify number of clusters, add other variables to the clustering
Conclusion
15
References: R Caret Package
Kuhn, M. (2013). Predictive Modeling with R and the caret Package useR! 2013. Retrieved November 10, 2016, from https://www.r-project.org/nosvn/conferences/useR-2013/Tutorials/kuhn/user_caret_2up.pdf
Predictive modelling fun with the caret package. (2014). Retrieved November 10, 2016, from https://www.r-bloggers.com/predictive-modelling-fun-with-the-caret-package/
Kuhn, M. (n.d.). The caret Package. Retrieved November 10, 2016, from https://topepo.github.io/caret/index.html
16
References: Information Pooling
Ali, O. G., S. Sayin, T. Van Woensel, and J. Fransoo. "Pooling Information Across SKUs for Demand Forecasting with Data Mining" Researchgate.net. N.p., 30 Oct. 2007. Web. 14 Nov. 2016. The authors evaluate different methods in forecasting demand and consider pooling SKUs by category.
17
References: R Clustering
Oksanen, J. (2014, January 26). Cluster Analysis: Tutorial with R. Retrieved November 10, 2016, from http://cc.oulu.fi/~jarioksa/opetus/metodi/sessio3.pdf The author presents R code to perform Hierarchic and Fuzzy Clustering.
Quick-R. (n.d.). Retrieved November 10, 2016, from http://www.statmethods.net/advstats/cluster.html The Quick-R site gives a good brief overview of writing clustering code in R.
Raftery et al. (2012, June). Mclust Version 4 for R: Normal Mixture Modeling for Model ... Retrieved November 10, 2016, from http://www.stat.washington.edu/research/reports/2012/tr597.pdf The authors present mixture model explanations with accompanying R code from the Mclust R package.
19
Education
Stanford - Graduate Certificate in Data Mining and Applications
Oklahoma State - Graduate Certificate in Business Data Mining
University of Illinois - MS in Computer Science with Thesis
Stevens Institute - BS in Computer Science (Applied Math, OR)
Contact Information
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/linda-schumacher-9b1b214
Slides will be available on www.slideshare.net
Linda Schumacher