Linda Schumacher Informs 2016

Date post: 11-Jan-2017
Page 1: Linda Schumacher Informs 2016

An Investigation of Cluster Analysis of Retail Stores to Improve Predictive Modeling of Sales

Linda Schumacher
INFORMS Annual Meeting 2016

Page 2: Business Problem

Retail stores
◦ Continental US: ~2,000 stores
◦ Many products at each location

Predict sales
◦ Classification problem: sold/unsold
◦ One model per product category, pooling SKUs

Assortment planning

Page 3: Business Problem

Assortment planning: the right product at the right store
◦ Avoid out-of-stock situations
◦ Avoid non-productive inventory

Use the predictions to select products for each store and increase sales

Page 4: Approach

Predictive modeling: classification
◦ Pool SKUs by product category

Segment stores: cluster analysis
◦ Group similar stores together based on customer needs
◦ Define sister stores

Page 5: Cluster Analysis - Introduction

Unsupervised machine learning: discovers natural groupings
◦ Items within a group are alike
◦ Groups are different from each other
◦ Alike/different is judged by a distance measure

R (open source)
◦ Statistical software
◦ Wide variety of packages/libraries
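To make "judged by distance" concrete, here is a minimal Python sketch of Euclidean distance, a common (but not the only) choice for k-means-style clustering. The store vectors are hypothetical and assumed to be already scaled; the deck does not state which distance measure was used.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical stores described by three already-scaled features.
store_a = [1.0, 0.2, 0.5]
store_b = [0.9, 0.3, 0.4]
store_c = [-1.2, 1.5, -0.8]

print(round(euclidean(store_a, store_b), 3))  # small distance: alike
print(round(euclidean(store_a, store_c), 3))  # large distance: different
```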

Page 6: Cluster Analysis

Methods
◦ Centroid-based: k-means
◦ Hierarchical
◦ Probabilistic/fuzzy
◦ Mixture models: Expectation Maximization (EM)
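As an illustration of the centroid-based method, the sketch below implements Lloyd's k-means iteration in plain Python and also returns the within-cluster sum of squares (WSS) used on the next slide to compare values of K. It is a teaching sketch only: R's kmeans() defaults to the Hartigan-Wong algorithm and supports multiple random starts via nstart, and the six points here are made up.

```python
import random


def squared_distance(a, b):
    """Sum of squared coordinate differences between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids, wss)."""
    rng = random.Random(seed)
    # Start from k data points chosen at random as initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Within-cluster sum of squares (WSS) for the final solution.
    wss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wss


# Hypothetical two-feature data with two obvious groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
for k in range(1, 4):
    labels, centroids, wss = kmeans(points, k)
    print(k, labels, round(wss, 2))
```

Printing WSS for several values of K is the raw material for an "elbow" plot: WSS always shrinks as K grows, so one looks for the point where adding clusters stops helping much.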

Page 7: Cluster Analysis

Considerations
◦ Input data preparation: scaling is required; transformations may be beneficial
◦ Number of clusters (K for k-means, G for mixture models): WSS (within-cluster sum of squares), gap statistic, BIC
◦ Quality of the clustering solution: metrics and business purposes
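The data-preparation point can be sketched as follows: a log(1 + x) transform to compress skewed counts, followed by z-score scaling so that no single feature dominates the distance. This mirrors what R's log1p() and scale() do column by column; the features and values below are hypothetical, and whether a log transform helps depends on the actual data.

```python
import math
import statistics


def log1p_columns(rows):
    """Apply log(1 + x) to every value; zero counts stay finite."""
    return [[math.log1p(x) for x in row] for row in rows]


def zscore_columns(rows):
    """Center each column on its mean and divide by its sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [
        # A constant column carries no information, so map it to 0.0.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]


# Hypothetical store features on very different scales:
# [nearby projects, annual foot traffic]
raw = [[12, 150_000], [3, 40_000], [25, 900_000], [0, 60_000]]

prepared = zscore_columns(log1p_columns(raw))
for row in prepared:
    print([round(v, 2) for v in row])
```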

Page 8: Cluster Analysis

Input data
Customers need to purchase specific products based on the type of project. For each project type, estimate the number of the top J projects near the store.

Store  Modular  Log  Marble
1      100      5    20
2      80       4    22
3      4        200  2
4      7        150  0
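One way to "define sister stores" from this kind of input is a nearest-neighbor lookup in scaled feature space. The sketch below reads the example table as four stores with Modular, Log, and Marble project counts, z-scores each column, and picks the closest other store by Euclidean distance. This is an illustrative assumption, not necessarily how the sister stores were defined in the study.

```python
import math
import statistics

# Example project counts near each store: [Modular, Log, Marble]
stores = {
    1: [100, 5, 20],
    2: [80, 4, 22],
    3: [4, 200, 2],
    4: [7, 150, 0],
}


def zscore(table):
    """Scale every column to mean 0 and sample standard deviation 1."""
    ids = list(table)
    columns = list(zip(*(table[i] for i in ids)))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return {i: [(x - m) / s for x, m, s in zip(table[i], means, sds)] for i in ids}


def sister_store(store_id, scaled):
    """Return the other store with the smallest Euclidean distance."""
    target = scaled[store_id]
    others = (i for i in scaled if i != store_id)
    return min(others, key=lambda i: math.dist(target, scaled[i]))


scaled = zscore(stores)
for sid in stores:
    print(sid, "->", sister_store(sid, scaled))
```

On these four rows the lookup pairs stores 1 and 2 (modular-heavy) and stores 3 and 4 (log-heavy), which matches what the raw counts suggest by eye.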

Page 9: Classification

Build models and evaluate: a base classification model, then two approaches
◦ A classification model per cluster, with an overall weighted-average metric
◦ Derive new cluster-average sales variables, then build one predictive model with all variables from the base model plus the new derived cluster-average variables
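For the per-cluster approach, the slide does not say how the overall metric is weighted. A minimal sketch, assuming each cluster's metric is weighted by the number of test cases in that cluster, looks like this; the accuracies and counts are hypothetical.

```python
def weighted_average_metric(per_cluster):
    """Combine per-cluster metrics, weighting each by its number of test cases.

    per_cluster: list of (metric_value, n_test_cases) pairs, one per cluster model.
    """
    total = sum(n for _, n in per_cluster)
    if total == 0:
        raise ValueError("no test cases")
    return sum(value * n for value, n in per_cluster) / total


# Hypothetical per-cluster test accuracies and test-set sizes.
results = [(0.82, 500), (0.74, 300), (0.90, 200)]
print(round(weighted_average_metric(results), 3))
```

With accuracy, weighting by test-set size gives exactly the pooled accuracy over all test cases, which keeps the comparison with a single model fair. For sensitivity, the matching weight would be the number of actual positives in each cluster rather than the total test-set size.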

Page 10: Classification

Dependent variable
◦ Sold/unsold over a 1-year period

Independent variables
◦ Store-level sales: most recent year, previous year
◦ …

Page 11: Modeling in R

Tree-based classification models
◦ Decision tree: rpart (Recursive Partitioning) for classification
◦ Boosted trees: C5.0

R caret package (Classification And REgression Training)
◦ Split data into training and test sets
◦ Pre-processing
◦ Model tuning/training
◦ Make predictions
◦ Calculate results metrics
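The first caret step, splitting into training and test sets, is typically done with createDataPartition(), which samples within each level of the outcome so that the sold/unsold mix is similar in both sets. The Python sketch below shows the same stratified idea; it is not caret's implementation, and the 75/25 split and labels are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=42):
    """Return (train_idx, test_idx), sampling separately within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Hypothetical outcome vector: 60 sold, 40 unsold.
labels = ["sold"] * 60 + ["unsold"] * 40
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```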

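Page 12: Evaluate Predictive Results - Metrics

◦ Accuracy: proportion of all predictions that are correct
◦ Sensitivity: proportion of actual positives that are predicted positive (true positive rate)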
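Both metrics come straight from the confusion-matrix counts. The sketch below assumes "sold" is the positive class; note that caret's confusionMatrix() uses the first factor level as the positive class unless positive= is given, so it is worth checking which level is treated as positive. The predictions are hypothetical.

```python
def confusion_counts(actual, predicted, positive="sold"):
    """Count TP, TN, FP, FN, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Share of actual positives that the model predicts as positive."""
    return tp / (tp + fn)


actual = ["sold", "sold", "sold", "unsold", "unsold", "sold"]
predicted = ["sold", "unsold", "sold", "unsold", "sold", "sold"]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print(accuracy(tp, tn, fp, fn), sensitivity(tp, fn))
```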

Page 13

Base model: one model with the original input data

Consider two approaches
◦ One model with additional derived cluster-average sales variables
◦ Many models, one per cluster, with a weighted-average overall metric

Compare results
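The "derived cluster-average sales variables" can be sketched as a group-by mean: each store gets the average of a sales feature across all stores in its cluster, added as a new column alongside the base-model variables. The field names below (cluster, sales_recent) are hypothetical. In practice one would likely compute these averages from the training data only, so the test set does not leak into the features.

```python
from collections import defaultdict


def add_cluster_average(rows, feature, cluster_key="cluster"):
    """Return new rows with '<feature>_cluster_avg' = mean of `feature` in the row's cluster."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[cluster_key]] += row[feature]
        counts[row[cluster_key]] += 1
    new_name = f"{feature}_cluster_avg"
    # Copy each row so the input list is left unchanged.
    return [
        {**row, new_name: totals[row[cluster_key]] / counts[row[cluster_key]]}
        for row in rows
    ]


# Hypothetical store-level sales of one product plus each store's cluster label.
stores = [
    {"store": 1, "cluster": "A", "sales_recent": 12},
    {"store": 2, "cluster": "A", "sales_recent": 8},
    {"store": 3, "cluster": "B", "sales_recent": 0},
    {"store": 4, "cluster": "B", "sales_recent": 2},
]
enriched = add_cluster_average(stores, "sales_recent")
for row in enriched:
    print(row)
```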

Page 14: Conclusion

The jury is still out.
◦ Stability over other samples
◦ Improve the predictive models:
  ◦ Classifier tuning
  ◦ Modify the number of clusters
  ◦ Add other variables to the clustering

Page 15: References: R caret Package

Kuhn, M. (2013). Predictive Modeling with R and the caret Package. useR! 2013. Retrieved November 10, 2016, from https://www.r-project.org/nosvn/conferences/useR-2013/Tutorials/kuhn/user_caret_2up.pdf

Predictive modelling fun with the caret package. (2014). Retrieved November 10, 2016, from https://www.r-bloggers.com/predictive-modelling-fun-with-the-caret-package/

Kuhn, M. (n.d.). The caret Package. Retrieved November 10, 2016, from https://topepo.github.io/caret/index.html

Page 16: References: Information Pooling

Ali, O. G., Sayin, S., Van Woensel, T., & Fransoo, J. (2007, October 30). Pooling Information Across SKUs for Demand Forecasting with Data Mining. Retrieved November 14, 2016, from Researchgate.net
The authors evaluate different methods for forecasting demand and consider pooling SKUs by category.

Page 17: References: R Clustering

Oksanen, J. (2014, January 26). Cluster Analysis: Tutorial with R. Retrieved November 10, 2016, from http://cc.oulu.fi/~jarioksa/opetus/metodi/sessio3.pdf
The author presents R code to perform hierarchical and fuzzy clustering.

Quick-R. (n.d.). Retrieved November 10, 2016, from http://www.statmethods.net/advstats/cluster.html
The Quick-R site gives a very good brief overview of writing clustering code in R.

Raftery et al. (2012, June). Mclust Version 4 for R: Normal Mixture Modeling for Model ... Retrieved November 10, 2016, from http://www.stat.washington.edu/research/reports/2012/tr597.pdf
The authors present mixture-model explanations with accompanying R code from the mclust R package.

Page 18: Linda Schumacher

Education
◦ Stanford: Graduate Certificate in Data Mining and Applications
◦ Oklahoma State: Graduate Certificate in Business Data Mining
◦ University of Illinois: MS in Computer Science with Thesis
◦ Stevens Institute: BS in Computer Science (Applied Math, OR)

Contact Information
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/linda-schumacher-9b1b214

Slides will be available on www.slideshare.net
