Enterprise Business Intelligence (BUS5EBI) SAS Enterprise Miner Assignment

Date post: 30-Nov-2023

Submitted By:

Ajay Bhandary: 17841548

Sobin Sebastian: 17835795

Diana :

Sandhyabahen H Patel: 17625595

BUS5EBI

Enterprise Business Intelligence

Submitted To:

Department of Management

La Trobe Business School

La Trobe University

Unit Coordinator: Dr. Mei-Tei Chu

Contents

Task 1: Analytic Objective
Task 2: Data Analysis and Definition
    2.1, 2.2, 2.3, 2.4
Task 3: Cluster and Association analysis
    3.2.1, 3.2.2, 3.2.3
Task 4: Predictive Modeling
    4.1.1, 4.1.2, 4.1.3, 4.1.4
    4.2.1, 4.2.2, 4.2.3, 4.2.4
    4.3.1, 4.3.2
Task 5: Compare your models
    5.1, 5.2, 5.3
Task 6: Business Implication
    6.1, 6.2
Appendix

Task 1: Analytic Objective

Case study:

IGM (Independent Grocers of Melbourne) is a chain of supermarkets in Melbourne, Australia, with 10 stores at various locations across the city. The first retail store opened in 2000, and its current customers are the inhabitants of Melbourne. The core objective of the business is to sell its products and to find out which products are sold together. The stores sell 17 products. However, because of intense competition in the retail sector, IGM is losing customers and its profit margins have shrunk. As a result, IGM is focusing on finding associations between its products and on analysing its highest-selling products.

IGM is planning to deploy data mining to perform relationship analysis. IGM wants to examine its transactions and understand which of its 17 products are purchased in combination and which products have the most sales. IGM has therefore chosen sequence analysis of a sample of its customer base. The transactions data set has exactly 459,258 rows, and each row represents one transaction and product combination.

Task 2: Data Analysis and Definition

There are four variables in the data set:

Name          Model Role   Measurement Level   Description
PRODUCT       Target       Nominal             Product purchased
QUANTITY      Rejected     Interval            Quantity of this product purchased
STORE         Rejected     Interval            Identification number of the store
TRANSACTION   ID           Nominal             Transaction identification number

2.1 Are there any unusual data values in any of your assigned input variables? Support your answer with appropriate argument.

There are no unusual data values in any of the input variables. In Figure 1, Graph 4 shows spikes in the data, but these are expected because Magazine, Candy Bar and similar items are the highest-selling products. The spikes in Graph 3 are also expected, as that graph shows the number of transactions in each store.

Figure 1

2.2 List two possible strategies to handle cases with unusual values before attaching your desired analysis node? Explain the possible scenarios in which those strategies are appropriate.

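We can use the Replacement or Filter tools to handle unusual values. The Replacement tool replaces incorrect values with more appropriate ones, which is suitable when the record itself is valid and should stay in the analysis. The Filter tool excludes unwanted records from the analysis, which is suitable when the whole record is unreliable or irrelevant.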
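The difference between the two strategies can be illustrated outside SAS Enterprise Miner with a minimal Python sketch. The rows, the `max_quantity` limit, and both function names are hypothetical and are not taken from the IGM data set or from the tool itself.

```python
# Hypothetical rows: (transaction_id, product, quantity). Not real IGM data.
rows = [
    (1, "Candy Bar", 2),
    (2, "Magazine", 500),   # implausibly large quantity
    (3, "Perfume", 1),
]

def filter_rows(rows, max_quantity):
    """Filtering: drop every record whose quantity exceeds the limit."""
    return [r for r in rows if r[2] <= max_quantity]

def replace_values(rows, max_quantity):
    """Replacement: keep every record but cap the quantity at the limit."""
    return [(tid, product, min(qty, max_quantity)) for tid, product, qty in rows]

filtered = filter_rows(rows, max_quantity=10)
replaced = replace_values(rows, max_quantity=10)
print(len(filtered), replaced[1])
```

Filtering shrinks the data set, while replacement keeps the row count unchanged, which matters when every transaction must still be counted.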

2.3 Are there any missing values in any of the input variables? (Note: Zero (0) is not considered to be a missing value).

No. The histograms of the variables and the output of the StatExplore tool both show that there are no missing values.

Figure 2
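As a rough illustration of the check described above, the following Python sketch counts missing values per variable while treating zero as a valid value. The sample rows and the `count_missing` helper are hypothetical; the report itself relied on the StatExplore output.

```python
def count_missing(rows, columns):
    """Count missing entries per column; None and empty strings are missing, 0 is not."""
    counts = {c: 0 for c in columns}
    for row in rows:
        for c in columns:
            value = row.get(c)
            if value is None or value == "":
                counts[c] += 1
    return counts

# Hypothetical sample, not taken from the IGM transactions.
sample = [
    {"PRODUCT": "Magazine", "QUANTITY": 0},    # zero is a real value
    {"PRODUCT": "", "QUANTITY": None},         # both fields missing
]
missing = count_missing(sample, ["PRODUCT", "QUANTITY"])
print(missing)
```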

2.4 If you assigned a variable a rejected role, why is this the case?

We assigned the Rejected role to Quantity and Store because they are not relevant to the association analysis, and we want them excluded from the data mining analysis in the process flow.

Task 3: Cluster and Association analysis

We decided to go with association analysis instead of clustering as the market analysis is more relevant to our business case.

3.2.1 What is the highest lift value for the resulting rules?

The highest lift value among the resulting rules is 3.60. It is shared by the first and second rules, which pair the same two products in opposite directions.

3.2.2 Which rule has this value?

Rule 1: Perfume ==> Toothbrush

Rule 2: Toothbrush ==> Perfume

As marked in figure 3.

Figure 3
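Lift compares how often two products appear together with how often they would appear together if they were bought independently. The sketch below uses the standard definition, support(A and B) / (support(A) x support(B)), on a handful of hypothetical baskets; the baskets are illustrative and do not reproduce the 3.60 value from the report.

```python
def lift(transactions, a, b):
    """Lift of the rule a ==> b over a list of baskets (sets of product names)."""
    n = len(transactions)
    support_a = sum(1 for t in transactions if a in t) / n
    support_b = sum(1 for t in transactions if b in t) / n
    support_ab = sum(1 for t in transactions if a in t and b in t) / n
    return support_ab / (support_a * support_b)

# Hypothetical baskets, not the IGM transactions.
baskets = [
    {"Perfume", "Toothbrush"},
    {"Perfume", "Toothbrush"},
    {"Magazine"},
    {"Candy Bar"},
]
forward = lift(baskets, "Perfume", "Toothbrush")
reverse = lift(baskets, "Toothbrush", "Perfume")
print(forward, reverse)
```

Because the formula is symmetric in the two products, a rule and its reverse always have the same lift, which is why Rule 1 and Rule 2 share one value.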

3.2.3 Why was an Association Analysis run?

IGM wants to examine its customers' transactions to understand which of its 17 products customers are most likely to buy and whether any of those products tend to be bought together.

Task 4: Predictive Modeling

Data Analysis and Definition

Name          Model Role   Measurement Level   Description
PRODUCT       Target       Nominal             Product purchased
QUANTITY      Input        Interval            Quantity of this product purchased
STORE         Input        Interval            Identification number of the store
TRANSACTION   ID           Nominal             Transaction identification number

4.1.1 Why was the target variable assigned that variable role?

PRODUCT was assigned the Target role because IGM wants to analyse the sales of its products and the relationships between products found in the association analysis.

4.1.2 How many leaves are there in the optimal tree created in step 6? Which variable was used for the first split and explain why this variable was chosen over others?

The optimal tree has 13 leaves using a two-way split, as shown in Figure 4 below.

The Quantity variable was used for the first split. It was chosen because it has the higher logworth value, and because the lift value of the right node is closer to the highest lift value (3.6 from the association analysis), as shown in Figures 4 and 5 below.

Figure 4

Figure 5
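Logworth is defined as the negative base-10 logarithm of a split's p-value, so a smaller p-value gives a larger logworth and a stronger candidate split. The short Python sketch below shows that relationship; the p-values assigned to QUANTITY and STORE are hypothetical and are not read from Figures 4 or 5.

```python
import math

def logworth(p_value):
    """Logworth of a candidate split: -log10 of its p-value."""
    return -math.log10(p_value)

# Hypothetical p-values for two candidate first splits.
candidates = {"QUANTITY": 1e-10, "STORE": 0.01}
scores = {name: logworth(p) for name, p in candidates.items()}

# The variable with the largest logworth is chosen for the split.
best = max(scores, key=scores.get)
print(scores, best)
```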

4.1.3 How many leaves are there in the optimal tree created in step 9?

This optimal tree has 29 leaves in total, using a three-way split.

Figure 6

4.1.4 Which of the decision tree models appears to be better? a. Based on average squared error on training data? b. Based on average squared error on validation data

Figure 7

Note: Tree3 in the predecessor node corresponds to Decision tree 1 in the assignment.

In both cases a) and b), decision tree 2 performs better than decision tree 1 (as named in the assignment).
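For a nominal target such as PRODUCT, average squared error can be understood as the mean squared difference between the 0/1 class indicators and the predicted class probabilities. The sketch below assumes the average is taken over both cases and target levels; that divisor, the class labels, and the probabilities are assumptions for illustration, not values from Figure 7.

```python
def average_squared_error(actual, predicted_probs, classes):
    """Mean of (indicator - probability)^2 over every case and every class."""
    total = 0.0
    for label, probs in zip(actual, predicted_probs):
        for c in classes:
            indicator = 1.0 if c == label else 0.0
            total += (indicator - probs.get(c, 0.0)) ** 2
    return total / (len(actual) * len(classes))

# Hypothetical two-class example.
classes = ["Magazine", "Perfume"]
actual = ["Magazine", "Perfume"]
probs = [
    {"Magazine": 1.0, "Perfume": 0.0},   # perfect prediction
    {"Magazine": 0.5, "Perfume": 0.5},   # undecided prediction
]
ase = average_squared_error(actual, probs, classes)
print(ase)
```

A lower value on the validation data indicates a model whose predicted probabilities generalise better, which is the basis for the comparison in parts a) and b).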

4.2.1 In preparation for regression, is any missing value imputation needed? If yes, should you do this imputation before generating the decision tree models? Why or why not?

No imputation is needed, because the StatExplore output shows 0 in the Missing column for every variable.

Figure 8

4.2.2 Which variables are included in the final regression model generated in step (xii)? List the variables in the descending order of importance to the model.

In the Type 3 Analysis, a Pr > ChiSq value near 0 indicates a significant input, and a value near 1 indicates an insignificant input. Quantity is the input variable in the model, and its Pr > ChiSq value is close to 0, so it is a significant input.

Figure 9

4.2.3 Which variables are included in the final regression model generated in the last step?

The result is the same: Quantity is the input variable that predicts the target variable, Product.

Figure 10

4.2.4 Based on average squared error on the validation data, which of the two regression models generated appear to be better?

There is no difference between the two regression models, as shown in the iteration plots in Figure 11 and Figure 12 below.

Figure 11(Without transform variable)

Figure 12 (With transform variable)

4.3.1 How many weights does the neural network model generated in step (xvii) include?

As marked in the figure below, the neural network model includes 73 weights.

Figure 13
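The number of weights in a single-hidden-layer network follows from its layer sizes: each hidden unit has one weight per input plus a bias, and each output unit has one weight per hidden unit plus a bias. The report does not state the architecture, so the layer sizes below (2 inputs, 3 hidden units, 16 output units) are purely an assumption; they are one configuration that happens to give 73.

```python
def mlp_weight_count(n_inputs, n_hidden, n_outputs):
    """Weights (including biases) in a network with one hidden layer."""
    input_to_hidden = (n_inputs + 1) * n_hidden      # +1 for each hidden bias
    hidden_to_output = (n_hidden + 1) * n_outputs    # +1 for each output bias
    return input_to_hidden + hidden_to_output

# Assumed layer sizes, not confirmed by the report.
total_weights = mlp_weight_count(n_inputs=2, n_hidden=3, n_outputs=16)
print(total_weights)
```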

4.3.2 Examine the validation average squared error of the neural network model. How does it compare to the two decision tree models and the regression model generated after applying log transformation?

Figure 14 (panels: Decision tree 1, Decision tree 2, Regression, Neural Network)

Figure 14 above compares the average squared error of the different models.

Task 5: Compare your models

5.1 Examine the results of the Model Comparison node. Of the predictive models compared, which model has been selected by the Model Comparison node? Based on what selection criteria has this model been selected?

By default, the Model Comparison node uses the misclassification rate to select the best model, and the model it selects is Decision Tree (2).
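Misclassification rate is the share of cases whose predicted class differs from the actual class, and the model with the lowest rate is preferred. The following Python sketch shows the calculation and the selection step; the labels and the two sets of predictions are hypothetical and do not come from the Model Comparison output.

```python
def misclassification_rate(actual, predicted):
    """Fraction of cases where the predicted class is not the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)

# Hypothetical validation labels and predictions from two models.
actual = ["Magazine", "Perfume", "Magazine", "Candy Bar"]
predictions = {
    "Decision Tree 1": ["Magazine", "Magazine", "Perfume", "Candy Bar"],
    "Decision Tree 2": ["Magazine", "Perfume", "Perfume", "Candy Bar"],
}
rates = {name: misclassification_rate(actual, p) for name, p in predictions.items()}

# Select the model with the lowest misclassification rate.
selected = min(rates, key=rates.get)
print(rates, selected)
```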

5.2 Change the default values of the Model Comparison node properties so that it selects the model having the least average squared error on the validation data. Run the Model Comparison node again. Which model has been selected now?

There is no change.

Figure 15

5.3 Why are the models compared?

When there is more than one predictive model, the Model Comparison node is used to evaluate and compare them and to help select the best one.

Task 6: Business Implication

6.1 From the outcome of your analysis of the data set and the business case you have come up with, what can you deduce, recommend and conclude?

IGM wants to increase its product sales, and to do so it is looking for combinations of products that attract customers and indirectly boost sales. Figure 16 indicates a strong association between toothbrush and perfume, so IGM could strategically offer discounts on one of them and place the two products close together so that sales of both increase.

Figure 16

Moreover, in the regression model of our predictive analysis we notice high odds ratios for Deodorants and Pain reliever (see Figure 17), so IGM could promote these products to increase their sales further.

Figure 17
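In logistic regression, the odds ratio for an input is the exponential of its coefficient, so a value well above 1 means the odds of the outcome rise as the input increases. The sketch below shows only that conversion; the coefficient values are hypothetical and are not the estimates shown in Figure 17.

```python
import math

def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)

# Hypothetical coefficients for illustration only.
coefficients = {"no effect": 0.0, "positive effect": math.log(2)}
ratios = {name: odds_ratio(b) for name, b in coefficients.items()}
print(ratios)
```

A coefficient of 0 gives an odds ratio of 1 (no effect), while a coefficient of ln 2 doubles the odds.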

6.2 What are the business implications that can be drawn from the process of building and comparing these models, and has this practice helped resolve the business issue? Why or why not?

Constructing and assessing different models (decision tree, regression and neural network) gives us different ways of understanding the predictive outcomes of our analysis of the data set. The different models give similar results with minor differences, so we can use the Model Comparison node to select the strongest model. We found strong evidence supporting our business case and can identify the association patterns in customers' purchases of IGM products. Moreover, we can predict the highest-selling product, which IGM can then promote strategically.

Appendix

