
Mining Your Warranty Data Using RFM Analysis

Rob Evans ([email protected]), Warranty Analyst, IBM

28 November 2012

RFM (recency, frequency, monetary) analysis

Assume you have 10,000 customers, it costs $1 to mail each of them an offer, and you receive $10 in revenue from each offer accepted. If you mailed the offer to all 10,000 customers, it would cost you $10,000 ($1 x 10,000), and if every customer placed an order, your revenue would be $100,000 ($10 x 10,000), for a profit of $90,000. Clearly, that would be a good business model. Unfortunately, the acceptance rate is more like 1%, so only 100 customers order and the revenue would be $1,000 ($10 x 100), which results in a loss of $9,000. The business problem here is to identify the customers who are more likely to accept an offer and mail the offer only to them.
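The arithmetic above can be checked with a few lines of Python. This is only a sketch of the example in the text. The function names `mailing_profit` and `break_even_rate` are made up for illustration, and the break-even rate is derived from the same $1 cost and $10 revenue figures rather than stated in the original.

```python
def mailing_profit(customers_mailed, acceptance_rate,
                   cost_per_mail=1.0, revenue_per_order=10.0):
    """Return (cost, revenue, profit) for one mailing campaign."""
    cost = customers_mailed * cost_per_mail
    orders = customers_mailed * acceptance_rate
    revenue = orders * revenue_per_order
    return cost, revenue, revenue - cost


def break_even_rate(cost_per_mail=1.0, revenue_per_order=10.0):
    """Acceptance rate at which revenue exactly covers the mailing cost."""
    return cost_per_mail / revenue_per_order


# Best case from the text: every customer accepts the offer.
print(mailing_profit(10_000, 1.00))  # cost 10,000, revenue 100,000, profit 90,000

# Realistic case from the text: a 1% acceptance rate.
print(mailing_profit(10_000, 0.01))  # cost 10,000, revenue 1,000, profit -9,000

# With these prices, a mailing list needs more than a 10% acceptance rate to pay off.
print(break_even_rate())
```

The break-even figure shows why selecting the right customers matters: a list whose acceptance rate is below 10% loses money at these prices.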

Figure 1-1 Finding the right customers affects profits

Direct marketers have been using RFM analysis for 30 years to more accurately identify customers who are likely to respond. RFM stands for Recency, Frequency and Monetary, and is used to score how recently a customer purchased (recency), how often they purchase (frequency), and how much they spend (monetary). RFM is based on the idea that most of your business comes from a few of your customers. Of course, one could simply sort customers by these three parameters separately and combine them in some fashion, but RFM analysis provides a simple, repeatable way to sort large lists of customers to produce higher acceptance rates.

How then does a direct mailing technique relate to warranty analysis? The need to sort large customer-transaction databases to identify top customers is similar to the need to sort large repair-transaction databases to identify the most recent, most frequent, and most expensive repairs. Each month, I go through hundreds of thousands of repair records to make sense of them. Is the trend going up? If so, which machine, which commodity, and which country? RFM is one technique I use initially to sort through all those repair records to find machines with unusual repair histories. While those machines might be outliers, they might also be the start of an identifiable trend.

In this paper, I show how I use IBM® SPSS® Modeler 14 to make my job easier by breaking down my analysis into three steps: setup, RFM analysis, and output. The process is shown in Figure 1-2 below. Note: much of how to connect to a database and assign variable roles was covered in Mining Your Warranty Data – Finding Anomalies (Part 1) (available from the SPSS community at http://www.ibm.com/developerworks/spssdevcentral), so those topics will not be covered here.

© Copyright IBM Corporation 1989, 2012.

Setup

Each month, I review thousands of warranty claims looking for trends. The trends might be increasing machine failure rates or increasing parts and labor costs. With tens of thousands of transactions, where does one begin? Figure 1-2 shows the stream for my RFM analysis for Asia Pacific (AP) using IBM® SPSS® Modeler. The stream is split into three sections – (1) setup, (2) RFM analysis, and (3) output.

Figure 1-2 Stream for the complete RFM analysis on the IBM SPSS Modeler canvas

Here are a few tips that make me more productive. First, it is useful to edit the names of the nodes to be more descriptive (e.g., “Convert Date”). The more complex the model gets, the more you will appreciate the little time it takes to add a more descriptive node name. Second, I find that adding comments to the nodes expedites troubleshooting the stream. Finally, I like turning on the cache setting (SQL node), which keeps the data locally so it does not need to be retrieved from the remote database each time I process the stream. Also, the cache can be saved to a local file, so I can run disconnected from the remote database at any time. I find the local cache feature useful for working when the remote database is off-line for maintenance.

Figure 1-3 Relational database node – Data tab

I use an SQL relational database node to get my input. Setting up the nodes is as easy as cutting and pasting the SQL query into the query area in Figure 1-3. On the Type node (Figure 1-4), I specify the role of each variable. For my analysis, I set part quantity (PART_QTY) as the “Target” and I set the role of country code (CTRY_CODE) as “None” since it is redundant with the country name and not needed. SPSS Modeler 14 uses these measurements and roles in downstream nodes.


Figure 1-4 Type node – Types tab

Figure 1-5 Type Node preview showing example of repair records

Many SPSS Modeler 14 nodes require a unique identifier such as a customer number or a loyalty card number to analyze the transactions. To translate from the marketing examples to my warranty models, I represent each machine as a customer who is shopping for parts. I create a unique identifier for each machine by concatenating (operator is “><”) the machine type (e.g., 7870) with the plant of manufacture (e.g., 23) and the serial number (e.g., ABCDE) to build a unique machine identifier (e.g., 787023ABCDE). I then assign the compound string to a new variable, “MTSN”, as shown in Figure 1-6.

Figure 1-6 Derive node to generate unique identifier using the machine type and serial number
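For readers who want to prototype the identifier outside Modeler, the same concatenation can be sketched in Python. The function name `build_mtsn` and the whitespace stripping are assumptions made for this illustration. The Derive node in the stream uses the “><” operator instead.

```python
def build_mtsn(machine_type, plant_code, serial_number):
    """Concatenate machine type, plant of manufacture, and serial number.

    Stripping whitespace is an assumption for this sketch; padded database
    fields would otherwise produce identifiers that do not match each other.
    """
    parts = (machine_type, plant_code, serial_number)
    return "".join(str(part).strip() for part in parts)


# Values taken from the example in the text.
print(build_mtsn("7870", "23", "ABCDE"))  # 787023ABCDE
```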

The variable TRANDATE (the transaction date) is internally stored as an integer. To enable SPSS Modeler 14 to process dates relative to one another, TRANDATE must be converted from integer to “date” storage. A Filler node can be used to make that conversion as shown in Figure 1-7.

Figure 1-7 Filler node to convert transaction date variable (TRANDATE)
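The idea behind the conversion can be illustrated in Python. The paper does not say how the integer encodes the date, so this sketch assumes a YYYYMMDD layout (for example, 20110517). If the source column uses a different encoding, the format string would need to change. The function name `int_to_date` is hypothetical.

```python
from datetime import date, datetime


def int_to_date(trandate):
    """Convert an integer such as 20110517 to a date (assumes YYYYMMDD)."""
    return datetime.strptime(str(trandate), "%Y%m%d").date()


service_date = int_to_date(20110517)
print(service_date.isoformat())  # 2011-05-17

# Once the value is a real date, dates can be compared and subtracted.
print((date(2011, 10, 22) - service_date).days)  # 158
```

As a plain integer, 20111022 minus 20110517 gives 505, which is meaningless as a day count. As dates, the difference is the correct 158 days.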


One pitfall is forgetting to set the correct date format (File > Stream Properties). Notice in Figure 1-8 below that the date format is set to “YYYY-MM-DD”, which should match the format of the date field coming from the source node.

Figure 1-8 Stream-Properties dialog showing the setting for the date format (YYYY-MM-DD)

RFM Nodes

Here is where the fun begins with the two RFM nodes – aggregation and analysis. The aggregation node (Figure 1-9) takes the individual transactions and creates a single record for each unique ID record. In my case, the ID record is the MTSN (Machine Type+Plant Code+Serial Number). The date field, TRANDATE, is when the machine was repaired, and COST_INUS is the cost of the repair in US dollars.


Figure 1-9 RFM aggregate node

Looking at Figure 1-10, the recency value was calculated by taking the number of days between today and the service date. For example, for MTSN=797999LM874, we have: 10/22/2011 minus 5/17/2011 = 158 days. The frequency value shows one repair, and the monetary value shows the cost of that one repair ($1048).
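The aggregation step can be approximated in plain Python to make the three values concrete. This is a sketch under stated assumptions and not the RFM aggregate node's implementation. It takes recency from the most recent repair, counts repairs for frequency, and sums repair costs for the monetary value. The function name `rfm_aggregate` and the rows marked hypothetical are invented for illustration. The first row uses the values quoted in the text.

```python
from collections import defaultdict
from datetime import date


def rfm_aggregate(transactions, reference_date):
    """Collapse (mtsn, trandate, cost) rows into one record per MTSN."""
    grouped = defaultdict(list)
    for mtsn, trandate, cost in transactions:
        grouped[mtsn].append((trandate, cost))

    summary = {}
    for mtsn, rows in grouped.items():
        latest_repair = max(trandate for trandate, _ in rows)
        summary[mtsn] = {
            "recency": (reference_date - latest_repair).days,  # days since last repair
            "frequency": len(rows),                            # number of repairs
            "monetary": sum(cost for _, cost in rows),         # total repair cost
        }
    return summary


transactions = [
    ("797999LM874", date(2011, 5, 17), 1048.0),    # example from the text
    ("HYPOTHETICAL01", date(2011, 9, 1), 200.0),   # hypothetical row
    ("HYPOTHETICAL01", date(2011, 10, 1), 350.0),  # hypothetical row
]

summary = rfm_aggregate(transactions, reference_date=date(2011, 10, 22))
print(summary["797999LM874"])
# {'recency': 158, 'frequency': 1, 'monetary': 1048.0}
```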


Figure 1-10 RFM aggregate node preview

Once the recency, frequency, and monetary values have been calculated, they are placed in bins. The default number of bins is five and the default weight for each bin is 10 as seen in Figure 1-11 below. These values can be changed, but I have found the defaults to be sufficient for my needs.

The RFM aggregate node above automatically created the recency, frequency, and monetary selections seen in Figure 1-11, so no changes are needed to the analysis node defaults.
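To give a feel for what binning does, here is a minimal Python sketch that ranks the values and splits the ranks into five equal-count bins. It is an assumption about the general technique, not a description of how the RFM analysis node bins internally. In particular, ties are split by input order here. The function name `bin_scores` and the sample values are hypothetical.

```python
def bin_scores(values, bins=5, higher_is_better=True):
    """Assign each value a score from 1 to `bins` using equal-count rank bins.

    For frequency and monetary, larger values earn higher scores. For recency
    measured in days since the last repair, pass higher_is_better=False so
    that the most recent repairs earn the highest score.
    """
    order = sorted(range(len(values)),
                   key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, index in enumerate(order):
        scores[index] = rank * bins // len(values) + 1
    return scores


# Hypothetical recency values in days; 5 days ago is the most recent repair.
recency_days = [158, 21, 300, 5, 90]
print(bin_scores(recency_days, higher_is_better=False))  # [2, 4, 1, 5, 3]
```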


Figure 1-11 RFM analysis node

The RFM analysis node calculates the RFM score (Figure 1-12). The RFM score is built using the formula:

(recency score x recency weight) + (frequency score x frequency weight) + (monetary score x monetary weight)

So the RFM score for the first record, “798999LM874” in Figure 1-12 below, is calculated from:

(3 x 10) + (1 x 10) + (1 x 10) = 50

The highest possible RFM score using five bins and 10 for weighting is given by

(5 x 10) + (5 x 10) + (5 x 10) = 150

which represents a very recent activity involving multiple claims that were very expensive.
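The formula is simple enough to verify directly. The following Python sketch reproduces the two calculations above. The function name `rfm_score` is chosen for illustration, and the default weights of 10 follow the defaults described in the text.

```python
def rfm_score(recency, frequency, monetary, weights=(10, 10, 10)):
    """Weighted sum of the recency, frequency, and monetary bin scores."""
    recency_weight, frequency_weight, monetary_weight = weights
    return (recency * recency_weight
            + frequency * frequency_weight
            + monetary * monetary_weight)


print(rfm_score(3, 1, 1))  # 50, the first record in Figure 1-12
print(rfm_score(5, 5, 5))  # 150, the maximum with five bins and weights of 10
```

Changing a weight shifts the emphasis. For example, raising only the monetary weight would push expensive repairs toward the top of the sorted list.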


Figure 1-12 RFM analysis node preview

Output

I sorted the machines from the highest to the lowest RFM score using a Sort node, and then used the Sample node shown in Figure 1-13 to get the top ten records.

Figure 1-13 Sample node for selecting the first 10 records

Remember that the maximum RFM score is 150 (5 bins and 10 for weighting) compared to the maximum value of 110 obtained in the current analysis, as seen below in Figure 1-14.
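The sort-then-sample step corresponds to a descending sort followed by taking the first ten rows. Here is a Python sketch with hypothetical records. The MTSN values and scores are invented, apart from the top score of 110 mentioned above, and the function name `top_n_by_score` is an assumption.

```python
def top_n_by_score(scored_records, n=10):
    """Sort records from highest to lowest RFM score and keep the first n."""
    ranked = sorted(scored_records, key=lambda rec: rec["rfm_score"], reverse=True)
    return ranked[:n]


# Twelve hypothetical machines with made-up RFM scores.
hypothetical_scores = [50, 110, 30, 90, 70, 40, 100, 60, 80, 30, 50, 90]
scored = [{"MTSN": f"HYPOTHETICAL{i:02d}", "rfm_score": score}
          for i, score in enumerate(hypothetical_scores)]

top10 = top_n_by_score(scored, n=10)
print(len(top10))              # 10
print(top10[0]["rfm_score"])   # 110
print(top10[-1]["rfm_score"])  # 40
```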


Figure 1-14 Sample node preview

We have collected the transactions, converted the date field, built a unique identifier, aggregated the transactions into single records by MTSN, and given the transactions an RFM score based on recency, frequency and cost to repair. Then we sorted the transactions by RFM score and extracted the first 10 records.

Now we want to bridge from the top 10 RFM scores to the detailed service records so we know the country, customer and servicer involved. We do that by using a Merge node (Figure 1-15) along with the 10 MTSNs as a key to automatically extract the detailed service records.
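Conceptually, this merge is a keyed join that keeps only the detail rows whose MTSN appears in the top-10 list. The Python sketch below shows that idea. The field names COUNTRY, CUSTOMER, and SERVICER and all record values are hypothetical, as is the function name `detail_for_top`. The Merge node settings themselves are shown only in Figure 1-15.

```python
def detail_for_top(top_mtsns, service_records):
    """Keep only the detailed service records whose MTSN is in the top list."""
    keys = set(top_mtsns)
    return [rec for rec in service_records if rec["MTSN"] in keys]


# Hypothetical detailed service records.
service_records = [
    {"MTSN": "HYPOTHETICAL01", "COUNTRY": "Country A", "CUSTOMER": "C1", "SERVICER": "S1"},
    {"MTSN": "HYPOTHETICAL01", "COUNTRY": "Country A", "CUSTOMER": "C1", "SERVICER": "S2"},
    {"MTSN": "HYPOTHETICAL02", "COUNTRY": "Country B", "CUSTOMER": "C2", "SERVICER": "S1"},
    {"MTSN": "HYPOTHETICAL03", "COUNTRY": "Country C", "CUSTOMER": "C3", "SERVICER": "S3"},
]

top_mtsns = ["HYPOTHETICAL01", "HYPOTHETICAL03"]
details = detail_for_top(top_mtsns, service_records)
for rec in details:
    print(rec["MTSN"], rec["COUNTRY"], rec["CUSTOMER"], rec["SERVICER"])
```

A machine with several repairs returns several detail rows, which is the point: the RFM score flags the machine, and the merged records show what actually happened to it.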


Figure 1-15 Merge node for extracting the detailed records associated with the top 10 MTSNs from the sample node

Summary

We have covered how to take a direct marketing technique called RFM and apply it to warranty analysis. We have learned how to import, convert and massage the data, run it through two RFM nodes, sort out the top 10 RFM scores, and use them to extract the detailed repair records to review.

There are other variations on this analysis. For example, by changing the unique ID from the machine (MTSN) to the customer or servicer, we can view the data in a completely different way. If you change the ID to customer number, then the results will be customers having recent, frequent, and costly claims. If you change the ID to servicer number, then the results will be servicers who are involved in recent, frequent, and costly repairs. The customer view might provide insights into a developing customer situation, and the servicer view might provide insights into someone who needs more training.

About the author

Rob Evans, a Bachelor of Mechanical Engineering (Georgia Institute of Technology) and a Master of Engineering (Dartmouth College), is a Warranty Analyst for IBM where he has worked for 30 years. Rob has experience in IBM’s development, sales, support, and now warranty areas. He has been awarded three patents and published numerous papers on topics ranging from material science to artificial intelligence. You can reach Rob at [email protected].


Other Papers on Data Mining Warranty Data:

Mining Your Warranty Data Using IBM DB2 Intelligent Miner for Data - Association Method

Mining Your Warranty Data – Finding Anomalies (Part 1)

Trademarks

IBM, the IBM logo, and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at http://www.ibm.com/legal/copytrade.shtml.

