Date post: | 17-Aug-2015 |
Upload: | prakhar-jajoo |
Bookbinder Case
HANZHI, NATSUMI, PRAKHAR & JOHN
Agenda
• Direct Mail Campaign
• Key question: based on the choice data we have, which segment of customers will be the most profitable to mail?
• 3 Models: RFM, Logit, and Linear Regression
• Compare the models for profit outcomes
• Predictions of future profit outcomes
• Recommendation
About the case
• BBBC is a direct marketing company looking to optimize its marketing strategies through segmentation and targeting.
• Bookbinder has used RFM modeling to this point. We now introduce Logit and Linear Regression to determine the best model.
Sample vs. Holdout data: Using the Art History of Florence book to build our model
• BBBC mailed 20,000 customers an offer for the Art History book and got a 9.03% response rate = 1806 orders
• Sample data: 1600 customers, 400 of whom purchased the book
• Holdout data: 2300 customers who represent the whole market
•WE BUILD OUR MODEL BASED ON SAMPLE DATA AND APPLY IT TO THE HOLDOUT DATA.
Variables
• Choice: 1 = purchase; 0 = no purchase
• Gender: 0 = female; 1 = male
• Amount Purchased: total money spent
• Frequency: total number of purchases
• Last Purchase: months since last purchase
• First Purchase: months since first purchase
• P_Child: number of children's books purchased
• P_Youth: number of youth books purchased
• P_Cook: number of cook books purchased
• P_DIY: number of DIY books purchased
• P_Art: number of art books purchased
RFM scoring model
Amt. Purchased (M score)   Frequency (F score)   Last Purchase, months (R score)
$1 - $99    = 1            1 to 9   = 1          1 to 3   = 4
$100 - $199 = 2            10 to 19 = 2          4 to 6   = 3
$200 - $299 = 3            20 to 29 = 3          7 to 9   = 2
$300 - $399 = 4            30 to 39 = 4          10 to 12 = 1
$400+       = 5            --                    --
RFM Scoring example
Observation   Amount purchased   M score   Frequency   F score   Last purchase   R score (recency)   Total score
1             287                3         12          2         4               3                   8
2             215                3         4           1         1               4                   8
3             261                3         2           1         1               4                   8
4             24                 1         4           1         1               4                   6
5             120                2         8           1         1               4                   7
6             66                 1         2           1         4               3                   5
7             42                 1         12          2         1               4                   7
8             233                3         8           1         2               4                   8
9             66                 1         12          2         1               4                   7
10            199                2         22          3         1               4                   9
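The scoring bands above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' spreadsheet: the function names are hypothetical, and values outside the published bands (for example a recency above 12 months) are assumed to clamp to the nearest band.

```python
def m_score(amount):
    """Monetary score: $1-$99 -> 1, $100-$199 -> 2, ..., $400+ -> 5."""
    return min(amount // 100 + 1, 5)

def f_score(frequency):
    """Frequency score: 1-9 -> 1, 10-19 -> 2, 20-29 -> 3, 30-39 -> 4."""
    return min(frequency // 10 + 1, 4)

def r_score(months_since_last):
    """Recency score: 1-3 months -> 4, 4-6 -> 3, 7-9 -> 2, 10-12 -> 1."""
    return max(4 - (months_since_last - 1) // 3, 1)

def rfm_total(amount, frequency, months_since_last):
    """Total RFM score is the sum of the three component scores."""
    return m_score(amount) + f_score(frequency) + r_score(months_since_last)

# (amount purchased, frequency, last purchase) for the ten example observations
observations = [
    (287, 12, 4), (215, 4, 1), (261, 2, 1), (24, 4, 1), (120, 8, 1),
    (66, 2, 4), (42, 12, 1), (233, 8, 2), (66, 12, 1), (199, 22, 1),
]
totals = [rfm_total(*obs) for obs in observations]
print(totals)
```

Run on the ten example observations, the sketch reproduces the Total score column of the table.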
RFM Gain Chart
Decile Mail CumMail Response CumResp PctResponse CumPct Index
1 230 230 5 5 0.02 0.02 0.25
2 230 460 25 30 0.11 0.15 1.23
3 230 690 17 47 0.07 0.23 0.83
4 230 920 28 75 0.12 0.37 1.37
5 230 1150 28 103 0.12 0.50 1.37
6 230 1380 6 109 0.03 0.53 0.29
7 230 1610 14 123 0.06 0.60 0.69
8 230 1840 29 152 0.13 0.75 1.42
9 230 2070 22 174 0.10 0.85 1.08
10 230 2300 30 204 0.13 1.00 1.47
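The derived columns of a gain chart follow mechanically from the responses per decile. The sketch below is an illustration under the assumption that Index is the decile response rate divided by the overall response rate (204 / 2300 for the holdout data); the function name and dictionary keys are hypothetical.

```python
def gain_chart(responses, mailed_per_decile):
    """Build gain-chart rows from a list of responses per decile."""
    total_resp = sum(responses)
    total_mailed = mailed_per_decile * len(responses)
    overall_rate = total_resp / total_mailed
    rows = []
    cum_resp = 0
    for decile, resp in enumerate(responses, start=1):
        cum_resp += resp
        rate = resp / mailed_per_decile
        rows.append({
            "decile": decile,
            "cum_mail": decile * mailed_per_decile,
            "response": resp,
            "cum_resp": cum_resp,
            "pct_response": rate,              # share of this decile that responded
            "cum_pct": cum_resp / total_resp,  # share of all responders captured so far
            "index": rate / overall_rate,      # lift relative to the overall rate
        })
    return rows

# Responses per decile from the RFM gain chart
rfm_rows = gain_chart([5, 25, 17, 28, 28, 6, 14, 29, 22, 30], 230)
for row in rfm_rows:
    print(row["decile"], row["cum_resp"], round(row["pct_response"], 2),
          round(row["cum_pct"], 2), round(row["index"], 2))
```

An index above 1 means the decile responds better than a random mailing of the same size would.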
RFM Lift Chart
[Figure: lift chart for the RFM model. Horizontal axis ticks run from 2 to 2,202; vertical axis runs from 0 to 250.]
Logistic Analysis
Variable           Coefficient estimate   Standard deviation   t-statistic
Gender             -0.86323               0.13745              -6.28034
Amount purchased   0.001864               0.000792             2.354247
Frequency          -0.07551               0.016594             -4.55076
Last purchase      0.611771               0.093813             6.521196
First purchase     -0.01478               0.012803             -1.15439
P_Child            -0.81125               0.116707             -6.95117
P_Youth            -0.63704               0.143378             -4.4431
P_Cook             -0.92301               0.119482             -7.7251
P_DIY              -0.90587               0.143703             -6.30378
P_Art              0.686112               0.127018             5.40171
Const-1            -0.35153               0.214384             -1.63971
Logistic Analysis
$$S = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$

$$P = \frac{1}{1 + e^{-S}}$$
Observation   Score for Logit   Probability
1             0.14468           0.53610706
2             0.12274           0.53064664
3             0.389078          0.59606066
4             -0.86744          0.29578729
5             -0.41553          0.39758769
6             0.688331          0.66559548
7             -2.53122          0.07369817
8             -0.33834          0.41621361
9             -1.56348          0.1731484
10            -1.53237          0.17764762
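To make the link between the coefficient table and the score and probability columns concrete, here is a minimal sketch that scores the first observation. The coefficients are copied from the logistic analysis table, the customer attributes come from the choice data for observation 1, and the function and variable names are hypothetical.

```python
import math

# Coefficient estimates from the logistic analysis table
INTERCEPT = -0.35153
COEFFICIENTS = {
    "gender": -0.86323,
    "amount_purchased": 0.001864,
    "frequency": -0.07551,
    "last_purchase": 0.611771,
    "first_purchase": -0.01478,
    "p_child": -0.81125,
    "p_youth": -0.63704,
    "p_cook": -0.92301,
    "p_diy": -0.90587,
    "p_art": 0.686112,
}

def logit_score(customer):
    """Linear score S = a + b1*x1 + ... + bk*xk."""
    return INTERCEPT + sum(COEFFICIENTS[name] * customer[name] for name in COEFFICIENTS)

def purchase_probability(score):
    """Logistic transform P = 1 / (1 + exp(-S))."""
    return 1.0 / (1.0 + math.exp(-score))

# Observation 1 from the choice data
obs1 = {
    "gender": 0, "amount_purchased": 287, "frequency": 12,
    "last_purchase": 4, "first_purchase": 24,
    "p_child": 0, "p_youth": 3, "p_cook": 0, "p_diy": 0, "p_art": 1,
}
score = logit_score(obs1)
prob = purchase_probability(score)
print(round(score, 4), round(prob, 4))
```

The result agrees with the first row of the table to within the rounding of the published coefficients.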
Logit Probabilities for Each Decile
Decile   Logit average prob.
1        63.25%
2        38.96%
3        27.56%
4        20.58%
5        16.37%
6        12.91%
7        9.84%
8        7.41%
9        5.07%
10       2.26%
Logistic Gain chart
Decile Mail CumMail Response CumResp PctResponse CumPct Index
1 230 230 86 86 37.39% 42.16% 4.22
2 230 460 34 120 14.78% 58.82% 1.67
3 230 690 24 144 10.43% 70.59% 1.18
4 230 920 16 160 6.96% 78.43% 0.78
5 230 1150 14 174 6.09% 85.29% 0.69
6 230 1380 12 186 5.22% 91.18% 0.59
7 230 1610 6 192 2.61% 94.12% 0.29
8 230 1840 7 199 3.04% 97.55% 0.34
9 230 2070 4 203 1.74% 99.51% 0.20
10 230 2300 1 204 0.43% 100.00% 0.05
Total 2300 204
Logistic Lift Chart
[Figure: lift chart for the logistic model. Horizontal axis ticks run from 2 to 2,212; vertical axis runs from 0 to 250.]
Linear Regression Analysis
Variable           Coefficients   Standard Error   t Stat     P-value
Intercept          0.3642         0.030741148      11.84824   4.29E-31
Gender             -0.1309        0.02003031       -6.53612   8.48E-11
Amount purchased   0.0003         0.000111042      2.464059   0.013843
Frequency          -0.0091        0.002179064      -4.17005   3.21E-05
Last purchase      0.0970         0.013558887      7.156089   1.26E-12
First purchase     -0.0020        0.001816011      -1.10263   0.270353
P_Child            -0.1263        0.01640109       -7.69817   2.41E-14
P_Youth            -0.0964        0.020109722      -4.79153   1.81E-06
P_Cook             -0.1415        0.016606434      -8.52024   3.64E-17
P_DIY              -0.1352        0.019787299      -6.83425   1.17E-11
P_Art              0.1178         0.019442683      6.061375   1.68E-09
Linear regression analysis
Observation   Choice (0/1)   Gender   Amount purchased   Frequency   Last purchase   First purchase   P_Child   P_Youth   P_Cook   P_DIY   P_Art   SCORE
1             1              0        287                12          4               24               0         3         0        0       1       0.5025512
2             1              1        215                4           1               4                0         0         0        0       1       0.4626560
3             1              1        261                2           1               2                0         0         0        0       1       0.4974206
4             1              0        24                 4           1               4                1         0         0        0       0       0.2972086
5             1              1        120                8           1               8                0         0         0        0       1       0.3923060
6             1              0        66                 2           4               16               0         0         1        1       1       0.5613168
7             1              1        42                 12          1               12               0         0         1        0       0       0.0672672
8             1              1        233                8           2               12               0         0         0        0       0       0.3943939
9             1              1        66                 12          1               12               0         0         0        0       0       0.2153247
10            1              1        199                22          1               22               0         0         0        0       1       0.2586726
$$Y = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$
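The linear regression SCORE is the same weighted sum without the logistic transform. The sketch below applies the published coefficients to observations 1 and 4; the names are hypothetical. Because the slide rounds the coefficients to four decimals (the amount coefficient is shown only as 0.0003), the computed scores match the SCORE column only approximately.

```python
# Coefficients from the linear regression table (rounded as published)
INTERCEPT = 0.3642
# Order: gender, amount, frequency, last purchase, first purchase,
#        P_Child, P_Youth, P_Cook, P_DIY, P_Art
WEIGHTS = [-0.1309, 0.0003, -0.0091, 0.0970, -0.0020,
           -0.1263, -0.0964, -0.1415, -0.1352, 0.1178]

def linear_score(features):
    """Y = a + b1*x1 + ... + bk*xk for one customer."""
    return INTERCEPT + sum(w * x for w, x in zip(WEIGHTS, features))

obs1 = [0, 287, 12, 4, 24, 0, 3, 0, 0, 1]   # published SCORE: 0.5025512
obs4 = [0, 24, 4, 1, 4, 1, 0, 0, 0, 0]      # published SCORE: 0.2972086
score1 = linear_score(obs1)
score4 = linear_score(obs4)
print(round(score1, 4), round(score4, 4))
```

Unlike the logit probability, this score is not constrained to lie between 0 and 1; it is used here only to rank customers into deciles.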
Linear regression Gain chart
Decile   Mail   CumMail   Response   CumResp   PctResponse   CumPct   Index
1        230    230       86         86        0.37          0.42     4.22
2        230    460       34         120       0.15          0.59     1.67
3        230    690       24         144       0.10          0.71     1.18
4        230    920       16         160       0.07          0.78     0.78
5        230    1150      14         174       0.06          0.85     0.69
6        230    1380      12         186       0.05          0.91     0.59
7        230    1610      6          192       0.03          0.94     0.29
8        230    1840      7          199       0.03          0.98     0.34
9        230    2070      4          203       0.02          1.00     0.20
10       230    2300      1          204       0.00          1.00     0.05
Total    2300             204
Linear regression lift chart
[Figure: lift chart for the linear regression model. Horizontal axis ticks run from 2 to 2,252; vertical axis runs from 0 to 250.]
Cost & Profit Summary
Cost of Mailing      $0.65
Cost of Each Book    $15
Overhead             $6.75
Price of Each Book   $31.95
Unit Margin          $10.20
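The unit margin and the per-decile profit figures used in the rest of the deck follow from these four numbers. The sketch below is a minimal illustration with hypothetical names; it assumes profit for a decile is responses times unit margin minus the mailing cost for every piece sent. The break-even response rate is not stated on the slide and is derived here from the same figures.

```python
MAIL_COST = 0.65      # cost of mailing one piece
BOOK_COST = 15.00     # cost of each book
OVERHEAD = 6.75       # overhead per book sold
PRICE = 31.95         # price of each book

# Margin earned on each book sold, before mailing costs
UNIT_MARGIN = PRICE - BOOK_COST - OVERHEAD

def decile_profit(responses, mailed):
    """Profit from mailing `mailed` pieces that produce `responses` orders."""
    return responses * UNIT_MARGIN - mailed * MAIL_COST

# Response rate at which margin from orders just covers the mailing cost
BREAK_EVEN_RATE = MAIL_COST / UNIT_MARGIN

print(round(UNIT_MARGIN, 2))
print(round(decile_profit(86, 230), 2))
print(round(BREAK_EVEN_RATE * 100, 2), "% break-even response rate")
```

With 230 pieces per decile, a decile needs at least 15 orders to be profitable, since 14 orders yield $142.80 against a $149.50 mailing cost.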
Comparison of the models
         Logit                                         Regression
Decile   Response   CumResp   PctResponse   CumPct     Response   CumResp   PctResponse   CumPct
1        86         86        0.373913      0.421569   86         86        0.373913      0.421569
2        34         120       0.147826      0.588235   34         120       0.147826      0.588235
3        25         145       0.108696      0.710784   24         144       0.104348      0.705882
4        18         163       0.078261      0.799020   16         160       0.069565      0.784314
5        11         174       0.047826      0.852941   14         174       0.060870      0.852941
6        6          180       0.026087      0.882353   12         186       0.052174      0.911765
7        12         192       0.052174      0.941176   6          192       0.026087      0.941176
8        7          199       0.030435      0.975490   7          199       0.030435      0.975490
9        4          203       0.017391      0.995098   4          203       0.017391      0.995098
10       1          204       0.004348      1          1          204       0.004348      1
Total    204                                           204
Profit analysis for holdout sample
                        Regression               Logit                    RFM
Decile   Cost to mail   Profit     Cum Profit    Profit     Cum Profit    Profit     Cum Profit
1        $149.50        $727.70    $727.70       $727.70    $727.70       -$98.50    -$98.50
2        $149.50        $197.30    $925.00       $197.30    $925.00       $105.50    $7.00
3        $149.50        $95.30     $1020.30      $105.50    $1030.50      $23.90     $30.90
4        $149.50        $13.70     $1034.00      $34.10     $1064.60      $136.10    $167.00
5        $149.50        -$6.70     $1027.30      -$37.30    $1027.30      $136.10    $303.10
6        $149.50        -$27.10    $1000.20      -$88.30    $939.00       -$88.30    $214.80
7        $149.50        -$88.30    $911.90       -$27.10    $911.90       -$6.70     $208.10
8        $149.50        -$78.10    $833.80       -$78.10    $833.80       $146.30    $354.40
9        $149.50        -$108.70   $725.10       -$108.70   $725.10       $74.90     $429.30
10       $149.50        -$139.30   $585.80       -$139.30   $585.80       $156.50    $585.80
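The cumulative profit columns can be rebuilt from the responses per decile, which also shows at which mailing depth each model's cumulative profit peaks on the holdout data. This is a sketch with hypothetical names; the response counts are taken from the model comparison and RFM gain chart, and the unit margin ($10.20) and mailing cost ($0.65) from the cost summary.

```python
from itertools import accumulate

def cumulative_profit(responses, mailed=230, margin=10.20, mail_cost=0.65):
    """Cumulative profit after mailing deciles 1..n, for each n."""
    profits = [r * margin - mailed * mail_cost for r in responses]
    return list(accumulate(profits))

def best_depth(responses):
    """Return (number of deciles to mail, cumulative profit at that depth)."""
    cum = cumulative_profit(responses)
    depth = max(range(len(cum)), key=lambda i: cum[i]) + 1
    return depth, cum[depth - 1]

RESPONSES = {
    "Regression": [86, 34, 24, 16, 14, 12, 6, 7, 4, 1],
    "Logit": [86, 34, 25, 18, 11, 6, 12, 7, 4, 1],
    "RFM": [5, 25, 17, 28, 28, 6, 14, 29, 22, 30],
}
results = {model: best_depth(resp) for model, resp in RESPONSES.items()}
for model, (depth, profit) in results.items():
    print(f"{model}: mail {depth} deciles, cumulative profit ${profit:.2f}")
```

All three models end at the same $585.80 when every decile is mailed, because the total number of responders (204) is the same; the models differ only in how early they capture those responders.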
Graphical comparison between three models
[Figure: lift curves for the three models on one chart. Horizontal axis ticks run from 2 to 2,207; vertical axis runs from 0 to 250.]
Comparison of the profit models
• RFM = weaker analysis
• Logit analysis = theoretical purposes only
• Logit probabilities = not true probabilities
• Regression analysis = most appropriate analysis
• Segmentation into deciles = saving the mail cost
• Decile with higher response rate = more profits
Whole market profit analysis using Linear Regression model
Decile Mail CumMail Response CumResp PctResponse CumPct Index CostToMail Profit CumProfit
1 5000 5000 1870 1870 0.37 0.42 4.22 $3,250.00 $15,819.57 $15,819.57
2 5000 10000 739 2609 0.15 0.59 1.67 $3,250.00 $4,289.13 $20,108.70
3 5000 15000 522 3130 0.10 0.71 1.18 $3,250.00 $2,071.74 $22,180.43
4 5000 20000 348 3478 0.07 0.78 0.78 $3,250.00 $297.83 $22,478.26
5 5000 25000 304 3783 0.06 0.85 0.69 $3,250.00 -$145.65 $22,332.61
6 5000 30000 261 4043 0.05 0.91 0.59 $3,250.00 -$589.13 $21,743.48
7 5000 35000 130 4174 0.03 0.94 0.29 $3,250.00 -$1,919.57 $19,823.91
8 5000 40000 152 4326 0.03 0.98 0.34 $3,250.00 -$1,697.83 $18,126.09
9 5000 45000 87 4413 0.02 1.00 0.20 $3,250.00 -$2,363.04 $15,763.04
10 5000 50000 22 4435 0.00 1.00 0.05 $3,250.00 -$3,028.26 $12,734.78
Total 50000 4435 0.09 $32,500.00 $12,734.78
Whole market profit analysis using logistic Regression model
Decile Mail CumMail Response CumResp PctResponse CumPct Index CostToMail Profit CumProfit
1 5000 5000 1870 1870 0.37 42.16% 4.22 $3,250.00 $15,819.57 $15,819.57
2 5000 10000 739 2609 0.15 58.82% 1.67 $3,250.00 $4,289.13 $20,108.70
3 5000 15000 543 3152 0.11 71.08% 1.23 $3,250.00 $2,293.48 $22,402.17
4 5000 20000 391 3543 0.08 79.90% 0.88 $3,250.00 $741.30 $23,143.48
5 5000 25000 239 3783 0.05 85.29% 0.54 $3,250.00 ($810.87) $22,332.61
6 5000 30000 130 3913 0.03 88.24% 0.29 $3,250.00 ($1,919.57) $20,413.04
7 5000 35000 261 4174 0.05 94.12% 0.59 $3,250.00 ($589.13) $19,823.91
8 5000 40000 152 4326 0.03 97.55% 0.34 $3,250.00 ($1,697.83) $18,126.09
9 5000 45000 87 4413 0.02 99.51% 0.20 $3,250.00 ($2,363.04) $15,763.04
10 5000 50000 22 4435 0.00 100.00% 0.05 $3,250.00 ($3,028.26) $12,734.78
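The whole-market tables scale each holdout decile of 230 customers up to a decile of 5,000 customers at the same response rate. The sketch below reproduces that projection for the logit responses; the function name and tuple layout are hypothetical, and fractional responses are kept unrounded, which appears to be how the profit columns were computed.

```python
def project_to_mailing(responses, holdout_per_decile=230, mailing_per_decile=5000,
                       margin=10.20, mail_cost=0.65):
    """Scale holdout responses to a larger mailing and compute profit per decile.

    Returns a list of (scaled responses, profit, cumulative profit) tuples.
    """
    factor = mailing_per_decile / holdout_per_decile
    rows = []
    cum_profit = 0.0
    for resp in responses:
        scaled = resp * factor
        profit = scaled * margin - mailing_per_decile * mail_cost
        cum_profit += profit
        rows.append((scaled, profit, cum_profit))
    return rows

logit_rows = project_to_mailing([86, 34, 25, 18, 11, 6, 12, 7, 4, 1])
for decile, (scaled, profit, cum) in enumerate(logit_rows, start=1):
    print(decile, round(scaled), f"{profit:,.2f}", f"{cum:,.2f}")
```

Passing the regression responses instead gives the corresponding linear regression projection.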
Profit Estimation for 3 Models
RFM
• Mails to 10 deciles
• 585 / 2300 = 25 cents per mailing piece
• Applied to 1 mailing: 0.25 x 50000 = $12,500
• Applied to whole database (500,000): 12,500 x 10 = $125,000
• x 12 mos = $1,500,000
REGRESSION
• Mails to 3 deciles
• 1020 / 2300 = 44.34 cents per mailing piece
• Applied to 1 mailing: 0.4434 x 50000 = $22,170
• 22,170 x 3 = $66,510
• Applied to whole database (500,000): 66,510 x 10 = $665,100
• x 12 mos = $7,981,200
LOGIT
• Mails to 3 deciles
• 1030 / 2300 = 44.78 cents per mailing piece
• Applied to 1 mailing: 0.4478 x 50000 = $22,390
• 22,390 x 3 = $67,170
• Applied to whole database (500,000): 67,170 x 10 = $671,700
• x 12 mos = $8,060,400
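The last two steps of each estimate (scaling one 50,000-piece mailing to the 500,000-customer database, then to 12 monthly mailings) can be checked with a short sketch. The per-mailing figures are taken as given from the bullets above rather than re-derived, and the function name is hypothetical.

```python
# Profit attributed to one 50,000-piece mailing, as stated for each model
PER_MAILING_PROFIT = {"RFM": 12_500, "Regression": 66_510, "Logit": 67_170}

def annualize(per_mailing_profit, database_size=500_000, mailing_size=50_000, months=12):
    """Scale one mailing's profit to the whole database and to a year of mailings."""
    mailings_per_database = database_size // mailing_size  # 10 in this case
    return per_mailing_profit * mailings_per_database * months

annual = {model: annualize(p) for model, p in PER_MAILING_PROFIT.items()}
for model, value in annual.items():
    print(f"{model}: ${value:,} per year")
```

Integer dollar amounts are used so that the multiplication is exact.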
Recommendation
Stop RFM
Start using Logit or Regression
Use Regression for the smaller mail shops
Use Logit in cases like BBBC where the marginal difference between logit and regression translates into significant dollar value.
1 cent per piece $1,200 per month $144,000 per year covers cost of software and data analyst.
Recommendation for BBBC: purchase a statistics package that supports logit, hire a data analyst, and automate the segmentation / optimization / mailing process.
Thank you.