Using Peer-to-Peer Downloading Behavior to Forecast Pre-launch Sales of Music:
A Bayesian Analysis
Il-Horn Hann, Joo Hee Oh
Marshall School of Business
University of Southern California
1. Research Motivation
We want to measure and understand the behavior of online system users and link it to business forecasts, using individual-level system usage data from a peer-to-peer network (ARES).
2. Background
1. Lee, Boatwright and Kamakura (Management Science 2003)
- Bayesian model for pre-launch sales forecasting for Billboard 200 albums
• Provides pre-launch sales forecasts for albums in the Billboard 200
• Does not provide pre-launch sales forecasts for unsuccessful albums
• Does not provide a point estimate of first-week sales after launch
2. Wendy Moe and Peter Fader (Marketing Science 2002)
- Using Advance Purchase Orders to Forecast New Product Sales
• Generates sales forecasts for music albums using advance-purchase orders
• Results generalize only to a limited extent, because advance purchasers are a special type of consumer
3. Research Question
How can we develop an empirical model that uses downloading behavior data from a peer-to-peer network to generate pre-launch forecasts of music sales for the first week after launch?
4. Model Development
Hierarchical Random-Effects Model
tY ttQP ttt XbaPQ (Preference or Quality Variable)
$y_{it} = X_{it}\beta_i + \varepsilon_{it}$, where $\varepsilon_i \sim N(0, \Sigma_i)$, $i = 1, \ldots, n$, $t = 1, \ldots, T_i$
$\Sigma_i = \sigma_i^2 I_{T_i}$
$y_{it}$: weekly sales of album i at week t
$X_{it}$: - number of downloads from the peer-to-peer network in the previous week
- total number of previous available dates for files in the previous week
- weekly dummy variables for the 2nd, 3rd, and 4th release week
$\beta_i$: vector representing album i's sensitivity to the number of downloads
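As a rough illustration of the first-level covariates, one row of X for album i in week t could be assembled as follows. The function name, argument names, and column order are hypothetical, and the slides do not say whether an intercept is included, so none is added here.

```python
def design_row(prev_week_downloads, prev_week_available_dates, week):
    """Build one covariate row for an album in a given release week (1 to 4).

    Columns (order is an assumption, not taken from the slides):
    lagged downloads, lagged available dates, dummies for weeks 2, 3, 4.
    """
    if week not in (1, 2, 3, 4):
        raise ValueError("week must be 1, 2, 3 or 4")
    # Week 1 is the baseline, so all three dummies are zero for it.
    dummies = [1.0 if week == w else 0.0 for w in (2, 3, 4)]
    return [float(prev_week_downloads), float(prev_week_available_dates)] + dummies


# Example: third release week, 1200 downloads and 5 available dates last week.
row = design_row(1200, 5, 3)
```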
Second-level Model Hierarchy
$\beta_i = \Delta' z_i + v_i$, where $v_i \sim N(0, V_\beta)$, $i = 1, \ldots, n$
$\beta_i$: vector representing album i's sensitivity to the number of downloads, the number of available dates, and the weekly dummies for the launch date
$z_i$: vector representing album i's characteristics: total number of previous albums of the artist, genre of music, gender of artist
$\Delta$: matrix representing the sensitivity of $\beta$ to album i's characteristics
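To make the two levels concrete, the sketch below simulates one album: it first draws the album's coefficient vector from its characteristics, then draws weekly sales from the covariates. It is a simplified illustration, not the authors' code; it assumes independent second-level errors (a diagonal covariance), and all names and numbers are hypothetical.

```python
import random


def simulate_album(z_i, delta, v_sd, sigma_i, X_i, rng):
    """Simulate one album from a two-level linear model.

    z_i     : list of m album characteristics
    delta   : m x k matrix (list of rows), second-level coefficients
    v_sd    : k standard deviations of the second-level errors
              (diagonal covariance is an assumption made here)
    sigma_i : standard deviation of the weekly sales error
    X_i     : list of T rows, each with k covariates
    """
    k = len(delta[0])
    # Second level: beta_i[j] = sum_r delta[r][j] * z_i[r] + v_ij
    beta_i = [
        sum(delta[r][j] * z_i[r] for r in range(len(z_i))) + rng.gauss(0.0, v_sd[j])
        for j in range(k)
    ]
    # First level: y_it = X_it . beta_i + eps_it
    y_i = [
        sum(x * b for x, b in zip(x_t, beta_i)) + rng.gauss(0.0, sigma_i)
        for x_t in X_i
    ]
    return beta_i, y_i


rng = random.Random(0)
beta_i, y_i = simulate_album(
    z_i=[1.0, 2.0],
    delta=[[1.0, 0.5], [2.0, -1.0]],
    v_sd=[0.1, 0.1],
    sigma_i=0.5,
    X_i=[[1.0, 2.0], [0.0, 1.0], [3.0, 1.0], [2.0, 2.0]],
    rng=rng,
)
```

Setting both error scales to zero turns the simulation into a deterministic check of the two linear maps.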
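Conjugate Prior-Distribution for Hierarchical Linear Models
For $\Delta$: $\mathrm{vec}(\Delta) \sim N(\mathrm{vec}(\bar{\Delta}), V_\beta \otimes A^{-1})$
Set $\bar{\Delta} = 0$ and the accompanying prior matrix to $200\,I$, where $I$ is the identity matrix
For $V_\beta$: $V_\beta^{-1} \sim \mathrm{Wishart}(\nu_{b0}, V_{b0})$, where $\nu_{b0} = k + 4$ and $V_{b0} = \nu_{b0} I_k$
For $\sigma_i^2$: adopt the standard inverse gamma prior, with $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_m^2)$ and each $\sigma_i^2$ following an independent inverted gamma distribution
Model hyper-parameters: Rossi & Allenby et al. (2005)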
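[Diagram] Model hierarchy: the hyper-parameters ($\mathrm{vec}(\Delta) \sim N(\mathrm{vec}(\bar{\Delta}), V_\beta \otimes A^{-1})$, $V_\beta^{-1} \sim \mathrm{Wishart}(\nu_{b0}, V_{b0})$, inverted gamma $\sigma_i^2$) and the album characteristics $Z = [z_1', z_2', \ldots, z_m']'$ determine the album-level coefficients $\beta_i = \Delta' z_i + v_i$, $v_i \sim \text{iid } N(0, V_\beta)$. Together with the downloading behavior $X$, these generate weekly sales $y_{it} = X_{it}\beta_i + \varepsilon_{it}$. Albums 1 to 50 form the estimation/calibration set and albums 51 to 74 form the hold-out sample.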
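Joint posterior: $p(\beta, \sigma^2, \Delta, V_\beta \mid y_t) \propto p(y_t \mid \beta, \sigma^2)\, p(\beta \mid \Delta, V_\beta)\, p(\Delta \mid V_\beta)\, p(V_\beta)\, p(\sigma^2)$
Gibbs-Sampling
Draw $\beta_i \mid y_{it}, X_{it}, z_i, \Delta, V_\beta^{-1}, \sigma_i^2$ and $\sigma_i^2 \mid y_{it}, X_{it}, \beta_i, s_o^2$
Draw $\Delta \mid \beta_i, z_i, V_\beta^{-1}, A$ and $V_\beta^{-1} \mid \beta_i, z_i, \Delta$
Repeat, as necessary.
Forecasting Model
$y_{it} = X_{it}\beta_i + \varepsilon_{it}$, $i = 1, \ldots, n$, $t = 1, \ldots, T_i$
$\beta_i = \Delta' z_i + v_i$, $v_i \sim \text{iid } N(0, V_\beta)$
Point-Estimate of 1st week Sales: $\hat{y}_{it} = X_{it}\hat{\beta}_i$ at $t = 1$
$X_{it}$: average number of downloads before launch, number of available dates, weekly dummy variables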
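The draw-and-repeat cycle can be sketched in a deliberately reduced setting. The code below is not the authors' sampler: it assumes a single covariate and a single album characteristic (so every coefficient is a scalar), a normal prior on the second-level slope, and inverse gamma priors on all variances. The names `gibbs_scalar`, `tau2`, `a0`, and `b0` are hypothetical.

```python
import random


def gibbs_scalar(y, x, z, n_iter=2000, seed=0, tau2=100.0, a0=2.0, b0=1.0):
    """Gibbs sampler for y[i][t] = x[i][t]*beta_i + eps_it, beta_i = delta*z[i] + v_i.

    Assumed priors (illustrative, not from the slides):
    delta ~ N(0, tau2); sigma_i^2 and V ~ inverse gamma(a0, b0).
    Returns a list of (delta, V, [beta_1, ..., beta_n]) draws.
    """
    rng = random.Random(seed)
    n = len(y)
    beta = [0.0] * n
    sigma2 = [1.0] * n
    delta, V = 0.0, 1.0
    draws = []
    for _ in range(n_iter):
        for i in range(n):
            # beta_i | rest is normal: data precision plus prior precision.
            prec = sum(xt * xt for xt in x[i]) / sigma2[i] + 1.0 / V
            mean = (sum(xt * yt for xt, yt in zip(x[i], y[i])) / sigma2[i]
                    + delta * z[i] / V) / prec
            beta[i] = rng.gauss(mean, prec ** -0.5)
            # sigma_i^2 | rest is inverse gamma; draw 1 / Gamma(shape, scale=1/rate).
            ssr = sum((yt - xt * beta[i]) ** 2 for xt, yt in zip(x[i], y[i]))
            sigma2[i] = 1.0 / rng.gammavariate(a0 + len(y[i]) / 2.0,
                                               1.0 / (b0 + ssr / 2.0))
        # delta | rest is normal (regression of beta_i on z_i with variance V).
        prec_d = sum(zi * zi for zi in z) / V + 1.0 / tau2
        mean_d = sum(zi * bi for zi, bi in zip(z, beta)) / V / prec_d
        delta = rng.gauss(mean_d, prec_d ** -0.5)
        # V | rest is inverse gamma on the second-level residuals.
        ss_b = sum((bi - delta * zi) ** 2 for zi, bi in zip(z, beta))
        V = 1.0 / rng.gammavariate(a0 + n / 2.0, 1.0 / (b0 + ss_b / 2.0))
        draws.append((delta, V, list(beta)))
    return draws


# Toy data for two albums over four weeks (made-up numbers).
toy_y = [[2.0, 4.1, 5.9, 8.2], [1.1, 1.9, 3.2, 3.9]]
toy_x = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
toy_z = [2.0, 1.0]
toy_draws = gibbs_scalar(toy_y, toy_x, toy_z, n_iter=300, seed=1)
```

In the vector-valued model of the slides, the scalar normal and inverse gamma draws would be replaced by multivariate normal and Wishart draws; the order of the steps stays the same.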
5. Hypotheses
H1 The more an album is downloaded from the P2P network, the higher its sales.
H2 The more dates an album is available on the P2P network, the higher its sales.
H3 The more previous albums an artist has, the higher the sensitivity of sales to the number of downloads.
H4 Albums launched in the same rank of week have similar estimated coefficients of sales on the number of downloads.
H5 The genre of music and the gender of the artist positively affect sales through the number of downloads.
6. Empirical Illustration
Description of the Data
• Downloads data from the Ares P2P network (April 5, 2007 to July 15, 2007)
• Sales for newly released albums in Billboard's Top 200 (May 1 to July 15, 2007)
• Album-specific characteristics
- Previous total number of albums of the artist
- Genre of music (Rap & Rock)
- Gender of the artist (Male)
Data Preprocessing
• Newly released albums on the Billboard 200 weekly chart: 98 albums
• Movie soundtracks and re-entered albums were removed because of atypical patterns
• This leaves 74 newly released albums on the Billboard 200
• 50 chosen for the calibration set, 24 for the hold-out sample
• Calibration panel set: 50 cross-sectional units × 4 weekly time-series observations
• Hold-out sample: generate one point estimate of first-week sales
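The filtering and split can be sketched as below. The slides do not say how the 50 calibration albums were chosen, so the seeded random shuffle is an assumption, and the record fields `soundtrack` and `reentry` are hypothetical names.

```python
import random


def split_albums(albums, n_calibration=50, seed=0):
    """Drop soundtracks and re-entries, then split into calibration and hold-out.

    albums: list of dicts with boolean keys "soundtrack" and "reentry"
            (hypothetical field names).
    The random assignment is an assumption; the source only says 50 were chosen.
    """
    eligible = [a for a in albums if not (a["soundtrack"] or a["reentry"])]
    rng = random.Random(seed)
    rng.shuffle(eligible)
    return eligible[:n_calibration], eligible[n_calibration:]


# Made-up input: 98 charting albums, 24 of them flagged for removal.
example_albums = [
    {"title": f"album_{i}", "soundtrack": i < 10, "reentry": 10 <= i < 24}
    for i in range(98)
]
calibration, holdout = split_albums(example_albums)
```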
[Figure] Sales_over_Aver#_sources: album sales (vertical axis, 0 to 700,000) plotted against Ave#_sources (horizontal axis, 0 to 10,000)
[Figure] Average Downloads # on Sales (Weekly): average number of downloads (vertical axis, 0 to 10,000) plotted against RTD_Sales (horizontal axis, 0 to 700,000), with separate series for May and June
7. Results - Estimation
Estimation Results for Calibration set
[Figure 3] Total Estimation Results for Calibration set: Actual Sales and Estimated Sales (vertical axis: sales, 0 to 1,400,000; horizontal axis: album)
[Figure 4] Weekly Estimation Results for Calibration set: four panels, Calibration_Results(First_week), Calibration_Results(Second_week), Calibration_Results(Third_week) and Calibration_Results(Fourth_week), each showing Actual_Sales and Forecasted_sales (vertical axis: sales; horizontal axis: album, 1 to 50)
[Figure 5] Individual Album Sales Estimation: High, Medium, Low Level of Success. Six panels show Actual Sales and Estimated Sales over weeks 1 to 4 for BECAUSE OF YOU, DOUBLE UP, SNAKES & ARROWS, AMERICAN DOLL POSSE, POISON'D and ANOTHER SIDE
8. Results - Forecasting
[Figure 6] Comparison of Forecast Results using Different Measure of Downloading #. Panel Forecast_Results: Actual_Sales, Ave_Before_week and Ave_Total_download for hold-out albums 1 to 24. Panel Absolute_Average Error: Measure1 and Measure2 for hold-out albums 1 to 24
[Table 1] Comparison of Forecasts (MAPE) with Different Measure
Measure                                                        MAPE
Aver. Download # a week before Release (Measure 1)             0.689
Aver. Number of total Download # before Release (Measure 2)    0.555
Mean of Both Measure 1 & Measure 2 (Measure 3)                 0.58
[Table 2] Comparison of Fit (MAPE) with Previous Studies
              Proposed Model   Lee, Boatwright & Kamakura (2003)   Generalized Bass model   Generalized Gamma model
Forecasting   0.555            0.7                                 0.799                    0.896
Estimation    0.733            0.178                               0.196                    0.267
MAPE (Lee, Boatwright, Kamakura 2003): $\mathrm{MAPE} = \frac{1}{n}\sum_{t}\frac{|A_t - F_t|}{A_t}$, where $A_t$ is the actual and $F_t$ the forecast value
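The mean absolute percentage error used in both tables can be computed as in this small sketch. The function name and the sample numbers are illustrative only; the result is a fraction, so 0.555 corresponds to 55.5 percent.

```python
def mape(actual, forecast):
    """Mean absolute percentage error: average of |A_t - F_t| / A_t."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)


# Two made-up albums: errors of 10% and 25% average to 17.5%.
example_mape = mape([100.0, 200.0], [110.0, 150.0])
```

Because each error is divided by actual sales, albums with low actual sales can dominate the average.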
[Figure 7] Point-estimate of Pre-launch Sales forecasts for the first-week: Actual_Sales and Sales_Forecasts for hold-out albums 1 to 24 (vertical axis: 0 to 350,000)
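A first-week point forecast for a hold-out album can be sketched as two linear maps: album characteristics to coefficients, then covariates to sales. This assumes the hold-out coefficients are set to their second-level mean (estimated second-level matrix applied to the album's characteristics), which the slides suggest but do not spell out. All names and numbers are hypothetical.

```python
def first_week_forecast(z_i, delta_hat, x_i1):
    """Point forecast of first-week sales for one hold-out album.

    z_i       : list of m album characteristics
    delta_hat : m x k matrix (list of rows), e.g. a posterior mean
    x_i1      : list of k first-week covariates
    """
    k = len(x_i1)
    # beta_hat[j] = sum_r delta_hat[r][j] * z_i[r]  (second-level mean)
    beta_hat = [
        sum(delta_hat[r][j] * z_i[r] for r in range(len(z_i))) for j in range(k)
    ]
    # y_hat = x_i1 . beta_hat
    return sum(x * b for x, b in zip(x_i1, beta_hat))


# Made-up example: two characteristics, two covariates.
example_forecast = first_week_forecast(
    z_i=[1.0, 3.0],
    delta_hat=[[10.0, 2.0], [1.0, 0.0]],
    x_i1=[100.0, 5.0],
)
```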
[Figure 8] Sales-Coefficients ($\beta_i$) on Explanatory variables. Panels: Sales on # of Available Dates; Sales on # of Average Downloads #; Sales on 2nd week dummy variable; Sales on 3rd week dummy variable
[Figure 9] Downloads-Coefficient ($\Delta$) on Album characteristics. Panels: Total # of previous albums; Rap (Genre); Rock (Genre); Male (Gender)
9. Future Work
Data
• Small sample size for the number of newly released albums
• Lack of new albums that are NOT on the Billboard 200 chart
=> New release data purchased
• Is the ARES network large enough to represent downloading behavior?
=> Better prediction by using additional P2P network data
Methods
• Needs enhancement beyond the simple standardized form of the model
• Needs enhancement in the prior distribution for $V_\beta$, using Barnard et al. (2000)
Model
• Does a weekly cyclic pattern really exist in a larger sample of albums?
• Is supply-side data (audio_source) better than demand-side data (hash_request) as the measure of downloads for forecasting sales?
• Is there an omitted-variable problem from not considering a promotional-effect variable?