Growth and popularity in markets for free digital products
Gil Appel, Marshall School of Business, University of Southern California
Barak Libai, Arison School of Business, Interdisciplinary Center (IDC)
Eitan Muller, Stern School of Business, New York University, and Arison School of Business, Interdisciplinary Center (IDC)
June 2016
The authors would like to thank Gal Elidan, Zvi Gilula, Jacob Goldenberg, Hema Yoganarasimhan, Scott Neslin and Oded Netzer for their advice and helpful comments during the research process.
Abstract
Free digital products (FDPs) dominate online markets, yet our knowledge and theories about
their growth are based mainly on conventional goods. We demonstrate how FDPs’ growth
dynamics differ from those observed for conventional new products, using a large-scale dataset
that documents the growth of close to 60,000 FDPs, and supported by an additional growth
analysis of thousands of mobile apps. We find that FDPs display three distinct patterns of
growth: bell-shaped pattern (“Diffuse”); exponential-type decline (“Slide”); and a combination
of the two (“Slide and Diffuse”). We further show a robust relationship between FDP popularity
and growth pattern ubiquity, providing the first evidence of a correlation between products’
popularity and growth patterns. We also show how FDP-related growth phenomena help to
explain the patterns that emerge, and elucidate the need to adapt our knowledge on new product
growth and its modeling to the fast-moving world of free digital products.
Keywords: diffusion of innovations; free products; mobile applications; product life cycle; social influence; software
1. Introduction
An intriguing development in the consumer market landscape is the substantial increase in
the number of digital products available for free (Anderson 2009). Free digital products (FDPs)
have long been available in the form of computer software supplied via online platforms,
joined more recently by similar FDPs for smartphones and web applications. Some of this availability
stems from the “freemium” business model, under which a certain percentage of adopters will
eventually upgrade to a less restricted version or purchase in-app byproducts (Kumar 2014). Yet
the increase in FDPs also follows other developments such as the rise of open-source software
collaboration projects, where many users join forces to produce software products that will be
free except for technical support (Mallapragada et al. 2012). Recent reports highlight the
ubiquity of the phenomenon: More than 90% of recently downloaded smartphone applications
were free, with this percentage expected to continue rising in the foreseeable future (Olson 2013;
AppBrain 2016). In established markets (e.g., task management tools and anti-virus programs), a
fierce battle is being waged between “freemium” and “premium” business models (Dunn 2011;
Woods 2013).
The question we put forth is whether the new product growth generalizations and insights
developed over the years in markets for conventional products apply also to FDPs. In particular,
we focus on the essential shape of growth. Previous research has maintained that growth in
digital environments in general (Rangaswamy and Gupta 2000) and FDPs in particular (Jiang and
Sarkar 2010; Lee and Tan 2013) follows the commonly observed S-shaped diffusion patterns –
with bell-shaped non-cumulative growth – and can thus be analyzed using traditional diffusion
models. However, when we began to examine the growth patterns of tens of thousands of FDPs
(to be described later), the picture that emerged was different: While a bell-shaped pattern
implies growing demand early on, we find that in the FDPs examined, the correlation of month-
to-month growth in the first year was positive for only about 40% of the products. What further
stood out was that the extent of the phenomenon was highly correlated to the level of product
popularity: The percentage of positive correlations decreased monotonically as popularity
declined, down to 26% for the bottom 10% of products by popularity.
This is not the conventional pattern of growth we read about in the new product textbooks.
What can drive this phenomenon? Note that conventional products are associated with
significant R&D costs, as well as costs of manufacturing, marketing, and maintaining market
presence. Therefore, firms will invest in screening and testing of the products before market
launch, and will be motivated, internally or due to channel pressure, to take a product off the
shelf if it seems to fail. The case of FDPs differs, in particular due to the low barriers to
development and introduction of many digital products into the market, which may be reflected
in two ways: First, the cost of adapting the product and offering it to small, specific niches is
low, which leads to a “long tail” of supply (Brynjolfsson et al. 2010). If past research
emphasized the ability of digital channels to enable a long tail of physical goods (Brynjolfsson et
al. 2011), then the fact that the goods themselves are digital enables even better tailoring to small
niches. Second, given the lack of barriers, there is an increased presence of small and less
experienced suppliers with low resources to invest in marketing, so that we can expect to find
many products whose low popularity stems from inability to reach larger audiences even if
targeted otherwise. Indeed, it is reported that a large share of FDPs are considered failures, and
eventually do not even cover development costs (Foresman 2012; Rubin 2013).
Overall, whether the niche market was intended at the outset or not, consumers should face
a large share of low-popularity products when considering supply in the FDP market, at least in
the absolute number of offerings1. Empirical data suggest that this is the case for markets such as
free PC software (Zhou and Duan 2012) and mobile applications (Zhong and Michahelles 2012).
In early 2016, for example, among the 1.87 million free android apps available, more than 60%
had fewer than 1,000 downloads, and only about 1% were downloaded more than 1 million times
(AppBrain 2016).
This phenomenon raises an interesting question regarding the prevalent growth pattern of
FDPs. Our knowledge on the diffusion of new products has been largely shaped in markets such
as durables, pharmaceuticals, and services, typically looking at highly popular cases of growth
(Peres et al. 2010). In fact, one of the essential concerns with the understanding of innovation
diffusion is that nearly all knowledge comes from successful innovations (Greve 2011; Rogers
2003). This lack of evidence on the growth pattern for what may be the majority of the FDP
market is an issue of significant managerial and theoretical importance. The shape of the growth
curve is considered “the most important and most widely reported finding about new product
diffusion” (Chandrasekaran and Tellis 2007). Studying growth patterns is a fundamental stepping
stone to the understanding of markets for new products: It is used to understand the driving
forces of new products’ success; as a base for modeling and optimizing firm behavior in the
context of new product introductions; for decisions of termination or further support for new
products; and for segmentation by adoption times (Golder and Tellis 1997; Peres et al. 2010).
Here we study the full spectrum of growth patterns in FDPs, providing comprehensive
evidence for a fundamental difference between the growth of highly studied superstars, and the
growth of the less popular majority. The ability to track information in the case of FDP markets
provides an opportunity to conduct a large-scale analysis in a way seldom available to past new
1 This is independent of the question of the share of downloads by various segments of the popularity curve in such markets (Brynjolfsson et al. 2010).
product growth researchers, and to overcome the problem of left truncation bias due to lack of data
on the product’s early days (Jiang et al. 2006). We use data on the monthly level of downloads
from launch-day of a large number of software products in multiple categories, with downloads
per product ranging from a few hundred to millions, making this one of the largest new product
diffusion studies to date. Our main data source is the SourceForge database, which enables us to
study the growth of almost 60,000 free software products. We are able to complement this
analysis by also looking at data on the growth of close to 7,000 mobile apps, which shows
consistent results. The main insights can be summarized as follows:
Three pattern archetypes dominate the growth of FDPs in our datasets: a bell-shaped
curve (largely left skewed) that we label Diffuse, an exponential-like decline starting at
launch labeled Slide, and a combination of the first two – Slide & Diffuse. Diffuse patterns
represent about half of the cases in our database.
The dynamics that lead a product to the “underdog” part of the long tail differ from the
pattern that leads a product to become a superstar, as the ubiquity of the three archetypes
is strongly related to the popularity of the products. Bell shapes are dominant in popular
products, yet become a minority in small niche products. The fact that the very popular
products are almost exclusively bell shaped may help to explain how previous research,
which has been based on popular products, missed this relationship.
Two phenomena that characterize FDP markets help explain the shape of growth: The
first is the inception effect, representing disproportional early-onset external effects,
which explains the slide phenomenon in the presence of social influence. The second is
the recency effect, which implies that in free digital markets, recent adoptions (and not
only cumulative adoptions as traditionally used in diffusion models) help explain the
dynamic effect of social influence on growth.
Recency is particularly important in helping to differentiate between popular and less
popular products. The association between recency and growth is more than twice as strong among the
top 10% of products in popularity as among the bottom 10%. We further find evidence
that recency level in a category is associated with the shape of the popularity curve, so
that higher average recency level in the category is associated with higher inequality,
captured by the Gini coefficient.
These findings are significant to our understanding of FDP growth, and to attempts to
model and optimize growth in such markets. In a broader theoretical sense, these findings imply
that generalizations that developed along the product life cycle, its turning points, and its drivers
(Golder and Tellis 2004) may need re-examining in the rapidly growing, dynamic world of free
digital products.
2. Background
2.1. Related literature
Our study relates to a number of research avenues:
Markets for free digital products: Research on FDPs has examined issues such as
optimal initial spread of freeware as part of profit maximization in the longer run (Cheng and Liu
2012; Niculescu and Wu 2014), free-riding and competitive dynamics (Haruvy and Prasad
2005), and the impact of the creation process on success (Grewal et al. 2006). Other research has
focused on the effect on demand of bestseller ranking and consumer ranking (Carare 2012; Lee
and Tan 2013), as well as other factors such as price discounts on in-app purchases (Ghose and
Han 2014). We add to this growing literature by providing the first large-scale analysis of the
growth patterns of FDPs, which is significant in particular given the assumption that FDPs grow
and should be modeled in a manner similar to other products typically described by the Bass
diffusion model (Jiang and Sarkar 2010; Yogev 2012; Lee and Tan 2013).
The long tail. From another angle, this work is also related to efforts to understand the
nature and significance of supply and demand inequality in electronic commerce, often
considered in the context of a “long tail”. Previous literature in that area has focused on the
factors that affect the pattern of sales, and in particular whether it leads to higher shares of sales
among low-selling niche products, or alternatively among high-selling “superstars”. This literature has examined
both supply-side factors, such as broader product variety, distribution channel dynamics, and
lower stocking costs, and demand-side factors, such as reduced search costs (Elberse and
Oberholzer-Gee 2007; Brynjolfsson et al. 2009, 2011; Hinz et al. 2011; Kumar et al. 2014).
Considerable attention has been given to inter-customer effects in the form of
recommendations and reviews, which help create the long tail yet also provide more “thrust” to
superstars (Fleder and Hosanagar 2009; Oestreicher-Singer and Sundararajan 2012;
Hervas-Drane 2015; Zhu and Zhang 2010).
We add to this literature an exploration of the dynamics at the individual product level
along the curve. If previous approaches have generally accepted the existence of “underdog
products” and “superstar products”, we ask how a product gets to become one or the other.
Patterns of innovation growth: In a more general sense, our effort is related to the
ongoing efforts to study the pattern of new product growth, which spans numerous disciplines
(Rogers 2003). The fact that the adoption rate of successful innovations follows a bell-shaped or
logistic-type curve, and a cumulative S-shaped curve, is considered one of the fundamental
discoveries of social science, and was largely attributed to the dynamic role of social influence
among customers in various forms (Young 2009; Peres et al. 2010). While there is evidence of
some exceptions to the S-shaped curve with a cumulative r-shaped (non-cumulative exponential
decline) pattern for entertainment goods such as movies and for supermarket goods (Gatignon
and Robertson 1985; Sawhney and Eliashberg 1996), the perception across disciplines is that
“the S-curves are everywhere” (Bejan and Lorente 2012). Indeed, these patterns form the bases
of diffusion-of-innovations theory and forecasting new product growth using consistent growth
shapes, such as the Bass model, Gompertz, or logistic curves (Meade and Islam 2006).
We add to this literature in two ways. First, we highlight FDPs as an additional, yet
separate category that is not necessarily dominated by S-shaped curves, and show how growth
characteristics of FDPs can explain the various shapes. Second, in a more general sense, we provide
initial evidence for the relationship between product popularity and the shape of growth, an
unexplored issue in a research stream that has focused on highly popular products.
2.2. Modeling FDP growth
Since our aim is to examine growth along the FDP popularity curve, we will need to model
the growth of an individual free digital product. Two fundamental effects that lead to the
commonly observed S-shaped curve are considered when modeling the growth of new products
(Mahajan et al. 1990): The internal influence captures the impact of previous adopters via word
of mouth, imitation, and network externalities, typically considered a function of the number of
cumulative adopters to date. The external influence captures influences outside of the group of
previous adopters, such as advertising and mass media. We argue that both types of influence
need to be adapted to capture growth in FDP markets, as follows:
Internal influence and recency effect. While diffusion modelers have largely used the
number of cumulative adopters as a sole indicator of internal influence, some recent work points
to a possible need to separate the effect of recent adopters from that of cumulative number of
adopters, attributed to the difference in intensity of word of mouth in the two groups (Hill et al.
2006; Iyengar et al. 2011). It has been suggested, for example, that recent adopters may be more
contagious than consumers who adopted less recently, as the former are more enthused and/or
credible (Risselada et al. 2014).
We contend that models of growth in FDPs, in particular, should allow for this distinction. First, it is
often reported that for many FDP users, usage and engagement center on the time right after
adoption (Danova 2015). Second, it is well accepted that adopters of FDPs (and other digital
goods) rely heavily on popularity ranking information as appears in social media, app stores, and
download sites (Carare 2012; Garg and Telang 2013; Ghose and Han 2014; Lee and Raghu
2014). Yet, as is clearly observable, rankings do not necessarily reflect cumulative downloads,
but rather reflect past period popularity (Neitz 2015). This means that the recent number of
downloads, and not only cumulative ones, may play a pivotal role in FDP download decision
making. In fact, popularity rankings may also affect users who do not consider this information
explicitly, but rely on search. For example, it is reported that the search engines of Google
and Apple also rely largely on recent popularity rankings when displaying results
(Walz 2015).
External influence and the Inception effect. External influence is traditionally a parameter
that captures the marketing mix in the industry, in particular that of advertising (Mahajan et al.
1990). In the absence of large-scale advertising support for many FDPs, and given the
dominance of social media, much of the external influence comes from social media articles and
experts’ recommendations and ratings. However, attention to new products may be short lived:
Given the large number of launched products, the attention given to a new product centers on the
beginning of its life cycle. In fact, even when considering firms that do invest in advertising to
promote FDPs, there is a strong motivation to focus on the early period of growth. It is argued
that FDP producers have a short window of time in which to generate the groundswell that can
lead to attention by sources such as the charts in the app stores, and thus they must act early on
(Rice 2013; Kimura 2014). Consequently, those FDP developers who invest in marketing may
often do so in “burst campaigns” that are meant to get them on consumers’ radar early in the
game (ADA 2014; Klein 2014). Overall, we can expect that for FDPs, external influence will be
particularly strong early on in the new product’s life, a phenomenon that we label the inception
effect. This effect can be reflected in decay in the external influence parameter’s value over time.
Following these, we will use an FDP growth model that takes into consideration the
inception and recency effects. We begin with the fundamental Bass product growth model,
which is widely used to model the growth of new products. Under this approach, expected
adoptions at time period t (between t and t+1) are assumed to be reflected in the following
equation, where N is the market potential, $X_t$ is the cumulative number of adoptions up to time t,
p is the force of external influence, and q is that of internal influence:
(1) $X_{t+1} - X_t = (p + q \cdot X_t / N) \cdot (N - X_t)$
To capture the inception effect, we let the external parameter be a varying function of time
with an initial external influence parameter (p), using an external decay parameter (δ), to capture
the decay in marketing effect over time. If the external decay parameter is positive, external
influence intensity decays with time. To capture the recency effect, we separate the internal
influence into two sub-parameters: As in the classic diffusion approach, parameter q captures the
effect of cumulative adoptions. The recency parameter r captures the effect of recent adoptions,
so that we multiply the relative change in the past period, $X_t - X_{t-1}$, by r. We can now write the
model as follows:
(2) $X_{t+1} - X_t = (p e^{-\delta \cdot t} + q \cdot X_t / N + r \cdot (X_t - X_{t-1}) / N) \cdot (N - X_t)$
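To make Equation (2) concrete, the following minimal Python sketch iterates the model forward from launch. It is an illustration only, not the estimation code used in this study. The function name simulate_fdp_growth, the convention that t starts at 0 with no prior adopters, the default market potential of 1 (matching scaled data), and the parameter values in the usage lines are all assumptions made for the example.

```python
import math


def simulate_fdp_growth(p, q, r, delta, periods, market_potential=1.0):
    """Iterate Equation (2) and return the list of per-period adoptions.

    Assumptions of this sketch: t starts at 0 and X_0 = X_{-1} = 0.
    """
    cumulative = 0.0  # X_t
    previous = 0.0    # X_{t-1}
    adoptions = []
    for t in range(periods):
        external = p * math.exp(-delta * t)                       # inception effect
        internal = q * cumulative / market_potential              # cumulative effect
        recency = r * (cumulative - previous) / market_potential  # recency effect
        new = (external + internal + recency) * (market_potential - cumulative)
        previous, cumulative = cumulative, cumulative + new
        adoptions.append(new)
    return adoptions


# Hypothetical parameter values, chosen only to illustrate two shapes.
decline_from_launch = simulate_fdp_growth(p=0.3, q=0.0, r=0.0, delta=0.5, periods=12)
rise_then_fall = simulate_fdp_growth(p=0.01, q=0.4, r=0.2, delta=0.1, periods=40)
```

Setting δ = 0 and r = 0 reduces the sketch to Equation (1). With the illustrative values above, a large initial p with fast decay and no internal influence produces a decline from launch, while a small p combined with a larger q produces a rise followed by a fall.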
3. Growth Patterns at SourceForge
3.1. Dataset
Our primary source of data is SourceForge.net, a large, open-source software (OSS)
repository that empowers software developers to control and manage open-source software, and
enables users to download these products for free (Madey 2013). As of June 2013, when we
scraped the data, SourceForge offered about 400,000 registered projects, with 3.4 million
registered developers and 4 million downloads a day. As such, it is among the largest download
sites, and home to some well-known consumer software products such as VLC media player,
eMule, and 7-Zip. In fact, many users may not be aware that products they download from
various software download sites are actually hosted by SourceForge.
Scraping SourceForge, we retrieved the monthly history of downloads for a large number
of products. The number of downloads is largely used to assess the success of open-source
products (Grewal et al. 2006; Daniel et al. 2013) and in a broader sense acts as a proxy for the
success of free products (Chandrashekaran et al. 1999). While SourceForge contains a large
number of products (close to 400,000), many of them are inactive and had zero downloads, and
thus are not relevant to our analysis. We focused on the download patterns for the 59,343
products that met the following criteria:
Data from five years of growth. We looked at a 60-month window for all products.
Naturally, the life cycles of FDPs are considerably shorter than the typically analyzed
growth of durables (although for some products, the cycle may be longer). Thus, to
reduce cases of right censoring and to use a consistent time frame, we considered only
products launched before mid-2008. Nonetheless, our analysis suggests that we covered
the majority of downloads for the various products2.
2 SourceForge data is reported monthly. Because the first month is incomplete and with varying lengths, which can bias the results, we use the first full month for which we have data as the first month.
At least 200 downloads within the five-year window. This criterion enabled us to capture
actual growth processes that are not affected much by possible developers’ noise over the
product life cycle.
The distribution of downloads in our data points to a large variance in downloads among
the products (see Figure 1). In our dataset, 41% of products had fewer than 1,000 downloads, while
about 0.6% (329 products) had more than one million downloads. The Gini coefficient is 0.96,
which indicates a high concentration, larger than those reported for markets such as videos and
books (Oestreicher-Singer and Sundararajan 2012). As our focus of interest is the shape of the
growth, and in order to be able to compare between patterns, we scale each pattern to a (0,1)
scale by dividing each observation by the total sum of downloads. We further elaborate on this
scaling in Section 4.2.
Figure 1: Distribution of popularity in SourceForge
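The scaling and the concentration measure described above can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the paper. The function names are hypothetical, and the Gini estimator shown (the standard rank-based formula on sorted values) is an assumption, since the text does not state which estimator was used.

```python
def scale_pattern(downloads):
    """Divide each observation by the total so that the pattern sums to one."""
    total = sum(downloads)
    return [d / total for d in downloads]


def gini(values):
    """Rank-based Gini coefficient of non-negative values (assumed estimator)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Weight each value by its 1-based rank in ascending order.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical monthly downloads of one product, and total downloads of four products.
pattern = scale_pattern([200, 100, 100])
concentration = gini([250, 900, 4000, 1200000])
```

Scaling removes differences in level, so that two products with very different download totals but the same shape map to the same scaled pattern.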
3.2. Patterns and estimation of data and model
We identify patterns of growth in two stages. We first use the FDP model
presented in Section 2 to smooth the data, a step that is particularly essential given our use of monthly data,
which is much noisier than the classical annual diffusion data. Our analysis shows that not only
does using the FDP model have the advantage of being theoretically driven, but it also creates a
better smoothing algorithm than do alternatives such as HP filters. See Appendix A for a
discussion.
For the estimation we use general nonlinear optimization. Since we estimate scaled data
(with a sum of 1), we use the augmented Lagrange multiplier method to ensure that our
estimations always sum to one3. To examine our estimations’ fit, we consider two fit measures:
The first is R², with an average value of 47.7% (as we use nonlinear estimation, an adjusted R²
coefficient could not be calculated). Recall, however, that large-scale monthly data can be very
noisy. We do find a positive correlation between the R² values and the log of each product’s
downloads (ρ = 0.35), suggesting that when downloads are few and the data tend to be
noisier (as is the case with much of our data), the R² levels may be lower. In addition, the R²
measure can be biased toward capturing the peaks (where the variance can be larger) rather than
the entire curve.
As an alternative to the R² measure, we also use the Kullback-Leibler divergence (KL
divergence) to measure the difference between our estimations and the data (Dzyabura and
Hauser 2011; Gilula and McCulloch 2013). KL divergence weighs each observation and thus is
less sensitive to the absolute value of the difference, weighing the relative difference instead. The
average KL divergence is 0.2 (with a standard deviation of 0.22, median of 0.14, and mode of
0.05), and most of the divergence values are close to zero.
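As a minimal sketch of this fit measure, the KL divergence between a scaled observed pattern and a scaled fitted pattern can be computed as below. The direction of the divergence (observed relative to fitted), the function name, and the small floor eps that guards against division by zero are assumptions of this illustration, as the text does not specify them.

```python
import math


def kl_divergence(observed, fitted, eps=1e-12):
    """KL divergence of the observed pattern from the fitted one.

    Both inputs are assumed to be scaled so that each sums to one.
    Periods with zero observed share contribute nothing to the sum.
    """
    total = 0.0
    for o, f in zip(observed, fitted):
        if o > 0:
            total += o * math.log(o / max(f, eps))
    return total


# Hypothetical scaled patterns over four periods.
observed = [0.4, 0.3, 0.2, 0.1]
fitted = [0.35, 0.3, 0.2, 0.15]
divergence = kl_divergence(observed, fitted)
```

Because each term depends on the ratio of observed to fitted shares, the measure responds to relative rather than absolute differences, and it equals zero only when the two patterns coincide.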
Classifying the patterns and matching parameters to patterns. In a second stage, we
determine the patterns that emerge. Consistent with past efforts to identify patterns and turning
3 This procedure can be applied using the Rsolnp package in R (Ghalanos and Theussl 2014) or the Solnp implementation in MATLAB (Ye 1987).
points in diffusion data, we use a peaks-and-valleys algorithm for the classification (Goldenberg
et al. 2002; Golder and Tellis 2004; Chandrasekaran and Tellis 2011) using these rules:
We count the number of peaks and troughs in the data.
We require both peaks (at least 10% over the start period or the previous valley) and
troughs (a drop of 10% or more since the start period) to be substantial; otherwise they are ignored
(Goldenberg et al. 2002).
Using a difference algorithm, we find that there were at most two peaks and one trough in
each pattern, which leads us to three general patterns:
o If a pattern climbs from the start toward a peak, then it is a pattern that we label
Diffuse, as it is consistent with what we may expect given diffusion theory.
o If a pattern begins with a drop in adoptions with no peaks, it is labeled a Slide
pattern [named after playground slides].
o If a pattern begins with a drop in adoptions but has a later peak, it is labeled Slide
& Diffuse (S&D).
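The classification rules above can be sketched in simplified form as follows. This is an illustrative approximation, not the peaks-and-valleys implementation used in the paper: it assumes a smoothed, positive series, treats any rise of at least 10% above the running valley as a later peak (without checking for a subsequent fall), and adds an "Unclassified" label for series that never move 10% from the start. The function name and that extra label are hypothetical.

```python
def classify_pattern(series, threshold=0.10):
    """Label a smoothed, positive adoption series by archetype (simplified)."""
    start = series[0]
    drop_index = None
    # Stage 1: find which substantial move away from the start level comes first.
    for i, value in enumerate(series):
        if value >= start * (1 + threshold):
            return "Diffuse"  # climbs from the start toward a peak
        if value <= start * (1 - threshold):
            drop_index = i    # begins with a substantial drop
            break
    if drop_index is None:
        return "Unclassified"  # assumption: no move of 10% in either direction
    # Stage 2: after the initial drop, look for a substantial rise above the running valley.
    valley = series[drop_index]
    for value in series[drop_index + 1:]:
        if value < valley:
            valley = value
        elif value > valley and value >= valley * (1 + threshold):
            return "Slide & Diffuse"
    return "Slide"


# Hypothetical smoothed series.
labels = [
    classify_pattern([1, 2, 3, 2, 1]),
    classify_pattern([10, 8, 6, 4, 3]),
    classify_pattern([10, 6, 4, 6, 8, 5]),
]
```

Applying the rule to the smoothed rather than the raw series matters here, since noise in monthly data could otherwise register as spurious peaks and troughs.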
Table 1 presents some statistics on the resultant archetypes as well as the estimated model
parameters per archetype. As can be seen in the descriptive statistics in part a) of Table 1, the
Diffuse pattern is the most ubiquitous in the data, with 48% of the software exhibiting this
pattern (28,490 patterns); S&D patterns were nearly 28% of the data (16,583 patterns), and
Slides accounted for 24% of the data (14,270 patterns).
Table 1: Archetype pattern characteristics and estimation
a) Descriptive statistics       Diffuse    Slide     S&D
No. of patterns                 28,490     14,270    16,583
Share (%) of patterns           48%        24.1%     27.9%
Average no. of downloads        110,093    6,363     9,481
Median no. of downloads         2,698      996       900
b) Model parameter values       Diffuse    Slide     S&D
p (initial external effect)     0.009      0.067     0.032
q (cumulative effect)           0.04       0.038     0.053
r (recency effect)              0.489      0.184     0.318
δ (external decay parameter)    0.18       0.546     1.829
We next examine the relationship between popularity and the shape patterns. One way of
doing so is to divide the dataset by equal download bins in terms of number of products (top
10%, 11%-20%, etc.). Figure 2 shows the relationship between bin membership and the
percentage of each archetype in every popularity bin, in the case of equal-size bins, separating
the top 1% from the rest of the top bin. We observe a strong monotonic increase in the share of
Diffuse patterns from low-popularity products to high-popularity ones, and a monotonic decrease
in the share of Slide and S&D patterns as products increase in popularity. While Diffuse is the
clear majority among the most popular products (89% of the products in the top 1%), it represents a minority among the least
popular products (less than a third of the cases in the bottom 10% of the products).
We can also see this pattern visually going back to Figure 1 above. In Figure 1, Diffuse
patterns received a dark shade, while Slide and S&D received a pale shade. We can see how the
color of adoptions is dark around the area of high downloads, and lightens as we look at the long
tail of adoptions.
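The equal-size binning behind this comparison can be sketched as follows. The sketch assumes each product is represented by a pair of total downloads and archetype label; the function name and the toy data are hypothetical, and the separate treatment of the top 1% is omitted for brevity.

```python
from collections import Counter


def archetype_shares_by_bin(products, n_bins=10):
    """Split products into equal-size popularity bins and compute archetype shares.

    products: list of (downloads, archetype) pairs.
    Returns one dict of shares per bin, ordered from least to most popular.
    """
    ranked = sorted(products, key=lambda item: item[0])
    n = len(ranked)
    shares = []
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        counts = Counter(label for _, label in chunk)
        shares.append({label: count / len(chunk) for label, count in counts.items()})
    return shares


# Toy data with two bins: the less popular half and the more popular half.
toy = [(300, "Slide"), (500, "Diffuse"), (9000, "Diffuse"), (250000, "Diffuse")]
toy_shares = archetype_shares_by_bin(toy, n_bins=2)
```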
Figure 2: Growth patterns and popularity in SourceForge data
[Chart: share of each archetype (vertical axis: pattern %) by download popularity bin (horizontal axis). Data labels:]
Bin       Bottom 10%  81%-90%  71%-80%  61%-70%  51%-60%  41%-50%  31%-40%  21%-30%  11%-20%  2%-10%  Top 1%*
Diffuse   31%         33%      36%      38%      42%      47%      52%      57%      66%      78%     89%
Slide     29%         30%      29%      29%      28%      26%      24%      21%      16%      9%      4%
S&D       40%         37%      35%      33%      30%      27%      24%      22%      19%      13%     7%
* Note that the top 1% is presented separately.
3.3. A log scale analysis
One of the challenges of an equal-decile analysis in a concentrated distribution is that the
range within some bins may be very large. As can be seen in the upper portion of Table 2 below,
when considering equal deciles, the range of downloads is very large in the upper decile, while in
the lower decile, the range is small. To limit this variance, we took a log scale of the range of
downloads (200 to over 300M) and divided it into 10 bins of download size. As can be seen in
the bottom portion of Table 2, the within-bin discrepancy is now lower; however, in the upper
bins there are far fewer products.
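A minimal sketch of such log-based binning is given below. It assumes bins of equal width in log space between the minimum of 200 downloads and the 329.4M maximum reported in Table 2. The function names are hypothetical, and because the exact edge computation is not spelled out in the text, the resulting edges need not match Table 2 exactly.

```python
import math


def log_bin_edges(lowest, highest, n_bins=10):
    """Return n_bins + 1 edges that are equally spaced on a log scale."""
    log_low, log_high = math.log(lowest), math.log(highest)
    step = (log_high - log_low) / n_bins
    return [math.exp(log_low + i * step) for i in range(n_bins + 1)]


def assign_log_bin(downloads, edges):
    """Return the 1-based bin index; the maximum falls in the top bin."""
    for i in range(1, len(edges)):
        if downloads < edges[i]:
            return i
    return len(edges) - 1


edges = log_bin_edges(200, 329.4e6)
bin_of_small_product = assign_log_bin(500, edges)
```

Each bin in this scheme spans the same ratio of maximum to minimum downloads, which is why the within-bin range shrinks relative to equal deciles while the number of products per bin becomes highly unequal.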
Table 2: Range of downloads in each bin with equal and log-based bins
Equal deciles
Bin        1 (Bottom)  2        3        4        5        6        7        8        9        10 (Top)
Frequency  5,934       5,935    5,934    5,934    5,934    5,935    5,934    5,934    5,935    5,934
Min        200         302      447      649      957      1,463    649      4,040    8,027    23,066
Max        302         447      649      957      1,463    2,322    957      8,027    23,059   329.4M
Log-based bins
Bin        1 (Bottom)  2        3        4        5        6        7        8        9        10 (Top)
Frequency  21,663      18,333   11,188   5,135    2,021    689      231      62       16       5
Min        200         836      3,506    14,716   61,767   259,255  1.1M     4.5M     19.2M    80.5M
Max        835         3,505    14,715   61,766   259,254  1.1M     4.5M     19.2M    80.5M    329.4M
Figure 3 presents the shape ubiquity of the log-scale bins. We see that the ubiquity
pattern seen in Figure 2 continues, displaying an even larger difference among the bins. The
two bins that contain the 21 products with 19.2M downloads or more are composed of 100% Diffuse
patterns.
Figure 3: Growth patterns and popularity in SourceForge data (log-based bins)
[Chart: share of each archetype (vertical axis: pattern %) by download popularity bin (horizontal axis). Data labels, from the lowest to the highest log-based bin:]
Diffuse   34%   45%   59%   72%   81%   86%   91%   97%   100%  100%
S&D       37%   28%   21%   16%   12%   10%   6%    3%    0%    0%
Slide     29%   26%   20%   12%   7%    5%    3%    0%    0%    0%
3.4. Categories
The patterns we see above reflect a blend of many product categories. Is the pattern
ubiquity driven by a subset of the dataset, or is it consistent across product types? To find out,
we repeated the analysis of Figure 2 with the six most popular categories SourceForge uses.
Figure 4 presents the results for the larger categories, and in Appendix B we can see the
distribution of the patterns in all 16 categories. As can be seen in Figure 4, the ubiquity pattern
identified above generally remains stable.
Figure 4: Ubiquity of shapes in equal-download deciles by category (top six categories)
3.5. Does it work outside of SourceForge?
To what extent can our findings from open-source software be generalized to other
freeware environments? In particular, smartphones have become a prominent freeware
distribution outlet, to the extent that the vast majority of smartphone apps are freeware (Olson
2013). While data on large-scale smartphone app adoption over time is not readily available to
researchers (Garg and Telang 2013), we were able to obtain the cooperation of a global firm,
which we will call “Mobility” so as not to reveal its identity. Mobility operates in the market for
helping businesses create free smartphone apps that can be used as part of their business. Under
this business model, Mobility creates the app and helps manage it for the client for a monthly
fee. Mobility clients are varied and include service providers such as restaurants, artists,
musicians, educational institutions, and non-profits. These clients offer free apps created on the
Mobility platform for their own end users, who are typically individual customers or prospects.
Mobility can track these apps’ downloads by end users over time.
Figure 5: Growth patterns and popularity in Mobility data
[Stacked bar chart. Horizontal axis: download popularity, in the bins Bottom 10%, 81%-90%, 71%-80%, 61%-70%, 51%-60%, 41%-50%, 31%-40%, 21%-30%, 11%-20%, 2%-10%, Top 1%*. Vertical axis: pattern %. Values read from the bar labels, in bin order:
Diffuse: 33%, 35%, 36%, 40%, 47%, 48%, 57%, 56%, 64%, 77%, 78%
Slide: 45%, 42%, 38%, 31%, 31%, 27%, 25%, 24%, 19%, 13%, 9%
S&D: 22%, 23%, 26%, 29%, 22%, 25%, 18%, 20%, 17%, 10%, 13%]
* Note that the top 1% is presented separately.
The Mobility dataset is more limited than that of SourceForge in several respects. The
Mobility apps are specific to certain service providers, and so are naturally relevant to much smaller
market segments. In addition, unlike the case of open-source software, there is an entity
(Mobility clients) that may make dedicated efforts to push the freeware via external effects,
which we do not observe. While the time span we have for Mobility downloads is more limited,
it is more detailed, as we observe weekly data for downloaded apps (between February 2011 and
November 2013). Due to the smaller magnitude of adoption, we used data on apps that had at
least 50 downloads and at least 52 weeks of data, and truncated each series at 52 weeks. We thus had
weekly adoption data for 6,914 smartphone apps.
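As an illustration of this sample-selection rule, the following Python sketch filters and truncates one app's weekly download series. It is a minimal sketch under stated assumptions: the function name is hypothetical, and we assume the 50-download threshold is applied to the truncated 52-week window, which the text does not specify.

```python
def prepare_weekly_series(weekly_downloads, min_downloads=50, horizon=52):
    """Return the first `horizon` weeks of a series, or None if it is filtered out.

    Assumption: the minimum-download threshold is checked on the
    truncated window rather than on the full observed history.
    """
    if len(weekly_downloads) < horizon:
        return None  # fewer than 52 weeks of data
    window = list(weekly_downloads[:horizon])
    if sum(window) < min_downloads:
        return None  # too few downloads to estimate a growth pattern
    return window

# Hypothetical apps, for illustration only.
long_popular = [3] * 60                     # 60 weeks, 3 downloads per week
too_short = [10] * 30                       # only 30 weeks observed
too_small = [0] * 51 + [1] + [100] * 8      # popular only after week 52
kept = prepare_weekly_series(long_popular)
```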
We repeated the analysis as in the first dataset, and found notably similar results to the
SourceForge case. The patterns that emerged were grouped again in the same order of size into
the archetypes of Diffuse (49%), Slide (30%), and S&D (21%). We can see that while the share
of Diffuse patterns is close to that of SourceForge, the share of Slides is higher at the expense of
S&D.
The Mobility data can also help us examine the data from another angle, which might
affect the archetypes: the issue of versions. FDP creators often release new versions (i.e.,
software updates) and in the SourceForge database, 63% of the products have released more than
one version over the examined life cycle. One might wonder if the demand for versions can
fundamentally affect the archetypes we see and their relationship to popularity. We looked at the
issue in two ways: First, the Mobility data includes only one version, and as we can see, the
extent and the pattern of archetypes remain the same. Second, in the SourceForge dataset, we
looked at products that had only one version to see whether, within this group, the dynamics
reported above for the entire dataset change. Here also, we found that the dynamics of archetypes’ ubiquity
and popularity shown in Figure 2 largely remain the same for one-version products. Thus,
versioning does not appear to be the driver of the phenomena we identify here. The descriptive
statistics and parameter estimates per archetype for Mobility’s data are found in Appendix C,
Table C1, parts a) and b), respectively.
4. Recency, inception, and the share of patterns
4.1. Parameter values per shape
Our next aim was to see to what extent our data can help us understand the relationship
between popularity and shape ubiquity identified above. We thus now turn to examine the
implications of the parameter values that emerge from the FDP model we used, and their
relationship to popularity.
Looking first at part b) of Table 1, we see a difference between the parameter values of the
different shapes. The difference between each pair of archetypes was significant using a
two-sample Hotelling’s T² test, and similarly using two-sample t-tests. Following that, we want to
ensure that our model’s parameters actually define and drive these patterns, and that the
classification results are determined by the model and its parameters. We used a random forest
classifier (Breiman 2001) to see if we can correctly match the classified patterns using the
parameters of the freeware model. We indeed see that the random forest classifier shows a very
low out-of-bag error of 2.27%. The resultant confusion matrix is found in Table 3:
Table 3: Random forest confusion matrix results for the classification of the three patterns
                      Predicted
Actual       Diffuse     Slide      S&D       Classification error
Diffuse      98.4%       0.7%       0.9%      1.6%
Slide        1.4%        96.6%      2.0%      3.4%
S&D          1.1%        1.4%       97.5%     2.5%
* Percentages are of the actual number of patterns.
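The classification-error column of Table 3 is the share of each actual class that falls off the diagonal. The short Python sketch below recomputes that column from the row percentages reported in the table. The function name is hypothetical, and the sketch does not reproduce the random forest estimation itself.

```python
labels = ["Diffuse", "Slide", "S&D"]
# Rows are actual classes, columns are predicted classes
# (row percentages as reported in Table 3).
row_pct = [
    [98.4, 0.7, 0.9],
    [1.4, 96.6, 2.0],
    [1.1, 1.4, 97.5],
]

def classification_errors(matrix, names):
    """Per-class error: percentage of a row that lies off the diagonal."""
    errors = {}
    for i, (name, row) in enumerate(zip(names, matrix)):
        off_diagonal = sum(v for j, v in enumerate(row) if j != i)
        errors[name] = round(100.0 * off_diagonal / sum(row), 1)
    return errors

errors = classification_errors(row_pct, labels)
```

The same function works on a matrix of raw counts, since each row is normalized by its own total.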
4.2. Share of effects and popularity
We now turn to see if the effects represented by the parameters are related to product
popularity. An interesting feature of diffusion modeling is that it allows us to further understand
how the various parameters drive the distinctive shapes (Mahajan et al. 1990). Let T be the time
horizon (T = 60 in the SourceForge data, and T = 52 in the Mobility data). We set
$N = m \cdot X_T$, where $m$ is a scaling factor of the observed data and $X_T$ is the total number of
downloads up to time T. In addition, in order to remove the effect of popularity on our shape
analysis, each observation is divided by the sum of observations over time ($X_T$), and the
equation has been translated into percentages in the standard manner by dividing both sides by
$X_T$. We calculated the sources of growth by breaking down Equation 2 into the main
components that drive adoptions as follows⁴:
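(3) Cumulative w-o-m effect $= q \cdot \frac{X_t}{m \cdot X_T} \cdot \left(m - \frac{X_t}{X_T}\right)$
(4) Recency effect $= r \cdot \left(\frac{X_t - X_{t-1}}{m \cdot X_T}\right) \cdot \left(m - \frac{X_t}{X_T}\right)$
(5) Inception effect $= p\,e^{-\delta \cdot t} \cdot \left(m - \frac{X_t}{X_T}\right)$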
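To make the decomposition concrete, the following Python sketch evaluates Equations 3-5 period by period for one cumulative download series and reports each effect's share of the total. It is a sketch under stated assumptions rather than the estimation code used here: the function name, the toy series, and the value m = 1.2 are hypothetical; we take X_0 = 0; and we aggregate shares by summing each component over all periods, which is one plausible reading of "share of pattern".

```python
import math

def effect_shares(x_cum, p, q, r, delta, m):
    """Shares of the cumulative, recency, and inception effects (Equations 3-5).

    x_cum[t-1] is the cumulative number of downloads X_t at period t = 1..T.
    Assumptions: X_0 = 0, m >= 1, and shares are computed from the
    components summed over all T periods.
    """
    x_total = x_cum[-1]  # X_T
    totals = {"cumulative": 0.0, "recency": 0.0, "inception": 0.0}
    previous = 0.0       # X_{t-1}, starting from X_0 = 0
    for t, x_t in enumerate(x_cum, start=1):
        remaining = m - x_t / x_total                                          # (m - X_t / X_T)
        totals["cumulative"] += q * x_t / (m * x_total) * remaining            # Eq. 3
        totals["recency"] += r * (x_t - previous) / (m * x_total) * remaining  # Eq. 4
        totals["inception"] += p * math.exp(-delta * t) * remaining            # Eq. 5
        previous = x_t
    grand_total = sum(totals.values())
    return {name: value / grand_total for name, value in totals.items()}

# Hypothetical cumulative series; parameters are the average Diffuse values
# reported for the Mobility data in Table C1, part b; m is an assumed value.
toy_cumulative = [10, 30, 60, 80, 90, 95]
shares = effect_shares(toy_cumulative, p=0.016, q=0.083, r=0.517, delta=0.232, m=1.2)
```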
Turning to Table 4, we see a difference between the three archetypes in both parameter
value and share of patterns. While the inception effect is especially dominant for Slides, it has
the lowest share for the other two archetypes.
Table 4: Share of pattern attributed to each effect by archetype pattern
Share of pattern attributed to:     Diffuse     Slide      S&D
p + δ (inception effect)            26.8%       49.3%      11.5%
q (cumulative w-o-m effect)         41.2%       40.1%      67.5%
r (recency effect)                  32.0%       10.6%      21.0%
4 The parameter m represents the fact that the diffusion process has not ended after 60 periods. Thus, m is a scaling parameter that does not have an impact on the shape of a pattern. We repeated the random forest examination without using the m parameter, obtaining nearly identical results, with an out-of-bag error of 2.29%.
Similar results were found in Mobility’s data (Table C1, part c, of Appendix C). Figure 6
further elucidates the relationship between share of effect and popularity. It shows the average
share of influence of the external (inception), recency, and cumulative effects in the various
popularity tiers, generated from Equations 3-5. The direction is clear: The share of recency
increases dramatically, from 15% for the lowest-popularity products to 40% for the top 10%, while
the shares of the cumulative effect and the inception effect decrease monotonically from the less
popular to the more popular product tiers.
Figure 6: Share of patterns and profitability
[Chart. Horizontal axis: download popularity, in the bins Bottom 10%, 81-90%, 71-80%, 61-70%, 51-60%, 41-50%, 31-40%, 21-30%, 11-20%, Top 10%. Vertical axis: pattern's effect %. Values read from the data labels, in bin order:
External: 31%, 31%, 31%, 30%, 30%, 30%, 28%, 27%, 24%, 18%
Recency: 15%, 16%, 18%, 19%, 21%, 23%, 25%, 28%, 32%, 40%
Cumulative: 55%, 53%, 51%, 50%, 49%, 48%, 47%, 45%, 44%, 42%]
To further understand the source of difference among the patterns in this respect, consider
Figure 7, in which we graph the dynamics of the share of each effect over time for each pattern
archetype, using the average parameter values for each pattern presented in part b of Table 1.
Figure 7: The temporal dynamics’ shares of effects in the three patterns
[Three panels: Figure 7a: Diffuse; Figure 7b: Slide; Figure 7c: Slide & Diffuse. Horizontal axis: months. Vertical axis: downloads.]
Consider the case of a Slide pattern (Figure 7b). An exponential-like decline in demand
has been considered in the past in two types of markets. In the case of low-involvement supermarket
goods, the slide-like pattern was attributed to a lack of inter-customer social
influence and a dominant role of external effects such as advertising (Fourt and Woodlock 1960;
Gatignon and Robertson 1985). However, FDPs typically are not promotion driven, and there is
no reason to assume that they are unaffected by social influence (see Aharony et al. 2011 for the
role of social influence in such markets). In the case of entertainment goods such as movies, a
Slide pattern was identified in particular for blockbusters, and was explained by the anticipation
leading up to the movie’s release, which on the one hand can create a social influence process
pre-release, and on the other hand drives marketers to invest large resources in advertising and
screens early on in these blockbusters’ life cycles (Moe and Fader 2002; Ainslie et al. 2005). It is
unlikely that such a phenomenon is relevant to the continuous flow of free products we
examine. In particular, we see the opposite of the effect found for movies: For FDPs, it is the least
popular products that exhibit a Slide pattern, not the most popular ones, indicating a different
process.
What Figure 7b suggests instead is an inception effect that drives demand, in particular early
on. Later on it is joined by a cumulative effect, which has a large share (though not as
large as that of inception) in driving growth. Thus, while we do not need to assume a lack of social
influence to explain the declining Slide pattern, the inception effect is not enough to create a
highly popular product. To create a bell-shaped Diffuse pattern, social influence should kick in
relatively early and become dominant. What is particularly interesting in the Diffuse
dynamics in Figure 7a is the role of the recency effect. For products that enjoy social influence,
recency becomes the dominant social influence early on, immediately following the external
influence ignited by the inception effect. Only later, when there are enough adopters, does the
cumulative effect become dominant. As can be seen in Table 4, the cumulative effect has the
largest overall share in Diffuse, yet it is not much larger than that of the recency effect. If the recency
effect were not there to start the social process that eventually brings in a larger number of adopters, the
product could have remained a lower-popularity Slide.
As Figure 7c suggests, S&D begins with an inception effect that is stronger than that of a
Diffuse (yet weaker than that of a Slide), and a recency effect that is weaker than that of a
Diffuse (yet stronger than that of a Slide). In such a case, while the initial pattern is that of a
decline, the product creates enough social influence to turn the pattern around later on; that is, social
influence begins to dominate due to the recency effect, and eventually the cumulative effect
becomes dominant. Overall, it seems that a recency effect is critical to creating a popular FDP,
which explains why we see a large difference in the role of recency between the high- and
low-popularity products (Figure 6). Popular products create enough social influence early on, whether
by direct word of mouth or by ranking information available to potential adopters, so that recency and later
on the cumulative effect begin to drive demand upward. What could have remained a
less-popular Slide thus becomes a more-popular Diffuse.
5. Discussion and Conclusion
5.1 The fundamental findings
The first core insight emerging from our findings is that free digital products have distinct
growth patterns. A large body of research has used information on the adoption of traditional
durables and services to teach us how new products grow, and it constitutes the basis for managerial
thinking on product introduction in general. The fundamentally different shape of growth for
FDPs indicates that we need to be cautious in applying past diffusion knowledge to these
environments. One could use the case of movies as an example in that sense: Given the unique
dynamic of growth and profitability in the motion picture industry, its growth and dynamics have
been largely analyzed separately from other categories (Eliashberg et al. 2006). The case of FDP
growth may require a similar consideration.
A second essential issue relates to the relationship between popularity and growth. The
strong monotonic relationship between popularity and the shape of growth we witness suggests
that the past focus on popular products when analyzing new product growth can lead to a real bias.
This issue is of particular importance for FDPs given the significant dispersion of demand and
the presence of a significant long tail. This finding may have major implications for other
categories as well, yet given that we have data only on FDPs, the generalizability of our finding
to other categories remains to be investigated. The issue of popularity is particularly interesting
in light of the rich research stream that has acknowledged the strong dispersion of product
popularity in digital environments, and the existence of a long tail of demand (Brynjolfsson et al.
2010). While the 60,000 FDPs analyzed here indeed show a large dispersion, the fact that we
could use individual-level growth data, rather than look at products cross-sectionally based on
overall popularity, gave us a unique opportunity to understand the creation of demand
dispersion in digital environments.
Our analysis suggests that in FDP environments, sales or downloads may start with a
drop. The fact that products are free encourages potential users to download them even if they
are not completely certain they need them. Since people often hear of FDPs mostly at the time of
their release (the “inception effect”), the period right after launch may be relatively high in adoptions.
Yet in order to appeal to a wider audience, external influence early on and even some word of
mouth afterwards may not be enough: The product has to create an engagement that will produce
social influence, which in turn will make it popular within a larger market potential. This stage will
be driven by the recency effect, which represents the effect of recent adopters on potential ones.
This effect is relevant to many product categories because of the high involvement and
word-of-mouth activity of recent adopters. But it should have a special meaning for free digital
products, as in FDP environments, consumers learn much from recommendation engines,
ranking tables, and search results. As we discussed, all of these may be largely affected by the
number of recent adopters rather than by cumulative adoption. If a product enjoys a strong
enough recency effect early on, it will quickly move to grow in adoption, with a growing
cumulative base of adopters joining the recent ones. The growth then will be bell shaped,
and it is thus no wonder that bell shapes are more strongly associated with popular products.
In cases where word of mouth is not strong enough early on, the process may take some
time, and the product will begin with a slide. However, given enough time, the social process
will become dominant enough to spur a growth process that creates a Slide & Diffuse pattern.
Our results are thus consistent with studies that cite the communication process, and in particular
recommendation systems, as affecting the level of overall popularity (Fleder and Hosanagar
2009; Oestreicher-Singer and Sundararajan 2012). While the recency effect is driven by word of
mouth between individuals, important drivers for FDPs are online recommendation systems (and
search), which provide products with powerful enough social influence early on so that recent
adopters affect new ones and spur the process of real growth. If the product is not appealing
enough to draw people early on, the recency effect will not kick in and the product may remain
in a slide situation.
5.2 Individual product growth and the long tail phenomenon
So far we have not dealt with one of the main interests of the long tail literature: the
magnitude of the variance in popularity between best-selling products and slow-selling ones, and the
nature of markets that increase this variance. While this highly discussed research question is not
our focus, and our ability to examine the issue is partial given the limited product-specific
information in our large-scale database, it is still of interest to investigate whether the product
growth dynamics we highlighted here can help to explain the between-product variance in
popularity we see in digital markets.
We took a first look at the matter by considering the variance in popularity in different
markets. Aiming to analyze markets that are as homogeneous as possible, we took advantage of
the fact that beyond the subcategories used above, SourceForge also divides products into sub-sub
categories (SSC). SSC are not always mutually exclusive for individual products, and in some
the number of products is so small that the variance across products is less meaningful to examine. We
examined 176 SSC (out of 316) where we had at least 50 products per category. We wanted to
see if the values of the within-product parameters for the SSC can help explain the between-product
variance in popularity that is reflected in the Gini coefficient of the specific SSC.
Using an OLS regression with the SSC Gini coefficient as the dependent variable and the
average values of the parameters of the FDP growth model and SSC size as the independent
variables, two variables came out significant in their effect on the SSC Gini: SSC size,
in that larger SSC are less equal, as may be expected (p < 0.05); and the recency parameter,
in that higher recency is associated with a higher Gini coefficient and thus a less equal SSC
(p < 0.001).
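For readers less familiar with the dependent variable, the Gini coefficient of an SSC can be computed directly from its products' download totals. The Python sketch below uses the standard rank-based formula for a finite sample. The function name and the toy inputs are hypothetical, and the regression itself is not reproduced.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal).

    Uses the rank-based formula G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    with x sorted in ascending order and ranks i = 1..n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention for empty or all-zero input
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

# Hypothetical download totals for two small categories.
equal_category = [5, 5, 5, 5]         # every product equally popular
superstar_category = [0, 0, 0, 10]    # one product takes all downloads
g_equal = gini(equal_category)
g_superstar = gini(superstar_category)
```

With n products, the maximum attainable value under this formula is (n - 1) / n, so the second toy category reaches 0.75 rather than 1.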
The effect of recency on inequality is consistent with the insights discussed above on the
importance of recency in FDP markets. Beyond the role of recency in the growth and success of
individual products, we see indications that in markets where recency is high, the difference
between the less and more successful products is larger. Given our discussion above on the
possible relationship between recency and the effect of recommendation systems, we see this
result as supportive of research that highlights how recommendation systems can create
inequality in digital markets (Fleder and Hosanagar 2009; Oestreicher-Singer and Sundararajan
2012; Hervas-Drane 2015): In markets where recommendation systems play a stronger role,
the recency effect may be higher, leading to higher inequality between the long tail and the
superstars. Yet, to further understand this relationship, an additional analysis is needed that uses
smaller-scale data but is able to dive into the specifics of markets. We believe this is a
promising area for future research.
5.3 Conclusion
The ability to collect precise adoption data in a timely manner and for differing levels of
popularity renders digital environments an unprecedented source of knowledge on the growth of
new products. The abundance of data, in particular individual-level adoption data, social network
data flows, and location information, ensures that much of our knowledge on growth is yet to
come, and may demand updating our beliefs and the empirical generalizations created in times
when such data were not available. We hope this study took a sizeable step in this direction.
References
ADA. 2014. Discoverability: How to get noticed in a marketplace overflowing with apps. White Paper, Application Developers Alliance, Washington, DC.
Aharony, N., W. Pan, C. Ip, I. Khayal, A. Pentland. 2011. Social fMRI: Investigating and shaping social mechanisms in the real world. Pervasive and Mobile Comput. 7(6) 643-659.
Ainslie, A., X. Drèze, F. Zufryden. 2005. Modeling movie life cycles and market share. Marketing Sci. 24(3) 508-517.
Anderson, C. 2009. Free: The Future of a Radical Price, 1st ed. New York: Hyperion.
AppBrain. 2016. AppBrain Stats. AppBrain, March 26. Available at http://www.appbrain.com/stats
Bejan, A., S. Lorente. 2012. The S-curves are everywhere. Mech. Engrg. 134(5) 44-47.
Breiman, L. 2001. Random forests. Machine learn. 45(1) 5-32.
Brynjolfsson, E., Y. J. Hu, M. D. Smith. 2010. Long tails vs. superstars: The effect of information technology on product variety and sales concentration patterns. Inform. Systems Res. 21(4) 736-747.
Brynjolfsson, E., Y. J. Hu, D. Simester. 2011. Goodbye Pareto Principle, Hello Long Tail: The effect of search costs on the concentration of product sales. Management Sci. 57(8) 1373-1386.
Brynjolfsson, E., Y. J. Hu, M. S. Rahman. 2009. Battle of the retail channels: How product selection and geography drive cross-channel competition. Management Sci. 55(11) 1755-1765.
Carare, O. 2012. The impact of bestseller rank on demand: Evidence from the app market. Internat. Econom. Rev. 53(3) 717-742.
Chandrasekaran, D., G. J. Tellis. 2007. A critical review of marketing research on diffusion of new products. N. K. Malhotra, ed. Rev. Marketing Res. 39-80.
—. 2011. Getting a grip on the saddle: Chasms, or cycles? J. Marketing 75(4) 21-34.
Chandrashekaran, M., R. Mehta, R. Chandrashekaran, R. Grewal. 1999. Market motives, distinctive capabilities, and domestic inertia: A hybrid model of innovation generation. J. Marketing Res. 36(1) 95-112.
Cheng, H. K., Y. Liu. 2012. Optimal software free trial strategy: The impact of network externalities and consumer uncertainty. Inform. Systems Res. 23(2) 488-504.
Daniel, S., R. Agarwal, K. J. Stewart. 2013. The effects of diversity in global, distributed collectives: A study of open-source project success. Inform. Systems Res. 24(2) 312-333.
Danova, T. 2015. The App-Store marketing report: User acquisition, retention, and strategies for getting apps to stand out. Business Insider, February 5. Available at http://www.businessinsider.com/app-store-marketing-strategies-and-stats-2015-2
Dunn, J. E. 2011. Free antivirus grabs more market share, claims Opswat survey. Techworld, June 8. Available at http://news.techworld.com/security/3284838/free-antivirus-grabs-more-market-share-claims-opswat-survey
Dzyabura, D., J. R. Hauser. 2011. Active machine learning for consideration heuristics. Marketing Sci. 30(5) 801-819.
Elberse, A., F. Oberholzer-Gee. 2007. Superstars and underdogs: An examination of the long tail phenomenon in video sales. Harvard Business School working paper.
Eliashberg, J., A. Elberse, M. Leenders. 2006. The motion picture industry: Critical issues in practice, current research, and new research directions. Marketing Sci. 25(6) 638-661.
Fleder, D., K. Hosanagar. 2009. Blockbuster culture’s next rise or fall: The impact of recommender systems on sales diversity. Management Sci. 55(5) 697-712.
Foresman, C. 2012. iOS app success is a “lottery”: 60% (or more) of developers don’t break even. Ars Technica, May 4. Available at http://arstechnica.com/apple/2012/05/ios-app-success-is-a-lottery-and-60-of-developers-dont-break-even
Fourt, L. A., J. W. Woodlock. 1960. Early prediction of market success for new grocery products. J. Marketing 25(2) 31-38.
Garg, R., R. Telang. 2013. Inferring app demand from publicly available data. MIS Quart. 37(4) 1253-1264.
Gatignon, H., T. S. Robertson. 1985. A propositional inventory for new diffusion research. J. Consumer Res. 11(4) 849-867.
Ghalanos, A., S. Theussl. 2014. Rsolnp: General non-linear optimization using augmented Lagrange Multiplier Method. R package version 1.15.
Ghose, A., S. P. Han. 2014. Estimating demand for mobile applications in the new economy. Management Sci. 60(6) 1470-1488.
Gilula, A., R. McCulloch. 2013. Multi level categorical data fusion using partially fused data. Quant. Marketing Econom. 11(3) 353-377.
Goldenberg, J., B. Libai, E. Muller. 2002. Riding the saddle: How cross-market communications can create a major slump in sales. J. Marketing. 66(2) 1-16.
Golder, P. N., G. J. Tellis. 1997. Will it ever fly? Modeling the takeoff of really new consumer durables. Marketing Sci. 16(3) 256-270.
—. 2004. Growing, growing, gone: Cascades, diffusion, and turning points in the product life cycle. Marketing Sci. 23(2) 207-218.
Greve, H. R. 2011. Fast and expensive: The diffusion of a disappointing innovation. Strategic Management J. 32(9) 949-968.
Grewal, R., G. L. Lilien, G. Mallapragada. 2006. Location, location, location: How network embeddedness affects project success in open-source systems. Management Sci. 52(7) 1043-1056.
Haruvy, E., A. Prasad. 2005. Freeware as a competitive deterrent. Inform. Econom. & Policy 17(4) 513-534.
Hervas-Drane, A. 2015. Recommended for you: The effect of word of mouth on sales concentration. Internat. J. Res. Marketing 32(2) 207-218.
Hill, S., F. Provost, C. Volinsky. 2006. Network-based marketing: Identifying likely adopters via consumer networks. Statist. Sci. 21(2) 256-276.
Hinz, O., J. Eckert, B. Skiera. 2011. Drivers of the long tail phenomenon: An empirical analysis. J. Management Inform. Systems 27(4) 43-70.
Iyengar, R., C. Van den Bulte, T. W. Valente. 2011. Opinion leadership and social contagion in new product diffusion. Marketing Sci. 30(2) 195-212.
Jiang, Z., F. M. Bass, P. I. Bass. 2006. Virtual Bass model and the left-hand data-truncation bias in diffusion of innovation studies. Internat. J. Res. Marketing 23(1) 93-106.
Jiang, Z., S. Sarkar. 2010. Speed matters: The role of free software offer in software diffusion. J. Management. Inform. Sys. 26(3) 207-240.
Kimura, H. 2014. Why app store keyword rankings drop dramatically seven days after launch. Sensor Tower, August 21. Available at https://blog.sensortower.com/blog/2014/08/21/why-app-store-keyword-rankings-drop-dramatically-seven-days-after-launch/
Klein, A. 2014. The Insider: Preparing your new app for launch. Tune, August 12. Available at http://www.tune.com/blog/the-insider-preparing-your-new-app-for-launch/
Kumar, A., M. D. Smith, R. Telang. 2014. Information discovery and the long tail of motion picture content. MIS Quart. 38(4) 1057-1078.
Kumar, V. 2014. Making “freemium” work. Harvard Bus. Rev. 92(5) 27-29.
Lee, G., T. S. Raghu. 2014. Determinants of mobile apps’ success: Evidence from the app store market. J. Management Inform. Sys. 31(2) 133-170.
Lee, Y. J., Y. Tan. 2013. Effects of different types of free trials and ratings in sampling of consumer software: An empirical study. J. Management Inform. Sys. 30(3) 213-246.
Madey, G. 2013. The SourceForge Research Data Archive (SRDA). University of Notre Dame, Feb. 14. Available at: http://srda.cse.nd.edu
Mahajan, V., E. Muller, F. M. Bass. 1990. New product diffusion models in marketing: A review and directions for research. J. Marketing. 54(1) 1-26.
Mallapragada, G., R. Grewal, G. Lilien. 2012. User-generated open-source products: Founder’s social capital and time to product release. Marketing Sci. 31(3) 474-492.
Meade, N., T. Islam. 2006. Modeling and forecasting the diffusion of innovation: A 25-year review. Internat. J. Forecasting 22(3) 519-545.
Moe, W. W., P. S. Fader. 2002. Using advance purchase orders to forecast new product sales. Marketing Sci. 21(3) 347-364.
Neitz, R. 2015. Extensive Guide to App Store Optimization (ASO) in 2015 – Part 2: Google Play Store. Trademob, June 12. Available at http://www.trademob.com/app-store-optimization-guide-google
Niculescu, M. F., D. J. Wu. 2014. Economics of free under perpetual licensing: Implications for the software industry. Inform. Sys. Res. 25(1) 173-199.
Oestreicher-Singer, G., A. Sundararajan. 2012. Recommendation networks and the long tail of electronic commerce. MIS Quart. 36(1) 65-83.
Olson, P. 2013. The win for games: They grab two-thirds of app store sales. Forbes, September 19. Available at http://www.forbes.com/sites/parmyolson/2013/09/19/the-win-for-games-they-grab-two-thirds-of-app-store-sales
Peres, R., E. Muller, V. Mahajan. 2010. Innovation diffusion and new product growth models: A critical review and research directions. Internat. J. Res. Marketing 27(2) 91-106.
Rangaswamy, A., S. Gupta. 2000. Innovation adoption and diffusion in the digital environment: Some research opportunities. V. Mahajan, E. Muller, Y. Wind, eds. New-Product Diffusion Models. Norwell, MA: Kluwer Academic Publishers, 75-96.
Rice, K. 2013. Why pre-launch hype is the key to app success. Kinvey, May 2. Available at http://www.kinvey.com/blog/2545/why-prelaunch-hype-is-the-key-to-app-success
Risselada, H., P. C. Verhoef, T. H. A. Bijmolt. 2014. Dynamic effects of social influence and direct marketing on the adoption of high-technology products. J. Marketing 78(2) 52-68.
Rogers, E. M. 2003. Diffusion of Innovations. New York: Free Press.
Rubin, B. F. 2013. The dirty secret of apps: Many go bust. Wall Street J., March 7. Available at http://online.wsj.com/news/articles/SB10001424127887324582804578346221047028366
Sawhney, M. S., J. Eliashberg. 1996. A parsimonious model for forecasting gross box-office revenues of motion pictures. Marketing Sci. 15(2) 113-131.
Walz, A. 2015. Deconstructing the app store rankings formula with a little mad science. Moz.com, May 27. Available at https://moz.com/blog/app-store-rankings-formula-deconstructed-in-5-mad-science-experiments
Woods, D. 2013. The battle of the freemium and enterprise business models in the task management market. Forbes, March 18. Available at http://www.forbes.com/sites/danwoods/2013/03/18/freemium-enterprise-business-models-task-management-attask-asana
Ye, Y. 1987. Interior Algorithms for Linear, Quadratic and Linearly Constrained Non-linear Programming. PhD thesis, Department of ESS, Stanford University.
Yogev, G. 2012. The Diffusion of Free Products: How Freemium Revenue Model Changes the Strategy and Growth of New Digital Products. Saarbrucken, Germany: Lap Lambert Academic Publishing.
Young, H. P. 2009. Innovation diffusion in heterogeneous populations: Contagion, social influence, and social learning. Amer. Econom. Rev. 99(5) 1899-1924.
Zhong, N., F. Michahelles. 2012. Long tail, or superstar? An analysis of app adoption on the Android market. LARGE 3.0 Conf. 11-14.
Zhou, W., W. Duan. 2012. Online user reviews, product variety, and the long tail: An empirical investigation of online software downloads. Electronic Commerce Res. Appl. 11(3) 275-289.
Zhu, F., X. Zhang. 2010. Impact of online consumer reviews on sales: The moderating role of product and consumer characteristics. J. Marketing 74(2) 133-148.
Appendices
Appendix A: Comparing model smoothing alternatives
We compared the R² and the KL divergence measures from our model to other smoothing
alternatives to examine the goodness of fit of our model. First, we examine other variants of the
model. We examine the fit of a model without the recency and δ components separately, and
without both effects (effectively collapsing to the Bass model). Second, we examine other
smoothing methods used in time series modeling. The Hodrick-Prescott (HP) filter removes
short-term cyclical components from the filtered graph, allowing us to separate out short-term noise and
retain the long-term trend (Hodrick and Prescott 1997; Chandrasekaran and Tellis 2011). We
use 129,600 and 14,400 as the smoothing coefficient (λ) commonly used for monthly data
analysis with the HP filter. The Christiano-Fitzgerald (CF) filter has been examined as an
alternative to the HP filter, offering better control over high-frequency fluctuations and a better
fit to more granular (e.g., monthly) data (Christiano and Fitzgerald 1998; Lamey et al. 2007;
Van Heerde et al. 2013). We use two months as the minimum length of a software cycle in the
CF filter, and examine 40 and 60 months as the maximum length of the software cycle. We also
examine smoothing with a locally weighted least squares regression (LOWESS, Rust and
Bornman 1982) and with penalized splines (Foutz and Jank 2010; Stremersch and Lemmens
2009). The results are in Table A1 below:
Table A1: Goodness-of-fit comparison between models
Model examined R2 KL Divergence
FDP growth model 0.48 (.26) 0.20 (.22)FDP growth model (without recency) 0.40 (.26) 0.26 (.49)FDP growth model (without δ) 0.34 (.26) 0.30 (.51)Bass model 0.33 (.25) 0.27 (.26)HP filter (λ = 14,400) 0.36 (.22) 0.27 (.27)HP filter (λ = 129,600) 0.28 (.21) 0.36 (.38)CF filter (max cycle = 40) 0.39 (.23) 0.31 (.45)CF filter (max cycle = 60) 0.32 (.23) 0.37 (.58)LOWESS 0.34 (.23) 0.28 (.51)Penalized splines 0.73 (.16) 0.12 (.16)
Table A1 shows that the FDP growth model outperforms all other models and smoothing
methods except one. Although the penalized splines model fits the data better than the
FDP growth model, the resulting curve is too flexible: it does not remove the short-term
fluctuations and outliers that are inherent in monthly-level data, and thus represents a
“ceiling” on the fit that a smoothing algorithm can reach. If we increase the smoothing
parameters of the penalized splines model to take that into account (see Appendix A in
Foutz and Jank 2010 for further discussion), the fit drops rapidly.
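As an illustration of the kind of comparison reported in Table A1, the following minimal sketch smooths a monthly series with an HP filter at the two λ values used above and scores each smoothed curve with R² and KL divergence. It is not the code used in the paper. The download series is made up, and the function names, the definition of R² as 1 - SSE/SST, and the computation of KL divergence after normalizing both series to sum to one are assumptions made for this example.

```python
import math


def hp_trend(y, lam):
    """HP trend: solve (I + lam * K'K) tau = y, where K takes second differences."""
    n = len(y)
    if n < 3:
        return [float(v) for v in y]
    # Dense system matrix; adequate for short monthly series.
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    coef = (1.0, -2.0, 1.0)
    for t in range(n - 2):
        idx = (t, t + 1, t + 2)
        for i, ci in zip(idx, coef):
            for j, cj in zip(idx, coef):
                a[i][j] += lam * ci * cj
    # Gaussian elimination with partial pivoting.
    b = [float(v) for v in y]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            if f != 0.0:
                for c in range(col, n):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]
    # Back substitution.
    tau = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = b[r] - sum(a[r][c] * tau[c] for c in range(r + 1, n))
        tau[r] = s / a[r][r]
    return tau


def r_squared(observed, fitted):
    """Assumed definition: 1 - SSE/SST."""
    mean = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    sst = sum((o - mean) ** 2 for o in observed)
    return 1.0 - sse / sst


def kl_divergence(observed, fitted, eps=1e-12):
    """Assumed definition: KL(P || Q) with both series normalized to sum to one."""
    p = [max(v, eps) for v in observed]
    q = [max(v, eps) for v in fitted]
    sp, sq = sum(p), sum(q)
    return sum((pi / sp) * math.log((pi / sp) / (qi / sq)) for pi, qi in zip(p, q))


# Hypothetical monthly downloads, for illustration only.
downloads = [120, 340, 310, 520, 480, 610, 450, 400, 380, 260, 300, 190]
for lam in (14_400, 129_600):
    trend = hp_trend(downloads, lam)
    print(lam, round(r_squared(downloads, trend), 3),
          round(kl_divergence(downloads, trend), 3))
```

A larger λ penalizes curvature more heavily, so the smoothed curve tracks the observed series less closely. This is consistent with the lower fit of the HP filter at λ = 129,600 in Table A1.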
Appendix B: Distribution of pattern types by categories, SourceForge data
Table B1: Distribution of pattern types by categories
Category                                Diffuse %   Slide %   S&D %   Category size
Development                             47.2%       23.2%     29.7%   12,074
Internet                                47.1%       26.5%     26.3%   7,172
System Administration                   48.0%       23.2%     28.7%   6,267
Communications                          47.8%       27.2%     25.0%   4,987
Games                                   41.5%       26.7%     31.8%   4,910
Science & Engineering                   53.6%       16.2%     30.2%   4,317
Audio & Video                           51.1%       23.3%     25.6%   2,957
Security & Utilities                    46.3%       23.5%     30.2%   2,542
Business & Enterprise                   46.1%       27.0%     26.9%   2,387
Home & Education                        48.7%       19.6%     31.7%   1,607
Graphics                                48.9%       23.2%     27.9%   1,593
Desktop Environment                     48.4%       26.3%     25.3%   1,186
Other / Unlisted Topic                  46.5%       23.2%     30.4%   764
Multimedia                              47.6%       22.6%     29.8%   477
Mobile                                  47.1%       25.2%     27.7%   242
Formats and Protocols                   49.1%       33.9%     17.0%   112
Software without an assigned category   52.0%       25.0%     23.0%   5,749
Appendix C: Results for the Mobility dataset
Table C1: Statistics and average parameter values for the pattern archetypes
a) Descriptive statistics               Diffuse   Slide    S&D
No. of patterns                         3,412     2,032    1,470
% of patterns                           49%       30%      21%
Average no. of downloads                4,404     874      1,218
Median no. of downloads                 414       162      196

b) Model parameter values               Diffuse   Slide    S&D
p (initial external effect)             0.016     0.149    0.045
q (cumulative effect)                   0.083     0.059    0.075
r (recency effect)                      0.517     0.204    0.214
δ (external decay parameter)            0.232     0.746    1.475

c) Share of pattern attributed to:      Diffuse   Slide    S&D
p and δ (external effect)               28.6%     57.2%    15.2%
q (cumulative effect)                   41.5%     32.7%    71.7%
r (recency effect)                      29.9%     10.1%    13.1%