Community-based diffusion scheme using Markov chain and spectral clustering for mobile social networks
Jegwang Ryu1 • Jiho Park1 • Junyeop Lee1 • Sung-Bong Yang1
© Springer Science+Business Media, LLC 2017
Abstract With the increase in the number of mobile
devices such as tablets and smart watches, mobile social
networks (MSNs) provide great opportunities for people to
exchange information. As a result, information diffusion
has become a critical issue in the emerging MSNs. In this
paper, we address the problem of finding the top-k influ-
ential users who can effectively spread information in a
network, which is referred to as the diffusion minimization
problem. Minimizing the spreading period can be formulated as
the k-center problem, which, however, is NP-hard. We propose
a community-based diffusion scheme using Markov chain and
spectral clustering
(CDMS) to minimize the spreading time by adopting a
community concept based on the geographic regularity of
human mobility in the MSNs. We exploit the Markov chain
to predict a node’s mobility patterns and cluster the pre-
dicted patterns using the spectral graph theory. Finally, we
select the top-k influential nodes in each community.
Simulations are performed using NS-2, based on the
home-cell community-based mobility model, to evaluate
the proposed scheme in MSNs. In addition, we
demonstrate that CDMS outperforms the noncommunity-based
algorithms in terms of diffusion time for varying numbers of
nodes and ratios of k influential nodes.
Keywords Mobile social networks · Information
diffusion · Markov chain · Spectral clustering
1 Introduction
Many companies such as Google, Amazon, and Yahoo
exploit social information of the users of social networks
for effective marketing [1]. For example, to effectively
minimize the marketing cost, companies analyze informa-
tion using people’s social behavioral patterns in online
social networks (OSNs) such as Facebook or Twitter,
because the word-of-mouth advertising technique, wherein
the customers themselves advertise the products and events
to other people, relies on the customers for most of the
promotional efforts [2, 3]. In the past decade, many studies
have been conducted to find the top-k influential users in a
social network. Recently, because of the evolution of
mobile devices and the technological advancements in
wireless network techniques, mobile social networks
(MSNs) are presenting great opportunities for people to
exchange information. MSNs are a kind of Delay Tolerant
Network (DTN) [3, 4]; unlike in cellular infrastructures,
messages are delivered by the store-carry-forward technique
to enable communication. Information diffusion provides an
effective routing strategy: a small initial set of nodes
propagates messages to all the other nodes. If a small set of
nodes delivers more messages than the other nodes do,
network traffic decreases. As shown in Fig. 1, a main
application in wireless networks is data offloading, which
forwards messages between nodes in a mobile
& Sung-Bong Yang
Jegwang Ryu
Jiho Park
Junyeop Lee
1 Department of Computer Science, Yonsei University, Seoul,
Korea
Wireless Netw
DOI 10.1007/s11276-017-1599-6
network, where the most active nodes (the top-k influential
nodes) propagate messages to support cellular networks such as
3G and 4G. Therefore, information diffusion from an initial
top-k node set I can be used to find the proper active
delivery nodes; as a result, our scheme contributes an effective
routing strategy for mobile social networks.
Several papers solve these problems through information
dissemination using word-of-mouth techniques in MSNs
[5–7]. Recently, Lu et al. [5] proposed a diffusion scheme to
solve the problem of diffusion minimization in MSNs. The
diffusion scheme under the diffusion model in MSNs can be
formulated as an asymmetric k-center problem, which is
NP-hard; the O(log* n)-approximation algorithm [8] for it has
a time complexity of O(n^5). However, this scheme is not
suitable for large complex networks [8]. To solve this
problem, we propose the CDMS to minimize the diffusion
period in MSNs. The basic idea is to exploit the social
community structure [9]. It has better performance and
lower time complexity than the schemes that find the top-k
influential nodes in the entire network, because nodes in
each community are strongly connected. For instance,
people tend to move regularly between places to which
they are strongly connected, such as companies and
houses [10]. They also demonstrate a mobility pattern,
which provides information such as frequently visited spots.
Our proposed scheme for MSNs is an approach based on the
regularity of movement. To predict the mobility patterns of
nodes in dynamic networks such as MSNs, we employ the
Markov chain to illustrate how a node moves from one spot
to another with a certain transition probability. Then, we can
compute each node’s steady-state vector, which represents
the probability distributions of all spots in a network area.
We construct communities by clustering the vectors and
finally select the top-k influential nodes in each community.
We exploit spectral clustering to group the nodes’ steady-state
vectors into communities because spectral clustering is
highly effective for community detection [11–14]. We
conducted extensive simulations using the network simu-
lator NS-2 [15], and the results show that the CDMS has
better performance in minimizing the diffusion period in
synthetic networks, when compared to noncommunity-
based schemes.
The technical contributions of this paper can be sum-
marized as follows.
• We introduce a new scheme to solve the problem of
diffusion minimization in MSNs, by exploiting the
mobility patterns of nodes in a network area.
• We refine the contact probabilities using the commu-
nity concept for large-scale MSNs: We employ Markov
chain and spectral clustering to detect community
structures as time goes by.
• We conduct extensive simulations using the network
simulator NS-2, and compare the results of both
nonclustering and clustering schemes.
The rest of this paper is organized as follows. Section 2
explains the related work. In Sect. 3, we define the problem
that we aim to solve in this paper, the assumptions for the
system model, and the problem statement. In Sect. 4, we
explain CDMS in detail. In Sect. 5, we present the simu-
lation results. Finally, the conclusion is presented in
Sect. 6.
2 Background
2.1 Influence maximization in social networks
In the early studies on influence maximization, individual
behaviors were assumed to spread through social contact
information [16–18], and maximizing the spread in social
networks was dependent on specific network infrastruc-
tures such as social behavior. There have been various
studies in the fields of biology, marketing, and data sci-
ence, based on user behaviors. In the past, immunization
strategies had been proposed in the field of biology, to
protect from diseases such as HIV/AIDS, influenza, etc.
Recently, Sun et al. [19] introduced a new metric—con-
nectivity centrality—and its adaptive algorithm, to find
influential users in a targeted vaccination situation, using a
sensor network. In the past decade, influence maximization
has been studied in OSNs such as Facebook [20] and
Twitter [21], for viral marketing. This problem can be
applied to find a small group of influential individuals in
OSNs. Domingos and Richardson [22] first proposed the
problem and solved it using a probabilistic model. Kempe
et al. [23] proposed a greedy algorithm and Wang et al.
[24] designed a community-based greedy algorithm. They
proposed two fundamental stochastic models, the
independent cascade (IC) model and the linear threshold
(LT) model; in both models, influence propagates through
individual interactions in social networks.
Fig. 1 Information diffusion in many applications
2.2 Information diffusion in MSNs
Recently, with the evolution of wireless network tech-
niques and mobile devices such as smartphones, tablets,
and smart watches, many companies are interested in
applications to minimize the cost of marketing in MSNs.
Unlike the classical models such as the IC model or the LT
model for OSNs, diffusion models for MSNs exploit node
behaviors such as contact or mobility information. The
diffusion models for MSNs face the problem of finding the
top-k influential nodes to minimize the diffusion period in
MSNs [5, 6]. There are several diffusion schemes to min-
imize the diffusion period in MSNs [5, 6, 25]. Lu et al. [5]
suggested two algorithms—a community-based algorithm
and a distributed set–cover algorithm using the proba-
bilistic model—to minimize the diffusion period in MSNs.
For mobile cellular data offloading in DTNs, Han et al. [25]
suggested a data offloading scheme formulated from
research on information diffusion in MSNs. Recently, Chen
et al. proposed a diffusion scheme using k-means clustering
and the social features in MSNs [6]. Noncommunity-based
schemes for diffusion in MSNs are therefore less effective
than the diffusion schemes based on community
information. This is because the diffusion schemes that
find the top-k influential nodes within each community are
more effective in minimizing the period than the ones that
find the top-k influential nodes in the whole network. In
this paper, we will solve the problem of diffusion mini-
mization by using social behaviors such as mobility
information in the MSNs. CDMS is also a community-
based diffusion scheme, but it follows an approach that is
different from that of the other community-based schemes.
2.3 Markov chain model
The Markov chain was first introduced by Andrey Markov,
and it is widely used to represent the statistical regularities
in computer science [26]. This theory is mainly utilized for
the prediction of node behavior in MSNs [27–29]. The
algorithm can define the probability and occupation ratio of
node’s movement, and the probability in each spot is represented
by the steady-state distribution vector. To define
the vector, the transition probability matrix P containing
the probability of each node’s movement at each spot is
defined as follows:
$$P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1g} \\ p_{21} & p_{22} & \cdots & p_{2g} \\ \vdots & \vdots & \ddots & \vdots \\ p_{g1} & p_{g2} & \cdots & p_{gg} \end{bmatrix} \tag{1}$$
where $p_{ij}$ is the probability that a node moves from a certain
spot i to a spot j, and a spot is one of the sections in the
network area. If $\pi(t)$ denotes the probability distribution in
the t-th step, the rule governing the node’s mobility can be
expressed by the following equation:
$$\pi(t) = \left(P^{T}\right)^{t} \pi(0) \tag{2}$$
where $\pi(0)$ is the initial probability distribution. If the Markov
chain is ergodic, there is a unique steady-state distribution
$\pi$ with the relation $\pi P = \pi$ (with $\pi$ read as a row vector;
equivalently, $P^{T}\pi = \pi$ in the column form of Eq. (2)), where
$\pi$ is the steady-state distribution vector whose entries are
nonnegative and add to 1. The following are some applications of the Markov
chain in wireless networks and ad hoc networks. Soelisti-
janto et al. [27] proposed a forwarding scheme, which is an
analysis of the traffic distribution among the nodes in social
opportunistic networks, using the Markov model of steady-
state traffic distribution. Lee et al. [28] utilized the semi-
Markov process to predict the distribution of future user
spots. Recently, Yu et al. [29] proposed a new scheme,
which is a Markov-based multihop mobility prediction for
applications such as location-based services or mobile
crowd sensing for MSNs.
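The iteration in Eq. (2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `steady_state`, the uniform initial distribution, the tolerance, and the 3-spot matrix are all assumptions made for illustration.

```python
def steady_state(P, tol=1e-12, max_iter=10_000):
    """Iterate pi(t+1) = P^T pi(t) until the distribution stops changing."""
    g = len(P)
    pi = [1.0 / g] * g  # assumed uniform initial distribution pi(0)
    for _ in range(max_iter):
        # One step of Eq. (2): entry j collects the probability flowing into spot j.
        nxt = [sum(pi[i] * P[i][j] for i in range(g)) for j in range(g)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical 3-spot transition matrix (each row sums to 1); not data from the paper.
P = [
    [0.5, 0.25, 0.25],
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
]
pi = steady_state(P)
print(pi)
```

For this ergodic example, the returned vector satisfies the steady-state relation: its entries are nonnegative, add to 1, and do not change under a further transition step.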
2.4 Spectral clustering technique
Clustering is one of the most widely used methods for
research in computer science, such as machine learning and
pattern recognition [13, 14]. Compared to the traditional
clustering techniques such as k-means, spectral clustering
has many advantages and has become one of the most
popular clustering techniques for exploratory data analysis.
Donath and Hoffman first contributed the spectral graph
theory for partitioning a graph [30]. Fiedler proposed to
solve the graph-partitioning problem using the second-smallest
eigenvalue of the Laplacian matrix of a component [31].
In this paper, we follow the tutorial by von Luxburg
[13] to detect the community structures between nodes. The main
tool to solve the graph-partitioning problem in spectral
clustering is the graph Laplacian. The spectral clustering
algorithm involves three matrices: the similarity matrix S,
the graph Laplacian L, and a matrix Y whose columns are
the eigenvectors corresponding to the first k
eigenvalues of L. The similarity $S_{ij}$ between nodes $v_i$ and $v_j$ is defined as
$$S(v_i, v_j) = \exp\left(-\frac{\lVert v_i - v_j \rVert^{2}}{2\sigma^{2}}\right) \tag{3}$$
where $\sigma$ is a scaling parameter to control the width of the
distance among nodes. After constructing the similarity
graph S, the normalized Laplacian $L_{norm}$ can be expressed
as:
$$L_{norm} = I - D^{-1/2} S D^{-1/2} \tag{4}$$
where D is the diagonal degree matrix of S. After computing the
eigenvalue decomposition of Lnorm, an n × k matrix
Y, whose columns are the eigenvectors corresponding
to the first k eigenvalues, is constructed. The final
step in spectral clustering is to cluster the rows of Y, and
techniques such as k-means clustering can be used for this.
The result represents k subgraphs
that are strongly linked internally but weakly
linked to each other. From the viewpoint of MSNs, a
subgraph is a community structure among nodes that are
strongly linked to each other.
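As a small worked illustration of Eq. (3), consider the two example steady-state vectors that appear in Sect. 4.3. The two values of the scaling parameter σ are assumptions chosen only to show how σ controls the width of the similarity.

```latex
\begin{align*}
v_i &= (0.8,\ 0.05,\ 0.05,\ 0.1), \qquad v_j = (0.5,\ 0.05,\ 0.05,\ 0.4) \\
\lVert v_i - v_j \rVert^{2} &= 0.3^{2} + 0^{2} + 0^{2} + (-0.3)^{2} = 0.18 \\
\sigma = 1:&\quad S(v_i, v_j) = e^{-0.18/2} = e^{-0.09} \approx 0.914 \\
\sigma = 0.1:&\quad S(v_i, v_j) = e^{-0.18/0.02} = e^{-9} \approx 1.2 \times 10^{-4}
\end{align*}
```

With a large σ the two nodes look almost identical, whereas with a small σ they are treated as nearly unrelated.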
In this paper, the motivation for
using the spectral clustering technique is to handle
high-dimensional datasets. With high-dimensional data, the
traditional clustering techniques become less precise as the
number of dimensions grows, whereas spectral clustering
groups the data based on the
eigenvectors of the Laplacian of the similarity graph.
3 Problem statement and system model
3.1 Problem statement
In previous studies, influence maximization was summa-
rized by the following equation:
$$\underset{I \subseteq V}{\arg\max}\ \gamma(I), \quad |I| \le k \tag{5}$$
where V is the set of all nodes, I is the set of influential
nodes, and $\gamma(I)$ is the expected number of total active nodes
at the end of the influence maximization process in social
networks. Unlike social networks, the diffusion in MSNs
has to consider node behaviors such as contact frequency
and mobility patterns, to publish information because the
network topology changes dynamically with time. The
challenge is to find a subset I that minimizes the diffusion
period in dynamic networks such as MSNs. The diffusion
problem for MSNs is described by the following equation:
$$\underset{I \subseteq V}{\arg\min}\ \varphi(I), \quad |I| \le k \tag{6}$$
where $\varphi(I)$ is the expected diffusion time when the information
is spread from the subset I; the goal is to find the subset I that
minimizes it. As a result,
information diffusion has to consider both the total number of
nodes reached from the initial set of k nodes and the minimum
amount of diffusion time.
3.2 System model
In this section, we introduce our system model and the
assumptions used to solve the problem of diffusion minimization
in MSNs. We use HCMM [32] as the node’s
mobility model because it can model the properties of
human mobility and contact frequency. As shown in [32],
the model captures the spatial and temporal properties of
human mobility in social relationships, such as contact and
mobility patterns. Each node has its own home community,
speed, and location. Each node periodically records its
location history. In our environment, there
are m mobile nodes, denoted N1, ..., Nm. The
undirected graph for the relationships among nodes is
defined as G = (N, E), where N denotes a finite set of
nodes and E denotes a finite set of links between nodes
based on social behaviors such as contact frequency. We
assume that Ni delivers a message to Nj whenever contact
happens among nodes, and that each node has a memory
space to store delivered messages and the current position.
All nodes have two different states: active and inactive.
Active nodes can deliver a message. Inactive
nodes cannot deliver a message; they switch their status to
active when they are contacted by an active node
during the diffusion period. The diffusion process for diffusion
minimization consists of warm-up and diffusion
periods. The warm-up period is the time required to find
the top-k influential nodes by exploiting node behavior.
The diffusion period is the time required to propagate messages
from the top-k influential nodes to the other nodes. When
all nodes are in active states, the diffusion process is ter-
minated. There is a central server (CS), which stores
periodically recorded mobility information of the nodes in
the entire network during the warm-up period. The CS also
has a memory space and a hardware system to analyze the
recorded social information, and creates a similarity graph
G among nodes. In the mobile social network environment,
the CS is also a mobile node, but it acts as a
special-purpose node. The CS is deployed before the warm-up
period. During the warm-up period, the CS periodically
moves through the entire network area to store
mobility logs and then creates snapshots. At the end of the
warm-up period, the CS detects communities using a clustering
technique. After the warm-up
period, the CS no longer analyzes information in the
MSNs. Then, the nodes will start contacting each other,
starting from the k influential nodes, during the diffusion period.
Figure 2 shows an example of the diffusion process
between mobile nodes Ni and Nj. Both move at t0 and t1.
Node Ni is one of the top-k influential nodes, Nj is an
inactive node, and t0 and t1 are intervals in the diffusion
period. When Ni and Nj are in a communication range R at
t1, contact happens between the two nodes and Nj becomes
active. As shown in Fig. 2, the nodes are in one of the two
states, and record the first contact time from the active
nodes. Then, they send the recorded time to the CS. When
all nodes have been activated, the CS
measures the elapsed time of the diffusion period. We also
consider the case in which nodes are relatively stable, or even
static, during the warm-up and diffusion periods. During the
warm-up period, such nodes periodically record the same current
position and communicate with the CS. During the diffusion
period, if no node moves in the MSN, the number of
active nodes does not increase. When an inactive static node
comes into contact with an active non-static node, it is
switched to the active state.
In brief, the following assumptions are made:
• Each node in the MSNs has a nodeID that is unique and
has the same radius of communication range. Thus, the
nodes can deliver messages to each other and can
record their information such as current spot, the two
states, and the time in active state.
• Each node has its own community, such as a home or other
special spots. The network is composed of communities,
denoted as H = {H1, H2, ..., Hl}, where l is the
number of communities.
• The CS is aware of the global information of all the
nodes in the entire network area because it can
periodically record mobility information from nodes.
• We do not consider the main resource consumption (such
as memory space, CPU, battery, power, and bandwidth).
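The active/inactive state model above can be sketched in Python as follows. This is an illustrative simplification rather than the NS-2 model used in the paper: the function name `diffusion_time`, the discrete time steps, the hypothetical trajectories, and the rule that a newly activated node starts forwarding only from the next step are all assumptions.

```python
import math


def diffusion_time(trajectories, seeds, comm_range):
    """Return the first time step at which every node is active, or None.

    trajectories: {node_id: [(x, y) at t0, (x, y) at t1, ...]}, all of equal length.
    """
    active = set(seeds)
    steps = len(next(iter(trajectories.values())))
    for t in range(steps):
        positions = {n: traj[t] for n, traj in trajectories.items()}
        # Nodes activated at step t start forwarding from step t + 1.
        newly_active = {
            n for n in positions
            if n not in active
            and any(math.dist(positions[n], positions[a]) <= comm_range for a in active)
        }
        active |= newly_active
        if len(active) == len(trajectories):
            return t
    return None


# Hypothetical trajectories of three nodes over three time steps (coordinates in metres).
trajectories = {
    "Ni": [(0, 0), (10, 0), (20, 0)],
    "Nj": [(50, 0), (18, 0), (18, 0)],
    "Nk": [(100, 0), (100, 0), (25, 0)],
}
print(diffusion_time(trajectories, {"Ni"}, 10.0))
print(diffusion_time(trajectories, {"Ni"}, 1.0))
```

With a 10 m range, Nj is activated at the second step and Nk at the third; with a 1 m range, no contact happens and the function reports that the diffusion did not complete.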
4 Proposed scheme
4.1 Overview
Diffusion schemes in the community concept demonstrate
effective performances in minimizing the diffusion period,
by finding the top-k influential nodes within each
community instead of finding them from the whole net-
work topology. For this reason, we propose the CDMS, a
community-detection-based diffusion scheme using Mar-
kov chain and spectral clustering. Our scheme consists of
three steps. First, we exploit the spot information to learn
the individual behaviors. Second, we present each node as
a vector containing individual behaviors, using the Markov
chain. Finally, the vectors are clustered into one of the
communities through clustering techniques, and we can
determine the most influential node in each community.
4.2 Geographic regularity of node’s movement
The movement of each node is recorded by the CS to
predict the future node behaviors. The CS periodically
accumulates the current position of each node during the
warm-up period. The CS uses snapshots (SNs); each snapshot SNs
represents the locations of all nodes in the network area
during interval ts, where s is the index of the SN and
t is the interval length. By exploiting the SNs, the CS can record the
geographic regularity of a node’s movement. After the CS
collects an SN in every interval t, each SN is partitioned into
network sections (spots) denoted by SP = {SP1, SP2, ...,
SPg}, where g is the number of SPs. Figure 3(a), (b), (c),
and (d) show the topologies of the entire network area at
time t0, t1, t2, and t3. As shown in Fig. 3, N2 moves to SP4
from SP2 between SN0 and SN1, while N1 is located in SP1
in both SN0 and SN1. Note that node N2 moves more
actively than node N1 and they have different mobility
patterns. However, nodes N4 and N5 have the same
mobility pattern because they are always located in same
spots at t0, t1, t2, and t3. In this manner, the CS stores the
movements of every node in the warm-up period.
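The partitioning of a snapshot into spots can be sketched as below. The 450 m area and the 3 × 3 grid follow the simulation settings in Sect. 5.1; the function names, the row-major numbering of the spots, and the sample coordinates are assumptions for illustration only.

```python
def spot_id(x, y, area=450.0, cells_per_side=3):
    """Return the 1-based spot index (SP1..SPg, row-major) for position (x, y)."""
    cell = area / cells_per_side
    # Clamp so that points on the far boundary fall into the last cell.
    col = min(int(x // cell), cells_per_side - 1)
    row = min(int(y // cell), cells_per_side - 1)
    return row * cells_per_side + col + 1


def snapshot_to_spots(snapshot):
    """Map one snapshot {node_id: (x, y)} to {node_id: spot index}."""
    return {node: spot_id(x, y) for node, (x, y) in snapshot.items()}


# Hypothetical snapshot SN0 with three nodes inside the 450 m x 450 m area.
sn0 = {"N1": (10.0, 20.0), "N2": (200.0, 40.0), "N3": (449.0, 449.0)}
spots = snapshot_to_spots(sn0)
print(spots)
```

Applying `snapshot_to_spots` to every snapshot yields, for each node, the sequence of spot indices that the CS stores during the warm-up period.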
Fig. 2 Diffusion periods at t0 and t1
4.3 Mobility pattern prediction
The goal of our scheme is to predict the mobility pattern of
each node by using the Markov chain. We assume that
the CS uses the information <nodeID, spotID, time interval>
of each node Ni and constructs the transition matrix
P during the warm-up period. The CS can store the
sequence of spotIDs for every node during the warm-up
period. To construct a transition matrix P, the CS calculates
the conditional probability of the node’s next movement
during the warm-up period. As shown in Fig. 4, there are
sixteen possible transitions because there are four network
spots: SP1, SP2, SP3, and SP4. Figure 4 shows the transition matrix P
of node Ni between time t0 and t9. For example, the probability of
movement of Ni from SP1 to SP2 is 0.4. In this way,
the CS can construct a 4 × 4 transition matrix P for
each node according to Eq. (1).
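A minimal sketch of this construction is given below. It assumes that the transition probabilities are estimated as relative frequencies of consecutive spot pairs in a node's log; the function name, the handling of spots that are never left, and the spot log itself are hypothetical and do not reproduce the data of Fig. 4.

```python
def transition_matrix(spot_sequence, g):
    """Estimate the g x g matrix P from one node's sequence of 1-based spot IDs."""
    counts = [[0] * g for _ in range(g)]
    for src, dst in zip(spot_sequence, spot_sequence[1:]):
        counts[src - 1][dst - 1] += 1
    P = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a spot that is never left in the log is treated as absorbing.
            P.append([1.0 if j == i else 0.0 for j in range(g)])
        else:
            # Conditional probability of the next spot given the current spot i + 1.
            P.append([c / total for c in row])
    return P


# Hypothetical spot log of one node over ten snapshots (t0..t9) with g = 4 spots.
log = [1, 1, 2, 1, 1, 2, 4, 1, 3, 1]
P = transition_matrix(log, 4)
for row in P:
    print(row)
```

Each row of the result is a probability distribution over the next spot, so every row sums to 1, as required for Eq. (1).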
After constructing the transition matrix P, the CS calculates
vi, the steady-state vector of Ni that represents the
probability distribution of the node’s location over the spots.
It is obtained as the limit of Eq. (2), by solving the
homogeneous linear system vi P = vi. As a concrete example,
suppose that the network area is partitioned into four spots
(SP1, SP2, SP3, SP4), and that vi and vj are <0.8, 0.05,
0.05, 0.1> and <0.5, 0.05, 0.05, 0.4>, respectively. Ni will have a
Fig. 3 Snapshots at t0, t1, t2,
and t3
Fig. 4 Transition matrix P of node Ni between t0 and t9
higher probability of being located in SP1 than in other
spots. In contrast, Nj will mostly be located in SP1 or SP4. The vectors
vi and vj can be calculated as mentioned above. Note that vi
represents a mobility pattern, that is, how likely a node is to stay in
each spot of the network area.
4.4 Applying spectral clustering
To minimize the diffusion period in MSNs, we exploit a
community concept using the spectral clustering technique.
Algorithm 1 describes the spectral clustering algorithm to
help understand our proposed scheme. Spectral clustering
consists of three steps, and we describe each step in detail.
The first step of the spectral clustering algorithm constructs
a similarity matrix $S \in \mathbb{R}^{n \times n}$, where $S_{ij} \ge 0$ reflects the
relationship between nodes i and j according to Eq. (3).
The next step is constructing the matrix L according to Eq. (4).
This is the main tool in the spectral clustering algorithm, used
to solve the graph-partitioning problem by exploiting the
eigenvalues and eigenvectors of L. Then,
the n × k matrix Y is created, whose columns are
the eigenvectors corresponding to the
first k eigenvalues of L. In the final step of spectral clustering,
the rows of Y are classified by k-means clustering into communities
C = {C1, C2, C3, ..., Ck}, so the number of communities is
equal to k.
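The three steps can be sketched end to end in pure Python. This is a hedged illustration, not the authors' code: the eigenvectors are approximated by subspace iteration instead of a full eigenvalue decomposition, k-means uses a deterministic farthest-point initialisation, and the value of σ, all function names, and the six steady-state vectors are assumptions.

```python
import math


def similarity(vectors, sigma):
    """Gaussian similarity of Eq. (3) for every pair of vectors."""
    def s(x, y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-d2 / (2 * sigma ** 2))
    return [[s(x, y) for y in vectors] for x in vectors]


def normalized_laplacian(S):
    """L_norm = I - D^{-1/2} S D^{-1/2} of Eq. (4); D holds the row sums of S."""
    n = len(S)
    d = [sum(row) for row in S]
    return [[(1.0 if i == j else 0.0) - S[i][j] / math.sqrt(d[i] * d[j])
             for j in range(n)] for i in range(n)]


def smallest_eigenvectors(L, k, iters=500):
    """Approximate an n x k basis Y for the k smallest eigenvalues of L.

    Eigenvalues of L_norm lie in [0, 2], so the largest eigenvalues of
    M = 2I - L correspond to the smallest ones of L (subspace iteration).
    """
    n = len(L)
    M = [[(2.0 if i == j else 0.0) - L[i][j] for j in range(n)] for i in range(n)]
    cols = [[1.0 / (1 + i + j) for i in range(n)] for j in range(k)]  # deterministic start
    for _ in range(iters):
        cols = [[sum(M[i][t] * c[t] for t in range(n)) for i in range(n)] for c in cols]
        for a in range(k):  # Gram-Schmidt keeps the columns orthonormal
            for b in range(a):
                dot = sum(x * y for x, y in zip(cols[a], cols[b]))
                cols[a] = [x - dot * y for x, y in zip(cols[a], cols[b])]
            norm = math.sqrt(sum(x * x for x in cols[a]))
            cols[a] = [x / norm for x in cols[a]]
    return [[cols[j][i] for j in range(k)] for i in range(n)]


def kmeans(rows, k, iters=100):
    """Plain k-means on the rows of Y with farthest-point initialisation."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(math.dist(r, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical steady-state vectors: three nodes favour SP1, three favour SP4.
vectors = [
    [0.8, 0.05, 0.05, 0.1], [0.75, 0.1, 0.05, 0.1], [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.05, 0.05, 0.8], [0.1, 0.1, 0.05, 0.75], [0.05, 0.1, 0.1, 0.75],
]
L = normalized_laplacian(similarity(vectors, sigma=0.2))
labels = kmeans(smallest_eigenvectors(L, k=2), k=2)
print(labels)
```

Because k-means depends only on distances between rows, any orthonormal basis of the same eigenspace gives the same grouping, which is why subspace iteration is sufficient for this sketch.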
4.5 Finding seed node set
The CS selects one seed node per community by calculating the
Euclidean distances between the community’s centroid vector
and the steady-state vectors of the nodes in that community. The distance
between $v_i^c$ and $v_i^j$ is defined as
$$D(x, y) = \frac{\sqrt{\sum_{i=1}^{g} (x_i - y_i)^{2}}}{\sqrt{g}} \tag{7}$$
where x and y are $v_i^c$ and $v_i^j$, respectively; $v_i^c$ is the centroid
vector of community $C_i$ and $v_i^j$ is one of the steady-state vectors
corresponding to the nodes in $C_i$. Algorithm 2 is used to
determine the k influential nodes as per Eq. (7). As shown in
Algorithm 2, one seed node is selected in each community:
the seed node of $C_i$ is the node whose steady-state vector $v_i^j$
is closest to the centroid vector $v_i^c$. The CS
determines the top-k node set I in k iterations. After the warm-up
period, the states of the inactive nodes are switched to
active by the top-k influential nodes, and the CS is no
longer involved. When all the nodes have been contacted
by active nodes, the diffusion process is terminated, as
mentioned above in Sect. 3.
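The seed selection can be sketched as follows, assuming that each community is given as a mapping from node IDs to steady-state vectors and that the centroid is the component-wise mean of those vectors. The function names and the example vectors are hypothetical.

```python
import math


def distance(x, y):
    """Eq. (7): Euclidean distance between x and y, scaled by sqrt(g)."""
    g = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y))) / math.sqrt(g)


def centroid(vectors):
    """Component-wise mean of a list of equally long vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_seeds(communities):
    """Pick, for each community, the node whose vector is closest to the centroid."""
    seeds = []
    for members in communities:
        c = centroid(list(members.values()))
        seeds.append(min(members, key=lambda node: distance(c, members[node])))
    return seeds


# Two hypothetical communities of steady-state vectors over four spots.
communities = [
    {"N1": [0.8, 0.1, 0.05, 0.05], "N2": [0.7, 0.2, 0.05, 0.05], "N3": [0.6, 0.3, 0.05, 0.05]},
    {"N4": [0.1, 0.1, 0.1, 0.7], "N5": [0.05, 0.05, 0.1, 0.8], "N6": [0.1, 0.05, 0.1, 0.75]},
]
seeds = select_seeds(communities)
print(seeds)
```

The loop runs once per community, which mirrors the k iterations of Algorithm 2.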
5 Performance evaluation
5.1 Simulation environment
We use the network simulator NS-2 v2.35 to build the
diffusion process, evaluate the CDMS, and analyze the
results. All nodes in the network area move according to the
HCMM [32], which is frequently used in MSN simulations.
The number of nodes ranges from 40 to 90. Since, in
the real world, k should be a small value, the number of
top-k influential nodes is at most 20% of the total nodes
in each setting. The size of the network is set to 450 m × 450 m
and the number of grid cells is 9, where each grid cell is an
SP. Each node has its own community Hi as a home, and
the total number of communities is 4. Communication
ranges are set to 1, 5, 10, 15, and 20 m. The velocity of a node
is from 2 to 10 m/s, which is appropriate for the movement
of both people and vehicles in MSNs. The warm-up period
is set to 500 s to collect the movement patterns of the nodes,
and the diffusion period is 5000 s (excluding the warm-up
period). Each node is initially located in its community cell, as
in the HCMM environment. Table 1 shows the parameters of
the simulation environment in detail.
Figure 5 shows our simulated network and the HCMM
environment that represents one of the home cells (such as
a community). The nodes frequently move to one of their
home cells, issue logs of current context, and then com-
municate with the CS (please see Sect. 4). The CS builds
SNs every 10 s. At the end of the warm-up period, the CS
calculates the mobility pattern of each node using a Mar-
kov process and extracts a steady-state vector that
Algorithm 1. Community detection using spectral clustering
Input: Data points v1, v2, ..., vm; number of clusters k
Output: k community groups
1 Construct a similarity matrix S and the Laplacian matrix Lnorm
2 Compute the eigenvalue decomposition of Lnorm
3 Form the matrix Y by stacking, as columns, the eigenvectors
  corresponding to the first k eigenvalues of Lnorm
4 Use k-means clustering on the rows of Y
Algorithm 2. Determining k influential nodes
Input: Top-k influential node set I = ∅
Output: Seed node set I
for i = 1, ..., k do
  v^c_i ← centroid vector of community i
  seed ← argmin_j D(v^c_i, v^j_i)
  I ← I ∪ {seed}
end for
represents a distribution of how long a node would stay at a
certain SP. After extracting vectors for all the nodes, the
CS clusters them using the spectral clustering technique.
The top-k influential nodes are finally selected; these
influential nodes are more similar to centroids than the
other nodes in each community. After a warm-up period,
the inactive nodes are affected by the top-k influential
nodes. Once all nodes have received contact from the
active nodes, the simulation is terminated. We run each
scheme 20 times and measure the average time without the
warm-up period.
5.2 Simulation results
CDMS is compared to three schemes—RAND, K-CEN-
TER [8], and a community-based diffusion scheme using
Markov chain and k-means clustering (CDMK). The top-k
influential nodes in RAND are selected randomly. The
K-CENTER scheme is based on a graph G, where G is
represented by the contact frequency of all nodes as men-
tioned above in Sect. 3. The selection of the top-k influ-
ential nodes can be formulated using the asymmetric
k-center problem, which is a solution for diffusion mini-
mization under our diffusion model. The CDMK is similar
to our proposed scheme because the authors in [6] solved a
problem of diffusion minimization using social information
and community concept. However, the experiment in
HCMM did not use dynamic social features. Therefore, we
simply replaced the social features with steady-state vectors
and implemented the community concept through k-means
clustering, as mentioned in [6].
5.2.1 Number of nodes
In order to evaluate the performance in different network
environments, we examined the performance of each
scheme by varying the number of nodes from 40 to 90. As
expected, the diffusion times of all schemes decreased as
Table 1 Simulation parameters
| Parameter (unit)                    | Value (default)            |
|-------------------------------------|----------------------------|
| Number of nodes                     | 40, 50, 60, 70, 80, 90 (80) |
| Ratio of k influential nodes (%)    | 5, 7.5, 10, 15, 20 (10)    |
| Size of the network (m²)            | 450 × 450                  |
| Number of home-cell communities     | 4                          |
| Community size (m²)                 | 150 × 150                  |
| Node speed (m/s)                    | 2 to 10                    |
| Radius of communication range (m)   | 1, 5, 10, 15, 20 (10)      |
| Interval for SN (s)                 | 10                         |
| Warm-up period (s)                  | 500                        |
| Total diffusion process (s)         | 5000                       |
Fig. 5 HCMM cells in network simulation
Fig. 6 Diffusion times with
different numbers of nodes
the number of nodes increased. As shown in Fig. 6, RAND
had the worst performance and K-CENTER had a better
performance than RAND. As the number of nodes
increased, the diffusion time of K-CENTER did not change
much, compared to that of the community-based diffusion
scheme, especially in the case of more dynamic complex
networks. Meanwhile, the performances of the community-
based schemes were up to 10% higher than that of the
noncommunity-based schemes. This is because the
community-based schemes select influential nodes from
individual communities rather than from the entire network
as done in the K-CENTER scheme. In Fig. 6, when the
number of nodes is 40, the community-based schemes have
the best performance because the number of clusters is
equal to the number of home-cell communities mentioned
above in Table 1. CDMS has the best performance except
in the cases of 40 and 60 nodes because the spectral
clustering technique outperforms the traditional clustering
Fig. 7 Diffusion times with
different ratios of influential
nodes
Fig. 8 Diffusion times with
different communication ranges
algorithms such as the k-means algorithm [13]. Although
CDMK has better performance than our scheme in the 40
and 60-node cases, the difference in the diffusion times
between the two community-based schemes is small.
5.2.2 Ratio of influential nodes
Because the number of influential nodes in the real world
should be small, we select k to be no higher than 20% of
the total nodes for each setting. Figure 7 shows the average
diffusion times for different numbers of influential nodes.
However, there is no performance difference when k is
more than 20% of the total nodes. RAND still has the worst
performance while K-CENTER performs better than
RAND. Community-based schemes always demonstrate
better performances than the noncommunity-based
schemes and CDMS has the best performance. In addition,
the diffusion time of K-CENTER does not change much
as the ratio of k influential nodes increases,
when compared to the community-based schemes. This
Fig. 9 Percentage of active
nodes in diffusion period.
a Communication range of 1 m.
b Communication range of
10 m
means that K-CENTER chooses influential nodes in only
large communities in the case of dense environments. Since
noncommunity-based schemes do not consider a commu-
nity structure, it difficult to find critical nodes such as
isolated or nonactive nodes for diffusion. Meanwhile, in
community-based schemes, critical nodes have high con-
tact probabilities with nodes in their communities because
of the selected influential node in each community, where
nodes in each community have similar mobility patterns.
Therefore, community-based schemes have better perfor-
mances than noncommunity-based schemes.
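The effect described above can be sketched with a toy comparison between a network-wide ranking and a per-community selection. This is only an illustration under stated assumptions: the contact-count scores, the round-robin rule, and both function names are hypothetical and are not taken from the paper, and the network-wide ranking is a generic contrast rather than the K-CENTER algorithm itself.

```python
def pick_global(scores, k):
    """Top-k nodes by score over the entire network."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def pick_per_community(communities, scores, k):
    """Round-robin over communities, taking each community's best
    remaining node, until k nodes are chosen."""
    ranked = [sorted(c, key=scores.get, reverse=True) for c in communities]
    chosen = []
    rank = 0
    while len(chosen) < k and any(rank < len(r) for r in ranked):
        for r in ranked:
            if rank < len(r) and len(chosen) < k:
                chosen.append(r[rank])
        rank += 1
    return chosen


# Hypothetical network: 15 nodes in one large and two small communities.
communities = [list(range(0, 9)), list(range(9, 13)), [13, 14]]
# Hypothetical contact counts; nodes in the large community meet more peers.
scores = {node: 20 - node for node in range(15)}
k = len(scores) * 20 // 100  # k capped at 20% of the total nodes

print(pick_global(scores, k))                      # all seeds in the large community
print(pick_per_community(communities, scores, k))  # one seed per community
```

With the network-wide ranking, every seed lands in the large community, so the two small communities depend on chance contacts; the per-community rule places a seed next to the isolated nodes.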
5.2.3 Communication ranges
Figure 8 shows the diffusion times for communication ranges of 1, 5, 10, 15, and 20 m. When the communication range is 1 m or less, not all schemes can propagate information to all the nodes because a few nodes are not affected by the k influential nodes within the diffusion period. Thus, only in the 1 m case, we measure the diffusion time as the time at which the percentage of affected nodes reaches a threshold value of 95%, which is the maximum percentage of affected nodes. An increase in the communication range corresponds to a denser MSN environment. Thus, as the communication range becomes wider, all schemes have shorter diffusion times because the contact probabilities between nodes increase. As shown in Fig. 8, RAND performs worst and K-CENTER outperforms RAND, while the community-based schemes perform better than both RAND and K-CENTER. When the range is 1 m, the differences in performance among the schemes are large because the MSN is sparse. Meanwhile, in dense MSNs (ranges of 5, 10, 15, and 20 m), the community-based schemes still outperform the other two algorithms, but the differences in performance are relatively small compared to the 1 m case, because sparse MSNs contain many isolated or nonactive nodes. CDMS and CDMK account for these nodes by constructing community structures with varying distributions of the nodes' mobility patterns. Therefore, community-based schemes perform well in sparse networks.
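The threshold-based measurement used for the 1 m case can be illustrated with a short sketch. The function name and the sample trace are hypothetical; only the 95% threshold comes from the text.

```python
def time_to_reach(samples, threshold=95.0):
    """Return the first sampled time at which the percentage of affected
    nodes reaches the threshold, or None if it never does."""
    for t, pct in samples:
        if pct >= threshold:
            return t
    return None


# Hypothetical trace for a sparse (1 m) run: (seconds, % of affected nodes).
trace = [(0, 5.0), (100, 40.0), (200, 72.5), (300, 90.0), (400, 95.0), (500, 95.0)]

print(time_to_reach(trace))         # 400: diffusion time at the 95% threshold
print(time_to_reach(trace, 100.0))  # None: full coverage is never reached
```

Measuring at the threshold keeps the sparse runs comparable, since waiting for 100% coverage would never terminate when a few nodes remain unreachable.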
5.2.4 Percentages of active nodes in diffusion period
Finally, we compare the number of active nodes for each scheme during the diffusion period. Figure 9 shows the percentage of active nodes over the diffusion period. Figure 9(a) and (b) show sparse and dense networks, respectively, based on each node's communication range. When the communication range is 1 m, a few nodes may not be affected by the k influential nodes within the diffusion period. Thus, we measure the diffusion time when the percentage is equal to a threshold value, as mentioned in Sect. 5.2.3. Figure 9(a) shows the average percentage of active nodes for a communication range of 1 m, where RAND has the worst performance and K-CENTER performs better than RAND. The community-based schemes have shorter diffusion processes than RAND and K-CENTER. CDMS has the best performance for most of the diffusion time, except for the interval between 420 and 520 s. Figure 9(b) shows the average percentage of active nodes for a communication range of 10 m, where K-CENTER has the best performance between 20 and 130 s; however, the K-CENTER scheme has difficulty propagating information to a few disconnected or nonactive nodes. Meanwhile, the community-based schemes still outperform the other two schemes for the same reasons mentioned in Sect. 5.2.3. In summary, the community-based diffusion schemes shorten the diffusion process across the entire sparse network by finding isolated or nonactive nodes, each of which receives the information from a node with the same mobility pattern.
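A curve such as the one in Fig. 9 can be derived from per-node activation times. The sketch below shows one way to do so; the function name, the sampling grid, and the activation times are hypothetical and do not reproduce the simulation data.

```python
from bisect import bisect_right


def active_percentage(activation_times, total_nodes, sample_times):
    """Percentage of nodes holding the information at each sample time.
    Nodes that are never reached are simply absent from activation_times."""
    ordered = sorted(activation_times)
    return [100.0 * bisect_right(ordered, t) / total_nodes for t in sample_times]


# Hypothetical run with 10 nodes: two seeds active at t = 0 s and one node
# that is never reached within the diffusion period.
times = [0, 0, 35, 60, 60, 120, 180, 260, 410]
curve = active_percentage(times, 10, [0, 100, 200, 300, 400, 500])
print(curve)
```

A curve that flattens below 100%, as in this toy run, corresponds to the isolated or nonactive nodes discussed above.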
6 Conclusion
We addressed the problem of finding the top-k influential nodes that propagate information to the nodes in a dynamic network as quickly as possible, referred to as the diffusion minimization problem. In this paper, we analyzed solutions for the diffusion minimization problem in MSNs by proposing CDMS, a novel diffusion scheme in which influential nodes are selected based on node behaviors and community detection techniques in MSNs. This scheme solved the diffusion minimization problem more effectively because the influential nodes were selected from within communities instead of from the entire network topology. Since the community-based diffusion schemes also considered nonactive and isolated nodes, the simulation results showed that they performed better than the noncommunity-based schemes regardless of the sparseness of the MSNs. In
addition, the spectral clustering technique has many advantages over k-means clustering. For that reason, CDMS performs better than CDMK. All the methods involve a warm-up time and a CS. Therefore, we execute the information diffusion strategy algorithms on the mobile node, since the main resource consumption of the mobile app (such as memory, CPU, and battery) is incurred by the central server part during the warm-up period and is related to the sensing frequency, which is not the focus of this paper. As future work, we plan to study various social behaviors among nodes and recent clustering techniques for detecting communities, to diffuse information more effectively with the various resources in MSNs.
Acknowledgements This research was supported by the Basic
Science Research Program through the National Research Foundation
of Korea (NRF) funded by the Ministry of Education, Science, and
Technology (2016R1A2B4010142).
References
1. Ma, H., Yang, H., Lyu, M. R., & King, I. (2008). Mining social
networks using heat diffusion processes for marketing candidates
selection. In Proceedings of the 17th ACM conference on Infor-
mation and knowledge management (pp. 233–242).
2. Richardson, M., & Domingos, P. (2002). Mining knowledge-
sharing sites for viral marketing. In Proceedings of the 8th ACM
SIGKDD international conference on Knowledge discovery and
data mining (pp. 61–70).
3. Nguyen, H. A., & Giordano, S. (2009). Routing in opportunistic
networks. International Journal of Ambient Computing and
Intelligence, 1(3), 19–38.
4. Conti, M., Giordano, S., May, M., & Passarella, A. (2010). From
opportunistic networks to opportunistic computing. IEEE Com-
munications Magazine, 48(9), 126–139.
5. Lu, Z., Wen, Y., & Cao, G. (2014). Information diffusion in
mobile social networks: The speed perspective. In Proceedings of
IEEE INFOCOM (pp. 1932–1940).
6. Chen, X., & Xiong, K. (2015). Dynamic social feature-based
diffusion in mobile social networks. In Proceedings of IEEE/CIC
International Conference on Communications in China (ICCC)
(pp. 1–6).
7. Myers, S. A., Zhu, C., & Leskovec, J. (2012). Information dif-
fusion and external influence in networks. In Proceedings of the
18th ACM SIGKDD international conference on knowledge dis-
covery and data mining (pp. 33–41).
8. Panigrahy, R., & Vishwanathan, S. (1998). An O(log* n)
approximation algorithm for the asymmetric p-center problem.
Journal of Algorithms, 27(2), 259–268.
9. Girvan, M., & Newman, M. E. (2002). Community structure in
social and biological networks. Proceedings of the National
Academy of Sciences, 99(12), 7821–7826.
10. Hsu, W. J., Spyropoulos, T., Psounis, K., & Helmy, A. (2007).
Modeling time-variant user mobility in wireless mobile networks.
In Proceedings of IEEE INFOCOM (pp. 758–766).
11. van Gennip, Y., Hunter, B., Ahn, R., Elliott, P., Luh, K.,
Halvorson, M., et al. (2013). Community detection using spectral
clustering on sparse geosocial data. SIAM Journal on Applied
Mathematics, 73(1), 67–83.
12. Zhang, S., Wang, R. S., & Zhang, X. S. (2007). Identification of
overlapping community structure in complex networks using
fuzzy c-means clustering. Physica A: Statistical Mechanics and
its Applications, 374(1), 483–490.
13. Von Luxburg, U. (2007). A tutorial on spectral clustering.
Statistics and computing, 17(4), 395–416.
14. Ng, A. Y., Jordan, M. I., & Weiss, Y. (2001). On spectral clus-
tering: Analysis and an algorithm. In Proceedings of Advances in
Neural Information Processing Systems. Cambridge, MA: MIT
Press.
15. Network Simulator-2. (2014). http://www.isi.edu/nsnam/ns/.
16. Christakis, N. A., & Fowler, J. H. (2007). The spread of obesity in
a large social network over 32 years. New England Journal of
Medicine, 357(4), 370–379.
17. Centola, D., Eguíluz, V. M., & Macy, M. W. (2007). Cascade
dynamics of complex propagation. Physica A: Statistical
Mechanics and its Applications, 374(1), 449–456.
18. Lambiotte, R., & Panzarasa, P. (2009). Communities, knowledge
creation, and information diffusion. Journal of Informetrics, 3(3),
180–190.
19. Sun, X., Lu, Z., Zhang, X., Salathe, M., & Cao, G. (2015). Tar-
geted vaccination based on a wireless sensor system. In Pro-
ceedings of Pervasive Computing and communications
workshops (pp. 215–220).
20. Bakshy, E., Rosenn, I., Marlow, C., & Adamic, L. (2012). The
role of social networks in information diffusion. In Proceedings
of the 21st international conference on World Wide Web (pp.
519–528).
21. Romero, D. M., Meeder, B., & Kleinberg, J. (2011). Differences
in the mechanics of information diffusion across topics: Idioms,
political hashtags, and complex contagion on twitter. In Pro-
ceedings of the 20th international conference on World wide web
(pp. 695–704).
22. Domingos, P., & Richardson, M. (2001). Mining the network
value of customers. In Proceedings of the 17th ACM SIGKDD
international conference on Knowledge discovery and data
mining (pp. 57–66).
23. Kempe, D., Kleinberg, J., & Tardos, E. (2003). Maximizing the
spread of influence through a social network. In Proceedings of
the 9th ACM SIGKDD international conference on knowledge
discovery and data mining (pp. 137–146).
24. Wang, Y., Cong, G., Song, G., & Xie, K. (2010). Community-
based greedy algorithm for mining top-k influential nodes in
mobile social networks. In Proceedings of the 16th ACM
SIGKDD international conference on Knowledge discovery and
data mining (pp. 1039–1048).
25. Han, B., Hui, P., Kumar, V. A., Marathe, M. V., Shao, J., &
Srinivasan, A. (2012). Mobile data offloading through oppor-
tunistic communications and social participation. IEEE Trans-
actions on Mobile Computing, 11(5), 821–834.
26. Markov chain. (2016). https://en.wikipedia.org/wiki/Markov_
chain.
27. Soelistijanto, B., & Howarth, M. (2012). Traffic distribution and
network capacity analysis in social opportunistic networks. In
Proceedings of the 8th IEEE international conference on the
wireless and mobile computing, networking and communications
(WiMob) (pp. 823–830).
28. Lee, J. K., & Hou, J. C. (2006). Modeling steady-state and
transient behaviors of user mobility: Formulation, analysis, and
application. In Proceedings of the 7th ACM international sym-
posium on mobile ad hoc networking and computing (pp. 85–96).
29. Yu, Z., Yu, Z., & Chen, Y. (2016). Multi-hop mobility prediction.
Mobile Networks and Applications, 21(2), 367–374.
30. Donath, W. E., & Hoffman, A. J. (1973). Lower bounds for the
partitioning of graphs. IBM Journal of Research and Develop-
ment, 17(5), 420–425.
31. Fiedler, M. (1973). Algebraic connectivity of graphs. Cze-
choslovak Mathematical Journal, 23(2), 298–305.
32. Boldrini, C., & Passarella, A. (2010). HCMM: Modelling spatial
and temporal properties of human mobility driven by users’ social
relationships. Computer Communications, 33(9), 1056–1074.
Jegwang Ryu is currently an
M.S. candidate in computer
science at Yonsei University in
Korea. His research interests
include mobile social networks,
delay tolerant networks and
machine learning.
Jiho Park is currently a Ph.D.
candidate in computer science at
Yonsei University in Korea. His
research interests include mobile
social networks, machine learning,
deep learning and social
network analysis.
Junyeop Lee is currently a
Ph.D. candidate in computer science
at Yonsei University in
Korea. His research interests
include mobile social networks,
machine learning, deep learning
and social network analysis.
Sung-Bong Yang received his
M.S. and Ph.D. from the Dept.
of Computer Science at the
University of Oklahoma in 1986
and 1992, respectively. He has
been a professor at Yonsei
University since 1994. His
research interests include graph
algorithms, mobile computing,
machine learning and social
network analysis.