
[IEEE 2014 Sixth International Conference on Communication Systems and Networks (COMSNETS) -...


Temporal Analysis of Telecom Call Graphs

Saket Gurukar Computer Science and Engineering

Indian Institute of Technology, Madras

Email: [email protected]

Abstract-Real-world graphs such as call graphs and email communication graphs are temporal in nature: edges between nodes exist only for a limited span of time. Temporal analysis can lead to new insights such as densification laws and shrinking diameters. In this paper we analyze temporal properties such as the diameter, the clustering coefficient, the number of calls, and other properties of Call Detail Records covering more than 1 billion calls. To analyze the data we used moving windows at multiple time scales, such as day-night windows and weekday-weekend windows. We also analyzed the number of unique calls with respect to the day of the week, which led to the rather surprising conclusion that no day of the week dominates the other days in terms of the highest number of unique calls. To the best of our knowledge, this is the first study of temporal properties of telecom call graphs with this particular set of splits.

I. INTRODUCTION

The analysis of complex systems such as mobile calls, the Internet, biological networks, citation networks, and email can be done with the help of graphs in which entities are represented as nodes and the interactions between them as edges [1]. In such analyses, nodes and edges are considered static, which implies that the graph topology does not change with time. However, the topology of communication networks such as call graphs does change with time: an edge between two people exists only while they are on a call and disappears once the call ends. Recently there has been some work on analyzing such systems using temporal graphs [2], in which nodes and edges carry timestamps and are present in the graph only according to those timestamps.

Many real graphs are temporal in nature, and hence static analysis of such graphs may lead to different, and possibly wrong, inferences. Prior to [3], graph generation models assumed that the average degree of nodes remains constant and that the diameter of a graph increases as the network grows; using temporal analysis, however, Leskovec et al. found that in real graphs the average degree of nodes increases and the diameter decreases as the network grows. This temporal analysis exposed false assumptions in static network generation models.

Static graphs can also capture the process of information diffusion incorrectly [4]. Consider the example call graph in Figure 1, in which nodes represent people and an edge between two people is labeled with the time of their call. For simplicity, assume that every call lasts 1 time unit. Figures 1a and 1b show a call between B and C at time t1 = 1 and a call between A and B at time t2 = 5, respectively. Figure 1c shows the static representation of the call graph, in which no temporal information is stored. From the static graph in Figure 1c, one would infer that information available at A can reach C through B, but in reality B does not communicate with C after he/she has communicated with A.

978-1-4799-3635-9/14/$31.00 ©2014 IEEE

Balaraman Ravindran Computer Science and Engineering

Indian Institute of Technology, Madras

Email: [email protected]


Fig. 1: Example call graph. a) Communication between B and C at t1 = 1. b) Communication between A and B at t2 = 5. c) Static graph with no temporal information.
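The difference between the static and temporal views of Figure 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' method: calls are hypothetical (person, person, time) tuples, each call is treated as an instantaneous, undirected event, and a path counts only if its calls occur at strictly increasing times (a time-respecting path).

```python
from collections import defaultdict


def static_reachable(calls, source):
    """People reachable from `source` when call times are ignored."""
    adj = defaultdict(set)
    for u, v, _time in calls:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {source}, [source]
    while stack:
        for nbr in adj[stack.pop()]:
            if nbr not in seen:
                seen.add(nbr)
                stack.append(nbr)
    return seen


def temporal_reachable(calls, source):
    """People reachable from `source` along time-respecting paths,
    i.e. chains of calls with strictly increasing times."""
    arrival = {source: float("-inf")}  # earliest time each person holds the information
    for u, v, t in sorted(calls, key=lambda call: call[2]):
        for a, b in ((u, v), (v, u)):  # assumption: a call carries information both ways
            if a in arrival and b not in arrival and arrival[a] < t:
                arrival[b] = t
    return set(arrival)


FIGURE_1_CALLS = [("B", "C", 1), ("A", "B", 5)]
print(sorted(static_reachable(FIGURE_1_CALLS, "A")))    # ['A', 'B', 'C']
print(sorted(temporal_reachable(FIGURE_1_CALLS, "A")))  # ['A', 'B']
```

In the static view, information from A reaches C; in the temporal view it stops at B, because the B-C call happened before the A-B call. Information starting at C, by contrast, can still reach A through B.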

In this paper, we analyze various temporal properties of Call Detail Records containing more than 1 billion calls. At this scale, temporal analysis over short time periods is computationally expensive. We therefore address this problem by studying graphs generated by aggregating the data over different time windows.

The goal of the study is to identify differences in calling patterns when windows range over different time periods. We chose four splits: a day-night split, in which calls made during a single day are aggregated together and calls made during a single night are aggregated together; a weekday-weekend split, in which calls made during the weekdays of a given week and during the subsequent weekend are aggregated separately; a uniform time window, in which calls made during n successive days are aggregated together; and cumulative weeks, in which all calls made from week 0 up to the end of a given week are accumulated.

The rest of the paper is organized as follows. Section II reviews related work, Section III describes the dataset and its preparation, Section IV reports the values of the static properties and our observations, Section V describes the temporal time windows, Section VI presents and discusses the results, Section VII discusses the choice of time window, and Section VIII concludes the paper.

II. RELATED WORK

The static analysis of mobile networks with a focus on structure and tie strength [5] shows the presence of strong ties inside communities and weak ties between communities in the mobile network. J.-P. Onnela et al. also showed that removing weak ties leads to a sudden breakdown of the largest connected component, whereas removing strong ties does not. Another detailed static study of a mobile network [6] analyzes degree, strength, and weight distributions, topological and weighted assortativity, and clustering and weighted clustering.

M. Karsai et al. [7] analyzed the spreading of infections and rumours on mobile call graphs and email logs. They proposed four null models and found that spreading in the original network is slower than in null models in which correlations are destroyed, hence the phrase "small but slow world".

The study of two temporal properties, diameter and density, in [3] reports the densification of graphs, i.e., an increase in the average degree of nodes over time, and a shrinking diameter as the graph grows. These results have implications for graph generation models, graph sampling, prediction of the next state of a graph, and anomaly detection.

Temporal properties such as the degree distribution, neighborhood distribution, cliques, and strongly connected components of call and SMS graphs are analyzed over time in [8], which also proposes a treasure hunt model for mobile call graphs. That study uses a uniform day time window on data from two operators, while we explore other time windows as well.

Gautier Krings et al. [9] analyzed the effect of the time window on telecom networks with a focus on link dynamics. Although they analyzed the effect of time windows, their focus differs from ours: the temporal properties we analyze are completely different, and we focus on the patterns these properties show across different time windows.

III. DATASET

A Call Detail Record (CDR) of a mobile telecom operator contains information about each call, such as the caller number, the called number, the time at which the call is initiated, the call duration, and many other details. CDR analysis is done by treating people as nodes and calls between them as edges.

A. Dataset Preparation

All calls are stored in structured text files in which the various details of each call are recorded. The caller number, called number, time of call, and duration were extracted for more than 1 billion calls using an Apache Pig script. For a specific time window, say the uniform day time window, multiple calls between two persons in a day were treated as a single call, since keeping every call requires more computational power. We performed our analysis on two 2.4 GHz quad-core Intel Xeon processors with 6144 KB of L2 cache and 24 GB of main memory. The properties were computed using the Stanford SNAP tool and the igraph package in R.
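Collapsing repeated calls within a window can be sketched as follows for the uniform day window. This is a minimal illustration that assumes hypothetical (caller, callee, start_time) records with Python datetime timestamps; the authors performed the extraction with Apache Pig, not with this code.

```python
from collections import defaultdict
from datetime import datetime


def unique_edges_per_day(records):
    """Collapse repeated calls between the same ordered (caller, callee) pair
    on the same calendar day into one directed edge per day."""
    edges_by_day = defaultdict(set)
    for caller, callee, start in records:
        edges_by_day[start.date()].add((caller, callee))
    return dict(edges_by_day)


# Hypothetical records; real CDRs carry phone numbers and many more fields.
records = [
    ("alice", "bob", datetime(2010, 2, 1, 9, 15)),
    ("alice", "bob", datetime(2010, 2, 1, 17, 40)),  # same pair, same day: merged
    ("bob", "alice", datetime(2010, 2, 1, 20, 5)),   # reverse direction: separate edge
    ("alice", "bob", datetime(2010, 2, 2, 8, 0)),    # next day: new window
]
daily = unique_edges_per_day(records)
total_calls = len(records)
unique_calls = sum(len(edges) for edges in daily.values())
print(unique_calls, total_calls)
```

Keeping edges directed preserves the information needed later for reciprocity and bidirectional-edge counts.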

IV. STATIC PROPERTIES

In static graph analysis, multiple calls from one person to another are treated as a single directed edge. Weights can be added to the edges [5] according to the duration of calls between two persons, but the properties we analyzed do not differ for edge-weighted graphs.

TABLE I: Static properties of Call graph

Property                                    Value
No. of nodes                                1771134
No. of edges                                20510811
No. of weakly connected components (WCC)    18
Size of maximum WCC (nodes)                 1766905
No. of edges in maximum WCC                 20508042
Largest bi-connected component (nodes)      1465195
Largest bi-connected component (edges)      20199413
Clustering coefficient                      0.063308
Diameter                                    12
Reciprocity                                 0.4394683
Density                                     6.538532e-06
Transitivity                                0.01247928

The values of the properties of the static call graph are shown in Table I. The number of nodes indicates the actual number of customers subscribed to the operator in a specific region. The number of edges indicates the total number of unique calls, i.e., distinct caller-callee pairs, over the span of 90 days. Note that the actual number of calls is more than 1 billion, but the number of unique calls is close to 20 million. The ratio of unique calls to total calls, about 0.02, signifies a large number of repeated calls between the same two persons.

There are 18 weakly connected components, and the largest WCC contains more than 99% (about 99.8%) of all nodes, signifying global connectivity between persons. The diameter of the graph is 12, whereas the diameter reported in [8] is 20; note, however, that the two operators are located on different continents. The low density signifies the sparseness of the call graph.
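Two of the Table I entries can be cross-checked with short formulas. Assuming density is defined for a directed graph without self-loops as m / (n(n - 1)), the node and edge counts in Table I reproduce the reported density. The reciprocity function uses one common definition, the fraction of directed edges whose reverse edge is also present; that it matches the exact igraph setting behind Table I is an assumption.

```python
def directed_density(num_nodes, num_edges):
    """Fraction of the n(n - 1) possible directed edges (no self-loops) that are present."""
    return num_edges / (num_nodes * (num_nodes - 1))


def reciprocity(edges):
    """Fraction of distinct directed edges (u, v) whose reverse (v, u) also exists."""
    edge_set = set(edges)
    reciprocated = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return reciprocated / len(edge_set)


# Node and edge counts from Table I.
density = directed_density(1771134, 20510811)
print(f"{density:.6e}")                       # 6.538532e-06, as in Table I
print(reciprocity([(1, 2), (2, 1), (1, 3)]))  # 2 of 3 edges are reciprocated
```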

V. TEMPORAL PROPERTIES

The call detail records of our operator cover 90 days of calls. We performed our analysis on the following temporal time windows.

A. Day Night Time Window

In this time window, all calls made during daylight hours, from 6:00 am to 5:59 pm, are aggregated to form a daytime graph, while calls made at night, from 6:00 pm to 5:59 am, are aggregated to form a nighttime graph. Thus, for 90 days, a total of 180 alternating day and night graphs were created.

This time window helps in analyzing call patterns and properties of the call graph during the day and at night, and also helps in anomaly detection.
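The day-night split can be expressed as a function from a call's start time to a window label. In this sketch (an assumption about bookkeeping, not the authors' code), a night window is labeled with the date on which it begins, so a call at 3:00 am belongs to the previous date's night; the hypothetical helper group_by_window then collapses each window's calls into a set of directed edges.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def day_night_window(start):
    """Map a call's start time to ("day", date) or ("night", date).

    Day: 06:00-17:59. Night: 18:00 on date D to 05:59 on D + 1, labeled D."""
    if 6 <= start.hour < 18:
        return ("day", start.date())
    if start.hour >= 18:
        return ("night", start.date())
    # 00:00-05:59 belongs to the night that began the previous evening.
    return ("night", start.date() - timedelta(days=1))


def group_by_window(records, window_of):
    """Hypothetical helper: one set of directed (caller, callee) edges per window."""
    graphs = defaultdict(set)
    for caller, callee, start in records:
        graphs[window_of(start)].add((caller, callee))
    return dict(graphs)


calls = [
    ("a", "b", datetime(2010, 1, 31, 18, 30)),  # Sunday evening
    ("a", "b", datetime(2010, 2, 1, 2, 0)),     # early Monday: same night window, merged
    ("b", "c", datetime(2010, 2, 1, 9, 0)),     # Monday daytime
]
for window, edges in sorted(group_by_window(calls, day_night_window).items()):
    print(window, sorted(edges))
```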

B. Uniform Day Time Window

In this time window, a graph is created for each day by aggregating all calls from that day. Some people may not make a single call on a given day, so the number of nodes varies from day to day. Since 90 days of CDR data are available, a total of 90 such graphs were created.

This time window captures call patterns and graph properties on a daily basis and may also help in anomaly detection.

C. Weekday and Weekend Time Window

In this time window, the calls made on the weekdays of a specific week are aggregated to form a graph, and the same procedure is followed for the weekend of that week. For 90 days, a total of 25 weekday and weekend graphs were created.

This time window helps in analyzing the patterns and properties of calls across weekdays and weekends, since calling patterns may differ on weekends, which are holidays for most people.
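A matching sketch for the weekday-weekend split assumes that the weekend is Saturday and Sunday (the paper does not state this explicitly) and labels each window by its first calendar day: the Monday of a weekday window or the Saturday of a weekend window. Under this labeling, the lone Sunday 31 Jan 2010 at the start of the data forms a partial weekend window.

```python
from datetime import datetime, timedelta


def weekday_weekend_window(start):
    """Label a call's start time with its weekday or weekend window.

    Assumes Monday-Friday are weekdays and Saturday-Sunday the weekend.
    Weekday windows are keyed by their Monday, weekend windows by their Saturday."""
    day = start.date()
    dow = day.weekday()  # Monday == 0, ..., Sunday == 6
    if dow < 5:
        return ("weekday", day - timedelta(days=dow))
    return ("weekend", day - timedelta(days=dow - 5))


print(weekday_weekend_window(datetime(2010, 1, 31, 16, 30)))  # first (partial) weekend
print(weekday_weekend_window(datetime(2010, 2, 3, 12, 0)))    # a Wednesday
```

The resulting labels can be fed to any per-window edge aggregation, such as collapsing repeated calls into one directed edge per window.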

D. Cumulative Week Time Window

In this time window, the snapshot for a given week aggregates all calls from day 0 up to the end of that week. For instance, the snapshot for the 3rd week contains all calls made from week 0 to week 3. Thus, for 90 days, a total of 13 cumulative week graphs were created.

This time window represents the growth phase of the network and helps in drawing inferences about network growth.
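The cumulative week window can be sketched as a running union of per-week edge sets. Here week k covers calendar days 7k to 7k + 6, counted from the first day of data; this indexing is an assumption, but it yields 13 snapshots (weeks 0 to 12) for a 90-day span, matching the count above. The records are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def cumulative_week_snapshots(records, first_day):
    """Return a list whose entry k is the set of directed (caller, callee)
    edges seen from `first_day` through the end of week k."""
    edges_by_week = defaultdict(set)
    for caller, callee, start in records:
        week = (start.date() - first_day).days // 7
        edges_by_week[week].add((caller, callee))
    snapshots, seen = [], set()
    for week in range(max(edges_by_week) + 1):
        seen |= edges_by_week.get(week, set())
        snapshots.append(frozenset(seen))
    return snapshots


# Hypothetical records: one call per day for 90 days, cycling among 3 callees.
first_day = date(2010, 1, 31)
records = [
    ("caller", f"callee{i % 3}", datetime(2010, 1, 31, 16, 0) + timedelta(days=i))
    for i in range(90)
]
snapshots = cumulative_week_snapshots(records, first_day)
print(len(snapshots), [len(s) for s in snapshots])  # 13 snapshots, each with 3 edges
```

Because the hypothetical caller keeps calling the same three people, every snapshot already has all three edges in week 0: a toy version of saturation.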

The temporal properties analyzed for these time windows are the number of nodes, number of edges, number of bidirectional edges, number of closed triads, number of open triads, clustering coefficient, and the effective and full diameters.

The number of nodes in a time window, say the uniform day time window, signifies the number of people initiating at least one call on that day. The number of edges in a time window signifies the total number of unique calls between people in that window. The number of bidirectional edges signifies reciprocity in the call graph for that window. The number of open triads counts configurations in which a person is connected to two others who are not connected to each other, while the number of closed triads (triangles) signifies the effect of triadic closure. The full diameter is the maximum distance between two nodes, while the effective diameter of the graph is the 90th-percentile distance between two nodes. The clustering coefficient of the graph is computed as 3 × (number of closed triads) / (number of open triads).
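These structural quantities can be computed exactly on small graphs with a brute-force sketch. It treats the call graph as undirected (an assumption for triads and distances), counts triangles and connected triples, and reports the standard global clustering coefficient (transitivity), 3 × triangles / connected triples, where connected triples include both open and closed ones; how this normalization relates to the tools used in the paper is not checked here. Diameters use exact all-pairs BFS with a nearest-rank 90th percentile, whereas the paper notes that SNAP computed diameters on a sampled graph.

```python
from collections import deque
from itertools import combinations


def build_undirected(edges):
    """Adjacency sets for an undirected simple graph (direction and self-loops dropped)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triad_counts(adj):
    """Return (triangles, connected_triples); open triples = triples - 3 * triangles."""
    triangles = triples = 0
    for node, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2          # paths of length two centred on `node`
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:                  # closed: the two neighbours are linked
                triangles += 1
    return triangles // 3, triples           # each triangle is seen from its 3 corners


def transitivity(adj):
    triangles, triples = triad_counts(adj)
    return 3 * triangles / triples if triples else 0.0


def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def full_and_effective_diameter(adj, percentile=90):
    """Exact diameters from all-pairs BFS; assumes at least one connected pair.
    Each pair appears once per direction, which leaves percentiles unchanged."""
    distances = sorted(
        d
        for source in adj
        for target, d in bfs_distances(adj, source).items()
        if target != source
    )
    rank = -(-percentile * len(distances) // 100)  # nearest rank: ceil(p * n / 100)
    return distances[-1], distances[rank - 1]


triangle_plus_tail = build_undirected([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(triad_counts(triangle_plus_tail), transitivity(triangle_plus_tail))  # (1, 5) 0.6
path = build_undirected([(0, 1), (1, 2), (2, 3), (3, 4)])
print(full_and_effective_diameter(path))  # (4, 3)
```

The brute-force triangle loop is only practical for small graphs; graphs with millions of nodes need library routines or sampling.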

VI. RESULTS AND DISCUSSION

The results of the experiments on the four time windows described in Section V are discussed here. The call detail records contain calls from 31st Jan 2010, 4 pm onwards, up to 30th April 2010, 11:59 pm (90 days).

A. Day Night time window

The temporal properties for the day and night time window are shown in Figures 2 and 3. Since calls on the first day are recorded only from 4 pm, we discard the calls made from 4 pm to 5:59 pm; hence the first data point is a night point.

In this time window, both the full diameter and the effective diameter are larger in night graphs than in day graphs, while the clustering coefficient is smaller.

The significant drop in the number of calls in Figure 3 is not due to missing data, since call records for the first night graph, from Sunday 31st Jan 2010, 6 pm, to Monday 1st Feb, 5:59 am, are available. The drop might be due to the public holiday on Monday 1st Feb (3 consecutive holidays) in the region where our operator records the calls.

[Figure 2 plot: Time vs. Clustering Coefficient, Effective Diameter, Full Diameter]

Fig. 2: The temporal properties of the call graph on the day and night time window. The first data point represents a night graph.

The numbers of edges, nodes, and bidirectional edges change considerably between day graphs and night graphs.

B. Uniform Day time window

The temporal properties for the uniform day time window are shown in Figure 5. Since the CDR contains calls only from 4 pm on the first day, the total number of calls on that day is very small, and hence the properties of the first-day graph differ significantly from those of the other days.

The full and effective diameters of the first-day graph are significantly higher because of the small number of calls: the number of people initiating calls is comparatively high relative to the number of calls, which results in larger distances. The clustering coefficient of the first day is significantly lower than that of the other days.

In the right panel of Figure 5, all five properties drop significantly on the first day. The number of edges on later days appears to peak at regular intervals. To verify whether any particular day, say Sunday, dominates all other days in terms of the number of unique calls, we computed the number of unique calls for each day of the week.

As shown in Figure 4, no particular day dominates the others with respect to the number of unique calls. A simple visual check is that the topmost points for each day have different colors. For fairness, we removed the data for week one, since the Sunday of week 1 was not completely recorded.
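The check behind Figure 4 can be phrased as: for each complete week, which day of the week has the most unique calls, and does one day win every week? The sketch assumes a hypothetical mapping from calendar date to that day's unique-call count, with weeks starting on Sunday (day 1 in Figure 4). Partially recorded days, such as the first Sunday, should be removed beforehand, as the paper does for week one. The counts below are made up.

```python
from collections import Counter
from datetime import date, timedelta

DAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]  # day 1 = Sunday, as in Fig. 4


def weekly_winners(daily_counts, first_sunday):
    """For every complete Sunday-to-Saturday week, the day name with the most unique calls."""
    weeks = {}
    for day, count in daily_counts.items():
        offset = (day - first_sunday).days
        weeks.setdefault(offset // 7, {})[offset % 7] = count
    winners = []
    for week in sorted(weeks):
        days = weeks[week]
        if len(days) == 7:  # skip incomplete weeks
            winners.append(DAY_NAMES[max(days, key=days.get)])
    return winners


def dominant_day(winners):
    """The day that tops every week, or None if no single day dominates."""
    if not winners:
        return None
    day, wins = Counter(winners).most_common(1)[0]
    return day if wins == len(winners) else None


# Hypothetical daily unique-call counts for two weeks starting Sunday 7 Feb 2010.
first_sunday = date(2010, 2, 7)
week0 = [10, 50, 20, 20, 20, 20, 20]
week1 = [10, 20, 20, 20, 20, 60, 20]
daily = {first_sunday + timedelta(days=i): c for i, c in enumerate(week0 + week1)}
winners = weekly_winners(daily, first_sunday)
print(winners, dominant_day(winners))  # ['Mon', 'Fri'] None
```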

C. Weekday and Weekend time window

The temporal properties for the weekday and weekend time window are shown in Figure 6. Since the first day was a Sunday and calls were recorded only from 4 pm, we removed the first weekend data point from Figure 6 for better visualization.

The full diameter of the first 6 data points is the same, namely 11. The full diameter is calculated on a sampled graph by the Stanford SNAP tool. The clustering coefficient also varies only slightly across weekdays and across weekends.

[Figure 3 plot: Time vs. Nodes, Edges, Bidirectional Edges, Closed Triads, Open Triads]

Fig. 3: The temporal properties of the call graph on the day and night time window. For the calls initiated during the daytime of a specific day, a call graph is created by aggregating all calls during that daytime, and various properties of that call graph are analyzed. The first data point represents a night graph. The variability of the properties at the first data point is NOT due to missing data; fewer unique calls were actually made on that day. This drop of calls in the Sunday night graph might be due to the public holiday on Monday.

[Figure 4 plot: Days vs. Number of Calls, with day 1 representing Sunday]

Fig. 4: The number of unique calls with respect to days. In particular, no day dominates the other days in terms of unique calls, as can be seen from the different top color for each day. Day 1 represents Sunday.

As with the uniform day time window, the number of open triads is orders of magnitude higher than the number of closed triads. The numbers of edges, nodes, and bidirectional edges are also roughly equal across weekdays and across weekends, signifying the same level of macroscopic interaction among people for each weekday window and each weekend window.

D. Cumulative Week time window

The temporal properties for the cumulative week time window are shown in Figure 7. The graph shows that calls saturate over a period of two weeks, which implies that people tend to call the same group of people again and again. Another study [5] also reports saturation, but over a period of two months. The changes in full diameter and effective diameter are due to computing those values on a sampled graph.
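One way to quantify the saturation seen in Figure 7 is the fraction of each cumulative snapshot's edges that first appear in that week; values near zero mean people keep calling the same contacts. This metric and the hypothetical 5% threshold below are illustrative assumptions, not measures reported in the paper, and the snapshots are made up.

```python
def new_edge_fractions(snapshots):
    """Given cumulative edge sets S_0 <= S_1 <= ..., return |S_k - S_(k-1)| / |S_k| for k >= 1."""
    return [
        len(curr - prev) / len(curr) if curr else 0.0
        for prev, curr in zip(snapshots, snapshots[1:])
    ]


def first_saturated_week(fractions, threshold=0.05):
    """Index k of the first cumulative week whose new-edge fraction is below `threshold`, or None."""
    for k, frac in enumerate(fractions, start=1):
        if frac < threshold:
            return k
    return None


# Hypothetical cumulative snapshots for weeks 0, 1 and 2.
snapshots = [
    {("a", "b"), ("a", "c")},
    {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")},
    {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")},
]
fractions = new_edge_fractions(snapshots)
print(fractions, first_saturated_week(fractions))  # [0.5, 0.0] 2
```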

VII. CHOICE OF TIME WINDOW

As seen in Sections V and VI, properties such as the clustering coefficient and the diameter of the call graph change with the choice of time window. This raises the question: what is the right choice of time window for a graph? Our analysis showed that with a short time window, such as the day-night time window, anomalies can be detected easily, but weekly patterns and weak links between communities cannot be captured effectively. With a larger time window, the anomalies seen in the day-night time window cannot be easily detected, as in the weekday-weekend time window. Hence the appropriate choice of time window size depends on the goal of the study.

[Figure 5 plots: (left) Time vs. Clustering Coefficient, Effective Diameter, Full Diameter; (right) Time vs. Nodes, Edges, Bidirectional Edges, Closed Triads, Open Triads]

Fig. 5: The temporal properties of the call graph on the uniform day time window. For each day, a call graph is created by aggregating all calls on that day, and various properties of that call graph are analyzed. The property values of the first-day call graph differ significantly from those of the other days' graphs due to missing call records.

VIII. CONCLUSION

We studied Call Detail Records containing more than 1 billion calls using four different time windows. In the introduction we also showed the loss of information that occurs when static graph analysis is performed. The inferences for the four time windows are summarized as follows:

• The Day and Night time window study led to the detection of an anomaly in the number of calls on a specific day, which might be due to a holiday on that day.

• The Uniform Day time window study led to the inference that no day dominates the other days in terms of number of calls or number of nodes.

• The Weekday and Weekend time window study led to the inference that macroscopic properties recur from week to week, for weekdays and for weekends.

• The Cumulative Week time window study led to an inference about the saturation of the graph over consecutive weeks.

The patterns in the number of calls across the various time windows represent macroscopic calling patterns. For microscopic calling patterns between people, one should analyze motifs [10], which represent localized flows of information propagation; this is a future direction of our work.

ACKNOWLEDGMENT

The authors would like to thank Ericsson Research India

for project funding and Mr. Shivashankar Subramanian for his

assistance in the initial setup of the project.

REFERENCES

[1] M. Newman, A.-L. Barabási, and D. J. Watts, The Structure and Dynamics of Networks. Princeton University Press, 2006.

[2] P. Holme and J. Saramäki, "Temporal networks," Physics Reports, vol. 519, no. 3, pp. 97-125, 2012.

[3] J. Leskovec, J. Kleinberg, and C. Faloutsos, "Graphs over time: Densification laws, shrinking diameters and possible explanations," in Proc. of KDD'05, 2005.

[4] J. Tang, M. Musolesi, C. Mascolo, V. Latora, and V. Nicosia, "Analysing information flows and key mediators through temporal centrality metrics," in SNS (E. Yoneki, E. Bursztein, and T. Stein, eds.), p. 3, ACM, 2010.

[5] J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, and A.-L. Barabási, "Structure and tie strengths in mobile communication networks," Proceedings of the National Academy of Sciences, vol. 104, no. 18, pp. 7332-7336, 2007.

[6] J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, M. A. de Menezes, K. Kaski, A.-L. Barabási, and J. Kertész, "Analysis of a large-scale weighted network of one-to-one human communication," 2007.

[7] M. Karsai, M. Kivelä, R. K. Pan, K. Kaski, J. Kertész, A.-L. Barabási, and J. Saramäki, "Small but slow world: How network topology and burstiness slow down spreading," CoRR, vol. abs/1006.2125, 2010.

[8] A. A. Nanavati, R. Singh, D. Chakraborty, K. Dasgupta, S. Mukherjea, G. Das, S. Gurumurthy, and A. Joshi, "Analyzing the structure and evolution of massive telecom graphs," IEEE Trans. Knowl. Data Eng., vol. 20, no. 5, pp. 703-718, 2008.

[9] G. Krings, M. Karsai, S. Bernhardsson, V. D. Blondel, and J. Saramäki, "Effects of time window size and placement on the structure of an aggregated communication network," EPJ Data Science, vol. 1, p. 4, May 2012.

[10] U. Alon, "Network motifs: theory and experimental approaches," Nat. Rev. Genet., vol. 8, no. 6, pp. 450-461, 2007.

[Figure 6 plots: (left) Time vs. Clustering Coefficient, Effective Diameter, Full Diameter; (right) Time vs. Nodes, Edges, Bidirectional Edges, Closed Triads, Open Triads]

Fig. 6: The temporal properties of the call graph on the weekday and weekend time window. For all weekdays in a specific week, a call graph is created by aggregating all calls on those weekdays, and various properties of that call graph are analyzed. Since the first weekend contained only a Sunday with missing records, we removed the first data point.

[Figure 7 plots: (left) Time vs. Clustering Coefficient, Effective Diameter, Full Diameter; (right) Time vs. Nodes, Edges, Bidirectional Edges, Closed Triads, Open Triads]

Fig. 7: The temporal properties of the call graph on the cumulative week time window. For each week, all calls initiated from week 0 up to that week are aggregated into a graph. This graph shows the saturation of calls, implying that people call the same group of people again and again.
