AGENT BASED PARALLELIZATION OF BIOLOGICAL NETWORK MOTIF
DETECTION
Saranya Duraisamy
Capstone Progress Report
Master of Science in Computer Science & Software Engineering
University of Washington, Bothell
March 18, 2020
Project Committee:
Dr. Munehiro Fukuda, Committee Chair
Dr. Wooyoung Kim, Committee Member
Dr. Clark Olson, Committee Member
TABLE OF CONTENTS
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Related works
2.1 Sequential Network Motif Detection
2.2 Parallel Network Motif Detection
2.3 MASS-based Parallel Network Motif Detection
Chapter 3. Methods
3.1 Agent-Based Network Motif Detection
3.2 System Flow
3.3 Performance Improvement
Chapter 4. Results
4.1 Execution Environment
4.2 Performance Analysis
Chapter 5. Conclusion & Future work
Bibliography
LIST OF FIGURES
Figure 3.1. Network Motif Detection Process.
Figure 3.2. Maximum Motifs for Graphs up to 10 vertices[1].
Figure 3.3. Subgraph Enumeration Algorithm[7].
Figure 3.4. Subgraph Enumeration Tree[7].
Figure 3.5. System Flow for Agent-Based Network Motif Detection.
Figure 3.6. Graph Ordering Visualization.
Figure 3.7. Execution screenshot of MASS Network Motif Detection.
Figure 4.1. Dolphin Network
Figure 4.2. Comparison of MASS Motif Synchronous vs MASS Network Motif
Figure 4.3. MASS Parallel Performance Analysis
Figure 4.4. MASS Performance Tuning Evaluation
Figure 4.5. Parallel I/O Graph
Figure 4.6. Sequential vs Parallel Performance
LIST OF TABLES
Table 3.1. Graph vertices rearrangement based on vertex degree
Table 4.1. Real Network Datasets
Table 4.2. Input graph size for different formats
Chapter 1. INTRODUCTION
‘Network motifs’ are defined as the recurrent and statistically significant patterns, or subgraphs,
in biological networks [1]. A k-sized network motif is an induced subgraph on k vertices that occurs
significantly more often in the target network than in comparable random networks. Network motif
detection and analysis have led to the discovery of previously unidentified biological interactions.
The detection process involves computationally intensive subgraph enumeration, random graph
generation, NP-complete subgraph isomorphism testing, and statistical testing. In this work, MASS [2]
agent-based parallelization is applied to the computationally expensive subgraph enumeration step.
Unlike sequential motif detection tools, which are restricted to the resources of a single machine,
parallel motif detection can draw on the collective memory and compute power of a cluster, which
helps both in detecting larger motifs and in analyzing larger networks. The current MASS Network
Motif implementation [3] achieves up to a 30x speedup over the previous MASS Motif Synchronous
implementation [4], and its execution time drops by a factor of up to 2 when the number of
computing nodes used for the parallel execution is doubled from 4 to 8.
The rest of this report is organized as follows. Chapter 2 reviews the existing sequential and
parallel network motif detection tools. Chapter 3 explains the architecture of agent-based motif
detection. Chapter 4 presents the experimental results and a comparative analysis of the parallel
implementations. Finally, Chapter 5 concludes the report and outlines future work.
Chapter 2. RELATED WORKS
2.1 SEQUENTIAL NETWORK MOTIF DETECTION
M-Finder [5] performs an exhaustive, brute-force network motif search, which makes it slow and
memory-intensive. Owing to this computational cost, M-Finder can only detect motifs up to size 6.
Fast Network Motif Detection (FANMOD) [6] employs the more efficient Enumerate Subgraph (ESU)
algorithm [7], which breaks symmetry using vertex identifiers. Unlike M-Finder, ESU enumerates each
subgraph only once and is therefore faster. FANMOD can detect motifs up to size 8 in both
undirected and directed networks. Network Motif (NemoLib) Java [8] is a general-purpose library
used to find motif frequency, motif concentration,
and motif-to-instance mapping. Like FANMOD, NemoLib [9] uses the ESU algorithm to enumerate
subgraphs, and it relies on nauty's labelg [10] to detect graph isomorphisms. The motif size limits
of the sequential tools [5], [6] motivate the development of parallel tools that can discover
larger motifs and cope with the detection cost in large graphs.
2.2 PARALLEL NETWORK MOTIF DETECTION
The MPI-based parallel motif detection proposed by Wang et al. [11] partitions the network and
broadcasts it to workers, which then detect potential motifs in parallel. The master process
gathers the results from all workers and deduces the actual motifs with an isomorphism check. This
approach was faster than the sequential version only up to motif size 4. Parallel ESU [12]
parallelized the recursive subgraph extension calls of the ESU algorithm and achieved linear
speedup for gene and metabolic networks. It did not achieve linear speedup for neural and
protein-protein interaction networks, however, because of the long time taken to combine the final
results. Iterative MapReduce ESU [13] achieved up to a 37x speedup over the sequential version.
2.3 MASS-BASED PARALLEL NETWORK MOTIF DETECTION
In MASS Motif Synchronous [14], Kipps et al. parallelized biological network motif enumeration in
three different ways: MASS agent-based, MASS places-based, and MPI-based enumeration. Their
experimental results showed the need for a MASS agent management feature to avoid the agent
explosion caused by the enumeration of 5.5 million agents. Another work [15], MASS NemoProfile
construction, extended the MASS places-based parallelization [14] to map individual vertices to
motifs and revealed the possibility of achieving better parallelism using MASS.
Chapter 3. METHODS
The network motif detection process involves four steps: subgraph enumeration and frequency
computation of non-isomorphic patterns in the input graph, random graph generation, subgraph
enumeration and frequency computation of non-isomorphic patterns in the random graphs, and
statistical testing to determine the significant network motifs, as depicted in Figure 3.1.
Figure 3.1. Network Motif Detection Process.
3.1 AGENT-BASED NETWORK MOTIF DETECTION
As evident from Figure 3.2, the number of subgraph patterns increases exponentially with the
number of graph vertices for both undirected and directed graphs. This drastic increase causes the
subgraph enumeration task to consume more time for large graphs or large motif sizes. The
agent-based network motif detection approach parallelizes this time-consuming subgraph enumeration
task in an intuitive way. Like the sequential tools [6], [9], this parallel approach employs the
ESU algorithm for the target graph and the Randomized-ESU algorithm for the random graphs to speed
up motif detection.
Figure 3.2. Maximum Motifs for Graphs up to 10 vertices[1].
Figure 3.3 shows the ESU algorithm, which uses vertex identifiers to generate each subgraph exactly
once. The algorithm computes all subgraphs recursively from each vertex, extending a subgraph only
with neighbors whose identifiers are higher than that of the vertex from which the enumeration
started.
Figure 3.3. Subgraph Enumeration Algorithm[7].
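The recursion described above can be illustrated with a short sequential sketch. This is not the MASS implementation: it is a plain Python rendering of ESU as described in [7], and the names `enumerate_subgraphs` and `extend` as well as the adjacency-set graph representation are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed representation)


def enumerate_subgraphs(graph: Graph, k: int) -> List[FrozenSet[int]]:
    """Return the vertex set of every connected k-vertex subgraph exactly once."""
    found: List[FrozenSet[int]] = []

    def extend(sub: Set[int], ext: Set[int], root: int) -> None:
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        ext = set(ext)  # work on a copy so the caller's extension set is untouched
        while ext:
            w = ext.pop()
            # Vertices already in the subgraph or adjacent to it.
            closed = sub | {n for v in sub for n in graph[v]}
            # Exclusive neighbours of w with an identifier above the root.
            new_ext = ext | {u for u in graph[w] if u > root and u not in closed}
            extend(sub | {w}, new_ext, root)

    for v in sorted(graph):
        extend({v}, {u for u in graph[v] if u > v}, v)
    return found


# A 4-cycle contains four connected 3-vertex subgraphs.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(len(enumerate_subgraphs(cycle4, 3)))
```

Because only neighbors with identifiers above the root are admitted, the enumerations rooted at different vertices never produce the same vertex set, which is what makes them independent units of work.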
In the MASS implementation, each graph vertex is mapped to a MASS place using its zero-indexed
vertex identifier. Because the enumeration operations rooted at different vertices are independent
of each other, as seen in Figure 3.4, agent-based subgraph enumeration from each vertex (place)
can be parallelized by migrating mobile agents to the neighbor vertices.
Figure 3.4. Subgraph Enumeration Tree[7].
3.2 SYSTEM FLOW
Figure 3.5 shows the system flow used to detect network motifs of a given motif size in an
undirected target network. The system comprises six distinct modules: a graph parser, an optional
graph ordering module, a target graph analyzer, a random graph generator, a random graph analyzer,
and a statistical analyzer.
Figure 3.5. System Flow for Agent-Based Network Motif Detection.
3.2.1 Graph Parser and Graph Ordering Module
The graph parser reads the input graph in edge list format and constructs an adjacency list
representation that initializes every vertex with its neighbors. The graph parser invokes a graph
ordering module that can be enabled or disabled at runtime. If enabled, the module rearranges the
graph vertices in increasing order of vertex degree (number of neighbors) and then splits the
ordered vertices evenly among the computing machines used in the current execution. Figure 3.6
illustrates the original input graph and the corresponding reordered graph for four computing
machines.
(a) Original Graph
(b) Reordered Graph
Figure 3.6. Graph Ordering Visualization.
Table 3.1 captures the mapping of graph vertices to cluster machines in the original graph and the
reordered graph. In contrast to the original graph's non-uniform total degree distribution of
8-4-6-2 across the four machines, the reordered graph has a total degree distribution of 4-4-6-6.
Thus, the graph ordering module attempts to reduce load imbalance by reordering the vertices and
giving every computing machine vertices with an approximately equal total degree.
Table 3.1. Graph vertices rearrangement based on vertex degree
Machine     Original graph: vertex {neighbors}    Total degree    Reordered graph: vertex {neighbors}    Total degree
Machine 1   0 {2, 6, 4, 1}; 1 {0, 3, 7, 5}        8               0 {5}; 1 {4, 3, 5}                     4
Machine 2   2 {0, 4}; 3 {1, 5}                    4               2 {7}; 3 {7, 6, 1}                     4
Machine 3   4 {0, 2, 5}; 5 {4, 1, 3}              6               4 {1, 5}; 5 {0, 1, 7, 4}               6
Machine 4   6 {0}; 7 {1}                          2               6 {7, 3}; 7 {3, 2, 6, 5}               6
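The relabeling in Table 3.1 can be reproduced with a small sketch. It assumes one particular scheme that happens to match the table: vertices are sorted by increasing degree (ties broken by original identifier) and then dealt to the machines round-robin, with each machine owning a contiguous block of new identifiers. The function names are hypothetical, and the actual graph ordering module may work differently.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def reorder_by_degree(graph: Graph, machines: int) -> Graph:
    """Relabel vertices so each machine's block of identifiers has a similar total degree."""
    per_machine = len(graph) // machines  # sketch assumes an even split
    ordered = sorted(graph, key=lambda v: (len(graph[v]), v))
    new_id: Dict[int, int] = {}
    for i, v in enumerate(ordered):
        machine, slot = i % machines, i // machines  # round-robin deal (assumption)
        new_id[v] = machine * per_machine + slot
    return {new_id[v]: {new_id[u] for u in graph[v]} for v in graph}


def total_degree_per_machine(graph: Graph, machines: int) -> List[int]:
    """Sum the vertex degrees held by each machine's contiguous identifier block."""
    per_machine = len(graph) // machines
    totals = [0] * machines
    for v, neighbours in graph.items():
        totals[v // per_machine] += len(neighbours)
    return totals


# Original graph from Table 3.1.
original = {
    0: {2, 6, 4, 1}, 1: {0, 3, 7, 5}, 2: {0, 4}, 3: {1, 5},
    4: {0, 2, 5}, 5: {4, 1, 3}, 6: {0}, 7: {1},
}
reordered = reorder_by_degree(original, 4)
print(total_degree_per_machine(original, 4))   # [8, 4, 6, 2]
print(total_degree_per_machine(reordered, 4))  # [4, 4, 6, 6]
```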
3.2.2 Target Graph Analyzer
The target graph analyzer performs a full enumeration of the input graph to identify all candidate
motifs of the given motif size. It runs in parallel across all the computing machines used for the
execution. The target graph analyzer first instantiates all vertices (MASS Places) with the
neighbor information obtained from the graph parser. It then populates a crawler (MASS Agent) at
each place. These crawlers execute the ESU algorithm shown in Figure 3.3 simultaneously from all
the vertices. Each crawler spawns child crawlers to cover all valid neighbors and migrates itself
to one of those neighbors. A crawler terminates when no valid neighbor exists for its vertex. Once
a crawler has traversed a motif-sized subgraph, it deposits the subgraph structure in the compact
graph6 representation. After all crawlers have terminated, the target graph analyzer gathers the
deposited motif-sized subgraphs from all places. Finally, isomorphic occurrences are grouped
together by passing the graph6 representations to the labelg program, and the resulting canonical
labels of the candidate motifs are saved along with their frequencies.
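To make the graph6 step concrete, the following sketch encodes a small undirected subgraph in graph6. It follows the published graph6 layout for graphs with at most 62 vertices; the function name `to_graph6` is hypothetical and this is not the encoder used in the MASS code.

```python
from typing import Iterable, Sequence, Tuple


def to_graph6(vertices: Sequence[int], edges: Iterable[Tuple[int, int]]) -> str:
    """Encode an undirected simple graph on the given vertices as a graph6 string."""
    n = len(vertices)
    if n > 62:
        raise ValueError("this sketch only handles graphs with at most 62 vertices")
    index = {v: i for i, v in enumerate(vertices)}  # position of each vertex in the subgraph
    adjacent = {frozenset((index[a], index[b])) for a, b in edges}
    # Upper triangle of the adjacency matrix, column by column: (0,1), (0,2), (1,2), (0,3), ...
    bits = [1 if frozenset((i, j)) in adjacent else 0 for j in range(1, n) for i in range(j)]
    bits += [0] * (-len(bits) % 6)  # pad with zeros to a multiple of six bits
    chars = [chr(n + 63)]           # first character encodes the vertex count
    for start in range(0, len(bits), 6):
        value = 0
        for bit in bits[start:start + 6]:
            value = (value << 1) | bit
        chars.append(chr(value + 63))
    return "".join(chars)


print(to_graph6([0, 1, 2], [(0, 1), (1, 2), (0, 2)]))  # triangle: Bw
print(to_graph6([0, 1, 2], [(0, 1), (1, 2)]))          # path centred on vertex 1: Bg
print(to_graph6([0, 1, 2], [(0, 1), (0, 2)]))          # path centred on vertex 0: Bo
```

The last two graphs are isomorphic but receive different graph6 strings, which is why the deposited strings still have to be passed through a canonical labeling tool such as labelg before frequencies are counted.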
3.2.3 Random Graph Generator
Random graphs are generated from the input graph by preserving the degree distribution of its
vertices. This work generates degree-preserving random graphs using the configuration model
described in [16]. The random graph generator fetches the degree distribution sequence from the
graph parser. This sequence is a list of vertex identifiers in which each vertex identifier is
repeated as many times as its degree. The random graph generator shuffles this list and repeatedly
picks a random pair of vertices as an edge of the random graph. Because self-loops and parallel
edges can arise in this process, the generated random graph may be connected or disconnected, and
its vertices may end up with lower degrees than expected.
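A minimal sketch of this configuration-model step is shown below. It assumes that pairing consecutive entries of the shuffled list is an acceptable way to "pick random pairs", and that self-loops and parallel edges are simply dropped; the function name and the `seed` parameter are illustrative and not taken from the implementation.

```python
import random
from typing import Dict, List, Set, Tuple


def random_graph_from_degrees(degrees: Dict[int, int], seed: int = 0) -> Set[Tuple[int, int]]:
    """Build a random simple graph whose degrees approximate the given degree sequence."""
    rng = random.Random(seed)
    # Degree distribution sequence: each vertex identifier repeated `degree` times.
    stubs: List[int] = [v for v, d in degrees.items() for _ in range(d)]
    rng.shuffle(stubs)
    edges: Set[Tuple[int, int]] = set()
    for i in range(0, len(stubs) - 1, 2):
        a, b = stubs[i], stubs[i + 1]
        if a == b:
            continue  # self-loop: dropped, so this vertex loses two degree units
        edges.add((min(a, b), max(a, b)))  # the set silently drops parallel edges
    return edges


# Degree sequence of the original graph in Table 3.1.
degrees = {0: 4, 1: 4, 2: 2, 3: 2, 4: 3, 5: 3, 6: 1, 7: 1}
edges = random_graph_from_degrees(degrees, seed=42)
realized = {v: sum(v in e for e in edges) for v in degrees}
print(len(edges), realized)
```

Comparing `realized` with `degrees` shows directly how dropped self-loops and parallel edges lower individual vertex degrees.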
3.2.4 Random Graph Analyzer
The random graph analyzer employs the RAND-ESU algorithm to perform an approximate enumeration
based on the input sampling probabilities. Instead of traversing all neighbors, crawlers traverse
only a sampled subset of the neighbors at each level of the ESU tree. RAND-ESU thereby reduces the
time taken to compute the frequency of candidate motifs in a large number of random graphs.
Like the target graph analyzer, the random graph analyzer executes simultaneously on all the
computing machines used for the execution.
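The sampling idea can be sketched by adding one probability per tree level to a sequential ESU. The code below is an illustration based on the description of RAND-ESU in [7], not the crawler logic of the MASS implementation; `rand_esu`, `probs`, and `seed` are hypothetical names.

```python
import random
from typing import Dict, FrozenSet, List, Sequence, Set

Graph = Dict[int, Set[int]]


def rand_esu(graph: Graph, k: int, probs: Sequence[float], seed: int = 0) -> List[FrozenSet[int]]:
    """Sample connected k-vertex subgraphs; probs[d] is the chance of descending to tree level d."""
    if len(probs) != k:
        raise ValueError("one sampling probability per tree level is required")
    rng = random.Random(seed)
    found: List[FrozenSet[int]] = []

    def extend(sub: Set[int], ext: Set[int], root: int) -> None:
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            if rng.random() >= probs[len(sub)]:
                continue  # skip this branch of the ESU tree
            closed = sub | {n for v in sub for n in graph[v]}
            new_ext = ext | {u for u in graph[w] if u > root and u not in closed}
            extend(sub | {w}, new_ext, root)

    for v in sorted(graph):
        if rng.random() < probs[0]:  # level-0 decision: start from this root or not
            extend({v}, {u for u in graph[v] if u > v}, v)
    return found


cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(len(rand_esu(cycle4, 3, [1.0, 1.0, 1.0])))  # all probabilities 1: full enumeration
print(len(rand_esu(cycle4, 3, [1.0, 0.5, 0.5], seed=1)))
```

With all probabilities set to 1 the sketch degenerates to full ESU. Otherwise each subgraph is reached with probability equal to the product of the level probabilities, so dividing the sampled count by that product gives a frequency estimate.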
3.2.5 Statistical Analyzer
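The statistical analyzer performs the final computation to determine the significance of the
candidate motifs. For a motif m, let F_G(m) be its frequency in the original graph and F_R(m) its
frequency in a random graph. The Z-score is the ratio of the difference between the original
frequency and the mean random frequency to the standard deviation of the random frequencies. The
Z-score is undefined when the standard deviation is zero.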
\[ Z(m) = \frac{F_G(m) - \mathrm{Mean}(F_R(m))}{\mathrm{StandardDeviation}(F_R(m))} \]
The p-value is the number of random networks in which the motif occurred at least as often as in
the original network, divided by the number of random networks N. Hence, the p-value lies between
0 and 1 inclusive. The smaller the p-value, the more significant the network motif.
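\[ p(m) = \frac{1}{N} \sum_{n=1}^{N} c(n), \quad \text{where } c(n) = \begin{cases} 1, & \text{if } F_R(m) \geq F_G(m) \text{ in the } n\text{-th random network} \\ 0, & \text{otherwise} \end{cases} \]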
Generally, subgraph patterns with Z(m) > 2 and p(m) < 0.01 are considered statistically
significant, and patterns with values in this range are recognized as network motifs [1]. The
statistical analyzer computes the Z-score and p-value for all the candidate motifs using the above
relations. It executes sequentially on the master computing machine and displays the result to the
user, as seen in Figure 3.7.
Figure 3.7. Execution screenshot of MASS Network Motif Detection.
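The two statistics can be computed in a few lines. The sketch below assumes the population standard deviation and the "at least as often" convention for the p-value; the function name `motif_significance` and the example frequencies are made up for illustration and are not results from this work.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence, Tuple


def motif_significance(f_target: int, f_random: Sequence[int]) -> Tuple[Optional[float], float]:
    """Return (Z-score, p-value) for one candidate motif.

    f_target is the motif frequency in the original graph and f_random holds
    its frequency in each random graph. The Z-score is None when the random
    frequencies have zero standard deviation.
    """
    mu = mean(f_random)
    sigma = pstdev(f_random)  # population standard deviation (assumption)
    z = None if sigma == 0 else (f_target - mu) / sigma
    p = sum(1 for f in f_random if f >= f_target) / len(f_random)
    return z, p


# Hypothetical frequencies: mean 5 and standard deviation 2 over eight random graphs.
z, p = motif_significance(10, [2, 4, 4, 4, 5, 5, 7, 9])
print(z, p)                      # 2.5 0.0
print(z is not None and z > 2 and p < 0.01)
```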
3.3 PERFORMANCE IMPROVEMENT
The initial MASS implementation suffered from excessive heap memory usage due to the creation of a
large number of non-primitive Java objects. As a result, it caused 'out of memory' errors for a
large motif size (9) in the smaller graph (dolphin network) and for a small motif size (3) in the
larger graph (DIP dataset). The following performance improvements were incorporated to reduce
memory usage and improve execution speed.
• Reduced HashMaps with String Keys. The initial version maintained the data for each motif in
eight separate hash maps, each keyed by the motif’s canonical label string. These hash maps
were redesigned into a single ‘Motif’ class to reduce the memory occupied by the same
canonical label string objects being stored repeatedly in multiple maps.
• MASS Asynchronous Agent Migration. The agents’ callAll followed by manageAll was replaced
with doAll for ‘motif size’ iterations, to reduce the time incurred by returning control to
the driver program between successive function calls.
• Changed Non-primitive Java Objects to Primitive Types, to reduce memory usage and avoid the
autoboxing and unboxing performed during conversions between primitive and non-primitive
types.
• Replaced Built-in Java Collections with FastUtil’s Primitive Collections. Built-in Java
collections such as HashMap, HashSet, and ArrayList consume a large amount of memory as the
collection size grows and are tightly coupled to their internal data structure. To reduce
memory and to benefit from different internal data structures such as array, AVL tree,
RB tree, open hash, and custom hash, FastUtil [21] primitive collections are used. In the
MASS implementation, the number of agents increases exponentially for large motif sizes and
large graphs. With primitive collections, each agent carries much less data than before.
• Moved Agents’ Data to Places. The input motif size and sampling probabilities were initially
stored in the agents. With the creation of millions of agents, this input data consumed a
large amount of memory. The input data is now stored in all places, and agents fetch it from
the place upon arrival, thereby reducing the memory used during agent expansion.
These tuning steps reduced the execution time significantly, as explained in section 4.2.3. They
also enabled the detection of small motif sizes in large graphs as well as large motif sizes in
small graphs, both of which were previously infeasible.
Chapter 4. RESULTS
4.1 EXECUTION ENVIRONMENT
Experiments were conducted on a cluster of 8 computing nodes made available by the University of
Washington Bothell. Of the 8 computing nodes, 4 have an 8-core 2.33GHz CPU (Intel Xeon E5410) with
16GB memory, and the remaining 4 have a 4-core 2.66GHz CPU (Intel Xeon 5150) with 16GB memory. The
latest stable versions of the software libraries were used in this work: MASS Java [2] core version
1.2.1, NemoLib Java [9] version 2, and Nauty [10] version 2.6 (r12). All experiments were executed
with a 4GB initial heap and a 12GB maximum heap.
4.1.1 Input Datasets
Table 4.1 lists the three undirected real datasets used in the experiments. The downloaded input
datasets come in different graph formats: Graph Modeling Language (GML), Pajek, and Edge-List. A
Python script converts these formats to the Edge-List format expected by the current
implementation. The script uses the open-source Python library NetworkX [17] for format conversion
and graph visualization. Figure 4.1 illustrates the dolphin and power network datasets visualized
using the Python script.
Table 4.1. Real Network Datasets
Real Datasets    Vertices    Edges     Highest Degree    Connected Components
Dolphin [18]     62          159       12                1
Power [19]       4,941       6,594     19                1
DIP 2016 [20]    27,876      76,108    289               2,385
DIP Modified     26,695      73,085    289               1,204
Figure 4.1. Dolphin Network
As seen from Table 4.1 and Figure 4.1, the dolphin and power datasets are connected networks, each
forming a single connected component, while the DIP dataset is a disconnected network with
multiple connected components.
4.1.2 Input Graph Preprocessing
The current MASS implementation supports only 0-indexed integer identifiers for graph vertices, to
reduce memory usage and to simplify the mapping of vertices to MASS places. The dolphin network
[18] has string-based vertex identifiers, and the DIP 2016 network [20] has 3019 self-loops and 4
parallel edges. A utility program was developed in Java to remove self-loops and parallel edges
from input graphs and to construct a graph in which string-based or non-zero-indexed vertex
identifiers are mapped to zero-indexed vertex identifiers. Table 4.1 lists the original and
modified DIP datasets, the latter with 1181 fewer vertices and 3023 fewer edges. This preprocessing
avoids allocating memory for vertices with no valid neighbor and avoids the time spent traversing
self-loops or parallel edges. All experiments were conducted using the input graphs generated by
the utility program.
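The cleaning rules described above can be sketched as follows. The actual utility is a Java program; this Python version, the function name `clean_edge_list`, and the sample labels are assumptions used only to show the behaviour.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def clean_edge_list(
    raw_edges: Iterable[Tuple[Hashable, Hashable]]
) -> Tuple[List[Tuple[int, int]], Dict[Hashable, int]]:
    """Drop self-loops and parallel edges and map vertex labels to 0-indexed integers."""
    ids: Dict[Hashable, int] = {}
    seen = set()
    cleaned: List[Tuple[int, int]] = []
    for a, b in raw_edges:
        if a == b:
            continue  # self-loop; a vertex that only has self-loops never receives an id
        ia = ids.setdefault(a, len(ids))
        ib = ids.setdefault(b, len(ids))
        key = (min(ia, ib), max(ia, ib))
        if key in seen:
            continue  # parallel edge (same endpoints, either direction)
        seen.add(key)
        cleaned.append(key)
    return cleaned, ids


# Hypothetical input with string labels, one parallel edge, and one self-loop.
raw = [("P10", "P20"), ("P20", "P10"), ("P99", "P99"), ("P20", "P30")]
edges, ids = clean_edge_list(raw)
print(edges)  # [(0, 1), (1, 2)]
print(ids)    # {'P10': 0, 'P20': 1, 'P30': 2}
```

Vertices that appear only in self-loops disappear from the output, which is consistent with the modified DIP dataset having fewer vertices as well as fewer edges.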
4.2 PERFORMANCE ANALYSIS
4.2.1 Comparison with Kipps et al. MASS Motif Synchronous Implementation
The MASS Network Motif [3] and MASS Motif Synchronous [4] implementations were tested on 8
computing nodes with 1 thread per node for the dolphin and power graphs. Both implementations took
similar execution time for small motif sizes up to 5. For larger motif sizes, the current
implementation achieved a maximum speedup of 3.5-4x for the dolphin network (motif size 8) and the
power network (motif sizes 6 and 7), as seen in Figure 4.2.
Figure 4.2. Comparison of MASS Motif Synchronous vs MASS Network Motif
4.2.2 MASS Parallel Performance Analysis
To evaluate MASS parallel performance, the current implementation [3] was tested with 4 and 8
computing nodes. As evident from Figure 4.3, moving from 4 to 8 nodes decreased the execution time
by a factor of 1.7-2 for large motif sizes. The parallel performance thus improved as more
computing nodes were used for the parallel execution.
Figure 4.3. MASS Parallel Performance Analysis
4.2.3 MASS Performance Tuning Evaluation
To assess the performance improvements described in section 3.3, experiments were conducted with
the pre-tuned and fine-tuned versions. As shown in Figure 4.4, fine-tuning achieved a 7x speedup
on the dolphin data (for motif size 8) and a 13x speedup on the power data (for motif size 7).
Figure 4.4. MASS Performance Tuning Evaluation
4.2.4 MASS Parallel I/O Evaluation
The MASS Parallel I/O feature expects each line in the input graph file to have the same alignment
so that the file can be partitioned and read in parallel by the computing nodes. To meet this
alignment constraint, the neighbor list of every vertex needs to be filled with -1 entries and
spaces up to the maximum number of neighbors, as seen in Figure 4.5, which creates a huge input
file. As Table 4.2 shows, MASS Parallel I/O increases the input file size considerably for large
graphs. MASS Parallel I/O offers great parallelization potential for complete graphs, in which
every vertex is connected to all other vertices. However, it is not well suited to biological
networks, which typically have a few vertices with high degree and many vertices with low degree.
Hence, sequential I/O was preferred over parallel I/O in the current MASS implementation.
Table 4.2. Input graph size for different formats
Dataset    Edge List File Size    Parallel I/O File Size
Dolphin    1 KB                   8 KB
Power      62 KB                  923 KB
DIP        761 KB                 75,370 KB
Figure 4.5. Parallel I/O Graph
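The effect of the alignment constraint can be seen in a small sketch that writes one fixed-width record per vertex. The exact file layout expected by MASS Parallel I/O is not specified here, so the record format below (vertex identifier followed by its neighbors, padded with -1 and right-aligned) is an assumption, as is the function name.

```python
from typing import Dict, List, Set


def fixed_width_lines(graph: Dict[int, Set[int]], pad: int = -1) -> List[str]:
    """Format one equally long line per vertex: identifier, neighbours, then padding."""
    max_degree = max(len(neighbours) for neighbours in graph.values())
    # Every field must be wide enough for the largest identifier and for the pad value.
    width = max(max(len(str(v)) for v in graph), len(str(pad)))
    lines = []
    for v in sorted(graph):
        fields = [v] + sorted(graph[v]) + [pad] * (max_degree - len(graph[v]))
        lines.append(" ".join(str(field).rjust(width) for field in fields))
    return lines


# Original graph from Table 3.1: maximum degree 4, so every record has 5 fields.
graph = {
    0: {2, 6, 4, 1}, 1: {0, 3, 7, 5}, 2: {0, 4}, 3: {1, 5},
    4: {0, 2, 5}, 5: {4, 1, 3}, 6: {0}, 7: {1},
}
for line in fixed_width_lines(graph):
    print(line)
```

Every record costs as much space as the highest-degree vertex, so a single hub vertex inflates the records of all low-degree vertices. This is the growth pattern visible in Table 4.2.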
4.2.5 MASS Agent Population Control Evaluation
The MASS agent population control feature enabled large motif size detection (motif size 9 in the
dolphin network, motif size 8 in the power network, and motif size 4 in DIP) with up to 14.9
million agents, by serializing the agents that exceed the specified maximum population limit.
However, with a further increase in motif size (motif size 5 for the DIP dataset enumerates 5.1
billion agents), agent population control consumes more heap space to store the serialized
inactive agent objects and slows down the execution. In one observed run, the heap usage of one
cluster machine reached 11.8 GB (out of the 12 GB maximum heap), and full garbage collection was
triggered 91 times, taking 1116 seconds (about 81%) of the overall execution time of 1380 seconds.
When the serialized agent objects grow to the maximum heap size, the processor spends most of its
time in garbage collection rather than in useful computation. Thus, the current MASS
implementation is limited by the maximum heap available on the cluster machines.
4.2.6 Overall Performance Comparison
Figure 4.6. Sequential vs Parallel Performance
Figure 4.6 compares the performance of the sequential tools against the parallel 8-node MASS
execution for the power network. Although the current MASS implementation [3] gained a 30x speedup
over the previous MASS implementation [4], it still lags behind the sequential FANMOD [6] and
NemoLib [9]. Due to the memory limitation faced by the MASS version, very large graphs (exceeding
the memory of a single machine), which are infeasible to analyze with sequential tools, could not
be tested with MASS either.
Chapter 5. CONCLUSION & FUTURE WORK
Agent-based network motif detection extended detection up to motif size 9, which is impractical
with the sequential tools [5], [6]. The MASS Network Motif version achieved a maximum speedup of
30x over the Motif Synchronous version by using MASS asynchronous agent migration and the agent
population control feature, eliminating object creation at the application level, using primitive
types and primitive-type-specific collections, and carrying minimal data within each agent.
However, very large graphs could not be tested on the existing cluster due to the heap size
limitation faced by the MASS implementation. Future work involves Spark-based parallel network
motif detection, in order to evaluate the parallelization intuitiveness, ease of programming, and
fitness of MASS for parallelizing graph problems. It also includes possible performance
improvements in the MASS implementation so that large graph sizes and large motif sizes can be
tested.
BIBLIOGRAPHY
[1] Junker, B. H., & Schreiber, F. (2011). Analysis of biological networks (Vol. 2). John Wiley & Sons.
[2] Fukuda, M. (2010). Mass: Parallel-computing library for multi-agent spatial simulation. Distributed
Systems Laboratory, Computing & Software Systems, University of Washington Bothell, Bothell, WA.
[3] https://bitbucket.org/mass_application_developers/mass_java_appl/src/1ce403d59c772efac7ac41bc292e735301126114/?at=feature%2FNetworkMotif. MASS Network Motif Implementation.
[4] https://bitbucket.org/mass_application_developers/mass_java_appl/src/master/Applications/MotifSynchronous. MASS Motif Synchronous Implementation.
[5] Mfinder, https://www.weizmann.ac.il/mcb/UriAlon/download/network-motif-software
[6] Wernicke, Sebastian, and Florian Rasche. "FANMOD: a tool for fast network motif detection."
Bioinformatics 22, no. 9 (2006): 1152-1153.
[7] Wernicke, Sebastian, Efficient detection of network motifs, IEEE/ACM Trans. Comput. Biol.
Bioinformatics, vol. 3, no. 4, pp. 347-359, 2006.
[8] Andersen, Andrew, and Wooyoung Kim. NemoLib: A Java Library for Efficient Network Motif
Detection. In International Symposium on Bioinformatics Research and Applications, pp. 403-407.
Springer, Cham, 2017.
[9] NemoLib Java version 2, https://github.com/Kimw6/NemoLib-Java-V2
[10] McKay, B. D., & Piperno, A. (2014). Practical Graph Isomorphism, II. Journal of Symbolic
Computation, 60, 94-112.
[11] Wang T, Touchman JW, Zhang W, Suh EB, Xue G (2005) A parallel algorithm for extracting
transcription regulatory network motifs. In Proceedings of the IEEE international symposium on
bioinformatics and bioengineering, IEEE Computer Society Press, Los Alamitos, CA, USA, pp 193–200.
[12] Ribeiro, P., Silva, F., & Lopes, L. (2010, January). A parallel algorithm for counting subgraphs in
complex networks. In International Joint Conference on Biomedical Engineering Systems and
Technologies (pp. 380-393). Springer, Berlin, Heidelberg.
[13] Verma, Vartika, Paul Park Kwon, Anand Joglekar, and Wooyoung Kim. Network motif analysis in
clouds-subgraph enumeration with iterative hadoop mapreduce. vol 4, 28-40.
[14] Matthew Kipps, Wooyoung Kim, and Munehiro Fukuda. Agent and Spatial Based Parallelization of
Biological Network Motif Search. In Proc. 17th IEEE International Conference on High Performance
Computing and Communications - HPCC 2015, pages 786–791, New York, NY, August 2015.
[15] Andrew Andersen, Wooyoung Kim, and Munehiro Fukuda. Mass-based nemoprofile construction for
an efficient network motif search. In IEEE International Conference on Big Data and Cloud
Computing in Bioinformatics - BDCloud 2016, pages 601–606, Atlanta, GA, October 2016.
[16] Newman, M. E. (2003). The structure and function of complex networks. SIAM review, 45(2), 167-
256.
[17] Hagberg, A., Swart, P., & Schult, D. Exploring Network Structure, Dynamics, and Function using
NetworkX. In Proceedings of the 7th Python in Science Conference (SciPy2008), Gäel Varoquaux,
Travis Vaught, and Jarrod Millman (Eds.), (Pasadena, CA USA), pp. 11–15.
[18] Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., Slooten, E., & Dawson, S. M. (2003). The
bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting
associations. Behavioral Ecology and Sociobiology, 54(4), 396-405.
[19] Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature,
393(6684), 440-442.
[20] Xenarios, I., Salwinski, L., Duan, X. J., Higney, P., Kim, S. M., & Eisenberg, D. (2002). DIP, the
Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions.
Nucleic Acids Research, 30(1), 303-305.
[21] Fastutil: Fast & compact type-specific collections for Java, http://fastutil.di.unimi.it/