
Design and Implementation of a Network Search Node

THANAKORN SUEVERACHAI

Master’s Degree Project
Stockholm, Sweden October 2013

XR-EE-LCN 2013:015


Design and Implementation of a Network Search Node

THANAKORN SUEVERACHAI

Stockholm October 2013

Supervisor: Abu Hamed Mohammad Misbah Uddin
Examiner: Prof. Rolf Stadler

Laboratory of Communication Networks
School of Electrical Engineering

KTH Royal Institute of Technology

XR-EE-LCN 2013:015


Abstract

Networked systems, such as cloud infrastructures, are growing in size and complexity. They hold and generate a vast amount of configuration and operational data, which is maintained in various locations and formats, and changes at various time scales. A wide range of protocols and technologies is used to access this data for network management tasks. A concept called ‘network search’ is introduced to make all this data available in real-time through a search platform with a uniform interface, which enables location-independent access through search queries.

Network search requires a network of search nodes, where the nodes have identical capabilities and work cooperatively to process search queries in a peer-to-peer fashion. A search node should exhibit good performance in terms of low query response times, high throughput, and low overhead costs, and should scale to large networked systems with at least a hundred thousand nodes.

This thesis contributes in several aspects towards the design and implementation of a network search node. We designed a search node that includes three major components, namely, a real-time data sensing component, a real-time database, and a distributed query-processing component. The design takes indexing of search terms and concurrency of query processing into consideration, which accounts for fast response times and high throughput of search queries. We implemented a network search node as a software package that runs on a server that provides a cloud service, and we evaluated its performance on a cloud testbed of nine servers. The performance measurements suggest that a network search system based on our design can process queries at low latencies for a high query load, while maintaining a low overhead of computational resources.


Acknowledgements

First and foremost, I would like to thank my supervisor Misbah Uddin for his valuable guidance and advice. He greatly inspired and motivated me to contribute to the project. I would also like to thank him for his tremendous support in all aspects of the thesis. Additionally, I would like to take this opportunity to thank Prof. Rolf Stadler for offering this thesis and for his support rendered over the period of the project. Also, I would like to thank Rerngvit Yanggratoke for sharing his knowledge and for his general help. Last but not least, I would like to thank my family and friends, who have been my source of energy in pursuing this thesis.

Thanakorn Sueverachai

October 2013


Table of Contents

Table of Contents vii

List of Figures ix

1 Introduction 1

2 Background 5
2.1 Cloud Computing 5
2.2 Management and Monitoring of Clouds 6
2.3 NoSQL Databases 7
2.4 Echo: A Distributed Protocol for Network Management 8
2.5 Network Search 10
2.6 A Previous Network Search Prototype 14

3 Related Research 15
3.1 Weaver Query System 15
3.2 Sophia 16
3.3 Distributed Image Search in Camera Sensor Networks 16
3.4 Minerva ∞ 17

4 Design of a Search Node 19
4.1 An Architecture of a Search Node 19
4.2 A Design for Efficient Local Query Processing 20

5 Implementation of a Search Node 23
5.1 Implementation of the Sensing Module 24
5.1.1 Overview of the Sensing Module 24
5.2 The Local Databases Component 24
5.2.1 The Object Database 25
5.2.2 The Index Database 26
5.2.3 The Index Manager 27
5.3 Implementation of the Module for Distributed Query Processing 28
5.3.1 Software Components 28
5.3.2 Software Component Interactions 31
5.3.3 Concurrent Query Processing 33
5.4 Code Readability 33

6 Performance Evaluation of the Network Search Prototype 35
6.1 Testbed 35
6.2 Load 35
6.3 Setup 36
6.4 Metrics 36
6.5 Experiment Configuration 37
6.6 Results 37
6.6.1 Test 1: Global Query Latency 37
6.6.2 Test 2: Local Computational Overhead 37
6.6.3 Test 3: Effect on Concurrency 38
6.6.4 Test 4: Local Latencies 40
6.6.5 Test 5: Impact of Cluster Load on Global Latency 40
6.7 Estimating the Global Query Latency for a Large Datacenter 41
6.8 Discussion 42

7 Limitations of the Current Design 45
7.1 Future Works 46

8 Conclusions 47
8.1 Personal Experiences 47

Appendices 48

A A Complete Class Diagram of a Network Search Node 49

Bibliography 51


List of Figures

2.1 Sample object representations: (a) a document store representation, (b) a key-value store representation, and (c) a column store representation. 8
2.2 The echo protocol executing on a network graph [51] 9
2.3 A sample spanning tree created by the echo protocol [54] 9
2.4 Aggregator for processing a query q on a node with local database D [48] 10
2.5 The architecture for network search [53] 12
2.6 Sample network search objects: (a) an object that represents a virtual machine, (b) an object that represents an IP flow 12

4.1 An architecture of a search node [48] 20
4.2 An architecture for concurrent query processing in a search node 21

5.1 A sample MongoDB object representing a server in JSON 25
5.2 A sample MongoDB query 26
5.3 A sample search index 27
5.4 A class diagram of the echo protocol component 28
5.5 Sample Python objects used in the function local() when the network search query (a) is invoked: (b) a sample query object and (c) index entries for terms server and cloud-1 that belong to object id 5215d68ce2b. 32
5.6 A class diagram showing components and their relations in the query processing module 32

6.1 A topology of search nodes in the testbed 36
6.2 Global latencies for different query loads. Each measurement shows box plots with markers at the 25th, 50th, 75th, and 95th percentiles. Each search node runs two query processing threads. 38
6.3 Computational overhead of a search node for different query loads. Each search node runs two query processing threads. 38
6.4 The 50th percentile of global query latencies for different query loads. The curves show results for 1-4 concurrent query processing thread(s). 39
6.5 Computational overhead of a search node for different query loads and for 1-4 concurrent query processing thread(s). 39
6.6 Bar charts showing the average time spent on each phase of an operation in a search node with respect to query loads. Each search node runs two query processing threads. 40
6.7 Pie charts showing the percentage of time spent on each operation: (a) at a load of 100 queries/second and (b) at a load of 200 queries/second. Each search node runs two query processing threads. 41
6.8 The 50th percentile global latencies for different query loads when the cloud is underutilized and highly utilized. 41


Chapter 1

Introduction

1.1 Background

Over the last decade, server clusters have grown tremendously, increasingly built from many low-cost commodity machines rather than a few top-of-the-line machines. Recently, the paradigm of cloud computing has gained significant popularity. Cloud computing is a concept that allows computing resources to be virtualized via virtual machines that run on server clusters in a data center.

Networked systems, such as cloud infrastructures, often face challenges in management, since the systems are typically large, e.g., ranging from tens to hundreds of thousands of devices. They keep and produce a vast amount of configuration and operational data in configuration files, device counters, data caches, and data collection points. This data is often segmented in the sense that it is kept in various locations and formats, and it changes at various time scales. To access this data for network management, a wide range of protocols needs to be known, and the location of the data must be provided. The challenge becomes critical when real-time access to the data is required.

To address this problem, a concept called network search is introduced by Uddin et al. [53][54][48]. Network search provides a generalized search process that makes data in networks and networked systems available in real-time to applications for network management. Network search can be seen as a function analogous to web search, applied to operational data in a network. This data can be accessed by characterizing its content through simple terms, whereby the location or schema of the data does not need to be known. Network search thus provides uniform, content-based, and location-independent access. It comprises real-time databases inside the network that maintain network data, and functions that realize in-network search functionality.

1.2 Requirements

In this thesis, we focus on the design and implementation of a network search node. The (network) search node is the key component of the network search architecture [53]. A set of search nodes deployed in the network infrastructure works cooperatively to provide the functionalities of network search. Generally, each search node has identical capabilities, which include sensing data from devices, maintaining the data in a local real-time database, and performing data retrieval, matching, ranking, and aggregation to realize a distributed search. In particular, a network search node has the following requirements:

1. The design of a search node must include a real-time sensing functionality, i.e., a means to read configuration and operational data in real-time from associated network devices, which are subject to search. Since operational data often changes fast, the sensing must support fast access to the data.

2. The design must include a database functionality that stores and maintains configuration and operational network data in a real-time database using the information model for network search [54].

3. The design must include a querying functionality, whereby a query is a statement that characterizes information needed for network management tasks. A network search query is based on the query language provided in [54]. The query needs to be processed by the matching and ranking semantics for network search.

4. The query processing must exhibit fast response times, high throughput for search queries, and low overhead cost for computational resources.

5. The design must be scalable in the sense that a deployment of network search should exhibit the above properties for a system of at least 100,000 search nodes.

6. The above metrics cannot be jointly optimized. Therefore, their trade-offs need to be studied.

1.3 Approaches

Our approach to satisfying the above requirements makes use of the following methods and techniques:

1. To realize the real-time data sensing functionality, we place search nodes inside the network devices. The sensors in a search node make use of various proprietary protocols and command line interfaces and read data local to the associated device.

2. To implement the real-time database functionality, we look into lightweight and flexible database management systems. In particular, we focus on so-called ‘NoSQL’ database systems that can be used off-the-shelf to realize the functionality.

3. To implement the query language, we look into the query languages for keyword search and relational algebra. Our query processing function makes use of the matching and ranking semantics developed for network search, which are based on the extended boolean model for information retrieval [49].

4. To achieve fast response times for search queries, we apply search indexes for data in the real-time database. For high throughput, we apply a parallel processing paradigm to support concurrent query processing on multiple CPU cores. To achieve low overhead of computational resources, we look into lightweight databases.


5. To achieve scalability, we make use of wave algorithms, such as the echo protocol, which allow for distributed processing of search queries. Uddin et al. [54][48] have developed an aggregator for the echo protocol to process network search queries. We make use of this echo aggregator.

6. To study the trade-off of the above metrics, we evaluate a prototype implementation.
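Approach 4 above, the use of search indexes for fast matching, can be sketched as an inverted index that maps each search term (an attribute name or value) to the set of ids of objects containing it, so that matching a query reduces to set intersection instead of a scan over all objects. The code below is an illustration only, not the prototype's implementation; the object ids and the terms server and cloud-1 mirror the index example shown later in Chapter 5 (Figure 5.5).

```python
# Illustrative sketch of a search index: an inverted index mapping each
# term (attribute name or value) to the set of object ids containing it.

def build_index(objects):
    """objects: dict mapping object id -> dict of attribute-value pairs."""
    index = {}
    for oid, obj in objects.items():
        for attr, value in obj.items():
            # Both the attribute name and its value are indexed as terms.
            for term in (attr, str(value)):
                index.setdefault(term, set()).add(oid)
    return index

def match(index, terms):
    """Return the ids of objects that contain all given terms
    (a conjunctive match, computed as a set intersection)."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

objects = {
    "5215d68ce2b": {"type": "server", "name": "cloud-1", "cpu-cores": 8},
    "5215d68ce2c": {"type": "vm", "host": "cloud-1"},
}
index = build_index(objects)
print(match(index, ["server", "cloud-1"]))  # {'5215d68ce2b'}
```

With such an index, the cost of matching depends on the sizes of the posting sets for the query terms rather than on the total number of objects in the database, which is what makes indexed matching fast.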

1.4 Contributions of the Thesis

This thesis contributes in several aspects to the design and implementation of a network search node in terms of the requirements given in Section 1.2. We state the contributions as follows:

• We designed a network search node that includes three major components, namely, a real-time sensing component, a real-time database component, and an echo-based query processing component. The database component includes an efficient indexing module that allows fast query processing in terms of matching and ranking. We also provide an architecture for parallel query processing that enables concurrent query processing on multiple CPU cores. Our design aims at fast query response times, high throughput, and low CPU overhead.

• We implemented a functionally complete network search node. It runs as a software package on a server that supports an IaaS cloud platform.

• We evaluated our prototype implementation on a cloud testbed that consists of 9 high-performance servers that run an IaaS cloud. Our evaluations show that the prototype achieves a 95th percentile query latency below 100 milliseconds for a query load of up to 100 queries/second at 1.6% CPU overhead.

• We developed a numerical model for estimating the response times of search queries for a large system that includes at least 100,000 search nodes. Based on the analysis of our experimental results from the prototype, using the model and properties of the echo protocol, we can state that a prototype based on our design can achieve an expected query latency of below 100 milliseconds for a query load of 1 query/second.

1.5 Thesis outline

The remaining chapters of this thesis are organized as follows. First, we present a background of the topics that are relevant to the thesis in Chapter 2, followed by the related research in Chapter 3. We then discuss the design of a network search node in Chapter 4, followed by its implementation in Chapter 5. After that, we evaluate the performance of a network search node in Chapter 6. Thereafter, limitations and potential future work are discussed in Chapter 7. Lastly, we conclude the thesis in Chapter 8.


Chapter 2

Background

To design and implement a network search node, it is important to understand related concepts. In this chapter, we briefly describe the areas from which we draw inspiration and relevant concepts. We begin with a high-level summary of cloud computing, followed by a brief discussion of how one monitors and manages a cloud. Then, we discuss some database concepts and technologies, which are potential enabling technologies for this project. After that, we provide a brief description of a distributed protocol that we use to implement our distributed search plane. Finally, we provide background on the concept and framework of network search.

2.1 Cloud Computing

The term cloud computing has many interpretations, but it typically refers to the use of computing resources that are delivered as services to users through networks. It broadly describes computing concepts that involve a number of computational units and communication networks; for example, a set of servers working together via cable connections to provide a web service. According to Armbrust et al. [30], a computing cloud has new characteristics from a hardware perspective: the illusion of infinite computing resources available on demand, the elimination of an up-front commitment for a cloud user caused by an initial hardware investment, and the ability to pay per use on a short-term basis.

From a user perspective, it enables the use of services on-demand without any requirement to provision services, i.e., without a plan and a prerequisite deployment of software and hardware to be able to use the services. Since services are provided by a cloud provider, the user also does not require direct contact with the hardware and software to make use of the services. We call a collection of hardware and software that provides a service a cloud or a cloud infrastructure.

We can distinguish a computing cloud based on the level of abstraction it provides for services, namely, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). IaaS is the most basic cloud-service model. Providers of IaaS offer virtual machines resembling physical machines, and other resources, such as image libraries for virtual machines, storage, load balancers, virtual networks, etc. In addition, IaaS creates an abstraction between physical hardware and the operating system and hides the complexity of the underlying hardware components. A hypervisor, i.e., a virtual machine manager, is a key component that manages the virtual machines of the cloud users. It provides a virtual operating platform and manages an execution environment for the operating systems installed by the cloud users. Typically, any application that is developed for operating on a cloud needs a model for computation, a model for storage, and a model for communication. To deploy their applications, cloud users need to install an operating-system image and their application software on the cloud infrastructure. Some well-known IaaS providers are Amazon EC2 [2], HP Cloud [8], Rackspace [26], etc.

2.2 Management and Monitoring of Clouds

It is inevitable that a cloud management system is needed to operate and maintain a cloud. A cloud management system manages and aggregates heterogeneous nodes as if they were one component to realize a complete on-demand and elastic use of resources. In this project, we focus on an IaaS cloud, where a virtual machine instance is usually provided as the basic unit of service. We explore some well-known cloud management systems that operate on the infrastructure level, as follows.

VMware ESX [27] is a computer virtualization product operating at the infrastructure level, offered by VMware, Inc., a leader in cloud and virtualization software and services. VMware ESX is a bare-metal hypervisor, which can manage virtual servers directly on host-server hardware without an operating system (the hardware needs to satisfy some specific requirements). In addition, extra services can be added and integrated into VMware ESX to support, for example, automatic load balancing, efficient migration of virtual machines, fault tolerance, etc. Moreover, since VMware ESX is the leader in the server-virtualization market [34], software and hardware vendors offer a wide range of tools to integrate their products or services with VMware ESX. Nonetheless, it is a proprietary product, so a product deployment involves a license cost.

OpenStack [20], on the other hand, is a fast-growing, open-source software suite for cloud management. It provides a wide range of functionalities for building public or private clouds. It includes a series of interrelated projects that control pools of computing, storage, and networking resources throughout a datacenter. An advantage of being open source is that it aims to support standard generic hardware. In addition, it provides the ability to integrate with legacy systems and third-party technologies. Furthermore, OpenStack includes services such as an authenticating identity service, a graphical user interface service for administrators, and a disk and server image library service, to name a few. In general, OpenStack has three layers, i.e., a hypervisor layer, a platform virtualization manager layer, and an OpenStack layer. Firstly, the hypervisor layer has a hypervisor that creates and runs virtual machines. Depending on resource constraints and required technical specifications, OpenStack leaves the choice of hypervisor open. Well-known available hypervisors are KVM [12], XenServer [28], Hyper-V [15], etc. Secondly, the platform virtualization manager layer utilizes the libvirt package [13], which provides an application programming interface (API) to the hypervisor. libvirt is an open-source API that helps manage hypervisors and provides access to manage virtual machines, virtual networks, and storage. Most of the information related to the virtual machines can be gathered in this layer. Lastly, the OpenStack layer has software components that interact with hypervisors via libvirt. Additionally, it has further software components to support the aforementioned services. Additional information for cloud management can be found in this layer, for instance, a virtual disk type, an image of virtual machines, etc.

A monitoring system is crucial for managing a cloud. It is a key component for understanding the performance of a cloud. Since a cloud computing platform consists of computing units, typically servers, and networks, it needs to be monitored to get a holistic view of the system. There are a number of existing tools for monitoring the hardware and software pieces of a cloud; Nagios [18], Zabbix [29], and Munin [17] are such tools, to name a few. In addition, the OpenStack cloud management software suite provides basic metering and monitoring tools, namely, Ceilometer and Healthnmon [15]. They help monitor CPU/disk/network utilization data of the virtual machines in the cloud. Nonetheless, the cloud can be scaled up, say, to 100,000 computation and network units. This raises the question of whether these tools, which are centralized, can monitor a large cloud efficiently.

2.3 NoSQL Databases

The traditional Relational Database Management Systems (RDBMS), also known as Structured Query Language (SQL) databases, have been used extensively. An RDBMS traditionally relies on a vertical scaling technique [46], i.e., purchasing higher-performance servers as the database load increases. Additionally, it requires a database schema beforehand, which is a highly structured data model describing the organization and construction of data in a formal language. Thus, it favors predictable and structured data, and often requires expertise to design such a data model.

However, recently, non-relational distributed databases, known as “NoSQL” databases, have been gaining interest as an alternative model for database management, aimed at achieving horizontal scaling, higher availability, and flexibility in data models. They provide a mechanism to store and retrieve data that has a looser structure than that in an RDBMS. Nonetheless, some NoSQL database systems do allow a SQL-like query language to be used; thus, some authors refer to them as “Not only SQL” database systems. They are usually designed to expand and scale transparently and horizontally to take advantage of scaling out on commodity hardware. This makes them an inexpensive solution for large datasets. In contrast to RDBMS, NoSQL databases do not necessarily provide full ACID (atomicity, consistency, isolation, durability) [35] guarantees; however, eventual consistency is usually guaranteed. Eventual consistency is described as follows: “given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system.” [56]

There are several kinds of NoSQL databases, including key-value stores, column stores, and document stores. A key-value database stores data as a value with a specified key for lookup. It allows an application to store its data in a schema-less way. Examples of such databases are Amazon DynamoDB [1] and Riak [23]. A column-store database stores data by decomposing it into pairs of an identification (id) and one attribute value. Then, based on the attribute name, it stores each pair in a table with two columns (an id and an attribute value). Thus, there is one table per attribute name [36]. Examples of such databases are C-Store [5] and Vertica [9]. A document-store database stores data in the notion of a ‘document’, in which the data is encapsulated and encoded in some standard format. Examples of such databases are Apache CouchDB [3], Couchbase [6], and MongoDB [16].

In this thesis, we focus on the document store databases, which contain a set of documents.


Figure 2.1: Sample object representations: (a) a document store representation, (b) a key-value store representation, and (c) a column store representation.

A document contains a set of attribute-value pairs. These documents are not required to share the same structure, and they are encoded in a standard exchange format, such as XML [33], JSON [11], BSON [4], etc. Hence, interoperability is ensured. A simple example of a document is given in Figure 2.1a. The document represents a person named ‘Thanakorn’. A document store enables retrieval of a whole document, which may correspond to the complete information of a real-world object, in terms of its attribute names or values in a straightforward way. In contrast, it requires more effort to do the same thing with a key-value store or a column store. Figure 2.1b and Figure 2.1c show the representations of the person from Figure 2.1a in the form of key-value and column-store representations. In order to retrieve all information about the person named ‘Thanakorn’ using the key-value paradigm, one needs to retrieve each key-value pair about the person and reassemble them into a whole object. Similarly, a column store requires more operations for the same task.
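The contrast between the three representations can be sketched with plain Python data structures. This is an illustration only, not tied to any particular database product; the attribute names beyond the person's name, the id p1, and the id:attribute key format are invented for the example.

```python
# The same object about a person named 'Thanakorn' in the three
# NoSQL representations discussed above (illustrative structures).

# (a) Document store: one self-contained document holding all
#     attribute-value pairs, typically encoded as JSON/BSON.
document = {"_id": "p1", "name": "Thanakorn", "city": "Stockholm", "degree": "MSc"}

# (b) Key-value store: one entry per attribute; here the key combines
#     the object id and the attribute name, so reassembling the whole
#     object requires one lookup per attribute.
key_value = {
    "p1:name": "Thanakorn",
    "p1:city": "Stockholm",
    "p1:degree": "MSc",
}

# (c) Column store: one two-column table (id, value) per attribute name.
column_store = {
    "name":   [("p1", "Thanakorn")],
    "city":   [("p1", "Stockholm")],
    "degree": [("p1", "MSc")],
}

# Retrieving the whole object is a single read in (a), but requires
# combining several reads in (b) and (c):
reassembled = {attr: rows[0][1] for attr, rows in column_store.items()}
print(reassembled)
```

The reassembly step at the end makes the point of the paragraph above concrete: the document store returns the object in one operation, while the other two representations must recombine one entry per attribute.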

2.4 Echo : A Distributed Protocol for Network Management

Network search makes use of the echo protocol for distributed processing of search queries [51].

The echo protocol employs a tree-based distributed algorithm that executes on each node in the graph of the search plane (Figure 2.5). The algorithm provides an approach to performing a distributed synchronization function on a graph. It defines message types and states for an execution, and relies only on local information in the form of knowledge about a node's neighbors. The execution of the echo protocol can be seen as the expansion and contraction of a wave on the network graph, as follows: the execution can be started on any search node, which is referred to as the root node for that particular execution, once the query has been received. Then, an explorer message is disseminated to neighbors in an expansion phase, in which a spanning tree is created. A local operation is triggered after a node receives an explorer message. When the wave contracts, the results of these local operations are collected in echo messages and aggregated incrementally along the spanning tree towards the root node. The aggregated result of the global operation becomes available at the root node when the execution ends. The protocol execution on the graph is illustrated in Figure 2.2. Pseudocode for the echo protocol is presented in [51].
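The expansion and contraction described above can be sketched as a recursive simulation on a static graph. A real implementation is asynchronous and message-driven; the topology and the aggregate function below are illustrative assumptions:

```python
# Minimal synchronous sketch of the echo protocol: an explorer wave builds
# a spanning tree, and echoes carry locally aggregated results back toward
# the root. Node names and the aggregate are illustrative.
def echo(graph, node, query, local_op, merge, visited):
    visited.add(node)                       # node joins the spanning tree
    result = local_op(node, query)          # local operation on EXP arrival
    for neighbor in graph[node]:            # expansion phase: send EXP
        if neighbor not in visited:
            child = echo(graph, neighbor, query, local_op, merge, visited)
            result = merge(result, child)   # contraction: aggregate the ECHO
    return result                           # ECHO sent up to the parent

graph = {"n1": ["n2", "n3"], "n2": ["n1", "n4"],
         "n3": ["n1"], "n4": ["n2"]}        # illustrative topology

# Example global operation: count the nodes reached by the query.
total = echo(graph, "n1", query=None,
             local_op=lambda node, q: 1,
             merge=lambda a, b: a + b,
             visited=set())
assert total == 4
```

Any node can act as the root: starting the same execution from "n2" reaches the same four nodes, matching the property that an execution can be started on any search node.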


Figure 2.2: The echo protocol executing on a network graph [51]

Figure 2.3: A sample spanning tree created by the echo protocol [54]

Figure 2.3 shows a sample spanning tree created in the search plane by the echo protocol on nodes n1,...,n6, with n1 as the root node. Each node has a local database D, which contains the information objects sensed from the node that are subject to search, and a local state qr, which stores the local and (partial) aggregated result of a query q. The explorer (EXP) message contains a query q, while the echo (ECHO) message contains a partial aggregated result qr. Some of these messages are shown.

The definitions of the local operation, the aggregation operation of the query result, and the current local state of the query execution are modeled in an object of the echo protocol, called an aggregator object.

1: aggregator object processQuery()
2:   var: qr : dictionary;
3:   procedure local()
4:     qr := {};
5:     for each o ∈ M(q, D) do
6:       insert (name(o), o, R(q, o)) into qr;
7:     qr := top-k(qr);
8:   procedure aggregate(child-qr : dictionary)
9:     qr := top-k(merge(qr, child-qr));

Figure 2.4: Aggregator for processing a query q on a node with local database D [48]

Figure 2.4 contains partial pseudocode of the aggregator object for processing a query q. In line 2, the current local state qr is defined. The local operation is defined in lines 3-7, whereby objects are retrieved by the matching operation M of the query q against the local database D. Then, each object o, along with its name and the rank scores resulting from the ranking operation R, is inserted into the local state qr. Finally, a top-k function sorts the results in the local state qr based on their rank scores and truncates the rank-sorted results after k objects, in order to limit the size of the result set. The aggregate operation (lines 8-9), on the other hand, takes the results of a child node and merges them with the local state to produce partial aggregated results. It also applies the top-k function afterward.
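The pseudocode of Figure 2.4 can be transcribed into Python roughly as follows. The matching operation M, the ranking operation R, and the database layout are stand-ins; only the local/aggregate/top-k structure follows the figure:

```python
import heapq

K = 10  # result-set size limit (value chosen for illustration)

def top_k(qr):
    """Keep only the K highest-ranked entries of a result dictionary."""
    best = heapq.nlargest(K, qr.items(), key=lambda item: item[1][1])
    return dict(best)

class Aggregator:
    """Processes a query q against local database D (cf. Figure 2.4)."""
    def __init__(self, q, D, M, R):
        self.q, self.D, self.M, self.R = q, D, M, R
        self.qr = {}                       # local state: name -> (object, rank)

    def local(self):
        self.qr = {}
        for o in self.M(self.q, self.D):   # matching operation M
            self.qr[o["object name"]] = (o, self.R(self.q, o))  # ranking R
        self.qr = top_k(self.qr)

    def aggregate(self, child_qr):
        merged = {**self.qr, **child_qr}   # merge a child's partial result
        self.qr = top_k(merged)
```

During contraction, each ECHO message would carry a child's qr, which the parent feeds into aggregate before forwarding its own qr up the spanning tree.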

The performance characteristics of the echo protocol help in determining the performance of network search. Assuming upper bounds for communication delays between nodes and processing delays for local message processing, we can determine the performance characteristics of network search as follows [51]: (1) the execution time of a query increases linearly with the height of the spanning tree, which is created in the expansion phase of the execution; thus, it is bounded by the diameter of the graph. Moreover, it also increases linearly with the degree of the node. (2) The protocol overhead is evenly distributed over the network graph, as two messages are exchanged on each link during the execution. However, the message size depends on the specific aggregate function. (3) The number of messages that each search node needs to process is bounded by the degree of the network graph.

2.5 Network Search

The network search concept was first introduced by Uddin et al. [53]. Then, a query language for network search was developed and presented in [54]. Later, scalable matching and ranking functions for network search were described in [48].

Today, datacenters that host cloud services are very large, e.g., on the order of ten thousand commodity servers. It is inevitable that the management of the cloud becomes complex. Management of networked systems typically requires handling several network protocols for accessing information, such as SNMP, CLI, and NetFlow. Moreover, finding information is a tedious task, because one needs to know the exact location of the information, which often requires deep knowledge of and experience with the clusters. Additionally, knowledge of the schema of the data is required to fetch and interpret the information properly. Furthermore, because of the transient nature of the data, information becomes obsolete very fast, in many cases in a matter of seconds. To have useful information for management at hand, we need to retrieve information periodically and immediately. Although tackling all of this is trivial for a small-scale setup, it is obviously infeasible for a large-scale system. For large clouds, the paradigm of network search is introduced to deal with the retrieval of management information, which is primarily transient and whose location is potentially unknown.

The network search concept is defined as follows. Network search can be seen in three ways. First, it can be seen as a generalization of monitoring, where data is retrieved by its content in simple terms. Second, it can be seen as “googling the network” for operational information, in analogy to “googling the web”. Third, it can be seen as the capability to view the network as a giant database of configuration and operational information. It provides a unified interface for accessing information for network management tasks. Additionally, it is a means to explore management information, which allows for finding information inside the networked system without giving a location or knowing the detailed structure of the data.

2.5.1 Architecture

The architecture of network search is adapted from the architecture for peer-to-peer management presented in [51]. As shown in Figure 2.5, it conceptually has three layers, i.e., the management plane, the search plane, and the managed system. The management plane, which sits on top, includes the processes for network supervision and management. They communicate with the search plane, which realizes the functionality of network search. Each node in the search plane, referred to as a search node, has an associated execution environment, processing, and storage capacity. Furthermore, search nodes have knowledge about their neighbors and communicate with them through message exchanges. We can view the search nodes and their peer interactions as a network graph, where search nodes are vertices and neighbor relationships are edges. A distributed management protocol executed on the network graph is discussed in Section 2.4. The bottom plane in Figure 2.5 represents the physical network, which is the managed system and subject to search. Each network device is associated with a search node, which maintains configuration and operational information sensed from the network device.

2.5.2 Information Model

We present the information model for network search, often referred to as the object model, as follows. Physical and logical entities in a networked system, such as servers, virtual machines, routers, IP flows, etc., are considered objects in a search space. An object is expressed as a bag of attribute-value pairs. An object has a globally unique name and a type. The object name is expressed as a Uniform Resource Name (URN) [38], because it provides a unique, location-independent, and expressive identifier. Examples of objects are shown in Figure 2.6.

A relation between objects that links them together is identified by the attribute-value pairs that they share. The relation allows for finding objects associated with some object in consideration. Consider objects a and b in a search space O. a is directly linked to b if a and b share an attribute-value pair. Similarly, a is linked to b if there is a chain of direct links between a and b.
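Under this definition, the direct-link relation can be checked on the attribute-value pairs of two objects. The helper and the sample objects below are hypothetical, not part of the thesis implementation:

```python
def directly_linked(a, b):
    """Two objects are directly linked if they share an attribute-value pair."""
    shared = set(a.items()) & set(b.items())
    # Object names are globally unique, so a name can never be a shared
    # pair; exclude the attribute anyway for clarity.
    return any(attr != "object name" for attr, _ in shared)

# Illustrative objects (only the relevant attributes are shown):
vm = {"object name": "ns:vm-1", "object type": "virtual-machine",
      "server": "ns:cloud-1"}
flow = {"object name": "ns:flow-1", "object type": "ip-flow",
        "server": "ns:cloud-1"}
other_flow = {"object name": "ns:flow-2", "object type": "ip-flow",
              "server": "ns:cloud-2"}

assert directly_linked(vm, flow)          # shared pair: server = ns:cloud-1
assert not directly_linked(vm, other_flow)
```

A chain of such direct links (vm to flow to another object on the same server, and so on) yields the general link relation.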


Figure 2.5: The architecture for network search [53]

object name : ns:instance-002d
object type : virtual-machine
IP address : 10.10.11.173
server : ns:cloud-1
MAC address : fa:16:3e:31:7b:ee
CPU-cores : 1
CPU-load : 0.112
Memory : 536870912
Memory-load : 0.912

(a)

object name : ns:10.10.11.79:7730:10.10.11.125:37756
object type : ip-flow
source IP : 10.10.11.79
source port : 7730
destination IP : 10.10.11.125
destination port : 37756
server : ns:cloud-2
bytes : 1329
packet : 3
bandwidth : 66.35

(b)

Figure 2.6: Sample network search objects: (a) an object that represents a virtual machine, (b) an object that represents an IP flow

2.5.3 Query Language

The query language for network search is described in BNF notation [31] as follows:

q → t | q ∧ q | q ∨ q   (2.1)

t → a | v | a op v   (2.2)

op → = | < | >   (2.3)

The basic idea of the query language is the following: a token t can be an attribute, a value, or an attribute-operator-value expression, as in rule (2.2). The operators op are given in rule (2.3). Then,


according to rule (2.1), a query q is made up of a token or a combination of tokens with a logical AND or a logical OR operator. In addition, link, projection, and aggregation operators are provided in the query language, which we do not discuss here.
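For illustration, a basic query conforming to rules (2.1)-(2.3) can be represented as a small AST together with an exact-match evaluator. The constructors and the evaluator are hypothetical sketches, not the thesis query parser:

```python
from dataclasses import dataclass

@dataclass
class Token:            # rule (2.2): a | v | a op v
    attr: object = None     # attribute name a
    op: object = None       # '=', '<', or '>' (rule 2.3)
    value: object = None    # value v

@dataclass
class And:              # rule (2.1): q AND q
    left: object
    right: object

@dataclass
class Or:               # rule (2.1): q OR q
    left: object
    right: object

def exact_match(q, obj):
    """Boolean (exact-match) evaluation of a query AST against one object."""
    if isinstance(q, And):
        return exact_match(q.left, obj) and exact_match(q.right, obj)
    if isinstance(q, Or):
        return exact_match(q.left, obj) or exact_match(q.right, obj)
    if q.op is None:    # bare attribute token or bare value token
        return q.attr in obj if q.attr is not None else q.value in obj.values()
    ops = {"=": lambda a, b: a == b,
           "<": lambda a, b: a < b,
           ">": lambda a, b: a > b}
    return q.attr in obj and ops[q.op](obj[q.attr], q.value)

# "object type = virtual-machine AND CPU-load > 0.9" as an AST:
q = And(Token(attr="object type", op="=", value="virtual-machine"),
        Token(attr="CPU-load", op=">", value=0.9))

assert exact_match(q, {"object type": "virtual-machine", "CPU-load": 0.95})
```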

2.5.4 Semantics for Matching and Ranking

Network search results may not match a given query exactly, but may match it approximately. The semantics for matching search queries are defined as follows. A matching function M maps a query and an object onto a matching score, a real number between 0 and 1, inclusive. If M returns 0, the object is not included in the result set; otherwise, it is included. The value of M indicates the relevance of the object to the query; the higher the score, the better the match. This matching function includes objects with approximate relevance to the query; thus, it is called an 'approximate match'. An 'exact match', on the other hand, is a special case of the approximate match where the matching score is a boolean value, either 0 or 1. It includes only objects that exactly match all tokens of a given query.

The matching function M for the approximate match is defined by an adaptation of the extended boolean retrieval model [49]. M uses two basic metrics, i.e., term frequency (tf) and inverse document frequency (idf). In network search, the tf of an attribute name, a value, or an attribute-value pair expresses its frequency in an object. The idf of an attribute name, a value, or an attribute-value pair indicates the inverse of its number of occurrences in the object space. For a specific object o, the matching function M for a term t is the product of the tf and idf of the term t. The matching function M for queries constructed out of n terms and boolean operators is defined as follows:

M(q1 ∨ ... ∨ qn) = ‖M(q1), ..., M(qn)‖p / ᵖ√n   (2.4)

M(q1 ∧ ... ∧ qn) = 1 − ‖(1 − M(q1)), ..., (1 − M(qn))‖p / ᵖ√n   (2.5)

Equations 2.4 and 2.5 use the Lp vector norm, also known as the p-norm. Choosing P = ∞ results in M performing the exact match, while choosing P in the interval [1, ∞) results in the approximate match. The smaller the value of P, the looser the approximate match, i.e., the more objects match a given query.
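Following the extended boolean retrieval model, the p-norm combination of per-term scores can be implemented directly. The sketch below assumes the per-term scores (e.g., tf-idf products as described above) are already computed:

```python
def m_or(scores, p):
    """Disjunctive p-norm combination of per-term matching scores (cf. Eq. 2.4)."""
    n = len(scores)
    return (sum(s ** p for s in scores) / n) ** (1.0 / p)

def m_and(scores, p):
    """Conjunctive p-norm combination of per-term matching scores (cf. Eq. 2.5)."""
    n = len(scores)
    return 1.0 - (sum((1.0 - s) ** p for s in scores) / n) ** (1.0 / p)

# p = 1 gives the loosest approximate match; increasing p tightens it
# toward the exact (boolean) match.
assert m_and([1.0, 1.0], p=2) == 1.0
assert abs(m_and([1.0, 0.0], p=1) - 0.5) < 1e-9   # partial match is kept
assert m_or([1.0, 0.0], p=2) > m_or([1.0, 0.0], p=1)
```

Note that an object scoring 0 on one conjunct still receives a non-zero overall score for finite p, which is exactly why the approximate match returns more objects than the exact one.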

Similar to web search, search results are ranked in network search. The ranking reflects the degree to which a query matches the search results produced by the matching function M. The matching function M also provides the matching scores of the search results. Matching scores are computed using Equations 2.4 and 2.5 when M performs query matching. Additionally, the matching rule is extended to support matching a substring of an object name, e.g., 'brooklyn' matches 'ns:server:brooklyn'. The contribution to the matching score is higher if the term matches an object name or an object type, since objects that match a query via name or type are considered more relevant than those matching via other attributes.

The ranking also reflects the link structure and the freshness of information. The link structure considers the neighborhood of an object in the graph of objects and their relationships. Objects that have a high number of links are considered more important than objects with a low number of links. The freshness of information indicates that the more recent the information, the more important it is. The objects in the search results are ordered in descending order by the matching scores and ranking metrics described above.

2.6 A Previous Network Search Prototype

A previous version of a prototype was implemented to explore the usability of network search [50]. The prototype demonstrated the functionality of network search through two applications: a googling client for searching a cloud, and an exploratory data analysis application for virtual resources in the cloud. However, the efficiency of the prototype, particularly in terms of low latency and high throughput of search queries, was not considered.

The prototype includes an implementation of search nodes, which cooperatively provide the functionality of network search. Each search node has three logical subsystems, i.e., a sensing subsystem, a database subsystem, and a query processing subsystem. The sensing subsystem brings in information from the cloud server and creates objects, the database subsystem stores and maintains objects, and the query processing subsystem processes and distributes search queries.

The prototype is designed to perform query processing in a sequential and synchronous manner, which is simple in design. However, a drawback of such an approach is that all queries queued for processing need to wait until the query processing subsystem completes the job at hand. As a consequence, the average response time of search queries is high, due to the waiting time caused by any query that requires a long processing time. Thus, at a higher query rate, low query latency cannot be achieved.

With regard to distributed query processing, a complete echo protocol was not implemented in the prototype. The prototype takes advantage of only the part of the echo protocol where a node performs local query processing and aggregates the query results from other nodes. Neither a message exchange protocol nor a distributed algorithm for query processing was implemented. Thus, the performance properties of distributed query processing cannot be derived from the echo protocol. The prototype also requires a predefined and static topology of search nodes for query distribution and result aggregation. As a consequence, the network search system may not be scalable, due to the fact that knowledge of all search nodes and their communication paths has to be known.

Similar to web search, network search should support approximate matching and ranking of search results. The prototype implemented a naive approach, i.e., the result set includes all objects that match at least one token of the query. These are brought up from the database subsystem. Then, the query processing subsystem calculates a matching score based on the attribute-value pairs of all matched objects. The score, along with other ranking metrics, is used to rank the objects. However, typically only a small set of top-ranked results is required. This approach to approximate matching, which brings in all possibly relevant objects, is not an efficient solution, since it incurs an excessive processing cost.


Chapter 3

Related Research

Network search relates to topics such as network management, distributed systems, information search and retrieval, etc. In this chapter, we discuss and summarize some closely related projects, as well as compare and contrast each project with network search.

3.1 Weaver Query System

The Weaver Query System (WQS) is a platform that allows creating global views of traffic flowing through network devices in near real-time [42]. This is done by deploying Weaver Active Nodes (WANs), which are small devices attached to routers for gathering device information. Each WAN maintains device information in a local database and processes queries against that database. A query is sent from a management station to the WQS via a single interface, using a declarative query language based on the structured query language (SQL). The system takes advantage of a decentralized management paradigm that utilizes a navigation pattern, known as the echo pattern (an echo protocol), for distributing queries among WANs and aggregating data.

The design of the system suggests that a management station will be less loaded than in a centralized system, due to the fact that the system performs a certain degree of in-network aggregation of information. The system is also shown to be robust, since each WAN performs identical functions; thus there is no single point of failure. The completion time of a query depends on the network diameter rather than the total number of nodes; therefore, the system can be expected to work efficiently in large networks.

WQS has an architecture similar to that of network search, i.e., there is a logical plane where distributed nodes with identical functions work cooperatively to support the processing of queries. Moreover, the distributed algorithms that run in a WAN and in a search node are developed using the same protocol, i.e., the echo-pattern algorithm. However, while WQS relies on a dedicated device to host a WAN, a network search system allows a search node to be hosted in other ways, including inside a device that provides cloud/network services and has processing and storage capability. Regarding the information model, WQS uses a fixed schema and a structured SQL-like query language, which enables only schema-aware queries that are matched using an exact matching paradigm. In contrast, network search uses a looser information model without the need for any schema, and a keyword-based query language that can be matched using both exact and approximate matching paradigms.

3.2 Sophia

Sophia is a distributed system that collects, stores, propagates, aggregates, and reacts to observations about the network's current conditions [57]. It can be viewed as a shared information plane with three main functionalities, i.e., collecting information about the network via sensors, evaluating statements (questions) about the network via a declarative programming environment, and reacting to the results drawn from the evaluation. All functionalities are distributed functions that run on distributed Sophia nodes. Sophia uses a declarative logic programming language as a query language, which allows embedding a sub-routine program that can be evaluated at runtime. This works in a wide-area, decentralized environment that evolves over time, where the possible states of the network are not known beforehand. The query language can also express when and where to execute a statement. Moreover, it allows partial answers to a statement, enabling a Sophia system to sacrifice completeness of answers for performance.

A Sophia node has five core components in its implementation to incorporate the functionalities of the information plane, namely, (1) a local database that holds terms used for evaluating query statements, (2) a statement processing engine, (3) interfaces to sensors and reactors for accessing sensory data and controlling the behavior of the network, (4) a remote statement processor for delegating tasks to a remote Sophia node, and (5) a scheduling mechanism.

Sophia introduces an information plane to tackle the problem of dispersed information through distributed nodes that work together. Each node has identical functionalities, similar to those of a network search node, namely, to collect information, to store information in a local database, and to process queries in a distributed manner. Furthermore, similar to network search, the data model and the query language of Sophia have the flexibility to capture the heterogeneity of information. Additionally, a distributed algorithm makes Sophia work well in large-scale networks, where query processing can be done on a fast time scale. However, unlike network search, Sophia lacks a search-like functionality that allows for the exploration of yet-to-be-known information in a network or a networked system. In conclusion, we are inspired by the functionalities of a Sophia node, which cooperatively provide the functionalities of the information plane, yet they need to be adapted to satisfy the requirements of network search.

3.3 Distributed Image Search in Camera Sensor Networks

Yan et al. proposed the design and implementation of a distributed search engine for a wireless camera sensor network, where images from different sensors can be captured, stored, and searched [58]. A sensor network typically has limited resources in terms of energy, network bandwidth, computational power, and memory capacity. As a result, it is impractical to transmit all images for a centralized search. Instead, the proposed system uses a compact image representation in a text format, a so-called visual word (visual term). This allows the design of the system to apply a search concept from the Information Retrieval (IR) paradigm.

An image query is converted to visual terms, and the terms are then used as input, whose results are matched and ranked using a weighted similarity measure called tf-idf, analogous to that of IR. A search is done using an architecture in which a single centralized node distributes a query to all sensor nodes and receives a top-k result from the local processing of each sensor node. It then produces a final top-k result set, and finally it may request the images that correspond to the result set from specific sensors.

The system utilizes an inverted index for optimizing the matching and ranking functions, and a tree data structure for maintaining visual terms.

Yan's system is relevant to this project in the sense that the data that is subject to search is located, sensed, and maintained locally in the node itself. It is inefficient to migrate the data out of the node for processing, in Yan's case due to energy constraints. In our case, the data would be obsolete by the time it is migrated for processing, due to the fast-changing nature of the data. Therefore, distributed in-node processing is required. Furthermore, Yan's system provides a search function through the use of concepts from Information Retrieval. Information Retrieval is attractive for network search, since it provides concepts for matching and ranking objects with respect to queries and allows for the exploration of undiscovered data. Nonetheless, Yan's system has a centralized architecture, which limits scalability in terms of system size and leads to a single point of failure.

3.4 Minerva ∞

Minerva ∞ is a web search engine that has a peer-to-peer architecture [47]. It has algorithms for creating overlay networks that contain the data of interest, placing data on a network node, load balancing, a top-k algorithm, and data replication. It is designed for a network of a large number of peer nodes, each of which has computation, communication, and storage capabilities. Each peer node has functionality to crawl web pages, discover documents that are subject to search, and compute scores of documents. The scoring function utilizes a weighted similarity measure (tf-idf) as in an Information Retrieval context.

The system works as follows: web pages are initially loaded and distributed into the system as a batch process. The system builds a global overlay network, which connects all peer nodes, and many term-specific overlay networks, each of which connects the nodes that maintain documents related to a term. A query may have many terms, each of which is processed in the corresponding term-specific overlay network in a distributed manner. A top-k result is returned as the end result. The system allows incomplete answers for the sake of better performance. Note that the system is built on the assumption that a document is rarely updated.

Minerva ∞ operates on a peer-to-peer architecture similar to that of network search, where each node works cooperatively and has identical functionalities. Distributed web search is an interesting feature closely related to network search, since network search can be viewed as a search engine for operational information in networks and networked systems. Additionally, Minerva ∞ focuses on distributed algorithms that make an efficient search scalable. However, differences in requirements make Minerva ∞ less attractive: the data used in a web search changes rarely compared to how often it is queried (searched), while data in networked systems is fast-changing. Minerva ∞ has data migration and data placement mechanisms to balance load among nodes, which are impractical in network search, where data changes in sub-seconds. Nonetheless, its capability to provide an incomplete-yet-meaningful answer to a search query, through the use of ranking and top-k functions, is applicable to network search. Additionally, Minerva ∞ supports parallel processing of a search query, which inspires capitalizing on computational resources.


Chapter 4

Design of a Search Node

A network of search nodes is formed in the search plane. They cooperatively provide the services for network search. Each search node has an interface to the management plane as an access point for network search. A search node senses information in network devices, maintains the information, and performs distributed query processing. We realize a search node as a software component that runs inside devices, which are servers that provide cloud services. Each search node is responsible only for information within the server where it resides. Additionally, each search node has identical functional capabilities.

4.1 An Architecture of a Search Node

We illustrate the main components of a search node and their interactions in Figure 4.1. A search node has three interfaces, which are defined by their end points, i.e., an interface to the management plane, an interface to peer search nodes, and an interface to a cloud server.

The distributed query processing component, which is placed at the top, provides the functionality of distributed query processing. It interacts with the management plane and peer search nodes via message exchanges, through the interface to the management plane and the interfaces to peer search nodes, respectively. It also interacts with the local databases via the database API. The query processing component is based on the echo protocol, which defines the message exchange protocol as well as a distributed algorithm for query processing. Knowledge about the peers of a search node is provided by a topology manager component, which is used by the echo protocol for message exchanges.

The component in the middle is the local database component, which gives access to the local information for a search. It contains an object database that maintains the local objects. Additionally, it contains an index database that stores indexes of attribute names, values, and attribute-value pairs from the objects, in order to optimize the response times of a search. We further discuss the search index in Section 4.2.

The sensing component placed at the bottom has sensors that sense information associated with the underlying managed system, i.e., a server that provides cloud services, through periodic polling. Such information includes server configurations, virtual machine utilizations, etc. The sensors organize this information as objects and store the objects in the local databases. In this thesis, we do not focus on the design of the sensing functionality, which relies on the earlier network search prototype [50].

Figure 4.1: An architecture of a search node [48]

4.2 A Design for Efficient Local Query Processing

For processing search queries in a search node, which is referred to as local query processing, we consider two design goals: (1) the query processing function should exhibit fast response times for search queries to enable real-time search, and (2) it should support a high query load. In this thesis, these design goals are met for basic queries, which are defined by the query language presented in Section 2.5.3.

In order to achieve the first goal, we introduce search indexes of potential search terms, which can be attribute names, values, or attribute-value pairs of objects. A search index is represented as a key-value pair, where the search term is the key and the value is a tuple that contains an object id, a matching metric, and ranking metrics. The object id is a pointer to the object in the object database, while the metrics contain the information needed by the matching and ranking functions (see Section 2.5.4).
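As a sketch, such an index entry could look like the following. The field names and the term encoding are illustrative assumptions, not the exact layout used by the implementation:

```python
# Sketch of a search-index entry: the key is a potential search term, the
# value holds postings with an object id plus matching and ranking metrics,
# so that matching and ranking need not touch the object database.
# Field names and the term encoding are illustrative.
index = {
    "server=ns:cloud-1": [                 # attribute-value pair as a term
        {
            "object_id": "52d3a1f0",       # pointer into the object database
            "tf_idf": 0.42,                # matching metric (Section 2.5.4)
            "ranking": {"links": 5, "freshness": 0.98},
        },
    ],
}

def lookup(index, term):
    """Query matching via the index: no full objects are retrieved."""
    return index.get(term, [])

assert lookup(index, "server=ns:cloud-1")[0]["object_id"] == "52d3a1f0"
```

Only when the final top-k result set is assembled do the object ids need to be dereferenced against the object database.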

The indexes of the search terms are created and maintained upon creation and updates of objects in the object database. Having the indexes of search terms enables local query processing to perform query matching without retrieving entire objects and ranking without having to compute the ranking metrics during the process. Thus, the index enables faster query processing. In addition, if only the top-k results are returned, better performance can be achieved.

The indexes of the search terms reduce the processing time of local query processing. As a result, faster response times for search queries can be achieved. However, there is a cost


Figure 4.2: An architecture for concurrent query processing in a search node

associated with maintaining such indexes: it adds processing overhead due to updates of the matching and ranking metrics in the index entries. Such an update takes place whenever any attribute-value pair of an object is updated. An update of one attribute-value pair of an object may result in updates of the matching scores of all index entries associated with the object. If an object is updated frequently, this increases the processing cost due to the high frequency of updates of the associated indexes.

To achieve the second design goal, we make use of concurrency in local query processing. When queries are forwarded to a search node, multiple threads of the query processing function are executed, if possible, on multiple CPU cores, if they are available in the search node.

Figure 4.2 shows the design for concurrent processing of search queries in a search node. At the top of the figure, an echo protocol message interface receives messages from either the management plane or peer search nodes. Each message is assigned by a message dispatcher to a message queue associated with one of the query processing threads. Each message carries an identifier, which is used to determine the query processing thread to which the message is forwarded. This allows the same query to be handled by the same thread. The threads, therefore, do not need to share the states of the echo protocol. The query processing threads, presented in the middle of Figure 4.2, run asynchronously. They are responsible for processing search queries against the local databases and updating the echo protocol and aggregator states. The processing of search queries is performed using the aggregator object of the echo protocol (see Figure 2.4 in Section 2.5). As part of an execution step, one or more messages can be generated and sent to other search nodes (see Section 2.5).

The database interface that provides access to the local search indexes and objects supports concurrent connections. A thread that executes a query locally establishes such a connection, which gives the thread an exclusive connection to the object and


index databases. This allows multiple threads to execute queries at the same time; therefore, high throughput can be achieved with a large number of query processing threads.

In our design, the number of threads in a search node is currently assigned manually. Ideally, the number of threads spawned in a search node should be adjusted dynamically to the query load. We will explore such an option in future work.


Chapter 5

Implementation of a Search Node

A search node is implemented as software that runs on a server providing IaaS-cloud services. To implement the network search functions, we apply an approach based on object-oriented programming [45]. The approach, coupled with regular use of design patterns [39], allows us to maintain and manage the code much more flexibly. It helps divide the system into small implementation components, which are simple to implement or modify.

To implement modules and components, we use the concept of “low coupling and high cohesion” [44]. Coupling is the degree to which each program module relies on other modules, while cohesion refers to the degree to which the elements of a module belong together. Put together, the concept means that a module depends only loosely on other modules and that the elements within a module are closely related. The benefits of using the low-coupling-and-high-cohesion concept are twofold. First, requirement changes typically affect only a single module, which can be easily modified, maintained, and tested. Second, the implementation code is easy to read, since each module and component has very focused responsibilities and a clear separation. Thus, further development can be done with less effort.

We use the Python programming language for our implementation. Python is a high-level programming language that enables writing software modules. It provides a rich library and syntactic sugar, which make the code of software modules more expressive and easier to read compared to code written in many other languages, such as C and Java [22] [10] [32]. A key reason for us to choose Python is that our implementation interacts with libvirt [13] and various APIs of OpenStack, which are primarily based on Python. In addition, Python supports rapid prototyping. We primarily use Python version 2.7 with the CPython interpreter.

As discussed in Section 4.1, a search node has three key modules: (1) sensing, (2) local database, and (3) query processing. We discuss the implementation of each module in detail below.


5.1 Implementation of the Sensing Module

5.1.1 Overview of the Sensing Module

The sensing module employs a set of sensors that periodically poll management information from the associated devices. Sensors are software components written in Python. The sensed information is represented as objects in the form of sets of attribute-value pairs. The sensors populate and update such objects in the object database.

In our implementation, the sensors run on physical servers that provide cloud services. They collect information about servers, virtual machines hosted on the servers, users and groups on the servers, processes that run on the servers, as well as IP flows. Different sensors have different refresh rates depending on the lifetime of the information that they poll. When a sensor detects changes in an object, the sensor aggregates all changed information, including attributes and values of the object, in a single database operation.

We further illustrate the data sensing mechanism through a sensor that collects information about IP flows. An IP flow captures either a server-to-server communication or a virtual-machine-to-virtual-machine communication over a short duration. In particular, we specify a flow as a sequence of IP packets from one machine, physical or virtual, to another machine, over a short duration. We identify a flow by the IP and port addresses of both ends.

The sensor recognizes a flow by listening on the network interface through which IP packets pass. Subsequent IP packets that have the same identifier create a flow. The flow is updated when the sensor collects more packets corresponding to the flow. The flow is considered alive until the sensor does not detect any further packet associated with the flow within a certain period of time. When the flow is terminated, the sensor notifies the database component to remove the object. The information that the sensor reads for a flow includes source and destination IP addresses, source and destination ports, byte and packet counts, and average bandwidth. The sensor makes use of the Python library named “Scapy” [24] to capture packets from the network interface of a server.
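The flow-tracking logic described above can be sketched in plain Python, independently of Scapy. The sketch below uses simple dictionaries in place of captured packets; the field names and the timeout value are illustrative assumptions, not the prototype's.

```python
import time

# A simplified sketch of the flow sensor's bookkeeping: flows are keyed
# by the IP and port addresses of both ends, updated per packet, and
# expired after a period without traffic. Names and the timeout value
# are assumptions for illustration.

FLOW_TIMEOUT = 5.0  # seconds without packets before a flow is considered dead


class FlowTable:
    def __init__(self):
        self.flows = {}  # flow id -> flow record

    def observe(self, src_ip, src_port, dst_ip, dst_port, nbytes, now=None):
        """Update the flow matching the packet, or create a new flow."""
        now = time.time() if now is None else now
        key = (src_ip, src_port, dst_ip, dst_port)
        flow = self.flows.setdefault(key, {"bytes": 0, "packets": 0, "last-seen": now})
        flow["bytes"] += nbytes
        flow["packets"] += 1
        flow["last-seen"] = now

    def expire(self, now=None):
        """Remove flows with no packets within the timeout; the sensor would
        then notify the database component to remove those objects."""
        now = time.time() if now is None else now
        dead = [k for k, f in self.flows.items() if now - f["last-seen"] > FLOW_TIMEOUT]
        for k in dead:
            del self.flows[k]
        return dead
```

In the real sensor, `observe` would be driven by Scapy's packet capture callback rather than explicit calls.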

Even though most of the sensors were developed in the previous prototype [50], we put effort into reorganizing them following the object-oriented technique as well as rewriting the code to follow the Python coding style [55]. We include a base sensor that allows us to develop a new sensor quickly by inheriting functionality from the base sensor.

5.2 The Local Databases Component

A database component provides a means to maintain real-time data for network search. It logically sits between the modules for data sensing and distributed query processing. It consists of two databases for network search, i.e., databases of search objects and search indexes. The database of search objects, referred to as the object database, maintains a collection of objects populated and updated by the sensors. The object design follows the information model for network search (see Section 2.5.2). The database of search indexes, referred to as the index database, maintains a collection of search indexes that are used for approximate matching and ranking. The search index is discussed in Section 5.2.2.

We implemented the database component using a NoSQL database platform (see Section 2.3), since it enables storing information using a loosely structured data model that allows us to


{
  _id : ObjectId(“5215d68ce2b0366d5f4717d3”),
  object-name : “ns:cloud-1”,
  object-type : “server”,
  cpu-cores : 24,
  ip-address : “172.31.212.81”,
  linux-distribution : “Ubuntu:12.04:precise”,
  cpu-load : 12.2,
  memory-load : 92,
  content : [“object-type”, “cpu-cores”, “memory-load”, “cpu-load”, “object-name”,
             “ip-address”, “linux-distribution”, “server”, “ns:cloud-1”,
             “172.31.212.81”, “Ubuntu:12.04:precise”]
}

Figure 5.1: A sample MongoDB object representing a server in JSON

capture the heterogeneity of network information. Specifically, we use MongoDB [16], which is one of the most popular open-source document-store systems [7], to realize the database component.

MongoDB stores objects in collections. A collection contains objects in a variation of the JSON (JavaScript Object Notation) format [4]. JSON is a text-based, human-readable, and language-independent data interchange format [11]. JSON allows realizing an object in our information model in a straightforward way.

In this implementation, we consider heavy read operations on the database component to support high query loads. MongoDB supports read-intensive applications, because it uses a shared lock for read operations that allows concurrent reads of a database. For a write operation, however, it gives exclusive access to a single writer. Both read and write operations compete for access to the lock. The lock is at the level of a database, which is a bottleneck: a write operation for a single attribute in one object blocks all access to the database until the write operation is completed. Despite the database-level locking, MongoDB has an approach to optimize read and write operations. The platform releases the lock when it has been held by a read or a write operation for a long period of time, or when the holding operation is waiting for disk access, and then allocates the lock to the waiting operations.

The database component in a search node is realized as a MongoDB server that runs on a dedicated core in the node. We use version 2.4.5 of the database platform. Data is maintained locally; therefore, we do not use the distributed mechanism (sharding) of MongoDB. We use the API provided by the Python driver of MongoDB, called PyMongo, to interact with the platform [21].

5.2.1 The Object Database

As stated earlier, the object database is a MongoDB collection that maintains a set of objects in JSON format. Figure 5.1 illustrates a snapshot of a sample object in the object database in the JSON representation. The objects are identified by the database platform using the attribute named “_id”.


Figure 5.2: A sample MongoDB query

Objects in a collection can be retrieved by a MongoDB query, which is also represented as a JSON object. A sample MongoDB query is illustrated in Figure 5.2. The query basically defines a condition on an attribute-value pair to match an object. The query criterion ip-address : “172.31.212.81” in the sample query indicates such a condition. (For details about MongoDB query formulation, see [25].)

We extend a MongoDB object to include a special attribute named “content”, whose value is the set of all keywords in the object. To retrieve objects using a keyword, we make use of that attribute. For example, a query specifying the keyword “server”, specifically written as {content : “server”}, matches the sample object in Figure 5.1, because the attribute named “content” of the object contains that keyword. Although the current version of MongoDB supports keyword-based search, at the time of the implementation MongoDB did not provide such a functionality.
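The keyword retrieval just described relies on MongoDB's matching semantics for array-valued attributes: a condition on an array attribute is treated as a membership test. The plain-Python sketch below illustrates those semantics; it is an illustration, not MongoDB itself, and the object shown is an abbreviated version of Figure 5.1.

```python
# A simplified illustration of how a query such as {content: "server"}
# matches objects: a condition on an array-valued attribute is a
# membership test, and on a scalar attribute an equality test.
# Plain Python, not MongoDB.

def matches(obj, query):
    """Return True if the object satisfies every criterion in the query."""
    for attr, expected in query.items():
        value = obj.get(attr)
        if isinstance(value, list):
            if expected not in value:   # array attribute: membership test
                return False
        elif value != expected:         # scalar attribute: equality test
            return False
    return True

server_obj = {
    "object-name": "ns:cloud-1",
    "object-type": "server",
    "ip-address": "172.31.212.81",
    "content": ["object-type", "server", "ns:cloud-1", "172.31.212.81"],
}
```

Here `matches(server_obj, {"content": "server"})` succeeds because “server” is a member of the content array, mirroring how the sample object in Figure 5.1 is retrieved by keyword.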

MongoDB natively indexes all objects by the attribute named “_id”. To optimize the latency of keyword-based object retrieval, we maintain a secondary index structure on the attribute named “content”. While other attributes could be indexed as well, the overwhelming number of attribute names makes such an approach prohibitive. For indexing, we apply the native MongoDB indexing scheme, which is based on B-trees [37].

5.2.2 The Index Database

As mentioned in Section 4.2, search indexes are used to reduce the time for local query processing. While there are many options for implementing the search indexes, we decided to implement them as a collection of index objects in MongoDB. A key reason for choosing this option is to maintain simple, unified access to the search data, which enables rapid prototyping.

A search index is implemented as a JSON object that encapsulates a search term, an object-id, and matching and ranking scores. Figure 5.3 presents a sample search index. The search term can be an attribute, a value, or an attribute-value pair of an object in the object database. The object-id is a pointer to the search object in the object database. The term-frequency, name-resolution, and type-resolution scores correspond to matching and ranking metrics of the search term for the object. A search index is identified jointly by the search term and the object-id.

To compute the term-frequency score of a search term t for an object o, we use the formula tf(t, o) = 0.5 + 0.5 · f(t, o) / max_t'(f(t', o)), where f(t, o) is the number of occurrences of the term t in the search object o and the maximum is taken over all terms t' in o. The term-frequency score is not biased toward larger objects (objects with many attribute-value pairs), since the frequency of a term is normalized by the frequency of the most frequent term in the object. The inverse document


{
  _id : ObjectId(“525a622f72a17bf2df89fb93”),
  term : “server”,
  object-id : ObjectId(“5215d68ce2b0366d5f4717d3”),
  term-frequency score : 0.75,
  name-resolution score : 0,
  type-resolution score : 1
}

Figure 5.3: A sample search index

frequency (idf) metric indicates the inverse of the number of occurrences of the term across the object space. It usually accompanies the term-frequency score. Computing the idf metric requires a distributed aggregation protocol. We plan to investigate the inclusion of the idf metric in our future work.
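The term-frequency formula above can be transcribed directly into a few lines of Python. The helper name and the sample counts are illustrative; `term_counts` maps each term of an object to its number of occurrences f(t, o).

```python
# A direct transcription of the term-frequency formula
# tf(t, o) = 0.5 + 0.5 * f(t, o) / max_t'(f(t', o)).
# The function name and sample counts are illustrative.

def term_frequency(term, term_counts):
    max_count = max(term_counts.values())
    return 0.5 + 0.5 * term_counts[term] / max_count

counts = {"server": 1, "ns:cloud-1": 2}   # illustrative occurrence counts
term_frequency("server", counts)          # 0.5 + 0.5 * 1/2 = 0.75
term_frequency("ns:cloud-1", counts)      # 0.5 + 0.5 * 2/2 = 1.0
```

Note how a term occurring once in an object where the most frequent term occurs twice yields the score 0.75, matching the sample index in Figure 5.3.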

The values of the other two scores are boolean. If the search term appears in the name of the search object, the corresponding score is set to 1, and to 0 otherwise. Similarly, if the search term corresponds to the type of the search object, the corresponding score is set to 1, and to 0 otherwise. These metrics collectively make up the matching score.

In our current implementation, the ranking score is based solely on the matching score. The implementation of other ranking metrics, such as freshness and connectivity of search objects, will be part of our future work.

Search indexes are retrieved in the same manner as the search objects, through MongoDB queries. To optimize the latency of retrieval of search indexes, we maintain a secondary MongoDB index structure on the attribute named “term” for the index database.

5.2.3 The Index Manager

The index manager is a component, written in Python, that is responsible for creating and updating search indexes in the index database. The manager reacts to creations and updates of search objects.

When any attribute of a search object is updated, all search indexes associated with the search object are updated. This is due to the fact that an update of an attribute in the search object may lead to the creation of new search indexes, the deletion of existing search indexes, or changes of the term-frequency scores of all associated indexes. Obviously, a creation or deletion of a search object leads to the creation or deletion of the corresponding search indexes.

The index manager does not index numerical values, since most numerical values in an object correspond to operational states, which often update at a very fast time scale, making it computationally expensive to maintain their indexes.


5.3 Implementation of the Module for Distributed Query Processing

5.3.1 Software Components

5.3.1.1 The Echo Protocol Component

The echo protocol is the base protocol for distributed processing of search queries. The protocol is implemented in all search nodes. As discussed in Section 2.4, the protocol defines a message exchange protocol and a distributed algorithm for query processing. For each search query invoked at a search node, the protocol disseminates and processes the query, maintains the state of the invocation, and aggregates the (partial) query results from neighbor search nodes.

A query is invoked at a search node from the management plane using an interface, which is further described in Section 5.3.1.2. Each query invocation starts an execution of the echo protocol. The execution is identified by an invocation identifier (invocation id). The invocation id is generated by concatenating the unique address of the search node that received the query in the first place and a sequence number. Therefore, the invocation id is globally unique in the search plane. The invocation id allows the protocol to recognize and execute multiple queries and to maintain the protocol states of each invocation.
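The invocation-id scheme just described (node address concatenated with a sequence number) can be sketched as follows. The "SNxx-NNNNN" format mirrors the sample ids in Table 5.1; the class and counter mechanics are illustrative assumptions.

```python
import itertools

# A sketch of invocation-id generation: the unique address of the
# receiving search node concatenated with a local sequence counter.
# The class name and counter mechanics are assumptions; the id format
# follows the samples in Table 5.1.

class InvocationIdGenerator:
    def __init__(self, node_address):
        self.node_address = node_address
        self._seq = itertools.count(1)

    def next_id(self):
        return "%s-%05d" % (self.node_address, next(self._seq))

gen = InvocationIdGenerator("SN07")
gen.next_id()  # 'SN07-00001'
gen.next_id()  # 'SN07-00002'
```

Because the node address is unique in the search plane and the counter is local, no two nodes can produce the same invocation id.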

We organize the implementation of the protocol into three interacting subcomponents [51]: a message subcomponent, an echo protocol subcomponent, and an aggregator. We illustrate the subcomponents in the class diagram in Figure 5.4. The protocol message defines the format of echo protocol messages. The echo protocol subcomponent contains the protocol states and the procedures executed when a search node receives a protocol message. The aggregator object encapsulates the state of the (distributed) query processing, as well as methods for local query processing and for aggregating partial results.

Figure 5.4: A class diagram of the echo protocol component

A protocol message consists of an invocation id, a sender identification (from), a message type, and a payload. The from field indicates the address of the search node that sent the message. The message type specifies the type of the message payload. We consider two types of messages: explorer and echo. An explorer message contains a search query in the payload, while an echo message contains a (partial) aggregate result. The messaging scheme for the echo protocol is discussed in Section 5.3.1.3.

The aggregator object (see Figure 2.4) captures the aggregator state for a query, as well as functions for local data retrieval, matching and ranking, and aggregation of partial results (further explained in Section 5.3.1.5).


Table 5.1: Sample states of the echo protocol

invocation id    parent    N
SN07-00001       n3        n1, n2
SN07-00002       n5        n1, n4, n6
SN06-00001       n3        n1, n2, n4, n5, n7

The echo protocol instance maintains a set of protocol state variables. The set contains tuples of the structure (invocation id, parent, N). Each tuple is associated with a query and identified by the invocation id of the query. The state variable parent maintains the identification of the search node from which the protocol received the query for the first time. The state variable N maintains a set of neighbor search nodes. We illustrate a sample set of protocol states in Table 5.1, whereby search nodes are identified as n1, ..., nn.

We explain the protocol execution based on the pseudocode of the echo protocol [51]. When the echo protocol receives an explorer message, the invocation id is retrieved from the message. The protocol checks whether the invocation id has been encountered before. If so, the message is discarded by the protocol. Otherwise, the invocation id is used to create a protocol state tuple. The variable parent in the tuple is set to the sender of the message. The variable N is initialized with information provided by the topology manager, which is explained later in Section 5.3.1.4. The protocol then invokes the function local() in the aggregator. The function local() uses the query in the payload of the message as its input. After that, the protocol disseminates the explorer message to all search nodes specified in the state variable N.

When the echo protocol receives an echo message, the protocol obtains a (partial) aggregate result from the payload of the message. The function aggregate() in the aggregator is performed using that (partial) aggregate result as input.

Each time the protocol receives a message, whether an explorer message or an echo message, the state variable N is updated by subtracting the message sender from the set. When the set is empty, the aggregated result in the aggregator state is sent back to the search node specified by the state variable parent via an echo message. The protocol state tuple of the query is then erased.

An aggregator is instantiated when the protocol receives a query. If multiple queries are in process, multiple aggregators can be initiated, one aggregator per query. The aggregator state is initialized with the local search result, which is the end product of an execution of the function local(). The aggregator state is updated whenever the function aggregate() of the aggregator is executed. The aggregator is deleted when an echo message containing the aggregator state is sent.
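The protocol behavior described in this subsection can be condensed into a short sketch. Class and method names are illustrative, the aggregator is injected as a collaborator, and the actual implementation follows the pseudocode in [51]; this is a simplified model, not the prototype's code.

```python
# A condensed sketch of the echo protocol's message handling: state
# tuples (parent, N) per invocation, explorer dissemination, sender
# subtraction from N, and an echo to the parent once N is empty.
# Names are illustrative assumptions.

class EchoProtocolSketch:
    def __init__(self, neighbors, send, aggregator):
        self.neighbors = set(neighbors)   # provided by the topology manager
        self.send = send                  # callback: send(dest, message)
        self.aggregator = aggregator
        self.state = {}                   # invocation id -> {"parent", "N"}

    def on_explorer(self, invocation_id, sender, query):
        if invocation_id in self.state:               # duplicate explorer:
            self._note_sender(invocation_id, sender)  # discard payload, count sender
            return
        n = self.neighbors - {sender}
        self.state[invocation_id] = {"parent": sender, "N": set(n)}
        self.aggregator.local(query)                  # process the query locally
        for node in n:                                # disseminate the explorer
            self.send(node, ("explorer", invocation_id, query))
        self._maybe_echo(invocation_id)               # leaf node: answer at once

    def on_echo(self, invocation_id, sender, partial_result):
        self.aggregator.aggregate(partial_result)
        self._note_sender(invocation_id, sender)

    def _note_sender(self, invocation_id, sender):
        self.state[invocation_id]["N"].discard(sender)
        self._maybe_echo(invocation_id)

    def _maybe_echo(self, invocation_id):
        s = self.state[invocation_id]
        if not s["N"]:                                # every neighbor has answered
            self.send(s["parent"],
                      ("echo", invocation_id, self.aggregator.result()))
            del self.state[invocation_id]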

5.3.1.2 The Management Plane Interface

The management plane interface is the access point to a search node from the management plane. A search query is invoked at the search node, and the result of a search query is returned to the management plane, via this interface.


The interface is realized using the Hypertext Transfer Protocol (HTTP) over a connection-oriented socket based on the Transmission Control Protocol (TCP). A search node provides a web (server) socket that listens and waits for connections from clients in the management plane. Once a client has connected to a search node, e.g., using a web browser, data such as a search query can be transmitted.

For the data exchange format, we follow the widely used JSON format. The JSON format applies to both search queries and search results when they are exchanged via this interface. The motivations for using JSON are as follows. First, it provides flexible interoperation with applications on the management plane: the applications can be developed in any programming language that has a JSON parser. Second, objects in the information model for network search can be parsed to a JSON format in a straightforward way, since both define objects in a similar manner.
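The JSON exchange over this interface can be sketched with the standard library alone. The query fields shown below follow the query-object description in Section 5.3.1.5, but the exact field names are assumptions for illustration.

```python
import json

# A sketch of the JSON exchange on the management plane interface:
# a query is serialized to text for transmission over HTTP and parsed
# back into a Python object on the search node. Field names are
# illustrative assumptions.

query = {
    "query": {"content": "server"},   # the search query statement
    "match": "approximate",           # exact or approximate matching
    "top-k": 10,                      # number of results to return
}

wire = json.dumps(query)              # text sent over the HTTP connection
received = json.loads(wire)           # parsed back on the search node
```

Any management application with a JSON parser can produce `wire` and consume the search results returned in the same format.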

5.3.1.3 The Peer-node Interface

The peer-node interface is the interface for exchanging echo protocol messages (see Section 5.3.1.1) between search nodes. The interface implements a messaging scheme, which defines a pattern for transmitting messages to ensure communication between two search nodes.

We use a message queue for the messaging scheme. Each message received on a peer-node interface is placed in a queue. Messages are consumed following a “first come, first served” policy. No message is discarded; therefore, the queue length is not fixed.

We implement the message queue using an open-source queuing library called ZeroMQ [19]. ZeroMQ is a high-performance asynchronous messaging library written in the C++ programming language. It supports various operating systems.

Despite its implementation language, bindings provide APIs for many programming languages, including Python. We interact with the library via function calls through the API of the Python binding. The API is simpler than many other message queuing APIs, since, unlike many of them, it does not require implementing a queue manager.

ZeroMQ handles connection establishment and re-establishment. Therefore, it eliminates the need for a dedicated module for managing connections between search nodes. The peer-node interface is implemented using ZeroMQ with a simple duplex communication channel over TCP. Messages are pushed to other search nodes via a ZeroMQ queue; at the other end, the receiving search node pulls the messages out. ZeroMQ requires message content to be a sequence of bytes, but an echo protocol message is a Python object structure. Therefore, the Python object serialization module ‘pickle’ is used for the conversion.
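The pickle conversion at both ends of the channel can be shown with the standard library alone (omitting the ZeroMQ transport). The message fields follow the protocol message format of Section 5.3.1.1; the dictionary layout is an illustrative assumption.

```python
import pickle

# ZeroMQ transmits byte sequences, so an echo protocol message (a
# Python object) is serialized with pickle before sending and restored
# after receiving. The message layout here is illustrative.

message = {
    "invocation-id": "SN07-00001",
    "from": "SN07",
    "type": "explorer",
    "payload": {"content": "server"},
}

raw = pickle.dumps(message)     # bytes, suitable for a ZeroMQ frame
restored = pickle.loads(raw)    # the Python object at the receiving node
```

In the prototype, `raw` would be handed to the ZeroMQ push socket, and `pickle.loads` would run on the frame pulled out at the peer node.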

5.3.1.4 The Topology Manager Component

The topology manager is a supporting component of the echo protocol. It is responsible for providing the identification and location of the neighbors of a search node to enable peering for the echo protocol in the node. The topology manager,


in our case, uses a combination of an IP address and an application-level port as the identification and location.

In our implementation, the topology manager is manually configured in such a manner that the graph of search nodes displays small-world properties [52] [43]. Such graphs are characterized by high clustering coefficients and a small diameter. This approach works for a small-scale setup. However, a large system with at least a hundred thousand elements requires a dynamic and self-configuring topology manager.

5.3.1.5 The Local Query Processing Component

The local query processing component provides functions for the aggregator object of the echo protocol. It implements the operations to process a query in a search node. The query processing includes the retrieval of objects from the local database that match the query, and the ranking of the matched objects. The local database is described in Section 5.2.

A query that is subject to local query processing is formatted as a Python object that includes a query statement and a set of matching and ranking parameters. The query statement is a MongoDB query translation of a network search query. There is only one matching parameter, which specifies whether an exact match or an approximate match (see Section 2.5.4) should be performed during query execution. There are several ranking parameters: one specifies the number of search results to be returned, while the others define the weights for the different ranking metrics. A sample query is shown in Figure 5.5b.

Local query processing is invoked by the function local() of the aggregator object. It includes a matching function, a ranking function, and an object retrieval function. We limit the discussion to the case of an approximate match of search queries.

When the function local() of the aggregator object starts, it initializes the aggregator state qr. Then, it calls the local query processing component to match the query against the local database using a matching function M. The matching function M extracts the search terms from the query and retrieves the index entries in the index database that correspond to the search terms. The function then groups the index entries by their object ids and computes a matching score per object id, using the scores in the index entries of the corresponding group and the weight parameters specified in the query. A ranking function then orders the object ids into a list by matching score and truncates the list to the top-k object ids. Lastly, the object retrieval function fetches the search objects from the object database based on the object ids and stores them in the aggregator state qr, along with their matching scores. For example, using the information from Figure 5.5, the matching score of the object with the id “5215d68ce2b” is 0.59375.
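The group-score-rank steps of the matching function can be sketched as follows. The weights, the per-term averaging, and the sample entries are illustrative assumptions; in particular, they are not the parameters that yield the score 0.59375 quoted from Figure 5.5.

```python
# A sketch of the matching function M and the ranking step: group index
# entries by object id, compute a weighted score per object, and keep
# the top-k ids. Weights and per-term averaging are assumptions.

def match_and_rank(index_entries, weights, k):
    """index_entries: dicts with an object-id and per-metric scores."""
    scores, counts = {}, {}
    for entry in index_entries:                        # group by object id
        oid = entry["object-id"]
        s = sum(weights[m] * entry[m] for m in weights)
        scores[oid] = scores.get(oid, 0.0) + s
        counts[oid] = counts.get(oid, 0) + 1
    for oid in scores:                                 # average over matched terms
        scores[oid] /= counts[oid]
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [(oid, scores[oid]) for oid in ranked[:k]]  # top-k results

# Two index entries for the same object (one per matched search term):
entries = [
    {"object-id": "5215d68ce2b", "term-frequency": 0.75,
     "name-resolution": 0, "type-resolution": 1},
    {"object-id": "5215d68ce2b", "term-frequency": 1.0,
     "name-resolution": 1, "type-resolution": 0},
]
weights = {"term-frequency": 0.5, "name-resolution": 0.25, "type-resolution": 0.25}
```

The object ids returned here would then drive the object retrieval step against the object database.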

5.3.2 Software Component Interactions

A complete class diagram of the query processing module is shown in Figure 5.6, which includes the software components and their relations. At the top of the figure are the ManagementPlaneInterface, PeerNodeInterface, and MessageDispatcher classes. The ManagementPlaneInterface and PeerNodeInterface classes are explained in Section 5.3.1.2 and Section 5.3.1.3, respectively. The MessageDispatcher class is a component for dispatching the echo protocol messages to the corresponding query processing instance (discussed further in the next subsection). In the middle, there are the classes


Figure 5.5: Sample Python objects used in the function local() when the network search query (a) is invoked: (b) a sample query object and (c) index entries for the terms server and cloud-1 that belong to object id 5215d68ce2b.

for the echo protocol component, which are the same as those in Figure 5.4. At the bottom, there are the TopologyManager and LocalQueryProcessing classes, which are explained in Sections 5.3.1.4 and 5.3.1.5, respectively.

Figure 5.6: A class diagram showing the components and their relations in the query processing module

A query with parameters from a management process is received by the ManagementPlaneInterface. The query is in JSON format; it is converted into a Python object and encapsulated in an echo protocol message by the interface. The message is forwarded to the echo protocol component via the MessageDispatcher. The echo protocol component can also receive a query object or a partial aggregate result through echo protocol messages from other search nodes via the PeerNodeInterface. The messages received from both interfaces are put into a queue, which is implemented using a basic Python queue with unlimited length. An echo protocol instance retrieves a message from the queue and executes it by


invoking the local query processing component through the local() and aggregate() functions of the aggregator object. Explorer or echo messages may be sent to other search nodes depending on the current state of the echo protocol. Additionally, the TopologyManager supports the echo protocol by providing neighbor identities and their addresses via a shared variable.
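The following sketch illustrates the aggregator interface that the echo protocol invokes. The function names local() and aggregate() come from the text above; the matching score, the result limit of 200 (from Section 6.3), and the in-memory object store are simplified stand-ins for the actual matching/ranking functions and the MongoDB database.

```python
# Illustrative sketch of the aggregator object used by the echo protocol.
# local() matches and ranks against locally stored objects; aggregate()
# merges partial results received via echo messages. Scoring is a toy
# token-overlap measure, not the thesis's tf-idf-based matching function.

class Aggregator:
    def __init__(self, database, limit=200):
        self.database = database  # stand-in for the local MongoDB store
        self.limit = limit        # ranking function caps results at 200

    def local(self, query):
        """Match the query against local objects and rank the matches."""
        matches = [(obj, self._score(query, obj)) for obj in self.database]
        matches = [m for m in matches if m[1] > 0]
        matches.sort(key=lambda m: m[1], reverse=True)
        return matches[: self.limit]

    def aggregate(self, partial_results):
        """Merge partial ranked lists from child nodes into one ranked list."""
        merged = [m for partial in partial_results for m in partial]
        merged.sort(key=lambda m: m[1], reverse=True)
        return merged[: self.limit]

    def _score(self, query, obj):
        # Toy matching: count query tokens appearing as attributes or values.
        tokens = query.split()
        words = " ".join(f"{k} {v}" for k, v in obj.items()).split()
        return sum(1 for t in tokens if t in words)
```

A query such as "server cloud-1" would first be scored locally via local(), and the resulting ranked list would later be merged with children's lists via aggregate() during the contraction phase.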

A complete class diagram of a network search node is illustrated in Appendix A.

5.3.3 Concurrent Query Processing

To support concurrent execution of query processing, we implement the echo protocol and the local query processing components so that multiple threads of these components can be spawned and run in parallel to process queries. We follow the concurrent query processing design illustrated in Figure 4.2. The threads of the local query processing component run independently and can access the database at the same time, since MongoDB allows multiple concurrent read accesses.

In our implementation, a logical component called the "MessageDispatcher" sits between the interfaces of the distributed query processing module and the queues of the echo protocol threads. The MessageDispatcher is responsible for deciding which thread handles the next query invocation and for dispatching echo protocol messages to the threads such that messages of the same invocation are assigned to the same thread. Therefore, the threads do not share state. We currently implement the MessageDispatcher to distribute new query invocations to the threads using a hash function of the invocation id of the echo protocol messages. Obviously, this scheme does not balance the load, since each query invocation may require a different amount of processing. An actual load balancing scheme is left for future work.
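The assignment rule can be sketched as follows. The class name and the message fields mirror the text; the concrete hash, the use of a standard-library queue, and the field names are illustrative assumptions (the actual implementation uses per-process queues).

```python
# Sketch of the MessageDispatcher's assignment rule: messages carrying the
# same invocation id always land in the same worker's queue, so workers
# never share per-invocation state. Queue type and hashing are assumptions.
import hashlib
from queue import Queue

class MessageDispatcher:
    def __init__(self, num_workers):
        self.queues = [Queue() for _ in range(num_workers)]

    def _index(self, invocation_id):
        # A stable digest (unlike Python's per-run randomized hash()) keeps
        # the invocation-to-worker mapping deterministic across processes.
        digest = hashlib.md5(str(invocation_id).encode()).hexdigest()
        return int(digest, 16) % len(self.queues)

    def dispatch(self, message):
        self.queues[self._index(message["invocation_id"])].put(message)
```

Because the index depends only on the invocation id, an explorer message and the later echo messages of the same invocation are guaranteed to reach the same worker.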

In the implementation, we do not use the native Python threading library because of the Global Interpreter Lock (GIL). The GIL is a mechanism in the CPython interpreter that allows only one thread to execute Python bytecode at a time; as a result, we could not fully utilize the processing resources of a multi-core server. Instead, we implement each thread as a separate process. A process is an instance of a program that has its own memory and processing resources and runs independently. We use the native Python multiprocessing library for this.
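A minimal sketch of this process-per-worker scheme is shown below. The worker body is a placeholder for the echo protocol and local query processing logic; the function and variable names are illustrative, not taken from the actual source.

```python
# Minimal sketch of process-based query processing workers: each worker is a
# separate OS process with its own queue, so the CPython GIL never serializes
# query processing across workers.
import multiprocessing as mp

def _worker(inbox, outbox):
    # Consume messages until the None shutdown sentinel arrives.
    for message in iter(inbox.get, None):
        # Placeholder for echo protocol / local query processing.
        outbox.put(message["invocation_id"])

def run_workers(messages, num_workers=2):
    """Spawn workers, feed them messages round-robin, collect results."""
    outbox = mp.Queue()
    inboxes = [mp.Queue() for _ in range(num_workers)]
    procs = [mp.Process(target=_worker, args=(q, outbox)) for q in inboxes]
    for p in procs:
        p.start()
    for i, msg in enumerate(messages):
        inboxes[i % num_workers].put(msg)
    results = {outbox.get() for _ in messages}
    for q in inboxes:
        q.put(None)  # ask each worker to exit
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(run_workers([{"invocation_id": "q1"}, {"invocation_id": "q2"}]))
```

Because each mp.Process has its own interpreter and memory, CPU-bound matching and ranking in one worker does not block the others, which is exactly the property the threading library cannot provide under the GIL.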

As part of a network search node, there is one process for the local database component, one process for all other components except the query processing threads, and a configurable number of processes for the query processing threads.

5.4 Code Readability

We include comments in the Python code that describe the modules, functions, and variables of the software. These comments help developers understand the implementation details without reading through the code itself. The comments are written so that "pydoc", an automatic documentation tool from the native Python libraries, can instantly produce a software implementation document for developers.
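As a sketch of this convention, the docstring style below is what pydoc renders directly (for example via `python -m pydoc <module>`). The module and function names here are illustrative, not taken from the actual source.

```python
"""topology_manager: maintains the neighbor table used by the echo protocol.

This module-level docstring, like the function docstring below, is picked up
and rendered by pydoc without any extra tooling.
"""

def neighbors(node_id, topology):
    """Return the list of neighbor ids of node_id in the static topology.

    Args:
        node_id: identifier of the local search node.
        topology: dict mapping a node id to a list of neighbor ids.
    """
    return topology.get(node_id, [])
```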


Chapter 6

Performance Evaluation of the Network Search Prototype

6.1 Testbed

The testbed used to evaluate our implementation consists of a server cluster that provides an IaaS cloud platform. The platform contains nine high-performance servers connected by a Gigabit Ethernet switch. These servers run the OpenStack cloud management software [20]. One of the servers runs as an OpenStack Controller Node, which manages the cloud; the rest run as OpenStack Compute Nodes, which host virtual machines. All servers have an identical specification: two 12-core AMD Opteron processors, 64 GB RAM, two 500 GB hard disks, and Ubuntu 12.04.1 LTS with Linux kernel 3.2. OpenStack version Folsom is used along with the KVM hypervisor [12]. The cluster runs academic experiments and simulations.

Each server in the cloud hosts a search node. The database component of each search node maintains approximately 5,000 objects, which include objects of type server, virtual machine, process, network interface, and IP-flow. Most objects have 15-25 attributes. For the experiments, all objects are synthetically created from actual search objects populated by the sensing module.

6.2 Load

We produce synthetic loads for the performance evaluations as follows. The query load consists of a collection of search queries with 2-5 tokens. Each token can be an attribute, a value, or an attribute-value pair. The tokens are picked such that each token matches 2 to 300 objects in the search space. Queries are invoked using a Poisson arrival process [40] and are distributed to all search nodes. During each run of the experiment, each node processes the same number of queries, which is equal to the total number of queries invoked to the search plane during the run. A small program on each server injects the queries at a rate that sums up to a global query invocation rate ('global' refers to all search nodes in the search plane, while 'local' refers to a single search node).
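The query injector above can be sketched as follows. A Poisson arrival process of rate lam queries/second has exponentially distributed inter-arrival gaps, which random.expovariate samples directly; the query construction and the token vocabulary are illustrative stand-ins.

```python
# Sketch of a Poisson query injector: exponential inter-arrival gaps give a
# Poisson arrival process of the chosen rate. Query construction is a stub.
import random

def interarrival_times(lam, n, seed=None):
    """Return n inter-arrival gaps (seconds) for a Poisson process of rate lam."""
    rng = random.Random(seed)
    return [rng.expovariate(lam) for _ in range(n)]

def make_query(rng, vocabulary):
    """Build a query of 2-5 tokens drawn from a token vocabulary (stub)."""
    return rng.sample(vocabulary, rng.randint(2, 5))
```

An injector would sleep for each sampled gap and then submit the next query through the management plane interface, so that the per-server rates sum to the desired global invocation rate.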

A write load is injected locally into the database of a search node. The write load consists of object insertions, object deletions, and updates of object attributes that are performed


uniformly at random. The write load is synthetically created and injected using a Poisson arrival process. During the experiments, we maintain the ratio between the global query load and the local write load of each search node at 3:1, which reflects a read-intensive load.
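One step of this write load can be sketched as follows. The in-memory store and the object generator are stand-ins for the actual MongoDB-backed component; choosing insertions and deletions with equal probability keeps the expected number of stored objects constant, as required by the experiment configuration.

```python
# Sketch of one synthetic write: an insertion, a deletion, or an attribute
# update chosen uniformly at random against a dict standing in for MongoDB.
import random

def apply_write(store, rng, make_object):
    """Apply one random write operation to store (a dict of id -> object)."""
    op = rng.choice(["insert", "delete", "update"])
    if op != "insert" and not store:
        op = "insert"  # nothing to delete or update yet
    if op == "insert":
        oid, obj = make_object()
        store[oid] = obj
    elif op == "delete":
        store.pop(rng.choice(list(store)))
    else:  # update one attribute of a randomly chosen object
        obj = store[rng.choice(list(store))]
        obj[rng.choice(list(obj))] = "updated"
    return op
```

In the experiments, successive calls to such a function would be spaced by exponentially distributed gaps to realize the Poisson write process.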

6.3 Setup

During the experiments, the cloud servers are underutilized, which is characterized by 10-20% CPU utilization. The CPU load is mainly created by virtual machines that run lightweight simulations.

The network graph of search nodes is statically configured. The graph has small-world properties [52], where each search node has two to four neighbors. Note that a spanning tree created by the echo protocol is dynamic, i.e., it may differ from one query invocation to another.

For all experiments, we use the approximate matching function based on equations 2.4 and 2.5, with p = 2. The ranking function limits the number of search results returned to 200 objects.

Due to the controlled write load, the sensing and indexing modules are not involved in the experiments. The overhead of both modules is estimated at 2-3% of CPU utilization.

Figure 6.1: A topology of search nodes in the testbed

6.4 Metrics

The performance metrics that we consider are global query latency, local CPU utilization, and local latency. The global query latency is measured from the time a query is invoked at a search node until the result of the query is returned from that node. The CPU utilization measures the CPU usage, in percent, of a search node on the server; it indicates the CPU overhead of running a search node on the server.

The local latency measures the time spent processing a query on a search node. It comprises five phases of operation: message transmission latency, waiting time, matching and ranking latency, database access latency, and result aggregation latency.


The message transmission latency is the time an echo protocol message takes to travel between two search nodes. The waiting time is the time an echo protocol message, which carries either a query or query results, waits in a queue before being processed. The matching and ranking latency is the time during which a query is matched against local objects and the matched objects are ranked. The database access latency is the time taken to retrieve term indexes and locally stored objects. Finally, the result aggregation latency is the time during which local query results are aggregated with the results obtained from echo messages.

6.5 Experiment Configuration

Before any experiment, the local database in a search node is initialized with 5,000 synthetic objects. During the experiments, the query load and the write load are injected. The write load is applied in such a manner that the average number of objects in the local database remains constant over time. During each run, the system warms up for 30 seconds and then runs for 120 seconds, during which measurements are taken. The query load ranges from 5 queries/second to 800 queries/second. The number of query processing threads, each of which runs as a separate process (on a separate CPU core), ranges from one to four.

6.6 Results

We evaluate the performance of the system using the metrics defined in Section 6.4, as follows.

6.6.1 Test 1: Global Query Latency

We consider a set of experiments that measures the global query latencies for different query loads. Each search node runs two query processing threads. Figure 6.2 shows the measured global query latencies as box plots marking the 25th, 50th, 75th, and 95th percentiles for each query load. The load ranges from 10 to 100 queries/second in 10-queries/second steps, and from 100 to 450 queries/second in 50-queries/second steps. Note that the vertical axis, which presents the query latencies, uses a logarithmic scale. As the 50th percentile values illustrate, the query latency increases with the query load, which is expected. For loads below 100 queries/second, there is no significant change in the distribution of query latencies, and the 75th percentile latencies range between 19 and 28 milliseconds. For loads above 100 queries/second and up to 400 queries/second, the query latencies remain small, as demonstrated by the 75th percentile latencies, which stay below 100 milliseconds.

6.6.2 Test 2: Local Computational Overhead

We investigate the computational overhead of a search node based on the same measurements as in Figure 6.2. Figure 6.3 shows the average CPU usage of a search node, in percent, for different query loads. As expected, the CPU usage increases linearly with the query load. The figure illustrates that our prototype uses less than 1% of the CPU while supporting up to 60 queries/second.


Figure 6.2: Global latencies for different query loads. Each measurement is shown as a box plot with markers at the 25th, 50th, 75th, and 95th percentiles. Each search node runs two query processing threads.

Figure 6.3: Computational overhead of a search node for different query loads. Each search node runs two query processing threads.

6.6.3 Test 3: Effect of Concurrency

We investigate the effect of different query loads on the global query latencies with respect to different numbers of local query processing threads. Each thread runs on a dedicated CPU core. Figure 6.4 illustrates four series of such measurements, represented by four curves, each associated with a different number of concurrent threads. Each curve shows the 50th percentile query latencies. The curve for two concurrent threads corresponds to the measurements in Figure 6.2. As expected, for all curves the query latency increases with the query load. However, after a certain query load, the latencies start to soar, which indicates that the processing cores assigned to the query processing threads in a search node have reached their capacity. Additionally, we observe that, at any given query latency, the query load that the search nodes can handle


increases with the number of threads.

Figure 6.4: The 50th percentile of global query latencies for different query loads. The curves show results for 1-4 concurrent query processing threads.

Additionally, we investigate the computational overhead of a search node, illustrated in Figure 6.5. The figure shows the average CPU usage of a search node, in percent, for different query loads. The curves are based on the same measurements as Figures 6.2, 6.3, and 6.4. Each curve starts as a straight line and reaches a plateau after a certain query load. We observe that the turning point of each curve corresponds to the surge in global query latencies (Figure 6.4). This indicates that the local query processing threads of a search node are unable to utilize more CPU resources; as a result, fast global query responses cannot be ensured at higher loads due to this capacity limit.

Recall that the server that runs a search node has 24 processor cores. By utilizing one processor core, a search node can consume up to 4.17% of the total CPU capacity. The experiments show that controlling the number of query processing threads is an effective way to limit CPU usage.

Figure 6.5: Computational overhead of a search node for different query loads and for 1-4 concurrent query processing threads.


6.6.4 Test 4: Local Latencies

We investigate the local latencies in a search node while it processes a query. Figure 6.6 illustrates the average time spent on each phase per query for eight different query loads. During these measurements, two query processing threads are used. Note that the vertical axis, which presents the latencies, uses a logarithmic scale. We observe that the average latencies of all phases remain almost constant as the query load increases, except for the waiting time, which grows exponentially with the query load. This is unexpected: since the query load increases linearly, the waiting time should also increase linearly. Although the execution of one query generates multiple echo protocol messages, which also wait in the processing queue, this only accounts for a linear increase, not an exponential one. Understanding this aberrant behaviour requires further investigation, for instance, a detailed study of message timestamps in the different stages of local processing, e.g., waiting time, database access, etc.

We also compare the latencies of each phase of operation, as shown in Figure 6.7. We observe that the waiting time contributes the most to the query latency, more than 80%, compared to all other operations.

Figure 6.6: Bar charts showing the average time spent on each phase of operation in a search node with respect to the query load. Each search node runs two query processing threads.

6.6.5 Test 5: Impact of Cluster Load on Global Latency

We study the impact of the load in the cloud on the global query latency. We consider two extreme cases: a light load (underutilized) and a very heavy load (highly utilized). The first case corresponds to all previous measurements; in the second case, the cloud servers are highly utilized, characterized by approximately 90% CPU utilization. For the heavy-load case, we run virtual machines executing CPU-intensive applications on the cloud servers. Note that each search node runs one local query processing thread. Figure 6.8 shows the 50th percentile global query latencies for both cases over different query loads. We observe that the global query latencies measured in the highly utilized cloud are slightly higher than those in the underutilized cloud, due to contention for CPU resources. Up to


Figure 6.7: Pie charts showing the percentage of time spent on each operation: (a) at a load of 100 queries/second and (b) at a load of 200 queries/second. Each search node runs two query processing threads.

200 queries/second, the difference between the 50th percentile query latencies of the high-load and low-load cases is less than 20 milliseconds for each query load.

Figure 6.8: The 50th percentile global latencies for different query loads when the cloud is underutilized and when it is highly utilized.

6.7 Estimating the Global Query Latency for a Large Datacenter

The key performance characteristics of our prototype derive from the performance characteristics of the echo protocol (see Section 2.4) [48]. The echo protocol guarantees that only two messages are exchanged per link during an execution, and the number of messages processed by a search node equals the number of its neighbors in the graph of search nodes. The execution time of a query, which we refer to as the global query latency, grows proportionally with the height of the spanning tree.

Here we develop a model to estimate the global query latency of a network search system, using the measurements of local query processing and the performance properties of the echo protocol. Our model considers the case where the network search system processes one query at a time. It is inspired by the model developed in [42].


The echo protocol creates a spanning tree while distributing a query among the search nodes. The results of the query are aggregated along the tree. The global query latency therefore depends on the height of the tree and the time to process the query on each search node. The latter depends on the execution time of the local query processing, the database access, and the transmission time of echo protocol messages between different levels of the tree.

Consider a datacenter with 100,000 identical servers, each of which runs a search node. All servers run independently and are connected via a stable Gigabit Ethernet network. Furthermore, we assume that the overlay network is configured such that the spanning tree created by the echo protocol is a height-balanced binary tree, i.e., each node has at most two child nodes and the heights of both children's subtrees differ by at most 1. Therefore, the height d of the tree is upper-bounded by log2(100,000) + 1 ≈ 17.61, which rounds up to 18 (d ≤ 18).
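The height bound can be checked with a one-line computation following the formula in the text:

```python
# Height bound for a height-balanced binary tree over n nodes, following the
# text's formula: d <= ceil(log2(n) + 1). For n = 100,000 this gives 18.
import math

def max_tree_height(n):
    return math.ceil(math.log2(n) + 1)

print(max_tree_height(100000))  # 18
```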

Let T1,i denote the time it takes to process a query at level i of the spanning tree in the expansion phase of the echo protocol. Let T2,i denote the time it takes to aggregate the results at level i of the spanning tree in the contraction phase of the echo protocol. Furthermore, let tr,i denote the time to transmit echo protocol messages between levels i and i + 1 of the spanning tree. Finally, d denotes the height of the spanning tree.

Let G denote the average global query latency. If we denote the averages of T1,i, T2,i, and tr,i by T̄1, T̄2, and t̄r respectively, the average global query latency is given by

G ≈ T̄1 + (d − 1)T̄2 + 2(d − 1)t̄r

Note that T1 includes the durations of the matching, ranking, and object retrieval (database access) functions; T2 includes the duration of a partial-result aggregation; and tr is the time to transmit an explorer or echo message.

From the experiments, the average duration of the matching and ranking functions is 0.61 milliseconds, the average database access time is 1.65 milliseconds, the average duration of the aggregation function is 1.27 milliseconds, and the average time to transmit explorer and echo messages between levels is 1.71 milliseconds. From above, d ≤ 18. We estimate the expected global query latency G when querying a network of 100,000 nodes as:

G ≈ (0.61 + 1.65) + 17(1.27) + 34(1.71) = 81.99 milliseconds
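As a sanity check, this estimate can be reproduced directly from the model and the measured averages quoted above:

```python
# Reproduces the model estimate G ≈ T1 + (d-1)*T2 + 2*(d-1)*tr with d = 18
# and the measured averages from the experiments (all times in milliseconds).
def global_latency(t_match_rank, t_db, t_aggregate, t_transmit, height):
    t1 = t_match_rank + t_db  # expansion-phase processing at a node
    return t1 + (height - 1) * t_aggregate + 2 * (height - 1) * t_transmit

G = global_latency(0.61, 1.65, 1.27, 1.71, 18)
print(round(G, 2))  # 81.99
```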

6.8 Discussion

From the results of Test 1 and Test 2, we estimate that our prototype can support a query load of up to 100 queries/second with the 95th percentile latency below 100 milliseconds and a server CPU utilization of less than 1.6%. Note that the sensing and indexing modules are not used in the experiments; these modules would add 2-3% of CPU utilization. Test 3 suggests that the prototype's CPU consumption can be capped by adjusting the number of query processing threads, each of which runs on a separate core, that


are allocated to a search node. The more threads, the more CPU resources are utilized and, as a result, the higher the query load the system can support with low latency. Furthermore, from the results of Test 4, we learn that the local latencies do not change with the query load, except for the waiting time, which increases exponentially with the load. Moreover, the results of Test 5 show that our prototype operates with similar performance even when the cloud is highly utilized. Finally, the model for estimating the global latency suggests that the expected global query latency is below 100 milliseconds when search nodes are deployed on a cloud with 100,000 nodes.


Chapter 7

Limitations of the Current Design

We designed and implemented a minimal yet functionally complete network search node. However, there is room for improvement. We discuss some limitations of our design and implementation as follows.

Firstly, the local query processing, where the matching and ranking functions take place, can be improved. The functions make use of the metrics tf and idf. The metric idf requires global knowledge of documents (or search objects in our case) across all nodes in order to be computed. Hence, we would need an additional protocol to share this metric information between search nodes. At the moment, such a protocol is not included in our node design, and we therefore assign a constant value (of 1) to the metric idf. For the same reason, the ranking metric connectivity of objects in terms of link relations is not included. Additionally, other ranking metrics, such as freshness of information, are not used in the implementation.

Secondly, the topology manager of a network search node is statically configured. Even though this approach works for our small testbed, it will not work for a large-scale deployment. In that case, we require a dynamic, self-configuring topology manager to efficiently deploy search nodes in large networked systems with at least a hundred thousand devices.

Thirdly, at the moment, the database component performs well when there are more reads than writes. This may not be realistic if there are many real-time updates.

Fourthly, the number of threads for concurrent query processing is pre-configured and static. Ideally, the number of threads spawned in a search node would be adjusted dynamically based on the current query load. Additionally, the number of queries executed per thread is not balanced, due to the lack of a query routing scheme that can balance the query load across the threads.

Lastly, the numerical model that estimates the global query latency in a large datacenter considers only the case where a query is processed sequentially in the network search system. The effect of concurrent query processing is not included in the model.

45


7.1 Future Work

The issues discussed above require further investigation. We suggest possible ways to overcome these limitations as follows.

We suggest including a protocol for sharing information on matching and ranking metrics, such as idf and the connectivity of objects, across search nodes. We plan to do this through a distributed aggregation protocol, such as GAP [51]. We also want to implement other ranking metrics, such as freshness. Further, we want to explore new matching and ranking options, including subnet address matching.

We suggest developing and implementing a topology manager for the echo protocol that supports a dynamic, self-adjusting network overlay of search nodes, so that the overlay adapts to changes in the network topology of a large-scale network. T-Man [41], a distributed protocol for creating topologies, can be used for this purpose.

We suggest implementing a database component that can handle many writes/updates of search objects while supporting a high query load. In-memory databases, such as Memcached [14], can support high write loads; however, they lack a high-level query language that can support the query language for network search in a straightforward way. Further, it is difficult to achieve high rates of both reads and writes in a database at the same time, so the tradeoffs need to be studied. We plan to investigate this area in the future.

A balanced load among query processing threads leads to high throughput. We suggest including a load balancing component in the query dispatching module to achieve a balanced number of queries per thread.

We suggest investigating and developing an analytical model for estimating the global query latency in a large datacenter that captures the concurrency in query processing, for example through a more sophisticated model such as a queuing model.


Chapter 8

Conclusions

This thesis project contributes the design and implementation of a network search node, the key component in the network search architecture. The search node is designed and implemented as a software package that runs on servers of an IaaS cloud platform; it senses and maintains data locally and processes search queries in a distributed manner. The performance of this system has been evaluated on a cloud testbed of nine high-performance servers. The prototype achieves a 95th percentile query response latency below 100 milliseconds for query loads up to 100 queries/second, at a 1.6% CPU overhead. Our numerical model further suggests that the network search system can be expected to work well at large scale, since the estimated response time for a sequentially processed query is below 100 milliseconds for a networked system of 100,000 network elements.

We have identified several limitations of our design and implementation. For example, our implementation of the matching and ranking modules for network search does not consider several metrics that are part of the design. Further, some components of the query processing module are statically configured, which does not scale to a large-scale deployment. These limitations need to be addressed to make the system more adaptive to dynamic changes.

8.1 Personal Experiences

This thesis was a valuable learning experience, both in the technologies involved and in how to conduct a long research project. I started by profiling an existing network search prototype in terms of CPU usage and query latency in order to understand the bottlenecks of the previous system. I then improved and resolved the bottlenecks one by one through incremental development of the prototype. Additionally, I organized the source code to make it more manageable over time. The most difficult parts of the thesis were understanding the previous prototype and learning to write efficient Python code in a short amount of time. The most enjoyable part was writing code; it was thrilling to see the prototype running as efficiently as expected. However, if I had to do the whole project again, I would make a concrete development plan before actually starting to improve the prototype. The project had no plan and was improved iteratively, so I lost track along the way, which made it difficult to schedule my time efficiently.


Appendix A

A Complete Class Diagram of a Network Search Node


Bibliography

[1] Amazon DynamoDB - Amazon Web Services. http://aws.amazon.com/dynamodb/. [Online; accessed 25-July-2013].

[2] Amazon EC2 - Amazon Web Services. http://aws.amazon.com/ec2/. [Online; accessed 25-July-2013].

[3] Apache CouchDB. http://couchdb.apache.org/. [Online; accessed 25-July-2013].

[4] BSON - Binary JSON. http://bsonspec.org/. [Online; accessed 5-August-2013].

[5] C-Store: A Column-Oriented DBMS. http://db.lcs.mit.edu/projects/cstore/. [Online; accessed 27-July-2013].

[6] Couchbase | Document-Oriented NoSQL Database. http://www.couchbase.com/. [Online; accessed 25-July-2013].

[7] DB-Engines Ranking. http://db-engines.com/en/ranking. [Online; accessed 24-August-2013].

[8] HP Cloud Services. https://www.hpcloud.com/. [Online; accessed 25-July-2013].

[9] HP Vertica. http://www.vertica.com/. [Online; accessed 25-July-2013].

[10] Java programming language. http://www.oracle.com/technetwork/java/javase/overview/index.html. [Online; accessed 5-August-2013].

[11] JSON (JavaScript Object Notation). http://json.org/. [Online; accessed 27-July-2013].

[12] KVM: Main Page. http://www.linux-kvm.org/. [Online; accessed 25-July-2013].

[13] libvirt: The virtualization API. http://libvirt.org/. [Online; accessed 25-July-2013].

[14] Memcached - a distributed memory object caching system. http://memcached.org/.[Online; accessed 7-October-2013].

[15] Microsoft Hyper-V Server 2012. http://www.microsoft.com/en-us/server-cloud/hyper-v-server/default.aspx. [Online; accessed 25-July-2013].

[16] MongoDB. http://www.mongodb.org/. [Online; accessed 25-July-2013].

[17] Munin. http://munin-monitoring.org/. [Online; accessed 25-July-2013].

51

Page 62: Design and Implementation of a Network Search Node696730/FULLTEXT01.pdf · Design and Implementation of a Network Search Node THANAKORN SUEVERACHAI Master's Degree Project Stockholm,

52 BIBLIOGRAPHY

[18] Nagios - The Industry Standard in IT Infrastructure Monitoring. http://www.nagios.org/. [Online; accessed 25-July-2013].

[19] ØMQ (ZeroMQ). http://zeromq.org/. [Online; accessed 10-August-2013].

[20] OpenStack Open Source Cloud Computing Software. http://www.openstack.org/.[Online; accessed 25-July-2013].

[21] PyMongo 2.6 Documentation. http://api.mongodb.org/python/current/. [Online;accessed 26-August-2013].

[22] Python Programming Language - Official Website. http://www.python.org/. [Online;accessed 5-August-2013].

[23] Riak | Basho. http://basho.com/riak/. [Online; accessed 25-July-2013].

[24] Scapy. http://www.secdev.org/projects/scapy/. [Online; accessed 29-August-2013].

[25] The MongoDB Manual. http://docs.mongodb.org. [Online; accessed 26-August-2013].

[26] The Rackspace Cloud. http://www.rackspace.com/cloud/. [Online; accessed 25-July-2013].

[27] VMware ESXi and ESX Info Center. http://www.vmware.com/products/vsphere/esxi-and-esx/index.html. [Online; accessed 25-July-2013].

[28] XenServer | Open Source Server Virtualization. http://www.xenserver.org/. [On-line; accessed 25-July-2013].

[29] Zabbix :: An Enterprise-Class Open Source Distributed Monitoring Solution. http://www.zabbix.com/. [Online; accessed 25-July-2013].

[30] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D Joseph, Randy Katz,Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and MateiZaharia. Above the Clouds: A Berkeley View of Cloud Computing. Technical report,University of California at Berkeley, February 2009.

[31] J. W. Backus, J. H. Wegstein, A. van Wijngaarden, M. Woodger, P. Nauer, F. L.Bauer, J. Green, C. Katz, J. McCarthy, A. J. Perlis, H. Rutishauser, K. Samelson, andB. Vauquois. Revised report on the algorithm language ALGOL 60. Communicationsof the ACM, 6(1):1–17, January 1963.

[32] M. Banahan, D. Brady, and M. Doran. The C Book. Addison-Wesley., 2nd edition,1991.

[33] Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau, andJohn Cowan. Extensible Markup Language (XML) 1.1 (Second Edition). Technicalreport, W3C, 2006.

[34] Steven Burke. VMware Continues Virtualization Market Romp.

[35] Thomas M. Connolly and Carolyn E. Begg. Database systems : a practical approachto design, implementation, and management. Addison Wesley, fourth edition, 2005.

Page 63: Design and Implementation of a Network Search Node696730/FULLTEXT01.pdf · Design and Implementation of a Network Search Node THANAKORN SUEVERACHAI Master's Degree Project Stockholm,

BIBLIOGRAPHY 53

[36] George P. Copeland and Setrag N. Khoshafian. A decomposition storage model. InProceedings of the 1985 ACM SIGMOD international conference on Management ofdata - SIGMOD ’85, pages 268–279, New York, New York, USA, 1985. ACM Press.

[37] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Stein, and Clifford. B-Tree. In Introduction to Algorithms, chapter 18, page 1292. MIT press, 3rd edition,2009.

[38] Leslie L. Daigle, Dirk-Willem VanGulik, and Patrik Faltstrom. Request for Comments:3406: Uniform Resource Names (URN) Namespace Definition Mechanisms.

[39] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns:Elements of Reusable Object-Oriented Software. Addison-Wesley, 1994.

[40] Donald Gross, John F. Shortle, James M. Thompson, and Carl M. Harris. Fun-damentals of Queueing Theory (Wiley Series in Probability and Statistics). Wiley-Interscience, 4th edition, 2008.

[41] Márk Jelasity, Alberto Montresor, and Ozalp Babaoglu. T-man: Gossip-based fastoverlay topology construction. Comput. Netw., 53(13):2321–2339, August 2009.

[42] K.-S. Lim and R Stadler. Real-time views of network traffic using decentralized man-agement. In Integrated Network Management, 2005. IM 2005. 2005 9th IFIP/IEEEInternational Symposium on, pages 119–132, 2005.

[43] GS Manku, M Bawa, and P Raghavan. Symphony: Distributed Hashing in a SmallWorld. USENIX Symposium on Internet . . . , 2003.

[44] Behzad Montazeri Martin Hitz. Measuring Coupling and Cohesion In Object-OrientedSystems. 1995.

[45] Brett D. McLaughlin, Gary Pollice, and Dave West. Head First Object-Oriented Anal-ysis and Design: A Brain Friendly Guide to OOA&D (Head First). O’Reilly Media,Inc., 2006.

[46] Maged Michael, Jose E. Moreira, Doron Shiloach, and Robert W. Wisniewski. Scale-upx Scale-out: A Case Study using Nutch/Lucene. In 2007 IEEE International Paralleland Distributed Processing Symposium, pages 1–8. IEEE, 2007.

[47] Sebastian Michel, Peter Triantafillou, and Gerhard Weikum. MINERVA: A ScalableEfficient Peer-to-Peer Search Engine. In Gustavo Alonso, editor, Middleware 2005,volume 3790 of Lecture Notes in Computer Science, pages 60–81. Springer Berlin Hei-delberg, 2005.

[48] Alexander Clemm Misbah Uddin Rolf Stadler. Scalable Matching and Ranking forNetwork Search. May 2013.

[49] Gerard Salton, Edward A. Fox, and Harry Wu. Extended Boolean information retrieval.Communications of the ACM, 26(11):1022–1036, November 1983.

[50] Amy Skinner. A System for Googling Operational Data in Large Clouds. Master’sthesis, KTH, School of Information and Communication Technology (ICT), 2012.

[51] Rolf Stadler. Protocols for Distributed Management. Technical Report 2012:028, KTHCommunication Networks, Stockholm, 2012.

Page 64: Design and Implementation of a Network Search Node696730/FULLTEXT01.pdf · Design and Implementation of a Network Search Node THANAKORN SUEVERACHAI Master's Degree Project Stockholm,

54 BIBLIOGRAPHY

[52] Milgram Stanley. The small-world problem. Psychology Today, 1(1):61–67, 1967.

[53] M Uddin, R Stadler, and A Clemm. Management by network search. In NetworkOperations and Management Symposium (NOMS), 2012 IEEE, pages 146–154, 2012.

[54] Misbah Uddin, Rolf Stadler, and Alexander Clemm. A Query Language for NetworkSearch, 2013.

[55] Guido van Rossum, Barry Warsaw, and Nick Coghlan. Style Guide for Python Code,2001.

[56] Werner Vogels. Eventually consistent. Commun. ACM, 52(1):40–44, January 2009.

[57] Mike Wawrzoniak, Larry Peterson, and Timothy Roscoe. Sophia: an Information Planefor networked systems. SIGCOMM Comput. Commun. Rev., 34(1):15–20, January2004.

[58] Tingxin Yan, Deepak Ganesan, and R Manmatha. Distributed image search in camerasensor networks. In Proceedings of the 6th ACM conference on Embedded networksensor systems, SenSys ’08, pages 155–168, New York, NY, USA, 2008. ACM.


Recommended