DEGREE PROJECT IN COMPUTER ENGINEERING, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2016
Increasing the Throughput of a Node.js Application Running on the Heroku Cloud App Platform
NIKLAS ANDERSSON
ALEKSANDR CHERNOV
KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY
Abstract
The purpose of this thesis was to investigate whether utilization of the Node.js
Cluster module within a web application in an environment with limited resources
(the Heroku Cloud App Platform) could lead to an increase in throughput of the
application and, in the case of an increase, how substantial it was.
This has been done by load testing an example application with and without the module. In both scenarios, the traffic sent to the application varied from 10 requests/second to 100 requests/second. For the tests conducted on the application utilizing the module, the number of worker processes used within the application varied between 1 and 16.
Furthermore, the tests were first conducted in a local environment in order to
establish any increases in throughput in a stable environment, and, in case there
were notable differences in throughput of the application, the same tests were
conducted on the Heroku Cloud App Platform. Each test was also aimed towards
testing one of two different types of tasks performed by the application: I/O or CPU
bound.
From the test results, it could be derived that utilization of the Cluster module did not lead to any increase in throughput when the application was doing I/O bound tasks, in either environment. However, when doing CPU bound tasks, it led to a ≥20% increase when the traffic sent to the application in the local environment was 10 requests/second or higher. The same increase could only be seen when the traffic sent to the application was 50 requests/second or higher in the Heroku environment.
The conclusion was, thus, that utilization of the module would be useful for the company at which this thesis was conducted, in case an application deployed on Heroku was exposed to higher traffic.
Keywords
Throughput, Node.js, Heroku, Performance, Increasing
Abstract (in Swedish, translated)
The purpose of this degree project was to investigate whether use of the Node.js Cluster module in a web application in an environment with limited resources (the Heroku cloud app platform) could lead to an increase in the application's throughput and, if an increase occurred, how large it was.
This was done by load testing an example application with and without the module. In both scenarios, the traffic sent to the application varied between 10 and 100 requests/second. For the tests conducted on the application using the module, the number of worker processes varied between 1 and 16.
Furthermore, the tests were first conducted in the local environment with the goal of establishing any possible throughput increase in a stable environment; if there were notable differences in the application's throughput, the same tests would also be carried out on the Heroku cloud app platform. Each test also aimed to test one of two types of tasks performed by the application: I/O or CPU bound.
From the test results it could be concluded that the Cluster module did not lead to any increase in throughput when the application performed I/O bound tasks, in either environment. When the application instead performed CPU bound tasks, the module led to an increase of ≥20% when the traffic was 10 requests/second or higher. In the Heroku environment, the same increase could be seen only when the traffic reached 50 requests/second or higher.
The conclusion was therefore that use of the module would be useful for the company at which the work was carried out, if an application deployed on Heroku was exposed to what was considered higher traffic.
Keywords
Throughput, Node.js, Heroku, Performance, Increase
Table of Contents
Abstract (in English)
Abstract (in Swedish)
Table of Contents
1 Introduction ………… 5
  1.1 Background ………… 5
    1.1.1 Increasing Throughput ………… 6
    1.1.2 Node.js ………… 6
    1.1.3 The Heroku Cloud App Platform ………… 6
    1.1.4 Web Applications ………… 7
  1.2 Problem ………… 7
  1.3 Research Questions ………… 7
  1.4 Purpose ………… 8
  1.5 Delimitations ………… 8
  1.6 Disposition ………… 9
2 Theoretical Background ………… 10
  2.1 The Company Platform ………… 10
  2.2 Heroku Dyno ………… 11
  2.3 I/O vs. CPU bound ………… 12
  2.4 The Inner Workings of Node.js ………… 13
  2.5 Increasing Throughput in Node.js Using the Cluster Module ………… 14
  2.6 Related Work ………… 15
3 Research Process ………… 17
  3.1 Research Methodology ………… 17
  3.2 Process Overview ………… 18
    3.2.1 Problem Definition ………… 18
    3.2.2 Data Collection ………… 19
    3.2.3 Design & Implementation ………… 20
    3.2.4 Defining the Testing Environments ………… 20
    3.2.5 Creating Test Plan ………… 20
    3.2.6 Results and Analysis ………… 20
    3.2.7 Evaluation ………… 21
  3.3 Hypotheses ………… 21
4 Analysis: How to Increase Throughput ………… 22
  4.1 Our Approach ………… 22
    4.1.1 Different Implementations of the Cluster Module ………… 22
    4.1.2 Clustering Method Chosen When Creating the Application Template ………… 23
  4.2 The Application Template ………… 23
    4.2.1 CPU Usage ………… 25
    4.2.2 Workload ………… 26
    4.2.3 Memory Usage ………… 27
  4.3 Test Application ………… 27
5 Analysis: Benchmarking the Test Application ………… 28
  5.1 Testing Environment ………… 28
    5.1.1 Local Environment ………… 29
    5.1.2 Heroku Environment ………… 29
    5.1.3 The Test Application's Memory Usage ………… 29
  5.2 Testing Tools ………… 30
    5.2.1 Apache JMeter ………… 31
    5.2.2 Heroku Metrics ………… 32
  5.3 Creating the Test Plan ………… 33
  5.4 Local Tests ………… 33
    5.4.1 I/O Bound ………… 34
    5.4.2 CPU Bound ………… 35
  5.5 Heroku Tests ………… 36
    5.5.1 Throughput Rates ………… 37
    5.5.2 Memory Usage ………… 39
    5.5.3 Median Response Times ………… 40
    5.5.4 Analysis of Heroku Test Results ………… 41
6 Discussion ………… 43
  6.1 Our Methodology and Consequences of the Study ………… 43
  6.2 Discussion and Conclusions ………… 44
    6.2.1 Recommendations Concerning the Application Template ………… 45
  6.3 Ethics ………… 46
  6.4 Sustainability ………… 46
  6.5 Future Work ………… 47
References
Appendix 1 Heroku Dyno CPU Information ………… 52
Appendix 2 The Test Application ………… 58
Appendix 3 The Application Template ………… 60
Appendix 4 The Local Server CPU Specifications ………… 61
Appendix 5 Results from I/O Bound Tests in Local Environment ………… 63
Appendix 6 Results from CPU Bound Tests in Local Environment ………… 65
Appendix 7 Results from CPU Bound Tests on Heroku ………… 67
1 Introduction
Today, virtually every company with a presence on the Internet collects data concerning their customers in some form [1]. With a large collection of customer profiles it is possible to gather information on the customer's geographical area, what products the customer has viewed, what devices the customer is using, etc. With this data, customer communication can be improved, marketing can be optimized (through a more well-targeted informational flow), and all customer information can be stored in one single virtual space.
Data can come from different sources: web analytics tools, login processes, email, etc. It may also be necessary to collect data from different physical nodes; the data might be located in different data warehouses, and can even be administered by different third-party companies.
For a large company the collected data may grow very large, and there might be many daily transactions. It is therefore important that these transactions are consistent, that data is preserved, and that the application can handle as much traffic as possible. One way of making sure the application can do this is to ensure that it can handle as many requests per time unit as possible. This enables the application to serve more clients, thus lowering the risk of a client not receiving the requested data.
1.1 Background
Innometrics, the company at which the project took place, is active within the area just described. Their product helps other companies personalize their marketing strategies by collecting data from a customer's different data warehouses and creating a customer profile out of this data.
They were in need of increasing the throughput of Node.js applications used for communication between their system and other systems. These applications were deployed on an external cloud platform (the Heroku Cloud App Platform or Amazon Web Services), and were thus restricted by each platform's individual specifications.
1.1.1 Increasing Throughput
Throughput is a measurement used for describing the number of requests per time
unit handled by any given web service or application. One of the ways of increasing
throughput is by making the application more concurrent, that is – to make it
process more requests simultaneously [2].
This can be achieved by adding extra hardware resources or by maximizing
utilization of the available resources.
1.1.2 Node.js
Node.js is a runtime environment based on the programming language JavaScript – a programming language best known as the scripting language for web pages. A runtime environment deals with a variety of issues, such as the layout and allocation of storage locations for the objects specified in the source code, and the mechanisms used by the target program to access and pass variables [3].
Node.js ships with a collection of modules, which basically encapsulate related code,
as in Java or any other programming language with a set of standard libraries. Also,
new modules can be installed, managed and published through the Node Package
Manager to provide further functionality. A more detailed specification of Node.js is
given in chapter 2.
1.1.3 The Heroku Cloud App Platform
In order to describe what cloud computing is, Eric Griffith states in his article [4]: “In the simplest terms, cloud computing means storing and accessing data and programs over the Internet instead of your computer’s hard drive”.
Heroku belongs to a type of cloud computing known as Platform as a Service (PaaS) [5]. This type of service removes the need for organizations to manage the underlying infrastructure (usually hardware and operating systems) and allows users to focus on the deployment and management of their applications [6].
The Heroku platform allows users to deploy and execute applications isolated from one another. It provides functionality such as a database management system and application monitoring. The platform's execution environment also enables the user to write applications in several different programming languages, such as Node.js, Ruby, Java and PHP.
1.1.4 Web Applications
An application is a stored set of instructions that directs a computer to do some specific task [7].
Web applications are distributed client-server applications in which a web browser provides the user interface [8]. The client browser and the server side exchange protocol messages represented as HTTP requests and responses. In the case of cloud computing, web applications no longer reside on a single server; instead, they reside on a cloud platform.
1.2 Problem
During periods of high traffic towards a web application, it is essential that the system can handle the increased demand for service. Exposing an inefficient web application to high traffic can cause individual requests not to receive their corresponding responses. It can also lead to response times – the total time from when a user makes a request until they receive a response – being longer than desired.
In order to fulfill the need of service to as many clients as possible, it is important
that the web application can provide a large throughput.
1.3 Research Questions
The main questions of this thesis are the following:
How can the throughput of a Node.js application, running on the Heroku
Platform, be increased by taking advantage of the available system resources?
In case of an increased throughput, how substantial will it be?
1.4 Purpose
The application’s performance is limited by the cloud platform where the application
is installed. The purpose of this thesis is to show how to increase throughput of the
company’s applications running on the Heroku Platform.
The intention is to develop a generic application template in Node.js that can be used when creating new applications within the company's Node.js application platform. Applications that utilize this template should be installable on the Heroku Cloud App Platform. The template should increase the throughput of each individual application, and thus increase the performance of the system as a whole.
Although the application was primarily aimed at the Heroku platform, there should
be a possibility to migrate it to other existing cloud platforms. Therefore, the solution
should be as general as possible.
Furthermore, we are to implement functionality that takes full advantage of the
available system resources in order to increase the number of requests handled by
the application per time unit. This is to be done without adding any additional
hardware resources.
Best practices for increasing the throughput of a Node.js application deployed on the Heroku Cloud App Platform, without adding hardware resources, will be investigated and evaluated. Hopefully, this will lead to an increase in throughput of each individual application on Innometrics' Node.js application platform.
1.5 Delimitations
This thesis will focus solely on increasing the throughput by taking advantage of the available system resources. Also, it is only concerned with the
increase of throughput in an application running on the Heroku Platform – not on an
arbitrary cloud platform. Furthermore, we were limited to using only the free
account level of Heroku (specifications of machines on this level are given in chapter
2).
1.6 Disposition
The thesis is outlined as follows. Firstly, a theoretical background is presented, giving a brief insight into the specific technologies needed to understand the approach to the problem and the thesis results. Node.js, the Heroku environment, and increasing throughput in Node.js specifically are discussed here in more detail.
After that, the research process is treated. The chapter starts by describing our information-gathering process, and continues with a review of existing literature, a description of our research methodology, and the requirements specification.
The next chapter describes how the template for the applications is created. This
chapter is then followed by a chapter devoted to the tests. Here, the testing
environment is described and the results from the tests are evaluated.
Lastly, in the Discussion chapter, we reflect on our methods and results, future work, and the topics of ethics and sustainability within the area.
2 Theoretical Background
This chapter will give a deepened insight into the more theoretical parts of the problem area that are essential in order to understand the problem and its solution. It will describe the Innometrics system, the Heroku dyno, how Node.js works in more detail, how to increase throughput within the runtime environment using the Cluster module, and related work done within the area.
2.1 The Company Platform
As mentioned in section 1.1, the customer's (the company buying Innometrics' product) data warehouse or their system for tracking and managing existing or potential customers (Customer Relationship Management system, or CRM system) [9] is connected to the company platform.
With the data retrieved from the customer’s data warehouse or CRM system,
Innometrics initially puts together a profile for each of the customer’s clients (the
visitors to the customer’s website), which is then stored in Innometrics’ own data
warehouse. The Innometrics system will then continually add data to this profile
containing information on any website interaction that the client in question has
made towards the customer’s website. The website interaction to listen for is
specified by the customer through the Innometrics system.
All client interaction that has been specified to listen for is logged in an event stream in the form of data objects known as events. An event is, in turn, a collection
of data containing information on an action that has been taken by a client on the
customer website. For example, as a client clicks a banner or a link, an event could be
generated containing information on which banner or link that was clicked, the time
when the click was made, etc.
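To make the description concrete, an event for a banner click might look roughly as follows; the field names are illustrative assumptions for this sketch, not Innometrics' actual schema:

```javascript
// A hypothetical event object for a banner click (all field names assumed):
const event = {
  definitionId: 'banner-click',          // which interaction was listened for
  createdAt: '2016-05-12T10:31:00Z',     // when the click was made
  data: {
    bannerId: 'spring-sale',             // which banner or link was clicked
    href: 'https://example.com/sale'     // the link target
  }
};

console.log(event.definitionId);
```

An object of this shape would be appended to the client's profile before the profile is sent onwards to the application.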
In order to enable Innometrics to retrieve resources from third-party sources, Node.js applications (deployed on a cloud platform) are used. Each application has been set to listen for one or several events. In case any of these events is triggered by a visitor on the customer website (e.g. by the client clicking on a link), the Innometrics system sends a request containing the client's profile (with the event added to it) to the application.
An example of this type of communication is shown in figure 2.1.1. As the client visits
a website, an event is generated by the Innometrics system containing information
on the IP of the client. A request containing the client profile is then sent by the
Innometrics system to the application. The application extracts the IP address contained in the event data of the profile just received with the request, and sends it onwards to an IP-lookup service, retrieving further information on the IP address in question. Then, as the response is received from the lookup service, the application saves this data to Innometrics' own data warehouse.
Figure 2.1.1: A flow chart describing an example case of communication between different actors as an
event is triggered on a customer website.
2.2 Heroku Dyno
Each application on the Heroku platform is running on a dyno. Each dyno is a
lightweight Linux container that runs a single command provided by the user. A
dyno can run any command available in its environment like restart, stop, scale, etc.
According to Heroku’s official documentation [10], containerization is a virtualization technology that allows multiple isolated operating system containers to run on a shared host. All dynos are isolated from one another for security purposes.
Dynos on the free account level are limited to 512 MB of RAM [11]. Concerning the CPU specifications, Heroku has (for unknown reasons) decided not to reveal these to the user, but by accessing the application's shell environment it was clear that the dyno resided on a machine that had access to one physical unit consisting of 4 cores with 8 hardware threads (see Appendix 1). However, it seemed [10] that the dyno's access to these resources varied depending on the number of other dynos currently active on the shared host.
A hardware thread is one of two execution threads per core that execute simultaneously in order to hide latencies when retrieving data from memory caches on the CPU, and is implemented by Intel Hyper-Threading Technology [12].
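The shell-based inspection mentioned above can be sketched as follows; this assumes the Heroku CLI is installed and an application is deployed (the exact commands used for Appendix 1 are not given in the thesis):

```shell
# From a developer machine, open a one-off shell on a dyno
# (requires the Heroku CLI and a deployed app):
#
#   heroku run bash
#
# Inside the dyno, the CPU information can then be read from /proc:
nproc                                 # number of logical CPUs visible
grep -c ^processor /proc/cpuinfo      # processor count from the raw CPU info
grep -m1 'model name' /proc/cpuinfo   # the CPU model string
```

Note that the counts reported inside a container reflect what the host exposes to it, which is one reason the dyno's effective CPU share can vary.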
2.3 I/O vs. CPU bound
Tasks performed by an application or a system can be I/O or CPU bound.
An I/O bound (I/O is shorthand for Input/Output) task performs operations associated with I/O communication. Examples of I/O communication are HTTP requests, database operations, and disk reads and writes [13].
CPU bound tasks are mainly performed by the CPU. In this case the CPU spends its
time mostly on computing. Examples of these types of tasks are calculating a hash,
searching for an item, and performing mathematical calculations.
Figure 2.3.1: A CPU (a) vs. I/O bound (b) application
An application can also be either CPU or I/O bound. In the case of a CPU bound
application, a majority of the tasks done within the application are CPU bound. In
the case of an I/O bound application it is the other way around – a majority of the
tasks are I/O bound. Both types of applications are depicted in figure 2.3.1. Here it
can be seen how the CPU bound application (application ‘a’) spends more time doing
calculations, and less time handling I/O. It can also be seen, in application ‘b’, how
an I/O bound application spends its time doing the opposite – more time waiting for
I/O, and less time doing calculations [14].
2.4 The Inner Workings of Node.js
One of the main strengths of Node.js is its method for treating I/O calls. This is largely because I/O calls are handled by background threads, while the main thread of the application, known as the event loop, can treat and process any other requests sent to the application. Figure 2.4.1 gives a detailed overview of the inner workings of Node.js.
Figure 2.4.1: A Node.js instance with its event loop and thread pool
The Node.js runtime runs on a single core [15] and contains an event queue, which stores a list of events, each consisting of a name describing the event and a callback function [16] (a function to be run after the initial function has finished its execution). An example of an event is when an HTTP request is sent to the server. This request is placed in the event queue. The event loop starts by picking up an event containing an I/O call that is to be executed from the queue, and then delegates the job to the operating system via an internal thread pool [17]. The thread that receives the job then executes the function associated with the event without blocking the event loop, while the event loop continues treating the next event in the queue.
After the thread in the internal thread pool has finished its execution, the callback function is again placed in the event queue. The callback function is later retrieved from the queue and processed by the event loop. If another event occurs, a new event is placed in the event queue, and the procedure is repeated. This way the event loop can handle all incoming requests asynchronously, in a non-blocking way.
However, Node.js is not as good at handling CPU intensive tasks [18]. When Node.js performs a CPU intensive task, all other requests are held up, because the event loop runs on a single thread and the CPU is occupied with working on this thread. One of the strategies to handle this problem is to use the Cluster module [13].
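A minimal sketch of this blocking behavior (with arbitrary durations): a timer due after 10 ms cannot fire until a synchronous busy loop has returned control to the event loop.

```javascript
// Demonstrates that a synchronous CPU-bound loop blocks the event loop:
// the timer below is due after 10 ms, but its callback cannot run until
// the busy loop has released the (single) main thread.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // occupy the CPU; nothing else can run on the event loop meanwhile
  }
}

const scheduled = Date.now();
setTimeout(() => {
  // fires only after busyWait has finished, i.e. after roughly 100 ms
  console.log('timer fired after ' + (Date.now() - scheduled) + ' ms');
}, 10);

busyWait(100); // runs synchronously, holding the event loop
```

In a web server, the held-up callback would instead be another client's request, which is exactly the throughput problem the Cluster module addresses.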
2.5 Increasing Throughput in Node.js Using the Cluster Module
In order to improve Node.js's ability to treat CPU intensive tasks, worker processes can be forked. That is, the main process of the application is duplicated into new processes referred to as worker processes [19]. The main process is then referred to as the master process. This functionality is provided by the Cluster module, which is part of the standard library in Node.js [15].
When forking new processes, all new connections are first received by the master process and then handed over to an available worker. Which worker gets the connection is decided through a round-robin approach – which essentially means that the next available worker gets it [20].
Best practice is to bind each worker to its own logical CPU core, which lets the application process each request using more of the CPU's capacity, thus increasing its effectiveness and throughput [21].
This essentially means that each Node.js instance (figure 2.4.1) is replicated into its own server instance, where each instance – known as a worker process – listens on the same socket. Here, the master process works as a load balancer, receiving all incoming connections and distributing them among the worker processes [15]. The resulting architecture of the application when implementing the Cluster module is depicted in figure 2.5.1.
Figure 2.5.1: The desired application architecture for this thesis, with each worker representing the
Node.js instance depicted in figure 2.4.1
2.6 Related Work
The Node.js platform is still rather new and evolving rapidly. Because of that, it is not easy to find articles that are still up to date. Some relevant articles are reviewed in this section.
The article “Optimizing Node.js Application Concurrency”, provided on Heroku's official website, explains how to regulate the number of worker processes [22]. It also recommends creating worker processes and binding each of them to its own logical CPU core, thus making the application take full advantage of the available system resources. One interesting point they make is that each app has unique memory, CPU and I/O requirements, so no single solution fits every app. However, they do not provide any benchmark results.
Rowan Manning describes in his blog how to implement the Cluster module [23]. He also states that creating multiple processes for a Node.js application can dramatically improve the amount of load the application can handle. He provides some simple benchmarking to illustrate the improvement. The app is installed on a local machine without involving the Heroku platform, and the benchmarked function performs CPU bound tasks.
Neil Kandalgaonkar argues that “Node.js can be a great choice for computation heavy services” [24]. He clarifies that it can be suitable for some occasional CPU bound tasks – not too many, nor too heavy, however. The Heroku platform is mentioned in the article as well, but because the application tested in the article was too big (~200 MB), it was not possible to perform thorough tests on that platform. He names the Cluster module as one of the possible solutions.
3 Research Process
This chapter will describe our research process. It will provide a description of the
methodology used in solving the problem, give an overview of the overall process,
and lay down the hypotheses for this thesis.
3.1 Research Methodology
Since the thesis consisted of two separate research questions, two different research
strategies were used.
In order to answer the first research question, how to increase the throughput of the
application, practices on how to create a server in Node.js were investigated. The
solution was determined through a combination of quantitative and qualitative methods, where a form of applied research based on existing theories and research [25] was used to create a test application, which could then be evaluated by answering the second research question. If the results from the second question showed a substantial increase (≳20%), the answer to the first question would be considered positive. If not, the first question would need to be re-evaluated based on another existing theory.
When answering the second research question, how substantial the increase in
throughput would be, two different methods were followed. Experimental research was conducted, forming a foundation for this thesis, by comparing different test results with one changing variable per test. In our case, these variables were represented by 1) the load sent to the application during the test, 2) the number of workers used by the application in the test, and 3) the environment the test was conducted in (local or Heroku).
The hypotheses could also be predefined for the outcome of the comparison, and
thereby, a method of the analytical kind was also used [25]. Thus, the methodology used for answering the second research question was a combination of two research
methods: experimental and analytical.
3.2 Process Overview
The methods listed in this section are described in order to give an understanding of how this thesis was structured to achieve the goal and answer the research questions defined in chapter 1. The overall research process is illustrated in figure 3.2.1, and is described in detail below.
Figure 3.2.1: The research process
3.2.1 Problem Definition
This was the phase where the problem was defined from the requirement specification received from the company.
3.2.2 Data Collection
Data collection concerned two different types of data: primary data and secondary data.
Primary data is most generally described as data collected directly from the information source, and is most often retrieved through interviews, observations and discussions with members of the company [26].
Secondary data, in turn, is typically gathered by persons not involved in the current research. The sources of this kind of data can be technical and statistical records, newspaper articles, etc. [26].
The primary data that the qualitative part of this thesis relies on mainly consists of a task overview given by Innometrics' supervisor of this thesis, and of informal interviews with employees of the company.
The overview given by the company consisted of recommendations on what modules
to use for the thesis – partly modules used by the company daily when designing
applications for the platform, and partly modules that could contribute to this thesis.
Recommendations on what tools might be used when performing the tests were also given.
The informal interviews with employees yielded recommendations on how to set up the remote environment, and information on the average traffic that the Innometrics system is exposed to.
This primary data was then complemented by document studies in the form of
company documentation on the platform, technical reports, and articles on the
subject. Such materials can give a better and deeper understanding of the subject.
The primary data that the quantitative part of this thesis relies on mainly consists of
test results obtained from tests conducted in order to answer the second research
question on how substantial the increase in throughput was (in case there was an
increase).
3.2.3 Design & Implementation
In this phase, a test application is to be designed and implemented based on a known method for increasing throughput in Node.js. The initial design of the test application is the result of the primary data obtained through the qualitative methods just described, and it defines the architecture and functionality of the application.
3.2.4 Defining the Testing Environments
In this phase, the specifications for the machines of both the local and the Heroku testing environments were laid down.
3.2.5 Creating Test Plan
During this phase, focus lay on creating a test plan that included tests for both the local and the Heroku testing environment. The test plan was to be designed to test the application's throughput in the case of both I/O and CPU bound tasks, different rates of traffic sent to the application, and different numbers of worker processes for each traffic rate and type of task.
We had been informed of the structure of the requests being sent to the application, and by reusing that structure we only needed to adapt the request's body to contain data relevant for the test application. The body data relevant for this thesis was simply a string, used to determine which function to call (I/O or CPU bound).
3.2.6 Results and Analysis
This phase consisted of two iterations: one for local tests, and one for Heroku tests.
In both iterations, the test application was benchmarked, and the results of the
benchmark were then analyzed. The results were presented in the form of tables and
graphs. Increases in throughput were expressed in terms of a percentage increase
between each test.
The analysis consisted of a type of formative evaluation, where concentration lay on
examining and changing processes as they occur. The last iteration was evaluated:
if it had provided positive results, the process would continue to a final evaluation
of the solution; if it had provided negative results, a new iteration would be
initiated.
3.2.7 Evaluation
The evaluation of the solution to this thesis was to have a summative approach,
providing an overall description of the application’s performance increases. Whether
the objectives of the thesis had been fulfilled was to be described, along with the
future direction of the product. Here, a secondary analysis was also to be given,
reexamining existing data to address new questions or apply methods not previously
employed.
3.3 Hypotheses
Our hypotheses were that the results would show that the test application would have
performance increases in the areas where Node.js is usually weak. In other words:
when benchmarking one and the same Node.js application with and without our
application template, a performance increase, in the form of a higher throughput,
should be apparent when doing CPU heavy tasks, such as calculating a hash or doing
other arithmetic calculations. When doing I/O bound tasks, however, the throughput
should remain unchanged.
4 Analysis: How to Increase Throughput
This chapter will provide the answer to the first research question of this thesis: how
can the throughput of a Node.js application, running on the Heroku platform, be
increased by taking advantage of the available system resources? The answer was
obtained by using the qualitative methods described in section 3.2.2. The chapter
will also provide a description of the test application used in this thesis.
The application was to consist partly of the throughput increasing template that was
to be the product of this thesis, and partly of functionality for testing two different
aspects of the test application in its current environment – its capabilities of
fulfilling I/O and CPU bound tasks.
4.1 Our approach
When analyzing the data retrieved during the data collection phase, we found that
there were few ways to increase the throughput of a Node.js application.
The main method for increasing throughput in Node.js is to create multiple
processes for the application, thus utilizing more of the available system resources.
This is known as clustering the application, and is most commonly implemented
using the Cluster module described in section 2.5.
4.1.1 Different Implementations of the Cluster Module
There are several alternatives for implementing worker processes in Node.js [27].
One of these is to simply use the standard Cluster module, which is included in the
Node.js standard library and provides the most basic mechanisms for implementing
worker processes. More about the implementation of the Cluster module for this
thesis can be found in section 4.2.
There is also the alternative of using the Throng module [28], which Heroku uses in
its own example on how to cluster. This module is also implemented on top of the
Cluster module. It is advertised as “a simple worker manager for clustered Node.js
apps”: it hides large parts of the master/worker logic when clustering the
application, in order to make things easier for the developer. Instead, the developer
mainly has to focus on setting the number of workers, configuring the master
process, etc.
Another alternative is PM2, a program which is also implemented on top of the
Cluster module. It is similar to the Throng module in hiding large parts of the
master/worker logic from the developer, but does so to an even larger extent. It also
provides the application with some additional functionality, such as real-time
process management [29] (e.g. adding workers), basic system monitoring, log
aggregation, etc. [30].
Lastly, there is the alternative of using the StrongLoop Cluster Management Tool
[27], which is also based on the Cluster module and provides essentially the same
functionality as PM2, with some smaller differences (such as profiling).
4.1.2 Clustering Method Chosen When Creating the Application
Template
When it came to this thesis, it was found that the standard Cluster module was the
most appropriate way to implement clustering in the application.
When looking at the alternatives, they either tended to hide large parts of the cluster
related code from the developer (Throng, PM2, and StrongLoop), or to offer
functionality not relevant for this thesis – which might have led to a larger memory
usage (higher memory allocation) for the processes. They are also all built on top of
the Cluster module, and it also seemed easier to adapt the standard Cluster module
to different cloud platforms compared to the other alternatives [31].
While the other alternatives, with their additional functionality, might be useful in a
live scenario, they were not appropriate for this study, where the goal was to evaluate
the effects of clustering at the most basic level.
4.2 The Application Template
When creating the template, we relied on the official Node.js documentation, and the
description of the Cluster module in particular, on how to create and cluster a web
application. This led to a template realizing the server model described in section
2.5 (and depicted in figure 2.5.1).
When developing the template, it was important to keep the master process as light
as possible, by keeping the allocated memory for the process at a minimum and not
including any server related code, or any other code that was not relevant to its task
of managing the workers. The reason for this was to optimize memory usage on the
cloud platform: since it was the workers that did the request handling, it was
important that they had as much memory available as possible.
In order to change the number of worker processes dynamically for each application
instance, an environment variable that could be set via the command line was used.
On Heroku, this variable held the number of workers appropriate for the number of
dynos used for the application.
According to the official Node.js Cluster documentation, the default strategy when
creating worker processes in an application is to use the worker processes as
request handlers (receiving and treating requests), and to use the master process for
creating workers and handing sockets to them through interprocess communication
(IPC) – a mechanism for sharing data among multiple processes [4].
Figure 4.2.1: The master related code of the template
This works in the following way: as the application instance is started, the master
creates a number of workers equal to the value of the environment variable
mentioned earlier (see figure 4.2.1, rows 25–27). Rows 29–31 show how a new
worker is generated by the master in case a worker dies (the process somehow
shuts down).
Figure 4.2.2: The worker related code of the template
In figure 4.2.2, the worker related code of the template is shown. The worker starts
by instantiating the Express framework (row 34) – a Node.js framework for creating
web applications, providing the process with the necessary server functionality.
Rows 39–41 show how each worker process listens to the same port.
The code for handling each request sent to the application is shown on rows 44–51.
As a request is sent to the application, the request is treated and a response is
generated in the callback of this method.
The complete template of this thesis can be found in Appendix 3.
4.2.1 CPU Usage
As mentioned earlier, a Node.js application has a single-threaded event loop,
utilizing only one of the available CPU cores. To increase throughput using only the
available system resources, the application should specify how many worker
processes are to be created.
As mentioned in section 2.5, best practice in determining how many worker
processes should be created for a particular application is to base the number on the
number of cores available to the system. That way, each process is bound to a single
logical core. The desired CPU usage can be seen in figure 4.2.1.1.
Figure 4.2.1.1: Regular vs. desired Node.js CPU usage
Using the Cluster module, this can easily be implemented on a physical machine,
where the exact specifications of the machine are known. When it comes to a cloud
platform, however, not much information is revealed about the container
specifications. A single Heroku dyno shares access to system resources with other
dynos, and the performance of a single dyno can vary depending on the total
load on the underlying machine. Therefore, according to Heroku’s article
“Optimizing Node.js Application Concurrency” [21], clustering more than one worker
on a standard single dyno may hurt, rather than help, performance. This was one of
the things to be considered when performing the tests.
4.2.2 Workload
Analyzing the information received from observations and recommendations, it was
clear that the application should be able to handle different amounts of simultaneous
requests. The customers that use the application are of different kinds – they can be
either large or small companies. Thus, the application should take those differences
into consideration, i.e. it should be able to handle both larger and smaller numbers
of clients. Therefore, finding the right balance of workers is very important.
4.2.3 Memory Usage
Applications can differ in memory usage. Some applications in need of larger
memory allocation (≳200 Mb for a single application) might suffer from
implementing worker processes on a single Heroku dyno (due to exceeding the
memory limit). Exceeding the memory limit could lead to the application not
performing desirably, with requests timing out (not receiving responses). Therefore,
when clustering an application the memory usage of each process has to be kept in
mind – the application’s overall memory usage must not exceed the dyno’s memory
limit.
4.3 Test Application
A template was created, and by that the first research question was partly answered.
The next step was to verify whether the template would give the desired increase in
throughput on the Heroku platform or not. In order to do that, a test application was
to be created and the second research question answered.
The application had to provide means for testing its capabilities of doing different
tasks. From discussions with people at the company, it was discovered that the
system sometimes calculates hashes when creating new profiles. Therefore, the test
application needed to provide the ability to run two different types of tasks: CPU and
I/O bound. The CPU bound function calculated a hash, while the I/O bound task
simulated an I/O call by doing a timeout of 300 ms where the application simply was
waiting without blocking the event loop. The function to run was parsed from the
HTTP request that the application received. In each of the two functions, an
appropriate response was generated. The same test application (see Appendix 2) was
used in both local and Heroku tests.
5 Analysis: Benchmarking the Test Application
This section will focus on describing the testing environment, the test plan, and the
results obtained from the tests, in a local environment and on Heroku. It will provide
an analysis of the results, in order to answer the second research question of this
thesis on how substantial the increase in throughput of the application can be when
clustering functionality has been added.
5.1 Testing Environment
Analyzing the information gathered during the data collection phase when
attempting to answer the first research question, we came to the conclusion that it
was necessary to define both a local and a Heroku testing environment.
Heroku themselves state [21] that an application might suffer from being clustered
when running on a free account. The tests were thus first conducted on the test
application locally, with the goal of acquiring the expected results in a stable
environment. Tests were then conducted on the same application, but instead
installed on Heroku, with the expectation of obtaining similar results.
In both testing environments we used six different versions of our application – one
without clustering functionality, and five with clustering functionality, each with a
given number of workers available to the application instance (1, 2, 4, 8 or 16). The
version without clustering functionality was needed in order to confirm that the
added functionality would not affect the performance of the application.
For both environments the same machine was used as client. The specifications of
the client machine were:
MacBook Air (13-inch, Mid 2013)
CPU: 1.7 GHz Intel Core i7
Memory: 8 GB 1600 MHz DDR3
OS: Mac OS X El Capitan, Version 10.11.4
100/10 Mbit/s Ethernet Connection
5.1.1 Local Environment
The local testing environment consisted of two machines: one client (with the
specifications given above), and one server with the following specifications:
MacBook Pro (13-inch, Mid 2012)
CPU: 2.5 GHz Intel Core i5
Memory: 4 GB 1600 MHz DDR3
OS: Mac OS X El Capitan, Version 10.11.5
100/10 Mbit/s Ethernet Connection
Through a terminal command in Mac OS X, the specifications for the Intel Core i5
CPU could be retrieved (see Appendix 4). Here, it could be seen that the CPU had
access to 2 cores and 4 hardware threads. This later became a determining factor
when deciding on the most appropriate number of worker processes to use when
running a local server.
5.1.2 Heroku Environment
Summarizing the specifications given for the Heroku dyno in section 2.2:
CPU: a varying share, depending on how many other dynos are currently
active on the shared host
Memory: 512 Mb
Due to having significantly less memory available in the Heroku environment
compared to the local one, and due to the fact that a dyno had a varying share of the
CPU, it was necessary to establish the results locally first. We thought that if the
expected results from the hypotheses (i.e. getting a throughput increase only for CPU
bound tasks) could be obtained in a local environment, it would be worth testing on
Heroku as well. If not, the expected results would definitely not be obtained on the
lower performing machines that we had in our Heroku environment.
5.1.3 The Test Application’s Memory Usage
By monitoring the application’s memory usage locally through the Mac OS X
terminal command “top”, we could see that it used 20 Mb without any requests being
sent to it. When requesting the application to run CPU bound tasks, its memory
usage could climb up to 85 Mb, but averaged around 65 Mb. When requesting the
application to run I/O bound tasks, its memory usage could climb up to around
80 Mb, but averaged around 60 Mb.
Since we, on Heroku, had a memory quota of 512 Mb, we had now been given an
equation for calculating the appropriate number of workers for the application. By
having 512 Mb in total memory available, and the application having a max memory
usage of around 85 Mb when it was performing CPU bound tasks (the most memory
demanding task), the most appropriate number of workers would be around 512 Mb
/ 85 Mb ≈ 6 workers. Considering that the master process also would need some
memory allocated, the appropriate number of workers would most likely be slightly
below 6.
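The estimate above can be expressed as a one-line calculation. This is only a sketch of the back-of-the-envelope reasoning in the text; the 512 Mb and 85 Mb figures come from the measurements in this section, and the function name is an assumption.

```javascript
// Worker count estimate: total dyno memory divided by the peak memory
// observed per worker, rounded down.
function maxWorkers(dynoMemoryMb, peakWorkerMb) {
  return Math.floor(dynoMemoryMb / peakWorkerMb);
}

console.log(maxWorkers(512, 85)); // 6, matching the estimate in the text
```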
Among our different versions of the application, we could thereby predict that the
one having 4 workers would produce the best results by giving an increased
throughput, while not exceeding the memory limit of the dyno (and still leave a
margin to it). The application utilizing 4 workers would have a memory quota of 512
Mb / 4 workers = 128 Mb available for each worker (minus master process memory
usage). This meant that when the application was exposed to high traffic ordering it
to perform CPU bound tasks, each worker would still have a memory quota of
128 − 85 = 43 Mb available, which should be considered a good margin, without
leaving a significant amount of unused memory on the dyno.
Summarizing, it is important that the memory usage of the application’s processes
does not exceed the available memory of the Heroku dyno, and that it, ideally, lies
a good margin below this value – but not too far below, because then a large amount
of memory would go unused. The problem, concerning the Heroku environment,
had thus become memory related as well (not only CPU related).
5.2 Testing Tools
This section will describe testing tools used when running tests locally and on
Heroku.
5.2.1 Apache JMeter
JMeter is a Java application designed to load test functional behavior and measure
performance. It provides means for simulating a heavy load on a server, groups of
servers, or network, to test its strength or to analyze the performance under different
load types. It has the ability to load and performance test many different
server/protocol types: HTTP/HTTPS, FTP, TCP etc.
Figure 5.2.1.1: Example properties of a thread group
With each testing plan, the user creates a thread group, specifying a thread number,
a ramp-up period, and a loop count. The thread number specifies how many threads
are to be started at the beginning of each ramp-up period (specified in seconds),
and the loop count specifies how many times this procedure should be repeated. In
figure 5.2.1.1, there is an example of the properties that can be set for a thread group.
Here, 10 threads are being initiated each second, and this is looped 320 times.
Figure 5.2.1.2: An example of the properties of an HTTP Request Sampler
Within each thread group, in turn, there are several elements that can be included.
For example, in our case it was relevant to include an HTTP Request Sampler – an
object that contains information on an HTTP request that is to be sent with each
thread in the thread group. Figure 5.2.1.2 shows an example of properties set for an
HTTP Request Sampler that sends requests to port number 8887 on IP 192.168.1.104.
The body data can also be set; this is, however, something we could not show
without risking infringement of company policy.
There is also the possibility of generating aggregated reports. This type of report
forms the basis for presenting the results of the tests performed in the local
environment.
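How JMeter was launched is not stated in the thesis; a typical non-GUI invocation of a test plan like the one described above would be the following (the file names are hypothetical):

```shell
# -n runs JMeter in non-GUI mode, -t points to the test plan (.jmx),
# -l writes the sample results used for the aggregated report.
jmeter -n -t testplan.jmx -l results.jtl
```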
5.2.2 Heroku Metrics
When running the tests on Heroku, we used JMeter for sending the requests, but not
for measuring the application’s performance. This was due to JMeter having a
different measurement of throughput, based on the number of samples divided by
the total time of the test. This meant that the time for the request to travel from the
client to the server, and for the response to travel back to the client, was included in
the measurement as well. This was an
acceptable measurement in the local environment, since the distance between client
and host was small. Now, however, with the application deployed on an external
host, we had to take into consideration that there might be a significant distance
between the client and the host. Therefore, it was decided that the just mentioned
transport times were something that should not be a part of the application’s
performance evaluation.
In order to measure the application’s performance as fairly as possible, it was
important to take the measurements as close to the application as possible. This
could be done by relying
on the metrics tool which Heroku has made available for developers. The tool
consisted of a collection of graphs including the same units of measurement for the
application as those retrieved from the JMeter reports used in the previous section –
namely throughput, average and median response times, and error rates.
5.3 Creating the Test Plan
The testing procedures of the application followed a pattern where, for each type of
task (I/O or CPU bound), the number of requests sent to the application was
gradually increased – in order to evaluate how well the application performed
different tasks at different traffic rates.
The load rate for each test varied between 10 and 100 requests per second. Rates of
10–25 requests per second were to simulate low traffic, 25–50 medium traffic, and
50–100 high traffic. For all tests, 15000 samples were sent to the application.
Figure 5.3.1: Example of six thread groups each containing one HTTP Request Sampler
In order to test each application sequentially, there was one thread group (see figure
5.3.1) for each number of workers available to each application. Within each thread
group, we had specified an HTTP Request Sampler (described in section 5.2.1)
sending requests to that application's specific endpoint. As mentioned earlier,
regardless of whether the application was running locally or remotely on Heroku,
we set up the same samplers in each of the thread groups – only changing the names
of the thread groups and the host URL for the samplers.
5.4 Local Tests
This section describes the evaluation of the application’s capabilities in performing
I/O and CPU bound tasks in the local environment. Focus was laid on differences in
throughput between tests, but average and median response times will also be noted
and analyzed.
5.4.1 I/O Bound
Starting with the I/O bound tests and sending in 10 requests per second (see figure
5.4.1.1), it was noted that the results were similar between the thread groups – no
matter the number of worker processes used. Both the average and median response
times are close to the same for each of the thread groups. The throughput (the
number of requests handled by the application per second) is also similar between
the thread groups.
Label # Samples Average (ms) Median (ms) Error % Throughput (rps)
10/s With Cluster, 2 workers 15000 317 308 0.00% 30.8
10/s With Cluster, 4 workers 15000 308 309 0.00% 31.7
10/s With Cluster, 16 workers 15000 307 308 0.00% 31.8
10/s Without Cluster 15000 307 307 0.00% 31.9
10/s With Cluster, 8 workers 15000 307 308 0.00% 31.9
10/s With Cluster, 1 workers 15000 307 307 0.00% 31.9
Label # Samples Average (ms) Median (ms) Error % Throughput (rps)
100/s With Cluster, 16 workers 15000 306 305 0.00% 309.3
100/s With Cluster, 4 workers 15000 305 305 0.00% 310.3
100/s Without Cluster 15000 305 305 0.00% 310.8
100/s With Cluster, 8 workers 15000 305 305 0.00% 310.9
100/s With Cluster, 2 workers 15000 306 305 0.00% 311.5
100/s With Cluster, 1 workers 15000 306 305 0.00% 312.0
Figure 5.4.1.1: Results from local I/O bound tests at 10 and 100 requests per second (sorted by
throughput)
Looking at the results from the test where 10 requests were sent per second, there is
barely a difference between the version without clustering and the ones utilizing it.
The results from the other test (100 requests per second) were the same. The
difference in throughput between the thread group without clustering and the
highest performing thread group with clustering was ~0.4%, which is not a
significant difference (and does not pass the bar of 20%).
To summarize, when the application was performing I/O bound tasks in a local
environment, the test results did not show a significant difference in terms of
throughput, and none of the results passed the bar of a 20% increase in throughput.
In accordance with the hypotheses (described in section 3.3) and the research
process of this thesis (section 3.2.6) – only moving on to Heroku with tests that
showed an increase in the local environment – no increase in throughput could be
established for the application performing I/O bound tasks.
All of the results obtained from testing the application’s capabilities of performing
I/O bound tasks in the local environment can be found in Appendix 5.
5.4.2 CPU Bound
As can be seen in figure 5.4.2.1, when simulating low traffic (10 requests per second)
the results obtained showed a significant difference between the thread groups.
When comparing the thread group without clustering to the highest performing one
with clustering (4 workers), there was a difference of ~24.4%. A decrease of 25% in
average and median response times between the two thread groups could also be
noted.
Label # Samples Average (ms) Median (ms) Error % Throughput (rps)
10/s With Cluster, 1 workers 15000 6 6 0.00% 787.5
10/s Without Cluster 15000 4 4 0.00% 827.6
10/s With Cluster, 16 workers 15000 4 3 0.00% 892.9
10/s With Cluster, 8 workers 15000 3 3 0.00% 972.3
10/s With Cluster, 2 workers 15000 3 3 0.00% 994.6
10/s With Cluster, 4 workers 15000 3 3 0.00% 1029.9
Label # Samples Average (ms) Median (ms) Error % Throughput (rps)
50/s With Cluster, 1 workers 15000 51 53 0.00% 815.5
50/s Without Cluster 15000 37 39 0.00% 954.5
50/s With Cluster, 2 workers 15000 25 27 0.00% 1211.9
50/s With Cluster, 16 workers 15000 15 15 0.00% 1269.7
50/s With Cluster, 4 workers 15000 16 15 0.00% 1320.3
50/s With Cluster, 8 workers 15000 13 13 0.00% 1355.9
Figure 5.4.2.1: The results of the CPU bound tests at 10 and 50 requests per second (sorted by
throughput)
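The percentage differences quoted in this chapter can be reproduced as the relative change between the non-clustered baseline and the clustered throughput. A sketch, using the 10 rps CPU bound figures from the table above:

```javascript
// Relative throughput increase, in percent, of the clustered application
// over the non-clustered baseline.
function pctIncrease(baseRps, clusteredRps) {
  return ((clusteredRps - baseRps) / baseRps) * 100;
}

// 10 rps, CPU bound: 827.6 rps without clustering vs 1029.9 rps with
// 4 workers gives the ~24.4% quoted in the text.
console.log(pctIncrease(827.6, 1029.9).toFixed(1)); // "24.4"
```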
Moving on to the test where requests were being sent to the application at 50
requests per second, the results obtained showed an increase in throughput of
~42.1% when comparing the application without clustering to the highest performing
clustered application (8 workers). In this case, the decrease in average response time
between the two thread groups was ~64.9%, and in terms of median response time
the decrease was ~61.5%.
Label # Samples Average (ms) Median (ms) Error % Throughput (rps)
100/s With Cluster, 1 workers 15000 107 110 0.00% 793.6
100/s Without Cluster 15000 82 86 0.00% 926.8
100/s With Cluster, 16 workers 15000 19 18 0.00% 1151.8
100/s With Cluster, 2 workers 15000 63 67 0.00% 1208.1
100/s With Cluster, 4 workers 15000 49 52 0.00% 1278.3
100/s With Cluster, 8 workers 15000 39 44 0.00% 1326.5
Figure 5.4.2.2: The results of the CPU bound tests at 100 requests per second (sorted by throughput)
Looking at the test where requests were being sent in at 100 requests per second (see
figure 5.4.2.2), a difference of ~43.1% in terms of increased throughput could be
noted between the non-clustered and the best performing clustered application
(8 workers). In this case, a decrease of ~52.4% in average and ~48.8% in median
response times could also be seen. Lastly, it was noted that the application with
1 worker performed ~4.8–14.4% lower than the application not implementing
clustering.
In conclusion, the results obtained from testing the application’s capabilities of
performing CPU bound tasks locally showed increases in throughput higher than the
bar of 20%. Because the results showed this increase, CPU bound tests were to be
conducted in the Heroku environment as well. When utilizing only 1 worker,
however, the throughput was ~4.8–14.4% lower compared to when not utilizing
clustering at all. The full results of the CPU bound tests can be seen in Appendix 6.
5.5 Heroku Tests
This section presents the evaluation results of the same application used in the local
tests, but deployed on the Heroku platform.
Worth taking note of is that it was here decided not to test the I/O bound function on
Heroku, as the results obtained from analyzing the local tests indicated that the
Cluster module did not contribute to any increase in throughput for this type of task.
Here, the test results are based on the output of the Heroku metrics. Each bar in the
diagram represents the performance of the application during a given minute in
time. The vertical line apparent in most of the diagrams (e.g. figure 5.5.1.1)
represents a specific minute, chosen by us for analysis. This specific minute
corresponds to the highest throughput value obtained during each test. All test
results in this section
are based on results from when the application was performing CPU bound tasks on
Heroku.
5.5.1 Throughput Rates
The results obtained from simulating low traffic for an application deployed on the
Heroku platform did not show any significant difference. In figure 5.5.1.1 it can be
seen that the throughput is about the same (~10000 requests/min) for an
application without clustering and for the highest performing application with
clustering (4 workers).
Figure 5.5.1.1: A comparison in throughput between tests at 10 rps without clustering (upper) and with
4 (lower) workers
Continuing with the test results obtained from the tests running at 50 requests per
second (see figure 5.5.1.2), a difference of ~20.6% in throughput could be noted
when comparing the application without clustering to the best performing
application with clustering (4 workers). This passes the bar of 20% set as the
prerequisite for being considered a positive result.
Figure 5.5.1.2: A comparison in throughput between tests at 50 rps without clustering (upper) and 4
(lower) workers
When looking at the results obtained from the tests running at 100 requests per
second (see figure 5.5.1.3), a significant difference in throughput could be seen. Here,
there is a difference of ~54.9%, which also passes the bar of 20%.
Figure 5.5.1.3: A comparison in throughput between tests at 100 rps without clustering (upper) and 4
(lower) workers
Lastly, looking at the results from each test (10–100 rps), it was shown that the
lowered throughput for the application utilizing 1 worker compared to the one
without clustering persisted. Comparing the two, the performance was lowered by
~1.5–5.3% when sending 10–100 requests/second to the application (see
Appendix 7).
5.5.2 Memory Usage
Looking at the memory usage when comparing the application without clustering
to the one with the best performance (4 workers) during high traffic, it can be seen
(in figure 5.5.2.1) that a significant amount of the memory available to the
non-clustered application is unused, namely 407 Mb. The reason for looking at high
traffic in particular is that it is the worst case scenario for the application (when
the highest amount of memory is allocated).
Figure 5.5.2.1: Memory footprint of the application without clustering (upper) and with 4 workers
(below) running at 100 rps
Looking at the memory usage of the second application (4 workers), it can be seen
(also in figure 5.5.2.1) that more of the available memory is being used by the
application, leaving 221 Mb unused. The vertical line represents the same time as in
figure 5.5.1.3.
Additionally, the application in this case (100 rps) exceeds the memory limit when
having 8 or 16 workers, leading to requests queueing up (see Appendix 7, figures 23
and 24). These queued requests further add to the memory usage of the application,
and can eventually lead to the application crashing (at 1 Gb memory usage [10]),
thus losing data.
5.5.3 Median Response Times
When looking at the tests, the median response times varied between being roughly
the same during low traffic (10 rps) and a decrease of ~94% during high traffic
(100 rps), when comparing the application without clustering to the one having
4 workers (see figure 5.5.3.1). The vertical line in the figure marks the same time as
in figures 5.5.1.3 and 5.5.2.1.
Figure 5.5.3.1: Response times for application without clustering (upper) and with 4 workers when
sending at 100 rps
5.5.4 Analysis of Heroku Test Results
When performing CPU bound tasks in the Heroku environment, an increase in
throughput exceeding the set bar of 20% could be seen when the traffic sent to the
application was of medium to high rate (50–100 requests per second).
It could also be seen how the memory usage, when using 8 workers at 100
requests/second, exceeded the limit of the dyno. Thus, requests started queueing up,
which could eventually have led to the application crashing. At the same traffic,
when running the application without clustering, a large amount of the dyno's
available memory was left unused, meaning the full capacity of the dyno was not
utilized. The memory usage of the applications at high traffic thus spoke for using
4 workers within the application.
Additionally, in the Heroku tests, as in the local tests, there was some decrease in
throughput when comparing the non-clustered application with the one utilizing 1
worker. This means that implementing the Cluster module and instantiating only one
worker process would most likely lead to a small decrease in throughput rather than
an increase. This is most likely due to the overhead of the master process having to
create the worker process at the start of the test, and to the redundancy of a master
process performing IPC when there is only one worker.
In conclusion, throughput increases could be seen in the Heroku environment as
well. The answer to the second research question of this thesis, on how substantial
the increase in throughput could be, is thus: when sending medium to high traffic
(50–100 rps) to an application performing CPU bound tasks, the increase in
throughput varies between ~20.6% (when sending 50 rps) and ~54.9% (when
sending 100 rps). When doing I/O bound tasks, on the other hand, there are no
throughput increases. The obtained answer confirmed the hypotheses presented in
section 3.3.
6 Discussion
This chapter presents our discussion and conclusions on this study, reflecting our
interpretations of the results and of problem areas related to the thesis. Finally, the
future direction of the template is discussed.
6.1 Our Methodology and Consequences of the Study
The purpose of the thesis was to create a template for Node.js applications deployed
on the Heroku cloud platform. The problem definition described in section 1.2 led to
two research questions (see section 1.3), thus dividing the research process into two
parts.
When answering the first question, the applied research method was chosen. This
method belongs to both qualitative and quantitative research. The choice was
determined through early investigation, where it was found that there were few
techniques that could be applied in order to design and implement the template.
This was because we were bound to a particular situation and a very specific
implementation environment, which left only one solution to the problem.
The problem with this research method was that it relied heavily on the outcome of
the second question. Negative results would have required further investigation and
would have made the research process iterative, where a new technique for
increasing throughput would have had to be evaluated.
The data collection phase of the first question included collection of primary and
secondary data. The primary data consisted mainly of informal interviews and our
observations at Innometrics. This data was then complemented by document
studies related to our research field. However, we had trouble finding academic
literature on the subject. This is probably partly due to the narrowness of the
problem area, and partly due to the technologies being discussed being rather new.
That is why we were careful when analyzing the quality of the available resources;
otherwise, problems could have arisen during the later phases of the research
process.
The implementation part of the study focused on the development of a Node.js
template for Innometrics. This is the part of the research process that heavily relied
on the result of the second research question.
Before answering the second question, the right assumptions needed to be made. In
our case, this meant focusing on CPU bound functions, which is why the tests gave
the desired result. The only problem was that the Heroku environment, where the
tests were conducted, was very unpredictable in terms of available system resources.
In conclusion, the research process made it possible to identify the right problem
areas and choose the right methods. As there was no previous research in this field
and the problem was of a practical character, our thesis could serve as a good
foundation for the future development of the applications at Innometrics, as well as
for all developers experimenting with Node.js and the Heroku platform.
6.2 Discussion and Conclusions
The only solution found when trying to answer the first research question, on how to
increase throughput for the applications within Innometrics’ system, was to utilize
the Cluster module, which is part of the standard library in Node.js (there were,
however, alternatives for how to implement the module).
When faced with the problem of testing the application on Heroku, it was found that
the problem was no longer only about utilizing more of the CPU’s capacity, but
about optimizing the memory usage as well. This meant we could not focus solely on
finding the number of workers giving the highest throughput; we now also had to
consider the memory usage of the application, and make sure that the application,
with its processes, stayed well below the memory limit of the dyno it was installed
on. Otherwise, the memory limit might be exceeded, which could eventually lead
to the application crashing, thus losing data.
Analyzing the obtained test results showed an increase of ~20–55% when
utilizing the Cluster module in a test application performing CPU bound tasks
during medium to high traffic (50–100 requests per second). However, through tests
performed in a local, higher-performing environment, we came to the conclusion
that, when doing I/O bound tasks, a clustered application would not show any
increases in throughput in the Heroku environment, regardless of the traffic sent to
it. This was in accordance with the hypotheses for this thesis (presented in section
3.3).
It was also found that the highest increases in throughput, when doing CPU bound
tasks in the Heroku environment, could be seen when using 4 workers in the
application. This number of workers was also shown to stay well within the memory
limit of the dyno (thus minimizing the risk of exceeding it during sudden increases in
traffic).
Regarding the test results, it was also noted that utilizing only one worker led to a
throughput decrease of ~1.5–5.3% (depending on the traffic sent to the application),
which means that an application should only utilize the application template if it
needs, and has the memory available, to instantiate more than one worker.
The results obtained from answering the second research question (on how
substantial the increase in throughput was) confirmed that the answer to the first
research question, on how to increase throughput in a Node.js application, was to
utilize the Cluster module.
6.2.1 Recommendations Concerning the Application Template
Our recommendation when utilizing this template in an application installed on a
Heroku free account is to implement it mainly in applications that are exposed to
traffic of around 50 requests per second and higher. It is also recommended to check
the memory usage of the application before implementing this template, to make
sure that it is able to utilize more than one worker, in order to obtain an increase in
throughput without exceeding the memory limit of the dyno.
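The memory check recommended above could be sketched as a simple sizing rule. The concrete numbers below (a 512 MB dyno limit, a measured per-worker footprint, and a safety headroom) are illustrative assumptions, not values prescribed by the thesis.

```javascript
// Hypothetical sizing helper: cap the desired worker count so that the
// workers' combined footprint stays under the dyno's memory limit,
// leaving headroom for traffic spikes. All parameters are in MB.
function safeWorkerCount(dynoLimitMb, perWorkerMb, desiredWorkers, headroomMb) {
  const budget = dynoLimitMb - headroomMb;           // memory left for workers
  const maxByMemory = Math.floor(budget / perWorkerMb);
  return Math.max(1, Math.min(desiredWorkers, maxByMemory));
}

// e.g. on a 512 MB dyno with ~70 MB per worker and 100 MB headroom,
// at most floor(412 / 70) = 5 workers fit, so 4 desired workers are allowed.
```

If such a check only allows one worker, the template offers no benefit: as noted earlier, a single clustered worker slightly decreases throughput, so the non-clustered version should then be kept.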
6.3 Ethics
A problem encountered concerning the ethical aspects of the project was that we
needed to avoid revealing company secrets about Innometrics, its customers, or the
customers’ clients (the Innometrics profiles). Therefore, the report is kept in as
general terms as possible.
Also, by increasing the throughput of an application, more data can be collected per
minute. This means that consumer behaviour can be tracked in more detail, which
will lead to better communication between customer and company. At the same
time, for the individual person, this project might not be beneficial: it contributes
further to the monitoring of individuals, which is a controversial subject today. A
more detailed customer profile might lead to a person feeling pursued. This is,
however, not up to us, but to the company using the template (in this case
Innometrics).
6.4 Sustainability
When speaking about sustainability, there are four different general areas to discuss:
environmental, human, economic, and social sustainability [32].
Concerning environmental sustainability, by increasing the throughput of the
applications, more clients can be handled per minute. This can lead to faster wear
on the machines running the system, since the machine in question will now be
handling more clients simultaneously, thus doing more work per minute. It can
also lead to increased energy consumption of the machine, due to utilizing more of
the machine’s capacity. Both of these aspects further damage our environment.
It can, however, also lead to less wear on the machine, by completing demanding
work sessions faster than before and thereby giving the machine time for recovery
(e.g. lowering machine temperatures) before strenuous periods.
Lastly, concerning economic sustainability, by increasing the throughput of the
application, it will not crash as often. This will lead to an increase in profits for the
companies in need of the service, as well as for the company providing it. Also, by
only using the free account on Heroku, expenses are saved for Innometrics.
6.5 Future Work
Despite the fact that the application template can already be implemented, there
are still some areas that can be further investigated.
When it comes to the implementation of the Cluster module, there are different
possibilities. It could be worth evaluating some of them (PM2, StrongLoop). They
will probably not give much better performance, if any at all, but they can provide
more convenient ways of managing the worker processes, along with other features
that can facilitate the maintenance of the application in the future. For example,
there would be no need to undeploy the application when making changes.
In order to make the application template as generic as possible, other cloud
platforms should also be considered. Similar tests could identify the most suitable
platform.
In the case of Heroku, there are other types of accounts providing more system
resources. Detailed comparison tests could give a clear picture of differences in
performance between different types of accounts.
It could also be interesting to do further research on finding the optimal number of
workers for an application. In this case, 4 workers had, as mentioned, a good margin
to the memory limit, but this might still not have been the most optimized memory
usage of the application. Thus, testing 5–6 workers for this particular application
might have shown even better results.
References
[1] “IAB Internet Advertising Revenue Report”, 2012, p. 16. [Online]. Available: http://www.iab.net/media/file/IAB_Internet_Advertising_Revenue_Report_FY_2012_rev.pdf [Accessed: 1 March 2016]
[2] B. Cantrill, J. Bonwick, “Real-world Concurrency”, ACM Queue, Volume 6, Issue 5, September 2008. [Online]. Available: http://queue.acm.org/detail.cfm?id=1454462 [Accessed: 25 May 2016]
[3] A.V. Aho, M.S. Lam, R. Sethi, J.D. Ullman, Compilers: Principles, Techniques, and Tools, p. 247. 2006, Pearson
[4] S.D. Burd, Systems Architecture, Fifth Edition, p. 45. 2004, Course Technology, a division of Thomson Learning
[5] E. Griffith, “What Is Cloud Computing?”, PCMag UK, 3 May 2016. [Online]. Available: http://uk.pcmag.com/networkingcommunicationssoftwareproducts/16824/feature/whatiscloudcomputing [Accessed: 24 May 2016]
[6] Heroku, “The Heroku Platform as a Service & Data Services”. [Online]. Available: https://www.heroku.com/platform [Accessed: 24 May 2016]
[7] B. Butler, “PaaS Primer: What is platform as a service and why does it matter?”, Network World, 11 February 2013. [Online]. Available: http://www.networkworld.com/article/2163430/cloudcomputing/paasprimerwhatisplatformasaserviceandwhydoesitmatter.html [Accessed: 25 May 2016]
[8] L.R. Rewatkar, U.A. Lanjewar, “Implementation of Cloud Computing on Web Application”, International Journal of Computer Applications, Volume 2, No. 8, June 2010, pp. 28-31. [Online]. Available: http://www.ijcaonline.org/volume2/number8/pxc387964.pdf [Accessed: 24 May 2016]
[9] S. Lynn, “What Is CRM?”, PCMag UK, 18 August 2011. [Online]. Available: http://uk.pcmag.com/software/9038/feature/whatiscrm [Accessed: 24 May 2016]
[10] Heroku Dev Center, “Dynos and the Dyno Manager”. [Online]. Available: https://devcenter.heroku.com/articles/dynos [Accessed: 24 May 2016]
[11] Heroku Dev Center, “Dyno Types”. [Online]. Available: https://devcenter.heroku.com/articles/dynotypes [Accessed: 24 May 2016]
[12] S. Saini, H. Jin, R. Hood, D. Barker, P. Mehrotra, R. Biswas, “The Impact of Hyper-Threading on Processor Resource Utilization in Production Applications”, NASA Advanced Supercomputing Division, 2011. [Online]. Available: https://www.nas.nasa.gov/assets/pdf/papers/saini_s_impact_hyper_threading_2011.pdf [Accessed: 25 May 2016]
[13] E. Swenson-Healey, “The JavaScript Event Loop: Explained”, Carbon Five, 27 October 2013. [Online]. Available: http://blog.carbonfive.com/2013/10/27/thejavascripteventloopexplained/ [Accessed: 25 May 2016]
[14] A.S. Tanenbaum, Modern Operating Systems, Third Edition, p. 145. 2009, Pearson
[15] “Cluster”, Node.js v6.2.1 Documentation. [Online]. Available: https://nodejs.org/api/cluster.html
[16] A. Burgess, “Using Node's Event Module”, Envato Tuts+, 3 December 2013. [Online]. Available: http://code.tutsplus.com/tutorials/usingnodeseventmodulenet35941 [Accessed: 26 May 2016]
[17] D. Khan, “How to Track Down CPU Issues in Node.js”, about:performance, 14 January 2016. [Online]. Available: http://apmblog.dynatrace.com/2016/01/14/howtotrackdowncpuissuesinnodejs/ [Accessed: 25 May 2016]
[18] M. Ridwan, “The Top 10 Most Common Mistakes That Node.js Developers Make”, Toptal. [Online]. Available: https://www.toptal.com/nodejs/top10commonnodejsdevelopermistakes [Accessed: 25 May 2016]
[19] Linux Documentation. [Online]. Available: http://linux.die.net/man/2/fork [Accessed: 24 May 2016]
[20] B. Noordhuis, “What’s New in Node.js v0.12: Cluster Round-Robin Load Balancing”, StrongLoop, 19 November 2013. [Online]. Available: https://strongloop.com/strongblog/whatsnewinnodejsv012clusterroundrobinloadbalancing/ [Accessed: 25 May 2016]
[21] A. Gorbatchev, “How-to Cluster Node.js in Production with Strong Cluster Control”, StrongLoop, 22 April 2015. [Online]. Available: https://strongloop.com/strongblog/productionnodejsstrongclustercontrol/ [Accessed: 25 May 2016]
[22] “Optimizing Node.js Application Concurrency”, Heroku Dev Center. [Online]. Available: https://devcenter.heroku.com/articles/nodeconcurrency [Updated: 24 September 2015]
[23] R. Manning, “Node.js Cluster and Express”, 10 January 2013. [Online]. Available: http://rowanmanning.com/posts/nodeclusterandexpress/ [Accessed: 28 May 2016]
[24] N. Kandalgaonkar, “Why you should use Node.js for CPU-bound tasks”, 30 April 2013. [Online]. Available: http://neilk.net/blog/2013/04/30/whyyoushouldusenodejsforCPUboundtasks/ [Accessed: 25 May 2016]
[25] A. Håkansson, “Portal of Research Methods and Methodologies for Research Projects and Degree Projects”, 2013. [Online]. Available: https://www.kth.se/social/files/55563be1f276547328cea897/Research%20Methods%20%20Methodologies(1).pdf
[26] “Qualitative and Quantitative Research Techniques for Humanitarian Needs Assessment: An Introductory Brief”, ACAPS, May 2012. [Online]. Available: http://www.acaps.org/sites/acaps/files/resources/files/qualitative_and_quantitative_research_techniques_for_humanitarian_needs_assessmentan_introductory_brief_may_2012.pdf
[27] S. Kar, “Node.js Performance Tip of the Week: Scaling with Proxies and Clusters”, StrongLoop, 22 April 2015. [Online]. Available: https://strongloop.com/strongblog/nodejsperformancescalingproxiesclusters/ [Accessed: 16 May 2016]
[28] “throng”, npm. [Online]. Available: https://www.npmjs.com/package/throng
[29] J. Shkurti, “Node.js clustering made easy with PM2”, Keymetrics, 26 March 2015. [Online]. Available: https://keymetrics.io/2015/03/26/pm2clusteringmadeeasy/ [Accessed: 5 June 2016]
[30] PM2. [Online]. Available: http://pm2.keymetrics.io/ [Accessed: 5 June 2016]
[31] “Using PM2 in Cloud Providers”, PM2 Documentation. [Online]. Available: http://pm2.keymetrics.io/docs/usage/usepm2withcloudproviders/ [Accessed: 5 June 2016]
[32] R. Goodland, “Sustainability: Human, Social, Economic and Environmental”, Baltic University Programme, a regional university network. [Online]. Available: http://www.balticuniv.uu.se/index.php/component/docman/doc_download/435sustainabilityhumansocialeconomicandenvironmental [Accessed: 5 June 2016]
Appendix 1 Heroku Dyno CPU Information
Results from running the “cat /proc/cpuinfo”command in the shell environment of each test application: processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E52670 v2 @ 2.50GHz stepping : 4 microcode : 0x428 cpu MHz : 2500.090 cache size : 25600 KB physical id : 0 siblings : 8 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms bogomips : 5000.18 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E52670 v2 @ 2.50GHz stepping : 4 microcode : 0x428 cpu MHz : 2500.090 cache size : 25600 KB physical id : 0 siblings : 8 core id : 1
cpu cores : 4 apicid : 2 initial apicid : 2 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms bogomips : 5000.18 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E52670 v2 @ 2.50GHz stepping : 4 microcode : 0x428 cpu MHz : 2500.090 cache size : 25600 KB physical id : 0 siblings : 8 core id : 2 cpu cores : 4 apicid : 4 initial apicid : 4 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms bogomips : 5000.18 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management:
processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E52670 v2 @ 2.50GHz stepping : 4 microcode : 0x428 cpu MHz : 2500.090 cache size : 25600 KB physical id : 0 siblings : 8 core id : 3 cpu cores : 4 apicid : 6 initial apicid : 6 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms bogomips : 5000.18 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 4 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E52670 v2 @ 2.50GHz stepping : 4 microcode : 0x428 cpu MHz : 2500.090 cache size : 25600 KB physical id : 0 siblings : 8 core id : 0 cpu cores : 4 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 13
wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms bogomips : 5000.18 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 5 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E52670 v2 @ 2.50GHz stepping : 4 microcode : 0x428 cpu MHz : 2500.090 cache size : 25600 KB physical id : 0 siblings : 8 core id : 1 cpu cores : 4 apicid : 3 initial apicid : 3 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms bogomips : 5000.18 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 6 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E52670 v2 @ 2.50GHz stepping : 4
microcode : 0x428 cpu MHz : 2500.090 cache size : 25600 KB physical id : 0 siblings : 8 core id : 2 cpu cores : 4 apicid : 5 initial apicid : 5 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms bogomips : 5000.18 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: processor : 7 vendor_id : GenuineIntel cpu family : 6 model : 62 model name : Intel(R) Xeon(R) CPU E52670 v2 @ 2.50GHz stepping : 4 microcode : 0x428 cpu MHz : 2500.090 cache size : 25600 KB physical id : 0 siblings : 8 core id : 3 cpu cores : 4 apicid : 7 initial apicid : 7 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm xsaveopt fsgsbase smep erms
bogomips : 5000.18 clflush size : 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management:
Appendix 2 The Test Application
Figure 1: First half of the test application
Figure 2: Second half of the test application
Appendix 3 The Application Template
Figure 1: Part one of the application template
Figure 2: Part two of the application template
Appendix 4 The Local Server CPU Specifications machdep.cpu.max_basic: 13 machdep.cpu.max_ext: 2147483656 machdep.cpu.vendor: GenuineIntel machdep.cpu.brand_string: Intel(R) Core(TM) i53210M CPU @ 2.50GHz machdep.cpu.family: 6 machdep.cpu.model: 58 machdep.cpu.extmodel: 3 machdep.cpu.extfamily: 0 machdep.cpu.stepping: 9 machdep.cpu.feature_bits: 9203919201183202303 machdep.cpu.leaf7_feature_bits: 641 machdep.cpu.extfeature_bits: 4967106816 machdep.cpu.signature: 198313 machdep.cpu.brand: 0 machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC POPCNT AES PCID XSAVE OSXSAVE TSCTMR AVX1.0 RDRAND F16C machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS machdep.cpu.extfeatures: SYSCALL XD EM64T LAHF RDTSCP TSCI machdep.cpu.logical_per_package: 16 machdep.cpu.cores_per_package: 8 machdep.cpu.microcode_version: 21 machdep.cpu.processor_flag: 4 machdep.cpu.mwait.linesize_min: 64 machdep.cpu.mwait.linesize_max: 64 machdep.cpu.mwait.extensions: 3 machdep.cpu.mwait.sub_Cstates: 135456 machdep.cpu.thermal.sensor: 1 machdep.cpu.thermal.dynamic_acceleration: 1 machdep.cpu.thermal.invariant_APIC_timer: 1 machdep.cpu.thermal.thresholds: 2 machdep.cpu.thermal.ACNT_MCNT: 1 machdep.cpu.thermal.core_power_limits: 1 machdep.cpu.thermal.fine_grain_clock_mod: 1 machdep.cpu.thermal.package_thermal_intr: 1 machdep.cpu.thermal.hardware_feedback: 0 machdep.cpu.thermal.energy_policy: 0 machdep.cpu.xsave.extended_state: 7 832 832 0 machdep.cpu.xsave.extended_state1: 1 0 0 0 machdep.cpu.arch_perf.version: 3 machdep.cpu.arch_perf.number: 4 machdep.cpu.arch_perf.width: 48 machdep.cpu.arch_perf.events_number: 7
machdep.cpu.arch_perf.events: 0 machdep.cpu.arch_perf.fixed_number: 3 machdep.cpu.arch_perf.fixed_width: 48 machdep.cpu.cache.linesize: 64 machdep.cpu.cache.L2_associativity: 8 machdep.cpu.cache.size: 256 machdep.cpu.tlb.inst.small: 64 machdep.cpu.tlb.inst.large: 8 machdep.cpu.tlb.data.small: 64 machdep.cpu.tlb.data.large: 32 machdep.cpu.tlb.shared: 512 machdep.cpu.address_bits.physical: 36 machdep.cpu.address_bits.virtual: 48 machdep.cpu.core_count: 2 machdep.cpu.thread_count: 4 machdep.cpu.tsc_ccc.numerator: 0 machdep.cpu.tsc_ccc.denominator: 0
Appendix 5 Results From I/O Bound Tests in Local Environment
Average and Median measured in ms, and Throughput in requests per second.
Label Samples Average Median Error % Throughput
10/s With Cluster, 2 workers 15000 317 308 0,00% 30,8
10/s With Cluster, 4 workers 15000 308 309 0,00% 31,7
10/s With Cluster, 16 workers 15000 307 308 0,00% 31,8
10/s Without Cluster 15000 307 307 0,00% 31,9
10/s With Cluster, 8 workers 15000 307 308 0,00% 31,9
10/s With Cluster, 1 workers 15000 307 307 0,00% 31,9
Label Samples Average Median Error % Throughput
25/s With Cluster, 2 workers 15000 305 305 0,00% 79,1
25/s With Cluster, 16 workers 15000 306 306 0,00% 79,2
25/s With Cluster, 4 workers 15000 306 307 0,00% 79,3
25/s With Cluster, 8 workers 15000 306 306 0,00% 79,4
25/s With Cluster, 1 workers 15000 305 305 0,00% 79,4
25/s Without Cluster 15000 305 305 0,00% 79,6
Label Samples Average Median Error % Throughput
50/s With Cluster, 4 workers 15000 305 305 0,00% 157
50/s With Cluster, 2 workers 15000 305 305 0,00% 157,4
50/s With Cluster, 16 workers 15000 305 305 0,00% 157,7
50/s With Cluster, 8 workers 15000 305 305 0,00% 157,7
50/s Without Cluster 15000 305 305 0,00% 158,2
50/s With Cluster, 1 workers 15000 305 305 0,00% 158,4
Label Samples Average Median Error % Throughput
75/s With Cluster, 4 workers 15000 305 305 0,00% 234,5
75/s With Cluster, 8 workers 15000 305 305 0,00% 234,6
75/s Without Cluster 15000 306 305 0,00% 234,6
75/s With Cluster, 2 workers 15000 305 305 0,00% 234,8
75/s With Cluster, 16 workers 15000 306 306 0,00% 234,9
75/s With Cluster, 1 workers 15000 306 305 0,00% 235,6
Label Samples Average Median Error % Throughput
100/s With Cluster, 16 workers 15000 306 305 0,00% 309,3
100/s With Cluster, 4 workers 15000 305 305 0,00% 310,3
100/s Without Cluster 15000 305 305 0,00% 310,8
100/s With Cluster, 8 workers 15000 305 305 0,00% 310,9
100/s With Cluster, 2 workers 15000 306 305 0,00% 311,5
100/s With Cluster, 1 workers 15000 306 305 0,00% 312
Figure 1: Results from I/O bound tests in local environment
Appendix 6 Results From CPU Bound Tests in Local Environment
Average and Median measured in ms, and Throughput in requests per second.
Label # Samples Average Median Error % Throughput
10/s With Cluster, 1 workers 15000 6 6 0,00% 787,5
10/s Without Cluster 15000 4 4 0,00% 827,6
10/s With Cluster, 16 workers 15000 4 3 0,00% 892,9
10/s With Cluster, 8 workers 15000 3 3 0,00% 972,3
10/s With Cluster, 2 workers 15000 3 3 0,00% 994,6
10/s With Cluster, 4 workers 15000 3 3 0,00% 1029,9
Label # Samples Average Median Error % Throughput
25/s With Cluster, 1 workers 15000 21 23 0,00% 814,8
25/s With Cluster, 2 workers 15000 10 10 0,00% 1158,6
25/s With Cluster, 4 workers 15000 6 5 0,00% 1247,6
25/s With Cluster, 8 workers 15000 5 5 0,00% 1277,8
25/s With Cluster, 16 workers 15000 7 4 0,00% 1188,4
25/s Without Cluster 15000 14 15 0,00% 933,4
Label # Samples Average Median Error % Throughput
50/s With Cluster, 1 workers 15000 51 53 0,00% 815,5
50/s Without Cluster 15000 37 39 0,00% 954,5
50/s With Cluster, 2 workers 15000 25 27 0,00% 1211,9
50/s With Cluster, 16 workers 15000 15 15 0,00% 1269,7
50/s With Cluster, 4 workers 15000 16 15 0,00% 1320,3
50/s With Cluster, 8 workers 15000 13 13 0,00% 1355,9
Label # Samples Average Median Error % Throughput
75/s With Cluster, 1 workers 15000 77 80 0,00% 821,3
75/s With Cluster, 2 workers 15000 47 49 0,00% 1197,1
75/s With Cluster, 4 workers 15000 34 33 0,00% 1348
75/s With Cluster, 8 workers 15000 29 32 0,00% 1345,1
75/s With Cluster, 16 workers 15000 40 27 0,00% 1101,4
75/s Without Cluster 15000 60 63 0,00% 949,8
Label # Samples Average Median Error % Throughput
100/s With Cluster, 1 workers 15000 107 110 0,00% 793,6
100/s Without Cluster 15000 82 86 0,00% 926,8
100/s With Cluster, 16 workers 15000 19 18 0,00% 1151,8
100/s With Cluster, 2 workers 15000 63 67 0,00% 1208,1
100/s With Cluster, 4 workers 15000 49 52 0,00% 1278,3
100/s With Cluster, 8 workers 15000 39 44 0,00% 1326,5
Figure 1: Results from CPU bound tests in local environment
Appendix 7 Results From CPU Bound Tests on Heroku
To see the results of the test corresponding to each figure description, look at the
vertical line (the one next to the timestamp).
10 rps:
Figure 1: 10 rps, without clustering (vertical line missing, see timestamp for time of test)
Figure 2: 10 rps, 1 worker
Figure 3: 10 rps, 2 workers
Figure 4: 10 rps, 4 workers
Figure 5: 10 rps, 8 workers
Figure 6: 10 rps, 16 workers
25 rps:
Figure 7: 25 rps, without clustering
Figure 8: 25 rps, 1 worker
Figure 9: 25 rps, 2 workers
Figure 10: 25 rps, 4 workers
Figure 11: 25 rps, 8 workers
Figure 12: 25 rps, 16 workers
50 rps:
Figure 13: 50 rps, without clustering
Figure 14: 50 rps, 1 worker
Figure 15: 50 rps, 2 workers
Figure 16: 50 rps, 4 workers
Figure 17: 50 rps, 8 workers
Figure 18: 50 rps, 16 workers
100 rps:
Figure 19: 100 rps, without clustering
Figure 20: 100 rps, 1 worker
Figure 21: 100 rps, 2 workers
Figure 22: 100 rps, 4 workers
Figure 23: 100 rps, 8 workers
Figure 24: 100 rps, 16 workers
TRITA-ICT-EX-2016:69
www.kth.se