
Dynamic allocation of servers for large scale

rendering application

Samuel Andersson

Computer Science and Engineering, master's level

2021

Luleå University of Technology

Department of Computer Science, Electrical and Space Engineering


Abstract

Cloud computing has been widely used for some time now, and its area of use is growing larger year by year. It is very convenient for companies to use cloud computing when creating certain products; however, it comes at a great price. This thesis evaluates whether one could optimize the expenses for a product regardless of the platform used, and whether it would be possible to anticipate how many resources a product will need and allocate those machines in a dynamic fashion.

This thesis evaluates the work of predicting the need for rendering machines based on response times from user requests, and of dynamically allocating rendering machines to a product based on that need. The solution is based on machine learning, where different types of regression models try to predict future response times and evaluate whether or not they are acceptable. During the thesis, both a simulation and a replica of the real architecture were implemented. The replica of the real architecture was implemented using AWS cloud services.

The regression model that turned out to be best was the simplest possible: a linear regression model with response times as the independent variable and queue size per rendering machine as the dependent variable. The model performed very well in the region of realistic response times, but not necessarily as well at very high or very low response times. That is not considered a problem, though, since response times in those regions should not be of concern for the purpose of the regression model.

The effects of using the regression model seem to be better than those of a completely reactive scaling method, although the effects are not entirely clear, since no user data is available. For the effects to be evaluated fairly, user patterns describing the daily usage of the product are needed. Because the requests in the simulation are based on pure randomness, there is no correlation between what happened 10 minutes back in the simulation and what will happen 10 minutes in the future. The effect of that is that it is very hard to estimate how the dependent variable will change over time. And if that cannot be estimated properly, the results with the regression model included cannot be tested in a realistic scenario either.

Contents

1 Introduction
1.1 Background
1.2 Motivation
1.3 Problem definition
1.4 Sustainability
1.5 Delimitations
1.6 Thesis structure

2 Related work

3 Theory
3.1 Regression analysis
3.1.1 Linear regression
3.1.2 Multi-linear regression
3.2 Erlang
3.3 Service rate
3.4 Server utilization
3.5 Response time
3.6 R2 score
3.7 Mean squared error
3.8 Mean absolute error
3.9 Root mean squared error

4 Implementation
4.1 System architecture
4.2 System components
4.2.1 Web server
4.2.2 SQS-queue
4.2.3 Rendering machine
4.3 Simulation
4.3.1 Idea of the simulation
4.3.2 Settings in the simulation
4.3.3 Data in the simulation
4.3.4 Outputs from the simulation
4.4 Single request in AWS-architecture
4.5 Multi request implementation
4.6 Regression model
4.7 Scaling

5 Evaluation
5.1 Regression models before delimitation
5.2 Regression model after delimitation
5.3 Final solution

6 Discussion
6.1 Reactive vs proactive
6.2 Solution
6.3 What could have been done differently?
6.3.1 Planning
6.3.2 Multi requests from client
6.4 Problems
6.4.1 Response from rendering machines
6.4.2 Map response to correct client
6.4.3 Multi request
6.4.4 Timing in Node.js

7 Conclusions and future work
7.1 Conclusions
7.2 Future work


Acronyms and Abbreviations

Abbreviation   Description
IaaS           Infrastructure as a service
PaaS           Platform as a service
SQS            Simple Queue Service
GPU            Graphics processing unit
LSTM           Long short-term memory
Std            Standard deviation
R2             R-squared
MSE            Mean Squared Error
MAE            Mean Absolute Error
RMSE           Root Mean Squared Error
FIFO           First In First Out
API            Application Programming Interface
URL            Uniform Resource Locator


1 Introduction

1.1 Background

How many servers do you really need in order to guarantee a user-friendly product for the lowest possible price? That is a question that every company that provides a service asks itself. There is no easy answer to it either, as it depends on a lot of different factors and parameters. What resources in terms of hardware are needed? How many users is the service estimated to serve? What is a user-friendly product, really? As you may understand, the questions are many, and the answers to each question vary for every specific product or service.

It is known that renting servers for cloud computing purposes is quite expensive. By analyzing the server power needed to satisfy the users, and dynamically allocating these servers, these costs could be reduced by quite a margin. If, in addition to that, it were possible to predict the need and counteract an increasing need in a proactive manner, then nothing would impact the user experience either, while at the same time you would not pay for more resources than you actually need.

The company issuing the project will soon release a product to its customers. In that product, many different factors influence the response times of the requests that the users send. Rendering times differ between requests, as some requests might only need a very tiny render, while others might need to render a whole scene or a complete view. Thus, the rendering times are very widely spread.

Since the product is not released yet, there is no user data or user patterns available either. How should you then be able to anticipate how many rendering machines will be needed in order to provide a user-friendly product? The purpose of this thesis is to find out whether there is a way to anticipate and predict how many rendering machines will be needed at a specific time.

1.2 Motivation

By only renting the servers that are needed, the expenses for a specific service can be reduced, as stated earlier. However, it is not only a question of economy but also of the environment. A server, or in this case a rendering machine, performs very heavy computations that require a lot of hardware. Not using more rendering machines than your service actually needs therefore not only generates greater economic profit from your service, but also benefits the well-being of the planet.

There are a lot of existing solutions to this problem. The existing solutions are


often provided by the landlord of the servers/machines (AWS, for instance), and that means that you cannot change your supplier of rendering machines and expect exactly the same result afterwards, in terms of either price or performance. If you could scale the number of rendering machines up and down by yourself, that problem would disappear.

In addition to the motivations mentioned above, there is another big factor that motivates this work, namely the progression of cloud computing. The usage of cloud computing has increased greatly over the last 10 years [16] and is expected to increase even more in the future. The number of IaaS and PaaS providers is expected to grow, and having a general, provider-independent solution to a substantial problem in the hosting of a service is timely.

1.3 Problem definition

The number of users of a product varies greatly depending on many factors. Are all the users located in the same country? Do all the users have a job? Do all the users have the same habits? No one could possibly answer these questions without knowing each and every customer very well. How should it then be possible to figure out how many resources are needed in order to provide a user-friendly service? That is simply not possible. With dynamic allocation of rendering machines in the cloud, these problems could disappear.

During the course of this thesis project, an algorithm will be created that dynamically regulates the number of rendering machines that are needed. The algorithm will analyze the evolution of the response times of the requests sent from the users to the rendering machines. If the response times grow and seem to be reaching a state where they are unacceptable, it will allocate another rendering machine to sustain the system that provides the service.

The algorithm will be based on machine learning and regression analysis of the response times. The algorithm will continually need to be adjusted and analyzed in order to hopefully provide an acceptable prediction of the response times. By testing, adjusting, and continually analyzing the correctness of the predictions, the expectation is to deliver a regression model capable of predicting the response times with great results.

The algorithm will be tested in an environment implemented in AWS that represents the architecture of the currently running service. In order to also test the algorithm quickly and often, a simulation of the process will need to be implemented as well. The simulation will imitate the real scenario, with users sending requests to a rendering machine, and the response time correlated with each request will need to be realistic and match the response times generated by the architecture implemented in AWS.


All of the above paragraphs can be combined into the problem definition of this thesis: "Is it possible to create a model that can be used to predict the response times of requests? If so, could that model be used in order to proactively scale up or down the number of rendering machines? How does it compare to a reactive solution?"

1.4 Sustainability

From a sustainability standpoint, the thesis works, as stated earlier, towards decreasing the power consumption needed to provide a service that is as user friendly and energy efficient as possible. The rendering machines need a very powerful GPU to be able to render a picture from a polygon; therefore, it also takes a lot of power to supply the GPU.

If more rendering machines are active than are actually needed, then one rendering machine should be taken down. Providing these powerful machines with power that is not needed is not good for the environment. Hence, the earth is also in need of a solution like this, since everybody needs to do what they can in order to save the earth from the situation it is in right now.

1.5 Delimitations

Some limitations need to be made, since time is limited. For example, in the real service that will soon be launched, there is a cache that stores some requests so that many requests do not have to go to a rendering machine at all. Since only the response times of the requests that need a rendering machine are of interest, and since implementing the architecture and simulation will take a lot of time, the cache can be disregarded in this thesis. This does not change the result either way, since the algorithm should be placed "behind" the cache; that means that, in the real system as well, the only concern is the requests that pass the cache, since the cached requests will have a very low response time.

Another delimitation in the project is the load generator. It will not be very advanced, as it is not the main focus of this project. The load generator will, however, need to be able to generate different types of requests, as they vary quite a bit in terms of render times. The load generator will be based on some data, corresponding somewhat to reality, that the product owner will provide. One pattern the load generator could use is, for instance, a normal distribution over different types of requests; another is a completely randomized load within some frames and intervals.

However, it should already be stated that the difference in rendering times, with


respect to different request types, was another delimitation made during the work of the thesis. The different types of requests were later discarded, and instead a rectangular distribution within some intervals of the rendering times was used. By doing so, all the requests had the same "type" and used the same time interval. There will be more explanation of this later in the thesis, under the evaluation section.

1.6 Thesis structure

In this thesis, section 2 covers related work on the subject, and section 3 focuses entirely on the theoretical parts of the thesis: things that may be useful to understand before proceeding further in the report, or that may be good to go back to when they are referred to. Section 4 describes the implementation: what the different components are, the ideas behind them, what settings are possible, what data is generated, and so on. The evaluation of the solution is covered in section 5, where many different graphs are presented along with explanations of why the solution behaves in a certain way. In section 6 the discussion takes place, covering above all the problems that arose during the thesis and what could and should have been done differently. Last but not least, section 7 rounds up everything presented before with some conclusions and possible future work for this thesis.


2 Related work

When it comes to forecasting and machine learning, there are numerous different types of solutions to be found. Forecasting and predictive regression are used in many different areas, and many different articles have been read. Some examples of such work are: using multi-linear regression to predict the load capacity of reinforced bridges [10], forecasting stock prices using an LSTM neural network [20], and work that aims to forecast the weather in a data-driven approach [17].

Although a lot of work surrounds forecasting and prediction of different things, not many articles have been found that are similar to what this thesis aims to accomplish. That is most likely because there already are solutions to this problem, as long as you use the service provided by the "landlord" of the servers. AWS AutoScaling [2], for instance, is mostly reactive and acts when the problem has already occurred, or while the problem is occurring, as a trigger. The desired solution is to be able to predict whether a problem will occur and take action before it happens. Although one can use AWS AutoScaling for predictive scaling [3], it needs to be provided with historical data of the user activity, and that is not something that is available for now. In addition, it would be beneficial to have a general solution that could be applied to server providers other than AWS, since the rendering machines have been rented at several different places. The need for proactive scaling is very important in this specific case, since the boot time for the machine itself and all of its software is estimated to be somewhere around 5-10 minutes.

This thesis is not only about regression and machine learning, as there are a lot of other concepts that needed to be learned and understood. Many of them surround queueing networks [12] and related topics. There are also different queueing models; the simplest is the M/M/1 queueing model [1], and the one worked with in this thesis is an M/M/c queueing model [13]. The M/M/c queueing model is just an extension of the M/M/1 queueing model, with the difference that there are c servers instead of a single server.


3 Theory

During the thesis there has been some theoretical work that it is advantageous to explain a little more about. Some of the theory explained in this section has had a larger part in the thesis, and some might not have been used that much at all. It is nevertheless important to explain what the concepts are, so that it becomes clear why they have been used in the thesis.

3.1 Regression analysis

Before the actual regression models are discussed, regression analysis [11] itself should be discussed, as it might not be familiar to everyone. Regression analysis is a technique or method used to describe an independent variable based on some dependent variables. This might sound abstract, but in reality it is not very abstract at all. Regression analysis is often used in statistics and is a statistical process for estimating the relationships between a dependent variable and an independent variable, as stated earlier.

In our case, regression analysis is used in a machine learning context. That is done by providing a data set to a regression model; when this model has been trained with respect to the given data set, it can determine the correlation between different attributes in the data set. What this gives you is the possibility to determine or even predict the value of an independent variable, given the value of the dependent variable(s).

3.1.1 Linear regression

Linear regression [7] is the simplest possible model in a regression analysis. However, if the data set is well suited, and the dependent variable is wisely picked, it could also be the most precise model. The resulting model is probably very familiar, namely this one:

Y = βx + m   (1)

where Y is the independent variable and x is the dependent variable. As this is a very simple model, the challenge when using it is to pick an attribute x that describes Y as well as possible. Many times there is initially no dependent variable that suits the model. However, in some cases one can derive data from the already existing data that describes the independent variable pretty well.
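As a concrete illustration (not taken from the thesis), the closed-form least-squares fit of the simple model in equation 1 can be sketched in Python; the helper name fit_linear is hypothetical:

```python
def fit_linear(xs, ys):
    """Closed-form ordinary least squares for Y = beta*x + m (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # beta is the covariance of (x, y) divided by the variance of x
    beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    m = mean_y - beta * mean_x  # intercept
    return beta, m
```

For example, fitting points that lie exactly on y = 2x + 1 recovers beta = 2 and m = 1.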


There are some metrics that could (and should) be used when picking a dependent variable: kurtosis and skew. Both kurtosis and skew measure "the quality of the data" in terms of how many or how big the outliers are. The data should be as similar to a normal distribution as possible, because most underlying statistical models assume that the data provided to the model is normally distributed [9]. Skew measures how the overall shape of the data compares to a normal distribution, which is illustrated in Figure 1.

Figure 1: Positive and negative skewness (Source: Wikimedia Commons under CC BY-SA 3.0).

Kurtosis, on the other hand, only measures outliers with respect to the height of the data. The kurtosis of a normal distribution is equal to 3. Hence, one often speaks of excess kurtosis instead of kurtosis. The excess kurtosis is the measured kurtosis of a variable minus 3, as the only interest is in how the data differs from normally distributed data.
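As a minimal sketch (not from the thesis), skew and excess kurtosis can be computed from the central moments of the data; this uses the plain population-moment definitions, which is an assumption, as libraries also offer bias-corrected variants:

```python
def skew_and_excess_kurtosis(data):
    """Moment-based skew and excess kurtosis (population definitions assumed)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance (second moment)
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # a normal distribution has kurtosis 3
    return skew, excess_kurtosis
```

A perfectly symmetric sample has skew 0, matching the middle panel of Figure 1.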

There is some debate regarding which intervals of skew and kurtosis are acceptable when constructing a linear regression model. In some cases the data might be highly skewed or have high excess kurtosis values, and the model may still perform very well. There are no guarantees that your model will perform


better with no skew and no excess kurtosis. If that is the case, however, the shape of the data will fit your model better, and you are more likely to get a better result.

In the case of high skew values (normally if skew is ≤ −1 or ≥ 1) or high excess kurtosis (normally if excess kurtosis is ≤ −1 or ≥ 1), the most common solution is to transform the data. The most common transformation is probably the logarithmic transformation, with either the natural logarithm or the logarithm with base 10. There are numerous different kinds of transformations that could be used, but when you use transformations on the data in your regression model, you need to remember both to transform the input and to invert the transformation when you extract your prediction from the model. That is because the answer will also be in terms of the transformed data, which means that the answer needs to be inverted with respect to the transformation before it is interpretable.
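The transform/invert round trip described above can be sketched as follows (a generic illustration, not the thesis code), here with the natural logarithm:

```python
import math

def transform(values):
    """Log-transform skewed data before fitting the model."""
    return [math.log(v) for v in values]

def invert(values):
    """Invert the log transform so predictions are interpretable again."""
    return [math.exp(v) for v in values]
```

A prediction produced by a model trained on transform(data) is in log space, and must be passed through invert before being read as a response time.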

3.1.2 Multi-linear regression

Multi-linear regression [14] is actually precisely the same as the linear regression model mentioned above, with one big difference: in multi-linear regression, several different dependent variables together describe the independent variable. A multi-linear regression model could look something like this:

Y = β0x0 + β1x1 + … + βnxn + m   (2)

As can be seen in the equation above, there are far more variables in this equation compared to the one described under the linear regression model, and that is very natural, as there are more variables explaining the independent variable. These variables all cooperate in order to describe the independent variable. Some may be more significant, while some may not be very significant at all (although they should not have been picked if that is the case).

In multi-linear regression there are also a couple of things that should be checked in your data before picking the dependent variables. In a multi-linear regression model there should be no multicollinearity. Multicollinearity is when one of the dependent variables is highly correlated with another dependent variable, and this is not wanted in the model, because the coefficient estimates tend to be unreliable for correlated dependent variables. One good metaphor for this is attending a concert where multiple artists sing the same song at the same time, and the job is to determine which artist is the best singer. To eliminate multicollinearity, one could use a correlation matrix of the data and remove one of the highly correlated variables. An


even better way to do this is to use something called the variance inflation factor, which is a measurement of how much a particular variable contributes to the standard error in the regression model. This can be used since the variance inflation factor for a variable will be very large if there is high multicollinearity. A rule of thumb is to eliminate variables until the variance inflation factor is ≤ 5 for all variables.
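As a sketch of the idea (not the thesis code), in the special case of exactly two predictors the variance inflation factor reduces to 1 / (1 − r²), where r is the Pearson correlation between them; the general case would regress each predictor on all the others:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF for the two-predictor case: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)
```

Uncorrelated predictors give a VIF of 1, while nearly collinear predictors blow far past the rule-of-thumb threshold of 5.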

After you have created your model, you should also evaluate whether there is any heteroscedasticity or autocorrelation. Heteroscedasticity is basically a term meaning that the errors of the model do not have constant variance, and that is not something you want in your model, since it implies that there may be some significant error that occurs every time for some specific input, creating an unreliable model. Testing your model for heteroscedasticity can be done using the Breusch-Pagan test [6] or the White test [18].

A common cause of heteroscedasticity is outliers in the data set, which is something you should also test your data set for. A general definition of an outlier is described in equation 3, where x is a data point, Mean is the mean of the data and Std is the standard deviation of the data.

|x − Mean| ≥ 3 · Std   (3)

When it comes to outliers, there are also some tradeoffs to take into account. Some might say that outliers should be removed from the data set, as that makes the model better and more precise. Although that might be true in some cases, it is not always true, since the outliers are actual data describing an event that has occurred, and they might be very important for the model. In some cases the model should instead be trained to spot and recognize patterns where "outlying data" is coming up, so that such events can be detected rather than discarded.
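Reading equation 3 as flagging points at least three standard deviations from the mean, an outlier check can be sketched as (an illustration, not the thesis code; the population standard deviation is assumed):

```python
def find_outliers(data):
    """Return the points lying at least 3 standard deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    std = (sum((x - mean) ** 2 for x in data) / n) ** 0.5  # population std
    return [x for x in data if abs(x - mean) >= 3 * std]
```

Whether the flagged points are then removed or kept as training signal is exactly the tradeoff discussed above.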

When speaking about autocorrelation in regression, one generally speaks about autocorrelation of the residuals. Residuals are described in equation 4, where R is the residual, Yp is the predicted value and Yo is the observed value. The residual of a data point is the vertical difference between the regression line and the actual value of the data point, and it could be explained as the "error of the prediction".

R = Yo − Yp (4)

Autocorrelation is when values of the same variable are based on related observations, meaning that you can almost "predict" future values based on the preceding values in a series. A good example would be the measured temperature during a month. You would expect the second day of a month's temperature to be more similar


to the first day of the month's temperature than to the last day of the month's temperature. If this were true for the data set of temperatures, it would exhibit autocorrelation.

Merging these two definitions gives autocorrelation of the residuals, and this is not wanted in the model, because the computed standard errors and p-values are misleading when there is autocorrelation. Autocorrelation in your model is often a sign that the model used might not be the correct one, given your data set. When time-series data is used, that is, data generated from a time series, one often ends up having autocorrelation in the model, since the observations depend on the preceding values. To test the model for autocorrelation, one can use the Ljung-Box test [5].
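A quick first diagnostic (a sketch, not a replacement for the Ljung-Box test, which aggregates many lags) is the sample lag-1 autocorrelation of the residual series:

```python
def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of a residual series (equation 4 values)."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Correlate each residual with its immediate predecessor
    num = sum((residuals[i] - mean) * (residuals[i - 1] - mean) for i in range(1, n))
    den = sum((r - mean) ** 2 for r in residuals)
    return num / den
```

Values far from zero at lag 1 suggest the residuals are not independent, echoing the temperature example above.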

3.2 Erlang

In the thesis work, Erlang's C formula has been used to evaluate the probability of a request ending up having to queue before being processed. Erlang is primarily used as an indication of traffic intensity, and the most widely used formula that involves erlangs is probably Erlang B, which originally described the probability of a phone call being blocked, meaning the phone call is completely disregarded or dropped. Erlang C is, as just mentioned, an estimation of the probability of a phone call having to queue before being processed.

Erlang can be used in many different areas, and it can be used in this thesis as well. For example, if the number of service towers is substituted for rendering machines and a phone call is substituted for a rendering request, the formula ends up describing the exact same thing, but for this problem instead of the phone call problem. Equation 5 shows how the erlang E is calculated, where λ is the arrival rate and h is the average service time. Equation 6 then shows the Erlang C formula, where Pw is the probability of a service needing to be queued, E is the erlang value and c is the number of servers.

E = λh (5)

Pw = [ (E^c / c!) · c / (c − E) ] / [ Σ(i = 0 to c − 1) E^i / i! + (E^c / c!) · c / (c − E) ]   (6)
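The Erlang C formula in equation 6 translates directly into code; the following is a minimal sketch (not the thesis implementation), valid for the stable case E < c:

```python
from math import factorial

def erlang_c(E, c):
    """Erlang C (equation 6): probability that an arriving request must queue.

    E: offered traffic in erlangs (E = lambda * h, equation 5)
    c: number of servers; a stable system requires E < c
    """
    top = (E ** c / factorial(c)) * (c / (c - E))
    bottom = sum(E ** i / factorial(i) for i in range(c)) + top
    return top / bottom
```

For instance, with an offered load of E = 2 erlangs on c = 3 servers, equation 6 gives Pw = 4/9.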


3.3 Service rate

Service rate is a measurement of how many services can be handled given a specific service time. The term service corresponds, in our case, to a render in a rendering machine. The service rate is calculated as shown in equation 7, where µ is the service rate, λ is the arrival rate of an operation and t is the service time (the time it takes to perform the service). The service rate is pretty much a measurement of how many services you can perform per time unit (λ and t must be in the same unit).

µ = λ/t    (7)

Please note that this is not the service rate for the entire system, but only for one server. The system service rate, however, describes the service rate of the entire system and is calculated as shown in equation 8, where c is the number of servers and µ is the service rate of one server.

µs = µ · c = (λ/t) · c    (8)

3.4 Server utilization

Server utilization is a measurement of how much of the time, on average, the servers of the entire system are "busy" or in use. The server utilization ρ is calculated according to equation 9, where λ is the arrival rate, c is the number of servers and µ is the service rate of one server.

ρ = λ/(cµ)    (9)

Given ρ, one can get some information regarding the stability of the system. If ρ > 1, the queue of the system will grow and eventually be out of control. If ρ < 1, some services might still have to queue, since the service times are not exactly the same for every service. If all service times were identical, the queue would on average be empty, because according to ρ the system service rate is greater than the arrival rate. However, as mentioned above, this is not very common in practice, since the service times often vary and the arrival rate is an average measured over time.


3.5 Response time

Using the above-mentioned calculations, there is actually a way to theoretically calculate the average response time. The response time is the time a customer spends both in the queue and in the service itself. The average response time can be calculated as shown in equation 10, where T is the average response time, Pw is the probability of a service being queued (Erlang C), c is the number of servers, λ is the arrival rate and µ is the service rate.

T = Pw/(cµ − λ) + 1/µ    (10)
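Equations 9 and 10 can be combined into a small helper; the function name and the example numbers below are illustrative assumptions, with the per-server service rate µ and the Erlang-C waiting probability taken as inputs:

```python
def utilization_and_response(arrival_rate, servers, mu, p_wait):
    """Server utilization (equation 9) and mean response time (equation 10)."""
    rho = arrival_rate / (servers * mu)
    if rho >= 1.0:
        return rho, float("inf")        # unstable: the queue grows forever
    t = p_wait / (servers * mu - arrival_rate) + 1.0 / mu
    return rho, t

# 3 requests/s, 4 servers, 1 render/s each, ~0.51 waiting probability:
rho, t = utilization_and_response(3.0, 4, 1.0, 0.51)  # rho = 0.75, t ≈ 1.51 s
```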

3.6 R2 score

R2 score [8] is a simple yet widely used metric for determining how good a model is. It essentially measures how far off the predictions are from the observed values in the test data, compared to just using the mean of the observed data as the prediction. It is calculated as shown in equation 13, where SStot and SSres are described in equations 11 and 12 respectively.

SStot = Σᵢ (yᵢ − ȳ)²    (11)

where yᵢ is the observed value at position i and ȳ is the mean of the observed values.

SSres = Σᵢ (yᵢ − fᵢ)²    (12)

where yᵢ is the observed value at position i and fᵢ is the predicted value at position i.

R² = 1 − SSres/SStot    (13)

3.7 Mean squared error

Mean squared error (MSE) [4] is a metric describing how far, on average, the model's predictions are from the correct values. MSE is calculated as shown in equation 14, where MSE is the mean squared error, Yᵢ is the observed value at position i, Ŷᵢ is the predicted value at position i, and n is the number of entries in the test data. The reason for squaring the difference between the observed and the predicted values is to "punish" larger errors.

MSE = (1/n) · Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)²    (14)

3.8 Mean absolute error

Mean absolute error (MAE) [19] is a metric describing how far, on average, the model's predictions are from the correct values. MAE is calculated as shown in equation 15, where MAE is the mean absolute error, Yᵢ is the observed value at position i, Ŷᵢ is the predicted value at position i, and n is the number of entries in the test data. MAE only captures the mean magnitude of the errors, not their direction, and does not punish larger errors.

MAE = (1/n) · Σᵢ₌₁ⁿ |Yᵢ − Ŷᵢ|    (15)

3.9 Root mean squared error

The root mean squared error (RMSE) [15] is a metric describing how far, on average, the model's predictions are from the correct values. RMSE is calculated as shown in equation 16, where RMSE is the root mean squared error, Yᵢ is the observed value at position i, Ŷᵢ is the predicted value at position i, and n is the number of entries in the test data. Its main purpose is to make the MSE a little more interpretable: since MSE sums the squares of the differences between the observed and predicted values, it can be hard to interpret. RMSE still punishes larger errors, just as MSE does, but its value is easier to interpret.

RMSE = √MSE = √( (1/n) · Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)² )    (16)
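All four metrics (equations 11-16) can be computed in a few lines of plain Python; the function name and the example values below are illustrative assumptions, not data from the thesis:

```python
def regression_metrics(observed, predicted):
    """R², MSE, MAE and RMSE as defined in equations 11-16."""
    n = len(observed)
    mean_y = sum(observed) / n
    ss_tot = sum((y - mean_y) ** 2 for y in observed)                 # eq. 11
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))   # eq. 12
    mse = ss_res / n                                                  # eq. 14
    mae = sum(abs(y - f) for y, f in zip(observed, predicted)) / n    # eq. 15
    return {
        "r2": 1.0 - ss_res / ss_tot,   # eq. 13
        "mse": mse,
        "mae": mae,
        "rmse": mse ** 0.5,            # eq. 16
    }

m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
# m["r2"] ≈ 0.98, m["mse"] ≈ 0.025, m["mae"] ≈ 0.15
```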


4 Implementation

4.1 System architecture

The system architecture consists of four core parts. There is a client using the web application, which in turn uses the SQS-queue for queueing the requests coming from the client. From the SQS-queue, messages are pulled by rendering machines located within an auto scaling group, so that the number of rendering machines can be increased or decreased (scaled up and down). In Figure 2 below, the work flow of the architecture is described.

Figure 2: Current system architecture in AWS.

4.2 System components

4.2.1 Web server

The web server, implemented using Node.js, hosts a web application from which you can send requests. This corresponds to the service that the product owners supply to their customers, but it is nowhere near as advanced. You can send three different types of requests, and the transport time, render time, and total time of the request will be displayed on the web site. You can also choose to send multiple requests using the multi request feature. In this feature, all responses to the requests are stored temporarily in the web server. When the response to the last request is received, all the data is sent to the client, and some graphs appear that display the development of different data attributes.


4.2.2 SQS-queue

The SQS-queue is a service provided by AWS. The web server sends requests to the SQS-queue, which is of FIFO fashion; that is, the first request that comes in is the first request to be pulled from the queue. The SQS-queue stores the requests that are to be processed by the rendering machines. From this queue you can also get all the necessary attributes, such as queue length. All use of the SQS-queue is handled via the AWS API.

4.2.3 Rendering machine

The rendering machines are located inside an Auto Scaling group, basically because it must be possible to increase or decrease the number of rendering machines in use. The rendering machines themselves are simple Python scripts that pull messages (requests) from the queue, process the request, and wait/sleep for the time that the rendering request is supposed to take. The amount of time a render takes depends on the type of request. Initially, for example, a fast request takes between 0.3 and 0.7 seconds, a normal request between 0.7 and 1.5 seconds, and a heavy request between 1.5 and 8 seconds. These times were all based on numbers that the product owner had provided at the start of the thesis. Due to some delimitations in the thesis, and also due to optimization of the rendering times, these times were later changed; this will be explained later. After the "rendering" is done, the machine adds some information to a message body and sends it back to the web server.
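The sleep-based rendering step can be sketched as follows; the table of intervals mirrors the initial numbers above, while the function name is an illustrative assumption:

```python
import random
import time

# Initial rendering-time intervals in seconds, per request type
RENDER_TIMES = {
    "FAST": (0.3, 0.7),
    "NORMAL": (0.7, 1.5),
    "HEAVY": (1.5, 8.0),
}

def render(request_type):
    """Simulate one render by sleeping for a randomized duration."""
    low, high = RENDER_TIMES[request_type]
    duration = random.uniform(low, high)
    time.sleep(duration)
    return duration
```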

4.3 Simulation

In addition to the "real architecture" there is also a need for a simulation of the process. The reason is that testing a regression model or a solution in real time would be very time consuming. With a simulation that is good and produces results comparable to those from the real scenario, the simulation can be used for testing instead. The results from a simulation corresponding to an hour of user activity can be obtained in only 15-30 seconds instead of waiting for that whole hour.

A simulation is also very useful as it gives the advantage of testing different settings in a way that would not be possible in the AWS architecture. New settings can be implemented and new data quickly derived. Different parameters that might not be easy to replicate in the AWS architecture, but play a big part in the real system, can be added in the simulation as well. The main reason for using a simulation, however, is that it is easy to test different settings and run multiple simulations over a day. That is not something that would be possible using only the AWS architecture, due to the time it would require.

4.3.1 Idea of the simulation

The idea of the simulation is to base it on a table of entries, where each entry corresponds to a time slot. In this simulation each time slot is 0.1 seconds. You can then choose how long to run the simulation, which corresponds to how many time slots the table contains. During the simulation there is, for each time slot, a certain probability that a request will be sent to the system, which means that the system's performance can be tested under different user loads. The simulation also has different types of requests: one "easy" type, one "normal" type and one "heavy" type. Each has a different probability of occurring and a different interval of times it takes. But as already mentioned, this changed later in the thesis.

Applying a request to a rendering machine is done by going through all the rendering machines that the simulation is currently using and evaluating which of them will be available first. The one that is available first is allocated to perform the incoming request, just as in a real scenario, where each rendering machine pulls jobs from a queue as soon as it is idle. Since a rendering machine might be busy when the request comes in, and should perform the new request as soon as its previous ones have executed, the starting time Ts of the request needs to be calculated. That is done with the following equation:

Ts = max(Ta, Ti)    (17)

where Ta is the time when the rendering machine becomes available again, Ti is the current time slot of the simulation, and Ts is the time slot where the render request starts being processed. This is needed because a rendering machine might be idle when a request comes in, in which case the request should start executing at the current time slot and not at a time slot that has already passed. After this, Ta needs to be updated, which is done according to the following equation, where Tr is the time it takes to execute the render on the machine:

Ta = Tr + Ts (18)
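The scheduling step described by equations 17 and 18 can be sketched as follows; the function and variable names are illustrative assumptions:

```python
def schedule(available_at, t_now, render_time):
    """Assign a request to the machine that is free first (equations 17-18).

    `available_at[i]` is the time slot when machine i becomes idle again.
    Returns the chosen machine index and the request's start time Ts.
    """
    machine = min(range(len(available_at)), key=lambda i: available_at[i])
    t_start = max(available_at[machine], t_now)      # equation 17
    available_at[machine] = t_start + render_time    # equation 18
    return machine, t_start

# Machine 0 is idle, machine 1 is busy until t = 2; a request arrives at
# t = 1 and takes 3 time units to render.
machines = [0.0, 2.0]
chosen, t_start = schedule(machines, 1.0, 3.0)   # machine 0 starts at t = 1
```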


In order for the simulation to be similar to the real architecture it also needs a queue. The queue does not really hold any requests, but it still works the same way as it would in the real architecture. For every time slot with an incoming request, the start time of the request is calculated according to equation 17. If the calculated start time is larger than the time slot the request was sent on, the request is put into the queue. For each time slot there is also a check whether any pending requests in the queue are supposed to start at that time slot. If so, the request is removed from the queue; otherwise the queue is left untouched.

The main functionality of the simulation is briefly described above, and as you can probably tell, it is a very simple way of simulating a rather complex issue. It is in my opinion a very good way to do the simulation, as it requires a lot of thinking about what is really happening and which things depend on each other in the real scenario, and the results are presented in a very good and clear way. Another benefit of this simulation is that the data it provides can be used and stored.

4.3.2 Settings in the simulation

In the simulation there are various settings that can be used to generate and compare different data. As mentioned above, you can choose how many entries the simulation should be based on and what load it should use. In addition to this, there are many more possibilities. If desired, the simulation can be executed without any scaling whatsoever, in which case the specified initial number of rendering machines is used throughout the whole simulation time.

There is also a parameter that specifies the minimum number of rendering machines in the simulation. For example, you can allow scaling but never go further down than, say, x rendering machines. This was implemented because it is necessary in an architecture like this: there might be a tight lower bound on how many rendering machines you want allocated, as there may be severe consequences if you go below that limit.

When it comes to load, you can choose the load that you want. The load corresponds to the probability of an entry turning into a request. Hence, for example, a static load of 30% indicates that on average 3 requests per second will be sent. You can also have a dynamic, varying load during the execution of the simulation, which was implemented to test how the results turned out with a dynamic load instead of a static one. This is done by setting an interval within which the load can vary, and then specifying the rate at which the load should change. For example, you can say that for every 200 seconds in the simulation the load should increase by 1%. The variation of the load is constructed in such a way that it changes like a sinusoid. In a load interval between x and y, the load starts at x, increasing by 1% every t seconds until y is reached. After y is reached, the load decreases by 1% every t seconds until x is reached again, and it proceeds like that throughout the whole simulation.
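The variation described is, strictly speaking, a triangle wave rather than a true sinusoid. A sketch with illustrative names and numbers (not code from the thesis):

```python
def load_at(elapsed_seconds, low, high, step_seconds):
    """Load in percent at a given simulation time: rises 1 percentage point
    every `step_seconds` from `low` to `high`, then falls back, repeating."""
    span = high - low
    steps = elapsed_seconds // step_seconds
    phase = steps % (2 * span)
    return low + phase if phase <= span else low + 2 * span - phase

# Load varying between 20% and 30%, changing 1 point every 200 s:
load_at(0, 20, 30, 200)      # 20
load_at(2000, 20, 30, 200)   # 30 (peak reached)
load_at(3000, 20, 30, 200)   # 25 (on the way back down)
```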

There are also a lot of settings when it comes to delays and times. For example, you can specify how long the boot-up time of a rendering machine is. Let's say that you decide to scale up the number of rendering machines at time t, and the boot-up delay of a rendering machine is tb. Then the time when the rendering machine is actually up and running is ts = t + tb. There is also a corresponding shut-down delay of a rendering machine that you can specify, which is exactly the same thing but comes into action when you are scaling down the number of rendering machines in the architecture. This is a very important implementation, as there is a pretty hefty boot-up time on the rendering machines, since they need to download a lot of files before they are ready for rendering.

Another time setting is an attribute that determines how much time is needed between evaluations of the architecture. This means that after a machine has been taken up or down from the architecture, there must be some time before the next evaluation is done, since you want to wait a moment to actually see what impact the change had on the response times, the queue size, and so on.

4.3.3 Data in the simulation

In this section all the data that is generated and used for each request in the simulation is listed. After the listing it is also stated why these data attributes were generated.

1. Time

• The time where a request came into the system

2. Load type

• The load type of the incoming request

3. Render machine

• Which rendering machine is performing the render

4. Start time


• What time the request will start being processed by a rendering machine

5. Rendering time

• How long the rendering will take for the rendering machine

• Depends on the load type

• Randomized within some time intervals

6. Done time

• What time the request will be returned with a rendered picture

• Calculated by: start time + rendering time

7. Response time

• How long it took from an incoming request to an outgoing response

• Calculated by: done time - time of request

8. Number of rendering machines

• The number of rendering machines that are currently active

9. Queue size

• The number of requests that are currently queued, waiting to start.

10. Queue size per rendering machine

• An estimation of how many requests that are queued to each rendering machine

• Calculated by: queue size / number of rendering machines

11. Mean response time over the last 10 requests

12. Erlang C

• The value obtained from equation 6

13. Service rate

• The value obtained from equation 7

14. Server utilization

• The value obtained from equation 9

15. Average response time

• The value obtained from equation 10

16. Load recent 10 seconds


• The load in percent the recent 10 seconds

• Calculated by: number of requests / number of entries over the latest 10 seconds

17. Change in queue size per rendering machines

• The change in percent of the variable described above (queue size per rendering machine)

• Measured over a specified time interval, usually 10 minutes

All these attributes were generated with the regression model in mind. The desire was that some of these attributes could be used in a multiple linear regression model as one of the dependent variables, in order to model the response times more accurately. However, the correlation was not the best for some of them, and the ones that actually were good for this purpose often violated the multicollinearity property of a regression model, which is understandable since most of the theoretical attributes are highly dependent on the number of rendering machines.

4.3.4 Outputs from the simulation

The primary output from the simulation is the table used in the simulation. In the output table, all time slots without requests are stripped off, as they are not very interesting. A couple of plots, all based on the contents of the table, are also generated. The table is saved as an Excel file and the plots are saved as pictures.

4.4 Single request in AWS-architecture

When performing a single request in the "real system", the execution starts with the client requesting a specific type of request. As has been mentioned a couple of times already, there are "FAST", "NORMAL" and "HEAVY" requests. Before the request is sent from the client, the request body is created. That body consists of two things: the current time in milliseconds since 1970 and the type of the rendering request. The client's request is then sent to the web server, which processes the information in the request body, creates a new message body that includes the information sent from the client, and forwards the message to the SQS-queue.

Since the response will not come from the SQS-queue, but from a rendering machine to which there is no active connection, it is necessary, as stated in the paragraph above, to include some more information to be sent to the rendering machines via the queue. This information is, for example, the IP address of the web server, so that the rendering machine knows where to send the "response", and also the size of the queue (how many messages are in the queue at the time the message is sent to the queue).

In addition to creating a new message body, the response object to the client needs to be stored with some identifier. That is because, as described earlier, the response from the rendering machine will come in at a specific route in the web server. In order to determine which client the information in the received request should be sent to, the response object from the client's request, along with the message ID returned from a successful publication to the SQS-queue, is stored in a data structure.

From the rendering machines' perspective, they continually pull for messages from the SQS-queue when they are not doing work. Hence, as soon as a message arrives in the queue, it is received and processed by a rendering machine. The rendering machine then evaluates what type of rendering to do and performs the render, which in this case corresponds to randomizing a time within the interval that maps to the request type and sleeping for that randomized time. After that, the rendering is considered done, and the time it took to perform (sleep) the request is stored in the response body, along with some other information, for instance the ID of the pulled message.

When a request from a rendering machine comes in at the web server, the first thing that is done is a lookup in the data structure storing the response objects. Since there is an identifier (the message ID) both in the data structure and in the incoming request body from the rendering machine, the lookup can easily be performed, the response object for the client located, and the response sent to the client using that response object.

When the response finally arrives at the client, some computations are done using the times stored in the request body and the time when the response arrives at the client (in milliseconds since 1970). The rendering time does not need to be calculated, as it is already in the response body, and the transport time can be calculated using TA, TS and TR, where TA is the arrival time of the response, TS is the time when the request was sent and TR is the rendering time. The transport time can then be calculated using the following equation:

TT = TA − TS − TR (19)

And the total time T of the request can be calculated using the following equation:

T = TA − TS (20)
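Equations 19 and 20 in code form, with illustrative names and timestamps in milliseconds since 1970:

```python
def transport_and_total(t_arrival, t_sent, t_render):
    """Transport time (equation 19) and total request time (equation 20)."""
    total = t_arrival - t_sent            # equation 20
    transport = total - t_render          # equation 19
    return transport, total

# Sent at t=1000 ms, arrived at t=1800 ms, render took 500 ms:
transport_and_total(1800, 1000, 500)   # (300, 800)
```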


In Figure 3 below, it can be seen what the output in the web application looks like.

Figure 3: The result from a single request.

4.5 Multi request implementation

When a multi request is executed, the flow of the requests is visualized in Figure 4. In that figure, the notation on each edge corresponds to how many requests are made between the connected instances during the execution of a multi request, where x is a positive number, x ≥ 1.

Figure 4: Request flow of multi request.

When implementing this feature, the goal was to make it as similar as possible to the simulation: the user decides a specific load from the client, and the execution is based on entries (basically a measurement of how long the execution will go on). The reason it is advantageous for the multi request feature to be as similar to the simulation as possible is that then the outputs from the two can be compared.

First of all, the client sends a request to the web server indicating that a multi request should be performed. From the client it is also stated how many entries are wanted during the execution and what static load percentage is wanted, which can be seen in Figure 5. Then a spinner with a "Loading" text is placed on the web page, indicating that the request is being handled and that the user should wait for it to return. This can be seen in Figure 6.

Figure 5: The page of the multi request feature.

Figure 6: The multi request-page after the request has been sent.

At the web server, the request from the client is received, indicating that there should be multiple requests sent to the rendering machines. Just as in the case of a single request, the response object along with a specific identifier needs to be stored. The identifier is now not the message ID, since there will be multiple messages sent to the queue, but a random ID associated with the multi request. In addition to the response object, there is also a need to store the data that will be produced by the rendering machines, so empty lists that will hold the produced data also go into the data structure.

From this point the web server starts sending requests, done by recursively calling a method x times, with a delay of 100 milliseconds between each call, where x is the number of entries the user wants to run. For each of these entries, a random number between 1 and 100 is generated, and it is checked whether that number is less than or equal to the load percentage the user provided. If it is, a request is sent to the queue; if not, nothing is done for that entry. This causes a probability of z percent that an entry results in a request, where z is the load percentage the user specified. For each entry that turns into a request, a body is generated with some data; the queue size and the number of active rendering machines are, for instance, some of the data included in the body.
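The per-entry decision can be sketched like this; the web server itself is written in Node.js, but the sketch is in Python like the other examples, and the function name and seed handling are illustrative assumptions:

```python
import random

def entries_to_requests(num_entries, load_percent, seed=None):
    """For each entry, draw a number in 1..100 and send a request iff
    the draw is at or below the load percentage."""
    rng = random.Random(seed)
    return [rng.randint(1, 100) <= load_percent for _ in range(num_entries)]

sent = entries_to_requests(1000, 30, seed=1)
# Roughly 30% of the entries become requests.
```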

For every response/request received from a rendering machine, implying that a render has been done, the results of the render are mapped to the supplied ID included in the body of that request, so that they are mapped to the correct client. The multi request is considered done when the last entry has been received at the web server; hence, the last entry must turn into a request before it can be guaranteed that the process is finished. So, when the last entry is received at the web server, it is known that the process is done, and by using the response object stored under the ID in the request body, it is easy to locate both the data and the response object. The generated data is then sent back to the client.

When the response to the client finally arrives, the received data is used to plot some graphs. The graphs are produced using Chart.js, which is a convenient module for plotting graphs in a JavaScript web application. The results of this can be seen in Figure 7; please note that the data shown can be changed using the buttons below the graph.


Figure 7: Client side after response.

4.6 Regression model

The implementation of the regression model itself is very easy nowadays. There are many libraries available both for constructing models and for evaluating them. Some libraries were also needed for handling the data set and for plotting graphs. The libraries used for all these purposes will be gone through one by one, with some comments about why and for what they were used.

Pandas is a library that most readers will probably be familiar with, as it is the most widely used library for dealing with data frames of any kind in Python. This library was used for so many things that it would be ridiculous to bullet-point all of them. In general, it was used to pre-process the data, which corresponds to loading the data from the Excel file into a pandas data frame, checking and converting the types of the data in the data frame, checking whether there are any null values or corrupted rows in the table, and so on. Pandas was also used to evaluate the data in the data frame, as the library holds useful functionality such as the possibility to generate a correlation matrix of the columns of a data frame. There are also functions to describe the values of the columns in the data frame, from which you get statistical data such as the mean, standard deviation, minimum value, maximum value, etc. The pandas library contains numerous functions that came in handy during the implementation of the regression model, and the ones mentioned are just a few of those used.
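As an illustration of the describe and correlation-matrix functionality, here is a small sketch; the column names and values are made up, loosely mirroring the attributes listed in section 4.3.3:

```python
import pandas as pd

df = pd.DataFrame({
    "response_time": [0.8, 1.1, 1.6, 2.4, 3.0],
    "queue_per_rm":  [0.5, 1.0, 1.5, 2.5, 3.0],
    "load_recent":   [10, 35, 20, 50, 40],
})
print(df.describe())        # mean, std, min, max, ... per column
corr = df.corr()            # pairwise correlations between columns
# A value close to 1 between response_time and queue_per_rm indicates a
# strong linear relationship worth modelling.
```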

Matplotlib is another library that most readers are probably familiar with. One very important aspect when creating a regression model is to understand and know your data, and what better way could there possibly be than to use plots in order to actually see what you are working with. Even in combination with statistics and reading material, you would never be able to understand what is happening, and why you might not get the results you wanted, without plotting graphs of your results and progress along the process of creating your regression model. At least that was a big problem for me in the beginning, as my experience in machine learning was very limited.

A very useful way to start when picking the dependent variable is to plot all the columns of the data frame against the independent variable. From these plots one can evaluate whether there are any interesting patterns or correlations between the variables. An example of such a plot is shown in Figure 8, where there clearly is a linear relationship between the independent variable response time and the dependent variable Queue/RM. These kinds of plots were used before any verification was done with statistical tools, such as correlation matrices or the variance inflation factor. This decision was made so that I could understand, and try to predict, which attribute would be good as a dependent variable; statistical tools could then be used to verify (or disregard) my own predictions.

Figure 8: Linear relationship between independent variable and dependent vari-able.

This was just one example of how plots have helped in the process of implementing the regression model. More or less every statistical calculation has had a plot trying to explain or predict its outcome, as a way of learning what everything is describing and doing. Plots in general have been a key factor during the whole process and have been used in many cases; some of them will be shown later in this report, such as the comparison between the predictions and the observed values, which has been used for every created model, as that is in my opinion the best way to see how a model performs. Many statistical measurements will not tell you the whole story: computing the mean squared error, mean absolute error, root mean squared error, R2 score or some other metric of your model might give you the impression that the model is completely useless, but that might not always be the case, even if the results on those metrics are bad. From such a plot you might also see anomalies and be able to draw conclusions about why your model is being punished or performing badly in certain situations.

When it comes to the statistical analysis of a model, one of the libraries that has been used is statsmodels.api, as it provides tests for some of the things that are explained in the theory section, such as kurtosis and skew. It was very easy to use and to get started with; rather than implementing the maths yourself, it is very neat to use an existing library.

The creation of the model is done by using another library that many people are familiar with: sklearn, from scikit-learn, was used for this purpose, as it provides all the necessary but very time-consuming operations for splitting your data into training and testing sets, and for creating, training and using your model. The whole process of doing these things is just a few lines of code, which is very convenient.
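A hedged sketch of those few lines, using synthetic data in place of the simulation output (the variable names and numbers are illustrative, not the thesis's exact code):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data shaped like the thesis's setup: queue size per rendering
# machine as the predictor, response time as the target.
rng = np.random.default_rng(1)
X = rng.uniform(0, 20, (1000, 1))                      # queue size per RM
y = 1.01 * X[:, 0] + 0.76 + rng.normal(0, 0.3, 1000)   # response time

# Split, create, train and use the model in a few lines.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(model.coef_[0], model.intercept_)  # close to 1.01 and 0.76
```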

When the model is evaluated and everything looks fine, the model is saved to a binary file by using a built-in Python library called pickle. This is also very convenient, as it makes it possible to save different models and re-use or import them in other Python projects, such as in my simulation that was going to use the model later on.
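The save/re-use round trip can be sketched as below; the file name "model.pkl" is illustrative, and the tiny model here is fitted inline just to make the example self-contained.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Fit a trivial model so the example stands on its own.
X = np.arange(10, dtype=float).reshape(-1, 1)
model = LinearRegression().fit(X, 2.0 * X[:, 0] + 1.0)

# Persist the fitted model to a binary file.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later, in another project (e.g. the simulation), load and use it.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict([[5.0]])[0])  # ≈ 11.0
```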

After the creation of a notebook file that did all the above described work for me, all that needed to be done was to change which data set should be used. Since each run of the simulation stores its result in an Excel file, all that was needed was to copy this Excel file and import it in the notebook. The solution turned out to be very neat and made it possible to change and compare results from different data sets in a very short time.

4.7 Scaling

During the simulation there were some decisions to be made regarding the scaling. The most crucial one was to decide when scaling should be done during run time of the simulation, when generating data. Different models/rules have been used throughout the thesis, so they might be of interest when discussing the overall implementation of the solution.

There are different rules for up- and down-scaling, as down-scaling is more sensitive to wrong decisions in terms of their effect on the response times. In general, down-scaling has been done when the queue size has been 0 for a specified amount of time, since you really do not want to scale down the number of rendering machines before you have worked down the size of the queue and kept it in the region of 0 for a while. Scaling down when the server utilization has been below a certain level for a specific amount of time has also been tested. The reason that you want to check your chosen parameter over time is that it eliminates the risk of making a rushed decision that might not be the correct one.

For up-scaling there have also been some different types of implementations. Before the regression model itself was used in the simulation, the queue per rendering machine parameter was used for the most part: when it had been averaging above a certain threshold over a certain time period, an up-scaling was done. Server utilization has also been used as an up-scaling parameter, as it describes the current situation of the system in terms of load and available capacity pretty well. In that case you also look at the server utilization over a certain time period, and compare it to a threshold that has been picked, 80% for instance. Usually the time period used for up-scaling has been shorter than the one used for down-scaling, as it is not as important to make the correct decision; having more rendering machines than strictly needed is just better, although more pricey. It is also more important to make a quick decision in the case of up-scaling than in the case of down-scaling.
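The window-based rules described above can be sketched as follows; the thresholds, window lengths and function names are example values and assumptions, not the exact ones used in the thesis.

```python
from collections import deque

# Example values: up-scaling uses a shorter window (quick decision),
# down-scaling a longer one (cautious decision).
UP_THRESHOLD = 2.0   # queue size per rendering machine
UP_WINDOW = 30       # samples averaged for the up-scaling decision
DOWN_WINDOW = 90     # samples required at 0 before scaling down

history = deque(maxlen=DOWN_WINDOW)

def scaling_decision(queue_per_rm_sample):
    """Return 'up', 'down' or 'hold' based on the recent history."""
    history.append(queue_per_rm_sample)
    recent = list(history)[-UP_WINDOW:]
    # Scale up when the parameter has averaged above the threshold
    # over the (shorter) up-scaling window.
    if len(recent) == UP_WINDOW and sum(recent) / UP_WINDOW > UP_THRESHOLD:
        return "up"
    # Scale down only when the queue has stayed at 0 for the whole
    # (longer) down-scaling window.
    if len(history) == DOWN_WINDOW and max(history) == 0:
        return "down"
    return "hold"
```

Feeding a long run of zero-queue samples yields "down", while a sustained average above the threshold yields "up"; anything in between holds the current number of machines.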

When the regression model came to use in this scenario, there was a need to understand how the dependent variable will change over time. Without that information there is no value in using the regression model, as the whole intention is to forecast what the response times will be in the future. This was done by using the change in percent of the dependent variable: by looking back, say x minutes in time, and comparing that value of the dependent variable to its current value, the change in terms of percent becomes, say, h. By doing this it was possible to forecast, by some means at least, the change in response times by using the dependent variable, increased or decreased by h%. In the coming section it will be described why this is not a feasible solution in my case.
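The percent-change extrapolation can be sketched as below. The function name is an assumption, samples are assumed evenly spaced in time, and `lookback` is the number of samples corresponding to the chosen x minutes.

```python
def forecast_dependent_variable(samples, lookback):
    """Extrapolate queue size per rendering machine by its recent
    relative change, assuming the same change continues."""
    past = samples[-lookback - 1]
    current = samples[-1]
    if past == 0:
        return current          # no relative change can be computed
    h = (current - past) / past  # change over the window, as a fraction
    return current * (1 + h)     # project the same relative change forward

# e.g. the value grew from 2.0 to 3.0 over the window (h = 0.5),
# so the forecast is 3.0 * 1.5 = 4.5
print(forecast_dependent_variable([2.0, 2.5, 3.0], 2))  # 4.5
```

The forecast value would then be fed into the regression model to obtain a predicted response time.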


5 Evaluation

5.1 Regression models before delimitation

The initial regression models were not performing in a sufficient way whatsoever. Pretty much none of the requirements described in the theory section were met (kurtosis, skew, autocorrelation, heteroscedasticity). A lot of different models were tested, but none of them seemed to work. The reason for this result was pretty obvious, but it was not anything that should be changed at the time.

As mentioned above, the reason was pretty clear. The interval of the rendering times was way too large, as they could vary from 0.5 seconds all the way up to 8.0 seconds. This caused the results from the simulation to be very random, as one could, in theory, end up being ”unlucky” by having several 8.0-second jobs in a row. Even if the queue was not large when those ”unlucky” jobs came in, they would become a bottleneck for the entire system, as they take so long to complete. Imagine a situation with 30% load, 3 rendering machines that are all available, and an initial queue size of 0, where 3 of the 8.0-second rendering requests come in to the system at the same time. In this situation, on average 3 new requests would come in for every second that those render jobs occupy the rendering machines, leaving us with an average queue size of 24 when those jobs are finished. Then one also needs to take into account that some of those 24 queued jobs are probably other ”heavy” jobs, while the majority are probably ones that take below 1.5 seconds to complete.
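The queue build-up in that scenario is a simple back-of-the-envelope product; the names below are illustrative, not from the simulation code.

```python
# Three 8.0-second jobs occupy all three rendering machines; at 30% load
# with one entry per 0.1 s, about 3 new requests arrive per second.
render_time_s = 8.0   # duration of one "heavy" rendering job
arrival_rate = 3      # average incoming requests per second at 30% load

queue_after_heavy_jobs = arrival_rate * render_time_s
print(queue_after_heavy_jobs)  # 24.0
```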

Because of the combination of the arrival of the different rendering types and the wide interval of rendering times, there are basically too many random factors that play a part in the response times, which makes it very hard for a regression model to find any correlation between different attributes. Looking back at the results that were reached, the highest correlated attribute from one simulation was in fact only 33% correlated with the response times. In short, there was basically too much randomness with a severe impact on the results, such that the regression model failed to make any predictions that were anywhere near acceptable.

When this insight was reached, the decision to shrink the interval of render times was made in order to see if that would help the results of the model. At the company there was also ongoing work to optimize the rendering times, something that was completed about a week after the decision was taken. Instead of having different types of requests that take different amounts of time, all the requests from this point take between 0.5 and 1.5 seconds, and the distribution of the rendering times is rectangular within that range.


5.2 Regression model after delimitation

After the delimitation was done, the process of creating new regression models started. For a variety of different data sets generated from the simulation, a regression model was created and evaluated for each and every data set. Ideally the data sets would be of the kind where there are varying response times; not too large response times, but not only response times in the range of 0.5 to 2 seconds either. The data set should provide response times in a range that is not unrealistic to obtain in a real scenario.

The models were generated and evaluated, and one of them seemed to perform well according to some statistics and graphs. The regression model that seemed to perform pretty well was the simplest possible: a linear regression model with response time as the independent variable, and queue size per rendering machine as the dependent variable. The model is described in Equation 21, where Y is the response time and x is the queue size per rendering machine.

Y = 1.01x + 0.7585 (21)

The R2-score of the model is in the region of 0.95 to 1 for every data set that it has been tested on, with the exception of some specific data sets, but that will be explained later. The mean squared error, mean absolute error and root mean squared error vary a little bit depending on the data set. For a data set with normal intervals (response times within 0 - 20 seconds) they are somewhere around 0.3 seconds, which is pretty good. However, for data sets with larger and very small time intervals these metrics punish the model very hard; further down it will be explained why that happens, and why it is still an acceptable solution.
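Evaluating the fitted model against observed response times can be sketched like this; the data below is synthetic (noise level and ranges are assumptions), so the metric values only illustrate the kind of numbers reported, not the thesis's actual results.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Synthetic "observed" response times around the line from Equation 21.
rng = np.random.default_rng(2)
queue_per_rm = rng.uniform(0, 15, 500)
observed = 1.01 * queue_per_rm + 0.7585 + rng.normal(0, 0.4, 500)

# Predictions from Equation 21.
predicted = 1.01 * queue_per_rm + 0.7585

mse = mean_squared_error(observed, predicted)
mae = mean_absolute_error(observed, predicted)
rmse = np.sqrt(mse)
r2 = r2_score(observed, predicted)
print(round(r2, 3), round(mae, 3), round(rmse, 3))
```

With a wide spread of queue sizes the R2-score lands near 1 even though the absolute errors are a few tenths of a second, which mirrors the behaviour described above.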

Below this paragraph some plots are presented of a situation where there is a normal interval of response times. Normal in this case means that the response times are pretty realistic, and one should never end up that far off from these response times, as the work of the whole thesis aims to forecast and remove the extreme response times. There will be two plots presenting data and predictions from two identical simulations. During the simulation there was no scaling, a static load of 30% and 3 rendering machines active. There will also be one histogram that corresponds to the predictions of Figure 10.


Figure 9: No scaling, static load of 30%, 3 rendering machines.

Figure 10: No scaling, static load of 30%, 3 rendering machines.


Figure 11: No scaling, static load of 30%, 3 rendering machines.

The first two graphs describe the predictions (in green) versus the actual values (in light red). The dark red is where the predictions and the observations intersect. The purpose of these graphs is to demonstrate how far off the predictions are from the obtained values of the response times, and as can be seen, the predictions follow the observations pretty well. The histogram describes on how many occasions the errors of the predictions were within a specific time interval. With this data set the R2-score of the model was 0.95, the MSE was 0.215 seconds, the MAE was 0.372 seconds and the RMSE was 0.462 seconds. Below this paragraph there are plots of another data set with different characteristics but somewhat similar results.


Figure 12: Scaling based on server utilization, static load of 30%, 3 initial rendering machines, boot-up delay of 5 minutes.

Figure 13: Scaling based on server utilization, static load of 30%, 3 initial rendering machines, boot-up delay of 5 minutes.

In this data set there was scaling. The scaling was based on server utilization, the load was set to be static at 30% throughout the whole simulation, there were 3 initial rendering machines at the start of the simulation, and there was also a boot-up delay of 5 minutes when the number of rendering machines was scaled up. The R2-score of the model was 0.94, the MSE was 0.176 seconds, the MAE was 0.339 seconds and the RMSE was 0.42 seconds. The predictions and observed response times in this data set seem to be somewhat similar to the data set described above. The predictions follow the observed response times pretty well, and although they might be somewhat off in some areas, they are overall pretty accurate.

However, for the response times located somewhere between 0.5 and 1.5 seconds, the predictions are not looking all too good. The data set presented in the plots below is used in order to explain this phenomenon in more detail.

Figure 14: No scaling, varying load between 10% and 50%, 5 rendering machines.


Figure 15: No scaling, varying load between 10% and 50%, 5 rendering machines.

This data set was generated by a simulation with no scaling, the load varying between 10% and 50%, and 5 rendering machines throughout the whole simulation. As can be seen, the regression model is not that accurate anymore. With this data set the R2-score of the model was 0.62, the MSE was 0.151 seconds, the MAE was 0.318 seconds and the RMSE was 0.389 seconds. That can be explained by looking at the generated model described in Equation 21: when the parameter x (queue size per rendering machine) is equal to zero (x = 0), the prediction will be 0.7585. However, the rendering times still vary between 0.5 and 1.5 seconds, with a rectangular distribution. As can be seen though, the predictions are still pretty good in terms of following the shape of the observed response times when the response times are outside of the interval of the rendering times. This only occurs when there are rendering requests in the queue, such that the dependent variable is not equal to zero.

In this specific domain this is not a problem. That is because, as stated above, the model is only inaccurate in these regions of response times when the queue size is zero. And when the queue holds no requests, there is no need to scale up the number of rendering machines. The whole purpose of the work in this thesis is to be able to forecast outlying/increasing response times. Clearly the predictions presented in the plots above follow the shape of the observed response times as long as the queue size is not zero. The model is not 100% accurate, but it is more than capable of doing a fine job for the purpose of the thesis.

With the following two data sets, the situation with extremely big intervals of observed response times is evaluated and explained. Below there will be 6 plots, describing two different data sets that have the same characteristics of the observed response times.


Figure 16: Scaling based on server utilization, varying load between 20% and 35%, 3 initial rendering machines, boot-up delay of 10 minutes.

Figure 17: Scaling based on server utilization, varying load between 20% and 35%, 3 initial rendering machines, boot-up delay of 10 minutes.


Figure 18: Scaling based on server utilization, varying load between 10% and 50%, 3 initial rendering machines, boot-up delay of 5 minutes.

Figure 19: Scaling based on server utilization, varying load between 10% and 50%, 3 initial rendering machines, boot-up delay of 5 minutes.

Now there are two different data sets that are somewhat similar in terms of the interval of the observed response times. First of all, the mean squared error, mean absolute error and root mean squared error all give very unpleasant values. The returned values of these metrics for the two data sets are all in the interval 2.47 ≤ x ≤ 596, which is not very optimal. That is because the errors are extremely large in some specific cases, as can be seen in Figure 18 and in Figure 16. However, it can also be seen in Figure 19 and in Figure 17 that large errors do not occur often at all; nonetheless the errors are so significant that they generate extremely high values on all the metrics mentioned above. For R2-score, the model used in Figure 18 scored 0.94, and the model used in Figure 16 scored 0.98, and that is not bad at all, in fact it is a very good score. That is because there are so many predictions that are good, that the ones that are really bad are compensated for.

The reason for all those abrupt ”cut off” predictions that can be seen in Figure 18 and in Figure 16 can be explained by yet again looking at the regression model described in Equation 21. Also, recall that x in the model is the parameter queue size per rendering machine. The plots below present that parameter, and the number of rendering machines over time, for the same data sets that are used in the figures above.

Figure 20: Queue size per rendering machine of the data set used by the model in Figure 16.


Figure 21: Number of rendering machines of the data set used by the model in Figure 16.

Figure 22: Queue size per rendering machine of the data set used by the model in Figure 18.


Figure 23: Number of rendering machines of the data set used by the model in Figure 18.

It can be seen in Figure 22 and in Figure 20 that the parameter queue size per rendering machine changes extremely drastically when the number of rendering machines changes (Figure 23 and Figure 21). This has not been discovered earlier because the parameter is only affected in this extreme manner when the queue size is extremely big. If this was not the case, then the predictions would not be that extremely volatile in the case of an up- or down-scale, as has been seen earlier in Figure 12 for example.

This is not a problem, since there will never be situations where the response times are between 50 and 250 seconds. In this specific domain the whole purpose is to eliminate and avoid response times that are long, and response times within 50 to 250 seconds are beyond long. Before those response times are ever reached, the number of rendering machines will be scaled up and the response times will be reduced. Hence it does not really matter that the model can be pretty inaccurate when response times are above 50 seconds, since there will (should) never be a situation where those response times occur. Even if a situation like that occurs, the model would still be capable of saying that the predicted response time will be well beyond an acceptable response time, which is the whole purpose of the model.

It would be wise to remind the reader about what is actually described in these plots. In these plots, it is validated that the model can actually be used to predict response times, by comparing the predicted values from the test set to the observed values of the response times (the actual response times generated from the simulation) in the test set. The plots above are not a test of the actual solution, but rather just a test that the model that should be used in the solution can be used. Hence there are some ”dummy” down-scales done in some of the simulations in order to create specific scenarios. It should also be noted that the load used in the plots is not anywhere near realistic, as it in some cases increased or decreased by 1% every 10th second. This was done because the whole purpose was to test that the model performs well, and that it keeps pace with the observed response times on all occasions.

5.3 Final solution

In this section the results after the model was implemented and used in the simulation will be discussed. The model was implemented and used in the simulation in order to try to predict the response times and scale up the number of rendering machines if necessary. The implementation of this is briefly described under the ”Implementation” section and the subsection ”Scaling”. Briefly, in order to predict the future response times, there is a need to predict how the parameter queue size per rendering machine will change in the future. This was done by using the change in percent over time of the parameter, and ”taking for granted” that it will change in the same way in the future. However, that is not really the case. Since the load of the simulation is purely based on randomness when it comes to requests, there is actually no correlation between what happened 10 minutes ago and what will happen 10 minutes in the future. Because of that, the results from this were not really what was expected.

The evaluation of the outputs from the simulations was done by running two simulations: one with scaling where the regression model was used in the above mentioned fashion, and one where the up-scaling was decided in a reactive manner when the parameter queue size per rendering machine reached a certain threshold (threshold = 2.0 for most cases). Other than this, the simulations share the exact same properties. The simulations of the plots presented below have the following properties:

• 50000 entries

• Varying load between 20% and 40%, increasing by 1% every 200 seconds

• 7.5 minutes boot-up delay

• 5 minutes between each evaluation of the architecture

• 3 initial rendering machines

• 3 as the minimal number of rendering machines


After the simulations were done, the output of the response times over time was compared in order to see if there were any differences in how the response times progress. Keep in mind though that there is still a certain amount of randomness that impacts the result in both of the simulations.

Figure 24: Result from simulation with the proactive solution.

Figure 25: Result from simulation with the reactive solution.


In general, the solution using the regression model to predict the response times, and scale up the number of rendering machines based on that, did better than the reactive counterpart. But although it was better in almost every case, it was not optimal by any means. That is because of the fact mentioned in the first paragraph of this section: there is no real correlation between what happened 10 minutes ago and what will happen 10 minutes in the future.

For a proper, or more correct, evaluation to be done in the simulation, there is a big need for more data. The product that this solution is set to be applied on is yet to be released. Because of that, there is no user data available. If it was known, for example, how the user activity is distributed across different hours of the day, different days of the week, different times of the year and so on, it would be a lot easier to know how the dependent variable in the model is expected to change. If that was known, the above described model could be used in order to predict the response times.


6 Discussion

6.1 Reactive vs proactive

Since the solution that was made is not very easy to verify, as stated earlier, it is of great interest in the thesis to discuss and compare the different approaches and their respective impacts. Assume that we have a reactive auto scaling solution that scales up when some parameter x is beyond a certain threshold, and a proactive auto scaling solution that can be used with some historical user activity data.

Now, let's start with the scenario where we are only using a reactive solution. How would that impact the response times of the system? We can not say anything explicitly, as it depends on the user activity (load). But let's assume that we have an increasing user activity, so the load slowly and steadily increases over time. Then, once the threshold of the reactive solution is reached, there will be a severe impact on the queue size before the rendering machine is booted (recall that the boot time is somewhere around 5-10 minutes under normal circumstances). The only real way to compensate for this would be to set the threshold very low, so that a machine might boot up before it is actually needed. But that comes with a lot of different drawbacks. Let's now assume that there is just a peak of load occurring at a certain time, and the load then decreases significantly after that. We would then boot up another machine without actually having the need for another machine, which in turn costs a lot of money. Moreover, if the threshold is set at a very low value, there would probably be a lot of cases with an ”unnecessary” up-scale. That is because the threshold is set at a value where scaling is actually not needed, but it has to be there to compensate for the boot-up delay.

There are certain things that you can not predict though; situations where a ”random” load spike appears can not be predicted, for instance. That would punish the reactive solution because of the boot-up delay, but what about the proactive solution? Well, since the proactive solution is based on events that have occurred before, it will not evaluate what is happening right now. It can predict what will happen right now, but as the load spike was ”random” it won't be predicted. In fact, the proactive solution would probably face even more trouble than the reactive solution in this case, as the reactive one monitors what is happening at the current time whilst the proactive one uses historical data.

What would happen then if we used both a reactive and a proactive solution? Although we are evaluating these different implementations one by one, there is nothing that says that they can not be combined. Now, what would that mean in terms of performance? It would actually solve both of the problems mentioned above, as the ”random load spikes” would be acted upon by the reactive solution, and the ”general load” would be handled by the proactive solution. Although the random load spikes would still cause long(er) response times due to the boot-up delay, that is the best we can do since those are just randomly occurring. However, these unpredictable extreme load spikes are not very likely to occur in a real scenario.

6.2 Solution

The problem described in the problem definition was solved using the implementations that are mentioned in the implementation section above. However, the AWS architecture was barely used after coming up with a solution; that is due to the fact that the solution was not really a verified solution. But since a lot of that has already been covered in the section above, where the evaluation of the solution was discussed, the focus of this section will instead be a discussion of the entire work during the thesis.

When it comes to the AWS architecture, it turned out to be of no use whatsoever in the thesis, as it was more or less only implemented so that the solution could be tested and verified with it. Since time ran out, and due to the lack of user activity data, the solution that could be put in the AWS architecture would not be sufficient to produce any interesting results, as it can already be seen in the simulation that the solution can not be verified to an acceptable extent.

The simulation is something that works extremely well, and it took a lot of time to complete it. But those two things usually go hand in hand: in order to get something to work well, a lot of time often needs to be put into it. Even though it is very nice that the simulation turned out to be good and handy, it is a little bit annoying that it took longer than expected to get it working properly. It did not necessarily take that much time to get it running from scratch, but several bugs were found and needed to be addressed during the use of the simulation further on in the thesis.

6.3 What could have been done differently?

6.3.1 Planning

Planning is essential when you start a pretty big project, and in my project I was very lost in the beginning. For instance, from the start it was said that the simulation would be done in NS-3 (a network simulation tool), which I spent a lot of time trying to figure out how to use and how it could fit in my project. I did not really understand how that was going to work out; I did a lot of research in the first couple of weeks and did not find any usable examples. It should have been planned better from the beginning, and I should have stated as soon as possible that I was neither comfortable with nor convinced about using NS-3, which I was pretty aware of even when it was brought up as an option.

That is just one example where the planning was pretty insufficient. As it turned out, pretty much everything I did in the first two weeks was of no use whatsoever. I started implementing things that I thought would be of use later in the project, but which were never used. Instead it would have been much better to spend time on really trying to understand how I should build up my AWS architecture, which I also did, but not as much as I should have. It was not very easy either to ”trial and error” with the AWS architecture, as I did not have access to an AWS account until the fourth week of the project.

6.3.2 Multi requests from client

If I were to redo one thing in the project, I would probably have gone for a different approach when it comes to sending multiple rendering requests from the web application. Rather than implementing a whole new solution in the web server, which took a very long time due to the asynchronous behaviour of Node.js, I would have just reused the already existing solution for sending a single request. If I had done that, it would not have been necessary to do a lot of modifications on the web server. Some modifications would still be needed, but not nearly as many as I was forced to do when implementing completely new logic.

However, this approach came to mind just by the time that I had completed the new implementation, an implementation that also was working. Hence it felt like there were no real benefits from changing the implementation at that point. The only real benefit that I can come up with would have been that the charts on the web application could have been updated dynamically during run time of the multi request, which would have been pretty cool.

6.4 Problems

During my thesis work some problems have been encountered. In this subsection some of those problems will be described: what caused them, how they were solved, and so on.

6.4.1 Response from rendering machines

This was a problem that occurred when I realised that you can not just send back the response by using a ”regular response object”. For example, when you are dealing with a web server you normally have a certain URL that the client makes a GET/POST request to. When the route (URL) in the web server is reached, you have a Request object and a Response object that are used in order to communicate with the client. That is also the case between the client and the web application in my thesis work. However, the communication between the rendering machines and the web server uses an SQS queue. Hence, there are no response objects to use from the rendering machines. Instead, before sending any request to the SQS queue, the IP address of the web server had to be put inside the message body, such that the rendering machine knew exactly where to send the response.
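The idea of carrying the reply address inside the message body can be sketched as below. The field names and the helper function are assumptions for illustration, not the thesis's exact schema; the commented boto3 call shows where the body would be handed to SQS.

```python
import json

def build_render_message(web_server_ip, render_params):
    """Embed the web server's address in the SQS message body so the
    rendering machine knows where to POST the result."""
    body = {
        "reply_to": web_server_ip,  # where the rendering machine responds
        "params": render_params,    # the actual rendering request data
    }
    return json.dumps(body)

# The body would then be sent to the queue with boto3, e.g.:
# sqs.send_message(QueueUrl=queue_url,
#                  MessageBody=build_render_message(ip, params))

message = build_render_message("10.0.0.12", {"scene": "demo"})
print(json.loads(message)["reply_to"])  # 10.0.0.12
```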

6.4.2 Map response to correct client

Now, given the problem described above, we understand that we will get our responses from a rendering request at a specific route on the web server. But knowing which user a response corresponds to required some thought before a solution was found. The solution is that when a client sends a rendering request to the web server, the response object of that request is stored in the web server together with a specific message ID, before the rendering request is forwarded to the SQS queue. The message ID is a unique ID for every single message in the SQS queue. The rendering machines extract the message ID along with the request from the SQS queue and include it in the request body that is sent back to the web server. Hence, when we receive the POST request from the rendering machine, indicating that the render has been executed, we also receive the specific message ID from the message body of that request. The web server then does a lookup on the ID in the data structure that holds response objects, extracts the object, deletes the entry in the data structure, and uses the previously stored response object to respond to the client with the data.
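The bookkeeping described above can be sketched as a map from message IDs to stored response objects. The names (pendingResponses, trackRequest, resolveRequest) are illustrative assumptions, not the identifiers used in the thesis code:

```javascript
// Maps SQS MessageId -> stored Express-style response object.
const pendingResponses = new Map();

// Called when a client request is forwarded to the queue: SQS returns a
// unique MessageId, which becomes the key for the client's response object.
function trackRequest(messageId, responseObject) {
  pendingResponses.set(messageId, responseObject);
}

// Called when a rendering machine POSTs its result back: look up the
// client's response object, delete the entry, and reply to the client.
function resolveRequest(messageId, renderData) {
  const res = pendingResponses.get(messageId);
  if (!res) return false; // unknown or already handled message
  pendingResponses.delete(messageId);
  res.send(renderData); // respond to the original client
  return true;
}
```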

6.4.3 Multi request

When dealing with multiple requests from the client I had some issues figuring out how to do it in the best possible way. I knew that the data in the responses from the rendering machines had to be stored somewhere, and the question was how and where that should be done. I also wanted this to behave as similarly as possible to the simulation, because otherwise it would not be justified to compare the results from two respective runs (from the simulation and from the AWS architecture). The simulation is described in more detail in section 5.3. Briefly, it is based on entries with 0.1 seconds between them, and at each entry there is a certain probability that a request occurs. This probability corresponds to the load parameter.


After some thinking I decided to implement this on the server side. The client sends how many entries it wants (in total, not only those that turn into requests) and a load parameter. On the server side all the entries are then evaluated (computing whether there will be a request at the current entry) with 0.1 seconds between them, scheduled using setTimeout. All the responses and their respective data are stored in the web server until the last request. The storage of the data and the mapping to clients are solved in a more or less identical way to the handling of a single rendering request. The last entry always turns into a request; by doing so we know for sure when the execution is completed, and the data is returned to the client.
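The entry evaluation described above can be sketched as follows. The function name and the injectable random generator are illustrative assumptions made so the behaviour can be demonstrated deterministically:

```javascript
// n entries, each with probability `load` of turning into a rendering
// request; the last entry is always forced to be a request so that the
// completion of the whole run can be detected.
function evaluateEntries(n, load, random = Math.random) {
  const entries = [];
  for (let i = 0; i < n; i++) {
    const isLast = i === n - 1;
    entries.push(isLast || random() < load);
  }
  return entries;
}

// In the server each entry would then be scheduled 0.1 s apart, e.g.
// entries.forEach((isReq, i) =>
//   setTimeout(() => { if (isReq) sendRenderRequest(i); }, i * 100));
console.log(evaluateEntries(5, 0.5));
```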

6.4.4 Timing in Node.js

In the section above the execution of multiple requests is discussed. Timing and scheduling are extremely important in this solution, because not only does the last entry need to be processed last, all the data is also stored in chronological order for each entry. If there are mismatches where entry B is processed before entry A and A < B, the results could become misleading. There are also a lot of API calls to AWS for each request; for example, the size of the queue and the number of instances (rendering machines) in the auto scaling group are fetched and put into each message body.

This may not seem like a problem, but because of the nature of Node.js's asynchronous behaviour you can find yourself with some extremely frustrating bugs when it comes to problems like these, and I certainly have. I have spent many hours resolving issues related to promises (the handling of asynchronous execution), especially in the beginning, when I did not even think about those things while implementing some functionality. Let's just say I have learned the hard way during this thesis.
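One concrete illustration of the ordering pitfall, with simulated delays that are purely illustrative: firing asynchronous requests and collecting them as they complete gives no ordering guarantee, while awaiting each one before starting the next keeps the results in entry order:

```javascript
// Simulated asynchronous request that resolves with its id after delayMs.
function fakeRequest(id, delayMs) {
  return new Promise((resolve) => setTimeout(() => resolve(id), delayMs));
}

async function runSequentially() {
  const completionOrder = [];
  // Entry 0 is the slowest, but awaiting each promise before starting the
  // next one still yields the results in entry order.
  for (const [id, delay] of [[0, 30], [1, 20], [2, 10]]) {
    completionOrder.push(await fakeRequest(id, delay));
  }
  return completionOrder;
}

runSequentially().then((order) => console.log(order)); // [ 0, 1, 2 ]
```

Had the three promises been started together and pushed on completion instead, entry 2 would finish first and the chronological order would be lost.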


7 Conclusions and future work

7.1 Conclusions

In order to conclude this thesis it is appropriate to head back to the introduction and see what we actually tried to achieve. Is it possible to create a model that can be used to predict the response times of requests? If so, could that model be used in order to proactively scale up or down the number of rendering machines? How does it compare to a reactive solution? That is the problem definition found in the introduction. From the evaluation section, in the subsection where the regression model is discussed, one can clearly see that yes, it is possible both to create a model that is capable of predicting response times, and to use it in order to proactively scale the number of rendering machines. The results might not have been optimal, but for the scope of the task they were definitely acceptable.

When it comes to the comparison between a reactive solution and this proactive solution, the question is not as easy to answer. Even though it was not completely possible to test the solution in a proper way, given that there is no data on any user activities yet, I feel pretty confident that the solution could be integrated in a system with great results. There are no guarantees though. If changes are made to the product itself, when it comes to rendering times or other things, the regression model will have to be rebuilt with new data, and that might be a completely different story.

The key takeaway for me personally during this thesis work has been that you can never ”trust the stats” completely. I say this with the ”stats” of building a regression model in mind. Even though I had problematic kurtosis and low R²-scores and whatnot when building and testing different kinds of regression models, the results have sometimes been quite good anyway, and sometimes not. It all comes down to your specific domain, your specific task and your specific goal. Test the models on different data, evaluate how they perform under different conditions, and understand why they perform in a specific way under certain conditions. At the end of the day it is just math, and there are always explanations; some are obvious while some might need some thinking.

7.2 Future work

As for future work, the most essential thing would be to implement the usage of user activity data, so that the model can be tested under good conditions in the simulation. This requires that some user activity data is actually available, though, which is why it is something that needs to be done in the future.


After that it would also be necessary to integrate this solution in the AWS architecture, as that architecture is not doing anything useful at the moment, and the whole purpose of it was to have the solution integrated in it.

Another thing that I personally would be interested in is to check whether or not it is necessary to use any regression model at all in order to create a solution that can scale the number of rendering machines up and down. During the course of this thesis I have read a lot of theory, especially about the topics covered in the ”Theory” section. Since then I have almost been stuck with the feeling that there might not be any need for a solution with a regression model whatsoever, as the only things needed are the theoretical calculations in combination with user activity data. Although it might not be as good as using a regression model, I really believe that it could give good or at least acceptable results. Another reason for using the theoretical models is that a big problem with, for example, my regression model is that you need to know or understand how the dependent variable itself depends on the load. By using the theoretical models instead, you eliminate this factor, as you are only concerned with the load/arrival rate itself. So, for simplicity reasons this would also be beneficial.
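To illustrate the kind of theoretical calculation referred to above, the mean response time of an M/M/1 queue from queueing theory needs only the arrival rate and the service rate, with no fitted regression model. Applying it to the rendering queue is an assumption for this sketch:

```javascript
// Mean response time (time in system) of an M/M/1 queue:
//   W = 1 / (mu - lambda)
// where lambda is the arrival rate and mu is the service rate,
// both in requests per second.
function mm1MeanResponseTime(lambda, mu) {
  if (lambda >= mu) return Infinity; // unstable queue: load >= capacity
  return 1 / (mu - lambda);
}

// Example: 2 requests/s arriving against a 5 renders/s capacity.
console.log(mm1MeanResponseTime(2, 5)); // ~0.333 seconds
```

Given a predicted arrival rate from user activity data, one could invert such a formula to find the smallest number of rendering machines that keeps W below a response time threshold.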
