
Master Thesis
Computer Science
02 2019

Machine Learning Approach to the Design of Autonomous Construction Equipment applying Data-Driven Decision Support Tool

Aniruddh Goteti

Faculty of Computing
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden


This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 20 weeks of full-time studies.

Contact Information:
Author: Aniruddh Goteti
E-mail: [email protected]

University advisor:
Dr. Lawrence Henesey
Faculty of Computing

Faculty of Computing
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57


Abstract

Design engineers working in the construction machinery industry face many complexities and uncertainties while taking important decisions during the design of construction equipment. These complexities can be reduced by the implementation of a data-driven decision support tool, which can predict the behaviour of the machine in an operational context and give valuable insights to the design engineer. This data-driven decision support tool must be supported by a suitable machine learning algorithm. The focus of this thesis is to find such an algorithm, which can predict the behaviour of a machine and can later be used in the development of such data-driven decision support tools.

To find such a solution, the regression performance of four supervised machine learning regression algorithms, namely Support Vector Machine Regression, Bayesian Ridge Regression, Decision Tree Regression and Random Forest Regression, is evaluated. The evaluation was done on data sets personally observed/collected on site, which were extracted from the autonomous construction machine by the Product Development Research Lab (P.D.R.L). Experiment was chosen as the research methodology based on the quantitative format of the data set. The sensor data extracted from the autonomous machine is in time series format, which is converted to supervised data with the help of the sliding window method. The four chosen algorithms are then trained on the mentioned data sets and are evaluated with certain performance metrics (MSE, RMSE, MAE, Training Time). Based on the rigorous data collection, experimentation and analysis, the Bayesian Ridge Regressor is found to perform best in terms of all performance metrics and is chosen as the optimal algorithm to be used in the development of a data-driven decision support tool meant for design engineers working in the construction industry.

Keywords: Machine Learning, Regression, Time Series, Construction Equipment


Acknowledgments

I express my deep sense of gratitude and thanks to Dr. Lawrence Henesey for his remarkable supervision, outstanding guidance and encouragement. This work would not have been possible without his immense knowledge and exceptional guidance. I also thank the researchers at the Product Development Research Lab (P.D.R.L), who guided and helped me in finding the solution and provided access to the huge data sets. I would also like to thank my parents and grandparents, who believed in me and pushed me to do my best in order to finish this thesis.

Finally, I would like to thank my friends for their unconditional love and continuous support.


List of Figures

1.1 Three elements of the Data Driven Decision support Tool
2.1 Seismograph showing an event arrival (Source: The National Science Foundation)
2.2 Example of Decision Tree Regression
3.1 Methodology followed in this thesis
3.2 Example time series data (Source: Machine Learning Mastery)
3.3 After applying the Sliding Window method (Source: Machine Learning Mastery)
3.4 PACF chart (Source: Kaggle)
3.5 Walk Forward Validation
4.1 Augmented Dickey-Fuller Test
4.2 Augmented Dickey-Fuller Test after applying the Differencing Transform on the label data
4.3 PACF chart
4.4 SVM-R MSE boxplot
4.5 SVM-R MAE boxplot
4.6 SVM-R RMSE boxplot
4.7 Training time of 10-fold walk forward validation (SVR)
4.8 Bayesian Ridge Regressor MSE boxplot
4.9 Bayesian Ridge Regressor MAE boxplot
4.10 Bayesian Ridge Regressor RMSE boxplot
4.11 Training time of 10-fold walk forward validation (BR)
4.12 Decision Tree Regressor MSE boxplot
4.13 Decision Tree Regressor MAE boxplot
4.14 Decision Tree Regressor RMSE boxplot
4.15 Training time of 10-fold walk forward validation (DTR)
4.16 Random Forest Regressor MSE boxplot
4.17 Random Forest Regressor MAE boxplot
4.18 Random Forest Regressor RMSE boxplot
4.19 Training time of 10-fold walk forward validation (RFR)
5.1 Comparison of MAE achieved by SVR, BR, DTR and RFR on 10-fold walk forward validation tests
5.2 Comparison of MSE achieved by SVR, BR, DTR and RFR on 10-fold walk forward validation tests
5.3 Comparison of RMSE achieved by SVR, BR, DTR and RFR on 10-fold walk forward validation tests
5.4 Comparison of Training Time achieved by SVR, BR, DTR and RFR on 10-fold walk forward validation tests
5.5 Kolmogorov-Smirnov test


Contents

Abstract

1 Introduction and Problem Statement

2 Background and Related Work
2.1 Background
2.1.1 Construction Equipment, Operational Context and Performance Measure
2.1.2 Time Series
2.1.3 Machine Learning
2.2 Machine Learning Algorithms
2.2.1 Support Vector Machine Regressor
2.2.2 Bayesian Ridge Regression
2.2.3 Decision Tree Regressor
2.2.4 Random Forest Regressor
2.3 Related Work

3 Method
3.1 Software Environment
3.1.1 Python
3.1.2 Jupyter Notebook
3.2 Data set
3.3 Data Preprocessing
3.3.1 Sliding Window Method
3.3.2 Correlation based Feature Selection
3.4 Walk Forward Validation
3.5 Experimental Setup
3.5.1 Performance Metrics
3.5.2 Statistical Test

4 Results
4.1 PACF Analysis
4.2 Support Vector Machine (SVM) Regressor
4.3 Bayesian Ridge Regressor
4.4 Decision Tree Regressor
4.5 Random Forest Regressor

5 Analysis
5.1 Comparative study of Performance Metrics observed from Support Vector Machine Regression, Bayesian Ridge Regression, Decision Tree Regression and Random Forest Regression on the experimental data set
5.1.1 Mean Absolute Error (MAE)
5.1.2 Mean Squared Error (MSE)
5.1.3 Root Mean Squared Error (RMSE)
5.1.4 Training Time
5.1.5 Statistical Test
5.2 Key Analysis
5.3 Discussion
5.4 Contributions
5.5 Threats to validity
5.5.1 Internal Validity
5.5.2 External Validity
5.5.3 Conclusion Validity
5.6 Limitations

6 Conclusions and Future Work


Chapter 1
Introduction and Problem Statement

It is estimated that by 2030, Sweden will invest €64 billion in the Engineering and Construction (E & C) industry, which will inevitably make it one of the flourishing sectors in Sweden. A research survey by McKinsey revealed that there is a growing focus on technological solutions that incorporate Artificial Intelligence (AI)-powered algorithms [1] in the E & C industry. These emerging technologies focus on helping players overcome some of the E & C industry's greatest challenges, including cost and schedule overruns and safety concerns. Construction machines are widely used in the E & C industry. These machines handle heavy loads and operate in dusty, noisy and hazardous surroundings. In order to prevent human operators from getting injured, and also to increase the productivity of the work being accomplished, the machines are being operated unmanned with the help of artificial intelligence (A.I.) algorithms. These machines are known as autonomous machines.

To design a construction machine or an autonomous construction machine, the design team must have good knowledge of how a particular machine will perform in different design configurations and operational circumstances, and must also understand how a change in an engineering characteristic of a construction machine would impact its performance. They also need to assess in which situations it is beneficial to use a non-autonomous, non-intelligent (manned) machine rather than an autonomous one.

Usually such assessments are made by the intuition of an experienced design engineer or with the help of an assumption-based analytical tool [26]. The outcomes of such assessments, or the decisions taken by design engineers using these tools, are often unpredictable and uncertain. Thus, in order to help design engineers working in the construction equipment industry take better decisions, it is required to develop a tool which can predict the behavior of the construction machine. This will let design engineers take decisions which have a predictable outcome.

When the decision support tool is used, it predicts and visualizes how a machine will work in the given circumstances/operational contexts and with certain design configurations, and can thereby help the design engineer make better decisions. Thus, in order to develop a data-driven decision support tool that fulfills such requirements, the Product Development Research Lab (P.D.R.L) at Blekinge Institute of Technology, Karlskrona, in collaboration with a renowned construction equipment company, initiated a research project. This research project is based on the study by Bertoni et al., which introduced the concept of integrating design engineering with machine learning.

The above-mentioned research project is divided into three elements, as shown in figure 1.1.

Figure 1.1: Three elements of the Data Driven Decision support Tool

This thesis focuses on the second element, i.e. choosing an optimal machine learning approach which can be used to implement a decision support tool. Thus, the aim of this thesis can be framed in a formal way as:

"To select a suitable and efficient machine learning algorithm which can be usedto predict the performance of an autonomous machine in accordance to the envi-ronment in which the machine is being operated."

Two research questions have been constructed in order to achieve the aim of this thesis:

• RQ1: Which supervised machine learning algorithm is best suited for predicting the performance of the machine in correlation with the various operational contexts in which the machine is being operated, and why?

• RQ2: What are the results of the best suitable algorithm?

These research questions can be answered by evaluating the regression performance of four chosen regression algorithms, namely SVR (Support Vector Regressor), Bayesian Ridge, Decision Tree Regressor and Random Forest Regressor, on the data set provided by the Product Development Research Lab (P.D.R.L), Blekinge Institute of Technology. The data set is extracted from mini models of autonomous construction machines/equipment in a simulated operational environment. The regression results are analyzed, and the algorithm showing the best results is selected to answer RQ1. The results of the best suitable machine learning algorithm answer RQ2. The rationale behind framing RQ2 is to leave future scope, where the results of the best algorithm in this experiment can be compared with statistical models or deep learning models. This thesis also introduces a framework to handle the sensor data for the P.D.R.L project.


Chapter 2
Background and Related Work

2.1 Background

The first essential step in this study is to understand the process of analyzing time series data using machine learning. The following section provides brief groundwork for acquiring the knowledge of time series analysis using machine learning, along with the current state of research in the area where machine learning is used as a solution to overcome challenges in industrial applications. But before we start discussing the core concepts of this thesis, it is worthwhile to give a brief introduction to the industrial context considered in this study.

2.1.1 Construction Equipment, Operational Context and Performance Measure

Construction equipment, like excavators, wheel loaders, and articulated haulers, are vehicles that are constructed to perform tasks at construction sites, quarries, and open pits [2]. Each piece of construction equipment is assigned an individual task to accomplish at the construction site, but in the end they mostly collaborate to achieve common goals. Construction equipment is of two types: manned and autonomous.

Manned construction equipment is construction equipment which works on a construction site and accomplishes its assigned tasks under the guidance and supervision of a human operator. A typical scenario at a non-autonomous working quarry site includes an articulated hauler that is loaded with material by a wheel loader. The wheel loader operator supervises the process and honks the horn when the hauler is fully loaded. The hauler then transports the material and dumps it somewhere else at the site [2]. A point to note is that all the moves of the hauler are planned and controlled by the human operator.

Autonomous construction equipment is construction equipment which performs assigned tasks on a construction site without constant guidance from a human operator. A typical scenario at an autonomous working quarry site includes an autonomous hauler that is loaded with material by a wheel loader. This step ends when the sensor attached to the autonomous hauler senses that the hauler is fully loaded. The autonomous hauler then transports the material and dumps it until the sensor attached to it senses that all the load has been dumped. A point to note is that the autonomous hauler in this scenario moves on its own, using image processing and path planning algorithms to accomplish its tasks without any human intervention.

Performance measures are physical attributes of construction equipment which can be used to measure the performance of the equipment in a numerical way and make decisions based on it. In this thesis, the speed of the construction equipment is used as the performance measure.

2.1.2 Time Series

According to the Engineering Statistics Handbook, the term "time series" can be defined as "an ordered sequence of values of a variable at equally spaced time intervals" [3]. Time series data has two mandatory components: time units and the corresponding value assigned for each given time unit. To be concise, a time series tracks the movement of the chosen data points over a specified period of time, with data points recorded at regular intervals. There is no minimum or maximum amount of time that must be included, allowing the data to be gathered in a way that provides the information being sought by the analyst examining the activity [4]. In simple words, the main characteristic of time series data is that there can be at most one value for each time unit.

Figure 2.1 shows a seismograph of the seismic data collected by the Incorporated Research Institutions for Seismology (IRIS), which is also an example of time series data.


Figure 2.1: Seismograph showing an event arrival (Source: The National Science Foundation)

Time series data is recorded or collected in two ways: discrete and continuous. Discrete time series data has a discrete set of values, which are measured at specific time-stamps that may occur periodically or occasionally, according to concrete conditions. This is frequently observed in practice in economic sectors. Continuous time series data is recorded continuously along the time intervals. Figure 2.1 is a seismograph of an earthquake, where the data collected is an example of a continuous time series.

The data extracted from the construction equipment machine is in the form of continuous time series data. Time series data is classified into two types:

• Stationary Time Series: Stationary time series are time series for which statistical properties, like the mean value or variance, are constant over time. These time series stay in relative equilibrium in relation to their corresponding mean values [29].

• Non-Stationary Time Series: Non-stationary time series are time series for which statistical properties, like the mean value or variance, change over time [29].

In industry, trading or economics, time series more frequently belong to the non-stationary category. In order to deal with the forecasting task, non-stationary time series are usually transformed into stationary ones by appropriate pre-processing methods.

The general idea of time series forecasting is based on the fact that information about past events can be effectively exploited to create predictions about future events [33], with the help of forecasting models. These forecasting models can be classical statistical time series forecasting models; alternatively, the time series problem can be converted into a supervised machine learning problem and solved using machine learning algorithms.

Even though machine learning algorithms are evaluated in this thesis, understanding the concepts of time series analysis is crucial. That is because the data extracted from the machine is in the form of time series data, and even though this data is converted into supervised data, there are still some elements of time series data which should be considered (such as maintaining the sequence). In the next sections, the concepts of machine learning are introduced, and in the methodology section, the conversion of the time series problem to a supervised machine learning problem is explained.

2.1.3 Machine Learning

According to Britannica Academic, machine learning is, "in artificial intelligence (a subject within computer science), the discipline concerned with the implementation of computer software that can learn autonomously" [6]. Thus, in simple words, a computer first learns to perform a task by studying a training set of examples. The computer then performs the same task with data it hasn't encountered before [7].

Machine learning is used in different domains. Here are a few examples:

1. Security heuristics that distill attack patterns to protect, for instance, ports or networks [7].

2. Image analysis to identify distinct forms and shapes, such as for medical analyses or face and fingerprint recognition [7].

3. Deep learning to generate rules for data analytics and big data handling, such as are used in marketing and sales promotions [7].

Machine learning techniques are categorized into three categories on the basis of the nature of the data they require. These three categories are:

Supervised Learning: In supervised learning, the target is to infer a function or mapping from training data that is labeled [8]. The training data consist of an input vector X and an output vector Y of labels or tags. A label or tag from vector Y is the explanation of its respective input example from input vector X, and together with the input vector it forms a training example [8]. Thus, the goal of the algorithm is to learn a general rule which maps inputs to their corresponding outputs.


Unsupervised Learning: Unsupervised learning is the training of a machine learning algorithm using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance [53]. In simple words, the goal of unsupervised learning is to find patterns in the data without any labels.

Reinforcement Learning: Reinforcement learning is a training method based on rewarding desired behaviors and/or punishing undesired ones [54]. The learning method has been adopted in artificial intelligence (AI) as a way of directing unsupervised machine learning through rewards and penalties. Reinforcement learning is used in operations research, information theory, game theory, control theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics and genetic algorithms [54].

Considering a different perspective of machine learning, the problems of machine learning can be categorized into two categories:

• Classification predictive modeling is the task of approximating a mapping function (f) from input variables (X) to discrete output variables (y) [55]. For example, spam filtering, i.e. classifying a mail as spam or not spam.

• Regression predictive modeling is the task of approximating a mapping function (f) from input variables (X) to a continuous output variable (y) [55]. For example, predicting the cost of a house in the coming years.

Since our goal is to predict the behavior of the construction machine (the speed of the machine in this thesis), which is a continuous variable, the problem in this thesis is a regression problem. In the next section, the chosen machine learning algorithms are discussed, and the intuition behind choosing them over other machine learning algorithms is explained.

2.2 Machine Learning Algorithms

The main focus of this thesis is to select a suitable and efficient machine learning algorithm which can be used for predicting the behaviour of a machine. The first step involves selecting a set of machine learning algorithms which are suitable for solving design engineering problems.

According to the study by Bertoni et al., design engineers are required to make rapid decisions during crucial situations. Also, during the initial take-off of the project, there will be a lack of proper storage and computing resources. Considering these factors, two characteristics are expected from the behaviour of the machine learning algorithms: fast prediction speed and low memory usage.

Mathworks has published a study which states the characteristics of various machine learning algorithms, taking multiple experiments into consideration [36]. Thus, four regression algorithms are selected from the mentioned study to be evaluated in this thesis, considering the requirements of the design engineer. These four machine learning algorithms are:

• Support Vector Machine Regressor (SVR)

• Bayesian Ridge Regressor (BR)

• Decision Tree Regressor (DTR)

• Random Forest Regressor (RFR)

2.2.1 Support Vector Machine Regressor

Support Vector Machine (SVM) analysis is a popular machine learning tool for classification and regression, first identified by Vladimir Vapnik and his colleagues in 1992 [57]. The SVM algorithm which deals with regression problems is called the Support Vector Regressor (SVR). In SVR, the actual outputs (the label data) are plotted on a graph. The goal is to find a function which can bound these points within a certain threshold. This function is also known as a hyper-plane.


In simple regression problems, the main aim of the machine learning model is to minimize the error while predicting output values. In SVR, however, the main aim of the model is to fit the error within a certain threshold. The blue line in the figure is called the hyper-plane and the red lines are called the boundary lines. In classification problems, a hyper-plane is used to separate two classes. In SVR, it plays the role of predicting the output by fitting the maximum number of output points (from the training data) between the boundary lines, while at the same time being as flat as possible.

Assuming the hyper-plane to be a straight line going through the Y axis, its equation can be formulated as:

wx + b = 0    (2.1)

It can then be assumed that the boundary lines lie at distances e and −e from the hyper-plane. Thus, the equations of the boundary lines can be formulated as:

wx + b = e    (2.2)

and

wx + b = −e    (2.3)

Thus, in order to find the optimal hyper-plane, the following constraint should be satisfied:

−e <= y − wx − b <= e    (2.4)

Thus, the optimal hyper-plane can be obtained by minimizing w as much as possible (to achieve as much flatness as possible) while satisfying this constraint.


In the equations given above, xi represents the predictors, yi represents the dependent variable, and ε (the e above) is the threshold such that all predictions have to lie within an ε range of the true value. SVM regression is considered a non-parametric technique because it relies on kernel functions [57].

A kernel is a method of using a linear classifier to solve a non-linear classification task [56]. Also, in order to be flexible towards errors, a soft error margin can be obtained by introducing "slack" variables. This helps in generalizing the model.

As the algorithm finds a threshold function to fit all the output points, rather than trying to predict an output and minimizing the error against the actual output, it is memory efficient, and its prediction speed depends on the choice of kernel function.
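To make the above concrete, the following minimal sketch fits scikit-learn's SVR implementation (the library used in this thesis; see Section 3.1.1) on synthetic data. The kernel, C and epsilon values are illustrative assumptions, not the settings used in the experiment.

import numpy as np
from sklearn.svm import SVR

# Synthetic regression data: 100 samples, 3 features (illustrative only).
rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.randn(100)

# epsilon is the threshold tube around the hyper-plane: points inside it
# incur no loss. C controls the soft ("slack") margin trade-off.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X, y)
print(model.predict(X[:5]))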

2.2.2 Bayesian Ridge Regression

Ridge regression is a technique for analyzing multiple regression data that suffer from multicollinearity [70]. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large, so they may be far from the true value [70]. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors [70]. The hope is that the net effect is to give estimates that are more reliable.

The cost function for ridge regression is given as:

J(w) = ∑i (yi − w·xi)² + λ ∑j wj²

The λ given here is denoted by the alpha parameter in the ridge function. So, by changing the value of alpha, the penalty term is controlled. The higher the value of alpha, the bigger the penalty, and therefore the more the magnitude of the coefficients is reduced. This reduces the model complexity by coefficient shrinkage. This process is called "regularization".

Bayesian regression techniques can be used to include the regularization parameters in the estimation procedure: the regularization parameter is not set in a hard sense but tuned to the data at hand [60]. Bayesian ridge regression estimates a probabilistic model of the ridge regression problem [60] as:

p(y | X, w, α) = N(y | Xw, α)


The priors over α and λ are chosen to be gamma distributions, the conjugate prior for the precision of the Gaussian [60]. Instead of setting lambda manually, as in ridge regression, the Bayesian form of ridge regression makes it possible to treat it as a random variable to be estimated from the data.

Thus, in simple words, adding a penalty term to a linear regression problem helps to generalize the model, and the Bayesian treatment makes probabilistic decisions when choosing the uncertain parameters of the ridge model.
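A minimal sketch of this behaviour with scikit-learn's BayesianRidge on synthetic data; note that no penalty is set by hand:

import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.RandomState(1)
X = rng.rand(100, 3)
y = X @ np.array([2.0, 0.0, -1.0]) + 0.05 * rng.randn(100)

# Unlike plain ridge regression, no alpha is fixed manually: the noise
# precision (alpha_) and the weight precision (lambda_) are estimated
# from the data during fitting.
model = BayesianRidge()
model.fit(X, y)
print(model.coef_)                   # shrunk coefficients
print(model.alpha_, model.lambda_)   # precisions estimated from the data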

2.2.3 Decision Tree Regressor

Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features [59].

The method starts by searching over every distinct value of all the predictors, and splits on the predictor value that minimizes the following statistic (other regression tree models have different optimization criteria):

SSE = ∑i∈S1 (yi − ȳ1)² + ∑i∈S2 (yi − ȳ2)²

where ȳ1 and ȳ2 are the average values of the dependent variable in groups S1 and S2.

For groups S1 and S2, the method recursively splits the predictor values within the groups. In practice, the method stops when the sample size of the split group falls below a certain threshold.

To prevent over-fitting, the constructed tree can be pruned by penalizing the SSE (Sum of Squared Errors) with the tree size:

SSEcp = SSE + cp · St

where St is the size of the tree (number of terminal nodes) and cp is the complexity parameter. A smaller cp will lead to larger trees, and vice versa.

Figure 2.2 provides a better illustration of this type of model through a small example of a regression tree:


Figure 2.2: Example of Decision Tree Regression

As there are four distinct paths from the root node to the leaves, this tree divides the input space into four different regions. The conjunction of the tests in each path can be regarded as a logical description of each such region, as shown above.

Unlike linear regression models, which calculate the coefficients of predictors, tree regression models calculate the relative importance of predictors. The relative importance of a predictor can be computed by summing up the overall reduction of the optimization criterion (e.g. SSE) achieved by its splits.
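The following sketch illustrates this with scikit-learn's DecisionTreeRegressor on synthetic data. Note that scikit-learn limits tree size through parameters such as max_depth (the value here is an illustrative assumption) rather than an explicit cp, and exposes the relative importances described above through feature_importances_.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(2)
X = rng.rand(200, 3)
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.01 * rng.randn(200)

# max_depth caps the tree size, playing a role similar to pruning.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

# Relative importance of each predictor, computed from the total
# reduction of the splitting criterion contributed by its splits.
print(tree.feature_importances_)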

2.2.4 Random Forest Regressor

A Random Forest is an ensemble technique capable of performing both regression and classification tasks through the use of multiple decision trees and a technique called Bootstrap Aggregation, commonly known as bagging [61]. Bagging, in the Random Forest method, involves training each decision tree on a different data sample, where sampling is done with replacement.

The random forest model is a type of additive model that makes predictions by combining decisions from a sequence of base models [61]. More formally, we can write this class of models as:

g(x) = f1(x) + f2(x) + ... + fM(x)

where the final model g is the sum of simple base models fi. Here, each base classifier is a simple decision tree [61]. This broad technique of using multiple models to obtain better predictive performance is called model ensembling [60]. In random forests, all the base models are constructed independently, using a different subsample of the data [61].

To say it in simple words: a random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction. Random Forest adds additional randomness to the model while growing the trees: instead of searching for the most important feature when splitting a node, it searches for the best feature among a random subset of features [69]. This results in a wide diversity that generally yields a better model.

Therefore, in a Random Forest, only a random subset of the features is taken into consideration by the algorithm when splitting a node [69], which is why random forest is known for lower memory usage and higher prediction speed.
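A minimal sketch with scikit-learn's RandomForestRegressor on synthetic data; n_estimators and max_features are illustrative assumptions, with max_features controlling the random feature subset considered at each split:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(3)
X = rng.rand(200, 4)
y = X[:, 0] - 2.0 * X[:, 2] + 0.05 * rng.randn(200)

# n_estimators bagged trees, each trained on a bootstrap sample;
# max_features="sqrt" restricts each split to a random feature subset.
forest = RandomForestRegressor(n_estimators=100, max_features="sqrt",
                               random_state=0)
forest.fit(X, y)
print(forest.predict(X[:3]))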

2.3 Related Work

After identifying the research area, it is important to conduct a pre-study in order to understand the work that has already been done in relevance to it. So, thorough research was performed to see whether any work had been done on integrating machine learning with design engineering or industrial engineering in order to make decisions.

Initial research work on data-driven decision support tools, especially statistical analysis: Studies have been conducted where authors have identified and implemented statistical tools used for decision-making. Kayaalp et al. (1997), in an initial study to integrate decision making and data, used a statistical method known as the model switching method to solve a binary (yes/no) decision problem in the context of word classification [37].

L. Henesey et al. used a statistical approach, along with experience-oriented variables, to implement a decision support tool for port authorities, which could help them decide whether Short Sea Shipping would be beneficial for them or not [38].

Sabina et al. presented a comprehensive tool which implements statistical analysis of multiple legacy data sets, which is then used as a support tool in the decision-making process for the higher management of the company [39].

Kusiak (2006) recognises in data mining techniques the quality of being able to bridge the gap between the tools used in decision-making and their linkage to data [65]. His paper describes eight different examples of applications in manufacturing and services, spanning from process control to the production of semiconductors, to biotechnology and medical/pharmaceutical applications, although no example of the application of data mining to engineering design tasks is presented [65] [26].

Introducing machine learning in industrial decision support tools: In work by Yam et al., the authors use recurrent neural networks to predict faults in critical equipment, which is useful in scheduling maintenance work [66]. This work is also an initial inspiration for researchers working to create predictive diagnostics systems for various industries.

When it comes to using machine learning as a decision support tool, the study by J. Merkert et al. gives a brief survey of the applications of machine learning presently used as decision support systems in various industries, and also describes the evolution of integrating machine learning into decision support systems from 1953 to 2014 [40].

Design Engineering: In the field of the Swedish automobile and construction industries, a study by Henriksson et al. has identified design tools of the Swedish automobile industry that combine experience-based analytical models with data insights, and which also act as decision support tools in several aspects of the design context [25].

When it comes to designing heavy-loaded vehicles, design engineers use classical optimization algorithms as decision support systems to design their suspension systems. These optimization algorithms use various parameters so that the heavy-loaded vehicle can be highly productive in its tasks [42].

The key similarities among the mentioned works and this thesis are:

• All the studies aim to solve major decision-making complexities in industries with data-driven techniques.

• All papers tend to introduce a structure/process for decision making in various industries.

The key differences among the mentioned works and this thesis are:

• The mentioned studies introduce experience-based models combined with statistical insights (partially data-driven), while this thesis follows a completely data-driven procedure.

• Some of the mentioned studies implement statistical solutions to reduce uncertainties in decision making, while this thesis implements machine learning algorithms.

• Some papers introduce machine learning as a solution for implementing a decision tool but do not provide an extensive example of which algorithm to choose; this thesis fulfills that condition.

• In the design engineering context, no completely data-driven solutions have been available, which this thesis aims to provide.


Chapter 3
Method

The research method chosen to answer the research questions posed in this thesis is the experiment. Conducting an experiment on quantitative data is the best approach for obtaining the desired results, as experiments give more control over variables and a bigger subject size than other descriptive research methods, like a case study or a survey [4].

The goal of the experiment in this thesis is to evaluate the regression performance of the Support Vector Machine Regressor, Bayesian Ridge Regressor, Decision Tree Regressor and Random Forest Regressor on the sensor data extracted from the autonomous construction equipment, which also acts as the experimental data in this thesis. The experimental results are analyzed and compared in order to select the algorithm with the best regression performance among them.

The methodology followed in this thesis can be visualized as:


Figure 3.1: Methodology followed in this thesis

The independent and dependent variables of the experiment are as follows:

Independent variables: the size of the experimental data set; the Support Vector Machine Regressor, Bayesian Ridge Regressor, Decision Tree Regressor and Random Forest Regressor.

Dependent variables: the performance measures, i.e. Mean Forecast Error (or Forecast Bias), Mean Absolute Error, Mean Squared Error, Root Mean Squared Error and Training Time.


3.1 Software Environment

3.1.1 Python

Python is a high-level programming language designed to be easy to read and simple to implement. It is open source, which means it is free to use, even for commercial applications [10]. The programming features of Python which proved to be useful for the experiments conducted in this thesis are:

1. A variety of basic data types are available: numbers (floating point, complex, and unlimited-length long integers), strings (both ASCII and Unicode), lists, and dictionaries [7].

2. Code can be grouped into modules and packages. The language supports raising and catching exceptions, resulting in cleaner error handling.

3. Data types are strongly and dynamically typed. Mixing incompatible types (e.g. attempting to add a string and a number) causes an exception to be raised, so errors are caught sooner [7].

4. Python contains advanced programming features such as generators and list comprehensions [7].

The following libraries are used in the experiment in this thesis:

• Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language [12].

• Numpy is the fundamental package for scientific computing with Python and has a powerful N-dimensional array object which is used in the experiment [11].

• matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms [13].

• seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics [14].

• sklearn is an open source library providing simple and efficient tools for data mining and data analysis.

• xgboost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable [15].


• timeit provides a simple way to time small bits of Python code. It is used to calculate the training time of the algorithms used in this experiment [16].

The algorithms used in the experiment are implemented using sklearn. Sklearn has built-in models of the algorithms, which can be imported directly, trained and tested on the experimental data. In this experiment, the following sklearn tools were imported:

• SVR (Support Vector Regressor)

• BayesianRidge

• DecisionTreeRegressor

• RandomForestRegressor

For evaluating these regression models, the following sklearn tools were imported (a usage sketch follows this list):

• mean_absolute_error

• mean_squared_error
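A minimal sketch of how these tools combine, on synthetic data: RMSE is taken as the square root of sklearn's MSE, and training time is measured with timeit, as described in Section 3.1.1.

import timeit
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.RandomState(4)
X_train, y_train = rng.rand(500, 3), rng.rand(500)
X_test, y_test = rng.rand(100, 3), rng.rand(100)

model = BayesianRidge()

# Training time, measured with timeit over a single fit.
train_time = timeit.timeit(lambda: model.fit(X_train, y_train), number=1)

y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE
print(mae, mse, rmse, train_time)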

3.1.2 Jupyter Notebook

Jupyter Notebook is used as the IDE for writing the Python scripts for conducting the experiment. It is an open-source web application that allows the user to create and share documents that contain live code, equations, visualizations and narrative text [43].

3.2 Data set

The data set used for the experiment in this thesis is time series sensor data extracted from a fully loaded autonomous construction equipment. It has 2208 columns and 8 rows. As the data is in time series format, the next step is to convert the time series problem into a supervised machine learning regression problem. Walk forward validation is also used, in order to divide the data set into training and test data in an iterative manner, which is explained in the further sections.


3.3 Data Preprocessing

The data extracted from the equipment is in the form of time series data, and in order to use classical machine learning regression algorithms on it, it is important to convert the problem from a time series problem to a classical supervised learning problem. One way to achieve this is to use the Sliding Window method. There are several other methods available, such as recurrent sliding windows, hidden Markov models, conditional random fields and graph transformer networks, which makes it difficult to decide which among them gives better results. But as previous research showed that the sliding window method converts time series data to classical supervised data with ease of implementation, low computational cost and low memory consumption [18], it is chosen as the conversion method in this experiment.

3.3.1 Sliding Window Method

The sliding window method converts the sequential supervised learning problem into a classical supervised learning problem. Given a sequence of numbers for a time series data set, the previous time steps can be used as input variables and the next time step can be used as the output variable.

For a better understanding of the sliding window, consider a simple example.

Figure 3.2: Example time series data (Source: Machine Learning Mastery)

The time series problem in figure 3.2 can be framed as a supervised learning problem by using the sliding window method. Figure 3.3 shows the same time series problem after applying a sliding window of width 1. In this case, both measure1 and measure2 can be predicted for the next time step.


Figure 3.3: After applying the Sliding Window method (Source: Machine Learning Mastery)

A few observations can be made from analyzing figure 3.3:

• The previous time step is the input (X1, X2) and the next time step is the output (Y1, Y2) of the supervised learning problem.

• The order between the observations is preserved.

• The first and last rows have to be deleted: there is no previous value that can be used to predict the first value in the sequence, and no known next value for the last one.

The problem in figure 3.3 can also be formulated in such a way that X1, X2 and Y1 are the predictor variables and Y2 is the response variable. It is a point to note that this is just an example; the same problem can be formulated in many other ways depending on the requirements. A minimal sketch of this framing follows.
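The sketch below frames the example with pandas, mirroring the column names of figure 3.2; a window width of 1 is obtained with shift:

import pandas as pd

# Example multivariate time series, mirroring figure 3.2.
df = pd.DataFrame({
    "measure1": [0.2, 0.5, 0.7, 0.4],
    "measure2": [88, 89, 87, 88],
})

# Window width 1: the previous time step becomes the input (X1, X2)
# and the current step the output (Y1, Y2); the order is preserved.
supervised = pd.DataFrame({
    "X1": df["measure1"].shift(1),
    "X2": df["measure2"].shift(1),
    "Y1": df["measure1"],
    "Y2": df["measure2"],
})

# Drop the first row, which has no previous value to use as input.
supervised = supervised.dropna()
print(supervised)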

Thus, to convey the sliding window method in mathematical terms: a window classifier hw is constructed that maps an input window of width w into an individual output value y [18].

Specifically, let d = (w − 1)/2 be the half-width of the window. Then hw predicts yi,t using the window ⟨xi,t−d, xi,t−d+1, ..., xi,t, ..., xi,t+d−1, xi,t+d⟩. In effect, the input sequence xi is padded on each end by d null values and then converted into Ni separate examples [18].

The window classifier hw is trained by converting each sequential training example (xi, yi) into windows and then applying a standard supervised learning algorithm.


A new sequence x is classified by converting it into windows, applying hw to predict each yt, and then concatenating the yt's to form the predicted sequence y [18].

The width of the sliding window, or the lag value, is chosen by PACF analysis [27]. In multivariate time series data, PACF analysis is applied to the label data to choose the width of the sliding window. Before the analysis, the label data is checked for stationarity. The stationarity of the time series data (in this thesis, the label data) is checked using the Augmented Dickey-Fuller test.

Augmented Dickey-Fuller test

The Augmented Dickey-Fuller test is a type of statistical test called a unit root test. The intuition behind a unit root test is that it determines how strongly a time series is defined by a trend [44].

The null hypothesis of the test is that the time series can be represented by a unit root, i.e. that it is not stationary (it has some time-dependent structure) [44]. The alternate hypothesis (rejecting the null hypothesis) is that the time series is stationary [44].

• Null Hypothesis (H0): If it fails to be rejected, it suggests the time series has a unit root, meaning it is non-stationary. It has some time-dependent structure [44].

• Alternate Hypothesis (H1): The null hypothesis is rejected; it suggests the time series does not have a unit root, meaning it is stationary. It does not have a time-dependent structure [44].

The result can be interpreted as follows:

• p-value > 0.05: Fail to reject the null hypothesis (H0); the data has a unit root and is non-stationary.

• p-value <= 0.05: Reject the null hypothesis (H0); the data does not have a unit root and is stationary.
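A minimal sketch of this check, assuming the statsmodels library (which is not among the libraries listed in Section 3.1.1) provides the test via adfuller:

import numpy as np
from statsmodels.tsa.stattools import adfuller

# A random walk is a classic non-stationary series (illustrative data).
rng = np.random.RandomState(5)
series = np.cumsum(rng.randn(500))

adf_stat, p_value = adfuller(series)[:2]

# p-value > 0.05: fail to reject H0, so the series is non-stationary.
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")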

If the time series is non-stationary, the best method to make it stationary is to use the differencing transform [30].

Differencing Transform

Differencing can help stabilize the mean of a time series by removing changes in the level of the series, thus eliminating (or reducing) trend and seasonality [45].

Differencing is performed by subtracting the previous observation from the current observation [46].

difference(t) = observation(t) − observation(t − 1)    (3.1)

Inverting the process is required when a prediction must be converted back to the original scale. This can be done by adding the observation at the prior time step to the difference value [46].

inverted(t) = differenced(t) + observation(t − 1)    (3.2)
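A minimal sketch of equations (3.1) and (3.2) in plain Python:

def difference(series):
    # difference(t) = observation(t) - observation(t - 1), eq. (3.1)
    return [series[t] - series[t - 1] for t in range(1, len(series))]

def invert(differenced, first_observation):
    # inverted(t) = differenced(t) + observation(t - 1), eq. (3.2)
    restored = [first_observation]
    for d in differenced:
        restored.append(restored[-1] + d)
    return restored

original = [3, 5, 9, 14, 20]
diffed = difference(original)       # [2, 4, 5, 6]
print(invert(diffed, original[0]))  # [3, 5, 9, 14, 20]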

After applying the differencing transform, PACF analysis is conducted to decide the optimal lag value for the sliding window method. As PACF only describes the direct relationship between an observation and its lag, the result of the PACF analysis suggests that there is no correlation for lag values beyond the optimal lag value. Usually, the optimal lag value is the value where the PACF chart crosses the upper confidence interval for the first time [34].

Figure 3.4: PACF chart (Source: Kaggle)

For example, in figure 3.4 the lag value where the PACF chart crosses the upper confidence interval for the first time is 2.
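A minimal sketch of producing such a chart, again assuming statsmodels (plot_pacf) on top of matplotlib; the series and lag count are illustrative:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_pacf

rng = np.random.RandomState(6)
series = np.cumsum(rng.randn(300))
stationary = np.diff(series)  # apply the differencing transform first

# The shaded band is the confidence interval; the optimal lag is where
# the bars first drop inside it.
plot_pacf(stationary, lags=20)
plt.show()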

Although the sliding window method gives adequate performance in many applications, it does not take advantage of correlations between nearby yt values [18]. To be more precise, the only relationships between nearby yt values that are captured are those that are predictable from nearby xt values. If there are correlations among the yt values that are independent of the xt values, then these are not captured [18].


To solve this problem, we use a correlation coefficient, which gives an indication of how related the changes between two feature variables are [23]. This correlation coefficient can also be used as a feature selector. A correlation coefficient as a feature selector is simple and fast to execute. It eliminates irrelevant and redundant data and, in many cases, improves the performance of learning algorithms [47].

3.3.2 Correlation based Feature Selection

The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data [22]. A few benefits of feature selection are:

• Less redundant data means less opportunity to make decisions based on noise.

• Less misleading data means modeling accuracy improves.

• Time: Less data means that algorithms train faster.

A correlation coefficient can be used as a feature selector too. If the predictor data changes in the same direction as the response data, they are positively correlated. If they change in opposite directions (one goes up, one goes down), they are negatively correlated. If a predictor has no effect on the response data, that predictor variable can effectively be eliminated from the data set.

The term correlation refers to a mutual relationship or association between two feature variables [19]. More formally, correlation is a statistical measure that describes the association between random variables. There are several methods for calculating a correlation coefficient, each measuring a different type of strength of association. The Spearman correlation coefficient is widely used for problems where better estimates of the dependencies in multivariate time series data are needed, and it is more robust than the Pearson correlation coefficient [19]. Pearson measures linear dependence, whereas Spearman's measure is invariant under monotonous transforms of the variables.

The Pearson coefficient is the most commonly used correlation coefficient for exploring linear dependencies and is formulated as:

ρ = cov(X, Y) / (σ_X σ_Y)    (3.3)

where ρ (rho) measures the strength of the linear relationship between the two variables X and Y, cov(X, Y) is their covariance, and σ_X and σ_Y are their standard deviations.

A question arises as to whether Pearson correlation is the right method to explore dependencies, as the data set in this thesis is not normally distributed and the variables may not be linearly dependent but rather may have complex relationships. Spearman's rho, the rank-based version of Pearson's correlation coefficient, can be used for variables that are not normally distributed and have a non-linear relationship. This is because Spearman's correlation determines the strength and direction of the monotonic relationship between the given variables, rather than the strength and direction of the linear relationship between variables, which is what Pearson's correlation determines.

The original formula for the correlation, developed by Spearman himself, uses the ranks of the two variables X and Y:

ρ = 1 − (6 Σ d_i²) / (n(n² − 1))    (3.4)

where d_i is the pairwise distance between the ranks of x_i and y_i, and n is the number of samples.

Thus, in this thesis, the Spearman correlation coefficient is used as the feature selection method. As all the feature variables show some correlation with the label data, no feature variables are eliminated.
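As a minimal sketch of this selection step (assuming a pandas DataFrame df with a label column named "label"; both names are hypothetical), the Spearman coefficients against the label can be computed and thresholded as follows:

```python
import pandas as pd

# df is a hypothetical DataFrame of feature columns plus a 'label' column.
df = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5],
    "feature_b": [2, 1, 4, 3, 5],
    "label":     [1, 2, 3, 4, 5],
})

# Spearman correlation of every feature with the label.
spearman = df.corr(method="spearman")["label"].drop("label")

# Keep only features whose absolute correlation exceeds a chosen threshold;
# in this thesis all features correlated with the label, so none were dropped.
selected = spearman[spearman.abs() > 0.1].index.tolist()
print(spearman, selected)
```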

3.4 Walk Forward Validation

Walk forward validation is a training and testing procedure for machine learning models in which the model is optimized on in-sample data from a time window of the data series. The remainder of the data is reserved for out-of-sample testing [64]. A small portion of the reserved data following the in-sample data is tested and the results are recorded.

This training and testing procedure contains an outer loop for error estimation and an inner loop for parameter tuning (see Figure 3.5). The inner loop works exactly as discussed before: the training set is split into a training subset and a validation set, the model is trained on the training subset, and the parameters that minimize error on the validation set are chosen. The outer loop then splits the data set into multiple different training and test sets, and the error on each split is averaged in order to compute a robust estimate of model error.

Figure 3.5: Walk Forward Validation

This procedure is used instead of the commonly used cross-validation (CV) method because the data is sequential in nature. While the CV method randomizes the testing and training sets, walk forward validation preserves the sequential behaviour of the data set. It cannot be argued which method is better, as the results can vary empirically across data sets, so walk forward validation was chosen for the experiment in this thesis. It is also worth noting that if walk forward validation is carried out for 10 folds, there are 9 iterations (similar to the behaviour of a nested for loop in programming).
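As a hedged sketch (not the thesis code), scikit-learn's TimeSeriesSplit produces this kind of order-preserving, expanding-window split; with n_splits=9 it yields the 9 iterations of a 10-fold walk forward validation:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)  # hypothetical sequential feature data
y = np.arange(100)                 # hypothetical label data

# 10 folds -> 9 train/test iterations; each test set follows its training
# set in time, so the sequential order of the data is preserved.
tscv = TimeSeriesSplit(n_splits=9)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
    print(f"fold {fold}: train up to index {train_idx[-1]}, "
          f"test {test_idx[0]}..{test_idx[-1]}")
```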

3.5 Experimental Setup

• Perform 10-fold walk forward validation with Support Vector Machine Regressor, Bayesian Ridge Regressor, Decision Tree Regressor and Random Forest Regressor on the data set (a minimal sketch of this procedure is given after this list).

• The performance metrics of the algorithms are noted down in every fold. The experimental results are then analyzed and compared in order to select the algorithm that exhibits the best regression performance.
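A minimal sketch of this setup, assuming random stand-in data (X and y below are hypothetical) and scikit-learn's default model parameters, as stated in the limitations:

```python
import time
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.svm import SVR
from sklearn.linear_model import BayesianRidge
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

models = {
    "SVR": SVR(),
    "BR": BayesianRidge(),
    "DTR": DecisionTreeRegressor(),
    "RFR": RandomForestRegressor(),
}

X = np.random.rand(200, 4)   # hypothetical feature data
y = np.random.rand(200)      # hypothetical label data

for name, model in models.items():
    for train_idx, test_idx in TimeSeriesSplit(n_splits=9).split(X):
        start = time.time()
        model.fit(X[train_idx], y[train_idx])      # train on in-sample data
        train_time = time.time() - start
        pred = model.predict(X[test_idx])          # predict the next window
        mse = mean_squared_error(y[test_idx], pred)
        mae = mean_absolute_error(y[test_idx], pred)
        rmse = np.sqrt(mse)
        print(f"{name}: MSE={mse:.5f} MAE={mae:.5f} RMSE={rmse:.5f} "
              f"time={train_time:.4f}s")
```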

3.5.1 Performance Metrics

The selection of evaluation metrics should be made in accordance with the class proportions in the experimental data and with the regression problem [48]. The metrics commonly used to evaluate time series forecasting problems are chosen to evaluate the performance of the machine learning algorithms, as the data was initially in a time series format. This also makes this thesis comparable to other existing research work, which usually uses classical (statistical) time series forecasting methods to reduce the uncertainties in decision making in various industries. This point is also mentioned in the related work section. The metrics used to evaluate the machine learning algorithms in this thesis are:

Mean Absolute Error

MAE measures the average magnitude of the errors in a set of predictions, without considering their direction [49]. It is the average over the test sample of the absolute differences between prediction and actual observation, where all individual differences have equal weight [49].

Mean Absolute Error (MAE) = (1/n) Σ_{t=1}^{n} |e_t|

Mean Squared Error

In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors, that is, the average squared difference between the estimated values and what is estimated. MSE is a risk function, corresponding to the expected value of the squared error loss [50]. The fact that MSE is almost always strictly positive (and not zero) is because of randomness or because the estimator does not account for information that could produce a more accurate estimate [50].

Mean Squared Error (MSE) = (1/n) Σ_{t=1}^{n} e_t²

Root Mean Squared Error

RMSE is a quadratic scoring rule that also measures the average magnitude of the error. It is the square root of the average of the squared differences between prediction and actual observation.

Root Mean Squared Error (RMSE) = sqrt((1/n) Σ_{t=1}^{n} e_t²)

There is also a statistical intuition behind selecting these performance metrics.

These three metrics tend to trade off each other's advantages and disadvantages.

Why RMSE over MAE?
RMSE is preferred to MAE when the observations' conditional distribution is asymmetric (as it is here) and an unbiased fit is desired. The RMSE is minimized by the conditional mean and the MAE by the conditional median, so a fit that minimizes the MAE will be closer to the median and therefore biased.

Why MAE over MSE?
MAE does not penalize large errors as heavily as MSE does, and is thus not as sensitive to outliers as the mean squared error.

Why RMSE over MSE?
RMSE is easier to interpret (the errors are on the same scale as the data), and it helps to determine whether there is overfitting or underfitting.
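As a small worked illustration of these trade-offs (with hypothetical errors, not thesis results): for the absolute errors (0.1, 0.1, 0.1, 1.0), MAE = 1.3/4 = 0.325, MSE = 1.03/4 = 0.2575 and RMSE = sqrt(0.2575) ≈ 0.507, which shows how the squared terms let a single large error dominate RMSE and MSE much more than MAE.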

Training time:

To get an understanding of the cost required to train these algorithms (in terms of time), the training time in each fold of the 10-fold walk forward validation is also measured.

3.5.2 Statistical Test

Even though the regression results of the chosen algorithms can be compared manually, it is important to perform statistical tests on them: manual comparison carries a large risk of misinterpreting the results, which would lead to incorrect conclusions. As the obtained results do not follow a Gaussian distribution, a statistical test called the Kolmogorov-Smirnov test is performed to compare the regression performance of the machine learning models.

The Kolmogorov-Smirnov (KS) test is a two-sided test for the null hypothesis that two independent samples are drawn from the same continuous distribution [67]. The test is non-parametric: it does not assume that the data are sampled from Gaussian distributions (or any other defined distribution) [68].

The null hypothesis of the KS test is that both groups were sampled from populations with identical distributions. It tests for any violation of that null hypothesis: different medians, different variances, or different distributions [68]. Since the test does not compare any particular parameter (i.e. mean or median), it does not report any confidence interval [68].
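A minimal sketch of this test with SciPy (the two samples below are hypothetical per-fold error values, not the thesis results):

```python
from scipy.stats import ks_2samp

# Hypothetical per-fold RMSE values for two regressors.
errors_model_a = [0.23, 0.21, 0.25, 0.22, 0.24, 0.26, 0.23, 0.22, 0.25]
errors_model_b = [0.13, 0.12, 0.14, 0.13, 0.15, 0.12, 0.14, 0.13, 0.12]

# Two-sided KS test; a small p-value rejects the null hypothesis
# that both samples come from the same continuous distribution.
statistic, p_value = ks_2samp(errors_model_a, errors_model_b)
print(f"KS statistic={statistic:.4f}, p={p_value:.4f}")
```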

Chapter 4

Results

4.1 PACF Analysis

PACF analysis is conducted on the label data to obtain an optimal window size for the sliding window method. The sliding window can later be applied either only to the label data or to the entire data set with the obtained window size value; in this experiment, the sliding window method is applied to the entire data set. But before jumping into that, there is a need to check the stationarity of the label data.

Figure 4.1: Augmented Dickey-Fuller Test

As the p-value in Figure 4.1 is greater than 0.05, the data is clearly non-stationary, and thus the differencing transform is applied to the label data.

Figure 4.2: Augmented Dickey-Fuller Test after applying Differencing Transformon the Label data

After applying the differencing transform to the label data, it can be noted from Figure 4.2 that the p-value is less than 0.05; hence the label data is now stationary and can be analyzed for partial autocorrelations.
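A hedged sketch of this stationarity check with statsmodels (label_series is a hypothetical stand-in for the label data):

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Hypothetical pandas Series standing in for the label data.
label_series = pd.Series([0.5, 0.7, 0.6, 0.9, 0.8, 1.1, 1.0, 1.3, 1.2, 1.5] * 5)

# Augmented Dickey-Fuller test: the null hypothesis is that the series
# has a unit root, i.e. is non-stationary.
adf_stat, p_value = adfuller(label_series)[:2]
print(f"ADF statistic: {adf_stat:.4f}, p-value: {p_value:.4f}")
# p > 0.05: fail to reject the null, so apply the differencing transform;
# p < 0.05: the (differenced) series can be treated as stationary.
```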

Figure 4.3: PACF chart

As observed in Figure 4.3, the PACF chart of the label data starts to cross the upper confidence interval at around lag = 2.4. Hence the optimal window width for the sliding window is 2 (rounding off 2.4) in all data sets. This lag value is applied to all the feature variables in the data set along with the label data, in order to have more features with which to test the dependencies.

4.2 Support Vector Machine (SVM) Regressor

The Support Vector Machine Regressor is trained on the multiple experimental data sets with 10-fold walk forward validation, and the performance measures MSE, MAE and RMSE obtained by the Support Vector Machine Regressor are noted down. The results obtained by the SVM Regressor on the data set are as follows.

In Figure 4.4, the box-plot represents the Mean Squared Error (MSE) achieved by the SVM Regressor on the 10-fold walk forward validation tests. The median (middle quartile) of the box-plot represents the median MSE, which is 0.05462. The lower quartile represents the minimum MSE, which is 0.02632, and the upper whisker represents the maximum MSE, which is 0.99705. The triangle in the box-plot represents the mean MSE, which is 0.181318.

Figure 4.4: SVM-R MSE boxplot

Figure 4.5: SVM-R MAE boxplot

Figure 4.6: SVM-R RMSE boxplot

In Figure 4.5, the box-plot represents the Mean Absolute Error (MAE) achieved by the SVM Regressor on the 10-fold walk forward validation tests. The median (middle quartile) of the box-plot represents the median MAE, which is 0.20688. The lower quartile represents the minimum MAE, which is 0.15693, and the upper whisker represents the maximum MAE, which is 0.89311. The triangle in the box-plot represents the mean MAE, which is 0.27759.

In Figure 4.6, the box-plot represents the Root Mean Squared Error (RMSE) achieved by the SVM Regressor on the 10-fold walk forward validation tests. The median (middle quartile) of the box-plot represents the median RMSE, which is 0.23371. The lower quartile represents the minimum RMSE, which is 0.16223, and the upper whisker represents the maximum RMSE, which is 0.99852. The triangle in the box-plot represents the mean RMSE, which is 0.300181.

Figure 4.7 represents the training time of the SVM Regressor on the 10-fold walk forward validation tests. From this figure, it is found that the average training time of SVM over all ten folds is 0.102854 seconds, whereas the maximum and minimum training times are 0.247750 seconds and 0.004185 seconds respectively.

4.3 Bayesian Ridge Regressor

The Bayesian Ridge Regressor is trained on the multiple experimental data sets with 10-fold walk forward validation, and the performance measures MSE, MAE and RMSE obtained by the Bayesian Ridge Regressor are noted down. The results obtained by the Bayesian Ridge Regressor on the data set are as follows.

Figure 4.8: Bayesian Ridge Regressor MSE boxplot

Figure 4.9: Bayesian Ridge Regressor MAE boxplot

Figure 4.10: Bayesian Ridge Regressor RMSE boxplot

In Figure 4.8, the box-plot represents the Mean Squared Error (MSE) achieved by the Bayesian Ridge Regressor on the 10-fold walk forward validation tests. The median (middle quartile) of the box-plot represents the median MSE, which is 0.0. The lower quartile represents the minimum MSE, which is 0.0, and the upper whisker represents the maximum MSE, which is 0.0. The triangle in the box-plot represents the mean MSE, which is 0.0.

In Figure 4.9, the box-plot represents the Mean Absolute Error (MAE) achieved by the Bayesian Ridge Regressor on the 10-fold walk forward validation tests. The median (middle quartile) of the box-plot represents the median MAE, which is 0.00049. The lower quartile represents the minimum MAE, which is 0.00023, and the upper whisker represents the maximum MAE, which is 0.01041. The triangle in the box-plot represents the mean MAE, which is 0.002435.

In Figure 4.10, the box-plot represents the Root Mean Squared Error (RMSE) achieved by the Bayesian Ridge Regressor on the 10-fold walk forward validation tests. The median (middle quartile) of the box-plot represents the median RMSE, which is 0.0. The lower quartile represents the minimum RMSE, which is 0.0, and the upper whisker represents the maximum RMSE, which is 0.01378. The triangle in the box-plot represents the mean RMSE, which is 0.0028122.

Figure 4.11 represents the training time of the Bayesian Ridge Regressor on the 10-fold walk forward validation tests. From this figure, it is found that the average training time of the Bayesian Ridge Regressor over all ten folds is 0.0141833 seconds, whereas the maximum and minimum training times are 0.0224215 seconds and 0.0085864 seconds respectively.

4.4 Decision Tree Regressor

The Decision Tree Regressor is trained on the multiple experimental data sets with 10-fold walk forward validation, and the performance measures MSE, MAE and RMSE obtained by the Decision Tree Regressor are noted down. The results obtained by the Decision Tree Regressor on the data set are as follows.

Figure 4.12: Decision Tree Regressor MSE boxplot

Figure 4.13: Decision Tree Regressor MAE boxplot

Figure 4.14: Decision Tree Regressor RMSE boxplot

In Figure 4.12, the box-plot represents the Mean Squared Error (MSE) achieved by the Decision Tree Regressor on the 10-fold walk forward validation tests. The median (middle quartile) of the box-plot represents the median MSE, which is 0.01747. The lower quartile represents the minimum MSE, which is 0.00923, and the upper whisker represents the maximum MSE, which is 0.40074. The triangle in the box-plot represents the mean MSE, which is 0.094588.

In Figure 4.13, the box-plot represents the Mean Absolute Error (MAE) achieved by the Decision Tree Regressor on the 10-fold walk forward validation tests. The median (middle quartile) of the box-plot represents the median MAE, which is 0.11313. The lower quartile represents the minimum MAE, which is 0.08447, and the upper whisker represents the maximum MAE, which is 0.52014. The triangle in the box-plot represents the mean MAE, which is 0.208352.

In Figure 4.14, the box-plot represents the Root Mean Squared Error (RMSE) achieved by the Decision Tree Regressor on the 10-fold walk forward validation tests. The median (middle quartile) of the box-plot represents the median RMSE, which is 0.13217. The lower quartile represents the minimum RMSE, which is 0.09607, and the upper whisker represents the maximum RMSE, which is 0.63304. The triangle in the box-plot represents the mean RMSE, which is 0.239945.

Figure 4.15 represents the training time of the Decision Tree Regressor on the 10-fold walk forward validation tests. From this figure, it is found that the average training time of the Decision Tree Regressor over all ten folds is 0.047248 seconds, whereas the maximum and minimum training times are 0.092855 seconds and 0.009392 seconds respectively.

4.5 Random Forest Regressor

The Random Forest Regressor is trained on the multiple experimental data sets with 10-fold walk forward validation, and the performance measures MSE, MAE and RMSE obtained by the Random Forest Regressor are noted down. The results obtained by the Random Forest Regressor on the data set are as follows.

Figure 4.16: Random Forest Regressor MSE boxplot

Figure 4.17: Random Forest Regressor MAE boxplot

Figure 4.18: Random Forest Regressor RMSE boxplot

In Figure 4.16, the box-plot represents the Mean Squared Error (MSE) achieved by the Random Forest Regressor on the 10-fold walk forward validation tests. The median (middle quartile) of the box-plot represents the median MSE, which is 0.01766. The lower quartile represents the minimum MSE, which is 0.009, and the upper whisker represents the maximum MSE, which is 0.40695. The triangle in the box-plot represents the mean MSE, which is 0.099452.

In Figure 4.17, the box-plot represents the Mean Absolute Error (MAE) achieved by the Random Forest Regressor on the 10-fold walk forward validation tests. The median (middle quartile) of the box-plot represents the median MAE, which is 0.11387. The lower quartile represents the minimum MAE, which is 0.08493, and the upper whisker represents the maximum MAE, which is 0.52494. The triangle in the box-plot represents the mean MAE, which is 0.213834.

In Figure 4.18, the box-plot represents the Root Mean Squared Error (RMSE) achieved by the Random Forest Regressor on the 10-fold walk forward validation tests. The median (middle quartile) of the box-plot represents the median RMSE, which is 0.13289. The lower quartile represents the minimum RMSE, which is 0.09487, and the upper whisker represents the maximum RMSE, which is 0.63793. The triangle in the box-plot represents the mean RMSE, which is 0.24492.

Figure 4.19 represents the training time of the Random Forest Regressor on the 10-fold walk forward validation tests. From this figure, it is found that the average training time of the Random Forest Regressor over all ten folds is 0.29938 seconds, whereas the maximum and minimum training times are 0.57667 seconds and 0.062983 seconds respectively.

Chapter 5

Analysis

5.1 Comparative Study of Performance Metrics observed from Support Vector Machine Regression, Bayesian Ridge Regression, Decision Tree Regression and Random Forest Regression on the Experimental Data Set

5.1.1 Mean Absolute Error (MAE)

Figure 5.1 represents the Mean Absolute Error of the predictions achieved by SVM-R (SVR), BR, DTR and RFR on the 10-fold walk forward validation tests. From this figure, it can be observed that SVR has a high MAE compared to the other algorithms, except that it starts decreasing in fold 6, and in fold 7 it is less than RFR and DTR. BR performed the best, with the lowest MAE in all folds. RFR and DTR performed in a significantly similar way.

Figure 5.1: Comparison of MAE achieved by SVR, BR, DTR and RFR on 10-fold walk forward validation tests.

5.1.2 Mean Squared Error (MSE)

Figure 5.2 represents the Mean Squared Error of the predictions achieved by SVR, BR, DTR and RFR on the 10-fold walk forward validation tests. From this figure, it can be observed that SVR has a high MSE compared to the other algorithms, except that it starts decreasing in fold 6, and in fold 7 it is less than RFR and DTR. BR performed the best, with the lowest MSE in all folds, while the MSE of RFR and DTR stays low until fold 5 and then abruptly increases. RFR and DTR perform in a significantly similar way.

Figure 5.2: Comparison of MSE achieved by SVR, BR, DTR and RFR on 10-fold walk forward validation tests.

5.1.3 Root Mean Squared Error (RMSE)

Figure 5.3 represents the Root Mean Squared Error of the predictions achieved by SVR, BR, DTR and RFR on the 10-fold walk forward validation tests. From this figure, it can be observed that SVR has a high RMSE compared to the other algorithms, except in fold 7 where it is less than RFR and DTR. BR performed the best, with the lowest RMSE in all folds. RFR and DTR performed in a significantly similar way.

Figure 5.3: Comparison of RMSE achieved by SVR, BR, DTR and RFR on 10-fold walk forward validation tests.

5.1.4 Training Time

The training time of the algorithms behaves similarly in all the data sets. Figures 5.4 to 5.19 represent the training time achieved by SVR, BR, DTR and RFR on the 10-fold walk forward validation tests in all the data sets. From these figures, it can be observed that RFR has a high training time compared to the other algorithms. BR performed the best, with the lowest training time on average. SVR has the second highest training time on average, while DTR performs optimally with the second lowest training time.

Figure 5.4: Comparison of training time achieved by SVR, BR, DTR and RFR on 10-fold walk forward validation tests.

5.1.5 Statistical Test

Figure 5.5: Kolmogorov-Smirnov test

Figure 5.5 is a KS test matrix between the performance measures and the machine learning algorithms. "Same" means that the null hypothesis is accepted and "Different" means that the null hypothesis is rejected.

Thus, from Figure 5.5 it can be seen that SVR, DTR and RFR performed similarly when compared on MSE and RMSE. DTR and RFR performed similarly on MAE, whereas the performance of BR is different. The training time is different for all algorithms, which also agrees with our manual hypothesis.

It can be noted that the tree-based algorithms (RFR and DTR) performed similarly.

5.2 Key Analysis

The key intuitions behind the performance of the algorithms are discussed as follows:

• Bayesian Ridge performs the best compared to the other machine learning models. This can be attributed to the regularization technique of Bayesian Ridge, which makes it robust to outliers and overfitting. This also explains why the training time of BR decreases during walk forward validation: it is able to adapt to the data at hand and generalize the model as the folds increase.

• Random Forest Regression has the highest training time. Scikit-learn does not have a feature to prune the trees, which could have caused RFR to train longer. For the same reason, there is a chance that RFR may have over-fit the training data.

• Support Vector Machine has the worst performance in terms of errors. This may be because the kernel function parameter was set to 'rbf', and other kernel functions (such as linear or polynomial) might have helped it perform better.

• In terms of errors, RFR and DTR performed similarly, as both are tree-based models. Upon close observation, DTR performed better than RFR, with lower errors. This may be due to the overfitting problem of RFR, which may have prevented it from generalizing the model.

5.3 Discussion

RQ1: Which supervised machine learning algorithm is best suited for predicting the performance of the machine in correlation with the various operational contexts in which the machine is being operated, and why?

Answer: Bayesian Ridge is the most suitable machine learning algorithm for predicting the performance/behaviour of the machine in correlation with the various operational contexts. In this experiment, Bayesian Ridge had the lowest error when predicting the behaviour of the machine, compared to Support Vector Regression, Random Forest Regression and Decision Tree Regression. The reason is the flexibility of Bayesian Ridge in measuring "how strongly each predictor variable influences the criterion (dependent) variable" compared to the other algorithms. This behaviour comes from regularization, where penalty terms are added in order to prevent overfitting.

RQ2: What are the results of the best suited algorithm?

Answer: The average MSE achieved by Bayesian Ridge across the 10-fold walk forward validation is 0, which is quite impressive, as there is little chance of overfitting given the algorithm's properties. The average MAE is 0.002435 and the average RMSE is 0.0028122.

5.4 Contributions

Although there is existing research focusing on statistics-based data-driven decision making, there has been no existing research on developing a data-driven decision support tool based on machine learning, especially for design engineers working in the construction equipment industry. This thesis shows that machine learning can be used to predict the behaviour of autonomous construction equipment efficiently, and that this can be further used to develop the decision support tool.

5.5 Threats to Validity

Validity is defined as “an indication of how well an assessment actually measures what it is supposed to measure” [62]. Some types of validity and their mitigation strategies are discussed in this section.

5.5.1 Internal Validity

Errors in data measurement and collection are mitigated by using the walk forward validation technique, which helps to ensure accurate results. The threat of missing observations in the experiments is mitigated by keeping a cloud backup of all the experiment logs and by having one observer present until the end of the experiment.

5.5.2 External Validity

External validity refers to the extent to which the results of the experiment can be confidently generalized to a group larger than the one that participated in the study [62]. This validity is achieved by using data from multiple machines in this study, which can be used to evaluate the algorithm and its performance. The threat of specificity of variables is mitigated by defining all the dependent variables of this study in such a way that they are meaningful in any general experimental setting.

5.5.3 Conclusion Validity

Conclusion validity verifies whether the data from the experiment and the results are actually right and justified. A threat arises when an improper selection of evaluation measures results in over- or underestimating the size of the relationship between the independent and dependent variables in the study. To avoid this, multiple metrics are chosen to evaluate the machine learning algorithms. A proper experimental setup and methodology have also been structured so that a standard protocol is followed while conducting the experiment.

5.6 Limitations

• The study has been conducted on data sets obtained from model equipment, and it cannot be said that similar results would be obtained from a study conducted on data sets obtained from actual equipment.

• The parameters of the machine learning models are set to their defaults (according to scikit-learn) instead of choosing the best parameters, because iterating through parameter combinations (using grid search) is a costly process in terms of time and memory. Thus it cannot be said whether the models are fine-tuned to the data (the default parameters may also be the best parameters), but from the results obtained in the experiment (error almost equal to 0), it can be said that the thesis fulfils its aim.

Chapter 6

Conclusions and Future Work

Design engineers working in the construction machinery industry face many complexities and uncertainties while taking important decisions during the design of construction equipment. These complexities can be reduced by the implementation of a machine learning based data-driven decision support tool, which can predict the behaviour of the machine in an operational context and give valuable insights to the design engineer. In pursuit of such a solution, machine learning algorithms that can predict the behaviour of the construction machine and later be involved in the development of such a data-driven decision support tool, namely Support Vector Machine Regression, Bayesian Ridge Regression, Decision Tree Regression and Random Forest Regression, have been evaluated thoroughly. Through the evaluations conducted with various performance metrics and statistical tests, it is found that Bayesian Ridge is the suitable algorithm to be used in the data-driven decision support tool, thus fulfilling the aim of this thesis.

Future work involves comparing various visualization techniques and methods in order to visualize the results obtained in this thesis. These visualization results must be self-explanatory for the design engineers and help them make better decisions.

Another line of future work involves comparing the results obtained in this thesis with results obtained from deep learning methods and statistical methods, which could open a new gate for researchers exploring the industrial machine learning field.

References

[1] I. Staff, "Time Series," Investopedia, 12-Mar-2006. [Online]. Available: https://www.investopedia [Accessed: 22-Aug-2018].
[2] S. Dersten, P. Wallin, J. Froberg, and J. Axelsson, "Analysis of the information needs of an autonomous hauler in a quarry site," in 2016 11th System of Systems Engineering Conference (SoSE), Kongsberg, Norway, 2016, pp. 1–6.
[3] P. Louridas and C. Ebert, "Machine Learning," IEEE Software, vol. 33, no. 5, pp. 110–115, Sep. 2016.
[4] J. L. Blanco, S. Fuchs, M. Parsons, and M. J. Ribeirinho, "Artificial intelligence: Construction technology's next frontier," McKinsey & Company.
[5] M. Mohammed, M. Khan, and E. Bashier, Machine Learning, 1st ed., 2016.
[6] "6.4.1. Definitions, Applications and Techniques." [Online]. Available: https://www.itl.nist.gov/d [Accessed: 22-Aug-2018].
[7] "BeginnersGuide/Overview - Python Wiki." [Online]. Available: https://wiki.python.org/moin/B [Accessed: 15-Dec-2018].
[8] "IRIS: Waveform Data."
[9] "machine learning," Britannica Academic. [Online].
[10] "Python Definition." [Online]. Available: https://techterms.com/definition/python [Accessed: 15-Dec-2018].
[11] "NumPy — NumPy." [Online]. Available: http://www.numpy.org/ [Accessed: 15-Dec-2018].
[12] "Python Data Analysis Library — pandas." [Online]. Available: https://pandas.pydata.org/ [Accessed: 15-Dec-2018].
[13] "Matplotlib: Python plotting — Matplotlib 3.0.2 documentation." [Online]. Available: https://matplotlib.org/ [Accessed: 15-Dec-2018].
[14] "seaborn: statistical data visualization — seaborn 0.9.0 documentation." [Online]. Available: https://seaborn.pydata.org/ [Accessed: 15-Dec-2018].
[15] "XGBoost Documentation — xgboost 0.81 documentation." [Online]. Available: https://xgboost.readthedocs.io/en/latest/ [Accessed: 15-Dec-2018].
[16] "XGBoost Documentation — xgboost 0.81 documentation." [Online]. Available: https://xgboost.readthedocs.io/en/latest/ [Accessed: 15-Dec-2018].
[17] "How to Use a Three-Axis Accelerometer for Tilt Sensing," DFRobot Electronic Product Wiki and Tutorial: Arduino and Robot Wiki, DFRobot.com.
[18] T. G. Dietterich, "Machine Learning for Sequential Data: A Review," in Structural, Syntactic, and Statistical Pattern Recognition, vol. 2396, T. Caelli, A. Amin, R. P. W. Duin, D. de Ridder, and M. Kamel, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002, pp. 15–30.
[19] G. U. Yule, "Why do we Sometimes get Nonsense-Correlations between Time-Series?," J. Roy. Stat. Soc., vol. 89, no. 1, pp. 1–63, 1926.
[20] "Is it reasonable to use Pearson correlation on time-series data?," ResearchGate. [Accessed: 16-Dec-2018].
[21] J. Brownlee, "How To Backtest Machine Learning Models for Time Series Forecasting," Machine Learning Mastery, 18-Dec-2016.
[22] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," p. 26.
[23] R. Hindriks et al., "Can sliding-window correlations reveal dynamic functional connectivity in resting-state fMRI?," NeuroImage, vol. 127, pp. 242–256, Feb. 2016.
[24] S. Makridakis, E. Spiliotis, and V. Assimakopoulos, "Statistical and Machine Learning forecasting methods: Concerns and ways forward," PLOS ONE, vol. 13, no. 3, p. e0194889, Mar. 2018.
[25] F. Henriksson and K. Johansen, "Product development in the Swedish Automotive industry: Can design tools be viewed as decision support systems?," in DiVA, 2015.
[26] A. Bertoni, T. Larsson, J. Larsson, and J. Elfsberg, "Mining Data to Design Value: A Demonstrator in Early Design," 2017.
[27] "Automatic lag selection in time series forecasting using multiple kernel learning," SpringerLink. [Online].
[28] "IBM Knowledge Center - Autocorrelation and Partial Autocorrelation Functions," 24-Oct-2014. [Online]. [Accessed: 01-Jan-2019].
[29] W. Palma, Time Series Analysis. New York: John Wiley & Sons, Incorporated, 2016. Accessed January 1, 2019. ProQuest Ebook Central.
[30] G. van de Ven, "STAT 248: Removal of Trend & Seasonality Handout 4," p. 9.
[31] J. D. Seo, "Trend, Seasonality, Moving Average, Auto Regressive Model: My Journey to Time Series Data with...," Towards Data Science, 02-Jun-2018.
[32] "Stationarity and Unit Roots Tests, Unit Roots tests, Dickey-Fuller test, Augmented Dickey-Fuller test, Phillips and Perron tests," Financial Econometrics.
[33] O. Ostashchuk, "Time Series Data Prediction and Analysis," p. 78.
[34] J. Brownlee, "A Gentle Introduction to Autocorrelation and Partial Autocorrelation," Machine Learning Mastery, 05-Feb-2017.
[35] P. S. P. Cowpertwait and A. V. Metcalfe, Introductory Time Series with R, 2009 ed. Dordrecht; New York: Springer, 2009.
[36] "Supervised learning workflow and algorithms," MathWorks. [Online].
[37] M. Kayaalp, T. Pedersen, and R. Bruce, "A Statistical Decision Making Method: A Case Study on Prepositional Phrase Attachment," in CoNLL97: Computational Natural Language Learning, 1997.
[38] M. Yonge and L. Henesey, "A Decision Tool for Identifying the Prospects and Opportunities for Short Sea Shipping," p. 13.
[39] K. Sabina and B. Witold, "Statistical Methods as a Decision Making Tool for Production Engineering – An Exemplary Application."
[40] J. Merkert, M. Mueller, and M. Hubl, "A Survey of the Application of Machine Learning in Decision Support Systems," p. 16, 2015.
[41] Y. Cao, K. W. Chau, M. Anson, and J. Zhang, "An Intelligent Decision Support System in Construction Management by Data Warehousing Technique," in Engineering and Deployment of Cooperative Information Systems, vol. 2480, Y. Han, S. Tai, and D. Wikarski, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2002, pp. 360–369.
[42] K. Wallace and S. Burgess, "Methods and tools for decision making in engineering design," Design Studies, vol. 16, no. 4, pp. 429–446, Oct. 1995.
[43] "Project Jupyter." [Online]. Available: https://www.jupyter.org [Accessed: 10-Jan-2019].
[44] J. Brownlee, "How to Check if Time Series Data is Stationary with Python," Machine Learning Mastery, 29-Dec-2016.
[45] R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice. [Online].
[46] J. Brownlee, "How to Remove Trends and Seasonality with a Difference Transform in Python," Machine Learning Mastery, 09-Jul-2017.
[47] M. A. Hall, "Correlation-based Feature Selection for Machine Learning," p. 198.
[48] "Machine Learning Mastery With Python," Machine Learning Mastery.
[49] JJ, "MAE and RMSE — Which Metric is Better?," Human in a Machine World, 23-Mar-2016.
[50] E. L. Lehmann and G. Casella, Theory of Point Estimation, 2nd ed. New York: Springer, 1998.
[51] T. C. Urdan, Statistics in Plain English, 3rd ed. (ISBN 9780415872911). [Online].
[52] J. Brownlee, "How to Use Parametric Statistical Significance Tests in Python," Machine Learning Mastery, 17-May-2018.
[53] "What is unsupervised learning? - Definition from WhatIs.com," WhatIs.com.
[54] "What is reinforcement learning? - Definition from WhatIs.com," SearchEnterpriseAI.
[55] J. Brownlee, "Difference Between Classification and Regression in Machine Learning," Machine Learning Mastery, 10-Dec-2017.
[56] T. Afonja, "Kernel Functions," Towards Data Science, 02-Jan-2017. [Online].
[57] "Understanding Support Vector Machine Regression," MATLAB & Simulink.
[58] "Decision Trees," scikit-learn. [Online]. Available: http://scikitlearn.org/stable/modules/tree.html [Accessed: 05-May-2017].
[59] "1.10. Decision Trees — scikit-learn 0.20.2 documentation." [Online]. Available: https://scikit-learn.org/stable/modules/tree.html#tree [Accessed: 12-Jan-2019].
[60] "1.1. Generalized Linear Models — scikit-learn 0.20.2 documentation." [Online].
[61] "Random Forest Regression," Turi Machine Learning Platform User Guide. [Online].
[62] I. N. Serbec, M. Strnad, and J. Rugelj, Assessment of Wiki-Supported Collaborative Learning in Higher Education. IEEE, 2010.
[63] "The Bayesian approach to ridge regression," R-bloggers, 30-Oct-2016.
[64] "Walk forward optimization," Wikipedia, 13-Jan-2018.
[65] A. Kusiak, "Data mining: manufacturing and service applications," International Journal of Production Research, vol. 44, no. 18–19, pp. 4175–4191, Sep. 2006.
[66] R. C. M. Yam, P. W. Tse, L. Li, and P. Tu, "Intelligent Predictive Decision Support System for Condition-Based Maintenance," Int J Adv Manuf Technol, vol. 17, no. 5, pp. 383–391, Feb. 2001.
[67] "scipy.stats.ks_2samp — SciPy v1.2.0 Reference Guide." [Online].
[68] "Interpreting results: Kolmogorov-Smirnov test." [Online].
[69] N. Donges, "The Random Forest Algorithm," Towards Data Science, 22-Feb-2018. [Online].
[70] "Ridge Regression," p. 21.

