
Centralized log management for complex computer networks

Marcus Hanikat

KTH ROYAL INSTITUTE OF TECHNOLOGY

ELEKTROTEKNIK OCH DATAVETENSKAP


Abstract

In modern computer networks, log messages produced on different devices throughout the network are collected and analyzed. The data from these log messages gives the network administrators an overview of the network’s operation and allows them to detect problems with the network and block security breaches. In this thesis several different centralized log management systems are analyzed and evaluated to see if they match the requirements for security, performance and cost that were established. These requirements are designed to meet the stakeholder’s requirements on log management and to allow for scaling along with the growth of their network. To prove that the selected system meets the requirements, a small-scale implementation of the system will be created as a “proof of concept”. The conclusion reached was that the best solution for the centralized log management system was the ELK Stack, which is based upon the three open-source software packages Elasticsearch, Logstash and Kibana. The small-scale implementation of the ELK Stack showed that it meets all the requirements placed on the system. The goal of this thesis is to help develop a greater understanding of some well-known centralized log management systems and why their use is important for computer networks. This is done by describing, comparing and evaluating some of the functionalities of the selected centralized log management systems. This thesis can also provide people and entities with guidance and recommendations for the choice and implementation of a centralized log management system.

Keywords: Logging; Log management; Computer networks; Centralization; Security


Abstrakt

In modern computer networks, logs are produced on different devices in the network and are then collected and analyzed. The data in these logs helps the network administrators get an overview of how the network is operating, allows them to detect problems in the network and to block security holes. In this project, several relevant systems for centralized logging are analyzed against the established requirements for security, performance and cost. These requirements are designed to meet the stakeholder’s requirements on log management and to allow for scaling along with the growth of their network. To prove that the selected system also fulfills the established requirements, a small-scale implementation of the selected system was set up as a “proof of concept”. The conclusion reached was that the best centralized logging system, given the requirements, was the ELK Stack, which is based on the three open-source software packages Elasticsearch, Logstash and Kibana. The small-scale implementation of this system also showed that the selected logging system fulfills all the requirements placed on it. The goal of this project is to help develop the knowledge about some well-known systems for centralized logging and why their use is of great importance for computer networks. This is done by describing, comparing and evaluating the selected systems for centralized logging. The project can also provide people and organizations with guidance and recommendations for the choice and implementation of a centralized logging system.

Keywords: Logging; Log management; Computer networks; Centralization; Security


Table of contents

1 Introduction
  1.1 Background
  1.2 Problem
  1.3 Purpose
  1.4 Goals
    1.4.1 Benefits for society, ethics and sustainability
  1.5 Research Methodology
  1.6 Stakeholder
  1.7 Delimitations
  1.8 Disposition
2 Log management
  2.1 Log source groups
  2.2 Log severity levels
  2.3 Log processing pipeline
  2.4 Centralized log management structures
  2.5 Logging policy
  2.6 Related work
3 Research methodologies and methods
  3.1 System development methodologies
  3.2 Research phases
  3.3 Data collection
  3.4 Setting up system requirements
  3.5 System selection process
4 System requirements
  4.1 Logging policy
    4.1.1 Log generation
    4.1.2 Log transmission
    4.1.3 Log storage and disposal
    4.1.4 Log analysis
  4.2 Requirements
5 System selection
  5.1 Splunk
  5.2 ELK Stack
  5.3 Graylog
  5.4 System selection
    5.4.1 Stack Overflow
    5.4.2 Cost of implementation
    5.4.3 Scalability
    5.4.4 Open source
    5.4.5 Criterion summarization
6 System implementation
  6.1 Network topology
  6.2 Parsing log data
  6.3 Encryption, Authentication and Integrity


  6.4 Data persistency and availability
  6.5 Scaling the system
  6.6 Generating alerts and X-Pack
  6.7 Kibana visualization
7 Results
  7.1 Logging policy and system requirements
  7.2 System selection
  7.3 Proof of concept implementation
    7.3.1 Log collection and processing
    7.3.2 Log transportation and persistency
    7.3.3 Log data visualization
    7.3.4 Alerting and X-Pack
8 Discussion
  8.1 Research and system development methods
  8.2 Logging policy
  8.3 System selection
  8.4 System implementation
  8.5 Visualization of data
9 Conclusion
10 Future work
References
Appendix A – Logging policy questionnaire
Appendix B – Elasticsearch configuration
Appendix C – Kibana configuration
Appendix D – Logstash configuration
Appendix E – Logstash input configuration
Appendix F – Logstash filter configuration
Appendix G – Logstash output configuration


List of figures

Figure 1: Empirical research cycle
Figure 2: A two-layered centralized log management system
Figure 3: A four-layered centralized log management system
Figure 4: Architecture of the ELK Stack system
Figure 5: An implementation of the ELK Stack in a complex network
Figure 6: A basic view of a multi-node setup of a Graylog system
Figure 7: Network topology in which the ELK Stack system was deployed
Figure 8: The Logstash pipeline
Figure 9: Parsing of log data using a grok filter
Figure 10: The authentication page in the Kibana web interface
Figure 11: A parsed syslog message using a grok pattern
Figure 12: A packet capture of HTTP communication
Figure 13: A standard log message display in Kibana
Figure 14: A line chart visualization created within Kibana
Figure 15: A pie chart visualization created within Kibana
Figure 16: The X-Pack monitoring feature for an Elasticsearch node
Figure 17: The X-Pack monitoring feature for Logstash nodes


List of tables

Table 1: A basic description of the basic log source groups
Table 2: The severity levels from the syslog standard
Table 3: Pipeline stages which log messages are put through
Table 4: Criteria ratings and the summarization received


List of acronyms and abbreviations

SOHO Small Office Home Office

JSON JavaScript Object Notation

XML Extensible Markup Language

SIEM Security Information and Event Management

PuL Personuppgiftslagen

TCP Transmission Control Protocol

TLS Transport Layer Security

RELP Reliable Event Logging Protocol

IPS Intrusion Prevention System

VLAN Virtual Local Area Network

UDP User Datagram Protocol

HTTP Hypertext Transfer Protocol

LDAP Lightweight Directory Access Protocol

APM Application Performance Monitoring

AI Artificial Intelligence


1 Introduction

The number of devices connected to the internet and to computer networks around the world is constantly growing [1]. As the number of devices connected to a network increases, so do the amount of traffic and the complexity of the network. Most devices in a network run some sort of application or service which produces log messages. These log messages can contain vital information about the system or software execution and information about security threats to a device or the network.

A log file is a collection of messages created when events occur during a program’s execution. These events can be anything from a failed authentication of a user to an error condition within the program. The log message is created to describe the event that occurred and is then stored within the program’s log file. These log files can then be parsed for information which can tell more about how the program is executing or whether there have been any security incidents.

Nowadays, a computer network commonly consists of many different devices. Since there are often several programs executing on each device, each device will in most cases produce multiple log files. As the number of devices in a network increases, it can become difficult to manage all the log files produced by the devices. For the person or entity hosting these devices, the log messages can provide crucial information to improve a device’s or network’s operation and strengthen its security. Because of this, a good centralized log management system is becoming a crucial part of any larger network today. The centralized log management system is used to collect and store the logs from different devices and allows easier access to the log data.

Log collection and management is nowadays a must in almost all larger networks. It can be used in small office/home office (SOHO) networks with just a couple of devices as well as in larger enterprise networks with possibly thousands of devices. All these networks can benefit from log management to different extents. As more devices are added to a network, the implementation of a centralized log management system becomes more and more attractive and, at some point, even necessary. A centralized log management system can help to reduce the time spent organizing and analyzing log messages [2]. One of the disadvantages is that it can have a steep learning curve and the setup might be very time consuming. Also, depending on the chosen system for centralized log management, the time required for the setup process as well as the cost of the system ranges from low to very high [3].


1.1 Background

The number of log messages produced in computer networks is constantly growing [4]. Along with this growth, the effort required to handle, store and analyze these log messages increases. Log messages are important since they tell the reader a lot about how an application or device is currently operating. For example, the log messages can contain information about severe problems or security issues. Collecting and analyzing these log messages is very important but can also be a time-consuming task. In a relatively large computer network with over 1 000 endpoints, the data produced by log messages can reach about 190 GB per day [5]. When the amount of data produced reaches these levels, the administrating entity must rely on help from computers to parse the log data efficiently. A computer can sort out which information is relevant and might be interesting, or which information requires further investigation.

It is often the network and system administrators who are responsible for analyzing log messages from devices within an entity’s network. But in today’s computer networks it is unfeasible to employ enough administrators to analyze all the log messages manually. Instead, the administrators often deploy systems where all log messages pass through a filtering and analysis process before they are reviewed or stored. This process can use data mining to extract the valuable information from the log messages [6]. Since the endpoints and devices within the network run a wide variety of operating systems and often produce differently formatted log messages, it would be very impracticable and time consuming to analyze the log messages locally on every device. A more common solution is to have each endpoint and device within the network send its log messages to one, or several, designated log parsers and analyzers [7]. These designated log parsers and analyzers can parse the data from the log messages and then analyze it to see if the information is relevant or redundant. The data can then, with benefit, be stored in a new format such as JavaScript Object Notation (JSON) [8] or Extensible Markup Language (XML) [9] so that all log messages are stored in the same format. JSON and XML are two different ways of formatting data into a standardized structure for storage or transport. Within JSON and XML files, data such as timestamps and IP addresses can be stored in standardized fields that are searchable across different log formats.
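To make the idea concrete, a single failed-login event parsed into JSON could look roughly like the sketch below; every field name and value here is invented purely for illustration and does not come from the thesis:

    {
      "@timestamp": "2019-04-23T10:15:32Z",
      "host": "fw-01.example.local",
      "program": "sshd",
      "severity": "warning",
      "source_ip": "192.0.2.17",
      "message": "Failed password for invalid user admin from 192.0.2.17 port 51432"
    }

Once all log sources are normalized into fields like these, a single query on, for example, source_ip can match events from firewalls, servers and applications alike, regardless of how each source originally formatted its messages.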

When the log messages have been parsed and analyzed, they are stored for a period of time or until the messages have been reviewed. Today many entities use a Security Information and Event Management (SIEM) system to help with the review process [10]. These tools allow for searching and correlation of log messages and can be used to monitor the log information and alert administrators when a problem occurs. They also allow the information to be presented in a more human-readable format, such as graphs, charts or tables. This can save a large amount of the time spent on analyzing log messages and opens new possibilities for correlating and finding relationships between different events throughout the network.


1.2 Problem

There are several different reasons for storing and analyzing the log messages produced within a computer network. For example, in Sweden internet service providers are required to keep the dynamically assigned IP addresses of their clients for a period of time due to the IPRED law [11]. With the help of this law the authorities can request information about a specific IP address to see which person or device it belonged to at a given time. If a crime has been committed using a leased IP address, the authorities can trace the IP address to a specific person. These log messages are required by law to be kept for a specific period before they are removed. Other log messages, such as informational and debug log messages, do in most cases not require storage for longer periods of time. Instead they can be deleted within a couple of days or as soon as the problem they describe has been resolved.

An evolving problem related to computer log management is how the valuable information is distinguished from the non-valuable information. It might be feasible to manually extract information from a single log message or a couple of log messages, but extracting information from all the log messages produced within a large network is a tough task. To have network administrators handle all the information produced by devices in larger networks, companies would have to hire a lot of personnel just to parse through all these log messages. This would introduce a huge cost for entities maintaining larger networks. Because of this, it is in their interest to introduce automatic parsing systems for log messages. These parsing systems should display only the valuable information to the person reviewing the log messages. A common term within the industry is SIEM [10], which is software that can provide analysis of log messages in real time. This software often supplies the network administrators with tools to better visualize the log data and helps with correlation, alerting and analysis.

The focus of this thesis is towards exploring and finding a viable and scalable centralized log management system for larger and complex networks. Centralized log management is important to implement in larger networks and comes with many benefits. There is currently a large number of different systems available for centralized log management, with a wide variety of features and functionalities, and it can be hard to tell which one is suitable for a given network and its requirements. So, from all these different systems, which ones are suitable to use for centralized log management and visualization in a large and complex network? A thorough investigation of the most well-known systems on the market is performed in this thesis to answer this question.


1.3 Purpose

This thesis aims at presenting the most appropriate solution for a centralized log management system for the stakeholder’s, EPM Data AB’s, network and at exploring what benefits can be reaped from this system. The reason this problem needs to be solved is that without a proper log management system, the management of the network and its security will become increasingly difficult. With a functional and easy-to-manage log management system, the time spent on network management will be greatly reduced. Security issues such as intrusions or malware will also become much easier to detect and manage. This will save the stakeholder both time and effort while maintaining full control of their network. In this thesis several different log management systems will be presented and evaluated. This will help the reader develop a greater understanding of these centralized log management systems and their benefits. Centralized log management is a quickly rising concern within network management, and this thesis aims to shed some light on possible issues and solutions within this field. Before this thesis was started, the stakeholder was using Splunk [12] as their centralized log management system. Hopefully, the system presented in this thesis will be able to offer an improved implementation or an alternative system.

1.4 Goals

The goal of this thesis is to propose a solution for a centralized log management system which can be beneficially used in the stakeholder’s and other complex networks. The selected system must meet the requirements which will be established for the project. To achieve this goal, the following sub-tasks must be fulfilled:

1. Construction of a basic logging policy based on the stakeholder’s requirements.

2. Use the logging policy and information from background research to set the system requirements for the log management system.

3. Selection of a centralized log management system which meets the established requirements.

4. Implementation and configuration of a small-scale solution which meets the project requirements and follows the logging policy.

The results presented by this project will be a comparison of several well-known centralized log management systems as well as recommendations for implementing the system chosen for this project. This will help the stakeholder by giving them a proposed and proven small-scale system for further deployment within their network. Since several different systems are discussed within this thesis, it can also be used as a guideline for the selection of a log management system within a wide variety of networks. Because of this, other companies and entities should also have an interest in the results presented by this thesis.


At the end of this thesis the goal is to present a small-scale system which is designed to comply with the crafted logging policy. The configuration of this system will serve as a guideline for a full-scale implementation within the stakeholder’s network and other networks in the future.

1.4.1 Benefits for society, ethics and sustainability

As the security threats to networks and devices increase, so do the demands for systems that help sort out and present the information produced by networks and devices. This thesis aims at helping the reader to better understand the requirements for collection, storage and analysis of log data. The implementation of a centralized log management system can help to increase the visibility and correlation of attacks and other threats to networks and devices. This benefits society by allowing for better understanding and more thorough investigations when these attacks occur.

Some ethical problems arise when data collection is involved. It is important to make sure that no personal data is stored within log messages, since this can violate a person’s privacy and integrity. It is the responsibility of the entity collecting the log data to make sure that no personal data is captured and stored, or that any personal data is treated according to applicable laws. Some countries have laws to protect peoples’ personal data. For example, in Sweden there is the PuL (Personuppgiftslagen 1998:204) [13] law which states how personal data should be handled. It is very important for entities collecting personal data to make sure that laws such as this are upheld.

This project also touches on some of the ethical aspects of computer security and the morality of hacking and other computer crimes. Computer security is very important if an entity is storing personal or other sensitive data within their network. A centralized log management system can help with investigations of potential crimes, also called computer forensics. It can also help to detect ongoing attacks using visualization and correlation methods, and it is possible to strengthen security with the help of the data from the collected log messages.

1.5 Research Methodology

There are two distinct groups of research methods: quantitative and qualitative research. Quantitative research is concerned with measurements of data; as the name suggests, it tries to quantify things [14]. The research methods which belong to the quantitative group, such as experimental and deductive research [15], are concerned with gathering and generalizing data from, for example, surveys. The qualitative research methods, on the other hand, focus on the quality of information and perception. They also aim at providing a detailed description of what is observed or experienced [14].


Applied research and conceptual research are examples of qualitative research methods [15]. This thesis relies on empirical and applied research methods, both of which fall under the qualitative group of research methods. The aim of the empirical research method is to develop a greater understanding of practices and technologies through the collection and analysis of data and experiences [15]. Using experiments, observations and experience as proof, the empirical research method draws conclusions about the researched topic. The applied research method is often based upon existing research and uses real work to solve problems [15]. In this thesis the empirical research method is used during the investigation of the possible solutions for the centralized log management system. Empirical research uses observation of, for example, experiments to gain knowledge of the area of investigation [15]. When the system to implement has been chosen, empirical research will also be used to set up a small-scale implementation of the chosen system. Finally, with the help of experiments and observations, the aim is to prove that the chosen system is sufficient and meets the requirements.

Figure 1: This figure gives a visual representation of A.D. de Groot’s empirical research cycle. The cycle describes the empirical research process and its different phases. From Wikimedia Commons. Licensed under CC for free sharing [16].


In figure 1 above, a visual representation of A.D. de Groot’s empirical research cycle can be seen. This cycle describes the work flow of empirical research [17]. The research starts with the observation phase, where the problem is identified and information is collected. It continues with the induction phase, where the hypothesis is derived from the observations made. The deduction phase is then used to set up how the hypothesis should be tested. In the following testing phase the hypothesis is put to the test. Finally, in the evaluation phase the results are interpreted and evaluated.

The applied research method is then used during the implementation of the proposed system to investigate if it meets the requirements set to solve the problems attacked in this thesis. The applied research method is concerned with answering specific questions regarding a set of circumstances [15]. Applied research often builds on former research and uses data from real work to solve a problem. The results from applied research are often related to a certain situation, which in the case of this thesis is the stakeholder’s network. Although the investigation is done for this particular situation, the results and conclusions found might be applicable to other situations.

1.6 Stakeholder

This thesis was produced in cooperation with the company EPM Data AB. EPM Data is a company focused on the management of modern IT services. The services provided by EPM Data range from high-availability hardware and servers to cloud desktops with pre-installed applications which can be reached from anywhere. Among the services offered by EPM Data are hosting, virtualization and cloud desktops. The research done in this thesis is supposed to help EPM Data find alternatives to their currently implemented log management system. Since EPM Data focuses on providing worry-free and secure services for their customers, they need a good and reliable log management system to achieve this. EPM Data has several important requirements which the chosen system and implementation must fulfill. The requirements placed upon the system are designed to allow for expansion of EPM Data’s network in the future. Because of this, the requirements are set for a larger network than the one currently deployed by the stakeholder, EPM Data. The requirements placed upon the system will be further discussed in section 4.

1.7 Delimitations

There are many different systems available which can provide centralized log management and visualization. To be able to get a deeper understanding of some of them, only the most well-known systems will be compared, and only one of the compared systems will be implemented and tested. A performance-focused, in-depth comparison between the different systems would have been valuable; instead, this is left as a possible thesis proposal or future work.


This thesis will only analyze centralized log management systems which also incorporate visualization and analysis tools. There are several possibilities for collecting and visualizing log data. In this thesis, only the systems which have the functionality to collect, store, analyze and visualize log data will be examined.

In this thesis a basic logging policy will be established, with its foundations built on the paper “Guide to Computer Security Log Management” presented by Kent and Souppaya [18]. This logging policy will serve as a foundation during the selection and implementation of the log management system. However, this thesis will not go into depth about how a good logging policy should be formed, since this is greatly dependent on the network or entity it is created for.

The solution presented in this thesis is only aimed at fulfilling the stakeholder’s requirements placed upon the system. This means that the solution might, or might not, be applicable to other networks. If such an attempt should be made, it is strongly recommended that a new logging policy be created for that network and then compared against the solution to see if it is still viable.

The primary focus of this thesis is centralized log management systems. This means that the discussion and implementation of the visualization and correlation tools that come along with these systems will be limited. There will be some coverage of these visualization and correlation tools, but the primary focus is on finding a system for centralized log management purposes.

1.8 Disposition

Section 2 of this thesis discusses and presents background information which is relevant to the work performed within this thesis. In section 2, a description of the requirements placed on the sought system, an introduction to logging policies and other relevant work are presented. In section 3 the methodologies used to solve the stated problems are presented. Section 3 also presents some modeling of the network, an analysis of the estimated logging-related traffic within the network and the system development methods used. Section 4 discusses and produces a logging policy. Drawing on the logging policy and the results of the background and literature study, the requirements placed upon the centralized log management system are stated. Section 5 discusses the different viable systems which meet the requirements for this project. Also, an analytical in-depth comparison of the proposed systems is done, and the most suitable system is selected for implementation.


In section 6 the configuration and implementation process of the proof of concept system is described and some of the functionalities are further discussed. Section 7 presents the results of the work done within the thesis. This includes the results from the logging policy, the system selection and the system implementation. In section 8 a discussion of the results presented by this thesis is conducted. Section 9 gives a quick recap of the work done in the thesis and some of the results presented. Further, section 9 presents some of the conclusions that can be drawn from the results of this thesis. Finally, section 10 gives recommendations on how the work presented within this thesis can be used for further research in the area.


2 Log management

In this section some of the most distinct characteristics of log messages and log management systems are presented. To begin with, section 2.1 presents common terminology for different log source groups. In section 2.2 an explanation of log severity levels is given. Section 2.3 presents terminology regarding the log processing pipeline. In section 2.4 differences in centralized log management structures are discussed. Section 2.5 introduces logging policies and explains why they might be useful. Finally, in section 2.6 work related to this thesis is presented.

2.1 Log source groups

A log is a collection of events that occur during the operation of a device, application or service. Log messages can come from many different sources and have many distinct characteristics. The log messages produced within a network can come from a wide variety of sources, with different operating systems, applications and services installed throughout the network. Some log messages might be more interesting than others to an administrator investigating events in the network. This can depend on which source the log messages originate from. There are three primary groups of log messages which are commonly seen in computer networks [18]:

Computer security logs

These log messages contain information about possible attacks, intrusions, viruses or authentication actions against a device.

Operating system logs

These log messages contain information about operating system related events, such as system events, authentication attempts or file accesses on a system.

Application logs

These log messages contain information from applications installed on devices throughout the network, for example web or database servers.

Table 1: This table gives a basic description of the basic log source groups. Most log sources can be divided into one of these groups [18].

These log source groups can be used to filter and divide the log data during searches and analysis in order to remove irrelevant information. This can help network administrators save time, since it decreases the number of log messages that need to be searched and analyzed. These groups can also be helpful when establishing logging policies, which will be discussed later in the thesis.


2.2 Log severity levels

Log severity levels are used to indicate the severity of the event described within the data of a log message. By using log severity levels, uninteresting events can be filtered out. The severity level of a log message can also tell a lot about the device’s operation and whether it is in acute need of oversight. In the syslog numbering, an event is more important the lower its numerical severity code is. If a log message has a low numerical code, there might be a severe incident in the network, or on a device, and the network administrators might need to be alerted to the problem. On the other hand, if the log message has a high numerical code, such as an informational or debug message, the data within the log message might be deemed unnecessary by the entity’s logging policy and discarded rather than stored. There are several different standards for log severity levels, depending on what operating system, application or device produced the log message. For network equipment and Linux operating system logs, one of the most common standards is the one used by the syslog protocol [19]. Syslog uses 8 different log severity levels, each with its own meaning and numerical code. The numerical code represents the severity level of the log message and ranges from 7, the lowest severity, to 0, the highest severity.

Numerical code   Severity

0 Emergency: System is unusable

1 Alert: Action must be taken immediately

2 Critical: Critical conditions

3 Error: Error conditions

4 Warning: Warning conditions

5 Notice: Normal but significant condition

6 Informational: Informational messages

7 Debug: Debug-level messages

Table 2: In this table the severity levels documented in the syslog standard can be seen. These severity levels are used to represent how important the log message is [19].

As can be seen in table 2 above, each log severity level that syslog uses has a defined meaning. These numerical log severity levels are nowadays used by many different software products and other producers of log messages; they are used not only by applications and devices capable of using the syslog protocol but by many other applications as well.
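As a small, hypothetical illustration of how these severity levels are used for filtering, the selectors in a syslog/rsyslog configuration match messages by facility and severity; the file paths below are invented purely for this example:

    # Write messages of severity warning (4) or more severe to a dedicated file
    *.warning      /var/log/important.log
    # Write everything, including informational and debug messages, to a catch-all file
    *.*            /var/log/everything.log

A selector such as *.warning matches the stated severity and everything more severe, which is how a logging policy’s severity thresholds are typically expressed in practice.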


2.3 Log processing pipeline

When log messages are processed and analyzed, they are passed through what can be described as a pipeline. This pipeline covers all the stages which a log message is put through during its lifetime. There are four phases which the log passes through, ending with the log being disposed of. These four phases can be described as follows [18]:

Processing

Processes the log by parsing, filtering or performing event aggregation on it. This phase often changes the log message’s appearance to match a more uniform log template.

Storage

Handles the storing of the log messages and is responsible for actions such as log rotation, compression, archiving and, in some cases, integrity checking.

Analysis

The information from the log messages is reviewed, sometimes with the aid of tools for correlation and finding relationships between log messages.

Disposal

When the log messages are no longer needed they reach the end of the pipeline and are disposed of.

Table 3: This table displays the pipeline stages which log messages are put through during their lifetime in a centralized log management system [18].

Together, these four stages represent the lifetime of the log messages, from the moment they enter the log management system until they are deleted. Different log management systems perform different operations within each stage of the pipeline. For example, one system might only sort out the events with high severity in the processing stage, while another system might parse the data from the log in the processing stage and then store it in another format. The way the log messages are handled and processed in each stage of the pipeline is therefore dependent on the log management system they pass through.


2.4 Centralized log management structures

Almost every device that produces log messages, and that has not been configured for any other solution, stores the log messages it produces in its own storage pool. In a centralized log management system there are different kinds of structures used to achieve centralized log management, and each one comes with its own benefits. Perhaps the most common is a single device running one instance of the chosen centralized log management system. All log messages are sent to this device, where they are processed, stored and analyzed. An example is rsyslog-enabled devices which can be configured to send their log messages to a single rsyslog server [20]. This produces a log management structure with two distinct layers: one layer with clients and one layer with the syslog server. These management structures with a low number of layers have several benefits, such as being easy to set up and manage. However, they do not scale very well when compared to structures implemented across several layers.

Figure 2: This figure displays a two-layered centralized log management system where all clients send their log messages to a central rsyslog server. Figure drawn by the author using the tools at https://www.draw.io.
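A minimal sketch of the client and server sides of such a two-layered structure, assuming rsyslog with its legacy configuration syntax and an invented server name, could look like this:

    # On every client: forward a copy of all log messages to the central server over TCP
    *.*    @@central-log.example.local:514

    # On the central server: load the TCP input module and listen on port 514
    $ModLoad imtcp
    $InputTCPServerRun 514

The double @@ requests TCP transport instead of UDP, which gives more reliable delivery of the forwarded messages.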

Nowadays it is common to find systems where the log messages are not stored as the original message, but are instead processed into a unified template before they are stored. These systems are much more resource intensive [21], and it can therefore be a good idea to split them into several layers to allow for better scaling. In some cases a three-layered solution is a good and viable option. In a three-layered solution an extra layer has been added between the production and storage of the log data, compared to the structure seen in figure 2. This extra layer processes the data within the log messages and in most cases changes it to a different format, for example JSON (JavaScript Object Notation). This enables the centralized log management system to read log messages with different formatting and from different sources and then transform them into a uniform log format. An example of a solution that can implement this strategy is the use of Logstash [22] and Elasticsearch [23].
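As a rough sketch of what such a processing layer can look like, a minimal Logstash pipeline might receive raw syslog lines, parse them into structured fields and forward the result to Elasticsearch. The thesis’s actual configuration is given in appendices D–G; the port number, host name and index name below are assumptions made only for this example:

    input {
      # Receive raw syslog lines over TCP (port chosen arbitrarily for the example)
      tcp {
        port => 5514
      }
    }
    filter {
      # Parse each line into structured fields using a standard grok pattern
      grok {
        match => { "message" => "%{SYSLOGLINE}" }
      }
    }
    output {
      # Store the structured events in a daily Elasticsearch index
      elasticsearch {
        hosts => ["http://elasticsearch.example.local:9200"]
        index => "syslog-%{+YYYY.MM.dd}"
      }
    }

Because the parsing happens in this middle layer, the storage layer only ever sees uniform JSON documents, regardless of how the original devices formatted their log messages.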


Multiple servers of the same type can be used for redundancy and load balancing in case of hardware failure or at peak load times. This increases scalability, since nodes can easily be added to increase the storage or processing capacity of the structure. On the other hand, it is more difficult to manage and set up compared to a single log server, since the number of devices in the structure increases. Another solution is one where a visualization tool is used to help with correlation and analysis of data. These visualization tools often come incorporated in SIEM solutions. Visualization tools retrieve the log data from the log storage servers and then analyze it to present valuable visualizations of the events contained in the log data. With the help of these tools, correlation of logged events becomes much easier. It is also common that these tools serve as an administration point for the remainder of the implemented system. An example of a system which incorporates such a visualization tool is Splunk [12]. In figure 3 below, a four-layered centralized log management system with visualization tools can be seen.


Figure 3: This figure displays a four-layered centralized log management and visualization system. The clients send their log messages to the log processing servers. The log processing servers process the messages and then forward the data to the log storage servers. The visualization tool is used to extract the data from the storage servers. Figure drawn by the author using the tools at https://www.draw.io.

One of the benefits of a multilayered log management structure is that it often allows for horizontal scaling. Horizontal scaling is when more devices are added to share the load, whereas vertical scaling is when the hardware is upgraded to meet the load [24]. In most cases horizontal scaling is the most cost-effective way of scaling systems, because the cost-to-performance ratio is in most cases lower for cheaper hardware. Therefore, scaling the system horizontally with cheaper hardware is often more cost efficient than scaling it vertically with more costly hardware. In this thesis the focus is on finding a four-layered centralized log management system with a visualization tool, because of the benefits this structure provides for scalability and visualization. The proof of concept presented within this thesis will have a structure similar to the system shown in figure 3 above.
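As an illustration of how horizontal scaling typically works in such a system, adding another Elasticsearch data node is largely a matter of pointing it at the existing cluster. The sketch below assumes Elasticsearch 7.x (where discovery.seed_hosts is used) and uses invented cluster, node and host names:

    # elasticsearch.yml on a newly added node (hypothetical values)
    cluster.name: central-logging
    node.name: es-data-03
    network.host: 0.0.0.0
    discovery.seed_hosts: ["es-data-01.example.local", "es-data-02.example.local"]

Once the new node has joined the cluster, Elasticsearch redistributes index shards across the nodes, so storage and processing capacity grow without replacing the existing hardware.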


2.5 Logging policy

Before a log management system is implemented in any network, it is a good idea to create a logging policy which governs the choice of the system and how it is implemented [25]. A logging policy is created by the owners or administrators of a network and is used to describe the required behavior of the log management system within the network. The logging policy states which log messages contain interesting information, from which sources these log messages originate, for how long these log messages should be stored, who is responsible for analyzing the log messages, and so on. The logging policy governs all actions performed on the log messages. It starts out by stating which types of devices are required to save their log messages and which log messages should be kept from each device. The logging policy also touches upon how log messages should be treated during transmission. This includes whether the log messages should be encrypted and hashed to protect their confidentiality and integrity. Further, a logging policy should state how log messages are stored and whether they are required to be encrypted and hashed during storage. Another important topic a logging policy should cover is for how long each log type should be stored and how the logs should be disposed of. Finally, a logging policy should also state who should be able to access the log messages, how they are analyzed and how often they should be analyzed.

The stakeholder does not have an established logging policy, so as a foundation for this thesis a basic logging policy will be set up using the guidelines presented in Kent and Souppaya’s paper [18] under section 4.2. Worth noting is that this policy is not by any means supposed to be used in full-scale deployments. The policy set up within this thesis is purely for demonstration and to help stake out the requirements of the centralized log management system which will be implemented.

2.6 Related work

Within this section work relevant to this thesis is discussed. Previous work and findings are presented here as a foundation for this thesis to build upon.

Many devices today have support for the syslog protocol for transferring their log messages within networks. The original syslog protocol has several flaws, such as lacking encryption during transport and not allowing for reliable delivery. The rsyslog software was produced to solve some of these flaws and is now commonly used in networks for centralized storage of log data. In September 2009 Peter Matulis published a technical white paper named “Centralized Logging With rsyslog” [20] which focused on the functionality and implementation of the rsyslog software. This software is open source and is built on syslog but extends it with some key functionalities. These functionalities include reliable transport with TCP (Transmission Control Protocol), encryption with TLS (Transport Layer Security), support for the Reliable Event Logging Protocol (RELP) and disk buffering of event messages during peak load times.


This paper presents a good introduction to and explanation of the key features of the rsyslog software, and it also shows how some of them are implemented.

Creating and maintaining a logging policy is a good idea when dealing with larger and complex networks. It allows the entity managing the network to set up a framework for how log messages should be handled and processed within the network. In the end, the goal of a logging policy is to make sure that log messages are treated in the same way on different devices throughout the network. The paper “Guide to Computer Security Log Management” [18], presented by Karen Kent and Murugiah Souppaya in September 2006, gives a good questionnaire which can be used for creating logging policies. The paper aims at spreading knowledge and understanding about the need for computer security log management. It also provides guidance on how entities can develop, implement and maintain log management systems within their networks. The paper presented by Kent and Souppaya provides a good starting point for entities looking to implement or improve their implementation of a log management system. It gives a good introduction to logging policies, why they are needed and how a basic logging policy can be constructed. This logging policy can then, if necessary, be extended and used to dictate the choice of log management system.

The largest benefits of centralized log management do not seem to come from the centralized storage of the log messages. In 2005, Robert Rinnan concluded in his master thesis “Benefits of centralized log file correlation” [2] that centralized log management by itself does not achieve much more than the convenience of storing all the information in one place. Instead, the true benefits of centralized log management come from the ability to visualize and correlate log data. To be able to offer improvements in security and reduce the time spent on analysis of log messages, some kind of visualization tool is required. Because of this, finding a system which provides a strong visualization tool together with centralized log management can help with correlation between logged events, reducing both the time spent on this process and the response time when a serious event occurs. In the end this leads to increased security throughout the network, since the data provided can be more easily accessed and assessed. Therefore, a good log management system should include a visualization and analysis tool.

Comparing log management systems is a difficult task, since there are many different features to investigate and requirements to meet. In the master thesis “Application Log Analysis” presented by Júlia Murínová [26], the implementation of a log management system for web-based applications is discussed. Various systems were compared and analyzed to find the optimal system. In the end she reached the conclusion that the ELK Stack was the system best suited for the task, because of the custom log-format processing capabilities of Logstash together with the rich filtering and search capabilities of Elasticsearch. Although the results reached in Júlia Murínová’s thesis are not directly applicable to the area covered by this thesis, they bring valuable information, recommendations and opinions.

Page 26: Centralized log management for complex computer networkskth.diva-portal.org/smash/get/diva2:1330246/FULLTEXT01.pdf · Logging ; Log management ; Computer networks ; Centralization

18

directly applicable to the area concerned by this thesis, it brings valuable information, recommendations and opinions. This information can help with the setup of the system selection process within this thesis.
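The kind of reliable, encrypted transport that rsyslog adds on top of plain syslog can be illustrated with a short sketch. The following Python example is purely illustrative and is not taken from the referenced white paper; the server name, port and certificate path are placeholder values, and port 6514 is the port registered for syslog over TLS.

import socket
import ssl
from datetime import datetime, timezone

# Placeholder values; a real deployment would use the log server's
# actual address and a certificate trusted by the organization.
LOG_SERVER = "logs.example.internal"
LOG_PORT = 6514  # registered port for syslog over TLS
CA_CERT = "/etc/ssl/certs/internal-ca.pem"

def send_log(message, facility=1, severity=5):
    """Send one syslog-style line over a TLS-protected TCP connection."""
    pri = facility * 8 + severity  # syslog PRI value
    timestamp = datetime.now(timezone.utc).isoformat()
    line = f"<{pri}>{timestamp} client01 demo-app: {message}\n"
    context = ssl.create_default_context(cafile=CA_CERT)
    with socket.create_connection((LOG_SERVER, LOG_PORT)) as raw_sock:
        # TCP gives delivery and ordering guarantees that UDP-based
        # syslog lacks; TLS adds confidentiality and integrity in transit.
        with context.wrap_socket(raw_sock, server_hostname=LOG_SERVER) as tls_sock:
            tls_sock.sendall(line.encode("utf-8"))

if __name__ == "__main__":
    send_log("User 'alice' failed to authenticate", severity=4)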


3 Research methodologies and methods

In this section a brief explanation of the methods and methodologies used within this thesis will be given. The work and research process will also be described, together with the techniques used to evaluate and select the solution for a centralized log management system. Section 3.1 explains the system development methodologies used for the implementation of the proof of concept system. Section 3.2 explains how the research within this thesis was carried out. Section 3.3 describes how the data used in this thesis was collected. Section 3.4 explains the process of setting up the system requirements. Finally, section 3.5 describes how the system selection process was done.

3.1 System development methodologies

Since the time available to evaluate, develop and present the chosen system is limited, this thesis uses a modified version of SCRUM [27]. SCRUM is an iterative and incremental framework for developing, delivering and sustaining complex products. This modified version of SCRUM was adapted to better fit a shorter sprint and development time by removing some of the overhead. This allows more time to be spent on developing the system rather than producing documents which would have many similarities to the information presented in this thesis. This thesis will not apply any distinct roles from the “Scrum Team”, since they would in this case introduce unnecessary complexity. Each requirement placed upon the system will be broken down into its basic parts and used as product backlog items. Some of these items will then be selected for implementation at the start of each sprint. Each sprint in this thesis is one week long, which amounts to a total of eight sprints for the project. By the end of the last sprint, the proof of concept implementation should be done, and conclusions can be drawn.

The proof of concept implementation will be done iteratively using the methods described above. This means that in each sprint, one of the requirements will be targeted. The required functionality will be experimentally implemented to prove that the chosen centralized log management system meets the requirement. For each sprint that passes, the proof of concept system should therefore meet at least one more of the requirements placed upon it. After the eighth sprint the proof of concept system should meet all the requirements placed upon the system.


3.2 Research phases

Research is founded upon the gathering, understanding and correlation of information. The gathered information is examined and processed to reach a conclusion, which can then be used as a foundation and developed further by other researchers. This thesis goes through a series of research phases which together build up to the results and conclusions of the thesis. These phases can be divided into the following:

1. Thesis problem formulation

2. Set up thesis goals

3. Selection of research methods

4. Collection of information

5. Analyze gathered information

6. Apply gathered information to select a system for implementation

7. Implementation of chosen system

8. Draw conclusions from research

Together, these phases allow the work to flow through the empirical research cycle that was presented in section 1.5 and can be seen in figure 1. Phases 1, 2 and 4 correspond to the observation and induction phases. Phases 5 and 6 represent the deduction phase. Phase 7 is used as the testing phase and, finally, phase 8 is the evaluation phase of the empirical research cycle. The results presented at the end of the thesis should be repeatable and the conclusions well founded. This should allow future research to use and build upon the conclusions presented within this thesis.

3.3 Data collection

Data collection is the process where the information that the research is based upon is gathered. This information must come from relevant and trustworthy sources to be used as valid support for the claims and conclusions presented within the thesis. The most trustworthy information research can use is self-collected data obtained using valid research methods, for example interviews or surveys. For all information collected during the research, an in-depth analysis of the source must be done to verify that the information is valid. Research built upon unverified or false information will not lead to any new trustworthy results or conclusions.

This thesis does not collect any new data using, for example, interviews or surveys. Instead, it uses existing information to find a solution to the presented problem. The data collected and used within this thesis was retrieved from several different sources. To find these sources, the “Academic and Scholar Search Engines and Sources” [28] written by Marcus P. Zillman was used. This document lists a large number of search engines for scholarly and academic sources, and several of them were used to find the sources presented within this thesis. Information about the features and functionalities of the different centralized log management systems discussed within this thesis was taken directly from their respective websites. This helps to ensure that the information about these systems is correct and not colored by the opinions of reviewers or other third parties.

3.4 Setting up system requirements

To be able to select the most appropriate software system for centralized log management it is important to clearly define the requirements placed upon the system. In this thesis the requirements are created with the help of the stakeholder to cope with the demands of large data networks. The requirements placed on the system will be slightly higher than necessary, to allow for good scaling of the system when future expansions of the stakeholder’s network are considered.

To set up the requirements of the system, the first task within this thesis is to create a logging policy. The logging policy provides a better understanding of the requirements of the system. It was created in collaboration with the stakeholder, by answering the questions in the questionnaire in Appendix A, to make sure it complied with their data treatment policies. The logging policy presented in this thesis does not touch upon all questions of log management. Some questions, such as who should analyze the log messages and how often the log messages should be analyzed, will not be answered. These questions were deemed unnecessary for this thesis but should otherwise be included when creating a logging policy. From the created logging policy, the requirements for the system were derived. These requirements make sure that the selected and implemented system will deliver the desired functionality.

3.5 System selection process

The selection of the optimal centralized log management system for the stakeholder’s network is a difficult task. The selection process is based upon the requirements and logging policy laid out in section 4 of this thesis. Since it is likely that several centralized log management systems match the requirements, a sorting process is used to decide which system is the most suitable for the stakeholder’s network. When multiple centralized log management systems meet the requirements, these software systems must be ranked using reasonable criteria.


In this thesis the criteria used for ranking the suitable centralized log management systems will be:

• Number of threads on Stack Overflow [29].

• Cost of implementation

• Open source

• Scalability

Each centralized log management system will be given a rating of one to ten for each criterion depending on how well it meets that criterion. The centralized log management system with the highest average rating will be chosen for implementation as the proof of concept model.

The number of threads on Stack Overflow is one of the criteria used for ranking the suitable systems. It was used because it provides an overview of how widespread the knowledge of a system is within the community. Community knowledge can help with support for troubleshooting and setup of the system. A large number of threads on websites such as Stack Overflow [29] indicates the breadth of the community knowledge for a specific system. This makes the setup of the system and its features easier, since a great deal of frequently asked questions, solved problems and answers are available. Therefore, if a problem is encountered during the setup of a system, a solution is more likely to be available if the system is well known within the community.

The cost of implementation is a very important criterion to consider during the selection process. The cost can vary greatly between different systems, mainly because there are many open source systems available as well as systems that are developed by companies and sold for profit. It cannot simply be assumed that open source systems always have a lower cost just because they are often free to use. Many open source systems have a steep learning curve and can be difficult to implement and configure to meet the requirements placed upon them, which can introduce a large cost for the entity implementing the system. In some cases it might be cheaper for an entity to buy a paid system instead of implementing an open source system themselves, because paid systems might offer a simpler configuration and setup process. Therefore, the cost of implementation criterion focuses only on the estimated time it takes to implement the system.

To handle the differences in cost between open source and paid systems, the open source criterion is introduced. Open source is usually a desirable property: open source systems often have broader community support, and the open model can help to ensure that development continues for a longer period of time. Another benefit is that the goal of an open source system is not financial gain but rather providing functionality to the developers and community surrounding it. Also, open source systems do not require a license to be installed and used.


Scalability might be the single most important factor when considering deployment in larger networks. If the system does not scale well, several problems may arise. For example, an entity might have to implement several separate systems across their network, which would mean that the network no longer has a truly centralized log management system but rather a collection of partly decentralized ones. Another problem might be that the system works well when it is first deployed, but causes significant complications as the entity grows and an expansion of the system becomes necessary.
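To make the ranking scheme concrete, the small sketch below computes the average rating across the four criteria and picks the highest-scoring system. The ratings shown are placeholders for illustration only; the actual grades are assigned in section 5.4.

# Hypothetical ratings (1-10) per criterion; the real grades are
# assigned later in the thesis.
ratings = {
    "System A": {"stack_overflow": 9, "cost": 7, "open_source": 10, "scalability": 8},
    "System B": {"stack_overflow": 8, "cost": 5, "open_source": 2, "scalability": 9},
    "System C": {"stack_overflow": 5, "cost": 6, "open_source": 10, "scalability": 7},
}

averages = {name: sum(scores.values()) / len(scores) for name, scores in ratings.items()}

# The system with the highest average rating is selected for the
# proof of concept implementation.
best = max(averages, key=averages.get)
for name, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {avg:.2f}")
print(f"Selected system: {best}")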


4 System requirements

In this section the requirements that are placed upon the system will be derived. This is done by creating a logging policy that takes into consideration the stakeholder’s recommendations on what requirements should be placed upon the system. The requirements are also deduced from the background and literature study of this thesis. In section 4.1 the logging policy is created together with the stakeholder using the questionnaire that can be seen in appendix A. Section 4.2 sets out the requirements placed upon the centralized log management system.

4.1 Logging policy

A logging policy is a vital part of the planning and maintenance phases of an entity's network, and it is recommended that every entity that deploys and maintains a larger network creates one [25]. The logging policy can be used to declare which information is important for the security and maintenance of the network, for example how the log data should be transported and stored to comply with the security requirements placed upon the system. The logging policy also describes things like by whom and how the log data should be analyzed. There are several different ways to create a logging policy and there is no fixed set of rules as to what it should contain. This thesis uses the questionnaire presented in appendix A, which is derived from Kent and Souppaya’s paper “Guide to Computer Security Log Management” [18].

The logging policy provides a clear view of the functionality that the system chosen for implementation is required to have. These requirements are the foundation of the selection process for the system to be implemented in the stakeholder’s network. To create the logging policy there are several questions that need to be answered [18]. These questions can be seen in detail in appendix A of this thesis. Below follows a summary of the answers to these questions, which have been answered to fit the stakeholder’s network. The statements presented in the subsections below form the logging policy created for this thesis.

4.1.1 Log generation

The questions answered from appendix A under the log generation subject are used to determine which devices are required to send their log messages to the centralized log management server. Using the statements presented below it is possible to tell which devices are required to log data within the network, and which events are required to be logged on each device. The following devices and events should generate log data within the network:


• All network devices, applications which are exposed to the internet and devices concerned with different types of authentication should be performing security logging.

• All operating system log messages should be kept for security purposes since they contain a lot of information about authentication and accounting. All log messages from services and applications reachable through networks, both internal and external, should be logged to keep track of authentication and breaching attempts.

• All security events, such as alerts from IPS (Intrusion Prevention System) or detection of malware, should be logged. All services and applications exposed to both the internal and external network should log both connection and authentication attempts. Debugging and informational messages should be logged from all devices vital to the entity’s operation.

• For authentication events, the username, source IP address and port should be logged. For security events, the event type together with the source and destination (if applicable) address and port should be logged. For network connections, the source and destination (if applicable) IP address and port should be logged.

• Each security and authentication event should be logged by the first occurrence and if more events occur within one minute from the first occurrence, it is enough for every five events to be bundled together and logged. Each network connection event should be logged individually. Debugging and informational messages should be logged by the first occurrence but not more than once every minute.

The devices which are included in the above statements should save their log messages. These log messages are generated locally on each device. Therefore, the next step is to transport the log messages to the centralized log management system. When the logs are received by the centralized log management system they will go through the first step of the log processing pipeline, which is processing.
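The bundling rule for repeated security and authentication events (log the first occurrence, then bundle every five repeats that arrive within one minute) can be sketched as follows. This is a minimal illustration of the policy statement above; the exact implementation on each device is not prescribed by the logging policy.

import time
from collections import defaultdict

BUNDLE_WINDOW = 60  # seconds after the first occurrence
BUNDLE_SIZE = 5     # repeated events per bundled log entry

# Track the first occurrence time and repeat count per event key.
_first_seen = {}
_repeat_count = defaultdict(int)

def should_log(event_key, now=None):
    """Return True if this occurrence should produce a log entry."""
    now = time.time() if now is None else now
    first = _first_seen.get(event_key)
    if first is None or now - first > BUNDLE_WINDOW:
        # First occurrence (or the window expired): log it and reset.
        _first_seen[event_key] = now
        _repeat_count[event_key] = 0
        return True
    # Within the window: only every fifth repeat produces a bundled entry.
    _repeat_count[event_key] += 1
    return _repeat_count[event_key] % BUNDLE_SIZE == 0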

4.1.2 Log transmission

When a new event is recorded, the log data is transmitted to the centralized logging system. It is important that the transportation of the data through the network follows the entity’s requirements on confidentiality and integrity. To make sure that the log data is transported according to these requirements, the questions from appendix A under the log transmission section are answered. These questions require stating the collection interval for different types of log data and which types of devices are required to send log messages. The following types of devices and security mechanisms are required when transporting log data in the network:


• All hosts statically connected to the network should send their log messages to the centralized log management system. Wireless and/or mobile hosts are only required to transmit log messages if they are reachable from the external network.

• All log entries with a priority level below 5 (Notice) should be transmitted to the log management infrastructure. Devices such as IPS’s (Intrusion Prevention Systems), firewalls, switches and routers are also required to send informational and debugging messages to give a better understanding of the network’s current condition.

• Log data should be transferred with guarantees for integrity and authenticity during transport. This means that all transmissions of log messages within the network should be encrypted using for example TLS (Transport Layer Security).

• Core devices within the network, such as IPS’s (Intrusion Prevention Systems), firewalls, switches and routers, should continuously send their log data to the centralized log management system as it is produced. For other devices it is sufficient to transfer their log messages at ten-minute intervals.

• The confidentiality and integrity of the log messages should be protected as they are transported through the network. This is done by encrypting the log messages and taking hashes of them. The transport of log messages will go through a separate VLAN (Virtual Local Area Network) to ease maintenance and strengthen security.

All these hosts should send their logs to the centralized log management system. The guidelines for transmitting the log messages presented should also be followed. When the logs are received by the centralized log management system they are also processed and parsed for data as the first step in the log processing pipeline. The next step within the log processing pipeline is the storage and disposal of the processed log messages.
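The integrity part of these guidelines (hashing the log messages in addition to encrypting them) can be illustrated with a short sketch that computes a SHA-256 digest over a batch of log lines before transmission, so that the receiving side can detect tampering or corruption. The choice of SHA-256 and the JSON packaging are illustrative assumptions, not requirements from the policy.

import hashlib
import json

def digest_batch(log_lines):
    """Compute a SHA-256 digest over a batch of log lines."""
    h = hashlib.sha256()
    for line in log_lines:
        h.update(line.encode("utf-8"))
        h.update(b"\n")  # keep line boundaries part of the digest
    return h.hexdigest()

def package_batch(log_lines):
    """Bundle the lines and their digest for transmission over TLS."""
    return json.dumps({"lines": log_lines, "sha256": digest_batch(log_lines)})

def verify_batch(payload):
    """Recompute the digest on the receiving side to verify integrity."""
    data = json.loads(payload)
    return digest_batch(data["lines"]) == data["sha256"]

if __name__ == "__main__":
    batch = ["sshd: failed login for root from 203.0.113.7",
             "sshd: failed login for admin from 203.0.113.7"]
    packaged = package_batch(batch)
    print(verify_batch(packaged))  # True unless the payload was altered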

4.1.3 Log storage and disposal

There are many different ways to store log data and each comes with its own benefits. Because of this it is important to set a standard for how the log data should be stored and, when necessary, disposed of. This standard makes sure that log data is handled the same way throughout the network. The following statements should be followed when configuring log storage nodes throughout the network:

• Lower priority log messages, such as debugging and informational messages, should have a maximum storage volume of 10 MB per log and device before rotation occurs. All other log messages should not be affected by log rotation.


• During storage each log should be encrypted to protect the confidentiality of the contained information. Integrity should also be protected by taking a hash of each log file, or of groups of log files, to allow validation tests of the data. Availability is protected by having the same log messages stored on at least two different log storage nodes at any given time.

• Log messages of lower priorities, such as debugging and informational messages, do not have to be stored for longer than a week. Log messages of higher priorities will be kept for a longer period of time, at least three months. Log messages that are required to be stored by law, such as DHCP leases, should be stored for as long as the law states.

• Log data that is no longer required or deemed unnecessary during analysis should be deleted at the log storage nodes which holds them or dropped during processing.

• The amount of storage needed depends on how long log messages are stored and how long it takes before they are analyzed. For the stakeholder’s network the estimated amount of log data produced each day is approximately 5 GB. However, the volume produced is expected to grow to somewhere between 10 and 20 GB per day within the next year. To be able to store this data for three months in duplicate locations, the total storage required is close to 2 TB. The data should be stored on flash storage to reduce the time required for large searches.

These statements conclude how the log messages should be stored and some of the most vital storage requirements. They also describe for how long the log messages should be stored before they are disposed of. However, before disposal, the log messages need to pass through the analysis phase of the log processing pipeline.
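The storage estimate in the last statement can be checked with simple arithmetic. The figures below come from the policy itself (5 GB per day today, 10 to 20 GB per day expected, three months of retention, two copies of every message); the 30-day month is a simplifying assumption.

RETENTION_DAYS = 3 * 30  # three months, approximated as 90 days
COPIES = 2               # each log stored on two storage nodes

def required_storage_tb(gb_per_day):
    """Total storage in TB for the given daily log volume."""
    return gb_per_day * RETENTION_DAYS * COPIES / 1000

for daily in (5, 10, 20):
    print(f"{daily:>2} GB/day -> {required_storage_tb(daily):.1f} TB")
# 5 GB/day  -> 0.9 TB (today's volume)
# 10 GB/day -> 1.8 TB (close to the ~2 TB figure in the policy)
# 20 GB/day -> 3.6 TB (upper end of the expected growth)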

4.1.4 Log analysis

The most important operation performed on the log data is the analysis process. This can be a tedious and difficult task since it requires a good understanding of the network. To ease this task, visualization and correlation tools are often used. The analysis of log data should be performed regularly to increase its benefits. How often the log data should be analyzed and when alerts need to be sent is presented in the following statements:

• The accepted time before a log data entry is analyzed depends on the severity level and source device of the event. Events with a severity level of 3 (Error) or more severe should be placed first in the analysis queue, ordered by severity level. The top of the queue should be emptied of all such events at least twice each day, since the number of these log messages is expected to be small.

• Log data with severity level 4 or 5 should be analyzed at least every other day. For analyzing these less severe events, an analysis and visualization tool can be beneficial.

• Log data with severity level 6 or 7 is not required to be regularly analyzed but should be stored in case there is a problem with the network or a device. This log data is not critical to the function of the network but can be useful for troubleshooting.

• All connections made to the analyzation and storage servers should be logged. There is no need to log which log messages were accessed.

• An alert should be sent to the network administrators immediately when a suspicious activity or anomaly is detected. The network administrators will then have to investigate the incident to see if it is malicious behavior or not.

• All log messages should be both encrypted and hashed during transport and storage.

Together, these statements from the subsections above form the logging policy used within this thesis. These statements describe how the log messages should be handled when passed through each phase of the pipeline described in section 1.1. From these statements the requirements placed upon the log management system can be derived.
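The analysis order described above, with the most severe events handled first and ties broken by arrival order, can be sketched with a priority queue. The severity scale follows the syslog convention where a lower number means a more severe event; the sketch illustrates the policy and is not part of any particular log management system.

import heapq
import itertools

class AnalysisQueue:
    """Queue log events so the most severe (lowest level) come out first."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: arrival order

    def push(self, severity, timestamp, message):
        heapq.heappush(self._heap, (severity, next(self._order), timestamp, message))

    def pop_most_severe(self):
        severity, _, timestamp, message = heapq.heappop(self._heap)
        return severity, timestamp, message

queue = AnalysisQueue()
queue.push(5, "2018-04-18T10:00:00Z", "notice: config reloaded")
queue.push(2, "2018-04-18T10:00:05Z", "critical: IPS detected port scan")
queue.push(3, "2018-04-18T10:00:07Z", "error: authentication failure burst")
print(queue.pop_most_severe())  # the critical (level 2) event comes out first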

4.2 Requirements

To be able to choose a system which can perform the intended tasks it is important to know the requirements placed upon the system. Establishing the requirements is a vital part of the system selection process, because the requirements are a description of the functionality that the system must have. Without this functionality the system will not be able to fulfill its purpose when deployed.

In this thesis the requirements placed upon the system are taken from two different sources. The first source is the logging policy, which was created together with the stakeholder to meet and represent the requirements of their network. Each statement in the logging policy was created to make sure that a system which abides by the logging policy is eligible for deployment as a centralized log management system. The second source is the background and literature study performed within the thesis. Presented below is a list of requirements that the logging policy places upon the centralized logging system implemented within the stakeholder’s network. The requirements placed upon the centralized log management system within this thesis are the following:

• The system must be able to parse log messages that come from many different devices and in many different formats.

• The system must be able to encrypt the log messages while they are passed through the network and while they are stored, to protect possibly confidential content and to prevent spoofing of log messages.

• The system must be able to handle and store log messages without losing information due to, for example, system crashes, connection problems or network congestion.

• The system must be able to parse the log messages and present the information from these in an easily accessible and human readable fashion.

• The system must be able to search many log messages for certain information in a relatively short amount of time.

• The system must be able to detect and alert network administrators when an anomaly or intrusion is detected within the network.

• The system must be able to sort the severity of events within the log data and display the most important events first.

• The system must be highly scalable and able to manage large quantities of data so that it can keep up with a large entity’s growth rate.

• The system must be low-cost to implement and maintain, preferably no license cost for using the software but only hardware cost for storage and log processing.

These requirements help to decide which systems are eligible for use within the stakeholder’s network. The systems which meet these requirements should all be viable for implementation in the stakeholder’s network. Every system has its own set of features, price tag, support and other functionalities. Because of this, the selection between the available systems must be done with care to make sure that the optimal system is selected in the end. Note that not all centralized log management systems that meet the requirements are discussed in this thesis; due to the limited amount of time available, only the most well-known systems will be discussed.


5 System selection

In this section some of the functionalities of the most well-known systems used for centralized log management will be presented. All the presented systems meet, or can be configured to meet, the requirements stated in section 4.2. Each presented system has its own advantages and disadvantages, and by closely analyzing these differences the most suitable software system will be chosen as the centralized log management system at the end of this section. Each system is discussed in its own subsection where the required functionalities are presented. At the end of the section an evaluation of the systems is done to find the optimal system for the stakeholder’s network. In section 5.1 some of the relevant functionalities of the Splunk system are presented. In section 5.2 some of the ELK Stack system’s functionalities are presented. In section 5.3 the functionalities of the Graylog system are discussed. Finally, in section 5.4 the system selection process is described and its results presented.

5.1 Splunk

Splunk [12] is currently one of the most commonly used centralized log management systems. The company developing Splunk carries the same name, was founded in 2003, and released the first version of the software in 2005 [30]. Since then Splunk has become one of the most distinguished systems when it comes to handling and analyzing log messages and other sources of data. Today, the company provides its log management system to more than 15 000 different customers and employs more than 3 000 people worldwide [30].

An implementation of Splunk can look very different depending on what network it is implemented in. To handle scaling well, Splunk uses three different types of devices: search heads, indexers and forwarders [31]. The forwarders can be compared to the log processing devices in Layer 1 of figure 3 and are used to process and forward the log data from end devices. The indexers receive, store and index the log data. The indexers provide a similar functionality to the log storage servers in Layer 2 of figure 3 and are used to make log data searchable and easily accessible. Finally, the search heads provide the functionality to aggregate searches over data stored on many different indexers. The search heads’ placement within the Splunk architecture can be compared to the placement of the visualization tool in Layer 3 of figure 3. Each search head can be queried with a search for log data by a network administrator. The search head in turn queries the indexers for the information required to answer the network administrator’s query. The information is collected from the indexers and aggregated by the search head before it is sent back to the network administrator. This gives Splunk a single point of access for network administrators to data stored on several indexer nodes throughout the network. Using multiple search heads allows for load balancing of searches and for Splunk to run multiple heavy search queries at once.

Depending on the requirements of the implementation, Splunk can be installed on a single device or on multiple devices. When installed on a single device all log messages are sent to the single Splunk instance and stored on it. This device also provides the search head functionality to retrieve and present the information to the network administrators. This allows for a simple implementation but has an impact on the scalability, availability and performance of the system. When implementing Splunk in larger networks it is recommended to use a system setup similar to the one seen in figure 3. This allows for improved scalability since forwarders, indexers and search heads can be added as required when the network needs to be expanded. Having multiple indexers and duplicating data entries within these nodes also comes with several advantages. For example, multiple indexers with duplicate data entries eliminate a single point of failure and reduce the risk of data loss. It also helps when performing larger searches on the data, since the data can be searched on multiple indexers in parallel, effectively reducing the search time.

Splunk can handle log messages and data coming from a wide variety of sources. These sources include Windows event log messages, web server log messages, database tables and archive files [32]. It is also possible to index data from sources which are not interpreted by default within Splunk. This can be done by using regular expressions to match the data fed into the processing [33]. This functionality allows Splunk to meet the requirement of parsing log messages from many different sources.

To be able to mitigate a log injection or forging attack the system must be able to encrypt and hash the log messages. This helps to mitigate these attacks since any attempt to alter or delete log data will be noticed, and the data retransmitted. Splunk can be configured to use TLS (Transport Layer Security) with self-signed or certificate authority signed certificates when transmitting data [34, 35]. This allows Splunk to ensure the integrity and confidentiality of the messages transmitted between the different nodes of the system. When indexing data fields that may contain sensitive data, encryption is advisable for protection. This can be done by encrypting the data field during the indexing process. If the field needs to be part of a data search it can be decrypted using the same symmetric key that was used when it was encrypted during the indexing process [36]. This makes stored sensitive data unusable without the encryption key.

A vital feature of any centralized log management system is the ability to make sure that data is not lost during transmission. Data loss during transmission can happen for many different reasons. One cause could be the use of an unreliable transmission protocol such as UDP (User Datagram Protocol), which might allow data to be distorted or lost during transmission. It is also possible that data is dropped due to network congestion or failure. To make sure that all log data is delivered securely, Splunk relies on TCP (Transmission Control Protocol) [37]. TCP allows for reliable transportation of data and provides error detection mechanisms to make sure that each packet is received and correct. Although TCP helps to make sure that packets are received correctly, network or device failure can still happen. To deal with network problems, Splunk forwarders can either fail over to another indexer which might be reachable or store events locally until the network is operational again [37].

One vital part of log analysis today is the ability to produce alerts when critical events occur. For example, if an attack is detected by one of the IPS’s (Intrusion Prevention Systems) within the network, the log analyzer receives a log message containing information about the event. The log analyzer should then be able to alert the network administrators to the ongoing attack so that countermeasures can be applied if necessary. Splunk allows the creation of several different kinds of rules for generating alerts. For example, Splunk can be configured to generate an alert every time a search query gets a hit. Splunk can also be configured to search for multiple events within a certain time frame and generate an alert when certain events occur within the given time frame [38]. Using different configurations of this functionality allows Splunk to generate alerts for single or multiple log messages. Splunk is also capable of showing the most important events that have occurred within the network. In Splunk this feature is called the Notable Events Review dashboard [39]. The Notable Events Review dashboard can display the events with the highest severity that have occurred in an ordered list. This can be helpful for detecting problems within the network without having to parse through all the log data.

Splunk is not an open source software system and in most cases comes with a license cost. The license cost is based upon how much data is handled by the Splunk system each day. There is a free version of Splunk which offers a daily indexing volume of 500 MB and allows one user to access and view the data [40]. However, this version does not allow monitoring or the generation of alerts. The Splunk Light software allows for data collection and indexing of up to 20 GB of data per day and provides monitoring and alert generation functionality. To activate the full functionality of the Splunk software, the Enterprise edition is required. The Splunk Enterprise software claims to allow for better scaling options, distributed searches and high availability [40]. There is also a Splunk Cloud license which allows for storing and processing of log data in the cloud [41].
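To give an idea of how a search head can be queried programmatically, the sketch below runs a blocking search through Splunk's REST interface and prints the results. The host, credentials and search string are placeholders, and the endpoint names and parameters follow Splunk's publicly documented REST API; they should be verified against the Splunk version actually deployed.

import requests
import xml.etree.ElementTree as ET

# Placeholder connection details for a Splunk search head.
SPLUNK = "https://splunk.example.internal:8089"
AUTH = ("admin", "changeme")

# Start a blocking search job; the call returns when the search is done.
resp = requests.post(
    f"{SPLUNK}/services/search/jobs",
    auth=AUTH,
    data={"search": "search index=main sourcetype=syslog error",
          "exec_mode": "blocking"},
    verify=False,  # illustrative only; use a trusted certificate in practice
)
sid = ET.fromstring(resp.text).findtext("sid")

# Fetch the results of the finished job as JSON.
results = requests.get(
    f"{SPLUNK}/services/search/jobs/{sid}/results",
    auth=AUTH,
    params={"output_mode": "json"},
    verify=False,
).json()

for row in results.get("results", []):
    print(row.get("_time"), row.get("_raw"))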

5.2 ELK Stack

ELK Stack is a collection of the open source systems Elasticsearch, Logstash and Kibana. The Elasticsearch system emerged from the Compass software which was developed by Shay Banon from 2004 and onward. In 2010 Shay Banon reimagined the Compass software to achieve better scalable search, and the foundation of Elasticsearch was created [42]. Elasticsearch is a search and analytics engine which stores and indexes data for fast search and access. To make it easier to use Elasticsearch for log storage and indexing, Logstash was created. Logstash is a piece of software that can ingest and transform data from a wide variety of sources. After the data has been transformed it is sent to Elasticsearch to be stored and indexed. Finally, to make it possible to visualize the data stored within the Elasticsearch system, Kibana was created. Kibana takes the data stored within Elasticsearch and uses it to create graphs and charts that help with understanding different events and their correlation [43].

The ELK Stack uses a layered structure to allow for horizontal scaling of the implemented system. Horizontal scaling means that more nodes can be added to the system when and where they are required. In the case of the ELK Stack system, the Logstash nodes are the nodes closest to the clients in the log management structure of figure 3. The Logstash nodes collect and transform the log data produced on devices within the network. The data is then forwarded to the Elasticsearch node cluster which stores and indexes the data. Finally, a node running Kibana can perform a search on the Elasticsearch cluster to retrieve data. Kibana then uses the retrieved data to visualize the stored events in different graphs and charts chosen by the user.

Figure 4: This figure illustrates the architecture of the ELK Stack system. The ELK Stack system uses different nodes to collect, analyse and visualize data. Data can be collected from several different log sources, including Beats clients and Syslog protocol enabled devices. Figure drawn by the author using the tools at https://www.draw.io.

The image seen in figure 4 illustrates the software architecture of the ELK Stack and the pipeline which log messages pass through within the system. Each part is created as its own standalone software package and can be installed separately or together. Just like Splunk, the ELK Stack can be installed on a single device or spread across different devices to distribute the load. This allows for great flexibility and scaling depending on the requirements of the network that the system is implemented in. If one part of the system has high resource usage, more nodes can be added for that part to help mitigate the load. For example, if a network switch with a large number of connected devices is added to the network, an extra Logstash node can also be added to handle the log messages coming from these new devices.

Elasticsearch was the first open source software developed within the ELK Stack and is used as its foundation. Elasticsearch was built to enable fast and scalable search through large amounts of data, and is therefore expected to perform well on search queries touching large amounts of data. Elasticsearch uses indexes to differentiate between different types of data [43]. Each index is supposed to store one type of data, for example log data. The indexes stored within Elasticsearch can then be split into several shards. Each shard stores a piece of the data from the index it is part of, and these shards can be distributed throughout the cluster of Elasticsearch nodes within the network. The distribution of shards allows searches within an index to complete much faster, because the data within the index is stored on multiple devices, which allows each Elasticsearch node to search through its data in parallel with the other nodes. This is a very good scaling concept since each device added to the Elasticsearch cluster reduces the search time of queries. Shards can also be replicated on two or more devices to prevent data loss and increase data availability. The replication of shards also allows for faster searches since the data stored within a shard can be searched simultaneously on all devices it is replicated to.

Logstash is used to ingest data from a wide variety of devices. Logstash uses input plugins to parse the data received [44]. Some default input plugins, such as beats and syslog, come with Logstash and allow it to automatically parse input from these sources. If an input source is not available as a plugin, the events can be transmitted using other options and protocols, for example to a TCP or UDP socket. Events which cannot be parsed by an input plugin can instead be handled using a filter plugin. Filter plugins can be used to parse information from different data sources to allow indexing of a wide variety of data. For example, the timestamp of an event can be parsed using the date filter plugin [45].

To reduce the load on the Logstash nodes within an ELK Stack system, Filebeat can be installed on most clients [46]. The Filebeat software can parse log messages on the client device, thereby using the client’s processing power. This means that most of the log messages coming from the operating system and application log source groups can be processed on the client computers. This allows the ELK Stack system to spread the load of log message processing across the client computers as well. There will be a slight increase in configuration required, since the Filebeat service needs to be installed and configured on each device it is supposed to be used on.
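The sharding and replication behaviour described above is configured per index. As a minimal sketch, the request below creates a log index over Elasticsearch's REST API with three primary shards and one replica of each, and then indexes a single document; the host, index name and field names are placeholders, while number_of_shards and number_of_replicas are standard Elasticsearch index settings.

import requests

ES = "http://elasticsearch.example.internal:9200"

# Create a log index with three primary shards, each replicated once.
# The shards (and their replicas) can then be spread across the
# Elasticsearch cluster so that searches run in parallel on several nodes.
settings = {
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1,
    }
}
print(requests.put(f"{ES}/network-logs-2018.04", json=settings).json())

# Index a single (illustrative) log document.
doc = {
    "@timestamp": "2018-04-18T10:00:05Z",
    "host": "fw01",
    "severity": 2,
    "message": "IPS detected port scan from 203.0.113.7",
}
print(requests.post(f"{ES}/network-logs-2018.04/_doc", json=doc).json())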


Logstash supports a wide variety of transport protocols for collection of data, such as TCP, UDP and UNIX sockets. Within the ELK Stack only TCP is used to transfer data between nodes, since this is more reliable than most alternatives. The ELK Stack can also use HTTP to expose its APIs to all non-Java nodes within the network [47]. Logstash additionally has functionality for providing at-least-once delivery guarantees, called persistent queues. The persistent queues functionality allows the Logstash nodes within a network to absorb larger bursts of events and cache these in either memory or disk storage [48]. It allows for caching of events if there is a network issue or device failure. Persistent queues can also be used to make sure that events are not lost if a Logstash node is expectedly or unexpectedly restarted or shut down. In the end, the persistent queues functionality guarantees that events are stored and indexed on at least one Elasticsearch node.

Just like Splunk, the ELK Stack can use TLS (Transport Layer Security). Using TLS with certificates allows for authentication of the nodes within a network, to make sure that only legitimate nodes are part of the ELK Stack cluster. It also makes it possible to encrypt the transport of data between the nodes [49]. The ELK Stack allows for integrity checks of the transported data to avoid forging attacks on log messages. The ELK Stack can use a wide variety of encryption algorithms and hashes (so-called cipher suites) to allow for faster or stronger encryption. One of these cipher suites uses AES 256-bit encryption, which is considered strong enough to protect national secrets [50]. To protect data when stored on hard drives within the Elasticsearch cluster, it is recommended to use the dm-crypt functionality which is available on most Linux distributions [51].

To enable monitoring and alerting of changes in data when using the ELK Stack, the X-Pack feature needs to be installed [52, 53]. X-Pack is a plugin for the ELK Stack which can be installed to increase its management capabilities, adding extended functionality such as security, monitoring, alerting and reporting. Alerting on events that occur within the network is done using the Watcher extension within X-Pack. Watcher is a set of rules which can be applied to search for anomalies within the log data at given times or intervals [54]. If an anomaly is detected, Watcher can perform an action such as sending an email to the network administrators.

It is possible to sort out and highlight the most important events that occur within a network using the ELK Stack. This can be done using the query DSL used by Elasticsearch. In Kibana a search query can be structured to display only the events with the highest severity and order them by severity and timestamp [56]. The search query can be saved to allow faster access to this list of important events. It is also possible to use the visualization tools within Kibana to show the number of severe events that occur on different devices throughout the network. This can be a great analysis tool if there is an ongoing attack against the network.
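The saved search mentioned above, showing only the most severe events ordered by severity and time, corresponds to a small query in Elasticsearch's query DSL. The sketch below sends such a query to the _search endpoint; it assumes, for illustration, that the indexed log documents carry a numeric severity field (lower number meaning more severe, as in syslog) and an @timestamp field.

import requests

ES = "http://elasticsearch.example.internal:9200"

# Fetch the 50 most severe recent events: severity 0-3 (syslog
# Emergency..Error), ordered by severity and then by newest first.
query = {
    "size": 50,
    "query": {"range": {"severity": {"lte": 3}}},
    "sort": [
        {"severity": {"order": "asc"}},
        {"@timestamp": {"order": "desc"}},
    ],
}

resp = requests.post(f"{ES}/network-logs-*/_search", json=query)
for hit in resp.json()["hits"]["hits"]:
    src = hit["_source"]
    print(src["severity"], src["@timestamp"], src["message"])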


Neither the Elasticsearch nor the Kibana engine within the ELK Stack is uniquely designed for log management. Elasticsearch and Kibana can be used to index and visualize many different types of data and for various purposes. This means that SIEM functionality is not directly incorporated in Kibana. Instead, Kibana must be configured to work as a SIEM system and present relevant data to network management. This configuration process can be a time-consuming task for the entity implementing the ELK Stack. On the other hand, it is possible to make Kibana visualize almost any statistics from within the network. Many other SIEM systems might be easier to set up but do in many cases not offer the same degree of adaptation to the user's requirements.

Figure 5: This image illustrates an implementation of the ELK Stack in a complex network with many devices. The implementation also uses Kafka and a load balancer to be able to buffer events. Image copyrights belong to Elasticsearch BV. Used with the approval of copyright owner [98].

The ELK Stack can use machine learning to detect anomalies within a network [57]. The machine learning feature can be used to monitor a network and detect sudden and unexpected events, such as a sudden rise in failed login attempts throughout the network. Machine learning can also be used to collect and analyze metrics from devices within the network. These metrics can tell a lot about how the network is operating and can reveal anomalies such as unexpectedly high CPU usage or longer than normal response times on some devices and applications. Such changes in metrics might be an indicator of an ongoing attack against the network.

Since the ELK Stack system is a combination of open source software it is free to use. Being free to use does not necessarily mean that it is the cheapest option, however. When considering a system of this size, the time it takes to implement it can be considerable, and the longer the system takes to implement the greater the cost will be for any entity implementing it. The biggest cost associated with the implementation of the ELK Stack is most likely going to be the configuration of Kibana visualizations and dashboards. Considering that Kibana does not come with a pre-installed SIEM solution, configuring Kibana to match an entity's requirements can be time consuming.

5.3 Graylog

Graylog was started in 2009 by Lennart Koopman and is built from the ground up as a log management system [58]. At the time there was a shortage of affordable log management software available on the market, and Graylog was created to fill this gap. Graylog is an open source system which, just like the ELK Stack, is based upon Elasticsearch but incorporates its own processing and visualization tools. Since Elasticsearch is used, Graylog can make use of Logstash for handling log inputs to the system. Although some similarities can be found, Graylog has its own component for dealing with the interaction between Logstash and Elasticsearch. This component is called Graylog-server and handles the storage of incoming log messages, interactions and search queries. The Graylog-server component also hosts the web server which the web interface interacts with [58]. The graylog-web-interface component is written in JavaScript as a client-side, single-page application which fetches data from the graylog-server [59].

When implementing Graylog there are two different parts of the system that allow for scaling. The first part is the Elasticsearch node cluster which indexes and stores all the data collected [60]. The second part is the Graylog node cluster which is used to process the data before displaying it to the user. Each Graylog node also incorporates a MongoDB [61] database which is used to store meta information, configuration data and user data on that node. The data stored within the MongoDB database is replicated to all Graylog nodes for increased redundancy and durability of the system [60] [62].


Figure 6: A basic view of a multi-node setup of a Graylog system. The system uses a separate cluster of Graylog-nodes and Elasticsearch nodes. This figure is under copyright license of Graylog Inc. Used with the approval of copyright owner [60].

Figure 6 presents an overview of a scalable implementation of the Graylog system. The Elasticsearch cluster can be expanded to add more storage, data redundancy and to reduce search times of data, just like in the ELK Stack system [55]. The cluster of Graylog nodes can be expanded to allow for increasing the throughput of processing of data and reducing the time for constructing visualizations of data. The load balancer seen in figure 6 is used to distribute searches and incoming log messages between the different Graylog nodes within the system. This allows for multiple searches and visualizations to be executed at the same time. The load balancer is also used to increase the availability of the Graylog system [63]. Graylog can parse input from standard syslog messages, JSON files and plaintext [64]. This makes it possible to use Filebeat [65] and Logstash [22] from the ELK Stack to parse and send data to Graylog servers. Since Graylog can use Logstash it can also use the benefits of the persistent queues

Page 47: Centralized log management for complex computer networkskth.diva-portal.org/smash/get/diva2:1330246/FULLTEXT01.pdf · Logging ; Log management ; Computer networks ; Centralization

39

functionality to allow for at-least-once delivery guarantees and buffering of messages during peak load. When receiving log messages, Graylog uses an “extractor” to parse the data from the received message [66]. These extractors can be used to extract data with a wide variety of composition and from different sources. Extractors can be imported using content packs [64] which can be found in the Graylog marketplace [67]. The content packs include Graylog configurations for input, extractors, stream, dashboard and output. With the help of these configurations the parsing of log messages from new devices added to the network can be easier to setup. Using the content pack, the log data from devices can be extracted and presented with little effort required. It is also possible to build new extractors using regular expressions to parse plain text in different ways [66]. To protect the data when in transit Graylog supports the use of TLS (Transport Layer Security). Graylog can use self-signed certificates or certificates signed by other certificate authorities [68]. It is possible, and recommended, to disable the ciphers and key lengths which are considered unsafe or deprecated to increase the confidentiality and integrity of data during transport [69, 70]. Since the data is stored in an Elasticsearch cluster it is also recommended to use dm-crypt functionality for encryption of the data at rest on the Elasticsearch nodes [52]. Dashboards can be used within Graylog to display the most important data from the network. It is possible to create multiple dashboards within the Graylog interface and allow each dashboard to display different types of data [71]. The dashboards can display different visualizations of the data fetched from search queries. To sort the data before visualization, streams can be used within Graylog [72]. When the data is processed by Graylog it is matched against a set of rules that defines each stream. Depending on the definition of the stream, if the data matches one or more rules the data is tagged with the stream ID. The matching of the stream rules is the last part of the filtering chain within the Graylog message handling process. Unlike search queries, streams can be analyzed in real-time as data is captured by each stream within Graylog. This makes it possible to setup conditions for streams and generate alerts when these conditions are met [72]. When an alert condition is met, Graylog can send an alert to the network administrators either by Email or by a HTTP Post request. Just like the ELK Stack, Graylog is an open source project and free to use. While this in many cases can be a great benefit, the downsides are similar to those of the ELK Stack as well. The time required to setup and configure the system can be very hard to estimate. As mentioned with ELK Stack, the phase that takes the longest time to implement can be the configuration of dashboards to visualize the required data. However, Graylog does provide some functionalities to get content packets which can be installed on the


These content packs provide basic setups for collection and visualization of data from different sources. Because of this, they help to reduce the time required for configuring the dashboards within Graylog.

5.4 System selection

As mentioned in section 3.4, the system selection process for the discussed systems will be based on several criteria. The systems will be given a grade from one to ten depending on how well they meet each criterion. Finally, the sum of the criteria ratings for each system will be used to select the system for implementation as the proof of concept system within the stakeholder’s network. The goal of the proof of concept implementation is to prove that the system meets the requirements and to present some implementation-specific requirements of the system. It is possible to use many different criteria to rank the software systems. To make sure that the selected system is the optimal one, only the most important criteria are used to rank the different systems. The criteria deemed most important were the cost of implementation, the scalability of the system, the number of threads on Stack Overflow and whether the system is based on open source software. Each criterion will be discussed in its own subsection below and each system’s compliance described and evaluated.

5.4.1 Stack Overflow

Stack Overflow [29] is a widely known online community for Q&A and sharing of knowledge between developers and users. Knowledge is shared by posting questions and problems which other people and developers in the community can answer or solve. If a system has a substantial number of questions asked on a site such as this, it can be a good indication that the system is widely used. This is beneficial since a larger community surrounding the system means a greater chance of getting support with any problems that arise. Searches on Stack Overflow are done by using a search phrase or a tag. Each tag can be the name of a system, a programming language et cetera. In this case both tags and a phrase search of the system name were used. The use of tags makes it possible to match the system’s name to the number of questions tagged with it. The search phrase of the system name was used to see in how many questions the system was mentioned. The search tags and phrases used for the different systems were: “Splunk”, “ELK-Stack” and “Graylog”. There could be other questions asked which fall outside of the scope created by using only these tags, such as questions asked using the tag “elasticsearch”. These tags were deemed irrelevant as only the complete system, and not the individual software, is evaluated within this thesis. The number of threads created using the different tags were, as of 2018-04-18, as follows:


1. “ELK-Stack”: 1 223 questions tagged [74] and mentioned in 2 850 questions [75].

2. “Splunk”: 772 questions tagged [76] and mentioned in 2 435 questions [77].

3. “Graylog”: 117 questions tagged [78] and mentioned in 546 questions [79].

The “ELK-Stack” tag and search phrase were the most used of the evaluated systems. It was tagged in 1 223 questions and mentioned in 2 850 questions. The ELK Stack system is therefore clearly well known and has a large number of Q&As. Because of this, the ELK Stack will get a rating of 10 for this criterion. Splunk was the second most tagged system of the ones evaluated. With 772 questions tagged and mentions in 2 435 questions, it is also clear that Splunk is widely known and used within the community. However, since Splunk has close to 500 fewer questions tagged than the ELK Stack, it will receive a rating of 8 for this criterion. Graylog had the lowest number of questions it was tagged and mentioned in. Graylog was tagged in 117 questions and mentioned in 546 questions. Graylog is therefore, with high probability, the least known and used system of the ones evaluated. Although it is less mentioned on Stack Overflow, it must be noted that Graylog relies on the use of Elasticsearch and in some cases Logstash. These systems are both part of the ELK Stack, and therefore some questions asked with the “ELK-Stack” tag will also be applicable to Graylog. Therefore, Graylog will receive a rating of 5 for this criterion.

5.4.2 Cost of implementation

The cost is in most cases a crucial part of the selection of any system. A lower cost is in most cases the main criterion when choosing a system for implementation. Systems with a low cost, such as free open source systems, can require more effort to implement. However, the evaluation of the cost of implementation criterion will include only the time required to implement the system. The price tag of the systems available on the log management market today is often based on the amount of log data processed each day. This can make it hard to estimate the cost of a system, since it is not always easy to tell how much log data is produced. Because of this, the open source systems, which are free to use, will instead be awarded bonus points for the lack of licensing cost through the open source criterion. Splunk can automatically ingest many standard log formats such as syslog, Apache log messages and Windows event log messages, and can also present these in pre-defined dashboards [80]. In most cases, this reduces the time required to implement and configure the system before valuable information can be extracted from it. In the enterprise license of the software, a wide variety of support and services are included to help with the implementation and configuration of the system [81]. This can be a great help when dealing with systems of this scale, since the support and education come directly from the developers.


Even though Splunk comes with what many smaller entities may consider a huge price tag, it can still be a viable solution. This is because of its ease of use and implementation compared to the open source alternatives such as the ELK Stack and Graylog. Because of this, Splunk will receive a rating of 9 for this criterion. Since the ELK Stack is a free open source system it does not come with a price tag. This dramatically reduces the cost of implementing the system. The ELK Stack is not built solely for log management but can also be used for ingestion and visualization of other types of data. Because of this, the ELK Stack needs to be configured to be used as a log management system. The Logstash and Filebeat nodes need to be configured to receive, process and forward the data to the Elasticsearch nodes. The Elasticsearch nodes also need to be configured to receive the data from the Logstash and Filebeat nodes and then index the data. Finally, Kibana needs to be configured to retrieve data from the Elasticsearch cluster, and dashboards need to be created to display the required data. Therefore, the configuration process of the ELK Stack is the most time-consuming of all the evaluated systems. Due to this more time-consuming configuration process, the ELK Stack will receive a rating of 7 for this criterion. Just like the ELK Stack, Graylog is also a free open source system. Graylog does not, however, suffer to the same degree from a tedious and time-consuming configuration process. Graylog allows for the use of content packs which can set up a configuration for ingestion, indexing and visualization of log data from various sources. This eases the configuration process of Graylog, since many log formats are supported with pre-configured extractors and dashboards for visualization of the ingested data. Since Graylog is open source software and does not have as tedious a setup and configuration process as the ELK Stack, it will receive a rating of 8 for this criterion.

5.4.3 Scalability

When a system is implemented in larger production environments, the scaling of the system is a very important criterion. This is because systems that scale well will allow for better performance when implemented at a larger scale. A scalable system also allows for easier expansion of the system if necessary in the future. If the requirements placed upon the system should increase, the system can then be scaled up to meet these requirements. On the other hand, if a system does not scale well there might be a need to implement a larger system than currently required in order to meet future requirements. Also, if the system does not scale well, parts of it or the whole system might need to be replaced when the system is no longer capable of handling the load placed upon it. Splunk handles scaling by providing functionality for horizontal scaling [82]. Since Splunk is capable of horizontal scaling, it allows more hardware resources to be added when and where they are required.


Since there are different nodes within a Splunk system, the horizontal scaling can be applied in different ways depending on which layer is having problems handling the load. If the first layer of the system, which is the log processing layer, is consuming too many resources, a Splunk Forwarder [37] can be added to handle some of the load. If there is a need for an increased amount of storage or extra data redundancy, another Splunk Indexer can be added to the system's second layer. Also, if there is an increasing number of searches being performed on the data, it might be a good idea to add another Splunk search head to the third system layer, which allows the system to handle additional searches simultaneously. This means that it is possible to scale each system layer in Splunk individually to handle the load placed upon the system. Due to the good scaling options of Splunk and the ability to add hardware resources where they are most required in the system, Splunk will receive a rating of 9 for this criterion. Just like with Splunk, it is also possible to apply horizontal scaling techniques to the ELK Stack. As previously described, the ELK Stack also has the ability to scale each layer of the system structure individually. In the first layer, more Logstash nodes can be added to help manage the load of processing incoming log data. Filebeat clients can also be added to endpoints, which can process data and send it directly to Elasticsearch nodes or load balance the data across the Logstash nodes. This allows the clients' processing power to be used to transform the log formats and help alleviate the load placed upon the Logstash nodes. In the second layer, it is possible to scale the Elasticsearch cluster by adding more nodes. This allows the system to increase its storage capacity and data availability, since there will be more nodes to store the log data on. It is also possible to add several Kibana servers to the system to allow for several searches and data queries to be executed simultaneously. This can be useful when the system and network grow and there are multiple network administrators who are simultaneously analyzing the log data. Since the ELK Stack also offers good scaling options with individually scalable and resource-efficient nodes, the ELK Stack will receive a rating of 9 for this criterion. Graylog uses a somewhat different system structure compared to the ELK Stack and Splunk. Graylog uses a cluster of Graylog server nodes to process log data before storage, and stores configuration data within a replicated MongoDB database. To store the processed log data, Graylog uses an Elasticsearch cluster. Because of this, when scaling Graylog for increased storage space and data availability the process is similar to that of the ELK Stack. When there is a need to scale the log processing capability of Graylog, more Graylog servers can be added. Since these Graylog servers include the ability to process incoming log data, host the web interface and run their own instance of the MongoDB database, these nodes perform multiple tasks. This can be both a benefit and a disadvantage. The advantage is that it decreases the number of nodes required to be set up within the system. However, it also slightly limits the scalability of the system, since the nodes are not used for a single purpose. Because of this, if the system is being upgraded to handle an increased amount of log processing, some of the hardware resources added will be used for other purposes.


Although this might be beneficial during the implementation and maintenance phase of the system, it does not really benefit the scalability of the system. Graylog does, however, allow for horizontal scaling of both the Elasticsearch cluster and the Graylog server nodes. Therefore, Graylog will receive a rating of 8 for this criterion.

5.4.4 Open source

There are many reasons why an open source system might be superior to a system that is not open source. In most cases an open source system might be chosen due to the non-existent licensing cost. This often leads to a great reduction in price compared to a system that requires a purchased license. There are other benefits of using an open source system, such as increased security, longevity and possibly larger communities with more in-depth knowledge of the system. Splunk requires a purchased license if the data processed exceeds 500MB per day. When acquiring a purchased license, the amount paid depends on the data ingested by the Splunk installation each day. For a network with 20GB of ingested data per day the price would be $20 000 a year if billed annually [80]. Although there are volume discounts for larger amounts of data, this is still a staggering amount to pay for a medium-sized entity. Because of the benefits gained by using open source systems, all the open source systems will receive a rating of 5 for this criterion while other systems will be rated 0.

5.4.5 Criterion summarization

The three systems evaluated within this thesis meet all the requirements for implementation in the stakeholder’s network. This means that any one of these three systems could be used as a viable solution for centralized log management and visualization within the stakeholder’s network. Because of this, a selection process must be used to select one of the systems for a proof of concept implementation. To select the system to be used in the proof of concept implementation, the ratings from the different criteria placed upon the systems will be summarized. When the criteria ratings have been summarized, the system with the highest sum will be the selected system for implementation.


The following criteria ratings and summarization were awarded to the systems included in the selection process:

SYSTEM      STACK OVERFLOW   COST OF IMPLEMENTATION   SCALABILITY   OPEN SOURCE   CRITERIA SUM
SPLUNK      8                9                        9             0             26
ELK STACK   10               7                        9             5             31
GRAYLOG     5                8                        8             5             26

Table 4: A table showing each criterion rating received and the sum of the criteria ratings for the evaluated systems.

As can be seen in table 4 above, the system with the highest rating is the ELK Stack. Because of this, the ELK Stack will be implemented as a proof of concept system within the stakeholder’s network. Worth noticing is that all the ratings given to the different systems are based on the requirements for implementation in the stakeholder’s network. These ratings might therefore change depending on the environment the systems are evaluated for.


6 System implementation

This section describes the functionality of the chosen centralized log management system and how it was implemented to meet the requirements placed upon the system. The implementation of each requirement placed upon the system is discussed in the subsections below. There are functionalities discussed throughout this section which may not be implemented in the proof of concept system but which should be implemented in a full-scale implementation. In subsection 6.1 the network topology which the system was implemented in is described. In subsection 6.2 the parsing of log data sent to the system is described. Subsection 6.3 describes how encryption and authentication were implemented in the system. Subsection 6.4 describes the functionalities required for data persistency and availability. In subsection 6.5 the scalability functionality of the system is described. Subsection 6.6 touches upon the generation of alerts and the X-Pack functionality. Finally, subsection 6.7 describes the visualization capabilities of Kibana.

6.1 Network topology

The proof of concept implementation of the ELK Stack system within the stakeholder’s network was carried out on two dedicated servers. To be able to achieve redundancy and high availability using only two dedicated servers, a full ELK Stack system with Logstash, Elasticsearch and Kibana has been implemented on each server. However, when the ELK Stack is used within a production environment it is recommended to have a dedicated server for each node in the system. This means that each node within the system should be running on its own dedicated hardware. There are several reasons why it is recommended to split the system nodes across different hardware. For example, it is recommended to disable memory swapping on Elasticsearch nodes to make sure that the memory allocated by the Elasticsearch software is not swapped out frequently [83]. Disabling swapping will increase the performance and stability of the Elasticsearch node. Disabling memory swapping is not required for Logstash or Kibana and might instead decrease their performance. Because of this, having different types of nodes installed on the same hardware is not recommended. If it should be necessary, it is a good idea to separate the nodes into different virtual machines with their own dedicated resources and memory management. Another reason why it is a good idea to separate the nodes onto different hardware is the increase in system redundancy and availability. The ELK Stack offers solutions to disperse and load balance among the available nodes in different ways. For example, Elasticsearch can replicate the shards of each index across several nodes which are part of the Elasticsearch cluster [55]. This allows the Elasticsearch cluster to provide high data availability. If one of the Elasticsearch nodes should go down, the data it contains will still be available from a replica of the shard on a different Elasticsearch node.


Shard replication also allows searches on the data to be executed in parallel on all nodes which store the requested data. This helps to reduce the time required for performing searches on a large set of data [55]. The separation of different ELK Stack nodes is also beneficial when it comes to scaling. Having a single node running on each server allows for better estimation of load capacity. The load capacity of different hardware can be very hard to determine when several nodes are installed on the same server. Also, if there are several different nodes installed on the same server, all these nodes will compete for the available resources. During peak load times, an implementation such as this can also become a bottleneck for the whole system. The purpose of this proof of concept implementation is to show that the system works as expected. Because of this, log data from network devices needs to be directed to the input of the ELK Stack, which is usually the Logstash nodes. For this implementation the log data was taken from four different sources:

▪ The firewall and IPS node protecting the stakeholder network from the internet.

▪ The stakeholder’s internal DNS server.

▪ A Windows client computer using Filebeat to process and deliver log messages.

▪ The log files produced on the two dedicated servers which the ELK Stack is deployed on.

Together these log sources provide a relatively small sample of log messages which nevertheless contains a wide variety of data. The log messages produced on the client computer and the dedicated servers will contain mostly system and authorization messages, which belong to the operating system log source group. The log messages produced on the firewall and IPS node as well as the DNS server will contain messages related to the network and its security, which belong to the computer security log source group. This diversity of data provides a good basis for setting up different data visualizations within Kibana. In the network topology used for this implementation there were also two routers: one edge router connecting to the internet and one internal router handling the internal sub-networks. The log messages from these routers were not collected and processed within the ELK Stack system for this implementation.


Figure 7: This figure illustrates the planned network topology for the VLAN which the ELK Stack system was deployed in. All clients, servers, firewalls and IPSs have an IP-address within the sub-net 192.168.99.0/24. Figure drawn by the author using the tools at https://www.draw.io.

As can be seen in figure 7 above, all the devices which log messages were collected from are located in a dedicated sub-net. This sub-net was created as a VLAN (Virtual Local Area Network) using a context-based firewall. This means that the network topology shown in figure 7 might not be entirely accurate, but it is used to illustrate the topology as seen from within the VLAN. The devices connected to the VLAN can be scattered across the stakeholder’s network, and the number of network nodes between them is most likely different from the visualization that can be seen in figure 7. The visualization in figure 7 only represents an example of how the nodes are connected within the VLAN.


6.2 Parsing log data

Since there are many different applications and programs which produce log messages, the contents and structure of different log messages can vary greatly. Because of this, it is crucial for a centralized log management system to be able to parse the useful data from all these different log messages. When the data has been parsed from the log messages it is stored in a unified log format to make it easier to retrieve and analyze the data. In the ELK Stack system, Logstash is most commonly used to receive and parse log data from different devices. Logstash uses a three-stage processing pipeline where each stage includes multiple plugins to make it easier to adapt Logstash to different requirements [84]. The first stage within the Logstash pipeline is the input stage, as can be seen in figure 8 below. The input stage handles the collection of events from different sources with the use of Logstash input plugins [44]. These plugins can handle connections and transmissions from many well-known log sources and protocols such as syslog, TCP (Transmission Control Protocol), UDP (User Datagram Protocol) and HTTP (Hypertext Transfer Protocol). Each input plugin can be configured with its own set of options, such as which port to accept connections on, the codec to be used for parsing of data and TLS (Transport Layer Security) options to secure the transportation of events.

Figure 8: This figure illustrates the path that data takes through the Logstash pipeline. Data is pushed/pulled from a source, processed in the Logstash pipeline and finally sent to the data destination. Image copyright belongs to Elasticsearch BV. Used with the approval of the copyright owner [109].

As can be seen in figure 8, the second stage of the Logstash pipeline is the filtering stage. Just like the input stage, the filtering stage of Logstash uses a set of filter plugins [85] to process and extract information from the events passing through the Logstash pipeline. As mentioned earlier, one interesting filter plugin is the “date filter” [44] which can be used to extract the date from various events and store it in a unified format. Another useful filter plugin is the “grok filter” which uses grok patterns to extract data from text and structure it into fields. Since there are a lot of Logstash filter plugins which can be used together and even nested, the filter plugins are one of the most powerful tools of the ELK Stack.


Configuring the filters might also be one of the most time-consuming tasks when implementing the ELK Stack. This is because, while the filter plugins bring great flexibility to the parsing of event data, most log sources passed through the Logstash pipeline require their own filter plugin setup. In larger networks the number of different log sources can become large, and therefore the process of setting up Logstash filter plugins for all of them increases in difficulty. In the final stage of the Logstash pipeline the log data has been parsed and it is time to transfer the data to the Elasticsearch cluster. This stage is configured using the Logstash output plugins [86]. The Logstash output plugins are used to configure Logstash with the destinations where the parsed data should be sent or stored. There are output plugins like “elasticsearch” which can send data to the Elasticsearch cluster over HTTP or HTTPS. There is also the “file” output plugin which can write all the processed events to a local file on the Logstash node. When the processed events have been transferred or stored, the Logstash pipeline is done and the next part of the ELK Stack can continue its work. Within this thesis the grok, date and multiline filters were used. These filters use regular expressions to parse and change data. The grok filter is one of the most powerful tools within the ELK Stack system and is used to parse log messages into standardized fields. However, the process of creating and understanding grok filters can be very complex and time consuming. In figure 9 below a short introduction to grok filters is given.

Figure 9: A short introduction to how the parsing of log data using grok filters is done. The message can be seen at the top and is colour coded to better show the fields that are being parsed. The grok filter used can be seen in the middle of the figure and the parsed fields at the bottom. Figure drawn by the author.

As can be seen in figure 9 above, the parsing of data using grok filters can be quite complex. The fields that have been parsed by the grok filter can be used as a basis for searching the log data within Kibana. This allows any data to be extracted from the logs and used for visualization, as long as the correct grok filter is used.
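To make the idea in figure 9 more concrete, the snippet below shows a grok filter of the kind commonly used for BSD-style syslog lines, built from the grok patterns shipped with Logstash. It is an illustrative sketch only and not the exact filter used in the proof of concept implementation (that configuration is found in Appendix F).

filter {
  grok {
    # Split a BSD-style syslog line into timestamp, host, program, pid and message fields
    match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
  }
}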


The “multiline” filter can be used to compress log messages sent on separate lines into a single log message. In the proof of concept implementation, a “multiline” filter designed to collapse Java stack trace log messages was implemented. The pattern used can be seen in Appendix E. This allows the ELK Stack to store Java stack trace messages as a single message, which would otherwise be divided into one message per line of the original output. The final filter used within the proof of concept implementation was the “date” filter. The “date” filter was used to parse the syslog timestamps into a standardized format. The implementation of the “date” filter that was used can be seen in Appendix F.
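As a rough sketch of how these two pieces of functionality can be expressed, the configuration below shows a multiline setup that collapses Java stack traces and a date filter that normalizes syslog timestamps. Note that in recent Logstash versions the multiline functionality is typically applied as a codec on the input rather than as a separate filter plugin. The port number and the patterns are common textbook examples and not the exact ones used in Appendices E and F.

input {
  tcp {
    port => 5514                       # example port only
    codec => multiline {
      # Any line starting with whitespace is appended to the previous event,
      # which merges multi-line Java stack traces into a single log message
      pattern => "^\s"
      what => "previous"
    }
  }
}

filter {
  date {
    # Parse the extracted syslog timestamp into the standardized @timestamp field
    match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
  }
}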

6.3 Encryption, Authentication and Integrity

The data within processed and stored events can contain sensitive or security related information. Because of this, it is important to make sure that the data can only be viewed and accessed by authorized people. There are several ways in which the data could be accessed or extracted from within a network. For example, a trojan or virus infected host might be eavesdropping on the events sent to the logging servers to intercept security related data. If the host gets access to this sort of data, it might aid in the launch of a larger attack on the network. Another way to access the log data might be through gaining access to the database or visualization tool used to store and visualize the log data. It is also possible that the hard drives used to store all the log data are stolen and that the data is extracted from these hard drives. There is a long list of possible ways in which the data contained within log messages can be accessed and exploited without authorization. To mitigate all these threats, it is important to protect the data during transit and storage within a centralized log management system. The ELK Stack uses HTTP to transfer data between its nodes. As previously explained in section 5.2, it is possible to set up all nodes within the ELK Stack system to use HTTPS with TLS (Transport Layer Security). This will encrypt all communications and data transfers between nodes in the network [87]. The use of TLS also allows for authentication of nodes within the ELK Stack. When a TLS connection is established between two ELK Stack nodes within the network, it is possible to configure the nodes to authenticate each other. There are multiple options for authentication, such as only presenting a certificate signed by an accepted certificate authority or using certificates signed with the IP-addresses and DNS-name of the server to further enhance the authenticity of the nodes [88]. These configurations ensure that no information is leaked to unauthorized users during transport between nodes within the ELK Stack. When data is stored on one of the nodes within the Elasticsearch cluster of the ELK Stack, the data is not encrypted. There are currently no functionalities to encrypt the data stored within the Elasticsearch indices and decrypt it during searches.


As explained in section 5.2, the ELK Stack developers instead recommend disk-wide encryption such as dm-crypt to encrypt the data at rest [52]. This allows the data to be encrypted on the hard drive and decrypted during use within the server. If the hard drives should be removed from the server, the data would be unreadable without the encryption key. Together with the transport encryption over HTTPS, the ELK Stack would then provide encryption of data across the whole system, both at rest and in transport. The only place the data would be unencrypted is in RAM, since the data is decrypted when read from the hard drives. There are two possible ways to access the data stored within the Elasticsearch cluster. The first is to use HTTP(S) to connect to one of the Elasticsearch nodes within the cluster on its configured port and post a search query. The second way of accessing the data is through posting a search query within the Kibana interface. To allow for user authentication when accessing data, it is possible to set up different users and passwords within the ELK Stack [89]. This is done by using the X-Pack plugin for the ELK Stack nodes [53]. When user authentication is implemented, accessing the HTTP(S) interface of nodes within the Elasticsearch cluster will require a username and password. This username and password also need to be configured in Kibana to allow it to retrieve data from the Elasticsearch cluster. It is also possible to add multiple users to the ELK Stack system along with the built-in users, or to use, for example, LDAP (Lightweight Directory Access Protocol) to handle user authentication [90]. With the ability for user authentication it is possible to ensure that data can only be reached by authorized users within the ELK Stack. In the implementation of the proof of concept system, transport encryption was implemented using TLS (Transport Layer Security) certificates. The certificates were set up using an internal certificate authority on one of the dedicated ELK Stack servers. This internal certificate authority was created using the X-Pack plugin's “certutil” [91] feature. When the certificate authority had been created, “certutil” was also used to create and sign certificates with the private key of the certificate authority. These certificates were then used to verify the different nodes within the ELK Stack system and enable TLS transport encryption within the implemented system. It is also possible to use certificates signed by other certificate authorities if their certificate is added to the list of trusted certificate authorities. To handle the authentication of users who try to access the contents of the ELK Stack system within the implementation, the built-in users were used. Each user was given a strong password set up by using the “setup-passwords” [92] command of the X-Pack plugin. The “setup-passwords” command allows for an interactive mode where the user enters the passwords manually, or an automatic mode where the command generates strong passwords for the users and prints them on the screen. When the passwords of the built-in users had been configured, the Kibana configuration was updated to allow it to connect to Elasticsearch with the new username and password.


Figure 10: The authentication page presented in the Kibana web interface when authentication is set up for the ELK Stack. Image copyright belongs to Elasticsearch BV. Used with the approval of the copyright owner [90].

As can be seen in figure 10 above, when trying to access the Kibana or Elasticsearch interface, authentication is required before the data can be accessed. Since this implementation only has the three built-in users kibana, elastic and logstash_system, one of these users and the accompanying password should be used for authentication. In the proof of concept implementation, a decision was made not to activate disk encryption on the hard drives of the Elasticsearch nodes. This was because it was deemed unnecessary for this small implementation of the system, and how to configure and implement disk encryption is not the focus of this thesis. However, disk encryption is strongly recommended in production to protect sensitive and security related data.
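As an illustration of how transport encryption and user authentication appear in the Logstash configuration, the sketch below shows a Beats input protected with TLS and an Elasticsearch output that authenticates with one of the built-in users. The host name, the file paths and the use of an environment variable for the password are assumptions made for this example and are not taken from the configuration files in the appendices.

input {
  beats {
    port => 5044
    # Encrypt the transport and present a node certificate to connecting Beats clients
    ssl => true
    ssl_certificate => "/etc/logstash/certs/logstash.crt"   # example path
    ssl_key => "/etc/logstash/certs/logstash.key"           # example path
  }
}

output {
  elasticsearch {
    hosts => ["https://elk-server-1.example.local:9200"]    # example host name
    ssl => true
    # Trust the internal certificate authority that signed the node certificates
    cacert => "/etc/logstash/certs/ca.crt"
    # Authenticate as one of the built-in users; the password is read from an
    # environment variable so that it is not stored in the configuration file
    user => "elastic"
    password => "${ES_PASSWORD}"
  }
}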

6.4 Data persistency and availability

When data from an event has been captured, it is important to make sure that it is not lost. There are several reasons why data could be lost within the system. Some of the most common problems arise during transfer of the data, abnormal system shutdowns and storage related issues. To protect the data during the transportation between nodes in the system, there are several things to consider. To begin with, when transferring data between any two nodes in the system it is important that the nodes can verify that the data was received and that it has not been altered during the transfer. This helps to protect the data from network problems, since the data is not deleted on the transmitting node until it has been verified and acknowledged by the receiving node. The ability to detect alterations in the transferred data allows the receiving node to detect bit errors that may occur. If a bit error should be detected, the transferred data should be resent [94]. Because of this, the data received is guaranteed to be the same as the data sent. Within the ELK Stack system, the transportation of data uses HTTP (Hyper Text Transfer Protocol) [93]. Nodes accessing the Elasticsearch cluster use the GET and POST functionality of HTTP to retrieve and send data.


The non-data communications between nodes within the ELK Stack system do not use HTTP but instead use its underlying protocol, TCP (Transmission Control Protocol) [93]. This means that all communications within the ELK Stack system use TCP as their transportation protocol. TCP provides a lot of functionality, but most important in this case are the reliable communication and checksum functionalities. The reliable communication functionality allows TCP to guarantee at-least-once delivery of data and that the data received by the application will be in the correct order [94]. TCP also uses the checksum functionality to ensure that the data received is the same as the data transmitted [94]. These functionalities will ensure that data is persistent during transportation within the ELK Stack system. It is not uncommon that computer devices or software malfunction and their execution needs to be stopped. In most cases a software's execution can be halted by letting the operating system send a signal to it. For example, the SIGTERM signal on the Linux operating system can be sent to signal a process to shut down [95]. This allows the process to shut down gracefully by letting it clean up and release any resources in use and maybe even finish its current task. Problems may arise when a process is interrupted and not allowed to shut down properly. For example, there could be a power failure and the device the process is executed on abruptly halts. This might lead to data loss if the process is not able to handle these situations correctly. In the ELK Stack system, Logstash has the required functionality to handle abnormal shutdown of nodes during processing or transportation of events. This functionality within Logstash is called persistent queues [48] and was briefly discussed in section 5.2. When persistent queues are activated within Logstash, all the events received are written to a queue on disk. This queue is inserted between the input and filtering stages within the Logstash pipeline. The events are only removed from the queue if they pass through the filtering and output stages correctly and the reception is acknowledged by the output destination, which in most cases is a node within the Elasticsearch cluster. Therefore, no events will be lost if a Logstash node is abnormally terminated or there are network interruptions during transport to the Elasticsearch cluster. Persistent queues can also serve as a buffer during peak load times and allow Logstash to buffer a larger number of events than would have been possible using the regular in-memory queue functionality. There is always a possibility of hard drive failures, and therefore it is important to back up the data stored within the ELK Stack system. Elasticsearch normally creates automatic replicas of the stored shards [96]. These replicas are spread out across the different nodes within the Elasticsearch cluster to ensure data redundancy and availability. Since the data of each shard is stored on several Elasticsearch nodes, if any node should fail its data can still be retrieved from another node within the cluster. This will secure the data and ensure its availability in case of node failures.


For data which is crucial to an entity, it is also possible to geographically disperse the Elasticsearch cluster so that the data is stored at different geographical locations. This allows for the greatest amount of data redundancy possible. For the proof of concept implementation, one instance of Elasticsearch was installed on each server and the two instances were configured as a single cluster. The Elasticsearch cluster was configured to replicate all shards across both nodes. This allowed the small-scale implementation to still have data redundancy and availability. The persistent queues functionality within Logstash was implemented as well. This makes sure that the implementation will not suffer any data loss after the data has been accepted into the system. This functionality was very useful throughout the implementation process, since the Logstash nodes and the servers they were installed upon were restarted frequently.

6.5 Scaling the system

It is important for a log management system to scale well in order to meet the growth of a large network. There are two common terms when it comes to scaling of computer systems: horizontal and vertical scaling [22]. Vertical scaling implies that the system performance or load tolerance is increased by upgrading the hardware of the system, for example by installing a new processor or more RAM in the server hosting the system. Horizontal scaling instead focuses on spreading the load across several devices, for example by setting up additional servers which the system can use. To effectively scale the ELK Stack system, horizontal scaling should be used [97]. Since the ELK Stack is commonly split into four different layers, like the structure seen in figure 3, each layer can be scaled on its own. This gives the ELK Stack system great flexibility when it comes to scaling. If one node within the ELK Stack system should be under heavy load, an identical node can be inserted to help manage the load. Together with the X-Pack plugin's monitoring feature, it is possible to discern which nodes are heavily loaded and have high resource usage [98]. Using this data, it is possible to pinpoint the exact locations within the network that require increased performance to handle the load. This can be very beneficial functionality when managing a large network with possibly thousands of devices. To implement horizontal scaling in the ELK Stack, the number of nodes needs to be increased. The nodes most commonly under heavy load are the Logstash nodes, because they parse and transform the log data they receive. These tasks often require high CPU usage, especially if grok patterns are used for the parsing of data [99]. In larger production environments it is a good idea to consider the use of a load balancer and buffer to distribute events to the different Logstash nodes. The Filebeat data shipper, which is part of the ELK Stack system and installed on the client nodes, allows for load balancing across multiple Logstash nodes when transporting data [100]. It is common to use a message broker such as Apache Kafka in production environments to buffer events during bursts in traffic [101]. For example, the setup in figure 5 shows an ELK Stack system using an Apache Kafka instance for buffering of log messages.


Another way to buffer events, without the need to implement extra nodes, is to use the persistent queues feature within Logstash that was described earlier [48]. Kibana is the least resource intensive node within the ELK Stack. In most cases one Kibana node should be enough, but if required more nodes can be added. It is possible to “lock up” Kibana's resources with large searches and heavy data visualizations [102]. Therefore, if multiple users need access to Kibana, it is possible to deploy several Kibana nodes. These Kibana nodes should in turn be connected to Elasticsearch coordinating nodes, which spread the search load across the Elasticsearch cluster.

6.6 Generating alerts and X-Pack

There are some events that can occur in a network which the network administrators would like to be alerted about as fast as possible. These events can be anything from a detected intrusion on an IPS (Intrusion Protection System) to an unscheduled shutdown of a server. The information about these events is often stored in log messages. Because of this, an increasingly important functionality of any log management system today is the ability to generate and send alerts when these events are detected. To generate alerts, the ELK Stack uses Watcher [54], which is a feature of the X-Pack plugin for the ELK Stack nodes. When a specified event, or “trigger”, takes place, Watcher performs a search query upon the data. Watcher currently only supports scheduled triggers, which means that the search query will be performed at a specific time or on intervals [103]. A condition is set within Watcher, such as a log severity level of 3 or lower, which should be true for one of the events returned by the search query. When Watcher executes a search at the specified time or interval and the condition returns true, Watcher performs an “action”. The “action” is the generation of the alert and allows Watcher to alert administrators by, for example, sending emails, sending webhook requests to HTTP servers or indexing an event in Elasticsearch [104]. Since Watcher is installed along with the ELK Stack's X-Pack plugin, it is also dependent upon it. X-Pack comes with a long list of features that increase the ease of use and the usage areas of the ELK Stack system. For example, X-Pack provides functionality for monitoring the health and operation of all nodes within the ELK Stack. X-Pack also includes features such as security features, machine learning and APM (Application Performance Monitoring) for collecting performance metrics from inside applications [105]. However, X-Pack is not open source software but is produced by the company Elasticsearch BV, and the license to use most of these functionalities has to be bought. There is no official price tag on the X-Pack licenses to be found; instead a request must be sent to Elasticsearch BV for pricing information. The basic license of X-Pack is available for free and comes with some of the most basic functionalities introduced by X-Pack.


Some of the functionalities included in the paid X-Pack licenses have open source plugin alternatives available for the ELK Stack. For example, the open source plugin Elastalert is an alternative to Watcher and available for free [106]. Knowi can provide AI (Artificial Intelligence) and machine learning capabilities for Elasticsearch [107]. These alternatives might not be as well integrated into the ELK Stack as the X-Pack features but can serve as a possibly cheaper alternative if these functionalities should be required. In the proof of concept implementation of the ELK Stack system the X-Pack plugin was installed. The X-Pack plugin was used together with a basic license and therefore provided a limited set of functionalities. The monitoring of the cluster was available and was used as a tool for debugging purposes.

6.7 Kibana visualization

As mentioned by Robert Rinnan in his thesis “Benefits of Centralized Log File Correlation” [2], the greatest benefits do not come from centralized collection and storage of log data. Instead, the largest benefits can be seen when log data is visualized and correlated. This is exactly what Kibana is designed to do: visualize and help with correlation of data. Kibana retrieves its data from the Elasticsearch cluster by using search queries specified in the Kibana web interface. The search queries are set to match one or several of the JSON [8] formatted fields parsed from the log files. The results returned from the Elasticsearch cluster can be viewed as formatted fields, as the JSON documents they are stored as, or used as data for charts and visualizations. In Kibana, visualizations are used to display data in different ways. There are several different visualization types which range from simple pie charts to Timelion time series [108]. These visualizations can be created and saved, and several visualizations can be used together to create a dashboard [11]. These dashboards hold multiple visualizations and allow the user to see the most important data without having to go through the process of searching and visualizing it. To create a visualization in Kibana there are four tasks which need to be completed [108]. The first task is to select one of the available visualization formats. For example, in this thesis a pie chart and a line chart were used to present some of the data collected. When the visualization format has been selected, the second task is to set up the search query. The search query will select the log data which matches the query and use only this data for the presentation. Finally, the third and fourth tasks are to select the data for the different axes or other visualization options, depending on which visualization format is used. If a line chart is used, the third task would be to select a data aggregation for the Y axis. The fourth task would then be to select the data for aggregation on the X axis.


Within the proof of concept implementation two different visualizations were created. These visualizations were also used together to create a dashboard. The first visualization created was a simple line chart displaying the number of “incidents” for each device throughout the network. The term “incidents” in this case means events with a low severity level. The data for this visualization was fetched using a search query specifying the log severity level as 3 or below. The other visualization created within the proof of concept system was a pie chart. The pie chart was designed to show the number of “sudo” commands executed on each host throughout the network. The data for this visualization was fetched using a search query specifying “sudo” as the value of the program field within the parsed log data. Since data visualization and correlation are not the primary focus of this thesis, Kibana will not receive an in-depth description. The data visualization is also greatly dependent on the network requirements and the network administrators' preferences. Because of this, further exploration and configuration of Kibana's features is left to the entity implementing the ELK Stack system.


7 Results

This section presents the results that were produced during the project. These results consist of a logging policy, the system selection process and its related discussion, and the implemented proof of concept system. These results are presented in sections 7.1, 7.2 and 7.3. The goals of this thesis were to create a logging policy, choose a system which meets the requirements staked out and then set up a small-scale implementation of the selected system which also meets the requirements. The logging policy was established in section 4. The selected system's required functionality has been thoroughly discussed in sections 5 and 6 to explain how it meets the requirements. To meet the final goal of this thesis, it must also be shown that the proof of concept implementation meets the requirements placed upon the system. The proof of concept system implementation was discussed in section 6. The real results of this thesis can be viewed as the functionality of the proposed and implemented system. These functionalities are described in subsection 7.3 below. However, there might be other use cases for the implemented system which are not discussed within this thesis.

7.1 Logging policy and system requirements

The logging policy that was created in section 4 was used to govern the requirements placed upon the system. The resulting logging policy can be seen in sections 4.1.1 through 4.1.4. This logging policy has been a vital part of the implementation process of the proof of concept system. Many of the choices made when implementing functionality within the proof of concept implementation were based upon the logging policy. Worth mentioning again is that the logging policy created in this thesis is not intended for re-use. It is advised that the logging policy should be crafted from the network's requirements and not the other way around. The logging policy was used to help stake out the requirements for the selection of the centralized log management system. These requirements were drafted by using the information within the statements of the logging policy together with the results of the background and literature study of the thesis. The requirements that can be seen in section 4.2 were used to select the valid systems for further investigation. These requirements were designed to make sure that the chosen system met the requirements of security and functionality described in the logging policy.


7.2 System selection

The selection process for the centralized log management systems discussed within this thesis was based upon the criteria described in section 3.4. Each system was then evaluated against these criteria in section 5.4. This resulted in the criteria points for each evaluated system that can be seen summarized in table 4. The selected system for implementation in the stakeholder's network was the ELK Stack. The ELK Stack met all of the requirements placed upon the system which were described in section 4.2. The ELK Stack system is based upon the three open source software packages Elasticsearch, Logstash and Kibana. The reasons for selecting the ELK Stack system were its excellent scalability, its large community and the fact that it consists of open source software. The ELK Stack received its lowest rating for cost of implementation. The reason for this was the complexity of its implementation compared to the other systems. How the ELK Stack was implemented within the stakeholder's network to meet the requirements and the statements from the logging policy is discussed in sections 6.1 through 6.7. The results of the implementation can be seen in section 7.3.

7.3 Proof of concept implementation

As mentioned in section 6.1, the implementation of the ELK Stack in the proof of concept system was done on two different servers. Each server was set up with one complete instance of the ELK Stack to provide redundancy of data and the possibility of testing failover and availability. However, this is not a recommended approach in a production implementation of the system, as also discussed in section 6.1. Since the two servers are identical in almost every way, only the configuration files from one server have been included in this thesis. These configuration files were taken directly from one of the servers and the only alteration is that the passwords have been omitted. The configuration files for the proof of concept implementation can be found in appendices B through G. The goal was to implement a proof of concept system which meets all the requirements derived from the logging policy in section 4. This was in most cases successful, but there were some functionalities specified in the requirements that have not been implemented due to lack of time at the end of the project. The implementation and usage of these functionalities have, however, been described in section 6.

7.3.1 Log collection and processing

The log collection and processing were done by the Logstash nodes installed on each server and by the Filebeat program installed on the client computer. The Filebeat instance was configured to send its log messages directly to Elasticsearch. This was possible because the Filebeat client is capable of parsing the log messages before transportation, eliminating the need to send them through a Logstash instance. The Logstash instances were configured to receive two of the most common input types when using the ELK Stack system: syslog and Filebeat log messages. Although the Filebeat input was not used in this case, it was configured as a failover should there be a problem with the Elasticsearch nodes. The syslog input can be received over either UDP (User Datagram Protocol) or TCP (Transmission Control Protocol), where TCP should be the preferred choice. The configuration of the Logstash input part of the pipeline can be seen in appendix E.

To parse the events passing through the Logstash pipeline, two simple grok filters were used. These grok filters were only applied to the log messages transported to the Logstash nodes using the syslog input, since the log messages passed from Filebeat had already been parsed and reformatted to the JSON format on the client computer. The first grok pattern was implemented using the "multiline" codec plugin on the syslog input. The "multiline" plugin uses a simple pattern to detect that a received line is part of a larger log message, such as a Java stack trace. The need for the "multiline" plugin originates from the log source: the log source creates the log message on several lines, and when transported to the Logstash server using the syslog protocol each line is sent as an independent log message. The pattern was in this case used to identify multiline Java stack traces and collapse them from several log messages into a single log message. The pattern used for the "multiline" plugin can be seen in appendix E.
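
To give a concrete picture of the input stage described above, the following is a minimal sketch of what such a Logstash input configuration could look like. The port numbers and the multiline pattern are illustrative assumptions and not the exact values used; the configuration actually used is found in appendix E.

input {
  tcp {
    port => 5514
    type => "syslog"
    codec => multiline {
      # Lines that do not start with a timestamp are assumed to belong to the
      # previous event, which collapses e.g. Java stack traces into one message.
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate  => true
      what    => "previous"
    }
  }
  udp {
    port => 5514
    type => "syslog"
  }
  beats {
    port => 5044
  }
}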

The second grok filter was used in the filtering process of the Logstash pipeline. This grok pattern can be seen within the Logstash filter configuration in appendix F. It was used to parse information from the syslog messages. Among the information parsed are the priority, timestamp and hostname, all vital information in log data. The data is parsed into several different named fields and then stored in JSON format.
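
As an illustration, a grok and date filter for standard syslog messages could look roughly like the sketch below. The field names are assumptions chosen for readability; the exact pattern used in the proof of concept implementation is the one found in appendix F.

filter {
  if [type] == "syslog" {
    grok {
      # Extracts priority, timestamp, hostname, program name and the free-text
      # message part into separate, named fields.
      match => { "message" => "<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
    }
    date {
      # Replaces the event timestamp with the timestamp parsed from the message.
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}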

In figure 11 the result of the grok and date filter plugins applied to a standard syslog message is shown. The parsed fields become indexed in Elasticsearch and are therefore also searchable in Kibana. These parsed fields can then be used to narrow down searches and to create visualizations. The final stage of the Logstash pipeline is the output stage. The configuration of the output stage of the Logstash implementation can be seen in appendix G. This configuration allows Logstash to ship the collected and processed log data to the Elasticsearch cluster as well as storing it in a local file. The reason for storing the processed log data in a local file was to make it easier to troubleshoot the system. This is not recommended in a production environment since it can consume a lot of disk space. The Elasticsearch output plugin was configured to allow load balancing and failover across both Elasticsearch nodes within the network.
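
A minimal sketch of such an output stage is shown below, assuming that both Elasticsearch nodes listen on HTTPS and that a dedicated user is used for indexing. The user name and file path are illustrative assumptions; the configuration actually used can be found in appendix G.

output {
  elasticsearch {
    # Listing both nodes lets the plugin load balance requests and fail over
    # if one of the Elasticsearch instances becomes unavailable.
    hosts => ["https://192.168.99.10:9200", "https://192.168.99.20:9200"]
    user => "logstash_writer"
    password => "${LOGSTASH_PW}"
  }
  file {
    # Local copy of the processed events, used only for troubleshooting.
    path => "/var/log/logstash/processed-%{+YYYY-MM-dd}.log"
  }
}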

Figure 11: An example of how a parsed syslog message, using the grok pattern supplied in appendix F, looks in Kibana. Some information such as the IP address, log source and timestamp has been parsed into indexable fields. The log message can be viewed within the Kibana web interface. This is a screenshot taken from the Kibana interface [111].

It is also possible to use grok patterns to parse non-standard log message formats. However, the grok patterns used in the proof of concept implementation should be able to parse most of the log messages sent to an ELK Stack system.

7.3.2 Log transportation and persistency

When Logstash has finished the collection and processing of the log data in the proof of concept implementation, the data is sent to one of the Elasticsearch nodes. Since the Elasticsearch nodes are installed on the same servers as the Logstash nodes, there might not be a need to transport the data over the network at all in this case. However, since this is a proof of concept system which is intended to be scaled up, it is still important to implement security features to protect the data while in transit. The protection of data when it is transported through the ELK Stack system was discussed in section 6.3. The ELK Stack uses TLS (Transport Layer Security) with certificates and private keys to encrypt communication and verify the authenticity of the nodes. Therefore, to implement encryption on all communications between the nodes in the ELK Stack system, each node is required to have a certificate.

A certificate authority was created on one of the servers using the "certutil" [91] X-Pack plugin. The "certutil" plugin was also used to create and sign certificates for each server and the client computer. Only one certificate was created for each server, which means that the Elasticsearch, Logstash and Kibana nodes on the same server also used the same certificate and key. To make use of the TLS functionality within the ELK Stack each node had to be configured correctly. The configurations of each node can be seen in appendix B through D. To use the TLS functionality within the ELK Stack, the X-Pack plugin had to be installed on each node. When the X-Pack plugin was activated the "xpack.security" settings became available to use in the configuration of the nodes. When the TLS settings were configured, the communication between the nodes in the proof of concept implementation was encrypted.
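
As an illustration, the TLS-related part of an Elasticsearch node configuration could look roughly like the sketch below, assuming PEM formatted certificates and keys produced by the certificate authority. The file names are assumptions; the configurations actually used are found in appendix B through D.

# elasticsearch.yml – enable TLS for both node-to-node and HTTP traffic
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.key: certs/server1.key
xpack.security.transport.ssl.certificate: certs/server1.crt
xpack.security.transport.ssl.certificate_authorities: [ "certs/ca.crt" ]
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.key: certs/server1.key
xpack.security.http.ssl.certificate: certs/server1.crt
xpack.security.http.ssl.certificate_authorities: [ "certs/ca.crt" ]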

Figure 12: This figure shows a packet capture of HTTP communication between the two servers. All text and data are sent in clear text and can easily be deciphered. This is a screenshot of one of the packets captured using Wireshark on the host at IP address 192.168.99.20. The packet was transmitted from the Elasticsearch instance at IP address 192.168.99.10.

In figure 12 the transmission of a data packet through the system without TLS encryption enabled can be seen. This data is easy to parse and could therefore be captured and deciphered during transmission. This could lead to a number of problems. For example, the data might contain sensitive or security related information. If an attack towards the network is attempted, this information could possibly aid the attacker by providing valuable information. When TLS encryption was activated on the nodes within the system, the data contained within the transmitted packets was protected. By using encryption in the communication between the nodes, the ELK Stack system ensures that no data is leaked unintentionally.

Two actions were taken to prevent data from being lost once accepted into the ELK Stack system. The first action was to activate the persistent queues on the Logstash nodes. The persistent queues allowed Logstash to buffer a large amount of data on disk during Elasticsearch downtime. They also help to make sure that the events in the Logstash pipeline are not lost due to unexpected shutdowns or restarts. The second action was to control the data storage on Elasticsearch. Since there are two Elasticsearch nodes within the cluster, all the indexes and shards they collect were set to be replicated to the other node. This produced an exact duplicate of the data on both nodes. This allowed the Elasticsearch cluster to have high data availability and persistency if one of the nodes was taken down or rebooted. This approach also helped to reduce the search times on large data sets since data could be searched on both nodes simultaneously.
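
A minimal sketch of these two settings is shown below. The queue size, path and index pattern are illustrative assumptions; the values used in the proof of concept implementation can be found in the configuration files in the appendices.

# logstash.yml – enable the disk-based persistent queue
queue.type: persisted
queue.max_bytes: 4gb
path.queue: /var/lib/logstash/queue

# Elasticsearch – one replica per shard means that, with two nodes,
# every index is fully duplicated on both nodes.
PUT logstash-*/_settings
{
  "index": { "number_of_replicas": 1 }
}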

7.3.3 Log data visualization

There are many available ways to visualize data in Kibana, and in the implementation process of the proof of concept system only the surface was scratched. Two different visualizations were created in Kibana to visualize some of the data collected within the system. Kibana has the functionality to access the parsed data from the log messages. The parsed data can be found by going to the "Discover" tab on the left navigation bar in Kibana. In the "Discover" tab it is possible to perform searches on the data and add filters to display the raw parsed data from the log messages without any visualizations.

Figure 13: The standard log message display under the Discover tab in Kibana. The data displayed here has been fetched with a search specifying that only log messages with the field "host:192.168.99.10" should be retrieved. This is a screenshot taken from the Kibana interface [111].

There are several things to take note of in Kibana's Discover interface in figure 13. The vertical bar graph seen in figure 13 displays the number of log messages received over the defined time span (which can be changed in the top right corner) matching the current search query and filter. The log messages are then presented in a list below the graph. The result of selecting one of the log messages can be seen in figure 11 above. In figure 13 the search query (which can be seen at the top of the interface) was empty, meaning it should select all the data. The filter (which can be found in the top left of the interface) was set to filter out all log messages not originating from the "192.168.99.10" host. It is also possible to sort the log data in the list using the parsed fields, such as log severity levels.

The first visualization created in Kibana was a line chart showing the number of "incidents" on the devices data was collected from. The term "incident" is used to represent an event with a log severity level of 3 or lower. Therefore, the data for this line chart was fetched using a search query specifying a log severity level of 3 or below. The time span of each visualization can be altered independently of the searching and selection of data. This means that one visualization can display data for time ranges from the last 15 minutes to the past week.
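
Assuming that the syslog severity has been parsed into a numeric field, here called syslog_severity_code (the field name is an assumption and depends on the grok pattern used), such a search can be expressed in Kibana's Lucene query syntax as a simple range query:

syslog_severity_code:<=3

or, equivalently:

syslog_severity_code:[0 TO 3]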

Figure 14: This figure displays a line chart visualization created within Kibana. The line chart displays the number of "incidents" within the last two days. The line chart was created using the steps mentioned in section 6.7 [108]. This is a screenshot taken from the Kibana interface [111].

As can be seen in figure 14 above, the line chart allows a changing metric to be visualized over a time period. In this case the "incidents" visualization might be helpful to track down the root cause of a system failure or breach, since it can clearly indicate when severe incidents started occurring in the network.

The second visualization was a pie chart displaying the number of "sudo" commands executed on each host. The "sudo" command in Linux environments allows a user to execute other commands as the root user. In this case the data for the pie chart was taken from the log messages captured in the last two days.
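
A rough sketch of how such a pie chart can be defined, assuming the program name and hostname have been parsed into their own fields (the field names below are assumptions):

Search query (Lucene syntax):  syslog_program:sudo
Slice size:                    Count
Buckets:                       Split Slices -> Terms aggregation on the field syslog_hostname.keyword

If the program name has not been parsed into a separate field, a free-text search such as message:"sudo" could be used instead.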

As can be seen in figure 15 above, the number of executed "sudo" commands sorted by host can easily be examined in the Kibana visualization. This might be helpful to see if a host has been seized or is exhibiting malicious behaviour. Both of these visualizations were created using the steps described in section 6.7. The visualizations may not be very useful in themselves but serve as a taste of what Kibana visualizations are capable of. How to visualize and what data to visualize is beyond the scope of this thesis.

Figure 15: This figure displays a pie chart visualization created within Kibana. The pie chart displays the number of "sudo" commands executed on each host within the last two days. The pie chart was created using the steps mentioned in section 6.7 [108]. This is a screenshot taken from the Kibana interface [111].

7.3.4 Alerting and X-Pack

Alerting is a very useful tool for network administrators to have. It allows the network administrators to receive a notice when something is wrong in the network or when certain log messages are received by the system. Implementing the Watcher alerting tool in the ELK Stack requires a paid license for the X-Pack plugin. Since the implementation in this thesis only had access to the basic license, Watcher could not be implemented or evaluated. Installing X-Pack with the basic license does, however, give access to some of its monitoring features. These features can be of great help if the system should be implemented in larger production environments. The information presented by this feature allowed for easy detection of bottlenecks within the system.
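
Although Watcher could not be evaluated within this thesis, the X-Pack documentation [53] describes a watch as the combination of a trigger, an input, a condition and one or more actions. A hypothetical watch that e-mails the administrators when severe events are found could look roughly like the sketch below; the index pattern, field name and e-mail address are assumptions, and the e-mail action additionally requires an e-mail account to be configured in Elasticsearch.

PUT _xpack/watcher/watch/incident_alert
{
  "trigger":   { "schedule": { "interval": "5m" } },
  "input": {
    "search": {
      "request": {
        "indices": [ "logstash-*" ],
        "body": { "query": { "range": { "syslog_severity_code": { "lte": 3 } } } }
      }
    }
  },
  "condition": { "compare": { "ctx.payload.hits.total": { "gt": 0 } } },
  "actions": {
    "notify_admins": {
      "email": { "to": "admin@example.com", "subject": "Severe log events detected" }
    }
  }
}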

Figure 16: The X-Pack monitoring feature for the Elasticsearch nodes. In this figure the performance monitoring functionality of the Elasticsearch cluster is shown. The information seen in this figure can be found in the "Monitoring" tab in the Kibana web interface. This is a screenshot taken from the Kibana interface [111].

Presented in figure 16 above is the X-Pack monitoring feature for the Elasticsearch nodes. This feature allowed the resource usage of the nodes to be displayed in the Kibana "Monitoring" tab. The feature was of great help when debugging the system and following the flow of events while the log messages were being parsed. Logstash has a similar feature for its nodes, which is also presented in Kibana's "Monitoring" tab. This feature also keeps track of important resource usage and load data.

Figure 17: The X-Pack monitoring feature for the Logstash nodes. In this figure the performance monitoring of the Logstash nodes within the system is shown. The information seen in this figure can be found in the "Monitoring" tab in the Kibana web interface. This is a screenshot taken from the Kibana interface [111].

In figure 17 above the very similar feature for the Logstash nodes in the ELK Stack system is presented. Since the Logstash nodes in many cases receive some of the highest loads in the ELK Stack system, this was a good feature to have for scaling purposes. In the implementation process it was also very useful for tracing the path the log messages took through the system and the balancing between the Logstash nodes.

It is possible to see more detailed information about each node's operation within the Kibana "Monitoring" tab. The information available would make a possible scaling of the system a lot easier since it is easy to see where the bottlenecks within the system are. The X-Pack plugin also seemed to have several more interesting and useful features such as machine learning, security and extended monitoring features. However, these functionalities were beyond the scope of the implementation within this thesis.

8 Discussion

In this section the work done in the thesis and its results are discussed. In the first section within this chapter, section 8.1, the research methods and system development methods are discussed and evaluated. In sections 8.2, 8.3 and 8.4 the results from the biggest parts of the thesis are discussed. In section 8.5 the visualization of data and its importance is discussed.

8.1 Research and system development methods

The research methods used within this thesis were the empirical and applied research methods. These methods fall under the qualitative group of research methods, although the empirical research method can also be used with a quantitative approach. The empirical research cycle was described in section 1.5 and its implementation within the thesis was described in section 3.2. I think that empirical research was a good choice of research methodology for this thesis. The use of the empirical research methodology helped to produce results and conclusions that can be verified and that are based on solid facts.

The system development method used within this thesis was the SCRUM framework. The implementation of SCRUM within the thesis was done with the use of one-week sprints. Each subsection within section 6 is the equivalent work of one sprint. I think that the result of using SCRUM turned out well, but in the end I am not sure it was necessary to use SCRUM. Since I was working on the project alone and the system basically needs to be implemented in a certain order, another development method might have been a better choice.

8.2 Logging policy

The logging policy was created together with the stakeholder as a set of rules which the chosen system was required to follow. Today, it is common for larger entities to have several policies governing their network implementation and data storage. Commonly staked out policies are data protection policies and network security policies. Having a logging policy is not as common. Therefore, the purpose of creating a logging policy within this thesis was to raise awareness and show that it comes with some benefits and not only extra work.

Looking back at the creation and use of the logging policy, it benefitted the project a lot. The first benefits presented themselves when staking out the requirements for the centralized logging system. The requirements placed upon the centralized logging system could easily be derived from the created logging policy. The logging policy also made it easier to choose which functionality to implement in the proof of concept system.

Some of the statements in the logging policy were not followed in the proof of concept implementation due to its size. Since the proof of concept implementation was a small-scale implementation, some of the statements in the logging policy were deemed unnecessary, to allow more time to implement the vital functionalities. For example, there was no implementation of automatic log removal or log rotation. The reason for this was that during the implementation process it was deemed valuable to have a bigger data collection; if log rotation had been applied, some of the data would have been disposed of and not available for visualization. Another statement that was not obeyed was the one describing the policy for log storage. The logs should have been encrypted when stored on the Elasticsearch nodes. Since encryption of the hard drives on the servers was not considered a priority for this implementation, this statement was not fulfilled either.

8.3 System selection

There were several systems found that met the requirements set up in section 4.2. Because of this, a selection process was introduced to select one of the systems to implement. The system selection process was based upon several criteria which made a system more attractive for usage in the proof of concept implementation as well as in a full-scale implementation.

The first criterion the centralized logging systems were evaluated against was their community size on Stack Overflow. This criterion was chosen because, as explained in section 3.4, it shows the size of the community surrounding the system. A large community surrounding the system increases the ability to get help with possible problems when implementing the system. The community and the questions asked on Stack Overflow were of great help when setting up the ELK Stack as the proof of concept system. Many of the problems that arose when implementing the ELK Stack system were solved using the questions asked and answered within this community.

The second criterion the systems were evaluated against was the cost of implementation. Since two of the evaluated systems in this thesis were based on open source software, it was decided that the cost of implementation should only focus on the time required to implement the systems. To compensate for the additional licensing cost of Splunk, the open source criterion was introduced instead. The cost of implementation was the hardest criterion to set the ratings for. This is because, to truly be able to set correct ratings for this criterion, each system should have been implemented and the time and difficulty required should have been measured. This approach would simply take too long to complete. Therefore, the ratings were instead based upon the estimated configuration and implementation process. The accuracy of these estimates is basically impossible to discern and depends greatly on the functionalities required to be implemented with each system.

The third criterion was the scalability of the system. This criterion was also very hard to evaluate, yet it is very important since the scaling of the systems matters when the implementation is to be taken from proof of concept to full scale. Since there is no performance comparison available between the systems, it is hard to tell which system scales best. However, as described in section 5.4.3, the scalability is likely to be better in systems where the system tasks are clearly separated as in figure 3.

The fourth and final criterion was the open source bonus. The reasoning for including this criterion was explained earlier. It can be debated whether an open source system with a tedious configuration and implementation process actually reduces the cost compared to an easier-to-implement system with a licensing cost. Once again, to answer this question thoroughly, all systems would probably have had to be implemented. The estimates made throughout the system selection process are however deemed sufficient since all systems meet the requirements.

To sum up the system selection process, it can be concluded that the credibility of some of the criteria would have been greatly increased by implementing all the systems. Unfortunately, this would simply take too much time and the result would be a much less focused implementation and thesis. If all systems had been implemented, the result might even have been that none of them would have been configured or explained properly.

8.4 System implementation

The system implementation and configuration of the proof of concept system of the ELK Stack was quite straightforward. No major setbacks occurred during the system implementation. At the end of the project the system was functioning, processing and storing log information, and was able to visualize data properly. The reason for the smooth configuration and implementation of the system was probably the deep investigation into the functionalities of the system before the implementation process began. Without this knowledge the system implementation process would probably have required much more time.

In general, however, the configuration of the ELK Stack is by no means trivial; it takes more than downloading the software and installing it on three different servers. The configuration process of the ELK Stack is tedious and a complete knowledge of how the ELK Stack system operates is required to be able to set up all its features. This knowledge can be hard to acquire, giving the ELK Stack system a very steep learning curve. The configuration process is extremely flexible and allows ingestion and storage of data in a wide variety of ways, which is one of the things that makes the ELK Stack a great system. Although this in many cases is a good thing, it can also be the ELK Stack's downfall. I think that the configuration process could be improved by learning from Graylog's content packs: a simple way to automatically install the most common parsers and grok filters in Logstash, to extract data from well-known devices and applications into Elasticsearch. Also, having standard dashboards within Kibana for presenting this data would probably be much appreciated by the community.

8.5 Visualization of data

Kibana has the same trouble as the other parts of the ELK Stack: it has a steep learning curve. Although Kibana allows for a great variety in its ways to visualize data, it is very confusing the first time a user comes in contact with the software and its interface. The visualizations, dashboards and even the ELK Stack monitoring features within Kibana are great tools. However, a thorough understanding of how everything works is required to use most of the functionalities efficiently. It would have been a good idea to introduce some more ways to visualize data in Kibana, but the focus of the project is on the centralized log management part of the system. The possibilities of Kibana are vast and could probably do wonders for log analysis in most networks. Therefore, a deeper investigation into how the visualization of log data can be done and which data should be visualized for the greatest benefits is left as a future thesis proposal.

9 Conclusion

The goal of this thesis was to find a suitable centralized log management system to be used in the stakeholder's network. This goal was reached by first setting up a logging policy from which the system requirements were derived. These requirements were then used to filter out the eligible systems for implementation as a proof of concept system. Since there were several systems meeting the requirements, one system had to be selected. The selected centralized log management system was the ELK Stack. The ELK Stack was then implemented as a small-scale proof of concept system within the stakeholder's network.

When creating the logging policy within this thesis it was concluded that the logging policy is a lesser known document. However, this document can provide structure and guidance for the log management within an entity and should not be so easily overlooked. While it may seem like a lot of work for small benefits, it can actually help to reduce the time of the implementation process. The guidelines presented in the logging policy will govern the implementation and analysis of log messages and thereby eliminate many uncertainties that may arise.

When examining the market of log management systems today, the conclusion that can be drawn is that it seems to be in a period of accelerated growth. One of the most important details seen when investigating the most well-known systems was the ability to transform log data. There is an increasing requirement for the systems to be able to collect log data from many different sources and using a wide variety of protocols. All the systems in the top share of the market had this functionality. The parsing of data into a standard format seems to bring advantages in search speed and in functionalities like visualization and correlation of the data.

The selected system, the ELK Stack, meets all the requirements placed upon it, which was proven using the proof of concept implementation. It is also well worth mentioning that there were several more systems which also meet the requirements, but only three of them were discussed in this thesis. The evaluation of the ELK Stack shows that it has a lot to gain from an increase in ease of use while still allowing for its complex configuration. The complex configuration process allows the ELK Stack to be adapted for use in many different scenarios. An increase in the ease of use and implementation of some of the system's functionality would make the system much more attractive. For example, one of the most tedious processes when setting up the ELK Stack is setting up the parsing of log messages. If there is a wide variety of log messages within a network, this process becomes even worse. The creation of grok patterns for parsing log data into standardized fields can take a long time. Setting up a way of installing community-created grok patterns from within Kibana would make a huge difference to the ELK Stack.

What can be concluded from the selection and implementation process within this thesis is that no system is perfect. Each system discussed within this thesis has its own advantages and disadvantages. For a small network with only a couple of devices connected, Splunk or Graylog could be preferred. But when moving up to more complex networks with larger daily ingests of log data, the ELK Stack becomes a much more viable option. This is because the configuration files can in many cases be copied straight off between different nodes within the ELK Stack with only small changes required. Without the licensing cost of Splunk for ingesting these large amounts of data, and with slightly better scaling than Graylog, the ELK Stack is the recommended system to use within the stakeholder's network.

The ability to centralize the storage of log messages presented in this thesis is beneficial in ways such as enabling searches through all the log messages in a network simultaneously. It also takes less effort to analyze the log messages since they are all stored in the same place and not on different devices. However, the conclusion that can be drawn from the results of this thesis is that the true benefits of centralized log management do not come from the centralized storage of log messages. The true benefits come from the ability to parse log messages into a uniform format which can then be visualized or correlated. These functionalities provide a large number of possibilities to detect problems and security issues within a network.

There is a staggering number of ways to visualize data in Kibana. While the instructions to create visualizations of data are trivial to explain and understand, the hard part is to know what data to visualize. Depending on the network requirements and the network administrator's preferences, different visualizations might be useful when investigating or monitoring a network. Because of this, the Kibana visualizations and dashboards will most likely need to differ depending on what network they are applied to. It can be concluded that the real benefits of an implemented ELK Stack system come from what is done with the data rather than how it is collected, parsed and stored. The visualization tool, Kibana, is a remarkable tool which can provide in-depth understanding and correlation of log data, but only if you know what data to visualize and in what ways. Therefore, further investigation into which data is required to be collected within networks and how it can be visualized to provide the greatest benefits is left as a future thesis proposal.

10 Future work

The system selected and implemented within this thesis works as a good centralized log management system in most aspects. However, the conclusions drawn by this thesis indicate that there is more work to be done. There are several ways to improve the solution to a centralized logging system proposed within this thesis.

To begin with, the visualization of log data is an interesting topic. There are many ways to visualize log data using one of many different software tools. One approach might be to investigate which log data is the most interesting for a certain purpose. For example, an investigation into which log data should be visualized for network security purposes, how it should be visualized and what benefits would be gained from the visualization is an interesting topic.

Also touched upon in this thesis is the ability to implement machine learning in the centralized log management system. Using machine learning, the parsing of log messages can possibly become more efficient and in some instances even automatic. It might also be possible to detect breaches within the network or other interesting irregularities. A deeper investigation into how machine learning can be used to detect network anomalies using data from the centralized log management system is also an interesting topic.

Finally, the performance and efficiency of the ELK Stack system has not been tested in this thesis. Testing the ELK Stack and comparing its performance with the other systems would produce some interesting results. These results would tell more about the scalability and performance of the different systems. They would also give a hint about how much computer resources are required when implementing the different systems for different sized networks.

References

[1] Miniwatts Marketing Group, “World Internet Statistics and 2018 World Population Stats”. [Online]. Available: https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/ . [Accessed: 2018-04-29]

[2] Robert Rinnan, Gjøvik University College, “Benefits of Centralized Log File Correlation”, 2005. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.121.8787&rep=rep1&type=pdf. [Accessed: 2018-04-29]

[3] Rapid7, “The Pros and Cons of Open Source Logging”. [Online]. Available: https://blog.rapid7.com/2014/09/05/the-pros-and-cons-of-open-source-logging/. [Accessed: 2014-09-5]

[4] Rapid7, “Considering the Explosive Growth of Log Analytics”. [Online]. Available: https://blog.rapid7.com/2016/02/10/considering-the-explosive-growth-of-log-analytics/. [Accessed: 2018-04-2]

[5] Brad Hale, SolarWinds, “Estimating Log Generation for Security Information Event and Log Management”. [Online]. Available: http://content.solarwinds.com/creative/pdf/Whitepapers/estimating_log_generation_white_paper.pdf. [Accessed: 2018-04-25]

[6] Usama Fayyad, Greggory Piatetsky-Shapiro, Padhraic Smyth, “From Data Mining to Knowledge Discovery in Databases”, 1997-01-8. [Online]. Available: https://www.kdnuggets.com/gpspubs/aimag-kdd-overview-1996-Fayyad.pdf. [Accessed: 2018-04-29]

[7] David, Liedle, Solarwinds Loggly, “Why JSON is the Best Application Log Format”. [Online]. Available: https://www.loggly.com/blog/why-json-is-the-best-application-log-format-and-how-to-switch/. [Accessed: 2015-11-5]

[8] Json.org, “JSON”. [Online]. Available: https://www.json.org/. [Accessed: 2018-05-21]

[9] W3Schools.com, “XML Introduction”. [Online]. Available: https://www.w3schools.com/xml/xml_whatis.asp. [Accessed: 2018-05-21]

[10] Paul Rubens, “SIEM Guide: A Comprehensive View of Security Information and Event Management Tools”. [Online]. https://www.esecurityplanet.com/network-security/security-information-event-management-siem.html. [Accessed: 2017-06-5]

[11] Sveriges Justitiedepartement, “Lag (1960:729) om upphovsrätt till litterära och konstnärliga verk”. [Online]. Available: https://lagen.nu/1960:729. [Accessed: 2018-05-21]

[12] Splunk Inc, “Splunk: Log management, because ninjas are too busy”. [Online]. Available: https://www.splunk.com/en_us/solutions/solution-areas/log-management.html. [Accessed: 2018-04-8]

[13] Sveriges Justitiedepartement, “Personuppgiftslag (1998:204)”. [Online]. Available: http://www.riksdagen.se/sv/dokument-lagar/dokument/svensk-forfattningssamling/personuppgiftslag-1998204_sfs-1998-204. [Accessed: 2018-05-21]

[14] Stuart MacDonald, Nicola Headlam, CLES, “Research Methods Handbook”. [Online]. Available: http://www.cles.org.uk/wp-content/uploads/2011/01/Research-Methods-Handbook.pdf. [Accessed: 2018-04-21]

[15] Anne Håkansson, KTH, “Portal of Research Methods and Methodologies for Research Projects and Degree Projects”. [Online]. Available: https://www.kth.se/social/files/55563b9df27654705999e3d6/Research%20Methods%20-%20Methodologies%281%29.pdf. [Accessed: 2018-05-21]

[16] Wikimedia Commons, “File:Empirical Cycle.svg – Wikimedia Commons”. [Online]. Available: https://commons.wikimedia.org/wiki/File:Empirical_Cycle.svg, [Accessed: 2018-06-13]

[17] Mietus, Dirkje Magrieta, University of Groningen, “Understanding planning for effective decision support”. [Online]. Available: https://www.rug.nl/research/portal/files/3296545/c3.pdf. [Accessed: 2018-06-13]

[18] Karen Kent, Murugiah Souppaya, National Institute of Standards and Technology, “Guide to Computer Security Log Management”, September 2006. [Online]. Available: https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-92.pdf. [Accessed: 2018-04-10]

[19] R. Gerhards, Adiscon GmbH, IETF, “The Syslog Protocol”, March 2009. [Online]. Available: https://tools.ietf.org/html/rfc5424. [Accessed: 2018-04-29]

[20] Peter Matulis, “Centralized logging with rsyslog”, September 2009. [Online]. Available: https://admin.insights.ubuntu.com/wp-content/uploads/Whitepaper-CentralisedLogging-v11.pdf. [Accessed: 2018-04-22]

[21] James Donn, Tim Hartmann, Harvard, “Centralized Logging in a decentralized World”. [Online] Available: https://www.usenix.org/system/files/login/articles/105457-Donn.pdf. [Accessed: 2018-05-01]

[22] Elasticsearch BV, “Logstash: Centralize, Transform & Stash Your Data”. [Online]. Available: https://www.elastic.co/products/logstash [Accessed: 2018-04-8]

[23] Elasticsearch BV, “Elasticsearch: The Heart of the Elastic Stack”. [Online]. Available: https://www.elastic.co/products/elasticsearch. [Accessed: 2018-04-8]

[24] David Beaumont, IBM, “How to explain vertical and horizontal scaling in the cloud – Cloud computing news”, 2014-04-9. [Online]. Available: https://www.ibm.com/blogs/cloud-computing/2014/04/09/explain-vertical-horizontal-scaling-cloud/, [Accessed: 2018-05-23]

[25] Rapid7, “The benefit of having an enterprise logging policy”. [Online]. Available: https://blog.rapid7.com/2016/02/10/considering-the-explosive-growth-of-log-analytics/. [Accessed: 2018-04-27]

[26] Júlia Murínová, Brno, “Application Log Analysis”, 2015. [Online]. Available: https://is.muni.cz/th/374567/fi_m/thesis_murinova.pdf. [Accessed: 2018-04-28]

[27] ScrumGuides.org, “The Scrum Guide”. [Online]. Available: http://scrumguides.org/scrum-guide.html. [Accessed: 2018-04-8]

[28] Marcus P. Zillman, M.S., A.M.H.A , “Academic and Scholar Search Engines and Sources”. [Online]. Available: http://whitepapers.virtualprivatelibrary.net/Scholar.pdf. [Accessed: 2018-04-10]

[29] Stack Overflow, “Stack Overflow – Where Developers Learn, Share, & Build Careers”. [Online]. Available: https://stackoverflow.com/. [Accessed: 2018-04-22]

[30] Splunk Inc, “Splunk: Company overview”. [Online]. Available: https://www.splunk.com/pdfs/company-overview.pdf, [Accessed: 2018-04-20]

[31] Splunk Inc, “Technical Deep Dive Splunk Cloud”. [Online]. Available: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&ved=0ahUKEwitpejBucjaAhWJqaQKHZiFBH0QFghJMAM&url=https%3A%2F%2Fconf.splunk.com%2Fsession%2F2015%2Fconf2015_RBattula_Splunk_Cloud_CloudDeepDive.pdf&usg=AOvVaw1exSne7jT8ljMMvWArdyIx, [Accessed: 2018-04-20]

[32] Splunk Inc, “What data can I Index? – Splunk Documentation”. [Online]. Available: http://docs.splunk.com/Documentation/Splunk/7.0.3/Data/WhatSplunkcanmonitor#Types_of_data_sources, [Accessed: 2018-04-21]

[33] Splunk Inc, “rex – Splunk Documentation”. [Online]. Available: http://docs.splunk.com/Documentation/Splunk/7.0.3/SearchReference/Rex, [Accessed: 2018-04-22]

[34] Splunk Inc, “Configure Splunk forwarding to use the default certificate – Splunk Documentation”. [Online]. Available: http://docs.splunk.com/Documentation/Splunk/7.0.3/Security/ConfigureSplunkforwardingtousethedefaultcertificate, [Accessed: 2018-04-22]

[35] Splunk Inc, “Configure Splunk forwarding to use your own certificates – Splunk Documentation”. [Online]. Available: http://docs.splunk.com/Documentation/Splunk/7.0.3/Security/ConfigureSplunkforwardingtousesignedcertificates, [Accessed: 2018-04-22]

[36] Splunk Blogs, “Encrypting and decrypting fields”. [Online]. Available: https://www.splunk.com/blog/2010/01/25/encrypting-and-decrypting-fields.html, [Accessed: 2018-04-22]

[37] Splunk Inc, “Forwarders | Splunk enterprise features”. [Online]. Available: https://www.splunk.com/en_us/products/splunk-enterprise/features/forwarders.html, [Accessed: 2018-04-22]

[38] Splunk Inc, “Create real-time alerts – Splunk Documentation”. [Online]. Available: http://docs.splunk.com/Documentation/Splunk/7.0.3/Alert/DefineRealTimeAlerts, [Accessed: 2018-04-23]

[39] Splunk Inc, “Notable Events Review – Splunk Documentation”. [Online]. Available: http://docs.splunk.com/Documentation/ITSI/3.0.2/User/NotableEventsReview, [Accessed: 2018-04-23]

[40] Splunk Inc, “Splunk Product Comparison”. [Online]. Available: https://www.splunk.com/en_us/products/features-comparison-chart.html, [Accessed: 2018-04-23

[41] Splunk Inc, “Splunk pricing”. [Online]. Available: http://splunk.force.com/SplunkCloud, [Accessed: 2018-04-23]

[42] Shay Banon, "The Future of Compass & Elasticsearch – the dude abides”, 2010-07-7. [Online]. Available: http://thedudeabides.com/articles/the_future_of_compass, [Accessed: 2018-04-24]

[43] Elasticsearch BV, "ELK Stack: Elasticsearch, Logstash, Kibana | Elastic”. [Online]. Available: https://www.elastic.co/elk-stack, [Accessed: 2018-04-24]

[44] Elasticsearch BV, "Input plugins | Logstash Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/logstash/current/input-plugins.html, [Accessed: 2018-04-24]

[45] Elasticsearch BV, "Date filter plugin | Logstash Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html, [Accessed: 2018-04-24]

[46] Elasticsearch BV, “Filebeat: Lightweight Log Analysis & Elasticsearch | Elastic”. [Online]. Available: https://www.elastic.co/products/beats/filebeat, [Accessed: 2018-05-26]

[47] Elasticsearch BV, "Network Settings | Elasticsearch Reference [6.1] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/reference/6.1/modules-network.html#_transport_and_http_protocols, [Accessed: 2018-04-25]

[48] Elasticsearch BV, "Persistent Queues | Logstash Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/logstash/current/persistent-queues.html, [Accessed: 2018-04-25]

[49] Elasticsearch BV, "Encrypting Communications | X-Pack for the Elastic Stack [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/x-pack/current/encrypting-communications.html, [Accessed: 2018-04-25]

[50] CNSS, "CNSS Policy No. 15, Fact Sheet No. 1, National Policy on the Use of the Advanced Encryption Standard (AES) to Protect National Security Systems and National Security Information”, June 2003. [Online]. Available: https://web.archive.org/web/20101106122007/http://csrc.nist.gov/groups/ST/toolkit/documents/aes/CNSS15FS.pdf

[51] Elasticsearch BV, “How should I encrypt data at rest with Elasticsearch? – Elasticsearch – Discuss the Elastic Stack”. [Online]. Available: https://discuss.elastic.co/t/how-should-i-encrypt-data-at-rest-with-elasticsearch/96

[52] Elasticsearch BV, "Introduction | X-Pack for the Elastic Stack [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/x-pack/current/xpack-introduction.html, [Accessed: 2018-04-25]

[53] Elasticsearch BV, "Getting Started with Watcher | X-Pack for the Elastic Stack [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/x-pack/current/watcher-getting-started.html, [Accessed: 2018-04-25]

[54] Elasticsearch BV, "Alerting on Cluster and Index Events | X-Pack for the Elastic Stack [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/x-pack/current/xpack-alerting.html, [Accessed: 2018-04-25]

[55] Elasticsearch BV, "Basic Concepts | Elasticsearch Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/_basic_concepts.html, [Accessed: 2018-04-26]

[56] Elasticsearch BV, "Sort | Elasticsearch Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html, [Accessed: 2018-04-26]

[57] Elasticsearch BV, "Introducing Machine Learning for the Elastic Stack | Elastic”. [Online]. Available: https://www.elastic.co/blog/introducing-machine-learning-for-the-elastic-stack, [Accessed: 2018-04-26]

[58] Graylog Inc, “The thinking behind the Graylog architecture and why it matters to you – Graylog 2.4.3 documentation”. [Online]. Available: http://docs.graylog.org/en/2.4/pages/ideas_explained.html, [Accessed: 2018-04-28]

[59] Graylog Inc, “Web Interface – Graylog 2.4.3 documentation”. [Online]. Available: http://docs.graylog.org/en/2.4/pages/configuration/web_interface.html, [Accessed: 2018-04-28]

[60] Graylog Inc, “Architectural considerations – Graylog 2.4.3 documentation”. [Online]. Available: http://docs.graylog.org/en/latest/pages/architecture.html, [Accessed: 2018-05-07]

[61] MongoDB Inc, “MongoDB for GIANT Ideas | MongoDB”. [Online]. Available: https://www.mongodb.com/, [Accessed: 2018-05-07]

[62] Graylog Inc, “Frequently asked questions – Graylog 2.4.3 documentation”. [Online]. Available: http://docs.graylog.org/en/latest/pages/faq.html#what-is-mongodb-used-for, [Accessed: 2018-05-07]

[63] Graylog Inc, “Load balancer integration – Graylog 2.4.3 documentation”. [Online]. Available: http://docs.graylog.org/en/latest/pages/configuration/load_balancers.html, [Accessed: 2018-05-07]

[64] Graylog Inc, “Sending in log data – Graylog 2.4.4 documentation”. [Online]. Available: http://docs.graylog.org/en/2.4/pages/sending_data.html#, [Accessed: 2018-05-08]

[65] Elasticsearch BV, “Filebeat: Lightweight Log Analysis & Elasticsearch | Elastic”. [Online]. Available: https://www.elastic.co/products/beats/filebeat, [Accessed: 2018-05-08]

[66] Graylog Inc, “Extractors – Graylog 2.4.4 documentation”. [Online]. Available: http://docs.graylog.org/en/2.4/pages/extractors.html, [Accessed: 2018-05-08]

[67] Graylog Inc, "Graylog Marketplace”. [Online]. Available: https://marketplace.graylog.org/, [Accessed: 2018-05-09]

[68] Graylog Inc, "Using HTTPS – Graylog 2.4.4 documentation”. [Online]. Available: http://docs.graylog.org/en/2.4/pages/configuration/https.html, [Accessed: 2018-05-09]

[69] Graylog Inc, "Securing Graylog – Graylog 2.4.4 documentation”. [Online]. Available: http://docs.graylog.org/en/2.4/pages/securing.html, [Accessed: 2018-05-09]

[70] Graylog Inc, "Using HTTPS – Graylog 2.4.4 documentation”. [Online]. Available: http://docs.graylog.org/en/2.4/pages/configuration/https.html#disable-ciphers-java, [Accessed: 2018-05-09]

[71] Graylog Inc, "Dashboards – Graylog 2.4.4 documentation”. [Online]. Available: http://docs.graylog.org/en/2.4/pages/dashboards.html, [Accessed: 2018-05-09]

[72] Graylog Inc, "Streams – Graylog 2.4.4 documentation”. [Online]. Available: http://docs.graylog.org/en/2.4/pages/streams.html, [Accessed: 2018-05-09]

[73] Graylog Inc, "Alerts – Graylog 2.4.4 documentation”. [Online]. Available: http://docs.graylog.org/en/2.4/pages/streams/alerts.html, [Accessed: 2018-05-09]

[74] Stack Exchange Inc, “Newest ‘elk-stack’ Questions – Stack Overflow”. [Online]. Available: https://stackoverflow.com/questions/tagged/elk-stack, [Accessed: 2018-04-18]

[75] Stack Exchange Inc, “Posts containing ‘elk-stack’ – Stack Overflow”. [Online]. Available: https://stackoverflow.com/search?q=elk-stack, [Accessed: 2018-04-18]

[76] Stack Exchange Inc, “Newest ‘splunk’ Questions – Stack Overflow”. [Online]. Available: https://stackoverflow.com/questions/tagged/splunk, [Accessed: 2018-04-18]

[77] Stack Exchange Inc, “Posts containing ‘splunk’ – Stack Overflow”. [Online]. Available: https://stackoverflow.com/search?q=splunk, [Accessed: 2018-04-18]

[78] Stack Exchange Inc, “Newest ‘graylog’ Questions – Stack Overflow”. [Online]. Available: https://stackoverflow.com/questions/tagged/graylog, [Accessed: 2018-04-18]

[79] Stack Exchange Inc, “Posts containing ‘graylog’ – Stack Overflow”. [Online]. Available: https://stackoverflow.com/search?q=graylog, [Accessed: 2018-04-18]

[80] Splunk Inc, “Getting data in | Splunk”. [Online]. Available: http://dev.splunk.com/view/SP-CAAAE3A#Gettingdatain-Filesanddirectories, [Accessed: 2018-05-11]

[81] Splunk Inc, “Splunk | Customer Support, Professional Services and Educational Resources”. [Online]. Available: https://www.splunk.com/en_us/support-and-services.html, [Accessed: 2018-05-11]

[82] Splunk Inc, “Scale your deployment with Splunk Enterprise components – Splunk Documentation”. [Online]. Available: https://docs.splunk.com/Documentation/Splunk/7.1.0/Deploy/Distributedoverview, [Accessed: 2018-05-22]

[83] Elasticsearch BV, “Disable swapping | Elasticsearch Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration-memory.html, [Accessed: 2018-05-18]

[84] Elasticsearch BV, “How Logstash Works | Logstash Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/logstash/current/pipeline.html, [Accessed: 2018-05-20]

[85] Elasticsearch BV, “Filter plugins | Logstash Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/logstash/current/filter-plugins.html, [Accessed: 2018-05-19]

[86] Elasticsearch BV, “Output plugins | Logstash Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/logstash/current/output-plugins.html, [Accessed: 2018-05-19]

[87] Elasticsearch BV, “Setting Up TLS on a Cluster | X-Pack for the Elastic Stack [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/x-pack/6.2/ssl-tls.html, [Accessed: 2018-05-20]

[88] Elasticsearch BV, “Security Settings in Elasticsearch | Elasticsearch Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/security-settings.html#ssl-tls-settings, [Accessed: 2018-05-20]

[89] Elasticsearch BV, “Setting Up User Authentication | X-Pack for the Elastic Stack [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/x-pack/current/setting-up-authentication.html, [Accessed: 2018-05-20]

[90] Elasticsearch BV, “How Authentication Works | X-Pack for the Elastic Stack [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/x-pack/current/how-authc-works.html, [Accessed: 2018-05-20]

[91] Elasticsearch BV, “certutil | Elasticsearch Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/reference/current/certutil.html, [Accessed: 2018-05-21]

[92] Elasticsearch BV, “setup-passwords | Elasticsearch Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/setup-passwords.html, [Accessed: 2018-05-21]

[93] Elasticsearch BV, “HTTP | Elasticsearch Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-http.html, [Accessed: 2018-05-23]

[94] IETF, “RFC 793 – Transmission Control Protocol” September 1981. [Online]. Available: https://tools.ietf.org/html/rfc793#section-2.6

[95] man7.org, Linux Programmer’s Manual, “signal(7) – Linux manual page”. [Online]. Available: http://man7.org/linux/man-pages/man7/signal.7.html, [Accessed: 2018-05-23]

[96] Elasticsearch BV, “Index Settings | Elasticsearch: the Definitive Guide [2.x] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/elasticsearch/guide/current/_index_settings.html, [Accessed: 2018-05-23]

[97] Fabian Lee, “ELK: Performance of the Logstash Indexing layer – Fabian Lee : Software Architect”, 2016-10-18. [Online]. Available: http://fabianlee.org/2016/10/18/elk-performance-of-the-logstash-indexing-layer/, [Accessed: 2018-05-23]

[98] Elasticsearch BV, “Monitoring the Elastic Stack | X-Pack for the Elastic Stack [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/x-pack/current/xpack-monitoring.html, [Accessed: 2018-05-24]

[99] Momchil Atanasov, Medium, “Killing your Logstash performance with Grok – Momchil Atanasov – Medium”. [Online]. Available: https://medium.com/@momchil.dev/killing-your-logstash-performance-with-grok-f5f23ae47956, [Accessed: 2018-05-24]

[100] Elasticsearch BV, “Deploying and Scaling Logstash | Logstash Reference [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html, [Accessed: 2018-05-24]

[101] Elasticsearch BV, “Just Enough Kafka for the Elastic Stack, Part 1 | Elastic”. [Online]. Available: https://www.elastic.co/blog/just-enough-kafka-for-the-elastic-stack-part1, [Accessed: 2018-05-24]

[102] Elasticsearch BV, “Scaling Elasticsearch, Kibana, Beats and Logstash | Elastic”. [Online]. Available: https://www.elastic.co/blog/small-medium-or-large-scaling-elasticsearch-and-evolving-the-elastic-stack-to-fit, [Accessed: 2018-05-25]

[103] Elasticsearch BV, “Triggers | X-Pack for the Elastic Stack [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/x-pack/current/trigger.html, [Accessed: 2018-05-25]

[104] Elasticsearch BV, “Actions | X-Pack for the Elastic Stack [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/x-pack/current/actions.html, [Accessed: 2018-05-25]

[105] Elasticsearch BV, “Subscriptions – Elastic Stack Products & Support | Elastic”. [Online]. Available: https://www.elastic.co/subscriptions, [Accessed: 2018-05-25]

[106] Sematext Group, “X-Pack Alternatives: Open Source, Commercial and Cloud Services”. [Online]. Available: https://sematext.com/blog/x-pack-alternatives/, [Accessed: 2018-05-25]

[107] Knowi, “Elasticsearch Analytics and Reporting | Knowi”. [Online]. Available: https://www.knowi.com/elasticsearch-analytics, [Accessed: 2018-05-25]

[108] Elasticsearch BV, “Creating a Visualization | Kibana User Guide [6.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/kibana/current/createvis.html, [Accessed: 2018-05-26]

[109] Elasticsearch BV, “Setting Up an Advanced Logstash Pipeline | Logstash Reference [2.3] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/logstash/2.3/advanced-pipeline.html, [Accessed: 2018-06-12]

[110] Elasticsearch BV, “Kibana and Security | X-Pack for the Elastic Stack [5.2] | Elastic”. [Online]. Available: https://www.elastic.co/guide/en/x-pack/5.2/kibana.html, [Accessed: 2018-06-12]

[111] Elasticsearch BV, “Kibana: Explore, Visualize, Discover Data | Elastic”. [Online]. Available: https://www.elastic.co/products/kibana


Appendix A – Logging policy questionnaire

• Log generation

• Which types of hosts must or should perform logging?

• Which host components must or should perform logging (e.g., operating system, service, application)?

• Which types of events must or should each component log (e.g., security events, network connections, authentication attempts)?

• Which data characteristics must or should be logged for each type of event (e.g., usernames, source and destination IP addresses, protocols)?

• How frequently should each type of event be logged?

• Log transmission

• Which types of hosts must or should transfer their logs to a log management infrastructure?

• Which types of entries and data characteristics must or should be transferred from individual hosts to a log management infrastructure?

• How should log data be transferred (e.g., which protocols are permissible)?

• How frequently should log data be transferred from individual hosts to a log management infrastructure?

• Should the confidentiality, integrity and availability of log data be protected while in transit? Should a separate logging network be used?

• Log storage and disposal

• How often should logs be rotated?

• How should the confidentiality, integrity and availability of each log be protected while in storage?

• How long should each type of log be preserved?

• How should unneeded log data be disposed of?

• How much storage space should be available for the log management infrastructure?

• How should log preservation requests, such as a legal requirement to prevent the alteration and destruction of particular log records, be handled?

• Log analysis

• How often should each type of log data be analyzed?

• Who must or should be able to access the log data, and what access should be logged?

• What should be done when suspicious activity or an anomaly is identified?

• How should the confidentiality, integrity and availability of the log analysis results be protected while in storage and in transit?

• How should inadvertent disclosure of sensitive information recorded in logs, such as passwords or the contents of e-mails be handled?


Appendix B – Elasticsearch configuration

# ======================== Elasticsearch Configuration =========================

# Use a descriptive name for your cluster:

cluster.name: EPM-Stack

#

# ------------------------------------ Node ------------------------------------

# Use a descriptive name for the node:

node.name: logstash01

#

# Set if the node should be master-eligible

node.master: true

#

# Set if the node should be a data node

node.data: true

#

# Set if the node should be able to ingest data

node.ingest: true

# ----------------------------------- Paths ------------------------------------

# Path to directory where to store the data (separate multiple locations by comma):

path.data: /var/lib/elasticsearch

#

# Path to log files:

path.logs: /var/log/elasticsearch

# ---------------------------------- Network -----------------------------------

# Set the bind address to a specific IP (IPv4 or IPv6):

network.host: 192.168.99.10

#

# Set a custom port for HTTP:

http.port: 9200-9300

#

# Set a communication port to be used between nodes:

transport.tcp.port: 9300-9400

# --------------------------------- Discovery ----------------------------------

# Pass an initial list of hosts to perform discovery when a new node is started:

# The default list of hosts is ["127.0.0.1", "[::1]"]

discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300", "[::1]", "192.168.99.20:9300"]

#

# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):

discovery.zen.minimum_master_nodes: 1

# ---------------------------------- Gateway -----------------------------------

# Block initial recovery after a full cluster restart until N nodes are started:

gateway.expected_nodes: 2

# ------------------------------- X-Pack Security ------------------------------

# Enable X-Pack Security

xpack.security.enabled: true

#

# Enable SSL/TLS between nodes

xpack.security.transport.ssl.enabled: true

#

# Enable HTTP TLS encryption

xpack.security.http.ssl.enabled: true

#

# Set the verification mode so that node certificates are validated and the DNS name and IP address are checked

xpack.security.transport.ssl.verification_mode: full

#

# Set the path to the private key used to identify this node

xpack.security.transport.ssl.key: /etc/elasticsearch/certs/logstash01.key

xpack.security.http.ssl.key: /etc/elasticsearch/certs/logstash01.key

#

# Set the path to the certificates used to verify this node

xpack.security.transport.ssl.certificate: /etc/elasticsearch/certs/logstash01.crt

xpack.security.http.ssl.certificate: /etc/elasticsearch/certs/logstash01.crt

#

# Set the path to the CA certificate used to validate other nodes' certificates

xpack.security.transport.ssl.certificate_authorities: [ "/etc/elasticsearch/certs/ca/ca.crt" ]

xpack.security.http.ssl.certificate_authorities: [ "/etc/elasticsearch/certs/ca/ca.crt" ]
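
For orientation, the discovery list above points at a second Elasticsearch node on 192.168.99.20. A minimal sketch of the corresponding settings on that node is shown below; the node name "elasticsearch02" and the certificate file names are assumptions chosen for illustration and would be replaced by the values generated for that host.

# Example: elasticsearch.yml on the second node (illustrative sketch, not the exact configuration used)
cluster.name: EPM-Stack
node.name: elasticsearch02
node.master: true
node.data: true
network.host: 192.168.99.20
discovery.zen.ping.unicast.hosts: ["192.168.99.10:9300"]
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: full
xpack.security.transport.ssl.key: /etc/elasticsearch/certs/elasticsearch02.key
xpack.security.transport.ssl.certificate: /etc/elasticsearch/certs/elasticsearch02.crt
xpack.security.transport.ssl.certificate_authorities: [ "/etc/elasticsearch/certs/ca/ca.crt" ]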


Appendix C – Kibana configuration

# ======================== Kibana Configuration =========================

# ------------------------------- Server ---------------------------------
# Kibana is served by a back end server. This setting specifies the port to use.
server.port: 5601

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
server.host: "192.168.99.10"

# The Kibana server's name. This is used for display purposes.
server.name: "logstash01.kibana"

# Enables SSL and paths to the PEM-format SSL certificate and SSL key files, respectively.
# These settings enable SSL for outgoing requests from the Kibana server to the browser.
server.ssl.enabled: true
server.ssl.certificate: /etc/kibana/certs/logstash01.crt
server.ssl.key: /etc/kibana/certs/logstash01.key

# Kibana uses an index in Elasticsearch to store saved searches, visualizations and
# dashboards. Kibana creates a new index if the index doesn't already exist.
kibana.index: ".kibana"

# ------------------------ Elasticsearch Connection --------------------------
# The URL of the Elasticsearch instance to use for all your queries.
elasticsearch.url: "https://192.168.99.10:9200"

# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
elasticsearch.username: "kibana"
elasticsearch.password: "*****"

# Optional settings that provide the paths to the PEM-format SSL certificate and key files.
# These files validate that your Elasticsearch backend uses the same key files.
elasticsearch.ssl.certificate: /etc/kibana/certs/logstash01.crt
elasticsearch.ssl.key: /etc/kibana/certs/logstash01.key

# Optional setting that enables you to specify a path to the PEM file for the certificate
# authority for your Elasticsearch instance.
elasticsearch.ssl.certificateAuthorities: [ "/etc/elasticsearch/certs/ca/ca.crt" ]

# To disregard the validity of SSL certificates, change this setting's value to 'none'.
elasticsearch.ssl.verificationMode: full

# ------------------------ X-Pack integration --------------------------
# Enable X-Pack security
xpack.security.enabled: true

# An encryption key to be used, which must be at least 32 characters
xpack.security.encryptionKey: "something_at_least_32_charactersaaaaaaaaaaaaaaaaaaaaaaaaaaa"

# The session timeout for Kibana usage in the web browser
xpack.security.sessionTimeout: 600000


Appendix D – Logstash configuration

# ======================== Logstash Configuration =========================

# ------------ Node identity ------------
#
# Use a descriptive name for the node:
node.name: logstash.logstash

# ------------ Data path ------------------
#
# Which directory should be used by logstash and its plugins
# for any persistent needs. Defaults to LOGSTASH_HOME/data
path.data: /var/lib/logstash

# ------------ Queuing Settings --------------
#
# Internal queuing model, "memory" for legacy in-memory based queuing and
# "persisted" for disk-based acked queueing. Default is memory
queue.type: persisted

# If using queue.type: persisted, the total capacity of the queue in number of bytes.
# If you would like more unacked events to be buffered in Logstash, you can increase the
# capacity using this setting. Please make sure your disk drive has capacity greater than
# the size specified here. If both max_bytes and max_events are specified, Logstash will pick
# whichever criteria is reached first
# Default is 1024mb or 1gb
queue.max_bytes: 8gb

# ------------ Debugging Settings --------------
#
# Options for log.level:
#   * fatal
#   * error
#   * warn
#   * info (default)
#   * debug
#   * trace
# log.level: trace

# Path to store logs at
path.logs: /var/log/logstash

# ------------ Other Settings --------------
#
# X-Pack monitoring settings to connect to elasticsearch instance with
xpack.monitoring.elasticsearch.username: logstash_system
xpack.monitoring.elasticsearch.password: *****
xpack.monitoring.elasticsearch.url: https://192.168.99.20:9200
xpack.monitoring.elasticsearch.ssl.ca: /etc/logstash/certs/ca/ca.crt
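
The pipeline definitions in Appendices E–G are plain Logstash pipeline configuration files. Assuming the default package layout (an assumption, not part of the configuration above), they would be placed under /etc/logstash/conf.d/ and loaded through an entry in pipelines.yml such as the following sketch:

# pipelines.yml (illustrative; the path assumes the default Debian/RPM package layout)
- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"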


Appendix E – Logstash input configuration

input {
  syslog {
    id => "logstash01.syslog.input"
    port => 5666
    codec => multiline {
      pattern => "\s\s\s\sat"
      what => "previous"
    }
  }
}

input {
  beats {
    id => "logstash01.beats.input"
    port => 5044
    ssl => true
    ssl_certificate_authorities => ["/etc/logstash/certs/ca/ca.crt"]
    ssl_certificate => "/etc/logstash/certs/logstash01.crt"
    ssl_key => "/etc/logstash/certs/logstash01.key"
    ssl_verify_mode => "none"
  }
}
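
For completeness, a monitored host that ships its logs to the Beats input above would point Filebeat at port 5044 and trust the same CA certificate. The sketch below is illustrative only: it assumes Filebeat 6.2 and hypothetical paths on the client host. Because ssl_verify_mode is set to "none" on the Logstash side, the client only needs the CA certificate and does not have to present a certificate of its own.

# filebeat.yml on a monitored host (illustrative sketch)
filebeat.prospectors:
- type: log
  paths:
    - /var/log/*.log

output.logstash:
  hosts: ["192.168.99.10:5044"]
  ssl.certificate_authorities: ["/etc/filebeat/certs/ca/ca.crt"]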


Appendix F – Logstash filter configuration

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "<%{POSINT:priority}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}: %{DATA:command}: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
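
To illustrate what this filter produces, consider a hypothetical syslog line arriving with type "syslog" (the line and all field values below are made up for illustration):

# Input line:
#   <34>May 20 11:22:01 host01 sshd: pam_unix: authentication failure for user root
# Selected fields after the grok, syslog_pri and date filters:
#   priority         => "34"
#   syslog_timestamp => "May 20 11:22:01"   (used by the date filter to set @timestamp)
#   syslog_hostname  => "host01"
#   syslog_program   => "sshd"
#   command          => "pam_unix"
#   syslog_message   => "authentication failure for user root"
# The syslog_pri filter additionally decodes the priority into facility and severity fields,
# and received_at/received_from are added from @timestamp and the sending host.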


Appendix G – Logstash output configuration

output {
  elasticsearch {
    id => "logstash01.elasticsearch.output"
    hosts => ["https://192.168.99.20:9200", "https://192.168.99.10:9200"]
    sniffing => true
    manage_template => false
    index => "main-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
    ssl => true
    ssl_certificate_verification => true
    cacert => "/etc/logstash/certs/ca/ca.crt"
    user => 'logstash_system'
    password => '*****'
  }
}


TRITA-EECS-EX-2018:163

