A Very Brief Introduction to Hadoop - Meetupfiles.meetup.com/18870621/OpenSOC_Platform_v3.pdf ·...

Date post: 23-Jun-2020
A Very Brief Introduction to Hadoop
Transcript
Page 1: A Very Brief Introduction to Hadoop

Page 2: Hadoop – In the wild

© 2015 Cisco and/or its affiliates. All rights reserved. Presentation_ID:TECSEC-3900 Cisco Public

Page 3: Hadoop – What really matters

•  The tool that solves your problem.

Page 4: Tools We Use in OpenSOC

Page 5: Tools We Use in OpenSOC

•  Kafka – a distributed input queue

Page 6: OpenSOC – The Architecture

[Figure: OpenSOC architecture, six stages from source to access]

•  Source Systems – Passive Tap, Traffic Replicator; Telemetry Sources (Syslog, HTTP, File System, Other)
•  Data Collection – PCAP; Flume (Agent A, Agent B, Agent N)
•  Messaging System – Kafka (PCAP Topic, DPI Topic, A Topic, B Topic, N Topic)
•  Real Time Processing – Storm (PCAP Topology, DPI Topology, A Topology, B Topology, N Topology)
•  Storage – Hive (Raw Data, ORC); Elastic Search (Index); HBase (PCAP Table)
•  Access – Analytic Tools (R / Python, Power Pivot, Tableau); Web Services (Search, PCAP Reconstruction)

Page 7: Tools We Use in OpenSOC

•  Kafka – a distributed input queue
•  Flume – Listens for “syslog” style data and directs it into Kafka (or HDFS)

Page 8: Tools We Use in OpenSOC

•  Kafka – a distributed input queue
•  Flume – Listens for “syslog” style data and directs it into Kafka (or HDFS)

[Figure: Syslog, Snort and Other inputs each pass through a Flume Source and a Flume Sink into Kafka.]
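The source-to-sink-to-Kafka flow on this slide can be pictured with a small in-memory toy model. This is only a sketch, not Flume or Kafka code: `ToyAgent`, `ToyTopicStore` and the topic names `syslog` and `snort` are hypothetical names made up for illustration. The one detail taken from Flume itself is that an agent buffers events in a channel between its source and its sink.

```python
from collections import defaultdict, deque


class ToyTopicStore:
    """Stand-in for Kafka: an append-only list of messages per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        self.topics[topic].append(message)


class ToyAgent:
    """Stand-in for one Flume agent: source -> channel -> sink."""

    def __init__(self, topic, store):
        self.topic = topic          # hypothetical topic name for this input
        self.store = store
        self.channel = deque()      # buffers events between source and sink

    def source_receive(self, line):
        # The source accepts a raw event and places it on the channel.
        self.channel.append(line.rstrip("\n"))

    def sink_drain(self):
        # The sink takes buffered events and publishes them to the topic.
        delivered = 0
        while self.channel:
            self.store.publish(self.topic, self.channel.popleft())
            delivered += 1
        return delivered


store = ToyTopicStore()
syslog_agent = ToyAgent("syslog", store)
snort_agent = ToyAgent("snort", store)

syslog_agent.source_receive("<34>Oct 11 22:14:15 host1 su: auth failure\n")
snort_agent.source_receive("[1:1000001:1] example alert\n")
syslog_agent.sink_drain()
snort_agent.sink_drain()
```

Keeping one agent per input type mirrors the figure, where each input has its own source and sink and all of them share Kafka as the destination.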

Page 9: OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram from Page 6, repeated.]

Page 10: Tools We Use in OpenSOC

•  Kafka – a distributed input queue
•  Flume – Listens for “syslog” style data and directs it into Kafka
•  Storm – A stream processor

Page 11: OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram from Page 6, repeated.]

Page 12: Tools We Use in OpenSOC

•  Kafka – a distributed input queue
•  Flume – Listens for “syslog” style data and directs it into Kafka
•  Storm – A stream processor
•  HBase – A distributed, column-oriented NoSQL database on top of HDFS

Page 13: OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram from Page 6, repeated.]

Page 14: Tools We Use in OpenSOC

•  Kafka – a distributed input queue
•  Flume – Listens for “syslog” style data and directs it into Kafka
•  Storm – A stream processor
•  HBase – A distributed, column-oriented NoSQL database on top of HDFS
•  Hive – SQL access to HDFS Data (Java, JDBC, ODBC)

Page 15: OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram from Page 6, repeated.]

Page 16: Tools We Use in OpenSOC

•  Kafka – a distributed input queue
•  Flume – Listens for “syslog” style data and directs it into Kafka
•  Storm – A stream processor
•  HBase – A distributed, column-oriented NoSQL database on top of HDFS
•  Hive – SQL access to HDFS Data
•  Elastic Search – For Indexing PCAP

[Figure: fields in the PCAP index: PCAP_ID, Source IP, Source Port, Dest IP, Dest Port, TS_Micro]
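As a rough illustration of the index fields listed on this slide, the sketch below builds one index document as a plain Python dictionary. The field names come from the slide. Everything else is an assumption: the `PcapIndexEntry` class, the snake_case keys, and in particular the way `pcap_id` is derived, since the slides do not say how PCAP_ID is actually constructed.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PcapIndexEntry:
    """One searchable record describing a captured packet (illustrative only)."""

    source_ip: str
    source_port: int
    dest_ip: str
    dest_port: int
    ts_micro: int  # capture timestamp in microseconds since the epoch

    @property
    def pcap_id(self):
        # Hypothetical key format: the slides do not define how PCAP_ID is built.
        parts = (self.source_ip, self.source_port,
                 self.dest_ip, self.dest_port, self.ts_micro)
        return "-".join(str(p) for p in parts)

    def to_document(self):
        # asdict() returns only the declared fields, so the ID is added by hand.
        doc = asdict(self)
        doc["pcap_id"] = self.pcap_id
        return doc


entry = PcapIndexEntry("10.0.0.5", 51514, "192.0.2.10", 443, 1430000000123456)
doc = entry.to_document()
```

A document like `doc` holds only the small searchable fields. In the architecture diagram the packet data itself sits in the HBase PCAP Table, so an ID of this kind would presumably serve as the link between a search hit and the stored packets.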

Page 17: OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram from Page 6, repeated.]

Page 18: OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram from Page 6, repeated.]

Page 19: OpenSOC – The Architecture

[Figure: the OpenSOC architecture diagram from Page 6, repeated.]

Page 20: Building Your Own

Page 21: Kittens!

-  Pets are given names like Fluffy
-  They are unique and lovingly raised and cared for
-  If they get sick, you take them to the vet and nurse them back to health
-  You hope they’ll live forever

Page 22: Cattle

-  Cattle are given numbers like dn01.
-  They are almost identical to other cattle.
-  When they get sick, you take “normal” measures to cure. Cost/benefit model.
-  Cattle have minimum life expectancy.
-  To serve their purpose, they must be “herded.”

Page 23: Cluster Nodes are Cattle!

Page 24: What are the implications?

Page 25: What are the implications?

•  Most of the “Enterprise Class” Server rules no longer apply!

Page 26: What are the implications?

•  Most of the “Enterprise Class” Server rules no longer apply!
•  Automation is King

Page 27: What are the implications?

•  Most of the “Enterprise Class” Server rules no longer apply!
•  Automation is King
•  Cheaper is usually better

Page 28: What are the implications?

•  Most of the “Enterprise Class” Server rules no longer apply!
•  Automation is King
•  Cheaper is usually better
•  No 3-5 Year refresh cycle

To-Do in 3 Years
1.  Buy new servers to replace the old ones
2.  Re-install software
3.  Transfer Data

Page 29: So what do I need to run OpenSOC?

Page 30: So what do I need to run OpenSOC?

•  A Cluster running Core Hadoop + Hive, HBase, Kafka, Storm and Elastic Search

[Figure: four boxes labelled Elastic Search, with the note “OK, but not required”.]

Page 31: So what do I need to run OpenSOC?

•  A Cluster running Core Hadoop + Hive, HBase, Kafka, Storm and Elastic Search
•  3 Physical Servers for NN, Zookeeper, YARN and “Master” Nodes

Page 32: So what do I need to run OpenSOC?

•  A Cluster running Core Hadoop + Hive, HBase, Kafka, Storm and Elastic Search
•  3 Physical Servers for NN, Zookeeper, YARN and “Master” Nodes
•  Data Nodes: As many as you can
   •  Depending on Data Retention Requirements & Ingestion Rates
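How retention and ingestion rate translate into a data node count can be sketched with simple arithmetic. The function below is a back-of-the-envelope estimate, not a sizing method from the talk. The function name, the parameters and the 25% headroom are assumptions chosen for illustration. The default of three copies matches the HDFS default replication factor. The estimate ignores compression and any disk reserved for Kafka or Elastic Search.

```python
import math


def data_nodes_needed(ingest_gb_per_day, retention_days, usable_tb_per_node,
                      replication=3, headroom=0.25):
    """Estimate the data node count for a retention window (illustrative only).

    replication: copies HDFS keeps of each block (3 is the HDFS default).
    headroom: fraction of each node's disk kept free (assumed value).
    """
    raw_tb = ingest_gb_per_day * retention_days / 1000.0  # data before replication
    stored_tb = raw_tb * replication                       # data after replication
    budget_tb = usable_tb_per_node * (1.0 - headroom)      # disk we allow ourselves to fill
    return math.ceil(stored_tb / budget_tb)


# 500 GB/day kept for 90 days on nodes with 24 TB of usable disk each:
# 45 TB raw -> 135 TB replicated, 18 TB budget per node -> 7.5, rounded up.
nodes = data_nodes_needed(500, 90, 24)
```

With these made-up inputs `nodes` comes out at 8. Doubling either the ingestion rate or the retention period doubles the replicated volume and therefore roughly doubles the node count, which is why the slide ties the two together.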

Page 33: OpenSOC at Cisco (aka MTD)

Hardware footprint (40u):
-  14 Hadoop Data Nodes (UCS C240 M3)
-  3 Cluster Control Nodes (UCS C220 M3)
-  2 ESX Hypervisor Hosts (UCS C220 M3)
-  1 PCAP Processor (UCS C220 M3 + Napatech NIC)
-  2 SourceFire Threat alert processors
-  1 Anue Network Traffic splitter
-  1 Router
-  1 48 Port 10GE Switch

Software Stack
-  HDP 2.2
-  Kafka 0.8.1
-  Elastic Search 1.3.0
-  MySQL 5.5 (Hive Meta & GeoData)

Page 34: OpenSOC at Cisco

[Figure: cluster layout]

-  Ctrl01: Zookeeper, NN1, ES Master
-  Ctrl02: Zookeeper, NN2, ES Master, Nimbus Server/UI, Hive Meta, HBase Master
-  Ctrl03: Zookeeper, YARN / History Server, ES Master, HBase Master StdBy, Flume Agents
-  DataNode (10-14): YARN / HDFS, HBase, Storm Client, Kafka*
-  Elastic Search Nodes (8): 3x Elastic Search

*Dedicated disks

Page 35: Installation Suggestions

•  Kickstart / Autoyast / etc. – Automated builds as much as possible

Page 36: Installation Suggestions

•  Kickstart / Autoyast / etc. – Automated builds as much as possible
•  Hadoop Distro or Roll Your Own

Distribution
•  Fast Setup
•  Vendor Support
•  Can cost more than nodes!
•  Potential Lock-in

Roll your own
•  Longer Startup
•  Experience required
•  Lower Cost
•  More control

Page 37: Installation Suggestions

•  Kickstart / Autoyast / etc. – Automated builds as much as possible
•  Hadoop Distro or Roll Your Own
•  Plus Configuration Management
   •  We’re working on Ansible scripts that we expect to release
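The "cattle" naming scheme lends itself to generated host lists rather than hand-maintained ones. The sketch below produces numbered host names and renders them as an INI-style inventory of the kind Ansible reads. It is not the Ansible work mentioned on the slide: the helper functions, the group names `control` and `datanodes`, and the lowercase `ctrl` prefix are assumptions. Only the `dn01` pattern and the 3 control / 14 data node counts come from the slides.

```python
def numbered_hosts(prefix, count, start=1, width=2):
    """Return zero-padded host names such as dn01, dn02, ..."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]


def render_inventory(groups):
    """Render {group: [hosts]} as INI-style inventory text."""
    lines = []
    for group, hosts in groups.items():
        lines.append(f"[{group}]")
        lines.extend(hosts)
        lines.append("")  # blank line between groups
    return "\n".join(lines).rstrip() + "\n"


# Group names are hypothetical; counts follow the hardware footprint slide.
groups = {
    "control": numbered_hosts("ctrl", 3),
    "datanodes": numbered_hosts("dn", 14),
}
inventory_text = render_inventory(groups)
```

Because the names are derived from a count, adding data nodes means changing one number and regenerating the file, which fits the "Automation is King" point made earlier in the deck.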
