Open Source Security Tools for Big Data

Date post: 15-Apr-2017
Upload: great-wide-open
Page 1: Open Source Security Tools for Big Data


Open Source Security Tools For Big Data
Rommel Garcia
@rommelgarcia
Hortonworks

Page 2: Open Source Security Tools for Big Data


# whoami

• Global Security SME Lead @hortonworks
• Senior Solutions Engineer @hortonworks
• Book Author – Virtualizing Hadoop
• Co-organizer of Atlanta Hadoop User Group
• Regular Speaker at Big Data Conferences

Page 3: Open Source Security Tools for Big Data

Big Data Landscape

Page 4: Open Source Security Tools for Big Data


DATA – More Volume and More Types

[Figure: data volume (gigabytes, terabytes, petabytes, exabytes) plotted against increasing data variety and complexity, with nested tiers labeled ERP, CRM, Web and Big Data.]

Data types shown: user generated content, mobile web, SMS/MMS, sentiment, external demographics, HD video, speech to text, product/service logs, social network, business data feeds, user click stream, web logs, offer history, dynamic pricing, A/B testing, affiliate networks, search marketing, behavioral targeting, dynamic funnels, payment record, support contacts, customer touches, purchase detail, purchase record, segmentation, offer details.

Page 5: Open Source Security Tools for Big Data


Big Data Ecosystem

Big Data Platform
• YARN: Data Operating System, running Script, SQL, NoSQL, Stream, Search, In-Mem and other engines
• HDFS (Hadoop Distributed File System)
• Cross-cutting: Security, Operations, Governance & Integration

Data repositories and traditional sources
• EDW, OLAP datamarts, column databases, CRM, RDBMS
• Lending, markets, trades, compliance data
• Credit card, cash & equity, finance & GL, risk data

Emerging & non-traditional sources
• Server logs, call center, emails, Word documents
• Location data, sensor data, customer sentiment
• Research reports

Analysis & visualization
• Risk modeling, fraud detection, compliance (AML, KYC), Bank 3.0
• Information security, single view of customer, trading applications, market data management

Page 6: Open Source Security Tools for Big Data


Compliance Adherence

• HIPAA - Health Insurance Portability and Accountability Act of 1996
• HITECH - Health Information Technology for Economic and Clinical Health Act
• PCI DSS - Payment Card Industry Data Security Standard
• SOX - Sarbanes-Oxley Act of 2002
• ISO - International Organization for Standardization
• COBIT - Control Objectives for Information and Related Technology
• Corporate Security Policies

Page 7: Open Source Security Tools for Big Data

Big Data Security

Page 8: Open Source Security Tools for Big Data


5 Pillars of Security

• Authentication
• Authorization
• Audit
• Data at rest/in-motion Encryption
• Centralized Administration

Page 9: Open Source Security Tools for Big Data


Big Data Ecosystem

[Figure: the Big Data Ecosystem diagram from Page 5, annotated with where each security component applies.]

1. Knox
2. Kerberos
3. Ranger
4. HDFS Encryption
5. LDAP

Page 10: Open Source Security Tools for Big Data


5 Pillars of Security

• Authentication -> Knox, Kerberos
• Authorization -> Ranger
• Audit -> Ranger
• Data Protection -> HDFS Encryption, Wire Encryption
• Centralized Administration -> Ranger

Page 11: Open Source Security Tools for Big Data


Knox

Page 12: Open Source Security Tools for Big Data


Why Knox?

Simplified Access
• Kerberos encapsulation
• Extends API reach
• Single access point
• Multi-cluster support
• Single SSL certificate

Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to SSH “edge node”

Enterprise Integration
• LDAP integration
• Active Directory integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility

Enhanced Security
• Protect network details
• Partial SSL for non-SSL services
• WebApp vulnerability filter

Page 13: Open Source Security Tools for Big Data


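Knox Deployment with Hadoop Cluster

[Figure: components shown are the application tier, the DMZ, a web tier with a load balancer (LB) in front of Knox, Hadoop CLIs, and the Hadoop cluster racks connected through switches: Rack 1 holds the master nodes (NameNode NN, Secondary NameNode SNN), Racks 2 through N hold slave nodes (DataNodes DN).]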

Page 14: Open Source Security Tools for Big Data


What does Perimeter Security really mean?

[Figure: User → Firewall → Knox Gateway → Firewall → Hadoop services (WebHDFS, Hive host, HBase hosts), with all REST API calls passing through the gateway.]

• Firewall required at the perimeter (today)
• Knox Gateway controls all Hadoop REST API access through the firewall
• Hadoop cluster mostly unaffected
• Firewall only allows connections through specific ports from the Knox host
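To make the single access point concrete, here is a minimal Python sketch of how a client might address WebHDFS through Knox instead of contacting the NameNode directly. It assumes Knox's default gateway port (8443) and gateway path ("gateway"), a topology named "default", and HTTP Basic credentials that Knox checks against LDAP/AD; the host name, user and password are hypothetical. The request is only built here, not sent.

```python
from __future__ import annotations

import base64
from urllib.parse import quote, urlencode
from urllib.request import Request


def knox_webhdfs_request(host: str, hdfs_path: str, op: str, user: str, password: str,
                         topology: str = "default", port: int = 8443,
                         gateway_path: str = "gateway", **params: str) -> Request:
    """Build (but do not send) a WebHDFS call routed through a Knox gateway."""
    if not hdfs_path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, **params})
    # Pattern: https://<gateway>:<port>/<gateway path>/<topology>/webhdfs/v1/<path>?op=...
    url = f"https://{host}:{port}/{gateway_path}/{topology}/webhdfs/v1{quote(hdfs_path)}?{query}"
    # Knox authenticates the caller (here via HTTP Basic); the cluster behind it
    # only ever sees Knox, which talks to the Kerberized services itself.
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"}, method="GET")
```

For example, `knox_webhdfs_request("knox.example.com", "/tmp", "LISTSTATUS", "guest", "guest-password").full_url` gives `https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS`. The client only needs the gateway URL and its directory credentials, which is why the firewall can be limited to traffic from the Knox host.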

Page 15: Open Source Security Tools for Big Data


Kerberos

Page 16: Open Source Security Tools for Big Data


Why Kerberos?

Provides Strong Authentication

Establishes identity for users, services and hosts

Prevents impersonation of accounts by unauthorized users

Supports token delegation model

Works with existing directory services

Basis for Authorization
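The identities mentioned above are Kerberos principals of the form primary[/instance]@REALM. As an illustration (not Hadoop's actual parser), the Python sketch below splits a principal such as nn/nn1.example.com@EXAMPLE.COM into its parts and applies a simplified version of Hadoop's DEFAULT auth_to_local rule, which maps a principal in the default realm to its first component, to get the short name used for authorization. The realm and host names are hypothetical, and Kerberos escaping rules are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Principal:
    """A Kerberos principal of the form primary[/instance]@REALM."""
    primary: str
    instance: Optional[str]  # typically the host name for service principals
    realm: str

    @classmethod
    def parse(cls, text: str) -> Principal:
        name, sep, realm = text.rpartition("@")
        if not sep or not name or not realm:
            raise ValueError(f"expected primary[/instance]@REALM, got {text!r}")
        primary, _, instance = name.partition("/")
        if not primary:
            raise ValueError(f"missing primary component in {text!r}")
        return cls(primary, instance or None, realm)

    def short_name(self, default_realm: str) -> str:
        # Simplified DEFAULT auth_to_local rule: principals in the default realm
        # map to their first component; other realms would need explicit rules.
        if self.realm != default_realm:
            raise ValueError(f"no auth_to_local rule for realm {self.realm}")
        return self.primary
```

So a user (alice@EXAMPLE.COM), a service on a specific host (nn/nn1.example.com@EXAMPLE.COM) and other hosts' services all get a distinct, verifiable identity, and the short names ("alice", "nn") are what authorization decisions are made against.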


Page 17: Open Source Security Tools for Big Data


Don’t be afraid of Kerberos…

Page 18: Open Source Security Tools for Big Data


Security Implications

$ whoami
baduser
$ hadoop fs -ls /tmp
Found 2 items
drwx-wx-wx   - ambari-qa hdfs          0 2015-07-14 18:38 /tmp/hive
drwx------   - hdfs      hdfs          0 2015-07-14 20:33 /tmp/secure
$ hadoop fs -ls /tmp/secure
ls: Permission denied: user=baduser, access=READ_EXECUTE, inode="/tmp/secure":hdfs:hdfs:drwx------

Good, right?

Page 19: Open Source Security Tools for Big Data


Security Implications


Good, right? Look again!!!

$ HADOOP_USER_NAME=hdfs hadoop fs -ls /tmp/secure
Found 1 items
drwxr-xr-x   - hdfs hdfs          0 2015-07-14 20:35 /tmp/secure/blah
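The trick works because, without Kerberos (hadoop.security.authentication set to simple), the client asserts its own identity: Hadoop takes the user name from the HADOOP_USER_NAME environment variable if it is set, otherwise from the OS login, and the NameNode believes it. The toy Python model below (not Hadoop code; the function and its arguments are hypothetical) contrasts that with Kerberos mode, where the identity comes from the ticket obtained with kinit and the variable plays no part in authentication.

```python
from __future__ import annotations

from typing import Mapping, Optional


def effective_hadoop_user(auth_mode: str, env: Mapping[str, str], os_user: str,
                          kerberos_principal: Optional[str] = None) -> str:
    """Toy model of which identity the NameNode ends up trusting."""
    if auth_mode == "simple":
        # Simple auth: whatever the client claims is accepted, so setting
        # an environment variable is enough to become "hdfs".
        return env.get("HADOOP_USER_NAME", os_user)
    if auth_mode == "kerberos":
        # Kerberos: identity comes from the authenticated ticket;
        # HADOOP_USER_NAME is not consulted.
        if kerberos_principal is None:
            raise PermissionError("no valid Kerberos credentials (run kinit)")
        # Simplified short-name mapping: primary component of the principal.
        return kerberos_principal.split("@", 1)[0].split("/", 1)[0]
    raise ValueError(f"unknown authentication mode: {auth_mode!r}")
```

With env={"HADOOP_USER_NAME": "hdfs"}, simple mode returns "hdfs" for baduser, while Kerberos mode still returns "baduser", because that is who the ticket says they are.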

Page 20: Open Source Security Tools for Big Data


Kerberos Primer

Participants: Client (with its Kerberos ticket cache), KDC, NameNode (NN), DataNode (DN)

1. kinit - Log in and get a Ticket Granting Ticket (TGT) from the KDC
2. Client stores the TGT in its ticket cache
3. Get a NameNode Service Ticket (NN-ST) from the KDC by presenting the TGT
4. Client stores the NN-ST in its ticket cache
5. Read/write a file given the NN-ST and file name; if access is permitted, the NameNode returns block locations, block IDs and Block Access Tokens
6. Read/write a block given the Block Access Token and block ID
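The six steps can be traced in a deliberately simplified Python model. It is not Kerberos: HMAC-SHA256 tags stand in for ticket encryption, and there are no timestamps, session keys or authenticators. All class names, principals and passwords are hypothetical. It does keep the shape of the flow: the TGT and NameNode service ticket live in the client's ticket cache, the NameNode checks permissions and issues Block Access Tokens, and DataNodes verify those tokens with a key shared with the NameNode instead of talking to the KDC.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
from dataclasses import dataclass


def _mac(key: bytes, *parts: str) -> str:
    # HMAC-SHA256 stands in for the encryption real Kerberos applies to tickets.
    return hmac.new(key, "|".join(parts).encode(), hashlib.sha256).hexdigest()


@dataclass(frozen=True)
class Ticket:
    client: str
    service: str
    mac: str


class KDC:
    def __init__(self) -> None:
        self._tgs_key = secrets.token_bytes(32)
        self._passwords: dict[str, str] = {}
        self._service_keys: dict[str, bytes] = {}

    def add_user(self, principal: str, password: str) -> None:
        self._passwords[principal] = password

    def add_service(self, principal: str) -> bytes:
        # The returned key plays the role of the service's keytab entry.
        self._service_keys[principal] = secrets.token_bytes(32)
        return self._service_keys[principal]

    def kinit(self, principal: str, password: str) -> Ticket:
        # Step 1: authenticate the user and issue a Ticket Granting Ticket.
        if self._passwords.get(principal) != password:
            raise PermissionError("pre-authentication failed")
        return Ticket(principal, "krbtgt", _mac(self._tgs_key, principal, "krbtgt"))

    def service_ticket(self, tgt: Ticket, service: str) -> Ticket:
        # Step 3: exchange a valid TGT for a service ticket.
        if not hmac.compare_digest(tgt.mac, _mac(self._tgs_key, tgt.client, "krbtgt")):
            raise PermissionError("invalid TGT")
        key = self._service_keys[service]
        return Ticket(tgt.client, service, _mac(key, tgt.client, service))


class NameNode:
    def __init__(self, principal: str, key: bytes, block_key: bytes,
                 files: dict[str, list[str]], readers: dict[str, set[str]]) -> None:
        self.principal, self._key, self._block_key = principal, key, block_key
        self._files, self._readers = files, readers

    def open(self, st: Ticket, path: str) -> list[tuple[str, str]]:
        # Step 5: verify the service ticket, check permission, issue block tokens.
        valid = st.service == self.principal and hmac.compare_digest(
            st.mac, _mac(self._key, st.client, st.service))
        if not valid:
            raise PermissionError("invalid service ticket")
        if st.client not in self._readers.get(path, set()):
            raise PermissionError(f"{st.client} may not read {path}")
        return [(b, _mac(self._block_key, st.client, b)) for b in self._files[path]]


class DataNode:
    def __init__(self, block_key: bytes, blocks: dict[str, bytes]) -> None:
        self._block_key, self._blocks = block_key, blocks

    def read_block(self, client: str, block_id: str, token: str) -> bytes:
        # Step 6: the DataNode checks only the Block Access Token.
        if not hmac.compare_digest(token, _mac(self._block_key, client, block_id)):
            raise PermissionError("invalid block access token")
        return self._blocks[block_id]


class Client:
    def __init__(self, principal: str) -> None:
        self.principal = principal
        self.ticket_cache: dict[str, Ticket] = {}  # steps 2 and 4 store tickets here

    def read(self, kdc: KDC, nn: NameNode, dn: DataNode, path: str, password: str) -> bytes:
        if "krbtgt" not in self.ticket_cache:
            self.ticket_cache["krbtgt"] = kdc.kinit(self.principal, password)
        if nn.principal not in self.ticket_cache:
            tgt = self.ticket_cache["krbtgt"]
            self.ticket_cache[nn.principal] = kdc.service_ticket(tgt, nn.principal)
        located = nn.open(self.ticket_cache[nn.principal], path)
        return b"".join(dn.read_block(self.principal, b, tok) for b, tok in located)
```

After registering a user and a NameNode service principal with the KDC, Client.read() on a permitted file returns the concatenated blocks, and a second read reuses both cached tickets. A user without read permission gets PermissionError from the NameNode before any DataNode is contacted.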

Page 21: Open Source Security Tools for Big Data


Ranger

Page 22: Open Source Security Tools for Big Data


Apache Ranger authZ Architecture

[Figure: Ranger plugins embedded in HDFS, Hive, HBase, YARN, Knox, Storm, Solr and Kafka; a central Ranger component with a Policy Server, Audit Server, Administration Portal and REST APIs backed by a DB; audit destinations DB, Solr and HDFS (with Log4j); KMS; user/group sync from LDAP/AD.]

Page 23: Open Source Security Tools for Big Data


Sample Simplified Workflow - HDFS

1. Admin sets policies for HDFS files/folders in the Policy Manager
2. Users reach HDFS through the NameNode in one of three ways: a data scientist runs a MapReduce job, users access HDFS data through an application, or IT users access HDFS through the CLI
3. The NameNode uses the Ranger plugin for authorization
4. The NameNode provides resource access to the user/client
5. Audit logs are pushed to the audit database
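The authorization step, where the NameNode asks the plugin, can be modeled in a few lines of Python. This is a toy, not Ranger's policy engine: a policy here has only a path, a recursive flag, users/groups and access types, and there are no deny or exclude conditions. When no policy matches, the decision falls through to a caller-supplied check standing in for native HDFS permissions (the Ranger HDFS plugin can be configured to fall back this way). Every decision is appended to an in-memory audit log, mirroring the audit step. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Policy:
    name: str
    path: str
    recursive: bool
    users: frozenset[str] = frozenset()
    groups: frozenset[str] = frozenset()
    accesses: frozenset[str] = frozenset()

    def covers(self, path: str) -> bool:
        # Exact match, or anything below the policy path when recursive.
        if path == self.path:
            return True
        return self.recursive and path.startswith(self.path.rstrip("/") + "/")

    def grants(self, user: str, groups: frozenset[str], path: str, access: str) -> bool:
        return (self.covers(path) and access in self.accesses
                and (user in self.users or bool(groups & self.groups)))


def _deny_all(user: str, path: str, access: str) -> bool:
    return False


@dataclass
class ToyHdfsPlugin:
    policies: list[Policy]
    # Stand-in for native HDFS permission checks when no policy matches.
    fallback: Callable[[str, str, str], bool] = _deny_all
    audit_log: list[dict[str, str]] = field(default_factory=list)

    def is_access_allowed(self, user: str, groups: frozenset[str], path: str, access: str) -> bool:
        matched = next((p.name for p in self.policies if p.grants(user, groups, path, access)), None)
        allowed = matched is not None or self.fallback(user, path, access)
        # Audit every decision, allowed or denied.
        self.audit_log.append({
            "user": user, "path": path, "access": access,
            "result": "ALLOWED" if allowed else "DENIED",
            "enforcer": "ranger" if matched else "hdfs-fallback",
            "policy": matched or "",
        })
        return allowed
```

For instance, a recursive read/execute policy on /data/finance for the group analysts lets an analyst read /data/finance/q1.csv, denies writes, and leaves one audit record per check.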

Page 24: Open Source Security Tools for Big Data


Ranger Stacks

• Apache Ranger v0.5 supports a stack model that makes it easier to onboard new components, without requiring code changes in Apache Ranger.

Ranger side changes: Define Service-type
• Create a JSON file with the following details:
  - Resources
  - Access types
  - Config to connect
• Load the JSON into Ranger.

Secured component side changes: Develop Ranger Authorization Plugin
• Include the plugin library in the secured component.
• During initialization of the service: initialize the RangerBasePlugin and RangerDefaultAuditHandler classes.
• To authorize access to a resource: call RangerBasePlugin.isAccessAllowed() with a RangerAccessRequest.
• To support resource lookup: implement RangerBaseService.lookupResource() and RangerBaseService.validateConfig().

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741207
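As an illustration of the JSON service-type definition, the Python sketch below assembles one for a hypothetical component called "myservice". The field names (name, implClass, resources, accessTypes, configs, itemId, level, parent) follow the shape of the service definitions bundled with Ranger, but only a minimal subset is shown and the class com.example.ranger.MyServiceService is invented; compare against the definitions shipped with your Ranger version before loading anything.

```python
from __future__ import annotations

import json


def make_service_def(name: str, impl_class: str, resources: list[str],
                     access_types: list[str], configs: list[str]) -> dict:
    """Build a minimal Ranger-style service-type definition (illustrative subset)."""
    return {
        "name": name,
        "label": name,
        "description": f"Service definition for {name}",
        "implClass": impl_class,  # the RangerBaseService subclass doing lookups
        # Resources in hierarchy order (e.g. database, then table).
        "resources": [
            {"itemId": i, "name": r, "type": "string", "level": 10 * i,
             "parent": resources[i - 2] if i > 1 else "",
             "mandatory": True, "lookupSupported": True, "label": r.capitalize()}
            for i, r in enumerate(resources, start=1)
        ],
        # Access types the plugin will ask about in its authorization requests.
        "accessTypes": [
            {"itemId": i, "name": a, "label": a.capitalize()}
            for i, a in enumerate(access_types, start=1)
        ],
        # Connection settings used by validateConfig()/lookupResource().
        "configs": [
            {"itemId": i, "name": c, "type": "password" if c == "password" else "string",
             "mandatory": True, "label": c}
            for i, c in enumerate(configs, start=1)
        ],
    }


if __name__ == "__main__":
    service_def = make_service_def(
        "myservice", "com.example.ranger.MyServiceService",
        resources=["database", "table"],
        access_types=["read", "write"],
        configs=["username", "password", "myservice.url"],
    )
    print(json.dumps(service_def, indent=2))
```

The printed JSON is the file that would then be loaded into Ranger; the component's plugin only has to speak in terms of these resource and access-type names.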

Page 25: Open Source Security Tools for Big Data


HDFS Encryption

Page 26: Open Source Security Tools for Big Data


Data Protection

Hadoop allows you to apply data protection policy at two different layers across the Hadoop stack:

• Storage. What: encrypt data on disk. How: volume level with LUKS (Linux) or BitLocker (Windows); OS level encryption; native in Hadoop with HDFS Encryption; partners: Voltage, Protegrity, DataGuise, Vormetric.
• Transmission. What: encrypt data as it moves. How: native in Hadoop with SSL & SASL (AES 256 for SSL and for the Data Transfer Protocol (DTP) with SASL).

Page 27: Open Source Security Tools for Big Data


Data at Rest Encryption Protection

• Volume Level Encryption (Open Source - LUKS, dm-crypt)
• OS File Level Encryption (Open Source - eCryptfs)
• Hadoop Level Encryption (HDFS TDE*, Hive CLE**, HBase**)

Page 28: Open Source Security Tools for Big Data


HDFS Encryption – How it works

[Figure: applications on YARN read and write through the HDFS client, whose crypto stream encrypts and decrypts with the DEK. In HDFS (HDFS-6134) the NameNode records the encryption zone (attributes: EZ key ID, version) and each encrypted file's attributes (EDEK, IV). The Key Management System (KMS, HADOOP-10433), reached through the KeyProvider API (HADOOP-10141), holds the EZ keys; EDEKs pass between the NameNode, the client and the KMS, and the KMS returns the DEK to the client.]

Acronym - Description
• EZ - Encryption Zone (an HDFS directory)
• EZK - Encryption Zone Key; master key associated with all files in an EZ
• DEK - Data Encryption Key; unique key associated with each file. The KMS generates each DEK and encrypts it with the EZ key
• EDEK - Encrypted DEK; the NameNode only has access to the encrypted DEK
• IV - Initialization Vector
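The key hierarchy above is a form of envelope encryption. The Python sketch below walks through it with toy components (hypothetical names, not Hadoop's API): the KMS alone holds EZ keys, the NameNode stores only the EDEK and IV per file, and the client asks the KMS to decrypt the EDEK before encrypting or decrypting the data stream with the DEK. HDFS uses AES-CTR; to stay within the Python standard library the sketch substitutes a SHA-256 counter-mode keystream, which is for illustration only and must not be used to protect real data.

```python
from __future__ import annotations

import hashlib
import secrets
from dataclasses import dataclass


def toy_ctr(key: bytes, iv: bytes, data: bytes) -> bytes:
    """TOY stand-in for AES-CTR (NOT secure): XOR with SHA-256(key | iv | counter)."""
    out = bytearray()
    for counter in range((len(data) + 31) // 32):
        pad = hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        chunk = data[counter * 32:(counter + 1) * 32]
        out += bytes(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


@dataclass(frozen=True)
class FileEncryptionInfo:
    ez_key_name: str
    ez_key_version: int
    edek: bytes  # the DEK encrypted under the EZ key
    iv: bytes


class ToyKMS:
    """The only component that ever holds EZ keys."""

    def __init__(self) -> None:
        self._ez_keys: dict[str, list[bytes]] = {}

    def create_key(self, name: str) -> None:  # cf. `hadoop key create`
        self._ez_keys.setdefault(name, []).append(secrets.token_bytes(32))

    def generate_encrypted_key(self, name: str) -> FileEncryptionInfo:
        version = len(self._ez_keys[name]) - 1
        dek, iv = secrets.token_bytes(32), secrets.token_bytes(16)
        edek = toy_ctr(self._ez_keys[name][version], iv, dek)
        return FileEncryptionInfo(name, version, edek, iv)

    def decrypt_encrypted_key(self, info: FileEncryptionInfo) -> bytes:
        ez_key = self._ez_keys[info.ez_key_name][info.ez_key_version]
        return toy_ctr(ez_key, info.iv, info.edek)


class ToyNameNode:
    """Stores zone -> EZ key name and per-file EDEK/IV, never a DEK."""

    def __init__(self, kms: ToyKMS) -> None:
        self._kms = kms
        self.zones: dict[str, str] = {}
        self.files: dict[str, FileEncryptionInfo] = {}

    def create_zone(self, path: str, key_name: str) -> None:  # cf. `hdfs crypto -createZone`
        self.zones[path.rstrip("/")] = key_name

    def create_file(self, path: str) -> FileEncryptionInfo:
        zone = next((z for z in self.zones if path.startswith(z + "/")), None)
        if zone is None:
            raise ValueError(f"{path} is not inside an encryption zone")
        self.files[path] = self._kms.generate_encrypted_key(self.zones[zone])
        return self.files[path]


def client_write(nn: ToyNameNode, kms: ToyKMS, datanodes: dict[str, bytes],
                 path: str, data: bytes) -> None:
    info = nn.create_file(path)
    dek = kms.decrypt_encrypted_key(info)  # the client, not the NameNode, unwraps the EDEK
    datanodes[path] = toy_ctr(dek, info.iv, data)  # DataNodes only ever see ciphertext


def client_read(nn: ToyNameNode, kms: ToyKMS, datanodes: dict[str, bytes], path: str) -> bytes:
    info = nn.files[path]
    dek = kms.decrypt_encrypted_key(info)
    return toy_ctr(dek, info.iv, datanodes[path])
```

Writing b"salary data" to /secure1/vinay.txt after creating key1 and a zone on /secure1 leaves only ciphertext in the DataNode map and only the EDEK on the NameNode, while client_read returns the original bytes. This is why reads and writes look unchanged to an authorized end user.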

Page 29: Open Source Security Tools for Big Data


HDFS Encryption – Common Commands

As HDFS Admin
• Run the KMS server: ./kms.sh run
• Create an encryption key: hadoop key create key1 -size 128
  (Key size can be 128, 192 or 256; 256 requires the unlimited strength JCE policy files.)
• List all encryption keys: hadoop key list -metadata
• As an admin (hdfs user), create an encryption zone pointing to an existing, empty directory: hdfs crypto -createZone -keyName key1 -path /secure1
• List all encryption zones: hdfs crypto -listZones

As HDFS End-user (run these as a user not in the HDFS admin role)
• Read/write to HDFS unchanged:
  hdfs dfs -copyFromLocal /tmp/vinay.txt /secure1
  hdfs dfs -cat /securehive/sal.txt
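The admin steps can also be scripted. The Python sketch below only assembles and prints the key-creation and zone commands listed above, plus an hdfs dfs -mkdir -p to make sure the zone directory exists. With dry_run=False it would execute them via subprocess, which assumes the Hadoop CLIs are on the PATH and the caller has the admin rights described above. The helper names are hypothetical, and -createZone still fails if the directory already contains files.

```python
from __future__ import annotations

import shlex
import subprocess

VALID_KEY_SIZES = (128, 192, 256)  # 256 needs the unlimited strength JCE policy files


def encryption_zone_commands(key_name: str, zone_path: str, key_size: int = 128) -> list[list[str]]:
    """Admin steps for creating an encryption zone, in order, as argv lists."""
    if key_size not in VALID_KEY_SIZES:
        raise ValueError(f"key size must be one of {VALID_KEY_SIZES}, got {key_size}")
    return [
        ["hadoop", "key", "create", key_name, "-size", str(key_size)],
        ["hdfs", "dfs", "-mkdir", "-p", zone_path],  # the zone directory must exist and be empty
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_path],
        ["hdfs", "crypto", "-listZones"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print each command; execute it only when dry_run is False."""
    shown = []
    for argv in commands:
        line = shlex.join(argv)
        print(line)
        shown.append(line)
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return shown


if __name__ == "__main__":
    run(encryption_zone_commands("key1", "/secure1"))
```

Keeping the commands as argv lists avoids shell-quoting surprises, and the dry run makes it easy to review exactly what will be executed as the hdfs user.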

Page 30: Open Source Security Tools for Big Data


Encrypting Data In-Motion

Protocol / Communication Point / Encryption Mechanism
• REST: WebHDFS (client to cluster) and client to Knox. Mechanism: REST over SSL; Knox Gateway SSL; SPNEGO, which provides a mechanism for extending Kerberos to web applications through the standard HTTP protocol.
• HTTP: NameNode/JobTracker UI and MapReduce shuffle. Mechanism: HTTPS; encrypted MapReduce shuffle (MAPREDUCE-4117).
• RPC: Hadoop client (client to cluster, intra-cluster). Mechanism: SASL; the Hadoop RPC system implements SASL, which provides different qualities of protection (QoP), including encryption.
• JDBC/ODBC: HiveServer2. Mechanism: SSL.
• TCP/IP: data transfer (client to cluster, intra-cluster). Mechanism: the encrypted Data Transfer Protocol available in Hadoop, plus SASL support for the DataTransferProtocol.
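As a rough guide, the Python sketch below writes Hadoop-style configuration snippets that typically turn these mechanisms on: hadoop.rpc.protection=privacy for SASL-encrypted RPC, dfs.encrypt.data.transfer=true for the encrypted Data Transfer Protocol, dfs.http.policy=HTTPS_ONLY for the web UIs and WebHDFS, mapreduce.shuffle.ssl.enabled=true for the shuffle, and hive.server2.use.SSL=true for HiveServer2. The SSL options also need keystore/truststore settings that are not shown, and the exact set varies by distribution and version. SASL_QOP shows how the RPC protection levels correspond to SASL quality-of-protection values.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Properties commonly used to enable in-motion encryption (keystore settings omitted).
IN_MOTION_SETTINGS = {
    "core-site.xml": {"hadoop.rpc.protection": "privacy"},          # RPC via SASL
    "hdfs-site.xml": {"dfs.encrypt.data.transfer": "true",         # Data Transfer Protocol
                      "dfs.http.policy": "HTTPS_ONLY"},            # web UIs and WebHDFS
    "mapred-site.xml": {"mapreduce.shuffle.ssl.enabled": "true"},  # encrypted shuffle
    "hive-site.xml": {"hive.server2.use.SSL": "true"},             # JDBC/ODBC to HiveServer2
}

# hadoop.rpc.protection value -> SASL quality of protection (QoP).
SASL_QOP = {"authentication": "auth", "integrity": "auth-int", "privacy": "auth-conf"}


def to_hadoop_xml(props: dict[str, str]) -> str:
    """Render properties in the <configuration><property> layout Hadoop reads."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for filename, props in IN_MOTION_SETTINGS.items():
        print(f"<!-- {filename} -->")
        print(to_hadoop_xml(props))
```

"privacy" is the level that adds encryption on top of authentication and integrity checks, so it maps to the SASL auth-conf QoP.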

Page 31: Open Source Security Tools for Big Data

Real-world Implementation

Page 32: Open Source Security Tools for Big Data

Data Sources

Page 33: Open Source Security Tools for Big Data


Thank You!

