Hadoop Operations - Best practices from the field

Posted on 17-Jul-2015

Did someone just order Hadoop?

- Best practices from the field

uweseiler

About me

Big Data Nerd

Travelpirate

Photography Enthusiast

Hadoop Trainer

NoSQL Fan Boy

About us

specializes in...

Big Data Nerds Agile Ninjas Continuous Delivery Gurus

Enterprise Java Specialists Performance Geeks

Join us!

Agenda

• Basics
  • Software I
  • Architecture & Rack Design
  • Hardware & Cluster Sizing
  • Software II
• Advanced
  • Data Ingestion
  • Operation & Monitoring
  • Security

Deployment Options

From bare metal to the cloud:

• On-Premise Hadoop
• Hadoop Appliance
• Hadoop Hosting
• Hadoop as a Service

Hadoop Distributions

Cloudera vs. Hortonworks

Guess what:

Both will do the job!

Cloudera vs. Hortonworks

Which ideology do you prefer?

“Closed” Source vs. Open Source

Cloudera vs. Hortonworks

Pricing model:

Software + Support vs. Support only

Agenda

• Basics
  • Software I
  • Architecture & Rack Design
  • Hardware & Cluster Sizing
  • Software II
• Advanced
  • Data Ingestion
  • Operation & Monitoring
  • Security

Create the Big Picture

A platform for data exploration, ETL, visualization, and data warehousing

Pick your Hadoop Stack

Data Storage: HDFS, NFS Gateway
Data Processing: YARN, MapReduce, Tez, Spark, Pig, Hive
Search: Solr
Data Ingestion & Governance: Sqoop, Falcon
Workflow Mgmt.: Oozie
Security: Knox, Ranger
Monitoring: Ganglia, Nagios
Cluster Management Services: Ambari, ZooKeeper, Journal Nodes
Metadata stores: MySQL (several instances)

Rack Design (without HA)

Rack 1:
• 5 x Master Nodes (NameNode, SecondaryNameNode, ResourceManager, Mgmt. Server, Gateway Server)
• 5 x Worker Nodes
• 1 x ToR Switch (Nexus 3K)
• 1 x Mgmt. Network (Cisco Catalyst 2960)

Rack 2:
• 6 x Worker Nodes
• 1 x ToR Switch (Nexus 3K)
• 1 x Mgmt. Network (Cisco Catalyst 2960)

Rack Design (with HA)

Rack 1:
• 4 x Master Nodes (NameNode (Active), ResourceManager (Active), Mgmt. Server, Gateway Server)
• 5 x Worker Nodes
• 1 x ToR Switch (Nexus 3K)
• 1 x Mgmt. Network (Cisco Catalyst 2960)

Rack 2:
• 2 x Standby HA Nodes (NameNode (Passive), ResourceManager (Passive))
• 6 x Worker Nodes
• 1 x ToR Switch (Nexus 3K)
• 1 x Mgmt. Network (Cisco Catalyst 2960)

Service Mapping

Worker Nodes:
• HDFS DataNode, YARN NodeManager, Hadoop Client Libraries

HDFS NameNode (Active):
• NameNode (Active), ZooKeeper Server, Journal Node, Hadoop Client Libraries

HDFS NameNode (Passive):
• NameNode (Passive), ZooKeeper Server, Journal Node, Hadoop Client Libraries

YARN ResourceManager (Active):
• ResourceManager, App Timeline Server, MapReduce2 History Server, ZooKeeper Server, Journal Node, Hadoop Client Libraries

YARN ResourceManager (Passive):
• ResourceManager, App Timeline Server, MapReduce2 History Server, ZooKeeper Server, Journal Node, Hadoop Client Libraries

Management Server:
• MySQL Server (Hive MetaStore, Oozie, Ganglia), HiveServer2, Oozie Server, Ganglia Server, Nagios Server, ZooKeeper Server, Journal Node, Kerberos, Hue Server, Ambari Server, Hadoop Client Libraries

Gateway Server:
• NFS Gateway Server, WebHCat Server, WebHDFS, Falcon, Sqoop, Solr, Hadoop Client Libraries

Agenda

• Basics
  • Software I
  • Architecture & Rack Design
  • Hardware & Cluster Sizing
  • Software II
• Advanced
  • Data Ingestion
  • Operation & Monitoring
  • Security

Hardware

• Get good-quality commodity hardware!

• Buy the sweet spot in pricing: 3 TB disks, 128 GB RAM, 8-12 core CPUs

  – More memory is better. Always.

• Scale horizontally first, then vertically (1U with 6 disks vs. 2U with 12 disks)

  – Get to at least 30-40 machines or 3-4 racks

• Don’t forget about rack size (42U) and power consumption.

• Use a pilot cluster to learn about load patterns

– Balanced workload

– Compute intensive

– I/O intensive

It’s about storage

Per disk: 3.00 TB raw
– minus ~25% for intermediate data (0.75 TB) = 2.25 TB
– divided by HDFS replication factor 3 = 0.75 TB usable per disk
x 12 disks per node
x 11 DataNodes
= 99 TB usable capacity

Compression: …well, it depends…
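The sizing rule of thumb above can be written as a quick calculation. The numbers are the ones from the slide; adjust them for your own cluster:

```shell
# Usable-capacity estimate: raw disk size, minus intermediate-data
# reserve, divided by the replication factor, scaled by disks and nodes.
RAW_PER_DISK=3.00        # TB per disk
INTERMEDIATE=0.25        # fraction reserved for intermediate data
REPLICATION=3            # HDFS replication factor
DISKS_PER_NODE=12
DATA_NODES=11

usable=$(awk -v r="$RAW_PER_DISK" -v i="$INTERMEDIATE" -v rep="$REPLICATION" \
             -v d="$DISKS_PER_NODE" -v n="$DATA_NODES" \
             'BEGIN { printf "%.0f", r * (1 - i) / rep * d * n }')
echo "${usable} TB usable"   # 3.00 * 0.75 / 3 * 12 * 11 = 99 TB
```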

It’s about Zen

Balance CPU, memory, and disks:

• CPU: Xeon 10C model E5-2660 v2 (10 cores, 4 memory channels)
• Memory: 8 x 16 GB
• Disks: 12

Hardware

Master Nodes (HDFS NameNode + HDFS Secondary NN + YARN ResourceManager)
• CPU: 2 x 3+ GHz with 8+ cores
• Memory: 128 GB (DDR3, ECC)
• Storage: 2 x 1+ TB (RAID 1, OS), 1 x 1 TB (Hadoop logs), 1 x 1 TB (ZooKeeper), 1 x 3 TB (HDFS)
• Network: 2 x bonded 10 GbE NICs, 1 x 1 GbE NIC (for mgmt.)

Management Server + Gateway Server
• CPU: 2 x 3+ GHz with 8+ cores
• Memory: 128 GB (DDR3, ECC)
• Storage: 2 x 1+ TB (RAID 1, OS), 1 x 1 TB (Hadoop logs), 1 x 3 TB (HDFS)
• Network: 2 x bonded 10 GbE NICs, 1 x 1 GbE NIC (for mgmt.)

Worker Nodes
• CPU: 2 x 2.6+ GHz with 8+ cores
• Memory: 128 GB (DDR3, ECC)
• Storage: 2 x 1+ TB (RAID 1, OS), 10 x 3 TB (HDFS); if the disk chassis allows: 12 x 3 TB (HDFS)
• Network: 2 x bonded 10 GbE NICs, 1 x 1 GbE NIC (for mgmt.)

Example: IBM x3650 series

Master Nodes

Data Nodes

Agenda

• Basics
  • Architecture & Rack Design
  • Hardware
  • Software
• Advanced
  • Data Ingestion
  • Operation & Monitoring
  • Security

Operating System

Linux File System

• Ext3

• Ext4

• XFS with the noatime, inode64, nobarrier mount options

  Possibly better performance; be aware of delayed data allocation (on Ext4 it can be disabled with the nodelalloc mount option in /etc/fstab)
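A sketch of what the mount options above look like in practice. The device names and mount points are assumptions; only the options come from the slide:

```shell
# Hypothetical /etc/fstab entries for Hadoop data disks, written to an
# example file rather than the live fstab so the sketch is safe to run.
cat > fstab.example <<'EOF'
/dev/sdb1  /grid/0  xfs  noatime,inode64,nobarrier  0 0
/dev/sdc1  /grid/1  xfs  noatime,inode64,nobarrier  0 0
EOF
grep -c xfs fstab.example
```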

OS Optimizations

• Of course, this depends on your OS choice

  – Specific recommendations are available from the OS vendors

• Common recommendations:

• Avoid physical I/O scheduling, which competes with HDFS’s own I/O scheduling (e.g. use the NOOP scheduler)

• Set vm.swappiness to 0

• Set number of file handles (ulimit, soft+hard) to 16384 (Data Nodes) / 65536 (Master Nodes)

• Set number of pending connections (net.core.somaxconn) to 1024

• Use Jumbo Frames (MTU=9000)

• Consider network bonding (802.3ad)
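The sysctl and ulimit recommendations above, written out as config fragments. The file names are conventional choices, not prescribed by the slide; the sketch writes example files instead of touching the live system, so review before rolling out:

```shell
# Kernel settings from the slide (swappiness, pending connections).
cat > sysctl-hadoop.conf <<'EOF'
vm.swappiness = 0
net.core.somaxconn = 1024
EOF

# File-handle limits: 16384 on data nodes, 65536 on master nodes.
cat > limits-hadoop.conf <<'EOF'
hdfs  soft  nofile  16384
hdfs  hard  nofile  16384
yarn  soft  nofile  16384
yarn  hard  nofile  16384
EOF

# Jumbo frames are applied per NIC, e.g. (commented, needs root and a
# switch configured for MTU 9000):
#   ip link set dev eth0 mtu 9000
grep -q 'vm.swappiness = 0' sysctl-hadoop.conf && echo "fragments written"
```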

Java

• Oracle JDK 1.7 (64-bit)

• Oracle JDK 1.6 (64-bit)

• Open JDK 7 (64-bit)

Java Optimizations

• Use 64 bit JVM for all daemons

– Compressed OOPS enabled by default (Java 6 u23+)

• Java Heap Size

– Set Xmx == Xms

– Avoid Java defaults for NewSize and MaxNewSize

• Use 1/8 to 1/6 of the max heap size for JVMs larger than 4 GB

– Configure -XX:PermSize=128m, -XX:MaxPermSize=256m

• Use low-latency GC collector

– Set -XX:+UseConcMarkSweepGC, -XX:ParallelGCThreads=<N>

• Use high <N> on NameNode & ResourceManager

• Useful for debugging
  – -verbose:gc -Xloggc:<file> -XX:+PrintGCDetails
  – -XX:ErrorFile=<file>
  – -XX:+HeapDumpOnOutOfMemoryError
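The flags above, assembled into a NameNode daemon option string. The heap size, new-generation size, GC thread count, and log path are illustrative values, not recommendations from the slide:

```shell
# Sketch of HADOOP_NAMENODE_OPTS built from the slide's rules:
# Xms == Xmx, explicit NewSize (~1/8 of the heap), PermGen sizing,
# CMS collector, and the debugging flags.
HEAP="-Xms16g -Xmx16g"
NEW="-XX:NewSize=2g -XX:MaxNewSize=2g"          # ~1/8 of the 16 GB heap
PERM="-XX:PermSize=128m -XX:MaxPermSize=256m"
GC="-XX:+UseConcMarkSweepGC -XX:ParallelGCThreads=8"
DEBUG="-verbose:gc -Xloggc:/var/log/hadoop/nn-gc.log -XX:+PrintGCDetails \
-XX:+HeapDumpOnOutOfMemoryError"
export HADOOP_NAMENODE_OPTS="$HEAP $NEW $PERM $GC $DEBUG"
echo "$HADOOP_NAMENODE_OPTS"
```

In practice this line would go into hadoop-env.sh so the daemon picks it up on restart.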

Hadoop Configuration

• Multiple redundant directories for NameNode metadata

– One of dfs.namenode.name.dir should be on NFS

– Soft-mount the NFS share with tcp,soft,intr,timeo=20,retrans=5

• Take periodic backups of NameNode metadata

– Make copies of the entire storage directory

• Set dfs.datanode.failed.volumes.tolerated > 0 (the property takes the number of failed disks to tolerate, not a boolean)
  – A disk failure is then no longer a complete DataNode failure

– Especially important for large density nodes

• Set dfs.namenode.name.dir.restore=true

– Restores failed NN storage directories during checkpointing

• Reserve a lot of disk space for NameNode logs

– Hadoop logging is verbose – set aside multiple GB’s

– NameNode logs can roll within minutes, which makes issues hard to debug

• Use version control for configuration!
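The NameNode and DataNode settings above, as an hdfs-site.xml fragment. The directory paths are illustrative (the NFS mount must actually exist on the NameNode host), and the sketch writes an example file rather than the live config:

```shell
# Redundant NameNode metadata dirs (one on NFS), automatic restore of
# failed dirs, and tolerance for a couple of failed DataNode disks.
cat > hdfs-site-fragment.xml <<'EOF'
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/grid/0/hdfs/nn,/grid/1/hdfs/nn,/mnt/nfs/hdfs/nn</value>
</property>
<property>
  <name>dfs.namenode.name.dir.restore</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>2</value>
</property>
EOF
grep -c '<property>' hdfs-site-fragment.xml
```

Keeping fragments like this in version control, as the slide suggests, makes config drift visible across the cluster.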

Agenda

• Basics
  • Software I
  • Architecture & Rack Design
  • Hardware & Cluster Sizing
  • Software II
• Advanced
  • Data Ingestion
  • Operation & Monitoring
  • Security

Options for Data Ingestion

• hadoop fs -put
• NFS Gateway
• WebHDFS
• hadoop distcp
• MapReduce
• Connectors for Oracle, Teradata, SQL Server, et al.
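Sketches of the main options above. The hostnames and paths are made up, and the hadoop/curl calls need a running cluster, so they are shown commented rather than executed:

```shell
# CLI copy into HDFS:
#   hadoop fs -put /data/incoming/events.log /raw/events/

# Cluster-to-cluster (or bulk parallel) copy:
#   hadoop distcp hdfs://clusterA/raw hdfs://clusterB/raw

# WebHDFS write is a two-step REST call: the NameNode answers the CREATE
# with a redirect to a DataNode, which then receives the actual data.
NN_HOST=namenode.example.com
CREATE_URL="http://${NN_HOST}:50070/webhdfs/v1/raw/events/events.log?op=CREATE"
#   curl -i -X PUT "$CREATE_URL"               # returns 307 + DataNode location
#   curl -i -X PUT -T events.log "<location>"  # second PUT carries the payload
echo "$CREATE_URL"
```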

Agenda

• Basics
  • Architecture & Rack Design
  • Hardware
  • Software
• Advanced
  • Data Ingestion
  • Operation & Monitoring
  • Security

Operation

Apache Ambari vs. Cloudera Manager

Monitoring

• The basics: Nagios, Ganglia, Ambari/Cloudera Manager, Hue

• Admins need to understand the principles behind Hadoop and learn about their tool set: fsck, dfsadmin, …

• Monitor the hardware usage for your work load

– Disk I/O, network I/O, CPU and memory usage

– Use this information when expanding cluster capacity

• Monitor the usage with Hadoop metrics

  – JVM metrics: GC times, memory used, thread status

  – RPC metrics: especially latency, to track slowdowns

  – HDFS metrics: used storage, # of files & blocks, cluster load, file system operations

  – Job metrics: slot utilization and job status

• Tweak configurations during upgrades & maintenance windows on an ongoing basis

• Establish regular performance tests

– Use Oozie to run standard tests like TeraSort, TestDFSIO, HiBench, …

Agenda

• Basics
  • Architecture & Rack Design
  • Hardware
  • Software
• Advanced
  • Data Ingestion
  • Operation & Monitoring
  • Security

Security today

Authentication: control access to the cluster
• Kerberos in native Apache Hadoop
• Perimeter security with Apache Knox (LDAP, SSO)

Authorization: restrict access to explicit data
• Native in Apache Hadoop: HDFS permissions + ACLs, queues + job ACLs
• Fine-grained, role-based authorization: Hive, Apache Sentry, Apache Accumulo
• Service-level authorization with Knox
• Central security policies with Ranger

Audit: understand who did what
• Process execution audit trail in native Apache Hadoop

Data Protection: encrypt data at rest & in motion
• Wire encryption in native Apache Hadoop
• Wire encryption with Knox
• Orchestrated encryption with 3rd-party tools

Apache Knox

Client → Firewall → Knox (in the DMZ) → Firewall → Hadoop Cluster

• Knox proxies the cluster’s REST APIs: WebHDFS, WebHCat, Oozie, Hive, YARN
• Authentication is handled against LDAP, with SSO support
• Admin access via SSH stays on a separate path

Data Boxing

Give each division read & write access to its own box, read access to the raw data layer, and no access (--) to the other divisions’ boxes.

Set up data boxing using
• Users & Groups
• HDFS Permissions & ACLs
• Higher-level mechanisms where applicable
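The boxing scheme above, expressed as HDFS ACLs. The group names and paths are made up, and the hdfs commands need a running cluster, so they are shown commented:

```shell
# Per-division ACLs: read-only on the raw layer, read & write on the
# division's own box, no access to the other division's box.
#   hdfs dfs -setfacl -m group:division1:r-x /data/raw
#   hdfs dfs -setfacl -m group:division1:rwx /data/division1
#   hdfs dfs -setfacl -m group:division2:--- /data/division1
ACL_SPEC="group:division1:rwx,group:division2:---"
echo "$ACL_SPEC"
```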

Apache Ranger

• File-level access control
• Central control of permissions

Supports:
• HDFS
• Hive
• HBase
• Storm
• Knox

Thanks for listening

Twitter: @uweseiler

Mail: uwe.seiler@codecentric.de

XING: https://www.xing.com/profile/Uwe_Seiler