
Greenplum Analytics Workbench

Table of Contents

Introduction
Partners
    Supermicro
    Micron
    Intel
    Seagate
    Mellanox
        Mellanox ConnectX®-3 VPI Network Adapter Card
        Mellanox ConnectX®-3 VPI Card Specifications
        Mellanox SwitchX® VPI Switches
        Mellanox SwitchX® VPI Switch Specifications
        Mellanox Cables
        Mellanox FDR Passive Copper and Optical Cable
        Mellanox Unstructured Data Accelerator (UDA)
    Switch
    Rubicon
Hadoop Software Overview
    Hadoop MapReduce
    Hadoop Distributed File System
    Hadoop Ecosystem
Hadoop on the Greenplum Analytics Workbench
    TeraSort Example
About Greenplum

White Paper

Abstract

This white paper details how the Greenplum Analytics Workbench was designed and built to validate Apache Hadoop code at scale, as well as to provide a large-scale experimentation environment for mixed-mode development that includes various SQL and non-SQL execution environments. It describes the core architectural components involved and highlights the benefits that an enterprise can leverage to quickly and efficiently analyze Big Data.


Introduction

Enterprises have been dealing with storing rapidly growing amounts of data – truly big data – coming not only from traditional sources such as ERP and CRM systems but now also from social media, blogs, and the like. The enterprise's initial focus has been on efficient storage of big data, but the focus has now shifted to analytics on those big data sets. Hadoop has emerged as the platform of choice for processing big data – especially unstructured data. The out-of-box experience with Hadoop still leaves a lot to be desired, as it lacks the tools needed for easy deployment and management of such an infrastructure, especially one sized for larger deployments. Enterprises are also quickly realizing that in order to maximize their analytics capabilities, a mixed-mode environment is imperative: one in which they can easily combine structured data sets (using traditional SQL) and unstructured data sets (using Hadoop/NoSQL) without having to rework existing processes.

The Greenplum Analytics Workbench is built to provide an environment that supports mixed-mode development and validation at scale. The workbench is pre-configured with open, freely available data sets and has analysis software built in for quick turnaround and rapid productivity. It contains the entire Hadoop stack – HDFS, MapReduce, Pig, Hive, HBase, and Mahout – and augments it with the SQL capabilities of Greenplum Database, the industry's leading MPP database, all deployed on the same nodes. The Analytics Workbench provides a perfect experimentation platform for Greenplum's thought leadership in the Unified Analytics Platform. It also provides a tremendous learning opportunity for organizations that wish to build and operate a large Hadoop/mixed-mode cluster.

Partners

The Analytics Workbench was assembled with the help of strong partnerships with some of the industry's leading vendors of hardware and services. The Greenplum team forged close alliances with these partners to carefully assemble the hardware nodes and to rack, stack, and cable them in a state-of-the-art datacenter. An extremely fast network backbone and switching layer was designed to give the cluster blazing-fast throughput for intra-cluster communication.


Supermicro

Supermicro contributed 1,000 enterprise-ready server systems for all data-processing nodes to power the 24 petabytes of storage available on the cluster. Supermicro fully assembled and tested the data-processing nodes in its Silicon Valley production center. The assembly process included integrating processors, memory, disk drives, and network cards from the other contributing partners.

Supermicro's design team optimized the system configuration to address both datacenter space and power challenges. Supermicro servers maintain peak power efficiency with platinum-level, 94%-efficient power supplies. Supermicro then maximized node count per rack, reducing power and cooling overhead while delivering high performance and the required 24 terabytes of storage per data-processing node.

The specifications of the Supermicro systems are as follows:

2U Greenplum Hadoop OEM Server - Model # PIO-626T-6RF-EM09B

• Supermicro dual-processor motherboard supporting:

- Dual Intel® Xeon® X5500/X5600 series processors

- Up to 192GB RAM with 12 DDR3 RDIMMs

- 5 expansion slots: 3 PCI-E 2.0 x8, 1 PCI-E 2.0 x4, 1 PCI-E x4

- Onboard LSI 2008 6.0Gbps disk controller (IR mode)

- Dual LAN with Intel 82576 Gigabit Ethernet Controller

- Dedicated IPMI remote management port

• Supermicro 2U server chassis supporting:

- 12 hot-swap 3.5" drive trays

- Redundant 500-watt "platinum-level" power supplies with a 94+% efficiency rating

- Active backplane with 6.0Gbps SAS/SATA expander

- 7 low-profile PCI-E expansion slots

Micron

Micron contributed 6,000 DDR3 RDIMM memory modules of 8GB each – a total of 48TB of memory, evenly distributed across the 1,000 nodes so that each node has 48GB of RAM.

DDR3 functionality and operations supported include:

• 240-pin, registered dual in-line memory module (RDIMM)

• Fast data transfer rates: PC3-10600, PC3-8500, or PC3-6400

• 8GB (512 Meg x 72)

• VDD = 1.5V ±0.075V

• VDDSPD = 3.0–3.6V

• Supports ECC error detection and correction

• Nominal and dynamic on-die termination (ODT) for data, strobe, and mask signals

• Quad rank

• On-board I2C temperature sensor with integrated serial presence-detect (SPD) EEPROM

• Fixed burst chop (BC) of 4 and burst length (BL) of 8 via the mode register set (MRS)

• Selectable BC4 or BL8 on-the-fly (OTF)

• Gold edge contacts


Intel

Intel contributed 2,000 Westmere processors with the following specifications:

Processor Number: X5670
# of Cores: 6
# of Threads: 12
Clock Speed: 2.93 GHz
Max Turbo Frequency: 3.33 GHz
Intel® Smart Cache: 12 MB
Bus/Core Ratio: 22
Intel® QPI Speed: 6.4 GT/s
# of QPI Links: 2
Instruction Set: 64-bit
Instruction Set Extensions: SSE4.2
Embedded Options Available: No
Lithography: 32 nm
Max TDP: 95 W
Max Memory Size (dependent on memory type): 288 GB
Memory Types: DDR3-800/1066/1333
# of Memory Channels: 3
Max Memory Bandwidth: 32 GB/s
Physical Address Extensions: 40-bit
ECC Memory Supported: Yes
Intel® Turbo Boost Technology: Yes
Intel® Hyper-Threading Technology: Yes
Intel® Virtualization Technology (VT-x): Yes
Intel® Virtualization Technology for Directed I/O (VT-d): Yes
Intel® Trusted Execution Technology: Yes
AES New Instructions: Yes
Intel® 64: Yes
Idle States: Yes
Enhanced Intel SpeedStep® Technology: Yes
Intel® Demand Based Switching: Yes
Thermal Monitoring Technologies: No
Execute Disable Bit: Yes

Table 1 (courtesy: http://ark.intel.com/products/47920/Intel-Xeon-Processor-X5670-(12M-Cache-2_93-GHz-6_40-GTs-Intel-QPI))


Seagate

Seagate contributed 12,000 2TB drives – 12 per node – for a total of 24TB per node and 24PB of raw storage across the cluster.

The specifications of the Seagate drives are as follows:

Product Name: 2TB Constellation ES 7200RPM 3.5" SATA 6Gb/s 64MB Cache Hard Drive
Product Type: Hard Drive
Buffer: 64 MB
Hard Drive Interface: SATA/600
Compatible Drive Bay Width: 3.5"
SATA Pin: 7-pin
Height: 1.0"
Width: 4.0"
Depth: 5.8"
Product Series: ES
Form Factor: Internal
Product Model: ST2000NM0011
Product Line: Constellation
Storage Capacity: 2 TB
Rotational Speed: 7200 rpm
Maximum External Data Transfer Rate: 600 MBps (4.7 Gbps)
Average Latency: 4.16 ms
Average Seek Time: 9.50 ms

Table 2 (courtesy: http://www.provantage.com/seagate-st2000nm0011~7SEGS280.htm)

Mellanox

Mellanox ConnectX®-3 VPI Network Adapter Card

Mellanox ConnectX®-3 VPI Card Specifications:

Part Number: MCX354A-FCBT
Supported Data Rates – InfiniBand: FDR; QDR; DDR
Supported Data Rates – Ethernet: 40GbE; 10GbE
PCI Express Generations Supported: 3.0; 2.0; 1.1
RDMA Support: InfiniBand; RoCE
Supported Media Types: Direct Attached Copper; Active Optical Cables; Optical Modules
Number of Ports and Types: 2 ports, QSFP+


Mellanox SwitchX® VPI Switches

Mellanox SwitchX® VPI Switch Specifications:

Part Number: MSX6036F-1SFR
Supported Data Rates – InfiniBand: FDR; QDR; DDR
Supported Data Rates – Ethernet: 40GbE; 10GbE
Port-to-Port Latency – InfiniBand: 170ns
Port-to-Port Latency – Ethernet: 230ns
Blocking Ratio: 1:1 (fully non-blocking)
Number of Ports and Type: 36 ports, QSFP+
Typical Power Consumption: 126W
Supported Media Types: Direct Attached Copper; Active Optical Cables; Optical Modules

Mellanox Cables

Mellanox FDR Passive Copper and Optical Cable

Greenplum Analytics Workbench connectivity is enabled by Mellanox's FDR cables. Both passive copper and active optical cables are used to provide a state-of-the-art cluster cabling solution as well as durability and ease of installation.

Mellanox Unstructured Data Accelerator (UDA)

Mellanox UDA, a software plugin, accelerates the Hadoop network and improves the scaling of Hadoop clusters running data-analytics-intensive applications. A novel data-moving protocol, which uses RDMA in combination with an efficient merge-sort algorithm, enables Hadoop clusters based on Mellanox InfiniBand and 40/10GbE RoCE (RDMA over Converged Ethernet) adapter cards to move data between servers efficiently, accelerating the Hadoop framework.

The 1,000-node Hadoop cluster is connected via a blazing-fast FDR 56Gbps InfiniBand interconnect, using the ConnectX®-3 VPI cards and SwitchX® VPI switches described above. The cluster uses three layers of switching: node-level switches, which connect the 20 servers in each rack to the aggregation layer using 4 FDR uplinks from each top-of-rack (ToR) switch; aggregation-layer switches, which connect to the core layer using 18 uplinks each, delivering a fully non-blocking InfiniBand network between the aggregation and core levels; and the core-level switches. The cluster uses the IP-over-InfiniBand protocol to enable a more efficient connection to the socket-based portions of the framework.
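As a back-of-the-envelope check on this topology (an illustrative calculation, assuming all 20 servers in a rack drive their 56Gbps links at full rate simultaneously), the worst-case oversubscription at the top-of-rack layer works out to:

\[
\frac{20 \times 56\,\mathrm{Gb/s}\ \text{(server links)}}{4 \times 56\,\mathrm{Gb/s}\ \text{(FDR uplinks)}} = 5:1
\]

Above the ToR layer, the 18 uplinks per aggregation switch keep the aggregation-to-core fabric non-blocking, as noted above.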

UDA gives the MapReduce portion of the framework the ability to use RDMA connectivity between nodes, reducing CPU overhead and enabling lower-latency connections. The outcome of RDMA usage is a significant reduction in processing time for data sets of similar size.



The interconnect layout of the network is as follows:

Switch

Switch is the state-of-the-art datacenter in Las Vegas, NV, where the Analytics Workbench is hosted. The cluster occupies almost three full SCIFs, consisting of 54 racks in all. Each rack holds 20 servers; a few racks are not completely filled, leaving some room for expansion. The Switch datacenter will also be able to accommodate future growth in other SCIFs with no apparent impact on the overall cluster.

The racks are divided into data racks, core racks, and infrastructure racks. The infrastructure racks hold the servers for Puppet, Nagios, Ganglia, DNS, DHCP, and so on, whereas the core racks hold the servers for the name node, the job tracker node, ZooKeeper, HBase, and the like.

The rack layout is as follows:


Rubicon

The Rubicon team, which is part of VMware, provides Tier-1 and Tier-2 support for the cluster. This includes monitoring the network, the hardware, and various system-level checks. The team uses Zabbix for systems management and has developed sophisticated plug-ins and a dashboard on top of Zabbix. The Rubicon team has a local presence in Las Vegas and can provide rapid response to critical issues within the cluster.

Hadoop Software Overview

Hadoop is an industry-leading open source framework for distributed storage and processing that is designed to scale with the growing data storage and compute needs of an organization. By using the same nodes for both compute and storage, a cluster can scale in both dimensions simultaneously and avoid the traditional bottlenecks of NAS/SAN-type architectures. Below are some of the key components of Hadoop:

Hadoop MapReduce: the parallel task-processing mechanism that takes a query (job) and runs it in parallel on multiple nodes. The parallelism provides much better throughput for unstructured data sets that can be processed independently.

Hadoop Distributed File System (HDFS): the base file system layer that stores data across all of the nodes in the cluster.

MapReduce as a computing paradigm was popularized by Google, and Hadoop was written and open-sourced by Yahoo as an implementation of that paradigm.

Hadoop MapReduce

Hadoop MapReduce is a software framework for easily writing applications that process large amounts of data in parallel on large clusters of commodity compute nodes.

The diagram below depicts the basics of the MapReduce workflow:

A MapReduce job (query) usually splits the input data set into independent chunks; the size of each chunk depends on a system-wide setting (typically 64MB). Each chunk is processed by a map task, with all map tasks running in parallel. The framework sorts the outputs of the maps, which are then used as input to the reduce tasks. Typically both the input and the output of the job are stored in HDFS. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.
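To make this flow concrete, below is a minimal sketch of a MapReduce application – the classic WordCount – written against the standard org.apache.hadoop.mapreduce API of this Hadoop generation. The class names and the use of the reducer as a combiner are illustrative choices, not something prescribed by the workbench.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: each map task receives one input split and emits (word, 1)
  // for every word it sees. Map tasks run fully in parallel across the cluster.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: the framework has already sorted map output by key, so each
  // reduce call receives one word together with all of its counts.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation per map
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}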


Typically in a Hadoop cluster, the MapReduce compute nodes and the storage layer (HDFS) reside on the same set of nodes. The system is configured to be rack-aware, making it possible for the framework to schedule tasks on the nodes where the data is already present, minimizing data movement within the cluster. This is the compute layer that derives key insight from the data residing in the HDFS layer.

Hadoop is written entirely in Java, but MapReduce applications do not need to be: applications can use the Hadoop Streaming interface to specify any executable as the mapper or reducer for a particular job, as sketched below.
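For illustration only – the streaming jar location and the HDFS paths below are hypothetical and vary by installation – a streaming job can use ordinary Unix executables as its mapper and reducer:

# Identity mapper and a line/word/byte-count reducer built from Unix tools.
# Jar path and HDFS paths are illustrative; adjust for your installation.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -input  /user/demo/input \
  -output /user/demo/output \
  -mapper  /bin/cat \
  -reducer /usr/bin/wc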

The MapReduce framework consists of the following:

JobTracker: a single master per cluster that schedules, monitors, and manages jobs and their component tasks.

TaskTracker: one slave TaskTracker per cluster node, which executes the task components of a job as directed by the JobTracker.

In the upcoming release of Hadoop, the resource-management module will undergo a drastic rework. It will maintain backwards compatibility while splitting the resource-management capabilities into a standalone module.

Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is a block-based file system that allows user data to be stored in files. It retains the look and feel of a Linux file system, so users and applications can create and remove files and directories as well as move and rename them. HDFS does not support hard or soft links. All HDFS communication is layered on top of the TCP/IP protocol.

Below are the key components of HDFS:

NameNode: a single master metadata server that holds in-memory maps of every file, its location, and all of the blocks within each file and their locations in the HDFS namespace. In the upcoming release of Hadoop, a NameNode HA feature will be introduced to relieve some of the stress on the existing design (such as the NameNode being a single point of failure).

DataNode: one slave DataNode per cluster node, which serves read/write requests and performs block creation, deletion, and replication as directed by the NameNode.

This is the storage layer where all the data resides before a MapReduce job can run on it. HDFS uses block mirroring to spread the data around the Hadoop cluster, both for protection and for data locality, so that MapReduce jobs can run against the same data on multiple compute nodes. The default block size is 64 MB and the default replication factor is 3x. The copies are written in a rack-aware manner so that all three copies do not reside on the same rack; the central idea behind replication is that if a rack goes down, the system will, as far as possible, still have access to the full data set.
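As a small illustrative sketch of how a client interacts with this layer – the file path and per-file replication value are hypothetical, while the calls themselves are the standard org.apache.hadoop.fs client API – a program writes to and reads from HDFS like this:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsIoSketch {
  public static void main(String[] args) throws Exception {
    // Picks up core-site.xml / hdfs-site.xml (NameNode address, block size, etc.)
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/user/demo/hello.txt"); // hypothetical path

    // Writing streams data to DataNodes; the NameNode records only metadata.
    FSDataOutputStream out = fs.create(file, true /* overwrite */);
    out.writeBytes("hello, hdfs\n");
    out.close();

    // Replication can also be set per file (the cluster default is 3).
    fs.setReplication(file, (short) 3);

    // Reads go directly to the DataNodes holding the block replicas.
    BufferedReader in =
        new BufferedReader(new InputStreamReader(fs.open(file)));
    System.out.println(in.readLine());
    in.close();
    fs.close();
  }
}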


Hadoop Ecosystem

The Hadoop ecosystem consists of the following main blocks:

• Hive: a SQL-like ad hoc querying interface for data stored in HDFS

• HBase: a column-oriented structured storage system for HDFS

• Pig: a high-level dataflow language and execution framework for parallel computation

• Mahout: scalable machine-learning algorithms using Hadoop

The above is not an exhaustive list of all Hadoop ecosystem components .

Hadoop on the Greenplum Analytics Workbench

For the most part, a typical Hadoop cluster consists of a name node, a few other master nodes, and a large number of data nodes. The diagram below shows the Hadoop data nodes and the corresponding master nodes.

A few of the master roles are initially hosted on the same machine. This may change depending on the load on the system.

[Diagram: Hadoop software stack on the workbench – Analysis (Mahout), Workflow Mgmt. (Oozie, Spring Batch), Languages (Hive, M/R, Pig), Exec. Env. (HBase), File System (HDFS)]


In reality, a typical Hadoop cluster is supported by a number of additional roles, as shown in the diagram below.

The table below provides a brief description of the server roles:

Access: These nodes are used to access the cluster; there is no direct access to the data nodes from outside. Typically these nodes support SSH-based connectivity.

Data ingestion: Data-ingestion nodes are used for bulk upload of data into the cluster. These nodes can serve as a staging area for further processing prior to loading into HDFS.

Web-based management: Web-based management nodes are used for accessing the cluster via HTTP.

Jenkins: Jenkins is an open source continuous integration framework. The server is used to build Hadoop code on demand or on a pre-defined trigger, and provides a dashboard to view the results.

Ganglia, Nagios, and Zabbix: These systems are used to monitor the cluster. For the Analytics Workbench, Zabbix is currently used to monitor system-level statistics, whereas the Nagios and Ganglia combination is used for application-level monitoring.

Plato server: This is deployed to monitor the health of the disks. It is actively monitored by the Rubicon team.

Kickstart, DNS, DHCP, and NTP: The Kickstart server is used to load the base OS onto the nodes, whereas DNS, DHCP, and NTP are used for network management.

YUM repo, Puppet master, Kerberos: The YUM repo is used as a repository for RHEL packages. It is used by the Puppet master and the Puppet agents running on each data node to access the packages that the slaves need for deployment. Kerberos is used as the authentication mechanism (needed as part of a secure Hadoop implementation).

UFM: This is the Unified Fabric Manager for the Mellanox network.




TeraSort Example

The industry-standard TeraSort benchmark was run on the cluster. The cluster configuration was not tuned for the best possible performance; the intent of the run was simply to validate the general health of the cluster and to measure TeraSort run characteristics.

The first run was against 1TB of data and the second against 10TB. There are plans to run 100TB and even 1PB sorts in the near future. A typical run sequence is sketched below.
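For reference, a TeraSort run generates the synthetic input, sorts it, and validates that the output is globally ordered. This is a sketch only: the examples jar name and the HDFS paths below are illustrative and vary by distribution and release.

# 1TB = 10^10 rows of 100 bytes each; paths and jar name are illustrative.
hadoop jar hadoop-examples.jar teragen 10000000000 /terasort/input
hadoop jar hadoop-examples.jar terasort /terasort/input /terasort/output
hadoop jar hadoop-examples.jar teravalidate /terasort/output /terasort/report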

The results are shown below:

About Greenplum

Greenplum, a division of EMC, is driving the future of Big Data analytics with breakthrough products that harness the skills of data science teams to help global organizations realize the full promise of business agility and become data-driven, predictive enterprises. The division's products include the Greenplum Unified Analytics Platform, Greenplum Data Computing Appliance, Greenplum Database, Greenplum Analytics Lab, Greenplum HD, and Greenplum Chorus. They embody the power of open systems, cloud computing, virtualization, and social collaboration, enabling global organizations to gain greater insight and value from their data than ever before. Learn more at www.greenplum.com.

Contact Us

To learn more about how Greenplum products, services, and solutions can help you realize Big Data analytics opportunities, visit us at www.greenplum.com.

Greenplum, a Division of EMC, 1900 South Norfolk Street, San Mateo, CA 94403. Tel: 650-286-8012. www.greenplum.com

EMC2, EMC, the EMC logo, and Greenplum are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners. © Copyright 2012 EMC Corporation. All rights reserved. Published in the USA. 05/12 White Paper

