FLASHSTACK™ FOR SPLUNK REFERENCE ARCHITECTURE

A Framework for Deploying Splunk at Enterprise Scale

TABLE OF CONTENTS

EXECUTIVE SUMMARY
INTRODUCTION
SOLUTION OVERVIEW
SPLUNK ENTERPRISE
TESTING METHODOLOGY
PURE + SPLUNK DIFFERENTIATORS
SOLUTION DESIGN
SPLUNK REFERENCE ARCHITECTURE
SPLUNK SETUP DETAILS
GENERAL SIZING GUIDELINES
SPLUNK STORAGE SIZING
VALIDATION & BENCHMARKING
HOW TO INTERPRET THE RESULTS
RESULTS
BONNIE++
SPLUNKIT
LARGE CORE CONFIGURATION
LARGE ES CONFIGURATION
INDEXING PERFORMANCE
SEARCH METRICS
CONCLUSION
REFERENCES
ABOUT THE AUTHORS
LAURA VETTER
SOMU RAJARATHINAM
APPENDIX A: REFERENCE ARCHITECTURE CONFIGURATION
APPENDIX B: BONNIE++
APPENDIX C: INSTALLATION CONSIDERATIONS AND BEST PRACTICES

EXECUTIVE SUMMARY

Kinney Group, a cloud solutions integrator headquartered in Indianapolis, Indiana,

performs Splunk integrations for federal agencies and Fortune 1000 clients throughout

the United States. In these integrations, systemic issues can arise as the Splunk
environment scales. One of the greatest risks to enterprise-wide adoption of

Splunk is inadequate, under-sized, or non-performant hardware. Organizations frequently

want to repurpose existing and often aging hardware. That hardware typically meets the

minimum specifications on IOPS and VM compute sizing. However, the net effect of this

repurposing is an under-performant or non-performant Splunk solution that could leave

Splunk customers dissatisfied with the solution – and possibly the entire Splunk platform.

Kinney Group has a rigorous selection process that identifies the best cloud technology

platforms. To tackle the organizational issues that can prevent a successful Splunk

adoption, Kinney Group developed a rigid Splunk test plan and joined forces with

Pure Storage and Splunk to develop an integrated solution offering. Pure Storage is a

leading all-flash array provider focused on reducing storage complexity while providing

performance, resiliency, and efficiency. Pure lowers storage costs by up to 50% and

management costs by up to 90%, and shrinks storage footprints by 5-10x.

The three entities together had the simple, common goal of providing a framework for

deploying a Splunk solution at scale. This framework will allow organizations to reap

the significant benefits of a high-performance analytics platform that lowers hardware

spend and introduces a faster time to deploy. This approach will empower organizations

to manage large Splunk instances as they march toward the analytics-driven, software-

defined enterprise.

To ensure a superior hardware solution, Kinney Group and Pure Storage decided

to elevate the normal Splunk testing process traditionally performed by storage

manufacturers. Our process had three goals. First, we wanted our test offering to be

an all-in solution, including compute, storage, and Splunk. This was accomplished

by leveraging the joint solution of Pure Storage, VMware and Cisco, which is called

FlashStack. Second, the team wanted to move beyond theoretical use cases and test with
real-world workloads. The goal was to cover both search and indexing at
production scale while observing how the solution performs. In doing so we are able to

provide real sizing guidance for Splunk architects looking to size various environments,

including resource intensive Splunk premium solutions. The third and final goal was

to reduce overall hardware spend without reducing performance, resiliency, or

adding complexity.

This paper outlines the team’s findings and should serve as a framework and set of best
practices when architecting, developing, and specifying a Splunk installation with a
FlashStack solution. Using this framework, the Pure Storage FlashStack solution is
designed to scale Splunk with twice the cost savings of other traditional and
hyper-converged reference designs. This will ensure that an organization’s

Splunk deployment is 5x more efficient at the compute layer, as well as 10x more efficient

in rackspace, power, heating, and cooling compared to equally performant solutions.

INTRODUCTION

With this solution, Kinney Group and Pure Storage together have developed results

that will impact how Splunk solutions are designed, deployed, and managed.

FlashStack dramatically improves simplicity and performance while lowering the

Total Cost of Ownership (TCO) of enterprise Splunk deployments. These results will

help to articulate and provide solutions to the challenges that many Splunk deployments

currently suffer from.

Many large, enterprise, flagship Splunk customers are able to achieve massive

throughput of data on their compute layer, but doing so involves greater complexity and

cost, which is prohibitive to all but the biggest Splunk deployments. When we started the

Pure FlashStack Splunk testing, we wanted to see if we could achieve similar results, but

without the IO bottlenecks and other issues seen in lower cost deployments. In this paper

we will outline our testing methodologies, how to apply those to real-world scenarios, and

what to expect from FlashStack running a Splunk solution.

The results were dramatically different from those of traditional Splunk reference architectures.

In fact, Eventgen capacity ran out before hitting the daily ingest throughput ceiling,

meaning that the storage was keeping up with the heavy load. We also were able to run

the solution virtualized since the FlashStack offering is so highly performant. Leveraging

a virtualization layer revealed the possibility of moving High Availability (HA) and site-to-

site redundancy back to the infrastructure layer, instead of solving it as part of the Splunk

software offering (i.e., Indexer Replication). VMware and Pure Storage capabilities are
where HA and redundancy are already handled for mission-critical apps across the
organization. By handling HA and redundancy there, we were able to drastically reduce
storage costs and further decrease compute needs for Splunk’s indexing layer, which in
turn reduces cost and complexity.

SOLUTION OVERVIEW

The FlashStack solution for Splunk provides an end-to-end architecture with Cisco UCS,

Splunk, VMware, and Pure Storage technologies to tackle the challenges of Security

Information & Event Management and Operational Intelligence head on. The solution

requires a combination of highly-available computing power, sub-millisecond I/O latency,

data collection, high-speed indexing, real-time aggregation, and powerful search,

analysis, and visualization capabilities – which are handily addressed by the FlashStack

solution for Splunk.

SPLUNK ENTERPRISE

Splunk Enterprise is a platform for machine data. Machine data contains a definitive

record of all the activity and behavior of your customers, users, transactions, applications,

servers, networks, and mobile devices. And it's more than just logs: it includes

configurations, data from APIs, message queues, change events, the output of diagnostic

commands, call detail records, sensor data from industrial systems, and more.

Machine data comes in an array of unpredictable formats and the traditional set of

monitoring and analysis tools were not designed for the variety, velocity, volume, or

variability of this data. A new approach, one specifically architected for this unique class

of data, is required to quickly diagnose service problems, detect sophisticated security

threats, understand the health and performance of remote equipment, and demonstrate

compliance.

Splunk Enterprise scales to hundreds of terabytes per day to meet the needs of

any organization, and supports clustering, high availability and disaster recovery

configurations. All of this while keeping your data secure with role-based access controls,

secure data handling, auditability, and assurance of data integrity.

TESTING METHODOLOGY

When testing Splunk on the FlashStack architecture, we wanted not only to do the

standard testing that other Splunk Reference Hardware typically utilizes, but also to

develop real world data and use case scenarios. The typical Splunk testing frameworks

focus mainly on theoretical maximums, but the results captured and included here allow

for side-by-side comparison of the testing strategies on the solution. Also, by performing

testing with real world workloads, Splunk architects can size their actual workloads using

our Sizing Guidelines and be able to plan with confidence for how FlashStack will handle

their Splunk Workloads.

“Theoretical Maximum” Testing Methodologies Used:

• Bonnie++

• SplunkIT

“Actual, Real-World Use Cases” Testing Methodologies Used:

1. Core Use at Scale for Security, Compliance, and IT Operations Analytics (ITOA) with

typical Core Search Loads.

2. Splunk Enterprise Security at Scale for Security Use Cases with typical Enterprise

Security Data Model Acceleration and Correlation Search Loads.

Details of the exact FlashStack configurations, workload setup and execution, and

Splunk configurations are noted below.

PURE + SPLUNK DIFFERENTIATORS

Some of the compelling differentiators for Splunk + FlashStack include the following:

1. Performance Benchmarks that dramatically reduce your compute footprint in terms

of TB/day and compute hardware required.

2. By leveraging native Pure + VMware HA features, the solution negates the need for

Indexer Clustering and site-to-site Indexer Clustering.

3. Pre-built Kinney Group Cisco UCS Director Workflows

SOLUTION DESIGN

Transcending the conventional model of bare metal installs for Splunk, the FlashStack

solution for Splunk involves all virtual machines. Apart from the benefits of server

consolidations, rapid deployment & provisioning, and ease of management that VMs

provide, the primary reason for choosing virtual machines is to avoid the overhead of

index clustering. The solution uses VMware vSphere 6, which provides efficient DRS

(Dynamic Resource Scheduling) & HA (High Availability) features to migrate the VMs to

other hosts in the cluster in case of a host failure. The ability to virtualize has removed

the need for indexer clustering as the data is always available on the shared storage,

and the supporting virtual machine can be migrated to another host with minimal time for

indexing and searching but with no index data loss. For multi-site failover, we recommend

pushing the site-to-site replication to the storage layer where it traditionally resides

for other mission critical workloads. This again drives out complexity with the Splunk

config and provides further compute workload savings at the indexing layer by removing

replication overhead.

Pure Storage FlashStack consists of a combined stack of hardware (storage, network, and

compute) and software (Cisco UCS Manager, Splunk Core & ES, Pure Storage GUI, Purity,

Red Hat Enterprise Linux).

The following diagram shows the FlashStack for Splunk physical architecture.

[Figure: FlashStack for Splunk physical architecture, showing the Pure Storage FlashArray //m50, redundant Cisco MDS 9148S fabric switches (16G FC connectivity), Cisco UCS 6248UP Fabric Interconnects (8G FC connectivity), a Cisco UCS 5108 chassis with converged interconnect and 10GbE uplinks, and redundant Cisco Nexus 9396PX switches (vPC) providing 10/40 GbE connectivity to the WAN/spine network.]

SPLUNK REFERENCE ARCHITECTURE

[Figure: Logical Architecture Diagram]

SPLUNK SETUP DETAILS

In the setup, one SearchHead was sized for both Core and Enterprise Security

workloads. The testing included as many indexers as the setup could support, while

supporting enough EventGenerators to push a maximum amount of data. The Splunk

EventGenerator was moved outside of Splunk Core so that it would be able to push

up to 30TB per day through the environment. The findings show that 30TB/day was

the maximum one could push through this setup before EventGenerators became

the bottleneck. Also, at 30TB/day, we could still leave room for an appropriately sized

indexing layer.

Using the Kinney Group custom EventGenerator, the solution gained the ability to push

real-world logs through the environment, which included: Windows Event Logs, Apache

Logs, McAfee AV, Cisco ASA, and Cisco ESA. As for Core Workloads, various sparse and

dense searches were used, as detailed below. For Enterprise Security, over 20 correlation
searches were turned on, and as many DataModels were accelerated as the available
data could support.

To assess performance and to understand whether the system could handle the workload
pushed at it, it was important to monitor compute utilization at the Splunk indexing and
search layers. The sizes of the various Splunk queues were monitored throughout the
environment; queue sizes are a good indicator of whether the Splunk infrastructure has
reached maximum capacity. For Enterprise Security, we also monitored whether the data
models and correlation searches were able to keep up with the volume of data being
ingested.
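Queue fill levels can be watched from Splunk's internal metrics log. The sketch below is one common form of that search; the field names follow metrics.log conventions and should be verified against your Splunk version:

```spl
index=_internal source=*metrics.log* group=queue
| eval fill_pct=round((current_size_kb / max_size_kb) * 100, 2)
| timechart avg(fill_pct) by name
```

Sustained fill percentages near 100% on the parsing or indexing queues indicate that the indexing tier has reached maximum capacity.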

Searches Used for Core Workloads

Cisco Dense Search

• index=cisco src_ip="10.123.175.128"

• Run once a minute

• -15m to now

• Schedule Window: 0

Cisco Needle Search, Last 24 Hours

• index=cisco src_ip="10.123.175.128" action=allowed dest_port=80 dest_ip="129.168.154.20"

• Run once a minute

• -24h@h to now

• Schedule Window: 0

McAfee Dense Search, 15 Min

• index=av

• Run Once a Minute

• -15m to now

• Schedule Window: 0
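Scheduled searches like these are typically defined in savedsearches.conf. A minimal sketch for the Cisco dense search above follows; the stanza name is an assumption for illustration, and key names should be verified against the Splunk documentation for your version:

```ini
[Cisco Dense Search]
enableSched = 1
cron_schedule = * * * * *
dispatch.earliest_time = -15m
dispatch.latest_time = now
schedule_window = 0
search = index=cisco src_ip="10.123.175.128"
```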

Enterprise Security: DataModels Accelerated

20 Accelerated Data Models

• Application State

• Authentication

• Certificates

• Change Analysis

• Domain Analysis

• Email

• Incident Management

• Intrusion Detection

• Malware

• Network Resolution (DNS)

• Network Sessions

• Network Traffic

• Performance

• Risk Analysis

• Splunk Audit Logs

• Threat Intelligence

• Ticket Management

• Updates

• Vulnerabilities

• Web

Enterprise Security: Correlation Searches Turned On

22 Enabled Correlation Searches

• Account Deleted
• Activity From Expired User Identity
• Brute Force Access Behavior Detected
• Brute Force Access Behavior Detected Over One Day
• Completely Inactive Account
• Concurrent Login Attempts
• Default Account Activity Detected
• Default Account At Rest Detected
• Excessive DNS Failures
• Excessive Failed Logins
• High Number of Hosts Not Updating Malware Signatures
• High Number of Infected Hosts
• High Or Critical Priority Host with Malware Detected
• Host With A Recurring Malware Infection
• Host With High Number Of Listening Ports
• Host With Multiple Infections
• Host With Old Infection Or Potential Re-Infection
• Personally Identifiable Information Detected
• Short-lived Account Detected
• Substantial Increase In Port Activity
• Threat Activity Detected
• Unusual Volume of Network Activity

VMware ESX Host Settings

PURPOSE                              COUNT   CPU        MEMORY
Splunk ES Reference Architecture     6       28 cores   256 GB
Event Generators (Load generators)   2       28 cores   256 GB

VMware Virtual Machine Settings

PURPOSE             COUNT   CPU        MEMORY
Search Head         1       24 vCPUs   128 GB
Indexers            10      12 vCPUs   64 GB
Deployment Server   1       4 vCPUs    8 GB
Event Generators    4       14 vCPUs   128 GB

VMware Storage Settings

The indexer data was placed on VMFS datastores, which support vMotion functionality.
Five Pure Storage volumes/LUNs were provisioned and attached to the Splunk ESX hosts,
and five VMFS datastores were created. Each VMFS datastore serves two indexers.

DATASTORE NAME TYPE PROVISIONED

ds-splunkidx-data01 VMFS5 10 TB

ds-splunkidx-data02 VMFS5 10 TB

ds-splunkidx-data03 VMFS5 10 TB

ds-splunkidx-data04 VMFS5 10 TB

ds-splunkidx-data05 VMFS5 10 TB

GENERAL SIZING GUIDELINES

To size this environment, a minimum of three compute nodes are needed to fully realize

the HA benefits with VMware server virtualization. For those compute nodes, we

needed to allocate a minimum of three indexers. According to the testing, those three

indexers will yield around 4.5TB per day of daily ingest for both Core and ES workloads.

The testing determined that each indexer can handle around 1.5TB/day, regardless

of workload. For a Core workload, we observed performance of up to 3.3TB/day per

indexer. The recommendation is to have at least a 2:1 ratio, with a preference for 4:1, of

RAM:CPU in the Cisco layer. This is because we observed better performance at high
throughput when we increased the RAM footprint per server, even at lower
compute levels. Still, additional elements are needed, like a Deployment Server and

SearchHead Clustering, especially with ES. When the indexer nodes were pushed to an

appropriate level, the Search Layer struggled to keep up on datamodel building, and the

resource utilization on the SearchHead compute reached maximum capacity. We believe

that SearchHead clustering will address that issue and allow one to keep raising the

ceiling on ingest throughput without compromising visibility, speed, or performance.
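The guidance above can be turned into a quick back-of-the-envelope estimate. The sketch below is illustrative only: the per-indexer throughput figures and the three-node minimum come from the guidelines above, while the function itself and the example workloads are assumptions for demonstration:

```python
import math

# Per-indexer daily ingest observed in this testing (TB/day).
ES_TB_PER_INDEXER = 1.5    # combined Core + Enterprise Security workload
CORE_TB_PER_INDEXER = 3.3  # Core-only workload (best case observed)

def indexers_needed(daily_ingest_tb, tb_per_indexer=ES_TB_PER_INDEXER, minimum=3):
    """Estimate the indexer count, honoring the three-node minimum
    recommended to fully realize the VMware HA benefits."""
    return max(minimum, math.ceil(daily_ingest_tb / tb_per_indexer))

# A 9 TB/day ES workload needs about 6 indexers; a 2 TB/day workload
# still gets the 3-node floor.
print(indexers_needed(9))  # 6
print(indexers_needed(2))  # 3
```

For a Core-only workload, passing `tb_per_indexer=CORE_TB_PER_INDEXER` applies the higher observed ceiling instead.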

SPLUNK STORAGE SIZING

Based on real-world testing, we found that Pure Storage Arrays can conservatively

support a compression ratio of 2:1 with a normal Splunk workload. Because HA can

be handled in the virtualization and storage layers, there is never a need for Indexer

Clustering, cutting one’s overall storage needs in half or better and still providing

a mission-critical level Splunk deployment. If site-to-site replication is required, we

recommend leveraging Pure’s asynchronous replication capabilities at the storage layer

to achieve effective replication. This will allow Splunk to follow the same HA and site

redundancy architectures that have been leveraged by all mission-critical applications for

years. After factoring in the appropriate compression ratio, one can use Splunk’s normal

storage benchmarks of 50% compression on raw. Then, the storage amount will need to

meet the retention requirements (90 days, 6 months, 1 year, etc.).
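Those rules of thumb combine into a simple capacity estimate. The function below is an illustrative sketch under the stated assumptions (50% Splunk compression on raw data, a conservative 2:1 array data reduction, and a single copy of the indexes since Indexer Clustering is not used):

```python
def physical_storage_tb(daily_raw_tb, retention_days,
                        splunk_compression=0.5, array_reduction=2.0):
    """Estimate physical flash capacity for one copy of the indexes.

    splunk_compression: indexed size as a fraction of raw data
    (Splunk's usual ~50% benchmark). array_reduction: the conservative
    2:1 data reduction observed on Pure Storage for Splunk workloads.
    No replication multiplier is applied, since HA lives in the
    virtualization and storage layers in this design.
    """
    logical_tb = daily_raw_tb * splunk_compression * retention_days
    return logical_tb / array_reduction

# Example: 1 TB/day of raw data retained for 90 days
# -> 45 TB logical, ~22.5 TB of physical flash.
print(physical_storage_tb(1, 90))  # 22.5
```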

VALIDATION & BENCHMARKING

HOW TO INTERPRET THE RESULTS

When reviewing these results, use them as backing material when sizing a FlashStack

Splunk instance using the General Sizing Guidelines as noted above. This testing allows

one to understand, with confidence, what the reference design is capable of, given

both common workloads of Core and Enterprise Security. In sizing for larger or smaller

environments, both the general sizing guidelines and the test results below can be used

to size the system appropriately.

RESULTS

BONNIE++

Even though Bonnie++ is the suggested benchmark tool by Splunk to validate storage

requirements, it performs writes and rewrites to data that have similar patterns that

are easily dedupable and compressible with Pure Storage FlashArray. Hence, the

bandwidth one would see on FlashArray using Bonnie++ might seem artificially higher

than a production Splunk environment. To avoid confusion and inflated results we also

performed real Splunk Core and ES testing to convince readers (and ourselves) that the

FlashStack Solution for Splunk can indeed perform at a higher caliber and efficiency than

other storage platforms that have been tested.

# OF INDEXERS   PUT BLOCK (MBPS)   REWRITE (MBPS)   GET BLOCK (MBPS)   RANDOM SEEK (REQUESTS/SEC)
2               4,304              1,359            2,451              29,534
4               6,092              2,336            4,272              46,267
8               7,381              3,461            6,352              72,584

[Figure: Bar charts of PUT_BLOCK, REWRITE, and GET_BLOCK throughput (MBps) and random seeks (requests/sec) by indexer node count, plotting the values in the table above.]

SPLUNKIT

For SplunkIT testing, a single Splunk instance was used, configured on a virtual machine
with 16 vCPUs and 128 GB of memory.

The static_filesize_gb parameter was set to 100 to create 100 GB worth of ingest data,
which was then indexed and used in searches.

The SplunkIT tool can be downloaded from the Splunk Apps store at the following url:

https://splunkbase.splunk.com/app/749/

[Figure: Indexing Results]

[Figure: Search Results]

LARGE CORE CONFIGURATION

VARIABLES            DETAILS
Reference Hardware   8 UCS B200-M4 blades hosting ESX 6.0, comprised of:
                     1 Search Head, 1 Deployment Server, 10 Indexers,
                     4 Event Generators
Test Scenario        Core Install with a Variety of Searches
Data Types           Even mix of the following data types:
                     • Apache
                     • Windows EventLogs
                     • McAfee AV
                     • Cisco ASA
                     • Cisco ESA
Search Load          Variety of Sparse and Dense Searches running at
                     60-second intervals

Results:
Indexer Count        2      4       6       8       10
Data Volume          6TB    12TB    18TB    24TB    30TB

Summary
At 30TB, the Indexer Compute resources were still healthy and the queues were
not filled. See Compute and Queue health details below. 30TB was the maximum
we could push with our current configuration before our Event Generators
became the bottleneck.

LARGE ES CONFIGURATION

VARIABLES            DETAILS
Reference Hardware   6 UCS B200-M4 blades hosting ESX 6.0, comprised of the
                     following virtual machines: 1 Search Head,
                     1 Deployment Server, 10 Indexers
Data Load            2 UCS B200-M4 blades hosting ESX 6.0, comprised of the
                     following virtual machines: 4 Event Generators
Test Scenario        Core Install with a Variety of Searches
Data Types           Even mix of the following data types:
                     • Apache
                     • Windows EventLogs
                     • McAfee AV
                     • Cisco ASA
                     • Cisco ESA
Search Load          22¹ of 60 Correlation Searches running at normal ES
                     workload intervals

Results:
Indexer Count        2      4      6      8      10
Data Volume          3TB    6TB    9TB    12TB   15TB

Summary
At 15TB, the Indexer Compute resources were still healthy and the queues were
not filled. See Compute and Queue health details below. 15TB was the maximum
we could push with our current configuration before our Event Generators
became the bottleneck.

Recommendation
Recommend adding Indexer resources, as well as considering SearchHead
Clustering, for ES installations larger than 15TB to spread out the load.

¹ 22 correlation searches is typical for large ES deployments.

Indexing Performance

The following screenshots show the various Indexer queue utilizations during the run.

[Screenshot: Indexer queue utilization during the run]

[Screenshot: CPU utilization of the search head and 10 indexers during the run]

Search Metrics

The following is a snapshot of an indexer’s data pipeline view as reported by Splunk.

[Screenshot: Indexer data pipeline view]

CONCLUSION

To summarize the findings contained in this paper, the Pure

FlashArray + FlashStack reference design by Kinney Group,

Pure Storage, and Splunk gives an organization at least five

times more efficiency at the compute layer, as well as ten times

greater efficiencies in rackspace, power, heating, and cooling

when compared to an equally performant disk based solution.

Additionally, FlashStack removes the need for Indexer

Clustering because of storage replication, HA at the

virtualization layer, and pushing site-to-site replication into the

storage layer – thereby reducing the storage required by at

least two times, which translates to a dramatic cost savings.

By using the highly performant storage found in Pure’s Purity
Operating System and all-flash array, paired with the proven
FlashStack solution offering, organizations can leverage Splunk
at scale much more quickly. This approach will empower

organizations to manage large Splunk instances as they march

toward the analytics-driven, software-defined enterprise.

REFERENCES

• Splunk Documentation: https://docs.splunk.com/

• SplunkIT Toolkit: https://splunkbase.splunk.com/app/749/

• EventGen app: https://splunkbase.splunk.com/app/1924/

ABOUT THE AUTHORS

LAURA VETTER

Laura Vetter is the VP of IT Operations Analytics at Kinney

Group, Inc. and she is one of the most influential leaders

at the company. After graduating from Indiana University

in 1997, Laura built a professional foundation in database

and software engineering. Fast-forward to today, Laura

is a Splunk Certified Consultant II and holds a CompTIA

Security+ certification. She has been a critical driver of

Kinney Group’s technical capabilities and has written

numerous customer success stories. Her combination of

natural intelligence, work ethic, and expertise has earned

her tremendous respect with customers, partners, and

colleagues alike.

SOMU RAJARATHINAM

Somu Rajarathinam is the Pure Storage Solutions Architect

responsible for defining database solutions based on

the company’s products, performing benchmarks, and

developing reference architectures for databases on Pure.

Somu has over 20 years of database experience, including

as a member of Oracle Corporation’s Systems Performance

and Oracle Applications Performance Groups. His career

has also included assignments with Logitech, Inspirage,

and Autodesk, ranging from providing database and

performance solutions to managing Infrastructure, to

delivering database and application support, including

Splunk, both in-house and in the cloud.

APPENDIX A: REFERENCE ARCHITECTURE CONFIGURATION

The following table shows the bill of materials used to build the Splunk FlashStack

reference architecture.

VENDOR         NAME                   VERSION/MODEL         DESCRIPTION                            QTY
Cisco          Ethernet Switch        Cisco Nexus 9396-PX   10/40 Gigabit Ethernet switch          2
Cisco          Fabric Switch          Cisco MDS 9148S       Fabric switch for connectivity         2
                                                            between storage & servers
Cisco          Fabric Interconnect    Cisco UCS 6248UP      Interconnect between UCS chassis       2
                                                            & switch fabric
Cisco          Blade Server Chassis   UCS 5108              Blade chassis that can hold up to      2
                                                            8 blades
Cisco          UCS Blade Servers      UCSB-B200-M4          UCS B-Series blade servers             9²
Cisco          Cisco UCS VIC 1350     UCS-IOM-2208XP        Cisco UCS VIC 1350 modular LOM         8
                                                            for blade servers
Pure Storage   Pure FlashArray        HW: //m50             Pure Storage FlashArray                1
                                      SW: Purity 4.7.4

² Two of the ESX hosts were used for Event Generators, and one server was set up as a spare.

SERVER CONFIGURATION

Two chassis with 8 Intel CPU based Cisco UCS B-series B200 M4 blade servers were

deployed for hosting the cluster of ESXi hosts that housed the Splunk virtual machines.

The setup included two dedicated ESX hosts housing four Event Generators, which
might not be required for a production setup but can be repurposed as Forwarders. To

account for any ESX host failure that could impact the availability of the indexers or

Search Head, the recommendation is to add an additional ESX host (N+1) as a spare to the

setup. Server configuration is described in the following table.

COMPONENT   DESCRIPTION
Processor   2 x Intel Xeon E5-2697 v3 2.6 GHz (2 CPUs with 14 cores each, 28 cores in total)
Memory      256 GB @ 2.1 GHz (8 x 32 GB)
HBA         4 x 10G ports on Cisco UCS VIC 1350 (UCS-IOM-2208XP), 40 Gbps
NIC         2 x 10G ports on Cisco UCS VIC 1350
BIOS        Turbo Boost, Hyper-Threading, Virtualization Technology (VT),
            VT for Directed IO, and Intel ATS support were enabled

STORAGE CONFIGURATION

The FlashStack design for Splunk includes an //m50 FlashArray for increased scalability

and throughput. Based on capacity needs, the lower-end (//m20) or higher-end (//m70) of

the //m series can be used.

There are no special configuration or value changes from a normal configuration, and
there are no performance knobs to tune on the FlashArray. The hosts are redundantly
connected to the controllers, with four connections to each controller from two redundant
HBAs on each host over the FC protocol, for a total of sixteen logical paths. The table

below shows the components of the array.

COMPONENT      DESCRIPTION
FlashArray     //m50
Capacity       20 TB raw (base chassis); 11.17 TB usable
Connectivity   8 x 16 Gb/s Fibre Channel; 1 Gb/s redundant Ethernet (management port)
Physical       3U (5.12" x 18.94" x 29.72" FlashArray//m chassis)
O.S. Version   Purity 4.7.4

Zoning was performed on the Cisco MDS 9148S switches to allow Pure Storage

FlashArray//m to see the initiators. The following table lists various operating systems and

the versions used in this solution.

OPERATING SYSTEM AND SOFTWARE   DESCRIPTION
Red Hat Linux                   7.2 (3.10.0-327.el7.x86_64), 64-bit
Splunk                          Splunk 6.4.2 (build 00f5bb3fa822)
Purity O.S.                     4.7.4
Cisco UCS Manager               2.2(5b)

APPENDIX B: BONNIE++

Splunk is demanding on two critical infrastructure components: disk and CPU/RAM (aka

compute). Splunk provides guidelines on hardware requirements for Splunk instances

running on any hardware. For better indexing and to support more searches, Splunk

suggests a minimum number of IO operations per second (IOPS) and recommends the

customer run a Bonnie++ benchmark to validate if their storage system can meet the

requirements. Most of the storage vendors who are providing a reference architecture

for Splunk have performed and published their specific Bonnie++ benchmark results.

Interestingly, the Bonnie++ numbers recommended by Splunk were based on 7,200 rpm or 15,000 rpm spinning disks; with the advent of SSD drives and all-flash arrays, these numbers can easily be met. We performed the Bonnie++ benchmark to provide comparison numbers against other vendors, but we strongly recommend performing real testing with Splunk (as outlined in the rest of this document) rather than relying on Bonnie++ numbers alone.

Bonnie++ performs a mix of sequential writes, rewrites, and reads along with file creation

and deletion. Bonnie++ was run across 2, 4, 8, and 10 virtual machines using the following

command.

$ bonnie++ -d /b01 -s 128G -u root:root -qf > bonnie-`hostname`.csv

The /b01 location is the block storage that was carved out of the VMFS datastore.

The -s option was set to double the size of host memory to avoid any host-level caching.

The CSV output files were captured, consolidated per run, and summarized.

Following are the key metrics reported by Bonnie++.

• Put Block (sequential output, block, KB/s), which affects the data ingest rate

• Rewrite (sequential output, rewrite, KB/s), which affects index performance

• Get Block (sequential input, block, KB/s), which affects index performance

• Random Seek (requests/sec), which affects search performance
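To summarize these metrics across hosts, the per-host CSV files can be consolidated with a short awk pass. The sketch below is hedged: the field position of the Put Block metric (field 7 here) and the sample rows are assumptions, so verify them against the CSV layout your Bonnie++ version actually emits.

```shell
# Hedged sketch: consolidate per-host Bonnie++ CSV files and average one
# metric. The Put Block column position (field 7) and the sample rows
# below are assumptions -- check your Bonnie++ version's CSV layout.
set -eu

# Stand-ins for the real bonnie-<hostname>.csv files produced by -qf.
printf '1.97,1.97,host1,1,128G,524288,410000,98,310000\n' > bonnie-host1.csv
printf '1.97,1.97,host2,1,128G,524288,420000,97,305000\n' > bonnie-host2.csv

# Average the assumed Put Block column (KB/s) across all hosts.
awk -F, '{ sum += $7; n++ } END { printf "avg put-block KB/s: %d\n", sum / n }' bonnie-host*.csv
```

The same awk body can be re-pointed at any other column (rewrite, get block, seeks) to build the per-run summary table.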


APPENDIX C: INSTALLATION CONSIDERATIONS AND BEST PRACTICES

SERVER DEPLOYMENT

One of Splunk's salient features is the software itself: the same package can perform various roles (search head, indexer, deployment server, etc.) without requiring different software to be installed; the server roles are defined purely by configuration. Hence, to save installation time, a VM template was created with all software components (RHEL, Splunk) installed but no roles defined, and 16 clones were made from it.

AUTOMATION

Various frameworks are available to automate server provisioning and management. Virtual machine installation can be automated through PowerCLI for vSphere ESXi, Chef/Puppet, or VMware vRealize Automation (vRA). We used PowerCLI for vSphere to perform operational activities such as VM configuration changes, startup, and shutdown.
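As one hedged illustration of the clone-and-start workflow outside PowerCLI, the open-source govc CLI for vSphere can be scripted from a shell. The template and VM names below are hypothetical, and DRYRUN defaults to echo so the commands are only printed; for a real run, configure the GOVC_URL credentials and set DRYRUN to empty.

```shell
# Hedged sketch: clone indexer VMs from a prepared template and power
# them on via govc. Names (splunk-template, idx01..) are illustrative.
# DRYRUN defaults to echo, so this only prints the commands; set
# DRYRUN= (empty) after exporting GOVC_URL etc. to execute for real.
DRYRUN=${DRYRUN-echo}
for i in 01 02 03 04; do
  $DRYRUN govc vm.clone -vm splunk-template -on=false "idx$i"
  $DRYRUN govc vm.power -on "idx$i"
done
```

Whatever tool is used, cloning from a single role-less template keeps the indexer fleet uniform; roles are then assigned through Splunk configuration after first boot.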

Cluster Secure Shell for Mac (csshx), which enables issuing commands across multiple terminal clients at the same time, was also used. This is useful for applying configuration changes to all indexers at once. An alternative option on Windows is MobaXterm.
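Where an interactive fan-out tool is not available, the same broadcast effect can be scripted with xargs and ssh. The hostnames and Splunk path below are placeholders, and RUNNER defaults to echo so the sketch is a dry run; set RUNNER=ssh to execute against real indexers.

```shell
# Hedged sketch: broadcast one command to every indexer in parallel.
# Hostnames and the Splunk path are placeholders. RUNNER defaults to
# echo (dry run); set RUNNER=ssh to actually execute on the hosts.
RUNNER=${RUNNER-echo}
printf '%s\n' idx01 idx02 idx03 idx04 |
  xargs -P 4 -I {} $RUNNER {} /opt/splunk/bin/splunk status
```

Unlike an interactive multi-terminal session, this form is easy to embed in provisioning scripts and cron jobs.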

COMPRESSION

Splunk by default applies gzip compression to the raw data files after indexing. Gzip performs deep compression, but it also consumes more server-level compute cycles. Pure Storage FlashArray performs inline data reduction all the time and follows it with deeper, post-process data reduction. Even with Splunk's gzip compression, Pure FlashArray achieves between 1.5:1 and 2.2:1 data reduction. To get further reduction at the FlashArray level, we suggest using lz4 compression at the Splunk level, which also frees CPU cycles on the host.

# Make the following change in indexes.conf

journalCompression=lz4
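For context, journalCompression can be set per index stanza or once under [default] in indexes.conf. A hedged fragment is shown below; the index name and paths are illustrative only, not part of the tested configuration.

```ini
# Hypothetical indexes.conf fragment -- stanza name and paths are examples.
[default]
journalCompression = lz4

[main_logs]
homePath   = $SPLUNK_DB/main_logs/db
coldPath   = $SPLUNK_DB/main_logs/colddb
thawedPath = $SPLUNK_DB/main_logs/thaweddb
```

As with other indexes.conf changes, a restart (or rolling restart across an indexer cluster) is typically needed for the setting to take effect, and it applies to newly written data rather than rewriting existing buckets.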


© 2016 Pure Storage, Inc. All rights reserved. Pure Storage, Pure1, the P Logo, Evergreen, and FlashStack are

trademarks or registered trademarks of Pure Storage, Inc. in the U.S. and other countries. All other trademarks

are registered marks of their respective owners.

The Pure Storage products and programs described in this documentation are distributed under a license

agreement restricting the use, copying, distribution, and decompilation/reverse engineering of the products.

No part of this documentation may be reproduced in any form by any means without prior written authorization

from Pure Storage, Inc. and its licensors, if any. Pure Storage may make improvements and/or changes in the

Pure Storage products and/or the programs described in this documentation at any time without notice.

THIS DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS,

REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY,

FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE

EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. PURE STORAGE SHALL NOT

BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING,

PERFORMANCE, OR USE OF THIS DOCUMENTATION. THE INFORMATION CONTAINED IN THIS

DOCUMENTATION IS SUBJECT TO CHANGE WITHOUT NOTICE.


