Technical Report
NetApp E-Series E2800 and Splunk with SANtricity System Manager 11.30
Stephen Carl, NetApp
September 2016 | TR-4555
Abstract
This technical report describes the integrated architecture of the NetApp® E-Series E2800 all-
flash or hybrid storage system and Splunk design. Optimized for node storage balance,
reliability, performance, storage capacity, and density, this design employs the Splunk
clustered index node model, with higher scalability and lower TCO. This document
summarizes the performance test results obtained from a Splunk machine log event
simulation tool.
2 NetApp E-Series and Splunk © 2016 NetApp, Inc. All rights reserved.
TABLE OF CONTENTS
1 Introduction ........................................................................................................................................... 4
2 Splunk Use Cases ................................................................................................................................. 4
2.1 Use Cases ...................................................................................................................................................... 4
2.2 Architecture ..................................................................................................................................................... 5
3 NetApp E-Series Overview .................................................................................................................. 8
3.1 E-Series Hardware Overview .......................................................................................................................... 9
3.2 SANtricity ...................................................................................................................................................... 11
3.5 Performance ................................................................................................................................................. 16
4 NetApp and E-Series Testing ............................................................................................................ 17
4.1 Overview of Splunk Cluster Testing Used for E-Series Compared to Commodity Server DAS .................... 18
4.2 Eventgen Data .............................................................................................................................................. 18
4.3 Cluster Replication and Searchable Copies Factor ....................................................................................... 19
4.4 Commodity Server with Internal DAS Baseline Test Setup ........................................................................... 19
4.5 E-Series with DDP Baseline Test Setup ....................................................................................................... 19
4.6 Baseline Test Results for E-Series Compared Commodity Servers with Internal DAS ................................. 20
4.7 Search Results for Baseline Tests ................................................................................................................ 21
5 Summary ............................................................................................................................................. 25
6 Appendixes ......................................................................................................................................... 25
Splunk Apps for NetApp ........................................................................................................................................ 25
Splunk Cluster Indexer Rate from Distributed Management Console .................................................................... 27
Splunk Cluster Server Information ......................................................................................................................... 27
Splunk Cluster Indexer Bucket Information from Distributed Management Console ............................................. 28
E2800 SANtricity 11.30 Storage Hierarchy ............................................................................................................ 29
References ................................................................................................................................................. 29
Version History ......................................................................................................................................... 30
LIST OF TABLES
Table 1) E2800 controller shelf and drive shelf models. ................................................................................................. 9
Table 2) Supported drive types in SAS 3 enclosures. .................................................................................................. 10
Table 3) Drive shelf options for E2800. ........................................................................................................................ 10
Table 4) Splunk cluster server hardware. ..................................................................................................................... 18
Table 5) Index peer node mounted LUNs..................................................................................................................... 27
LIST OF FIGURES
Figure 1) Splunk cluster server components. ................................................................................................................. 6
Figure 2) Basic Splunk cluster configuration. ................................................................................................................. 6
Figure 3) Distribution of data in a five-node Splunk cluster. ........................................................................................... 8
Figure 4) E2800 shelf options (duplex configurations shown). ....................................................................................... 9
Figure 5) SANtricity Storage Manager 11.30 management environment. .................................................................... 12
Figure 6) Managing a mixed-array environment with SANtricity Storage Manager and System Manager. .................. 13
Figure 7) System Manager home page. ....................................................................................................................... 14
Figure 8) Dynamic Disk Pools components. ................................................................................................................. 15
Figure 9) Dynamic Disk Pools drive failure. .................................................................................................................. 16
Figure 10) Performance of the E2800 ......................................................................................................................... 17
Figure 11) Commodity server Splunk cluster with DAS. ............................................................................................... 19
Figure 12) Splunk cluster with E-Series DDP. .............................................................................................................. 20
Figure 13) Index peer node ingest rates. ...................................................................................................................... 21
Figure 14) Dense static searching comparison. ........................................................................................................... 22
Figure 15) Rare static searching comparison ............................................................................................................... 22
Figure 16) E2800 controller failure index rate ............................................................................................................... 23
Figure 17) E2800 controller failure rare searches ........................................................................................................ 24
Figure 18) E2800 controller failure dense searches test. ............................................................................................. 24
Figure 19) Splunk app NetApp SANtricity Performance App for Splunk Enterprise. .................................................... 26
Figure 20) index.conf to limit maximum warm buckets ................................................................................................. 27
Figure 21) index.conf from splunk.com capacity configurator (7-day hot/warm) ........................................................... 27
Figure 22) index.conf volume settings .......................................................................................................................... 28
Figure 23) Indexer bucket data from DMC—E2800 system ......................................................................................... 28
Figure 24) SANtricity 11.30 storage hierarchy .............................................................................................................. 29
1 Introduction
NetApp E-Series enables Splunk environments to maintain the highest levels of performance and uptime
for Splunk workloads by providing advanced fault recovery features and easy in-service growth
capabilities to meet ever-changing business requirements. The E-Series is designed to handle the most
extreme application workloads with very low latency. Typical use cases include application acceleration;
improving the response time of latency-sensitive applications; and improving the power, environmental,
and capacity efficiency of overprovisioned environments. E-Series storage systems leverage the latest
solid-state-disk (SSD) and SAS drive technologies and are built on a long heritage of serving diverse
workloads to provide superior business value and enterprise-class reliability.
Splunk is the leading operational intelligence software that enables you to monitor, report, and analyze
live streaming and historical machine-generated data, whether it is on the premises or in the cloud. An
organization’s IT data is a definitive source of intelligence because it is a categorical record of activity and
behavior, including user transactions, customer behavior, machine behavior, security threats, and
fraudulent activity. Splunk helps users gain visibility into this machine data to improve service levels,
reduce IT operations costs, mitigate security risks, enable compliance, and create new product and
service offerings. Splunk offers solutions for IT operations, applications management, security and
compliance, business analytics, and industrial data.
The NetApp E2800 for this test used NetApp SANtricity® release 11.30; you can find more information on
this release here. All of the testing was also done on the new version of Splunk, version 6.4.2. More
information on Splunk version 6.4.2 is available here.
2 Splunk Use Cases
All of your IT applications, systems, and technology infrastructure generate data every millisecond of
every day. This machine data is one of the fastest growing and most complex areas of big data. Splunk
collects all of your data sources—streaming and historical—by using a technology called universal
indexing. Splunk is scalable enough to work across all of your data centers, and it is powerful enough to
deliver real-time dashboard views to any level of the organization. However, using this data can be a
challenge for traditional data analysis, monitoring, and management solutions that are not designed for
large-volume, high-velocity diverse data.
Splunk offers a unique way to sift, distill, and understand these immense amounts of machine data that
can change how IT organizations manage, analyze, secure, and audit IT. Splunk enables users to
develop valuable insights into how to innovate and offer new services as well as into trends and customer
behaviors.
2.1 Use Cases
Splunk can be deployed in a wide variety of use cases, and it provides creative ways for users to gain
intelligence from data.
Application Delivery
Gain end-to-end visibility across distributed infrastructures, troubleshoot application environments,
monitor performance for degradation, and monitor transactions across distributed systems and
infrastructure.
Security, Compliance, and Fraud
Enable rapid incident response, real-time correlation, and in-depth monitoring across data sources.
Conduct statistical analysis for advanced pattern detection and threat defense.
Infrastructure and Operations Management
Proactively monitor across IT silos to enable uptime, rapidly pinpoint and resolve problems, identify
infrastructure service relationships, establish baselines, and create analytics to report on SLAs or track
service provider SLAs.
Business Analytics
Provide visibility and intelligence related to customers, services, and transactions. Recognize trends and
patterns in real time and provide valuable understanding of new product features’ impact on back-end
services. Gain insight into the user experience to increase user satisfaction, prevent drop-offs, improve
conversions, and boost online revenue.
2.2 Architecture
Splunk’s architecture provides linear scalability for indexing and distributed search. Splunk’s
implementation of MapReduce allows large-scale search, reporting, and alerting. Splunk takes a single
search and enables you to query many indexers in massively parallel clusters. With the addition of index
replication, you can specify how many copies of the data you want to make available to meet your
availability requirements.
The Splunk platform is open and has SDKs and APIs, including a REST API and SDKs for Python, Java,
JavaScript, PHP, Ruby, and C#. This capability enables developers to programmatically interface with the
Splunk platform. With Splunk you can develop your own applications or templates to deploy on your
infrastructure.
Using replication, Splunk can write additional copies of the indexed raw data files and metadata to
clustered servers so that the data remains available even after indexer node failures. Common Splunk
server components are shown in Figure 1.
Figure 1) Splunk cluster server components.
The common and recommended replication factor for Splunk running internal direct-attached storage
(DAS) is three. In this scenario, the minimum number of needed Splunk index servers is also three.
Figure 2 shows the basic Splunk cluster configuration.
Figure 2) Basic Splunk cluster configuration.
Machine log data sent from the Splunk forwarders to the indexer peer nodes uses the recommended
data replication factor of three, which keeps three copies of the data available. The ingested data is
compressed and indexed as raw data files and metadata that are then distributed among the indexer peer
nodes for redundancy. Figure 3 depicts the way that Splunk replicates data in a five-indexer cluster.
Figure 3) Distribution of data in a five-node Splunk cluster.
3 NetApp E-Series Overview
NetApp E-Series E2800 storage systems address wide-ranging data storage requirements with balanced
performance that is equally adept at handling large sequential I/O for video, analytical, and backup
applications and small random I/O for small and medium-sized enterprise mixed workloads. The E2800
brings together the following advantages:
Support for all-flash and hybrid drive configurations
Support for SSD read cache
Modular host interface flexibility (SAS, FC, and iSCSI)
High reliability (99.999%)
Intuitive management: simple administration for IT generalists, detailed drill-down for storage specialists
The new entry-level E2800 is a 12Gb SAS 3 system with SANtricity 11.30 software. The E2800
introduces new on-box management, including browser-based SANtricity System Manager 11.30, which
features the following new capabilities:
On-box web services
On-box SANtricity System Manager, with an easy-to-use graphical user interface
The ability to store and present up to 30 days of performance data, including I/O latency, IOPS, CPU utilization, and throughput
The ability to do application/workload tagging
Easier alert management, including an embedded SNMP agent and MIB
Embedded NetApp AutoSupport® functionality
Together, these features create an entry-level storage system with the flexibility and performance
capabilities to support enterprise workloads without sacrificing simplicity and efficiency. In addition, the
E2800 storage system’s fully redundant I/O paths, advanced protection features, and extensive
diagnostic capabilities deliver a high level of availability, data integrity, and security.
3.1 E-Series Hardware Overview
As shown in Table 1, the E2800 is available in two shelf options that support both hard-disk drives
(HDDs) and solid-state drives (SSDs) to meet a wide range of performance and application requirements.
Table 1) E2800 controller shelf and drive shelf models.
| Controller Shelf Model | Drive Shelf Model | Number of Drives | Type of Drives |
|---|---|---|---|
| E2824 | DE224C | 24 | 2.5” SAS drives (HDDs and SSDs) |
| E2812 | DE212C | 12 | 3.5” NL-SAS drives; 2.5” SAS SSD drives |
Both shelf options support one or two controller canisters, dual power supplies, and dual fan units for
redundancy (the shelves have an integrated power-fan canister). The shelves are sized to hold 24 drives
or 12 drives, as shown in Figure 4.
Note: In a duplex configuration, both controllers must be identically configured.
Figure 4) E2800 shelf options (duplex configurations shown).
Each E2800 controller provides two Ethernet management ports for out-of-band management and has
two 12Gbps (x4 lanes) wide-port SAS drive expansion ports for redundant drive expansion paths. The
E2800 controllers also include two built-in host ports, either two optical 16Gb FC/10Gb iSCSI ports or two
10Gb iSCSI RJ-45 ports. In addition, one of the following host interface cards (HICs) can be installed in each controller:
4-port 12Gb SAS (SAS 3 connector)
2-port 12Gb SAS (SAS 3 connector)
4-port optical HIC (SFP+), which can be configured as either 16Gb Fibre Channel or 10Gb iSCSI
2-port optical HIC (SFP+), which can be configured as either 16Gb Fibre Channel or 10Gb iSCSI
Note: A software feature pack can be applied in the field to change the host protocol of the optical baseboard ports and the optical HIC ports from FC to iSCSI or from iSCSI to FC.
2-port 10Gb iSCSI (Cat6e/Cat7 RJ45)
Note: If the base ports on the controller are configured with 10Gb iSCSI RJ-45, then the only HIC option supported is the 2-port 10Gb iSCSI (Cat6e/Cat7 RJ45).
For optical connections, the appropriate SFPs must be ordered for the specific implementation. Consult the Hardware Universe for a full listing of available host interface equipment.
For detailed instructions on changing the host protocol, go to the Upgrading > Hardware Upgrade section
at https://mysupport.netapp.com/eseries.
Table 2) Supported drive types in SAS 3 enclosures.
| Drive Shelf | NL-SAS | SAS | SSD |
|---|---|---|---|
| DE212C | 4TB, 6TB, 8TB, 10TB | — | 800GB, 1.6TB, 3.2TB |
| DE224C | — | 900GB, 1.2TB, 1.8TB | 800GB, 1.6TB, 3.2TB |
The E2800 controller shelf supports 12 and 24 drives based on the shelf model (DE212C or DE224C,
respectively), but the system capacity can be further expanded by adding additional expansion-drive
shelves to the controller shelf. The E2800 supports up to 4 total shelves, the controller shelf plus 3
expansion-drive shelves, for a maximum of 180 HDD (120 SSD) drives. Drive shelf options are shown in
Table 3.
Table 3) Drive shelf options for E2800.
| Property | DE212C | DE224C | DE1600 | DE5600 | DE6600 |
|---|---|---|---|---|---|
| Form Factor | 2U | 2U | 2U | 2U | 4U |
| Drive Size | 3.5”, 2.5” (w/bracket) | 2.5” | 3.5” | 2.5” | 3.5”, 2.5” (w/bracket) |
| Drive Types | NL-SAS, SSD | SAS, SSD | NL-SAS | SAS, SSD | SAS, NL-SAS, SSD |
| Total Drives | 12 | 24 | 12 | 24 | 60 |
| Drive Interface | 12Gb SAS | 12Gb SAS | 6Gb SAS | 6Gb SAS | 6Gb SAS |
Note: DE1600, DE5600, and DE6600 are supported only as part of in-place data migration from E2700/E5400/E5500/E5600 to E2800. For information on the hardware used in previous NetApp E-Series and Splunk testing, see TR-4460.
3.2 SANtricity
E2800 systems are managed by the SANtricity System Manager browser-based application. The E2800
controller and SANtricity 11.30 mark a milestone release for E-Series because they put into production a
new architecture for both controller firmware and management software. In particular, SANtricity System
Manager 11.30 is embedded on the controller.
The major components of SANtricity storage management software are still used with E2800-based
storage arrays, so the installation flow is similar. The only component that is not used with E2800-based
storage arrays is the Array Management Window, which was replaced by the embedded browser-based
System Manager. For more details and information on the E-Series E2800 storage system and
SANtricity 11.30, see TR-4538.
3.3 Overview
SANtricity System Manager provides embedded management software, web services, event monitoring,
and AutoSupport for the E2800 controller. Previous controllers such as the E2700, E5600, and EF560 do
not have this embedded functionality. Because you might have a mixed environment with both the new
E2800 storage array and older storage arrays, there are a variety of management options. Figure 5
shows a graphical representation of the new landscape and where the different management functions
occur.
Figure 5) SANtricity Storage Manager 11.30 management environment.
Figure 6 shows the mixed-array environment and how it is managed with SANtricity Storage Manager and
System Manager.
Figure 6) Managing a mixed-array environment with SANtricity Storage Manager and System Manager.
For a detailed description of installing and configuring the components you choose, refer to the
appropriate Power Guides for deployment.
3.4 System Manager Navigation
After you log in to System Manager, the home page is displayed, as shown in Figure 7.
The icons on the left of the home page are used to navigate through the System Manager pages and are available on all pages. The text can be toggled on and off.
The items on the top right of the page (Preferences, Help, Log Out) are also available at any location in System Manager.
Highlighted in the bottom right corner is the drop-down-style menu used extensively in System Manager.
Figure 7 shows the System Manager home page.
Figure 7) System Manager home page.
For more information on the home page, also see the E-Series Documentation Center.
Dynamic Capabilities
From a management perspective, SANtricity System Manager offers a number of capabilities to ease the
burden of storage management, including the following:
New volumes can be created and are immediately available for use by connected servers.
New RAID sets (volume groups) or Dynamic Disk Pools can be created at any time from unused disk devices.
Volumes, volume groups, and disk pools can all be expanded online as necessary to meet any new requirements for capacity or performance.
Dynamic RAID Migration allows the RAID level of a particular volume group to be modified online (for example, from RAID 10 to RAID 5) if new requirements dictate a change.
Flexible cache block and segment sizes allow optimized performance tuning based on a particular workload. Both items can also be modified online.
There is built-in performance monitoring of all major storage components, including controllers, volumes, volume groups, pools, and individual disk drives.
Automated remote connection to the NetApp AutoSupport function provides “phone home” capabilities and automated parts dispatch if a component fails.
The E2800 has path failover and load-balancing (if applicable) between the host and the redundant storage controllers.
You gain the ability to manage and monitor multiple E-Series storage systems from the same management interface.
Dynamic Disk Pools
With seven patents pending, the Dynamic Disk Pools (DDP) feature dynamically distributes data, spare
capacity, and protection information across a pool of disk drives. These pools can range from a minimum
of 11 drives to all the drives in an E2800 or E-Series storage system. In addition to creating a single DDP,
storage administrators can opt to create traditional volume groups in conjunction with a single DDP or
even multiple DDPs, which offers an unprecedented level of flexibility.
Dynamic Disk Pools are composed of several lower-level elements. The first of these is known as a D-
piece. A D-piece consists of a contiguous 512MB section from a physical disk that contains 4,096 128KB
segments. Within a pool, 10 D-pieces are selected from different drives using an intelligent optimization
algorithm. Together, the 10 associated D-pieces are considered a D-stripe, which is
4GB of usable capacity. Within the D-stripe, the contents are similar to a RAID 6 8+2 scenario. There, 8
of the underlying segments potentially contain user data, 1 segment contains parity (P) information
calculated from the user data segments, and the final segment contains the Q value as defined by RAID
6.
Volumes are then created from an aggregation of multiple 4GB D-stripes as required to satisfy the
defined volume size up to the maximum allowable volume size within a DDP. Figure 8 shows the
relationship between these data structures.
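The capacity arithmetic described above can be checked with a short sketch (our illustrative arithmetic, using the sizes stated in the text):

```python
# Sizes as described in the text (illustrative arithmetic only).
SEGMENT_KB = 128            # each segment is 128KB
SEGMENTS_PER_DPIECE = 4096  # a D-piece contains 4,096 segments
DPIECES_PER_DSTRIPE = 10    # 10 D-pieces form a D-stripe
DATA_DPIECES = 8            # RAID 6 8+2: 8 data pieces

parity_dpieces = DPIECES_PER_DSTRIPE - DATA_DPIECES     # 1 P + 1 Q, as in RAID 6
dpiece_mb = SEGMENT_KB * SEGMENTS_PER_DPIECE // 1024    # contiguous section per disk
usable_gb = DATA_DPIECES * dpiece_mb // 1024            # usable capacity per D-stripe

print(dpiece_mb)  # 512 -> each D-piece is a contiguous 512MB section
print(usable_gb)  # 4   -> each D-stripe provides 4GB of usable capacity
```

This confirms the figures in the text: a 512MB D-piece and 4GB of usable capacity per D-stripe, with 2 of the 10 D-pieces holding P and Q protection information.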
Figure 8) Dynamic Disk Pools components.
Another major benefit of a DDP is that, rather than using dedicated stranded hot spares, the pool contains
integrated preservation capacity to provide rebuild locations for potential drive failures. This benefit
simplifies management, because you no longer need to plan or manage individual hot spares. The
capability also greatly reduces rebuild times, when rebuilds are required, and improves the performance
of the volumes during a rebuild compared with traditional hot spares.
When a drive in a DDP fails, the D-pieces from the failed drive are reconstructed to potentially all other
drives in the pool using the same mechanism normally used by RAID 6. During this process, an algorithm
internal to the controller framework verifies that no single drive contains two D-pieces from the same D-
stripe. The individual D-pieces are reconstructed at the lowest available LBA range on the selected disk
drive.
Figure 9) Dynamic Disk Pools drive failure.
In Figure 9, above, disk drive 6 (D6) has failed. The D-pieces that previously resided on that disk are
re-created simultaneously across several other drives in the pool. Because multiple disks participate in
the effort, the overall performance impact of this situation is lessened and the length of time needed to
complete the operation is dramatically reduced.
In the event of multiple disk failures within a DDP, priority reconstruction is given to any D-stripes that are
missing two D-pieces to minimize any data availability risk. After those critically affected D-stripes are
reconstructed, the remainder of the necessary data continues to be reconstructed.
From a controller resource allocation perspective, there are two reconstruction priorities within a DDP that
the user can modify:
The degraded reconstruction priority is assigned for instances in which only a single D-piece must be rebuilt for the affected D-stripes; the default for this is high.
The critical reconstruction priority is assigned for instances in which a D-stripe has two missing D-pieces that need to be rebuilt; the default for this is highest.
For very large disk pools with two simultaneous disk failures, only a relatively small number of D-stripes
are likely to encounter the critical situation in which two D-pieces must be reconstructed. As discussed
previously, these critical D-pieces are identified and reconstructed initially at the highest priority. This
process returns the DDP to a degraded state very quickly so that further drive failures can be tolerated.
In addition to the improvement in rebuild times and superior data protection, DDP can also greatly
improve the performance of the base volume when under a failure condition compared with the
performance of traditional volume groups.
3.5 Performance
An E2800 configured with all SSDs, all HDDs, or a mixture of both is capable of performing at very high
levels, both in input/output operations per second (IOPS) and in throughput, while still providing extremely low latency.
The E2800, through its ease of management, high degree of reliability, and exceptional performance, can
meet the extreme performance requirements expected in a Splunk server cluster deployment.
An E2800 with 24 SSD drives can provide up to 300,000 4K random read IOPS at less than 1ms average
response time.
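As a rough sanity check (our back-of-the-envelope arithmetic, not a datasheet figure), that IOPS rate implies roughly 1.2GB/s of random-read throughput:

```python
# Throughput implied by the quoted random-read IOPS figure.
iops = 300_000
io_size_bytes = 4 * 1024  # 4K random reads

throughput_gb_s = iops * io_size_bytes / 1e9
print(round(throughput_gb_s, 2))  # ~1.23 GB/s of random-read bandwidth
```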
Many factors can affect the performance of the E2800, including different volume group types, the use of
DDP, the average I/O size, and the read versus write percentage provided by the attached server(s).
Figure 10 provides performance statistics across various data protection strategies under several generic
I/O workloads, including the E2800’s expected performance for DDP in this test.
Figure 10) Performance of the E2800.
4 NetApp and E-Series Testing
NetApp recently tested a simulated Splunk cluster environment in which the index peer node disks for
hot/warm and cold data buckets were configured on both E-Series storage and commodity server DAS.
This configuration enabled comparing the E-Series with commodity server DAS for the indexing and
search functions that a Splunk cluster requires. The server hardware was chosen following
recommendations from the Splunk reference architecture system requirements. The Splunk cluster
server hardware that was used is listed in Table 4.
Table 4) Splunk cluster server hardware.
| Splunk Cluster Role | Qty. | Type | CPU | CPUs | Cores/CPU | Speed | RAM |
|---|---|---|---|---|---|---|---|
| Indexer Peer Node | 8 | Dell 730xd | E5-2670 v3 | 2 | 8 | 2.3GHz | 128GB |
| Search Head | 1 | Dell 730xd | E5-2670 v3 | 2 | 8 | 2.3GHz | 128GB |
| Cluster Master | 1 | Dell 730xd | E5-2670 v3 | 2 | 8 | 2.3GHz | 128GB |
| Forwarder | 3 | Dell 730 | E5-2670 v3 | 2 | 8 | 2.3GHz | 128GB |
The ingested machine log data was created using the Splunk workload tool eventgen. The cluster had eight
index peer nodes to handle ingesting ~125GB of simulated machine syslog data per indexer, for a total of
~1TB per day for the entire cluster.
4.1 Overview of Splunk Cluster Testing Used for E-Series Compared to Commodity Server DAS
The Splunk cluster configuration components consist of:
Forwarders—Ingest 125GB of machine log data files into the cluster of index node peers.
Index peer nodes—Index the ingested machine syslog data and replicate data copies in the cluster.
Search head—Execute custom searches for dense, very dense, rare, and very rare data from the cluster of index peer nodes.
Master—Monitor and push configuration management changes for the cluster and serve as the license master for the 1TB-per-day ingest amount of the eight-index peer node cluster.
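As an illustrative sketch (not the exact files from this test), an indexer cluster of this shape is typically configured through Splunk's server.conf: the cluster master declares the replication and search factors, and each index peer points back to the master. The `master_uri` host name below is a placeholder.

```ini
# server.conf on the cluster master (illustrative values)
[clustering]
mode = master
replication_factor = 3
search_factor = 2

# server.conf on each index peer node
[clustering]
mode = slave
master_uri = https://cluster-master.example.com:8089

[replication_port://9887]
```

The replication and search factor values shown here match the configuration described in section 4.3.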
4.2 Eventgen Data
The machine log dataset was created with Splunk's event generator, eventgen. The Splunk event generator is a downloadable Splunk app available from the Splunk website. Splunk eventgen enables users to load samples of log files or exported .csv files as an event template. The templates can then be used to create artificial log events with simulated timestamps. A user can modify the field values and configure the random variance while preserving the structure of the events. The data templates can be looped to provide a continuous stream of real-time data. For more information, see the Splunk eventgen app.
For our testing, eventgen was loaded into the cluster and configured to produce a 125GB simulated syslog-type file for each Splunk forwarder instance. Each file was then split into smaller syslog files on one of eight individual Splunk heavy forwarder instances, each ingesting data in a one-to-one data path to one of the eight index peer nodes. The total ingested data is ~1TB per day loaded into the cluster for each simulated daily index.
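As a back-of-envelope check, the per-indexer ingest volume and the implied sustained average rate follow directly from these numbers (simple arithmetic only; real ingest is bursty rather than a constant stream):

```python
gb_per_indexer_per_day = 125
indexers = 8

total_gb_per_day = gb_per_indexer_per_day * indexers  # ~1TB across the cluster

# sustained average ingest per indexer, in MB/s (decimal units, 86400s/day)
mb_per_sec = gb_per_indexer_per_day * 1000 / 86400

print(total_gb_per_day)      # 1000
print(round(mb_per_sec, 2))  # 1.45
```

So each indexer averages only about 1.5 MB/s over the day; the interesting load is in the indexing, replication, and search work, not the raw write bandwidth.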
Following are the number of rare and dense search terms per 10,000,000 lines:
• Very dense search: 1 out of 100 lines; 100,000 occurrences
• Dense search: 1 out of 1,000 lines; 10,000 occurrences
• Rare search: 1 out of 1,000,000 lines; 10 occurrences
• Very rare search: 1 out of 10,000,000 lines; 1 occurrence
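The occurrence counts are simply the line count multiplied by each search term's density; a quick sketch:

```python
lines = 10_000_000
densities = {
    "very dense": 1 / 100,         # 1 out of 100 lines
    "dense": 1 / 1_000,            # 1 out of 1,000 lines
    "rare": 1 / 1_000_000,         # 1 out of 1,000,000 lines
    "very rare": 1 / 10_000_000,   # 1 out of 10,000,000 lines
}
occurrences = {name: round(lines * d) for name, d in densities.items()}
print(occurrences)
# {'very dense': 100000, 'dense': 10000, 'rare': 10, 'very rare': 1}
```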
4.3 Cluster Replication and Searchable Copies Factor
The commodity server Splunk cluster was configured with a replication factor of 3 and a search factor of 2. The replication factor is the number of copies of data that you want the cluster to maintain. Peer nodes store incoming data in buckets, and the cluster maintains multiple copies of each bucket, storing each copy on a separate peer node; the number of copies of each bucket is the replication factor. The default replication factor for a cluster is 3. The search factor determines how many of those copies are searchable, and it must be less than or equal to the replication factor. The default search factor is 2, meaning that the cluster maintains two searchable copies of all data.
The E-Series test was configured with a replication factor of 2 and a search factor of 1. The E-Series provides additional redundancy because each indexer's indexed data resides on protected DDP volumes. This underlying protection enables the cluster to run with a replication factor of 2, storing fewer copies of index data in the Splunk cluster for performance and data storage benefits.
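Both factors are set on the cluster master in the [clustering] stanza of server.conf. The following is a minimal sketch of the two test configurations; only the mode and the two factors are taken from this report, and any other required settings (master URI, pass4SymmKey, and so on) are omitted:

```ini
# Commodity DAS test: cluster master server.conf
[clustering]
mode = master
replication_factor = 3
search_factor = 2

# E-Series test: the same stanza with reduced factors
# [clustering]
# mode = master
# replication_factor = 2
# search_factor = 1
```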
4.4 Commodity Server with Internal DAS Baseline Test Setup
The commodity servers with DAS for the indexer peer nodes were configured for the baseline test with:
• 4 x 800GB SSDs in RAID 10 per indexer, providing ~12TB of usable capacity across the cluster for the Splunk hot data buckets
• 10 x 1.2TB 10K SAS drives in RAID 10 per indexer, providing ~48TB of usable capacity across the cluster for the Splunk cold data buckets
• A 10Gb Ethernet private network for the Splunk index peer nodes, search head node, and cluster master node
The Splunk heavy forwarder servers are interconnected with a 10Gb Ethernet private network to isolate the ingest of data from other network traffic. The SSDs and SAS drives were configured into RAID 10 volumes on each server using the server's internal RAID controller. The mounted volumes were configured as ext4 file systems on the CentOS 7.1 operating system for each indexer peer node, the search head, and the cluster master. Customers need to validate exact configuration details for the E-Series by using the NetApp Interoperability Matrix Tool (IMT). Figure 11 shows the commodity server test configuration.
Figure 11) Commodity server Splunk cluster with DAS.
4.5 E-Series with DDP Baseline Test Setup
The E2800 system configuration for the baseline test was configured with DDP LUNs using:
• 24 x 800GB SSDs with a pool preservation capacity of 2 drives, offering ~12TB of usable capacity for the Splunk cluster hot data buckets
An additional DE6600 expansion tray was added with a single DDP using:
• 60 x 4TB NL-SAS drives with a pool preservation capacity of 3 drives, offering ~168TB of usable capacity for the Splunk cluster cold data buckets
The network connectivity consisted of:
• A 10Gb Ethernet private network for the index peer nodes
• A 16Gb Fibre Channel SAN between the E-Series system and the index peer nodes
The DDP LUNs were configured into eight volumes, one for each of the eight index peer node hosts. The mounted volumes were configured as ext4 file systems on the CentOS 7.1 OS of each indexer.
The same server hardware is used for the E-Series configuration as for the commodity test configuration. This setup provides a baseline comparison that eliminates any hardware differences between the E-Series and the commodity test configurations except for the local internal disks (DAS). See Figure 12 for the E-Series baseline configuration.
Figure 12) Splunk cluster with E-Series DDP.
The three forwarders run eight instances of Splunk Enterprise configured as heavy forwarders, ingesting the syslog data in a one-to-one data path from each forwarder instance to a Splunk cluster peer node.
4.6 Baseline Test Results for E-Series Compared with Commodity Servers with Internal DAS
Figure 13 displays a graph that reflects the data ingest rate for each baseline configuration setup. The indexes were created with Splunk parameters to ingest the log data into the SSD hot buckets until a threshold of 10 maximum hot buckets per Splunk indexer was reached. This index was designed to test the flow of data through the hot/warm and cold buckets for ingest and searches. The hot/warm data buckets were moved from the all-SSD E2800 system to cold data buckets in the 4TB NL-SAS drive DDP located in a DE6600 expansion tray. Each test was run using Splunk version 6.4.2 with parallel ingestion pipelines enabled on the indexers and the heavy forwarders with a setting of parallelIngestionPipelines = 2. Index parallelization can help accommodate bursts of data. For more Splunk information and settings to enable parallel indexing, see managing indexers.
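The parallelIngestionPipelines setting lives in server.conf on each indexer and heavy forwarder; a minimal sketch of the stanza as used in this test (no other settings shown) looks like:

```ini
# server.conf on each indexer and heavy forwarder
[general]
parallelIngestionPipelines = 2
```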
Figure 13) Index peer node ingest rates.
The E-Series baseline configuration ingests the eventgen log data at rates comparable to those of the local DAS used with the commodity baseline configuration. The ingest rate for Splunk depends on many variables, including the replication factor. The reduction in replicas on the E-Series system results in fewer raw data files and buckets stored than with the commodity configuration. When scaling the Splunk cluster with additional index peer nodes, this benefit means that less overall storage is needed to meet capacity increases.
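The storage effect of the lower replication settings can be approximated with a rough model: every replica of a bucket stores the compressed rawdata, but only searchable copies also store the index (tsidx) files. The assumption below that rawdata and index files contribute comparable size per bucket is illustrative only; actual ratios vary with the data.

```python
def bucket_storage_units(replication_factor, search_factor,
                         rawdata=1.0, index=1.0):
    # every replica stores rawdata; only searchable copies add index files
    return replication_factor * rawdata + search_factor * index

commodity = bucket_storage_units(3, 2)  # commodity DAS test settings
eseries = bucket_storage_units(2, 1)    # E-Series test settings
print(commodity, eseries, round(eseries / commodity, 2))  # 5.0 3.0 0.6
```

Under this simple model, the E-Series settings store roughly 40% less bucket data per unit of ingested logs, which is where the scaling advantage comes from.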
4.7 Search Results for Baseline Tests
The graphs in Figure 14 and Figure 15 reflect multiple searches run after the data was ingested and indexed in the Splunk cluster and was ready for users to access for searches and report generation. With the decreased replication factor of 2, the E2800 system is able to perform searches in less time than the commodity DAS configuration takes.
Figure 14) Dense static searching comparison.
Figure 15) Rare static searching comparison.
In addition to the baseline performance testing, additional testing was conducted on the E-2800 under a
controller failure condition.
In a typical Splunk cluster deployment using only internal drives within the server, there is no redundancy for a RAID controller failure. Failure of the internal controller would be similar to a complete server failure and would require the remaining servers in the cluster to handle all indexing and search requests. This condition renders the failed server unavailable, whereas with the E-Series and the redundancy provided by its dual-redundant-controller design, all data remains available because all LUNs simply transition to the remaining controller.
A test was conducted in which a controller within the E2800 was failed while under an active workload ingesting eventgen log data into all eight index peer nodes, leaving the one remaining E2800 controller to handle the workload. Search tests were executed after the workload completed, and a second test combined streaming data ingestion with searching. The E2800 system seamlessly transitioned to one controller while the Splunk cluster performed normal operations without being negatively affected.
Figure 16 compares the E2800 with a failed controller to an E2800 system with both controllers running optimally. The system performs as well as or better under the controller failure condition.
Figure 16) E2800 controller failure index rate.
Figure 17 shows the controller failure rare search results, and Figure 18 shows the dense search test results. Both show similar performance, enabling the Splunk cluster to continue running optimally.
Figure 17) E2800 controller failure rare searches.
Figure 18) E2800 controller failure dense searches test.
As the above graphs show, the search times are slightly higher under the controller failure condition, but the effect on the Splunk cluster and end users is minimal.
The E2800 was also tested with a drive failure in the DDP during the data ingest and search tests across all eight index peer nodes. Splunk cluster and E2800 system performance were not affected by the single drive failure and the associated rebuild within the DDP. For more information about previous NetApp E-Series products tested with Splunk, refer to NetApp E-Series and Splunk (TR-4460).
5 Summary
The NetApp E-Series E2800 system with all-flash and hybrid array capabilities provides a number of significant advantages over internal direct-attached storage (DAS) for Splunk deployments. These advantages include exceptional storage management capabilities, dramatically improved reliability, high availability, and limited performance degradation under failure conditions such as disk failures. The advantages also include excellent performance when ingesting machine log data and excellent search capabilities at very low latency, with an E2800 system configured with all-flash SSDs for the hot and warm Splunk buckets or a hybrid configuration of both SSDs and HDDs. The E-Series DE6600 expansion tray provides excellent performance and reliability for the Splunk cold data bucket tier and, combined with the E2800 system, is a highly scalable solution for Splunk use cases.
Organizations that use Splunk often use traditional server-based storage with inefficient, hard-to-scale internal DAS. The NetApp reference design employs the managed DAS model, with higher scalability and performance. The reduction of the Splunk cluster replication factor made possible by E-Series storage reduces the amount of replicated index data that must be stored. This reduction also prevents unnecessary purchases of compute nodes for storage-intensive Splunk environments that need to grow to meet organizational requirements.
The NetApp reference architecture for Splunk is optimized for node storage balance, reliability,
performance, storage capacity, and density. From an administrative standpoint, the E-Series offers
simplified storage management with a browser-based UI. This solution enables new volumes, volume
groups, and Dynamic Disk Pools to be created easily and provisioned immediately for use by Splunk
cluster servers. In addition, existing volumes, volume groups, and Dynamic Disk Pools can all be
increased in size dynamically to provide additional capacity and/or performance as required for the
Splunk indexer cluster environment.
6 Appendixes
Splunk Apps for NetApp
The relationship between NetApp and Splunk includes developing apps for the range of NetApp products.
Current products supported with Splunk apps include SANtricity with E-Series, NetApp StorageGRID®
technology, and the NetApp Data ONTAP® operating system (both clustered and 7-Mode).
These apps are available on the Splunk base portal: http://apps.splunk.com/.
The NetApp SANtricity Performance App for Splunk Enterprise provides visibility into the health and
performance of NetApp E-Series and EF-Series storage systems. Figure 19 displays the configuration
information using a dashboard view for multiple E-Series storage systems. This view makes it easy to drill
down to specific configuration information for each array, such as IOPS, MBps, and latency, and
information about the controller, DDP, pools, volume groups, volume, and drives. The view also displays
Major Event Log information.
Figure 19) Splunk app NetApp SANtricity Performance App for Splunk Enterprise.
You can download the app from Splunk Apps. Running the app requires the Splunk Technology Add-On for NetApp SANtricity, also available from Splunk Apps.
The Splunk app for StorageGRID provides real-time visualization and reporting of audit log parsing
information for billing and usage monitoring. These capabilities enable chargeback and billing, search
integration, custom reporting, security diagnostics, and alerts for compliance events. The app also
provides a view of CIFS/NFS activity and CDMI/SGAPI use.
You can download the app from Splunk Apps.
The Splunk app for Data ONTAP is compatible with Data ONTAP 8.x and above. You can download the
app from Splunk Apps.
Splunk Cluster Indexer Rate from Distributed Management Console
Splunk Cluster Server Information
Mounted E-Series and DAS file systems used in the configuration.
Table 5) Index peer node mounted LUNs.
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 50G 15G 36G 29% /
devtmpfs 63G 0 63G 0% /dev
tmpfs 63G 84K 63G 1% /dev/shm
tmpfs 63G 34M 63G 1% /run
tmpfs 63G 0 63G 0% /sys/fs/cgroup
/dev/sdc 1.5T 232M 1.4T 1% /mongoSSD
/dev/sdd 6.5T 89M 6.2T 1% /mongoSAS
/dev/mapper/mpathu 1.5T 82G 1.4T 6% /e2800ssd
/dev/sda1 497M 127M 370M 26% /boot
/dev/mapper/rhel-home 504G 33M 504G 1% /home
/dev/mapper/mpathv 5.0T 89M 4.8T 1% /eseriesWSAS
The following Splunk index.conf settings were used for the test configuration; parallel ingestion pipelines were enabled in server.conf.
Figure 20) index.conf to limit maximum warm buckets.
[lehigh_eseries_124]
repFactor = auto
homePath = volume:primary/lehigh_eseries_124/db
coldPath = volume:cold/lehigh_eseries_124/colddb
thawedPath = $SPLUNK_DB/lehigh_eseries_124/thaweddb
maxDataSize = auto_high_volume
maxWarmDBCount = 10
maxTotalDataSizeMB = 1500000
Figure 21) index.conf from splunk.com capacity configurator (7-day hot/warm).
[lehigh_eseries_114]
repFactor = auto
homePath = volume:primary/lehigh_eseries_114/db
coldPath = volume:cold/lehigh_eseries_114/colddb
thawedPath = $SPLUNK_DB/lehigh_eseries_114/thaweddb
homePath.maxDataSizeMB = 1030400
coldPath.maxDataSizeMB = 4416000
maxDataSize = auto_high_volume
maxWarmDBCount = 4294967295
Figure 22) index.conf volume settings.
# VOLUME SETTINGS
# One Volume for hot/warm data in E-2800 AFA DDP per indexer
[volume:primary]
path = /e2800ssd/hot
maxVolumeDataSizeMB = 2000000
# One volume for cold data in DE6600 4TB NL-SAS DDP
[volume:cold]
path = /eseriesWSAS/cold
maxVolumeDataSizeMB = 4600000
# One baseline DAS volume for hot/warm data: 4 x 800GB SSD RAID 10 per indexer
[volume:primary2]
path = /mongoSSD/whiteboxHot
maxVolumeDataSizeMB = 2000000
# One baseline DAS volume for cold data: 10 x 1.2TB 10K SAS RAID 10 per indexer
[volume:cold2]
path = /mongoSAS/whiteboxCold
maxVolumeDataSizeMB = 4600000
Splunk Cluster Indexer Bucket Information from Distributed Management Console
Figure 23) Indexer bucket data from DMC—E2800 system.
E2800 SANtricity 11.30 Storage Hierarchy
Figure 24) SANtricity 11.30 storage hierarchy.
References
NetApp Documentation
NetApp Architecture for Splunk
http://www.netapp.com/us/media/TR-4260_NetApp_Architecture_for_Splunk.pdf
NetApp E-Series and Splunk
http://www.netapp.com/us/media/tr-4460.pdf
NetApp E2800 Product Information
http://www.netapp.com/us/media/ds-3171-66862.pdf
SANtricity Release 11.30
Splunk Documentation
Splunk>docs
http://docs.splunk.com/Documentation
Installation Manual
http://docs.splunk.com/Documentation/Splunk/latest/Installation/Whatsinthismanual
Hardware Capacity Planning
http://docs.splunk.com/Documentation/Splunk/6.2.4/Capacity/IntroductiontocapacityplanningforSplunkEnterprise
Managing Index Sizes
http://docs.splunk.com/Documentation/Splunk/6.2.4/Indexer/Aboutmanagingindexes
Splunk Apps
http://apps.splunk.com/
Splunk Answers
http://answers.splunk.com/
Version History
Version Date Document Version History
Version 1.0 September 2016 Initial release
Refer to the Interoperability Matrix Tool (IMT) on the NetApp Support site to validate that the exact product and feature versions described in this document are supported for your specific environment. The NetApp IMT defines the product components and versions that can be used to construct configurations that are supported by NetApp. Specific results depend on each customer's installation in accordance with published specifications.
Trademark Information
NetApp, the NetApp logo, Go Further, Faster, AltaVault, ASUP, AutoSupport, Campaign Express, Cloud
ONTAP, Clustered Data ONTAP, Customer Fitness, Data ONTAP, DataMotion, Fitness, Flash Accel,
Flash Cache, Flash Pool, FlashRay, FlexArray, FlexCache, FlexClone, FlexPod, FlexScale, FlexShare,
FlexVol, FPolicy, GetSuccessful, LockVault, Manage ONTAP, Mars, MetroCluster, MultiStore, NetApp
Insight, OnCommand, ONTAP, ONTAPI, RAID DP, RAID-TEC, SANtricity, SecureShare, Simplicity,
Simulate ONTAP, SnapCenter, Snap Creator, SnapCopy, SnapDrive, SnapIntegrator, SnapLock,
SnapManager, SnapMirror, SnapMover, SnapProtect, SnapRestore, Snapshot, SnapValidator,
SnapVault, StorageGRID, Tech OnTap, Unbound Cloud, WAFL, and other names are trademarks or
registered trademarks of NetApp Inc., in the United States and/or other countries. All other brands or
products are trademarks or registered trademarks of their respective holders and should be treated as
such. A current list of NetApp trademarks is available on the web at
http://www.netapp.com/us/legal/netapptmlist.aspx. TR-4555-0916
Copyright Information
Copyright © 1994–2016 NetApp, Inc. All rights reserved. Printed in the U.S. No part of this document covered by copyright may be reproduced in any form or by any means—graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system—without prior written permission of the copyright owner.
Software derived from copyrighted NetApp material is subject to the following license and disclaimer:
THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
NetApp reserves the right to change any products described herein at any time, and without notice. NetApp assumes no responsibility or liability arising from the use of products described herein, except as expressly agreed to in writing by NetApp. The use or purchase of this product does not convey a license under any patent rights, trademark rights, or any other intellectual property rights of NetApp.
The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending applications.
RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.277-7103 (October 1988) and FAR 52-227-19 (June 1987).