89 Fifth Avenue, 7th Floor
New York, NY 10003
www.TheEdison.com
212.367.7400

White Paper

A Comparative Evaluation of Block Storage Compression Technology for Multipurpose Storage Environments

IBM Storwize V7000 Storage System Real-time Compression

EMC VNX Embedded Compression

NetApp Fabric-Attached Storage (FAS) Compression

Printed in the United States of America.

Copyright 2012 Edison Group, Inc. New York. Edison Group offers no warranty either expressed or implied on the information contained herein and shall be held harmless for errors resulting from its use.

All products are trademarks of their respective owners.

First Publication: July 2012

Produced by: Craig Norris, Senior Analyst; Barry Cohen, Chief Analyst and Editor-in-Chief; Manny Frishberg, Editor

Table of Contents

Executive Summary
  EMC Compression Findings
  NetApp Compression Findings
  Executive Summary Conclusions
Introduction
  Audience
Overview
  More Data / Fewer Resources
  Making Efficient Use of Data Storage Capacity
What Makes IBM Real-time Compression Unique
  Transparency
  Performance
  Compression
  Products Tested
    EMC VNX 5500
    NetApp Fabric-Attached Storage (FAS) 3240
    IBM Storwize V7000 Storage System
Research Results
  Test One: Microsoft Exchange Jetstress
  Test Two: Database Performance Measured with TPC-C
Conclusions and Recommendations
Appendices
  Appendix 1: Benchmarks Used in this Study
    Exchange Load Generator (Microsoft Jetstress)
    TPC-C
  Appendix 2: Research Methodology
    Test Environment
    Test Objectives

Edison: Evaluation of Block Storage Compression Technology Page 1

Executive Summary

With its 2010 launch of Real-time Compression, IBM announced the first real-time data compression technology that captures and compresses data before it is stored on disk. This enabled both a tremendous increase in existing storage capacity and an impressive decrease in storage footprints. In the area of storage optimization through compression, IBM leapt ahead of its competitors by providing real-time data compression that shrank primary, online data in real time. The end result was data compression without any impact to application performance.

In conducting the research for this study, Edison Group sought to evaluate the practicality of the compression technologies included with leading storage systems for the mid to enterprise market, as compared with the Real-time Compression technology now included with the IBM Storwize V7000 Storage System.

As storage system vendors trip over one another to offer technologies that improve storage system utilization, some technologies may be added more to appear competitive than to serve a practical purpose in an actual production storage environment. Customers should evaluate such technologies carefully to determine whether or not they are truly applicable to their particular IT scenarios.

Edison evaluated the product offerings in this study with an eye to their applicability within multipurpose storage environments involving primary active data in a production environment, as opposed to their applicability for specific, dedicated storage purposes. To this end, the Real-time Compression technology included with the IBM Storwize V7000 Storage System is compared with the compression technology included with both the EMC VNX Series and the NetApp Fabric-Attached Storage (FAS) family of filer storage systems.

The testing workloads in this study consisted of block-level data typical of production-environment SANs (Storage Area Networks). Edison validated IBM’s claims in the areas of transparency, performance, and maximum compression over time. Compressing block-level Exchange Jetstress test data using IBM Real-time Compression occurred on-the-fly at wire speed. Compressing block-level data using the TPC-C benchmark in an Oracle database environment resulted in no performance impact (other than some small improvement for certain transactional functions).

To sum up the research findings for Real-time Compression:

Complete transparency: Real-time Compression is invisible to the server operating systems and applications, which means that no administrative overhead is required to use it. The EMC VNX Series, by contrast, is designed to compress inactive content and is entirely unsuitable for active data. While the NetApp FAS 3240 demonstrated file copy transparency with pre-processed compression on initial writes, for any subsequent changes to stored data, compression is performed after processing. Thus the NetApp FAS native compression is not practical for highly active, frequently changing data utilization in applications such as databases.

No impact on performance: Edison found no impact on performance with Real-time Compression and, in fact, found some advantage over NetApp compression for certain TPC-C order functions in the database transaction tests.

Minimal management requirements: Beyond determining which data the compression can be applied to (since not all data is compressible), Real-time Compression involves no additional administrative complexity. Since no post-processing impacts data protection scheduling and policies, these remain unchanged.

Reduced storage capacity required: Being able to use compression on all compressible data means less capacity must be obtained at initial acquisition and less additional capacity needs to be acquired as data inevitably grows.

EMC Compression Findings

Edison ran Microsoft’s Jetstress benchmark to compare the Real-time Compression technology of IBM Storwize V7000 to the embedded compression technology of EMC VNX. Edison found that VNX compression took over 83 hours to compress the same 1.65 TB of data to a compression ratio similar to the one Real-time Compression achieved instantaneously, on-the-fly.

It should be noted that even the exceedingly slow compression time of the EMC VNX technology was achieved on a test system dedicated entirely to the task, so that it could be kept idle until compression was completed. In an actual production scenario, where resources and I/O are being allocated to higher-priority and/or business-critical tasks, no compression would occur at all during those times, rendering the compression feature all but useless. The sheer amount of data generated and/or changed in the course of practical day-to-day operations would outpace the ability of the system to compress it all.

A process that compresses data over a timeframe of several weeks, as Edison has determined would be the case with EMC VNX compression, is impractical for any data other than that which is to be entirely dormant or permanently archived. Therefore, the compression technology included with the EMC VNX line is really at odds with the mixed storage environment for which the devices themselves are usually purposed.
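The argument that day-to-day data growth can outpace post-process compression can be illustrated with a small backlog calculation. The sketch below is only an illustration: the 17.5 GB/hr rate is the average EMC VNX compression rate reported in Table 1 of this paper, while the daily volume of new or changed data and the number of idle hours per day are hypothetical values chosen for the example, not measurements from the study.

```python
# Average post-process compression rate reported for the EMC VNX in Table 1.
VNX_RATE_GB_PER_HOUR = 17.5


def backlog_after(days, new_gb_per_day, idle_hours_per_day,
                  rate_gb_per_hour=VNX_RATE_GB_PER_HOUR):
    """Return the uncompressed backlog (GB) after `days`.

    Assumes compression runs only during idle hours and that the same
    amount of new or changed data arrives every day (both hypothetical).
    """
    backlog = 0.0
    for _ in range(days):
        backlog += new_gb_per_day                                # data written today
        compressed_today = rate_gb_per_hour * idle_hours_per_day
        backlog = max(0.0, backlog - compressed_today)           # cannot go negative
    return backlog


# Hypothetical workload: 200 GB of new or changed data per day, 6 idle hours per day.
print(backlog_after(days=30, new_gb_per_day=200.0, idle_hours_per_day=6.0))
```

With these assumed inputs the system can compress 105 GB per day against 200 GB arriving, so the backlog grows by 95 GB every day and never clears; a smaller daily volume or a longer idle window changes the outcome.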

NetApp Compression Findings

Unlike EMC VNX compression, NetApp compression does have the ability to compress data during initial writes (on-the-fly). However, it is unable to compress updates to stored data during writes. This causes compression rates to deteriorate over short periods of time because this changed data must be stored uncompressed awaiting post-process compression. With NetApp FAS compression, there is a tight correlation between the percentage of data changing and the decrease in the amount of data compressed; after a 24-hour Exchange Jetstress run, the data compression ratio decreased by 50 percent.

Additionally, the NetApp compression that takes place post-process (that is, where changes are made to stored data) heavily utilizes the system CPUs (as much as 100 percent) and disks (100 percent). Again, it is important to note that this was the case in a test system that was otherwise idle. In a true mixed production environment, this characteristic could either have unacceptable impact on other more business-critical processes, or else be prioritized such that the compression process could take an unacceptably long time.

As expected, our tests indicated that, when post-process compression took place while the Exchange Jetstress test was running, the application performance decreased dramatically and the post-process compression scarcely progressed.

In summary, Edison’s testing revealed that NetApp post-processing incurs a significant performance penalty on further writes, due to two factors:

100 percent drive utilization for compression, with up to 100 percent CPU utilization without further I/O.

100 percent CPU utilization on top of 100 percent drive utilization when additional workloads are run, which nearly locks the system up completely.

Executive Summary Conclusions

All EMC VNX data compression occurs post-process (after the data is written uncompressed to the storage device), as does compression of any changes made to data with NetApp technology. Not only does this approach severely impact stored data access performance, but the data must also be stored in its uncompressed form before post-process compression can handle it. For this reason, both EMC and NetApp recommend sizing compressed devices for data at its original size. If followed, this practice eliminates the benefits of deploying compression.

Edison believes that the compression features of the EMC VNX are best utilized in environments where an entire VNX array can be dedicated to compressible low-throughput workloads, and for which considerable idle periods are available. Though there may be cases where this is a valid approach, there are other solutions on the market that would be more appropriate. NetApp compression may be practical for applications that regularly write data to storage during production operations with few or no further updates. However, the fact that updates to stored data are compressed post-process makes it impractical for applications, such as databases, where frequent changes are made to stored data.

Edison’s findings validate IBM’s reasoning in offering Real-time Compression with multipurpose storage systems for production IT environments using active primary data. For organizations purchasing a storage solution where efficient data capacity utilization is an important criterion, with Real-time Compression they need no longer compromise application performance or purchase multiple solutions for dedicated storage purposes.

Introduction

IBM has asked Edison to perform product testing comparing the Real-time Compression technology included with the IBM Storwize V7000 Storage System to the compression technology included in both the EMC VNX Series of storage systems and the NetApp Fabric-Attached Storage (FAS) family of filers. The purpose is to validate IBM’s claims in the areas of maximum compression, design for active primary data workloads, and the technology’s transparency to applications, storage, networks, and downstream processes as these apply to block-level storage. Edison conducted the research for this study with an eye to the products’ viability within a multipurpose storage environment involving primary active data in a production environment.

In the course of this validation, Edison ran performance tests on an EMC VNX 5500 and a NetApp FAS3240. Their compression technology for block-based storage was stacked up against the IBM Storwize V7000 Storage System’s Real-time Compression solution in the areas of compressibility, performance, implementation transparency, and other nuances of compression technology.

Audience

This competitive white paper is a public report that can be of value to IT decision makers as well as storage and other data administrators seeking ways of maximizing the efficiency of their storage technology investments.

Overview

The explosive growth of data in the IT world is a long-established fact. Though it has taken nobody by surprise, it has rapidly accelerated as technology has put more and more of the world (including audio, image, video, and multifarious communications) into the form of data. The amount of digital information in the world surpassed a zettabyte (1 trillion gigabytes) for the first time in history in 2010, according to IT industry and market intelligence forecasting firm IDC.[1] Yet IDC expects that mind-boggling volume of data to nearly triple to 2.7 zettabytes during 2012, only two years later. A 2011 survey[2] conducted by the Independent Oracle Users Group found that almost one out of 10 of the respondents’ sites had data stores in the petabyte range.

To accommodate this burgeoning data, the number of servers (virtual and physical) worldwide must grow by a factor of 10 over the next decade. The amount of information managed by enterprise data centers will grow by a factor of 50 and the number of files data centers will have to deal with will grow by a factor of 75. Meanwhile, the number of IT professionals to manage this growth will grow by just 150 percent worldwide in the same timeframe.[3]

More Data / Fewer Resources

As a result of this accelerating propagation of data, organizations’ data centers and the IT vendors that provide technology solutions to them have been racing to grapple with the challenges it poses for data backup and storage. Physical data storage continues to drop in cost by as much as 25 percent per year.[4] Nevertheless, the need for ever-expanding capacity outstrips even these savings. Massive build-outs of infrastructure (both physical capacity and enabling technology) are still required to accommodate the increasing amount of data, as well as to manage critical backup, access, and recovery requirements.

This is all taking place against an economic backdrop in which IT budgets have stagnated or are increasing only incrementally. To sum up the situation: data capacity is growing at the rate of 40 to 60 percent annually (due to both the explosion of unstructured data as well as compliance requirements compelling organizations to keep data longer),[5] yet IT spending is forecasted to increase at only 5 percent annually.[6]

[1] John F Gantz, Chief Research Officer & Senior Vice President, IDC. http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
[2] “The Petabyte Challenge: 2011 IOUG Database Growth Survey,” IOUG, 2011
[3] “The 2011 Digital Universe Study: Extracting Value From Chaos,” IDC, 2011
[4] A good quote on this topic, from Wayne Salpietro at Database Trends and Applications, can be found at: http://www.dbta.com/Articles/Editorial/Trends-and-Applications/Getting-Out-of-Storage-Debt-77189.aspx
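The interplay between the growth, cost, and budget figures cited in this section can be worked through numerically. The following is a rough sketch, not a model from the cited sources: it assumes each percentage compounds annually, that the 25 percent yearly cost decline is fully realized, and that storage spending would need to track capacity multiplied by unit cost. The five-year horizon and the function names are illustrative choices.

```python
def storage_spend_index(capacity_growth, unit_cost_decline, years):
    """Relative storage spend after `years`, with year 0 equal to 1.0.

    Each year capacity is multiplied by (1 + growth) and the cost per
    unit of capacity is multiplied by (1 - decline).
    """
    yearly_factor = (1.0 + capacity_growth) * (1.0 - unit_cost_decline)
    return yearly_factor ** years


def budget_index(budget_growth, years):
    """Relative IT budget after `years`, with year 0 equal to 1.0."""
    return (1.0 + budget_growth) ** years


YEARS = 5  # hypothetical planning horizon, not taken from the paper

for growth in (0.40, 0.60):  # low and high ends of the cited capacity growth range
    spend = storage_spend_index(growth, 0.25, YEARS)
    budget = budget_index(0.05, YEARS)
    print(f"capacity growth {growth:.0%}: spend x{spend:.2f}, budget x{budget:.2f}")
```

Under these assumptions, 40 percent capacity growth with the full 25 percent cost decline gives a yearly factor of 1.4 x 0.75 = 1.05, which only just keeps pace with a 5 percent budget increase, while 60 percent growth gives 1.6 x 0.75 = 1.20 and pulls steadily ahead of the budget.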

Making Efficient Use of Data Storage Capacity

Against the backdrop described above, minimizing storage capacity requirements and associated costs is of great importance. IT planners and administrators must look for technology that enables greater efficiency in storage capacity utilization. Such technology, offered by storage systems vendors, generally falls into four types:

Thin Provisioning: Storage capacity is shared, and is allocated to volumes on an as-needed basis, increasing efficiency in the use of actual available physical capacity. So, the need to overprovision volumes in anticipation of expected data growth is reduced. Thin provisioning is a long-established but often underused strategy; one advantage of compression is that it also enables data centers to gain the benefits of thin provisioning.

Tiering: Involves allocating data to the most appropriate storage media in terms of cost/performance factors. The media might range from costly SSDs (Solid State Disk Drives) for data demanding the ultimate performance to low-cost, low-performing tape media for archiving. Enterprise storage software typically offers some form of automated tiering, which can allocate data to storage according to type or prioritization rules pre-assigned by administrators.

Compression: Involves encoding information using fewer bits than the original data via software algorithms.

Deduplication: Duplicate copies of the same data are removed and replaced by pointers referencing the original string of data to be repeated.

The two approaches listed here that actually reduce the amount of data stored, deduplication and compression, have each always been characterized by certain limitations. Deduplication works only on data having multiple copies of identical data chunks. Traditional approaches to compression typically must be performed on data after it has been written to storage, which usually has significant adverse impact on performance and may not actually deliver the efficient capacity utilization desired. For this reason it is ordinarily not suitable for use with primary, active data.
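The difference between the two data-reduction approaches described above can be seen in a toy example. This is a minimal sketch under stated assumptions: Python's standard `zlib` module stands in for "a compression algorithm" and SHA-256 hashing of fixed 4 KiB chunks stands in for "deduplication". Neither is claimed to be the method used by any product in this paper, and pointer or metadata overhead is ignored.

```python
import hashlib
import zlib


def compressed_size(data: bytes) -> int:
    """Bytes needed to store `data` after general-purpose compression."""
    return len(zlib.compress(data))


def deduplicated_size(data: bytes, chunk_size: int = 4096) -> int:
    """Bytes stored if each distinct fixed-size chunk is kept only once.

    Repeated chunks are assumed to be replaced by pointers whose size is ignored.
    """
    seen = set()
    stored = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in seen:      # first time this chunk content appears
            seen.add(digest)
            stored += len(chunk)
    return stored


# Sample 1: sixteen 4 KiB chunks, only two of them distinct.
repeated_chunks = (b"A" * 4096 + b"B" * 4096) * 8

# Sample 2: redundant text with no two identical 4 KiB chunks.
text_like = b"".join(b"order %06d shipped\n" % i for i in range(4000))

for name, sample in [("repeated chunks", repeated_chunks), ("text-like", text_like)]:
    print(name, len(sample), deduplicated_size(sample), compressed_size(sample))
```

In the second sample every chunk differs, so chunk-level deduplication saves nothing, while compression still shrinks the data because the redundancy lies inside the chunks. This mirrors the limitation noted above: deduplication helps only where identical chunks recur.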

[5] “Data Growth Remains IT’s Biggest Challenge Gartner Says,” Computerworld, Lucas Mearian, November 2, 2010. http://www.computerworld.com/s/article/9194283/Data_growth_remains_IT_s_biggest_challenge_Gartner_says
[6] “Big data: The Next Frontier for Innovation, Competition, and Productivity,” McKinsey Global Institute, June 2011

Page 11: A Comparative Evaluation of Block Storage Compression ... · PDF fileA Comparative Evaluation of Block ... The EMC VNX Series, by contrast, is designed to compress inactive content

Edison: Evaluation of Block Storage Compression Technology Page 8

Naturally, any compression technology used to minimize the impact of expanding data requirements must be efficient in reducing the need for storage capacity. It should, however, also be transparent to operations, without compromising any of the criteria actually driving the purchase or management of a given IT storage solution. Further, the more different types of data to be compressed, the greater the benefits that can be expected.

What Makes IBM Real-time Compression Unique

Transparency

Use of Real-time Compression technology is completely transparent from the perspective of the storage arrays and hosts making use of the technology. Applications and arrays communicate with no configuration changes. Applications accessing data from a server are entirely unaware that the data is compressed; the host operating system sees the uncompressed data size. This provides complete application and host-level transparency.

In addition, all of the existing downstream processes stay the same. Snapshots work as they did before, with those snapshots occupying less storage space, allowing for improved recovery point objectives (RPOs). Replication works as it did before, as do backups and restores.

Performance

Edison’s earlier testing[7] has shown that use of the Real-time Compression Appliance has either a positive effect or no effect on performance. In many of our tests, storage system throughput was higher with compression on than without. For example, system throughput was almost 57 percent greater for read operations with Real-time Compression Appliance compression enabled than without compression.

Compression

Since use of Real-time Compression technology is transparent, the effects of compression are only perceived on the storage array in the form of higher available capacity. The high compression levels possible for active block-level data result in lower acquisition costs for storage capacity, while having no effect on performance or user experience.

Products Tested

The products tested for the study described in this paper are all block storage arrays designed to target a mid-to-enterprise market.

[7] A Validation and Comparison of Storage Efficiency Technology: Compression, Edison Group, Jan. 2011

EMC VNX 5500

EMC replaced its former CLARiiON and Celerra product lines with new models under the VNX brand. These new storage systems combine many of the features of CLARiiON and Celerra into a single platform on two hardware servers, with several hardware changes, including an update to the Intel processor in the controller. In addition, EMC has joined the transition from 3.5-inch FC drives to 2.5-inch SAS drives as the new standard for high performance enterprise-class spinning disks.[8]

The highly customized storage system can serve data over block-based protocols such as Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), and iSCSI. It is packaged in rack-mountable enclosures that house up to 25 2.5-inch disk drives or SSDs, or up to 15 3.5-inch drives. Disk Processor Enclosures (DPEs) contain drives, redundant dual-active intelligent RAID controllers, dual power supplies, and dual cooling components. Disk Array Enclosures (DAEs) contain drives, switches, power supplies, and cooling components. On the VNX 5500, up to nine expansion enclosures can be attached to a control enclosure, supporting up to 250 drives.

NetApp Fabric-Attached Storage (FAS) 3240

NetApp’s Fabric-Attached Storage (FAS) filers target enterprise-class SAN environments. Like an EMC VNX system, these particular filers can also serve data over block-based protocols such as Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), and iSCSI. NetApp filers implement their physical storage in large disk arrays. NetApp filers use customized hardware, as well as the proprietary Data ONTAP operating system, built specifically for storage-serving purposes. NetApp’s new compression functionality is a component of Data ONTAP.

IBM Storwize V7000 Storage System

IBM Storwize V7000 with easy-to-use, fully integrated Real-time Compression is a virtualized storage system designed to deliver simplicity of management, reduced cost, highly scalable capacity, performance, and high availability. Its high-performance implementation of Real-time Compression supports compression for primary active workloads. IBM Storwize V7000 also offers improved efficiency and flexibility through built-in solid state drives (SSD) optimization, standard thin provisioning, and non-disruptive migration of data from existing storage. The system can virtualize and reuse existing disk systems, offering a greater potential return on investment.

[8] EMC VNX Series also supports optional SSD and associated data tiering in conjunction with the addition of EMC FAST Suite software.

Research Results

This section presents high points of the results and summarized analysis of the research conducted for this white paper. For descriptions of the particular tests performed, see the appendix of this white paper entitled “Benchmarks Used in This Study.” For descriptions of the compression technologies evaluated, see the appendix entitled “Technology Solutions Evaluated.” For descriptions of the test environment and methodology employed, see the appendix entitled “Research Methodology.”

Test One—Microsoft Exchange Jetstress

Microsoft Exchange is considered a very suitable workload for IBM Real-time Compression. In addition, the application supports only block-level storage at this time. The tool used to generate the workload for this test, Microsoft Jetstress, works with the Microsoft Exchange Server database engine to simulate the Exchange database and log disk input/output (I/O) load. It allows you to simulate a specific number of users for the tests.

The benchmark configuration used was as follows:

Number of users: 1,600

Mailbox Size: 1024 MB

Two databases with one copy: databases on the 1 TB volume, and LOGS on the 500 GB volume

The Jetstress performance test was run for two hours, while background maintenance was in progress. Table 1, below, presents the results of the tests comparing Real-time Compression with EMC VNX compression in terms of compression efficiency, average rate of compression, and total process time.

Table 1: Microsoft Exchange Jetstress Comparison: Real-time Compression and EMC VNX Compression

                                    IBM Storwize V7000        EMC VNX
                                    Real-time Compression     Compression
Pre-compressed data size            1.65 TB                   1.65 TB
Post-compression data size          151.80 GB                 214.41 GB
Compression percentage              92%                       87%
Post-compression disk space gain    1.50 TB                   1.44 TB
Average compression rate            Wire speed                17.5 GB/hr
Total process time                  N/A [9]                   83 hrs 46 mins

As shown in this table, the IBM Storwize V7000 system delivered a greater degree of compression than the EMC VNX 5500, while accomplishing that compression at wire speed, on-the-fly. The Exchange Jetstress data compression with EMC VNX was 87 percent, compared to 92 percent on the V7000.

Note: Compression rates observed in this test are higher than real-world Exchange system data, due to the synthetic data generated by the Jetstress tool.
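The relationship between the data sizes and the savings figures in Table 1 can be checked with a short calculation. The sketch below uses only the sizes reported in the table; the decimal conversion of 1 TB to 1,000 GB is an assumption, since the paper does not state which unit convention or rounding it used.

```python
GB_PER_TB = 1000.0  # assumption: decimal units; the paper does not state its convention


def space_savings_percent(original_gb, compressed_gb):
    """Percentage of the original capacity that compression freed up."""
    return (1.0 - compressed_gb / original_gb) * 100.0


original_gb = 1.65 * GB_PER_TB  # pre-compressed data size from Table 1

for system, compressed_gb in [("IBM Storwize V7000", 151.80), ("EMC VNX", 214.41)]:
    saved_tb = (original_gb - compressed_gb) / GB_PER_TB
    percent = space_savings_percent(original_gb, compressed_gb)
    print(f"{system}: {percent:.1f}% saved, {saved_tb:.2f} TB gained")
```

Under this assumption the disk space gains come out at 1.50 TB and 1.44 TB, matching the table, and the EMC VNX savings come out at about 87 percent. The V7000 figure computes to roughly 91 percent rather than the reported 92 percent, which may reflect a different unit or rounding convention in the original measurements.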

Table 2, below, presents the transactional I/O rates achieved for both a compressed and

uncompressed NetApp FAS 3240 LUN, and for compressed V7000 VDisks.

Note: The NetApp Compressed LUN column in Table 2 includes inline and post-process metrics; this is because TPC-C makes updates to existing data, which are not compressed inline and are instead deferred to post-processing.

Table 2: Microsoft Exchange Comparison: IBM Real-time Compression and NetApp

Compression

Transactional I/O Performance                NetApp          NetApp Compressed    V7000
                                             Uncompressed    LUN (Inline +        Compressed
                                             LUN             Post Process)        VDisks

I/O Database Reads Average Latency (msec)    6.42            67.2                 6.03

I/O Database Writes Average Latency (msec)   2.6             1.8                  2.7

I/O Database Reads/sec                       1,661           68.5                 1,755

I/O Database Writes/sec                      921             46.5                 995

Achieved Transactional I/O per Second        2,582           115                  2,750

9 This value reflects the process time for compression only. Generating the Exchange Jetstress data took five

hours in both cases. Real-Time Compression compressed the data on-the-fly as it was generated and thus took

no additional time for the compression process, while the EMC VNX compression process required an

additional 83 hrs, 46 mins.


Note that the lower average write latency for the NetApp Compressed LUN shown in the above table correlates directly with its extremely poor write throughput (46.5 writes/sec for the compressed LUN versus 921 for the uncompressed LUN), since more transactional IOPS lead to greater latency. A less striking but similarly counterintuitive instance can be seen with the V7000 Compressed VDisks, which show four percent greater latency for database writes than the NetApp Uncompressed LUN (2.7 msec versus 2.6). This can occur because compression allows for more IOPS (995 writes/sec in this example versus 921 for uncompressed data, or 8 percent more).
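The percentages quoted above, and the relationship between the read, write, and total I/O rows of Table 2, can be checked with a few lines of Python. This is only a sketch built on the published figures; the dictionary layout and the helper name percent_change are illustrative choices, not part of the study.

```python
# Figures copied from Table 2 (Jetstress transactional I/O results).
TABLE_2 = {
    "NetApp Uncompressed LUN": {
        "reads": 1661.0, "writes": 921.0, "total": 2582.0, "write_ms": 2.6},
    "NetApp Compressed LUN": {
        "reads": 68.5, "writes": 46.5, "total": 115.0, "write_ms": 1.8},
    "V7000 Compressed VDisks": {
        "reads": 1755.0, "writes": 995.0, "total": 2750.0, "write_ms": 2.7},
}


def percent_change(value: float, baseline: float) -> float:
    """Relative difference of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Achieved transactional I/O should equal reads/sec plus writes/sec.
for name, row in TABLE_2.items():
    assert abs(row["reads"] + row["writes"] - row["total"]) < 1e-9, name

baseline = TABLE_2["NetApp Uncompressed LUN"]
v7000 = TABLE_2["V7000 Compressed VDisks"]

write_iops_delta = percent_change(v7000["writes"], baseline["writes"])
write_latency_delta = percent_change(v7000["write_ms"], baseline["write_ms"])

print(f"V7000 write IOPS vs. uncompressed NetApp: {write_iops_delta:+.1f}%")
print(f"V7000 write latency vs. uncompressed NetApp: {write_latency_delta:+.1f}%")
```

The totals in each column are consistent with the sum of reads and writes, and the two deltas round to the 8 percent and four percent cited in the text.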

Test Two — Database Performance Measured with TPC-C

The TPC-C benchmark was designed to test the transaction performance of a complete

database server platform—hardware and software—in transactions per minute. Usually,

the results of the benchmark are presented in the context of cost per transaction, a metric

that is good for comparing the different platforms. For this study, TPC-C was used

solely to generate transaction loads that would demonstrate any effects on performance

resulting from using Real-time Compression.

Compression is gaining in importance for use with database systems as a means of

dealing with the exponential growth in the number of transactions and the storage

capacity needed to contain that growth. The NetApp FAS 3240 system used for this

study offers deduplication and compression features, but these features are not suitable

for the active data used by a database. In fact, NetApp recommends not using

deduplication on active database files.

Another approach to addressing this challenge is the use of expensive customized

hardware platforms with specialized storage subsystems, and changes to the database

architecture designed to enable high-performance compression. These systems work

well, but they require that existing databases be moved to the new systems and that

database design, as well as the design of the applications using those databases, be

modified to leverage the new platform.

Results of the TPC-C-based tests Edison ran to compare the effects of compression on

performance are presented in Table 3, below. Beside each simulated order function of

the benchmark is the response time for the NetApp system without compression, for the

same system using compression, and for the IBM Storwize V7000 system using

Real-time Compression.

Note: The NetApp Compressed LUN column in Table 3 includes inline and post-process metrics; this is because TPC-C makes updates to existing data, which are not compressed inline and are instead deferred to post-processing.


Table 3: Database Transaction Performance: IBM Real-time Compression and NetApp

Compression

Response times are in seconds; the final row (tpmC) is throughput.

Transaction Name     NetApp             NetApp Compressed LUN      V7000
                     Uncompressed LUN   (Inline + Post Process)    Compressed VDisks

Stock Level          3.397              31.102                     3.053

Delivery             0.606              4.133                      0.566

Order status         0.039              0.222                      0.039

Payment              0.027              0.912                      0.027

New Order            0.175              1.75                       0.158

tpmC (Throughput)    986                820                        987

As these test results indicate, response times are about the same between the NetApp system without compression and the IBM V7000 with compression (and are even lower on the V7000 for certain order functions). However, with NetApp compression applied, transaction rates suffer, at times to a considerable extent.
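To put a number on "considerable," the response times in Table 3 can be expressed as multiples of the uncompressed NetApp baseline. The Python sketch below does this with the published figures; the names TABLE_3, TPMC, and slowdown are illustrative, and no values beyond those in the table are assumed.

```python
# Response times in seconds, copied from Table 3:
# (NetApp uncompressed, NetApp compressed, V7000 compressed)
TABLE_3 = {
    "Stock Level": (3.397, 31.102, 3.053),
    "Delivery": (0.606, 4.133, 0.566),
    "Order status": (0.039, 0.222, 0.039),
    "Payment": (0.027, 0.912, 0.027),
    "New Order": (0.175, 1.75, 0.158),
}

# Throughput row of Table 3 (transactions per minute).
TPMC = {"NetApp uncompressed": 986, "NetApp compressed": 820, "V7000 compressed": 987}


def slowdown(response: float, baseline: float) -> float:
    """Response time as a multiple of the baseline (1.0 means no change)."""
    return response / baseline


for name, (base, netapp_c, v7000_c) in TABLE_3.items():
    print(f"{name}: NetApp compressed {slowdown(netapp_c, base):.1f}x, "
          f"V7000 compressed {slowdown(v7000_c, base):.2f}x")

tpmc_drop_pct = (
    (TPMC["NetApp uncompressed"] - TPMC["NetApp compressed"])
    / TPMC["NetApp uncompressed"] * 100.0
)
print(f"NetApp tpmC drop with compression: {tpmc_drop_pct:.1f}%")
```

On these figures the compressed NetApp LUN is between roughly 6 and 34 times slower per transaction than the uncompressed LUN, while the V7000 with compression never exceeds the uncompressed baseline.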


Conclusions and Recommendations

Our research reveals that savings in overall storage capacity can result from utilizing

IBM Real-time Compression in a primary storage environment using block-level data.

Less data to store means less capacity is required. Savings are realized upfront on

storage requirements, of course. But lower storage requirements also mean less

management and a smaller infrastructure footprint which, in turn, requires less energy

to power, thus resulting in additional savings downstream on capital expenditure and

energy costs. Because Real-time Compression had no adverse impact on performance in

the tests we ran, it is the only compression technology among those tested that is truly

suitable for use with primary, active data.

Unlike the compression technology offered with other multipurpose storage solutions

targeting the mid to enterprise market, IBM Real-time Compression works well with

structured (database) workloads that are utilized in both physical and virtual operating

platforms. The Storwize V7000 Real-time Compression is ideally suited for all random

access compressible workloads. Primary active data is compressed in real time, on-the-

fly, before it is written to disk. All data in a storage volume is always compressed in the

most optimized form, achieving highest compression savings at any point in time, and

over time.

Because IBM Real-time Compression is utilized on active, primary data, there are some

specific data types that are ideally suited for Real-time Compression. Those data types

that will see the highest compression rates include: text, database, CAD/CAM, VMware

virtual environments, collaboration data, files, and pre-production raw animation and

video data.

Because IBM Real-time Compression is best suited for these data types it is, therefore,

best utilized in certain vertical industries that rely heavily on utilizing those specific

applications such as CAD/CAM or Microsoft Office, or on raw video footage. Industries

that will see the biggest benefit from Real-time Compression include: retail and

manufacturing design, engineering, up-stream oil and gas, pre-production video,

telecom, life sciences, and insurance.

The test results in this white paper prove that the use of IBM Real-time Compression on

block-level data is transparent and that there is no degradation to average performance

with compression turned on. Significantly, in several of our test scenarios, system

throughput was greater with Real-time Compression enabled than without its use.

Finally, Edison found that IBM Real-time Compression delivers superior data

compression compared with its competitors in primary active data environments.


Appendices

Appendix 1—Benchmarks Used in this Study

Edison made use of TPC-C and the Exchange Load Generator (Microsoft Jetstress) tests

in order to generate block-level workloads to read and write data to the storage array.

The software was not used to achieve maximum performance for the systems under test

or create a publishable benchmark result. The goal was to attain reasonable performance

levels that would be consistent across the test runs.

Exchange Load Generator (Microsoft Jetstress)

Microsoft Exchange Jetstress Benchmark is typically employed to verify the performance

and stability of a disk subsystem prior to putting an Exchange server into production.

Jetstress helps verify disk performance by simulating Exchange disk Input/Output (I/O)

load. Specifically, Jetstress simulates the Exchange database and log file loads produced

by a specific number of users. Performance Monitor, Event Viewer, and ESEUTIL are

used in conjunction with Jetstress to verify that your disk subsystem meets or exceeds

the performance criteria that have been established. After successful completion of the

Jetstress Disk Performance and Stress Tests in a non-production environment,

administrators will have ensured that their Exchange disk subsystem is adequately sized

(in terms of performance criteria they have established) for the user count and user

profiles they have also established.

TPC-C

TPC-C is an OLTP benchmark developed by the Transaction Processing Performance

Council (TPC – www.tpc.org). TPC-C simulates an order-entry application by executing

a mixture of read-only and update intensive transactions found in complex OLTP

application environments. TPC-C runs five different transactions against the database:

STOCK LEVEL – Checking the stock level

DELIVERY – Processing a batch of 10 orders

ORDER STATUS – Monitoring the status of orders

PAYMENT – Processing a payment

NEW ORDER – Entering a complete order


The TPC-C benchmark measures transactions per minute (tpmC), which indicates new

order transactions executed per minute and provides a measure of business throughput.

The benchmark also measures response time, which is the average time a user waits for a

response for each transaction.
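The two metrics described above can be sketched in a few lines of Python. This is a simplified illustration, not the TPC-C specification or the harness Edison used: the function summarize, the record format, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def summarize(transactions, run_minutes):
    """Return (tpmC, average response time per transaction type).

    transactions: iterable of (transaction_name, response_seconds) records.
    run_minutes:  length of the measurement interval in minutes.
    """
    by_name = defaultdict(list)
    for name, seconds in transactions:
        by_name[name].append(seconds)

    # tpmC counts only NEW ORDER transactions completed per minute.
    tpmc = len(by_name.get("NEW ORDER", [])) / run_minutes
    avg_response = {name: mean(values) for name, values in by_name.items()}
    return tpmc, avg_response


# Hypothetical one-minute log, for illustration only.
sample_log = [
    ("NEW ORDER", 0.20),
    ("NEW ORDER", 0.10),
    ("PAYMENT", 0.03),
    ("STOCK LEVEL", 3.00),
]

tpmc, avg_response = summarize(sample_log, run_minutes=1)
print(f"tpmC: {tpmc:.0f}")
for name, seconds in avg_response.items():
    print(f"{name}: {seconds:.3f} s average response")
```

Only NEW ORDER transactions count toward throughput, while response times are averaged separately for each of the five transaction types, which is how Table 3 reports them.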

Appendix 2— Research Methodology

Test Environment

IBM sought to validate the advantages of Real-time Compression on block-level data

storage, which is typically used on Storage Area Networks (SANs) and is often more

flexible and versatile for shared storage than file-level storage. It is also the only type of

storage currently supported by certain applications, such as Microsoft Exchange.

IBM asked Edison to perform product testing comparing the Real-time Compression

technology included with the IBM Storwize V7000 Storage System with the compression

technology included in two other storage systems: EMC VNX Series and NetApp FAS

family of filers. The overall generalized testing plan was as follows:

Run Microsoft Jetstress and TPC-C using block-level workloads with each system,

with and without compression.

Compare and document the results.

Validate the vendor’s claim for IBM Real-time Compression being suitable for use

with primary active workloads using block-level storage.

The test equipment for this study consisted of EMC VNX 5500, NetApp FAS 3240, and IBM Storwize V7000 mid-range storage systems; IBM System x 3550 x86-based servers for load generation; 8 Gb Fibre Channel (FC) networks for the SAN; and 1 GbE networks for management. The specifics for each component are as follows:

EMC Configuration

EMC VNX 5500

One 3TB thin-provisioned LUN

One 40-disk RAID pool (RAID5), compression ratio Medium/High (see footnote 10)

10 Compression ratio was initially set at Medium; however, when only 16 percent of the data had

compressed after 36 hours in the course of the Microsoft Jetstress testing, EMC recommended changing the

compression setting from Medium to High thereafter.


NetApp Configuration

N6240 Cluster — 24 x disks (SAS 10 K RPM)

Two Aggregates: each aggregate built from 12 disks; RAID-DP

12 disks for each node

IBM Configuration

IBM Storwize V7000 storage system — Total of 24 disks (SAS x 10 K RPM)

Two pools, each pool built from 12 disks; RAID5 (7+1)

12 disks for each node

Other Configuration Specifications

Client Systems—Five IBM System x x3550 M3 servers with 42GB RAM, three 300GB

SAS HDD, two Intel Xeon E5649 2.53GHz processors with six cores apiece (12 cores

total); IBM QLogic 8Gb FC Dual Port HBA for System x.

Network — QLogic SANbox 5800 series 8G FC dual port switch (for the comparisons

with the EMC VNX system); Cisco MDS-9148 (for the comparisons with the NetApp

system); all clients connected to the same FC switch as the storage array, with all

client ports zoned with all storage ports, and all LUNs masked to all hosts for

flexibility in testing.

Software — Included: Oracle v11g running on Red Hat Enterprise Linux 6X (1.2TB

data set); VMware vSphere 4 or 5 with ESXi 4 or 5 (800 GB data set); Microsoft

Windows Server 2008 R2 Standard Edition SP1, Exchange 2010 configured with two

hosts, one as Edge Server and one performing other Exchange roles (1.6 TB data set);

Microsoft Jetstress Version Multiple Path I/O (MPIO) 2; Data ONTAP 8.1 7-Mode (for

the NetApp storage system).

Test Objectives

Edison conducted two separate tests with the goal of determining and comparing ease of

configuration, immediate compression rates and compression over time, performance,

and transparency.

Edison considered two types of transparency in the test environment: configuration and

operational transparency. Configuration transparency is determined by whether

changes need to be made to server and storage array configurations in order to use the

appliance. Operational transparency, an extension of configuration transparency, is

determined by assessing whether or not the appliance is visible to the applications

running on the host and, in turn, visible to end users.


All testing was conducted using block-level data on the respective platforms, as follows:

EMC VNX with and without compression

Note: Following the results of Test One —Microsoft Exchange Transactions

Measured with Jetstress— it was determined that EMC VNX with compression is

entirely impractical for all compressible data other than that which is to be

completely dormant or permanently archived. Edison concluded that the EMC

VNX Series compression is unsuitable for the sort of mixed-environment

production data environment targeted by the product itself, and which this study

was intended to address. Therefore, no further testing on the EMC VNX 5500

was conducted for the remainder of the study.

NetApp FAS 3240 with and without compression (two sets of test runs made on the

NetApp FAS 3240 for each).

IBM Storwize V7000 with Real-time Compression.

The tests are described in Table 4 below.

Table 4: Test Cases

Test   Name       Description

1      Exchange   Compare compression efficiency, average rate of compression, and total process time for a Microsoft Exchange data workload.

2      TPC-C      Compare compression rates in various scenarios, including: OLTP/database use-case (test application response time); compression rates.

