89 Fifth Avenue, 7th Floor
New York, NY 10003
www.TheEdison.com
212.367.7400
White Paper
A Comparative Evaluation of Block
Storage Compression Technology for
Multipurpose Storage Environments
IBM Storwize V7000 Storage System Real-time Compression
EMC VNX Embedded Compression
NetApp Fabric-Attached Storage (FAS) Compression
Printed in the United States of America.
Copyright 2012 Edison Group, Inc. New York. Edison Group offers no warranty either
expressed or implied on the information contained herein and shall be held harmless for errors
resulting from its use.
All products are trademarks of their respective owners.
First Publication: July 2012
Produced by: Craig Norris, Senior Analyst; Barry Cohen, Chief Analyst and Editor-in-Chief;
Manny Frishberg, Editor
Table of Contents
Executive Summary
    EMC Compression Findings
    NetApp Compression Findings
    Executive Summary Conclusions
Introduction
    Audience
Overview
    More Data / Fewer Resources
    Making Efficient Use of Data Storage Capacity
What Makes IBM Real-time Compression Unique
    Transparency
    Performance
    Compression
Products Tested
    EMC VNX 5500
    NetApp Fabric-Attached Storage (FAS) 3240
    IBM Storwize V7000 Storage System
Research Results
    Test One: Microsoft Exchange Jetstress
    Test Two: Database Performance Measured with TPC-C
Conclusions and Recommendations
Appendices
    Appendix 1: Benchmarks Used in this Study
        Exchange Load Generator (Microsoft Jetstress)
        TPC-C
    Appendix 2: Research Methodology
        Test Environment
        Test Objectives
Edison: Evaluation of Block Storage Compression Technology Page 1
Executive Summary
With its 2010 launch of Real-time Compression, IBM announced the first real-time data compression technology that captures and compresses data before it is stored on disk. This enabled both a tremendous increase in usable storage capacity and an impressive decrease in storage footprint. In storage optimization through compression, IBM leapt ahead of its competitors by compressing primary, online data in real time. The end result was data compression without any impact on application performance.
In conducting the research for this study, Edison Group sought to evaluate the
practicality of the compression technologies included with leading storage systems for
the mid to enterprise market, as compared with the Real-time Compression technology
now included with the IBM Storwize V7000 Storage System.
As storage system vendors trip over one another to offer technologies that improve
storage system utilization, some technologies may be added more to appear competitive
than to serve a practical purpose in an actual production storage environment.
Customers should evaluate such technologies carefully to determine whether or not they
are truly applicable to their particular IT scenarios.
Edison evaluated the product offerings in this study with an eye to their applicability
within multipurpose storage environments involving primary active data in a
production environment, as opposed to their applicability for specific, dedicated storage
purposes. To this end, the Real-time Compression technology included with the IBM
Storwize V7000 Storage System is compared with the compression technology included
with both the EMC VNX Series and the NetApp Fabric-Attached Storage (FAS) family of
filer storage systems.
The testing workloads in this study consisted of block-level data typical of production-environment SANs (Storage Area Networks). Edison validated IBM’s claims in the areas of transparency, performance, and maximum compression over time. IBM Real-time Compression compressed block-level Exchange Jetstress test data on-the-fly, at wire speed. Compressing block-level data generated by the TPC-C benchmark in an Oracle database environment resulted in no performance penalty (and even a small improvement for certain transactional functions).
To sum up the research findings for Real-time Compression:
Complete transparency: Real-time Compression is invisible to server operating systems and applications, so no administrative overhead is required to use it. The EMC VNX Series, by contrast, is designed to compress inactive content and is entirely unsuitable for active data. The NetApp FAS 3240 demonstrated file copy transparency by compressing data inline on initial writes, but any subsequent changes to stored data are compressed only later, in a post-process pass. NetApp FAS native compression is therefore not practical for highly active, frequently changing data in applications such as databases.
No impact on performance: Edison found no impact on performance with Real-time
Compression and, in fact, found some advantage over NetApp compression for
certain TPC-C order functions in the database transaction tests.
Minimal management requirements: Beyond determining which data compression should be applied to (since not all data is compressible), Real-time Compression adds no administrative complexity. Because there is no post-processing step for data protection schedules and policies to work around, these remain unchanged.
Reduced storage capacity required: Being able to use compression on all
compressible data means less capacity must be obtained at initial acquisition and less
additional capacity needs to be acquired as data inevitably grows.
EMC Compression Findings
Edison ran Microsoft’s Jetstress benchmark to compare the Real-time Compression technology of the IBM Storwize V7000 with the embedded compression technology of the EMC VNX. VNX compression took more than 83 hours to compress the same 1.65 TB of data to a compression ratio similar to the one Real-time Compression achieved on-the-fly, as the data was written.
It should be noted that even the exceedingly slow compression time of the EMC VNX
technology was achieved on a test system dedicated entirely to the task, so that it could
be kept idle until compression was completed. In an actual production scenario, where
resources and I/O are being allocated to higher-priority and/or business-critical tasks, no
compression would occur at all during those times, rendering the compression feature
all but useless. The sheer amount of data generated and/or changed in the course of
practical day-to-day operations would outpace the ability of the system to compress it
all.
A process that compresses data over a timeframe of several weeks, as Edison has
determined would be the case with EMC VNX compression, is impractical for any data
other than that which is to be entirely dormant or permanently archived. Therefore, the
compression technology included with the EMC VNX line is really at odds with the
mixed storage environment for which the devices themselves are usually purposed.
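The claim that day-to-day change volume would outpace post-process compression can be made concrete with a little arithmetic. The sketch below is a hypothetical backlog model, not part of Edison’s test harness: it takes the 17.5 GB/hr average rate reported for EMC VNX compression in Table 1, assumes compression runs only during idle hours, and uses an illustrative volume of new or changed data per day. The function names and the 500 GB/day and 8-hour figures are assumptions chosen for illustration.

```python
"""Hypothetical backlog model for post-process compression (illustration only)."""

# Average post-process compression rate reported for EMC VNX in Table 1.
VNX_RATE_GB_PER_HOUR = 17.5


def idle_hours_needed(daily_change_gb: float, rate: float = VNX_RATE_GB_PER_HOUR) -> float:
    """Idle hours per day required to compress one day's new or changed data."""
    return daily_change_gb / rate


def backlog_gb(days: int, daily_change_gb: float, idle_hours_per_day: float,
               rate: float = VNX_RATE_GB_PER_HOUR) -> float:
    """Uncompressed data still waiting for post-process compression after `days`.

    Assumes data lands uncompressed and is compressed only during idle hours.
    """
    backlog = 0.0
    for _ in range(days):
        backlog += daily_change_gb                           # new writes land uncompressed
        backlog -= min(backlog, rate * idle_hours_per_day)   # drained only while idle
    return backlog


if __name__ == "__main__":
    # Hypothetical workload: 500 GB of new or changed data per day, 8 idle hours per night.
    print(f"idle hours needed per day: {idle_hours_needed(500.0):.1f}")
    print(f"backlog after 30 days: {backlog_gb(30, 500.0, 8.0):,.0f} GB")
```

Under these assumptions, 500 GB of daily change would need more than 28 idle hours per day to keep pace, so the uncompressed backlog grows every day no matter how compression is scheduled.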
NetApp Compression Findings
Unlike EMC VNX compression, NetApp compression can compress data during initial writes (on-the-fly). However, it cannot compress updates to stored data as they are written. Compression ratios therefore deteriorate over short periods of time, because changed data must be stored uncompressed while it awaits post-process compression. With NetApp FAS compression, there is a tight correlation between the percentage of data changing and the decrease in the amount of data compressed; after a 24-hour Exchange Jetstress run, the data compression ratio had decreased by 50 percent.
Additionally, the NetApp compression that takes place post-process — that is, where
changes are made to stored data — heavily utilizes the system CPUs (as much as 100
percent) and disks (100 percent). Again, it is important to note that this was the case in a
test system that was otherwise idle. In a true mixed production environment, this
characteristic could either have unacceptable impact on other more business-critical
processes, or else be prioritized such that the compression process could take an
unacceptably long time.
As expected, our tests indicated that, when post-process compression took place while
the Exchange Jetstress test was running, the application performance decreased
dramatically and the post-process compression scarcely progressed.
In summary, Edison’s testing revealed that NetApp post-processing imposes a significant performance penalty on further writes, for two reasons:
Even with no other I/O running, post-process compression drives drive utilization to 100 percent and CPU utilization as high as 100 percent.
When additional workloads run at the same time, CPU utilization reaches 100 percent on top of 100 percent drive utilization, which nearly locks up the system completely.
Executive Summary Conclusions
All EMC VNX data compression occurs post-process (after the data has been written to the storage device uncompressed), as does NetApp compression of any changes made to stored data. Not only does this approach severely impact access performance for stored data, but the data must also be stored in its uncompressed form before post-process compression can handle it. For this reason, both EMC and NetApp recommend sizing compressed devices for data at its original size. If followed, this practice eliminates the capacity benefits of deploying compression.
Edison believes that the compression features of the EMC VNX are best used in environments where an entire VNX array can be dedicated to compressible, low-throughput workloads with considerable idle periods. Though there may be cases where this is a valid approach, other solutions on the market would be more appropriate. NetApp compression may be practical for applications that write data to storage during production operations and rarely or never update it afterward. However, because updates to stored data are compressed only in post-processing, it is impractical for applications, such as databases, in which stored data changes frequently.
Edison’s findings validate IBM’s reasoning in offering Real-time Compression with
multipurpose storage systems for production IT environments using active primary
data. For organizations purchasing a storage solution where efficient data capacity
utilization is an important criterion, with Real-time Compression they need no longer
compromise application performance or purchase multiple solutions for dedicated
storage purposes.
Introduction
IBM asked Edison to perform product testing comparing the Real-time Compression technology included with the IBM Storwize V7000 Storage System to the compression technology included in both the EMC VNX Series of storage systems and the NetApp Fabric-Attached Storage (FAS) family of filers. The purpose was to validate IBM’s claims in the areas of maximum compression, design for active primary data workloads, and the technology’s transparency to applications, storage, networks, and downstream processes as these apply to block-level storage. Edison conducted the research for this study with an eye to the products’ viability within a multipurpose storage environment involving primary active data in a production environment.
In the course of this validation, Edison ran performance tests on an EMC VNX 5500 and
a NetApp FAS3240. Their compression technology for block-based storage was stacked
up against the IBM Storwize V7000 Storage System’s Real-time Compression solution in
the areas of compressibility, performance, implementation transparency, and other
nuances of compression technology.
Audience
This competitive white paper is a public report that can be of value to IT decision
makers as well as storage and other data administrators seeking ways of maximizing the
efficiency of their storage technology investments.
Overview
The explosive growth of data in the IT world is a long-established fact. Though it has taken nobody by surprise, it has rapidly accelerated as technology has put more and more of the world, including audio, image, video, and multifarious communications, into the form of data. The amount of digital information in the world surpassed a zettabyte (1 trillion gigabytes) for the first time in history in 2010, according to IT industry and market intelligence forecasting firm IDC.1 Yet IDC expects that mind-boggling volume of data to nearly triple to 2.7 zettabytes during 2012, only two years later. A 2011 survey2 conducted by the Independent Oracle Users Group found that almost one out of 10 respondents’ sites had data stores in the petabyte range.
To accommodate this burgeoning data, the number of servers (virtual and physical) worldwide will grow by a factor of 10 over the next decade. The amount of information managed by enterprise data centers will grow by a factor of 50, and the number of files data centers will have to deal with will grow by a factor of 75. Meanwhile, the number of IT professionals available to manage this growth will grow by less than a factor of 1.5 worldwide in the same timeframe.3
More Data / Fewer Resources
As a result of this accelerating propagation of data, organizations’ data centers and the
IT vendors that provide technology solutions to them have been racing to grapple with
the challenges it poses for data backup and storage. Physical data storage continues to
drop in cost by as much as 25 percent per year.4 Nevertheless, the need for ever-
expanding capacity outstrips even these savings. Massive build-outs of infrastructure—
both physical capacity and enabling technology—are still required to accommodate the
increasing amount of data, as well as to manage critical backup, access, and recovery
requirements.
This is all taking place against an economic backdrop in which IT budgets have stagnated or are increasing only incrementally. To sum up the situation: data capacity is growing at the rate of 40 to 60 percent annually (due both to the explosion of unstructured data and to compliance requirements compelling organizations to keep data longer),5 yet IT spending is forecast to increase at only 5 percent annually.6
1 John F. Gantz, Chief Research Officer & Senior Vice President, IDC, http://www.emc.com/collateral/demos/microsites/emc-digital-universe-2011/index.htm
2 “The Petabyte Challenge: 2011 IOUG Database Growth Survey,” IOUG, 2011
3 “The 2011 Digital Universe Study: Extracting Value From Chaos,” IDC, 2011
4 A good quote on this topic, from Wayne Salpietro at Database Trends and Applications, can be found at: http://www.dbta.com/Articles/Editorial/Trends-and-Applications/Getting-Out-of-Storage-Debt-77189.aspx
Making Efficient Use of Data Storage Capacity
Against the backdrop described above, minimizing storage capacity requirements and
associated costs is of great importance. IT planners and administrators must look for
technology that enables greater efficiency in storage capacity utilization. Such
technology, offered by storage systems vendors, generally falls into four types:
Thin Provisioning: Storage capacity is shared and allocated to volumes on an as-needed basis, increasing efficiency in the use of the physical capacity actually available. This reduces the need to overprovision volumes in anticipation of expected data growth. Thin provisioning is a long-established but often underused strategy; one advantage of compression is that it also enables data centers to gain the benefits of thin provisioning.
Tiering: Allocating data to the most appropriate storage media in terms of cost and performance. The media might range from costly SSDs (solid-state drives) for data demanding the ultimate performance to low-cost, low-performing tape media for archiving. Enterprise storage software typically offers some form of automated tiering, which can allocate data to storage according to type or prioritization rules pre-assigned by administrators.
Compression: Encoding information in fewer bits than the original data, using software algorithms.
Deduplication: Removing duplicate copies of the same data and replacing them with pointers that reference a single stored copy.
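The two techniques in this list that actually reduce stored data can be illustrated in a few lines of Python. This is a minimal sketch under our own assumptions, not how any of the tested arrays implement them: zlib stands in for a compression algorithm, SHA-256 fingerprints of fixed 4 KiB chunks stand in for a deduplication index, and the function names and chunk size are hypothetical.

```python
"""Illustrative compression and deduplication using only the standard library."""
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size for deduplication


def compress_block(data: bytes, level: int = 6) -> bytes:
    """Compression: re-encode the data in fewer bits (zlib/DEFLATE here)."""
    return zlib.compress(data, level)


def dedupe(data: bytes, chunk_size: int = CHUNK_SIZE) -> tuple[dict[str, bytes], list[str]]:
    """Deduplication: keep one copy of each distinct chunk plus a list of pointers."""
    store: dict[str, bytes] = {}   # fingerprint -> the single stored copy
    pointers: list[str] = []       # one fingerprint per logical chunk, in order
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)
        pointers.append(fingerprint)
    return store, pointers


def rebuild(store: dict[str, bytes], pointers: list[str]) -> bytes:
    """Follow the pointers to reconstruct the original byte stream."""
    return b"".join(store[fp] for fp in pointers)


if __name__ == "__main__":
    # Ten identical 4 KiB chunks followed by one distinct chunk.
    data = b"A" * CHUNK_SIZE * 10 + bytes(range(256)) * 16
    packed = compress_block(data)
    store, pointers = dedupe(data)
    assert zlib.decompress(packed) == data and rebuild(store, pointers) == data
    print(f"original {len(data)} B, compressed {len(packed)} B, "
          f"{len(pointers)} chunks -> {len(store)} stored")
```

The sample data shows the difference in scope: deduplication only helps because whole chunks repeat, while compression also exploits redundancy inside each chunk.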
The two approaches listed here that actually reduce the amount of data stored, deduplication and compression, have always had certain limitations. Deduplication works only on data containing multiple copies of identical data chunks. Traditional approaches to compression typically must be performed on data after it has been written to storage, which usually has a significant adverse impact on performance and may not deliver the capacity efficiency desired. For this reason, traditional compression is ordinarily not suitable for use with primary, active data.
5 “Data Growth Remains IT’s Biggest Challenge, Gartner Says,” Computerworld, Lucas Mearian, November 2, 2010, http://www.computerworld.com/s/article/9194283/Data_growth_remains_IT_s_biggest_challenge_Gartner_says
6 “Big Data: The Next Frontier for Innovation, Competition, and Productivity,” McKinsey Global Institute, June 2011
Naturally, any compression technology used to minimize the impact of expanding data requirements must be efficient in reducing the need for storage capacity. It should, however, also be transparent to operations, without compromising any of the criteria actually driving the purchase or management of a given IT storage solution. Further, the more types of data a technology can compress, the greater the benefits that can be expected.
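The difference between compressing data before it reaches disk and compressing it after it has been written can be sketched with two toy volumes. This is a hypothetical model using zlib, not the design of any product in this study; the class and method names are invented for illustration. It shows that a post-process volume holds data at full size until its background pass runs, and that every rewrite puts uncompressed data back on disk.

```python
"""Toy comparison of inline and post-process block compression (hypothetical model)."""
import zlib


class InlineVolume:
    """Compresses every block before it is stored."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}   # LBA -> compressed payload

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = zlib.compress(data)

    def read(self, lba: int) -> bytes:
        return zlib.decompress(self.blocks[lba])

    def physical_bytes(self) -> int:
        return sum(len(p) for p in self.blocks.values())


class PostProcessVolume:
    """Stores blocks as written; a separate pass compresses them later."""

    def __init__(self) -> None:
        self.blocks: dict[int, tuple[bool, bytes]] = {}   # LBA -> (compressed?, payload)

    def write(self, lba: int, data: bytes) -> None:
        # New writes and rewrites of existing blocks both land uncompressed.
        self.blocks[lba] = (False, data)

    def read(self, lba: int) -> bytes:
        compressed, payload = self.blocks[lba]
        return zlib.decompress(payload) if compressed else payload

    def compress_pending(self) -> int:
        """Background pass: compress every uncompressed block; return how many."""
        done = 0
        for lba, (compressed, payload) in list(self.blocks.items()):
            if not compressed:
                self.blocks[lba] = (True, zlib.compress(payload))
                done += 1
        return done

    def physical_bytes(self) -> int:
        return sum(len(p) for _, p in self.blocks.values())


if __name__ == "__main__":
    page = b"mailbox page " * 300          # compressible stand-in for one block
    inline, post = InlineVolume(), PostProcessVolume()
    for lba in range(100):
        inline.write(lba, page)
        post.write(lba, page)
    print("before pass:", inline.physical_bytes(), post.physical_bytes())
    post.compress_pending()
    print("after pass: ", inline.physical_bytes(), post.physical_bytes())
```

Because the post-process volume’s footprint returns toward the uncompressed size after every burst of writes, capacity has to be provisioned for the original data size, which is the sizing practice described in the Executive Summary Conclusions.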
What Makes IBM Real-time Compression Unique
Transparency
Use of Real-time Compression technology is completely transparent from the
perspective of the storage arrays and hosts making use of the technology. Applications
and arrays communicate with no configuration changes. Applications accessing data
from a server are entirely unaware that the data is compressed; the host operating
system sees the uncompressed data size. This provides complete application and host-
level transparency.
In addition, all of the existing downstream processes stay the same. Snapshots work as
they did before, with those snapshots occupying less storage space, allowing for
improved recovery point objectives (RPOs). Replication works as it did before, as do
backups and restores.
Performance
Edison’s earlier testing7 has shown that the IBM Real-time Compression Appliance has either a positive effect or no effect on performance. In many of those tests, storage system throughput was higher with compression on than without. For example, system throughput for read operations was almost 57 percent greater with Real-time Compression Appliance compression enabled than without it.
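One way to reason about how compression can raise throughput is a simple upper bound: when the disks are the bottleneck, each stored byte read from disk expands to several logical bytes, until decompression speed becomes the limit. The sketch below is a hypothetical model, not a description of the Real-time Compression Appliance’s internals, and the bandwidth figures in it are illustrative assumptions rather than results from Edison’s tests.

```python
"""Hypothetical upper-bound model of logical read throughput with compression."""


def logical_read_mb_s(backend_mb_s: float, ratio: float, decompress_mb_s: float) -> float:
    """Logical MB/s delivered when disks supply `backend_mb_s` of stored (compressed)
    data, each stored byte expands to `ratio` logical bytes, and decompression can
    produce at most `decompress_mb_s` of logical data."""
    if backend_mb_s < 0 or ratio < 1.0 or decompress_mb_s < 0:
        raise ValueError("invalid inputs")
    return min(backend_mb_s * ratio, decompress_mb_s)


def speedup(backend_mb_s: float, ratio: float, decompress_mb_s: float) -> float:
    """Relative gain over reading the same data uncompressed from the same disks."""
    return logical_read_mb_s(backend_mb_s, ratio, decompress_mb_s) / backend_mb_s


if __name__ == "__main__":
    # Illustrative figures only: 400 MB/s of disk bandwidth, 2,500 MB/s of decompression.
    for r in (1.0, 1.5, 2.0, 4.0, 8.0):
        print(f"ratio {r}:1 -> {logical_read_mb_s(400.0, r, 2500.0):.0f} MB/s "
              f"({speedup(400.0, r, 2500.0):.2f}x)")
```

Under this model the gain is capped once decompression, rather than the disks, becomes the bottleneck, so the benefit depends on both the workload’s compressibility and the system’s processing headroom.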
Compression
Because Real-time Compression is transparent, the effects of compression are perceived only on the storage array, in the form of greater available capacity. The high compression levels possible for active block-level data result in lower acquisition costs for storage capacity, with no effect on performance or user experience.
Products Tested
The products tested for the study described in this paper are all block storage arrays
designed to target a mid-to-enterprise market.
7 A Validation and Comparison of Storage Efficiency Technology: Compression, Edison Group, Jan. 2011
EMC VNX 5500
EMC replaced its former CLARiiON and Celerra product lines with new models under the VNX brand. These storage systems combine many of the features of CLARiiON and Celerra into a single platform on two hardware servers, and include several hardware changes, among them an updated Intel processor in the controller. In addition, EMC has joined the transition from 3.5-inch FC drives to 2.5-inch SAS drives as the new standard for high-performance, enterprise-class spinning disks.8
The highly customized storage system can serve data over block-based protocols such as
Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), and iSCSI. It is packaged in
rack-mountable enclosures that house up to 25 2.5-inch disk drives or SSDs, or up to 15
3.5-inch drives. Disk Processor Enclosures (DPEs) contain drives, redundant dual-active
intelligent RAID controllers, dual power supplies, and dual cooling components. Disk
Array Enclosures (DAEs) contain drives, switches, power supplies, and cooling
components. The VNX 5500 supports up to nine expansion enclosures attached to a control enclosure, for a maximum of 250 drives.
NetApp Fabric-Attached Storage (FAS) 3240
NetApp’s Fabric-Attached Storage (FAS) filers target enterprise-class SAN
environments. Like an EMC VNX system, these particular filers can also serve data over
block-based protocols such as Fibre Channel (FC), Fibre Channel over Ethernet (FCoE),
and iSCSI. NetApp filers implement their physical storage in large disk arrays. NetApp
filers use customized hardware, as well as the proprietary Data ONTAP operating
system, built specifically for storage-serving purposes. NetApp’s new compression
functionality is a component of Data ONTAP.
IBM Storwize V7000 Storage System
IBM Storwize V7000 with easy-to-use, fully integrated Real-time Compression is a
virtualized storage system designed to deliver simplicity of management, reduced cost,
highly scalable capacity, performance, and high availability. Its high-performance
implementation of Real-time Compression supports compression for primary active
workloads. IBM Storwize V7000 also offers improved efficiency and flexibility through built-in solid-state drive (SSD) optimization, standard thin provisioning, and nondisruptive migration of data from existing storage. The system can virtualize and reuse
existing disk systems, offering a greater potential return on investment.
8 EMC VNX Series also supports optional SSDs and associated data tiering with the addition of EMC FAST Suite software.
Research Results
This section presents highlights of the results and a summarized analysis of the research
conducted for this white paper. For descriptions of the particular tests performed, see
the appendix of this white paper entitled “Benchmarks Used in This Study.” For
descriptions of the compression technologies evaluated, see the appendix entitled
“Technology Solutions Evaluated.” For descriptions of the test environment and
methodology employed, see the appendix entitled “Research Methodology.”
Test One: Microsoft Exchange Jetstress
Microsoft Exchange is considered a very suitable workload for IBM Real-time
Compression. In addition, the application supports only block-level storage at this time.
The tool used to generate the workload for this test — Microsoft Jetstress — works with
the Microsoft Exchange Server database engine to simulate the Exchange database and
log disk input/output (I/O) load. It allows you to simulate a specific number of users for
the tests.
The benchmark configuration used was as follows:
Number of users: 1,600
Mailbox size: 1,024 MB
Two databases with one copy each: databases on the 1 TB volume, logs on the 500 GB volume
The Jetstress performance test was run for two hours, while background maintenance
was in progress. Table 1, below, presents the results of the tests comparing Real-time
Compression with EMC VNX compression in terms of compression efficiency, average
rate of compression, and total process time.
Table 1: Microsoft Exchange Jetstress Comparison: Real-time Compression and EMC VNX Compression

Metric                             | IBM Storwize V7000 Real-time Compression | EMC VNX Compression
-----------------------------------+------------------------------------------+--------------------
Pre-compressed data size           | 1.65 TB                                  | 1.65 TB
Post-compression data size         | 151.80 GB                                | 214.41 GB
Compression percentage             | 92%                                      | 87%
Post-compression disk space gain   | 1.50 TB                                  | 1.44 TB
Average compression rate           | Wire speed                               | 17.5 GB/hr
Total process time (see note 9)    | N/A                                      | 83 hrs 46 mins
As shown in this table, the IBM Storwize V7000 system delivered a greater degree of
compression than the EMC VNX 5500, while accomplishing that compression at wire
speed, on-the-fly. The Exchange Jetstress data compression with EMC VNX was 87
percent, compared to 92 percent on the V7000.
Note: Compression rates observed in this test are higher than real-world Exchange system data,
due to the synthetic data generated by the Jetstress tool.
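The rows of Table 1 are related by simple arithmetic. The sketch below recomputes the compression ratio, percentage saved, and capacity gained from the pre- and post-compression sizes, assuming decimal units (1 TB = 1,000 GB); the unit convention and helper names are our assumptions, since the table does not state them. With these inputs the capacity gains come out at about 1.50 TB and 1.44 TB, matching the table, and the savings at about 91 and 87 percent, close to the reported percentages.

```python
"""Recompute Table 1 derived rows from the reported data sizes (decimal units assumed)."""

GB_PER_TB = 1000.0          # assumption: decimal units
PRE_COMPRESSED_GB = 1650.0  # 1.65 TB, the same for both systems

POST_COMPRESSION_GB = {
    "IBM Storwize V7000 Real-time Compression": 151.80,
    "EMC VNX compression": 214.41,
}


def summarize(pre_gb: float, post_gb: float) -> dict[str, float]:
    return {
        "ratio": pre_gb / post_gb,                        # 10.9 means 10.9:1
        "savings_pct": 100.0 * (1.0 - post_gb / pre_gb),  # share of capacity no longer needed
        "gain_tb": (pre_gb - post_gb) / GB_PER_TB,        # capacity freed
    }


if __name__ == "__main__":
    for system, post_gb in POST_COMPRESSION_GB.items():
        s = summarize(PRE_COMPRESSED_GB, post_gb)
        print(f"{system}: {s['ratio']:.1f}:1, {s['savings_pct']:.1f}% saved, "
              f"{s['gain_tb']:.2f} TB gained")
```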
Table 2, below, presents the transactional I/O rates achieved for both a compressed and
uncompressed NetApp FAS 3240 LUN, and for compressed V7000 VDisks.
Note: The NetApp Compressed LUN column in Table 2 includes inline and post-process metrics, because the Jetstress workload makes updates to existing data, which are not compressed inline and are deferred to post-process compression.
Table 2: Microsoft Exchange Comparison: IBM Real-time Compression and NetApp Compression

Transactional I/O performance              | NetApp Uncompressed LUN | NetApp Compressed LUN (Inline + Post Process) | V7000 Compressed VDisks
-------------------------------------------+-------------------------+-----------------------------------------------+------------------------
I/O database reads, average latency (msec) | 6.42                    | 67.2                                          | 6.03
I/O database writes, average latency (msec)| 2.6                     | 1.8                                           | 2.7
I/O database reads/sec                     | 1,661                   | 68.5                                          | 1,755
I/O database writes/sec                    | 921                     | 46.5                                          | 995
Achieved transactional I/O per second      | 2,582                   | 115                                           | 2,750
9 This value reflects the process time for compression only. Generating the Exchange Jetstress data took five hours in both cases. Real-time Compression compressed the data on-the-fly as it was generated and thus took no additional time for compression, while the EMC VNX compression process required an additional 83 hrs, 46 mins.
Note that the lower average write latency for the NetApp Compressed LUN shown in the table above goes hand in hand with its extremely poor write throughput (46.5 writes/sec for the compressed LUN versus 921 for the uncompressed LUN): because more transactional IOPS lead to greater latency, a LUN servicing so few writes completes each one quickly. A similar but less striking case is the V7000 Compressed VDisks showing about four percent higher write latency than the NetApp Uncompressed LUN (2.7 msec versus 2.6). This can occur because compression allows more IOPS (995 writes/sec in this example, versus 921 for uncompressed data, or 8 percent more).
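The relationships in Table 2 can be checked directly. The sketch below sums reads and writes per second to reproduce the achieved transactional I/O row, computes the percentage differences cited for the V7000, and applies Little’s law (average operations in flight = completion rate × average latency), a standard queueing identity that Edison did not itself report, to show how much concurrent write activity each configuration was sustaining. The dictionary layout and function names are our own.

```python
"""Cross-check Table 2 (Jetstress) figures; Little's law applied as an illustration."""

# Latencies in milliseconds, rates in I/O operations per second (Table 2).
TABLE2 = {
    "NetApp uncompressed LUN": {"read_ms": 6.42, "write_ms": 2.6, "reads_s": 1661, "writes_s": 921},
    "NetApp compressed LUN": {"read_ms": 67.2, "write_ms": 1.8, "reads_s": 68.5, "writes_s": 46.5},
    "V7000 compressed VDisks": {"read_ms": 6.03, "write_ms": 2.7, "reads_s": 1755, "writes_s": 995},
}


def total_iops(row: dict) -> float:
    """Achieved transactional I/O per second = reads/sec + writes/sec."""
    return row["reads_s"] + row["writes_s"]


def in_flight(rate_per_s: float, latency_ms: float) -> float:
    """Little's law: average operations in flight = completion rate x average latency."""
    return rate_per_s * latency_ms / 1000.0


def pct_diff(value: float, baseline: float) -> float:
    return 100.0 * (value - baseline) / baseline


if __name__ == "__main__":
    for name, row in TABLE2.items():
        print(f"{name}: {total_iops(row):,.1f} IOPS, "
              f"{in_flight(row['writes_s'], row['write_ms']):.2f} writes in flight")
    base, v7000 = TABLE2["NetApp uncompressed LUN"], TABLE2["V7000 compressed VDisks"]
    print(f"V7000 write latency: {pct_diff(v7000['write_ms'], base['write_ms']):+.1f}%")
    print(f"V7000 writes/sec:    {pct_diff(v7000['writes_s'], base['writes_s']):+.1f}%")
```

By this calculation the NetApp compressed LUN averaged fewer than 0.1 writes in flight, against roughly 2.4 for the uncompressed LUN, which is consistent with its low write latency reflecting how little work it was completing rather than better responsiveness.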
Test Two: Database Performance Measured with TPC-C
The TPC-C benchmark was designed to test the transaction performance of a complete
database server platform—hardware and software—in transactions per minute. Usually,
the results of the benchmark are presented in the context of cost per transaction, a metric
that is good for comparing the different platforms. For this study, TPC-C was used
solely to generate transaction loads that would demonstrate any effects on performance
resulting from using Real-time Compression.
Compression is gaining in importance for use with database systems as a means of
dealing with the exponential growth in the number of transactions and the storage
capacity needed to contain that growth. The NetApp FAS 3240 system used for this
study offers deduplication and compression features, but these features are not suitable
for the active data used by a database. In fact, NetApp recommends not using
deduplication on active database files.
Another approach to addressing this challenge is the use of expensive customized
hardware platforms with specialized storage subsystems, and changes to the database
architecture designed to enable high-performance compression. These systems work
well, but they require that existing databases be moved to the new systems and that
database design, as well as the design of the applications using those databases, be
modified to leverage the new platform.
Results of the TPC-C-based tests Edison ran to compare the effects of compression on performance are presented in Table 3, below. For each simulated order function of the benchmark, the table gives the response time for the NetApp system without compression, for the same system using compression, and for the IBM Storwize V7000 system using Real-time Compression.
Note: The NetApp Compressed LUN column in Table 3 includes inline and post-process metrics, because TPC-C makes updates to existing data, which are not compressed inline and are deferred to post-process compression.
Table 3: Database Transaction Performance: IBM Real-time Compression and NetApp Compression
Response times in seconds; throughput (tpmC) in transactions per minute.
Transaction Name    NetApp Uncompressed LUN   NetApp Compressed LUN (Inline + Post Process)   V7000 Compressed VDisks
Stock Level         3.397                     31.102                                          3.053
Delivery            0.606                     4.133                                           0.566
Order Status        0.039                     0.222                                           0.039
Payment             0.027                     0.912                                           0.027
New Order           0.175                     1.75                                            0.158
tpmC (Throughput)   986                       820                                             987
As these test results show, response times on the IBM V7000 with compression are about the same as those of the NetApp system without compression, and for some transactions (Stock Level, Delivery, and New Order) they are slightly lower. With NetApp compression enabled, however, performance suffers, at times to a considerable extent: Stock Level response time rises from 3.397 to 31.102 seconds, Payment from 0.027 to 0.912 seconds, and throughput falls from 986 to 820 tpmC.
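The relative differences in Table 3 can be computed directly from its figures. The short Python sketch below divides each compressed response time by the NetApp uncompressed baseline and computes the throughput drop; the variable and function names are illustrative only.

```python
# Response times in seconds, copied from Table 3.
response_times = {
    # transaction: (NetApp uncompressed, NetApp compressed, V7000 compressed)
    "Stock Level": (3.397, 31.102, 3.053),
    "Delivery": (0.606, 4.133, 0.566),
    "Order Status": (0.039, 0.222, 0.039),
    "Payment": (0.027, 0.912, 0.027),
    "New Order": (0.175, 1.75, 0.158),
}
tpmc = {"NetApp uncompressed": 986, "NetApp compressed": 820, "V7000 compressed": 987}


def slowdown(baseline: float, measured: float) -> float:
    """How many times longer the measured response is than the baseline."""
    return measured / baseline


for name, (base, netapp_c, v7000_c) in response_times.items():
    print(f"{name:13s} NetApp compressed x{slowdown(base, netapp_c):5.1f}  "
          f"V7000 compressed x{slowdown(base, v7000_c):4.2f}")

drop = 1 - tpmc["NetApp compressed"] / tpmc["NetApp uncompressed"]
print(f"NetApp throughput drop with compression: {drop:.1%}")
```

Run against the Table 3 data, NetApp compression lengthens response times by factors ranging from about 5.7 (Order Status) to about 33.8 (Payment), every V7000 ratio is at or below 1.00, and NetApp throughput drops by about 16.8%.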
Conclusions and Recommendations
Our research shows that IBM Real-time Compression can reduce overall storage capacity requirements in a primary storage environment using block-level data. Less data to store means less capacity is required, so savings are realized upfront on storage purchases. Lower storage requirements also mean less management and a smaller infrastructure footprint, which in turn requires less energy to power, producing additional downstream savings on capital expenditure and energy costs. Because Real-time Compression had no adverse impact on performance in the tests we ran, it was the only one of the three compression technologies tested that proved truly suitable for use with primary, active data.
Unlike the compression technology offered with other multipurpose storage solutions targeting the mid-range to enterprise market, IBM Real-time Compression works well with structured (database) workloads running on both physical and virtual platforms. Storwize V7000 Real-time Compression is ideally suited to all compressible, random-access workloads. Primary, active data is compressed on the fly, before it is written to disk. All data in a storage volume is therefore always stored in its most optimized compressed form, achieving the highest compression savings at any point in time and over time.
Because IBM Real-time Compression operates on active, primary data, its benefit depends on how compressible that data is, and some data types are especially well suited to it. The data types that see the highest compression rates include text, database, CAD/CAM, VMware virtual environments, collaboration data, files, and pre-production raw animation and video data.
Because IBM Real-time Compression is best suited to these data types, it is best used in vertical industries that rely heavily on applications that produce them, such as CAD/CAM or Microsoft Office, or on raw video footage. Industries that will see the biggest benefit from Real-time Compression include retail and manufacturing design, engineering, upstream oil and gas, pre-production video, telecom, life sciences, and insurance.
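Why some data types compress well and others do not can be demonstrated with a minimal Python sketch. It uses the standard-library zlib module as a stand-in; it is not IBM's Real-time Compression algorithm, and the sample records are hypothetical. Repetitive, structured data (like database rows or text) compresses heavily, while random bytes, which resemble already-compressed content, do not.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, level))


def savings(ratio: float) -> float:
    """Fraction of capacity saved for a given compression ratio."""
    return 1 - 1 / ratio


# Repetitive, text-like records (hypothetical sample data).
text = b"".join(
    f"customer={i % 50:04d};status=OPEN;region=NORTHEAST\n".encode() for i in range(2000)
)
# Random bytes stand in for data that is already compressed.
noise = os.urandom(len(text))

for label, data in (("text-like", text), ("random", noise)):
    r = compression_ratio(data)
    print(f"{label:9s} ratio {r:6.2f}:1  savings {savings(r):6.1%}")
```

The random input typically yields a ratio just below 1:1 (a small negative saving), because the compressor adds framing overhead without finding redundancy to remove.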
The test results in this white paper show that IBM Real-time Compression on block-level data is transparent and causes no degradation in average performance. Significantly, in several of our test scenarios, system throughput was greater with Real-time Compression enabled than without it.
Finally, Edison found that IBM Real-time Compression delivers better compression than its competitors in primary, active data environments.
Appendices
Appendix 1—Benchmarks Used in this Study
Edison used TPC-C and the Exchange Load Generator (Microsoft Jetstress) to generate block-level workloads that read and write data to the storage array. The software was not used to achieve maximum performance for the systems under test or to create a publishable benchmark result; the goal was to attain reasonable performance levels that would be consistent across test runs.
Exchange Load Generator (Microsoft Jetstress)
Microsoft Exchange Jetstress is typically used to verify the performance and stability of a disk subsystem before an Exchange server is put into production. Jetstress verifies disk performance by simulating Exchange disk input/output (I/O) load; specifically, it simulates the Exchange database and log file loads produced by a specific number of users. Performance Monitor, Event Viewer, and ESEUTIL are used in conjunction with Jetstress to verify that the disk subsystem meets or exceeds the performance criteria that have been established. After successfully completing the Jetstress disk performance and stress tests in a non-production environment, administrators can be confident that their Exchange disk subsystem is adequately sized (in terms of the performance criteria they have established) for the user count and user profiles they have defined.
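The sizing question Jetstress answers reduces to comparing measured results against administrator-defined targets. The Python sketch below shows that comparison under stated assumptions: the class, field names, and sample numbers are hypothetical illustrations, not Jetstress output fields or Microsoft recommendations.

```python
from dataclasses import dataclass


@dataclass
class SizingTarget:
    """Hypothetical performance criteria an administrator might set before a test."""
    users: int
    iops_per_user: float        # I/O operations per second per user (assumed profile)
    max_read_latency_ms: float  # highest acceptable average database read latency

    def required_iops(self) -> float:
        return self.users * self.iops_per_user


def meets_target(target: SizingTarget, measured_iops: float,
                 measured_read_latency_ms: float) -> bool:
    """True if the measured results satisfy both the throughput and latency criteria."""
    return (measured_iops >= target.required_iops()
            and measured_read_latency_ms <= target.max_read_latency_ms)


target = SizingTarget(users=5000, iops_per_user=0.12, max_read_latency_ms=20.0)
print(target.required_iops())              # IOPS the subsystem must sustain
print(meets_target(target, 750.0, 14.2))   # enough IOPS, latency within limit
print(meets_target(target, 550.0, 14.2))   # falls short on IOPS
```

Keeping throughput and latency as separate criteria mirrors the idea that a subsystem can deliver enough I/O yet still fail on responsiveness, or vice versa.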
TPC-C
TPC-C is an OLTP benchmark developed by the Transaction Processing Performance Council (TPC, www.tpc.org). TPC-C simulates an order-entry application by executing a mixture of read-only and update-intensive transactions typical of complex OLTP application environments. TPC-C runs five different transactions against the database:
STOCK LEVEL – Checking the stock level
DELIVERY – Processing a batch of 10 orders
ORDER STATUS – Monitoring the status of orders
PAYMENT – Processing a payment
NEW ORDER – Entering a complete order
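A load generator issues these five transactions in a weighted mix. The Python sketch below draws transaction names from such a mix; the weights are an assumption approximating the commonly cited TPC-C mix (roughly 45% New Order, 43% Payment, and 4% each for the rest), and the exact requirements should be taken from the TPC-C specification.

```python
import random
from collections import Counter

# Assumed, approximate TPC-C transaction mix in percent.
MIX = {
    "NEW ORDER": 45,
    "PAYMENT": 43,
    "ORDER STATUS": 4,
    "DELIVERY": 4,
    "STOCK LEVEL": 4,
}


def next_transactions(n: int, seed: int = 0) -> list[str]:
    """Draw n transaction names according to the weighted mix."""
    rng = random.Random(seed)
    return rng.choices(list(MIX), weights=list(MIX.values()), k=n)


counts = Counter(next_transactions(10_000))
for name, count in counts.most_common():
    print(f"{name:12s} {count / 10_000:5.1%}")
```

Seeding the generator makes a run repeatable, which matches the study's aim of consistent load across test runs rather than peak performance.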
The TPC-C benchmark measures transactions per minute (tpmC), which indicates new
order transactions executed per minute and provides a measure of business throughput.
The benchmark also measures response time, which is the average time a user waits for a response to each transaction.
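Both metrics can be computed from a transaction log. The minimal Python sketch below assumes a hypothetical log format (`Txn` records with start and end times in seconds); it counts New Order transactions for tpmC and averages response time per transaction type, following the definitions given above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Txn:
    name: str       # e.g. "NEW ORDER"
    start_s: float  # submission time, seconds from start of the measurement window
    end_s: float    # time the response was received


def tpmc(log: list[Txn], window_s: float) -> float:
    """New Order transactions executed per minute over the measurement window."""
    new_orders = sum(1 for t in log if t.name == "NEW ORDER")
    return new_orders / (window_s / 60)


def avg_response_times(log: list[Txn]) -> dict[str, float]:
    """Average response time in seconds for each transaction type."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for t in log:
        totals[t.name] += t.end_s - t.start_s
        counts[t.name] += 1
    return {name: totals[name] / counts[name] for name in totals}


log = [
    Txn("NEW ORDER", 0.0, 0.2),
    Txn("PAYMENT", 5.0, 5.03),
    Txn("NEW ORDER", 10.0, 10.1),
]
print(tpmc(log, window_s=120.0))  # two New Orders in a two-minute window
print(avg_response_times(log))
```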
Appendix 2— Research Methodology
Test Environment
IBM sought to validate the advantages of Real-time Compression on block-level data
storage, which is typically used on Storage Area Networks (SANs) and is often more
flexible and versatile for shared storage than file-level storage. It is also the only type of
storage currently supported by certain applications, such as Microsoft Exchange.
IBM asked Edison to perform product testing comparing the Real-time Compression
technology included with the IBM Storwize V7000 Storage System with the compression
technology included in two other storage systems: EMC VNX Series and NetApp FAS
family of filers. The overall generalized testing plan was as follows:
Run Microsoft Jetstress and TPC-C block-level workloads on each system, with and without compression.
Compare and document the results.
Validate IBM's claim that Real-time Compression is suitable for use with primary, active workloads on block-level storage.
The test equipment for this study consisted of EMC VNX 5500, NetApp FAS 3240, and IBM Storwize V7000 mid-range storage systems; IBM System x3550 x86-based servers for load generation; an 8Gb Fibre Channel (FC) network for the SAN; and a 1 GbE network for management. The specifics for each component are as follows:
EMC Configuration
EMC VNX 5500
One 3TB thin-provisioned LUN
One 40-disk RAID 5 pool; compression setting Medium, later High (see footnote 10)
10 The compression setting was initially Medium. When only 16 percent of the data had been compressed after 36 hours of Microsoft Jetstress testing, EMC recommended changing the setting from Medium to High, and High was used thereafter.
NetApp Configuration
N6240 cluster: 24 disks (SAS, 10K RPM)
Two aggregates, each built from 12 disks; RAID-DP
12 disks for each node
IBM Configuration
IBM Storwize V7000 storage system: 24 disks total (SAS, 10K RPM)
Two pools, each built from 12 disks; RAID 5 (7+1)
12 disks for each node
Other Configuration Specifications
Client systems: Five IBM System x3550 M3 servers, each with 42GB RAM, three 300GB SAS HDDs, and two six-core Intel Xeon E5649 2.53GHz processors (12 cores total); IBM QLogic 8Gb FC Dual Port HBA for System x.
Network: QLogic SANbox 5800 series 8Gb FC dual-port switch (for the comparisons with the EMC VNX system); Cisco MDS-9148 (for the comparisons with the NetApp system). All clients were connected to the same FC switch as the storage array, with all client ports zoned with all storage ports and all LUNs masked to all hosts for flexibility in testing.
Software: Oracle 11g running on Red Hat Enterprise Linux 6.x (1.2TB data set); VMware vSphere 4 or 5 with ESXi 4 or 5 (800GB data set); Microsoft Windows Server 2008 R2 Standard Edition SP1; Exchange 2010 configured with two hosts, one as Edge Server and one performing the other Exchange roles (1.6TB data set); Microsoft Jetstress Version Multiple Path I/O (MPIO) 2; Data ONTAP 8.1 7-Mode (for the NetApp storage system).
Test Objectives
Edison conducted two separate tests with the goal of determining and comparing ease of
configuration, immediate compression rates and compression over time, performance,
and transparency.
Edison considered two types of transparency in the test environment: configuration and
operational transparency. Configuration transparency is determined by whether
changes need to be made to server and storage array configurations in order to use the
appliance. Operational transparency, an extension of configuration transparency, is
determined by assessing whether or not the appliance is visible to the applications
running on the host and, in turn, visible to end users.
All testing was conducted using block-level data on the respective platforms, as follows:
EMC VNX with and without compression
Note: Based on the results of Test One (Microsoft Exchange Transactions Measured with Jetstress), Edison determined that EMC VNX compression is entirely impractical for all compressible data other than data that will remain completely dormant or be permanently archived. Edison concluded that EMC VNX Series compression is unsuitable for the mixed-workload production environments that the product itself targets and that this study was intended to address. Therefore, no further testing was conducted on the EMC VNX 5500 for the remainder of the study.
NetApp FAS 3240 with and without compression (two sets of test runs for each).
IBM Storwize V7000 with Real-time Compression.
The tests are described in Table 4 below.
Table 4: Test Cases
Test  Name      Description
1     Exchange  Compare compression efficiency, average rate of compression, and total process time for a Microsoft Exchange data workload.
2     TPC-C     Compare compression rates in various scenarios, including:
                OLTP/database use case (test application response time)
                Compression rates