
IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. 1, JANUARY 2009 1

Sensitivity-Based Optimization of Disk Architecture

Sriram Sankar, Yan Zhang, Sudhanva Gurumurthi, Member, IEEE, and Mircea R. Stan, Senior Member, IEEE

• S. Sankar is with Microsoft Corporation, One Microsoft Way, Redmond, WA 98052. E-mail: [email protected].
• Y. Zhang is with Qualcomm, 5775 Morehouse Drive, San Diego, CA 92121. E-mail: [email protected].
• S. Gurumurthi is with the Department of Computer Science, University of Virginia, 151 Engineer’s Way, P.O. Box 400740, Charlottesville, VA 22904-4740. E-mail: [email protected].
• M.R. Stan is with the Charles L. Brown Department of Electrical and Computer Engineering, University of Virginia, Thornton Hall E209, 351 McCormick Road, PO Box 400743, Charlottesville, VA 22904-4743. E-mail: [email protected].

Manuscript received 4 June 2007; revised 17 May 2008; accepted 7 July 2008; published online 6 Aug. 2008. Recommended for acceptance by A. George. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TC-2007-06-0204. Digital Object Identifier no. 10.1109/TC.2008.135.

0018-9340/09/$25.00 © 2009 IEEE. Published by the IEEE Computer Society. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. Authorized licensed use limited to: University of Virginia Libraries. Downloaded on November 14, 2008 at 13:55 from IEEE Xplore. Restrictions apply.

Abstract—Many applications, especially those that run on servers, are I/O intensive and therefore require high-performance storage systems. These high-end storage systems consume a large amount of power, the bulk of which is due to the disk drives. Optimizing disk architectures is a design-time, as well as a run-time, issue, and requires performance and power trade-offs. A hard disk designer needs to balance between the disk rotational speed (rotations per minute, RPM), platter sizes, and the number of platters. The RPM and platter sizes affect performance, and all three have an impact on power. A data center manager might have specific energy budgets within which she has to extract as much performance as possible. Applications themselves may have specific optimization requirements. Therefore, there are different figures of merit, such as performance and energy, and a large space of design and runtime “knobs” that can be used to optimize disk drive behavior. Given such a large space, it is desirable to have a systematic methodology to optimally set these knobs to satisfy the figures of merit as efficiently as possible. In this paper, we present the Sensitivity-based Optimization methodology for Disk Architectures (SODA), which leverages results previously obtained in digital circuit design optimization scenarios. Using detailed models of the electromechanical behavior of disk drives, and a suite of realistic workloads, we show how SODA can aid in design and runtime optimization of disk drive architectures.

Index Terms—Disk drives, storage, power, performance, optimization.

1 INTRODUCTION

We are in the era of data-centric computing. Many applications deal with large data sets that need to be processed with a low turnaround time. Given the data-intensive nature of these applications, the storage system plays a key role in determining their performance. Several enterprise class applications, such as online transaction processing (OLTP), online analytical processing (OLAP), and web-services, are I/O intensive and therefore require a high-performance storage system. The performance of a storage system is largely determined by the disk drives. Nowadays, disk drives are also widely used in consumer electronics platforms like gaming systems and portable music devices, and each of these systems has unique performance, energy, and form-factor requirements.

For the disk drive designer, optimizing disk drives involves capacity, performance (in particular, the data rate), and power trade-offs [10]. The capacity is increased by using larger platters or more of them; but the larger platters increase the viscous heating (i.e., air friction due to the rotating platters) inside the drive by nearly the fifth power, and adding more platters causes the power dissipation to increase linearly [30]. The data rate of the disk drive can be increased by improvements in the linear density (which had been growing exponentially at a rate of 30 percent per year [13], resulting in an equivalent “Moore’s Law” for disk drives) and the rotational speed of the platters (which is expressed in rotations per minute or RPM). However, since the RPM has a nearly cubic relation to the viscous dissipation, increasing the rotational speed causes even more heat to be generated. Since power consumption is a major issue in data centers, and temperature has a significant impact on reliability [12], the designer needs to meet the performance or capacity targets without increasing the heat dissipation. The performance improvements within this power-constrained design space can be obtained through a combination of improvements in the magnetic recording density and structural changes to the disk drive. The structural changes can involve shrinking the platters to reduce the power and taking advantage of this reduction in power to ramp up the RPM, thereby boosting the data rate. Although such a design approach was successfully used for nearly the past two decades, there are now a number of fundamental scalability limits affecting magnetic recording technology (e.g., the superparamagnetic limit [4], difficulty in lowering the fly-height of the head), while increasing the RPM more aggressively to mask out these problems poses serious thermal challenges [10].

Besides the design-time approach, an effective optimization of disk drive behavior should also be performed during the deployment and use of the storage system. For example, a data center typically has specific energy constraints based on the electricity supply and capabilities of the cooling system in the building. Since disks are used in large numbers in server
systems (e.g., RAID arrays), they are significant consumers of power, and simultaneously stress the cooling system. The data center manager would therefore like to maximize the performance within these energy constraints. The applications that run on these systems might themselves have a variety of characteristics and requirements. For example, OLTP applications (e.g., TPC-C [35]) tend to transfer small chunks of data and do more random I/O. In this case, we would like to focus our optimization efforts on the performance (or energy) of disk seeks. On the other hand, for a video server, we want a constant data transfer rate that is good enough to provide the desired playback speed. In terms of the storage system, this requirement translates to having enough disk RPM to achieve the desired data rate to stream the video and also minimizing disk seeks to facilitate having a steady stream of data. Finally, there can be a number of other architecture alternatives to achieve energy and performance targets. For example, to achieve a certain throughput, we could increase the RPM of the disk [9], [3].

In order to optimize disk drives, it is important to first understand the key figures of merit (i.e., the objectives and constraints in the optimization) and the knobs that can be used (the variables in the optimization). There are a number of figures of merit for disks, such as performance (both throughput and latency), power, form factor, capacity, and cost. There are also a variety of knobs, some of which are usable at design time (static knobs) and others that could potentially be varied at runtime (dynamic knobs). Static knobs include the number of platters and their size, and the characteristics of the spindle motor (SPM) and the voice coil motor (VCM), which are used to rotate the platters and move the disk arms, respectively. Dynamic knobs include the voltages for the SPM and VCM, which can be used to trade off performance and power by slowing down or speeding up the platter rotation and the seek time. Given this large optimization space consisting of different figures of merit and static and dynamic knobs, it is desirable to have a systematic methodology to guide us in optimally setting these knobs for a given set of optimization goals and constraints.

In this paper, we present the Sensitivity-based Optimization of Disk Architecture (SODA) framework. Compared to other optimization methods, sensitivity-based optimization (originally proposed for energy-delay optimization in circuit design [11], [22], [45]) recognizes that the optimal trade-off between power and performance is not uniquely defined, but rather depends on the actual level of desired performance or acceptable power consumption. For example, if we start with an optimal base case that has a given performance and power consumption, we would like to know what would be the minimum amount of extra power consumption to be able to double the performance. In order to get that doubling in performance we would need to vary some of the available design knobs (e.g., increase the supply voltage). Thus, instead of a unique optimal design point, in reality, there is an entire series of points where one can optimally trade off power for performance. The sensitivity analysis approach provides a formal way to identify these points by first calculating the ratio of energy to delay sensitivities (partial derivatives) with respect to each knob that the designer can use for the optimization, and then making sure that all those sensitivity ratios are equal [11], [22], [45].

Indeed, there are inherent similarities between circuit design optimization and those for disk drives. For example, the energy-delay product (which is a measure of energy and delay of a system at any given point) that is used in circuits is similar to the energy-(1/throughput) product for disk drives. There are also several other similarities between the two types of systems, such as between the spinning up of a motor and the charging of a capacitor, electric charge energy stored on a capacitor and magnetic field energy stored on a spinning motor, leakage current in CMOS circuits and DC motor current losses, Dynamic Voltage Scaling (DVS) and Dynamic RPM (DRPM) [9], and various power modes (e.g., active, idle, sleep). Having such mappings facilitates modeling and understanding, as well as optimization, in one field (disk drives in this case) by reusing and applying the large body of knowledge developed in another (e.g., circuit design). As another example of the successful use of a mapping between two different domains, modeling the temperature behavior of microprocessors by an electrical (rather than thermal) circuit model proved to be both efficient and accurate [31] and has paved the way for computer architects to conduct research in that topic. We intend to do the same for architects interested in disk drive architecture research with the SODA methodology.

In this paper, we make the following contributions:

1. We develop detailed and parameterized disk drive models for two key figures of merit, namely, performance and energy. We show the inherent synergy between circuit optimization and disk drive optimization. Using these models, we explain the SODA methodology.

2. Using a set of real workloads and a variety of static and dynamic knobs, we show how SODA can facilitate disk drive architecture optimization.

3. We present an online power management algorithm based on SODA. We show that this algorithm can reduce the energy consumption of the storage system by 20.6 percent on the average for a set of commercial server workloads while meeting performance goals.

The organization of the rest of this paper is as follows: Section 2 reviews the related work, and Section 3 presents an overview of disk drive architecture and the sensitivity-based optimization methodology. We present the detailed disk drive models in Section 4. The details about the experimental setup and workloads used in this study are given in Section 5, and Section 6 explains how the workload parameters are used by the SODA model. Section 7 presents the results, and Section 8 concludes this paper.

2 RELATED WORK

Optimizing disk drives has been widely studied from both the performance and power viewpoints. Disk drive level optimizations include disk arm scheduling [39] and data layout optimizations [27], [1], [15] to improve seek behavior, techniques to boost the bandwidth of the storage systems by using multiple disks to form RAID arrays [24], and disk cache optimizations [16], [2]. The power optimizations for laptop/desktop systems include simple spin-down-based schemes [20], [6] that exploit idleness in the I/O access stream and techniques to increase the available idleness via the use of prefetching and caching [25]. In the context of servers, multispeed disk drives (called DRPM) have been proposed [9], [3]. There have been several studies on using DRPM disks in conjunction with data clustering to facilitate disk spin-downs [26], caching [42], [43], and meeting performance goals while attempting to maximize energy savings [21], [44].

There has also been prior work on modeling the physical behavior of disk drives. The Physical Effect Modeling approach [38] captures the physical phenomena that occur within and between electromechanical devices using equations that capture electrical, mechanical, and electromechanical phenomena. This technique has been used to model the SPM and VCM [5] of a hard disk. The disk drive models that we present in Section 4 are equivalent to the physical effect approach. The SODA methodology that we present in this paper provides a framework to craft policies to achieve specific energy-performance trade-offs, using these power management techniques as the underlying control mechanisms. We present one such policy in Section 7.2.1.

Performance modeling of disk drives has been studied extensively [18], [29] and has resulted in detailed I/O simulation tools, such as Disksim [7], which we use in this study. There have also been studies on modeling the power consumption [40] and temperature [17] of storage systems.

Sensitivity-based optimization has been proposed for energy-delay optimizations in circuits [11], [22], [45]. The constraint-based optimization methodology has been applied in the past for generating schedules for applying DVS in real-time systems [41] to reduce energy consumption.

3 OVERVIEW

3.1 Hard Disk Drives

A Hard Disk Drive (HDD) is an electromechanical magnetic storage device, whose activities are controlled and coordinated by digital controllers and buffers. The three main power dissipaters in an HDD are the SPM, which is used to rotate the platters, the VCM, which moves the disk arms, and the onboard electronics. When the disk is spinning, but not servicing any requests, it is said to be in an idle power mode, and most of the power consumed is by the SPM. When a request comes to the disk drive and a physical seek is needed, the VCM has to be activated, and the disk transitions into the seek power mode. The actual transfer of bits between the magnetic media and the electronic buffers in the drive takes place when the drive is in the active mode, where the read/write channel (also called the data channel) is enabled and leads to additional power consumption. When the disk is idle for long periods of time, further power savings can be obtained by spinning down the SPM, thus putting the disk into the sleep mode.

Designing disk drives involves trade-offs between capacity, performance, and power. The capacity of a disk drive can be increased through a combination of larger platters and more of them. The number of platters and their size affect the heat that is generated inside the disk drive (due to viscous dissipation) by a linear factor and by nearly the fifth power, respectively [30]. The data rate can be increased by improvements in the linear density (expressed in bits per inch or bpi) and/or increases in the RPM. The latter causes the generated heat to increase by nearly a cubic factor. In order to ensure reliability, one of the requirements in disk drive design is to always keep the operating temperature below a particular threshold, known as the thermal envelope [12]. Excess heat from the disks can preheat the air around other components and vice-versa. Given the high costs associated with cooling modern electronic systems [37], it is important that disk drives do not further increase this burden.
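To make the scaling argument concrete, the following minimal sketch treats the relations quoted above as exact power laws, which is an assumption (the text only says "nearly" the fifth power and "nearly" cubic): dissipation is taken as linear in platter count, fifth power in platter radius, and cubic in RPM. The function names and the 10,000 to 15,000 RPM example are hypothetical and do not come from the paper.

```python
def viscous_power_ratio(rpm_scale, radius_scale, platter_scale=1.0,
                        rpm_exp=3.0, radius_exp=5.0):
    """Relative viscous dissipation after scaling RPM, radius, and platter count.

    Assumes P is proportional to n * r**radius_exp * rpm**rpm_exp.
    """
    return platter_scale * radius_scale ** radius_exp * rpm_scale ** rpm_exp


def radius_scale_for_constant_power(rpm_scale, rpm_exp=3.0, radius_exp=5.0):
    """Radius scaling that exactly offsets a given RPM scaling."""
    return rpm_scale ** (-rpm_exp / radius_exp)


if __name__ == "__main__":
    rpm_scale = 15000 / 10000  # hypothetical RPM increase
    shrink = radius_scale_for_constant_power(rpm_scale)
    print(f"platter radius must shrink to {shrink:.3f} of its original value")
    print(f"resulting power ratio: {viscous_power_ratio(rpm_scale, shrink):.3f}")
```

Under these assumed exponents, the sketch shows the design move described in the introduction: shrinking the platters frees up enough of the power budget to raise the RPM.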

3.2 Sensitivity-Based Optimization

In order to introduce sensitivity-based optimization for disk drives, we briefly present the formalism behind the method of “true power optimization” [11], [22]. This method is the culmination of a series of attempts in the low-power circuit design community to come up with an “ideal” figure of merit for power-aware design [33], [45]. The main result is that there is really no single optimal point in the design space, but rather an entire series of such points that optimally trade off power for performance; and that the way to identify these points is by calculating the energy-to-delay sensitivity ratios with respect to each “design knob” that the designer can use for optimization, and making sure that all those sensitivity ratios are equal.

The method assumes that there are two dimensions (figures of merit) in the design space, Energy and Delay, and the cost function is the Energy that needs to be minimized for a given Delay constraint (it is interesting to note that the final result is exactly the same if the roles of cost and constraint are reversed). To simplify the discussion, let us assume that there are only two knobs that the designer can use in the optimization, let us call them $x$ and $y$ (for example, these can be the supply voltage for the SPM and for the VCM), such that both $E$ and $D$ are functions of $x$ and $y$. The optimization problem can then be stated formally this way:

$$ \min E(x, y) \quad \text{such that} \quad D(x, y) = D_0. $$

The constraint $D = D_0$ implies

$$ dD = \frac{\partial D}{\partial x}\,dx + \frac{\partial D}{\partial y}\,dy = 0. \tag{1} $$

Simple algebraic manipulation of the above then leads to

$$ \frac{dy}{dx} = -\frac{\partial D / \partial x}{\partial D / \partial y}. \tag{2} $$

What this equation means is that the two “knobs” $x$ and $y$ are no longer independent of each other; they are now related such that $D(x, y) = D_0$. This also means that $E(x, y)$ now becomes a function of just one independent variable, and in order to minimize it, we can write

$$ \frac{dE}{dx} = \frac{\partial E}{\partial x} + \frac{\partial E}{\partial y}\,\frac{dy}{dx} = 0. \tag{3} $$

Finally, substituting (2) into (3), and doing a simple algebraic manipulation, we get the main result of the method of true power optimization, which states that the ratio of sensitivity of change in energy ($E$) with respect to change in delay ($D$) for knob $x$ has to be the same as for knob $y$:

$$ \frac{\partial E / \partial x}{\partial D / \partial x} = \frac{\partial E / \partial y}{\partial D / \partial y}. \tag{4} $$

Although the discussion above concerned energy and delay, the same result would be obtained for any other pair of dimensions in the design space; similarly, the choice of the type of knobs depends on the design; and the result extends to any larger number of knobs. Also interesting, the exact same result would be obtained if the role of E and D were reversed and the delay was to be minimized under a constant energy constraint (this suggests that some of the distinctions made in the community between the areas of low-power and power-aware design may be unnecessary).
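The equal-sensitivity condition can be checked numerically on a toy problem. The sketch below is only an illustration: the functions `energy` and `delay` are hypothetical stand-ins chosen so that raising either knob costs energy and reduces delay, and they are not the disk models of Section 4. The code minimizes the energy along the constraint with a ternary search and then compares the two energy-to-delay sensitivity ratios using central finite differences.

```python
def energy(x, y):
    """Hypothetical cost: grows with both knobs."""
    return x ** 2 + 2.0 * y ** 2


def delay(x, y):
    """Hypothetical delay: shrinks as either knob is raised."""
    return 1.0 / x + 1.0 / y


def y_on_constraint(x, d0):
    """Solve delay(x, y) = d0 for y (valid for x > 1/d0)."""
    return 1.0 / (d0 - 1.0 / x)


def minimize_energy(d0, lo, hi, iters=200):
    """Ternary search for the x that minimizes energy on the constraint curve."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if energy(m1, y_on_constraint(m1, d0)) < energy(m2, y_on_constraint(m2, d0)):
            hi = m2
        else:
            lo = m1
    x = 0.5 * (lo + hi)
    return x, y_on_constraint(x, d0)


def partial(f, x, y, wrt, h=1e-6):
    """Central finite-difference partial derivative of f with respect to one knob."""
    if wrt == "x":
        return (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    return (f(x, y + h) - f(x, y - h)) / (2.0 * h)


def sensitivity_ratio(x, y, wrt):
    """Energy-to-delay sensitivity ratio for one knob."""
    return partial(energy, x, y, wrt) / partial(delay, x, y, wrt)


if __name__ == "__main__":
    x_opt, y_opt = minimize_energy(d0=1.0, lo=1.001, hi=10.0)
    print("optimum knobs:", x_opt, y_opt)
    print("ratio for x:", sensitivity_ratio(x_opt, y_opt, "x"))
    print("ratio for y:", sensitivity_ratio(x_opt, y_opt, "y"))
```

At the constrained optimum the two printed ratios agree up to numerical error, while at any other point on the constraint curve they differ, which tells the designer which knob is the cheaper one to turn.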

4 MODELING A HARD DISK DRIVE

Hard disk systems can be divided into two parts: electromechanical components, which include the SPM and VCM, and electronic components, which include the data channel (I/Os), controllers, digital-to-analog converter (DAC), microprocessor, and RAM. Fig. 1 shows a simplified block diagram of an HDD.

The SPM and VCM are DC motors. A first-order mathematical model of a DC motor can be developed by considering the electrical and mechanical constituents of the system separately, then combining them. In a DC motor, the back-emf voltage ($V_b$) is proportional to the angular velocity ($\omega$) of the motor. Thus, the voltage ($V_a$) applied to the motor is given by the following equation:

$$ V_a = I_a \cdot R_a + V_b = I_a \cdot R_a + k_g \cdot \omega, \tag{5} $$

where $I_a$ is the armature current, $R_a$ is the winding resistance of the armature, and $k_g$ is the motor voltage constant. The output torque $T$ of the motor is proportional to the armature current, i.e., $T = k_t \cdot I_a$, where $k_t$ is the torque constant (which equals the voltage constant in most cases). This provides a connection between the mechanical response and the electrical behavior of the motor. In this particular setup, the output torque is used to overcome the motor inertia and frictional drag. The equilibrium equation of the rotor system is

$$ T = k_t \cdot I_a = J \cdot \frac{d\omega}{dt} + b \cdot \omega^{\alpha}, \tag{6} $$

where $J$ is the inertia of the rotating parts, $b$ is the rotational viscous coefficient, and $\alpha$ is the coefficient which depends on the angular velocity: when the velocity is low, frictional drag is viscous in nature and $\alpha$ is equal to 1; when the velocity is high, $\alpha$ becomes 2 due to the turbulent flow [30].

In this paper, we consider $\alpha$ equal to 2 for the SPM since it rotates at high velocity; thus, (6) becomes a nonlinear differential equation. Combining (5) and (6), the steady-state solution for the SPM must satisfy the following equation:

$$ b\,\omega^2 + \frac{k_t k_g}{R_a}\,\omega - \frac{k_t V_a}{R_a} = 0. \tag{7} $$

We can now derive the steady-state angular velocity of the SPM as follows:

$$ \omega_{SPM} = \frac{\sqrt{\left(\frac{k_t k_g}{R_a}\right)^2 + \frac{4 k_t V_a b}{R_a}} - \frac{k_t k_g}{R_a}}{2b}. \tag{8} $$

Assuming that the SPM is rotating at constant speed, the steady-state power of the SPM to overcome the friction and windage loss can be expressed as follows [30]:

$$ P_{SPM} = n \cdot b_{SPM} \cdot \omega_{SPM}^{2.8}, \qquad b_{SPM} = \frac{\pi}{2} \cdot \rho \cdot C_d \cdot r^4, \tag{9} $$

where $n$ is the number of platters in the disk, $b_{SPM}$ is the viscous friction coefficient for a flat platter, $\rho$ is the density of air, $C_d$ is the drag coefficient (which equals 0.005 for a flat platter), and $r$ is the radius of the platter.
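The steady-state SPM relations can be evaluated directly. The sketch below is a minimal illustration with hypothetical motor parameters chosen only to give round numbers; they are not values from the paper. It computes the speed from (8), verifies that it satisfies the quadratic (7), and evaluates the windage power of (9). It assumes the prefactor of the friction coefficient in (9) reads π/2 and uses an air density of 1.2 kg/m³; both are assumptions to check against [30].

```python
import math


def spm_steady_state_speed(v_a, r_a, k_t, k_g, b):
    """Positive root of (7), i.e. equation (8), in rad/s."""
    lin = k_t * k_g / r_a
    return (math.sqrt(lin ** 2 + 4.0 * k_t * v_a * b / r_a) - lin) / (2.0 * b)


def spm_residual(omega, v_a, r_a, k_t, k_g, b):
    """Left-hand side of (7); zero at the steady-state speed."""
    return b * omega ** 2 + (k_t * k_g / r_a) * omega - k_t * v_a / r_a


def spm_power(n, omega, radius, air_density=1.2, c_d=0.005):
    """Windage power per (9); the pi/2 prefactor and air density are assumptions."""
    b_spm = (math.pi / 2.0) * air_density * c_d * radius ** 4
    return n * b_spm * omega ** 2.8


if __name__ == "__main__":
    # Hypothetical motor: 12 V supply, 2 ohm winding, k_t = k_g = 0.01, b = 1e-8.
    params = dict(v_a=12.0, r_a=2.0, k_t=0.01, k_g=0.01, b=1e-8)
    omega = spm_steady_state_speed(**params)
    print(f"omega_SPM = {omega:.1f} rad/s ({omega * 60 / (2 * math.pi):.0f} RPM)")
    print(f"residual of (7): {spm_residual(omega, **params):.2e}")
    print(f"windage power, 4 platters of radius 0.033 m: {spm_power(4, omega, 0.033):.3f} W")
```

Because the supply voltage enters (8) under the square root, the speed responds sublinearly to the SPM voltage, while the power in (9) grows almost cubically with speed; this asymmetry is what makes the SPM voltage an interesting knob.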

Fig. 1. Block diagram of hard disk systems (adapted from [32]).

For the VCM, $\alpha$ is set to 1, since it moves at slow speed; thus, in this case, (6) becomes a linear differential equation. If we again combine (5) and (6), but this time for the VCM, we get the mechanical response of the VCM system as follows:

$$ \omega(t) = \omega_{VCM} \cdot \left(1 - e^{-t/\tau}\right), \tag{10} $$

where $\tau$ is the time constant of the VCM system, and $\omega_{VCM}$ is the maximum speed of the VCM, given by

$$ \tau = \frac{J}{b + \frac{k_g k_t}{R_a}}, \qquad \omega_{VCM} = K \cdot V_a, \qquad K = \frac{1}{k_g + \frac{b R_a}{k_t}}. \tag{11} $$
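As a sanity check on the closed-form VCM response, the sketch below evaluates (10) and (11) and compares them against a forward-Euler integration of (5) and (6) with the friction exponent set to 1. All parameter values are hypothetical and chosen only for illustration, and the helper names are not from the paper.

```python
import math


def vcm_time_constant(j, b, k_g, k_t, r_a):
    """Time constant from (11)."""
    return j / (b + k_g * k_t / r_a)


def vcm_max_speed(v_a, b, k_g, k_t, r_a):
    """Maximum speed omega_VCM = K * V_a from (11)."""
    k = 1.0 / (k_g + b * r_a / k_t)
    return k * v_a


def vcm_speed(t, omega_vcm, tau):
    """Closed-form response (10)."""
    return omega_vcm * (1.0 - math.exp(-t / tau))


def simulate_vcm_speed(t_end, v_a, j, b, k_g, k_t, r_a, steps=1000):
    """Forward-Euler integration of (5) and (6) with linear friction."""
    dt = t_end / steps
    omega = 0.0
    for _ in range(steps):
        torque = k_t * (v_a - k_g * omega) / r_a  # T = k_t * I_a, with I_a from (5)
        omega += dt * (torque - b * omega) / j    # J * d(omega)/dt = T - b * omega
    return omega


if __name__ == "__main__":
    # Hypothetical actuator parameters.
    p = dict(j=1e-6, b=1e-5, k_g=0.05, k_t=0.05, r_a=5.0)
    tau = vcm_time_constant(**p)
    omega_max = vcm_max_speed(5.0, p["b"], p["k_g"], p["k_t"], p["r_a"])
    print(f"tau = {tau * 1e3:.3f} ms, omega_VCM = {omega_max:.2f} rad/s")
    print("closed form at t = tau:", vcm_speed(tau, omega_max, tau))
    print("Euler at t = tau:      ", simulate_vcm_speed(tau, 5.0, **p))
```

The two values agree to within the Euler discretization error, which supports reading (10) as a first-order exponential approach to the speed set by the VCM voltage.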

For the seek operations, we use the model described by Kim et al. [17]. A long seek operation normally involves an acceleration phase, followed by a coasting phase of constant velocity, and then by a deceleration phase. The average distance $D_{seek}$ is typically equal to a seek across one-third of the data zone and, in general, does not involve a coasting phase. Because of that, we assume that for an average seek operation, the VCM is accelerated from 0 to a maximum velocity $V_{max}$ and then immediately decelerated to 0 (the deceleration phase taking the same amount of time as the acceleration phase) with no coasting time, as shown in Fig. 2b. If the seek distance is less than the average seek distance $D_{seek}$, the VCM velocity will not reach the maximum velocity $V_{max}$, as shown in Fig. 2a. If the seek distance is larger than the average seek distance $D_{seek}$, after the acceleration phase, there will be a coasting phase before the deceleration, as shown in Fig. 2c.

Since the power of the motor is given by $T \cdot \omega$, where $T$ is the torque and $\omega$ is the angular velocity, the energy consumption for one seek operation can be derived as follows:

$$ E_{VCM} = \int_0^{t_{acc}} T \cdot \omega \, dt = \int_0^{\omega_{VCM}} n \cdot J_{VCM} \cdot \omega \, d\omega + \int_0^{t_{acc}} n \cdot b_{VCM} \cdot \omega \, dt, \tag{12} $$

where $n$ is the number of platters, $\omega_{VCM}$ is the maximum angular velocity of the VCM, $J_{VCM}$ is the inertia of the arm actuator, which is proportional to $r^3$ ($r$ is the radius of the platter, assuming that the length of the arm actuator is $\approx 2r$), and $b_{VCM}$ is the friction coefficient of the arm actuator, which is proportional to $r^2$. Combined with (10), we can now derive the energy for one seek operation as follows:

$$ E_{VCM} = \frac{n \cdot J_{VCM} \cdot \omega_{VCM}^2}{2} + \frac{n \cdot b_{VCM} \cdot \omega_{VCM}}{3}. \tag{13} $$

The average seek time and average VCM power can be expressed by

$$t_{seek} = \frac{2 D_{avg}}{\omega_{VCM}}, \qquad P_{VCM} = \frac{E_{VCM}}{t_{seek}}, \tag{14}$$

where $D_{avg}$ is the average angular seek distance (which is approximately 1/12 with the previous assumptions that the length of the VCM arm is $2r$ and that the average seek is 1/3 of the platter size).

From the analysis above, we can finally obtain the total energy for an HDD running for a time period $t_0$ (during which only one seek occurs) as

$$E = P_{SPM} \cdot t_0 + P_{VCM} \cdot t_2 + E_c, \tag{15}$$

where $t_2$ is the actual seek time, and $E_c$ is the energy consumption for the electronic part of the disk system (which can be approximated as roughly 40 percent of total system idle power [32]).
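The seek energy, seek time, VCM power, and total energy expressions in (13) through (15) translate directly into a few small functions. The sketch below transcribes the formulas as printed; the function names are hypothetical, and the inputs are assumed to be in consistent units.

```python
def vcm_seek_energy(n, J_vcm, b_vcm, omega_vcm):
    """Energy of one seek, Eq. (13), transcribed as printed."""
    return n * J_vcm * omega_vcm ** 2 / 2.0 + n * b_vcm * omega_vcm / 3.0

def avg_seek_time(omega_vcm, D_avg=1.0 / 12.0):
    """Average seek time t_seek = 2*D_avg / omega_VCM, Eq. (14)."""
    return 2.0 * D_avg / omega_vcm

def avg_vcm_power(n, J_vcm, b_vcm, omega_vcm, D_avg=1.0 / 12.0):
    """Average VCM power P_VCM = E_VCM / t_seek, Eq. (14)."""
    energy = vcm_seek_energy(n, J_vcm, b_vcm, omega_vcm)
    return energy / avg_seek_time(omega_vcm, D_avg)

def total_energy(P_spm, t0, P_vcm, t2, E_c):
    """Total energy over a period t0 containing one seek, Eq. (15)."""
    return P_spm * t0 + P_vcm * t2 + E_c
```

Keeping each equation in its own function makes it easy to swap in workload-specific values later without touching the other terms.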

In order to do the sensitivity-based optimization, we also need to model the performance of the hard disk. The throughput of a hard disk can be expressed as

$$TP = \frac{B}{t_{rot} + t_{seek} + t_{trans}}, \tag{16}$$

where $B$ is the number of bits for each transfer, $t_{rot}$ is the rotational latency (equal to $\pi/\omega_{SPM}$, since we assume that, on average, the SPM rotates a half circle in order to reach the desired seeking position), and $t_{trans}$ is the actual time to read those bits, given by

$$t_{trans} = \frac{B}{bpi \cdot \omega_{SPM} \cdot \frac{3}{4} r}, \tag{17}$$


Fig. 2. Short, average, and long seek operation for the VCM. (a) Seek distance < $D_{seek}$. (b) Seek distance = $D_{seek}$. (c) Seek distance > $D_{seek}$.

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

Authorized licensed use limited to: University of Virginia Libraries. Downloaded on November 14, 2008 at 13:55 from IEEE Xplore. Restrictions apply.


where $bpi$ is the linear density (i.e., the number of bits stored per inch on a track on the platter). We assume that, on average, a seek operation occurs at the middle track of the data zone, which is at approximately 3/4 of the platter radius.
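The throughput model in (16) and (17) can be sketched in a few lines. The helper names below, including the RPM conversion, are assumptions for illustration; the formulas themselves follow the text, with the transfer assumed to occur at 3/4 of the platter radius.

```python
import math

def rpm_to_rad_per_s(rpm):
    """Convert spindle speed from RPM to rad/s."""
    return rpm * 2.0 * math.pi / 60.0

def rotational_latency(omega_spm):
    """Average rotational latency pi / omega_SPM (half a revolution)."""
    return math.pi / omega_spm

def transfer_time(B, bpi, omega_spm, r):
    """Transfer time for B bits at 3/4 of the platter radius, Eq. (17)."""
    return B / (bpi * omega_spm * 0.75 * r)

def throughput(B, t_rot, t_seek, t_trans):
    """Throughput TP = B / (t_rot + t_seek + t_trans), Eq. (16)."""
    return B / (t_rot + t_seek + t_trans)
```

For example, `rotational_latency(rpm_to_rad_per_s(15000))` gives 2 ms, the time for half a revolution at 15,000 RPM.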

5 EXPERIMENTAL SETUP AND WORKLOADS

In order to conduct experimental evaluations, we use a set of commercial workload traces. The details of these workloads and the configuration of the storage system on which they were collected are given in Table 1. Financial and Search-Engine are I/O traces collected at a large financial institution and at a popular Internet search-engine, respectively. The Openmail trace was obtained from [23], and the OLTP and Search-Engine traces were downloaded from the University of Massachusetts Trace Repository [36]. The TPC-C trace was collected on a two-processor SMP machine running the IBM DB2 EEE database engine. The TPC-C benchmark was run for a 20-warehouse configuration with eight clients. The TPC-H trace was collected on an eight-processor IBM Netfinity SMP machine with 15 disks and running the IBM DB2 EE edition. The TPC-H benchmark was run in the power test mode, in which the 22 queries of the benchmark are executed consecutively.

The parameters that are used in the analytical hard disk power models that we developed are obtained for the configurations shown in Table 1 by running these traces on the Disksim storage system simulator [7], which models the performance aspects of disk drives, caches, and interconnects in a fairly detailed manner. We augmented Disksim with our power models for implementing a dynamic power management algorithm. We validated these power models against data from real disk drive datasheets. The power consumed in the seek, rotational latency, transfer, and idle phases of disk operation was calculated separately, and the power consumed by each subsystem component was verified with actual power numbers from datasheets.

The workload traces also output the actual disk operation parameters such as seek time, data transfer time, idle time, rotational latency, number of transfer blocks, number of total physical seeks, number of zero distance seeks, single-cylinder seek time, average seek time, and full-stroke seek time. The parameter values for the workloads are shown in Table 2.

6 ADAPTING THE HARD DISK MODEL

The disk model described in Section 4 was based on the average case. In order to get results specific for each workload, the input parameters of the model need to be scaled based on the workload characteristics. For example, in Section 4, we assumed that seeks show average-case behavior and that the average angular seek distance is 1/12 (assuming the disk arm at $2r$). We also assumed that the rotational latency is $\pi/\omega_{SPM}$ (i.e., the SPM always rotates half circle in order to reach the desired transfer position); and we assumed that, on average, data transfers occur around the middle track of the data zone, or at 3/4 of the platter radius. However, for various workloads, physical seeks will vary from single-cylinder seeks to full-stroke seeks; also the rotational latency and the data transfer cylinder will change depending upon how data is laid out on disk. In order to perform workload-specific optimization, we use data from the workload profiles in order to modify the model to a workload-specific case. The parameters that are used for adapting the hard disk model are obtained by running the workload traces on Disksim. The online algorithm presented in Section 7, however, uses real-time data from the simulation. Seek times and seek distances are obtained at sample intervals during the workload run in Disksim.

TABLE 1. Workloads Used and Their Storage System Configurations

TABLE 2. Workload I/O Characteristics

Assuming the probability of full-stroke seeks to be very small, approximately equal to zero (supported by observing the actual workloads), we use (18) to obtain the percentage of single-cylinder seeks and average seeks ($P_{single}$ and $P_{avg}$) as

$$\begin{aligned} n_{single} + n_{avg} &= \#\,\text{total disk seeks}, \\ P_{single} \cdot t_{single} + P_{avg} \cdot t_{avg} &= \text{seek time}. \end{aligned} \tag{18}$$
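One way to read (18) is as a two-equation linear system in the numbers of single-cylinder and average seeks, from which the percentages follow. The sketch below adopts that reading as an assumption: it takes the total seek count and the total seek time as inputs, and the function name is hypothetical.

```python
def split_seeks(total_seeks, total_seek_time, t_single, t_avg):
    """Solve n_single + n_avg = total_seeks and
    n_single*t_single + n_avg*t_avg = total_seek_time,
    then return the fractions (P_single, P_avg)."""
    if t_avg == t_single:
        raise ValueError("t_single and t_avg must differ to separate the two classes")
    n_avg = (total_seek_time - total_seeks * t_single) / (t_avg - t_single)
    n_single = total_seeks - n_avg
    return n_single / total_seeks, n_avg / total_seeks

# Example: 100 seeks, 30 of 0.5 ms and 70 of 4.5 ms (330 ms in total).
P_single, P_avg = split_seeks(100, 330.0, 0.5, 4.5)
```

The two fractions sum to one by construction, which matches the assumption that full-stroke seeks are negligible.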

While the maximum angular velocity of the SPM ($\omega_{SPM\_workload}$) can be found in Table 1 for each workload, the maximum angular velocity of the VCM is still unknown. Furthermore, since some of the seek operations involve only a single cylinder of physical arm traversal, we also need to find out the maximum VCM speed of a single-cylinder seek. To find out the VCM speed for a workload-specific average seek, we can use (14) to derive the maximum VCM speed as

$$\omega_{VCM\_avg\_workload} = \frac{2 D_{avg}}{t_{avg}}, \tag{19}$$

where $D_{avg}$ is the average angular seek distance which is equal to 1/12, and $t_{avg}$ is the average seek time which can be obtained from Table 2. The maximum VCM speed of a single-cylinder seek can be scaled from the average VCM speed as follows:

$$\omega_{VCM\_single\_workload} = \omega_{VCM\_avg\_workload} \cdot \frac{t_{single}}{t_{avg}}. \tag{20}$$
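Equations (19) and (20) amount to two one-line computations. In the minimal sketch below the function names are assumptions, and the seek times in the example are illustrative values rather than entries from Table 2.

```python
def vcm_speed_avg_workload(t_avg, D_avg=1.0 / 12.0):
    """Maximum VCM speed for a workload-specific average seek, Eq. (19)."""
    return 2.0 * D_avg / t_avg

def vcm_speed_single_workload(omega_avg, t_single, t_avg):
    """Maximum VCM speed for a single-cylinder seek, scaled as in Eq. (20)."""
    return omega_avg * t_single / t_avg

# Illustrative seek times in seconds (assumed, not read from Table 2).
omega_avg = vcm_speed_avg_workload(t_avg=4.48e-3)
omega_single = vcm_speed_single_workload(omega_avg, t_single=0.5e-3, t_avg=4.48e-3)
```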

In Section 4, we assumed a rotational latency of $\pi/\omega_{SPM}$, which means that the SPM always rotates half circle in order to reach the desired seeking position. To calculate the workload-specific rotational angle, we use the following equation:

$$rotational\_angle = \omega_{SPM\_workload} \cdot t_{rotational\_latency}. \tag{21}$$

To obtain the workload-specific seek position, we use the following equation:

$$seek\_pos = \frac{B}{bpi_{workload} \cdot r_{workload} \cdot \omega_{SPM\_workload} \cdot t_{transfer}}. \tag{22}$$

The time for single-cylinder and average seek operations can be scaled for each workload as

$$\begin{aligned} t_{actual\_single} &= t_{single} \cdot \frac{r_{workload} \cdot \omega_{VCM\_avg\_workload}}{r_{actual} \cdot \omega_{VCM\_actual}}, \\ t_{actual\_avg} &= \frac{2 D_{avg}}{\omega_{VCM\_actual}}. \end{aligned} \tag{23}$$

To simplify the calculation, we use the ratio of maximum VCM speed of an average seek, instead of that of a single seek when calculating the actual single-cylinder seek time.

Thus, the actual workload-specific seek time can be derived as

$$t_{actual\_seek} = P_{single} \cdot t_{actual\_single} + P_{avg} \cdot t_{actual\_avg}. \tag{24}$$

The actual workload-specific rotational latency is given by

$$t_{actual\_rot} = \frac{rotational\_angle}{\omega_{SPM\_actual}}. \tag{25}$$

And the actual workload-specific transfer time is given as

$$t_{actual\_transfer} = \frac{B}{bpi_{actual} \cdot r_{actual} \cdot \omega_{SPM\_actual} \cdot seek\_pos}. \tag{26}$$

The average spindle power is still calculated using (9) by replacing $\omega_{SPM}$ with the workload-specific SPM speed $\omega_{SPM\_actual}$. The average seek power is slightly different from that in Section 4, since now we consider both single-cylinder seek and average seek. The average power of a single-cylinder seek can be calculated using (13) by replacing $\omega_{VCM}$ with $\omega_{VCM\_single\_actual}$. The workload-specific energy of the VCM can be calculated as

$$E_{VCM} = P_{single} \cdot E_{VCM\_single} + P_{avg} \cdot E_{VCM\_avg}. \tag{27}$$

Finally, the throughput is calculated using (16) using the new parameters:

$$TP = \frac{B}{t_{actual\_rot} + t_{actual\_seek} + t_{actual\_transfer}}. \tag{28}$$
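The workload adaptation in (21) through (28) can be chained into a short calculation. The sketch below is one possible arrangement under stated assumptions: the function names and argument order are hypothetical, and all quantities are assumed to be in consistent units.

```python
def rotational_angle(omega_spm_workload, t_rot_latency):
    """Workload-specific rotational angle, Eq. (21)."""
    return omega_spm_workload * t_rot_latency

def seek_pos(B, bpi_workload, r_workload, omega_spm_workload, t_transfer):
    """Workload-specific seek position, Eq. (22)."""
    return B / (bpi_workload * r_workload * omega_spm_workload * t_transfer)

def actual_seek_times(t_single, r_workload, omega_vcm_avg_workload,
                      r_actual, omega_vcm_actual, D_avg=1.0 / 12.0):
    """Scaled single-cylinder and average seek times, Eq. (23)."""
    t_actual_single = (t_single * r_workload * omega_vcm_avg_workload
                       / (r_actual * omega_vcm_actual))
    t_actual_avg = 2.0 * D_avg / omega_vcm_actual
    return t_actual_single, t_actual_avg

def actual_throughput(B, P_single, P_avg, t_actual_single, t_actual_avg,
                      rot_angle, pos, bpi_actual, r_actual, omega_spm_actual):
    """Throughput from the adapted timings, Eqs. (24)-(26) and (28)."""
    t_seek = P_single * t_actual_single + P_avg * t_actual_avg         # Eq. (24)
    t_rot = rot_angle / omega_spm_actual                               # Eq. (25)
    t_transfer = B / (bpi_actual * r_actual * omega_spm_actual * pos)  # Eq. (26)
    return B / (t_rot + t_seek + t_transfer)                           # Eq. (28)
```

A useful sanity check on this arrangement is that when the actual configuration equals the workload's original configuration, the adapted rotational latency, transfer time, and seek times reduce to the measured workload values.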

7 SIMULATION RESULTS

SODA can be used in two ways: 1) for design space exploration, studying the impact of variations in the static and dynamic knobs on the figures of merit of disks, and 2) as a tool for developing online policies for controlling the dynamic knobs efficiently. Using the workloads described in Section 6, we show how SODA can be used in both of these ways.

7.1 Using SODA for Design Space Exploration of Disk Drive Knobs

In the first experiment using SODA, we investigate the effect of varying two dynamic knobs, namely, the voltages of the SPM and VCM, for all combinations of three different platter sizes: 1.8", 2.6", and 3.3"; and 1, 2, and 4 platters/disk, resulting in nine distinct disk drive organizations with capacities ranging from 76 to 1,026 Gbytes. However, note that we use summary characteristics of the workloads, and this is an offline evaluation using SODA. For each configuration, we apply the SODA methodology to find the optimal trade-off between energy and performance. We conducted the experiment for all five workloads but, for clarity, we show results only for Openmail; the results for the other workloads are similar. Each curve in Fig. 3 corresponds to one of the nine disk drive configurations and represents the Pareto-optimal case (i.e., equal sensitivity ratios, except when a knob has reached its maximum).

This design space exploration study provides several insights into the static design, as well as the dynamic behavior of disk drives. First, as expected, the curves corresponding to the highest and lowest capacity disk drives are the farthest and closest to the origin, respectively. However, between these two extremes, the design space shows some interesting trade-offs. For instance, a 2.6" disk drive with four platters provides roughly the same energy-performance trade-offs as a 3.3" disk with a single platter, but the capacity of the former is significantly higher (637 Gbytes) than the latter (256 Gbytes). Therefore, given a power and performance target, the designer can make use of such information to target the same disk drive to multiple segments of the market.

Another way to utilize SODA is to choose the most desirable configuration (from a power-performance point of view) that satisfies a given capacity target. For example, we can observe two configurations that provide approximately the same capacity (a 1.8" disk drive with two platters and a 2.6" drive with one platter). However, the curve for the 1.8" disk drive is closer to the origin, thereby being the more desirable design choice.

A key observation from Fig. 3 is that the disks with the larger platters can operate over a larger dynamic range in the E versus 1/TP design space. Each point in this dynamic range corresponds to a particular setting of the SPM and VCM voltages. This suggests that for a disk drive that should work over a range of voltages (thus speeds), it may be better to choose drives that use larger, rather than smaller, platters. There has been research in recent years on designing such multispeed disk drives [9], [3]; SODA can provide guidance for optimizing such a drive at an early stage of the design process and in an application-aware manner.

Also from Fig. 3, we can see how SODA can be used to optimize for either high performance or for low power. In the second graph, it can be seen that the original nominal point for the design (the one used in the original workload) is not on the optimal curve (it has nonequal sensitivity ratios). This means that we can get the same performance but lower power by projecting on the x-axis (LP point in the graph), or get higher performance with the same power consumption by projecting on the y-axis (HP point in the graph). Incidentally, for this case, even the Pareto-optimal curve is suboptimal since the VCM has already reached its maximum speed, as can be seen from Fig. 4.

Fig. 4 shows the VCM and SPM speed setting for the Pareto-optimal case from Fig. 3. The points in Fig. 4 represent the settings at which the equal sensitivity ratios are achieved for each of the different configurations. Interestingly, from the graphs in Fig. 4, we observe significantly different trends (for optimal SPM and VCM) for Openmail and Financial. This can be explained by the fact that the Financial workload has significantly longer idle time compared to Openmail (111.45 versus 12.5 ms); thus, the energy-per-request in Financial tends to be much higher than for Openmail. The bulk of this energy is consumed by the SPM, which reduces the range of values that can be used optimally. On the other hand, since the VCM consumes far less power than the SPM, the performance target can be recovered by performing large modulations of the VCM speed, as shown in the y-axis of the graphs at the bottom. However, this modulation reaches saturation due to physical limits on the VCM voltage, and is not enough to fully compensate for the restricted RPM range; therefore, the optimal range of values for performance (the x-axis of the graphs) tends toward a higher latency for Financial. A higher RPM could be used in order to attain a higher performance, but this would break the optimality in the design.

Fig. 3. Static design space exploration for Openmail: energy versus 1/throughput.

Fig. 4. Static design space exploration: SPM speed versus VCM speed.

7.1.1 The Impact of Seek Time

We now show how SODA can be used as an offline analysis tool, to study the performance-power trade-offs for seek operations. From a performance viewpoint, seeks impede the flow of data, to and from the platters, thereby diminishing the effective data rate of the disk. Disk seek operations also exercise the VCM and therefore dissipate power. In order to isolate the impact of seeks, we use SODA with the same workloads as in the previous section, but with seek times as an input variable, all the way from single-cylinder seeks to full-stroke seeks. The results of this analysis are given in Fig. 5. Each curve in this graph corresponds to a particular value of the seek time.

As Fig. 5 shows, since shorter seek times benefit performance and consume less power in the VCM, their sensitivity curves are closer to the origin. When we look at the speed characteristics for the Openmail and Financial workloads, we observe that as the seek time increases from 0.5 ms (single-cylinder seek), the optimal curves for higher seek times result in a higher RPM range. This is because longer duration seeks hurt performance; to compensate for this, the optimization algorithm increases the disk RPM, which improves the rotational latency and the transfer time, and hence, the curves shift to the right. This trend continues till the seek time reaches 4.48 ms, which is the average seek time for the 3.3"-platter disk drives. At that point, the disk arm has reached its terminal velocity. Any further increases in the seek time will induce coasting of the head; this coast time causes extra power to be consumed by the SPM. In order to optimize for energy during these coasting periods, the optimization algorithm needs to scale back the RPM, and hence, the curves for seek times greater than 4.48 ms shift to the left.

This result shows another interesting similarity between disk and circuit power optimization. Circuit static power is consumed irrespective of any switching activity (mainly leakage), while circuit dynamic power is due to switching activity and is therefore a function of the usage of the circuit by some workload. The SPM is always operational and draws power irrespective of whether the disk is in idle, seek, or active modes. The VCM is active only when disk seeks are needed, which is workload dependent. The SPM and VCM power are thus similar to circuit static and dynamic power, respectively. Like in modern CMOS circuits, the power consumed by disk drives is also dominated by the static part (SPM). Also similar to circuits, just as DVS can be used to reduce leakage power in circuits, lowering the SPM voltage (which reduces the drive RPM) can reduce power consumption in disk drives.

Fig. 5. Impact of seek time.

In summary, minimizing seek times is important, both from the performance and power viewpoints. The results indicate that there is room for designing powerful VCMs to improve performance, since the VCM power is significantly lower than that of the SPM.

7.2 Using SODA to Craft Policies to Control the Dynamic Knobs

In the previous sections, we have seen how SODA can be used as an offline analysis tool to select the optimal configuration and analyze the effect of disk drive knobs for different workloads. In this section, we demonstrate how the SODA methodology can be used to develop online policies that use the dynamic knobs in the storage system.

7.2.1 Crafting a Policy

We use two dynamic knobs in this work, SPM speed (RPM) and VCM speed, to illustrate how an online power management policy could be crafted using SODA. Since the ratio of sensitivities of energy to performance with respect to these two knobs varies over time for a given workload, we need to measure energy and delay periodically and compute the respective sensitivities. We perform periodic measurements after every n requests to the storage system, which we call the "sample window." To compute the sensitivity ratios $(\partial E/\partial x)/(\partial D/\partial x)$ for each dynamic knob, we vary the knob value at that measurement instant by a small amount, above and below its current value, and compute the energy and delay effects from our analytical power and performance models that are integrated with the simulator. We then compute the sensitivities for each knob and obtain the ratio of sensitivities with respect to the two knobs. This allows us to craft a power management policy where our goal is to bring this ratio as close to 1 as possible (since that provides the best energy efficiency as discussed in Section 3), by going after the knob which exhibits a greater sensitivity value. For instance, if knob x has a higher $(\partial E/\partial x)/(\partial D/\partial x)$ value and if our goal is to reduce energy given a performance limit, then we would modulate knob x by a higher amount than knob y, since the ratio of sensitivities shows that knob x has a higher impact on energy reduction. The power management algorithm that was crafted using SODA is discussed below.
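The perturbation step described above can be sketched with central finite differences. The code below is a minimal illustration, not the simulator-integrated implementation: the toy `energy` and `delay` functions, the knob names, and the 1 percent step are all assumptions chosen so the example is self-contained.

```python
def sensitivity_ratio(energy, delay, knobs, name, rel_step=0.01):
    """Estimate (dE/dx)/(dD/dx) for knob `name` by perturbing it slightly
    above and below its current value (central differences)."""
    x = knobs[name]
    h = abs(x) * rel_step
    if h == 0.0:
        raise ValueError("knob value must be nonzero for a relative step")
    up = dict(knobs, **{name: x + h})
    down = dict(knobs, **{name: x - h})
    dE = (energy(up) - energy(down)) / (2.0 * h)
    dD = (delay(up) - delay(down)) / (2.0 * h)
    return dE / dD

def balance(energy, delay, knobs, knob_x, knob_y):
    """Ratio of the two knobs' sensitivity ratios; the policy aims for 1."""
    return (sensitivity_ratio(energy, delay, knobs, knob_x)
            / sensitivity_ratio(energy, delay, knobs, knob_y))

# Toy stand-ins for the analytical models (assumed, for illustration only).
def energy(k):
    return k["spm"] ** 2 + k["vcm"] ** 2

def delay(k):
    return 1.0 / k["spm"] + 1.0 / k["vcm"]

ratio_spm = sensitivity_ratio(energy, delay, {"spm": 2.0, "vcm": 2.0}, "spm")
```

In this toy model the two knobs are symmetric, so `balance` returns 1 when both are set to the same value; an imbalance would indicate which knob to adjust more.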

7.2.2 Power Management Policy

The first step in our power management policy is to establish the performance constraint under which the system should provide energy efficiency. In order to do this, we profile the workload running on the storage system for k I/O requests without performing any power management and calculate the average response time of the I/O requests over this window. During this phase, we set the RPM of the disk drives to those used in their original storage system configurations. In our experiments, we choose k to be the first 100,000 I/O requests of each workload. We use this average response time value as the basis for the performance constraint to use in the optimization. Let us denote this constraint as $R_0$. (Note that a data center manager may craft this performance constraint in a different way. For example, this constraint might be arrived at through negotiations with the client whose application is to be hosted on her servers, or she may choose a different performance metric, such as the maximum or minimum response time of the I/O requests over the profiling window.) This algorithm allows two additional thresholds that specify the range of acceptable deviation in performance of the storage system: an upper threshold ($UT$), and a lower threshold ($LT$), which are expressed as a percentage. In our experiments, we chose to use 15 percent for $UT$ and 5 percent for $LT$ as acceptable performance thresholds. These thresholds can be varied depending on system requirements by the data center manager. We then measure the average response time of the storage system ($RT$) every n requests and calculate the ratio of sensitivities with respect to each knob. Based on the values of $RT$, $R_0$, $UT$, and $LT$, there are three possibilities:

• $[100(RT - R_0)/R_0] > UT$: This condition indicates that the storage system is operating below the acceptable level of performance, and therefore, we need to turn up the knob settings to improve performance.

• $[100(RT - R_0)/R_0] < LT$: This condition indicates that the storage system is operating at higher performance than the desired level, and therefore, we can save energy by turning down the knob settings.

• If the difference in the response times is between $LT$ and $UT$, then no power management actions are taken.

For the first two cases, the magnitude and direction of the SPM and VCM knobs are modulated based on the ratio of sensitivities that are measured in each sample window. If one knob exhibits a higher value in the ratio of sensitivities, then this knob can give higher energy reductions when the second condition in the algorithm is satisfied, and hence, this knob is varied in a larger proportion than the other knob. By choosing an appropriate sample window, we can make the power management algorithm more responsive to workload changes. Since the control of the knobs is based on sensitivity-based optimization, as discussed in Section 3, we can achieve better energy efficiency by using it dynamically during different phases of the workload.
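The three-way decision above, together with a sensitivity-weighted adjustment, might look like the following sketch. The action labels and the proportional split in `split_step` are assumptions made for illustration; the text states only that the more sensitive knob is varied in a larger proportion.

```python
def policy_action(RT, R0, UT=15.0, LT=5.0):
    """Classify a sample window using the percentage deviation of the
    measured response time RT from the constraint R0."""
    deviation = 100.0 * (RT - R0) / R0
    if deviation > UT:
        return "speed_up"    # performance too low: turn the knobs up
    if deviation < LT:
        return "slow_down"   # performance headroom: turn the knobs down
    return "hold"            # within [LT, UT]: take no action

def split_step(total_step, s_spm, s_vcm):
    """Divide an adjustment between the SPM and VCM knobs in proportion to
    the magnitudes of their sensitivity ratios (an assumed weighting)."""
    weight = abs(s_spm) + abs(s_vcm)
    if weight == 0.0:
        return 0.0, 0.0
    return total_step * abs(s_spm) / weight, total_step * abs(s_vcm) / weight
```

For instance, with `R0 = 10` ms a measured `RT` of 12 ms is a 20 percent deviation and triggers `"speed_up"`, while 11 ms falls between the thresholds and leaves the knobs unchanged.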

7.2.3 Results

In this section, we present the energy and performance characteristics of the system using the power management algorithm that we designed using SODA for the four workloads (Financial, Search-Engine, TPC-H, and Openmail) and compare them to the Baseline storage system configurations on which the workloads were obtained. The parameters for the Baseline configuration are given in Table 1 in Section 5. For the evaluation of SODA, we assume that the disks are similar in all respects to the Baseline configuration, except that they are multispeed disks [9] and the SPM and VCM speeds (voltages) can be modulated during runtime. We assume that RPM transitions are done in steps, and we assume a total of 10 RPM-levels between 6,000 RPM and 15,000 RPM with an RPM step size of 1,000 RPM. The power consumed during the transition times is computed as the average of the power consumption at the two different levels between which transition occurs. The transition time is modeled by fitting a linear equation based on the real values of transition times reported by multi-RPM disk manufacturers [44], [46]. We measure the sensitivities at every 10,000 requests, and set the default values of $UT$ and $LT$ to be 15 percent and 5 percent, respectively.
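The RPM levels and the transition power rule in this setup can be sketched as follows. The helper names are hypothetical, and the `slope` and `intercept` of the linear transition-time fit are left as parameters because their values are not given in the text.

```python
# Ten RPM levels from 6,000 to 15,000 in steps of 1,000.
RPM_LEVELS = list(range(6000, 15001, 1000))

def snap_to_level(rpm):
    """Return the supported RPM level closest to a requested speed."""
    return min(RPM_LEVELS, key=lambda level: abs(level - rpm))

def transition_power(power_from, power_to):
    """Average of the power at the two levels between which the transition occurs."""
    return (power_from + power_to) / 2.0

def transition_time(rpm_from, rpm_to, slope, intercept):
    """Linear transition-time model; slope and intercept are placeholders
    for fitted coefficients that are not stated in the text."""
    return intercept + slope * abs(rpm_to - rpm_from)

def transition_energy(power_from, power_to, t_transition):
    """Energy spent during one RPM transition."""
    return transition_power(power_from, power_to) * t_transition
```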

We present two sets of graphs for the evaluation of this algorithm on the four workloads. The energy consumption characteristics of the workloads are shown in Fig. 6, and the corresponding performance characteristics are given in Fig. 7.

Fig. 6. Energy reduction using dynamic SODA.

Fig. 7. Performance characteristics of Baseline versus SODA.

From Fig. 6, we can see that SODA reduces the energy consumption of the storage system from the baseline for all the four workloads. The energy savings for Financial, Search-Engine, TPC-H, and Openmail, by using the SODA policy, are 21.31 percent, 26.14 percent, 4.34 percent, and 30.75 percent, respectively. We observe that TPC-H has a lower energy consumption reduction of 4.34 percent as compared to other workloads. This can be explained by observing the variation in seek characteristics across the workloads, especially with TPC-H exhibiting highly varying seek behavior, thereby making control of the VCM knob ineffective during most of the sample windows. To quantify the effect of seek behavior variation on TPC-H, we measured the coefficient of variation, which is a normalized measure of dispersion of a probability distribution. The coefficient of variation is mathematically expressed as $100(\sigma/\mu)$, where $\mu$ and $\sigma$ are the mean and standard deviation of the distribution, respectively. We find that TPC-H has the highest coefficient of variation (62.01 percent), while those of Financial, Search-Engine, and Openmail are 18.57 percent, 10.88 percent, and 20.84 percent, respectively. Since the VCM speed has a direct impact on the seek time, and the impact of seeks on the response time varies significantly for this workload, the ratio of sensitivities is not balanced most of the time, which results in lower energy savings as compared to other workloads.
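The coefficient of variation used above is straightforward to compute from a list of samples. The sketch below assumes the population standard deviation; the text does not say whether the population or sample form was used, and the sample values are made up.

```python
import statistics

def coefficient_of_variation(samples):
    """Return 100 * sigma / mu as a percentage, using the population
    standard deviation (an assumption; the text does not specify)."""
    mu = statistics.mean(samples)
    sigma = statistics.pstdev(samples)
    return 100.0 * sigma / mu

# Made-up seek-time samples with mean 5 and population std. dev. 2.
cv = coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9])
```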

Fig. 7 shows that using SODA also delivers performance comparable to Baseline for the four workloads. We represent the performance curves as Cumulative Distribution Functions (CDFs) of the response time. CDFs show the fraction of I/O requests whose response times are less than or equal to a given value on the x-axis. CDFs allow us to visualize the scenario where a large number of I/O requests may be experiencing relatively short response times whereas a few other requests may have very long response times. Since we chose to allow up to 15 percent degradation in the average response time to save energy (via the $UT$ parameter), the SODA CDFs are shifted slightly below the Baseline CDFs. The workloads with shorter interarrival times experience a slightly larger shift, as can be seen in Openmail and Search-Engine (with average interarrival times of 1.18 and 2.96 ms, respectively), since more requests get queued at the disk when the disk is transitioning between RPM levels in the algorithm. For Financial and TPC-H (with average interarrival times of 8.19 and 8.76 ms), the SODA and Baseline performance CDFs are nearly identical, with an insignificant performance difference.
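The CDF definition used for Fig. 7 corresponds to the following small computation. This is a minimal sketch with made-up response times; the function name is hypothetical.

```python
def empirical_cdf(response_times, x):
    """Fraction of requests whose response time is less than or equal to x."""
    if not response_times:
        raise ValueError("need at least one response time")
    return sum(1 for t in response_times if t <= x) / len(response_times)

# Made-up response times in milliseconds.
times_ms = [1.0, 2.0, 3.0, 4.0]
half = empirical_cdf(times_ms, 2.0)
```

Evaluating the function over a grid of x values gives one curve of the kind plotted for each workload and configuration.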

From the two sets of graphs, we can observe that using SODA dynamically provides energy reduction and does not compromise on performance, as it satisfies the performance constraint throughout the different phases of the workload. Using SODA dynamically can, hence, provide a way of adapting energy requirements to workload needs and thereby make the system more energy efficient.

8 CONCLUSIONS

In this paper, we have presented a detailed and parameterized model for disk drive architectures for two key figures of merit, namely, performance and energy consumption. We have shown two scenarios where this model can be used as an offline analysis tool for sensitivity-based optimization of disk drives, one at design-time and the other using run-time summary characteristics. We have also shown how SODA can be used dynamically as an online tool to optimize runtime power consumption. A key advantage of the SODA framework is the ability to rapidly explore large design spaces efficiently, which can be especially useful during the early stages of development of a new architecture. We have also integrated the SODA model in a detailed performance simulator, Disksim, to develop a power management algorithm and have shown that dynamic use of SODA can deliver performance for data-intensive applications in an energy-efficient manner.

ACKNOWLEDGMENTS

This work was supported in part by US National Science Foundation (NSF) CAREER Award 0643925 and NSF Grants CCR-0105626, CCR-0133634, CNS-0627527, and CNS-0551630, a grant from the MARCO Interconnect Focus Center, one of five research centers funded under the Focus Center Research Program, a Semiconductor Research Corp. and DARPA program, and gifts from HP and Google.

REFERENCES

[1] S. Akyurek and K. Salem, “Adaptive Block Rearrangement,” ACMTrans. Computer Systems, vol. 13, no. 2, pp. 89-121, May 1995.

[2] E.V. Carrera and R. Bianchini, “Disk Caching with an OpticalRing,” Applied Optics, vol. 39, no. 35, pp. 6663-6680, Dec. 2000.

[3] E.V. Carrera, E. Pinheiro, and R. Bianchini, “Conserving DiskEnergy in Network Servers,” Proc. Int’l Conf. Supercomputing(ICS ’03), June 2003.

[4] S.H. Charrap, P.L. Lu, and Y. He, “Thermal Stability of RecordedInformation at High Densities,” IEEE Trans. Magnetics, vol. 33,no. 1, pp. 978-983, Jan. 1997.

[5] D. Dammers, P. Binet, G. Pelz, and L.M. Voßkamper, “MotorModeling Based on Physical Effect Models,” Proc. IEEE/ACM Int’lWorkshop Behavioral Modeling and Simulation (BMAS ’01), pp. 78-83,Oct. 2001.

[6] F. Douglis and P. Krishnan, “Adaptive Disk Spin-DownPolicies for Mobile Computers,” Computing Systems, vol. 8,no. 4, pp. 381-413, 1995.

[7] G.R. Ganger, B.L. Worthington, and Y.N. Patt, The DiskSimSimulation Environment Version 2.0 Reference Manual, http://www.ece.cmu.edu/ganger/disksim/, Dec. 1999.

[8] M.J. Flynn, “Very High-Speed Computing Systems,” Proc. IEEE,vol. 54, no. 12, pp. 1901-1909, Dec. 1966.

[9] S. Gurumurthi, A. Sivasubramaniam,M. Kandemir, andH. Franke,“DRPM: Dynamic Speed Control for Power Management inServer Class Disks,” Proc. Int’l Symp. Computer Architecture(ISCA ’03), pp. 169-179, June 2003.

[10] S. Gurumurthi, A. Sivasubramaniam, and V. Natarajan, “DiskDrive Roadmap from the Thermal Perspective: A Case forDynamic Thermal Management,” Proc. Int’l Symp. ComputerArchitecture (ISCA ’05), pp. 38-49, June 2005.

[11] M.A. Horowitz, V. Stojanovic, B. Nikolic, D. Markovic, andR.W. Brodersen, “Methods for True Power Minimization,” Proc.Int’l Conf. Computer-Aided Design (ICCAD ’02), pp. 35-42, 2002.

[12] G. Herbst, “IBM’s Drive Temperature Indicator Processor (Drive-TIP) Helps Ensure High Drive Reliability,” IBM white paper, Oct. 1997.

[13] Hitachi Global Storage Technologies—HDD Technology Overview Charts, http://www.hitachigst.com/hdd/technolo/verview/storagetechchart.html, 2008.

[14] W.W. Hsu and A.J. Smith, “Characteristics of I/O Traffic in Personal Computer and Server Workloads,” IBM Systems J., vol. 42, no. 2, pp. 347-372, 2003.

[15] W.W. Hsu, A.J. Smith, and H.C. Young, “The Automatic Improvement of Locality in Storage Systems,” ACM Trans. Computer Systems, vol. 23, no. 4, pp. 424-473, Nov. 2005.

[16] Y. Hu and Q. Yang, “DCD Disk Caching Disk: A New Approach for Boosting I/O Performance,” Proc. Int’l Symp. Computer Architecture (ISCA ’96), pp. 169-178, May 1996.

[17] Y. Kim, S. Gurumurthi, and A. Sivasubramaniam, “Understanding the Performance-Temperature Interactions in Disk I/O of Server Workloads,” Proc. Int’l Symp. High Performance Computer Architecture (HPCA ’06), pp. 179-189, Feb. 2006.

[18] D. Kotz, S.B. Toh, and S. Radhakrishnan, “A Detailed Simulation Model of the HP 97560 Disk Drive,” Technical Report PCS-TR94-220, Dept. of Computer Science, Dartmouth College, July 1994.

[19] M.H. Kryder, “Future Storage Technologies: A Look Beyond the Horizon,” Proc. Computerworld Storage Networking World Conf., Apr. 2006.

[20] K. Li, R. Kumpf, P. Horton, and T.E. Anderson, “Quantitative Analysis of Disk Drive Power Management in Portable Computers,” Proc. USENIX Winter Conf., pp. 279-291, 1994.

[21] X. Li, Z. Li, F. David, P. Zhou, Y. Zhou, and S. Adve, “Performance Directed Energy Management for Main Memory and Disks,” Proc. Int’l Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS ’04), pp. 271-283, Oct. 2004.


[22] D. Marković, V. Stojanović, B. Nikolić, M.A. Horowitz, and R.W. Brodersen, “Methods for True Energy-Performance Optimization,” IEEE J. Solid-State Circuits, vol. 39, no. 8, pp. 1282-1293, Aug. 2004.

[23] The Openmail Trace, http://tesla.hpl.hp.com/privatesoftware/, 2006.

[24] D. Patterson, G. Gibson, and R. Katz, “A Case for Redundant Arrays of Inexpensive Disks (RAID),” Proc. ACM SIGMOD ’88, pp. 109-116, June 1988.

[25] A.E. Papathanasiou and M.L. Scott, “Energy Efficient Prefetching and Caching,” Proc. USENIX Ann. Technical Conf., June 2004.

[26] E. Pinheiro and R. Bianchini, “Energy Conservation Techniques for Disk Array-Based Servers,” Proc. Int’l Conf. Supercomputing (ICS ’04), June 2004.

[27] C. Ruemmler and J. Wilkes, “Disk Shuffling,” Technical Report HPL-91-156, HP Laboratories, Oct. 1991.

[28] C. Ruemmler and J. Wilkes, “UNIX Disk Access Patterns,” Proc. USENIX Winter Technical Conf., pp. 405-420, Jan. 1993.

[29] C. Ruemmler and J. Wilkes, “An Introduction to Disk Drive Modeling,” IEEE Computer, vol. 27, no. 3, pp. 17-28, Mar. 1994.

[30] I. Sato, K. Otani, M. Mizukami, S. Oguchi, K. Hoshiya, and K.-I. Shimokura, “Characteristics of Heat Transfer in Small Disk Enclosures at High Rotation Speeds,” IEEE Trans. Components, Packaging, and Manufacturing Technology, vol. 13, no. 4, pp. 1006-1011, Dec. 1990.

[31] K. Skadron, M.R. Stan, W. Huang, S. Velusamy, K. Sankaranarayanan, and D. Tarjan, “Temperature-Aware Microarchitecture,” Proc. Int’l Symp. Computer Architecture (ISCA ’03), pp. 1-13, June 2003.

[32] M. Sri-Jayantha, “Trends in Mobile Storage Design,” Proc. Int’l Symp. Low Power Electronics, Oct. 1995.

[33] M.R. Stan, “Low Power CMOS with Sub-Volt Supply Voltages,” IEEE Trans. VLSI Systems, vol. 9, no. 2, pp. 394-400, Apr. 2001.

[34] “Storagereview/The PC Guide—Single versus Multiple Actuators,” http://www.storagereview.com/guide2000/ref/hdd/op/actMultiple.html, 1998.

[35] TPC-C Benchmark V5, http://www.tpc.org/tpcc/, 2008.

[36] UMass Trace Repository, http://traces.cs.umass.edu, 2008.

[37] R. Viswanath, V. Wakharkar, A. Watwe, and V. Lebonheur, “Thermal Performance Challenges from Silicon to Systems,” Intel Technology J., Q3, 2000.

[38] L.M. Voßkamper, R. Schmid, and G. Pelz, “Combining Models of Physical Effects for Describing Complex Electromechanical Devices,” Proc. IEEE/ACM Int’l Workshop Behavioral Modeling and Simulation (BMAS ’00), pp. 42-45, Oct. 2000.

[39] B. Worthington, G. Ganger, Y. Patt, and J. Wilkes, “On-Line Extraction of SCSI Disk Drive Parameters,” Proc. ACM SIGMETRICS Conf. Measurement and Modeling of Computer Systems, pp. 146-156, May 1995.

[40] J. Zedlewski, S. Sobti, N. Garg, F. Zheng, A. Krishnamurthy, and R. Wang, “Modeling Hard-Disk Power Consumption,” Proc. Ann. Conf. File and Storage Technology (FAST ’03), Mar. 2003.

[41] Y. Zhang, Z. Lu, M.R. Stan, J. Lach, and K. Skadron, “Optimal Procrastinating Voltage Scheduling for Hard Real-Time Systems,” Proc. Design Automation Conf. (DAC ’05), pp. 905-908, June 2005.

[42] Q. Zhu, F.M. David, C. Devraj, Z. Li, Y. Zhou, and P. Cao, “Reducing Energy Consumption of Disk Storage Using Power-Aware Cache Management,” Proc. Int’l Symp. High-Performance Computer Architecture (HPCA ’04), Feb. 2004.

[43] Q. Zhu, A. Shankar, and Y. Zhou, “PB-LRU: A Self-Tuning Power Aware Storage Cache Replacement Algorithm for Conserving Disk Energy,” Proc. Int’l Conf. Supercomputing (ICS ’04), June 2004.

[44] Q. Zhu, Z. Chen, L. Tan, Y. Zhou, K. Keeton, and J. Wilkes, “Hibernator: Helping Disk Arrays Sleep through the Winter,” Proc. Symp. Operating Systems Principles (SOSP ’05), pp. 177-190, Oct. 2005.

[45] V. Zyuban and P.N. Strenski, “Balancing Hardware Intensity in Microprocessor Pipelines,” IBM J. Research and Development, vol. 47, no. 5/6, 2003.

[46] Hitachi Power and Acoustic Management—Quietly Cool, http://www.hitachigst.com/tech/techlib.nsf/productfamilies/WhitePapers, Mar. 2004.

Sriram Sankar received the BE degree in computer science and engineering from Anna University, India in 2005 and the MS degree in computer science from the University of Virginia, Charlottesville in 2008. He was a member of the Information Technology group at D.E. Shaw India Software Private Ltd. His research interests include computer architecture and storage systems. He currently works at Microsoft Corporation, Redmond, Washington. He is a member of the ACM.

Yan Zhang received the BS and MS degrees in electrical engineering from Tsinghua University, Beijing, in 1997 and 2000, respectively, and the MS and PhD degrees in electrical and computer engineering from the University of Virginia in 2003 and 2006, respectively. His research interests include power-aware and temperature-aware computing, low-power VLSI design, and architecture-level power modeling. He is currently a senior design engineer with Qualcomm, San Diego.

Sudhanva Gurumurthi received the BE degree from Anna University, India in 2000 and the PhD degree from Penn State in 2005, both in the field of computer science and engineering. He is an assistant professor in the Department of Computer Science, University of Virginia, Charlottesville. His research area is computer architecture. He has held research positions at the IBM Austin Research Lab and Intel Corp. He received the NSF CAREER Award in 2007, the CSE Research Assistant Award in 2004, and the Robert M. Owens Memorial Scholarship in 2003. He is a member of the IEEE, the IEEE Computer Society, and the ACM.

Mircea R. Stan received the diploma in electronics and communications from the “Politehnica” University, Bucharest, Romania, and the MS and PhD degrees in electrical and computer engineering from the University of Massachusetts at Amherst. Since 1996, he has been with the Charles L. Brown Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, where he is now a full professor. He is teaching and doing research in the areas of high-performance and low-power VLSI, temperature-aware circuits and architecture, embedded systems, GPU/CPU integration, and nanoelectronics. He has more than eight years of industrial experience, was a visiting scholar at the University of California, Berkeley, in 2004-2005, a visiting faculty member with IBM in 2000, and with Intel in 2002 and 1999. He was the recipient of the National Science Foundation CAREER Award in 1997 and was a coauthor on Best Paper Awards at GLSVLSI 2006, ISCA 2003, and SHAMAN 2002. He was the chair of the VLSI Systems and Applications Technical Committee (VSA-TC) of the IEEE CAS for 2006-2007, the general chair for ISLPED 2006 and for GLSVLSI 2004, the technical program chair for NanoNets 2007 and ISLPED 2005, and on technical committees for numerous conferences. He was an associate editor for the IEEE Transactions on Circuits and Systems I from 2004 to 2007 and for the IEEE Transactions on VLSI Systems from 2001 to 2003. He has also been a guest editor for the IEEE Computer special issue on Power-Aware Computing in December 2003 and a distinguished lecturer for the IEEE Solid-State Circuits Society (SSCS) from 2007 to 2008, and for the IEEE Circuits and Systems (CAS) Society from 2004 to 2005. He is a member of the ACM, IET, Eta Kappa Nu, Phi Kappa Phi, and Sigma Xi. He is a senior member of the IEEE.

