DISASTER RECOVERY
Rule 1: Articulate the need in financial terms
ness decisions should drive busi-
. The lack of an effective business
ecovery plan can expose organiza-
ncial risk. Aside from lost income,
esmirch a good brand name. Plus,
in regulated industries—such as
e risk of legal liability.
terprises affix a monetary value to
ning efforts to help ensure buy-in
e organization. Additionally, enter-
cial progress as they move through
e business continuity plan. After all,
ontinuity planning do not arrive en
ization flips the switch on its new
covery benefits are associated with
uce the lead time to recovery, and
nt technologies and processes, this
ease. In the end, by viewing the
Architecting a Blueprint for Disaster Recovery
BY RICH ARMOUR, PAUL ENO, MICHAEL KIMBLE, AND JESSE FREUND
Effective risk assessment and business continuity provisions enable organizations to
minimize downtime and recover crucial applications quickly when disaster strikes.
By analyzing business processes and functions and defining the impact of system
downtime in financial terms, enterprises can effectively prioritize the enterprise-wide
recovery process to help reduce business consequences. Based on internally honed
best practices, Dell’s top 10 rules for disaster planning can help enterprises align
recovery efforts with overall business objectives.
Reprinted from Dell Power Solutions, February 2006. Copyright © 2006 Dell Inc. All rights reserved.
business continuity strategy in monetary terms and tracking finan-
cial progress over time, organizations can align business continuity
efforts with smart business decisions.
Rule 2: Use hard data to create a risk profile
Because Dell’s data centers are located in Texas, the Dell business
continuity team might have assumed that the company’s primary
risk would be a catastrophic tornado. In fact, after analyzing hard
data, the team determined that the primary threat to data centers
is likely to be fire.
How did Dell construct its risk profile? The company invited
its insurance carriers to bring hard data to the business continuity
plan. Insurance carriers possess claims data that can tell organiza-
tions what the risks are in a given geographic area. Plus, insurance
carriers naturally want to help clients reduce risk. In the end, the
use of hard claims data from the insurance carriers offered some
of the most valuable and surprising information as Dell developed
its own business continuity plan.
Rule 3: Identify the critical resources
Not all data is created equal, and the same holds true for applications.
An effective business continuity planning process requires
organizations to undergo a rigorous analysis of business processes
and functions and to identify the critical resources that require
redundancy, backup, and recovery. Before organizations can discuss
the IT resources necessary to maintain business-critical processes,
they must assess the business impact of losing systems—paying par-
ticular attention to interdependencies that exist among systems. For
example, a business-critical system may rely on input from another
system that is not deemed critical in and of itself. It is crucial that
the business side of the organization lead the discussion of critical
processes before the IT side can define the technologies necessary
to enable business continuity.
Dell has developed a three-tier strategy for classifying data and
applications. Class 1 systems support business-critical processes.
For example, at Dell a business-critical process involves any service
that directly interacts with the customer, which includes taking
and processing orders as well as building, shipping, and servicing
products. By contrast, Class 2 systems correlate to business-essential
processes, where a 48-hour outage would begin to have a negative
business impact. Class 3 systems enable business-support processes,
for which a temporary loss of service is deemed noncritical. By clas-
sifying and defining processes, applications, and data along business-
criticality demarcations, Dell helps ensure that the appropriate
investment is made to recover the most crucial systems first.
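The classification and the interdependency caveat above can be sketched in a few lines of Python. This is only an illustration, not Dell's tooling: the system names, the `effective_classes` helper, and the rule that a dependency inherits the class of the most critical system relying on it are all assumptions made for the example.

```python
from enum import IntEnum


class Criticality(IntEnum):
    """Lower number means more critical, mirroring Class 1/2/3 in the text."""
    BUSINESS_CRITICAL = 1
    BUSINESS_ESSENTIAL = 2
    BUSINESS_SUPPORT = 3


def effective_classes(assigned, depends_on):
    """Raise each system to the class of the most critical system that relies on it.

    assigned:   {system: Criticality} as set by the business side
    depends_on: {system: [systems it needs input from]}
    """
    effective = dict(assigned)
    changed = True
    while changed:  # repeat until stable so chains of dependencies are covered
        changed = False
        for system, upstream in depends_on.items():
            for dep in upstream:
                if effective[dep] > effective[system]:
                    effective[dep] = effective[system]
                    changed = True
    return effective


# Hypothetical inventory; the names are illustrative only.
assigned = {
    "order_entry": Criticality.BUSINESS_CRITICAL,
    "pricing_feed": Criticality.BUSINESS_SUPPORT,
    "hr_portal": Criticality.BUSINESS_SUPPORT,
}
depends_on = {"order_entry": ["pricing_feed"]}

result = effective_classes(assigned, depends_on)
# Recover the most critical systems first; ties are broken by name.
recovery_order = sorted(result, key=lambda s: (result[s], s))
```

In this sketch the pricing feed, classed as business-support on its own, is treated as Class 1 because order entry cannot run without it.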
Rule 4: Think beyond the data center
Many disaster recovery efforts are focused on keeping the data
center up and running. However, effective business continuity
planning must reach beyond applications, data, and infrastructure
considerations. For example, it does no good to have the data
center up and running if no provisions have been made to support
people performing vital business functions such as shipping and
receiving. Of course, applications must be available, data must be
accessible, and the network must be working. But by focusing on
the data center at the expense of essential business processes and
infrastructure components, enterprises run the risk of turning
robust data center functionality into little more than a paper tiger.
Rule 5: Eliminate or mitigate single points of failure
A single point of failure occurs when there is no redundancy
to compensate for a missing application, data, or infrastructure
component. It may be an application or a database server, a lone
backup generator in a data center, or the long-haul network itself.
Moreover, several single points of failure typically exist within
an enterprise infrastructure. Consequently, organizations should
perform a specific and detailed single-point-of-failure analysis
across the entire infrastructure. Doing so may reveal that a key
component was missed when a data center, or another form of
disaster recovery system, was built. In the end, performing a
single-point-of-failure analysis may help prevent an organization
from having to entirely reconstitute business capabilities when a
relatively minor component fails.
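A first pass at a single-point-of-failure analysis can be as simple as walking an inventory and flagging anything without a redundant instance. The sketch below assumes a hypothetical inventory format (`Component` with an `instances` count); a fuller analysis would also trace which business processes depend on each flagged component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    instances: int  # how many interchangeable copies exist


def single_points_of_failure(components):
    """Return the names of components that have no redundant instance."""
    return sorted(c.name for c in components if c.instances <= 1)


# Hypothetical inventory, loosely following the examples in the text.
inventory = [
    Component("database_server", 2),
    Component("backup_generator", 1),
    Component("long_haul_network_link", 1),
    Component("application_server", 3),
]

spofs = single_points_of_failure(inventory)
```

Here the lone backup generator and the single long-haul link are flagged, while the duplicated servers are not.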
Rule 6: Assume that everything is going to fail
Oftentimes, when enterprises build a disaster recovery plan, they
do so with the expectation that land lines, cell phones, and the
network will be available. Or they take for granted that the roads to
the data center will be accessible, assuming the data center itself
is still standing and the employees are capable of getting there. The
best-laid plans for business continuity include the consideration that
every key piece of internal and external infrastructure may fail or
become unavailable for extended periods.
At Dell, the operating assumption is that every vital piece of
infrastructure is capable of failing, and all of them may go down
at the same time. Along these lines, the Dell recovery plan itself
is stored on CD. Copies of the CD are distributed across multiple
teams. At least one copy resides in each data center, and another
TOP 10 RULES FOR DISASTER PLANNING
1. Articulate the need in financial terms.
2. Use hard data to create a risk profile.
3. Identify the critical resources.
4. Think beyond the data center.
5. Eliminate or mitigate single points of failure.
6. Assume that everything is going to fail.
7. Consider an active/active data center strategy.
8. Recognize potential vendor weaknesses.
9. Keep disaster recovery capability up-to-date.
10. Perform tests on a regular basis.
The Dell™ PowerVault™ ML6000 modular tape library is designed to intelligently protect an organization’s critical data. Through proactive diagnostics and flexible scalability, the PowerVault ML6000 enables organizations to prepare their storage environments for disaster recovery.
The built-in intelligence of the PowerVault ML6000 helps ensure that backups can execute as planned. The library’s diagnostics are designed to predict failures in the library’s environment, such as tape or drive malfunction, and send e-mail messages to warn administrators of potential issues. This proactive notification allows administrators to plan ahead and resolve problem conditions before failures occur, thus limiting unplanned downtime. If a failure occurs, administrators can use one of the library’s simple troubleshooting wizards, which provide solutions to known issues ranging from cleaning or restarting drives to opening a service ticket. The wizards help administrators save time by resolving issues on-site. If administrators cannot resolve an issue using the wizards, detailed event logs and built-in relational diagnostics isolate failures at a subcomponent level, minimizing the time to repair of the PowerVault ML6000.
The tape library’s modular and versatile scalability provides organizations with several capacity options, offering organizations the flexibility to pay as they grow without limiting their ability to add more drives or slots to the existing library. The PowerVault ML6000 can scale from 14.4 TB of native backup storage capacity to 51.2 TB (native) using 9U expansion modules to support the demands of workgroup and mid-range data centers; up to 161 TB of capacity is planned for future releases. The library scales from two to six Ultrium 3 Linear Tape-Open (LTO-3) SCSI or Fibre Channel drives; future support is planned for up to 18 LTO-3 SCSI or Fibre Channel drives. In addition, from 36 to 128 cartridge slots give organizations added backup performance and capacity; support for up to 404 cartridge slots is planned for future releases. To satisfy ever-shrinking backup windows,
the PowerVault ML6000 is designed to provide a maximum native transfer rate of 1.7 TB/hour.
The PowerVault ML6000 control module can be placed anywhere in the expansion stack so that organizations can easily expand and customize their libraries. Each additional expansion module leverages the existing robotics and intelligence of the control module to reduce the overall number of moving parts, thus enhancing the library’s reliability.
The tape library also offers a wide range of connectivity and compatibility options for protecting storage environments. The drive technology used in the PowerVault ML6000 can expand to support different connectivity standards, including SCSI and Fibre Channel. The library is also compatible with storage software from CommVault, EMC, and Symantec.
In addition, the PowerVault ML6000 is available with two Dell Services offerings. The Backup and Recovery Design and Deployment service provides a detailed backup and recovery plan that is designed to help organizations establish appropriate procedures to minimize or avoid data loss. Meanwhile, the Backup and Recovery Implementation service is designed to be a comprehensive approach for organizations implementing a backup and recovery process on a new Dell or Dell/EMC storage area network or network attached storage solution, and may include software from key third-party technology providers.
The drive and media technology used by the PowerVault ML6000 provides robust backup and restore functionality as well as effective, long-term data retention. For example, the library’s use of LTO-3 drives and WORM (write once, read many) media aids in regulatory compliance by preventing data from being overwritten or digitally altered while providing excellent tape drive performance.
The Dell PowerVault ML6000 tape library is optimized for Dell PowerEdge™ servers and Dell’s comprehensive storage portfolio. For more information, visit www.dell.com/storage.
INTELLIGENT DATA PROTECTION: THE DELL POWERVAULT ML6000 TAPE LIBRARY
Figure A. Dell PowerVault ML6010 CM tape library
copy is kept in the IT operations center. This way, if a disaster
cripples phone service, Internet availability, or transportation infra-
structure, Dell still has the capability to begin recovery.
Rule 7: Consider an active/active data center strategy
One way to know that a recovery plan will work is to make it a part
of the load-balancing activities. Along these lines, Dell relies on
an active/active data center strategy as part of its everyday opera-
tions. To do so, Dell provisions more than 100 percent capacity
for each application so that it can split application load balancing
across multiple data centers. Each application has 75 percent of
required capacity in each data center—lending each application
150 percent of its nominal capacity requirement. Not only does
this load-balancing strategy translate to high-performance applica-
tions, but it also helps ensure that disaster recovery and failover
capabilities are being tested every moment of every day. This
way, when Dell needs to implement its disaster recovery plan,
the company knows it will work because it is already part of the
existing load-balancing strategy.
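The capacity arithmetic behind this strategy is easy to check. The sketch below assumes two data centers, which is what 75 percent per site adding up to 150 percent implies; the function name and parameters are illustrative, not part of any Dell tool.

```python
def surviving_capacity(per_site_fraction, sites, failed_sites):
    """Fraction of an application's required capacity left after some sites fail."""
    remaining_sites = max(sites - failed_sites, 0)
    return per_site_fraction * remaining_sites


# Assumption: two sites, each provisioned at 75 percent of required capacity.
normal = surviving_capacity(0.75, sites=2, failed_sites=0)
one_site_down = surviving_capacity(0.75, sites=2, failed_sites=1)
```

Under these assumptions the application has 150 percent of its nominal requirement in normal operation, and 75 percent is still running if one site is lost, so recovery starts from a working, already load-tested system.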
Rule 8: Recognize potential vendor weaknesses
Critical vendors can significantly affect an enterprise’s capacity
for disaster recovery. When putting together a business continuity
plan, organizations must evaluate a vendor’s own disaster recovery
capabilities to understand how their potential weaknesses might
hinder the enterprise. For example, after the 9/11 terrorist attack,
many customers turned to Dell to rapidly reconstitute systems that
had been destroyed or damaged. Thanks to its size and scalability,
Dell was able to respond quickly to get these customers back online.
Had Dell been smaller and less scalable, hardware procurement
might have become a significant bottleneck in its customers’ disaster
recovery capabilities.
Rule 9: Keep disaster recovery capability up-to-date
Disaster recovery capability can quickly become outdated. It must be
maintained by a strong set of procedures and processes, so it becomes
part of the everyday, every project, and every implementation culture.
As each new project or application is initiated, enterprises need to
perform an analysis of where it fits in the criticality matrix. If, for
instance, a new application is deemed to support a Class 1 business
process, then the application must be engineered with the appropriate
recoverability. Plus, that capability must be maintained going forward.
As changes are made to applications, databases, and data centers,
disaster recovery capabilities should be updated as well.
Rule 10: Perform tests on a regular basis
Enterprises can never assume that their disaster recovery capability
is actually working. Dell tests its failover and recovery processes
on a quarterly basis. Aside from validating that Dell does indeed
have failover and recovery capabilities, these quarterly tests help
keep the business continuity plan in front of the infrastructure and
application teams, which helps encourage future development with
business continuity in mind. The quarterly tests represent an impor-
tant part of the effort to make disaster recovery part of the everyday,
every project, and every implementation culture at Dell.
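One lightweight way to keep a regular test cadence visible is to record the date of each system's last failover test and flag anything overdue. The sketch below is a hypothetical illustration: the system names and dates are invented, and "quarterly" is approximated as 92 days.

```python
from datetime import date, timedelta

TEST_INTERVAL = timedelta(days=92)  # assumption: "quarterly" approximated as 92 days


def overdue_tests(last_tested, today):
    """Return systems whose last failover test is older than the test interval."""
    return sorted(
        name
        for name, tested_on in last_tested.items()
        if today - tested_on > TEST_INTERVAL
    )


# Hypothetical test log; names and dates are illustrative only.
last_tested = {
    "order_entry": date(2006, 1, 10),
    "shipping": date(2005, 9, 1),
}

overdue = overdue_tests(last_tested, today=date(2006, 2, 1))
```

In this example only the shipping system is flagged, because its last test is more than a quarter old.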
Align recovery efforts with business objectives
Effective disaster recovery and business continuity planning
depends on an enterprise’s ability to identify critical processes and
technologies, maintain and recover functionality after a planned or
unplanned event, and balance the risks with the costs of continuity
efforts. In turn, this effectiveness requires an alignment of business
continuity planning with articulated business goals. To align busi-
ness objectives with continuity efforts, enterprises must develop a
risk profile based on hard data. Furthermore, the business side of
the organization should guide the development of the risk profile. By
basing disaster recovery and business continuity efforts on business
objectives and by refining these practices over time, organizations
can develop a plan that not only pays dividends when disaster
strikes, but also helps organizations realize efficiencies
in their day-to-day operations.
Rich Armour is a director on the Dell Information Technology team. He has a B.S. in Computer Science and Mathematics from Eastern New Mexico University and an M.B.A. from George Washington University.
Paul Eno is a senior manager on the Dell Information Technology team. He has a B.S. in Engineering from the U.S. Military Academy at West Point, New York, and an M.B.A. in Financial Information Management from The University of Texas at Austin.
Michael Kimble is an enterprise technologist in the Advanced Systems Group at Dell. He focuses on storage solutions for business continuity and disaster recovery.
Jesse Freund is a business and technology writer based in San Francisco. He has written about business and technology for leading publications, corporations, and organizations, including Business 2.0 and Wired magazines. Jesse has a B.A. in History from the University of California, Berkeley.
FOR MORE INFORMATION
Dell business continuity:
www.dell.com/disasterrecovery
www.dell.com/enterprise
www.dell.com/storage
www.dell.com/services