Date post: | 20-Nov-2014 |
Category: |
Technology |
Upload: | cameroon45 |
1
Fault Tolerant Servers and Disaster Recovery
Product Management, NEC Solutions America
Page 2
Outline
• What is Fault Tolerance?
• 5 Customer Benefits of FT
  – 99.999% Uptime
  – Ease of Maintenance
  – Standard Software
  – Remote Management
  – Beats the TCO of Clusters
• How to Compete with Clusters
• Disaster Recovery Solution
  – Benefits
  – Failover/Failback
• Review Benefits
Page 3
What is Fault Tolerance?
| System type | Availability | Average Annual Downtime | User Tolerance to Downtime |
| Fault Tolerant, Continuous Availability (CA) | 99.999% | 5 Minutes | None |
| Cluster, High Availability (HA) | 99.9% | 8 Hours 45 Minutes | Business Interruption, Lost Transaction |
| Stand Alone GP or Blade Server w/RAID | 99.5% | 43 Hours 23 Minutes | Tomorrow is O.K. |
Source: IDC
"72% of mission critical applications experience nine hours of outage per year." - Standish Group Research
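The downtime column follows from the availability percentage. The sketch below is a minimal illustration, assuming a 365-day year of round-the-clock operation; the function names are hypothetical and not from the slides. Under that assumption 99.999% works out to roughly 5 minutes and 99.9% to about 8.76 hours, in line with the table. For 99.5% it gives about 43.8 hours, slightly more than the 43 hours 23 minutes shown, so the slide's figure presumably rests on a different basis.

```python
# Assumption: 24x7 operation over a 365-day (non-leap) year.
HOURS_PER_YEAR = 365 * 24


def annual_downtime_minutes(availability_percent: float) -> float:
    """Expected minutes of downtime per year for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR * 60


def format_downtime(minutes: float) -> str:
    """Render a minute count as whole hours and minutes."""
    hours, mins = divmod(round(minutes), 60)
    return f"{hours} h {mins} min"


if __name__ == "__main__":
    for level in (99.999, 99.9, 99.5):
        print(level, format_downtime(annual_downtime_minutes(level)))
```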
Page 4
FT Server – Customer Advantages
• 0 Downtime hardware - total hardware redundancy
• Ease of maintenance - modular design
• Runs standard software - no modifications to OS or apps
• Lights out computing – complete remote management
• Lower Total Cost of Ownership - beats clusters
Page 5
Benefit #1 – 99.999% Uptime
[Figure: Conventional system vs. fault-tolerant system. The conventional system has a single set of disks, CPUs, memory, chipset, I/O-PCI and power. The fault-tolerant system has two modules, Module A and Module B, each with a Processing Subsystem (CPUs, memory, chipset, fault detection) and an I/O Subsystem (I/O-PCI, disks, fault detection). The CPUs run in lockstep, the disks are mirrored, and each module has redundant power.]
• Zero switchover time
• No single point of failure
Page 6
Benefit #2 - Ease of Maintenance
Designed for Simplified Service (Customer Replaceable Units (CRU))
Page 7
Benefit #3 - Runs Out-of-the-Box Applications with FT Software Capabilities
NEC FT can provide fault tolerant availability to any “straight out of the box” application!
• FT uses standard operating system:
  – Windows Server 2003 Enterprise Edition
• Requires only one copy of any application
• Applications need not be “cluster aware” or Enterprise version
Page 8
OS/Application Monitoring & Recovery
ExpressCluster SRE
Monitors and Restores Server Functionality
Windows OS - Monitors the Windows operating system resources and drivers; reboots the server if a failure is detected.
Applications - Monitors the application processes; restarts failed applications. Optionally reboots the server if applications have not restarted after a preset number of retries.
[Figure: ExpressCluster Self Recovery Edition running on Windows 2003 and restarting the monitored applications.]
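The application recovery policy described on this slide (restart a failed application, then optionally reboot after a preset number of failed retries) can be sketched as follows. This is an illustrative model only, not the ExpressCluster SRE interface: `MonitoredApp`, `recover` and the returned status strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MonitoredApp:
    """Hypothetical handle on one monitored application process."""
    name: str
    is_running: Callable[[], bool]
    restart: Callable[[], bool]  # returns True if the restart command succeeded


def recover(app: MonitoredApp, max_retries: int, reboot_on_failure: bool = True) -> str:
    """Apply the restart-then-reboot policy and report the action taken."""
    if app.is_running():
        return "healthy"
    for _ in range(max_retries):
        # Restart the failed application and confirm it is actually back.
        if app.restart() and app.is_running():
            return "restarted"
    # Retries exhausted: rebooting the server is optional per the slide.
    return "reboot" if reboot_on_failure else "failed"
```

A caller would poll `recover` periodically for each application and act on a `"reboot"` result; the actual reboot is left out of the sketch.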
Page 9
What is Active Upgrade?
• An advanced method of performing software maintenance by using the system architecture of the FT Series Servers.
• Provides the ability to perform software maintenance without requiring a reboot of the operating system:
– Windows Hot-fixes & Security Patches
– Service Packs
– System Software upgrades from NEC
– Applications (dependent upon characteristics)
Page 10
How does Active Upgrade work?
Concept Overview:
• Mission critical applications run at 100% on Side A.
• Upgrades are performed on Side B while it is offline.
• The application is turned off and turned back on, directed to the system disks on Side B. Application downtime is only 30-100 seconds.
• Full restore option if the upgrade causes undesirable server behavior.
[Figure: Side A and Side B, each with CPU & Memory and I/O, shown in three stages: Normal Operation, System Split & Upgrade, Resynchronization.]
Page 11
VMware FT Virtual Server Environment
[Figure: software stack, top to bottom: Guest Operating Systems; Virtualization Layer; VMware GSX Server 3.1; Microsoft Windows Server 2003 Host Operating System.]
Hardware Requirements
• Memory: 512MB minimum, up to 16GB for Windows Enterprise Edition
• Space: 130GB for the Host plus 1GB minimum for each virtual stack
Guest Operating Systems
• Microsoft: Win 2000; XP; NT 4.0; Win Me; Win 98; Win 95; Win 3.1; MS-DOS
• Linux: RH AS; RH 6.2 – 9.0; SuSE Ent 7, 8; SuSE 7.3 – 9.1; Turbo 7.0, 8.0; Mand 8.0 – 9.2
• Netware: 6.5 SP 1; 6.0 SP 3; 5.1 SP 6; 4.2 SP 9
• Solaris: x86 Platform Edition 9
• FreeBSD: 5.0 & 5.2; 4.8 & 4.9; 4.0 – 4.6.2
Page 12
Benefit #4 – Remote Server Management
• Integrated management:
  – Standalone
  – Remote (In & Out-of-band)
• Complete state coverage:
  – System boot
  – Operating system
• SNMP Based
• Open standards:
  – Standard MIB interface for linking to Tivoli, OpenView, UniCenter, etc.
Industry standard SNMP based management software.
[Figure callouts: Module Control; Module Level System Information; On-Line Diagnostics.]
Page 13
320Ma Server Family
• 320Ma DC - Dual Core CPUs – Data Center Performance Server
  – Logically equivalent to almost 4 x 2.8GHz Xeon CPUs
  – Supports up to 16GB memory
• 320Ma 3.6GHz – 2 x 3.6GHz CPUs – Ideal Virtual Host
  – Supports up to 16GB memory
  – Includes riser card
• 320Ma 3.2GHz – 2 x 3.2GHz CPUs – App Platform
  – Supports up to 8GB memory
  – Ideal SMB or departmental app platform
• 320Ma Single – 1 x 3.2GHz CPU – Volume
  – Supports up to 4GB memory
  – No support for Active Upgrade
  – Not upgradeable to dual CPUs
  – Volume purchases only
Page 14
NEC Storage S1500
• Compact Design:
  – 15 Drives and Electronics in single 3U enclosure (1 hot standby)
  – Supports up to 4 enclosures
    • Total of up to 60 Drive Bays
    • Total of 13.8TB with 300GB HDDs
• Performance:
  – Fiber Channel Optical HBA
    • 4GB/s HBA
  – Integrated Processor(s)
    • 1MB of Cache per controller
    • RAID 0, 1, 5, 10, 50, 6
• Reliability w/Redundant
  – I/O Paths
  – RAID & Cache Controller(s)
  – Power Units & Batteries
Page 15
Benefit #5 – TCO and Performance vs Clusters
[Figure: Normal Operation vs. After Failover. Fault Tolerant Server: App A on OS A runs on two Processor & I/O Modules in lockstep, each with RAID; after a module fails, App A on OS A keeps running on the remaining module. Cluster Servers: System 1 (App A, OS A) and System 2 (App B, OS B) are linked by a heartbeat and shared storage; after System 1 fails, App A and App B both run on System 2 under OS B.]
Page 16
FT Total Cost of Ownership Model
[Chart: cost by phase (Install, Service, Admin., Software, Outage, Hardware), NEC ft vs. Cluster.]
NEC FT should be judged based on Total Cost of Ownership, not Price/Performance. The NEC FT TCO beats a 2-node cluster.
Page 17
FT vs. Cluster Comparison
| | FT series | Cluster solution |
| Availability | 99.999% (5 min. average/year) | 99.9% (>8 hrs. average/year) |
| Recovery time | Zero switchover | Minutes of failover |
| Data loss | None (memory & disk) | Disk protection only |
| Implementation | No work required | Script development & testing |
| Application modification | None required | Recommended |
| OS & Application | Single license | Multi-licenses required |
| Lights out | Extensive | IT support |
| Performance | No impact | Potentially serious impact |
| System integrity | Complete | None |
Page 18
FT vs. Cluster Total Cost Comparison
1-Year Total Cost-of-Ownership study – 320Ma vs. HP DL380 G4

| Total Cost | HP DL380 G4 | NEC 320Ma |
| Base system: System & 1 OS | $22.8K | $31.5K |
| Second Operating Sys | $3.3K | Not Required |
| Legato Cluster Software | $6.5K | Not Required |
| 4 Hr. Service Contract | $1.8K | Not Required |
| Total | $34.4K | $31.5K |

Other Costs (shown as "?" on the slide): Cluster Setup time, Duplicate Application, Enterprise Ed Software
Downtime: $?K
Total TCO: $34.4K + SW + setup + downtime

| Microsoft Software, SW License – 25 Users | Cluster Enabled | Standard SW |
| Exchange 2003 | 2 x $3,999 | 1 x $699 |
| Microsoft SQL | 2 x $6,381 | 1 x $3,585 |

| Cluster Setup | Cost |
| Cluster with Basic Apps by HP | HP Price $6,000 |
| Custom Cluster with Exchange or SQL | 3 weeks, $30,000 |

Downtime: On average 9 hours per year
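The slide's "Total TCO: $34.4K + SW + setup + downtime" can be worked through with the listed prices. The sketch below is only an illustration: the dictionary and function names are hypothetical, and the example scenario (Exchange 2003 for 25 users plus HP's basic cluster setup, downtime left at zero because the slide gives no dollar figure) is an assumption, not a case taken from the deck.

```python
# Prices in dollars, as listed on the slide.
CLUSTER_BASE = {
    "System & 1 OS": 22_800,
    "Second operating system": 3_300,
    "Legato cluster software": 6_500,
    "4-hour service contract": 1_800,
}
FT_BASE = {"System & 1 OS": 31_500}

# 25-user licenses: the cluster needs two cluster-enabled copies,
# the FT server needs one standard copy.
LICENSES = {
    "Exchange 2003": {"cluster": 2 * 3_999, "ft": 1 * 699},
    "Microsoft SQL": {"cluster": 2 * 6_381, "ft": 1 * 3_585},
}


def first_year_cost(base: dict, software: int = 0, setup: int = 0, downtime: int = 0) -> int:
    """Base system total plus software, setup and downtime costs."""
    return sum(base.values()) + software + setup + downtime


if __name__ == "__main__":
    exchange = LICENSES["Exchange 2003"]
    print("Cluster:", first_year_cost(CLUSTER_BASE, exchange["cluster"], setup=6_000))
    print("FT:     ", first_year_cost(FT_BASE, exchange["ft"]))
```

With these inputs the base totals reproduce the slide's $34.4K and $31.5K, and the gap widens once duplicate cluster-enabled licenses and setup are added.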
Page 19
FT Total Cost of Ownership vs GP servers
[Chart: cost by phase (Install, Service, Admin., Software, Outage, Hardware), NEC ft vs. GP server.]
NEC FT should be judged based on Total Cost of Ownership including the cost of downtime.
Page 20
TCO – FT vs GP Hot Stand By
NEC 320Ma 3.2 versus HP DL380 G4 Standalone Server with Cold Standby

| Acquisition Costs (AC) | HP | NEC |
| Hardware | $10,490 | $29,495 |
| OS Software | $1,598 | $2,000 |
| Installation | $1,550 | $1,999 |
| 3 Years Support | $1,898 | $1,475 |
| Subtotal | $15,536 | $34,969 |

| Downtime (6dy x 12hr), 3 Years | $2.5K per hour | $5K per hour | $10K per hour |
| NEC @ 99.999% | $280 | $561 | $1,123 |
| HP @ 99.9% | $28,170 | $56,340 | $112,680 |

| TCO (AC plus Downtime) | $2.5K per hour | $5K per hour | $10K per hour |
| NEC @ 99.999% | $35,249 | $35,530 | $36,092 |
| HP @ 99.9% | $43,706 | $71,876 | $128,216 |

NEC: TCO superiority!
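The downtime and TCO rows can be reproduced approximately from the acquisition subtotals and the availability figures. The sketch below assumes that "6dy x 12hr" means 313 operating days a year (a six-day week) at 12 hours a day over 3 years; that reading is an assumption, chosen because it matches the HP figures to the dollar, while the NEC figures come out within a few dollars of the slide. Function and constant names are hypothetical.

```python
# Assumption: 313 operating days/year x 12 hours/day x 3 years.
OPERATING_HOURS = 313 * 12 * 3

ACQUISITION = {"HP": 15_536, "NEC": 34_969}   # subtotals from the slide
AVAILABILITY = {"HP": 99.9, "NEC": 99.999}    # percent, from the slide


def downtime_cost(availability_percent: float, cost_per_hour: float,
                  hours: float = OPERATING_HOURS) -> float:
    """Expected downtime cost over the operating hours considered."""
    return hours * (1 - availability_percent / 100) * cost_per_hour


def tco(vendor: str, cost_per_hour: float) -> float:
    """Acquisition cost plus expected downtime cost."""
    return ACQUISITION[vendor] + downtime_cost(AVAILABILITY[vendor], cost_per_hour)


if __name__ == "__main__":
    for rate in (2_500, 5_000, 10_000):
        print(rate, round(tco("NEC", rate)), round(tco("HP", rate)))
```

The takeaway matches the slide: the higher the hourly cost of downtime, the more the 99.9% system's downtime dominates its lower purchase price.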
Page 21
Blade Servers
• General purpose servers in a smaller form factor
• All reliability issues for general purpose servers apply to blade servers
• High availability is achieved with software clusters
[Figure: Express5800/120Ba-4 blade with labeled parts: Network, SCSI I/F; Intel Processors; Server Control Processor (BMC); Expansion Slot; Memory; CPU Cooling Fan.]
Page 22
Disaster Tolerance Solution
• Since 9/11, disaster tolerance has become a critical requirement for IT Directors
• The challenge:
  – Protect mission critical data in the event of the destruction of local computing resources
  – Allow surviving resources to continue working with access to the latest data
  – Don’t break the IT budget to do it.
Page 23
FT Disaster Tolerant Solution
[Figure: Site A and Site B each have an FT Server (320Ma) with External Storage (S1500) on a local LAN. The two sites are connected through the corporate network over a WAN (T1, 1.5Mbps, 60ms round-trip latency). Callouts: protection against unexpected HW failures; protection against unexpected SW failures; protection against major disasters.]
Page 24
Disaster Scenario
Site A – 40 SQL database users are logged into the local FT server when a fire breaks out in the computer room.
Failover:
• Users lose access to server A.
• When the network loses contact with server A, it initiates a failover using a “Floating IP Address or Floating IP Name”. Failover takes about two minutes.
• Users log back into their accounts, but they are now running on remote server B via the corporate network, with full up-to-date data.
[Figure: FT servers at Site A and Site B, each on a LAN, connected over the WAN through the corporate network.]
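The failover trigger described above (loss of contact with the primary, then redirecting the floating address to the remote site) can be modeled with a simple timeout. This is a conceptual sketch, not the product's mechanism: `FailoverMonitor`, its fields and the 120-second threshold are assumptions made for illustration, the last loosely based on the slide's "about two minutes".

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Tracks which site currently answers at the floating IP address."""
    timeout_s: float = 120.0      # assumption: loosely "about two minutes"
    active_site: str = "A"
    last_contact_s: float = 0.0

    def heartbeat(self, now_s: float) -> None:
        """Record the time of the latest contact with the server at Site A."""
        self.last_contact_s = now_s

    def check(self, now_s: float) -> str:
        """Fail over to Site B once Site A has been silent for the timeout."""
        silent_for = now_s - self.last_contact_s
        if self.active_site == "A" and silent_for >= self.timeout_s:
            self.active_site = "B"  # floating IP now resolves to server B
        return self.active_site
```

Note that the monitor never switches back to Site A by itself; in the deck, failback is a separate step that follows resynchronization.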
Page 25
Disaster Scenario (continued)
Recovery:
• Servers reconnect and the database begins resynchronizing changed or new data.
• Once synchronization completes, the reset can be automatic or on-command:
  – Automatic – As soon as the databases are ready, users lose their connection to the remote server. When they log in again they are directed to the local server.
  – On-command – IT managers can initiate the reset later in the day, with appropriate warnings to users.
[Figure: FT servers at Site A and Site B, each on a LAN, resynchronizing over the WAN through the corporate network.]
Page 26
Conclusions
5 Customer Benefits of the FT Server:
  – 99.999% Uptime
  – Easy Maintenance
  – Single copy of Standard Software
  – Great Remote Management Tools
  – FT TCO Beats Software Clusters
Disaster Recovery Solution:
  – No Data Loss
  – Less than 4 minutes of application loss
  – No reprogramming of users or devices
  – Affordable to Small/Medium Businesses