Page 1: Gray FT 4/24/95 1 Dependable Computing Systems Jim Gray UC Berkeley McKay Lecture 25 April 1995 Gray @ Microsoft.com Talk 1: Many little will win over.

Gray FT 4/24/95 1

Dependable Computing Systems

Jim Gray
UC Berkeley McKay Lecture

25 April 1995
Gray @ Microsoft.com

Talk 1: Many little will win over few big. So parallel computers are in your future.

Talk 2: Database folks do parallelism with dataflow. They get near-linear scaleup, automatic parallelism.

Talk 3: Fault tolerance is important if you have thousands of parts (many little machines have many little failures).


Gray FT 4/24/95 2

[Figure: 100 Nodes (1 Tips); 1,000 discs = 10 Terrorbytes; 100 Tape Transports = 1,000 tapes = 1 PetaByte; High Speed Network (10 Gb/s)]

The Airplane Rule

“A two engine airplane has twice as many engine problems.”
“A thousand-engine airplane has thousands of engine problems.”

Fault Tolerance is KEY!
Mask and repair faults

Internet: Node fails every 2 weeks
Vendors: Disk fails every 40 years

Here: node “fails” every 20 minutes, disk fails every 2 weeks.
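
The airplane-rule figures follow from dividing one part's MTTF by the number of parts, assuming independent failures at a constant rate. A minimal sketch of that arithmetic; the function name is hypothetical, and the count of 1,000 nodes is an assumption chosen because it reproduces the 20-minute figure quoted on the slide.

```python
HOURS_PER_YEAR = 365 * 24
HOURS_PER_WEEK = 7 * 24

def fleet_failure_interval_hours(component_mttf_hours: float, count: int) -> float:
    """Mean time between failures somewhere in a fleet of identical parts.

    Assumes independent failures at a constant rate, so failure rates add.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    return component_mttf_hours / count

# 1,000 discs rated at 40 years each; 1,000 nodes (assumed) at 2 weeks each.
disk_interval = fleet_failure_interval_hours(40 * HOURS_PER_YEAR, 1000)
node_interval = fleet_failure_interval_hours(2 * HOURS_PER_WEEK, 1000)
print(f"some disc fails every {disk_interval / HOURS_PER_WEEK:.1f} weeks")
print(f"some node fails every {node_interval * 60:.0f} minutes")
```

The per-part ratings stay the same; only the number of parts changes how often something, somewhere, is broken.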


Gray FT 4/24/95 3

Outline

• Does fault tolerance work?
• General methods to mask faults.
• Software-fault tolerance
• Summary


Gray FT 4/24/95 4

DEPENDABILITY: The 3 ITIES

• RELIABILITY / INTEGRITY: Does the right thing (also large MTTF)

• AVAILABILITY: Does it now (also large $MTTF / (MTTF + MTTR)$)

System Availability: If 90% of terminals up & 99% of DB up?
(=> 89% of transactions are serviced on time).

• Holistic vs Reductionist view

[Figure: Security, Integrity / Reliability, Availability]
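
The availability ratio and the 89% figure can be checked with a few lines. This is a minimal sketch, assuming the terminals and the database fail independently and a transaction needs both; the function names are hypothetical.

```python
def availability(mttf: float, mttr: float) -> float:
    """Fraction of time a module is up: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

def series_availability(*parts: float) -> float:
    """Availability of a path that needs every part up (independent parts)."""
    result = 1.0
    for a in parts:
        result *= a
    return result

# 90% of terminals up and 99% of the database up.
end_to_end = series_availability(0.90, 0.99)
print(f"{end_to_end:.1%}")
```

Multiplying the parts is the reductionist view; the holistic view asks what the user at the terminal actually sees.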


Gray FT 4/24/95 5

High Availability System Classes
Goal: Build Class 6 Systems

System Type              Unavailable (min/year)   Availability   Availability Class
Unmanaged                50,000                   90.%           1
Managed                  5,000                    99.%           2
Well Managed             500                      99.9%          3
Fault Tolerant           50                       99.99%         4
High-Availability        5                        99.999%        5
Very-High-Availability   .5                       99.9999%       6
Ultra-Availability       .05                      99.99999%      7
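
The class number is the count of leading nines in the availability figure, and the downtime column is the unavailable fraction of a year. A small sketch of both conversions; the function names are hypothetical, and the slide rounds a 525,600-minute year to 500,000, so the computed minutes come out slightly higher than the table's.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(avail: float) -> float:
    """Expected unavailable minutes per year for a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR

def availability_class(avail: float) -> int:
    """Number of leading nines: floor(log10(1 / (1 - availability)))."""
    if not 0.0 <= avail < 1.0:
        raise ValueError("availability must be in [0, 1)")
    # The small epsilon guards against floating-point error at exact nines.
    return math.floor(-math.log10(1.0 - avail) + 1e-6)

for a in (0.9, 0.99, 0.999, 0.9999, 0.99999, 0.999999, 0.9999999):
    print(availability_class(a), f"{downtime_minutes_per_year(a):,.2f}")
```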


Gray FT 4/24/95 7

Case Studies - Japan
"Survey on Computer Security", Japan Info Dev Corp., March 1986. (trans: Eiichi Watanabe).

Outage cause                     Share of outages   MTTF
Vendor (hardware and software)   42%                5 Months
Application software             25%                9 Months
Communications lines             12%                1.5 Years
Operations                       9.3%               2 Years
Environment                      11.2%              2 Years
Overall                                             10 Weeks

1,383 institutions reported (6/84 - 7/85)
7,517 outages, MTTF ~ 10 weeks, avg duration ~ 90 MINUTES

TO GET 10 YEAR MTTF MUST ATTACK ALL THESE AREAS
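
The overall figure of about 10 weeks is what the per-cause MTTFs imply when their failure rates are added. A minimal sketch, assuming independent causes with constant rates; the function and dictionary names are hypothetical, and the inputs are the slide's rounded per-cause values in months.

```python
def combined_mttf(mttfs_by_cause: dict[str, float]) -> float:
    """MTTF of a system that goes down when any one cause strikes.

    Assumes independent causes with constant failure rates, so the
    rates (1 / MTTF) add.
    """
    total_rate = sum(1.0 / m for m in mttfs_by_cause.values())
    return 1.0 / total_rate

# Per-cause MTTFs from the slide, in months.
japan_survey = {
    "vendor": 5.0,
    "application software": 9.0,
    "communications lines": 18.0,
    "operations": 24.0,
    "environment": 24.0,
}
months = combined_mttf(japan_survey)
print(f"{months:.1f} months, about {months * 52 / 12:.0f} weeks")
```

Because rates add, removing only the largest cause leaves the others to dominate, which is why every area has to be attacked to reach a 10-year MTTF.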


Gray FT 4/24/95 8

Case Studies - Tandem
Outage Reports to Vendor

Totals:
More than 7,000 Customer years
More than 30,000 System years
More than 80,000 Processor years
More than 200,000 Disc Years

Summary Tandem EWR Data
                   1985      1987       1989
Customers          1000      1300       2000
EWR Customers      ?         ?          267
Outage Customers   176       205        164
Systems            2400      6000       9000
Processors         7,000     15,000     25,500
Discs              16,000    46,000     74,000
Cases              305       227        501
Reports            491       535        766
Faults             592       609        892
Outages            285       294        438
System MTTF        8 years   20 years   21 years

Systematic Under-reporting
But ratios & trends interesting


Gray FT 4/24/95 10

Case Studies - Tandem Trends
Reported MTTF by Component

[Chart: Mean Time to System Failure (years) by Cause, 1985 / 1987 / 1989; series: software, hardware, maintenance, operations, environment, total; vertical axis 0 to 450]

              1985   1987   1990
SOFTWARE      2      53     33     Years
HARDWARE      29     91     310    Years
MAINTENANCE   45     162    409    Years
OPERATIONS    99     171    136    Years
ENVIRONMENT   142    214    346    Years
SYSTEM        8      20     21     Years

Remember Systematic Under-reporting


Gray FT 4/24/95 11

Summary

• Current Situation: ~4-year MTTF => Fault Tolerance Works.
• Hardware is GREAT (maintenance and MTTF).
• Software masks most hardware faults.
• Many hidden software outages in operations:
  – New System Software.
  – New Application Software.
  – Utilities.
• Must make all software ONLINE.
• Software seems to define a 30-year MTTF ceiling.
• Reasonable Goal: 100-year MTTF. Class 4 today => class 6 tomorrow.


Gray FT 4/24/95 12

Outline

• Does fault tolerance work?
• General methods to mask faults.
• Software-fault tolerance
• Summary


Gray FT 4/24/95 13

Key Idea

{ Architecture, Software, Distribution } Masks { Hardware Faults, Environmental Faults, Maintenance }

• Software automates / eliminates operators

So,

• In the limit there are only software & design faults.

Software-fault tolerance is the key to dependability.

INVENT IT!


Gray FT 4/24/95 14

Fault Tolerance Techniques

• FAIL FAST MODULES: work or stop

• SPARE MODULES: instant repair time.

• INDEPENDENT MODULE FAILS by design: $MTTF_{pair} \approx MTTF^2 / MTTR$ (so want tiny MTTR)

• MESSAGE BASED OS: Fault Isolation
  software has no shared memory.

• SESSION-ORIENTED COMM: Reliable messages
  detect lost/duplicate messages
  coordinate messages with commit

• PROCESS PAIRS: Mask Hardware & Software Faults

• TRANSACTIONS: give A.C.I.D. (simple fault model)
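
One common way for a session to detect lost and duplicate messages is to number them. The sketch below assumes per-session sequence numbers starting at 0; the slide does not say which mechanism was used, and the class and return values are hypothetical.

```python
class SessionReceiver:
    """Receiving end of a session whose messages are numbered 0, 1, 2, ..."""

    def __init__(self) -> None:
        self.next_expected = 0
        self.delivered: list[str] = []

    def receive(self, seq: int, payload: str) -> str:
        if seq < self.next_expected:
            return "duplicate"      # already delivered, so drop it
        if seq > self.next_expected:
            return "gap"            # an earlier message was lost; ask for a resend
        self.delivered.append(payload)
        self.next_expected += 1
        return "ok"

rx = SessionReceiver()
results = [
    rx.receive(0, "debit"),
    rx.receive(0, "debit"),     # retransmission of message 0
    rx.receive(2, "audit"),     # message 1 has not arrived yet
    rx.receive(1, "credit"),
]
print(results, rx.delivered)
```

Each payload is delivered at most once and in order, which is the property the layers above rely on.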


Gray FT 4/24/95 15

Example: the FT Bank

[Figure: Fault Tolerant Computer, Backup System]

Modularity & Repair are KEY:
vonNeumann needed 20,000x redundancy in wires and switches
We use 2x redundancy.

Redundant hardware can support peak loads (so not redundant)

System MTTF >10 YEAR (except for power & terminals)


Gray FT 4/24/95 16

Fail-Fast is Good, Repair is Needed

Improving either MTTR or MTTF gives benefit
Simple redundancy does not help much.

Lifecycle of a module: fail-fast gives short fault latency

High Availability is low UN-Availability
$Unavailability \approx MTTR / MTTF$


Gray FT 4/24/95 17

Hardware Reliability/Availability (how to make it fail fast)

Comparator Strategies:

Duplex:
  Fail-Fast: fail if either fails (e.g. duplexed cpus)
  vs Fail-Soft: fail if both fail (e.g. disc, atm,...)
  Note: in recursive pairs, parent knows which is bad.

Triplex:
  Fail-Fast: fail if 2 fail (triplexed cpus)
  Fail-Soft: fail if 3 fail (triplexed FailFast cpus)

[Figure: Basic FailFast Designs (Pair, Triplex); Recursive Designs; Recursive Availability Designs (Pair & Spare, Triple Modular Redundancy)]


Gray FT 4/24/95 18

Redundant Designs have Worse MTTF!

THIS IS NOT GOOD: Variance is lower but MTTF is worse
Simple redundancy does not improve MTTF (sometimes hurts).
This is just an example of the airplane rule.

[State diagrams: each design steps down from all modules working to 0 working; the step taken while k modules work has mean time mttf/k]

Design                    MTTF
Duplex: fail fast         mttf/2
Duplex: fail soft         1.5*mttf
TMR: fail fast            5/6*mttf
TMR: fail soft            11/6*mttf
Pair & Spare: fail fast   3/4*mttf
Pair & Spare: fail soft   ~2.1*mttf
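
The multipliers for each design can be reproduced by adding mttf/k for every stage the design passes through before it stops. A minimal sketch, assuming independent modules with constant failure rates and no repair; the stage lists are my reading of the state diagrams, and the function name is hypothetical.

```python
from fractions import Fraction

def mttf_multiple(working_counts):
    """Expected lifetime in units of one module's MTTF.

    working_counts lists how many modules are running during each stage
    before the design stops; with k independent modules running, the next
    failure arrives after mttf / k on average.
    """
    return sum(Fraction(1, k) for k in working_counts)

designs = {
    "Duplex: fail fast": [2],            # stops at the first failure
    "Duplex: fail soft": [2, 1],         # stops when both have failed
    "TMR: fail fast": [3, 2],            # stops when the majority is lost
    "TMR: fail soft": [3, 2, 1],         # stops when all three have failed
    "Pair & Spare: fail fast": [4, 2],   # one failure stops a whole pair
    "Pair & Spare: fail soft": [4, 3, 2, 1],
}
for name, stages in designs.items():
    m = mttf_multiple(stages)
    print(f"{name:26s} {m} = {float(m):.2f} * mttf")
```

Every fail-fast design scores below 1, so without repair the extra modules only add failure opportunities.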


Gray FT 4/24/95 19

Add Repair: Get $10^4$ Improvement

Availability estimates: 1 year MTTF modules, 12-hour MTTR

Design                      MTTF            EQUATION                        COST
SIMPLEX                     1 year          MTTF                            1
DUPLEX: FAIL FAST           ~0.5 years      MTTF/2                          2+
DUPLEX: FAIL SOFT           ~1.5 years      MTTF(3/2)                       2+
TRIPLEX: FAIL FAST          .8 year         MTTF(5/6)                       3+
TRIPLEX: FAIL SOFT          1.8 year        1.8 MTTF                        3+
Pair and spare: FAIL-FAST   ~.7 year        MTTF(3/4)                       4+
TRIPLEX WITH REPAIR         >$10^5$ years   $MTTF^3 / (3 \cdot MTTR^2)$     3+
Duplex fail soft + REPAIR   >$10^4$ years   $MTTF^2 / (2 \cdot MTTR)$       4+

[State diagrams: the TMR and duplex chains (fail fast and fail soft) with mttr repair arcs added; annotated $10^4$ mttf and $10^5$ mttf]
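
The two repair expressions in the table can be evaluated directly for the stated 1-year modules and 12-hour repair. This sketch only evaluates the expressions as written on the slide; the function names are hypothetical. With these inputs they give about 365 years for the duplex and about 1.8 × 10^5 years for the triplex, so the table's rounded entries may rest on assumptions that are not recoverable from the transcript.

```python
HOURS_PER_YEAR = 365 * 24

def duplex_with_repair(mttf: float, mttr: float) -> float:
    """MTTF^2 / (2 * MTTR), the duplex fail-soft expression from the table."""
    return mttf ** 2 / (2 * mttr)

def triplex_with_repair(mttf: float, mttr: float) -> float:
    """MTTF^3 / (3 * MTTR^2), the triplex expression from the table."""
    return mttf ** 3 / (3 * mttr ** 2)

mttf_h = 1 * HOURS_PER_YEAR   # 1-year modules
mttr_h = 12.0                 # 12-hour repair
duplex_years = duplex_with_repair(mttf_h, mttr_h) / HOURS_PER_YEAR
triplex_years = triplex_with_repair(mttf_h, mttr_h) / HOURS_PER_YEAR
print(f"duplex:  {duplex_years:,.0f} years")
print(f"triplex: {triplex_years:,.0f} years")
```

Both expressions grow as MTTR shrinks: halving the repair time doubles the duplex figure and quadruples the triplex figure.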


Gray FT 4/24/95 20

When To Repair?

Chances Of Tolerating A Fault are 1000:1 (class 3)
A 1995 study: Processor & Disc Rated At ~ 10khr MTTF

Computed Single Failures   Observed Double Fails   Ratio
10k Processor Fails        14 Double               ~ 1000 : 1
40k Disc Fails             26 Double               ~ 1000 : 1

Hardware Maintenance:
  On-Line Maintenance "Works" 999 Times Out Of 1000.
  The chance a duplexed disc will fail during maintenance? 1:1000
  Risk Is 30x Higher During Maintenance
  => Do It Off Peak Hour

Software Maintenance:
  Repair Only Virulent Bugs
  Wait For Next Release To Fix Benign Bugs


Gray FT 4/24/95 21

OK: So Far

Hardware fail-fast is easy
Redundancy plus Repair is great (Class 7 availability)
Hardware redundancy & repair is via modules.
How can we get instant software repair?

We Know How To Get Reliable Storage
  RAID Or Dumps And Transaction Logs.

We Know How To Get Available Storage
  Fail Soft Duplexed Discs (RAID 1...N).

? HOW DO WE GET RELIABLE EXECUTION?
? HOW DO WE GET AVAILABLE EXECUTION?


Gray FT 4/24/95 22

Outline

• Does fault tolerance work?
• General methods to mask faults.
• Software-fault tolerance
• Summary


Gray FT 4/24/95 23

Software Techniques: Learning from Hardware

Recall that most outages are not hardware.
Most outages in Fault Tolerant Systems are SOFTWARE

Fault Avoidance Techniques: Good & Correct design.

After that: Software Fault Tolerance Techniques:
  Modularity (isolation, fault containment)
  Design diversity
  N-Version Programming: N-different implementations
  Defensive Programming: Check parameters and data
  Auditors: Check data structures in background
  Transactions: to clean up state after a failure

Paradox: Need Fail-Fast Software
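
Defensive programming and auditors both serve the fail-fast goal: detect a fault early and stop rather than continue with bad state. A toy sketch under that reading; the `Account` class, the `FailFastError` exception, and the audit rule are all hypothetical illustrations, not code from the talk.

```python
class FailFastError(Exception):
    """Raised to stop the module as soon as a fault is detected."""

class Account:
    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.history: list[int] = [balance]

    def withdraw(self, amount: int) -> None:
        # Defensive programming: check parameters before touching state.
        if not isinstance(amount, int) or amount <= 0:
            raise FailFastError(f"bad amount: {amount!r}")
        if amount > self.balance:
            raise FailFastError("insufficient funds")
        self.balance -= amount
        self.history.append(-amount)

def audit(account: Account) -> None:
    """Auditor: re-derive the balance and stop if the structure is corrupt."""
    if sum(account.history) != account.balance:
        raise FailFastError("history does not match balance")

acct = Account(100)
acct.withdraw(30)
audit(acct)                 # passes: the history sums to the balance
try:
    acct.withdraw(-5)       # rejected before any state changes
except FailFastError as err:
    print("stopped:", err)
```

In a real system the auditor would run in the background; here it is called directly to keep the sketch short.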


Gray FT 4/24/95 24

Fail-Fast and High-Availability Execution

Software N-Plexing: Design Diversity
N-Version Programming
  Write the same program N-Times (N > 3)
  Compare outputs of all programs and take majority vote

Process Pairs: Instant restart (repair)
  Use Defensive programming to make a process fail-fast
  Have restarted process ready in separate environment
  Second process “takes over” if primary faults
  Transaction mechanism can clean up distributed state if takeover in middle of computation.

[Figure: LOGICAL PROCESS = PROCESS PAIR (SESSION, PRIMARY PROCESS, BACKUP PROCESS, STATE INFORMATION)]
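
The majority vote used in N-version programming can be sketched in a few lines. The three versions below are hypothetical stand-ins for independently written implementations, one with a deliberate bug; the voter raises an error when no strict majority exists, which is an assumption about how a tie should be handled.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by more than half of the versions.

    Raises RuntimeError when no value has a strict majority, so the
    voter itself is fail-fast.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among versions")
    return value

# Three hypothetical, independently written versions of the same function.
def version_a(x):
    return x * x

def version_b(x):
    return x ** 2

def version_c(x):
    return x * x + (1 if x == 7 else 0)   # deliberate bug at x == 7

versions = [version_a, version_b, version_c]
print(majority_vote([v(7) for v in versions]))   # the faulty version is outvoted
```

Voting masks a fault only while the versions fail independently; a shared design error outvotes the correct answer.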


Gray FT 4/24/95 25

What Is MTTF of N-Version Program?

First fails after MTTF/N
Second fails after MTTF/(N-1),...
so MTTF(1/N + 1/(N-1) + ... + 1/2)
harmonic series goes to infinity, but VERY slowly
for example 100-version programming gives ~4 MTTF of 1-version programming
Reduces variance

N-Version Programming Needs REPAIR
If a program fails, must reset its state from other programs.
=> programs have common data/state representation.

How does this work for
  Database Systems?
  Operating Systems?
  Network Systems?

Answer: I don’t know.
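
The slow growth of the harmonic sum is easy to see numerically. A minimal sketch that evaluates the sum 1/N + 1/(N-1) + ... + 1/2 as written on the slide; the function name is hypothetical.

```python
def n_version_mttf_multiple(n: int) -> float:
    """1/N + 1/(N-1) + ... + 1/2, in units of one version's MTTF."""
    if n < 2:
        raise ValueError("need at least two versions")
    return sum(1.0 / k for k in range(2, n + 1))

for n in (3, 10, 100, 1000):
    print(n, round(n_version_mttf_multiple(n), 2))
```

Going from 100 to 1,000 versions adds only a little more than two single-version MTTFs, which is why repair matters more than extra versions.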


Gray FT 4/24/95 26

Why Process Pairs Mask Faults
Many Software Faults are Soft

After
  Design Review
  Code Inspection
  Alpha Test
  Beta Test
  10k Hrs Of Gamma Test (Production)

Most Software Faults Are Transient
  MVS Functional Recovery Routines   5:1
  Tandem Spooler                     100:1
  Adams                              >100:1

Terminology:
  Heisenbug: Works On Retry
  Bohrbug: Faults Again On Retry

Adams: "Optimizing Preventative Service of Software Products", IBM J R&D, 28.1, 1984
Gray: "Why Do Computers Stop", Tandem TR85.7, 1985
Mourad: "The Reliability of the IBM/XA Operating System", 15 ISFTCS, 1985.


Gray FT 4/24/95 27

Process Pair Repair Strategy

If software fault (bug) is a Bohrbug, then there is no repair
  “wait for the next release” or
  “get an emergency bug fix” or
  “get a new vendor”

If software fault is a Heisenbug, then repair is
  reboot and retry or
  switch to backup process (instant restart)

PROCESS PAIRS Tolerate
  Hardware Faults
  Heisenbugs

Repair time is seconds, could be milliseconds if time is critical

Flavors Of Process Pair:
  Lockstep
  Automatic
  State Checkpointing
  Delta Checkpointing
  Persistent

[Figure: LOGICAL PROCESS = PROCESS PAIR (SESSION, PRIMARY PROCESS, BACKUP PROCESS, STATE INFORMATION)]
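
A process pair of the state-checkpointing flavor can be imitated inside a single program to show the control flow. This is a toy sketch: both "processes" are ordinary objects in one address space, the fault is injected by a flag, and every name is hypothetical. A real pair runs in separate processors and exchanges checkpoints as messages.

```python
class FailFast(Exception):
    """The primary detected a fault and stopped."""

class CounterProcess:
    """One member of the pair; its whole state is a single running total."""

    def __init__(self) -> None:
        self.total = 0

    def apply(self, amount: int, inject_fault: bool = False) -> int:
        if inject_fault:
            raise FailFast("transient fault")   # stands in for a Heisenbug
        self.total += amount
        return self.total

class ProcessPair:
    """Logical process: a primary plus a backup kept current by checkpoints."""

    def __init__(self) -> None:
        self.primary = CounterProcess()
        self.backup = CounterProcess()
        self.takeovers = 0

    def request(self, amount: int, inject_fault: bool = False) -> int:
        try:
            result = self.primary.apply(amount, inject_fault)
        except FailFast:
            # Takeover: the backup becomes primary and retries the request;
            # a fresh backup is started in its place.
            self.primary, self.backup = self.backup, CounterProcess()
            self.takeovers += 1
            result = self.primary.apply(amount)
        # State checkpoint: copy the primary's state to the backup.
        self.backup.total = self.primary.total
        return result

pair = ProcessPair()
pair.request(10)
print(pair.request(5, inject_fault=True), pair.takeovers)
```

The retry succeeds only because the injected fault is transient; a Bohrbug would fault again in the backup.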


Gray FT 4/24/95 28

How Takeover Masks Failures

Server Resets At Takeover But What About
  Application State?
  Database State?
  Network State?

Answer: Use Transactions To Reset State!
  Abort Transaction If Process Fails.
  Keeps Network "Up"
  Keeps System "Up"
  Reprocesses Some Transactions On Failure

[Figure: LOGICAL PROCESS = PROCESS PAIR (SESSION, PRIMARY PROCESS, BACKUP PROCESS, STATE INFORMATION)]


Gray FT 4/24/95 29

PROCESS PAIRS - SUMMARY

Transactions Give Reliability
Process Pairs Give Availability
Process Pairs Are Expensive & Hard To Program

Transactions + Persistent Process Pairs
=> Fault Tolerant
     Sessions
     Execution

When Tandem Converted To This Style
  Saved 3x Messages
  Saved 5x Message Bytes
  Made Programming Easier


Gray FT 4/24/95 30

SYSTEM PAIRS FOR HIGH AVAILABILITY

Programs, Data, Processes Replicated at two sites.
Pair looks like a single system.
System becomes logical concept
Like Process Pairs: System Pairs.
Backup receives transaction log (spooled if backup down).
If primary fails or operator switches, backup offers service.

[Figure: Primary, Backup]
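
Shipping the transaction log to the backup, and spooling it while the backup is down, can be sketched as follows. The classes, the key/value log records, and the `up` flag are hypothetical simplifications; the commercial products named later in the talk are far more involved.

```python
class BackupSite:
    def __init__(self) -> None:
        self.up = True
        self.db: dict[str, int] = {}

    def apply(self, record: tuple[str, int]) -> None:
        key, value = record
        self.db[key] = value

class PrimarySite:
    def __init__(self, backup: BackupSite) -> None:
        self.backup = backup
        self.db: dict[str, int] = {}
        self.spool: list[tuple[str, int]] = []   # log records awaiting shipment

    def commit(self, key: str, value: int) -> None:
        self.db[key] = value
        self.spool.append((key, value))
        self.ship()

    def ship(self) -> None:
        """Send spooled log records in order while the backup is reachable."""
        while self.spool and self.backup.up:
            self.backup.apply(self.spool.pop(0))

backup = BackupSite()
primary = PrimarySite(backup)
primary.commit("acct-1", 100)
backup.up = False
primary.commit("acct-1", 70)      # spooled: the backup is down
primary.commit("acct-2", 5)
backup.up = True
primary.ship()                    # the backup catches up from the spool
print(backup.db == primary.db)
```

Records are applied in commit order, so the backup always holds a consistent, if slightly stale, copy.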


Gray FT 4/24/95 31

SYSTEM PAIR CONFIGURATION OPTIONS

Mutual Backup: each has 1/2 of Database & Application
Hub: One site acts as backup for many others
In General can be any directed graph
Stale replicas: Lazy replication

[Figure: configurations drawn from Primary, Backup, and Copy nodes]


Gray FT 4/24/95 32

SYSTEM PAIRS FOR: SOFTWARE MAINTENANCE

Step 1: Both systems are running V1.
Step 2: Backup is cold-loaded as V2.
Step 3: SWITCH to Backup.
Step 4: Backup is cold-loaded as V2 D30.

[Figure: versions on the two systems at each step. Step 1: (Primary) V1, (Backup) V1. Step 2: (Primary) V1, (Backup) V2. Step 3: (Backup) V1, (Primary) V2. Step 4: (Backup) V2, (Primary) V2.]

Similar ideas apply to:
  Database Reorganization
  Hardware modification (e.g. add discs, processors,...)
  Hardware maintenance
  Environmental changes (rewire, new air conditioning)
  Move primary or backup to new location.


Gray FT 4/24/95 33

SYSTEM PAIR BENEFITS

Protects against ENVIRONMENT: different sites
  weather
  utilities
  sabotage

Protects against OPERATOR FAILURE: two sites, two sets of operators

Protects against MAINTENANCE OUTAGES
  work on backup
  software/hardware install/upgrade/move...

Protects against HARDWARE FAILURES
  backup takes over

Protects against TRANSIENT SOFTWARE ERRORS

Commercial systems:
  Digital's Remote Transaction Router (RTR)
  Tandem's Remote Database Facility (RDF)
  IBM's Cross Recovery XRF (both in same campus)
  Oracle, Sybase, Informix, Microsoft... replication


Gray FT 4/24/95 34

SUMMARY

FT systems fail for the conventional reasons
  Environment   mostly
  People        sometimes
  Software      mostly
  Hardware      Rarely

MTTF of FT SYSTEMS ~ 50X conventional
  ~ years vs weeks

Fail-Fast Modules + Reconfiguration + Repair =>
  Good Hardware Fault Tolerance

Transactions + Process Pairs =>
  Good Software Fault Tolerance (Repair)

System Pairs Hide Many Faults

Challenge: Tolerate Human Errors
  (make system simpler to manage, operate, and maintain)


Gray FT 4/24/95 35

Key Idea

{ Architecture, Software, Distribution } Masks { Hardware Faults, Environmental Faults, Maintenance }

• Software automates / eliminates operators

So,

• In the limit there are only software & design faults.

Software-fault tolerance is the key to dependability.

INVENT IT!


Gray FT 4/24/95 36

References

Adams, E. (1984). “Optimizing Preventative Service of Software Products.” IBM Journal of Research and Development. 28(1): 2-14.

Anderson, T. and B. Randell. (1979). Computing Systems Reliability.

Garcia-Molina, H. and C. A. Polyzois. (1990). Issues in Disaster Recovery. 35th IEEE Compcon 90. 573-577.

Gray, J. (1986). Why Do Computers Stop and What Can We Do About It. 5th Symposium on Reliability in Distributed Software and Database Systems. 3-12.

Gray, J. (1990). “A Census of Tandem System Availability between 1985 and 1990.” IEEE Transactions on Reliability. 39(4): 409-418.

Gray, J. N., Reuter, A. (1993). Transaction Processing Concepts and Techniques. San Mateo, Morgan Kaufmann.

Lampson, B. W. (1981). Atomic Transactions. Distributed Systems -- Architecture and Implementation: An Advanced Course. ACM, Springer-Verlag.

Laprie, J. C. (1985). Dependable Computing and Fault Tolerance: Concepts and Terminology. 15’th FTCS. 2-11.

Long, D.D., J. L. Carroll, and C.J. Park (1991). A study of the reliability of Internet sites. Proc 10’th Symposium on Reliable Distributed Systems, pp. 177-186, Pisa, September 1991.

