
Critical Systems Development
IS301 – Software Engineering
Lecture #19 – 2003-10-28
M. E. Kabay, PhD, CISSP
Dept of Computer Information Systems
Norwich University
[email protected]

2 Copyright © 2003 M. E. Kabay. All rights reserved.

First, take a deep breath.
You are about to enter the fire-hose zone.

Acknowledgement

Most of the material in this presentation is based directly on slides kindly provided by Prof. Ian Sommerville on his Web site at http://www.software-engin.com
Used with Sommerville’s permission as extended by him for all non-commercial educational use
Copyright in Kabay’s name applies solely to appearance and minor changes in Sommerville’s work or to original materials and is used solely to prevent commercial exploitation of this material

Dependable Software Development

Programming techniques for building dependable software systems.

Software Dependability

In general, software customers expect all software to be dependable
For non-critical applications, may be willing to accept some system failures
Some applications have very high dependability requirements
  Special programming techniques required

Dependability Achievement

Fault avoidance
  Software developed so
    Human error avoided and
    System faults minimised
  Development process organised so
    Faults in software detected and
    Repaired before delivery to customer
Fault tolerance
  Software designed so
    Faults in delivered software do not result in system failure

Fault Minimisation

Current methods of software engineering now allow for production of fault-free software
Fault-free software means it conforms to its specification
Does NOT mean software which will always perform correctly
Why not?
  Because of specification errors.

Cost of Producing Fault-Free Software (1)

Very high
Cost-effective only in exceptional situations
  Which?
May be cheaper to accept software faults
  But who will bear costs?
    Users?
    Manufacturers?
    Both?
  Will the risk-sharing be with full knowledge?

Cost of Producing Fault-Free Software (2)

The Pareto Principle
[Figure: Total % of Errors Fixed plotted against Costs; marks at 20%, 80% and 100%]
If curve really is asymptotic to 100%, cost may approach infinity

Cost of Producing Fault-Free Software (3)

[Figure: Cost per error detected plotted against Number of residual errors (Many, Few, Very few)]

Fault-Free Software Development

Needs precise (preferably formal) specification
Requires organizational commitment to quality
Information hiding and encapsulation in software design essential
Use programming language with strict typing and run-time checking
Avoid error-prone constructs
Use dependable and repeatable development process

Structured Programming

First discussed in 1970s
Programming without goto
While loops and if statements as only control statements
Top-down design
Important because it promoted thought and discussion about programming
Programs easier to read and understand than old spaghetti code

Error-Prone Constructs (1)

Floating-point numbers
  Inherently imprecise – and machine-dependent
  Imprecision may lead to invalid comparisons
Pointers
  Pointers referring to wrong memory areas can corrupt data
  Aliasing can make programs difficult to understand and change
Dynamic memory allocation
  Run-time allocation can cause memory overflow
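
The point about invalid floating-point comparisons can be illustrated with a minimal Java sketch. The class name FloatCompare, the method nearlyEqual and the tolerance value are assumptions chosen for illustration; a suitable tolerance depends on the application.

```java
// Hypothetical helper for illustration; the name and the tolerance are assumptions.
final class FloatCompare {
    private FloatCompare() {}

    /** Returns true when a and b differ by no more than the given absolute tolerance. */
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        // Exact comparison fails because 0.1 and 0.2 have no exact binary representation.
        System.out.println(sum == 0.3);                   // false
        // Comparison within a tolerance succeeds.
        System.out.println(nearlyEqual(sum, 0.3, 1e-9));  // true
    }
}
```

An absolute tolerance is the simplest choice; values of very different magnitude may call for a relative tolerance instead.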

Error-Prone Constructs (2)

Parallelism
  Can result in subtle timing errors (race conditions) because of unforeseen interaction between parallel processes
Recursion
  Errors in recursion can cause memory overflow
Interrupts
  Interrupts can cause critical operation to be terminated and make program difficult to understand
  Similar to goto statements

Error-Prone Constructs (3)

Inheritance
  Code not localised
  Can result in unexpected behaviour when changes made
  Can be hard to understand
  Difficult to debug problems
These constructs need not be absolutely eliminated
  But must be used with great care

Information Hiding

Information should only be exposed to those parts of program which need to access it.
Create objects or abstract data types which maintain state and operations on state
Reduces faults:
  Less accidental corruption of information
  ‘Firewalls’ make problems less likely to spread to other parts of program
  All information localised:
    Programmer less likely to make errors
    Reviewers more likely to find errors

Example: Queue Specification in Java

interface Queue {
    public void put (Object o) ;
    public void remove (Object o) ;
    public int size () ;
} //Queue

Fig. 18.2, p. 398
• Users can put, remove, query size
• But implementation of queue is concealed as private
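
One way the concealed implementation might look is sketched below. The interface is repeated from the slide so the example compiles on its own; the class name ListQueue and the choice of an ArrayList as the hidden representation are assumptions, not part of the original figure.

```java
import java.util.ArrayList;
import java.util.List;

// Interface as given on the slide.
interface Queue {
    public void put (Object o) ;
    public void remove (Object o) ;
    public int size () ;
}

// Hypothetical implementation: the class name and the ArrayList representation are assumptions.
class ListQueue implements Queue {
    // Private state: callers cannot reach the list directly, so they cannot corrupt it.
    private final List<Object> items = new ArrayList<>();

    @Override
    public void put(Object o) {
        items.add(o);
    }

    @Override
    public void remove(Object o) {
        items.remove(o);   // removes the first occurrence of o, if present
    }

    @Override
    public int size() {
        return items.size();
    }
}
```

Because callers depend only on the interface, the representation could later be swapped (for example, for a linked list) without changing any client code.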

Example: Signal Declaration in Java

class Signal {
    public static final int red = 1 ;
    public static final int amber = 2 ;
    public static final int green = 3 ;
    // ... other declarations here ...
}

Fig. 18.3, p. 398
• Define constants as globals once
• Refer to Signal.red, Signal.green etc.
• Avoids risk of accidentally using wrong value in parameter

Reliable Software Processes

Well-defined, repeatable software process:
  Reduces software faults
  Does not depend entirely on individual skills – can be enacted by different people
Process activities should include significant verification and validation

Process Validation Activities

Requirements inspections
Requirements management
Model checking
Design and code inspection
Static analysis
Test planning and management
Configuration management also essential

Fault Tolerance

Critical software systems must be fault tolerant
  System can continue operating in spite of software failure
Fault tolerance required where
  Availability requirements are high or
  System failure costs are very high
Even “fault-free” systems need fault tolerance
  May be specification errors or
  Validation may be incorrect

Fault Tolerance Actions

Fault detection
  Incorrect system state has occurred
Damage assessment
  Identify parts of system state affected by fault
Fault recovery
  Return to known safe state
Fault repair
  Prevent recurrence of fault
  Identify underlying problem
  If not transient*, then fix errors of design, implementation, documentation or training that led to error

* E.g., hardware failure

Approaches to Fault Tolerance

Defensive programming
  Programmers assume faults in code
  Check state after modifications to ensure consistency
Fault-tolerant architectures
  HW & SW system architectures support redundancy and fault tolerance
  Controller detects problems and supports fault recovery
Complementary rather than opposing techniques

Fault Detection (1)

Strictly-typed languages
  E.g., Java and Ada
  Many errors trapped at compile-time
Some classes of error can only be discovered at run-time
Fault detection:
  Detecting erroneous system state
  Throwing exception
    To manage detected fault

Fault Detection (2)

Preventative fault detection
  Check conditions before making changes
  If bad state detected, don’t make change
Retrospective fault detection
  Check validity after system state has been changed
  Used when
    Incorrect sequence of correct actions leads to erroneous state or
    Preventative fault detection involves too much overhead
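
The two styles of fault detection can be contrasted in a short Java sketch. Everything here is hypothetical: the DoseCounter class, its MAX_TOTAL limit and its method names are assumptions made up for illustration.

```java
// Hypothetical example: the class, field and limit are assumptions for illustration only.
class DoseCounter {
    static final int MAX_TOTAL = 100;
    private int total = 0;

    // Preventative detection: check the condition first and refuse a change that would be invalid.
    void addPreventative(int dose) {
        if (dose < 0 || dose > MAX_TOTAL - total) {
            throw new IllegalArgumentException("dose rejected: " + dose);
        }
        total += dose;
    }

    // Retrospective detection: make the change, then validate the resulting state and undo it if invalid.
    void addRetrospective(int dose) {
        int previous = total;
        total += dose;
        if (!isValid()) {
            total = previous;
            throw new IllegalStateException("invalid state detected after adding " + dose);
        }
    }

    // Validity check on the state: the total must stay within its allowed range.
    boolean isValid() {
        return total >= 0 && total <= MAX_TOTAL;
    }

    int total() {
        return total;
    }
}
```

In both cases the detected fault is signalled by throwing an exception, leaving the caller to decide how to manage it.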

Damage Assessment

Analyse system state
  Judge extent of corruption caused by system failure
Assess what parts of state space have been affected by failure
Generally based on ‘validity functions’
  Can be applied to state elements
  Assess if their value within allowed range

Damage Assessment Techniques

Checksums
  Used for damage assessment in data transmission
  Verify integrity after transmission
Redundant pointers
  Check integrity of data structures
  E.g., databases
Watch-dog timers
  Check for non-terminating processes
  If no response after certain time, there’s a problem
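
As a sketch of the checksum technique, the example below uses the standard java.util.zip.CRC32 class. The wrapper class ChecksumCheck and its method names are assumptions for illustration; the slides do not prescribe a particular checksum algorithm.

```java
import java.util.zip.CRC32;

// Hypothetical helper: class and method names are assumptions for illustration.
final class ChecksumCheck {
    private ChecksumCheck() {}

    /** Computes a CRC-32 checksum over the payload. */
    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    /** Returns true if the received payload still matches the checksum sent with it. */
    static boolean isIntact(byte[] received, long expectedChecksum) {
        return checksum(received) == expectedChecksum;
    }
}
```

The sender would transmit the checksum alongside the payload; the receiver recomputes it and treats a mismatch as evidence of damage.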

Fault Recovery

Forward recovery
  Apply repairs to corrupted system state
  Domain knowledge required to compute possible state corrections
  Forward recovery usually application specific
Backward recovery
  Restore system state to known safe state
  Simpler than forward recovery
  Details of safe state maintained and replace corrupted system state

Forward Recovery

Data communications
  Add redundancy to coded data
  Use to repair data corrupted during transmission
Redundant pointers
  E.g., doubly-linked lists
  Damaged list / file may be repaired if enough links are still valid
  Often used for database and filesystem repair
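
Repair through redundant pointers might look like the following sketch, which rebuilds damaged backward links of a doubly-linked list from its forward links. The ListNode and ListRepair names are hypothetical, and the sketch assumes the forward chain itself is still intact.

```java
// Hypothetical example: node layout and repair routine are assumptions for illustration.
class ListNode {
    final String value;
    ListNode next;   // forward link
    ListNode prev;   // backward link: redundant, since it can be derived from the forward links

    ListNode(String value) {
        this.value = value;
    }
}

final class ListRepair {
    private ListRepair() {}

    /**
     * Forward recovery: rebuilds every prev link from the next links,
     * assuming the forward chain starting at head is still intact.
     * Returns the number of prev links that had to be corrected.
     */
    static int repairBackLinks(ListNode head) {
        int repaired = 0;
        ListNode previous = null;
        for (ListNode current = head; current != null; current = current.next) {
            if (current.prev != previous) {
                current.prev = previous;
                repaired++;
            }
            previous = current;
        }
        return repaired;
    }
}
```

The symmetric repair (rebuilding forward links from intact backward links) follows the same pattern, which is why enough valid links in either direction allow the list to be recovered.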

Backward Recovery

Transaction processing often uses conservative methods to avoid problems
  Complete computations, then apply changes
  Keep original data in buffers
Periodic checkpoints allow system to 'roll back' to correct state
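
Checkpointing and roll-back can be sketched in a few lines of Java. The CheckpointedStore class and its method names are assumptions for illustration; a real transaction system would also have to deal with persistence and concurrency.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: the store and its method names are assumptions for illustration.
class CheckpointedStore {
    private Map<String, Integer> state = new HashMap<>();
    private Map<String, Integer> checkpoint = new HashMap<>();

    void put(String key, int value) {
        state.put(key, value);
    }

    Integer get(String key) {
        return state.get(key);
    }

    // Record a copy of the current state as the known safe state.
    void checkpoint() {
        checkpoint = new HashMap<>(state);
    }

    // Backward recovery: discard the current state and restore the last checkpoint.
    void rollBack() {
        state = new HashMap<>(checkpoint);
    }
}
```

Copying the whole map keeps the sketch simple; larger systems typically log changes since the last checkpoint instead of copying all state.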

Key Points

Fault tolerant software can continue in execution in presence of software faults

Fault tolerance requires failure detection, damage assessment, recovery and repair

Defensive programming approach to fault tolerance relies on inclusion of redundant checks in program

Exception handling facilities simplify process of defensive programming

Fault Tolerant Architecture

Defensive programming cannot cope with faults that involve interactions between hardware and software

Misunderstandings of requirements may mean checks and associated code incorrect

Where systems have high availability requirements, specific architecture designed to support fault tolerance may be required.

Must tolerate both hardware and software failure

Hardware Fault Tolerance

Depends on triple-modular redundancy (TMR)
  3 replicated identical components
    Receive same input
    Outputs compared
  If 1 output different, is discarded
    Component failure assumed
Assumes
  Most faults result from component failures
  Few design faults
  Low probability of simultaneous component failures

Hardware Reliability With TMR

[Figure: components A1, A2 and A3, an output comparator and a fault manager]

Fault Tolerant Software Architectures

Assumptions of TMR for HW not true for software
  SW design flaws more common than in HW
  Cannot replicate same component in software:
    Would have common design faults
    Simultaneous component failure therefore virtually inevitable
Software systems must therefore be diverse

Design Diversity

Different versions of system designed and implemented in different ways
  Ought to have different failure modes
Different approaches to design (e.g., object-oriented and function-oriented)
  Implementation in different programming languages
  Use of different tools and development environments
  Use of different algorithms in implementation

Software Analogies to TMR (1)

N-version programming
  Same specification implemented
    In number of different versions
    By different teams
  All versions compute simultaneously
  Majority output selected
    Using voting system
  Most commonly-used approach
    E.g. in Airbus 320 control systems
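
The voting step of N-version programming could be sketched as below. MajorityVoter is a hypothetical name, and the sketch assumes that version outputs are non-null and can be compared with equals(); real voters also have to cope with timing and with outputs that agree only approximately.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical voter: names are assumptions; outputs are assumed non-null and comparable with equals().
final class MajorityVoter {
    private MajorityVoter() {}

    /** Returns the output produced by a strict majority of versions, or empty if there is none. */
    static <T> Optional<T> vote(List<T> outputs) {
        Map<T, Integer> counts = new HashMap<>();
        for (T output : outputs) {
            int count = counts.merge(output, 1, Integer::sum);
            if (count * 2 > outputs.size()) {
                return Optional.of(output);   // more than half of the versions agree
            }
        }
        return Optional.empty();              // no agreed result
    }
}
```

With three versions, one deviating output is outvoted; if all three disagree, the voter reports that no agreed result exists rather than guessing.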

N-version Programming (1)

[Figure: Version 1, Version 2 and Version 3 (N versions), an output comparator and the agreed result]

N-version Programming (2)

Different system versions
  Designed and implemented by different teams
  Assume low probability of same mistakes
  Algorithms used should be different
Problem: may not be different enough
  Some empirical evidence (research)
  Teams commonly misinterpret specifications in same way
  Choose same algorithms for their systems

Software Analogies to TMR (2)

Recovery blocks
  Explicitly different versions of same specification written and executed in sequence
  Acceptance test used to select output to be transmitted

Recovery Blocks (1)

[Figure: recovery blocks. Algorithm 1 is tried first and tested for success by the acceptance test. If the acceptance test fails, Algorithm 2 and then Algorithm 3 are retried and retested. Execution continues if the acceptance test succeeds; an exception is signalled if all algorithms fail.]

Recovery Blocks (2)

Force different algorithm to be used for each version so they reduce probability of common errors

However, design of acceptance test difficult as it must be independent of computation used

Problems with approach for real-time systems because of sequential operation of redundant versions
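
The recovery-block scheme might be sketched in Java as follows. RecoveryBlock and run are hypothetical names. The sketch assumes the alternatives are side-effect-free functions, so no state needs to be restored between attempts; a full recovery block would restore a checkpoint before each retry.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch: names are assumptions; alternatives are assumed to have no side effects.
final class RecoveryBlock {
    private RecoveryBlock() {}

    /**
     * Tries each alternative in order and returns the first result that passes the acceptance test.
     * Throws IllegalStateException if every alternative fails.
     */
    static <I, O> O run(I input, List<Function<I, O>> alternatives, Predicate<O> acceptanceTest) {
        for (Function<I, O> alternative : alternatives) {
            try {
                O result = alternative.apply(input);
                if (acceptanceTest.test(result)) {
                    return result;
                }
            } catch (RuntimeException e) {
                // An exception counts as a failed alternative; fall through to the next one.
            }
        }
        throw new IllegalStateException("all alternatives failed the acceptance test");
    }
}
```

The sequential loop also makes the real-time concern visible: in the worst case, every alternative runs before a result or an exception is produced.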

Problems with Design Diversity

Teams not culturally diverse so they tend to tackle problems in same way
Characteristic errors
  Different teams make same mistakes
  Some parts of implementation more difficult than others so all teams tend to make mistakes in same place.
Specification errors
  If error in specification, then reflected in all implementations
  Can be addressed to some extent by using multiple specification representations

Specification Dependency

Both approaches to software redundancy susceptible to specification errors. If specification incorrect, system could fail

Also problem with hardware but software specifications usually more complex than hardware specifications and harder to validate

Has been addressed in some cases by developing separate software specifications from same user specification

Software Redundancy Needed?

Software faults not inevitable (unlike hardware faults = inevitable consequence of physical world) (WHY?)

Reducing software complexity may improve reliability and availability (WHY?)

Redundant software much more complex
  Scope for range of additional errors
  Affect system reliability
  Ironically, caused by existence of fault-tolerance controllers

Key Points

Dependability in system can be achieved through fault avoidance and fault tolerance

Some programming language constructs such as gotos, recursion and pointers inherently error-prone

Data typing allows many potential faults to be trapped at compile time.

Key Points

Fault tolerant architectures rely on replicated hardware and software components

They include mechanisms to detect faulty component and to switch it out of system

N-version programming and recovery blocks two different approaches to designing fault-tolerant software architectures

Design diversity essential for software redundancy

Homework

Study Chapter 18 in detail using SQ3R
For next Tuesday 4 Nov 2003
  Complete exercises 18.1, 18.3-18.5 & 18.7-18.9 for 21 points
For next class (Thursday 30 Oct 2003), apply S-Q to chapter 19 on Verification and Validation. This is increasingly hard stuff, so STUDY before coming to class.
OPTIONAL:
  For 3 extra points per question, answer any of questions 18.2, 18.6, 18.10 and 18.11 by 11 Nov.

DISCUSSION

