Critical Systems Development
IS301 – Software Engineering
Lecture #19 – 2003-10-28
M. E. Kabay, PhD, CISSP
Dept of Computer Information Systems
Norwich University
Copyright © 2003 M. E. Kabay. All rights reserved.
First, take a deep breath.
You are about to enter the fire-hose zone.
Acknowledgement
Most of the material in this presentation is based directly on slides kindly provided by Prof. Ian Sommerville on his Web site at http://www.software-engin.com
Used with Sommerville’s permission as extended by him for all non-commercial educational use
Copyright in Kabay’s name applies solely to appearance and minor changes in Sommerville’s work or to original materials and is used solely to prevent commercial exploitation of this material
Dependable Software Development
Programming techniques for building dependable software systems.
Software Dependability
In general, software customers expect all software to be dependable
For non-critical applications, they may be willing to accept some system failures
Some applications have very high dependability requirements
    Special programming techniques required
Dependability Achievement
Fault avoidance
    Software developed so that human error is avoided and system faults are minimised
    Development process organised so that faults in software are detected and repaired before delivery to customer
Fault tolerance
    Software designed so that faults in delivered software do not result in system failure
Fault Minimisation
Current methods of software engineering now allow for production of fault-free software
Fault-free software means it conforms to its specification
Does NOT mean software which will always perform correctly
    Why not?
    Because of specification errors.
Cost of Producing Fault-Free Software (1)
Very high
Cost-effective only in exceptional situations
    Which?
May be cheaper to accept software faults
    But who will bear costs?
    Users? Manufacturers? Both?
    Will the risk-sharing be with full knowledge?
Cost of Producing Fault-Free Software (2)
The Pareto Principle
[Figure: total % of errors fixed plotted against costs; values marked on the axes: 20%, 80%, 100%]
If curve really is asymptotic to 100%, cost may approach infinity
Cost of Producing Fault-Free Software (3)
[Figure: cost per error detected plotted against number of residual errors (many, few, very few)]
Fault-Free Software Development
Needs precise (preferably formal) specification
Requires organizational commitment to quality
Information hiding and encapsulation in software design essential
Use programming language with strict typing and run-time checking
Avoid error-prone constructs
Use dependable and repeatable development process
Structured Programming
First discussed in 1970s
Programming without goto
While loops and if statements as only control statements
Top-down design
Important because it promoted thought and discussion about programming
Programs easier to read and understand than old spaghetti code
Error-Prone Constructs (1)
Floating-point numbers
    Inherently imprecise – and machine-dependent
    Imprecision may lead to invalid comparisons
Pointers
    Pointers referring to wrong memory areas can corrupt data
    Aliasing can make programs difficult to understand and change
Dynamic memory allocation
    Run-time allocation can cause memory overflow
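The point about invalid floating-point comparisons can be illustrated with a minimal Java sketch. The class name FloatCompare, the helper nearlyEqual and the tolerance value are assumptions made for this illustration, not part of the original slides.

```java
class FloatCompare {
    // Hypothetical helper: true when a and b differ by no more than
    // a relative tolerance scaled to the larger magnitude.
    static boolean nearlyEqual(double a, double b, double relTol) {
        if (a == b) return true;
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= relTol * scale;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);                  // prints false
        System.out.println(nearlyEqual(sum, 0.3, 1e-9)); // prints true
    }
}
```

A suitable tolerance depends on the application; 1e-9 is only a placeholder here.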
Error-Prone Constructs (2)
Parallelism
    Can result in subtle timing errors (race conditions) because of unforeseen interaction between parallel processes
Recursion
    Errors in recursion can cause memory overflow
Interrupts
    Interrupts can cause critical operation to be terminated and make program difficult to understand
    Similar to goto statements
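As a rough illustration of a race condition, the sketch below has two threads update a shared counter with and without protection. The class RaceDemo and its fields are hypothetical names for this example; the unsynchronised counter may or may not lose updates on any given run, which is exactly what makes such errors subtle.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RaceDemo {
    static int unsafeCount = 0;                                 // shared, unsynchronised
    static final AtomicInteger safeCount = new AtomicInteger(); // shared, atomic

    // Runs two threads that each increment both counters n times.
    static void run(int n) {
        Runnable work = () -> {
            for (int i = 0; i < n; i++) {
                unsafeCount++;               // read-modify-write: updates can be lost
                safeCount.incrementAndGet(); // atomic: no lost updates
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

After RaceDemo.run(n), safeCount always equals 2n, while unsafeCount can be anything up to 2n depending on how the threads interleave.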
Error-Prone Constructs (3)
Inheritance
    Code not localised
    Can result in unexpected behaviour when changes made
    Can be hard to understand
    Difficult to debug problems
None of these constructs has to be absolutely eliminated
    But all must be used with great care
Information Hiding
Information should only be exposed to those parts of program which need to access it.
Create objects or abstract data types which maintain state and operations on state
Reduces faults:
    Less accidental corruption of information
    ‘Firewalls’ make problems less likely to spread to other parts of program
    All information localised:
        Programmer less likely to make errors
        Reviewers more likely to find errors
Example: Queue Specification in Java
interface Queue {
    public void put (Object o) ;
    public void remove (Object o) ;
    public int size () ;
} //Queue
Fig. 18.2, p. 398
• Users can put, remove, query size
• But implementation of queue is concealed as private
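One way the concealment might look in practice is sketched below. The interface repeats the slide's Queue; the class HiddenQueue and its use of java.util.ArrayDeque as backing store are assumptions for illustration only, not the textbook's implementation.

```java
import java.util.ArrayDeque;

// Interface as given on the slide (Fig. 18.2).
interface Queue {
    public void put(Object o);
    public void remove(Object o);
    public int size();
}

// Hypothetical implementation: the backing store is private, so callers
// can reach the state only through put, remove and size.
class HiddenQueue implements Queue {
    private final ArrayDeque<Object> items = new ArrayDeque<>();

    public void put(Object o) { items.addLast(o); }   // rejects null (ArrayDeque rule)
    public void remove(Object o) { items.remove(o); } // removes one matching element
    public int size() { return items.size(); }
}
```

Because callers depend only on Queue, the private store could later be swapped for another structure without touching client code.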
Example: Signal Declaration in Java
class Signal {
    static public final int red = 1 ;
    static public final int amber = 2 ;
    static public final int green = 3 ;
    // ... other declarations here ...
}
Fig. 18.3, p. 398
• Define constants as globals once
• Refer to Signal.red, Signal.green etc.
• Avoids risk of accidentally using wrong value in parameter
Reliable Software Processes
Well-defined, repeatable software process:
    Reduces software faults
    Does not depend entirely on individual skills – can be enacted by different people
Process activities should include significant verification and validation
Process Validation Activities
Requirements inspections
Requirements management
Model checking
Design and code inspection
Static analysis
Test planning and management
Configuration management also essential
Fault Tolerance
Critical software systems must be fault tolerant
    System can continue operating in spite of software failure
Fault tolerance required where
    Availability requirements high or
    System failure costs very high
Even “fault-free” systems need fault tolerance
    May be specification errors or
    Validation may be incorrect
Fault Tolerance Actions
Fault detection
    Detect that incorrect system state has occurred
Damage assessment
    Identify parts of system state affected by fault
Fault recovery
    Return to known safe state
Fault repair
    Prevent recurrence of fault
    Identify underlying problem
    If not transient*, then fix errors of design, implementation, documentation or training that led to error
* E.g., hardware failure
Approaches to Fault Tolerance
Defensive programming
    Programmers assume faults in code
    Check state after modifications to ensure consistency
Fault-tolerant architectures
    HW & SW system architectures support redundancy and fault tolerance
    Controller detects problems and supports fault recovery
Complementary rather than opposing techniques
Fault Detection (1)
Strictly-typed languages
    E.g., Java and Ada
    Many errors trapped at compile-time
Some classes of error can only be discovered at run-time
Fault detection:
    Detecting erroneous system state
    Throwing exception to manage detected fault
Fault Detection (2)
Preventative fault detection
    Check conditions before making changes
    If bad state detected, don’t make change
Retrospective fault detection
    Check validity after system state has been changed
    Used when
        Incorrect sequence of correct actions leads to erroneous state or
        Preventative fault detection involves too much overhead
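The two detection styles can be contrasted in a small Java sketch. The Tank class, its CAPACITY limit and both method names are invented for this illustration; each method throws an exception to signal the detected fault.

```java
class Tank {
    static final int CAPACITY = 100; // hypothetical limit for the sketch
    private int level = 0;

    int level() { return level; }

    // Preventative: validate before touching state; reject bad requests.
    void fillChecked(int amount) {
        if (amount < 0 || level + amount > CAPACITY) {
            throw new IllegalArgumentException("fill would leave invalid state");
        }
        level += amount;
    }

    // Retrospective: change state first, then test the invariant.
    void fillThenVerify(int amount) {
        int previous = level;
        level += amount;
        if (level < 0 || level > CAPACITY) {
            level = previous; // restore last valid state
            throw new IllegalStateException("invalid state detected after change");
        }
    }
}
```

In the preventative version the state is never modified when the check fails; in the retrospective version the bad state exists briefly and must be undone.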
Damage Assessment
Analyse system state
    Judge extent of corruption caused by system failure
Assess what parts of state space have been affected by failure
Generally based on ‘validity functions’
    Can be applied to state elements
    Assess if their value within allowed range
Damage Assessment Techniques
Checksums
    Used for damage assessment in data transmission
    Verify integrity after transmission
Redundant pointers
    Check integrity of data structures
    E.g., databases
Watch-dog timers
    Check for non-terminating processes
    If no response after certain time, there’s a problem
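The checksum technique might be sketched as follows, using the standard java.util.zip.CRC32 class. The wrapper class ChecksumCheck and its method names are assumptions for this example; a real protocol would also define how the checksum travels with the data.

```java
import java.util.zip.CRC32;

class ChecksumCheck {
    // Sender side: compute a CRC-32 checksum over the data.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // Receiver side: recompute and compare with the value sent alongside the data.
    static boolean intact(byte[] received, long sentChecksum) {
        return checksum(received) == sentChecksum;
    }
}
```

A mismatch shows that damage occurred but not where; locating or repairing it needs additional redundancy.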
Fault Recovery
Forward recovery
    Apply repairs to corrupted system state
    Domain knowledge required to compute possible state corrections
    Forward recovery usually application specific
Backward recovery
    Restore system state to known safe state
    Simpler than forward recovery
    Details of safe state maintained and used to replace corrupted system state
Forward Recovery
Data communications
    Add redundancy to coded data
    Use to repair data corrupted during transmission
Redundant pointers
    E.g., doubly-linked lists
    Damaged list / file may be repaired if enough links are still valid
    Often used for database and filesystem repair
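The doubly-linked list case can be sketched in Java under one simplifying assumption: the forward (next) links are still valid and only the backward (prev) links are damaged. The classes Node and ListRepair are hypothetical names for this illustration.

```java
class Node {
    int value;
    Node next, prev;
    Node(int value) { this.value = value; }
}

class ListRepair {
    // Builds a doubly-linked list and returns its head (null if empty).
    static Node build(int... values) {
        Node head = null, tail = null;
        for (int v : values) {
            Node n = new Node(v);
            if (head == null) { head = n; } else { tail.next = n; n.prev = tail; }
            tail = n;
        }
        return head;
    }

    // Forward recovery: assuming the next links are still valid, rebuild
    // every damaged prev link from them. Returns the number of links repaired.
    static int repairPrevLinks(Node head) {
        int repaired = 0;
        Node before = null;
        for (Node n = head; n != null; n = n.next) {
            if (n.prev != before) { n.prev = before; repaired++; }
            before = n;
        }
        return repaired;
    }
}
```

The repair works only because each link is stored twice; if both directions were damaged at the same node, this sketch could not recover it.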
Backward Recovery
Transaction processing often uses conservative methods to avoid problems
    Complete computations, then apply changes
    Keep original data in buffers
Periodic checkpoints allow system to 'roll-back' to correct state
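A checkpoint and roll-back cycle could look roughly like this in Java. The class CheckpointedStore and its methods are invented for this sketch, and copying the whole state is a simplification; real transaction systems typically log changes instead.

```java
import java.util.HashMap;
import java.util.Map;

class CheckpointedStore {
    private Map<String, Integer> state = new HashMap<>();
    private Map<String, Integer> checkpoint = new HashMap<>();

    void put(String key, int value) { state.put(key, value); }
    Integer get(String key) { return state.get(key); }

    // Save a copy of the current state as the known safe state.
    void checkpoint() { checkpoint = new HashMap<>(state); }

    // Backward recovery: discard the current state and restore the saved copy.
    void rollBack() { state = new HashMap<>(checkpoint); }
}
```

Everything done since the last checkpoint is lost on roll-back, which is why checkpoints are taken periodically.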
Key Points
Fault tolerant software can continue in execution in presence of software faults
Fault tolerance requires failure detection, damage assessment, recovery and repair
Defensive programming approach to fault tolerance relies on inclusion of redundant checks in program
Exception handling facilities simplify process of defensive programming
Fault Tolerant Architecture
Defensive programming cannot cope with faults that involve interactions between hardware and software
Misunderstandings of requirements may mean checks and associated code are incorrect
Where systems have high availability requirements, specific architecture designed to support fault tolerance may be required.
Must tolerate both hardware and software failure
Hardware Fault Tolerance
Depends on triple-modular redundancy (TMR)
3 replicated identical components
    Receive same input
    Outputs compared
If 1 output different, it is discarded
    Component failure assumed
Assumes
    Most faults result from component failures
    Few design faults
    Low probability of simultaneous component failures
Hardware Reliability With TMR
[Figure: components A1, A2 and A3 feed an output comparator; a fault manager is also shown]
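The comparator's job in TMR can be sketched in Java as a two-out-of-three vote. The class TmrVoter and its methods are hypothetical; real TMR is built in hardware, so this only mirrors the logic.

```java
class TmrVoter {
    // Returns the index (0..2) of the single disagreeing component,
    // or -1 when all three outputs agree.
    static int faultyIndex(int a1, int a2, int a3) {
        if (a1 == a2 && a2 == a3) return -1;
        if (a1 == a2) return 2;
        if (a1 == a3) return 1;
        if (a2 == a3) return 0;
        throw new IllegalStateException("no two outputs agree");
    }

    // Majority output: the value shared by at least two components.
    static int vote(int a1, int a2, int a3) {
        if (a1 == a2 || a1 == a3) return a1;
        if (a2 == a3) return a2;
        throw new IllegalStateException("no two outputs agree");
    }
}
```

The exception path corresponds to simultaneous failure of more than one component, which TMR assumes to be improbable.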
Fault Tolerant Software Architectures
Assumptions of TMR for HW not true for software
    SW design flaws more common than in HW
    Cannot replicate same component in software:
        Would have common design faults
        Simultaneous component failure therefore virtually inevitable
Software systems must therefore be diverse
Design Diversity
Different versions of system designed and implemented in different ways
    Ought to have different failure modes
Different approaches to design (e.g., object-oriented and function-oriented)
    Implementation in different programming languages
    Use of different tools and development environments
    Use of different algorithms in implementation
Software Analogies to TMR (1)
N-version programming
    Same specification implemented
        In number of different versions
        By different teams
    All versions compute simultaneously
    Majority output selected
        Using voting system
    Most commonly-used approach
        E.g. in Airbus 320 control systems
N-version Programming (1)
[Figure: Version 1, Version 2 and Version 3 (the N versions) feed an output comparator, which produces the agreed result]
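The voting step of N-version programming might be sketched as below. The class NVersionRunner is a made-up name, each version is modelled as an IntUnaryOperator, and the versions are run one after another for brevity, whereas the slides describe them computing simultaneously.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntUnaryOperator;

class NVersionRunner {
    // Runs every version on the same input and returns the output
    // produced by a strict majority; throws if there is none.
    static int run(List<IntUnaryOperator> versions, int input) {
        Map<Integer, Integer> tally = new HashMap<>();
        for (IntUnaryOperator version : versions) {
            tally.merge(version.applyAsInt(input), 1, Integer::sum);
        }
        for (Map.Entry<Integer, Integer> e : tally.entrySet()) {
            if (e.getValue() * 2 > versions.size()) return e.getKey();
        }
        throw new IllegalStateException("no majority output");
    }
}
```

With three versions, one faulty version is outvoted; if two versions share the same mistake, the wrong answer wins the vote.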
N-version Programming (2)
Different system versions
    Designed and implemented by different teams
    Assume low probability of same mistakes
    Algorithms used should be different
Problem: may not be different enough
    Some empirical evidence (research)
    Teams commonly misinterpret specifications in same way
    Choose same algorithms for their systems
Software Analogies to TMR (2)
Recovery blocks
    Explicitly different versions of same specification written and executed in sequence
    Acceptance test used to select output to be transmitted
Recovery Blocks (1)
[Figure: recovery blocks. Try algorithm 1, then test for success with the acceptance test. If the acceptance test fails, retry with algorithm 2, then algorithm 3, retesting each time. Continue execution if acceptance test succeeds; signal exception if all algorithms fail.]
Recovery Blocks (2)
Force different algorithm to be used for each version so they reduce probability of common errors
However, design of acceptance test difficult as it must be independent of computation used
Problems with approach for real-time systems because of sequential operation of redundant versions
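The recovery-block control flow can be sketched in Java as follows. The class RecoveryBlock and the modelling of alternates as IntUnaryOperator values are assumptions for this example; because the alternates here are pure functions, the sketch omits the state restoration a real recovery block would need before each retry.

```java
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

class RecoveryBlock {
    // Tries each alternate in order and returns the first result that
    // passes the acceptance test; throws if every alternate fails.
    static int execute(List<IntUnaryOperator> alternates,
                       IntPredicate acceptanceTest, int input) {
        for (IntUnaryOperator alternate : alternates) {
            try {
                int result = alternate.applyAsInt(input);
                if (acceptanceTest.test(result)) return result;
            } catch (RuntimeException e) {
                // A crashing alternate counts as a failed attempt.
            }
        }
        throw new IllegalStateException("all alternates failed the acceptance test");
    }
}
```

The sequential loop makes the real-time concern visible: in the worst case the caller waits for every alternate to run before getting a result or an exception.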
Problems with Design Diversity
Teams not culturally diverse so they tend to tackle problems in same way
Characteristic errors
    Different teams make same mistakes
    Some parts of implementation more difficult than others so all teams tend to make mistakes in same place.
Specification errors
    If error in specification, then reflected in all implementations
    Can be addressed to some extent by using multiple specification representations
Specification Dependency
Both approaches to software redundancy susceptible to specification errors. If specification incorrect, system could fail
Also problem with hardware but software specifications usually more complex than hardware specifications and harder to validate
Has been addressed in some cases by developing separate software specifications from same user specification
Software Redundancy Needed?
Software faults not inevitable (unlike hardware faults, which are inevitable consequence of physical world) (WHY?)
Reducing software complexity may improve reliability and availability (WHY?)
Redundant software much more complex
    Scope for range of additional errors
    Affect system reliability
    Ironically, caused by existence of fault-tolerance controllers
Key Points
Dependability in system can be achieved through fault avoidance and fault tolerance
Some programming language constructs such as gotos, recursion and pointers are inherently error-prone
Data typing allows many potential faults to be trapped at compile time.
Key Points
Fault tolerant architectures rely on replicated hardware and software components
    Include mechanisms to detect faulty component and to switch it out of system
N-version programming and recovery blocks are two different approaches to designing fault-tolerant software architectures
Design diversity essential for software redundancy
Homework
Study Chapter 18 in detail using SQ3R
For next Tuesday 4 Nov 2003
    Complete exercises 18.1, 18.3-18.5 & 18.7-18.9 for 21 points
For next class (Thursday 30 Oct 2003), apply S-Q to chapter 19 on Verification and Validation. This is increasingly hard stuff, so STUDY before coming to class.
OPTIONAL:
    For 3 extra points per question, answer any of questions 18.2, 18.6, 18.10 and 18.11 by 11 Nov.