20/04/23 Dr Andy Brooks 1
FOR0383 Software Quality Assurance
Lecture 2
ESA Ariane 5 Rocket Flight 501
4 June 1996
• at ~40 seconds into launch
• at an altitude of ~3700m
• the launcher veered off path and began to break up
• the self-destruct system was triggered
• ~$500 million (uninsured, maiden flight)
• the launcher was unmanned
Board of Inquiry
• what was the cause of failure?
• was appropriate testing undertaken?
• what corrective actions should there be?
• the report by the Board of Inquiry was completed in less than 6 weeks
Weather conditions
• the weather was acceptable
• there was no risk of lightning
• but visibility had worsened for a time
• the launch was delayed by about 1 hr
The Challenger Space Shuttle disaster was partly due to the weather. Overnight conditions at the launch pad had been extremely cold, which stiffened the O-rings on the solid rocket boosters so that they failed to seal properly.
Briefly
• nominal behaviour of the launcher until H0 + 36 seconds
• the backup Inertial Reference System fails
• the active Inertial Reference System fails
– after the backup
• all the rocket nozzles are swivelled into extreme positions
• the launcher breaks up and the self-destruct system is triggered
Recovery of material
• debris fell back to ground, scattered over a wide area (5 x 2.5 km)
• despite mangrove swamps, the two Inertial Reference Systems were recovered
• telemetry data was received on the ground
• trajectory data was received from radar stations
• optical observations (camera and film)
Unrelated Anomaly
• at H0 + 22 seconds
• variations started in the hydraulic pressure of the actuators of the main engine nozzle with a frequency of 10Hz
• “This phenomenon is significant and has not yet been fully explained, but after consideration it has not been found relevant to the failure.”
Inertial Reference System (SRI)
• complex piece of equipment
• measures attitude and movements in space
• output transmitted to the On-Board Computer (OBC) executing the flight control program
• to improve reliability, two SRIs operated in parallel with identical hardware and software
First question to ask: how is the system backed up?...
Equipment Redundancy
• there are two On-Board Computers
• and a number of other units in the flight control system are also duplicated
So, what really happened?
• the OBC received incorrect data
• the SRI had declared a failure due to a software exception (Operand Error)
• a 64-bit floating point value was converted to a 16-bit signed integer, but the value was too large for the target type
• this particular data conversion was not protected
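The conversion problem can be sketched in a few lines. This is Python, not the language of the flight software, and the function names and sample values are assumptions made for illustration only. The unprotected version turns an out-of-range value into an exception, which stands in for the Operand Error; the protected version saturates at the 16-bit limits instead.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_unprotected(value: float) -> int:
    """Convert to a 16-bit signed integer; raise if the value does not fit.

    Stands in for the unprotected conversion: an out-of-range input
    becomes an exception that the caller has to deal with.
    """
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return result


def to_int16_protected(value: float) -> int:
    """Convert to a 16-bit signed integer, saturating at the limits."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


if __name__ == "__main__":
    in_range = 20000.0    # hypothetical value that fits
    too_large = 100000.0  # hypothetical value that does not fit
    print(to_int16_unprotected(in_range))
    print(to_int16_protected(too_large))
    try:
        to_int16_unprotected(too_large)
    except OverflowError as exc:
        print("exception:", exc)
```

Saturation is only one possible form of protection; whether a clamped value is acceptable depends on how the consumer uses it.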
…Different Trajectory
• the operand error occurred because Ariane 5 built up a horizontal velocity much more quickly than Ariane 4
– Ariane 5 built up horizontal velocity five times more quickly than Ariane 4
• the failure context was precisely determined from memory readouts from the recovered SRIs
Ariane family
…No useful purpose
• the software module which generated the exception served no useful purpose after launch!
• simply re-used from Ariane 4
“Effective reuse requires design by contract. Without a precise specification attached to each reusable component - precondition, postcondition, invariant - no one can trust a supposedly reusable component. Without a specification, it is probably safer to redo than to reuse.”
Jean-Marc Jézéquel and Bertrand Meyer, IEEE Computer, January 1997, p. 130
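A minimal sketch of what a contract on a reusable conversion routine might look like. It is written in Python for illustration; `convert_bias`, `MAX_BIAS` and `PreconditionError` are hypothetical names, not taken from the flight software or the cited article. The point is that the range assumption is stated at the interface, so anyone reusing the routine can see what must hold before calling it.

```python
class PreconditionError(Exception):
    """Raised when a caller breaks the documented precondition."""


MAX_BIAS = 32767.0  # largest magnitude accepted by this routine (assumed limit)


def convert_bias(horizontal_bias: float) -> int:
    """Convert a horizontal bias reading to a 16-bit signed integer.

    Precondition:  abs(horizontal_bias) <= MAX_BIAS
    Postcondition: -32768 <= result <= 32767
    """
    if abs(horizontal_bias) > MAX_BIAS:
        raise PreconditionError(
            f"horizontal_bias={horizontal_bias} exceeds MAX_BIAS={MAX_BIAS}"
        )
    result = int(horizontal_bias)  # truncates toward zero
    assert -32768 <= result <= 32767, "postcondition violated"
    return result
```

A reviewer moving such a routine to a new launcher would have one explicit question to answer: can the new trajectory ever break the precondition?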
Unprotected variables?
• 3 variables were unprotected “because a maximum workload target of 80% had been set for the SRI computer”
– remember, this is a real-time system
• the justification was not given in source code
• the reasoning was that variables were either physically limited or there was a large safety margin
– this was true for Ariane 4
• the decision to protect some but not all of the variables was taken jointly by project partners
The specification of exception-handling contributed to the failure.
• the failure should be indicated on the databus
– the OBC interpreted the diagnostic data it was sent as valid data, causing the nozzle deflections
– remember, the backup SRI failed first
• the failure context should be stored in EEPROM memory
• the SRI processor should be shut down
• this approach addressed random hardware failures
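The misinterpretation of diagnostic data as flight data can be illustrated with a small sketch. The actual databus format is not described in these slides, so everything here is an assumption: the message types, the `MAX_DEFLECTION` limit and the function names are hypothetical. The naive consumer treats every payload as attitude data; the checked consumer looks at the message kind first and holds the last good command otherwise.

```python
from dataclasses import dataclass
from enum import Enum


class MessageKind(Enum):
    ATTITUDE = "attitude"
    DIAGNOSTIC = "diagnostic"


@dataclass(frozen=True)
class BusMessage:
    kind: MessageKind
    payload: float  # attitude error in degrees, or a raw diagnostic word


MAX_DEFLECTION = 6.0  # hypothetical nozzle limit in degrees


def _clamp(command: float) -> float:
    """Limit a nozzle command to the allowed deflection range."""
    return max(-MAX_DEFLECTION, min(MAX_DEFLECTION, command))


def naive_command(msg: BusMessage) -> float:
    """Treat every payload as attitude data, whatever the message kind."""
    return _clamp(-msg.payload)


def checked_command(msg: BusMessage, last_good: float) -> float:
    """Use the payload only if it is attitude data; otherwise hold last_good."""
    if msg.kind is not MessageKind.ATTITUDE:
        return last_good
    return _clamp(-msg.payload)
```

With a diagnostic word as payload, the naive consumer drives the command to the deflection limit, while the checked consumer keeps the previous command.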
Testing
• no test was performed to verify that the SRI would behave correctly when subjected to the count-down and trajectory of Ariane 5
• the SRI specification did not contain Ariane 5 trajectory data as a functional requirement
“It would have been technically feasible to include almost the entire inertial reference system in the overall system simulations which were performed. For a number of reasons it was decided to use the simulated output of the inertial reference system, not the system itself or its detailed simulation. Had the system been included, the failure could have been detected.”
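The kind of check that was missing can be sketched as a test that drives the conversion with a trajectory profile. All numbers below are hypothetical and the linear build-up is a deliberate simplification; only the factor of five comes from the comparison with Ariane 4 made earlier in the lecture. The function reports the first second at which the value no longer fits in a 16-bit signed integer.

```python
from typing import Optional

INT16_MAX = 32767


def first_overflow_time(rate_per_second: float, duration_s: int) -> Optional[int]:
    """Return the first whole second at which the value no longer fits in a
    16-bit signed integer, or None if it fits for the whole duration."""
    for t in range(duration_s + 1):
        value = rate_per_second * t  # linear build-up: a simplification
        if abs(int(value)) > INT16_MAX:
            return t
    return None


ARIANE4_LIKE_RATE = 200.0                  # hypothetical units per second
ARIANE5_LIKE_RATE = 5 * ARIANE4_LIKE_RATE  # five times faster build-up
WINDOW_S = 40                              # hypothetical time the function keeps running

if __name__ == "__main__":
    print("Ariane 4-like profile:", first_overflow_time(ARIANE4_LIKE_RATE, WINDOW_S))
    print("Ariane 5-like profile:", first_overflow_time(ARIANE5_LIKE_RATE, WINDOW_S))
```

A profile that stays in range for the old launcher says nothing about the new one; the test has to be run with the new trajectory.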
Recommendations
R1 … no software function should run during flight unless it is needed
R2 … test facility must include as much real equipment as possible… Complete simulations must take place...
R3 … do not allow sensors to stop sending best effort data
… more Recommendations
R5 review all flight software… identify all implicit assumptions
R9 include external participants when reviewing specifications, code and justification documents (someone with a fresh mind can sometimes easily spot mistakes that the authors miss)
R14 provide more transparent organisation of co-operation among partners
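Recommendation R3 (keep sending best effort data) can be sketched as a sensor loop that never stops on a conversion failure. This is an illustration under assumptions: the function names and the "repeat the last good value with a degraded flag" policy are hypothetical, not taken from the Board's report.

```python
from typing import Iterable, Iterator, Tuple

INT16_MIN = -32768
INT16_MAX = 32767


def convert(raw: float) -> int:
    """Convert to a 16-bit signed integer; raise if the value does not fit."""
    result = int(raw)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{raw} does not fit in a 16-bit signed integer")
    return result


def stream_readings(raw_values: Iterable[float]) -> Iterator[Tuple[int, bool]]:
    """Yield (value, degraded) pairs and keep going after a conversion failure.

    On failure the last good value is repeated with degraded=True, so the
    consumer still receives data and knows how far to trust it.
    """
    last_good = 0  # assumed safe default before the first good reading
    for raw in raw_values:
        try:
            last_good = convert(raw)
            yield last_good, False
        except OverflowError:
            yield last_good, True
```

For example, `list(stream_readings([100.0, 50000.0, 200.0]))` yields a degraded repeat of 100 for the out-of-range middle value and then resumes with 200.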