
Recent Progress in the APL Fault Management Process

Date post: 03-Jul-2020
Recent Progress in the APL Fault Management Process Kristin Fretz & Adrian Hill JHU/APL
Page 1: Recent Progress in the APL Fault Management Process · Storm Probes (RBSP) project ! Presentation will highlight 6 of the findings, discuss the APL process changes, and the impact

Recent Progress in the APL Fault Management Process Kristin Fretz & Adrian Hill JHU/APL

Page 2

Overview

• 2008 FM Workshop published a White Paper Report identifying 12 top-level findings and recommendations for improving FM systems
• APL recognized these needs and improved its FM and Autonomy processes to address many of these findings
  - Processes are part of APL’s Quality Management System (QMS)
  - Changes implemented on the Radiation Belt Storm Probes (RBSP) project
• Presentation will highlight 6 of the findings, discuss the APL process changes, and the impact of the changes on both the FM and Autonomy processes

Page 3

APL FM Development Process

Page 4

RBSP Overview

2 Observatories (RBSP-A, RBSP-B)
• Nearly identical, single string architecture
• Spin Stabilized ~5 RPM
• Spin-Axis 15°-27° off Sun
• Attitude Maneuvers Every 21 days
• No on-board G&C attitude control
• Operational Design Life of 2 years

Launch and Orbit Insertion
• Single EELV (Observatories Stacked)
• Launch from KSC
• Each observatory independently released Sun pointed
• LV performs maneuver to achieve nominal lapping rate
• Launch: May 2012

Orbit
• Inclination 10°
• Perigee Altitudes 605 & 625 km
• Apogee Altitudes 30,410 & 30,540 km
• Differing apogees allow for simultaneous measurements to be taken over the full range of observatory separation distances several times over the course of the mission. This design allows one observatory to lap the other every 75 days.

Ground Segment
• MOC: Commands & Telemetry
• APL Ground Station: Primary
• NEN Station(s): Data Augmentation, Back-up
• TDRSS: Critical Events at Launch
• Instrument commands and Instrument Telemetry exchanged with the SOCs: EMFISIS SOC, EFW SOC, ECT SOC, PSBR SOC, RBSPICE SOC

Decoupled Operations
• Basic Approach Based on TIMED, STEREO

Page 5

Finding 2: Diffused FM

Finding:
• “Responsibility for FM currently is diffused throughout multiple organizations; unclear ownership leads to gaps, overlap and inconsistencies in FM design, implementation and validation”

Recommendation:
• “Establish clear roles and responsibilities for FM engineering”
• “Establish a process to train personnel to be FM engineers and establish or foster dedicated education programs in FM”

Page 6

Implemented Change for Finding 2: Diffused FM

• APL split FM functionality into two distinct roles: Fault Management and Autonomy
• FM is a system engineering function that:
  - Develops concept of operations
  - Develops FM architecture
  - Oversees reliability analyses such as FMEA/PRA
  - Defines requirements and allocation of those requirements to hardware, software, autonomy, and operations
  - Verifies and validates system with mission-level tests during I&T
• Autonomy is a software engineering function that:
  - Designs and implements autonomy rule-based system derived from allocated FM requirements
  - Performs unit-level/subsystem-level verification

Page 7

Finding 4: Insufficient Formality of Documentation

Finding:
• “There is insufficient formality in the documentation of FM designs and architectures, as well as a lack of principles to guide the processes”

Recommendation:
• “Identify representation techniques to improve the design, implementation and review of FM systems”
• “Establish a set of design guidelines to aid in FM design”

Page 8

Implemented Change for Finding 4: Insufficient Formality of Documentation

• APL’s QMS has formal FM and Autonomy engineering processes which define required documentation and reviews

FM | Autonomy
FM Architecture Document | N/A
FM Requirements Document | Autonomy Requirements Specification
FM Design Specification | Autonomy Design Spec & Users Guide
FM Test Plan & Verification Matrix | Autonomy Acceptance Test Plan
FM Test Procedures | Autonomy Acceptance Test Specification
FM Test Reports | Autonomy Acceptance Test Report (includes verification matrix)

• RBSP FM and Autonomy developed system and subsystem diagrams to communicate information captured in architecture, requirements, and specifications

Page 9

Implemented Change for Finding 4: Insufficient Formality of Documentation

[Figure: Fault Management Modes Diagram. The labels recoverable from the diagram are listed below, without the diagram's connecting arrows.]

Modes:
• OPERATIONAL MODE: normal power-positive conditions exist; spacecraft is available for science data collection
• SAFE MODE: protects against life-threatening low-power or communication fault; no science data collection

Configurations: Launch Configuration, Nominal Operations Configurations, Safe Configuration

Subsystem areas: Power, Avionics, Communication, G&C/Propulsion, Payload

Response location: System, Local

Key: Autonomy, Hardware, MOPS, FSW (plus Ground Commands)

Fault conditions and triggers:
• FSW off-pulse command to PSE I/F card
• Load over-current/power
• Instrument power down request, instrument heartbeat failure, or over-power
• Battery electronics over-power
• Low battery current not during eclipse or current controller over-power
• Battery over-temperature; battery temperature returns to nominal
• SSPA over-power
• RF processor watchdog timeout
• SWCLT expiration or transceiver over-power
• XCVR FSW self-initiated reset
• IEM processor watchdog timeout
• C&DH FSW self-initiated reset
• RF CCD off-pulse command; IEM off-pulse RF CCD command; IEM reset CCD command
• HWCLT expiration
• LVS or LBSOC
• Max sun angle exceedance
• Launch vehicle separation (Not Separated, Separated)
• Latch valve continuous current
• Minimum sun angle violation, thruster armed in eclipse, off-nominal spin rate, or thruster arm time-out

Responses:
• Post-Separation Sequence: PDU detects 3 of 4 switches and removes inhibit; IEM provides switch status to Autonomy; Autonomy detects 3 of 4 switches via IEM telemetry; Post-Sep Macro (Enable Safety Buses, RF ON, Sun Sensor ON); Nominal -OR- Fault SA Deploy Macro. Rule that triggers post-sep macro disabled by ground after confirmation that macro successfully completed.
• Battery Discharge Protection: Power off current controller; reset PSE interface card
• Battery Over-Temperature Protection: Disable coulometer charge control; set charge current to trickle; enable over-temperature recovery
• Battery Over-Temperature Recovery: Enable coulometer charge control; enable over-temperature protection
• VT Off-Pulse: 1 of 3 VT regulators off-pulsed (limited by PSE I/F)
• PDU Load Shed Sequence: PGS VT-only; switched loads OFF
• Battery Electronics Protection: Disconnect battery electronics; turn battery electronics off; non-optimal battery maintenance
• PDU Off-Pulse via RF CCD: PDU off-pulse; ground reconfigures PDU
• Load Over-Current: CB or AUT removes switched power from load
• Hardware Command Loss Timer Sequence: XCVR off-pulse; PDU off-pulse; IEM off-pulse; reinitialize file system (destructive)
• Software Command Loss Timer: Establish emergency comm; off-pulse XCVR (3x max); RF electronics defaults to emergency comm
• RF Electronics Protection: Power off SSPA
• RF Processor Reset: XCVR FSW reboots; establish emergency comm; RF electronics defaults to emergency comm
• Maneuver Abort: Disable thruster fire I/F & power off catbeds
• Latch Valve Continuous-Current: Power off latch valve
• Individual Instrument Safing: Power off instrument
• IEM Reset via IEM CCD: Reset IEM non-critical modules*; recover existing SSR file system
  - *Resets SBC, SSR (lose open files only), SCIF (non-critical)
• IEM Off-Pulse via RF CCD: Interrupt IEM unswitched power*; reinitialize file system (destructive); ground recovery of IEM**
  - *Resets SBC, SSR (lose all data), SCIF (full FPGA)
  - **Restore MET, reload time-tagged cmds & SW updates, restart science data collection
• IEM Non Power-On Reset: Recover existing SSR file system (3x max); C&DH FSW reboots; spin pulse management; establish emergency comm
• Go-Safe Sequence: Disable thruster fire I/F & power off catbeds; power on RF downlink & configure emer comm; BME, PSE CC, & instruments OFF; load re-enforcement (SS, HTRs, PSE I/F ON); resume SSR operations; assert PDU Go-Safe relay; disable C&DH/delete inst time-tag commands
• Soft LVS

• Example of Fault Management Modes Diagram used to convey interactions between Hardware, Software, Autonomy, and Ground/Operations for managing spacecraft faults
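The two modes named in the diagram can be pictured as a small state machine. The sketch below is illustrative only: the event names are hypothetical, and it assumes (the extracted labels suggest this, but the slide text does not confirm it) that LVS, LBSOC, and max sun angle exceedance lead to safe mode and that the return to operational mode is commanded from the ground.

```python
from enum import Enum


class Mode(Enum):
    OPERATIONAL = "operational"  # power-positive, science data collection available
    SAFE = "safe"                # protects against low-power or communication faults


# Assumed safing triggers; the names are placeholders, not flight identifiers.
SAFING_TRIGGERS = {"LVS", "LBSOC", "MAX_SUN_ANGLE_EXCEEDANCE"}

# Assumed recovery event: the ground commands the return to operational mode.
GROUND_RECOVERY = "GROUND_RECOVERY"


def next_mode(mode: Mode, event: str) -> Mode:
    """Return the mode after an event; unrecognized events leave it unchanged."""
    if mode is Mode.OPERATIONAL and event in SAFING_TRIGGERS:
        return Mode.SAFE
    if mode is Mode.SAFE and event == GROUND_RECOVERY:
        return Mode.OPERATIONAL
    return mode


def science_allowed(mode: Mode) -> bool:
    """Science data collection is only available in operational mode."""
    return mode is Mode.OPERATIONAL
```

For example, `next_mode(Mode.OPERATIONAL, "LBSOC")` yields `Mode.SAFE`, and a further fault event while in safe mode leaves the mode unchanged.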

Page 10

Implemented Change for Finding 4: Insufficient Formality of Documentation

[Figure: Autonomy Design Diagram. The rules, macros, and group labels recoverable from the diagram are listed below, without the diagram's connecting arrows.]

Functional groups: SSR MAINTENANCE, SSR INITIALIZATION, SAFE MODE, LATCH VALVE ACTUATION, PROCESSOR RAM DISK, PGS, MANEUVER ABORTS, PRESSURE XDCRS, LAUNCH, RF COMMUNICATIONS, SUN SENSOR

Legend: MACRO [ID], RULE [ID], MOPS, Stor Var [ID]

Rules:
• DETECT ANY CDH REBOOT [R01]: If C&DH processor reboots for any reason and (spacecraft is separated or Mission Phase [SV03] is ORBIT)
• DETECT CDH REBOOT [R02]: If C&DH reboots and this is the first reboot since last ground contact
• DETECT CDH REBOOT WITH SSR CONTENTS PRESERVED [R03]: If C&DH processor reboots and SSR contents are preserved (subject to maximum 3 attempts)
• DETECT CDH REBOOT WITH SSR CONTENTS LOST [R04]: If C&DH processor reboots and SSR contents are lost (e.g., IEM Off-Pulse) and SSR H/W Initialization of its SDRAM is complete
• DETECT SEPARATION [R05]: If 3 (or more) of 4 switches indicate separation and Mission Phase [SV03] is LAUNCH for 5 seconds
• DETECT SA DEPLOY READY [R06]: If 3 (or more) of 4 switches indicate sep -AND- Mission Phase [SV03] is LAUNCH -AND- Time of Separation [SV05] is non-zero -AND- (time elapsed since Time of Separation [SV05] exceeds SA Deploy Delay Interval [SV13] -OR- LVS detected -OR- LBSOC is detected)
• DETECT HARDWARE LVS [R10]: If Hardware LVS detected and either Mission Phase [SV03] is ORBIT or Deployments Complete Flag [SV11] is TRUE
• DETECT LOW BATT SOC [R11]: If LBSOC is detected and either Mission Phase [SV03] is ORBIT or Deployments Complete Flag [SV11] is TRUE
• DETECT SUN ANGLE MAX [R12]: If sun angle exceeds Maximum Sun Angle [SV04] and either Mission Phase [SV03] is ORBIT or Deployments Complete Flag [SV11] is TRUE
• DETECT HARDWARE CLT [R13]: If C&DH processor reboots as a result of HW CLT expiration (HWCLT Relay set) and (spacecraft is separated or Mission Phase [SV03] is ORBIT)
• DETECT SOFTWARE LVS [R14]: If batt voltage is less than Software LVS Threshold [SV14] and either Mission Phase [SV03] is ORBIT or Deployments Complete Flag [SV11] is TRUE
• DETECT SW CLT TIMEOUT [R15]: If time elapsed since Time of Last SWCLT Restart [SV01] exceeds SWCLT Interval [SV02] and (spacecraft is separated or Mission Phase [SV03] is ORBIT)
• DETECT TRANSCEIVER EXCESS POWER [R16]: If transceiver power exceeds limits
• DETECT TRANSCEIVER SOFTWARE REBOOT [R17]: If transceiver flight software reboots unexpectedly
• DETECT SSPA EXCESS POWER [R18]: If SSPA power exceeds limits
• DETECT SPIN RATE ERROR [R20]: If spacecraft spin rate is below Minimum Spin Rate [SV07] or exceeds Maximum Spin Rate [SV06] and maneuver in progress
• DETECT MANEUVER EXECUTION TIMEOUT [R21]: If maneuver execution longer than Maximum Maneuver Time [SV08]
• DETECT MANEUVER EXECUTION IN ECLIPSE [R22]: If solar array current is below limits or no sun pulses and maneuver in progress
• DETECT SUN ANGLE VIOLATION [R23]: If sun angle is below Maneuver Min Sun Angle [SV10] or beyond Maneuver Max Sun Angle [SV09] and maneuver in progress
• DETECT CDH REBOOT AND CATBED ON [R24]: If C&DH processor reboots for any reason and catbed heaters were left powered on
• DETECT SSR RESET [R26]: If FSW has induced an SSR Reset
• DETECT PRESSURE TRANSDUCERS OVERCURRENT [R29]: If pressure transducers load current exceeds limits
• DETECT SUN SENSOR OVERCURRENT [R30]: If sun sensor load current exceeds limits
• DETECT SUN SPIN PULSE USE MISCONFIGURATION [R31]: If sun sensor based spin pulse is selected but sun presence is not detected (i.e., sun sensor eclipse)
• DETECT SIMULATED SPIN PULSE USE MISCONFIGURATION [R32]: If hardware timer spin pulse is selected but sun presence is detected (i.e., sun sensor pulse is sensed)
• DETECT LV-1 OPEN/CLOSE CMD CONTINUOUS CURRENT [R33]: If latch valve-1 open or close command asserts continuous current for 2 seconds
• DETECT LV-2 OPEN/CLOSE CMD CONTINUOUS CURRENT [R34]: If latch valve-2 open or close command asserts continuous current for 2 seconds
• DETECT XOVER LV OPEN CMD CONTINUOUS CURRENT [R35]: If xover latch valve open command asserts continuous current for 2 seconds
• DETECT XOVER LV CLOSE CMD CONTINUOUS CURRENT [R38]: If xover latch valve close command asserts continuous current for 2 seconds
• DETECT BATTERY OVER-TEMPERATURE [R40]: If battery temperature exceeds over-temp limit and charge rate is above limit and coulometer charge control is enabled
• RESUME BATTERY OVERTEMP MONITORING [R41]: If battery temperature is below acceptable limits and PSE current controller is powered on and coulometer charge control is disabled
• DETECT BATTERY MANAGEMENT ELECTRONICS EXCESS POWER [R42]: If battery management electronics power exceeds limits
• DETECT BATTERY MANAGEMENT ELECTRONICS CIRCUIT BREAKER TRIP [R43]: If battery management electronics trips circuit breaker
• DETECT PSE INTERFACE CARD EXCESS POWER [R44]: If PSE Interface card power exceeds limits
• DETECT PSE INTERFACE CARD CIRCUIT BREAKER TRIP [R45]: If PSE Interface card trips circuit breaker
• DETECT PSE CURR CNTRLR OVERCURRENT [R46]: If PSE current controller load current exceeds limits
• DETECT PSE CURR CNTRLR CIRCUIT BREAKER TRIP [R47]: If PSE current controller trips circuit breaker
• DETECT UNEXPECTED BATTERY DISCHARGE [R48]: If battery is discharging while shunt current is above acceptable limits

Macros:
• POWER ON RF DOWNLINK PATH AND CONFIGURE EMERGENCY COMM [M01]: Power on SSPA (A); Sleep 5 seconds; Issue transceiver firmware reset; Sleep 15 seconds; CALL CONFIGURE RF FOR EMERGENCY COMM [M19]
• RECORD HK TO PROCESSOR RAM DISK [M02]: Change record rate of G&C Maneuver Tlm Packet to 1 packet every 300 seconds; Record engineering data (h/k, evt msgs, cmd status) to local RAM disk; Freeze recording of CFE_EVS log; Dump memory scrub errors to RAM disk; Sleep 60 minutes; Halt recording of engineering data to local RAM disk; Restore record rate of G&C Maneuver Tlm Packet to 1 packet every second
• RESUME SSR RECORDING [M03]: Enable SSR memory scrub; Recover existing SSR file system; Sleep 5 seconds; CALL RECORD ENGINEERING DATA TO SSR [M27]; CALL RECORD INSTRUMENT DATA TO SSR [M28]
• REINITIALIZE SSR RECORDING [M04]: Enable SSR memory scrub; Create new SSR file system (destructive); Sleep 5 seconds; CALL RECORD ENGINEERING DATA TO SSR [M27]
• POST-SEPARATION SEQUENCE EXECUTIVE [M05]: Set Time of Separation [SV05] to current FSW uptime; CALL PERFORM POST-SEPARATION SEQUENCE [M07]; Sleep 20 seconds; CALL PERFORM POST-SEPARATION SEQUENCE [M07]
• SA DEPLOYMENTS EXECUTIVE [M06]: Sleep 60 seconds; CALL PERFORM X-AXIS SA DEPLOYMENTS [M08]; Sleep 60 seconds; CALL PERFORM Y-AXIS SA DEPLOYMENTS [M13]; Sleep 60 seconds; CALL PERFORM X-AXIS SA DEPLOYMENTS [M08]; Sleep 60 seconds; CALL PERFORM Y-AXIS SA DEPLOYMENTS [M13]; Set Deployments Complete Flag [SV11] to TRUE
• PERFORM POST-SEPARATION SEQUENCE [M07]: Enable RF safety buses (A&B); Enable actuator safety buses (A&B); Enable propulsion safety buses (A&B); Enable battery conn relays (1&2); Enable SAJB Relay at Reduced Curr (A&B); Select VT Level #4; Disconnect BME cell bypass relay; CALL POWER ON RF DOWNLINK PATH AND CONFIGURE EMERGENCY COMM [M01]; Power on sun sensor electronics unit
  - *** This macro is disabled by default for I&T ***
• PERFORM X-AXIS SA DEPLOYMENTS [M08]: Fire +X primary / -X redundant actuators; Sleep 5 seconds; Fire +X redundant / -X primary actuators
• PERFORM GOSAFE [M10]: CALL DISABLE THRUSTERS [M20]; Delete instrument timetags; Suspend C&DH timetags at priorities 4-15; CALL POWER OFF BATTERY MANAGEMENT ELECTRONICS [M42]; Power off PSE current controller; CALL SHUTDOWN PAYLOAD [M47]; CALL POWER ON RF DOWNLINK PATH AND CONFIGURE EMERGENCY COMM [M01]; CALL RECORD ENGINEERING DATA TO SSR [M27]; Power on propulsion module heaters; Power on sun sensor electronics unit; Power on bus survival heaters; Inhibit PDU Response to Low Battery State of Charge from PSE Interface Card; Power on PSE interface card; Assert PDU Go-Safe relay
• PERFORM Y-AXIS SA DEPLOYMENTS [M13]: Fire +Y primary / -Y redundant actuators; Sleep 5 seconds; Fire +Y redundant / -Y primary actuators
• POWER ON RF DOWNLINK PATH AND OFF PULSE TRANSCEIVER [M15]: Power on SSPA (A); CALL OFF PULSE TRANSCEIVER [M16]; Disable baseband uplink
• OFF PULSE TRANSCEIVER [M16]: Off-pulse transceiver (h/w-limited to 3 attempts)
  - Off pulse will cause transceiver software reboot
• SAVE TRANSCEIVER REBOOT CAUSE [M17]: Save transceiver reboot flags in Last Transceiver Reboot Cause [SV12]; Clear transceiver reboot flags; CALL CONFIGURE RF FOR EMERGENCY COMM [M19]
• POWER OFF SSPA [M18]: Power off SSPA (A&B)
• CONFIGURE RF FOR EMERGENCY COMM [M19]: Configure FSW tables for emergency downlink rates; Disable FSW playback; Sleep 30 seconds; Configure transceiver for emergency uplink/downlink
• DISABLE THRUSTERS [M20]: Disable IEM thruster fire interface; Power off catbed heaters
• RE-ENABLE SSR MEMORY SCRUB [M26]: Re-enable SSR memory scrub; Clear count of FSW-induced SSR resets
• RECORD ENGINEERING DATA TO SSR [M27]: Open files on SSR to record engineering data (includes housekeeping, event message and command status packets)
• RECORD INSTRUMENT DATA TO SSR [M28]: Open files on SSR to record instrument data
• POWER OFF PRESSURE TRANSDUCERS [M29]: Power off pressure transducers
• POWER OFF SUN SENSOR [M30]: Power off sun sensor electronics unit; Configure IEM to use Hardware Timer Spin Pulse
• SELECT SIMULATED SPIN PULSE [M31]: Configure IEM to use Hardware Timer Spin Pulse
• SELECT SUN SENSOR SPIN PULSE [M32]: Configure IEM to use Sun Sensor Spin Pulse
• REMOVE LATCH VALVE-1 OPEN/CLOSE PULSE POWER [M33]: Terminate latch valve-1 open command pulse actuation; Terminate latch valve-1 close command pulse actuation
• REMOVE LATCH VALVE-2 OPEN/CLOSE PULSE POWER [M34]: Terminate latch valve-2 open command pulse actuation; Terminate latch valve-2 close command pulse actuation
• REMOVE XOVER LATCH VALVE OPEN PULSE POWER [M35]: Terminate xover latch valve open cmd pulse actuation
• REMOVE XOVER LV CLOSE PULSE POWER [M38]: Terminate xover latch valve close cmd pulse actuation
• DISABLE COULOMETER CHARGE CONTROL [M40]: Disable coulometer charge control; Set battery chrg rate to C/100
• RESUME COULOMETER CHARGE CONTROL [M41]: Enable coulometer charge control
• POWER OFF BATTERY MANAGEMENT ELECTRONICS [M42]: Throw relay to disconnect Battery Management Electronics from battery; Sleep 2 seconds; Power off Battery Management Electronics
• POWER OFF PSE INTERFACE CARD AND CURRENT CONTROLLER [M44]: Inhibit PDU Response to Low Battery State of Charge from PSE Interface Card; Power off PSE interface card; Power off PSE current controller
• RESET PSE INTERFACE CARD [M46]: Inhibit PDU Response to Low Battery State of Charge from PSE Interface Card; Power off PSE current controller; Reset PSE Interface Card

Off-page connectors: A, C, To SHUTDOWN PAYLOAD Macro (Page 2)

• Example of Autonomy Design Diagram used to convey implementation of monitors and responses that comprise the on-board autonomy system for responding to faults
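The rule and macro structure in the diagram can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RBSP autonomy engine: the `Telemetry` fields, the `Rule` class, and the `expand` helper are hypothetical names, the pairing of rule R05 with macro M05 is assumed (the extracted text does not show which macro each rule triggers), and only the premises of R05 and R15 and the steps of M01 and M19 are taken from the diagram text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Telemetry:
    """Hypothetical telemetry snapshot; all field names are illustrative."""
    uptime_s: float                     # FSW uptime in seconds
    sep_switches: List[bool]            # four separation switch states
    mission_phase: str                  # "LAUNCH" or "ORBIT" (stands in for SV03)
    last_swclt_restart_s: float = 0.0   # stands in for SV01
    swclt_interval_s: float = 0.0       # stands in for SV02


def is_separated(t: Telemetry) -> bool:
    """True when 3 (or more) of 4 switches indicate separation."""
    return sum(t.sep_switches) >= 3


def r05_premise(t: Telemetry) -> bool:
    """R05 premise: separated and Mission Phase is LAUNCH."""
    return is_separated(t) and t.mission_phase == "LAUNCH"


def r15_premise(t: Telemetry) -> bool:
    """R15 premise: SWCLT interval exceeded and (separated or phase is ORBIT)."""
    elapsed = t.uptime_s - t.last_swclt_restart_s
    return elapsed > t.swclt_interval_s and (
        is_separated(t) or t.mission_phase == "ORBIT"
    )


@dataclass
class Rule:
    rule_id: str
    premise: Callable[[Telemetry], bool]
    persistence_s: float                 # premise must hold continuously this long
    macro_id: str                        # response macro (pairing is assumed)
    _true_since: Optional[float] = None  # uptime at which premise became true

    def evaluate(self, t: Telemetry) -> Optional[str]:
        """Return the macro ID once the premise has persisted, else None."""
        if not self.premise(t):
            self._true_since = None      # any false sample resets persistence
            return None
        if self._true_since is None:
            self._true_since = t.uptime_s
        if t.uptime_s - self._true_since >= self.persistence_s:
            return self.macro_id
        return None


# Macro steps copied from the diagram text; "CALL <id>" marks a nested macro.
MACROS: Dict[str, List[str]] = {
    "M01": [
        "Power on SSPA (A)",
        "Sleep 5 seconds",
        "Issue transceiver firmware reset",
        "Sleep 15 seconds",
        "CALL M19",
    ],
    "M19": [
        "Configure FSW tables for emergency downlink rates",
        "Disable FSW playback",
        "Sleep 30 seconds",
        "Configure transceiver for emergency uplink/downlink",
    ],
}


def expand(macro_id: str) -> List[str]:
    """Flatten a macro into primitive steps by following nested CALLs."""
    steps: List[str] = []
    for step in MACROS[macro_id]:
        if step.startswith("CALL "):
            steps.extend(expand(step[len("CALL "):]))
        else:
            steps.append(step)
    return steps
```

A rule built as `Rule("R05", r05_premise, 5.0, "M05")` returns `None` until its premise has held for five seconds of uptime, and any sample where the premise is false restarts the count. Keeping the premise, the persistence, and the response separate mirrors the RULE / MACRO / Stor Var split shown in the diagram legend.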

Page 11

Findings 7 & 8: FM Complexity

Finding 7:
• “The impact of mission-level requirements on FM complexity and V&V is not fully recognized”
  - “Review and understand the impacts of mission-level requirements on FM complexity; FM designers should not suffer in silence, but should assess and elevate impacts to the appropriate levels of management”

Finding 8:
• “FM architectures often contain complexity beyond what is defined by project specific definitions of faults and required fault tolerance”
• “Increased FM architecture complexity leads to increased challenges during I&T and mission operations”
  - “Assess the appropriateness of the FM architecture with respect to the scale and complexity of the mission, and the scope of the autonomy functions to be implemented within the architecture”

Page 12

Implemented Change for Findings 7 & 8: FM Complexity

• FM involvement on RBSP started in early phase-A as a part of the systems engineering team
  - Involved in mission-level trades and assessed impact of mission-level decisions on FM
  - FM “ownership” of requirements at all levels with requirements allocated from mission-to-system-to-subsystem
• Development of FM Architecture Document defined FM approach for the project
  - Architecture document now a formal part of FM process which helped to communicate FM concepts/approach
  - Resulted in project “buy-in” on FM approach early in mission development

Page 13

Finding 11: Inadequate Testbed Resources

Finding:
• “Inadequate testbed resources is a significant schedule driver during V&V”

Recommendation:
• “Develop high-fidelity simulations and hardware testbeds to comprehensively exercise the FM system prior to spacecraft level testing”

Page 14

Implemented Change for Finding 11: Inadequate Testbed Resources

• RBSP flight software testbeds available for both FM and Autonomy testing
  - Available early and provided enough fidelity to verify Autonomy system
  - Allowed for dry-running portions of FM system-level tests
• RBSP project also developed high-fidelity hardware-in-the-loop (HIL) simulator
  - Purpose of HIL was to provide high-fidelity test platform for the development of FM system tests prior to spacecraft I&T
    - Due to hardware issues, completion and availability of HIL were delayed until late I&T
  - FM system-level testing (dry-run and for-score) was performed on spacecraft without the opportunity to dry-run on the HIL

Page 15

Finding 1: Cost & Schedule

Finding:
• “Unexpected cost and schedule growth during final system integration and test are a result of underestimated Verification and Validation (V&V) complexity combined with late resource availability and staffing”

Recommendation:
• “Allocate FM resources and staffing early, with appropriate schedule, resource scoping, allocation, and prioritizing; schedule V&V time to capitalize on learning opportunity”
• “Establish Hardware / software / “sequences” / operations function allocations within an architecture early to minimize downstream testing complexity”
• “Engrain FM into the system architecture. FM should be “dyed into design” rather than “painted on””

Page 16

Implemented Change for Finding 1: Cost & Schedule

• RBSP spacecraft-level testing for both FM and autonomy started early in I&T
  - FM tests were ready at start of I&T which allowed for dry-running of tests (utilized partially integrated spacecraft)
  - Staffing for FM and Autonomy testing increased in response to lessons learned from New Horizons and MESSENGER
• FM Design Specification clearly defined allocations of FM functions to hardware, software, autonomy, and operations
• Improved approach, planning, and staffing resulted in avoiding “death spiral” and staffing “bump” during I&T

Page 17

Implemented Change for Finding 1: Cost & Schedule

[Figure: two staffing plots of Staff Months versus Months to Launch (horizontal axis from -70 to 10). Key: Past APL Mission #1, Past APL Mission #2, Past APL Mission #3, RBSP.]

• Past missions resulted in staffing “bump” during I&T
• Noticeable change in shape of staffing curve for RBSP
  - FM/Autonomy waited on the spacecraft for testing rather than spacecraft waiting on FM/Autonomy
  - Remaining RBSP time is projected

Page 18

Conclusion

• Significant changes seen in staffing curve for RBSP as compared to previous APL missions
• Credit process changes for these improvements:
  - Better defined FM/Autonomy roles
  - Required documentation at key milestones
  - Use of diagrams as communication tool for documentation
  - Early program involvement to manage complexity
  - Improved testbed resources
• APL FM/Autonomy processes continuing to evolve by using lessons learned from RBSP and findings from 2008 FM Workshop

