Page 1: Fault-Free Performance Validation of Fault-Tolerant Multiprocessors (reports-archive.adm.cs.cmu.edu/anon/anon/scan/CMU-CS-86-127.pdf)

CMU-CS-86-127

Fault-Free Performance Validation of Fault-Tolerant Multiprocessors

Ann Marie Grizzaffi

November 1985

Dept. of Electrical and Computer Engineering

Carnegie-Mellon University

Pittsburgh, Pennsylvania 15213

Submitted to Carnegie-Mellon University in partial fulfillment of the

requirements for the Degree of

Master of Science in Electrical and Computer Engineering

Copyright © 1986 Ann Marie Grizzaffi

This Research was sponsored by the National Aeronautics and Space Administration, Langley Research Center under contract NAG-1-190. This Research was also supported by AT&T Bell Laboratories.

The views and conclusions contained in this document are those of the author and should not be

interpreted as representing the official policies, either expressed or implied, of NASA, the United States

Government, AT&T Bell Laboratories, or Carnegie-Mellon University.


Table of Contents
Abstract 1
1. Introduction 2
2. Background 3
2.1. Developing the Methodology 3
2.2. Validation Methodology Defined 4
3. The SIFT Environment 5
3.1. Hardware Configuration 5
3.2. SIFT Software 5
3.3. Experimental Environment 8
4. FTMP and Its Experimental Environment 11
5. The Experiments 14
5.1. Clock Read Characteristics 15
5.2. Instruction Times 15
5.3. Instruction Combinations 15
5.4. Task Stretching 16
6. Results and FTMP Comparisons 18
6.1. Read Time Clock Delay 18
6.2. Instruction Measurements 19
6.3. Instruction Combination Measurements 20
6.4. Task Stretching Results 28
7. Future Work 30
7.1. Baseline Experiments 30
7.2. Synthetic Workload 30
8. Conclusions 31
I. Appendix 32
I.1. Clock Read Dump 33
I.2. Statistical Data on Instruction Times 34
I.3. Statistical Data on Instruction Combinations 36
References 38


List of Figures
Figure 3-1: Block Diagram of SIFT Distributed System 6
Figure 3-2: Block Diagram of a SIFT Processor 7
Figure 3-3: Block Diagram of the SIFT Test Environment 9
Figure 4-1: FTMP Block Diagram, [Czeck 85] 11
Figure 4-2: FTMP's Test Environment, [Czeck 85] 12
Figure 5-1: Basic Task Algorithm 14
Figure 5-2: Program Used For Task Stretching - Voting Case 17
Figure 6-1: Frequency vs. Microseconds per 100 A := 1 Iterations 21
Figure 6-2: Procedure Calls vs. Parameters 22
Figure 6-3: Graph of Instruction Times: SIFT vs. FTMP 24
Figure 6-4: A := 1 vs. Consecutive Executions 25


List of Tables
Table 5-1: Instructions Measured on SIFT 16
Table 5-2: Instruction Combinations Tested on SIFT 16
Table 6-1: SIFT Clock Read Results 18
Table 6-2: Clock Read Results for SIFT and FTMP 18
Table 6-3: Summary of SIFT Instruction Execution Times 20
Table 6-4: Instruction Times: SIFT vs. FTMP 23
Table 6-5: SIFT vs. FTMP for Integer Assign A := 1 26
Table 8-6: SIFT: Comparison Instructions Combinations Not Done on FTMP 26
Table 6-7: SIFT: Comparison Between Single Instructions and Combinations 27
Table 6-8: FTMP: Comparison Between Single Instructions and Combinations 27
Table 6-9: SIFT vs. FTMP in Addition Combination 28
Table 6-10: SIFT: Task Stretching Results 29
Table I-1: Raw Data: SIFT Clock Read Experiment 33
Table I-2: Instruction Execution Time: Integer and Boolean Data Types 34
Table I-3: Statistical Information: Integer and Boolean Data Types 34
Table I-4: Instruction Execution Time: Miscellaneous Instructions 35
Table I-5: Statistical Information: Miscellaneous Instructions 35
Table I-6: Instruction Execution Time: Instruction Combinations 36
Table I-7: Statistical Information: Instruction Combinations 37


Abstract

By the 1990's, aircraft will employ complex computer systems to control flight-critical functions. Since

computer failure would be life threatening, these systems should be experimentally validated before being

given aircraft control.

Over the last decade, Carnegie-Mellon University has developed a validation methodology for testing the

fault-free performance of fault-tolerant computer systems. Although this methodology was developed to

validate the Fault-Tolerant Multiprocessor (FTMP) at NASA-Langley's AIRLAB facility, it is claimed to

be general enough to validate any ultrareliable computer system.

The goal of this research was to demonstrate the robustness of the validation methodology by its application on NASA's Software Implemented Fault-Tolerance (SIFT) Distributed System. Furthermore, the performance of two architecturally different multiprocessors could be compared by conducting identical baseline experiments.

From an analysis of the results, SIFT appears to have better overall performance for instruction execution than FTMP. Thus far, the validation methodology has proven general enough to apply to SIFT, and has produced results that are directly comparable to previous FTMP experiments.


1. Introduction

Today's aircraft use simple on-board computers to perform isolated functions that are not flight critical.

If the computer fails, the flight crew can assume control of the function previously done by the computer,

without loss of life or cargo. But expanding technology is creating advanced aircraft that are too complex

for humans and simple computers to control. By the next decade, aircraft will require fault-tolerant

computer systems to perform flight-critical functions. Unfortunately, integrating avionics and control

functions means that computer failure can become a life threatening situation. Therefore, it is critical

that any computer system put in control of an aircraft be fault-free.

The National Aeronautics and Space Administration (NASA) has on-going research in the integration of avionics and control functions at Langley Research Center's Avionics Integrated Research Laboratory (AIRLAB). One study determined that a computer system could be considered sufficiently reliable for aircraft control if it has a probability of less than 10^-10 failures per hour, or one failure per million years of operation. Since it is not feasible to wait a million years to insure that a computer system is fault-free, a validation methodology had to be created that would test for functional correctness. In light of this need, NASA held several workshops to determine the best approach.
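The equivalence stated above (10^-10 failures per hour versus roughly one failure per million years) can be checked with back-of-the-envelope arithmetic. A sketch follows; the 8,760 hours-per-year figure is my assumption, not taken from the report:

```python
failure_rate = 1e-10                 # failures per hour, the reliability target
hours_per_year = 8760                # 24 * 365, ignoring leap years (assumed)
million_years_in_hours = 1e6 * hours_per_year

# Expected number of failures over a million years of operation.
expected_failures = failure_rate * million_years_in_hours
print(expected_failures)             # 0.876, i.e. on the order of one failure
```

The product comes out just under one, consistent with the "one failure per million years" paraphrase.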

Based upon the results of these workshops, Carnegie-Mellon University (CMU) developed a set of

experiments to validate the prototype Fault-Tolerant Multiprocessor (FTMP) at AIRLAB [Clune

84, Feather 85]. Through this research, CMU was further able to develop a validation methodology

claimed to be general enough to test the fault-free performance of any fault-tolerant system.

The goal of this research was to demonstrate the robustness of the validation methodology by

application to NASA's Software Implemented Fault-Tolerant (SIFT) distributed system. This research

also demonstrated that conducting identical baseline experiments allows the performance of two

architecturally different systems to be directly compared.


2. Background

In 1979, NASA held several workshops to develop a complete set of procedures for validating fault-

tolerant computer systems. One study in particular [NASA 79] produced an approach to validation that

would test systems in an orderly manner. This list was based on a building block method of analysis.

The approach proposed that experimentation should begin with the measurement of primitive hardware

and operating system activities. After primitive activities are characterized, more complex experiments

should be done to define interactions between primitive activities. This orderly progression would not only build up confidence in a system in an incremental manner but also insure uniform coverage and make the cause of unexpected phenomena easier to locate. The steps in the building block approach included:

1. Initial Checkout and Diagnostics.
2. Programmer's Manual Validation.
3. Executive Routine Validation.
4. Multiprocessor Interconnect Validation.
5. Multiprocessor Executive Routine Validation.
6. Application Program Validation and Performance Baseline.
7. Simulation of Inaccessible Physical Failures.
8. Single Processor Fault Insertion.
9. Multiprocessor Fault Insertion.
10. Single Processor Executive Failure Response Characterization.
11. Multiprocessor System Executive Fault Handling Capabilities.
12. Application Program Validation on Multiprocessor.
13. Multiple Application Program Verification on Multiprocessor.

The first six tasks validate the fault-free functionality of the system while the remaining seven validate

fault handling capabilities.

2.1. Developing the Methodology

Over the last decade, CMU has devoted over 100 man-years to the design, construction, and validation

of multiprocessor systems. This research led to the development of a generalized methodology for

validating the fault-free performance of multiprocessor systems.

Some of the guidelines used in creating this methodology included:

• Refining the validation methodology as experiments uncover new information or the methodology is applied to new multiprocessor systems.

• Designing experiments to validate behavior that is documented as well as uncovering behavior that is not documented.

• Performing experiments in a systematic manner. Since the search is for the unexpected, there should be no shortcut to thorough testing.

• Designing experiments so that they are repeatable.


• Using a building block approach that changes one variable at a time, so causes of unexpected behavior are easy to isolate.

• Allowing experiments to take advantage of the abstract levels used in the design of the system.

• Tempering experiments by available environments. More sophisticated experiments may have to be postponed until the experimental environment is provided with more tools.

Each step of the methodology, like NASA's validation procedure, follows a building block approach.

The technique begins by conducting baseline experiments, experiments that measure a single phenomenon

while all other interactions are held constant. Baseline experiments are designed to validate the basic

assumptions used in the mathematical models from which the system was designed. They also test the

validity of the assumptions made by the system's programmers when designing the operating system and

other applications programs. After the baseline and individual phenomena have been characterized,

advanced experiments to explore the interaction between basic phenomena are begun.

2.2. Validation Methodology Defined

The methodology begins with measurements of the execution times of system primitives, the overhead

time incurred when programs are executed, and the variation of function execution times on the system.

Specifically, the steps in the hierarchical procedure are as follows:

1. First, measure the time it takes to read the clock. Since the clock will be used for later phases, the experimenter must be certain that the clock is predictable; a clock read must be constant or vary predictably.

2. Next, measure single system parameters, or baseline parameters. This includes measuring instruction execution times, operating system function execution times, and task manipulation phenomena.

3. Last, measure the iteration of programs on the system. The most efficient way to accomplish this is to create synthetic workloads of different sizes and structures. The synthetic workload environment is used to test features such as raw performance, bottlenecks, and overhead in the operating system.

When applying the techniques to test systems of different architectures, it is not always possible to

perform identical experiments. Sometimes, exact duplications may prove to be architecturally impossible

or irrelevant for the type of system being tested. For these instances, careful substitutions must be made

to insure that the new experiments test comparable characteristics. When more sophisticated experiments

have to be postponed until the advent of more sophisticated tools, the experimenter is encouraged to

move on to the next step in the baseline experiments.


3. The SIFT Environment

The Software Implemented Fault-Tolerance (SIFT) computer is one of two prototype systems developed

for NASA for experimentation in fault-tolerant systems research [SRI 82]; the other is a hardware

redundant Fault-Tolerant Multiprocessor, FTMP. SIFT was designed and built by Bendix Flight Systems

Division, under subcontract to SRI International, and delivered to the Langley Avionics Integration

Research Laboratory (AIRLAB) in April 1982. This section gives a brief overview of SIFT's hardware

configuration and experimental environment [SRI 84].

3.1. Hardware Configuration

The SIFT architecture is made up of a fully distributed configuration of Bendix BDX-930 processors,

with point-to-point communication links between every pair of processors as shown in Figure 3-1.

Although SIFT was designed and built to accommodate eight processors, there are seven in the current

system. Six of these processors are required for fault-tolerant experimentation; reliability estimations

have demonstrated only six are needed to meet the required safety margin of less than 10^-10 probability of

failure per hour [Palumbo & Butler 85]. The seventh processor is used by the Data Acquisition System

described in the next section.

In a fully distributed system, dependency on shared facilities is kept to a minimum. Therefore, each

SIFT processor contains its own main memory, power supply, clock, and I/O channel. As shown in the

block diagram of Figure 3-2, each processor in the system includes:

• 16-bit CPU.

• 32K words of static random access memory (RAM), which holds the SIFT executive program, the application programs, the transaction and data files, and the control stack.

• 1K datafile memory used as a buffer area for the broadcast and 1553 controller.

• 1K transaction file memory used to hold the destination addresses of the values in the datafile to be transmitted.

• A broadcast controller for interprocessor communication.

• A 1553A controller used to support external I/O to terminals, sensors, or avionics modules.

• A real-time clock consisting of a 16-bit counter driven by a 16MHz crystal; each clock tick is equivalent to 1.6 microseconds.

3.2. SIFT Software

To run an experiment on SIFT, the user writes a task in Pascal on the host computer. Once a task is

written, it is compiled, assembled, and linked with the SIFT operating system. This procedure creates an

absolute executable image file that can be loaded directly onto the selected SIFT processors. Reliability is

achieved by replicating the task on more than one processor. The number of processors chosen is

specified by the user, whose decision is based on the importance of the task.

Allocation of a task is done through a user defined Schedule Table. The Schedule Table lists the set of


[Figure not reproduced: sensors and actuators attached to each of PROC 1 through PROC 7, each processor with its own memory and point-to-point links to the others.]

Figure 3-1: Block Diagram of SIFT Distributed System


[Figure not reproduced: CPU and two 16K memory banks on a memory bus, with the broadcast controller, 1K datafile, 1K transaction file, 1553A controller, and real-time clock.]

Figure 3-2: Block Diagram of a SIFT Processor


tasks that will be periodically dispatched, along with task specific information. It is the user's job to

decide the order tasks are executed, the number of processors used for replication, and the data to be

voted. The user must also specify the "duration" of the task in increments of 1.6 millisecond slots. This

step insures results are broadcasted in time for voting. It also prevents a non-faulty processor from being

configured out of the system because of task time-out.
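The duration bookkeeping described above amounts to rounding a worst-case task time up to a whole number of 1.6 ms slots. A sketch follows; `slots_needed` and the `margin` safety factor are hypothetical illustrations, not SIFT utilities (only the 1.6 ms slot width comes from the text):

```python
import math

SLOT_MS = 1.6  # scheduling granularity quoted in the text

def slots_needed(worst_case_ms, margin=1.25):
    """Slots to request so the task cannot time out.

    margin is an invented safety factor padding the worst-case
    estimate, since an under-allocated task would time out and its
    (non-faulty) processor could be configured out of the system.
    """
    return math.ceil(worst_case_ms * margin / SLOT_MS)

print(slots_needed(10.0))  # 10 ms worst case -> 8 slots
```

Rounding up, rather than truncating, is the conservative choice here: a spare fraction of a slot is wasted time, while a missing fraction is a spurious time-out.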

After execution of a task, the results from each processor are compared, or "voted" on. If all copies are

not the same, an error has occurred. These errors are recorded in the processors' memory to assist the

Executive System in determining which processor is faulty. If an error occurred, the Executive System

isolates the fault by ensuring that only the correct or "majority" value is passed on to the next task.

Fault isolation prevents a faulty unit from causing problems in the system, such as corrupting a non-

faulty processor's memory. If fault isolation is not done, a faulty or "malicious" processor could create a

life threatening situation, such as transmitting an invalid control signal. Once a processor has been found

faulty, the Executive System reassigns the processor's tasks to another processor, thereby configuring it

out of the system.
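The majority-voting step described above can be sketched as a simple host-side function (an illustration only, assuming exact equality of correct copies; SIFT's actual Executive System code is not shown in this report):

```python
from collections import Counter

def vote(copies):
    """Return (majority value, ids of disagreeing processors).

    copies maps a processor id to the result it broadcast for a task.
    """
    majority, _ = Counter(copies.values()).most_common(1)[0]
    suspects = [p for p, v in copies.items() if v != majority]
    return majority, suspects

# Three replicated copies; processor 3 disagrees and is flagged,
# but the correct value still propagates to the next task.
value, suspects = vote({1: 42, 2: 42, 3: 41})
print(value, suspects)  # 42 [3]
```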

3.3. Experimental Environment

SIFT provides the experimenter with a user-friendly test environment, promoting experimentation

through interactive facilities designed to help prepare, exercise, and observe the system's behavior. Figure

3-3 depicts the test environment as seen by the user. From a terminal linked to the host computer, a

researcher can create and run experiments on SIFT, collect data, print out files, and dump data to an

on-line printer. All communication to and from SIFT is through a VAX-11/750. This host computer is

solely dedicated to SIFT research. NASA also installed added features to the SIFT environment to

enhance experimental conditions: a Data Acquisition System (DAS) for improved data collection and a

global clock for improved measurement conditions.

DAS is made up of many integrated programs that receive and analyze data from the SIFT processors.

These programs are downloaded to the seventh SIFT processor, which can then control data collection.

Before this system was created, data collection was limited to 4K words of memory. With the Data

Acquisition System, information is sent from the SIFT processors to a disk capable of holding 50K blocks,

a total of 12.8M words. DAS requires some initial preparation, but it features an interface program that

facilitates the task. A preprocessing program is also available which provides the user with the ability to

manipulate the data straight from the disk, and the ability to specify what data to save for later

processing.

The global clock is a 16-bit independent measuring device, simultaneously available to all processors via

a read bus. There is no arbitration for this bus and no contention. It features a programmable time-base


[Figure not reproduced: terminal, disk, and line printer attached to the VAX-11/750 host computer, which connects to SIFT processors 1 through 7; processor 7 runs DAS.]

Figure 3-3: Block Diagram of the SIFT Test Environment


so the user can specify the resolution of the clock (i.e. 1 microsecond, as used in these experiments). The

global clock assures the consistency and reliability of the measurements taken by the processors, since clock times come from a common external reference.


4. FTMP and Its Experimental Environment

Since a comparison of SIFT to FTMP will be made, this section will give a brief software overview of

the FTMP system and its experimental environment. Additional information can be found in [Clune

84, Feather 85, Czeck 85].

Figure 4-1 is a block diagram of the FTMP system as seen by the user. Each virtual processor is three

processors tightly coupled and executing in lockstep. Reliability is obtained by having the three

processors (referred to as a "triad") executing code independently and performing hardware votes on the

results. Hence, a triad appears to be a single processor executing a single instruction stream. Within

each processor is a local PROM containing system executive code and initialization data. Also in each

processor is a local RAM containing local data, working stack, and the application code paged in from

system memory. The system memory is also triply redundant, and contains application code and system

data. A quintuply redundant serial bus connects the triads to global memory, I/O devices, a real-time

clock and the error latches.

[Figure not reproduced: processor triads 1 through 3, each processor pairing 8K PROM with 8K RAM, connected over the system bus to 32K system memory, I/O ports 1 through 10, the real-time counter, and the error latches.]

Figure 4-1: FTMP Block Diagram, [Czeck 85]


In addition to the nine processors that make up the three triads, FTMP has a tenth processor that

serves as a spare. When a processor is found faulty, the spare is configured in as a replacement. If

another failure occurs, the damaged triad is retired since there are no more replacements, and its

functioning processors are set aside to be used as spares.

FTMP's experimental environment is slightly more complicated than SIFT's, as illustrated in Figure

4-2. Tasks are written on the VAX and transferred to the IBM, where all compilation of tasks and tables

and task linkage take place. The IBM sends the resulting assembly code, absolute load module, and any

corresponding errors to the VAX. Executable code is down-loaded to FTMP via the CTA program

running in PDP-11 Emulation mode. The Test Adapter is also used for debugging and memory

manipulation on FTMP.

[Figure not reproduced: IBM 4381 linked to the VAX 11/750; the VAX, in PDP-11 emulation mode over the Unibus, drives the Test Adapter, which connects to the FTMP display (RS-232), the fault injector (MIL-STD-1553), and the FTMP I/O system interface.]

Figure 4-2: FTMP's Test Environment, [Czeck 85]

Like SIFT, work on FTMP is divided into tasks. The difference, however, is that tasks on FTMP are

executed at different iteration rates. Tasks are grouped into one of three frame sizes depending on their

priority. For example, updating a display terminal need not be done as frequently as adjusting the


plane's airspeed, so its task will reside in a slower frame. FTMP executes tasks in the highest priority

frame first, moving to lower priority frames when the higher priority tasks are done.


5. The Experiments

The goal of this research was to demonstrate the robustness of the validation methodology used on

FTMP through application to SIFT. As with all research, the success of the experiments was tempered

by the SIFT environment. There were instances where exact duplications proved to be impossible or

irrelevant. For these cases, careful substitutions were made to insure that the new experiments would test

comparable characteristics. Experiments reported in this paper provide a careful study of the baseline

primitives. These experiments fall into four categories:

• Measuring the characteristics of the real time clock.

• Measuring instruction execution times.

• Measuring execution times of instruction combinations.

• Measuring the effects of task stretching.

Figure 5-1 illustrates the basic task used in measuring the baseline parameters. Each processor reads

the global clock and stores the value of the starting time in memory. It then enters the loop where it

executes the statement being tested, LOOPCOUNT times. After the loop terminates, the global clock is

read again and the ending time is stored. Complete experiments incorporated the basic task and were run

as follows:

1. The variable LOOPCOUNT was set to 100. Consequently, the task provided the time, in microseconds, for an instruction to be executed 100 times.

2. The task was repeated 250 times, providing 250 data points per run.

3. Step 2 was repeated 4 times, providing 1000 data points per processor.

All experiments were arbitrarily chosen to run on three processors, producing 3000 points of data for statistical analysis. The time recorded in the basic task included the overhead for execution of the loop. Therefore, one experiment was dedicated to measuring the time for a null loop, so that the overhead could be subtracted from future experiments.

begin
  data[time] := gclock;
  for i := 1 to LOOPCOUNT do
    begin
      { function to be measured }
    end;
  data[time+1] := gclock;
end

Figure 5-1: Basic Task Algorithm

Once a task was compiled, linked, and down-loaded, the executable image file did not have to be created again unless a change was made in the code. Through SIFT's interactive facility, changes can easily be made to a downloaded file, since it allows the user to read and set memory locations directly. This gives


the experimenter the advantage of varying experimental loops without recreating the image file. The only

problem in using shortcuts is that inconsistencies can be generated. For example, the iteration of a loop can be increased beyond the allocated time specified in the Schedule Table, causing the task to time out and the experiment to fail. In such an event, the Schedule Table could also be altered through the interactive facility, but it would mean changing all schedule tables for every configuration for each processor, possibly totaling up to 24 tables.

5.1. Clock Read Characteristics

In the first category of baseline experiments, the characteristics of the global clock were measured. This

experiment was essential since before the global clock can be used as a measuring tool it must be

ascertained that reading it produces consistent results, or that any variations are predictable. To insure that

future experiments using the global clock are valid, repetitive readings of the global clock were performed.

For this experiment, a clock read statement was inserted in the task of Figure 5-1 and iterated 100 times.

Using the experimental procedure described, 3000 data points were collected.
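The consistency check described above reduces to examining the deltas between consecutive clock readings: if a clock read costs a fixed amount of time, the deltas should be (nearly) constant. A host-side sketch with made-up tick values, not the actual SIFT data:

```python
import statistics

# Hypothetical successive global-clock readings, in microseconds.
reads = [100, 112, 124, 136, 148, 160]

# Deltas between consecutive reads.
deltas = [b - a for a, b in zip(reads, reads[1:])]

mean = statistics.mean(deltas)
spread = max(deltas) - min(deltas)
print(mean, spread)  # 12 0 -> perfectly predictable in this toy data
```

A nonzero but small, regular spread would still be acceptable, as long as the variation is predictable enough to correct for in later experiments.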

5.2. Instruction Times

In the second category of SIFT baseline experiments, the execution times of various instructions were

measured. This experiment provided the first accurate documentation of SIFT instruction times. In the

past, the user made a random guess as to how much time to allocate to a task. This set of experiments

made it possible to put together an accurate listing by which educated estimations can now be made.

Since the performance of SIFT and FTMP are to be compared, efforts were made to insure that as

many of the applicable instructions tested on FTMP [Clune 84, Feather 85] were also measured on SIFT.

Since the SIFT environment does not provide hardware or software support for real or long word data,

only integer and boolean data types were measured. Instructions tested are listed in Table 5-1.

Each of the instructions in Table 5-1 was executed inside the basic task. The null loop itself was

measured so that the overhead from its execution could be subtracted from the results of the other

statements. Using the standard procedure, 3000 data points per instruction were collected.
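The null-loop correction just described is simple arithmetic on the recorded clock values. A sketch, with invented measurement numbers (the real figures appear in Chapter 6 and the appendix) and a hypothetical helper name:

```python
LOOPCOUNT = 100  # iterations per timed run, as in the basic task

def per_instruction_us(measured_us, null_loop_us):
    """Average time of one instruction, loop overhead removed.

    measured_us: clock delta for LOOPCOUNT iterations of the instruction.
    null_loop_us: clock delta for LOOPCOUNT iterations of an empty loop.
    """
    return (measured_us - null_loop_us) / LOOPCOUNT

# Hypothetical numbers: 1,450 us with the instruction, 650 us empty.
print(per_instruction_us(1450, 650))  # 8.0 us per instruction
```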

5.3. Instruction Combinations

In the third category of baseline experiments, the execution times of instruction combinations were

measured to determine if the results exceeded the worst case time, the sum for executing each instruction

alone. This was an important experiment since in the SIFT operating system the user is responsible for

defining the duration of a task: if instruction combinations take longer than expected, the allocated time

may prove insufficient and the task will time out. It was also of interest to determine whether the system's compiler takes advantage of optimizations.

Page 22: Fault-Free Performance Validation of Fault-Tolerant Multiprocessorsreports-archive.adm.cs.cmu.edu/anon/anon/scan/CMU-CS-86-127.pdf · Table 8-6: SIFT: Comparison Instructions Combinations


• Null loop.
• Integer Assign, A := 1.
• Integer Variable Assign, A := B.
• Integer Addition, A := B + C.
• Integer Multiply, A := B * C.
• Integer Division, A := B div C.
• Integer Negate, A := -B.
• Integer comparisons (greater than/equal to, less than, equal to).
• Boolean Assign, A := True.
• Boolean Variable Assign, A := B.
• Boolean Or, A := B or C.
• Boolean And, A := B and C.
• Boolean Negate, A := NOT B.
• If-then, if-then-else conditional statements.
• Procedure calls with 0 through 4 parameters.

Table 5-1: Instructions Measured on SIFT

For this experiment two approaches were taken. One experiment tested the effect on execution times

when the consecutive iteration of a single instruction was increased. For this case, the integer assign statement A := 1 was iterated from 1 to 20 times inside the basic task loop. The second set of experiments

measured the execution times of instruction pair and triple combinations. The instruction combinations

are given in Table 5-2. Each set of instructions was executed in the basic task. Using the standard

procedure, 3000 data points were collected.

• Integer Assign and Integer Add.
• Integer Assign and Integer Multiply.
• Integer Assign and Integer Divide.
• Integer Assign, Add, and Multiply.
• Integer Assign, Multiply, and Divide.
• Integer Assign, Add, and Divide.
• Other combinations duplicating FTMP experiments: Assign-Assign and Addition-Addition.

Table 5-2: Instruction Combinations Tested on SIFT

5.4. Task Stretching

In the last category of baseline experiments, the effects of task stretching were explored. Theoretically,

as long as a task does not exceed its time allocation and a processor does not broadcast bad data, a

processor is considered healthy. If either condition is violated, the processor will be tagged "faulty" and

be configured out. The purpose of task stretching experiments was to determine whether a faulty

processor really will be reconfigured out.

To ensure the task stretching results were accurate, experiments were done to validate the 1.6

millisecond slot size. In one experiment, the time between consecutive tasks was measured by reading the


clock upon entering each task. These tasks were allocated a duration time of one slot per task. In a

second experiment, each task was allocated two slots and run consecutively. In a last experiment, three

one-slot tasks were run back to back and the time was taken upon entering the first task and entering the

third task. For a valid time slot, the results were expected to be consistent regardless of how the slot was measured.

To measure the effects of task stretching, two approaches were taken. In the first experiment, conditions for task timeout were explored. Processors 1 and 3 were allowed to complete their task before the

deadline, while processor 2 stretched its task beyond the allocated time. In the second experiment, the

broadcast of bad data was explored. There are two ways in which bad data can be passed: a processor

can broadcast malicious data even though it had enough time allocated to finish the task, or it can

broadcast bad data because the task ran out of time. An experiment was done to test for both cases.

Figure 5-2 shows the program used for the second case.

begin
  data[time] := gclock;
  for i := 1 to LOOPCOUNT do
    begin
      if i = WHEN then
        stobroadcast(passit, 16#ABCD);
      if pid = 2 then
        for j := 1 to STRETCH do
          extraloop := j;
    end;
  data[time+1] := gclock;
end

Figure 5-2: Program Used For Task Stretching - Voting Case

To fully control the two conditions that could trigger a reconfiguration, two variables, STRETCH and WHEN, were introduced. STRETCH controls the amount by which the task running on processor 2 is lengthened, or stretched. WHEN signals processors 1, 2, and 3 to broadcast the hex value ABCD, arbitrarily chosen data. To test the full effects of task manipulation, STRETCH and WHEN were varied from 1 to 20.


6. Results and FTMP Comparisons

In this section, the results of the SIFT baseline experiments are reported. Where applicable,

comparisons are made with the validation experiments performed on FTMP.

6.1. Read Time Clock Delay

In this experiment, the clock was tested for repeatability. Analysis of the data shows the global clock to

be a reliable measuring tool. Clock read results for the three processors used are shown in Table 6-1.

Read Time Clock Delay
(Microseconds Per 100 Reads, with Overhead)

Processor    Microseconds (min/max)
P1           2855/2856
P2           2855/2856
P3           2857/2860

Table 6-1: SIFT Clock Read Results

As Table 6-1 illustrates, the clock reads differed by only 1 to 3 microseconds within a processor, a

negligible amount. As for the variation between processors, the maximum difference was five

microseconds. This is an excellent result considering the SIFT processors are only loosely coupled. The

difference in clock read time is caused by slightly different processor execution rates.

Execution Time for SIFT Clock Read:
  100 Iterations of 1 Clock Read = 2.86 milliseconds   (with null loop overhead)
  1 Clock Read = 17.7 microseconds                     (without null loop overhead)

Execution Time for FTMP Clock Read:¹
  16 Iterations of 5 Clock Reads = 13.99 milliseconds  (with null loop overhead)
  1 Clock Read = 172 microseconds                      (without null loop overhead)

Table 6-2: Clock Read Results for SIFT and FTMP

¹Average of two experiments reported in [Clune 84].


Summaries of SIFT and FTMP clock read results are shown in Table 6-2. For both machines, the

global clock proved to be a reliable measuring device where any delays were predictable and negligible. In

comparison to FTMP, however, SIFT's clock can resolve finer-grained events since it is 10 times faster.

6.2. Instruction Measurements

As noted in Section 5.2, efforts were made to ensure that as many as possible of the applicable instructions tested on FTMP [Clune 84, Feather 85] were also measured on SIFT.

Table 6-3 summarizes the execution times for these instructions. Appendix 1.2 contains complete tables of

instruction execution times and statistical information for each instruction. As shown in Table I-2, in every case the result was well within the margin of error for a 95% confidence interval. As an example of a "typical" result, the execution time for the integer assign A := 1 was 3.70 microseconds per instruction. For 100 instructions (assuming a normal distribution), a 95% confidence interval of ±0.0072 was calculated². A histogram of the integer assign A := 1 (Figure 6-1) illustrates that data points were usually within one microsecond of each other.
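The interval quoted above can be recomputed with the standard normal-approximation formula. The sketch below is generic, not the exact procedure of [Ferrari 78], and the sample data are illustrative:

```python
import math

def confidence_interval_95(samples):
    """Half-width of a 95% confidence interval for the mean,
    using the normal approximation (z = 1.96)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return 1.96 * math.sqrt(variance / n)

# Illustrative data: 3000 timings clustered within one microsecond,
# as in the histogram of Figure 6-1.
samples = [1456.0] * 2400 + [1457.0] * 600
print(confidence_interval_95(samples))
```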

Along with simple instructions, execution times for procedure calls were measured for various numbers of parameters. To help visualize the results in Table 6-3, Figure 6-2 plots procedure call time against the number of parameters. An analysis of Figure 6-2 shows that, after some slight initial overhead, the execution time increases linearly with the number of parameters. As a comparison, the results of FTMP's experiment are plotted on the same graph. Although FTMP's execution time also increases linearly, its overhead is 474% higher than SIFT's. Since FTMP is a stack machine, it executes extra instructions that

SIFT does not. It must push the number of parameters on the stack before executing a return statement.

The return statement must then pop this number of parameters so it can adjust the stack pointer before

returning control to the calling program, thereby removing parameters no longer needed.
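The linearity claim can be checked numerically against the SIFT times in Table 6-3. The least-squares fit below is a sketch (the thesis itself reports no fitted coefficients); it yields a per-parameter cost of roughly 4.9 microseconds over a base cost of roughly 5 microseconds:

```python
def least_squares(xs, ys):
    """Ordinary least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

# SIFT procedure-call times (microseconds, Table 6-3),
# indexed by number of parameters.
params = [0, 1, 2, 3, 4]
times = [6.45, 7.00, 15.88, 20.27, 24.39]

slope, intercept = least_squares(params, times)
print(f"{slope:.2f} us per parameter, {intercept:.2f} us base cost")
```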

As an overall comparison, the execution speed of SIFT instructions is listed alongside FTMP's in Table 6-4 and illustrated in Figure 6-3. Although SIFT requires more time to negate variables, it is faster at all other instructions, including procedure calls. For example, when executing an "OR" function, SIFT is 219% faster. This disparity is due to the differences in compilers. Whereas SIFT's compiler simply loads the variables into two registers and ORs them, FTMP's compiler tests each variable separately and executes code depending on the outcome of the test (i.e., if the first variable is true, it jumps without testing the other). The worst case is when both variables are false: it must test both variables before it can jump. An unweighted average (assuming all instructions tested are equally likely) shows that SIFT is 129% faster than FTMP in executing instructions.
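The 129% figure is an unweighted mean of the percent-difference column of Table 6-4, which can be reproduced directly from the table's 24 entries:

```python
# Percent differences (FTMP relative to SIFT) copied from Table 6-4.
percent_faster = [
    8.1, 25.3, 55.0, 60.7, 4.2, -26.2,   # integer assign .. negate
    172.6, 142.3, 124.3,                 # integer compares
    8.1, 25.3, 219.3, 206.2, 74.1,       # boolean instructions
    63.0,                                # null loop
    473.6, 638.6, 262.1, 211.8, 182.9,   # procedure calls, 0-4 params
    29.5, 35.1, 58.7, 33.0,              # conditionals
]

average = sum(percent_faster) / len(percent_faster)
print(f"unweighted average: {average:.0f}%")  # ~129%
```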

²Refer to [Ferrari 78] for a description of confidence intervals and how they are calculated.


Summary of SIFT Instruction Execution Times
(Without Null Loop Overhead)

Pascal Instruction          Description                Microsecs Per Instruction
A := 1                      Integer Assign             3.70
A := B                      Integer Variable Assign    4.39
A := B + C                  Integer Addition           6.45
A := B * C                  Integer Multiply           12.57
A := B div C                Integer Division           20.83
A := -B                     Integer Negate             9.48
A := B = C                  Integer Compare            8.51
A := B >= C                 Integer Compare            9.70
A := B < C                  Integer Compare            9.45
A := True                   Boolean Assign             3.70
A := B                      Boolean Variable Assign    4.39
A := B or C                 Boolean Or                 6.89
A := B and C                Boolean And                6.89
A := NOT B                  Boolean Negate             6.26
NULL                        Null Loop                  10.86
Procall()                   Procedure Call             6.45
Procall(A)                  Procedure Call             7.00
Procall(A,B)                Procedure Call             15.88
Procall(A,B,C)              Procedure Call             20.27
Procall(A,B,C,D)            Procedure Call             24.39
If GO then A:=1             Conditional, True          6.95
If GO then A:=1             Conditional, False         3.70
If GO then A:=1 Else B:=1   Conditional, True          8.32
If GO then A:=1 Else B:=1   Conditional, False         7.14

Table 6-3: Summary of SIFT Instruction Execution Times

6.3. Instruction Combination Measurements

This experiment measured instruction combinations to determine if their combined execution times

exceeded worst-case results. This experiment also uncovered compiler optimizations. The first part of

this experiment measured the execution time for the integer assign instruction A := 1 as its consecutive iteration inside the basic task loop was increased from 1 to 20. Figure 6-4 illustrates the results.

Inspection of Figure 6-4 shows that, for SIFT, although execution time increases linearly with the number of iterations, the slope reflects a small amount of compiler optimization. An analysis of the assembly code shows that the savings occur because the compiler loads a register with the value 1 the first time and uses only stores to assign 1 to A thereafter, as illustrated in Table 6-5.
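The savings from this load-once optimization are visible in the measured times of Table I-6: the first A := 1 costs 3.70 microseconds (a load plus a store), while each additional consecutive assignment adds only about 2.06 microseconds (a store alone). A quick check:

```python
# Per-instruction times (microseconds, overhead removed) from Table I-6
# for 1, 2, and 3 consecutive iterations of A := 1.
one, two, three = 3.70, 5.76, 7.82

first_assign = one             # load + store
marginal = (three - one) / 2   # cost of each extra store

print(f"first: {first_assign} us, each additional: {marginal:.2f} us")
```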


[Histogram: frequency (0 to 2250) of timing samples vs. microseconds per hundred instructions (1450 to 1460), showing the readings tightly clustered.]

Figure 6-1: Frequency vs. Microseconds per 100 A := 1 Iterations


[Plot: microseconds per instruction (0 to 70) vs. number of parameters (0 to 4); both SIFT and FTMP increase linearly, with FTMP well above SIFT.]

Figure 6-2: Procedure Calls vs. Parameters


Instruction Execution Times: SIFT vs. FTMP

Pascal Instruction          Description                SIFT     FTMP    Percent Difference
A := 1                      Integer Assign             3.70     4.0     8.1%
A := B                      Integer Variable Assign    4.39     5.5     25.3%
A := B + C                  Integer Addition           6.45     10.0    55.0%
A := B * C                  Integer Multiply           12.57    20.2    60.7%
A := B div C                Integer Division           20.83    21.7    4.2%
A := -B                     Integer Negate             9.48     7.0     -26.2%
A := B = C                  Integer Compare            8.51     23.2    172.6%
A := B >= C                 Integer Compare            9.70     23.5    142.3%
A := B < C                  Integer Compare            9.45     21.2    124.3%
A := True                   Boolean Assign             3.70     4.0     8.1%
A := B                      Boolean Variable Assign    4.39     5.5     25.3%
A := B or C                 Boolean Or                 6.89     22.0    219.3%
A := B and C                Boolean And                6.89     21.1    206.2%
A := NOT B                  Boolean Negate             6.26     10.9    74.1%
NULL                        Null Loop                  10.86    17.7    63.0%
Procall()                   Procedure Call             6.45     37.0    473.6%
Procall(A)                  Procedure Call             7.00     51.7    638.6%
Procall(A,B)                Procedure Call             15.88    57.5    262.1%
Procall(A,B,C)              Procedure Call             20.27    63.2    211.8%
Procall(A,B,C,D)            Procedure Call             24.39    69.0    182.9%
If GO then A:=1             Conditional, True          6.95     9.0     29.5%
If GO then A:=1             Conditional, False         3.70     5.5     35.1%
If GO then A:=1 Else B:=1   Conditional, True          8.32     13.2    58.7%
If GO then A:=1 Else B:=1   Conditional, False         7.14     9.5     33.0%

Table 6-4: Instruction Times: SIFT vs. FTMP

In comparison, the results of FTMP's experiment show that, although FTMP's graph is also linear, there is no compiler optimization. This result is plotted alongside SIFT's results in Figure 6-4. Since FTMP is a

stack machine, consecutive stores are not done. As shown in Table 6-5, FTMP must execute a push and

pop for every instruction. Consequently, although SIFT and FTMP start off with a similar execution

time, by the 20th iteration SIFT is done 94% sooner.

In the second set of experiments the execution times of instruction pair and triple combinations were

measured. Complete statistical results for these combinations are included in Appendix 1.3. Table 6-6 is

a summary of these results. Analysis shows that none of the combinations exceeded the expected time limit; the small savings shown in the first five were probably due to experimental error, since analysis of the assembly code showed no compiler optimization. The assign-multiply-divide combination showed true savings, since the compiler did not have to reload C after the multiplication was done.
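The percent-difference column of Table 6-6 follows from the separate and combined times; the sketch below reproduces the first two rows (savings expressed relative to the combined time, which matches the table's entries):

```python
def percent_savings(separate_us, combined_us):
    """Percent by which the summed separate times exceed the
    combined measurement, relative to the combined time."""
    return (separate_us - combined_us) / combined_us * 100

# Assign & Add row of Table 6-6: 3.70 + 6.45 = 10.15 us separately,
# 9.88 us when measured as a combination.
print(f"{percent_savings(10.15, 9.88):.1f}%")   # 2.7%
# Assign & Mult row: 16.27 us separately vs. 16.01 us combined.
print(f"{percent_savings(16.27, 16.01):.1f}%")  # 1.6%
```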


[Plot: microseconds per instruction for each instruction type (integer assign, variable assign, negate, add, or, and, compare, multiply, divide); FTMP lies above SIFT for every instruction except negate.]

Figure 6-3: Graph of Instruction Times: SIFT vs. FTMP


[Plot: microseconds per instruction (0 to 80) vs. consecutive iterations (1 to 20) of A := 1; both curves are linear, with FTMP's slope much steeper than SIFT's.]

Figure 6-4: A := 1 vs. Consecutive Executions


Instruction   SIFT          FTMP
A := 1        Load 1,R1     Push 1
              Store R1,A    Pop A
              Store R1,A    Push 1
              Store R1,A    Pop A
              Store R1,A    Push 1
              Store R1,A    Pop A
                            Push 1
                            Pop A
                            Push 1
                            Pop A

Table 6-5: SIFT vs. FTMP for Integer Assign A := 1

Comparisons of Instruction Combinations Not Tested on FTMP
(in microseconds per instruction, without overhead)

Pascal Instruction                  Description                    Time If Done   Time For      Percent Difference
                                                                   Separately     Combination   (Separate vs. Combo)
A := 1; A := B + C                  Assign & Add Combination       10.15          9.88          2.7%
A := 1; A := B * C                  Assign & Mult Combination      16.27          16.01         1.6%
A := 1; A := B div C                Assign & Div Combination       24.53          24.27         1.1%
A := 1; A := B + C; A := B * C      Assign, Add, Mult Combination  22.72          22.46         1.2%
A := 1; A := B + C; A := B div C    Assign, Add, Div Combination   30.98          30.45         1.7%
A := 1; A := B * C; A := B div C    Assign, Mult, Div Combination  37.1           34.78         6.7%

Table 6-6: SIFT: Comparison of Instruction Combinations Not Done on FTMP

The architectural difference between register and stack machines was witnessed when instruction combinations performed on FTMP were applied to SIFT. Table 6-7 and Table 6-8 show the results of these combinations. Table 6-9 is an example illustrating a representative instruction combination. SIFT's compiler uses register allocation to avoid unnecessary loads and stores, whereas FTMP's must push items on the stack each time. In general, the only optimization FTMP features is a duplicate store: in some cases, if a variable is going to be used twice, it is duplicated instead of stored and reloaded.


SIFT Combination Comparisons
(in microseconds per instruction, without overhead)

Pascal Instruction        Description            Time If Done   Time For      Percent Difference
                                                 Separately     Combination   (Separate vs. Combo)
B := 2; C := 2            Assign Combination     7.4            5.76          28.5%
B := C + D; E := C + D    Addition Combination   12.9           8.95          44.1%
B := C + D; E := F + A    Addition Combination   12.9           12.63         2.1%
B := C + D; E := A + B    Addition Combination   12.9           12.63         2.1%
B := 2; C := B            Assign Combination     8.09           7.82          3.4%
B := 2; C := D            Assign Combination     8.09           7.82          3.4%

Table 6-7: SIFT: Comparison Between Single Instructions and Combinations

FTMP Combination Comparisons
(in microseconds per instruction, without overhead)

HLL Instruction           Description            Time If Done   Time For      Percent Difference
                                                 Separately     Combination   (Separate vs. Combo)
B = 2; C = 2              Assign Combination     8.0            8.0           0.0%
B = C + D; E = C + D      Addition Combination   20.0           20.0          0.0%
B = C + D; E = F + A      Addition Combination   20.0           21.0          -5.0%
B = C + D; E = A + B      Addition Combination   20.0           21.0          -5.0%
B = C + D; E = B + A      Addition Combination   20.0           19.5          2.5%
B = 2; C = B              Assign Combination     9.5            6.5           31.5%
B = 2; C = D              Assign Combination     9.5            9.5           0.0%

Table 6-8: FTMP: Comparison Between Single Instructions and Combinations


Instruction    SIFT          FTMP
B = C + D      Load C,R1     Push C
E = C + D      Add D,R1      Push D
               Store R1,B    Add
               Store R1,E    Pop B
                             Push C
                             Push D
                             Add
                             Pop E

Table 6-9: SIFT vs. FTMP in Addition Combination

6.4. Task Stretching Results

The experiments conducted to validate the slot size produced consistent results. This guaranteed that all tasks on all processors were allocated equivalent slot sizes, in multiples of 1.6 milliseconds. The first task

stretching experiment explored only the condition of a task not meeting its time allocation. For this

experiment, the system behaved as predicted--the straying processor was halted when its task took longer

than the scheduler allowed.

In the second experiment, voting was introduced. One test examined the system's reaction to a processor that broadcast malicious data even though enough time was allocated for it to finish its task. As expected, the processor was configured out of the system. Another test explored the case where a processor was forced to broadcast invalid data because its task timed out. These results are summarized in Table 6-10. For this case, the processor was reconfigured out not only because it passed bad data, but also because it timed out.

The results of the task stretching experiments proved that SIFT protects itself against faulty or "malicious" processors by reconfiguring them out of the system. In comparison, when this type of experiment was executed on FTMP, the processes were never halted and the frame stretched to infinity [Clune 84].


SIFT Task Stretching Results
(Task Execution Time in Milliseconds)

WHEN   STRETCH   Task Execution Time (max)       P2 Reconfigured Out
                 P1      P2          P3
1      5         2.36    9.77        2.31        No
10     5         2.36    9.78        2.31        No
50     5         2.38    9.80        2.35        No
100    5         2.36    9.76        2.31        No
1      7         2.36    12.32       2.31        No
10     7         2.36    12.33       2.31        No
50     7         2.38    12.35       2.35        No
100    7         2.36    12.31       2.31        No
1      8         2.36    timed out   2.31        Yes
10     8         2.36    timed out   2.31        Yes
50     8         2.38    timed out   2.35        Yes
100    8         2.36    timed out   2.31        Yes
1      10        2.36    timed out   2.31        Yes
10     10        2.36    timed out   2.31        Yes
50     10        2.38    timed out   2.35        Yes
100    10        2.36    timed out   2.31        Yes
1      100       3.77    timed out   3.70        Yes
10     100       3.78    timed out   3.71        Yes
50     100       3.82    timed out   3.76        Yes
100    100       3.76    timed out   3.70        Yes

Table 6-10: SIFT: Task Stretching Results


7. Future Work

Although application of the validation methodology on SIFT has thus far proven successful, it is by no

means complete. The following sections discuss a few thoughts in these areas.

7.1. Baseline Experiments

To complete the baseline experiments, an experiment on task interaction still remains to be done. The

experiment, as it was conducted on FTMP [Clune 84], is not appropriate for SIFT. Therefore, a careful

modification must be made to insure that comparable characteristics are tested. In one unsuccessful

attempt, a task was executed on a single set of processors and an attempt was made to start up the next

task on another set of processors, so that switching time could be measured. Unfortunately, after a few

unsuccessful runs, it was realized that halting a set of processors to start another set is equivalent to

crashing the system. Therefore, to perform this experiment, another approach must be tried.

Measurements should include the time it takes to switch from one task to another on one set of

processors, and if possible, the time it takes to switch from one set of processors executing a task to

another set of processors executing the next task.

7.2. Synthetic Workload

To date, no work on the synthetic workload has been attempted. The implementation of the synthetic

workload developed for FTMP must be studied [Feather 85], and a comparable experiment must be

designed for SIFT. A synthetic workload is a set of tasks that exercises a computer the same way a natural workload would, but without all of its complexity. It is easier to implement than a natural workload, since it uses simple and repetitive instructions, thus making it easier to debug. Also, modifications to a

synthetic workload can be readily made since it is controlled with parameters. Although implementing a

synthetic workload is a means of measuring performance, once a synthetic workload is running on SIFT,

it could prove to be a valuable stepping stone to more sophisticated experiments. For example, one idea

presented during the FTMP experiments [Feather 85] was to integrate the synthetic workload with the

fault-injection experiments.


8. Conclusions

The purpose of this research was to demonstrate the robustness of the validation methodology by reporting the results of its application on SIFT. This report was also to show that, by using identical baseline experiments, the performance of two architecturally different systems could be directly compared. Application of the methodology was successful. As with all research, the success of the experiments was tempered by the environment, but careful substitutions were made to ensure that new experiments would test comparable characteristics. Using the methodology made it possible to compare SIFT and FTMP.

The following is a brief summary of the results:

1. Clock Read Delay

• Like FTMP's, SIFT's global clock proved to be a very reliable measuring device whose delays were predictable and negligible. In comparison to FTMP, however, SIFT's clock can resolve finer-grained events since it is 10 times faster.

2. Instruction Execution Times

• Although SIFT requires more time to negate variables, it is faster at all other instructions, including procedure calls. Overall, SIFT executes instructions 129% faster than FTMP.

3. Instruction Combinations

• Because SIFT is a register machine, its compiler is able to optimize for cases where FTMP's compiler cannot. In fact, the only optimization FTMP features is a duplicate store.

4. Task Stretching

• SIFT handles "malicious" processors exactly as predicted: if a task does not complete before its allocation of 1.6-millisecond time frames, or if it broadcasts bad data, the processor will be reconfigured out of the system. In comparison, when this type of experiment was executed on FTMP, the processes were never halted and the frame stretched to infinity.

These experiments have shown that by applying a building block approach in a systematic manner, a

fault-tolerant system can be validated through manageable levels of experimentation. One conclusion that can be made is that, thus far, the methodology has proven general enough to apply to SIFT and has

produced results that were directly comparable to previous FTMP experiments.


I. Appendix

1.1 Clock Read Dump

1.2 Statistical Data on Instruction Times

1.3 Statistical Data on Instruction Combinations


1.1. Clock Read Dump

Raw Data: Read Time Clock Delay
(Microseconds Per 100 Clock Reads, Including Null Loop Overhead)

Processor   Starting Time (Hex)   Ending Time (Hex)   Microseconds (Decimal)
P1          53F6                  5F1D                2855
P2          53F3                  5F1B                2856
P3          53F0                  5F1A                2858
P1          B4BC                  BFE4                2856
P2          B4B7                  BFDF                2856
P3          B4B9                  BFE5                2860
P1          C519                  D041                2856
P2          C51A                  D042                2856
P3          C517                  D040                2857
P1          F694                  01BC                2856
P2          F694                  01BB                2855
P3          F691                  01BC                2859

Table I-1: Raw Data: SIFT Clock Read Experiment
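The elapsed times in Table I-1 are differences of 16-bit hexadecimal clock readings; in the last group of rows the clock wraps past FFFF, so the subtraction must be taken modulo 2¹⁶. A sketch of the conversion (the 16-bit clock width is inferred from the readings, not stated in the text):

```python
def elapsed_us(start_hex, end_hex):
    """Elapsed microseconds between two 16-bit clock readings,
    allowing for one wrap past FFFF."""
    return (int(end_hex, 16) - int(start_hex, 16)) % 0x10000

print(elapsed_us("53F6", "5F1D"))  # first P1 row: 2855
print(elapsed_us("F694", "01BC"))  # wrapped P1 row: 2856
```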


1.2. Statistical Data on Instruction Times

Instruction Execution Time: Integer and Boolean
(Range for 95% Confidence Interval)

Pascal Instruction   Description               microsecs/100 inst.   microsecs per instruction
                                               w/overhead            w/overhead   w/o overhead
A := 1               Integer Assign            1456.28 ±.0072        14.56        3.70
A := B               Integer Variable Assign   1525.00 ±.0047        15.25        4.39
A := B + C           Integer Addition          1731.19 ±.0056        17.31        6.45
A := B * C           Integer Multiply          2343.46 ±.0089        23.43        12.57
A := B div C         Integer Division          3169.47 ±.0089        31.69        20.83
A := -B              Integer Negate            2031.08 ±.0036        20.31        9.48
A := B = C           Integer Compare           1937.37 ±.0083        19.37        8.51
A := B >= C          Integer Compare           2056.07 ±.0033        20.56        9.70
A := B < C           Integer Compare           2031.08 ±.0036        20.31        9.45
A := True            Boolean Assign            1456.26 ±.0069        14.56        3.70
A := B               Boolean Variable Assign   1526.26 ±.0036        15.26        4.39
A := B or C          Boolean Or                1774.96 ±.0035        17.75        6.89
A := B and C         Boolean And               1774.94 ±.0037        17.75        6.89
A := NOT B           Boolean Negate            1712.43 ±.0088        17.12        6.26

Table I-2: Instruction Execution Time: Integer and Boolean Data Types

Statistical Information: Integer and Boolean
(microseconds per 100 instructions, with null loop overhead)

Pascal Instruction   Description               min/max     variance   standard deviation
A := 1               Integer Assign            1456/1457   .201       .448
A := B               Integer Variable Assign   1524/1526   .130       .361
A := B + C           Integer Addition          1731/1732   .156       .395
A := B * C           Integer Multiply          2343/2344   .248       .498
A := B div C         Integer Division          3169/3170   .249       .500
A := -B              Integer Negate            2031/2032   .101       .318
A := B = C           Integer Compare           1937/1938   .232       .482
A := B >= C          Integer Compare           2056/2057   .092       .304
A := B < C           Integer Compare           2031/2032   .101       .318
A := True            Boolean Assign            1456/1457   .194       .441
A := B               Boolean Variable Assign   1526/1527   .102       .319
A := B or C          Boolean Or                1774/1775   .097       .311
A := B and C         Boolean And               1774/1775   .103       .320
A := NOT B           Boolean Negate            1712/1713   .245       .495

Table I-3: Statistical Information: Integer and Boolean Data Types


Instruction Execution Time: Miscellaneous
(Range for 95% Confidence Interval)

Pascal Instruction          Description          microsecs/100 instr.   avg. microsecs per instruction
                                                 w/overhead             w/overhead   w/o overhead
NULL                        Null Loop            1086.38 ±.0085         -            10.86
Procall()                   Procedure Call       1731.19 ±.0054         17.31        6.45
Procall(A)                  Procedure Call       1785.92 ±.0064         17.86        7.00
Procall(A,B)                Procedure Call       2674.57 ±.0088         26.75        15.88
Procall(A,B,C)              Procedure Call       3113.23 ±.0065         31.13        20.27
Procall(A,B,C,D)            Procedure Call       3525.58 ±.0087         35.26        24.39
If GO then A:=1             Conditional, True    1781.18 ±.0052         17.81        6.95
If GO then A:=1             Conditional, False   1456.29 ±.0074         14.56        3.70
If GO then A:=1 Else B:=1   Conditional, True    1918.62 ±.0084         19.19        8.32
If GO then A:=1 Else B:=1   Conditional, False   1799.90 ±.0042         18.00        7.14

Table I-4: Instruction Execution Time: Miscellaneous Instructions

Statistical Information: Miscellaneous
(microseconds per 100 instructions, with overhead)

Pascal Instruction          Description          min/max     variance   standard deviation
NULL                        Null Loop            1086/1087   .236       .486
Procall()                   Procedure Call       1731/1732   .151       .389
Procall(A)                  Procedure Call       2262/2263   .179       .423
Procall(A,B)                Procedure Call       2674/2675   .245       .495
Procall(A,B,C)              Procedure Call       3112/3114   .183       .428
Procall(A,B,C,D)            Procedure Call       3525/3526   .243       .493
If GO then A:=1             Conditional, True    1781/1782   .145       .380
If GO then A:=1             Conditional, False   1456/1457   .207       .455
If GO then A:=1 Else B:=1   Conditional, True    1918/1919   .235       .485
If GO then A:=1 Else B:=1   Conditional, False   1799/1801   .117       .342

Table I-5: Statistical Information: Miscellaneous Instructions


1.3. Statistical Data on Instruction Combinations

Instruction Execution Time: Combinations
(Range for 95% Confidence Interval)

Pascal                 Description          microsecs/100       microseconds per instruction
Instruction                                 instr. w/overhead   w/overhead    w/o overhead

A := 1                  1 Iteration         1456.28 ±.0072      14.56          3.70
                        2 Iterations        1662.48 ±.0089      16.62          5.76
                        3 Iterations        1868.63 ±.0083      18.69          7.82
                        5 Iterations        2280.98 ±.0034      22.81         11.95
                        8 Iterations        2899.51 ±.0089      29.00         18.13
                       10 Iterations        3338.17 ±.0065      33.38         22.52
                       12 Iterations        3750.52 ±.0089      37.51         26.64
                       15 Iterations        4369.00 ±.0358      43.69         32.83
                       20 Iterations        5219.97 ±.0216      52.20         41.34

A := 1                 Assign & Add         2074.83 ±.0049      20.75          9.88
A := B + C             Combination

A := 1                 Assign & Mult        2687.12 ±.0043      26.87         16.01
A := B * C             Combination

A := 1                 Assign & Div         3513.13 ±.0077      35.13         24.27
A := B div C           Combination

A := 1                 Assign, Add, Mult    3331.93 ±.0067      33.32         22.46
A := B + C             Combination
A := B * C

A := 1                 Assign, Add, Div     4131.63 ±.0020      41.32         30.45
A := B + C             Combination
A := B / C

A := 1                 Assign, Mult, Div    4564.03 ±.0170      45.64         34.78
A := B * C             Combination
A := B / C

Table I-6: Instruction Execution Time: Instruction Combinations
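The iteration rows of Table I-6 grow almost exactly linearly for small counts: each additional "A := 1" iteration adds about 206.2 microseconds per 100 loop executions. A sketch of a two-point linear fit follows; the assumption of strict linearity is mine (the text does not claim it), and it visibly breaks down above 8 iterations, where the measured times run roughly 26 microseconds higher than this fit:

```python
# Estimate the incremental cost of one extra "A := 1" iteration from the
# first two rows of Table I-6, then predict the larger iteration counts.
# Assumption (not stated in the text): cost is linear in iteration count,
# which the data supports only up through 8 iterations.
t1, t2 = 1456.28, 1662.48        # measured us per 100 loops, 1 and 2 iterations
per_iteration = t2 - t1          # ~206.2 us per added iteration
fixed = t1 - per_iteration       # residual fixed cost per 100 loops

def predict(n_iterations):
    """Predicted us per 100 loops under the linear-cost assumption."""
    return fixed + per_iteration * n_iterations
```

For example, predict(5) gives about 2281.1 against the measured 2280.98, and predict(8) about 2899.7 against 2899.51, both within the min/max spread of Table I-7; predict(10) gives about 3312, some 26 microseconds below the measured 3338.17.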


Statistical Information: Combinations
(microseconds per 100 instructions - with overhead)

Pascal                 Description          min/max      variance    standard
Instruction                                                          deviation

A := 1                  1 Iteration         1456/1457    .201        .448
                        2 Iterations        1662/1663    .250        .500
                        3 Iterations        1868/1869    .232        .482
                        5 Iterations        2280/2282    .095        .308
                        8 Iterations        2899/2900    .250        .500
                       10 Iterations        3337/3339    .181        .425
                       12 Iterations        3750/3751    .250        .500
                       15 Iterations        4368/4370   1.000       1.000
                       20 Iterations        5219/5221    .603        .777

A := 1                 Assign & Add         2074/2075    .138        .372
A := B + C             Combination

A := 1                 Assign & Mult        2686/2688    .119        .345
A := B * C             Combination

A := 1                 Assign & Div         3512/3514    .215        .464
A := B div C           Combination

A := 1                 Assign, Add, Mult    3331/3333    .186        .431
A := B + C             Combination
A := B * C

A := 1                 Assign, Add, Div     4131/4132    .603        .776
A := B + C             Combination
A := B / C

A := 1                 Assign, Mult, Div    4563/4565    .481        .693
A := B * C             Combination
A := B / C

Table I-7: Statistical Information: Instruction Combinations
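Each standard-deviation entry in Tables I-5 and I-7 is simply the square root of the variance in the same row. A small sketch of the per-row summary statistics follows; the use of the population (divide-by-n) variance is an assumption on my part, suggested by the near-exact square-root relationship in the tables, and the sample list in the example is hypothetical:

```python
import math

def summarize(samples):
    """Return (min, max, population variance, standard deviation) of raw timings."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    return min(samples), max(samples), variance, math.sqrt(variance)

# Cross-check against a published row: the "20 Iterations" entry reports
# variance .603 and deviation .777, and sqrt(.603) = .7765..., matching
# to three digits.
```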


References

[Butler 84]      Ricky W. Butler and Sally C. Johnson. Validation of a Fault-Tolerant Clock Synchronization System. NASA-Langley Research Center, 1984. NASA Technical Paper 2346.

[Clune 84]       Ed Clune. Analysis of the Fault Free Behavior of the FTMP Multiprocessor System. Master's thesis, Carnegie-Mellon University, 1984.

[Czeck 85]       Ed Czeck. Fault Free Performance Validation of a Fault Tolerant Multiprocessor: Baseline and Synthetic Workload Measurements. Master's thesis, Carnegie-Mellon University, 1985.

[Feather 85]     Frank E. Feather. Validation of a Fault-Tolerant Multiprocessor: Baseline Experiments and Workload Implementation. Master's thesis, Carnegie-Mellon University, 1985.

[Ferrari 78]     Domenico Ferrari. Computer Systems Performance Evaluation. Prentice-Hall, 1978.

[Green 84]       David F. Green, Jr., Daniel L. Palumbo, and Daniel W. Baltrus. Software Implemented Fault-Tolerant (SIFT) User's Guide. NASA-Langley Research Center, 1984. NASA Technical Memorandum 86289.

[NASA 79]        Research Triangle Institute. Validation Methods for Fault-Tolerant Avionics and Control Systems - Working Group Meeting II. NASA-Langley Research Center, 1979. NASA Conference Publication 2130.

[Palumbo 85]     Daniel L. Palumbo. The SIFT Hardware/Software Systems. NASA-Langley Research Center, 1985. NASA Technical Memorandum 87574.

[Palumbo & Butler 85]
                 Daniel L. Palumbo and Ricky W. Butler. Measurement of SIFT Operating System Overhead. NASA-Langley Research Center, 1985. NASA Technical Memorandum 86322.

[Shin & Krishna 84]
                 Kang G. Shin and C. M. Krishna. Characterization of Real-Time Computers. NASA-Langley Research Center, 1984. NASA Contractor Report (CR) 3807.

[Siewiorek 82]   Daniel P. Siewiorek, C. Gordon Bell, and Allen Newell. Computer Structures: Principles and Examples. McGraw-Hill Book Company, 1982.

[Siewiorek & Swarz 82]
                 Daniel P. Siewiorek and Robert S. Swarz. The Theory and Practice of Reliable System Design. Digital Press, 1982.

[Smith & Lala 82]
                 T. Basil Smith, III and J. H. Lala. Development and Evaluation of a Fault-Tolerant Multiprocessor (FTMP) Computer. The Charles Stark Draper Laboratory, Inc., 1982. Contract Number NAS1-15336.

[Smith & Lala 83]
                 T. Basil Smith, III and J. H. Lala. Development and Evaluation of a Fault-Tolerant Multiprocessor (FTMP) Computer. The Charles Stark Draper Laboratory, Inc., 1983. NASA Contractor Report 166071.

[SRI 81]         Hierarchical Specification of the SIFT Flight Control System. SRI International, 1981. Technical Report CSL-123.

[SRI 82]         Investigation, Development, and Evaluation of Performance Proving for Fault-Tolerant Computers. SRI International, 1982. Contract Number NAS1-15528.

[SRI 84]         Development and Analysis of the Software Implemented Fault-Tolerance (SIFT) Computer. SRI International, 1984. NASA Contractor Report 172146.

[Starcom 80]     Pascal* (ve.O) User's Manual for the Bendix BDX-980. Starcom Associates, 1980.

[Wensley 78]     John H. Wensley, Milton W. Green, Robert E. Shostak, Leslie Lamport, Karl N. Levitt, Charles B. Weinstock, Jack Goldberg, and P. M. Melliar-Smith. SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control. Proceedings of the IEEE, October 1978.

[Wyle 84]        Software Users' Manual for the AIRLAB SIFT Scheduler. Wyle Laboratories, 1984. Document Number SD63148-141R0-D3.

