End of First Semester Presentation DigiSat Reliable Computer – Multiprocessor Control System. Niv...

Post on 14-Dec-2015

End of First Semester Presentation

DigiSat Reliable Computer – Multiprocessor Control System.

Niv Best, Shai Israeli

Instructor: Oren Kerem

HS-DS Lab, Technion, Winter 2003

Project Goals

Design & implement a hardware mechanism for multiprocessor monitoring & control.

Part of the DigiSat reliable computer project.

The DigiSat Computer

PowerPC based.

Implemented upon the Virtex II-pro platform.

Hardware redundancy throughout the entire system.

Our project handles processor redundancy & control.

Description

Satellites contain redundant hardware since servicing in space is not feasible.

A monitoring system is required to identify & handle malfunctions.

Must be implemented in hardware.

DigiSat Computer

[Figure: DigiSat computer block diagram showing PPC1 on PLB1 and PPC2 on PLB2, a data route, and triplicated M1, M2, S1, and S2 blocks; one region is labeled "Our Project".]

Technology

Virtex II-pro FPGA with embedded PowerPC cores.

PowerPC 405

32-bit RISC core.

Low power consumption.

Used in various system-on-chip (SoC) applications (PDAs, network routers, cellular phones…).

Embedded within the Virtex II-pro platform.

PowerPC Interfaces

RST

EIC

CPU

CPM

DBG

ISPLB

DSPLB

Software Used

Current Status

Simulated a PowerPC system using ModelSim 5.6d.

Built autonomous monitoring sub-systems that survey PowerPC hardware outputs.

Studied PowerPC assembly programming.

Built a state machine for the controller.

The Processor Controller

Controller State Machine (General)

[Figure: state diagram with states PPC_1 Online, Switch 1→2, PPC_2 Online, and Switch 2→1; the transitions out of the two Online states are labeled "Error detected".]
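The general state machine can be modeled in software as a rough sketch. This is a Python model for illustration only, not the project's hardware description. The state names follow the slide; the `step` function, its `error_detected` input, and the assumption that each Switch state moves unconditionally to the other processor's Online state are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    PPC_1_ONLINE = auto()
    SWITCH_1_TO_2 = auto()
    PPC_2_ONLINE = auto()
    SWITCH_2_TO_1 = auto()


def step(state: State, error_detected: bool) -> State:
    """Return the next controller state (assumed transition rules)."""
    if state is State.PPC_1_ONLINE:
        # Stay on PPC_1 until an error is reported, then start switching.
        return State.SWITCH_1_TO_2 if error_detected else state
    if state is State.SWITCH_1_TO_2:
        # Assumption: the switch completes in one step.
        return State.PPC_2_ONLINE
    if state is State.PPC_2_ONLINE:
        return State.SWITCH_2_TO_1 if error_detected else state
    # Remaining case: SWITCH_2_TO_1 hands control back to PPC_1.
    return State.PPC_1_ONLINE
```

Starting from `State.PPC_1_ONLINE`, two calls to `step` with an error on the first call end in `State.PPC_2_ONLINE`.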

Controller State-Machine Diagram

Signals Involved

Signal                 I/O  Affects  Comments
c405cpmtimerresetreq   O    CPM      Timer Reset Request
c405xxxmachinecheck    O    CPU      Machine Check Error
c405dbgstopack         O    EIC      Stop Acknowledge
c405rstchipresetreq    O    RST      Chip Reset Request
c405plbdcuwritethru    O    DSPLB    Write Thru
c405rstcoreresetreq    O    RST      Core Reset Request
C405rstsysresetreq     O    RST      System Reset Request
Bus_error_Det          O    PLB      Bus Error
Dbgc405debughalt       I    DBG      Debug Halt
cpmc405clock           I    CPM      Input clock
rstc405resetcore       I    RST      Reset Core
sys_rst                I    PLB      PLB RST
plb_clk                I    PLB      PLB CLK

Controller Implementation

[Figure: block diagram with CPU1 and CPU2 on PLB1 and PLB2, a Controller + Arbiter block, a MUX with select signal P_Sel, connections to/from the peripherals, and external signals.]

Controller Implementation

Controller Sub-Modules

Core/Chip/System reset requests monitor.

Timer reset request monitor.

Write-thru monitor.

Machine check error monitor.

PLB error monitor.

Reset Requests Monitor

Monitored signals:

1. Chip reset request (c405rstchipresetreq)

2. Core reset request (c405rstcoreresetreq)

3. System reset request (c405rstsysresetreq)

If asserted, the required reset action is performed. Processors switch upon core reset request.
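A minimal Python sketch of the reset requests monitor's decision logic, for illustration only. The function name `reset_actions` and the action strings are hypothetical; the only behavior taken from the slide is that each asserted request triggers its reset action and that a core reset request also causes a processor switch.

```python
def reset_actions(chip_req: bool, core_req: bool, sys_req: bool) -> list:
    """Map asserted reset requests to the actions the controller performs."""
    actions = []
    if chip_req:
        actions.append("reset_chip")
    if core_req:
        actions.append("reset_core")
        # Per the slide, processors switch upon a core reset request.
        actions.append("switch_processor")
    if sys_req:
        actions.append("reset_system")
    return actions
```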

Timer Reset Monitor

Monitored signal: c405cpmtimerresetreq

The signal is the logical “OR” of the reset request signals.

Serves as another way of resetting the system.

Monitored along with the regular reset request signals to check the watchdog functionality.

Timer Monitor Diagram

[Figure: the Wt_to_chkr block with inputs Chip reset request, Core reset request, System reset request, and Timer reset request, and output Watchdog timer Error.]

C405rstchipresetreq  C405rstcoreresetreq  C405rstsysresetreq  c405cpmtimerresetreq  Wt_to_err
0                    0                    0                   1                     1
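The watchdog check can be written as a one-line Boolean function. The sketch below is a Python model, not the hardware implementation. The slide gives only one truth-table row (timer request asserted while no reset request is asserted gives an error); treating every other input combination as "no error" is an assumption.

```python
def wt_to_err(chip_req: int, core_req: int, sys_req: int, timer_req: int) -> int:
    """Flag a timer reset request that no reset request accounts for."""
    any_reset = chip_req | core_req | sys_req
    # The timer request should be the OR of the reset requests, so a timer
    # request with all three reset requests low is inconsistent.
    # Assumption: all other combinations are treated as error-free.
    return 1 if (timer_req and not any_reset) else 0
```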

Write-Thru Monitor

Write-back policy is unreliable in space.

Monitored signal: c405plbdcuwritethru

Should remain high during normal operation.

A low state indicates an error in the MMU.

Write-Thru Monitor Diagram

[Figure: the writethruChecker block with input PPC write-thru policy and output Write-thru Error.]

c405plbdcuwritethru  writethru_err
0                    1
1                    0
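The write-thru checker is an inverter on the monitored signal. A small Python sketch, for illustration only: `writethru_err` mirrors the truth table, while `first_error_cycle` and the idea of scanning a list of sampled signal levels are hypothetical additions to show how the check might be applied to a simulation trace.

```python
from typing import Iterable, Optional


def writethru_err(c405plbdcuwritethru: int) -> int:
    """Checker truth table: the error output is the inverse of the input."""
    return 0 if c405plbdcuwritethru else 1


def first_error_cycle(samples: Iterable[int]) -> Optional[int]:
    """Return the index of the first sample that raises the error, if any."""
    for cycle, level in enumerate(samples):
        if writethru_err(level):
            return cycle
    return None
```

For example, `first_error_cycle([1, 1, 0, 1])` returns `2`, the first cycle in which the signal drops low.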

Bus Error Monitor

PPC core & PLB are “tightly coupled”.

Monitored signal: bus_error_det

Asserted when a bus error interrupt occurs.

Implies an error has occurred within the PLB arbiter.

Preliminary inspection of the PLB arbiter.

Bus Error Monitor Diagram

[Figure: the BusErrorChecker block with input Bus_error_detect and output Bus_err.]

Bus_error_detect  Bus_err
0                 0
1                 1

Monitor Arbiters

Our system consists of 3 identical monitoring sub-systems and an arbiter that uses majority voting to decide if an error signal is reliable.

There are 2 ways of using the arbiters:

“One Monitor to Rule Them All…”

1. Each signal is monitored thrice, passed on to an arbiter, then checked by Err_Mon.

[Figure: three copies of each monitor (M1, M2, M3) feed their own arbiter (ARB); the arbiter outputs feed Err_Mon, which is connected to the Controller. The structure appears twice in the diagram.]

“The Fellowship of The Signals”

2. Each signal batch is monitored by Err_Mon, results are passed on to an arbiter.

[Figure: three groups, each with monitors M1, M2, M3 and an Err_Mon; the three Err_Mon outputs feed a single arbiter (ARB), which is connected to the Controller. The structure appears twice in the diagram.]
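The two ways of using the arbiters differ in where the majority vote happens. The Python sketch below contrasts them under stated assumptions: Err_Mon is modeled as a logical OR of its inputs (the slides do not specify its internals), `replicas[r][s]` is a hypothetical layout holding the flag that replica `r` reports for signal `s`, and all function names are illustrative.

```python
def majority(a: int, b: int, c: int) -> int:
    """2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def err_mon(flags) -> int:
    # Assumption: Err_Mon reports an error if any of its inputs is set.
    return int(any(flags))


def scheme_1(replicas) -> int:
    """Way 1: vote on each signal across the three replicas, then Err_Mon."""
    voted = [majority(*column) for column in zip(*replicas)]
    return err_mon(voted)


def scheme_2(replicas) -> int:
    """Way 2: run Err_Mon on each replica's signal batch, then vote."""
    per_replica = [err_mon(row) for row in replicas]
    return majority(*per_replica)


# Two replicas each report a different single-signal error.
disagreeing = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Under these assumptions, `scheme_1(disagreeing)` is 0 because no individual signal reaches a majority, while `scheme_2(disagreeing)` is 1 because two of the three replicas report some error. The two ways are therefore not equivalent when replicas disagree about which signal failed.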

Monitor Arbiters

err1  err2  err3  Err_out
0     0     0     0
0     0     1     0
0     1     0     0
0     1     1     1
1     0     0     0
1     0     1     1
1     1     0     1
1     1     1     1
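The arbiter truth table is a 2-of-3 majority function. A short Python sketch that reproduces it; the function names are illustrative, and the Boolean expression is the standard majority form rather than anything taken from the project's hardware description.

```python
from itertools import product


def err_out(err1: int, err2: int, err3: int) -> int:
    """Majority vote: the output is 1 when at least two inputs are 1."""
    return (err1 & err2) | (err1 & err3) | (err2 & err3)


def truth_table():
    """Return rows (err1, err2, err3, Err_out) in binary counting order."""
    return [(a, b, c, err_out(a, b, c)) for a, b, c in product((0, 1), repeat=3)]
```

A single faulty monitor, whether stuck at 0 or at 1, cannot change the voted output on its own.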

Controller Demonstration

Schedule

Find a way to initiate CoreReset on PPC. (3 weeks)

Find more ways to monitor the PPC. (5 weeks)

Write WatchDog software to check internal parts of PPC. (8 weeks)

Download hardware to chip and run tests on single PPC. (11 weeks)

Wait for chip with dual cores. (T.B.D)