Hardware Acceleration
Sungho Kang
Yonsei University
CS&RSOC YONSEI UNIVERSITY
Outline
Introduction
Boeing
TEGAS
Yorktown Simulation Engine
Logic Simulation Machine
HAL
ZYCAD
AAP-1
Reconfigurable
Introduction: Why a Simulation Engine?
Software simulation is difficult to speed up. Two approaches:
Parallel processing
  Multi-processing
  Pipelining
  Array processing
Hardware implementation
  Simulation engine / hardware accelerator
  Types: compiled, event driven, multi-processor, array
Introduction: Simulation Engine Performance

Engine | Architecture | Maximum Evaluation Units | Simulation Algorithm | Maximum Gates | Announced Speed (gates/sec)
YSE (IBM) | Multi-processor, pipelining | 256 | Compiled | 1M (4-input/1-output gates) | 2000M
HAL (NEC) | Multi-processor, pipelining | 31 | Level-controlled event driven | 1.5M | 300M
LE (ZYCAD) | Multi-processor, pipelining | 16 | Event driven | 1.6M (2-input/1-output gates) | 16M
Introduction: Simulation Engine Classification
(Figure: simulation approaches compared by number of parallel processing units, with software simulators at one end and the actual circuit / hardware mock-up at the other.)
Software simulator (compiled) and software simulator (event driven)
Hardware event-driven architecture: LSM (Bell Lab.), 1 processor with a 5-stage pipeline
Multi-processor architecture: LE (ZYCAD), 16 processors with a 5-stage pipeline; HAL (NEC), 31 processors; YSE (IBM), 256 processors
Array processor architecture: simulator on AAP-1 (NTT), 65536 processors
Actual circuit / hardware mock-up
Boeing: Boeing Computer Simulator
(Block diagram: four logic processors and communication loop stations around a central electronic crossbar switch, together with a control and display console and an interface unit to a general-purpose computer and its peripherals.)
TEGAS: TEGAS Accelerator
(Block diagram: control & statistics processor (status register, instruction memory, command register), result buffer processor, activity search processor, evaluation processors (result flag memory, pin memory, instruction memory), update processor (update LIFO), host interface processor (input event and result buffers, link to host), and maintenance processor (instruction memory, clock/clear, I/O ports), all attached to the simulation data bus and a 672-bit-wide simulation processing memory.)
TEGAS: TEGAS Accelerator, Functional-Level Block Diagram
(Block diagram: control and statistics processor, update pass processor, fault list processor, evaluation pass processor, host interface processor with host interface buffers, maintenance processor and clock logic, and simulation processing memories.)
TEGAS: Accelerator Update Processor
(Block diagram: master controller, LIFO address and access control, descriptor address pipe LIFO address and access control, and simulation processing memory access controller; connections to the simulation processing memory, result buffer memory, update processor LIFO memory, control and statistics processor, fault list processor, host interface processor, evaluation processor, and support bus.)
TEGAS: Accelerator Evaluation Processor
(Block diagram: a behavioral processor (behavioral processor instruction memory, behavioral language processor, behavioral buffer) and a structural processor (structural processor instruction memory, structural evaluation processor, structural buffer, structural processor scheduler with scheduler memory, pin attribute memory, level 0); connections to the support bus, fault list processor, control and statistics processor, time queue processor and time queue memory, activity search processor, and simulation processing memory.)
TEGAS: Accelerator Time Queue Processor
(Block diagram: time queue processor controller, three time queue search page activity logic blocks, time queue memory availability memory, and time queue memory availability logic; connections to the evaluation processor, time queue memory, control and statistics processor, activity search processor, and simulation processing memory over the address bus and support bus.)
YSE: Yorktown Simulation Engine (Compiled)
(Block diagram: logic processors 0, 1, 2, ... and an array simulator interconnected by a 256 x 256 switch; a control processor with bus control connects the engine to the host.)
YSE: Yorktown Simulation Engine
The circuit is partitioned among 256 processing units (PUs).
  Each PU simulates a subcircuit of up to 4K gates.
  Specialized PUs handle RAMs and ROMs.
All PUs are synchronized by a common clock; each PU can evaluate one gate every clock cycle.
Partitioning should minimize the time PUs spend waiting for values computed on other PUs.
The control processor connects the host to the YSE.
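The lock-step execution described above, and why the partition affects waiting time, can be illustrated with a small Python sketch. It assumes a hypothetical instruction format (output signal, function, input signals), a one-cycle switch latency, and that a PU simply idles when an operand computed on another PU is not yet available; none of these details are taken from the YSE design itself.

```python
def run_lockstep(programs, inputs):
    """Run per-PU gate lists in lock step, one gate per PU per clock cycle.

    Assumptions (hypothetical): each instruction is (output, function, inputs);
    a value computed in cycle t is visible to every PU from cycle t + 1;
    a PU idles when an operand is not ready. Returns (values, cycles, idle).
    """
    values = dict(inputs)
    ready = {s: -1 for s in inputs}       # cycle in which each signal became available
    pcs = [0] * len(programs)             # one program counter per PU
    cycle = idle = 0
    while any(pc < len(prog) for pc, prog in zip(pcs, programs)):
        produced = []
        for k, prog in enumerate(programs):
            if pcs[k] >= len(prog):
                continue                  # this PU has finished its program
            out, func, ins = prog[pcs[k]]
            if all(ready.get(s, cycle) < cycle for s in ins):
                produced.append((out, func(*(values[s] for s in ins))))
                pcs[k] += 1
            else:
                idle += 1                 # waiting for a value from another PU
        if not produced:
            raise RuntimeError("deadlock: no PU can make progress")
        for out, v in produced:           # results become visible next cycle
            values[out] = v
            ready[out] = cycle
        cycle += 1
    return values, cycle, idle


# Four gates: g1 = a AND b, g2 = c OR d, g3 = g1 XOR g2, g4 = g1 OR c
G1 = ("g1", lambda a, b: a & b, ("a", "b"))
G2 = ("g2", lambda c, d: c | d, ("c", "d"))
G3 = ("g3", lambda x, y: x ^ y, ("g1", "g2"))
G4 = ("g4", lambda x, c: x | c, ("g1", "c"))
INPUTS = {"a": 1, "b": 1, "c": 0, "d": 0}

good = run_lockstep([[G1, G3], [G2, G4]], INPUTS)   # balanced partition
poor = run_lockstep([[G1, G2], [G3, G4]], INPUTS)   # PU 1 waits on PU 0
print(good[1:], poor[1:])                           # (2, 0) (4, 2)
```

Both partitions compute the same signal values, but in the second one PU 1 idles for two cycles and the run takes twice as long; that idle time is what a good partition tries to minimize.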
YSE: Logic Processor
(Block diagram: program counter, instruction memory (1024 x 80), data memory (2048 x 2 x 5) with fetch and store addresses for signal data, delay value memory (1024 x 16), function memory, and a logic unit that receives operands 1 to 5 and the function.)
YSE: Logic Processor
The program counter (PC) provides the index of the next gate to be evaluated (compiled simulation).
Signal values (0, 1, X, Z) are stored in the data memory.
Each gate has up to 4 inputs.
Generalized DeMorgan (GDM) code: 16 functions of 4-valued variables.
Evaluation is done by lookup in a zoom table.
YSE: Function Unit
The gate type is concatenated with the GDM-coded input values to form an address into the function memory; the word read out is the output value.
(Figure: the function memory is addressed by the gate type and the GDM codes of input values 1 to 4, and returns the output value.)
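Table-lookup evaluation of this kind can be sketched in Python: every gate output is precomputed into a function memory, and evaluation is a single read at an address built by concatenating the gate type with the input codes. The 2-bit value encoding and gate-type numbers below are hypothetical and are NOT the actual GDM code; the sketch only shows the concatenate-and-look-up idea.

```python
from itertools import product

# Hypothetical 2-bit encoding of the four signal values (NOT the actual GDM code).
ZERO, ONE, X, Z = 0, 1, 2, 3
# Hypothetical gate-type numbers, used as the high-order address field.
AND, OR, NAND, NOR = 0, 1, 2, 3

def _and(vals):
    if ZERO in vals:
        return ZERO                  # a controlling 0 forces the output
    if all(v == ONE for v in vals):
        return ONE
    return X                         # otherwise an X or Z input makes it unknown

def _or(vals):
    if ONE in vals:
        return ONE
    if all(v == ZERO for v in vals):
        return ZERO
    return X

def _inv(v):
    return {ZERO: ONE, ONE: ZERO}.get(v, X)

BEHAVIOR = {
    AND: _and,
    OR: _or,
    NAND: lambda vals: _inv(_and(vals)),
    NOR: lambda vals: _inv(_or(vals)),
}

def address(gate_type, inputs):
    """Concatenate the gate type with four 2-bit input codes."""
    addr = gate_type
    for v in inputs:
        addr = (addr << 2) | v
    return addr

# Fill the function memory once: 4 gate types x 4**4 input combinations = 1024 words.
FUNCTION_MEMORY = [X] * (4 * 4 ** 4)
for gate_type, behave in BEHAVIOR.items():
    for inputs in product((ZERO, ONE, X, Z), repeat=4):
        FUNCTION_MEMORY[address(gate_type, inputs)] = behave(inputs)

def evaluate(gate_type, inputs):
    """Evaluate a gate with one table lookup and no logic computation."""
    return FUNCTION_MEMORY[address(gate_type, inputs)]

# A 2-input NAND uses inputs 3 and 4 tied to the non-controlling value 1.
print(evaluate(NAND, (ONE, ONE, ONE, ONE)), evaluate(AND, (ZERO, X, Z, ONE)))  # 0 0
```

Gates with fewer than four inputs fit the same table by tying the unused inputs to the non-controlling value, so one memory serves every gate type.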
LSM: Logic Simulation Machine
Event-Driven Logic Simulation Tasks

Processor | Tasks
CURRENT EVENT PROC | 1) retrieve and distribute event; 2) record event; 3) oscillation check
MODEL ACCESSING UNIT | 1) determine first fanout; 2) determine next fanout
SIMPLE CONF PROC | 1) update source configuration; 2) update and transmit fanout configuration
EVAL | 1) evaluate; 2) check for repeated evaluations
SCHED | 1) schedule; 2) timing analysis
EVENT LIST MANAGER | insert in event list
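The task sequence in the table can be mimicked in software. The Python sketch below performs the tasks in order for each time step; it is a sequential approximation of work that LSM spreads across pipelined processors. The netlist, integer time units, transport-delay model, and the per-time-step event count used as a crude oscillation check are assumptions for illustration, and "update configuration" is reduced to reading the current signal values.

```python
import heapq
from collections import defaultdict

# Hypothetical netlist: output signal -> (function, input signals, delay)
NETLIST = {
    "n1": (lambda a, b: a & b, ("a", "b"), 2),   # AND gate, delay 2
    "y": (lambda n, c: n | c, ("n1", "c"), 1),   # OR gate, delay 1
}

def build_fanout(netlist):
    fanout = defaultdict(list)
    for out, (_, ins, _) in netlist.items():
        for s in ins:
            fanout[s].append(out)
    return fanout

def simulate(netlist, initial, stimuli, max_events_per_step=100):
    values = dict(initial)
    fanout = build_fanout(netlist)
    event_list = list(stimuli)                    # entries: (time, signal, value)
    heapq.heapify(event_list)
    per_time = defaultdict(int)                   # events retrieved at each time
    trace = []
    while event_list:
        now = event_list[0][0]
        changed = []
        # Current event processor: retrieve, record, and check for oscillation.
        while event_list and event_list[0][0] == now:
            _, sig, v = heapq.heappop(event_list)
            per_time[now] += 1
            if per_time[now] > max_events_per_step:   # catches zero-delay loops only
                raise RuntimeError(f"possible oscillation at t={now}")
            if values.get(sig) != v:
                values[sig] = v
                changed.append(sig)
                trace.append((now, sig, v))
        # Model accessing unit: collect fanout gates; the set avoids repeated evaluations.
        to_eval = {g for s in changed for g in fanout[s]}
        for g in sorted(to_eval):
            func, ins, delay = netlist[g]
            new = func(*(values[s] for s in ins))               # EVAL
            heapq.heappush(event_list, (now + delay, g, new))   # SCHED + event list manager
    return values, trace

initial = {"a": 0, "b": 0, "c": 0, "n1": 0, "y": 0}
values, trace = simulate(NETLIST, initial, [(0, "a", 1), (0, "b", 1)])
print(trace)  # [(0, 'a', 1), (0, 'b', 1), (2, 'n1', 1), (3, 'y', 1)]
```

Scheduled events that turn out not to change a signal are filtered when they are retrieved, so only real changes trigger fanout evaluation.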
LSM: Logic Simulation Machine
(Block diagram: event list memory, current event processor, event list manager, model accessing unit, update and evaluation units, and scheduler; input comes from the stimulus file and output goes to the results file.)
LSM: Logic Simulation Machine
Pipeline: the work load must be distributed uniformly over the stages.
Issue: how to remove bottlenecks in the special-purpose processors or the FIFO buffers.
Element evaluation is initiated as soon as one of its inputs changes.
Most time-consuming tasks:
  Fanout determination
  Gate evaluation
Function evaluations are executed concurrently by parallel processors.
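The need for a uniform work load can be quantified with simple arithmetic. In a linear pipeline the first result appears after the sum of the stage times, and from then on one result emerges per slowest-stage time. The stage times in this Python sketch are hypothetical, chosen so that both pipelines have the same total.

```python
def pipeline_timing(stage_ns, n_items):
    """Latency, throughput, and total time of a linear pipeline (stage times in ns)."""
    latency = sum(stage_ns)                  # time for the first item to pass all stages
    bottleneck = max(stage_ns)               # the slowest stage sets the issue rate
    total = latency + (n_items - 1) * bottleneck
    throughput_per_us = 1000 / bottleneck
    return latency, throughput_per_us, total

balanced = pipeline_timing([100, 100, 100, 100, 100], 1000)
skewed = pipeline_timing([50, 50, 300, 50, 50], 1000)   # e.g. a slow evaluation stage
print(balanced[2], skewed[2])                           # 100400 300200
```

Both pipelines have a 500 ns latency, but the skewed one processes 1000 events about three times more slowly because every event waits on the 300 ns stage; this is why heavy tasks such as gate evaluation are spread over parallel processors.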
HAL: Block-Level Hardware Logic Simulator
(Block diagram: IC node processors (IC Node Proc. 1 to 31) connected by a router cell network. Each IC node processor contains event set, event fetch, event pack, and transmission logic; output pin status comparison; an IC pin connection table and an input pin status table; a packet transceiver; execution control; and a dynamic gate array with pin array data memory, gate array memory, and address memory. The master control processor and memory IC processors include main memory, state memory, master memory access, an IC address map, main memory data, a DMA controller, CPU, and RAM data.)
HAL
Event search
Signal propagation
System clock change
HAL executes a level-controlled event-driven mechanism.
The simulation model consists of logic blocks.
24-bit message packets resolve conflicts when more than one access occurs at the same time.
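One reading of a level-controlled event-driven mechanism is event-driven evaluation carried out strictly in level (rank) order within each clock cycle, so every gate is evaluated at most once per cycle and only when one of its inputs has changed. The Python sketch below follows that reading under a zero-delay assumption; the netlist and level numbering are hypothetical, and HAL's block-level models and 24-bit message packets are not modeled.

```python
from collections import defaultdict

# Hypothetical zero-delay netlist: output -> (function, inputs)
NETLIST = {
    "n1": (lambda a, b: a & b, ("a", "b")),
    "n2": (lambda c: 1 - c, ("c",)),
    "y": (lambda p, q: p | q, ("n1", "n2")),
}
PRIMARY_INPUTS = ("a", "b", "c")

def levelize(netlist, primary_inputs):
    """Level 0 for primary inputs; otherwise 1 + the highest input level."""
    level = {s: 0 for s in primary_inputs}
    def lv(s):
        if s not in level:
            level[s] = 1 + max(lv(i) for i in netlist[s][1])
        return level[s]
    for s in netlist:
        lv(s)
    return level

LEVEL = levelize(NETLIST, PRIMARY_INPUTS)
FANOUT = defaultdict(list)
for out, (_, ins) in NETLIST.items():
    for i in ins:
        FANOUT[i].append(out)

def simulate_cycle(values, input_changes):
    """One clock cycle of level-ordered event-driven simulation; returns #evaluations."""
    pending = defaultdict(set)                  # level -> gates with an input event
    for sig, v in input_changes.items():        # event search on the primary inputs
        if values[sig] != v:
            values[sig] = v
            for g in FANOUT[sig]:
                pending[LEVEL[g]].add(g)
    evaluations = 0
    for lvl in range(1, max(LEVEL.values()) + 1):   # levels strictly in order
        for g in sorted(pending[lvl]):
            func, ins = NETLIST[g]
            new = func(*(values[i] for i in ins))
            evaluations += 1
            if new != values[g]:                # only real changes propagate
                values[g] = new
                for h in FANOUT[g]:
                    pending[LEVEL[h]].add(h)    # fanout is always at a higher level
    return evaluations

values = {"a": 0, "b": 0, "c": 0, "n1": 0, "n2": 1, "y": 1}
print(simulate_cycle(values, {"a": 1, "b": 1}), values["y"])  # 2 1
```

Because a gate's fanout always sits at a higher level, by the time a level is processed all of its input events for the cycle are already known.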
ZYCAD: ZYCAD Logic Evaluator
Event driven, with up to 16 processors.
Each event processor is a 5-stage pipeline:
1) get the fanout list and determine the delay
2) read the fanout and update output states
3) update input states and determine the gate type
4) evaluate the gate model
5) schedule the new event
Supports 3-input, 1-output gates only.
An event stack reduces overhead.
Future event scheduler: timing wheel.
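A timing wheel stores future events in a circular array of time slots, so inserting an event and retrieving the events due at the current time are constant-time operations. The Python sketch below is a generic timing wheel, not ZYCAD's hardware scheduler; the wheel size and the overflow list for events more than one revolution ahead are assumptions.

```python
class TimingWheel:
    """Future-event scheduler: a circular array of time slots (generic sketch)."""

    def __init__(self, size):
        self.size = size                          # number of slots (assumed)
        self.slots = [[] for _ in range(size)]
        self.now = 0
        self.overflow = []                        # events more than one revolution ahead

    def schedule(self, time, event):
        if time < self.now:
            raise ValueError("cannot schedule an event in the past")
        if time - self.now < self.size:
            self.slots[time % self.size].append((time, event))   # O(1) insertion
        else:
            self.overflow.append((time, event))

    def advance(self):
        """Return the events due at the current time, then step time forward by one."""
        slot = self.slots[self.now % self.size]
        due = [event for _, event in slot]
        slot.clear()
        self.now += 1
        # Move overflow events that now fall within one revolution onto the wheel.
        still_far = []
        for time, event in self.overflow:
            if time - self.now < self.size:
                self.slots[time % self.size].append((time, event))
            else:
                still_far.append((time, event))
        self.overflow = still_far
        return due


wheel = TimingWheel(4)
wheel.schedule(0, "a")
wheel.schedule(2, "b")
wheel.schedule(5, "c")                            # beyond the 4-slot horizon
print([wheel.advance() for _ in range(6)])        # [['a'], [], ['b'], [], [], ['c']]
```

Sizing the wheel to cover the largest common gate delay keeps the overflow list short, so nearly all scheduling stays in the constant-time path.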
AAP-1
(Block diagram: host computer, interface unit, data buffer memory, array control unit with instruction memory, and a 256 x 256 PE array.)
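AAP-1: Processing Element
(Block diagram of one PE: multiplexers MUX-ID1, MUX-ID2, MUX-RUT, MUX-OD, MUX-ALDS, and MUX-CRY; registers REG-RS, REG-10, and REG-C; RAM-B memories (32W x 1b and 64 x 1b); latches LAT-A, LAT-B, and LAT-S; an ALU with A and B inputs and carry in/out; data transfer units DTU1 and DTU2 with outputs DO1 and DO2; register files RF-A and RF-B; and a PE-level bypass. Each PE connects to its 8 neighbor PEs and to upper and lower PEs.)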
Reconfigurable: Reconfigurable Array Architecture
Configuration
Reconfigurable: Mapping
Mapping onto PE Array
Reconfigurable: Mapping
Expanded Circuit Mapping
Reconfigurable: Array Reconfiguration
Reconfigurable: PE Array Reconfiguration
Folding
Reconfigurable: Node Descriptor Memory
Reconfigurable: PE Cell Block
Reconfigurable: First Expansion
Reconfigurable: Expansion of XOR and XNOR
Reconfigurable: Possible Expansion of 4-Input AND
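The expansion slides survive only as titles. Assuming that expansion means rewriting gates the PEs cannot evaluate directly into 2-input AND/OR and NOT primitives, the Python sketch below expands XOR, XNOR, and a 4-input AND (as a balanced tree or as a chain) and checks each expansion against its truth table. The primitive set and all helper names are hypothetical.

```python
from itertools import count, product

# Hypothetical primitive set: 2-input AND/OR and NOT.
OPS = {"AND": lambda x, y: x & y, "OR": lambda x, y: x | y, "NOT": lambda x: 1 - x}

def namer(prefix="t"):
    """Return a generator of fresh internal signal names t0, t1, ..."""
    counter = count()
    return lambda: f"{prefix}{next(counter)}"

def expand_xor(a, b, out, new):
    """XOR(a, b) = OR(AND(a, NOT b), AND(NOT a, b))."""
    na, nb, t1, t2 = new(), new(), new(), new()
    return [("NOT", na, (a,)), ("NOT", nb, (b,)),
            ("AND", t1, (a, nb)), ("AND", t2, (na, b)),
            ("OR", out, (t1, t2))]

def expand_xnor(a, b, out, new):
    """XNOR(a, b) = NOT(XOR(a, b))."""
    x = new()
    return expand_xor(a, b, x, new) + [("NOT", out, (x,))]

def expand_and4_tree(ins, out, new):
    """AND(a, b, c, d) = AND(AND(a, b), AND(c, d)): 3 gates, depth 2."""
    t1, t2 = new(), new()
    return [("AND", t1, ins[:2]), ("AND", t2, ins[2:]), ("AND", out, (t1, t2))]

def expand_and4_chain(ins, out, new):
    """AND(a, b, c, d) = AND(AND(AND(a, b), c), d): 3 gates, depth 3."""
    t1, t2 = new(), new()
    return [("AND", t1, ins[:2]), ("AND", t2, (t1, ins[2])), ("AND", out, (t2, ins[3]))]

def run(gates, env, out="y"):
    env = dict(env)
    for op, dst, ins in gates:            # gates are listed in evaluation order
        env[dst] = OPS[op](*(env[i] for i in ins))
    return env[out]

# Exhaustive truth-table check of every expansion.
for a, b in product((0, 1), repeat=2):
    assert run(expand_xor("a", "b", "y", namer()), {"a": a, "b": b}) == a ^ b
    assert run(expand_xnor("a", "b", "y", namer()), {"a": a, "b": b}) == 1 - (a ^ b)
for bits in product((0, 1), repeat=4):
    env = dict(zip("abcd", bits))
    for expand in (expand_and4_tree, expand_and4_chain):
        assert run(expand(tuple("abcd"), "y", namer()), env) == int(all(bits))
print("all expansions match their truth tables")
```

Both 4-input AND expansions use three 2-input gates; the tree has depth 2 and the chain depth 3, which is one reason more than one expansion may be worth considering when mapping onto a PE array.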
Reconfigurable: Second Expansion
Reconfigurable: Fault Simulation Algorithm
make a fault list attached to each element
for all patterns
    perform good simulation
    store the results of good simulation
    for all levels
        for each fault in the fault list
            insert the fault
            simulate the fault
            compare the value
            if the value is different from that of good simulation
                propagate the fault to the next level
            else
                drop the fault
        end
    end
end
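The algorithm above can be sketched in Python for single stuck-at faults on the gate outputs of a small combinational circuit. The netlist, the fault model, and resetting the fault lists for each pattern are assumptions; "drop the fault" is read as stopping that fault's propagation at the element where its effect disappears.

```python
from collections import defaultdict

OPS = {
    "AND": lambda *v: int(all(v)),
    "OR": lambda *v: int(any(v)),
    "NOT": lambda v: 1 - v,
}

# Hypothetical netlist (output, op, inputs), listed in level order.
CIRCUIT = [
    ("n1", "AND", ("a", "b")),    # level 1
    ("n2", "NOT", ("c",)),        # level 1
    ("y", "OR", ("n1", "n2")),    # level 2
]
PRIMARY_OUTPUTS = {"y"}

def build_fanout(circuit):
    fanout = defaultdict(list)
    for out, _, ins in circuit:
        for i in ins:
            fanout[i].append(out)
    return fanout

FANOUT = build_fanout(CIRCUIT)

def good_simulation(pattern):
    values = dict(pattern)
    for out, op, ins in CIRCUIT:
        values[out] = OPS[op](*(values[i] for i in ins))
    return values

def fault_simulation(pattern):
    """Return the single stuck-at faults (signal, value) detected by one pattern."""
    good = good_simulation(pattern)                   # perform and store good simulation
    # Fault list attached to each element: initially its own output stuck-at faults.
    fault_list = {out: {(out, 0), (out, 1)} for out, _, _ in CIRCUIT}
    effect = defaultdict(dict)                        # fault -> {signal: faulty value}
    detected = set()
    for out, op, ins in CIRCUIT:                      # for all levels, in order
        for fault in sorted(fault_list[out]):
            site, stuck = fault
            if site == out:
                value = stuck                         # the faulty line itself
            else:                                     # simulate with the fault's effects
                value = OPS[op](*(effect[fault].get(i, good[i]) for i in ins))
            if value != good[out]:                    # differs from good simulation
                effect[fault][out] = value
                for nxt in FANOUT[out]:               # propagate to the next level
                    fault_list[nxt].add(fault)
                if out in PRIMARY_OUTPUTS:
                    detected.add(fault)
            # else: drop the fault here; its effect is masked at this element
    return detected

def fault_coverage(patterns):
    all_faults = {(out, v) for out, _, _ in CIRCUIT for v in (0, 1)}
    detected = set()
    for pattern in patterns:                          # for all patterns
        detected |= fault_simulation(pattern)
    return len(detected) / len(all_faults)

print(sorted(fault_simulation({"a": 1, "b": 1, "c": 1})))   # [('n1', 0), ('y', 0)]
```

With a = b = c = 1, the fault n2 stuck-at-1 changes n2 but is dropped at the OR gate because n1 = 1 masks it; only faults whose effect reaches y are reported as detected.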