CCECE 2011
ROUTING ARCHITECTURE AND ALGORITHMS FOR A SUPERCONDUCTIVITY CIRCUITS-BASED COMPUTING HARDWARE
Farhad Mehdipour, Hiroaki Honda, Hiroshi Kataoka, Koji Inoue, Kazuaki Murakami
Kyushu University, Japan
CREST-JST (2006~): Low-power, high-performance, reconfigurable processor using single-flux quantum (SFQ) circuits
SFQ-LSRDP
• K. Murakami, K. Inoue, H. Honda, F. Mehdipour, H. Kataoka (Kyushu Univ.): Architecture, compiler and applications
• S. Nagasawa et al. (Superconducting Research Lab. (SRL)): SFQ process
• N. Yoshikawa et al. (Yokohama National Univ.): SFQ-FPU chip, cell library
• A. Fujimaki et al. (Nagoya Univ.): SFQ-RDP chip, cell library, and wiring
• N. Takagi (Leader) et al. (Nagoya Univ.): CAD for logic design and arithmetic circuits
Our mission: Architecture, compiler and application development
Outline of Large-Scale Reconfigurable Data-Path (LSRDP) Processor
[Figure: Single Flux Quantum circuit element, showing a Josephson junction, a superconducting loop, and a flux quantum]
SFQ Features:
• High-speed switching and signal transmission
• Low power consumption
• Compact implementation (smaller area)
• Suitable for pipeline processing
How it works
[Figure: GPP, Memory, Memory Controller, Buffers, and the LSRDP]
Program executed on the GPP:
    inst; inst; …
    conf_LSRDP ( );
    Loop:
        rearrange_input_data ( );
        set_IO_info ( );
        run_LSRDP ( );
        inst; …
        sync_lsrdp ( );
        rearrange_output_data ( );
    End_Loop
    inst; …
Steps shown in the figure:
• conf_LSRDP ( ): conf. bit-stream
• rearrange_input_data ( ): GPP
• set_IO_info ( ): Memory Controller
• run_LSRDP ( ); inst; sync_lsrdp ( ): GPP waiting for the LSRDP; LSRDP terminating the operation
• rearrange_output_data ( ): GPP
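The call sequence on this slide can be mimicked on a host machine to make the order of operations concrete. The following is a minimal Python mock, not the actual LSRDP runtime: the function names come from the slide, while the class `MockLSRDP`, the arguments, and all function bodies are assumptions made purely for illustration.

```python
class MockLSRDP:
    """Hypothetical stand-in for the LSRDP; not the real hardware interface."""

    def __init__(self):
        self.log = []        # ordered record of the calls, for inspection
        self.kernel = None   # models the loaded configuration
        self.inputs = []
        self.outputs = []

    def conf_LSRDP(self, kernel):
        # Assumed: the configuration is loaded once, before the loop.
        self.kernel = kernel
        self.log.append("conf")

    def set_IO_info(self, inputs):
        # Assumed: describes the input data for one loop iteration.
        self.inputs = list(inputs)
        self.log.append("set_io")

    def run_LSRDP(self):
        # The mock computes immediately; real hardware would run concurrently.
        self.outputs = [self.kernel(x) for x in self.inputs]
        self.log.append("run")

    def sync_lsrdp(self):
        # Models waiting until the LSRDP terminates the operation.
        self.log.append("sync")
        return list(self.outputs)


def rearrange_input_data(block):
    return list(block)   # placeholder: the real data layout is not specified


def rearrange_output_data(out):
    return list(out)     # placeholder: the real data layout is not specified


def offload(blocks, kernel):
    dev = MockLSRDP()
    dev.conf_LSRDP(kernel)
    results = []
    for block in blocks:
        data = rearrange_input_data(block)
        dev.set_IO_info(data)
        dev.run_LSRDP()
        # Other GPP instructions ("inst; …") could execute here.
        out = dev.sync_lsrdp()
        results.append(rearrange_output_data(out))
    return results, dev.log
```

Calling `offload([[1, 2], [3]], lambda x: 2 * x)` runs two loop iterations and records one configuration step followed by a set/run/sync triple per iteration.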
Architecture Exploration
LSRDP Layouts
• [Figure: Layout-I, rows of PEs labeled ADD/SUB MUL, with ORNs between rows]
• [Figure: Layout-II, rows of PEs labeled ADD/SUB and MUL, with ORNs between rows]
• [Figure: Layout-III, PEs labeled MUL and PEs labeled ADD/SUB, with ORNs between rows]
PE structures
• Basic PE arch.: 3-inps/2-outs
• PE arch. I: 4-inps/3-outs
• PE arch. II: 3-inps/3-outs
[Figure: PE block diagrams composed of FU and TU blocks]
ORN structures
[Figure: three ORN structures, labeled MCL = 1, MCL = 1, and MCL = 2, with the following dimensions]
• Number of rows = 1.5×M, number of columns = 4×MCL
• Number of rows = 2×M, number of columns = 6×MCL+2
• Number of rows = 1.5×M, number of columns = 4×MCL+1
LSRDP Tool Chain
Flow 1: assembly code generation for the GPP
• Application C code
• Modifying application code: inserting LSRDP instructions in the code
• Modified application code
• LSRDP library file: function definitions & declarations
• ISAcc or COINS compiler
• .asm code for MIPS-based GPP
Flow 2: configuration bit-stream generation for the LSRDP
• Application C code
• DFG extraction
• Data flow graphs
• LSRDP architecture description
• Placing and Routing Tool
• Configuration file + various text & schematic reports
Simulator: performance evaluation
Mapping DFGs onto LSRDP
Inputs: DFG, LSRDP architecture description
Steps:
• Placing input nodes
• Placing operational & output nodes
• Routing nets
• Routing IO nets
• Final map
[Figure annotation: longest connections]
Global routing algorithms
Routing DFG connections between source and destination PEs
• Exhaustive search-based: very time consuming
• Branch and bound alg.: very fast
[Figure: example routes from src to dest; legend: vacant, fully-occupied]
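To illustrate why branch and bound can be much faster than exhaustive search, here is a minimal Python sketch on a simplified grid model. It is not the authors' algorithm. Everything in it is an assumption for illustration: a route uses one vacant PE per intermediate row, the horizontal shift between consecutive rows is limited by `mcl` (read here as a maximum connection length), and the cost is the total horizontal shift. The names `route_bnb` and `vacant` are hypothetical.

```python
def route_bnb(vacant, src, dst, mcl):
    """Return (cost, columns per row) of a cheapest route, or None.

    vacant[r][c] is True if the PE at row r, column c may carry the route.
    src and dst are (row, column) pairs with dst on a later row than src.
    """
    (r0, c0), (r1, c1) = src, dst
    width = len(vacant[0])
    best = [None]  # incumbent: best (cost, path) found so far

    def search(row, col, cost, path):
        rows_left = r1 - row
        # Bound 1: the destination column is out of reach in the rows left.
        if abs(c1 - col) > mcl * rows_left:
            return
        # Bound 2: even the straightest continuation cannot beat the incumbent.
        if best[0] is not None and cost + abs(c1 - col) >= best[0][0]:
            return
        if rows_left == 1:
            # Next row holds the destination PE; bound 1 guarantees it is in reach.
            best[0] = (cost + abs(c1 - col), path + [c1])
            return
        # Branch: try columns closest to the destination first.
        candidates = range(max(0, col - mcl), min(width, col + mcl + 1))
        for nxt in sorted(candidates, key=lambda c: abs(c - c1)):
            if vacant[row + 1][nxt]:
                search(row + 1, nxt, cost + abs(nxt - col), path + [nxt])

    if r1 > r0:
        search(r0, c0, 0, [c0])
    return best[0]
```

An exhaustive search would enumerate every column sequence. The two bounds cut off any partial route that cannot reach the destination or cannot improve on the best route found so far. Trying the most promising columns first makes the second bound effective early.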
Micro-Routing: Problem Definition
• Inputs
  – LSRDP basic specifications
    • Layout, Width (W), MCL, PE arch., etc.
    • List of connections between consecutive rows
  – ORN structure, including
    • The number of CBs and T2s in each row
    • The number of CB rows
    • Topology of connections among CBs
• Output
  – Detailed routes via cross-bar switches
    • The list of CBs used for routing each connection
    • Configuration of CBs
[Figure: ORN between the i-th row and the (i+1)-th row of PEs, each PE shown as FU and T]
A micro-routing algorithm has been implemented for the LSRDP with underlying layout II and PE arch. III
ORN Micro-routing
• CB: 2-input/2-output
• ½CB: 1-input/2-output
[Figure: CB configurations labeled 00, 01, 10, 11]
Example micro-nets:
• (PE1 → PE5)
• (PE2 → PE5, PE6, PE7)
• (PE3 → PE6, PE8)
• (PE4 → PE7, PE8)
[Figure: PE1 to PE4 connected to PE5 to PE8 through a row of ½CBs followed by CBs]
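The example nets on this slide can be written down as a small data structure to see how many individual connections the ORN has to carry. The Python sketch below assumes that each (source, destination) pair counts as one connection to route. That reading and the helper names `to_pairs` and `fan_in` are illustrative assumptions, not part of the original tool.

```python
# Nets copied from the slide's example: source PE -> destination PEs.
nets = {
    "PE1": ["PE5"],
    "PE2": ["PE5", "PE6", "PE7"],
    "PE3": ["PE6", "PE8"],
    "PE4": ["PE7", "PE8"],
}


def to_pairs(nets):
    """Flatten multi-destination nets into (source, destination) pairs."""
    return [(src, dst) for src, dsts in nets.items() for dst in dsts]


def fan_in(nets):
    """Count how many source PEs drive each destination PE."""
    counts = {}
    for _, dst in to_pairs(nets):
        counts[dst] = counts.get(dst, 0) + 1
    return counts
```

For this example, `to_pairs(nets)` yields eight pairs from four nets, and `fan_in(nets)` shows that each of PE5 to PE8 receives exactly two connections.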
ORN Micro-Routing Example: Heat-8x2, ORN between the 3rd and 4th rows
[Figure: routing of the nets between the PEs in the 3rd row and the PEs in the 4th row through the ORN]
Specifications of Attempted DFGs
DFG             total # of nodes   # of inputs   # of outputs   # of ops
Heat-8x1        34                 6             4              16
Heat-8x2        60                 8             4              32
Heat-16x2       172                16            12             96
Poisson-3x3     62                 18            1              33
Vibration-4x2   48                 8             4              24
Vibration-8x2   136                16            12             72
Vibration-8x4   168                16            8              96
ERI-1           76                 16            9              51
ERI-2           67                 19            1              47
Results of routing nets using the proposed algorithms
DFG             avg. hor. C.L.   avg./max. ver. C.L.   # of global/micro nets to route   Time to map (sec)
Heat-8x1        0.35             0.75/3                36/64                             0.015
Heat-8x2        0.44             1.32/5                68/114                            1.75
Heat-16x2       0.47             1.64/7                204/343                           1.05
Poisson-3x3     0.68             2.4/16                67/120                            2074.5
Vibration-4x2   0.46             1.58/9                50/88                             0.34
Vibration-8x2   0.42             2.15/10               154/332                           2.20
Vibration-8x4   2.48             3.72/16               348/610                           6721.3
ERI-1           0.75             2.21/9                111/374                           53.61
ERI-2           0.78             2.99/9                95/332                            0.327
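A quick way to read the results table is to compare the number of micro nets per global net with the mapping time. The short Python sketch below copies the net counts and times from the table. The helper name `micro_per_global` is a hypothetical convenience, and no conclusion beyond the arithmetic is implied.

```python
# (DFG, global nets, micro nets, time to map in seconds), copied from the table.
results = [
    ("Heat-8x1", 36, 64, 0.015),
    ("Heat-8x2", 68, 114, 1.75),
    ("Heat-16x2", 204, 343, 1.05),
    ("Poisson-3x3", 67, 120, 2074.5),
    ("Vibration-4x2", 50, 88, 0.34),
    ("Vibration-8x2", 154, 332, 2.20),
    ("Vibration-8x4", 348, 610, 6721.3),
    ("ERI-1", 111, 374, 53.61),
    ("ERI-2", 95, 332, 0.327),
]


def micro_per_global(rows):
    """Micro nets per global net for each DFG, rounded to two decimals."""
    return {name: round(micro / glob, 2) for name, glob, micro, _ in rows}


# DFG names ordered from the slowest to the fastest mapping time.
by_time = [name for name, _, _, t in sorted(results, key=lambda r: -r[3])]
```

The ordering in `by_time` shows that mapping time does not simply follow the net count: Poisson-3x3 has 67 global nets but takes 2074.5 s, whereas Heat-16x2 has 204 global nets and takes 1.05 s.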