Post on 08-Jan-2016
CASTNESS11, Rome Italy© 2011 Target Compiler Technologies L 1
Ideas for the design of an ASIP for LQCD
Target Compiler Technologies
CASTNESS’11, Rome, Italy
Agenda
ASIPs and IP Designer
EURETILE platform
An ASIP for LQCD
ASIPs in Multi-Core SoC
ASIP: Application-Specific Processor
• Anything between general-purpose µP and hardwired data-path
• Flexibility through programmability and design-time reconfigurability
• High throughput, low energy through parallelism and specialization
ASIP is the foundation of heterogeneous multi-core SoC
• Balanced SoC architecture offers best performance at lowest energy and lowest cost
Why ASIPs?
Maximise performance
• Specialisation
• Parallelism: VLIW, SIMD, multi-core
Minimise power dissipation
• Specialisation
• Parallelism: VLIW, SIMD, multi-core
• Power-optimised RTL generation
Leverage the benefits of programmability
• React to changing requirements
• Ship first for evolving standards
• Remedy defects
• Extend products to new markets without an SoC respin
IP Designer Tool Suite
nML – ASIP description language
Structural skeleton
    reg V[4]<vector>;
    trn vecr<vector>;
    trn vecs<vector>;
    trn vecd<vector>;
    trn vect<vector>;
    fu vec;
    fu vabs;
    ...
Instruction-set grammar
    opn vec_adiff_opn(t:c2u, r:c2u)
    {
      action {
        stage E1:
          vecd = vsub(vecr=V[r],vecs=V[t]) @vec;
          V[t] = vect = vabs(vecd) @vabs;
      }
      syntax : "vadiff v"t ",v"r ",v"t;
      image  : t::r;
    }
Example: architectural specialisation Absolute-difference instruction in motion estimation
• Registers, busses, functional units
• Application specific data type ‘vector’
Primitive functions:
• vsub()
• vabs()
Operation pattern: V ← vabs() ← vsub() ← V, V
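The behaviour described by the vadiff operation can be sketched in plain C as a reference model. This is only an illustration: the lane count and element type of ‘vector’ are not given on the slide, so the 4 x 16-bit layout and the vector_t name below are assumptions, and the arithmetic wraps rather than saturates.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumption: a 'vector' holds 4 signed 16-bit lanes.
   The slide does not state the width or element type. */
#define VEC_LANES 4

typedef struct { int16_t lane[VEC_LANES]; } vector_t;

/* Lane-wise subtraction, standing in for the vsub() primitive. */
static vector_t vsub(vector_t a, vector_t b)
{
    vector_t d;
    for (int i = 0; i < VEC_LANES; i++)
        d.lane[i] = (int16_t)(a.lane[i] - b.lane[i]);
    return d;
}

/* Lane-wise absolute value, standing in for the vabs() primitive. */
static vector_t vabs(vector_t a)
{
    vector_t r;
    for (int i = 0; i < VEC_LANES; i++)
        r.lane[i] = (int16_t)abs(a.lane[i]);
    return r;
}

/* Reference behaviour of "vadiff vt,vr,vt": V[t] = |V[r] - V[t]|.
   V models the 4-entry vector register file from the nML description. */
static void vadiff(vector_t V[4], unsigned t, unsigned r)
{
    V[t] = vabs(vsub(V[r], V[t]));
}
```

In the ASIP both primitives execute in the same pipeline stage (E1) of a single instruction, whereas this model calls them one after the other.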
Agenda
ASIPs and IP Designer
EURETILE platform
An ASIP for LQCD
EURETILE hardware platform
• Communication: DNP
• Control: RISC
• Computation: DSP
• ASIPs: specialised towards the application
  − Lattice quantum chromodynamics (LQCD)
  − Neural network (Izhikevich)
[Figure: tile block diagram showing DNP, RISC, DSP, MEM and ASIP1 blocks]
Agenda
ASIPs and IP Designer
EURETILE platform
An ASIP for LQCD
LQCD ASIP
Goals
• Increase performance
• Decrease gate count or usage of FPGA blocks
Means
• Task level parallelism (multi tile architecture)
• Data level parallelism
• Instruction level parallelism
• Architecture specialisation
LQCD ASIP
Instruction level parallelism
• VLIW instruction word: VU_1 … VU_n, LS_0 … LS_m
• Arithmetic operations in parallel with load/store operations
• Appropriate mix of n and m based on feedback from compilation of Qphi() function
• n*m speed improvement over scalar architecture
Data level parallelism
• 3-way SIMD (components c1, c2, c3) fits with SU(3) matrix algebra
• 3x speed improvement over scalar architecture
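One way to see why three lanes suit SU(3) algebra is the product of a 3x3 complex matrix with a 3-component colour vector. The sketch below is an assumption-laden illustration, not the ASIP's actual data layout: the type names, the column-wise storage and the helper vmac_scalar are hypothetical. Each call to vmac_scalar corresponds to one multiply-accumulate that a 3-way SIMD unit could issue across c1, c2, c3 at once.

```c
#include <complex.h>

/* Hypothetical types: three complex lanes (c1, c2, c3) per vector,
   and a 3x3 matrix stored column by column (an assumption). */
typedef struct { float complex c[3]; } su3_vector;
typedef struct { su3_vector col[3]; } su3_matrix;

/* One 3-lane multiply-accumulate: acc.c[i] += a.c[i] * s on every lane.
   On a 3-way SIMD unit the loop body would run on all lanes in parallel. */
static su3_vector vmac_scalar(su3_vector acc, su3_vector a, float complex s)
{
    for (int i = 0; i < 3; i++)
        acc.c[i] += a.c[i] * s;
    return acc;
}

/* y = U * x, built from three SIMD multiply-accumulates:
   y = col_0 * x_0 + col_1 * x_1 + col_2 * x_2. */
static su3_vector su3_mul_vec(const su3_matrix *u, su3_vector x)
{
    su3_vector y = {{0}};
    for (int j = 0; j < 3; j++)
        y = vmac_scalar(y, u->col[j], x.c[j]);
    return y;
}
```

Written this way, the matrix-vector product takes three vector operations instead of nine scalar complex multiply-accumulates, which matches the factor of three quoted for data level parallelism.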
LQCD ASIP
Architecture specialisation: complex floating point operations:
C + C, C + i*C → 2x speedup over scalar architecture
C – C, C – i*C
C * R → 4x speedup over scalar architecture
C * C → 8x speedup over scalar architecture
…
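The scalar work that each specialised complex instruction replaces can be spelled out in plain C. This is a minimal sketch using a hypothetical cplx struct of two floats (not taken from the slides). The comments give plain arithmetic operation counts; the speedup factors quoted above may rest on a different scalar baseline, which the slides do not detail.

```c
/* Hypothetical scalar representation of a complex number. */
typedef struct { float re, im; } cplx;

/* C + C: two real additions. */
static cplx c_add(cplx a, cplx b)
{
    return (cplx){ a.re + b.re, a.im + b.im };
}

/* C + i*C: since i*(x + i*y) = -y + i*x, this is one real
   subtraction and one real addition, with the operands of b swapped. */
static cplx c_add_i(cplx a, cplx b)
{
    return (cplx){ a.re - b.im, a.im + b.re };
}

/* C * R: two real multiplications. */
static cplx c_scale(cplx a, float r)
{
    return (cplx){ a.re * r, a.im * r };
}

/* C * C: four real multiplications, one subtraction, one addition. */
static cplx c_mul(cplx a, cplx b)
{
    return (cplx){ a.re * b.re - a.im * b.im,
                   a.re * b.im + a.im * b.re };
}
```

A functional unit that implements any of these as one instruction removes the separate real operations and the intermediate register traffic between them.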
Behaviour of floating point operations
• Defined in a C dialect intended for the modelling of functional units
• Translated into simulation and implementation (RTL) models
• Synthesis on standard cell library, mapping on FPGA primitives
Vector types and operators defined for the C compiler
vector v1, va[4], vb[4];
v1 += va[0] * vb[1];
LQCD ASIP
Architecture specialisation: address generation
• Goal: Vector units should be used every cycle, so address generation must be done in parallel
• How: to be investigated, after feedback from C compiler!
Deliverables
• SDK (Compiler, Assembler, Linker, Simulator, Debugger) based on IP Designer
• SystemC model
• RTL Model + FPGA mapping