1/60
Tsinghua
Retarget Open64 to an Embedded CPU
A practice of an automatic approach
SS&SE Group (System Software & Software Engineering)
Department of Computer Science and Technology
Tsinghua University
2/60
Outline
• Background and Motivation
• Overview of the Current Design
• Prototype System
• Perspective
• Acknowledgment
3/60
Background and Motivation
Why Open64
Not difficult to retarget manually
• Following a short procedure guideline and some guidance, one of our students completed a preliminary retarget of Open64 to PowerPC within 6 weeks
• The work is still laborious and tedious
  - changes are scattered across many places
  - wrong results caused by erroneous changes are hard for a new developer to track down
4/60
Background and Motivation
Why Open64
Good research platform for automatic retargeting and computer architecture
• The high-level IR (WHIRL) is machine independent
• The generated code is already of high quality right after retargeting
• No open-source contribution explores automatic retargeting yet, so Open64 offers a "clean" platform for research on computer architecture and ISA enhancement
5/60
Background and Motivation
Our Current Practice
Objective
• Explore a reasonable solution for automatically retargeting Open64 without changing the current CG framework
• Gain experience with a realistic new target CPU (we chose PowerPC)
• Seek more research opportunities in automatic retargeting (software engineering, machine description, etc.)
6/60
Background and Motivation
Our Current Practice
Status
• A preliminary solution to automatic retargeting (exercised with PowerPC)
  - "Overview of the Current Design" follows
• A prototype system
  - "Prototype System" is discussed later
7/60
Overview of the Current Design
Principles of the Current Design
• Keep the basic structure unchanged
• Determine the automatable parts incrementally
• Make the machine description as abstract as possible
8/60
Overview of the Current Design
Flowchart of the Code Generator
[Figure: code generator flowchart, from "Tutorial on the SGI Pro64 Compiler Infrastructure" by Gao et al., PACT 2000]
9/60
Overview of the Current Design
Targeting Pro64 to a New Processor
[Figure: from "Tutorial on the SGI Pro64 Compiler Infrastructure" by Gao et al., PACT 2000]
10/60
Overview of the Current Design
Automatic Retargeting Approach
• Automatically generate target information, including ISA information and some ABI information, from the machine description
• Automatically produce the expansion code, using the Olive tool (Steve Tjiang) as the code-generator generator
11/60
Overview of the Current Design
Machine Description
• Regular target information (ISA, ABI, etc., used to generate TARG_INFO)
• Tree patterns for WHIRL operators (used to generate Olive rules)
• Others
  - Information for other retargetable parts
  - Abstract model of processor properties (to be developed)
12/60
Overview of the Current Design
Design of Prototype System
[Figure: data flow of the prototype system]
• Machine Description → tree patterns → ParserA → framework for Olive rules → (complete manually) → Olive rules → Code Generator Generator → C source programs to perform code generation
• Machine Description → regular target information → ParserB → C source programs to collect target information
13/60
Overview of the Current Design
Regular Target Information Description
ISA information
• Registers, Operators, Operands, …
ABI information
• Calling convention, …
14/60
Overview of the Current Design
Regular Target Information Description
Example
{SECTION "architecture"
  ARCH = "PPC32";
END}
{SECTION "registers"
  ……
END}
{SECTION "operands"
  ……
END}
……
{SECTION "abi_properties"
  ……
END}
……
15/60
Overview of the Current Design
Files Produced from the Machine Description
By Regular Target Information (ParserB)
• isa_registers.cxx, isa_operands.cxx, isa_subset.cxx, isa_bundle.cxx, isa_decode.cxx, isa_enums.cxx, isa_print.cxx, isa_properties.cxx, isa_pseudo.cxx, isa_hazards.cxx, isa_lits.cxx, isa_pack.cxx, isa.cxx
  (under ../common/targ_info/isa/)
• abi_properties.cxx
  (under ../common/targ_info/abi/)
• proc_properties.cxx, proc.cxx, ppc_si.cxx (PPC specific)
  (under ../common/targ_info/proc/) /* To do */
16/60
Overview of the Current Design
Produce expanding code automatically
Olive tool
• A code-generator generator
• A follow-up to Aho, Ganapathi & Tjiang's TWIG [TOPLAS89]
• Generates a C source program that performs optimal instruction selection
  (the program implements a dynamic programming algorithm with a cost function, performing tree pattern matching and graph covering)
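The dynamic programming idea can be illustrated with a minimal sketch. It is not Olive's generated code: the three-operator IR, the two nonterminals (reg, con), the rule set and all costs below are assumptions chosen only to show how each node records the cheapest way to derive each nonterminal from the costs already computed for its children.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <memory>
#include <utility>

// Hypothetical miniature IR: constants, register leaves and additions.
enum Op { OP_CONST, OP_REG, OP_ADD };

struct Node {
  Op op;
  std::unique_ptr<Node> kid[2];
  int cost_reg = INT_MAX;  // cheapest cover that derives nonterminal "reg"
  int cost_con = INT_MAX;  // cheapest cover that derives nonterminal "con"
};

std::unique_ptr<Node> leaf(Op op) {
  auto n = std::make_unique<Node>();
  n->op = op;
  return n;
}

std::unique_ptr<Node> add(std::unique_ptr<Node> a, std::unique_ptr<Node> b) {
  auto n = std::make_unique<Node>();
  n->op = OP_ADD;
  n->kid[0] = std::move(a);
  n->kid[1] = std::move(b);
  return n;
}

// Addition that keeps "no cover exists" (INT_MAX) sticky.
static int sat_add(int a, int b) {
  return (a == INT_MAX || b == INT_MAX) ? INT_MAX : a + b;
}

// Bottom-up labelling pass; returns the cheapest cost of getting the
// value of n into a register. Rules and costs are illustrative only.
int label(Node& n) {
  for (auto& k : n.kid)
    if (k) label(*k);
  switch (n.op) {
    case OP_CONST:
      n.cost_con = 0;  // con : CONST           cost 0
      n.cost_reg = 1;  // reg : con             cost 1 (load immediate)
      break;
    case OP_REG:
      n.cost_reg = 0;  // reg : REG             cost 0
      break;
    case OP_ADD: {
      assert(n.kid[0] && n.kid[1]);
      // reg : ADD(reg, reg)   cost 1 + kids
      int rr = sat_add(1, sat_add(n.kid[0]->cost_reg, n.kid[1]->cost_reg));
      // reg : ADD(reg, con)   cost 1 + kids (immediate form)
      int rc = sat_add(1, sat_add(n.kid[0]->cost_reg, n.kid[1]->cost_con));
      n.cost_reg = std::min(rr, rc);
      break;
    }
  }
  return n.cost_reg;
}
```

For ADD(REG, CONST) the immediate rule wins with cost 1, whereas forcing the constant into a register first would cost 2. A real matcher would also remember which rule produced each minimum so that a second pass can emit the instructions.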
17/60
Overview of the Current Design
Produce expanding code automatically
Grammar for Olive Rules
rule      → nonterm tree [cost] action
tree      → term ( tree_list ) | term | nonterm
tree_list → tree_list , child | child
child     → tree
cost      → C-code | C-expr
action    → C-code
18/60
Overview of the Current Design
Produce expanding code automatically
Expand WHIRL to TOP
• The expander is produced by Olive
• A VL-WHIRL tree is input to the expander (Very Low WHIRL, in which some registers are exposed)
• The expander produces a TOP instruction sequence that is semantically equivalent to the input WHIRL tree (TOP is the CGIR-level abstraction)
19/60
Overview of the Current Design
Produce expanding code automatically
Expand WHIRL to TOP
• Only expressions are expanded in the current design
• Why not expand the whole tree?
  It is a trade-off between the benefit, the proportion of the original CG structure that must change, and how easy the Olive rules are to write
• To be investigated further in the future
20/60
Overview of the Current Design
Produce expanding code automatically
2-Stage Editing for Olive rules
• Stage 1: Write an abstract description of the Olive rules (tree patterns), from which the framework used in the next stage is produced
• Stage 2: Fill in the incomplete Olive rules in the framework description for the specific target
21/60
Overview of the Current Design
2-Stage Editing for Olive rules
Stage 1
• Ex. 1: Abstract description of a special Olive rule
#reg : I4ADD(reg, reg)
(
  I4ADD res(0, reg, int32); src(0, reg, int32); src(1, reg, int32);
  => "add res(0) src(0) src(1)" 1
)
The trailing 1 is the cost (count of cycles)
22/60
Overview of the Current Design
2-Stage Editing for Olive rules
Framework Description Produced by ParserA
• Ex. 1: An Olive rule automatically produced from the special Olive rule above
reg : I4ADD(reg, reg)
{
  $cost[0].cost = 1 + $cost[2].cost + $cost[3].cost;
}
= {
  $action[2](ops);
  $action[3](ops);
  $0->result = Build_TN_Of_Mtype(WN_rtype($0->wn));
  Build_OP(TOP_add, $0->result, $2->result, $3->result, ops);
}
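The step from an abstract description to a complete Olive rule is essentially template expansion. The sketch below is not ParserA itself: the AbstractRule struct and emit_olive_rule function are hypothetical names, and the sketch assumes a flat pattern whose children are all nonterminals, referenced as $2, $3, … as in the example rule.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory form of one "special" abstract rule.
struct AbstractRule {
  std::string nonterm;            // result nonterminal, e.g. "reg"
  std::string opcode;             // WHIRL opcode, e.g. "I4ADD"
  std::vector<std::string> kids;  // child nonterminals, e.g. {"reg", "reg"}
  std::string top;                // target mnemonic, e.g. "add"
  int cost;                       // cost of the instruction itself
};

// Expand the abstract rule into Olive rule text (cost part and action part).
std::string emit_olive_rule(const AbstractRule& r) {
  std::string pattern = r.nonterm + " : " + r.opcode + "(";
  std::string cost = "$cost[0].cost = " + std::to_string(r.cost);
  std::string actions;
  std::string operands;
  for (std::size_t i = 0; i < r.kids.size(); ++i) {
    const std::string k = std::to_string(i + 2);  // kid i is $(i+2)
    pattern += (i ? ", " : "") + r.kids[i];
    cost += " + $cost[" + k + "].cost";
    actions += "  $action[" + k + "](ops);\n";
    operands += ", $" + k + "->result";
  }
  pattern += ")";
  return pattern + "\n{\n  " + cost + ";\n}\n= {\n" + actions +
         "  $0->result = Build_TN_Of_Mtype(WN_rtype($0->wn));\n"
         "  Build_OP(TOP_" + r.top + ", $0->result" + operands + ", ops);\n}\n";
}
```

Calling emit_olive_rule with {"reg", "I4ADD", {"reg", "reg"}, "add", 1} yields text of the same shape as the I4ADD rule shown on the slide. Rules whose expansion needs more than one instruction would fall outside this template and be completed by hand in stage 2.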
23/60
Overview of the Current Design
2-Stage Editing for Olive rules
Stage 1
• Ex. 2: Abstract description of a general Olive rule (for PowerPC)
# reg : I4F8TRUNC(f8reg) (=>)
24/60
Overview of the Current Design
2-Stage Editing for Olive rules
Framework Description Produced by ParserA
• Ex. 2: An Olive rule automatically produced from the general Olive rule above (the produced rule is incomplete)
reg : I4F8TRUNC(f8reg)
{
}
= {
}
25/60
Overview of the Current Design
2-Stage Editing for Olive rules
Stage 2
• Complete the incomplete Olive rules
26/60
Overview of the Current Design
Files Produced from the Machine Description
By Olive Rules
• Update Expand_Expr()
  (in ../be/cg/whirl2ops.cxx)
• Replace expand.cxx, exp_loadstore.cxx, exp_divrem.cxx, exp_branch.cxx, etc.
  (under ../be/cg/ppc32/, where ppc32 is target specific)
27/60
Prototype System
Prototype for Retargeting to PowerPC
Connect the Machine Description
• Get the regular target information from the machine description and distribute it into the source tree (in the proper form)
Expand WHIRL to TOP
• The expander is produced automatically by the Olive tool, which takes the target-specific Olive rules as input
28/60
Prototype System
Description for Regular Target Information
ISA and ABI Information
• The syntax definition directly reflects the data organization in the source code where this information is processed
• To be further improved in the future
Connecting to the Compiler
• The parser, produced by YACC, translates this information into C programs, which are then connected to the compiler through the Makefile
29/60
Prototype System
Examples: Target Information Description
ISA and ABI Information
{SECTION "architecture"
  ARCH = "PPC32";
END}
{SECTION "isa_list"
  isa = add,
        add_i, adds, addl, …
END}
30/60
Prototype System
Examples: Target Information Description
ISA and ABI Information
{SECTION "operand"
  # name = size, type, lit_class
  Literal_Type = {
    simm16 = 16, SIGNED, LC_simm16;
    uimm16 = 16, UNSIGNED, LC_uimm16;
    uimm5 = 5, UNSIGNED, LC_uimm5;
  }
  Register_Type = { …… }
  Enum_Type = { …… }
  Use_Type = { …… }
  Instruction_Group = { …… }
END}
31/60
Prototype System
Examples: Target Information Description
ISA and ABI Information
{SECTION "registers"
  # registers definition
  # isa_register_class definition
  NAME = "integer", BIT_SIZE = 32,
  CAN_STORE = true, MULTIPLE_SAVE = false;
  ……
  # isa_register_set definition
  RCLASS = rc_integer, MIN_REGNUM = 0, MAX_REGNUM = 31,
  ……
END}
32/60
Prototype System
Examples: Target Information Description
ISA and ABI Information
{SECTION "abi_properties"
  # ABI properties definition
  (integer, ABI_PROPERTY) = {
    {……}; # list of integer registers
    (REG_LOW_BOUND, REG_UPPER_BOUND) = (0, 31);
    ALLOCATABLE(0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 31, -1)
    CALLEE(1, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, -1)
    CALLER(0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, -1)
    FUNC_ARG(3, 4, 5, 6, 7, 8, 9, 10, -1)
    FUNC_VAL(3, 4, -1)
    STACK_PTR(1, -1)
    FRAME_PTR(1, -1)
    GLOBAL_PTR(13, -1)
  }
  (float, ABI_PROPERTY) = { … }
  …
END}
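One way a tool could consume the -1-terminated register lists is to fold each list into a bit set and run simple consistency checks. The sketch below only illustrates that idea under assumptions: reg_set, disjoint, subset and the *_SET constants are hypothetical names, not part of Open64 or of the prototype parser, and the lists are copied from the integer ABI_PROPERTY example.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: fold register numbers into a 32-bit set.
// A negative entry (-1 in the description) terminates the list.
std::uint32_t reg_set(std::initializer_list<int> regs) {
  std::uint32_t set = 0;
  for (int r : regs) {
    if (r < 0) break;          // list terminator
    if (r >= 32) continue;     // outside (REG_LOW_BOUND, REG_UPPER_BOUND)
    set |= (std::uint32_t{1} << r);
  }
  return set;
}

bool disjoint(std::uint32_t a, std::uint32_t b) { return (a & b) == 0; }
bool subset(std::uint32_t a, std::uint32_t b) { return (a & ~b) == 0; }

// Lists taken from the integer ABI_PROPERTY example.
const std::uint32_t CALLEE_SET = reg_set({1, 14, 15, 16, 17, 18, 19, 20, 21, 22,
                                          23, 24, 25, 26, 27, 28, 29, 30, 31, -1});
const std::uint32_t CALLER_SET = reg_set({0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, -1});
const std::uint32_t FUNC_ARG_SET = reg_set({3, 4, 5, 6, 7, 8, 9, 10, -1});
const std::uint32_t FUNC_VAL_SET = reg_set({3, 4, -1});
```

With these sets, a check such as disjoint(CALLER_SET, CALLEE_SET) or subset(FUNC_ARG_SET, CALLER_SET) can catch a mistyped register number in the description before it reaches the generated abi_properties.cxx.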
33/60
Prototype System
Expand WHIRL to TOP
Interface to Olive
Example Rules Specific to PowerPC
34/60
Prototype System
Interface to Olive
Costs
typedef struct COST { int cost; } COST;
static COST COST_INFINITY = { MAX_INT16 };
static COST COST_ZERO = { 0 };
#define COST_LESS(x,y) ((x).cost < (y).cost)
35/60
Prototype System
Interface to Olive
Trees
typedef struct burm_state * STATE;
typedef struct olive_node * NODEPTR;
typedef struct olive_node * TREE;
#define GET_KIDS(r) ((r)->get_kids())
#define OP_LABEL(r) ((r)->op_label())
#define STATE_LABEL(r) ((r)->state_label())
#define SET_STATE(r,s) (r)->set_state(s)
36/60
Prototype System
Interface to Olive
Tree Nodes
struct olive_node {
  OPCODE opcode;
  OPERATOR opr;
  TOP top;
  int num_opnds;
  WN * wn;
  WN * parent;
  INTRINSIC intrn_id;
  TN * result;
  TN * opnd_tn[OP_MAX_FIXED_OPNDS];
  NODEPTR kids[OP_MAX_FIXED_OPNDS];
  STATE state;
  int opc;
  olive_node(WN * w, WN * p, TN * res, INTRINSIC iid);
  virtual ~olive_node();
  void set_state(STATE s) { state = s; }
  STATE state_label() { return state; }
  NODEPTR* get_kids() { return kids; }
  int op_label() { return opc; }
  void Print() { /* printf("WN\n%s\n", dump_wn(wn)); */ }
};
37/60
Prototype System
Example Rules Specific to PowerPC
Classification of PowerPC Operators
• Integer (arithmetic/compare/logical/rotate/shift)
• Floating-point (arithmetic/multiply-add/rounding and conversion/compare/status and control register/move)
• Load/Store (integer/floating-point/integer byte-reverse/integer multiple/string)
• Branch (unconditional/conditional/conditional to LR/conditional to CTR)
• Misc (system call/trap/condition register logical)
38/60
Prototype System
Example Rules Specific to PowerPC
Load/Store
reg : I4I4LDID // integer load
{
  $cost[0].cost = 3; // cycles
}
= {
  $0->result = Build_TN_Of_Mtype(WN_rtype($0->wn));
  Handle_Load($0->wn, $0->result, TOP_lwz, ops);
}
static TN * Handle_Load(WN *, TN *, TOP, OPS *);
39/60
Prototype System
Example Rules Specific to PowerPC
Load/Store
null : I4STID(reg) // integer store
{
  $cost[0].cost = 3 + $cost[2].cost;
}
= {
  $action[2](ops);
  $0->result = $2->result;
  Handle_Store($0->wn, $0->result, TOP_stw, ops);
}
static void Handle_Store(WN *, TN *, TOP, OPS *);
40/60
Prototype System
Example Rules Specific to PowerPC
Load/Store
f4reg : F4F4LDID // floating-point load
{
  $cost[0].cost = 4;
}
= {
  $0->result = Build_TN_Of_Mtype(WN_rtype($0->wn));
  Handle_Float_Load($0->wn, $0->result, TOP_lfs, ops);
}
static TN * Handle_Float_Load(WN *, TN *, TOP, OPS *);
41/60
Prototype System
Example Rules Specific to PowerPC
Load/Store
null : F4STID(f4reg) // floating-point store
{
  $cost[0].cost = 4;
}
= {
  $action[2](ops);
  $0->result = $2->result;
  Handle_Float_Store($0->wn, $0->result, TOP_stfs, ops);
}
static void Handle_Float_Store(WN *, TN *, TOP, OPS *);
42/60
Prototype System
Example Rules Specific to PowerPC
Call
null : I4CALL
{
  $cost[0].cost = 2;
}
= {
  Handle_Call_Site($0->wn, $0->opr);
};
static void Handle_Call_Site(WN *, OPERATOR);
43/60
Prototype System
Example Rules Specific to PowerPC
Addition
reg : I4ADD(reg, reg)
{
  $cost[0].cost = 1 + $cost[2].cost + $cost[3].cost;
}
= {
  $action[2](ops);
  $action[3](ops);
  $0->result = Build_TN_Of_Mtype(WN_rtype($0->wn));
  Build_OP(TOP_add, $0->result, $2->result, $3->result, ops);
}
44/60
Prototype System
Example Rules Specific to PowerPC
Addition of Immediate
const : I4INTCONST
{
  $cost[0].cost = 0;
}
= {
  $0 = $1;
};
reg : I4ADD(reg, const) // small immediate
{
  if (!(ISA_LC_Value_In_Class(WN_const_val($3->wn), LC_simm16)))
    return 0;
  $cost[0].cost = 1 + $cost[2].cost;
}
= {
  $action[2](ops);
  $action[3](ops);
  $0->result = Build_TN_Of_Mtype(WN_rtype($0->wn));
  Build_OP(TOP_addi, $0->result, $2->result, Gen_Literal_TN(WN_const_val($3->wn), 4), ops);
};
45/60
Prototype System
Example Rules Specific to PowerPC
Addition of Immediate (continued)
reg : I4ADD(reg, const) // big immediate
{
  $cost[0].cost = 2 + $cost[2].cost;
}
= {
  $action[2](ops);
  $action[3](ops);
  $0->result = Build_TN_Of_Mtype(WN_rtype($0->wn));
  INT64 val = WN_const_val($3->wn);
  short lo = (short)(val & 0xffff);    // addi sign-extends this half
  short hi = (short)((val - lo) >> 16); // high half, adjusted for the sign of lo
  Build_OP(TOP_addi, $0->result, $2->result, Gen_Literal_TN(lo, 4), ops);
  // the second add takes the partial sum as its source, not $2->result
  Build_OP(TOP_addis, $0->result, $0->result, Gen_Literal_TN(hi, 4), ops);
};
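The arithmetic behind the small/big immediate split can be checked in isolation. PowerPC addi adds a sign-extended 16-bit immediate and addis adds a 16-bit immediate shifted left by 16, so a 32-bit constant has to be split into halves whose sum, after sign extension of the low half, reproduces the constant. The helper names below (fits_simm16, split_imm32, recombine) are hypothetical and only model that arithmetic; they are not Open64 functions.

```cpp
#include <cassert>
#include <cstdint>

// True when v fits a signed 16-bit immediate (the range of the
// simm16 literal class in the machine description).
bool fits_simm16(std::int64_t v) { return v >= -32768 && v <= 32767; }

struct HiLo {
  std::int16_t hi;  // immediate for addis
  std::int16_t lo;  // immediate for addi
};

// Split v so that (hi << 16) + sign_extend(lo) == v modulo 2^32.
HiLo split_imm32(std::int32_t v) {
  std::int16_t lo = static_cast<std::int16_t>(v & 0xffff);
  // Subtracting lo first compensates for its sign extension: when lo is
  // negative, the high half must be one larger than v >> 16.
  std::int16_t hi =
      static_cast<std::int16_t>((static_cast<std::int64_t>(v) - lo) >> 16);
  return {hi, lo};
}

// Recombine the halves the way an addis/addi pair would, with 32-bit wrap.
std::int32_t recombine(HiLo p) {
  std::uint32_t high = static_cast<std::uint32_t>(static_cast<std::int32_t>(p.hi)) << 16;
  std::uint32_t low = static_cast<std::uint32_t>(static_cast<std::int32_t>(p.lo));
  return static_cast<std::int32_t>(high + low);
}
```

For 0x12348000 the low half is 0x8000, which sign-extends to -32768, so the high half becomes 0x1235 rather than 0x1234. A plain v >> 16 would leave the result short by 0x10000 in that case.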
46/60
Prototype System
Example Rules Specific to PowerPC
Floating-Point Arithmetic (multiply-add)
f4reg : F4MADD(f4reg, f4reg, f4reg)
{
  $cost[0].cost = 5 + $cost[2].cost + $cost[3].cost + $cost[4].cost;
}
= {
  $action[2](ops);
  $action[3](ops);
  $action[4](ops);
  $0->result = Build_TN_Of_Mtype(WN_rtype($0->wn));
  // multiply-add takes all three operands; their order must follow the TOP definition
  Build_OP(TOP_fmadds, $0->result, $2->result, $3->result, $4->result, ops);
}
47/60
Prototype System
Example Rules Specific to PowerPC
Floating-Point Rounding and Conversion
reg : I4F8TRUNC(f8reg)
{
  $cost[0].cost = 11 + $cost[2].cost;
}
= {
  $action[2](ops);
  TN* tmp_tn = Build_TN_Of_Mtype(MTYPE_F8);
  Build_OP(TOP_fctiwz, tmp_tn, $2->result, ops);
  ST * tmp_sym = CGSPILL_Get_TN_Spill_Location(tmp_tn, CGSPILL_LRA);
  INT64 ofst = TN_offset(tmp_tn);
  ST* base_sym;
  INT64 base_ofst;
  Base_Symbol_And_Offset_For_Addressing(tmp_sym, 0, &base_sym, &base_ofst);
  Build_OP(TOP_stfd, tmp_tn, FP_TN, Gen_Literal_TN(base_ofst, 4), ops);
  $0->result = Build_TN_Of_Mtype(WN_rtype($0->wn));
  Build_OP(TOP_lwz, $0->result, FP_TN, Gen_Literal_TN(base_ofst + 4, 4), ops);
}
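The reason the integer is reloaded from base_ofst + 4 can be modelled on the host. fctiwz leaves the truncated 32-bit integer in the low word of the 64-bit floating-point register, stfd stores all 8 bytes, and on a big-endian target the low word sits in the last 4 bytes of the slot. The function below is a hypothetical host-side model of that sequence, assuming a big-endian target and an input within the int32 range; it is not code from the prototype.

```cpp
#include <cassert>
#include <cstdint>

// Model of: fctiwz tmp, src ; stfd tmp, ofs(fp) ; lwz dst, ofs+4(fp)
std::int32_t trunc_via_stack_slot(double d) {
  // fctiwz: convert to a 32-bit integer, rounding toward zero.
  // (Assumes d is within the int32 range.)
  std::int32_t i = static_cast<std::int32_t>(d);

  // The result occupies the low 32 bits of the 64-bit register image.
  std::uint64_t fpr = static_cast<std::uint32_t>(i);

  // stfd: store the 8-byte register image, most significant byte first.
  unsigned char slot[8];
  for (int k = 0; k < 8; ++k)
    slot[k] = static_cast<unsigned char>((fpr >> (56 - 8 * k)) & 0xff);

  // lwz at offset +4: read the last 4 bytes as a big-endian word.
  std::uint32_t w = (static_cast<std::uint32_t>(slot[4]) << 24) |
                    (static_cast<std::uint32_t>(slot[5]) << 16) |
                    (static_cast<std::uint32_t>(slot[6]) << 8) |
                    static_cast<std::uint32_t>(slot[7]);
  return static_cast<std::int32_t>(w);
}
```

Reading at offset +0 instead would return the upper word of the register image, which in this model is zero and on real hardware is not defined to hold the result.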
48/60
Prototype System
Example Rules Specific to PowerPC
Conditional Branch
reg : I4F4GT(f4reg, f4reg)
{
  $cost[0].cost = 7 + $cost[2].cost + $cost[3].cost;
}
= {
  $action[2](ops);
  $action[3](ops);
  $0->result = Build_TN_Of_Mtype(WN_rtype($0->wn));
  Handle_Cond_Branch(TOP_bgt, TOP_fcmpu, $0->result, $2->result, $3->result, ops);
}
static void Handle_Cond_Branch(TOP, TOP, TN *, TN *, TN *, OPS *);
49/60
Prototype System
Example Rules Specific to PowerPC
Conditional Branch
static void Expand_Cond(TOP top_branch, TOP top_cmp, TN *dest, TN *src1, TN *src2, OPS *ops)
/* Expand_Cond is an auxiliary function shared by the compare operators */
/* For example */
reg : I4F4NE(f4reg, f4reg) uses Expand_Cond(TOP_bne, …)
reg : I4F4GT(f4reg, f4reg) uses Expand_Cond(TOP_bgt, …)
reg : I4F4EQ(f4reg, f4reg) uses Expand_Cond(TOP_beq, …)
reg : I4F4GE(f4reg, f4reg) uses Expand_Cond(TOP_bge, …)
reg : I4F4LE(f4reg, f4reg) uses Expand_Cond(TOP_ble, …)
reg : I4F4LT(f4reg, f4reg) uses Expand_Cond(TOP_blt, …)
50/60
Prototype System
Example Rules Specific to PowerPC
Condition Move
reg : I4I4GT(reg, reg)
{
  $cost[0].cost = 3 + $cost[2].cost + $cost[3].cost;
}
= {
  $action[2](ops);
  $action[3](ops);
  $0->result = Build_TN_Of_Mtype(WN_rtype($0->wn));
  Handle_Cond_Move(OPR_GT, TOP_cmpw, $0->result, $2->result, $3->result, ops);
}
static void Handle_Cond_Move(OPERATOR, TOP, TN *, TN *, TN *, OPS *);
51/60
Prototype System
Example Rules Specific to PowerPC
Condition Move
reg : I4I4GE(reg, reg)
{
  $cost[0].cost = 4 + $cost[2].cost + $cost[3].cost;
}
= {
  $action[2](ops);
  $action[3](ops);
  $0->result = Build_TN_Of_Mtype(WN_rtype($0->wn));
  Handle_Cond_Move(OPR_GE, TOP_cmpw, $0->result, $2->result, $3->result, ops);
}
static void Handle_Cond_Move(OPERATOR, TOP, TN *, TN *, TN *, OPS *);
52/60
Prototype System
Example Rules Specific to PowerPC
Condition Move
static void Handle_Cond_Move(OPERATOR opr, TOP top_cmp, TN *dest, TN *src1, TN *src2, OPS *ops)
{
  Build_OP(top_cmp, Gen_Literal_TN(0, 3), src1, src2, ops);
  Build_OP(TOP_mfcr, dest, ops);
  switch (opr) {
    case OPR_GT:
      Build_OP(TOP_rlwinm, dest, dest, Gen_Literal_TN(2, 5), Gen_Literal_TN(31, 5), Gen_Literal_TN(31, 5), ops);
      break;
    case OPR_GE:
      ……
  }
}
53/60
Prototype System
Example Rules Specific to PowerPC
Condition Move
static void Handle_Cond_Move(OPERATOR opr, TOP top_cmp, TN *dest, TN *src1, TN *src2, OPS *ops)
{
  ……
    case OPR_GT:
      ……
    case OPR_GE:
      Build_OP(TOP_rlwinm, dest, dest, Gen_Literal_TN(1, 5), Gen_Literal_TN(31, 5), Gen_Literal_TN(31, 5), ops);
      Build_OP(TOP_xori, dest, dest, Gen_Literal_TN(1, 16), ops);
      break;
    case OPR_EQ:
      ……
    case OPR_NE:
      ……
    case OPR_LE:
      ……
    case OPR_LT:
      ……
  }
}
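The cmpw / mfcr / rlwinm sequence can be traced with a small host-side model. After cmpw on CR field 0, mfcr copies the condition register so that LT, GT and EQ land in the three most significant bits of the destination; rlwinm then rotates the wanted bit down to the least significant position and masks everything else. The functions below are a hypothetical model written for illustration (names such as cmpw_then_mfcr are not from the prototype); it fills in only CR field 0 and ignores the SO bit.

```cpp
#include <cassert>
#include <cstdint>

static std::uint32_t rotl32(std::uint32_t x, unsigned n) {
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

// Mask with ones from bit mb through bit me, using PowerPC bit numbering
// (bit 0 is the most significant bit). Assumes mb <= me.
static std::uint32_t ppc_mask(unsigned mb, unsigned me) {
  return (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31 - me));
}

// rlwinm rA, rS, SH, MB, ME: rotate left, then AND with the mask.
std::uint32_t rlwinm(std::uint32_t rs, unsigned sh, unsigned mb, unsigned me) {
  return rotl32(rs, sh) & ppc_mask(mb, me);
}

// Model of "cmpw 0, a, b" followed by "mfcr rD": CR field 0 holds
// LT, GT, EQ, SO in the top four bits of the result.
std::uint32_t cmpw_then_mfcr(std::int32_t a, std::int32_t b) {
  std::uint32_t cr0 = (a < b) ? 0x8u : (a > b) ? 0x4u : 0x2u;
  return cr0 << 28;
}

// OPR_GT case: rlwinm dest, dest, 2, 31, 31 extracts the GT bit.
int cond_gt(std::int32_t a, std::int32_t b) {
  return static_cast<int>(rlwinm(cmpw_then_mfcr(a, b), 2, 31, 31));
}

// OPR_GE case: rlwinm dest, dest, 1, 31, 31 extracts LT, xori 1 inverts it.
int cond_ge(std::int32_t a, std::int32_t b) {
  return static_cast<int>(rlwinm(cmpw_then_mfcr(a, b), 1, 31, 31) ^ 1u);
}
```

The rotate amounts 2 and 1 follow from the bit positions: GT is bit 1 and LT is bit 0 in PowerPC numbering, and rotating left by n moves bit i to bit i - n modulo 32, so both end up at bit 31, the least significant bit.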
54/60
Prototype System
Result Samples
Condition Move
int main()
{
  int a = 1;
  int b = 2;
  int c = (a > b);
}
-O0, showing only instruction selection (IS)
  stwu 1,-48(1)        #
  mflr 0               #
  stw 0,52(1)          # lcl_spill_temp_0
  li 7, 1              #
  stw 7,8(1)           # a
  li 6, 2              #
  stw 6,12(1)          # b
  lwz 4,8(1)           # a
  lwz 5,12(1)          # b
  cmpw 0,4,5           #
  mfcr 3               #
  rlwinm 3,3,2,31,31   #
  stw 3,16(1)          # c
  mtlr 0               #
  addi 1,1,48          #
  blr                  #
55/60
Prototype System
Result Samples
Conditional Branch
int foo(int a, int b, int c, int d)
{
  return ((a > b) && (c > d));
}
// -O0, home locations kept for debugging
.BB1_foo:
  stwu 1,-48(1)        #
  mflr 9               #
  stw 9,52(1)          # lcl_spill_temp_0
  mr 8,3               #
  stw 8,8(1)           # a
  mr 7,4               #
  stw 7,12(1)          # b
  mr 5,5               #
  stw 5,16(1)          # c
  mr 4,6               #
  stw 4,20(1)          # d
  lwz 0,8(1)           # a
  lwz 3,12(1)          # b
  cmplw 0,0,3          #
  ble 0,.L_0_5         #
.BB2_foo:
  lwz 10,16(1)         # c
  lwz 11,20(1)         # d
  cmplw 0,10,11        #
  ble 0,.L_0_5         #
.L_0_6:
  li 0, 1              #
  li 12, 1             #
  b .L_0_4             #
.L_0_5:
  li 0, 0              #
  li 0, 0              #
.L_0_4:
  lwz 0,28(1)          # lcl_spill_temp_1
  mr 3,0               #
  mr 0,0               #
  lwz 0,52(1)          # lcl_spill_temp_0
  mtlr 0               #
  addi 1,1,48          #
  blr                  #
56/60
Perspective
From Prototype to Available
Complete Open64 Compiler for PowerPC
• Estimated time: July 2007
• Testing: Specs, MPC7450 development board
Sophisticated Machine Description
• More abstract and compact syntax
• Improve the parser to process the necessary semantic information
57/60
Perspective
From Prototype to Available
Exploit more Automatable Aspects for Retarget
• Resource usage
• Delay slots
• Debugging info
• ……
58/60
Perspective
Research on Retargetability
Difficult Aspects of Retarget
• Processor properties (multi-threads, multi-cores, etc.)
• Data prefetch
More Powerful Machine Description Language
• Petri net based approach
New Back End Architecture to Improve Retargetability
New Software Engineering Methodology
59/60
Acknowledgment
Team at Tsinghua
Students
Zhenyang Yu
Ming Lin
Duo Zhang
Yunmin Zhu
Professors
Shengyuan Wang
Yuan Dong
60/60
Thank You
Your suggestions are welcome