
Effective Compilation Support for Variable Instruction Set Architecture

Date post: 05-Jan-2016
Description: PowerPoint presentation by Jack Liu, Timothy Kong, and Fred Chow, Cognigine Corp. (www.cognigine.com).

Effective Compilation Support for Variable Instruction Set Architecture

Jack Liu, Timothy Kong, Fred Chow
Cognigine Corp.
www.cognigine.com


Outline

1. VISC Architecture

2. Compile-time Configurable Code Generation

3. Managing the Dictionary

4. Concluding Remarks


Configurable Computing

Motivation
• Higher performance
  – processor and instruction set customized to type of application
• Lower hardware cost
  – non-essential features excluded
• Shorter time-to-market


Variable Instruction Set Architecture (VISC Architecture™)

A new approach to configurable computing:

• Fixed processor hardware

• Many types of operations provided

• Numerous instruction variants (CISC-style)

• Per-program instruction set tailoring during compile time


Background of this work

Cognigine CGN16100 Network Processor
• Single-chip, fully programmable network processor
• Processing cores: 16 Re-configurable Communications Units (RCU) processor cores
• VISC architecture
• 4 64-bit parallel execution units
• Multi-threaded
• 512 KB on-chip memory (text and data)


VISC Architecture™

[Figure: an instruction and the dictionary, which holds the instruction set for the current program.]

• Instruction: opcode opnd0 opnd1 opnd2 opnd3
• Opcode: 8-bit
• Dictionary: 256 entries
• Dictionary entry: 32-bit: 2 operations; 64-bit: 4 operations; 128-bit: 8 operations
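The lookup this slide describes can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the names `DictionaryEntry`, `Instruction`, and `decode` are hypothetical, and an entry is reduced to a tuple of operation names, whereas a real entry also records operation variants and data paths.

```python
from dataclasses import dataclass
from typing import List, Tuple

DICT_SIZE = 256  # an 8-bit opcode can select one of 256 dictionary entries


@dataclass(frozen=True)
class DictionaryEntry:
    # Hypothetical simplification: only the operation names are modeled.
    operations: Tuple[str, ...]


@dataclass(frozen=True)
class Instruction:
    opcode: int                # 8-bit index into the dictionary
    operands: Tuple[str, ...]  # opnd0 .. opnd3


def decode(instr: Instruction, dictionary: List[DictionaryEntry]):
    """Return the operations selected by the opcode plus the actual operands."""
    if not 0 <= instr.opcode < DICT_SIZE:
        raise ValueError("opcode must fit in 8 bits")
    entry = dictionary[instr.opcode]
    return list(entry.operations), list(instr.operands)


# Build a dictionary whose entry 3 bundles four operations.
dictionary = [DictionaryEntry(("nop",))] * DICT_SIZE
dictionary[3] = DictionaryEntry(("add", "xor", "sub", "nop"))

ops, opnds = decode(Instruction(3, ("8($p1)", "$w70", "0x55", "0xf8")), dictionary)
```

The point of the design is that the operation bundle is stored once in the dictionary, so each instruction only pays 8 bits to name it.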


Motivation for VISC Architecture

1. Efficient way to encode/decode the many operation variants with different addressing modes

• Not all used in each program

2. High instruction encoding density

• Small opcode bit count

• Operands shared among multiple operations

3. Simplified control logic for VLIW-style ILP

• Up to 8 operations per cycle


Operation Specification

In Dictionary Entry (only specified once):
1. Operation name
2. Operation variants:
   • Signed and unsigned
   • Operand and result sizes: 8-bit, 16-bit, 32-bit, 64-bit
   • Support different sizes among operand(s) or result
   • Vector: 64v8, 64v16, 64v32, 32v8, 32v16
3. Data path to each operand/result

In Instruction:
1. Operands' encoding formats
2. Actual operands


RCU Architecture
• 5-stage pipeline
• 4-way multi-threaded
• Hardware RSF synchronization
• 128-bit reconfigurable address path
• 256-bit reconfigurable data path

[Figure: RCU block diagram. Labeled blocks: four Execution Units, four Source Route blocks, Pointer File, Dictionary, Dictionary Decode, Instruction Cache, Address Calculation, Pipeline & Thread Control, Registers, Scratch Memory, Packet Buffers, Data Memory, Data Flow Synchronization, RSF Connector, and "back-side" ports to the RSF. Path widths shown: 64, 128, and 256 bits.]


Roles of Compiler for VISC Architecture

1. Determine the instruction set to store in the dictionary for best execution-time performance

2. Generate optimized code sequence based on best instruction set

3. Cater to various hardware limitations:

• Dictionary limit

• Data path constraints

• Dictionary and Instruction encoding constraints


New Compilation Approach: Configurable Code Generation

• Exact form of generated instructions decided in the last instruction scheduling phase

• Direct result of instruction compaction based on what is allowed by the hardware


Compiler Implementation Method

• Retarget SGI Pro64 (Open64) compiler to an Abstract Machine

• Code generator operates on an Abstract Operation Representation
  – Code generation optimizations left intact

• Add new Instruction and Dictionary Finalization (IDF) phase as post-pass
  IDF Phase 1:
  – Instruction scheduling and folding
  – Abstract operations converted to target code sequence
  IDF Phase 2:
  – Output VISC instructions and dictionary entries


Compiler Phase Structure

[Figure: compiler phase flow. C → GNU / Pro64™ Front-end → WHIRL Optimizer → Code Generator → IDF → Assembly Program (Instructions, Dictionary). The figure also carries the label Pro64™ Back-end.]


Abstract Operation Representation (AOR)

Each operation corresponds to a micro-operation in the core execution units

• RISC-like formats
  – r1 = op r2, r3
  – r2 = load <offset>(<base>)
  – store r2 <offset>(<base>)
  – r1 = loadimm <imm>
• Optimizations in AOR reflected in final code
• No pre-disposition of compiler to any specific instruction format


Multiple AOR ops can be combined to single target operation

Operations taking immediate operand
    r2 = move <imm>
    r3 = add r1, r2        =>   r3 = addi r1 <imm>

Operations supporting memory operands
    r2 = load 4(sp)
    r3 = add r1, r2        =>   r3 = add r1 4(sp)

Post incre/decre memory operations
    r2 = load 0(r1)
    r1 = addi r1, 4        =>   r2 = load 0(r1++)

Branches on condition codes
    r1 = add r2, r3
    . . .
    compare (r1 != 0)      =>   r1 = add r2, r3
    br.z label                  br.z label (only if immediately after)

Others
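The first two folding patterns on this slide can be sketched as a small peephole pass. This is a minimal sketch, not the IDF implementation: the tuple encoding of operations and the `fold` function are hypothetical, only adjacent pairs are folded, and the temporary register is assumed to be dead after the basic block.

```python
from typing import List, Tuple

# Hypothetical encoding of an abstract operation: (dest, opname, sources)
Op = Tuple[str, str, Tuple[str, ...]]


def use_count(ops: List[Op], reg: str) -> int:
    """Count how many times reg appears as a source in the block."""
    return sum(src == reg for _, _, srcs in ops for src in srcs)


def fold(ops: List[Op]) -> List[Op]:
    """Fold 'tmp = move imm' or 'tmp = load mem' into the add that follows it.

    Assumption: tmp is not live after the block, so a single use in the
    block means the defining operation can be subsumed.
    """
    out: List[Op] = []
    i = 0
    while i < len(ops):
        if i + 1 < len(ops):
            dest, name, srcs = ops[i]
            ndest, nname, nsrcs = ops[i + 1]
            foldable = (
                name in ("move", "load")
                and nname == "add"
                and len(nsrcs) == 2
                and nsrcs[1] == dest
                and nsrcs[0] != dest
                and use_count(ops, dest) == 1
            )
            if foldable:
                # move -> immediate form; load -> memory-operand form
                new_name = "addi" if name == "move" else "add"
                out.append((ndest, new_name, (nsrcs[0], srcs[0])))
                i += 2
                continue
        out.append(ops[i])
        i += 1
    return out


example = [
    ("r2", "move", ("0x10",)),
    ("r3", "add", ("r1", "r2")),
    ("r5", "load", ("4(sp)",)),
    ("r6", "add", ("r4", "r5")),
]
folded = fold(example)
```

Here `folded` holds two operations, `r3 = addi r1 0x10` and `r6 = add r4 4(sp)`, matching the right-hand column of the first two patterns.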


IDF Approach

Instruction scheduling + following tasks:
– Instruction folding
– Opcode selection
– Modelling of irregular hardware constraints
– Modelling of encoding constraints
– Monitoring of states of condition codes and transient registers
– Keeping track of dictionary contents

Use enumeration (branch and bound) approach


Example of IDF Processing

Input:
    $w80 = move 0x55
    $w91 = move 0xf8
    $w70 = add $w70, $w80
    $w71 = xor $w92, $w80
    $w90 = sub $w92, $w91
    store 8($p1) = $w90

Dictionary:
    3: add xor sub nop

Instruction:
    op3 8($p1) $w70 0x55 0xf8

• move and store instructions subsumed
• $w71, $w92 mapped to transient registers


IDF Scheduling Algorithm

To speed up the search:

Shrink solution space by:
– Coming up with high initial bound_sch
– Prune useless search paths continuously
   • Tight hardware constraints help

[Flowchart]
Input: Sequence of operations in BB
1. start
2. Estimate initial bound_sch
3. Search for schedule with length <= bound_sch
4. succeed?
   – yes: end
   – no: bound_sch = bound_sch + 1, then repeat step 3
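The flowchart's loop (estimate a schedule-length bound, search, and relax the bound by one on failure) can be sketched as below. The machine model is an assumption made for illustration only: every operation has unit latency, at most `width` operations fit in one instruction, and folding, opcode selection, and dictionary limits are ignored. The function names are hypothetical.

```python
from itertools import combinations
from math import ceil


def critical_path(deps):
    """Length of the longest dependence chain; deps maps op -> predecessors."""
    depth = {}

    def visit(op):
        if op not in depth:
            depth[op] = 1 + max((visit(p) for p in deps[op]), default=0)
        return depth[op]

    return max((visit(op) for op in deps), default=0)


def search(deps, width, bound, done=frozenset(), used=0):
    """Enumerate schedules of length <= bound; return one, or None."""
    remaining = len(deps) - len(done)
    if remaining == 0:
        return []
    if used + ceil(remaining / width) > bound:
        return None  # prune: this path cannot finish within the bound
    ready = sorted(op for op in deps
                   if op not in done and all(p in done for p in deps[op]))
    for size in range(min(width, len(ready)), 0, -1):
        for group in combinations(ready, size):
            rest = search(deps, width, bound, done | set(group), used + 1)
            if rest is not None:
                return [list(group)] + rest
    return None


def schedule(deps, width):
    """Start from a tight lower bound and relax it until the search succeeds."""
    bound = max(ceil(len(deps) / width), critical_path(deps))
    while bound <= len(deps):
        result = search(deps, width, bound)
        if result is not None:
            return result
        bound += 1
    raise ValueError("no schedule found (cyclic dependences?)")


# Dependences of a six-operation block: two moves feed add/xor/sub, sub feeds store.
block = {"m1": [], "m2": [], "add": ["m1"], "xor": ["m1"],
         "sub": ["m2"], "store": ["sub"]}
result = schedule(block, width=4)
```

Starting from a high initial bound matters because every failed search at a smaller bound is wasted enumeration; the pruning test in `search` discards a partial schedule as soon as the remaining operations cannot fit.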


Managing the Dictionary

• Dictionary usage increases due to:
  – Program size: more variety of operations
  – High ILP: more combination of operations
  – Library code linked in
• Currently, dictionary contents fixed for each executable
• Role of linker:
  – Merge dictionary entries with identical contents across files/libraries
  – Error message on dictionary overflow
• Role of compiler:
  – Maximize dictionary entry re-use
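The linker's role described above, merging identical entries and reporting overflow, might look roughly like this. It is a sketch under assumptions: entries are modeled as tuples of operation names, the limit of 256 is taken from the 8-bit opcode, and `merge_dictionaries` and `DictionaryOverflow` are hypothetical names, not part of any real linker.

```python
DICT_LIMIT = 256  # assumed from the 8-bit opcode: at most 256 entries


class DictionaryOverflow(Exception):
    """Raised when the merged dictionary needs more than DICT_LIMIT entries."""


def merge_dictionaries(files):
    """Merge per-file dictionaries, sharing entries with identical contents.

    files: list of dictionaries, each a list of entries (tuples of op names).
    Returns (merged, remaps), where remaps[f][old_index] is the index of that
    entry in the merged dictionary, used to rewrite opcodes in file f.
    """
    merged = []
    index_of = {}
    remaps = []
    for entries in files:
        remap = []
        for entry in entries:
            if entry not in index_of:
                if len(merged) == DICT_LIMIT:
                    raise DictionaryOverflow(
                        f"more than {DICT_LIMIT} distinct dictionary entries")
                index_of[entry] = len(merged)
                merged.append(entry)
            remap.append(index_of[entry])
        remaps.append(remap)
    return merged, remaps


a_entries = [("add", "xor"), ("sub",)]
b_entries = [("sub",), ("add", "xor"), ("mul",)]
merged, remaps = merge_dictionaries([a_entries, b_entries])
```

Because merging only helps when contents are byte-for-byte identical, the compiler's job of re-using existing entries is what keeps the merged dictionary small.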


Dictionary Compilation

Strategy:
• Keep track of existing dictionary entries during compilation
  – Extract dictionary entries from:
     • Libraries and .s files being linked
     • .o files compiled before current file
    Example: cc a.c b.o c.s
  – Maintain table of existing dictionary entries
  – Add to table as new entries are generated
• Re-use existing dictionary entries
• Bias scheduling towards dictionary conservation as dictionary fills up


User Control of Dictionary Compilation

Best program performance demands a near-full dictionary.
When the dictionary overflows, the program must be re-compiled.
Provide user control mechanisms:
– Trade-off between dictionary consumption and program performance
– Command line option: -CG:dict_usage=n, n = 0…10
– Embedded in code: #pragma dict_usage n

dict_usage is a dictionary budget guideline for IDF
– Low dict_usage:
   • Fewer new dictionary entries created
   • Low ILP
– High dict_usage:
   • Tighter instruction schedule
   • More dictionary entries created


IDF Support of dict_usage

Additional search goal bound_dict
– Number of new dictionary entries allowed for current BB
– Automatically adjust lower with more pre-existing entries

When bound_dict is reached during enumeration, disallow creating new dictionary entry (unless single operation)

[Chart: instructions and dict entries (vertical axis 0 to 800) plotted against dict_usage (horizontal axis: 10, 8, 3, 2, 0).]
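The dictionary-bound rule on this slide can be illustrated with a small budget check. This is a hedged sketch: the class `DictBudget` and its method are hypothetical, entries are modeled as tuples of operation names, and it assumes that a single-operation entry still counts toward the number of new entries even though it is always allowed. How the bound is derived from dict_usage is not modeled.

```python
class DictBudget:
    """Tracks new dictionary entries created while scheduling one basic block."""

    def __init__(self, existing, bound_dict):
        self.existing = set(existing)  # entries already in the dictionary table
        self.bound_dict = bound_dict   # new entries allowed for this block
        self.new_in_bb = 0

    def try_use(self, entry):
        """Return True if the scheduler may emit an instruction using entry."""
        if entry in self.existing:
            return True  # re-using an existing entry costs nothing
        if len(entry) > 1 and self.new_in_bb >= self.bound_dict:
            return False  # bound reached: no new multi-operation entries
        # Assumption: single-operation entries are always allowed but counted.
        self.existing.add(entry)
        self.new_in_bb += 1
        return True


budget = DictBudget(existing={("add", "xor")}, bound_dict=1)
reused = budget.try_use(("add", "xor"))     # already present
created = budget.try_use(("sub", "mul"))    # consumes the budget of one
rejected = budget.try_use(("or", "and"))    # over budget, multi-operation
single = budget.try_use(("or",))            # single operation is still allowed
```

In an enumerating scheduler the budget state would also have to be restored on backtracking, which this sketch leaves out.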


Experimental Results

Summary (with dict_usage=10):
• ILP from IDF scheduling: 1.38 ops per instruction
• ILP from relaxed scheduling: 1.51 ops per instruction
• 23% of all subsumable operations subsumed
• Each dictionary entry referred to by 2.63 instructions (statically)
• Scheduling via enumeration: 100 times slower than one-pass schedulers
• Compilation time: 1 to 2 minutes per program


Concluding Remarks

• VISC approach most suitable for embedded processors
  – Limited program size
  – Dictionary space less of an issue
  – Slow compilation tolerable
  – CISC-style instructions enable small code size
• Compilation support key to deploying applications on VISC
  – Very hard to write in assembly language
  – Advanced optimizations performed by compiler
  – Dictionary managed by compiler with user hints
• Compile-time configurable code generation enables RISC compilation techniques to generate CISC output

