IBM Toronto Lab

© 2007 IBM Corporation

An Idiom Recognition Framework for Exploiting Complex Hardware Instructions

Pramod Ramarao, Joran Siu, Motohiro Kawahito*

IBM Toronto Lab, *IBM Tokyo Research Lab

Notes about this talk

Implemented in the JIT compiler in IBM JDK for Java 6

Describes a patented methodology

Outline

Background

Our approach to idiom recognition

Experiments on the IBM System z platform

Summary

What is Idiom Recognition?

Idiom Recognition is a form of pattern matching done by optimizing compilers

Compilers can detect input code sequences in a program and replace them with complex hardware instructions

Performance of such sequences can be dramatically increased by using complex instructions

Complex hardware instructions

These are available today

– x86 processors have complex instructions (e.g., 'rep stos') and SIMD extensions such as SSE and SSE4 (string and text processing)

– IBM System z processors have a coprocessor that supports character-translation

– POWER has vector instructions

Optimizing compilers can take advantage of these instructions to obtain good performance

Example: searching for a single delimiter

do {
    if (bytes[index] == 13) break;
    index++;
} while (index < bytes.length);

// Intermediate language
index = SRST(bytes, index, 13) // SRST: SEARCH STRING

[Figure: the array bytes holds "This is a test." followed by the delimiter 13, with an arrow labeled index pointing into the array.]
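As a rough illustration of what the intermediate-language operation stands for, the following Java sketch pairs the source loop with a plain-Java emulation of SRST(bytes, index, c). The names SrstSketch, searchLoop, and srst are hypothetical, and the emulation models only the result value (the index of the first match, or bytes.length when there is none), not the hardware instruction.

```java
import java.nio.charset.StandardCharsets;

final class SrstSketch {
    // The source loop from the slide, wrapped in a method.
    static int searchLoop(byte[] bytes, int index) {
        do {
            if (bytes[index] == 13) break;
            index++;
        } while (index < bytes.length);
        return index;
    }

    // Hypothetical emulation of the IL operation: index of the first byte
    // equal to c at or after index, or bytes.length if there is none.
    static int srst(byte[] bytes, int index, int c) {
        for (int i = index; i < bytes.length; i++) {
            if (bytes[i] == c) return i;
        }
        return bytes.length;
    }

    public static void main(String[] args) {
        byte[] bytes = "This is a test.\r".getBytes(StandardCharsets.US_ASCII);
        System.out.println(searchLoop(bytes, 0) + " " + srst(bytes, 0, 13));
    }
}
```

For the sample text, both methods return 15, the position of the delimiter.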

Example: searching for a single delimiter

No hardware instruction:

    LA   R3, 12(bytes)           // length
L001:
    LB   R0, 16(bytes,index)     // array load
    CHI  R0, 13                  // check
    BRC  COND, Label L002
    AHI  index, 1                // increment
    CHI  index, R3
    BRC  COND, Label L001
L002:

Use hardware instruction:

    LA   R2, 16(bytes, index)    // start
    LA   R3, 12(bytes)           // length
    LHI  R0, 13
    SRST R3, R2
    LR   index, R3


SRST instruction performance on IBM System z 990

[Figure: chart of throughput (million characters/sec, 0 to 1,600) against the number of characters processed by SRST (8 to 128 in steps of 8), with one series "w/ SRST" and one series "w/o SRST". The chart is annotated "x7". Larger numbers are better.]

Idiom Recognition

Compilers need to match the program source code to an idiom

Example: Idiom of delimiter search

do {
    if (bytes[index] op C) break;
    index++;
} while (index < bytes.length)

Single delimiter: index = SRST(bytes, index, C)

Multiple delimiters: index = TRT(bytes, index, Table)

op matches an equality or inequality comparison, such as "==", "<=", or "!=".

C matches any constant.
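For the multiple-delimiter case, a minimal Java sketch of what TRT(bytes, index, Table) could compute is shown below. It assumes a 256-entry table in which a non-zero entry marks a delimiter; the names TrtSketch, trt, and delimiterTable are hypothetical, and only the returned index is modeled.

```java
final class TrtSketch {
    // Hypothetical emulation of TRT(bytes, index, table): index of the first
    // byte whose table entry is non-zero, or bytes.length if there is none.
    static int trt(byte[] bytes, int index, byte[] table) {
        for (int i = index; i < bytes.length; i++) {
            if (table[bytes[i] & 0xFF] != 0) return i;
        }
        return bytes.length;
    }

    // Builds a 256-entry table that marks each delimiter value.
    static byte[] delimiterTable(int... delimiters) {
        byte[] table = new byte[256];
        for (int d : delimiters) {
            table[d & 0xFF] = 1;
        }
        return table;
    }

    public static void main(String[] args) {
        byte[] bytes = {'a', 'b', 10, 'c', 13};
        System.out.println(trt(bytes, 0, delimiterTable(13, 10)));
    }
}
```

With delimiters 13 and 10, the search over the sample array stops at index 2, where the first delimiter (10) appears.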

We can use the SRST instruction for all of these examples

Program 1: (Separated code)

b = bytes[index];
do {
    if (b == 13) break;
    index++;
    b = bytes[index];
} while (index < bytes.length);

Program 2: (Additional code)

do {
    b = bytes[index];
    if (b == 13) break;
    index++;
} while (index < bytes.length);
temp = b; // Used after the loop

Program 3: (Different order)

do {
    if (bytes[index++] == 13) break;
} while (index < bytes.length);

We can use the SRST instruction for all of these examples

Program 1:

index = SRST(bytes, index, 13)

Program 2:

index = SRST(bytes, index, 13)
b = bytes[index]
temp = b // Used after the loop

Program 3:

index = SRST(bytes, index, 13)
index++
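The rewritten forms can be checked against the original loops with a small Java sketch. It covers the variant that keeps b for use after the loop and the variant that uses bytes[index++]. The class and method names are hypothetical, srst is a plain-Java stand-in for the IL operation, and the comparison assumes the delimiter 13 occurs at or after the starting index.

```java
final class VariantsSketch {
    // Plain-Java stand-in for the IL operation SRST(bytes, index, c).
    static int srst(byte[] bytes, int index, int c) {
        for (int i = index; i < bytes.length; i++) {
            if (bytes[i] == c) return i;
        }
        return bytes.length;
    }

    // Loop that stores the loaded byte in b and uses it after the loop.
    static int[] storeVariant(byte[] bytes, int index) {
        byte b;
        do {
            b = bytes[index];
            if (b == 13) break;
            index++;
        } while (index < bytes.length);
        int temp = b; // used after the loop
        return new int[] {index, temp};
    }

    // Rewritten form: search first, then redo the load and the store.
    static int[] storeVariantWithSrst(byte[] bytes, int index) {
        index = srst(bytes, index, 13);
        byte b = bytes[index];
        int temp = b;
        return new int[] {index, temp};
    }

    // Loop that increments the index as part of the array load.
    static int postIncrementVariant(byte[] bytes, int index) {
        do {
            if (bytes[index++] == 13) break;
        } while (index < bytes.length);
        return index;
    }

    // Rewritten form: search first, then apply the pending increment.
    static int postIncrementVariantWithSrst(byte[] bytes, int index) {
        index = srst(bytes, index, 13);
        index++;
        return index;
    }
}
```

When the delimiter is absent the two forms of each variant can differ (for example, the rewritten store variant would read past the end of the array), so a real compiler has to handle that case separately; the sketch does not attempt to.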







Exact pattern matching cannot optimize these examples.


The case for exact matching:







Outline

Background

Our approach to idiom recognition

Experiments on the IBM System z platform

Summary

Our approach to Idiom Recognition

Step 1: Find potential candidates by using a topological embedding algorithm

Step 2: Attempt to transform each candidate to exactly match the idiom by applying code transformations

– Partial peeling

– Forward code motion

– Copying store nodes

Computational complexity is O(|V_P| |E_T| + |E_P|)

– V_P: nodes of the idiom graph

– E_P: edges of the idiom graph

– E_T: edges of the target graph

Topological Embedding (TE)

Uses ordered, labeled directed graphs as the representation, where the order of siblings is significant

In exact matching, a directed graph P matches T if there is a mapping f : P → T

f preserves the label, degree, and parent relationship

TE relaxes the restriction by requiring f to preserve the ancestor relationship instead of the parent relationship

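Exact Matching vs. Topological Embedding

Topological embedding matches if there is a path in the target graph corresponding to each edge in the idiom

[Figure: the idiom is a node a with children b and c. Under exact matching, each idiom edge maps to an edge of the target graph. Under topological embedding, each idiom edge maps to a path, so the target graph may contain additional nodes (labeled Y and Z in the figure).]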
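The edge-versus-path distinction can be made concrete with a simplified Java sketch. It is not the embedding algorithm from the talk: it assumes node labels are unique, maps each idiom node to the target node with the same label, and ignores sibling order and degree. The target graph (a to Y to b, a to Z to c) and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class EmbeddingSketch {
    // True if 'to' can be reached from 'from' over one or more edges.
    static boolean hasPath(Map<String, List<String>> graph, String from, String to) {
        Deque<String> work = new ArrayDeque<>(graph.getOrDefault(from, List.of()));
        Set<String> seen = new HashSet<>();
        while (!work.isEmpty()) {
            String node = work.pop();
            if (node.equals(to)) return true;
            if (seen.add(node)) work.addAll(graph.getOrDefault(node, List.of()));
        }
        return false;
    }

    // Exact matching (simplified): every idiom edge must be a target edge.
    static boolean matchesExactly(Map<String, List<String>> idiom,
                                  Map<String, List<String>> target) {
        for (Map.Entry<String, List<String>> e : idiom.entrySet()) {
            for (String child : e.getValue()) {
                if (!target.getOrDefault(e.getKey(), List.of()).contains(child)) return false;
            }
        }
        return true;
    }

    // Topological embedding (simplified): every idiom edge must map to a path.
    static boolean embeds(Map<String, List<String>> idiom,
                          Map<String, List<String>> target) {
        for (Map.Entry<String, List<String>> e : idiom.entrySet()) {
            for (String child : e.getValue()) {
                if (!hasPath(target, e.getKey(), child)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> idiom = Map.of("a", List.of("b", "c"));
        Map<String, List<String>> target = Map.of(
                "a", List.of("Y", "Z"), "Y", List.of("b"), "Z", List.of("c"));
        System.out.println(matchesExactly(idiom, target) + " " + embeds(idiom, target));
    }
}
```

On this target, exact matching fails because a has no direct edge to b or c, while the embedding check succeeds because both are still descendants of a.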

Our approach using TE

Build a directed graph from IL using opcodes as labels

To detect commutative operations, ignore order of siblings in the graph

Use wild-card nodes to allow matching of different opcodes in a target graph

• E.g., to detect multiple IF statements

Pattern match the target graph (from IL) using TE and apply graph transformations if needed
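A small Java sketch of how wild-card labels and commutative operations might be handled when comparing the children of one idiom node with those of one target node. The opcode strings, the "*" wild-card marker, and the class and method names are all hypothetical; the talk does not show the actual representation.

```java
import java.util.ArrayList;
import java.util.List;

final class LabelMatchSketch {
    static final String WILDCARD = "*";

    // A wild-card idiom label accepts any opcode; otherwise labels must be equal.
    static boolean labelMatches(String idiomLabel, String opcode) {
        return WILDCARD.equals(idiomLabel) || idiomLabel.equals(opcode);
    }

    // Compares the child labels of one idiom node with those of one target node.
    static boolean childrenMatch(List<String> idiom, List<String> target,
                                 boolean commutative) {
        if (idiom.size() != target.size()) return false;
        if (!commutative) {
            for (int i = 0; i < idiom.size(); i++) {
                if (!labelMatches(idiom.get(i), target.get(i))) return false;
            }
            return true;
        }
        // Order is ignored: consume one target child per non-wild-card label.
        // Wild cards then absorb whatever is left, since the sizes are equal.
        List<String> remaining = new ArrayList<>(target);
        for (String label : idiom) {
            if (!WILDCARD.equals(label) && !remaining.remove(label)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> idiom = List.of("load", "const");
        List<String> swapped = List.of("const", "load");
        System.out.println(childrenMatch(idiom, swapped, false));
        System.out.println(childrenMatch(idiom, swapped, true));
    }
}
```

With the operands swapped, the ordered comparison rejects the target and the commutative comparison accepts it.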

Direct Conversions

Idiom nodes: a, c, i

– a: array load

– c: check it with constants

– i: increment the index

Direct Conversions (cont…)

Idiom nodes: a (array load), c (check it with constants), i (increment the index)

Case 1: Separated Node

– Target nodes: a, c, i, a

Case 2: Multiple IFs

– Target nodes: a, c1, c2, i

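Graph transformations

Different Order

[Figure: two targets whose nodes appear in a different order from the idiom: i, a, c and a, i, c.]

Idiom nodes: a (array load), c (check it with constants), i (increment the index)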

Graph transformations – Partial peeling

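[Figure: partial peeling applied to the "Different Order" target i, a, c. The leading i is peeled out in front of the loop, leaving a loop body a, c, i that matches the idiom.]

Idiom nodes: a (array load), c (check it with constants), i (increment the index)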

Graph transformations – Forward code motion

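[Figure: forward code motion applied to the "Different Order" target a, i, c. The increment i is moved forward past the check c, so the loop body becomes a, c, i and matches the idiom.]

Idiom nodes: a (array load), c (check it with constants), i (increment the index)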

Graph transformations – Copy store nodes

Additional Node

[Figure: a target with an additional store node S: a, S, c, i.]

Idiom nodes: a (array load), c (check it with constants), i (increment the index)

Graph transformations – Copy store nodes

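[Figure: copy store nodes applied to the "Additional Node" target a, S, c, i. A copy of the store node S is placed outside the loop so that the loop body can match the idiom a, c, i.]

Idiom nodes: a (array load), c (check it with constants), i (increment the index)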

Graph transformations - Example

Idiom (nodes a, c, i):

do {
    if (bytes[index] == 13) break;
    index++;
} while (index < bytes.length);

Target (nodes i, a, S, c):

do {
    index++;
    b = bytes[index];
    if (b == 13) break;
} while (index < bytes.length);
temp = b; // Used

Graph transformations – Example (cont…)

Idiom nodes: a, c, i

After partial peeling (nodes i, then the loop a, S, c, i):

index++;
do {
    b = bytes[index];
    if (b == 13) break;
    index++;
} while (index < bytes.length);
temp = b; // Used

Graph transformations – Example (cont…)

Idiom nodes: a, c, i

After copy store nodes (the store to b is copied to after the loop):

index++;
do {
    b = bytes[index];
    if (b == 13) break;
    index++;
} while (index < bytes.length);
b = bytes[index];
temp = b; // Used

Transformation steps for example

Idiom

a

c

i

index++;

index = SRST(…)



temp = b; // Used

index++;
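To check that the steps of this example preserve behavior, the following Java sketch compares the original loop with the code obtained after partial peeling, replacement of the loop by the search operation, and copying the store. The names are hypothetical, srst is a plain-Java stand-in for the IL operation, and the comparison assumes the delimiter 13 occurs after the starting index.

```java
final class ExampleSketch {
    // Plain-Java stand-in for the IL operation SRST(bytes, index, c).
    static int srst(byte[] bytes, int index, int c) {
        for (int i = index; i < bytes.length; i++) {
            if (bytes[i] == c) return i;
        }
        return bytes.length;
    }

    // Original loop: increment, array load, store to b, check.
    static int[] original(byte[] bytes, int index) {
        byte b;
        do {
            index++;
            b = bytes[index];
            if (b == 13) break;
        } while (index < bytes.length);
        int temp = b; // used after the loop
        return new int[] {index, temp};
    }

    // Code after the transformations in the example.
    static int[] transformed(byte[] bytes, int index) {
        index++;                        // increment peeled out of the loop
        index = srst(bytes, index, 13); // loop body a, c, i replaced
        byte b = bytes[index];          // copied store node
        int temp = b;
        return new int[] {index, temp};
    }

    public static void main(String[] args) {
        byte[] bytes = {'x', 'a', 'b', 13, 'c'};
        int[] before = original(bytes, 0);
        int[] after = transformed(bytes, 0);
        System.out.println(before[0] + " " + before[1]);
        System.out.println(after[0] + " " + after[1]);
    }
}
```

Both versions leave index at 3 and temp at 13 for the sample array. The not-found case is outside the scope of this sketch.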




Outline

Background

Our approach to idiom recognition

Experiments on the IBM System z platform

Summary

Implemented idioms

Idiom Name        Description
findbytes         Search for delimiters
arraytranslate    Conversion of character codes
memcpy            Copy memory
memset            Fill memory
memcmp            Compare memory
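The slide names the memory idioms without showing their source patterns. As a guess at the kind of Java loop each name suggests, the sketch below gives one loop per idiom; the class and method names are hypothetical, and the loops are illustrative rather than the patterns the framework actually recognizes.

```java
import java.util.Arrays;

final class MemoryIdiomLoops {
    // memcpy-style loop: copy n bytes from src to dst.
    static void copyLoop(byte[] src, byte[] dst, int n) {
        for (int i = 0; i < n; i++) {
            dst[i] = src[i];
        }
    }

    // memset-style loop: fill the whole array with one value.
    static void fillLoop(byte[] dst, byte value) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = value;
        }
    }

    // memcmp-style loop: compare the first n bytes of two arrays.
    static boolean equalLoop(byte[] a, byte[] b, int n) {
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] src = {1, 2, 3, 4};
        byte[] dst = new byte[4];
        copyLoop(src, dst, 4);
        System.out.println(Arrays.toString(dst) + " " + equalLoop(src, dst, 4));
        fillLoop(dst, (byte) 7);
        System.out.println(Arrays.toString(dst));
    }
}
```

Each loop is a candidate for replacement by a single bulk memory operation in the same way that the delimiter search maps to SRST.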


Environment: System z990 2084-316, 64-bit, 8 GB RAM, Linux

Three algorithm variants:

– Baseline: No matching done

– Exact Match

– Our approach: topological embedding and graph transformations, applied in addition to exact match

Benchmarks used

– Micro-benchmarks for J2SE class files

– IBM XML Parser

– Codepage Converter primitives

High-level Flow Diagram

…optimizations…

Loop Canonicalization & Loop Versioning

– Canonicalize each loop

Idiom Recognition

– Find candidate loops: Exact Matching, Topological Embedding

– Transform to match the idiom: Graph Transformations

…optimizations…

Faster Code

Performance improvements - Micro-Benchmarks

[Figure: chart of improvement (0% to 350%) against the number of characters processed by hardware instructions (16, 32, 64, 128) for java/lang/String.compareTo() and java/io/BufferedReader.readLine(), comparing "Our approach" with "Exact Match". Larger numbers are better (Baseline = "No match" normalized to 100%).]

Performance improvements - IBM XML Parser

[Figure: chart of improvement (0% to 300%) against the size of the input XML document (small=10Kb, medium=9M, large=13M), comparing "Our approach" with "Exact Match". Data labels shown on the chart: 111%, 240%, 142%.]

Performance improvements - Codepage Converter primitives

[Figure: chart of improvement (0% to 600%) per codepage, comparing "Our approach" with "Exact Match".]

Compilation Time

Reduce compilation time

– Filters to exclude target candidates unlikely to be matched

– Applied at higher optimization levels on frequently executed methods

• Match selected idioms at lower optimization levels

Measured maximum compilation time overhead of 0.28%

Summary

New approach for idiom recognition

– Much more powerful than exact matching

Significant performance improvements

– Up to 240% on IBM XML parser

– Small compilation time overhead (at most 0.28%)

Future work:

– More idioms

– More graph transformations

– More architectures

Thank you
