Data Access Profiling & Improved Structure Field Regrouping in Pegasus Vas Chellappa & Matt Moore May 2, 2005 / Optimizing Compilers / Project Poster Session
Data Access Profiling & Improved Structure Field Regrouping in Pegasus

Vas Chellappa & Matt Moore

May 2, 2005 / Optimizing Compilers / Project Poster Session

Introduction

Structure definitions group fields by semantics, not access contemporaneity

Data access profiling can be used to improve cache performance by reordering for contemporaneity

In this context, contemporaneity is a measure of how close in time two data accesses to structure fields occur

Problem Statement

Obtaining contemporaneity information for structure fields

Exploiting this information to improve the ordering of the fields

Doing this within the CASH/Pegasus environment

Approach

Pegasus Implementation

Data Access Profiling to track contemporaneous field accesses to build the Field Affinity Graphs

Modify Simulator interface to SimpleScalar (3rd party cache simulator) to achieve this

Regrouping Algorithm

Field Affinity Graphs built by the modified Simulator are then used to recommend reorderings based on a new regrouping algorithm

Project Design

[Block diagram of the tool flow. Components: C Source; SUIF/C2DIL; .cir file (Pegasus representation); Simulator (load/dump); RFU Simulator; SimpleScalar (libcachesim); Regrouper (libregroup). Labels on the connections: Tagged Pegasus IR; Memory Accesses; Contemporaneous Accesses; End of Simulation. Output: Cycles; Regroupings. Legend: each component is marked as Unmodified, New, or Modified.]

Design Overview

1. Build stage: Tag structure field accesses in the Pegasus IR

2. Simulation stage: Propagate tag information through SimpleScalar to the new regroup library

3. Final stage: Invoke regrouping algorithm to calculate reordering recommendations

Build Stage, Tagging Accesses

Objective: Identify and tag structure field accesses in the Pegasus IR

Not trivial, since SUIF/C2DIL do not preserve required type information during transformation to IR

Need to identify patterns that indicate structure field accesses

Field Accesses in Pegasus

[Diagram 1] Structure pointer (structure's base address) + Field offset -> Memory Op (Load/Store)

Wire labels: Structure pointer (structure specific); Address (int); Field type

Typical structure field access made through a structure pointer

[Diagram 2] Structure Address + Field offset -> Memory Op (Load/Store)

Wire labels: Address (int); Address (int); Field type

Typical structure field access made to a structure variable on the stack.

[Diagram 3] Optimized to: Structure Address -> Memory Op (Load/Store)

Wire labels: Address; Field type

Since the base address and offsets are constants, they can be, and are, optimized away. There is no way to know that this represents a structure field access. Also, there is no wire that now contains the structure type. Type information is thus lost, and impossible to recreate.
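The base-plus-offset pattern described on this slide can be sketched in plain C. This is an illustration only, not Pegasus IR: the two-field struct and the function names `load_f2_lowered` and `load_f2_source` are hypothetical, chosen to mirror the `f2` access used elsewhere in these slides.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structure, modeled on the my_t used elsewhere in these slides. */
struct my_t {
    int f1;
    int f2;
};

/* The field read as it looks after lowering: base address plus a constant
   field offset, followed by a load of the field's type. */
int load_f2_lowered(const struct my_t *base)
{
    const unsigned char *addr =
        (const unsigned char *)base + offsetof(struct my_t, f2);
    int value;
    memcpy(&value, addr, sizeof value);   /* the memory op (load) */
    return value;
}

/* The same read at source level, through a structure pointer. */
int load_f2_source(const struct my_t *base)
{
    return base->f2;
}
```

Both functions return the same value for any valid pointer. When the base address is itself a compile-time constant, the addition can be folded into a single constant address, which is the situation the slide describes for non-pointer accesses.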

Actual Pegasus Illustration

int foo(struct my_t stestfoo)
{
    int retval = stestfoo.f2;
    return(retval);
}

Which wire here should have struct type?

int foo(struct my_t* stestfoo)
{
    return(stestfoo->f2);
}

Which wire here has struct type?

Simulation Process

Tag info on loads and stores is propagated through SimpleScalar to the regrouping library that builds the field affinity graph (done online, during simulation)
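One way such an online graph update could work is sketched below. The slides do not say how contemporaneity is measured, so the sliding window of recent accesses, the constants `MAX_FIELDS` and `WINDOW`, and the names `affinity_graph`, `affinity_init` and `affinity_record` are all assumptions made for illustration; this is not the libregroup interface.

```c
#include <string.h>

#define MAX_FIELDS 8   /* hypothetical upper bound on fields per structure */
#define WINDOW     4   /* hypothetical number of recent accesses treated as contemporaneous */

struct affinity_graph {
    unsigned long weight[MAX_FIELDS][MAX_FIELDS]; /* symmetric edge weights */
    int recent[WINDOW];                           /* ring buffer of recent field ids */
    int count;                                    /* valid entries in recent[] */
    int next;                                     /* next ring buffer slot to overwrite */
};

void affinity_init(struct affinity_graph *g)
{
    memset(g, 0, sizeof *g);
}

/* Called once per tagged load or store; field is the field index from the tag. */
void affinity_record(struct affinity_graph *g, int field)
{
    int i;

    if (field < 0 || field >= MAX_FIELDS)
        return;
    /* Strengthen the edge to every different field still in the window. */
    for (i = 0; i < g->count; i++) {
        int other = g->recent[i];
        if (other != field) {
            g->weight[field][other]++;
            g->weight[other][field]++;
        }
    }
    /* Remember this access, evicting the oldest one once the window is full. */
    g->recent[g->next] = field;
    g->next = (g->next + 1) % WINDOW;
    if (g->count < WINDOW)
        g->count++;
}
```

Because the update runs per access, no trace has to be stored; the graph is complete when simulation ends. A field that appears several times in the window contributes once per appearance, which is a design choice of this sketch rather than something the slides specify.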

Regrouping Stage

After simulation, analyze collected profiling data to produce reordering recommendation

Can be done better than the greedy approach used in previous work

An optimal ordering cannot be computed efficiently in general: the problem is NP-hard, so no efficient exact algorithm is known

Field Affinity Graph (one per structure):

Vertices: fields in a structure

Edge weights: represent degree of contemporaneity of accesses between the fields

Matching Heuristic

Find a maximum weight matching in the field affinity graph

Fields that will not fit into a cache line together anyway are identified and ignored

Structure is reordered by placing matched fields together
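The matching step can be sketched for small structures as follows. The references point to the Mathprog weighted matching code; the sketch below swaps in a simple exact bitmask dynamic program instead, which is only practical for a handful of fields. The name `max_weight_matching`, the bound `MAX_FIELDS`, and the assumption of non-negative weights are illustrative, and the cache line fit check mentioned above is left out.

```c
#define MAX_FIELDS 12   /* hypothetical bound; the tables below have 2^MAX_FIELDS entries */

/* Returns the weight of a maximum weight matching over n fields, or -1 if n
   is out of range. w is a symmetric matrix of non-negative affinity weights.
   partner[i] receives the field matched with i, or -1 if i stays unmatched. */
long max_weight_matching(int n, long w[MAX_FIELDS][MAX_FIELDS],
                         int partner[MAX_FIELDS])
{
    static long best[1 << MAX_FIELDS];   /* best[mask]: optimum over fields in mask */
    static int choice[1 << MAX_FIELDS];  /* partner chosen for the lowest field in mask */
    unsigned mask, full;
    int i, j;

    if (n < 0 || n > MAX_FIELDS)
        return -1;
    full = (1u << n) - 1u;
    best[0] = 0;
    choice[0] = -1;
    for (mask = 1; mask <= full; mask++) {
        /* Find the lowest-numbered field in this subset. */
        for (i = 0; !(mask & (1u << i)); i++)
            ;
        /* Option 1: leave field i unmatched. */
        best[mask] = best[mask & ~(1u << i)];
        choice[mask] = -1;
        /* Option 2: match field i with some other field j in the subset. */
        for (j = i + 1; j < n; j++) {
            if (mask & (1u << j)) {
                long cand = w[i][j] + best[mask & ~(1u << i) & ~(1u << j)];
                if (cand > best[mask]) {
                    best[mask] = cand;
                    choice[mask] = j;
                }
            }
        }
    }
    /* Walk the recorded choices to recover the matched pairs. */
    for (i = 0; i < n; i++)
        partner[i] = -1;
    mask = full;
    while (mask) {
        for (i = 0; !(mask & (1u << i)); i++)
            ;
        j = choice[mask];
        mask &= ~(1u << i);
        if (j >= 0) {
            partner[i] = j;
            partner[j] = i;
            mask &= ~(1u << j);
        }
    }
    return best[full];
}
```

Placing each field next to its `partner` gives the reordering. For larger structures a polynomial-time weighted matching algorithm would be the appropriate replacement for this table-based version.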

Greedy vs. Matching

Matching-Based Field Ordering

struct foo { int f1; int f3; int f2; int f4; };

Greedy Field Ordering

struct foo { int f1; int f2; int f3; int f4; };

[Figure, shown for each ordering: field affinity graph over f1, f2, f3, f4 with edge weights 1000, 900, 1, and 900, and the resulting cache layout (8 byte lines).]
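The gap between the two orderings can be reproduced with a small sketch. The figure does not survive in this transcript, so the edge placement used here is an assumption: f1-f2 = 1000, f1-f3 = 900, f2-f4 = 900, f3-f4 = 1. The function `greedy_pairing` is a simplified stand-in that repeatedly takes the heaviest remaining edge; it is not necessarily the exact greedy algorithm from previous work.

```c
#define NFIELDS 4   /* f1..f4 are represented as indices 0..3 */

/* Greedy pairing: repeatedly take the heaviest edge between two fields that
   are both still unpaired. Returns the total weight of the chosen pairs. */
long greedy_pairing(long w[NFIELDS][NFIELDS], int partner[NFIELDS])
{
    long total = 0;
    int i, j;

    for (i = 0; i < NFIELDS; i++)
        partner[i] = -1;
    for (;;) {
        int bi = -1, bj = -1;
        /* Scan all pairs of unpaired fields for the heaviest edge. */
        for (i = 0; i < NFIELDS; i++) {
            if (partner[i] != -1)
                continue;
            for (j = i + 1; j < NFIELDS; j++) {
                if (partner[j] != -1)
                    continue;
                if (bi < 0 || w[i][j] > w[bi][bj]) {
                    bi = i;
                    bj = j;
                }
            }
        }
        if (bi < 0)
            break;   /* fewer than two unpaired fields remain */
        partner[bi] = bj;
        partner[bj] = bi;
        total += w[bi][bj];
    }
    return total;
}
```

Under the assumed edge placement, the greedy choice takes the 1000 edge first and is then forced to accept the edge of weight 1, for a total of 1001. Pairing f1 with f3 and f2 with f4 instead keeps both 900 edges, for a total of 1800, which corresponds to the matching-based ordering with 8 byte lines.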

NP-Hardness

NP-hardness is shown by reducing the graph coloring problem to the regrouping problem

[Figure: example of the reduction from K-Coloring to Regrouping (K cache lines); edges in the example graphs are labeled with weights 1 and -1.]

Results

Implemented successfully to handle structure field accesses done through pointers (ptr->fld)

So far, only small programs have been tested

Reordering is done manually and fed into simulator again to obtain the number of cycles for comparison

Results - Example

Original:

struct my_t {
    int f1;
    int f2;
    char nu[4096];
    int f3;
    int f4;
};

750 Cycles per Call

Modified:

struct my_t {
    int f1;
    int f4;
    int f2;
    char nu[4096];
    int f3;
};

745 Cycles per Call (one less cache miss)

Function used with both versions:

int foo(struct my_t *elt)
{
    int i;
    elt->f1 = 2;
    elt->f4 = 100;
    for(i=0; i < 50; i++)
    {
        elt->f1++;
        elt->f4--;
    }
    return elt->f1+elt->f4;
}
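The effect of the reordering on field placement can be checked with `offsetof`. This is a sketch under stated assumptions: the cache line size `LINE_SIZE` is hypothetical (the slides do not give the simulated line size), the structure is assumed to start on a line boundary, and the two layouts are given distinct names (`my_t_original`, `my_t_modified`) so they can coexist in one file.

```c
#include <stddef.h>

#define LINE_SIZE 32   /* hypothetical cache line size in bytes; not stated in the slides */

/* The two layouts from the example, renamed so both can be declared together. */
struct my_t_original { int f1; int f2; char nu[4096]; int f3; int f4; };
struct my_t_modified { int f1; int f4; int f2; char nu[4096]; int f3; };

/* Returns 1 if two byte offsets fall in the same cache line, assuming the
   structure itself starts on a line boundary. */
int same_line(size_t a, size_t b)
{
    return a / LINE_SIZE == b / LINE_SIZE;
}
```

In the original layout the 4096-byte array separates `f1` from `f4`, so `same_line(offsetof(struct my_t_original, f1), offsetof(struct my_t_original, f4))` is 0. In the modified layout the two fields are adjacent and the same check returns 1, which is consistent with the one fewer cache miss reported above.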

Conclusion

Performance improvements are achievable even on simple programs using reorganization recommendations

Propagation of full type information from source through SUIF/C2DIL would be required to optimize non-pointer accesses

Languages that expose less of the memory layout to the programmer would make it quick and easy to apply the reordering recommendations

References

Trishul M. Chilimbi, Bob Davidson, and James R. Larus, “Cache-Conscious Structure Definition,” in Proceedings of the ACM SIGPLAN '99 Conference on Programming Language Design and Implementation, pages 13-24, May 1999.

Mathprog (Weighted Matching Algorithm) http://elib.zib.de/pub/Packages/mathprog/matching/weighted/

Pegasus: http://www-2.cs.cmu.edu/~phoenix/

SUIF: http://suif.stanford.edu/

SimpleScalar Tool set: http://www.cs.wisc.edu/~mscalar/simplescalar.html

