Dynamic Optimization using ADORE Framework

10/22/2003

Wei Hsu

Computer Science and Engineering Department

University of Minnesota


Background

• Compiler Optimization:

The phases of compilation that generate good code, making as efficient use of the target machine as possible.

• Static Optimization:

Compile-time optimization: a one-time, fixed optimization that will not change after distribution.

• Dynamic Optimization:

Optimization performed at program execution time, adaptive to the execution environment.


Examples of Compiler Optimizations

Instruction scheduling

Before:
    Ld R1,(R2)
    Add R3,R1,R4
    Ld R5,(R6)
    Add R7,R5,R4

After:
    Ld R1,(R2)
    Ld R5,(R6)
    Add R3,R1,R4
    Add R7,R5,R4

Cache prefetching

Before (frequent data cache misses!!):
    Ld R1,(R2)
    Addi R2,R2,64
    Add R3,R1,R4

After:
    Ld R1,(R2)
    prefetch 256(R2)
    Addi R2,R2,64
    Add R3,R1,R4
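
The prefetch example above requests the data at 256(R2) while R2 advances by 64 bytes per iteration, so it fetches four iterations ahead. The slides do not say how that distance was chosen. A common rule of thumb is to look far enough ahead to cover one cache-miss latency, and the sketch below applies that rule. The function name and the latency and per-iteration cycle counts are illustrative assumptions, not values from the talk.

```python
import math

def prefetch_offset(miss_latency_cycles, cycles_per_iteration, stride_bytes):
    """Byte offset to prefetch so the data arrives before it is needed.

    Rule of thumb (an assumption, not taken from the slides): look ahead by
    enough loop iterations to cover one cache-miss latency.
    """
    iterations_ahead = math.ceil(miss_latency_cycles / cycles_per_iteration)
    return iterations_ahead * stride_bytes

# Hypothetical numbers: a 200-cycle miss and a 50-cycle loop body, combined
# with the 64-byte stride from the example, give a 256-byte offset.
offset = prefetch_offset(miss_latency_cycles=200, cycles_per_iteration=50, stride_bytes=64)
print(offset)
```

If the miss latency or the loop time differs on another machine, the suitable offset changes with it, which a fixed compile-time choice cannot track.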


Is Compiler Optimization Important?

In the last 15 years, computer performance has increased by ~1000 times.

• Clock rate increased by ~100X

• Micro-architecture contributed ~5X (the number of transistors doubles every 18 months)

• Compiler optimization added ~2-3X for single processors

(There is some overlap between clock rate and micro-architecture, and some overlap between micro-architecture and compiler optimizations.)
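
As a rough check on these figures, the short calculation below multiplies the three quoted factors as if they were independent. Treating them as independent is an assumption made only for illustration; the slide itself notes that the contributions overlap.

```python
clock_rate = 100        # ~100X from clock rate
micro_arch = 5          # ~5X from micro-architecture
compiler = (2, 3)       # ~2-3X from compiler optimization

# Naive product for the low and high ends of the compiler range.
naive = [clock_rate * micro_arch * c for c in compiler]
print(naive)  # [1000, 1500]
```

The naive product lands at or somewhat above the quoted ~1000X, which is what one would expect if part of each gain is counted under more than one heading.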


Speed up from Compiler Optimization

[Figure: speedup (y-axis, 0 to 6) on SPEC95Int, running on HP-PA8000, for optimization levels O1, O2, O3, O4, and O4 + PBO.]


Speed up from Compiler Optimization

[Figure: speedup (y-axis, 0 to 40) on Spec95fp, running on HP-PA-8000, for optimization levels O1, O2, O3, O4, and O4 + PBO.]


Excellent Benchmark Performance

[Figure: speedup (y-axis, 0 to 18) on Spec2000Int, running on HP/Intel Itanium, for optimization levels O2, O3, and O3 + PBO.]


Mediocre Application Performance

• Many application binaries are not optimized by compilers.

• ISV releases one binary for all machines in the same architecture (e.g. P5), but the binary may not run efficiently on the user’s machine (e.g. P6).

• ISV might have optimized code with some profiles exercising different parts of the application than what is actually executed.

• An application is built from many shared libraries, but there are no cross-library optimizations.

Performance not effectively delivered for end-users!!


Examples of Compiler Optimizations

Instruction scheduling

Before:
    Ld R1,(R2)
    Add R3,R1,R4
    Ld R5,(R6)
    Add R7,R5,R4

After:
    Ld R1,(R2)
    Ld R5,(R6)
    Add R3,R1,R4
    Add R7,R5,R4

What if the load latency is 4 clocks instead of 2?

Cache prefetching

Before:
    Ld R1,(R2)
    Addi R2,R2,64
    Add R3,R1,R4

After:
    Ld R1,(R2)
    prefetch 256(R2)
    Addi R2,R2,64
    Add R3,R1,R4

Does the compiler know where the data cache misses are?
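
To make the load-latency question concrete, the sketch below runs both orderings of the instruction scheduling example through a toy pipeline model. The model is an assumption made for illustration (single issue, in order, one cycle per Add, an instruction waits until its source registers are ready); it is not a description of any real processor, and `simulate` is a hypothetical helper.

```python
def simulate(program, load_latency=2, alu_latency=1):
    """Return (total_cycles, stall_cycles) on a toy single-issue in-order pipeline.

    Each instruction is (opcode, destination, sources). An instruction issues in
    the first free slot at which all of its source registers are ready.
    """
    ready = {}        # register -> cycle at which its value becomes available
    next_slot = 0     # earliest cycle at which the next instruction may issue
    stalls = 0
    for opcode, dest, sources in program:
        issue = max([next_slot] + [ready.get(reg, 0) for reg in sources])
        stalls += issue - next_slot
        latency = load_latency if opcode == "Ld" else alu_latency
        ready[dest] = issue + latency
        next_slot = issue + 1
    return max(ready.values()), stalls

original = [
    ("Ld", "R1", ["R2"]),
    ("Add", "R3", ["R1", "R4"]),
    ("Ld", "R5", ["R6"]),
    ("Add", "R7", ["R5", "R4"]),
]
# The scheduled version hoists the second load above the first add.
scheduled = [original[0], original[2], original[1], original[3]]

for latency in (2, 4):
    print(latency, simulate(original, latency), simulate(scheduled, latency))
```

Under this model the reordered sequence has no stalls at a 2-cycle load latency (4 cycles versus 6), but it stalls for 2 cycles once the latency is 4. A schedule tuned for one machine is therefore not necessarily right for another.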


A Case for Dynamic Optimization

The execution environment can be quite different from the assumptions made at compile time.

• Code should be optimized for the machine it runs on

• Code should be optimized according to how it is used

• Code should be optimized when all executables are available

• Only the part of the code that matters should be optimized


ADORE: ADaptive Object code RE-optimization

• The goal of ADORE is to create a system that transparently finds and optimizes performance critical code at runtime.
  – Adapting to new micro-architectures
  – Adapting to different user environments
  – Adapting to dynamic program behavior
  – Optimizing shared library calls

• A prototype ADORE has been implemented on the Itanium/Linux platform.


Framework of ADORE

[Figure: block diagram. Labels shown: Main Thread, Main Program, Optimized Trace Pool, DynOpt Thread, Phase Detector, Trace Selector, Optimizer, Patcher, User Event Buffer (UEB), Kernel Space, System Sample Buffer (SSB).]
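
The framework slide only names the components, so the following is a loose sketch of how a phase detector, trace selector, optimizer, and patcher might cooperate inside a dynamic-optimization thread. The data flow, the stability test, the threshold, and every function name here are assumptions for illustration and should not be read as ADORE's actual design.

```python
from collections import Counter

def hot_addresses(samples, threshold):
    """Addresses that make up at least `threshold` of the sampled PCs."""
    counts = Counter(samples)
    return sorted(a for a, n in counts.items() if n / len(samples) >= threshold)

def dynopt_step(prev_window, curr_window, trace_pool, patches, threshold=0.25):
    """One pass of a hypothetical DynOpt-thread loop over two sample windows."""
    hot = hot_addresses(curr_window, threshold)
    # Phase detector stand-in: wait until the hot set stops changing.
    if hot != hot_addresses(prev_window, threshold):
        return False
    for addr in hot:                                 # trace selector stand-in
        if addr not in patches:
            trace_pool.append(("optimized", addr))   # optimizer stand-in
            patches[addr] = len(trace_pool) - 1      # patcher stand-in
    return True

trace_pool, patches = [], {}
window_a = [0x100] * 6 + [0x200] * 3 + [0x300]
window_b = [0x100] * 5 + [0x200] * 4 + [0x400]
print(dynopt_step(window_a, window_b, trace_pool, patches), patches)
```

In this sketch the main program is never stopped: the loop only reads sample windows and records where patched code should redirect, which mirrors the separation between a main thread and a DynOpt thread named in the figure.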


Current Optimizations in ADORE

• We have implemented
  – Data cache prefetching
  – Trace selection and layout

• We are investigating and testing the following optimizations
  – Instruction scheduling with control and data speculation
  – Instruction cache prefetching
  – Partial dead code elimination
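
Runtime data cache prefetching needs some way to decide which loads are worth prefetching for. The slides do not describe how ADORE does this; the sketch below shows one plausible approach, which ranks loads by sampled miss latency and keeps the few that account for most of it. The sample format, the 90% coverage cut-off, and the function name are assumptions.

```python
from collections import defaultdict

def delinquent_loads(samples, coverage=0.9):
    """Load PCs, hottest first, that together cover `coverage` of sampled miss latency.

    `samples` is a list of (load_pc, miss_latency_cycles) pairs, a hypothetical
    stand-in for whatever the hardware performance monitor reports.
    """
    latency_by_pc = defaultdict(int)
    for pc, latency in samples:
        latency_by_pc[pc] += latency
    total = sum(latency_by_pc.values())
    picked, covered = [], 0
    for pc, latency in sorted(latency_by_pc.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break
        picked.append(pc)
        covered += latency
    return picked

samples = [(0x40, 200), (0x40, 180), (0x58, 150), (0x70, 20), (0x40, 210), (0x88, 10)]
print([hex(pc) for pc in delinquent_loads(samples)])
```

Only the loads returned here would be candidates for an inserted prefetch; the rest are left alone so that the optimizer spends its effort on the part of the code that matters.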


Performance Impact of O2/O3 Binary

[Figure: speedup (y-axis, -10% to 70%) of "O2 + Runtime Prefetching" per benchmark: bzip2, gzip, mcf, vpr, parser, gap, vortex, gcc, ammp, art, applu, equake, facerec, fma3d, lucas, mesa, swim, Blast.]


[Figure: two charts of CPI (y-axis) over execution time (x-axis), each comparing the original program with ADORE. Panels: Mcf (CPI axis 0 to 9) and Art (CPI axis 0 to 4.5).]


Optimizing BLAST with ADORE

• BLAST is the most popular tool used in bioinformatics. Several faculty members and research colleagues are using it.

• Used as a benchmark by companies to test their latest systems and processors

• The performance of BLAST matters.


Speedup from BLAST queries

[Figure: "SpeedUp of queries from ADORE and ECC over ORC". Percentage speedup (y-axis, -15 to 35) per query (x-axis), with two series: speedup from ADORE and speedup from ECC. Queries: blastn ntnt.1, blastn ntnt.45min, blastp nraa.1, blastp nraa.10, tblastn ntaa.q1, blastx nrnt.1, blastx nrnt.45min, blastx nrnt.10.]


Cycle Accounting for Various Queries

[Figure: percentage of total cycles (y-axis, 0% to 100%) per query (x-axis).
Queries: blastn, nt, nt.1 (single short query); blastn, nt, nt.45min (single long query); blastp nraa.1 (single short query); blastp, nr, aa.10 (multiple short queries); tblastn, nt, aa.q1 (single medium query); blastx, nr, nt.1 (single short query); blastx, nr, nt.45min (single medium-short query); blastx, nr, nt.10 (multiple short queries).
Cycle categories: support register dependency stalls, integer register dependency stalls, RSE stalls, FPU stalls, branch misprediction stalls, I-Cache stalls, D-Cache stalls, unstalled cycles.]


Observations from BLAST

• ADORE is robust. It can handle real, large application code.

• ADORE does not speed up all queries, since the code is already running quite efficiently on Itanium systems. It adds about 1-2% of profiling and optimization overhead.

• ADORE does speed up one long query by 30%.

• It is difficult to further improve performance of BLAST by static compilers.


Future Direction of ADORE

• Show more performance on more real applications

• Make ADORE more transparent
  – Compiler independent
  – Exception handling

• Study the impact of compiler annotations

• Study architectural/Micro-architectural support for ADORE


ADORE Group

• Professors
  – Prof. Wei-Chung Hsu
  – Prof. Pen-Chung Yew
  – Dr. Bobbie Othmer

• Graduate Students
  – Howard Chen
  – Jiwei Lu
  – Jinpyo Kim
  – Sagar Dalvi
  – Rao Fu
  – WeiChuan Dong
  – Abhinav Das
  – Dwarakanath Rajagopal
  – Ananth Lingamneni
  – Vijayakrishna Griddaluru
  – Amruta Inamdar
  – Aditya Saxena


Summary

• Dynamic Binary Optimization customizes performance delivery.

• The ADORE project at U. of Minnesota is a research dynamic binary optimizer. It demonstrates good performance potential.

• With architecture/micro-architecture and static compiler support, a future dynamic optimizer could be more effective, more adaptive and more applicable.


Conclusion

Be Adaptive !!

Be Dynamic !!


Dynamic Translation

• Fast Simulation
  – SimOS (Stanford), SHADE (SUN)

• Migration
  – DAISY, BOA (IBM), Virtual PC, ARIES (HP), Crusoe (Transmeta)

• Internet applications
  – Java HotSpot, MS dot NET

• Performance Tools (dynamic instrumentation)
  – Paradyn and EEL (UW), Caliper (HP)

• Optimization
  – Dynamo, Tinker (NCSU), Morph (Harvard), DyC (UW)

