Date post: 21-Dec-2015
Page 1: Real-Time Address Trace Compression for Emulated and Real System-on-Chip Processor Core Debugging

Bojan Mihajlović, Željko Žilić
McGill University, Dept. of Electrical and Computer Engineering
Montreal, Quebec, Canada
GLSVLSI’11, May 2–4, 2011

Presenter: Shao-Jay Hou

Page 2: Abstract

In the multicore era, capturing execution traces of processors is indispensable to debugging complex software. The inability to transfer vast amounts of trace data off-chip without significant slow-down has impeded the debugging of such software, in both pre-silicon emulation and in real designs. We consider on-chip trace compression performed in hardware to reduce data volume, using techniques that exploit inherent higher-order redundancy in address trace data. While hardware trace compression is often restricted to poor or moderate performance due to area and memory constraints, we present a parameterizable scheme that leverages the resources already found on existing platforms. Harnessing resources such as existing trace buffers on CPUs, and unused embedded memory on FPGA emulation platforms, our trace compression scheme requires only a small additional hardware area to achieve superior compression ratios.

Page 3: What’s the problem?

MPSoCs running multi-threaded programs
。Traditional debug methods can’t be used
。A non-invasive method (on-chip emulation) is a good way

Immense amount of trace data that must be either stored on-chip or transferred off-chip in real time
。Trace of a 32-bit processor, 1 clock per instruction, 100 MHz: 400 MB/s of data
。The data needs to be compressed
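The 400 MB/s figure follows directly from the parameters on this slide. A quick check, assuming one 32-bit address is traced for every instruction (the variable names are illustrative only):

```python
# Back-of-the-envelope trace bandwidth for the numbers on this slide.
# Assumption: one 32-bit address is emitted per executed instruction.
ADDRESS_BITS = 32            # 32-bit processor
CLOCKS_PER_INSTRUCTION = 1   # 1 clock per instruction
CLOCK_HZ = 100_000_000       # 100 MHz

bytes_per_address = ADDRESS_BITS // 8
instructions_per_second = CLOCK_HZ // CLOCKS_PER_INSTRUCTION
trace_bytes_per_second = bytes_per_address * instructions_per_second

print(trace_bytes_per_second / 1e6, "MB/s")  # 400.0 MB/s
```

At that rate a single core fills 1 GB of trace storage in 2.5 seconds, which is why compression is needed before the data leaves the chip.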

Page 4: Related work

[Diagram positioning this paper against related work]
Categories shown: trace compression schemes, compression methods, some example tools
Items shown: compression algorithms [5]; combined MTF and LZ [1]; DMTF [17]; multi-stage compression [11]; Lempel-Ziv (LZ) [18]; MCDS [12]; ARM ETM [2]

Page 5: Proposed method

Page 6: Compression flow

Page 7: Consecutive Address Elimination

Why?
。Instructions execute consecutively until a branch is reached
。Branch target address

How? Each trace entry is divided into two parts
。address
。length

Example: [figure]
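Consecutive address elimination can be sketched in software as follows. This is a minimal illustration, not the paper's hardware: it assumes that "length" counts the instructions in a sequential run and that instructions are 4 bytes wide (the 32-bit fixed instruction length mentioned in the experimental setup). The function names are hypothetical.

```python
INSTRUCTION_BYTES = 4  # assumption: 32-bit fixed-length instructions


def eliminate_consecutive(addresses):
    """Collapse each run of sequential addresses into (start_address, length)."""
    runs = []
    for addr in addresses:
        if runs:
            start, length = runs[-1]
            # The address continues the current run if it is the next instruction.
            if addr == start + length * INSTRUCTION_BYTES:
                runs[-1] = (start, length + 1)
                continue
        # Otherwise a branch was taken: start a new run at the target address.
        runs.append((addr, 1))
    return runs


def expand_runs(runs):
    """Inverse operation: rebuild the full address trace from the runs."""
    return [start + i * INSTRUCTION_BYTES
            for start, length in runs
            for i in range(length)]


trace = [0x1000, 0x1004, 0x1008, 0x2000, 0x2004, 0x1000]
runs = eliminate_consecutive(trace)
print([(hex(a), n) for a, n in runs])
# [('0x1000', 3), ('0x2000', 2), ('0x1000', 1)]
```

Six addresses shrink to three (address, length) pairs, and `expand_runs` recovers the original trace exactly.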

Page 8: Compression flow

Page 9: Finite Context Method

Why?
。A branch will be either taken or not taken
。Sequential locality

How? Works similarly to a cache
。miss the first time a set of instructions is encountered
。hit for every subsequent encounter that matches the prediction
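The cache-like hit/miss behaviour can be sketched with a small predictor table. This is a generic finite context method under stated assumptions, not the paper's microarchitecture: the context is taken to be the last `order` symbols, the table is an unbounded dictionary rather than a fixed-size memory, and the symbols are the (address, length) pairs from the previous stage. `fcm_encode`, `fcm_decode`, and `order` are hypothetical names.

```python
def fcm_encode(symbols, order=2):
    """Replace every correctly predicted symbol with a 'hit' token."""
    table = {}      # context -> symbol that followed it last time
    context = ()
    out = []
    for sym in symbols:
        if table.get(context) == sym:
            out.append(("hit", None))    # prediction matched: no payload needed
        else:
            out.append(("miss", sym))    # first encounter or wrong prediction
            table[context] = sym
        context = (context + (sym,))[-order:]
    return out


def fcm_decode(encoded, order=2):
    """Rebuild the symbols by maintaining the same table as the encoder."""
    table = {}
    context = ()
    symbols = []
    for kind, payload in encoded:
        if kind == "hit":
            sym = table[context]
        else:
            sym = payload
            table[context] = sym
        symbols.append(sym)
        context = (context + (sym,))[-order:]
    return symbols


# A loop body of three basic blocks executed three times.
loop = [(0x1000, 3), (0x2000, 2), (0x3000, 5)] * 3
encoded = fcm_encode(loop)
hits = sum(1 for kind, _ in encoded if kind == "hit")
print(hits, "hits out of", len(encoded))  # 4 hits out of 9
```

The first pass through the loop misses and fills the table; once the contexts repeat, every entry is a hit and carries no address data.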

Page 10: Compression flow

Page 11: Move-to-Front & Address Encoding

Why?
MTF
。Increases the relevance
Prefix
。Assists differential compression

How?
Input address and predicted address
Differential compression
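Both ideas on this slide can be illustrated in a few lines. The sketch below is an assumption-laden stand-in, not the paper's encoder: `mtf_encode` is the textbook move-to-front transform, and `diff_encode` is one plausible differential scheme in which a prefix byte gives the number of low-order bytes where the input address differs from the predicted address, followed by just those bytes (at most 1 + 4 bytes for a 32-bit address).

```python
def mtf_encode(symbols, table):
    """Textbook move-to-front: emit each symbol's index, then move it to the front."""
    table = list(table)
    out = []
    for sym in symbols:
        idx = table.index(sym)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out


def diff_encode(address, predicted):
    """Prefix byte n, then the n low-order bytes of address that may differ."""
    diff = address ^ predicted
    n = (diff.bit_length() + 7) // 8          # bytes covering all differing bits
    low = address & ((1 << (8 * n)) - 1)
    return bytes([n]) + low.to_bytes(n, "little")


def diff_decode(encoded, predicted):
    """Rebuild the address from the prediction plus the transmitted low bytes."""
    n = encoded[0]
    mask = (1 << (8 * n)) - 1
    low = int.from_bytes(encoded[1:1 + n], "little")
    return (predicted & ~mask) | low


# Recently used symbols get small indices.
print(mtf_encode([0x2000, 0x2000, 0x1000, 0x2000], [0x1000, 0x2000, 0x3000]))
# [1, 0, 1, 1]

# Only the low byte differs, so two bytes are sent instead of four.
packet = diff_encode(0x00012348, 0x00012300)
print(packet.hex())  # 0148
```

A perfect prediction yields a single prefix byte of 0, which is what makes the prefix stream highly skewed and worth entropy coding in the next stage.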

Page 12: Compression flow

Page 13: Run-length and Prefix Encoding

Why?
。Prefix byte compression
。Probability of each prefix

How?
。Huffman encoding
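Because a few prefix values occur far more often than the rest, Huffman coding assigns them the shortest codes. The sketch below builds a Huffman code from observed prefix counts; the counts are made-up illustration data, and a hardware implementation would more likely use a fixed, precomputed code table.

```python
import heapq
from collections import Counter


def huffman_code(frequencies):
    """Build a prefix-free code {symbol: bitstring} from {symbol: count}."""
    if len(frequencies) == 1:
        return {sym: "0" for sym in frequencies}
    # Heap entries: (weight, tie_breaker, {symbol: partial_code}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)
        w2, _, high = heapq.heappop(heap)
        # Merge the two lightest subtrees, extending their codes by one bit.
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical prefix bytes: value 0 (perfect prediction) dominates.
prefixes = [0] * 8 + [1] * 4 + [2] * 2 + [4] * 2
code = huffman_code(Counter(prefixes))
total_bits = sum(len(code[p]) for p in prefixes)
print({p: len(c) for p, c in sorted(code.items())})  # {0: 1, 1: 2, 2: 3, 4: 3}
print(total_bits, "bits instead of", 8 * len(prefixes))  # 28 bits instead of 128
```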

Page 14: Compression flow

Page 15: Data Stream Serializer

Why?
。The data coming from the MTF/AE stage is 5 bytes wide
。But the output to the LZ stage is 1 byte wide

How?
。Use a small buffer to hold the data until it can be sent
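The width conversion amounts to a small FIFO: wide words go in, single bytes come out. The class below is a behavioural sketch only; `ByteSerializer` is a hypothetical name, the buffer is unbounded, and the back-pressure a real hardware FIFO needs when it fills up is left out.

```python
from collections import deque


class ByteSerializer:
    """Buffer multi-byte words and release them one byte at a time."""

    def __init__(self):
        self.fifo = deque()

    def push(self, word: bytes):
        """Accept one word from the previous stage (up to 5 bytes on this slide)."""
        self.fifo.extend(word)

    def pop(self):
        """Emit one byte per call, or None when the buffer is empty."""
        return self.fifo.popleft() if self.fifo else None


serializer = ByteSerializer()
serializer.push(bytes([0x01, 0x48]))                     # prefix + 1 address byte
serializer.push(bytes([0x04, 0x00, 0x20, 0x00, 0x00]))   # prefix + 4 address bytes

stream = []
while (byte := serializer.pop()) is not None:
    stream.append(byte)
print(bytes(stream).hex())  # 01480400200000
```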

Page 16: Compression flow

Page 17: Lempel-Ziv Encoding of Data Stream

Why?
。The input data is highly repetitive

How? Use LZ compression
。Create a dictionary to store the repeated parts
。But don’t output the dictionary
。During decompression, rebuild the same dictionary

Output is not produced on every cycle
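The three dictionary points above can be seen in a small LZ78-style coder. LZ78 is used here as a stand-in because the slide does not say which LZ variant the paper implements; the function names are hypothetical. The encoder never transmits its dictionary, and the decoder rebuilds an identical one from the output pairs alone.

```python
def lz78_encode(data: bytes):
    """Return (dictionary_index, next_byte) pairs; index 0 is the empty phrase."""
    dictionary = {b"": 0}
    out = []
    phrase = b""
    for value in data:
        candidate = phrase + bytes([value])
        if candidate in dictionary:
            phrase = candidate            # keep extending: nothing is output yet
        else:
            out.append((dictionary[phrase], value))
            dictionary[candidate] = len(dictionary)
            phrase = b""
    if phrase:
        # Flush a trailing phrase that is already in the dictionary.
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out


def lz78_decode(pairs):
    """Rebuild the data, growing the same dictionary the encoder built."""
    phrases = [b""]
    out = bytearray()
    for index, value in pairs:
        phrase = phrases[index] + bytes([value])
        phrases.append(phrase)
        out += phrase
    return bytes(out)


data = b"abababab"
pairs = lz78_encode(data)
print(pairs)  # [(0, 97), (0, 98), (1, 98), (3, 97), (0, 98)]
print(lz78_decode(pairs))  # b'abababab'
```

Eight input bytes produce only five output pairs: while a phrase is still being matched the encoder emits nothing, which corresponds to the slide's remark that output is not produced on every cycle.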

Page 18: Experimental Results

Benchmark: MiBench
CPU: Apple PowerMac G4 with a 1.25 GHz PowerPC 7455, a 32-bit fixed instruction-length processor, running Linux SMP kernel 2.6.32-24
Simulation software: ModelSim SE-64 v6.5c

Page 19: Experimental Results (cont.)

Logic utilization

Usage scenario: JTAG software fault 10-3

Page 20: Conclusion

This paper presented a parameterizable microarchitecture for address trace compression, suited to implementation on ASICs and modern FPGAs.

It achieves a better compression ratio than the other schemes it was compared with.

Page 21: My comment

The paper uses a dictionary-based, multi-stage compression method that can be used to improve our tracer.

The paper gives inspiration for future work on our tracer.

[Block diagram: CPU, GPU, Bus, B.T., P.T., P.T., T.M.]

