Software Logging under Speculative Parallelization

Date post: 08-Oct-2020
Software Logging under Speculative Parallelization. María Jesús Garzarán, M. Prvulovic, J. M. Llabería, V. Viñals, L. Rauchwerger, and J. Torrellas. U. de Zaragoza, U. Politècnica de Catalunya, U. of Illinois, Texas A&M U.
Transcript
Page 1: Software Logging under Speculative ... iacoma.cs.uiuc.edu/iacoma-papers/PRES/present_wmpi01_2.pdf

Software Logging under Speculative Parallelization

María Jesús Garzarán

M. Prvulovic, J. M. Llabería, V. Viñals, L. Rauchwerger, and J. Torrellas

U. de Zaragoza
U. Politècnica de Catalunya
U. of Illinois
Texas A&M U.

Page 2:

Roadmap of the Talk

Speculative tasks running in the same processor can create multiple versions of the same variable

– Stall the processor or redesign the caches

Alternative solution: Logs

Contribution:

Design, integration and evaluation of software logging on top of a speculation protocol

– cheap, low overhead (10%)

Page 3:

Outline

Speculative Parallelization
Multiple Local Speculative Versions
Software Logging
Evaluation
Conclusions

Page 4:

Speculative Parallelization

Assume no dependences and execute tasks in parallel
Track data accesses
Detect violations
Squash offending tasks and restart them

Do I = 1 to N
    … = A(L(I)) + …
    A(K(I)) = …
EndDo

Task J:      … = A(4) + …     A(5) = ...
Task J+1:    … = A(2) + …     A(2) = ...
Task J+2:    … = A(5) + …     A(6) = ...

RAW on A(5): written by Task J, read by Task J+2
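The dependence in the loop example can be checked with a few lines of Python. This is only an illustrative sketch: the function name `cross_task_raw` and the list encoding of the index arrays are assumptions made here, not part of the talk, and a real speculation protocol detects these dependences at run time in hardware rather than by scanning index arrays.

```python
def cross_task_raw(reads, writes):
    """Return (producer, consumer, index) for every element that a later
    task reads and an earlier task writes (a cross-task RAW dependence)."""
    deps = []
    for consumer in range(len(reads)):
        for producer in range(consumer):
            if reads[consumer] == writes[producer]:
                deps.append((producer, consumer, reads[consumer]))
    return deps


# Index arrays taken from the slide's example: task t reads A(L[t]) and
# writes A(K[t]). Position 0 is Task J, 1 is Task J+1, 2 is Task J+2.
L = [4, 2, 5]
K = [5, 2, 6]
deps = cross_task_raw(L, K)
print(deps)  # one dependence: Task J (0) -> Task J+2 (2) on A(5)
```

If Task J+2 performs its read of A(5) before Task J performs its write, the read is premature and Task J+2 has to be squashed and restarted.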

Page 5:

Speculative Parallelization

Speculative tasks cannot displace speculative data
State buffered until task becomes non-speculative

[Figure: tasks 3, 5, 6, 4; cache; memory; network]

Page 6:

Several Tasks Share a Cache

Processors must hold speculative state of several tasks
Task-ID field to identify the owner [Cintra00][Stefan00]

[Figure: tasks 3, 5, 6, 4, 8, 7; cache; memory; network]

Page 7:

Outline

Speculative Parallelization
Multiple Local Speculative Versions
Software Logging
Evaluation
Conclusions

Page 8:

Last and Non-Last Versions

Speculative tasks in the same processor write the same memory address

Task 5:
    store value1, 0x400      non-last version

Task 8:
    store value2, 0x400      last version
    ….
    load r4, 0x400           needs last version

Page 9:

Multiple Local Speculative Versions

To avoid the stall of the processor:

– Modify the cache

– Use Logs

Page 10:

Modify the cache

Cache keeps last and non-last versions (same Tag, but different task-ID)
– complexity and extra comparisons
– chances of displacement increase
– last versions are as hard to access as non-last versions

Cache
Task-ID    Tag      Data
5          0x400    value1
8          0x400    value2

Page 11:

Logs

Cache keeps last versions
Logs hide away non-last versions

Log (Memory)
Task-ID    Addr     Data
5          0x400    value1

Cache
Task-ID    Tag      Data
8          0x400    value2

Page 12:

Logs

Collect the state that a task made stale
Useful
– Free up space when the task commits
– To recover in case of squashes

[Figure: the cache holds the last versions; the Undo Log holds the non-last versions, with regions for Task 5, Task 8 and Task 10]
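The two uses of the log listed above (freeing space at commit, restoring state at a squash) can be modeled roughly as follows. This is a minimal sketch under stated assumptions, not the talk's implementation: the class `Node`, the dictionary-based cache, and the rule that task-IDs on one processor increase in execution order are all introduced here for illustration.

```python
class Node:
    """Toy model of one processor: the cache keeps only last versions,
    and each task's undo log keeps the versions that task made stale."""

    def __init__(self):
        self.cache = {}  # addr -> (owner task-ID, value)
        self.logs = {}   # writer task-ID -> [(addr, old owner, old value)]

    def store(self, task, addr, value):
        old_owner, old_value = self.cache.get(addr, (None, None))
        if old_owner != task:  # first write by this task to addr
            self.logs.setdefault(task, []).append((addr, old_owner, old_value))
        self.cache[addr] = (task, value)

    def commit(self, task):
        # The versions this task overwrote are no longer needed.
        self.logs.pop(task, None)

    def squash(self, task):
        # Undo the writes of `task` and of every later task, newest first.
        for t in sorted((t for t in self.logs if t >= task), reverse=True):
            for addr, old_owner, old_value in reversed(self.logs.pop(t)):
                if old_owner is None:
                    self.cache.pop(addr, None)
                else:
                    self.cache[addr] = (old_owner, old_value)


node = Node()
node.store(5, 0x400, "value1")   # Task 5 creates a version
node.store(8, 0x400, "value2")   # Task 8 overwrites it; value1 goes to the log
node.squash(8)                   # Task 8 is squashed; value1 is restored
```

Keying the log by the overwriting task is a design choice of this sketch: it makes commit a single deletion and lets a squash walk only the logs of the squashed task and its successors.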

Page 13:

Speculative protocol

Speculative protocol using Hw logs was proposed:

[Zhang99] Y. Zhang. "Hardware for Speculative Parallelization in DSM Multiprocessors". Ph.D. thesis, University of Illinois, May 1999

Use Sw Logs on top of a speculative protocol:

– Task-ID: per memory word in the local memory

– ISA: new ld/st instructions

Page 14:

Outline

Speculative Parallelization
Multiple Local Speculative Versions
Software Logging
Evaluation
Conclusions

Page 15:

Software Logs

A compiler instruments the application
– Insert: extra instructions before store operations
– Recycle: free up space when a task commits

Interrupt handlers
– Recovery: in case of an out-of-order RAW and squash
– Retrieval: in-order RAW and the version in the log

Page 16:

Software Data Structures

Logs are allocated locally before speculation starts

[Figure: Task Pointer Table (fields: Valid, TaskID, Ovflw, End, Next); Log Buffer divided into sectors (record fields: Vaddr, Owner Task-ID, Value); Free Sector Stack. Sectors are shown as owned by tasks i and j or as free.]
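One way to read the figure is sketched below: a preallocated log buffer split into fixed-size sectors, a per-task pointer entry, and a stack of free sectors. Everything here is an assumption for illustration, including the names `LogBuffer` and `TaskPointer`, the sector size, and the exact meaning given to the `next` and `end` fields; the slide only names the structures.

```python
from dataclasses import dataclass, field

SECTOR_ENTRIES = 4  # hypothetical sector size, in log records


@dataclass
class TaskPointer:
    task_id: int
    next: int   # index of the next free record in the current sector
    end: int    # index one past the last record of the current sector
    sectors: list = field(default_factory=list)


class LogBuffer:
    def __init__(self, num_sectors):
        # Allocated once, before speculation starts.
        self.records = [None] * (num_sectors * SECTOR_ENTRIES)
        self.free_sectors = list(range(num_sectors - 1, -1, -1))  # stack
        self.table = {}  # task_id -> TaskPointer

    def insert(self, task_id, vaddr, owner_task_id, value):
        ptr = self.table.get(task_id)
        if ptr is None or ptr.next >= ptr.end:  # no sector yet, or sector full
            sector = self.free_sectors.pop()    # IndexError if the log is full
            start = sector * SECTOR_ENTRIES
            if ptr is None:
                ptr = self.table[task_id] = TaskPointer(task_id, start, start)
            ptr.next, ptr.end = start, start + SECTOR_ENTRIES
            ptr.sectors.append(sector)
        self.records[ptr.next] = (vaddr, owner_task_id, value)
        ptr.next += 1

    def recycle(self, task_id):
        # On commit, return the task's sectors to the free stack.
        ptr = self.table.pop(task_id)
        self.free_sectors.extend(ptr.sectors)


buf = LogBuffer(num_sectors=3)
for i in range(5):                      # five records need two sectors
    buf.insert(7, 0x400 + 4 * i, 3, i)
```

Allocating whole sectors keeps the common-case insert down to a bounds check and a pointer increment, which matches the short instruction sequence the talk aims for.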

Page 17:

Instructions to insert in log

                                          # Assembly Instruct
                                          -------------------
Check log overflow                        1
Log.Vaddr = addr of var                   2
Log.OwnerTask_ID = current Task_ID        2
Log.Value = value of var                  2
Increment log pointer                     1
Update Task_ID                            1
Original store
                                          -------------------
Total                                     9

Page 18:

Reducing unnecessary logging

Create log entry: only 1st write to each variable
– Non-spec vars: easy to identify
– Spec vars: hard
    • Insert run-time check in all spec writes
    • If 1st, create log entry

==> Much reduced instrumentation overhead
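The run-time check for speculative writes can be sketched as below. The helper `logged_store` and the dictionaries standing in for memory and for the per-word Task-IDs are hypothetical; the sketch assumes that a write is a task's first write to a word exactly when the Task-ID stored with that word differs from the writing task's ID.

```python
def logged_store(mem, tid, log, task_id, addr, value):
    """Store `value` at `addr` for `task_id`, creating a log record only
    on this task's first write to the word."""
    if tid.get(addr) != task_id:                          # run-time check
        log.append((addr, tid.get(addr), mem.get(addr)))  # Vaddr, owner, value
        tid[addr] = task_id                               # update Task_ID
    mem[addr] = value                                     # original store


mem = {0x400: 1}   # word contents
tid = {0x400: 5}   # Task-ID of the task that produced each word
log = []
logged_store(mem, tid, log, 8, 0x400, 2)   # first write by Task 8: logged
logged_store(mem, tid, log, 8, 0x400, 3)   # second write: filtered out
```

Only the first call appends a record, so the old value produced by Task 5 is saved once no matter how many times Task 8 rewrites the word.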

Page 19:

Outline

Speculative Parallelization
Multiple Local Speculative Versions
Software Logging
Evaluation
Conclusions

Page 20:

Simulation Environment

Execution-driven simulator
Scalable multiprocessor: 16 nodes
Detailed superscalar processor model
Processor: 4-issue, dynamic, 2K BTB
32 KB L1 2-way, 512 KB L2 4-way
Speculative protocol [Zhang99]

Page 21:

Applications

Applications dominated by non-analyzable loops (subscripted subscripts)
– P3m (NCSA)
– Tree (Univ. of Hawaii)
– Apsi (Specfp2000)
– Bdna (Perfect Club)
– Track (Perfect Club)
– Dsmc3d (HPF2)

Non-analyzable loops account for an average of 51.4% of sequential time

Non-analyzable loops and stores to instrument identified by the Polaris compiler

Page 22:

Performance Results

[Figure: Execution Time (axis from 0 to 1) for P3m, Tree, Apsi and Bdna, each with NoLog, Sw and Hw bars. Legend: Useful Hazard Sync Memory Stall. Values printed above the bars (NoLog, Sw, Hw): P3m 0.9, 3.5, 4.0; Tree 8.1, 14.6, 14.7; Apsi 3.5, 3.8, 4.4; Bdna 6.1, 7.0, 7.6]

Sw only increases execution time by 10% over Hw
Sw reduces execution time by 36% over NoLog

Page 23:

Outline

Speculative Parallelization
Multiple Local Speculative Versions
Software Logging
Evaluation
Conclusions

Page 24:

Conclusions

Logs:
– Support multiple versions
– Minimize changes to cache

Software logging:
– No hardware support necessary
– Low time overhead (10% over HW)

Software logging: good solution for spec parallelization

Page 25:

Software Logging under Speculative Parallelization

María Jesús Garzarán ([email protected])

M. Prvulovic, J. M. Llabería, V. Viñals, L. Rauchwerger, and J. Torrellas

http://www.cps.unizar.es/deps/DIIS/gaz
http://iacoma.cs.uiuc.edu

Page 26:

How to access Task_ID (TID)

2 special instructions: lh_ts addr, sh_ts addr

where addr is the virtual address of the data, not of the TID, since TIDs do not have a virtual address

lh_ts: bring data from TID page into cache
sh_ts: update TID in cache

Dependence-checking HW reads/updates the TID pages in memory automatically

Page 27:

Implementation of lh_ts

2 possibilities:

TLB has 2 physical addresses per entry
    VaddressVar | PaddressVar | PaddressTID

TLB only has 1 physical address and there is a fixed offset between PaddressVar and PaddressTID
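The two translation options can be compared with a small model. This is a sketch only: the page size, the value of `TID_OFFSET`, the dictionary used as a TLB, and the assumption that a TID sits at the same page offset as its variable are all hypothetical choices made here, not details given in the talk.

```python
PAGE_SIZE = 4096            # assumed page size
TID_OFFSET = 0x1000_0000    # hypothetical fixed PaddressVar-to-PaddressTID distance


def tid_paddr_two_address_tlb(tlb, vaddr):
    """Option 1: each TLB entry holds (frame of the variable, frame of its TID)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    _var_frame, tid_frame = tlb[vpn]
    return tid_frame * PAGE_SIZE + offset


def tid_paddr_fixed_offset(tlb, vaddr):
    """Option 2: each TLB entry holds one frame; the TID is a fixed offset away."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return tlb[vpn] * PAGE_SIZE + offset + TID_OFFSET


vaddr = 0x10 * PAGE_SIZE + 0x24
print(hex(tid_paddr_two_address_tlb({0x10: (0x80, 0x93)}, vaddr)))
print(hex(tid_paddr_fixed_offset({0x10: 0x80}, vaddr)))
```

The first option lets the OS place TID pages anywhere at the cost of wider TLB entries; the second keeps the TLB unchanged but constrains where TID pages can live.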

Page 28:

Hardware logging

It has hardware cost:
– FSM
– Extra protocol messages
– HW in caches to detect first writes
...

Need log physical address: complicates recovery
– Should not have changed the mapping of Vir to Phys
– Recovery needs to be done by privileged process

Page 29:

Instructions to insert in log

; r1 = upper limit of the sector
; r2 = address in memory to insert the log record
; offset(r3) = address of the variable to update

Logging instr:

            bgt   r1, r2, insertion
            … allocate another sector

insertion:  addu  r4, r3, offset         ; address of the variable
            sw    r4, 0(r2)              ; store in the log
            lh_ts r4, offset(r3)         ; load the task-ID
            sw    r4, 4(r2)              ; store in the log
            lw    r4, offset(r3)         ; load value of variable
            sw    r4, 8(r2)              ; store in the log
            addu  r2, r2, log_record_size

            sw    r5, offset(r3)

Page 30:

Reducing unnecessary instrumentation

Not all the stores need to be instrumented
Instrument:
– first store of the non-speculative ones
– all speculative stores
    • Run time filtering of the first speculative store

[Figure: stores divided into Speculative and Non speculative, and into First stores and Others, with the Instrumented region marked]

Page 31:

Filtering first speculative store

Using Task-ID

Logging instr:

            lh_ts r6, offset(r3)         ; load task-ID
            beq   r6, r5, no_insert      ; first store?
            addu  r4, r3, offset         ; insert as usual
            sw    r4, 0(r2)
            ……...
            addu  r2, r2, log_record_size
            sh_ts r5, offset(r3)         ; store task-ID

no_insert:  sw    r5, offset(r3)

Page 32:

Software handlers

Recovery: Out-of-order RAW
– Undo the modifications using data from log

Retrieval: Some in-order RAWs
– The exposed load needs to dig the version out of the log
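A retrieval handler could look roughly like the sketch below. It rests on assumptions that the slide does not state: task-IDs reflect sequential task order, each log record is a `(vaddr, owner task-ID, value)` tuple, and the right version for a consumer is the one produced by the closest predecessor task. The function name `retrieve_version` is hypothetical.

```python
def retrieve_version(cache, log, addr, consumer):
    """Return (owner, value) of the newest local version of `addr` produced
    by a task that precedes `consumer`, or None if there is no such version."""
    candidates = [(owner, value) for a, owner, value in log if a == addr]
    if addr in cache:
        candidates.append(cache[addr])      # the last version lives in the cache
    older = [c for c in candidates if c[0] < consumer]
    return max(older, key=lambda c: c[0]) if older else None


cache = {0x400: (8, "value2")}              # last version, written by Task 8
log = [(0x400, 5, "value1")]                # non-last version, written by Task 5
print(retrieve_version(cache, log, 0x400, 6))   # Task 6 must see Task 5's version
```

A load from Task 9 would be served by the cached version, so the log is searched only when the consumer precedes the owner of the last version.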

Page 33:

Stores can cause squashes

Stores can produce squashes of tasks that loaded a value prematurely
– out-of-order RAW

[Figure: tasks 3, 5, 6, 4, with a load and a store marked; cache; memory; network]

Page 34:

3 Support for multiple versions

[Figure: tasks 3, 5, 6, 4, 7, 8; cache; memory; network]

Page 35:

Logs help manage the overflow area

Logs hide away past versions of vars
Overflow area and cache have the latest version
– The processor will request the latest version

[Figure: the cache and the overflow area hold the latest version; the Undo Log holds past versions for Task 4, Task 7 and Task 10, with the overwriting task marked]

Page 36:

Problem: Address time stamp in software

The time stamp is not mapped in virtual space
How to make the time stamp visible to the sw?

Logging inst:

            lw    r3, addr_TS?
            sw    r5, offset(r3)

Undo Log record: Vaddr | Version | Value

Page 37:

Problem: Address time stamp in software

OS copies
– Data page in even page
– Time stamp page in next odd page

Logging inst:

            lh_ts r3, offset(r3)
            sw    r5, offset(r3)

Undo Log record: Vaddr | Version | Value
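With data in an even page and its time stamps in the next odd page, the time-stamp location follows from the data address by simple arithmetic. The sketch below assumes that the pairing applies to physical page frames, that pages are 4 KB, and that a time stamp sits at the same page offset as its data word; none of these details is stated on the slide, and `time_stamp_paddr` is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumed page size


def time_stamp_paddr(data_paddr):
    """Map a data address in an even page to the same offset in the next
    (odd) page, where its time stamp is assumed to live."""
    frame = data_paddr // PAGE_SIZE
    if frame % 2 != 0:
        raise ValueError("data is expected to live in an even page")
    return data_paddr + PAGE_SIZE


print(hex(time_stamp_paddr(2 * PAGE_SIZE + 8)))  # same offset, frame 3
```

Under this layout a single address translation serves both the data and its time stamp, so an instruction such as lh_ts only needs the virtual address of the data.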

Page 38:

Log Sizes

Columns: Appl | Log size/Task (KB): All, Filter | # Tasks in Undo Log per Processor (Recycle): Maximum, Average

Apsi      184     40
Dsmc3d    56.7    18.2
P3m
Track     0.3     0.3     6     2

[Values whose row and column cannot be recovered from the extraction: 1 1 1, 2100, 250, 4, 24]

Page 39:

Logging under exposed loads

A local version can be killed with an exposed load
Hardware must detect it and send an interrupt

[Figure: tasks 3, 5, 6, 4, with a load marked; cache; memory; network]

Page 40:

Loads find correct version

On an exposed load
– the speculation protocol finds the correct version
– provides it to the consumer task

[Figure: tasks 3, 5, 6, 4, with a load marked; cache; memory; network]

