EazyHTM : Eager-Lazy Hardware Transactional Memory

Date post: 23-Feb-2016
EazyHTM: Eager-Lazy Hardware Transactional Memory

Saša Tomić, Cristian Perfumo, Chinmay Kulkarni,

Adrià Armejach, Adrián Cristal, Osman Unsal, Tim Harris, Mateo Valero

Barcelona Supercomputing Center, UPC

BITS Pilani

Microsoft Research Cambridge


Why Transactional Memory?
• Lock-based parallel programming has problems
  – Deadlocks, races, complexity, performance, …
• Transactional Memory (TM) to the rescue
  – Optimistic concurrency control mechanism
  – Easy to use
  – Deadlock-free
  – Supports composability
  – Protects data in critical sections
• Hardware TM (HTM), software TM (STM), and hybrid TM


HTM terminology
• Atomic section/transaction: a group of instructions that appears to take effect instantaneously
• Version management (where speculative values are stored):
  – in place, logging the original value, or
  – buffered in private storage and published on commit
• Conflict: a TX writes data that another TX reads or writes
  – Detection: the action of checking for conflicts
  – Resolution: the action taken to resolve a conflict, e.g. aborting a TX, stalling its execution, …
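A minimal Python sketch can make the two version-management options concrete. It is a software toy rather than a hardware design: memory is a plain dict, conflict detection is left out entirely, and the class names `UndoLogTx` and `WriteBufferTx` are hypothetical.

```python
# Hypothetical toy model of the two HTM version-management schemes.
# Memory is a dict; conflict detection is deliberately not modelled.

_ABSENT = object()  # marks an address that did not exist before a write


class UndoLogTx:
    """In-place versioning: write directly to memory, log the old value."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []  # (address, previous value) in write order

    def read(self, addr):
        return self.memory.get(addr)

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory.get(addr, _ABSENT)))
        self.memory[addr] = value  # the speculative value lives in place

    def commit(self):
        self.undo_log.clear()  # nothing to publish: commit is cheap

    def abort(self):
        # Roll back newest-first; the cost grows with the write set.
        for addr, old in reversed(self.undo_log):
            if old is _ABSENT:
                del self.memory[addr]
            else:
                self.memory[addr] = old
        self.undo_log.clear()


class WriteBufferTx:
    """Buffered versioning: keep writes private, publish them on commit."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = {}  # address -> speculative value

    def read(self, addr):
        # A TX must see its own buffered writes first.
        if addr in self.buffer:
            return self.buffer[addr]
        return self.memory.get(addr)

    def write(self, addr, value):
        self.buffer[addr] = value  # shared memory is untouched until commit

    def commit(self):
        self.memory.update(self.buffer)  # publish; cost grows with the write set
        self.buffer.clear()

    def abort(self):
        self.buffer.clear()  # discard speculative values: abort is cheap


mem = {"x": 1}
tx = UndoLogTx(mem)
tx.write("x", 2)
tx.write("y", 3)
tx.abort()  # mem is back to {"x": 1}
```

In this toy, the in-place scheme commits by discarding its log but must walk the log to abort, while the buffered scheme aborts by discarding its buffer but must copy the buffer out to commit.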


Eager HTM
• A.k.a. pessimistic
• Writes in place; detects and resolves conflicts on every access
• LogTM [Moore, HPCA06], LogTM-SE [Yen, HPCA07]
[Figure: timeline of TX 1, TX 2 and TX 3 with a write (W), two reads (R), a stall, and a fast commit]
• Fast commit
• Slow abort
• Limited concurrency
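Eager detection can be sketched as a check performed on every transactional access, with stalling as the resolution policy. This is an illustrative assumption, not LogTM's actual mechanism: `EagerConflictTable` is a hypothetical name, conflicts are checked against a global table instead of through the coherence protocol, and no policy for breaking cycles of stalled TXs is modelled.

```python
# Hypothetical sketch: eager conflict detection on every access,
# resolved by stalling the requester. No deadlock handling is modelled.


class EagerConflictTable:
    """Checks every transactional access against the other active TXs."""

    def __init__(self):
        self.readers = {}  # line -> ids of TXs that read it
        self.writers = {}  # line -> ids of TXs that wrote it

    def _conflicts(self, tx, line, is_write):
        others = set(self.writers.get(line, set()))
        if is_write:  # a write also conflicts with other readers
            others |= self.readers.get(line, set())
        others.discard(tx)
        return others

    def access(self, tx, line, is_write):
        """Return True if the access proceeds, False if `tx` must stall."""
        if self._conflicts(tx, line, is_write):
            return False  # detected and resolved (by stalling) immediately
        table = self.writers if is_write else self.readers
        table.setdefault(line, set()).add(tx)
        return True

    def finish(self, tx):
        """On commit or abort, drop `tx` from every read and write set."""
        for table in (self.readers, self.writers):
            for holders in table.values():
                holders.discard(tx)


table = EagerConflictTable()
table.access(2, "A", is_write=False)
table.access(3, "A", is_write=False)
stalled = not table.access(1, "A", is_write=True)  # True: TX 1 must wait
```

Because the check happens before the write is allowed, a committing TX has nothing left to validate, which is why the slide lists a fast commit; the price is that a conflicting TX waits even if the other one would later abort anyway.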


Lazy HTM
• A.k.a. optimistic
• Writes buffered; detects and resolves conflicts on commit
• TCC [Hammond, ISCA04], Scalable-TCC [Chafi, HPCA07]
[Figure: timeline of TX 1, TX 2 and TX 3 with a write (W), two reads (R), and a complex commit (validate + write)]
• Fast abort
• Complex commit
• Good concurrency
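Lazy detection and resolution can likewise be sketched as a single commit-time step that first validates (finds every TX whose reads or writes overlap the committer's writes), aborts those TXs, and then writes. The committer-wins policy, the `LazyTx` record and the `lazy_commit` function are assumptions made for illustration; the sketch does not model how TCC or Scalable-TCC order concurrent commits.

```python
# Hypothetical sketch of lazy (commit-time) detection and resolution.
from dataclasses import dataclass, field


@dataclass
class LazyTx:
    tx_id: int
    read_set: set = field(default_factory=set)
    write_buffer: dict = field(default_factory=dict)  # line -> new value
    aborted: bool = False


def lazy_commit(committer, active, memory):
    """Validate, then write. Returns the ids of the TXs aborted by this commit."""
    written = set(committer.write_buffer)
    victims = []
    for tx in active:
        if tx is committer or tx.aborted:
            continue
        # Validation: any overlap with our writes is a conflict found only now.
        if written & (tx.read_set | set(tx.write_buffer)):
            tx.aborted = True  # resolution (assumed policy): the committer wins
            victims.append(tx.tx_id)
    memory.update(committer.write_buffer)  # write: publish buffered values
    return victims


memory = {}
t0 = LazyTx(0, read_set={"A"})
t2 = LazyTx(2, write_buffer={"A": 7})
t3 = LazyTx(3, read_set={"B"})
lazy_commit(t2, [t0, t2, t3], memory)  # returns [0]; t3 keeps running
```

Until the commit, TX 0 and TX 2 run concurrently even though they touch the same line, which is the concurrency benefit; the cost is that the commit must do the validation work.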


The Motivation: Splitting conflict management
• Eager-lazy hardware-software TM exists (FlexTM [Shriraman, ISCA08]):
  – Software begin, commit and abort
  – Probabilistic (signature-based) conflict detection
• EazyHTM is the first pure-hardware eager-lazy TM

                      Conflict resolution
Conflict detection    Eager         Lazy
Eager                 LogTM         EazyHTM (fast commit, good concurrency)
Lazy                  Impossible    TCC, S-TCC


Outline
• Motivation
• Contributions
• Hardware changes
• The Protocol
• Evaluation
• Conclusions


EazyHTM Contributions
• The best of both worlds
  – Eager conflict detection: simple commit, with the exact list of conflicts known in advance
  – Lazy conflict resolution: good concurrency
• Parallel commits of non-conflicting TXs
• Designed for CMPs (chip multiprocessors)
  – Exploits the proximity of cores
  – Upgrade of the MESI/MOESI protocol (easier verification)


Hardware changes
[Figure: a multicore (core, core, core, …) in which each CPU, its private cache(s), and the directory gain the state listed below]
• CPU: register file checkpoint, racers list, killers list
  – Racers list: 1 bit per core
  – Killers list: 1 bit per core
  – Both track conflicts as bit-vectors (32 bits for 32 cores)
• Private cache(s), alongside the existing cache logic:
  – SR: 1 bit per line
  – SM: 1 bit per line
  – Together they hold the read/write set
• Directory, alongside the existing directory logic:
  – TD: 1 bit per line, a read-only optimization bit (details in the paper)
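The per-core state above can be modelled in a few lines of Python. This is a hypothetical software model, not the hardware: the racers and killers lists are integer bit-vectors with one bit per core (32 cores, as on the slide), and the sketch assumes that SR marks lines in the read set and SM marks lines in the write set. The register file checkpoint and the directory's TD bit are not modelled.

```python
# Hypothetical model of the per-core EazyHTM additions (not the hardware).
# Assumption: SR marks read-set lines, SM marks write-set lines.

NUM_CORES = 32  # one bit per core: 32 bits for 32 cores


class CoreTxState:
    def __init__(self):
        self.racers = 0   # bit i: "TX on core i must abort if I commit first"
        self.killers = 0  # bit i: "TX on core i can abort me if it commits first"
        self.sr = set()   # lines whose SR bit is set (assumed: read set)
        self.sm = set()   # lines whose SM bit is set (assumed: write set)

    @staticmethod
    def _bit(core):
        if not 0 <= core < NUM_CORES:
            raise ValueError(f"core {core} out of range")
        return 1 << core

    def add_racer(self, core):
        self.racers |= self._bit(core)

    def add_killer(self, core):
        self.killers |= self._bit(core)

    @staticmethod
    def cores_in(vector):
        """Decode a bit-vector into the list of core ids it contains."""
        return [c for c in range(NUM_CORES) if (vector >> c) & 1]

    def reset(self):
        """Clear both lists and the read/write set (after commit or abort)."""
        self.racers = self.killers = 0
        self.sr.clear()
        self.sm.clear()


state = CoreTxState()
state.add_racer(0)
state.add_killer(2)
CoreTxState.cores_in(state.racers)  # [0]
```

A fixed-width bit-vector keeps the per-core cost constant (here 2 × 32 bits), at the price of tying the list size to the core count.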


Racers and killers lists
• If a line is shared between two TXs:
  – Read-Read: no conflict
  – Write-Read, Read-Write, Write-Write:
    • The writer adds the reader TX to its "racers" list
      – "TXs that I have to abort if I commit first"
    • The reader adds the writer TX to its "killers" list
      – "TXs that can abort me if they commit first"
• We illustrate only the Write-after-Read (WAR) conflict
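The list-update rule can be expressed as a function over complete read and write sets. The hardware builds the lists incrementally as accesses happen; this hypothetical `build_conflict_lists` computes the end result in one pass. For Write-Write sharing it assumes that each TX acts as the writer for the other, so both TXs end up in each other's racers and killers lists.

```python
# Hypothetical one-pass version of the racers/killers rule.
# Assumption: Write-Write sharing is symmetric (each TX is a writer for the other).


def build_conflict_lists(read_sets, write_sets):
    """read_sets / write_sets map a TX id to the set of lines it read / wrote.

    Returns (racers, killers): dicts mapping a TX id to a set of TX ids.
    """
    txs = set(read_sets) | set(write_sets)
    racers = {t: set() for t in txs}
    killers = {t: set() for t in txs}
    for writer in txs:
        written = write_sets.get(writer, set())
        for other in txs - {writer}:
            accessed = read_sets.get(other, set()) | write_sets.get(other, set())
            if written & accessed:  # Read-Read sharing never reaches here
                racers[writer].add(other)   # I abort `other` if I commit first
                killers[other].add(writer)  # `writer` can abort `other`
    return racers, killers


# The WAR case from the slides: TX 0 reads A, TX 2 writes A.
racers, killers = build_conflict_lists({0: {"A"}}, {2: {"A"}})
# racers == {0: set(), 2: {0}}; killers == {0: {2}, 2: set()}
```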


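EazyHTM Protocol: Conflict Detection (1/2)
[Figure: TX 0 runs BTX, RD A, …, CTX and TX 2 runs BTX, WR A, CTX; each TX has a racers list and a killers list, and the directory tracks the sharers of @A]
1. TX 0 reads A and sends txMark @A to the directory (txMark replaces GETS/GETX)
2. The directory answers ACK @A, 0: no other sharers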


EazyHTM Protocol: Conflict Detection (2/2)
[Figure: the same two TXs, their racers/killers lists, and the directory sharers of @A]
• TX 2 writes A and sends txMark @A to the directory
• The directory answers ACK @A, 1 (1 other sharer) and sends txAccessor #2, @A to TX 0
• Potential conflict: the two TXs exchange Reader #0, @A and Writer #2, @A
• TX 2 remembers: abort TX#0 on commit (racers list)
• TX 0 remembers: TX#2 can abort me (killers list)
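The message exchange on these two slides can be approximated in software. In this sketch (class and method names are hypothetical), `tx_mark` plays the role of the txMark request and returns the sharer count carried by the ACK; the txAccessor notification and the Reader/Writer replies are collapsed into a direct update of both TXs' lists, and removing a TX from the sharer set on commit or abort is omitted.

```python
# Hypothetical software approximation of EazyHTM conflict detection.
# txAccessor and the Reader/Writer replies are collapsed into list updates.
from collections import defaultdict


class Directory:
    """Toy directory that records, per line, which TXs hold it and how."""

    def __init__(self):
        self.sharers = {}  # line -> {tx id: "R" or "W"}

    def tx_mark(self, tx, line, is_write, racers, killers):
        """Handle `txMark @line` from `tx`; return the count sent in the ACK."""
        holders = self.sharers.setdefault(line, {})
        if is_write or holders.get(tx) != "W":
            holders[tx] = "W" if is_write else "R"  # a writer stays a writer
        others = {t: m for t, m in holders.items() if t != tx}
        for other, other_mode in others.items():
            # Stands in for txAccessor plus the Reader/Writer replies.
            if holders[tx] == "W":   # tx wrote: it can abort `other`
                racers[tx].add(other)
                killers[other].add(tx)
            if other_mode == "W":    # other wrote: it can abort tx
                racers[other].add(tx)
                killers[tx].add(other)
        return len(others)  # ACK @line, <number of other sharers>


racers, killers = defaultdict(set), defaultdict(set)
directory = Directory()
directory.tx_mark(0, "A", False, racers, killers)  # ACK @A, 0
directory.tx_mark(2, "A", True, racers, killers)   # ACK @A, 1
# racers[2] == {0} and killers[0] == {2}, as on the slide
```

Detection piggybacks on the directory lookup that a coherence request needs anyway, so when the ACK reports 0 other sharers no extra messages are exchanged.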


EazyHTM Protocol: Conflict Resolution
[Figure: TX 0 (BTX, RD A, CTX) and TX 2 (BTX, WR A, CTX), their racers/killers lists, and the directory sharers of @A]
• TX#2 reached the commit point first, so it aborts TX#0:
  – TX 2 sends Abort to TX 0 and commits its write (WR @A)
  – TX 0 replies with Abort Ack
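Commit-time resolution then reduces to walking the racers list. The `commit` function below is a hypothetical sketch: `send_abort` stands in for the Abort message and returns True once the victim's Abort Ack arrives, and removing finished TXs from everyone else's lists is an assumed bookkeeping step that the slide does not show. The sketch also collects every Abort Ack before returning; the slide does not say whether the write-back overlaps with the acknowledgements.

```python
# Hypothetical sketch of EazyHTM-style commit: abort everyone on my racers list.


def commit(tx, racers, killers, send_abort):
    """Commit `tx` by aborting every TX on its racers list.

    racers / killers map a TX id to a set of TX ids. send_abort(victim)
    models sending Abort and returns True once the victim's Abort Ack arrives.
    Returns the sorted list of aborted TX ids.
    """
    victims = sorted(racers.get(tx, set()))
    for victim in victims:
        if not send_abort(victim):
            raise RuntimeError(f"no Abort Ack from TX#{victim}")
    finished = set(victims) | {tx}
    for done in finished:  # aborted and committed TXs leave the system
        racers.pop(done, None)
        killers.pop(done, None)
    for lists in (racers, killers):  # assumed cleanup of stale entries
        for entries in lists.values():
            entries -= finished
    return victims


racers = {0: set(), 2: {0}}
killers = {0: {2}, 2: set()}
commit(2, racers, killers, send_abort=lambda victim: True)  # returns [0]
```

Because the racers list was filled in eagerly, the committer already knows exactly whom to abort and needs no validation pass over other TXs' read sets.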


EazyHTM Protocol: Disjoint data => parallel commit
[Figure: TX 0 (BTX, WR A, CTX) and TX 2 (BTX, WR B, CTX), their racers/killers lists, and the directory sharers of @A and @B]
• TX#0 works with line @A; TX#2 works with line @B
  – TX 0 sends txMark @A and receives ACK @A, 0 (0 other sharers)
  – TX 2 sends txMark @B and receives ACK @B, 0 (0 other sharers)
• Neither TX appears in the other's racers list, so both commit (WR @A, WR @B) in parallel: NO SERIALIZATION
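The condition that lets both commits proceed can be written down directly: two TXs must be ordered only if one wrote a line that the other read or wrote, which is exactly when one of them would appear in the other's racers list. The helper names below are hypothetical and operate on plain read/write sets.

```python
# Hypothetical helpers: when may TXs commit without serializing?
from itertools import combinations


def must_serialize(a, b):
    """True if one TX wrote a line the other read or wrote.

    Each TX is a (read_set, write_set) pair; a True result means one TX
    would sit in the other's racers list.
    """
    (read_a, write_a), (read_b, write_b) = a, b
    return bool(write_a & (read_b | write_b) or write_b & (read_a | write_a))


def can_commit_in_parallel(txs):
    """True if no pair of TXs in `txs` conflicts."""
    return not any(must_serialize(a, b) for a, b in combinations(txs, 2))


tx0 = (set(), {"A"})  # TX#0 writes line @A
tx2 = (set(), {"B"})  # TX#2 writes line @B
can_commit_in_parallel([tx0, tx2])  # True: disjoint data, no serialization
```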


Implementation
• Implemented in M5, a full-system simulator (Alpha)
• Private L1: 32KB, 4-way, 64B cache lines, 2 cycles
• Private L2: 512KB, 8-way, 64B cache lines, 10 cycles
• Memory (with directory): 100 cycles
• ICN (interconnect): 2D mesh, 10 cycles per hop
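As a worked example of this configuration, the number of lines and sets follows from cache size, associativity, and line size. The sketch also estimates the SR/SM storage under the assumption (not stated on the slide) that both bits are added to every line of both private caches; `cache_geometry` is a hypothetical helper.

```python
# Worked example of the simulated cache geometry (hypothetical helper).


def cache_geometry(size_bytes, ways, line_bytes):
    """Return (lines, sets, offset_bits, index_bits) for a set-associative cache."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2, valid for powers of two
    index_bits = sets.bit_length() - 1         # log2, valid for powers of two
    return lines, sets, offset_bits, index_bits


L1 = cache_geometry(32 * 1024, 4, 64)    # (512, 128, 6, 7)
L2 = cache_geometry(512 * 1024, 8, 64)   # (8192, 1024, 6, 10)

# Assumption: SR and SM (1 bit each per line) are added to every private line.
SR_SM_BITS = 2 * (L1[0] + L2[0])         # 2 * 8704 = 17408 bits (2176 bytes)
```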


Evaluation
• Evaluated with the STAMP benchmarks
• Compared with a Scalable-TCC-like HTM
  – Same base simulator
  – Implemented a specialized directory protocol
• Compared with an ideal lazy HTM (MESI-based)
  – Magical conflict detection
  – Instant conflict resolution
  – Parallel write-back commit


Kmeans-Low
• Small TXs (read set: 15 cache lines; write set: 5 cache lines)
• Low contention (10% aborts)
• Similar profile to "replacing locks with atomic sections"
• Near-ideal performance
• K-means: groups points in an N-dimensional space into K clusters
• Most of the SPLASH-2 suite has a similar profile
[Chart: Kmeans-Low, speedup vs. processors, for Ideal, EazyHTM and STCC]


SSCA2
• Small TXs (read set: 50 cache lines; write set: 10 cache lines)
• Low contention (1.2% aborts)
• Near-ideal performance
• Scalability limited by barriers, not by contention
• SSCA2: operations on a large directed graph
[Chart: SSCA2, speedup vs. processors, for Ideal, EazyHTM and STCC]


Yada
• Large TXs (read set: 260 cache lines; write set: 140 cache lines)
• Moderate contention (35% aborts)
• Good performance for large TXs as well
• Yada: Delaunay mesh refinement
[Chart: Yada, speedup vs. processors, for Ideal, EazyHTM and STCC]


Intruder
• Medium TXs (read set: 53 cache lines; write set: 20 cache lines)
• High contention (85% aborts)
• Very poor scalability for all HTMs
• Every transaction detects conflicts over and over again; the many conflict-detection messages slow down execution
• Intruder: signature-based network intrusion detection system
[Chart: Intruder, speedup vs. processors, for Ideal, EazyHTM and STCC]


Only high-conflict STAMP
• Only runs with an abort rate above 50%
• High-contention, high-core-count runs should be optimized
• Averages over:
  – Labyrinth
  – Intruder
  – Kmeans-Hi
• Results heavily affected by Intruder
[Chart: High-conflict STAMP, speedup vs. processors, for Ideal, EazyHTM and STCC]


Only low-conflict STAMP
• Only runs with an abort rate below 50%
• A low abort rate is necessary for scaling
• Excludes:
  – Labyrinth at 8-32 processors
  – Intruder at 16-32 processors
  – Kmeans-Hi at 32 processors
[Chart: Scaling STAMP, speedup vs. processors, for Ideal, EazyHTM and STCC]


Conclusions
• Introduced EazyHTM, a new HTM implementation
  – Eager conflict detection, lazy conflict resolution
  – Fast: performs well for low-conflict parallel applications
  – Minimal changes to directory protocols (easier verification)
  – As scalable as a standard directory protocol
• The EazyHTM mechanism could allow (future work):
  – Simpler transaction prioritization
  – Less wasted work
  – Better performance optimization
  – Power-efficient TM mechanisms


Thank you!

Questions?
