Competition => Collaboration - University of Washington · 2013. 3. 1.

©2009, 2013 Simon Kahan

Competition => Collaboration
a runtime-centric view of parallel computation

Simon Kahan
skahan@cs.washington.edu

Competition

Multiple entities in contention for limited, indivisible resources or opportunities

Direct Mitigation Techniques

• Take turns
• Share
• Find more dolls

Direct Mitigation Techniques

• Take turns
  – Mutual exclusion
• Share
  – Transactions
• Find more dolls
  – Replication (e.g., of data structures)

Direct Mitigation Techniques

• Take turns
  – Mutual exclusion
  – Delay is linear in concurrency: does not scale
• Share
  – Transactions
  – Aborted work is up to quadratic in concurrency: does not scale
• Find more dolls
  – Replication (e.g., of data structures)
  – Cost ~ maximum concurrency sustained + coherency overheads: does not scale
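
The linear and quadratic claims above can be checked with a toy model. This is a sketch under assumptions that are not in the slides: P threads arrive at once, each critical section takes one time unit, and in each transactional round exactly one attempt commits while the rest abort and retry. The function names are illustrative.

```c
#include <assert.h>

/* Mutual exclusion (assumed model): P threads arrive together and each
 * holds the lock for one time unit, so the last in line waits P-1 units.
 * Delay grows linearly with concurrency. */
unsigned long mutex_worst_wait(unsigned long p) {
    return p > 0 ? p - 1 : 0;
}

/* Transactions (assumed model): in every round all remaining threads
 * attempt optimistically, exactly one commits, the others abort and retry.
 * The total number of aborted attempts is P*(P-1)/2: quadratic. */
unsigned long aborted_attempts(unsigned long p) {
    unsigned long aborted = 0;
    for (unsigned long remaining = p; remaining > 1; remaining--)
        aborted += remaining - 1;   /* everyone but the winner aborts */
    return aborted;
}
```

Under this model, going from 10 to 100 threads multiplies the worst-case lock wait by about 10 and the aborted work by about 100.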

Collaboration

Entities align to reduce contention, increase throughput.

Transform competition to collaboration?

Why won’t these people collaborate?!

Are computers better collaborators?

MTA-2 Processor

[Figure: 128 instruction streams (Stream1 … Stream128) feed the processor. Labels: "Instructions (M A C)"; memory operations "load, store, int_fetch_add"; Arithmetic Pipeline "+ - * /, etc."]

Every clock cycle, a ready instruction may begin execution…

Simplified Cray/Tera MTA-2 System Architecture

[Figure: three CPU boxes and a Memory box, labeled "4,000 Active Threads".]

Be Parallel or Die.

Memory Allocation

[Figure: Application, Heap, and OS layers. The Application calls malloc(), free() on the Heap; the Heap calls sbreak() on the OS.]

Parallel Memory Allocation

[Figure: Application, Heap, and OS layers. Many concurrent malloc(), free() calls from the Application target a single Heap; the Heap calls sbreak() on the OS.]

Replication for Concurrency

[Figure: the Application over three replicated Heaps, each issuing its own sbreak() to the OS.]

Increase heap size to lower sbreak rate

[Figure: the Application over several enlarged Heaps, with sbreak() calls to the OS.]

Q: What’s wrong with this picture?
A: O(P^2) wasted space!

Can collaboration help?

• Idea: apply the ticket line trick!
  – tasks need to “find” each other
  – aggregate their requests into one
  – one “master” task continues; other waits
  – until master finds heap uncontended, repeat process
  – master locks heap, fulfills request, unlocks heap
  – master recursively splits and awakens waiters

• Simon Kahan and Petr Konecny. 2006. "MAMA!": a memory allocator for multithreaded architectures. PPoPP '06.
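
The payoff of aggregation can be sketched sequentially: once requests are combined, the master touches the shared heap once on behalf of everyone and then splits the result. The bump-pointer heap and every name below are hypothetical and chosen for illustration. This is not the MAMA! implementation, and it leaves out the concurrent "find each other" and wake-up steps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump-pointer heap, for illustration only. */
typedef struct {
    size_t next;   /* offset of the first unallocated byte */
    size_t limit;  /* total heap size in bytes */
} bump_heap;

/* Serve n combined requests with ONE update of shared heap state:
 * sum the sizes, claim the whole range once, then split it by giving
 * each request its own offset. Returns 0 on success, or -1 (heap left
 * unchanged) if the aggregate does not fit. */
int serve_aggregate(bump_heap *h, const size_t *sizes,
                    size_t *offsets, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += sizes[i];              /* aggregate the requests */
    if (total > h->limit - h->next)
        return -1;
    size_t base = h->next;
    h->next += total;                   /* the single shared-state update */
    for (size_t i = 0; i < n; i++) {    /* split: one offset per requester */
        offsets[i] = base;
        base += sizes[i];
    }
    return 0;
}
```

Three requests of 8, 16, and 4 bytes on an empty heap would receive offsets 0, 8, and 24 while the heap pointer moves once, from 0 to 28.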

Combining Funnels

[Figure: concurrent, asynchronous, individual malloc and free requests (concurrency: F) enter a “funnel”, a combining data structure. Time through the funnel: lg F.]

Aggregate requests of size at most F are served serially. (Output rate is at most a constant.)

See: “Combining Funnels: a Dynamic Approach to Software Combining”, Nir Shavit, Asaph Zemach, 1999
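
The "lg F" time on the funnel slide matches what pairwise combining gives in an idealized setting. The sketch below assumes synchronous rounds in which requests pair up perfectly. Real combining funnels are asynchronous, so this is only a back-of-the-envelope model, and the function name is illustrative.

```c
#include <assert.h>

/* Number of pairwise-combining rounds needed to reduce f individual
 * requests to a single aggregate, assuming perfect pairing per round. */
unsigned combine_rounds(unsigned f) {
    unsigned rounds = 0;
    while (f > 1) {
        f = (f + 1) / 2;  /* each pair merges; an odd one out advances alone */
        rounds++;
    }
    return rounds;
}
```

For F = 128 this gives 7 rounds, that is, lg 128.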

Aggregates: Pennants for speed

[Figure: single requests (pennants of order 0) are combined pairwise, in successive Combine steps, into larger pennants.]

• Merge is 2 ops:
    T2.left = T1.right
    T1.right = T2
• Balanced
  – Unlike linked lists, supports parallel traversal
• Unique representation
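
The two-operation merge can be written out directly. The sketch assumes the convention the slide's assignments imply: a pennant's root uses only its `right` pointer, which leads to a complete binary tree. The `node` type and the helper `pennant_size` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    struct node *left, *right;
} node;

/* Merge two pennants of equal order k into one pennant of order k+1.
 * Exactly the two pointer writes from the slide; t1 stays the root. */
node *pennant_merge(node *t1, node *t2) {
    t2->left = t1->right;  /* t2 adopts t1's subtree as its left child */
    t1->right = t2;        /* t2, now a complete tree, hangs under t1 */
    return t1;
}

/* Count nodes, to check that a pennant of order k holds 2^k nodes. */
size_t pennant_size(const node *t) {
    return t ? 1 + pennant_size(t->left) + pennant_size(t->right) : 0;
}
```

Merging four single nodes pairwise and then merging the two results yields one pennant of order 2 with four nodes, whatever the sizes of the original requests.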

Tree-Heap

[Figure: a row of pennant slots, several of them NULL.]

while (int_fetch_add(&sem, 1))
    try_combine();
heap_op();
sem = 0;

Allocate tries for corresponding slot; if empty, marches to right.
Free tries for corresponding slot; if full, combines and carries.
It’s just binary arithmetic! Worst-case O(log N); average O(1).
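
The "binary arithmetic" view can be sketched as a binary counter whose digit k is a slot holding either NULL or one pennant of order k. This is a sequential illustration with assumed names (`tree_heap`, `heap_put`, `heap_take`, `SLOTS`), not the structure from the paper. It omits locking and the splitting of oversized pennants.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 8  /* illustrative capacity: pennant orders 0..7 */

typedef struct node { struct node *left, *right; } node;
typedef struct { node *slot[SLOTS]; } tree_heap;

/* Two-op merge of equal-order pennants, as on the pennant slide. */
static node *pennant_merge(node *t1, node *t2) {
    t2->left = t1->right;
    t1->right = t2;
    return t1;
}

/* Free: put pennant p of order k into slot k; while the slot is full,
 * combine with its occupant and carry into the next slot, like adding
 * 2^k to a binary number. Returns the order finally stored, or -1 if
 * the carry ran past the last slot (the merged pennant is then not
 * stored; a caller would have to size SLOTS to avoid this). */
int heap_put(tree_heap *h, node *p, int k) {
    while (k < SLOTS && h->slot[k] != NULL) {
        p = pennant_merge(h->slot[k], p);
        h->slot[k] = NULL;
        k++;
    }
    if (k == SLOTS)
        return -1;
    h->slot[k] = p;
    return k;
}

/* Allocate: try slot k; if empty, march right to the first full slot.
 * Stores the order found in *found and returns the pennant, or NULL. */
node *heap_take(tree_heap *h, int k, int *found) {
    for (; k < SLOTS; k++) {
        if (h->slot[k] != NULL) {
            node *p = h->slot[k];
            h->slot[k] = NULL;
            *found = k;
            return p;
        }
    }
    return NULL;
}
```

Putting four order-0 pennants in turn leaves slots 0 and 1 empty and slot 2 full, mirroring 1 + 1 + 1 + 1 = 100 in binary. As with incrementing a binary counter, most insertions stop at the first slot and only occasionally carry far.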

Instructions vs Delay

[Figure: plot annotations: "MAMA! 1 cpu is highest"; "MTA Malloc 10 cpu highest (instructions)".]

Original MTA malloc vs MAMA

220 MHz MTA-40, 100 streams per processor

[Figure: curves for MTA and MAMA, each labeled 5 and 40.]

General Combining Scheme

Asynchronous (Competitive)
• Arbitrary # computations
• Any number of threads
• Timing of interaction arbitrary
• Chaos!

Synchronous (Collaborative)
• Single computation
• Number of threads is explicit
• Synchronized, exclusive access to data
• Order!

[Figure: flow around a Data Structure, with the label "annihilate (or void fn)."]
• Asynchronous threads make requests…
• combine in funnel…
• aggregate tries lock…
• if fail, circulate: re-enter funnel and try again…
• got lock! Satisfy aggregate in parallel, synchronously…
• release lock…
• de-aggregate, returning results in parallel to requesting threads.
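
The "aggregate tries lock… if fail, circulate" step can be sketched in portable C. C11 `atomic_fetch_add` stands in here for the MTA's `int_fetch_add`, on the assumption that both return the value held before the add. `try_enter` and `leave` are hypothetical names, and the combining done while circulating is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 means the data structure is unlocked.
 * Static storage starts zero-initialized. */
static atomic_int sem;

/* One attempt at the lock: whoever sees the previous value 0 owns the
 * data structure. A caller that gets false would go back into the
 * funnel to combine with other requests and try again later. */
bool try_enter(void) {
    return atomic_fetch_add(&sem, 1) == 0;
}

/* The owner releases by resetting the counter, which also discards the
 * increments left behind by failed attempts. */
void leave(void) {
    atomic_store(&sem, 0);
}
```

Because a failed attempt costs one fetch-and-add and no spinning on the lock, waiting threads are free to spend that time combining.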

Conclusion

• Concurrency often creates competition.
• Competition indicates duplication in need.
• Serializing, transacting, and replicating may only mitigate competition.
• Consider transforming competition to collaboration, aligning common need to get there faster.
