Date post: 19-Dec-2015
Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael Stumm By : Priya Limaye
Page 1

Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System

Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael Stumm

By : Priya Limaye

Pages 2-5

Locality

• What is locality of reference?

sum = 0;
for (int i = 0; i < 10; i++) {
    sum = sum + number[i];
}

Temporal locality: recently accessed data and instructions are likely to be accessed again in the near future. In the loop, sum and i are reused on every iteration.

Spatial locality: data and instructions close to recently accessed data and instructions are likely to be accessed in the near future. In the loop, number[i] walks through adjacent array elements.

Page 6

Locality

• What is locality of reference?
– Recently accessed data and instructions, and nearby data and instructions, are likely to be accessed in the near future.
– Grab a larger chunk than you immediately need
– Once you’ve grabbed a chunk, keep it
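The effect of spatial locality can be sketched with two traversals of the same matrix. This is an illustrative example only: the helper names sum_row_major and sum_col_major are hypothetical and do not come from the slides. Both functions return the same sum, but the first visits adjacent elements while the second strides across rows.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums a rows x cols matrix stored row by row in one vector.
// The inner loop touches adjacent elements, so it has good spatial locality.
long sum_row_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long sum = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            sum += m[r * cols + c];
        }
    }
    return sum;
}

// Same result, but the inner loop jumps `cols` elements at a time, so on a
// large matrix consecutive accesses tend to land in different cache lines.
long sum_col_major(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long sum = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            sum += m[r * cols + c];
        }
    }
    return sum;
}
```

On a matrix much larger than the cache, the row-major version would typically be expected to run faster, although the exact difference depends on the hardware.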

Page 7

Locality in multiprocessor

• Computation depends on data local to the processor
– Each processor uses data from its own cache
– Once data is brought into the cache, it stays there

Page 8

Locality in multiprocessor

[Diagram: two CPUs, each with its own cache, connected to shared memory that holds a counter.]

Pages 9-13

Counter: Shared

[Diagram sequence: two CPUs share one counter held in memory.]
1. The counter in memory starts at 0.
2. A CPU loads the counter (0) into its cache.
3. The counter is incremented to 1.
4. The other CPU reads the value 1. Read: OK.
5. The counter is updated to 2, and the stale cached copy is invalidated.
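A minimal sketch of the shared counter follows. The class name SharedCounter is hypothetical, and std::atomic stands in for whatever atomic update the original system used. Every processor updates the same memory location, so each increment has to pull the cache line away from whichever cache last wrote it.

```cpp
#include <atomic>

// Hypothetical illustration: one counter shared by all processors.
class SharedCounter {
public:
    // Every caller, on any processor, modifies the same cache line.
    void increment() { value_.fetch_add(1, std::memory_order_relaxed); }

    // A read only needs one load, which is the cheap side of this design.
    long read() const { return value_.load(std::memory_order_relaxed); }

private:
    std::atomic<long> value_{0};
};
```

Reads are cheap here, but concurrent writers serialize on the single cache line, which matches the invalidation traffic the slides describe.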

Page 14

Comparing counter

[Chart annotations]
1. Scales well with old architecture
2. Performs worse with shared memory multiprocessor

Page 15

Counter: Array

• Sharing requires moving data back and forth between CPU caches
• Split counter into array
• Each CPU gets its own counter

Pages 16-19

Counter: Array

[Diagram sequence: the counter in memory is an array with one element per CPU.]
1. Both elements start at 0.
2. One CPU increments its own element to 1.
3. The other CPU increments its own element to 1.
4. Read counter: a CPU adds all the elements (1 + 1) to get 2.
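The array counter can be sketched as below. The names ArrayCounter and kNumCpus are assumptions made for illustration, and the caller is assumed to pass in the index of the processor it runs on. An increment touches only the caller's slot, while a read has to add up every slot.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

constexpr std::size_t kNumCpus = 4;  // assumed processor count, for illustration

// Hypothetical illustration: one counter slot per processor.
class ArrayCounter {
public:
    // Each processor increments only its own slot.
    void increment(std::size_t cpu) {
        slots_[cpu].fetch_add(1, std::memory_order_relaxed);
    }

    // A read must visit every slot and add the values together.
    long read() const {
        long total = 0;
        for (const auto& slot : slots_) {
            total += slot.load(std::memory_order_relaxed);
        }
        return total;
    }

private:
    // Slots are packed next to each other, so several of them typically
    // end up in the same cache line.
    std::array<std::atomic<long>, kNumCpus> slots_{};
};
```

The design trades a more expensive read for cheaper updates. Because the slots are packed contiguously, this version is still exposed to false sharing.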

Page 20

Counter: Array

• This solves the problem
• What about performance?

Page 21

Comparing counter

The array counter does not perform better than the ‘shared counter’.

Page 22

Counter: Array

• This solves the problem
• What about performance?
• What about false sharing?

Pages 23-28

Counter: False Sharing

[Diagram sequence: both array elements (0,0) sit in the same cache line.]
1. One CPU loads the line (0,0) into its cache.
2. The other CPU loads the same line; the line is now shared.
3. One CPU increments its element (1,0), which invalidates the other CPU's copy.
4. The other CPU reloads the line (1,0); the line is shared again.
5. That CPU increments its element (1,1), which invalidates the first CPU's copy.

Page 29

Solution?

• Use padded array
• Different elements map to different cache lines

Pages 30-31

Counter: Padded Array

[Diagram sequence: each array element is padded so that it occupies its own cache line.]
1. Both elements start at 0.
2. Each CPU increments its own element to 1. The updates are independent of each other.
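A padded array can be sketched by aligning each slot to the cache line size. The 64-byte line size, the processor count, and the names PaddedSlot and PaddedArrayCounter are all assumptions for illustration; the real line size depends on the hardware.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

constexpr std::size_t kNumCpus = 4;         // assumed processor count
constexpr std::size_t kCacheLineSize = 64;  // assumed line size; hardware dependent

// alignas rounds the struct up to a full cache line, so no two slots share one.
struct alignas(kCacheLineSize) PaddedSlot {
    std::atomic<long> value{0};
};
static_assert(sizeof(PaddedSlot) == kCacheLineSize,
              "each slot should occupy exactly one assumed cache line");

// Hypothetical illustration: per-processor counter without false sharing.
class PaddedArrayCounter {
public:
    // An increment touches only the caller's own cache line.
    void increment(std::size_t cpu) {
        slots_[cpu].value.fetch_add(1, std::memory_order_relaxed);
    }

    // A read still has to add up every slot.
    long read() const {
        long total = 0;
        for (const auto& slot : slots_) {
            total += slot.value.load(std::memory_order_relaxed);
        }
        return total;
    }

private:
    std::array<PaddedSlot, kNumCpus> slots_{};
};
```

The cost is memory: each counter slot now occupies a whole line rather than a few bytes.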

Page 32

Comparing counter

The padded array works better.

Page 33

Locality in OS

• Serious performance impact
• Difficult to retrofit
• Tornado
– Ground up design
– Object Oriented approach
– Natural locality

Page 34

Tornado

• Object Oriented Approach
• Clustered Objects
• Protected Procedure Call
• Semi-automatic garbage collection
– Simplified locking protocol

Pages 35-38

Object Oriented Approach

[Diagram sequence: Process 1 and Process 2 and the process table. Successive frames add, in order: a lock taken for Process 1, a request from Process 2, and a second lock for Process 2.]

Page 39

Object Oriented Approach

class ProcessTableEntry {
    data
    lock
    code
}

Page 40

Object Oriented Approach

• Each resource is represented by a different object
• Requests to virtual resources are handled independently
– No shared data structure access
– No shared locks
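The idea of one object per resource, each with its own lock, can be sketched as follows. The member names (pid_, state_, set_state) are hypothetical and chosen only for illustration; the slides give only the outline data, lock, code.

```cpp
#include <mutex>
#include <string>

// Hypothetical illustration: each process table entry carries its own data
// and its own lock, so work on one entry never contends with another entry.
class ProcessTableEntry {
public:
    explicit ProcessTableEntry(int pid) : pid_(pid) {}

    int pid() const { return pid_; }

    // Only this entry's lock is taken; no table-wide lock exists.
    void set_state(const std::string& new_state) {
        std::lock_guard<std::mutex> guard(lock_);
        state_ = new_state;
    }

    std::string state() const {
        std::lock_guard<std::mutex> guard(lock_);
        return state_;
    }

private:
    const int pid_;
    mutable std::mutex lock_;        // per-object lock
    std::string state_ = "ready";    // per-object data
};
```

Two requests that target different entries take different locks, so they can proceed in parallel.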

Pages 41-45

Object Oriented Approach

[Diagram sequence: objects involved in handling a page fault exception.]
• Process: receives the page fault exception and searches for the responsible region
• Region: two shown, each paired with an FCM
• FCM (File Cache Manager): two shown, each paired with a COR
• HAT (Hardware Address Translation)
• COR (Cached Object Representative)
• DRAM (memory manager)

Page 46

Object Oriented Approach

• Multiple implementations for system objects
• Dynamically change the objects used for a resource
• Provides foundation for other Tornado features

Page 47

Clustered Objects

• Improve locality for widely shared objects
• Appears as a single object
– Composed of multiple component objects
• Has a representative (‘rep’) for processors
– Defines degree of clustering
• Common clustered object reference for client

Page 48

Clustered Objects

Pages 49-50

Clustered Objects : Implementation

• A translation table per processor
– Located at same virtual address
– Pointer to rep
• Clustered object reference is just a pointer into the table
• ‘reps’ created on demand when first accessed
– Special global miss handling object
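The lookup path above can be sketched in user-level C++. This is a simplified model and not Tornado's implementation: the per-processor translation tables are collapsed into one array indexed by a caller-supplied processor number, the miss handler is a plain member function, and the names ClusteredCounter, CounterRep and rep_for are hypothetical. It uses one rep per processor, which is only one possible degree of clustering.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <memory>

constexpr std::size_t kNumCpus = 4;  // assumed processor count

// One representative (rep) holding the local part of the counter.
struct CounterRep {
    std::atomic<long> value{0};
};

// Hypothetical illustration of a clustered counter. Not safe against a read
// racing with a rep's first creation; a real system needs synchronization there.
class ClusteredCounter {
public:
    // Clients use the same call on every processor; the work goes to the local rep.
    void increment(std::size_t cpu) {
        rep_for(cpu).value.fetch_add(1, std::memory_order_relaxed);
    }

    // Reading coordinates across reps by adding up every rep that exists.
    long read() const {
        long total = 0;
        for (const auto& rep : table_) {
            if (rep) total += rep->value.load(std::memory_order_relaxed);
        }
        return total;
    }

    std::size_t reps_created() const {
        std::size_t n = 0;
        for (const auto& rep : table_) {
            if (rep) ++n;
        }
        return n;
    }

private:
    // Translation step: an empty table entry is a miss, and the rep is
    // created on demand the first time a processor uses the object.
    CounterRep& rep_for(std::size_t cpu) {
        if (!table_[cpu]) {
            table_[cpu] = std::make_unique<CounterRep>();  // stand-in for the miss handler
        }
        return *table_[cpu];
    }

    std::array<std::unique_ptr<CounterRep>, kNumCpus> table_{};
};
```

Processors that never touch the object never pay for a rep, which is the point of creating reps lazily.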

Pages 51-53

Counter: Clustered Object

[Diagram sequence: two CPUs use one object reference to a clustered counter; each CPU has its own rep.]
1. Both CPUs hold the same object reference; each rep holds 1.
2. One CPU increments through the reference; only its own rep changes (rep 2, rep 1).
3. The updates are independent of each other.

Page 54

Clustered Objects

• Degree of clustering
• Multiple reps per object
– How to maintain consistency?
• Coordination between reps
– Shared memory
– Remote PPCs

Pages 55-57

Counter: Clustered Object

[Diagram sequence: reading a clustered counter.]
1. Two CPUs each hold 1 in their own rep.
2. A third CPU issues Read Counter through the object reference.
3. The read adds all the rep counters (1 + 1) and returns 2.

Page 58

Clustered Objects : Benefits

• Facilitates optimizations applied on multiprocessors, e.g. replication and partitioning of data structures
• Preserves object-oriented design
• Enables incremental optimizations
• Can have several different implementations

Page 59

Synchronization

• Two kinds of locking issues
– Locking
– Existence guarantees

Page 60

Synchronization: Locking

• Encapsulate locking within individual objects
• Uses clustered objects to limit contention
• Uses spin-then-block locks
– Highly efficient
– Reduces cost of lock/unlock pair
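The spin-then-block idea can be approximated in portable C++ as shown below. This is a sketch of the general technique rather than Tornado's lock: it spins on std::mutex::try_lock for a bounded number of attempts and then falls back to a blocking lock(). The class name and the spin limit are assumptions.

```cpp
#include <mutex>

// Hypothetical illustration of a spin-then-block lock.
class SpinThenBlockLock {
public:
    void lock() {
        // Spin phase: cheap if the holder releases the lock quickly.
        for (int attempt = 0; attempt < kSpinLimit; ++attempt) {
            if (mutex_.try_lock()) {
                return;  // acquired while spinning
            }
        }
        // Block phase: stop burning cycles and wait for the lock.
        mutex_.lock();
    }

    void unlock() { mutex_.unlock(); }

private:
    static constexpr int kSpinLimit = 100;  // assumed tuning value
    std::mutex mutex_;
};
```

Because the class provides lock() and unlock(), it can be used with std::lock_guard. Short critical sections are usually acquired in the spin phase, while long waits fall through to blocking.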

Page 61

Synchronization: Existence guarantees

• All references to an object protected by lock
– Eliminates races where one thread is accessing the object and another is deallocating it
• Complex global hierarchy of locks
• Tornado: semi-automatic garbage collection
– Clustered object reference can be used any time
– Eliminates need for locks

Page 62

Garbage Collection

• Distinguish between temporary references and persistent references
– Temporary: clustered references held privately
– Persistent: stored in shared memory, can persist beyond the lifetime of a thread

Page 63

Garbage Collection

• Remove all persistent references
– Normal cleanup
• Remove all temporary references
– Event driven kernel
– Maintain counter for each processor
– Delete object if counter is zero
• Destroy object itself
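The three phases above can be sketched as a small model. It is a simplification under stated assumptions: ActivityTracker, RetiredObject and try_destroy are hypothetical names, the processor index is passed in by the caller, and destruction is represented by a flag rather than by freeing memory. The mechanism in the actual system may differ in detail.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

constexpr std::size_t kNumCpus = 4;  // assumed processor count

// One counter per processor tracks operations that may hold temporary references.
class ActivityTracker {
public:
    void begin_operation(std::size_t cpu) { active_[cpu].fetch_add(1); }
    void end_operation(std::size_t cpu) { active_[cpu].fetch_sub(1); }

    // True only when no processor has an operation in flight.
    bool quiescent() const {
        for (const auto& count : active_) {
            if (count.load() != 0) return false;
        }
        return true;
    }

private:
    std::array<std::atomic<int>, kNumCpus> active_{};
};

// Hypothetical object that is waiting to be destroyed.
struct RetiredObject {
    bool persistent_refs_removed = false;  // phase 1 done
    bool destroyed = false;                // phase 3 done
};

// Destroys the object only after persistent references are gone (phase 1)
// and every per-processor counter is zero (phase 2).
bool try_destroy(RetiredObject& object, const ActivityTracker& tracker) {
    if (!object.persistent_refs_removed || !tracker.quiescent()) {
        return false;
    }
    object.destroyed = true;  // stand-in for actually freeing the object
    return true;
}
```

Readers never take a lock on the object; they only bump a counter on their own processor, so the cost of the existence guarantee stays local.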

Pages 64-69

Garbage Collection

[Diagram sequence: a list holding the elements 2, 5, 9.]
1. Process 1 starts a read; the counter is incremented (Counter = 1).
2. Process 2 requests a delete while the read is in progress.
3. The garbage collector checks "if counter = 0"; the counter is 1, so nothing is destroyed yet.
4. Process 1 finishes its read and the counter is decremented (Counter = 0).
5. The garbage collector sees counter = 0 and the element 5 is removed, leaving 2, 9.

Page 70

Interprocess communication

• Uses Protected Procedure Calls
• A call from client object to server object
– Clustered object call that crosses protection domain of client to server
• Advantages
– Client requests serviced on local processor
– Client and server share processors similar to handoff scheduling
– Each client request has one thread in server

Page 71

PPC: Implementation

• On demand creation of server threads
• Maintains list of worker threads
• Implemented as a trap and some queue manipulations
– Dequeue worker thread from ready workers
– Enqueue caller thread on the worker
– Return-from-trap to the server
• Registers are used to pass parameters
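The queue manipulations above can be modelled in ordinary user-level code. This is only a simulation under stated assumptions: there is no trap and no protection domain crossing, the handler runs as a normal function call, a single long argument stands in for register-passed parameters, and the names ServerEntryPoint and Worker are hypothetical.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// A worker records which caller it is currently serving.
struct Worker {
    int caller_id = -1;  // -1 means idle
};

// Hypothetical model of one server entry point on one processor.
class ServerEntryPoint {
public:
    explicit ServerEntryPoint(std::function<long(long)> handler)
        : handler_(std::move(handler)) {}

    // Models one call: take a ready worker, attach the caller, run the
    // server code on the caller's processor, then return the worker.
    long call(int caller_id, long argument) {
        Worker* worker = acquire_worker();
        worker->caller_id = caller_id;     // "enqueue caller thread on the worker"
        long result = handler_(argument);  // server code runs for this caller
        worker->caller_id = -1;
        ready_.push_back(worker);          // worker becomes ready again
        return result;
    }

    std::size_t workers_created() const { return workers_.size(); }

private:
    // Dequeue a ready worker, creating one on demand if none is available.
    Worker* acquire_worker() {
        if (ready_.empty()) {
            workers_.push_back(std::make_unique<Worker>());
            return workers_.back().get();
        }
        Worker* worker = ready_.front();
        ready_.pop_front();
        return worker;
    }

    std::function<long(long)> handler_;
    std::vector<std::unique_ptr<Worker>> workers_;  // owns all workers
    std::deque<Worker*> ready_;                     // idle workers
};
```

Since workers are reused after each call, sequential calls on one processor need only one worker; more are created only when calls overlap.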

Page 72

Performance

Page 73

Performance: summary

• Strong basic design
• Highly scalable
• Locality and locking overhead are major sources of slowdown

Page 74

Conclusion

• Object-oriented approach and clustered objects exploit locality and concurrency

• OO design has some overhead, but it is low compared to the performance advantages

• Tornado scales extremely well and achieves high performance on shared-memory multiprocessors

Page 75

References

• http://web.cecs.pdx.edu/~walpole/class/cs510/papers/05.pdf

• Presentation by Holly Grimes, CS 533, Winter 2008

• http://en.wikipedia.org/wiki/Locality_of_reference

