Date post: 28-Dec-2015
Page 1: Behavior of Synchronization Methods in Commonly Used Languages and Systems

Yiannis Nikolakopoulos
ioaniko@chalmers.se

Joint work with: D. Cederman, B. Chatterjee, N. Nguyen, M. Papatriantafilou, P. Tsigas

Distributed Computing and Systems
Chalmers University of Technology
Gothenburg, Sweden

Page 2: Developing a multithreaded application…

The boss wants .NET
The client wants speed… (C++?)
Java is nice
Multicores everywhere

Page 3: Developing a multithreaded application…

The worker threads need to access data
Concurrent Data Structures
Then we need Synchronization.

Page 4: Implementing Concurrent Data Structures

Implementation:
• Coarse Grain Locking
• Fine Grain Locking
• Test And Set
• Array Locks
• And more!

Performance Bottleneck

Page 5: Implementing Concurrent Data Structures

Implementation:
• Coarse Grain Locking
• Fine Grain Locking
• Test And Set
• Array Locks
• And more!
• Lock Free

Runtime System

Hardware platform

Which is the fastest/most scalable?
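
To make two of the listed options concrete, here is a minimal Java sketch of a test-and-set (TAS) spin lock and of the test-and-test-and-set (TTAS) variant named later in the talk. The class names and the small `SpinLockDemo` helper are illustrative assumptions, not the implementations measured in the study.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Test-and-set lock: every acquisition attempt is an atomic write.
class TASLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        // getAndSet returns the previous value; true means another thread holds the lock.
        while (held.getAndSet(true)) {
            Thread.onSpinWait();
        }
    }

    void unlock() {
        held.set(false);
    }
}

// Test-and-test-and-set lock: spin on a read first and attempt
// the atomic write only when the lock looks free.
class TTASLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        while (true) {
            while (held.get()) {
                Thread.onSpinWait();
            }
            if (!held.getAndSet(true)) {
                return;
            }
        }
    }

    void unlock() {
        held.set(false);
    }
}

// Hypothetical helper: several threads increment a plain counter under a lock.
class SpinLockDemo {
    private static long counter; // deliberately not atomic; protected only by the lock

    static long run(Runnable lock, Runnable unlock, int threads, int iterations) {
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    lock.run();
                    try {
                        counter++;
                    } finally {
                        unlock.run();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }
}
```

Both locks give mutual exclusion; they differ in how much cache-coherence traffic waiting threads generate, which is one reason their behavior can diverge across hardware platforms.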

Page 6: Implementing concurrent data structures

Page 7: Problem Statement

• How does the interplay between the above parameters (implementation, runtime system, hardware platform) and the different synchronization methods affect the performance and behavior of concurrent data structures?

Page 8: Outline

Introduction
Experiment Setup
Highlights of Study and Results
Conclusion

Page 9: Which data structures to study?

Represent different levels of contention:
• Queue - 1 or 2 contention points
• Hash table - multiple contention points
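
As an illustration of why a queue has one or two contention points, the following is a sketch of a lock-free queue in the style of Michael and Scott. It is an assumed example, not necessarily the queue used in the study: every enqueue competes on the tail and every dequeue competes on the head.

```java
import java.util.concurrent.atomic.AtomicReference;

// Lock-free FIFO queue sketch. Null values are not supported,
// because dequeue() returns null to signal an empty queue.
class LockFreeQueue<T> {
    private static final class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);

        Node(T value) {
            this.value = value;
        }
    }

    private final AtomicReference<Node<T>> head; // contention point for dequeuers
    private final AtomicReference<Node<T>> tail; // contention point for enqueuers

    LockFreeQueue() {
        Node<T> dummy = new Node<>(null);
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void enqueue(T value) {
        Node<T> node = new Node<>(value);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (next == null) {
                // Try to link the new node after the current last node.
                if (last.next.compareAndSet(null, node)) {
                    // Swing the tail; failure means another thread already helped.
                    tail.compareAndSet(last, node);
                    return;
                }
            } else {
                // The tail is lagging behind; help advance it and retry.
                tail.compareAndSet(last, next);
            }
        }
    }

    T dequeue() {
        while (true) {
            Node<T> first = head.get();
            Node<T> last = tail.get();
            Node<T> next = first.next.get();
            if (next == null) {
                return null; // only the dummy node is left: the queue is empty
            }
            if (first == last) {
                // The tail is lagging behind; help advance it and retry.
                tail.compareAndSet(last, next);
                continue;
            }
            T value = next.value;
            if (head.compareAndSet(first, next)) {
                return value;
            }
        }
    }
}
```

A hash table spreads operations over many buckets instead, so threads collide only when they hit the same bucket.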

Page 10: How do we choose implementation?

Possible criteria:
• Framework dependencies
• Programmability
• “Good” performance

Page 11: Interpreting “good”

• Throughput: The more operations completed per time unit, the better.
• Is this enough?

Page 12: Non-fairness

Page 13: What to measure?

• Throughput: Data structure operations completed per time unit.

Operations by thread i
Average operations per thread
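
The formula these two labels annotated is not preserved in the transcript. As a hedged sketch, one measure built from the same two quantities compares the least and the most active thread against the average, giving a value between 0 and 1 where 1 means every thread completed the same number of operations. The class and method names below are hypothetical, and the exact definition used in the study may differ.

```java
class Fairness {
    // opsPerThread[i] = operations completed by thread i during the measured interval.
    // Assumed definition: min( min_i n_i / average, average / max_i n_i ).
    static double index(long[] opsPerThread) {
        if (opsPerThread.length == 0) {
            throw new IllegalArgumentException("at least one thread is required");
        }
        long min = Long.MAX_VALUE;
        long max = 0;
        long total = 0;
        for (long n : opsPerThread) {
            min = Math.min(min, n);
            max = Math.max(max, n);
            total += n;
        }
        if (max == 0) {
            return 1.0; // assumption: no completed operations counts as trivially fair
        }
        double average = (double) total / opsPerThread.length;
        return Math.min(min / average, average / max);
    }

    // Throughput as on the slide: operations completed per time unit (here, per millisecond).
    static double throughputPerMs(long[] opsPerThread, long intervalMs) {
        long total = 0;
        for (long n : opsPerThread) {
            total += n;
        }
        return (double) total / intervalMs;
    }
}
```

With this definition a starved thread pulls the first ratio toward 0, while a single dominating thread pulls the second ratio toward 0.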

Page 14: Implementation Parameters

Programming Environments: C++, Java, C# (.NET, Mono)

Synchronization Methods:
• All environments: TAS, TTAS, Lock-free, Array lock
• C++: PMutex, Lock-free memory management
• Java: Reentrant, synchronized
• C#: lock construct, Mutex

NUMA Architectures:
• Intel Nehalem, 2 x 6 core (24 HW threads)
• AMD Bulldozer, 4 x 12 core (48 HW threads)

Do they influence fairness?
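
Of the methods listed, the array lock is the one whose design bears most directly on the fairness question, because it grants the lock in arrival order. Below is a minimal Java sketch of an Anderson-style array lock. The names are illustrative, the capacity is an assumed upper bound on the number of contending threads, and the padding that a tuned implementation would add against false sharing is left out.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Array-based queue lock: each waiter spins on its own slot and
// the lock is handed over in FIFO order.
class ArrayLock {
    private final int capacity;             // assumed bound on concurrently contending threads
    private final AtomicIntegerArray flags; // flags[i] == 1 means the owner of slot i may enter
    private final AtomicInteger nextTicket = new AtomicInteger(0);
    private final ThreadLocal<Integer> mySlot = new ThreadLocal<>();

    ArrayLock(int capacity) {
        this.capacity = capacity;
        this.flags = new AtomicIntegerArray(capacity);
        flags.set(0, 1); // the first arrival may enter immediately
    }

    void lock() {
        // Sketch only: ticket overflow is handled cleanly only if capacity is a power of two.
        int slot = Math.floorMod(nextTicket.getAndIncrement(), capacity);
        mySlot.set(slot);
        while (flags.get(slot) == 0) {
            Thread.yield(); // yield so the sketch also progresses with fewer cores than threads
        }
    }

    void unlock() {
        int slot = mySlot.get();
        flags.set(slot, 0);                  // reset this slot for later reuse
        flags.set((slot + 1) % capacity, 1); // pass the lock to the next slot in line
    }
}

// Hypothetical helper: several threads increment a plain counter under the lock.
class ArrayLockDemo {
    private static long counter; // protected only by the lock

    static long run(int threads, int iterations) {
        ArrayLock lock = new ArrayLock(threads);
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }
}
```

Compared with TAS or TTAS, where whichever thread wins the race gets the lock, this hand-over order trades some throughput for predictable turn-taking.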

Page 15: Experiment Parameters

• Different levels of contention
• Number of threads
• Measured time intervals
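
A measurement loop driven by these three parameters could look like the sketch below. It is an assumption about the general shape of such a harness, not the study's code: `ConcurrentLinkedQueue` stands in for the data structure under test, and `localWork` is a hypothetical knob that lowers contention by making each thread spend time outside the data structure.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class QueueBenchmark {
    // Returns opsPerThread[i] = operations completed by thread i in roughly intervalMs.
    static long[] run(int threads, long intervalMs, int localWork) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        long[] opsPerThread = new long[threads];
        AtomicBoolean running = new AtomicBoolean(true);
        CountDownLatch startSignal = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];

        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long done = 0;
                try {
                    startSignal.await(); // start all workers at the same time
                } catch (InterruptedException e) {
                    return;
                }
                while (running.get()) {
                    // Alternate enqueue and dequeue so the queue stays small.
                    if ((done & 1) == 0) {
                        queue.offer(id);
                    } else {
                        queue.poll();
                    }
                    done++;
                    // Work outside the data structure: more of it means lower contention.
                    for (int w = 0; w < localWork; w++) {
                        Thread.onSpinWait();
                    }
                }
                opsPerThread[id] = done; // published to the caller by join()
            });
            workers[t].start();
        }

        startSignal.countDown();
        try {
            Thread.sleep(intervalMs); // the measured time interval
            running.set(false);
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            running.set(false);
            throw new IllegalStateException(e);
        }
        return opsPerThread;
    }
}
```

The per-thread counts returned by `run` are the raw input for both throughput (their sum per time unit) and any fairness measure computed per interval.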

Page 16: Outline

Introduction
Experiment Setup
Highlights of Study and Results
• Queue
  – Fairness
  – Intel vs AMD
  – Throughput vs Fairness
• Hash Table
  – Intel vs AMD
  – Scalability
Conclusion

Page 17: Observations: Queue

Fairness can change along different time intervals
24 Threads, High contention

[Figure: C# (.NET). Fairness (0 to 1) vs. measurement interval (400 to 10000 ms). Series: Intel - Lock-free, AMD - Lock-free, Intel - TAS, AMD - TAS.]

Page 18: Observations: Queue

Significantly different fairness behavior in different architectures
24 Threads, High contention

[Figure: Java. Fairness (0 to 1) vs. measurement interval (400 to 10000 ms). Series: Intel - TAS, Intel - TTAS, Intel - Synchronized, Intel - Lock-free.]

Page 19: Observations: Queue

Significantly different fairness behavior in different architectures
24 Threads, High contention

Lock-free is less affected in this case

[Figure: Java. Fairness (0 to 1) vs. measurement interval (400 to 10000 ms). Series: Intel - TAS, AMD - TAS, Intel - TTAS, AMD - TTAS, Intel - Synchronized, AMD - Synchronized, Intel - Lock-free, AMD - Lock-free.]

Page 20: Queue: Throughput vs Fairness

[Figure, first panel: Fairness, 0.6 s, Intel. C++. Fairness (0 to 1) vs. threads (2 to 48).]
[Figure, second panel: Throughput. C++. Operations per ms (thousands, 0 to 16) vs. threads (2 to 48).]
Series: TTAS, Lock-free, PMutex.

Page 21: Observations: Hash table

• Operations are distributed in different buckets
• Things get interesting when #threads > #buckets
• Tradeoff between throughput and fairness
  – Different winners and losers
  – Contention is lowered in the linked list components
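
The bucket structure behind these observations can be sketched as a hash set with one lock and one linked list per bucket. This is an assumed, simplified layout (fixed bucket count, integer keys, hypothetical class name) rather than any of the implementations measured in the study.

```java
import java.util.LinkedList;
import java.util.concurrent.locks.ReentrantLock;

// Hash set with one lock per bucket: threads contend only when
// their keys map to the same bucket.
class BucketLockedIntSet {
    private static final class Bucket {
        final ReentrantLock lock = new ReentrantLock();
        final LinkedList<Integer> keys = new LinkedList<>(); // the linked list component
    }

    private final Bucket[] buckets;

    BucketLockedIntSet(int bucketCount) {
        buckets = new Bucket[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            buckets[i] = new Bucket();
        }
    }

    private Bucket bucketFor(int key) {
        return buckets[Math.floorMod(key, buckets.length)];
    }

    boolean add(int key) {
        Bucket bucket = bucketFor(key);
        bucket.lock.lock();
        try {
            if (bucket.keys.contains(key)) {
                return false; // already present
            }
            bucket.keys.add(key);
            return true;
        } finally {
            bucket.lock.unlock();
        }
    }

    boolean remove(int key) {
        Bucket bucket = bucketFor(key);
        bucket.lock.lock();
        try {
            // Integer.valueOf selects remove(Object), not remove(int index).
            return bucket.keys.remove(Integer.valueOf(key));
        } finally {
            bucket.lock.unlock();
        }
    }

    boolean contains(int key) {
        Bucket bucket = bucketFor(key);
        bucket.lock.lock();
        try {
            return bucket.keys.contains(key);
        } finally {
            bucket.lock.unlock();
        }
    }
}
```

With more threads than buckets, at least two threads must share a bucket lock at any moment, which is where the choice of synchronization method starts to matter.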

Page 22: Observations: Hash table

Fairness differences in Hash table across architectures
24 Threads, High contention

[Figure: C# (Mono). Fairness (0 to 1) vs. measurement interval (400 to 10000 ms). Series: Intel - TAS, Intel - TTAS, Intel - Lock-free.]

Page 23: Observations: Hash table

Fairness differences in Hash table across architectures
24 Threads, High contention

Lock-free is again not affected

[Figure: C# (Mono). Fairness (0 to 1) vs. measurement interval (400 to 10000 ms). Series: Intel - TAS, AMD - TAS, Intel - TTAS, AMD - TTAS, Intel - Lock-free, AMD - Lock-free.]

Page 24: Observations: Hash table

In C++, custom memory management and lock-free implementations excel in scalability and performance.

[Figure, first panel: C++. Successful operations per ms (thousands, 0 to 30) vs. threads (2 to 48). Series: TAS, TTAS, Lock-free, Array Lock, PMutex, Lock-free MM.]
[Figure, second panel: Java. Successful operations per ms (thousands, 0 to 6) vs. threads (2 to 48). Series: TAS, TTAS, Lock-free, Array Lock, Reentrant, Reentrant Fair, Synchronized.]
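
The "Reentrant" and "Reentrant Fair" series in the Java panel presumably correspond to the two modes of `java.util.concurrent.locks.ReentrantLock`; that mapping is an assumption, since the transcript does not say so. A minimal sketch of how the two modes are selected, using a hypothetical counter class:

```java
import java.util.concurrent.locks.ReentrantLock;

// Counter guarded by a ReentrantLock in either fair or non-fair mode.
class ReentrantCounter {
    private final ReentrantLock lock;
    private long value;

    ReentrantCounter(boolean fair) {
        // fair == true favors the longest-waiting thread; false (the default) allows barging.
        this.lock = new ReentrantLock(fair);
    }

    long increment() {
        lock.lock();
        try {
            return ++value;
        } finally {
            lock.unlock();
        }
    }

    boolean isFair() {
        return lock.isFair();
    }
}
```

The fair mode only changes which waiting thread acquires the lock next, so it is a direct way to trade throughput for fairness within the same lock type.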

Page 25: Conclusion

• Complex synchronization mechanisms (PMutex, Reentrant lock) pay off in heavily contended hot spots
• Scalability via more complex, inherently parallel designs and implementations
• Tradeoff between throughput and fairness
  – LF Hash table
  – Reentrant lock vs Array Lock vs LF Queue
• Fairness can be heavily influenced by HW
  – Interesting exceptions

Which is the fastest/most scalable?

Is fairness influenced by NUMA?

