Introduction Concepts Thread Scheduling New Schedulers Conclusion References

Thread Scheduler Efficiency Improvements for Multicore Systems

Daniel Collin Frazier

Division of Science and Mathematics
University of Minnesota, Morris

Morris, Minnesota, USA

18 November 2017
UMM, Minnesota

Introduction

• Thread scheduler: system component that manages the processing time programs receive

• Always running, so it must be efficient

• In the pre-2000 single-core era, scheduling was easy
  • Led the majority of the Linux community to believe the problem was solved

“...not very many things ... have aged as well as the scheduler. Which is just another proof that scheduling is easy.”

Linus Torvalds, 2001 [1]

Introduction

• Popular hardware changed rapidly throughout the 2000s

• Increasing affordability and adoption of multicore systems

• Hardware changes complicated thread scheduler implementation

• Complexity led to bugs that have been present for a decade

A Decade of Wasted Cores

• In A Decade of Wasted Cores
  • Lozi et al. found four bugs in the Linux thread scheduler and fixed them

• The bugs were previously undetected; finding them required the development of new tools

https://goo.gl/3wsfVU

A Decade of Wasted Cores

• Lozi et al. compared performance benchmarks run on buggy and fixed Linux scheduler implementations

• Below are average performance improvements

Bug title                                 Improvement
The Scheduling Group Construction bug     5.96x
The Group Imbalance bug                   1.05x
The Overload-on-Wakeup bug                1.13x
The Missing Scheduling Domains bug        29.68x

from Lozi et al. [1]

Outline

Concepts

Thread Scheduling on Linux

Two New Schedulers

Conclusion

Outline

Concepts
  Threads
  Synchronicity and Locks
  Thread State and Cache

Thread Scheduling on Linux

Two New Schedulers

Conclusion

Processors

• Responsible for executing code

• Contain a number of cores:
  • Single-core processor (one processing unit)
  • Multicore processor (two or more processing units)
  • Manycore processor (~20 or more processing units)

• Multiple cores allow a processor to perform multiple tasks concurrently, one on each core

Multithreading Example

• Imagine you’re using Photoshop, but assume it has only one thread

• Say you load a large image and perform an expensive filter operation

Diagram: main() thread -> Filter Operation

Threading

• Threads allow programs to run multiple independent tasks concurrently

• Useful for programs:
  • with long, mostly-independent computations
  • with a graphical interface

Diagram: main() thread -> (spawns) -> Window thread -> (spawns) -> Event thread (e.g. custom Button-Press logic)

Figure: Example GUI program. Three threads are created within one process
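
The spawning chain in the figure can be sketched with Python's standard threading module. This is only an illustration: the function and thread names are hypothetical, and the join() calls are added so the output order is predictable, which a real GUI program would not do.

```python
import threading


def run_gui_like_program():
    """Spawn a window thread, which in turn spawns an event thread."""
    log = []

    def event_thread():
        # Stand-in for custom button-press logic.
        log.append("event: button press handled")

    def window_thread():
        log.append("window: opened")
        handler = threading.Thread(target=event_thread, name="event")
        handler.start()
        handler.join()  # wait so the log order is deterministic

    window = threading.Thread(target=window_thread, name="window")
    window.start()
    window.join()
    log.append("main: done")
    return log


if __name__ == "__main__":
    for line in run_gui_like_program():
        print(line)
```

All three threads live in one process and share the same log list, which is the kind of shared data that locks are meant to protect.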

What if I ask you all a question right now?

If all answered at once, chaos!

Raising hands controls who gets to talk. This is like locks!

Synchronicity and Locks

• Control is achieved by employing locks

• Locks secure objects or data shared between threads so that only one thread can read or write them at a time

• When a thread locks a lock, it acquires the lock
• When a thread unlocks a lock, it releases the lock
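
As a minimal sketch of acquiring and releasing, the following uses Python's threading.Lock to protect a shared counter. The function name and parameters are hypothetical and chosen only for this example.

```python
import threading


def count_with_lock(num_threads=4, increments=10_000):
    """Have several threads increment one shared counter under a lock."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            with lock:        # acquire the lock
                counter += 1  # only one thread at a time runs this line
            # leaving the with block releases the lock

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock())
```

With the lock in place the result always equals num_threads * increments. Without it, the read-modify-write on the counter could interleave between threads and lose updates.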

Process and Thread State

• Process State
  Resources shared amongst its multiple threads

• Thread State
  Scheduler uses this information to pause and resume a thread’s execution

• Note: Process states are much heavier than thread states

Context Switching

• The scheduler switches active threads on cores by saving and restoring thread and processor state information

• These switches are called context switches

• Process context switches are more expensive than thread context switches

Cache

• Local copy of data designed for fast retrieval

• Hierarchical structure

• Placement relative to core:
  • on
  • inside of
  • outside

Figure: Distance of various forms of memory from CPU (L1, L2 and L3 caches, bus, RAM, disk controller, HDD; color represents distance from CPU)

Cache

• Locality: Speed of memory reads and writes decreases as distance from CPU increases

• Cache is the fastest form of memory

• Cache coherence: Any change to memory shared by two caches must propagate to the other cache to maintain correctness

Outline

Concepts

Thread Scheduling on Linux
  Completely Fair Scheduler

Two New Schedulers

Conclusion

Completely Fair Scheduler (CFS)

• Default Linux thread scheduler (there are others)

• Decides which threads are executed, and at what times, on a given core
  • Spends a fair amount of runtime on all threads

• Designed with responsiveness and fairness in mind

Single-core Completely Fair Scheduler (CFS)

• Runs on one core

• Ensures all threads run at least once within an arbitrary interval of CPU cycles

• Distributes timeslices (max CPU cycles) among threads

• Threads with higher priority (weights) get larger timeslices

Figure: CFS on one CPU core (1000 cycle interval)

Thread      Weight   Timeslice (max runtime in interval)
Thread 1    0.2      200
Thread 2    0.6      600
Thread 3    0.2      200
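
The timeslices in the figure follow from splitting the interval in proportion to each thread's weight. A minimal sketch of that arithmetic is below. The function name is hypothetical, and rounding to whole cycles is an assumption made for this example.

```python
def timeslices(interval_cycles, weights):
    """Split an interval among threads in proportion to their weights.

    weights maps a thread name to its weight. The result maps each
    thread name to its maximum runtime (in cycles) within the interval.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one thread must have a positive weight")
    return {
        name: round(interval_cycles * weight / total)
        for name, weight in weights.items()
    }


if __name__ == "__main__":
    slices = timeslices(1000, {"Thread 1": 0.2, "Thread 2": 0.6, "Thread 3": 0.2})
    print(slices)  # {'Thread 1': 200, 'Thread 2': 600, 'Thread 3': 200}
```

Dividing by the total weight means the weights do not have to sum to 1; only their ratios matter.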

CFS Runqueue

• Data structure containing threads

• Priority queue: sorts threads by number of cycles consumed in the current interval

• When a thread reaches its maximum cycles, it is preempted
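
The runqueue behaviour described above can be imitated with a small simulation. This sketch uses Python's heapq as a stand-in priority queue (the kernel's CFS uses a red-black tree), and the fixed quantum and the function name are assumptions made for illustration.

```python
import heapq


def run_interval(timeslices, quantum=100):
    """Simulate one scheduling interval on a single core.

    timeslices maps a thread name to its maximum cycles in the interval.
    Returns a list of (thread, cycles_run) pairs in execution order.
    """
    # Heap entries are (cycles consumed so far, thread name), so the
    # thread that has consumed the fewest cycles is always picked next.
    runqueue = [(0, name) for name in timeslices]
    heapq.heapify(runqueue)
    trace = []
    while runqueue:
        consumed, name = heapq.heappop(runqueue)
        step = min(quantum, timeslices[name] - consumed)
        trace.append((name, step))
        consumed += step
        if consumed < timeslices[name]:
            # Timeslice not used up yet: back into the runqueue.
            heapq.heappush(runqueue, (consumed, name))
        # Otherwise the thread is preempted until the next interval.
    return trace


if __name__ == "__main__":
    for name, cycles in run_interval({"T1": 200, "T2": 600, "T3": 200}, quantum=200):
        print(name, cycles)
```

Each thread runs at least once per interval, and no thread exceeds its timeslice.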

Runqueues on Multiple Cores

• Process states are heavier than thread states, so context switches between threads of different processes are more expensive

• If cores shared a runqueue, access and changes would need to be synchronous and cache-coherent
  • Would slow the system to a crawl
  • So each core has its own runqueue and threads

• Load on each core’s runqueue must stay balanced
  • CFS periodically runs a load-balancing algorithm
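
To make the idea of balancing per-core runqueues concrete, here is a deliberately simplified sketch. It is not the CFS algorithm, which balances hierarchically across scheduling domains. It only moves threads from the busiest runqueue to the idlest one while that narrows the gap, and the function name and integer loads are assumptions for illustration.

```python
def balance(runqueues):
    """Greedily even out load across per-core runqueues (modified in place).

    runqueues is a list with one entry per core; each entry is a list of
    thread loads (positive integers).
    """
    while True:
        busiest = max(runqueues, key=sum)
        idlest = min(runqueues, key=sum)
        gap = sum(busiest) - sum(idlest)
        # Only moves that strictly shrink the imbalance are allowed.
        movable = [load for load in busiest if 0 < load < gap]
        if not movable:
            return runqueues
        load = min(movable)
        busiest.remove(load)
        idlest.append(load)


if __name__ == "__main__":
    print(balance([[1, 1, 1, 1], []]))  # [[1, 1], [1, 1]]
```

Each move lowers the sum of squared runqueue loads, so the loop always terminates.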

Outline

Concepts

Thread Scheduling on Linux

Two New Schedulers
  Shuffler
  FLSCHED

Conclusion

Shuffler and FLSCHED

• Both schedulers aim to solve the same problem, but for different architectures

• Problem: Adding more threads to certain parallel computing applications on CFS makes the application operate slower rather than faster!

• Architectures:
  Shuffler → multiprocessor multicore
  FLSCHED → single-chip manycore processor

Shuffler

• Researchers Kumar et al. measured lock times of massively parallel applications

• Lock times: amount of time a process spends waiting for locks

• Found that massively parallel shared-memory programs experienced high lock times
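
One way to picture a lock time measurement is a lock wrapper that records how long each thread waits to acquire it. This sketch is not how Kumar et al. instrumented their applications. The TimedLock class and demo function are hypothetical names used only here.

```python
import threading
import time
from collections import defaultdict


class TimedLock:
    """A lock that accumulates, per thread, the time spent waiting for it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stats_lock = threading.Lock()
        self.wait_time = defaultdict(float)  # thread name -> seconds waited

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()
        waited = time.perf_counter() - start
        with self._stats_lock:
            self.wait_time[threading.current_thread().name] += waited
        return self

    def __exit__(self, *exc_info):
        self._lock.release()
        return False


def demo(hold_seconds=0.05):
    """Hold the lock in the calling thread while a worker waits for it."""
    lock = TimedLock()
    about_to_wait = threading.Event()

    def worker():
        about_to_wait.set()
        with lock:
            pass

    thread = threading.Thread(target=worker, name="worker")
    with lock:
        thread.start()
        about_to_wait.wait()
        time.sleep(hold_seconds)  # keep the worker blocked on the lock
    thread.join()
    return dict(lock.wait_time)


if __name__ == "__main__":
    print(demo())
```

The worker's recorded wait is roughly the time the other thread held the lock, which is the quantity the slide calls lock time.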

Lock Contention

• When two threads repeatedly contend for one lock, both threads are frequently waiting for each other to release it

• If the two threads are located on separate processors, this problem is compounded by reduced locality

• Further, when both threads repeatedly modify the data protected by their lock, the caches of both processors must continually update each other

• Result: high lock contention

Shuffler

• CFS is not mindful of lock contention or parent processes when choosing cores for threads

• Kumar et al. wanted to create a scheduler that was!
  • Used the Solaris scheduler as a base

• Strategy: Migrate threads whose locks are contending so they are near each other

• How do you determine which threads’ locks are contending?
  • Contending threads have similar lock acquisition times

input : N: Number of threads;
        C: Number of Processors.

repeat
    i. Monitor Threads – sample lock times of N threads.
    if lock times exceed threshold then
        ii. Form Thread Groups – sort threads according to lock times and divide them into C groups.
        iii. Perform Shuffling – shuffle threads to establish newly computed thread groups.
    end
until application terminates;
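
One round of the loop above might look like the following sketch. Several details are assumptions not stated on the slide: the threshold is compared against the mean lock time, groups are made as equal in size as possible, and the function names are hypothetical. Actually migrating threads is operating-system specific and is left out.

```python
def form_thread_groups(lock_times, num_processors):
    """Sort threads by lock time and divide them into num_processors groups."""
    ordered = sorted(lock_times, key=lock_times.get, reverse=True)
    base, extra = divmod(len(ordered), num_processors)
    groups, start = [], 0
    for cpu in range(num_processors):
        size = base + (1 if cpu < extra else 0)
        groups.append(ordered[start:start + size])
        start += size
    return groups


def shuffle_round(lock_times, num_processors, threshold):
    """Return a thread -> processor placement, or None if no shuffle is needed.

    lock_times maps a thread name to its sampled lock time.
    """
    if not lock_times:
        return None
    mean_lock_time = sum(lock_times.values()) / len(lock_times)
    if mean_lock_time <= threshold:  # assumption: compare the mean
        return None
    groups = form_thread_groups(lock_times, num_processors)
    return {thread: cpu for cpu, group in enumerate(groups) for thread in group}


if __name__ == "__main__":
    sampled = {"t0": 0.9, "t1": 0.1, "t2": 0.8, "t3": 0.2}
    print(shuffle_round(sampled, num_processors=2, threshold=0.3))
```

Because the threads are sorted before being split, threads with similar lock times end up in the same group and therefore on the same processor.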

Shuffler Performance

• Kumar et al. compared the efficiency of Shuffler vs the Solaris scheduler

• Used programs from four benchmarks to gather data

Program   % Improvement
BT        54.1%
SC        29.0%
RX        19.0%
JB        14.0%
OC        13.4%
AL        13.2%
AS        13.0%
PB        13.0%
VL        12.8%
FS        12.0%
FM        10.7%
AM        9.3%
GL        9.1%
EQ        9.0%
MG        8.8%
FA        6.0%
WW        5.2%
SM        4.7%
GA        4.0%
RT        4.0%

from Kumar et al. [2]

FLSCHED: The Lockless Monster

• Designed by Jo et al. with manycore processors in mind, particularly the Xeon Phi

• The Xeon and Xeon Phi have 24 to 76 cores

• One processor, so the cache looks different than on a system that would use Shuffler

• With such parallelism, small pauses significantly reduce efficiency

• In CFS, pauses come from locks necessitated by its features and requirements

One requirement to rule them all: EFFICIENCY!

• FLSCHED improves efficiency by removing all locks from the scheduler implementation

• Gutted and simplified the requirements and features of CFS

• Requirements they removed were Fairness and Responsiveness

• Context switch requests are delayed, in the hope that the current thread reactivates, to reduce the chance that another thread steals the core

• Threads never forcefully preempt; instead they join the runqueue with high priority

• Removed scheduler statistics reporting capabilities

FLSCHED Performance

• Used 8 of the 9 programs in the NAS Parallel Benchmarks (NPB)

Figure: Operations per second (OPS) relative to CFS, from Jo et al. [1]

Outline

Concepts

Thread Scheduling on Linux

Two New Schedulers

Conclusion

Conclusion

• Thread scheduling is an important problem and becomes more relevant as the number of cores increases

• System architecture can have a surprisingly complex effect on efficiency

• CFS tries to be the go-to scheduler for all problems, but can’t

• It does well, but when you need an extra push there are powerful alternatives available

Thanks!

Thank you for your time and attention!

Questions?

References

Jo, Heeseung and Kang, Woonhak and Min, Changwoo and Kim, Taesoo.
FLsched: A lockless and lightweight approach to OS scheduler for Xeon Phi.
In Proceedings of the 8th Asia-Pacific Workshop on Systems, APSys ’17, pages 8:1–8:8, Mumbai, India, 2017. ACM.

K. Kumar and P. Rajiv and G. Laxmi and N. Bhuyan.
Shuffling: A framework for lock contention aware thread scheduling for multicore multiprocessor systems.
In 2014 23rd International Conference on Parallel Architecture and Compilation Techniques, PACT, pages 289–300, 2014.

Lozi, Jean-Pierre and Lepers, Baptiste and Funston, Justin and Gaud, Fabien and Quéma, Vivien and Fedorova, Alexandra.
The Linux Scheduler: A Decade of Wasted Cores.
In Proceedings of the Eleventh European Conference on Computer Systems, EuroSys ’16, pages 1:1–1:16, London, United Kingdom, 2016. ACM.

