CSE506: Operating Systems
Scheduling
Don Porter
Logical Diagram
[Figure: course map of OS components across the User, Kernel, and Hardware layers: Binary Formats, Memory Allocators, Threads, System Calls, RCU, File System, Networking, Sync, Memory Management, Device Drivers, CPU Scheduler, Consistency, Interrupts, Disk, Net.]
Today’s lecture: switching to CPU scheduling
Lecture goals
• Understand low-level building blocks of a scheduler
• Understand competing policy goals
• Understand the O(1) scheduler
  – CFS next lecture
• Familiarity with standard Unix scheduling APIs
Undergrad review
• What is cooperative multitasking?
  – Processes voluntarily yield CPU when they are done
• What is preemptive multitasking?
  – OS only lets tasks run for a limited time, then forcibly context switches the CPU
• Pros/cons?
  – Cooperative gives applications more control; so much that one task can hog the CPU forever
  – Preemptive gives OS more control, more overheads/complexity
Where can we preempt a process?
• In other words, what are the logical points at which the OS can regain control of the CPU?
• System calls
  – Before
  – During (more next time on this)
  – After
• Interrupts
  – Timer interrupt – ensures maximum time slice
(Linux) Terminology
• mm_struct – represents an address space in kernel
• task – represents a thread in the kernel
  – A task points to 0 or 1 mm_structs
    • Kernel threads just “borrow” previous task’s mm, as they only execute in kernel address space
  – Many tasks can point to the same mm_struct
    • Multi-threading
• Quantum – CPU timeslice
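The task/mm relationship above can be sketched with two small structs. This is only an illustration: the struct layouts, the `users` counter, and the helper names are hypothetical simplifications, not the kernel's actual definitions.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct mm {
    int users;              /* number of tasks sharing this address space */
};

struct task {
    int pid;
    struct mm *mm;          /* NULL for a kernel thread (0 mm_structs) */
    struct mm *active_mm;   /* address space actually loaded while running */
};

/* A kernel thread owns no address space of its own. */
int is_kernel_thread(const struct task *t)
{
    return t->mm == NULL;
}

/* Two user tasks are threads of one process if they share an mm. */
int same_address_space(const struct task *a, const struct task *b)
{
    return a->mm != NULL && a->mm == b->mm;
}

/* On a switch, a kernel thread borrows the previous task's address space. */
void borrow_mm(struct task *next, const struct task *prev)
{
    next->active_mm = next->mm ? next->mm : prev->active_mm;
}
```

Because a kernel thread only touches kernel addresses, which are mapped identically everywhere, borrowing avoids a needless page-table switch.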
Outline
• Policy goals
• Low-level mechanisms
• O(1) Scheduler
• CPU topologies
• Scheduling interfaces
Policy goals
• Fairness – everything gets a fair share of the CPU
• Real-time deadlines
  – CPU time before a deadline more valuable than time after
• Latency vs. Throughput: Timeslice length matters!
  – GUI programs should feel responsive
  – CPU-bound jobs want long timeslices, better throughput
• User priorities
  – Virus scanning is nice, but I don’t want it slowing things down
No perfect solution
• Optimizing multiple variables
• Like memory allocation, this is best-effort
  – Some workloads prefer some scheduling strategies
• Nonetheless, some solutions are generally better than others
Context switching
• What is it?
  – Swap out the address space and running thread
• Address space:
  – Need to change page tables
  – Update cr3 register on x86
  – Simplified by convention that kernel is at same address range in all processes
  – What would be hard about mapping kernel in different places?
Other context switching tasks
• Swap out other register state
  – Segments, debugging registers, MMX, etc.
• If descheduling a process for the last time, reclaim its memory
• Switch thread stacks
Switching threads
• Programming abstraction:
    /* Do some work */
    schedule(); /* Something else runs */
    /* Do more work */
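The abstraction can be tried out in user space. The sketch below is not how the kernel implements schedule(); it uses the glibc `<ucontext.h>` calls (`getcontext`, `makecontext`, `swapcontext`) as a stand-in, and `schedule_to`, `worker`, and `run_demo` are hypothetical names chosen for illustration. Each call to `schedule_to` saves the caller's registers and stack pointer and resumes the other context exactly where it last stopped.

```c
#include <string.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];   /* the worker runs on its own stack */
static char trace[64];                 /* records the order of execution */

/* Save the caller's state in *me and resume *next where it left off. */
static void schedule_to(ucontext_t *me, ucontext_t *next)
{
    swapcontext(me, next);
}

static void worker(void)
{
    strcat(trace, "B1 ");                 /* do some work */
    schedule_to(&worker_ctx, &main_ctx);  /* something else runs */
    strcat(trace, "B2 ");                 /* do more work */
}                                         /* returning resumes uc_link */

const char *run_demo(void)
{
    trace[0] = '\0';
    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof(worker_stack);
    worker_ctx.uc_link = &main_ctx;       /* where to go when worker returns */
    makecontext(&worker_ctx, worker, 0);

    strcat(trace, "A1 ");
    schedule_to(&main_ctx, &worker_ctx);  /* worker runs until it yields */
    strcat(trace, "A2 ");
    schedule_to(&main_ctx, &worker_ctx);  /* worker finishes */
    strcat(trace, "A3");
    return trace;
}
```

From each thread's point of view, `schedule_to` is an ordinary function call that simply takes a while to return.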
How to switch stacks?
• Store register state on the stack in a well-defined format
• Carefully update stack registers to new stack
  – Tricky: can’t use stack-based storage for this step!
Example
[Figure: stacks of Thread 1 (prev) and Thread 2 (next), each holding saved regs and ebp, alongside the esp, ebp, and eax registers.]
    /* eax is next->thread_info.esp */
    /* push general-purpose regs */
    push ebp
    mov esp, eax
    pop ebp
    /* pop other regs */
Weird code to write
• Inside schedule(), you end up with code like:
    switch_to(me, next, &last);
    /* possibly clean up last */
• Where does last come from?
  – Output of switch_to
  – Written on my stack by previous thread (not me)!
How to code this?
• Pick a register (say ebx); before context switch, this is a pointer to last’s location on the stack
• Pick a second register (say eax) to store the pointer to the currently running task (me)
• Make sure to push ebx after eax
• After switching stacks:
  – pop ebx /* eax still points to old task */
  – mov (ebx), eax /* store eax at the location ebx points to */
  – pop eax /* Update eax to new task */
Outline
• Policy goals
• Low-level mechanisms
• O(1) Scheduler
• CPU topologies
• Scheduling interfaces
Strawman scheduler
• Organize all processes as a simple list
• In schedule():
  – Pick first one on list to run next
  – Put suspended task at the end of the list
• Problem?
  – Only allows round-robin scheduling
  – Can’t prioritize tasks
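A minimal sketch of the strawman, assuming a fixed-capacity circular buffer of PIDs. The names `rr_queue`, `rr_enqueue`, and `rr_schedule` and the `MAX_TASKS` limit are hypothetical, chosen only for illustration.

```c
#define MAX_TASKS 16

struct rr_queue {
    int pids[MAX_TASKS];   /* circular buffer of runnable task IDs */
    int head;              /* index of the next task to run */
    int count;             /* number of queued tasks */
};

/* Append a task at the end of the list; returns -1 if the list is full. */
int rr_enqueue(struct rr_queue *q, int pid)
{
    if (q->count == MAX_TASKS)
        return -1;
    q->pids[(q->head + q->count) % MAX_TASKS] = pid;
    q->count++;
    return 0;
}

/* Put the suspended task (if any, current >= 0) at the end of the list,
 * then pick the first one on the list. Returns -1 if nothing is runnable. */
int rr_schedule(struct rr_queue *q, int current)
{
    if (current >= 0)
        rr_enqueue(q, current);
    if (q->count == 0)
        return -1;
    int next = q->pids[q->head];
    q->head = (q->head + 1) % MAX_TASKS;
    q->count--;
    return next;
}
```

Every task gets the same treatment regardless of importance, which is exactly the limitation the slide points out.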
Even straw-ier man
• Naïve approach to priorities:
  – Scan the entire list on each run
  – Or periodically reshuffle the list
• Problems:
  – Forking – where does child go?
  – What about if you only use part of your quantum?
    • E.g., blocking I/O
O(1) scheduler
• Goal: decide who to run next, independent of number of processes in system
  – Still maintain ability to prioritize tasks, handle partially unused quanta, etc.
O(1) Bookkeeping
• runqueue: a list of runnable processes
  – Blocked processes are not on any runqueue
  – A runqueue belongs to a specific CPU
  – Each runnable task is on exactly one runqueue
    • Task only scheduled on runqueue’s CPU unless migrated
• 2 * 40 * #CPUs runqueues
  – 40 dynamic priority levels (more later)
  – 2 sets of runqueues – one active and one expired
O(1) Data Structures
[Figure: two sets of runqueues, Active and Expired, each with one queue per priority level from 100 to 139.]
O(1) Intuition
• Take the first task off the lowest-numbered non-empty runqueue on active set
  – Confusingly: a lower priority value means higher priority
• When done, put it on appropriate runqueue on expired set
• Once active is completely empty, swap which set of runqueues is active and expired
• Constant time, since fixed number of queues to check; only take first item from non-empty queue
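The steps above can be sketched for a single CPU as follows. This is an illustrative simplification, not the Linux implementation: the struct and function names (`prio_array`, `cpu_rq`, `pick_next`, `quantum_expired`) are assumptions, and the fixed 40-entry scan stands in for whatever faster lookup a real kernel might use.

```c
#include <stddef.h>

#define MIN_PRIO 100
#define NUM_PRIO 40              /* priorities 100..139 */

struct task {
    int pid;
    int prio;                    /* 100 (highest) .. 139 (lowest) */
    struct task *next;
};

struct prio_array {              /* one set of runqueues */
    struct task *head[NUM_PRIO];
    struct task *tail[NUM_PRIO];
};

struct cpu_rq {
    struct prio_array sets[2];
    struct prio_array *active, *expired;
};

void rq_init(struct cpu_rq *rq)
{
    for (int s = 0; s < 2; s++)
        for (int i = 0; i < NUM_PRIO; i++)
            rq->sets[s].head[i] = rq->sets[s].tail[i] = NULL;
    rq->active = &rq->sets[0];
    rq->expired = &rq->sets[1];
}

/* Append a task to the queue matching its priority. */
void enqueue(struct prio_array *a, struct task *t)
{
    int i = t->prio - MIN_PRIO;
    t->next = NULL;
    if (a->tail[i])
        a->tail[i]->next = t;
    else
        a->head[i] = t;
    a->tail[i] = t;
}

/* Check a fixed number of queues; take the first item of the first
 * non-empty one. The cost does not depend on the number of tasks. */
static struct task *dequeue_best(struct prio_array *a)
{
    for (int i = 0; i < NUM_PRIO; i++) {
        struct task *t = a->head[i];
        if (t) {
            a->head[i] = t->next;
            if (!a->head[i])
                a->tail[i] = NULL;
            return t;
        }
    }
    return NULL;
}

/* Pick the next task; if the active set is empty, swap active and expired. */
struct task *pick_next(struct cpu_rq *rq)
{
    struct task *t = dequeue_best(rq->active);
    if (!t) {
        struct prio_array *tmp = rq->active;
        rq->active = rq->expired;
        rq->expired = tmp;
        t = dequeue_best(rq->active);
    }
    return t;
}

/* When a task's quantum runs out, it goes on the expired set. */
void quantum_expired(struct cpu_rq *rq, struct task *t)
{
    enqueue(rq->expired, t);
}
```

Swapping two pointers, rather than moving tasks, is what keeps the "everything expired" case constant-time as well.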
O(1) Example
[Figure: Active and Expired runqueue sets, priorities 100 to 139. Pick the first, highest-priority task to run; move it to the expired queue when its quantum expires.]
What now?
[Figure: Active and Expired runqueue sets, priorities 100 to 139.]
Blocked Tasks
• What if a program blocks on I/O, say for the disk?
  – It still has part of its quantum left
  – Not runnable, so don’t waste time putting it on the active or expired runqueues
• We need a “wait queue” associated with each blockable event
  – Disk, lock, pipe, network socket, etc.
Blocking Example
[Figure: Active and Expired runqueue sets (priorities 100 to 139) plus a Disk wait queue. A process blocks on disk and goes on the disk wait queue.]
Blocked Tasks, cont.
• A blocked task is moved to a wait queue until the expected event happens
  – No longer on any active or expired queue!
• Disk example:
  – After I/O completes, interrupt handler moves task back to active runqueue
Timeslice tracking
• If a process blocks and then becomes runnable, how do we know how much time it had left?
• Each task tracks ticks left in ‘time_slice’ field
  – On each clock tick: current->time_slice--
  – If time slice goes to zero, move to expired queue
    • Refill time slice
    • Schedule someone else
  – An unblocked task can use balance of time slice
  – Forking halves time slice with child
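A minimal sketch of the tick accounting described above. Only `time_slice` comes from the slide; the `full_slice` field, the function names, and the rounding rule used when forking (the child gets the extra tick on odd values) are assumptions made for illustration.

```c
struct task {
    int time_slice;   /* ticks left in the current quantum */
    int full_slice;   /* ticks granted on refill (hypothetical field) */
};

/* Called on each clock tick for the running task. Returns 1 when the
 * quantum is used up: the slice is refilled and the caller should move
 * the task to the expired queue and schedule someone else. */
int scheduler_tick(struct task *current)
{
    if (--current->time_slice > 0)
        return 0;
    current->time_slice = current->full_slice;
    return 1;
}

/* Forking splits the parent's remaining ticks with the child, so a
 * process cannot gain CPU time simply by forking. */
void fork_split(struct task *parent, struct task *child)
{
    child->full_slice = parent->full_slice;
    child->time_slice = (parent->time_slice + 1) / 2;
    parent->time_slice /= 2;
}
```

A task that blocks keeps its `time_slice` untouched, so when it wakes it can use the balance.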
More on priorities
• 100 = highest priority
• 139 = lowest priority
• 120 = base priority
  – “nice” value: user-specified adjustment to base priority
  – Selfish (not nice) = -20 (I want to go first)
  – Really nice = +19 (I will go last)
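The numbers above imply a simple mapping from nice value to priority. A small sketch, where the function name `nice_to_static_prio` and the clamping of out-of-range inputs are assumptions:

```c
#define BASE_PRIO 120

/* Map a nice value (-20..+19) to a priority (100..139).
 * Out-of-range inputs are clamped; that choice is an assumption. */
int nice_to_static_prio(int nice)
{
    if (nice < -20)
        nice = -20;
    if (nice > 19)
        nice = 19;
    return BASE_PRIO + nice;
}
```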
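Base timeslice
• “Higher” priority tasks get longer timeslices
  – And run first
\[
\text{time} =
\begin{cases}
(140 - \text{prio}) \times 20\,\text{ms} & \text{if } \text{prio} < 120 \\
(140 - \text{prio}) \times 5\,\text{ms} & \text{if } \text{prio} \ge 120
\end{cases}
\]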
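The formula translates directly into code. A small sketch, where the function name `base_timeslice_ms` is hypothetical:

```c
/* Base timeslice in milliseconds for a priority in 100..139. */
int base_timeslice_ms(int prio)
{
    if (prio < 120)
        return (140 - prio) * 20;
    return (140 - prio) * 5;
}
```

For example, priority 100 gets 800 ms, the base priority 120 gets 100 ms, and priority 139 gets 5 ms.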
Goal: Responsive UIs
• Most GUI programs are I/O bound on the user
  – Unlikely to use entire time slice
• Users get annoyed when they type a key and it takes a long time to appear
• Idea: give UI programs a priority boost
  – Go to front of line, run briefly, block on I/O again
• Which ones are the UI programs?
Idea: Infer from sleep time
• By definition, I/O bound applications spend most of their time waiting on I/O
• We can monitor I/O wait time and infer which programs are GUI (and disk intensive)
• Give these applications a priority boost
• Note that this behavior can be dynamic
  – Ex: GUI configures DVD ripping, then it is CPU-bound
  – Scheduling should match program phases
Dynamic priority
dynamic priority = max(100, min(static priority − bonus + 5, 139))
• Bonus is calculated based on sleep time
• Dynamic priority determines a task’s runqueue
• This is a heuristic to balance competing goals of CPU throughput and latency in dealing with infrequent I/O
  – May not be optimal
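The clamping formula can be checked with a few lines of code. In this sketch the function name is hypothetical, and the bonus is assumed to range from 0 to 10 (so the net adjustment is between +5 and -5), which the slide does not state.

```c
/* dynamic priority = max(100, min(static priority - bonus + 5, 139))
 * Assumption: bonus is in 0..10, larger for tasks that sleep more. */
int dynamic_prio(int static_prio, int bonus)
{
    int prio = static_prio - bonus + 5;
    if (prio > 139)
        prio = 139;
    if (prio < 100)
        prio = 100;
    return prio;
}
```

Under that assumption, a task at the base priority of 120 that sleeps a lot (bonus 10) lands on runqueue 115, while one that never sleeps (bonus 0) lands on 125.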
Dynamic Priority in O(1) Scheduler
• Important: The runqueue a process goes in is determined by the dynamic priority, not the static priority
  – Dynamic priority is mostly determined by time spent waiting, to boost UI responsiveness
• Nice values influence static priority (directly)
  – Static priority is a starting point for dynamic priority
  – No matter how “nice” you are (or aren’t), you can’t boost your “bonus” without blocking on a wait queue!
Rebalancing
[Figure: runqueues of CPU 0 and CPU 1. CPU 1 needs more work!]
Rebalancing tasks
• As described, once a task ends up in one CPU’s runqueue, it stays on that CPU forever
• What if all the processes on CPU 0 exit, and all of the processes on CPU 1 fork more children?
• We need to periodically rebalance
• Balance overheads against benefits
  – Figuring out where to move tasks isn’t free
Idea: Idle CPUs rebalance
• If a CPU is out of runnable tasks, it should take load from busy CPUs
  – Busy CPUs shouldn’t lose time finding idle CPUs to take their work if possible
• There may not be any idle CPUs
  – Overhead to figure out whether other idle CPUs exist
  – Just have busy CPUs rebalance much less frequently
Average load
• How do we measure how busy a CPU is?
• Average number of runnable tasks over time
• Available in /proc/loadavg
Rebalancing strategy
• Read the loadavg of each CPU
• Find the one with the highest loadavg
• (Hand waving) Figure out how many tasks we could take
  – If worth it, lock the CPU’s runqueues and take them
  – If not, try again later
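A sketch of the first two steps, plus one possible way to fill in the hand-waved third step. The "take half the difference, and only if the gap is at least two tasks" rule in `tasks_to_pull` is purely a hypothetical policy for illustration, as are both function names.

```c
/* Index of the CPU with the highest load average, excluding `self`.
 * Returns -1 if there is no other CPU. */
int find_busiest_cpu(const double loadavg[], int ncpus, int self)
{
    int busiest = -1;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (cpu == self)
            continue;
        if (busiest < 0 || loadavg[cpu] > loadavg[busiest])
            busiest = cpu;
    }
    return busiest;
}

/* Hypothetical "is it worth it?" policy: pull half of the difference in
 * runnable tasks, and nothing when the gap is smaller than two. */
int tasks_to_pull(int my_runnable, int busiest_runnable)
{
    int diff = busiest_runnable - my_runnable;
    return diff >= 2 ? diff / 2 : 0;
}
```

A result of 0 from `tasks_to_pull` corresponds to the "try again later" branch; a positive result is the point at which the busiest CPU's runqueues would be locked.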
Why not rebalance?
• Intuition: If things run slower on another CPU
• Why might this happen?
  – NUMA (Non-Uniform Memory Access)
  – Hyper-threading
  – Multi-core cache behavior
• Vs: Symmetric Multi-Processor (SMP) – performance on all CPUs is basically the same
SMP
• All CPUs similar, equally “close” to memory
[Figure: CPU0 to CPU3 all attached to a single shared Memory.]
NUMA
• Want to keep execution near memory; higher migration costs
[Figure: four CPUs (CPU0 to CPU3) split across two nodes, each node with its own Memory.]
Scheduling Domains
• General abstraction for CPU topology
• “Tree” of CPUs
  – Each leaf node contains a group of “close” CPUs
• When an idle CPU rebalances, it starts at leaf node and works up to the root
  – Most rebalancing within the leaf
  – Higher threshold to rebalance across a parent
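The leaf-to-root walk can be sketched as below. The struct layout, the bitmask `span`, the per-domain `threshold`, and the function name are all hypothetical simplifications; a real kernel's scheduling domains carry far more state.

```c
#include <stddef.h>

struct sched_domain {
    struct sched_domain *parent;   /* NULL at the root */
    unsigned span;                 /* bit c set => CPU c is in this domain */
    int threshold;                 /* min imbalance worth pulling across */
};

/* Starting at CPU `self`'s leaf domain, walk up toward the root. At each
 * level, find the busiest other CPU in the domain and pull from it only
 * if the imbalance meets that level's threshold.
 * Returns the CPU to pull from, or -1 if nothing is worth moving. */
int find_pull_source(const struct sched_domain *leaf, int self,
                     const int nr_running[], int ncpus)
{
    for (const struct sched_domain *d = leaf; d != NULL; d = d->parent) {
        int busiest = -1;
        for (int cpu = 0; cpu < ncpus; cpu++) {
            if (cpu == self || !(d->span & (1u << cpu)))
                continue;
            if (busiest < 0 || nr_running[cpu] > nr_running[busiest])
                busiest = cpu;
        }
        if (busiest >= 0 &&
            nr_running[busiest] - nr_running[self] >= d->threshold)
            return busiest;
    }
    return -1;
}
```

Giving parent domains a larger `threshold` is one way to express "most rebalancing within the leaf": a distant CPU must be much busier before migration across the parent is considered.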
SMP Scheduling Domain
[Figure: a single scheduling domain containing CPU0 to CPU3. Flat, all CPUs equivalent!]
NUMA Scheduling Domains
[Figure: tree of scheduling domains over CPU0 to CPU3. CPU0 starts rebalancing in its own leaf domain first; there is a higher threshold to move to a sibling/parent.]
Hyper-threading
• Precursor to multi-core
  – A few more transistors than Intel knew what to do with, but not enough to build a second core on a chip yet
• Duplicate architectural state (registers, etc.), but not execution resources (ALU, floating point, etc.)
• OS view: 2 logical CPUs
• CPU: pipeline bubble in one “CPU” can be filled with operations from another; yielding higher utilization
Hyper-threaded scheduling
• Imagine 2 hyper-threaded CPUs
  – 4 logical CPUs
  – But only 2 CPUs-worth of power
• Suppose I have 2 tasks
  – They will do much better on 2 different physical CPUs than sharing one physical CPU
• They will also contend for space in the cache
  – Less of a problem for threads in same program. Why?
NUMA + Hyperthreading Domains
[Figure: eight logical CPUs (CPU0 to CPU7) grouped into physical CPUs, which are in turn grouped into two NUMA domains. Each physical CPU is a sched domain.]
Multi-core
• More levels of caches
• Migration among CPUs sharing a cache preferable
  – Why?
  – More likely to keep data in cache
• Scheduling domains based on shared caches
  – E.g., cores on same chip are in one domain
Outline
• Policy goals
• Low-level mechanisms
• O(1) Scheduler
• CPU topologies
• Scheduling interfaces
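Setting priorities
• setpriority(which, who, niceval) and getpriority()
  – Which: process, process group, or user id
  – Who: PID, PGID, or UID
  – Niceval: -20 to +19 (recall earlier)
• nice(niceval)
  – Historical interface (backwards compatible)
  – Roughly equivalent to:
    • setpriority(PRIO_PROCESS, getpid(), niceval)
    • Difference: nice() adds niceval to the current nice value, while setpriority() sets it outright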
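A short usage sketch of getpriority() and setpriority(). The helper name `be_nicer` and its -100 error sentinel are hypothetical. The example only raises the nice value, since lowering it normally requires extra privileges.

```c
#include <errno.h>
#include <sys/resource.h>

/* Raise the calling process's nice value by `delta` (>= 0), capped at +19.
 * Returns the new nice value, or -100 on error (hypothetical sentinel). */
int be_nicer(int delta)
{
    /* getpriority() can legitimately return -1, so errno must be
     * cleared first and checked afterwards to detect a real error. */
    errno = 0;
    int cur = getpriority(PRIO_PROCESS, 0);   /* who == 0: the caller */
    if (cur == -1 && errno != 0)
        return -100;

    int target = cur + delta;
    if (target > 19)
        target = 19;
    if (setpriority(PRIO_PROCESS, 0, target) != 0)
        return -100;
    return getpriority(PRIO_PROCESS, 0);
}
```

Calling `be_nicer(1)` behaves much like `nice(1)`: it adjusts the value relative to the current one.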
Scheduler Affinity
• sched_setaffinity and sched_getaffinity
• Can specify a bitmap of CPUs on which a task can be scheduled
  – Better not be 0!
• Useful for benchmarking: ensure each thread on a dedicated CPU
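A sketch of pinning the calling thread to one CPU, as a benchmark might. It assumes Linux with glibc, where `_GNU_SOURCE` must be defined to expose `sched_setaffinity` and the `CPU_*` macros; the helper name `pin_to_first_allowed_cpu` is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes sched_setaffinity and CPU_* in glibc */
#endif
#include <sched.h>

/* Restrict the calling thread to the first CPU it is currently allowed
 * to run on. Returns that CPU's index, or -1 on failure. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&one);
        CPU_SET(cpu, &one);              /* bitmap with exactly one CPU */
        if (sched_setaffinity(0, sizeof(one), &one) != 0)
            return -1;
        return cpu;
    }
    return -1;                           /* empty mask: nothing to pin to */
}
```

Starting from the current mask, rather than assuming CPU 0 is usable, keeps the sketch working inside containers or cpusets that exclude some CPUs.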
yield
• Moves a runnable task to the expired runqueue
  – Unless real-time (more later), then just move to the end of the active runqueue
• Several other real-time related APIs
Summary
• Understand competing scheduling goals
• Understand how context switching is implemented
• Understand O(1) scheduler + rebalancing
• Understand various CPU topologies and scheduling domains
• Scheduling system calls