1
Process Scheduling
Chapter 5
2
Introduction
- Policy and implementation
- Objectives:
  - Fast response time
  - High throughput (turnaround time)
  - Avoidance of process starvation
- Context switching is expensive
  - The context is a snapshot of the values of the general-purpose, memory-management, and other special registers.
3
Types of Scheduling
- Long-term
  - Performed when a new process is created.
  - The decision to add to the pool of processes to be executed.
- Medium-term
  - Swapping
  - The decision to add to the number of processes that are partially or fully in main memory.
4
Types of Scheduling (cont'd)
- Short-term
  - Which ready process to execute next
  - The decision as to which available process will be executed by the processor.
  - Examples: FCFS, round-robin, shortest process next, shortest remaining time
- I/O
  - The decision as to which process's pending I/O request shall be handled by an available I/O device.
5
Scheduling and Process State Transition
[Figure: process state transition diagram. States: New, Ready, Running, Blocked, Ready/suspend, Blocked/suspend, Exit. Transitions are labeled with long-term, medium-term, and short-term scheduling.]
6
Queuing Diagram for Scheduling
[Figure: queuing diagram. Elements: batch jobs, interactive users, Ready Queue, Ready/Suspend Queue, Blocked Queue, Blocked/Suspend Queue, and the processor. Labeled paths: long-term scheduling, medium-term scheduling, short-term scheduling, Time-out, Event Wait, Event Occurs, Release.]
7
5.2 Clock Interrupt Handling
- The clock interrupt is second in priority only to the power-failure interrupt.
- Tasks of the clock interrupt handler:
  - Rearms the hardware clock if necessary
  - Updates CPU usage statistics
  - Performs scheduler-related functions
  - Sends a SIGXCPU signal to the current process if it has exceeded its CPU usage quota
  - Updates the time-of-day and other related clocks
  - Handles callouts
  - Wakes up system processes
  - Handles alarms
8
5.2.1 Callouts
- A callout records a function that the kernel must invoke at a later time.
    int to_ID = timeout(void (*fn)(), caddr_t arg, long delta);
    void untimeout(int to_ID);
- Tasks:
  - Retransmission of network packets
  - Certain scheduler and memory management functions
  - Monitoring devices to avoid losing interrupts
  - Polling devices that do not support interrupts
9
Callout in BSD UNIX
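The following is a minimal user-space sketch of a callout list kept sorted by firing time, where each entry stores its delay relative to the previous entry, so the clock handler only has to decrement the head. The names timeout_sketch, untimeout_sketch, clock_tick_sketch, and count_call are hypothetical, and the callback type void (*)(void *) is an assumption made so the example compiles on its own; it is not the kernel's code.

```c
#include <stdlib.h>

/* One pending callout; c_delta is the tick count relative to the previous entry. */
struct callout {
    long c_delta;
    void (*c_fn)(void *);
    void *c_arg;
    int c_id;
    struct callout *c_next;
};

static struct callout *calltodo = NULL;  /* head of the sorted list */
static int next_id = 1;

/* Example callback (hypothetical): counts how many times it was invoked. */
void count_call(void *arg) { ++*(int *)arg; }

/* Register fn(arg) to run after `delta` ticks; returns an ID, or -1 on failure. */
int timeout_sketch(void (*fn)(void *), void *arg, long delta)
{
    struct callout **pp = &calltodo;
    struct callout *c = malloc(sizeof *c);
    if (c == NULL)
        return -1;
    /* Skip entries that fire no later than the new one, consuming their deltas. */
    while (*pp != NULL && (*pp)->c_delta <= delta) {
        delta -= (*pp)->c_delta;
        pp = &(*pp)->c_next;
    }
    c->c_delta = delta;
    c->c_fn = fn;
    c->c_arg = arg;
    c->c_id = next_id++;
    c->c_next = *pp;
    if (c->c_next != NULL)
        c->c_next->c_delta -= delta;  /* successor keeps its absolute firing time */
    *pp = c;
    return c->c_id;
}

/* Cancel a pending callout; its remaining delta is handed to the successor. */
void untimeout_sketch(int id)
{
    struct callout **pp = &calltodo;
    while (*pp != NULL && (*pp)->c_id != id)
        pp = &(*pp)->c_next;
    if (*pp != NULL) {
        struct callout *c = *pp;
        if (c->c_next != NULL)
            c->c_next->c_delta += c->c_delta;
        *pp = c->c_next;
        free(c);
    }
}

/* Called once per clock tick: decrement the head and run every entry that is due. */
void clock_tick_sketch(void)
{
    if (calltodo == NULL)
        return;
    calltodo->c_delta--;
    while (calltodo != NULL && calltodo->c_delta <= 0) {
        struct callout *c = calltodo;
        calltodo = c->c_next;
        c->c_fn(c->c_arg);
        free(c);
    }
}
```

Insertion walks the list, but the per-tick cost is constant when nothing is due, which matters because the tick path runs at high interrupt priority.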
10
5.2.2 Alarms
- Real-time: relates to the actual elapsed time, and notifies the process via a SIGALRM signal.
- Profiling: measures the amount of time the process has been executing and uses the SIGPROF signal for notification.
- Virtual-time: monitors only the time spent by the process in user mode and sends the SIGVTALRM signal.
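On POSIX systems the three alarm types are exposed through setitimer() as ITIMER_REAL, ITIMER_PROF, and ITIMER_VIRTUAL. The sketch below maps each timer to its signal and arms a one-shot timer; the helper names signal_for_timer and wait_for_timer are hypothetical, and the 5-second safety bound is an arbitrary choice for the example.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* expose sigaction() and setitimer() under strict -std modes */
#endif
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t fired_signal = 0;

/* Signal handler: remember which signal arrived. */
static void on_timer(int signo) { fired_signal = signo; }

/* Map an interval timer type to the signal it delivers, or -1 if unknown. */
int signal_for_timer(int which)
{
    switch (which) {
    case ITIMER_REAL:    return SIGALRM;    /* elapsed (wall-clock) time */
    case ITIMER_VIRTUAL: return SIGVTALRM;  /* user-mode CPU time only */
    case ITIMER_PROF:    return SIGPROF;    /* user + system CPU time */
    default:             return -1;
    }
}

/*
 * Arm a one-shot timer of type `which` for `usec` microseconds (< 1 second),
 * then spin in user mode until its signal arrives.
 * Returns the signal number, 0 if nothing arrived within ~5 s, or -1 on error.
 */
int wait_for_timer(int which, long usec)
{
    struct sigaction sa;
    struct itimerval it;
    time_t start = time(NULL);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_timer;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signal_for_timer(which), &sa, NULL) != 0)
        return -1;

    memset(&it, 0, sizeof it);
    it.it_value.tv_usec = usec;  /* it_interval stays 0: one-shot */
    fired_signal = 0;
    if (setitimer(which, &it, NULL) != 0)
        return -1;

    /* Busy-wait so that virtual and profiling time keep advancing. */
    while (fired_signal == 0 && time(NULL) - start < 5)
        ;
    return fired_signal;
}
```

The loop spins rather than sleeps because a sleeping process accumulates no user-mode time, so a virtual-time alarm would not expire while it is blocked.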
11
5.3 Scheduler Goals
- The scheduler must ensure that the system delivers acceptable performance to each application.
- Different applications have different needs:
  - Interactive: response delays of about 50-150 ms are acceptable
  - Batch: for example, scientific computation
  - Real-time: time-critical
12
5.4 Traditional UNIX Scheduling
- Goal: improve the response times of interactive users while ensuring that low-priority background jobs do not starve.
- Priority-based:
  - A user process can be preempted
  - The kernel is strictly non-preemptive
13
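Priority
- Kernel: 0-49; user: 50-127 (a lower number means a higher priority)
- proc fields:
  - p_pri: current scheduling priority
  - p_usrpri: user mode priority
  - p_cpu: measure of recent CPU usage
  - p_nice: user-controllable nice factor
- Kernel priority: the sleep priority, which depends on the reason for sleeping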
14
User mode priority
- Depends on two factors:
  - Nice value: 0-39
  - Recent CPU usage
- Time-sharing: equal opportunity
- Decay factor: for SVR3 it is 1/2; for 4.3BSD:
    decay = (2 * load_average) / (2 * load_average + 1)
    p_cpu = p_cpu * decay
- p_usrpri = PUSER + (p_cpu / 4) + (2 * p_nice)
15
Example: PUSER = 50

Time  Decay  Process  p_cpu  Nice  p_usrpri
T1    -      P1        80    20    110
T1    -      P2        80    25    120
T2    1/2    P1       100    20    115
T2    1/2    P2        40    25    110
T3    1/2    P1        50    20    102
T3    1/2    P2        60    25    115
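The priority values in the example can be checked with a few lines of C. This is a sketch of the formula only, using integer arithmetic as an assumption; the function names are hypothetical.

```c
#define PUSER 50  /* base user priority, as in the example */

/* SVR3-style decay: halve the recent CPU usage. */
int decay_svr3(int p_cpu)
{
    return p_cpu / 2;
}

/* 4.3BSD-style decay: p_cpu * (2 * load) / (2 * load + 1), in integer arithmetic. */
int decay_bsd(int p_cpu, int load_average)
{
    return (2 * load_average * p_cpu) / (2 * load_average + 1);
}

/* p_usrpri = PUSER + p_cpu / 4 + 2 * p_nice (a larger value is a worse priority). */
int user_priority(int p_cpu, int p_nice)
{
    return PUSER + p_cpu / 4 + 2 * p_nice;
}
```

For instance, user_priority(80, 20) gives 110 and user_priority(decay_svr3(100), 20) gives 102, matching the P1 entries. With a load average of 1 the 4.3BSD factor is 2/3, so a process's usage decays more slowly on a loaded system than under the fixed 1/2 of SVR3.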
16
Scheduler Implementation
- 32 run queues: each is a doubly linked list of proc structures for runnable processes.
- whichqs: a bitmask with one bit per queue; "1" means that the queue holds a runnable process.
- swtch(): performs the context switch, using p_addr
  - Saves part of the u area (the pcb)
  - Loads the saved context
- VAX: the ffs and ffc instructions find the first set or clear bit in a bit field, which speeds up the search of whichqs during a context switch.
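A sketch of how the whichqs bitmask can select the next queue is shown below. It assumes that the 128 priorities map onto the 32 queues in groups of four adjacent priorities, and it uses a portable loop in place of a find-first-set instruction; the function names are hypothetical.

```c
#include <stdint.h>

#define NQS 32           /* number of run queues */
#define PRI_PER_QUEUE 4  /* assumption: 128 priorities / 32 queues */

static uint32_t whichqs; /* bit i set => queue i holds a runnable process */

/* Queue index for a priority in the range 0-127. */
int queue_for_priority(int pri)
{
    return pri / PRI_PER_QUEUE;
}

/* Record that a process of priority `pri` was placed on its run queue. */
void mark_runnable(int pri)
{
    whichqs |= (uint32_t)1 << queue_for_priority(pri);
}

/* Record that queue `q` became empty. */
void mark_empty(int q)
{
    whichqs &= ~((uint32_t)1 << q);
}

/* Index of the highest-priority (lowest-numbered) nonempty queue, or -1 if none. */
int first_nonempty_queue(void)
{
    for (int q = 0; q < NQS; q++)
        if (whichqs & ((uint32_t)1 << q))
            return q;
    return -1;
}
```

On hardware with a find-first-set instruction the loop collapses to a single operation, which is why the bitmask is kept alongside the queues.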
18
Run Queue Manipulation
- roundrobin(): rotates among processes with the same priority.
- schedcpu(): recomputes the priority of each process once per second
  - Removes the process from the run queue
  - Recomputes the priority
  - Puts it back
19
When to switch context
- The current process blocks on a resource or exits.
- The priority recomputation procedure results in the priority of another process becoming greater than that of the current one (the runrun flag is set).
- The current process, or an interrupt handler, wakes up a higher-priority process.
20
Analysis
- Does not scale well
- No way to let a specific process occupy the CPU
- No guarantees for real-time applications
- Little control over priorities
- The kernel is non-preemptive, so high-priority runnable processes may have to wait for the kernel to relinquish the CPU
21
5.5 The SVR4 Scheduler
Design goals:
- Support a diverse range of applications, including those requiring real-time response
- Separate the scheduling policy from the mechanisms that implement it
- Provide applications with greater control over their priority and scheduling
- Define a scheduling framework with a well-defined interface to the kernel
- Allow new scheduling policies to be added in a modular manner, including dynamic loading of scheduler implementations
- Limit the dispatch latency for time-critical applications
22
The Class-Independent Layer
- Responsible for context switching, run queue management, and preemption.
23
Preemption points
- Places in the code where the kernel data is in a stable state and the kernel is about to begin a long computation:
  - In the pathname parsing routine lookuppn()
  - In the open system call, before file creation
  - In the memory subsystem, before freeing the pages of a process
- At each point the kernel calls PREEMPT(), which checks the kprunrun flag
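The idea of a preemption point can be illustrated in user space. In this sketch PREEMPT() only checks a flag and records that it would have yielded; the loop body, the raise_at parameter, and preempt_sketch() are hypothetical stand-ins, not kernel code.

```c
static volatile int kprunrun = 0;  /* set when a real-time process becomes runnable */
static int preempt_count = 0;      /* how often the sketch yielded (illustration only) */

/* Stand-in for the kernel's preemption routine: here it only records the yield. */
static void preempt_sketch(void)
{
    kprunrun = 0;
    preempt_count++;
}

/* Preemption point: yield only if a real-time process is waiting. */
#define PREEMPT() do { if (kprunrun) preempt_sketch(); } while (0)

/*
 * A long-running loop with one preemption point per iteration.
 * `raise_at` simulates a wakeup of a real-time process at that iteration
 * (pass -1 for none). Returns the number of pages "freed".
 */
int free_pages_sketch(int npages, int raise_at)
{
    int freed = 0;
    for (int i = 0; i < npages; i++) {
        if (i == raise_at)
            kprunrun = 1;  /* as if an interrupt handler woke a real-time process */
        PREEMPT();         /* data structures are consistent between iterations */
        freed++;
    }
    return freed;
}
```

The check costs a single test when the flag is clear, so preemption points can be placed inside long loops without a measurable penalty.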
24
Interface to the Scheduling Classes
- 3 fields of proc:
  - p_cid: class ID, an index into the global class table
  - p_clfuncs: pointer to the classfuncs vector for the class
  - p_clproc: pointer to a class-dependent private data structure
- Class-dependent functions are invoked through macros:
    #define CL_SLEEP(procp, clprocp, …) (*(procp)->p_clfuncs->cl_sleep)(clprocp, …)
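The dispatch through p_clfuncs is ordinary function-pointer indirection. The sketch below reduces the interface to two entry points with a single argument each; the tsproc_sketch structure, the class ID value, and the priority boost on wakeup are hypothetical choices for illustration.

```c
#include <stddef.h>

/* Reduced class function vector (the real one has more entries and arguments). */
struct classfuncs {
    void (*cl_sleep)(void *clprocp);
    void (*cl_wakeup)(void *clprocp);
};

/* Only the three scheduling-related proc fields are modeled. */
struct proc {
    int p_cid;                     /* class ID */
    struct classfuncs *p_clfuncs;  /* class function vector */
    void *p_clproc;                /* class-dependent private data */
};

#define CL_SLEEP(procp, clprocp)  (*(procp)->p_clfuncs->cl_sleep)(clprocp)
#define CL_WAKEUP(procp, clprocp) (*(procp)->p_clfuncs->cl_wakeup)(clprocp)

/* Hypothetical private data for a time-sharing process. */
struct tsproc_sketch {
    int sleeping;
    int ts_cpupri;
};

static void ts_sleep(void *clprocp)
{
    ((struct tsproc_sketch *)clprocp)->sleeping = 1;
}

static void ts_wakeup(void *clprocp)
{
    struct tsproc_sketch *ts = clprocp;
    ts->sleeping = 0;
    ts->ts_cpupri += 10;  /* illustrative boost only, not the real policy */
}

static struct classfuncs ts_classfuncs = { ts_sleep, ts_wakeup };

/* Attach a process to the sketch time-sharing class. */
void ts_enterclass_sketch(struct proc *p, struct tsproc_sketch *ts)
{
    p->p_cid = 1;  /* hypothetical index into the class table */
    p->p_clfuncs = &ts_classfuncs;
    p->p_clproc = ts;
}
```

The class-independent code calls CL_SLEEP(p, p->p_clproc) without knowing which class p belongs to; switching class only means pointing p_clfuncs and p_clproc elsewhere.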
25
Interface (cont'd)
- Entry points:
  - CL_TICK: called from the clock interrupt handler
  - CL_FORK, CL_FORKRET: called from fork
  - CL_ENTERCLASS, CL_EXITCLASS: called when a process enters or exits the class
  - CL_SLEEP: called from sleep()
  - CL_WAKEUP: called from wakeprocs()
- Priorities:
  - 0-59: time-sharing class
  - 60-99: system priority
  - 100-159: real-time class
27
The Time Sharing Class
- The default class for a process.
- Round-robin scheduling
- Event-driven scheduling
- tsproc fields:
  - ts_timeleft: time remaining in the quantum
  - ts_cpupri: system part of the priority
  - ts_upri: user part of the priority (nice value)
  - ts_umpri: user mode priority (ts_cpupri + ts_upri)
  - ts_dispwait: seconds since the start of the quantum
- Dispatcher parameter table
28
Dispatcher parameter table
- ts_tqexp: new ts_cpupri to set when the quantum expires.
- ts_slpret: new ts_cpupri to set when returning to user mode after sleeping.
- ts_maxwait: number of seconds to wait for quantum expiry before using ts_lwait.
- ts_lwait: used instead of ts_tqexp if the process took longer than ts_maxwait to use up its quantum.
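A table-driven sketch of these rules follows. The four-row table and its values are invented for illustration, the ts_quantum field and the ts_slpret name are assumptions not given on the slide, and the functions are hypothetical; the real table has one row per time-sharing priority.

```c
/* One row of the dispatcher parameter table, indexed by ts_cpupri. */
struct ts_dpent {
    int ts_quantum;  /* assumption: quantum granted at this priority */
    int ts_tqexp;    /* new ts_cpupri when the quantum expires */
    int ts_slpret;   /* new ts_cpupri on return to user mode after sleeping */
    int ts_maxwait;  /* seconds to wait for quantum expiry before using ts_lwait */
    int ts_lwait;    /* new ts_cpupri if the quantum took longer than ts_maxwait */
};

/* Hypothetical 4-level table; higher index = higher priority, shorter quantum. */
static const struct ts_dpent ts_dptbl_sketch[4] = {
    /* quantum tqexp slpret maxwait lwait */
    { 100,     0,    2,     5,      2 },
    {  80,     0,    3,     5,      3 },
    {  40,     1,    3,     5,      3 },
    {  20,     2,    3,     5,      3 },
};

/* Reduced tsproc holding only the fields the table operates on. */
struct tsproc_sketch {
    int ts_timeleft;
    int ts_cpupri;
    int ts_dispwait;
};

/* Quantum expired: demote via ts_tqexp, or use ts_lwait if the wait was too long. */
void ts_quantum_expired(struct tsproc_sketch *ts)
{
    const struct ts_dpent *e = &ts_dptbl_sketch[ts->ts_cpupri];
    ts->ts_cpupri = (ts->ts_dispwait > e->ts_maxwait) ? e->ts_lwait : e->ts_tqexp;
    ts->ts_timeleft = ts_dptbl_sketch[ts->ts_cpupri].ts_quantum;
    ts->ts_dispwait = 0;
}

/* Returning to user mode after a sleep: apply ts_slpret. */
void ts_sleep_return(struct tsproc_sketch *ts)
{
    ts->ts_cpupri = ts_dptbl_sketch[ts->ts_cpupri].ts_slpret;
    ts->ts_dispwait = 0;
}
```

With this table a process that burns its quantum drifts down (3 to 2 to 1), while one that sleeps or is starved of the CPU is pushed back up, which is the event-driven behavior that favors interactive jobs.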
29
The Real-Time Class
- Priorities 100-159: higher than any time-sharing process.
- If the current process is executing kernel code, a newly runnable real-time process must wait until the current process is about to return to user mode or until it reaches a kernel preemption point.
- Real-time processes require bounded dispatch latency and bounded response time.
- Response time = interrupt handling time + dispatch latency.
31
The priocntl System Call
- Basic operations:
  - Changing the priority class of the process
  - Setting ts_upri for time-sharing processes
  - Resetting priority and quantum for real-time processes
  - Obtaining the current value of several scheduling parameters
- priocntlset: performs the same operations on a set of processes, such as all processes in the system, a process group, a session, a scheduling class, the processes of a particular user, or the processes having the same parent.
32
Adding a scheduling class
- Provide an implementation of each class-dependent scheduling function
- Initialize a classfuncs vector to point to these functions
- Provide an initialization function to perform setup tasks such as allocating internal data structures
- Add an entry for this class in the class table
- Rebuild the kernel
33
Analysis
- Provides a flexible approach that allows the addition of scheduling classes to a system.
- Event-driven scheduling favors I/O-bound and interactive jobs over CPU-bound ones.
- There is no good way for a time-sharing class process to switch to a different class.
- Most priocntl operations are restricted to the superuser.
- It is difficult to tune the system properly for a mixed set of applications.
- Solaris 2.x improves on the SVR4 scheduler.
34
5.6 Solaris 2.x Enhancements
- Multithreaded, symmetric-multiprocessing OS
- Preemptive kernel
  - Fully preemptive
  - Interrupts are implemented by special kernel threads
  - Interrupt threads always run at the highest priority in the system
35
Multiprocessor Support
- Processors can communicate by cross-processor interrupts
- Per-processor data structure:
  - cpu_thread: currently running thread
  - cpu_dispthread: thread last selected to run
  - cpu_idle: idle thread
  - cpu_runrun: preemption flag used for time-sharing threads
  - cpu_kprunrun: preemption flag set by real-time threads
  - cpu_chosen_level: priority of the thread that is going to preempt the current thread
36
Multiprocessor scheduling
[Figure: T6 becomes runnable and preempts T3]
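One plausible way to use the per-processor fields when a thread becomes runnable is sketched below: pick the processor whose current (or already chosen) thread has the lowest priority, and flag it for preemption. This is an assumption about the selection policy for illustration, not the Solaris dispatcher; cpu_sketch stores the priority of cpu_thread directly, and the threshold of 100 for real-time priorities follows the SVR4 ranges given earlier.

```c
#define RT_MIN_PRI 100  /* real-time priorities start here (SVR4 ranges) */

struct cpu_sketch {
    int cpu_thread_pri;    /* priority of the thread in cpu_thread */
    int cpu_runrun;        /* preemption flag for time-sharing threads */
    int cpu_kprunrun;      /* preemption flag set for real-time threads */
    int cpu_chosen_level;  /* priority of a pending preemptor, -1 if none */
};

/*
 * Choose a processor for a newly runnable thread of priority `new_pri`.
 * Returns the index of the processor flagged for preemption, or -1 if every
 * processor is running (or about to run) something at least as important.
 */
int choose_cpu_to_preempt(struct cpu_sketch *cpus, int ncpus, int new_pri)
{
    int victim = -1, victim_level = 0;

    for (int i = 0; i < ncpus; i++) {
        int level = cpus[i].cpu_thread_pri;
        if (cpus[i].cpu_chosen_level > level)
            level = cpus[i].cpu_chosen_level;  /* a preemption is already pending */
        if (level < new_pri && (victim < 0 || level < victim_level)) {
            victim = i;
            victim_level = level;
        }
    }
    if (victim >= 0) {
        cpus[victim].cpu_chosen_level = new_pri;
        if (new_pri >= RT_MIN_PRI)
            cpus[victim].cpu_kprunrun = 1;
        else
            cpus[victim].cpu_runrun = 1;
    }
    return victim;
}
```

Recording cpu_chosen_level keeps two threads that become runnable at nearly the same time from both selecting the same processor while the first preemption is still pending.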
38
Hidden Scheduling
- The kernel schedules some work without considering the priority of the thread for which it is doing the work, e.g. STREAMS services.
- Solaris remedies:
  - STREAMS processing is moved into kernel threads.
  - Callouts are handled by a special callout thread that runs at the maximum system priority.
39
Priority Inversion
- A situation where a lower-priority thread holds a resource needed by a higher-priority thread, thereby blocking that higher-priority thread.
40
Solution
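Priority inversion is solved by priority inheritance, also called priority lending: the thread holding the resource temporarily runs at the priority of the highest-priority thread it blocks.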
41
Priority inheritance must be transitive.
42
Implementation of Priority Inheritance
- Extra state is needed to implement priority inheritance
- Each thread has a global priority and an inherited priority
- pi_willto(): traverses the synchronization chain and passes on the inherited priority of the calling thread.
- pi_waive(): called by a thread to surrender its inherited priority.
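The transitive traversal can be sketched with two small structures. The structure layouts, the convention that an inherited priority of 0 means "none", and the _sketch function names are assumptions for illustration; only the roles of pi_willto() and pi_waive() come from the text.

```c
#include <stddef.h>

struct lock_sketch;

struct thread_sketch {
    int pri;                         /* global (assigned) priority */
    int inh_pri;                     /* inherited priority, 0 if none */
    struct lock_sketch *blocked_on;  /* lock this thread waits for, or NULL */
};

struct lock_sketch {
    struct thread_sketch *owner;     /* single, known owner (as for a mutex) */
};

/* A thread runs at the higher of its global and inherited priorities. */
int effective_pri(const struct thread_sketch *t)
{
    return t->inh_pri > t->pri ? t->inh_pri : t->pri;
}

/*
 * Called by a thread about to block: walk the chain of lock owners and will
 * the caller's priority to each owner that currently runs at a lower one.
 */
void pi_willto_sketch(struct thread_sketch *t)
{
    int pri = effective_pri(t);
    struct lock_sketch *l = t->blocked_on;

    while (l != NULL && l->owner != NULL) {
        struct thread_sketch *o = l->owner;
        if (effective_pri(o) >= pri)
            break;  /* assumed invariant: the rest of the chain is already this high */
        o->inh_pri = pri;
        l = o->blocked_on;  /* transitive step: the owner may itself be blocked */
    }
}

/* Called when a thread releases the resource: give up the inherited priority. */
void pi_waive_sketch(struct thread_sketch *t)
{
    t->inh_pri = 0;
}
```

If T1 (priority 100) blocks on a lock held by T2 (80), which is itself blocked on a lock held by T3 (60), both T2 and T3 end up running at 100; without the transitive step T3 would stay at 60 and still delay T1.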
45
Limitations of Priority Inheritance
- It can be implemented only when it is known which thread is going to free the resource, i.e. when the resource is held by a single, known thread.
- For mutexes the owner is always known, so priority inheritance can be used.
- For semaphores and condition variables the owner is usually indeterminate, so priority inheritance is not used.
- When a reader/writer lock is held for writing there is a single, known owner. It may, however, be held by multiple readers, in which case there is no single owner.
46
Limitations of Priority Inheritance (cont'd)
- Solaris defines an owner-of-record, which is the first thread that obtained the read lock. If a higher-priority writer blocks on this object, the owner-of-record thread inherits its priority. Any other readers cannot inherit the writer's priority, so the solution is limited.
- Priority inheritance reduces the time a high-priority thread must block, but in the worst case this time is still much greater than what is acceptable for many real-time applications.
- An alternative is the ceiling protocol. It requires, however, a priori knowledge of all processes in the system and their resource requirements, which is feasible in embedded applications.
47
Turnstiles
- A turnstile restricts the sleep queue to threads blocked on a particular resource, limiting the time taken to process the queue.
- Threads are queued in order of their priority.
- To wake threads blocked on a turnstile:
  - signal: wakes the single highest-priority thread
  - broadcast: wakes all blocked threads
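A turnstile can be pictured as a small priority-ordered list attached to one resource. The sketch below is a user-space illustration with hypothetical names; a higher number is taken to mean a higher priority, and threads of equal priority are kept in arrival order.

```c
#include <stddef.h>

struct tthread {
    int pri;
    int runnable;
    struct tthread *next;
};

/* Per-resource sleep queue, kept sorted with the highest priority first. */
struct turnstile_sketch {
    struct tthread *waiters;
};

/* Block thread `t` on the turnstile, inserting it in priority order. */
void turnstile_block(struct turnstile_sketch *ts, struct tthread *t)
{
    struct tthread **pp = &ts->waiters;
    while (*pp != NULL && (*pp)->pri >= t->pri)
        pp = &(*pp)->next;
    t->runnable = 0;
    t->next = *pp;
    *pp = t;
}

/* signal: wake the single highest-priority waiter; returns it, or NULL if none. */
struct tthread *turnstile_signal(struct turnstile_sketch *ts)
{
    struct tthread *t = ts->waiters;
    if (t != NULL) {
        ts->waiters = t->next;
        t->next = NULL;
        t->runnable = 1;
    }
    return t;
}

/* broadcast: wake all waiters; returns how many were woken. */
int turnstile_broadcast(struct turnstile_sketch *ts)
{
    int n = 0;
    while (turnstile_signal(ts) != NULL)
        n++;
    return n;
}
```

Because the list holds only the waiters of one resource and is already sorted, signal is a constant-time removal of the head rather than a search through unrelated sleepers.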
49
Solaris scheduling evaluation
- Suitable for multithreaded and many real-time applications on both uniprocessors and multiprocessors.
- Still missing other desirable real-time features, such as gang scheduling and deadline-driven scheduling.
50
Linux Scheduling
- Scheduling classes:
  - SCHED_FIFO: first-in-first-out real-time threads
  - SCHED_RR: round-robin real-time threads
  - SCHED_OTHER: other, non-real-time threads
- Within each class multiple priorities may be used
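The number of priority levels available in each class can be queried with the POSIX calls sched_get_priority_min() and sched_get_priority_max(). The wrapper priority_levels() below is a hypothetical helper; the actual ranges are system-dependent.

```c
#include <sched.h>

/*
 * Number of distinct static priority levels the system offers for a policy
 * (SCHED_FIFO, SCHED_RR, or SCHED_OTHER), or -1 if the query fails.
 */
int priority_levels(int policy)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    return hi - lo + 1;
}
```

POSIX requires at least 32 levels for SCHED_FIFO and SCHED_RR, so priority_levels(SCHED_FIFO) should be 32 or more on a conforming system; the range for SCHED_OTHER is left to the implementation.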
52
Non-Real-Time Scheduling
- Linux 2.6 uses a new scheduler, the O(1) scheduler.
- The time to select the appropriate process and assign it to a processor is constant, regardless of the load on the system or the number of processors.
- There is a separate queue for each priority. A higher priority is assigned a lower number.
53
Non-Real-Time Scheduling (cont'd)
- Two queue structures are kept: one for active queues and one for expired queues.
- All scheduling is done from the active queue structure.
- When the active structure becomes empty, it is swapped with the expired queue structure and scheduling continues.
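The active/expired arrangement can be sketched as two priority arrays with a bitmap each. To stay short, this sketch uses 32 priority levels (a simplification chosen so the bitmap fits one word) and tracks only how many tasks wait at each priority instead of linking task structures; all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define NPRIO 32  /* simplification: one 32-bit bitmap word */

struct prio_array {
    uint32_t bitmap;   /* bit p set => at least one task at priority p */
    int count[NPRIO];  /* tasks queued at each priority */
    int total;
};

struct runqueue_sketch {
    struct prio_array arrays[2];
    struct prio_array *active, *expired;
};

void rq_init(struct runqueue_sketch *rq)
{
    memset(rq, 0, sizeof *rq);
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

static void array_enqueue(struct prio_array *a, int prio)
{
    a->count[prio]++;
    a->total++;
    a->bitmap |= (uint32_t)1 << prio;
}

static void array_dequeue(struct prio_array *a, int prio)
{
    if (--a->count[prio] == 0)
        a->bitmap &= ~((uint32_t)1 << prio);
    a->total--;
}

/* Add a runnable task of priority `prio` (0 = highest) to the active structure. */
void rq_enqueue(struct runqueue_sketch *rq, int prio)
{
    array_enqueue(rq->active, prio);
}

/* Lowest set bit of a nonzero bitmap; at most 32 steps, so constant time. */
static int find_first_bit(uint32_t bitmap)
{
    int i = 0;
    while (!(bitmap & 1u)) {
        bitmap >>= 1;
        i++;
    }
    return i;
}

/*
 * Run the best task for one time slice and move it to the expired structure.
 * Returns the priority that ran, or -1 if there is nothing to run.
 */
int rq_run_next(struct runqueue_sketch *rq)
{
    if (rq->active->total == 0) {  /* active is empty: swap the two structures */
        struct prio_array *tmp = rq->active;
        rq->active = rq->expired;
        rq->expired = tmp;
    }
    if (rq->active->total == 0)
        return -1;
    int prio = find_first_bit(rq->active->bitmap);
    array_dequeue(rq->active, prio);
    array_enqueue(rq->expired, prio);
    return prio;
}
```

Both the bitmap search and the pointer swap take a bounded number of steps that does not depend on how many tasks are queued, which is where the constant-time claim comes from.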
55
Calculating priorities
- For non-real-time tasks the priority is changed dynamically as a function of the task's static priority and its execution behavior.
- For real-time tasks the priority is fixed.
- SCHED_FIFO tasks do not have assigned time slices.
- SCHED_RR tasks have assigned time slices, but they are never moved to the expired queue structure.