Post on 12-Jul-2015
transcript
Operating Systems I PT / FF 14
Process Concept
• Classically, processes are executed programs that have ...
• Resource Ownership
• Process includes a virtual address space to hold the process image
• Operating system prevents unwanted interference between processes
• Scheduling/Execution
• Process follows an execution path that may be interleaved with other processes
• Process has an execution state (Running, Ready, etc.) and a dispatching priority and is scheduled and dispatched by the operating system
• Today, the unit of dispatching is referred to as a thread or lightweight process
• The unit of resource ownership remains the process or task
2
Operating Systems I PT / FF 14
Single and Multithreaded Processes
3
code% data% files%
registers% stack%
Thread'
code% data% files%
registers%
stack%
Thread'
stack%
registers%
stack%
registers%
Thread' Thread'
Operating Systems I PT / FF 14
Scheduling
• Assign activities (processes / threads) to processor(s)
• System objectives to be considered; Response time, throughput, efficiency, ...
• Long-term scheduling: Decision to add a process to the pool of executed processes
• Example: Transition of a new process into „ready“ state; batch processing queue
• Medium-term scheduling: Decision to load process into memory for execution
• Example: Resume suspended processes from backing store
• Short-term scheduling: Decision which particular ready process will be executed
• Example: Move a process from „ready“ state into „running“ state
• I/O scheduling: Decision which process is allowed to perform device activities
• Overall goal is to minimize queuing time for all processes
4
Operating Systems I PT / FF 14
Short-Term Scheduler
• In cooperation with the dispatcher as part of the core operating system function
• Frequent fine-grained decision about what runs next, happens on:
• Clock interrupt (regular scheduling interval)
• I/O interrupts
• Operating system calls
• Signals
• Any event that blocks the currently running process / thread
• Needs decision criteria to choose the next
• User perspective vs. system perspective
5
Operating Systems I PT / FF 14
CPU and I/O Bursts
• Processes / threads can be described as either:
• I/O-bound – spends more time doing I/O than computations, many short CPU bursts
• Compute-bound – spends more time doing computations, few very long CPU bursts
• Behavior can change during run time
• Many short CPU bursts are typical
6
!!!!!!!!…!load!val!inc!val!read!file!
wait!for!I/O!
inc!count!add!data,!val!write!file!
wait!for!I/O!
load!val!inc!val!read!from!file!
wait!for!I/O!
…!
CPU!burst!
CPU!burst!
CPU!burst!
I/O!burst!
I/O!burst!
I/O!burst!
Burst&dura)on&(msec)&0& 10& 20& 30&
distrib
u)on
&
Operating Systems I PT / FF 14
Short-Term Scheduler
• Scheduling criteria
• CPU utilization - Keep the CPU as busy as possible
• Throughput - Number of processes that complete their execution per time unit
• Turnaround time - Amount of time to fully execute a particular process
• Waiting time - Amount of time a process has been waiting in the ready queue
• Response time - Amount of time it takes from when a request was submitted until the first response is produced
• Response is not necessarily valuable output, can also be just a wait indicator
7
Operating Systems I PT / FF 14
Short-Term Scheduling Criteria
8
User-oriented System-oriented
Performance
Turnaround time (submission to completion)
Response time
(interactive)
Deadlines
Throughput (#process completions)
Resource utilization
OtherPredictability
(regardless of system load)
Fairness (no starvation)
Priority enforcement
Resource balancing
Operating Systems I PT / FF 14
Short-Term Scheduling: Multiprocessors
• Load Sharing - Processes are not assigned to a particular processor, global queue
• Central data structure with mutual exclusion may become a bottleneck
• Caching may become ineffective
• Optimized version became default in all standard operating systems
• Gang Scheduling - Set of related threads is scheduled to run on a set of processors at the same time on a one-to-one base
• Mainly beneficial for parallel applications
• Dedicated Processor Assignment - Implicit scheduling by the fixed assignment of threads to processors until completion
• Sacrifices processor utilization for an exact metric of performance
• Dynamic Scheduling - Number of threads in a process can be altered by the scheduler (research approach)
9
Operating Systems I PT / FF 14
Scheduling Function and Decision Mode
• Selection function for scheduling determines which process, among ready processes, is selected next for execution
• May be based on priority, resource requirements, or the execution characteristics
• If based on execution characteristics, then important quantities are:
• w = time spent in system so far, waiting
• e = time spent in execution so far
• s = total time required by the process, including e (user estimation)
• Decision mode specifies the kind of scheduler
• Preemptive: Currently running process is interrupted and moved to ready queue
• Non-preemptive: Process runs until termination or intentional blocking (e.g. I/O)
10
Operating Systems I PT / FF 14
Round Robin
• Uses preemption based on a clock interrupt, manage „ready“ processes in a queue
• Also known as time slicing - each process get‘s a time quantum
• Particularly effective in time-sharing system or transaction processing system
• Compute-bound processes are favored over I/O bound processes in mixed load
• I/O wait delays the move-back to the „ready“ list
• Better for short jobs in comparison to FCFS
• Very short quantum brings overhead penalty, typical lower limit of 10ms
11
Operating Systems I PT / FF 14
Round Robin
12
• Quantum should be slightly longer than the time required to complete a typical request or function
• Quantum higher than the longest request processing time leads to pure FCFS
Thread'execu+on'+me:'15'
0' 15'
15'
15'
0'
0'
10'
10'
quantum'context'switches'
20'
10'
1'
0'
1'
14'
Operating Systems I PT / FF 14
Round Robin with I/O Bursts
13
Thread Burst Time T1 23 T2 7 T3 38 T4 14
T1! T2! T3! T4! T1! T3! T4! T1! T3! T3!
0! 10! 17! 27! 37! 47! 57! 61! 64! 74! 82!
Operating Systems I PT / FF 14
Multilevel Queue Scheduling
• Ready queue is partitioned into separate queues
• Real-time (system, multimedia) and Interactive
• Queues may have different scheduling algorithms
• Real-Time – Round Robin
• Interactive – Round Robin + priority-elevation + quantum stretching
• Scheduling must be done between the queues
• Fixed priority scheduling (i.e., serve all real-time threads then from interactive)
• Possibility of starvation
• Time slice – each queue gets a certain amount of CPU time which it can schedule
• Established approach in Solaris operating system family
14
Operating Systems I PT / FF 14
Example: Windows
• Windows dispatcher
• Gives control to the thread selected by the short-term scheduler
• Switching context, switching to user mode
• Jumping to the proper location in the user program to restart that program
• Windows has no mid-term or long-term scheduler
• Dispatch latency – time it takes for the dispatcher to stop one and start another
• Windows scheduling is event-driven - no central dispatcher module in the kernel
• Starvation problem
• Unix: Decreasing priority + aging
• VMS / Windows: Priority elevation
15
Operating Systems I PT / FF 14
Windows Scheduler
• Priority-driven preemptive scheduling system
• Highest-priority runnable thread always runs
• Thread runs for time amount of quantum
• No single scheduler - event-based scheduling code spread across the kernel
• Dispatcher routine triggered by the following event
• Thread becomes ready for execution
• Thread leaves running state (quantum expires, wait state)
• Thread‘s priority changes (system call / NT activity)
• Processor affinity of a running thread changes
16
Operating Systems I PT / FF 14
Windows Scheduling Principles
• 32 priority levels
• Threads within same priority are scheduled following round robin policy
• Realtime priorities (i.e.; > 15) are assigned statically to threads
• Non-realtime priorities are adjusted dynamically
• Priority elevation as response to certain I/O and dispatch events
• Quantum stretching to optimize responsiveness
• In multiprocessor systems, affinity mask is considered
• No attempt to share processors fairly among processes, only among threads
17
6
N-.#-0K((L6.-/$(G."%.",4(d-<-0'(
$%&'()*+,-.)/&+)0)+1&
$2&0*(3*4+)&+)0)+1&
51)6&47&8)("&9*:)&#;()*6&
51)6&47&36+)&#;()*6<1=&
>$&
$%&
&?&
&!"
$2&
&$&
Operating Systems I PT / FF 14
Multiprocessor Systems
• Threads can run on any CPU, unless specified otherwise
• Scheduling tries to keep threads on same CPU (soft affinity)
• Threads can be bound to particular CPUs (hard affinity)
• SetThreadAffinityMask, SetProcessAffinityMask, SetInformationJobObject
• Bit mask where each bit corresponds to a CPU number
• Thread affinity mask must be a subset of process affinity mask, which must be a subset of the active processor mask and may be derived from the image affinity mask, if given
• The scheduling code runs fully distributed, no ,master‘ processor
• Any processor can interrupt another processor to schedule a thread
• Scheduling database as per-CPU data structure of ready queues
18
Operating Systems I PT / FF 14
Multiprocessor Systems
• Every thread has an ideal processor
• System selects ideal processor for the first thread of a fresh process (round robin across CPUs)
• Next thread gets next CPU relative to the process seed
• SetThreadIdealProcessor (HANDLE hThread, DWORD dwIdealProcessor)
• Hard affinity changes update ideal processor settings
• Used in selecting where a thread runs next
• Hyperthreading: GetLogicalProcessorInformation()
• NUMA systems: GetProcessAffinityMask(), GetNumaProcessorNode(), GetNumaHighestNodeNumber(), GetNumaNodeProcessorMask()
19
Operating Systems I PT / FF 14
Windows Scheduling Principles
• No central scheduler, i.e. there is no routine or thread called “the scheduler”
• Routines are called whenever events change the ready state of a thread
• Things that cause scheduling events include:
• Interval timer interrupts (for quantum end)
• Interval timer interrupts (for timed wait completion)
• Other hardware interrupts (for I/O wait completion)
• Thread changes the state of a waitable object upon which thread(s) are waiting
• A thread waits on one or more dispatcher objects
• A thread priority is changed
• Based on doubly-linked lists (queues) of ready threads
20
Operating Systems I PT / FF 14
Windows Scheduling Principles
• Windows API point of view
• Processes are given a priority class upon creation ( Idle, Normal, High, Realtime )
• Windows 2000 added “Above normal” and “Below normal”
• Threads have a relative priority within the class ( Idle, Lowest, Below_Normal, Normal, Above_Normal, Highest, and Time_Critical )
• Different API functions to influence scheduling ( Get/SetPriorityClass, Get/SetThreadPriority, Get/SetProcessAffinityMask, SetThreadAffinityMask, SetThreadIdealProcessor, Suspend/ResumeThread )
• Kernel point of view
• Threads have priorities 0 through 31, scheduled accordingly
• Process priority class is not used to make scheduling decisions
21
Operating Systems I PT / FF 14
Special Thread Priorities
• One idle thread per CPU
• When no threads want to run, idle thread is executed
• Appears to have priority zero, but actually runs “below” priority 0
• Provides CPU idle time accounting - unused clock ticks are charged to idle thread
• Loop:
• Calls HAL to allow for power management, processes DPC list
• Dispatches to a thread if selected
• One zero page thread per system
• Zeroes pages of memory in anticipation of “demand zero” page faults
• Runs at priority zero (lower than reachable with Windows API) in the „system“ process
24
Operating Systems I PT / FF 14
Thread Scheduling States (2000, XP)
25
Ready&(1)& Running&(2)&
Wai0ng&(5)&
Ready&=&thread&eligible&to&be&scheduled&to&run&Standby&=&thread&is&selected&to&run&on&CPU&>=&Vista:&Addi0onal&‘Deferred&ready’&state&
voluntary&switch&
preemp0on,&&quantum&end&
Init&(0)&
Terminate&(4)&
Transi0on&(6)&
wait&resolved&aRer&kernel&stack&made&&&&&pageable&
Standby&(3)&preempt&
Operating Systems I PT / FF 14
Thread Scheduling States (2000, XP)
26
• Transition:
• Thread was in a wait entered from user mode for 12 seconds or more
• System was short on physical memory, so the balance set manager marked the thread’s kernel stack as pageable
• Later, the thread’s wait was satisfied, but it can’t become ready until the system allocates a non-pageable kernel stack frame
• Initiate:
• Thread is “under construction” and can’t run yet
• Standby: One processor has selected a thread for execution on another processor
• Terminate: Thread has executed its last code, but can’t be deleted until all handles and references to it are closed (object manager)
Operating Systems I PT / FF 14
Scheduling Scenarios
• Preemption
• A thread becomes ready at a higher priority than the currently running thread
• The lower-priority running thread is preempted
• The preempted thread goes back to the head of its ready queue
• Scheduler needs to pick the lowest priority thread to preempt
• Preemption is strictly event-driven, does not wait for the next clock tick
• Threads in kernel mode may be preempted (unless they raise IRQL to >= 2)
27
Operating Systems I PT / FF 14
Priority Adjustments• Dynamic priority adjustments are applied to threads in dynamic classes
• Disable if desired with SetThreadPriorityBoost or SetProcessPriorityBoost
• Types of priority adjustment
• I/O completion
• Wait completion on executive events or semaphores
• When threads in the foreground process complete a wait operation
• Boost value of 2, lost after one full quantum
• Quantum decremented by 1 so that threads that get boosted after I/O completion won't keep running and never experiencing quantum end
• GUI threads that wake up to windowing input (e.g. messages) get a boost of 2
• Added the current priority, not the base priority
28
Operating Systems I PT / FF 14
Priority Adjustments
• No automatic adjustments in real-time class (16 or above)
• Real time here really means “system won’t change the relative priorities of your real-time threads”
• Hence, scheduling is predictable with respect to other “real-time” threads,but not for absolute latency
• Example: Boost on I/O completion
• Specified by the device driver through IoCompleteRequest(Irp, PriorityBoost)
• Common boost values (see NTDDK.H): 1 - disk, CD-ROM, parallel, video ;2 - serial, network, named pipe, mailslot ; 6 - keyboard or mouse ;8 - sound
29
Operating Systems I PT / FF 14
Foreground Applications
• Quantum Stretching
• The threads of a normal-priority process that owns the foreground window may get longer quantum (Win32PrioritySeparation registry key)
• „Maximum“ - 6 ticks, „Middle“ - 4 ticks, „None“ - 2 ticks
• Does not happen on Server editions by default, depends on Windows „performance options“; NT4 Server had 12 ticks
30
8""
Running"""Ready"
Operating Systems I PT / FF 14
Choosing a CPU for a Ready Thread• For Windows 2000 / XP
• Check if any processors are idle that are in the thread’s hard affinity mask:
• If its ideal processor is idle, it runs there
• Else, if the previous processor it ran on is idle, it runs there
• Else if the current processor is idle, it runs there
• Else it picks the highest numbered idle processor in the thread’s affinity mask
• If no processors are idle:
• If the ideal processor is in the thread’s affinity mask, it selects that
• Else if the previous processor is in the thread’s affinity mask, it selects that
• Else it picks the highest numbered processor in the thread’s affinity mask
• Check the priority of the thread running on the processor for preemption31
Operating Systems I PT / FF 14
Choosing a Thread for a CPU
• For Windows 2000 / XP
• System needs to choose a thread to run on a specific CPU at quantum end, wait state entering, affinity mask changes, or thread exit
• Starting with the first thread in the highest priority non-empty ready queue, it scans the queue for the first thread that:
• Has the current processor in its hard affinity mask, and
• Ran last on the current processor, or has its ideal processor equal to the current processor, or has been in its ready queue for 3 or more clock ticks, or has a priority >=24
• If it cannot find such a candidate, it selects the highest priority thread that can run on the current CPU (whose hard affinity includes the current CPU)
32
Operating Systems I PT / FF 14
Scheduling Data Structures (since Server 2003)
• Threads always go into the ready queue of their ideal processor
• Instead of locking the dispatcher database to look for a candidate to run, per-CPU ready queue is checked first (PRCB lock)
• If a thread has been selected on the CPU, just perform the dispatching
• Otherwise scan of other CPU’s ready queues looking for a thread to run
• This scan is done OUTSIDE the dispatcher lock, just acquires PRCB lock
• Dispatcher lock still need to wait or un-wait a thread
• In sum, global dispatcher database lock is now held for a MUCH shorter time
• Idle processor selection considers NUMA and hyperthreading characteristics
• Next ideal processor is the first logical processor on the next physical processor
33
Operating Systems I PT / FF 14
New since Windows 7
• Core Parking
• Historically, CPU workload was distributed fairly evenly across logical processors,even on low utilization
• Core Parking tries to keep the load on fewest logical processors possible,all others can sleep; only overridden by hard affinity and thread ideal processor
• Power management code notifies scheduling code about parked cores
• Considers socket topology - newer processors put sockets into deep sleep if all the cores are idle
• At least one CPU in each NUMA node is left unparked for fast memory access
• Core Parking is active on server and hyperthreading systems
• Best returns on medium utilization workloads, but typical Desktop client systems tend to run at extremes
34
Operating Systems I PT / FF 14
New since Windows 7
• Before, no quality of service for Remote Desktop (formerly called Terminal Server)
• One user could hog server’s CPU
• Remote Desktop role now automatically enables dynamic fair share scheduling
• Sessions are given weight 1-9 (default is 5), internal API can set weight
• Each session given CPU budget, charge happens at every scheduler event
• When session exceeds quota, its threads go to idle-only queue
• Scheduled only when no other session wants to run
• At end of interval, all threads made ready to run
35
Operating Systems I PT / FF 14
Unix SVR4 Scheduling
• Differentiation between different three priority classes for 160 priority levels
• Real-time processes (159-100)
• kernel-mode processes (99-60)
• time-shared processes (59-0, user mode)
• Kernel was not preemptible, so specific preemption points were defined
• Region of code where all kernel data structures are either updated and consistent, or locked via a semaphore
• One dispatch queue per priority level, each handled in round-robin
• Each time a time-shared process used a quantum, its priority is decreased
• Each time it blocks on an event or resource, its priority is increased
• Time-shared process quantum depends on priority, fixed for real-time processes36
Operating Systems I PT / FF 14
Linux Scheduling• schedule function as central
organization point for scheduling
• Runtime of the scheduler became thread-count-independent with Linux 2.6 - O(1) scheduler
• Also established for a while in BSD and Windows NT kernels
• Internal priorities: real-time processes (0-99), regular processes (100-139)
• nice system call allows to modify the static priority between -20 and +19(less means higher priority)
37
Operating Systems I PT / FF 14
Linux Scheduling• Each process is represented by a task_struct, which contains all scheduling-related
information
• Dynamic and static priority
• Scheduling policy - SCHED_NORMAL, SCHED_RR, SCHED_FIFO
• Real-time scheduling classes demanded for POSIX compatibility
• Round-robin real-time processes have a quantum, FIFO processes not
• Processor affinity mask
• Average sleep time of the task (high sleep time gives better priority, to support interactive tasks in the best-possible way)
• Remaining quantum as time slice
• Tasks are scheduled independently, so threads from the same process can run on different processors
38
Operating Systems I PT / FF 14
Linux Scheduling
• Each CPU has three queues
• active queue (still have quantum)
• expired queue (quantum over)
• migration queue (for processor migration)
• Queues are summarized in a runqueue structure
• When active queue is empty, it is swapped with the expired queue
• Periodic scheduling function (scheduler_tick) decreases the current quantum and calls the main scheduling function if needed
• Main function takes the highest priority task from the active queue and runs it
• Calculation of the dynamic priority in the effective_prio() function
39
Operating Systems I PT / FF 14
Linux Scheduling• Base time quantum
• Static priority determines the base time quantum, which is assigned when the former quantum is exhausted
• With static priority < 120: (140 - static priority) * 20
• With static priority >= 120: (140 - static priority) * 5
• Base time quantum gets longer with higher priority (lower value)
• Dynamic priority
• max(100, min(static priority - bonus + 5, 139))
• Bonus is a value between 0 and 10, depends on average sleep time
• less than 5 is a penalty, more than 5 is a premium
• Average sleep time is decreasing when the process is running
40