Date posted: 25-May-2015
Category: Technology
Uploaded by: hemanth-venkatesh
Slide 1
Driver Parallelism using SMP and Kernel Pre-emption
Hemanth V
Slide 2
Prerequisites
• Understanding of Linux device drivers
• Basic understanding of Linux synchronization mechanisms like semaphores, mutexes, and spinlocks
Slide 3
Contents
Kernel Pre-emption Feature
SMP Architecture
USB Usecase Analysis
Driver Scenarios
Summary
What is Driver Parallelism
Slide 4
Driver Parallelism
• Parallelism or concurrency arises when the system tries to do more than one thing at once
– Concurrency is when two tasks can start, run, and complete in overlapping time periods; it doesn't necessarily mean they will ever both be running at the same instant.
– Parallelism is when tasks literally run at the same time
• The goal of parallelism/concurrency is to improve system performance
• The side effect is that it can also lead to race conditions
• The following slides highlight the sources of parallelism/concurrency, how to improve performance, and how to avoid race conditions in Linux device drivers
http://www.fasterj.com/cartoon/cartoon106.shtml
Slide 5
Kernel Preemption
• CONFIG_PREEMPT
– This kernel config option reduces the latency of the kernel by making all kernel code (that is not executing in a critical section) preemptible.
– It allows reaction to interactive events by permitting a low-priority process to be preempted involuntarily even while it is executing in kernel mode
– After an asynchronous event such as an interrupt handler completes, if a higher-priority process is ready to run, the current process is replaced.
– Useful for embedded systems with latency requirements in the milliseconds range.
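The option is selected at kernel build time (under the "Preemption Model" choice in menuconfig); a minimal .config fragment for a fully preemptible kernel looks like:

```
CONFIG_PREEMPT=y
```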
Slide 6
SMP Architecture
• Evolution of multiprocessor architectures
– The late 60s saw a need for more CPU processing power for scientific and compute-intensive applications.
– Two or more CPUs were combined to form a single computer
• SMP (Symmetric Multiprocessing) is one of several multiprocessor architectures.
• AMP and clusters are others
• Basic idea: more tasks in parallel per unit time
Slide 7
SMP Architecture
Cache   Cache   Cache   Cache
CPU     CPU     CPU     CPU
Memory          I/O
Fig 1: Logical view of SMP. In an actual hardware implementation, the cache is not directly connected to the bus.
Slide 8
SMP Architecture Contd
• In the 4-CPU SMP system shown in the diagram, all CPUs are symmetric, i.e. of the same architecture, frequency, etc.
• CPU, memory, and I/O are tightly coupled by a high-speed interconnect bus, allowing any unit connected to the bus to communicate with any other unit
• A single globally accessible memory is used by all CPUs; there is no local RAM in the CPUs, so data changes are visible to all CPUs
• Access to the global shared memory is symmetric, or equal; its contents are fully shared, and all CPUs use the same address when referring to the same piece of data
• I/O access is also symmetric, i.e. any CPU can initiate I/O
Slide 9
SMP Architecture Contd
• Interrupts are distributed across CPUs by the PIC
• Access to the bus and memory has to be arbitrated so that no two CPUs step on each other, and all have guaranteed fair access
• The maximum number of usable CPUs depends on bus bandwidth
• Only one instance of the operating system is loaded into main memory
• Kernel data structures are accessed concurrently, hence the kernel needs to be SMP-aware
Slide 10
SMP Intricacies: Cache Coherency
Slide 11
SMP Intricacies: Cache Coherency
• In most implementations each CPU stores data in a cache to improve system performance.
• Consider two threads running on two different CPUs in an SMP system, both using a global variable "Data". If one of them modifies it to 1, the change may be reflected only in that CPU's own cache; the values in main memory and in the other CPU's cache are stale, and if the other CPU reads them the results are unpredictable. Hence the need to maintain consistency, or coherency, of the caches.
• This problem is typically solved by hardware cache-coherency protocols, which include bus snooping with write-update or write-invalidate policies
Slide 12
SMP Intricacies: Atomic operations
• Consider two threads trying to obtain the same semaphore simultaneously: both read a value of 0, both think it is available, and both set it to 1.
• Such issues are solved using atomic instructions provided by each architecture
• Special instructions provide atomic test-and-set operations, for example the load-linked/store-conditional instructions on MIPS and load-exclusive/store-exclusive (LDREX/STREX) on ARM
Slide 13
USB Subsystem Analysis
Linux Host stack (top to bottom):
• USB Print App / USB Mass Storage App
• USB Print Class Driver / USB Mass Storage Class Driver
• USB Core
• EHCI Driver
• USB Host Controller

Linux Device stack (top to bottom):
• USB Print App
• Print gadget Driver / Mass storage gadget Driver
• UDC Driver
• USB Device Controller

Simplified view of the USB subsystem
Slide 14
USB Subsystem Analysis: No preempt
• Assume the Linux host has initiated a large transfer for USB mass storage.
• The in-kernel transfer would not be preempted until the available data is exhausted.
• A high-priority print request with a small amount of data would get scheduled only after the mass storage transfer is complete.
• This hurts the end-user experience
Slide 15
USB Subsystem Analysis: Preempt Enabled
• Assume the same scenario with kernel preemption enabled.
• The in-kernel mass-storage transfer can be preempted and replaced by the print data transfer, for example after processing a keyboard or timer interrupt
• This opens another parallel path into both the USB core and the EHCI driver, since the mass storage transfer is not complete when the print transfer starts.
• The print transfer could re-open the same device, access the same data structures to initiate a transfer, and could even disconnect the device.
Slide 16
USB Subsystem Analysis: Preempt Enabled
• Hence driver design needs to identify all parallel paths and the points at which it is safe to be preempted, while still enabling parallelism.
• For example, it may be safe to preempt once a URB request is queued, but not while DMA is in progress, since the DMA configuration registers could be overwritten.
Slide 17
USB Subsystem Analysis: SMP
• Assume the previous scenario on an SMP system
• In this case the scheduler need not preempt the running mass storage transfer, but can schedule the print transfer on another CPU.
• This too opens a new parallel path into the drivers, and both would be executing at the same instant in time.
• Hence, if parallelism is handled correctly in a driver, it is to a large extent SMP-safe.
• On SMP systems, an interrupt handler and driver code can run concurrently on different CPUs.
• Hence the need to protect data shared with interrupt handlers using spinlocks
Slide 18
Driver Scenarios
static LIST_HEAD(ts_list);

void process_ts_entries(void)
{
    struct ts_entry *ts, *tmp;

    local_irq_disable();
    /* _safe variant is required because entries are deleted while iterating */
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list elements */
        list_del(&ts->node);
    }
    local_irq_enable();
}

irqreturn_t ts_isr(int irq, void *dev_id)
{
    struct ts_entry *ts = dev_id;  /* illustrative: entry filled from device data */

    /* Process interrupt */
    list_add_tail(&ts->node, &ts_list);
    return IRQ_HANDLED;
}

local_irq_disable() protects against both the interrupt handler and preemption, but only on the local CPU
spin_lock_irqsave() needs to be added in both the driver code and the ISR to make this SMP-safe
Slide 19
Driver Scenarios: Contd
Locking with a mutex/semaphore doesn't disable preemption, but guarantees that the data structure is not corrupted across preemption
Both SMP-safe and preempt-safe

static LIST_HEAD(ts_list);
static DEFINE_MUTEX(ts_lock);

int process_ts_entries(void)
{
    struct ts_entry *ts, *tmp;

    if (mutex_lock_interruptible(&ts_lock))
        return -EINTR;   /* interrupted by a signal while waiting */
    /* _safe variant is required because entries are deleted while iterating */
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list elements */
        list_del(&ts->node);
    }
    mutex_unlock(&ts_lock);
    return 0;
}

int process_rest_entries(void)
{
    struct ts_entry *ts;

    if (mutex_lock_interruptible(&ts_lock))
        return -EINTR;
    list_for_each_entry(ts, &ts_list, node) {
        /* Process remaining elements */
    }
    mutex_unlock(&ts_lock);
    return 0;
}
Slide 20
Driver Scenarios: Contd
process_ts_entries() and process_rest_entries() can deadlock: if each is preempted (or runs on another CPU) after taking its first lock, each then blocks forever waiting for the lock the other already holds
Locks must always be acquired in the same order to avoid this deadlock

static LIST_HEAD(ts_list);
static LIST_HEAD(tc_list);
static DEFINE_MUTEX(ts_lock);
static DEFINE_MUTEX(tc_lock);

int process_ts_entries(void)
{
    mutex_lock(&ts_lock);      /* takes ts_lock first ... */
    /* Some processing */
    mutex_lock(&tc_lock);      /* ... then tc_lock */
}

int process_rest_entries(void)
{
    mutex_lock(&tc_lock);      /* opposite order: tc_lock first ... */
    /* Some processing */
    mutex_lock(&ts_lock);      /* ... then ts_lock -> potential deadlock */
}
Slide 21
Driver Scenarios: Contd
In some cases it is better to funnel access to a resource through a single function, rather than have lock/unlock calls spread throughout the code

static LIST_HEAD(ts_list);
static DEFINE_MUTEX(ts_lock);

void process_ts_entries(void)
{
    struct ts_entry *ts, *tmp;

    mutex_lock(&ts_lock);
    /* _safe variant is required because entries are deleted while iterating */
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list elements */
        list_del(&ts->node);
    }
    mutex_unlock(&ts_lock);
}

Caller 1: { /* Process list elements */ process_ts_entries(); }
Caller 2: { /* Process list elements */ process_ts_entries(); }
Slide 22
Driver Scenarios
• Don't use one big lock for everything; it reduces concurrency
• Too fine-grained locking increases overhead
• Need to balance both aspects
• Reader–writer locks
– Useful if data structures are read more often than they are updated
– Allow multiple read locks to be held simultaneously
– Allow a single write lock to be held, and prevent any read lock from being taken while the write lock is held
– Available for both spinlocks and semaphores
• Stack variables/structures don't need locking, since each thread of execution gets its own instance on its own stack
Slide 23
Summary
• Concurrency/parallelism needs to be one of the criteria during the driver design phase
• Analysis is required to determine the parallel paths and the protection needed for critical sections
• Drivers that manage concurrency with appropriate locking techniques not only avoid race conditions but also improve performance
• Unit testing could be used to exercise some of the parallel paths in the driver
– Two different applications that open parallel paths into the same driver
– Two instances of the same application