Date posted: 25-May-2015
Category: Technology
Uploaded by: hemanth-venkatesh
Slide 1
Driver Parallelism using SMP and Kernel Pre-emption
Hemanth V
Slide 2
Prerequisites
• Understanding of Linux device drivers
• Basic understanding of Linux synchronization mechanisms like semaphores, mutexes, and spinlocks
Slide 3
Contents
Kernel Pre-emption Feature
SMP Architecture
USB Usecase Analysis
Driver Scenarios
Summary
What is Driver Parallelism
Slide 4
Driver Parallelism
• Parallelism or concurrency arises when the system tries to do more than one thing at once
– Concurrency is when two tasks can start, run, and complete in overlapping time periods; it doesn't necessarily mean they will ever both be running at the same instant.
– Parallelism is when tasks literally run at the same time
• The goal of parallelism/concurrency is to improve system performance
• The side effect is that it can also lead to race conditions
• The following slides highlight the sources of parallelism/concurrency, how to improve performance, and how to avoid race conditions in Linux device drivers
http://www.fasterj.com/cartoon/cartoon106.shtml
Slide 5
Kernel Preemption
• CONFIG_PREEMPT
– This kernel config option reduces the latency of the kernel by making all kernel code (that is not executing in a critical section) preemptible.
– It allows reaction to interactive events by permitting a low-priority process to be preempted involuntarily even while it is executing in kernel mode
– After an asynchronous event such as an interrupt handler completes, if a higher-priority process is ready to run, the current process is replaced.
– Useful for embedded systems with latency requirements in the milliseconds range.
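The option is selected at kernel build time (under the "Preemption Model" choice in menuconfig); a minimal .config fragment for a fully preemptible kernel looks like:

```
CONFIG_PREEMPT=y
```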
Slide 6
SMP Architecture
• Evolution of multiprocessor architectures
– The late 60s saw a need for more CPU processing power for scientific and compute-intensive applications.
– Two or more CPUs were combined to form a single computer
• SMP (Symmetric Multiprocessing) is one of several multiprocessor architectures.
• AMP and clusters are others
• Basic idea: more tasks in parallel per unit time
Slide 7
SMP Architecture
Cache   Cache   Cache   Cache
CPU     CPU     CPU     CPU
Memory          I/O
Fig 1: Logical view of SMP. In an actual hardware implementation, the cache is not directly connected to the bus.
Slide 8
SMP Architecture Contd
• In the 4-CPU SMP system shown in the diagram, all CPUs are symmetric, i.e. of the same architecture, frequency, etc.
• CPU, memory, and I/O are tightly coupled by a high-speed interconnect bus, allowing any unit connected to the bus to communicate with any other unit
• A single globally accessible memory is used by all CPUs; there is no local RAM in the CPUs, so data changes are visible to all CPUs
• Access to the global shared memory is symmetric, or equal; its contents are fully shared, and all CPUs use the same address when referring to the same piece of data
• I/O access is also symmetric, i.e. any CPU can initiate I/O
Slide 9
SMP Architecture Contd
• Interrupts are distributed across CPUs by the PIC
• Access to the bus and memory has to be arbitrated so that no two CPUs step on each other, and all have guaranteed fair access
• The maximum number of usable CPUs depends on bus bandwidth
• Only one instance of the operating system is loaded into main memory
• Kernel data structures are accessed concurrently, hence the kernel needs to be SMP-aware
Slide 10
SMP Intricacies: Cache Coherency
Slide 11
SMP Intricacies: Cache Coherency
• In most implementations each CPU stores data in a cache to improve system performance.
• Consider two threads running on two different CPUs in an SMP system, both using a global variable "Data". If one of them modifies it to 1, the change may be reflected only in that CPU's own cache; the values in main memory and in the other CPU's cache are stale, and if the other CPU reads them the results are unpredictable. Hence the need to maintain consistency, or coherency, of the caches.
• This problem is typically solved by hardware cache-coherency protocols, which include bus snooping with write-update or write-invalidate policies
Slide 12
SMP Intricacies: Atomic operations
• Consider two threads trying to obtain the same semaphore simultaneously: both read a value of 0, both think it is available, and both set it to 1.
• Such issues are solved using atomic instructions provided by each architecture
• Special instructions provide atomic test-and-set operations, for example the load-linked/store-conditional instructions on MIPS and load-exclusive/store-exclusive (LDREX/STREX) on ARM
Slide 13
USB Subsystem Analysis
Linux Host stack (top to bottom):
• USB Print App / USB Mass Storage App
• USB Print Class Driver / USB Mass Storage Class Driver
• USB Core
• EHCI Driver
• USB Host Controller

Linux Device stack (top to bottom):
• USB Print App
• Print gadget Driver / Mass storage gadget Driver
• UDC Driver
• USB Device Controller

Simplified view of the USB subsystem
Slide 14
USB Subsystem Analysis: No preempt
• Assume the Linux host has initiated a large transfer for USB mass storage.
• The in-kernel transfer would not be preempted until the available data is exhausted.
• A high-priority print request with a small amount of data would get scheduled only after the mass storage transfer is complete.
• This hurts the end-user experience
Slide 15
USB Subsystem Analysis: Preempt Enabled
• Assume the same scenario with kernel preemption enabled.
• The in-kernel mass-storage transfer can be preempted and replaced by the print data transfer, for example after processing a keyboard or timer interrupt
• This opens another parallel path into both the USB core and the EHCI driver, since the mass storage transfer is not complete when the print transfer starts.
• The print transfer could re-open the same device, access the same data structures to initiate a transfer, and could even disconnect the device.
Slide 16
USB Subsystem Analysis: Preempt Enabled
• Hence driver design needs to identify all parallel paths and the points at which it is safe to be preempted, while still enabling parallelism.
• For example, it may be safe to preempt once a URB request is queued, but not while DMA is in progress, since the DMA configuration registers could be overwritten.
Slide 17
USB Subsystem Analysis: SMP
• Assume the previous scenario on an SMP system
• In this case the scheduler need not preempt the running mass storage transfer, but can schedule the print transfer on another CPU.
• This too opens a new parallel path into the drivers, and both would be executing at the same instant in time.
• Hence, if parallelism is handled correctly in a driver, it is to a large extent SMP-safe.
• On SMP systems, an interrupt handler and driver code can run concurrently on different CPUs.
• Hence the need to protect data shared with interrupt handlers using spinlocks
Slide 18
Driver Scenarios
static LIST_HEAD(ts_list);

void process_ts_entries(void)
{
    struct ts_entry *ts, *tmp;

    local_irq_disable();
    /* _safe variant is required because entries are deleted while iterating */
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list elements */
        list_del(&ts->node);
    }
    local_irq_enable();
}

irqreturn_t ts_isr(int irq, void *dev_id)
{
    struct ts_entry *ts = dev_id;  /* illustrative: entry filled from device data */

    /* Process interrupt */
    list_add_tail(&ts->node, &ts_list);
    return IRQ_HANDLED;
}

local_irq_disable() protects against both the interrupt handler and preemption, but only on the local CPU
spin_lock_irqsave() needs to be added in both the driver code and the ISR to make this SMP-safe
Slide 19
Driver Scenarios: Contd
Locking with a mutex/semaphore doesn't disable preemption, but guarantees that the data structure is not corrupted across preemption
Both SMP-safe and preempt-safe

static LIST_HEAD(ts_list);
static DEFINE_MUTEX(ts_lock);

int process_ts_entries(void)
{
    struct ts_entry *ts, *tmp;

    if (mutex_lock_interruptible(&ts_lock))
        return -EINTR;   /* interrupted by a signal while waiting */
    /* _safe variant is required because entries are deleted while iterating */
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list elements */
        list_del(&ts->node);
    }
    mutex_unlock(&ts_lock);
    return 0;
}

int process_rest_entries(void)
{
    struct ts_entry *ts;

    if (mutex_lock_interruptible(&ts_lock))
        return -EINTR;
    list_for_each_entry(ts, &ts_list, node) {
        /* Process remaining elements */
    }
    mutex_unlock(&ts_lock);
    return 0;
}
Slide 20
Driver Scenarios: Contd
process_ts_entries() and process_rest_entries() can deadlock: if each is preempted (or runs on another CPU) after taking its first lock, each then blocks forever waiting for the lock the other already holds
Locks must always be acquired in the same order to avoid this deadlock

static LIST_HEAD(ts_list);
static LIST_HEAD(tc_list);
static DEFINE_MUTEX(ts_lock);
static DEFINE_MUTEX(tc_lock);

int process_ts_entries(void)
{
    mutex_lock(&ts_lock);      /* takes ts_lock first ... */
    /* Some processing */
    mutex_lock(&tc_lock);      /* ... then tc_lock */
}

int process_rest_entries(void)
{
    mutex_lock(&tc_lock);      /* opposite order: tc_lock first ... */
    /* Some processing */
    mutex_lock(&ts_lock);      /* ... then ts_lock -> potential deadlock */
}
Slide 21
Driver Scenarios: Contd
In some cases it is better to funnel access to a resource through a single function, rather than have lock/unlock calls spread throughout the code

static LIST_HEAD(ts_list);
static DEFINE_MUTEX(ts_lock);

void process_ts_entries(void)
{
    struct ts_entry *ts, *tmp;

    mutex_lock(&ts_lock);
    /* _safe variant is required because entries are deleted while iterating */
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list elements */
        list_del(&ts->node);
    }
    mutex_unlock(&ts_lock);
}

Caller 1: { /* Process list elements */ process_ts_entries(); }
Caller 2: { /* Process list elements */ process_ts_entries(); }
Slide 22
Driver Scenarios
• Don't use one big lock for everything; it reduces concurrency
• Too fine-grained locking increases overhead
• Need to balance both aspects
• Reader–writer locks
– Useful if data structures are read more often than they are updated
– Allow multiple read locks to be held simultaneously
– Allow a single write lock to be held, and prevent any read lock from being taken while the write lock is held
– Available for both spinlocks and semaphores
• Stack variables/structures don't need locking, since each thread of execution gets its own instance on its own stack
Slide 23
Summary
• Concurrency/parallelism needs to be one of the criteria during the driver design phase
• Analysis is required to determine the parallel paths and the protection needed for critical sections
• Drivers that manage concurrency with appropriate locking techniques not only avoid race conditions but also improve performance
• Unit testing could be used to exercise some of the parallel paths in the driver
– Two different applications that open parallel paths into the same driver
– Two instances of the same application