SMPSynchronisation - Linux Kernel Newbies

In order to run well on multi-processor (SMP) system, accesses to data structures need to be coordinated between multiple processors. Because different data structures get used in different ways, Linux has a whole range of SMP synchronisation mechanisms.

CPU-local data structures

This is not really an SMP synchronisation mechanism, but worth mentioning because it does help the system work on SMP systems and usually scales better than any shared data structure. The trick is simply to give each CPU its own data structure and not or rarely access the data structure of other CPUs. When this method is used for statistics, the function that displays the statistics simply reads the concerned variable for each CPU in the system and adds them up.

This method is used in the scheduler (per-CPU runqueues), in the VM (statistics) and various other areas of the kernel. Note that CPU-local data structures do not work if the data structure needs to be shared between all CPUs (eg. the VFS directory structure, the dcache).

Spinlocks

This is the simplest SMP synchronisation mechanism. A data structure is protected by a lock that only one CPU can take at a time. When a CPU wants the lock, but the lock is already taken by another CPU, then the CPU simply spins around a while loop until the other CPU releases the lock. Of course, this means that the code that runs with the spinlock held can not call schedule.

Spinlocks are often used to protect data structures that are commonly read and written and only need to be locked for a short time.

There are read-write spinlocks (rwlocks) for situations where it is advantageous to have multiple readers access the data structure simultaneously. However, due to the short time spinlocks are usually taken this is not very common.

Semaphores, uhhh Mutexes

Semaphores are a bit like spinlocks, except the holder of a semaphore is a process, not a CPU. It is safe for a process to sleep or get scheduled out while holding a semaphore. Semaphores are used when the lock may be held for a long time, eg. in the page fault path, where a process may need to wait for disk IO to complete before the page fault is serviced.

There are also read-write semaphores for situations where it is beneficial to have multiple readers simultaneously.

Recently in the 2.6 series the kernel gained Mutexes, which have pretty much the same semantics as semaphores. The arch-specific semaphore implementations have been replaced by a generic one. Mutexes are faster than semaphores, but do not have the fairness between CPUs. A mutex must be unlocked by the task which locked it and the task can not lock a locked mutex again. If the kernel is compiled with DEBUG_MUTEXES these rules are enforced.

In most situations, you will want to use mutexes.

Fast Reader Locks

The fast reader lock, or frlock, is not really a lock. This a synchronisation mechanism uses sequence numbers to make sure that readers get a consistent state of a data structure. It is used for a data structure that is read so often that a read-write lock might starve writers. Instead, writers can always go ahead and write the data structure, while readers simply read the data structure again if it changed during the read.

This is implemented in the following way. Readers read the sequence number of the data structure, then they read the data they need for the data structure and finally they read the sequence number again. If the sequence number changed during the data read, they simply retry the whole operation. If the sequence number stayed the same, the data structure was not written to and the data is valid.

A writer increments the sequence number before and after modifying the data structure. This way readers can see whether or not the data structure is being changed under them.

Read Copy Update (RCU)

RCU is a lockless scheme for synchronising access to a shared data structure that is mostly read-only, eg. the network route cache. It is most often used for linked lists, in particular for deleting entries from linked lists.

Adding an entry to a linked list can be done quite easily in a lockless way, since setting pointers in list heads is atomic. Simply set the next and previous entries right and add your entry to the tail of the linked list.

Deletion is a bit trickier, since other CPUs could be using the linked list entry you want to delete. RCU simply unlinks the entry from the linked list and keeps it around until all the CPUs in the system have scheduled - at that point nobody can still be using the entry, and it can be freed.

Links

http://greenteapress.com/semaphores/downey05semaphores.pdf

CategoryKernelHacking