Linux 3.10 was released on Sun, 30 Jun 2013.

Summary: This release adds support for bcache, which allows SSD devices to be used as a cache for other block devices; a Btrfs format improvement that makes the tree dedicated to storing extent information 30-35% smaller; support for XFS metadata checksums and self-describing metadata; timerless multitasking; SysV IPC, rwsem and mutex scalability improvements; a TCP Tail loss probe algorithm that reduces the tail latency of short transactions; KVM virtualization support in the MIPS architecture; support for the ARM big.LITTLE architecture, which mixes CPUs of different types; tracing snapshots; new drivers; and many small improvements.

1. Prominent features (the cool stuff)

1.1. Timerless multitasking

In the prehistory of computing, computers could only run one task at a time. But people wanted to start other tasks without waiting for the first one to end, and even switch between tasks, and thus multitasking was born. At first, multitasking was "cooperative": a process would run until its own code voluntarily decided to pause and allow other tasks to run. But it was possible to do multitasking better: the hardware could have a timer that fires at regular intervals (called "ticks"); this timer could forcefully pause any program and run an OS routine that decides which task should run next. This is called preemptive multitasking, and it's what modern OSs do.

But preemptive multitasking has some side effects on modern hardware. The CPUs of laptops and mobile devices need periods of inactivity to enter low power modes. Preemptive multitasking fires the timer often, 1000 times per second in a typical Linux kernel, even when the system is not doing anything, so the CPUs could not save as much power as would otherwise be possible. Virtualization created more problems, since each Linux VM runs its own timer.

In 2.6.21, released in April 2007, Linux partially solved this: the timer would fire 1000 times per second as always while the system is running tasks, but it would be stopped completely while the system is idle. But this is not enough. There are single-task workloads, such as scientific number crunching or users of the real-time patchset, whose performance or latency is hurt because they are temporarily paused 1000 times per second for no reason.

This Linux release adds support for not firing the timer (tickless) even when tasks are running. There are some caveats: in this release it is not actually fully tickless, because it still needs the timer, although it fires only once per second; the full tickless mode is disabled when a CPU runs more than one process; and one CPU must be kept running with full ticks to allow the other CPUs to go into tickless mode.
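The rules described above can be summarized in a small toy model. This is only a sketch of the behaviour as described in this section, not kernel code: the function name, the mode labels ("periodic", "idle", "full") and the `timekeeping_cpu` flag are hypothetical names chosen for illustration.

```python
def ticks_per_second(runnable_tasks, mode="full", hz=1000, timekeeping_cpu=False):
    """Toy model: how often the scheduler tick fires on one CPU.

    mode is a hypothetical label:
      "periodic" - the timer always fires (before 2.6.21)
      "idle"     - the timer stops only when the CPU is idle (2.6.21 and later)
      "full"     - the timer is also reduced when a single task runs (this release)
    """
    if mode not in ("periodic", "idle", "full"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "periodic":
        return hz
    if mode == "full" and timekeeping_cpu:
        # One CPU keeps ticking so the others can go tickless.
        return hz
    if runnable_tasks == 0:
        # Idle CPU: the tick is stopped completely.
        return 0
    if mode == "idle" or runnable_tasks > 1:
        # More than one task needs the tick for preemption.
        return hz
    # Exactly one runnable task in "full" mode: one residual tick per second.
    return 1


for tasks in (0, 1, 2):
    print(tasks, [ticks_per_second(tasks, m) for m in ("periodic", "idle", "full")])
```

The model makes the benefit visible for the single-task case only: with two or more runnable tasks on a CPU, the three modes behave the same.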

For more details and future plans, it's strongly recommended to read this LWN article: '(Nearly) full tickless operation in 3.10' and the Documentation.

Code: (merge commit)

1.2. Bcache, a block layer cache for SSD caching

Since SSD storage devices became popular, many people have used them to speed up their storage stack. Bcache is an implementation of this functionality: it allows SSDs to cache other block devices. It's analogous to L2ARC for ZFS, but Bcache also does writeback caching (besides just write-through caching), and it's filesystem agnostic. It's designed to be switched on with a minimum of effort, and to work well without configuration on any setup. By default it won't cache sequential IO, just the random reads and writes that SSDs excel at. It's meant to be suitable for desktops, servers, high-end storage arrays, and perhaps even embedded systems.
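The idea of caching random IO while letting sequential IO bypass the cache can be illustrated with a toy read cache. This is a minimal sketch and not bcache's actual algorithm: the class, the dict-based "devices" and the `sequential_cutoff` threshold (counted in sectors here) are assumptions made for the example, and writeback caching is not modelled at all.

```python
class ToySsdCache:
    """Toy read cache in front of a slow backing device (dict: sector -> data)."""

    def __init__(self, backing, sequential_cutoff=4):
        self.backing = backing
        self.cache = {}  # stands in for the SSD
        self.sequential_cutoff = sequential_cutoff  # hypothetical threshold, in sectors
        self.last_sector = None
        self.run_length = 0

    def read(self, sector):
        # Track the length of the current run of consecutive sectors.
        if self.last_sector is not None and sector == self.last_sector + 1:
            self.run_length += 1
        else:
            self.run_length = 1
        self.last_sector = sector

        if sector in self.cache:
            return self.cache[sector]
        data = self.backing[sector]
        # Only cache reads that do not look like part of a long sequential stream.
        if self.run_length < self.sequential_cutoff:
            self.cache[sector] = data
        return data


backing = {i: f"block{i}" for i in range(100)}
cache = ToySsdCache(backing)
for s in (50, 7, 93):          # random reads
    cache.read(s)
for s in range(10, 20):        # one sequential stream
    cache.read(s)
print(sorted(cache.cache))     # [7, 10, 11, 12, 50, 93]
```

The random reads all end up cached, while the sequential stream stops being cached once its run reaches the cutoff.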

For more details, read the documentation or visit the wiki.

Recommended LWN article: A bcache update

Code: (commit)

1.3. Btrfs: smaller, more space-efficient extent tree

Btrfs has incorporated a new key type for metadata extent references which uses disk space more efficiently, reducing the size of each extent reference for a tree block from 51 bytes to 33 bytes. In practice, this results in a 30-35% decrease in the size of the extent tree, which means fewer copy-on-write operations and larger parts of the extent tree kept in memory, which in turn makes heavy metadata operations go much faster.

This is not an automatic format change; it must be enabled at mkfs time or with btrfstune -x.
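The per-reference saving can be checked with a quick calculation, using only the two sizes quoted in this section (the variable and function names are illustrative):

```python
OLD_REF_BYTES = 51  # extent reference size per tree block, old key type
NEW_REF_BYTES = 33  # extent reference size per tree block, new key type


def relative_saving(old, new):
    """Fraction of space saved when an item shrinks from old to new bytes."""
    return (old - new) / old


saving = relative_saving(OLD_REF_BYTES, NEW_REF_BYTES)
print(f"{saving:.1%}")  # 35.3%
```

A saving of about 35% per reference is the upper bound; the extent tree presumably also holds items that this change does not shrink, which would be consistent with the 30-35% range reported for the tree as a whole.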

Code: (commit)

1.4. XFS metadata checksums

In this release, XFS has an experimental implementation of metadata CRC32c checksums. These metadata checksums are part of a bigger project that aims to implement what the XFS developers have called "self-describing metadata". This project aims to solve the problem of verification scalability (fsck would need too much time to verify petabyte-scale filesystems with billions of inodes). It requires a filesystem format change that adds to every XFS metadata object some information that makes it possible to quickly determine whether the metadata is intact and can be ignored for the purpose of forensic analysis: metadata type, filesystem identifier and block placement, metadata owner, log sequence identifier and, of course, a CRC checksum.

This feature is experimental and requires using experimental xfsprogs. For more information, you can read the self-describing metadata Documentation.
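To illustrate how such self-describing fields allow a block to be validated in isolation, here is a minimal sketch. The on-disk layout below is invented for the example and is not the XFS format; only the CRC32c routine (Castagnoli polynomial, bitwise implementation) is a standard algorithm. The names `HEADER`, `seal` and `verify` are hypothetical.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


# Hypothetical layout: type magic, filesystem UUID, block number, owner, LSN.
HEADER = struct.Struct(">4s16sQQQ")


def seal(magic, fs_uuid, blkno, owner, lsn, payload):
    """Build a toy metadata block with a trailing CRC32c over header and payload."""
    body = HEADER.pack(magic, fs_uuid, blkno, owner, lsn) + payload
    return body + struct.pack(">I", crc32c(body))


def verify(raw, fs_uuid, blkno):
    """True if the block is intact, belongs to this filesystem and sits where it claims."""
    body, stored_crc = raw[:-4], struct.unpack(">I", raw[-4:])[0]
    if crc32c(body) != stored_crc:
        return False  # corrupted contents
    _magic, found_uuid, found_blkno, _owner, _lsn = HEADER.unpack_from(body)
    return found_uuid == fs_uuid and found_blkno == blkno


uuid = b"\x11" * 16
raw = seal(b"INOD", uuid, 42, 7, 1, b"payload")
print(verify(raw, uuid, 42))   # True: intact and in the right place
print(verify(raw, uuid, 43))   # False: valid checksum, but read from the wrong block
```

The second check shows why a checksum alone is not enough: a block written to the wrong location still has a valid CRC, and only the embedded block number reveals the problem.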

Code: (merge commit)

1.5. SysV IPC scalability improvements

Linux IPC semaphore scalability was pitiful. Linux used to lock ranges that were much too big, and it used to have a single IPC lock per IPC semaphore array. Most workloads never cared, but some do. This release splits out the locking and adds per-semaphore locks for greater scalability of the IPC semaphore code. Micro benchmarks show improvements of more than 10x in some cases (see commit links for details).
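The general idea of per-semaphore locking can be sketched in user space as follows. This is an illustration of fine-grained locking only, not the kernel's implementation: the class and method names are hypothetical, blocking semaphore semantics are not modelled, and multi-semaphore operations here simply take the involved locks in index order, which is an assumption of this sketch rather than the scheme used by the kernel.

```python
import threading


class ToySemArray:
    """Toy semaphore array with one lock per semaphore instead of one per array."""

    def __init__(self, n):
        self.values = [0] * n
        self.locks = [threading.Lock() for _ in range(n)]

    def simple_op(self, index, delta):
        # Operations on different semaphores no longer contend with each other.
        with self.locks[index]:
            self.values[index] += delta

    def complex_op(self, deltas):
        """Apply several changes atomically; deltas maps index -> delta."""
        indices = sorted(deltas)  # fixed acquisition order avoids deadlock
        for i in indices:
            self.locks[i].acquire()
        try:
            for i in indices:
                self.values[i] += deltas[i]
        finally:
            for i in reversed(indices):
                self.locks[i].release()


def worker(sems, index, count):
    for _ in range(count):
        sems.simple_op(index, 1)


sems = ToySemArray(4)
threads = [threading.Thread(target=worker, args=(sems, i, 1000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
sems.complex_op({0: -1, 3: 2})
print(sems.values)  # [999, 1000, 1000, 1002]
```

With a single array-wide lock, the four workers would serialize on every operation even though they never touch the same semaphore.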

Code: (merge commit), (commit 1, 2, 3, 4, 5, 6, 7)

1.6. rwsem locking scalability improvements

The rwsem ("reader-writer semaphore") locking scheme, used in many places in the Linux kernel, had performance problems because of the strict, serialized, FIFO write-ownership of the semaphore. In Linux 3.9, an "opportunistic lock stealing" patch was merged to fix it, but only in the slow path.

In this release, opportunistic lock stealing has been implemented in the fast path, improving the performance of pgbench by double-digit percentages in some cases.

Code: (merge commit)

1.7. mutex locking scalability improvements

The mutex locking scheme, used widely in the Linux kernel, has gained some scalability improvements due to the use of fewer atomic operations and some queuing changes that reduce cacheline contention. For details, see the commit links.

Code: (commit), (commit)

1.8. TCP optimization: Tail loss probe

This release adds the TCP Tail loss probe (TLP) algorithm. Its goal is to reduce the tail latency of short transactions. It achieves this by converting retransmission timeouts (RTOs) occurring due to tail losses (losses at the end of transactions) into fast recovery. TLP transmits one packet in two round-trips when a connection is in the Open state and isn't receiving any ACKs. The transmitted packet, aka loss probe, can be either new or a retransmission. When there is tail loss, the ACK from a loss probe triggers FACK/early-retransmit based fast recovery, thus avoiding a costly retransmission timeout.
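The timing argument can be sketched with a simplified model. It uses only the rule stated above (a probe after two round-trips without ACKs while in the Open state); the real algorithm has additional conditions that are not modelled here, and the function names and the example SRTT and RTO values are hypothetical.

```python
def probe_timeout_ms(srtt_ms):
    """Simplified probe timeout: two smoothed round-trip times."""
    return 2 * srtt_ms


def should_send_probe(ca_state, ms_since_last_ack, srtt_ms):
    """Send a loss probe only in the Open state, after two round-trips of silence."""
    return ca_state == "Open" and ms_since_last_ack >= probe_timeout_ms(srtt_ms)


SRTT_MS = 50    # hypothetical smoothed round-trip time
RTO_MS = 1000   # hypothetical retransmission timeout

print(should_send_probe("Open", 120, SRTT_MS))      # True
print(should_send_probe("Recovery", 120, SRTT_MS))  # False
print(RTO_MS - probe_timeout_ms(SRTT_MS))           # 900
```

With these example numbers, the probe goes out 900 ms before the retransmission timeout would have fired, which is where the latency gain for short transactions comes from.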

Code: (commit 1, 2)

1.9. ARM big.LITTLE support

The ARM big.LITTLE architecture is an ARM SMP solution where, according to this LWN article, "instead of having a bunch of identical CPU cores put together in a system, the big.LITTLE architecture is effectively pushing the concept further by pulling two different SMP systems together: one being a set of 'big' and fast processors, the other one consisting of 'little' and power-efficient processors."

Recommended LWN article: Multi-cluster power management

Product site:

Code: (commit)

1.10. MIPS KVM support

Another Linux architecture has added support for KVM; in this case MIPS. KVM/MIPS should support MIPS32R2 and beyond. For more details, see the release notes.

Code: (commit)

1.11. tracing: tracing snapshots, stack tracing

The tracing framework has gained the ability to use several tracing buffers, which can be used to take snapshots of the main tracing buffer. These tracing snapshots can be triggered manually or with function probes. It's also possible to have a stack trace recorded in the ring buffer when a given function is called.
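As a rough illustration, the snapshot and function-probe interfaces are driven by writing to control files in the tracing directory. The helper functions below are hypothetical wrappers written for this example; they assume that debugfs is mounted at /sys/kernel/debug, that the kernel was built with snapshot support, and that the caller has root privileges. The traced function name passed in is up to the user.

```python
from pathlib import Path

# Assumption: debugfs is mounted at its usual location.
DEFAULT_TRACING_DIR = Path("/sys/kernel/debug/tracing")


def take_snapshot(tracing_dir=DEFAULT_TRACING_DIR):
    """Manually trigger a snapshot of the main tracing buffer."""
    (Path(tracing_dir) / "snapshot").write_text("1\n")


def read_snapshot(tracing_dir=DEFAULT_TRACING_DIR):
    """Return the contents of the last snapshot."""
    return (Path(tracing_dir) / "snapshot").read_text()


def add_function_probe(function, action, tracing_dir=DEFAULT_TRACING_DIR):
    """Attach a 'snapshot' or 'stacktrace' probe to a kernel function."""
    if action not in ("snapshot", "stacktrace"):
        raise ValueError(f"unsupported probe action: {action}")
    with open(Path(tracing_dir) / "set_ftrace_filter", "a") as f:
        f.write(f"{function}:{action}\n")
```

For example, `add_function_probe("schedule", "stacktrace")` would request a stack trace each time `schedule` is called, and `take_snapshot()` followed by `read_snapshot()` would capture and read back the current trace buffer.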

Code: (commit 1, 2, 3, 4, 5, 6)

2. Drivers and architectures

All the driver and architecture-specific changes can be found in the Linux_3.10-DriversArch page.

3. Core



4. Memory management

Memory control group

5. Block layer

6. File systems






7. Networking



802.11 (wireless)


8. Crypto

9. Virtualization





10. Security

11. Tracing/perf



12. Other news sites that track the changes of this release


KernelNewbies: Linux_3.10 (last edited 2017-12-30 01:30:13 by localhost)