Linux 2.6.38 released 14 March, 2011.

Summary: This release adds support for automatic process grouping (called "the wonder patch" in the news), significant scalability improvements in the VFS, Btrfs LZO compression and read-only snapshots, support for the B.A.T.M.A.N. mesh protocol (which helps to provide network connectivity in the presence of natural disasters, military conflicts or Internet censorship), transparent huge page support (without using hugetlbfs), automatic spreading of outgoing network traffic across multiple CPUs, support for the AMD Fusion APUs, and many drivers and other changes.

1. Prominent features (the cool stuff)

1.1. Automatic process grouping (a.k.a. "the patch that does wonders")

Recommended LWN article: Group scheduling and alternatives

The most impactful feature in this release is the so-called "patch that does wonders", a patch that substantially changes how the process scheduler assigns shares of CPU time to each process. With this feature, the system groups all processes with the same session ID into a single scheduling entity. Example: imagine a system with six CPU-hungry processes, where the first four share the same session ID and the other two each run in a session of their own.

Without automatic process grouping:  [proc. 1 | proc. 2 | proc. 3 | proc. 4 | proc. 5 | proc. 6] 

With automatic process grouping:    [proc. 1, 2, 3, 4  |     proc. 5       |     proc. 6      ] 
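To make the diagram concrete, here is an idealized calculation of each process's CPU share, written in Python (used for all the sketches on this page). It assumes a single CPU, equal nice values and processes that never sleep; the function `cpu_shares` and the session IDs are made up for illustration, and the real scheduler only approximates these fractions over time.

```python
from collections import Counter
from fractions import Fraction


def cpu_shares(sessions, autogroup):
    """Idealized CPU share of each process on one CPU.

    sessions[i] is the session ID of CPU-hungry process i. Assumes equal
    nice values and always-runnable processes (a simplification).
    """
    n = len(sessions)
    if not autogroup:
        # Every process is its own scheduling entity: equal slices.
        return [Fraction(1, n)] * n
    members = Counter(sessions)
    groups = len(members)
    # Each session gets an equal slice, split evenly among its processes.
    return [Fraction(1, groups) / members[s] for s in sessions]


if __name__ == "__main__":
    procs = [100, 100, 100, 100, 200, 300]  # hypothetical session IDs
    print(cpu_shares(procs, autogroup=False))
    print(cpu_shares(procs, autogroup=True))
```

With the six processes above, autogrouping drops processes 1 to 4 from 1/6 to 1/12 of the CPU each, while processes 5 and 6 rise from 1/6 to 1/3 each.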

The session ID is a property of processes in Unix systems (you can see it with commands like ps -eo session,pid,cmd). It is inherited by forked child processes, which can start a new session using setsid(2). A shell running in a terminal window normally gets a session of its own, which means you can run a "make -j 20" inside a shell on your desktop and not notice it while you browse the web. This feature is implemented on top of group scheduling (merged in 2.6.24). You can disable it by writing 0 to /proc/sys/kernel/sched_autogroup_enabled.
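A minimal Python sketch of how sessions are created, assuming a POSIX system: a forked child starts in its parent's session and leaves it with `os.setsid()`, becoming the leader of a new session whose ID equals its own PID. The helper name `new_session_demo` is hypothetical.

```python
import os


def new_session_demo():
    """Fork a child that starts a new session.

    Returns (parent session ID, child PID, child session ID after setsid).
    """
    parent_sid = os.getsid(0)
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: still in the parent's session at this point
        try:
            os.close(r)
            # A fresh child is not a process-group leader, so setsid()
            # succeeds and makes it the leader of a brand-new session.
            os.setsid()
            os.write(w, str(os.getsid(0)).encode())
        finally:
            os._exit(0)
    os.close(w)
    chunks = []
    while True:  # read until the child closes its end of the pipe
        chunk = os.read(r, 64)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(r)
    os.waitpid(pid, 0)
    return parent_sid, pid, int(b"".join(chunks))


if __name__ == "__main__":
    print(new_session_demo())
```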

Code: (commit)

1.2. VFS scalability: scaling the directory cache

Recommended LWN article: Dcache scalability and RCU-walk

There are ongoing efforts to make the Linux VFS layer ("Virtual File System", the code that glues system calls to the individual filesystems) more scalable. Some changes from this work were already merged in the previous release. In this release, the dcache (short for "directory entry cache", which caches the results of looking up names in directories) and the whole path lookup mechanism have been reworked to be more scalable (you can find details in the LWN article).

These changes make the VFS more scalable in multithreaded workloads, but more interestingly (and this is what excites Linus Torvalds) they also make some single-threaded workloads noticeably faster, thanks to the removal of atomic CPU operations from the code paths: a hot-cache "find . -size" on his home directory appears to be 35% faster, and a single-threaded git diff on a cached kernel tree runs 20% faster (running 64 git diffs in parallel increases throughput 26-fold). Anything that calls stat() a lot gets faster.
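As a rough illustration of what the dcache caches, here is a toy Python model: a path is resolved one component at a time, and each step is looked up by (parent, name), roughly the way dentries are keyed by parent directory and name. `ToyDcache` and the inode numbers are hypothetical, and the model deliberately omits locking, RCU-walk, negative entries, "." and ".." handling, and eviction, which are exactly the parts this release reworked.

```python
class ToyDcache:
    """Toy dentry cache: maps (parent inode, name component) -> child inode."""

    def __init__(self, backing):
        # Hypothetical slow "filesystem": {(parent inode, name): inode}
        self.backing = backing
        self.cache = {}
        self.misses = 0

    def lookup(self, path, root=1):
        """Resolve an absolute path component by component, like a path walk."""
        inode = root
        for name in path.split("/"):
            if not name:  # skip empty components from leading/duplicate slashes
                continue
            key = (inode, name)
            if key not in self.cache:
                self.misses += 1  # slow path: ask the backing store
                self.cache[key] = self.backing[key]  # KeyError plays ENOENT
            inode = self.cache[key]
        return inode


if __name__ == "__main__":
    fs = {(1, "home"): 2, (2, "linus"): 3, (3, "notes.txt"): 4}
    d = ToyDcache(fs)
    d.lookup("/home/linus/notes.txt")
    d.lookup("/home/linus/notes.txt")  # second walk hits the cache only
    print("misses:", d.misses)
```

Every stat() of a path pays for such a walk, which is why a faster lookup path benefits stat()-heavy tools like find and git.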

Changes: Far too many to track here; see the patches by Nick Piggin in this list (reverse chronological order)

1.3. Btrfs LZO compression, read-only snapshots

Btrfs adds support for transparent compression using the LZO algorithm, as an alternative to zlib. A small performance comparison can be found here.

There is also support for marking snapshots as read-only. Finally, a filesystem that encounters errors will now be forced into read-only mode, a step toward making the codebase more tolerant of failures.
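A small Python sketch of how these features are typically used from userspace, assuming the standard `mount` tool and a btrfs-progs version that supports `btrfs subvolume snapshot -r`. The helper names and the device and mount paths are hypothetical; by default the commands are only printed, since running them needs root and a real Btrfs volume.

```python
import subprocess


def lzo_mount_cmd(device, mountpoint):
    # "compress=lzo" selects LZO; plain "compress" or "compress=zlib" keeps zlib.
    return ["mount", "-t", "btrfs", "-o", "compress=lzo", device, mountpoint]


def readonly_snapshot_cmd(subvolume, snapshot):
    # "-r" asks btrfs-progs to create the snapshot read-only.
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, snapshot]


def run(cmd, dry_run=True):
    """Print the command (dry run) or execute it, raising on failure."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    run(lzo_mount_cmd("/dev/sdb1", "/mnt/data"))
    run(readonly_snapshot_cmd("/mnt/data", "/mnt/data/snap-ro"))
```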

Code: LZO (commit 1, 2, 3); read-only snapshots (commit 1, 2); forced read-only mounts (commit)

1.4. Transparent huge pages

Recommended LWN article: Transparent huge pages in 2.6.38

Processors manage memory in small units called "pages" (4 KB in size on x86). Each process has a virtual address space, and a "page table" keeps the correspondence between each virtual memory page and the physical RAM page that backs it. Walking the page table to find out which RAM page corresponds to a given virtual address is expensive, so the CPU has a small cache, the TLB (Translation Lookaside Buffer), that stores the result of that work for frequently accessed virtual addresses. However, this cache is not very big and, with 4 KB pages, it covers only a small amount of memory, so many data-intensive workloads (databases, KVM) suffer performance problems because their frequently accessed virtual addresses can't all be cached.
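The arithmetic behind the problem fits in a few lines of Python. A TLB entry maps one page, so the memory it can cover ("TLB reach") is entries times page size, and a virtual address splits into a page number (what the TLB translates) and an offset within the page. The 64-entry TLB size is a hypothetical round number; real TLB sizes vary by CPU model.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries, page_size):
    """Bytes of address space a TLB with `entries` entries can map at once."""
    return entries * page_size


def split_vaddr(vaddr, page_size):
    """Split a virtual address into (virtual page number, offset in page)."""
    shift = page_size.bit_length() - 1  # page sizes are powers of two
    return vaddr >> shift, vaddr & (page_size - 1)


if __name__ == "__main__":
    ENTRIES = 64  # hypothetical TLB size
    print(tlb_reach(ENTRIES, 4 * KB) // KB, "KB reach with 4 KB pages")
    print(tlb_reach(ENTRIES, 2 * MB) // MB, "MB reach with 2 MB pages")
```

With the assumed 64 entries this prints a reach of 256 KB with 4 KB pages versus 128 MB with 2 MB pages, a 512-fold difference.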

To solve this problem, modern processors add cache entries that support pages bigger than 4 KB (like 2 MB or 4 MB). Until now, the only way userspace could use those pages in Linux was hugetlbfs, a filesystem-based API. This release adds support for transparent hugepages: hugepages are used automatically where possible. Transparent Huge Pages can be configured to be used always or only where requested with madvise(MADV_HUGEPAGE), and this behaviour can be changed at runtime in /sys/kernel/mm/transparent_hugepage/enabled. For more details, check Documentation/vm/transhuge.txt.
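A Python sketch of both knobs, assuming Linux and Python 3.8+ (where `mmap.madvise` exists). The sysfs file lists every mode and brackets the active one, for example "always [madvise] never"; in "madvise" mode a program opts a region in with MADV_HUGEPAGE. The helper names are hypothetical, and the hint is only advice: the kernel still decides whether huge pages are actually used.

```python
import mmap


def thp_mode(text):
    """Return the bracketed (active) mode from the THP 'enabled' file contents."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no bracketed mode in {text!r}")


def read_thp_mode(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    try:
        with open(path) as f:
            return thp_mode(f.read())
    except OSError:
        return None  # not Linux, or a kernel built without transparent hugepages


def hinted_buffer(size):
    """Map anonymous private memory and ask for huge pages via madvise."""
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)  # fd -1 => anonymous
    hinted = False
    if hasattr(mmap, "MADV_HUGEPAGE"):  # constant only exists on Linux builds
        try:
            buf.madvise(mmap.MADV_HUGEPAGE)
            hinted = True
        except OSError:
            pass  # e.g. kernel without THP support
    return buf, hinted


if __name__ == "__main__":
    print("THP mode:", read_thp_mode())
    buf, hinted = hinted_buffer(32 * 1024 * 1024)
    print("MADV_HUGEPAGE applied:", hinted)
    buf.close()
```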

Code: Far too many to track here; see the patches from Andrea Arcangeli in this list (reverse chronological order)

1.5. Transparent spreading of outgoing network traffic across CPUs on multiqueue devices

This release implements transmit packet steering (XPS) for multiqueue devices. XPS selects a transmit queue during packet transmission based on configuration, by mapping the CPU that is transmitting the packet to a queue. It is the transmit-side analogue of RPS (Receive Packet Steering, added in 2.6.35): where RPS selects a CPU based on the receive queue, XPS selects a queue based on the CPU.

Each transmit queue can be associated with a number of CPUs which will use that queue to send packets. This is configured as a hexadecimal CPU mask on a per-queue basis in /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus
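A Python sketch of building these masks for the "one TX queue per CPU" layout used in the benchmark below. It assumes the usual kernel bitmap text format (hex, with comma-separated 32-bit groups for machines with more than 32 CPUs); the device name "eth0", the CPU count and the helper names are hypothetical, and writing the files requires root.

```python
def cpu_mask(cpus):
    """Hex CPU bitmap string, e.g. [0, 1] -> "3", [32] -> "1,00000000"."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    words = []
    while True:  # split into 32-bit words, least significant first
        words.append(mask & 0xFFFFFFFF)
        mask >>= 32
        if not mask:
            break
    words.reverse()  # most significant word first; later words zero-padded
    return ",".join([format(words[0], "x")] + [format(w, "08x") for w in words[1:]])


def xps_plan(dev, ncpus):
    """Map TX queue n to CPU n: {sysfs path: mask string}."""
    return {
        f"/sys/class/net/{dev}/queues/tx-{q}/xps_cpus": cpu_mask([q])
        for q in range(ncpus)
    }


def apply_plan(plan):
    for path, mask in plan.items():  # needs root and an XPS-capable kernel
        with open(path, "w") as f:
            f.write(mask)


if __name__ == "__main__":
    for path, mask in xps_plan("eth0", 4).items():
        print(path, mask)
```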

A netperf benchmark running 500 instances of the TCP_RR test (1-byte request and response) on a 16-core AMD machine gave:
* XPS (16 queues, 1 TX queue per CPU): 1234K at 100% CPU
* No XPS (16 queues): 996K at 100% CPU

Code: (commit)

1.6. B.A.T.M.A.N. mesh protocol

B.A.T.M.A.N. stands for "Better Approach To Mobile Adhoc Networking". An ad hoc network is a decentralized network that does not rely on preexisting infrastructure, such as routers in wired networks or access points in managed (infrastructure) wireless networks. Instead, each node participates in routing by forwarding data for other nodes, so which nodes forward data is decided dynamically based on network connectivity. B.A.T.M.A.N. is a routing protocol implementation for these networks. It is useful in emergency situations such as natural disasters, military conflicts or Internet censorship. More information about this project can be found at

Code: (commit)
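A heavily simplified Python sketch of the idea behind B.A.T.M.A.N. routing: every node periodically floods originator messages (OGMs) with increasing sequence numbers, and a node picks, as next hop toward an originator, the neighbor through which it heard the most recent OGMs from that originator within a sliding window. This is an illustrative assumption, not the batman-adv kernel module's algorithm, which ranks neighbors with a transmit-quality (TQ) metric and other refinements; the class and node names are hypothetical, and sequence-number wraparound is ignored.

```python
from collections import defaultdict


class ToyBatmanNode:
    """Simplified B.A.T.M.A.N.-style next-hop selection (not batman-adv)."""

    def __init__(self, window=64):
        self.window = window
        self.latest = {}               # originator -> highest seqno seen
        self.heard = defaultdict(set)  # (originator, neighbor) -> seqnos heard
        # (a real implementation would keep a fixed-size bit window instead)

    def receive_ogm(self, originator, seqno, neighbor):
        """Record an OGM from `originator` that arrived via `neighbor`."""
        self.latest[originator] = max(seqno, self.latest.get(originator, seqno))
        self.heard[(originator, neighbor)].add(seqno)

    def best_next_hop(self, originator):
        """Neighbor that delivered the most OGMs inside the current window."""
        if originator not in self.latest:
            return None
        oldest = self.latest[originator] - self.window  # seqnos <= this are stale
        best, best_count = None, 0
        for (orig, neighbor), seqnos in self.heard.items():
            if orig != originator:
                continue
            count = sum(1 for s in seqnos if s > oldest)
            if count > best_count:
                best, best_count = neighbor, count
        return best


if __name__ == "__main__":
    node = ToyBatmanNode()
    for s in range(1, 11):
        node.receive_ogm("C", s, "A")  # reliable link via A
    for s in range(1, 7):
        node.receive_ogm("C", s, "B")  # lossy link via B
    print("next hop to C:", node.best_next_hop("C"))
```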

1.7. Support for AMD Fusion graphics

This release adds support for the AMD Fusion APUs, which combine a CPU and a GPU on a single chip.

2. Drivers and architectures

All the driver and architecture-specific changes can be found in the Linux_2_6_38-DriversArch page

3. Core

4. CPU scheduler

5. Memory management

6. Block

Device Mapper (DM)

7. File systems


8. Networking

* dcbnl: add support for ieee8021Qaz attributes (commit)

9. Crypto

10. Virtualization

11. Security

12. Tracing/perf

KernelNewbies: Linux_2_6_38 (last edited 2017-12-30 01:30:28 by localhost)