Linux 2.6.37 was released on 4 January, 2011.

Summary: Linux 2.6.37 includes several SMP scalability improvements for Ext4 and XFS, an option to compile the kernel with the Big Kernel Lock disabled, support for per-cgroup IO throttling, a network block device based on the Ceph cluster filesystem, several Btrfs improvements, more efficient static probes, perf support for probing modules and listing accessible local and global variables, image hibernation using LZO compression, PPP over IPv4 support, several networking microoptimizations and many other small changes, improvements and new drivers.

1. Prominent features (the cool stuff)

1.1. Ext4: better SMP scalability, faster mkfs

1.2. XFS scalability improvements

Scalability of metadata-intensive workloads has been improved. An 8-way machine running an fs_mark instance of 50 million files was improved by over 15%, and removal of those files by over 100%. More scalability improvements are expected in 2.6.38.

Code: (list of commits)

1.3. No BKL (Big Kernel Lock)

The Big Kernel Lock is a giant lock that was introduced in Linux 2.0, when Alan Cox added SMP support for the first time. It was only a first step towards SMP scalability: in Linux 2.0 only one process can run kernel code at a time, so in the long term the BKL had to be replaced by fine-grained locking to allow multiple processes to run kernel code in parallel. In this version, it is possible to compile a kernel completely free of BKL support. Note that this has no performance impact: all the critical Linux codepaths have been BKL-free for a long time. The BKL was still used in many places that are not performance critical (ioctls, drivers, non-mainstream filesystems, etc.), which are the ones being cleaned up in this version. In these places the BKL is being replaced with mutexes, which doesn't improve parallelism (these places are not performance critical anyway).

Code: (commit)

1.4. A Ceph-based network block device

Ceph is a distributed network filesystem that was merged in Linux 2.6.34. In the Ceph design there are "object storage devices" and "metadata servers" which store metadata about the storage objects. Ceph uses these to implement its filesystem; however, these objects can also be used to implement a network block device (or even Amazon S3-compatible object storage).

This release introduces the Rados block device (RBD). RBD lets you create a block device that is striped over objects stored in a Ceph distributed object store. In contrast to alternatives like iSCSI or AoE, RBD images are striped and replicated across the Ceph object storage cluster, providing reliable (if one node fails it still works), scalable, and thinly provisioned access to block storage. RBD also supports read-only snapshots with rollback, and there are also Qemu patches to create a VM block device stored in a Ceph cluster.
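To make the striping idea concrete, here is a minimal shell sketch of how a byte offset in a block device could map onto fixed-size objects. The 4 MiB object size and the `rbd_locate` helper are assumptions made for illustration only; they are not taken from the RBD code, whose actual layout may differ.

```shell
# Assumed object size for this sketch: 4 MiB (illustrative value only).
OBJ_SIZE=$(( 4 * 1024 * 1024 ))

# Hypothetical helper: for a byte offset in the block device, print
# "<object index> <offset inside that object>" under simple striping.
# $1 = byte offset in the block device
rbd_locate() {
    printf '%s %s\n' "$(( $1 / OBJ_SIZE ))" "$(( $1 % OBJ_SIZE ))"
}

# Example: where does byte offset 10 MiB land?
rbd_locate $(( 10 * 1024 * 1024 ))
```

Because each object can live on a different storage node, consecutive regions of the device end up spread across the cluster, which is where the scalability comes from.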

Code: (commit)

1.5. I/O throttling support

I/O throttling support has been added. It makes it possible to set upper read/write limits for a group of processes, which can be useful in many setups. Example:

{{{
Mount the cgroup blkio controller
# mount -t cgroup -o blkio none /cgroup/blkio

Specify a bandwidth rate on a particular device for the root group. The format for the policy is "<major>:<minor> <bytes_per_second>"
# echo "8:16 1048576" > /cgroup/blkio/blkio.throttle.read_bps_device

The above will put a limit of 1MB/second on reads happening for the root group on the device having major/minor number 8:16.
}}}

The limits can also be set in IO operations per second (blkio.throttle.read_iops_device). There are also write equivalents: blkio.throttle.write_bps_device and blkio.throttle.write_iops_device. This feature does not replace the IO weight controller merged in 2.6.33.
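As a rough illustration of the rule format, the following shell sketch builds a "<major>:<minor> <bytes_per_second>" string and writes it to a control file. The `throttle_rule` and `apply_rule` helpers are hypothetical names invented for this example, and the cgroup directory is passed as a parameter, so nothing here assumes a particular mount point.

```shell
# Hypothetical helper: print a throttle rule for a device.
# $1 = major number, $2 = minor number, $3 = limit in MiB per second
throttle_rule() {
    printf '%s:%s %s\n' "$1" "$2" "$(( $3 * 1024 * 1024 ))"
}

# Hypothetical helper: write a rule into a control file of a cgroup.
# $1 = cgroup directory, $2 = control file name, $3 = rule string
apply_rule() {
    printf '%s\n' "$3" > "$1/$2"
}

# Example (needs root and a mounted blkio controller), limiting
# writes on device 8:16 to 2 MiB per second for the root group:
# apply_rule /cgroup/blkio blkio.throttle.write_bps_device "$(throttle_rule 8 16 2)"
```

Keeping the rule construction separate from the write makes it easy to check the string before it is applied to a real cgroup.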

Code: (commit 1, 2, 3, 4, 5, 6)

1.6. "Jump label": disabled tracepoints don't impact performance

A tracepoint can be described as a special printf() call that lives inside the kernel and is used with tools like perf, LTT or systemtap to analyze the system behaviour. There are two types of tracepoints: dynamic and static. Dynamic tracepoints modify the kernel code at runtime, inserting CPU instructions where necessary to obtain the data. Dynamic tracepoints are called 'kprobes' in the Linux kernel, and their performance overhead was optimized in Linux 2.6.34.

Static tracepoints, on the other hand, are inserted by the kernel developers by hand at strategic points of the code. For example, Ext4 has 50 static tracepoints. These tracepoints are compiled with the rest of the kernel code, and by default they are "disabled": until someone activates them, they are not called. Basically, an 'if' condition tests a variable. The performance impact of that test is nearly negligible, but it can be improved, and that's what the "jump label" feature does: a "no operation" CPU instruction is inserted in place of the conditional test, so a disabled static tracepoint has practically zero overhead. (Tip: You can use the "sudo perf list" command to see the full list of static tracepoints available in your system.)
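Building on the tip, a small shell sketch can count how many static tracepoints a given subsystem exposes. The `count_tracepoints` helper is a hypothetical name, and the sketch assumes that perf prints one event per line in the "subsystem:event" form; the exact output layout can vary between perf versions.

```shell
# Hypothetical helper: read `perf list` output on stdin and count the
# lines whose event name starts with "<subsystem>:".
# $1 = subsystem name, for example ext4 or sched
count_tracepoints() {
    grep -c "^[[:space:]]*$1:"
}

# Example (needs perf installed, usually run as root):
# sudo perf list tracepoint | count_tracepoints ext4
```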

Recommended LWN article: Jump label

Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

1.7. Btrfs Updates

1.8. Perf probe improvements

1.9. Power management improvements: LZO hibernation compression, delayed autosuspends

Several power management features have been added.

1.10. Support for PPP over IPv4

This version introduces support for PPP over IPv4 (PPTP). It dramatically speeds up PPTP VPN connections and decreases CPU usage compared with the existing user-space implementations (poptop/pptpclient). The accel-pptp project makes use of this module: it contains a plugin for pppd to use PPTP in client mode and a modified pptpd (poptop) to build a high-performance PPTP NAS.

Code: (commit)

1.11. Enable Fanotify API

Fanotify was included in the previous version, but it was disabled before the release due to concerns about the API. Those concerns have been addressed and Fanotify is now enabled.

Code: (commit)

2. Drivers and architectures

All the driver and architecture-specific changes can be found in the Linux_2_6_37-DriversArch page.

3. Core

4. VFS scalability work

5. CPU scheduler

6. Memory management

7. File systems








8. Networking

9. Block

10. Crypto

11. Virtualization


12. Security


13. Tracing/perf


KernelNewbies: Linux_2_6_37 (last edited 2017-12-30 01:30:29 by localhost)