Linux 2.6.37 has not been released yet.

Summary: Linux 2.6.37 includes


1. Prominent features (the cool stuff)

1.1. Ext4: better SMP scalability, faster mkfs

* Better SMP scalability: In this release Ext4 uses the "bio" layer directly instead of the intermediate "buffer" layer. The "bio" layer (short for Block I/O: it's the part of the kernel that sends requests to the I/O scheduler) was one of the first features merged in the Linux 2.5.1 kernel. The buffer layer has a lot of performance and SMP scalability issues that get solved with this port. An FFSB benchmark on a 48-core AMD box using a 24 SAS-disk hardware RAID array with 192 simultaneous ffsb threads speeds up by 300% (400% with journaling disabled), while reducing CPU usage by a factor of 3-4.

* Faster mkfs: One of the slowest parts of creating a new Ext4 filesystem is initializing the inode tables. mkfs can now skip this step and leave the inode tables uninitialized. When the filesystem is mounted for the first time, the kernel starts a kernel thread (ext4lazyinit) that initializes the tables.

* Add batched discard support for ext4 [ (commit)], [ (commit)], [ (commit)]
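
As a rough illustration of the faster mkfs path, the sketch below builds an mkfs.ext4 command line that asks for lazy inode table initialization. It assumes an e2fsprogs version whose mke2fs accepts the "lazy_itable_init" extended option; the function name make_ext4_lazy and the DRY_RUN variable are hypothetical, and by default the command is only printed, not executed.

```shell
#!/bin/sh
# Sketch only: make_ext4_lazy and DRY_RUN are illustrative names, not part
# of any tool. Assumes mke2fs understands "-E lazy_itable_init=1".

DRY_RUN="${DRY_RUN:-1}"

# make_ext4_lazy DEVICE
# Prints the mkfs command that leaves the inode tables uninitialized.
# The command is executed only when DRY_RUN=0 (this destroys data on DEVICE).
make_ext4_lazy() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: make_ext4_lazy DEVICE" >&2
        return 1
    fi
    cmd="mkfs.ext4 -E lazy_itable_init=1 $dev"
    echo "$cmd"
    if [ "$DRY_RUN" = "0" ]; then
        $cmd
    fi
}
```

Calling "make_ext4_lazy /dev/sdX1" (a placeholder device name) prints the command; after the first mount of such a filesystem, the ext4lazyinit thread described above should take care of the tables.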

Code:[ (commit 1], [ 2)]

1.2. XFS scalability improvements

Scalability of metadata-intensive workloads has been improved. An 8-way machine running an fs_mark instance of 50 million files was improved by over 15%, and removal of those files by over 100%. More scalability improvements are expected in 2.6.38.

Code: [;a=history;f=fs/xfs;hb=05340d4ab2ec2b6b4962c1c41c6ea8fb550f947b (list of commits)]

1.3. No BKL (Big Kernel Lock)

The Big Kernel Lock is a giant lock that was introduced in Linux 2.0, when Alan Cox added SMP support for the first time. It was only a first step towards SMP scalability: in Linux 2.0 only one process can run kernel code at a time, and in the long term the BKL had to be replaced by fine-grained locking so that multiple processes can run kernel code in parallel. In this version, it is possible to compile a kernel completely free of BKL support. Note that this has no performance impact: all the critical Linux codepaths have been BKL-free for a long time. The BKL was still used in many non-performance-critical places (ioctls, drivers, non-mainstream filesystems, etc.), which are the ones being cleaned up in this version. In these places the BKL is being replaced with mutexes, which doesn't improve parallelism (these places are not performance critical anyway).
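
To see whether a given kernel was built without the BKL, one option is to inspect its build configuration. The sketch below assumes the build option is named CONFIG_BKL and that the distribution ships the kernel config as a readable text file (for example under /boot); both are assumptions, and the function name bkl_status is hypothetical.

```shell
#!/bin/sh
# Sketch only: bkl_status is an illustrative helper. It assumes the option
# is called CONFIG_BKL and that CONFIG_FILE is a plain-text kernel config.

# bkl_status CONFIG_FILE
# Prints "enabled", "disabled", or "absent" (option not present in the file,
# e.g. a kernel that predates the option); prints "unknown" if unreadable.
bkl_status() {
    cfg="$1"
    if [ ! -r "$cfg" ]; then
        echo "unknown"
        return 1
    fi
    if grep -q '^CONFIG_BKL=y' "$cfg"; then
        echo "enabled"
    elif grep -q '^# CONFIG_BKL is not set' "$cfg"; then
        echo "disabled"
    else
        echo "absent"
    fi
}
```

A typical call might be bkl_status "/boot/config-$(uname -r)", where the path is a common distribution convention rather than something the kernel guarantees.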

Code: [ (commit)]

1.4. A Ceph-based network block device

Ceph is a distributed network filesystem that was merged in Linux 2.6.34. In the Ceph design there are "object storage devices" and "metadata servers" which store metadata about the storage objects. Ceph uses these to implement its filesystem; however, these objects can also be used to implement a network block device (or even Amazon S3-compatible object storage).

This release introduces the Rados block device (RBD). RBD lets you create a block device that is striped over objects stored in a Ceph distributed object store. In contrast to alternatives like iSCSI or AoE, RBD images are striped and replicated across the Ceph object storage cluster, providing reliable (if one node fails it still works), scalable, and thinly provisioned access to block storage. RBD also supports read-only snapshots with rollback, and there are also Qemu patches to create a VM block device stored in a Ceph cluster.

Code: [ (commit)]

1.5. I/O throttling support

I/O throttling support has been added. It makes it possible to set upper read/write limits for a group of processes, which can be useful in many setups. Example:

Mount the cgroup blkio controller:

 # mount -t cgroup -o blkio none /cgroup/blkio

Specify a bandwidth rate on a particular device for the root group. The format for the policy is "<major>:<minor> <bytes_per_second>":

 # echo "8:16 1048576" > /cgroup/blkio/blkio.throttle.read_bps_device

This puts a limit of 1MB/second on reads issued by the root group on the device with major/minor number 8:16.

The limits can also be set in I/O operations per second (blkio.throttle.read_iops_device). There are also write equivalents: blkio.throttle.write_bps_device and blkio.throttle.write_iops_device. This feature does not replace the I/O weight controller merged in 2.6.33.

Code: [ (commit 1], [ 2], [ 3], [ 4], [ 5], [ 6)]
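
The rule format can be wrapped in a small helper so that limits are given in MB/second instead of raw bytes. This is a minimal sketch: the function names throttle_rule and set_read_limit are hypothetical, and the cgroup directory is passed in as a parameter because the mount point (/cgroup/blkio in the example above) depends on the setup.

```shell
#!/bin/sh
# Sketch only: throttle_rule and set_read_limit are illustrative helpers.

# throttle_rule MAJOR MINOR MB_PER_SECOND
# Prints a rule in the "<major>:<minor> <bytes_per_second>" format.
throttle_rule() {
    echo "$1:$2 $(( $3 * 1024 * 1024 ))"
}

# set_read_limit CGROUP_DIR MAJOR MINOR MB_PER_SECOND
# Writes the rule to the read bandwidth file of the given cgroup directory.
set_read_limit() {
    throttle_rule "$2" "$3" "$4" > "$1/blkio.throttle.read_bps_device"
}
```

For example, "set_read_limit /cgroup/blkio 8 16 1" would write the same "8:16 1048576" rule used above, assuming the controller is mounted at that path and the caller has permission to write there.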

1.6. "Jump label": disabled tracepoints don't impact performance

A tracepoint can be described as a special printf() call, which is used inside the kernel together with tools like perf, LTT or systemtap to analyze the system behaviour. There are two types of tracepoints: dynamic and static. Dynamic tracepoints modify the kernel code at runtime, inserting CPU instructions where necessary to obtain the data. Dynamic tracepoints are called 'kprobes' in the Linux kernel, and their performance overhead was optimized in Linux 2.6.34.

Static tracepoints, on the other hand, are inserted by the kernel developers by hand in strategic points of the code. For example, Ext4 has 50 static tracepoints. These tracepoints are compiled with the rest of the kernel code, and by default they are "disabled": until someone activates them, they are not called. Basically, each tracepoint site is an 'if' condition that tests a variable. The performance impact is nearly negligible, but it can be improved, and that's what the "jump label" feature does: a "no operation" CPU instruction is inserted in place of the conditional test, so a disabled static tracepoint has zero overhead. (Tip: You can use the "sudo perf list" command to see the full list of static tracepoints available on your system)
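
Besides "perf list", static tracepoints can usually be inspected and switched on through the tracing directory in debugfs, where each event is a subdirectory containing an "enable" file. The sketch below assumes debugfs is mounted at /sys/kernel/debug (a common but not guaranteed location); the function names count_tracepoints and enable_subsystem are hypothetical, and the events directory is a parameter.

```shell
#!/bin/sh
# Sketch only: count_tracepoints and enable_subsystem are illustrative
# helpers. EVENTS_DIR is typically /sys/kernel/debug/tracing/events.

# count_tracepoints EVENTS_DIR SUBSYSTEM
# Counts the event directories of a subsystem that have an "enable" file.
count_tracepoints() {
    n=0
    for ev in "$1/$2"/*/; do
        if [ -f "${ev}enable" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# enable_subsystem EVENTS_DIR SUBSYSTEM
# Activates every static tracepoint of the subsystem (needs root on a
# real system).
enable_subsystem() {
    echo 1 > "$1/$2/enable"
}
```

For example, "count_tracepoints /sys/kernel/debug/tracing/events ext4" would report how many Ext4 tracepoints the running kernel exposes. Until something like enable_subsystem is called, those tracepoints stay in the disabled state that jump labels make cheaper.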

Recommended LWN article: [ Jump label]

Code: [ (commit 1], [ 2], [ 3], [ 4], [ 5], [ 6], [ 7], [ 8], [ 9], [ 10], [ 11], [ 12)]

1.7. Btrfs Updates

[ (commit 1], [ 2], [ 3], [ 4)]

1.8. Perf probe improvements

* Show accessible local and global variables: A "-V" ("--vars") option has been added for listing the local variables accessible at a given probe point. This helps to find which local variables are available as event arguments. For example, "# perf probe -V call_timer_fn:23" will show all the local variables. In addition, global variables can also be shown by adding the "--externs" argument [ (commit)], [ (commit)], [ (commit)]

* Module support: It's possible to set a probe inside modules, using the "--module" option. For example, "# ./perf probe --module drm drm_vblank_info:3 node m" [ (commit)]

1.9. Power management improvements: LZO hibernation compression, delayed autosuspends

Several power-management related features have been added

1.10. Support for PPP over IPv4

This version introduces support for PPP over IPv4 (PPTP). It dramatically speeds up PPTP VPN connections and decreases CPU usage compared with the existing user-space implementations (poptop/pptpclient). The accel-pptp project makes use of this module: it contains a plugin for pppd to use PPTP in client mode and a modified pptpd (poptop) to build a high-performance PPTP NAS.

Code: [ (commit)]

2. Drivers and architectures

All the driver and architecture-specific changes can be found in the Linux_2_6_37-DriversArch page.

3. Core

4. CPU scheduler

5. Memory management

6. File systems








7. Networking

8. Block

9. Crypto

10. Virtualization


11. Security


KernelNewbies: Linux_2_6_37 (last edited 2011-01-05 09:39:03 by diegocalleja)