
Linux 2.6.37 has not been released yet.

Summary: Linux 2.6.37 includes several SMP scalability improvements for Ext4 and XFS, an option to compile the kernel with the Big Kernel Lock disabled, support for per-cgroup IO throttling, a network block device based on the Ceph cluster filesystem, several Btrfs improvements, perf support to probe modules and global variables, image hibernation using LZO compression, PPP over IPv4 support, several networking microoptimizations and many other small changes, improvements and new drivers.


1. Prominent features (the cool stuff)

1.1. Ext4: better SMP scalability, faster mkfs

1.2. XFS scalability improvements

Scalability of metadata-intensive workloads has been improved. On an 8-way machine, an fs_mark run creating 50 million files improved by over 15%, and removal of those files by over 100%. More scalability improvements are expected in 2.6.38.

Code: [http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=fs/xfs;hb=05340d4ab2ec2b6b4962c1c41c6ea8fb550f947b (list of commits)]

1.3. No BKL (Big Kernel Lock)

The Big Kernel Lock is a [http://en.wikipedia.org/wiki/Giant_lock giant lock] that was introduced in Linux 2.0, when Alan Cox introduced SMP support for the first time. It was only a first step towards SMP scalability: in Linux 2.0 only one process can run kernel code at a time, and in the long term the BKL must be replaced by fine-grained locking to allow multiple processes to run kernel code in parallel. In this version, it is possible to compile a kernel completely free of BKL support. Note that this has no performance impact: all the critical Linux codepaths have been BKL-free for a long time. The BKL was still used in many non-performance-critical places (ioctls, drivers, non-mainstream filesystems, etc.), which are the ones being cleaned up in this version. In these places the BKL is being replaced with mutexes, which doesn't improve parallelism (these places are not performance critical anyway).

Code: [http://git.kernel.org/linus/6de5bd128d381ad88ac6d419a5e597048eb468cf (commit)]

1.4. A Ceph-based network block device

Ceph is a distributed network filesystem that was merged in [http://kernelnewbies.org/Linux_2_6_34#head-87b23f85b5bdd35c0ab58c1ebfdcbd48d1658eef Linux 2.6.34]. In the Ceph design there are "object storage devices" and "metadata servers" which store metadata about the storage objects. Ceph uses these to implement its filesystem; however, these objects can also be used to implement a network block device (or even [http://ceph.newdream.net/2010/11/s3-compatible-object-storage-with-radosgw/ Amazon S3-compatible object storage]).

This release introduces the Rados block device (RBD). RBD lets you create a block device that is striped over objects stored in a Ceph distributed object store. In contrast to alternatives like iSCSI or AoE, RBD images are striped and replicated across the Ceph object storage cluster, providing reliable (if one node fails it still works), scalable, and thinly provisioned access to block storage. RBD also supports read-only snapshots with rollback, and there are also Qemu patches to create a VM block device stored in a Ceph cluster.
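To illustrate what "striped over objects" means, here is a minimal sketch of how a byte offset in a block device image could map to an object and an offset inside that object. It is a simplified model, not the actual RBD layout: the helper name `rbd_locate` is hypothetical, and the 4 MiB object size is an assumption chosen for illustration.

```shell
#!/bin/sh
# rbd_locate OFFSET_BYTES [OBJECT_SIZE_BYTES]
# Hypothetical helper: prints "<object index> <offset inside object>".
rbd_locate() {
    offset=$1
    obj_size=${2:-4194304}   # assumed 4 MiB objects, for illustration only
    # Integer division selects the object, the remainder is the position in it.
    echo "$((offset / obj_size)) $((offset % obj_size))"
}

# Byte (10 MiB + 512) of the image falls in the third object (index 2).
rbd_locate $((10 * 1024 * 1024 + 512))
```

Because each object is stored and replicated independently by the cluster, consecutive chunks of the image can end up on different storage nodes.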

Code: [http://git.kernel.org/linus/602adf400201636e95c3fed9f31fba54a3d7e844 (commit)]

1.5. I/O throttling support

I/O throttling support has been added. It makes it possible to set upper read/write limits for a group of processes, which can be useful in many setups. Example:

{{{
Mount the cgroup blkio controller
# mount -t cgroup -o blkio none /cgroup/blkio

Specify a bandwidth rate on particular device for root group. The format for policy is "<major>:<minor> <bytes_per_second>"
# echo "8:16 1048576" > /cgroup/blkio/blkio.throttle.read_bps_device

Above will put a limit of 1MB/second on reads happening for root group on device having major/minor number 8:16.
}}}

The limits can also be set in IO operations per second (blkio.throttle.read_iops_device). There are also write equivalents: blkio.throttle.write_bps_device and blkio.throttle.write_iops_device. This feature does not replace the IO weight controller [http://kernelnewbies.org/Linux_2_6_33#head-2e432d67d2aa0ed119298a767a21066a039d70e1 merged in 2.6.33].
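As a minimal sketch of how such a rule can be assembled, the following shell helper converts a limit given in MiB per second into the "<major>:<minor> <bytes_per_second>" policy format. The function name `throttle_rule` is hypothetical and not part of any kernel tool; it only prints the rule and does not touch the cgroup files.

```shell
#!/bin/sh
# throttle_rule MAJOR MINOR MIB_PER_SEC
# Hypothetical helper: prints a rule in the
# "<major>:<minor> <bytes_per_second>" format.
throttle_rule() {
    major=$1
    minor=$2
    mib_per_sec=$3
    # 1 MiB = 1024 * 1024 bytes
    printf '%s:%s %s\n' "$major" "$minor" "$((mib_per_sec * 1024 * 1024))"
}

# Build a 1 MiB/s limit for the device with major/minor number 8:16.
throttle_rule 8 16 1
```

Assuming the controller is mounted at /cgroup/blkio as in the example, the output could be redirected as root into a control file, for instance `throttle_rule 8 16 1 > /cgroup/blkio/blkio.throttle.write_bps_device` for a write limit.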

Code: [http://git.kernel.org/linus/062a644d6121d5e2f51c0b2ca0cbc5155ebf845b (commit 1], [http://git.kernel.org/linus/4c9eefa16c6f124ffcc736cb719b24ea27f85017 2], [http://git.kernel.org/linus/7702e8f45b0a3bb262b9366c60beb5445758d94c 3], [http://git.kernel.org/linus/e43473b7f223ec866f7db273697e76c337c390f9 4], [http://git.kernel.org/linus/2786c4e5e54802c34297e55050fef3e862a27b3f 5], [http://git.kernel.org/linus/8e89d13f4ede2467629a971618537430fafaaea3 6)]

1.6. "Jump label": disabled tracepoints don't impact performance

A tracepoint can be described as a special printf() call placed inside the kernel, which tools like perf, LTT or systemtap use to analyze system behaviour. There are two types of tracepoints: dynamic and static. Dynamic tracepoints modify the kernel code at runtime, inserting CPU instructions where necessary to obtain the data. Dynamic tracepoints are called 'kprobes' in the Linux kernel, and their performance overhead was [http://kernelnewbies.org/Linux_2_6_34#head-c073d95babd93637a135873e9506b8197ad4ebdc optimized in Linux 2.6.34].

Static tracepoints, on the other hand, are inserted by the kernel developers by hand at strategic points of the code. For example, Ext4 has 50 static tracepoints. These tracepoints are compiled with the rest of the kernel code, and by default they are "disabled": until someone activates them, they are not called. Basically, each disabled tracepoint is an 'if' condition that tests a variable. The performance impact is nearly negligible, but it can be improved, and that's what the "jump label" feature does: a "no operation" CPU instruction is inserted in place of the conditional test, so a disabled static tracepoint has zero overhead. (Tip: you can use the "sudo perf list" command to see the full list of static tracepoints available in your system.)

Recommended LWN article: [http://lwn.net/Articles/412072/ Jump label]

Code: [http://git.kernel.org/linus/bf5438fca2950b03c21ad868090cc1a8fcd49536 (commit 1], [http://git.kernel.org/linus/f49aa448561fe9215f43405cac6f31eb86317792 2], [http://git.kernel.org/linus/fa6f2cc77081792e4edca9168420a3422299ef15 3], [http://git.kernel.org/linus/e0cf0cd49632552f063fb3ae58691946da45fb2e 4], [http://git.kernel.org/linus/4c3ef6d79328c0e23ade60cbfc8d496123a6855c 5], [http://git.kernel.org/linus/52159d98be6f26c48f5e02c7ab3c9848a85979b5 6], [http://git.kernel.org/linus/8f7b50c514206211cc282a4247f7b12f18dee674 7], [http://git.kernel.org/linus/d9f5ab7b1c0a520867af389bab5d5fcdbd0e407e 8], [http://git.kernel.org/linus/46eb3b64dddd20f44e76b08676fa642dd374bf1d 9], [http://git.kernel.org/linus/dff9d3c215251022dd8bb3823c9f75edb4b63fe9 10], [http://git.kernel.org/linus/d6dad199a10423ce37b8bfec1f055c380dc4a3d5 11], [http://git.kernel.org/linus/95fccd465eefb3d6bf80dae0496607b534d38313 12)]

1.7. Btrfs Updates

1.8. Perf probe improvements

1.9. Power management improvements: LZO hibernation compression, delayed autosuspends

Several power-management related features have been added.

1.10. Support for PPP over IPv4

This version introduces PPP over IPv4 support (PPTP). It dramatically speeds up PPTP VPN connections and decreases CPU usage in comparison with the existing user-space implementations (poptop/pptpclient). The [https://sourceforge.net/projects/accel-pptp/ accel-pptp project] makes use of this module: it contains a plugin for pppd to use PPTP in client mode, and a modified pptpd (poptop) to build a high-performance PPTP NAS.

Code: [http://git.kernel.org/linus/00959ade36acadc00e757f87060bf6e4501d545f (commit)]

2. Drivers and architectures

All the driver and architecture-specific changes can be found in the [http://kernelnewbies.org/Linux_2_6_37-DriversArch Linux_2_6_37-DriversArch page].

3. Core

4. CPU scheduler

5. Memory management

6. File systems

XFS

OCFS2

EXT4

CIFS

NFS

GFS2

NILFS2

7. Networking

8. Block

9. Crypto

10. Virtualization

KVM

11. Security

SELinux

KernelNewbies: Linux_2_6_37 (last edited 2011-01-05 09:49:34 by diegocalleja)