KernelNewbies:

Linux 2.6.32 has not been released yet. Please don't publish articles about these features before it is released.

Summary: This version adds virtualization memory de-duplication, a rewrite of the writeback code which provides noticeable performance speedups, many important Btrfs improvements and speedups, ATI R600/R700 3D and KMS support and other graphics improvements, a CFQ low latency mode, tracing improvements including a "perf timechart" tool that tries to be a better bootchart, soft limits in the memory controller, support for the S+Core architecture, support for Intel Moorestown and its new firmware interface, run time power management support, and many other improvements and new drivers.


1. Prominent features (the cool stuff)

1.1. Per-backing-device based writeback

Slides from Jens Axboe: '[http://oss.oracle.com/~axboe/lpc2009.pdf Per backing device writeback]'

Recommended LWN articles: '[http://lwn.net/Articles/326552/ Flushing out pdflush]', '[http://lwn.net/Articles/354851/ In defense of per-BDI writeback]'

"Writeback" in the context of the Linux kernel can be defined as the process of writing "dirty" memory from the page cache to the disk. The amount of data that needs to be written can be huge - hundreds of MB, or even GB, and the work is done by the well know "pdflush" kernel threads when the amount of dirty memory surpasses the limits set in /proc/sys/vm. The current pdflush system has disadvantages, specially in systems with multiple storage devices that need to write large chunks of data to the disk. This design has some deficiencies, described in the links abobe, that cause poor performance and seekiness in some situations. A new flushing system has been designed by Jens Axboe (Oracle), which focus around the idea of having a dedicated kernel thread to flushing the dirty memory of each storage device. The "pdflush" threads are gone and have been replaced with others named after "flush-MAJOR" (the threads are created when there's flushing work that needs to be done and will dissapear after a while if there's nothing to do).

The new system has much better performance in several workloads. In a benchmark with two processes doing streaming writes to a 32 GB file on 5 SATA drives pushed into an LVM stripe set, XFS was [http://oss.oracle.com/~mason/seekwatcher/bdi-writeback/xfs-streaming-compare.png 40% faster], and Btrfs [http://oss.oracle.com/~mason/seekwatcher/bdi-writeback/btrfs-streaming-compare.png 26% faster]. A sample ffsb workload that does random writes to files was found to be about 8% faster on a simple SATA drive during the benchmark phase. File layout is much smoother on the vmstat stats. An SSD-based writeback test on XFS performs over 20% better as well, with the throughput being very stable around 1GB/sec, where pdflush only manages 750MB/sec and fluctuates wildly while doing so. Random buffered writes to many files behave a lot better as well, as do random mmap'ed writes. A streaming vs random writer benchmark went from a few MB/s to ~120 MB/s. In short, performance improves in many important workloads.
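The writeback thresholds in /proc/sys/vm mentioned above can be inspected from userspace. The following is a minimal sketch, not part of the kernel changes themselves: the helper name `read_writeback_knobs` is made up for illustration, and it only reads a handful of well-known tunables, skipping any that are missing on the running system.

```python
from pathlib import Path

# Writeback tunables exposed under /proc/sys/vm.
WRITEBACK_KNOBS = (
    "dirty_background_ratio",     # % of memory dirty before background flushing starts
    "dirty_ratio",                # % of memory dirty before writers must flush themselves
    "dirty_expire_centisecs",     # age at which dirty data becomes eligible for writeout
    "dirty_writeback_centisecs",  # periodic wakeup interval of the flusher threads
)


def read_writeback_knobs(vm_dir="/proc/sys/vm"):
    """Return {knob_name: int} for every knob that can be read under vm_dir.

    read_writeback_knobs is a hypothetical helper name used only in this sketch.
    """
    values = {}
    for name in WRITEBACK_KNOBS:
        try:
            values[name] = int((Path(vm_dir) / name).read_text().strip())
        except (OSError, ValueError):
            # Knob missing or unreadable on this system: skip it.
            continue
    return values


if __name__ == "__main__":
    for name, value in read_writeback_knobs().items():
        print(f"{name} = {value}")
```

The directory is a parameter so that the function can be pointed at a copy of the files when experimenting without touching the live system.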

Code: [http://git.kernel.org/linus/d8a8559cd7a9ccac98d5f6f13297a2ff68a43627 (commit 1], [http://git.kernel.org/linus/66f3b8e2e103a0b93b945764d98e9ba46cb926dd 2], [http://git.kernel.org/linus/03ba3782e8dcc5b0e1efe440d33084f066e38cae 3], [http://git.kernel.org/linus/d0bceac747b547c0b4769b91fec7d3c15600153f 4], [http://git.kernel.org/linus/f09b00d3e789a88fa6c7c03cedc62cb65c1de0cb 5], [http://git.kernel.org/linus/d993831fa7ffeb89e994f046f93eeb09ec91df08 6], [http://git.kernel.org/linus/500b067c5e6ceea49cf280a02597b1169320e08c 7)]

1.2. Btrfs improvements

Recommended LWN article: [http://www.lwn.net/Articles/358940/ A Btrfs update]

1.3. Kernel Samepage Merging (memory deduplication)

Recommended LWN articles: '[http://lwn.net/Articles/306704/ /dev/ksm: dynamic memory sharing]', '[http://lwn.net/Articles/330589/ KSM tries again]'

Kernel Samepage Merging, aka KSM (also known as Kernel Shared Memory in the past), is a memory de-duplication implementation.

Modern operating systems already use memory sharing extensively: for example, forked processes initially share all their memory with their parent, there are shared libraries, etc. Virtualization, however, can't easily benefit from memory sharing. Even when all the VMs are running the same OS with the same kernel and libraries, the host kernel can't know that a lot of those pages are identical and can be shared. KSM makes it possible to share those pages. The KSM kernel daemon, ksmd, periodically scans areas of user memory, looking for pages of identical content which can be replaced by a single write-protected page (which is automatically COW'ed if a process wants to update it). Not all the memory is scanned: the areas to search for merge candidates are specified by userspace apps using madvise(2): madvise(addr, length, MADV_MERGEABLE).
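As an illustration of the madvise(2) call above, here is a minimal sketch that maps some anonymous memory, fills it with identical pages and marks it as a merge candidate. It assumes Python 3.8 or later (for `mmap.madvise`) on Linux; the function name `make_mergeable_region` and its parameters are made up for this example. The mapping is private because KSM only considers anonymous private pages, and the code reports failure instead of raising when the kernel or the Python build lacks MADV_MERGEABLE.

```python
import mmap


def make_mergeable_region(num_pages=64, fill=b"\x42"):
    """Map anonymous memory full of identical pages and offer it to KSM.

    Returns (region, registered). registered is False when MADV_MERGEABLE
    is unavailable or the kernel rejects the request (e.g. KSM not built in).
    make_mergeable_region is a hypothetical helper for illustration only.
    """
    size = num_pages * mmap.PAGESIZE
    # fileno=-1 requests an anonymous mapping; MAP_PRIVATE makes it a
    # private one, which is the kind of memory KSM can merge.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    region[:] = fill * size  # every page now has the same content

    advice = getattr(mmap, "MADV_MERGEABLE", None)
    if advice is None or not hasattr(region, "madvise"):
        return region, False
    try:
        # Equivalent to madvise(addr, length, MADV_MERGEABLE) on the whole map.
        region.madvise(advice)
    except OSError:
        return region, False
    return region, True


if __name__ == "__main__":
    region, registered = make_mergeable_region()
    print("registered with KSM:", registered)
    region.close()
```

Registering a region only makes it a candidate: whether and when the pages are actually merged depends on ksmd being enabled and on its scan rate.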

The result is a dramatic decrease in memory usage in virtualization environments. In a virtualization server, Red Hat found that thanks to KSM, KVM can run as many as 52 Windows XP VMs with 1 GB of RAM each on a server with just 16 GB of RAM. Because KSM works transparently to userspace apps, it can be adopted very easily, and provides huge memory savings for free to current production systems. It was originally developed for use with KVM, but it can also be used with any other virtualization system - or even in non-virtualization workloads, for example applications that for some reason have several processes using lots of memory that could be shared.

The KSM daemon is controlled by sysfs files in /sys/kernel/mm/ksm/; documentation can be found in Documentation/vm/ksm.txt.
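A minimal sketch of reading the KSM sysfs files follows. The file names used (`run`, `pages_shared`, `pages_sharing`, `pages_to_scan`, `sleep_millisecs`) are the ones described in Documentation/vm/ksm.txt; the helper names `read_ksm_stats` and `sharing_ratio` are made up for this example, and the directory is a parameter so the code can be tried against a copy of the files.

```python
from pathlib import Path

KSM_FILES = ("run", "pages_shared", "pages_sharing", "pages_to_scan", "sleep_millisecs")


def read_ksm_stats(ksm_dir="/sys/kernel/mm/ksm"):
    """Return {file_name: int} for each KSM sysfs file that can be read.

    read_ksm_stats is a hypothetical helper name used only in this sketch.
    """
    stats = {}
    for name in KSM_FILES:
        try:
            stats[name] = int((Path(ksm_dir) / name).read_text().strip())
        except (OSError, ValueError):
            continue  # file absent (kernel without KSM) or not an integer
    return stats


def sharing_ratio(stats):
    """pages_sharing / pages_shared, or None when nothing is shared yet.

    pages_shared counts the merged pages kept in memory; pages_sharing counts
    the additional users of them, so a high ratio means a lot was saved.
    """
    shared = stats.get("pages_shared", 0)
    if shared == 0:
        return None
    return stats.get("pages_sharing", 0) / shared


if __name__ == "__main__":
    stats = read_ksm_stats()
    print(stats, "ratio:", sharing_ratio(stats))
```

ksmd only scans when `run` contains 1, so a system where nothing has been merged may simply have KSM switched off.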

Code: [http://git.kernel.org/linus/828502d30073036a486d96b1fe051e0f08b6df83 (commit 1], [http://git.kernel.org/linus/3866ea90d3635ddddcd77ce51087222ac7de85f2 2], [http://git.kernel.org/linus/d19f352484467a5e518639ddff0554669c10ffab 3], [http://git.kernel.org/linus/f8af4da3b4c14e7267c4ffb952079af3912c51c5 4], [http://git.kernel.org/linus/21333b2b66b805a360641568588e5a0bb06d9d1f 5], [http://git.kernel.org/linus/9a840895147b12de5cdd633c600b38686840ee53 6], [http://git.kernel.org/linus/31dbd01f314364b70c2e026a5793a29a4da8a9dc 7], [http://git.kernel.org/linus/1ff829957316670af64be24192ef849e7253a509 8], [http://git.kernel.org/linus/339aa62469f65daf38a01d6c098b5f3ff8016653 9], [http://git.kernel.org/linus/b4028260334e1ecf63fb5e0a95d65bb2db02c1ec 10], [http://git.kernel.org/linus/e178dfde3952192cf44eeb0612882f01fc96c0a9 11], [http://git.kernel.org/linus/473b0ce4d13ee77925a7062e25dea0d16a91f654 12], [http://git.kernel.org/linus/26465d3ea5a62d59efb3796b9e0e2b0656d02cb1 13], [http://git.kernel.org/linus/6e15838425ac855982f10419558649954a0684a3 14], [http://git.kernel.org/linus/81464e30609cdbd3d96d8dd6991e7481195a89a1 15], [http://git.kernel.org/linus/d952b79136a6c32a3f97e0628ca78340f1d5c6f9 16], [http://git.kernel.org/linus/cd551f97519d35855be5a8720a47cc802ee4fd06 17], [http://git.kernel.org/linus/9ba6929480088a85c1ff60a4b1f1c9fc80dbd2b7 18], [http://git.kernel.org/linus/1c2fb7a4c2ca7a958b02bc1e615d0254990bba8d 19], [http://git.kernel.org/linus/2ffd8679c8e4ec226718bff58b50b226dd477015 20], [http://git.kernel.org/linus/7701c9c0f54feb682d0cefa2ae1f4a1e00e0ba09 21], [http://git.kernel.org/linus/8314c4f24a0a5c9b1f7544e9fa83a1d5367ddaa7 22], [http://git.kernel.org/linus/a913e182ab9484308e870af37a14d372742d53b0 23)]

1.4. Improvements in the graphic stack

The landing of GEM and KMS in past releases is driving a much needed renovation in the Linux graphic stack. This release adds several improvements to the graphic drivers that show the steady progress of this kernel subsystem:

1.5. CFQ low latency mode

[http://lwn.net/Articles/355987/ Recommended LWN commentary from Jens Axboe]

In this release, the CFQ IO scheduler (the one used by default) gets a new feature that greatly reduces the impact that a writer can have on system interactivity. The end result is that the desktop experience should be less affected by background IO activity. It can, however, cause noticeable performance issues, so people who only care about throughput (e.g. servers) can try turning it off by echoing 0 to /sys/class/block/<device name>/queue/iosched/low_latency. It's worth mentioning that the 'low_latency' setting defaults to on.
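The sysfs path above can also be driven from a script. This is a small sketch under a few assumptions: the helper names and the `sysfs_root` parameter are made up for illustration, the device name (for example `sda`) has to be supplied by the caller, and writing the file on a real system requires root and a device that uses CFQ.

```python
from pathlib import Path


def low_latency_path(device, sysfs_root="/sys"):
    """Build the path of the CFQ low_latency knob for a block device."""
    return Path(sysfs_root) / "class" / "block" / device / "queue" / "iosched" / "low_latency"


def get_low_latency(device, sysfs_root="/sys"):
    """Return True when low latency mode is switched on for the device."""
    return low_latency_path(device, sysfs_root).read_text().strip() == "1"


def set_low_latency(device, enabled, sysfs_root="/sys"):
    """Write 1 or 0 to the knob, the same as echoing the value into the file."""
    low_latency_path(device, sysfs_root).write_text("1" if enabled else "0")
```

For instance, `set_low_latency("sda", False)` would correspond to echoing 0 into the file for a throughput-oriented server, assuming the disk is named sda.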

Code: [http://git.kernel.org/linus/1d2235152dc745c6d94bedb550fea84cffdbf768 (commit)], [http://git.kernel.org/linus/963b72fc6664be12ea52f35a6addea14ec373433 (commit)]

1.6. Tracing improvements: perf tracepoints, perf timechart and perf sched

The perf tool is getting a lot of attention and patches. In the past few months the perfcounters subsystem has outgrown its initial role of counting hardware events and is becoming a much broader, generic event enumeration, reporting, logging, monitoring and analysis facility, so the tool has been renamed from "Performance Counters" to "Performance Events".

Code: rename [http://git.kernel.org/linus/cdd6c482c9ff9c55475ee7392ec8f672eddb7be6 (commit)], perf trace [http://git.kernel.org/linus/5f9c39dca52d3e639ac899e169f408c6fd8396cc (commit)], syscalls [http://git.kernel.org/linus/a871bd33a6c0bc86fb47cd02ea2650dd43d3d95f (commit 1], [http://git.kernel.org/linus/fb34a08c3469b2be9eae626ccb96476b4687b810 2], [http://git.kernel.org/linus/64c12e0444fcc6b75eb49144ba46d43dbdc6bc8f 3], [http://git.kernel.org/linus/540b7b8d65575c80162f2a0f38e1d313c92a6042 4)], module [http://git.kernel.org/linus/7ead8b8313d92b3a69a1a61b0dcbc4cd66c960dc (commit)], skb [http://git.kernel.org/linus/5a165657bef7c47e5ff4cd138f7758ef6278e87b (commit)], memory allocator: [http://git.kernel.org/linus/4b4f278c030aa4b6ee0915f396e9a9478d92d610 (commit 1], [http://git.kernel.org/linus/e0fff1bd12469c45dab088e353d8882761387bb6 2], [http://git.kernel.org/linus/0d3d062a6e289e065bd0aa537a6806a1806bf8aa 3], [http://git.kernel.org/linus/c9d05cfc001fef3d6d37651e19ab9227a32b71f5 4], [http://git.kernel.org/linus/bb72222086260695d71afe60fa105649c1ea9463 5], [http://git.kernel.org/linus/8fbb398f5c78832ee61e0d5ed0793fa8857bd853 6)], perf sched [http://git.kernel.org/linus/0a02ad9331dd4db56c29c60db2e99c4daaad8a48 (commit 1], [http://git.kernel.org/linus/ec156764d424dd67283c2cd5e9f6f1b8388364ac 2], [http://git.kernel.org/linus/fbf9482911825f965829567aea8acff3bbc5279c 3], [http://git.kernel.org/linus/cdce9d738b91e1ec686e4ef6ca46d46e2891e208 4], [http://git.kernel.org/linus/daa1d7a5eafc0a3a91a9add6a9a9f1bcaed63108 5], [http://git.kernel.org/linus/f2858d8ad9858e63c87257553c5721cba5db95ae 6], [http://git.kernel.org/linus/1fc35b29b4098aa3bf9fc9acb4c1615d0b5dd95d 7], [http://git.kernel.org/linus/c13f0d3c8165e9592102687fa999da0a0d9c3724 8], [http://git.kernel.org/linus/ea57c4f5203d82c7844c54cdef54e972cf4e9d1f 9], [http://git.kernel.org/linus/39aeb52f99f2380c1f16036deed2f7bb8b2e0559 10], [http://git.kernel.org/linus/0ec04e16d08b69d8da46abbcfa3e3f2cd9738852 11)], perf timechart [http://git.kernel.org/linus/f48d55ce7871824eae3065f4d81956d7113eff19 (commit 1], [http://git.kernel.org/linus/10274989fd595db455874fc2c83272fb33f6b27b 2], [http://git.kernel.org/linus/3c09eebd61eaacca866cd60b50416f18640bc731 3)]

1.7. Soft limits in the memory controller

Control groups are a sort of virtual "containers" that are created as directories inside a special virtual filesystem (usually with [http://libcg.sourceforge.net/ tools]). An arbitrary set of processes can be added to a control group, and the group can be configured with a set of CPU scheduling or memory limits for the processes inside it.

This release adds soft memory limits: the processes can surpass the soft limit as long as there is no memory contention (and they do not exceed their hard limit), but if the system needs to free memory, it will reclaim it from the groups that exceed their soft limit.
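A rough sketch of how a soft limit could be set and checked from userspace is shown below. It relies on the memory controller files `memory.soft_limit_in_bytes` and `memory.usage_in_bytes`; the helper names are made up for this example, and the control group directory is whatever path the group has under the mounted cgroup filesystem, which the caller has to supply.

```python
from pathlib import Path


def _read_int(cgroup_dir, name):
    """Read one integer-valued control file of a memory cgroup."""
    return int((Path(cgroup_dir) / name).read_text().strip())


def set_soft_limit(cgroup_dir, limit_bytes):
    """Set the soft limit of the group (hypothetical helper for illustration)."""
    (Path(cgroup_dir) / "memory.soft_limit_in_bytes").write_text(str(limit_bytes))


def soft_limit_excess(cgroup_dir):
    """Return by how many bytes the group's usage exceeds its soft limit.

    A positive value means the group is over its soft limit, i.e. it is one
    of the groups memory would be reclaimed from under contention.
    """
    usage = _read_int(cgroup_dir, "memory.usage_in_bytes")
    soft = _read_int(cgroup_dir, "memory.soft_limit_in_bytes")
    return max(0, usage - soft)
```

A group with a positive excess keeps running normally while memory is plentiful; the value only matters once the system has to free memory.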

Code: [http://git.kernel.org/linus/a6df63615b943dbef22df04c19f4506330fe835e (commit)], [http://git.kernel.org/linus/296c81d89f4f14269f7346f81442910158c0a83a (commit)], [http://git.kernel.org/linus/f64c3f54940d6929a2b6dcffaab942bd62be2e66 (commit)], [http://git.kernel.org/linus/4e41695356fb4e0b153be1440ad027e46e0a7ea2 (commit)], [http://git.kernel.org/linus/75822b4495b62e8721e9b88e3cf9e653a0c85b73 (commit)]

1.8. Virtualization improvements

This version adds a few notable improvements to the Linux virtualization subsystem, KVM:

1.9. Run-time Power Management

Recommended LWN article: '[http://lwn.net/Articles/347573/ Runtime power management]'

This feature allows I/O devices to be put into energy-saving (low power) states at run time (or autosuspended) after a specified period of inactivity, and woken up in response to a hardware-generated wake-up event or a driver's request. Hardware support is generally required for this functionality to work, and the bus type drivers of the buses the devices are on are responsible for the actual handling of the autosuspend requests and wake-up events.

Code: Introduce core framework for run-time PM of I/O devices (rev. 17) [http://git.kernel.org/linus/5e928f77a09a07f9dd595bb8a489965d69a83458 (commit)]

1.10. S+core architecture support

This release adds support for a new architecture, [http://w3.sunplus.com/products/S%2Bcore.asp S+core]. The Score instruction set supports 16-bit, 32-bit and 64-bit instructions, and Score SoCs have been used in game machines and LCD TVs.

Code: [http://git.kernel.org/linus/6bc9a3966f0395419b09b2ec90f89f7f00341b37 (commit)]

1.11. Intel Moorestown and SFI (Simple Firmware Interface) and ACPI 4.0 support

The Simple Firmware Interface (SFI) is a method for platform firmware to export static tables to the operating system (OS) - something analogous to ACPI, used in the MID devices based on the 2nd generation Intel Atom processor platform, code-named Moorestown.

SFI is used instead of ACPI in those platforms because it's simpler and more lightweight. It's not intended to replace ACPI. For more information, see [http://simplefirmware.org the web site].

At the same time, this release adds support for Moorestown, Intel's Low Power Intel Architecture (LPIA) based Moblin Internet Device (MID) platform. Moorestown consists of two chips: Lincroft (CPU core, graphics, and memory controller) and Langwell IOH. Unlike standard x86 PCs, Moorestown does not have many legacy devices or standard legacy replacement devices/features. For example, Moorestown does not contain an i8259, i8254, HPET, legacy BIOS, or most of the I/O ports.

There are also several patches that implement ACPI 4.0 support - Linux is in fact the first platform to support it.

SFI: [http://git.kernel.org/linus/6349d9979beba240fe7182872cb547250264b865 (commit 1], [http://git.kernel.org/linus/117a9ac777f8034d4675b821172d2ff71f6ec47a 2], [http://git.kernel.org/linus/6ae6996a466e14bcf41618cde641a74ae03dc285 3], [http://git.kernel.org/linus/13e82d023c4c3f13ab1e665cbb917a7ebba8935c 4], [http://git.kernel.org/linus/efafc8b213e67ed148a5b53ade29ee7b48af907d 5], [http://git.kernel.org/linus/5f0db7a2fb78895a197f64e548333b3bbd433996 6)] Moorestown: [http://git.kernel.org/linus/162bc7ab01a00eba1c5d614e64a51e1268ee3f96 (commit)], [http://git.kernel.org/linus/3f4110a48a749a1aa1c54fb807afb3f32f49711c (commit)]

1.12. NAPI-like approach for block devices

Recommended LWN article: '[http://lwn.net/Articles/346219/ Interrupt mitigation in the block layer]'

blk-iopoll is a NAPI-like approach for block devices that reduces the interrupt overhead. In benchmarks, blk-iopoll cut sys time by 40% in some cases.
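The general idea behind a NAPI-like scheme can be illustrated with a toy model. This is not the blk-iopoll code, only a simplified simulation under stated assumptions: completions arrive in bursts, the first interrupt of a burst masks further interrupts and switches to polling, and each poll pass handles at most `budget` completions until a pass finds fewer than that. The function name and the numbers are made up for illustration.

```python
def simulate(bursts, budget=4):
    """Compare interrupt counts for per-completion IRQs and NAPI-style polling.

    bursts is a list with the number of completions arriving in each burst.
    Returns (irq_mode_interrupts, poll_mode_interrupts, poll_passes).
    """
    # Baseline: every completed request raises its own interrupt.
    irq_mode_interrupts = sum(bursts)

    poll_mode_interrupts = 0
    poll_passes = 0
    for burst in bursts:
        if burst == 0:
            continue
        # One interrupt announces the burst, then interrupts stay masked.
        poll_mode_interrupts += 1
        remaining = burst
        while True:
            handled = min(remaining, budget)
            remaining -= handled
            poll_passes += 1
            if handled < budget:
                # The pass ran dry: stop polling and unmask interrupts.
                break
    return irq_mode_interrupts, poll_mode_interrupts, poll_passes


if __name__ == "__main__":
    print(simulate([10, 1, 8], budget=4))
```

In this model the saving grows with the burst size, while a device that completes one request at a time gains nothing from polling.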

Code: [http://git.kernel.org/linus/5e605b64a183a6c0e84cdb99a6f8acb1f8200437 (commit)]

2. Various core changes

3. Block

4. Virtualization

5. PCI

6. MD/DM

7. Filesystems

8. Networking

9. Security

10. Tracing/Profiling

11. Crypto

12. Architecture-specific changes

13. Drivers

13.1. Graphics

13.2. Storage

13.3. Networking devices

13.4. USB

13.5. Input

13.6. Sound

13.7. Staging Drivers

13.8. V4L/DVB

13.9. Bluetooth

13.10. MTD

13.11. HWMON

13.12. ACPI

13.13. Various

13.14. Other news sources tracking the kernel changes

KernelNewbies: Linux_2_6_32 (last edited 2009-12-02 01:04:40 by diegocalleja)