
Linux kernel version 2.6.23 Released 9 October 2007 (full SCM git log)

During the development of 2.6.23, the 2007 edition of the Linux Kernel Developers' Summit was held on September 5 and 6 in Cambridge, UK. We can only recommend reading the excellent coverage done by LWN.

1. Short overview (for news sites, etc)

2.6.23 includes the new, better, fairer CFS process scheduler, a simpler read-ahead mechanism, the lguest 'Linux-on-Linux' paravirtualization hypervisor, Xen guest support, KVM SMP guest support, variable process argument length, SLUB as the default slab allocator, SELinux protection against exploits of null dereferences using mmap, XFS and ext4 improvements, PPP over L2TP support, the 'lumpy' reclaim algorithm, a userspace driver framework, the O_CLOEXEC file descriptor flag, splice improvements, the new fallocate() syscall, lock statistics, support for multiqueue network devices, various new drivers and many other minor features and fixes.

2. Important things (AKA: ''the cool stuff'')

2.1. The CFS process scheduler

The new process scheduler, a.k.a. CFS (Completely Fair Scheduler), has generated a lot of noise in some circles due to the way it was chosen over its competitor, RSDL. A bit of history is needed to clarify what happened and what CFS does compared to the old scheduler.

During the development of Linux 2.5, the 'O(1)' process scheduler (PS) from Ingo Molnar was merged to replace the one inherited from 2.4. The O(1) PS was designed to fix the scalability issues of the 2.4 PS - the performance improvements were so big that the O(1) PS was one of the most frequently backported features to 2.4 in commercial Linux distributions. However, the algorithms in charge of deciding which process to run were not changed that much, as they were considered 'good enough', or at least not perceived as a critical issue. But those algorithms can make a huge difference in what users perceive as 'interactivity'. For example, if one or more processes start an endless loop and, because of those CPU-bound loopers, the PS doesn't assign as much CPU as necessary to the already running non-looping processes in charge of implementing the user interface (X.org, kicker, firefox, openoffice.org, etc.), the user will perceive that the programs don't react smoothly to his actions. Or worse: in the case of music players, the music could skip.

The O(1) PS, just like the previous PSs, tried to improve those cases, and generally it did a good job most of the time. However, many users reported corner cases and not-so-corner cases where the new PS didn't work as expected. One of those users was Con Kolivas, and despite his inexperience in the kernel hacking world, he tried to fine-tune the scheduling algorithms without replacing them. His work was a success, his patches found their way into the mainline kernel, and other people (Mike Galbraith, Davide Libenzi, Nick Piggin) also helped to tweak the scheduler. But not all the corner cases disappeared, and some new ones appeared while trying to fix others. Con found that the 'interactivity estimator' - a piece of code used by the PS to try to decide which processes were more 'interactive' and hence needed more attention, so that the user would perceive his desktop as 'more interactive' - caused more problems than it solved. Contrary to its original purpose, the interactivity estimator couldn't fix all the 'interactivity' problems present in the PS, and trying to fix one would open another issue. It was the typical case of an algorithm using statistics and heuristics to try to predict the future, and failing at it. So Con designed a new PS, called RSDL, that killed the interactivity estimation code. Instead, his PS was based on the concept of 'fairness': processes are treated equally and given the same timeslices (see this LWN article for more details on this PS), and the PS doesn't care or even try to guess whether a process is CPU-bound or I/O-bound (interactive). This PS improved the perceived 'interactivity' in those corner cases as well.

This PS was the one that was going to get merged, but Ingo Molnar (the O(1) creator) wrote another new PS, called CFS (short for 'Completely Fair Scheduler'), taking as one of its basic design elements the 'fair scheduling' idea that Con's PS had proven to be superior. It was well received by some hackers, who helped Ingo (along with Mike Galbraith, Peter Zijlstra, Thomas Gleixner, Suresh Siddha, and many others) to make CFS a good PS alternative for mainline. 'Fairness' is the only idea shared between RSDL and CFS, and that's where the similarities stop; even the definition of 'fairness' is very different: RSDL uses a 'strict' definition of fairness, but CFS includes the sleep time in the task's fairness metric. This means that in CFS, sleeping tasks (the kind of tasks that usually run the code the user perceives as 'interactive', like X.org, mp3 players, etc.) do get more CPU time than running tasks (unlike RSDL, where they are treated with strict fairness), but it's all kept under control by the fairness engine. This design gets the best of both worlds - fairness and interactivity - without resorting to an interactivity estimator.

CFS has other differences compared to the old mainline scheduler and RSDL: instead of runqueues, it uses a time-ordered rbtree to build a 'timeline' of future task execution, to try to avoid the 'array switch' artifacts that both the vanilla and the RSDL PS can suffer from. It also uses nanosecond-granularity accounting and does not rely on jiffies or other HZ details; in fact, it does not have the notion of traditional 'timeslices': the slicing is decided dynamically, not statically, and there is no persistency to timeslices (i.e. timeslices are not 'given' to a task and 'used up' by a task in the traditional sense, because CFS is able to accurately track the full history of the task's execution via the nanosecond accounting). Plus, it has extensive instrumentation with CONFIG_SCHED_DEBUG=y. Because of all those changes, CFS is quite a radical rewrite of the Linux PS (~70% of its code is touched), and hence bigger than RSDL (in terms of patch size, not memory footprint: the RSDL patchset weighed 88 KB, whereas the CFS patchset weighs 290 KB). Read this LWN article for more details on the CFS design.
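
To make the 'timeline' idea a bit more concrete, here is a heavily simplified sketch - not the actual CFS code, and the struct and field names are invented for illustration - of what "pick the leftmost task of a time-ordered rbtree" means:

    /* Hypothetical sketch: tasks are keyed in a red-black tree by how much
     * CPU time they are "owed"; the scheduler always runs the leftmost one. */
    #include <linux/rbtree.h>
    #include <linux/types.h>

    struct sched_entity_sketch {
        u64 runtime_key;        /* nanoseconds of weighted runtime so far */
        struct rb_node node;    /* position in the time-ordered tree */
    };

    static struct sched_entity_sketch *pick_next(struct rb_root *timeline)
    {
        /* The leftmost node is the task that has received the least
         * (weighted) CPU time, i.e. the one treated least fairly so far. */
        struct rb_node *left = rb_first(timeline);

        if (!left)
            return NULL;
        return rb_entry(left, struct sched_entity_sketch, node);
    }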

So CFS was finally chosen over RSDL as the replacement for the current 'O(1)' PS. Surprisingly, this choice generated much noise, partly because of Con's announcement that he was quitting kernel development - although Con has publicly said that this was not the reason. The debate seems to have calmed down now, and there is no reason to think that CFS was chosen for anything but technical reasons. It must be noted that both RSDL and CFS are better schedulers than the one in mainline, and that it was Con who pioneered the idea of using the concept of 'fairness' instead of 'interactivity estimations', but that doesn't mean that CFS didn't deserve to get merged as the definitive replacement for the mainline PS; nor does it mean that RSDL isn't also a great replacement.

NOTE!: Applications that depend heavily on sched_yield()'s behaviour (e.g. many benchmarks) can see huge performance gains or losses due to the very subtle semantics of what sched_yield() should do and how CFS changes them. In those cases you should try the sysctl at /proc/sys/kernel/sched_compat_yield, which can be set to "1" (e.g. echo 1 > /proc/sys/kernel/sched_compat_yield) to switch sched_yield() to a behaviour closer to the old scheduler's. It should also be noted that CFS is available as a backport for 2.6.22, 2.6.21 and 2.6.20.

CFS code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9)

2.2. On-demand read-ahead

Click to read a recommended LWN article about on-demand read-ahead

On-demand read-ahead is an attempt to simplify the adaptive read-ahead patches. It reimplements the Linux readahead functionality, removing a lot of complexity from the current system and making it more flexible. The new system maintains the same performance for trivial sequential/random reads, improves the sysbench/OLTP MySQL benchmark by up to 8%, and improves performance under readahead thrashing by up to 3 times. There are more read-ahead patches based on this infrastructure pending, and further work could be done in this area, so expect more improvements in the future. A detailed design document and benchmarks can be found here.

Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)

2.3. fallocate()

Click to read a recommended LWN article about fallocate()

fallocate() is a new system call which allows applications to preallocate space for any file(s) in a filesystem. Applications get a guarantee of space for particular file(s) - even if the system later becomes full. Applications can also use this feature to avoid fragmentation to a certain level in many filesystems (e.g. it avoids the fragmentation that can happen in files that grow frequently) and thus get faster access speeds.

Currently, glibc provides the POSIX interface posix_fallocate(), which can be used for a similar purpose. Though it has the advantage of working today, it is quite slow (since it writes zeroes to every block that has to be preallocated). Filesystems can, without a doubt, do this much more efficiently within the kernel by implementing the proposed fallocate() system call, and this is what 2.6.23 does. It is expected that posix_fallocate() will be modified to call this new system call first and, in case the kernel/filesystem does not implement it, fall back to the current implementation of writing zeroes to the new blocks.

In 2.6.23, only ext4 and ocfs2 add support for the fallocate() interface.
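
A minimal userspace sketch of how an application can take advantage of this (the file name and size are made up; since a glibc wrapper for the new fallocate() syscall may not be available yet, the standard posix_fallocate() interface - which is expected to use the new syscall when possible - is shown):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("prealloc.dat", O_CREAT | O_WRONLY, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Ask for 16 MiB of guaranteed space starting at offset 0. */
        int err = posix_fallocate(fd, 0, 16 * 1024 * 1024);
        if (err != 0)
            fprintf(stderr, "posix_fallocate: %s\n", strerror(err));

        close(fd);
        return err ? 1 : 0;
    }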

Code: (commit)

2.4. Virtualization: lguest and Xen

Linux already has good virtualization support thanks to paravirtualization and KVM. 2.6.23 improves the support of this trend-of-the-decade by adding lguest and Xen guest support - both of them based on the paravirt_ops infrastructure.

2.4.1. lguest

Click to read a recommended article about lguest

lguest is a simple hypervisor for running Linux on Linux (in other words, it allows running Linux - only Linux - guests) based on the paravirt_ops infrastructure. Unlike KVM, it doesn't need VT/SVM hardware. Unlike Xen, it's simply "modprobe and go". Unlike both, it's 5,000 lines and self-contained.

The goal of its author, Rusty Russell, was not to create the greatest hypervisor ever, but rather to create a simple, small (5,000 lines of code) example hypervisor to show the world how powerful the paravirt_ops infrastructure is. Performance is OK, but not great (-30% on a kernel compile), precisely because it was written to be simple; but given its hackability, it may improve soon. The author encourages people to fork it and try to create a much better hypervisor: 'Too much of the kernel is a big ball of hair. lguest is simple enough to dive into and hack, plus has some warts which scream "fork me!"'. A 64-bit version is also being worked on.

Lguest host support (CONFIG_LGUEST) can be compiled as a module (lg.ko). This is the host support: once you load it, your kernel will be able to run virtualized lguest guests. But guest kernels need lguest guest support compiled in to be able to run under an lguest host. The configuration variable that enables the guest support is CONFIG_LGUEST_GUEST, but that option is enabled automatically once you set CONFIG_LGUEST to 'y' or 'm'. This means that a kernel compiled with lguest host support also gets lguest guest support; in other words, the same kernel you use as a host can be used as a guest kernel. In order to load and run new guests, you need a userspace loader program. The instructions and the program can be found in Documentation/lguest/lguest.txt.

Code: drivers/lguest, Documentation/lguest

2.4.2. Xen

Part of the Xen support has been merged. The code included in 2.6.23 allows the kernel to boot in a paravirtualized environment under the Xen hypervisor. Support for the hypervisor itself is not included - this is only guest support: no dom0, no suspend/resume, no ballooning. It's based on the paravirt_ops infrastructure.

Code: (part 1, drivers/xen, part 2, arch/i386/xen)

2.5. Variable argument length

From a Slashdot interview with Rob Pike: "I didn't use Unix at all, really, from about 1990 until 2002, when I joined Google. (I worked entirely on Plan 9, which I still believe does a pretty good job of solving those fundamental problems.) I was surprised when I came back to Unix how many of even the little things that were annoying in 1990 continue to annoy today. In 1975, when the argument vector had to live in a 512-byte-block, the 6th Edition system would often complain, 'arg list too long'. But today, when machines have gigabytes of memory, I still see that silly message far too often. The argument list is now limited somewhere north of 100K on the Linux machines I use at work, but come on people, dynamic memory allocation is a done deal!"

While Linux is not Plan 9, in 2.6.23 Linux adds variable argument length. In theory you shouldn't frequently hit "argument list too long" errors anymore, although the patch does limit the maximum argument length to 25% of the maximum stack limit (ulimit -s).
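
A tiny userspace illustration of where the new bound roughly comes from (this just reproduces the "25% of the stack limit" arithmetic described above; it is not an exact recreation of the kernel's check):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        /* The argument+environment space is now tied to the stack rlimit
         * rather than a small fixed buffer: roughly a quarter of it. */
        if (getrlimit(RLIMIT_STACK, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
            printf("approx. argument space: %llu bytes\n",
                   (unsigned long long)rl.rlim_cur / 4);
        return 0;
    }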

Code: (commit)

2.6. PPP over L2TP

Linux 2.6.23 adds support for the PPP-over-L2TP socket family. L2TP (RFC 2661) is a protocol used by ISPs and enterprises to tunnel PPP traffic over UDP tunnels, and it is replacing PPTP for VPN uses. The kernel component included in 2.6.23 handles only L2TP data packets; a userland daemon handles the L2TP control protocol (tunnel and session setup). One such daemon is OpenL2TP.
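
A minimal, hedged illustration of the userspace side of the new socket family (the helper name is made up, and the tunnel/session setup normally done by a control daemon such as OpenL2TP is omitted):

    #include <sys/socket.h>
    #include <linux/if_pppox.h>

    /* Create a PPPoL2TP session socket. It would then be connect()ed with a
     * PPPoL2TP-specific sockaddr carrying the fd of the UDP socket used for
     * the tunnel plus the tunnel and session ids negotiated by the control
     * daemon. */
    int open_pppol2tp_socket(void)
    {
        return socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
    }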

Code: (commit 1, 2, 3) Documentation: (commit)

2.7. Autoloading of ACPI kernel modules

With Linux 2.6.23, the ACPI drivers export their device table symbols so that udev can load the modules automatically through the usual mechanisms.

Code: (commit 1, 2, 3)

2.6.23 also adds DMI/SMBIOS based module autoloading to the Linux kernel. The idea is to load laptop drivers automatically (and other drivers which cannot be autoloaded otherwise), based on the DMI system identification information of the BIOS. Right now most distros manually try to load all available laptop drivers on bootup in the hope that at least one of them loads successfully. This patch does away with all that, and uses udev to automatically load matching drivers on the right machines.
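
A hedged sketch of what the kernel side of this looks like for a hypothetical laptop driver (the vendor/product strings and table name are made up):

    #include <linux/dmi.h>
    #include <linux/module.h>

    /* Machines this driver applies to, matched against the BIOS DMI strings.
     * Exporting the table lets udev load the module only on those machines. */
    static struct dmi_system_id example_dmi_table[] = {
        {
            .ident = "Example Laptop 1000",
            .matches = {
                DMI_MATCH(DMI_SYS_VENDOR, "EXAMPLE Inc."),
                DMI_MATCH(DMI_PRODUCT_NAME, "Laptop 1000"),
            },
        },
        { }     /* terminating entry */
    };
    MODULE_DEVICE_TABLE(dmi, example_dmi_table);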

Code: (commit)

2.8. async_tx API

The async_tx API provides methods for describing a chain of asynchronous bulk memory transfers/transforms, with support for inter-transactional dependencies. It is implemented as a dmaengine client that smooths over the details of different hardware offload engine implementations. The raid5 MD engine has been converted to use the async_tx API, getting performance improvements (in the tiobench benchmark with an iop342, it shows 20-30% higher throughput for sequential writes and 40-55% gains in sequential reads to a degraded array). API documentation.

Code: (commit)

2.9. 'Lumpy' reclaim

Click to read a recommended LWN article which touches the 'lumpy' reclaim feature

High-order allocations (IOW, requests for blocks of free memory that are bigger than one memory page and must be physically contiguous) can easily fail due to memory fragmentation when there's very little free memory left: when the memory management subsystem tries to free some memory to make room for the request, it frees pages in LRU (Least Recently Used) order, and pages freed in LRU order are not necessarily contiguous - they are freed according to how recently they were used - so the allocation may still fail.

'Lumpy' reclaim modifies the reclaim algorithm to improve this situation: when it needs to free some pages, it also tries to free the pages contiguous to the first page chosen from the LRU, ignoring recency, which improves the chances of finding a contiguous block of free memory.

Code: (commit)

2.10. Movable Memory Zone

It is often known at allocation time whether a page can be migrated or not. This feature adds a flag called __GFP_MOVABLE to the memory allocator and a new mask called GFP_HIGH_MOVABLE. Allocations using __GFP_MOVABLE can either be migrated using the page migration mechanism or reclaimed by syncing with backing storage and discarding. This feature also creates a memory zone called ZONE_MOVABLE that is only usable by allocations that specify both __GFP_HIGHMEM and __GFP_MOVABLE. This has the effect of keeping all non-movable pages within a single memory partition while allowing movable allocations to be satisfied from either partition. More details in the commit links.
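
A minimal, hypothetical example of what using the new flag looks like from kernel code (the helper is made up; the real users are paths like page cache and user-memory allocations):

    #include <linux/gfp.h>

    /* Allocate a highmem page that the caller knows can later be migrated
     * or reclaimed, so the MM is allowed to place it in ZONE_MOVABLE. */
    static struct page *grab_movable_page(void)
    {
        return alloc_page(GFP_HIGHUSER | __GFP_MOVABLE);
    }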

Code: (commit 1, 2, 3, 4, 5)

2.11. UIO

Click to read a recommended LWN article about UIO

UIO is a framework that allows drivers to be implemented in userspace. This kind of thing always causes much noise due to the "monolithic vs microkernel" debate. To the surprise of many, the Linux ecosystem has actually supported userspace drivers for a long time, for the cases where it makes sense. libusb allows accessing the USB bus from userspace and implementing drivers there; this is why you don't have specific kernel drivers for, e.g., your scanner or USB digital camera: programs like sane, gphoto, gnokii, gtkam, hplip, and even some music players like rhythmbox or amarok, use libusb to access the USB bus and talk to USB devices directly. The 2D X.org drivers that you configure in your xorg.conf file are another popular example of drivers that not only run in userspace, but are also portable to other Unix operating systems (they're also an example of why userspace drivers can't avoid hanging your machine when a bug in the driver triggers a hardware hang). CUPS and programs accessing the serial port, like pppd, are yet another example of userspace programs accessing devices directly - the kernel doesn't implement any specific LPT printer or serial modem driver; those userspace programs implement the driver that knows how to talk to the printer or modem.

In other words, userspace drivers are not new. UIO is not an attempt to migrate all the Linux kernel drivers to userspace. In fact, a tiny kernel-side driver (150 lines in the sample driver, including comments, etc.) that handles the basic interrupt routine is needed as part of every UIO driver. UIO is just a simple way to create very simple, non-performance-critical drivers, which has probably been merged with a "merge it and see if something interesting happens" attitude more than anything else. For now UIO only allows creating very, very simple drivers: no DMA, no network or block drivers...
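
To give an idea of how small the kernel-side half of a UIO driver is, here is a hedged, stripped-down sketch (driver names, the IRQ number and the platform-device context are made up; the real in-tree example is the Hilscher CIF driver linked below):

    #include <linux/interrupt.h>
    #include <linux/platform_device.h>
    #include <linux/uio_driver.h>

    /* Acknowledge/mask the interrupt in the hardware here; UIO then wakes
     * up the userspace driver, which does the real work via /dev/uioX. */
    static irqreturn_t example_handler(int irq, struct uio_info *info)
    {
        return IRQ_HANDLED;
    }

    static struct uio_info example_uio_info = {
        .name    = "example_uio",
        .version = "0.1",
        .irq     = 10,              /* made-up IRQ number */
        .handler = example_handler,
    };

    static int example_probe(struct platform_device *pdev)
    {
        /* Register the device with UIO; userspace can now mmap() its
         * memory regions and block on reads to wait for interrupts. */
        return uio_register_device(&pdev->dev, &example_uio_info);
    }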

UIO Code: (commit) UIO Documentation: (commit) Sample kernel-side UIO Hilscher CIF card driver (commit)

2.12. O_CLOEXEC file descriptor flag

Click to read a recommended LWN article about the O_CLOEXEC open() flag

In multi-threaded code (or, more correctly, in all code using clone() with CLONE_FILES) there's a race condition when exec'ing (see the commit link for details). In some applications this can happen frequently. Take a web browser: one thread opens a file and another thread starts, say, an external PDF viewer. The result can even be a security issue if that open file descriptor refers to a sensitive file and the external program can somehow be tricked into using that descriptor. 2.6.23 includes the O_CLOEXEC ("close-on-exec") fd flag for open() and recvmsg() to avoid this problem.
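
A minimal example of the new flag in action (the file name is just an example):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* The descriptor is atomically marked close-on-exec at open time,
         * instead of the racy open() + fcntl(F_SETFD, FD_CLOEXEC) sequence. */
        int fd = open("/etc/passwd", O_RDONLY | O_CLOEXEC);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* If another thread fork()s and exec()s an external program right
         * now, this descriptor will not leak into it. */
        close(fd);
        return 0;
    }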

Code: (commit 1, 2)

2.13. Use splice in the sendfile() implementation

Splice is an innovative I/O method that was added in Linux 2.6.17. It is based on an in-kernel buffer that the user has control over: splice() moves data between the buffer and an arbitrary file descriptor, tee() copies the data in one buffer to another (i.e. it "duplicates" it), and vmsplice() splices data from/to user memory. Because the in-kernel buffer isn't really copied between address spaces, splice allows moving data to/from a fd without an extra copy ("zero-copy").

For the particular case of sending data from a file descriptor to a socket, there has always been the sendfile() syscall. splice(), however, is a generic mechanism, not limited to what sendfile() does; in other words, sendfile() is just a small subset of what splice() can do, and splice() obsoletes it. In Linux 2.6.23, the old sendfile() implementation is killed, but the API and its functionality are not removed: sendfile() is now implemented internally on top of the splice() machinery.

Because sendfile() is critical for many programs, especially static web servers and FTP servers, performance regressions could happen (and performance improvements too!), and the kernel hackers would really like to hear about both in linux-kernel@vger.kernel.org and/or the other usual communication channels.

In other news, 2.6.23 adds vmsplice-to-user support. It should be noted again that splice() obsoletes sendfile() in Linux, and that its mechanisms allow building further performance improvements into your software.
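
A simplified userspace sketch of the splice() mechanism itself, moving data from a file to a socket through a pipe buffer (error handling and short transfers are glossed over; file_fd and sock_fd are assumed to be already-open descriptors, and this is only an illustration of the userspace API, not of the kernel's internal sendfile() implementation):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    ssize_t send_file_chunk(int file_fd, int sock_fd, size_t len)
    {
        int pipefd[2];
        if (pipe(pipefd) < 0)
            return -1;

        /* File -> in-kernel pipe buffer (no copy through userspace). */
        ssize_t n = splice(file_fd, NULL, pipefd[1], NULL, len, SPLICE_F_MOVE);
        if (n > 0)
            /* In-kernel pipe buffer -> socket. */
            n = splice(pipefd[0], NULL, sock_fd, NULL, n, SPLICE_F_MORE);

        close(pipefd[0]);
        close(pipefd[1]);
        return n;
    }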

Code: (commit 1, 2, 3, 4, 5, 6)

2.14. XFS and EXT4 improvements

2.15. Coredump filter mask

The purpose of this feature is to control which VMAs should be dumped, based on their memory types and per-process flags, in order to avoid long system slowdowns when a number of processes that share a huge shared memory segment are dumped at the same time, or simply to avoid dumping information you don't need. Users can access the per-process flags via the /proc/<pid>/coredump_filter interface. coredump_filter represents a bitmask of memory types; if a bit is set, VMAs of the corresponding memory type are written into the core file when the process is dumped. The bitmask is inherited from the parent process when a process is created.
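
A minimal sketch of how a process might tweak its own filter (the mask value is only illustrative; the meaning of each bit is documented with the /proc documentation in the kernel tree):

    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/self/coredump_filter", "w");
        if (!f) {
            perror("fopen");
            return 1;
        }

        /* Write the bitmask of memory types to include in core dumps,
         * in hex. The example value here is purely illustrative. */
        fprintf(f, "0x1\n");
        fclose(f);
        return 0;
    }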

Code: (commit 1, 2, 3, 4)

2.16. Rewrite the x86 asm setup in C

In 2.6.23 the x86 setup code, which was previously all written in assembly, is replaced with a version written in C, using the ".code16gcc" feature of binutils. The new code is vastly easier to read and debug. It should be noted that a fair number of minor bugs were found while going through this code, but new ones could also have been introduced, given how fragile this part of the kernel is. During testing, however, it has proven to be very stable.

Code:

(commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)

2.17. New drivers

3. Subsystems

3.1. Filesystems

3.2. Networking

3.3. SELinux

3.4. Audit

3.5. KVM

3.6. Architecture-specific changes

4. Drivers

4.1. Graphics drivers

4.2. SATA/libata/IDE drivers

4.3. Network drivers

4.4. Sound drivers

4.5. SCSI drivers

4.6. V4L/DVB drivers

4.7. USB

4.8. IB/ipath drivers

4.9. Input drivers

4.10. Hwmon drivers

4.11. HID

4.12. Cpufreq

4.13. I2C

4.14. FireWire drivers

4.15. OMAP

4.16. ACPI

4.17. Watchdog

4.18. Various

5. Crashing soon a kernel near you

This is a list of some of the ongoing patches being developed by the kernel community that will be part of future Linux releases. These features may take many months to get into Linus' git tree, or may be dropped. The features are tested in the -mm tree, but be warned: it can crash your machine, eat your data (unlikely but not impossible) or kidnap your family (just because it has never happened doesn't mean you're safe):

Reading the Linux Weather Forecast page is recommended.

6. In the news
