WARNING: This document may not be completely finished at the time of the release. Sorry. You can look at the LWN list of 2.6.25 features (parts 1, 2, and 3).

Linux kernel version 2.6.25 Released (full SCM git log)


1. Short overview (for news sites, etc)

2. Important things (AKA: the cool stuff)

2.1. RCU Preempt support

Recommended LWN article: "The design of preemptible read-copy-update"

RCU is a very powerful locking scheme used in Linux to scale to very large numbers of CPUs on a single system. However, it wasn't well suited to the real-time patchsets that have been developed to make Linux an RT OS, because some parts weren't preemptible, causing latencies too large for RT workloads. In 2.6.25, RCU can be preempted, eliminating that source of latency and making Linux a bit more RT-ish.

Code: commit e260be673a15b6125068270e0216a3bfbfc12f87

2.2. FIFO ticket spinlocks

Recommended article: "Ticket spinlocks"

In certain workloads, spinlocks can be unfair, i.e. a process spinning on a spinlock can be starved up to 1,000,000 times. Spinlock starvation is usually assumed not to be a problem, because contention would become a performance problem before any starvation is noticed, but testing has shown the contrary. It is always possible to find an obscure corner case that generates a lot of contention on some lock, and the processor that grabs the lock next is chosen essentially at random. With the new spinlocks, processes grab the spinlock in FIFO order. Spinlocks configured to run on more than 255 CPUs will also use a 32-bit value (instead of the 16 bits used when NR_CPUS < 255), which allows a theoretical limit of up to 65536 processors.
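
To illustrate the FIFO behaviour described above, here is a minimal Python model of a ticket lock. It is only a sketch of the general ticket idea, not the kernel's implementation: the names TicketLock, take_ticket, may_enter, unlock and simulate are hypothetical, and the "CPUs" are simulated sequentially instead of really spinning. Each arriving CPU takes the next ticket, and the lock is only granted to the CPU whose ticket matches the current owner value.

```python
class TicketLock:
    """Toy model of a ticket lock (illustrative only, not kernel code)."""

    def __init__(self, ticket_bits=8):
        # Each counter wraps around after 2**ticket_bits tickets.
        self.mask = (1 << ticket_bits) - 1
        self.next_ticket = 0  # ticket handed to the next arriving CPU
        self.owner = 0        # ticket currently allowed to hold the lock

    def take_ticket(self):
        """Called once on arrival: reserve a place in the queue."""
        ticket = self.next_ticket
        self.next_ticket = (self.next_ticket + 1) & self.mask
        return ticket

    def may_enter(self, ticket):
        """A waiting CPU polls this until its own ticket comes up."""
        return ticket == self.owner

    def unlock(self):
        """Release the lock by passing it to the next ticket in line."""
        self.owner = (self.owner + 1) & self.mask


def simulate(arrival_order, ticket_bits=8):
    """Return the order in which distinct CPUs are granted the lock."""
    lock = TicketLock(ticket_bits)
    tickets = {cpu: lock.take_ticket() for cpu in arrival_order}
    waiting = set(arrival_order)
    granted = []
    while waiting:
        # Poll in sorted (not arrival) order to show that the polling
        # order does not decide who gets the lock next.
        for cpu in sorted(waiting):
            if lock.may_enter(tickets[cpu]):
                granted.append(cpu)
                waiting.remove(cpu)
                lock.unlock()
                break
    return granted
```

Calling simulate(["cpu3", "cpu0", "cpu2", "cpu1"]) returns the CPUs in the same order in which they arrived, whatever order they are polled in. The ticket_bits parameter models the width of each counter: a wider counter allows more CPUs to queue before tickets wrap around.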

Code: commit 314cdbefd1fd0a7acf3780e9628465b77ea6a836

2.3. Better process memory usage measurement

Recommended LWN article: "How much memory are applications really using?"

Measuring how much memory processes are using is more difficult than it looks, especially when processes share the memory they use. Features like /proc/$PID/smaps (added in 2.6.14) help, but they have not been enough. 2.6.25 adds new statistics to make this task easier. A new /proc/$PID/pagemaps file is added for each process. In this file the kernel exports (in binary format) the physical page location of each page used by the process. Comparing this file with the files of other processes makes it possible to find out which pages they are sharing. Another file, /proc/kpagemaps, exposes another kind of statistics about the pages of the system. The author of the patch, Matt Mackall, proposes two new metrics: "proportional set size" (PSS), which divides each shared page by the number of processes sharing it, and "unique set size" (USS), which counts the pages that are not shared. The first statistic, PSS, has also been added to /proc/$PID/smaps. In this HG repository you can find some sample command-line and graphical tools that exploit all those statistics.
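
The two metrics are easy to express in code. The following Python sketch computes PSS and USS (both in pages) from a table that maps each process to the set of physical page frames it uses. The function name pss_uss and the input layout are assumptions made for illustration; the sketch does not parse the binary /proc files itself, it only shows the arithmetic once the sharing information is known.

```python
from collections import Counter
from fractions import Fraction


def pss_uss(frames_by_process):
    """Compute (PSS, USS) in pages for every process.

    frames_by_process maps a process name to the set of physical page
    frame numbers it has mapped (a hypothetical, already-parsed input).
    """
    # Count how many processes map each physical page frame.
    sharers = Counter()
    for frames in frames_by_process.values():
        sharers.update(frames)

    result = {}
    for name, frames in frames_by_process.items():
        # PSS: every page counts as 1 / (number of processes sharing it).
        pss = sum(Fraction(1, sharers[f]) for f in frames)
        # USS: only pages mapped by this process alone.
        uss = sum(1 for f in frames if sharers[f] == 1)
        result[name] = (pss, uss)
    return result
```

For example, with frames_by_process = {"A": {1, 2, 3}, "B": {3, 4}, "C": {3}}, frame 3 is shared by three processes, so A gets a PSS of 2 + 1/3 pages and a USS of 2 pages. A useful property of PSS is that the values of all processes add up to the number of distinct pages in use (4 in this example), which is not true of RSS.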

Code: commits 1e88328111aae3ea408f346763ba9f9bad71f876, 304daa8132a95e998b6716d4b7bd8bd76aa152b2, 161f47bf41c5ece90ac53cbb6a4cb9bf74ce0ef6, 85863e475e59afb027b0113290e3796ee6020b7d

2.4. Memory Resource Controller

Recommended LWN article: "Controlling memory use in containers"

The memory resource controller is a cgroups-based feature. Cgroups, aka "Control Groups", is a feature that was merged in 2.6.24, and its purpose is to be a generic framework where several "resource controllers" can plug in and manage different resources of the system, such as process scheduling or memory allocation. It also offers a unified user interface, based on a virtual filesystem, where administrators can assign arbitrary resource constraints to a group of chosen tasks. For example, two resource controllers were merged in 2.6.24: Cpusets and Group Scheduling. The first allows binding CPU and memory nodes to an arbitrarily chosen group of tasks, aka cgroup, and the second allows binding a CPU bandwidth policy to the cgroup.

The memory resource controller isolates the memory behavior of a group of tasks -cgroup- from the rest of the system. It can be used to:

The configuration interface, like that of all cgroups, consists of mounting the cgroup filesystem with the "-o memory" option, creating an arbitrarily named directory (the cgroup), adding tasks to the cgroup by writing their PIDs to the 'tasks' file inside the cgroup directory, and reading or writing the following files: 'memory.limit_in_bytes' (the memory limit for the cgroup), 'memory.usage_in_bytes' (memory statistic for the cgroup), 'memory.stats' (more statistics: RSS, caches, inactive/active pages), 'memory.failcnt' (number of times that the cgroup exceeded the limit), and 'mem_control_type'. OOM conditions are also handled per cgroup: when the tasks in the cgroup exceed the limit, the OOM killer is invoked to kill one of the tasks in that specific cgroup.
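
The steps above can be sketched in a few lines of Python. This is only an illustration under several assumptions: the cgroup filesystem is already mounted with the "-o memory" option, the script runs with enough privileges, the task list file is called 'tasks', and the function name configure_memory_cgroup and any path passed to it are hypothetical.

```python
import os


def configure_memory_cgroup(cgroup_dir, pid, limit_bytes):
    """Create a cgroup directory, set its memory limit and add one task.

    cgroup_dir is assumed to live inside an already mounted cgroup
    filesystem, where the kernel provides the control files itself.
    """
    # Creating the directory is what creates the cgroup.
    os.makedirs(cgroup_dir, exist_ok=True)

    # Set the memory limit for the whole group, in bytes.
    with open(os.path.join(cgroup_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))

    # Move the task into the cgroup by writing its PID.
    with open(os.path.join(cgroup_dir, "tasks"), "w") as f:
        f.write(str(pid))


def read_control_file(cgroup_dir, name):
    """Read a control or statistics file such as 'memory.failcnt'."""
    with open(os.path.join(cgroup_dir, name)) as f:
        return f.read().strip()
```

A call such as configure_memory_cgroup("/cgroups/group0", os.getpid(), 64 * 1024 * 1024), where "/cgroups" stands for a hypothetical mount point, would limit the calling process to 64 MB. The current usage could then be checked with read_control_file("/cgroups/group0", "memory.usage_in_bytes").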

Code: commits 1b6df3aa457690100f9827548943101447766572, 8cdea7c05454260c0d4d83503949c358eb131d17, e552b6617067ab785256dcec5ca29eeea981aacb, 78fb74669e80883323391090e4d26d17fe29488f, 8a9f3ccd24741b50200c3f33d62534c7271f3dfc, 66e1707bc34609f626e2e7b4fe7e454c9748bad5, 67e465a77ba658635309ee00b367bec6555ea544, 0eea10301708c64a6b793894c156e21ddd15eb64, c7ba5c9e8176704bfac0729875fa62798037584d, 8697d33194faae6fdd6b2e799f6308aa00cfdf67, bed7161a519a2faef53e1bce1b47595e297c1d14, e1a1cd590e3fcb0d2e230128daf2337ea55387dc

2.5. EXT4 update

Recommended article: "A better ext4"

The EXT4 mainline snapshot gets an update with a bunch of features: multi-block allocation, large blocksize up to PAGESIZE, journal checksumming, large file support, large filesystem support, inode versioning, and in-inode extended attributes on the root inode. These features should be the last ones that require on-disk format changes. Other features that don't affect the disk format, like delayed allocation, have yet to be merged.

Code: commits c9de560ded61faa5b754137b7753da252391c55a, 0040d9875dcccfcb2131417b10fbd9841bc5f05b, 0fc1b451471dfc3cabd6e99ef441df9804616e63, c14c6fd5c56a0d0495d8a7c0f2bc330be658663e, 25ec56b518257a56d2ff41a941d288e4b5ff9488, 725d26d3f09ccb5bac4b4293096b985a312a0d67, 7a224228ed79d587ece2304869000aad1b8e97dd, 8180a5627d126362c2f64e4fa886d6f608d9632a, 818d276ceb83aa9fdebb5e0a53188290312de987, 8e85fb3f305b24b79c6d9cb7a56d22b062335ad3, afc7cbca5bfd556c3e12d3acefbee5ab0cbd4670

2.6. MN10300/AM33 architecture support

The MN10300/AM33 architecture is now supported under the "mn10300" subdirectory. 2.6.25 adds support for MN10300/AM33 CPUs produced by MEI. It also adds board support for the ASB2303 with the ASB2308 daughter board, and the ASB2305. The only processor supported is the MN103E010, which is an AM33v2 core plus on-chip devices.

Code: commit b920de1b77b72ca9432ac3f97edb26541e65e5dd

3. Subsystems

3.1. Various

3.2. Filesystems

3.3. Networking

3.4. Crypto

3.5. Security

3.6. Architecture-specific changes

4. Drivers

4.1. Graphics


4.3. Sound

4.4. SCSI

4.5. Network

4.6. V4L/DVB

4.7. I2C

4.8. HID

4.9. Input

4.10. USB

4.11. RDMA

4.12. Hwmon

4.13. MTD

4.14. ACPI

* intel_menlo: introduce new platform specific driver (commit cc0573b3250214034062ddf8c64359596d8af521)

4.15. RTC/W1

4.16. Leds

4.17. Various

* mcp23s08 SPI GPIO expander support (commit e58b9e2762a6ef99e20dba47aba21b911658541d)

KernelNewbies: Linux_2_6_25 (last edited 2008-03-31 18:31:40 by diegocalleja)