KernelNewbies:

Linux kernel version 2.6.24 Released ([http://kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.24 full SCM git log])


1. Short overview (for news sites, etc)

2.6.24 includes

2. Important things (AKA: ''the cool stuff'')

2.1. CFS improvements

Performance/size improvements

The CFS task scheduler [http://kernelnewbies.org/Linux_2_6_23#head-f3a847a5aace97932f838027c93121321a6499e7 merged in Linux 2.6.23] is getting [http://lkml.org/lkml/2007/9/11/395 some micro-optimization work] in 2.6.24. In 2.6.23, CFS context switching is more than 10% slower than with the old task scheduler. With the optimizations done in 2.6.24, CFS is now even a bit faster than the old task scheduler (which was quite fast already). The compiled size of the scheduler has also improved: it is now a bit smaller on UP and a lot smaller on SMP.

Fair Group Scheduling

You can read [http://lwn.net/Articles/240474/ this recommended article] about the Fair Group Scheduling feature.

Another new scheduler feature is Fair Group Scheduling. Normally the scheduler operates on individual tasks and strives to provide fair CPU time to each task. Sometimes, it may be desirable to group tasks and provide fair CPU time to each such task group. For example, it may be desirable to first provide fair CPU time to each user on the system and then to each task belonging to a user. In other words, given two users, one running one CPU-bound process and the other running two CPU-bound processes, you may want to give 50% of the CPU time to the first user and his process, and 50% to the other user, shared between his two processes (25% of the CPU time each).

That's the kind of thing the Group Scheduling feature does. At present, there are two (mutually exclusive) mechanisms to group tasks for CPU bandwidth control purposes: 1) Group scheduling based on user ID, which is the case described in the example above. This mechanism is configurable, so you are not limited to a 50%/50% split: you can, for example, give user root twice the CPU share of other users. 2) Group scheduling based on control groups (cgroups). This mechanism lets the administrator create arbitrary groups of tasks (e.g. "multimedia", "compiling"), set how much CPU time ('priority') each group should get by writing a value to its cpu_share file, and then attach tasks, by PID, to whichever group they should belong to. Documentation on how to use both mechanisms can be found in Documentation/sched-design-CFS.txt.
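To make the 50%/25%/25% arithmetic described above concrete, here is a minimal C sketch of two-level fair sharing. It is a simplified, hypothetical model (the struct group type and task_share() function are illustrative, not kernel code). CPU time is first split among groups in proportion to their weights, then split equally among the CPU-bound tasks inside each group. The real CFS implementation works with scheduling entities and virtual runtime rather than a closed formula.

```c
#include <stddef.h>

/* Hypothetical model of one task group (e.g. one user). */
struct group {
    double weight; /* relative CPU share of the group */
    int ntasks;    /* number of CPU-bound tasks in the group */
};

/*
 * Fraction of total CPU time that each task in group g receives when
 * CPU time is divided among groups by weight, then equally among the
 * tasks of each group. Groups with no runnable tasks do not compete.
 */
double task_share(const struct group *groups, size_t ngroups, size_t g)
{
    double total = 0.0;

    if (g >= ngroups || groups[g].ntasks <= 0)
        return 0.0;
    for (size_t i = 0; i < ngroups; i++)
        if (groups[i].ntasks > 0)
            total += groups[i].weight;
    if (total <= 0.0)
        return 0.0;
    return groups[g].weight / total / groups[g].ntasks;
}
```

Suppose two equally weighted users run one and two CPU-bound processes respectively. Then task_share() gives 0.5 for the first user's process and 0.25 for each of the second user's processes. Doubling the first group's weight, as in the root example, raises its single process's share to two thirds.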

guest time reporting

Additionally, the task scheduler in 2.6.24 adds a new "guest" time field to /proc/<PID>/stat, alongside the existing "user" and "system" times. It tracks how much CPU time a task has spent running a virtual CPU for a guest operating system.
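A minimal C sketch of reading the new field from a /proc/<PID>/stat line follows. It assumes the field layout documented in the proc(5) manual page, where guest_time is field 43 and is measured in clock ticks. The stat_guest_time() helper is hypothetical. The one subtlety is that field 2 (the command name) is wrapped in parentheses and may itself contain spaces, so parsing restarts after the last ')'.

```c
#include <stdlib.h>
#include <string.h>

#define GUEST_TIME_FIELD 43 /* per proc(5); assumption for illustration */

/*
 * Return the guest time (in clock ticks) from one line of
 * /proc/<PID>/stat, or -1 if the line is too short or malformed.
 */
long long stat_guest_time(const char *line)
{
    /* Field 2 is "(comm)" and may contain spaces: skip to the last ')'. */
    const char *p = strrchr(line, ')');
    int field = 2;

    if (p == NULL)
        return -1;
    p++;
    while (*p != '\0') {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;
        field++;
        if (field == GUEST_TIME_FIELD) {
            char *end;
            long long v = strtoll(p, &end, 10);
            return end == p ? -1 : v;
        }
        while (*p != '\0' && *p != ' ') /* skip the current field */
            p++;
    }
    return -1;
}
```

In a real program the line would come from reading /proc/<PID>/stat with fgets(). Dividing the result by sysconf(_SC_CLK_TCK) converts clock ticks to seconds.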

2.2. Tickless support for x86-64, PPC, UML, ARM, MIPS

The Tickless feature was [http://kernelnewbies.org/Linux_2_6_21#head-8547911895fda9cdff32a94771c8f5706d66bba0 added in Linux 2.6.21]. This feature allows the kernel to disable timer interrupts for longer, variable periods, saving some power and improving performance, especially in virtualized guests. 2.6.24 adds tickless support for the widespread 64-bit x86 architecture, as well as for PPC, the virtualized UML architecture, and some variants of ARM and MIPS. They join the already supported x86-32, SPARC-64 and SH architectures.

2.3. New wireless drivers

Linux 2.6.22 [http://kernelnewbies.org/Linux_2_6_22#head-1498b990e997cc0e95dbfa9047e7ebe8d84847cc merged] the new mac80211 wifi stack, but only one driver using the new stack was merged at that time. Linux 2.6.24 adds many new wireless drivers that use the new stack, totalling 2.3 MB of source files.

2.4. New wireless configuration interface

In [http://kernelnewbies.org/Linux_2_6_22 Linux 2.6.22], Linux got a new and shiny wireless stack. This stack is backwards compatible with the ioctl-based configuration interface of the old stack, but it was designed to have a much better configuration interface based on netlink. While the backwards compatibility isn't going away, the authors of all wireless configuration tools are encouraged to plan a long-term switch to this interface.

2.5. SPI/SDIO support in the MMC layer

The MMC layer, which is the code that implements support for MMC/SD memory cards, is undergoing one of the biggest transformations in its history: it has been [http://lkml.org/lkml/2007/9/24/37 heavily modified] to get support for [http://en.wikipedia.org/wiki/Secure_Digital_card#SDIO SDIO] and [http://en.wikipedia.org/wiki/Serial_Peripheral_Interface_Bus SPI].

SDIO stands for "Secure Digital I/O". On devices that support it (e.g. PDAs, cell phones or laptops), it allows the SD card slot to be used for "small devices designed for the SD form factor, like GPS receivers, Wi-Fi or Bluetooth adapters, modems, Ethernet adapters, barcode readers, IrDA adapters, FM radio tuners, TV tuners, RFID readers, digital cameras, or other mass storage media such as hard drives" (quote from the [http://en.wikipedia.org/wiki/Secure_Digital_card#SDIO Wikipedia entry]). There are currently three working drivers for the new SDIO stack: sdio_uart, a driver for the standardised GPS interfaces; libertas_sdio, a driver for Marvell's 8686 Libertas wifi chip; and hci_sdio, a driver for the standardised Bluetooth interface.

SPI is required by SDIO, and it's a "bus" (like IDE, SATA, USB...) which is used to access a wide range of devices. More importantly, some systems need to access MMC/SD cards through an SPI controller instead of a "native" MMC/SD controller. This has the disadvantage of relatively high overhead, but the compensating advantage of working on many systems without dedicated MMC/SD controllers. 2.6.24 includes SPI support in the MMC layer and an experimental "MMC/SD over SPI" driver.

2.6. USB authorization

As part of the effort to make the USB layer ready for [http://en.wikipedia.org/wiki/Wireless_USB wireless USB], Linux 2.6.24 gets support for USB device authorization, which lets you control whether a USB device (wireless or not) may be used in a system. Until now, when a USB device is connected it is configured and its interfaces are immediately made available to users. With this change, a device can only be used once root has authorized it to be configured.

Besides providing an infrastructure for secure use of wireless USB devices, this feature also makes it possible to implement kiosk-style lockdown of USB devices, fully controlled from user space. Every USB device has a corresponding /sys/bus/usb/devices/<DEVICE>/authorized file: writing 1 to that file authorizes the device to connect, and writing 0 deauthorizes it. Each USB host can also set the default for newly connected devices through /sys/bus/usb/devices/usb<X>/authorized_default: writing 0 makes them start deauthorized, and writing 1 makes them start authorized. By default, wired USB devices are authorized to connect, while wireless USB hosts deauthorize all newly connected devices, because these must go through an authentication phase before being authorized.
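The sysfs interface described above is driven simply by writing 0 or 1 to files. The following C sketch shows one way a user-space lockdown tool might do it. The helper names usb_sysfs_path() and usb_write_flag() are hypothetical. Device names such as 1-1 and usb1 are only examples, and writing to the real files requires root.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/bus/usb/devices/<dev>/<attr>" into buf.
 * Returns 0 on success, -1 on a suspicious name or if the path did not fit.
 */
int usb_sysfs_path(char *buf, size_t len, const char *dev, const char *attr)
{
    int n;

    /* Reject names that could escape the devices directory. */
    if (strchr(dev, '/') != NULL || strchr(attr, '/') != NULL)
        return -1;
    n = snprintf(buf, len, "/sys/bus/usb/devices/%s/%s", dev, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1\n" (authorize) or "0\n" (deauthorize) to a sysfs attribute. */
int usb_write_flag(const char *path, int authorize)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fputs(authorize ? "1\n" : "0\n", f) != EOF;
    /* With stdio buffering, the actual write() (and any error) happens here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, a kiosk could write 0 to the authorized_default file of usb1. It could then authorize only known devices, such as 1-1, by writing 1 to their authorized file.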

2.7. Per-device dirty thresholds

You can read [http://lwn.net/Articles/245600/ this recommended article] about the "per-device dirty thresholds" feature.

When a process writes data to the disk, the data is stored temporarily in 'dirty' memory until the kernel decides to write it to the disk ('cleaning' the memory used to store the data). A process can 'dirty' memory faster than the data is written to the disk, so the kernel throttles processes when there's too much dirty memory around. The problem with this mechanism is that the dirty memory thresholds are global: the mechanism doesn't take into account that there may be several storage devices in the system, much less that some of them may be faster than others. There are many scenarios where this design harms performance. For example, if there's a very slow storage device in the system (e.g. a USB 1.0 disk, or an NFS mount over dialup), dirty data for that device piles up and the global thresholds are hit very quickly, stalling other processes that may be writing to a much faster local disk. Stacked block devices (e.g. LVM/DM) are much worse and even deadlock-prone (check the LWN article).

In 2.6.24, the dirty thresholds are per-device, not global. The limits are variable, depending on the writeout speed of each device. This greatly improves performance in many situations.
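The idea of per-device limits can be illustrated with a deliberately simplified C sketch. It splits a global dirty-page limit among devices in proportion to how much each one recently wrote out. This is an assumption-laden model, not the kernel's algorithm: the real implementation uses a more elaborate, time-decaying measure of each device's writeout and has further refinements. The name split_dirty_limit() is hypothetical.

```c
#include <stddef.h>

/*
 * Divide global_limit dirty pages among n devices in proportion to
 * each device's recent writeout count. A fast device that completes
 * more writeback gets a larger share; a slow one gets a smaller share
 * and so throttles only the processes writing to it.
 */
void split_dirty_limit(unsigned long global_limit,
                       const unsigned long *writeouts,
                       unsigned long *limits, size_t n)
{
    unsigned long long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += writeouts[i];
    for (i = 0; i < n; i++) {
        if (total == 0)
            limits[i] = global_limit / n; /* no history yet: split evenly */
        else
            limits[i] = (unsigned long)((unsigned long long)global_limit *
                                        writeouts[i] / total);
    }
}
```

Note that this toy model gives a device with no recent writeout a zero limit. A real implementation has to let new or idle devices ramp up.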

2.8. PID and network namespaces

You can read [http://lwn.net/Articles/256389/ this recommended article] about this feature.

Usually, there's a single global PID namespace for a whole Linux system: the list of processes contains all the processes running in the system. There's also a single global view of the networking stack (routing tables, firewall rules, etc). However, [http://en.wikipedia.org/wiki/Operating_system-level_virtualization operating-system-level virtualization] solutions like [http://openvz.org OpenVZ] or [http://en.wikipedia.org/wiki/Linux-VServer Vserver] need to give each virtual environment its own view of the PID namespace and of the networking stack. Linux 2.6.24 adds PID namespaces and basic support for network namespaces. They are used through the CLONE_NEWPID and CLONE_NEWNET clone() flags.

2.9. Large Receive Offload (LRO) support for TCP traffic

You can read [http://lwn.net/Articles/243949/ this recommended article] about the "Large Receive Offload" feature.

LRO combines received TCP packets into a single larger TCP packet and then passes it to the network stack, in order to increase performance (throughput). After many out-of-tree iterations, mainline Linux is getting support for this feature [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=71c87e0cedca843162206c698cfa02e5fea9e2e3 (commit)], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1e6e9342d41ff80ced0ad5dfcf084926700cdfc5 (commit)], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d4dc4ec9d84e0578b9bfbe56a11fafdb7cbac771 (commit)]
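The core of the idea can be shown with a simplified C sketch. It merges runs of consecutive, in-order segments of the same flow into larger segments, up to a size cap. The struct seg type and the lro_aggregate() function are hypothetical. They ignore everything a real implementation must check, such as TCP flags, options, checksums, timers to flush partial aggregates, and several flows in flight at once. This is not the API added by the commits listed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a received TCP segment. */
struct seg {
    uint32_t flow; /* stand-in for the (addresses, ports) tuple */
    uint32_t seq;  /* sequence number of the first payload byte */
    uint32_t len;  /* payload length in bytes */
};

/*
 * Merge runs of in-order segments of the same flow into larger
 * segments of at most max_len payload bytes. Results go to out
 * (which must hold n entries); returns the number produced.
 * Only the most recent aggregate is kept open, so interleaved
 * flows are not merged.
 */
size_t lro_aggregate(const struct seg *in, size_t n, struct seg *out,
                     uint32_t max_len)
{
    size_t m = 0;

    for (size_t i = 0; i < n; i++) {
        if (m > 0) {
            struct seg *cur = &out[m - 1];

            if (in[i].flow == cur->flow &&
                in[i].seq == cur->seq + cur->len && /* next in order */
                cur->len + in[i].len <= max_len) {
                cur->len += in[i].len; /* append payload to the aggregate */
                continue;
            }
        }
        out[m++] = in[i]; /* start a new aggregate */
    }
    return m;
}
```

Take three 100-byte segments of one flow at sequence numbers 1000, 1100 and 1200. From these, the sketch emits a single 300-byte segment starting at 1000, so the stack processes one packet instead of three.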

2.10. Task Control Groups

There have been various proposals in the Linux arena for resource management/accounting and other task grouping subsystems in the kernel (Resgroups, User Beancounters, NSProxy cgroups, and others). Task Control Groups is the framework merged in 2.6.24 to provide the functionality that led to the creation of such proposals. TCG can track and group processes into arbitrary "cgroups" and assign arbitrary state to those groups, in order to control their behaviour. The intention is that other subsystems hook into the generic cgroup support to provide new attributes for cgroups, such as accounting for or limiting the resources which processes in a cgroup can access.

For example, cpusets (see Documentation/cpusets.txt) allow you to associate a set of CPUs and a set of memory nodes with the tasks in each cgroup. The CFS group scheduling feature uses cgroups to control the CPU time that each cgroup can get. Various other resource management and virtualization efforts can become cgroup clients. The configuration interface is described in Documentation/cgroups.txt.
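Since the configuration interface is a filesystem, attaching a task to a cgroup amounts to writing its PID into a file. The C sketch below makes three assumptions: a cgroup hierarchy has already been mounted, a group directory has been created as Documentation/cgroups.txt describes, and each group directory contains a tasks file listing its member PIDs. The helper names cgroup_write() and cgroup_attach() are hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>

/* Write value to <group_dir>/<file>. Returns 0 on success, -1 on error. */
int cgroup_write(const char *group_dir, const char *file, const char *value)
{
    char path[4096];
    FILE *f;
    int ok;
    int n = snprintf(path, sizeof path, "%s/%s", group_dir, file);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fputs(value, f) != EOF;
    /* Errors from the kernel surface when stdio flushes the buffer. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Move a task into a cgroup by writing its PID to the group's tasks file. */
int cgroup_attach(const char *group_dir, pid_t pid)
{
    char buf[32];

    snprintf(buf, sizeof buf, "%ld\n", (long)pid);
    return cgroup_write(group_dir, "tasks", buf);
}
```

For instance, cgroup_attach("/dev/cpuctl/multimedia", pid) would move pid into a hypothetical multimedia group of a CPU-controller hierarchy mounted at /dev/cpuctl.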

2.11. Linux Kernel Markers

You can read [http://lwn.net/Articles/245671/ this recommended article] about the "Linux Kernel Markers" feature.

The Linux Kernel Markers implement static probing points for the Linux kernel. Dynamic probing systems like kprobes/DTrace can put probes pretty much anywhere. However, the scripts that use dynamic probing points can quickly become outdated: a small change in the kernel may require a rewrite of the script, which has to be maintained and updated separately and will not work for all kernel versions. That's why static probing points are useful: they are put directly into the kernel source code, so they always stay in sync with kernel development. Static probing points can also have some performance advantages, and they have very little performance cost when they're not being used.

The kernel markers are a sort of "derivative" of the long-lived external "Linux Trace Toolkit" (LTT) patchset, which has been around since [http://www.opersys.com/LTT/news.html#18-11-1999 1999]. The Kernel Markers are a feature needed by the [http://lwn.net/Articles/245671/ SystemTap] project. This release does not include any probing points, but many will certainly be added in the future, and some tracing tools like blktrace will probably be ported to this infrastructure.
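To illustrate why an unused static probe point is cheap, here is a user-space C analogue. The probe site is compiled into the code, but while no probe is attached it costs only a pointer test and a branch. The struct marker type, the MARK() macro and the other helpers are hypothetical and much simpler than the kernel's own marker interface.

```c
#include <stdarg.h>
#include <stddef.h>

/* A probe receives its private data plus the marker's format and arguments. */
typedef void (*probe_fn)(void *priv, const char *fmt, va_list args);

/* Hypothetical static probe point: named, with a printf-style format. */
struct marker {
    const char *name;
    const char *format;
    probe_fn probe; /* NULL while no probe is attached */
    void *priv;
};

static void marker_fire(struct marker *m, ...)
{
    va_list args;

    va_start(args, m);
    m->probe(m->priv, m->format, args);
    va_end(args);
}

/* The probe site: when disarmed, only the NULL test is executed. */
#define MARK(m, ...)                           \
    do {                                       \
        if ((m)->probe != NULL)                \
            marker_fire((m), __VA_ARGS__);     \
    } while (0)

void marker_arm(struct marker *m, probe_fn fn, void *priv)
{
    m->priv = priv;
    m->probe = fn;
}

void marker_disarm(struct marker *m)
{
    m->probe = NULL;
    m->priv = NULL;
}

/* Example probe: counts how many times the marker fired. */
void count_probe(void *priv, const char *fmt, va_list args)
{
    (void)fmt;
    (void)args;
    ++*(int *)priv;
}
```

Arming a marker with count_probe() and firing it twice leaves the counter at 2. After marker_disarm(), the MARK() site is skipped again.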

2.12. x86-32/64 arch reunification

You can read [http://lwn.net/Articles/243704/ this recommended article].

When support for the AMD x86-64 architecture was developed, it was decided for convenience to develop it as a "fork" of the traditional x86 architecture. As a result, many changes needed one patch for a file in the i386 architecture directory and another, similar patch for the duplicated file in the x86_64 directory. It has now been decided to unify both architectures in the same directory again.

This reunification has not been done in a radical way. In this release, both architectures have been unified in arch/x86, but only in appearance. All the source files in the i386 and x86_64 directories have been moved to arch/x86 and renamed with "_32" and "_64" suffixes. For example, arch/i386/kernel/reboot.c has been moved to arch/x86/kernel/reboot_32.c, and arch/x86_64/kernel/reboot.c has been moved to arch/x86/kernel/reboot_64.c. The Makefiles have been modified accordingly. So for now the reunification is pretty much just a relocation of all the files, done mostly with scripts, plus an adaptation of the build machinery so that everything compiles just as it did in the old separate directories.

In the future many of those files will be unified and shared by both architectures (e.g. reboot_32.c and reboot_64.c into reboot.c), and many files have already been unified in this release. Others will stay separate permanently, due to the differences between both architectures.

3. Subsystems

3.1. Networking

3.2. Filesystems

3.3. CRYPTO

3.4. Architecture-specific changes

3.5. SELinux

4. Drivers

4.1. Graphics

4.2. SATA/IDE

4.3. Networking

4.4. Sound

4.5. ACPI

4.6. MTD

4.7. Input

4.8. SCSI

4.9. USB

4.10. V4L/DVB

4.11. HWMON

4.12. Bluetooth

KernelNewbies: Linux_2_6_24 (last edited 2007-12-02 19:52:57 by diegocalleja)