#pragma section-numbers on
#pragma keywords Linux, Kernel, Operating System, Linus Torvalds, Open Source, drivers
#pragma description Summary of the changes and new features merged in the Linux Kernel during the 2.6.24 development

Linux kernel version 2.6.24 Released ([http://kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.24 full SCM git log])

[[TableOfContents()]]

= Short overview (for news sites, etc) =

2.6.24 includes

= Important things (AKA: ''the cool stuff'') =

== CFS improvements ==

The CFS task scheduler [http://kernelnewbies.org/Linux_2_6_23#head-f3a847a5aace97932f838027c93121321a6499e7 merged in Linux 2.6.23] is getting [http://lkml.org/lkml/2007/9/11/395 some microoptimization work] in 2.6.24. 2.6.23's CFS context switching is more than 10% slower than the old task scheduler. With the optimizations done in 2.6.24, CFS is now even a bit faster than the old task scheduler (which is quite fast). The compiled size of the scheduler has also improved: it's now a bit smaller on UP and a lot smaller on SMP.

Another new feature in the scheduler is Fair Group Scheduling. Normally the scheduler operates on individual tasks and strives to provide fair CPU time to each task. Sometimes it may be desirable to group tasks and provide fair CPU time to each such task group instead. For example, it may be desirable to first provide fair CPU time to each user on the system and then to each task belonging to a user. In other words, given two users, one running a single cpu-bound process and the other running two cpu-bound processes, you may want to give 50% of CPU time to the first user and his task, and 50% to the other user, shared between his two processes - 25% of CPU time for each. That's the kind of thing the Group Scheduling feature does.
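The two-level split in the example above can be sketched as a small computation. This is only an illustration of the fairness rule (equal share per group, then equal share per task inside the group); the kernel's real implementation uses weighted virtual-runtime accounting, not explicit percentages:

```python
# Illustrative sketch of fair group scheduling: total CPU time is first
# divided evenly between groups (e.g. users), then evenly between the
# tasks inside each group. Hypothetical helper, not the kernel's code.

def group_fair_shares(groups):
    """groups: dict mapping group name -> list of task names.
    Returns a dict mapping task name -> fraction of total CPU time."""
    shares = {}
    per_group = 1.0 / len(groups)          # each group gets an equal slice
    for tasks in groups.values():
        per_task = per_group / len(tasks)  # split the slice inside the group
        for task in tasks:
            shares[task] = per_task
    return shares

# Two users: one with a single cpu-bound task, one with two.
# The single task gets 50%; the other two tasks get 25% each.
print(group_fair_shares({"alice": ["a1"], "bob": ["b1", "b2"]}))
```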
At present, there are two (mutually exclusive) mechanisms to group tasks for CPU bandwidth control purposes: 1) group scheduling based on user id, which is the case previously mentioned as an example, and 2) group scheduling based on the "cgroup" pseudo filesystem. This last option lets the administrator create arbitrary groups of tasks through Control Groups (a feature already present in the Linux kernel, used for aggregating/partitioning sets of tasks for resource management needs of all kinds, not just CPU time).

The user-id based mechanism is configurable, which means you can have other splits than just a 50%/50% rule. You can change it in /sys/kernel/uids/<uid>/cpu_share. CPU bandwidth between two users is divided in the ratio of the CPU shares expressed in those files (which default to the "nice 0 load", 1024). For example, if you would like user "root" to get twice the bandwidth of user "guest", then set the cpu_share for both users such that "root"'s cpu_share is twice "guest"'s cpu_share.

The Control Groups based configuration is described in Documentation/sched-design-CFS.txt. Basically, you can create arbitrary task groups (ie: "multimedia", "compiling"), set how much CPU time 'priority' you want to give a group by catting the value to its cpu_share file, and then attach a PID to whatever task group you want.

Additionally, the task scheduler in 2.6.24 adds a new "guest" field after "system" and "user" in /proc/<pid>/stat, where it tracks how much CPU time a task spends running a 'virtual' CPU.

== New wireless drivers ==

In Linux 2.6.22 the new mac80211 wifi stack was [http://kernelnewbies.org/Linux_2_6_22#head-1498b990e997cc0e95dbfa9047e7ebe8d84847cc merged], but not many drivers that use this new stack were merged with it (only one).
Linux 2.6.24 will have a lot of new wireless drivers using the new stack, 2.3 MB of source files in total:

 * iwlwifi driver for the Intel PRO/Wireless 3945ABG/BG Network Connection and Intel Wireless Wifi Link AGN (4965) adapters [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b481de9ca074528fe8c429604e2777db8b89806a (commit)]
 * rt2x00 driver for Ralink wireless hardware (rt2400 pci/pcmcia, rt2500 pci/pcmcia, rt61 pci/pcmcia, rt2500 usb, rt73 usb). Check the [http://rt2x00.serialmonkey.com/wiki/index.php/Hardware hardware matrix] [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=95ea36275f3c9a1d3d04c217b4b576c657c4e70e (commit)]
 * adm8211 driver for the ADMtek ADM8211x based wireless cards. These are PCI/mini-PCI/Cardbus 802.11b chips found in cards such as: Xterasys Cardbus XN-2411b, Blitz Netwave Point PC, Trendnet 221pc, Belkin F5d6001, SMC 2635W, Linksys WPC11 v1, Fiberline FL-WL-200X, 3com Office Connect (3CRSHPW796), Corega WLPCIB-11, SMC 2602W V2 EU, D-Link DWL-520 Revision C [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=cc0b88cf5ecf13cdd750f08e201ce8fadcdb601f (commit)]
 * b43 driver for modern BCM43xx devices. This driver supports the new BCM43xx IEEE 802.11G devices, but not the old IEEE 802.11B devices - those are supported by the b43legacy driver. This driver uses V4 firmware, which must be installed separately using b43-fwcutter [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e4d6b7951812d98417feb10784e400e253caf633 (commit)]
 * b43legacy driver for legacy BCM43xx devices from Broadcom (BCM4301 and BCM4303) and early model 802.11g chips (BCM4306 Ver. 2) used in the Linksys WPC54G V1 PCMCIA devices. Newer 802.11g and 802.11a devices need the b43 driver.
This driver uses V3 firmware, which must be installed separately using b43-fwcutter [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=75388acd0cd827dc1498043daa7d1c760902cd67 (commit)]
 * p54 driver for prism54 softmac pci/usb hardware [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=eff1a59c48e3c6a006eb4fe5f2e405a996f2259d (commit)]

== Per-device dirty thresholds ==

You can read [http://lwn.net/Articles/245600/ this recommended article] about the "per-device dirty thresholds" feature.

When a process writes data to the disk, the data is stored temporarily in 'dirty' memory until the kernel decides to write it to the disk ('cleaning' the memory used to store it). A process can 'dirty' memory faster than the data is written to disk, so the kernel throttles a process when there's too much dirty memory around. The problem with this mechanism is that the dirty memory thresholds are global: the mechanism doesn't care whether there are several storage devices in the system, much less whether some of them are faster than others. There are a lot of scenarios where this design harms performance. For example, if there's a very slow storage device in the system (ex: a USB 1.0 disk, or a NFS mount over dialup), the thresholds are hit very quickly - not allowing other processes that may be working on a much faster local disk to make progress. Stacked block devices (ex: LVM/DM) fare much worse (check the LWN article).

In 2.6.24, the dirty thresholds are per-device, not global. The limits are variable, depending on the writeout speed of each device. This improves performance greatly and solves some deadlock situations.

== Linux Kernel Markers ==

You can read [http://lwn.net/Articles/245671/ this recommended article] about the "Linux Kernel Markers" feature.

The Linux Kernel Markers implement static probing points for the Linux kernel. Dynamic probing systems like kprobes/DTrace can put probes pretty much anywhere.
However, the scripts that dynamic probing points use can quickly become outdated: a small change in the kernel may force a rewrite of the script, which needs to be maintained and updated separately and will not work for all kernel versions. That's why static probing points are useful: they can be put directly into the kernel source code and hence are always in sync with kernel development. Static probing points can apparently also have some performance advantages: they have no performance cost when they're not being used.

The kernel markers are a sort of "derivative" of the long-lived external patchset "Linux Trace Toolkit" (LTT), a feature that has been around since [http://www.opersys.com/LTT/news.html#18-11-1999 1999]. The Kernel Markers are also needed for the [http://lwn.net/Articles/245671/ SystemTap] project. In this release no probing points are included yet, but many certainly will be in the future, and some tracing tools like blktrace will probably be ported to this kind of infrastructure.

== x86-32/64 arch reunification ==

You can read [http://lwn.net/Articles/243704/ this recommended article].

When support for the x86-64 AMD architecture was developed, it was decided to develop it as a "fork" of the traditional x86 architecture, for convenience. Many patches needed to patch a file in the i386 architecture directory, plus another similar patch for the duplicated file in the x86_64 directory. It has now been decided to unify both architectures in the same directory again.

This reunification has not been done in a radical way. In this release, both architectures have been unified under arch/x86, but only in appearance: all the source files in the i386 and x86-64 directories have been moved to arch/x86, renamed with "_32" and "_64" suffixes.
Ex: arch/i386/kernel/reboot.c has been moved to arch/x86/kernel/reboot_32.c, and arch/x86_64/kernel/reboot.c has been moved to arch/x86/kernel/reboot_64.c. Makefiles have been modified accordingly. So for now the reunification has been pretty much just a relocation of all the files plus an adaptation of the build machinery to make everything compile just as it did in the old separate directories, done mostly with scripts. In the future many of those files will be unified and shared by both architectures, ex. reboot_32.c and reboot_64.c merged into reboot.c, and many files have already been unified in this release. Others will remain separate forever, due to the differences between the two architectures.
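The rename scheme above is mechanical, which is why it could be done mostly with scripts. It can be sketched roughly like this (a hypothetical helper for illustration, not the actual conversion script the kernel developers used):

```python
# Sketch of the i386/x86_64 -> arch/x86 rename scheme: files keep their
# names but move under arch/x86 and gain a _32 or _64 suffix.
# Hypothetical illustration, not the real merge script.
import os

SUFFIX = {"i386": "_32", "x86_64": "_64"}

def unified_path(old_path):
    """Map an old per-arch path to its new arch/x86 location."""
    parts = old_path.split("/")           # e.g. arch/i386/kernel/reboot.c
    arch = parts[1]                       # "i386" or "x86_64"
    root, ext = os.path.splitext(parts[-1])
    parts[1] = "x86"                      # both trees merge into arch/x86
    parts[-1] = root + SUFFIX[arch] + ext # reboot.c -> reboot_32.c
    return "/".join(parts)

print(unified_path("arch/i386/kernel/reboot.c"))    # arch/x86/kernel/reboot_32.c
print(unified_path("arch/x86_64/kernel/reboot.c"))  # arch/x86/kernel/reboot_64.c
```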