KernelNewbies:

Linux 2.6.23; not yet released


1. Short overview (for news sites, etc)

2.6.23 includes the fallocate() syscall

2. In the news

3. Important things (AKA: ''the cool stuff'')

3.1. The CFS process scheduler

The new process scheduler, a.k.a. CFS, has generated much noise in some circles because of the way it was chosen over its 'competitor', RSDL. A bit of history is needed to clarify what happened and what CFS does compared to the old scheduler.

A long time ago, during the development of Linux 2.5, the 'O(1)' process scheduler from Ingo Molnar was merged to replace the process scheduler inherited from 2.4. The O(1) scheduler was mainly designed to fix the scalability issues in the 2.4 process scheduler - the improvements were so big that the O(1) scheduler was one of the most frequently backported features to 2.4 in commercial Linux distributions. However, the algorithms in charge of scheduling the processes did not receive as much attention - the main goal of the new scheduler was to solve the scalability issues from the ground up, whereas the process scheduling itself was considered good enough, or at least wasn't perceived as a critical issue. Those algorithms can make a huge difference in what users perceive as 'interactivity'. For example, if one or more processes start an endless loop and, because of those CPU-bound loopers, the process scheduler doesn't assign as much CPU as necessary to the already running non-looping processes that implement the user interface (X.org, kicker, firefox, openoffice.org, etc), the user will perceive that the programs don't react to their actions very smoothly. Worse, in the case of music players, your music could skip.

The O(1) scheduler, just like the previous scheduler, tried to handle those situations as well as possible, and in most cases it did a good job. However, many users reported corner cases, and not-so-corner cases, where the new scheduler didn't work as expected. Among those people was Con Kolivas and, despite his inexperience in the kernel hacking world, he tried to fine-tune the scheduling algorithms without replacing them. His work was a big success, and his patches found their way into the main kernel.

He didn't stop there. Con found that the 'interactivity estimator' - a piece of code used by the process scheduler to try to decide which processes were more 'interactive' and hence needed more attention so that the user would perceive a smoother behaviour on their desktops - caused more problems than it solved. Contrary to its original purpose, the interactivity estimator couldn't fix all the 'interactivity' problems present in the process scheduler, and trying to fix one would open another issue. It was the typical case of an algorithm using statistics to try to predict the future with heuristics, and failing at it.

Con designed a new scheduler that killed all the failed interactivity estimations. Instead, his scheduler was based on the concept of fairness while conserving the 'O(1)-ness' of the mainline scheduler: processes are treated equally and are given the same timeslices (see [http://lwn.net/Articles/224865/ this LWN article for more details on this scheduler]), and the scheduler doesn't care or even try to guess whether a process is CPU-bound or IO-bound (interactive). This scheduler improved the user's perceived smoothness to unprecedented levels.

This scheduler was the one that was going to get merged, but Ingo Molnar (the O(1) creator) created his own new scheduler, called CFS (short for 'Completely Fair Scheduler'), taking as its basic design element the 'fairness' idea that Con's scheduler had proved to be superior. The CFS scheduler has some differences compared to Con's RSDL: instead of runqueues (which are used in both RSDL and mainline O(1)), it uses a time-ordered rbtree to build a 'timeline' of future task execution, to try to avoid the 'array switch' artifacts that both the vanilla and the RSDL scheduler can suffer. It also uses nanosecond granularity accounting and does not rely on any jiffies or other HZ detail; in fact it has no notion of 'timeslices' and has no heuristics whatsoever (read [http://lwn.net/Articles/230574/ this LWN article for more details on CFS design]). CFS has been chosen over RSDL as the replacement for the current 'O(1)' scheduler - surprisingly, this choice has generated much noise. It should be noted that both RSDL and CFS are great schedulers, much better than the one in mainline, and that it was Con who pioneered the idea of using the concept of 'fairness' over the 'interactivity estimations', but that doesn't mean that CFS didn't deserve to get merged instead of RSDL (nor the contrary, if that had been the case).
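The 'timeline' idea can be illustrated with a toy sketch. This is not the kernel code: the struct toy_task type and the pick_next() and simulate() helpers are hypothetical names invented for this example, and a linear scan over an array stands in for taking the leftmost node of the time-ordered rbtree. It only shows the principle of always running the task that has received the least CPU time so far, with accounting in nanoseconds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy task: only the accumulated CPU time matters here. */
struct toy_task {
    const char *name;
    uint64_t runtime_ns;   /* CPU time received so far, in nanoseconds */
};

/*
 * Return the task that has run the least so far. A time-ordered tree
 * would give this as its leftmost node; the linear scan is a stand-in.
 * Ties go to the task that appears first in the array.
 */
struct toy_task *pick_next(struct toy_task *tasks, size_t n)
{
    assert(tasks != NULL && n > 0);

    struct toy_task *best = &tasks[0];
    for (size_t i = 1; i < n; i++)
        if (tasks[i].runtime_ns < best->runtime_ns)
            best = &tasks[i];
    return best;
}

/*
 * Make `steps` scheduling decisions. Each picked task is charged `ran_ns`,
 * a made-up amount of CPU time it ran before the next decision.
 */
void simulate(struct toy_task *tasks, size_t n, int steps, uint64_t ran_ns)
{
    for (int s = 0; s < steps; s++)
        pick_next(tasks, n)->runtime_ns += ran_ns;
}
```

In this toy model, a task that has been starved of CPU always ends up at the front of the timeline, so no guess about whether it is 'interactive' is needed.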

CFS code: (commit [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5e7eaade55d53da856f0e07dc9c188f78f780192 1], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=20b8a59f2461e1be911dce2cfafefab9d22e4eee 2], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6aa645ea5f7a246702e07f29edc7075d487ae4a3 3], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bb44e5d1c6b3b748e0facf8f516b3162009feb27 4], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bf0f6f24a1ece8988b243aefe84ee613099a9245 5], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa72e9e484c16f0c9aee23981917d8c8c03f0482 6], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=dd41f596cda0d7d6e4a8b139ffdfabcefdd46528 7], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=43ae34cb4cd650d1eb4460a8253a8e747ba052ac 8])

3.2. fallocate()

The new fallocate() system call allows applications to preallocate space for a file (http://lwn.net/Articles/226710/). Each file system implementation that wants to use this feature will need to support an inode operation called fallocate.

Applications can use this feature to reduce fragmentation to a certain degree and thus get faster access. With preallocation, applications also get a guarantee of space for particular file(s), even if the file system later becomes full [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97ac73506c0ba93f30239bb57b4cfc5d73e68a62 (commit)].
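A minimal sketch of how an application might preallocate space, assuming a C library that provides a fallocate() wrapper with the signature fallocate(fd, mode, offset, len), as glibc does when _GNU_SOURCE is defined. The preallocate() and preallocate_demo() helpers, the fallback to posix_fallocate() and the /tmp path are assumptions made for this example, not part of the kernel change.

```c
#define _GNU_SOURCE           /* exposes the glibc fallocate() wrapper */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Reserve `len` bytes of disk space for `fd`, starting at offset 0.
 * Returns 0 on success or an errno value on failure.
 */
int preallocate(int fd, off_t len)
{
    assert(fd >= 0 && len > 0);

    /* mode 0: allocate the blocks and extend the file size if needed. */
    if (fallocate(fd, 0, 0, len) == 0)
        return 0;

    /* The file system does not support the operation: fall back to
     * posix_fallocate(), which the C library may emulate by writing. */
    if (errno == EOPNOTSUPP)
        return posix_fallocate(fd, 0, len);

    return errno;
}

/*
 * Demo: preallocate 1 MiB in a fresh temporary file and check its size.
 * Returns 0 on success, -1 on any failure.
 */
int preallocate_demo(void)
{
    char path[] = "/tmp/prealloc-XXXXXX";
    const off_t want = 1 << 20;
    struct stat st;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int ok = preallocate(fd, want) == 0
             && fstat(fd, &st) == 0
             && st.st_size == want;

    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

After a successful call, later writes inside the reserved range should not fail for lack of space, which is the guarantee described above.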

3.3. Two additional virtualisation solutions, Xen and lguest, merged

3.3.1. Xen merged

Xen, a virtual machine monitor (VMM) for x86-compatible computers, was recently merged into the upcoming 2.6.23 Linux kernel in a series of patches from Jeremy Fitzhardinge (http://kerneltrap.org/node/13917) [http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5ead97c84fa7d63a6a7a2f4e9f18f452bd109045 (commit)].

From a Kerneltrap comment: "just limited (no dom0, no suspend/resume, no ballooning) xen client support for i386 only".

3.3.2. lguest merged

Rusty Russell's lguest was recently merged into the upcoming 2.6.23 Linux kernel. The merge comment describes the project: "lguest is a simple hypervisor for Linux on Linux. Unlike kvm it doesn't need VT/SVM hardware. Unlike Xen it's simply 'modprobe and go'" (http://kerneltrap.org/node/13916) [http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=07ad157f6e5d228be78acd5cea0291e5d0360398 (commit)].

4. Miscellaneous kernel-userland changes

4.1. open() O_CLOEXEC flag

2.6.23 adds a new O_CLOEXEC flag for open(2) (http://lwn.net/Articles/236843/) [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f23513e8d96cf5e6cf8d2ff0cb5dd6bbc33995e4 (commit)] [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4a19542e5f694cd408a32c3d9dc593ba9366e2d7 (commit)]. This flag makes it possible to avoid race conditions in multithreaded applications that do the following:

  1. Thread A: fd=open()
  2. Thread B: fork + exec
  3. Thread A: fcntl(fd,F_SETFD,FD_CLOEXEC)

(If step 2 happens between steps 1 and 3, the program exec'd by Thread B inherits fd. Instead, Thread A would drop the fcntl() call and just open the file with O_CLOEXEC.)

KernelNewbies: Linux_2_6_23 (last edited 2007-08-24 07:12:14 by 81)