Linux 2.6.23; not yet released
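1. Short overview (for news sites, etc.)
2.6.23 includes the new CFS process scheduler, the fallocate() system call, Xen and lguest virtualisation support, and a new O_CLOEXEC flag for open().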
2. In the news
3. Important things (AKA: the cool stuff)
3.1. The CFS process scheduler
The new process scheduler, a.k.a. CFS (Completely Fair Scheduler), has generated a lot of noise in some circles because of the way it was chosen over its 'competitor', SD.
During the development of Linux 2.5, Ingo Molnar's 'O(1)' process scheduler was merged to replace the scheduler inherited from 2.4. The O(1) scheduler was designed mainly to fix the scalability problems of the 2.4 scheduler, and the improvements were so large that it became one of the features most frequently backported to 2.4 by commercial Linux distributions. The scheduling algorithms themselves, however, received less attention: the main goal of the new scheduler was to solve the scalability issues from the ground up, while the existing scheduling policy was considered good enough, or at least was not perceived as a critical problem. Yet those algorithms make a huge difference to what users perceive as 'interactivity'. For example, if one or more processes start an endless loop and, because of those busy-loopers, the scheduler does not give enough CPU time to the already running non-looping processes that implement the user interface (X.org, kicker, Firefox, OpenOffice, etc.), the user will perceive that programs do not react smoothly to their actions. Worse, a music player could skip.
The O(1) scheduler, like its predecessor, tried to handle those situations as well as possible, and both generally did a good job in most cases. However, many users reported corner cases, and not-so-corner cases, where the new scheduler did not work as expected. One of those people was Con Kolivas, who, despite his inexperience in kernel hacking, set out to fine-tune the scheduling algorithms without replacing them. His work was a big success, and his patches found their way into the main kernel.
He did not stop there. Con found that the 'interactivity estimator' (a piece of code the process scheduler used to decide which processes were more 'interactive' and therefore needed more attention, so that users would perceive smoother behaviour on their desktops) caused more problems than it solved. Contrary to its original purpose, the interactivity estimator could not fix all the 'interactivity' problems in the process scheduler, and fixing one problem would open another for other workloads. It was the typical case of an algorithm using statistics and heuristics to try to predict the future, and failing at it.
* CFS
  * core data types [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=20b8a59f2461e1be911dce2cfafefab9d22e4eee (commit)]
  * cfs rq data types [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6aa645ea5f7a246702e07f29edc7075d487ae4a3 (commit)]
  * cfs core, kernel/sched_rt.c [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bb44e5d1c6b3b748e0facf8f516b3162009feb27 (commit)]
  * cfs core, kernel/sched_fair.c [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bf0f6f24a1ece8988b243aefe84ee613099a9245 (commit)]
  * cfs core, kernel/sched_idletask.c [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa72e9e484c16f0c9aee23981917d8c8c03f0482 (commit)]
  * cfs core code [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=dd41f596cda0d7d6e4a8b139ffdfabcefdd46528 (commit)]
  * scheduler debugging, core [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=43ae34cb4cd650d1eb4460a8253a8e747ba052ac (commit)]
  * add CFS documentation [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5e7eaade55d53da856f0e07dc9c188f78f780192 (commit)]
3.2. fallocate()
The new fallocate() system call allows applications to preallocate space for a file (http://lwn.net/Articles/226710/). Each file system implementation that wants to use this feature will need to support an inode operation called fallocate.
Applications can use this feature to reduce fragmentation to a certain degree and thus get faster access. Preallocation also gives applications a guarantee that space is available for particular files, even if the file system later becomes full [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97ac73506c0ba93f30239bb57b4cfc5d73e68a62 (commit)].
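A minimal userland sketch of preallocation, assuming a glibc that exposes the fallocate() wrapper (glibc 2.10 and later; programs written in the 2.6.23 era had to invoke the raw system call instead). The helper names preallocate(), file_size() and allocated_bytes() are hypothetical. With mode 0 the call reserves blocks for the given range and extends the file size if needed; on a file system that does not implement the fallocate inode operation, the call fails with EOPNOTSUPP.

```c
#define _GNU_SOURCE /* exposes the glibc fallocate() wrapper (glibc >= 2.10) */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: reserve `len` bytes for `path`, creating it if needed.
 * Returns 0 on success, or -1 with errno set. EOPNOTSUPP means the file
 * system does not implement the fallocate inode operation. */
static int preallocate(const char *path, off_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* mode 0: allocate blocks for [0, len) and grow the file size to len
     * if the file is shorter; the new range reads back as zeros. */
    if (fallocate(fd, 0, 0, len) != 0) {
        int saved = errno; /* keep close() from clobbering the real error */
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}

/* Hypothetical helper: apparent file size as reported by stat(). */
static off_t file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_size : -1;
}

/* Hypothetical helper: space actually allocated on disk. */
static off_t allocated_bytes(const char *path)
{
    struct stat st;
    /* On Linux, st_blocks is counted in 512-byte units. */
    return stat(path, &st) == 0 ? (off_t)st.st_blocks * 512 : -1;
}
```

After a successful preallocate("data.bin", (off_t)1 << 20) on a new file, file_size() reports 1 MiB and allocated_bytes() at least that much, before a single byte has been written. Per the fallocate(2) man page, later writes into that range will not fail for lack of disk space. Applications that prefer portability can call posix_fallocate(3) instead; glibc implements it on top of this system call and falls back to writing blocks when the file system lacks support.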
3.3. Two additional virtualisation solutions, Xen and lguest, merged
3.3.1. Xen merged
Xen, a virtual machine monitor (VMM) for x86-compatible computers, was recently merged into the upcoming 2.6.23 Linux kernel in a series of patches from Jeremy Fitzhardinge (http://kerneltrap.org/node/13917) [http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5ead97c84fa7d63a6a7a2f4e9f18f452bd109045 (commit)].
From a Kerneltrap comment: "just limited (no dom0, no suspend/resume, no ballooning) xen client support for i386 only".
3.3.2. lguest merged
Rusty Russell's lguest was recently merged into the upcoming 2.6.23 Linux kernel. The merge comment describes the project as follows: "lguest is a simple hypervisor for Linux on Linux. Unlike kvm it doesn't need VT/SVM hardware. Unlike Xen it's simply 'modprobe and go'." (http://kerneltrap.org/node/13916) [http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=07ad157f6e5d228be78acd5cea0291e5d0360398 (commit)].
4. Miscellaneous kernel-userland changes
4.1. open() O_CLOEXEC flag
2.6.23 adds a new O_CLOEXEC flag for open(2) (http://lwn.net/Articles/236843/) [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f23513e8d96cf5e6cf8d2ff0cb5dd6bbc33995e4 (commit)] [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=4a19542e5f694cd408a32c3d9dc593ba9366e2d7 (commit)]. This flag makes it possible to avoid race conditions in multithreaded applications that do the following:
- Thread A: fd=open()
- Thread B: fork + exec
- Thread A: fcntl(fd,F_SETFD,FD_CLOEXEC)
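If Thread B's fork() happens between Thread A's open() and fcntl(), the child inherits the descriptor without the close-on-exec flag and leaks it into the program it executes. With the new flag, Thread A drops the fcntl() call and simply opens the file with O_CLOEXEC, so the flag is set atomically and there is no window for the race.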