For Monday, June 8th 2009, I'm Jon Masters with a summary of today's LKML traffic.
In today's issue: Fair Anticipatory Scheduling, making mapped executable pages the first class citizen, zone_reclaim() behavioral expectations, MCE ring buffer, RTL8169-related crashes, and a few good hackers.
Fair Anticipatory Scheduling. Corrado Zoccolo raised the issue of low-latency I/O scheduling, suggesting that he had some ideas for improving I/O workloads that have more than one random reader. Rather than dividing requests up into per-process queues as CFQ does, Corrado divides requests globally into two classes, sequential and random: sequential requests remain in per-process queues, while random requests are singled out globally and stored in a separate structure. He posted some code and figures apparently showing significant performance improvements, but as yet there have been no comments from others on-list, nor any independent analysis of his claims or posted code. Still, it sounds like yet another interesting alternative I/O scheduling algorithm.
Making mapped executable pages the first class citizen. Wu Fengguang followed up on the previous thread about altering the VM page eviction logic to favor mapped executable pages, posting statistics that show a marked improvement. These had previously been requested, since the kernel community is loath to take patches on gut feeling alone, even when a change seems an obvious performance improvement. Following this, Kosaki Motohiro ACKed the patches, while Fengguang expressed interest in further (automated) testing of desktop environments. To ensure reproducibility, he asked about, and then discovered, tools for remotely controlling window managers on the modern X Window System implementations deployed on Linux. These follow two different approaches: KDE uses DCOP, while EWMH/NetWM-compatible window managers such as metacity can be driven by a tool such as wmctrl.
zone_reclaim() behavioral expectations. Mel Gorman (author of "Understanding The Linux Virtual Memory Manager") posted a series of patches intended to bring zone_reclaim() behavior in line with expectations. Apparently, on one particular system, a large tmpfs occupying a significant chunk of physical memory caused high CPU utilization, because zone_reclaim() would repeatedly and uselessly scan lists of pages that could not be freed. To help address such situations, Mel re-introduces the "zone_reclaim_interval" tunable and alters the zone_reclaim() heuristics. The patches have not yet been heavily tested, as more information is pending from a third party. Mel also posted a handy "gfp-translate" script that decodes kernel messages of the form "page allocation failure. order:1, mode:0x4020" into their component flags, using the kernel header files for the values.
MCE ring buffer. Huang Ying posted version 5 of a re-implementation of the MCE log ring buffer as a per-CPU ring buffer. The implementation is similar to Steven Rostedt's existing generic kernel ring_buffer in that one buffer exists per CPU and is written to by the CPU receiving an MCE. At read time, a global lock protects the reading of the individual per-CPU buffers as they are iterated through, one by one.
RTL8169-related crashes. Michael Tokarev described how the current rtl8169 driver mishandles "jumbo" Ethernet frames received on networks configured to use them. On his system, such packets led to memory corruption and subsequent kernel oopses. The driver problem was identified through a series of discussions with other kernel hackers, and a patch has since been suggested for application to the -stable kernels.
A few good hackers. Kallol Biswas posted a message entitled "looking for ideas on VM related research projects" in which he solicited suggestions for areas of the Linux VM subsystem that need work, or other R&D ideas. Avi Kivity followed up with a list that included "automatically using and break up large pages, task/vma affinity, and automatically migrating memory when thread/vma affinity cannot be satisfied". On an unrelated note, Avi also asked Linus to pull a late-breaking KVM fix for 2.6.30, if it was not too late. The fix allows KVM guests to reboot successfully on CONFIG_SMP-enabled systems.
Michael Tokarev also discovered a problem booting recent git RCs because he had applied the acpi-dsdt-initrd patches, which allow one to provide a fake ACPI DSDT at kernel boot time. A DSDT, or "Differentiated System Description Table", is part of the ACPI specification and describes the system in a structured binary format, created using tools such as the open source IASL compiler provided by Intel. Using the Intel tools, one can disassemble the vendor-provided ACPI metadata, modify it, and then supply a fake version to the Linux kernel at boot time. Work is ongoing to make this easier, through patches such as the one Michael had applied.
Finally today, Jesper Dangaard Brouer noted that modules using RCU callbacks must call rcu_barrier() if they are to be safely unloaded. He cited the LWN article on RCU barriers as well as the kernel Documentation, and provided a list of a few example drivers that needed to be changed. There are probably more of these around, worth looking for if you have time.
The latest kernel release is still 2.6.30-rc8, but there is likely to be a final 2.6.30 release at any moment now.
That's a summary of today's LKML traffic. For further information visit kernel.org. I'm Jon Masters.