For Monday, June 8th 2009, I'm Jon Masters with a summary of today's LKML
traffic.

In today's issue: Fair Anticipatory Scheduling, making mapped executable pages
the first class citizen, zone_reclaim() behavioral expectations, MCE ring
buffer, RTL8169 related crashes, and a few good hackers.

Fair Anticipatory Scheduling. Corrado Zoccolo raised the issue of low-latency
I/O scheduling, saying that he had some ideas for improving I/O workloads
that have more than one random reader. Rather than dividing requests up into
per-process queues like CFQ, Corrado divides requests globally into two
classes - sequential and random - with the former remaining in per-process
queues and the latter singled out globally and stored in a separate structure.
He posted some code and figures apparently showing significant performance
improvements, but as of yet there have been no comments from others on-list,
nor independent analysis of his claims or posted code. Still, it sounds like
yet another interesting alternative I/O scheduling algorithm.
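
As a rough illustration of the two-class split described above, here is a
minimal Python sketch. It is not Corrado's code: the class name, the method
names, and especially the rule used to decide that a request is "sequential"
(it starts exactly where the same process's previous request ended) are
assumptions made for the example, since the summary does not say how requests
are classified.

```python
from collections import defaultdict, deque


class TwoClassQueues:
    """Toy model: sequential requests stay per-process, random ones go global."""

    def __init__(self):
        self.sequential = defaultdict(deque)  # pid -> queue of (sector, length)
        self.random = []                      # one shared pool for all processes
        self._next_sector = {}                # pid -> sector expected next

    def add(self, pid, sector, length):
        # Assumed heuristic: a request is sequential only if it begins where
        # the previous request from the same process ended.
        is_sequential = self._next_sector.get(pid) == sector
        self._next_sector[pid] = sector + length
        if is_sequential:
            self.sequential[pid].append((sector, length))
        else:
            self.random.append((pid, sector, length))
        return is_sequential
```

In this model the first request from any process has no history, so it lands
in the global random pool; only a follow-on request at the expected sector is
queued per-process.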

Making mapped executable pages the first class citizen. Wu Fengguang followed
up on the previous thread about altering the VM page eviction logic to
favor mapped executable pages, posting some statistics that show
a marked improvement. These had previously been asked for, since the kernel
community is loath to take patches on gut feeling alone, even though
this would seem a fairly obvious performance improvement. Kosaki Motohiro
ACKed the patches following this, while Fengguang was interested in further
(automated) testing of desktop environments. So as to ensure reproducibility,
he asked about - and then discovered - remote window manager control tools for
the modern X Window System implementations as deployed by Linux. These use two
different mechanisms for remote WM control: DCOP in KDE, while EWMH/NetWM
compatible window managers like metacity use a tool such as wmctrl.

zone_reclaim() behavioral expectations. Mel Gorman (author of "Understanding
The Linux Virtual Memory Manager") posted a series of patches intended to
bring zone_reclaim() behavior in line with expectations. Apparently, on a
particular system, a large tmpfs occupying a significant chunk of physical
memory would contribute to high CPU utilization because zone_reclaim()
would sit and uselessly scan lists of pages that could not be freed. To help
address such situations, Mel re-introduces the "zone_reclaim_interval"
tunable, and alters the zone_reclaim() heuristics. The patch has not been
tested heavily yet as more information is pending from a third party. Mel
also posted a rather handy "gfp-translate" script that can be used to decode
kernel messages of the form "page allocation failure. order:1, mode:0x4020"
into component parts, using the kernel header files for the values.
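
To give a feel for what such a decoder does, here is a minimal Python sketch.
It is not Mel's script, which reads the values from the kernel headers; the
small table below is hard-coded from memory of include/linux/gfp.h in kernels
of this era, so treat both the values and the function name decode_gfp as
assumptions and check them against your own tree.

```python
# Illustrative subset of GFP flag bits (assumed values; verify against the
# include/linux/gfp.h of the kernel you are actually running).
GFP_BITS = {
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x4000: "__GFP_COMP",
}


def decode_gfp(mode):
    """Split a GFP mode mask into the flag names found in GFP_BITS."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    known = sum(bit for bit in GFP_BITS if mode & bit)
    unknown = mode & ~known
    if unknown:
        # Report any bits the table does not cover instead of dropping them.
        names.append(hex(unknown))
    return names


print(decode_gfp(0x4020))
```

With the table above, the mode from the example message splits into
__GFP_HIGH and __GFP_COMP.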

MCE ring buffer. Huang Ying posted version 5 of a re-implementation of the MCE
log ring buffer as a per-CPU ring buffer. The implementation is similar to
Steven Rostedt's existing generic kernel ring_buffer insofar as one buffer
exists per CPU and is written to by the CPU receiving an MCE. At read time, a
global lock protects the reading of individual per-CPU buffers as they are
iterated through, one by one.
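
The overall shape of that design can be sketched in a few lines of Python.
This is a toy model, not Huang Ying's implementation: the class and method
names are invented, and the choice to drop the oldest record when a buffer
fills is an assumption, since the summary does not say what happens on
overflow.

```python
import threading
from collections import deque


class PerCpuLog:
    """Toy model: one bounded buffer per CPU, one global lock for readers."""

    def __init__(self, ncpus, capacity):
        # deque(maxlen=...) silently discards the oldest entry when full
        # (assumed overflow policy, chosen only to keep the sketch short).
        self.buffers = [deque(maxlen=capacity) for _ in range(ncpus)]
        self.read_lock = threading.Lock()

    def write(self, cpu, record):
        # Each CPU appends only to its own buffer.
        self.buffers[cpu].append(record)

    def read_all(self):
        # The global lock serializes readers while they walk the per-CPU
        # buffers one by one, draining each in turn.
        records = []
        with self.read_lock:
            for cpu, buf in enumerate(self.buffers):
                while buf:
                    records.append((cpu, buf.popleft()))
        return records
```

The point of the layout is that writers never contend with one another, and
only readers pay for the global lock.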

RTL8169 related crashes. Michael Tokarev described how the current rtl8169
driver incorrectly handles "jumbo" Ethernet frames received in appropriately
configured network environments. On his system, such packets lead to memory
corruption and subsequent kernel oopses. The driver problem was identified
thanks to a series of discussions with other kernel hackers and a patch has
subsequently been suggested for application to the -stable kernels.

A few good hackers. Kallol Biswas posted a message entitled "looking for ideas
on VM related research projects" in which he solicited suggestions on
areas of the Linux VM subsystem that needed work or other R&D ideas. Avi
Kivity followed up with a list that included "automatically using and break up
large pages, task/vma affinity, and automatically migrating memory when
thread/vma affinity cannot be satisfied". On an unrelated note, Avi also asked
Linus to pull a late-breaking KVM fix for 2.6.30 if it's not too late. The fix
allows KVM guests to successfully reboot on CONFIG_SMP enabled systems.

Michael also discovered a problem booting recent git RCs because he had the
acpi-dsdt-initrd patches applied that allow one to provide a fake ACPI DSDT at
kernel boot time. A DSDT or "Differentiated System Description Table" is part
of the ACPI specification and is used to provide system information in a
structured binary format created using tools such as those provided by Intel in
the Open Source project IASL. Using the Intel compiler tools, one can
disassemble the vendor-provided ACPI metadata, modify it, and then supply a
fake version to the Linux kernel at boot time. Work is ongoing to
make this easier, through patches such as the one Michael had applied.

Finally today, Jesper Dangaard Brouer noted that unloadable modules using RCU
callbacks must use rcu_barrier() if they are to be safely unloaded. He cited
the LWN article on RCU barriers, as well as the kernel Documentation, and
provided a list of a few example drivers that needed to be changed. There are
probably more of these around worth looking for if you have time.
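
To see why the barrier matters, here is a toy Python model of the problem. It
is only an analogy: ToyRcu and ToyModule are invented names, there is no real
grace period here, and the real kernel functions call_rcu() and rcu_barrier()
are merely mimicked by methods of the same name. The "loaded" flag stands in
for the module's code still being present in memory.

```python
class ToyRcu:
    """Toy stand-in for RCU: callbacks are queued and run later."""

    def __init__(self):
        self.pending = []

    def call_rcu(self, callback):
        self.pending.append(callback)

    def rcu_barrier(self):
        # Run every callback that was queued before returning.
        while self.pending:
            self.pending.pop(0)()


class ToyModule:
    def __init__(self, rcu):
        self.rcu = rcu
        self.loaded = True
        self.freed = 0

    def _free_cb(self):
        # In a real kernel this would be a jump into code that has been
        # unloaded; here it is modelled as an exception.
        if not self.loaded:
            raise RuntimeError("callback ran after the module was unloaded")
        self.freed += 1

    def remove_item(self):
        self.rcu.call_rcu(self._free_cb)

    def unload(self, use_barrier):
        if use_barrier:
            self.rcu.rcu_barrier()  # drain our callbacks before going away
        self.loaded = False
```

Unloading with use_barrier=True drains the queued callback first; unloading
without it leaves a callback behind that fails when it eventually runs.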

The latest kernel release is still 2.6.30-rc8, but there is likely to be a
final 2.6.30 release at any moment now.

That's a summary of today's LKML traffic. For further information visit kernel.org. I'm Jon Masters.