KernelPodcast20090602

For Tuesday, June 2nd 2009, I'm Jon Masters with a summary of today's LKML traffic.

In today's issue: Xen, OOM, DebugFS, Dynamic ftrace support for s390, kprobe-based event tracing, and resetting the TSC.

Xen is a feature! Various core members of the Linux community - including Linus Torvalds and Thomas Gleixner - weighed in on the ongoing discussion of the apparent "need" to merge Xen Dom0 support. They suggested that such a merge really is far more intrusive than others, since it might need to touch a great many subsystems given the current design of Xen. Linus suggested that readers look at a specific set of "git log" output and compare the impact of the KVM and Xen merges, then "once you've done that, ask yourself which one is going to be merged easily and without any pushback". As he noted, this isn't meant to be a "xen vs kvm" thing - he said you could just as well replace "kvm" with many other subsystem changes. He then suggested that the Xen folks re-examine how they might split their code into less sweeping changes, thinking about it from the point of view of someone performing the merge, rather than that of "somebody who whines 'I want my code to be merged'. IOW, if you have trouble getting your code merged, ask yourself what _you_ are doing wrong".

OOM. David Rientjes posted a series of patches that, amongst other things, move oomkilladj from task_struct into the (quite possibly shared) mm_struct, in the process renaming it to oom_adj. This is the value that is used in the kernel's "badness" loop, which iterates over tasks looking for particularly bad memory hogs - in the past there were even proposals for pluggable OOM killers, and an ability to "always blame firefox". David points out that this change is acceptable because tasks with shared MM structs should not be OOM killed. It fixes a livelock that can occur if another thread sharing the same memory as a given task has an oom_adj value of OOM_DISABLE, in which case the OOM killer would previously sit in a loop, repeatedly unable to kill the task it deemed most offensive to its fragile sensibilities.
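As a rough illustration of that livelock, here is a toy Python model. It is not kernel code: the Task class, the function names, the badness numbers and the max_rounds cut-off are all made up for the sketch, and only the OOM_DISABLE value of -17 comes from the kernel. It assumes the old scheme checks only the candidate's own setting when selecting a victim but refuses the kill if any sharer of the memory is exempt, while the new scheme treats the setting as belonging to the shared memory.

```python
from dataclasses import dataclass

OOM_DISABLE = -17  # kernel value that exempts a task from OOM killing


@dataclass
class Task:
    name: str
    mm: int        # address-space id; equal ids mean the memory is shared
    badness: int   # stand-in for the kernel's badness score
    oom_adj: int = 0


def mm_is_protected(task, tasks):
    """True if any task sharing this task's memory is OOM_DISABLE."""
    return any(o.mm == task.mm and o.oom_adj == OOM_DISABLE for o in tasks)


def select_victim(tasks, per_mm):
    """Pick the highest-badness task that selection considers eligible."""
    def exempt(t):
        if per_mm:
            # Modelled new scheme: the setting is a property of the shared memory.
            return mm_is_protected(t, tasks)
        # Modelled old scheme: only the task's own setting is checked here.
        return t.oom_adj == OOM_DISABLE

    candidates = [t for t in tasks if not exempt(t)]
    return max(candidates, key=lambda t: t.badness, default=None)


def run_oom_killer(tasks, per_mm, max_rounds=5):
    """Return (victim name or None, rounds used).

    Running out of rounds without a victim stands in for the livelock.
    """
    for rounds in range(1, max_rounds + 1):
        victim = select_victim(tasks, per_mm)
        if victim is None:
            return None, rounds
        if not mm_is_protected(victim, tasks):
            return victim.name, rounds
        # Kill refused; nothing changed, so the same victim is picked again.
    return None, max_rounds


tasks = [
    Task("hog", mm=1, badness=100),
    Task("hog-thread", mm=1, badness=90, oom_adj=OOM_DISABLE),
    Task("small", mm=2, badness=10),
]
print(run_oom_killer(tasks, per_mm=False))
print(run_oom_killer(tasks, per_mm=True))
```

In this model the per-task run keeps choosing "hog" and never makes progress, while the per-mm run skips the protected memory up front and moves on to another task.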

David also posted a patch that would invoke the OOM killer for GFP_NOFAIL, which caused some discussion because it included handling for order>0 (in fact, greater than PAGE_ALLOC_COSTLY_ORDER) in combination with GFP_NOFAIL allocations, which many people considered a sign that it was time to ban such combinations rather than special-case them. David pointed out that doing that, or just deprecating GFP_NOFAIL altogether, were worthwhile activities but were unrelated to the specific patch at hand - the intention of his patch was simply to deny the OOM killer in a situation where it won't be of any help.

Read/write DebugFS. Mike Frysinger posted an extension to debugfs that would allow developers to choose a standard debugfs file_operations struct (by means of appropriate file mode bits at debugfs_create time) that enforced read-only or write-only filesystem entries. This is particularly intended for the case of kernel developers wishing to access read-only or write-only hardware device registers. The patch extends debugfs_create functions in the obvious way.
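The mode-bit selection can be sketched in a few lines of Python. This is only a model of the idea, not the patch itself: pick_fops and the table names "fops_ro", "fops_wo" and "fops_rw" are hypothetical, and the two masks simply mirror the kernel's S_IRUGO (0444) and S_IWUGO (0222) macros.

```python
import stat

# "Any read bit" and "any write bit" masks, mirroring S_IRUGO and S_IWUGO.
S_IRUGO = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH  # 0o444
S_IWUGO = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH  # 0o222


def pick_fops(mode):
    """Choose a (hypothetical) file_operations table from the file mode."""
    if not mode & S_IWUGO:
        return "fops_ro"   # no write bits set: read-only entry
    if not mode & S_IRUGO:
        return "fops_wo"   # no read bits set: write-only entry
    return "fops_rw"       # both kinds of bits set: ordinary read/write entry


for mode in (0o444, 0o200, 0o644):
    print(oct(mode), pick_fops(mode))
```

The point of keying off the mode is that callers do not need a new API: a register that must never be read is simply created without any read bits.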

Dynamic ftrace support for s390. Not to be left out, Heiko Carstens posted dynamic ftrace support for IBM's System/390 mainframe systems. In his patch announcement, Heiko noted that s390 offers an alternative to disabling kernel text section write protection before patching instruction call sites (remember, modern kernels write-protect the kernel text section): a special set of instructions that can bypass the kernel virtual addresses and work directly on the underlying physical pages. For this reason, he suggests that an arch-specific probe_kernel_write() be added.

Kprobe-based event tracing. Masami Hiramatsu posted version 9 of his kprobe-based event tracer for x86. This event tracer behaves similarly to the other in-kernel tracers, except it uses kprobes in order to allow for insertion at arbitrary points, including in the middle of functions. The patch also includes an x86(-64) instruction decoder that may be able to share pieces with KVM's decoder at some point in the future. The patchset has undergone a little evolution since it was first posted, and includes helpful documentation for those who are interested in playing with it.

Resetting the TSC at the beginning of check_tsc_warp. Luming Yu had previously started a small thread in which he proposed forcibly writing to the TSC on certain systems with an out-of-sync initial TSC value in order to ensure it is in sync between multiple cores. As Andi Kleen pointed out, this might help one special-case system, but will break most other systems. He suggested the correct fix might be to "give up and say the system won't be able to use TSC unless the BIOS fixes its act" (the BIOS is of course ultimately responsible for early pre-Linux CPU initialization).

Several miscellaneous pull requests continue to come in, including hang fixes for the PowerPC and XFS trees. Joerg Roedel added a dma-debug driver filter that allows developers to choose which drivers' DMA problems will be reported.

In today's announcements: Dan Carpenter noted that Smatch 1.53 has been released. From the announcement, "Smatch is a source code checker for C. Right now the focus is on checking for kernel bugs". Certainly, it's worth digging into the differences between Smatch and sparse, Linus' existing checker. Jeff Garzik followed up to the announcement of mdadm 3.0 - which includes support for userspace metadata updates - with a rant about how distribution vendors must now choose between RAID stack MD and software RAID DM based on knowing the underlying RAID metadata format that is required, and the features one wants to get out of it. Others drew parallels between the ongoing MD/DM activities and the switchover from legacy IDE code to libata, which of course Jeff is the owner of. Either way, Jeff made a number of very good points.

The latest kernel on Tuesday evening was 2.6.30-rc7, which was released by Linus over the US Memorial Day Weekend holiday last Saturday night. A number of fixes would lead to a release of rc8 shortly thereafter, as shall be covered by the next installment of this podcast.

Stephen Rothwell posted a linux-next tree for Tuesday. Since Monday, the asm-generic tree gained a conflict against the arm-current tree, and the tree continues to fail to build in an allyesconfig powerpc build configuration. The total sub-tree count remains steady from the previous build at 142 trees.

Finally today, a little reminder for those using Rhythmbox to listen to this podcast. Several people have reported a recent bug in GNOME VFS/Rhythmbox integration wherein the gvfsd-http process may get stuck and need killing. If you experience issues fetching this (or other podcasts), try that first.

That's a summary of today's LKML traffic. For further information visit kernel.org. I'm Jon Masters.

last edited 2009-06-04 04:01:02 by JonMasters