For the weekend of Jun 7th 2009, I'm Jon Masters with a summary of the day's LKML traffic.
Apologies for the slight delay, your author was reading Knuth at the beach over the weekend, in between looking at patches, and writing a ZNC plugin IRC->SMS/email/Twitter gateway that uses Greg K-H's 'bti' tool.
In today's issue: Mild ext4 filesystem corruption, private anonymous mmaps, performance overhead, introducing the initdev patchset, IDE fixes, Performance Counters, Introducing this_cpu_xx operations, converting ftrace syscalls to TRACE_EVENT, the IEEE 802.15.4 stack, DebugFS documentation, CPU hard limits, CONFIG_VFAT_NO_CREATE_WITH_LONGNAMES, and benchmarking the Per-bdi writeback flusher threads patchset.
Mild ext4 filesystem corruption. Alan Jenkins rediscovered an existing bug in ext4 that had not received a large amount of on-list attention. On shutdown, he noticed that a file generated by locale-gen was being corrupted. On a subsequent reboot, the MD5 would no longer be correct. He (Alan) proved this by modifying his Debian shutdown scripts to capture various state, and by setting the file to root read-only and immutable. Others suggested trying to drop caches, perform various syncing, and other ideas. Ted Ts'o later followed up, noting that this seemed to be a previously-reported bug that is only triggered when a filesystem is unmounted and remounted. Apparently, someone at Google is currently looking into this.
Private anonymous mmap. Julian Phillips recently asked about a change in kernel behavior. He (Julian) had a program that creates a reasonably large private anonymous map and was finding that, previously, memory would only be allocated on write (as one might expect). But in kernel 2.6.24.5 he noticed that memory would also be allocated on read from untouched pages. Quoting Julian, "Basically I seem to be seeing copy-on-read instead of copy-on-write type behavior". He subsequently confirmed that Robert Hancock was correct in attributing this to a previous git commit affecting ZERO_PAGE assignment. This wasn't the only obscure VM issue to be discussed. Hugh Dickins, in NACKing a patch from Minchan Kim, discussed the overloading of page_table_lock in anon_vma_prepare, explaining how two different threads might be working simultaneously on different anonymous VMAs and thus require the lock.
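For listeners who want to picture the kind of program Julian describes, here is a minimal sketch in Python, assuming a Unix-like system. It is not Julian's program; the helper names (make_private_anon_map, read_every_page, write_every_page) are invented for illustration. It creates a private anonymous mapping and then touches each page, first by reading and then by writing.

```python
import mmap

PAGE = mmap.PAGESIZE  # size of one page on this system


def make_private_anon_map(n_pages):
    """Create a private anonymous mapping of n_pages pages (fileno -1 means no backing file)."""
    return mmap.mmap(
        -1,
        n_pages * PAGE,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )


def read_every_page(m):
    """Read one byte from each page and return the sum of the bytes read."""
    total = 0
    for off in range(0, len(m), PAGE):
        total += m[off]
    return total


def write_every_page(m):
    """Write one byte to each page, which forces each page to get real memory."""
    for off in range(0, len(m), PAGE):
        m[off] = 1
```

Untouched anonymous pages read back as zeros, so read_every_page returns 0 on a fresh mapping. Whether those reads also cause memory to be allocated is exactly the kernel-dependent behavior under discussion; on Linux one could compare the process's resident set size before and after each pass to see the difference.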
Performance overhead. Listeners may recall a dialog between Rusty Russell and Ingo Molnar last week, in which Rusty responded to Ingo's assertion that CONFIG_PARAVIRT_OPS had a very high overhead to users, with a series of benchmarks showing the impact of other configuration options that vendors (and other users) enable in their kernels also. In the most recent discussion, Dave McCracken followed up to comments from Linus and others saying that he saw Rusty's benchmarks as more of an example of the impact of vendor kernel configurations vs. optimizing a kernel configuration for a specific target. Quoting Rusty, 'Well, Ingo was ranting because (paraphase) "no other config option when *used* has as much impact as CONFIG_PARAVIRT!!!!!!". That was the point of my mail; facts show it's simply untrue.'
Introducing the initdev patchset. Stepping into the early (root) device initialization fray, David VomLehn announced the release of a new patchset aimed at replacing the "rootwait" kernel parameter. With the new patches applied, boot-time users of devices specified in the kernel command line can apparently cause the kernel to wait until their attached devices have been discovered. Currently, both USB and SCSI buses are supported. Stefan Richter looked over the patch series, and found a large number of issues with the current implementation. Quoting Stefan, "Hence the whole thing, as currently implemented is quite useless: The user/admin has to guess what a safe rootdelay value is, and then the kernel will always be delayed for >= rootdelay".
IDE Fixes. Bartlomiej Zolnierkiewicz requested that a number of IDE "fixes" be pulled from his git tree into the pending 2.6.30 release. These seemed, on their face, to be fairly innocuous, but included a change in feature support for HPA - "Host Protected Area" on hard disks - that a number of people took objection to, not least James Bottomley and Linus Torvalds. Although these changes might seem minor, nobody was even remotely interested in pulling in modifications to partition handling on disks after rc8. Quoting Linus in his entirety, for context: "The thing is, I had planned on doing a final release yesterday, even before your pull came in. I decided to hold off, let it be, just to test the _current_ state a bit more. No way am I then adding some effectively totally untested new code-path. And if you start messing in partitions.c and adding whole new callback functions to generic block_device_operations, then that's a new code-path. The IDE subsystem has _no_ business adding random new callbacks in the very last days of a release. It sure as hell is not just a bugfix, it's a new feature. The feature may be _needed_ for some specific bug report, but that is totally irrelevant. We don't do things like that. It's also almost certainly not a regression, is it? So by no measure does it work as a "late in the -rc sequence" patch."
Performance Counters. Ingo Molnar announced version 8 of his performance counters patchset, which he has been working on in combination with Peter Zijlstra, and others, for some time. The new subsystem adds a new system call (sys_perf_counter_open()) and it provides the new 'perf' tool that makes use of these new kernel capabilities. The patchset includes a large amount of documentation, a very complete tool (it's atypical for such tools to be shipped inside the kernel source - especially a tool which claims to integrate "all things performance analysis under one roof"), and represents a very polished effort to add support for these modern CPU counters. Performance counters have been included in all recent AMD and Intel chips and allow users to capture information on a specified subset of the CPU's operation, for example pipeline stalls, cache hits, and even SMI total counts. They rely upon a limited number of CPU registers to expose this information - although that count has rapidly increased on recent CPUs (from 2 on the Pentium III to 18 on the Pentium 4 chip, according to the Wikipedia article on the subject).
Introducing this_cpu_xx operations. Christoph Lameter announced a new set of kernel operations allowing efficient access to per-cpu variables for the current processor. Quoting Christoph, "Currently there is no way in the core to calculate the address of the instance of a per-cpu variable without a table lookup through [per_cpu_ptr]". One caveat is that the current patchset does not yet work with System 390, although that support will be forthcoming as it is pending another patch from Tejun Heo.
Converting Ftrace syscalls to TRACE_EVENT. Jason Baron posted a followup patchset, implementing his previous RFC discussion on the topic of converting ftrace syscall tracing to TRACE_EVENT. With the new patchset, one can toggle on/off individual syscalls using the DebugFS interface - which now includes a new "trace_syscalls" entry.
IEEE 802.15.4. Dmitry Eremin-Solenikov posted version 2 of an implementation of the IEEE 802.15.4 protocol for Linux. This stack implements the LR-WPAN or "Low-rate wireless personal area networks" standard and is the basis for specifications including ZigBee. It is this author's recollection that these are supposed to be lower power, more easily implemented alternatives to heavyweight stacks, such as Bluetooth.
DebugFS documentation. True to his word, Jonathan Corbet (of Linux Weekly News - http://www.lwn.net/) followed up to his previous comments with a revised version of his debugfs API documentation. The previous document (written for LWN) was very complete back in the day, but the implementation has varied considerably since then and was no longer accurate. The new document is fairly complete and is being merged into Jonathan's documentation tree.
CPU hard limits. Bharata B Rao and Avi Kivity had a protracted dialog concerning the nature of CPU limits and guarantees. The basic object of discussion was an apparent need for certain users to assign guaranteed CPU resources (and hard limits) to certain tasks - for example in order to provide a customer with a guaranteed 10% of the CPU. The debate centered around whether it is truly possible to make certain guarantees to users in terms of CPU availability, how this should be done, and how one should do this optimally so as to keep the CPU fully utilized.
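To make the utilization concern concrete, here is a toy model in Python. It is purely illustrative and is not the scheduler code or the algorithm from the patches under discussion; the function share_cpu, its parameters, and the task names are all invented for this sketch. It splits a CPU fairly among tasks, optionally capping each task at a hard limit, and reports how much CPU is left idle.

```python
def share_cpu(demands, hard_limits=None, capacity=100.0):
    """Split `capacity` percent of a CPU fairly among tasks.

    demands:     dict of task name -> percent of CPU the task would like
    hard_limits: optional dict of task name -> maximum percent allowed
    Returns (allocations, idle_percent).
    """
    hard_limits = hard_limits or {}
    # A task never gets more than it asks for, nor more than its hard limit.
    caps = {t: min(d, hard_limits.get(t, capacity)) for t, d in demands.items()}
    alloc = {t: 0.0 for t in demands}
    remaining = capacity
    active = set(demands)
    while active and remaining > 1e-9:
        fair = remaining / len(active)
        # Tasks whose cap fits within the current fair share are fully satisfied.
        satisfied = {t for t in active if caps[t] <= fair}
        if not satisfied:
            # Everyone left wants more than the fair share: split what remains evenly.
            for t in active:
                alloc[t] = fair
            remaining = 0.0
            break
        for t in satisfied:
            alloc[t] = caps[t]
            remaining -= caps[t]
        active -= satisfied
    return alloc, remaining
```

In this model, a "customer" task that wants the whole CPU alongside a "batch" task that wants 5% gets 95% when there is no limit, and nothing sits idle. Cap the customer at a hard limit of 10% and 85% of the CPU goes unused, which is the trade-off the thread keeps returning to.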
CONFIG_VFAT_NO_CREATE_WITH_LONGNAMES. While it may seem like all has been quiet on the VFAT front recently, this is apparently not true. Vimal Singh asked whether Tridge's previous patch (which disables creation of long filenames on mounted VFAT filesystems - but not the reading of existing long filenames that might already be there) was still acceptable, to which Tridge replied that it was although, quoting Tridge, "as several people pointed out it lacked a public explanation of why it is needed. We are working on fixing that." He apparently is also currently working on a slightly enhanced version, since the existing one "loses more functionality than is strictly necessary".
Per-bdi writeback flusher threads. Frederic Weisbecker published the results of another round of testing of Jens Axboe's per-BDI writeback flusher threads patchset. This second set was produced "only with bdi-writeback" this time and is available from Frederic's kernel.org people page.
In today's announcements: Atheros 802.11n USB firmware source code released. Luis R. Rodriguez announced that firmware source for the Atheros 802.11n USB Otus/at9170 device was released under the GPL.
The latest kernel release is 2.6.30-rc8, which was released by Linus last week. A final 2.6.30 release is imminent now.
The latest release of the 2.4 series kernel is now 2.4.37.2, which was released by Willy Tarreau on Sunday afternoon. The new release fixes a regression brought in by 2.4.37.1 in which the CAP_KILL fix caused modprobe to leave zombies on auto-loading, and includes an SCTP overflow fix as documented in CVE-2009-0065 that had been too late for the previous cycle.
Stephen Rothwell posted a linux-next tree for June 5th. Since the previous day a total of 4 trees have gained conflicts and the powerpc tree continues to fail to build in an allyesconfig build configuration.
Andrew Morton posted an mm-of-the-moment for 2009-06-05-16-19.
Rafael J. Wysocki posted a list of reported regressions between 2.6.29 and the automatically generated 2.6.30-rc8-git4 tree. The total number of regressions currently stands at 37, of which 36 are pending and 28 are unresolved. This number is fairly congruent with the previous update.
That's a summary of today's LKML traffic. For further information visit kernel.org. I'm Jon Masters.