For the weekend of Jun 7th 2009, I'm Jon Masters with a summary of the day's
LKML traffic.

Apologies for the slight delay; your author was reading Knuth at the beach
over the weekend, in between looking at patches, and writing a ZNC plugin
IRC->SMS/email/Twitter gateway that uses Greg K-H's 'bti' tool.

In today's issue: Mild ext4 filesystem corruption, private anonymous mmaps,
performance overhead, introducing the initdev patchset, IDE fixes,
Performance Counters, introducing this_cpu_xx operations, converting ftrace
syscalls to TRACE_EVENT, the IEEE 802.15.4 stack, DebugFS documentation,
CPU hard limits, CONFIG_VFAT_NO_CREATE_WITH_LONGNAMES, and benchmarking the
Per-bdi writeback flusher threads patchset.

Mild ext4 filesystem corruption. Alan Jenkins rediscovered an existing bug in
ext4 that had not received a large amount of on-list attention. On shutdown,
he noticed that a file generated by locale-gen was being corrupted. On a
subsequent reboot, the MD5 would no longer be correct. He (Alan) proved this
by modifying his Debian shutdown scripts to capture various state, and by
setting the file to root read-only and immutable. Others suggested dropping
caches, forcing various syncs, and other ideas. Ted Ts'o later
followed up, noting that this seemed to be a previously-reported bug that is
only triggered when a filesystem is unmounted and remounted. Apparently,
someone at Google is currently looking into this.
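
The kind of check Alan describes can be sketched in a few lines of Python:
record a file's MD5 before shutdown, then compare it after the next boot.
This is only a minimal sketch and not Alan's actual script; the function
names and the idea of passing in a previously saved digest are assumptions
made for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, saved_digest):
    """True if the file still hashes to the digest recorded earlier."""
    return md5_of_file(path) == saved_digest
```

Saving the output of md5_of_file() somewhere that survives the reboot, and
calling is_unchanged() afterwards, would flag the mismatch described above.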

Private anonymous mmap. Julian Phillips recently asked about a change in
kernel behavior. He (Julian) had a program that creates a reasonably large
private anonymous map and was finding that, previously, memory would only
be allocated on write (as one might expect). But in kernel 2.6.24.5 he noticed
that memory would also be allocated on read from untouched pages. Quoting
Julian, "Basically I seem to be seeing copy-on-read instead of copy-on-write
type behavior". He subsequently confirmed that Robert Hancock was correct in
attributing this to a previous git commit affecting ZERO_PAGE assignment. This
wasn't the only obscure VM issue to be discussed. Hugh Dickins, in NACKing a
patch from Minchan Kim, discussed the overloading of page_table_lock in
anon_vma_prepare, explaining how two different threads might be working
simultaneously on different anonymous VMAs and thus require the lock.
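
For listeners who want to try the private anonymous mapping case themselves,
here is a minimal Python sketch, assuming a Linux system with /proc mounted.
It maps a private anonymous region, reads one byte from each untouched page,
and reports how far the resident set of the process grew. The helper names
are made up for this example, and the measurement is rough, since the
interpreter allocates memory of its own.

```python
import mmap

PAGE = mmap.PAGESIZE


def resident_pages():
    """Resident set size of this process, in pages (Linux /proc only)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1])


def touch_pages_by_reading(size):
    """Map `size` bytes privately and anonymously, read one byte per page,
    and return (growth in resident pages, True if every byte read was 0)."""
    m = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    before = resident_pages()
    # Only reads happen here; nothing is ever written to the mapping.
    all_zero = all(m[off] == 0 for off in range(0, size, PAGE))
    after = resident_pages()
    m.close()
    return after - before, all_zero
```

As a rough guide, growth close to the number of pages read would match the
behavior Julian reported, while a kernel that satisfies such reads from a
shared zero page without charging it to the process should show little
growth. Either way, every byte read comes back as zero.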

Performance overhead. Listeners may recall a dialog between Rusty Russell
and Ingo Molnar last week, in which Rusty responded to Ingo's assertion that
CONFIG_PARAVIRT_OPS had a very high overhead to users, with a series of
benchmarks showing the impact of other configuration options that vendors (and
other users) enable in their kernels also. In the most recent discussion,
Dave McCracken followed up to comments from Linus and others saying that he
saw Rusty's benchmarks as more of an example of the impact of vendor kernel
configurations vs. optimizing a kernel configuration for a specific target.
Quoting Rusty, 'Well, Ingo was ranting because (paraphase) "no other config
option when *used* has as much impact as CONFIG_PARAVIRT!!!!!!". That was
the point of my mail; facts show it's simply untrue.'

Introducing the initdev patchset. Stepping into the early (root) device
initialization fray, David VomLehn announced the release of a new patchset
aimed at replacing the "rootwait" kernel parameter. With the new patches
applied, boot-time users of devices specified in the kernel command line can
apparently cause the kernel to wait until their attached devices have been
discovered. Currently, both USB and SCSI buses are supported. Stefan Richter
looked over the patch series, and found a large number of issues with the
current implementation. Quoting Stefan, "Hence the whole thing, as currently
implemented is quite useless: The user/admin has to guess what a safe rootdelay
value is, and then the kernel will always be delayed for >= rootdelay".

IDE Fixes. Bartlomiej Zolnierkiewicz requested that a number of IDE "fixes" be
pulled from his git tree into the pending 2.6.30 release. These seemed, on
their face, to be fairly innocuous, but included a change in feature support
for HPA ("Host Protected Area") on hard disks that a number of people
objected to, not least James Bottomley and Linus Torvalds. Although these
changes might seem minor, nobody was even remotely interested in pulling in
modifications to partition handling on disks after rc8. Quoting Linus in his
entirety, for context: "The thing is, I had planned
on doing a final release yesterday, even before your pull came in. I decided
to hold off, let it be, just to test the _current_ state a bit more. No way am
I then adding some effectively totally untested new code-path. And if you start
messing in partitions.c and adding whole new callback functions to generic
block_device_operations, then that's a new code-path. The IDE subsystem has
_no_ business adding random new callbacks in the very last days of a release.
It sure as hell is not just a bugfix, it's a new feature. The feature may be
_needed_ for some specific bug report, but that is totally irrelevant. We
don't do things like that. It's also almost certainly not a regression, is
it? So by no measure does it work as a "late in the -rc sequence" patch."

Performance Counters. Ingo Molnar announced version 8 of his performance
counters patchset, which he has been working on in combination with Peter
Zijlstra, and others, for some time. The new subsystem adds a new system call
(sys_perf_counter_open()) and it provides the new 'perf' tool that makes use of
these new kernel capabilities. The patchset includes a large amount of
documentation, a very complete tool (it's atypical for such tools to be
shipped inside the kernel source - especially a tool which claims to integrate
"all things performance analysis under one roof"), and represents a very
polished effort to add support for these modern CPU counters. Performance
counters have been included in all recent AMD and Intel chips and allow users
to capture information on a specified subset of the CPU's operation, for
example pipeline stalls, cache hits, and even SMI total counts. They rely upon
a limited number of CPU registers to expose this information - although that
count has rapidly increased on recent CPUs (from 2 on the Pentium III to 18 on
the Pentium 4 chip, according to the Wikipedia article on the subject).

Introducing this_cpu_xx operations. Christoph Lameter announced a new set of
kernel operations allowing efficient access to per-cpu variables for the
current processor. Quoting Christoph, "Currently there is no way in the core
to calculate the address of the instance of a per-cpu variable without a table
lookup through [per_cpu_ptr]". One caveat is that the current patchset does
not yet work with System 390, although that support will be forthcoming as
it is pending another patch from Tejun Heo.

Converting Ftrace syscalls to TRACE_EVENT. Jason Baron posted a followup
patchset, implementing his previous RFC discussion on the topic of converting
ftrace syscall tracing to TRACE_EVENT. With the new patchset, one can toggle
on/off individual syscalls using the DebugFS interface - which now includes a
new "trace_syscalls" entry.

IEEE 802.15.4. Dmitry Eremin-Solenikov posted version 2 of an implementation
of the IEEE 802.15.4 protocol for Linux. This stack implements the LR-WPAN
or "Low-rate wireless personal area networks" standard and is the basis for
specifications including ZigBee. It is this author's recollection that these
are supposed to be lower power, more easily implemented alternatives to
heavyweight stacks, such as Bluetooth.

DebugFS documentation. True to his word, Jonathan Corbet (of Linux Weekly
News - http://www.lwn.net/) followed up on his previous comments with a revised
version of his debugfs API documentation. The previous document (written for
LWN) was very complete back in the day, but the implementation has varied
considerably since then and was no longer accurate. The new document is fairly
complete and is being merged into Jonathan's documentation tree.

CPU hard limits. Bharata B Rao and Avi Kivity had a protracted dialog
concerning the nature of CPU limits and guarantees. The basic object of
discussion was an apparent need for certain users to assign guaranteed CPU
resources (and hard limits) to certain tasks - for example in order to provide
a customer with a guaranteed 10% of the CPU. The debate centered around
whether it is truly possible to make certain guarantees to users in terms of
CPU availability, how this should be done, and how one should do this
optimally so as to avoid the CPU not being at full utilization.
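
One way to see how hard limits and guarantees relate is simple arithmetic: if
every other group is capped, whatever those caps leave over is the least a
given group can count on. The sketch below works under that assumption only;
the group names and the guaranteed_share() helper are hypothetical and do not
come from the thread or from any kernel interface.

```python
def guaranteed_share(limits, group):
    """Worst-case CPU percentage left for `group` when every other group
    runs flat out at its hard limit.

    `limits` maps a group name to its cap as a whole percentage (0-100).
    """
    others = sum(cap for name, cap in limits.items() if name != group)
    leftover = max(0, 100 - others)
    # A group can never use more than its own cap, even if more is left over.
    return min(limits[group], leftover)


# Hypothetical caps, chosen so that the other groups leave exactly 10%.
limits = {"customer_a": 10, "customer_b": 50, "batch": 40}
share = guaranteed_share(limits, "customer_a")
```

With these made-up numbers customer_a can count on 10 percent in the worst
case. If the other two groups sit idle, though, its own cap still holds it to
10 percent and the rest of the CPU goes unused, which is the utilization
concern raised in the debate.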

CONFIG_VFAT_NO_CREATE_WITH_LONGNAMES. While it may seem like all has been
quiet on the VFAT front recently, this is apparently not true. Vimal Singh
asked whether Tridge's previous patch (which disables creation of long
filenames on mounted VFAT filesystems - but not the reading of existing long
filenames that might already be there) was still acceptable, to which Tridge
replied that it was, although, quoting Tridge, "as several people pointed out
it lacked a public explanation of why it is needed. We are working on fixing
that." He apparently is also currently working on a slightly enhanced version,
since the existing one "loses more functionality than is strictly necessary".

Per-bdi writeback flusher threads. Frederic Weisbecker published the results
of another round of testing of Jens Axboe's per-BDI writeback flusher threads
patchset. This second set was produced "only with bdi-writeback" this time
and is available from Frederic's kernel.org people page.

In today's announcements: Atheros 802.11n USB firmware source code released.
Luis R. Rodriguez announced that firmware source for the Atheros 802.11n USB
Otus/ar9170 device was released under the GPL.

The latest kernel release is 2.6.30-rc8, which was released by Linus last
week. A final 2.6.30 release is imminent now.

The latest release of the 2.4 series kernel is now 2.4.37.2, which was
released by Willy Tarreau on Sunday afternoon. The new release fixes a
regression brought in by 2.4.37.1 in which the CAP_KILL fix caused modprobe to
leave zombies on auto-loading, and includes an SCTP overflow fix as documented
in CVE-2009-0065 that had been too late for the previous cycle.

Stephen Rothwell posted a linux-next tree for June 5th. Since the previous day
a total of 4 trees have gained conflicts and the powerpc tree continues to
fail to build in an allyesconfig build configuration.

Andrew Morton posted an mm-of-the-moment for 2009-06-05-16-19.

Rafael J. Wysocki posted a list of reported regressions between 2.6.29 and
the automatically generated 2.6.30-rc8-git4 tree. The total number of
regressions currently stands at 37, of which 36 are pending and 28 are
unresolved. This number is fairly congruent with the previous update.

That's a summary of today's LKML traffic. For further information visit
kernel.org. I'm Jon Masters.