KernelNewbies:

Linux 2.6.31 kernel released on 9 September, 2009

Spam: Valerie Aurora has published on LWN [http://lwn.net/Articles/342892/ a great article explaining some parts of the deep internals of btrfs]. Since btrfs is expected to replace Ext4 at some point, it's an interesting read.

Summary: This version adds USB 3.0 support, an equivalent of FUSE for character devices used for proxying OSS sound to ALSA, some memory management changes that improve interactivity in desktops, readahead improvements, ATI Radeon Modesetting support, support for Intel's Wireless Multicomm 3200 Wifi devices, kernel support and a userspace tool for performance counters, gcov support, a memory checker for uninitialized memory, a memory leak detector, a reimplementation of inotify and dnotify on top of a new filesystem notification infrastructure, btrfs improvements, support for the IEEE 802.15.4 network standard, IPv4 over Firewire, many new drivers, small improvements and fixes.

1. Prominent features (the cool stuff)

1.1. USB 3 support

This version of Linux adds support for USB 3.0 devices (contributed by Sarah Sharp from Intel) and for the hardware that implements the [http://www.intel.com/technology/usb/xhcispec.htm eXtensible Host Controller Interface (xHCI) 0.95 specification]. No xHCI hardware has made it onto the market yet, but these patches have been tested with the Fresco Logic host controller prototype.

Code: drivers/usb/host/xhci*

1.2. CUSE (character devices in userspace) and OSS Proxy

Recommended LWN article: [http://lwn.net/Articles/308445/ Character devices in user space]

CUSE is an extension of FUSE that allows character devices to be implemented in userspace. It was contributed by Tejun Heo (SUSE).

It can be used for many things, for example "proxying" OSS audio from OSS apps through the ALSA userspace layer, or to an audio system which can forward the sound through the network. ALSA contains OSS emulation, but sadly the emulation is in the kernel, behind the userland multiplexing layer. This means that if your sound card doesn't support multiple audio streams (most modern cards don't), only one of the ALSA and OSS emulation interfaces is usable at any given moment.

OSS Proxy uses CUSE to implement the OSS interface: /dev/dsp, /dev/adsp and /dev/mixer. From the point of view of the applications, these devices are proper character devices and behave exactly the same way, so OSS Proxy can be used as a replacement for the in-kernel ALSA OSS emulation layer. The app sends the audio to these CUSE devices, and OSS Proxy forwards it to a "slave" (currently only one slave is implemented, pulseaudio).

Code: CUSE [http://git.kernel.org/linus/151060ac13144208bd7601d17e4c92c59b98072f (commit)] OSS Proxy home and code: http://userweb.kernel.org/~tj/ossp/

1.3. Improve desktop interactivity under memory pressure

PROT_EXEC pages normally belong to currently running executables and their linked libraries. They should be cached aggressively to provide a good user experience: if they aren't, desktop applications suffer long, noticeable pauses whenever execution jumps to a part of the code that is not cached in memory and has to be read from disk, which is very slow. Due to some memory management scalability work in recent kernel versions, some commonly used workloads can send these PROT_EXEC pages to the list of inactive filesystem-backed pages (the ones used to map files), from which they can get flushed out of the working set. The result is a desktop environment with poor interactivity: applications become unresponsive too easily.

In this version, some heuristics make it much harder for mapped executable pages to leave the list of active pages. The result is an improved desktop experience: benchmarks on memory-tight desktops show clock time and major faults reduced by 50%, and pswpin numbers reduced to ~1/3, which means X desktop responsiveness is doubled under high memory/swap pressure. Memory flushing benchmarks on a file server show the number of major faults going from 50 to 3 during 10% cache hot reads. See the commit links for more details and benchmarks.
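
The effect of such a heuristic can be illustrated with a toy model of the active and inactive lists. This is a minimal sketch, not the kernel's code: the `Page` class, the `shrink_active_list` helper and the rule "a referenced executable page gets another round on the active list" are simplifying assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    executable: bool = False   # stands in for a PROT_EXEC file mapping
    referenced: bool = False   # touched since the last scan


def shrink_active_list(active, inactive, nr_to_scan, protect_exec):
    """Scan the oldest active pages and move them to the inactive list.

    With protect_exec=True, referenced executable pages are rotated
    back onto the active list instead of being deactivated.
    """
    for _ in range(min(nr_to_scan, len(active))):
        page = active.popleft()             # oldest active page
        keep = protect_exec and page.executable and page.referenced
        page.referenced = False             # must be touched again to survive
        if keep:
            active.append(page)             # back to the young end
        else:
            inactive.append(page)           # now a candidate for eviction


def simulate(protect_exec):
    active = deque([
        Page("libgtk.so:text", executable=True, referenced=True),
        Page("bigfile:0"), Page("bigfile:1"), Page("bigfile:2"),
    ])
    inactive = deque()
    shrink_active_list(active, inactive, nr_to_scan=2, protect_exec=protect_exec)
    return [p.name for p in active], [p.name for p in inactive]


print("without heuristic:", simulate(False))
print("with heuristic:   ", simulate(True))
```

Without the heuristic the library's code page lands on the inactive list together with the streamed file data; with it, only the file data is deactivated.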

Code: [http://git.kernel.org/linus/56e49d218890f49b0057710a4b6fef31f5ffbfec (commit 1], [http://git.kernel.org/linus/6fe6b7e35785e3232ffe7f81d3893f1316710a02 2], [http://git.kernel.org/linus/8cab4754d24a0f2e05920170c845bd84472814c6 3)]

1.4. ATI Radeon Kernel Mode Setting support

This version adds Kernel Mode Setting (KMS) support for ATI Radeon. The supported hardware is R1XX, R2XX, R3XX, R4XX and R5XX (Radeon up to X1950). Work is underway to provide support for R6XX, R7XX and newer hardware (Radeon from HD2XXX to HD4XXX).

Code: [http://git.kernel.org/linus/771fe6b912fca54f03e8a72eb63058b582775362 (commit)], [http://git.kernel.org/linus/ba4e7d973dd09b66912ac4c0856add8b0703a997 (commit)]

1.5. Performance Counters

Recommended LWN article: [http://lwn.net/Articles/311850/ Followups: performance counters, ksplice, and fsnotify]

The Performance Counter subsystem provides an abstraction of the special performance counter hardware registers available on most modern CPUs. These registers count the number of certain types of hardware events, such as instructions executed, cache misses suffered, or branches mispredicted, without slowing down the kernel or applications. These registers can also trigger interrupts when a threshold number of events has passed, and can thus be used to profile the code that runs on that CPU. This release adds support for x86 and PPC, and partial support for S390 and FRV.

Users are not expected to use the API themselves. Instead, a powerful performance analysis tool has been built: "perf", which is available at tools/perf/ (an unusual decision to include kernel-related userspace software in the kernel tree).

perf supports several modes of operation. "perf top" shows a top-like interface, which you can restrict to any given set of events, process or CPU. "perf record" records a profile into a file, "perf report" reads the profile and shows it on the screen, and "perf annotate" reads the data and displays the annotated code. There's also "perf list", which shows the list of events supported by the hardware, and "perf stat", which runs a command and gathers performance statistics that are printed to the screen. All the documentation and man pages are available in the 'Documentation' subdirectory. Some examples:

$ ./perf stat -r 3 -- echo -n

 Performance counter stats for 'echo -n' (3 runs):

       2.337404  task-clock-msecs         #      0.566 CPUs    ( +-   1.704% )
              1  context-switches         #      0.000 M/sec   ( +-   0.000% )
              0  CPU-migrations           #      0.000 M/sec   ( +-   0.000% )
            184  page-faults              #      0.079 M/sec   ( +-   0.000% )
        4319963  cycles                   #   1848.188 M/sec   ( +-   1.615% )
        5024608  instructions             #      1.163 IPC     ( +-   0.722% )
          73278  cache-references         #     31.350 M/sec   ( +-   1.636% )
           2019  cache-misses             #      0.864 M/sec   ( +-   6.535% )

    0.004126139  seconds time elapsed   ( +-  24.603% )


$ perf report -s comm,dso,symbol -C firefox -d /usr/lib64/xulrunner-1.9.1/libxul.so | grep :: | head
         2.21%  [.] nsDeque::Push(void*)
         1.78%  [.] GraphWalker::DoWalk(nsDeque&)
         1.30%  [.] GCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*)
         1.27%  [.] XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
         1.18%  [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&)
         1.13%  [.] nsDeque::PopFront()
         1.11%  [.] nsGlobalWindow::RunTimeout(nsTimeout*)
         0.97%  [.] nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)
         0.95%  [.] nsJSEventListener::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)
         0.95%  [.] nsCOMPtr_base::~nsCOMPtr_base()
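
The right-hand column of the "perf stat" run above can be reproduced from the raw counts. The sketch below assumes that the rates are event counts divided by the task-clock time and that IPC is instructions divided by cycles; the helper name `rate_m_per_sec` is made up for this example.

```python
# Raw values copied from the 'perf stat' output above.
task_clock_msecs = 2.337404
cycles = 4319963
instructions = 5024608
cache_references = 73278


def rate_m_per_sec(count, msecs):
    """Events per second, in millions, over the measured task-clock time."""
    return count / (msecs / 1000.0) / 1e6


ipc = instructions / cycles
cycle_rate = rate_m_per_sec(cycles, task_clock_msecs)
cache_ref_rate = rate_m_per_sec(cache_references, task_clock_msecs)

print(f"instructions per cycle: {ipc:.3f}")
print(f"cycles:                 {cycle_rate:.3f} M/sec")
print(f"cache-references:       {cache_ref_rate:.3f} M/sec")
```

Under these assumptions the three printed values agree with the 1.163 IPC, 1848.188 M/sec and 31.350 M/sec figures in the output.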

Code: [http://git.kernel.org/linus/0793a61d4df8daeac6492dbf8d2f3e5713caae5e (commit 1], [http://git.kernel.org/linus/e7bc62b6b3aeaa8849f8383e0cfb7ca6c003adc6 2], [http://git.kernel.org/linus/241771ef016b5c0c83cd7a4372a74321c973c1e6 3], [http://git.kernel.org/linus/d662ed26734473d4cb5f3d78cebfec8f9126e97c 4], [http://git.kernel.org/linus/4574910e5087085a1f330ff8373cee4503f5c77c 5], [http://git.kernel.org/linus/16b067993dee3dfde61b20027e0b168dc06201ee 6], [http://git.kernel.org/linus/f78628374a13bc150db77c6e02d4f2c0a7f932ef 7], [http://git.kernel.org/linus/aabbaa6036fd847c583f585c6bae82b5a033e6c7 8], [http://git.kernel.org/linus/880860e392d92c457e8116cdee39ec4d109174ee 9], [http://git.kernel.org/linus/105988c015943e77092a6568bc5fb7e386df6ccd 10], [http://git.kernel.org/linus/7325927e5a20bfe0f006acf92801bf41c537d3d4 11], [http://git.kernel.org/linus/12310e9c1b9a53896e4df0459039dd125f62aa9b 12)]

1.6. IEEE 802.15.4 Low-Rate Wireless Personal Area Networks support

IEEE Std 802.15.4 defines low data rate, low power, low complexity, short range wireless personal area networks. It was designed to organise networks of sensors, switches and similar automation devices. The maximum allowed data rate is 250 kb/s and the typical personal operating space is around 10 m.

Code: [http://git.kernel.org/linus/fcb94e422479da52ed90bab230c59617a0462416 (commit 1], [http://git.kernel.org/linus/9ec7671603573ede31207eb5b0b3e1aa211b2854 2], [http://git.kernel.org/linus/2c21d11518b688cd4c8e7ddfcd4ba41482ad075b 3], [http://git.kernel.org/linus/02cf228639233aa227a152955a98564c7a18f9ee 4], [http://git.kernel.org/linus/8459464f07cf67cab07b17d5736d75fb86adab22 5)]

1.7. Gcov support

This version enables the use of [http://gcc.gnu.org/onlinedocs/gcc/Gcov.html GCC's coverage testing tool gcov] with the Linux kernel. gcov may be useful for: debugging (has this code been reached at all?), test improvement (how do I change my test to cover these lines?), minimizing kernel configurations (do I need this option if the associated code is never run?) and other things.

Code: [http://git.kernel.org/linus/2521f2c228ad750701ba4702484e31d876dbc386 (commit 1], [http://git.kernel.org/linus/7bf99fb673f18408be1ebc958321ef4c3f6da9e2 2)]

1.8. Kmemcheck

Kmemcheck is a debugging feature for the Linux Kernel. More specifically, it is a dynamic checker that detects and warns about some uses of uninitialized memory. Userspace programmers might be familiar with Valgrind's memcheck. The main difference between memcheck and kmemcheck is that memcheck works for userspace programs only, and kmemcheck works for the kernel only.

Enabling kmemcheck on a kernel will probably slow it down to the extent that the machine will not be usable for normal workloads such as an interactive desktop. kmemcheck will also cause the kernel to use about twice as much memory as normal. For this reason, kmemcheck is strictly a debugging feature.

Code: [http://git.kernel.org/linus/e594c8de3bd4e7732ed3340fb01e18ec94b12df2 (commit 1], [http://git.kernel.org/linus/dfec072ecd35ba6ecad2d51dde325253ac9a2936 2], [http://git.kernel.org/linus/f85612967c93b67b10dd240e3e8bf8a0eee9def7 3], [http://git.kernel.org/linus/2dff440525f8faba8836e9f05297b76f23b4af30 4], [http://git.kernel.org/linus/d7002857dee6e9a3ce1f78d23f37caba106b29c5 5], [http://git.kernel.org/linus/c175eea466e760de4b69b9aad90157e7aa9ff54f 6], [http://git.kernel.org/linus/5a896d9e7c921742d0437a452f991288f4dc2c42 7], [http://git.kernel.org/linus/7d46d9e6dbffe8780aa8430a63543d3f7ba92860 8)]

1.9. Kmemleak

Recommended LWN article: [http://lwn.net/Articles/187979/ Detecting kernel memory leaks]

Kmemleak provides a way of detecting possible kernel memory leaks in a way similar to a [http://en.wikipedia.org/wiki/Garbage_collection_%28computer_science%29#Tracing_garbage_collectors tracing garbage collector], with the difference that the orphan objects are not freed. Instead, a kernel thread scans the memory every 10 minutes (by default), reports any new unreferenced objects it finds in /sys/kernel/debug/kmemleak and warns about them. A similar method is used by the Valgrind tool (memcheck --leak-check) to detect memory leaks in user-space applications.
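
The tracing idea can be sketched in a few lines. This is an illustration of the general technique under simplifying assumptions, not kmemleak's implementation: allocations are modelled as a dictionary of outgoing pointers, and `roots` stands in for the memory that is always scanned. All names are hypothetical.

```python
def find_unreferenced(allocations, roots):
    """Return the ids of allocations that are never reached from the roots.

    allocations: dict mapping an allocation id to the ids of the
                 allocations its memory appears to point to.
    roots:       ids referenced from always-scanned memory.
    """
    reachable = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in reachable or obj not in allocations:
            continue                        # already visited, or not a tracked block
        reachable.add(obj)
        worklist.extend(allocations[obj])   # follow pointers found inside the block
    # Unlike a garbage collector, report the orphans instead of freeing them.
    return sorted(set(allocations) - reachable)


heap = {
    "inode_cache": ["dentry"],
    "dentry": [],
    "leaked_buf": ["leaked_child"],   # nothing points here any more
    "leaked_child": [],
}
suspects = find_unreferenced(heap, roots=["inode_cache"])
print(suspects)
```

Both `leaked_buf` and the block it points to are reported, because neither can be reached from a root.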

Code: [http://git.kernel.org/linus/3c7b4e6b8be4c16f1e6e5c558e33b7ff0db2dfaf (commit 1], [http://git.kernel.org/linus/04f70336c80c43a15e617b36c2043dfa0ad6ed0f 2], [http://git.kernel.org/linus/d5cff635290aec9ad7e6ee546aa4fae895361cbb 3], [http://git.kernel.org/linus/4374e616d28e65265a5b433ceece275449f3d2e3 4], [http://git.kernel.org/linus/06f22f13f3cc2eff00db09f053218e5d4b757bc8 5], [http://git.kernel.org/linus/89219d37a2377c44fde7bff0bf0623453c05329a 6], [http://git.kernel.org/linus/dbb1f81ca67a56c6cfce4c94d07c76378fd4af9e 7], [http://git.kernel.org/linus/4f2294b6dc88d99295230d97fef2c9863cec44c3 8], [http://git.kernel.org/linus/2e1483c995bbd0fa6cbd055ad76088a520799ba4 9], [http://git.kernel.org/linus/3bba00d7bdd57cb7aa739b751fa0a1fbbb04dc18 10], [http://git.kernel.org/linus/0822ee4ac1ae6af5a953f97f75553738834b10b9 11)]

1.10. Fsnotify

Fsnotify is a backend for filesystem notification. Fsnotify itself does not provide any userspace interface, but it does provide the basis needed for other notification schemes such as dnotify, inotify and fanotify (this last notification interface will be included in future releases). In fact, in this release dnotify and inotify have been rewritten on top of fsnotify, removing at the same time the ugly and complex code from those systems. Fsnotify provides a mechanism for "groups" to register for some set of filesystem events and then delivers those events to those groups for processing, with much simpler locking. Fsnotify has other benefits, like shrinking the size of an inode.
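
The "groups register for events, the backend delivers them" design can be modelled in userspace terms. The sketch below is only an analogy: the `Group` and `Notifier` classes, the `add_mark` method and the `EV_*` bit values are invented for this example and do not correspond to the kernel's structures or constants.

```python
from collections import defaultdict

# Hypothetical event bits for illustration; not the kernel's values.
EV_MODIFY = 0x1
EV_CREATE = 0x2
EV_DELETE = 0x4


class Group:
    """A consumer (think: one inotify instance) with its own event queue."""

    def __init__(self, name):
        self.name = name
        self.queue = []

    def handle_event(self, path, event):
        self.queue.append((path, event))


class Notifier:
    """Backend that remembers which group watches which object for which events."""

    def __init__(self):
        self.marks = defaultdict(list)      # path -> [(group, mask), ...]

    def add_mark(self, group, path, mask):
        self.marks[path].append((group, mask))

    def notify(self, path, event):
        for group, mask in self.marks.get(path, []):
            if mask & event:                # deliver only what the group asked for
                group.handle_event(path, event)


backend = Notifier()
inotify_like = Group("inotify")
dnotify_like = Group("dnotify")
backend.add_mark(inotify_like, "/etc/passwd", EV_MODIFY | EV_DELETE)
backend.add_mark(dnotify_like, "/etc/passwd", EV_CREATE)
backend.notify("/etc/passwd", EV_MODIFY)
print(len(inotify_like.queue), len(dnotify_like.queue))
```

Each frontend only has to supply its own event handling; the bookkeeping of who watches what lives in one place.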

Code: [http://git.kernel.org/linus/90586523eb4b349806887c62ee70685a49415124 (commit 1], [http://git.kernel.org/linus/3be25f49b9d6a97eae9bcb96d3292072b7658bd8 2], [http://git.kernel.org/linus/c28f7e56e9d95fb531dc3be8df2e7f52bee76d21 3], [http://git.kernel.org/linus/3c5119c05d624f95f4967d16b38c9624b816bdb9 4], [http://git.kernel.org/linus/a2d8bc6cb4a3024661baf877242f123787d0c054 5], [http://git.kernel.org/linus/62ffe5dfba056f7ba81d710fee9f28c58a42fdd6 6], [http://git.kernel.org/linus/47882c6f51e8ef41fbbe2bbb746a1ea3228dd7ca 7], [http://git.kernel.org/linus/e4aff117368cfdd3567ee41844d216d079b55173 8], [http://git.kernel.org/linus/1ef5f13c6c8acd3fd10db9f1743f3b4cf30a4abb 9], [http://git.kernel.org/linus/164bc6195139047faaf5ada1278332e99494803b 10], [http://git.kernel.org/linus/63c882a05416e18de6fb59f7dd6da48f3bbe8273 11], [http://git.kernel.org/linus/ff52cc2158b32b3b979ca7802b1fd7c70f36e13c 12], [http://git.kernel.org/linus/5ac697b793a3c45005c568df692518da6e690390 13], [http://git.kernel.org/linus/ce61856bd2aadb064f595e5c0444376a2b117c41 14], [http://git.kernel.org/linus/a092ee20fd33d2df0990dcbf2235afc181612818 15], [http://git.kernel.org/linus/e42e27736de80045f925564ea27a1d32957219e7 16)]

1.11. Preliminary NFS 4.1 client support

2.6.30 added some developer support for NFS 4.1. This version enables optional support for minor version 1 of the NFSv4 protocol (draft-ietf-nfsv4-minorversion1) in the kernel's NFS client.

Code: [http://git.kernel.org/linus/1efae38140546db403845d628db9f2d608caa87e (commit)]

1.12. Context Readahead algorithm and mmap readahead improvements

This version introduces a page cache context based readahead algorithm. The current readahead algorithm detects interleaved reads in a passive way, while the context readahead algorithm is guaranteed to discover the sequentialness no matter how the streams are interleaved. The beneficiaries are strictly interleaved reads and cooperative IO processes (i.e. NFS and SCST). SCST benchmarks [http://lkml.org/lkml/2009/3/19/239 show] 6%~40% performance gains in various cases and equal performance in others.
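
One way to picture "page cache context" is to look at which pages just before the requested offset are already cached. The sketch below assumes exactly that rule and uses a made-up window sizing formula; the function names, the `max_ra` default and the sizing are illustrative assumptions, not the kernel's algorithm.

```python
def count_history_pages(cached, offset, limit):
    """Count consecutive cached pages immediately before `offset`."""
    n = 0
    while n < limit and (offset - 1 - n) in cached:
        n += 1
    return n


def context_readahead(cached, offset, req_size, max_ra=32):
    """Return (start, size) of the window to read, or None for a random read."""
    history = count_history_pages(cached, offset, max_ra)
    if history == 0:
        return None                          # no cached context: treat as random
    size = min(history + req_size, max_ra)   # hypothetical sizing rule
    return (offset, size)


# Two streams read in strict alternation. Offsets are page numbers.
# A single "last position" tracker would see every read as a seek.
cached = set()
results = []
for a, b in zip(range(0, 4), range(1000, 1004)):
    for off in (a, b):
        results.append((off, context_readahead(cached, off, req_size=1)))
        cached.add(off)                      # the requested page is now cached

for off, window in results:
    print(off, window)
```

After the first read of each stream, every later read finds cached pages right behind it and is recognised as sequential, even though the two streams never issue two consecutive requests.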

There are also some improvements to mmap readahead. On an NFS-root desktop, they reduce major faults by 1/3 with no obvious overhead, and mmap I/O can be further reduced by 1/4.

Code: [http://git.kernel.org/linus/70ac23cfa31f68289d4b720c6162b3929ab4de36 (commit 1], [http://git.kernel.org/linus/2fad6f5deee5556f511eab58da78737a23ddb35d 2], [http://git.kernel.org/linus/10be0b372cac50e2e7a477852f98bf069a97a3fa 3], [http://git.kernel.org/linus/7ffc59b4d0bdfa00e882339f85b8a969bb7021e2 4)]

2. Various core changes

3. Filesystems

4. Networking

5. Security

6. Tracing/Profiling

7. DM

8. Crypto

9. Virtualization

10. PCI

11. Block

12. Memory management

13. Architecture-specific changes

14. Drivers

14.1. Graphics

14.2. Storage

14.3. Network

14.4. Input

14.5. USB

14.6. Sound

14.7. V4L/DVB

14.8. Staging

14.9. FireWire

14.10. MTD

14.11. WATCHDOG

14.12. HWMON

14.13. HID

14.14. RTC

14.15. Serial

14.16. I2C

14.17. MFD

14.18. Rfkill

14.19. MMC

14.20. Regulator

14.21. Various

14.22. Other news sources tracking the kernel changes

KernelNewbies: Linux_2_6_31 (last edited 2009-09-09 23:18:12 by diegocalleja)