
Linux 2.6.22 Released, 8 July 2007 ([http://kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.22 full SCM git log])


Short overview (for news sites, etc)

2.6.22 includes an optional, more SMP-friendly SLUB allocator (http://lwn.net/Articles/229984), new and much better wireless and firewire stacks, a new architecture called Blackfin, an LVM-for-flash-storage-devices called UBI, event notifications through file descriptors (http://lwn.net/Articles/225714), the POSIX-draft utimensat() syscall, the 'TCP Illinois' and 'YeAH-TCP' congestion control algorithms, IPv6 Optimistic Duplicate Address Detection, AF_RXRPC socket support, relocatable x86-64 kernel support, improvements to the CFQ I/O scheduler, more process footprint information in /proc, various new drivers and many other improvements.

Important things (AKA: ''the cool stuff'')

New Slab allocator: SLUB

(Recommended article: [http://lwn.net/Articles/229984/ "The SLUB allocator"])

The slab allocator is an object-caching kernel memory allocator used for dealing with "objects that are frequently allocated and freed" (see the [http://citeseer.ist.psu.edu/bonwick94slab.html "slab allocator" paper from Jeff Bonwick]). It is a critical piece of the innards of the memory management subsystem, and critical for getting good performance. The Linux slab allocator works quite well for pretty much everybody; however, some people (SGI) have found its current design inefficient in some cases. For example, in configurations with 1K nodes/processors, several GB of memory are wasted in object queues alone, not counting the objects themselves. Its memory management also quickly becomes too complex when adding features like proper NUMA policy support.

As a result, a new slab allocator called "SLUB" has been developed by Christoph Lameter from SGI to solve those and other problems. Its design is simpler, and it can also give better performance in some cases and more efficient memory usage (see the full design notes in this [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=81819f0fc8285a2a5a921c019e3e3d7b6169d225 commit link]). It also has better debug capabilities. There's a slabinfo userspace tool that you can find in Documentation/vm/slabinfo.c.

Its aim is to transparently replace slab, but in 2.6.22 this new slab allocator is optional and not enabled by default. You can enable it at compile time (making it the third option along with SLOB, the embedded-oriented slab allocator). SLUB has been tested for some time and it's solid enough to try on your systems, but due to the importance of this part of the kernel, it won't completely replace the current slab allocator until it has had more exposure and testing, hence it's not recommended for production systems. Testing reports, especially about regressions, are greatly appreciated.

User documentation can be found [http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/vm/slub.txt here].

Code: [http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=mm/slub.c;hb=HEAD mm/slub.c]

New Wireless stack

For too many years, Linux wireless support has worked, but not very well. 2.6.22 includes a completely new, better wireless stack. This new wireless stack has been donated by the well-known WiFi specialist company Devicescape (many thanks to [http://www.devicescape.com Devicescape] for their contribution and [http://www.devicescape.org support to open source]!). This wireless stack has many features, like a complete software MAC implementation, WEP, WPA, a "link-layer" bridging module, hostapd, QoS support to prioritize things like VoIP, 802.11g support, and full debug capabilities. All of this comes in a single implementation that drivers can use without rewriting those features themselves, which sadly has been done multiple times in the Linux WiFi world.

Another feature of this stack is a completely new user interface. The old stacks have an ugly ioctl-based interface which was standardized under the name of "wireless extensions" (wext). The new stack uses a netlink-based interface, suited to the needs of desktop-based configuration interfaces, while retaining userspace compatibility with the old interface.

The disadvantage is the lack of drivers using this stack: the drivers that have been in the tree for a long time do not support it, and will need to be ported (which will hopefully not be that hard, since the new stack is actually a much better ground to build drivers upon than the current mess). Quite a lot of new and ported drivers already use the new stack. They have not been merged in this release, but will get merged in future releases: the RT2x00 drivers, the bcm43xx driver, zd1211rw, adm8211, rtl818x and Intel iwlwifi (ipw3945 and ipw4965). Distributions like Ubuntu and Fedora are already using them.

In any case, this is the building block that will bring better wireless support to Linux.

Code: [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f0706e828e96d0fa4e80c0d25aa98523f6d589a0 (commit 1], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=64a327a7029d3860ddf6a024816afa9e6673eb57 2], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a9de8ce0943e03b425be18561f51159fcceb873d 3], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e9f207f0ff90bf60b825800d7450e6f2ff2eab88 4], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=704232c2718c9d4b3375ec15a14fc0397970c449 5], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2a5e1c0eb9efe26eed1dd072fe08de5797a7efd5 6], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=9e101eab153073d8a1fc7ea22b20af65de8ab44b 7)]

New FireWire stack

The FireWire stack is also getting a rewrite, with the old stack being kept around for the time being. The main driver behind this work, according to the author, is "to get a small, maintainable and supportable FireWire stack, with an acceptable backwards compatibility story".

This stack has many advantages: a considerably leaner codebase (less than 8k lines of code compared to 30k lines in the old stack, with a similar reduction in the size of the binary files); cleaned-up and improved in-stack APIs (with the side effect of getting rid of a bunch of old bugs); a simpler design (no kernel threads, compared to one subsystem thread and one thread per FireWire controller in the old stack); consolidation of the current four userspace ABIs into one improved ABI (the userspace ABI changes, but compatibility is kept stable at the library level, in libraw1394 and libdc1394); and per-device device files, which let userspace set up finer-grained access control, such as preventing direct access to FireWire storage devices.

Features still missing relative to the old stack: eth1394 (IP over 1394) has not been ported over, there is no support for the PCILynx chipset (less important because that chip is very rare), and isochronous support currently works only on OHCI-1.1 chips, not on OHCI-1.0 chips. It also has the disadvantage of any new piece of code: despite being tested in Fedora rawhide and elsewhere, it can contain many bugs.

Code: [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3038e353cfaf548eb94f02b172b9dbe412abd24c (commit 1], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=9ba136d0fe5a3dd33533b4a2a21156aa22f80ebe 2], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ed5689122f4cdb5cb8c6770ad1a2c8561b32d9b3 3], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=19a15b937b26638933307bb02f7b1801310d6eb2 4)]

Linux1394.org's Release Notes: [http://marc.info/?l=linux1394-user&m=118401466928264 posting], [http://wiki.linux1394.org/ReleaseNotesKernel wiki]

Signal/timer events notifications through file descriptors

(Recommended article: [http://lwn.net/Articles/225714/ "Kernel events without kevents"])

Linux currently lacks a proper way to get complete event reporting like other systems do. poll/epoll isn't a solution for everything: it only works on file descriptors, so things like timer and signal notifications aren't covered by it. To get, for example, signal notifications in the main event loop, people have needed to use (clever) hacks, like writing a byte into an internal pipe from the signal handler.
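The pipe hack mentioned above can be sketched in a few lines. This is only an illustration, not code from the kernel tree: it uses Python's standard signal.set_wakeup_fd(), and the function names watch_signal() and pending_signals() are made up for this example.

```python
import os
import select
import signal


def watch_signal(signum):
    """Route `signum` into a pipe so an event loop can poll for it.

    Returns the read end of the pipe (hypothetical helper for this sketch).
    """
    read_fd, write_fd = os.pipe()
    # set_wakeup_fd() expects a non-blocking fd; the read end is made
    # non-blocking too so a stray read can never stall the loop.
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)
    # A Python-level handler must be installed, otherwise the signal's
    # default action (often process termination) would still apply.
    signal.signal(signum, lambda received, frame: None)
    # From now on the interpreter writes one byte (the signal number)
    # into write_fd whenever a handled signal arrives.
    signal.set_wakeup_fd(write_fd)
    return read_fd


def pending_signals(read_fd, timeout):
    """Wait up to `timeout` seconds and return the signal numbers received."""
    readable, _, _ = select.select([read_fd], [], [], timeout)
    if not readable:
        return []
    return list(os.read(read_fd, 64))
```

The event loop only ever sees an ordinary readable file descriptor, at the cost of an extra pipe and a signal handler. That indirection is what the new file-descriptor-based interfaces are meant to make unnecessary.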

After considering the inclusion of [http://linux-net.osdl.org/index.php/Kevent an implementation] of a [http://www.freebsd.org/cgi/man.cgi?query=kevent&apropos=0&sektion=0&manpath=FreeBSD+6.2-RELEASE&format=html FreeBSD/OSX-like ] generic event notification mechanism, a simpler, more Unixy solution ([http://groups.google.com/group/linux.kernel/msg/1f3fc521db812a07 inspired by Linus] some years ago) has been adopted.

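Three new syscalls have been added: signalfd(), timerfd() and eventfd(). These syscalls deliver events through file descriptors, so you can use the standard read(), select(), poll() and epoll on those fds. They differ in the event source: signalfd() delivers signals, timerfd() delivers timer expirations, and eventfd() provides a counter that can be signalled from userspace or from the kernel.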

Code: Anonymous inode source [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5dc8bf8132d59c03fe2562bce165c2f03f021687 (commit)], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=da66f7cb0f69ab27dbf5b9d0b85c4b97716c44d1 (commit)] ; signalfd: [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fba2afaaec790dc5ab4ae8827972f342211bbb86 (commit)], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=2121e24bd8dd16b4e3f8d995428e2a748d5180cc (commit)], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6d18c9220965b437287c3a7e803725c24992ceac (commit)]; timerfd: [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b215e283992899650c4271e7385c79e26fb9a88e (commit)], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=57ac8898508638ca6d15ecd8b911a431d673ff30 (commit)], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=83f5d1266926c75890f1bc4678e49d79483cb573 (commit)]; eventfd: [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e1ad7468c77ddb94b0615d5f50fa255525fde0f0 (commit)], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fdb902b1225e1668315f38e96d2f439452c03a15 (commit)], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=9c3060bedd84144653a2ad7bea32389f65598d40 (commit)]

Blackfin architecture

2.6.22 adds support for yet another architecture: the Analog Devices Blackfin processor architecture. It currently supports the BF533, BF532, BF531, BF537, BF536, BF534, and BF561 (Dual Core) devices, with a variety of development platforms including those available from Analog Devices (BF533-EZKit, BF533-STAMP, BF537-STAMP, BF561-EZKIT) and Bluetechnix Tinyboards.

The Blackfin architecture was jointly developed by Intel and Analog Devices Inc. (ADI) as the Micro Signal Architecture (MSA) core and was introduced in December 2000. Since then ADI has put this core into its Blackfin processor family of devices. The Blackfin core has the advantages of a clean, orthogonal, RISC-like microprocessor instruction set. It combines a dual-MAC (Multiply/Accumulate), state-of-the-art signal processing engine and single-instruction, multiple-data (SIMD) multimedia capabilities into a single instruction set architecture.

The Blackfin architecture, including the instruction set, is described by the [http://blackfin.uclinux.org/gf/download/frsrelease/29/2549/Blackfin_PRM.pdf ADSP-BF53x/BF56x Blackfin Processor Programming Reference]. The Blackfin processor is already supported by major releases of gcc, and there are [http://blackfin.uclinux.org/gf/project/toolchain/frs binary and source rpms/tarballs] available for many architectures. There is [http://docs.blackfin.uclinux.org/ complete documentation], including "getting started" guides, which provides links to the sources and patches you will need in order to set up a cross-compiling environment for bfin-linux-uclib. All the code is actively supported by Analog Devices Inc, at: http://blackfin.uclinux.org

Code: [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1394f03221790a988afc3e4b3cb79f2e477246a9 (commit 1], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a5f6abd4f7558fea97bc4021fd0eb7dcc5d16a77 2], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8cc75c9a1498913d668b6d3559940c6837cee8bf 3], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d24ecfcc3953f9c3b833508cd839be614a3f3c64 4], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0851a2848cfd40012063ca9cf86fb67b7bebceff 5], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=194de5612777a9ff4f96dae1932f77a5a89e5f0a 6)]

UBI

The shortest description for UBI is "LVM for NAND flash memory devices". Why duplicate LVM? Well, because flash devices can't really be handled as typical hard disks. UBI provides wear-levelling support across the whole flash chip. UBI completely hides 2 aspects of flash chips which make them very difficult to work with: 1. wear of eraseblocks; 2. bad eraseblocks. UBI also makes it possible to dynamically create, delete and re-size flash partitions (UBI volumes).

Home page: http://www.linux-mtd.infradead.org/doc/ubi.html

Code: [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=801c135ce73d5df1caf3eca35b66a10824ae0707 (commit)], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0029da3bf430eea498eee8cef5933f9214534b8a (commit)]

Secure RxRPC sockets

The RxRPC protocol driver included in 2.6.22 provides a reliable two-phase transport on top of UDP that can be used to perform RxRPC remote operations. This is done over sockets of AF_RXRPC family, using sendmsg() and recvmsg() with control data to send and receive data, aborts and errors. The AFS filesystem has been ported to use AF_RXRPC instead of the old RxRPC code.

Code: [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=17926a79320afa9b95df6b977b40cca6d8713cea (commit 1], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=651350d10f93bed7003c9a66e24cf25e0f8eed3d 2], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=08e0e7c82eeadec6f4871a386b86bf0f0fbcb4eb 3], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=63b6be55e8b51cb718468794d343058e96c7462c 4)]

Process footprint measurement facility

2.6.22 adds a "Referenced" line to each VMA in /proc/pid/smaps, which indicates how many pages within it are currently marked as referenced or accessed. There's also a new /proc/pid/clear_refs file. When any non-zero number is written to this clear_refs file, the Referenced field is cleared.

With these mechanisms it is now possible to measure approximately how much memory a task is using: clear the reference bits with "echo 1 > /proc/pid/clear_refs", wait for a measured time interval (e.g. 1 second), and then check the Referenced value for each VMA in the /proc/pid/smaps output. This is a valuable tool to get an approximate measurement of the memory footprint of a task.
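The measurement procedure described above could be scripted roughly as follows. This is a sketch under assumptions: it expects each "Referenced" line in smaps to have the form `Referenced: N kB`, and the function names are invented for the example. Writing to clear_refs needs suitable permissions on the target process.

```python
import re
import time

# Matches lines such as "Referenced:           8 kB" (assumed format).
REFERENCED_RE = re.compile(r"^Referenced:\s+(\d+)\s+kB$", re.MULTILINE)


def referenced_kb(smaps_text):
    """Sum the Referenced values of all VMAs in an smaps dump."""
    return sum(int(kb) for kb in REFERENCED_RE.findall(smaps_text))


def clear_refs(pid):
    """Clear the reference bits, like `echo 1 > /proc/pid/clear_refs`."""
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("1")


def sample_footprint_kb(pid, interval=1.0):
    """Clear the bits, wait `interval` seconds, then report what was touched."""
    clear_refs(pid)
    time.sleep(interval)
    with open(f"/proc/{pid}/smaps") as f:
        return referenced_kb(f.read())
```

Calling sample_footprint_kb(pid) repeatedly gives an approximate view of how much memory the task touches per interval. A longer interval catches more of the working set but reacts more slowly to changes.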

Code: [http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f79f177c25016647cc92ffac8afa7cb96ce47011 (commit)], [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b813e931b4c8235bb42e301096ea97dbdee3e8fe (commit)]

utimensat()

The next revision of POSIX will support fine-grained filesystem timestamps: struct stat will report nanosecond values. During development one additional problem was found: there is no interface to set a file timestamp with that precision. utimes() only takes a timeval structure, which allows only microsecond resolution. This is why the utimensat() interface was created. It is basically the same as the futimesat() interface, but it takes a timespec structure.
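The difference in resolution between the two structures can be shown with a small sketch. The helpers to_timeval(), to_timespec(), set_times_ns() and demo_roundtrip() are hypothetical names for this example. It relies on Python's os.utime() with the ns= argument, which to the best of our knowledge is implemented with utimensat() on Linux where available. The precision actually stored depends on the filesystem.

```python
import os
import tempfile

NSEC_PER_SEC = 1_000_000_000


def to_timeval(ns):
    """(seconds, microseconds): all that a timeval can express."""
    sec, rem = divmod(ns, NSEC_PER_SEC)
    return sec, rem // 1000  # the last three digits are lost


def to_timespec(ns):
    """(seconds, nanoseconds): what a timespec can express."""
    return divmod(ns, NSEC_PER_SEC)


def set_times_ns(path, atime_ns, mtime_ns):
    """Set both timestamps in nanoseconds and return what stat() reports."""
    os.utime(path, ns=(atime_ns, mtime_ns))
    st = os.stat(path)
    return st.st_atime_ns, st.st_mtime_ns


def demo_roundtrip(stamp_ns):
    """Apply `stamp_ns` to a temporary file and read the times back."""
    with tempfile.NamedTemporaryFile() as tmp:
        return set_times_ns(tmp.name, stamp_ns, stamp_ns)
```

For a timestamp with 123456789 nanoseconds in its fractional part, to_timeval() keeps only 123456 microseconds, while to_timespec() keeps all nine digits.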

Code: [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1c710c896eb461895d3c399e15bb5f20b39c9073 (commit)]

New drivers

Various core changes

Architecture-specific changes

Various subsystems

Filesystems

Networking

DM

SELinux

Audit

* auditing ptrace [http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a5cb013da773a67ee48d1c19e96436c22a73a7eb (commit)]

Crypto

KVM

Power Management

Drivers

Network drivers

SATA/IDE/SCSI

Graphics

Sound

Input

MTD

USB

V4L/DVB

I2C

ACPI

Watchdog

Bluetooth

Cpufreq

HwMon

Various

Crashing soon a kernel near you

This is a list of some of the ongoing patches being developed in the kernel community that will be part of future Linux releases. Those features may take many months to get into Linus' git tree, or may be dropped. The features are tested in the -mm tree, but be warned: it can crash your machine, eat your data (unlikely but not impossible) or kidnap your family (just because it has never happened doesn't mean you're safe).

KernelNewbies: Linux_2_6_22 (last edited 2007-08-24 16:22:44 by NicolasKaiser)