KernelNewbies:

Linux 2.6.33 was released on February 24th, 2010.

Summary: This version features Nouveau (a reverse-engineered driver for Nvidia graphics cards), Nintendo Wii and Gamecube support, DRBD (Distributed Replicated Block Device), a TCP security extension called "cookie transactions", recvmmsg(), a syscall for batching recvmsg() calls, several new perf subcommands (perf probe, perf bench, perf kmem, perf diff), compressed in-memory swap (Compcache), Xen PV-on-HVM support, drivers for VMware's virtual network and graphics cards, swappable KSM pages, many new drivers, and many small improvements and bugfixes.

1. Prominent features (the cool stuff)

1.1. Nouveau, a driver for Nvidia graphics cards

This version includes Nouveau, a driver for graphics cards from Nvidia, which was the only major GPU vendor without open source drivers in the Linux kernel. In development since 2006, Nouveau has 26,000 lines of code (not counting the Mesa components and the rest of the DRM stack). Nvidia has not contributed to this driver; it has been reverse-engineered. Graphics cards are among the most complex pieces of hardware found in modern computers, and writing drivers for them is difficult even with full documentation, so the Nouveau developers deserve a big round of applause.

Nouveau is important because open source is the only way to get good long-term support for your graphics card. The powerful new graphics card you buy today will be unsupported by its proprietary driver in a few years. This doesn't happen with open source drivers: Nouveau (like the ATI open source drivers) already supports more devices than the official proprietary drivers, for example the Riva TNT and GeForce 2/4MX/4Ti/FX.

The feature set is not yet comparable to that of the proprietary driver, but Nouveau already supports a decent set of features: kernel modesetting (KMS), suspend/resume, dual head (RandR 1.2), and 2D acceleration (EXA, XRender, Xv video). 3D functionality is not fully supported yet, but it is improving. And, of course, the driver is not stable yet, which is why it is being merged into the staging directory.

The "ctxprogs/ctx_voodoo" firmware will not be needed in the future, because it can be autogenerated. For now it is autogenerated for only a few cards, but the dependency will be removed eventually.

Code: (commit), (commit)

1.2. DRBD (Distributed Replicated Block Device)

Recommended LWN article: DRBD: a distributed block device

Web site (includes extensive documentation): http://www.drbd.org/

DRBD ("Distributed Replicated Block Device") is a shared-nothing, synchronously replicated block device developed by LINBIT. It is designed to serve as a building block for high availability (HA) clusters. DRBD can be understood as network-based RAID-1.

For automatic failover you need a cluster manager (e.g. heartbeat). See also: http://www.drbd.org/, http://www.linux-ha.org

Code: (commit)

1.3. Perf improvements: perf probe, perf kmem, perf bench, perf diff, perf perl scripts and filters

Recommended LWN article: Dynamic probes with ftrace

This release adds many improvements to the tracing infrastructure and to the perf tool (tools/perf).

perf probe: perf probe is a subcommand for creating kprobe events. Kprobes is a mechanism that lets you break into almost any kernel routine at runtime and collect debugging and performance information non-disruptively; it is what SystemTap uses for kernel instrumentation. perf probe lets you define kprobe events in terms of the C source (source line numbers, function names, and local variables). For example:

Step 1: Add a new kprobe on a line of C source code: "perf probe -P 'p:myprobe @fs/read_write.c:285 file buf count'". This creates a new probe called "myprobe" which records the variables file, buf and count. Alternatively, simpler commands like "perf probe sys_open" add a probe on the sys_open symbol (the open() syscall).

Step 2: Add a new kretprobe on a function return: "perf probe -P 'r:myretprobe vfs_read $rv'"

Step 3: If you run "perf list", you will see an event section named "kprobes" which contains the probes you just set up.

Step 4: Record the event with "perf record -f -e kprobes:myprobe:record -F 1 -a ls" and display the trace with "perf trace".

Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

perf bench: perf bench is a small suite of microbenchmarks. In this release there are only three benchmarks: perf bench sched messaging (benchmarks the scheduler and IPC), perf bench sched pipe (benchmarks pipe()) and perf bench mem memcpy (measures memory bandwidth). The command perf bench all runs all of them.
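For a feel of what a benchmark like perf bench sched pipe exercises, the following is a minimal C sketch of a pipe ping-pong between two processes. It is not the perf implementation: the function name pipe_ping_pong, the one-byte message and the timing method are our own assumptions. Each round trip forces a wakeup and a context switch between the two tasks, which is what makes this kind of loop sensitive to scheduler overhead.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: bounce one byte between this process and a forked
 * child over two pipes, `loops` times. Returns the number of completed
 * round trips (or -1 if setup failed) and stores the elapsed wall-clock
 * time in nanoseconds in *elapsed_ns. */
long pipe_ping_pong(long loops, long long *elapsed_ns)
{
    int to_child[2], to_parent[2];
    char byte = 'x';
    struct timespec start, end;
    long done = 0;

    assert(loops >= 0 && elapsed_ns != NULL);
    if (pipe(to_child) < 0)
        return -1;
    if (pipe(to_parent) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: echo every byte back until the parent closes its end. */
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &byte, 1) == 1)
            if (write(to_parent[1], &byte, 1) != 1)
                break;
        _exit(0);
    }

    /* Parent: send a byte, wait for the echo, repeat. */
    close(to_child[0]);
    close(to_parent[1]);
    clock_gettime(CLOCK_MONOTONIC, &start);
    while (done < loops) {
        if (write(to_child[1], &byte, 1) != 1 ||
            read(to_parent[0], &byte, 1) != 1)
            break;
        done++;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    close(to_child[1]);   /* the child's read() now returns 0 */
    close(to_parent[0]);
    waitpid(pid, NULL, 0);

    *elapsed_ns = (long long)(end.tv_sec - start.tv_sec) * 1000000000LL
                + (end.tv_nsec - start.tv_nsec);
    return done;
}
```

Dividing the elapsed time by the returned count gives a rough per-round-trip cost; absolute numbers vary widely between machines.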

Code: (commit 1, 2, 3, 4, 5, 6, 7, 8)

perf kmem: This tool is essentially a perf version of kmemtrace-user. It shows various information about the SLAB allocator.

Code: (commit 1, 2, 3, 4)

perf diff: perf diff shows the performance differences between two recorded profiles.

Code: (commit)

perf perl scripts: (Recommended LWN article: Scripting support for perf) A Perl scripting engine for programmable 'perf trace' scripting. See perf trace -g/--gen-script and perf trace -s/--script.

Code: (commit 1, 2, 3, 4, 5, 6)

perf filters: This feature adds "--filter expression" support to tracepoints, using the filter engine inside the kernel. For example, to trace only timer interrupts in the system: "perf record -e irq:irq_handler_entry --filter='irq==0' -R -f -a sleep 10". Or, to record IRQ 19 only when the 'ahci' handler is triggered: "perf record -e irq:irq_handler_entry --filter='irq==19 && name==ahci' -R -f -a sleep 10"

Code: Add filter support (commit), (commit)

1.4. recvmmsg(), a syscall for batching recvmsg() calls

Recommended LWN article: In brief

Recommended slides: Batch datagram processing

recvmmsg() is a new syscall that receives, in a single call, multiple messages that would otherwise require multiple calls to recvmsg(). For high-bandwidth applications that handle small packets, this can greatly improve throughput and latency.
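A minimal C sketch of the batching, assuming a C library that exposes the recvmmsg() wrapper and struct mmsghdr (glibc does, with _GNU_SOURCE). The helper name recv_batch_demo, the batch size and the use of a local AF_UNIX datagram socketpair instead of a real UDP socket are illustrative choices. Several datagrams are queued first, then drained with one call; MSG_DONTWAIT makes the call return what is already queued rather than wait for the full batch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define BATCH   8   /* maximum messages fetched per recvmmsg() call */
#define MSG_MAX 64  /* maximum size of each message, in bytes */

/* Queue `count` small datagrams on a local socketpair, then drain them
 * with a single recvmmsg() call. Returns how many messages that one call
 * delivered (or -1 on error) and stores each message length in lens[]. */
int recv_batch_demo(int count, unsigned int lens[BATCH])
{
    int sv[2];
    char bufs[BATCH][MSG_MAX];
    struct iovec iov[BATCH];
    struct mmsghdr msgs[BATCH];

    assert(count > 0 && count <= BATCH);
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    for (int i = 0; i < count; i++) {
        char payload[MSG_MAX];
        int n = snprintf(payload, sizeof payload, "message %d", i);
        if (send(sv[0], payload, (size_t)n, 0) != n) {
            close(sv[0]);
            close(sv[1]);
            return -1;
        }
    }

    /* One receive buffer, iovec and mmsghdr per slot in the batch. */
    memset(msgs, 0, sizeof msgs);
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = MSG_MAX;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* A single syscall; MSG_DONTWAIT returns once the queue is empty
     * instead of blocking until all BATCH slots are filled. */
    int got = recvmmsg(sv[1], msgs, BATCH, MSG_DONTWAIT, NULL);
    for (int i = 0; i < got; i++)
        lens[i] = msgs[i].msg_len;

    close(sv[0]);
    close(sv[1]);
    return got;
}
```

With three queued datagrams, one recvmmsg() call returns 3 and fills msg_len for each slot, where the recvmsg() approach would need three syscalls.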

Code: (commit)

1.5. TCP Cookie Transactions

Recommended LWN article: TCP cookie transactions

Recommended Wikipedia article: TCP Cookie Transactions

TCP Cookie Transactions (TCPCT) is an extension of TCP intended to secure it against denial-of-service attacks, such as resource exhaustion by SYN flooding and malicious connection termination by third parties. Unlike the original SYN cookies approach, TCPCT does not conflict with other TCP extensions, but it requires TCPCT support in both the client (initiator) and the server (responder) TCP stacks. The immediate motivation for the TCPCT extension is the deployment of the DNSSEC protocol.

Code: (commit), (commit), (commit), (commit), (commit), (commit), (commit)

1.6. Support for Xen PV-on-HVM guests

In KVM, support for Xen PV-on-HVM guests can be implemented almost entirely in userspace, except for handling one annoying MSR that maps a Xen hypercall blob into the guest address space. This release adds a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell KVM which MSR the guest will write to, as well as the starting address and size of the hypercall blobs (one each for 32-bit and 64-bit) that userspace has loaded from files. When the guest writes to the MSR, KVM copies one page of the blob from userspace to the guest.
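The MSR handling can be modeled in plain userspace C. This is a sketch, not KVM code: the struct and function names are hypothetical, 4 KiB pages are assumed, and the encoding used here (the page-aligned bits of the value written to the MSR give the guest-physical destination, the low bits give the index of the blob page to copy) is our reading of the implementation and should be treated as an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Hypothetical model of the blob configuration userspace hands to KVM.
 * Field names are ours, not the kernel's. */
struct blob_config {
    const uint8_t *blob;   /* hypercall blob loaded from a file */
    uint8_t blob_pages;    /* blob size, in pages */
};

/* Model of a guest write of `msr_value` to the configured MSR:
 * the page-aligned part selects the destination in guest memory, the
 * low bits select which blob page to copy. guest_mem stands in for guest
 * RAM. Returns 0 on success, -1 if the page index or destination is
 * out of range. */
int handle_xen_msr_write(const struct blob_config *cfg, uint64_t msr_value,
                         uint8_t *guest_mem, size_t guest_mem_size)
{
    uint64_t page_num  = msr_value & (SIM_PAGE_SIZE - 1);
    uint64_t page_addr = msr_value & ~(uint64_t)(SIM_PAGE_SIZE - 1);

    assert(cfg != NULL && guest_mem != NULL);
    if (page_num >= cfg->blob_pages)
        return -1;
    if (page_addr > guest_mem_size ||
        guest_mem_size - page_addr < SIM_PAGE_SIZE)
        return -1;

    /* Copy exactly one page of the blob into the guest. */
    memcpy(guest_mem + page_addr, cfg->blob + page_num * SIM_PAGE_SIZE,
           SIM_PAGE_SIZE);
    return 0;
}
```

In this model, writing 2 * 4096 + 1 to the MSR copies blob page 1 to guest address 8192, and an index beyond the blob size is rejected.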

Code: (commit)

1.7. Swappable KSM pages

Kernel Samepage Merging (KSM) is a feature merged in Linux 2.6.32 which deduplicates the memory of virtualized guests. The initial implementation, however, did not allow shared pages to be swapped out. This release adds swap support for KSM pages.
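KSM only scans memory that an application has registered with madvise(MADV_MERGEABLE), the interface introduced along with KSM in 2.6.32, and merging only happens while the ksmd thread is enabled through /sys/kernel/mm/ksm/run. A minimal C sketch of registering a region follows; the function name make_mergeable_region is hypothetical. Swapping of merged pages is transparent, so application code like this does not need to change for this release.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `pages` anonymous pages with identical contents and register them
 * as KSM merge candidates. On success returns 0 and stores the mapping
 * in *addr and its length in *len; otherwise returns the errno value
 * (EINVAL usually means the kernel was built without CONFIG_KSM). */
int make_mergeable_region(size_t pages, void **addr, size_t *len)
{
    size_t size = pages * (size_t)sysconf(_SC_PAGESIZE);
    void *p;

    assert(pages > 0 && addr != NULL && len != NULL);
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return errno;

    memset(p, 0x5a, size);   /* every page now holds the same bytes */

    /* Opt in: ksmd may now merge these pages; whether merged pages can
     * later be swapped out depends only on the kernel version. */
    if (madvise(p, size, MADV_MERGEABLE) != 0) {
        int err = errno;
        munmap(p, size);
        return err;
    }
    *addr = p;
    *len = size;
    return 0;
}
```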

Code: (commit)

1.8. Block IO Controller

Recommended LWN article: The block I/O controller

Control groups are virtual "containers" that are created as directories inside a special virtual filesystem (usually with the help of tools). Arbitrary sets of processes can be added to a control group, and the group can be configured with CPU scheduling or memory limits that apply to all the processes inside it.

This release adds a block IO controller. Currently, the CFQ IO scheduler uses it to recognize task groups and to allocate disk bandwidth among them (somewhat like CFQ priorities, but implemented in a very different way). The controller will be extended in the future. For more details, read the documentation.
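As a rough intuition for proportional weights, here is an idealized C model in which each continuously busy group gets a share of disk time equal to its weight divided by the sum of all busy groups' weights. This is not how CFQ schedules requests internally (it works with time slices); the function name blkio_shares is hypothetical, and the 100 to 1000 weight range is our assumption about blkio.weight at the time.

```c
#include <assert.h>
#include <stddef.h>

#define WEIGHT_MIN 100   /* assumed blkio.weight lower bound */
#define WEIGHT_MAX 1000  /* assumed blkio.weight upper bound */

/* Idealized proportional-share model: fill share_pct[i] with the
 * percentage of disk time group i would get if all `n` groups are
 * continuously busy. Returns 0, or -1 if any weight is out of range. */
int blkio_shares(const unsigned int *weights, size_t n, double *share_pct)
{
    unsigned long total = 0;

    assert(weights != NULL && share_pct != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (weights[i] < WEIGHT_MIN || weights[i] > WEIGHT_MAX)
            return -1;
        total += weights[i];
    }
    for (size_t i = 0; i < n; i++)
        share_pct[i] = 100.0 * weights[i] / (double)total;
    return 0;
}
```

In this model, a group with weight 1000 competing with one at weight 500 receives about two thirds of the disk time.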

Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)

1.9. Compcache: in-memory compressed swapping

Recommended LWN article: Compcache: in-memory compressed swapping

Compcache is a project (still under development, and only available in the staging directory) that creates RAM-based block devices (/dev/ramzswapX) which are used as swap disks. Pages swapped out to this virtual device are compressed to a smaller size. Part of your RAM is used as usual, and another part (of configurable size) holds the compressed pages, which increases the amount of memory you can use in practice.
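The capacity gain is simple arithmetic. The C function below is a back-of-the-envelope model only: the compression ratio is an assumed input, and real compcache results also depend on per-page metadata overhead and on how compressible the swapped pages are. The function name and the numbers in the usage note are hypothetical, not measurements.

```c
#include <assert.h>

/* Back-of-the-envelope model of compressed swap held in RAM.
 * ram_mib:   total physical RAM
 * zpool_mib: RAM consumed by compressed pages
 * ratio:     assumed average compression ratio (e.g. 3.0 for 3:1)
 * Returns how much uncompressed data fits in memory, in MiB. */
double effective_memory_mib(double ram_mib, double zpool_mib, double ratio)
{
    assert(zpool_mib >= 0 && zpool_mib <= ram_mib && ratio >= 1.0);
    /* Uncompressed part of RAM, plus what the compressed part represents. */
    return (ram_mib - zpool_mib) + zpool_mib * ratio;
}
```

For example, effective_memory_mib(512, 128, 3.0) returns 768: 384 MiB of normal RAM plus 128 MiB holding roughly 384 MiB worth of pages compressed at an assumed 3:1.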

This feature can be very useful in many cases: netbooks, smartphones and other embedded devices, distro installers, diskless thin clients, virtualization, or old machines without enough RAM to run modern software.

Measurements have shown this feature to be very effective; see this page for some benchmarks. The project home page can be found at http://compcache.googlecode.com/

Code: (commit), (commit)

1.10. Graphics improvements

Besides the inclusion of Nouveau, there is the usual round of graphics stack improvements that have become common since GEM and KMS were merged.

1.11. Nintendo Wii and Gamecube support

The gc-linux.sourceforge.net project has been working on Linux support for the PPC-based Nintendo Wii and Nintendo Gamecube game consoles. This release merges that support into the kernel.

1.12. VMware drivers

VMware has contributed two drivers: one for the VMware virtual GPU (vmwgfx) and one for VMware's virtual Ethernet NIC, vmxnet3. Thanks to udev, this means that Linux guests running inside a VMware host will have optimal graphics and network performance out of the box.

vmwgfx: (commit), (commit); vmxnet3: (commit), (commit)

1.13. Reiserfs de-BKLification

One of the biggest shortcomings of reiserfs v3 (and one of the reasons why most distros use Ext instead) is that its codebase handles concurrency using a single big lock, the BKL (Big Kernel Lock). This means that its SMP scalability is very poor. This release doesn't fix that issue, but it replaces the BKL with a reiserfs-specific solution: there are no more traces of the BKL inside reiserfs, which now uses a recursive mutex instead. This sounds dirty, but plugging a traditional lock into reiserfs would require a deeper rewrite, because the reiserfs architecture is built around the (ugly) big kernel lock rules.

Due to the subtle semantics of the locking changes, some workloads may see small performance regressions while others may see improvements.
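The defining property of a recursive mutex is that the task holding it may take it again, as long as it releases it the same number of times, which is what code written for the nestable BKL expects. The sketch below shows that property with a POSIX recursive mutex; it is an analogy, not the reiserfs code (which, as we understand it, tracks the owning task and a nesting depth itself), and the names fs_lock, inner_operation and outer_operation are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t fs_lock;

/* Initialise fs_lock as a recursive mutex. Returns 0 on success. */
int fs_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(&fs_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Inner helper that takes the lock again while the caller already holds
 * it. With a plain (non-recursive) mutex this would deadlock. */
static int inner_operation(void)
{
    int rc = pthread_mutex_lock(&fs_lock);

    if (rc != 0)
        return rc;
    /* ... work that needs the lock ... */
    return pthread_mutex_unlock(&fs_lock);
}

/* Outer operation that holds the lock across a nested call. */
int outer_operation(void)
{
    int rc = pthread_mutex_lock(&fs_lock);

    if (rc != 0)
        return rc;
    rc = inner_operation();              /* nested acquisition */
    int urc = pthread_mutex_unlock(&fs_lock);
    return rc != 0 ? rc : urc;
}
```

After outer_operation() returns, both acquisitions have been released, so the mutex can be destroyed cleanly.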

Code: Many commits.

1.14. Android removed from the Linux kernel

Recommended article: Android and the Linux kernel community

The Android drivers, which had been sitting in the staging directory, have been removed in this release. Google does not seem interested in bringing them up to the minimum quality standards that would allow merging them into the main Linux tree and sharing them with the rest of the community. Of course, that's totally legal, but it's sad that a project doing so much to bring open source to the masses has become an example of how not to interact with an open source community.

2. Various core changes

3. Block

4. Virtualization

5. MD/DM

6. Filesystems

7. Networking

8. Security

9. Tracing/Profiling

10. Crypto

11. Architecture-specific changes

12. Drivers

12.1. Graphics

12.2. Storage

12.3. Networking devices

12.4. USB

12.5. FireWire

12.6. Input

12.7. Sound

12.8. Staging Drivers

12.9. V4L/DVB

12.10. HID

12.11. RTC

12.12. Bluetooth

12.13. MFD

12.14. MTD

12.15. HWMON

12.16. Various

12.17. Other news sources tracking the kernel changes

KernelNewbies: Linux_2_6_33 (last edited 2017-12-30 01:29:53 by localhost)