KernelNewbies:

Linux 2.6.35 has been released on 1 Aug, 2010.

Summary: Linux 2.6.35 includes support for transparent spreading of incoming network load across CPUs, Direct-IO support for Btrfs, an new experimental journal mode for XFS, the KDB debugger UI based on top of KGDB, improvements to 'perf', H.264 and VC1 video acceleration in Intel G45+ chips, support for the future Intel Cougarpoint graphic chip, power management for AMD Radeon chips, a memory defragmentation mechanism, support for the Tunneling Protocol version 3 (RFC 3931), support for multiple multicast route tables, support for the CAIF protocol used by ST-Ericsson products, support for the ACPI Platform Error Interface, and many new drivers and small improvements.

Note: Details on architecture-specific and driver changes have been moved to this page: Linux_2_6_35-DriversArch

1. Prominent features (the cool stuff)

1.1. Transparent spreading of incoming network traffic load across CPUs

Recommended LWN articles: "Receive packet steering" and "Receive flow steering"

Network cards have improved the bandwidth to the point where it's hard for a single modern CPU to keep up. Two new features contributed by Google aim to spread the load of network handling across the CPUs available in the system: Receive Packet Steering (RPS) and Receive Flow Steering (RFS).

RPS distributes the load of received packet processing across multiple CPUs. This solution allows protocol processing (e.g. IP and TCP) to be performed on packets in parallel (contrary to the previous code). For each device (or each receive queue in a multi-queue device) a hashing of the packet header is used to index into a mask of CPUs (which can be configured manually in /sys/class/net/<device>/queues/rx-<n>/rps_cpus) and decide which CPU will be used to process a packet. But there're also some heuristics provided by the RFS side of this feature. Instead of randomly choosing the CPU from a hash, RFS tries to use the CPU where the application running the recvmsg() syscall is running or has run in the past, to improve cache utilization. Hardware hashing is used if available. This feature effectively emulates what a multi-queue NIC can provide, but instead it is implement in software and for all kind of network hardware, including single queue cards and not excluding multiqueue cards.

Benchmarks of 500 instances of netperf TCP_RR test with 1 byte request and response show the potential benefit of this feature, a e1000e network card on 8 core Intel server goes from 104K tps at 30% CPU usage, to 303K tps at 61% CPU usage when using RPS+RFS. A RPC test which is similar in structure to the netperf RR test with 100 threads on each host, but doing more work in userspace that netperf, goes from 103K tps at 48% of CPU utilization to 223K at 73% CPU utilization and much lower latency.

Code: (commit 1, 2, 3)

1.2. Btrfs improvements

1.3. XFS Delayed logging

This version adds a logging (journaling) mode called delayed logging, which is very briefly modeled after the journaling mode in the ext3/4 and reiserfs filesystems. It allows to accumulated multiple asynchronous transactions in memory instead of possibly writing them out many times. The I/O bandwidth used for the log decreases by orders of magnitude and performance on metadata intensive workloads increases massively. The log disk format is not changed, only the in-memory data structures and code. This feature is experimental, so it's not recommended for final users or production servers. Those who want to test it can enable it with the "-o delaylog" mount option. Code: (commit 1, 2)

1.4. KDB kernel debugger frontend

The Linux kernel has had a kernel debugger since 2.6.26, called Kgdb. But Kgdb is not the only linux kernel debugger, there is also KDB, developed years ago by SGI. The key difference between Kgdb and KDB is that using Kgdb requires an additional computer to run a gdb frontend, and you can do source level debugging. KDB, on the other hand, can be run on the local machine and can be used to inspect the system, but it doesn't do source-level debugging. What is happening in this version is that Jason Wessel, from Windriver, has ported KDB to work on top of the Kgdb core, making possible to use both interfaces.

Code: (commit 1, 2, 3, 4, 5, 6, 7, 8)

1.5. perf improvements

1.6. Graphic improvements

This version carries a bunch of interesting improvements to the graphics stack.

1.7. Memory compaction

Recommended LWN article: "Memory compaction"

The memory compaction mechanism tries reduces external memory fragmentation in a memory zone by trying to move used pages to create a new big block of contiguous used pages. When compaction finishes, the memory zone will have a big block of used pages and a big block of free pages. This will make easier to allocate bigger chunks of memory. The mechanism is called "compaction" to distinguish it from other forms of defragmentation.

In this implementation, a full compaction run involves two scanners operating within a zone, a migration scanner and a free scanner. The migration scanner starts at the beginning of a zone and finds all used pages that can be moved. The free scanner begins at the end of the zone and searches for enough free pages to migrate all the used pages found by the previous scanner. A compaction run completes within a zone when the two scanners meet and used pages are migrated to the free blocks scanned. Testing has showed the amount of IO required to satisfy a huge page allocation is reduced significantly.

Memory compaction can be triggered in one of three ways. It may be triggered explicitly by writing any value to /proc/sys/vm/compact_memory and compacting all of memory. It can be triggered on a per-node basis by writing any value to /sys/devices/system/node/nodeN/compact where N is the node ID to be compacted. When a process fails to allocate a high-order page, it may compact memory in an attempt to satisfy the allocation instead of entering direct reclaim. Explicit compaction does not finish until the two scanners meet and direct compaction ends if a suitable page becomes available that would meet watermarks.

Code: (commit 1, 2, 3, 4 ,5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)

1.8. Support for multiple multicast route tables

Normally, a multicast router runs a userspace daemon and decides what to do with a multicast packet based on the source and destination addresses. This feature adds support for multiple independent multicast routing instances, so the kernel is able to take interfaces and packet marks into account and run multiple instances of userspace daemons simultaneously, each one handling a single table. Code: (commit 1, 2, 3)

1.9. L2TP Version 3 (RFC 3931) support

This version adds support for Layer 2 Tunneling Protocol (L2TP) version 3, RFC 3931. L2TP provides a dynamic mechanism for tunneling Layer 2 (L2) "circuits" across a packet-oriented data network (e.g., over IP). L2TP, as originally defined in RFC 2661, is a standard method for tunneling Point-to-Point Protocol (PPP) [RFC1661] sessions. L2TP has since been adopted for tunneling a number of other L2 protocols, including ATM, Frame Relay, HDLC and even raw ethernet frames, this is the version 3. Code: (commit 1, 2, 3, 4)

1.10. CAIF Protocol support

Support for the CAIF protocol. CAIF is a MUX protocol used by ST-Ericsson cellular modems for communication between Modem and host. The host processes can open virtual AT channels, initiate GPRS Data connections, Video channels and Utility Channels. The Utility Channels are general purpose pipes between modem and host. ST-Ericsson modems support a number of transports between modem and host. Currently, UART and Loopback are available for Linux (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

1.11. ACPI Platform Error Interface support

Support for the ACPI Platform Error Interface (APEI). This improves NMI handling especially. In addition it supports the APEI Error Record Serialization Table (ERST), the APEI Generic Hardware Error Source and APEI Error INJection (EINJ) and saving of MCE (Machine Check Exception) errors into flash. For more information about APEI, please refer to ACPI Specification version 4.0, chapter 17. Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)

2. Various core changes

3. Filesystems

4. Block

5. Memory management

6. Networking

7. Tracing/Profiling

8. Crypto

9. Virtualization

10. MD

11. CPU scheduler

12. Cpufreq/cpuidle

13. Security

KernelNewbies: Linux_2_6_35 (last edited 2017-12-30 01:29:51 by localhost)