Kernel 2.6.39 released on Wed, 18 May 2011.
Summary: EXT4 SMP scalability improvements, an increased initial TCP congestion window, a new architecture called UniCore-32, a feature that allows the creation of groups of network resources called IPset, Btrfs updates, a feature that allows storing crash information in firmware so it can be recovered after a reboot, open-by-handle syscalls, perf updates, and many other small changes and new drivers.
- Prominent features (the cool stuff)
- Drivers and architectures
- CPU scheduler
- Memory management
- File systems
1. Prominent features (the cool stuff)
1.1. Ext4 SMP scalability
In 2.6.37, major Ext4 scalability improvements were merged and described in the changelog. However, the feature was not ready for prime time and had been disabled in the source before the release, something the changelog did not mention. In this release it has been enabled by default. This is the text from the previous changelog:
"In this release Ext4 will use the "bio" layer directly instead of the intermediate "buffer" layer. The "bio" layer (alias for Block I/O: it's the part of the kernel that sends the requests to the I/O scheduler) was one of the first features merged in the Linux 2.5.1 kernel. The buffer layer has a lot of performance and SMP scalability issues that will get solved with this port. An FFSB benchmark on a 48-core AMD box using a 24 SAS-disk hardware RAID array with 192 simultaneous ffsb threads speeds up by 300% (400% with journaling disabled), while reducing CPU usage by a factor of 3-4"
1.2. TCP: Increase the initial congestion window to 10 packets
Recommended LWN article: Increasing the TCP initial congestion window
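The initial congestion window of the Linux TCP stack has been increased to 10 packets. Previously it was derived from the MSS following RFC 3390 (3 segments for a typical 1460-byte MSS). Short transfers, such as typical web responses, can therefore complete in fewer round trips, which should improve latency.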
1.3. IPset
Official IPset webpage: http://ipset.netfilter.org/
IPset allows the creation of groups of network resources (IPv4/IPv6 addresses, TCP/UDP port numbers, IP-MAC address pairs, IP-port number pairs, etc.), called "IP sets", which can then be used to define Netfilter/iptables rules. Looking up an entry in a set is much more efficient than walking an equivalent list of plain iptables rules, although sets may have a larger memory footprint. Ipset provides several storage algorithms (in-memory data structures) so that users can select the best one for their needs. IPset has been available for some time in the xtables-addons patches and is now included in the Linux tree.
This tool is useful for things like: storing multiple IP addresses or port numbers and matching against the whole collection with iptables in one swoop; dynamically updating iptables rules against IP addresses or ports without a performance penalty; and expressing complex rulesets based on IP addresses and ports with one single iptables rule while benefiting from the speed of IP sets.
1.4. Btrfs updates
Btrfs now allows different compression and copy-on-write settings for each file or directory (in addition to the per-filesystem controls). There is also the usual round of minor speedups, plus tracepoints for runtime analysis.
1.5. Pstore: storing crash information across a reboot
Recommended LWN article: Persistent storage for a kernel's "dying breath"
Pstore is a filesystem interface for storing crash information in a persistent location and recovering it after a reboot. One such location is ERST (Error Record Serialization Table), a mechanism specified by ACPI for saving and retrieving hardware error information to and from a non-volatile location (like flash).
1.6. New Architecture: UniCore-32
1.7. Transcendent Memory
Recommended LWN article: Transcendent memory
Transcendent memory is a new type of memory with a particular set of characteristics. From LWN: "transcendental memory can be thought of as a sort of RAM disk with some interesting characteristics: nobody knows how big it is, writes to the disk may not succeed, and, potentially, data written to the disk may vanish before being read back again". This memory could be used in places like the page cache, swap, or virtualization. In this release it is used to implement a compressed in-memory caching mechanism called zcache.
1.8. BKL: That's all, folks
In 2.6.37 it became possible to compile a Linux kernel without support for the BKL (Big Kernel Lock). In this release, the BKL has been removed completely from the kernel sources, including the lock_kernel() and unlock_kernel() functions.
1.9. Open-by-handle syscalls
Recommended LWN article: Open by handle
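Two new syscalls have been added, name_to_handle_at() and open_by_handle_at(). name_to_handle_at() returns a file handle (plus a mount ID) for a given path, and open_by_handle_at() opens the file that such a handle refers to, returning a normal file descriptor. This is useful for user-space filesystems, backup software and other storage management tools. In addition, a new open() flag, O_PATH, has been added: it returns a file descriptor that identifies a location in the filesystem tree without opening the file for reading or writing.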
1.10. Perf updates
Add the ability to restrict monitoring to control groups (cgroups) for both perf stat and perf record. Multiple cgroups can be monitored in parallel, with one cgroup per event. The cgroups to monitor are passed via a new -G option followed by a comma-separated list of cgroup names (commit), (commit)
Initial Python binding (commit)
perf list: Allow filtering the list of events (commit)
perf stat: Provide support for filters (commit)
perf evlist: New command to list the names of events present in a perf.data file (commit)
perf probe: add variable filter support, e.g. perf probe -V schedule --externs --filter=cpu* (commit); add filter support for available functions (commit); support function@filename syntax for --line (commit); add --funcs to show available functions in symtab (commit); allow putting probes at inline function call sites, which increases line-based probe-ability (commit)
2. Drivers and architectures
All the driver and architecture-specific changes can be found in the Linux_2_6_39-DriversArch page
Add a command-line parameter "threadirqs" which forces all interrupts except those marked IRQF_NO_THREAD to run threaded (commit)
POSIX timers: Introduce a syscall for clock tuning, clock_adjtime() (commit)
bloat-o-meter: include read-only data section in report (commit)
console: allow retaining the boot console via the keep_bootcon boot option (commit)
firmware: expose DMI tables in sysfs under /sys/firmware/dmi (commit)
Add shared BCH ECC library (commit)
lib: cpu_rmap: CPU affinity reverse-mapping (commit)
PM: Add support for device power domains (commit)
proc: enable writing to /proc/pid/mem (commit)
scripts/extract-ikconfig: add xz compression support (commit)
4. CPU scheduler
Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy (commit)
Export ns irqtimes through /proc/stat (commit)
5. Memory management
Lockless (and preemptless) fastpaths for slub (commit)
Add VM counters for transparent hugepages (commit)
Have smaps show transparent huge pages (commit)
Introduce the sys_syncfs() syscall to sync a single file system (commit)
Add a "flakey" target that returns I/O errors periodically (commit)
stripe: implement the merge method; the measured performance improvement is ~12-35% when a reasonable chunk_size (e.g. 64K) is used together with a stripe count that is a power of 2 (commit)
IPv4: Remove the hash based routing table implementation, make the FIB Trie implementation the default (commit)
AF_UNIX: implement socket filter (commit)
Implement mechanism for HW based QOS (commit)
RPS: Enable hardware acceleration of RFS (commit)
UDP: Add lockless transmit path (commit)
TX timestamps for IPv6 UDP packets (commit)
9p: Implement syncfs 9P operation (commit)
tipc: Add support for SO_RCVTIMEO socket option (commit)
Audit target to record accepted/dropped packets (commit)
xtable: connlimit revision 1 (commit)
xtable: speedup compat operations (commit)
xtable: "set" match and "SET" target support (commit)
xt_addrtype: ipv6 support (commit)
xt_CLASSIFY: add ARP support, allow CLASSIFY target on any table (commit)
xt_conntrack: support matching on port ranges (commit)
ebt_ip6: allow matching on ipv6-icmp types/codes (commit)
nf_conntrack: nf_conntrack snmp helper (commit)
nf_conntrack_tstamp: add flow-based timestamp extension (commit)
8. File systems
Add buffered write support for v9fs. (commit)
Add direct IO support in cached mode (commit)
Add posixacl mount option (commit)
Deallocation performance patch (commit)
Improve cluster mmap scalability (commit)
Introduce AIL lock (commit)
Use RCU for glock hash table (commit)
Make HPFS compile on preempt and SMP (commit)
Implement fsync for hpfs (commit)
Remove CR/LF conversion option (commit)
Add option to mount by osdname (commit)
Implement FS_IOC_GETFLAGS/SETFLAGS/GETVERSION (commit)
Allow user names longer than 32 bytes (commit)
Add compression options support to xz decompressor (commit)
authencesn - Add algorithm to handle IPsec extended sequence numbers (commit)
picoxcell - add support for the picoxcell crypto engines (commit)
KVM: SVM: Add support for perf-kvm (commit)
Smack: mmap controls for library containment (commit)
Make selinux cache VFS RCU walks safe (commit)
Add a fourth criterion to object labeling: the directory entry (commit)