Linux 2.6.39

Kernel 2.6.39 released on Wed, 18 May 2011.

Summary: Ext4 SMP scalability improvements, a larger initial TCP congestion window, a new architecture called UniCore-32, IPset (a feature for creating groups of network resources that iptables rules can match against), Btrfs updates, pstore (a way to store crash information in firmware so it can be recovered after a reboot), open-by-handle syscalls, perf updates, and many other small changes and new drivers.

  1. Prominent features (the cool stuff)
    1. Ext4 SMP scalability
    2. TCP: Increase the initial congestion window to 10 packets
    3. IPset
    4. Btrfs updates
    5. Pstore: storing crash information across a reboot
    6. New Architecture: UniCore-32
    7. Transcendent Memory
    8. BKL: That's all, folks
    9. Open-by-handle syscalls
    10. Perf updates
  2. Drivers and architectures
  3. Core
  4. CPU scheduler
  5. Memory management
  6. Block
  7. Networking
  8. File systems
  9. Crypto
  10. Virtualization
  11. Security
  12. Tracing/perf

1. Prominent features (the cool stuff)

1.1. Ext4 SMP scalability

In 2.6.37, huge Ext4 scalability improvements were merged and mentioned in the changelog. However, the feature was not ready for prime time and was disabled in the source before the release, something the changelog did not mention. In this release it has been enabled by default. This is the text from the previous changelog:

"In this release Ext4 will use the "bio" layer directly instead of the intermediate "buffer" layer. The "bio" layer (short for Block I/O: it's the part of the kernel that sends the requests to the I/O scheduler) was one of the first features merged in the Linux 2.5.1 kernel. The buffer layer has a lot of performance and SMP scalability issues that will get solved with this port. An FFSB benchmark on a 48-core AMD box using a 24-disk SAS hardware RAID array with 192 simultaneous ffsb threads speeds up by 300% (400% with journaling disabled), while reducing CPU usage by a factor of 3-4"

Code: (commit)

1.2. TCP: Increase the initial congestion window to 10 packets

Recommended LWN article: Increasing the TCP initial congestion window

The initial congestion window of the Linux TCP stack has been increased to 10 packets. This should improve latency, especially for short connections that finish within a few round trips.
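
To see why a larger initial window mainly helps short transfers, here is a back-of-the-envelope model in C. It assumes idealized slow start (the window doubles every round trip; no losses, delayed ACKs or receive-window limits) and compares 10 segments with 3, the value RFC 3390 gives for a 1460-byte MSS. The helper name rtts_to_send() is hypothetical; this is arithmetic, not kernel code.

```c
#include <assert.h>

/* Round trips needed to deliver `segments` full-sized segments under an
 * idealized slow start: the window starts at `init_cwnd` segments and
 * doubles each round trip. Loss, delayed ACKs, receive-window limits and
 * ssthresh are deliberately ignored in this simplified model. */
static unsigned rtts_to_send(unsigned init_cwnd, unsigned segments)
{
    unsigned cwnd = init_cwnd, sent = 0, rtts = 0;

    assert(init_cwnd > 0);
    while (sent < segments) {
        sent += cwnd;   /* one full window goes out per round trip */
        cwnd *= 2;      /* slow start doubles the window every RTT */
        rtts++;
    }
    return rtts;
}
```

For a 30-segment response (about 43 KB), the model gives rtts_to_send(3, 30) == 4 but rtts_to_send(10, 30) == 2, so the short transfer saves two round trips.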

Code: (commit)

1.3. IPset

Official IPset webpage: http://ipset.netfilter.org/

IPset allows the creation of groups of network resources (IPv4/v6 addresses, TCP/UDP port numbers, IP-MAC address pairs, IP-port number pairs, etc.), called "IP sets", which can then be used to define Netfilter/iptables rules. Looking up an entry in a set is much more efficient than walking an equivalent list of bare iptables rules, although sets may have a larger memory footprint. ipset provides several storage algorithms (in-memory data structures) so that users can select the best one for their needs. IPset had been available for some time in the xtables-addons patches and is now included in the Linux tree.

This tool is useful for: storing multiple IP addresses or port numbers and matching against the whole collection with iptables in one swoop; dynamically updating iptables rules against IP addresses or ports without a performance penalty; and expressing complex rulesets based on IP addresses and ports with one single iptables rule while benefiting from the speed of IP sets.
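
The speed advantage comes from replacing a linear walk over many rules with a single hashed lookup. The C sketch below is a hypothetical fixed-size hash set of IPv4 addresses (open addressing with linear probing); it only illustrates the idea and is not how ipset's hash:ip type is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size hash set of IPv4 addresses (host byte order).
 * Illustration only; not ipset's hash:ip code. */
#define SET_BITS 8
#define SET_SLOTS (1u << SET_BITS)

struct ip_set {
    uint32_t addr[SET_SLOTS];
    bool used[SET_SLOTS];
};

/* Multiplicative hash: keep the top SET_BITS bits of addr * 2654435761. */
static uint32_t slot_of(uint32_t addr)
{
    return (uint32_t)(addr * 2654435761u) >> (32 - SET_BITS);
}

static void set_init(struct ip_set *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof(*s));
}

/* Inserts addr; returns false only when the set is full. */
static bool set_add(struct ip_set *s, uint32_t addr)
{
    uint32_t i = slot_of(addr);

    for (unsigned n = 0; n < SET_SLOTS; n++, i = (i + 1) & (SET_SLOTS - 1)) {
        if (!s->used[i]) {
            s->used[i] = true;
            s->addr[i] = addr;
            return true;
        }
        if (s->addr[i] == addr)
            return true;            /* already a member */
    }
    return false;
}

/* While the table is not nearly full, a lookup probes only a few slots,
 * no matter how many addresses the set holds. */
static bool set_test(const struct ip_set *s, uint32_t addr)
{
    uint32_t i = slot_of(addr);

    for (unsigned n = 0; n < SET_SLOTS; n++, i = (i + 1) & (SET_SLOTS - 1)) {
        if (!s->used[i])
            return false;           /* an empty slot ends the probe chain */
        if (s->addr[i] == addr)
            return true;
    }
    return false;
}
```

The design choice mirrors the trade-off described above: the table reserves memory up front in exchange for lookups that do not grow with the number of members.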

Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

1.4. Btrfs updates

Btrfs now allows different compression and copy-on-write settings for each file or directory (in addition to the per-filesystem controls). There is also the usual round of minor speedups, plus tracepoints for runtime analysis.

Code: (commit 1, 2, 3, 4, 5)

1.5. Pstore: storing crash information across a reboot

Recommended LWN article: Persistent storage for a kernel's "dying breath"

Pstore is a filesystem interface for storing crash information so that it can be recovered after a reboot. The data is kept in places like the ERST (Error Record Serialization Table), a mechanism specified by ACPI for saving and retrieving hardware error information to and from a non-volatile location (such as flash).
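
After a reboot, each saved record appears as a file in the pstore filesystem, and removing a file signals the backend that it may reclaim the space. The C sketch below walks such a directory. The mount point is an assumption (later systems typically use /sys/fs/pstore), and drain_records() and print_record() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical example callback: report the name of one saved record. */
static void print_record(const char *name)
{
    printf("pstore record: %s\n", name);
}

/* Hands every regular file in `dir` (assumed to be a pstore mount point)
 * to `save`, if non-NULL, and unlinks it afterwards when `erase` is true
 * so the backend can reclaim the space. Returns the number of records
 * seen, or -1 if `dir` cannot be opened. */
static int drain_records(const char *dir, void (*save)(const char *name),
                         bool erase)
{
    DIR *d;
    struct dirent *de;
    int count = 0;

    assert(dir != NULL);
    d = opendir(dir);
    if (!d)
        return -1;
    while ((de = readdir(d)) != NULL) {
        struct stat st;

        /* Skip ".", "..", and anything that is not a regular file. */
        if (fstatat(dirfd(d), de->d_name, &st, 0) != 0 || !S_ISREG(st.st_mode))
            continue;
        if (save)
            save(de->d_name);
        if (erase)
            unlinkat(dirfd(d), de->d_name, 0);
        count++;
    }
    closedir(d);
    return count;
}
```

For example, drain_records("/sys/fs/pstore", print_record, false) would list the records without erasing them; a real tool would copy each file somewhere safe before passing erase = true.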

Code: (commit), (commit)

1.6. New Architecture: UniCore-32

UniCore-32 is a 32-bit instruction set architecture, covering a series of low-power RISC chip designs licensed by PKUnity Ltd.

Code: arch/unicore32

1.7. Transcendent Memory

Recommended LWN article: Transcendent memory

Transcendent memory is a new type of memory with a particular set of characteristics. From LWN: "transcendental memory can be thought of as a sort of RAM disk with some interesting characteristics: nobody knows how big it is, writes to the disk may not succeed, and, potentially, data written to the disk may vanish before being read back again". This memory could be used in places like the page cache, swap, or virtualization. In this release it is used to implement a compressed in-memory caching mechanism called zcache.
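
The key consequence of the quoted characteristics is that the kernel can only treat transcendent memory as a cache that may say no. The C model below is hypothetical (the names, sizes and direct-mapped layout are invented for illustration and are not the cleancache, frontswap or zcache interfaces): a put may be refused, a later put may silently evict an earlier one, and every get must be prepared to miss and fall back to the real backing store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an ephemeral transcendent-memory pool. */
#define TMEM_SLOTS 8
#define PAGE_BYTES 64               /* tiny "pages" keep the example small */

struct tmem_slot {
    bool valid;
    uint64_t key;
    unsigned char data[PAGE_BYTES];
};

struct tmem_pool {
    struct tmem_slot slot[TMEM_SLOTS];
    bool accepting;                 /* the host may stop accepting puts */
};

/* May fail: the caller must never rely on the page having been stored. */
static bool tmem_put(struct tmem_pool *p, uint64_t key, const void *page)
{
    struct tmem_slot *s = &p->slot[key % TMEM_SLOTS];

    assert(page != NULL);
    if (!p->accepting)
        return false;
    s->valid = true;                /* silently evicts whatever was here */
    s->key = key;
    memcpy(s->data, page, PAGE_BYTES);
    return true;
}

/* Copies the page out on a hit; on a miss the caller must fall back to
 * the real backing store (for example, re-reading the page from disk). */
static bool tmem_get(const struct tmem_pool *p, uint64_t key, void *page)
{
    const struct tmem_slot *s = &p->slot[key % TMEM_SLOTS];

    if (!s->valid || s->key != key)
        return false;
    memcpy(page, s->data, PAGE_BYTES);
    return true;
}
```

Because a miss is always a legal answer, correctness never depends on how much memory the pool actually has, which is what lets its size stay unknown to the kernel.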

Code: (commit 1, 2, 3)

1.8. BKL: That's all, folks

In 2.6.37 it became possible to compile a Linux kernel without support for the BKL (Big Kernel Lock). In this release the BKL has been removed from the kernel sources completely, including the lock_kernel() and unlock_kernel() functions.

Code: (commit)

1.9. Open-by-handle syscalls

Recommended LWN article: Open by handle

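Two new syscalls have been added: name_to_handle_at(), which translates a pathname into an opaque file handle, and open_by_handle_at(), which opens the file a handle refers to (this requires the CAP_DAC_READ_SEARCH capability). They are useful for user-space file servers, backup software and other storage management tools. A new open() flag, O_PATH, has also been added: it returns a file descriptor that identifies a location in the filesystem tree without opening the file for reading or writing.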
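
A minimal C sketch of both interfaces, assuming glibc 2.14 or later for the wrappers and the MAX_HANDLE_SZ constant. get_handle() and open_in_dir() are hypothetical helper names. The handle is opaque: user space should only store it and later pass it back to open_by_handle_at() together with a descriptor on the same mount, a step that needs CAP_DAC_READ_SEARCH and is therefore not exercised here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Asks the kernel for the opaque handle of `path`. Returns a malloc'd
 * handle (release it with free()) or NULL with errno set, e.g. to
 * EOPNOTSUPP when the filesystem cannot encode handles. */
static struct file_handle *get_handle(const char *path, int *mount_id)
{
    struct file_handle *fh;

    assert(mount_id != NULL);
    fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
    if (!fh)
        return NULL;
    fh->handle_bytes = MAX_HANDLE_SZ;   /* in: buffer size, out: used size */
    if (name_to_handle_at(AT_FDCWD, path, fh, mount_id, 0) == -1) {
        int saved = errno;

        free(fh);
        errno = saved;
        return NULL;
    }
    return fh;
}

/* O_PATH: get a descriptor naming a directory without opening it for
 * I/O, then use it as the dirfd argument of openat(). */
static int open_in_dir(const char *dir, const char *name)
{
    int dfd = open(dir, O_PATH | O_DIRECTORY);
    int fd;

    if (dfd == -1)
        return -1;
    fd = openat(dfd, name, O_RDONLY);
    close(dfd);
    return fd;
}
```

An O_PATH descriptor cannot be used for I/O itself: read() on it fails with EBADF, which keeps such descriptors cheap and safe to hand around.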

Code: (commit), (commit), (commit), (commit), (commit), (commit)

1.10. Perf updates

  • Add the ability to filter monitoring based on control groups (cgroups) for both perf stat and perf record. Multiple cgroups can be monitored in parallel, with one cgroup per event. The cgroups to monitor are passed via a new -G option followed by a comma-separated list of cgroup names (commit), (commit)

  • perf top: introduce a slang-based TUI with live annotation (perf top --tui) (commit), (commit)

  • Initial Python binding (commit)

  • perf list: Allow filtering the list of events (commit)

  • perf script: Support custom field selection for output (commit), add support for dumping symbols (commit), add support for H/W and S/W events (commit)

  • perf stat: Provide support for filters (commit)

  • perf evlist: New command to list the names of events present in a perf.data file (commit)

  • Add Intel SandyBridge CPU support (commit)

  • perf probe: add variable filter support, e.g. perf probe -V schedule --externs --filter=cpu* (commit), add filter support for available functions (commit), support function@filename syntax for --line (commit), add --funcs to show available functions in symtab (commit), allow putting probes on inline function call sites, which increases line-based probe-ability (commit)

2. Drivers and architectures

All the driver and architecture-specific changes can be found in the Linux_2_6_39-DriversArch page

3. Core

  • Add a kernel command-line parameter, "threadirqs", which forces all interrupts except those marked IRQF_NO_THREAD to run threaded (commit)

  • POSIX timers: Introduce a syscall for clock tuning, clock_adjtime() (commit)

  • bloat-o-meter: include read-only data section in report (commit)

  • console: allow retaining the boot console via the keep_bootcon boot option (commit)

  • firmware: DMI table support in sysfs /sys/firmware/dmi (commit)

  • Add shared BCH ECC library (commit)

  • lib: cpu_rmap: CPU affinity reverse-mapping (commit)

  • PM: Add support for device power domains (commit)

  • proc: enable writing to /proc/pid/mem (commit)

  • scripts/extract-ikconfig: add xz compression support (commit)
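
With write support, /proc/<pid>/mem behaves like a file whose offsets are virtual addresses. A minimal C sketch that writes into the calling process's own memory; poke_self() is a hypothetical helper name, and tools such as debuggers would target another process instead, which requires ptrace-level access to it.

```c
#define _FILE_OFFSET_BITS 64        /* offsets are addresses: use 64-bit off_t */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes `len` bytes from `buf` to address `addr` in this process via
 * /proc/self/mem, where the file offset is the virtual address.
 * Returns the number of bytes written, or -1 on error. */
static ssize_t poke_self(void *addr, const void *buf, size_t len)
{
    int fd;
    ssize_t n;

    assert(buf != NULL || len == 0);
    fd = open("/proc/self/mem", O_WRONLY);
    if (fd == -1)
        return -1;
    n = pwrite(fd, buf, len, (off_t)(uintptr_t)addr);
    close(fd);
    return n;
}
```

Using pwrite() rather than lseek() plus write() keeps the "offset equals address" mapping explicit in a single call.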

4. CPU scheduler

  • Allow users with sufficient RLIMIT_NICE to switch away from the SCHED_IDLE policy (commit)

  • Export ns irqtimes through /proc/stat (commit)

5. Memory management

  • Lockless (and preemptless) fastpaths for SLUB (commit)

  • Add VM counters for transparent hugepages (commit)

  • vmap area cache (solves a regression introduced in 2.6.28) (commit)

  • Have smaps show transparent huge pages (commit)

6. Block

  • Remove per-queue plugging in favor of on-stack, per-task plugging; reimplement FLUSH/FUA to support merging (commit), (commit)

  • Introduce the syncfs() syscall, which syncs a single file system (commit)

  • Device mapper: add the flakey target, which returns I/O errors periodically (commit)

  • Device mapper stripe target: implement the merge method. The measured performance improvement is ~12-35% when a reasonable chunk_size (e.g. 64K) is used together with a stripe count that is a power of 2 (commit)
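
A minimal C sketch of the new syscall, assuming glibc 2.14 or later for the syncfs() wrapper; sync_one_fs() is a hypothetical helper. The file descriptor only selects the filesystem, so any file or directory on it will do.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Flushes only the filesystem that contains `path`, unlike sync(),
 * which writes back every mounted filesystem. Returns 0 or -1. */
static int sync_one_fs(const char *path)
{
    int fd, ret;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    ret = syncfs(fd);
    close(fd);
    return ret;
}
```

This is what makes syncfs() attractive on machines with many mounts: a tool that only cares about one filesystem no longer has to wait for all of them.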

7. Networking

  • IPv4: Remove the hash based routing table implementation, make the FIB Trie implementation the default (commit)

  • AF_UNIX: implement socket filter (commit)

  • Net Schedulers: SFB flow scheduler (commit), CHOKe flow scheduler (recommended LWN article: The CHOKe packet scheduler) (commit), multi-queue priority scheduler (MQPRIO) (commit)

  • Implement the ability to enslave/release slave devices (commit), (commit)

  • Implement a mechanism for hardware-based QoS (commit)

  • RPS: Enable hardware acceleration of RFS (commit)

  • UDP: Add lockless transmit path (commit)

  • Add support for IPsec extended sequence numbers (esn) as defined in RFC 4303 (commit 1, 2, 3, 4, 5)

  • TX timestamps for IPv6 UDP packets (commit)

  • 9p: Implement syncfs 9P operation (commit)

  • Add support for network device groups (commit), (commit)

  • tipc: Add support for SO_RCVTIMEO socket option (commit)

  • Netfilter: Audit target to record accepted/dropped packets (commit)

  • xtable: connlimit revision 1 (commit)

  • xtable: speed up compat operations (commit)

  • xtable: "set" match and "SET" target support (commit)

  • xt_addrtype: ipv6 support (commit)

  • xt_CLASSIFY: add ARP support, allow CLASSIFY target on any table (commit)

  • xt_conntrack: support matching on port ranges (commit)

  • ebt_ip6: allow matching on ipv6-icmp types/codes (commit)

  • nf_conntrack: nf_conntrack snmp helper (commit)

  • nf_conntrack_tstamp: add flow-based timestamp extension (commit)

8. File systems

  • 9P: Add buffered write support for v9fs (commit)

  • 9P: Add direct IO support in cached mode (commit)

  • 9P: Add posixacl mount option (commit)

  • HPFS: Make HPFS compile on preempt and SMP (commit)

  • HPFS: Implement fsync (commit)

  • HPFS: Remove CR/LF conversion option (commit)

  • XFS: Enable delaylog by default (commit)

  • XFS: Stop using the page cache to back the buffer cache (commit)

  • Ceph: Add ino32 mount option (commit)

  • Ceph: Add lingering request and watch/notify event framework (commit)

  • EXOFS: Add option to mount by osdname (commit)

  • CIFS: Allow user names longer than 32 bytes (commit)

  • SquashFS: Add compression options support to xz decompressor (commit)

9. Crypto

  • authencesn - Add algorithm to handle IPsec extended sequence numbers (commit)

  • picoxcell - add support for the picoxcell crypto engines (commit)

10. Virtualization

  • KVM: SVM: Add support for perf-kvm (commit)

11. Security

  • Smack: mmap controls for library containment (commit)

  • SELinux: make the SELinux cache safe for VFS RCU path walks (commit)

  • SELinux: add a fourth criterion to object labeling: the name of the directory entry (commit)

12. Tracing/perf

last edited 2012-07-22 14:13:02 by diegocalleja