Linux 3.10 [https://lkml.org/lkml/2013/6/30/75 has been released].
Summary: This release adds support for bcache, which allows SSD devices to be used as a cache for other block devices; a Btrfs format improvement that makes the tree dedicated to storing extent information 30-35% smaller; support for XFS metadata checksums and self-describing metadata; timer-free multitasking for applications running alone on a CPU; SysV IPC and rwsem scalability improvements; the TCP Tail loss probe algorithm, which reduces tail latency of short transactions; KVM virtualization support in the MIPS architecture; and many new drivers and small improvements.
1. Prominent features (the cool stuff)
1.1. Timer free multitasking
In the prehistory of computing, computers could only run one task at a time. But people wanted to start other tasks without waiting for the first one to end, and even switch between tasks, and thus multitasking was born. At first, multitasking was "cooperative": a process would run until its own code voluntarily decided to pause and allow other tasks to run. But it was possible to do multitasking better: the hardware could have a timer that fires at regular intervals (called "ticks"); this timer could forcefully pause any program and run an OS routine that decides which task should run next. This is called preemptive multitasking, and it's what modern OSs do.
But preemptive multitasking has some side effects on modern hardware. The CPUs of laptops and mobile devices need periods of inactivity to enter low-power modes. Preemptive multitasking fires the timer often, 1000 times per second in a typical Linux kernel, even when the system is not doing anything, so the CPUs could not save as much power as possible. Virtualization created more problems, since each Linux VM runs its own timer. [http://kernelnewbies.org/Linux_2_6_21#head-8547911895fda9cdff32a94771c8f5706d66bba0 In 2.6.21, released in April 2007, Linux partially solved this]: the timer would fire 1000 times per second as always while the system was running tasks, but it would be stopped completely while the system was idle. But this is not enough. There are single-task workloads, such as scientific number crunching or users of the real-time patchset, whose performance or latency is hurt because they are temporarily paused 1000 times per second for no reason.
This Linux release adds support for not firing the timer (tickless operation) even when tasks are running, with some caveats: in this release it is not actually fully tickless, since it still needs the timer, but the timer only fires once per second; the full tickless mode is disabled when a CPU runs more than one process; and one CPU must keep running with full ticks to allow the other CPUs to go into tickless mode.
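As a rough illustration of the three timer behaviours described above, the following Python sketch estimates how many timer interrupts a single CPU takes per second. The function name and the mode names are made up for this example and are not kernel interfaces; the 1000 Hz rate, the one-tick-per-second residual and the single-task condition are taken from the text. The CPU that must keep full ticks on behalf of the others is not modeled.

```python
HZ = 1000  # timer frequency of a typical Linux kernel, as quoted in the text


def timer_interrupts_per_second(mode, runnable_tasks, hz=HZ):
    """Toy estimate of timer interrupts per second on one CPU.

    mode (hypothetical names, for illustration only):
      "periodic" - classic tick, fires whether or not there is work
      "idle"     - tick stopped only while the CPU is idle (since 2.6.21)
      "full"     - tick also (almost) stopped while one task runs alone (3.10)
    """
    if mode not in ("periodic", "idle", "full"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "periodic":
        return hz
    if runnable_tasks == 0:
        return 0  # idle CPU: the timer is stopped completely
    if mode == "full" and runnable_tasks == 1:
        return 1  # 3.10 still keeps one residual tick per second
    return hz  # more than one task: fall back to the full tick


for mode in ("periodic", "idle", "full"):
    print(mode, [timer_interrupts_per_second(mode, n) for n in (0, 1, 2)])
```

The interesting column is the middle one: with a single runnable task, only the "full" mode avoids the 1000 interruptions per second that hurt number-crunching and real-time workloads.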
For more details and future plans, it's strongly recommended to read this LWN article: '[https://lwn.net/Articles/549580/ (Nearly) full tickless operation in 3.10]' and the [http://git.kernel.org/linus/0c87f9b5ca5bdda1a868b0d7df4bec92e41a468d Documentation].
Code: [https://git.kernel.org/linus/534c97b0950b1967bca1c753aeaed32f5db40264 (merge commit)]
1.2. Bcache, a block layer cache for SSD caching
Since SSD storage devices became popular, many people have used them to speed up their storage stack. Bcache is an implementation of this functionality: it allows SSDs to cache other block devices. It's analogous to L2ARC for ZFS, but Bcache also does writeback caching (besides just write-through caching), and it's filesystem agnostic. It's designed to be switched on with a minimum of effort, and to work well without configuration on any setup. By default it won't cache sequential IO, just the random reads and writes that SSDs excel at. It's meant to be suitable for desktops, servers, high-end storage arrays, and perhaps even embedded systems.
For more details read the [https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/bcache.txt documentation] or visit the [http://bcache.evilpiepirate.org/ wiki].
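The idea of skipping sequential IO can be sketched with a toy policy like the one below. This is not bcache's code: the class, its method and the way a stream is detected (a request that starts exactly where the previous one ended) are assumptions made for illustration, and the 4 MiB default cutoff is likewise only a placeholder value.

```python
class ToyBypassPolicy:
    """Decide whether a request should go through the SSD cache (toy model).

    A request that starts where the previous one ended extends the current
    sequential stream. Once the stream grows past `sequential_cutoff` bytes,
    its requests bypass the cache; a random jump starts a new stream.
    """

    def __init__(self, sequential_cutoff=4 * 1024 * 1024):  # assumed value
        self.sequential_cutoff = sequential_cutoff
        self._next_offset = None   # where a sequential successor would start
        self._stream_bytes = 0     # size of the current sequential stream

    def should_cache(self, offset, length):
        if offset == self._next_offset:
            self._stream_bytes += length   # continues the current stream
        else:
            self._stream_bytes = length    # random access: new stream
        self._next_offset = offset + length
        return self._stream_bytes <= self.sequential_cutoff


policy = ToyBypassPolicy(sequential_cutoff=8192)
# Three back-to-back 4 KiB requests, then a jump elsewhere on the device.
decisions = [policy.should_cache(off, 4096) for off in (0, 4096, 8192, 1_000_000)]
print(decisions)
```

With the small 8 KiB cutoff used here, the third back-to-back request is the first one to bypass the cache, and the random request that follows is cached again.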
Recommended LWN article: [https://lwn.net/Articles/497024/ A bcache update]
Code: [http://git.kernel.org/linus/cafe563591446cf80bfbc2fe3bc72a2e36cf1060 (commit)]
1.3. Btrfs: smaller extents
Btrfs has incorporated a new key type for metadata extent references that uses disk space more efficiently, reducing the size of the extent reference for each tree block from 51 bytes to 33 bytes. In practice, this results in a 30-35% decrease in the size of the extent tree, which means fewer copy-on-write operations and larger parts of the extent tree held in memory, which makes heavy metadata operations go much faster.
This is not an automatic format change; it must be enabled at mkfs time or with btrfstune -x.
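The per-reference sizes quoted above can be checked against the 30-35% figure with a line of arithmetic. The sketch below only uses the two byte counts from the text; the helper name is made up for the example.

```python
OLD_REF_BYTES = 51  # size of a tree block extent reference before 3.10
NEW_REF_BYTES = 33  # size with the new key type


def relative_saving(old, new):
    """Fraction of space saved when an item shrinks from `old` to `new` bytes."""
    return (old - new) / old


saving = relative_saving(OLD_REF_BYTES, NEW_REF_BYTES)
print(f"{saving:.1%}")  # prints 35.3%
```

The per-reference saving of about 35% is the upper end of the quoted range. One plausible reading, stated here as an assumption, is that the whole tree shrinks slightly less because it also holds items whose size did not change.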
Code: [http://git.kernel.org/linus/3173a18f70554fe7880bb2d85c7da566e364eb3c (commit)]
1.4. XFS metadata checksums
In this release, XFS has an experimental implementation of metadata CRC32c checksums. These metadata checksums are part of a bigger project that aims to implement what the XFS developers have called "self-describing metadata". This project aims to solve the problem of verification scalability (fsck will need too much time to verify petabyte-scale filesystems with billions of inodes). It requires a filesystem format change that adds to every XFS metadata object some information that makes it possible to quickly determine whether the metadata is intact and can be ignored for the purpose of forensic analysis: metadata type, filesystem identifier and block placement, metadata owner, log sequence identifier and, of course, a CRC checksum.
This feature is experimental and requires using experimental xfsprogs. For more information, you can read the self-describing metadata [http://git.kernel.org/linus/dccc3f447a5e065a1c4406aede72d160ae38a736 Documentation].
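To make the idea of self-describing metadata more concrete, here is a minimal Python sketch of a metadata block that carries the kinds of fields listed above and is verified on its own. The header layout, field widths, magic value and function names are hypothetical and do not match the real XFS on-disk format; only the use of CRC32c comes from the text. Python's standard library has no CRC32c, so a slow bitwise version is included.

```python
import struct
import uuid


def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli polynomial, reflected form 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


# Hypothetical header: type magic, filesystem UUID, block number, owner,
# log sequence number, CRC. Little-endian, no padding.
HEADER = struct.Struct("<4s16sQQQI")


def make_block(magic, fs_uuid, blkno, owner, lsn, payload):
    """Build a block whose CRC covers header and payload (CRC field zeroed)."""
    zeroed = HEADER.pack(magic, fs_uuid.bytes, blkno, owner, lsn, 0)
    crc = crc32c(zeroed + payload)
    return HEADER.pack(magic, fs_uuid.bytes, blkno, owner, lsn, crc) + payload


def verify_block(block, expected_magic, fs_uuid, blkno):
    """Check one block in isolation, without walking the rest of the filesystem."""
    magic, uuid_bytes, hdr_blkno, owner, lsn, crc = HEADER.unpack_from(block)
    zeroed = HEADER.pack(magic, uuid_bytes, hdr_blkno, owner, lsn, 0)
    if crc32c(zeroed + block[HEADER.size:]) != crc:
        return "bad crc"
    if magic != expected_magic:
        return "wrong type"
    if uuid_bytes != fs_uuid.bytes:
        return "wrong filesystem"
    if hdr_blkno != blkno:
        return "misplaced block"
    return "ok"


fs = uuid.UUID(int=1)
block = make_block(b"TOYB", fs, blkno=42, owner=7, lsn=100, payload=b"metadata")
print(verify_block(block, b"TOYB", fs, 42))
print(verify_block(block, b"TOYB", fs, 43))
```

The second call shows why the extra fields matter: a block can have a valid checksum and still be identified as stale or misplaced, because it describes where it belongs.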
Code: [https://git.kernel.org/linus/c8d8566952fda026966784a62f324c8352f77430 (merge commit)]
1.5. SysV IPC scalability improvements
Linux IPC semaphore scalability was pitiful. Linux used to lock ranges that were much too big, and it used to have a single IPC lock per IPC semaphore array. Most workloads never cared, but some do. This release splits out the locking and adds per-semaphore locks for greater scalability of the IPC semaphore code. Micro benchmarks show improvements of more than 10x in some cases (see commit links for details).
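The difference in lock granularity can be sketched in user-space Python as follows. This is a deliberately simplified model and not the kernel's algorithm: the class and method names are invented, operations never block when a value would go negative, and operations that touch several semaphores simply take every lock involved in index order.

```python
import threading


class ToySemArray:
    """Semaphore array with one lock per semaphore (toy model)."""

    def __init__(self, nsems):
        self.values = [0] * nsems
        self.locks = [threading.Lock() for _ in range(nsems)]

    def simple_op(self, index, delta):
        """Operation on one semaphore: only that semaphore's lock is taken,
        so operations on different semaphores do not contend."""
        with self.locks[index]:
            self.values[index] += delta

    def complex_op(self, deltas):
        """Operation on several semaphores ({index: delta}), applied atomically.
        Locks are taken in ascending index order to avoid deadlocks."""
        indices = sorted(deltas)
        for i in indices:
            self.locks[i].acquire()
        try:
            for i in indices:
                self.values[i] += deltas[i]
        finally:
            for i in reversed(indices):
                self.locks[i].release()


def worker(array, index, count):
    for _ in range(count):
        array.simple_op(index, 1)


array = ToySemArray(4)
threads = [threading.Thread(target=worker, args=(array, i, 1000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
array.complex_op({0: -1, 3: 2})
print(array.values)
```

With a single lock for the whole array, the four workers above would serialize on it even though they never touch the same semaphore; with per-semaphore locks they only meet when a multi-semaphore operation comes along.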
Code: [https://git.kernel.org/linus/823e75f723aa3fefd5d2eecbf8636184ca4790fc (merge commit)], [https://git.kernel.org/linus/16df3674efe39f3ab63e7052f1244dd3d50e7f84 (commit 1], [https://git.kernel.org/linus/6062a8dc0517bce23e3c2f7d2fea5e22411269a3 2], [https://git.kernel.org/linus/9f1bc2c9022c1d4944c4a1a44c2f365487420aca 3], [https://git.kernel.org/linus/c460b662d5cae467f1c341c59b02a5c5e68fed0b 4], [https://git.kernel.org/linus/444d0f621b64716f7868dcbde448e0c66ece4e61 5], [https://git.kernel.org/linus/4d2bff5eb86e8d7b4a20934cccb93bdeebed3558 6], [https://git.kernel.org/linus/7bb4deff61bdab3338534841cb6d0508314a41d6 7)]
1.6. rwsem locking scalability improvements
The rwsem ("reader-writer semaphore") locking scheme, used in many places in the Linux kernel, had [https://lkml.org/lkml/2013/1/29/84 performance problems] because of strict, serialized, FIFO write-ownership of the semaphore. In [http://kernelnewbies.org/Linux_3.9#head-8c93f0925010a379572a8e544adad642fe0c5009 Linux 3.9], an "opportunistic lock stealing" patch was merged to fix it, but only in the slow path.
In this release, opportunistic lock stealing has been implemented in the fast path, improving the performance of pgbench by double digits in some cases.
Code: [https://git.kernel.org/linus/c8de2fa4dc2778ae3605925c127b3deac54b2b3a (merge commit)]
1.7. TCP optimization: Tail loss probe
This release adds the [http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01 TCP Tail loss probe algorithm]. Its goal is to reduce tail latency of short transactions. It achieves this by converting retransmission timeouts (RTOs) occurring due to tail losses (losses at the end of transactions) into fast recovery. TLP transmits one packet in two round-trips when a connection is in the Open state and isn't receiving any ACKs. The transmitted packet, aka loss probe, can be either new or a retransmission. When there is tail loss, the ACK from a loss probe triggers FACK/early-retransmit based fast recovery, thus avoiding a costly retransmission timeout.
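The timing of the probe can be sketched as a small function. The two-round-trip base comes from the text above; the 10 ms floor, the extra allowance for a delayed ACK when a single segment is outstanding and the cap at the RTO are recalled from the linked draft and should be treated as assumptions to check against it. The function and parameter names are made up for the example, and this is not the kernel's code.

```python
def probe_timeout(srtt, rto, flight_size, worst_case_delayed_ack=0.200):
    """Toy probe timeout (PTO) in seconds.

    srtt: smoothed round-trip time, rto: current retransmission timeout,
    flight_size: number of outstanding segments.
    """
    pto = max(2 * srtt, 0.010)            # two round-trips, assumed 10 ms floor
    if flight_size == 1:
        # Assumed: with one segment in flight the receiver may delay its ACK,
        # so wait long enough not to probe needlessly.
        pto = max(pto, 1.5 * srtt + worst_case_delayed_ack)
    return min(pto, rto)                  # never wait longer than the RTO


# 50 ms round-trip time and a 1 s RTO: the probe fires long before the RTO.
print(probe_timeout(srtt=0.050, rto=1.0, flight_size=5))
```

The point of the comparison with the RTO is that the probe gives the sender a chance to learn about a lost tail after roughly two round-trips, instead of sitting idle until the much longer retransmission timeout expires.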
1.8. MIPS KVM support
Another Linux architecture has added support for KVM; in this case MIPS. KVM/MIPS should support MIPS32R2 and beyond. For more details, see the [https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/mips/kvm/00README.txt?id=03a0331c8c715c73d877aba8c542a60b13f70ed0 release notes].
Code: [http://git.kernel.org/linus/03a0331c8c715c73d877aba8c542a60b13f70ed0 (commit)]
1.9. tracing: tracing snapshots, stack tracing
The tracing framework can now use several tracing buffers, which can be used to take snapshots of the main tracing buffer. These tracing snapshots can be triggered manually or with function probes. It's also possible to have a stack trace recorded in the ring buffer when a given function is called.
Code: [http://git.kernel.org/linus/277ba04461c2746cf935353474c0961161951b68 (commit 1], [http://git.kernel.org/linus/0b85ffc293044393623059eda9904a7d5b644e36 2], [http://git.kernel.org/linus/ce9bae55972b228cf7bac34350c4d2caf8ea0d0b 3], [http://git.kernel.org/linus/55034cd6e648155393b0d665eef76b38d49ad6bf 4], [http://git.kernel.org/linus/77fd5c15e3216b901be69047ca43b05ae9099951 5], [http://git.kernel.org/linus/dd42cd3ea96d687f15525c4f14fa582702db223f 6)]
This document is incomplete and will be finished in the next 24 hours
2. Drivers and architectures
4. CPU scheduler
5. Memory management
6. File systems
Rescan for qgroups [http://git.kernel.org/linus/2f2320360b0c35b86938bfc561124474f0dac6e4 (commit)]
Automatic rescan after "quota enable" command [http://git.kernel.org/linus/3d7b5a2882133a04716903b1f4878a64c6610842 (commit)]
Create the subvolume qgroup automatically when enabling quota [http://git.kernel.org/linus/7708f029dca5f1b9e9d6ea01ab10cd83e4c74ff2 (commit)]
Deprecate subvolrootid mount option (obsoleted by subvol) [http://git.kernel.org/linus/5e2a4b25da232a2f4ce264a4b2ae113d0b2a799c (commit)]
Add CMAC support to CryptoAPI [http://git.kernel.org/linus/93b5e86a6d13c5dec18c6611933fb38d7d80f0d2 (commit)]
aesni_intel - add more optimized XTS mode for x86-64 [http://git.kernel.org/linus/c456a9cd1ac4eae9147ffd7ac4fb77ca0fa980c6 (commit)]
atmel-aes: add support for latest release of the IP (0x130) [http://git.kernel.org/linus/cadc4ab8f6f73719ef0e124320cdd210d1c9ff3e (commit)]
atmel-sha - add support for latest release of the IP (0x410) [http://git.kernel.org/linus/d4905b38d1f6b60761a6fd16f45ebd1fac8b6e1f (commit)]
atmel-tdes - add support for latest release of the IP (0x700) [http://git.kernel.org/linus/1f858040c2f78013fd2b10ddeb9dc157c3362b04 (commit)]
blowfish: add AVX2/x86_64 implementation of blowfish cipher [http://git.kernel.org/linus/604880107010a1e5794552d184cd5471ea31b973 (commit)]
camellia: add AVX2/AES-NI/x86_64 assembler implementation of camellia cipher [http://git.kernel.org/linus/f3f935a76aa0eee68da2b273a08d84ba8ffc7a73 (commit)], add more optimized XTS code [http://git.kernel.org/linus/b5c5b072dc2f35d45d3404b957e264a3e8e71069 (commit)]
sahara: Add driver for SAHARA2 accelerator. [http://git.kernel.org/linus/5de8875281e1db024d67cbd5c792264194bfca2a (commit)]
sha256: optimized sha256 x86_64 assembly routine using Supplemental SSE3 instructions. [http://git.kernel.org/linus/46d208a2bdf5c3d4a60f2363318f600d64493f60 (commit)], optimized sha256 x86_64 assembly routine with AVX instructions. [http://git.kernel.org/linus/ec2b4c851f4da48a51b79a69843beb135e3db8c2 (commit)], optimized sha256 x86_64 routine using AVX2's RORX instructions [http://git.kernel.org/linus/d34a460092d857f1616e39eed7eac6f40cea2225 (commit)]; module providing optimized routines using SSSE3, AVX or AVX2 instructions. [http://git.kernel.org/linus/8275d1aa642295edd34a11a117080384bb9d65c2 (commit)]
sha512: Optimized SHA512 x86_64 assembly routine using AVX instructions. [http://git.kernel.org/linus/e01d69cb01956e97b6880c1952e264b19473e7f3 (commit)], optimized SHA512 x86_64 assembly routine using AVX2 RORX instruction. [http://git.kernel.org/linus/5663535b69eef3940dcdb3110f95651304fe41af (commit)], optimized SHA512 x86_64 assembly routine using Supplemental SSE3 instructions. [http://git.kernel.org/linus/bf215cee23ad6e278bfba1291863718934de392a (commit)]; create module providing optimized SHA512 routines using SSSE3, AVX or AVX2 instructions. [http://git.kernel.org/linus/87de4579f92dbe50e92f33b94f8688793c894571 (commit)]
twofish: add AVX2/x86_64 assembler implementation of twofish cipher [http://git.kernel.org/linus/cf1521a1a5e21fd1e79a458605c4282fbfbbeee2 (commit)], use optimized XTS code [http://git.kernel.org/linus/18be45270a80ab489d9402b63e1f103428f0afde (commit)]
Add more optimized XTS-mode for serpent-avx [http://git.kernel.org/linus/a05248ed2d9a83ae7c3e6db7c4ef9331c3dedc81 (commit)]
KVM: Port to MIPS32 [http://git.kernel.org/linus/740765ce45689a4eca21914f8b2cc872a970f53f (commit)]
KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls [http://git.kernel.org/linus/8e591cb7204739efa8e15967ea334eb367039dde (commit)]
KVM: PPC: Book3S: Add kernel emulation for the XICS interrupt controller [http://git.kernel.org/linus/bc5ad3f3701116e7db57268e6f89010ec714697e (commit)]
KVM: x86: Increase the "hard" max VCPU limit [http://git.kernel.org/linus/cbf64358588ae45dcf0207dbc97fba783577d64a (commit)]
Add new "perf mem" command for memory access profiling [http://git.kernel.org/linus/ccf49bfc6bb1025788637417780e9f1eeae9fc37 (commit)], [http://git.kernel.org/linus/f4f7e28d0e813ddb997f49ae718ddf98db972292 (commit)], [http://git.kernel.org/linus/98a3b32c99ada4bca8aaf4f91efd96fc906dd5c4 (commit)], [http://git.kernel.org/linus/f20093eef5f7843a25adfc0512617d4b1ff1aa6e (commit)], [http://git.kernel.org/linus/028f12ee6beff0961781c5ed3f740e5f3b56f781 (commit)]
perf stat: Add per-core aggregation. This option is used to aggregate system-wide counts on a per physical core basis. On processors with hyperthreading, this means counts of all HT threads running on a physical core are aggregated [http://git.kernel.org/linus/12c08a9f591aeda57fb3b05897169e7da5439a79 (commit)]
perf annotate: Add --group option to enable event grouping. When enabled, the information of all group members is shown together with the group leader, so non-leader events are skipped [http://git.kernel.org/linus/b1dd443296b4f8c6869eba790eec950f80392aea (commit)], [http://git.kernel.org/linus/c7e7b6101361025fbea03833c6aee18e3d7bed34 (commit)], [http://git.kernel.org/linus/d8d7cd93e6b5f42bd2ae77680b5dc27415ba7492 (commit)]
perf report: Add --no-demangle option [http://git.kernel.org/linus/328ccdace8855289ad114b70ee1464ba5e3f6436 (commit)]
perf stat: Introduce --repeat forever [http://git.kernel.org/linus/a7e191c376fad084d9f3c7ac89a1f7c47462ebc8 (commit)], rename --aggr-socket to --per-socket [http://git.kernel.org/linus/d4304958a25414a6e67b8a41c0f230e05cafafb6 (commit)]
perf tests: Add attr record -C cpu test [http://git.kernel.org/linus/b03ec1b53070e0fae9de72b584d94b65a4a97635 (commit)], add attr stat -C cpu test [http://git.kernel.org/linus/9687b89d21999301ed386855c04b60d00ed1ec02 (commit)]
Add support for weighted sampling [http://git.kernel.org/linus/05484298cbfebbf8c8c55b000541a245bc286bec (commit)]
Make perf_event cgroup hierarchical [http://git.kernel.org/linus/ef824fa129b7579f56b92d466ecda2e378879806 (commit)]
tracing: Add function probe triggers to enable/disable events [http://git.kernel.org/linus/3cd715de261182413b3487abfffe1b6af41b81b3 (commit)]
tracing: Add "uptime" trace clock that uses jiffies [http://git.kernel.org/linus/8aacf017b065a805d27467843490c976835eb4a5 (commit)]
tracing: Add a way to soft disable trace events [http://git.kernel.org/linus/417944c4c7a0f657158d0515f3b8e8c043fd788f (commit)]
tracing: Add function-trace option to disable function tracing of latency tracers [http://git.kernel.org/linus/328df4759c03e2c3e7429cc6cb0e180c38f32063 (commit)]