
Linux 3.10 [https://lkml.org/lkml/2013/6/30/75 has been released].

Summary: This release adds support for bcache, which allows SSD devices to be used as a cache for other block devices; a Btrfs format improvement that makes the tree used to store extent information 30-35% smaller; support for XFS metadata checksums and self-describing metadata; timer-free multitasking for applications running alone on a CPU; SysV IPC and rwsem scalability improvements; the TCP Tail loss probe algorithm, which reduces the tail latency of short transactions; KVM virtualization support for the MIPS architecture; and many new drivers and small improvements.


1. Prominent features (the cool stuff)

1.1. Timer free multitasking

In the prehistory of computing, computers could only run one task at a time. But people wanted to start other tasks without waiting for the first one to end, and even to switch between tasks, and thus multitasking was born. At first, multitasking was "cooperative": a process would run until its own code voluntarily decided to pause and let other tasks run. But multitasking could be done better: the hardware could have a timer that fires at regular intervals (called "ticks"), and each tick could forcefully pause any program and run an OS routine that decides which task should run next. This is called preemptive multitasking, and it is what modern OSs do.

But preemptive multitasking has some side effects on modern hardware. The CPUs of laptops and mobile devices need periods of inactivity to enter low-power modes. Preemptive multitasking fires the timer often (1000 times per second in a typical Linux kernel), even when the system is not doing anything, so the CPUs could not save as much power as they otherwise could. Virtualization created more problems, since each Linux VM runs its own timer. [http://kernelnewbies.org/Linux_2_6_21#head-8547911895fda9cdff32a94771c8f5706d66bba0 In 2.6.21, released in April 2007, Linux partially solved this]: the timer keeps firing 1000 times per second as usual while the system is running tasks, but it stops completely when the system is idle. But this is not enough. There are single-task workloads, like scientific number crunching or users of the real-time patchset, whose performance or latency suffers because they are interrupted 1000 times per second for no reason.

This Linux release adds support for not firing the timer (tickless operation) even while tasks are running, with some caveats: in this release it is not actually fully tickless, since the timer is still needed, but it fires only once per second; full tickless mode is disabled when a CPU runs more than one process; and one CPU must keep running with full ticks so that other CPUs can go into tickless mode.
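As a hedged illustration of the "task running alone on a CPU" case, the Python sketch below assumes a kernel built with CONFIG_NO_HZ_FULL and booted with a nohz_full= parameter listing the CPUs that may run tickless (for example nohz_full=1-3). Both names come from the kernel's NO_HZ documentation rather than from this page, and the helper functions parse_cpu_list, nohz_full_cpus and run_alone_on are hypothetical. The script reads the CPU list from /proc/cmdline and pins a single CPU-bound task to one of those CPUs, so nothing else competes with it there.

```python
import os


def parse_cpu_list(spec):
    """Expand a kernel CPU list such as "1,6-8" into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nohz_full_cpus(cmdline):
    """Return the CPUs listed by nohz_full= in a kernel command line string."""
    for arg in cmdline.split():
        if arg.startswith("nohz_full="):
            return parse_cpu_list(arg[len("nohz_full="):])
    return set()


def run_alone_on(cpu, work):
    """Pin the calling process to a single CPU (Linux only), then call work()."""
    os.sched_setaffinity(0, {cpu})
    return work()


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            isolated = nohz_full_cpus(f.read())
    except OSError:
        isolated = set()
    # Only consider nohz_full CPUs this process is actually allowed to use.
    usable = isolated & os.sched_getaffinity(0) if isolated else set()
    if usable:
        cpu = min(usable)
        # A pure CPU-bound loop: the single-task workload that benefits.
        total = run_alone_on(cpu, lambda: sum(i * i for i in range(1_000_000)))
        print(f"ran on CPU {cpu}, result {total}")
    else:
        print("no usable nohz_full= CPUs configured on this kernel")
```

Pinning alone does not make a CPU tickless; it only ensures that one task is the sole runnable process there, which is the condition the caveats above require.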

For more details and future plans, it's strongly recommended to read this LWN article: '[https://lwn.net/Articles/549580/ (Nearly) full tickless operation in 3.10]' and the [http://git.kernel.org/linus/0c87f9b5ca5bdda1a868b0d7df4bec92e41a468d Documentation].

Code: [https://git.kernel.org/linus/534c97b0950b1967bca1c753aeaed32f5db40264 (merge commit)]

1.2. Bcache, a block layer cache for SSD caching

Since SSD storage devices became popular, many people have used them to speed up their storage stack. Bcache is an implementation of this idea: it allows SSDs to act as a cache for other block devices. It is analogous to L2ARC for ZFS, but Bcache also does writeback caching (in addition to write-through caching), and it is filesystem-agnostic. It is designed to be switched on with a minimum of effort and to work well without configuration on any setup. By default it won't cache sequential IO, just the random reads and writes that SSDs excel at. It is meant to be suitable for desktops, servers, high-end storage arrays, and perhaps even embedded systems.
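The "don't cache sequential IO" policy can be illustrated with a deliberately simplified model. This is not bcache's code: it assumes one tracked stream per hypothetical stream_id and a 4 MiB cutoff loosely modelled on the sequential_cutoff tunable described in the bcache documentation (the value here is an assumption for the sketch). A request that continues where the previous one in its stream ended extends a running length; once that length passes the cutoff, the IO bypasses the SSD.

```python
from dataclasses import dataclass, field

SEQUENTIAL_CUTOFF = 4 * 1024 * 1024  # bytes; assumed value for this sketch


@dataclass
class BypassPolicy:
    """Toy model: cache random IO, bypass long sequential streams."""
    cutoff: int = SEQUENTIAL_CUTOFF
    # stream_id -> (next expected offset, bytes seen sequentially so far)
    streams: dict = field(default_factory=dict)

    def should_cache(self, stream_id, offset, length):
        expected, run = self.streams.get(stream_id, (None, 0))
        # A request starting where the previous one ended extends the run;
        # anything else starts a new run.
        run = run + length if offset == expected else length
        self.streams[stream_id] = (offset + length, run)
        return run <= self.cutoff
```

With 1 MiB requests in a single sequential stream, the first four requests (4 MiB in total) are cached and everything after that goes straight to the backing device; a random request resets the run, so the stream becomes cacheable again.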

For more details, read the [https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/bcache.txt documentation] or visit the [http://bcache.evilpiepirate.org/ wiki].

Recommended LWN article: [https://lwn.net/Articles/497024/ A bcache update]

Code: [http://git.kernel.org/linus/cafe563591446cf80bfbc2fe3bc72a2e36cf1060 (commit)]

1.3. Btrfs: smaller extents

Btrfs has incorporated a new key type for metadata extent references which uses disk space more efficiently, reducing the size of the extent reference for each tree block from 51 bytes to 33 bytes. In practice, this results in a 30-35% decrease in the size of the extent tree, which means fewer copy-on-write operations and a larger part of the extent tree fitting in memory, which makes heavy metadata operations much faster.
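The per-reference saving follows directly from the two sizes quoted above. The measured 30-35% for the whole tree is consistent with it, since not every item in the extent tree is a metadata extent reference:

```latex
\frac{51\ \text{bytes} - 33\ \text{bytes}}{51\ \text{bytes}} = \frac{18}{51} \approx 0.353
\quad\Rightarrow\quad \text{each reference is about 35\% smaller}
```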

This is not an automatic format change; it must be enabled at mkfs time or later with btrfstune -x.

Code: [http://git.kernel.org/linus/3173a18f70554fe7880bb2d85c7da566e364eb3c (commit)]

1.4. XFS metadata checksums

In this release, XFS has an experimental implementation of metadata CRC32c checksums. These checksums are part of a bigger project that aims to implement what the XFS developers call "self-describing metadata". This project aims to solve the problem of verification scalability (fsck would need far too much time to verify petabyte-scale filesystems with billions of inodes). It requires a filesystem format change that adds to every XFS metadata object some information that allows one to quickly determine whether the metadata is intact and can be ignored for the purpose of forensic analysis: the metadata type, the filesystem identifier and block placement, the metadata owner, a log sequence identifier and, of course, a CRC checksum.
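CRC32c is the Castagnoli variant of CRC-32; it uses a different polynomial from the CRC-32 found in zlib or Ethernet. The minimal bitwise Python sketch below computes it and adds hypothetical seal/verify helpers. For illustration only, they assume the checksum is a little-endian 32-bit field at a caller-chosen offset inside the block and is computed with that field treated as zero. This is not the XFS on-disk layout, nor the kernel's table-driven or hardware-accelerated implementation.

```python
import struct

CRC32C_POLY_REVERSED = 0x82F63B78  # Castagnoli polynomial, bit-reversed form


def crc32c(data, crc=0):
    """Bitwise CRC32c; slow but easy to check. Pass crc to continue a checksum."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REVERSED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def block_checksum(block, cksum_off):
    """CRC32c of a block with its 4-byte checksum field treated as zero."""
    zeroed = block[:cksum_off] + b"\0\0\0\0" + block[cksum_off + 4:]
    return crc32c(zeroed)


def seal_block(block, cksum_off):
    """Return a copy of block with the checksum field filled in (hypothetical layout)."""
    crc = block_checksum(block, cksum_off)
    return block[:cksum_off] + struct.pack("<I", crc) + block[cksum_off + 4:]


def block_is_intact(block, cksum_off):
    """Recompute the checksum and compare it with the stored value."""
    (stored,) = struct.unpack_from("<I", block, cksum_off)
    return stored == block_checksum(block, cksum_off)
```

Any single flipped bit in a sealed block makes block_is_intact() return False, which is the cheap "is this metadata intact?" check that self-describing metadata builds on.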

This feature is experimental and requires an experimental version of xfsprogs. For more information, you can read the self-describing metadata [http://git.kernel.org/linus/dccc3f447a5e065a1c4406aede72d160ae38a736 Documentation].

Code: [https://git.kernel.org/linus/c8d8566952fda026966784a62f324c8352f77430 (merge commit)]

1.5. SysV IPC scalability improvements

Linux IPC semaphore scalability was pitiful. Linux used to lock much larger ranges than necessary, with a single IPC lock for each entire IPC semaphore array. Most workloads never cared, but some do. This release splits up the locking and adds per-semaphore locks for greater scalability of the IPC semaphore code. Microbenchmarks show improvements of more than 10x in some cases (see the commit links for details).
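The difference between one lock per array and one lock per semaphore can be sketched with a toy Python model. It only illustrates fine-grained locking and is not the kernel's scheme: real semop() can sleep, supports undo, and coordinates simple and complex operations differently. Here a single-semaphore operation takes only that semaphore's lock, while a multi-semaphore operation takes the locks it needs in ascending index order (to avoid deadlock) and applies all changes or none.

```python
import threading


class SemArray:
    """Toy SysV-style semaphore array with one lock per semaphore."""

    def __init__(self, nsems):
        self.values = [0] * nsems
        self.locks = [threading.Lock() for _ in range(nsems)]

    def op(self, num, delta):
        """Single-semaphore operation: only semaphore `num` is locked,
        so operations on different semaphores never contend."""
        with self.locks[num]:
            if self.values[num] + delta < 0:
                return False  # a real semop() would sleep here (or fail with IPC_NOWAIT)
            self.values[num] += delta
            return True

    def multi_op(self, ops):
        """Apply several (num, delta) pairs atomically, all or nothing."""
        nums = sorted({num for num, _ in ops})
        for num in nums:  # fixed lock order prevents deadlock between multi_ops
            self.locks[num].acquire()
        try:
            pending = {num: self.values[num] for num in nums}
            for num, delta in ops:
                pending[num] += delta
                if pending[num] < 0:
                    return False  # nothing has been modified yet
            for num, value in pending.items():
                self.values[num] = value
            return True
        finally:
            for num in reversed(nums):
                self.locks[num].release()
```

With a single array-wide lock, threads working on semaphores 0 and 1 would serialize on each other; in this model they only serialize when they touch the same semaphore.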

Code: [https://git.kernel.org/linus/823e75f723aa3fefd5d2eecbf8636184ca4790fc (merge commit)], [https://git.kernel.org/linus/16df3674efe39f3ab63e7052f1244dd3d50e7f84 (commit 1], [https://git.kernel.org/linus/6062a8dc0517bce23e3c2f7d2fea5e22411269a3 2], [https://git.kernel.org/linus/9f1bc2c9022c1d4944c4a1a44c2f365487420aca 3], [https://git.kernel.org/linus/c460b662d5cae467f1c341c59b02a5c5e68fed0b 4], [https://git.kernel.org/linus/444d0f621b64716f7868dcbde448e0c66ece4e61 5], [https://git.kernel.org/linus/4d2bff5eb86e8d7b4a20934cccb93bdeebed3558 6], [https://git.kernel.org/linus/7bb4deff61bdab3338534841cb6d0508314a41d6 7)]

1.6. rwsem locking scalability improvements

The rwsem ("reader-writer semaphore") locking scheme, used in many places in the Linux kernel, had [https://lkml.org/lkml/2013/1/29/84 performance problems] because write ownership of the semaphore was handed over in strict, serialized FIFO order. In [http://kernelnewbies.org/Linux_3.9#head-8c93f0925010a379572a8e544adad642fe0c5009 Linux 3.9], an "opportunistic lock stealing" patch was merged to fix this, but only in the slow path.

In this release, opportunistic lock stealing has also been implemented in the fast path, improving pgbench performance by double-digit percentages in some cases.
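The difference between strict FIFO hand-off and opportunistic stealing can be seen in a toy, single-threaded model of write acquisition. This is an illustration under simplifying assumptions (writers only, no atomics, and an explicit step for "the woken waiter finally runs"), not the kernel's rwsem code. With hand-off, a writer that is already running must queue behind a waiter that still has to be scheduled. With stealing, the running writer can take the free lock at once instead of waiting for that wakeup.

```python
from collections import deque


class ToyRwsem:
    """Single-threaded model of write locking with or without lock stealing."""

    def __init__(self, steal):
        self.steal = steal
        self.owner = None
        self.waiters = deque()

    def write_lock(self, task):
        """Fast path: take a free lock immediately; otherwise queue up."""
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        return False

    def write_unlock(self):
        """Release the lock and return the waiter that gets woken, if any."""
        self.owner = None
        if not self.waiters:
            return None
        first = self.waiters.popleft()
        if not self.steal:
            self.owner = first  # strict FIFO hand-off: the lock never looks free
        return first

    def woken_waiter_runs(self, task):
        """A woken waiter finally gets CPU time and retries the lock."""
        if self.owner == task:
            return True  # it was handed the lock
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.appendleft(task)  # lock was stolen; wait again at the front
        return False
```

The trade-off is fairness: in the stealing variant a queued waiter can lose the lock to a newly arriving writer, in exchange for keeping the lock busy instead of idle while the waiter wakes up.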

Code: [https://git.kernel.org/linus/c8de2fa4dc2778ae3605925c127b3deac54b2b3a (merge commit)]

1.7. TCP optimization: Tail loss probe

This release adds the [http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01 TCP Tail loss probe algorithm] (TLP). Its goal is to reduce the tail latency of short transactions. It achieves this by converting retransmission timeouts (RTOs) that occur due to tail losses (losses at the end of transactions) into fast recovery. TLP transmits one packet after about two round-trip times when a connection is in the Open state and isn't receiving any ACKs. The transmitted packet, known as the loss probe, can be either new data or a retransmission. When there is tail loss, the ACK for the loss probe triggers FACK/early-retransmit based fast recovery, thus avoiding a costly retransmission timeout.
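The "about two round-trip times" wait is the probe timeout (PTO). The Python sketch below paraphrases the PTO rule and probe choice described in the cited draft; it is not the kernel's tcp_output.c code, and Linux may differ in details such as units and rounding. All times are in milliseconds, and the 200 ms worst-case delayed-ACK value is the draft's assumption for a receiver that holds back its ACK when only one segment is outstanding.

```python
WC_DEL_ACK_MS = 200.0  # worst-case delayed-ACK timer assumed by the draft
MIN_PTO_MS = 10.0      # lower bound on the probe timeout


def probe_timeout_ms(srtt_ms, flight_size, rto_ms):
    """Probe timeout (PTO) for a connection with unacknowledged data.

    More than one segment in flight: wait two smoothed RTTs (at least 10 ms).
    A single segment in flight: also allow for the receiver delaying its ACK.
    Never wait longer than the regular retransmission timeout.
    """
    if flight_size > 1:
        pto = max(2 * srtt_ms, MIN_PTO_MS)
    else:
        pto = max(2 * srtt_ms, 1.5 * srtt_ms + WC_DEL_ACK_MS)
    return min(pto, rto_ms)


def choose_probe(unsent_data_available, window_allows_new_data):
    """On PTO expiry: send new data if possible, else retransmit the last segment."""
    if unsent_data_available and window_allows_new_data:
        return "new segment"
    return "retransmit last segment"
```

For example, with a 50 ms smoothed RTT and five segments in flight the probe fires after 100 ms, well before a typical RTO; with a single segment in flight the delayed-ACK allowance pushes the PTO to 275 ms, capped by the RTO if that is smaller.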

Code: [http://git.kernel.org/linus/6ba8a3b19e764b6a65e4030ab0999be50c291e6c (commit 1], [http://git.kernel.org/linus/9b717a8d245075ffb8e95a2dfb4ee97ce4747457 2)]

1.8. MIPS KVM support

MIPS is another architecture that now supports KVM. KVM/MIPS should support MIPS32R2 and later. For more details, see the [https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/mips/kvm/00README.txt?id=03a0331c8c715c73d877aba8c542a60b13f70ed0 release notes].

Code: [http://git.kernel.org/linus/03a0331c8c715c73d877aba8c542a60b13f70ed0 (commit)]

1.9. tracing: tracing snapshots, stack tracing

The tracing framework now supports several tracing buffers, which can be used to take snapshots of the main tracing buffer. These tracing snapshots can be triggered manually or with function probes. It is also possible to record a stack trace in the ring buffer whenever a given function is called.
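A hedged Python sketch of driving these features through the tracing control files follows. It assumes debugfs is mounted at /sys/kernel/debug, the kernel has snapshot support (CONFIG_TRACER_SNAPSHOT) enabled, and the script runs as root. The file names (instances/, snapshot, set_ftrace_filter) and the function:command[:count] probe syntax are taken from the kernel's ftrace documentation; the helper functions themselves are hypothetical.

```python
import os

TRACING = "/sys/kernel/debug/tracing"  # assumed debugfs mount point


def probe_command(function, action, count=None):
    """Build a set_ftrace_filter probe such as 'schedule:snapshot' or 'kfree:stacktrace:5'."""
    if action not in ("snapshot", "stacktrace"):
        raise ValueError("unsupported action: " + action)
    cmd = f"{function}:{action}"
    if count is not None:
        cmd += f":{count}"  # limit how many times the probe fires
    return cmd


def write_tracing_file(name, text, tracing_dir=TRACING):
    """Write one command to a tracing control file (requires root)."""
    with open(os.path.join(tracing_dir, name), "w") as f:
        f.write(text + "\n")


def new_instance(name, tracing_dir=TRACING):
    """Create an additional, independent trace buffer under instances/."""
    path = os.path.join(tracing_dir, "instances", name)
    os.mkdir(path)
    return path


def take_snapshot(tracing_dir=TRACING):
    """Manually snapshot the main buffer and return the snapshot's contents."""
    write_tracing_file("snapshot", "1", tracing_dir)
    with open(os.path.join(tracing_dir, "snapshot")) as f:
        return f.read()
```

For example, writing probe_command("kfree", "stacktrace", 5) to set_ftrace_filter would ask the tracer to record a stack trace for the first five calls to kfree(), while take_snapshot() covers the manual trigger.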

Code: [http://git.kernel.org/linus/277ba04461c2746cf935353474c0961161951b68 (commit 1], [http://git.kernel.org/linus/0b85ffc293044393623059eda9904a7d5b644e36 2], [http://git.kernel.org/linus/ce9bae55972b228cf7bac34350c4d2caf8ea0d0b 3], [http://git.kernel.org/linus/55034cd6e648155393b0d665eef76b38d49ad6bf 4], [http://git.kernel.org/linus/77fd5c15e3216b901be69047ca43b05ae9099951 5], [http://git.kernel.org/linus/dd42cd3ea96d687f15525c4f14fa582702db223f 6)]

/!\ /!\ This document is incomplete and will be finished in the next 24 hours /!\ /!\

2. Drivers and architectures

All the driver and architecture-specific changes can be found on the [http://kernelnewbies.org/Linux_3.10-DriversArch Linux_3.10-DriversArch page].

3. Core

4. CPU scheduler

5. Memory management

6. File systems

EXT4

Btrfs

7. Networking

8. Block

9. Crypto

10. Virtualization

11. Security

12. Tracing/perf



KernelNewbies: Linux_3.10 (last edited 2013-07-01 13:46:43 by diegocalleja)