KernelNewbies:

Linux 3.18 [https://lkml.org/lkml/2014/12/7/202 has been released] on Sun, 7 Dec 2014

Summary: This release adds support for overlayfs, which allows combining two filesystems in a single mount point; support for mapping user-space memory into the GPU on Radeon devices; a bpf() syscall that allows uploading BPF-like programs that can be attached to events; a TCP congestion algorithm optimized for data centers; the Geneve virtualization encapsulation; support for embedding IP protocols over UDP; improved networking performance thanks to batching the processing of socket buffers; and optional multi-queue SCSI support. There are also new drivers and many other small improvements.

TableOfContents()

1. Prominent features

1.1. Overlayfs

An overlay filesystem combines two filesystems, an 'upper' filesystem and a 'lower' filesystem, into a single filesystem namespace; all modifications are made to the upper filesystem. It has many uses, but it is most often seen on live CDs, where a read-only OS image is used as the lower filesystem and a writable RAM-backed filesystem is used as the upper one, allowing the OS image to be used as if it were a normal, writable system. Overlayfs differs from other "union filesystem" implementations in that, after a file is opened, all operations go directly to the underlying lower or upper filesystem. This simplifies the implementation and allows native performance in these cases.

It is possible for both directory trees to be in the same filesystem, and there is no requirement that the root of a filesystem be given for either the upper or the lower tree. The lower filesystem can be any filesystem supported by Linux and does not need to be writable; it can even be another overlayfs. The upper filesystem will normally be writable, and if it is, it must support the creation of trusted.* extended attributes and must provide valid d_type in readdir() responses, so NFS is not suitable.
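As a configuration sketch (the paths are hypothetical, and mounting requires root), an overlay of a read-only lower tree and a writable upper tree can be assembled like this; since this release the filesystem type is 'overlay', and a 'workdir' on the same filesystem as the upper directory is required:

```shell
# Hypothetical directories for the example.
mkdir -p /lower /upper /work /merged

# 'lowerdir' stays read-only, all changes land in 'upperdir';
# 'workdir' must be an empty directory on the same filesystem
# as 'upperdir'.
mount -t overlay overlay \
      -o lowerdir=/lower,upperdir=/upper,workdir=/work \
      /merged
```

After this, writes under /merged are stored in /upper while /lower is never modified.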

Documentation: [http://git.kernel.org/linus/7c37fbda85ceb9be7bdb9d5f53e702efc40cf783 commit] Code: [http://git.kernel.org/linus/e9be9d5e76e34872f0c37d72e25bc27fe9e2c54c commit]

1.2. Radeon: mapping of user pages into video memory

[http://kernelnewbies.org/Linux_3.16#head-822ab6b7936786bb9e91c16ecdcefe6fd20dc6bf Linux 3.16 added] the ability to map user-space addresses into video memory for Intel hardware. In this release, Radeon gains support for the same feature. Normal application data can be used as a texture source or even as a render target (depending upon the capabilities of the chipset). This has a number of uses, with zero-copy downloads to the GPU and efficient readback making intermixed streaming of CPU and GPU operations fairly efficient. The implications are widespread: faster rendering in client-side software rasterisers (Chromium), mitigation of stalls due to readback (Firefox), and faster pipelining of texture data (such as pixel buffer objects in GL or data blobs in CL).

Code: [http://git.kernel.org/linus/f72a113a71ab08c4df8a5f80ab2f8a140feb81f6 commit]

1.3. bpf() syscall for eBPF virtual machine programs

The bpf() syscall is a multiplexer for a range of different operations on eBPF, which can be characterized as a "universal in-kernel virtual machine". eBPF is similar to the original Berkeley Packet Filter used to filter network packets, but it "extends" classic BPF in multiple ways, including the ability to call in-kernel helper functions and to access shared data structures such as eBPF maps. Programs can be written in a restricted C that is compiled into eBPF bytecode, then executed on the eBPF virtual machine or JIT-compiled to the native instruction set.

eBPF programs are similar to kernel modules: they are loaded by a user process and automatically unloaded when that process exits. Each eBPF program is a safe, run-to-completion set of instructions; the eBPF verifier statically determines that the program terminates and is safe to execute. Programs are attached to different events, which can be packets, tracepoint events, and other types in the future. Beyond storing data, programs may call in-kernel helper functions which can, for example, dump the stack, do trace_printk, or perform other forms of live kernel debugging.

Recommended LWN article: [http://lwn.net/Articles/612878/ The BPF system call API, version 14]

The bpf() man page and design documentation can be read in the merge commit: [http://git.kernel.org/linus/b4fc1a460f3017e958e6a8ea560ea0afd91bf6fe commit]

Code: [http://git.kernel.org/linus/99c55f7d47c0dc6fc64729f37bf435abf43f4c60 commit 1], [http://git.kernel.org/linus/db20fd2b01087bdfbe30bce314a198eefedcc42e 2], [http://git.kernel.org/linus/09756af46893c18839062976c3252e93a1beeba7 3], [http://git.kernel.org/linus/0a542a86d73b1577e7d4f55fc95dcffd3fe62643 4], [http://git.kernel.org/linus/51580e798cb61b0fc63fa3aa6c5c975375aa0550 5], [http://git.kernel.org/linus/cbd357008604925355ae7b54a09137dabb81b580 6], [http://git.kernel.org/linus/0246e64d9a5fcd4805198de59b9b5cf1f974eb41 7], [http://git.kernel.org/linus/475fb78fbf48592ce541627c60a7b331060e31f5 8], [http://git.kernel.org/linus/17a5267067f3c372fec9ffb798d6eaba6b5e6a4c 9], [http://git.kernel.org/linus/3c731eba48e1b0650decfc91a839b80f0e05ce8f 10]

1.4. TCP: Data Center TCP congestion algorithm

This release adds the Data Center TCP (DCTCP) congestion control algorithm. DCTCP is an enhancement to TCP congestion control for data center networks, designed for the workloads typical of those environments: high burst tolerance, low latency and high throughput.

For more details about DCTCP, see the [http://simula.stanford.edu/~alizade/Site/DCTCP.html DCTCP web page].

Code: [http://git.kernel.org/linus/e3118e8359bb7c59555aca60c725106e6d78c5ce commit]

1.5. Networking: Geneve Virtualization Encapsulation

The advent of network virtualization has caused a surge of renewed interest in tunneling and a corresponding increase in the introduction of new protocols, ranging all the way from VLANs and MPLS to the more recent VXLAN, NVGRE, and STT. Existing tunnel protocols have each attempted to solve different aspects of the new requirements, only to be quickly rendered out of date. This release adds Geneve, a protocol which seeks to avoid these problems by providing a tunneling framework that provides Layer 2 networks over Layer 3 networks.

For more information see http://tools.ietf.org/html/draft-gross-geneve-01

Related vmware blog post: [http://blogs.vmware.com/cto/geneve-vxlan-network-virtualization-encapsulations/ Geneve, VXLAN, and Network Virtualization Encapsulations]

Code: [http://git.kernel.org/linus/0b5e8b8eeae40bae6ad7c7e91c97c3c0d0e57882 commit]

1.6. Networking performance optimization: transmission queue batching

This release adds support for deferred flushing of transmission [http://vger.kernel.org/~davem/skb.html SKB]s (socket buffers) to the networking driver. Processing the transmission queue is expensive, so batching shares that cost across several SKBs. This change makes it possible to achieve full 10 Gbit/s TX wirespeed with the smallest packet size (14.8 Mpps) on a single CPU core. Several drivers already support this feature: i40e, igb, ixgbe, mlx4 and virtio_net; more will follow in future releases.

Recommended LWN article: [http://lwn.net/Articles/615238/ Bulk network packet transmission]

Recommended blog post: [http://netoptimizer.blogspot.dk/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html Unlocked 10Gbps TX wirespeed smallest packet single core]

Code: see the blog post link

1.7. Foo-over-UDP support

This release adds the ability to encapsulate any IP protocol over UDP, including tunnels (IPIP, GRE, SIT).

The rationale for this functionality is that network mechanisms, hardware and optimizations for UDP (such as ECMP and RSS) can be leveraged to provide better service. GRE, IPIP, and SIT have been modified with netlink commands to configure the use of FOU on transmit. A new "ip fou" command has been added in newer releases of iproute2 to make use of this feature. Details on configuration can be found in the merge link.
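For illustration, the receive and transmit sides might be configured with iproute2 along these lines (addresses and port number are examples; requires root and an ip binary with fou support):

```shell
# Receive side: accept FOU-encapsulated IPIP (IP protocol 4)
# on UDP port 5555.
ip fou add port 5555 ipproto 4

# Transmit side: an IPIP tunnel whose packets are wrapped in UDP
# towards port 5555 of the peer.
ip link add name tun0 type ipip \
    remote 192.0.2.1 local 192.0.2.2 \
    encap fou encap-sport auto encap-dport 5555
```

Because the outer packets are plain UDP, NIC features such as RSS and switch-level ECMP can spread the tunneled traffic as they would any other UDP flow.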

Recommended LWN link: [http://lwn.net/Articles/614348/ Foo over UDP]

Merge link: [https://git.kernel.org/linus/fb5690d2458340b645ea3b36e8db560cb3272e65 merge]

1.8. Optional multiqueue SCSI support

Linux 3.13 [http://kernelnewbies.org/Linux_3.13#head-3e5f0c2bcebc98efd197e3036dd814eadd62839c added] a new design for the block layer that allows multiple IO queues to be processed in parallel. This feature, however, wasn't transparent: it required modifications in the drivers that wanted to support it. In this release, support for the multi-queue block layer has been added to the SCSI layer (used by ATA and SATA drivers) as an optional feature, selectable through a configuration option.
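As a configuration sketch, the new path can be selected either at kernel build time or at boot:

```
# Build time: default SCSI hosts to the multi-queue (blk-mq) path
CONFIG_SCSI_MQ_DEFAULT=y

# Or at boot, on the kernel command line:
scsi_mod.use_blk_mq=Y
```

When the option is off, SCSI keeps using the traditional single-queue request path, so distributions can ship the feature without changing default behaviour.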

Code: [http://git.kernel.org/linus/24c20f10583647e30afe87b6f6d5e14bc7b1cbc6 commit]

2. Drivers and architectures

All the driver and architecture-specific changes can be found in the [http://kernelnewbies.org/Linux_3.18-DriversArch Linux_3.18-DriversArch page]

3. File systems

4. Security

5. Block

6. Memory management

7. Crypto

8. Virtualization

9. Tracing & perf

10. Networking

11. Core (various)

KernelNewbies: Linux_3.18 (last edited 2015-01-13 13:08:01 by MartiRaudsepp)