
Linux 4.6 was released on Sun, 15 May 2016.

Summary: This release adds support for USB 3.1 SuperSpeedPlus (10 Gbps), the new distributed file system OrangeFS, more reliable out-of-memory handling, support for Intel memory protection keys, a facility that makes implementations of application layer protocols easier and faster, support for 802.1AE MAC-level encryption (MACsec), support for version V of the BATMAN protocol, an OCFS2 online inode checker, support for cgroup namespaces, support for the pNFS SCSI layout, and many other improvements and new drivers.

1. Prominent features

1.1. USB 3.1 SuperSpeedPlus (10 Gbps) support

The USB 3.1 specification includes a new SuperSpeedPlus protocol that supports speeds of up to 10 Gbps. USB 3.1 devices using the new SuperSpeedPlus protocol are called USB 3.1 Gen2 devices (note that USB 3.1 SuperSpeedPlus is not the same as Type-C or power delivery).

This release adds support for USB 3.1 SuperSpeedPlus 10 Gbps speeds to the USB core and the xHCI host controller, meaning that a USB 3.1 mass storage device connected to a USB 3.1 capable xHCI host should work at 10 Gbps.

Code: commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit

1.2. Improve the reliability of the Out Of Memory task killer

In previous releases, the OOM killer (which tries to kill a task to free memory) killed a single task in the hope that the task would terminate in a reasonable time and free up its memory. In practice, it has been shown that it's easy to find workloads which break that assumption: the OOM victim might take an unbounded amount of time to exit, because it might be blocked in the uninterruptible state waiting for an event that is held up by another task looping in the page allocator. This release adds a specialized kernel thread, oom_reaper, that tries to reclaim memory by preemptively reaping the anonymous or swapped-out memory owned by the OOM victim, under the assumption that such memory won't be needed once its owner is killed anyway.

Recommended LWN article: Toward more predictable and reliable out-of-memory handling

Code: commit, commit, commit, commit, commit, commit, commit, commit, commit

1.3. Support for Intel memory protection keys

This release adds support for a memory protection hardware feature that is available in upcoming Intel CPUs: protection keys. Protection keys allow the encoding of user-controllable permission masks in the page table entries (PTEs). Instead of having a fixed protection mask in the PTE (which needs a system call to change and works on a per-page basis), the user can map a handful of protection mask variants. User space can then manipulate a new user-accessible, thread-local register (PKRU) with two separate bits (Access Disable and Write Disable) for each mask. This makes it possible to dynamically switch the protection bits of very large amounts of virtual memory by just manipulating a CPU register, without having to change every single page in the affected virtual memory range.

It also allows more precise control of MMU permission bits: for example, the executable bit is separate from the read bit. This release adds the infrastructure for that, plus a high-level API to make use of protection keys. If a user-space application calls mmap(..., PROT_EXEC) or mprotect(ptr, sz, PROT_EXEC) (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice this special case and will set a special protection key on this memory range. It also sets the appropriate bits in the PKRU register so that the memory becomes unreadable and unwritable. Using protection keys, the kernel is thus able to implement 'true' PROT_EXEC: code that can be executed but not read, which is a small security advantage (but note that malicious code can manipulate the PKRU register too). In the future, there will be further work around protection keys, offering higher-level APIs to manage them.
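
As an illustration of the execute-only case described above, here is a minimal, hypothetical C sketch (the file name and mapping length are made up for the example): a PROT_EXEC-only mapping that, on pkeys-capable hardware, can be executed but not read.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* "code.bin" is a hypothetical file containing machine code. */
        int fd = open("code.bin", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* PROT_EXEC only, without PROT_READ/WRITE: on pkeys-capable CPUs
         * the kernel assigns an execute-only protection key to this range. */
        void *code = mmap(NULL, 4096, PROT_EXEC, MAP_PRIVATE, fd, 0);
        if (code == MAP_FAILED) { perror("mmap"); return 1; }

        /* Jumping into the mapping still works, but reading it (for example
         * dereferencing a char pointer into it) now faults. */
        return 0;
    }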

Recommended LWN article: Memory protection keys

Code: (merge)

1.4. OrangeFS, a new distributed file system

OrangeFS is an LGPL-licensed scale-out parallel storage system. Originally called PVFS, it was first developed in 1993 by Walt Ligon and Eric Blumer as a parallel file system for the Parallel Virtual Machine, as part of a NASA grant to study the I/O patterns of parallel programs. It is aimed at the large storage problems faced by HPC, big data, streaming video, genomics and bioinformatics. OrangeFS can be accessed through included system utilities, user integration libraries and MPI-IO, and it can be used by the Hadoop ecosystem as an alternative to the HDFS filesystem.

Applications often don't require OrangeFS to be mounted into the VFS, but the OrangeFS kernel client allows OrangeFS filesystems to be mounted through the VFS. The kernel client communicates with a userspace daemon, which in turn communicates with the OrangeFS server daemons that implement the file system. The server daemons (there's almost always more than one) need not run on the same host as the kernel client. OrangeFS filesystems can also be mounted with FUSE.

Recommended LWN article: The OrangeFS distributed filesystem

Documentation: Documentation/filesystems/orangefs.txt

Website: http://www.orangefs.org/

Code: fs/orangefs

1.5. Kernel Connection Multiplexor, a facility for accelerating application layer protocols

This release adds the Kernel Connection Multiplexor (KCM), a facility that provides a message-based interface over TCP for accelerating application layer protocols. The motivation for this is based on the observation that, although TCP is a byte stream transport protocol with no concept of message boundaries, a common use case is to implement a framed application layer protocol running over TCP. Most TCP stacks offer a byte stream API to applications, which places the burden of message delineation, message I/O operation atomicity, and load balancing on the application.

With KCM, an application can efficiently send and receive application protocol messages over TCP using a datagram interface. The kernel provides the necessary assurances that messages are sent and received atomically. This relieves much of the burden applications have in mapping a message-based protocol onto the TCP stream. KCM also makes application layer messages a unit of work in the kernel for the purposes of steering and scheduling, which in turn allows a simpler networking model in multithreaded applications. In order to delineate messages in a TCP stream for receive in KCM, the kernel implements a message parser based on BPF, which parses application layer messages and returns a message length. Nearly all binary application protocols are parseable in this manner, so KCM should be applicable across a wide range of applications.
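
As a rough sketch of what using the new interface looks like (the constants come from linux/kcm.h; tcpfd must be an already connected TCP socket and bpf_prog_fd an already loaded BPF message parser, both obtained elsewhere and not shown here):

    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <linux/kcm.h>

    /* Attach an already-connected TCP socket plus a message-parsing BPF
     * program to a new KCM socket; returns the KCM fd or -1 on error. */
    int kcm_attach(int tcpfd, int bpf_prog_fd)
    {
        int kcmfd = socket(AF_KCM, SOCK_DGRAM, KCMPROTO_CONNECTED);
        if (kcmfd < 0)
            return -1;

        struct kcm_attach attach = {
            .fd = tcpfd,           /* connected TCP socket */
            .bpf_fd = bpf_prog_fd, /* BPF program that returns the message length */
        };
        if (ioctl(kcmfd, SIOCKCMATTACH, &attach) < 0)
            return -1;

        /* send()/recv() on this fd now operate on whole messages. */
        return kcmfd;
    }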

For development plans, benchmarks and FAQ, see the merge

Recommended LWN article: The kernel connection multiplexer

API documentation: Documentation/networking/kcm.txt

Code: commit, commit, commit, commit, commit, commit, commit

1.6. 802.1AE MAC-level encryption (MACsec)

This release adds support for MACsec (IEEE 802.1AE), a standard that provides encryption over Ethernet. It encrypts and authenticates all traffic in a LAN with GCM-AES-128. It can protect DHCP traffic and VLANs, and prevent tampering with Ethernet headers. MACsec is designed to be used with the MACsec Key Agreement protocol extension to 802.1X, which provides channel attribution and key distribution to the nodes, but it can also be used with static keys entered manually by an administrator.

Media: DevConf.cz video about MACsec

Code: commit

1.7. BATMAN V protocol

B.A.T.M.A.N. (Better Approach To Mobile Adhoc Networking) gains support for the V protocol, the successor of the IV protocol. The new protocol splits the OGM protocol into two subcomponents: ELP (Echo Location Protocol), in charge of neighbour discovery and link quality estimation; and a new OGM protocol, OGMv2, which implements the algorithm that spreads the metrics around the network and computes optimal paths. The biggest change introduced with B.A.T.M.A.N. V is the new metric: the protocol no longer relies on packet loss, but on the estimated throughput.

Code: commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit

1.8. dma-buf: new ioctl to manage cache coherency between CPU and GPU

Userspace might need some sort of cache coherency management, e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To handle this, there are begin/end coherency markers, which forward directly to the existing dma-buf device drivers' vfunc hooks. Userspace can make use of those markers through the new DMA_BUF_IOCTL_SYNC ioctl.
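
A minimal sketch of that bracketing, assuming dmabuf_fd is a dma-buf file descriptor the application has mmap()ed; the struct and flags come from the new uapi header linux/dma-buf.h:

    #include <sys/ioctl.h>
    #include <linux/dma-buf.h>

    /* Bracket CPU access to a mapped dma-buf so the kernel can perform the
     * needed cache maintenance before and after the access. */
    void cpu_access(int dmabuf_fd)
    {
        struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW };
        ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);   /* begin CPU access */

        /* ... read/write the mmap()ed buffer contents here ... */

        sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
        ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);   /* end CPU access */
    }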

Recommended article: Sharing CPU and GPU buffers on Linux

Code: commit

1.9. OCFS2 online inode checker

OCFS2 is often used in high-availability systems. OCFS2 usually converts the filesystem to read-only when it encounters an error, but this decreases availability and is not always necessary. OCFS2 has a mount option (errors=continue) which instead returns an EIO error to the calling process without remounting the filesystem read-only, and reports the problematic file's inode number in the kernel log. This release adds a very simple in-kernel inode checker that can be used to check and reset the inode. Note that this feature is intended for very small issues which may hinder day-to-day operations of a cluster filesystem by turning the filesystem read-only; it is not suited for complex checks which involve dependencies on other components of the filesystem. In those cases, the offline fsck is recommended.

The scope of checking/fixing is at the file level, initially only for regular files. To use the file checker, write the inode number reported in dmesg to /sys/fs/ocfs2/devname/filecheck/check, then read that file to learn what kind of error the inode has. If you decide to fix the inode, write its number to /sys/fs/ocfs2/devname/filecheck/fix, then read that file to see whether the inode could be fixed. For more details, see the documentation.
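
A hedged sketch of that check cycle in C (the device name "sdb1" and the inode number are placeholders; the real values depend on the mounted device and on what dmesg reported):

    #include <stdio.h>

    int main(void)
    {
        const char *path = "/sys/fs/ocfs2/sdb1/filecheck/check";
        char line[256];

        /* Ask the in-kernel checker to inspect one inode. */
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); return 1; }
        fprintf(f, "8521\n");              /* inode number reported in dmesg */
        fclose(f);

        /* Read the same file back to see the per-inode check result. */
        f = fopen(path, "r");
        if (!f) { perror(path); return 1; }
        while (fgets(line, sizeof(line), f))
            fputs(line, stdout);
        fclose(f);
        return 0;
    }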

Code: commit, commit, commit, commit, commit

1.10. Support for cgroup namespaces

This release adds support for cgroup namespaces, which provide a mechanism to virtualize the view of the /proc/$PID/cgroup file and cgroup mounts. A new clone flag, CLONE_NEWCGROUP, can be used with clone(2) and unshare(2) to create a new cgroup namespace. For a correctly set up container, this enables container tools (like libcontainer, lxc, lmctfy, etc.) to create completely virtualized containers without leaking the system-level cgroup hierarchy.

Without cgroup namespaces, the /proc/$PID/cgroup file shows the complete path of the cgroup of a process. In a container setup where a set of cgroups and namespaces is intended to isolate processes, the /proc/$PID/cgroup file may leak system-level information to the isolated processes.
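
A minimal sketch of entering a new cgroup namespace with unshare(2), assuming libc headers recent enough to define CLONE_NEWCGROUP and a process with the required privileges (CAP_SYS_ADMIN):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        /* Create a new cgroup namespace; the current cgroup of the calling
         * process becomes the root of the new namespace. */
        if (unshare(CLONE_NEWCGROUP) == -1) {
            perror("unshare(CLONE_NEWCGROUP)");
            return 1;
        }

        /* /proc/self/cgroup now shows paths relative to the cgroup the
         * process was in when the namespace was created, instead of the
         * full system-level hierarchy. */
        return 0;
    }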

Documentation: https://git.kernel.org/torvalds/c/d4021f6cd41f03017f831b3d40b0067bed54893d

Code: commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit

1.11. Add support for the pNFS SCSI layout

This release adds NFSv4.1 support for parallel NFS (pNFS) SCSI layouts to the Linux NFS server, a variant of the block layout which uses SCSI features to offer improved fencing and device identification. With pNFS SCSI layouts, the NFS server acts as the Metadata Server (MDS) for pNFS, which, in addition to handling all metadata access to the NFS export, also hands out layouts to the clients so that they can directly access the underlying SCSI LUNs that are shared with the client. See draft-ietf-nfsv4-scsi-layout for more details.

To use pNFS SCSI layouts, the exported file system needs to support pNFS SCSI layouts (currently only XFS does), and the file system must sit on a SCSI LUN that is accessible to the clients in addition to the MDS. For now, the file system needs to sit directly on the exported LUN; striping or concatenation of LUNs on the MDS and clients is not supported yet. On a server built with CONFIG_NFSD_SCSI, pNFS SCSI volume support is automatically enabled if the file system is exported using the "pnfs" option and the underlying SCSI device supports persistent reservations. On the client, make sure the kernel has the CONFIG_PNFS_BLOCK option enabled and the file system is mounted using NFS protocol version 4.1 (mount -o vers=4.1).

Code: commit, commit, commit, commit

2. Core (various)

3. File systems

4. Memory management

5. Block layer

6. Cryptography

7. Security

8. Tracing and perf tool

9. Virtualization

10. Networking

11. Architectures

12. Drivers

12.1. Graphics

12.2. Storage

12.3. Staging

12.4. Networking

12.5. Audio

12.6. Tablets, touch screens, keyboards, mouses

12.7. TV tuners, webcams, video capturers

12.8. USB

12.9. Serial Peripheral Interface (SPI)

12.10. Watchdog

12.11. Serial

12.12. ACPI, EFI, cpufreq, thermal, Power Management

12.13. Real Time Clock (RTC)

12.14. Voltage, current regulators, power capping, power supply

12.15. Rapid I/O

12.16. Pin Controllers (pinctrl)

12.17. Memory Technology Devices (MTD)

12.18. Multi Media Card

12.19. Industrial I/O (iio)

12.20. Multi Function Devices (MFD)

12.21. Inter-Integrated Circuit (I2C)

12.22. Hardware monitoring (hwmon)

12.23. General Purpose I/O (gpio)

12.24. Clocks

12.25. PCI

12.26. Various

13. List of merges

14. Other news sites
