KernelNewbies:

Linux 4.6 [https://lkml.org/lkml/2016/5/15/83 was released] on Sun, 15 May 2016.

Summary: This release adds support for USB 3.1 SuperSpeedPlus (10 Gbps), the new distributed file system OrangeFS, more reliable out-of-memory handling, support for Intel memory protection keys, a facility that makes application layer protocols easier and faster to implement, support for 802.1AE MAC-level encryption (MACsec), support for version V of the BATMAN protocol, an OCFS2 online inode checker, support for cgroup namespaces, support for the pNFS SCSI layout, and many other improvements and new drivers.


1. Prominent features

1.1. USB 3.1 SuperSpeedPlus (10 Gbps) support

The USB 3.1 specification includes a new SuperSpeedPlus protocol supporting speeds of up to 10 Gbps. USB 3.1 devices using the new SuperSpeedPlus protocol are called USB 3.1 Gen2 devices (note that USB 3.1 SuperSpeedPlus is not the same as Type-C or power delivery).

This release adds support for USB 3.1 SuperSpeedPlus 10 Gbps speeds to the USB core and the xHCI host controller, meaning that a USB 3.1 mass storage device connected to a USB 3.1 capable xHCI host should work at 10 Gbps.

Code: [https://git.kernel.org/torvalds/c/8a1b2725a60d3267135c15e80984b4406054f650 commit], [https://git.kernel.org/torvalds/c/2c0e06f8829a542e71b14ffcaa14b8fafa2223c3 commit], [https://git.kernel.org/torvalds/c/b2316645ca5ea93eb8f637f57199fbbe88bee07d commit], [https://git.kernel.org/torvalds/c/9508e3b7a70c11370d70252147b75d3024754970 commit], [https://git.kernel.org/torvalds/c/0cdd49a1d1a483d80170d9e592f832274e8bce1b commit], [https://git.kernel.org/torvalds/c/0caf6b33452112e5a1186c8c964e90310e49e6bd commit], [https://git.kernel.org/torvalds/c/5f9c3a668b3f75768aec686901d7a4c8782983df commit], [https://git.kernel.org/torvalds/c/5da665fcec1a308f5273aacb9da8e87b89da8b4f commit], [https://git.kernel.org/torvalds/c/d78540419866887345cec480016b0f87f6a5aca2 commit], [https://git.kernel.org/torvalds/c/c8b1d8977eee3acc63a65811dd72ec4a93b74388 commit], [https://git.kernel.org/torvalds/c/b37d83a6a41499d582b8faedff1913ec75d9e70b commit], [https://git.kernel.org/torvalds/c/faee822c5a7ab99de25cd34fcde3f8d37b6b9923 commit], [https://git.kernel.org/torvalds/c/def4e6f7b419c4092c82222d0896d6c409692326 commit], [https://git.kernel.org/torvalds/c/8ef8a9f5c148ae1dbeae926e5b6129e396faded2 commit], [https://git.kernel.org/torvalds/c/09c352ed671c156b7ce30c81a4f4424641859918 commit], [https://git.kernel.org/torvalds/c/2f6d3b653777e68bbccfdcff3de2ea8165934531 commit]

1.2. Improve the reliability of the Out Of Memory task killer

In previous releases, the OOM killer (which tries to kill a task to free memory) killed a single task in the hope that the task would terminate in a reasonable time and free up its memory. In practice, it has been shown that it's easy to find workloads which break that assumption: the OOM victim might take an unbounded amount of time to exit because it might be blocked in the uninterruptible state, waiting for an event which is blocked by another task looping in the page allocator. This release adds a specialized kernel thread, oom_reaper, that tries to reclaim memory by preemptively reaping the anonymous or swapped out memory owned by the OOM victim, under the assumption that such memory won't be needed when its owner is killed anyway.

Recommended LWN article: [https://lwn.net/Articles/668126/#reaper Toward more predictable and reliable out-of-memory handling]

Code: [https://git.kernel.org/torvalds/c/69b27baf00fa9b7b14b3263c105390d1683425b2 commit], [https://git.kernel.org/torvalds/c/aac453635549699c13a84ea1456d5b0e574ef855 commit], [https://git.kernel.org/torvalds/c/36324a990cf578b57828c04cd85ac62cd25cf5a4 commit], [https://git.kernel.org/torvalds/c/bc448e897b6d24aae32701763b8a1fe15d29fa26 commit], [https://git.kernel.org/torvalds/c/03049269de433cb5fe2859be9ae4469ceb1163ed commit], [https://git.kernel.org/torvalds/c/855b018325737f7691f9b7d86339df40aa4e47c3 commit], [https://git.kernel.org/torvalds/c/29c696e1c6eceb5db6b21f0c89495fcfcd40c0eb commit], [https://git.kernel.org/torvalds/c/e26796066fdf929cbba22dabb801808f986acdb9 commit], [https://git.kernel.org/torvalds/c/bb29902a7515208846114b3b36a4281a9bbf766a commit]

1.3. Support for Intel memory protection keys

This release adds support for a memory protection hardware feature that is available in upcoming Intel CPUs: protection keys. Protection keys allow the encoding of user-controllable permission masks in the page table entries (pte). Instead of having a fixed protection mask in the pte (which needs a system call to change and works on a per page basis), the user can map a handful of protection mask variants. User space can then manipulate a new user-accessible, thread-local register (PKRU) with two separate bits (Access Disable and Write Disable) for each mask. This makes it possible to dynamically switch the protection bits of very large amounts of virtual memory by just manipulating a CPU register, without having to change every single page in the affected virtual memory range.
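The two-bits-per-key layout of PKRU can be illustrated with plain bit arithmetic. The sketch below assumes the layout documented by Intel (16 keys in a 32-bit register, with Access Disable as the low bit and Write Disable as the high bit of each pair); the helper names are hypothetical, and it only computes register values, it does not execute the RDPKRU/WRPKRU instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 protection keys, two bits each, in a 32-bit PKRU. */
#define PKEY_COUNT 16

/* Per-key bits: Access Disable (low bit) and Write Disable (high bit). */
#define PKRU_AD 0x1u
#define PKRU_WD 0x2u

/* Return a copy of pkru with the two bits for `pkey` replaced by `rights`. */
uint32_t pkru_set_rights(uint32_t pkru, int pkey, uint32_t rights)
{
    assert(pkey >= 0 && pkey < PKEY_COUNT);
    unsigned shift = 2u * (unsigned)pkey;   /* two bits per key */
    pkru &= ~(0x3u << shift);               /* clear the old AD/WD bits */
    pkru |= (rights & 0x3u) << shift;       /* install the new bits */
    return pkru;
}

/* Extract the AD/WD bits for `pkey` from a PKRU value. */
uint32_t pkru_get_rights(uint32_t pkru, int pkey)
{
    assert(pkey >= 0 && pkey < PKEY_COUNT);
    return (pkru >> (2u * (unsigned)pkey)) & 0x3u;
}
```

For example, `pkru_set_rights(0, 1, PKRU_WD)` yields a value in which every page tagged with key 1 is read-only for the current thread, regardless of how many pages carry that key.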

It also allows more precise control of MMU permission bits: for example, the executable bit is separate from the read bit. This release adds the infrastructure for that, plus a high level API to make use of protection keys. If a user-space application calls mmap(..., PROT_EXEC) or mprotect(ptr, sz, PROT_EXEC) (note PROT_EXEC only, without PROT_READ/WRITE), the kernel will notice this special case and will set a special protection key on this memory range. It also sets the appropriate bits in the PKRU register so that the memory becomes unreadable and unwritable. So, using protection keys, the kernel is able to implement 'true' PROT_EXEC: code that can be executed, but not read, which is a small security advantage (but note that malicious code can manipulate the PKRU register too). In the future, there will be further work around protection keys that will offer more high level APIs to manage them.
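A minimal sketch of the PROT_EXEC-only case, assuming a single anonymous page that is first filled and then restricted with mprotect(). The function name is hypothetical. On hardware or kernels without protection keys the call still succeeds, but the page may remain readable.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one anonymous page, copy `len` bytes of code into it, then drop the
 * mapping to PROT_EXEC only. Returns the mapping, or NULL on failure.
 */
void *map_exec_only(const void *code, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len > (size_t)page)
        return NULL;

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memcpy(p, code, len);

    /* PROT_EXEC without PROT_READ/PROT_WRITE: the special case noted above. */
    if (mprotect(p, (size_t)page, PROT_EXEC) != 0) {
        munmap(p, (size_t)page);
        return NULL;
    }
    return p;
}
```

The page is written before the mprotect() call because, once the kernel applies the execute-only key, later reads or writes through the pointer are expected to fault on hardware that supports protection keys.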

Recommended LWN article: [https://lwn.net/Articles/667156/ Memory protection keys]

Code: [https://git.kernel.org/torvalds/c/643ad15d47410d37d43daf3ef1c8ac52c281efa5 (merge)]

1.4. OrangeFS, a new distributed file system

OrangeFS is an LGPL scale-out parallel storage system. Originally called PVFS, it was first developed in 1993 by Walt Ligon and Eric Blumer as a parallel file system for Parallel Virtual Machine as part of a NASA grant to study the I/O patterns of parallel programs. It is ideal for the large storage problems faced by HPC, BigData, Streaming Video, Genomics and Bioinformatics. OrangeFS can be accessed through included system utilities, user integration libraries and MPI-IO, and can be used by the Hadoop ecosystem as an alternative to the HDFS filesystem.

Applications often don't require OrangeFS to be mounted into the VFS, but the OrangeFS kernel client allows OrangeFS filesystems to be mounted as a VFS. The kernel client communicates with a userspace daemon, which in turn communicates with the OrangeFS server daemons that implement the file system. The server daemons (there's almost always more than one) need not be running on the same host as the kernel client. OrangeFS filesystems can also be mounted with FUSE.

Recommended LWN article: [https://lwn.net/Articles/643165/ The OrangeFS distributed filesystem]

Documentation: [https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/filesystems/orangefs.txt Documentation/filesystems/orangefs.txt]

Website: http://www.orangefs.org/

Code: [https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/fs/orangefs fs/orangefs]

1.5. Kernel Connection Multiplexor, a facility for accelerating application layer protocols

This release adds Kernel Connection Multiplexor (KCM), a facility that provides a message-based interface over TCP for accelerating application layer protocols. The motivation for this is based on the observation that although TCP is a byte stream transport protocol with no concept of message boundaries, a common use case is to implement a framed application layer protocol running over TCP. Most TCP stacks offer a byte stream API for applications, which places the burden of message delineation, message I/O operation atomicity, and load balancing on the application.

With KCM an application can efficiently send and receive application protocol messages over TCP using a datagram interface. The kernel provides the necessary assurances that messages are sent and received atomically. This relieves much of the burden applications have in mapping a message based protocol onto the TCP stream. KCM also makes application layer messages a unit of work in the kernel for the purposes of steering and scheduling, which in turn allows a simpler networking model in multithreaded applications. In order to delineate messages in a TCP stream on the receive side, the kernel implements a message parser based on BPF, which parses application layer messages and returns a message length. Nearly all binary application protocols are parseable in this manner, so KCM should be applicable across a wide range of applications.
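To make the idea of a length-returning parser concrete, here is a user-space sketch of the same logic in plain C. It is not the KCM or BPF API. The framing (a 2-byte big-endian payload length followed by the payload), the function names and the convention of returning 0 when more bytes are needed are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing: 16-bit big-endian payload length, then payload. */
#define HDR_LEN 2

/*
 * Return the total length (header + payload) of the message starting at
 * buf, or 0 if fewer than HDR_LEN bytes are available so far.
 */
size_t frame_length(const uint8_t *buf, size_t avail)
{
    if (avail < HDR_LEN)
        return 0;
    size_t payload = ((size_t)buf[0] << 8) | buf[1];
    return HDR_LEN + payload;
}

/* Count the complete messages contained in a byte stream of `len` bytes. */
size_t count_complete_messages(const uint8_t *stream, size_t len)
{
    size_t off = 0, n = 0;

    while (off < len) {
        size_t msg = frame_length(stream + off, len - off);
        if (msg == 0 || msg > len - off)
            break;          /* partial message: wait for more bytes */
        off += msg;
        n++;
    }
    return n;
}
```

An application using a plain TCP socket has to run a loop like count_complete_messages() itself on every read. With KCM, the equivalent of frame_length() is supplied as a BPF program and the kernel delivers whole messages.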

For development plans, benchmarks and FAQ, see the [https://git.kernel.org/torvalds/c/9531ab65f4ec066a6e6617a08a293c60397a161b merge]

Recommended LWN article: [https://lwn.net/Articles/657999/ The kernel connection multiplexer]

API documentation: [https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/networking/kcm.txt?id=10016594f4c6b3ef34c5de97d8ab62205d9d26a5 Documentation/networking/kcm.txt]

Code: [https://git.kernel.org/torvalds/c/ab7ac4eb9832e32a09f4e8042705484d2fb0aad3 commit], [https://git.kernel.org/torvalds/c/cd6e111bf5be5c70aef96a86d791ee7be0c0e137 commit], [https://git.kernel.org/torvalds/c/91687355b92735e5f247ed163b3b0b4d14c3cab6 commit], [https://git.kernel.org/torvalds/c/f29698fc6b3a45a5c6147eca8379f38be8232117 commit], [https://git.kernel.org/torvalds/c/7ced95ef525c329f947c424859cf2b0a3b731f8c commit], [https://git.kernel.org/torvalds/c/29152a34f72cb4d7ab32885ad2f20a482c92a8f3 commit], [https://git.kernel.org/torvalds/c/10016594f4c6b3ef34c5de97d8ab62205d9d26a5 commit]

1.6. 802.1AE MAC-level encryption (MACsec)

This release adds support for [http://standards.ieee.org/getieee802/download/802.1AE-2006.pdf MACsec IEEE 802.1AE], a standard that provides encryption over Ethernet. It encrypts and authenticates all traffic in a LAN with GCM-AES-128. It can protect DHCP traffic and VLANs, and prevent tampering with Ethernet headers. MACsec is designed to be used with the MACsec Key Agreement protocol extension to 802.1X, which provides channel attribution and key distribution to the nodes, but it can also be used with static keys entered manually by an administrator.

Media: [https://www.youtube.com/watch?v=G_8gW_iOS58 DevConf.cz video about MACsec]

Code: [https://git.kernel.org/torvalds/c/c09440f7dcb304002dfced8c0fea289eb25f2da0 commit]

1.7. BATMAN V protocol

B.A.T.M.A.N. (Better Approach To Mobile Adhoc Networking) adds support for the [https://www.open-mesh.org/projects/batman-adv/wiki/BATMAN_V V protocol], successor of the IV protocol. The new protocol splits the OGM protocol into two subcomponents: [https://www.open-mesh.org/projects/batman-adv/wiki/ELP ELP] (Echo Location Protocol), in charge of neighbour discovery and link quality estimation; and a new OGM protocol, [https://www.open-mesh.org/projects/batman-adv/wiki/OGMv2 OGMv2], which implements the algorithm that spreads the metrics around the network and computes optimal paths. The biggest change introduced with B.A.T.M.A.N. V is the new metric: the protocol no longer relies on packet loss, but on the estimated throughput.

Code: [https://git.kernel.org/torvalds/c/d6f94d91f766b4205e5b0aa4b11f96271c793f6d commit], [https://git.kernel.org/torvalds/c/162bd64c24aba8efe68948e95e61628403106cd7 commit], [https://git.kernel.org/torvalds/c/7f136cd491013285442ee1e7854fab1736f5757c commit], [https://git.kernel.org/torvalds/c/0da0035942d47766c32843143fb5dba7a29cb48c commit], [https://git.kernel.org/torvalds/c/9323158ef9f49935f0c61509919acd31dda8f11b commit], [https://git.kernel.org/torvalds/c/0b5ecc6811bd576ecc9813bbe069f2293cb1c6aa commit], [https://git.kernel.org/torvalds/c/c833484e5f3872a38fe232c663586069d5ad9645 commit], [https://git.kernel.org/torvalds/c/8d2d499e08145d9851097e1241ef15aad8c9170a commit], [https://git.kernel.org/torvalds/c/9786906022eba35763b17c54a35913ca65151a78 commit], [https://git.kernel.org/torvalds/c/261e264db636ae1f4c43e56b8c57d7343b166fc9 commit], [https://git.kernel.org/torvalds/c/626d23e83c88df5ff535414c2fe29e16b95d6b7a commit]

1.8. dma-buf: new ioctl to manage cache coherency between CPU and GPU

Userspace might need some sort of cache coherency management, e.g. when CPU and GPU domains are being accessed through dma-buf at the same time. To address this problem there are begin/end coherency markers that forward directly to the existing vfunc hooks of dma-buf device drivers. Userspace can make use of those markers through the DMA_BUF_IOCTL_SYNC ioctl.

Recommended article: [https://01.org/blogs/2016/sharing-cpu-and-gpu-buffers-linux Sharing CPU and GPU buffers on Linux]
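A minimal sketch of the begin/end markers, assuming the kernel's uapi header linux/dma-buf.h is installed and that the caller already holds a dma-buf file descriptor (obtaining and mmap()ing it is driver-specific and not shown). The wrapper names are hypothetical.

```c
#include <linux/dma-buf.h>
#include <sys/ioctl.h>

/* Mark the start of CPU read/write access to the buffer behind dmabuf_fd. */
int dmabuf_cpu_access_begin(int dmabuf_fd)
{
    struct dma_buf_sync sync = {
        .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW,
    };
    return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}

/* Mark the end of CPU read/write access to the buffer behind dmabuf_fd. */
int dmabuf_cpu_access_end(int dmabuf_fd)
{
    struct dma_buf_sync sync = {
        .flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW,
    };
    return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}
```

CPU accesses to the mapped buffer would be bracketed by these two calls. DMA_BUF_SYNC_READ or DMA_BUF_SYNC_WRITE can replace DMA_BUF_SYNC_RW when only one direction is needed.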

Code: [https://git.kernel.org/torvalds/c/c11e391da2a8fe973c3c2398452000bed505851e commit]

1.9. OCFS2 online inode checker

OCFS2 is often used in high-availability systems. OCFS2 usually converts the filesystem to read-only when it encounters an error, but this decreases availability and is not always necessary. OCFS2 has a mount option (errors=continue) that returns the EIO error to the calling process without remounting the filesystem read-only; the problematic file's inode number is reported in the kernel log. This release adds a very simple in-kernel inode checker that can be used to check and reset the inode. Note that this feature is intended for very small issues which may hinder the day-to-day operations of a cluster filesystem by turning the filesystem read-only; it is not suited for complex checks which involve dependencies on other components of the filesystem. In these cases, the offline fsck is recommended.

The scope of checking/fixing is at the file level, initially only for regular files. The file checker is used by writing the inode number, as reported in dmesg, to /sys/fs/ocfs2/devname/filecheck/check, then reading the output of that file to find out what kind of error it has. If you decide to fix this inode, write the inode number to /sys/fs/ocfs2/devname/filecheck/fix, then read the file to find out whether the inode could be fixed or not. For more details see the [https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/filesystems/ocfs2-online-filecheck.txt documentation].
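The write-then-read sequence described above can be sketched as a small helper. The function name is hypothetical, and the helper simply returns the raw text read back from the control file without interpreting it. The caller passes the full path of the check or fix file for the device in question.

```c
#include <stdio.h>

/*
 * Write an inode number to a filecheck control file, then read the file
 * back into `report`. Returns 0 on success, -1 on any I/O error.
 */
int filecheck_request(const char *ctl_path, unsigned long ino,
                      char *report, size_t report_len)
{
    if (report_len == 0)
        return -1;

    FILE *f = fopen(ctl_path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%lu\n", ino) > 0;
    if (fclose(f) != 0 || !ok)
        return -1;

    f = fopen(ctl_path, "r");
    if (!f)
        return -1;
    size_t n = fread(report, 1, report_len - 1, f);
    report[n] = '\0';       /* always NUL-terminate the report */
    fclose(f);
    return 0;
}
```

A check would first call the helper with the path ending in filecheck/check. Only if the report shows a fixable error would a second call be made with the path ending in filecheck/fix.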

Code: [https://git.kernel.org/torvalds/c/d750c42ac265c00df3f0963a240a4440fa073603 commit], [https://git.kernel.org/torvalds/c/d56a8f32e4c662509ce50a37e78fa66c777977d3 commit], [https://git.kernel.org/torvalds/c/a849d46816fe9e11d59aae78ea95c54f640b1904 commit], [https://git.kernel.org/torvalds/c/9dde5e4f3383c3459a67ab63786ca58d645d6b5e commit], [https://git.kernel.org/torvalds/c/a860f6eb4c6a8bb0ca6860d9472f424bad9af9cf commit]

1.10. Support for cgroup namespaces

This release adds support for [http://man7.org/linux/man-pages/man7/cgroup_namespaces.7.html cgroup namespaces], which provide a mechanism to virtualize the view of the /proc/$PID/cgroup file and cgroup mounts. A new clone flag, CLONE_NEWCGROUP, can be used with [http://man7.org/linux/man-pages/man2/clone.2.html clone(2)] and [http://man7.org/linux/man-pages/man2/unshare.2.html unshare(2)] to create a new cgroup namespace. For a correctly set up container, this enables container tools (like libcontainer, lxc, lmctfy, etc.) to create completely virtualized containers without leaking the system level cgroup hierarchy.

Without cgroup namespaces, the /proc/$PID/cgroup file shows the complete path of the cgroup of a process. In a container setup where a set of cgroups and namespaces are intended to isolate processes, the /proc/$PID/cgroup file may leak system level information to the isolated processes.
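A minimal sketch of entering a new cgroup namespace with unshare(2) and inspecting the resulting view. The function names are hypothetical, and the fallback definition of CLONE_NEWCGROUP is only for C libraries whose headers predate the flag. The unshare() call needs CAP_SYS_ADMIN and fails on kernels older than 4.6.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000  /* value from the kernel's uapi headers */
#endif

/* Copy the first line of /proc/self/cgroup into buf; returns 0 on success. */
int read_own_cgroup(char *buf, size_t len)
{
    FILE *f = fopen("/proc/self/cgroup", "r");
    if (!f)
        return -1;
    char *line = fgets(buf, (int)len, f);
    fclose(f);
    return line ? 0 : -1;
}

/* Move the calling process into a new cgroup namespace. */
int enter_new_cgroup_ns(void)
{
    return unshare(CLONE_NEWCGROUP);
}
```

Calling read_own_cgroup() before and after a successful enter_new_cgroup_ns() shows the effect: the path in the file is reported relative to the cgroup the process was in when the namespace was created, instead of as the full system level path.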

Documentation: [https://git.kernel.org/torvalds/c/d4021f6cd41f03017f831b3d40b0067bed54893d commit]

Code: [https://git.kernel.org/torvalds/c/d22025570e2ebfc68819b35c5d457e53d9337217 commit], [https://git.kernel.org/torvalds/c/fa5ff8a1c43fc7b78353059899edf3cbedf54e9f commit], [https://git.kernel.org/torvalds/c/a79a908fd2b080977b45bf103184b81c9d11ad07 commit], [https://git.kernel.org/torvalds/c/a0530e087e648263f81a81d62ca020f66b54bcb0 commit], [https://git.kernel.org/torvalds/c/ed82571b1a14ab2bfbede2bb2c209700495749fc commit], [https://git.kernel.org/torvalds/c/d4021f6cd41f03017f831b3d40b0067bed54893d commit], [https://git.kernel.org/torvalds/c/1c53753e0df1ae4d21661053459e7c024a43f1d3 commit], [https://git.kernel.org/torvalds/c/9f6df573a4041f896cbf51f1b3743494196620a7 commit], [https://git.kernel.org/torvalds/c/fb3c8315650f89a1993fb3ae3e74e9c7e4a1c9c0 commit], [https://git.kernel.org/torvalds/c/5e2bec7c2248ae27c5b16cd97215ae05c1d39179 commit]

1.11. Add support for the pNFS SCSI layout

This release adds NFSv4.1 support for parallel NFS SCSI layouts in the Linux NFS server, a variant of the block layout which uses SCSI features to offer improved fencing and device identification. With pNFS SCSI layouts, the NFS server acts as the Metadata Server (MDS) for pNFS, which in addition to handling all the metadata access to the NFS export, also hands out layouts to the clients so that they can directly access the underlying SCSI LUNs that are shared with them. See [https://tools.ietf.org/html/draft-ietf-nfsv4-scsi-layout-05 draft-ietf-nfsv4-scsi-layout] for more details.

To use pNFS SCSI layouts, the exported file system needs to support the pNFS SCSI layouts (currently just XFS), and the file system must sit on a SCSI LUN that is accessible to the clients in addition to the MDS. As of now the file system needs to sit directly on the exported LUN; striping or concatenation of LUNs on the MDS and clients is not supported yet. On a server built with CONFIG_NFSD_SCSI, the pNFS SCSI volume support is automatically enabled if the file system is exported using the "pnfs" option and the underlying SCSI device supports persistent reservations. On the client, make sure the kernel has the CONFIG_PNFS_BLOCK option enabled and the file system is mounted using the NFSv4.1 protocol version (mount -o vers=4.1).

Code: [https://git.kernel.org/torvalds/c/40cf446b9482bd2c681b60062b34cc47c78342f8 commit], [https://git.kernel.org/torvalds/c/d9186c03976506cde2c2b1219028bed449c948ed commit], [https://git.kernel.org/torvalds/c/81c39329010d6131c0909ccb91ffeaffc2e99010 commit], [https://git.kernel.org/torvalds/c/f99d4fbdae6765d0bb4ed5441f6fa1f036122d59 commit]

2. Core (various)

3. File systems

4. Memory management

5. Block layer

6. Cryptography

7. Security

8. Tracing and perf tool

9. Virtualization

10. Networking

11. Architectures

12. Drivers

12.1. Graphics

12.2. Storage

12.3. Staging

12.4. Networking

12.5. Audio

12.6. Tablets, touch screens, keyboards, mouses

12.7. TV tuners, webcams, video capturers

12.8. USB

12.9. Serial Peripheral Interface (SPI)

12.10. Watchdog

12.11. Serial

12.12. ACPI, EFI, cpufreq, thermal, Power Management

12.13. Real Time Clock (RTC)

12.14. Voltage, current regulators, power capping, power supply

12.15. Rapid I/O

12.16. Pin Controllers (pinctrl)

12.17. Memory Technology Devices (MTD)

12.18. Multi Media Card

12.19. Industrial I/O (iio)

12.20. Multi Function Devices (MFD)

12.21. Inter-Integrated Circuit (I2C)

12.22. Hardware monitoring (hwmon)

12.23. General Purpose I/O (gpio)

12.24. Clocks

12.25. PCI

12.26. Various

13. List of merges

14. Other news sites

KernelNewbies: Linux_4.6 (last edited 2016-05-16 20:02:20 by MichaelKerrisk)