KernelNewbies:

Linux 5.8 was released on Sunday, 2 August 2020.

Summary: This release adds memory management changes to improve the behaviour of systems under thrashing situations; an event notification mechanism built on top of standard pipes that splices messages from the kernel into pipes opened by userspace; support for multiple procfs mounts, each with its own mount options; a Kernel Concurrency Sanitizer that helps to find data race bugs; the ability to use pidfds with setns(2) for easier attachment to the namespaces of a process; support for Shadow Call Stack and Branch Target Identification on ARM64 to help prevent security exploits; support for Inline Encryption hardware; new CAP_BPF and CAP_PERFMON capabilities for BPF and performance monitoring programs; and IPv6 MPLS support. As always, there are many other new drivers and improvements.

1. Prominent features

1.1. Better behavior in memory thrashing situations

The reclaim code that balances between swapping and reclaiming cache memory tries to predict how likely each page is to be reused. When those predictions fail, it cannot detect when the cache is thrashing pathologically, or when the system is in the middle of a swap storm. This code has been tuned over time to the point where, even in the presence of large amounts of cold anonymous memory and a capable swap device, the VM refuses to seriously scan these pages and can leave the page cache thrashing needlessly. The proliferation of fast random-IO devices such as SSDs has made this undesirable behavior more noticeable.

This release sets out to address this. Since Linux 3.15, the kernel has had exact tracking of refault IO, the ultimate cost of reclaiming the wrong pages. This makes it possible to use an IO-cost-based balancing model that scans anonymous memory more aggressively when the cache is thrashing, while still avoiding unnecessary swap storms. This release bases the LRU balance on the rate of refaults on each list, multiplied by the relative IO cost between the swap device and the filesystem (swappiness), in order to minimize the IO cost incurred by reclaim. The swappiness sysctl can now also be raised up to 200 to make the kernel prefer swapping, which can be useful with in-memory swap such as zram or zswap.

1.2. Kernel Concurrency Sanitizer

The Kernel Concurrency Sanitizer (KCSAN) is a data race detector for the kernel. Key priorities in KCSAN's design are lack of false positives, scalability, and simplicity. KCSAN relies on compile-time instrumentation of memory accesses, and it is supported by both GCC and Clang.

Documentation: The Kernel Concurrency Sanitizer (KCSAN)

Recommended LWN article: Concurrency bugs should fear the big bad data-race detector (part 1)

1.3. Kernel event notification mechanism

This release adds an event notification mechanism built on top of standard pipes: it splices notification messages from the kernel into pipes opened by userspace. The pipe is opened in a special mode, and its internal buffer is used to hold messages generated by the kernel, which are then read out with read(2). The owner of the pipe tells the kernel which sources it would like to watch through that pipe, and filters can also be placed on a pipe so that source types and subevents that are not of interest are ignored. In this release, the only event source is keys and keyrings (events such as linking and unlinking keys and changing their attributes), which will be used by GNOME.

Documentation: General notification mechanism

Recommended LWN article: A kernel event notification mechanism

1.4. Private procfs instances

Procfs has historically been tied to PID namespaces, with the effect that all new procfs mounts are just mirrors of the internal one: any change, any mount option update, and any future addition propagates to all other procfs mounts in the same PID namespace.

This release makes it possible to have several procfs mounts with different mount options within the same PID namespace. The main aim of this work is to allow embedded systems to run one supervisor for apps. It also adds some convenient mount options: one that makes a private procfs mount show only the processes that can be ptraced, which helps to support lightweight sandboxes in embedded Linux, and one that hides non-PID inodes.

1.5. Using pidfds to attach to namespaces

This release makes it possible to use pidfds to attach to the namespaces of a process: a pidfd can now be passed as the first argument to the setns(2) syscall. When a pidfd is passed, multiple namespace flags can be specified in the second argument, and setns(2) will then attach the caller to all of the specified namespaces at once, or to none of them. For example: setns(pidfd, CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET);

This supports use cases where a caller attaches to a subset of namespaces to retain privilege, performs an action, and then attaches to another subset of namespaces. Apart from reducing the number of syscalls needed to attach to all currently supported namespaces, it also makes it possible to attach to a set of namespaces atomically, which is useful for a standard container manager interacting with a running container.

1.6. Shadow Call Stack and Branch Target Identification for improved security on ARM64

This release adds generic support for Clang's Shadow Call Stack on ARM64, which uses a shadow stack to protect function return control flow from buffer overruns on the main stack.

There is also support for ARMv8.5 Branch Target Identification (BTI) in both user space and kernel space. BTI lets branch targets restrict the types of branches that may reach them, and additionally prevents indirect branches to arbitrary code.

Recommended LWN article: Some near-term arm64 hardening patches

1.7. Support for Inline Encryption hardware

This release supports Inline Encryption in the block layer. Inline Encryption hardware allows software to specify an encryption context (an encryption key, crypto algorithm, data unit number, data unit size, etc.) along with a data transfer request to a storage device, and the inline encryption hardware will use that context to encrypt or decrypt the data. The inline encryption hardware is part of the storage device, and it conceptually sits on the data path between system memory and the storage device.

Recommended LWN article: Inline encryption for filesystems

1.8. Introduce CAP_BPF and CAP_PERFMON security capabilities

Using BPF has required the CAP_SYS_ADMIN capability. This means that software that needs to use BPF needs that capability, which grants far too many privileges. This release instead grants access to BPF functionality through a new CAP_BPF capability combined with CAP_PERFMON and CAP_NET_ADMIN, with some operations still kept under CAP_SYS_ADMIN. The user process needs CAP_BPF to create maps and run other bpf(2) commands, CAP_BPF plus CAP_PERFMON to load tracing programs, and CAP_BPF plus CAP_NET_ADMIN to load networking programs.

This release also adds the CAP_PERFMON capability for performance monitoring and observability.

Recommended LWN article: CAP_PERFMON — and new capabilities in general

1.9. IPv6 MPLS support

This release extends the Multi-Protocol Label Switching support to IPv6.

1.10. bridge: Add support for Media Redundancy Protocol (MRP)

This release adds bridge support for the Media Redundancy Protocol (MRP), a data network protocol standardized by the International Electrotechnical Commission as IEC 62439-2. It allows rings of Ethernet switches to overcome any single failure with a recovery time faster than STP. It is primarily used in Industrial Ethernet applications.

2. Core (various)

3. File systems

4. Memory management

5. Block layer

6. Tracing, perf and BPF

7. Virtualization

8. Cryptography

9. Security

10. Networking

11. Architectures

11.1. ARM

11.2. MIPS

11.3. X86

11.4. POWERPC

11.5. RISCV

11.6. S390

11.7. ARC

11.8. M68K

11.9. SH

11.10. PARISC

12. Drivers

12.1. Graphics

12.2. Power Management

12.3. Storage

12.4. Drivers in the Staging area

12.5. Networking

12.6. Audio

12.7. Tablets, touch screens, keyboards, mice

12.8. TV tuners, webcams, video capturers

12.9. Universal Serial Bus

12.10. Serial Peripheral Interface (SPI)

12.11. Real Time Clock (RTC)

12.12. Pin Controllers (pinctrl)

12.13. Multi Media Card (MMC)

12.14. Memory Technology Devices (MTD)

12.15. Industrial I/O (iio)

12.16. Multi Function Devices (MFD)

12.17. Pulse-Width Modulation (PWM)

12.18. Inter-Integrated Circuit (I2C + I3C)

12.19. Hardware monitoring (hwmon)

12.20. General Purpose I/O (gpio)

12.21. Leds

12.22. DMA engines

12.23. Cryptography hardware acceleration

12.24. PCI

12.25. Non-Transparent Bridge (NTB)

12.26. Thunderbolt

12.27. Clock

12.28. PHY ("physical layer" framework)

12.29. EDAC (Error Detection And Correction)

12.30. 1-Wire (W1)

12.31. Firmware

12.32. Various

13. List of Pull Requests

14. Other news sites

KernelNewbies: Linux_5.8 (last edited 2020-10-08 19:05:16 by diegocalleja)