Linux 5.1 released on 5 May 2019

Summary: This release includes io_uring, a high-performance interface for asynchronous I/O; improvements to fanotify that provide a scalable way of watching changes on large file systems; a method for delivering signals safely in the presence of PID reuse; support for using persistent memory as hot-pluggable RAM; configurable Zstd compression levels in Btrfs; a new cpuidle governor that makes better power management decisions than the menu governor; the syscalls needed to deal with the y2038 problem on all 32-bit architectures; the ability to boot to a device-mapper device without an initramfs; and live patching support for cumulative patches. As always, there are many other new drivers and improvements.

(Note: in this release, the "coolest features" section does not have links to the code commits; those are available for each feature in its respective section, preceded by FEATURED.)

1. Coolest features

1.1. High-performance asynchronous I/O with io_uring

Linux has supported asynchronous I/O for a long time. However, the existing interface suffers from a large number of shortcomings. It does not support buffered I/O, only unbuffered (O_DIRECT) I/O, which only a small subset of applications use. Even in those cases, asynchronous I/O was sometimes not really asynchronous or fast. All attempts to fix the existing interface have failed.

A new asynchronous interface, io_uring, has been created and merged in this release, with the purpose of finally adding fast, scalable asynchronous I/O to Linux, both buffered and unbuffered. It also supports asynchronous polled I/O, and other features will be added in the future. For more details, read this document (PDF). For performance details, see this email.

Additionally, a user space library, liburing, has been created to provide basic functionality for applications that don't need or want to deal with the low-level details of the kernel interface. It has helpers that let applications easily set up an io_uring instance and submit/complete I/O through it without knowing about the intricacies of the rings, and it will continue to grow helper functions and features as time progresses.
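The "rings" mentioned above are queues shared between the application and the kernel: one for submissions and one for completions. The following is a purely conceptual sketch of that idea in Python, not the kernel ABI and not the liburing API. The names Ring, toy_kernel and read_all are hypothetical, and the "kernel" is simulated by an ordinary function that reads from an in-memory buffer.

```python
class Ring:
    """Fixed-size single-producer/single-consumer ring with free-running counters."""

    def __init__(self, entries):
        # A power-of-two size lets "index & mask" replace a modulo operation.
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.slots = [None] * entries
        self.mask = entries - 1
        self.head = 0  # advanced only by the consumer
        self.tail = 0  # advanced only by the producer

    def push(self, item):
        if self.tail - self.head == len(self.slots):
            return False  # ring is full
        self.slots[self.tail & self.mask] = item
        self.tail += 1
        return True

    def pop(self):
        if self.head == self.tail:
            return None  # ring is empty
        item = self.slots[self.head & self.mask]
        self.head += 1
        return item


def toy_kernel(sq, cq, storage):
    """Stand-in for the kernel side: drain submissions, post completions."""
    while (req := sq.pop()) is not None:
        user_data, offset, length = req
        cq.push((user_data, storage[offset:offset + length]))


def read_all(storage, requests, entries=4):
    """Submit (offset, length) reads in batches; return results in request order."""
    sq, cq = Ring(entries), Ring(entries)
    results = {}
    pending = list(enumerate(requests))
    while pending:
        # Queue as many requests as fit, then hand the whole batch over at once.
        while pending and sq.push((pending[0][0], *pending[0][1])):
            pending.pop(0)
        toy_kernel(sq, cq, storage)
        # Reap completions; user_data ties each result back to its request.
        while (done := cq.pop()) is not None:
            results[done[0]] = done[1]
    return [results[i] for i in range(len(requests))]
```

Calling read_all(b"hello io_uring world", [(0, 5), (6, 8)]) queues both reads before the simulated kernel runs once, which is the batching idea behind the design: many requests can be handed over and reaped without one call per operation.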

Document explaining the reasons for the existence of io_uring, its inner workings, the user-visible interface and liburing: io_uring.pdf

Recommended LWN article: Ringing in a new asynchronous I/O API

liburing git repository: http://git.kernel.dk/cgit/liburing/

1.2. Improved fanotify for better file system monitoring

Unlike other operating systems, Linux does not have an efficient way to watch changes on a large file system. The only way to monitor file system dirent modification events is recursive inotify watches, which scale poorly for large directory trees. The fanotify interface, introduced in Linux 2.6.36, was intended to supersede inotify and solve its deficiencies. It initially took several steps in the direction of solving the scalability issues, but the work needed to completely supersede inotify was never finished.

This release (along with the fanotify changes in Linux 4.20) expands fanotify to provide "super block root watch" functionality, which is a scalable way of watching changes on large file systems. For more details, see the project wiki.
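To get a feel for why recursive inotify watches scale poorly, recall that an inotify watch covers a single directory, so a monitoring tool needs one watch per directory in the tree. The sketch below counts how many watches a given tree would require and reads the per-user limit from /proc/sys/fs/inotify/max_user_watches. The function names are hypothetical, and the code only estimates the cost; it does not set up any watches.

```python
import os


def count_inotify_watches_needed(root):
    """Count the directories under root, including root itself.

    Each directory would need its own inotify watch.
    """
    return sum(1 for _dirpath, _dirnames, _filenames in os.walk(root))


def read_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    needed = count_inotify_watches_needed(".")
    limit = read_watch_limit()
    print(f"watches needed: {needed}, per-user limit: {limit}")
```

On a tree with millions of directories, the walk itself is slow and the watch count can exceed the limit, whereas a single watch on the whole file system avoids both costs.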

1.3. Safe signal delivery in the presence of PID reuse

The kill(2) syscall operates on PIDs. After a process has exited, its PID can be reused by another process. If a caller sends a signal to a reused PID, it will end up signaling the wrong process. This is an old problem with the Unix process design, and it has caused numerous security problems.

After considering several proposals, the Linux kernel has added a new syscall, pidfd_send_signal(2), which uses file descriptors from /proc/<pid> as stable handles on struct pid. Even if a PID is recycled, the handle will not change, and the file descriptor can be used to safely send signals to the process it refers to. Note that Linux 5.2 adds a CLONE_PIDFD flag to clone(2) that makes it possible to retrieve a PID file descriptor that is usable for these purposes.
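As a minimal sketch of the approach described above, the following Python code opens /proc/<pid> as a directory file descriptor and signals through it with signal.pidfd_send_signal(), which is available in Python 3.9 and later on kernels that provide the syscall. The function name send_signal_via_proc_fd is hypothetical.

```python
import os
import signal


def send_signal_via_proc_fd(pid, sig):
    """Open /proc/<pid> as a stable handle, then signal through that handle."""
    fd = os.open(f"/proc/{pid}", os.O_RDONLY | os.O_DIRECTORY)
    try:
        # From here on, the descriptor refers to the process that existed
        # when /proc/<pid> was opened, even if the numeric PID is recycled.
        signal.pidfd_send_signal(fd, sig)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Signal 0 performs the permission and existence checks without
    # actually delivering a signal.
    send_signal_via_proc_fd(os.getpid(), 0)
    print("signal 0 accepted for our own process")
```

The handle only protects against reuse that happens after the open() call. A caller that got the PID from elsewhere would still want to check, through the open descriptor, that it refers to the intended process before signaling.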

Recommended LWN article: Toward race-free process signaling

asciinema recording for the basic functionality: https://asciinema.org/a/IQjuCHew6bnq1cr78yuMv16cy

1.4. Use persistent memory as RAM

Linux supports persistent memory devices, but they are often used as storage devices. Some users want to use persistent memory as additional volatile memory: they are willing to cope with potential performance differences, and they want to use the typical Linux memory management APIs rather than a userspace memory allocator layered over an mmap() of a DAX file. This release allows them to do so. This is intended for use with NVDIMMs that are physically persistent (physically like flash) so that they can be used as a cost-effective RAM replacement.

Recommended LWN article: Persistent memory for transient data

1.5. TEO, an alternative cpuidle governor to 'menu'

The cpuidle subsystem is the part of the kernel in charge of deciding which CPU idle state should be used when the CPU has nothing to do (deeper idle states save more power, but it takes more time to get out of them). There are two cpuidle governors, "menu" and "ladder", each one using different heuristics. The menu governor is believed to have a number of shortcomings in its heuristics. Instead of fixing it, this release introduces an alternative so that people can compare both: TEO, the Timer Events Oriented governor, which seems to offer improved performance with no extra power consumption cost. You can check your current governor in /sys/devices/system/cpu/cpuidle/current_governor_ro and change the default cpuidle governor at boot time with the cpuidle.governor=teo boot parameter.
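Checking the active governor amounts to reading the sysfs file named above. A minimal sketch, where the function name current_cpuidle_governor is hypothetical and the path parameter exists only to make the function easy to test:

```python
from pathlib import Path

GOVERNOR_FILE = "/sys/devices/system/cpu/cpuidle/current_governor_ro"


def current_cpuidle_governor(path=GOVERNOR_FILE):
    """Return the active cpuidle governor name, or None if the file is unavailable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        # No cpuidle support, or sysfs is not mounted (for example in a sandbox).
        return None


if __name__ == "__main__":
    governor = current_cpuidle_governor()
    if governor is None:
        print("cpuidle governor information is not available")
    elif governor == "teo":
        print("TEO is active")
    else:
        print(f"active governor: {governor}; boot with cpuidle.governor=teo to try TEO")
```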

Recommended LWN article: Improving idle behavior in tickless systems

1.6. More Year 2038 preparation

As part of the work to prepare for the year 2038 problem, this release includes syscalls for 32-bit architectures that use a 64-bit time_t type. After a long time of preparation, this finally makes system calls with a 64-bit time_t available on all architectures.
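The arithmetic behind the problem is short: a signed 32-bit time_t counts seconds since 1970-01-01 00:00:00 UTC and can hold at most 2^31 - 1 of them. The sketch below computes the last representable moment and uses struct.pack to show that the next second fits only in a 64-bit field. The helper names are hypothetical.

```python
import struct
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit time_t


def last_32bit_time():
    """Return the last moment a signed 32-bit time_t can represent, in UTC."""
    return datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)


def fits_in_time_t(seconds, bits):
    """Check whether a second count fits in a signed time_t of the given width."""
    fmt = {32: "<i", 64: "<q"}[bits]
    try:
        struct.pack(fmt, seconds)
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    print(last_32bit_time())                   # 2038-01-19 03:14:07+00:00
    print(fits_in_time_t(INT32_MAX + 1, 32))   # False
    print(fits_in_time_t(INT32_MAX + 1, 64))   # True
```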

Recommended LWN article: Approaching the kernel year-2038 end game

1.7. Configurable Zstd compression level in Btrfs

Btrfs has supported Zstd compression since Linux 4.14, but it did not allow configuring the compression level used by the filesystem, which can make a big difference. This release adds support for configuring the compression level used in a Btrfs file system, as is already done with zlib, by using the mount option -o compress=zstd:level. To see the levels available and their impact on performance and compression ratio, see this commit.
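A small sketch of how the mount option could be assembled from a script. The helper names are hypothetical, and the accepted range of 1 to 15 is an assumption about the levels Btrfs exposes; the referenced commit is the authoritative source for the levels actually available.

```python
# Assumed range of Zstd levels accepted by Btrfs; verify against the commit.
MIN_LEVEL, MAX_LEVEL = 1, 15


def zstd_mount_option(level=None):
    """Build the compress= mount option, optionally with an explicit level."""
    if level is None:
        return "compress=zstd"  # let the filesystem pick its default level
    if not isinstance(level, int) or not MIN_LEVEL <= level <= MAX_LEVEL:
        raise ValueError(f"level must be an integer in {MIN_LEVEL}..{MAX_LEVEL}")
    return f"compress=zstd:{level}"


def mount_command(device, mountpoint, level=None):
    """Return the argv list for mounting a Btrfs device with Zstd compression."""
    return ["mount", "-o", zstd_mount_option(level), device, mountpoint]


if __name__ == "__main__":
    # /dev/sdb1 and /mnt/data are placeholder names for illustration only.
    print(" ".join(mount_command("/dev/sdb1", "/mnt/data", level=3)))
```

Returning an argv list rather than a shell string keeps the command safe to pass to subprocess.run() without quoting concerns.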

1.8. Boot to a device-mapper device without initramfs

In order to boot to a filesystem placed in a device-mapper device, you need an initramfs. Some people, however, don't want to or can't use an initramfs. This release allows the use of DM targets in the boot process (as the root device or otherwise) without the need for an initramfs, with the help of a tricky kernel boot parameter. For more details, see the documentation.
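The text above does not name the boot parameter. As an assumption based on the kernel's dm-init documentation, it is dm-mod.create=, taking comma-separated fields of the form name, uuid, minor, flags, followed by one or more table lines. The sketch below assembles such a string under that assumption; the function name is hypothetical, and the exact syntax should be verified against the documentation before use.

```python
def dm_create_parameter(name, tables, uuid="", minor="", flags="rw"):
    """Assemble a dm-mod.create= boot parameter for a single device.

    Assumed field order: <name>,<uuid>,<minor>,<flags>,<table>[,<table>...]
    Each table is "<start_sector> <num_sectors> <target_type> <target_args>".
    """
    if flags not in ("ro", "rw"):
        raise ValueError("flags must be 'ro' or 'rw'")
    if not tables:
        raise ValueError("at least one table line is required")
    fields = [name, uuid, str(minor), flags] + list(tables)
    # Quote the value because table lines contain spaces.
    return 'dm-mod.create="' + ",".join(fields) + '"'


if __name__ == "__main__":
    # Two linear segments concatenated into one device; the device numbers
    # 98:16 and 98:32 are placeholders for illustration only.
    param = dm_create_parameter(
        "lroot",
        ["0 4096 linear 98:16 0", "4096 4096 linear 98:32 0"],
    )
    print(param + " root=/dev/dm-0")
```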

1.9. Live patching: support for cumulative patches

Livepatches can depend on each other. If multiple patches need to make different changes to the same function(s), an installation order must be defined, and the function implementations in any newer livepatch must be built on top of the older ones. This can become a maintenance nightmare for distros, especially if someone wants to remove a patch from the middle of the stack.

An elegant solution included in this release is the feature called "Atomic Replace". It allows the creation of so-called "Cumulative Patches", which include all wanted changes from all older livepatches and completely replace them in one transition. As a result, livepatch authors can maintain sources for only one cumulative patch. For more details, see the documentation.

2. Core (various)

3. File systems

4. Memory management

5. Block layer

6. Tracing and perf

7. Security

8. Networking

9. Architectures

9.1. ARM

9.2. X86

9.3. POWERPC

9.4. S390

9.5. MIPS

9.6. XTENSA

9.7. M68K

9.8. RISCV

9.9. ARC

10. Drivers

10.1. Graphics

10.2. Storage

10.3. Drivers in the Staging area

10.4. Networking

10.5. Audio

10.6. Tablets, touch screens, keyboards, mice

10.7. TV tuners, webcams, video capturers

10.8. Universal Serial Bus

10.9. Serial Peripheral Interface (SPI)

10.10. Watchdog

10.11. Serial

10.12. ACPI, EFI, cpufreq, thermal, Power Management

10.13. Real Time Clock (RTC)

10.14. Voltage, current regulators, power capping, power supply

10.15. Pin Controllers (pinctrl)

10.16. Multi Media Card (MMC)

10.17. Memory Technology Devices (MTD)

10.18. Industrial I/O (iio)

10.19. Multi Function Devices (MFD)

10.20. Pulse-Width Modulation (PWM)

10.21. Inter-Integrated Circuit (I2C)

10.22. Hardware monitoring (hwmon)

10.23. General Purpose I/O (gpio)

10.24. DMA engines

10.25. Hardware Random Number Generator (hwrng)

10.26. Cryptography hardware acceleration

10.27. PCI

10.28. Clock

10.29. EDAC (Error Detection And Correction)

10.30. PHY ("physical layer" framework)

10.31. Various

11. List of merges

12. Other news sites

KernelNewbies: Linux_5.1 (last edited 2019-07-29 08:05:22 by VineetGupta)