KernelNewbies:

Linux 5.4 was released on 24 November 2019.

Summary: This release includes the kernel lockdown mode, intended to strengthen the boundary between UID 0 and the kernel; virtio-fs, a high-performance virtio driver which allows a virtualized guest to mount a directory that has been exported on the host; fs-verity, for detecting file tampering, which is similar to dm-verity but works on files rather than block devices; dm-clone, which allows live cloning of dm targets; two new madvise() flags for improved app memory management on Android; support for new Intel/AMD GPUs; support for the exFAT file system and removal of the experimental status of the EROFS file system; a new haltpoll cpuidle driver and governor that greatly improves performance for virtualized guests wanting to do guest-side polling in the idle loop; and blk-iocost, an I/O cgroup controller that attempts to calculate the cost of I/O more accurately. As always, there are many other new drivers and improvements.

1. Coolest features

1.1. Kernel lockdown mode

This release introduces an optional kernel lockdown feature, intended to strengthen the boundary between UID 0 (root) and the kernel. When enabled, various pieces of kernel functionality are restricted. Applications that rely on low-level access to either hardware or the kernel may cease working as a result - therefore this should not be enabled without appropriate evaluation beforehand. The original purpose of this feature was to honour the anti-tampering protections expected in a secure-boot environment, but it is not tied to that. The majority of mainstream distributions have been carrying variants of this patchset for many years now.

Kernel lockdown is implemented as a Linux Security Module that can be configured in integrity or confidentiality mode. If set to integrity, kernel features that allow userland to modify the running kernel are disabled. If set to confidentiality, kernel features that allow userland to extract confidential information from the kernel are also disabled. Configuration can be done at runtime (through securityfs), boot time (via a kernel parameter) or build time (via a kconfig option).
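As a rough illustration of the runtime interface, the sketch below reads the current mode from securityfs. It assumes securityfs is mounted at /sys/kernel/security and that the lockdown file lists the available modes with the active one in square brackets (for example "none [integrity] confidentiality"). The function names are made up for this example.

```python
from pathlib import Path

# Assumed securityfs location; the file is absent on kernels built
# without the lockdown LSM or when securityfs is not mounted.
LOCKDOWN_PATH = Path("/sys/kernel/security/lockdown")


def parse_lockdown_state(text):
    """Return the active mode from text such as 'none [integrity] confidentiality'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def current_lockdown_mode(path=LOCKDOWN_PATH):
    """Return the active lockdown mode, or None if the file cannot be read."""
    try:
        return parse_lockdown_state(path.read_text())
    except OSError:
        # Missing file, securityfs not mounted, or insufficient permissions.
        return None


print(current_lockdown_mode())
```

At boot time the equivalent request is made with the lockdown= kernel parameter, which accepts integrity or confidentiality.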

Recommended LWN article: Lockdown as a security module

1.2. virtio-fs, a bridge to share file systems with virtualized guests

This release includes virtio-fs, a FUSE-based virtio driver for guest <-> host file system sharing. It allows a guest to mount a directory that has been exported on the host. Although there are existing technologies that allow this kind of functionality (NFS, virtio-9P), virtio-fs takes advantage of the proximity of VMs to achieve API semantics and performance more like local file systems. This is desirable both for performance and for application compatibility.

For more details, see the documentation, the design documentation and the official web site

1.3. fs-verity, for detecting file modifications

fs-verity is a support layer that filesystems can use to support transparent integrity and authenticity protection of read-only files. It is similar to dm-verity but works on files rather than block devices. Currently, it is supported by the ext4 and f2fs filesystems.

On regular files on filesystems supporting fs-verity, userspace can execute an ioctl that causes the filesystem to build a Merkle tree for the file and persist it to a filesystem-specific location associated with the file. Optionally, it is possible to sign files with a key loaded into a keyring. After this, the file is made read-only, and all reads from the file are automatically verified against the file's Merkle tree. Reads of any corrupted data, including mmap reads, will fail. Userspace can efficiently retrieve the root hash ("file measurement") with another ioctl, which can be used for a variety of security applications.
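To make the Merkle tree idea concrete, here is a simplified sketch of how a root hash over a file's contents can be computed. It assumes SHA-256 and 4096-byte blocks and ignores salting; it is not the exact fs-verity format, and the value the kernel reports as the file measurement is computed over a descriptor that wraps the root hash, so the output here will not match what the real ioctl returns. The function names are illustrative only.

```python
import hashlib

BLOCK_SIZE = 4096   # assumed data and tree block size
DIGEST_SIZE = 32    # SHA-256 digest length in bytes


def _hash_blocks(data):
    """Hash each BLOCK_SIZE chunk of data, zero-padding the final chunk."""
    digests = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
        digests.append(hashlib.sha256(block).digest())
    return digests


def merkle_root(data):
    """Return the root hash of a Merkle tree built over data."""
    if not data:
        return bytes(DIGEST_SIZE)  # empty input: all-zero root hash
    level = _hash_blocks(data)     # leaf level: one digest per data block
    while len(level) > 1:
        # Pack this level's digests into blocks and hash those blocks
        # to obtain the next level up.
        level = _hash_blocks(b"".join(level))
    return level[0]


original = b"a" * 4096 + b"b" * 100
tampered = b"a" * 4096 + b"c" * 100
print(merkle_root(original).hex())
print(merkle_root(original) != merkle_root(tampered))
```

Because every level commits to the one below it, changing any byte of the file changes the root hash, which is what lets a read of a single block be verified without rehashing the whole file.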

For more details, read the documentation

Recommended LWN article: Yet another try for fs-verity

1.4. dm-clone

dm-clone is a device mapper target which produces a one-to-one copy of an existing, read-only source device into a writable destination device. It presents a virtual block device which makes all data appear immediately, and redirects reads and writes accordingly. The main use case of dm-clone is to clone a potentially remote, high-latency, read-only, archival-type block device into a writable, fast, primary-type device for low-latency I/O. The cloned device is visible/mountable immediately and the copy of the source device to the destination device happens in the background, in parallel with user I/O.
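The redirection idea can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the class and method names are invented, regions are tracked with a plain list of flags, and every write hydrates the regions it touches first. The real target's handling of writes, discards and metadata is more involved.

```python
class CloneDevice:
    """Toy model of a clone target: region-granular and in-memory only."""

    def __init__(self, source, region_size):
        self.source = source                          # read-only source contents
        self.dest = bytearray(len(source))            # writable destination
        self.region_size = region_size
        nr_regions = -(-len(source) // region_size)   # ceiling division
        self.hydrated = [False] * nr_regions          # True once a region is copied

    def hydrate(self, region):
        """Copy one region from source to destination (the background work)."""
        if not self.hydrated[region]:
            start = region * self.region_size
            end = min(start + self.region_size, len(self.source))
            self.dest[start:end] = self.source[start:end]
            self.hydrated[region] = True

    def read(self, offset, length):
        """Serve each range from dest if its region is hydrated, else from source."""
        out = bytearray()
        end = offset + length
        while offset < end:
            region = offset // self.region_size
            chunk_end = min(end, (region + 1) * self.region_size)
            backing = self.dest if self.hydrated[region] else self.source
            out += backing[offset:chunk_end]
            offset = chunk_end
        return bytes(out)

    def write(self, offset, data):
        """Hydrate every touched region first, then write to the destination."""
        first = offset // self.region_size
        last = (offset + len(data) - 1) // self.region_size
        for region in range(first, last + 1):
            self.hydrate(region)
        self.dest[offset:offset + len(data)] = data


src = bytes(range(16))
dev = CloneDevice(src, region_size=4)
dev.write(5, b"\xff\xff")      # only region 1 is copied at this point
print(dev.hydrated)
print(dev.read(0, 16).hex())   # full contents are visible immediately
```

The point of the model is that reads never have to wait for the whole copy to finish, and the source is never modified.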

For more details, see the documentation

1.5. Support for new AMD/Intel graphics

This release adds support in the amdgpu driver for four new AMD products: Navi 12, Navi 14, Arcturus and the Renoir APU.

It also includes the first pieces for supporting the future Intel Tiger Lake GPU.

1.6. Two new madvise() flags: MADV_COLD and MADV_PAGEOUT

In order to improve memory usage in some systems (notably, Android), two new madvise() flags have been added: MADV_COLD and MADV_PAGEOUT. These new options complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space.

MADV_COLD hints to the kernel that the pages can be reclaimed when memory pressure happens, but that their data should be preserved for future use. This can reduce working-set eviction and so improve performance. In contrast to MADV_FREE, the contents of the region are preserved regardless of subsequent writes to pages. MADV_PAGEOUT can be used by a process to mark a memory range as not expected to be used for a long time, so that the kernel reclaims any LRU pages in the range immediately. The hint can help the kernel decide which pages to evict proactively. Accessing the range after a successful operation may cause a major page fault but, unlike with MADV_DONTNEED, never loses the up-to-date contents.
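A minimal sketch of issuing the two hints from userspace follows. It assumes Linux and Python 3.8 or later (for mmap.madvise), and it passes the advice values numerically (20 and 21, as defined in the generic Linux UAPI headers) because Python's mmap module may not export names for them. The helper try_madvise is invented for this example and simply reports whether the running kernel accepted the hint.

```python
import mmap
import sys

# Values from the Linux UAPI headers (asm-generic/mman-common.h); used
# numerically because the mmap module may not define these names.
MADV_COLD = 20
MADV_PAGEOUT = 21


def try_madvise(region, advice):
    """Apply a madvise() hint to the whole mapping; return False if unsupported."""
    if not sys.platform.startswith("linux") or not hasattr(region, "madvise"):
        return False
    try:
        region.madvise(advice)
    except OSError:
        return False  # e.g. EINVAL on kernels older than 5.4
    return True


region = mmap.mmap(-1, 4 * mmap.PAGESIZE)   # anonymous private mapping
region[:5] = b"hello"

for name, advice in (("MADV_COLD", MADV_COLD), ("MADV_PAGEOUT", MADV_PAGEOUT)):
    print(name, "accepted" if try_madvise(region, advice) else "not supported here")

# Neither hint is destructive: the data is still there, although reading
# it back may now require a major page fault.
print(region[:5])
```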

1.7. EROFS and exFAT

This release moves the EROFS file system out of the staging area. First included in Linux 4.19, EROFS is a lightweight read-only file system with a modern design, aimed at scenarios that need high-performance read-only access, e.g. firmware in mobile phones or live CDs. Recommended LWN article: On-disk format robustness requirements for new filesystems

This release also adds the exFAT file system to the staging area. Recommended LWN article: Examining exFAT

1.8. More efficient polling in virtualized guests with haltpoll

This release includes a haltpoll cpuidle driver and a new matching governor. These two pieces allow guest vCPUs to poll for a specified amount of time before halting, which provides the following benefits over host-side polling: 1) The POLL flag is set while polling is performed, which allows a remote vCPU to avoid sending an IPI (and the associated cost of handling the IPI) when performing a wakeup. 2) The VM-exit cost can be avoided. The downside of guest-side polling is that polling is performed even with other runnable tasks in the host. For more details see the documentation
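The amount of time to poll does not have to be fixed. The sketch below shows one way a governor could adapt the per-vCPU poll window to how long the vCPU actually stayed idle. It is loosely modeled on my understanding of the haltpoll governor's tunables; the parameter names mirror its module parameters, but the default values and the exact grow/shrink rules should be treated as assumptions rather than a description of the kernel code.

```python
from dataclasses import dataclass


@dataclass
class HaltPollParams:
    # Names mirror the haltpoll module parameters; the defaults here
    # are assumptions for illustration.
    guest_halt_poll_ns: int = 200_000        # upper bound on the poll window
    guest_halt_poll_grow: int = 2            # multiplier when growing
    guest_halt_poll_grow_start: int = 50_000 # first non-zero window
    guest_halt_poll_shrink: int = 2          # divisor when shrinking
    guest_halt_poll_allow_shrink: bool = True


def adjust_poll_limit(poll_limit_ns, block_ns, params=None):
    """Return the new poll window given how long the vCPU was idle (block_ns)."""
    p = params or HaltPollParams()
    if poll_limit_ns < block_ns <= p.guest_halt_poll_ns:
        # Polling a little longer would have caught this wakeup: grow.
        grown = poll_limit_ns * p.guest_halt_poll_grow
        grown = max(grown, p.guest_halt_poll_grow_start)
        return min(grown, p.guest_halt_poll_ns)
    if block_ns > p.guest_halt_poll_ns and p.guest_halt_poll_allow_shrink:
        # The idle period was far longer than we are willing to poll: shrink.
        if p.guest_halt_poll_shrink == 0:
            return 0
        return poll_limit_ns // p.guest_halt_poll_shrink
    return poll_limit_ns  # wakeup arrived within the window: keep it


limit = 0
for idle_ns in (10_000, 80_000, 180_000, 1_000_000):
    limit = adjust_poll_limit(limit, idle_ns)
    print(idle_ns, "->", limit)
```

The intent is that short idle periods are absorbed by polling, while long ones quickly reduce the window so that host CPU time is not wasted.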

1.9. More accurate cgroup I/O control with blk-iocost

One challenge of controlling I/O resources is the lack of reliability of trivial cost metrics. Bandwidth and iops can be off by orders of magnitude depending on the device type and I/O pattern. This is challenging for the I/O cgroup controllers: while io.latency provides the capability to comprehensively prioritize and protect IOs depending on the cgroups, its protection is binary - the lowest latency target cgroup is protected at the cost of all others.

This release introduces blk-iocost, a work-conserving proportional controller based on an I/O cost model. It currently has a simple built-in linear cost model in which each I/O is classified as sequential or random and given a base cost accordingly, with an additional size-proportional cost added on top. With each I/O given a fairly reliable cost, the controller distributes I/O capacity among cgroups according to their hierarchical weights. For more details, see the cgroup documentation for io.cost.qos and io.cost.model
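The two ideas in this paragraph, a linear per-I/O cost and a hierarchical weight split, can be sketched as follows. The coefficients, the "sequential means it starts where the previous I/O ended" test, and the function names are all hypothetical choices for illustration; the kernel derives its coefficients from device parameters and only considers active cgroups when splitting capacity.

```python
PAGE_SIZE = 4096

# Hypothetical coefficients in abstract cost units.
SEQ_BASE_COST = 10
RAND_BASE_COST = 100
COST_PER_PAGE = 5


def io_cost(offset, size, prev_end):
    """Linear cost: a base cost by access pattern plus a size-proportional part."""
    sequential = offset == prev_end   # simplified classification
    base = SEQ_BASE_COST if sequential else RAND_BASE_COST
    pages = -(-size // PAGE_SIZE)     # round up to whole pages
    return base + pages * COST_PER_PAGE


def hierarchical_share(name, tree):
    """Fraction of device capacity for a cgroup.

    tree maps cgroup name -> (parent name or None, weight).
    """
    share = 1.0
    while tree[name][0] is not None:
        parent, weight = tree[name]
        sibling_total = sum(w for p, w in tree.values() if p == parent)
        share *= weight / sibling_total
        name = parent
    return share


tree = {
    "root": (None, 100),
    "a": ("root", 100),
    "b": ("root", 300),
    "a1": ("a", 50),
    "a2": ("a", 50),
}
print(io_cost(0, 4096, prev_end=0))        # sequential 4 KiB I/O
print(io_cost(8192, 8192, prev_end=4096))  # random 8 KiB I/O
print(hierarchical_share("b", tree), hierarchical_share("a1", tree))
```

A cgroup's budget is then its share of the total cost the device can absorb per period, which is why an accurate per-I/O cost matters more than raw bandwidth or IOPS counts.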

Recommended LWN article: The io.weight I/O-bandwidth controller (the io.weight name is no longer used)

1.10. Kernel symbol namespacing

In order to support modules, the kernel needs to export the symbols of functions needed by modules. With more than 30k such symbols in the current kernel, managing them is sometimes messy. This feature allows subsystem maintainers to partition and categorize their exported symbols into explicit namespaces, which makes it easier to control the use of symbols. Module authors are now required to import the namespaces they need.

For more details, read the documentation

Recommended LWN article: Kernel symbol namespacing

2. Core (various)

3. File systems

4. Memory management

5. Block layer

6. Tracing, perf and BPF

7. Virtualization

8. Cryptography

9. Security

10. Power Management

11. Networking

12. Architectures

12.1. ARM

12.2. PowerPC

12.3. x86

12.4. S390

12.5. MIPS

12.6. RISC-V

12.7. User Mode Linux

12.8. PA-RISC

12.9. IA-64

12.10. Xtensa

12.11. MicroBlaze

12.12. ARC

13. Drivers

13.1. Graphics

13.2. Storage

13.3. Drivers in the Staging area

13.4. Networking

13.5. Audio

13.6. Tablets, touch screens, keyboards, mouses

13.7. TV tuners, webcams, video capturers

13.8. Universal Serial Bus

13.9. Serial Peripheral Interface (SPI)

13.10. Watchdog

13.11. Serial

13.12. CPU Frequency Scaling

13.13. Device Voltage and Frequency Scaling

13.14. Real Time Clock (RTC)

13.15. Voltage, current regulators, power capping, power supply

13.16. Pin Controllers (pinctrl)

13.17. Multi Media Card (MMC)

13.18. Memory Technology Devices (MTD)

13.19. Industrial I/O (iio)

13.20. Multi Function Devices (MFD)

13.21. Pulse-Width Modulation (PWM)

13.22. Inter-Integrated Circuit (I2C + I3C)

13.23. Hardware monitoring (hwmon)

13.24. General Purpose I/O (gpio)

13.25. LEDs

13.26. DMA engines

13.27. Cryptography hardware acceleration

13.28. PCI

13.29. Non-Transparent Bridge (NTB)

13.30. Thunderbolt

13.31. Clock

13.32. PHY ("physical layer" framework)

13.33. EDAC (Error Detection And Correction)

13.34. Various

14. List of Pull Requests

15. Other news sites

KernelNewbies: Linux_5.4 (last edited 2020-01-28 23:08:41 by VineetGupta)