
Linux 5.7 was released on Sun, 31 May 2020.

Summary: This release adds: support for the notion of Thermal Pressure, which lets the task scheduler make better scheduling decisions in the face of CPU frequency changes; support for frequency invariant scheduler accounting on x86 CPUs, which makes x86 perform better with the schedutil governor; a new and better exFAT file system implementation; support for an x86 feature that detects atomic operations that span cache lines; ARM Pointer Authentication support for kernel code, which helps to prevent security issues; support for spawning processes with clone() into cgroups; write protection support in userfaultfd(), which is equivalent to (but faster than) using mprotect(2) and a SIGSEGV signal handler; and a BPF-based Linux Security Module which allows for more dynamic security auditing. As always, there are many other new drivers and improvements.

1. Prominent features

1.1. Thermal Pressure in the task scheduler

When a CPU is overheating, the thermal governor will usually cap the maximum CPU frequency. This, however, decreases the maximum available compute capacity of that CPU. If the task scheduler is not immediately aware of those frequency changes, it will make wrong scheduling decisions, assuming that the CPU has greater computing capacity than it actually has. This release introduces the notion of Thermal Pressure, which makes the task scheduler more aware of frequency capping and leads to better task placement among the available CPUs in the event of overheating, which in turn leads to better performance numbers.
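The relationship between a frequency cap and the capacity the scheduler can count on can be illustrated with a small arithmetic sketch. This is a simplified model, not the kernel's code: the function names, the 1024 capacity scale and the frequencies are assumptions chosen for illustration, and the sketch ignores any averaging of the signal over time.

```python
def thermal_pressure(max_capacity, max_freq_khz, capped_freq_khz):
    """Capacity lost to thermal capping, in the same units as max_capacity.

    Hypothetical helper: models pressure as the share of capacity that
    becomes unreachable once the maximum frequency is capped.
    """
    capped_capacity = max_capacity * capped_freq_khz // max_freq_khz
    return max_capacity - capped_capacity


def usable_capacity(max_capacity, max_freq_khz, capped_freq_khz):
    """Capacity left for tasks once thermal pressure is subtracted."""
    return max_capacity - thermal_pressure(
        max_capacity, max_freq_khz, capped_freq_khz
    )


# Assumed example: a CPU of capacity 1024 capped from 2.0 GHz to 1.5 GHz.
pressure = thermal_pressure(1024, 2_000_000, 1_500_000)
remaining = usable_capacity(1024, 2_000_000, 1_500_000)
print(pressure, remaining)
```

In this model a scheduler that ignores the cap would keep placing work as if all 1024 units were available, while one that accounts for the pressure would treat the CPU as a smaller one.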

Recommended LWN article: Telling the scheduler about thermal pressure

1.2. Frequency invariant scheduler accounting on x86 CPUs

Suppose a CPU has two frequencies: 500 and 1000 MHz. A task that would consume one third of a CPU at 1000 MHz would appear to consume approximately two thirds when running at 500 MHz, giving the false impression that this CPU is almost at capacity, even though it can go faster. Without frequency scale-invariance, tasks look larger just because the CPU is running slower. This makes the schedutil cpufreq governor, which uses scheduler-provided CPU utilization information as input for its decisions, make wrong decisions and perform worse.

This release implements frequency invariant scheduler accounting on (some) x86 CPUs. This makes capacity estimates more precise and keeps tasks on the same CPU better in the face of dynamic voltage and frequency scaling. Because of the improved behavior, the intel_pstate driver now defaults to the schedutil governor.
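The two-frequency example above can be worked through numerically. The following is a minimal sketch of the idea of scaling observed utilization by the ratio of current to maximum frequency; the function names are hypothetical, and the frequency ratio is simply passed in as a parameter rather than measured by hardware.

```python
from fractions import Fraction


def raw_utilization(work_at_max, cur_freq_mhz, max_freq_mhz):
    """Busy fraction observed at cur_freq for a task that needs
    `work_at_max` of the CPU at max_freq (saturates at 100%)."""
    return min(Fraction(1), work_at_max * Fraction(max_freq_mhz, cur_freq_mhz))


def invariant_utilization(raw_util, cur_freq_mhz, max_freq_mhz):
    """Scale the observed busy fraction by cur_freq / max_freq so the
    result no longer depends on the frequency the CPU happened to run at."""
    return raw_util * Fraction(cur_freq_mhz, max_freq_mhz)


task = Fraction(1, 3)                            # needs 1/3 of the CPU at 1000 MHz
seen = raw_utilization(task, 500, 1000)          # what it looks like at 500 MHz
fixed = invariant_utilization(seen, 500, 1000)   # after frequency scaling
print(seen, fixed)
```

With the scaling applied, the task is accounted as the same size at both frequencies, so a governor reading that figure is not misled into thinking the CPU is nearly full.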

Recommended LWN article: Frequency-invariant utilization tracking for x86

1.3. New exFAT file system

Linux 5.4 added an implementation of the exFAT file system. That implementation has been dropped; an alternative implementation from Samsung was found to be of better quality, and it has been merged in this release as a replacement for the previous one.

1.4. Split lock detection

A split lock occurs when an atomic CPU instruction operates on data that spans two cache lines. This is much slower than an atomic operation within a single cache line, and it disrupts performance on other cores. This release adds support for an x86 feature that detects split locks. Using the split_lock_detect boot command line parameter, it is possible to warn about applications that make use of split locks, or even to send them SIGBUS.
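Whether a given access would span two cache lines is a matter of simple address arithmetic. The following sketch assumes a 64-byte cache line, which is typical for current x86 CPUs but is an assumption here; the function name is hypothetical, and the check only models the condition, it does not detect anything in hardware.

```python
CACHE_LINE_SIZE = 64  # assumed line size in bytes


def spans_cache_lines(address, size, line_size=CACHE_LINE_SIZE):
    """Return True if an access of `size` bytes starting at `address`
    touches more than one cache line."""
    first_line = address // line_size
    last_line = (address + size - 1) // line_size
    return first_line != last_line


# An 8-byte operand at offset 60 covers bytes 60..67 and crosses the
# boundary at 64; the same operand at offset 56 stays within one line.
print(spans_cache_lines(60, 8), spans_cache_lines(56, 8))
```

Naturally aligned operands never cross a line boundary, which is why split locks usually come from packed or misaligned data structures.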

Recommended LWN article: Developers split over split-lock detection

1.5. ARM Kernel Pointer Authentication support

Linux 5.0 added support for the ARMv8.3 Pointer Authentication extension, which uses a Pointer Authentication Code to determine whether pointers have been modified unexpectedly. This helps prevent many security vulnerabilities, but that support covered only user space code. This release adds support for the arm64 kernel itself, which should help protect the kernel against attacks using return-oriented programming.

Recommended LWN article: ARM pointer authentication

1.6. userfaultfd() write protection support

This release adds write protection support to userfaultfd(2), a system call added in Linux 4.3 that lets a process handle page faults in userspace. This means that attempts to write to areas of the address space registered with userfaultfd() can be handled in userspace. This is equivalent to (but faster than) using mprotect(2) and a SIGSEGV signal handler. hugetlbfs and shmem memory are not supported in this release. For more details see the documentation.

Recommended LWN article: Write-protect for userfaultfd().

1.7. bpf-lsm: A BPF-based Linux Security Module

The current kernel infrastructure for providing telemetry (Audit, Perf etc.) is disjoint from access enforcement (i.e. LSMs). Augmenting the information provided by audit requires kernel changes to audit, its policy language and user-space components. Furthermore, building a MAC policy based on the newly added telemetry data requires changes to various LSMs and their respective policy languages. This release adds a new LSM that allows BPF programs to be attached to LSM hooks, which facilitates a unified and dynamic (not requiring re-compilation of the kernel) audit and MAC policy.

Recommended LWN article: KRSI — the other BPF security module.

1.8. clone(): Allow spawning processes into cgroups

This release adds support in clone3(2), through the new CLONE_INTO_CGROUP flag, for creating a process in a different cgroup than its parent, which means that callers can limit and account processes and threads right from the moment they are spawned. A service manager can directly spawn new services into dedicated cgroups; a process can be directly created in a frozen cgroup and will be frozen as well; the initial accounting jitter experienced by process supervisors and daemons is eliminated; threaded applications or even thread implementations can choose to create a specific cgroup layout where each thread is spawned directly into a dedicated cgroup.
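The interface is exposed through clone3(), whose argument structure gained a cgroup field that takes a file descriptor of the target cgroup directory, used together with the CLONE_INTO_CGROUP flag. The sketch below only lays out that structure with ctypes to show which fields are involved; it does not invoke the system call. The CloneArgs class and build_clone_args helper are hypothetical names, and the file descriptor value is a placeholder.

```python
import ctypes
import signal

CLONE_INTO_CGROUP = 0x200000000  # flag value from the kernel's uapi sched.h


class CloneArgs(ctypes.Structure):
    """Mirror of struct clone_args as extended in this release:
    eleven 64-bit fields, with `cgroup` as the newest, last member."""
    _fields_ = [(name, ctypes.c_uint64) for name in (
        "flags", "pidfd", "child_tid", "parent_tid", "exit_signal",
        "stack", "stack_size", "tls", "set_tid", "set_tid_size", "cgroup",
    )]


def build_clone_args(cgroup_fd):
    """Fill in the fields needed to spawn a child directly into the
    cgroup whose directory is open as `cgroup_fd` (hypothetical helper)."""
    args = CloneArgs()
    args.flags = CLONE_INTO_CGROUP
    args.exit_signal = int(signal.SIGCHLD)  # notify the parent as fork() would
    args.cgroup = cgroup_fd
    return args


# Placeholder descriptor; a real caller would open the cgroup directory
# (for example with os.open(path, os.O_RDONLY | os.O_DIRECTORY)).
args = build_clone_args(cgroup_fd=5)
print(ctypes.sizeof(args), hex(args.flags), args.cgroup)
```

A real caller would pass a pointer to this structure and its size to the clone3 system call; that step is left out here because forking through a raw system call from an interpreter is not safe to demonstrate in a short example.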

Recommended LWN article: Cloning into a control group

1.9. Improved perf cgroup profiling

In the past, perf could only profile tasks in a specific cgroup, and there was no way to know which cgroup a given sample belonged to. In this release, perf incorporates cgroup information into each sample, which makes it possible to profile more than one cgroup and to use a cgroup sort key in perf report.

2. Core (various)

3. File systems

4. Memory management

5. Block layer

6. Tracing, perf and BPF

7. Cryptography

8. Virtualization

9. Security

10. Networking

11. Architectures

11.1. ARM

11.2. x86

11.3. RISC-V

11.4. S390

11.5. PowerPC

11.6. C-Sky

11.7. MIPS

11.8. ARC

12. Drivers

12.1. Graphics

12.2. Power management

12.3. Storage

12.4. Drivers in the Staging area

12.5. Networking

12.6. Audio

12.7. Input devices: Tablets, touch screens, keyboards, mice

12.8. TV tuners, webcams, video capturers

12.9. Universal Serial Bus

12.10. Serial Peripheral Interface (SPI)

12.11. Watchdog

12.12. Serial

12.13. CPU frequency scaling

12.14. Voltage, current regulators, power capping, power supply

12.15. Real Time Clock (RTC)

12.16. Pin Controllers (pinctrl)

12.17. Multi Media Card (MMC)

12.18. Memory Technology Devices (MTD)

12.19. Industrial I/O (iio)

12.20. Multi Function Devices (MFD)

12.21. Pulse-Width Modulation (PWM)

12.22. Inter-Integrated Circuit (I2C + I3C)

12.23. Hardware monitoring (hwmon)

12.24. General Purpose I/O (gpio)

12.25. LEDs

12.26. DMA engines

12.27. Cryptography hardware acceleration

12.28. PCI

12.29. Clock

12.30. PHY ("physical layer" framework)

12.31. EDAC (Error Detection And Correction)

12.32. Modem Host Interface (MHI) Bus

12.33. Various

13. List of pull requests

14. Other news sites

KernelNewbies: Linux_5.7 (last edited 2020-06-01 19:01:08 by diegocalleja)