KernelNewbies:

Linux 5.9 was released on Sunday, 11 October 2020.

Summary: This release implements better management of anonymous (malloc'ed) memory; a new cgroup slab controller that improves slab utilization by allowing memory cgroups to share slab memory; support for proactive memory defragmentation; CPU capacity awareness for the deadline scheduling class; support for running BPF programs on socket lookups; a new close_range() system call for easier closing of entire ranges of file descriptors; support for FSGSBASE x86 instructions that provide faster context switching; NFS support for extended attributes; and support for ZSTD-compressed kernel, ramdisk and initramfs. As always, there are many other new drivers and improvements.

Contents

  1. Prominent Features
    1. Better management of anonymous memory
    2. New cgroup slab controller shares slab memory
    3. Proactive memory compaction
    4. New close_range() system call for easier closing of file descriptors
    5. Support for running BPF programs on socket lookups
    6. CPU Capacity awareness for the deadline scheduling class
    7. Faster context switch with support for FSGSBASE x86 instructions
    8. NFS support for extended attributes
    9. Support for ZSTD compressed kernel, ramdisk and initramfs
  2. Core (various)
  3. File systems
  4. Memory management
  5. Block layer
  6. Tracing, perf and BPF
  7. Virtualization
  8. Security
  9. Networking
  10. Architectures
    1. ARM
    2. PowerPC
    3. x86
    4. RISC-V
    5. MIPS
    6. C-SKY
    7. Xtensa
    8. S390
    9. SH
    10. SPARC
    11. UNICORE32
    12. OpenRISC
  11. Drivers
    1. Graphics
    2. Power Management
    3. Storage
    4. Drivers in the Staging area
    5. Networking
    6. Audio
    7. Tablets, touch screens, keyboards, mouses
    8. TV tuners, webcams, video capturers
    9. Universal Serial Bus / Thunderbolt
    10. Serial Peripheral Interface (SPI)
    11. Watchdog
    12. Serial
    13. CPU Frequency scaling
    14. Device Voltage and Frequency Scaling
    15. Voltage, current regulators, power capping, power supply
    16. Real Time Clock (RTC)
    17. Pin Controllers (pinctrl)
    18. MultiMediaCard (MMC)
    19. Memory Technology Devices (MTD)
    20. Industrial I/O (iio)
    21. Multi Function Devices (MFD)
    22. Pulse-Width Modulation (PWM)
    23. Inter-Integrated Circuit (I2C + I3C)
    24. Hardware monitoring (hwmon)
    25. General Purpose I/O (gpio)
    26. LEDs
    27. DMA engines
    28. Hardware Random Number Generator (hwrng)
    29. Cryptography hardware acceleration
    30. PCI
    31. Clock
    32. PHY ("physical layer" framework)
    33. Memory Controller Drivers
    34. Firmware Drivers
    35. Remote Processors
    36. Various
  12. List of Pull Requests
  13. Other news sites

1. Prominent Features

1.1. Better management of anonymous memory

This release implements better workload detection and protection of anonymous memory (memory that is not file-backed, i.e. malloc'ed memory). The Linux kernel manages anonymous memory by placing its pages on either the active list or the inactive list. Under memory pressure, unused pages are moved from the active list to the inactive list and unmapped, which gives them a chance of being referenced again (a soft fault) before they are moved to swap if the pressure continues.

In the previous implementation, newly created or swapped-in pages were placed on the active list, which could force actively used pages onto the inactive list. In this release, newly created or swapped-in anonymous pages start on the inactive list (thus protecting existing hot workloads), and are only promoted to the active list when they are referenced enough. Additionally, because this change can also cause newly created or swapped-in anonymous pages to swap out existing pages on the inactive list, the existing workingset detection mechanisms have been extended to cover the anonymous LRU list and make better decisions.
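The effect of this change can be illustrated with a toy model. The following is a minimal sketch, not the kernel's implementation: the class `TwoListLRU`, the helper `hot_faults`, the tiny `capacity`, the rule that the active list holds at most half of memory, and the rule that a single reference promotes an inactive page are all simplifying assumptions made for this example. The kernel's real promotion and workingset logic is considerably more involved.

```python
from collections import OrderedDict


class TwoListLRU:
    """Toy model of an active/inactive page list pair (hypothetical, not kernel code)."""

    def __init__(self, capacity, new_pages_active):
        self.capacity = capacity                  # total pages that fit in memory
        self.new_pages_active = new_pages_active  # True: old policy, False: 5.9-style policy
        self.active = OrderedDict()               # coldest page first, hottest last
        self.inactive = OrderedDict()
        self.faults = 0                           # accesses to pages not in memory

    def touch(self, page):
        if page in self.active:
            self.active.move_to_end(page)
        elif page in self.inactive:
            # Referenced again while inactive: promote to the active list.
            del self.inactive[page]
            self.active[page] = True
        else:
            self.faults += 1
            target = self.active if self.new_pages_active else self.inactive
            target[page] = True
        self._balance()

    def _balance(self):
        # Assumption: the active list may hold at most half of memory.
        while len(self.active) > self.capacity // 2:
            page, _ = self.active.popitem(last=False)
            self.inactive[page] = True
        # Under pressure, evict the coldest inactive pages.
        while len(self.active) + len(self.inactive) > self.capacity:
            self.inactive.popitem(last=False)


def hot_faults(new_pages_active, rounds=3):
    """Count faults on a small hot set while one-shot pages stream through."""
    lru = TwoListLRU(capacity=8, new_pages_active=new_pages_active)
    hot = ["h0", "h1", "h2", "h3"]
    for _ in range(2):                 # warm up: reference the hot set twice
        for page in hot:
            lru.touch(page)
    warmup_faults = lru.faults
    stream_faults = 0
    next_id = 0
    for _ in range(rounds):
        for _ in range(6):             # six pages that are used only once
            lru.touch(f"s{next_id}")
            next_id += 1
            stream_faults += 1         # every streamed page is new, so it faults
        for page in hot:
            lru.touch(page)
    return lru.faults - warmup_faults - stream_faults
```

In this model, `hot_faults(new_pages_active=True)` shows the hot set being pushed out by the one-shot pages on every round, while `hot_faults(new_pages_active=False)` reports no faults on the hot set, because the streamed pages only ever compete with each other on the inactive list.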

1.2. New cgroup slab controller shares slab memory

The previous cgroup slab memory controller was based on the idea of replicating slab allocator internals for each memory cgroup, so cgroups did not share slab memory, which led to low slab utilization and higher slab memory usage. The slab controller used to be an opt-in feature, but today it is enabled by default in the memory controller, and modern systems with systemd create many cgroups, so these inefficiencies affect many people.

This release incorporates a new cgroup slab memory controller that allows slab memory to be shared between memory cgroups. For Facebook, it saved a significant amount of memory, measured from high hundreds of MBs to single GBs per host; on average, the size of slab memory was reduced by 35-45%. Desktop systems also benefit: on a 16GB Fedora system, the new slab controller saves ~45-50% of slab memory, measured just after loading the system.

1.3. Proactive memory compaction

Huge pages (i.e. pages bigger than 4KB on x86) are a processor feature that can improve performance due to reduced TLB usage. Making use of these pages requires large amounts of contiguous free memory, which can be difficult to obtain when memory is heavily fragmented. Linux supports memory compaction (i.e. defragmentation), but it is only triggered when a huge page needs to be allocated, which takes time and hence hurts allocation latency. This release adds support for proactive memory compaction, that is, triggering memory compaction automatically before any allocation needs it, so that future allocations can succeed faster.

Recommended LWN article: Proactive compaction for the kernel
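Proactive compaction is tuned through the `vm.compaction_proactiveness` sysctl, which takes a value from 0 (disabled) to 100 (most aggressive). The snippet below is a minimal sketch for inspecting it, assuming procfs is mounted at the usual `/proc`. The helper name `read_compaction_proactiveness` is made up for this example.

```python
from pathlib import Path

SYSCTL_PATH = "/proc/sys/vm/compaction_proactiveness"


def read_compaction_proactiveness(path=SYSCTL_PATH):
    """Return the sysctl value as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Missing file (kernel without proactive compaction), no permission,
        # or unexpected contents.
        return None


if __name__ == "__main__":
    value = read_compaction_proactiveness()
    if value is None:
        print("proactive compaction is not available on this kernel")
    else:
        print(f"vm.compaction_proactiveness = {value}")
```

Changing the value requires root privileges and is usually done with the `sysctl` tool rather than from a script.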

1.4. New close_range() system call for easier closing of file descriptors

This release incorporates a new system call, close_range(2). It efficiently closes a range of file descriptors, up to all file descriptors of the calling task. E.g., close_range(3, ~0U, 0); will close all descriptors past stderr. It turns out that quite a few projects need to do exactly that: service managers, libcs, container runtimes, programming language runtimes/standard libraries (Rust/Python). This system call has been coordinated with FreeBSD, so it is also available there.
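A minimal sketch of how a runtime might use the new call, here from Python via `ctypes`. Several things are assumptions of this example rather than part of the release: the helper name `close_fd_range`, the hard-coded syscall number 436 (valid on x86-64 and on architectures that use the generic syscall table, but not everywhere), and the choice to fall back to `os.closerange()` when the kernel or a sandbox rejects the call.

```python
import ctypes
import errno
import os
import sys

# Assumption: 436 is the close_range syscall number on x86-64 and on
# architectures using the generic syscall table; a few (e.g. alpha) differ.
SYS_CLOSE_RANGE = 436


def close_fd_range(first, last):
    """Close every file descriptor in [first, last], preferring close_range(2)."""
    if sys.platform.startswith("linux"):
        libc = ctypes.CDLL(None, use_errno=True)
        libc.syscall.restype = ctypes.c_long
        ret = libc.syscall(ctypes.c_long(SYS_CLOSE_RANGE), ctypes.c_ulong(first),
                           ctypes.c_ulong(last), ctypes.c_ulong(0))  # flags = 0
        if ret == 0:
            return "close_range"
        err = ctypes.get_errno()
        if err not in (errno.ENOSYS, errno.EPERM):
            raise OSError(err, os.strerror(err))
    # Fallback for kernels older than 5.9 (ENOSYS) or sandboxes that block the
    # call (EPERM). os.closerange() takes an exclusive upper bound.
    high = min(last, os.sysconf("SC_OPEN_MAX") - 1) + 1
    os.closerange(first, high)
    return "fallback"
```

Calling `close_fd_range(3, 0xFFFFFFFF)` mirrors the `close_range(3, ~0U, 0)` example above: everything past stderr is closed in one step when the system call is available.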

1.5. Support for running BPF programs on socket lookups

As with every new version, there are many improvements to BPF. An interesting new feature is a new BPF program type named BPF_PROG_TYPE_SK_LOOKUP, which runs when the transport layer is looking up a listening socket for a new connection request (TCP), or an unconnected socket for a packet (UDP). This serves as a mechanism to overcome the limits of the bind() API. Two use cases drive this work: 1) steering packets destined to an IP range, on a fixed port, to a single socket; 2) steering packets destined to an IP address, on any port, to a single socket.

1.6. CPU Capacity awareness for the deadline scheduling class

Since Linux 3.14, the Linux task scheduler has supported a deadline scheduling class, designed around real-time concepts for applications with strict timing requirements. This scheduling class was not aware of heterogeneous platforms where CPUs do not all have the same performance (e.g. ARM big.LITTLE), which led to wrong scheduling decisions. This release makes the deadline class aware of the capacity of each CPU.

Recommended LWN article: Capacity awareness for the deadline scheduler
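The underlying idea can be sketched as a simple fit test: a task that needs a given runtime before each deadline only fits on a CPU whose relative capacity is large enough to deliver that runtime in time. The function below is a simplified model under stated assumptions, not the scheduler's code: the name `dl_task_fits` and its parameters are chosen for this example, and capacities are expressed on the 0..1024 scale, where 1024 stands for the most capable CPU in the system.

```python
SCHED_CAPACITY_SHIFT = 10  # capacity 1024 (1 << 10) == the most capable CPU


def dl_task_fits(runtime_ns, deadline_ns, cpu_capacity):
    """True if a task needing runtime_ns before each deadline_ns fits on this CPU."""
    # Scale the deadline by the CPU's relative capacity: a CPU with half the
    # capacity gets only half as much work done before the deadline.
    scaled_deadline = (deadline_ns * cpu_capacity) >> SCHED_CAPACITY_SHIFT
    return scaled_deadline >= runtime_ns


if __name__ == "__main__":
    # A task that needs 4 ms of CPU time within every 10 ms.
    for capacity in (1024, 512, 400):
        print(capacity, dl_task_fits(4_000_000, 10_000_000, capacity))
```

A capacity-unaware scheduler treats all three CPUs in the demo as equivalent, which is how a task could end up on a small core that is unable to finish the work before the deadline.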

1.7. Faster context switch with support for FSGSBASE x86 instructions

The FSGSBASE instructions are an Intel feature that has been available for a long time. They allow direct access to the FS and GS segment base registers. In addition to the benefits to applications, these instructions make performance improvements to the OS context switch code possible.

Recommended LWN article: A possible end to the FSGSBASE saga

1.8. NFS support for extended attributes

This release incorporates support for extended attributes (RFC 8276), which bridges one of the most relevant gaps in NFS.
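From user space, this means the ordinary extended attribute calls can work on files that live on an NFS mount, provided both client and server support RFC 8276 (which covers user attributes, i.e. names in the `user.` namespace). A minimal sketch, in which the helper name `tag_file` and the attribute name `user.comment` are illustrative choices for this example:

```python
import os


def tag_file(path, name, value):
    """Set a user extended attribute and read it back; None if unsupported."""
    if not hasattr(os, "setxattr"):      # os.setxattr is only available on Linux
        return None
    try:
        os.setxattr(path, name, value)
        return os.getxattr(path, name)
    except OSError:
        # For example ENOTSUP: the file system, the NFS client or the NFS
        # server does not support extended attributes.
        return None
```

For a file on an NFS mount (the path here is hypothetical), `tag_file("/mnt/nfs/report.txt", "user.comment", b"reviewed")` returns the stored bytes when the attribute round-trips, and `None` when extended attributes are not supported.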

1.9. Support for ZSTD compressed kernel, ramdisk and initramfs

This release adds support for a ZSTD-compressed kernel, ramdisk, and initramfs in the kernel boot process (ZSTD-compressed ramdisk and initramfs are supported on all architectures; the ZSTD-compressed kernel is only hooked up to x86 for now). ZSTD offers good compression rates and very high decompression speeds. When Facebook switched from an xz-compressed initramfs to a zstd-compressed initramfs, decompression time shrank from 12 seconds to 3 seconds. When they switched from an xz-compressed kernel to a zstd-compressed kernel, they saved 2 seconds of boot time.

2. Core (various)

3. File systems

4. Memory management

5. Block layer

6. Tracing, perf and BPF

7. Virtualization

8. Security

9. Networking

10. Architectures

10.1. ARM

10.2. PowerPC

10.3. x86

10.4. RISC-V

10.5. MIPS

10.6. C-SKY

10.7. Xtensa

10.8. S390

10.9. SH

10.10. SPARC

10.11. UNICORE32

10.12. OpenRISC

11. Drivers

11.1. Graphics

11.2. Power Management

11.3. Storage

11.4. Drivers in the Staging area

11.5. Networking

11.6. Audio

11.7. Tablets, touch screens, keyboards, mouses

11.8. TV tuners, webcams, video capturers

11.9. Universal Serial Bus / Thunderbolt

11.10. Serial Peripheral Interface (SPI)

11.11. Watchdog

11.12. Serial

11.13. CPU Frequency scaling

11.14. Device Voltage and Frequency Scaling

11.15. Voltage, current regulators, power capping, power supply

11.16. Real Time Clock (RTC)

11.17. Pin Controllers (pinctrl)

11.18. MultiMediaCard (MMC)

11.19. Memory Technology Devices (MTD)

11.20. Industrial I/O (iio)

11.21. Multi Function Devices (MFD)

11.22. Pulse-Width Modulation (PWM)

11.23. Inter-Integrated Circuit (I2C + I3C)

11.24. Hardware monitoring (hwmon)

11.25. General Purpose I/O (gpio)

11.26. LEDs

11.27. DMA engines

11.28. Hardware Random Number Generator (hwrng)

11.29. Cryptography hardware acceleration

11.30. PCI

11.31. Clock

11.32. PHY ("physical layer" framework)

11.33. Memory Controller Drivers

11.34. Firmware Drivers

11.35. Remote Processors

11.36. Various

12. List of Pull Requests

13. Other news sites

KernelNewbies: Linux_5.9 (last edited 2020-12-30 18:03:12 by diegocalleja)