Linux 4.20 was released on Sun, 23 Dec 2018.
Summary: This release includes support for a new way to measure the system load; it adds support for future AMD Radeon Picasso and Raven2 APUs and enables non-experimental support for Radeon Vega20; it adds support for the C-SKY CPU architecture and the x86 Hygon Dhyana CPUs; a TLB microoptimization brings a small performance win in some workloads; TCP has switched to an "Early Departure Time" model; a mechanism to turn memfd regions into dma-bufs allows QEMU to improve virtualized graphics performance; it includes the latest round of fixes for CPU security bugs; and it adds many new drivers and other improvements.
Contents
- Prominent features
  - Pressure stall information for better overview of system load
  - AMD Radeon Picasso + Raven2 support, and Vega20 enablement
  - More efficient virtualized graphics
  - New Chinese CPUs: C-SKY architecture and Hygon Dhyana x86 CPUs
  - TCP: switch to Early Departure Time model
  - Make lazy TLB even lazier
  - Another round of fixes for CPU security bugs
- Core (various)
- File systems
- Memory management
1. Prominent features
1.1. Pressure stall information for better overview of system load
When systems are overcommitted and resources become contended, it is hard to tell exactly what impact this has on workload productivity, or how close the system is to lockups and OOM kills. In particular, when machines run multiple jobs concurrently, the impact of overcommit on the latency and throughput of an individual job can be enormous. To maximize hardware utilization without sacrificing individual job health or risking complete machine lockups, this release adds pressure stall information (PSI), a way to quantify resource pressure in the system.
Under /proc/pressure/, three files (cpu, memory and io) expose the percentage of time the system is stalled on CPU, memory, or IO, respectively; cgroup2 groups expose equivalent cpu.pressure, memory.pressure and io.pressure files. These percentages of wall time can be thought of as pressure percentages: they give a general sense of system health and of the productivity lost to resource overcommit, and they can also indicate when the system is approaching lockup scenarios and OOMs.
Documentation: Documentation/accounting/psi.txt
Recommended LWN article: Tracking pressure-stall information
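Each pressure file contains a "some" line (share of time at least one task was stalled) and, for memory and io, a "full" line (share of time all non-idle tasks were stalled at once). A minimal Python sketch for reading them, assuming the "some avg10=... avg60=... avg300=... total=..." line format from psi.txt; the function names are illustrative only:

```python
from typing import Dict


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse the contents of a /proc/pressure/{cpu,memory,io} file.

    Each line has the form
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    avg10/avg60/avg300 are stall percentages over 10 s, 60 s and 300 s;
    total is the cumulative stall time in microseconds.
    """
    result: Dict[str, Dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is "some" or "full"
        values: Dict[str, float] = {}
        for field in fields:
            key, _, raw = field.partition("=")
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def read_psi(resource: str) -> Dict[str, Dict[str, float]]:
    """Read system-wide pressure for "cpu", "memory" or "io" (needs CONFIG_PSI)."""
    with open(f"/proc/pressure/{resource}") as f:
        return parse_psi(f.read())
```

A monitoring agent could, for example, poll read_psi("memory")["full"]["avg10"] and shed load once it crosses a threshold of its own choosing.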
1.2. AMD Radeon Picasso + Raven2 support, and Vega20 enablement
This release adds support for the Raven2 and Picasso APUs. Also, support for Vega20 is no longer considered experimental.
1.3. More efficient virtualized graphics
This release includes udmabuf, a device that can turn memfd regions into dma-bufs. It allows QEMU to create dma-bufs for the VGA framebuffer or virtio-gpu resources, which can then be passed around to display guest content on the host: to a SPICE client for classic full-framebuffer display, and hopefully some day to a Wayland server for seamless guest window display.
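From userspace, the flow is: create a memfd, seal it against shrinking, and hand a page-aligned range of it to /dev/udmabuf with the UDMABUF_CREATE ioctl, which returns a dma-buf fd. A minimal Python sketch, assuming the struct layout and request number from include/uapi/linux/udmabuf.h as I understand them and the generic _IOC bit layout used on x86 and arm64; the helper names and the "guest-fb" memfd name are hypothetical:

```python
import fcntl
import os
import struct

# struct udmabuf_create { __u32 memfd; __u32 flags; __u64 offset; __u64 size; }
UDMABUF_CREATE_FMT = "=IIQQ"
UDMABUF_FLAGS_CLOEXEC = 0x01


def _iow(type_char: str, nr: int, size: int) -> int:
    """_IOW() as encoded on x86 and arm64 (powerpc, mips, sparc differ)."""
    ioc_write = 1
    return (ioc_write << 30) | (size << 16) | (ord(type_char) << 8) | nr


UDMABUF_CREATE = _iow("u", 0x42, struct.calcsize(UDMABUF_CREATE_FMT))


def make_sealed_memfd(size: int) -> int:
    """Create a memfd of `size` bytes that can no longer shrink."""
    fd = os.memfd_create("guest-fb", os.MFD_ALLOW_SEALING)
    os.ftruncate(fd, size)
    # udmabuf rejects memfds that could shrink under the dma-buf's feet
    fcntl.fcntl(fd, fcntl.F_ADD_SEALS, fcntl.F_SEAL_SHRINK)
    return fd


def udmabuf_from_memfd(memfd: int, offset: int, size: int,
                       device: str = "/dev/udmabuf") -> int:
    """Wrap a page-aligned memfd range in a dma-buf; returns the dma-buf fd."""
    arg = bytearray(struct.pack(UDMABUF_CREATE_FMT, memfd,
                                UDMABUF_FLAGS_CLOEXEC, offset, size))
    dev_fd = os.open(device, os.O_RDWR)
    try:
        # With a mutable buffer, fcntl.ioctl returns the syscall's return value
        return fcntl.ioctl(dev_fd, UDMABUF_CREATE, arg, True)
    finally:
        os.close(dev_fd)
```

Usage would be along the lines of udmabuf_from_memfd(make_sealed_memfd(n), 0, n), with n a multiple of the page size.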
1.4. New Chinese CPUs: C-SKY architecture and Hygon Dhyana x86 CPUs
This release adds support for a new architecture, C-SKY, a 32-bit Chinese CPU ISA.
There is also support for the Hygon Dhyana (Family 18h), a new x86 processor coming from the AMD–Chinese joint venture.
1.5. TCP: switch to Early Departure Time model
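The Internet grew following an AFAP ("As Fast As Possible") model. Due to the constraints of modern hardware, this model is no longer optimal. This release switches the TCP stack to an Early Departure Time (EDT) model, in which each packet carries the earliest time at which it may be sent, and a timestamp-aware qdisc such as fq holds it until then.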
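The core of EDT is simple arithmetic: each packet's departure time is the later of "now" and the moment the previous packets have drained at the pacing rate. A toy Python model of that stamping step, assuming a fixed pacing rate in bytes per second; EdtPacer and its fields are hypothetical names, and this is a simplified model rather than the kernel's code:

```python
from dataclasses import dataclass

NSEC_PER_SEC = 1_000_000_000


@dataclass
class EdtPacer:
    """Toy Early Departure Time pacer: a simplified model, not kernel code."""

    rate_bytes_per_sec: int
    next_departure_ns: int = 0

    def stamp(self, now_ns: int, length: int) -> int:
        """Return the earliest departure time for a packet of `length` bytes."""
        # A packet may leave no earlier than now, and no earlier than the
        # moment the previous packets have drained at the pacing rate.
        departure = max(now_ns, self.next_departure_ns)
        self.next_departure_ns = (
            departure + length * NSEC_PER_SEC // self.rate_bytes_per_sec
        )
        return departure
```

At 1 MB/s, three back-to-back 1000-byte packets are stamped 0, 1 ms and 2 ms; the sender only computes timestamps, and the qdisc does the waiting.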
1.6. Make lazy TLB even lazier
Because TLB flushes are extremely expensive, Linux employs a technique called lazy TLB: when switching to a kernel thread, which never accesses userspace page tables, the kernel keeps the previous process's page tables loaded (the kernel portion of the address space is visible in all of them) and so avoids an unnecessary switch and TLB flush. This release makes lazy TLB even lazier: on most workloads, the number of context switches far exceeds the number of TLB flushes sent, so optimizing the context switches by always using lazy TLB mode speeds those workloads up. This reduces total CPU use in the system by about 1-2% for a memcache workload on two-socket systems, and by about 1% for a heavily multi-process netperf between two systems.
1.7. Another round of fixes for CPU security bugs
- x86: Remedy the overhead of STIBP/IBPB with per task indirect branch speculation control
- x86: Harden spectre v2 userspace-userspace protection
- x86: The "minimal retpoline" support is no longer useful and has been removed
- arm64: Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to mitigate Spectre-v4 dynamically without trapping to EL3 firmware
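The per-task indirect branch control is exposed through the existing speculation-control prctl() interface, with a new PR_SPEC_INDIRECT_BRANCH selector. A minimal Python sketch via ctypes; the constant values are copied from include/uapi/linux/prctl.h as I recall them and should be checked against your headers, the helper names are hypothetical, and whether a request is honoured depends on the spectre_v2_user= boot mode:

```python
import ctypes
import ctypes.util
import os
from typing import List

# Values from include/uapi/linux/prctl.h (verify against your headers)
PR_GET_SPECULATION_CTRL = 52
PR_SET_SPECULATION_CTRL = 53
PR_SPEC_STORE_BYPASS = 0
PR_SPEC_INDIRECT_BRANCH = 1      # selector added in 4.20

PR_SPEC_NOT_AFFECTED = 0
PR_SPEC_PRCTL = 1 << 0           # mitigation is controllable per task
PR_SPEC_ENABLE = 1 << 1          # speculation enabled, mitigation off
PR_SPEC_DISABLE = 1 << 2         # speculation disabled, mitigation on
PR_SPEC_FORCE_DISABLE = 1 << 3   # disabled and cannot be re-enabled

_FLAG_NAMES = [
    (PR_SPEC_PRCTL, "prctl"),
    (PR_SPEC_ENABLE, "enable"),
    (PR_SPEC_DISABLE, "disable"),
    (PR_SPEC_FORCE_DISABLE, "force-disable"),
]


def describe_spec_status(status: int) -> List[str]:
    """Decode the bitmask returned by PR_GET_SPECULATION_CTRL."""
    if status == PR_SPEC_NOT_AFFECTED:
        return ["not-affected"]
    return [name for bit, name in _FLAG_NAMES if status & bit]


def _prctl(option: int, arg2: int, arg3: int = 0) -> int:
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    args = [ctypes.c_ulong(a) for a in (arg2, arg3, 0, 0)]
    ret = libc.prctl(option, *args)
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def indirect_branch_status() -> List[str]:
    return describe_spec_status(
        _prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH))


def disable_indirect_branch_speculation() -> None:
    # Only honoured if the kernel runs in a prctl (or seccomp) spectre_v2_user
    # mode; otherwise the call fails with an OSError.
    _prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE)
```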
2. Core (various)
fanotify super block marks: Implement the new mark type FAN_MARK_FILESYSTEM, which enables monitoring of filesystem events on all filesystem objects, regardless of the mount where the event was generated commit, commit, commit
New fanotify event info API: To identify which thread triggered an event in a multi-threaded program, add the FAN_REPORT_TID flag to fanotify_init(), which opts in to reporting the thread id of the event creator commit, commit, commit, commit, commit, commit, commit
(FEATURED) psi: pressure stall information for CPU, memory, and IO commit, commit, commit, commit, commit, commit, commit, commit, commit
init: add root=PARTLABEL=<name> support commit
- task scheduler
Android binder: Add BINDER_GET_NODE_INFO_FOR_REF ioctl. It allows the context manager to retrieve information about nodes that it holds a reference to, such as the current number of references to those nodes commit
gcc-plugins: Add the STACKLEAK plugin, which erases the kernel stack before returning from syscalls. That reduces the information which kernel stack leak bugs can reveal and blocks some uninitialized stack variable attacks commit, commit, commit
qspinlock_stat: Count instances of nested lock slowpaths commit
- Building
3. File systems
- XFS
Support returning partial reflink results. For example, if userspace sends a 1GB clone request and the file system runs out of space halfway through, it can at least tell userspace that it completed 512MB of that request, like a short regular write commit
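For userspace, partial results mean a large remap or copy request should be driven by the same short-count loop used for write(2). A minimal Python sketch of such a loop; it uses os.copy_file_range() as the default primitive purely for illustration (whether that call shares extents depends on the kernel and file system), and the copy parameter is a hypothetical hook so the loop can be exercised without a real file system:

```python
import os
from typing import Callable, Optional

# (src_fd, dst_fd, count, src_offset, dst_offset) -> bytes done
CopyFn = Callable[[int, int, int, int, int], int]


def copy_range_fully(src_fd: int, dst_fd: int, length: int,
                     src_off: int = 0, dst_off: int = 0,
                     copy: Optional[CopyFn] = None) -> int:
    """Copy `length` bytes, retrying after short results like a write() loop.

    Returns the number of bytes done. If an error occurs after some progress,
    the partial count is returned instead of raising, mirroring write(2).
    """
    if copy is None:
        copy = os.copy_file_range  # Python 3.8+, Linux only
    done = 0
    while done < length:
        try:
            n = copy(src_fd, dst_fd, length - done, src_off + done, dst_off + done)
        except OSError:
            if done == 0:
                raise  # nothing completed: report the error itself
            break      # e.g. ENOSPC halfway: report the partial progress
        if n == 0:
            break      # source exhausted
        done += n
    return done
```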
- BTRFS
qgroups: Reduce dirty extents for metadata. It significantly improves balancing with quota enabled commit, commit, commit, commit, commit, commit
Improve btrfs btree locking modes. It significantly improves performance: more files/sec in fsmark, better perf on multi-threaded workloads (filebench, dbench), fewer context switches, better latency, etc commit
- CIFS
Compounding support: aggregate several operations into a single request to avoid round trips commit, commit, commit, commit, commit, commit, commit, commit, commit, commit
Implement direct I/O: when the file system is mounted with cache=none, it passes I/O data directly from the user-space buffer to the transport layer commit, commit, commit
Add ioctl for QUERY_INFO passthrough to userspace. It allows userspace tools to query the raw info levels for cifs files and process the response in userspace commit
Add support for ioctl on directories commit
Show number of current open files in /proc/fs/cifs/Stats commit
- F2FS
- FUSE
- GFS2
Add support for the FS_IOC_GETFSLABEL ioctl commit
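FS_IOC_GETFSLABEL is the generic label ioctl also answered by other file systems such as Btrfs and XFS, so one userspace helper now covers GFS2 too. A minimal Python sketch, assuming the request encoding from include/uapi/linux/fs.h as I understand it (_IOR(0x94, 49, char[FSLABEL_MAX]) with FSLABEL_MAX = 256) and the generic _IOC bit layout used on x86 and arm64; get_fs_label is a hypothetical name:

```python
import fcntl
import os

FSLABEL_MAX = 256  # include/uapi/linux/fs.h


def _ior(type_: int, nr: int, size: int) -> int:
    """_IOR() as encoded on x86 and arm64 (powerpc, mips, sparc differ)."""
    ioc_read = 2
    return (ioc_read << 30) | (size << 16) | (type_ << 8) | nr


# FS_IOC_GETFSLABEL = _IOR(0x94, 49, char[FSLABEL_MAX])
FS_IOC_GETFSLABEL = _ior(0x94, 49, FSLABEL_MAX)


def get_fs_label(path: str) -> str:
    """Return the label of the file system containing `path`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = bytearray(FSLABEL_MAX)
        fcntl.ioctl(fd, FS_IOC_GETFSLABEL, buf, True)  # kernel fills buf
    finally:
        os.close(fd)
    return buf.split(b"\0", 1)[0].decode("utf-8", errors="replace")
```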
- NFSD
- 9P
Add mount option for lock retry interval commit
- AFS
- CEPH
- OCFS2
Support partial clone range and dedupe range commit
- OVERLAYFS
Automatically enable redirect_dir on metacopy=on commit
4. Memory management
XArray (recommended LWN article The XArray data structure) commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit
Introduce kmalloc-reclaimable caches for allocations that can be reclaimed under memory pressure (typically through a shrinker); these are shown as "kmalloc-rcl-SIZE" in /proc/slabinfo. This makes the slab pages accounted as NR_SLAB_RECLAIMABLE in vmstat, which is also reflected in the MemAvailable counter in /proc/meminfo and in overcommit decisions. A new KReclaimable counter is added to /proc/meminfo to show the sum of all reclaimable kernel allocations, including slab. The NR_INDIRECTLY_RECLAIMABLE_BYTES counter is repurposed commit, commit, commit, commit, commit, commit
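Tools that estimate reclaimable kernel memory can read the new counter directly. A minimal Python sketch, assuming the usual "Name:   value kB" layout of /proc/meminfo; the fallback to SReclaimable for older kernels is an illustrative choice and covers only reclaimable slab, and the function names are hypothetical:

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value} (values mostly in kB)."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def reclaimable_kernel_kb(info: Dict[str, int]) -> int:
    # KReclaimable appears from 4.20 on; older kernels only report
    # SReclaimable, which counts reclaimable slab but nothing else.
    return info.get("KReclaimable", info.get("SReclaimable", 0))
```

On a live system this would be called as reclaimable_kernel_kb(parse_meminfo(open("/proc/meminfo").read())).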
zero-seek shrinkers: implement a "zero-seek" setting for shrinkers that results in a target ratio of 0:1 between their objects and IO-backed caches. This allows such virtual caches to grow when memory is available (they do cache/avoid CPU work after all), but effectively disables them as soon as IO-backed objects are under pressure commit
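The effect of seeks = 0 shows up in how the scan target is sized during reclaim. A simplified Python model of that arithmetic, assuming the do_shrink_slab() logic as I understand it for this release (seeks-scaled target for normal shrinkers, half of the freeable objects for zero-seek ones) and ignoring the clamping the kernel applies afterwards; shrinker_scan_delta is a hypothetical name:

```python
DEF_PRIORITY = 12   # reclaim starts at this priority and counts down to 0
DEFAULT_SEEKS = 2   # default cost of recreating an object, in "seeks"


def shrinker_scan_delta(freeable: int, priority: int, seeks: int) -> int:
    """Simplified model of how many objects one reclaim pass asks to scan.

    Objects that are costly to recreate (more seeks) are scanned more
    gently; zero-seek shrinkers are asked to scan half their objects at
    any priority, so they shrink quickly once reclaim starts.
    """
    if seeks:
        return ((freeable >> priority) * 4) // seeks
    return freeable // 2
```

At the starting priority, a normal shrinker holding about a million objects is asked to scan 512 of them, while a zero-seek shrinker is asked to scan half.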
workingset: add vmstat counter for shadow nodes commit
kmemleak: add module param to print warnings to dmesg commit
contiguous page allocator: Improve large page preservation handling commit, [[https://git.kern