KernelNewbies:

Linux 4.8 was released on Sun, 2 Oct 2016.

Shameless spam: LWN.net has published its coverage of the 2016 Linux Storage, Filesystem, and Memory-Management Summit.

Summary: This release adds support for using Transparent Huge Pages in the page cache; support for eXpress Data Path, a high performance, programmable network data path; support for XFS reverse mappings, which are the building block of several upcoming features; stricter checking of memory copies with hardened usercopy; support for IPv6 security labeling (CALIPSO, RFC 5570); GCC plugin support; virtio-vsocks for easier guest/host communication; the New Vegas TCP congestion control algorithm; the move of the documentation to the reStructuredText format; and many other improvements and new drivers.

1. Prominent features

1.1. Support for using Transparent Huge Pages in the page cache

Huge pages allow the use of pages bigger than 4K (on x86). When the system makes use of those pages automatically, without user intervention, we call them "transparent". Until now, Linux didn't support the use of transparent huge pages in the page cache (the cache of pages used for backing file system data). This release adds support for transparent huge pages in the page cache in tmpfs/shmem (other filesystems may be added in the future).

You can control the huge page allocation policy in tmpfs with the mount option huge=. It accepts the following values: always (attempt to allocate huge pages every time a new page is needed); never (do not allocate huge pages; this is the default); within_size (only allocate a huge page if it will be fully within i_size, also respecting fadvise()/madvise() hints); advise (only allocate huge pages if requested with fadvise()/madvise()).
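As a rough illustration of the mount option, the sketch below builds (but does not run) a mount command for a tmpfs with a chosen policy. The helper name, the target path and the size= value are hypothetical; only huge= and its four values come from the text above.

```python
import shlex

# Policies listed above for the tmpfs "huge=" mount option.
HUGE_POLICIES = ("always", "never", "within_size", "advise")


def tmpfs_mount_command(target, huge="never", size=None):
    """Build the argv for mounting a tmpfs with a given huge page policy.

    `target` and `size` are illustrative; only `huge=` comes from the text.
    """
    if huge not in HUGE_POLICIES:
        raise ValueError(f"unknown huge= policy: {huge!r}")
    options = [f"huge={huge}"]
    if size is not None:
        options.append(f"size={size}")
    return ["mount", "-t", "tmpfs", "-o", ",".join(options), "tmpfs", target]


if __name__ == "__main__":
    # Print the command instead of running it: mounting needs root.
    argv = tmpfs_mount_command("/mnt/hugetmp", huge="within_size", size="1G")
    print(shlex.join(argv))
```

Printing the command rather than executing it keeps the sketch safe to run as an unprivileged user; pass the list to subprocess.run() as root to actually mount.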

There's also a sysfs knob to control the huge page allocation policy for the internal shmem mount: /sys/kernel/mm/transparent_hugepage/shmem_enabled. This mount is used for SysV SHM, memfds, shared anonymous mmaps (of /dev/zero or MAP_ANONYMOUS), GPU drivers' DRM objects, and Ashmem. In addition to the policies listed above, shmem_enabled allows two further values: deny (for use in emergencies, to force the huge option off from all mounts); force (force the huge option on for all mounts, useful for testing).
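The current setting of the knob can be read from user space. The sketch below assumes the file follows the same convention as the other transparent huge page sysfs files, where every accepted value is listed and the active one is wrapped in square brackets; the function names are hypothetical.

```python
from pathlib import Path

SHMEM_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/shmem_enabled")


def active_policy(text):
    """Return the bracketed word from a line such as 'always [never] force'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_shmem_policy(path=SHMEM_ENABLED):
    """Return the current policy, or None if the knob is missing or unreadable."""
    try:
        return active_policy(path.read_text())
    except OSError:
        return None
```

On kernels without this feature the file does not exist, so read_shmem_policy() returns None instead of raising.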

Recommended LWN article: Two transparent huge page cache implementations

Code: commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32

1.2. Support for eXpress Data Path

XDP, or eXpress Data Path, provides a high performance, programmable network data path in the Linux kernel. XDP provides bare metal packet processing at the lowest point in the software stack. Much of the huge speed gain comes from processing RX packet-pages directly out of the driver's RX ring queue, before any allocations of meta-data structures like SKBs occur.

Use cases include pre-stack processing like filtering to do DoS mitigation; forwarding and load balancing; batching techniques such as in Generic Receive Offload; flow sampling and monitoring; ULP processing (e.g. message delineation).
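To give a feel for the filtering use case, here is a user-space Python model of the kind of decision an early-drop filter makes on raw frame bytes. It is not an XDP program (those are eBPF programs attached to a driver); the verdict strings only borrow the names of the kernel's XDP_DROP and XDP_PASS actions, and the function name and the blocked-port policy are assumptions made for illustration.

```python
import struct

# Verdict names mirror the kernel's XDP actions; this is a plain Python
# model of the decision logic, not a loadable XDP/eBPF program.
XDP_DROP = "XDP_DROP"
XDP_PASS = "XDP_PASS"

ETH_HLEN = 14        # Ethernet header length in bytes
ETH_P_IP = 0x0800    # EtherType for IPv4
IPPROTO_UDP = 17


def verdict(frame, blocked_udp_ports):
    """Decide on a raw Ethernet frame by inspecting header bytes in place."""
    if len(frame) < ETH_HLEN + 20:
        return XDP_PASS  # too short to hold an IPv4 header; let the stack decide
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS
    ihl = (frame[ETH_HLEN] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or frame[ETH_HLEN + 9] != IPPROTO_UDP:
        return XDP_PASS
    udp_off = ETH_HLEN + ihl
    if len(frame) < udp_off + 8:
        return XDP_PASS
    (dst_port,) = struct.unpack_from("!H", frame, udp_off + 2)
    return XDP_DROP if dst_port in blocked_udp_ports else XDP_PASS
```

The point of the model is that the verdict is reached from the packet bytes alone, with no per-packet metadata structure built first.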

Recommended LWN article: Early packet drop — and more — with BPF

IO Visor page: https://www.iovisor.org/technology/xdp

Prototype docs: prototype-kernel.readthedocs.io

PDF: Express_Data_Path.pdf

Code: (merge), commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

1.3. XFS reverse mapping

Reverse mapping allows XFS to track the owner of a specific block on disk precisely. It is implemented as a set of btrees (one per allocation group) that track the owners of allocated extents. Effectively it is a "used space tree" that is updated whenever the file system allocates or frees extents, i.e. it is coherent with the free space btrees that are already maintained and never overlaps with them.
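The idea of a "used space tree" can be sketched with a toy in-memory index. This is only a conceptual model of a single allocation group: it uses a sorted list instead of an on-disk btree, and the class, its method names and the owner labels are all hypothetical.

```python
import bisect


class ReverseMap:
    """Toy 'used space' index: non-overlapping extents mapped to an owner."""

    def __init__(self):
        self._starts = []   # sorted extent start blocks
        self._extents = {}  # start block -> (length, owner)

    def allocate(self, start, length, owner):
        """Record that `owner` now holds blocks [start, start + length)."""
        if length <= 0:
            raise ValueError("length must be positive")
        i = bisect.bisect_left(self._starts, start)
        # Reject overlap with the previous or the next recorded extent.
        if i > 0:
            prev = self._starts[i - 1]
            if prev + self._extents[prev][0] > start:
                raise ValueError("extent overlaps an allocated extent")
        if i < len(self._starts) and self._starts[i] < start + length:
            raise ValueError("extent overlaps an allocated extent")
        self._starts.insert(i, start)
        self._extents[start] = (length, owner)

    def free(self, start):
        """Drop the extent that begins at `start`."""
        del self._extents[start]
        self._starts.remove(start)

    def owner_of(self, block):
        """Return the owner of `block`, or None if the block is free."""
        i = bisect.bisect_right(self._starts, block) - 1
        if i < 0:
            return None
        start = self._starts[i]
        length, owner = self._extents[start]
        return owner if block < start + length else None
```

A lookup such as owner_of(block) is the question a forward (file to block) mapping cannot answer cheaply, which is why features like scrubbing and bad sector reporting depend on it.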

This reverse mapping infrastructure is the building block of several upcoming features: reflink, copy-on-write data, dedupe, online metadata and data scrubbing, highly accurate bad sector/data loss reporting to users, and significantly improved reconstruction of damaged and corrupted filesystems. There's a lot of new stuff coming along in the next couple of cycles, and it all builds on the rmap infrastructure. As such, it's a huge chunk of new code with new on-disk format features and internal infrastructure. At mount time it warns that it is an experimental feature and that it may eat data (as XFS does with all new on-disk features until they stabilise). The XFS maintainers have not released userspace support for it yet; for now, userspace support requires downloading Darrick's xfsprogs repo and building it from source, so access to this feature is really developer/tester only at this point. Initial userspace support will be released at the same time as the kernel containing this code.

Code: (merge)

1.4. Stricter checking of memory copies with hardened usercopy

This is a security feature ported from Grsecurity's PAX_USERCOPY. It checks for obviously wrong memory regions when copying memory to/from the kernel (via the copy_to_user() and copy_from_user() functions) by rejecting memory ranges that are larger than the specified heap object, span multiple separately allocated pages, are not on the process stack, or are part of the kernel text. This kills entire classes of heap overflow exploits and similar kernel memory exposures. The performance impact is negligible.
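The first of those rules, rejecting a copy that is larger than the heap object it touches, boils down to a range check. The sketch below models just that one rule with plain integers standing in for addresses; the function name and parameters are hypothetical and the real kernel check covers more cases.

```python
def usercopy_allowed(obj_start, obj_size, ptr, n):
    """Return True if [ptr, ptr + n) lies entirely inside one heap object.

    Models only the 'larger than the specified heap object' rule from the
    text; addresses are plain integers, not real kernel pointers.
    """
    if n < 0:
        return False
    obj_end = obj_start + obj_size
    return obj_start <= ptr and ptr + n <= obj_end
```

For a 64-byte object at 0x1000, a 64-byte copy from 0x1000 passes, while the same length starting at 0x1010 runs past the end of the object and is refused.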

Recommended LWN article: Hardened usercopy

Code: commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

1.5. GCC plugin support

Like the previous one, this is a feature ported from Grsecurity. It enables the use of GCC plugins, which are loadable compiler modules that can be used for runtime instrumentation and static analysis, making it possible to analyse, change and add further code during compilation. Grsecurity uses these mechanisms to improve security. Two plugins are included in this release: sancov, a plugin used as a helper for the kcov feature; and the Cyclomatic complexity plugin, which calculates the cyclomatic complexity of a function.
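For readers unfamiliar with the metric, cyclomatic complexity is commonly defined by McCabe's formula M = E - N + 2P over a function's control-flow graph (edges, nodes, connected components). The sketch below applies that standard formula to a hand-written graph; it says nothing about how the plugin itself is implemented, and the graph and names are made up for illustration.

```python
def cyclomatic_complexity(edges, components=1):
    """McCabe's M = E - N + 2P for a control-flow graph given as edge pairs."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


# Control-flow graph of a function containing a single if/else.
IF_ELSE = [
    ("entry", "then"),
    ("entry", "else"),
    ("then", "exit"),
    ("else", "exit"),
]
```

The if/else graph has four edges and four nodes, so its complexity is 4 - 4 + 2 = 2: one more than straight-line code, reflecting the single decision point.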

Recommended LWN article: Kernel building with GCC plugins

Code: commit 1, 2, 3, 4

1.6. virtio-vsocks for easier guest/host communication

This release adds virtio-vsock, which provides AF_VSOCK sockets that allow applications in the guest and host to communicate. This can be used to implement hypervisor services and guest agents (like qemu-guest-agent or SPICE vdagent). Unlike virtio-serial, virtio-vsock supports the POSIX Sockets API so existing networking applications require minimal modification. The Sockets API allows N:1 connections so multiple clients can connect to a server simultaneously. The device has an address assigned automatically so no configuration is required inside the guest.
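Because the device is exposed through the ordinary Sockets API, a guest-side client looks much like a TCP client. The sketch below assumes Linux and Python 3.7 or later (which provides socket.AF_VSOCK); the function names and the idea of a host service on a given port are hypothetical.

```python
import socket

# Well-known context ID of the host; fall back to its numeric value if this
# Python build lacks the constant.
VMADDR_CID_HOST = getattr(socket, "VMADDR_CID_HOST", 2)


def host_address(port):
    """Return the (cid, port) address of a service running on the host."""
    return (VMADDR_CID_HOST, port)


def send_to_host(port, payload):
    """Connect from the guest to a host service and send `payload`.

    Needs Linux, Python 3.7+ and a vsock transport such as virtio-vsock.
    """
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as sock:
        sock.connect(host_address(port))
        sock.sendall(payload)
```

Addresses are (context ID, port) pairs rather than (IP, port), which is why no network configuration is needed inside the guest.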

Code: commit, commit, commit, commit

1.7. Support IPv6 security labeling (CALIPSO, RFC 5570)

This release implements RFC 5570 - Common Architecture Label IPv6 Security Option (CALIPSO). Its goal is to set Multi-Level Secure (MLS) sensitivity labels on IPv6 packets using a hop-by-hop option. It is intended for use only within MLS networking environments that are both trusted and trustworthy. CALIPSO is very similar to its IPv4 cousin CIPSO and much of this feature is based on that code. To use CALIPSO you'll need some patches to netlabel-tools that are available on the 'working-calipso-v3' branch at: https://github.com/netlabel/netlabel_tools.

Code: commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19

1.8. Add New Vegas TCP congestion control

This release adds a new congestion control algorithm: TCP New Vegas (TCP-NV), a major update to TCP-Vegas. Like Vegas, NV is a delay based congestion avoidance mechanism for TCP. Its filtering mechanism is similar: it uses the best measurement in a particular period to detect and measure congestion. It was developed to cope with modern networks where link bandwidths are 10 Gbps or higher, where RTTs can be tens of microseconds, where interrupt coalescence and TSO/GSO can introduce noise and nonlinear effects, etc.

A description of TCP-NV, including implementation details as well as experimental results, can be found at http://www.brakmo.org/networking/tcp-nv/
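An application can request a congestion control algorithm per socket with the TCP_CONGESTION socket option. The sketch below assumes the new algorithm is registered under the name "nv" and that it is available on the running kernel; the helper name is hypothetical, and the call simply reports failure when the kernel refuses.

```python
import socket

# Option number 13 is Linux's TCP_CONGESTION; used only if the constant is
# missing from this Python build.
TCP_CONGESTION = getattr(socket, "TCP_CONGESTION", 13)


def use_congestion_control(sock, name="nv"):
    """Try to select a congestion control algorithm for one TCP socket.

    Returns True on success and False if the kernel refuses, for example
    because the algorithm is not built, not loaded, or not permitted.
    """
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, name.encode())
        return True
    except OSError:
        return False
```

Setting the option per socket leaves the system-wide default untouched, which is convenient when experimenting with a new algorithm on a single connection.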

Code: commit

1.9. Documentation moved to the reStructuredText format

In an attempt to modernize it, the kernel documentation will be converted to the Sphinx system, which uses reStructuredText as its markup language.

Documentation: Documentation/kernel-documentation.rst

Recommended LWN articles: Kernel documentation with Sphinx, part 1: how we got here, Kernel documentation with Sphinx, part 2: how it works

Code: (merge)

2. Core (various)

3. File systems

4. Memory management

5. Block layer

6. Cryptography

7. Tracing and perf tool

8. Virtualization

9. Security

10. Networking

11. Architectures

12. Drivers

12.1. Graphics

12.2. Storage

12.3. Staging

12.4. Networking

12.5. Audio

12.6. Tablets, touch screens, keyboards, mouses

12.7. TV tuners, webcams, video capturers

12.8. USB

12.9. Serial Peripheral Interface (SPI)

12.10. Watchdog

12.11. Serial

12.12. ACPI, EFI, cpufreq, thermal, Power Management

12.13. Real Time Clock (RTC)

12.14. Voltage, current regulators, power capping, power supply

12.15. Rapid I/O

12.16. Pin Controllers (pinctrl)

12.17. Memory Technology Devices (MTD)

12.18. Multi Media Card

12.19. Industrial I/O (iio)

12.20. Multi Function Devices (MFD)

12.21. Pulse-Width Modulation (PWM)

12.22. Inter-Integrated Circuit (I2C)

12.23. Hardware monitoring (hwmon)

12.24. General Purpose I/O (gpio)

12.25. Clocks

12.26. Hardware Random Number Generator

12.27. Various

13. List of merges

14. Other news sites

KernelNewbies: Linux_4.8 (last edited 2017-12-30 01:30:25 by localhost)