Size: 30161
Comment:
|
← Revision 38 as of 2017-12-30 01:30:23 ⇥
Size: 55423
Comment: converted to 1.6 markup
|
Line 5: | Line 5: |
Summary: Linux 4.4 has not been released |
Linux 4.4 [[https://lkml.org/lkml/2016/1/10/305|has been released]] on Sun, 10 Jan 2016. Summary: This release adds 3D support to the virtual GPU driver, which allows 3D hardware-accelerated graphics in virtualization guests; loop device support for Direct I/O and Asynchronous I/O, which saves memory and increases performance; support for Open-channel SSDs, which are devices that share the responsibility of the Flash Translation Layer with the operating system; TCP listener handling that is completely lockless, allowing faster and more scalable TCP servers; journalled RAID5 in the MD layer, which fixes the RAID write hole; eBPF programs that can now be run by unprivileged users and made persistent, along with perf support for eBPF programs; a new mlock2() syscall that allows users to request memory to be locked on page fault; and block polling support for improved performance in high-end storage devices. There are also new drivers and many other small improvements. <<TableOfContents>> |
Line 8: | Line 11: |
== TCP listener handling completely lockless, which make possible faster and more scalable TCP servers == In this release, and as a result from an effort that started two years ago, the TCP implementation has been refactored to make the TCP listener fast path completely lockless. During tests, a server was able to process 3,500,000 SYN packets per second on one listener and still have available cpu cycles - about 2 to 3 order of magnitude what it was possible before. SO_REUSEPORT has also been extended (see Networking section) to add proper cpu/numa affinities, so that heavy duty TCP servers can get proper siloing thanks to multi-queues NICs. Code: [https://git.kernel.org/torvalds/c/4d54d86546f62c7c4a0fe3b36a64c5e3b98ce1a9 commit], [https://git.kernel.org/torvalds/c/e6934f3ec00b04234acb24a1a2c28af59763d3b5 commit], [https://git.kernel.org/torvalds/c/c3fc7ac9a0b978ee8538058743d21feef25f7b33 commit] |
|
Line 19: | Line 15: |
Code: [https://git.kernel.org/torvalds/c/ab1cb278bc7027663adbfb0b81404f8398437e11 commit], [https://git.kernel.org/torvalds/c/2e5ab5f379f96a6207c45be40c357ebb1beb8ef3 commit], [https://git.kernel.org/torvalds/c/5b5e20f421c0b6d437b3dec13e53674161998d56 commit], [https://git.kernel.org/torvalds/c/bc07c10a3603a5ab3ef01ba42b3d41f9ac63d1b6 commit], [https://git.kernel.org/torvalds/c/e03a3d7a94e2485b6e2fa3fb630b9b3a30b65718 commit] | Code: [[https://git.kernel.org/torvalds/c/ab1cb278bc7027663adbfb0b81404f8398437e11|commit]], [[https://git.kernel.org/torvalds/c/2e5ab5f379f96a6207c45be40c357ebb1beb8ef3|commit]], [[https://git.kernel.org/torvalds/c/5b5e20f421c0b6d437b3dec13e53674161998d56|commit]], [[https://git.kernel.org/torvalds/c/bc07c10a3603a5ab3ef01ba42b3d41f9ac63d1b6|commit]], [[https://git.kernel.org/torvalds/c/e03a3d7a94e2485b6e2fa3fb630b9b3a30b65718|commit]] |
Line 23: | Line 20: |
virtio-gpu is a driver for virtualization guests that allows to use the host graphics card efficiently. In this release, it allows the virtualization guest to use the capabilities of the host GPU to accelerate 3D rendering. In practice, this means that a virtualized linux guest can run a opengl game while using the GPU acceleration capabilities of the host, as show in [https://www.youtube.com/watch?v=ONFGnUaln-4 this] or [https://www.youtube.com/watch?v=ZuuF092RDDc this] video. This also requires running [http://wiki.qemu.org/ChangeLog/2.5#virtio QEMU 2.5]. [Outdated project page https://virgil3d.github.io/] [https://www.youtube.com/watch?v=rPeMrmeLTig 44m linux.conf talk about the project] Code: [https://git.kernel.org/torvalds/c/3187567222178d4b3742e88242f7abb3c3b7a215 commit] == Journalled RAID5 MD support == This release adds journalled raid5 support to the MD (RAID/LVM) layer. With a journal device configured (typically NVRAM or SSD), Data/parity writing to raid array first writes to the log, then write to raid array disks. If crash happens, we can recovery data from the log. This can speed up raid resync and fixes RAID5 write hole issue - a crash during degraded operations cannot result in data corruption. In future releasees the journal will also be used to improve performance and latency Code: [https://git.kernel.org/torvalds/c/ac322de6bf5416cb145b58599297b8be73cd86ac merge] == Unprivileged eBPF == eBPF programs got its own syscall in [http://kernelnewbies.org/Linux_3.18#head-ead251efb6bbdbe2922e7c6bd0c7b46342e03dad Linux 3.18], but until now its use had been restricted to root, because these programs were dangerous for security. eBPF programs are, however, validated by the kernel, and in this release the eBPF verifier has been improved and unprivileged users can use it (although unprivileged eBPF is only meaningful for 'socket filter'-like programs, eBPF programs for tracing and TC classifiers/actions will stay root only). 
This feature can be switched off with the sysctl ''kernel.unprivileged_bpf_disabled'' (once true, bpf programs and maps cannot be accessed from unprivileged process, and the toggle cannot be set back to false) Recommended LWN article: [http://lwn.net/Articles/660331/ Unprivileged bpf()] Code: [https://git.kernel.org/torvalds/c/1be7f75d1668d6296b80bf35dcf6762393530afc commit], [https://git.kernel.org/torvalds/c/aaac3ba95e4c8b496d22f68bd1bc01cfbf525eca commit] |
virtio-gpu is a driver for virtualization guests that lets them use the host graphics card efficiently. In this release, it allows the virtualization guest to use the capabilities of the host GPU to accelerate 3D rendering. In practice, this means that a virtualized Linux guest can run an OpenGL game while using the GPU acceleration capabilities of the host, as shown in [[https://www.youtube.com/watch?v=ONFGnUaln-4|this]] or [[https://www.youtube.com/watch?v=ZuuF092RDDc|this]] video. This also requires running [[http://wiki.qemu.org/ChangeLog/2.5#virtio|QEMU 2.5]].

[[https://virgil3d.github.io/|project page]]

[[https://www.youtube.com/watch?v=rPeMrmeLTig|44m linux.conf talk about the project]]

Code: [[https://git.kernel.org/torvalds/c/3187567222178d4b3742e88242f7abb3c3b7a215|commit]]

== LightNVM adds support for Open-Channel SSDs ==

Open-channel SSDs are devices that share responsibilities with the operating system in order to implement and maintain features that typical SSDs keep strictly in firmware. These include the Flash Translation Layer (FTL), bad block management, and hardware units such as the flash controller, the interface controller, and large amounts of flash chips. In this way, Open-channel SSDs expose direct access to their physical flash storage, while keeping a subset of the internal features of SSDs.

LightNVM is a specification that gives support to Open-channel SSDs. LightNVM allows the host to manage data placement, garbage collection, and parallelism. Device specific responsibilities such as bad block management, FTL extensions to support atomic IOs, or metadata persistence are still handled by the device. This Linux release adds support for LightNVM (and adds LightNVM support to NVMe as well).

Recommended LWN article: [[https://lwn.net/Articles/641247/|Taking control of SSDs with LightNVM]]

Code: [[https://git.kernel.org/torvalds/c/48add0f5a6f46919dd307575aad6ea3de7c9cb2a|commit]], [[https://git.kernel.org/torvalds/c/cd9e9808d18fe7107c306f6e71c8be7230ee42b4|commit]], [[https://git.kernel.org/torvalds/c/ca0640850e43f5f80c6029e2895b119b705f23bd|commit]], [[https://git.kernel.org/torvalds/c/b2b7e00148a203e9934bbd17aebffae3f447ade7|commit]], [[https://git.kernel.org/torvalds/c/ae1519ec448bc31a7fe7369b66e7c78872f91e84|commit]]

== TCP listener handling completely lockless, making TCP servers faster and more scalable ==

In this release, and as a result of an effort that started two years ago, the TCP implementation has been refactored to make the TCP listener fast path completely lockless. During tests, a server was able to process 3,500,000 SYN packets per second on one listener and still have available CPU cycles - about 2 to 3 orders of magnitude more than what was possible before. SO_REUSEPORT has also been extended (see Networking section) to add proper CPU/NUMA affinities, so that heavy duty TCP servers can get proper siloing thanks to multi-queue NICs.

Code: [[https://git.kernel.org/torvalds/c/4d54d86546f62c7c4a0fe3b36a64c5e3b98ce1a9|commit]], [[https://git.kernel.org/torvalds/c/e6934f3ec00b04234acb24a1a2c28af59763d3b5|commit]], [[https://git.kernel.org/torvalds/c/c3fc7ac9a0b978ee8538058743d21feef25f7b33|commit]]

== Preliminary journalled RAID5 MD support ==

This release adds journalled RAID5 support to the MD (RAID/LVM) layer. With a journal device configured (typically NVRAM or SSD), data/parity writes to the RAID array go to the log first, and then to the RAID array disks. If a crash happens, data can be recovered from the log. This can speed up RAID resync and fixes the RAID5 write hole issue - a crash during degraded operations cannot result in data corruption.
In future releases the journal will also be used to improve performance and latency.

Code: [[https://git.kernel.org/torvalds/c/ac322de6bf5416cb145b58599297b8be73cd86ac|merge]]

== Unprivileged eBPF + persistent eBPF programs ==

'''Unprivileged eBPF'''

eBPF programs got their own syscall in [[http://kernelnewbies.org/Linux_3.18#head-ead251efb6bbdbe2922e7c6bd0c7b46342e03dad|Linux 3.18]], but until now its use had been restricted to root, because these programs were dangerous for security. eBPF programs are, however, validated by the kernel, and in this release the eBPF verifier has been improved so that unprivileged users can use it (although unprivileged eBPF is only meaningful for 'socket filter'-like programs; eBPF programs for tracing and TC classifiers/actions will stay root only). This feature can be switched off with the sysctl ''kernel.unprivileged_bpf_disabled'' (once true, bpf programs and maps cannot be accessed from unprivileged processes, and the toggle cannot be set back to false).

Recommended LWN article: [[http://lwn.net/Articles/660331/|Unprivileged bpf()]]

Code: [[https://git.kernel.org/torvalds/c/1be7f75d1668d6296b80bf35dcf6762393530afc|commit]], [[https://git.kernel.org/torvalds/c/aaac3ba95e4c8b496d22f68bd1bc01cfbf525eca|commit]]

'''Persistent eBPF maps/progs'''

This release also adds support for "persistent" eBPF maps/programs. The term "persistent" means that maps/programs have a facility that lets them survive process termination. This is desired by various eBPF subsystem users, for example the tc classifier/action. Whenever tc parses the ELF object, extracts and loads maps/progs into the kernel, these file descriptors will be out of reach after the tc instance exits, so a subsequent tc invocation won't be able to access/relocate on this resource, and therefore maps cannot easily be shared, e.g. between the ingress and egress networking data path.
To fix issues such as these, a new minimal file system has been created that can hold map/prog objects at ''/sys/fs/bpf/''. Any subsequent mounts within a given namespace will point to the same instance. The file system allows for creating a user-defined directory structure. The objects for maps/progs are created/fetched through ''bpf(2)'' along with a pathname with two new commands (''BPF_OBJ_PIN/BPF_OBJ_GET''), that in turn create the file system nodes. The user can use that to access maps and progs later on, through ''bpf(2)''.

Code: [[https://git.kernel.org/torvalds/c/b2197755b2633e164a439682fb05a9b5ea48f706|commit]], [[https://git.kernel.org/torvalds/c/42984d7c1e563bf92e6ca7a0fd89f8e933f2162e|commit]]

== perf + eBPF integration ==

In this release, eBPF programs have been integrated with perf. When perf is given an eBPF .c source file (or .o file built for the 'bpf' target with clang), it is automatically built, validated and loaded into the kernel, and can then be used and seen using ''perf trace'' and other tools. Users can attach a BPF filter with a command like ''# perf record --event ./hello_world.o ls''; the eBPF program is attached to a newly created perf event which works with all tools.

Code: [[https://git.kernel.org/torvalds/c/69d262a93a25cf475012ea2e00aeb29f4932c028|commit]], [[https://git.kernel.org/torvalds/c/84c86ca12b2189df751eed7b2d67cb63bc8feda5|commit]], [[https://git.kernel.org/torvalds/c/ed63f34c026e9a60d17fa750ecdfe3f600d49393|commit]], [[https://git.kernel.org/torvalds/c/1f45b1d49073541947193bd7dac9e904142576aa|commit]], [[https://git.kernel.org/torvalds/c/4edf30e39e6cff32390eaff6a1508969b3cd967b|commit]], [[https://git.kernel.org/torvalds/c/71dc2326252ff1bcdddc05db03c0f831d16c9447|commit]], [[https://git.kernel.org/torvalds/c/d509db0473e40134286271b1d1adadccf42ac467|commit]], [[https://git.kernel.org/torvalds/c/aa3abf30bb28addcf593578d37447d42e3f65fc3|commit]], [[https://git.kernel.org/torvalds/c/1e5e3ee8ff3877db6943032b54a6ac21c095affd|commit]], [[https://git.kernel.org/torvalds/c/ba1fae431e74bb427a699187434142fd3fe98390|commit]] |
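
The ''BPF_OBJ_PIN''/''BPF_OBJ_GET'' commands described in the persistent eBPF maps/progs section above could be used roughly as in the following sketch. The helper names (`sys_bpf`, `bpf_pin_object`, `bpf_get_object`) are hypothetical and not part of the kernel or of any library; the sketch assumes kernel headers new enough to define both commands, and a file descriptor for a map or program that was already loaded through ''bpf(2)''.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw syscall (hypothetical helper name). */
static int sys_bpf(int cmd, union bpf_attr *attr)
{
    return (int)syscall(SYS_bpf, cmd, attr, sizeof(*attr));
}

/*
 * Pin an already-loaded map or program (referred to by fd) at `path`,
 * for example somewhere below /sys/fs/bpf/.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int bpf_pin_object(int fd, const char *path)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.pathname = (uint64_t)(uintptr_t)path;
    attr.bpf_fd = (uint32_t)fd;
    return sys_bpf(BPF_OBJ_PIN, &attr);
}

/*
 * Obtain a new file descriptor for an object that was pinned earlier,
 * possibly by a different process that has since exited.
 * Returns the fd on success, -1 with errno set on failure.
 */
static int bpf_get_object(const char *path)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.pathname = (uint64_t)(uintptr_t)path;
    return sys_bpf(BPF_OBJ_GET, &attr);
}
```

A first process would call `bpf_pin_object(map_fd, "/sys/fs/bpf/my_map")` before exiting (the path is only an example), and a later process would call `bpf_get_object()` with the same path to reach the same map.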
Line 49: | Line 82: |
Recommended LWN article: [http://lwn.net/Articles/663879/ Block-layer I/O polling] Code: [https://git.kernel.org/torvalds/c/15c4f638f3d41bae52105ca4c0c8760afbcbeaab commit], [https://git.kernel.org/torvalds/c/05229beeddf7e75e2e616ddaad4b70e7fca9528d commit], [https://git.kernel.org/torvalds/c/a0fa9647a54e81883abd57c5c865d1747f68a577 commit] |
Recommended LWN article: [[http://lwn.net/Articles/663879/|Block-layer I/O polling]]

Code: [[https://git.kernel.org/torvalds/c/15c4f638f3d41bae52105ca4c0c8760afbcbeaab|commit]], [[https://git.kernel.org/torvalds/c/05229beeddf7e75e2e616ddaad4b70e7fca9528d|commit]], [[https://git.kernel.org/torvalds/c/a0fa9647a54e81883abd57c5c865d1747f68a577|commit]]

== mlock2() syscall allows users to request memory to be locked on page fault ==

''mlock()'' allows a user to control the paging out of program memory, but this comes at the cost of faulting in the entire mapping when it is allocated. For large mappings this is not ideal: for example, security applications that need ''mlock()'' are forced to lock an entire buffer, no matter how big it is. Similarly, applications that use large graphical models, where the path through the graph is not known until run time, are forced to lock the entire graph or lock page by page as the pages are faulted in.

This new ''mlock2()'' syscall creates a middle ground. Pages are marked to be placed on the unevictable LRU (locked) when they are first used, but they are not faulted in by the mlock call. The new system call takes a flags argument along with the start address and size. This flags argument gives the caller the ability to request memory be locked in the traditional way, or to be locked after the page is faulted in. New calls are added for ''munlock()'' and ''munlockall()'' which give the caller a way to specify which flags are supposed to be cleared. A new MCL flag is added to mirror the lock on fault behavior from mlock() in mlockall(). Finally, a flag for mmap() is added that allows a user to specify that the covered area should not be paged out, but only after the memory has been used the first time.

Recommended LWN article: [[http://lwn.net/Articles/650538/|Deferred memory locking]] Code: [[https://git.kernel.org/torvalds/c/de60f5f10c58d4f34b68622442c0e04180367f3f|commit]], [[https://git.kernel.org/torvalds/c/b0f205c2a3082dd9081f9a94e50658c5fa906ff1|commit]], [[https://git.kernel.org/torvalds/c/a8ca5d0ecbdde5cc3d7accacbd69968b0c98764e|commit]], [[https://git.kernel.org/torvalds/c/1aab92ec3de552362397b718744872ea2d17add2|commit]] |
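
As a rough illustration of the lock-on-fault behaviour described above, the following sketch maps an anonymous buffer and asks the kernel to lock its pages only as they are touched. The helper names `lock_on_fault` and `alloc_lock_on_fault` are hypothetical. The sketch goes through the raw syscall because older C libraries have no ''mlock2()'' wrapper, and the fallback value of `MLOCK_ONFAULT` is an assumption taken from the kernel's generic UAPI headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback for libc headers that predate the flag (assumed UAPI value). */
#ifndef MLOCK_ONFAULT
#define MLOCK_ONFAULT 0x01
#endif

/*
 * Hypothetical helper: request that pages in [addr, addr + len) are
 * locked when they are first faulted in, rather than faulting in the
 * whole range immediately as mlock() does.
 * Returns 0 on success, -1 with errno set (ENOSYS on older kernels).
 */
static int lock_on_fault(void *addr, size_t len)
{
#ifdef SYS_mlock2
    return (int)syscall(SYS_mlock2, addr, len, MLOCK_ONFAULT);
#else
    (void)addr;
    (void)len;
    errno = ENOSYS;
    return -1;
#endif
}

/*
 * Map `len` bytes of anonymous memory and mark it lock-on-fault.
 * Returns the mapping, or NULL with errno set on failure.
 */
static void *alloc_lock_on_fault(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (lock_on_fault(p, len) != 0) {
        int saved = errno;      /* keep the mlock2 error across munmap */
        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```

Only the pages that the program actually writes to or reads from end up resident and locked; the rest of the mapping costs no memory until it is used. The call is still subject to the usual locked-memory resource limit.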
Line 55: | Line 97: |
All the driver and architecture-specific changes can be found in the [http://kernelnewbies.org/Linux_4.4-DriversArch Linux_4.4-DriversArch] page | All the driver and architecture-specific changes can be found in the [[http://kernelnewbies.org/Linux_4.4-DriversArch|Linux_4.4-DriversArch]] page |
Line 58: | Line 100: |
* process scheduler: Apply a frequency scaling correction factor to per-entity load tracking to make it invariant with respect to CPU frequency. Currently, load appears bigger when the CPU is running at slower frequencies, which affects load-balancing decisions [https://git.kernel.org/torvalds/c/e0f5f3afd2cffa96291cd852056d83ff4e2e99c7 commit], [https://git.kernel.org/torvalds/c/e3279a2e6d697e00e74f905851ee7cf532f72b2d commit] * seccomp: add support for dumping a process' (classic BFP) seccomp filters via ptrace + PTRACE_SECCOMP_GET_FILTER [https://git.kernel.org/torvalds/c/f8e529ed941ba2bbcbf310b575d968159ce7e895 commit] * watchdog: Mimic the ''softlockup_panic'' kernel knob and create a ''/proc/sys/kernel/hardlockup_panic''. It enables a hardlockup to panic the machine [https://git.kernel.org/torvalds/c/ac1f591249d95372f3a5ab3828d4af5dfbf5efd3 commit] * watchdog: optionally perform all-CPU backtrace in case of hard lockup. Can be enabled with sysctl ''/proc/sys/kernel/hardlockup_all_cpu_backtrace'' [https://git.kernel.org/torvalds/c/55537871ef666b4153fd1ef8782e4a13fee142cc commit] * coredump: Add two new flags to the existing coredump mechanism for ELF and FDPIC ELF files to allow us to explicitly filter DAX mappings. This is desirable because DAX mappings, like hugetlb mappings, have the potential to be very large [https://git.kernel.org/torvalds/c/5037835c1f3eabf4f22163fc0278dd87165f8957 commit], [https://git.kernel.org/torvalds/c/ab27a8d04b32b6ee8c30c14c4afd1058e8addc82 commit] * test_printf: test printf family at runtime [https://git.kernel.org/torvalds/c/707cc7280f452a162c52bc240eae62568b9753c2 commit] |
 * process scheduler: Apply a frequency scaling correction factor to per-entity load tracking to make it invariant with respect to CPU frequency. Currently, load appears bigger when the CPU is running at slower frequencies, which affects load-balancing decisions [[https://git.kernel.org/torvalds/c/e0f5f3afd2cffa96291cd852056d83ff4e2e99c7|commit]], [[https://git.kernel.org/torvalds/c/e3279a2e6d697e00e74f905851ee7cf532f72b2d|commit]]
 * seccomp: add support for dumping a process' (classic BPF) seccomp filters via ptrace + PTRACE_SECCOMP_GET_FILTER [[https://git.kernel.org/torvalds/c/f8e529ed941ba2bbcbf310b575d968159ce7e895|commit]]
 * watchdog: Mimic the ''softlockup_panic'' kernel knob and create a ''/proc/sys/kernel/hardlockup_panic''. It enables a hardlockup to panic the machine [[https://git.kernel.org/torvalds/c/ac1f591249d95372f3a5ab3828d4af5dfbf5efd3|commit]]
 * watchdog: optionally perform all-CPU backtrace in case of hard lockup. Can be enabled with sysctl ''/proc/sys/kernel/hardlockup_all_cpu_backtrace'' [[https://git.kernel.org/torvalds/c/55537871ef666b4153fd1ef8782e4a13fee142cc|commit]]
 * coredump: Add two new flags to the existing coredump mechanism for ELF and FDPIC ELF files to allow us to explicitly filter DAX mappings. This is desirable because DAX mappings, like hugetlb mappings, have the potential to be very large [[https://git.kernel.org/torvalds/c/5037835c1f3eabf4f22163fc0278dd87165f8957|commit]], [[https://git.kernel.org/torvalds/c/ab27a8d04b32b6ee8c30c14c4afd1058e8addc82|commit]]
 * test_printf: test printf family at runtime [[https://git.kernel.org/torvalds/c/707cc7280f452a162c52bc240eae62568b9753c2|commit]]
 * Make sync_file_range(2) use WB_SYNC_NONE writeback. It helps PostgreSQL avoid large latency spikes when flushing data in the background [[https://git.kernel.org/torvalds/c/23d0127096cb91cb6d354bdc71bd88a7bae3a1d5|commit]] |
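
For context on the sync_file_range(2) item in the list above, this is a minimal sketch of how an application might start background writeback for a file without waiting for it to finish. The helper name `start_background_flush` is hypothetical, and the sketch only shows the caller's side; it is not taken from PostgreSQL or from the kernel change itself.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to start writing out the dirty
 * pages of `fd` without waiting for the writes to complete.
 * An offset of 0 together with nbytes of 0 covers the whole file.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int start_background_flush(int fd)
{
    return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
}
```

A program that writes large amounts of data can call this periodically so that dirty pages are written out gradually instead of in one large burst at ''fsync()'' time.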
Line 68: | Line 110: |
* Add per-filesystem stats in ''/sys/fs/xfs/<block>/stats/stats'', and a ''stats_clear'' file to clear them. Also, the global stats that are currently present in ''/proc'' are duplicated in ''/sys/fs/xfs/stats/stats'' (along with a ''stats_clear'' file) [https://git.kernel.org/torvalds/c/bb230c124730f21eea13deab433f9f8fc96bd5f3 commit], [https://git.kernel.org/torvalds/c/225e4635580ce9fb12f8a2dc88473161cd64dbf6 commit], [https://git.kernel.org/torvalds/c/ff6d6af2351caea7db681f4539d0d893e400557a commit] * BTRFS * Add ''fragment'' debug mount option. It can be used to cause extreme fragmentation in data, metadata or both [https://git.kernel.org/torvalds/c/d0bd456074dca089579818312da7cbe726ad2ff9 commit] * Add balance filter for stripes. This is useful to selectively rebalance only chunks that do not span enough devices, applies to RAID0/10/5/6. [https://git.kernel.org/torvalds/c/dee32d0ac3719ef8d640efaf0884111df444730f commit] |
* Add per-filesystem stats in ''/sys/fs/xfs/<block>/stats/stats'', and a ''stats_clear'' file to clear them. Also, the global stats that are currently present in ''/proc'' are duplicated in ''/sys/fs/xfs/stats/stats'' (along with a ''stats_clear'' file) [[https://git.kernel.org/torvalds/c/bb230c124730f21eea13deab433f9f8fc96bd5f3|commit]], [[https://git.kernel.org/torvalds/c/225e4635580ce9fb12f8a2dc88473161cd64dbf6|commit]], [[https://git.kernel.org/torvalds/c/ff6d6af2351caea7db681f4539d0d893e400557a|commit]] * Btrfs * Add ''fragment'' debug mount option. It can be used to cause extreme fragmentation in data, metadata or both [[https://git.kernel.org/torvalds/c/d0bd456074dca089579818312da7cbe726ad2ff9|commit]] * Add balance filter for stripes. This is useful to selectively rebalance only chunks that do not span enough devices, applies to RAID0/10/5/6. [[https://git.kernel.org/torvalds/c/dee32d0ac3719ef8d640efaf0884111df444730f|commit]] |
Line 76: | Line 117: |
* Allow duplicate extents (''cp --reflink'') in SMB3.0 not just SMB3.1.1 [https://git.kernel.org/torvalds/c/ca9e7a1c85594f61d7ffb414071e6cae82eae23a commit] * Add resilienthandles mount parameter. Since many servers (Windows clients, and non-clustered servers) do not support persistent handles but do support resilient handles, allow the user to specify a mount option "resilienthandles" in order to get more reliable connections and less chance of data loss (at least when SMB2.1 or later). Default resilient handle timeout (120 seconds to recent Windows server) is used [https://git.kernel.org/torvalds/c/592fafe644bf3a48b9e00e182a67d301493634fc commit] * Add support for persistent handles, which are like durable file handles with strong guarantees [https://git.kernel.org/torvalds/c/b2a3077414fd6ff1de8972ea55e91f27bcabd913 commit], [https://git.kernel.org/torvalds/c/f16dfa7cd1b588e5d7ef4b5a19ee579f11b7a41f commit], [https://git.kernel.org/torvalds/c/b618f001a20e44f691dd0e2ffea651a40a651871 commit] * Allow copy offload (copychunk) across shares [https://git.kernel.org/torvalds/c/7b52e2793a58af61b5d349c2c080437a437a4edb commit] |
* Allow duplicate extents (''cp --reflink'') in SMB3.0 not just SMB3.1.1 [[https://git.kernel.org/torvalds/c/ca9e7a1c85594f61d7ffb414071e6cae82eae23a|commit]] * Add resilienthandles mount parameter. Since many servers (Windows clients, and non-clustered servers) do not support persistent handles but do support resilient handles, allow the user to specify a mount option "resilienthandles" in order to get more reliable connections and less chance of data loss (at least when SMB2.1 or later). Default resilient handle timeout (120 seconds to recent Windows server) is used [[https://git.kernel.org/torvalds/c/592fafe644bf3a48b9e00e182a67d301493634fc|commit]] * Add support for persistent handles, which are like durable file handles with strong guarantees [[https://git.kernel.org/torvalds/c/b2a3077414fd6ff1de8972ea55e91f27bcabd913|commit]], [[https://git.kernel.org/torvalds/c/f16dfa7cd1b588e5d7ef4b5a19ee579f11b7a41f|commit]], [[https://git.kernel.org/torvalds/c/b618f001a20e44f691dd0e2ffea651a40a651871|commit]] * Allow copy offload (copychunk) across shares [[https://git.kernel.org/torvalds/c/7b52e2793a58af61b5d349c2c080437a437a4edb|commit]] |
Line 82: | Line 123: |
* Support for NFSv4.2 file CLONE using the btrfs ioctl [https://git.kernel.org/torvalds/c/21fad313d5890b674432fe3ad0c7bcf040320340 commit] [https://git.kernel.org/torvalds/c/e5341f3a5762d17be9cdd06257c02c0098bdcab8 commit], [https://git.kernel.org/torvalds/c/36022770de6cf9a403c40a68712ed2d2ea2746be commit], [https://git.kernel.org/torvalds/c/bea51b30b281039f0f43fb4f42028ddf33fb601f commit], [https://git.kernel.org/torvalds/c/a340abcf4173461f688292a6879b4d5bc781c2b1 commit] * EXT4 * Store checksum seed in superblock [https://git.kernel.org/torvalds/c/8c81bd8f586c46eaf114758a78d82895a2b081c2 commit] |
* Support for NFSv4.2 file CLONE using the btrfs ioctl [[https://git.kernel.org/torvalds/c/21fad313d5890b674432fe3ad0c7bcf040320340|commit]] [[https://git.kernel.org/torvalds/c/e5341f3a5762d17be9cdd06257c02c0098bdcab8|commit]], [[https://git.kernel.org/torvalds/c/36022770de6cf9a403c40a68712ed2d2ea2746be|commit]], [[https://git.kernel.org/torvalds/c/bea51b30b281039f0f43fb4f42028ddf33fb601f|commit]], [[https://git.kernel.org/torvalds/c/a340abcf4173461f688292a6879b4d5bc781c2b1|commit]] * ext4 * Store checksum seed in superblock [[https://git.kernel.org/torvalds/c/8c81bd8f586c46eaf114758a78d82895a2b081c2|commit]] |
Line 88: | Line 129: |
* Improve performance for localalloc [https://git.kernel.org/torvalds/c/1d1aff8cf367d2216a678c722161784e207965c4 commit] | * Improve performance for localalloc [[https://git.kernel.org/torvalds/c/1d1aff8cf367d2216a678c722161784e207965c4|commit]] * UBIFS * atime support [[https://git.kernel.org/torvalds/c/8c1c5f263833ec2dc8fd716cf4281265c485d7ad|commit]] |
Line 92: | Line 136: |
* Get rid of ''vmalloc_info'' from ''/proc/meminfo''. It is too expensive to calculate and shows up in real workloads, people who actually want to know what the situation is wrt the vmalloc area should just look at the much more complete ''/proc/vmallocinfo'' instead [https://git.kernel.org/torvalds/c/a5ad88ce8c7fae7ddc72ee49a11a75aa837788e0 commit] * Add ''HugetlbPages'' field to ''/proc/PID/status''. Currently there's no easy way to get per-process usage of hugetlb pages, which is inconvenient because userspace applications which use hugetlb can need it [https://git.kernel.org/torvalds/c/5d317b2b6536592a9b51fe65faed43d65ca9158e commit] * Add hugetlb-related fields to ''/proc/PID/smaps'' to know per-task or per-vma base hugetlb usage: ''AnonHugePages'' shows the amount of memory backed by transparent hugepage; ''Shared_Hugetlb'' and ''Private_Hugetlb'' show the amounts of memory backed by hugetlbfs page which is not counted in ''RSS'' or ''PSS'' field for historical reasons. And these are not included in ''{Shared,Private}_{Clean,Dirty}'' field [https://git.kernel.org/torvalds/c/25ee01a2fca02dfb5a3ce316e77910c468108199 commit] * memcontrol: eliminate ''memory.current'' on the root level, because it doesn't add anything that wouldn't be more accurate and detailed using system statistics [https://git.kernel.org/torvalds/c/f5fc3c5d817435970aa301d066820a9ac12c8120 commit] |
* Get rid of ''vmalloc_info'' from ''/proc/meminfo''. It is too expensive to calculate and shows up in real workloads, people who actually want to know what the situation is wrt the vmalloc area should just look at the much more complete ''/proc/vmallocinfo'' instead [[https://git.kernel.org/torvalds/c/a5ad88ce8c7fae7ddc72ee49a11a75aa837788e0|commit]] * Add ''HugetlbPages'' field to ''/proc/PID/status''. Currently there's no easy way to get per-process usage of hugetlb pages, which is inconvenient because userspace applications which use hugetlb can need it [[https://git.kernel.org/torvalds/c/5d317b2b6536592a9b51fe65faed43d65ca9158e|commit]] * Add hugetlb-related fields to ''/proc/PID/smaps'' to know per-task or per-vma base hugetlb usage: ''AnonHugePages'' shows the amount of memory backed by transparent hugepage; ''Shared_Hugetlb'' and ''Private_Hugetlb'' show the amounts of memory backed by hugetlbfs page which is not counted in ''RSS'' or ''PSS'' field for historical reasons. And these are not included in ''{Shared,Private}_{Clean,Dirty}'' field [[https://git.kernel.org/torvalds/c/25ee01a2fca02dfb5a3ce316e77910c468108199|commit]] * memcontrol: eliminate ''memory.current'' on the root level, because it doesn't add anything that wouldn't be more accurate and detailed using system statistics [[https://git.kernel.org/torvalds/c/f5fc3c5d817435970aa301d066820a9ac12c8120|commit]] |
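
The new ''HugetlbPages'' field in ''/proc/PID/status'' from the list above can be read with ordinary text parsing. The following is a small sketch under the assumption that the field is printed as `HugetlbPages:` followed by a value in kB, like the neighbouring fields; the helper names `parse_hugetlb_kb` and `self_hugetlb_kb` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan the text of a /proc/PID/status file for the HugetlbPages field.
 * Returns the value in kB, or -1 if the field is missing (for example
 * on a kernel older than 4.4) or cannot be parsed.
 */
static long parse_hugetlb_kb(const char *status_text)
{
    const char *key = "HugetlbPages:";
    const size_t key_len = strlen(key);
    const char *line = status_text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, key_len) == 0) {
            long kb;
            if (sscanf(line + key_len, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        /* Advance to the character after the next newline, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Read /proc/self/status and report this process's hugetlb usage in kB. */
static long self_hugetlb_kb(void)
{
    char buf[8192];
    FILE *f = fopen("/proc/self/status", "r");
    size_t n;

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_hugetlb_kb(buf);
}
```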
Line 99: | Line 143: |
* Add Persistent Reservations support. It includes a user space interface for simplified Persistent Reservations which map to block devices that support these (only SCSI for now). Persistent Reservations allow restricting access to block devices to specific initiators in a shared storage setup [https://git.kernel.org/torvalds/c/bbd3e064362e5057cc4799ba2e4d68c7593e490b commit], [https://git.kernel.org/torvalds/c/924d55b06347d813b38c51e75ce1a6666c113933 commit], [https://git.kernel.org/torvalds/c/71cdb6978a80f9f6c51bef0622388c1414c2fe32 commit] * Export integrity data interval size in ''/sys/block/<disk>/integrity/protection_interval_bytes'', so that apps can tell whether the interval is different from the device's logical block size [https://git.kernel.org/torvalds/c/4c241d08dbfcbdc7a949b91d72707a289d464954 commit] * cdrom: Random writing support for BD-RE media [https://git.kernel.org/torvalds/c/f7e7868b4743f1cc5e59e6e0ddd3ccf9cfe53a1b commit] |
* Block polling support [[https://git.kernel.org/torvalds/c/15c4f638f3d41bae52105ca4c0c8760afbcbeaab|commit]], [[https://git.kernel.org/torvalds/c/05229beeddf7e75e2e616ddaad4b70e7fca9528d|commit]], [[https://git.kernel.org/torvalds/c/a0fa9647a54e81883abd57c5c865d1747f68a577|commit]] * loop: direct and asynchronous I/O [[https://git.kernel.org/torvalds/c/ab1cb278bc7027663adbfb0b81404f8398437e11|commit]], [[https://git.kernel.org/torvalds/c/2e5ab5f379f96a6207c45be40c357ebb1beb8ef3|commit]], [[https://git.kernel.org/torvalds/c/5b5e20f421c0b6d437b3dec13e53674161998d56|commit]], [[https://git.kernel.org/torvalds/c/bc07c10a3603a5ab3ef01ba42b3d41f9ac63d1b6|commit]], [[https://git.kernel.org/torvalds/c/e03a3d7a94e2485b6e2fa3fb630b9b3a30b65718|commit]] * Add Persistent Reservations support. It includes a user space interface for simplified Persistent Reservations which map to block devices that support these (only SCSI for now). Persistent Reservations allow restricting access to block devices to specific initiators in a shared storage setup [[https://git.kernel.org/torvalds/c/bbd3e064362e5057cc4799ba2e4d68c7593e490b|commit]], [[https://git.kernel.org/torvalds/c/924d55b06347d813b38c51e75ce1a6666c113933|commit]], [[https://git.kernel.org/torvalds/c/71cdb6978a80f9f6c51bef0622388c1414c2fe32|commit]] * Export integrity data interval size in ''/sys/block/<disk>/integrity/protection_interval_bytes'', so that apps can tell whether the interval is different from the device's logical block size [[https://git.kernel.org/torvalds/c/4c241d08dbfcbdc7a949b91d72707a289d464954|commit]] * cdrom: Random writing support for BD-RE media [[https://git.kernel.org/torvalds/c/f7e7868b4743f1cc5e59e6e0ddd3ccf9cfe53a1b|commit]] |
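
As a sketch of how the loop device direct I/O support in the list above might be switched on from user space, the following uses the `LOOP_SET_DIRECT_IO` ioctl from the kernel's loop UAPI header. The ioctl name and its fallback value are assumptions based on that header rather than on this page, and the helper name `loop_set_direct_io` is hypothetical. The call needs an already configured loop device and sufficient privileges.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/loop.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Fallback for headers older than Linux 4.4 (assumed UAPI value). */
#ifndef LOOP_SET_DIRECT_IO
#define LOOP_SET_DIRECT_IO 0x4C08
#endif

/*
 * Hypothetical helper: enable (enable != 0) or disable direct I/O on
 * the backing file of the loop device at `loopdev`, e.g. "/dev/loop0".
 * Returns 0 on success, -1 with errno set on failure.
 */
static int loop_set_direct_io(const char *loopdev, int enable)
{
    int fd = open(loopdev, O_RDWR);
    int rc, saved;

    if (fd < 0)
        return -1;

    rc = ioctl(fd, LOOP_SET_DIRECT_IO, (unsigned long)(enable ? 1 : 0));
    saved = errno;          /* preserve the ioctl error across close() */
    close(fd);
    errno = saved;
    return rc;
}
```

With direct I/O enabled, reads and writes to the loop device bypass the page cache of the backing file, which avoids caching the same data twice.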
Line 107: | Line 151: |
crypto: caam - add support for acipher xts(aes) [https://git.kernel.org/torvalds/c/c6415a6016bff0b547c13cadb1d5e50e9ace2be3 commit] crypto: keywrap - add key wrapping block chaining mode [https://git.kernel.org/torvalds/c/e28facde3c39005071cc5323d56539bb44efa446 commit] crypto: qat - add support for ctr(aes) and xts(aes) [https://git.kernel.org/torvalds/c/def14bfaf30d5d5a4a8fe5bf600ce09232e688c0 commit] |
crypto: caam - add support for acipher xts(aes) [[https://git.kernel.org/torvalds/c/c6415a6016bff0b547c13cadb1d5e50e9ace2be3|commit]] crypto: keywrap - add key wrapping block chaining mode [[https://git.kernel.org/torvalds/c/e28facde3c39005071cc5323d56539bb44efa446|commit]] crypto: qat - add support for ctr(aes) and xts(aes) [[https://git.kernel.org/torvalds/c/def14bfaf30d5d5a4a8fe5bf600ce09232e688c0|commit]] |
Line 113: | Line 157: |
* TPM: Support TPM 2.0 chips [[https://git.kernel.org/torvalds/c/954650efb79f99d5c817c121bb0a7c6c53362048|commit]], [[https://git.kernel.org/torvalds/c/0fe5480303a1657b328a0a389f8d99249d9961f5|commit]] |
|
Line 115: | Line 161: |
* Integration of perf with eBPF that, given an eBPF .c source file (or .o file built for the 'bpf' target with clang), will get it automatically built, validated and loaded into the kernel via the sys_bpf syscall, which can then be used and seen using 'perf trace' and other tools. Users can run commands like ''perf record --event bpf-file.c ls'' to try it [https://git.kernel.org/torvalds/c/69d262a93a25cf475012ea2e00aeb29f4932c028 commit], [https://git.kernel.org/torvalds/c/84c86ca12b2189df751eed7b2d67cb63bc8feda5 commit], [https://git.kernel.org/torvalds/c/ed63f34c026e9a60d17fa750ecdfe3f600d49393 commit], [https://git.kernel.org/torvalds/c/1f45b1d49073541947193bd7dac9e904142576aa commit], [https://git.kernel.org/torvalds/c/4edf30e39e6cff32390eaff6a1508969b3cd967b commit], [https://git.kernel.org/torvalds/c/71dc2326252ff1bcdddc05db03c0f831d16c9447 commit], [https://git.kernel.org/torvalds/c/d509db0473e40134286271b1d1adadccf42ac467 commit], [https://git.kernel.org/torvalds/c/aa3abf30bb28addcf593578d37447d42e3f65fc3 commit], [https://git.kernel.org/torvalds/c/1e5e3ee8ff3877db6943032b54a6ac21c095affd commit], [https://git.kernel.org/torvalds/c/ba1fae431e74bb427a699187434142fd3fe98390 commit] * Add a new branch type sampling filter to perf record, named 'call' (''perf record -j call -e cycles .....''), that samples only call branches (function calls), unlike 'any_call' that included direct, indirect calls and far jumps. Only x86 and PowerPC are supported in this release [https://git.kernel.org/torvalds/c/43e41adc9e8c36545888d78fed2ef8d102a938dc commit], [https://git.kernel.org/torvalds/c/c229bf9dc179d2023e185c0f705bdf68484c1e73 commit] * Add Intel cstate (aka idle states) Performance Monitoring Unit support. This allows perf to support cstate related free running (read-only and system-wide) counters. 
For example, to caculate the fraction of time when the core is running in C6 state: '' perf stat -x, -e"cstate_core/c6-residency/,msr/tsc/" -C0 -- taskset -c 0 sleep 5 '' [https://git.kernel.org/torvalds/c/7ce1346a6842550a3c4c453cdf1c7b81fb60b07e commit] * CPU socket filtering: perf tools introduce a new sort type "socket" for the processor socket, eg. ''perf report --stdio --sort socket,comm,dso,symbol'' [https://git.kernel.org/torvalds/c/2e7ea3ab8282f6bb1d211d8af760a734c055f493 commit]. Also, perf report introduces a --socket-filter option for 'perf report' to only show entries for a processor socket that match this filter [https://git.kernel.org/torvalds/c/21394d948a0c7c451d4a4d68afed9a06c4969636 commit]. perf hists browser can zoom in/out for processor socket [https://git.kernel.org/torvalds/c/84734b06b63093cd44533f4caa43d4452fb11ec3 commit] * perf tools: Introduce 'P' modifier, it will cause the event to get maximum possible detected precise level. For example, ''perf record -e cycles:P ...'' will detect maximum precise level for 'cycles' event and use it [https://git.kernel.org/torvalds/c/7f94af7a489fada17d28cc60e8f4409ce216bd6d commit] * perf tools: Add support for sorting on the iaddr. New sort option is: symbol_iaddr, header label is 'Code Symbol', eg ''perf mem report --stdio -F +symbol_iaddr'' [https://git.kernel.org/torvalds/c/28e6db205b3ed3f1d86a00c69b3304190377da5f commit] * perf tools: enables config terms for tracepoint perf events. Valid terms for tracepoint events are 'call-graph' and 'stack-size', so different callgraph settings can be used for each event and eliminate unnecessary overhead. 
An example for using different call-graph config for each tracepoint: ''perf record -e syscalls:sys_enter_write/call-graph=fp -e syscalls:sys_exit_write/call-graph=no dd if=/dev/zero of=test bs=4k count=10'' [https://git.kernel.org/torvalds/c/e637d17757a10732fa5d573c18f20b3cd4d31245 commit] * perf script: Enable printing of branch stack viaa the 'brstack' and 'brstacksym' arguments to the field selection option -F. The option is off by default and operates only if the perf.data file has branch stack content [https://git.kernel.org/torvalds/c/dc323ce8e72d6d1beb9af9bbd29c4d55ce3d7fb0 commit] * perf auxtrace: Add AUX area tracing option 'l' to synthesize branch stacks on samples just like sample type PERF_SAMPLE_BRANCH_STACK [https://git.kernel.org/torvalds/c/601897b54c7ed492a89b262dccd7c6f7faf12b30 commit] * perf hists browser: Add 'm' key for context menu display [https://git.kernel.org/torvalds/c/31eb4360546b4bd890f349db01295a173c09b0fb commit] * perf inject: Add --strip option which is used with --itrace to strip out non-synthesized events [https://git.kernel.org/torvalds/c/f56fb9864c501dc85ebe40af5bf925dd07d990c0 commit] * perf script: Allow time to be displayed in nanoseconds [https://git.kernel.org/torvalds/c/83e1986032dfcd3f9e9fc0d06e11d9153edae19b commit] * Intel PT hardware tracer: Accept a zero ''--itrace'' period, meaning "as often as possible". In the case of Intel PT that is the same as a period of 1 and a unit of 'instructions' (i.e. --itrace=i1i)[https://git.kernel.org/torvalds/c/e1791347b5d57d13326cf0114df1a3f3b1c4ca24 commit] |
* Integration of perf with eBPF that, given an eBPF .c source file (or .o file built for the 'bpf' target with clang), will get it automatically built, validated and loaded into the kernel via the sys_bpf syscall, which can then be used and seen using 'perf trace' and other tools. Users can run commands like ''perf record --event bpf-file.c ls'' to try it [[https://git.kernel.org/torvalds/c/69d262a93a25cf475012ea2e00aeb29f4932c028|commit]], [[https://git.kernel.org/torvalds/c/84c86ca12b2189df751eed7b2d67cb63bc8feda5|commit]], [[https://git.kernel.org/torvalds/c/ed63f34c026e9a60d17fa750ecdfe3f600d49393|commit]], [[https://git.kernel.org/torvalds/c/1f45b1d49073541947193bd7dac9e904142576aa|commit]], [[https://git.kernel.org/torvalds/c/4edf30e39e6cff32390eaff6a1508969b3cd967b|commit]], [[https://git.kernel.org/torvalds/c/71dc2326252ff1bcdddc05db03c0f831d16c9447|commit]], [[https://git.kernel.org/torvalds/c/d509db0473e40134286271b1d1adadccf42ac467|commit]], [[https://git.kernel.org/torvalds/c/aa3abf30bb28addcf593578d37447d42e3f65fc3|commit]], [[https://git.kernel.org/torvalds/c/1e5e3ee8ff3877db6943032b54a6ac21c095affd|commit]], [[https://git.kernel.org/torvalds/c/ba1fae431e74bb427a699187434142fd3fe98390|commit]] * Add a new branch type sampling filter to perf record, named 'call' (''perf record -j call -e cycles .....''), that samples only call branches (function calls), unlike 'any_call' that included direct, indirect calls and far jumps. Only x86 and PowerPC are supported in this release [[https://git.kernel.org/torvalds/c/43e41adc9e8c36545888d78fed2ef8d102a938dc|commit]], [[https://git.kernel.org/torvalds/c/c229bf9dc179d2023e185c0f705bdf68484c1e73|commit]] * Add Intel cstate (aka idle states) Performance Monitoring Unit support. This allows perf to support cstate related free running (read-only and system-wide) counters. 
For example, to calculate the fraction of time the core spends in the C6 state: '' perf stat -x, -e"cstate_core/c6-residency/,msr/tsc/" -C0 -- taskset -c 0 sleep 5 '' [[https://git.kernel.org/torvalds/c/7ce1346a6842550a3c4c453cdf1c7b81fb60b07e|commit]] * CPU socket filtering: perf tools introduce a new sort type "socket" for the processor socket, e.g. ''perf report --stdio --sort socket,comm,dso,symbol'' [[https://git.kernel.org/torvalds/c/2e7ea3ab8282f6bb1d211d8af760a734c055f493|commit]]. Also, 'perf report' gains a --socket-filter option to show only the entries for the processor socket that matches the filter [[https://git.kernel.org/torvalds/c/21394d948a0c7c451d4a4d68afed9a06c4969636|commit]]. The perf hists browser can zoom in/out on a processor socket [[https://git.kernel.org/torvalds/c/84734b06b63093cd44533f4caa43d4452fb11ec3|commit]] * perf tools: Introduce the 'P' modifier, which makes the event use the maximum precise level that can be detected. For example, ''perf record -e cycles:P ...'' will detect the maximum precise level for the 'cycles' event and use it [[https://git.kernel.org/torvalds/c/7f94af7a489fada17d28cc60e8f4409ce216bd6d|commit]] * perf tools: Add support for sorting on the iaddr. The new sort option is symbol_iaddr and its header label is 'Code Symbol', e.g. ''perf mem report --stdio -F +symbol_iaddr'' [[https://git.kernel.org/torvalds/c/28e6db205b3ed3f1d86a00c69b3304190377da5f|commit]] * perf tools: Enable config terms for tracepoint perf events. Valid terms for tracepoint events are 'call-graph' and 'stack-size', so different callgraph settings can be used for each event, eliminating unnecessary overhead. 
An example of using a different call-graph config for each tracepoint: ''perf record -e syscalls:sys_enter_write/call-graph=fp -e syscalls:sys_exit_write/call-graph=no dd if=/dev/zero of=test bs=4k count=10'' [[https://git.kernel.org/torvalds/c/e637d17757a10732fa5d573c18f20b3cd4d31245|commit]] * perf script: Enable printing of the branch stack via the 'brstack' and 'brstacksym' arguments to the field selection option -F. The option is off by default and operates only if the perf.data file has branch stack content [[https://git.kernel.org/torvalds/c/dc323ce8e72d6d1beb9af9bbd29c4d55ce3d7fb0|commit]] * perf auxtrace: Add AUX area tracing option 'l' to synthesize branch stacks on samples just like sample type PERF_SAMPLE_BRANCH_STACK [[https://git.kernel.org/torvalds/c/601897b54c7ed492a89b262dccd7c6f7faf12b30|commit]] * perf hists browser: Add 'm' key for context menu display [[https://git.kernel.org/torvalds/c/31eb4360546b4bd890f349db01295a173c09b0fb|commit]] * perf inject: Add --strip option which is used with --itrace to strip out non-synthesized events [[https://git.kernel.org/torvalds/c/f56fb9864c501dc85ebe40af5bf925dd07d990c0|commit]] * perf script: Allow time to be displayed in nanoseconds [[https://git.kernel.org/torvalds/c/83e1986032dfcd3f9e9fc0d06e11d9153edae19b|commit]] * Intel PT hardware tracer: Accept a zero ''--itrace'' period, meaning "as often as possible". In the case of Intel PT that is the same as a period of 1 and a unit of 'instructions' (i.e. --itrace=i1i) [[https://git.kernel.org/torvalds/c/e1791347b5d57d13326cf0114df1a3f3b1c4ca24|commit]] |
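The C6 residency example in the perf list above produces machine-readable output through ''-x,''. As a rough sketch of the final step (dividing the residency count by the TSC count), the snippet below parses such output and computes the ratio. It assumes each row starts with the counter value, then the unit, then the event name, and it assumes the two counters tick at the same rate. The function names and the sample rows are made up for illustration and are not real measurements.

```python
def parse_perf_stat_csv(text, sep=","):
    """Map event name -> counter value from `perf stat -x<sep>` style text.

    Assumed row layout: value, unit, event name, then further columns.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.strip().split(sep)
        if len(fields) < 3 or line.startswith("#"):
            continue
        value, event = fields[0], fields[2]
        try:
            counts[event] = float(value)
        except ValueError:
            # Rows such as "<not counted>" carry no numeric value.
            continue
    return counts


def c6_fraction(counts,
                residency="cstate_core/c6-residency/",
                tsc="msr/tsc/"):
    """Fraction of the measured interval spent in C6 (residency / TSC)."""
    return counts[residency] / counts[tsc]


# Synthetic sample rows, chosen only to make the arithmetic obvious.
SAMPLE = (
    "2500,,cstate_core/c6-residency/,5000000000,100.00\n"
    "10000,,msr/tsc/,5000000000,100.00\n"
)
```

With the synthetic sample, `c6_fraction(parse_perf_stat_csv(SAMPLE))` evaluates to 0.25. Real output may carry extra leading columns (for example with per-CPU or interval modes), in which case the column indices would need adjusting.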
Line 137: | Line 183: |
* ftrace: add module globbing [[https://git.kernel.org/torvalds/c/0b507e1ed1b7364def464cfb348ea7c9e87e6e18|commit]] | |
Line 139: | Line 186: |
* Support for VT-d posted interrupts (i.e. PCI devices can inject interrupts directly into vCPUs). Used by KVM and VFIO [https://git.kernel.org/torvalds/c/f73f8173126ba68eb1c42bd9a234a51d78576ca6 commit] * KVM: Nested virtualization now supports VPID (same as PCID but for vCPUs) which makes it quite a bit faster [https://git.kernel.org/torvalds/c/99b83ac893b84ed1a62ad6d1f2b6cc32026b9e85 commit], [https://git.kernel.org/torvalds/c/089d7b6ec5151ad06a2cd524bc0580d311b641ad commit], [https://git.kernel.org/torvalds/c/5c614b3583e7b6dab0c86356fa36c2bcbb8322a0 commit] * KVM: Support for "split irqchip", i.e. LAPIC in kernel and IOAPIC/PIC/PIT in userspace, which reduces the attack surface of the hypervisor [https://git.kernel.org/torvalds/c/b053b2aef25d00773fa6762dcd4b7f5c9c42d171 commit], [https://git.kernel.org/torvalds/c/7543a635aa09eb138b2cbf60ac3ff19503ae6954 commit], [https://git.kernel.org/torvalds/c/1c1a9ce973a7863dd46767226bce2a5f12d48bc6 commit] * KVM: add capability for any-length ioeventfds. With KVM_CAP_IOEVENTFD_ANY_LENGTH, a zero length ioeventfd is allowed, and the kernel will ignore the length of guest write and may get a faster vmexit [https://git.kernel.org/torvalds/c/e9ea5069d9e569c32ab913c39467df32e056b3a7 commit] * VMware balloon: Get notified immediately via VMCI when a balloon target is set, instead of waiting for up to one second [https://git.kernel.org/torvalds/c/48e3d668b7902cca3c61e9e2098e7f76b5646c28 commit] * VMware balloon: Support ballooning with 2 MB sized pages. It significantly reduces the hypervisor side (and guest side) overhead of ballooning and unballooning [https://git.kernel.org/torvalds/c/365bd7ef7ec8eb9c2e081cd970a5cdfa237dc243 commit] * Vmware vmxnet3: Extend register dump support [https://git.kernel.org/torvalds/c/b6bd9b5448a9362e3ca33b21f1461baa5500520f commit] |
* Support for VT-d posted interrupts (i.e. PCI devices can inject interrupts directly into vCPUs). Used by KVM and VFIO [[https://git.kernel.org/torvalds/c/f73f8173126ba68eb1c42bd9a234a51d78576ca6|commit]] * KVM: Nested virtualization now supports VPID (same as PCID but for vCPUs) which makes it quite a bit faster [[https://git.kernel.org/torvalds/c/99b83ac893b84ed1a62ad6d1f2b6cc32026b9e85|commit]], [[https://git.kernel.org/torvalds/c/089d7b6ec5151ad06a2cd524bc0580d311b641ad|commit]], [[https://git.kernel.org/torvalds/c/5c614b3583e7b6dab0c86356fa36c2bcbb8322a0|commit]] * KVM: Support for "split irqchip", i.e. LAPIC in kernel and IOAPIC/PIC/PIT in userspace, which reduces the attack surface of the hypervisor [[https://git.kernel.org/torvalds/c/b053b2aef25d00773fa6762dcd4b7f5c9c42d171|commit]], [[https://git.kernel.org/torvalds/c/7543a635aa09eb138b2cbf60ac3ff19503ae6954|commit]], [[https://git.kernel.org/torvalds/c/1c1a9ce973a7863dd46767226bce2a5f12d48bc6|commit]] * KVM: Add capability for any-length ioeventfds. With KVM_CAP_IOEVENTFD_ANY_LENGTH, a zero-length ioeventfd is allowed; the kernel then ignores the length of the guest write, which may allow a faster vmexit [[https://git.kernel.org/torvalds/c/e9ea5069d9e569c32ab913c39467df32e056b3a7|commit]] * VMware balloon: Get notified immediately via VMCI when a balloon target is set, instead of waiting for up to one second [[https://git.kernel.org/torvalds/c/48e3d668b7902cca3c61e9e2098e7f76b5646c28|commit]] * VMware balloon: Support ballooning with 2 MB sized pages. It significantly reduces the hypervisor side (and guest side) overhead of ballooning and unballooning [[https://git.kernel.org/torvalds/c/365bd7ef7ec8eb9c2e081cd970a5cdfa237dc243|commit]] * VMware vmxnet3: Extend register dump support [[https://git.kernel.org/torvalds/c/b6bd9b5448a9362e3ca33b21f1461baa5500520f|commit]] |
Line 148: | Line 195: |
* [http://kb.linuxvirtualserver.org/wiki/IPVS IP Virtual Server] * Support scheduling of ICMP packets to IPVS instances. A new sysctl ''net.ipv4.vs.schedule_icmp'' has been introduced, that will enable this feature if set to 1 (by default, it is set by default to 0 to retain the old behaviour) [https://git.kernel.org/torvalds/c/99cb99aa055a72d3880d8a95a71034c4d64bcf9a merge commit] * Allow to ignore tunnelled packets with new Sysctl ''net.ipv4.vs.ignore_tunneled''. If set, ipvs will set the ipvs_property on all packets which are of unrecognised protocols. This prevents the kernel from routing tunnelled protocols like ipip, which is useful to prevent rescheduling packets that have been tunneled to the ipvs host (i.e. to prevent ipvs routing loops when ipvs is also acting as a real server) [https://git.kernel.org/torvalds/c/4e478098ac0ac1b6ef9a70fcdc2ec8b93f1b59a1 commit] * Add setsockopt() support for SO_INCOMING_CPU and extend SO_REUSEPORT selection logic : If a TCP listener or UDP socket has this option set, a packet is delivered to this socket only if CPU handling the packet matches the specified one. This allows to build very efficient TCP servers, using one listener per RX queue, as the associated TCP listener should only accept flows handled in softirq by the same cpu. 
This provides optimal NUMA behavior and keep cpu caches hot [https://git.kernel.org/torvalds/c/76973dd79fd52f187ba3df018bca65792a3d942 commit], [https://git.kernel.org/torvalds/c/70da268b569d32a9fddeea85dc18043de9d89f89 commit] * Provide FIB table ID in ipv4 route dumps just as ipv6 does [https://git.kernel.org/torvalds/c/b7503e0cdb5dbec5d201aa69d8888c14679b5ae8 commit] * Allow the user to ask for the statistics to be filtered out of ipv4/ipv6 address netlink dumps, because many commonly used functions like getifaddrs() invoke RTM_GETLINK to dump the interface information, and do not need the AF_INET6 statistics, which are expensive to calculate [https://git.kernel.org/torvalds/c/d5566fd72ec1924958fcfd48b65c022c8f7eae64 commit] * wireless: implement Very High Throughput support for mesh networks [https://git.kernel.org/torvalds/c/c85fb53c4fa6521352028c40ce096a808aabd389 commit] * bridge: Allow setting the bridge attribute ''ageing_time'' in rocker and switchdev [https://git.kernel.org/torvalds/c/c62987bbd8a1a1664f99e89e3959339350a6131e commit], [https://git.kernel.org/torvalds/c/d0cf57f9dddb50ea404bf747a3c6b22b29f82b9a commit], [https://git.kernel.org/torvalds/c/f55ac58ae64cbb0315382e738681fe31837dcac0 commit] * vxlan: support both IPv4 and IPv6 sockets in a single vxlan device [https://git.kernel.org/torvalds/c/b1be00a6c39fda2ec380e168d7bcf96fb8c9da42 commit] * bridge: complete the bridge device's netlink support and makes it possible to view and configure everything that can be configured via sysfs [https://git.kernel.org/torvalds/c/3e087caa23ef36370bfb925d3bbca78e8302d3ce commit] * IPv4: Hash-based multipath routing. When the routing cache was [http://kernelnewbies.org/Linux_3.6#head-85de5e5247a939f2a61d0c5ccbc13ff5b4f1a6a0 removed in 3.6], the IPv4 multipath algorithm changed from more or less being destination-based into being quasi-random per-packet scheduling. 
This increased the risk of out-of-order packets and made it impossible to use multipath together with anycast services. In this release, the multipath routing implementation is replaced with a flow-based load balancing based on a hash over the source and destination addresses [https://git.kernel.org/torvalds/c/07355737a8badd951e6b72aa8609a2d6eed0a7e7 merge commit] * IPv6 support to the Virtual Routing and Forwarding (VRF) devices [https://git.kernel.org/torvalds/c/ccf3c8c3fe1bd4828556650ae7928da6ffb4aaf6 commit], [https://git.kernel.org/torvalds/c/35402e31366349a32b505afdfe856aeeb8d939a0 commit], [https://git.kernel.org/torvalds/c/ca254490c8dfdaddb5df8a763774db0f4c5200c3 commit] * TCP: Recent ACK (RACK) loss recovery. RACK loss recovery uses the notion of time instead of packet sequence (FACK) or counts (dupthresh) (see commit for details). In the current patch set RACK is only a supplemental loss detection and does not trigger fast recovery. However RACK is being developed to replace or consolidate FACK/dupthresh, early retransmit, and thin-dupack. Since RACK is still experimental, it is now used as a supplemental loss detection on top of existing algorithms. 
It can be disabled with sysctl ''net.ipv4.tcp_recovery'' [https://git.kernel.org/torvalds/c/eb9fae328faff9807a4ab5c1834b19f34dd155d4 commit] '' TODO '' * mpls: flow-based multipath selection [https://git.kernel.org/torvalds/c/1c78efa8319cad2f10f421afa627745fb4d9b29f commit] * mpls: multipath route support [https://git.kernel.org/torvalds/c/f8efb73c97e2fa0abbe2e07c5c5df07800312643 commit] * bridge: allow adding of fdb entries pointing to the bridge device [https://git.kernel.org/torvalds/c/3741873b4f73b572b8f8835e6bd114e08316a160 commit] * bonding: support encapsulated ipv6 TSO [https://git.kernel.org/torvalds/c/e87eb4051efe76b35d0a297db772f5964a001544 commit] * net: Add support for filtering neigh dump by device index [https://git.kernel.org/torvalds/c/16660f0bd942cec203eaf4de0e2ac1695bd9d32d commit] * net: Add support for filtering neigh dump by master device [https://git.kernel.org/torvalds/c/21fdd092acc7ebda0dfe682008592eb79c382707 commit] * net/core: generic support for disabling netdev features down stack [https://git.kernel.org/torvalds/c/fd867d51f889aec11cca235ebb008578780d052d commit] * net/ethoc: support big-endian register layout [https://git.kernel.org/torvalds/c/06e60e5912c0373b15143cc52e4a11fafeaafff3 commit] * net/wireless: enable wiphy device to suspend/resume asynchronously [https://git.kernel.org/torvalds/c/9f0e13546ef5773b7059b531a667ec47a5f897ee commit] * net: Introduce L3 Master device abstraction [https://git.kernel.org/torvalds/c/1b69c6d0ae90b7f1a4f61d5c8209d5cb7a55f849 commit] * net: dummy: add more features [https://git.kernel.org/torvalds/c/8f3af27786913851e720bc9466d1abffcfa7aff6 commit] * net: tso: add support for IPv6 [https://git.kernel.org/torvalds/c/8941faa161b526199e55ca7764cf875383453612 commit] * netfilter: nfnetlink_log: allow to attach conntrack [https://git.kernel.org/torvalds/c/a29a9a585b2840a205f085a34dfd65c75e86f7c3 commit] * nl80211: put current TX power in interface info 
[https://git.kernel.org/torvalds/c/d55d0d598e6610bbfcc1f2ecd6e8af669b94783b commit] * nl80211: support vendor dumpit commands [https://git.kernel.org/torvalds/c/7bdbe400d1b2aac116513f90b75969ad2365fba6 commit] * nl802154: add support for security layer [https://git.kernel.org/torvalds/c/a26c5fd7622d4951425131d54a8c99f076fe2068 commit] * ipconfig: send Client-identifier in DHCP requests [https://git.kernel.org/torvalds/c/26fb342c734061859fec1bd9e987bb6b78061ef0 commit] * ipv4: implement support for NOPREFIXROUTE ifa flag for ipv4 address [https://git.kernel.org/torvalds/c/7b1311807f3d3eb8bef3ccc53127838b3bea3771 commit] * ipv6: gro: support sit protocol [https://git.kernel.org/torvalds/c/feec0cb3f20b837f8ca36e974267918d7a4497f8 commit] * ieee802154: 6lowpan: add tx/rx stats [https://git.kernel.org/torvalds/c/1c64f147d3cc9bbafe091a7b335ea3ec700186f0 commit] * if_link: Add control trust VF [https://git.kernel.org/torvalds/c/dd461d6aa894761fe67c30ddf81eec0d08be216b commit] * IB/addr: Pass network namespace as a parameter [https://git.kernel.org/torvalds/c/565edd1d555513ab5d67a847d50d7c14c82ef6c3 commit] * IB/cma: Add support for network namespaces [https://git.kernel.org/torvalds/c/fa20105e09e97e81aadf02f722c31195e4a75c84 commit] * IB/cma: Separate port allocation to network namespaces [https://git.kernel.org/torvalds/c/4be74b42a6d05a74a21362010cd3920fa17f63c7 commit] * IB/core: Add support of checksum capability reporting for RC and RAW [https://git.kernel.org/torvalds/c/470a55358186d0bb93558a87d13159dfbc989351 commit] * bpf, seccomp: prepare for upcoming criu support [https://git.kernel.org/torvalds/c/bab18991871545dfbd10c931eb0fe8f7637156a9 commit] * bpf: add support for persistent maps/progs [https://git.kernel.org/torvalds/c/b2197755b2633e164a439682fb05a9b5ea48f706 commit] * cfg80211: allow changing station capabilities for unassociated stations [https://git.kernel.org/torvalds/c/47edb11b522561658fe719e56aa69a3c3098a3fe commit] * cfg80211: reg: make CRDA support 
optional [https://git.kernel.org/torvalds/c/b68630369167a7fd2c4c3d1be96430defc59fb9a commit] * mac80211: advertise support for full station state in AP mode [https://git.kernel.org/torvalds/c/44674d9c2267f454f38df7b2395939bfa911f92e commit] * mac80211: allow the driver to advertise A-MSDU within A-MPDU Rx support [https://git.kernel.org/torvalds/c/99e7ca44bb910f0cbfda5d9008e8517df0ebc939 commit] * mac80211: allow to transmit A-MSDU within A-MPDU [https://git.kernel.org/torvalds/c/e3abc8ff0fc18b3925fd5d5c5fbd1613856f4e7c commit] * openvswitch: netlink attributes for IPv6 tunneling [https://git.kernel.org/torvalds/c/6b26ba3a7d952e611dcde1f3f77ce63bcc70540a commit] * switchdev: Add support for flood control [https://git.kernel.org/torvalds/c/741af0053b43d8b9a688a12c57ece62338616ae8 commit] * switchdev: Make flood to CPU optional [https://git.kernel.org/torvalds/c/371e59adcebf9953385bf46d5325ac39a53c5520 commit] * tipc: introduce capability bit for broadcast synchronization [https://git.kernel.org/torvalds/c/fd556f209af53b9cdc45df8c467feb235376c4df commit] * tipc: introduce jumbo frame support for broadcast [https://git.kernel.org/torvalds/c/959e1781aa230aecc90e4deb80117fd9a53dede7 commit] |
* Lockless TCP listener [[https://git.kernel.org/torvalds/c/4d54d86546f62c7c4a0fe3b36a64c5e3b98ce1a9|commit]], [[https://git.kernel.org/torvalds/c/e6934f3ec00b04234acb24a1a2c28af59763d3b5|commit]], [[https://git.kernel.org/torvalds/c/c3fc7ac9a0b978ee8538058743d21feef25f7b33|commit]] * Add setsockopt() support for SO_INCOMING_CPU and extend the SO_REUSEPORT selection logic: if a TCP listener or UDP socket has this option set, a packet is delivered to this socket only if the CPU handling the packet matches the specified one. This allows building very efficient TCP servers that use one listener per RX queue, as the associated TCP listener should only accept flows handled in softirq by the same CPU. This provides optimal NUMA behavior and keeps CPU caches hot [[https://git.kernel.org/torvalds/c/76973dd79fd52f187ba3df018bca65792a3d942|commit]], [[https://git.kernel.org/torvalds/c/70da268b569d32a9fddeea85dc18043de9d89f89|commit]] * TCP: Recent ACK (RACK) loss recovery. RACK loss recovery uses the notion of time instead of packet sequence (FACK) or counts (dupthresh) (see commit for details). RACK is being developed to replace or consolidate FACK/dupthresh, early retransmit, and thin-dupack, but since it is still experimental, in the current patch set it is only a supplemental loss detection on top of the existing algorithms and does not trigger fast recovery. It can be disabled with sysctl ''net.ipv4.tcp_recovery'' [[https://git.kernel.org/torvalds/c/eb9fae328faff9807a4ab5c1834b19f34dd155d4|commit]] * IP Virtual Server: Support scheduling of ICMP packets to IPVS instances. 
A new sysctl ''net.ipv4.vs.schedule_icmp'' has been introduced that enables this feature if set to 1 (it is set to 0 by default to retain the old behaviour) [[https://git.kernel.org/torvalds/c/99cb99aa055a72d3880d8a95a71034c4d64bcf9a|merge commit]] * IP Virtual Server: Allow ignoring tunnelled packets with the new sysctl ''net.ipv4.vs.ignore_tunneled''. If set, ipvs will set the ipvs_property on all packets which are of unrecognised protocols. This prevents the kernel from routing tunnelled protocols like ipip, which is useful to prevent rescheduling packets that have been tunnelled to the ipvs host (i.e. to prevent ipvs routing loops when ipvs is also acting as a real server) [[https://git.kernel.org/torvalds/c/4e478098ac0ac1b6ef9a70fcdc2ec8b93f1b59a1|commit]] * Provide FIB table ID in ipv4 route dumps just as ipv6 does [[https://git.kernel.org/torvalds/c/b7503e0cdb5dbec5d201aa69d8888c14679b5ae8|commit]] * IPv4: Hash-based multipath routing. When the routing cache was [[http://kernelnewbies.org/Linux_3.6#head-85de5e5247a939f2a61d0c5ccbc13ff5b4f1a6a0|removed in 3.6]], the IPv4 multipath algorithm changed from more or less being destination-based into being quasi-random per-packet scheduling. This increased the risk of out-of-order packets and made it impossible to use multipath together with anycast services. 
In this release, the multipath routing implementation is replaced with flow-based load balancing based on a hash over the source and destination addresses [[https://git.kernel.org/torvalds/c/07355737a8badd951e6b72aa8609a2d6eed0a7e7|merge commit]] * Add IPv6 support to the Virtual Routing and Forwarding (VRF) devices [[https://git.kernel.org/torvalds/c/ccf3c8c3fe1bd4828556650ae7928da6ffb4aaf6|commit]], [[https://git.kernel.org/torvalds/c/35402e31366349a32b505afdfe856aeeb8d939a0|commit]], [[https://git.kernel.org/torvalds/c/ca254490c8dfdaddb5df8a763774db0f4c5200c3|commit]] * IPv4: Until now, adding a new IPv4 address always created the related network route with the default metric. This release adds support for the IFA_F_NOPREFIXROUTE flag for IPv4 addresses. When an address is added with this flag set, no associated network route is created, no network route is deleted when the address is removed, and it is up to user space to manage such routes [[https://git.kernel.org/torvalds/c/7b1311807f3d3eb8bef3ccc53127838b3bea3771|commit]] * IPv6: gro: support sit protocol [[https://git.kernel.org/torvalds/c/feec0cb3f20b837f8ca36e974267918d7a4497f8|commit]] * Allow the user to ask for the statistics to be filtered out of ipv4/ipv6 address netlink dumps, because many commonly used functions like getifaddrs() invoke RTM_GETLINK to dump the interface information, and do not need the AF_INET6 statistics, which are expensive to calculate [[https://git.kernel.org/torvalds/c/d5566fd72ec1924958fcfd48b65c022c8f7eae64|commit]] * bridge: Allow setting the bridge attribute ''ageing_time'' in rocker and switchdev [[https://git.kernel.org/torvalds/c/c62987bbd8a1a1664f99e89e3959339350a6131e|commit]], [[https://git.kernel.org/torvalds/c/d0cf57f9dddb50ea404bf747a3c6b22b29f82b9a|commit]], [[https://git.kernel.org/torvalds/c/f55ac58ae64cbb0315382e738681fe31837dcac0|commit]] * vxlan: support both IPv4 and IPv6 sockets in a single vxlan device 
[[https://git.kernel.org/torvalds/c/b1be00a6c39fda2ec380e168d7bcf96fb8c9da42|commit]] * bridge: complete the bridge device's netlink support and makes it possible to view and configure everything that can be configured via sysfs [[https://git.kernel.org/torvalds/c/3e087caa23ef36370bfb925d3bbca78e8302d3ce|commit]] * bridge: Enable adding fdb entries pointing to the bridge device. This can be used to propagate mac address of vlan interfaces configured on top of the vlan filtering bridge [[https://git.kernel.org/torvalds/c/3741873b4f73b572b8f8835e6bd114e08316a160|commit]] * Multi Protocol Label Switching (MPLS): Add support for multipath routes [[https://git.kernel.org/torvalds/c/1c78efa8319cad2f10f421afa627745fb4d9b29f|commit]], [[https://git.kernel.org/torvalds/c/f8efb73c97e2fa0abbe2e07c5c5df07800312643|commit]] * bonding: support encapsulated ipv6 TSO [[https://git.kernel.org/torvalds/c/e87eb4051efe76b35d0a297db772f5964a001544|commit]] * Add support for filtering neighbor dumps by master device by adding the NDA_MASTER attribute to the dump request. A new netlink flag, NLM_F_DUMP_FILTERED, is added to indicate the kernel supports the request and output is filtered as requested [[https://git.kernel.org/torvalds/c/21fdd092acc7ebda0dfe682008592eb79c382707|commit]] * Add support for filtering neighbor dumps by device by adding the NDA_IFINDEX attribute to the dump request [[https://git.kernel.org/torvalds/c/16660f0bd942cec203eaf4de0e2ac1695bd9d32d|commit]] * Support for disabling certain features on devices which, when disabled on an upper device, such as a bonding master or a bridge, must be disabled and cannot be re-enabled on underlying devices [[https://git.kernel.org/torvalds/c/fd867d51f889aec11cca235ebb008578780d052d|commit]] * Introduce L3 Master device abstraction support. 
It provides glue between core networking code and device drivers to support L3 master devices like VRF [[https://git.kernel.org/torvalds/c/1b69c6d0ae90b7f1a4f61d5c8209d5cb7a55f849|commit]] * dummy: add more features [[https://git.kernel.org/torvalds/c/8f3af27786913851e720bc9466d1abffcfa7aff6|commit]] * tso: add support for IPv6 [[https://git.kernel.org/torvalds/c/8941faa161b526199e55ca7764cf875383453612|commit]] * netfilter: nfnetlink_log: enables to include the conntrack information together with the packet that is sent to user-space via NFLOG, then a user-space program can acquire NATed information by this NFULA_CT attribute [[https://git.kernel.org/torvalds/c/a29a9a585b2840a205f085a34dfd65c75e86f7c3|commit]] * Wireless * Allow changing station capabilities for unassociated stations [[https://git.kernel.org/torvalds/c/47edb11b522561658fe719e56aa69a3c3098a3fe|commit]] * Implement Very High Throughput support for mesh networks [[https://git.kernel.org/torvalds/c/c85fb53c4fa6521352028c40ce096a808aabd389|commit]] * Make CRDA support optional [[https://git.kernel.org/torvalds/c/b68630369167a7fd2c4c3d1be96430defc59fb9a|commit]] * Advertise support for full station state in AP mode [[https://git.kernel.org/torvalds/c/44674d9c2267f454f38df7b2395939bfa911f92e|commit]] * Put current TX power in interface info replies [[https://git.kernel.org/torvalds/c/d55d0d598e6610bbfcc1f2ecd6e8af669b94783b|commit]] * Enable wiphy device to suspend/resume asynchronously [[https://git.kernel.org/torvalds/c/9f0e13546ef5773b7059b531a667ec47a5f897ee|commit]] * ieee802154: experimental netlink support [[https://git.kernel.org/torvalds/c/a26c5fd7622d4951425131d54a8c99f076fe2068|commit]] * ieee802154: 6lowpan: add tx/rx stats [[https://git.kernel.org/torvalds/c/1c64f147d3cc9bbafe091a7b335ea3ec700186f0|commit]] * ipconfig: Allow to send Client-identifier in DHCP requests with something like ''ip=dhcp,client_id_type, client_id_value'', as a kernel parameter to enable the kernel to identify itself 
to the server [[https://git.kernel.org/torvalds/c/26fb342c734061859fec1bd9e987bb6b78061ef0|commit]] * Add netlink directives and ndo entry to trust VF user. This controls the special permission of VF user. The administrator will dedicatedly trust VF user to use some features which impacts security and/or performance [[https://git.kernel.org/torvalds/c/dd461d6aa894761fe67c30ddf81eec0d08be216b|commit]] * IB: Add support of checksum capability reporting for RC and RAW [[https://git.kernel.org/torvalds/c/470a55358186d0bb93558a87d13159dfbc989351|commit]] * IB: Add support for network namespaces [[https://git.kernel.org/torvalds/c/565edd1d555513ab5d67a847d50d7c14c82ef6c3|commit]], [[https://git.kernel.org/torvalds/c/fa20105e09e97e81aadf02f722c31195e4a75c84|commit]], [[https://git.kernel.org/torvalds/c/4be74b42a6d05a74a21362010cd3920fa17f63c7|commit]] * openvswitch: Add netlink attributes for IPv6 tunnel addresses. This enables IPv6 support for tunnels [[https://git.kernel.org/torvalds/c/6b26ba3a7d952e611dcde1f3f77ce63bcc70540a|commit]] * switchdev: Add support for flood control [[https://git.kernel.org/torvalds/c/741af0053b43d8b9a688a12c57ece62338616ae8|commit]], [[https://git.kernel.org/torvalds/c/371e59adcebf9953385bf46d5325ac39a53c5520|commit]] * TIPC: introduce jumbo frame support for broadcast [[https://git.kernel.org/torvalds/c/959e1781aa230aecc90e4deb80117fd9a53dede7|commit]] * xprtrdma: Enable swap-on-NFS/RDMA [[https://git.kernel.org/torvalds/c/a045178887ebafa9514d6b4cb840ac13a26c8365|commit]] = List of merges = * [[https://git.kernel.org/torvalds/c/2e002662973fd8d67d5a760776a5d3ea3d3399a9|Merge file descriptor allocation speedup.]] * [[https://git.kernel.org/torvalds/c/66b019967845a5af58802fd9af77f2317e5298a1|Pull hwmon updates ]] * [[https://git.kernel.org/torvalds/c/17a1359034e1fb5cfe9e5196a8ab5153acfacdc6|Pull MMC updates ]] * [[https://git.kernel.org/torvalds/c/9ff3ca58b0f99475f269cd6faf4ab1b194243b3f|Pull EDAC updates ]] * 
[[https://git.kernel.org/torvalds/c/bc9d8c20ffb47e64a41a4716a06d37cdf88fcc42|Pull pin control updates ]] * [[https://git.kernel.org/torvalds/c/e86328c489d7ecdca99410a06a3f448caf7857bf|Pull GPIO updates ]] * [[https://git.kernel.org/torvalds/c/5062ecdb662bf3aed6dc975019c53ffcd3b01d1c|Pull regmap updates ]] * [[https://git.kernel.org/torvalds/c/e8a2a176dd0f3e4c7252b79dc5cea6bd4acaf78e|Pull LED updates ]] * [[https://git.kernel.org/torvalds/c/df91fba5e7ed414b1856b33bbed6d55439a14c47|Pull m68k update ]] * [[https://git.kernel.org/torvalds/c/15f93405aae307a3cb2c33c795286463601963f7|Pull avr32 update ]] * [[https://git.kernel.org/torvalds/c/0921f1efb605d8fda43d794734222d1ad39c6840|Pull CRIS changes ]] * [[https://git.kernel.org/torvalds/c/2c2b8285dcd4d0674b6e77269cf32721fffea59e|Pull ARC updates ]] * [[https://git.kernel.org/torvalds/c/316dde2fe95b33657de1fc2db54bfc16aa065790|Pull ARM updates ]] * [[https://git.kernel.org/torvalds/c/7b2a4306f9e7d64bb408a6df3bb419500578068a|Pull timer updates ]] * [[https://git.kernel.org/torvalds/c/6aa2fdb87cf01d7746955c600cbac352dc04d451|Pull irq updates ]] * [[https://git.kernel.org/torvalds/c/7eeef2abe87dc0d8c276f97ccfdb1f42d9d1e4d8|Pull wchan kernel address hiding ]] * [[https://git.kernel.org/torvalds/c/f5a8160c1e055c0fd8d16a5b3ac97c638365b0db|Pull EFI changes ]] * [[https://git.kernel.org/torvalds/c/281422869942c19f05a08d4017c633d08d390938|Pull RCU changes ]] * [[https://git.kernel.org/torvalds/c/d63a9788650fcd999b34584316afee6bd4378f19|Pull locking changes ]] * [[https://git.kernel.org/torvalds/c/b02ac6b18cd4e2c76bf0a102c20c429b973f5f76|Pull perf updates ]] * [[https://git.kernel.org/torvalds/c/b831ef2cad979912850e34f82415c0c5d59de8cb|Pull RAS changes ]] * [[https://git.kernel.org/torvalds/c/53528695ff6d8b77011bc818407c13e30914a946|Pull scheduler changes ]] * [[https://git.kernel.org/torvalds/c/d2bea739f8b41d620c235d81e00289d01169dc3c|Pull x86 APIC changes ]] * 
[[https://git.kernel.org/torvalds/c/a75a3f6fc92888e4119744d8594ffdf748c3d444|Pull x86 asm changes ]] * [[https://git.kernel.org/torvalds/c/378e4e98258ad92097bfdf795dbef8b49cf52a34|Pull x86 boot cleanup ]] * [[https://git.kernel.org/torvalds/c/33d46f9765901a08d7759c031779073263e8b4e3|Pull x86 cleanups ]] * [[https://git.kernel.org/torvalds/c/f323c49b300baf89e2cb4050b0def1856c0b1852|Pull x86 CPU changes ]] * [[https://git.kernel.org/torvalds/c/0f25f2c1b18f7e47279ec2cf1d24c11c3108873b|Pull x86 kgdb fixlet ]] * [[https://git.kernel.org/torvalds/c/ce4d72fac16a9540452957b526443b6080030bff|Pull x86 FPU changes ]] * [[https://git.kernel.org/torvalds/c/4302d506d5f3419109abdd0d6e400ed6e8148209|Pull x86 sigcontext header cleanups ]] * [[https://git.kernel.org/torvalds/c/639ab3eb38c6e92e27e061551dddee6dd3bbb5d2|Pull x86 mm changes ]] * [[https://git.kernel.org/torvalds/c/66ef3493d4bb387f5a83915e33dc893102fd1b43|Pull x86 platform changes ]] * [[https://git.kernel.org/torvalds/c/ccc9d4a6d640cbde05d519edeb727881646cf71b|Pull crypto update ]] * [[https://git.kernel.org/torvalds/c/b0f85fa11aefc4f3e03306b4cd47f113bd57dcba|Pull networking updates ]] * [[https://git.kernel.org/torvalds/c/1b1050cdc5cdde43177b375b5f22dc070d45d8f8|Pull IDE fixlet ]] * [[https://git.kernel.org/torvalds/c/14c79092909a52b6fd6394b6ad5e7756c4f9565e|Pull parisc updates ]] * [[https://git.kernel.org/torvalds/c/e627078a0cbdc0c391efeb5a2c4eb287328fd633|Pull s390 updates ]] * [[https://git.kernel.org/torvalds/c/2dc10ad81fc017837037e60439662e1b16bdffb9|Pull arm64 updates ]] * [[https://git.kernel.org/torvalds/c/41ecf1404b34d9975eb97f5005d9e4274eaeb76a|Pull xen updates ]] * [[https://git.kernel.org/torvalds/c/0d51ce9ca1116e8f4dc87cb51db8dd250327e9bb|Pull power management and ACPI updates ]] * [[https://git.kernel.org/torvalds/c/d9734e0d1ccf87e828ad172c58a96dff97cfc0ba|Pull core block updates ]] * [[https://git.kernel.org/torvalds/c/a9aa31cdc2a7be4a70b0ea24a451dfeb00ce0024|Pull block driver updates ]] * 
[[https://git.kernel.org/torvalds/c/effa04cc5a31b3f12cda6025ab93460f1f0e454e|Pull lightnvm support ]] * [[https://git.kernel.org/torvalds/c/527d1529e38b36fd22e65711b653ab773179d9e8|Pull block integrity updates ]] * [[https://git.kernel.org/torvalds/c/ccf21b69a83afaee4d5499e0d03eacf23946e08c|Pull block reservation support ]] * [[https://git.kernel.org/torvalds/c/ac322de6bf5416cb145b58599297b8be73cd86ac|Pull md updates ]] * [[https://git.kernel.org/torvalds/c/e0700ce70921fbe3d1913968c663beb9df2b01a9|Pull device mapper updates ]] * [[https://git.kernel.org/torvalds/c/3d6f47801c34e42da26e2b6b29706f0bfe423978|Pull USB updates ]] * [[https://git.kernel.org/torvalds/c/fd0d351de7bbd718bc2b34d5846854831aa2b88c|Pull tty/serial driver updates ]] * [[https://git.kernel.org/torvalds/c/118c216e16c5ccb028cd03a0dcd56d17a07ff8d7|Pull staging driver updates ]] * [[https://git.kernel.org/torvalds/c/e880e87488d5bbf630dd716e6de8a53585614568|Pull driver core updates ]] * [[https://git.kernel.org/torvalds/c/8e483ed1342a4ea45b70f0f33ac54eff7a33d918|Pull char/misc driver updates ]] * [[https://git.kernel.org/torvalds/c/9576c2f2934eb5839a468ae156418ef595d5fec6|Pull file locking updates ]] * [[https://git.kernel.org/torvalds/c/d000f8d67f2bb464c9cf4fb5103f78d8cb406c05|Pull dlm update ]] * [[https://git.kernel.org/torvalds/c/0fcb9d21b4e18ede3727b8905e74acd0d1daef56|Pull f2fs updates ]] * [[https://git.kernel.org/torvalds/c/66339fdacb63fc7908e7eb755b9fffa672ffbb10|Pull pstore updates ]] * [[https://git.kernel.org/torvalds/c/b0378657549bbc73ac0ec6e9332fcf3c53362365|Pull media updates ]] * [[https://git.kernel.org/torvalds/c/9bd9fa6c147e68fc4dc3b35893979720ba7d0321|Pull HSI updates ]] * [[https://git.kernel.org/torvalds/c/400c5bd5a5b1faf3089322ace58b974446a8ddc3|Pull power supply and reset updates ]] * [[https://git.kernel.org/torvalds/c/f66477a0aeb77f97a7de5f791700dadc42f3f792|Pull clk updates ]] * [[https://git.kernel.org/torvalds/c/52787e91bf5375e68e90f381bd157bd92e1f4a77|Pull regulator 
updates ]] * [[https://git.kernel.org/torvalds/c/75f5db39ff14ed95056f2cca3ad98c3cae97170c|Pull SPU updates ]] * [[https://git.kernel.org/torvalds/c/e25ac7ddaae0e798f794cdaf9109bc71246110cd|Pull workqueue update ]] * [[https://git.kernel.org/torvalds/c/11eaaadb3ea376c6c194491c2e9bddd647f9d253|Pull libata updates ]] * [[https://git.kernel.org/torvalds/c/69234acee54407962a20bedf90ef9c96326994b5|Pull cgroup updates ]] * [[https://git.kernel.org/torvalds/c/6de29ccb50f2caef07cdd888efc8cb933497b6a4|Pull userns hardlink capability check fix ]] * [[https://git.kernel.org/torvalds/c/3460b01b12aaf0011cb30f6f502edd05752f70eb|Pull audit updates ]] * [[https://git.kernel.org/torvalds/c/1873499e13648a2dd01a394ed3217c9290921b3d|Pull security subsystem update ]] * [[https://git.kernel.org/torvalds/c/5ebe0ee802c52cdf0c0eed8f3eccc9a056e412a3|Pull documentation update ]] * [[https://git.kernel.org/torvalds/c/ab1228e42e71f5cb687c740c4c304f1d48bcf68a|Pull Intel IOMMU updates ]] * [[https://git.kernel.org/torvalds/c/39cf7c398122ff6d7df13d2832810933d227ac59|Pull IOMMU updates ]] * [[https://git.kernel.org/torvalds/c/a3e7531535a0c6e5acbaa5436f37933bb471aa95|Pull SCSI updates ]] * [[https://git.kernel.org/torvalds/c/933425fb0010bd02bd459b41e63082756818ffce|Pull KVM updates ]] * [[https://git.kernel.org/torvalds/c/2c302e7e41050dbc174d50b58ad42eedf5dbd6fa|Pull sparc updates ]] * [[https://git.kernel.org/torvalds/c/2e3078af2c67730c479f1d183af5b367f5d95337|Merge patch-bomb ]] * [[https://git.kernel.org/torvalds/c/2f4bf528eca5b2d9eef12b6d323c040254f8f67c|Pull powerpc updates ]] * [[https://git.kernel.org/torvalds/c/d1e41ff11941784f469f17795a4d9425c2eb4b7a|Pull x86 platform driver update ]] * [[https://git.kernel.org/torvalds/c/bc914532a08892b30954030a0ba68f8534c67f76|Pull MFD updates ]] * [[https://git.kernel.org/torvalds/c/5bc23a0cdee4a6757fcc2919eb26827fe11e3bee|Pull backlight updates ]] * [[https://git.kernel.org/torvalds/c/0280d1a099da1d211e76ec47cc0944c993a36316|Pull sound updates ]] * 
[[https://git.kernel.org/torvalds/c/3c87b791880a2e0dad281c6494b94968d412bfa3|Pull PCI updates ]] * [[https://git.kernel.org/torvalds/c/2d49dcb9e48f65a69281fe4c698c8f1a20215daf|Pull mailbox updates ]] * [[https://git.kernel.org/torvalds/c/02f0d3f758ab456c50199b723a53f2443fa4f684|Pull MTD updates ]] * [[https://git.kernel.org/torvalds/c/3e069adabc9487b5e28065a17e6a228da3412dfd|Pull input updates ]] * [[https://git.kernel.org/torvalds/c/9bbd4b9f38f56b4ee2c8ff268a1104ff38333e90|Pull DeviceTree updates ]] * [[https://git.kernel.org/torvalds/c/22402cd0af685c1a5d067c87db3051db7fff7709|Pull tracking updates ]] * [[https://git.kernel.org/torvalds/c/9cf5c095b65da63c08b928a7d0015d5d5dca8a66|Pull asm-generic cleanups ]] * [[https://git.kernel.org/torvalds/c/713009809681e5a7871e96e6992692c805b4480b|Pull ext4 updates ]] * [[https://git.kernel.org/torvalds/c/27eb427bdc0960ad64b72da03e3596c801e7a9e9|Pull Btrfs updates ]] * [[https://git.kernel.org/torvalds/c/6f1da317ac1df15f442b5fd37be7740c7cb55057|Pull HID updates ]] * [[https://git.kernel.org/torvalds/c/75021d28594d9b6fb4d05bbc41f77948a0db0e02|Pull trivial updates ]] * [[https://git.kernel.org/torvalds/c/ab9f2faf8f40604551336e5b0a18e0910a57b92c|Pull RDMA updates ]] * [[https://git.kernel.org/torvalds/c/ad804a0b2a769a0eed29015c53fe395449c09d13|Merge second patch-bomb ]] * [[https://git.kernel.org/torvalds/c/50c36504fc6090847f1fbdc7cf4852ae16d6e500|Pull module updates ]] * [[https://git.kernel.org/torvalds/c/e4da7e9a54649d6877ac23828ff93ce7191eae2c|Pull m68knommu/coldfire fix ]] * [[https://git.kernel.org/torvalds/c/3510ca19a82ba4c6a17af79c1f0448622a406efa|Pull xtensa updates ]] * [[https://git.kernel.org/torvalds/c/f4d68930a88219ffda60f137dcc858e4f5db6680|Pull nios2 updates ]] * [[https://git.kernel.org/torvalds/c/373ee21eecebc5c06786a803d99661a3657afcc7|Pull parisc updates ]] * [[https://git.kernel.org/torvalds/c/123a28d8b522b03dd97c1f791245924088616ac0|Pull ext2 fix ]] * 
[[https://git.kernel.org/torvalds/c/9d74288ca79249af4b906215788b37d52263b58b|Pull gfs2 updates ]] * [[https://git.kernel.org/torvalds/c/e6604ecb70d4b1dbc0372c6518b51c25c4b135a1|Pull NFS client updates ]] * [[https://git.kernel.org/torvalds/c/bd4f203e433387d39be404b67ad02acf6f76b7bc|Merge third patch-bomb ]] * [[https://git.kernel.org/torvalds/c/3e82806b97398d542a5e03bd94861f79ce10ecee|Pull drm updates ]] * [[https://git.kernel.org/torvalds/c/3b13866869b8407497d20a916450594e117583e6|Pull fbdev updates ]] * [[https://git.kernel.org/torvalds/c/7d884710bb3635f94dac152ae226ca54a585a223|Pull RTC updates ]] * [[https://git.kernel.org/torvalds/c/041c79514af9080c75197078283134f538f46b44|Pull dmaengine updates ]] * [[https://git.kernel.org/torvalds/c/6aabef681df96b851b4a11459520d4a20ab1cae4|Pull tiny hwmon update ]] * [[https://git.kernel.org/torvalds/c/42d4ebb42a17754d2e8344dc1aa486119671d0eb|Pull watchdog update ]] * [[https://git.kernel.org/torvalds/c/d55fc37856244c929965c190c8e9dcb49e2c07aa|Pull i2c updates ]] * [[https://git.kernel.org/torvalds/c/264015f8a83fefc62c5125d761fbbadf924e520c|Pull libnvdimm updates ]] * [[https://git.kernel.org/torvalds/c/a5e1d715a8d0696961d99d31d869aa522f1cad5a|Pull ARM SoC cleanups ]] * [[https://git.kernel.org/torvalds/c/56e0464980febfa50432a070261579415c72664e|Pull ARM SoC platform updates ]] * [[https://git.kernel.org/torvalds/c/b44a3d2a85c64208a57362a1728efb58a6556cd6|Pull ARM SoC driver updates ]] * [[https://git.kernel.org/torvalds/c/c0d6fe2f01c475cc137d90607a07578586883df8|Pull ARM DT updates ]] * [[https://git.kernel.org/torvalds/c/52e9a33333fc337d03ffb865048f9ccae8552a8d|Pull ARM SoC defconfig updates ]] * [[https://git.kernel.org/torvalds/c/c6de7f1754bd474019c60d6f076fa3f704e46b78|Pull metag arch updates ]] * [[https://git.kernel.org/torvalds/c/4bde961e5245bb37dab4831107bbed23e433d55a|Pull UML updates ]] * [[https://git.kernel.org/torvalds/c/01504f5e9e071f1dde1062e3be15f54d4555308f|Pull UBI/UBIFS updates ]] * 
[[https://git.kernel.org/torvalds/c/3419b45039c6b799c974a8019361c045e7ca232c|Pull block IO poll support ]] * [[https://git.kernel.org/torvalds/c/6a177af775d92cff7ef36a681c304dc750dbe121|Pull kselftest updates ]] * [[https://git.kernel.org/torvalds/c/c34e6e0bd5d729948119d4b3e15b075ec0b80d6f|Pull kbuild update ]] * [[https://git.kernel.org/torvalds/c/152813e6e4bbb5f017e33eba7eb01bbda4b389b8|Pull kconfig updates ]] * [[https://git.kernel.org/torvalds/c/5dfe5b2c714a5bea0908c1e00da0e8e00535f55c|Pull misc kbuild updates ]] * [[https://git.kernel.org/torvalds/c/c5a37883f42be712a989e54d5d6c0159b0e56599|Merge final patch-bomb ]] * [[https://git.kernel.org/torvalds/c/baf51c43926ec9aa42ef9d33ca6ee9e3e043aebe|Pull thermal updates ]] * [[https://git.kernel.org/torvalds/c/c8fff3ed321abf11bea7464884b0876c46ff2491|Pull pwm updates ]] * [[https://git.kernel.org/torvalds/c/842cf0b9525813b084720a82d0d3aabc750b7ccc|Pull vfs update ]] * [[https://git.kernel.org/torvalds/c/31c1febd7a45229edb3e5d86f354e3c1df543cbb|Pull nfsd updates ]] * [[https://git.kernel.org/torvalds/c/5d50ac70fe98518dbf620bfba8184254663125eb|Pull xfs updates ]] * [[https://git.kernel.org/torvalds/c/be23c9d20b341a58ad7107f9e9aa5735cea3da13|Pull more power management and ACPI updates ]] * [[https://git.kernel.org/torvalds/c/3370b69eb0c1f6a05f9051e8fc3e8768461a80f7|Pull second batch of kvm updates ]] * [[https://git.kernel.org/torvalds/c/7dac7102afbeb99daa454f555f1ea1f42fad2f78|Pull h8300 updates ]] * [[https://git.kernel.org/torvalds/c/0e976064256523ca604bd82048ae0e3402ce2467|Pull trace cleanups ]] * [[https://git.kernel.org/torvalds/c/be4773e6a11a0cc1e63c9c32f000b870e51b8c01|Pull drm sti driver updates ]] * [[https://git.kernel.org/torvalds/c/4aeabc6b5ca3b9d025f287978096e138bdfbdd35|Pull more documentation updates ]] * [[https://git.kernel.org/torvalds/c/ca4ba96e02e932a0c9997a40fd51253b5b2d0f9d|Pull Ceph updates ]] * [[https://git.kernel.org/torvalds/c/f3996e6ac6e2bd739d8a82cc9acae0653c2d5dca|Pull SMB3 updates ]] * 
[[https://git.kernel.org/torvalds/c/934f98d7e8123892bd9ca8ea08728ee0784e6597|Pull VFIO updates ]] * [[https://git.kernel.org/torvalds/c/5d2eb548b309be34ecf3b91f0b7300a2b9d09b8c|Pull vfs xattr cleanups ]] * [[https://git.kernel.org/torvalds/c/9aa3d651a9199103eb6451aeb0ac1b66a6d770a6|Pull SCSI target updates ]] * [[https://git.kernel.org/torvalds/c/d83763f4a6adb2f417c3288ee903982985ae949c|Pull final round of SCSI updates ]] * [[https://git.kernel.org/torvalds/c/a30b7ca2894994e4e2f2e06811ee67fa637bca2e|Pull more input updates ]] * [[https://git.kernel.org/torvalds/c/4bfc89d26a0d177a79574fc1b54fc728e3bb8b4e|Pull another x86 platform driver update ]] * [[https://git.kernel.org/torvalds/c/63f4f7e8df6c504f39c6493799b54775916030d6|Pull chrome platform updates ]] * [[https://git.kernel.org/torvalds/c/b84da9fa47cf6e8dfd71d673a2f744ec1cac452c|Pull MIPS updates ]] * [[https://git.kernel.org/torvalds/c/0ca9b67606f0ce984b5811b0830cfd7d143f6077|Pull perf updates ]] |
 * LWN [[http://lwn.net/Articles/663742/|merge window part 1]] and [[http://lwn.net/Articles/664461/|part 2]]
 * Phoronix [[http://www.phoronix.com/scan.php?page=article&item=linux-44-features&num=1|A Look At The New Features Of The Linux 4.4 Kernel]]
Linux 4.4 has been released on Sun, 10 Jan 2016.
Summary: This release adds 3D support to the virtual GPU driver, which allows 3D hardware-accelerated graphics in virtualization guests; loop device support for Direct I/O and Asynchronous I/O, which saves memory and increases performance; support for Open-channel SSDs, which are devices that share the responsibility of the Flash Translation Layer with the operating system; completely lockless TCP listener handling, which allows for faster and more scalable TCP servers; journalled RAID5 in the MD layer, which fixes the RAID write hole; eBPF programs that can now be run by unprivileged users and made persistent, along with perf support for eBPF programs; a new mlock2() syscall that allows users to request memory to be locked on page fault; and block polling support for improved performance in high-end storage devices. There are also new drivers and many other small improvements.
Contents
- Prominent features
- Faster and leaner loop device with Direct I/O and Asynchronous I/O support
- 3D support in virtual GPU driver
- LightNVM adds support for Open-Channel SSDs
- TCP listener handling completely lockless, making TCP servers faster and more scalable
- Preliminary journalled RAID5 MD support
- Unprivileged eBPF + persistent eBPF programs
- perf + eBPF integration
- Block polling support
- mlock2() syscall allows users to request memory to be locked on page fault
- Drivers and architectures
- Core (various)
- File systems
- Memory management
- Block layer
- Cryptography
- Security
- Tracing and perf tool
- Virtualization
- Networking
- List of merges
- Other news sites
1. Prominent features
1.1. Faster and leaner loop device with Direct I/O and Asynchronous I/O support
This release introduces Direct I/O and asynchronous I/O support for the loop block device. Using Direct I/O and AIO to read and write the loop device's backing file has several advantages: Direct I/O avoids double caching, which reduces memory usage considerably; unlike user-space Direct I/O, there is no cost of pinning pages; and context switches are avoided in some cases because concurrent submissions can be avoided. See the commits for benchmarks.
Code: commit, commit, commit, commit, commit
1.2. 3D support in virtual GPU driver
virtio-gpu is a driver for virtualization guests that allows them to use the host graphics card efficiently. In this release, it allows the virtualization guest to use the capabilities of the host GPU to accelerate 3D rendering. In practice, this means that a virtualized Linux guest can run an OpenGL game while using the GPU acceleration capabilities of the host, as shown in this or this video. This also requires running QEMU 2.5.
44m linux.conf talk about the project
Code: commit
1.3. LightNVM adds support for Open-Channel SSDs
Open-channel SSDs are devices that share responsibilities with the operating system in order to implement and maintain features that typical SSDs keep strictly in firmware. These include the Flash Translation Layer (FTL), bad block management, and hardware units such as the flash controller, the interface controller, and large amounts of flash chips. In this way, Open-channel SSDs expose direct access to their physical flash storage, while keeping a subset of the internal features of SSDs.
LightNVM is a specification that supports Open-channel SSDs. LightNVM allows the host to manage data placement, garbage collection, and parallelism. Device-specific responsibilities such as bad block management, FTL extensions to support atomic IOs, or metadata persistence are still handled by the device. This Linux release adds support for LightNVM (and adds LightNVM support to NVMe as well).
Recommended LWN article: Taking control of SSDs with LightNVM
Code: commit, commit, commit, commit, commit
1.4. TCP listener handling completely lockless, making TCP servers faster and more scalable
In this release, as a result of an effort that started two years ago, the TCP implementation has been refactored to make the TCP listener fast path completely lockless. During tests, a server was able to process 3,500,000 SYN packets per second on one listener and still have available CPU cycles, about 2 to 3 orders of magnitude more than what was possible before. SO_REUSEPORT has also been extended (see Networking section) to add proper CPU/NUMA affinities, so that heavy duty TCP servers can get proper siloing thanks to multi-queue NICs.
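As a rough illustration of the siloing idea, the sketch below opens several listening sockets on the same port with plain SO_REUSEPORT, so that each worker can own its own listener. It does not use the new CPU/NUMA affinity extension mentioned above; the helper name `reuseport_listeners` and its defaults are assumptions made for this example.

```python
import socket


def reuseport_listeners(count, host="127.0.0.1", port=0):
    """Open `count` TCP listeners that all share one port via SO_REUSEPORT.

    Hypothetical helper for illustration. With port=0 the kernel picks a
    free port for the first socket and the others reuse it.
    """
    socks = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            socks.append(s)
            # The option must be set on every socket before bind().
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            s.bind((host, port))
            s.listen(128)
            port = s.getsockname()[1]
    except OSError:
        for s in socks:
            s.close()
        raise
    return socks
```

In a real server each listener would typically be handed to a separate worker process or thread, and the kernel spreads incoming connections across them.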
1.5. Preliminary journalled RAID5 MD support
This release adds journalled RAID 5 support to the MD (RAID/LVM) layer. With a journal device configured (typically NVRAM or SSD), data and parity writes to the RAID array are first written to the log and then written to the RAID array disks. If a crash happens, the data can be recovered from the log. This can speed up RAID resync and fixes the RAID5 write hole issue: a crash during degraded operation cannot result in data corruption. In future releases the journal will also be used to improve performance and latency.
Code: merge
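The write hole and the way a journal closes it can be sketched with a toy model. This is not the MD implementation: blocks are plain integers, parity is their XOR, and the class `ToyStripe` and its methods are hypothetical names used only for illustration.

```python
from functools import reduce


def xor_parity(blocks):
    """Parity of a stripe: XOR of all data blocks."""
    return reduce(lambda a, b: a ^ b, blocks)


class ToyStripe:
    """One stripe with in-memory 'disks' and an optional journal record."""

    def __init__(self, data):
        self.data = list(data)
        self.parity = xor_parity(self.data)
        self.journal = None  # pending (data, parity) record, if any

    def consistent(self):
        return self.parity == xor_parity(self.data)

    def write(self, index, value, journaled, crash_before_parity=False):
        new_data = list(self.data)
        new_data[index] = value
        new_parity = xor_parity(new_data)
        if journaled:
            self.journal = (new_data, new_parity)  # 1. log data and parity first
        self.data = new_data                       # 2. write the data disk
        if crash_before_parity:
            return                                 # simulated crash: parity is stale
        self.parity = new_parity                   # 3. write the parity disk
        self.journal = None                        # 4. retire the log record

    def recover(self):
        """Replay the journal after a crash, if a record is pending."""
        if self.journal is not None:
            self.data, self.parity = list(self.journal[0]), self.journal[1]
            self.journal = None
```

Without the journal, a crash between steps 2 and 3 leaves data and parity out of sync, and a later rebuild from parity would produce wrong data. With the journal, replaying the record restores a consistent stripe.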
1.6. Unprivileged eBPF + persistent eBPF programs
Unprivileged eBPF
eBPF programs got their own syscall in Linux 3.18, but until now its use had been restricted to root, because these programs were dangerous for security. eBPF programs are, however, validated by the kernel, and in this release the eBPF verifier has been improved so that unprivileged users can use it (although unprivileged eBPF is only meaningful for 'socket filter'-like programs; eBPF programs for tracing and TC classifiers/actions will stay root only). This feature can be switched off with the sysctl kernel.unprivileged_bpf_disabled (once true, bpf programs and maps cannot be accessed from unprivileged processes, and the toggle cannot be set back to false).
Recommended LWN article: Unprivileged bpf()
Persistent eBPF maps/progs
This release also adds support for "persistent" eBPF maps/programs. The term "persistent" means that maps/programs have a facility that lets them survive process termination. This is desired by various eBPF subsystem users, for example the tc classifier/action. Whenever tc parses the ELF object and extracts and loads maps/progs into the kernel, these file descriptors are out of reach after the tc instance exits, so a subsequent tc invocation cannot access/relocate on this resource, and therefore maps cannot easily be shared, e.g. between the ingress and egress networking data paths.
To fix issues such as these, a new minimal file system has been created that can hold map/prog objects at /sys/fs/bpf/. Any subsequent mounts within a given namespace will point to the same instance. The file system allows for creating a user-defined directory structure. The objects for maps/progs are created/fetched through bpf(2) along with a pathname with two new commands (BPF_OBJ_PIN/BPF_OBJ_GET), which in turn create the file system nodes. The user can use that to access maps and progs later on, through bpf(2).
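A minimal sketch of how the two commands might be invoked from user space through the raw bpf(2) syscall is shown below. The syscall numbers are assumptions that cover only x86_64 and aarch64, the helper names are hypothetical, and pinning needs a mounted bpf file system plus a valid map or program file descriptor, so treat this as an outline rather than a reference implementation.

```python
import ctypes
import errno
import os
import platform
import sys

# Assumption: raw syscall numbers for bpf(2) on these two architectures only.
BPF_SYSCALL_NR = {"x86_64": 321, "aarch64": 280}
BPF_OBJ_PIN = 6  # command values from the kernel's enum bpf_cmd
BPF_OBJ_GET = 7


class BpfObjAttr(ctypes.Structure):
    """The part of union bpf_attr used by the BPF_OBJ_* commands."""
    _fields_ = [("pathname", ctypes.c_uint64),  # user pointer to a C string
                ("bpf_fd", ctypes.c_uint32)]    # fd to pin (unused for GET)


def _bpf_obj(cmd, path, fd=0):
    nr = BPF_SYSCALL_NR.get(platform.machine())
    if not sys.platform.startswith("linux") or nr is None:
        raise OSError(errno.ENOSYS, "bpf(2) number unknown on this platform")
    libc = ctypes.CDLL(None, use_errno=True)
    path_buf = ctypes.create_string_buffer(os.fsencode(path))
    attr = BpfObjAttr(pathname=ctypes.addressof(path_buf), bpf_fd=fd)
    ret = libc.syscall(ctypes.c_long(nr), ctypes.c_int(cmd),
                       ctypes.byref(attr), ctypes.c_uint(ctypes.sizeof(attr)))
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return ret


def bpf_obj_pin(fd, path):
    """Pin the map/prog behind `fd` at `path` (e.g. under /sys/fs/bpf/)."""
    _bpf_obj(BPF_OBJ_PIN, path, fd)


def bpf_obj_get(path):
    """Return a new file descriptor for the object pinned at `path`."""
    return _bpf_obj(BPF_OBJ_GET, path)
```

A loader would call `bpf_obj_pin()` once after creating a map, and any later process could call `bpf_obj_get()` with the same path to share it.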
1.7. perf + eBPF integration
In this release, eBPF programs have been integrated with perf. When perf is given an eBPF .c source file (or a .o file built for the 'bpf' target with clang), it is automatically built, validated and loaded into the kernel, and can then be used and seen using perf trace and other tools.
Users can attach a BPF filter with a command like: # perf record --event ./hello_world.o ls; the eBPF program is attached to a newly created perf event, which works with all tools.
Code: commit, commit, commit, commit, commit, commit, commit, commit, commit, commit
1.8. Block polling support
This release adds basic support for polling for the completion of specific I/O, which can improve latency and throughput on very fast devices. Currently, O_DIRECT synchronous reads and writes are supported. This support is only intended for testing; in future releases stats tracking will be used to auto-tune this. For now, for benchmark and testing purposes, a sysfs file (io_poll) controls whether polling is enabled or not.
Recommended LWN article: Block-layer I/O polling
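The current state of the toggle can be checked from a script. The sketch below assumes the file lives at /sys/block/<device>/queue/io_poll and holds 0 or 1; the helper name and the `sysfs_root` parameter are hypothetical and exist mainly so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_io_poll(device, sysfs_root="/sys/block"):
    """Return True/False for the device's io_poll setting, or None if the
    file is missing or unreadable (older kernel, unknown device)."""
    # Assumed location of the toggle: <sysfs_root>/<device>/queue/io_poll
    path = Path(sysfs_root) / device / "queue" / "io_poll"
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return None
```

For example, `read_io_poll("nvme0n1")` would report the setting for a device with that name, if one exists on the machine.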
1.9. mlock2() syscall allows users to request memory to be locked on page fault
mlock() allows a user to control page out of program memory, but this comes at the cost of faulting in the entire mapping when it is allocated. For large mappings this is not ideal. For example, security applications that need mlock() are forced to lock an entire buffer, no matter how big it is. Likewise, applications working with large graphical models, where the path through the graph is not known until run time, are forced to lock the entire graph or lock page by page as the pages are faulted in.
The new mlock2() syscall creates a middle ground. Pages are marked to be placed on the unevictable LRU (locked) when they are first used, but they are not faulted in by the mlock call. The new system call takes a flags argument along with the start address and size. This flags argument gives the caller the ability to request memory be locked in the traditional way, or to be locked after the page is faulted in. New calls are added for munlock() and munlockall() which give the caller a way to specify which flags are supposed to be cleared. A new MCL flag is added to mirror the lock on fault behavior from mlock() in mlockall(). Finally, a flag for mmap() is added that allows a user to specify that the covered area should not be paged out, but only after the memory has been used the first time.
Recommended LWN article: Deferred memory locking
Code: commit, commit, commit, commit
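A hedged sketch of calling mlock2() with the lock-on-fault flag follows. It goes through the raw syscall because a libc wrapper may not be available; the syscall numbers are assumptions covering only x86_64 and aarch64, and the helper name `mlock2_onfault` is made up for this example.

```python
import ctypes
import errno
import mmap
import platform
import sys

# Assumption: raw syscall numbers for mlock2(2) on these two architectures only.
MLOCK2_NR = {"x86_64": 325, "aarch64": 284}
MLOCK_ONFAULT = 0x01  # lock pages as they are faulted in, not up front


def mlock2_onfault(length=mmap.PAGESIZE):
    """Map `length` anonymous bytes and request lock-on-fault for them.

    Returns (rc, err): rc is 0 on success or -1 on failure, err is the
    errno value (0 on success). The mapping is released before returning.
    """
    nr = MLOCK2_NR.get(platform.machine())
    if not sys.platform.startswith("linux") or nr is None:
        return -1, errno.ENOSYS
    libc = ctypes.CDLL(None, use_errno=True)
    region = mmap.mmap(-1, length)
    view = ctypes.c_char.from_buffer(region)
    try:
        rc = libc.syscall(ctypes.c_long(nr),
                          ctypes.c_void_p(ctypes.addressof(view)),
                          ctypes.c_size_t(length),
                          ctypes.c_int(MLOCK_ONFAULT))
        err = ctypes.get_errno() if rc != 0 else 0
        return rc, err
    finally:
        del view        # drop the buffer export so the mapping can be closed
        region.close()  # unmapping also drops the lock
```

On kernels older than 4.4 the call is expected to fail with ENOSYS; a real program would keep the mapping alive and touch pages as needed, at which point each touched page becomes locked.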
2. Drivers and architectures
All the driver and architecture-specific changes can be found in the Linux_4.4-DriversArch page
3. Core (various)
process scheduler: Apply a frequency scaling correction factor to per-entity load tracking to make it invariant with respect to CPU frequency. Currently, load appears bigger when the CPU is running at slower frequencies, which affects load-balancing decisions commit, commit
seccomp: add support for dumping a process' (classic BPF) seccomp filters via ptrace + PTRACE_SECCOMP_GET_FILTER commit
watchdog: Mimic the softlockup_panic kernel knob and create a /proc/sys/kernel/hardlockup_panic. It enables a hardlockup to panic the machine commit
watchdog: optionally perform all-CPU backtrace in case of hard lockup. Can be enabled with sysctl /proc/sys/kernel/hardlockup_all_cpu_backtrace commit
coredump: Add two new flags to the existing coredump mechanism for ELF and FDPIC ELF files to allow us to explicitly filter DAX mappings. This is desirable because DAX mappings, like hugetlb mappings, have the potential to be very large commit, commit
test_printf: test printf family at runtime commit
Make sync_file_range(2) use WB_SYNC_NONE writeback. It helps PostgreSQL avoid large latency spikes when flushing data in the background commit
4. File systems
- XFS
- Btrfs
- CIFS
Allow duplicate extents (cp --reflink) in SMB3.0 not just SMB3.1.1 commit
Add resilienthandles mount parameter. Since many servers (Windows clients, and non-clustered servers) do not support persistent handles but do support resilient handles, allow the user to specify a mount option "resilienthandles" in order to get more reliable connections and less chance of data loss (at least when SMB2.1 or later). Default resilient handle timeout (120 seconds to recent Windows server) is used commit
Add support for persistent handles, which are like durable file handles with strong guarantees commit, commit, commit
Allow copy offload (copychunk) across shares commit
- NFS
- ext4
Store checksum seed in superblock commit
- OCFS2
Improve performance for localalloc commit
- UBIFS
atime support commit
5. Memory management
Get rid of vmalloc_info from /proc/meminfo. It is too expensive to calculate and shows up in real workloads; people who actually want to know the state of the vmalloc area should just look at the much more complete /proc/vmallocinfo instead commit
Add HugetlbPages field to /proc/PID/status. Currently there's no easy way to get per-process usage of hugetlb pages, which is inconvenient because userspace applications which use hugetlb can need it commit
Add hugetlb-related fields to /proc/PID/smaps to know per-task or per-vma base hugetlb usage: AnonHugePages shows the amount of memory backed by transparent hugepage; Shared_Hugetlb and Private_Hugetlb show the amounts of memory backed by hugetlbfs page which is not counted in RSS or PSS field for historical reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field commit
memcontrol: eliminate memory.current on the root level, because it doesn't add anything that wouldn't be more accurate and detailed using system statistics commit
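The new HugetlbPages field can be read with a few lines of code. The sketch below assumes the usual "Name: value kB" layout of /proc/PID/status; the function name is hypothetical.

```python
def hugetlb_pages_kb(status_text):
    """Return the HugetlbPages value in kB from the text of a
    /proc/PID/status file, or None if the field is absent."""
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key.strip() == "HugetlbPages":
            return int(rest.split()[0])
    return None
```

Typical use would be `hugetlb_pages_kb(open("/proc/self/status").read())`, which returns None on kernels that predate the field.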
6. Block layer
loop: direct and asynchronous I/O commit, commit, commit, commit, commit
Add Persistent Reservations support. It includes a user space interface for simplified Persistent Reservations which map to block devices that support these (only SCSI for now). Persistent Reservations allow restricting access to block devices to specific initiators in a shared storage setup commit, commit, commit
Export integrity data interval size in /sys/block/<disk>/integrity/protection_interval_bytes, so that apps can tell whether the interval is different from the device's logical block size commit
cdrom: Random writing support for BD-RE media commit
7. Cryptography
crypto: caam - add support for acipher xts(aes) commit
crypto: keywrap - add key wrapping block chaining mode commit
crypto: qat - add support for ctr(aes) and xts(aes) commit
8. Security
9. Tracing and perf tool
Integration of perf with eBPF that, given an eBPF .c source file (or .o file built for the 'bpf' target with clang), will get it automatically built, validated and loaded into the kernel via the sys_bpf syscall, which can then be used and seen using 'perf trace' and other tools. Users can run commands like perf record --event bpf-file.c ls to try it commit, commit, commit, commit, commit, commit, commit, commit, commit, commit
Add a new branch type sampling filter to perf record, named 'call' (perf record -j call -e cycles .....), that samples only call branches (function calls), unlike 'any_call' that included direct, indirect calls and far jumps. Only x86 and PowerPC are supported in this release commit, commit
Add Intel cstate (aka idle states) Performance Monitoring Unit support. This allows perf to support cstate related free running (read-only and system-wide) counters. For example, to calculate the fraction of time when the core is running in C6 state: perf stat -x, -e"cstate_core/c6-residency/,msr/tsc/" -C0 -- taskset -c 0 sleep 5 commit
CPU socket filtering: perf tools introduce a new sort type "socket" for the processor socket, e.g. perf report --stdio --sort socket,comm,dso,symbol commit. Also, perf report introduces a --socket-filter option to only show entries for a processor socket that match this filter commit. perf hists browser can zoom in/out for processor socket commit
perf tools: Introduce 'P' modifier, it will cause the event to get maximum possible detected precise level. For example, perf record -e cycles:P ... will detect maximum precise level for 'cycles' event and use it commit
perf tools: Add support for sorting on the iaddr. The new sort option is symbol_iaddr and its header label is 'Code Symbol', e.g. perf mem report --stdio -F +symbol_iaddr commit
perf tools: Enable config terms for tracepoint perf events. Valid terms for tracepoint events are 'call-graph' and 'stack-size', so different callgraph settings can be used for each event, eliminating unnecessary overhead. An example using a different call-graph config for each tracepoint: perf record -e syscalls:sys_enter_write/call-graph=fp/ -e syscalls:sys_exit_write/call-graph=no/ dd if=/dev/zero of=test bs=4k count=10 commit
perf script: Enable printing of the branch stack via the 'brstack' and 'brstacksym' arguments to the field selection option -F. The option is off by default and operates only if the perf.data file has branch stack content commit
perf auxtrace: Add AUX area tracing option 'l' to synthesize branch stacks on samples just like sample type PERF_SAMPLE_BRANCH_STACK commit
perf hists browser: Add 'm' key for context menu display commit
perf inject: Add --strip option which is used with --itrace to strip out non-synthesized events commit
perf script: Allow time to be displayed in nanoseconds commit
Intel PT hardware tracer: Accept a zero --itrace period, meaning "as often as possible". In the case of Intel PT that is the same as a period of 1 and a unit of 'instructions' (i.e. --itrace=i1i) commit
Intel PT: Add support for generating branch stack context for PT samples. This is useful for reporting accurate basic block edge frequencies through the perf report branch view, or for use with --branch-history to get the wider context of samples. For example, to record with Intel PT: perf record -e intel_pt//u ls
ftrace: add module globbing commit
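The C6 residency example in the cstate item above boils down to dividing one free-running counter by another. The following is a minimal Python sketch of that arithmetic, not part of perf. It assumes that `perf stat -x,` prints one CSV line per event with the counter value in the first field and the event name in the third; `c6_fraction` is a hypothetical helper and the sample numbers are invented for illustration.

```python
import csv
import io


def c6_fraction(perf_csv: str) -> float:
    """Return c6-residency / tsc from `perf stat -x,` style CSV text."""
    counts = {}
    for row in csv.reader(io.StringIO(perf_csv)):
        # Assumed layout: value, unit, event name, ...
        # Skip short rows and non-numeric values such as "<not counted>".
        if len(row) < 3 or not row[0].strip().isdigit():
            continue
        counts[row[2].strip()] = int(row[0])
    return counts["cstate_core/c6-residency/"] / counts["msr/tsc/"]


# Hypothetical output, invented for illustration only.
SAMPLE = (
    "9000000000,,cstate_core/c6-residency/,5000000000,100.00\n"
    "12000000000,,msr/tsc/,5000000000,100.00\n"
)

if __name__ == "__main__":
    print(f"C6 fraction: {c6_fraction(SAMPLE):.2f}")
```

With real data, the text passed to `c6_fraction` would be the captured output of the perf stat command quoted in the item; the exact column layout may differ between perf versions, so the field positions should be checked first.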
10. Virtualization
Support for VT-d posted interrupts (i.e. PCI devices can inject interrupts directly into vCPUs). Used by KVM and VFIO commit
KVM: Nested virtualization now supports VPID (same as PCID but for vCPUs) which makes it quite a bit faster commit, commit, commit
KVM: Support for "split irqchip", i.e. LAPIC in kernel and IOAPIC/PIC/PIT in userspace, which reduces the attack surface of the hypervisor commit, commit, commit
KVM: add capability for any-length ioeventfds. With KVM_CAP_IOEVENTFD_ANY_LENGTH, a zero-length ioeventfd is allowed; the kernel then ignores the length of the guest write, which may result in a faster vmexit commit
VMware balloon: Get notified immediately via VMCI when a balloon target is set, instead of waiting for up to one second commit
VMware balloon: Support ballooning with 2 MB sized pages. It significantly reduces the hypervisor side (and guest side) overhead of ballooning and unballooning commit
VMware vmxnet3: Extend register dump support commit
11. Networking
Add setsockopt() support for SO_INCOMING_CPU and extend the SO_REUSEPORT selection logic: if a TCP listener or UDP socket has this option set, a packet is delivered to this socket only if the CPU handling the packet matches the specified one. This makes it possible to build very efficient TCP servers using one listener per RX queue, as the associated TCP listener should only accept flows handled in softirq by the same CPU. This provides optimal NUMA behavior and keeps CPU caches hot commit, commit
TCP: Recent ACK (RACK) loss recovery. RACK loss recovery uses the notion of time instead of packet sequence (FACK) or counts (dupthresh) (see commit for details). RACK is still experimental, so in this release it is only a supplemental loss detection on top of the existing algorithms and does not trigger fast recovery. It is, however, being developed to replace or consolidate FACK/dupthresh, early retransmit, and thin-dupack. It can be disabled with the sysctl net.ipv4.tcp_recovery commit
IP Virtual Server: Support scheduling of ICMP packets to IPVS instances. A new sysctl net.ipv4.vs.schedule_icmp has been introduced, which enables this feature if set to 1 (it is set to 0 by default to retain the old behaviour) merge commit
IP Virtual Server: Allow ignoring tunnelled packets with the new sysctl net.ipv4.vs.ignore_tunneled. If set, ipvs will set the ipvs_property on all packets which are of unrecognised protocols. This prevents the kernel from routing tunnelled protocols like ipip, which is useful to prevent rescheduling packets that have been tunnelled to the ipvs host (i.e. to prevent ipvs routing loops when ipvs is also acting as a real server) commit
Provide FIB table ID in ipv4 route dumps just as ipv6 does commit
IPv4: Hash-based multipath routing. When the routing cache was removed in 3.6, the IPv4 multipath algorithm changed from more or less being destination-based into being quasi-random per-packet scheduling. This increased the risk of out-of-order packets and made it impossible to use multipath together with anycast services. In this release, the multipath routing implementation is replaced with a flow-based load balancing based on a hash over the source and destination addresses merge commit
Add IPv6 support to the Virtual Routing and Forwarding (VRF) devices commit, commit, commit
IPv4: Previously, adding a new IPv4 address always caused the creation of the related network route, with the default metric. This release adds support for the IFA_F_NOPREFIXROUTE flag for IPv4 addresses. When an address is added with this flag set, no associated network route is created, no network route is deleted when the address is removed, and it is up to user space to manage such routes commit
IPv6: gro: support sit protocol commit
Allow the user to ask for the statistics to be filtered out of ipv4/ipv6 address netlink dumps, because many commonly used functions like getifaddrs() invoke RTM_GETLINK to dump the interface information, and do not need the AF_INET6 statistics, which are expensive to calculate commit
bridge: Allow setting the bridge attribute ageing_time in rocker and switchdev commit, commit, commit
vxlan: support both IPv4 and IPv6 sockets in a single vxlan device commit
bridge: Complete the bridge device's netlink support, making it possible to view and configure everything that can be configured via sysfs commit
bridge: Enable adding fdb entries pointing to the bridge device. This can be used to propagate mac address of vlan interfaces configured on top of the vlan filtering bridge commit
Multi Protocol Label Switching (MPLS): Add support for multipath routes commit, commit
bonding: support encapsulated ipv6 TSO commit
Add support for filtering neighbor dumps by master device by adding the NDA_MASTER attribute to the dump request. A new netlink flag, NLM_F_DUMP_FILTERED, is added to indicate the kernel supports the request and output is filtered as requested commit
Add support for filtering neighbor dumps by device by adding the NDA_IFINDEX attribute to the dump request commit
Support for certain features which, when disabled on an upper device such as a bonding master or a bridge, must also be disabled on the underlying devices and cannot be re-enabled there commit
Introduce L3 Master device abstraction support. It provides glue between core networking code and device drivers to support L3 master devices like VRF commit
dummy: add more features commit
tso: add support for IPv6 commit
netfilter: nfnetlink_log: Allow including the conntrack information together with the packet that is sent to user space via NFLOG, so that a user-space program can obtain NAT information through the NFULA_CT attribute commit
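The hash-based multipath routing item above describes choosing a path from a hash over the source and destination addresses, so that all packets of a flow take the same path. The following is a minimal Python sketch of that idea only. `select_nexthop` and the addresses are hypothetical, and SHA-256 is a stand-in chosen for reproducibility; the kernel uses its own hash and nexthop selection logic.

```python
import hashlib
import ipaddress
from typing import Sequence


def select_nexthop(src: str, dst: str, nexthops: Sequence[str]) -> str:
    """Pick a nexthop from a hash over the source and destination address."""
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    # Stand-in hash: stable across runs, unlike Python's built-in hash().
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(nexthops)
    return nexthops[index]


if __name__ == "__main__":
    # Documentation-range addresses, used purely as examples.
    paths = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    first = select_nexthop("198.51.100.7", "203.0.113.9", paths)
    second = select_nexthop("198.51.100.7", "203.0.113.9", paths)
    # Same source/destination pair, same path: no per-packet reordering.
    print(first, first == second)
```

Because the choice depends only on the address pair, packets of one flow stay on one path, which avoids the out-of-order delivery and anycast problems that the per-packet scheduling caused.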
- Wireless
Allow changing station capabilities for unassociated stations commit
Implement Very High Throughput support for mesh networks commit
Make CRDA support optional commit
Advertise support for full station state in AP mode commit
Put current TX power in interface info replies commit
Enable wiphy device to suspend/resume asynchronously commit
ieee802154: experimental netlink support commit
ieee802154: 6lowpan: add tx/rx stats commit
ipconfig: Allow sending a Client-identifier in DHCP requests, using a kernel parameter such as ip=dhcp,client_id_type,client_id_value, so that the kernel can identify itself to the server commit
Add netlink directives and an ndo entry to trust a VF user. This controls the special permissions of the VF user: the administrator can explicitly trust a VF user to use some features which impact security and/or performance commit
IB: Add support of checksum capability reporting for RC and RAW commit
IB: Add support for network namespaces commit, commit, commit
openvswitch: Add netlink attributes for IPv6 tunnel addresses. This enables IPv6 support for tunnels commit
TIPC: introduce jumbo frame support for broadcast commit
xprtrdma: Enable swap-on-NFS/RDMA commit
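The RACK item in this section says that loss detection uses time rather than sequence numbers or duplicate-ACK counts. The following is a minimal Python sketch of that general idea, not kernel code: an unacknowledged packet is treated as lost when a packet sent sufficiently later has already been acknowledged. `Packet`, `rack_detect_lost` and the `reo_wnd` reordering window are hypothetical names, and the rule is a simplification; see the referenced commit for the actual algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    seq: int
    sent_at: float       # transmission timestamp, in seconds
    acked: bool = False


def rack_detect_lost(packets: List[Packet], reo_wnd: float) -> List[int]:
    """Return seqs of unacked packets deemed lost by a time-based rule."""
    acked_times = [p.sent_at for p in packets if p.acked]
    if not acked_times:
        return []        # nothing delivered yet, so nothing to compare to
    latest_acked_sent = max(acked_times)
    # Lost: sent more than reo_wnd before the most recently sent packet
    # that is already known to be delivered.
    return [
        p.seq
        for p in packets
        if not p.acked and latest_acked_sent - p.sent_at > reo_wnd
    ]


if __name__ == "__main__":
    flight = [
        Packet(seq=1, sent_at=0.000),
        Packet(seq=2, sent_at=0.001, acked=True),
        Packet(seq=3, sent_at=0.010, acked=True),
        Packet(seq=4, sent_at=0.011),
    ]
    print(rack_detect_lost(flight, reo_wnd=0.005))
```

In this example packet 1 is flagged because packet 3, sent 10 ms later, was acknowledged first, while packet 4 is not flagged because nothing sent after it has been acknowledged yet.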
12. List of merges
13. Other news sites
LWN merge window part 1 and part 2