Linux 4.2 was released on 30 Aug 2015
Summary: This release adds a new amdgpu driver for modern AMD Radeon hardware, a virtio GPU driver to use the host GPU capabilities inside guests, an atomic modesetting graphics API that has been declared stable, support for stacking of security modules, a faster and more scalable spinlock implementation, cgroup writeback support, and the reintroduction of the H8/300 architecture. There are also new drivers and many other small improvements.
Contents
1. Prominent features
1.1. New driver amdgpu for modern AMD Radeon hardware
This release includes the amdgpu driver, a new driver for VI+ AMD asics. It currently supports Tonga, Iceland, and Carrizo and also contains an option to build support for CI parts for testing. All major functionality is supported (displays, gfx, compute, dma, video decode/encode, etc.). Power management is working on Carrizo, but is still being worked on for Tonga and Iceland.
The purpose of this driver is to unify AMD's Linux offerings: the functionality that is kept as private code in the Catalyst driver will either be ported to this driver or run as a private user-space blob that uses the new driver.
Code: (merge)
1.2. Add virtio gpu driver
Virtio drivers are "fake" drivers that make communication between virtualization guests and the host faster, because emulating real hardware is complicated and inefficient.
This release adds a virtual GPU driver for virtio. It can be used with QEMU-based VMMs (like KVM or Xen). For now it supports kernel modesetting: the xorg modesetting driver can handle the device just fine, and the framebuffer for fbcon is there too. QEMU patches for the host side are currently under review. This initial revision has only 2D support; 3D (virgl) support requires some more work on the QEMU side and will be added later.
Code: commit
1.3. Atomic modesetting API enabled by default
This release finally completes the atomic modesetting API and enables it by default. For details about the atomic modesetting API and why it is necessary, read these recommended LWN articles: Atomic mode setting design overview, part 1, and part 2
Code: commit
1.4. Stacking of security modules
There are several security modules in the Linux kernel, but only one can be used at a time. For a very long time, developers have wanted to be able to use more than one at the same time ("stacking"). This release adds support for stacking of Linux security modules.
For more details, read Progress in security module stacking
Code: commit, commit, commit, commit, commit, commit, commit
1.5. Queued spinlocks become the default spinlock implementation
This release adds support in the x86 architecture for queue-based spinlocks that can replace the default ticket spinlock without increasing the size of the spinlock data structure.
The queue spinlock has slightly better performance than the ticket spinlock in the uncontended case, and its performance can be much better with moderate to heavy contention. It is especially suitable for NUMA machines with at least 2 sockets. Even at the 2-socket level, there can be a significant speedup depending on the workload. It can also improve the performance of an I/O and interrupt intensive stress test with a lot of spinlock contention on a 2-socket system by up to 20%.
For more details, read this LWN article: MCS locks and qspinlocks
Code: commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit
1.6. cgroup writeback support
The Linux kernel can throttle processes that are trying to write too many pages to the disk ("writeback"). But this control is global and doesn't allow per-cgroup limits. This release adds support for writeback control of processes inside a cgroup.
For more details, read this recommended LWN article: Writeback and control groups
Documentation: Updates to Documentation/cgroups/blkio-controller.txt
Code: (merge)
1.7. Reintroduction of the H8/300 architecture
Support for the H8/300 architecture was added in Linux 2.5.68, but it was removed in Linux 3.13 due to lack of maintenance.
In this release, the H8/300 architecture is supported again and has been reincorporated into the tree.
Code: arch/h8300
2. Drivers and architectures
All the driver and architecture-specific changes can be found in the Linux_4.2-DriversArch page.
3. Core (various)
futex: Implement lockless wakeups. At the lowest level, it can reduce the latency of a single thread attempting to acquire hb->lock in highly contended scenarios by up to 2x commit
mqueue: Implement lockless pipelined wakeups commit
Allow drivers to request that their probe functions be called asynchronously during driver and device registration (manual binding is still synchronous) commit, commit, commit
printk: implement support for extended console drivers commit
rcu: Provide diagnostic option to slow down grace-period scans commit
- task scheduler
Replace spinlocks with atomics in thread_group_cputimer(), to improve scalability commit
debug: Add sum_sleep_runtime to /proc/<pid>/sched when CONFIG_SCHEDSTATS is enabled commit
debug: Replace vruntime with wait_sum in /proc/sched_debug commit
numa: Show numa_group ID in /proc/sched_debug task listings to see how the numa groups have spread across the system commit
Implement lockless wake-queues API commit
Export the CPU list that actually got isolated in /sys/devices/system/cpu/isolated. This can be used by system management tools like libvirt, openstack, and others to ensure proper placement of tasks commit
Export the CPU list running in nohz_full mode in /sys/devices/system/cpu/nohz_full. This can be used by system management tools like libvirt, openstack, and others to ensure proper placement of tasks commit
Allow the watchdog to run by default only on the housekeeping cores when nohz_full is in effect; this seems to be a good compromise short of turning it off completely (since the nohz_full cores can't tolerate a watchdog). To provide customizability, a /proc/sys/kernel/watchdog_cpumask file is added so that the set of cores running the watchdog can be tuned to different values after bootup commit, commit, commit
Add an escape sequence to specify the current console's cursor blink interval. The interval is specified as a number of milliseconds until the next cursor display state toggle, from 50 to 65535 commit
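The new /sys/devices/system/cpu/isolated and /sys/devices/system/cpu/nohz_full files mentioned above report CPUs in the kernel's usual list notation (for example "1-3,8"). A minimal sketch of how a management tool might read them follows; the helper names parse_cpu_list and read_cpu_list are hypothetical, and the sketch assumes only plain numbers and ranges appear in the output.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "1-3,8" into [1, 2, 3, 8]."""
    cpus = []
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty file means "no CPUs"
        first, _, last = chunk.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def read_cpu_list(name):
    """Read /sys/devices/system/cpu/<name> (hypothetical helper).

    Returns [] if the file is missing or does not hold a CPU list.
    """
    path = Path("/sys/devices/system/cpu") / name
    try:
        return parse_cpu_list(path.read_text())
    except (OSError, ValueError):
        return []


if __name__ == "__main__":
    print("isolated: ", read_cpu_list("isolated"))
    print("nohz_full:", read_cpu_list("nohz_full"))
```

On kernels older than 4.2 the files do not exist, so the reader simply returns an empty list.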
4. File systems
- BTRFS
- FUSE
Allow an open fuse device to be "cloned" with FUSE_DEV_IOC_CLONE ioctl commit
- CIFS
- EXT2
Enable cgroup writeback support commit
- EXT4
Add support for FALLOC_FL_INSERT_RANGE in fallocate(2), which allows inserting a hole within a file without overwriting any existing data commit
- F2FS
Support FALLOC_FL_ZERO_RANGE in fallocate(2), which zeroes a part of the file commit
Support FALLOC_FL_COLLAPSE_RANGE in fallocate(2), which removes a range from a file, without leaving a hole commit
Support FALLOC_FL_INSERT_RANGE in fallocate(2), which inserts a range in a file without overwriting data commit
Support RENAME_WHITEOUT in renameat2(2), used for overlay/union filesystem implementations commit
Support encryption of f2fs files and directories commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit
Add compat_ioctl to provide backward compatibility commit
Add default mount options to remount commit
Recover a broken superblock from the other valid one commit
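The FALLOC_FL_INSERT_RANGE, FALLOC_FL_COLLAPSE_RANGE and FALLOC_FL_ZERO_RANGE modes listed above are all requested through the mode argument of fallocate(2). The following is a minimal sketch, not taken from the kernel tree, that calls fallocate(2) through ctypes. The flag values come from <linux/falloc.h>; the helper names (is_block_aligned, fallocate, insert_range) are hypothetical, a 64-bit Linux system is assumed, and f_bsize is used as a stand-in for the filesystem block size that insert and collapse operations must be aligned to.

```python
import ctypes
import os

# Flag values from <linux/falloc.h>.
FALLOC_FL_COLLAPSE_RANGE = 0x08
FALLOC_FL_ZERO_RANGE = 0x10
FALLOC_FL_INSERT_RANGE = 0x20


def is_block_aligned(offset, length, block_size):
    """Insert/collapse need offset and length to be block-size multiples."""
    return length > 0 and offset % block_size == 0 and length % block_size == 0


def fallocate(fd, mode, offset, length):
    """Thin wrapper around fallocate(2); raises OSError on failure.

    Assumes a 64-bit Linux system, where off_t is 64 bits wide.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    if libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def insert_range(path, offset, length):
    """Insert a hole of `length` bytes at `offset`, shifting later data up."""
    block_size = os.statvfs(path).f_bsize  # assumed to match the fs block size
    if not is_block_aligned(offset, length, block_size):
        raise ValueError("offset and length must be multiples of %d" % block_size)
    fd = os.open(path, os.O_RDWR)
    try:
        fallocate(fd, FALLOC_FL_INSERT_RANGE, offset, length)
    finally:
        os.close(fd)
```

A call such as insert_range("/mnt/data/file.bin", 4096, 8192) (a hypothetical path) fails with an OSError on filesystems or kernels that do not implement the mode, so callers should be prepared to fall back to copying data.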
- XFS
When free space is low and fragmented, XFS can have problems allocating new "inode chunks", which are necessary to create new inodes. This release adds EXPERIMENTAL support for sparse inode chunks, which allow extents smaller than an "inode chunk" to be used for inode allocation. This feature can be enabled at mkfs time, and it sets an incompatible feature flag that prevents mounting by kernels that don't support sparse inodes commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit, commit
- OVERLAYFS
Allow distributed filesystems as lower layer commit
- GFS2
Add support for rename2 and RENAME_EXCHANGE commit
- NFS
NFSv4.2 LAYOUTSTATS functionality for pnfs flexfiles (merge)
- NILFS2
Support NFSv2 export commit
- UDF
Support NFSv2 export commit
- FS-Cache
5. Memory management
memcg: add per cgroup dirty page accounting, which gives each memory cgroup independent dirty/writeback page statistics that can provide information for per-cgroup direct reclaim. The new memcg stat is visible in the per memcg memory.stat cgroupfs file. The new accounting supports future efforts to add per cgroup dirty page throttling and writeback commit
Try to allocate all boot time kernel data structures from mirrored memory to have a recovery path for unrecoverable memory errors encountered during kernel code execution commit, commit, commit
zswap: Support runtime enable/disable of zswap commit
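The per-cgroup dirty page accounting described above is exposed through the memory.stat file, which is a list of "key value" lines. A minimal sketch of reading it follows. The function name parse_memory_stat is hypothetical, the key names "dirty" and "writeback" are assumed to be the ones used by the memcg controller, and SAMPLE is made-up data for illustration, not real output.

```python
def parse_memory_stat(text):
    """Parse "key value" lines from a memory.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        value = value.strip()
        if value.isdigit():
            stats[key] = int(value)
    return stats


# Made-up sample content; a real file has many more keys.
SAMPLE = """cache 4096000
rss 1024000
dirty 81920
writeback 0
"""

stats = parse_memory_stat(SAMPLE)
print("dirty:", stats.get("dirty"), "writeback:", stats.get("writeback"))
```

In practice the text would come from the memory.stat file of the cgroup of interest, whose location depends on where the memory controller is mounted.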
6. Block layer
The libnvdimm sub-system introduces, in addition to the libnvdimm-core, 4 drivers / enabling modules (common code to all: (merge))
- NFIT: Add ACPI NVDIMM Firmware Interface Table (NFIT) support: It adds infrastructure to probe ACPI 6 compliant platforms for NVDIMMs (NFIT) and register a libnvdimm device tree. In addition to storage devices this also enables libnvdimm to pass ACPI._DSM messages for platform/dimm configuration.
- PMEM: Initially merged in v4.1 this driver for contiguous spans of persistent memory address ranges is re-worked to drive PMEM-namespaces emitted by the libnvdimm-core. In this update the PMEM driver, on x86, gains the ability to assert that writes to persistent memory have been flushed all the way through the caches and buffers in the platform to persistent media.
- BLK: This new driver enables access to persistent memory media through "Block Data Windows" as defined by the NFIT. The primary difference of this driver to PMEM is that only a small window of persistent memory is mapped into system address space at any given point in time. Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access different portions of the media. BLK-mode, by definition, does not support DAX
- BTT: This is a library, optionally consumed by either PMEM or BLK, that converts a byte-accessible namespace into a disk with atomic sector update semantics (prevents sector tearing on crash or power loss)
Make CFQ default to IOPS mode on SSDs commit
zram: add dynamic device add/remove functionality commit
Addition of policy specific data to blkcg for block cgroups commit
Add support for DAX reads/writes to block devices commit
UBI: Dynamically allocate minor numbers commit
- Device mapper
dm raid: Add dm-raid access to the MD RAID0 personality to enable single zone striping commit
dm cache: Add fail io mode and needs_check flag: If a cache metadata operation fails (e.g. transaction commit) the cache's metadata device will abort the current transaction, set a new needs_check flag, and the cache will transition to "read-only" mode commit
dm cache: add stochastic-multi-queue (smq) policy, make it default: The stochastic-multi-queue (smq) policy addresses some of the problems with the current multiqueue (mq) policy (memory usage, level balancing, adaptability, performance) commit, commit
dm stats: add support for request-based DM devices (eg. multipath) commit
dm stats: add option to dm statistics to collect and report a histogram of IO latencies commit
dm stats: Make it possible to use precise timestamps with nanosecond granularity commit
dm thin: range discard support commit
rbd: queue_depth map option commit
7. Cryptography
Add jitterentropy RNG. The CPU Jitter RNG provides a source of good entropy by collecting CPU execution time jitter. The entropy in the CPU execution time jitter is magnified by the CPU Jitter Random Number Generator commit
drbg: use Jitter RNG to obtain seed commit
rng: Make DRBG the default crypto api RNG commit
New chacha20 cipher. ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J. Bernstein and further specified in RFC7539 for use in IETF protocols commit
Add Poly1305 authenticator algorithm. Poly1305 is an authenticator algorithm designed by Daniel J. Bernstein. It is used for the ChaCha20-Poly1305 AEAD, specified in RFC7539 for use in IETF protocols commit, commit, commit
rsa: add a new rsa generic implementation commit
Added support for SEC1 hardware to talitos commit, commit, commit, commit, commit, commit, commit, commit, commit, commit
echainiv: Add Encrypted Chain IV Generator, which generates an IV based on the encryption of a sequence number xored with a salt. This is the default algorithm for CBC commit
seqiv: Add a new IV generator seqniv which is identical to seqiv except that it skips the IV when authenticating. This is intended to be used by algorithms such as rfc4106 that do the IV authentication implicitly commit
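To give a feel for the ChaCha20 cipher listed above, here is a small sketch of its core building block, the quarter round, written from the description in RFC 7539 section 2.1. It is an illustration only and not the kernel implementation; the names rotl32 and quarter_round are chosen for this example.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, count):
    """Rotate a 32-bit word left by `count` bits."""
    return ((value << count) & MASK32) | (value >> (32 - count))


def quarter_round(a, b, c, d):
    """ChaCha quarter round on four 32-bit words (RFC 7539, section 2.1)."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


if __name__ == "__main__":
    # Input words of the quarter-round test vector in RFC 7539, section 2.1.1.
    words = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    print(" ".join("%08x" % w for w in words))
```

The full cipher applies this operation to columns and diagonals of a 4x4 state of 32-bit words for 20 rounds; only additions, XORs and rotations are involved, which is what makes the design fast in software.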
8. Security
selinux: enable genfscon labeling for sysfs and pstore files commit
selinux: enable per-file labeling for debugfs files. commit
Smack: allow multiple labels in onlycap commit
evm: permit the labeling of existing files on pseudo files systems commit
ima: add support for new "euid" policy condition commit
ima: extend "mask" policy matching support commit
ima: update builtin policies commit
9. Tracing and perf tool
Allow disabling/enabling events dynamically in 'perf top': a 'perf top' session can instantly become a 'perf report' one, i.e. go from dynamic analysis to static analysis, and returning to dynamic analysis is possible. To toggle between the modes, press 'f' to freeze/unfreeze the sampling commit, commit
Add Instruction Tracing support (--itrace) commit, commit, commit, commit, commit
perf probe: Accept multiple filter options. The filters are combined by logical OR. E.g. --filter abc* --filter *def is the same as --filter abc*|*def commit
perf kmem: Add --live option for current allocation stat commit
perf kmem: Add kmem.default config option to select the default value ('page' or 'slab') commit
perf kmem: Implement stat --page --caller, it shows caller statistics for page commit
perf kmem: Add new sort keys for page: page, order, migtype, gfp commit
perf probe: Allow the user to pass the filter pattern directly to the --funcs option commit, the --list option commit, and the --del option commit
perf record: Add AUX area tracing Snapshot Mode support (--snapshot) commit, commit
perf bench futex: A new benchmark 'wake-parallel' is added to measure parallel waker threads commit
perf probe: Add --no-inlines option to avoid searching inline functions commit
perf probe: Support $params special probe argument. $params is similar to $vars but matches only function parameters, not local variables. Thus, it is useful for tracing changes in function parameters or tracing function calls together with their parameters commit
perf probe: Support glob wildcards for function name when adding new probes. This will allow us to build caches of function-entry level information with $params commit
perf probe: Add --range option to show a variable's location range commit
perf sched: Add option to merge like comms to lat output commit
perf record: Add support for a new branch sampling type for indirect jumps: perf record -j ind_jmp. It enables analysis of indirect jump targets commit
perf tools: Make Ctrl-C stop processing on TUI commit
perf annotate: Display total number of samples with --show-total-period commit
perf probe: Speed up perf probe --list by caching debuginfo commit
perf tools: The timeout that limits the processing of an individual proc map was hard-coded to 500ms. Add a new option, --proc-map-timeout, to make the time limit configurable commit
perf stat: Currently, the values of all tasks given with the -p option are aggregated and printed as single values. Add the --per-tasks option to print values per task commit
- BPF
tracing: add trace event for memory-failure commit
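The perf probe filter rule described earlier in this section (several --filter options are combined by logical OR, so that --filter abc* --filter *def behaves like --filter abc*|*def) can be modelled in a few lines. This is only a sketch of the matching semantics using Python's fnmatch module as a stand-in for perf's own glob matcher; it ignores any other filter syntax perf may accept, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_any(name, patterns):
    """True if `name` matches at least one glob pattern (logical OR)."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def matches_combined(name, combined):
    """Same check for a single "pat1|pat2" style filter string."""
    return matches_any(name, combined.split("|"))


if __name__ == "__main__":
    for symbol in ("abc_init", "read_def", "other"):
        print(symbol, matches_any(symbol, ["abc*", "*def"]))
```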
10. Virtualization
- KVM
Implement multiple address spaces commit
- Hyper-V
file copy service: full handshake support commit
vmbus: Implement NUMA aware CPU affinity for channels commit
vmbus: Implement the protocol for tearing down vmbus state commit
vss: full handshake support commit
Tools: kvp: use misc char device to communicate with kernel commit, vss: use misc char device to communicate with kernel commit
user mode linux: Remove hppfs (honeypot procfs). It was an attempt to use UML as a honeypot, but it was never stable nor in heavy use commit
vhost: add max_mem_regions module parameter to specify the maximum number of memory regions in memory map (default 64) commit
vhost: allow vhost to support guests with a different byte ordering from host while using legacy virtio commit
vmxnet3: Make the driver understand adapter version 2 commit
xen: block: add multi-page ring support, so that more requests can be issued by using more than one page as the request ring between blkfront and the backend. As a result, performance can improve significantly. When using 64 pages as the ring, the IOPS increased about 15 times in throughput testing and more than doubled in latency testing commit
11. Networking
TCP: Add CAIA Delay-Gradient (CDG) congestion control. CDG modifies the TCP sender in order to: use the delay gradient as a congestion signal; back off with an average probability that is independent of the RTT; coexist with flows that use loss-based congestion control, i.e., flows that are unresponsive to the delay signal; and tolerate packet loss unrelated to congestion. It is disabled by default. Its FreeBSD implementation was presented to the ICCRG in July 2012 commit
Add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations: When an application needs to force a source IP on an active TCP socket, it has to use bind(IP, port=x). As most applications do not want to deal with already used ports, x is often set to 0, meaning the kernel is in charge of finding an available port. But the kernel does not know yet whether this socket is going to be a listener or be connected. This release adds a new SOL_IP socket option that asks the kernel to ignore the 0 port provided by the application in bind(IP, port=0) and only remember the given IP address. The port will be automatically chosen at connect() time, in a way that allows sharing a source port as long as the 4-tuples are unique commit
Introduce programmable flow dissector commit
Introduce Flower classifier, which can classify packets based on a configurable combination of packet keys and masks commit
TCP: Add the rfc3168, section 6.1.1.1 fallback for outgoing ECN connections. In other words, this work adds a retry with a non-ECN setup SYN packet on the first timeout, as suggested by the RFC. For users who explicitly do not want this, which can be the case in data center deployments, a net.ipv4.tcp_ecn_fallback knob is added that allows disabling the fallback commit
switchdev: Add VLAN dump support to switchdev port's bridge_getlink. The iproute2 "bridge vlan show" command already knows how to show the VLANs installed on the bridge and the device, but (until now) no one implemented the port VLAN part of the netlink message (merge)
Add a netdev driver for GENEVE (GEneric NEtwork Virtualization Encapsulation) tunnels. It allows one to create geneve virtual interfaces that provide Layer 2 Networks over Layer 3 Networks. GENEVE is often used to tunnel virtual network infrastructure in virtualized environments. For more information see http://tools.ietf.org/html/draft-gross-geneve-02 (merge)
ieee802154: adds transmission power setting support for IEEE-802.15.4 devices via nl802154 commit
Export the value of the linkdown sysctl to netconf commit
IPv6: IPv6 flow labels have been an unmitigated disappointment thus far. Support in HW devices to use them for ECMP is lacking, and OSes don't turn them on by default. This release splits the flow label space into two ranges: 0-7ffff is reserved for the flow label manager, and 80000-fffff will be used for creating auto flow labels (per RFC6438). This should give Linux a path to enabling auto flow labels by default for all IPv6 packets. It can be disabled with sysctl flowlabel_state_ranges commit
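The IP_BIND_ADDRESS_NO_PORT socket option listed earlier in this section is set before bind(), so that only the source address is recorded and the port is picked at connect() time. A minimal sketch follows; the numeric value 24 is the one defined in <linux/in.h> and is used as a fallback in case the Python socket module does not export the constant, and bind_without_port is a hypothetical helper name.

```python
import socket

# Value from <linux/in.h>; used when the socket module lacks the constant.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)


def bind_without_port(source_ip):
    """Return a TCP socket bound to `source_ip` with port selection deferred.

    Raises OSError on kernels that do not support the option.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
        sock.bind((source_ip, 0))  # port 0: the kernel chooses at connect()
    except OSError:
        sock.close()
        raise
    return sock
```

A caller would then use the socket as usual, for example sock = bind_without_port("192.0.2.10") followed by sock.connect(("198.51.100.20", 80)); both addresses are documentation examples, not values from the release notes.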
- RDS
- unix sockets
IPv4: sysctl option (ignore_routes_with_linkdown) to ignore routes when nexthop link is down commit
net scheduler: run ingress qdisc without locks commit
net scheduler: gred: Add a TCA_GRED_LIMIT attribute to set the GRED queue limit, in bytes, during qdisc setup commit
Implement extended console support commit
packet: add rollover statistics commit
vlan: Add GRO support for non hardware accelerated vlan commit
- netfilter
Add netfilter ingress hook, which allows classifying packets from ingress using the Netfilter infrastructure commit
nf_tables: add netdev table. It allows creating netdev tables that contain ingress chains. It provides access to the existing nf_tables features from the ingress hook commit
xt_MARK: Add ARP support commit
- nl802154
- openvswitch: If the new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions in the upcall to userspace commit
pktgen: introduce xmit_mode '<start_xmit|netif_receive>' commit, add benchmark script pktgen_bench_xmit_mode_netif_receive.sh commit, add sample script pktgen_sample01_simple.sh commit, add sample script pktgen_sample02_multiqueue.sh commit, add sample script pktgen_sample03_burst_single_flow.sh commit
tipc: add broadcast link window set/get to nl api commit
tipc: improve link congestion algorithm commit
- bonding
Add netlink support for sys prio, actor sys mac, and port key; until now they were only exported via bond's proc entry commit
Allow userspace to set actors' macaddr in an AD-system. commit
Allow userspace to set actors' system_priority in AD system commit
Implement user key part of port_key in an AD system. commit
bridge: allow setting hash_max + multicast_router if interface is down commit
Adds an optional ce_threshold to codel & fq_codel qdiscs, so that DCTCP can have feedback from queuing in the host commit
Add TCPWinProbe and TCPKeepAlive SNMP counters commit
- wireless
- Bluetooth
NFC: netlink: Implement vendor-specific command support commit
12. List of pull requests
13. Other news sites
Phoronix New Features Of The Linux 4.2 Kernel
linuxfr.org: Sortie du noyau Linux 4.2