Linux 3.8 was released on Mon, 18 Feb 2013.

This Linux release includes Ext4 support for embedding very small files in the inode, which greatly improves performance for these files and saves some disk space. There is also a new Btrfs feature that allows a disk to be replaced quickly, a new filesystem, F2FS, optimized for SSDs, support for filesystem mount, UTS, IPC, PID, and network stack namespaces for unprivileged users, accounting of kernel memory in the memory resource controller, journal checksums in XFS, a redesigned foundation for NUMA policies and, of course, the removal of support for 386 processors. Many small features, new drivers and fixes are also available.

1. Prominent features in Linux 3.8

1.1. Ext4 embeds very small files in the inode

Every file in Ext4 has a corresponding inode which stores various information about the file (size, creation date, owner, etc.); users can see that information with the stat(1) command. But the inode doesn't store the actual data, it just holds information about where the data is placed.

The size of each inode is fixed at mkfs.ext4(8) time, and defaults to 256 bytes. But the space isn't always used entirely (although small extended attributes make use of it), and there are millions of inodes in a typical file system, so some space is wasted. At the same time, at least one data block (typically 4KB) is always allocated for file data, even if the file only uses a few bytes. And there is an extra seek involved in reading these few bytes, because the data blocks aren't allocated contiguously to the inodes.

Ext4 has added support for storing very small files in the unused inode space. With this feature the unused inode space gets some use, a data block isn't allocated for the file, and reading these small files is faster, because once the inode has been read, the data is already available without extra disk seeks. Simple tests show that, for a vanilla linux-3.0 source tree, the new feature saves more than 1% of disk space. For a sample /usr directory, it saved more than 3% of space. Performance for small files is also improved. The maximum size of the files that can be inlined can be tuned indirectly by increasing the inode size (the -I option of mkfs.ext4(8)): the bigger the inode, the bigger the files that can be inlined (but if the workload doesn't make extensive use of small files, the extra inode space will be wasted).
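The space argument above can be sketched with a little arithmetic. The following is a rough model, not the Ext4 implementation: the function names are hypothetical, and the 60-byte inline limit is an assumption chosen for illustration (the real limit depends on the inode size and on how much of the inode is taken by extended attributes).

```python
def allocated_bytes(size, block_size=4096, inline_limit=0):
    """Data-block bytes consumed by a file of `size` bytes.

    Files at or below `inline_limit` are assumed to live inside the
    inode and need no data block. Empty files need none either.
    """
    if size == 0 or size <= inline_limit:
        return 0
    blocks = -(-size // block_size)  # ceiling division
    return blocks * block_size


def inline_savings(sizes, block_size=4096, inline_limit=60):
    """Return (bytes_before, bytes_after, bytes_saved) for a list of file sizes.

    `inline_limit` is a hypothetical threshold, not a value read
    from a real filesystem.
    """
    before = sum(allocated_bytes(s, block_size) for s in sizes)
    after = sum(allocated_bytes(s, block_size, inline_limit) for s in sizes)
    return before, after, before - after


if __name__ == "__main__":
    sample = [10, 60, 61, 5000]
    print(inline_savings(sample))
```

In this model every file at or below the limit gives back one full block, which is why trees with many tiny files benefit the most.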

Recommended LWN article: Improving ext4: bigalloc, inline data, and metadata checksums

Code: (commit 1, 2, 3, 4, 5, 6, 7, 8)

1.2. Btrfs fast device replacement

As a filesystem that can span multiple devices, Btrfs can remove a disk easily, either because you want to shrink your storage pool or because the device is failing and you want to replace it. Until now, replacing a disk meant adding the new device and then deleting the old one.

But that process is not as fast as it could be. Btrfs has added an explicit device replacement operation, "btrfs replace start", which is much faster.

The copy usually runs at about 90% of the available platter speed if no other disk I/O is ongoing during the operation. The replacement takes place at runtime on a live filesystem: it does not require unmounting the filesystem or stopping active tasks, and it is safe to crash or lose power during the operation, since the process will resume on the next mount. The command "btrfs replace status" checks the status of the operation, and "btrfs replace cancel" cancels it. The userspace patches for the btrfs program can be found [git:// here].
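As a small sketch of how the three subcommands fit together, the helper below only builds the command lines and does not run anything. The helper name, the device paths and the mount point are hypothetical, and the argument order for "start" (source device, target device, mount path) is assumed from the btrfs-progs usage text, so check it against the installed version before relying on it.

```python
import shlex


def replace_commands(old_dev, new_dev, mountpoint):
    """Build argv lists for a btrfs device replacement (nothing is executed)."""
    return {
        # Assumed order: source device, target device, mounted filesystem path.
        "start": ["btrfs", "replace", "start", old_dev, new_dev, mountpoint],
        "status": ["btrfs", "replace", "status", mountpoint],
        "cancel": ["btrfs", "replace", "cancel", mountpoint],
    }


if __name__ == "__main__":
    # Hypothetical devices and mount point, for illustration only.
    cmds = replace_commands("/dev/sdb", "/dev/sdc", "/mnt/pool")
    for name, argv in cmds.items():
        print(name, "->", shlex.join(argv))
```

Keeping the commands as argv lists makes it easy to pass them to subprocess.run() later, typically as root, without going through a shell.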

Code: (commit 1, 2, 3, 4, 5, 6)

1.3. F2FS, an SSD friendly file system

F2FS is a new experimental file system, contributed by Samsung, optimized for flash memory storage devices. Linux has several file systems targeted at flash devices (logfs, jffs2, ubifs), but they are designed for "native" flash devices that expose the flash storage directly to the computer. Many of the flash storage devices in common use (SSD disks) aren't "native" flash devices. Instead, they have an FTL ("flash translation layer") that emulates a block based device and hides the true nature of flash memory. This makes it possible to use the existing block storage stacks and file systems on those devices. These file systems have gained some optimizations to work better with SSDs (like trimming), but their on-disk formats have not been changed to optimize for them.

F2FS is a filesystem for SSDs that tries to keep in mind the existence of the Flash Translation Layer, and tries to make good use of it. For more details about the design choices made by F2FS, reading the following LWN article is recommended:

Recommended LWN article: An f2fs teardown

Code: fs/f2fs

1.4. User namespace support completed

Per-process namespaces allow different processes to see different views of several resources. For example, a process might see one set of mountpoints, PID numbers, and network stack state, and a process in another namespace might see a different set. Per-process namespace support has been in development for many years: the command unshare(1), available in modern Linux distros, allows starting a process with the mount, UTS, IPC or network namespaces "unshared" from its parent; and systemd uses mount namespaces for the ReadWriteDirectories, ReadOnlyDirectories and InaccessibleDirectories unit configuration options, and for systemd-nspawn. But the use of namespaces was limited to root.

This release adds the ability for unprivileged users to use per-process namespaces safely. The resources with namespace support available are filesystem mount points, UTS, IPC, PIDs, and the network stack.

For more details about Linux namespace support (what namespaces are, how they work, details about the API and some example programs), you should read the article series from LWN.
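One way to observe which namespaces a process belongs to is to read the links under /proc/<pid>/ns. The sketch below assumes a kernel that exposes those entries as symbolic links of the form "pid:[4026531836]"; the function names are hypothetical, and on systems without such links the lookup simply returns an empty result.

```python
import os
import re

# Expected link target format, e.g. "pid:[4026531836]" (an assumption
# about the running kernel, not something guaranteed everywhere).
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (kind, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def namespaces_of(pid="self"):
    """Map each entry of /proc/<pid>/ns to its namespace inode number."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        entries = os.listdir(ns_dir)
    except OSError:
        return result  # no procfs, or no permission
    for entry in entries:
        try:
            _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        except (OSError, ValueError):
            continue  # not a symlink, or an unknown format
        result[entry] = inode
    return result


if __name__ == "__main__":
    for name, inode in sorted(namespaces_of().items()):
        print(name, inode)
```

Two processes are in the same namespace of a given kind when the inode numbers match, so comparing the output for two PIDs shows which resources they share.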

(The remaining namespaces will be covered in future LWN articles)

1.5. XFS log checksums

XFS is planning to add full metadata checksumming in the future. As part of that effort, this release adds support for checksums in the journal.

Code: (commit 1, 2)

1.6. Huge Pages support a zero page

Huge pages are a type of memory page provided by the CPU memory management unit, and they are much bigger than usual pages. They are typically used by big databases and applications which make use of large portions of memory. On the other hand, a "zero page" is a memory page full of zeros. This page is used by the kernel to save memory: some applications allocate large portions of zero-filled memory but don't write to all parts of it, so instead of allocating that zeroed memory, the kernel just makes all of it point to the zero page. The zero page was only available for normal sized pages (4KB on x86); this release adds a huge zero page for applications that use huge pages.
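The idea behind the zero page can be illustrated with a toy model in user space. This is only a sketch of the sharing principle, not kernel code: the class and method names are hypothetical, and real page tables, faults and huge page sizes are not modelled.

```python
PAGE_SIZE = 4096  # normal page size on x86, as mentioned above


class ZeroBackedMemory:
    """Toy address space where untouched pages share one zero page."""

    def __init__(self, num_pages, page_size=PAGE_SIZE):
        self.page_size = page_size
        self._zero = bytes(page_size)            # the single shared zero page
        self._pages = [self._zero] * num_pages   # every page points at it

    def read(self, index):
        """Return the contents of a page. Reads never allocate memory."""
        return bytes(self._pages[index])

    def write(self, index, offset, data):
        """Write into a page, giving it a private copy on first write."""
        if offset < 0 or offset + len(data) > self.page_size:
            raise ValueError("write crosses the page boundary")
        page = self._pages[index]
        if page is self._zero:
            page = bytearray(self.page_size)     # allocate only now
            self._pages[index] = page
        page[offset:offset + len(data)] = data

    def private_pages(self):
        """Number of pages that needed their own memory."""
        return sum(1 for p in self._pages if p is not self._zero)


if __name__ == "__main__":
    mem = ZeroBackedMemory(512)
    mem.write(3, 0, b"hi")
    print(mem.private_pages(), "of 512 pages are privately allocated")
```

In the model, reading any number of untouched pages costs no extra memory; only the first write to a page triggers an allocation, which is the saving the zero page provides.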

Recommended LWN article: Adding a huge zero page

Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)

1.7. The memory resource controller supports accounting of kernel memory

The Linux memory controller is a control group that can limit, account and isolate the memory usage of arbitrary groups of processes. In this release, the memory controller has gained support for accounting two types of kernel memory usage: stack and slab usage. These limits can be useful for things like stopping fork bombs.

The files created in the control group are:
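As a hedged illustration, assuming the cgroup v1 kmem file names given in the kernel's memory controller documentation (memory.kmem.limit_in_bytes, memory.kmem.usage_in_bytes, memory.kmem.failcnt and memory.kmem.max_usage_in_bytes) and a hypothetical helper name, the counters of a group could be read like this:

```python
import os

# File names assumed from the cgroup v1 memory controller documentation.
KMEM_FILES = (
    "memory.kmem.limit_in_bytes",
    "memory.kmem.usage_in_bytes",
    "memory.kmem.failcnt",
    "memory.kmem.max_usage_in_bytes",
)


def read_kmem_counters(cgroup_dir):
    """Read the kmem counters of one memory cgroup directory.

    Returns a dict mapping file name to an int, or to None when the
    file is missing, unreadable, or does not hold a number.
    """
    counters = {}
    for name in KMEM_FILES:
        try:
            with open(os.path.join(cgroup_dir, name)) as handle:
                counters[name] = int(handle.read().strip())
        except (OSError, ValueError):
            counters[name] = None
    return counters


if __name__ == "__main__":
    # Hypothetical mount point and group name; adjust for the real system.
    print(read_kmem_counters("/sys/fs/cgroup/memory/mygroup"))
```

Returning None for missing files keeps the helper usable on kernels or hierarchies where kernel memory accounting is not enabled.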

Recommended LWN article: KS2012: memcg/mm: Improving kernel-memory accounting for memory cgroups

1.8. Automatic NUMA balancing

A lot of modern machines are "non uniform memory access" (NUMA) architectures: they have per-processor memory controllers, and accessing the memory attached to the local processor is faster than accessing the memory of other processors, so placing memory in the same node as the processes that reference it is critical for performance. This is especially true in huge boxes with dozens or hundreds of processors.

The Linux NUMA implementation had some deficiencies. This release includes a new NUMA foundation which will make it possible to build smarter NUMA policies in future releases. For more details, see the LWN article:
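The cost of poor placement can be estimated with a weighted average. The latencies below are hypothetical round numbers chosen for illustration, not measurements of any real machine, and the function name is likewise an assumption.

```python
def average_access_ns(local_fraction, local_ns=100.0, remote_ns=150.0):
    """Average memory access latency for a given share of local accesses.

    `local_ns` and `remote_ns` are hypothetical latencies; real values
    depend on the hardware and on the distance between nodes.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    for share in (1.0, 0.5, 0.0):
        print(f"{share:.0%} local accesses: {average_access_ns(share):.1f} ns")
```

With these assumed numbers, a process whose memory sits entirely on a remote node pays 50% more per access than one whose memory is local, which is the gap that better placement policies try to close.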

Recommended LWN article: NUMA in a hurry

Some parts of the code: (commit 1, 2, 3, 4, 5, 6, 7)

1.9. Removal of support for 386 processors

As has been widely reported, this release no longer supports the Intel 386 processor (the 486 is still supported, though).

Code: (commit)

2. Driver and architecture-specific changes

All the driver and architecture-specific changes can be found in the Linux_3.8_DriverArch page

3. Various core changes

4. Filesystems

5. Block

6. Crypto/keyring

7. Security

process (commit)

8. Perf

9. Virtualization

10. Networking

Add support for link creation via rtnl 'ip link .. type ipip' (commit)

11. Other news sites that track the changes of this release

KernelNewbies: Linux_3.8 (last edited 2017-12-30 01:29:54 by localhost)