KernelNewbies:

Linux 3.8 was released on


1. Prominent features in Linux 3.8

2. Ext4 embeds very small files in the inode

Every file in Ext4 has a corresponding [http://en.wikipedia.org/wiki/Inode inode] which stores various information about the file, such as its size, creation date and owner (users can see that information with the [http://linux.die.net/man/1/stat stat(1)] command). But the inode doesn't store the actual data, it just holds information about where the data is placed. The size used by each inode is predetermined at [http://linux.die.net/man/8/ mkfs.ext4(8)] time, and defaults to 256 bytes. But the space isn't always used entirely (despite small extended attributes making use of it), and there are millions of inodes in a typical filesystem, so some space is wasted. At the same time, at least one data block (typically 4KB) is always allocated for file data, even if the file only uses a few bytes. And there is an extra seek involved in reading these few bytes, because the data blocks aren't allocated contiguously with the inodes.

Ext4 has added support for storing very small files in the unused inode space. With this feature the unused inode space gets some use, a data block isn't allocated for the file, and reading these small files is faster, because once the inode has been read, the data is already available without extra disk seeks. Some [https://lwn.net/Articles/468678/ simple tests] show that with a vanilla linux-3.0 source tree, the new system can save more than 1% of disk space. For a sample "/usr" directory, it saved more than 3% of space. Performance for small files is [https://lwn.net/Articles/516645/ also improved]. The maximum size of the files that can be inlined can be tuned indirectly by increasing the inode size (-I mkfs option): the bigger the inode, the bigger the files that can be inlined (but if the workload doesn't make extensive use of small files, the space will be wasted).
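The space argument can be made concrete with a little arithmetic. The following Python sketch compares the data blocks needed by a set of files with and without inlining. The 4096-byte block size comes from the text above; the inline capacity passed in is an illustrative assumption, not the exact limit Ext4 applies (the real limit depends on the inode size and on how much of the inode is already in use).

```python
BLOCK_SIZE = 4096  # typical Ext4 data block size, as stated in the text


def data_blocks_needed(file_size, inline_capacity=0):
    """Return how many data blocks a file of file_size bytes occupies.

    inline_capacity is the (assumed) number of bytes that fit inside the
    inode itself; 0 models a filesystem without inline data.
    """
    if file_size <= inline_capacity:
        return 0  # the data lives in the inode, no block is allocated
    # Round up: even a few bytes consume a whole block.
    return -(-file_size // BLOCK_SIZE)


def bytes_saved(file_sizes, inline_capacity):
    """Bytes of data blocks freed by inlining the small files in file_sizes."""
    before = sum(data_blocks_needed(size) for size in file_sizes)
    after = sum(data_blocks_needed(size, inline_capacity) for size in file_sizes)
    return (before - after) * BLOCK_SIZE
```

For example, with an assumed capacity of 60 bytes, bytes_saved([10, 50, 5000, 100], 60) gives 8192: the two smallest files no longer need a block each, while the other two are unaffected.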

Recommended LWN article: [https://lwn.net/Articles/469805/ Improving ext4: bigalloc, inline data, and metadata checksums]

Code: [http://git.kernel.org/linus/67cf5b09a46f72e048501b84996f2f77bc42e947 (commit 1], [http://git.kernel.org/linus/46c7f254543dedcf134ad05091ed2b935a9a597d 2], [http://git.kernel.org/linus/f19d5870cbf72d4cb2a8e1f749dff97af99b071e 3], [http://git.kernel.org/linus/3fdcfb668fd78ec92d9bc2daddf1d41e2a8a30bb 4], [http://git.kernel.org/linus/9c3569b50f12e47cc5e907b5e37e4a45c0c10b43 5], [http://git.kernel.org/linus/978fef914a2e6b8ad5672d0a39f9201b7aa7c396 6], [http://git.kernel.org/linus/7335cd3b41b1e704608ca46159641ca9cb598121 7], [http://git.kernel.org/linus/f08225d176a5736363beea653b9b3fb9400c1255 8)]

3. Btrfs fast device replacement

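As a filesystem that can span multiple devices, Btrfs can remove a disk easily, whether you want to shrink your storage pool or the device is failing and you want to replace it. Until now, replacing a disk meant adding the new device with "btrfs device add" and then removing the old one with "btrfs device delete".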

But the process is not as fast as it could be. Btrfs has added an explicit device replacement operation, "btrfs replace start", which is much faster.

The copy usually takes place at 90% of the available platter speed if no additional disk I/O is ongoing during the copy operation. The operation takes place at runtime on a live filesystem: it does not require unmounting the filesystem or stopping active tasks, and it is safe to crash or lose power during the operation, since the process will resume with the next mount. It's also possible to use the command "btrfs replace status" to check the status of the operation, or "btrfs replace cancel" to cancel it.
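As a rough illustration of how the replace commands fit together, here is a minimal Python sketch that builds the command lines for starting, checking and cancelling a replacement. It assumes the btrfs-progs form "btrfs replace start old_device new_device mountpoint"; the helper names and any device paths or mount points passed in are hypothetical, and actually running the commands needs root and btrfs-progs.

```python
import subprocess


def replace_command(action, mountpoint, old_device=None, new_device=None):
    """Build the argv list for a "btrfs replace" subcommand."""
    if action == "start":
        if not (old_device and new_device):
            raise ValueError("start needs both the old and the new device")
        return ["btrfs", "replace", "start", old_device, new_device, mountpoint]
    if action in ("status", "cancel"):
        # Both subcommands only need the mount point of the filesystem.
        return ["btrfs", "replace", action, mountpoint]
    raise ValueError(f"unknown action: {action}")


def run(argv):
    """Run the command (needs root and btrfs-progs); raise if it fails."""
    return subprocess.run(argv, check=True)
```

Keeping command construction separate from execution makes it possible to inspect or log the exact command before anything touches a live filesystem.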

Code:

[http://git.kernel.org/linus/ff023aac31198e88507d626825379b28ea481d4d (commit 1], [http://git.kernel.org/linus/e93c89c1aaaaaec3487c4c18dd02360371790722 2], [http://git.kernel.org/linus/3f6bcfbd4149875662773eb40a62294cddf215d4 3], [http://git.kernel.org/linus/ad6d620e2a5704f6bf3a39c92a75aad962c51cb3 4], [http://git.kernel.org/linus/8dabb7420f014ab0f9f04afae8ae046c0f48b270 5], [http://git.kernel.org/linus/b8b8ff590f99678616f9ea85f5088542d1cfc0be 6)]

4. F2FS, an SSD-friendly filesystem

F2FS is a new experimental filesystem, contributed by Samsung, optimized for flash memory storage devices. Linux has several filesystems targeted at flash devices (logfs, jffs2, ubifs), but they are designed for "native" flash devices that expose the flash storage directly to the computer. Many of the flash storage devices commonly used (SSDs) aren't "native" flash devices. Instead, they have an FTL ("flash translation layer") that emulates a block-based device and hides the true nature of flash memory. This makes it possible to use the existing block storage stacks and filesystems on those devices. These filesystems have made some optimizations to work better with SSDs (like [http://en.wikipedia.org/wiki/TRIM trimming]). But their on-disk formats haven't been changed to optimize for them.

F2FS is a filesystem for SSDs that takes the existence of the Flash Translation Layer into account and tries to make good use of it. The LWN article below covers the design choices made by F2FS in more detail.

Recommended LWN article: [https://lwn.net/Articles/518988/ An f2fs teardown]

Code: [http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=tree;f=fs/f2fs;hb=HEAD fs/f2fs]

5. User namespace support completed

Per-process namespaces allow processes to have different views of several resources. For example, a process might see one set of mount points, PID numbers and network stack state, while a process in another namespace sees a different one. Per-process namespace support has been developed over many years and is already usable in Linux systems: the command [http://linux.die.net/man/1/unshare unshare(1)], available in modern Linux distros, allows starting a process with the mount, UTS, IPC or network namespaces "unshared" from its parent; and systemd uses mount namespaces for the ReadWriteDirectories, ReadOnlyDirectories and InaccessibleDirectories unit configuration options, and for systemd-nspawn. But the use of namespaces was limited to root.

This release adds the ability for unprivileged users to use per-process namespaces safely. The resources with namespace support available are filesystem mount points, UTS, IPC, PIDs, and network stack.
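To see which namespaces a process belongs to, one option is to look at the entries under /proc/PID/ns. The Python sketch below assumes those entries are symbolic links whose target identifies the namespace (two processes in the same namespace show the same target); the function name is hypothetical, and the set of entries varies with the kernel version and configuration.

```python
import os


def current_namespaces(pid="self"):
    """Map namespace type to its identifier for one process.

    Returns an empty dict when /proc/<pid>/ns is unavailable.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry is not a symlink or is not readable
    return result
```

Comparing the result for two PIDs, entry by entry, shows which resources the two processes share and which ones have been unshared.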

For more details about Linux namespace support (what namespaces are, how they work, the API, and some example programs), see the LWN article series:

Namespaces in operation, [https://lwn.net/Articles/531114/ part 1: namespaces overview]
Namespaces in operation, [https://lwn.net/Articles/531381/ part 2: the namespaces API]
Namespaces in operation, [https://lwn.net/Articles/532271/ part 3: PID namespaces]
Namespaces in operation, [https://lwn.net/Articles/532748/ part 4: more on PID namespaces]

(The remaining namespaces will be covered in future LWN articles)

6. Huge Pages support a zero page

[http://kernelnewbies.org/Linux_2_6_38#head-f28790278bf537b4c4869456ad7b84425298708b Huge pages] are a type of memory page provided by the CPU memory management unit, which are much bigger than usual. They are typically used by big databases and applications which make use of large portions of memory. On the other hand, a "zero page" is a memory page full of zeros. This page is used by the kernel to save memory: some applications allocate large portions of memory full of zeros but they don't write to all parts of it, so instead of allocating that zeroed memory, the kernel just makes all the memory point to the zero page. Until now the zero page was only available for normal sized pages (4KB in x86); this release adds a huge zero page for applications that use huge pages.
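The access pattern that benefits from a zero page can be sketched in Python: map a large anonymous region, never write to it, and only read it back. The sketch assumes a Linux system; the huge page hint via madvise and MADV_HUGEPAGE is an assumption about how the application requests huge pages, and whether the kernel actually backs the reads with a huge zero page depends on its configuration. The function name and region size are hypothetical.

```python
import mmap

REGION_SIZE = 8 * 1024 * 1024  # arbitrary size for the demonstration
CHUNK = 4096  # read the region back in page-sized pieces


def read_untouched_memory(size=REGION_SIZE):
    """Map anonymous memory, never write to it, and check it reads as zeros."""
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        advice = getattr(mmap, "MADV_HUGEPAGE", None)
        if advice is not None and hasattr(region, "madvise"):
            try:
                region.madvise(advice)  # hint: back this range with huge pages
            except OSError:
                pass  # huge pages unavailable here; normal pages are used
        # Only reads happen below, so the kernel can satisfy them from a
        # shared zero page instead of allocating real memory.
        return all(
            region[offset:offset + CHUNK] == bytes(len(region[offset:offset + CHUNK]))
            for offset in range(0, size, CHUNK)
        )
    finally:
        region.close()
```

The function returns True because untouched anonymous memory always reads as zeros; the saving lies in how little physical memory the kernel needs to provide that guarantee.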

Recommended LWN article: [https://lwn.net/Articles/517465/ Adding a huge zero page]

Code: [http://git.kernel.org/linus/d8a8e1f0da3d29d7268b3300c96a059d63901b76 (commit 1], [http://git.kernel.org/linus/3ea41e6210fea3b234b6cb3e9443e75975850bbf 2], [http://git.kernel.org/linus/e180377f1ae48b3cbc559c9875d9b038f7f000c6 3], [http://git.kernel.org/linus/cad7f613c4d010e1d0f05c9a4fb33c7ae40ba115 4], [http://git.kernel.org/linus/fc9fe822f7112db23e51e2be3b886f5d8f0afdb6 5], [http://git.kernel.org/linus/93b4796dede916de74b21fbd637588da6a99a7ec 6], [http://git.kernel.org/linus/4a6c1297268c917e9c50701906ba2f7e06812299 7], [http://git.kernel.org/linus/97ae17497e996ff09bf97b6db3b33f7fd4029092 8], [http://git.kernel.org/linus/c5a647d09fe9fc3e0241c89845cf8e6220b916f5 9], [http://git.kernel.org/linus/79da5407eeadc740fbf4b45d6df7d7f8e6adaf2c 10], [http://git.kernel.org/linus/78ca0e679203bbf74f8febd9725a1c8dd083d073 11], [http://git.kernel.org/linus/80371957f09814d25c38733d2d08de47f59a13c2 12], [http://git.kernel.org/linus/479f0abbfd253d1117a35c1df12755d27a2a0705 13)]

7. The memory resource controller supports accounting of kernel memory

The Linux memory controller is a [http://en.wikipedia.org/wiki/Cgroups control group] that can limit, account and isolate the memory usage of arbitrary groups of processes. In this release, the memory controller has gained support for accounting two types of kernel memory usage: stack and slab usage. These limits can be useful for things like stopping fork bombs.

The files created in the control group are:

memory.kmem.limit_in_bytes: set/show hard limit for kernel memory
memory.kmem.usage_in_bytes: show current kernel memory allocation
memory.kmem.failcnt: show the number of kernel memory usage hits limits
memory.kmem.max_usage_in_bytes: show max kernel memory usage recorded
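These files can be read and written like any other control group file. The Python sketch below reads whichever of the four files exist in a given control group directory and sets the hard limit. The function names are hypothetical, and the directory is whatever path the memory controller is mounted at on a given system (something like /sys/fs/cgroup/memory/mygroup is an assumed example, not a fixed location).

```python
import os

KMEM_FILES = (
    "memory.kmem.limit_in_bytes",
    "memory.kmem.usage_in_bytes",
    "memory.kmem.failcnt",
    "memory.kmem.max_usage_in_bytes",
)


def read_kmem_stats(cgroup_dir):
    """Return {file name: integer value} for the kmem files that exist."""
    stats = {}
    for name in KMEM_FILES:
        try:
            with open(os.path.join(cgroup_dir, name)) as handle:
                stats[name] = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # file missing, unreadable, or not a plain number
    return stats


def set_kmem_limit(cgroup_dir, limit_bytes):
    """Write a hard kernel memory limit for the group (usually needs root)."""
    path = os.path.join(cgroup_dir, "memory.kmem.limit_in_bytes")
    with open(path, "w") as handle:
        handle.write(str(limit_bytes))
```

A monitoring script could call read_kmem_stats periodically and watch memory.kmem.failcnt: a growing value means the group keeps hitting its kernel memory limit.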

Recommended LWN article: [https://lwn.net/Articles/516529/ KS2012: memcg/mm: Improving kernel-memory accounting for memory cgroups]

Some parts of the code: [http://git.kernel.org/linus/510fc4e11b772fd60f2c545c64d4c55abd07ce36 (commit 1], [http://git.kernel.org/linus/7ae1e1d0f8ac2927ed7e3ca6d15e42d485903459 2], [http://git.kernel.org/linus/7a64bf05b2a6fe3703062d13d389e3eb904741c6 3], [http://git.kernel.org/linus/6a1a0d3b625a4091e7a0eb249aefc6a644385149 4], [http://git.kernel.org/linus/d79923fad95b0cdf7770e024677180c734cb7148 5], [http://git.kernel.org/linus/d5bdae7d59451b9d63303f7794ef32bb76ba6330 6], [http://git.kernel.org/linus/92e793495597af4135d94314113bf13eafb0e663 7)]

8. Driver and architecture-specific changes

All the driver and architecture-specific changes can be found in the [http://kernelnewbies.org/Linux_3.8_DriverArch Linux_3.8_DriverArch page]

9. Various core changes

10. Filesystems

11. Block

12. Crypto/keyring

13. Security

process [http://git.kernel.org/linus/2f4b3bf6b2318cfaa177ec5a802f4d8d6afbd816 (commit)]

14. Perf

15. Virtualization

16. Networking

, add support of link creation via rtnl 'ip link .. type ipip' [http://git.kernel.org/linus/be42da0e1012bf67d8f6899b7d9162e35527da4b (commit)]

17. Other news sites that track the changes of this release


KernelNewbies: Linux_3.8 (last edited 2013-02-19 00:33:53 by diegocalleja)