KernelNewbies:

Linux 2.6.28 kernel released on 25 December, 2008.

Summary: Linux 2.6.28 adds the first version of Ext4 as a stable filesystem, the long-awaited GPU memory manager which will be the foundation of a renewed graphics stack, support for Ultra Wide Band (Wireless USB, UWB-IP), memory management scalability and performance improvements, a boot tracer, disk shock protection, the Phonet network protocol, support for SSD discard requests, transparent proxy support, several new network drivers, controllable IO CPU affinity, high-resolution poll()/select(), support for a minimal "dummy" policy in SELinux, tracing improvements, x86 x2APIC support, a fb driver for VIA UniChrome devices, Mitac Mio A701 ARM-based smartphone support, some new drivers, improved device support, and many other small improvements and fixes.

1. Prominent features (the cool stuff)

1.1. Ext4

The backwards-compatible replacement for Ext3 has been declared stable. Bigger filesystem and file sizes, extents, delayed allocation, multiblock allocation, improved block allocation algorithms, faster fsck, online defragmentation, and faster and more robust journaling are the main features of this filesystem.

A separate article has been written about Ext4: Ext4, the Fourth Extended File System

1.2. The GEM Memory Manager for GPU memory

Recommended article: A description of all the parts involved in the new graphics stack: "EXA, UXA, DRI, GEM, TTM"

Recommended articles about GEM: "Memory management for graphics processors" and "GEM v. TTM"

In the last decade graphics hardware has evolved at an astounding pace, and it's expected to improve even more in the future. Modern GPUs have a lot of processing power (more than the most powerful CPU in some specialized workloads) that traditionally has only been used by specialized applications using OpenGL/DirectX, like games and 3D design apps. The 2D desktop implementations commonly found in computers kept using this modern graphics hardware in the same way they used the old graphics hardware which started the "desktop revolution" in the 80s and 90s, i.e. inefficiently. There's a lot of GPU power that doesn't get used unless you run a game. On the other hand, the Linux/FOSS graphics stack is far from perfect, even for the traditional graphics stack design. To start with, there are several drivers fighting to access the same resource (the graphics card): the fb-based console, the in-kernel DRM driver, the X.org userspace 2D driver... This situation leads to all kinds of problems, artifacts and suboptimal performance.

There has been a lot of work in recent years to modernize the Linux graphics stack so that it's both well designed and ready to use the full power of modern and future GPUs. In 2.6.28, Linux adds one of the most important pieces of the stack: a memory manager for GPU memory, called GEM ("Graphics Execution Manager"). Its purpose is to have a central manager for buffer object placement, caching, mapping and synchronization. It speeds up some benchmarks by 50%. Many improvements to the graphics stack are being built on top of GEM: Kernel Modesetting, DRI2, UXA (an EXA implementation based on GEM). The Linux/FOSS graphics stack will finally be unified and optimally coupled.

All this new code has been delayed for a long time because there was a competing memory manager, called TTM, which was almost merged into the kernel in 2.6.24 or so, until the Intel developers came up with the first versions of the GEM memory manager. People decided GEM was better than TTM, and it was considered necessary to delay the merge in order to stabilize GEM and rewrite the other features to work with GEM instead of TTM. Hence, this first version of GEM works only with the i915 driver, and support on the X.org side is implemented only in version 2.5.0 of the driver. Preliminary GEM support for other drivers is already in development and will be merged in future releases.

Code: (commit)

1.3. Support for "Ultra Wide Band" (UWB), Wireless USB and UWB-IP

"Ultra Wide Band" (UWB) is a high-bandwidth, low-power, point-to-point radio technology using a wide spectrum (3.1-10.6GHz). It is optimized for in-room use (480Mbps at 2 meters, 110Mbps at 10m). It serves as the transport layer for other protocols, such as Wireless USB, WiMedia Link Protocol (Ethernet/IP over UWB) and, in the future, Bluetooth and 1394. Linux 2.6.28 adds code to implement a Ultra Wide Band stack, as well as drivers for the the USB based UWB radio controllers defined in the Wireless USB 1.0 specification (including Wireless USB host controller and an Intel WiNET controller).

UWB: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), WLP: (commit 1, 2, 3, 4), WUSB: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9)

1.4. Memory management Scalability improvements

Improvements to the page replacement algorithm

Recommended LWN article: The state of the pageout scalability patches

Systems with a lot of memory have lots (millions) of pages. When the replacement algorithm has to search for candidate pages to evict, it has to search through all the pages, and in big systems this can take too much time. In 2.6.28, file-backed pages (pages that belong to some file on disk) and anonymous pages (pages that are not part of any file, e.g. pages obtained with malloc, which need to be written to swap before being evicted) are put in two different lists, unlike previous releases, which used a single list. The algorithms can decide to look into only one of those lists without needing to look in the other. Additionally, there are pages that cannot be evicted from memory, for example because they're mlock()'ed, or because they belong to a ramfs filesystem. Those pages are put into a special third list, which won't be searched at all by the algorithms because they cannot be evicted.

Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
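The effect of splitting the lists can be illustrated with a toy model. The sketch below is not kernel code: the Page class, its fields and both scan functions are hypothetical names invented for this example, and the active/inactive distinction used by the real algorithms is ignored. It only shows why a scan looking for file-backed pages examines fewer entries when those pages live in their own list.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Toy page descriptor; field names are illustrative, not kernel names."""
    pfn: int
    file_backed: bool = False
    evictable: bool = True  # False models mlock()'ed or ramfs pages


def scan_single_list(pages, wanted):
    """Old style: walk one mixed list until `wanted` file pages are found."""
    examined = 0
    victims = []
    for page in pages:
        if len(victims) == wanted:
            break
        examined += 1
        if page.evictable and page.file_backed:
            victims.append(page)
    return victims, examined


class SplitLRU:
    """New style: file, anonymous and unevictable pages kept apart."""

    def __init__(self, pages=()):
        self.file = deque()
        self.anon = deque()
        self.unevictable = deque()
        for page in pages:
            self.add(page)

    def add(self, page):
        if not page.evictable:
            self.unevictable.append(page)  # never scanned again
        elif page.file_backed:
            self.file.append(page)
        else:
            self.anon.append(page)

    def scan_file_list(self, wanted):
        """Only the file list is touched; the other lists are not walked."""
        victims = list(self.file)[:wanted]
        return victims, len(victims)
```

With a mixed workload of unevictable, anonymous and file-backed pages, the single-list scan may have to walk past every page that does not qualify, while the split version looks only at entries that are candidates.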

Rewrite the vmap layer

Recommended LWN article: Reworking vmap()

In 2.6.28, the vmap allocator has been rewritten to use rbtrees and lazy TLB flushing, and to provide a fast, scalable per-CPU frontend for small vmaps. Some benchmarks that exercise the vmap layer have been sped up by 20x or more.

Code: (commit 1, 2, 3, 4, 5, 6)
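The idea behind lazy TLB flushing can be sketched with a small counter model. This is an illustration under stated assumptions, not the kernel's implementation: the LazyUnmapper class and the threshold parameter are hypothetical, and the model simply assumes that unmapped pages may accumulate until a threshold is reached, at which point one flush covers all of them.

```python
class LazyUnmapper:
    """Toy model: defer TLB flushes until enough unmapped pages pile up."""

    def __init__(self, flush_threshold_pages):
        # Hypothetical tunable; a threshold of 1 models eager flushing.
        self.flush_threshold_pages = flush_threshold_pages
        self.pending_pages = 0
        self.flushes = 0

    def unmap(self, nr_pages):
        """Record an unmap; flush only when the backlog is large enough."""
        self.pending_pages += nr_pages
        if self.pending_pages >= self.flush_threshold_pages:
            self.flush()

    def flush(self):
        """One flush covers every page unmapped since the previous flush."""
        if self.pending_pages:
            self.flushes += 1
            self.pending_pages = 0


def count_flushes(unmap_sizes, flush_threshold_pages):
    """Return how many flushes a sequence of unmaps costs in this model."""
    unmapper = LazyUnmapper(flush_threshold_pages)
    for nr_pages in unmap_sizes:
        unmapper.unmap(nr_pages)
    unmapper.flush()  # drain whatever is left at the end
    return unmapper.flushes
```

In this model, a hundred single-page unmaps cost a hundred flushes when flushing eagerly, but only a handful when flushes are batched, which is the kind of saving the rewrite is after.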

1.5. Container freezer

Recommended LWN article: Freezing filesystems and containers

The container freezer is a cgroup subsystem that uses the swsusp freezer to freeze and restart an arbitrary group of tasks determined by the user. It's immediately useful for batch job management scripts. It should also be useful in the future for implementing container checkpoint/restart. For more details on how to use it, see the commit links.

Code: (commit 1, 2, 3, 4, 5, 6, 7, 8)
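A batch job manager could drive the freezer along the following lines. This is a minimal sketch, assuming that each cgroup directory of a hierarchy mounted with the freezer subsystem contains a freezer.state file that accepts the strings FROZEN and THAWED; check the freezer documentation that comes with the commits before relying on it. The helper names and the directory argument are hypothetical.

```python
import os

# Assumed values accepted by the freezer.state control file.
VALID_STATES = ("FROZEN", "THAWED")


def set_freezer_state(cgroup_dir, state):
    """Ask the kernel to freeze or thaw every task in one cgroup.

    `cgroup_dir` is the directory of the cgroup inside a mounted
    hierarchy; where that hierarchy is mounted is up to the admin.
    """
    if state not in VALID_STATES:
        raise ValueError("state must be one of %r" % (VALID_STATES,))
    with open(os.path.join(cgroup_dir, "freezer.state"), "w") as control:
        control.write(state + "\n")


def get_freezer_state(cgroup_dir):
    """Read back the current state reported for the cgroup."""
    with open(os.path.join(cgroup_dir, "freezer.state")) as control:
        return control.read().strip()
```

A script would call set_freezer_state() with FROZEN before suspending a job and with THAWED to resume it, reading the state back in between because freezing a large group may not complete instantly.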

1.6. Boot tracer

The purpose of this tracer is to help developers optimize boot times: it records the timings of the initcalls. Its output is meant to be parsed by the scripts/bootgraph.pl tool, which produces a graph of boot inefficiencies, giving a visual representation of the delays during initcalls. Users need to enable CONFIG_BOOT_TRACER, boot with the "initcall_debug" and "printk.time=1" parameters, and run "dmesg | perl scripts/bootgraph.pl > output.svg" to generate the final graph.

Code: (commit 1, 2, 3, 4, 5, 6)
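For a quick textual summary instead of an SVG, the same dmesg output can be processed with a few lines of script. The sketch below is not a port of bootgraph.pl. It assumes that each initcall produces a timestamped "calling  name+offset" line and a matching "initcall name+offset ... returned" line; this log format is an assumption and may differ between kernel versions, so adjust the two regular expressions to what your dmesg actually prints. The function names are hypothetical.

```python
import re

# Assumed line shapes (printk.time=1 prefix, then the initcall_debug text).
CALL_RE = re.compile(r"\[\s*([0-9.]+)\] calling\s+([A-Za-z0-9_.]+)\+")
RETURN_RE = re.compile(r"\[\s*([0-9.]+)\] initcall ([A-Za-z0-9_.]+)\+.*returned")


def initcall_durations(dmesg_lines):
    """Map initcall name to elapsed seconds, using the printk timestamps."""
    started = {}
    durations = {}
    for line in dmesg_lines:
        match = CALL_RE.search(line)
        if match:
            started[match.group(2)] = float(match.group(1))
            continue
        match = RETURN_RE.search(line)
        if match and match.group(2) in started:
            begin = started.pop(match.group(2))
            durations[match.group(2)] = float(match.group(1)) - begin
    return durations


def slowest_initcalls(dmesg_lines, top=5):
    """Return the `top` slowest initcalls as (name, seconds) pairs."""
    durations = initcall_durations(dmesg_lines)
    ranked = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]
```

Feeding it the lines of a saved dmesg file, for example slowest_initcalls(open("dmesg.txt")), gives a short list of the initcalls worth looking at first.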

1.7. Disk Shock Protection

ATA/ATAPI-7 specifies the IDLE IMMEDIATE command with unload feature. Issuing this command should cause the drive to switch to idle mode and unload disk heads. This feature is being used in modern laptops in conjunction with accelerometers and appropriate software to implement a shock protection facility. The idea is to stop all I/O operations on the internal hard drive and park its heads on the ramp when critical situations are anticipated.

For each ATA device, Linux 2.6.28 adds the file /sys/block/*/device/unload_heads. Writing an integer value to this file will take the heads of the respective drive off the platter and defer all I/O operations for the specified number of milliseconds. When the timeout expires normal operation will be resumed. The maximal value accepted for a timeout is 30000 milliseconds. However, there are some hard drives that only comply with an earlier version of the ATA standard, but do support the feature nonetheless. Unfortunately, there is no safe way Linux can detect these devices, so you won't be able to write to the unload_heads attribute. If you know that your device really does support the unload feature (for instance, because the vendor of your laptop or the hard drive itself told you so), then you can tell the kernel to enable the usage of this feature for that drive by writing the special value -1 to the unload_heads attribute. See this page for information about Linux support of the hard disk active protection system as implemented in IBM/Lenovo Thinkpads.

Code: (commit), (commit), (commit)
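A userspace shock protection daemon would end up writing to this attribute whenever the accelerometer reports trouble. The sketch below shows only that last step, under stated assumptions: the helper names are hypothetical, the range check accepts 0 up to the 30000 ms maximum mentioned above plus the special value -1, and whether 0 is meaningful to the kernel is not covered by the text.

```python
MAX_UNLOAD_TIMEOUT_MS = 30000  # maximum timeout stated in the text above
FORCE_ENABLE = -1              # special value that enables the feature


def unload_heads_path(device):
    """Build the sysfs attribute path for a block device such as 'sda'."""
    return "/sys/block/%s/device/unload_heads" % device


def check_unload_value(value):
    """Return `value` if it is a timeout in range or the special -1."""
    if value == FORCE_ENABLE or 0 <= value <= MAX_UNLOAD_TIMEOUT_MS:
        return value
    raise ValueError("expected -1 or 0..%d ms, got %r"
                     % (MAX_UNLOAD_TIMEOUT_MS, value))


def park_heads(attr_path, timeout_ms):
    """Write a validated value to the unload_heads attribute."""
    value = check_unload_value(timeout_ms)
    with open(attr_path, "w") as attr:
        attr.write("%d\n" % value)
```

For example, park_heads(unload_heads_path("sda"), 2000) would request that the heads stay parked for two seconds. Running it requires root privileges and a drive that supports the feature.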

1.8. Phonet Network Protocol

The Phone Network protocol (PhoNet) is a packet-oriented communication protocol developed by Nokia for use with its cellular modems for both IPC and RPC. With the Linux Phonet socket family, Linux host processes can receive and send messages from/to the modem, or any other external device attached to the modem; the modem takes care of routing. Phonet packets can be exchanged through various hardware connections depending on the device, such as USB with the CDC Phonet interface, infrared, Bluetooth, or a serial port. This is required for Maemo to use cellular data connectivity (if supported); it can also be used to control Nokia phones.

Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)

1.9. Network: Transparent proxying, new drivers, DSA...

1.10. Tracepoints

Recommended LWN article: Tracing: no shortage of options

Tracepoints are another mechanism for inserting static tracing points in the kernel, used by tools like LTTng (Linux Trace Toolkit). There's already a mechanism to insert such points, kernel markers, merged in Linux 2.6.24, but tracepoints are slightly different (see the LWN article). The scheduler has been instrumented with tracepoints, and ftrace has been ported to use them.

Code: (commit 1, 2, 3, 4, 5, 6)

1.11. -staging drivers

Recommended article: Moving the -staging tree

There's a controversy in the kernel community between the people who want to see new drivers merged into the main Linux tree as soon as possible, and the people who think that drivers must be of good quality before being merged. The -staging tree has been created to bring those out-of-tree drivers that don't yet meet the required quality level into the drivers/staging directory.

1.12. IO CPU affinity

2.6.28 adds support for controlling the IO completion CPU, either for all requests on a queue or on a per-request basis. A sysfs variable (rq_affinity) is exported which, if set, migrates the completion of each request to the CPU that originally submitted it. An internal bio helper (bio_set_completion_cpu()) is also added, so that queuers can ask for completion on a specific CPU. In testing, this has been shown to cut system time by as much as 20-40% on synthetic workloads where CPU affinity is desired.

Code: (commit)
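The decision described above can be written down as a small model. This is a sketch, not the block layer's code: the function names are hypothetical, and two points are assumptions made for the example, namely that a completion runs by default on the CPU that receives the interrupt, and that a per-request choice takes precedence over the queue-wide setting.

```python
def completion_cpu(submit_cpu, irq_cpu, rq_affinity=False, requested_cpu=None):
    """Pick the CPU that runs a request's completion in this toy model."""
    if requested_cpu is not None:
        # Per-request choice, the role played by bio_set_completion_cpu().
        return requested_cpu
    if rq_affinity:
        # Queue-wide setting: go back to the CPU that submitted the request.
        return submit_cpu
    # Assumed default: complete wherever the interrupt was delivered.
    return irq_cpu


def remote_completions(requests, rq_affinity):
    """Count completions that run on a CPU other than the submitter.

    `requests` is an iterable of (submit_cpu, irq_cpu) pairs.
    """
    return sum(
        1
        for submit_cpu, irq_cpu in requests
        if completion_cpu(submit_cpu, irq_cpu, rq_affinity) != submit_cpu
    )
```

When all interrupts land on one CPU but requests are submitted from several, the model reports mostly remote completions without rq_affinity and none with it, which is the situation where the reduced system time would be expected.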

1.13. FIEMAP

Recommended LWN article: SEEK_HOLE or FIEMAP?

When an application wants to know how a file is stored on disk (for example, a backup application that wants to know whether a file is sparse so it can avoid backing up the holes), it uses the FIBMAP ioctl. But this ioctl is suboptimal: it can only be asked about one block at a time, which is too expensive for big files. The FIEMAP ioctl, on the other hand, returns a list of extents.

Code: (commit 1, 2, 3, 4)
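The difference between the two interfaces is easiest to see on data. The sketch below does not call either ioctl. It takes a per-block mapping of the kind a FIBMAP loop would collect, one lookup per logical block with None standing for a hole, and coalesces it into the extent list that an extent-based interface can report directly. The Extent type and the function name are hypothetical and do not mirror the kernel's structures.

```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    logical: int   # first logical block of the run
    physical: int  # first physical block of the run
    length: int    # number of contiguous blocks in the run


def blocks_to_extents(block_map: List[Optional[int]]) -> List[Extent]:
    """Coalesce a per-block mapping (None marks a hole) into extents."""
    extents: List[Extent] = []
    for logical, physical in enumerate(block_map):
        if physical is None:
            continue  # hole: nothing allocated, nothing to report
        last = extents[-1] if extents else None
        if (last is not None
                and last.logical + last.length == logical
                and last.physical + last.length == physical):
            # Block continues the previous run both logically and physically.
            extents[-1] = last._replace(length=last.length + 1)
        else:
            extents.append(Extent(logical, physical, 1))
    return extents
```

A seven-block file with a two-block hole in the middle needs seven per-block lookups but is fully described by two extents, and the gap between the extents is the hole a backup tool can skip.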

2. Various core

3. Filesystems

4. Networking

5. Security

6. Tracing/Profiling

7. Block

Recommended LWN articles: Block layer: solid-state storage, timeouts, affinity, and more, and Block layer discard requests

8. Crypto

9. WIFI

10. Architecture-specific changes

11. Drivers

11.1. Graphics

11.2. Storage

11.3. Network

11.4. Input

11.5. USB

11.6. Sound

11.7. V4L/DVB

11.8. HID

11.9. HWMON

11.10. I2C

11.11. Multi-Function Devices

11.12. MTD

11.13. RTC

11.14. WATCHDOG

11.15. LED

11.16. ACPI

11.17. Various

11.18. Other sources tracking the kernel changes


KernelNewbies: Linux_2_6_28 (last edited 2017-12-30 01:30:15 by localhost)