
Released 5 February, 2007 (full SCM git log)

Short overview (for news sites, etc)

With 2.6.20, Linux joins the virtualization trend. This release adds two virtualization implementations: KVM (http://kvm.sourceforge.net), a full-virtualization implementation that uses Intel/AMD hardware virtualization capabilities, and a paravirtualization implementation (http://lwn.net/Articles/194543) that can be used by different hypervisors (Rusty's lguest; Xen and VMware in the future, etc). This release also adds initial Sony PlayStation 3 support, a fault injection debugging feature (http://lwn.net/Articles/209257), UDP-Lite support, better per-process IO accounting, relative atime, support for using swap files for suspend users, relocatable x86 kernel support for kdump users, small micro-optimizations in x86 (sleazy FPU, regparm, support for the Processor Data Area, optimizations for the Core 2 platform), a generic HID layer, DEEPNAP power savings for PPC970, a lockless radix-tree read side, shared page tables for hugetlb, ARM support for the AT91 and iop13xx processors, full NAT for nf_conntrack, and many other things.

Important things (AKA: ''the cool stuff'')

Sony PlayStation 3 support

You may like the Wii or the 360 more, but only the PS3 is gaining official Linux support, written by Sony engineers. Note that the support is incomplete at this time (apparently a kernel with it enabled will not boot on a stock PS3), and it does not yet support built-in devices such as the graphics card. (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

Virtualization support through KVM

KVM (project page) adds a driver for Intel's and AMD's hardware virtualization extensions to the x86 architecture (KVM will not work on CPUs without virtualization capabilities). See the Virtualization wiki for more information about virtualization in Linux.

The driver adds a character device (/dev/kvm) that exposes the virtualization capabilities to userspace. Using this driver, a process can run a virtual machine (a "guest") in a fully virtualized PC containing its own virtual hard disks, network adapters, and display. Each virtual machine is a process on the host; a virtual CPU is a thread in that process. kill(1), nice(1), top(1) work as expected. In effect, the driver adds a third execution mode to the existing two: we now have kernel mode, user mode, and guest mode. Guest mode has its own address space mapping guest physical memory (which is accessible to user mode by mmap()ing /dev/kvm). Guest mode has no access to any I/O devices; any such access is intercepted and directed to user mode for emulation.
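As a rough illustration of how a userspace process talks to the driver, the sketch below opens the character device and asks for its API version. It is only a sketch under assumptions that are not in the text above: that the device answers a KVM_GET_API_VERSION ioctl, and that this request encodes to 0xAE00 on x86. The helper name kvm_api_version is hypothetical.

```python
import fcntl
import os

# Assumption: KVM_GET_API_VERSION is _IO(0xAE, 0x00), i.e. 0xAE00 on x86.
KVM_GET_API_VERSION = 0xAE00


def kvm_api_version(path="/dev/kvm"):
    """Return the API version reported by the KVM device, or None.

    None means the device could not be used: the module is not loaded,
    the CPU lacks virtualization extensions, or permission was denied.
    """
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return None
    try:
        # The ioctl takes no argument; its return value is the version.
        return fcntl.ioctl(fd, KVM_GET_API_VERSION)
    except OSError:
        return None
    finally:
        os.close(fd)
```

Calling kvm_api_version() on a host without KVM simply returns None, so the probe is safe to run anywhere.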

32-bit and 64-bit guests are supported (but not x86-64 guests on x86-32 hosts!). For i386 guests and hosts, both PAE and non-PAE paging modes are supported. SMP hosts and UP guests are supported; SMP guests aren't (support will be added in the future). You can also start multiple virtual machines on a host. Performance is currently non-stellar; it should improve a lot with the future inclusion of KVM paravirtualization support.

The Windows install currently bluescreens due to a problem with the virtual APIC; a fix is being worked on and will be added in future releases. A temporary workaround is to use an existing image or to install through qemu. Windows 64-bit does not work either (commit)

Paravirtualization support for i386

Paravirtualization is the act of running a guest operating system, under control of a host system, where the guest has been ported to a virtual architecture which is almost like the hardware it is actually running on. This technique allows full guest systems to be run in a relatively efficient manner (continue reading this LWN article for more information). The new framework makes it possible to plug in different hypervisors (lguest/lhype/rustyvisor implements a hypervisor in 6,000 lines; Xen and VMware will probably be ported to this framework some day). There are limitations, such as no SMP support yet; this feature will evolve a lot over time (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)

Relocatable kernel support for x86

This feature (enabled with CONFIG_RELOCATABLE) isn't very noticeable for end users, but it's quite interesting from a kernel point of view. Until now, an i386 kernel had to be loaded at a fixed memory address; loading it anywhere else would not work. This feature makes it possible to compile a kernel that can be loaded at different 4K-aligned addresses, always below 1 GB, with no runtime overhead. Kdump users will benefit from this. Kdump, introduced in 2.6.13, triggers kexec on a kernel crash to boot a rescue kernel that was previously loaded at an 'empty' address; the rescue kernel then saves the memory where the crashed kernel was placed, dumps it to a file and continues booting the system. Until now the "rescue kernel" had to be compiled with different configuration options to make it bootable at a different address. With a relocatable kernel, the same kernel can be booted at different addresses. (commit 1, 2, 3, 4)
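The two load-address constraints mentioned above (4K alignment, below 1 GB) can be written as a small check. This is only an illustration of the stated constraints, not the boot loader's actual logic; the function name is_valid_load_address is hypothetical.

```python
PAGE_SIZE = 4 * 1024        # load address must be 4K-aligned
LIMIT = 1024 ** 3           # and must stay below 1 GB


def is_valid_load_address(addr):
    """Check an address against the two constraints named in the text."""
    return 0 <= addr < LIMIT and addr % PAGE_SIZE == 0
```

For example, 0x1000000 (16 MB) passes the check, while 0x100800 fails on alignment and 0x40000000 (exactly 1 GB) fails on the upper bound.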

Fault injection

This is a debugging feature that 'injects' failures in several layers of the kernel (kmalloc() failures, alloc_pages() failures, disk IO errors). By 'injecting' them on purpose, developers can test how their code reacts to errors that are very difficult to hit in the real world, where things do not fail so often. For example, a filesystem might not correctly handle an error triggered by a broken hard disk. Because those error code paths are exercised very rarely, the code may contain bugs that a user could hit some day. Injecting those errors on purpose lets testing find such bugs much faster. The feature is enabled by the following configuration options: CONFIG_FAILSLAB, CONFIG_FAIL_PAGE_ALLOC and CONFIG_FAIL_MAKE_REQUEST. If you also want to configure them via debugfs you must enable CONFIG_FAULT_INJECTION_DEBUG_FS. Here is a LWN article about it; and the documentation is here. (commit 1, 2, 3, 4, 5, 6, 7, 8, 9)
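The idea can be illustrated with a userspace analogy. The sketch below is not the kernel's fault injection code: FaultInjector, allocate and copy_record are hypothetical names, and a single failure probability stands in for whatever knobs the real framework exposes. It shows how forcing an allocator to fail exercises an error path that would otherwise almost never run.

```python
import random


class FaultInjector:
    """Decides, with a fixed probability, that the next call should fail."""

    def __init__(self, probability, seed=0):
        self.probability = probability
        self.rng = random.Random(seed)   # seeded so test runs are repeatable

    def should_fail(self):
        return self.rng.random() < self.probability


def allocate(size, injector):
    """Stand-in for kmalloc(): returns None when a failure is injected."""
    if injector.should_fail():
        return None
    return bytearray(size)


def copy_record(data, injector):
    """Code under test: must cope with an allocation failure."""
    buf = allocate(len(data), injector)
    if buf is None:
        return False                     # the rarely exercised error path
    buf[:] = data
    return True
```

With a probability of 1.0 every call takes the error path, with 0.0 none does; values in between let a test suite hit both paths in one run.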

IO Accounting

The present per-task IO accounting isn't very useful: it simply counts the number of bytes passed into read() and write(). If a process reads 1MB from an already-cached file, it is charged with 1MB of I/O, which is 'wrong'. The new IO accounting implements per-process statistics of "storage I/O", i.e. I/O that _really_ reaches the storage device (Linux already had storage I/O statistics, but they were not per-task). The data is reported through taskstats and procfs (/proc/$PID/io) (commit 1, 2, 3, 4, 5, 6, 7, 8, 10, 11)
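A minimal sketch of reading these statistics from procfs follows. It assumes the file consists of "name: value" lines; the field names in the sample (read_bytes, write_bytes, cancelled_write_bytes) are taken from current kernels and are an assumption as far as this release is concerned. The helper names parse_proc_io and read_task_io are hypothetical.

```python
def parse_proc_io(text):
    """Turn 'name: value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def read_task_io(pid="self"):
    """Read /proc/<pid>/io, or return None if it is unavailable."""
    try:
        with open(f"/proc/{pid}/io") as f:
            return parse_proc_io(f.read())
    except OSError:
        return None


# Illustrative input only; the numbers are made up.
SAMPLE = "read_bytes: 4096\nwrite_bytes: 8192\ncancelled_write_bytes: 0\n"
```

A process that only reads already-cached files would be expected to show little or no growth in its storage read counter, which is exactly the distinction the new accounting draws.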

Relative atime support

'Atime' is the 'access time' field of a file: when a process reads a file, its atime is updated. Disabling atime updates with the 'noatime' mount flag is probably the most common performance tweak among Linux administrators: an active server is continually reading files, generating lots of atime updates, which translate to metadata updates that the filesystem must write to disk. Writing those updates can seriously damage performance. Believe it or not, a busy server like kernel.org (vsftpd + apache workload) cut its load average in half just by mounting its filesystems with 'noatime'.

Relative atime ('relatime') only updates the atime if the previous atime is older than the mtime or ctime. It avoids a lot of atime metadata updates (but not all of them, obviously; there's 'noatime' for that). It's like noatime, but still useful for applications like mutt that need to know whether a file has been read since it was last modified. Currently only OCFS2 supports it. A corresponding patch against mount(8) is available here (commit), ocfs2 support (commit)
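The relatime rule can be sketched as a single predicate over the three timestamps. This is a simplified model rather than the kernel code; the function name relatime_needs_update is hypothetical, and treating equal timestamps as "needs update" is an assumption.

```python
def relatime_needs_update(atime, mtime, ctime):
    """Return True if a read should update atime under relatime.

    Timestamps are plain numbers (e.g. seconds since the epoch).
    Assumption: an atime equal to mtime or ctime counts as stale.
    """
    return atime <= mtime or atime <= ctime
```

A file that was read after its last modification (atime newer than both mtime and ctime) is left alone on further reads, which is where the saved metadata writes come from. A mail reader can still tell that a mailbox has unread changes, because the first read after a modification does update the atime.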

UDP-Lite support

Support for UDP-Lite (RFC 3828) for IPv4, and an extension for UDP-Lite over IPv6, is added in 2.6.20. Documentation and a programming guide are available. UDP-Lite is a Standards-Track IETF transport protocol whose distinguishing characteristic is a variable-length checksum. This has advantages for transport of multimedia (video, VoIP) over wireless networks, as partly damaged packets can still be fed into the codec instead of being discarded due to a failed checksum test (commit)
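The effect of a variable-length checksum can be shown with a simplified model. The sketch below computes a 16-bit one's-complement Internet checksum over only the first `coverage` bytes of a payload; it deliberately ignores the pseudo-header and UDP-Lite header that the real protocol also covers, and the function names are hypothetical.

```python
def internet_checksum(data):
    """16-bit one's-complement checksum over a byte string."""
    if len(data) % 2:
        data += b"\x00"                  # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                   # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def partial_checksum(payload, coverage):
    """Checksum only the first `coverage` bytes, as UDP-Lite allows."""
    return internet_checksum(payload[:coverage])


original = b"HEADERpayload-bytes"
damaged = original[:-1] + b"X"           # bit errors in the uncovered tail
```

With a coverage of 6 bytes, `original` and `damaged` produce the same partial checksum, so the damaged packet would still be delivered to the codec. A checksum over the full payload differs between the two, which is the case where classic UDP would drop the packet.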

Generic HID layer

Currently the HID (Human Interface Device) layer only works with USB devices. 2.6.20 turns the USB-oriented HID layer into a generic HID layer that can be used by any subsystem that needs it, like Bluetooth. (commit 1, 2, 3, 4, 5, 6, 7, 8)

Sleazy FPU optimization

This is an x86-32 port of the x86-64 feature implemented in 2.6.19. It gives only a small improvement in FPU-intensive programs, but it's also an interesting optimization. Right now the kernel has a 100% lazy FPU behavior: after *every* context switch a trap is taken for the first FPU use to restore the FPU context lazily. This is great for applications that have very sporadic or no FPU use (since then you avoid doing the expensive save/restore all the time).

However, for very frequent FPU users every context switch takes an extra trap. This feature adds a simple heuristic to this code: after 5 consecutive context switches with FPU use, the lazy behavior is disabled and the context gets restored on every context switch. If the application indeed uses the FPU, the trap is avoided (the chance of the 6th time slice using the FPU after the previous 5 have done so is obviously quite high). After 256 switches, this is reset and the lazy behavior returns (until there are 5 consecutive switches again). The reason for this is to give the lazy behavior back to applications that use the FPU in bursts. (commit)
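The heuristic can be modelled as a small simulation that counts how many traps a task takes. This is a sketch of the behavior as described above, not the kernel implementation: count_fpu_traps is a hypothetical name, and the exact boundary conditions (eager from the 6th slice, reset on reaching 256) are assumptions drawn from the wording of the text.

```python
EAGER_THRESHOLD = 5     # consecutive FPU-using slices before going eager
RESET_AFTER = 256       # counter value at which lazy behavior returns


def count_fpu_traps(usage):
    """Count lazy-restore traps for a sequence of time slices.

    `usage` holds one boolean per time slice: True if the task touched
    the FPU during that slice.
    """
    consecutive = 0
    traps = 0
    for used_fpu in usage:
        eager = consecutive >= EAGER_THRESHOLD
        if used_fpu and not eager:
            traps += 1                  # first FPU use triggers the trap
        if used_fpu:
            consecutive += 1
            if consecutive >= RESET_AFTER:
                consecutive = 0         # fall back to lazy restoring
        else:
            consecutive = 0             # streak broken, stay lazy
    return traps
```

A task that uses the FPU in every one of 10 slices takes 5 traps instead of 10, while a task that alternates between using and not using the FPU never builds a streak and keeps the purely lazy behavior.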

Use 'regparm' in x86-32

This is another not-relevant-to-users-yet-interesting-for-geeks feature. It has been available as an option for a while, but it is now the default. Since forever the x86 architecture has passed function parameters on the stack. Modern architectures (PPC, SPARC, etc) use registers: it's much faster, since you don't need to do anything to bring the parameters back. The parameters are just there, in the registers. The x86 world (including Linux) continued using the stack for parameter passing for compatibility with existing software, compilers, etc; compilers only gained extensions to optionally use registers for parameter passing in a given function (usually involving the 'fastcall' keyword) for performance-critical paths.

Thanks to a GCC extension, the Linux kernel uses the '-mregparm=3' compile option, which means that as long as a function takes three or fewer arguments, GCC will automatically use registers to pass its parameters. And if you're wondering about x86-64, on that platform using registers has always been the default (commit)

round_jiffies() infrastructure

This is an example of the ongoing power-saving trend in the Linux kernel. This feature introduces the round_jiffies() and round_jiffies_relative() functions, which round a jiffies value to the next whole second. The targets of this rounding are all the "we don't care exactly when" timers. Rounded to whole seconds, all such timers fire at the same time rather than at various times spread out. This matters with dynamic ticks, where each separate timer expiry wakes the CPU from a deep sleep state and thus wastes power (commit 1, 2, 3)
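The rounding itself is simple arithmetic. The sketch below models it with a hypothetical helper, round_up_to_second, and an assumed tick rate of HZ = 250; the in-kernel round_jiffies() may differ in its details, so treat this only as an illustration of why the timers end up batched.

```python
HZ = 250    # assumed ticks per second; the real value is a kernel config choice


def round_up_to_second(jiffies, hz=HZ):
    """Round a jiffies value up to the next whole-second boundary."""
    return ((jiffies + hz - 1) // hz) * hz


# Three "don't care exactly when" timers that would fire at different ticks.
timers = [1010, 1100, 1240]
rounded = {round_up_to_second(t) for t in timers}
```

All three timers collapse onto the same expiry tick (1250 with HZ = 250), so the CPU is woken once instead of three times.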

New drivers

Here are some important new drivers that have been added to the Linux tree:

Various core changes

Architecture-specific changes

Filesystems

Networking

Various subsystems

Software suspend

Crypto

CPUFREQ

DM

Noflush suspending (commit), (commit), (commit)

SELinux

Add support for DCCP (commit)

Drivers

Graphics

Add support for secondary vertical blank interrupt to DRM core and add support to i915 (commit 1, 2); add ioctl in i915 for scheduling buffer swaps at vertical blanks. (commit)

sstfb: add sysfs interface (commit), support command line options (commit), support flat panel timings (commit), fixups for the AMD Geode GX framebuffer driver (commit), add support for STN displays in s3c2410fb (commit), add YUV video overlay support (commit) in mbxfb

Sound

The scheduled removal of the OSS drivers depending on OSS_OBSOLETE_DRIVER: miroSOUND PCM20 radio, Creative SBLive! (EMU10K1), Crystal Soundfusion (CS4280/461x), AD1816(A) based cards, AD1889 based cards (AD1819 codec), ACI mixer (miroSOUND PCM1-pro/PCM12/PCM20), NM256AV/NM256ZX audio support, Yamaha OPL3-SA2 and SA3 based PnP cards (commit)

V4L/DVB

Add support for remote control of Hauppauge HVR1110 (commit), add support for both DVB frontends of the Lifeview Trio (commit), add support for ptv-305 (commit), add support for Avermedia AverTV Studio 507 (commit), add support for the Terratec Cinergy HT PCMCIA module (commit), add support for Pinnacle 310i (commit), add working dib7000m-module (commit), dynamic cx88 mpeg port management for HVR1300 MPEG2/DVB-T support (commit), add usbvision driver (commit), add support for an ASUSTEK P7131 Dual DVB-T variant (commit), add support for Leadtek Winfast DTV Dongle (STK7700P based) (commit), add initial DiB7000M-demod driver (commit), add support for Dibcom DiB7000PC (commit), remove the broken VIDEO_ZR36120 driver (commit), add Omnivision OV7670 driver (commit), add support for Pinnacle PCTV 400e DVB-S (commit), add support for Hauppauge WinTV-HVR1110 DVB-T/Hybrid (commit), add support for the Compro Videomate DVB-T200A (commit), implement IR reception for 24xxx devices (commit), add Marvell 88ALP01 "cafe" driver (commit), add support for new revision of Nova-T Stick (commit)

libata

SCSI

Input drivers

Networking devices

USB

Hwmon

Watchdog

I2C

PCMCIA

MMC

IPMI

RTC

Firewire

Various

KernelNewbies: Linux_2_6_20 (last edited 2017-12-30 01:30:21 by localhost)