KernelNewbies:

Released 20 September, 2006 (full SCM git log)

Short overview (for news sites)

This release includes lightweight user space priority inheritance support (http://lwn.net/Articles/178253/), a "lock validator" debugging tool (http://lwn.net/Articles/185666/), a new power saving policy for multicore systems, SMPnice (http://lwn.net/Articles/186438/), a much improved SATA layer (http://lwn.net/Articles/183734/), swapless page migration (http://lwn.net/Articles/160201/), per-zone VM counters, per-task delay accounting, a new per-packet access control for SELinux called 'secmark' (http://james-morris.livejournal.com/11010.html), randomized i386 vDSO, a few new drivers, additional device support for many existing drivers, many bug fixes and many other small improvements.

Important things (AKA: ''the cool stuff'')

Lightweight user space priority inheritance (PI)

PI is a critical feature for RT-ish applications. Without PI, if a high-priority and a low-priority task share a lock, the kernel cannot guarantee deterministic execution of the high-priority task, even if all critical sections are coded carefully to be deterministic (i.e., they are short in duration and execute only a limited number of instructions): any medium-priority task could preempt the low-priority task while it holds the shared lock and executes the critical section, and could delay it, and with it the high-priority task waiting for the lock, indefinitely. User-space PI helps with achieving/improving determinism for user-space applications in those cases. Detailed LWN article, glibc patch can be found here, justification for this feature and design documentation: (commit); code: (commit), (commit), (commit)
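The inversion scenario described above can be made concrete with a toy model. This is only a minimal sketch, not the kernel's implementation: it assumes a single CPU, a single lock, strict priority scheduling in one-tick steps, and tasks whose whole work is either inside or outside the critical section. The `Task` class, the `simulate` function and all tick counts are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    prio: int         # higher number = higher priority
    arrival: int      # tick at which the task becomes ready
    work: int         # ticks of CPU time the task needs
    needs_lock: bool  # True if all of its work is inside the critical section


def simulate(tasks, priority_inheritance):
    """Toy single-CPU, single-lock scheduler; returns {task name: finish tick}."""
    remaining = {t.name: t.work for t in tasks}
    finish = {}
    owner = None  # task currently holding the lock
    tick = 0
    while len(finish) < len(tasks):
        ready = [t for t in tasks if t.arrival <= tick and t.name not in finish]
        # Tasks that want the lock while somebody else holds it are blocked.
        waiters = [t for t in ready
                   if t.needs_lock and owner is not None and t is not owner]
        runnable = [t for t in ready if t not in waiters]

        def effective(t):
            # With PI, the lock owner inherits the priority of its best waiter.
            if priority_inheritance and t is owner and waiters:
                return max(t.prio, max(w.prio for w in waiters))
            return t.prio

        if not runnable:
            tick += 1
            continue
        current = max(runnable, key=effective)
        if current.needs_lock and owner is None:
            owner = current  # take the lock
        remaining[current.name] -= 1
        tick += 1
        if remaining[current.name] == 0:
            finish[current.name] = tick
            if owner is current:
                owner = None  # release the lock on completion
    return finish


scenario = [
    Task("low", prio=1, arrival=0, work=2, needs_lock=True),
    Task("medium", prio=2, arrival=1, work=5, needs_lock=False),
    Task("high", prio=3, arrival=1, work=1, needs_lock=True),
]
without_pi = simulate(scenario, priority_inheritance=False)
with_pi = simulate(scenario, priority_inheritance=True)
print(without_pi["high"], with_pi["high"])
```

In this made-up scenario the high-priority task finishes at tick 8 without PI, because the medium-priority task runs to completion while the lock holder is preempted, and at tick 3 with PI, because the lock holder is boosted just long enough to leave its critical section.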

Lockdep, a kernel lock validator

Linux's locking style is known for being simple compared with other SMP-friendly Unix derivatives. Still, locking is a necessary evil that is hard to get right for most normal programmers (most of us). Locking bugs can be very difficult to find, especially in drivers, which don't get the solid review that the core kernel has. The kernel lock validator is a debugging tool that tries to make such things easier. It adds (LWN article) "a complex infrastructure to the kernel which can then be used to prove that none of the locking patterns observed in a running system could ever deadlock the kernel". If you want to help make Linux more stable, give it a run and report the backtraces printed in dmesg to linux-kernel@vger.kernel.org or http://bugzilla.kernel.org. Design documentation: (commit), code: (commit)

Process scheduler

New power saving policy

In machines with several multi-core/SMT "packages" (which will become increasingly common in the future), power consumption can be reduced by letting some packages idle while others do all the work, instead of spreading the tasks over all CPUs, so an optional power saving policy has been developed to make this possible. The policy is enabled by writing 1 to the sysfs entry 'sched_mc_power_savings' or 'sched_smt_power_savings' under /sys/devices/system/cpu/ (available when the kernel is built with CONFIG_SCHED_MC / CONFIG_SCHED_SMT). When it is enabled and the system is under light load, the scheduler will minimize the number of physical packages/CPU cores carrying the load, thus conserving power, at a performance cost that depends on the workload characteristics. When there is a lot of work to do, all CPUs will still be used; to completely disable individual CPUs, use the already available CPU hotplug feature by writing 0 to the "online" file in /sys/devices/system/cpu/cpuX/. For more details on the effect of this policy read the "Chip Multi Processing (CMP) aware Linux Kernel Scheduler" talk from OLS 2005 (page 201 and onwards) (commit)

SMPnice

(A.K.A. 'take priority into account when balancing processes between CPUs'): One of the design principles of the new 2.6 scheduler (aka, "Ingo's O(1) scheduler") was the idea of having a separate run queue of processes for each CPU present on the system, instead of a single run queue for all CPUs, for scalability reasons. Periodically, the scheduler would balance the per-CPU run queues to distribute all the jobs and keep all the CPUs busy. However, priority levels were not taken into account when doing this balancing, and it was possible to create scenarios where the kernel was unfair when mixing processes with different priorities. "SMPnice" is an implementation of a solution for this problem (LWN article), (commit)
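The difference between balancing by task count and balancing by priority can be sketched with a small example. This is an illustrative model only: the greedy placement in `balance` and the `weight` function (20 minus the nice value) are assumptions made for the example, not the load weights or the periodic balancing code the kernel actually uses.

```python
def weight(nice):
    """Hypothetical load weight: nice -20 maps to 40, nice 19 maps to 1."""
    return 20 - nice


def balance(nices, n_cpus, load_of):
    """Greedily place each task on the run queue with the smallest load."""
    queues = [[] for _ in range(n_cpus)]
    for nice in sorted(nices):  # most important (lowest nice) first
        target = min(queues, key=lambda q: sum(load_of(n) for n in q))
        target.append(nice)
    return queues


tasks = [0, 19, 19]  # one normal task and two heavily niced ones
by_count = balance(tasks, 2, lambda nice: 1)  # every task counts the same
by_weight = balance(tasks, 2, weight)         # priority-aware load
print(by_count)   # [[0, 19], [19]]
print(by_weight)  # [[0], [19, 19]]
```

When every task counts the same, one nice 19 task ends up with a CPU to itself while the nice 0 task has to share; with priority-aware weights the nice 0 task gets its own run queue and the two low-priority tasks share the other.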

Memory management

Swapless page migration

Being able to migrate physical pages between nodes in NUMA-like systems - to improve the locality of reference - was introduced in Linux 2.6.16, but it didn't use a very clean method: pages were swapped out on purpose, and the next time they were faulted they would be swapped in on the node you wanted to move them to instead of the old one. Now the feature has been completed with "direct page migration": pages are moved directly from one node to another, without using swap. This feature includes a new system call which allows moving individual pages of a process from one node to another: long move_pages(pid, number_of_pages_to_move, addresses_of_pages[], nodes[] or NULL, status[], flags) (the swap-based migration had already added a migrate_pages() syscall and a MPOL_MF_MOVE flag for the mbind() syscall). For full details, read this (LWN article). Code: (commit), (commit), (commit), (commit), (commit), (commit)

Per-zone VM counters

Zone-based VM statistics are necessary to be able to determine the state of memory in a zone. The VM counters available until now were split per processor, but the processor has little to do with the zone the pages belong to: we could not tell, for example, how many pages on a particular node are dirty. Knowing that makes it possible to put measures into the VM to balance the use of memory between different zones and different nodes in a NUMA system. It would allow the development of new NUMA balancing algorithms that may be able to improve the scheduler's decisions about when to move a process to another node, and hopefully it will also enable automatic page migration through a user space program that can analyze the memory load distribution and then rebalance memory use in order to increase performance. This feature provides that information. The zone_reclaim_interval sysctl vanishes (since VM stats can now determine when it is worth doing local reclaim), and there are accurate counters in /sys/devices/system/node/node*/meminfo (the previous counters were not very accurate). Other detailed VM counters are available in more /proc and /sys status files (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit)
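A user space program of the kind mentioned above could start by reading the per-node counters. The following is a minimal sketch that assumes each line of the meminfo file has the shape "Node <id> <Name>: <value> kB"; the helper names `parse_node_meminfo` and `read_all_nodes` are hypothetical, and the sample values are made up for illustration.

```python
import glob


def parse_node_meminfo(text):
    """Parse lines shaped like 'Node 0 Dirty: 128 kB' into {name: value in kB}."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "Node":
            counters[fields[2].rstrip(":")] = int(fields[3])
    return counters


def read_all_nodes(pattern="/sys/devices/system/node/node*/meminfo"):
    """Return {path: counters} for every node file that matches the pattern."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            result[path] = parse_node_meminfo(f.read())
    return result


# Made-up sample in the assumed format, used instead of a real NUMA machine.
sample = """\
Node 0 MemTotal:      1048576 kB
Node 0 MemFree:        524288 kB
Node 0 Dirty:             128 kB
"""
counters = parse_node_meminfo(sample)
print(counters["Dirty"])
```

On a machine without the node directory, `read_all_nodes()` simply returns an empty dictionary.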

Per-task delay accounting

This feature collects information on the time spent by a task waiting for system resources like CPU, synchronous block I/O completion and swapping in pages. Until now, it was only possible to know that a process was not running, but it was not possible to obtain detailed information about what was making the process spend that time. The data is exported through netlink and /proc/<tgid>stats (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit)

Big libata (SATA) update

(LWN article) Mainstream libata has been missing some features like NCQ and hot plug. The code had been written a while ago (more than a year ago in the case of NCQ) but only now has it been considered stable. The features included in this update are: revamped error handling across all the libata code, which makes libata more robust to errors and failures and makes it easier to debug problems (commit); NCQ (Native Command Queuing), which improves performance greatly for many workloads (commit); hotplug (commit), warmplug (commit), and bootplug - boot probing via hotplug path - support (commit); interrupt-driven PIO mode (instead of the inefficient polling method) (commit); MCP61 support (commit)

Change the default IO scheduler to 'CFQ'

2.6 features modular I/O schedulers: there are several I/O schedulers with different performance properties, and you can switch between them at runtime through /sys/block/hda/queue/scheduler. The Anticipatory Scheduler (AS) has been the default one so far, but the CFQ (Complete Fair Queuing) scheduler has been gaining adoption, to the point that it's the default I/O scheduler for RHEL 4, Suse, and other distros. One of the coolest things about CFQ is that it features (since 2.6.13) "io priorities": you can set the I/O priority of a process, either to keep a process that does too much I/O (daily updatedb) from starving the rest of the system, or to give extra priority to a process that shouldn't be starved by other processes, by using the "ionice" tool included in schedutils (1.5.0 and onwards). Now CFQ is the default scheduler (commit) (after some performance tweaks that should improve the performance in many workloads) (commit). If you want to continue using the AS scheduler, you can change it at runtime in /sys/block/hda/queue/scheduler, or use the "elevator=as" boot option.
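The scheduler file lists the available schedulers on one line and marks the active one with square brackets. A minimal sketch for reading it follows; the helper names `parse_schedulers` and `read_schedulers` are hypothetical, and the sample line is an illustration of the format rather than output captured from a real system.

```python
def parse_schedulers(text):
    """Split e.g. 'noop anticipatory deadline [cfq]' into (available, active)."""
    available = []
    active = None
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # the bracketed entry is the active scheduler
            active = name
        available.append(name)
    return available, active


def read_schedulers(device, sysfs_root="/sys"):
    """Read the scheduler list for a block device such as 'hda'."""
    with open(f"{sysfs_root}/block/{device}/queue/scheduler") as f:
        return parse_schedulers(f.read())


available, active = parse_schedulers("noop anticipatory deadline [cfq]\n")
print(available, active)
```

Switching scheduler is the reverse operation: writing one of the available names (for example "anticipatory") to the same file as root.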

Secmark: Add security markings to packets via iptables

(James Morris article) SELinux already has methods to "mark" network packets, but they're not as expressive or powerful as the controls provided by Netfilter/iptables. So Netfilter/iptables has been leveraged for packet selection and labeling, so that now SELinux can have more powerful and expressive network controls for adding security markings to packets. This also allows for increased security, as the policy is more effective, allowing access to the full range of iptables selectors and support mechanisms. The feature includes a SECMARK target allowing the admin to apply security marks to packets via both iptables and ip6tables, a CONNSECMARK target used to specify rules for copying security marks from packets to connections and for copying security marks back from connections to packets, and secmark support in conntrack. Examples of policies and rulesets, and patches for libselinux can be found here. (commit), (commit), (commit), (commit), (commit), (commit), (commit)

Add binding/unbinding support for the VT console

This feature adds the ability to detach and attach the framebuffer console to and from the vt layer. With this change, it is possible to detach fbcon from the console layer. When it is detached, the boot console driver (which is permanently loaded) is reattached to the console layer so the system can continue to work. Similarly, fbcon can be reattached to the console layer without having to reload the module. Attaching and detaching fbcon is done via sysfs attributes. A class device entry for fbcon is created in /sys/class/graphics. The two attributes that control this feature are detach and attach. Two other fbcon-specific attributes that were piggybacked under /sys/class/graphics/fb[n], 'con_rotate' and 'con_rotate_all', are moved to fbcon and renamed 'rotate' and 'rotate_all' respectively. Overall, this feature is a great help for developers working on the framebuffer or console layer, as there is no need to reboot the kernel for every small change. It is also useful for regular users who want to choose between a graphical console and a text console without having to reboot (commit), (commit), (commit), (commit), (commit), (commit)

New drivers

Here are some important drivers that have been added to the Linux tree. Note that only important new drivers are listed here. Other small drivers are listed below; the already available drivers also gain support for new devices, and some of those are listed below too, but support for new devices is added so fast that it's impossible to keep track of all of them.

Generic IRQ layer

Yet more generalization of the IRQ handling layer. Not all architectures were using the current IRQ layer (especially ARM) and the current one had some shortcomings. From this LWN article: "These patches attempt to take lessons learned about optimal interrupt handling on all architectures, mix in the quirks found in the fifty (yes, fifty) ARM sub architectures, and create a new IRQ subsystem which is truly generic, and more powerful as well." Design documentation: (commit); code: (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit)

Generic core time subsystem

Timekeeping has so far been done in an architecture-dependent way. This work tries to provide a core time subsystem that can be used by all architectures, avoiding lots of code duplication. Detailed analysis in this LWN article; (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit), (commit)

Randomize the i386 vDSO

Move the i386 vDSO down into a vma and thus randomize it. Besides the security implications (attackers can no longer use the predictable high-mapped vDSO page as a syscall trampoline), this feature also helps debuggers, and it's good for hypervisors (Xen, VMware) too. There's a new CONFIG_COMPAT_VDSO option, which provides support for older glibcs that still rely on a prelinked high-mapped vDSO. Newer distributions (using glibc 2.3.3 or later) can turn this backwards-compatibility option off (recommended for security reasons, as randomization makes certain types of attacks harder). There is a new vdso=[0|1] boot option as well, and a runtime /proc/sys/vm/vdso_enabled sysctl switch, which allow the vDSO to be turned on/off (commit)
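One way to observe the effect is to look for the "[vdso]" entry in /proc/<pid>/maps across several runs of a program. The sketch below is an assumption-laden illustration: the helper name `vdso_range` is hypothetical, and the sample addresses are made up rather than taken from a real 2.6.18 system.

```python
def vdso_range(maps_text):
    """Return (start, end) of the [vdso] mapping in /proc/<pid>/maps text, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "[vdso]":
            start, end = (int(part, 16) for part in fields[0].split("-"))
            return start, end
    return None


# Made-up sample showing the general layout of a maps file.
sample = (
    "08048000-0804c000 r-xp 00000000 03:01 12345 /bin/cat\n"
    "b7f5a000-b7f5b000 r-xp 00000000 00:00 0 [vdso]\n"
)
found = vdso_range(sample)
print(hex(found[0]), hex(found[1]))
```

Feeding it the contents of /proc/self/maps from several fresh processes should show a different start address each time when randomization is active, and the same address when the compatibility option maps the vDSO at a fixed location.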

Various core stuff

Other stuff

Architecture-specific changes

x86 32/64

Make powernow-k7 work on SMP kernels (commit), a cache pollution aware update to copy_from_user_ll() (commit), an x86-64 version of the "alternatives" feature in x86-32 (commit), NMI watchdog support for new Intel CPUs (commit), reliable stack trace support for x86-64 (commit), x86_64 stack overflow debugging (commit)

PPC

Add cpufreq support to Xserve G5 (commit), use the device tree for the iSeries vio bus probe (commit), (commit), add support for PCI-Express nodes in the device tree (commit), oprofile support for POWER6 (commit), add cell RAS support (commit), support for Time-Of-Day-Clock (commit), base support for the Freescale MPC8349E-mITX eval board (commit), 85xx CDS board support (commit), 86xx HPCN platform support (commit), (commit), (commit), (commit), (commit), (commit), Freescale mpc7448 (Taiga) board support (commit)

ARM

Initial uCLinux support for MMU-based CPUs (commit), add the base support for Hilscher's netX network processors (commit), add AMBA CLCD support in lpd7a40x (commit), add support for Philips PNX4008 ARM platform (commit), add spi support to lubbock platform (commit), add support for NXDKN development board (commit), core support for the Samsung s3c2442, and its serial port (commit), framebuffer driver for Hilscher netX (commit), add support for NXDB500 development board (commit), add support for NXEB500HMI development board (commit), add support for Trizeps4 SoM and ConXS-evalboard (commit), add cirrus logic edb9315 support (commit), add ajeco 1arm sbc support (commit)

MIPS

Add: support for the S3c2412 core cpu (commit), APM emu support (commit), the R5500-based NEC EMMA2RH Mark-eins board (commit), the GT-64120-based Wind River 4KC PPMC evaluation board (commit), the RM9000-based Basler eXcite smart camera platform (commit), cirrus logic edb9315 support to ep93xx (commit) and for edb9302 (commit), MIPS32/MIPS64 secondary cache management (commit), remove support for NEC DDB5476 (commit) and DDB5074 (commit), add core support for the TI F-Sample Board (OMAP 850) (commit), readd Amstrad Delta USB support (commit), add GPMC support for OMAP2 (commit), add bitbang SPI driver for Innovator 1510 touchscreen (commit) and add oprofile support for VSMP on 34K (commit)

SPARC64

Use the OBP to obtain information about the system (commit), (commit)

IA64

MSI support for Altix (commit), (commit)

S390

S390 Hypervisor Filesystem (commit), add support for parallel-access-volumes to the dasd driver (commit)

m68k

Coldfire 532x support (commit), (commit), (commit)

Filesystems

SELinux

Add security class for appletalk sockets so that they can be distinguished in SELinux policy (commit), execve argument logging (commit), ppid logging (commit), filtering by ppid (commit), path-based rules using internally the inotify API (commit), SELinux hooks to support the access key retention subsystem within the kernel (commit), support for a rule key, which can be used to tie audit records to audit rules. This is useful when a watched file is accessed through a link or symlink, as well as for general audit log analysis (commit), support for object context filters based on the elements of the SELinux context (commit), audit syscall classes: allow tying the upper bits of the syscall bitmap in audit rules to kernel-defined sets of syscalls (commit), add security hooks to {get,set}affinity to enable security modules to control these operations between tasks with task_setscheduler and task_getscheduler LSM hooks (commit), add a security hook call to enable security modules to control the ability to attach a task to a cpuset (commit), implement an LSM hook for setting a task's IO priority (commit), add security_task_movememory calls to mm code to enable security modules to mediate this operation between tasks (commit), add task_movememory hook to be called when memory owned by a task is to be moved (commit), add sockcreate node to procattr API - /proc/self/attr/sockcreate. A process may write a context into this interface and all subsequent sockets created will be labeled with that context (commit), add rootcontext= option to label root inode when mounting (commit), add audit AUDIT_PERM support (commit)

Networking

Drivers and other subsystems

Video

Add i945G support to the intelfb driver (commit) and i945GM as well (commit), add suspend/resume support for nVidia nForce AGP (commit), update radeon driver and add r200 vertex program support (R200_EMIT_VAP_PVS_CNTL) (commit), add support for Geforce 6100 and related chipsets to nvidiafb (commit), add support for Display Update Module and RGB framebuffer device on Philips PNX4008 ARM board (commit), add frame buffer driver for the 2700G LCD controller present on Compulab CM-X270 computer module (commit)

Sound

hda-codec: Add support for: Apple Mac Mini (early 2006) (commit), Sony Vaio VGN-A790 laptop with ALC260 codec (commit), Sony Vaio VGN-S3HP with ALC260 codec (commit), Thinkpad X60/T60/Z60 laptops with AD1981HD codec (commit), LG S1 laptop (commit), ATI RS600 HDMI audio device (commit), 9227/9228/9229 sigmatel hda codecs (commit), HP nx6320 with AD1981HD codec (commit), ALC888, ALC660 (ALC861-compatible) codecs and HP xw4400/6400/8400/9400 (model=hp-bpc) (commit), Intel D965 boards with STAC9227 codec (commit)

Add support for SB Live! 24-Bit External remote control (commit), for Audigy4 (not Pro) (commit), for Turtle Beach Roadie (commit), for oss sound support in au1200 (commit), for iMac G5 iSight (commit), for power management in the cs5535audio (commit) and azt3328 (commit) drivers

Add O_APPEND flag support to PCM to enable shared substreams among multiple processes (commit)

SCSI

Create libiscsi (commit), expose the bus setting to sysfs in aic7xxx driver (commit), add DMI (Diagnostics Monitoring Interface) (commit) and NVRAM 'Disable Serdes' bit support (commit) in qla2xxx driver, wide port support in mptsas (commit), and add 1078 ROC (Raid On Chip) Support (commit)

Input

Add mapping for Wistron MS 2111 (commit), add support for Intellimouse 4.0 (commit), and add input device support (commit)

USB

Add: Macbook Pro touchpad support (commit), new driver for Cypress CY7C63xxx microcontrollers (commit), add support for Kyocera Wireless KPC650/Passport EV-DO/1xRTT PC Cards (commit) and for Sierra Wireless MC5720 (commit), add support for ASIX 88178 chipset USB Gigabit Ethernet adaptor (commit), add support for Yost Engineering Servocenter3.1 (commit), add support for VIA VT8251 (commit), add support for WiseGroup., Ltd Smartjoy Dual PLUS Adapter (commit), add ZyXEL vendor/product ID to rtl8150 driver (commit), add driver for non-composite Sierra Wireless devices (commit), add ohci bits for the cirrus ep93xx (commit), add support for Susteen Datapilot Universal-2 cable in pl2303 (commit)

Network drivers

Add new SMSC LAN83C185 10BaseT/100BaseTX PHY driver for the PHY subsystem (commit), add VLAN (802.1q) support to the sis900 driver (commit), enable (via the IPW2200_PROMISCUOUS config option) the creation of a second interface prefixed 'rtap' for RF promiscuous mode in the ipw2200 driver (commit), add TRENDnet TE-CF100 ethernet adapter support in pcnet_cs driver (commit), add support for the Cicada 8201 PHY (commit); expose several configuration knobs through ethtool in the forcedeth driver - ring sizes (commit), WOL (commit), rx and tx checksum offloads (commit), flow control (commit), diagnostic tests (commit) and hardware statistic counters (commit) - and add new device ids (commit); convert au1000_eth driver to use PHY framework (commit), enable shared key authentication (commit) in the bcm43xx driver and add ipv6 TSO feature (commit) in the TG3 driver, allow WoL settings on new 5708 chips (commit) and add firmware decompression (commit) in the BNX2 driver, add ethtool eeprom support (commit) in 8139cp driver, add WOL support (commit) in the b44 driver, add netpoll support to the s2io driver (commit), and add support for the Cicada 8201 PHY (commit); add ich8lan core functions (commit), smart power down code (commit) and integrate ich8 support into driver (commit) in e1000 driver; NAPI support for via-rhine (commit)

V4L/DVB

Cx88 driver: added support for KWorld MCE 200 Deluxe (commit), IR remote support for DTV2000H (commit), basic support for Leadtek Winfast DTV2000H card (commit), support for the new cx88 card #50: NPG Tech RealTV, including its remote (commit), support for FusionHDTV 3 Gold (original revision) (commit), support for Geniatech Digistar / Digiwave 103g (commit)

Add support for pcHDTV HD5500 ATSC/QAM (commit), add support for DViCO FusionHDTV DVB-T Lite 2nd revision in the Dvb-bt8xx driver (commit), enable Blackbird MPEG encoder support in KWorld HardwareMpegTV XPert (commit), add support for the TCL M2523_3DB_E tuner (commit), implement v4l2 driver for the Hauppauge PVR USB2 TV tuner (commit), add v4l2 compatibility to the pwc driver, include the decompressor, export to userland compressed stream, more cameras supported etc (commit), add support for the Texas Instruments TLV320AIC23B audio codec (commit), Genpix 8PSK->USB driver (commit), add support for Samsung TCPG 6121P30A PAL tuner (commit), add support for Avermedia 6 Eyes AVS6EYES (commit), add support for the cx25836/7 video decoder (commit), add support for VP-3250 ATSC card (commit), add support for DViCO FusionHDTV DVB-T Dual USB based on zl10353 (commit), add CX2341X MPEG encoder module (commit), add support for the DNTV Live! mini DVB-T card (commit), add support for USB Logitech Quickcam Messenger (commit)

RNG

Remove old HW RNG support (commit), and add a new generic HW RNG core (commit), Geode HW RNG driver (commit), AMD HW RNG driver (commit), VIA HW RNG driver (commit), Intel HW RNG driver (commit), bcm43xx HW RNG driver (commit), ixp4xx HW RNG driver (commit), TI OMAP CPU family HW RNG driver (commit)

RTC

Add: driver for ARM AMBA PL031 RTC (commit), AT91RM9200 RTC driver (commit), rtc-dev UIE emulation for UIE-less rtc drivers (commit), v3020 RTC support (commit), rtc-ds1742 driver for the Dallas DS1742 RTC chip (commit), rtc-ds1553 driver for the Dallas DS1553 RTC chip (commit), rtc-rs5c348 driver for the Ricoh RS5C348 RTC chip (commit), class driver for Samsung S3C series SoC (commit), "RTC-framework" driver for DS1307 and similar RTC chips (commit), max6902 RTC support for the MAX6902 SPI RTC chip (commit), port of the driver for the pcf8583 i2c rtc controller to the generic RTC framework (commit), support for the I2C-attached Intersil ISL1208 RTC chip (commit)

Various drivers

KernelNewbies: Linux_2_6_18 (last edited 2017-12-30 01:30:02 by localhost)