Round 9 (Dec 2014 to Mar 2015) projects:
Kernel tinification
Mentor: Josh Triplett
Over time, the Linux kernel has grown far more featureful, but it has also grown significantly larger, even with all the optional features turned off. I'd like to reverse that trend, making the kernel much smaller, to enable ridiculously small embedded applications and other fun uses.
In this project, you'll start from "make allnoconfig", and then try to shrink the kernel even further. You'll learn how to work with the kernel configuration system, Kconfig, and use scripts/bloat-o-meter to measure the size impact of a change.
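To give a feel for the measurement step, here is a small Python sketch of the kind of per-symbol size comparison that scripts/bloat-o-meter reports. It is not the kernel's script: the function names and the two sample symbol tables are made up for illustration, and it assumes input in the three-column "size type name" form (hexadecimal sizes) that `nm --size-sort` prints.

```python
def parse_nm_sizes(nm_output):
    """Map symbol name -> size in bytes from 'size type name' lines (hex sizes)."""
    sizes = {}
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        size_hex, _sym_type, name = parts
        sizes[name] = sizes.get(name, 0) + int(size_hex, 16)
    return sizes


def size_delta(old, new):
    """Return (total byte change, per-symbol changes sorted by magnitude, then name)."""
    names = set(old) | set(new)
    changed = {}
    for name in names:
        delta = new.get(name, 0) - old.get(name, 0)
        if delta != 0:
            changed[name] = delta
    ordered = sorted(changed.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    return sum(changed.values()), ordered


# Hypothetical symbol tables before and after making a feature optional.
OLD_NM = """\
00000010 t helper_a
00000040 T do_feature
00000100 T core_init
"""
NEW_NM = """\
00000010 t helper_a
000000c0 T core_init
"""

total, changes = size_delta(parse_nm_sizes(OLD_NM), parse_nm_sizes(NEW_NM))

if __name__ == "__main__":
    for name, delta in changes:
        print(f"{name:20s} {delta:+d}")
    print(f"Total: {total:+d} bytes")
```

In this made-up example a removed function and a shrunken one each account for part of the total, which is the level of detail you would want to quote in a patch description.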
This is a highly incremental project: each feature you make optional or kernel component you shrink will mostly stand alone, and you can develop and submit each change independently.
Some of these tinification goals will work well during the application period; others will require a substantial time investment, and will primarily make sense during the full internship.
Before working on any of these, especially during the application period, you should send a quick note to the OPW kernel mailing list to coordinate, and avoid duplicated effort.
Please see https://tiny.wiki.kernel.org/ for more details on this effort. See https://tiny.wiki.kernel.org/projects for a list of possible projects. The projects listed as "small" can potentially be done during the application period, or by an intern accepted to work on this project. The projects listed as "large" should wait until the internship.
(Note to mentors and prospective applicants: Josh plans to present the tinification effort and list of projects at Linux Kernel Summit, and may add, remove, or edit items on this list based on feedback obtained there.)
Coccinelle
Mentors: Julia Lawall, Nicolas Palix
Coccinelle is a program matching and transformation tool for C code that has been used extensively in contributing to the Linux kernel, for both code evolutions and bug fixes. Coccinelle is driven by specifications, known as semantic patches, that use a notation based on C code, and are thus fairly easy to develop. Around 40 semantic patches are included with the Linux kernel source code, in scripts/coccinelle, and are used in the continuous testing service provided by Intel.
Recently, we have used Coccinelle in an extensive study of faults in Linux 2.6. The goal of this project is to extend the results to more recent versions of Linux, and to facilitate the extension of the work to subsequent versions. This will entail:
- Running the Coccinelle scripts that have been developed to collect data.
- Evaluating the resulting reports to identify real bugs and false positives.
- Submitting patches to the Linux kernel to fix the identified real bugs that are still present in the kernel.
- Updating a database with the results.
- Creating graphs to summarize the results.
Surviving Year 2038
Mentor: Arnd Bergmann
The concept of 'time' in Linux is encoded in many different ways, but the most common one is based on the 'time_t' type, which counts the number of seconds that have passed since Jan 1, 1970. This type is currently defined as 'long', which on 32-bit systems is a signed 32-bit number that will overflow on Jan 19, 2038 and likely cause existing systems to stop working; see http://en.wikipedia.org/wiki/Year_2038_problem.
On 64-bit systems, the problem is solved for the most part because 'long' is a 64-bit number that will not overflow for billions of years, but there are some important missing pieces, such as file systems that store time in 32-bit quantities on disk, as well as support for 32-bit user space binaries running on 64-bit kernels.
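The arithmetic behind these dates can be checked with a short user-space sketch. It is written in Python only for illustration and is not kernel code; the helper names are invented here, and the signed wraparound is simulated explicitly because Python integers do not overflow.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60  # average Gregorian year


def max_signed(bits):
    """Largest value of a signed two's-complement integer with the given width."""
    return 2 ** (bits - 1) - 1


def wrap_signed(value, bits):
    """Simulate two's-complement wraparound of a signed counter."""
    half = 2 ** (bits - 1)
    return (value + half) % (2 ** bits) - half


def years_of_range(bits):
    """Approximate number of years after 1970 that a signed seconds counter covers."""
    return max_signed(bits) / SECONDS_PER_YEAR


# Last second a signed 32-bit seconds counter can represent.
last_32bit_second = EPOCH + timedelta(seconds=max_signed(32))

# One second later the counter wraps to a large negative value.
after_wrap = EPOCH + timedelta(seconds=wrap_signed(max_signed(32) + 1, 32))

if __name__ == "__main__":
    print("last 32-bit second:", last_32bit_second.isoformat())
    print("after wraparound:  ", after_wrap.isoformat())
    print("64-bit range in years: %.3g" % years_of_range(64))
```

The last representable second falls on Jan 19, 2038, and the wrapped value lands in December 1901, which is why affected systems would see time jump backwards rather than simply stop.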
Solving this problem in general is a huge effort involving lots of changes in the kernel as well as in user space. This project focuses on the kernel side, which can be nicely split up into many small subtasks and is a prerequisite for the user space changes. There are currently 2117 instances of 'time_t', 'struct timespec' and 'struct timeval' in the kernel, and we are going to replace all of them with other types.
Any isolated in-kernel uses of these types can be replaced with 'ktime_t' or 'struct timespec64'. For any interface to user space (typically an ioctl command or a system call) that passes a data structure based on these types, we have to keep the existing interface working and introduce an alternative interface that can be used by newly built user space programs.
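As a rough illustration of why the replacement types help, the sketch below models them in Python. It assumes that 'ktime_t' is a signed 64-bit count of nanoseconds and that 'struct timespec64' pairs a 64-bit seconds field with a nanoseconds field; the function names are invented for this example and are not kernel helpers.

```python
from datetime import datetime, timedelta, timezone

NSEC_PER_SEC = 1_000_000_000
INT32_MIN, INT32_MAX = -2 ** 31, 2 ** 31 - 1
INT64_MIN, INT64_MAX = -2 ** 63, 2 ** 63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def fits_in_32bit_seconds(tv_sec):
    """True if the seconds value fits a signed 32-bit time_t."""
    return INT32_MIN <= tv_sec <= INT32_MAX


def to_ns(tv_sec, tv_nsec):
    """Fold (seconds, nanoseconds) into one signed 64-bit nanosecond count."""
    ns = tv_sec * NSEC_PER_SEC + tv_nsec
    if not INT64_MIN <= ns <= INT64_MAX:
        raise OverflowError("does not fit in a signed 64-bit nanosecond count")
    return ns


def from_ns(ns):
    """Split a nanosecond count back into (seconds, nanoseconds)."""
    return divmod(ns, NSEC_PER_SEC)


# One second past the 32-bit limit: too large for a 32-bit time_t,
# but unproblematic for 64-bit seconds or 64-bit nanoseconds.
tv_sec, tv_nsec = INT32_MAX + 1, 500
as_ns = to_ns(tv_sec, tv_nsec)

# Last whole second a signed 64-bit nanosecond count can represent.
ns_limit = EPOCH + timedelta(seconds=INT64_MAX // NSEC_PER_SEC)

if __name__ == "__main__":
    print("fits 32-bit seconds:", fits_in_32bit_seconds(tv_sec))
    print("round trip:", from_ns(as_ns))
    print("nanosecond count runs out in:", ns_limit.year)
```

Under these assumptions, a nanosecond scalar covers roughly 292 years on either side of 1970. That is a far narrower range than 64-bit seconds, but comfortably past 2038.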
The y2038 page has a deeper introduction to the topic and will be updated with more detailed subtasks over time.
IIO staging drivers cleanup
Mentors: Octavian Purdila, Daniel Baluta
The Industrial I/O subsystem is intended to provide support for devices that in some sense are analog to digital or digital to analog converters. Some devices that fall in this category are: accelerometers, gyroscopes, light sensors, etc. This project will involve cleaning up IIO drivers and moving them out of staging. Most of the work consists of changing drivers to use the proper IIO ABI and adapting the code to follow the Linux kernel coding style.
We plan to start with the driver for the Intersil ISL29018 digital ambient light and proximity sensor. Follow the IIO cleanup page for small tasks and updates.
Khugepaged swap readahead
Mentor: Rik van Riel
Linux can transparently use huge pages (THP) for anonymous memory on x86 and several other architectures. These huge pages allow programs to run faster, due to reduced TLB pressure, and lower administrative overhead. The huge pages are formed either directly at allocation time, or by collapsing several (512 on x86) small pages together into one huge page. When the system is low on memory, huge pages are broken into small pages, which then get swapped out. However, after the memory pressure is over, and most of the small pages have been swapped back in, there usually are a few small pages left on swap, and the huge page cannot be reconstituted. In other words, a one-time swap event can cause permanent performance degradation.
The project consists of teaching khugepaged to slowly fetch pages from swap, when most of the pages of a region that could form a huge page are resident in memory, a few pages are in swap, and there is plenty of memory to form huge pages. This project involves a lot of reading of memory management code, and a smaller amount of code writing. Part of the project will involve documenting how some of the existing code works.
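The three conditions in the description above can be written down as a simple predicate. The following Python sketch is only a model of the decision, not khugepaged code: the function name and both thresholds are hypothetical, and it assumes the usual x86 sizes of 4 KiB base pages and 2 MiB huge pages, which give the 512 pages mentioned earlier.

```python
BASE_PAGE_SIZE = 4 * 1024            # assumed x86 base page size
HUGE_PAGE_SIZE = 2 * 1024 * 1024     # assumed x86 huge page size
PAGES_PER_HUGE_PAGE = HUGE_PAGE_SIZE // BASE_PAGE_SIZE  # 512

# Hypothetical tuning knobs, chosen only to make the example concrete.
MAX_SWAPPED_PAGES = 64       # "a few pages are in swap"
MIN_FREE_HUGE_PAGES = 16     # "plenty of memory to form huge pages"


def should_swap_in(resident, swapped, free_base_pages):
    """Decide whether fetching the swapped pages of a region looks worthwhile."""
    if swapped == 0:
        return False  # nothing to fetch; ordinary collapse can proceed
    mostly_resident = resident >= PAGES_PER_HUGE_PAGE - MAX_SWAPPED_PAGES
    few_in_swap = swapped <= MAX_SWAPPED_PAGES
    plenty_free = free_base_pages >= MIN_FREE_HUGE_PAGES * PAGES_PER_HUGE_PAGE
    return mostly_resident and few_in_swap and plenty_free


if __name__ == "__main__":
    print(should_swap_in(resident=500, swapped=12, free_base_pages=100_000))
    print(should_swap_in(resident=500, swapped=12, free_base_pages=1_000))
    print(should_swap_in(resident=100, swapped=412, free_base_pages=100_000))
```

Choosing the real thresholds, and rate-limiting the swap-ins so that they stay "slow", is part of what the project would need to work out from the existing memory management code.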