Where do I begin?
A common question asked by a newbie is "I've just unpacked this huge tarball, and I want to help out, but I don't know where to start!"
It may seem daunting to be confronted with such a large amount of source code, but bear in mind that very few kernel hackers understand every area of the kernel tree.
People specialise. If you're interested in TCP/IP, you'll not be needing to read the filesystem code. Figure out what it is you want to be working on, and focus on that.
Linux is a professional-quality kernel. This makes it difficult to come up with small "student projects" by which you can learn: often features are already implemented, and at a level that requires a good understanding before you can hack on them. However, there are several practical and useful things you can do until you have learned enough to really start hacking:
Test and benchmark
- New code is constantly evolving, so benchmark it. You will certainly notice some odd behaviours: there is your impetus to understand where the behaviour is coming from. Profile things, trace them (e.g. with LTT), and see if you can work out what might be causing problems. You'll learn the code by accident. Try out experimental patches posted to linux-kernel and trees like mjc's. Try to understand what a particular patch does, and how it does it.
Document
- Sounds boring? Maybe, but you'll be doing everybody a favour, not least yourself. Forcing yourself to explain things crystallises your own understanding. Documenting behaviour requires you to understand the code. You'll find code a lot easier to read if you are directed towards answering a specific question. Write articles for kernelnewbies and get them peer-reviewed in the IRC channel. Identify inaccuracies in the current man-pages, and fix them. Add source docs to the kernel source.
Kernel janitors is a project to fix misuse of kernel APIs as the code mutates. This can quickly get pretty interesting. An educational talk on the project can be read here
For more tips, see the CompleteNewbiesClickHere page.
Start reading the LKML (Linux Kernel Mailing List). Read all sorts of different threads and gradually you'll pick up on the lingo. You'll probably also learn a thing or two about the kernel at the same time, and it's a great way to pick up info on things that need doing: if you see something discussed that you think you can fix, then you can jump right in and offer to fix it (but don't forget to let people know if you give up, so someone else can take over). Don't worry if you don't understand everything at first; just start following the list and pick up the bits you can. (Oh, and when you post to the list, don't top-post.)
You can also take a look at http://www.kernelnewbies.org/KernelGlossary for a list of the most common terms with short explanations.
Let me see if I can give you some ideas on easy things to start with.
There are lots of things that need doing and some of them don't even require programming skills.
You can help improve the documentation. Even a simple thing such as fixing spelling errors or correcting grammar is worth doing.
It's also helpful if you dig for Kconfig help entries that are blank, work out what the config option is for, and write a patch that adds a suitable help entry.
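One way to hunt for blank help entries is to let a small script do the digging. What follows is only a rough sketch: the function name and the sample symbols (FOO, BAR, BAZ, HIDDEN) are made up for illustration, and the parsing is deliberately simplified (it assumes entries start with an unindented "config" or "menuconfig" line and that everything indented after "help" is help text). Real Kconfig files have more cases than this handles, so treat its output as a list of candidates to check by hand.

```python
import re

# Simplified patterns; real Kconfig syntax has more cases than these cover.
ENTRY_RE = re.compile(r"^(?:menu)?config\s+(\w+)")
PROMPT_RE = re.compile(r'^\s+(?:bool|tristate|string|int|hex|prompt)\s+"')
HELP_RE = re.compile(r"^\s+(?:help|---help---)\s*$")


def symbols_missing_help(kconfig_text):
    """Return names of entries that have a prompt but no help text."""
    entries = []
    current = None
    in_help = False
    for line in kconfig_text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            # A new config/menuconfig entry starts here.
            current = {"name": match.group(1), "prompt": False, "help_lines": 0}
            entries.append(current)
            in_help = False
        elif line.strip() and not line[0].isspace():
            # Any other unindented line (menu, choice, source, ...) ends the entry.
            current = None
            in_help = False
        elif current is None:
            continue
        elif in_help:
            # Count non-blank lines of help text.
            if line.strip():
                current["help_lines"] += 1
        elif HELP_RE.match(line):
            in_help = True
        elif PROMPT_RE.match(line):
            current["prompt"] = True
    # Entries without a prompt are not shown to the user, so they are skipped.
    return [e["name"] for e in entries if e["prompt"] and e["help_lines"] == 0]


# Hypothetical Kconfig fragment, for illustration only.
SAMPLE = """config FOO
    bool "Foo support"
    help
      Enables foo.

config BAR
    tristate "Bar driver"
    depends on FOO

config BAZ
    bool "Baz"
    help

config HIDDEN
    bool
"""

print(symbols_missing_help(SAMPLE))
```

To use it on the real tree, pass in the contents of any Kconfig file instead of SAMPLE. Entries without a prompt are ignored on purpose, since they never show up in the configuration menus.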
You can also help out by simply running the daily git snapshots and/or latest -mm kernels on a regular basis. Use them as your every-day kernel, watch for regressions and bugs (build them with some of the debugging options enabled). Then if/when you spot a regression or bug (make it a habit to check dmesg once in a while), write up a detailed bug report and submit it to the list and relevant maintainers. That's a *very* useful thing to do. You can also try to actively provoke bugs by stressing different areas of the kernel or just by trying out unusual things. Personally the first thing I do every morning when I turn on my PC is fetch the latest git snapshot from kernel.org, build it, install it, boot it and then use that as my kernel for the day (OK, maybe not every day, but at least I do it every other day). It's perfect if you are able to fix the bug you found and can submit a patch along with your bug report, but it's in no way required. If you can submit a detailed and well-written bug report then that is worth its weight in gold.
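If you want to make the "check dmesg once in a while" habit a bit easier, something like the sketch below can pick out the lines worth a closer look. The marker list is my own guess at strings that commonly show up when the kernel is unhappy, not an official or complete list, and the sample log lines are invented. The read_dmesg() helper just runs the dmesg command, which may need root on some systems.

```python
import subprocess

# Substrings that commonly appear in kernel logs when something went wrong.
# This list is an assumption; extend it with whatever you run into.
MARKERS = ("Oops", "BUG:", "WARNING:", "Call Trace:")


def suspicious_lines(log_text):
    """Return the log lines that contain any of the markers."""
    return [line for line in log_text.splitlines()
            if any(marker in line for marker in MARKERS)]


def read_dmesg():
    """Return the kernel log as text by running dmesg (may need root)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


# Invented log excerpt, for illustration only.
SAMPLE_LOG = """[    0.000000] Linux version (hypothetical test kernel)
[   12.345678] BUG: unable to handle kernel NULL pointer dereference at 00000000
[   12.345700] Oops: 0002 [#1]
[   12.345800] Call Trace:
[   20.000000] eth0: link up
"""

for line in suspicious_lines(SAMPLE_LOG):
    print(line)
```

On a live system you would call suspicious_lines(read_dmesg()) instead. A hit is only a hint; you still need to read the surrounding log to write a useful bug report.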
You can also take a look at the Kernel janitors TODO list - http://kernelnewbies.org/KernelJanitors/Todo - it lists a lot of minor things that need to be done. Some of it seems trivial and tedious work, but it's often a great place to find some beginner projects to cut your teeth on.
You can also dig through the source looking for files that don't conform to the official kernel CodingStyle (see Documentation/CodingStyle for details). Then clean up the file so it does conform (hint: scripts/Lindent is helpful to get started, but don't assume it gets everything right; its output needs manual verification and correction).
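To find candidate files for this kind of cleanup, a few mechanical checks go a long way. The sketch below is an assumption about which checks are useful, not a replacement for reading Documentation/CodingStyle: it flags trailing whitespace, lines wider than 80 columns (counting tabs as 8), and lines indented with spaces rather than tabs. The function name and the sample source are made up.

```python
def style_issues(source_text, max_columns=80):
    """Return (line_number, message) pairs for a few mechanical style problems."""
    issues = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        if line != line.rstrip():
            issues.append((number, "trailing whitespace"))
        # Tabs count as 8 columns when measuring line width.
        if len(line.expandtabs(8)) > max_columns:
            issues.append((number, "line longer than %d columns" % max_columns))
        # Lines starting with " *" are usually block-comment continuations.
        if line.startswith(" ") and not line.lstrip().startswith("*"):
            issues.append((number, "indented with spaces instead of tabs"))
    return issues


# Made-up source fragment, for illustration only.
SAMPLE_SOURCE = "int f(void)\n{\n        return 0; \n\treturn 1;\n}\n"

for number, message in style_issues(SAMPLE_SOURCE):
    print("line %d: %s" % (number, message))
```

Anything it reports still needs a human decision, for the same reason Lindent's output does: a script can't tell when breaking a rule is the more readable choice.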
Another easy thing to get started with is simply to build a lot of kernels, log all the output from the build process, look for any warnings or errors that show up, pick one at a time, investigate *why* the compiler warns or errors out, create a patch to fix the problem, submit it, listen to the feedback you get (possibly correct the patch), then move on to the next warning/build error. "make randconfig" is useful to generate a bunch of random configs to build and is an easy way to gather lots of warnings and errors to investigate (just make sure you don't waste time on configs that are generated with CONFIG_EXPERIMENTAL set - skip those). This is useful work.
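Once you have a build log saved to a file, a short script can turn it into a to-do list. This is a minimal sketch that assumes gcc-style diagnostics of the form "file:line: warning: message" (with an optional column number); the function name, file names and messages in the sample are invented. Identical diagnostics are counted rather than repeated, since a warning in a header tends to show up once per file that includes it.

```python
import re
from collections import Counter

# Matches gcc-style lines such as "fs/foo.c:12:5: warning: ..." (column optional).
DIAG_RE = re.compile(
    r"^(?P<file>[^\s:]+):(?P<line>\d+):(?:\d+:)?\s+"
    r"(?P<kind>warning|error):\s+(?P<msg>.*)$"
)


def collect_diagnostics(log_text):
    """Count each distinct (file, line, kind, message) found in a build log."""
    found = Counter()
    for line in log_text.splitlines():
        match = DIAG_RE.match(line)
        if match:
            key = (match.group("file"), int(match.group("line")),
                   match.group("kind"), match.group("msg"))
            found[key] += 1
    return found


# Invented build log excerpt, for illustration only.
SAMPLE_BUILD_LOG = """  CC      drivers/foo/bar.o
drivers/foo/bar.c:42:9: warning: unused variable 'ret' [-Wunused-variable]
  CC      drivers/foo/baz.o
drivers/foo/bar.c:42:9: warning: unused variable 'ret' [-Wunused-variable]
fs/qux.c:7: error: 'x' undeclared (first use in this function)
"""

for (path, lineno, kind, msg), count in sorted(collect_diagnostics(SAMPLE_BUILD_LOG).items()):
    print("%s:%d: %s: %s (seen %d times)" % (path, lineno, kind, msg, count))
```

Pick one entry from the list, work out why the compiler complains, and only then write the patch.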
As you get more experienced you can try to get involved in projects discussed on the LKML (you should be following this list regularly by now) that interest you.
I'd say doing a little bit of all of the above is a good way to start. It'll let you get acquainted with a lot of different parts of the kernel. It'll let you get comfortable with submitting patches and working with other developers, and you'll be doing useful work fairly quickly.
Some more general advice:
Before you do anything else, make sure you have (as a minimum) read the following documents:
From the kernel source:
I should probably also stress the importance of keeping up-to-date backups of any important data you have on your machine, or using a separate development machine, when running development kernels. Although bugs that end up eating your files and/or filesystems are rare these days, they still happen from time to time, so just make sure that that's not a problem.
Also, when running development kernels you *will* run into Oopses (crashes) more often than when only running the stable kernels. And when you do, capturing the Oops output is very important, but often the system will be in no state to write it to your logs, or you may be in X and not even see it get printed on the console, or the system just hangs. Be prepared to capture bugs, either by running a serial console to another machine where you can record the Oopses when they happen, or by using netconsole (useful, but not as reliable as serial console in my experience). If you have a printer, then console on line printer can also be useful. If you can do none of those, then taking a photo of the screen with the Oops on it or writing it down by hand (*all* of it) will have to do. You probably also want to make sure your kernels have magic sysrq support built in, so you can (at least attempt to) do an emergency flush and unmount of your filesystems when the box hangs, followed by an emergency reboot... That's a lot better than just hitting the power button. sysrq can also be used to capture stack traces for running programs, get memory dumps etc. See Documentation/sysrq.txt for details.
Also, please always submit patches in plain text generated with "diff -up" inline in a plain-text email. No HTML mails. No attachments. Plain text with patch inline (and do test-send the patches to yourself the first few times and try applying them, to make sure your mailer doesn't mangle them). Also always include a diffstat and a Signed-off-by: line (the above documents have details on this). You also don't want to mix different types of changes in the same patch. If for example you have been doing a warning fix as well as a general CodingStyle cleanup of a file, then don't include both in the same patch but submit two distinct patches instead (in 2 emails) that each do just one thing. Also always include in the description of your patch not just what it does but also *why* it does it.
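Before you hit send, it can help to run the mail body through a quick sanity check. The sketch below is only an illustration of the points above: the function name, the checks chosen, and the sample mail (including the author and file name) are all made up, and passing these checks does not mean a patch is ready. It looks for a Signed-off-by: line, a diffstat summary, an inline unified diff, and obvious signs of HTML.

```python
import re


def patch_problems(mail_body):
    """Return a list of problems found in a patch email body (empty if none)."""
    problems = []
    if not re.search(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>\s*$",
                     mail_body, re.MULTILINE):
        problems.append("missing Signed-off-by: line")
    if not re.search(r"^\s*\d+ files? changed", mail_body, re.MULTILINE):
        problems.append("missing diffstat summary")
    # A unified diff has "--- old", "+++ new" and at least one "@@" hunk header.
    diff_markers = (r"^--- \S", r"^\+\+\+ \S", r"^@@ ")
    if not all(re.search(p, mail_body, re.MULTILINE) for p in diff_markers):
        problems.append("no unified diff found inline")
    if re.search(r"<html|<body", mail_body, re.IGNORECASE):
        problems.append("looks like HTML mail")
    return problems


# Made-up patch mail, for illustration only.
SAMPLE_MAIL = """The variable 'ret' is never read, which triggers a compiler
warning. Remove it.

Signed-off-by: Jane Hacker <jane@example.org>
---
 drivers/foo/bar.c |    1 -
 1 file changed, 1 deletion(-)

--- a/drivers/foo/bar.c
+++ b/drivers/foo/bar.c
@@ -10,3 +10,2 @@
 {
-    int ret;
     return 0;
"""

print(patch_problems(SAMPLE_MAIL))
```

What it can't check is the part that matters most: whether the description explains *why* the change is needed, and whether the patch does just one thing.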
At some point in time you probably also want to learn how to use git - http://git.or.cz/ - which is what Linus and lots of other core developers use to manage the kernel source.
Some additional pages to visit online for more info: