Git, quilt and binary searching
Generally, kernel source trees are maintained using two basic tools, git and quilt. While it is true that you can also use some git ”frontends”, like Cogito, Stacked GIT, Patchy Git (pg), (h)gct, which simplify user interactions with this tool, we will focus on git itself and on quilt, since each of them is strictly related to a specific way of maintaining a source tree.
4.1 Git
Git is a versioning control system written by Linus Torvalds specifically for the maintenance of the main Linux kernel source tree (ie. the mainline). The current maintainer of git itself is Junio C. Hamano and the latest version is git-1.5.1.3.
Of course, to use git you need to install it in your system, but all of the contemporary Linux distributions provide git packages that can be installed in usual ways. The most important of these packages is usually called git-core, as it contains the most basic git components. Alternatively, you can download the source code of git from http://www.kernel.org/pub/software/scm/git/ and build it yourself.
Like some other versioning control systems (eg. CVS ), git assumes that there is a central ”master” directory tree used as a reference for all changes made to the maintained source code. This directory tree contains the source code itself along with the history of all changes made to it since certain point in time. To be registered in the ”master” tree, the changes must be accepted by its maintainer (eg. by Linus, in the case of the Linux kernel ”master” tree) and after they have been registered (”committed” in the git-related terminology) all of the users of this tree can ”see” the source code with these changes applied. Still, they can also see how the source code looked like before specific modifications and they can follow its history (ie. view all modifications made to it after given point that may be specified in many different ways). For this reason each modification of the source code is registered in the ”master” tree along with additional information including, among other things, the name and email address of its author and a short description of the purpose and scope of the modification (known as the ”changelog”).
If you create a copy of the Linux kernel ”master” tree, all of the information contained in it will be available to you locally and can be used for many different purposes. In particular, you can use them to identify the modifications that have introduced bugs.
To make a copy of this tree, or to ”clone” it in the git language, you can run
$ git-clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-git
This will download the entire Linus’ git tree to your machine and the local copy of it will be located in the subdirectory linux-git of the current directory (ie. the one that was current when the above command was being executed). You should remember, however, that the Linus’ tree is huge, so the download may take quite a lot of time, depending on your Internet connection’s bandwidth and the current load of the kernel.org’s git server. If you are not going to use the git capabilities described below, it might be less cumbersome to download stand-alone patches and apply them as described in Section 1.2.
Once you have downloaded the Linus’ git tree, it generally is a good idea to tell git that it is the the tree that you want to start with. To do this, change directory to linux-git and run
$ git-checkout -f
Now, you are able to do some really nice things with the help of git . For example, you can synchronize your local copy of the Linus’ tree with the original one by downloading only the changes that were committed by Linus after you had run git-clone . For this purpose, run
$ git-pull git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git $ git-checkout
Of course, this means that you don’t need to run git-clone every time the Linus’ tree changes. Moreover, you should not do this. In fact, it is only necessary do download the Linus’ tree once and then use git-pull to update your local copy. Naturally, if you make your own changes to the local copy of the tree, git-pull may fail, but then you can use ’git-checkout -f’ to revert all of the conflicting changes. Still, if you want to save your modifications, it is not that easy any more, but we won’t discuss this case any further (for more information see, for example, the tutorial at http://www.kernel.org/pub/software/scm/git/docs/tutorial.html).
Apart from the git commands presented above, you will probably find the following ones useful:
git-whatchanged <file> – shows all changes affecting given source file (the patch relative to the git tree’s root directory should be provided)
git-bisect * – used for binary searching for buggy patches (see below)
git-revert * – reverts given change (”commit” in the git terminology)
gitk – graphically visualizes the tree
git-show – prints given commit
git-log – shows the list of changes made to the tree
In our opinion particularly useful is the git-log command that allows you to track the history of the source code. By running
$ git-log
you can see all changes made since the Linus’ tree was created. Well, albeit impressive, this is not particularly useful, because there are very many of them, but you can easily narrow the scope of its output. For example, the command
$ git-log v2.6.19..+
will print the changes made since the version 2.6.19 of the kernel (this won’t work for v2.6.21.., but we don’t know why), and the following one:
$ git-log --since="7 days ago"
will give you all changes made since – you have guessed it – 7 days ago. Isn’t it nice?
While playing with git-log you can see that each change, or commit, in the log starts from the word commit and a long hexadecimal number. These numbers are unique commit identifiers that can be used for referring to specific commits in many git commands. For example, if you want to see the source code modifications associated with commit b5bf28cde894b3bb3bd25c13a7647020562f9ea0 in the form of a patch, run
$ git-show b5bf28cde894b3bb3bd25c13a7647020562f9ea0
It sometimes is unnecessary to use the entire commit identifier here, since several initial characters may be sufficient to identify the object. Thus to get the same result as from the above command, it may be sufficient to run
$ git-show b5bf28
Of course, if you want to save the resulting patch in a file, it is only necessary to do
$ git-show b5bf28 > b5bf28.patch
where b5bf28.patch can be replaced with an arbitrary file name.
When you run such git commands as git-clone or git-pull, some data are being transferred over the Internet. In such cases you should always use the git:// protocol designed specifically for transferring git objects. Still, sometimes you may not be able to do this (eg. if you are behind a firewall that you cannot configure) and then you can use the ”ordinary” http:// protocol, but it is not generally recommended.
It is worthy of noting that the Linus’ tree is not the only git tree downloadable from kernel.org . The other ones correspond to various development branches of the kernel introduced in Section 1.3. At http://git.kernel.org there is the list of all git trees available from kernel.org . Another list of kernel git tree is available from http://git.infradead.org .
To learn more about git, you can read the documentation distributed along with it or visit, for instance, the web page at http://www.kernel.org/pub/software/scm/git/docs
4.2 Quilt
Quilt is a tool to manage series of patches. First of all, it makes the creation of patches easy and allows one to preserve the right ordering of patches within the series. It also automates the application and reverting of patches as well as the updating of patches in accordance with manual modifications of the source code. Usually, it can be installed from a package provided by your distribution, but you can also install it from the source code available at http://savannah.nongnu.org/projects/quilt/ . In principle, it can be used for managing modifications of an arbitrary set of text files located in a single directory tree, but we will focus on using it with respect to the Linux kernel source code.
Some Linux kernel developers use quilt as their primary patch management tool and some of the patchsets discussed in Section 1.4 are distributed as quilt-friendly series of patches. The most important of them is the -mm tree maintained by Andrew Morton (see Section 1.5).
A quilt-friendly series of patches consists of the patches themselves and the file series, in which the patch names are saved in the right order (they must be the same as the names of the files that contain the patches). To apply the patches, you only need to create a subdirectory called patches in the root of the kernel source tree and place the entire patch series, including the series file, in it. Then, you can use the ’quilt push -a’ command to apply them all in one shot.
Suppose, for example, that you want to apply the series of patches available at http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc1/patches/ and consisting of the following files:
01-freezer-close-theoretical-race-between-refrigerator-and-thaw_tasks.patch 02-freezer-fix-vfork-problem.patch 03-freezer-take-kernel_execve-into-consideration.patch 04-fix-kthread_create-vs-freezer-theoretical-race.patch 05-fix-pf_nofreeze-and-freezeable-race-2.patch 06-move-frozen_process-to-kernel-power-processc.patch 07-power-management-use-mutexes-instead-of-semaphores.patch 08-swsusp-fix-sysfs-interface.patch 09-acpi-fix-suspend-resume-ordering.patch 10-swsusp-remove-platform-callbacks-from-resume-code.patch 11-swsusp-reduce-code-duplication-between-user_c-and-disk_c.patch series
on top of the 2.6.22-rc1 kernel source. For this purpose, go to the directory that contains the kernel sources, create the directory patches and copy the files that the series consists of into it. Next, run
$ quilt push -a
and all patches in the series will be applied. Alternatively, you can run
$ quilt push 11-swsusp-reduce-code-duplication-between-user_c-and-disk_c.patch
and quilt will apply all patches in the series up to and including the one given as the argument to ’quilt push’. The ordering of patches is based on the contents of the series file. That is, the patches the names of which are at the beginning of the series file are applied first.
Generally, you should not modify the series file manually, but sometimes it is convenient to do that. Suppose, for instance, that you do not want the patch 09-acpi-fix-suspend-resume-ordering.patch
from the above series to be applied. In that case, you can place ’#’ before the name of the patch in the series file:
01-freezer-close-theoretical-race-between-refrigerator-and-thaw_tasks.patch 02-freezer-fix-vfork-problem.patch 03-freezer-take-kernel_execve-into-consideration.patch 04-fix-kthread_create-vs-freezer-theoretical-race.patch 05-fix-pf_nofreeze-and-freezeable-race-2.patch 06-move-frozen_process-to-kernel-power-processc.patch 07-power-management-use-mutexes-instead-of-semaphores.patch 08-swsusp-fix-sysfs-interface.patch #09-acpi-fix-suspend-resume-ordering.patch 10-swsusp-remove-platform-callbacks-from-resume-code.patch 11-swsusp-reduce-code-duplication-between-user_c-and-disk_c.patch
and that will make quilt regard this line as a comment and skip the patch. You can also change the ordering of patches by changing the ordering of names in the series file manually. Remember, however, not to change the ordering of names of the patches that have already been applied, because that will confuse quilt.
The applied patches are treated by quilt as though they were on a stack. The first applied patch is on the bottom of the stack, while the last applied patch is on the top of it. That’s why ’quilt push’ is used to apply patches. Specifically, ’quilt push’ applies the next patch in the series that has not been applied yet and places it on the top of the stack. Analogously, ’quilt pop’ reverts the most recently applied patch (ie. the one on the top of the stack) and removes it from the stack. The command ’quilt push -a’ applies all patches in the series that have not been applied yet and places them on the stack in the right order, while ’quilt pop -a’ reverts all of the applied patches and removes them from the stack, one by one.
You can make quilt print the name of the most recently applied patch (ie. the one on the top of the stack) by running ’quilt top’, while ’quilt next’ will show you the name of the next patch to be applied. Similarly, ’quilt previous’ prints the name of the patch that has been applied right before the last one, ’quilt applied’ prints the names of all the currently applied patches (in the order in which they have been applied, so the name of the ”top” patch is printed last), and ’quilt series’ prints the names of all patches in the series.
You can also provide quilt with the number of patches to be applied or reverted. Namely, if you want it to apply the next two patches in the series, run
$ quilt push 2
Similarly, to make it revert the last two most recently applied patches, use
$ quilt pop 2
(these commands are very useful in binary searching for ”bad” patches, discussed in the next section). For example, having applied the first two patches from the series introduced above, you may want to apply the next three patches:
$ quilt push 3 Applying patch 03-freezer-take-kernel_execve-into-consideration.patch patching file kernel/power/process.c Applying patch 04-fix-kthread_create-vs-freezer-theoretical-race.patch patching file kernel/kthread.c Applying patch 05-fix-pf_nofreeze-and-freezeable-race-2.patch patching file include/linux/freezer.h Now at patch 05-fix-pf_nofreeze-and-freezeable-race-2.patch
Now, quilt tells you that 05-fix-pf_nofreeze-and-freezeable-race-2.patch is on the top of the stack: {{{$ quilt top 05-fix-pf_nofreeze-and-freezeable-race-2.patch $ quilt next 06-move-frozen_process-to-kernel-power-processc.patch $ quilt previous 04-fix-kthread_create-vs-freezer-theoretical-race.patch }}} Next, suppose that you want to apply two patches more:
{{{$ quilt push 2 Applying patch 06-move-frozen_process-to-kernel-power-processc.patch patching file include/linux/freezer.h patching file kernel/power/process.c Applying patch 07-power-management-use-mutexes-instead-of-semaphores.patch patching file drivers/base/power/main.c patching file drivers/base/power/power.h patching file drivers/base/power/resume.c patching file drivers/base/power/runtime.c patching file drivers/base/power/suspend.c Now at patch 07-power-management-use-mutexes-instead-of-semaphores.patch }}} Reverting of the three most recently applied patches is also simple:
$ quilt pop 3 Removing patch 07-power-management-use-mutexes-instead-of-semaphores.patch Restoring drivers/base/power/resume.c Restoring drivers/base/power/main.c Restoring drivers/base/power/runtime.c Restoring drivers/base/power/power.h Restoring drivers/base/power/suspend.c Removing patch 06-move-frozen_process-to-kernel-power-processc.patch Restoring kernel/power/process.c Restoring include/linux/freezer.h Removing patch 05-fix-pf_nofreeze-and-freezeable-race-2.patch Restoring include/linux/freezer.h Now at patch 04-fix-kthread_create-vs-freezer-theoretical-race.patch
Sometimes you may want quilt to revert patches util specific patch is on the top of the stack. To do this, use ’quilt pop’ with an argument being the name of the patch that you want to be on the top of the stack after the operation. Suppose, for instance, that you have applied the first ten patches from our example series, but now you want 04-fix-kthread_create-vs-freezer-theoretical-race.patch to be on the top of the stack (ie. to become the last recently applied one). You can make this happen in the following way: {{{$ quilt pop 04-fix-kthread_create-vs-freezer-theoretical-race.patch Removing patch 10-swsusp-remove-platform-callbacks-from-resume-code.patch Restoring kernel/power/user.c Removing patch 09-acpi-fix-suspend-resume-ordering.patch Restoring kernel/power/main.c Removing patch 08-swsusp-fix-sysfs-interface.patch Restoring kernel/power/disk.c Restoring kernel/power/main.c Removing patch 07-power-management-use-mutexes-instead-of-semaphores.patch Restoring drivers/base/power/resume.c Restoring drivers/base/power/main.c Restoring drivers/base/power/runtime.c Restoring drivers/base/power/power.h Restoring drivers/base/power/suspend.c Removing patch 06-move-frozen_process-to-kernel-power-processc.patch Restoring kernel/power/process.c Restoring include/linux/freezer.h Removing patch 05-fix-pf_nofreeze-and-freezeable-race-2.patch Restoring include/linux/freezer.h Now at patch 04-fix-kthread_create-vs-freezer-theoretical-race.patch Analogously, you can make quilt apply patches until specific one is on the top of the stack: $ quilt push 09-acpi-fix-suspend-resume-ordering.patch Applying patch 05-fix-pf_nofreeze-and-freezeable-race-2.patch patching file include/linux/freezer.h Applying patch 06-move-frozen_process-to-kernel-power-processc.patch patching file include/linux/freezer.h patching file kernel/power/process.c Applying patch 07-power-management-use-mutexes-instead-of-semaphores.patch patching file drivers/base/power/main.c patching file drivers/base/power/power.h patching file drivers/base/power/resume.c patching file drivers/base/power/runtime.c patching file drivers/base/power/suspend.c Applying patch 08-swsusp-fix-sysfs-interface.patch patching file kernel/power/disk.c patching file kernel/power/main.c Applying patch 09-acpi-fix-suspend-resume-ordering.patch patching file kernel/power/main.c Now at patch 09-acpi-fix-suspend-resume-ordering.patch }}}
As you can see in the above examples, for each applied or reverted patch quilt prints the names of the files modified in the process (in fact, these names are printed by the patch program introduced in Sec. 1.2, used by quilt). You can also make it print the names of the files modified by the most recently applied patch (ie. the one on the top of the stack) by using ’quilt files’: {{{$ quilt top 09-acpi-fix-suspend-resume-ordering.patch $ quilt files kernel/power/main.c }}} Additionally, it is possible to add some ”external” patches to a quilt series. The recommended way of doing this is to use the ’quilt import’ command with the additional argument being the name of the file containing the patch that you want to add to the series.
If this command is used, the file with the patch is automatically copied to the patches directory and the series file is updated by adding the name of this file right after the most recently applied one. Thus the patch becomes the next one to be applied by quilt, although it is not applied automatically.
Our description of quilt is by no means a complete one. It may only allow you to get a general idea of how quilt works and how flexible it is. Generally speaking, it allows one to do much more than we have shown above. In particular, it automates the creation and updating of patches in a very convenient way, but this part of its functionality is beyond the scope of our discussion. For more information about quilt refer to the documentation distributed with it, available at http://download.savannah.gnu.org/releases/quilt/ .
4.3 General idea of binary searching
Imagine that you have found a kernel bug, but you have no idea which of the thousand or more patches included in the tree that you are testing might have caused it to appear. In principle, you could revert the patches one by one and test the kernel after reverting each of them, but that would take way too much time. The solution is to use binary searching, also known as bisection.
To explain what it is we first assume that the kernel does not work for you any more after you have applied a series of n patches (n may be of the order of 1000). We also assume that it takes ∆t minutes on the average to compile and run the kernel on your system after reverting one patch (∆t may be of the order of 10). Under these assumptions, if you tried to revert patches one by one and test the kernel each time, you could spend about ∆t * n minutes worst-case before finding the offending patch. Of course, you might be lucky and the bug might have been introduced by the last patch in the series, but generally this is not very probable. More precisely, if you know nothing about the patches in question, you should assume that each of them is equally likely to have introduced the bug and therefore the probability of the bug being introduced by a specific patch is equal to 1/n (hence, the worst case is not very probable either, but that does not improve the situation very much).
Naturally, you can revert k patches out of n and then test the kernel to see whether or not one of these k patches has introduced the bug. Still, the question arises what number k should be equal to. To answer it, let’s estimate the maximum expected time needed to find the patch that has introduced the bug under the assumption that k patches are reverted in the first step. For this purpose, let A(k) denote the event that one of the k patches reverted in the first step has introduced the bug and let B(k) denote the event that the bug has been introduced by one of the remaining n − k patches. If A(k) occurs, the time needed to find the buggy patch will not be greater than TA (k) = ∆t · k (we need to search a series of k patches and it takes ∆t minutes to check one patch on the average) and the probability of occurrence of A(k) is pA (k) = k/n. In turn, if B(k) occurs, the time needed to find the buggy patch will not be greater than TB (k) = ∆t · (n − k) and the probability of occurrence of B(k) is pB (k) = (n − k)/n. Hence, since A(k) and B(k) are mutually exclusive and one of them must occur, the maximum average time needed to find the buggy patch is given by
T (k) = TA (k)pA (k) + TB (k)pB (k) = ∆t/n [ k + (n − k)^2 ]
Now, it takes a little algebra to show that T (k) attains minimum for k = n/2, so it is most reasonable to revert n/2 patches in the first step.
Further, if the kernel still doesn’t work after reverting n/2 patches from the series, we get the problem that is formally equivalent to the initial one with the number of patches to search reduced by half. Then, we can repeat the above reasoning for n = n/2 and it will turn out that the most reasonable thing we can do is to revert n /2 patches and test the kernel.
On the other hand, if the kernel works after we have reverted n/2 patches from the series, we need to reapply some of the reverted patches and test it once again. In that case, however, we still have n = n/2 patches to check and a reasoning analogous to the above one leads to the conclusion that the most reasonable number of patches to reapply before testing the kernel once again is n /2.
Thus we can establish a procedure, referred to as bisection or binary searching, allowing us to reduce the number of patches to check by half in every step. Namely, if the initial number of patches to check is n, we revert n/2 patches and test the kernel. Next, depending on the result of the previous test, we either revert or reapply n/4 patches and test the kernel once again. Subsequently, depending on the result of the previous test, we either revert or reapply n/8 patches and test the kernel once again. By repeating this approximately log2 n times we can reduce the number of suspicious patches to one and thus find the patch that has introduced the bug. It is not very difficult to observe that the average time needed to find the buggy patch this way is proportional to log2 n.