Revision 71 as of 2019-03-13 15:19:05
About Me
I'm a researcher at Inria, in Paris, France. I develop the tool Coccinelle (http://coccinelle.lip6.fr), which allows easy matching and transformation of C code. Coccinelle has been designed with the goal of contributing to Linux development, but it can also be used on other C code.
Please write to me directly if you would like to apply to the Coccinelle Outreachy project.
Overview
This page is organized into two parts. The first part is about learning to use Coccinelle. The second part has some small tasks that are relevant for the documentation project. If you are interested in working on the Coccinelle project, you should do some work from both parts.
For the Coccinelle part, it would be a good idea to start with the first challenge problem, to check that you know how to use the tool properly. The remaining challenge problems can be done in any order. It is not obligatory to do all of them. You may find other things that can be done with Coccinelle. Sources of inspiration may be the results of checkpatch and patches that have been applied to the kernel in the past. Any kind of problem that occurs over and over might be amenable to being solved with Coccinelle.
These challenge problems may apply to many files in the kernel. Pick a few files, and send patches for those. Once they have been accepted, consider moving on to another challenge problem. You will get a better understanding of Coccinelle if you use it for many different things than if you use it to do one thing over and over.
There are many examples of uses of Coccinelle: in previous patches, in the kernel source tree in the scripts/coccinelle directory, and at coccinellery (https://github.com/coccinelle/coccinellery). If you use a script that is already in the Linux kernel, you don't need to include the script in your commit log. Instead, include a line such as Generated-by: scripts/coccinelle/misc/badty.cocci
Tutorial
A tutorial for Coccinelle is available at https://pages.lip6.fr/Julia.Lawall/tutorial.pdf. These are slides that are intended to be presented, but they may be understandable independently of the presentation. Please note that the tutorial focuses on the source code of Linux 3.2, and so the patches created in doing the exercises of the tutorial are not suitable for submission to the outreachy-kernel mailing list. Doing the tutorial also does not count as a contribution to the project.
Coccinelle challenge problem 1
Consider the following function, from drivers/staging/most/hdm-dim2/dim2_sysfs.c (Note that this file no longer exists. If you want to experiment with this code, just create a new .c file containing this function definition.)
    static ssize_t bus_kobj_attr_store(struct kobject *kobj, struct attribute *attr,
                                       const char *buf, size_t count)
    {
            ssize_t ret;
            struct medialb_bus *bus =
                    container_of(kobj, struct medialb_bus, kobj_group);
            struct bus_attr *xattr = container_of(attr, struct bus_attr, attr);

            if (!xattr->store)
                    return -EIO;

            ret = xattr->store(bus, buf, count);
            return ret;
    }
In this function, the last two lines could be compressed into one, as:
    static ssize_t bus_kobj_attr_store(struct kobject *kobj, struct attribute *attr,
                                       const char *buf, size_t count)
    {
            ssize_t ret;
            struct medialb_bus *bus =
                    container_of(kobj, struct medialb_bus, kobj_group);
            struct bus_attr *xattr = container_of(attr, struct bus_attr, attr);

            if (!xattr->store)
                    return -EIO;

            return xattr->store(bus, buf, count);
    }
The following semantic patch makes this change:
    @@
    local idexpression ret;
    expression e;
    @@

    -ret =
    +return
         e;
    -return ret;
Do the following:
- Download and install Coccinelle. If you are using Linux, it should be available in your package manager. Any recent version is fine to start with, but you may need to get the most recent version, which is 1.0.4. This is available on the Coccinelle webpage (coccinelle.lip6.fr) and on GitHub.
- Download staging-testing.
- Save the above semantic patch in a file ret.cocci.
- Run Coccinelle on ret.cocci and staging-testing, i.e.,
    spatch --sp-file ret.cocci --no-includes --dir {your staging-testing path}/drivers/staging > ret.out
  This may take some time.
Do you find the result satisfactory? If so, submit some patches. If not, let us know!
Your code may now declare some variables that are never used. Remove them before submitting your patch.
If you do submit a patch based on the use of Coccinelle, please mention Coccinelle in your patch, and the semantic patch that you used.
What happens in the above semantic patch if you replace local idexpression by identifier or expression? Try these extra variants and see if there are any differences in the results.
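As a starting point for thinking about this question, it may help to look at C code where the assigned location is not a local variable. The following is a minimal sketch, not kernel code: struct counter, bump_original, and bump_rewritten are invented names. If ret were declared as expression, the pattern could plausibly match an assignment to a structure field, and merging the two lines would then drop the store to that field.

```c
/* Hypothetical example; all names are invented for illustration. */
struct counter {
	int value;
	int last;	/* remembers the most recent result */
};

/* Original: the result is stored in a field, then returned. */
int bump_original(struct counter *c)
{
	c->value++;
	c->last = c->value * 2;
	return c->last;
}

/* What merging the last two lines would produce: the caller gets the
 * same return value, but c->last is no longer updated. */
int bump_rewritten(struct counter *c)
{
	c->value++;
	return c->value * 2;
}
```

Both functions return the same value, but only the first one leaves the field updated, which is the kind of difference to look for when comparing the results of the variants.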
Coccinelle challenge problem 2
Parentheses are not needed around the right hand side of an assignment, like in value = (FLASH_CMD_STATUS_REG_READ << 24);. Write a semantic patch to remove these parentheses.
One could consider that parentheses might be useful in a case such as rising = (dir == IIO_EV_DIR_RISING); because there could be confusion between = and ==. Extend your semantic patch using a disjunction so that it does not report on such cases.
Other kinds of code do not need parentheses, such as a->b in &(a->b), function arguments, and the argument of return.
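The claim that these parentheses are redundant can be checked with a small C sketch. The definitions below are local stand-ins so that the example compiles outside the kernel; the values chosen are assumptions, not the kernel's definitions.

```c
/* Local stand-ins; the values are assumptions for illustration. */
#define FLASH_CMD_STATUS_REG_READ 0x05u
enum ev_dir { IIO_EV_DIR_RISING, IIO_EV_DIR_FALLING };

unsigned int with_parens(void)
{
	unsigned int value;

	value = (FLASH_CMD_STATUS_REG_READ << 24);
	return value;
}

/* Assignment binds more loosely than <<, so the result is the same. */
unsigned int without_parens(void)
{
	unsigned int value;

	value = FLASH_CMD_STATUS_REG_READ << 24;
	return value;
}

int rising_with_parens(enum ev_dir dir)
{
	int rising;

	rising = (dir == IIO_EV_DIR_RISING);
	return rising;
}

/* Still correct, but = and == now sit side by side, which is the
 * readability concern mentioned above. */
int rising_without_parens(enum ev_dir dir)
{
	int rising;

	rising = dir == IIO_EV_DIR_RISING;
	return rising;
}
```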
Coccinelle challenge problem 3
Some functions return NULL on failure. NULL can be tested for as !x, NULL == x, or x == NULL. When NULL represents failure, e.g. of an allocation, !x is commonly used. The following are some functions that commonly follow this strategy:
    kmalloc
    devm_kzalloc
    kmalloc_array
    devm_ioremap
    usb_alloc_urb
    alloc_netdev
    dev_alloc_skb
Write a semantic patch to clean up the tests on the results of one or more of these functions.
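The three forms of the test can be compared in plain C. This is a userspace sketch under the assumption that malloc can stand in for the kernel allocators listed above; the function names are invented for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Three equivalent ways to test a pointer for NULL. */
int is_null_bang(const void *x)  { return !x; }
int is_null_left(const void *x)  { return NULL == x; }
int is_null_right(const void *x) { return x == NULL; }

/* Userspace stand-in for an allocation check; malloc plays the role
 * that kmalloc plays in the kernel. Returns 0 on success, -1 on failure. */
int alloc_and_release(size_t n)
{
	char *buf = malloc(n);

	if (!buf)	/* the form commonly used when NULL means failure */
		return -1;
	free(buf);
	return 0;
}
```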
Coccinelle challenge problem 4
kmalloc and its variants normally produce a backtrace when there is not enough memory, so it is not necessary to print an error message that provides only this information. Write a semantic patch that removes such print statements. Note that doing so may result in an if that has only one statement in a branch, so the surrounding braces should also be removed in this case.
Hint: A metavariable declared as constant char[] c; matches any string constant.
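The intended before and after shapes can be sketched in userspace C. This is only an illustration under stated assumptions: fprintf stands in for a kernel print function, the allocator is passed as a function pointer so that the failure path can be exercised, and all function names are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Before: the message only repeats that the allocation failed. */
int setup_before(size_t n, void *(*alloc)(size_t))
{
	char *buf = alloc(n);

	if (!buf) {
		fprintf(stderr, "out of memory\n");
		return -ENOMEM;
	}
	free(buf);
	return 0;
}

/* After: the message is gone, and the single-statement branch
 * no longer needs braces. */
int setup_after(size_t n, void *(*alloc)(size_t))
{
	char *buf = alloc(n);

	if (!buf)
		return -ENOMEM;
	free(buf);
	return 0;
}

/* Allocator that always fails, used to exercise the error path. */
void *failing_alloc(size_t n)
{
	(void)n;
	return NULL;
}
```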
Coccinelle challenge problem 5
The Linux kernel coding style guidelines discourage the use of typedefs for struct types. There are several opportunities for using Coccinelle here.
- You can write a semantic patch to find such typedefs. Typedefs are often found in header files. To be sure that Coccinelle looks at all available header files, use the argument --include-headers. Note also that the name set by a typedef matches a metavariable of type "type".
- If you find such a typedef, to remove it, you need to adjust all the uses. This can be done using a semantic patch.
- You can also fully automate the process. Note that the name of a typedef typically ends in _t and thus that name is not directly suitable as a name for the struct type. You will need to use python code to remove the _t. The file coccinelle/demos/pythontococci.cocci can help in doing this, as it shows how to declare a variable in python code and then use it in pattern matching code.
By default, Coccinelle only works on .c files, including only .h files that have the same name as the .c file. Typedefs, however, are likely to be in .h files. You can try the argument --all-includes, to try to include the .h files in the treatment of each .c file. That will make it possible to update both the typedef and its uses. To work on the .h files individually, you can use the option --include-headers. In that case you will have to update the uses of the types separately, by hand or with another semantic patch.
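The target of the transformation can be illustrated with a small C sketch. The type point_t and the functions below are hypothetical; they only show what the code looks like before and after the typedef is removed and the trailing _t is dropped to form the struct tag.

```c
/* Before: an anonymous struct reachable only through a typedef. */
typedef struct {
	int x;
	int y;
} point_t;

int sum_before(const point_t *p)
{
	return p->x + p->y;
}

/* After: the struct has a tag derived by dropping "_t", and every
 * use spells out "struct point". */
struct point {
	int x;
	int y;
};

int sum_after(const struct point *p)
{
	return p->x + p->y;
}
```

The layout and behavior are unchanged; only the way the type is named at each use differs, which is why all uses have to be adjusted together with the definition.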
Coccinelle challenge problem 6
The file include/linux/list.h contains many functions and macros for manipulating lists. For example, when some expression l points into a list, i.e. has type struct list_head *, then list_entry(l, type, member) can be used to access the current list element, rather than using container_of. Make a semantic patch to use list_entry when possible. When you find a change opportunity, consider whether some other nearby code could also be reimplemented to use a list operator.
Hint: A metavariable declared as struct list_head *l; will only match an expression of type struct list_head *.
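The relationship between the two macros can be tried out in userspace. The definitions below are simplified stand-ins for the ones in the kernel headers, and struct item is a hypothetical element type, so this is a sketch rather than the kernel's actual code.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel definitions. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* Hypothetical element type embedding a list node. */
struct item {
	int id;
	struct list_head list;
};

/* Before: generic container_of applied to a list pointer. */
int id_via_container_of(struct list_head *l)
{
	return container_of(l, struct item, list)->id;
}

/* After: list_entry makes it explicit that l points into a list. */
int id_via_list_entry(struct list_head *l)
{
	return list_entry(l, struct item, list)->id;
}
```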
Coccinelle challenge problem 7
list_for_each is a macro that iterates over the elements of a (doubly linked) list. list_entry is a macro that takes as argument a list pointer and returns the structure that is pointed to. list_for_each_entry is a macro that iterates over the structures in the list, rather than exposing the list spine.
Often a list is only used for its entries, and thus list_for_each_entry can be used instead of the composition of list_for_each and list_entry.
An example of the transformation is as follows (commit 711584ea4c8ce):
    -       list_for_each(p, &hci_cb_list) {
    -               struct hci_cb *cb = list_entry(p, struct hci_cb, list);
    +       list_for_each_entry(cb, &hci_cb_list, list) {
                    if (cb->security_cfm)
                            cb->security_cfm(conn, status, encrypt);
            }
A criterion for the transformation is that p should not be used in the loop body.
Note that in this example, the variable holding the result of list_entry is only defined inside the loop in the old code. Since that variable is moved up into the loop header, its declaration has to be moved up as well. At the same time, the variable p is no longer used inside the loop, and is indeed no longer used in the function at all, and thus its declaration can be dropped completely. You can automate as much of this as you like.
There are currently few opportunities for this transformation in staging drivers.
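To experiment with the two loop forms outside the kernel, a userspace sketch such as the following may be useful. The macros are simplified stand-ins for those in include/linux/list.h, __typeof__ is a GCC/Clang extension, and struct item and the helper functions are hypothetical.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel list definitions. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)
#define list_for_each_entry(pos, head, member) \
	for (pos = list_entry((head)->next, __typeof__(*pos), member); \
	     &pos->member != (head); \
	     pos = list_entry(pos->member.next, __typeof__(*pos), member))

/* Hypothetical element type. */
struct item { int val; struct list_head list; };

void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Before: iterate over the spine, then recover each entry. */
int sum_before(struct list_head *head)
{
	struct list_head *p;
	int sum = 0;

	list_for_each(p, head) {
		struct item *it = list_entry(p, struct item, list);

		sum += it->val;
	}
	return sum;
}

/* After: iterate over the entries directly; p is no longer needed and
 * the declaration of it has moved out of the loop body. */
int sum_after(struct list_head *head)
{
	struct item *it;
	int sum = 0;

	list_for_each_entry(it, head, list)
		sum += it->val;
	return sum;
}
```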
Coccinelle challenge problem 8
Sometimes a variable is declared and at the same time initialized to the result of calling some function, and this function does some simple task, such as accessing a structure field. If the value of this function call doesn't change, the variable can be used rather than calling the function again. Write a semantic patch to detect, and potentially correct, these issues.
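A minimal C sketch of the pattern, with invented names (struct device_info, get_id, describe_before, describe_after), might look as follows. It assumes that nothing modifies the field between the two points, which is the condition a semantic patch would need to check.

```c
/* Hypothetical structure and accessor, invented for illustration. */
struct device_info {
	int id;
};

int get_id(const struct device_info *d)
{
	return d->id;
}

/* Before: id already holds get_id(d), yet the call is repeated. */
int describe_before(const struct device_info *d)
{
	int id = get_id(d);

	if (id < 0)
		return -1;
	return get_id(d) * 10;
}

/* After: the variable is reused. This is only safe because d->id
 * does not change between the initialization and the second use. */
int describe_after(const struct device_info *d)
{
	int id = get_id(d);

	if (id < 0)
		return -1;
	return id * 10;
}
```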
Other Coccinelle challenge problems
You can also try the Coccinelle challenge problems from round 8 (http://kernelnewbies.org/JuliaLawall_round8), round 9 (http://kernelnewbies.org/JuliaLawall_round9), and round 10 (http://kernelnewbies.org/JuliaLawall_round10).
Contact info
Email: <Julia.Lawall AT lip6 DOT fr>
My IRC handle is jlawall.
Questions about using Coccinelle should go to the Coccinelle mailing list: <cocci AT systeme DOT lip6 DOT fr>