To use these tools, first source [wiki:roelkluin/cvars cvars], then run the commands from inside your Linux kernel git directory.
gg
gg does something like:
git grep -n -E [other_options] "$(bli2 "$1")"
bli2
bli2() parses a string and transforms it into a more complex extended regexp, which it simply echoes.
To understand how it parses things try these:
bli2 "@V"
bli2 "@d"
bli2 " "
bli2 " "
Note that @V matches the identifier of a simple local variable and @d matches a number (including hex literals and suffixed ones such as 1ull). Runs of spaces are squeezed and translated to match optional whitespace.
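The translation described above can be approximated with a few sed substitutions. The following is only a minimal sketch, not the real bli2: the function name `bli2_sketch` is hypothetical, it covers just spaces, @S and @V, and the echoed strings are taken from the table on this page.

```shell
# Hypothetical stand-in for bli2; handles only spaces, @S and @V.
bli2_sketch() {
    printf '%s\n' "$1" | sed -E \
        -e 's/ +/[[:space:]]*/g' \
        -e 's/@S/[[:space:]]+/g' \
        -e 's/@V/[[:alpha:]_]+[[:alnum:]_]*/g'
}

# Show the expansion, then use it as an extended regexp.
bli2_sketch 'for \( @V = 0'
printf 'for (idx = 0; idx < n; idx++)\n' |
    grep -E "$(bli2_sketch 'for \( @V = 0')"
```

Spaces are replaced first so that the later replacements, which contain no spaces themselves, are not rewritten a second time.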
bli2 pattern | description of what is echoed | echoed string (literally) | number of back-references |
any space | optional space | [[:space:]]* | - |
\! | exclamation mark (escaped because bash would otherwise apply history expansion) | ! | - |
@S | obligatory space | [[:space:]]+ | - |
@V | identifier | [[:alpha:]_]+[[:alnum:]_]* | - |
@K | identifier in uppercase only | [[:upper:]_]+[[:upper:][:digit:]_]* | - |
@Q | a non-alphanumeric character | [^[:alnum:]_] | - |
@Q2 | a character to the left of a variable that is neither alphanumeric nor part of a member access (`>' or `.') | [^[:alnum:]_>.] | - |
@w | (pointer) member, array | see `bli2 "@w"' | 1 |
@d | any number | see `bli2 "@d"' | 1 |
@n | any number of lines; whatever follows matches at the beginning of the next line | ([^\n]*\n)* | 1 |
\(...\) | up to 2 nested parentheses | see `bli2 "\(...\)"' | 2 |
\{...\} | up to 2 nested curly brackets | see `bli2 "\{...\}"' | 2 |
\[...\] | up to 2 nested square brackets | see `bli2 "\[...\]"' | 2 |
\(-..\) | characters optionally followed by up to 2 nested parentheses | see `bli2 "\(-..\)"' | 3 |
\{-..\} | characters optionally followed by up to 2 nested curly brackets | see `bli2 "\{-..\}"' | 3 |
\{.8.\} | up to 8 nested curly brackets | see `bli2 "\{.8.\}"' | 8 |
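The last column matters when you number your own back-references: every group that an expansion brings along shifts the groups that follow it. Here is a small sketch of that effect. The variable `D` is only a stand-in for what `bli2 "@d"' echoes (the real expansion is more elaborate, but per the table it also contains one group), and the example assumes a grep that supports back-references with -E, as GNU grep does.

```shell
# Echoed string for @V, copied from the table (no groups).
V='[[:alpha:]_]+[[:alnum:]_]*'
# Assumed stand-in for @d: exactly one group, like the real expansion.
D='([0-9]+)'

# Group numbering: \1 = (V), \2 = first D, \3 = second D.
re="($V) = $D; \\1 < $D; .* \\1 > \\3"

# The second number is reached with \3, not \2.
printf 'i = 0; i < 255; if i > 255\n' | grep -E "$re"
```

Had the pattern used \2 here, it would compare against the initial value 0 instead of the loop bound.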
gres
I wrote this to do a multiline (git-)grep; see [wiki:roelkluin/gres_examples examples]. gres does something along the lines of:
git grep -E -n -other_opts "$(bli2 "$1")" -- '*.c' '*.h' | sed -n -r "$(ecsed2 "${@:2}")"
With `gres -B1 -A40 "pattern1" "pattern2" "..."', the `-B1' and `-A40' options are passed to git-grep, and bli2() turns the first pattern into an extended regexp query. The subsequent patterns are passed to ecsed2(), which uses them to build a sed script. With this script, sed parses the `git grep' output and prints only the matches for which the last pattern matched and none of the prior patterns did.
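To make the routing of arguments concrete, the sketch below only reports where each argument would go. It is not how gres is implemented: `route_args` is a hypothetical helper, and it assumes that options come first and that patterns do not start with a dash.

```shell
# Hypothetical helper: prints how gres-style arguments would be routed.
route_args() {
    opts=
    # Leading dash arguments are treated as git grep options.
    while [ $# -gt 0 ]; do
        case $1 in
            -*) opts="$opts $1"; shift ;;
            *)  break ;;
        esac
    done
    [ $# -gt 0 ] || return 1
    printf 'git grep options:%s\n' "$opts"
    printf 'git grep pattern: %s\n' "$1"
    shift
    # Everything after the first pattern goes to the sed side.
    for p in "$@"; do
        printf 'sed pattern: %s\n' "$p"
    done
}

route_args -B1 -A40 'pattern1' 'pattern2' 'pattern3'
```

Of the sed patterns, the last one must match and any earlier ones act as exclusions.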
The sed script transforms the first `path/to/filename.c-301-' prefix of each match into a vi command and removes the remaining prefixes. Lines are joined until an end-of-function or end-of-match pattern occurs, and comments are removed. Each (parsed) match that `git grep' piped to sed is displayed only '''if''' the last pattern matched and no exclusion pattern did.
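The filtering step can be illustrated with a much reduced sed script. This is a sketch under several assumptions, not the script that ecsed2() generates: the function name `filter_blocks` is hypothetical, the two patterns are hard-coded, paths are assumed to contain no `-' or `:', the vi command and comment removal are left out, and GNU sed is assumed.

```shell
# Hypothetical filter: reads `git grep -n -A...' style output on stdin.
filter_blocks() {
    sed -n -E '
        # Strip the "path:NN:" or "path-NN-" prefix.
        s/^[^:-]+[-:][0-9]+[-:]//
        # A "--" line ends a block of context lines.
        /^--$/ b flush
        # Otherwise collect the line in the hold space.
        H
        $ b flush
        b
        :flush
        x
        s/^\n//
        # Exclusion pattern: drop the block if it matches.
        /== n/ b clear
        # Last pattern: print the joined block if it matches.
        /for \(/ p
        :clear
        s/.*//
        x
    '
}

printf '%s\n' \
    'a.c:10:for (i = 0; i < n; i++) {' \
    'a.c-11-}' \
    'a.c-12-if (i > n)' \
    '--' \
    'b.c:20:for (j = 0; j < n; j++) {' \
    'b.c-21-}' \
    'b.c-22-if (j == n)' | filter_blocks
```

Only the block from a.c is printed: both blocks match the last pattern, but the block from b.c also matches the exclusion pattern.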