efficiency and usability
sed
IEEE Std 1003.1, 2004 Edition says that for a "variable number of matching characters", "the longest such sequence is matched". (Plenty of words, no useful real-life example or explanation.)
What if we need the shortest match? "Congratulations! This distinguishes the journeyman regular expression user from the novice."
While that phrase is about "negated character classes", the demand here is the shortest match. Single-character negation is as easy as [^chars]. What about multiple characters?
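To make the single-character case concrete, here is a minimal sketch (the sample line and the variable names are made up for illustration). It contrasts the greedy `.*` with a negated class that cannot cross the closing quote:

```shell
# Greedy vs. "shortest" match with a single-character terminator.
line='say "one" and "two" now'

# Greedy: .* runs up to the LAST double quote on the line.
greedy=$(printf '%s\n' "$line" | sed 's/".*"/<>/')

# Negated class: [^"]* cannot cross a quote, so every match
# stops at the first closing quote.
shortest=$(printf '%s\n' "$line" | sed 's/"[^"]*"/<>/g')

printf '%s\n' "$greedy" "$shortest"
```

The first result collapses everything between the first and the last quote; the second replaces each quoted word separately.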
C comments
void /* __init */ func(int a, /* int b, */ int c) /* returns nothing */
The proposed 'S/BRE/replacing/flag' command is like 's/BRE/replacing/flag', but BRE matches the shortest (first) sequence. If changing the BRE syntax is OK, then '\{0,s\}' is better.
The speed-up is obvious (and free), and it should also be used in a context address (i.e. the address BRE in the '/BRE/cmd;' syntax, whose only job is "is there at least one matching sequence?"). But I was told that the RE matcher is hardwired to be "greedy".
Thus the idea needs nothing to be invented, only applied.
Perl added even more mess to RE syntax, and custom sed follows this bad design. I hope it is clear from the previous paragraph that a new command (GNU sed has a lot of them, not described as new in the man page) is the easy and clever way.
new S-/[*].*[*]/-&-g match
void /* __init */ func(int a, /* int b, */ int c) /* returns nothing */
     ^^^^^^^^^^^^             ^^^^^^^^^^^^        ^^^^^^^^^^^^^^^^^^^^^
old s-/[*].*[*]/-&- match
void /* __init */ func(int a, /* int b, */ int c) /* returns nothing */
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
how it can be done now
olecom@flower$ sed 's-/[*].*[*]/-!!!!-' << "_"
> void /* __init */ func(int a, /* int b, */ int c) /* returns nothing */
> _
void !!!!
olecom@flower$ sed 's-[*]/-\n-g ; s-/[*][^\n]*\n-####-g' << "_"
> void /* __init */ func(int a, /* int b, */ int c) /* returns nothing */
> _
void #### func(int a, #### int c) ####
olecom@flower$
Collapse the multi-character terminator into one special character (an ordinary '\n' here) and then do the usual single-character negation. Needless to say how inefficient all this is.
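The placeholder trick, as far as I can tell, leans on GNU sed treating `\n` as a newline both in the replacement and inside a bracket expression. A known alternative, sketched here and not the method described above, spells out "anything that does not contain `*/`" directly in a plain BRE. The function name `mask_c_comments` is hypothetical, and the sketch assumes comments do not span lines and ignores string literals:

```shell
# Hypothetical helper: replace every C comment on a line with ####.
# Pattern, piece by piece:
#   /\*                        opening "/*"
#   [^*]*\*\{1,\}              text without stars, then one or more stars
#   \([^/*][^*]*\*\{1,\}\)*    a star run not followed by "/": keep going
#   /                          the "/" that closes the comment
mask_c_comments() {
    sed 's:/\*[^*]*\*\{1,\}\([^/*][^*]*\*\{1,\}\)*/:####:g'
}

printf '%s\n' 'void /* __init */ func(int a, /* int b, */ int c) /* returns nothing */' |
    mask_c_comments
```

The matcher is still greedy, but the pattern cannot step over a `*/`, so each comment is matched on its own. The price is readability, which is rather the point of asking for a shortest-match command.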
/bin/sh
http://article.gmane.org/gmane.comp.shells.dash/8
http://article.gmane.org/gmane.comp.shells.dash/12
0. Introduce versioning and feature check.
1. Patterns.
1.1 Restore the original ash idea about negative pattern matching
(DIFFERENCES .9), but with only one `!', as it can be enabled in
bash.
1.2 Make patterns distinguish files and directories, because the
searching algorithm already does this as a result of `*/*'
expansions, for example.
Sometimes it's better to have a list of files only, without an
additional `find`. I don't know what syntax to propose,
especially if socket/fifo/devnode matching is requested
later. Maybe
mplayer &F*/* # play files
cat !&P* # output content of non-fifos (pipes)
ls -l /dev/&Dhd* # show a bit more info on devnodes /dev/hd*
1.3 Sort output only on request.
I see no value in sorting it.
1.4 Do not output patterns in case of an empty match.
Quoting is meant to be done for anything the shell can screw up.
Thus, I also see no value in keeping the pattern for programs that
don't expect such ``file names'' anyway.
2. Restore `setvar variable value`. It is better than using `eval` to
artificially construct and perform assignments to variables whose names
are generated or passed as parameters.
3. Once again, kill aliases and all traces of history introduced to ash
by the BSD guys.
4. A here-doc with a quoted empty delimiter (`<<""') should end at EOF.
5. File descriptors.
5.1 Opening file descriptors at position (seeking)
# skip 1k while opening
read A B C <@"$((1<<10))"/tmp/file.txt

# seek while copying and closing file descriptor 4
cat <&4@4096 4<&-

# seek to the very beginning, while copying read-only file descriptor 4
cat <&4@

# seek to the very end, while copying write-only $WO_A (see 5.2)
cat >&$WO_A@
5.2 IMHO keeping user-accessible file descriptors in the range [3-9] is
kind of silly, when open() returns the lowest available fd number
and the shell has no way of saying "no, this fd is used already".
Putting them in a higher and wider region, say [100-255], is
quite reasonable. Making them special variables, like the
parameters `$1' are, makes even more sense, thus preventing any
potential problems.
# open /tmp/file.txt and place fd in $RO_A
exec RO_A</tmp/file.txt
# open, seek /tmp/file.bin and place fd in $RO_B
exec RO_B<@1024/tmp/file.bin

cd /tmp
# same, with a clear border before the file name
exec RO_B<@"1024"file.bin
5.3 select()-like functionality, i.e. adding blocking/non-blocking semantics
with timeouts.
6. Binary generator. To build more speed and size optimized complete
functions, loops, forking daemons etc. With clear and simple
shell<->"basic systems programming C language" relation it
shouldn't be that hard.
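For comparison, this is roughly what items 1.2, 1.4 and 2 cost in today's POSIX sh. It is a sketch only: `setvar_emul` and `files_only` are hypothetical helper names, not anything ash or dash ships, and the path in the last line is invented so that the pattern matches nothing.

```shell
# Item 2: emulate `setvar name value` with eval.
setvar_emul() {
    # Refuse anything that is not a plain variable name before eval sees it.
    case $1 in
        '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*) return 1 ;;
    esac
    # \$2 stays unexpanded until eval runs, so the value is never re-parsed.
    eval "$1=\$2"
}

# Items 1.2 and 1.4: keep only regular files among the arguments.
# An unmatched pattern arrives as a literal string and fails the -f test,
# so nothing is printed for it.
files_only() {
    for f in "$@"; do
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

n=3
setvar_emul "slot_$n" 'a value; with $pecial chars'
eval "printf '%s\n' \"\$slot_$n\""

files_only /nonexistent-dir-for-example/*
```

Both helpers work, but each one is boilerplate that a built-in `setvar` or a files-only pattern would make unnecessary.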
It's not fscking perl. It's how the shell has to evolve, instead of ~20
years of complete crap. There is The Kernel, here we are
@vger.kernel.org. Now it's time for userspace to stop silently sucking
under the table!
And BTW, I didn't see anything like what is proposed here in pdksh or bash. They
seem to do other perl stuff instead.
== Political/religious ==
Restore the original AGPL license in the sources. Place a reference to the
LICENSE file in all files with a small notice, not BSD junk. That BSD
(and other pieces) must be added to the LICENSE file.
Rename back to ash. Mainly because I didn't see that many changes
since the original release by Kenneth Almquist. Quite the reverse. The original
built-in `test` and `expr` were removed; crap like aliases,
history and editing was added. Ah, Debian. Debian.....................
Useful links:
ash
sed
