
olecom/sed-and-sh++


efficiency and usability

sed

IEEE Std 1003.1, 2004 Edition specifies that, when a pattern can match a variable number of characters, "the longest such sequence is matched". (Plenty of words, no useful real-life example or explanation.)

What if we need the shortest match? Congratulations! This distinguishes the journeyman regular expression user from the novice.

While that phrase is about "negated character classes", what is needed here is the shortest match. Single-character negation is as easy as [^chars]. What about multiple characters?
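
To make the difference concrete, here is a small sketch using only standard sed. The sample line and the helper names `greedy` and `per_pair` are made up for illustration and do not come from this page.

```shell
# Hypothetical sample line, used only for this illustration.
line='say "one" and "two" now'

# Greedy: .* runs up to the LAST quote on the line.
greedy() { printf '%s\n' "$1" | sed 's/".*"/X/'; }

# Negated class: [^"]* cannot cross a quote, so every pair matches on its own.
per_pair() { printf '%s\n' "$1" | sed 's/"[^"]*"/X/g'; }

greedy "$line"     # prints: say X now
per_pair "$line"   # prints: say X and X now
```

This only works because the terminator is a single character. A two-character terminator such as `*/` cannot be written as one bracket expression, which is the problem discussed next.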

C comments

    void /* __init */ func(int a, /* int b, */ int c) /* returns nothing */

A proposed 'S/BRE/replacing/flag' command would work like 's/BRE/replacing/flag', except that the BRE matches the shortest (first) sequence. If changing the BRE syntax is acceptable, then '\{0,s\}' is better.

The speed-up is obvious and comes for free, and shortest matching should also be used in context addresses (i.e. the address BRE in the '/BRE/cmd;' syntax, whose only job is to answer "is there at least one matching sequence?"). But I was told that the RE matcher is hardwired to be "greedy".

  • Thus, the idea requires nothing to be invented, only applied.

Perl added even more mess to RE syntax, and custom sed follows this bad design. I hope it is clear from the previous paragraphs that a new command (GNU sed has a lot of them, not described as new in the man page) is the easy and clever way.

new S-/[*].*[*]/-&-g match

    void /* __init */ func(int a, /* int b, */ int c) /* returns nothing */
         ^^^^^^^^^^^^             ^^^^^^^^^^^^        ^^^^^^^^^^^^^^^^^^^^^

old s-/[*].*[*]/-&- match

   1 void /* __init */ func(int a, /* int b, */ int c) /* returns nothing */
   2      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

how it can be done now

    olecom@flower$ sed 's-/[*].*[*]/-!!!!-' << "_"
    > void /* __init */ func(int a, /* int b, */ int c) /* returns nothing */
    > _
    void !!!!
    olecom@flower$ sed 's-[*]/-\n-g ; s-/[*][^\n]*\n-####-g' << "_"
    > void /* __init */ func(int a, /* int b, */ int c) /* returns nothing */
    > _
    void #### func(int a, #### int c) ####
    olecom@flower$

Replace the multi-character terminator with a single special character (an ordinary '\n' here), then do the usual single-character negation. Needless to say, all this is inefficient.
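
For comparison, the same result can be had in a single substitution by spelling out "anything up to the first `*/`" with bracket expressions only. This is a sketch of a well-known regex idiom in POSIX BRE, not the S command proposed above. The function name `strip_comments` is hypothetical, and the sketch assumes each comment opens and closes on the same line.

```shell
# Replace every /* ... */ comment on a line with ####, in one pass.
# After "/*": a run of non-stars, then one or more stars; while the next
# character is neither "/" nor "*", repeat; finally the closing "/".
strip_comments() {
    sed 's-/[*][^*]*[*][*]*\([^/*][^*]*[*][*]*\)*/-####-g'
}

printf '%s\n' 'void /* __init */ func(int a, /* int b, */ int c) /* returns nothing */' |
    strip_comments
# prints: void #### func(int a, #### int c) ####
```

The pattern is much harder to read than `/[*].*[*]/`, which supports the case for a shortest-match facility. It also knows nothing about string literals that happen to contain `/*`.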

/bin/sh

http://article.gmane.org/gmane.comp.shells.dash/8

http://article.gmane.org/gmane.comp.shells.dash/12

0. Introduce versioning and feature check.

1. Patterns.

  1.1 Restore the original ash idea of negative pattern matching
      (DIFFERENCES .9), but with only one `!', as it can be enabled in
      bash.

  1.2 Make patterns distinguish files and directories, because the
      searching algorithm already does this as a result of `*/*'
      expansions, for example.

      Sometimes it's better to have a list of files only, without
      additional `find`. I don't know what syntax to propose,
      especially if socket/fifo/devnode matching will be requested
      later. Maybe

      mplayer &F*/*         # play files
      cat !&P*              # output content of non-FIFOs (pipes)
      ls -l /dev/&Dhd*      # show a bit more info about devnodes /dev/hd*

  1.3 Sort output only on request.
      I see no value in sorting it.

  1.4 Do not output the pattern itself when nothing matches.
      Quoting is meant to protect anything the shell can screw up.
      Thus, I also see no value in passing the pattern on to programs
      that don't expect such ``file names'' anyway.

2. Restore `setvar variable value`. It is better than using `eval` to
   artificially construct and perform assignments to variables whose
   names are generated or passed as parameters.

3. Yet again, kill aliases and all traces of history introduced to ash
   by BSD guys.

4. Here-doc with quoted empty delimiter (`<<""') to be ended with EOF.

5. File descriptors.

   5.1 Opening file descriptors at position (seeking)

       # skip 1k while opening
       read A B C <@"$((1<<10))"/tmp/file.txt

       # seek while copying and closing file descriptor 4
       cat <&4@4096 4<&-

       # seek to the very beginning, while copying read-only file descriptor 4
       cat <&4@

       # seek to the very end, while copying write-only $WO_A (see 5.2)
       cat >&$WO_A@

   5.2 IMHO making user-accessible file descriptors only in the range
       [3-9] is kind of silly, when open() returns the lowest available
       fd number and the shell has no semantics for saying "no, this fd
       is used already".

       Placing them in a higher and wider region, say [100-255], is
       quite reasonable. Making them special variables, like the
       parameters `$1' are, makes even more sense, thus preventing any
       potential problems.

       # open /tmp/file.txt and place fd in $RO_A
       exec RO_A</tmp/file.txt
       # open, seek /tmp/file.bin and place fd in $RO_B
       exec RO_B<@1024/tmp/file.bin

       cd /tmp
       # same, with making clear border in file name
       exec RO_B<@"1024"file.bin

  5.3 select()-like functionality. I.e. adding blocking/non blocking semantics
      with timeouts.

6. Binary generator. To build more speed and size optimized complete
   functions, loops, forking daemons etc. With clear and simple
   shell<->"basic systems programming C language" relation it
   shouldn't be that hard.
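
For reference, items 1.2/1.4, 2 and 5.1 can be approximated today in plain POSIX sh, which shows what the proposed syntax would replace. This is a minimal sketch only: the function names `files_only`, `set_var` and `skip_bytes` are hypothetical and exist in no shell.

```shell
# 1.2 / 1.4: print the regular files in directory $1, one per line.
# An unmatched "$1"/* stays a literal pattern, fails the -f test,
# and so prints nothing.
files_only() {
    for f in "$1"/*; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# 2: assign value $2 to the variable named by $1.
# The name is validated first, so eval only ever sees "name=$2".
set_var() {
    case $1 in
        '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*) return 1 ;;
    esac
    eval "$1=\$2"
}

# 5.1: write file $2 to stdout, skipping its first $1 bytes.
# tail -c +N starts output at byte N (counting from 1).
skip_bytes() {
    tail -c "+$(($1 + 1))" "$2"
}
```

Each workaround costs an extra loop, an `eval`, or an extra process, which is the overhead the proposal wants to avoid.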

It's not fscking perl. It's how the shell has to evolve, instead of ~20
years of complete crap. There is The Kernel, here we are
@vger.kernel.org. Now it's time for userspace to stop silently sucking
under the table!

And BTW, I didn't see anything like what is proposed here in pdksh or
bash. They seem to do other perl stuff instead.

== Political/religious ==

Restore the original AGPL license in the sources. Place a reference to
the LICENSE file in every file with a small notice, not the BSD junk.
The BSD text (and other pieces) must be added to the LICENSE file.

Rename it back to ash, mainly because I didn't see that many changes
since the original release by Kenneth Almquist. Quite the reverse: the
original built-in `test` and `expr` were removed, and crap like aliases,
history and editing was added. Ah, Debian. Debian.....................

Useful links:

ash

sed


shell


Please, leave comments here.

KernelNewbies: olecom/sed-and-sh++ (last edited 2008-04-21 19:11:26 by olecom)