
olecom

E-Mail, SPAM

Person: Oleg Verych
Username: olecom
Address: flower.upol.cz
Condition: public
Composite: "Oleg Verych" <!public?olecom.ENOMSG@flower.upol.cz>

A message will be accepted (rather than going to /dev/null) only if a human constructs the correct full e-mail address:

1) "Person" must be included.
2) "Condition" must be evaluated.

Additional anti-spam measures may include requiring an "In-Reply-To" header with specific content.

Hey, Smart Spam, you are welcome!


Automated source code. Text processing.

I want to present my view of source code management.

http://article.gmane.org/gmane.linux.kernel/649945

http://article.gmane.org/gmane.linux.kernel/650783

Tools

BRE (not ERE!) patterns and sed scripts are the base. Together they make patterns and logic easy to read and understand: BREs do the matching, sed does the processing and the logic. The glue is /bin/sh, the shell, a very small yet powerful userspace layer on top of libc and a UNIX-like kernel.

How to read a BRE is up to you. One hint can be found in the next-to-last paragraph here. Meaningful and helpful highlighting in a text editor is another problem.

Also, these tools seem to have been held back in their development by the black, proprietary UNIX past. Now it is called POSIX(R). Decades of stall, oh crap. Some functionality ideas for sed and the shell, useful from my POV, that I didn't see in GNU [bash gsed], ksh or pdksh, are here; perl isn't serious, it's a garbage collector.

Text

Ideas will be posted as soon as they are developed.

Maybe one day Open Source projects will have all the tools needed

  • to audit all known security errors (not broken security designs, no)
  • to have one driver base for all kernels (Linux, BSD, GNU, etc.), no binary or source compatibility layers
  • file system base
  • dynamic source code, which will be configured, adapted and built for particular demands
    • hardware legacy,
    • schedulers,
    • memory management algorithms,
    • favorite out-of-tree toy,
    • no stupid style or #include patches,
    • Synthesis but source-based

    • etc., etc.; this means meta-maintaining the source code base. It's Open Source, isn't it?

  • making this crappy wiki accept a syntax of its own that is meaningful to humans.

So, what turns an ordinary craft and manufactory into an industry? Tools are the main engine of industrial revolutions.

  • Where are such tools?

Coding style. Function definitions.

Why does coding style matter? How do you write code that is friendly to text processing?

Case study: function definitions

whitespace damage made visible:

linux-2.6/arch/m68k/atari/stram.c

    void atari_stram_free( void *addr )
    \n
    {
            BLOCK *block;
    ...
    }

linux-2.6/arch/parisc/kernel/signal.c

    static long
    setup_sigcontext(struct sigcontext __user *sc, struct pt_regs *regs, int in_syscall)
    \t\t_\n
    {
            unsigned long flags = 0;
    ...
    }

body blocks:

linux-2.6/drivers/base/class.c

    static inline int make_deprecated_class_device_links(struct class_device *cd)
    { return 0; }
    static void remove_deprecated_class_device_links(struct class_device *cd)
    { }

linux-2.6/arch/ppc/boot/lib/vreset.c

    static
    int PCIVendor(int slotnum) {
     struct PCI_ConfigInfo *pslot;

     pslot = &PCI_slots[slotnum];

    return (pslot->regs[DEVID] & 0xFFFF);
    }

trailing comments: linux-2.6/fs/xfs/xfs_alloc.c

    STATIC int                              /* success (>= minlen) */
    xfs_alloc_compute_aligned(
            xfs_agblock_t   foundbno,       /* starting block in found extent */
            xfs_extlen_t    foundlen,       /* length in found extent */
            xfs_extlen_t    alignment,      /* alignment for allocation */
            xfs_extlen_t    minlen,         /* minimum length for allocation */
            xfs_agblock_t   *resbno,        /* result block number */
            xfs_extlen_t    *reslen)        /* result length */
    {
    ...
    }

multi-line trailing comments: linux-2.6/fs/xfs/xfs_ialloc.c

    int
    xfs_dialloc(
            xfs_trans_t     *tp,            /* transaction pointer */
            xfs_ino_t       parent,         /* parent inode (directory) */
            mode_t          mode,           /* mode bits for new inode */
            int             okalloc,        /* ok to allocate more space */
            xfs_buf_t       **IO_agbp,      /* in/out ag header's buffer */
            boolean_t       *alloc_done,    /* true if we needed to replenish
                                               inode freelist */
            xfs_ino_t       *inop)          /* inode number allocated */
    {
    ...
    }

linux-2.6/arch/arm/mach-ixp4xx/nslu2-pci.c

    int __init nslu2_pci_init(void) /* monkey see, monkey do */
    {
            if (machine_is_nslu2())
                    pci_common_init(&nslu2_pci);

            return 0;
    }

linux-2.6/sound/pci/au88x0/au88x0_core.c^vortex_wtdma_setmode

    static void
    vortex_wtdma_setmode(vortex_t * vortex, int wtdma, int ie, int fmt, int d,
                         /*int e, */ u32 offset)
    {
    ...
    }

In-line blocks and trailing comments, as well as stupid trailing whitespace and whitespace-only lines, make writing an automated text-processing script much harder. Mainly, this is because they break parsing assumptions, which can lead not just to false positives but to completely broken parsing.
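Part of this damage can be removed before any matching is attempted. Below is a minimal cleanup sketch; the helper name `normalize` is hypothetical, and it only handles two cases: trailing blanks are stripped, and empty lines sitting directly before a line that starts with '{' are dropped, so that '{' follows the parameter block on the next line again. In-line bodies and trailing comments are left alone.

```sh
# normalize: hypothetical helper; reads the files given as arguments, or stdin.
# Stage 1 strips trailing blanks, so a "\t\t \n" line becomes an empty line.
# Stage 2 collects a run of empty lines; if the first non-empty line after the
# run starts with '{', the run is dropped, otherwise it is printed unchanged.
normalize() {
	sed 's/[[:blank:]]*$//' "$@" |
	sed '
/^$/{
:a
$b
N
/^\n*$/ba
s/^\n*{/{/
}'
}
```

Keeping this as a separate stage means the later matching BREs do not need to care about blank clutter; for example, the setup_sigcontext() header shown above should come out with '{' directly after the parameter line.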

Pattern: function definition. Parameter block [ '(' ')' ].

The first symbol that comes to mind for matching a function definition is '(', i.e. the function parameter block. But this idea turns out to be naive very quickly, even with scripting much more comprehensive than just grep. This is because of:

linux-2.6/arch/x86/boot/tty.c

    void __attribute__((section(".inittext"))) putchar(int ch)
    {
    ...
    }

If such a special attribute is placed on a separate line, e.g.

linux-2.6/arch/um/sys-x86_64/stub_segv.c

    void __attribute__ ((__section__ (".__syscall_stub")))
    stub_segv_handler(int sig)
    {
    ...
    }

it becomes yet another unnecessary complication: the attribute uses [ '(' ')' ] arbitrarily, and [ '"' '.' etc. ] become symbols the parser must handle.
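To see how an attribute breaks the naive '(' idea, here is a small sketch with a hypothetical helper `naive_name`, which takes the identifier directly before the first '(' on a line as the function name. On the tty.c line it reports `__attribute__` instead of `putchar`; for the stub_segv.c pair it also reports `__attribute__` for the first line and gets `stub_segv_handler` only from the second one.

```sh
# naive_name: hypothetical helper; for every line containing '(' it prints
# the identifier that directly precedes the first '('.
naive_name() {
	sed -n '/(/{
s/[[:blank:]]*(.*//
s/.*[^[:alnum:]_]//
p
}' "$@"
}
```

The first substitution cuts the line at the first '(' (and any blanks before it); the second keeps only the last identifier-like word of what is left.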

For those who think the above is easy to solve, keep this one in mind: linux-2.6/arch/alpha/boot/misc.c

    extern long srm_printk(const char *, ...)
         __attribute__ ((format (printf, 1, 2)));
  • Isn't it a good idea to put compiler-specific stuff in #define macros?

Even less hope remains after function pointers in parameter lists:

linux-2.6/arch/arm/mach-pnx4008/dma.c

    int pnx4008_request_channel(char *name, int ch,
                                void (*irq_handler) (int, int, void *), void *data)
    {
            int i, found = 0;

    ...
    }

and broken style like this:

linux-2.6/net/ipv4/netfilter/nf_nat_ftp.c

    static int (*mangle[])(struct sk_buff *, __be32, u_int16_t,
                           unsigned int, unsigned int, struct nf_conn *,
                           enum ip_conntrack_info)
    = {
            [NF_CT_FTP_PORT] = mangle_rfc959_packet,
            [NF_CT_FTP_PASV] = mangle_rfc959_packet,
            [NF_CT_FTP_EPRT] = mangle_eprt_packet,
            [NF_CT_FTP_EPSV] = mangle_epsv_packet
    };

Macro usage can finish this idea off:

linux-2.6/drivers/media/video/adv7170.c

    module_param(debug, int, 0);
    MODULE_PARM_DESC(debug, "Debug level (0-1)");
    ...
    module_init(adv7170_init);
    module_exit(adv7170_exit);

Here they all end with ';', which is quite good for parsing. But there are also things like:

linux-2.6/drivers/uio/uio.c

    module_init(uio_init)
    module_exit(uio_exit)
    MODULE_LICENSE("GPL v2");

Such a line is almost indistinguishable from a func.def with an implicit int return type:

    main(void)

Thus, every single case needs its own crutch in the matching algorithm.
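The naive rule taken literally shows the problem directly. In this sketch the helper `naive_funcdefs` is hypothetical: it accepts a column-0 line made of identifiers, blanks and '*', followed by a parenthesized list at the very end of the line. Lines ending with ';' are rejected, but the uio.c macro calls pass exactly like `main(void)` does.

```sh
# naive_funcdefs: hypothetical helper; "identifiers, then (...) at end of line".
# Nothing in the pattern can tell module_init(uio_init) from main(void).
naive_funcdefs() {
	sed -n '/^[[:alpha:]_][[:alnum:]_ *]*(.*)$/p' "$@"
}
```

Fed the uio.c lines plus `main(void)` and `int bar(void)`, it prints everything except the MODULE_LICENSE line.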

  • But wouldn't it be a cool idea to have a universal rule, like mathematical models of physical processes usually have?

OK. That was the first attempt to match a func.def. I even came up with a script that has almost all the crutches. It will be the correctness and speed reference for another matching rule, which will now include function types on separate lines, a very common case even in the Linux sources.

Pattern: full function definition with types. Body block [ '{' '}' ].

Multi-line types and function parameter lists are quite easy to handle (after linearization) if the main pattern is a block:

BRE: func.def body

    ^int main(void) {$
    --------------^^^^
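Once a definition is linearized, the block pattern above generalizes to a single BRE. This is a sketch under two assumptions that are not in the text: a top-level definition starts in column 0 (so indented `if (x) {` lines inside bodies are ignored), and it does not start with '#' (so `#define` lines ending in '{' are skipped). The helper name `toplevel_funcdefs` is hypothetical.

```sh
# toplevel_funcdefs: hypothetical helper; assumes linearized input where a
# definition starts in column 0 and its line ends with ') {'.
toplevel_funcdefs() {
	sed -n '/^[^[:blank:]#].*)[[:blank:]]*{[[:blank:]]*$/p' "$@"
}
```

`struct s {` is not reported because there is no ')' before the '{'.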

Broken function blocks make this harder. To see a block, the next input line must be appended, and if a macro usage is followed by an empty line:

foo.c

    ...
    module_init(foo_init)

    /* doing bar */
    void bar(void)
    {
    ...
    }

then yet another arbitrary line (e.g. a comment, or macros plus multi-line cases) must be appended instead of the expected '\n{'. This complicates further parsing very much.

But let's start with the assumption that '{' is on the line right after the parameter block.

BRE+sed: naive skipping of non func.def

    N
    #- skip
    #^module_exit(foo)
    #\n$

    /[;\n]$/b

    # this skips assignments, function declarations as well
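A side note on `N`: when `N` is the first command and the cycle ends (also via a bare `b`, which branches to the end of the script), sed reads a fresh line for the next cycle, so input is consumed in pairs: lines 1+2, 3+4, and so on. A ')' on line 2 and a '{' on line 3 never meet in the same pattern space. The usual remedy is a sliding window built with `$!N`, `P`/`p` and `D`. Both helpers below are hypothetical and just print each window with the newline shown as ' | '.

```sh
# pairs: hypothetical helper; plain N at the top of the script.
# Every cycle starts with a newly read line, so windows are 1+2, 3+4, ...
pairs() {
	sed 'N;s/\n/ | /' "$@"
}

# windows: hypothetical helper; sliding two-line window.
# $!N appends the next line (except on the last line), h saves the window,
# s///p prints it with ' | ', g restores it, and D drops the first line and
# restarts the cycle without reading new input: windows are 1+2, 2+3, ...
windows() {
	sed -n '$!N;h;s/\n/ | /p;g;D' "$@"
}
```

With an odd number of input lines, what `pairs` does with the last line differs between GNU sed (which prints it) and strict POSIX behaviour (which quits without printing).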

Thus, a func.body can be identified simply as:

BRE+sed: simple func.body

    /{$/b_funcdef

with in-line body crap:

BRE+sed: in-line func.body

    /[{}]$/b_funcdef
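Putting the two-line view and the "'{' on the next line" assumption together gives a small standalone sketch. It is not the author's script (which branches to a `_funcdef` label not shown here); the helper `funcdef_before_brace` is hypothetical. It keeps a two-line window and prints the first line whenever that line ends with ')' and the next one starts with '{' in column 0, which covers both a lone `{` and in-line bodies such as `{ }`. Empty lines and lines ending with ';' never match, so the `module_init(foo_init)` case from foo.c is skipped without a separate rule.

```sh
# funcdef_before_brace: hypothetical helper; two-line sliding window.
# $!N appends the next line, P prints the first line of the window when it
# ends with ')' (plus optional blanks) and the second line starts with '{',
# D drops the first line and restarts the cycle without reading new input.
funcdef_before_brace() {
	sed -n '
$!N
/)[[:blank:]]*\n{/P
D' "$@"
}
```

For a multi-line parameter block this prints only the last parameter line, which is why linearization is needed first. An indented `{` after `if (x)` inside a body is not reported.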

Summing up

  • function type and name can be on separate lines
  • parameter block can be multi-line
    • trailing or free standing C/C++ comments
  • function definition pattern is ') {'

  • whitespace ([:blank:]) can be anywhere

Now it's time to write all this (and much more) in a `sed` script, step by step.
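A rough first step in that direction, as a sketch only: the hypothetical `linearize` helper first removes one-line /* */ comments, squeezes blanks and strips trailing ones, then joins a line with the next one while it ends with ',' or '(' or consists only of identifiers, blanks and '*' (a bare type such as `static long`). These joining rules are assumptions of this sketch, not the author's; multi-line comments like the one in xfs_dialloc() are not handled, and array initializers ending with ',' get joined as well.

```sh
# linearize: hypothetical helper; the joining rules are assumptions.
# Stage 1: drop one-line /* ... */ comments, squeeze runs of blanks into one
# space, then drop a trailing space.
# Stage 2: while a line ends with ',' or '(' or is a bare column-0 type
# (identifiers, blanks and '*' only), append the next line with one space.
linearize() {
	sed -e 's:/\*[^*]*\*\**\([^/*][^*]*\*\**\)*/::g' \
	    -e 's/[[:blank:]][[:blank:]]*/ /g' \
	    -e 's/ $//' "$@" |
	sed '
:a
/[,(]$/b join
/^[[:alpha:]_][[:alnum:]_ *]*$/b join
b
:join
$!N
s/\n */ /
ta
'
}
```

For example, `static long` plus a two-line setup_sigcontext() parameter list with a trailing comment collapse into one line ending with ')', followed by '{' on the next line.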

Other style problems

Free-order assembler (which happily ends with ';'):

linux-2.6/arch/s390/kernel/process.c

    asm(
            ".align 4\n"
            "kernel_thread_starter:\n"
            "    la    2,0(10)\n"
            "    basr  14,9\n"
            "    la    2,0\n"
            "    br    11\n");

broken indentation of a block inside a top-level block:

linux-2.6/drivers/atm/zatm.c

    static void poll_rx(struct atm_dev *dev,int mbx)
    {
    ...
    #if 0
    printk("RX IND: 0x%x, 0x%x, 0x%x, 0x%x\n",here[0],here[1],here[2],here[3]);
    {
    unsigned long *x;
                    printk("POOL: 0x%08x, 0x%08x\n",zpeekl(zatm_dev,
                          zatm_dev->pool_base),
                          zpeekl(zatm_dev,zatm_dev->pool_base+1));
                    x = (unsigned long *) here[2];
                    printk("[0..3] = 0x%08lx, 0x%08lx, 0x%08lx, 0x%08lx\n",
                        x[0],x[1],x[2],x[3]);
    }
    #endif
    ...
    }

linux-2.6/arch/arm/mach-ixp4xx/gtwx5715-pci.c

    static int __init gtwx5715_map_irq(struct pci_dev *dev, u8 slot, u8 pin)
    {
            int rc;
            static int gtwx5715_irqmap
                            [GTWX5715_PCI_SLOT_COUNT]
                            [GTWX5715_PCI_INT_PIN_COUNT] = {
            {GTWX5715_PCI_SLOT0_INTA_IRQ, GTWX5715_PCI_SLOT0_INTB_IRQ},
            {GTWX5715_PCI_SLOT1_INTA_IRQ, GTWX5715_PCI_SLOT1_INTB_IRQ},
    };
    ...
    }

These are total show-stoppers from the coding style POV. If the parser removes the body after a func.def to speed up processing, some tail will remain.

linux-2.6/arch/alpha/kernel/smc37c669.c

    static unsigned long SMC37c669_Addresses[] __initdata = {
      0x3F0UL,            /* Primary address      */
      0x370UL,            /* Secondary address    */
      0UL                 /* End of list          */
        };

On the contrary, this one lets a simple skipping parser eat a lot more than needed.

loop to skip top bodies

    :_body
     n
     # save ; coloring is done on copy ; restore
     h ; '"`verbose 5 bdy_`"' ; g
     /^}/!b_body
    -^^^- doesn't work

Generally, parsing even simple #if/#endif (without a commented match), as well as this kind of brokenness, requires some kind of memory or stack to implement nesting. IMHO such things are completely out of order.
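Both failure modes can be reproduced with a minimal sketch of the body-skipping loop above; the helper `skip_bodies` is hypothetical. It starts skipping at a line ending with '{' and stops at the first '}' in column 0. On the zatm.c layout it stops at the inner '}' and leaks the tail of the function; on the smc37c669.c layout the indented '};' is never seen as an end, so the following function is eaten as well.

```sh
# skip_bodies: hypothetical helper modeled on the loop above.
# A "body" starts at a line ending with '{' and ends at the first '}' in
# column 0; both lines and everything between them are dropped, all other
# lines are printed.
skip_bodies() {
	sed -n '
/{$/{
:body
n
/^}/!b body
d
}
p' "$@"
}
```

Without a depth counter (the "memory or stack" mentioned above), a column-0 '}' is the only end marker such a loop can see.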

Old hardware, failed coding style.

linux-2.6/drivers/block/paride/on26.c

    static int on26_read_regr( PIA *pi, int cont, int regr )

    {       int     a, b, r;
    ...
    }

best noki-debugging

linux-2.6/arch/arm/mach-omap2/gpmc.c

    #ifdef DEBUG
    static int set_gpmc_timing_reg(int cs, int reg, int st_bit, int end_bit,
                                   int time, const char *name)
    #else
    static int set_gpmc_timing_reg(int cs, int reg, int st_bit, int end_bit,
                                   int time)
    #endif
    {
            u32 l;
            int ticks, mask, nr_bits;

            if (time == 0)
                    ticks = 0;
            else
                    ticks = gpmc_ns_to_ticks(time);
            nr_bits = end_bit - st_bit + 1;
            if (ticks >= 1 << nr_bits) {
    #ifdef DEBUG
                    printk(KERN_INFO "GPMC CS%d: %-10s* %3d ns, %3d ticks >= %d\n",
                                    cs, name, time, ticks, 1 << nr_bits);
    #endif
                    return -1;
            }

            mask = (1 << nr_bits) - 1;
            l = gpmc_cs_read_reg(cs, reg);
    #ifdef DEBUG
            printk(KERN_INFO
                    "GPMC CS%d: %-10s: %3d ticks, %3lu ns (was %3i ticks) %3d ns\n",
                   cs, name, ticks, gpmc_get_fclk_period() * ticks / 1000,
                            (l >> st_bit) & mask, time);
    #endif
            l &= ~(mask << st_bit);
            l |= ticks << st_bit;
            gpmc_cs_write_reg(cs, reg, l);

            return 0;
    }

Comments

Please leave comments here. E-mail messages will be posted there as well.


CategoryHomepage

KernelNewbies: olecom (last edited 2008-04-08 19:15:44 by olecom)