
Linux ABI

As a newbie, you may have wondered how the kernel executes compiled code. How does a compiler running in user space generate code that the kernel is able to understand and execute? The short answer is: through a shared standard called ELF (Executable and Linkable Format), which is specified as part of the System V ABI.

Among the most important things the ELF convention specifies is the function calling sequence. The calling sequence defines how registers are used in a function; some registers are assigned a special meaning (frame pointer, stack pointer, etc.). The convention names the special registers, describes their roles, and states which registers must be saved across a call. It also defines the stack frame: the layout of the run-time stack, including the saved registers, the return address, the frame pointer (if any), and whether the stack grows upwards or downwards. Finally, it defines the parameter passing convention (whether parameters are passed on the stack or in registers). As you might have deduced by now, all of these details are architecture specific.

If you are curious why these details matter to you, look at the following (typical) oops reported on lkml (http://lkml.org/lkml/2006/8/8/157):

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c027c3d2
*pde = 00000000
Oops: 0000 [#3]
Modules linked in: qla2xxx ext3 jbd mbcache sg ide_cd cdrom floppy
CPU:    0
EIP:    0060:[<c027c3d2>]    Not tainted VLI
EFLAGS: 00010202   (2.6.17.3 #1)
EIP is at dm_put_device+0xf/0x3b
eax: 00000001   ebx: ee4fcac0   ecx: 00000000   edx: ee4fcac0
esi: ee4fc4e0   edi: ee4fc4e0   ebp: 00000000   esp: c5db3e78
ds: 007b   es: 007b   ss: 0068
Process multipathd (pid: 15912, threadinfo=c5db2000 task=ef485a90)
Stack: ec4eda40 c02816bd ee4fc4c0 00000000 f7e89498 f883e0bc c02816f6 f7e89480
      f7e8948c c0281801 ffffffea f7e89480 f883e080 c0281ffe 00000001 00000000
      00000004 dfe9cab8 f7a693c0 f883e080 f883e0c0 ca4b99c0 c027c6ee 01400000
Call Trace:
 <c02816bd> free_pgpaths+0x31/0x45  <c02816f6> free_priority_group+0x25/0x2e
 <c0281801> free_multipath+0x35/0x67  <c0281ffe> multipath_ctr+0x123/0x12d
 <c027c6ee> dm_table_add_target+0x11e/0x18b  <c027e5b4> populate_table+0x8a/0xaf
 <c027e62b> table_load+0x52/0xf9  <c027ec23> ctl_ioctl+0xca/0xfc
 <c027e5d9> table_load+0x0/0xf9  <c0152146> do_ioctl+0x3e/0x43
 <c0152360> vfs_ioctl+0x16c/0x178  <c01523b4> sys_ioctl+0x48/0x60
 <c01029b3> syscall_call+0x7/0xb
Code: 97 f0 00 00 00 89 c1 83 c9 01 80 e2 01 0f 44 c1 88 43 14 8b 04
24 59 5b 5e 5f 5d c3 53 89 c1 89 d3 ff 4a 08 0f 94 c0 84 c0 74 2a <8b>
01 8b 10 89 d8 e8 f6 fb ff ff 8b 03 8b 53 04 89 50 04 89 02
EIP: [<c027c3d2>] dm_put_device+0xf/0x3b SS:ESP 0068:c5db3e78

As you can see, the kernel did a good job of producing a backtrace for us. Let's see how the kernel and the tools produce a backtrace, in case you ever need to do it yourself. The ELF convention for your architecture gives you all the details required to produce one.
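Each entry in the oops above uses the notation symbol+offset/size: the address lies offset bytes into the named function, and the function is size bytes long. The following is a minimal sketch of how such an entry can be taken apart. The helper names parse_symbol_ref and function_bounds are made up for this illustration and are not part of any kernel tool.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" notation used in the oops above.
_SYMBOL_REF = re.compile(
    r"^(?P<name>\w+)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)$"
)


def parse_symbol_ref(text):
    """Split 'name+0xOFF/0xSIZE' into (name, offset, size). Hypothetical helper."""
    match = _SYMBOL_REF.match(text.strip())
    if match is None:
        raise ValueError(f"not a symbol reference: {text!r}")
    return match["name"], int(match["offset"], 16), int(match["size"], 16)


def function_bounds(address, offset, size):
    """Return (start, end) of the function containing `address`; end is exclusive."""
    start = address - offset
    return start, start + size


# Values taken from the "EIP is at" line and the EIP address of the oops above.
name, offset, size = parse_symbol_ref("dm_put_device+0xf/0x3b")
start, end = function_bounds(0xC027C3D2, offset, size)
print(f"{name}: starts at {start:#010x}, ends before {end:#010x}")
```

Subtracting the offset from EIP gives the start of dm_put_device, which is the address you would look up in System.map or a disassembly to find the faulting instruction.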

ELF i386 convention

Registers

General purpose

eax    return value
edx    dividend register
ecx    count register
ebx    local register variable
ebp    stack frame pointer (optional)
esi    local register variable
edi    local register variable
esp    stack pointer

Floating Point

st0    floating point stack top, return value
st1    floating point next to stack top
...    ...
st7    floating point stack bottom

Stack Frame

4n+8(ebp)    argument n (arguments are numbered from 0)    High address
...          ...
8(ebp)       argument 0
4(ebp)       return address
0(ebp)       previous ebp (optional)
-4(ebp)      unspecified                                   Low address
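The layout above is what makes a backtrace possible: when frame pointers are in use, 0(ebp) links each frame to its caller's frame and 4(ebp) holds the address to return to. The sketch below follows that chain over a captured stack. The function name walk_frames and all addresses in the sample stack are invented for illustration, and the approach assumes every function on the chain saved ebp (the table marks this as optional).

```python
def walk_frames(memory, ebp, max_frames=16):
    """Follow the saved-ebp chain and collect return addresses.

    `memory` maps a word-aligned address to the 32-bit word stored there.
    Hypothetical helper; assumes every frame was built with a frame pointer.
    """
    return_addresses = []
    while len(return_addresses) < max_frames:
        if ebp not in memory or ebp + 4 not in memory:
            break  # walked off the captured part of the stack
        return_addresses.append(memory[ebp + 4])  # 4(ebp): return address
        previous_ebp = memory[ebp]                # 0(ebp): caller's ebp
        if previous_ebp <= ebp:
            break  # the stack grows down, so a caller's frame must be higher
        ebp = previous_ebp
    return return_addresses


# Made-up stack contents: three frames linked through their saved ebp values.
stack = {
    0x1000: 0x1010, 0x1004: 0x08048111,  # innermost frame
    0x1008: 0x0000002A, 0x100C: 0x00000000,
    0x1010: 0x1020, 0x1014: 0x08048222,  # its caller
    0x1018: 0x00000007, 0x101C: 0x00000000,
    0x1020: 0x0000, 0x1024: 0x08048333,  # outermost frame, chain ends here
}
print([hex(address) for address in walk_frames(stack, 0x1000)])
```

Each return address found this way can then be resolved to a symbol name, which gives the "Call Trace" form seen in an oops.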

Parameter Passing

As the stack frame layout shows, arguments are passed on the stack, at addresses above the return address.

GCC can generate code with a different calling convention, for userspace programs or for the kernel, through the -mregparm option, but this does not change how userspace and the kernel communicate. glibc uses the stack-based convention described above whether or not you pass -mregparm to the compiler. In fact, the recent default kernel configuration builds the kernel with -mregparm=3, while userspace programs are almost always compiled without the option (to maintain binary compatibility; -mregparm is usually an "all or nothing" choice).
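To make the difference concrete, the sketch below lists where a function would find its arguments after the prologue, with and without -mregparm. It is a simplification that assumes a fixed-argument function whose arguments are all word-sized integers or pointers, counted from 0. The function name argument_locations is invented here; the register order (eax, edx, ecx) is the one GCC documents for regparm on i386.

```python
REGPARM_REGISTERS = ("%eax", "%edx", "%ecx")  # order GCC uses for regparm on i386


def argument_locations(count, regparm=0):
    """Where each of `count` word-sized arguments lives after the prologue.

    Hypothetical helper; ignores variadic functions, floating point values
    and arguments wider than one word.
    """
    if not 0 <= regparm <= len(REGPARM_REGISTERS):
        raise ValueError("regparm must be between 0 and 3")
    locations = []
    stack_word = 0  # index of the next argument word on the stack
    for index in range(count):
        if index < regparm:
            locations.append(REGPARM_REGISTERS[index])
        else:
            locations.append(f"{4 * stack_word + 8}(%ebp)")
            stack_word += 1
    return locations


print(argument_locations(4))             # stack-based convention
print(argument_locations(4, regparm=3))  # kernel-style -mregparm=3
```

With regparm=3 only the fourth argument is left on the stack, which is why argument values in a kernel backtrace often have to be read from the register dump rather than from the stack.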

1. Quiz - Given a function foo(a, b, c), in what order are the arguments pushed: a, b, c or c, b, a? And why?

i. Hint - Think ellipsis "..."

ii. Hint - Look at stdarg.h

Prologue and Epilogue

Each function call requires the setup and teardown of a stack frame. The setup is called the 'prologue' and the teardown is called the 'epilogue'.

A typical prologue (generated by gcc) is shown below:

 80488e4:       55                      push   %ebp
 80488e5:       89 e5                   mov    %esp,%ebp
 80488e7:       83 ec 08                sub    $0x8,%esp

A typical epilogue (generated by gcc) is shown below:

 804856d:       c9                      leave
 804856e:       c3                      ret

In addition to the instructions shown above, the prologue and epilogue may also push and pop the local variable registers (ebx, esi, edi), depending on whether the function uses them.
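The effect of these instructions on esp and ebp can be traced with a toy model. The sketch below is not an emulator: the class TinyX86 and the addresses used are invented for illustration, and it models only the stack, esp and ebp. It treats leave as "mov %ebp,%esp" followed by "pop %ebp", which is how the instruction is defined.

```python
class TinyX86:
    """Just enough of an i386 to trace esp, ebp and the stack (toy model)."""

    def __init__(self, esp, ebp):
        self.esp, self.ebp = esp, ebp
        self.memory = {}  # address -> 32-bit word

    def push(self, value):
        self.esp -= 4  # the stack grows toward lower addresses
        self.memory[self.esp] = value

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def call(self, return_address):
        self.push(return_address)  # call pushes the address to resume at

    def prologue(self, locals_size):
        self.push(self.ebp)        # push %ebp
        self.ebp = self.esp        # mov  %esp,%ebp
        self.esp -= locals_size    # sub  $locals_size,%esp

    def epilogue(self):
        self.esp = self.ebp        # leave, first half:  mov %ebp,%esp
        self.ebp = self.pop()      # leave, second half: pop %ebp
        return self.pop()          # ret: pop the return address into eip


# Made-up starting state and return address.
cpu = TinyX86(esp=0x2000, ebp=0x2040)
cpu.call(0x080488F0)
cpu.prologue(locals_size=8)
# Inside the function: 0(ebp) is the caller's ebp, 4(ebp) the return address.
print(hex(cpu.memory[cpu.ebp]), hex(cpu.memory[cpu.ebp + 4]), hex(cpu.esp))
resume_at = cpu.epilogue()
print(hex(resume_at), hex(cpu.esp), hex(cpu.ebp))
```

After the epilogue, esp and ebp are back to the values they had before the call, which is the property that lets frames nest to any depth.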

1. Quiz - generate a backtrace from the following stack

0xf3e2de34 f3e2de70 c0135351 401ef021 00000000   p.bsQS...p......
0xf3e2de44 f3e2de81 00000021 f3e2c000 f7950924   ..bs......bs...w
0xf3e2de54 00000000 401ef000 00000246 00000246   .....p..F...F...
0xf3e2de64 00000001 00000001 f7950000 f3e2df18   ...........w..bs
0xf3e2de74 c02898b5 322d7875 23203235 00007820   5...ux.252...x..
0xf3e2de84 f3e2de98 00000000 f7950bd0 bffff0ec   ..bs....P..wlp..
0xf3e2de94 c70cf660 00000000 00000000 bffff0ec   .v.G........lp..
0xf3e2dea4 f3e2c000 f7950930 7fffffff 00000000   ..bs0..w........
0xf3e2deb4 00000000 00000001 00000000 bffff07b   .............p..
0xf3e2dec4 f5cfd880 bffff07b 00000000 f5f24740   .XOu.p.......Gru
0xf3e2ded4 c0124f87 00000000 00000000 00200200   .O..............
0xf3e2dee4 f3e2defc 067b3067 00000000 f5f24740   ..bsg0.......Gru
0xf3e2def4 c0124f87 f7950934 f7950934 c028331b   .O..4..w4..w.3..
0xf3e2df04 00000000 c01b0fe8 f5cfd880 f7950000   ....h....XOu...w
0xf3e2df14 fffffffb f3e2df50 c0283642 00000001   ....P.bsB6......
0xf3e2df24 00000001 00000001 f3e2c000 f6fe30d0   ..........bsP0.v

i. Hint - the current function can always be determined from EIP

References

TODO

KernelNewbies: ABI (last edited 2021-01-13 03:19:38 by RandyDunlap)