KernelHacking-HOWTO/Overview_of_the_Kernel_Source_Code/Internals_of_Interrupt_Handling

Parent Node : Overview of the Kernel Source Code

This section will cover the internals of Interrupt Handling in the Linux Kernel (all explanations relate to the i386 platform). This section is under development and might be incomplete right now.

I will cover the following topics in this section, explaining both the hardware and the software side: how interrupts are generated, how they are routed, and how they are then handled by the low-level code of the Linux Kernel.

Introduction
Interrupt Routing
1. Details of Programmable Interrupt Controller
Details of Interrupt Descriptor Table
1. Task Gates
2. Trap Gates
3. Interrupt Gates
Hardware Checks for Interrupts and Exceptions
Linux Kernel support for Handling Interrupts - Details of do_IRQ() function, core of Interrupt Handling

Introduction

This section discusses the hardware perspective of interrupt handling from the CPU's side, the Linux Kernel's Interrupt Routing subsystem, and the device drivers' role in interrupt handling.

The term interrupt is largely self-explanatory: interrupts are signals sent to the CPU on its INTR line whenever a device wants to get the CPU's attention. As soon as the interrupt signal occurs, the CPU defers its current activity and services the interrupt by executing the interrupt handler corresponding to that interrupt number (also known as the IRQ number).

One way of classifying interrupts is as follows:
- Synchronous interrupts (also known as software interrupts)
- Asynchronous interrupts (also known as hardware interrupts)

The basic difference between these is that synchronous interrupts are generated by the CPU itself; in Intel's terminology they are known as exceptions. They are generated either when the CPU detects an abnormal condition while executing an instruction, or when it executes special instructions such as 'int' or 'int3'. Asynchronous interrupts, on the other hand, are generated by the outside world (devices connected to the CPU). As these interrupts can occur at any point in time, they are known as asynchronous interrupts.

It is important to note that a machine instruction is not executed in one single CPU cycle; it takes several cycles to complete. An asynchronous interrupt that arrives in the middle of an instruction is not handled immediately; the CPU checks for pending interrupts only once the current instruction has completed. Synchronous interrupts are tied to the instruction that caused them: traps (such as 'int3') are reported after that instruction completes, while faults (such as a page fault) are reported before it completes, so that the instruction can be restarted once the handler has fixed the problem.
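To make the instruction-boundary rule concrete, here is a small C sketch of a toy CPU that samples its INTR line only when an instruction has finished. The structure, the function name and the fixed 3-cycle instruction length are hypothetical simplifications; the sketch models asynchronous (hardware) interrupts only, not faults.

```c
#include <stdbool.h>

/* Toy CPU model (hypothetical, not real hardware or kernel code).
 * Every instruction is assumed to take 3 cycles, and the INTR line is
 * sampled only when an instruction has completed. */
struct toy_cpu {
    bool if_flag;       /* EFLAGS.IF: accept maskable interrupts */
    bool intr_line;     /* INTR asserted by the interrupt controller */
    int  cycles_left;   /* cycles remaining in the current instruction */
    int  retired;       /* instructions completed so far */
    int  serviced;      /* interrupts taken so far */
};

/* Advance the toy CPU by one clock cycle. */
static void toy_cpu_tick(struct toy_cpu *cpu)
{
    if (cpu->cycles_left > 0)
        cpu->cycles_left--;
    if (cpu->cycles_left > 0)
        return;                  /* mid-instruction: INTR is not examined */

    cpu->retired++;              /* instruction boundary reached */
    if (cpu->intr_line && cpu->if_flag) {
        cpu->serviced++;         /* a real CPU would now go through the IDT */
        cpu->intr_line = false;
    }
    cpu->cycles_left = 3;        /* start the next instruction */
}
```

An interrupt raised after the first cycle of an instruction is only taken on the third tick, when that instruction retires; with IF cleared, the request simply stays asserted.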

Interrupt Routing

For handling interrupts, there are a few things we expect the CPU to do on every interrupt. Whenever an interrupt occurs, the CPU performs some hardware checks, which are needed to keep the system secure. Before explaining these hardware checks, we will look at how interrupts are routed from hardware devices to the CPU.

Details of Programmable Interrupt Controller

On the Intel architecture, system devices (device controllers) are connected to a special device known as the PIC (Programmable Interrupt Controller). The CPU has two lines for receiving interrupt signals: NMI and INTR. The NMI line receives non-maskable interrupts, which cannot be masked, that is, they cannot be blocked at any cost. These interrupts have the highest priority and are rarely used. The INTR line receives all the interrupts from system devices, and these interrupts can be masked or blocked.

As all the device interrupt signals need to be multiplexed onto a single CPU line, we need a mechanism to route interrupts from different device controllers to that one line. This routing (multiplexing) is done by the PIC. The PIC sits between the system devices and the CPU and has multiple input lines, each connected to a different device controller; it has only one output line, connected to the CPU's INTR line, on which it signals the CPU. A PC has two PICs cascaded together: the output of the second (slave) PIC is connected to input line 2 (IRQ2) of the first (master) PIC. Since each PIC has 8 inputs and one master input is taken by the cascade, this setup provides a maximum of 15 input lines to which different device controllers can be connected. The PIC has some programmable registers through which the CPU communicates with it (to give commands, mask/unmask interrupt lines and read status). Each of the two PICs has its own copy of the following registers:

Mask Register

Status Register

The mask register is used to mask or unmask a specific interrupt line. The CPU can ask the PIC to mask (block) a specific interrupt by setting the corresponding bit in the mask register; unmasking is done by clearing that bit. When a particular interrupt line is masked, the PIC still receives interrupts on that input line but does not send the interrupt signal to the CPU, so the CPU keeps on doing what it was doing. Masked interrupts are not lost: the PIC latches the pending request (one per line) and sends it to the CPU once that line is unmasked. Masking is different from blocking all interrupts to the CPU. The CPU can ignore all interrupts coming on the INTR line by clearing the IF (Interrupt Flag) in its EFLAGS register (the 'cli' instruction; 'sti' sets it again). While this flag is clear, the CPU does not respond to the INTR line, which we can consider to be blocking of interrupts; a request left pending at the PIC is serviced once IF is set again. So masking is done at the PIC level, for individual interrupt lines, whereas blocking is done at the CPU level, for all interrupts coming to that CPU except the NMI (Non-Maskable Interrupt), which is received on the CPU's NMI line and cannot be blocked or ignored.
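The following C sketch models the cascaded PIC pair described above: per-line mask bits, requests that stay latched while masked, the slave feeding master input 2, and the CPU's IF flag blocking everything. It is a hypothetical toy, not driver code: only the data-port numbers 0x21 and 0xA1 (where the PC/AT PICs' mask registers are written) come from real hardware, priority is simply "lowest IRQ number first" (the 8259A's default fixed-priority mode), and end-of-interrupt handling and the in-service register are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define PIC_MASTER_DATA  0x21   /* master PIC mask register port */
#define PIC_SLAVE_DATA   0xA1   /* slave PIC mask register port */
#define PIC_CASCADE_LINE 2      /* slave output -> master input 2 */

struct toy_pic {
    uint8_t irr;   /* latched requests, one bit per input line */
    uint8_t imr;   /* mask register: bit set = line masked */
};

struct toy_pic_pair {
    struct toy_pic master, slave;
};

/* Which data port holds the mask bit for IRQ 0-15. */
static uint16_t irq_data_port(unsigned irq)
{
    return irq < 8 ? PIC_MASTER_DATA : PIC_SLAVE_DATA;
}

static struct toy_pic *irq_pic(struct toy_pic_pair *pp, unsigned irq)
{
    return irq < 8 ? &pp->master : &pp->slave;
}

static void pic_raise(struct toy_pic_pair *pp, unsigned irq)
{
    irq_pic(pp, irq)->irr |= (uint8_t)(1u << (irq & 7u));
}

static void pic_set_mask(struct toy_pic_pair *pp, unsigned irq, bool masked)
{
    struct toy_pic *p = irq_pic(pp, irq);
    if (masked)
        p->imr |= (uint8_t)(1u << (irq & 7u));
    else
        p->imr &= (uint8_t)~(1u << (irq & 7u));
}

/* Lowest-numbered pending, unmasked line of one PIC, or -1. */
static int pic_pending(const struct toy_pic *p)
{
    uint8_t ready = p->irr & (uint8_t)~p->imr;
    for (int line = 0; line < 8; line++)
        if (ready & (1u << line))
            return line;
    return -1;
}

/* IRQ the CPU would take now, or -1. if_flag == false models 'cli':
 * nothing is taken, but requests stay latched in the PICs. */
static int next_irq(struct toy_pic_pair *pp, bool if_flag)
{
    if (!if_flag)
        return -1;
    int s = pic_pending(&pp->slave);
    uint8_t master_req = pp->master.irr;
    if (s >= 0)   /* slave asserts its output, i.e. master input 2 */
        master_req |= (uint8_t)(1u << PIC_CASCADE_LINE);
    struct toy_pic view = { master_req, pp->master.imr };
    int m = pic_pending(&view);
    if (m < 0)
        return -1;
    if (m == PIC_CASCADE_LINE && s >= 0) {
        pp->slave.irr &= (uint8_t)~(1u << s);
        return 8 + s;
    }
    pp->master.irr &= (uint8_t)~(1u << m);
    return m;
}
```

A real driver would write the updated mask byte to the port returned by irq_data_port() instead of only updating a struct.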

Nowadays the interrupt architecture is not as simple as shown above. Modern machines use the APIC (Advanced Programmable Interrupt Controller) architecture, which supports up to 256 interrupt vectors: every CPU has a built-in local APIC, and one or more I/O APICs route device interrupts to the local APICs. I won't go into the details of these right now.

Once the interrupt signal is received by the CPU, the CPU performs some hardware checks, during which no software (machine) instructions are executed. Before looking at what these checks are, we need to understand some architecture-specific data structures maintained by the kernel.

Details of Interrupt Descriptor Table (IDT)

The kernel needs to maintain an IDT (Interrupt Descriptor Table), which maps each interrupt or exception vector to its handler routine. This table has 256 entries and each entry is 8 bytes long. The first 32 entries are reserved for exceptions and the rest are available for interrupts, such as hardware interrupts received from the outside world. The table can contain three different types of entries:

1. Task Gates
2. Trap Gates
3. Interrupt Gates

Let's see what these gates are and where they are used.

a). Task Gates

Format of task gates is as follows:

0 to 15 bits : reserved (not used)
16 to 31 bits : TSS segment selector; it selects the GDT descriptor of the TSS (Task State Segment) of the task to which we need to switch.
32 to 39 bits : these bits are reserved and are not currently used.
40 to 43 bits : specify the type of entry (its value for task gate is 0101)
44th bit : always 0, not used
45 to 46 bits : this specifies the DPL (Descriptor Privilege Level) of the gate entry.
47th bit : specifies if this entry is valid or not (1 - valid, 0 - invalid)
48 to 63 bits : reserved (not used)
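As an illustration of the layout above, this hypothetical C helper packs a task-gate descriptor into a 64-bit value; the function name is made up for this example, and the reserved fields are simply left zero.

```c
#include <stdint.h>

/* Pack an i386 task-gate descriptor following the bit layout above. */
static uint64_t make_task_gate(uint16_t tss_selector, unsigned dpl)
{
    uint64_t d = 0;
    d |= (uint64_t)tss_selector << 16;   /* bits 16-31: TSS selector */
    d |= (uint64_t)0x5 << 40;            /* bits 40-43: type 0101 */
    /* bit 44 stays 0 */
    d |= (uint64_t)(dpl & 0x3u) << 45;   /* bits 45-46: DPL */
    d |= (uint64_t)1 << 47;              /* bit 47: valid (present) */
    return d;                            /* bits 0-15, 32-39, 48-63 stay 0 */
}
```

With DPL 0 the byte at bits 40-47 comes out as 0x85; with DPL 3 it becomes 0xE5.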

A task gate makes the CPU perform a hardware task switch when its vector is raised, instead of simply calling a handler in the current task. As soon as this gate is hit (an interrupt or exception arrives on a vector for which there is a task gate in the IDT), the CPU saves the context (the state of the processor registers) of the currently running task in that task's TSS, whose selector is held in the TR (Task Register). It then loads the CPU registers with the values stored in the TSS of the new task, selected by bits 16-31 of the task gate. Once the registers hold these new values, the processor is running the new task and the context switch is done. Linux 2.6 on i386 uses a task gate only for the double-fault exception (vector 8), so that it can be handled on a known-good stack; all other IDT entries are trap or interrupt gates. So I will not explain task gates any further.

b). Trap Gates

Format of trap gates is as follows:

0-15 bits : lower 16 bits of the offset of the kernel function to be invoked when this gate is hit
16-31 bits : segment selector, identifying the code segment descriptor in the GDT (Global Descriptor Table)
32-36 bits : these bits are reserved and are not currently used.
37-39 bits : always 000, not used
40-43 bits : specify the type of entry (its value for trap gate is 1111)
44th bit : always 0, not used
45-46 bits : this specifies the DPL (Descriptor Privilege Level) of the gate entry.
47th bit : specifies if this entry is valid or not (1 - valid, 0 - invalid)
48-63 bits : upper 16 bits of the offset of the kernel function to be invoked when this gate is hit

Trap gates are mainly used to handle exceptions generated by the CPU. Bits 0-15 and 48-63 together form the 32-bit pointer to the kernel function (an offset within the segment identified by bits 16-31 of this entry). The only difference between trap gates and interrupt gates is that whenever an interrupt gate is hit, the CPU automatically disables interrupts by clearing the IF flag in its EFLAGS register, whereas with a trap gate this is not done and interrupts remain enabled. As mentioned earlier, trap gates are used for exceptions, so in the Linux Kernel most of the first 32 IDT entries are initialized with trap gates (some kernel versions set up a few exception entries, such as the page fault entry, as interrupt gates instead). In addition, the Linux Kernel uses a trap gate for the system call entry (entry 128, i.e. 0x80, of the IDT).
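To show how the handler pointer is split across bits 0-15 and 48-63, here is a hypothetical C sketch that builds a 32-bit trap or interrupt gate and reassembles the offset; the helper names are illustrative, not kernel functions.

```c
#include <stdint.h>

#define GATE_TYPE_INTR 0xEu   /* 1110: 32-bit interrupt gate */
#define GATE_TYPE_TRAP 0xFu   /* 1111: 32-bit trap gate */

/* Build a 32-bit trap/interrupt gate from the layout listed above. */
static uint64_t make_gate(uint32_t offset, uint16_t selector,
                          unsigned type, unsigned dpl)
{
    uint64_t d = 0;
    d |= (uint64_t)(offset & 0xFFFFu);     /* bits 0-15: offset 15..0 */
    d |= (uint64_t)selector << 16;         /* bits 16-31: code segment selector */
    d |= (uint64_t)(type & 0xFu) << 40;    /* bits 40-43: gate type */
    d |= (uint64_t)(dpl & 0x3u) << 45;     /* bits 45-46: DPL */
    d |= (uint64_t)1 << 47;                /* bit 47: valid (present) */
    d |= (uint64_t)(offset >> 16) << 48;   /* bits 48-63: offset 31..16 */
    return d;
}

/* Reassemble the handler offset from bits 0-15 and 48-63. */
static uint32_t gate_offset(uint64_t d)
{
    return (uint32_t)(d & 0xFFFFu) |
           ((uint32_t)((d >> 48) & 0xFFFFu) << 16);
}
```

A system-call style gate would be built with GATE_TYPE_TRAP and DPL 3, an ordinary exception gate with DPL 0.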

c). Interrupt Gates

Format is as follows:

0-15 bits : lower 16 bits of the offset of the kernel function to be invoked when this gate is hit
16-31 bits : segment selector, identifying the code segment descriptor in the GDT (Global Descriptor Table)
32-36 bits : these bits are reserved and are not currently used.
37-39 bits : always 000, not used
40-43 bits : specify the type of entry (its value for interrupt gate is 1110)
44th bit : always 0, not used
45-46 bits : this specifies the DPL (Descriptor Privilege Level) of the gate entry.
47th bit : specifies if this entry is valid or not (1 - valid, 0 - invalid)
48-63 bits : upper 16 bits of the offset of the kernel function to be invoked when this gate is hit

The format of interrupt gates is the same as that of trap gates explained above, except for the value of the type field (bits 40-43): for trap gates it is 1111 and for interrupt gates it is 1110.

Note: whenever the interrupt gate is hit, interrupts are disabled automatically.
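Since the type field is the only difference between the two gates, a short hypothetical C sketch can classify an IDT entry by bits 40-43 and report whether entering through it clears IF; the enum and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum gate_kind { GATE_TASK, GATE_INTR32, GATE_TRAP32, GATE_OTHER };

/* Classify an IDT entry by its type field (bits 40-43). */
static enum gate_kind classify_gate(uint64_t desc)
{
    switch ((desc >> 40) & 0xF) {
    case 0x5: return GATE_TASK;     /* 0101 */
    case 0xE: return GATE_INTR32;   /* 1110 */
    case 0xF: return GATE_TRAP32;   /* 1111 */
    default:  return GATE_OTHER;
    }
}

/* True if entering through this gate clears EFLAGS.IF (interrupt gate);
 * a trap gate leaves IF as it was. */
static bool gate_clears_if(uint64_t desc)
{
    return classify_gate(desc) == GATE_INTR32;
}
```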

Hardware Checks for Interrupts and Exceptions

Whenever an exception or interrupt occurs, the corresponding trap/interrupt gate is hit and the CPU performs some checks using the fields of these gates. The CPU does the following:

1). get the ith entry from the IDT (the linear base address and size limit of the IDT are stored in the CPU's IDTR register); here 'i' is the interrupt (vector) number.

2). read the segment selector from bits 16-31 of the IDT entry; let's say it selects GDT entry 'n'

3). get the segment descriptor from the 'n'th entry in the GDT (the linear base address and size limit of the GDT are stored in the CPU's GDTR register)

4). the DPL of the nth entry in the GDT must be less than or equal to the CPL (Current Privilege Level, given by the lowest two bits of the CS register). If DPL > CPL, the CPU generates a general protection exception. We will see below what this check means and why it is done. Simply saying:

DPL (of GDT entry) <= CPL : ok
DPL (of GDT entry) > CPL : general protection exception

If DPL (of GDT entry) < CPL, we are entering a higher privilege level (typically from user mode to kernel mode). In this case the CPU switches the hardware stack (SS and ESP registers) from the currently running process's user mode stack to its kernel mode stack. We will see below how exactly this stack switch is done. Note: the stack switch is mentioned here, but it actually happens after step 5 below.

5). for software interrupts (generated by the assembly instructions 'int n', 'int3' and 'into'), one more check is done. This check is not performed for hardware interrupts (generated by system devices and forwarded by the PIC) or for exceptions detected by the CPU itself. Simply saying:

DPL (of IDT entry) >= CPL : ok, we have permission to enter through this gate
DPL < CPL : general protection exception

6). switch the stack if DPL (of GDT entry) < CPL. In addition, the privilege level of the CPU (the least significant two bits of CS) changes from the CPL to the DPL of the GDT entry.

7). if a stack switch has taken place (SS and ESP loaded with the kernel stack), push the old values of SS and ESP (pointing to the user stack) on this new stack (the kernel stack)

8). push the EFLAGS, CS and EIP registers on the stack (note: we are now working on the kernel stack). This saves the address of the user application instruction to which we need to return after servicing the interrupt or exception

9). for exceptions that have a hardware error code, the processor pushes that code on the kernel stack as well

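10). load CS with the segment selector from bits 16-31 of the IDT entry (the CPU caches the corresponding GDT descriptor) and EIP with the offset from the IDT entry (bits 0-15 + bits 48-63); for an interrupt gate, also clear the IF flag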
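Steps 1 to 3 boil down to simple address arithmetic. The C sketch below, with a hypothetical struct standing in for IDTR/GDTR, computes where the CPU finds the IDT entry and the GDT descriptor. It assumes the selector refers to the GDT (TI bit 0) and ignores paging and every limit check except the IDT one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of IDTR/GDTR: a linear base address plus a limit
 * (table size in bytes minus one). */
struct dt_reg {
    uint32_t base;
    uint16_t limit;
};

/* Step 1: address of the 8-byte IDT entry for 'vector'. Returns false if
 * the entry lies beyond the limit (the real CPU raises #GP in that case). */
static bool idt_entry_addr(struct dt_reg idtr, unsigned vector, uint32_t *addr)
{
    uint32_t off = (uint32_t)vector * 8u;
    if (off + 7u > idtr.limit)
        return false;
    *addr = idtr.base + off;
    return true;
}

/* Steps 2-3: bits 3-15 of a selector are the descriptor index; bit 2 (TI)
 * and bits 0-1 (RPL) are ignored here, so only GDT lookups are modelled. */
static uint32_t gdt_desc_addr(struct dt_reg gdtr, uint16_t selector)
{
    return gdtr.base + (uint32_t)(selector >> 3) * 8u;
}
```

For a full 256-entry IDT the limit is 256 * 8 - 1 = 2047, so vector 0x80 lives 0x400 bytes past the IDT base.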

All of the above is done by the CPU hardware without executing any software instruction. The checks performed in steps 4 and 5 above are important.

The check in step 4 makes sure that the code we are going to execute (the Interrupt Service Routine) is not in a segment with lower privilege than the current one; obviously the ISR cannot run at a lower privilege than the code it interrupts. DPL and CPL can take 4 values (0, 1, 2 for kernel mode and 3 for user mode). Of these four, only two are used: 0 (for kernel mode) and 3 (for user mode).

The check in step 5 makes sure that an application can enter kernel mode only through specific gates; in Linux this is mainly gate 128 (0x80), the system call entry. If the DPL field of an IDT entry is set to 0, 1 or 2, an application program (running with CPL 3) cannot enter through that gate with 'int n'; if it tries, the CPU generates a general protection exception. This is why in Linux the DPL fields of almost all IDT entries are initialized to 0, so that only kernel code, not application code, can invoke these gates as software interrupts. Apart from the system call entry, only a few exception vectors, such as vector 3 used by the 'int3' breakpoint instruction, get DPL 3. In Linux, entry 128 (used for system calls) is a trap gate with its DPL initialized to 3, so that application code can enter through it with the assembly instruction "int 0x80".
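The two checks can be summarized in a few lines of C. This is a hypothetical sketch of the decision only (steps 4 and 5), not of the CPU's exact internal order of checks; the enum and function names are made up.

```c
#include <stdbool.h>

enum entry_result { ENTRY_GP_FAULT, ENTRY_SAME_LEVEL, ENTRY_STACK_SWITCH };

/* cpl          current privilege level (low two bits of CS)
 * gate_dpl     DPL of the IDT entry
 * target_dpl   DPL of the code segment descriptor in the GDT
 * software_int true for 'int n' / 'int3' / 'into', false for device
 *              interrupts and CPU-detected exceptions */
static enum entry_result check_entry(unsigned cpl, unsigned gate_dpl,
                                     unsigned target_dpl, bool software_int)
{
    if (target_dpl > cpl)                  /* step 4: handler less privileged */
        return ENTRY_GP_FAULT;
    if (software_int && gate_dpl < cpl)    /* step 5: gate closed to caller */
        return ENTRY_GP_FAULT;
    return target_dpl < cpl ? ENTRY_STACK_SWITCH : ENTRY_SAME_LEVEL;
}
```

With the DPL values used by Linux, "int 0x80" from user mode (CPL 3, gate DPL 3, kernel code DPL 0) leads to a stack switch, while "int 14" from user mode fails the step 5 check; a device interrupt arriving in user mode skips step 5 and also switches stacks.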

Now let's see how the stack switch happens when DPL (of GDT entry) < CPL. The CPU has a TR (Task Register), which points to the TSS (Task State Segment) of the currently running task. The TSS is an architecture-defined data structure that holds the state of the processor registers when a hardware context switch of that task happens. The TSS includes three pairs of SS and ESP fields, one for each privilege level 0, 1 and 2. These fields specify the stack to be used whenever the CPU enters that privilege level. Let's say the DPL value in the GDT entry is 0: in this case the CPU loads the SS register with the SS field of the TSS for level 0, and the ESP register with the ESP field of the TSS for level 0. After loading SS and ESP with these values, the CPU is using the kernel-level stack of the current process. The old values of SS and ESP (which the CPU keeps temporarily in internal registers) are then pushed on this new kernel stack; this is needed because we must return to the old stack once we have serviced the interrupt, exception or system call. (Linux 2.4 and later does not use hardware task switching: there is one TSS per CPU, and the kernel updates its level-0 stack pointer on every context switch so that it always points to the kernel stack of the current process.) Careful readers may wonder why there is no field for a level-3 stack in the TSS. The reason is that the CPU's stack-switching mechanism is never used to switch from a higher privilege level (kernel mode: 0, 1 and 2) to a lower one (user mode: 3). That is why, when entering the higher level (kernel mode), the CPU saves the previously used lower-level (user mode) stack pointer on the kernel stack.
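The stack switch and the frame the CPU builds (steps 6 to 9) can be modelled as follows. This C sketch is hypothetical: the TSS is cut down to its level 0-2 stack fields, the stack is an array of 32-bit slots, and loading the new CS:EIP is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down TSS: only the stack fields for levels 0-2. */
struct toy_tss {
    uint32_t esp[3];   /* esp0, esp1, esp2 */
    uint16_t ss[3];    /* ss0, ss1, ss2 */
};

struct toy_regs {
    uint32_t eip, eflags, esp;
    uint16_t cs, ss;
};

/* The new stack as 32-bit slots; 'top' is the index of the last pushed
 * slot, starting one past the end (which corresponds to the TSS esp). */
struct toy_stack {
    uint32_t slot[16];
    size_t   top;
};

static void push32(struct toy_regs *r, struct toy_stack *s, uint32_t v)
{
    r->esp -= 4u;              /* the stack grows down in 4-byte steps */
    s->slot[--s->top] = v;
}

/* Steps 6-9 for an entry that raises the privilege level to new_cpl (0-2). */
static void enter_more_privileged(struct toy_regs *r, const struct toy_tss *tss,
                                  unsigned new_cpl, struct toy_stack *kstack,
                                  bool has_error_code, uint32_t error_code)
{
    uint16_t old_ss  = r->ss;  /* the CPU keeps these internally for now */
    uint32_t old_esp = r->esp;

    r->ss  = tss->ss[new_cpl]; /* step 6: switch to the stack named in the TSS */
    r->esp = tss->esp[new_cpl];
    kstack->top = sizeof kstack->slot / sizeof kstack->slot[0];

    push32(r, kstack, old_ss);    /* step 7: old SS and ESP */
    push32(r, kstack, old_esp);
    push32(r, kstack, r->eflags); /* step 8: EFLAGS, CS, EIP */
    push32(r, kstack, r->cs);
    push32(r, kstack, r->eip);
    if (has_error_code)           /* step 9: only for some exceptions */
        push32(r, kstack, error_code);
}
```

After the call, the slot at kstack->top holds the saved EIP (or the error code, if one was pushed), and the old user SS sits at the highest slot, which is the frame layout the kernel's entry code finds when it starts running.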

Once all this CPU action is done, the CPU's CS and EIP registers point to the kernel function written to handle the interrupt or exception, and the CPU simply starts executing the instructions there (we are now in kernel mode, level 0).

Linux Kernel support for Handling Interrupts - Details of do_IRQ() function, core of Interrupt Handling

As this is the software part of interrupt handling and may be of interest to a wider audience, I wrote it up on a separate page.
