KernelNewbies:

This section covers the internals of interrupt handling in the Linux kernel (all explanations relate to the i386 platform). It is under development and might be incomplete right now.

I will cover the following topics in this section, explaining the hardware as well as the software side: how interrupts are generated, routed, and then handled by the low-level code of the Linux kernel.

  1. Introduction
  2. Interrupt Routing
    1. Details of Programmable Interrupt Controller
  3. Details of Interrupt Descriptor Table
    1. Task Gates
    2. Trap Gates
    3. Interrupt Gates
  4. Hardware Checks for Interrupts and Exceptions
  5. Kernel Support for Handling Interrupts
  6. Low Level Interrupt Stubs - Details of do_IRQ() function, core of Interrupt Handling

1. Introduction

This section discusses interrupt handling from the hardware perspective of the CPU, the Linux kernel's interrupt routing subsystem, and the role of device drivers in interrupt handling.

The term interrupt is self-explanatory. Interrupts are signals sent to the CPU on its INTR line whenever a device wants the CPU's attention. As soon as the interrupt signal occurs, the CPU defers its current activity and services the interrupt by executing the interrupt handler corresponding to that interrupt number (also known as the IRQ number).

Interrupts can be classified as follows:

- Synchronous interrupts (also known as software interrupts)
- Asynchronous interrupts (also known as hardware interrupts)

The basic difference is that synchronous interrupts are generated by the CPU itself, either when its control unit detects an abnormal condition or when it executes special instructions such as 'int' or 'int3'. These are known as exceptions in Intel's terminology. Asynchronous interrupts, on the other hand, are generated by the outside world (devices connected to the CPU). Because they can occur at any point in time, they are called asynchronous interrupts.

It is important to note that both synchronous and asynchronous interrupts are handled by the CPU on completion of the instruction during which the interrupt occurs. A machine instruction does not execute in a single CPU cycle; it takes several cycles to complete. An interrupt that arrives in the middle of an instruction is not handled immediately; instead, the CPU checks for interrupts once the instruction completes.

2. Interrupt Routing

To handle interrupts, there are a few things we expect the CPU to do every time an interrupt occurs. Whenever an interrupt occurs, the CPU performs some hardware checks, which are needed to keep the system secure. Before explaining the hardware checks, we will look at how interrupts are routed from hardware devices to the CPU.

2a) Details of Programmable Interrupt Controller

On Intel architecture, system devices (device controllers) are connected to a special device known as the PIC (Programmable Interrupt Controller). The CPU has two lines for receiving interrupt signals: NMI and INTR. The NMI line receives non-maskable interrupts, that is, interrupts which cannot be masked (blocked) under any circumstances. These interrupts have the highest priority and are rarely used. The INTR line is the line on which all interrupts from system devices are received. These interrupts can be masked or blocked.

As all the interrupt signals need to be multiplexed onto a single CPU line, we need some mechanism through which interrupts from different device controllers can be routed to that line. This routing or multiplexing is done by the PIC. The PIC sits between the system devices and the CPU and has multiple input lines, each connected to a different device controller in the system. On the other side, the PIC has only one output line, which is connected to the CPU's INTR line and on which it sends signals to the CPU.

There are two PICs joined together: the output of the second PIC is connected to one input (IRQ2) of the first PIC. This setup allows a maximum of 15 input lines to which different system device controllers can be connected. The PIC has some programmable registers through which the CPU communicates with it (give commands, mask/unmask interrupt lines, read status). Each PIC has its own copy of the following registers:

Mask Register

Status Register

The mask register is used to mask or unmask a specific interrupt line. The CPU can ask the PIC to mask (block) a specific interrupt by setting the corresponding bit in the mask register; unmasking is done by clearing that bit. While a particular interrupt is masked, the PIC still receives interrupts on the corresponding input line but does not send the interrupt signal to the CPU, so the CPU keeps doing what it was doing. Masked interrupts are not lost: the PIC remembers them and sends the interrupt to the CPU when the CPU unmasks that interrupt line.

Masking is different from blocking all interrupts to the CPU. The CPU can ignore all interrupts arriving on the INTR line by clearing the IF (Interrupt Flag) bit in its EFLAGS register. While this bit is cleared, interrupts arriving on the INTR line are simply ignored by the CPU; we can think of this as blocking interrupts. So masking is done at the PIC level, where individual interrupt lines can be masked or unmasked, whereas blocking is done at the CPU level and applies to all interrupts coming to that CPU except the NMI (Non-Maskable Interrupt), which is received on the CPU's NMI line and cannot be blocked or ignored.

Nowadays the interrupt architecture is not as simple as described above. Modern machines use the APIC (Advanced Programmable Interrupt Controller) architecture, which can handle up to 256 interrupt vectors. It consists of an I/O APIC, to which devices are connected, and a local APIC built into every CPU. I won't go into the details of these right now.

Once the interrupt signal is received by the CPU, the CPU performs some hardware checks, for which no software machine instructions are executed. Before looking into what these checks are, we need to understand some architecture-specific data structures maintained by the kernel.
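The difference between masking at the PIC and blocking at the CPU can be sketched with a small model. This is only an illustration, not how a real 8259A is programmed: the struct `pic_model` and the functions `pic_raise`, `pic_mask`, `pic_unmask` and `cpu_poll` are hypothetical names, a single 8-input PIC is assumed, and the lowest-numbered line is assumed to have the highest priority.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one 8-input PIC. */
struct pic_model {
    uint8_t mask;     /* bit n set: line n is masked */
    uint8_t pending;  /* bit n set: line n was raised and not yet delivered */
};

/* A device raises line `irq`. The PIC latches it even if the line is masked. */
void pic_raise(struct pic_model *pic, unsigned irq)
{
    assert(irq < 8);
    pic->pending |= (uint8_t)(1u << irq);
}

/* Mask a line by setting its bit in the mask register. */
void pic_mask(struct pic_model *pic, unsigned irq)
{
    assert(irq < 8);
    pic->mask |= (uint8_t)(1u << irq);
}

/* Unmask a line by clearing its bit in the mask register. */
void pic_unmask(struct pic_model *pic, unsigned irq)
{
    assert(irq < 8);
    pic->mask &= (uint8_t)~(1u << irq);
}

/*
 * Returns the line the CPU would service now, or -1 if none.
 * `if_flag` models the IF bit in EFLAGS: when it is clear, the CPU
 * ignores INTR no matter what the PIC has pending.
 */
int cpu_poll(struct pic_model *pic, bool if_flag)
{
    uint8_t deliverable = pic->pending & (uint8_t)~pic->mask;

    if (!if_flag || deliverable == 0)
        return -1;

    for (unsigned irq = 0; irq < 8; irq++) {
        if (deliverable & (1u << irq)) {
            pic->pending &= (uint8_t)~(1u << irq);  /* delivered */
            return (int)irq;
        }
    }
    return -1;
}
```

In this model, an interrupt raised on a masked line stays in `pending` and is delivered by the first `cpu_poll` call made after the line is unmasked with IF set, which mirrors the "not lost, only deferred" behaviour described above.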

3. Details of Interrupt Descriptor Table (IDT)

The kernel needs to maintain one IDT (Interrupt Descriptor Table), which maps each interrupt number to its interrupt handler routine. This table has 256 entries, and each entry is 8 bytes long. The first 32 entries of this table are used for exceptions and the rest are used for hardware interrupts received from the outside world. The table can contain three different types of entries:

- Task Gates
- Trap Gates
- Interrupt Gates

Let's see what these gates are and where they are used.
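As a rough sketch of these sizes, one 8-byte entry can be written as a C struct. The field names and the helpers `is_exception_vector` and `idt_size_bytes` are illustrative assumptions, not the kernel's own definitions; the bit positions follow the trap and interrupt gate layout used in this section.

```c
#include <stddef.h>
#include <stdint.h>

#define IDT_ENTRIES     256
#define FIRST_HW_VECTOR 32   /* entries 0-31 are used for exceptions */

/* One 8-byte IDT entry (trap or interrupt gate); names are illustrative. */
struct idt_entry {
    uint16_t offset_low;   /* bits 0-15:  low half of the handler offset */
    uint16_t selector;     /* bits 16-31: code segment selector */
    uint8_t  reserved;     /* bits 32-39: reserved, zero */
    uint8_t  type_attr;    /* bits 40-47: type (40-43), DPL (45-46), present (47) */
    uint16_t offset_high;  /* bits 48-63: high half of the handler offset */
};

/* Fails to compile if the compiler pads the struct beyond 8 bytes. */
_Static_assert(sizeof(struct idt_entry) == 8, "an IDT entry is 8 bytes");

/* Nonzero if `vector` falls in the range used for exceptions. */
int is_exception_vector(unsigned vector)
{
    return vector < FIRST_HW_VECTOR;
}

/* Total size of a full table: 256 entries of 8 bytes each. */
size_t idt_size_bytes(void)
{
    return IDT_ENTRIES * sizeof(struct idt_entry);
}
```

With these assumptions, a full table occupies 256 * 8 = 2048 bytes.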

3a). Task Gates

The format of a task gate is as follows: [figure: task gate format]

Basically, task gates are used in the IDT to allow a user process to make a context switch to another process without asking the kernel to do it. As soon as this gate is hit (an interrupt is received on a line for which there is a task gate in the IDT), the CPU saves the context (the state of the processor registers) of the currently running process to the TSS of the current process, which is located through the TR (Task Register) of the CPU. After saving the context of the current process, the CPU loads its registers with the values stored in the TSS of the new process, whose segment selector is stored in bits 16-31 of the task gate. Once the registers hold these new values, the processor is running the new process and the context switch is done. Linux does not use task gates; it only uses trap and interrupt gates in the IDT, so I will not explain task gates any further.

3b). Trap Gates

The format of a trap gate is as follows: [figure: trap gate format]

Trap gates are basically used to handle exceptions generated by the CPU. Bits 0-15 and bits 48-63 together form the pointer (an offset in the segment identified by bits 16-31 of this entry) to a kernel function. The only difference between trap gates and interrupt gates is that whenever an interrupt gate is hit, the CPU automatically disables interrupts by clearing the IF flag in its EFLAGS register, whereas in the case of a trap gate this is not done and interrupts remain enabled. As mentioned earlier, trap gates are used for exceptions, so in the Linux kernel the first 32 entries in the IDT are initialized with trap gates. In addition to this, the Linux kernel also uses a trap gate for the system call entry (entry 128 of the IDT).

3c). Interrupt Gates

The format of an interrupt gate is as follows: [figure: interrupt gate format]

The format of interrupt gates is the same as that of the trap gates explained above, except for the value of the type field (bits 40-43). For trap gates this field has the value 1111, and for interrupt gates it is 1110.

Note: whenever the interrupt gate is hit, interrupts are disabled automatically.
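The bit fields described for trap and interrupt gates can be unpacked from a raw 64-bit entry as sketched below. The helper names (`make_gate`, `gate_offset`, `gate_selector`, `gate_type`, `gate_dpl`, `if_after_gate`) are made up for illustration, and the position of the DPL (bits 45-46) and the present bit (bit 47) is taken from the i386 gate layout rather than from the text above.

```c
#include <stdint.h>

#define GATE_TYPE_INTERRUPT 0xEu  /* 1110 */
#define GATE_TYPE_TRAP      0xFu  /* 1111 */

/* Build a raw 8-byte gate from its fields (illustrative helper). */
uint64_t make_gate(uint32_t offset, uint16_t selector,
                   unsigned type, unsigned dpl)
{
    uint64_t gate = 0;

    gate |= (uint64_t)(offset & 0xFFFFu);      /* bits 0-15  */
    gate |= (uint64_t)selector << 16;          /* bits 16-31 */
    gate |= (uint64_t)(type & 0xFu) << 40;     /* bits 40-43 */
    gate |= (uint64_t)(dpl & 0x3u) << 45;      /* bits 45-46 */
    gate |= (uint64_t)1 << 47;                 /* present bit */
    gate |= (uint64_t)(offset >> 16) << 48;    /* bits 48-63 */
    return gate;
}

/* Handler offset: bits 0-15 are the low half, bits 48-63 the high half. */
uint32_t gate_offset(uint64_t gate)
{
    return (uint32_t)(gate & 0xFFFFu) | ((uint32_t)(gate >> 48) << 16);
}

/* Segment selector: bits 16-31. */
uint16_t gate_selector(uint64_t gate)
{
    return (uint16_t)(gate >> 16);
}

/* Type field: bits 40-43. */
unsigned gate_type(uint64_t gate)
{
    return (unsigned)(gate >> 40) & 0xFu;
}

/* Descriptor privilege level: bits 45-46. */
unsigned gate_dpl(uint64_t gate)
{
    return (unsigned)(gate >> 45) & 0x3u;
}

/* Value of IF after entering through the gate: only interrupt gates clear it. */
int if_after_gate(uint64_t gate, int if_before)
{
    return gate_type(gate) == GATE_TYPE_INTERRUPT ? 0 : if_before;
}
```

For example, a gate built with `make_gate(0xC0101234u, 0x60, GATE_TYPE_INTERRUPT, 0)` decodes back to the same offset and selector, and `if_after_gate` reports that interrupts are disabled on entry. The offset and selector values here are arbitrary sample numbers.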

4. Hardware Checks for Interrupts and Exceptions

Whenever an exception or interrupt occurs, the corresponding trap/interrupt gate is hit and the CPU performs some checks using the fields of that gate. The CPU does the following:

1). Gets the ith entry from the IDT (the base address, which is a linear address, and the size of the IDT are stored in the IDTR register of the CPU); here 'i' is the interrupt number.

2). Reads the segment selector from bits 16-31 of the IDT entry; let us call the descriptor index it contains 'n'.

3). Gets the segment descriptor from the 'n'th entry in the GDT (the base address, which is a linear address, and the size of the GDT are stored in the GDTR register of the CPU).

4). The DPL of the nth entry in the GDT must be less than or equal to the CPL (Current Privilege Level, held in the read-only lowest two bits of the CS register). If DPL > CPL, the CPU generates a general protection exception. We will see below what this check means and why it is done.

If DPL (of GDT entry) < CPL, we are entering a higher privilege level (typically from user mode to kernel mode). In this case the CPU switches the hardware stack (SS and ESP registers) from the running process's user mode stack to its kernel mode stack. We will see below exactly how this stack switch is done. Note: the stack switch is mentioned here, but it actually happens after the 5th step below.

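5). For software interrupts (generated by the assembly instruction 'int'), one more check is done: the DPL of the IDT entry must be greater than or equal to the CPL, otherwise the CPU generates a general protection exception. This check is not performed for hardware interrupts (interrupts generated by system devices and forwarded by the PIC).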

6). Switches the stack if DPL (of GDT entry) < CPL. In addition, the mode of the CPU (the least significant two bits of CS) is changed from CPL to DPL (of GDT entry).

7). If the stack switch has taken place (SS and ESP registers reset to the kernel stack), pushes the old values of SS and ESP (pointing to the user stack) onto this new stack (the kernel stack).

8). Pushes the EFLAGS, CS and EIP registers onto the stack (note: we are now working on the kernel stack). This saves the pointer to the user application instruction to which we need to return after servicing the interrupt or exception.

9). In the case of exceptions, if there is a hardware error code, the processor pushes that onto the kernel stack as well.

10). Loads CS with the segment selector from the IDT entry (bits 16-31) and EIP with the offset from the IDT entry (bits 0-15 and bits 48-63).

All of the above is done by the CPU hardware without executing any software instruction. The checks performed in the 4th and 5th steps above are important.

The 4th check makes sure that the code we are going to execute (the Interrupt Service Routine) does not fall in a segment with lesser privilege. Obviously the ISR cannot be in a less privileged segment than the one we are in. DPL and CPL can have 4 values (0, 1, 2 for kernel mode and 3 for user mode). Out of these four only two are used, that is 0 (for kernel mode) and 3 (for user mode).

The 5th check makes sure that an application can enter kernel mode through specific gates only; in Linux, only through the 128th gate entry, which is for system call invocation. If we set the DPL field of an IDT entry to 0, 1 or 2, an application program (running with CPL 3) cannot enter through that gate entry. If it tries, the CPU generates a general protection exception. This is the reason that in Linux the DPL fields of all IDT entries (except the 128th entry, used for system calls) are initialized to '0'; this makes sure that only kernel code can access these gates, not application code. In Linux the 128th entry (used for system calls) is of trap gate type and its DPL value is initialized to 3, so that application code can enter through this gate with the assembly instruction "int 0x80".

Now let's see how the stack switch happens when DPL (of GDT entry) < CPL. The CPU has a TR (Task Register), which points to the TSS (Task State Segment) of the currently running process. The TSS is an architecture-defined data structure which holds the state of the processor registers whenever a context switch of this process happens. The TSS includes three sets of SS and ESP fields, one for each privileged level of the processor (0, 1 and 2). These fields specify the stack to be used whenever we enter that processor level.

Let's say the DPL value in the GDT entry is 0. In this case, the CPU loads the SS register with the value of the SS field for level 0 in the TSS, and the ESP register with the value of the ESP field for level 0 in the TSS. After loading SS and ESP with these values, the CPU is pointing to the new kernel level stack of the current process. The old values of SS and ESP (which the CPU holds internally in the meantime) are now pushed onto this new kernel level stack; this is done because we need to return to the old stack once we have serviced the interrupt, exception or system call.

Attentive readers may wonder why there is no field for the level 3 stack in the TSS. The reason is that we never use the CPU's stack switching mechanism to switch from a higher CPU level (kernel mode - 0, 1 and 2) to a lower CPU level (user mode - 3). This is why the CPU, while entering the higher level (kernel mode), saves the previously used lower level stack (user mode) on the kernel stack.

Once all this CPU action is done, the CPU's CS and EIP registers point to the kernel function written for handling the interrupt or exception. The CPU simply starts executing the instructions at this point (now we are in kernel mode - level 0).
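The 4th and 5th checks can be summarised as a small decision function. This is a sketch of the logic described above, not CPU microcode or kernel code; the enum `gate_outcome` and the function `check_gate` are hypothetical names, and the privilege levels are passed in as plain integers.

```c
#include <assert.h>
#include <stdbool.h>

enum gate_outcome {
    GATE_SAME_LEVEL,    /* handler runs at the current level, no stack switch */
    GATE_STACK_SWITCH,  /* handler runs at a more privileged level */
    GATE_GP_FAULT       /* general protection exception */
};

/*
 * cpl          current privilege level (low two bits of CS)
 * gate_dpl     DPL field of the IDT entry
 * segment_dpl  DPL of the GDT entry the gate points to
 * software_int true for 'int n' instructions, false for hardware interrupts
 */
enum gate_outcome check_gate(unsigned cpl, unsigned gate_dpl,
                             unsigned segment_dpl, bool software_int)
{
    assert(cpl <= 3 && gate_dpl <= 3 && segment_dpl <= 3);

    /* 4th check: the handler's segment must not be less privileged. */
    if (segment_dpl > cpl)
        return GATE_GP_FAULT;

    /* 5th check: only for software interrupts, the gate must be reachable. */
    if (software_int && gate_dpl < cpl)
        return GATE_GP_FAULT;

    /* Entering a more privileged level implies a stack switch. */
    return segment_dpl < cpl ? GATE_STACK_SWITCH : GATE_SAME_LEVEL;
}
```

Under this model, `check_gate(3, 3, 0, true)` (user code issuing "int 0x80" through a DPL 3 gate) yields `GATE_STACK_SWITCH`, while `check_gate(3, 0, 0, true)` (user code issuing 'int' on a DPL 0 gate) yields `GATE_GP_FAULT`. A hardware interrupt arriving in user mode, `check_gate(3, 0, 0, false)`, skips the 5th check and is allowed.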
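The stack switch and the values pushed by the CPU can be traced with a toy machine model. Everything here is an assumption made for illustration: `struct tss_model` keeps only the three SS/ESP pairs and does not match the real TSS layout, memory is a small array addressed through ESP, error codes are left out, and the selector values used in the example are arbitrary sample numbers.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 256  /* toy memory: 1024 bytes */

/* Simplified TSS: only the stack pointers for levels 0, 1 and 2. */
struct tss_model {
    uint16_t ss[3];
    uint32_t esp[3];
};

struct machine {
    uint32_t mem[MEM_WORDS];  /* toy memory, byte-addressed through esp */
    uint16_t cs, ss;
    uint32_t eip, esp, eflags;
    struct tss_model tss;
};

/* Push one 32-bit value: the stack grows towards lower addresses. */
static void push32(struct machine *m, uint32_t value)
{
    assert(m->esp >= 4 && m->esp <= MEM_WORDS * 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = value;
}

/* Models steps 6, 7, 8 and 10 for a gate whose target segment has `target_dpl`. */
void enter_gate(struct machine *m, unsigned target_dpl,
                uint16_t new_cs, uint32_t handler)
{
    unsigned cpl = m->cs & 3u;

    assert(target_dpl <= cpl);  /* the privilege checks are assumed to have passed */

    if (target_dpl < cpl) {
        /* The CPU keeps the old stack pointer internally during the switch. */
        uint16_t old_ss = m->ss;
        uint32_t old_esp = m->esp;

        m->ss = m->tss.ss[target_dpl];
        m->esp = m->tss.esp[target_dpl];
        push32(m, old_ss);
        push32(m, old_esp);
    }

    push32(m, m->eflags);
    push32(m, m->cs);
    push32(m, m->eip);

    /* The low two bits of CS become the new privilege level. */
    m->cs = (uint16_t)((new_cs & ~3u) | target_dpl);
    m->eip = handler;
}
```

Starting in user mode with ESP = 512 and a level 0 stack at ESP = 1024 in the TSS, a call to `enter_gate(&m, 0, 0x60, 0xC0100000u)` leaves five words on the kernel stack (old SS, old ESP, EFLAGS, CS, EIP, from higher to lower addresses) and ESP at 1024 - 20 = 1004.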

5. Kernel Support for Handling Interrupts

As this is the software part of interrupt handling, I wrote it on a separate page; please find it [:Software Handling of Interrupts:here].

KernelNewbies: KernelHacking-HOWTO/Overview_of_the_Kernel_Source_Code/Internals_of_Interrupt_Handling (last edited 2006-09-18 16:40:05 by nnc1)