Parent Node : Exceptions and Interrupts Handling
In this section, I will walk through the kernel code executed in interrupt context. I will be referring to the code as of the 2.4.18 release of the kernel.
Low Level Interrupt Stubs
Whenever an interrupt occurs, the CPU performs some hardware checks and starts executing the following assembly instructions in the kernel, whose address (offset in the kernel code segment) is stored in the corresponding IDT entry.
File: include/asm-i386/hw_irq.h
    #define BUILD_COMMON_IRQ() \
    asmlinkage void call_do_IRQ(void); \
    __asm__( \
        "\n" __ALIGN_STR"\n" \
        "common_interrupt:\n\t" \
        SAVE_ALL \
        SYMBOL_NAME_STR(call_do_IRQ)":\n\t" \
        "call " SYMBOL_NAME_STR(do_IRQ) "\n\t" \
        "jmp ret_from_intr\n");

    ...

    #define BUILD_IRQ(nr) \
    asmlinkage void IRQ_NAME(nr); \
    __asm__( \
    "\n"__ALIGN_STR"\n" \
    SYMBOL_NAME_STR(IRQ) #nr "_interrupt:\n\t" \
        "pushl $"#nr"-256\n\t" \
        "jmp common_interrupt");
These macros are expanded when the kernel is compiled and generate the lowest-level interrupt stubs. At initialization time the kernel stores the addresses (offsets) of these stubs in the IDT gates, so that the CPU can enter them through the IDT. The kernel keeps one global array of function pointers, named "interrupt", in which it stores the addresses of these stubs. The code that creates the stubs (using the BUILD_IRQ macro shown above) and the code that stores their addresses in the global array "interrupt[NR_IRQS]" are both in the file "arch/i386/kernel/i8259.c". In this file the BUILD_IRQ macro is used to create the interrupt stubs as follows:
File: arch/i386/kernel/i8259.c
    #define BI(x,y) \
        BUILD_IRQ(x##y)

    #define BUILD_16_IRQS(x) \
        BI(x,0) BI(x,1) BI(x,2) BI(x,3) \
        BI(x,4) BI(x,5) BI(x,6) BI(x,7) \
        BI(x,8) BI(x,9) BI(x,a) BI(x,b) \
        BI(x,c) BI(x,d) BI(x,e) BI(x,f)

    /*
     * ISA PIC or low IO-APIC triggered (INTA-cycle or APIC) interrupts:
     * (these are usually mapped to vectors 0x20-0x2f)
     */
    BUILD_16_IRQS(0x0)

    #ifdef CONFIG_X86_IO_APIC
    /*
     * The IO-APIC gives us many more interrupt sources. Most of these
     * are unused but an SMP system is supposed to have enough memory ...
     * sometimes (mostly wrt. hw bugs) we get corrupted vectors all
     * across the spectrum, so we really want to be prepared to get all
     * of these. Plus, more powerful systems might have more than 64
     * IO-APIC registers.
     *
     * (these are usually mapped into the 0x30-0xff vector range)
     */
    BUILD_16_IRQS(0x1) BUILD_16_IRQS(0x2) BUILD_16_IRQS(0x3)
    BUILD_16_IRQS(0x4) BUILD_16_IRQS(0x5) BUILD_16_IRQS(0x6) BUILD_16_IRQS(0x7)
    BUILD_16_IRQS(0x8) BUILD_16_IRQS(0x9) BUILD_16_IRQS(0xa) BUILD_16_IRQS(0xb)
    BUILD_16_IRQS(0xc) BUILD_16_IRQS(0xd)
    #endif

    #undef BUILD_16_IRQS
    #undef BI
The code above only creates the interrupt stubs; it does not place their addresses in the interrupt[NR_IRQS] array. The code that places the addresses of these stubs in the global array is shown below and is in the same file, "arch/i386/kernel/i8259.c":
File: arch/i386/kernel/i8259.c
    #define IRQ(x,y) \
        IRQ##x##y##_interrupt

    #define IRQLIST_16(x) \
        IRQ(x,0), IRQ(x,1), IRQ(x,2), IRQ(x,3), \
        IRQ(x,4), IRQ(x,5), IRQ(x,6), IRQ(x,7), \
        IRQ(x,8), IRQ(x,9), IRQ(x,a), IRQ(x,b), \
        IRQ(x,c), IRQ(x,d), IRQ(x,e), IRQ(x,f)

    void (*interrupt[NR_IRQS])(void) = {
        IRQLIST_16(0x0),

    #ifdef CONFIG_X86_IO_APIC
        IRQLIST_16(0x1), IRQLIST_16(0x2), IRQLIST_16(0x3),
        IRQLIST_16(0x4), IRQLIST_16(0x5), IRQLIST_16(0x6), IRQLIST_16(0x7),
        IRQLIST_16(0x8), IRQLIST_16(0x9), IRQLIST_16(0xa), IRQLIST_16(0xb),
        IRQLIST_16(0xc), IRQLIST_16(0xd)
    #endif
    };

    #undef IRQ
    #undef IRQLIST_16
The code above fills the global array of function pointers, interrupt[NR_IRQS]. Once this array holds the addresses of the interrupt stubs, the function init_IRQ() uses it to initialize the IDT (Interrupt Descriptor Table) as follows:
File: arch/i386/kernel/i8259.c, Function: init_IRQ()
    for (i = 0; i < NR_IRQS; i++) {
        int vector = FIRST_EXTERNAL_VECTOR + i;
        if (vector != SYSCALL_VECTOR)
            set_intr_gate(vector, interrupt[i]);
    }
The loop above walks over the IDT entries starting at FIRST_EXTERNAL_VECTOR (32, because the first 32 entries are reserved for exceptions) and calls set_intr_gate(), which actually sets up the interrupt gate descriptor. Entry 128 (SYSCALL_VECTOR), which is used for system call invocation, does not get an interrupt gate; a trap gate is set for it instead, in the function trap_init(). Later in the same function init_IRQ(), after this loop, the IPIs (Inter-Processor Interrupts) are initialized. These interrupts are sent from one CPU to another on SMP machines.
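To see how the token-pasting macros, the interrupt[] table and the init_IRQ() loop fit together, here is a user-space sketch in C. It is not kernel code: this BUILD_IRQ is a hypothetical stand-in that emits an ordinary C function recording its IRQ number instead of an assembly stub, idt[] is a plain array standing in for the real IDT, and fill_idt() is a hypothetical helper that mimics the loop above. FIRST_EXTERNAL_VECTOR (0x20) and SYSCALL_VECTOR (0x80) are redefined locally, and NR_IRQS is assumed to be 224 (14 groups of 16, as in the IO-APIC configuration).

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the kernel constants (NR_IRQS is an assumption). */
#define FIRST_EXTERNAL_VECTOR 0x20
#define SYSCALL_VECTOR        0x80
#define NR_IRQS               224

static int last_irq = -1;   /* set by whichever generated stub runs */

/* Hypothetical BUILD_IRQ: generates a C function instead of an asm stub. */
#define BUILD_IRQ(nr) \
    static void IRQ##nr##_interrupt(void) { last_irq = nr; }

#define BI(x,y) BUILD_IRQ(x##y)
#define BUILD_16_IRQS(x) \
    BI(x,0) BI(x,1) BI(x,2) BI(x,3) BI(x,4) BI(x,5) BI(x,6) BI(x,7) \
    BI(x,8) BI(x,9) BI(x,a) BI(x,b) BI(x,c) BI(x,d) BI(x,e) BI(x,f)

/* Generates IRQ0x00_interrupt ... IRQ0xdf_interrupt. */
BUILD_16_IRQS(0x0) BUILD_16_IRQS(0x1) BUILD_16_IRQS(0x2) BUILD_16_IRQS(0x3)
BUILD_16_IRQS(0x4) BUILD_16_IRQS(0x5) BUILD_16_IRQS(0x6) BUILD_16_IRQS(0x7)
BUILD_16_IRQS(0x8) BUILD_16_IRQS(0x9) BUILD_16_IRQS(0xa) BUILD_16_IRQS(0xb)
BUILD_16_IRQS(0xc) BUILD_16_IRQS(0xd)

#define IRQ(x,y) IRQ##x##y##_interrupt
#define IRQLIST_16(x) \
    IRQ(x,0), IRQ(x,1), IRQ(x,2), IRQ(x,3), IRQ(x,4), IRQ(x,5), IRQ(x,6), IRQ(x,7), \
    IRQ(x,8), IRQ(x,9), IRQ(x,a), IRQ(x,b), IRQ(x,c), IRQ(x,d), IRQ(x,e), IRQ(x,f)

/* Same shape as the kernel's interrupt[] table: index i -> stub for IRQ i. */
static void (*interrupt[NR_IRQS])(void) = {
    IRQLIST_16(0x0), IRQLIST_16(0x1), IRQLIST_16(0x2), IRQLIST_16(0x3),
    IRQLIST_16(0x4), IRQLIST_16(0x5), IRQLIST_16(0x6), IRQLIST_16(0x7),
    IRQLIST_16(0x8), IRQLIST_16(0x9), IRQLIST_16(0xa), IRQLIST_16(0xb),
    IRQLIST_16(0xc), IRQLIST_16(0xd)
};

/* Stand-in IDT: one handler pointer per vector 0..255. */
static void (*idt[256])(void);

/* Mimics the init_IRQ() loop: install stubs, but skip the system-call vector. */
static void fill_idt(void)
{
    for (int i = 0; i < NR_IRQS; i++) {
        int vector = FIRST_EXTERNAL_VECTOR + i;
        if (vector != SYSCALL_VECTOR)
            idt[vector] = interrupt[i];
    }
}
```

After fill_idt(), calling idt[0x21]() runs the stub for IRQ 1. Because SYSCALL_VECTOR is 0x80, the slot for IRQ 0x60 is never installed, so vector 0x80 keeps whatever was there before (NULL in this sketch; the trap gate from trap_init() in the kernel).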
Once these IDT entries are set, whenever an interrupt occurs the CPU jumps directly to the code generated by the BUILD_IRQ macro. Let's analyse what this macro does. Here is the code of the BUILD_IRQ macro:
File: include/asm-i386/hw_irq.h
    #define BUILD_IRQ(nr) \
    asmlinkage void IRQ_NAME(nr); \
    __asm__( \
    "\n"__ALIGN_STR"\n" \
    SYMBOL_NAME_STR(IRQ) #nr "_interrupt:\n\t" \
        "pushl $"#nr"-256\n\t" \
        "jmp common_interrupt");
This stub first pushes the value nr-256 (the IRQ number minus 256, which is always negative) onto the kernel stack. This slot becomes the orig_eax field of struct pt_regs, from whose low 8 bits do_IRQ() later recovers the IRQ number. The stub then jumps to the "common_interrupt" label, which saves the context of the interrupted process (the CPU registers) on the kernel stack with SAVE_ALL and then calls the C function do_IRQ().
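The arithmetic the stub and do_IRQ() rely on can be checked in isolation. The sketch below uses two hypothetical helpers: one computes the value pushed by "pushl $nr-256", and the other recovers the IRQ number the way line 575 of do_IRQ() does, with "& 0xff". It assumes a 32-bit two's-complement value, as on i386. Our understanding is that keeping the value negative lets other entry code tell an interrupt frame apart from a system call, whose orig_eax holds a non-negative system call number; treat that as background rather than something shown in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Value the BUILD_IRQ stub pushes: "pushl $nr-256" (negative for nr 0..255). */
static int32_t stub_pushed_value(int nr)
{
    return (int32_t)(nr - 256);
}

/* What do_IRQ() does at line 575: keep only the low 8 bits of orig_eax. */
static int irq_from_orig_eax(int32_t orig_eax)
{
    return (int)((uint32_t)orig_eax & 0xffu);
}
```

For example, IRQ 10 is pushed as -246, which is 0xffffff0a as a 32-bit pattern, and masking with 0xff gives back 10.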
Details of the do_IRQ() Function, the Core of Interrupt Handling
do_IRQ() is the function common to all hardware interrupts, and it is the most important one to understand from the perspective of interrupt handling. I will first show the code of the whole function and then explain it line by line in the following paragraphs, with line references.
File: arch/i386/kernel/irq.c
    563 asmlinkage unsigned int do_IRQ(struct pt_regs regs)
    564 {
    565     /*
    566      * We ack quickly, we don't want the irq controller
    567      * thinking we're snobs just because some other CPU has
    568      * disabled global interrupts (we have already done the
    569      * INT_ACK cycles, it's too late to try to pretend to the
    570      * controller that we aren't taking the interrupt).
    571      *
    572      * 0 return value means that this irq is already being
    573      * handled by some other CPU. (or is disabled)
    574      */
    575     int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_code */
    576     int cpu = smp_processor_id();
    577     irq_desc_t *desc = irq_desc + irq;
    578     struct irqaction * action;
    579     unsigned int status;
    580
    581     kstat.irqs[cpu][irq]++;
    582     spin_lock(&desc->lock);
    583     desc->handler->ack(irq);
    584     /*
    585        REPLAY is when Linux resends an IRQ that was dropped earlier
    586        WAITING is used by probe to mark irqs that are being tested
    587        */
    588     status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
    589     status |= IRQ_PENDING; /* we _want_ to handle it */
    590
    591     /*
    592      * If the IRQ is disabled for whatever reason, we cannot
    593      * use the action we have.
    594      */
    595     action = NULL;
    596     if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
    597         action = desc->action;
    598         status &= ~IRQ_PENDING; /* we commit to handling */
    599         status |= IRQ_INPROGRESS; /* we are handling it */
    600     }
    601     desc->status = status;
    602
    603     /*
    604      * If there is no IRQ handler or it was disabled, exit early.
    605        Since we set PENDING, if another processor is handling
    606        a different instance of this same irq, the other processor
    607        will take care of it.
    608      */
    609     if (!action)
    610         goto out;
    611
    612     /*
    613      * Edge triggered interrupts need to remember
    614      * pending events.
    615      * This applies to any hw interrupts that allow a second
    616      * instance of the same irq to arrive while we are in do_IRQ
    617      * or in the handler. But the code here only handles the _second_
    618      * instance of the irq, not the third or fourth. So it is mostly
    619      * useful for irq hardware that does not mask cleanly in an
    620      * SMP environment.
    621      */
    622     for (;;) {
    623         spin_unlock(&desc->lock);
    624         handle_IRQ_event(irq, &regs, action);
    625         spin_lock(&desc->lock);
    626
    627         if (!(desc->status & IRQ_PENDING))
    628             break;
    629         desc->status &= ~IRQ_PENDING;
    630     }
    631     desc->status &= ~IRQ_INPROGRESS;
    632 out:
    633     /*
    634      * The ->end() handler has to deal with interrupts which got
    635      * disabled while the handler was running.
    636      */
    637     desc->handler->end(irq);
    638     spin_unlock(&desc->lock);
    639
    640     if (softirq_pending(cpu))
    641         do_softirq();
    642     return 1;
    643 }
Here is a detailed, line-by-line explanation of the do_IRQ() function.
Line - 575 to 577: Get the number of the interrupt that was triggered. It was pushed onto the kernel stack before the context of the interrupted process was saved. Next, get the ID of the CPU on which this code is executing, in other words the CPU handling this interrupt. Finally, get a pointer to the IRQ descriptor. The IRQ descriptor is a kernel data structure that binds together the different ISRs (Interrupt Service Routines) registered by device drivers for the same IRQ line. As mentioned earlier, the same IRQ line can be shared between different devices, so their device drivers must register their own ISRs to handle the interrupts generated by these devices. The IRQ descriptor data structure is defined as follows:
    typedef struct {
        unsigned int status;
        hw_irq_controller *handler;
        struct irqaction *action;
        unsigned int depth;
        spinlock_t lock;
    } ____cacheline_aligned irq_desc_t;
The fields of this structure have the following significance:
- status : A bit mask of flags describing the state of a particular IRQ line. We will see the use of the different flags later in this article.
- handler : A pointer to a structure whose elements are pointers to functions that operate the physical PIC (Programmable Interrupt Controller). These functions are used to mask or unmask a particular interrupt line in the PIC, or to acknowledge the interrupt to the PIC. The definitions of these PIC-related functions for the 8259A can be found in the file "arch/i386/kernel/i8259.c".
- action : A pointer to the list of ISRs registered by different device drivers for this IRQ line. When a device driver registers its ISR with the kernel using the kernel function request_irq(), the ISR is added to this list for that particular IRQ line.
- depth : A count of nested disables of this IRQ line. disable_irq() increments it and enable_irq() decrements it, and the line is enabled again only when the count drops back to zero.
- lock : A spinlock that serializes access to the fields of the IRQ descriptor. Any kernel execution context that accesses the IRQ descriptor must acquire this spinlock first.
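To make the relationship between the descriptor and its action list concrete, here is a simplified user-space model. The structures are trimmed stand-ins, not the kernel definitions: the real ISR signature also receives a struct pt_regs pointer, and the real struct irqaction has more fields. add_action() is a hypothetical helper that appends to the tail of the list, which is our understanding of how a shared handler gets linked in 2.4; run_actions() plays the role of handle_IRQ_event() without its flag handling, and demo_isr() is a hypothetical ISR used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed stand-in for struct irqaction (the real handler also gets pt_regs). */
struct irqaction {
    void (*handler)(int irq, void *dev_id);
    void *dev_id;              /* identifies the device to its own ISR */
    const char *name;
    struct irqaction *next;    /* next ISR sharing this IRQ line */
};

/* Trimmed stand-in for irq_desc_t (no PIC handler or lock in this model). */
typedef struct {
    unsigned int status;
    struct irqaction *action;  /* head of the ISR list */
    unsigned int depth;
} irq_desc_t;

/* Hypothetical registration helper: append the new ISR at the list tail. */
static void add_action(irq_desc_t *desc, struct irqaction *new_action)
{
    struct irqaction **p = &desc->action;

    assert(new_action != NULL && new_action->handler != NULL);
    while (*p != NULL)
        p = &(*p)->next;
    new_action->next = NULL;
    *p = new_action;
}

/* Calls every ISR on the line in list order, like handle_IRQ_event(). */
static void run_actions(int irq, irq_desc_t *desc)
{
    for (struct irqaction *a = desc->action; a != NULL; a = a->next)
        a->handler(irq, a->dev_id);
}

/* Demo ISR: appends the one-character device id it was registered with. */
static char call_log[16];
static size_t call_len;

static void demo_isr(int irq, void *dev_id)
{
    (void)irq;
    if (call_len + 1 < sizeof call_log) {
        call_log[call_len++] = *(char *)dev_id;
        call_log[call_len] = '\0';
    }
}
```

Registering two actions for device ids 'a' and 'b' and then calling run_actions() produces the log "ab". Since every ISR on a shared line is called, each real ISR has to check whether its own device raised the interrupt.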
Line - 581 to 583: Increment the count of interrupts received by this CPU on this line; this count is kept for accounting purposes. Acquire the spinlock before accessing any field of the IRQ descriptor for our interrupt line. Then acknowledge the interrupt to the PIC using the ack() function of the descriptor's handler; for the 8259A this also masks the line.
Line - 588 to 589: Clear the IRQ_REPLAY and IRQ_WAITING flags in the status field of the IRQ descriptor. As mentioned earlier, this field records the state of an interrupt line. We clear these flags because the interrupt has now actually arrived and we are about to handle it, so the line is no longer in the replay or waiting state. The IRQ_WAITING flag is used together with the IRQ_AUTODETECT flag when a device driver auto-detects the IRQ line its device is connected to. The driver calls probe_irq_on(), which sets the IRQ_AUTODETECT and IRQ_WAITING flags on all IRQ descriptors for which no ISR has been registered yet. After calling probe_irq_on(), the driver instructs the device to trigger an interrupt and then calls probe_irq_off(). probe_irq_off() looks for the IRQ descriptors whose IRQ_AUTODETECT flag is still set but whose IRQ_WAITING flag has been cleared (here, by do_IRQ()), and returns that IRQ line number to the device driver.
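The auto-detection handshake just described can be modelled with one status word per line. In this sketch probe_on(), irq_arrives() and probe_off() are hypothetical stand-ins for probe_irq_on(), the flag clearing at line 588 of do_IRQ(), and probe_irq_off(). The flag values and the 16-line table are illustrative, and the real functions also unmask lines, wait for spurious interrupts, and take or return a mask, all of which is omitted. In this sketch probe_off() returns the first line that fired, negated if several fired, and 0 if none did.

```c
#include <assert.h>

#define NR_LINES       16      /* illustrative table size */
#define IRQ_AUTODETECT 0x10    /* illustrative flag values */
#define IRQ_WAITING    0x20

static unsigned int status[NR_LINES];
static int has_isr[NR_LINES];  /* 1 if a driver already registered an ISR */

/* Like probe_irq_on(): arm every line that has no ISR yet. */
static void probe_on(void)
{
    for (int i = 0; i < NR_LINES; i++)
        if (!has_isr[i])
            status[i] |= IRQ_AUTODETECT | IRQ_WAITING;
}

/* Like line 588 of do_IRQ(): an arriving interrupt clears IRQ_WAITING. */
static void irq_arrives(int irq)
{
    assert(irq >= 0 && irq < NR_LINES);
    status[irq] &= ~IRQ_WAITING;
}

/* Like probe_irq_off(): find armed lines whose IRQ_WAITING was cleared. */
static int probe_off(void)
{
    int found = 0, count = 0;

    for (int i = 0; i < NR_LINES; i++) {
        if (!(status[i] & IRQ_AUTODETECT))
            continue;
        if (!(status[i] & IRQ_WAITING)) {
            if (count == 0)
                found = i;     /* remember the first line that fired */
            count++;
        }
        status[i] &= ~(IRQ_AUTODETECT | IRQ_WAITING);  /* end the probe */
    }
    return count > 1 ? -found : found;
}
```

A driver-side sequence in this model is probe_on(), make the device interrupt (here, irq_arrives()), then probe_off(); a line that already has an ISR is never armed.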
After clearing the IRQ_REPLAY and IRQ_WAITING flags, do_IRQ() sets the IRQ_PENDING flag. This indicates that we intend to handle this interrupt, provided it is not disabled and is not already being handled by another CPU (on SMP machines). The reason for setting IRQ_PENDING is explained in detail in the next few paragraphs.
At this point we have seen the interrupt and want to handle it by calling the set of ISRs (Interrupt Service Routines) registered by the device drivers. We set IRQ_PENDING because seeing an interrupt does not guarantee that we will handle it. The IRQ_PENDING flag helps in the following two cases:
- If the interrupt is disabled (flag IRQ_DISABLED is set), we do not service it but leave it marked as pending (flag IRQ_PENDING is set). Once the interrupt is enabled again (flag IRQ_DISABLED is cleared), the ISRs are called to service it. So IRQ_PENDING lets the kernel remember an interrupt that occurred while that interrupt was disabled for some reason.
Note: Disabling here does not mean masking a particular line at the PIC level, nor disabling all interrupts at the CPU level by clearing the IF flag in the EFLAGS register. It means the kernel has been asked not to service the interrupt; this does not necessarily stop the hardware from raising the interrupt signal.
- If another CPU is already handling a previous interrupt request on this IRQ line, the IRQ_INPROGRESS flag will already have been set by that CPU. Our only job is to mark the interrupt IRQ_PENDING, in effect asking that other CPU to service this request as well. When that CPU finishes handling the previous interrupt, it checks this flag; because we set it, that CPU calls all the ISRs once again to service the request we received on this IRQ line.
Line - 595 to 601: If this interrupt is not disabled (flag IRQ_DISABLED is clear) and is not being handled by another CPU (flag IRQ_INPROGRESS is also clear), we clear the IRQ_PENDING flag and set the IRQ_INPROGRESS flag to indicate that we take responsibility for handling this interrupt request. If another CPU receives an interrupt on the same IRQ line while we are handling this one, that CPU simply sets the IRQ_PENDING flag and hands the responsibility to us; our CPU will then service that interrupt request as well.
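The flag handling at lines 588 to 601 is a pure function of the old status word, so it can be tried out on its own. The sketch below uses a hypothetical enter_irq() helper and locally chosen flag values, and it leaves out the descriptor spinlock that the real code holds around this update.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values for this sketch. */
#define IRQ_INPROGRESS 0x01
#define IRQ_DISABLED   0x02
#define IRQ_PENDING    0x04
#define IRQ_REPLAY     0x08
#define IRQ_WAITING    0x20

/* Mirrors lines 588-601: returns the new status word and reports through
   *handle whether this CPU commits to running the ISRs. */
static unsigned int enter_irq(unsigned int old_status, bool *handle)
{
    unsigned int status = old_status & ~(unsigned int)(IRQ_REPLAY | IRQ_WAITING);

    assert(handle);
    status |= IRQ_PENDING;               /* we want to handle it */
    *handle = false;
    if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
        *handle = true;
        status &= ~IRQ_PENDING;          /* we commit to handling */
        status |= IRQ_INPROGRESS;        /* we are handling it */
    }
    return status;
}
```

From an idle line the result is just IRQ_INPROGRESS; if the line is already in progress or disabled, the result keeps that state and adds IRQ_PENDING, and the caller does not run the ISRs.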
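Line - 609 to 610: If action is NULL, because no ISR is registered for this IRQ line or because the interrupt is disabled or already being handled on another CPU, we jump straight to the out label. From there we call end(), release the descriptor lock, run any pending softirqs and return from interrupt context.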
Line - 622 to 630: Now we are all set to call the registered ISRs (the device drivers' functions), so that they can determine which device on this IRQ line actually triggered the interrupt and service it properly. Before calling the ISRs, we release the IRQ descriptor spinlock, so that while the ISRs run, the lock can be acquired by another interrupt context executing on another CPU for the same IRQ line. That interrupt context will simply set the IRQ_PENDING flag and return without handling the interrupt itself. In this infinite loop we call handle_IRQ_event(), which calls all the ISRs registered for this IRQ line one by one. After the whole list of ISRs has run, we reacquire the IRQ descriptor spinlock, because we need to check and update the status field of the descriptor again. If the IRQ_PENDING flag is clear, we break out of the loop; otherwise we clear IRQ_PENDING and call handle_IRQ_event() again to serve the new interrupt request indicated by IRQ_PENDING.
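The replay loop at lines 622 to 631 can be simulated in a single thread. In this sketch (all names hypothetical), fake_handle_event() stands in for handle_IRQ_event() and, while "running the ISRs", pretends that another CPU received the same IRQ, found IRQ_INPROGRESS set, and only set IRQ_PENDING. The spinlock is omitted because nothing actually runs concurrently, and the flag values are illustrative.

```c
#include <assert.h>

#define IRQ_INPROGRESS 0x01    /* illustrative flag values */
#define IRQ_PENDING    0x04

struct fake_desc {
    unsigned int status;
    int remote_arrivals;   /* arrivals "another CPU" sees during our passes */
};

static int isr_passes;     /* how many times the ISR list was run */

/* Stand-in for handle_IRQ_event(): one pass over the ISRs. During the pass
   a simulated second CPU may take the same IRQ, find IRQ_INPROGRESS set,
   and only mark IRQ_PENDING. */
static void fake_handle_event(struct fake_desc *d)
{
    isr_passes++;
    if (d->remote_arrivals > 0) {
        d->remote_arrivals--;
        d->status |= IRQ_PENDING;
    }
}

/* Mirrors lines 622-631 without the spinlock. */
static void service_loop(struct fake_desc *d)
{
    assert(d->status & IRQ_INPROGRESS);   /* we committed at line 599 */
    for (;;) {
        fake_handle_event(d);
        if (!(d->status & IRQ_PENDING))
            break;
        d->status &= ~IRQ_PENDING;
    }
    d->status &= ~IRQ_INPROGRESS;
}
```

With two simulated arrivals, each landing in a different pass, the ISRs run three times and the status ends at 0. Several arrivals during the same pass would only set a bit that is already set, so they collapse into one extra pass, which is consistent with the comment at lines 617 and 618.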
Line - 631: We leave the loop above only when there is no pending request for this IRQ line. At this point most of the work is done, so we clear the IRQ_INPROGRESS flag.
Line - 637 to 638: Now we call the end() function from the PIC-related functions stored in the handler field of our IRQ descriptor. This function handles the case where the interrupt we were handling got disabled while we were handling it. Suppose that, while we were servicing the interrupt by calling its ISRs, code running on another CPU disabled it (set the IRQ_DISABLED flag). In that case we must not unmask the interrupt line that we masked by calling the PIC-related ack() function at line 583. If the IRQ has not been disabled, end() simply unmasks the interrupt line at the PIC level and returns. We then release the descriptor spinlock and go on to serve the pending softirqs (if any are marked).
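A minimal model of the ack()/end() pairing follows. pic_mask stands in for the PIC's interrupt mask register (a set bit means the line is masked), and ack_irq() and end_irq() are hypothetical helpers with illustrative flag values. end_irq() unmasks only when the line is neither disabled nor still in progress. We believe the 8259A end() function in this release performs the same check; the IRQ_INPROGRESS part would cover the early exit at line 610, where another CPU is still running the ISRs and will unmask the line itself, but treat that detail as an assumption.

```c
#include <assert.h>

#define IRQ_INPROGRESS 0x01    /* illustrative flag values */
#define IRQ_DISABLED   0x02

static unsigned int pic_mask;          /* bit set = line masked */
static unsigned int desc_status[16];   /* per-line status word */

/* Like an 8259A ack(): mask the line (the EOI itself is not modelled). */
static void ack_irq(unsigned int irq)
{
    assert(irq < 16);
    pic_mask |= 1u << irq;
}

/* Like end(): unmask only if the IRQ was not disabled meanwhile and no
   CPU is still handling it. */
static void end_irq(unsigned int irq)
{
    if (!(desc_status[irq] & (IRQ_DISABLED | IRQ_INPROGRESS)))
        pic_mask &= ~(1u << irq);
}
```

A normal ack/end pair leaves the line unmasked, while a line that was disabled during handling stays masked until it is enabled again.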