Most of the kernel's program code (text) is mapped with large pages (2MB and 1GB on x86). Each TLB entry the kernel uses therefore covers more address space, which means the CPU spends less time walking the page tables and more time doing useful work.
However, kernel module code is placed in small pages. Historically, large amounts of physically contiguous memory have been difficult to allocate in the kernel. For instance, if a 500KB module came along and we tried to allocate memory for it with kmalloc() (which returns physically contiguous memory, mostly mapped with large pages), the allocation would be relatively likely to fail, and we would not be able to load the module. vmalloc() gets around this problem by allocating a bunch of small pages and stitching them together into a virtually contiguous mapping.
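As a rough sketch of that contrast (the 500KB figure is just the example size above; the function and variable names are made up for illustration):

	#include <linux/slab.h>
	#include <linux/vmalloc.h>

	/*
	 * Illustration only: two ways of asking for a ~500KB buffer.
	 */
	static void allocator_contrast_example(void)
	{
		void *contig;
		void *stitched;

		/*
		 * kmalloc() must find ~500KB of physically contiguous memory;
		 * under fragmentation that is relatively likely to fail.
		 */
		contig = kmalloc(500 * 1024, GFP_KERNEL);

		/*
		 * vmalloc() builds the same-sized region from scattered 4K
		 * pages stitched together by a virtually contiguous mapping,
		 * so it almost always succeeds -- but only via small pages.
		 */
		stitched = vmalloc(500 * 1024);

		kfree(contig);	/* the free call must match the allocator */
		vfree(stitched);
	}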
Your goal here is to find all of the calls to vmalloc_exec() and vfree() in kernel/module.c. Convert each vmalloc_exec() call to try alloc_pages_exact() first, and fall back to vmalloc_exec() only if that fails; kmalloc_section_memmap() is an example of similar code. Then convert each vfree() call site in kernel/module.c: if the address is a vmalloc address, call vfree(), otherwise call free_pages_exact().
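A minimal sketch of what that conversion might look like, written as a pair of helpers (see the hint below). The names module_alloc_try_contig() and module_free_contig() are invented for illustration, the GFP flags are a guess, and details such as making the direct-map memory executable are glossed over here:

	#include <linux/gfp.h>
	#include <linux/mm.h>
	#include <linux/vmalloc.h>

	/*
	 * Hypothetical helper: try physically contiguous memory first, fall
	 * back to the existing small-page vmalloc_exec() path if that fails.
	 */
	static void *module_alloc_try_contig(unsigned long size)
	{
		void *ptr;

		/*
		 * alloc_pages_exact() returns physically contiguous memory in
		 * the kernel direct map, which can be backed by large TLB
		 * entries.  Big requests may fail under fragmentation.
		 */
		ptr = alloc_pages_exact(size, GFP_KERNEL | __GFP_NOWARN);
		if (ptr)
			return ptr;

		return vmalloc_exec(size);
	}

	/*
	 * Hypothetical matching free helper: release the memory the same way
	 * it was allocated.
	 */
	static void module_free_contig(void *ptr, unsigned long size)
	{
		if (!ptr)
			return;

		if (is_vmalloc_addr(ptr))
			vfree(ptr);
		else
			free_pages_exact(ptr, size);
	}

Note that free_pages_exact() needs the size of the original allocation while vfree() does not; the module loader already tracks the sizes of the regions it allocates, so passing the size through to the free helper should not be a problem.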
On x86, there is a special virtual address area into which modules are mapped (see Documentation/x86/x86_64/mm.txt). We probably also need to convert this area to use large pages where possible.
Hint: If you do an operation more than once (if it looks like it was copied-and-pasted to two places in your code), then you need a helper function.
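With helpers along those lines, each converted call site in kernel/module.c shrinks to a single call. The snippet below is only a fragment; ptr, size, mod->module_core and mod->core_size stand in for whatever pointer and size the existing call site already has:

	/* Allocation site: the try-then-fall-back logic lives in the helper. */
	ptr = module_alloc_try_contig(size);
	if (!ptr)
		return -ENOMEM;

	/*
	 * Teardown site: the helper picks vfree() or free_pages_exact()
	 * based on the address it is handed.
	 */
	module_free_contig(mod->module_core, mod->core_size);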
Note: This vmalloc()->alloc_pages_exact() transition is really only a small part of a larger effort. It only makes it possible to get modules into large pages, since we are now putting them in physically contiguous memory. We also need to ensure that there are no side effects and that module memory actually ends up mapped with large pages.