For every page of physical memory in Linux, there is a struct page. Struct page is a rather weak data type; it's very easy to look at (e.g.) page->mapping when the page is actually a tail page, and so does not have a mapping. Folios are the beginning of separating out some of the roles of struct page. Conceptually, folios take the contents of struct page (except the tail page parts) and move them into struct folio. That isn't what the patchset actually does, because I'm not enough of a masochist to make all those changes.
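To make the tail page problem concrete, here is a minimal userspace sketch, not the kernel's definitions. The field layout is an assumption for illustration: the real struct page is a union of many layouts and encodes the head pointer differently. Only the names struct page, struct folio and page_folio() are borrowed from the kernel; folio_mapping() here is a simplified stand-in.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's address_space; contents are irrelevant here. */
struct address_space { int id; };

/* Simplified model of struct page (assumption: not the real layout). */
struct page {
    struct page *head;              /* NULL for a head page, else the head */
    struct address_space *mapping;  /* only meaningful on the head page */
};

/* A folio is never a tail page, so its mapping is always meaningful. */
struct folio {
    struct page page;
};

/* Resolve any page, head or tail, to the folio that contains it. */
static inline struct folio *page_folio(struct page *page)
{
    struct page *head = page->head ? page->head : page;
    return (struct folio *)head;
}

/* Accessors take a folio, so they cannot be handed a tail page by accident. */
static inline struct address_space *folio_mapping(struct folio *folio)
{
    assert(folio->page.head == NULL);   /* a folio must be a head page */
    return folio->page.mapping;
}
```

With this shape, reading `mapping` straight off a tail page yields nothing useful, while `folio_mapping(page_folio(page))` always reaches the head page's value. The type system, rather than the programmer's memory, enforces the conversion.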
We can (and should) go further. We need to understand how memory is (currently) allocated and used. Any memory mapped to userspace needs dirty & locked bits, a refcount and a mapcount.
| Purpose | Notes |
|---|---|
| Free | Belongs to the buddy allocator. Not mappable to userspace |
| Slab | Not mappable to userspace |
| Page table | Not mappable to userspace |
| vmalloc | May be mapped |
| kernel stack | Currently allocated by vmalloc, but should never be mapped |
| File cache | May be mapped |
| Anon | May be mapped |
| net pool | May be mapped |
| kernel text | Can we map this through /dev/kmem or something? |
| kernel data | Mapping this is a security hole? |
| ZONE_DEVICE | This is a disaster area |
| arbitrary | Many device drivers just allocate pages and map them to userspace |
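The split in the table can be sketched as two descriptor shapes: one carrying the state that user-mappable memory needs (dirty and locked bits, a refcount, a mapcount), and one for memory that is never mapped and can drop all of it. Everything below is hypothetical: the struct names, flag values and helpers are made up for illustration, plain ints stand in for the kernel's atomics, and the mapcount here starts at 0 for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's real PG_* values differ. */
enum { MD_LOCKED = 1u << 0, MD_DIRTY = 1u << 1 };

/* State needed by memory that may be mapped to userspace. */
struct mappable_desc {
    unsigned long flags;    /* holds the dirty and locked bits */
    int refcount;           /* references held on the memory */
    int mapcount;           /* number of userspace mappings */
};

/* Memory that is never mapped (e.g. slab) needs none of the above. */
struct unmapped_desc {
    unsigned int inuse;     /* allocator-private bookkeeping */
};

static inline void desc_set_dirty(struct mappable_desc *d)
{
    d->flags |= MD_DIRTY;
}

/* Each mapping pins the memory, so both counts move together. */
static inline void desc_map(struct mappable_desc *d)
{
    d->refcount++;
    d->mapcount++;
}

static inline void desc_unmap(struct mappable_desc *d)
{
    assert(d->mapcount > 0);    /* cannot unmap what was never mapped */
    d->mapcount--;
    d->refcount--;
}

static inline bool desc_is_mapped(const struct mappable_desc *d)
{
    return d->mapcount > 0;
}
```

The point of the sketch is only that the fields differ by purpose; which purposes end up sharing a descriptor type is a design decision the table leaves open.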
Memory allocated to slab is low-hanging fruit. Preliminary patch available here: https://lore.kernel.org/lkml/YUpaTBJ%2FJhz15S6a@casper.infradead.org/
Page tables are probably the next obvious thing to split out.
Eventually, I hope to get to the point where struct page contains only a single unsigned long, pointing to the "memory descriptor struct" that page belongs to. This pointer can be typed by storing a tag in the usual bottom few bits, which alignment leaves free. Every memory descriptor struct (even those that aren't mappable to userspace) must be TYPESAFE_BY_RCU as there are lockless page table walkers that might see a stale reference to a memory descriptor.
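A minimal sketch of what such a typed single-word struct page could look like follows. The tag names, the tag width of two bits and the helper functions are all assumptions made for illustration. uintptr_t is used so the sketch is portable in userspace, where the kernel would spell it unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type tags; names and values are illustrative only. */
enum memdesc_type {
    MEMDESC_FREE   = 0,
    MEMDESC_SLAB   = 1,
    MEMDESC_PGTBL  = 2,
    MEMDESC_FOLIO  = 3,
};

/* Assumption: descriptors are at least 4-byte aligned, freeing 2 low bits. */
#define MEMDESC_TYPE_BITS 2
#define MEMDESC_TYPE_MASK ((uintptr_t)((1u << MEMDESC_TYPE_BITS) - 1))

/* The hoped-for end state: struct page is one tagged word. */
struct page {
    uintptr_t memdesc;
};

/* Pack a descriptor pointer and its type tag into one word. */
static inline uintptr_t memdesc_encode(void *desc, enum memdesc_type type)
{
    assert(((uintptr_t)desc & MEMDESC_TYPE_MASK) == 0); /* low bits must be free */
    return (uintptr_t)desc | (uintptr_t)type;
}

/* Recover the type tag from the low bits. */
static inline enum memdesc_type page_memdesc_type(const struct page *page)
{
    return (enum memdesc_type)(page->memdesc & MEMDESC_TYPE_MASK);
}

/* Recover the descriptor pointer by masking the tag off. */
static inline void *page_memdesc_ptr(const struct page *page)
{
    return (void *)(page->memdesc & ~MEMDESC_TYPE_MASK);
}
```

A caller would check `page_memdesc_type()` before casting the result of `page_memdesc_ptr()` to a specific descriptor type. The sketch does not model the RCU requirement: a lockless reader would still have to revalidate the descriptor after loading the word.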