For every physical address in Linux, there is a {{{struct page}}}. Struct page is a rather weak data type; it's very easy to look at (eg) {{{page->mapping}}} when the page is actually a tail page, and so has no mapping of its own. Folios are the beginning of separating out some of the roles of struct page. Conceptually, folios take the contents of struct page (except the tail page parts) and move them into struct folio. That isn't what the patchset ''actually'' does, because I'm not enough of a masochist to make all those changes in one go.

We can (and should) go further. We need to understand how memory is (currently) allocated and used. Any memory mapped to userspace needs dirty & locked bits, a refcount and a mapcount.

|| Purpose || Notes ||
|| Free || Belongs to the buddy allocator. Not mappable to userspace ||
|| Slab || Not mappable to userspace ||
|| Page table || Not mappable to userspace ||
|| vmalloc || May be mapped ||
|| kernel stack || Currently allocated by vmalloc, but should never be mapped ||
|| File cache || May be mapped ||
|| Anon || May be mapped ||
|| net pool || May be mapped ||
|| zsmalloc || Not mappable to userspace ||
|| kernel text || Can we map this through /dev/kmem or something? ||
|| kernel data || Mapping this is a security hole? ||
|| ZONE_DEVICE || This is a disaster area ||
|| offline || Logically offline, eg balloon ||
|| guard || see debug_pagealloc ||
|| arbitrary || Many device drivers just allocate pages and map them to userspace ||
|| arbitrary || Many more device drivers allocate pages from the buddy allocator and ''don't'' map them to userspace. These should probably be converted to use the slab allocator, but this is a job for someone who understands each driver well. ||

Memory allocated to slab is low-hanging fruit. Preliminary patch available here: https://lore.kernel.org/lkml/YUpaTBJ%2FJhz15S6a@casper.infradead.org/ Page tables are probably the next obvious thing to split out.
At that point, though, I lack vision for splitting out any of the other things. So we end up with:

{{{
struct page {
	unsigned long flags;
	unsigned long compound_head;
	union {
		struct {	/* First tail page only */
			unsigned char compound_dtor;
			unsigned char compound_order;
			atomic_t compound_mapcount;
			unsigned int compound_nr;
		};
		struct {	/* Second tail page only */
			atomic_t hpage_pinned_refcount;
			struct list_head deferred_list;
		};
		unsigned long padding1[4];
	};
	unsigned int padding2[2];
#ifdef CONFIG_MEMCG
	unsigned long padding3;
#endif
#ifdef WANT_PAGE_VIRTUAL
	void *virtual;
#endif
#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
	int _last_cpupid;
#endif
};

struct slab {
	... slab specific stuff here ...
};

struct page_table {
	... pgtable stuff here ...
};

struct folio {
	unsigned long flags;
	union {
		struct {
			struct list_head lru;
			struct address_space *mapping;
			pgoff_t index;
			void *private;
		};
		struct {
			... net pool here ...
		};
		struct {
			... zone device here ...
		};
	};
	atomic_t _mapcount;
	atomic_t _refcount;
#ifdef CONFIG_MEMCG
	unsigned long memcg_data;
#endif
};
}}}

(nb: I haven't added the rcu_head anywhere; not sure what needs it right now. maybe just slab?)

Eventually, I hope to get to the point where we dynamically allocate the {{{struct folio}}}, {{{struct slab}}} and other memory descriptors [1]. struct page then contains only a single unsigned long, pointing to the memory descriptor that page belongs to. This can be typed (by the usual bottom few bits). It shrinks the overhead of memmap from 1.6% of memory (a 64-byte struct page per 4KiB page) to 0.2% of memory (a single 8-byte word per page).

There's a bootstrapping problem here (to allocate the first struct slab, eg), but I'm sure it's not insurmountable. I'm also not sure what the buddy allocator looks like at this point, if it can't use struct page to store its metadata (or maybe it can still use this newly shrunken struct page to store enough metadata?)
[1] Every memory descriptor struct (even those that aren't mappable to userspace) must be TYPESAFE_BY_RCU as there are lockless page table walkers that might see a stale reference to a memory descriptor.