How do we get to memdescs in a series of bisectable, small and reviewable steps?
From here (May 2024), finish converting all filesystems to folios. This work can proceed largely in parallel. Once it has finished, rename ->mapping to ->__folio_mapping and ->index to ->__folio_index.
In parallel with this, pull struct page apart as has already been done for ptdesc, slab and zpdesc. The largest remaining piece is the netpool bump allocator.
Once all this has landed, we can start dynamically allocating the various memdescs and point to them from every page's compound_head (instead of just the tail pages).
Then we can shrink struct page to 32 bytes,
{{{
struct folio {
    unsigned long flags;
    struct list_head lru;
    struct address_space *mapping;
    pgoff_t index;
    void *private;
    unsigned int _refcount;
    unsigned int _mapcount;
    unsigned int pincount;
    unsigned long pfn;
    struct mem_cgroup *memcg;
};
}}}
{{{
struct page {
    unsigned long flags;
    union {
        struct list_head buddy_list;
        struct list_head pcp_list;
        struct {
            unsigned long memdesc;
            int _refcount;    // 0 for folios
        };
    };
    union {
        unsigned long private;
        struct {
            int _mapcount;    // only used for folios?
        };
    };
};
}}}
In this 2025 world, we copy page->flags from the first page to the folio, but leave it intact to keep code calling page_zone() working the way it does today.
For a folio, the usage is:
- flags (mostly read-only; PageAnonExclusive, PageHwPoison, maybe a couple of others)
- memdesc (points to the folio)
- mapcount (per-page mapcount)
For accounted memory, there are several possibilities:
- If a folio, we use the memcg in the folio
- If slab, we use the objcg in struct slab
- If a plain page GFP_ACCOUNT, use the memdesc to point to the objcg
Notes:
- PageTail() moves out of line. If folio, compare page_pfn() with folio->pfn. If slab, compare page_address() with slab->addr. Otherwise, see if memdesc is tail.
- Still a few dozen users of page->lru to remove.
- Must remove folio->page
- Some filesystems still working with pages.
- movableops users need to use memdesc
- can only support 12 memdescs as list_head.next is only 4-byte aligned (so we can use 1-3,5-7,9-11,13-15 but not 0,4,8,12)
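The count of 12 in the last note can be checked with a small userspace sketch. It assumes the memdesc type lives in the low 4 bits of the word that overlays list_head.next, and that descriptors are 16-byte aligned so those bits are free. The names memdesc_type_usable(), count_usable_types() and memdesc_encode() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the type tag occupies the low 4 bits of the memdesc word. */
#define MEMDESC_TYPE_BITS 4
#define MEMDESC_TYPE_MASK ((1UL << MEMDESC_TYPE_BITS) - 1)
/* A 4-byte-aligned list pointer always has its low 2 bits clear. */
#define LIST_ALIGN_MASK   3UL

/*
 * A tag is usable only if it cannot be mistaken for a list pointer,
 * i.e. at least one of its low 2 bits is set.
 */
static bool memdesc_type_usable(unsigned long type)
{
    return type <= MEMDESC_TYPE_MASK && (type & LIST_ALIGN_MASK) != 0;
}

/* Count the usable tags among 0..15. */
static int count_usable_types(void)
{
    int n = 0;

    for (unsigned long t = 0; t <= MEMDESC_TYPE_MASK; t++)
        if (memdesc_type_usable(t))
            n++;
    return n;
}

/* Hypothetical helper: pack a 16-byte-aligned descriptor pointer and a tag. */
static uintptr_t memdesc_encode(const void *desc, unsigned long type)
{
    assert(memdesc_type_usable(type));
    assert(((uintptr_t)desc & MEMDESC_TYPE_MASK) == 0);
    return (uintptr_t)desc | type;
}

/* Recover the tag from an encoded memdesc word. */
static unsigned long memdesc_decode_type(uintptr_t memdesc)
{
    return memdesc & MEMDESC_TYPE_MASK;
}
```

Tags 0, 4, 8 and 12 are rejected because their low two bits are zero, which is exactly what a valid list_head.next looks like; the remaining twelve values are unambiguous.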
The next step is to shrink struct page to 16 bytes,
{{{
struct page {
    union {
        struct list_head buddy_list;
        struct {
            unsigned long memdesc;
            unsigned long private;
        };
    };
};
}}}
This will involve changes to page_zone(), page_to_nid() and so on.
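The size targets can be sanity-checked outside the kernel. The following is a minimal sketch, assuming an LP64 target (8-byte pointers and unsigned long) and C11 anonymous structs and unions. The type names page32 and page16 are invented here so that both proposed layouts can coexist in one file, and list_head is redeclared locally rather than taken from kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* The intermediate 32-byte layout (hypothetical name). */
struct page32 {
    unsigned long flags;
    union {
        struct list_head buddy_list;
        struct list_head pcp_list;
        struct {
            unsigned long memdesc;
            int _refcount;
        };
    };
    union {
        unsigned long private;
        struct {
            int _mapcount;
        };
    };
};

/* The later 16-byte layout (hypothetical name). */
struct page16 {
    union {
        struct list_head buddy_list;
        struct {
            unsigned long memdesc;
            unsigned long private;
        };
    };
};

/* The checks only apply where unsigned long is 8 bytes wide. */
static_assert(sizeof(unsigned long) != 8 || sizeof(struct page32) == 32,
              "intermediate struct page should be 32 bytes on LP64");
static_assert(sizeof(unsigned long) != 8 || sizeof(struct page16) == 16,
              "final struct page should be 16 bytes on LP64");
/* memdesc shares storage with buddy_list.next in both layouts. */
static_assert(offsetof(struct page16, memdesc) ==
              offsetof(struct page16, buddy_list),
              "memdesc must overlay the list pointer");
```

In the 16-byte layout there is no room left for flags, so zone and node information can no longer be read from the page itself, which is why page_zone() and page_to_nid() have to change.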
After this, we can start working on removing accesses to page->private from device drivers. When that is finished, we can explore the various options presented in MatthewWilcox/BuddyAllocator