Revision 10 as of 2024-12-27 08:36:12
How do we get to memdescs in a series of bisectable, small and reviewable steps?
Immediate projects
There are no dependencies between these projects. They can all be tackled by different people. If you want to work on one of them, please ask Matthew for an invite to the THP Cabal meeting for coordination purposes.
- Remove page->lru uses. This is really a set of many small projects. Some tactics can be shared between different projects, but this really requires looking into each usage and figuring out how to replace it.
- Remove page->mapping uses. In a filesystem, this is converting to folios. I have a plan for movable_ops (but not a plan for its use of ->lru, so ...)
- Remove use of &folio->page. Sometimes this is using folio_page(folio, 0); sometimes this is pushing folios into the called function (e.g. block_commit_write()).
- Remove bh->b_page as it's also effectively a cast between page & folio. (Struck through on the wiki page.)
- Split the pagepool bump allocator out of struct page, as has been done for, e.g., slab and ptdesc.
- Fix memcg_data so that slabs, folios and plain pages are each accounted appropriately.
We may wish to do a "developer preview" where we just disable some modules without finishing the conversion so that people can evaluate the performance.
After those projects are complete
Then we can shrink struct page to 32 bytes:
struct folio {
    unsigned long flags;
    struct list_head lru;
    struct address_space *mapping;
    pgoff_t index;
    void *private;
    unsigned int _refcount;
    unsigned int _mapcount;
    unsigned int pincount;
    unsigned long pfn;
    struct mem_cgroup *memcg;
};

struct page {
    unsigned long flags;
    union {
        struct list_head buddy_list;
        struct list_head pcp_list;
        struct {
            unsigned long memdesc;
            int _refcount;    // 0 for folios
        };
    };
    union {
        unsigned long private;
        struct {
            int _mapcount;    // only used for folios?
        };
    };
};
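The 32-byte figure can be checked in userspace. The following is only a sketch: struct page_sketch and page_sketch_size() are hypothetical names, struct list_head is a local stand-in for the kernel type, and the arithmetic assumes an LP64 target (8-byte long and pointers).

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof, size_t */

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical copy of the proposed layout; not a kernel definition. */
struct page_sketch {
	unsigned long flags;
	union {
		struct list_head buddy_list;
		struct list_head pcp_list;
		struct {
			unsigned long memdesc;
			int _refcount;	/* 0 for folios */
		};
	};
	union {
		unsigned long private;
		struct {
			int _mapcount;	/* only used for folios? */
		};
	};
};

/* Assumes LP64: 8 (flags) + 16 (first union) + 8 (second union). */
static_assert(sizeof(struct page_sketch) == 32,
	      "proposed struct page should be 32 bytes");

/* The memdesc word shares storage with buddy_list.next. */
static_assert(offsetof(struct page_sketch, memdesc) ==
	      offsetof(struct page_sketch, buddy_list.next),
	      "memdesc overlays list_head.next");

/* Helper so callers can inspect the size at run time. */
size_t page_sketch_size(void)
{
	return sizeof(struct page_sketch);
}
```

The second assertion matters for the notes further down: because memdesc overlays a list pointer, the low bits of that word have to tell the two uses apart.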
For each memdesc (slab, folio, zpdesc, ptdesc, bump) in turn, we create a slab cache for it. Then we make page->compound_head point to the dynamically allocated memdesc rather than the first page.
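As a rough userspace illustration of pointing every page at a dynamically allocated memdesc, the sketch below tags the pointer with a type in its low bits. Everything here is an assumption made for the example: the 4-bit type field, the value 1 for folios, the names folio_sketch, page_sketch, folio_sketch_alloc(), pages_attach_folio() and page_sketch_folio(), and the use of aligned_alloc() in place of a kernel slab cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MEMDESC_TYPE_MASK  ((uintptr_t)0xf)	/* assumption: 4-bit type */
#define MEMDESC_TYPE_FOLIO ((uintptr_t)1)	/* hypothetical type value */

struct folio_sketch {		/* stand-in for the allocated folio */
	unsigned long flags;
	unsigned long pfn;
};

struct page_sketch {		/* only the field this example needs */
	uintptr_t memdesc;
};

/* Allocate a 16-byte-aligned folio so its low 4 address bits are zero. */
struct folio_sketch *folio_sketch_alloc(unsigned long pfn)
{
	size_t size = (sizeof(struct folio_sketch) + 15) & ~(size_t)15;
	struct folio_sketch *folio = aligned_alloc(16, size);

	if (folio) {
		folio->flags = 0;
		folio->pfn = pfn;
	}
	return folio;
}

/* Point every page of the folio (not just the tail pages) at the memdesc. */
void pages_attach_folio(struct page_sketch *pages, size_t nr,
			struct folio_sketch *folio)
{
	uintptr_t word = (uintptr_t)folio;

	assert((word & MEMDESC_TYPE_MASK) == 0);
	for (size_t i = 0; i < nr; i++)
		pages[i].memdesc = word | MEMDESC_TYPE_FOLIO;
}

/* Return the folio a page belongs to, or NULL if it is not in a folio. */
struct folio_sketch *page_sketch_folio(const struct page_sketch *page)
{
	if ((page->memdesc & MEMDESC_TYPE_MASK) != MEMDESC_TYPE_FOLIO)
		return NULL;
	return (struct folio_sketch *)(page->memdesc & ~MEMDESC_TYPE_MASK);
}
```

A caller would allocate one folio_sketch, attach it to an array of page_sketch entries, and then recover the same folio pointer from any page in the array.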
In this 2025 world, we copy page->flags from the first page to the folio, but leave it intact to keep code calling page_zone() working the way it does today.
For a page in a folio, the usage is:
- flags (mostly zone/node/section; PageAnonExclusive, PageHwPoison, maybe a couple of others?)
- memdesc (points to the folio)
- mapcount (per-page mapcount)
For accounted memory, there are several possibilities:
- If a folio, we use the memcg in the folio
- If slab, we use the objcg in struct slab
- If a plain page allocated with GFP_ACCOUNT, use the memdesc to point to the objcg
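The three cases can be read as a single lookup keyed on the kind of memdesc. The sketch below is deliberately simplified and every name in it is hypothetical: the enum, the *_sketch structs, and memdesc_memcg() are illustrative only, and a slab is modelled with one objcg although the real accounting is finer-grained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memdesc kinds, for illustration only. */
enum memdesc_kind {
	KIND_FOLIO,
	KIND_SLAB,
	KIND_ACCOUNTED_PAGE,
	KIND_OTHER,
};

struct mem_cgroup { int id; };				/* stand-in */
struct obj_cgroup { struct mem_cgroup *memcg; };	/* stand-in */

struct folio_sketch { struct mem_cgroup *memcg; };
struct slab_sketch  { struct obj_cgroup *objcg; };	/* simplified */

struct memdesc_sketch {
	enum memdesc_kind kind;
	union {
		struct folio_sketch *folio;
		struct slab_sketch *slab;
		struct obj_cgroup *objcg;	/* plain accounted page */
	};
};

/* Find the memcg to charge, following the three cases listed above. */
struct mem_cgroup *memdesc_memcg(const struct memdesc_sketch *md)
{
	assert(md != NULL);

	switch (md->kind) {
	case KIND_FOLIO:		/* memcg stored in the folio */
		return md->folio->memcg;
	case KIND_SLAB:			/* objcg stored in struct slab */
		return md->slab->objcg ? md->slab->objcg->memcg : NULL;
	case KIND_ACCOUNTED_PAGE:	/* memdesc points at the objcg */
		return md->objcg ? md->objcg->memcg : NULL;
	default:			/* unaccounted memory */
		return NULL;
	}
}
```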
Notes:
- PageTail() moves out of line. If folio, compare page_pfn() with folio->pfn. If slab, compare page_address() with slab->addr. Otherwise, see if memdesc is tail.
- Similarly get_page(), lock_page(), put_page(), unlock_page() go out of line and operate differently, depending on whether the page is a folio or not.
- movable_ops users need to use memdesc
- Can only support 12 memdescs as list_head.next is only 4-byte aligned (so we can use 1-3,5-7,9-11,13-15 but not 0,4,8,12)
- Calling compound_head() on a page that belongs to a folio or slab is a BUG()
- Calling page_folio() on a page which does not belong to a folio returns NULL
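The limit of 12 memdesc types follows from the memdesc word sharing storage with list_head.next. One reading, sketched below under the assumption of a 4-bit type field in the low bits of the word: a 4-byte-aligned list pointer always has its low two bits clear, so any type value whose low two bits are also clear could be mistaken for a list pointer. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * A type value is usable only if at least one of its low two bits is
 * set; otherwise the word looks like a 4-byte-aligned list pointer.
 */
bool memdesc_type_usable(unsigned int type)
{
	assert(type < 16);	/* assumption: 4-bit type field */
	return (type & 0x3) != 0;
}

/* Count the usable values among the 16 possible 4-bit types. */
unsigned int memdesc_usable_types(void)
{
	unsigned int count = 0;

	for (unsigned int type = 0; type < 16; type++)
		if (memdesc_type_usable(type))
			count++;
	return count;
}
```

This rejects exactly 0, 4, 8 and 12 and accepts 1-3, 5-7, 9-11 and 13-15, matching the list in the note.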
Page2026
The next step is to shrink struct page to 16 bytes:
struct page {
    union {
        struct list_head buddy_list;
        struct {
            unsigned long memdesc;
            unsigned long private;
        };
    };
};
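To put the 16-byte target in perspective, the sketch below checks the layout size and computes how much memory the array of struct pages occupies for a given amount of RAM. It assumes an LP64 target and 4 KiB pages; page16_sketch and memmap_bytes() are hypothetical names, and struct list_head is a local stand-in.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical copy of the 16-byte layout; not a kernel definition. */
struct page16_sketch {
	union {
		struct list_head buddy_list;
		struct {
			unsigned long memdesc;
			unsigned long private;
		};
	};
};

/* Assumes LP64: two 8-byte words either way. */
static_assert(sizeof(struct page16_sketch) == 16,
	      "proposed struct page should be 16 bytes");

/* Bytes of per-page metadata needed to describe ram_bytes of memory. */
unsigned long long memmap_bytes(unsigned long long ram_bytes,
				unsigned long page_size,
				unsigned long struct_page_size)
{
	return ram_bytes / page_size * struct_page_size;
}
```

With 4 KiB pages, 1 GiB of RAM holds 262144 pages, so the per-page metadata comes to 8 MiB at 32 bytes per page and 4 MiB at 16 bytes per page.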
This will involve changes to page_zone(), page_to_nid() and so on.
After this, we can start working on removing accesses to page->private from device drivers. When that is finished, we can explore the various options presented in MatthewWilcox/BuddyAllocator.