KernelNewbies:

This is part of the MatthewWilcox/Memdescs project.

Today, pages in the buddy allocator store the following information:

migratetype (stored in page->index) is only used by the PCP frontend to the buddy allocator and does not need to be retained by pages in the buddy allocator.

While Node/Section/Zone is initialised in page->flags, zone & node information is already part of the zone, so we can discard it on free and reinitialise it on allocation. Section can be inferred from the PFN.

Once we move to memdescs, we have a choice between storing all the information we need in struct page and allocating a 'struct buddy' for each page being managed by the buddy allocator. Allocating a struct buddy is feasible but adds a lot of complex interactions between the buddy allocator and the slab allocator to ensure that we can always allocate struct buddy in order to either allocate or free memory. I definitely prefer storing all necessary information directly in struct page.

First option is to do this:

struct buddy {
    unsigned long next;
    unsigned long prev;
};

struct page {
    union {
        unsigned long memdesc;
        struct buddy buddy;
    };
};

This gives us a 16 byte struct page. The 'next' field has its bottom four bits taken with the memdesc type for buddy pages. The bottom four bytes of 'prev' can be used to store the order (assuming we don't need to go past order 15 pages).

The second option is the same data structure, but store the PFN of the next/prev pages instead of the pointer to the struct page. That gives us a lot more bits to play with! On 32-bit, we can use 28 bits to support up to 1TB of memory (theoretically possible with ARM32, I believe). But we no longer have a limit of order-15 pages as we know that PFNs are naturally aligned, and so we can use a single bit to record what order the page is. And we have three bits left over! On 64-bit, we have space for 60 bits of PFN which works out to 4096 exabytes of memory (most 64-bit architectures can support PFNs up to about 51 bits).

The third option is an optional compression of option 2. Many 64-bit systems, like my laptop and my phone have less than 1TB of memory. So instead of using a pair of unsigned long, we can encode all of this into a single 8-byte integer:

bits

meaning

0-3

memdesc type buddy

4

order encoding

5-33

next

34-62

prev

63

unused

That is actually 29 bits, letting us support up to 2TB systems with an 8 byte memdesc. Assuming there's a decent Kconfig option to determine whether it's OK to decline to support the extra memory.

It is tempting to see if we can shrink memdesc to 4 bytes on 32-bit systems, but we only get 13 bits for prev/next, limiting us to a 32MB system, which is sufficiently rare not to be worth supporting.

KernelNewbies: MatthewWilcox/BuddyAllocator (last edited 2024-01-18 05:18:09 by MatthewWilcox)