KernelNewbies:

Suppose the storage presented by your filesystem is connected through a 1Gbps link (eg PCIe gen 1 x1, or 1Gbps ethernet) and has a latency of 10ms response time. If you send it one read at a time, it will only service 100 requests per second. It has a maximum bandwidth of 120MB/s, so each read needs to be 1.2MB in order to saturate the link. If we keep 16 requests outstanding, each read need only be 75kB.

We need the core readahead code to handle this situation. My suggestion for doing this is to submit two extra readahead requests every time we hit a folio which has the uptodate flag clear (and have to wait for the previous readahead request to finish). The current readahead code only submits new read requests upon seeing a folio with the readahead flag set (assuming the readahead window predicts that folio should have the readahead flag set). We would retain that mechanism, but augment it with the extra requests upon finding that the device has not kept up with us.

It looks something like this (assuming the app is calling read() in increments of 4kB and is processing the data faster than storage can provide it). Items in bold are new.

1. read 0-4kB, no folio present.

2. read 32-36kB (readahead flag set), no reads outstanding

3. read 64-68kB (uptodate flag clear), one read outstanding

4. read 128-132kB (readahead flag set), two reads outstanding

5. read 192-196kB (uptodate flag clear), three reads outstanding

6. read 256-260kB (readahead flag set), four reads outstanding

7. read 320-324kB (uptodate flag clear), five reads outstanding

8. read 384-388kB (readahead flag set), five reads outstanding

9. read 448-452kB (uptodate flag clear), six reads outstanding

10. read 576-580kB (readahead flag set), seven reads outstanding

...

Once we stop seeing folios with the uptodate flag clear, we'll continue to see folios with the readahead flag set, and thus keep the number of readahead requests at the level it determined was necessary to keep the link saturated. I think we may need to put a parallelism cap in the bdi so that a device which is just slow instead of at the end of a long fat pipe doesn't get overwhelmed with requests. Or maybe we cap the maximum number of pages outstanding instead of the number of requests.

KernelNewbies: MatthewWilcox/ImprovedReadaheadAlgorithm (last edited 2022-01-20 14:17:12 by MatthewWilcox)