MatthewWilcox/ImprovedReadaheadAlgorithm

Suppose the storage presented by your filesystem is connected through a 1Gbps link (eg PCIe gen 1 x1, or 1Gbps ethernet) and has a latency of 10ms response time. If you send it one read at a time, it will only service 100 requests per second. It has a maximum bandwidth of 120MB/s, so each read needs to be 1.2MB in order to saturate the link. If we keep 16 requests outstanding, each read need only be 75kB.

We need the core readahead code to handle this situation. My suggestion for doing this is to submit two extra readahead requests every time we hit a folio which has the uptodate flag clear (and have to wait for the previous readahead request to finish). The current readahead code only submits new read requests upon seeing a folio with the readahead flag set (assuming the readahead window predicts that folio should have the readahead flag set). We would retain that mechanism, but augment it with the extra requests upon finding that the device has not kept up with us.

It looks something like this (assuming the app is calling read() in increments of 4kB and is processing the data faster than storage can provide it). Items in bold are new.

1. read 0-4kB, no folio present.

set readahead size to 64kB
set readahead flag on folio at 32kB
send read for 0-64kB
wait for 0-64kB to complete
reads of 4k-32k will be satisfied from the page cache without interaction with the filesystem

2. read 32-36kB (readahead flag set), no reads outstanding

increase readahead size to 128kB (as it was successful)
clear readahead flag on folio at 32kB
set readahead flag on folio at 128kB
send read for 64k-192kB

3. read 64-68kB (uptodate flag clear), one read outstanding

send read for 192-320kB
send read for 320-448kB
wait for 64-192kB to complete

4. read 128-132kB (readahead flag set), two reads outstanding

increase readahead size to 256kB (as readahead was successful),
set readahead flag on folio at 256kB
send read for 448-704kB

5. read 192-196kB (uptodate flag clear), three reads outstanding

send read for 704-960kB
send read for 960-1216kB
wait for 192-320kB to complete

6. read 256-260kB (readahead flag set), four reads outstanding

256kB is the maximum readahead size; do not increase it
set readahead flag on folio at 384kB
send read for 1216-1472kB

7. read 320-324kB (uptodate flag clear), five reads outstanding

send read for 1472-1728kB
send read for 1728-1984kB
wait for 320-448kB to complete

8. read 384-388kB (readahead flag set), five reads outstanding

set readahead flag on folio at 640kB
send read for 1984-2240kB

9. read 448-452kB (uptodate flag clear), six reads outstanding

send read for 2240-2496kB
send read for 2496-2752kB
wait for 448-704kB to complete

10. read 576-580kB (readahead flag set), seven reads outstanding

set readahead flag on folio at 832kB
send read for 2752-3008kB

...

Once we stop seeing folios with the uptodate flag clear, we'll continue to see folios with the readahead flag set, and thus keep the number of readahead requests at the level it determined was necessary to keep the link saturated. I think we may need to put a parallelism cap in the bdi so that a device which is just slow instead of at the end of a long fat pipe doesn't get overwhelmed with requests. Or maybe we cap the maximum number of pages outstanding instead of the number of requests.

KernelNewbies: MatthewWilcox/ImprovedReadaheadAlgorithm (last edited 2022-01-20 14:17:12 by MatthewWilcox)