Linux SAT integration
SAT solver technology has been advancing steadily over the last few years. A lot of these advancements can be explained by a series of competitions held in different areas which have not only spurred interest into the field but also concrete solutions. The state of the art of SAT solvers is now even shifting into hardware space, but yet remains part of ongoing research.
- Linux SAT integration
- Sorting functional dependencies between devices
- Sorting functional dependencies between routines
- Linux scheduler
- Review of state of the art on hardware assisted SAT solvers
The first taste of a SAT solver in Linux the kernel could be with integration and use of Kconfig. The interest and work for that is being documented separately on the kconfig-sat project page.
Sorting functional dependencies between devices
SAT solvers have been known to help with ordering package dependency problems, helping improve the not only if a request is satisfiable but also help with the most optimal solution. A SAT solver could potentially be used to help with optimizing asynchronous boot and device dependency management.
Sometimes Linux cannot annotate dependencies between devices though. This can cause a few problems or means having to implement hacks on the kernel. One known hack is the use of deferred probe, returning -EPROBE_DEFER on probe as a requirement is determined to not yet be available. Another is not being sure one can do suspend and resume in the right order. We need both semantics to address these shortcomings but also a way to figure out the optimal dependency order needed.
Defining device functional dependency
Functional dependency is defined as a requirement of driver B needing both device and driver A to be functional and present in order to work. This implies that the driver of A needs to be working for B to be probed successfully.
Linux deferred probe
When Linux boots some drivers currently return -EPROBE_DEFER on their device driver probe routine, this tells the driver core to put the device driver into a linked list on init and try re-probing later. The driver may use -EPROBE_DEFER as it may have not yet found a requirement present up yet. The driver does this as *sometimes* the kernel doesn't have dependency information available, or there is no way to currently annotate this. As an example an SDHCI driver may first need a GPIO line from an i2c GPIO controller before it can be initialized.
Linux suspend / resume
From a power management perspective A being functional dependent on B means that for suspend driver B to needs to wait for driver A to suspend. For resume it means driver A needs to wait for driver B to resume.
Current proposed solutions
Rafael J. Wysocki described the functional dependency problem best recently and also proposed a new mechanism to annotate functional relationship between devices. As Rafael proposes, each device driver can claim a possible set of consumers it can help or a set of suppliers it needs.
In Rafael's proposal there is the concept of two types of links. A permanent link and a probe time link. A permanent link will stay around until one of the linked devices is unregistered, at which time the driver core will drop the link along with the device going away. A probe-time link will be dropped (automatically) at the consumer device driver unbind time.
What a solution means
A proper solution to this problem should not only help avoid using -EPROBE_DEFER but also help with enabling better ordering for suspend and resume. While we don't have a count for for device drivers needing a solution from a power management perspective as it stands we have well over ~500 uses of -EPROBE_DEFER on device drivers.
Once a formal solution on dependency annotation is found both at device probe and suspend/resume, devices depending on others can wait until its suppliers are made available. The ordering will happen asynchronously at first though. A SAT solver might help with analysis of the dependency graph and help with ordering and optimize boot time by ordering device ordering as best as possible.
Sorting functional dependencies between routines
Linux distributions tend to build either a very modular kernel or a large kernel with a lot of device drivers available and features available to suit a diverse set of needs. This is done to enable a single kernel to support different set of needs. Over the years this has meant that Linux has had to implement a diverse set of techniques to help avoid having to build specialized kernels for specialized needs. For example the Linux kernel implements a binary patching, where it can patch critical code at run time, once it knows it needs a different technique for a specialized run time environment. As the Linux kernel grows in size, and as its used in a richer set of environments it also means the complexity to address these different needs increases. There are both few overlooked problems with this and there also possible optimizations both short term and long term.
Routine code functional dependency
Addressing dead code
One overlooked problem that enabling different complex dynamic run time environments has on Linux is it can enables large amounts of code to be remain available even once knows one has reached a state at run time that certain code should never run. This is best described at length on the post Avoiding dead code pv_ops is not the silver bullet.
Addressing this problem for early boot confines the problem set to small set of variables, enabling simple run time dependency and sorting techniques to be proposed. If this type of mechanism becomes desirable outside of the scope of early boot, the number of variables involved can increase in size, potentially making a SAT solver a reasonable consideration at run time. The semantics and technical mechanism for routine functional dependency and annotation can enable further experimentation with these ideas. The current proposed solution for x86 on early boot provides a lightweight solution for both dependency annotation and sorting, both through a linker solution and a simple run time sort should dependencies only be available to be inspected during run time. The linker solution provides a light weight annotation for when dependency annotations can be made early on boot without any run time consideration.
Code ordering and asynchronous boot
The Linux kernel boot varies depending on architecture. For simplicity's sake let's only focus on the x86 boot for now. The first x86 boot entry vary depending on what hypervisor you use, if any, or if you boot using bare metal. Reducing this picture to consider only x86-64 on bare metal, KVM, and Xen, this is best summarized as follows:
Bare metal, KVM, Xen HVM Xen PV / dom0 startup_64() startup_xen() \ / x86_64_start_kernel() xen_start_kernel() \ / x86_64_start_reservations() | start_kernel() [ ... ] [ setup_arch() ] [ ... ] init
The kernel iterates over these synchronously. Although asynchronous probe is supported at the driver level, this can typically happen safely at the driver (non-built-in) as core requirements typically are fulfilled by the time drivers are needed, and because module loading also has a symbol resolution dependency map which ensures symbols that a module uses must be satisfied and available first before use.
For early initialization the kernel remains synchronous. It may remain a theoretical consideration that perhaps asynchronous boot may help with booting times, but for at least some early boot areas in the kernel this has already helped. As an example you can review the recent work by Mel Gorman on on memory on init through commits 1e8ce83cd17fd0f549a7ad145ddd2bfcdd7dfe37..0e1cc95b4cc7293bb7b39175035e7f7e45c90977, please note these also have a few follow-on fixes.
Striving to help the kernel boot asynchronously is nothing new, it was tried long ago and folks gave up due to the amount of issues found. These days we now have asynchronous probe available, and if folks want to help further built-in drivers can also take advantage of asynchronous probe, and if issues are found these must be addressed. Finding dependency requirements in order to do asynchronous loading is part of the task at hand required. Solving long complex dependency maps is an expected problem if we want to bring asynchronous boot down to lower levels of the kernel. This requires both efforts on semantics on annotating dependencies and then a separate effort on solving dependencies and ordering them to optimize boot.
Reducing kernel size
If a mechanism put in place is to help free unused code late at boot time it can mean trimming the kernel size further than is possible today than by just freeing init annotated code. Profiling code with large memory and having annotations of only run code and associated kconfig symbols can also mean having the ability to determine an optimal required kernel post boot.
Long term: kernel correctness
One area could be in select_idle_sibling() where the schedular checks if the overall compute and NUMA accesses of the system would be improved if the source tasks were migrated to another target CPU. Such use would require a resolution in the thousands of CPU cycles time frame, and the size of the problems will vary depending on the number of CPUs, topology, and workloads.
Another optimization consideration would be the requirement for scheduling decisions to have all data locally avaiable, offloading such problems would very likely not be viable solution, for instance, so fully hardware assisted SAT solvers may be required.