Compile-time pre-calculations in C++
constexpr functions and C++20’s consteval specifier, it is easy to do io-independent pre-calculations of algorithms while compiling the program. This may not be useful or even possible in long running programs and is unlikely to make a difference in their performance, but in binaries that do short calculations with a set of parameters fixed at compile-time, a Sieve of Eratosthenes array, or roots of unity used for calculating DFT loaded right from the binary could make a difference.
All code samples can be found at github.com/farnasirim/compile-time-cpp. There are also godbolt and cppinsights links for each snippet.
Here’s an example of a compile-time
is_prime implementation under
We are sweeping all values from
a - 1 to check if any of them divide
a, which is not efficient. To improve that, we can implement a compile-time sqrt using binary search or Newton’s method. In the following examples, I’ll stick to the full sweep all the way to
a - 1 for simplification, but the square root improvement can be plugged in in all of them.
The first problem with this approach is readability. Tedious template syntax make the code harder to understand and simple modifications might cause many lines to change. In addition to that, implementing the actual Sieve algorithm with good performance in a top-down, recursive, template-friendly format requires more effort. The code will also look a lot more convoluted.
A more serious problem is the maximum depth issue. In this case, a bit disappointing as well, given that we wouldn’t need to look past
HasDivisorGE<2000, 2> to see that 2000 is not prime.
The following snippets are takes on eliminating these two problems.
enable_if, let’s try to fix the depth issue. Here is a try at limiting the template instantiation when it is not needed.
This won’t compile. In
HasDivisorGE<v, d>::value with say,
[v = 7, d = 7], we already know that
(d < v) is
false, however the lazy evaluation of
&& helps with the value of
HasDivisorGE<v, d + 1>::value if it was already there, not with the initialization of the class template
[v = 7, d = 8]. Even though we already know the result at [v = 7, d = 7], we haven’t been able to keep the template instantiation from propagating further.
As a result, this approach also fails to keep
HasDivisorGE<2000, 2> from exceeding the maximum instantiation depth.
std::enable_if imposes the limitation after an instantiation is requested. We somehow need to stop the “unnecessary” instantiation from being requested in the first place.
Using an immediately invoked function expression (to turn multiple statements into a single expression) and
if constexpr, we can do:
Good, the code inside
if constexpr just has to be well-formed. If the condition is false, the enclosed code will not be evaluated.
IsPrime<2003> still exhausts the depth however. To mitigate that, we can use constexpr functions:
struct IsPrime is left in to keep
main() similar to previous examples. The allowed number of loop iterations is a limitation here, similar to the maximum depth issue. However for evaluating a single value per template instantiation, the default compiler settings for maximum number of iterations seem to be more permissive than the maximum instantiation depth.
When a function is marked with
consteval, every potential call to it must produce compile-time results. This way, we can force functions to be compile-time evaluated, making them uncallable from the a non-constexpr context. Contrary to
constexpr functions, non-constexpr expressions are not allowed in
consteval function bodies.
consteval functions appear to be faster in this case as well. In the Sieve example, a
consteval function generates the sieve array ~2 times faster than a
constexpr function (g++ 10.2.0 on Linux).
Here, we can end up doing a lot of duplicate work in compile time if we are not careful since for example
sieve<1001> cannot use
sieve<1000>’s result, but this can be resolved or improved by creating an indirection level over the sieve template. On the other hand, the array easily escapes to run-time land, allowing it to be used in non-constexpr contexts.