Saturday, May 29, 2021

Compile-time pre-calculations in C++

With C++17’s constexpr functions and C++20’s consteval specifier, it is easy to do io-independent pre-calculations of algorithms while compiling the program. This may not be useful or even possible in long running programs and is unlikely to make a difference in their performance, but in binaries that do short calculations with a set of parameters fixed at compile-time, a Sieve of Eratosthenes array, or roots of unity used for calculating DFT loaded right from the binary could make a difference.

All code samples can be found at github.com/farnasirim/compile-time-cpp. There are also godbolt and cppinsights links for each snippet.

C++98

Here’s an example of a compile-time is_prime implementation under -std=c++98.

// \file compile-time-cpp/is-prime-98.cc
#include <iostream>
  
// HasDivisorGE::value = true if v is divisible by d or any number greater
// than or equal to d.
template<int v, int d>
struct HasDivisorGE {
  static const bool value = (v % d == 0) || HasDivisorGE<v, d + 1>::value;
};
  
template<int v>
struct HasDivisorGE<v, v> {
  static const bool value = false;
};
  
// IsPrime::value = true if v is prime, flase otherwise.
template<int v>
struct IsPrime {
  static const bool value = !HasDivisorGE<v, 2>::value;
};
  
int main() {
  std::cout << 7 << " : " << IsPrime<7>::value << std::endl;
  
  // fatal error: template instantiation depth exceeds maximum of 900
  // (use ‘-ftemplate-depth=’ to increase the maximum)
  // std::cout << 2000 << " : " << IsPrime<2000>::value << std::endl;
  return 0;
}

https://cppinsights.io/s/cebfe32c

https://godbolt.org/z/GGYGEoqsv

We are sweeping all values from 2 to a - 1 to check if any of them divide a, which is not efficient. To improve that, we can implement a compile-time sqrt using binary search or Newton’s method. In the following examples, I’ll stick to the full sweep all the way to a - 1 for simplification, but the square root improvement can be plugged in in all of them.

The first problem with this approach is readability. Tedious template syntax make the code harder to understand and simple modifications might cause many lines to change. In addition to that, implementing the actual Sieve algorithm with good performance in a top-down, recursive, template-friendly format requires more effort. The code will also look a lot more convoluted.

A more serious problem is the maximum depth issue. In this case, a bit disappointing as well, given that we wouldn’t need to look past HasDivisorGE<2000, 2> to see that 2000 is not prime.

The following snippets are takes on eliminating these two problems.

C++14

Equipped with constexpr and enable_if, let’s try to fix the depth issue. Here is a try at limiting the template instantiation when it is not needed.

// \file compile-time-cpp/is-prime-14.cc
#include <iostream>
  
template<int v, int d, typename = std::enable_if_t<(d <= v)>>
struct HasDivisorGE {
  static const bool value = (d < v) &&
    ((v % d == 0) || HasDivisorGE<v, d + 1>::value);
};
  
template<int v>
struct IsPrime {
  static const bool value = !HasDivisorGE<v, 2>::value;
};
  
int main() {
  // In instantiation of ‘const bool HasDivisorGE<7, 7, void>::value’:
  //   recursively required from ‘const bool HasDivisorGE<7, 3, void>::value’
  //   required from ‘const bool HasDivisorGE<7, 2, void>::value’
  //   required from ‘const bool IsPrime<7>::value’
  //   required from here
  // error: no type named ‘type’ in ‘struct std::enable_if<false, void>’
  //       |     ((v % d == 0) || HasDivisorGE<v, d + 1>::value);
  //                                                      ^~~~~
  std::cout << 7 << " : " << IsPrime<7>::value << std::endl;
  
  // fatal error: template instantiation depth exceeds maximum of 900
  // (use ‘-ftemplate-depth=’ to increase the maximum)
  std::cout << 2000 << " : " << IsPrime<2000>::value << std::endl;
  return 0;
}

https://cppinsights.io/s/a6034994

https://godbolt.org/z/bqKhzs3a6

This won’t compile. In HasDivisorGE<v, d>::value with say, [v = 7, d = 7], we already know that (d < v) is false, however the lazy evaluation of && helps with the value of HasDivisorGE<v, d + 1>::value if it was already there, not with the initialization of the class template HasDivisorGE with [v = 7, d = 8]. Even though we already know the result at [v = 7, d = 7], we haven’t been able to keep the template instantiation from propagating further.

As a result, this approach also fails to keep HasDivisorGE<2000, 2> from exceeding the maximum instantiation depth. std::enable_if imposes the limitation after an instantiation is requested. We somehow need to stop the “unnecessary” instantiation from being requested in the first place.

C++17

Using an immediately invoked function expression (to turn multiple statements into a single expression) and if constexpr, we can do:

// \file compile-time-cpp/is-prime-17.cc
#include <iostream>
  
template<int v, int d>
struct HasDivisorGE {
  static constexpr bool value = []{
    if constexpr (d >= v) {
      return false;
    } else if constexpr (v % d == 0) {
      return true;
    } else {
      return HasDivisorGE<v, d + 1>::value;
    }
  }();
};
  
template<int v>
struct IsPrime {
  static const bool value = !HasDivisorGE<v, 2>::value;
};
  
int main() {
  std::cout << 7 << " : " << IsPrime<7>::value << std::endl;
  std::cout << 2000 << " : " << IsPrime<2000>::value << std::endl;
  
  // fatal error: template instantiation depth exceeds maximum of 900
  // (use ‘-ftemplate-depth=’ to increase the maximum)
  // std::cout << 2003 << " : " << IsPrime<2003>::value << std::endl;
  
  return 0;
}

https://cppinsights.io/s/735420fe

https://godbolt.org/z/Ysb1Th7cf

Good, the code inside if constexpr just has to be well-formed. If the condition is false, the enclosed code will not be evaluated. IsPrime<2003> still exhausts the depth however. To mitigate that, we can use constexpr functions:

// \file compile-time-cpp/is-prime-17-constexpr-func.cc
#include <iostream>
  
constexpr bool is_prime(int v) {
  for (int i = 2; i < v; i++) {
    if (v % i == 0) {
      return false;
    }
  }
  return true;
}
  
template<int v>
struct IsPrime {
  static constexpr bool value = is_prime(v);
};
  
int main() {
  std::cout << 7 << " : " << IsPrime<7>::value << std::endl;
  std::cout << 2000 << " : " << IsPrime<2000>::value << std::endl;
  std::cout << 2003 << " : " << IsPrime<2003>::value << std::endl;
  
  return 0;
}

https://cppinsights.io/s/f75ad687

https://godbolt.org/z/5qeMo5ozW

Where struct IsPrime is left in to keep main() similar to previous examples. The allowed number of loop iterations is a limitation here, similar to the maximum depth issue. However for evaluating a single value per template instantiation, the default compiler settings for maximum number of iterations seem to be more permissive than the maximum instantiation depth.

C++20

When a function is marked with consteval, every potential call to it must produce compile-time results. This way, we can force functions to be compile-time evaluated, making them uncallable from the a non-constexpr context. Contrary to constexpr functions, non-constexpr expressions are not allowed in consteval function bodies. consteval functions appear to be faster in this case as well. In the Sieve example, a consteval function generates the sieve array ~2 times faster than a constexpr function (g++ 10.2.0 on Linux).

// \file compile-time-cpp/is-prime-20-consteval-func.cc
#include <iostream>
#include <array>
  
template<int v>
consteval std::array<int, v + 1> sieve() {
  std::array<int, v + 1> arr = {};
  for(long long i = 2; i <= v; i++) {
    if(arr[i]) {
      continue;
    }
    for(long long j = i * i; j <= v; j+= i) {
      arr[j] = 1;
    }
  }
  return arr;
}
  
int main() {
  auto sieve_array = sieve<12345>();
  std::cout << 7 << " : " << sieve_array[7] << std::endl;
  std::cout << 2000 << " : " << sieve_array[2000]<< std::endl;
  std::cout << 2003 << " : " << sieve_array[2003]<< std::endl;
  
  size_t i = 0;
  std::cin >> i;
  std::cout << sieve_array[i] << std::endl;
  
  return 0;
}

https://cppinsights.io/s/d06a597a

https://godbolt.org/z/sa3x3oWqd

Here, we can end up doing a lot of duplicate work in compile time if we are not careful since for example sieve<1001> cannot use sieve<1000>’s result, but this can be resolved or improved by creating an indirection level over the sieve template. On the other hand, the array easily escapes to run-time land, allowing it to be used in non-constexpr contexts.

hackernews thread