I wonder if, for example, the function below, compiled on a little-endian platform, will return true when the target is a big-endian platform.
constexpr bool is_little_endian()
{
int num = 1;
return (1 == *(char *)&num);
}
In other words, are constexpr evaluated as if on the target?
EDIT: This example isn't correct, but the question is still active.
First off: If you compile code for a given target, then the compiler will generate code for that target. This, of course, includes expressions that are evaluated at compile-time - otherwise every cross compilation that involved such expressions would be broken.
However, just marking a function as constexpr does not guarantee that it is evaluated at compile-time. In particular, your sample function cannot (according to the standard) be evaluated at compile-time, so it is orthogonal to the primary question.
As remarked in the comments, you can't really find out endianness at compile-time without querying the compiler directly. The compiler has to know (because it has to generate code) and any reasonable compiler will provide a way for you to query this information (at compile-time).
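For example, here is a minimal sketch that asks the compiler directly, assuming GCC or Clang (which predefine these macros for the target of the compilation); other compilers expose the same information differently:
constexpr bool is_little_endian()
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
    // These macros describe the target, so cross compilation gives the right answer.
    return __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__;
#else
#error "No byte-order macros on this compiler; query it some other way"
#endif
}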
This is not a valid constexpr function as it has reinterpret_cast baked into it. This makes the whole question moot.
And the reason why this is not a valid constexpr function is outlined here: https://en.cppreference.com/w/cpp/language/constexpr. In particular, a constexpr function should satisfy, among other requirements, the following criterion:
...there exists at least one set of argument values such that an
invocation of the function could be an evaluated subexpression of a
core constant expression
reinterpret_cast can never be part of a core constant expression.
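For completeness, C++20's std::bit_cast is allowed in constant expressions where reinterpret_cast is not, so the byte inspection the question attempts can be written legally. A sketch of my own (not part of this answer), assuming a C++20 compiler:
#include <array>
#include <bit>
#include <cstdint>

constexpr bool is_little_endian()
{
    // std::bit_cast may appear in a constant expression; reinterpret_cast may not.
    constexpr std::uint16_t one = 1;
    constexpr auto bytes = std::bit_cast<std::array<std::uint8_t, 2>>(one);
    return bytes[0] == 1;   // the low-order byte comes first on little-endian targets
}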
Yes, they are. If you have a C++20 compiler available (probably -std=c++2a), you can compile this for platforms with different endianness and see that it behaves correctly.
#include <bit>
#include <iostream>
constexpr bool are_all_scalar_types_little_endian() {
return std::endian::native == std::endian::little;
}
constexpr bool are_all_scalar_types_big_endian() {
return std::endian::native == std::endian::big;
}
int main() {
std::cout << std::boolalpha
<< "little: " << are_all_scalar_types_little_endian() << '\n'
<< "big : " << are_all_scalar_types_big_endian() << '\n'
<< "mixed : " <<
(are_all_scalar_types_little_endian()==are_all_scalar_types_big_endian()) << '\n';
}
This code, when compiled with g++ -O3, does not seem to evaluate get_fibonacci(50) at compile time - as it runs for a very long time.
#include <iostream>
constexpr long long get_fibonacci(int num){
if(num == 1 || num == 2){return 1;}
return get_fibonacci(num - 1) + get_fibonacci(num - 2);
}
int main()
{
std::cout << get_fibonacci(50) << std::endl;
}
Replacing the code with
#include <iostream>
constexpr long long get_fibonacci(int num){
if(num == 1 || num == 2){return 1;}
return get_fibonacci(num - 1) + get_fibonacci(num - 2);
}
int main()
{
long long num = get_fibonacci(50);
std::cout << num << std::endl;
}
worked perfectly fine. I don't know exactly why this is occurring, but my guess is that get_fibonacci(50) is not evaluated at compile-time in the first scenario because items given to std::cout are evaluated at runtime. Is my reasoning correct, or is something else happening? Can somebody please point me in the right direction?
Actually, both versions of your code do not have the Fibonacci number computed at compile-time, with typical compilers and compilation flags. But, interestingly enough, if you reduce the 50 to, say, 30, both versions of your program do get the compile-time evaluation.
Proof: GodBolt
At the link, your first program is compiled and run first with 50 as the argument to get_fibonacci(), then with 30, using GCC 10.2 and clang 11.0.
What you're seeing is the limit of the compiler's willingness to evaluate code at compile-time. Both compilers engage in the recursive evaluation at compile time, until a certain depth, or a certain cap on evaluation effort, is reached. They then give up and leave it for run-time evaluation.
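If you want the compile-time evaluation back without bumping into those limits, one option (a sketch of my own, not from this answer; it needs C++14 or later for a loop inside a constexpr function) is to rewrite the recursion iteratively so the evaluation stays cheap, and to force constant evaluation with a static_assert:
constexpr long long get_fibonacci_iter(int num) {
    // Iterative form: roughly num steps instead of exponentially many calls.
    long long a = 1, b = 1;
    for (int i = 2; i < num; ++i) {
        long long next = a + b;
        a = b;
        b = next;
    }
    return b;
}
// A static_assert must be evaluated at compile time, at any optimization level.
static_assert(get_fibonacci_iter(50) == 12586269025LL, "computed at compile time");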
I don't know exactly why this is occurring, but my guess is that get_fibonacci(50) is not evaluated at compile-time in the first scenario because items given std::cout are evaluated at runtime
Your function can be computed at compile time, because it receives a compile-time known value (50), but it can also be computed at run time, because the returned value is sent to standard output, so it is used at run time.
It's a gray area where the compiler can choose either solution.
To force the compile-time computation (ignoring the as-if rule), you can place the returned value somewhere a compile-time constant is required.
For example, in a template parameter, in your first example
std::cout << std::integral_constant<long long, get_fibonacci(50)>::value
<< std::endl;
or in a constexpr variable, in your second example
constexpr long long num = get_fibonacci(50);
But remember the "as-if rule": the compiler (in this case, even with constexpr or std::integral_constant) can still select the run-time solution, because doing so "does not change the observable behavior of the program".
Assign the result to a constexpr variable to force compile-time evaluation (and get the compiler to spit out an error message if it cannot do it):
constexpr auto val = get_fibonacci(50);
constexpr functions are evaluated at compile time only in constexpr contexts, which include the initialization of constexpr variables, template parameters, array sizes...
A regular function/operator call is not such a context.
std::cout << get_fibonacci(50);
is done at runtime.
Now, the compiler might optimize any function (constexpr or not, inline or not) under the as-if rule, resulting in a constant, a simpler loop, ...
How can two versions of the same function, differing only in one being inline and the other one not, return different values? Here is some code I wrote today and I am not sure how it works.
#include <cmath>
#include <iostream>
bool is_cube(double r)
{
return floor(cbrt(r)) == cbrt(r);
}
bool inline is_cube_inline(double r)
{
return floor(cbrt(r)) == cbrt(r);
}
int main()
{
std::cout << (floor(cbrt(27.0)) == cbrt(27.0)) << std::endl;
std::cout << (is_cube(27.0)) << std::endl;
std::cout << (is_cube_inline(27.0)) << std::endl;
}
I would expect all outputs to be equal to 1, but it actually outputs this (g++ 8.3.1, no flags):
1
0
1
instead of
1
1
1
Edit: clang++ 7.0.0 outputs this:
0
0
0
and g++ -Ofast this:
1
1
1
Explanation
Some compilers (notably GCC) use higher precision when evaluating expressions at compile time. If an expression depends only on constant inputs and literals, it may be evaluated at compile time even if the expression is not assigned to a constexpr variable. Whether or not this occurs depends on:
The complexity of the expression
The threshold the compiler uses as a cutoff when attempting to perform compile time evaluation
Other heuristics used in special cases (such as when clang elides loops)
If an expression is explicitly provided, as in the first case, it has lower complexity and the compiler is likely to evaluate it at compile time.
Similarly, if a function is marked inline, the compiler is more likely to evaluate it at compile time because inline functions raise the threshold at which evaluation can occur.
Higher optimization levels also increase this threshold, as in the -Ofast example, where all expressions evaluate to true on gcc due to higher precision compile-time evaluation.
We can observe this behavior here on compiler explorer. When compiled with -O1, only the function marked inline is evaluated at compile-time, but at -O3 both functions are evaluated at compile-time.
-O1: https://godbolt.org/z/u4gh0g
-O3: https://godbolt.org/z/nVK4So
NB: In the compiler-explorer examples, I use printf instead of iostream because it reduces the complexity of the main function, making the effect more visible.
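If you want to check on your own machine whether the run-time library's cbrt differs from the constant-folded result, here is a minimal sketch of my own (not one of the linked examples); the volatile read stops the compiler from folding the first call:
#include <cmath>
#include <cstdio>

int main() {
    volatile double r = 27.0;           // the volatile read defeats constant folding
    double at_runtime = std::cbrt(r);   // computed by the run-time libm
    double folded = std::cbrt(27.0);    // a candidate for compile-time folding
    std::printf("runtime cbrt(27.0) == 3.0: %d\n", at_runtime == 3.0);
    std::printf("folded  cbrt(27.0) == 3.0: %d\n", folded == 3.0);
}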
Demonstrating that inline doesn’t affect runtime evaluation
We can ensure that none of the expressions are evaluated at compile time by obtaining the value from standard input, and when we do this, all 3 expressions return false as demonstrated here: https://ideone.com/QZbv6X
#include <cmath>
#include <iostream>
bool is_cube(double r)
{
return floor(cbrt(r)) == cbrt(r);
}
bool inline is_cube_inline(double r)
{
return floor(cbrt(r)) == cbrt(r);
}
int main()
{
double value;
std::cin >> value;
std::cout << (floor(cbrt(value)) == cbrt(value)) << std::endl; // false
std::cout << (is_cube(value)) << std::endl; // false
std::cout << (is_cube_inline(value)) << std::endl; // false
}
Contrast with this example, where we use the same compiler settings but provide the value at compile-time, resulting in the higher-precision compile-time evaluation.
As observed, using the == operator to compare floating point values has resulted in different outputs with different compilers and at different optimization levels.
One good way to compare floating point values is the relative tolerance test outlined in the article: Floating-point tolerances revisited.
We first calculate the Epsilon (the relative tolerance) value which in this case would be:
double Epsilon = std::max(std::cbrt(r), std::floor(std::cbrt(r))) * std::numeric_limits<double>::epsilon();
And then use it in both the inline and non-inline functions in this manner:
return (std::fabs(std::floor(std::cbrt(r)) - std::cbrt(r)) < Epsilon);
The functions now are:
bool is_cube(double r)
{
double Epsilon = std::max(std::cbrt(r), std::floor(std::cbrt(r))) * std::numeric_limits<double>::epsilon();
return (std::fabs(std::floor(std::cbrt(r)) - std::cbrt(r)) < Epsilon);
}
bool inline is_cube_inline(double r)
{
double Epsilon = std::max(std::cbrt(r), std::floor(std::cbrt(r))) * std::numeric_limits<double>::epsilon();
return (std::fabs(std::floor(std::cbrt(r)) - std::cbrt(r)) < Epsilon);
}
Now the output will be as expected ([1 1 1]) with different compilers and at different optimization levels.
Live demo
I found this example at cppreference.com, and it seems to be the de facto example used throughout Stack Overflow:
template<int N>
struct S {
int a[N];
};
Surely, non-type templatization has more value than this example. What other optimizations does this syntax enable? Why was it created?
I am curious, because I have code that is dependent on the version of a separate library that is installed. I am working in an embedded environment, so optimization is important, but I would like to have readable code as well. That being said, I would like to use this style of templating to handle version differences (examples below). First, am I thinking of this correctly, and MOST IMPORTANTLY does it provide a benefit or drawback over using a #ifdef statement?
Attempt 1:
template<int VERSION = 500>
void print (char *s);
template<int VERSION>
void print (char *s) {
std::cout << "ERROR! Unsupported version: " << VERSION << "!" << std::endl;
}
template<>
void print<500> (char *s) {
// print using 500 syntax
}
template<>
void print<600> (char *s) {
// print using 600 syntax
}
OR, since the template parameter is a constant at compile time, could a compiler treat the other branches of the if statement as dead code, using syntax similar to:
Attempt 2:
template<int VERSION = 500>
void print (char *s) {
if (VERSION == 500) {
// print using 500 syntax
} else if (VERSION == 600) {
// print using 600 syntax
} else {
std::cout << "ERROR! Unsupported version: " << VERSION << "!" << std::endl;
}
}
Would either attempt produce output comparable in size to this?
void print (char *s) {
#if defined(500)
// print using 500 syntax
#elif defined(600)
// print using 600 syntax
#else
std::cout << "ERROR! Unsupported version: " << VERSION << "!" << std::endl;
#endif
}
If you can't tell I'm somewhat mystified by all this, and the deeper the explanation the better as far as I'm concerned.
Compilers find dead code elimination easy. That is the case where you have a chain of ifs depending (only) on a template parameter's value or type. All branches must contain valid code, but when compiled and optimized the dead branches evaporate.
A classic example is a per-pixel operation written with template parameters that control details of code flow. The body can be full of branches, yet the compiled output is branchless.
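A sketch of that pattern (my own illustration, not code from the answer): the blend mode is a non-type template parameter, so each instantiation keeps only its own branch:
enum class Blend { Copy, Add, Multiply };

template <Blend Mode>
unsigned char blend_pixel(unsigned char dst, unsigned char src) {
    // The chain of ifs depends only on the template parameter,
    // so the dead branches evaporate in every instantiation.
    if (Mode == Blend::Copy)
        return src;
    else if (Mode == Blend::Add)
        return static_cast<unsigned char>(dst + src > 255 ? 255 : dst + src);
    else
        return static_cast<unsigned char>(dst * src / 255);
}

// e.g. blend_pixel<Blend::Add>(200, 100) compiles to straight-line code.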
Similar techniques can be used to unroll loops (say scanline loops). Care must be taken to understand the code size multiplication that can result: especially if your compiler lacks ICF (aka comdat folding) such as the gold gcc linker and msvc (among others) have.
Fancier things can also be done, like manual jump tables.
You can do pure compile-time type checks with no runtime behaviour at all, for things like dimensional analysis, or to distinguish between points and vectors in n-space.
Enums can be used to name types or switches. Pointers to functions to enable efficient inlining. Pointers to data to allow 'global' state that is mockable, or siloable, or decoupled from implementation. Pointers to strings to allow efficient readable names in code. Lists of integral values for myriads of purposes, like the indexes trick to unpack tuples. Complex operations on static data, like compile time sorting of data in multiple indexes, or checking integrity of static data with complex invariants.
I am sure I missed some.
An obvious optimization is that when an integer is passed as a template argument, the compiler has a constant rather than a variable:
int foo(size_t); // definition not visible
// vs
template<size_t N>
size_t foo() {return N*N;}
With the template, there's nothing to compute at runtime, and the result may be used as a constant, which can aid other optimizations. You can take this example further by declaring it constexpr, as 5gon12eder mentioned below.
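For instance, a small sketch of the constexpr variant mentioned above (a fragment building on the snippet, like the original):
template<size_t N>
constexpr size_t foo() { return N * N; }

static_assert(foo<4>() == 16, "nothing left to compute at runtime");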
Next example:
int foo(double, size_t); // definition not visible
// vs
template<size_t N>
size_t foo(double p) {
double r(p);
for (size_t i(0); i < N; ++i) {
r *= p;
}
return r;
}
Ok. Now the number of iterations of the loop is known. The loop may be unrolled/optimized accordingly, which can be good for size, speed, and eliminating branches.
Also, basing off your example, std::array<> exists. std::array<> can be much better than std::vector<> in some contexts, because std::vector<> uses heap allocations and non-local memory.
There's also the possibility that some specializations will have different implementations. You can separate those and (potentially) reduce other referenced definitions.
Of course, templates<> can also work against you by unnecessarily duplicating code across your program.
templates<> also require longer symbol names.
Getting back to your version example: Yes, it's certainly possible that if VERSION is known at compilation, the code which is never executed can be deleted and you may also be able to reduce referenced functions. The primary difference will be that void print (char *s) will have a shorter name than the template (whose symbol name includes all template parameters). For one function, that's counting bytes. For complex programs with many functions and templates, that cost can go up quickly.
There is an enormous range of potential applications of non-typename template parameters. In his book The C++ Programming Language, Stroustrup gives an interesting example that sketches out a type-safe zero-overhead framework for dealing with physical quantities. Basically, the idea is that he writes a template that accepts integers denoting the powers of fundamental physical quantities such as length or mass and then defines arithmetic on them. In the resulting framework, you can add speed with speed or divide distance by time but you cannot add mass to time. Have a look at Boost.Units for an industry-strength implementation of this idea.
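A heavily simplified sketch of that idea (my own illustration; the real Boost.Units library is far more general): the exponents of the base dimensions are non-type template parameters, so dimensional mistakes become compile-time errors:
template <int M, int K, int S>   // exponents of metres, kilograms, seconds
struct Quantity {
    double value;
};

template <int M, int K, int S>
Quantity<M, K, S> operator+(Quantity<M, K, S> a, Quantity<M, K, S> b) {
    return {a.value + b.value};              // only identical dimensions add
}

template <int M1, int K1, int S1, int M2, int K2, int S2>
Quantity<M1 - M2, K1 - K2, S1 - S2> operator/(Quantity<M1, K1, S1> a,
                                              Quantity<M2, K2, S2> b) {
    return {a.value / b.value};              // division subtracts exponents
}

using Distance = Quantity<1, 0, 0>;
using Time     = Quantity<0, 0, 1>;
using Speed    = Quantity<1, 0, -1>;

// Speed v = Distance{100.0} / Time{9.58};   // fine: metres / seconds
// auto oops = Distance{1.0} + Time{1.0};    // rejected at compile time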
For your second question. Any reasonable compiler should be able to produce exactly the same machine code for
#define FOO
#ifdef FOO
do_foo();
#else
do_bar();
#endif
and
#define FOO_P 1
if (FOO_P)
do_foo();
else
do_bar();
except that the second version is much more readable and the compiler can catch errors in both branches simultaneously. Using a template is a third way to generate the same code but I doubt that it will improve readability.
Possible Duplicate:
How much is too much with C++0x auto keyword
I find that using "auto" near critical values may cause some problems.
This is the example code:
#include <iostream>
#include <typeinfo>
#include <limits>
using std::cout;
using std::endl;
using std::numeric_limits;
using std::cerr;
int main() {
auto i = 2147483647 /* numeric_limits<int>::max() */ ;
cout << "The type of i is " << typeid(i).name() << endl;
int count = 0;
for (auto i = 2147483647;
i < 2147483657 /* numeric_limits<int>::max() + 10 */ ; ++i) {
cout << "i = " << i << " " << endl;
if (count > 30) {
cerr << "Too many loops." << endl;
break;
}
++count;
}
return 0;
}
The "auto" decides that the type of "i" is int, but the upper limit of int is 2147483647, which easily overflows.
Those are the outputs on Ideone (gcc-4.5.1) and LWS (gcc-4.7.2). They're different: "i" remains 2147483647 in the loops on Ideone (gcc-4.5.1) and overflows on LWS (gcc-4.7.2). But neither of them is the expected result: 10 iterations, adding 1 each time.
Should I avoid using "auto" near critical values? Or how do I use "auto" appropriately here?
UPDATE: Someone says "Use auto everywhere you can." in this thread. I don't think that's quite right. Type "long long int" is more appropriate than type "int" here. I wonder where I can use "auto" safely and where I can't.
UPDATE 2: The solution 4(b) of the article by Herb Sutter should have answered the question.
You should only rely on type deduction to work out the type of your variables if it's going to be correct. Here, the compiler makes the deduction that it's an int, which is right as far as the standard is concerned, but your specific problem requires another type with a larger range. When you use auto, you're saying "the compiler knows best", but the compiler doesn't always know everything.
You wouldn't use auto here, just as you wouldn't use int. You could make your literal have higher rank (stick L or LL after it - although they're not guaranteed to be any larger than your int) and then auto would deduce a larger integral type.
Not to mention that auto really saves you nothing in this case. auto is usually used to avoid typing long, ugly types or types that you don't know. In this case, the type is not long and ugly, and you do know it.
auto is just syntactic sugar. It isn't a type; it infers the type of the expression on the right-hand side and uses that as the variable's type.
If you give it a literal, it will just infer the literal's default type.
You just need to know what the actual type is.
A numeric literal (without a decimal point or suffix) is given type int if the value fits; otherwise the compiler picks a wider integer type. You can also change its type explicitly with a suffix.
int x = 2147483657;   // 2147483657 does not fit in an int, so the
                      // initialization converts it; for a signed target
                      // type the result is implementation-defined.
long x = 2147483657L; // The L suffix tells the compiler to treat it as a long.
// Here you will get the correct value assuming long is larger than int.
In your case:
for(auto i = 2147483647; i < 2147483657;)  // is not going to work: i is deduced
                                           // as int and overflows when incremented.
// Try correct types:
for(auto i = 2147483647L; i < 2147483657L;++i) //Now it should work correctly.
You are expecting too much out of auto. Your expectation is that auto will automatically deduce the type which is best for the manipulation that you are going to perform on your variable. This is semantic analysis and compilers are not expected to do that (most often, they cannot). They can't look forward into the way you are going to use the variable you declare later on in your program.
The auto keyword only saves you from the burden of explicitly writing on the left the type of the expression appearing on the right, avoiding possible redundancy and all problems connected with it (what if the type of the expression on the right changes?)
This said, all other answers are correct: if you want your variable i not to overflow, you should assign to it a long long literal (using the LL suffix).
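For example, a tiny sketch of the suffix fix:
auto i = 2147483647LL;   // the LL suffix makes the literal long long,
                         // so auto deduces long long and i < 2147483657LL
                         // behaves as intended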
I'm using gcc 4.6.1 and am getting some interesting behavior involving calling a constexpr function. This program runs just fine and straight away prints out 12200160415121876738.
#include <iostream>
extern const unsigned long joe;
constexpr unsigned long fib(unsigned long int x)
{
return (x <= 1) ? 1 : (fib(x - 1) + fib(x - 2));
}
const unsigned long joe = fib(92);
int main()
{
::std::cout << "Here I am!\n";
::std::cout << joe << '\n';
return 0;
}
This program takes forever to run and I've never had the patience to wait for it to print out a value:
#include <iostream>
constexpr unsigned long fib(unsigned long int x)
{
return (x <= 1) ? 1 : (fib(x - 1) + fib(x - 2));
}
int main()
{
::std::cout << "Here I am!\n";
::std::cout << fib(92) << '\n';
return 0;
}
Why is there such a huge difference? Am I doing something wrong in the second program?
Edit: I'm compiling this with g++ -std=c++0x -O3 on a 64-bit platform.
joe is an Integral Constant Expression; it must be usable in array bounds. For that reason, a reasonable compiler will evaluate it at compile time.
In your second program, even though the compiler could calculate it at compile time, there's no reason why it must.
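To see the first point in action, a sketch of my own (not from this answer): because joe's initializer is a constant expression, it can appear where the compiler has no choice but to produce the value at compile time, e.g. in a static_assert placed after joe's definition:
// Value taken from the question's output; assumes 64-bit unsigned long.
static_assert(joe == 12200160415121876738ULL,
              "fib(92) must have been folded at compile time");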
My best guess would be that program number one had fib(92) evaluated at compile time, with lots of tables and whatnot for the compiler to keep track of which values have already been evaluated, making running the program almost trivial.
Whereas the second version is actually evaluated at run-time, without lookup tables of already-evaluated constant expressions, meaning that evaluating fib(92) makes something like 2**92 recursive calls.
In other words, the compiler does not take advantage of the fact that fib(92) is a constant expression.
There's wiggle room for the compiler to decide not to evaluate at compile time if it thinks something is "too complicated". That's in cases where it's not being absolutely forced to do the evaluation in order to generate a correct program that can actually be run (as #MSalters points out).
I thought perhaps the decision affecting compile-time laziness would be the recursion depth limit. (The spec suggests 512, but you can bump it up with the command-line flag -fconstexpr-depth if you want to.) But rather, that controls when the compiler gives up in any case... even when a compile-time constant is necessary to build the program. So it has no effect on your case.
It seems if you want a guarantee in the code that it will do the optimization then you've found a technique for that. But if constexpr-depth can't help, I'm not sure if there are any relevant compiler flags otherwise...
I also wanted to see how gcc optimized the code for this new constexpr keyword, and actually it's just because you are calling fib(92) as a parameter of ostream::operator<<
::std::cout << fib(92) << '\n';
that it isn't evaluated at compile time. If you call it somewhere other than as a parameter of another function (as you did in)
const unsigned long joe = fib(92);
it is evaluated at compile time. I wrote a blog post about this if you want more info; I don't know whether this should be mentioned to the gcc developers.