Performance loss when initializing variables in constexpr functions

Performance loss when initializing variables in constexpr functions - c++

According to this question it's impossible to leave variables uninitialized inside a constexpr function. Sometimes for performance reasons, we do not want to intialize variables though. Is it possible to "overload" the function somehow, so it allows for a constexpr version and a higher-performance non-constexpr function?
As an example, consider the following add function in a custom class vec:
auto add(vec that) const {
vec ret;
for (int i = 0; i < n; i++)
ret[i] = (*this)[i] + that[i];
return ret;
}
constexpr auto add(vec that) const {
vec ret = {};
for (int i = 0; i < n; i++)
ret[i] = (*this)[i] + that[i];
return ret;
}

The C++ compiler is very good at optimizations, especially inside constexpr functions. The initialization will very likely be optimized and have no additional cost, and in your case it doesn't even matter since declaring a vector already initializes it to an empty vector.

Related

C++ Lambda Overhead

I have an O(N^4) scaling algorithm of the form
...
...
...
for (unsigned i = 0; i < nI; ++i) {
for (unsigned j = 0; j < nJ; ++j) {
for (unsigned k = 0; k < nK; ++k) {
for (unsigned l = 0; l < nL; ++l) {
*calculate value*
*do something with value*
}
}
}
}
I need this code in a couple of places so I put it a looper function as part of a class. This loop function is templated so that it can accept a lambda function which takes care of *do something with value*.
Some tests have shown that this is not optimal performance-wise but I do not have any idea on how to get around explicitly writing out this code every time I need it. Do you see a way of doing this?

Using a templated function to call the lambda should generate a code that can be optimized by modern optimizing compilers. It is actually the case for the last version of GCC, Clang and MSVC. You can check that on GodBolt with this code:
extern int unknown1();
extern int unknown2(int);
template <typename LambdaType>
int compute(LambdaType lambda, int nI, int nJ, int nK, int nL)
{
int sum = 0;
for (unsigned i = 0; i < nI; ++i) {
for (unsigned j = 0; j < nJ; ++j) {
for (unsigned k = 0; k < nK; ++k) {
for (unsigned l = 0; l < nL; ++l) {
sum += lambda(i, j, k, l);
}
}
}
}
return sum;
}
int caller(int nI, int nJ, int nK, int nL)
{
int context = unknown1();
auto lambda = [&](int i, int j, int k, int l) -> int {
return unknown2(context + i + j + k + l);
};
return compute(lambda, nI, nJ, nK, nL);
}
Using optimization flags, GCC, Clang and MSVC are capable of generating an efficient implementation of compute eliding the lambda calls in the 4 nested loops (unknown2 is directly called in the generated assembly). This is the case even if compute is not inlined. Note the fact that the lambda capture its context do not actually prevent optimisations (although this is much harder for the compiler to optimize this case).
Note that this is important not to use the direct lambda type and not wrappers like std::function as wrapper will likely prevent optimizations (or at least make optimizations much more difficult to apply) resulting in direct function calls. Indeed, the type help the compiler to inline the function and then apply further optimizations like vectorization and constant propagation.
Note that the code of the lambda should be kept small. Otherwise, it may not be inlined resulting in a function call. A direct function call is not so slow with if the function body is pretty big on modern processors because of good branch prediction units and relatively fast large caches. However, the cost of preventing further optimizations mostly possible due to the lambda inlining can be huge. One way to mitigate this cost is to move at least one loop in the lambda (see Data-oriented design for more information). Another solution is to use OpenMP to help the compiler vectorizing the lambda thanks to #pragma omp declare simd [...] directives (assuming your compiler supports it). You can also play with compiler inlining command-line parameters to tell your compiler to actually inline the lambda in such a case.

Is it possible in C++ to use the same code with and without compile time constants?

say you have a function like:
double do_it(int m)
{
double result = 0;
for(int i = 0; i < m; i++)
result += i;
return result;
}
If you know m at compile time you can do:
template<size_t t_m>
double do_it()
{
double result = 0;
for(int i = 0; i < t_m; i++)
result += i;
return result;
}
This gives a possibility for things like loop unrolling when optimizing. But, sometimes you might know some cases at compile-time and some at run-time. Or, perhaps you have defaults which a user could change...but it would be nice to optimize the default case.
I'm wondering if there is any way to provide both versions without basically duplicating the code or using a macro?
Note that the above is a toy example to illustrate the point.

In terms of the language specification, there's no general way to have a function that works in the way you desire. But that doesn't mean compilers can't do it for you.
This gives a possibility for things like loop unrolling when optimizing.
You say this as though the compiler cannot unroll the loop otherwise.
The reason the compiler can unroll the template loop is because of the confluence of the following:
The compiler has the definition of the function. In this case, the function definition is provided (it's a template function, so its definition has to be provided).
The compiler has the compile-time value of the loop counter. In this case, through the template parameter.
But none of these factors explicitly require a template. If the compiler has the definition of a function, and it can determine the compile-time value of the loop counter, then it has 100% of the information needed to unroll that loop.
How it gets this information is irrelevant. It could be an inline function (you have to provide the definition) which you call given a compile-time constant as an argument. It could be a constexpr function (again, you have to provide the definition) which you call given a compile-time constant as an argument.
This is a matter of quality of implementation, not of language. If compile-time parameters are to ever be a thing, it would be to support things you cannot do otherwise, not to support optimization (or at least, not compiler optimizations). For example, you can't have a function which returns a std::array whose length is specified by a regular function parameter rather than a template parameter.

Yes you can, with std::integral_constant. Specifically, the following function will work with an int, as well as specializations of std::integral_constant.
template<class Num>
constexpr double do_it(Num m_unconverted) {
double result = 0.;
int m_converted = static_cast<int>(m_unconverted);
for(int i = 0; i < m_converted; i++){ result += i; }
return result;
}
If you want to call do_it with a compile-time constant, then you can use
constexpr double result = do_it(std::integral_constant<int, 5>{});
Otherwise, it's just
double result = do_it(some_number);

Use constexpr (needs at least C++14 to allow for):
constexpr double do_it(int m)
{
double result = 0;
for(int i = 0; i < m; i++)
result += i;
return result;
}
constexpr double it_result = do_it(10) + 1; // compile time `do_it`, possibly runtime `+ 1`
int main() {
int x;
cin >> x;
do_it(x); // runtime
}
If you want to force a constexpr value to be inlined as part of a runtime expression, you can use the FORCE_CT_EVAL macro from this comment:
#include <utility>
#define FORCE_CT_EVAL(func) [](){constexpr auto ___expr = func; return std::move(___expr);}()
double it_result = FORCE_CT_EVAL(do_it(10)); // compile time

Why type deduction for auto specifier cares about only init field of the for-loop?

The following example seems to be very easy and straightforward:
void ftest(size_t& arg)
{
std::cout << arg << '\n';
}
int main()
{
size_t max = 5;
for (auto i = 0; i < max; ++i)
ftest(i);
}
but it won't compile (at least using VS2013) because the i is deduced as int and not as size_t. And the question is -- what is the point of auto in such for-loops if it can't rely on the conditional field? Would it be too much hard and time consuming if a compile analyze the whole statement and give expected result instead of what we're having now?

Because the type of variable is determined when declared (from its initializer), it has nothing do with how it will be used. If necessary type conversion would be considered. The rule is same as variables declared with type specified explicitly, auto is just helping you to deduce the type, it's not special.
Try to consider about this:
auto max = 5u;
for (auto i = 0; i < max; ++i)
// ~~~~~~~~
// i should be unsigned int, or max should be int ?
BTW: You could use decltype if you want the type to be determined by the conditional field max:
for (decltype(max) i = 0; i < max; ++i)

Keyword auto has nothing do with the rest of for statement, neither it knows about it. If you say it should infer from max, you are saying delay the type deduction, which won't comply with the auto type-inference rules.
Additionally, what about this?
size_t max = 5;
short min = 1;
for (auto i = 0; i < max && i > min; ++i)
Should it infer to short or size_t ? You cannot make compiler to read your mind!
Also, such delayed inference rules (if any) would complicate templates meta-programming.

You are actually providing a very simple case, where the condition is a simple i < max where the type of max is known. The standard tries to provide rules that apply in all cases, now let's consider this:
bool f(int);
bool f(size_t);
for (auto i = 0; f(i); ++i) { }
If the type of i was dependent on the conditional expression in the for loop, your compiler will probably not be happy.
Also, Herb Sutter as a small post on its blog about this issue actually: https://herbsutter.com/2015/01/14/reader-qa-auto-and-for-loop-index-variables/

How do I deal with "signed/unsigned mismatch" warnings (C4018)?

I work with a lot of calculation code written in c++ with high-performance and low memory overhead in mind. It uses STL containers (mostly std::vector) a lot, and iterates over that containers almost in every single function.
The iterating code looks like this:
for (int i = 0; i < things.size(); ++i)
{
// ...
}
But it produces the signed/unsigned mismatch warning (C4018 in Visual Studio).
Replacing int with some unsigned type is a problem because we frequently use OpenMP pragmas, and it requires the counter to be int.
I'm about to suppress the (hundreds of) warnings, but I'm afraid I've missed some elegant solution to the problem.
On iterators. I think iterators are great when applied in appropriate places. The code I'm working with will never change random-access containers into std::list or something (so iterating with int i is already container agnostic), and will always need the current index. And all the additional code you need to type (iterator itself and the index) just complicates matters and obfuscates the simplicity of the underlying code.

It's all in your things.size() type. It isn't int, but size_t (it exists in C++, not in C) which equals to some "usual" unsigned type, i.e. unsigned int for x86_32.
Operator "less" (<) cannot be applied to two operands of different sign. There's just no such opcodes, and standard doesn't specify, whether compiler can make implicit sign conversion. So it just treats signed number as unsigned and emits that warning.
It would be correct to write it like
for (size_t i = 0; i < things.size(); ++i) { /**/ }
or even faster
for (size_t i = 0, ilen = things.size(); i < ilen; ++i) { /**/ }

Ideally, I would use a construct like this instead:
for (std::vector<your_type>::const_iterator i = things.begin(); i != things.end(); ++i)
{
// if you ever need the distance, you may call std::distance
// it won't cause any overhead because the compiler will likely optimize the call
size_t distance = std::distance(things.begin(), i);
}
This a has the neat advantage that your code suddenly becomes container agnostic.
And regarding your problem, if some library you use requires you to use int where an unsigned int would better fit, their API is messy. Anyway, if you are sure that those int are always positive, you may just do:
int int_distance = static_cast<int>(distance);
Which will specify clearly your intent to the compiler: it won't bug you with warnings anymore.

If you can't/won't use iterators and if you can't/won't use std::size_t for the loop index, make a .size() to int conversion function that documents the assumption and does the conversion explicitly to silence the compiler warning.
#include <cassert>
#include <cstddef>
#include <limits>
// When using int loop indexes, use size_as_int(container) instead of
// container.size() in order to document the inherent assumption that the size
// of the container can be represented by an int.
template <typename ContainerType>
/* constexpr */ int size_as_int(const ContainerType &c) {
const auto size = c.size(); // if no auto, use `typename ContainerType::size_type`
assert(size <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
return static_cast<int>(size);
}
Then you write your loops like this:
for (int i = 0; i < size_as_int(things); ++i) { ... }
The instantiation of this function template will almost certainly be inlined. In debug builds, the assumption will be checked. In release builds, it won't be and the code will be as fast as if you called size() directly. Neither version will produce a compiler warning, and it's only a slight modification to the idiomatic loop.
If you want to catch assumption failures in the release version as well, you can replace the assertion with an if statement that throws something like std::out_of_range("container size exceeds range of int").
Note that this solves both the signed/unsigned comparison as well as the potential sizeof(int) != sizeof(Container::size_type) problem. You can leave all your warnings enabled and use them to catch real bugs in other parts of your code.

You can use:
size_t type, to remove warning messages
iterators + distance (like are first hint)
only iterators
function object
For example:
// simple class who output his value
class ConsoleOutput
{
public:
ConsoleOutput(int value):m_value(value) { }
int Value() const { return m_value; }
private:
int m_value;
};
// functional object
class Predicat
{
public:
void operator()(ConsoleOutput const& item)
{
std::cout << item.Value() << std::endl;
}
};
void main()
{
// fill list
std::vector<ConsoleOutput> list;
list.push_back(ConsoleOutput(1));
list.push_back(ConsoleOutput(8));
// 1) using size_t
for (size_t i = 0; i < list.size(); ++i)
{
std::cout << list.at(i).Value() << std::endl;
}
// 2) iterators + distance, for std::distance only non const iterators
std::vector<ConsoleOutput>::iterator itDistance = list.begin(), endDistance = list.end();
for ( ; itDistance != endDistance; ++itDistance)
{
// int or size_t
int const position = static_cast<int>(std::distance(list.begin(), itDistance));
std::cout << list.at(position).Value() << std::endl;
}
// 3) iterators
std::vector<ConsoleOutput>::const_iterator it = list.begin(), end = list.end();
for ( ; it != end; ++it)
{
std::cout << (*it).Value() << std::endl;
}
// 4) functional objects
std::for_each(list.begin(), list.end(), Predicat());
}

C++20 has now std::cmp_less
In c++20, we have the standard constexpr functions
std::cmp_equal
std::cmp_not_equal
std::cmp_less
std::cmp_greater
std::cmp_less_equal
std::cmp_greater_equal
added in the <utility> header, exactly for this kind of scenarios.
Compare the values of two integers t and u. Unlike builtin comparison operators, negative signed integers always compare less than (and not equal to) unsigned integers: the comparison is safe against lossy integer conversion.
That means, if (due to some wired reasons) one must use the i as integer, the loops, and needs to compare with the unsigned integer, that can be done:
#include <utility> // std::cmp_less
for (int i = 0; std::cmp_less(i, things.size()); ++i)
{
// ...
}
This also covers the case, if we mistakenly static_cast the -1 (i.e. int)to unsigned int. That means, the following will not give you an error:
static_assert(1u < -1);
But the usage of std::cmp_less will
static_assert(std::cmp_less(1u, -1)); // error

I can also propose following solution for C++11.
for (auto p = 0U; p < sys.size(); p++) {
}
(C++ is not smart enough for auto p = 0, so I have to put p = 0U....)

I will give you a better idea
for(decltype(things.size()) i = 0; i < things.size(); i++){
//...
}
decltype is
Inspects the declared type of an entity or the type and value category
of an expression.
So, It deduces type of things.size() and i will be a type as same as things.size(). So,
i < things.size() will be executed without any warning

I had a similar problem. Using size_t was not working. I tried the other one which worked for me. (as below)
for(int i = things.size()-1;i>=0;i--)
{
//...
}

I would just do
int pnSize = primeNumber.size();
for (int i = 0; i < pnSize; i++)
cout << primeNumber[i] << ' ';

Taking integer as parameter and returning an array

I want to create a function that takes an integer as it's parameter and returns an array in C++. This is the pseudo-code I had in mind:
function returnarray(integer i)
{
integer intarr[i];
for (integer j = 0; j < i; j++) { intarr[j] = j; }
return intarr;
}
I tried the common way of declaring returnarray as function* returning a pointer, but then I can't take an integer as my parameter. I also can't assign j to intarr[j]. I'd really like to avoid making a pointer to an int just so I could use the parameter.
Is there any way of doing this and being able to assign j to intarr[j] without making a pointer for it?
EDIT:
forgot to write that I want to avoid vectors. I use them only if I really have to! (my reasons are mine).
Thanks :D

You can't return a stack-allocated array- it's going to go out of scope and the memory deallocated. In addition, C++ does not allow stack-allocated variable-length arrays. You should use a std::vector.
std::vector<int> returnarray(int i) {
std::vector<int> ret(i);
for(int j = 0; j < i; j++) ret[j] = j;
return ret;
}

your code isn't even near valid c++ so i assume you're total beginner
use std::vector
#include <vector>
std::vector<int> yourFunction( int n )
{
std::vector<int> result;
for( int i = 0; i < n; ++i )
{
result.push_back( i );
}
return result;
}
Disclaimer: code untouched by compilers' hands.
Cheers & hth.,

Two remarks, before using the excellent #DeadMG 's solution:
1) You never want to avoid vectors. If v is a vector, and you really want a pointer, you can always have a pointer to the first element by writing &v[0].
2) You can't return an array. You will return a pointer to a fresh memory zone that you'll have to delete once finished with it. Vectors are only arrays with an automatic deletion facility, so you won't leak memory.

You need to use dynamic memory. Something like this
int* returnArray(int size) {
int* array = new int[size];
for(int i = 0; i < size; ++i)
array[i] = i;
return array;
}

Not that I'd particularly recommend this approach in general, but you can use templates to do this without resorting to dynamic memory allocation. Unfortunately you can't return arrays from functions, so you'd need to return a struct with the array inside.
template <int N>
struct int_array_type {
int ints[N];
};
template <int N>
int_array_type<N> returnarray() {
int_array_type<N> a;
for (int i = 0; i < N; ++i)
a.ints[i] = i;
return a;
}
...
int_array_type<10> u = returnarray<10>();
std::copy(u.ints, u.ints+sizeof(u.ints)/sizeof(u.ints[0]),
std::ostream_iterator<int>(std::cout, "\n"));

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Performance loss when initializing variables in constexpr functions - c++

The C++ compiler is very good at optimizations, especially inside constexpr functions. The initialization will very likely be optimized and have no additional cost, and in your case it doesn't even matter since declaring a vector already initializes it to an empty vector.

Related

C++ Lambda Overhead

Is it possible in C++ to use the same code with and without compile time constants?

Why type deduction for auto specifier cares about only init field of the for-loop?

How do I deal with "signed/unsigned mismatch" warnings (C4018)?

Taking integer as parameter and returning an array

Categories

Resources