The future of C++ alignment: passing by value? - c++

Reading the Eigen library documentation, I noticed that some objects cannot be passed by value. Are there any developments in C++11 or planned developments that will make it safe to pass such objects by value?
Also, why is there no problem with returning such objects by value?

It is entirely possible that Eigen is just a terribly written library (or just poorly-thought out); just because something is online doesn't make it true. For example:
Passing objects by value is almost always a very bad idea in C++, as this means useless copies, and one should pass them by reference instead.
This is not good advice in general; it depends on the object. Passing by reference is sometimes necessary pre-C++11 (for example, because the object is uncopyable), but in C++11 it is never strictly required. You might still do it, but you never have to pass by reference: an object that owns allocated memory can simply be moved in by value. Obviously, if it's a "look-but-don't-touch" sort of thing, const& is fine.
Simple struct objects, presumably like Eigen's Vector2d, are probably cheap enough to copy (especially on x86-64, where pointers are 64 bits) that the copy won't mean much in terms of performance. At the same time, it is overhead (theoretically), so in performance-critical code avoiding the copy may help.
Then again, it may not.
The particular crash issue that Eigen seems to be talking about has to do with alignment of objects. However, most C++03 compiler-specific alignment support guarantees the alignment in all cases. So there's no reason it should "make your program crash!". I've never seen an SSE/AltiVec/etc.-based library that used compiler-specific alignment declarations and caused crashes with value parameters. And I've used quite a few.
So if they're having some kind of crash problem with this, then I would consider Eigen to be of... dubious merit. Not without further investigation.
Also, if an object is unsafe to pass by value, as the Eigen docs suggest, then the proper way to handle this would be to make the object non-copy-constructible. Copy assignment would be fine, since it requires an already existing object. However, Eigen doesn't do this, which again suggests that the developers missed some of the finer points of API design.
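To make that point concrete, a minimal sketch of the suggested design (illustrative only, not Eigen's actual API):
class Matrix4f
{
public:
    Matrix4f() = default;
    Matrix4f(const Matrix4f&) = delete;              // no copy construction, so no
                                                     // accidental pass-by-value
    Matrix4f& operator=(const Matrix4f&) = default;  // assigning into an already
                                                     // existing (aligned) object is fine
    // ...
};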
However, for the record, C++11 has the alignas keyword, which is a standard way to declare that an object shall be of a certain alignment.
Also, why is there no problem with returning such objects by value?
Who says that there isn't (noting the copying problem, not the alignment problem)? The difference is that you can't return a temporary by reference. So they're not returning by reference simply because it isn't possible.

They could do this in C++11:
class alignas(16) Matrix4f
{
    // ...
};
Now the class will always be aligned on a 16-byte boundary.
Also, maybe I'm being silly but this shouldn't be an issue anyway. Given a class like this:
class Matrix4f
{
public:
    // ...
private:
    // their data type (aligned however they decided in that library):
    aligned_data_type data;
    // or in C++11
    alignas(16) float data[16];
};
Compilers are then obligated to allocate a Matrix4f on a 16-byte boundary anyway, because anything less would break the aligned member; the class-level alignas should be redundant. But I've been known to be wrong in the past, somehow.
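One way to check that claim (a minimal sketch, assuming a C++11 compiler):
struct Matrix4f
{
    alignas(16) float data[16];
};

// The member's alignment propagates to the enclosing class, so every complete
// Matrix4f object must land on a 16-byte boundary.
static_assert(alignof(Matrix4f) == 16, "member alignas propagates to the class type");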

Related

Why hasn't not_null made it into the C++ standard yet?

After adding the comment "// not null" to a raw pointer for the Nth time I wondered once again whatever happened to the not_null template.
The C++ core guidelines were created quite some time ago now, and a few things have made it into the standard, including, for example, std::span (some, like std::string_view and std::array, originated before the core guidelines themselves but are sometimes conflated with them). Given its relative simplicity, why hasn't not_null (or anything similar) made it into the standard yet?
I scan the ISO mailings regularly (but perhaps not thoroughly) and I am not even aware of a proposal for it.
Possibly answering my own question. I do not recall coming across any cases where it would have prevented a bug in code I've worked on as we try not to write code that way.
The guidelines themselves are quite popular, having made it into clang-tidy and Sonar, for example. The support libraries seem a little less popular.
For example, Boost has been available as a Linux package almost from the start, but I am not aware of any implementation of the GSL that is, though I presume it is bundled with Visual C++ on Windows.
Since people have asked in the comments.
Myself I would use it to document intent.
A construct like not_null<> potentially has semantic value which a comment does not.
Enforcing it is secondary though I can see its place. This would preferably be done with zero overhead (perhaps at compile time only for a limited number of cases).
I was thinking mainly about the case of a raw pointer member variable. I had forgotten about the case of passing a pointer to a function for which I always use a reference to mean not-null and also to mean "I am not taking ownership".
Likewise (for class members) we could also document ownership with owned<> / not_owned<> wrappers.
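For instance, a sketch of what that documentation looks like in member declarations (gsl::not_null is from the Guidelines Support Library; Microsoft's implementation provides it via the <gsl/gsl> header, and owned<> / not_owned<> above are hypothetical names):
#include <gsl/gsl>

class Node
{
public:
    explicit Node(gsl::not_null<Node*> parent) : parent_(parent) {}
private:
    gsl::not_null<Node*> parent_;  // the invariant is visible in the type, not in a comment
    Node* sibling_ = nullptr;      // may legitimately be null
};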
I suppose there is also the question of whether the associated object is allowed to be changed. This might be too high level though. You could use reference members instead of pointers to document this. I avoid reference members myself because I almost always want copyable and assignable types. However, see for example Should I prefer pointers or references in member data? for some discussion of this.
Another dimension is whether another entity can modify a variable.
"const" says I promise not to modify it. In multi-threaded code we would like to say almost the opposite. That is "other code promises not to modify it while we are using it" (without an explicit lock) but this is way off topic...
There is one big technical issue that is likely unsolvable which makes standardizing not_null a problem: it cannot work with move-only smart pointers.
The most important use case for not_null is with smart pointers (for raw pointers a reference usually is adequate, but even then, there are times when a reference won't work). not_null<shared_ptr<T>> is a useful thing that says something important about the API that consumes such an object.
But not_null<unique_ptr<T>> doesn't work. It cannot work. The reason being that moving from a unique pointer leaves the old object null. Which is exactly what not_null is expected to prevent. Therefore, not_null<T> always forces a copy on its contained T. Which... you can't do with a unique_ptr, because that's the whole point of the type.
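A hand-rolled illustration of that constraint (the real gsl::not_null differs in detail):
#include <memory>

template <typename P>
class not_null
{
public:
    // To guarantee the wrapper is never observed holding null, the contained
    // pointer is copied in; a moved-from not_null must not be able to exist.
    not_null(const P& p) : p_(p) { /* Expects(p != nullptr); */ }
private:
    P p_;
};

void consume_shared(not_null<std::shared_ptr<int>> p);  // fine: shared_ptr is copyable
void consume_unique(not_null<std::unique_ptr<int>> p);  // can be declared, but no caller
                                                        // can ever construct the argument,
                                                        // because unique_ptr cannot be copied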
Being able to say that the unique_ptr consumed by an API is not null is good and useful. But you can't actually do that with not_null, which puts a hole in its utility.
So long as move-only smart pointers can't work with not_null, standardizing the class becomes problematic.

Can we detect "trivial relocatability" in C++17?

In future standards of C++, we will have the concept of "trivial relocatability", which means we can simply copy the bytes from one object to an uninitialized chunk of memory and ignore/zero out the bytes of the original object.
This way, we imitate the C-style way of copying/moving objects around.
In future standards, we will probably have something like std::is_trivially_relocatable<type> as a type trait. Currently, the closest thing we have is std::is_pod<type>, which is deprecated in C++20.
My question is: do we have a way in the current standard (C++17) to figure out whether an object is trivially relocatable?
For example, std::unique_ptr<type> can be moved around by copying its bytes to a new memory address and zeroing out the original bytes, but std::is_pod_v<std::unique_ptr<int>> is false.
Also, the standard currently mandates that every uninitialized chunk of memory must pass through a constructor in order to be considered a valid C++ object. Even if we can somehow figure out that an object is trivially relocatable, just moving the bytes is still UB according to the standard.
So another question is: even if we can detect trivial relocatability, how can we implement trivial relocation without causing UB? Simply calling memcpy + memset(src,0,...) and casting the memory address to the right type is UB.
Thanks!
The whole point of trivial relocatability would seem to be to enable byte-wise moving of objects even in the presence of a non-trivial move constructor or move assignment operator. Even in the current proposal P1144R3, this ultimately requires that a user manually mark types for which this is possible. For a compiler to figure out whether a given type is trivially relocatable in general is most likely equivalent to solving the halting problem (it would have to understand and reason about what an arbitrary, potentially user-defined move constructor or move assignment operator does)…
It is, of course, possible to define your own is_trivially_relocatable trait that defaults to std::is_trivially_copyable_v and have users specialize it for types that should specifically be considered trivially relocatable. Even this is problematic, however, because there is no way to automatically propagate this property to types that are composed of trivially-relocatable types…
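A minimal sketch of such a hand-rolled trait (the names are illustrative, not from any standard or library):
#include <type_traits>

// Defaults to trivially copyable, which is the only property the current
// standard lets us rely on for byte-wise copying.
template <typename T>
struct is_trivially_relocatable
    : std::bool_constant<std::is_trivially_copyable_v<T>> {};

// Users opt in the types they have verified by hand, e.g.:
// template <> struct is_trivially_relocatable<MyHandle> : std::true_type {};

template <typename T>
inline constexpr bool is_trivially_relocatable_v = is_trivially_relocatable<T>::value;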
Even for trivially-copyable types, you can't just copy the bytes of the object representation to some random memory location and cast the address to a pointer to the type of the original object. Since an object was never created there, that pointer will not point to an object, and attempting to access the object it doesn't point to results in undefined behavior. Trivial copyability means you can copy the bytes of the object representation from one existing object to another existing object and rely on that making the value of the one object equal to the value of the other [basic.types]/3.
To do this for trivially-relocating some object would mean that you have to first construct an object of the given type at your target location, then copy the bytes of the original object into that, and then modify the original object in a way equivalent to what would have happened if you had moved from that object. Which is essentially a complicated way of just moving the object…
There's a reason a proposal to add the concept of trivial relocatability to the language exists: because you currently just can't do it from within the language itself…
Note that, despite all this, just because the compiler frontend cannot avoid generating constructor calls doesn't mean the optimizer cannot eliminate unnecessary loads and stores. Let's have a look at what code the compiler generates for your example of moving a std::vector or std::unique_ptr:
#include <memory>
#include <new>
#include <utility>
#include <vector>

auto test1(void* dest, std::vector<int>& src)
{
    return new (dest) std::vector<int>(std::move(src));
}

auto test2(void* dest, std::unique_ptr<int>& src)
{
    return new (dest) std::unique_ptr<int>(std::move(src));
}
As you can see, just doing an actual move often already boils down to just copying and overwriting some bytes, even for non-trivial types…
Author of P1144 here; somehow I'm just seeing this SO question now!
std::is_trivially_relocatable<T> is proposed for some-future-version-of-C++, but I don't predict it'll get in anytime soon (definitely not C++23, I bet not C++26, quite possibly not ever). The paper (P1144R6, June 2022) ought to answer a lot of your questions, especially the ones where people are correctly answering that if you could already implement this in present-day C++, we wouldn't need a proposal. See also my 2019 C++Now talk.
Michael Kenzel's answer says that P1144 "ultimately requires that a user manually mark types for which [trivial relocation] is possible"; I want to point out that that's kind of the opposite of the point. The state of the art for trivial relocatability is manual marking ("warranting") of each and every such type; for example, in Folly, you'd say
struct Widget {
    std::string s;
    std::vector<int> v;
};
FOLLY_ASSUME_FBVECTOR_COMPATIBLE(Widget);
And this is a problem, because the average industry programmer shouldn't be bothered with trying to figure out if std::string is trivially relocatable on their library of choice. (The annotation above is wrong on 1.5 of the big 3 vendors!) Even Folly's own maintainers can't get these manual annotations right 100% of the time.
So the idea of P1144 is that the compiler can just take care of it for you. Your job changes from dangerously warranting things-you-don't-necessarily-know, to merely (and optionally) verifying things-you-want-to-be-true via static_assert (Godbolt):
struct Widget {
    std::string s;
    std::vector<int> v;
};
static_assert(std::is_trivially_relocatable_v<Widget>);

struct Gadget {
    std::string s;
    std::list<int> v;
};
static_assert(!std::is_trivially_relocatable_v<Gadget>);
In your (OP's) specific use-case, it sounds like you need to find out whether a given lambda type is trivially relocatable (Godbolt):
void f(std::list<int> v) {
    auto widget = [&]() { return v; };
    auto gadget = [=]() { return v; };
    static_assert(std::is_trivially_relocatable_v<decltype(widget)>);
    static_assert(!std::is_trivially_relocatable_v<decltype(gadget)>);
}
This is something you can't really do at all with Folly/BSL/EASTL, because their warranting mechanisms work only on named types at the global scope. You can't exactly FOLLY_ASSUME_FBVECTOR_COMPATIBLE(decltype(widget)).
Inside a std::function-like type, you're correct that it would be useful to know whether the captured type is trivially relocatable or not. But since you can't know that, the next best thing (and what you should do in practice) is to check std::is_trivially_copyable. That's the currently blessed type trait that literally means "This type is safe to memcpy, safe to skip the destructor of" — basically all the things you're going to be doing with it. Even if you knew that the type was exactly std::unique_ptr<int>, or whatever, it would still be undefined behavior to memcpy it in present-day C++, because the current standard says that you're not allowed to memcpy types that aren't trivially copyable.
(Btw, technically, P1144 doesn't change that fact. P1144 merely says that the implementation is allowed to elide the effects of relocation, which is a huge wink-and-nod to implementors that they should just use memcpy. But even P1144R6 doesn't make it legal for ordinary non-implementor programmers to memcpy non-trivially-copyable types: it leaves the door open for some compiler to implement, and some library implementation to use, a __builtin_trivial_relocate function that is in some magical sense distinguishable from a plain old memcpy.)
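As a rough sketch of the "next best thing" described above, here is how a hypothetical small-buffer wrapper (not any real std::function implementation) might relocate its stored callable:
#include <cstring>
#include <new>
#include <type_traits>
#include <utility>

template <typename Fn>
void relocate_stored(void* dst, Fn* src)
{
    if constexpr (std::is_trivially_copyable_v<Fn>) {
        std::memcpy(dst, src, sizeof(Fn));  // "safe to memcpy, safe to skip the destructor of"
    } else {
        ::new (dst) Fn(std::move(*src));    // ordinary move-construct into the new buffer...
        src->~Fn();                         // ...then destroy the moved-from source
    }
}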
Finally, your last paragraph refers to memcpy + memset(src,0,...). That's wrong. Trivial relocation is tantamount to just memcpy. If you care about the state of the source object afterward — if you care that it's all-zero-bytes, for example — then that must mean you're going to look at it again, which means you aren't actually treating it as destroyed, which means you aren't actually doing the semantics of a relocate here. "Copy and null out the source" is more often the semantics of a move. The point of relocation is to avoid that extra work.

Is there any difference in performance to declare a large variable inside a function as `static`?

Not sure if this has already been asked before. While answering this very simple question, I asked myself the following instead. Consider this:
void foo()
{
    int i{};
    const ReallyAnyType data[] = { item1, item2, item3,
        /* many items that may be potentially heavy to recreate, e.g. of class type */ };
    /* function code here... */
}
Now in theory, local variables are recreated every time control reaches the function, right? I.e. look at int i above - it's going to be recreated on the stack for sure. What about the array above? Can a compiler be smart enough to optimize its creation to occur only once, or do I need a static modifier here anyway? What about if the array is not const? (OK, if it's not const, there is probably no sense in creating it only once, since re-initialization to the default state may be required between calls due to modifications made during function execution.)
Might sound like a basic question, but for some reason I still ponder. Also, ignore the "why would you want to do this" - this is just a language question, not applied to a certain programming problem or design. I mean both C and C++ here. Should there be differences between the two regarding this question, please outline those.
There are two questions here, I think:
Can a compiler optimize a non-static const object to be effectively static so that it is only created once; and
Is it a reasonable expectation that a given compiler will do so.
I think the answer to the second question is "No", because I don't see the point of doing a huge amount of control flow analysis to save the programmer the trouble of typing the word static. However, I've often been surprised what optimizations people spend their time writing (as opposed to the optimizations which I think they should be working on :-) ). All the same, I would strongly recommend using the word static if that's what you wanted.
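For instance, this is all it takes to make the intent explicit (C++ shown; in C the initializer of a static object must additionally be a constant expression):
void foo()
{
    // Initialized once, on the first pass through this statement, and reused
    // on every subsequent call.
    static const int data[] = { 1, 2, 3, 5, 8, 13 };
    /* function code here, using data... */
}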
For the first question, there are circumstances under which the compiler could perform the optimization based on the "as-if" rule, but in very few cases would it work out.
First of all, if any object or subobject in the initializer has a non-trivial constructor/destructor, then the construction/destruction is visible, and this is not an example of copy elision. (This paragraph is C++ only, of course.)
The same would be true if any computation in the initializer list has visible side-effects.
And it should go without saying that if any subobject's value is not constant, the computation of that subobject would need to be done on each construction.
If the object and all subobjects are trivially copyable, all the initializer-list computations are constant, and the only construction cost is that of copying from a template into the object, then the compiler still couldn't perform the optimization if there is any chance that the addresses of more than one live instance of the object might be simultaneously visible. For example, if the function were recursive, and the object's address was used somewhere (hard to avoid for an array), then there would be the possibility that the addresses of two of these objects from different recursive invocations of the function might be compared. And they would have to compare unequal, since they are in fact separate objects. (And, now that I think of it, the function would not even need to be recursive in a multi-threaded environment.)
So the burden of proof for a compiler wishing to optimize that object into a single static instance is quite high. As I said, it may well be that a given compiler actually attempts to perform that task, but I definitely wouldn't expect it to.
The compiler would almost certainly do whatever is deemed most optimal, but most likely it will put the data in read-only memory and turn your local variable into a pointer to that read-only array. This assumes your array is equivalent to a POD type (or a class composed of POD types); if your class does something non-trivial and/or modifies other things, there is no way the compiler can safely do this optimization.

Should I now be passing by value?

In this talk (sorry about the sound) Chandler Carruth suggests not passing by reference, even const reference, in the vast majority of cases due to the way in which it limits the back-end to perform optimisation.
He claims that in most cases the copy is negligible - which I am happy to believe, most data structures/classes etc. have a very small part allocated on the stack - especially when compared with the back-end having to assume pointer aliasing and all the nasty things that could be done to a reference type.
Let's say that we have large object on the stack - say ~4kB and a function that does something to an instance of this object (assume free-standing function).
Classically I would write:
void DoSomething(ExpensiveType* inOut);
ExpensiveType data;
...
DoSomething(&data);
He's suggesting:
ExpensiveType DoSomething(ExpensiveType in);
ExpensiveType data;
...
data = DoSomething(data);
According to what I got from the talk, the second would tend to optimise better. Is there a limit to how big I make something like this though, or is the back-end copy-elision stuff just going to prefer the values in almost all cases?
EDIT: To clarify I'm interested in the whole system, since I feel that this would be a major change to the way I write code, I've had use of refs over values drilled into me for anything larger than integral types for a long time now.
EDIT2: I tested it too, results and code here. No competition really, as we've been taught for a long time, the pointer is a far faster method of doing things. What intrigues me now is why it was suggested during that talk that we move to pass by value, but as the numbers don't support it, it's not something I'm going to do.
I have now watched parts of Chandler's talk. I think the general discussion along the lines "should I now always pass by value" does not do his talk justice. Edit: And actually his talk has been discussed before, here value semantics vs output params with large data structures and in a blog from Eric Niebler, http://ericniebler.com/2013/10/13/out-parameters-vs-move-semantics/.
Back to Chandler. In the key note he specifically (around the 4x-5x minute mark mentioned elsewhere) mentions the following points:
If the optimizer cannot see the code of the called function you have much bigger problems than passing refs or values. It pretty much prevents optimization. (There is a follow-up question at that point about link time optimization which may be discussed later, I don't know.)
He recommends the "new classical" way of returning values using move semantics. Instead of the old-school way of passing a reference to an existing object as an in-out parameter, the value should be constructed locally and moved out. The big advantage is that the optimizer can be sure that no part of the object is aliased, since only the function has access to it.
He mentions threads, storing a variable's value in globals, and observable behaviour like output as examples for unknowns which prevent optimization when only refs/pointers are passed. I think an abstract description could be "the local code can not assume that local value changes are undetected elsewhere, and it cannot assume that a value which is not changed locally has not changed at all". With local copies these assumptions could be made.
Obviously, when passing (and possibly, if objects cannot be moved, when returning) by value, there is a trade-off between the copy cost and the optimization benefits. Size and other things making copying costly will tip the balance towards reference strategies, while lots of optimizable work on the object in the function tips it towards value passing. (His examples involved pointers to ints, not to 4k sized objects.)
Based on the parts I watched I do not think Chandler promoted passing by value as a one-fits-all strategy. I think he dissed passing by reference mostly in the context of passing an out parameter instead of returning a new object. His example was not about a function which modified an existing object.
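To make the contrast concrete, a sketch of the two styles (ExpensiveType here is a stand-in type, not anything from the talk):
#include <vector>

struct ExpensiveType { std::vector<int> payload; };

// Old school: out parameter. The optimizer must assume `out` may alias other
// reachable state, so every write to it is potentially observable elsewhere.
void FillOldSchool(ExpensiveType& out)
{
    out.payload.assign(1000, 42);
}

// "New classical": build locally, return by value. Only this function can see
// `result` while it is being built, and NRVO/move semantics avoid the extra copy.
ExpensiveType FillNewClassical()
{
    ExpensiveType result;
    result.payload.assign(1000, 42);
    return result;
}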
On a general note:
A program should express the programmer's intent. If you need a copy, by all means do copy! If you want to modify an existing object, by all means use references or pointers. Only if side effects or run-time behavior become unbearable should you really try to do something smart.
One should also be aware that compiler optimizations are sometimes surprising. Other platforms, compilers, compiling options, class libraries or even just small changes in your own code may all prevent the compiler from coming to the rescue. The run-time cost of the change would in many cases come totally unexpected.
Perhaps you took that part of the talk out of context, or something. For large objects, typically it depends on whether the function needs a copy of the object or not. For example:
void DoSomething(ExpensiveType in)
{
    cout << in.member;
}
you wasted a lot of resource copying the object unnecessarily, when you could have passed by const reference instead.
But if the function is:
void DoSomething(ExpensiveType in)
{
    in.member = 5;
    do_something_else(in);
}
and we did not want to modify the calling function's object, then this code is likely to be more efficient than:
void DoSomething(ExpensiveType const &inr)
{
    ExpensiveType in = inr;
    in.member = 5;
    do_something_else(in);
}
The difference comes when invoked with an rvalue (e.g. DoSomething(ExpensiveType(6))). The latter creates a temporary, makes a copy, then destroys both; whereas the former will create a temporary and use that to move-construct in. (I think this can even undergo copy elision.)
NB. Don't use pointers as a hack to implement pass-by-reference. C++ has native pass by reference.

Zero functor construction and overhead even with new and delete?

If I have a functor class with no state, but I create it from the heap with new, are typical compilers smart enough to optimize away the creation overhead entirely?
This question came up when making a bunch of stateless functors. If they're allocated on the stack, does their zero-state class body mean that the stack really isn't changed at all? It seems it must be, in case you later take the address of the functor instance.
Same for heap allocation.
In that case, functors always add a (trivial, but non-zero) overhead in their creation. But maybe compilers can see whether the address is used and, if not, eliminate that stack allocation. (Or can they even eliminate a heap allocation?)
But how about a functor that's created as a temporary?
#include <iostream>

struct GTfunctor
{
    inline bool operator()(int a, int b) { return a > b; }
};

int main()
{
    GTfunctor* f = new GTfunctor;
    GTfunctor g;
    std::cout << (*f)(2,1) << std::endl;
    std::cout << g(2,1) << std::endl;
    std::cout << GTfunctor()(2,1) << std::endl;
    delete f;
}
So in the concrete example above, the three lines each call the same functor in three different ways. In this example, is there any efficiency difference between the ways? Or is the compiler able to optimize each line all the way down to being a compute-less print statement?
Edit:
Most answers say that the compiler could never inline/eliminate the heap-allocated functor. But is this really true? Most compilers (GCC, MS, Intel) have link-time optimization as well, which could indeed do this optimization. (But does it?)
are typical compilers smart enough to optimize away the creation overhead entirely?
When you're creating them on the heap, I doubt whether the compiler is allowed to. IMO:
Invoking new implies invoking operator new.
operator new is a non-trivial function defined in the run-time library.
The compiler isn't allowed to decide that you didn't really mean to invoke such a function and to decide that as an optimization it will silently not invoke it.
When you're creating them on the stack, and not taking their address, then maybe ... or maybe not: my guess is that every object has a non-zero size, in order to occupy some memory, in order to have an identity, even when the object has no state apart from its identity.
Obviously, it depends on your compiler.
I would say
No compiler will optimize away the object on the heap. (This is because, as ChrisW says, compilers will never optimize away a call to new, which is almost certainly defined in another translation unit.)
Some compilers will optimize away a named object on the stack. I've known gcc to do this optimization quite often.
Most compilers will optimize away an unnamed object on the stack. This is one of the "standard" C++ optimizations, especially as more advanced C++ users tend to create lots of unnamed temporary variables.
Unfortunately, these are only rules of thumb. Optimizers are notoriously unpredictable; really the only way to know what your compiler is doing is to read the assembly output.
I highly doubt this type of optimization is allowed, but if your functor has no state, why would you want to initialize it on the heap? It should be just as easy to use it as a temporary.
A C++ object is always non-zero in size. "Empty base class optimization" allows empty base class to have zero size but that doesn't apply here.
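A quick illustration of that size rule, and of why the empty-base exception is a separate case:
struct Empty {};

struct HasMember { Empty e; int i; };  // the empty member still occupies at least one byte
struct HasBase : Empty { int i; };     // the empty base may occupy zero bytes (EBO)

static_assert(sizeof(Empty) >= 1, "every complete object has non-zero size");
// Typically sizeof(HasBase) == sizeof(int) while sizeof(HasMember) > sizeof(int),
// though the exact sizes are implementation-defined.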
I have not worked on any C++ optimizer, so whatever I say is just speculation. I think the 2nd and 3rd calls will easily be expanded inline, with no overhead and no GTfunctor created. The functor pointer, however, is a different story. In your example it may seem simple enough, and any optimizer should be able to eliminate the heap allocation, but in a non-trivial program you may be creating the functors in one translation unit and using them in another, or even in a different library to which the compiler/linker/loader/runtime system has no source code, making it almost impossible to optimize. Given that the optimization is not easy, the potential performance gain is not great, and the number of cases where an empty functor is allocated on the heap is probably small, I think most optimizer writers would not put this optimization high on their to-do list.
The compiler cannot optimize out a call to new or delete. It may however optimize out the variable created on the stack since it has no state.
Simple way to answer the heap question:
GTfunctor *f = new GTfunctor;
The value of f must not be null, so what should it be? And suppose you also had:
GTfunctor *g = new GTfunctor;
Now the value of g must not equal the value of f, so what should each be?
Furthermore, neither f or g may be equal to any other pointer obtained from new, unless some pointer elsewhere is somehow initialised to be equal to f or g, which (depending on the code that comes after) may involve examining what the entire rest of the program does.
Yes, if by local examination of the code the compiler can see that you never rely on any of these requirements, then it could perform a rewrite such that no heap allocation occurs. The problem is, if your code was that simple, you could probably do that rewrite yourself and end up with a more readable program anyway, e.g. your test program would look like your stack-based g example. So real programs would not benefit from such an optimisation in the compiler.
Presumably the reason you're doing this is because sometimes the functor does have data, depending on which type is chosen at runtime. So compile-time analysis cannot usefully work its magic here.
The C++ standard states that each object must have a size of at least one byte, so that it can be uniquely addressed.
Generating functors with new can lead to two problems:
The construction can generally not be optimized away: new is a function with complex side effects (it may throw bad_alloc).
Because you address the functor indirectly, the compiler may not be able to inline the call.
Chances are good that you will not see any sign of the functor if you create it on the stack.
Side note: the inline keyword is not necessary; every function defined inside a class definition is implicitly inline.
The compiler can probably figure out that operator() doesn't use any member variables, and optimize it to the max. I wouldn't make any assumptions about the local or heap allocated variables, though.
Edit: When in doubt, turn on the assembly output option on your compiler and see what it's actually doing. No sense in listening to a bunch of idiots on the web when you can see the real answer for yourself.
The answer to your question has two aspects.
Does the compiler optimize away the heap allocation: I strongly doubt it, but I'm not a standard guy, so I have to look it up.
Can the compiler optimize by inlining the object's operator()? Yes. As long as you don't declare operator() virtual, even the pointer dereference isn't actually performed.