Eliminating copying of function parameter

I am writing a custom memory allocator. If possible, I want an object creation function like the following, so that the creation procedure is abstracted completely.
template<typename T>
class CustomCreator
{
virtual T& createObject(T value) __attribute__((always_inline))
{
T* ptr = (T*)customAlloc();
new (ptr) T(value);
return *ptr;
}
};
But this causes copying. Is there a way to force the compiler to eliminate the copy in this case?
Here's my current testing code.
#include <cstdio>
struct AA
{
inline static void test(AA aa) __attribute__((always_inline))
{
AA* ptr = new AA(aa);
delete ptr;
}
AA(AA const& aa)
{
printf("COPY!\n");
}
AA(int v)
{
printf("CTOR!\n");
}
};
int main(int argc, const char * argv[])
{
AA::test(AA(77));
return 0;
}
I tried passing the value as T&, T const&, T&&, and T const&&, but it still copies.
I expected the optimizer to eliminate the function frame so that the function parameter could be treated as an rvalue, but I still see the COPY! message.
I also tried a C++11 forwarding template, but I couldn't use it because a function template cannot be virtual.
My compiler is the Clang included in Xcode (clang-425.0.28), and the optimization level is set to -Os -flto.
Update (for later reference)
I wrote extra tests and checked the generated LLVM IR using clang -Os -flto -S -emit-llvm -std=c++11 -stdlib=libc++ main.cpp. What I observed: (1) function frames can always be eliminated; (2) if a large object (4096 bytes) is passed by value, it is not eliminated; (3) passing an rvalue reference using std::move works well; (4) if I don't define a move constructor, the compiler mostly falls back to the copy constructor.

First:
You expected the optimizer to eliminate the function frame, but that alone can't remove the copy, because (1) this function isn't a valid case for RVO to cheat and skip the copy constructor entirely, and (2) your copy constructor has side effects, namely printing COPY! to the screen. So the optimizer can't optimize out the copy, because the code says it must print COPY!. What I would do is remove the printf call and check the assembly. Most likely, if the copy is simple, it is being optimized out.
Second:
If the members' copy constructors can have side effects (such as allocating memory), then most likely you're right that the copy isn't being optimized out. However, you can tell the compiler to move members rather than copy them, which is probably what you expected. For this to work, you'll need to make sure your class has a move constructor alongside its copy constructor.
inline static void test(AA aa) __attribute__((always_inline))
{
AA* ptr = new AA(std::move(aa));
delete ptr;
}
Third:
In reality, even that isn't really what you want. What you probably want is something more like perfect forwarding, which passes the parameters directly to the constructor without making copies of ANYTHING. This means that even with the optimizer disabled entirely, your code still avoids copies. (Note that this may not apply to your situation, since a member function template like this cannot be virtual.)
template<class ...types>
inline static void test(types&&... values) __attribute__((always_inline))
{
AA* ptr = new AA(std::forward<types>(values)...);
delete ptr;
}
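To tie the forwarding idea back to the allocator in the question, here is a minimal sketch of a non-virtual creation function that forwards its arguments into placement new. The names Tracked, customAlloc, customFree, createObject and destroyObject are made up for illustration, and customAlloc is just malloc here, standing in for whatever the real allocator does.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical test type that counts how it gets constructed.
struct Tracked {
    static int ctors, copies, moves;
    int value;
    explicit Tracked(int v) : value(v) { ++ctors; }
    Tracked(const Tracked& o) : value(o.value) { ++copies; }
    Tracked(Tracked&& o) noexcept : value(o.value) { ++moves; }
};
int Tracked::ctors = 0;
int Tracked::copies = 0;
int Tracked::moves = 0;

// Stand-in for the question's customAlloc(); assumed to be plain malloc here.
inline void* customAlloc(std::size_t bytes) { return std::malloc(bytes); }
inline void customFree(void* p) { std::free(p); }

// Forwards the constructor arguments straight into placement new,
// so no intermediate T object is ever created.
template <typename T, typename... Args>
T& createObject(Args&&... args) {
    void* raw = customAlloc(sizeof(T));
    assert(raw != nullptr);
    return *new (raw) T(std::forward<Args>(args)...);
}

// Runs the destructor and hands the storage back to the allocator.
template <typename T>
void destroyObject(T& obj) {
    obj.~T();
    customFree(&obj);
}
```

Calling createObject<Tracked>(77) runs only the int constructor; neither the copy nor the move counter changes, regardless of optimization level.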

Isn't your copy-constructor explicitly called in line AA* ptr = new AA(aa);? If you want to avoid copying initializing value, you should do it like this:
#include <cstdio>
#include <utility>
struct AA
{
// A second overload test(AA&&) would make the call in main ambiguous, so only this one is kept.
inline static void test(AA aa) __attribute__((always_inline))
{
AA* ptr = new AA(std::move(aa));
delete ptr;
}
AA(AA const& aa) : xx(aa.xx)
{
printf("COPY!\n");
}
AA(AA&& aa) : xx(aa.xx) // Move constructors are not default here - you have to declare one
{
printf("MOVE!\n");
}
AA(int v)
{
printf("CTOR!\n");
xx = v;
}
int xx = 55;
};
int main(int argc, const char * argv[])
{
AA::test(AA(77));
return 0;
}
Problems:
The conditions under which a move constructor is generated by default are quite tight; you are better off writing one yourself.
If you pass a value along (even a temporary one), you still need std::move to pass it on further without creating a copy.
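Both points can be checked with counters. The sketch below uses two made-up types, CopyOnly and CopyAndMove, and a made-up helper passAlong that takes its argument by value and hands it on with std::move. It only illustrates the fallback behaviour and is not taken from the question's code.

```cpp
#include <cassert>
#include <utility>

// Declares only a copy constructor, so no move constructor is generated.
struct CopyOnly {
    static int copies;
    CopyOnly() {}
    CopyOnly(const CopyOnly&) { ++copies; }
};
int CopyOnly::copies = 0;

// Declares both, so std::move can select the move constructor.
struct CopyAndMove {
    static int copies, moves;
    CopyAndMove() {}
    CopyAndMove(const CopyAndMove&) { ++copies; }
    CopyAndMove(CopyAndMove&&) noexcept { ++moves; }
};
int CopyAndMove::copies = 0;
int CopyAndMove::moves = 0;

// Takes its argument by value and passes it on with std::move.
template <typename T>
void passAlong(T arg) {
    T stored(std::move(arg));
    (void)stored;
}

// Copies made when an lvalue goes through passAlong and no move constructor exists.
inline int copiesWithoutMoveCtor() {
    CopyOnly::copies = 0;
    CopyOnly a;
    passAlong(a);  // copy into the parameter, then std::move falls back to a copy
    return CopyOnly::copies;
}

// Copies made for the same call when a move constructor exists.
inline int copiesWithMoveCtor() {
    CopyAndMove::copies = 0;
    CopyAndMove::moves = 0;
    CopyAndMove b;
    passAlong(b);  // copy into the parameter, then one move
    assert(CopyAndMove::moves == 1);
    return CopyAndMove::copies;
}
```

With an lvalue argument, copiesWithoutMoveCtor() reports two copies and copiesWithMoveCtor() reports one copy plus one move, because std::move on a type without a move constructor quietly selects the copy constructor.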

First, let's solve your A&& move problem. You can get a move constructor by making a move constructor (or declaring it default) (or letting the compiler generate one), and making sure you call std::move on aa when you pass it through: Live example here.
The same will apply to any virtual instance function you are working with. I've also included the forwarding variadic version in the example as well, just in case you'd like a reference on how to do that as well (notice all the arguments are std::forwarded to their next calls).
In either case, this seems like a question spawned from a serious case of XY; that is, you're trying to solve a problem you've created with your current design. I don't know why you need to create objects through instance methods on a pre-existing object (factory-type creation?), but it sounds messy. In either case, the above should help you out enough to get you going. Just make sure that if you're moving parameters, you always wrap the && item with std::move (or std::forward if you're working with universal references and template parameters).

Related

std::move in initializer lists

I often see the following idiom in production code: A value argument (like a shared pointer) is handed into a constructor and shall be copied once. To ensure this, the argument is wrapped into a std::move application. Hating boilerplate code and formal noise I wondered if this is actually necessary. If I remove the application, at least gcc 7 outputs some different assembly.
#include <memory>
class A
{
std::shared_ptr<int> p;
public:
A(std::shared_ptr<int> p) : p(std::move(p)) {} //here replace with p(p)
int get() { return *p; }
};
int f()
{
auto p = std::make_shared<int>(42);
A a(p);
return a.get();
}
Compiler Explorer shows you the difference. While I am not certain which approach is the most efficient here, I wonder if there is an optimization that allows the compiler to treat p as an rvalue reference in that particular location. It certainly is a named entity, but that entity is "dead" after that location anyway.
Is it valid to treat a "dead" variable as an rvalue reference? If not, why?

In the body of the constructor, there are two p objects, the ctor argument and this->p. Without the std::move, they're identical. That of course means the ownership is shared between the two pointers. This must be achieved in a thread-safe way, and is expensive.
But optimizing this out is quite hard. A compiler can't generally deduce that ownership is redundant. By writing std::move yourself, you make it unambiguously clear that the ctor argument p does not need to retain ownership.
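The difference in ownership can be observed with use_count(). The following is a small sketch with two made-up classes, WithMove and WithCopy, that record the reference count seen inside the constructor while the argument is still alive. The parameter is named arg rather than p only to keep the member and the argument apart.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct WithMove {
    std::shared_ptr<int> p;
    long seen;  // use_count() observed while the constructor argument still exists
    explicit WithMove(std::shared_ptr<int> arg)
        : p(std::move(arg)), seen(p.use_count()) {}
};

struct WithCopy {
    std::shared_ptr<int> p;
    long seen;
    explicit WithCopy(std::shared_ptr<int> arg)
        : p(arg), seen(p.use_count()) {}
};

// Owners alive during construction: the caller's pointer and the member.
inline long ownersSeenWithMove() {
    auto sp = std::make_shared<int>(42);
    WithMove m(sp);
    assert(m.p && *m.p == 42);
    return m.seen;
}

// Owners alive during construction: the caller's pointer, the argument and the member.
inline long ownersSeenWithCopy() {
    auto sp = std::make_shared<int>(42);
    WithCopy c(sp);
    assert(c.p && *c.p == 42);
    return c.seen;
}
```

ownersSeenWithMove() yields 2 and ownersSeenWithCopy() yields 3: without std::move the argument keeps its share until the constructor returns, which costs one extra atomic increment and decrement.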

C++11 best practice to use rvalue

I am new to C++11. In fact until recently, I programmed only using dynamic allocation, in a way similar to Java, e.g.
void some_function(A *a){
a->changeInternalState();
}
A *a = new A();
some_function(a);
delete a;
// example 2
some_function( new A() ); // suppose there is **no** memory leak.
Now I want to reproduce similar code with C++11, but without pointers.
I need to be able to pass a newly created instance of class A directly to the function useA(). There seems to be a problem if I want to do so with a normal non-const reference, but it works if I do it with an rvalue reference.
Here is the code:
#include <stdio.h>
class A{
public:
void print(){
++p; // e.g. change internal state
printf("%d\n", p);
}
int p;
};
// normal reference
void useA(A & x){
x.print();
}
// rvalue reference
void useA(A && x){
useA(x);
}
int main(int argc, char** argv)
{
useA( A{45} ); // <--- newly created class
A b{20};
useA(b);
return 0;
}
It compiles and executes correctly, but I am not sure whether this is the correct, acceptable way to do it.
Are there some best practices for this kind of operations?

Normally you would not design the code so that a temporary object gets modified. Then you would write your useA function as:
void useA(A const & x){
x.print();
}
and declare A::print as const. This binds to both rvalues and lvalues. You can use mutable for class member variables which might change value but without the object logically changing state.
Another plan is to keep just A &, but write:
{ A temp{45}; useA(temp); }
If you really do want to modify a temporary object, you can write the pair of lvalue and rvalue overloads as you have done in your question. I believe this is acceptable practice for that case.
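As a rough illustration of the const-reference option, here is a sketch with a made-up class Counter (not the A from the question) whose query function is const and which keeps a mutable bookkeeping member. A single useCounter overload then accepts both temporaries and named objects.

```cpp
#include <cassert>

class Counter {
public:
    explicit Counter(int start) : p(start), reads(0) {}
    // Logically const: reports a value without changing observable state.
    int peekNext() const { ++reads; return p + 1; }
    int timesRead() const { return reads; }
private:
    int p;
    mutable int reads;  // bookkeeping that may change inside const functions
};

// One overload serves both lvalues and temporaries.
inline int useCounter(const Counter& c) { return c.peekNext(); }

inline void demoUseCounter() {
    assert(useCounter(Counter(45)) == 46);  // temporary binds to const&
    Counter b(20);
    assert(useCounter(b) == 21);            // named object binds too
    assert(b.timesRead() == 1);             // only the mutable member changed
}
```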

The best thing about C++11 move semantics is that most of the time, you get them "for free" without having to explicitly add any &&s or std::move()s in your code. Usually, you only need to use these things explicitly if you're writing code that does manual memory management, such as the implementation of a smart pointer or a container class, where you would have had to write a custom destructor and copy constructor anyway.
In your example, A is just an int. For ints, a move is no different from a copy, because there's no opportunity for optimization even if the int happens to be a disposable temporary. Just provide a single useA() function that takes an ordinary reference. It'll have the same behavior.

auto_ptr in a class not returning from a source function

Consider the following code:
#include<memory>
struct A {
std::auto_ptr<int> i;
};
A F() {
A a;
return a;
}
int main(int argc, char **argv) {
A a = F();
return 0;
}
When compiling I receive a compilation error, (see here):
error: no matching function for call to ‘A::A(A)’
A a = F();
^
To my understanding, A::A(A) isn't even allowed to exist, so why is the compiler requesting it? Secondly, why is it not using RVO?
If it is because a std::auto_ptr cannot be returned from a function, why does the following compile and run?
#include<memory>
std::auto_ptr<int> F() {
std::auto_ptr<int> ap;
return ap;
}
int main(int argc, char **argv) {
std::auto_ptr<int> ap = F();
return 0;
}
I cannot use C++11 in my current work unfortunately, hence the use of auto_ptr.

I tried searching but couldn't find a relevant Q&A, even though I know this is a duplicate. So instead I'm answering instead of voting to close as a duplicate. Apologies.
The reason it needs a copy constructor is because the line:
A a = F();
is really (from the compiler's perspective):
A a(F());
even if copy elision/RVO is used. That is, the compiler does not do:
// This is NOT what the compiler does for A a = F();
A a;
a = F();
Even with copy elision/RVO, A a(F()); won't work. From a C++ standards perspective, the code needs to be legal, whether or not the compiler does copy elision. Copy elision doesn't relax the requirement of needing a copy constructor (even if it doesn't actually use it; it still needs to be there in order to ensure "legality" of the code).
This doesn't work, because std::auto_ptr's copy constructor doesn't take a const reference, so A's copy constructor doesn't exist. F() returns a temporary A, which can only be captured by a const reference, which means that line of code is trying to use a non-existent copy constructor.
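Since std::auto_ptr is deprecated and gone from newer standards, the binding rule can be shown without it. The sketch below assumes a C++11 compiler (which the asker does not have) and uses two made-up types: NonConstCopy mimics a class whose copy constructor takes a non-const reference, and ConstCopy is the ordinary case. The type traits report what a conforming compiler accepts.

```cpp
#include <cassert>
#include <type_traits>

// Mimics a class holding std::auto_ptr: the copy constructor takes a
// non-const reference because copying modifies the source.
struct NonConstCopy {
    NonConstCopy() {}
    NonConstCopy(NonConstCopy&) {}
};

// An ordinary class whose copy constructor takes a const reference.
struct ConstCopy {
    ConstCopy() {}
    ConstCopy(const ConstCopy&) {}
};

// True when T can be constructed from a temporary of its own type.
template <typename T>
constexpr bool acceptsTemporary() {
    return std::is_constructible<T, T&&>::value;
}

static_assert(std::is_constructible<NonConstCopy, NonConstCopy&>::value,
              "a named object is accepted as the source");
static_assert(!acceptsTemporary<NonConstCopy>(),
              "a temporary cannot bind to NonConstCopy&");
static_assert(acceptsTemporary<ConstCopy>(),
              "a temporary binds to const ConstCopy&");

inline void checkBinding() {
    NonConstCopy a;
    NonConstCopy b(a);  // fine: a is an lvalue
    (void)b;
    assert(!acceptsTemporary<NonConstCopy>());
}
```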

The compiler normally generates the default copy constructor
A(const A&)
However in this case it is not possible as there is no auto_ptr::auto_ptr(const auto_ptr &rhs) so it creates the following instead:
A(A&)
Now, when F returns, it will not let the return value a be modified (I think because it might be a persisting object such as a global or a reference). When it finds no A(const A&), it will instead look for A(A), as that's the only other way to return the value without modifying a (even though it's stupid). Even using RVO, it must still be valid code, as @Cornstalks mentions in his answer (to an extent, see below).
auto_ptr gets around this by creating a temporary auto_ptr_ref object using the following (ref)
auto_ptr::operator auto_ptr_ref()
auto_ptr(auto_ptr_ref)
For some reason, the compiler accepts the typecast to auto_ptr_ref despite it not being a const function and rejecting the use of A(A&).
To get around this issue one can simply declare A(const A&) without implementing it
struct A {
A() { }
A(const A&);
std::auto_ptr<int> i;
};
The compiler thinks that the return is legal but applies RVO before the linker sees it and so the missing implementation is never needed. Of course it precludes the use of
A a;
A b(a);
An inelegant fix, but effective nonetheless.

Lambda to std::function conversion performance

I'd like to use lambda functions to asynchronously call a method on a reference counted object:
void RunAsync(const std::function<void()>& f) { /* ... */ }
SmartPtr<T> objPtr = ...
RunAsync([objPtr] { objPtr->Method(); });
Creating the lambda expression obviously creates a copy but I now have the problem that converting the lambda expression to a std::function object also creates a bunch of copies of my smart pointer and each copy increases the reference count.
The following code should demonstrate this behavior:
#include <functional>
struct C {
C() {}
C(const C& c) { ++s_copies; }
void CallMe() const {}
static int s_copies;
};
int C::s_copies = 0;
void Apply(const std::function<void()>& fct) { fct(); }
int main() {
C c;
std::function<void()> f0 = [c] { c.CallMe(); };
Apply(f0);
// s_copies = 4
}
While the number of references goes back to normal afterwards, I'd like to avoid too many reference-counting operations for performance reasons. I'm not sure where all these copy operations come from.
Is there any way to achieve this with fewer copies of my smart pointer object?
Update: Compiler is Visual Studio 2010.

std::function probably won't be as fast as a custom functor until compilers implement some serious special treatment of the simple cases.
But the reference-counting problem is symptomatic of copying when move is appropriate. As others have noted in the comments, MSVC doesn't properly implement move. The usage you've described requires only moving, not copying, so the reference count should never be touched.
If you can, try compiling with GCC and see if the issue goes away.

Converting to a std::function should only make a move of the lambda. If this isn't what's done, then there's arguably a bug in the implementation or specification of std::function. In addition, in your above code, I can only see two copies of the original c, one to create the lambda and another to create the std::function from it. I don't see where the extra copy is coming from.
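To see where the copies go on a given compiler, the captured type can count copies and moves separately. This is a sketch with a made-up type Handle, not the asker's SmartPtr. Note that the C in the question declares only a copy constructor, so every "move" of the lambda has to copy it; Handle declares a move constructor to avoid that.

```cpp
#include <cassert>
#include <functional>
#include <utility>

struct Handle {
    static int copies, moves;
    Handle() {}
    Handle(const Handle&) { ++copies; }
    Handle(Handle&&) noexcept { ++moves; }
    void callMe() const {}
};
int Handle::copies = 0;
int Handle::moves = 0;

inline void apply(const std::function<void()>& fct) { fct(); }

// Returns how often Handle was copied while wrapping and calling the lambda.
inline int copiesWhenWrappingLambda() {
    Handle::copies = 0;
    Handle::moves = 0;
    Handle h;
    std::function<void()> f = [h] { h.callMe(); };  // the capture is the one copy
    apply(f);                                       // passed by reference: no copy
    return Handle::copies;
}
```

On a standard library that move-initializes the stored callable, as the standard specifies, the only copy is the lambda capture itself. The number of moves is left unchecked because it varies between implementations.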

Return value copying issue (to improve debug timing) -- What's the solution here?

The most interesting C++ question I've encountered recently goes as follows:
We determined (through profiling) that our algorithm spends a lot of time in debug mode in MS Visual Studio 2005 with functions of the following type:
MyClass f(void)
{
MyClass retval;
// some computation to populate retval
return retval;
}
As most of you probably know, the return here calls a copy constructor to pass out a copy of retval and then the destructor on retval. (Note: the reason release mode is very fast for this is because of the return value optimization. However, we want to turn this off when we debug so that we can step in and nicely see things in the debugger IDE.)
So, one of our guys came up with a cool (if slightly flawed) solution to this, which is to create a conversion operator (strictly speaking, a converting constructor):
MyClass::MyClass(MyClass *t)
{
// construct "*this" by transferring the contents of *t to *this
// the code goes something like this
this->m_dataPtr = t->m_dataPtr;
// then clear the pointer in *t so that its destruction still works
// but becomes 'trivial'
t->m_dataPtr = 0;
}
and also changing the function above to:
MyClass f(void)
{
MyClass retval;
// some computation to populate retval
// note the ampersand here which calls the conversion operator just defined
return &retval;
}
Now, before you cringe (which I am doing as I write this), let me explain the rationale. The idea is to create a conversion operator that basically does a "transfer of contents" to the newly constructed variable. The savings happen because we're no longer doing a deep copy, but simply transferring the memory by its pointer. The code goes from a 10 minute debug time to a 30 second debug time, which, as you can imagine, has a huge positive impact on productivity. Granted, the return value optimization does a better job in release mode, but at the cost of not being able to step in and watch our variables.
Of course, most of you will say "but this is abuse of a conversion operator, you shouldn't be doing this kind of stuff" and I completely agree. Here's an example why you shouldn't be doing it too (this actually happened:)
void BigFunction(void)
{
MyClass *SomeInstance = new MyClass;
// populate SomeInstance somehow
g(SomeInstance);
// some code that uses SomeInstance later
...
}
where g is defined as:
void g(MyClass &m)
{
// irrelevant what happens here.
}
Now this happened accidentally, i.e., the person who called g() should not have passed in a pointer when a reference was expected. However, there was no compiler warning (of course). The compiler knew exactly how to convert, and it did so. The problem is that the call to g() (because we've passed it a MyClass * when it was expecting a MyClass &) calls the conversion operator, which is bad, because it sets the internal pointer in SomeInstance to 0 and renders SomeInstance useless for the code that occurs after the call to g(). Time-consuming debugging ensued.
So, my question is, how do we gain this speedup in debug mode (which has a direct debugging-time benefit) with clean code that doesn't let such terrible errors slip through the cracks?
I'm also going to sweeten the pot on this one and offer my first bounty on this one once it becomes eligible. (50 pts)

You need to use something called "swaptimization".
MyClass f(void)
{
MyClass retval;
// some computation to populate retval
return retval;
}
int main() {
MyClass ret;
f().swap(ret);
}
This will prevent a copy and keep the code clean in all modes.
You can also try the same trick as auto_ptr, but that's more than a little iffy.

If your definition of g is written the same as in your code base I'm not sure how it compiled since the compiler isn't allowed to bind unnamed temporaries to non-const references. This may be a bug in VS2005.
If you make the converting constructor explicit then you can use it in your function(s) (you would have to say return MyClass(&retval);) but it won't be allowed to be called in your example unless the conversion was explicitly called out.
Alternately move to a C++11 compiler and use full move semantics.
(Do note that the actual optimization used is Named Return Value Optimization or NRVO).
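The effect of explicit can be checked at compile time. Below is a sketch with two made-up structs, ImplicitSteal and ExplicitSteal, that differ only in that keyword. It assumes a C++11 compiler for the type traits, which VS2005 does not provide, so treat it as an illustration rather than something to drop into that code base.

```cpp
#include <cassert>
#include <type_traits>

struct ImplicitSteal {
    ImplicitSteal() : data(nullptr) {}
    ImplicitSteal(ImplicitSteal* other) : data(other->data) { other->data = nullptr; }
    int* data;
};

struct ExplicitSteal {
    ExplicitSteal() : data(nullptr) {}
    explicit ExplicitSteal(ExplicitSteal* other) : data(other->data) { other->data = nullptr; }
    int* data;
};

// Without explicit, a pointer silently converts to an object (the g() accident).
static_assert(std::is_convertible<ImplicitSteal*, ImplicitSteal>::value,
              "implicit conversion from pointer is allowed");
// With explicit, the silent conversion is rejected...
static_assert(!std::is_convertible<ExplicitSteal*, ExplicitSteal>::value,
              "implicit conversion from pointer is rejected");
// ...but spelling the conversion out still works.
static_assert(std::is_constructible<ExplicitSteal, ExplicitSteal*>::value,
              "explicit construction from pointer is allowed");

// Mirrors `return MyClass(&retval);` from the text above.
inline ExplicitSteal makeExplicit(int* payload) {
    ExplicitSteal retval;
    retval.data = payload;
    return ExplicitSteal(&retval);
}
```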

The problem is occurring because you're using MyClass* as a magic device, sometimes but not always. Solution: use a different magic device.
class MyClass;
class TempClass { //all private except destructor, no accidental copies by callees
friend class MyClass;
stuff* m_dataPtr; //unfortunately requires duplicate data
//can't really be tricked due to circular dependencies.
TempClass() : m_dataPtr(NULL) {}
TempClass(stuff* p) : m_dataPtr(p) {}
TempClass(const TempClass& p) : m_dataPtr(p.m_dataPtr) {}
public:
~TempClass() {delete m_dataPtr;}
};
class MyClass {
stuff* m_dataPtr;
MyClass(const MyClass& b) {
m_dataPtr = new stuff();
}
MyClass(TempClass& b) {
m_dataPtr = b.m_dataPtr ;
b.m_dataPtr = NULL;
}
~MyClass() {delete m_dataPtr;}
//be sure to overload operator= too.
TempClass f(void) //note: returns hack. But it's safe
{
MyClass retval;
// some computation to populate retval
return retval;
}
operator TempClass() {
TempClass r(m_dataPtr);
m_dataPtr = NULL;
return r;
}
};
Since TempClass is almost all private (friending MyClass), other objects cannot create, or copy TempClass. This means the hack can only be created by your special functions when clearly told to, preventing accidental usage. Also, since this doesn't use pointers, memory can't be accidentally leaked.

Move semantics have been mentioned, you've agreed to look them up for education, so that's good. Here's a trick they use.
There's a function template std::move which turns an lvalue into an rvalue reference, that is to say it gives "permission" to move from an object[*]. I believe you can imitate this for your class, although I won't make it a free function:
struct MyClass;
struct MovableMyClass {
MyClass *ptr;
MovableMyClass(MyClass *ptr) : ptr(ptr) {}
};
struct MyClass {
MyClass(const MovableMyClass &tc) {
// unfortunate, we need const reference to bind to temporary
MovableMyClass &t = const_cast<MovableMyClass &>(tc);
this->m_dataPtr = t.ptr->m_dataPtr;
t.ptr->m_dataPtr = 0;
}
MovableMyClass move() {
return MovableMyClass(this);
}
};
MyClass f(void)
{
MyClass retval;
return retval.move();
}
I haven't tested this, but something along those lines. Note the possibility of doing something const-unsafe with a MovableMyClass object that actually is const, but it should be easier to avoid ever creating one of those than it is to avoid creating a MyClass* (which you've found out is quite difficult!)
[*] Actually I'm pretty sure I've over-simplified that to the point of being wrong, it's actually about affecting what overload gets chosen rather than "turning" anything into anything else as such. But causing a move instead of a copy is what std::move is for.

A different approach, given your special scenario:
Change MyClass f(void) (or operator+) to something like the following:
MyClass f(void)
{
MyClass c;
inner_f(c);
return c;
}
And let inner_f(c) hold the actual logic:
#ifdef TESTING
# pragma optimize("", off)
#endif
inline void inner_f(MyClass& c)
{
// actual logic here, setting c to whatever needed
}
#ifdef TESTING
# pragma optimize("", on)
#endif
Then, create an additional build configuration for this kind of testing, in which TESTING is included in the preprocessor definitions.
This way, you can still take advantage of RVO in f(), but the actual logic will not be optimized on your testing build. Note that the testing build can either be a release build or a debug build with optimizations turned on. Either way, the sensitive parts of the code will not be optimized (you can use the #pragma optimize in other places too, of course - in the code above it only affects inner_f itself, and not code called from it).

Possible solutions
Set higher optimization options for the compiler so it optimizes out the copy construction
Use heap allocation and return pointers or pointer wrappers, preferably with garbage collection
Use the move semantics introduced in C++11; rvalue references, std::move, move constructors
Use some swap trickery, either in the copy constructor or the way DeadMG did, but I don't recommend them with a good conscience. An inappropriate copy constructor like that could cause problems, and the latter is a bit ugly and needs easily destructible default objects which might not be true for all cases.
+1: Check and optimize your copy constructors, if they take so long then something isn't right about them.

I would prefer to simply pass the object by reference to the calling function when MyClass is too big to copy:
void f(MyClass &retval) // <--- no worries !
{
// some computation to populate retval
}
Just simple KISS principle.

Okay, I think I have a solution to bypass the Return Value Optimization in release mode, but it depends on the compiler and is not guaranteed to work. It is based on this.
MyClass f (void)
{
MyClass retval;
MyClass dummy;
// ...
volatile bool b = true;
return b ? retval : dummy;
}
As for why the copy construction takes so long in DEBUG mode, I have no idea. The only possible way to speed it up while remaining in DEBUG mode is to use rvalue references and move semantics. You already discovered move semantics with your "move" constructor that accepts pointer. C++11 gives a proper syntax for this kind of move semantics. Example:
// Suppose MyClass has a pointer to something that would be expensive to clone.
// With move construction we simply move this pointer to the new object.
MyClass (MyClass&& obj) :
ptr (obj.ptr)
{
// We set the source object to some trivial state so it is easy to delete.
obj.ptr = NULL;
}
MyClass& operator = (MyClass&& obj)
{
// Here we simply swap the pointer so the old object will be destroyed instead of the temporary.
std::swap(ptr, obj.ptr);
return *this;
}
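Put together as a complete class, the two members above might look like the following sketch. Buffer and its members are made-up names, and copying is disabled only to keep the example short; a real MyClass would presumably keep its deep-copy constructor alongside the move operations.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new int[n]()) {}
    ~Buffer() { delete[] data_; }

    // Copying is disabled in this sketch.
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    // Move construction: take the pointer and leave the source trivially destructible.
    Buffer(Buffer&& other) noexcept : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }

    // Move assignment: swap, so the old contents are destroyed with the source.
    Buffer& operator=(Buffer&& other) noexcept {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
        return *this;
    }

    std::size_t size() const { return size_; }
    const int* data() const { return data_; }

private:
    std::size_t size_;
    int* data_;
};

inline void demoBufferMove() {
    Buffer a(4);
    const int* original = a.data();
    Buffer b(std::move(a));  // pointer transferred, no element copied
    assert(b.data() == original && a.data() == nullptr);
    Buffer c(2);
    c = std::move(b);        // c now owns the original storage
    assert(c.data() == original && c.size() == 4);
}
```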
