Prevent unnecessary copying between large structs - c++

I have huge structs DataFrom and Data (which have different members in reality). Data is created from DataFrom.
struct DataFrom{
int a = 1;
int b = 2;
};
static DataFrom dataFrom;
struct Data{
int a;
int b;
};
class DataHandler{
public:
static Data getData(const DataFrom& data2){
Data data;
setA(data, data2);
setB(data, data2);
return data;
}
private:
static void setA(Data& dest, const DataFrom& source){
dest.a = source.a;
}
static void setB(Data& dest, const DataFrom& source){
dest.b = source.b;
}
};
int main(){
auto data = DataHandler::getData(dataFrom); // copy of whole Data structure
// ...
return 0;
}
As Data is huge, the getData function copies the whole Data structure. Can this be prevented in an elegant way?
I had an idea about:
static void getData( Data& data, const DataFrom& data2);
But I would prefer to retrieve data as a return value, not an output parameter.

There are two potential "copy hazards" to address here:
Copy hazard 1: The construction outside getData()
On the first line of main(), where you commented "copy of whole Data structure": as commenters noted, the structure will almost certainly not be copied, thanks to the Named Return Value Optimization, or NRVO for short. The standard permits but does not require NRVO, though mainstream compilers apply it in a function like this one. You can read about it in this nice blog post from a few years back:
Fluent{C++}: Return value optimizations
In a nutshell: The compiler arranges it so that data inside the getData function, when it is called from main(), is actually an alias of data in main.
Copy hazard 2: data and data2
The second "copy scare" is with setA() and setB(). Here you must be more pro-active, since you do have two live, valid structs in the same function - data and data2 within getData(). Indeed, if Data and DataFrom are simply large structs - then you will be doing a lot of copying from data2 to data, the way you wrote your code.
Move semantics to the rescue
If, however, your DataFrom holds a handle to some allocated storage, say, std::vector<int> a instead of int a[10000], you could move from your DataFrom instead of copying from it, by giving getData() the signature static Data getData(DataFrom&& data2). Read more about moving here:
What is move semantics?
In my example, this would mean you would now use the raw buffer of data2.a for your data - without copying the contents of that buffer anywhere else. But that would mean you can no longer use data2 afterwards, since its a field has been cannibalized, moved from.
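A minimal sketch of that move-based variant, assuming the bulky member is a std::vector<int>. The member types, the free function getData() and the example() helper are illustrative assumptions, not the asker's real code:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical variants of the structs: the bulky payload lives in a vector.
struct DataFrom {
    std::vector<int> a;
    int b = 2;
};

struct Data {
    std::vector<int> a;
    int b = 0;
};

// Takes an rvalue, so the caller must give up the source explicitly.
Data getData(DataFrom&& source) {
    Data data;
    data.a = std::move(source.a);  // steals the heap buffer, no element-wise copy
    data.b = source.b;             // plain ints are simply copied
    return data;                   // NRVO candidate; otherwise a cheap move
}

inline void example() {
    DataFrom from;
    from.a.assign(10000, 1);
    const int* buffer = from.a.data();

    Data data = getData(std::move(from));

    assert(data.a.data() == buffer);  // the very same heap buffer
    assert(data.a.size() == 10000);
}
```

After the call, from.a must not be relied upon any more; only from.b still holds a meaningful value.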
... or just be "lazy".
Instead of a move-based approach, you might try something else. Suppose you defined something like this:
class Data {
protected:
DataFrom& source_;
public:
int& a() { return source_.a; }
int& b() { return source_.b; }
public:
Data(DataFrom& source) : source_(source) { }
Data(Data& other) : source_(other.source_) { }
// copy constructor?
// assignment operators?
};
Now Data is not a simple struct; it is more of a facade for a DataFrom (and perhaps some other fields and methods). That's a bit less convenient, but the benefit is that you now create a Data with merely a reference to a DataFrom and no copying of anything else. On access, you may need to dereference a pointer.
Other notes:
Your DataHandler is defined as a class, but it looks like it serves as just a namespace. You never instantiate "data handlers". Consider reading:
Why and how should I use namespaces in C++?
My suggestions do not involve any C++17. Move semantics were introduced in C++11, and if you choose the "lazy" approach - that would work even in C++98.

Since you've tagged this with c++17, you can write your code in a way that prevents any copying from taking place (or being able to take place), and if it compiles you will know that statically no copies will be made.
C++17 guarantees copy elision when returned from functions, which ensures no copies take place in certain circumstances. You can ensure this by changing your code so that Data has an = deleted copy-constructor, and changing getData to return a constructed object. If the code compiles correctly at all, you will be sure that no copy occurred (since copying would trigger a compile-error)
#include <iostream>
struct DataFrom{
int a = 1;
int b = 2;
};
static DataFrom dataFrom;
struct Data{
Data() = default;
Data(const Data&) = delete; // No copy
int a;
int b;
};
class DataHandler{
public:
static Data getData(const DataFrom& data2){
// construct it during return
return Data{data2.a, data2.b};
}
private:
static void setA(Data& dest, const DataFrom& source){
dest.a = source.a;
}
static void setB(Data& dest, const DataFrom& source){
dest.b = source.b;
}
};
int main(){
auto data = DataHandler::getData(dataFrom); // no copy: the result is constructed directly in data
return 0;
}
This will compile without any extra copies -- you can see it here on compiler explorer
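One caveat: the brace initialization above relies on Data being an aggregate, which is no longer the case in C++20 once any constructor is user-declared. A sketch that does not depend on that, assuming a hypothetical two-argument constructor and additionally deleting the move constructor, could look like this (it needs C++17 or later):

```cpp
#include <cassert>
#include <type_traits>

struct DataFrom {
    int a = 1;
    int b = 2;
};

struct Data {
    Data(int a_, int b_) : a(a_), b(b_) {}  // hypothetical constructor
    Data(const Data&) = delete;             // no copy
    Data(Data&&) = delete;                  // no move either
    int a;
    int b;
};

// Returning a prvalue: since C++17 it initializes the caller's object directly.
Data getData(const DataFrom& source) {
    return Data(source.a, source.b);
}

static_assert(!std::is_copy_constructible<Data>::value, "copying is disabled");
static_assert(!std::is_move_constructible<Data>::value, "moving is disabled");

inline void example() {
    DataFrom from;
    Data data = getData(from);  // compiles only because no copy or move is needed
    assert(data.a == 1 && data.b == 2);
}
```

Note that this guarantee covers only returning a prvalue. Returning a named local variable (as in the original getData) is still ordinary NRVO and would not compile with these deleted constructors.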

Related

unique_ptr<char[]> confusion

I have a class in which I would like one of the functions to take a unique_ptr to a char array. But I'm confused about several features of unique pointers. I'm aware a destructor is called automatically when there are no more references to the object, but is it still the same for primitive variables? For instance if I do this, will the memory be deleted?
class A {
private:
public:
A(std::unique_ptr<char[]> data) {
data = nullptr;
}
~A();
};
int main() {
auto data = std::make_unique<char[]>(10);
A a(std::move(data));
return 0;
}
The next question I have is: If I have a private object which I want to point to data, why does this result in a compiler error?
class A {
private:
std::unique_ptr<char[]> internaldata;
public:
A(std::unique_ptr<char[]> data) {
internaldata = data;
}
~A() {
internaldata = nullptr;
}
};
int main() {
auto data = std::make_unique<char[]>(10);
A a(std::move(data));
return 0;
}
However when I call std::move while assigning it, the code compiles fine.
class A {
private:
std::unique_ptr<char[]> internaldata;
public:
A(std::unique_ptr<char[]> data) {
internaldata = std::move(data);
}
~A() {
internaldata = nullptr;
}
};
int main() {
auto data = std::make_unique<char[]>(10);
A a(std::move(data));
return 0;
}
But why do I have to call std::move twice here? Once for passing the argument then the second for assigning? And what exactly occurs in terms of reference count during that process, does a reallocation, copy and deletion occur?
And finally, is it possible to pass data into the smart pointer during the declaration? Because currently I do it like this:
auto data = std::make_unique<char[]>(10);
char* buf = data.get();
strcpy(buf, "hello\0");
But is it possible to do something along the lines of:
char hellobuffer[] = "hello";
auto data = std::make_unique<char[]>(hellobuffer);
Where data is automatically assigned the correct size needed to store hellobuffer and copies over the data itself?
I'm aware a destructor is called automatically when there are no more references to the object, but is it still the same for primitive variables?
The destructor is always logically called. However, since things like int and char are trivially-destructible, the compiler understands that nothing should actually get called.
For instance if I do this, will the memory be deleted?
Yes -- the whole point of std::unique_ptr<T> is that your memory is taken care of automatically.
A(std::unique_ptr<char[]> data) {
internaldata = data;
}
That example fails to compile because internaldata = data is calling the copy-assignment operator and copying std::unique_ptr instances is disallowed (hence the unique bit).
And what exactly occurs in terms of reference count during that process, does a reallocation, copy and deletion occur?
There is no reference count -- a std::unique_ptr either refers to something or it is empty. When you std::move from a std::unique_ptr, the moved-from variable becomes empty. If you are looking for a reference-counted pointer type, see std::shared_ptr<T>.
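As for the two std::move calls: inside the constructor the parameter has a name, so it is an lvalue again and must be moved a second time to reach the member. A small sketch of that hand-over, using a member initializer list (the get() accessor and the example() helper are additions for illustration only):

```cpp
#include <cassert>
#include <memory>
#include <utility>

class A {
public:
    // By-value sink parameter: the caller moves into it,
    // and the constructor moves it on into the member.
    explicit A(std::unique_ptr<char[]> data) : internaldata(std::move(data)) {}

    const char* get() const { return internaldata.get(); }

private:
    // No destructor needed: the member releases the array automatically.
    std::unique_ptr<char[]> internaldata;
};

inline void example() {
    auto data = std::make_unique<char[]>(10);
    const char* raw = data.get();

    A a(std::move(data));     // ownership transfers; nothing is reallocated

    assert(data == nullptr);  // the moved-from unique_ptr is empty
    assert(a.get() == raw);   // same buffer, now owned by a
}
```

Each move only transfers the stored pointer, so no allocation, copy of the buffer, or deletion happens along the way.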
And finally, is it possible to pass data into the smart pointer during the declaration?
No. For std::make_unique<T[]>, you are only allowed to pass a std::size_t (see overload 2). It should be easy to write a wrapper function for what you are looking for.
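Such a wrapper might look like the sketch below. The name make_unique_cstring is made up for this example and is not part of the standard library; it assumes the input is a null-terminated string:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: allocates strlen(text) + 1 chars and copies the
// string, including its terminating '\0', into the new buffer.
inline std::unique_ptr<char[]> make_unique_cstring(const char* text) {
    const std::size_t size = std::strlen(text) + 1;
    auto buffer = std::make_unique<char[]>(size);
    std::memcpy(buffer.get(), text, size);
    return buffer;
}

inline void example() {
    char hellobuffer[] = "hello";
    auto data = make_unique_cstring(hellobuffer);
    assert(std::strcmp(data.get(), "hello") == 0);
    assert(data.get() != hellobuffer);  // an independent copy
}
```

The unique_ptr does not remember the size of the array, so if the length is needed later it has to be stored separately (or std::string or std::vector<char> used instead).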

Initialise const std::vector Class Members Efficiently

Let's assume I am trying to create "immutable" class objects (i.e. member variables are defined with const). So when calling the constructor I currently call separate init functions to initialise the class members. But as a result there seems to be a lot of new vector-creating & vector-copying going on.
If the members weren't const I could perform the initialisations in the { } section of the constructor and write directly into values (which I assume would be more efficient). But this is not possible.
Are there better/cleaner/more efficient ways to design the construction of immutable classes?
#include <vector>
class Data
{
public:
const std::vector<int> values;
Data(unsigned int size, int other) : values(init(size, other)) { }
Data(const std::vector<int>& other) : values(init(other)) { }
private:
std::vector<int> init(unsigned int size, int other) {
std::vector<int> myVector(size);
for (unsigned int i = 0; i < size; ++i)
myVector[i] = other * i;
return myVector;
}
std::vector<int> init(const std::vector<int>& other) {
std::vector<int> myVector(other);
for (unsigned int i = 0; i < other.size(); ++i)
myVector[i] *= myVector[i] - 1;
return myVector;
}
};
int main() {
Data myData1(5, 3); // gives {0, 3, 6, 9, 12}
Data myData2({2, 5, 9}); // gives {2, 20, 72}
return 0;
}
Your current design is perfectly fine. The initialization takes place in the constructor's member initialization list, so it will trigger a move at worst (which is quite cheap for a vector anyway) and NRVO at best.
NRVO is Named Return Value Optimization. When a function returns a named variable with automatic storage duration, the compiler is allowed to elide the copy/move. Note, however, that a copy or move constructor still needs to be accessible even when the elision happens. Here is a dummy example to sum up that concept:
SomeType foo() { // Return by value, no ref
SomeType some_var = ...; // some_var is a named variable
// with automatic storage duration
do_stuff_with(some_var);
return some_var; // NRVO can happen
}
(Your init function follows that pattern.)
In C++17 you could even benefit from a guaranteed copy elision in that scenario depending on the shape of your init function. You can find out more about this in this other SO answer.
Note: Since you tagged your question c++11 I assume move semantics are available.
You say
there seems to be a lot of new vector-creating & vector-copying going on.
But I am unsure of it. I would instead expect one full creation and one move here:
init builds and returns a temporary vector (a full vector creation, directly at the final size), which is used to initialize the const member (a move should occur here). We should check the generated assembly, but a decent compiler should build the data block once and move it into the data member.
So unless you can prove by profiling (or by looking at the assembly generated by your compiler) that things really need to be optimized here, I would gladly keep on with this code because it clearly declares the member constness.
The solution here is to remove const from the member vector so that you can perform your initialisation in place, rather than by copying.
If you want values to be readable but not writable by users of the class, you can expose a const reference to it:
class Data {
std::vector<int> values_;
// constructors...
public:
std::vector<int> const& values() const { return values_; }
};
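Filled in for the two constructors from the question, that approach could look roughly like this. It is a sketch only: std::size_t replaces unsigned int, the second constructor is marked explicit, and example() is a helper added for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Data {
public:
    Data(std::size_t size, int other) : values_(size) {
        // Fill the member in place; no temporary vector is created.
        for (std::size_t i = 0; i < size; ++i)
            values_[i] = other * static_cast<int>(i);
    }

    explicit Data(const std::vector<int>& other) : values_(other) {
        // One copy of the input, then modified in place.
        for (int& v : values_)
            v *= v - 1;
    }

    // Read-only view for users of the class.
    const std::vector<int>& values() const { return values_; }

private:
    std::vector<int> values_;  // non-const, but never exposed for writing
};

inline void example() {
    Data d1(5, 3);
    assert((d1.values() == std::vector<int>{0, 3, 6, 9, 12}));

    Data d2(std::vector<int>{2, 5, 9});
    assert((d2.values() == std::vector<int>{2, 20, 72}));
}
```

The trade-off is that immutability is now enforced by the class interface rather than by the type of the member, so member functions of Data could still modify values_.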

Pointer members of const struct

I have a class that looks somewhat like this:
class S
{
public:
int* data;
S() : data(new int[10]) {}
};
The constructor allocates the memory of 10 integers, and the default copy constructor as expected merely copies the pointer itself rather than the content.
Even if there is an instance of S that has const modifier, I can modify the data that data points to, since that data itself does not have const modifier. I could avoid this by making data private and only allowing write access via a non-const method like so:
class S
{
private:
int* data;
public:
S() : data(new int[10]) {}
int& operator()(size_t i)
{
return data[i];
}
const int& operator()(size_t i) const
{
return data[i];
}
};
But now I can use the copy constructor to circumvent the constness of the instance of S like so:
int main()
{
const S a; // Allocates memory
S b(a); // Also points to memory allocated for a
b(1) = 3; // Writes to a even though it is not supposed to be mutable
}
What would be an elegant way to solve this problem (potentially using templates)?
The data pointed to by an instance of const S should not be mutable at all using only this instance.
Copy constructor should only copy pointer, but not make a deep copy of the data.
Both a const S and an S should be creatable via a copy constructor given an instance of S such that the const instance cannot modify the data, but the non-const instance can.
It is possible to know in the copy constructor if the object being copied is const by providing two different copy constructors, one which takes a const parameter and one which does not. The compiler will select whichever version matches the passed parameter. Set a flag in the constructor so it can throw an error when a non-const operation is performed.
The best way to avoid the leaked memory shown in the question is to use a smart pointer like std::shared_ptr rather than a raw pointer. Unfortunately shared_ptr is meant for single objects, not arrays; workarounds are possible as in this StackOverflow question. I'm not going to try to solve this now, so the code below still has the leak.
To be complete you should follow the Rule of Three and provide an operator= and destructor as well. I left this as an exercise for the reader.
#include <cstddef>
#include <stdexcept>
class S
{
private:
int* data;
bool is_const; // true when this object was copied from a const S
public:
S() : data(new int[10]), is_const(false) { data[1] = 42; }
S(const S& other) : data(other.data), is_const(true) {}
S(S& other) : data(other.data), is_const(false) {}
int& operator()(size_t i)
{
if (is_const)
throw std::logic_error("non-const operation attempted");
return data[i];
}
const int& operator()(size_t i) const
{
return data[i];
}
};
See it in action: http://ideone.com/SFN89M
Delete the copy constructor (and assignment operator) for S. Create a new proxy class (SCopy) that holds a pointer to an S object (which is passed in to the constructor for SCopy). SCopy would then implement the const int &operator() const and not the non-const version.
This would then allow you to implement a destructor in S that would free the memory you're currently leaking.
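A rough sketch of that proxy idea follows. SCopy is the name used above; the remaining details (a non-owning pointer, zero-initialized storage, the example() helper) are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstddef>

class S {
public:
    S() : data(new int[10]()) {}  // elements value-initialized to zero
    ~S() { delete[] data; }       // safe now that S cannot be copied

    S(const S&) = delete;
    S& operator=(const S&) = delete;

    int& operator()(std::size_t i) { return data[i]; }
    const int& operator()(std::size_t i) const { return data[i]; }

private:
    int* data;
};

// Read-only proxy: shares the data of an S without owning it.
// The referenced S must outlive the proxy.
class SCopy {
public:
    explicit SCopy(const S& source) : source_(&source) {}
    const int& operator()(std::size_t i) const { return (*source_)(i); }

private:
    const S* source_;
};

inline void example() {
    S a;
    a(1) = 3;
    SCopy view(a);
    assert(view(1) == 3);
    // view(1) = 4;  // would not compile: the proxy only hands out const int&
}
```

This covers the read-only half of the requirements. A writable shallow copy of a non-const S would need a second, non-const proxy type along the same lines.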

How do I assign a data object with const members?

Hope this is not a duplicate. If so, please point me to it in a comment and I'll remove the question again.
I have a data object with data that's only valid in a bundle - i.e. there's no sense in changing the value of one member without invalidating the other members.
This data object describes some image information:
struct ImageInfo
{
ImageInfo(const double &ppix, ...)
: PpiX(ppix),
...
{ }
const double PpiX;
const double PpiY;
const int SizeX;
const int SizeY;
};
In my image object I have a non-const member of type ImageInfo:
class MyImageObject
{
...
private:
ImageInfo mMyInfo;
}
I want to be able to change mMyInfo at runtime, but only so that it will take a new ImageInfo(...) instance.
In the MyImageObject::Load() function, I'd like to read this data from the file info and then create an ImageInfo instance with the correct set of data:
double ppix = ImageFile.GetPpiX();
...
mMyInfo = ImageInfo(ppix, ...);
But I couldn't manage to write a valid assignment operator (copy constructor is possible of course). My solution left mMyInfo empty, because I didn't reference this:
ImageInfo operator=(const ImageInfo &other)
{
// no reference to *this
return ImageInfo(other);
}
Out of curiosity I'd like to know how the assignment operator for such a class would need to look like.
I'm using plain C++.
EDIT
Possible solutions (the goal is to keep the data transportable, but coherent):
Use private members together with Get...() functions -> simple, but I'd like to avoid the parentheses.
Store a pointer to ImageInfo: ImageInfo *mpMyInfo; (I'd like to avoid the heap.)
Use serialization and store the serialized ImageInfo, then create local instances from the serialized data.
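Non-static const members are allowed, but they cannot be assigned to after construction, so the compiler-generated assignment operator is deleted and you cannot write a meaningful one yourself. If you need read-only values that can still be replaced by assignment, you could do something like this: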
struct ImageInfo
{
private:
double myPpiX;
double myPpiY;
int mySizeX;
int mySizeY;
public:
ImageInfo(const double &ppix, ...)
: myPpiX(ppix),
PpiX(myPpiX),
...
{ }
ImageInfo( const ImageInfo &other)
: myPpiX( other.myPpiX),
PpiX(myPpiX),
...
{ }
const double &PpiX;
const double &PpiY;
const int &SizeX;
const int &SizeY;
// EDIT: explicit assignment operator was missing
ImageInfo& operator=(const ImageInfo &other)
{
myPpiX = other.myPpiX;
myPpiY = other.myPpiY;
mySizeX = other.mySizeX;
mySizeY = other.mySizeY;
return *this;
}
};
The values are stored in the private variables that can be set at construction, and their values accessed by const references. You're also not dependent on the references passed into the constructor living as long as the ImageInfo instance.
As they are data fields that can be modified, they are not const.
If you want to restrict post-construct access to them to const, you need to wrap them in accessors as follows:
struct ImageInfo
{
ImageInfo(const double &ppix, /*...*/)
: PpiX_(ppix),
/*...*/
{ }
double const& PpiX() const {return PpiX_; };
double const& PpiY() const {return PpiY_; };
int const& SizeX() const {return SizeX_; };
int const& SizeY() const {return SizeY_; };
private:
double PpiX_;
double PpiY_;
int SizeX_;
int SizeY_;
};
That allows move/copy assignment and construction, while blocking non-const access outside of said construction.
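To make that concrete, here is a compact sketch showing that such a class can be reassigned as a whole. The four-parameter constructor is an assumption (the original parameter list is elided), the accessors return by value for brevity, and example() is an illustrative helper:

```cpp
#include <cassert>

struct ImageInfo {
    // Assumed constructor signature; the original one is not shown in full.
    ImageInfo(double ppix, double ppiy, int sizex, int sizey)
        : PpiX_(ppix), PpiY_(ppiy), SizeX_(sizex), SizeY_(sizey) {}

    double PpiX() const { return PpiX_; }
    double PpiY() const { return PpiY_; }
    int SizeX() const { return SizeX_; }
    int SizeY() const { return SizeY_; }

private:
    double PpiX_;
    double PpiY_;
    int SizeX_;
    int SizeY_;
};

inline void example() {
    ImageInfo info(72.0, 72.0, 640, 480);

    // The implicitly generated assignment replaces all four values at once;
    // there is no way to change just one of them from outside the class.
    info = ImageInfo(300.0, 300.0, 1024, 768);

    assert(info.PpiX() == 300.0);
    assert(info.SizeY() == 768);
}
```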
Avoiding the () is tricky, but could be done with pseudo-references, something like this:
struct pseudo_const_reference_to_Ppix {
ImageInfo const* self;
operator double() const { return self->Ppix; }
void reseat( ImageInfo const* o ) { self = o; }
};
plus a whole pile of boilerplate to overload every const operator on the left and right such that the above pseudo_const_reference_* is just as valid as double.
Generic versions can be written (either taking a functor or a std::function if you are willing to suffer type erasure overhead).
Then you maintain these pseudo-const references on assignment and copy/move construction.
I think the () is the better option.
Note that the overhead of a pointer (or more) per pseudo-reference is basically unavoidable: member variables do not have access to the this from which they are invoked, even though the accessing site has it right there plain as day.
If something is const, you can't change it. Full stop.
So you must adjust the design somewhere, either not have those ImageInfo members const, not have ImageInfo as member, or best: not do the assignment.
Normally const members are set in the constructor. You can make a load function that creates a MyImageObject object with all its content, which avoids having a half-constructed object and loading the state in a second phase.
An alternative is to have the mMyInfo indirectly, say using unique_ptr, then you can replace it with another instance. I would not do that without a really good reason.
Immutable value objects are great. But the variable holding the value object (mMyInfo in MyImageObject) should be a (non-constant) pointer. In other languages (e.g. Java) this is automatically the case, but not in C++, where you need the * operator. Also, there is no need to override/implement the = operator for value objects. To change the image data within the image object, you assign a newly constructed ImageInfo object to the mMyInfo pointer. This way, none of the internal variables of the value object are changed.

Move Constructors and Static Arrays

I've been exploring the possibilities of Move Constructors in C++, and I was wondering what are some ways of taking advantage of this feature in an example such as below. Consider this code:
template<unsigned int N>
class Foo {
public:
Foo() {
for (int i = 0; i < N; ++i) _nums[i] = 0;
}
Foo(const Foo<N>& other) {
for (int i = 0; i < N; ++i) _nums[i] = other._nums[i];
}
Foo(Foo<N>&& other) {
// ??? How can we take advantage of move constructors here?
}
// ... other methods and members
virtual ~Foo() { /* no action required */ }
private:
int _nums[N];
};
Foo<5> bar() {
Foo<5> result;
// Do stuff with 'result'
return result;
}
int main() {
Foo<5> foo(bar());
// ...
return 0;
}
In this above example, if we trace the program (with MSVC++ 2011), we see that Foo<N>::Foo(Foo<N>&&) is called when constructing foo, which is the desired behaviour. However, if we didn't have Foo<N>::Foo(Foo<N>&&), Foo<N>::Foo(const Foo<N>&) would be called instead, which would do a redundant copy operation.
My question is, as noted in the code, with this specific example which is using a statically-allocated simple array, is there any way to utilize the move constructor to avoid this redundant copy?
First off, there's a general sort of advice that says you shouldn't write any copy/move constructor, assignment operator or destructor at all if you can help it, and rather compose your class of high-quality components which in turn provide these, allowing the default-generated functions to Do The Right Thing. (The reverse implication is that if you do have to write any one of those, you probably have to write all of them.)
So the question boils down to "which single-responsibility component class can take advantage of move semantics?" The general answer is: Anything that manages a resource. The point is that the move constructor/assigner will just reseat the resource to the new object and invalidate the old one, thus avoiding the (presumed expensive or impossible) new allocation and deep copying of the resource.
The prime example is anything that manages dynamic memory, where the move operation simply copies the pointer and sets the old object's pointer to zero (so the old object's destructor does nothing). Here's a naive example:
class MySpace
{
void * addr;
std::size_t len;
public:
explicit MySpace(std::size_t n) : addr(::operator new(n)), len(n) { }
~MySpace() { ::operator delete(addr); }
MySpace(const MySpace & rhs) : addr(::operator new(rhs.len)), len(rhs.len)
{ /* copy memory */ }
MySpace(MySpace && rhs) : addr(rhs.addr), len(rhs.len)
{ rhs.len = 0; rhs.addr = 0; }
// ditto for assignment
};
The key is that any copy/move constructor will do a full copying of the member variables; it is only when those variables are themselves handles or pointers to resources that you can avoid copying the resource, because of the agreement that a moved object is no longer considered valid and that you're free to steal from it. If there's nothing to steal, then there's no benefit in moving.
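The difference can be observed with standard containers. The following sketch (example() is just an illustrative helper) compares moving a std::vector, which is a handle to heap storage, with moving a std::array, which stores its elements inline much like the int _nums[N] member in the question:

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

inline void example() {
    // A vector is a handle to heap storage: moving steals the pointer.
    std::vector<int> v1(1000, 7);
    const int* heap = v1.data();
    std::vector<int> v2(std::move(v1));
    assert(v2.data() == heap);  // same buffer, no elements copied
    assert(v1.empty());         // the source was emptied

    // An array stores its elements inline: "moving" it copies every int.
    std::array<int, 1000> a1;
    a1.fill(7);
    std::array<int, 1000> a2(std::move(a1));
    assert(a2.data() != a1.data());    // two distinct blocks of storage
    assert(a1[0] == 7 && a2[0] == 7);  // the source still holds its values
}
```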
In this case it's not useful because int has no move-constructors.
However, it could be useful if those were strings instead, for example:
template<unsigned int N>
class Foo {
public:
// [snip]
Foo(Foo<N>&& other) {
// move each element from other._nums to _nums
std::move(std::begin(other._nums), std::end(other._nums), &_nums[0]);
}
// [snip]
private:
std::string _nums[N];
};
Now you avoid copying strings where a move will do. I'm not sure if a conforming C++11 compiler will generate equivalent code if you omit all the copy-/move-constructors completely, sorry.
(In other words, I'm not sure if std::move is specially defined to do an element-wise move for arrays.)
For the class template you wrote, there's no advantage to take in a move constructor.
There would be an advantage if the member array was allocated dynamically. But with a plain array as a member, there's nothing to optimize, you can only copy the values. There's no way to move them.
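Since a hand-written move constructor gains nothing here, one option is to declare no copy or move operations at all and let the compiler generate them. A sketch under the assumptions that the class does not need the virtual destructor and that std::array is an acceptable replacement for the raw array (operator[] and example() are added for illustration):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

template <std::size_t N>
class Foo {
public:
    int& operator[](std::size_t i) { return nums_[i]; }
    const int& operator[](std::size_t i) const { return nums_[i]; }

private:
    std::array<int, N> nums_{};  // zero-initialized, replaces the constructor loop
};

// The compiler-generated copy and move operations are all available...
static_assert(std::is_copy_constructible<Foo<5>>::value, "copyable");
static_assert(std::is_move_constructible<Foo<5>>::value, "movable");
// ...and for an array of ints they amount to a plain memberwise copy.
static_assert(std::is_trivially_copyable<Foo<5>>::value, "trivially copyable");

inline Foo<5> bar() {
    Foo<5> result;
    result[2] = 42;
    return result;  // NRVO candidate; at worst one memberwise copy
}

inline void example() {
    Foo<5> foo(bar());
    assert(foo[2] == 42 && foo[0] == 0);
}
```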
Usually, move semantics are implemented when your class manages a resource. Since in your case the class doesn't manage a resource, a move would behave just like a copy, as there is nothing to be moved.
To better understand when move semantics become useful, consider making _nums a pointer instead of an array:
template<unsigned int N>
class Foo {
public:
Foo()
{
_nums = new int[N](); //allocate and zero-initialize
}
Foo(const Foo<N>& other)
{
_nums = new int[N];
for (int i = 0; i < N; ++i) _nums[i] = other._nums[i];
}
Foo(Foo<N>&& other)
{
_nums = other._nums; //move the resource
other._nums=0; //make it null
}
Foo<N>& operator=(const Foo<N> & other); //implement it!
virtual ~Foo() { delete [] _nums; }
private:
int *_nums;
};