Are pointers used when copying a class with huge array member? - c++

I have a class storing a multidimensional array as a member.
struct Structure
{
Structure()
{
memset(Data, 0, sizeof Data);
}
int Number;
int Data[32][32][32];
};
When I write a function returning an object of this Structure, are all the bytes of the Data member copied, or is just a reference passed?
Structure Create(int Parameter)
{
Structure structure;
// modify structure based on parameter
// ...
return structure;
}
If that results in copying the whole block of data, how can I do better? And what would it change to allocate the object on the heap like Structure *structure = new Structure(); and returning that pointer?

When you return the object by value, the actual data (32³ = 32,768 ints) will be copied. It's possible the compiler will optimise away the copy (so-called "copy elision"), but for a named local variable such as structure that is never guaranteed.
If you allocate the object dynamically and return a pointer to it, there will be no copying, of course. If you have access to C++11, consider returning a std::unique_ptr, so that ownership is clear and there's no chance of memory leaks.
In C++11, you could also "do better" by turning the member Data into a container (such as std::vector) which internally stores its data on the heap and has move semantics. This means that when returning from a function, the container will be moved instead of copied, and data will not be duplicated.
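As a rough sketch of that last suggestion, the array could be replaced by a flat std::vector. The names VecStructure, CreateVec and at are made up for this illustration and are not from the question; the flat indexing scheme is just one possible layout.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical variant of Structure: the 32*32*32 ints live on the heap.
struct VecStructure
{
    static constexpr std::size_t N = 32;

    int Number = 0;
    std::vector<int> Data = std::vector<int>(N * N * N, 0);  // zero-initialised

    // Index into the flat buffer as if it were Data[x][y][z].
    int& at(std::size_t x, std::size_t y, std::size_t z)
    {
        assert(x < N && y < N && z < N);
        return Data[(x * N + y) * N + z];
    }
};

VecStructure CreateVec(int parameter)
{
    VecStructure s;
    s.Number = parameter;
    s.at(1, 2, 3) = parameter;
    return s;  // elided or moved; the ints are not copied element by element
}
```

Moving a VecStructure only hands over the vector's internal pointer, so `VecStructure b = std::move(a);` leaves b owning the same heap buffer that a had.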

When you return an object by value, the object will usually not be copied at all; instead, your function will populate the object directly in the caller's memory. It's as if you did the following:
Structure s;
Create(Parameter, &s);
It is actually a little better than that, as the default constructor doesn't even get called in the caller. This is called "return value optimisation". Although it's not guaranteed by the standard for a named local variable, it is performed by all mainstream C++ compilers (clang, gcc, and Visual C++ all included).
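If you want to check what your own compiler does, one option is to count copy constructor calls. This is only a sketch: Probe and CreateProbe are hypothetical names, and the count you observe depends on the compiler, the language standard and the flags used, so no expected value is shown.

```cpp
#include <cstring>

// Hypothetical probe type: same data member as Structure, plus a copy counter.
struct Probe
{
    static int copies;  // how many times the copy constructor ran
    int Data[32][32][32];

    Probe() { std::memset(Data, 0, sizeof Data); }

    Probe(const Probe& other)
    {
        std::memcpy(Data, other.Data, sizeof Data);
        ++copies;
    }
};

int Probe::copies = 0;

Probe CreateProbe(int parameter)
{
    Probe p;
    p.Data[0][0][0] = parameter;
    return p;  // named return value: the compiler may elide this copy
}
```

After `Probe p = CreateProbe(5);`, printing Probe::copies shows how many copies were actually made in that build.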
If you want it on the heap then do this:
Structure * Create(int Parameter)
{
Structure * structure = new Structure();
return structure;
}
But it's better to use a smart pointer. If you're using c++11 you can use std::unique_ptr.
std::unique_ptr<Structure> Create(int Parameter)
{
auto structure = std::unique_ptr<Structure>(new Structure());
return structure;
}

You can have it behave however you want. If you declare a function like this:
Structure* someFunction();
it will return a pointer to a Structure object. This object must be allocated with new. For example, you could define the function like this:
Structure* someFunction() {
Structure* someNewStructure = new Structure();
return someNewStructure;
}
You have to remember that this element was created within this function, and the function is transferring "responsibility" for the destruction of this object on to the caller. It is usually best to avoid this, though with large data structures it sometimes can't be avoided. Another way to handle this is to define a copy constructor in your class so that you can do it the way you referenced. Without defining this copy constructor, if you did this:
Structure someFunction() {
Structure someResultStruct;
return someResultStruct;
}
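When you call someFunction in this case, if your class contained raw pointers to dynamically allocated memory, the implicit copy constructor would copy only the pointers (a shallow copy), and you would get weird behavior such as the same memory being freed twice, unless you define your own copy constructor. A plain array member like Data in the question is copied correctly by the implicit copy constructor.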
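To make the copy constructor point concrete, here is a minimal sketch of a class that owns heap memory through a raw pointer and therefore needs its own copy operations. Buffer and MakeBuffer are hypothetical names, not part of the question's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class that owns a heap buffer through a raw pointer.
class Buffer
{
public:
    explicit Buffer(std::size_t size) : size_(size), data_(new int[size]()) {}

    // Deep copy: allocate a fresh buffer and copy the elements across.
    Buffer(const Buffer& other) : size_(other.size_), data_(new int[other.size_])
    {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // Copy assignment via copy-and-swap, so self-assignment is safe.
    Buffer& operator=(Buffer other)
    {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
        return *this;
    }

    ~Buffer() { delete[] data_; }

    int& operator[](std::size_t i)
    {
        assert(i < size_);
        return data_[i];
    }

    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    int* data_;
};

Buffer MakeBuffer(std::size_t size)
{
    Buffer b(size);
    b[0] = 42;
    return b;  // safe whether the copy is elided or actually performed
}
```

With these three members in place (copy constructor, copy assignment, destructor), `Buffer b = a;` gives b its own storage, so changing b does not affect a. Holding the data in a std::vector instead would make all of this unnecessary.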

Related

Populating Array of Pointers in Function

I have a class with an array of pointers that gets dynamically allocated in the constructor. This class also has a function to populate the array.
HolderClass::HolderClass(int number)
{
arrayOfPointers = new ItemClass*[number];
}
void HolderClass::addItem(int number, ItemClass item)
{
arrayOfPointers[number] = &item;
}
Even though this would compile is my understanding correct in that I would actually be populating the array with dangling pointers since the lifetime of the item variable is only for the duration of the addItem function?
What would be the correct way of populating the arrayOfPointers with pointers to the passed in items? The one complexity here is that there will be child classes of ItemClass that will get passed to the addItem function so I don't believe default copy constructors could be used.
EDIT: This code is for an Arduino so I'm fairly limited in what can and can't be done. This also means that I would like to keep things simple for users of this class (since there are lots of Arduino newbies) and not require them to pass in a pointer to the addItem function (which would require them to manage the life of the passed in ItemClass object).
What would be the correct way of populating the arrayOfPointers with pointers to the passed in items?
Firstly, don't use raw new and delete. Use containers and smart pointers.
struct HolderClass
{
std::unique_ptr<std::unique_ptr<ItemClass>[]> arrayOfPointers;
// ...
};
HolderClass::HolderClass(int number)
{
arrayOfPointers = std::make_unique<std::unique_ptr<ItemClass>[]>(number);
}
Then, pass a pointer to addItem instead of a value to avoid object slicing and lifetime issues:
void HolderClass::addItem(int number, std::unique_ptr<ItemClass> item)
{
arrayOfPointers[number] = std::move(item);
}
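Put together, the approach above might look like the following sketch. The ItemClass and ChildItem definitions, the id() function, the count member and getItem are assumptions added for illustration; std::make_unique needs C++14, and the standard library headers used here may not be available on every Arduino toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical item hierarchy, only for illustration.
struct ItemClass
{
    virtual ~ItemClass() = default;
    virtual int id() const { return 0; }
};

struct ChildItem : ItemClass
{
    int id() const override { return 1; }
};

class HolderClass
{
public:
    explicit HolderClass(std::size_t number)
        : count(number),
          arrayOfPointers(std::make_unique<std::unique_ptr<ItemClass>[]>(number))
    {
    }

    // Takes ownership of the item, so it lives as long as the holder does.
    void addItem(std::size_t number, std::unique_ptr<ItemClass> item)
    {
        assert(number < count);
        arrayOfPointers[number] = std::move(item);
    }

    // Non-owning access; returns nullptr for a slot that was never filled.
    const ItemClass* getItem(std::size_t number) const
    {
        assert(number < count);
        return arrayOfPointers[number].get();
    }

private:
    std::size_t count;
    std::unique_ptr<std::unique_ptr<ItemClass>[]> arrayOfPointers;
};
```

A caller would write `holder.addItem(1, std::make_unique<ChildItem>());`. Because the holder stores a pointer to the original object rather than a copy, the derived part is not sliced off and virtual calls still reach ChildItem.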

Choosing the return type when dynamically allocating a big structure in a function in C++

Let's say my function:
vector<MyClass>* My_func(int a)
{
vector<MyClass>* ptr = new vector<MyClass>;
//...... Add a lot of elements to this vector, and let's say MyClass is also relatively big structure.
return ptr;
}
This method leaves the responsibility for freeing the pointer to the user.
Another method I can think of is just creating local variable in function and return the value:
vector<MyClass> My_func(int a)
{
vector<MyClass> vec;
//...... Add a lot of elements to this vector, and let's say MyClass is also relatively big structure.
return vec;
}
This one avoids placing that responsibility on the user, but may take a lot of space when the value is returned and copied.
Maybe a smart pointer in C++ is a better choice, but I am not sure. I have not used smart pointers before. What do people do when they come across this situation? What kind of return type will they choose?
Thanks ahead for your tips:-)
In most cases of this sort of construct, the compiler will do "Return Value Optimisation", which means that it's not actually copying the data structure being returned, but instead writing straight into the one that lives in the place where it will be returned to.
So, you can safely do this without worrying about it being copied.
However, another method would be to not return a vector in the first place, but request that the calling code pass one in:
So, something like:
void My_func(int a, vector<MyClass>& vec)
{
...
}
This is GUARANTEED to avoid copying.
In many situations return value optimizations easily can take care of the unnecessary copying. The question is: how are you planning to use this vector outside the function? If you have something like:
vector<MyClass> ret = My_func(a);
Then the optimizations can take care of the problem.
On the other hand, if you want to reuse an existing vector, you could pass a non-const reference to an existing vector, but there aren't many situations where this is needed or useful.
vector<MyClass> ret;
// do something with ret ...
My_func(a, ret);
Plus, this also changes the semantics of your function (e.g. you may need to clear() the vector).
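The two styles discussed above can be compared side by side in a short sketch. MyClass is reduced to a hypothetical one-member struct here, and MakeItems and FillItems are illustrative names rather than the question's My_func.

```cpp
#include <vector>

// Hypothetical element type standing in for the question's MyClass.
struct MyClass
{
    int value;
};

// Style 1: return by value; relies on copy elision or a move.
std::vector<MyClass> MakeItems(int a)
{
    std::vector<MyClass> vec;
    for (int i = 0; i < a; ++i)
        vec.push_back(MyClass{i});
    return vec;
}

// Style 2: fill a vector supplied by the caller.
void FillItems(int a, std::vector<MyClass>& vec)
{
    vec.clear();  // drop old contents so repeated calls do not accumulate
    for (int i = 0; i < a; ++i)
        vec.push_back(MyClass{i});
}
```

The second style mainly pays off when the same vector is refilled many times, because its already allocated storage can be reused; for a one-off call the first style is simpler to read.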
Here is a simplified view of the internal structure of a vector (real implementations differ in the details):
template <typename T>
class vector {
private:
size_t m_size;
size_t m_cap;
T * m_data;
public:
//methods push pop etc.
};
As you can see, the size of the vector object itself (a pointer plus 2 additional size_t data members) is not much larger than the size of a pointer. There is negligible performance benefit in passing a pointer instead of a vector; in fact, when using a pointer, accessing the vector will be slower, as each time you will have to dereference the pointer. Generally, we don't create a pointer to a vector.
Also, never return a pointer to a local variable. The local variable is destroyed once the function returns and it goes out of scope, so the pointer is left dangling. Ideally, you should create a vector in the calling function and pass a reference to it when calling your method My_func.

unique_ptr and polymorphism

I have some code that currently uses raw pointers, and I want to change to smart pointers. This helps clean up the code in various ways. Anyway, I have factory methods that return objects and it's the caller's responsibility to manage them. Ownership isn't shared and so I figure unique_ptr would be suitable. The objects I return generally all derive from a single base class, Object.
For example,
class Object { ... };
class Number : public Object { ... };
class String : public Object { ... };
std::unique_ptr<Number> State::NewNumber(double value)
{
return std::unique_ptr<Number>(new Number(this, value));
}
std::unique_ptr<String> State::NewString(const char* value)
{
return std::unique_ptr<String>(new String(this, value));
}
The objects returned quite often need to be passed to another function, which operates on objects of type Object (the base class). Without any smart pointers the code is like this.
void Push(const Object* object) { ... } // push simply pushes the value contained by object onto a stack, which makes a copy of the value
Number* number = NewNumber(5);
Push(number);
When converting this code to use unique_ptrs I've run into issues with polymorphism. Initially I decided to simply change the definition of Push to use unique_ptrs too, but this generates compile errors when trying to use derived types. I could allocate objects as the base type, like
std::unique_ptr<Object> number = NewNumber(5);
and pass those to Push - which of course works. However I often need to call methods on the derived type. In the end I decided to make Push operate on a pointer to the object stored by the unique_ptr.
void Push(const Object* object) { ... }
std::unique_ptr<Object> number = NewNumber(5);
Push(number.get());
Now, to the reason for posting. I'm wanting to know if this is the normal way to solve the problem I had? Is it better to have Push operate on the unique_ptr vs the object itself? If so how does one solve the polymorphism issues? I would assume that simply casting the ptrs wouldn't work. Is it common to need to get the underlying pointer from a smart pointer?
Thanks, sorry if the question isn't clear (just let me know).
edit: I think my Push function was a bit ambiguous. It makes a copy of the underlying value and doesn't actually modify, nor store, the input object.
Initially I decided to simply change the definition of Push to use
unique_ptrs too, but this generates compile errors when trying to use
derived types.
You likely did not correctly deal with uniqueness.
void push(std::unique_ptr<int>);
int main() {
std::unique_ptr<int> i;
push(i); // Illegal: tries to copy i.
}
If this compiled, it would trivially break the invariant of unique_ptr, that only one unique_ptr owns an object, because both i and the local argument in push would own that int, so it is illegal. unique_ptr is move only, it's not copyable. It has nothing to do with derived to base conversion, which unique_ptr handles completely correctly.
If push owns the object, then use std::move to move it there. If it doesn't, then use a raw pointer or reference, because that's what you use for a non-owning alias.
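For the owning case, a minimal sketch could look like this. Object and Number are cut down to hypothetical stand-ins for the question's classes, and OwningStack and PushOwned are invented names.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the question's Object and Number classes.
struct Object
{
    virtual ~Object() = default;
    virtual double value() const { return 0.0; }
};

struct Number : Object
{
    explicit Number(double v) : v_(v) {}
    double value() const override { return v_; }
    double v_;
};

// A stack that owns its elements, so PushOwned takes the unique_ptr by value.
struct OwningStack
{
    std::vector<std::unique_ptr<Object>> items;

    void PushOwned(std::unique_ptr<Object> object)
    {
        items.push_back(std::move(object));
    }
};
```

Given `std::unique_ptr<Number> n(new Number(5));`, the call `stack.PushOwned(std::move(n));` compiles: the unique_ptr<Number> converts to unique_ptr<Object> as part of the move, and n is empty afterwards.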
Well, if your functions operate on the (pointed to) object itself and don't need its address, neither take any ownership, and, as I guess, always need a valid object (fail when passed a nullptr), why do they take pointers at all?
Do it properly and make them take references:
void Push(const Object& object) { ... }
Then the calling code looks exactly the same for raw and smart pointers:
auto number = NewNumber(5);
Push(*number);
EDIT: But of course no matter if using references or pointers, don't make Push take a std::unique_ptr if it doesn't take ownership of the passed object (which would make it steal the ownership from the passed pointer). Or in general don't use owning pointers when the pointed to object is not to be owned, std::shared_ptr isn't anything different in this regard and is as bad a choice as a std::unique_ptr for Push's parameter if there is no ownership to be taken by Push.
If Push does not take ownership, it should probably take a reference instead of a pointer. And most probably a const one. So you'll have
Push(*number);
Now that's obviously only valid if Push isn't going to keep the pointer anywhere past its return. If it does I suspect you should try to rethink the ownership first.
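A small sketch of the non-owning version, under the assumption stated in the question that Push only copies the value. Object and Number are hypothetical stand-ins again, and ValueStack, Top and Negate are names invented for this example.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the question's Object and Number classes.
struct Object
{
    virtual ~Object() = default;
    virtual double value() const { return 0.0; }
};

struct Number : Object
{
    explicit Number(double v) : v_(v) {}
    double value() const override { return v_; }
    void Negate() { v_ = -v_; }  // exists only on the derived type
    double v_;
};

class ValueStack
{
public:
    // Non-owning: reads the object and stores a copy of its value.
    void Push(const Object& object) { values_.push_back(object.value()); }

    double Top() const
    {
        assert(!values_.empty());
        return values_.back();
    }

private:
    std::vector<double> values_;
};

std::unique_ptr<Number> NewNumber(double value)
{
    return std::unique_ptr<Number>(new Number(value));
}
```

The caller keeps a std::unique_ptr<Number>, so derived-only methods such as `number->Negate()` stay available, and `stack.Push(*number)` works the same way for a smart pointer, a raw pointer, or a plain local object passed as `stack.Push(local)`.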
Here's a polymorphism example using unique pointer:
vector<unique_ptr<ICreature>> creatures;
creatures.emplace_back(new Human);
creatures.emplace_back(new Fish);
unique_ptr<vector<string>> pLog(new vector<string>());
for (auto& creature : creatures) // standard range-based for; "for each (... in ...)" is a Visual C++ extension
{
auto state = creature->Move(*pLog);
}

Maintaining scope with class pointers in c++

I have a class that is responsible for creating and initializing a number of large objects, as the objects are all of the same Type and I don't want to repeat the same initializing code for all the objects, I call an Init method for each object, for example:
InitObject(objMember);
void Test::InitObject(LargeObject * obj)
{
obj = new LargeObject;
obj->Load();
obj->SetSomeProperty(false);
}
Once this has been done, from a public method I call a set of methods to get a pointer to each of the objects:
//public
LargeObject * Test::GetObject()
{
return objMember;
}
The issue is that the objects are losing scope, when InitObject is called, the objects are correctly constructed and populated, but when I call GetObject, it has lost everything.
I'm probably missing something trivial, but I can't see why it's going out of scope.
It is trivial, yes. You're initializing a copy of the original pointer. You probably want to pass it by reference:
void Test::InitObject(LargeObject*& obj)
Passing by value means that you're assigning the return of new to a copy of the pointer. The one outside the function InitObject remains unchanged.
A few more things - initializing objects after construction should be done with care. If the object isn't valid after construction, it's a bad design (excluding some rare cases). You can signal invalid initialization by throwing an exception from the constructor.
Also, consider using smart pointers instead of raw pointers.
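Both suggestions can be sketched as follows. LargeObject is reduced to a hypothetical two-flag struct, InitObject is written as a free function instead of a member of Test, and MakeObject is an invented name for the smart pointer variant.

```cpp
#include <memory>

// Hypothetical stand-in for the question's LargeObject.
struct LargeObject
{
    bool loaded = false;
    bool someProperty = true;
    void Load() { loaded = true; }
    void SetSomeProperty(bool value) { someProperty = value; }
};

// Fix from the answer: the reference makes the assignment visible to the caller.
void InitObject(LargeObject*& obj)
{
    obj = new LargeObject;
    obj->Load();
    obj->SetSomeProperty(false);
}

// Alternative: return the new object so ownership is explicit.
std::unique_ptr<LargeObject> MakeObject()
{
    std::unique_ptr<LargeObject> obj(new LargeObject);
    obj->Load();
    obj->SetSomeProperty(false);
    return obj;
}
```

With the first version the caller still has to delete the object eventually; with the second, a member declared as std::unique_ptr<LargeObject> can simply be assigned with `objMember = MakeObject();` and is cleaned up automatically.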

How are pointers to data members allocated/stored in memory?

This is one topic that is not making sense to me. Pointers to data members of a class can be declared and used. However,
What is the logic that supports the idea ? [I am not talking about the syntax, but the logic of this feature]
Also, if I understand this correctly, this would imply an indefinite/variable amount of memory being allocated at the pointer initialization, as any number of objects may exist at that time. Also, new objects may be created and destroyed during runtime. Hence, in effect, a single statement will cause a large number of allocations/deallocations. This seems rather counter-intuitive as compared to the rest of the language. Or is my understanding of this incorrect? I don't think there is any other single initialization statement that will implicitly affect program execution as widely as this.
Lastly, how is memory allocated to these pointers ? Where are they placed with respect to objects ? Is it possible to see physical memory addresses of these pointers ?
A single declaration of a pointer to a data member, creates pointers for every object of that class.
No, it does not. A pointer to a member is a special object that is very different from a pointer; it is a lot more similar to an offset. Given a pointer to an object of the class and a member pointer, you'd be able to get the value of a member; without the pointer to an object of a class a pointer to a member is useless.
Questions 2 and 3 stem from the same basic misunderstanding.
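A tiny sketch may help here: one pointer to member is created once and then combined with whichever object you like. Point and ReadMember are hypothetical names used only for this illustration.

```cpp
// Hypothetical class used only for this illustration.
struct Point
{
    int x;
    int y;
};

// Reads whichever member `member` designates from the object `p` points to.
int ReadMember(const Point* p, int Point::* member)
{
    return p->*member;
}
```

Given `int Point::* m = &Point::y;`, the calls `ReadMember(&a, m)` and `ReadMember(&b, m)` read a.y and b.y respectively. Creating m allocates nothing per object; it only records which member is meant.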
A single declaration of a pointer to a data member, creates pointers for every object of that class.
No. It creates a pointer to a member (which can be thought of as an offset from the base of the object).
You can then use it with a pointer to an object to get that member.
struct S
{
int x;
int y;
};
int S::* ptrToMember = &S::x; // Pointer to a member.
S obj;
int* ptrToData = &obj.x; // Pointer to object
// that happens to be a member
Notice in creating the pointer to a member we don't use an object (we just use the type information). So this pointer is an offset into the class to get a specific member.
You can access the data member via a pointer or object.
(obj.*ptrToMember) = 5; // Assign via pointer to member (requires an object)
*ptrToData = 6; // Assign via pointer already points at object.
Why does this happen as opposed to a single pointer being created to point to only one specific instance of the class ?
That is called a pointer.
A similar but parallel concept (see above).
What is the logic that supports the idea ?
Silly example:
void addOneToMember(S& obj, int S::* member) { (obj.*member) += 1; }
void addOneToX(S& obj) { addOneToMember(obj, &S::x);}
void addOneToY(S& obj) { addOneToMember(obj, &S::y);}
Also, if I understand this correctly, this would imply an indefinite/variable amount of memory being allocated at the pointer initialization as any number of objects may exist at that time.
No. Because a pointer to a member is just an offset into an object. You still need the actual object to get the value.
Lastly, how is memory allocated to these pointers ?
Same way as other objects. There is nothing special about them in terms of layout.
But the actual layout is implementation defined. So there is no way of answering this question without referring to the compiler. But it is really of no use to you.
Is it possible to see physical memory addresses of these pointers ?
Sure. They are just like other objects.
// Not that this will provide anything meaningful.
std::cout.write(reinterpret_cast<char*>(&ptrToMember), sizeof(ptrToMember));
// 1) take the address of the pointer to member.
// 2) cast to char* as required by write.
// 3) pass the size of the pointer to member
// and you should see the raw bytes written out.
// Note the values may be non printable but I am sure you can work with that
// Also note the meaning is not useful to you as it is compiler dependent.
Internally, for a class that does not have virtual bases, a pointer-to-member-data just has to hold the offset of the data member from the start of an object of that type. With virtual bases it's a bit more complicated, because the location of the virtual base can change, depending on the type of the most-derived object. Regardless, there's a small amount of data involved, and when you dereference the pointer-to-data-member the compiler generates appropriate code to access it.
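For a simple struct without virtual bases, the "offset" view can be observed from the outside without inspecting the pointer to member itself. The sketch below measures how far the designated member lies from the start of the object; MemberOffset is a hypothetical helper, and S mirrors the struct used earlier.

```cpp
#include <cstddef>

struct S
{
    int x;
    int y;
};

// Byte distance from the start of `obj` to the member designated by `member`.
std::ptrdiff_t MemberOffset(const S& obj, int S::* member)
{
    const char* base = reinterpret_cast<const char*>(&obj);
    const char* field = reinterpret_cast<const char*>(&(obj.*member));
    return field - base;
}
```

For this struct, `MemberOffset(obj, &S::y)` gives the same number for every S object and matches `offsetof(S, y)`, which is consistent with the pointer to member carrying nothing more than an offset.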