What's better from a performance point of view: std::map<uint32_t, MyObject> or std::map<uint32_t, MyObject*>, if MyObject is 'fat' (that is, operator= is rather expensive) and I have to insert/update/delete a lot?
If you'd prefer to store the objects "by value" but don't want to perform expensive copying, then just don't do the copying at all. For example, you can always insert "empty" objects (which can be copied quickly) and then fill them with actual content after they are already inserted into the map. The latter can be done more efficiently by employing, for example, move semantics instead of copy semantics. Associative containers are not supposed to perform any copying between already inserted elements (although in theory it is probably possible), so once you have taken care of the new element's insertion, you should not run into any additional issues with expensive copying.
For example, a typical "expensive" insertion scenario might look as follows:
MyObject new_value(/* constructor arguments */);
// Maybe do some additional preparations on `new_value`
// ...
// And now: the actual insertion
map[key] = new_value;
// .. which makes a call to the heavy assignment operator
Note that in this scenario it is you who's making the call to the assignment operator. Since you have control over the actual copying, you can rewrite it in a much less expensive fashion, as follows:
MyObject& new_value = map[key];
// Now `new_value` is a reference to a default-constructed object
// Here you should "load" the `new_value` object with whatever information
// you want it to carry. That should cover both the original constructor's
// functionality from the previous piece of code, as well as any
// post-constructor preparations
// ...
Note that in the second scenario the effort required to build the new value is basically the same as in the first one, but there is no extra copying for the actual insertion. Also note that in this case your object has to be default-constructible, which is not normally a requirement imposed on standard container elements.
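On a C++11 compiler the same "build it in place" idea can be expressed even more directly. A minimal sketch, assuming a stand-in MyObject with a movable std::string payload (not necessarily your actual class):

#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in for the "fat" type.
struct MyObject {
    std::string payload;  // imagine this being expensive to copy
    MyObject() = default;
    explicit MyObject(std::string p) : payload(std::move(p)) {}
};

int main() {
    std::map<std::uint32_t, MyObject> map;

    MyObject new_value("lots of data");
    map[42] = std::move(new_value);  // move assignment: steals the payload

    // Construct the element directly inside the map node:
    // no copy and no move of MyObject at all.
    map.emplace(std::piecewise_construct,
                std::forward_as_tuple(43),
                std::forward_as_tuple("even more data"));
}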
If you decide to store the objects "by pointer", a better idea would be to use appropriate smart pointers instead of raw pointers.
Check out the Boost Pointer Container Library for containers that hold pointers safely.
std::map<uint32_t, boost::shared_ptr<MyObject> > is a nice way to deal with this.
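For instance, a minimal sketch (MyObject here is a placeholder for the real fat class):

#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
#include <cstdint>
#include <map>

struct MyObject { /* fat payload elided */ };

int main() {
    std::map<std::uint32_t, boost::shared_ptr<MyObject> > map;

    // Only the cheap shared_ptr is copied into the map; the fat
    // MyObject itself is constructed exactly once on the heap.
    map[1] = boost::make_shared<MyObject>();

    // Updating an entry later just reseats the pointer.
    map[1] = boost::make_shared<MyObject>();
}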
If copy is expensive, it will generally be undesirable to store the map values by value.
If copying MyObjects is an expensive operation then there will be a performance penalty. However you should probably benchmark this yourself and see if it's even an issue in the context of all the other things your app is doing.
If you have Boost, you should consider storing a smart pointer (possibly boost::shared_ptr) in your map instead of bare pointers, as this is much safer.
Inserting elements will actually invoke the copy constructor, not operator=, and you shouldn't automatically assume that the overhead of dynamically allocating the object and accessing it through a pointer will be faster than paying the price of an extra construction. Especially with C++0x, rvalue references can make this cost evaporate, and the practice of storing containers of pointers as a performance optimization will disappear.
Point being, don't assume that storing a fat object in a container is bad, especially std::map (which should only make one copy). That extra level of indirection can really hurt on CPU bound operations. You should measure; but the "line" where it's better to store a pointer is often much higher than people think.
It also depends on what you mean by expensive. Modern processors are really good at moving memory around. But if by expensive you mean talking over a network or reading from disk, then yes, that extra copy might be painful (again, until rvalue references come around).
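If you want to act on the "you should measure" advice, a rough benchmarking sketch might look like this (Fat is a made-up stand-in for MyObject; absolute numbers will vary wildly by platform and payload size):

#include <chrono>
#include <cstdint>
#include <iostream>
#include <map>
#include <memory>

struct Fat { double payload[64]; };  // hypothetical "fat" value, 512 bytes

int main() {
    const std::uint32_t N = 200000;
    std::map<std::uint32_t, Fat> by_value;
    std::map<std::uint32_t, std::unique_ptr<Fat> > by_pointer;

    auto t0 = std::chrono::steady_clock::now();
    for (std::uint32_t i = 0; i < N; ++i) by_value[i] = Fat();
    auto t1 = std::chrono::steady_clock::now();
    for (std::uint32_t i = 0; i < N; ++i)
        by_pointer[i] = std::unique_ptr<Fat>(new Fat());
    auto t2 = std::chrono::steady_clock::now();

    // Read side: the pointer map pays an extra dereference per access.
    double sum = 0;
    for (const auto& kv : by_value)   sum += kv.second.payload[0];
    for (const auto& kv : by_pointer) sum += kv.second->payload[0];

    std::cout << "insert by value:   "
              << std::chrono::duration<double>(t1 - t0).count() << " s\n"
              << "insert by pointer: "
              << std::chrono::duration<double>(t2 - t1).count() << " s\n"
              << sum << '\n';  // keep the loops from being optimized away
}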
Inserting an object leads to a call of the assignment operator (=). So the performance depends on the implementation of MyObject's assignment operator. If the implementation has to copy large amounts of data, that may lead to a performance bottleneck. If the implementation uses, e.g., a kind of copy-on-write scheme, it should not.
More often than not, I suppose, the object style does not perform as well as the pointer style. However, whether it leads to a bottleneck depends on the actual application.
The advantage of using the object instead of the pointer is that it makes the memory management much easier.
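To illustrate the copy-on-write idea mentioned above, here is a minimal single-threaded sketch (not the answerer's actual code): assignment copies only a shared_ptr, and the payload is cloned lazily on the first write.

#include <cstddef>
#include <memory>
#include <vector>

class MyObject {
public:
    MyObject() : data_(std::make_shared<std::vector<double> >(1000000)) {}

    // The compiler-generated copy constructor and operator= only
    // bump a reference count; no payload is copied here.

    double read(std::size_t i) const { return (*data_)[i]; }

    void write(std::size_t i, double v) {
        if (data_.use_count() > 1)  // still shared? detach first
            data_ = std::make_shared<std::vector<double> >(*data_);
        (*data_)[i] = v;
    }

private:
    std::shared_ptr<std::vector<double> > data_;
};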
Obviously, pimpl is not strictly necessary, but if the class is designed the "normal" way, then, it seems that moving is not going to give you the full intended benefit: moving is supposed to be cheap; in particular, it should be much faster than copying; the cost should not "scale up" with the amount of "real" internal data. Ideally, the cost should be O(1).
Obviously, if you use pimpl, you will achieve this speed benefit every time with minimal effort and maximum reliability (thanks to = default). So, is there any reason not to just do pimpl all over the place whenever you want the ability to move an object?
(I'm assuming that you are allowed to use the heap in your application, since not being allowed to would obviously rule out pimpl.)
Moving is cheaper than copying when it matters - for types that are themselves expensive to copy, mostly due to internally allocating heap memory, like std::string or std::vector. Copying a class that has 10 ints is relatively cheap anyway, so there is little to be gained by having a faster "move" operation on that type of class.
While you are right that pimpl would speed up moves in any case, it also introduces a consistent overhead both for creation of the object (by having to allocate the internal "pimpl" object on the heap) and for usage (by now having an additional indirection for any external call). I personally don't think that is a reason to use pimpl solely for the sake of making moves faster - unless move is the primary bottleneck in your specific use case.
Personally, I would rather go the other way round - if you have a very large object consisting only of "trivial" types that cannot be efficiently moved, and you want the object to be moved fast, then just store the object on the heap (e.g. behind a std::unique_ptr). That way, you don't impose the overhead of pimpl on every user of the class, while still gaining the fast move when you need it.
With pimpl, the only class data member is a pointer. Move support is cheap for that case because copying that pointer is cheaper than copying the large data structure it points to.
But the same reasoning would apply if the class had several pointers to large structures, or a pointer to a large structure and a few small non-pointer members.
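A minimal sketch of the pattern being discussed (names are made up; in real code the two halves live in a header and a source file):

#include <memory>

// --- widget.h ---
class Widget {
public:
    Widget();
    ~Widget();                            // defined where Impl is complete
    Widget(Widget&&) noexcept;            // O(1): transfers one pointer
    Widget& operator=(Widget&&) noexcept;
private:
    struct Impl;
    std::unique_ptr<Impl> pimpl_;
};

// --- widget.cpp ---
struct Widget::Impl { /* all the "real" data lives here */ };
Widget::Widget() : pimpl_(new Impl) {}
Widget::~Widget() = default;
Widget::Widget(Widget&&) noexcept = default;
Widget& Widget::operator=(Widget&&) noexcept = default;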
Suppose I have a class A, and I need a vector of objects with class A.
Is it better to use std::vector<A*> or std::vector<A>?
I came across some lectures that mentioned the former doesn't require the definition of a copy constructor and copy assignment operator, while the latter requires definitions of both. Is this correct?
The lecture notes are not fully correct: using a vector of A uses a copy constructor / a copy assignment operator, but if the default implementation provided by the compiler works for you, you are not required to provide your own definition.
The decision to define a copy constructor, an assignment operator, and a destructor is made independently of the decision to place your objects in a container. You define these three when your class allocates its resources manually. Otherwise, default implementations should work.
Back to the main question, the decision to store a pointer vs. an object depends mostly on the semantics of your collection, and on the need to store objects with polymorphic behavior.
If polymorphic behavior is not needed, and creating copies of your objects is relatively inexpensive, using vector<A> is a better choice, because it lets the container manage resources for you.
If copying is expensive, and you need polymorphic behavior, you need to use pointers. They do not necessarily need to be raw pointers, though: the C++ Standard Library provides smart pointers that will deal with cleanup for you, so you wouldn't have to destroy your objects manually.
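A minimal sketch of both options, using an invented hierarchy:

#include <iostream>
#include <memory>
#include <vector>

struct A {
    virtual ~A() = default;
    virtual void print() const { std::cout << "A\n"; }
};
struct B : A {
    void print() const override { std::cout << "B\n"; }
};

int main() {
    // Value semantics: fine when A is cheap to copy and never subclassed.
    std::vector<A> values(3);

    // Polymorphic storage: smart pointers, no manual delete required.
    std::vector<std::unique_ptr<A> > ptrs;
    ptrs.push_back(std::unique_ptr<A>(new A));
    ptrs.push_back(std::unique_ptr<A>(new B));
    for (const auto& p : ptrs) p->print();  // prints "A" then "B"
}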
Unlike many other modern languages, C++ thrives on value types.
If you store pointers, you have to manage the resulting objects' lifetimes. Unless you use a unique_ptr or similar smart pointer, the language won't help you with that. Losing track of objects is leaking; keeping track of them after you have disposed of them is a dangling reference/pointer. Both are very common bugs.
If you store values (or value-like types), and you teach your data how to move itself efficiently, a vector will store it contiguously in memory.
On modern computers, CPUs are fast, and memory is amazingly slow. A typical computer will have 3 levels of cache in order to try to make memory faster, but if your data is scattered throughout memory (because you used the free store to store the objects), the CPU has little chance to figure out where you are going to access next.
If your data is in a contiguous buffer, not only will one fetch from memory get more than one object, but the CPU will be able to guess that you are going to want the next chunk of memory in the buffer, and pre-fetch it for you.
So the short version is: if your objects are modest in size, use a vector of actual copies of the objects. If they are modestly larger, stick the frequently accessed stuff in the object and the big, less frequently accessed part in a vector within the object, and write efficient move semantics. Then store the object itself in a vector; a sketch follows.
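Here is that layout sketched, with an invented Particle type:

#include <vector>

struct Particle {
    // Hot, frequently accessed fields stay inline and small:
    float x, y, z;
    int   id;

    // The big, rarely touched payload sits behind the vector's own
    // heap pointer, so moving a Particle moves a pointer, not the data.
    std::vector<double> history;
};

// A vector of such objects keeps the hot fields contiguous and
// cache-friendly, and relocations during growth are cheap moves.
std::vector<Particle> particles;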
There are a few exceptions to this.
First, polymorphism in value types is hard, so you end up using the free store a lot.
Second, some objects end up having their location part of their identity. Vector moves objects around, and the cost of "rehoming" an object can be not worth the bother.
Third, often performance doesn't matter much. So you do what is easy. At the same time, value types are not that hard, and while premature optimization is a bad idea, so is premature deoptimization. Learning how to work with value types and contiguous vectors is important.
Finally, learn the rule of zero. The rule of zero says that objects which manage resources should have their copy/move constructors, copy/move assignment operators, and destructors carefully written to follow value-semantics rules (or have them blocked). Objects that merely use those resource-management objects then typically do not need to write their own copy/move constructors, assignment operators, or destructors at all -- they can be omitted, =defaulted, or similar.
And code you don't write tends to have fewer bugs than code you write.
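For instance, a minimal rule-of-zero sketch (class and member names are invented):

#include <memory>
#include <string>
#include <vector>

// The resource-managing members (string, vector, shared_ptr) already
// implement correct copy/move/destruction, so this class declares
// none of the five special member functions itself.
class Document {
public:
    explicit Document(std::string name) : name_(std::move(name)) {}
private:
    std::string name_;
    std::vector<double> samples_;
    std::shared_ptr<const std::string> sharedNotes_;
};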
This is correct and it depends.
Storing pointers in your container gives you additional flexibility, since you don't need these operators, or you can have these operators with side effects and/or high cost. The container itself will, at worst, perform copies of the pointers, the cost of which is quite low. In addition, you can store objects of different sizes (instances of different classes in the same inheritance hierarchy come to mind, especially if they have virtual methods).
On the other hand, storing the objects themselves lowers the access overhead, as you won't need to dereference the pointers every time you access an element. Moreover, it improves data locality (and thus lowers cache misses, page misses…) and reduces memory consumption and fragmentation.
There's no general rule of thumb there. Small objects with no side effects in their constructors and copy operators probably belong directly in a container that's read often and rarely modified, while large objects with constructors and copy operators that have expensive side effects probably fit better outside of containers that are often modified or resized.
You have to consider your use case and weigh the pros and cons of each approach.
Do you need to check for identity? If yes, then use unique_ptr to store them, because then you don't need to take care of deleting the objects later.
Yes, you need copy operations, because if you store an A into the vector, the vector copies the object.
With A*, the vector only copies the address (the pointer).
I'm starting with the assumption that, generally, it is a good idea to allocate small objects on the stack and big objects in dynamic memory. Another assumption is that I'm possibly confused while trying to learn about memory, STL containers and smart pointers.
Consider the following example, where I have an object that is necessarily allocated in the free store through a smart pointer, and I can rely on clients getting said object from a factory, for instance. This object contains some data that is specifically allocated using an STL container, which happens to be a std::vector. In one case, this data vector itself is dynamically allocated using some smart pointer, and in the other situation I just don't use a smart pointer.
Is there any practical difference between design A and design B, described below?
Situation A:
class SomeClass {
public:
    SomeClass() { /* initialize some potentially big STL container */ }

private:
    std::vector<double> dataVector_;
};
Situation B:
class SomeOtherClass {
public:
    SomeOtherClass() { /* initialize some potentially big STL container,
                          but is it allocated in any different way? */ }

private:
    std::unique_ptr<std::vector<double>> pDataVector_;
};
Some factory functions:

std::unique_ptr<SomeClass> someClassFactory() {
    return std::make_unique<SomeClass>();
}

std::unique_ptr<SomeOtherClass> someOtherClassFactory() {
    return std::make_unique<SomeOtherClass>();
}
Use case:
int main() {
    // in my case I can reliably assume that the objects themselves
    // are always going to be allocated in dynamic memory
    auto pSomeClassObject(someClassFactory());
    auto pSomeOtherClassObject(someOtherClassFactory());
    return 0;
}
I would expect that both design choices have the same outcome, but do they?
Is there any advantage or disadvantage for choosing A or B? Specifically, should I generally choose design A because it's simpler or are there more considerations? Is B morally wrong because it can dangle for a std::vector?
tl;dr: Is it wrong to have a smart pointer pointing to an STL container?
edit:
The related answers pointed to useful additional information for someone as confused as myself.
Usage of objects or pointers to objects as class members and memory allocation
and Class members that are objects - Pointers or not? C++
And changing some google keywords led me to When vectors are allocated, do they use memory on the heap or the stack?
std::unique_ptr<std::vector<double>> is slower, takes more memory, and the only advantage is that it adds an extra possible state: "the vector doesn't exist". However, if you care about that state, use boost::optional<std::vector<double>> instead. You should almost never have a heap-allocated container, and definitely shouldn't hold one through a unique_ptr. It actually works fine, there's no "dangling"; it's just pointlessly slow.
Using std::unique_ptr here is just wasteful unless your goal is a compiler firewall (basically hiding the compile-time dependency on vector, but then you'd need a forward declaration of it, which is problematic for standard containers).
You're adding an indirection but, more importantly, the full contents of SomeClass turn into 3 separate memory blocks to load when accessing them (SomeClass merged with/containing unique_ptr's block, pointing to std::vector's block, pointing to its element array). In addition you're paying one extra, superfluous level of heap overhead.
Now you might start imagining scenarios where an indirection is helpful to the vector, like maybe you can shallow move/swap the unique_ptrs between two SomeClass instances. Yes, but vector already provides that without a unique_ptr wrapper on top. And it already has states like empty that you can reuse for some concept of validity/nilness.
Remember that variable-sized containers themselves are small objects, not big ones, pointing to potentially big blocks. vector isn't big, its dynamic contents can be. The idea of adding indirections for big objects isn't a bad rule of thumb, but vector is not a big object. With move semantics in place, it's worth thinking of it more like a little memory block pointing to a big one that can be shallow copied and swapped cheaply. Before move semantics, there were more reasons to think of something like std::vector as one indivisibly large object (though its contents were always swappable), but now it's worth thinking of it more like a little handle pointing to big, dynamic contents.
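A quick sketch of that "little handle, big contents" view:

#include <iostream>
#include <utility>
#include <vector>

int main() {
    std::vector<double> big(10000000);      // ~80 MB of contents

    // The handle itself is tiny, typically three pointers:
    std::cout << sizeof(big) << " bytes\n"; // e.g. 24 on a 64-bit build

    // Moving transfers the internal pointers; none of the ten million
    // doubles are copied, and no unique_ptr wrapper is needed for that.
    std::vector<double> stolen = std::move(big);
    std::cout << stolen.size() << '\n';     // 10000000; `big` is now empty
}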
Some common reasons to introduce an indirection through something like unique_ptr are:
Abstraction & hiding. If you're trying to abstract or hide the concrete definition of some type/subtype, Foo, then this is where you need the indirection so that its handle can be captured (or potentially even used with abstraction) by those who don't know exactly what Foo is.
To allow a big, contiguous 1-block-type object to be passed around from owner to owner without invoking a copy or invalidating references/pointers (iterators included) to it or its contents.
A hasty kind of reason that's wasteful but sometimes useful in a deadline rush is to simply introduce a validity/null state to something that doesn't inherently have it.
Occasionally it's useful as an optimization to hoist out certain less frequently-accessed, larger members of an object so that its commonly-accessed elements fit more snugly (and perhaps with adjacent objects) in a cache line. There unique_ptr can let you split apart that object's memory layout while still conforming to RAII.
Now wrapping a shared_ptr on top of a standard container might have more legitimate applications if you have a container that can actually be owned (sensibly) by more than one owner. With unique_ptr, only one owner can possess the object at a time, and standard containers already let you swap and move each other's internal guts (the big, dynamic parts). So there's very little reason I can think of to wrap a standard container directly with a unique_ptr, as it's already somewhat like a smart pointer to a dynamic array (but with more functionality to work with that dynamic data, including deep copying it if desired).
And if we talk about non-standard containers, like say you're working with a third party library that provides some data structures whose contents can get very large but they fail to provide those cheap, non-invalidating move/swap semantics, then you might superficially wrap it around a unique_ptr, exchanging some creation/access/destruction overhead to get those cheap move/swap semantics back as a workaround. For the standard containers, no such workaround is needed.
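A sketch of that workaround, with a made-up third-party type standing in for the library's container:

#include <memory>

// Hypothetical third-party structure that is expensive to move around.
struct BigThirdPartyTable { double cells[1 << 16]; };  // 512 KB inline

// Wrapping it once in unique_ptr restores O(1) move/swap, at the cost
// of one heap allocation and an indirection on every access.
class TableHandle {
public:
    TableHandle() : table_(new BigThirdPartyTable()) {}
    TableHandle(TableHandle&&) noexcept = default;             // pointer swap
    TableHandle& operator=(TableHandle&&) noexcept = default;  // pointer swap
    BigThirdPartyTable& get() { return *table_; }
private:
    std::unique_ptr<BigThirdPartyTable> table_;
};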
I agree with @MooingDuck; I don't think using std::unique_ptr has any compelling advantages here. However, I could see a use case for std::shared_ptr if the member data is very large and the class is going to support COW (copy-on-write) semantics (or any other use case where the data is shared across multiple instances).
Say you have a std::list of some class. There are two ways you can make this list:
1)
std::list<MyClass> myClassList;
MyClass myClass;
myClassList.push_front(myClass);
Using this method, the copy constructor will be called when you pass the object to the list. If the class has many member variables and you are making this call many times it could become costly.
2)
std::list<MyClass*> myClassList;
MyClass* myClass = new MyClass();
myClassList.push_front(myClass);
This method will not call the copy constructor for the class. I'm not exactly positive what happens in this case, but I think the list stores a copy of the pointer itself, i.e., the address held by the parameter. In fact, if you create myClass on the stack instead of the heap and let it go out of scope, then myClassList.front() is invalid, so that must be the case.
If I am wrong about this please correct me, but I believe the 2nd method is much more efficient for certain classes.
The important point to consider here is much more subtle than the performance issue.
Standard library containers work on copy semantics: they create a copy of the element you add to the container.
In general it is better to stay away from dynamic memory allocation in C++ unless you absolutely need it. The first option is better because you do not have to bother with allocations and deallocations; the container takes ownership of the object you add to it and does the management for you.
In the second case the container does not take ownership of the element you add; you have to manage it yourself. And if you must store pointers, then you should use a smart pointer as the container element rather than a raw pointer.
With respect to performance, you will need to profile the code samples on your system to see if the performance difference is notable enough to select one approach over the other.
This is always a tough question.
First of all, it really depends on whether your compiler supports C++11 move semantics or not, as this dramatically changes the aspects of the problem.
For those stuck in C++03
There are multiple choices:
std::list<MyClass> list;
list.push_front(MyClass());
Even though semantically there is a copy, the optimizer might remove most of the redundant/dead stores. Most optimizers will require that the definitions of the default constructor and copy constructor be available.
boost::ptr_deque<MyClass> deque;
std::auto_ptr<MyClass> p(new MyClass());
deque.push_front(p);
ptr_vector could be used should you replace push_front with push_back, otherwise it's a bit wasteful. This avoids most of the memory overhead of a std::list<MyClass*> and has the added bonus of automatically handling memory.
boost::stable_vector<MyClass> svec;
svec.push_back(MyClass()); // <-- the one copy happens here
There is one copy (as with list) but a guarantee that no further copies will be made within the container (as with list). It also allows a few more operations than list (for example, random access), at the cost of being slower for insertion in the middle for large containers.
For those enjoying C++11
std::list<MyClass> list;
list.push_front(MyClass());
does not generate any copy; instead, a move operation occurs.
It is also possible to use the new operations provided to construct objects in place:
std::list<MyClass> list;
list.emplace_front();
will create a new MyClass directly within the node, no copy, no move.
And finally, you may wish for a more compact representation or other operations on the container, in which case:
std::vector<std::unique_ptr<MyClass>> vec;
vec.emplace_back(new MyClass());
Offers you random access and a lower memory overhead.
If you are really concerned about performance but still need to use linked lists, consider using boost::intrusive::list. The main problem with using std::list is that you'll need to allocate new memory from the heap, and that's probably more costly than even the copy construction in most cases. Since boost::intrusive::list leaves allocation to you, you can keep your objects in a std::vector and allocate them in batches. This way, you also get better cache locality, another performance concern. Alternatively, you can use a custom allocator with std::list to do the same. Since using a custom allocator for std::list is probably about as messy as using a Boost intrusive list, I'd go with Boost, because you get many other useful features with it (such as keeping the same object in multiple lists, etc.).
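A minimal sketch of that combination, assuming Boost.Intrusive is available (MyClass gains a base hook; the names are illustrative):

#include <boost/intrusive/list.hpp>
#include <vector>

namespace bi = boost::intrusive;

// The element carries its own list hook, so linking it into a list
// is pure pointer surgery: the list itself never allocates.
struct MyClass : bi::list_base_hook<> {
    int value;
    explicit MyClass(int v) : value(v) {}
};

int main() {
    // Allocate the objects in one contiguous batch...
    std::vector<MyClass> storage;
    storage.reserve(100);  // filled once; must not reallocate while linked
    for (int i = 0; i < 100; ++i) storage.push_back(MyClass(i));

    // ...then thread an allocation-free intrusive list through them.
    bi::list<MyClass> lst;
    for (MyClass& m : storage) lst.push_back(m);

    lst.clear();  // unlink everything before the storage goes away
}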
BTW, don't be concerned about copy construction; the compiler will probably optimize out any unnecessary copying (unnecessary given the way you use it).
The problem with the first approach is low performance when MyClass is large, and the inability to have the same object in two data structures (in two lists, each with different semantics; in a list and a tree; etc.). If these downsides do not bother you, go with the first approach.
The second approach is more efficient but may be harder to manage. For example, you need to correctly destroy MyClass objects when they are no longer accessible. This may be non-trivial in the presence of exceptions (read about C++ exception safety). I would recommend looking at Boost Smart Pointers, which are intended to ease C++ pointer management. C++11 has these built in, so you don't need Boost if you use a modern compiler. Read Wikipedia for a short introduction.
Title says it.
Sample of bad practice:
std::vector<Point>* FindPoints()
{
    std::vector<Point>* result = new std::vector<Point>();
    //...
    return result;
}
What's wrong with it if I delete that vector later?
I mostly program in C#, so this problem is not very clear for me in C++ context.
As a rule of thumb, you don't do this because the less you allocate on the heap, the less you risk leaking memory. :)
std::vector is useful also because it automatically manages the memory used for the vector in RAII fashion; by allocating it on the heap you now require an explicit deallocation (with delete result) to avoid leaking its memory. The situation is made more complicated by exceptions, which can alter your return path and skip any delete you put on the way. (In C# you don't have such problems because inaccessible memory is simply reclaimed periodically by the garbage collector.)
If you want to return an STL container you have several choices:
just return it by value; in theory you should incur a copy penalty because of the temporaries that are created in the process of returning result, but newer compilers should be able to elide the copy using NRVO¹. There may also be std::vector implementations that implement a copy-on-write optimization like many std::string implementations do, but I've never heard of one.
On C++0x compilers, instead, the move semantics should trigger, avoiding any copy.
Store the pointer to result in an ownership-transferring smart pointer like std::auto_ptr (or std::unique_ptr in C++0x), and change the return type of your function to std::auto_ptr<std::vector<Point> >; that way, your pointer is always encapsulated in a stack object that is automatically destroyed when the function exits (in any way) and destroys the vector if it's still owned by it. Also, it's completely clear who owns the returned object.
Make the result vector a parameter passed by reference by the caller, and fill that one instead of returning a new vector.
Hardcore STL option: you would instead provide your data as iterators; the client code would then use std::copy+std::back_inserter or whatever to store such data in whichever container it wants. Not seen much (it can be tricky to code right) but it's worth mentioning.
¹ As @Steve Jessop pointed out in the comments, NRVO works completely only if the return value is used directly to initialize a variable in the calling method; otherwise, the compiler would still be able to elide the construction of the temporary return value, but the assignment operator for the variable to which the return value is assigned could still be called (see @Steve Jessop's comments for details).
Creating anything dynamically is bad practice unless it's really necessary. There's rarely a good reason to create a container dynamically, so it's usually not a good idea.
Edit: Usually, instead of worrying about things like how fast or slow returning a container is, most of the code should deal only with an iterator (or two) into the container.
Creating objects dynamically in general is considered a bad practice in C++. What if an exception is thrown from your "//..." code? You'll never be able to delete the object. It is easier and safer to simply do:
std::vector<Point> FindPoints()
{
    std::vector<Point> result;
    //...
    return result;
}
Shorter, safer, more straightforward... As for performance, modern compilers will optimize away the copy on return, and if they are not able to, move constructors will be executed, so this is still a cheap operation.
Perhaps you're referring to this recent question: C++: vector<string> *args = new vector<string>(); causes SIGABRT
One liner: It's bad practice because it's a pattern that's prone to memory leaks.
You're forcing the caller to accept dynamic allocation and take charge of its lifetime. It's ambiguous from the declaration whether the pointer returned is a static buffer, a buffer owned by some other API (or object), or a buffer that's now owned by the caller. You should avoid this pattern in any language (including plain C) unless it's clear from the function name what's going on (e.g strdup, malloc).
The usual way is to instead do this:
void FindPoints(std::vector<Point>* ret) {
    std::vector<Point> result;
    //...
    ret->swap(result);
}

void caller() {
    //...
    std::vector<Point> foo;
    FindPoints(&foo);
    // foo deletes itself
}
All objects are on the stack, and all the deletion is taken care of by the compiler. Or just return by value, if you're running a C++0x compiler+STL, or don't mind the copy.
I like Jerry Coffin's answer. Additionally, if you want to avoid returning a copy, consider passing the result container as a reference, and the swap() method may be needed sometimes.
void FindPoints(std::vector<Point> &points)
{
    std::vector<Point> result;
    //...
    result.swap(points);
}
Programming is the art of finding good compromises. Dynamically allocated memory has its place, of course, and I can even think of problems where a good compromise between code complexity and efficiency is obtained using std::vector<std::vector<T>*>.
However std::vector does a great job of hiding most needs of dynamically allocated arrays, and managed pointers are many times just a perfect solution for dynamically allocated single instances. This means that it's just not so common finding cases where an unmanaged dynamically allocated container (or dynamically allocated whatever, actually) is the best compromise in C++.
This, in my opinion, doesn't make dynamic allocation "bad", but just "suspect" if you see it in code, because there's a high probability that better solutions are possible.
In your case, for example, I see no reason to use dynamic allocation; just making the function return a std::vector would be efficient and safe. With any decent compiler, Return Value Optimization will be used when assigning to a newly declared vector, and if you need to assign the result to an existing vector you can still do something like:
FindPoints().swap(myvector);
which will not copy any of the data, just do some pointer twiddling (note that you cannot use the apparently more natural myvector.swap(FindPoints()), because of a sometimes-annoying C++ rule that forbids binding temporaries to non-const references).
In my experience the biggest source of need for dynamically allocated objects is complex data structures where the same instance can be reached through multiple access paths (e.g. instances that are at the same time both in a doubly linked list and indexed by a map). In the standard library, containers are always the only owner of the contained objects (C++ is a copy-semantics language), so it may be difficult to implement those solutions efficiently without the pointer and dynamic allocation concepts.
Often you can still find reasonable-enough compromises that just use standard containers, however (maybe paying some extra O(log N) lookups that you could have avoided), and those, considering the much simpler code, can IMO be the best compromise in most cases.