Creation of a template class creates a major bottleneck - C++

I am trying to write a scientific graph library, it works but I have some performance problems. When creating a graph I use a template class for the nodes and do something like
for(unsigned int i = 0; i < l_NodeCount; ++i)
m_NodeList.push_back(Node<T>(m_NodeCounter++));
Even though almost nothing happens in the node class constructor (a few variables are assigned), this part is a major bottleneck of my program when I use over a million nodes; especially in debug mode it becomes too inefficient to run at all.
Is there a better way to create all those objects at once, without having to call the constructor each time, or do I have to rewrite it without templates?

If the constructor does almost nothing, as you say, the bottleneck is most likely the allocation of new memory. The vector grows dynamically, and each time its memory is exhausted, it reserves new memory and copies all the data there. When adding a large number of objects, this can happen very frequently and become very expensive. It can be avoided by calling
m_NodeList.reserve(l_NodeCount);
With this call, the vector will allocate enough memory to hold l_NodeCount objects, and you will not have any expensive reallocations when bulk-adding the elements.

There are two things that happen in your code:
as you add elements to the vector, it occasionally has to resize the internal array, which involves copying all existing elements to the new array
the constructor is called for each element
The constructor call is unavoidable: if you create a million elements, you get a million constructor calls. What you can change is what the constructor does.
Adding elements is obviously unavoidable too, but the copying/resizing can be avoided. Call reserve on the vector initially, to reserve enough space for all your nodes.
Depending on your compiler, optimization settings and other flags, the compiler may do a lot of unnecessary bounds checking and debug checks as well.
You can disable this for the compiler (_SECURE_SCL=0 on VS2005/2008, _ITERATOR_DEBUG_LEVEL=0 in VS2010. I believe it's off by default in GCC, and don't know about other compilers).
Alternatively, you can rewrite the loop to minimize the amount of debug checking that needs to be done. Using standard library algorithms instead of a raw loop allows the library to skip most of the checks: typically, a bounds check will then be performed on the begin and end iterators only, and not on the intervening iterations, whereas in a plain loop it is done every time an iterator is dereferenced.

I would say your bottleneck is not the template class, which has nothing to do with run time and is dealt with during compilation, but adding elements to the vector container (you have the tag "vector" on your question). You are performing A LOT of allocations using push_back. Try allocating the required total memory right away and then filling in the elements.

You can avoid the templates by having a list of (void *) pointers to the objects and casting them later.
But if you wish to have 1,000,000 instances of the node class, you will have to call the node constructor 1,000,000 times.

Related

Does it make sense to clear standard container instead of construct and destruct it?

Let's say I have a function:
void someFunc(void) {
std::vector<std::string> contentVector;
// here are some operations on the vector
}
The function is called many times, and a profiler shows a high percentage of CPU usage in std::__u::vector::vector and std::__u::vector::~vector.
Does it make sense to create the vector outside of the function and call .clear() at the beginning of the function?
The same question for other standard containers.
Does it make sense to clear standard container instead of construct and destruct it?
Yes, it can make sense. It depends on what you're doing. Use whatever is appropriate for the use case.
In case of vector, you would be reusing the storage which can be better in some use cases, so take that into consideration.
If you create the vector anew each time (as a local variable) then the vector will have to allocate a new internal array from the heap each time (and if you are adding items to the vector one-at-a-time without calling reserve() first, the vector may have to do several reallocations if its first allocated array(s) turn out not to be large enough to hold all of the items you want to add); then when your method-call returns, the vector's destructor will free the internally-allocated array from the heap to avoid a memory-leak.
If you keep the vector external to the method, on the other hand, then the vector's internal array will only have to be allocated once (with perhaps a few reallocations if you don't call reserve()), and the same internal array will be re-used after a call to clear() during future method-calls.
Doing it the second way will therefore cut down on CPU cycles used (no constant allocations and deallocations from the heap, yay!), at the expense of keeping a chunk of additional RAM allocated the entire time (since the vector's internal array will stick around the entire time, and will be large enough to hold at least your worst-case-so-far number of items). Whether trading RAM usage for CPU time is worthwhile depends a lot on what you are doing, how much RAM your machine has, how much CPU power it has, and so on, so it's hard to say which is the better approach other than "try it both ways, measure performance, and pick the approach you prefer".
Does it make sense to create the vector outside of the function and call .clear() in the beginning of the functions?
Only if you can ensure there are no concurrent calls. If there are concurrent calls, hoisting the variable outside the function would be a disaster.
Even if there aren't any now, it might become a disaster once concurrency is introduced into the system.
To elaborate on the disaster: suppose this is a crawler function that reads different websites. Using the same variable outside the function could mix two websites' content into one.
My suggestion is to leave the local variable local. Unless contentVector is always holding the same data; then the vector should be put somewhere else.
This will depend very much on the situation.
Ideally, the caller should know as little as possible about how a function works (encapsulation). If the caller supplies the vector, you introduce the possibility of a bug if the caller passes the wrong vector. Also, you make changing the implementation of the function a painful maintenance burden. The more self-contained a function, the safer.
But performance may suffer, and that is when you may compromise in performance-critical sections, such as a tight loop. You can, though, have safe and fast versions of a function, where the safe version creates a vector and passes it to the fast version:
void some_function_fast_api(std::vector<std::string>& v) {
// here are some operations on the vector
}
inline void some_function_safe_api() {
std::vector<std::string> v;
some_function_fast_api(v);
}

Faster alternative to push_back(size is known)

I have a float vector. As I process certain data, I push it back. I always know what the size will be while declaring the vector.
For the largest case, it is 172,490,752 floats. This takes about eleven seconds just to push_back everything.
Is there a faster alternative, like a different data structure or something?
If you know the final size, then reserve() that size after you declare the vector. That way it only has to allocate memory once.
Also, you may experiment with using emplace_back() although I doubt it will make any difference for a vector of float. But try it and benchmark it (with an optimized build of course - you are using an optimized build - right?).
The usual way of speeding up a vector when you know the size beforehand is to call reserve on it before using push_back. This eliminates the overhead of reallocating memory and copying the data every time the previous capacity is filled.
Sometimes for very demanding applications this won't be enough. Even though push_back won't reallocate, it still needs to check the capacity every time. There's no way to know how bad this is without benchmarking, since modern processors are amazingly efficient when a branch is always/never taken.
You could try resize instead of reserve and use array indexing, but the resize forces a default initialization of every element; this is a waste if you know you're going to set a new value into every element anyway.
An alternative would be to use std::unique_ptr<float[]> and allocate the storage yourself.
Another option is ::boost::container::stable_vector. Notice that allocating a contiguous block of 172 million * 4 bytes (roughly 690 MB) might easily fail and requires quite a lot of page juggling. A stable vector is essentially a list of smaller vectors or arrays of reasonable size. You may also want to populate it in parallel.
You could use a custom allocator which avoids default initialisation of all elements, as discussed in this answer, in conjunction with ordinary element access:
const size_t N = 172490752;
std::vector<float, uninitialised_allocator<float> > vec(N);
for(size_t i=0; i!=N; ++i)
vec[i] = the_value_for(i);
This avoids (i) default initializing all elements, (ii) checking for capacity at every push, and (iii) reallocation, but at the same time preserves all the convenience of using std::vector (rather than std::unique_ptr<float[]>). However, the allocator template parameter is unusual, so you will need to use generic code rather than std::vector-specific code.
I have two answers for you:
As previous answers have pointed out, using reserve to allocate the storage beforehand can be quite helpful, but:
push_back (or emplace_back) has a performance penalty of its own, because on every call it has to check whether the vector must be reallocated. If you already know the number of elements you will insert, you can avoid this penalty by setting the elements directly with the access operator [].
So the most efficient way I would recommend is:
Initialize the vector with the 'fill'-constructor:
std::vector<float> values(172490752, 0.0f);
Set the entries directly using the access operator:
values[i] = some_float;
++i;
The reason push_back is slow is that it will need to copy all the data several times as the vector grows, and even when it doesn’t need to copy data it needs to check. Vectors grow quickly enough that this doesn’t happen often, but it still does happen. A rough rule of thumb is that every element will need to be copied on average once or twice; the earlier elements will need to be copied a lot more, but almost half the elements won’t need to be copied at all.
You can avoid the copying, but not the checks, by calling reserve on the vector when you create it, ensuring it has enough space. You can avoid both the copying and the checks by creating it with the right size from the beginning, by giving the number of elements to the vector constructor, and then inserting using indexing as Tobias suggested; unfortunately, this also goes through the vector an extra time initializing everything.
If you know the number of floats at compile time and not just runtime, you could use an std::array, which avoids all these problems. If you only know the number at runtime, I would second Mark’s suggestion to go with std::unique_ptr<float[]>. You would create it with
size_t size = /* Number of floats */;
auto floats = unique_ptr<float[]>{new float[size]};
You don’t need to do anything special to delete this; when it goes out of scope it will free the memory. In most respects you can use it like a vector, but it won’t automatically resize.

Pre-Allocated List

I have two lists of objects
list<QC> qcB;
list<QC> qcS;
and am using emplace_back() to insert items into them. Since I realized that inserting the items was taking too long, I started searching about allocators, which I have never used, to see if I was able to make things run faster. I read somewhere that I would be able to get the default allocator for the list, and allocate space on it ahead of time, so I tried allocating space in one of the lists:
qcB.get_allocator().allocate(100000);
I am unsure whether this was supposed to work, but the truth is that emplace_back() takes the same amount of time with both lists, even though one of them allocated space beforehand.
Is this supposed to work? Should this be done in a different way, instead of trying to allocate space in the default allocator? I am clearing the lists from time to time; might this be affecting the allocated space?
Thank you for your help.
If you are talking about the Standard Library std::list, it is typically implemented as a doubly linked list or something similar.
Assuming that you have done proper profiling and the insertion is indeed the problem, and assuming that you only make emplace_back calls: then use a vector, which allows you to call reserve and should perform a little better.
vector<QC> qcB;
qcB.reserve(10000);
But I fear that your actual bottleneck is the object QC initialization, and this cannot be reserved. In this scenario, you could preinitialize the objects (in case it makes sense to initialize the object and then put actual data in it).
Like this (quick&dirty draft):
vector<QC> qcB;
qcB.resize(10000);
for (int i = 0; i < 10000; ++i) {
qcB[i].populate_object();
}

Are C++ vector constructors efficient?

If I make a vector like this:
vector<int>(50000000, 0);
What happens internally? Does it make a default vector and then continually add values, resizing as necessary? Note: 50,000,000 is not known at compile time.
Would it make a difference if I make the vector like this:
gVec = vector<int>();
gVec.reserve(50000000);
// push_back default values
Please tell me the constructor knows to avoid unnecessary reallocations given the two parameters.
Would it make a difference if I make the vector like this:
gVec = vector<int>();
gVec.reserve(50000000);
// push_back default values
Yes, it definitely makes a difference. Using push_back() to fill in the default values may turn out a lot less efficient.
To have the same operations as with done with the constructor vector<int>(50000000, 0); use std::vector<int>::resize():
vector<int> gVec;
gVec.resize(50000000,0);
You will greatly enhance what you learn from this question by stepping through the two options in the debugger - seeing what the std::vector source code does should be instructive if you can mentally filter out a lot of the initially-confusing template and memory allocation abstractions. Demystify this for yourself - the STL is just someone else's code, and most of my work time is spent looking through that.
std::vector guarantees contiguous storage so only one memory block is ever allocated for the elements. The vector control structure will require a second allocation, if it is heap-based and not RAII (stack-based).
vector<int>(N, 0);
creates a vector of capacity >= N and size N, with N values each set to 0.
Step by step:
gVec = vector<int>();
creates an empty vector, typically with a nominal 'best-guess' capacity.
gVec.reserve(N);
updates the vector's capacity - ensures the vector has room for at least N elements. Typically this involves a reallocation from the 'best guess' default capacity, which is unlikely to be large enough for the value of N proposed in this question.
// push_back default values
Each iteration here increases the vector's size by one and sets the new back() element of the vector to 0. The vector's capacity will not change until the number of values pushed exceeds N plus whatever pad the vector implementation might have applied (typically none).
reserve solely allocates storage. No initialization is performed. Applied on an empty vector it should result in one call to the allocate member function of the allocator used.
The constructor shown allocates the storage required and initializes every element to zero: It's semantically equivalent to a reserve and a row of push_back's.
In both cases no reallocations are done.
I suppose in theory the constructor could start by allocating a small block of memory and expanding several times before returning, at least for types that didn't have side-effects in their copy constructor. This would be allowed only because there were no observable side effects of doing so though, not because the standard does anything to allow it directly.
At least in my opinion, it's not worth spending any time or effort worrying about such a possibility though. Chances of anybody doing it seem remote, to say the least. It's only "allowed" to the degree that it's essentially impossible to truly prohibit it.

Variable sized char array with minimizing calls to new?

I need a char array that will dynamically change in size. I do not know how big it can get, so preallocating is not an option. It might never get bigger than 20 bytes one time; the next time it may get up to 5 KB...
I want the allocation to behave like a std::vector's.
I thought of using a std::vector<char>, but all those push_backs seem like they waste time:
strVec.clear();
for(size_t i = 0; i < varLen; ++i)
{
strVec.push_back(0);
}
Is this the best I can do or is there a way to add a bunch of items to a vector at once? Or maybe a better way to do this.
Thanks
std::vector doesn't allocate memory every time you call push_back, but only when the size becomes bigger than the capacity.
First, don't optimize until you've profiled your code and determined that there is a bottleneck. Consider the costs to readability, accessibility, and maintainability by doing something clever. Make sure any plan you take won't preclude you from working with Unicode in future. Still here? Alright.
As others have mentioned, vectors reserve more memory than they use initially, and push_back usually is very cheap.
There are cases when using push_back reallocates memory more than necessary, however. For example, one million calls to myvector.push_back() might trigger 10 or 20 reallocations of myvector. On the other hand, inserting a range into a vector at its end will cause at most one reallocation of myvector. I generally prefer the insertion idiom to the reserve / push_back idiom for both speed and readability reasons.
myvector.insert(myvector.end(), inputBegin, inputEnd)
If you do not know the size of your string in advance and cannot tolerate the hiccups caused by reallocations, perhaps because of hard real-time constraints, then maybe you should use a linked list. A linked list will have consistent performance at the price of much worse average performance.
If all of this isn't enough for your purposes, consider other data structures such as a rope or post back with more specifics about your case.
From Scott Meyers's Effective STL, IIRC:
You can use the resize member function to add a bunch. However, I would not expect that push_back would be slow, especially if the vector's internal capacity is already non-trivial.
Is this the best I can do or is there a way to add a bunch of items to a vector at once? Or maybe a better way to do this.
push_back isn't very slow, it just compares the size to the current capacity and reallocates if necessary. The comparison may work out to essentially zero time because of branch prediction and superscalar execution on the CPU. The reallocation is performed O(log N) times, so the vector uses up to twice as much memory as needed but time spent on reallocation seldom adds up to anything.
To insert several items at once, use insert. There are a few overloads, the only trick is that you need to explicitly pass end.
my_vec.insert( my_vec.end(), num_to_add, initial_value );
my_vec.insert( my_vec.end(), first, last ); // iterators or pointers
For the second form, you could put the values in an array first and then copy the array to the end of the vector. But this might add as much complexity as it removes. That's how it goes with micro-optimization. Only attempt to optimize if you know there's a measurable gain to be had.