Which STL container would fit my needs best? I basically have a 10-element-wide container in which I continually push_back new elements while pop_front-ing the oldest element (about a million times).
I am currently using a std::deque for the task but was wondering if a std::list would be more efficient, since it wouldn't need to reallocate itself (or maybe I'm confusing a std::deque with a std::vector?). Or is there an even more efficient container for my need?
P.S. I don't need random access
Since there are a myriad of answers, you might be confused, but to summarize:
Use a std::queue. The reason for this is simple: it is a FIFO structure. You want FIFO, you use a std::queue.
It makes your intent clear to anybody else, and even yourself. A std::list or std::deque does not. A list can insert and remove anywhere, which is not what a FIFO structure is supposed to do, and a deque can add and remove from either end, which is also something a FIFO structure should not do.
This is why you should use a queue.
Now, you asked about performance. Firstly, always remember this important rule of thumb: Good code first, performance last.
The reason for this is simple: people who strive for performance before cleanliness and elegance almost always finish last. Their code becomes a slop of mush, because they've abandoned all that is good in order to really get nothing out of it.
By writing good, readable code first, most of your performance problems will solve themselves. And if later you find your performance is lacking, it's now easy to add a profiler to your nice, clean code, and find out where the problem is.
That all said, std::queue is only an adapter. It provides the safe interface, but uses a different container on the inside. You can choose this underlying container, and this allows a good deal of flexibility.
So, which underlying container should you use? We know that std::list and std::deque both provide the necessary functions (push_back(), pop_front(), and front()), so how do we decide?
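As a minimal sketch of what that choice looks like (the second template argument defaults to std::deque):

#include <deque>
#include <list>
#include <queue>

std::queue<int> q;                          // adapts std::deque<int> by default
std::queue<int, std::list<int>> q_on_list;  // same interface, list underneath

void demo() {
    q.push(1);              // forwards to push_back() on the underlying deque
    int oldest = q.front(); // peek at the element that will leave first
    q.pop();                // forwards to pop_front()
    (void)oldest;
}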
First, understand that allocating (and deallocating) memory is not a quick thing to do, generally, because it involves going out to the OS and asking it to do something. A list has to allocate memory every single time something is added, and deallocate it when it goes away.
A deque, on the other hand, allocates in chunks. It will allocate less often than a list. Think of it as a list, but each memory chunk can hold multiple nodes. (Of course, I'd suggest that you really learn how it works.)
So, with that alone, a deque should perform better, because it doesn't deal with memory as often. Combined with the fact that you're handling data of constant size, it probably won't have to allocate after the first pass through the data, whereas a list will be constantly allocating and deallocating.
A second thing to understand is cache performance. Going out to RAM is slow, so when the CPU really needs to, it makes the best out of this time by taking a chunk of memory back with it, into cache. Because a deque allocates in memory chunks, it's likely that accessing an element in this container will cause the CPU to bring back the rest of the container as well. Now any further accesses to the deque will be speedy, because the data is in cache.
This is unlike a list, where the data is allocated one at a time. This means that data could be spread out all over the place in memory, and cache performance will be bad.
So, considering that, a deque should be a better choice. This is why it is the default container when using a queue. That all said, this is still only a (very) educated guess: you'll have to profile this code, using a deque in one test and list in the other to really know for certain.
But remember: get the code working with a clean interface, then worry about performance.
John raises the concern that wrapping a list or deque will cause a performance decrease. Once again, neither he nor I can say for certain without profiling it ourselves, but chances are that the compiler will inline the calls that the queue makes. That is, when you say queue.push(), it will really just say queue.container.push_back(), skipping the function call completely.
Once again, this is only an educated guess, but using a queue will not degrade performance, when compared to using the underlying container raw. Like I've said before, use the queue, because it's clean, easy to use, and safe, and if it really becomes a problem profile and test.
Check out std::queue. It wraps an underlying container type, and the default container is std::deque.
Where performance really matters, check out the Boost circular buffer library.
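A minimal sketch of the circular-buffer approach, assuming Boost is available; note that push_back on a full circular_buffer silently overwrites the oldest element, which matches the push_back/pop_front pattern in the question:

#include <boost/circular_buffer.hpp>

int main() {
    boost::circular_buffer<int> cb(10);  // fixed capacity of 10, allocated once
    for (int i = 0; i < 1000000; ++i)
        cb.push_back(i);  // when full, the oldest element is overwritten
    return 0;
}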
I continually push_back new elements while pop_front-ing the oldest element (about a million times).
A million is really not a big number in computing. As others have suggested, use a std::queue as your first solution. In the unlikely event of it being too slow, identify the bottleneck using a profiler (do not guess!) and re-implement using a different container with the same interface.
Why not std::queue? All it has is push and pop (plus front and back), which map onto push_back and pop_front of the underlying container.
A queue has a simpler interface than a deque, but for such a small container the difference in performance is probably negligible.
Same goes for list. It's just down to a choice of what API you want.
Use a std::queue, but be cognizant of the performance tradeoffs of the two standard Container classes.
By default, std::queue is an adapter on top of std::deque. Typically, that'll give good performance where you have a small number of queues containing a large number of entries, which is arguably the common case.
However, don't be blind to the implementation of std::deque. Specifically:
"...deques typically have large minimal memory cost; a deque holding just one element has to allocate its full internal array (e.g. 8 times the object size on 64-bit libstdc++; 16 times the object size or 4096 bytes, whichever is larger, on 64-bit libc++)."
To net that out, presuming that a queue entry is something that you'd want to queue, i.e., reasonably small in size, then if you have 4 queues, each containing 30,000 entries, the std::deque implementation will be the option of choice. Conversely, if you have 30,000 queues, each containing 4 entries, then more than likely the std::list implementation will be optimal, as you'll never amortize the std::deque overhead in that scenario.
You'll read a lot of opinions about how cache is king, how Stroustrup hates linked lists, etc., and all of that is true, under certain conditions. Just don't accept it on blind faith, because in our second scenario there, it's quite unlikely that the default std::deque implementation will perform well. Evaluate your usage and measure.
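As a sketch of what those two configurations look like in code (Message here is a hypothetical stand-in for whatever you're queueing):

#include <list>
#include <queue>

struct Message { int payload; };  // hypothetical element type

// Few queues, many entries each: the default deque amortizes its block cost.
std::queue<Message> busy_queue;

// Many queues, few entries each: a list avoids paying a whole deque block
// per queue, at the cost of one allocation per element.
std::queue<Message, std::list<Message>> sparse_queue;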
This case is simple enough that you can just write your own. Here is something that works well for microcontroller situations where STL use takes too much space. It is a nice way to pass data and signals from an interrupt handler to your main loop.
// FIFO with circular buffer
#include <cstdint>

#define fifo_size 4

class Fifo {
    uint8_t buff[fifo_size];
    int writePtr = 0;
    int readPtr = 0;
public:
    // NOTE: no overflow check -- the caller must keep count() < fifo_size,
    // or put() will overwrite the oldest unread element.
    void put(uint8_t val) {
        buff[writePtr % fifo_size] = val;
        writePtr++;
    }
    // Returns 0 if the FIFO is empty.
    uint8_t get() {
        uint8_t val = 0;
        if (readPtr < writePtr) {
            val = buff[readPtr % fifo_size];
            readPtr++;
            // Rebase both indices to avoid eventual integer overflow.
            // Subtracting fifo_size from each (instead of taking each one
            // modulo independently) preserves the distance between them.
            if (readPtr >= fifo_size) {
                writePtr -= fifo_size;
                readPtr -= fifo_size;
            }
        }
        return val;
    }
    int count() { return writePtr - readPtr; }
};
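A hypothetical usage sketch, pairing an interrupt handler with the main loop (for a real ISR the indices would also need to be volatile or atomic, which the original code omits as well):

Fifo rx;  // shared between the ISR and the main loop

void on_rx_interrupt(uint8_t byte) {  // hypothetical ISR name
    rx.put(byte);  // caller's job to keep rx.count() < fifo_size
}

void main_loop() {
    while (rx.count() > 0) {
        uint8_t b = rx.get();
        // ... process b ...
    }
}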
Related
Is there a better way to copy the contents of a std::deque into a byte-array? It seems like there should be something in the STL for doing this.
// Generate byte-array to transmit
uint8_t * i2c_message = new uint8_t[_tx.size()];
if ( !i2c_message ) {
errno = ENOMEM;
::perror("ERROR: FirmataI2c::endTransmission - Failed to allocate memory!");
} else {
size_t i = 0;
// Load byte-array
for ( const auto & data_byte : _tx ) {
i2c_message[i++] = data_byte;
}
// Transmit data
_marshaller.sendSysex(firmata::I2C_REQUEST, _tx.size(), i2c_message);
_stream.flush();
delete[] i2c_message;
}
I'm looking for suggestions for either space or speed or both...
EDIT: It should be noted that _marshaller.sendSysex() cannot throw.
FOLLOW UP:
I thought it would be worth recapping everything, because the comments are pretty illuminating (except for the flame war). :-P
The answer to the question as asked...
Use std::copy
The bigger picture:
Instead of simply increasing the raw performance of the code, it was worth considering adding robustness and longevity to the code base.
I had overlooked RAII - Resource Acquisition Is Initialization. By going in the other direction and taking a slight performance hit, I could get big gains in resiliency (as pointed out by @PaulMcKenzie and @WhozCraig). In fact, I could even insulate my code from changes in a dependency!
Final Solution:
In this case, I actually have access to (and the ability to change) the larger code base - often not the case. I reevaluated* the benefit I was gaining from using a std::deque and I swapped the entire underlying container to a std::vector. Thus saving the performance hit of container swapping, and gaining the benefits of contiguous data and RAII.
*I chose a std::deque because I always have to push_front two bytes to finalize my byte-array before sending. However, since it is always two bytes, I was able to pad the vector with two dummy bytes and replace them later by random access - O(1) time.
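A sketch of that padding trick (header0 and header1 are hypothetical stand-ins for the two finalizing bytes):

#include <cstdint>
#include <vector>

std::vector<uint8_t> build_message(uint8_t header0, uint8_t header1) {
    std::vector<uint8_t> tx;
    tx.push_back(0);   // placeholder for the first header byte
    tx.push_back(0);   // placeholder for the second header byte
    tx.push_back(42);  // ... payload bytes appended here ...
    tx[0] = header0;   // patch the placeholders in O(1) via random access
    tx[1] = header1;
    return tx;
}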
Embrace the C++ standard library. Assuming _tx is really a std::deque<uint8_t>, one way to do this is simply:
std::vector<uint8_t> msg(_tx.cbegin(), _tx.cend());
_marshaller.sendSysex(firmata::I2C_REQUEST, msg.size(), msg.data());
This allocates the proper size contiguous buffer, copies the contents from the source iterator pair, and then invokes your send operation. The vector will be automatically cleaned up on scope-exit, and an exception will be thrown if the allocation for building it somehow fails.
The standard library provides a plethora of ways to toss data around, especially given iterators that mark where to start, and where to stop. Might as well use that to your advantage. Additionally, letting RAII handle the ownership and cleanup of entities like this rather than manual memory management is nearly always a better approach, and should be encouraged.
In all, if you need contiguity (and judging by the looks of that send-call, that's exactly why you're doing this), then copying from non-contiguous to contiguous space is pretty much your only option, and that takes space and copy-time. There's not much you can do to avoid that. I suppose peeking into the implementation specifics of std::deque and possibly doing something like stacking send-calls would be possible, but I seriously doubt there would be any reward, and any savings would likely evaporate in the multiple send invocations.
Finally, there is another option that may be worth considering. Look at the source of all of this. Is a std::deque really warranted? For example, you're certainly building that container somewhere else. If you can do that build operation as efficiently, or nearly so, using a std::vector, then this entire problem goes away, as you can just send that.
For example, if you knew (provably) that your std::deque would never be larger than some size N, you could pre-size a std::vector or similar contiguous RAII-protected allocation to be 2*N in size, start a fore and an aft iterator pair in the middle, and either prepend data by walking the fore iterator backward or append data by walking the aft iterator forward. In the end, your data will be contiguous between fore and aft, and the send is all that remains. No copies would be required, though added space is still required. This all hinges on knowing with certainty the maximum message size. If that is available to you, it may be an idea worth profiling.
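A rough sketch of that idea, under the stated assumption that the maximum message size N is known in advance (all names here are hypothetical):

#include <cstddef>
#include <cstdint>
#include <vector>

class FrontLoadedBuffer {
    std::vector<uint8_t> buf_;
    std::size_t fore_, aft_;  // the message lives in [fore_, aft_)
public:
    explicit FrontLoadedBuffer(std::size_t max_n)
        : buf_(2 * max_n), fore_(max_n), aft_(max_n) {}

    void push_front(uint8_t b) { buf_[--fore_] = b; }  // walk fore backward
    void push_back(uint8_t b)  { buf_[aft_++] = b; }   // walk aft forward

    const uint8_t* data() const { return buf_.data() + fore_; }
    std::size_t size() const { return aft_ - fore_; }
};

The payload goes in with push_back, the two leading bytes with push_front, and data()/size() can then feed the send call with no copy.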
I have an STL container (std::list) that I am constantly reusing. By this I mean I
push a number of elements into the container
remove the elements during processing
clear the container
rinse and repeat a large number of times
When profiling using callgrind I am seeing a large number of calls to new (malloc) and delete (free) which can be quite expensive. I am therefore looking for some way to preferably preallocate a reasonably large number of elements. I would also like my allocation pool to continue to increase until a high water mark is reached, and for the allocation pool to hang onto the memory until the container itself is deleted.
Unfortunately the standard allocator continually resizes the memory pool so I am looking for some allocator that will do the above without me having to write my own.
Does such an allocator exist and where can I find such an allocator?
I am working on both Linux using GCC and Android using the STLPort.
Edit: Placement new is ok; what I want to minimize is heap walking, which is expensive. I would also like all my objects to be as close to each other as possible to minimize cache misses.
It sounds like you may be just using the wrong kind of container: With a list, each element occupies a separate chunk of memory, to allow individual inserts/deletes - so every addition/deletion from the list will require a separate new()/delete().
If you can use a std::vector instead, then you can reserve the required size before adding the items.
Also for deletion, it's usually best not to remove the items individually. Just call clear() on the container to empty it.
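A sketch of that pattern (Item, fill, and process are hypothetical stand-ins):

#include <cstddef>
#include <vector>

struct Item { int value; };

void fill(std::vector<Item>& items) {     // hypothetical producer
    for (int i = 0; i < 100; ++i) items.push_back({i});
}
void process(std::vector<Item>&) {}       // hypothetical consumer

void run(std::size_t expected_count, int rounds) {
    std::vector<Item> items;
    items.reserve(expected_count);  // one allocation up front

    for (int r = 0; r < rounds; ++r) {
        fill(items);    // push_back stays allocation-free up to the reserve
        process(items);
        items.clear();  // destroys the elements but keeps the capacity
    }
}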
Edit: You've now made it clear in the comments that your 'remove the elements during processing' step is removing elements from the middle of the list and must not invalidate iterators, so switching to a vector is not suitable. I'll leave this answer for now (for the sake of the comment thread!)
The allocator boost::fast_pool_allocator is designed for use with std::list.
The documentation claims that "If you are seriously concerned about performance, use boost::fast_pool_allocator when dealing with containers such as std::list, and use boost::pool_allocator when dealing with containers such as std::vector."
Note that boost::fast_pool_allocator is a singleton and by default it never frees allocated memory. However, it is implemented using boost::singleton_pool and you can make it free memory by calling the static functions boost::singleton_pool::release_memory() and boost::singleton_pool::purge_memory().
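A minimal sketch of wiring it up, assuming Boost.Pool is available:

#include <list>
#include <boost/pool/pool_alloc.hpp>

int main() {
    std::list<int, boost::fast_pool_allocator<int>> lst;

    for (int round = 0; round < 1000; ++round) {
        for (int i = 0; i < 100; ++i)
            lst.push_back(i);  // node memory comes from the singleton pool
        lst.clear();           // nodes return to the pool, not to the heap
    }                          // later rounds reuse the already-pooled memory
    return 0;
}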
You can try benchmarking your app with tcmalloc: http://goog-perftools.sourceforge.net/doc/tcmalloc.html. I've seen some good improvements in some of my projects (no numbers at hand though, sorry).
EDIT: Seems the code/download has been moved there: http://code.google.com/p/gperftools/?redir=1
Comment was too short so I will post my thoughts as an answer.
IMO, new/delete can come only from two places in this context.
I believe std::list<T> is implemented with some kind of nodes, as lists normally are, for various reasons. Therefore, each insertion and removal of an element will have to result in new/delete of a node. Moreover, if an object of type T does any allocation or deallocation in its constructor/destructor, those will be called as well.
You can avoid recreation of standard stuff by reiterating over existing nodes instead of deleting them. You can use std::vector and std::vector::reserve or std::array if you want to squeeze it to c-level.
Nonetheless, for every object created, a destructor must eventually be called. The only way I see to avoid creations and destructions is to use T::operator= when reiterating over the container, or maybe C++11 move semantics if it's suitable in your case.
I need a char array that will dynamically change in size. I do not know how big it can get, so preallocating is not an option. It might never get bigger than 20 bytes one time; the next time it may get up to 5 KB...
I want the allocation to be like a std::vector.
I thought of using a std::vector<char>, but all those push_back calls seem like they waste time:
strVec.clear();
for(size_t i = 0; i < varLen; ++i)
{
strVec.push_back(0);
}
Is this the best I can do or is there a way to add a bunch of items to a vector at once? Or maybe a better way to do this.
Thanks
std::vector doesn't allocate memory every time you call push_back; it allocates only when the size would exceed the current capacity.
First, don't optimize until you've profiled your code and determined that there is a bottleneck. Consider the costs to readability, accessibility, and maintainability by doing something clever. Make sure any plan you take won't preclude you from working with Unicode in future. Still here? Alright.
As others have mentioned, vectors reserve more memory than they use initially, and push_back usually is very cheap.
There are cases when using push_back reallocates memory more than is necessary, however. For example, one million calls to myvector.push_back() might trigger 10 or 20 reallocations of myvector. On the other hand, a single range insert at the end of a vector causes at most one reallocation. I generally prefer the insertion idiom to the reserve / push_back idiom for both speed and readability reasons.
myvector.insert(myvector.end(), inputBegin, inputEnd)
If you do not know the size of your string in advance and cannot tolerate the hiccups caused by reallocations, perhaps because of hard real-time constraints, then maybe you should use a linked list. A linked list will have consistent performance at the price of much worse average performance.
If all of this isn't enough for your purposes, consider other data structures such as a rope or post back with more specifics about your case.
From Scott Meyers' Effective STL, IIRC
You can use the resize member function to add a bunch. However, I would not expect that push_back would be slow, especially if the vector's internal capacity is already non-trivial.
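For example, the loop in the question collapses to a single call (a sketch using the question's own names):

strVec.clear();
strVec.resize(varLen);  // value-initializes varLen zero bytes in one step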
Is this the best I can do or is there a way to add a bunch of items to a vector at once? Or maybe a better way to do this.
push_back isn't very slow, it just compares the size to the current capacity and reallocates if necessary. The comparison may work out to essentially zero time because of branch prediction and superscalar execution on the CPU. The reallocation is performed O(log N) times, so the vector uses up to twice as much memory as needed but time spent on reallocation seldom adds up to anything.
To insert several items at once, use insert. There are a few overloads, the only trick is that you need to explicitly pass end.
my_vec.insert( my_vec.end(), num_to_add, initial_value );
my_vec.insert( my_vec.end(), first, last ); // iterators or pointers
For the second form, you could put the values in an array first and then copy the array to the end of the vector. But this might add as much complexity as it removes. That's how it goes with micro-optimization. Only attempt to optimize if you know there's a measurable gain to be had.
Since
they are both contiguous memory containers;
feature-wise, deque has almost everything vector has, and more, since it is more efficient to insert at the front.
Why would anyone prefer std::vector to std::deque?
Elements in a deque are not contiguous in memory; vector elements are guaranteed to be. So if you need to interact with a plain C library that needs contiguous arrays, or if you care (a lot) about spatial locality, then you might prefer vector. In addition, since there is some extra bookkeeping, other ops are probably (slightly) more expensive than their equivalent vector operations. On the other hand, using many/large instances of vector may lead to unnecessary heap fragmentation (slowing down calls to new).
Also, as pointed out elsewhere on StackOverflow, there is more good discussion here: http://www.gotw.ca/gotw/054.htm .
To know the difference one should know how deque is generally implemented. Memory is allocated in blocks of equal sizes, and they are chained together (as an array or possibly a vector).
So to find the nth element, you find the appropriate block then access the element within it. This is constant time, because it is always exactly 2 lookups, but that is still more than the vector.
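A simplified sketch of that indexing scheme (not any real library's implementation):

#include <cstddef>

template <typename T, std::size_t BlockSize = 8>
struct DequeIndexingSketch {
    T** blocks;             // map of pointers to fixed-size blocks
    std::size_t front_off;  // offset of element 0 inside the first block

    T& operator[](std::size_t n) {
        std::size_t pos = front_off + n;
        return blocks[pos / BlockSize][pos % BlockSize];  // two lookups
    }
};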
vector also works well with APIs that want a contiguous buffer because they are either C APIs or are more versatile in being able to take a pointer and a length. (Thus you can have a vector underneath or a regular array and call the API from your memory block).
Where deque has its biggest advantages are:
When growing or shrinking the collection from either end
When you are dealing with very large collection sizes.
When dealing with bools and you really want bools rather than a bitset.
The second of these is lesser known, but for very large collection sizes:
The cost of reallocation is large
The overhead of having to find a contiguous memory block is restrictive, so you can run out of memory faster.
When I was dealing with large collections in the past and moved from a contiguous model to a block model, we were able to store about 5 times as large a collection before we ran out of memory in a 32-bit system. This is partly because, when re-allocating, it actually needed to store the old block as well as the new one before it copied the elements over.
Having said all this, you can get into trouble with std::deque on systems that use "optimistic" memory allocation. While a vector's attempt to request one large buffer for a reallocation will probably get rejected at some point with a bad_alloc, the optimistic allocator is likely to keep granting the smaller buffers a deque requests, and that can cause the operating system to kill a process to try to acquire some memory. Whichever process it picks might not be too pleasant.
The workarounds in such a case are either setting system-level flags to override optimistic allocation (not always feasible) or managing the memory somewhat more manually, e.g. using your own allocator that checks for memory usage or similar. Obviously not ideal. (Which may answer your question as to prefer vector...)
I've implemented both vector and deque multiple times. deque is hugely more complicated from an implementation point of view. This complication translates to more code and more complex code. So you'll typically see a code size hit when you choose deque over vector. You may also experience a small speed hit if your code uses only the things the vector excels at (i.e. push_back).
If you need a double ended queue, deque is the clear winner. But if you're doing most of your inserts and erases at the back, vector is going to be the clear winner. When you're unsure, declare your container with a typedef (so it is easy to switch back and forth), and measure.
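The typedef trick mentioned above, as a sketch:

#include <deque>
#include <vector>

typedef std::deque<int> Container;  // flip to std::vector<int> and re-measure

Container c;  // the rest of the code only ever names Container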
std::deque doesn't have guaranteed contiguous memory - and it's often somewhat slower for indexed access. A deque is typically implemented as, loosely speaking, a "list of vectors".
According to http://www.cplusplus.com/reference/stl/deque/, "unlike vectors, deques are not guaranteed to have all its elements in contiguous storage locations, eliminating thus the possibility of safe access through pointer arithmetics."
Deques are a bit more complicated, in part because they don't necessarily have a contiguous memory layout. If you need that feature, you should not use a deque.
(Previously, my answer brought up a lack of standardization (from the same source as above, "deques may be implemented by specific libraries in different ways"), but that actually applies to just about any standard library data type.)
A deque is a sequence container which allows random access to its elements, but it is not guaranteed to have contiguous storage.
I think it's a good idea to run a performance test for each case and make a decision based on those tests.
I'd prefer std::deque over std::vector in most cases.
You wouldn't prefer vector to deque according to these test results (with source).
Of course, you should test in your app/environment, but in summary:
push_back is basically the same for all
insert, erase in deque are much faster than list and marginally faster than vector
Some more musings, and a note to consider circular_buffer.
On the one hand, vector is quite frequently just plain faster than deque. If you don't actually need all of the features of deque, use a vector.
On the other hand, sometimes you do need features which vector does not give you, in which case you must use a deque. For example, I challenge anyone to attempt to rewrite this code, without using a deque, and without enormously altering the algorithm.
Note that vector memory is re-allocated as the array grows. If you have pointers to vector elements, they will become invalid.
Also, if you erase an element, iterators from the erasure point onward become invalid (and a range-based for loop relies on iterators too, so erasing inside one is not safe either).
Edit: changed 'deque' to 'vector'
I recently wrote this post:
How best to store VERY large 2D list of floats in c++? Error-handling?
Some suggested that I implement my 2D list-like structure of floats as a vector, others said a deque.
From what I gather, vector requires contiguous memory and is hence more efficient. Obviously this would be desirable if possible.
Thus my question is: what's a good rule of thumb for how large a basic structure can be in terms of...
1. float
2. int
...before you should switch from a vector to a deque to avoid memory problems?
e.g. I'm looking for answer like "At around 4 million floats or 8 million ints, you should switch..." ...if possible.
Well, here are two opinions. The C++ standard says (23.1.1/2):
vector is the type of sequence that should be used by default.
list should be used when there are frequent insertions and deletions from the middle of the sequence.
deque is the data structure of choice when most insertions and deletions take place at the beginning or at the end of the sequence.
Herb Sutter argues the following (the article contains his rationale and a performance analysis):
I'd like to present an amiably dissenting point of view: I recommend that you consider preferring deque by default instead of vector, especially when the contained type is a class or struct and not a builtin type, unless you really need the container's memory to be contiguous.
Again, there is no size limit above which deque is or is not better than vector. Memory fragmentation implications are pretty much the same in either case, except when you have already done a huge load of allocations/deallocations and there is not enough contiguous space left for a big vector. But this case is very rare. Remember that memory space is per process (google for virtual memory). And you can remedy it by allocating the memory for the vector (via the reserve method) before the cluttering takes place.
The tradeoff is in terms of what you want to do with it. If the structure is basically immutable and you only want to access it / overwrite it by index access, go for vector.
Deque is when you need to do insertions either at the end, the beginning or in the middle, something vector cannot handle naturally (except for inserting at the end).
Herb Sutter's articles are in general of great quality, but you'll notice that when you do "number crunching" in C++, most of the stuff you're taught in "general C++" books must be taken with an extra word of caution. The poor indexing performance you experience with deques is perhaps important for your application. In this case, don't use deque.
If you need insertions at the beginning, then go with deque.
Otherwise, I always like to point to this article on vector vs. deque (in addition to those linked by James McNellis here). Assuming an implementation of deque that uses page-based allocation, this article has good comparisons of allocation time (& deallocation time) for vector with & without reserve() vs. deque. Basically, using reserve() makes vector allocation time very similar to deque. Informative, and useful if you can guess the right value to reserve ahead of time.
There are so many factors to consider that it's impossible to give a clear answer. The amount of memory on the machine, how fragmented it is, how fragmented it may become, etc. My suggestion is to just choose one and use it. If it causes problems switch. Chances are you aren't going to hit those edge cases anyway.
If you are truly worried, then you could probably implement a sort of pseudo PIMPL:
#include <cstddef>
#include <deque>
#include <new>
#include <vector>

template<typename T>
class VectorDeque
{
private:
    enum TYPE { NONE, DEQUE, VECTOR };
    std::deque<T> m_d;
    std::vector<T> m_v;
    TYPE m_type;
    ...
public:
    void resize(size_t n)
    {
        switch(m_type)
        {
            case NONE:
                try
                {
                    m_v.resize(n);
                    m_type = VECTOR;
                }
                catch(const std::bad_alloc &)
                {
                    m_d.resize(n);  // fall back to deque's chunked blocks
                    m_type = DEQUE;
                }
                break;
        }
    }
};
But this seems like total overkill. Have you done any benchmarking or tests? Do you have any reason to believe that memory allocations will fail or that deques will be too slow?
You switch after testing and profiling indicate that one is worse than the other for your application. There is no "around N floats or M ints" universal answer.
Well, regarding memory, I can share some experience that may help you decide when contiguous memory blocks (malloc or std::vector) may become too large:
The application I work with records measurement data, mostly 4-byte floats, and for this it allocates internal buffers to store the data. These buffers vary heavily in size, but the typical range may be, say, several dozen of 1-10 MB and a very few of >100 MB. The buffers are always allocated with calloc, i.e. one big chunk of memory. If a buffer allocation fails, an error is logged and the user has the choice to try again.
Buffer sizes: Say you want to record 1000 channels at 100 Hz for 10 minutes: 4 bytes x 1000 x 100 x 60 x 10 == 228 MB (approx.) ... or 100 channels at 10 Hz for 3 hours == 41 MB
We (nearly) never had any problems allocating 40 MB buffers (and that's about 10 million floats), but the 200-300 MB buffers fail from time to time -- all this on normal WinXP/32bit boxes with 4 GB RAM.
Given that you don't insert after creation, you should probably either use plain old std::vector, or if fragmentation really does become an issue, a custom vector-like Sequence implemented as a vector or array of pointers to fixed-size arrays.
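A rough sketch of that fallback, the "vector of pointers to fixed-size arrays" idea (the names and the chunk size here are hypothetical):

#include <cstddef>
#include <memory>
#include <vector>

template <typename T, std::size_t ChunkSize = 4096>
class ChunkedSequence {
    std::vector<std::unique_ptr<T[]>> chunks_;
    std::size_t size_ = 0;
public:
    void push_back(const T& v) {
        if (size_ % ChunkSize == 0)  // current chunk full (or none yet)
            chunks_.push_back(std::make_unique<T[]>(ChunkSize));
        chunks_[size_ / ChunkSize][size_ % ChunkSize] = v;
        ++size_;
    }
    T& operator[](std::size_t i) {
        return chunks_[i / ChunkSize][i % ChunkSize];
    }
    std::size_t size() const { return size_; }
};

Each chunk is a modest fixed-size allocation, so no single huge contiguous block is ever requested, at the cost of the same two-lookup indexing a deque does.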