Most efficient method of copying std::deque contents to byte-array - c++

Is there a better way to copy the contents of a std::deque into a byte-array? It seems like there should be something in the STL for doing this.
// Generate byte-array to transmit
uint8_t * i2c_message = new uint8_t[_tx.size()];
if ( !i2c_message ) {
    errno = ENOMEM;
    ::perror("ERROR: FirmataI2c::endTransmission - Failed to allocate memory!");
} else {
    size_t i = 0;
    // Load byte-array
    for ( const auto & data_byte : _tx ) {
        i2c_message[i++] = data_byte;
    }
    // Transmit data
    _marshaller.sendSysex(firmata::I2C_REQUEST, _tx.size(), i2c_message);
    _stream.flush();
    delete[] i2c_message;
}
I'm looking for suggestions for either space or speed or both...
EDIT: It should be noted that _marshaller.sendSysex() cannot throw.
FOLLOW UP:
I thought it would be worth recapping everything, because the comments are pretty illuminating (except for the flame war). :-P
The answer to the question as asked...
Use std::copy
The bigger picture:
Instead of simply increasing the raw performance of the code, it was worth considering adding robustness and longevity to the code base.
I had overlooked RAII - Resource Acquisition Is Initialization. By going in the other direction and taking a slight performance hit, I could get big gains in resiliency (as pointed out by @PaulMcKenzie and @WhozCraig). In fact, I could even insulate my code from changes in a dependency!
Final Solution:
In this case, I actually have access to (and the ability to change) the larger code base - often not the case. I reevaluated* the benefit I was gaining from using a std::deque and swapped the entire underlying container to a std::vector, thus saving the performance hit of container swapping and gaining the benefits of contiguous data and RAII.
*I chose a std::deque because I always have to push_front two bytes to finalize my byte-array before sending. However, since it is always two bytes, I was able to pad the vector with two dummy bytes and replace them by random access - O(1) time.
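For illustration, here is a minimal sketch of that "pad then patch" approach; the header values, the send() stub, and the function name are hypothetical stand-ins for the real _marshaller.sendSysex() call:

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Stand-in for _marshaller.sendSysex(); here it just prints the bytes.
static void send(const uint8_t* data, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) std::printf("%02X ", static_cast<unsigned>(data[i]));
    std::printf("\n");
}

// Build the message in a vector padded with two placeholder header bytes,
// then patch them by random access instead of push_front'ing into a deque.
void transmit(uint8_t header0, uint8_t header1, const std::vector<uint8_t>& payload) {
    std::vector<uint8_t> tx(2, 0x00);                     // two dummy header bytes
    tx.insert(tx.end(), payload.begin(), payload.end());  // append the payload
    tx[0] = header0;                                      // O(1) patch of the placeholders
    tx[1] = header1;
    send(tx.data(), tx.size());                           // contiguous buffer, single copy
}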

Embrace the C++ standard library. Assuming _tx is really a std::deque<uint8_t>, one way to do this is simply:
std::vector<uint8_t> msg(_tx.cbegin(), _tx.cend());
_marshaller.sendSysex(firmata::I2C_REQUEST, msg.size(), msg.data());
This allocates the proper size contiguous buffer, copies the contents from the source iterator pair, and then invokes your send operation. The vector will be automatically cleaned up on scope-exit, and an exception will be thrown if the allocation for building it somehow fails.
The standard library provides a plethora of ways to toss data around, especially given iterators that mark where to start, and where to stop. Might as well use that to your advantage. Additionally, letting RAII handle the ownership and cleanup of entities like this rather than manual memory management is nearly always a better approach, and should be encouraged.
In all, if you need contiguity (and judging by the looks of that send-call, that's exactly why you're doing this), then copying from non-contiguous to contiguous space is pretty much your only option, and that takes space and copy-time. Not much you can do to avoid that. I suppose peeking into the implementation specifics of std::deque and possibly doing something like stacking send-calls would be possible, but I seriously doubt there would be any reward, and the only savings would likely evaporate in the multi-send invokes.
Finally, there is another option that may be worth considering. Look at the source of all of this. Is a std::deque really warranted? For example, you're certainly building that container somewhere else. If you can do that build operation as efficiently, or nearly so, using a std::vector, then this entire problem goes away, as you can just send that.
For example, if you knew (provably) that your std::deque would never be larger than some size N, you could pre-size a std::vector or similar contiguous RAII-protected allocation to be 2*N in size, start both a fore and aft iterator pair in the middle, and either prepend data by walking the fore iterator backward, or append data by walking the aft iterator forward. In the end, your data will be contiguous between fore and aft, and the send is all that remains. No copies would be required, though added space is still required. This all hinges on knowing with certainty the maximum message size. If that is available to you, it may be an idea worth profiling.
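A rough sketch of that idea, assuming the maximum message size is known up front (bounds checking is omitted, and the class and member names are illustrative, not part of the original code):

#include <cstddef>
#include <cstdint>
#include <vector>

// Double-ended builder over one contiguous buffer of size 2*N. "fore" walks
// backward for prepends, "aft" walks forward for appends; the finished
// message is the contiguous range [fore, aft).
class MessageBuilder {
public:
    explicit MessageBuilder(std::size_t max_size)
        : buf_(2 * max_size), fore_(max_size), aft_(max_size) {}

    void push_front(uint8_t b) { buf_[--fore_] = b; }   // prepend, no shifting
    void push_back(uint8_t b)  { buf_[aft_++]  = b; }   // append

    const uint8_t* data() const { return buf_.data() + fore_; }
    std::size_t    size() const { return aft_ - fore_; }

private:
    std::vector<uint8_t> buf_;
    std::size_t fore_, aft_;
};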

Related

Can an unordered_set use a different allocator for the nodes and the bucket list?

I'd like to use a std::pmr::unordered_set with a std::pmr::monotonic_buffer_resource. The two fit well together, because the set's nodes are stable, so I don't create a lot of holes in the buffer resource by reallocation:
std::pmr::monotonic_buffer_resource res;
std::pmr::unordered_set<T> set(&res);
That is, except for the bucket list, which needs to be reallocated when the set rehashes as it exceeds the max_load_factor(). Assuming I can't reserve() my way out of this, and I actually care about the holes in the buffer resource left by old bucket lists since grown, what are my options?
If I know that unordered_set is implemented as std::vector<std::forward_list>, as in (some versions of) MSVC, then I should be able to use a scoped_allocator to give different allocators for the vector and the forward_list. But a) I can't rely on unordered_set being a vector<forward_list> in portable code and b) scoped_allocator is an Allocator whereas monotonic_buffer_resource is a memory_resource, an impedance mismatch that will make for very complicated initialization.
Or I could write a switch_memory_resource that delegates to other memory_resources based on the size of the request. I could then use a monotonic_buffer_resource for requests that match the size of the nodes (which, however, I cannot, portably, know, either) and default_memory_resource() for everything else. I could probably make an educated guess that the nodes are at most sizeof(struct {void* next; size_t hash; T value;}), add some error margin by multiplying that by two and use that as the cut-off between the two memory_resources, but I wonder whether there's a cleaner way?
The small number of concrete resource types that I proposed a number of years ago and that were adopted into C++17 was a minimalist set of useful allocators. As evidenced by your question, they do not provide optimal behavior for every circumstance. There are not many tuning dials and I have some regrets about missing functionality, but they are still useful for most cases.
For your specific situation, you say "Assuming I can't reserve() my way out of this, and I actually care about the holes in the buffer resource left by old bucket lists since grown." I'm not sure any general allocator can help you. The geometric growth of the bucket list will leave holes in any allocation strategy. The question is whether those holes can be re-used and/or minimized. As you point out, only a very-carefully customized allocator for the very specific situation will minimize these holes. But maybe your assumptions are too strong.
Consider a std::pmr::vector<int>. This is the worst-case scenario for a monotonic_buffer_resource because every reallocation results in leaked memory. And yet, even this case has a worst-case memory waste of only 50%; i.e., it will never use more than twice as much memory as it would with a resource that perfectly reuses memory blocks. Granted, 50% can be pretty bad, but in your scenario, we are talking much, much less. For a reasonably large set, the bucket list is small compared to the buckets and the data itself, and you can use reserve to minimize reallocation. So my first piece of advice is to go ahead and use the monotonic_buffer_resource without alteration, and measure to see if you have unacceptable memory use. A second experiment would be to use an unsynchronized_pool_resource backed by an (upstream) monotonic_buffer_resource.
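As a rough sketch of that second experiment (the element type and counts are arbitrary), the pool sits between the container and the monotonic buffer so that freed same-sized blocks can be handed out again instead of becoming holes:

#include <memory_resource>
#include <unordered_set>

int main() {
    std::pmr::monotonic_buffer_resource upstream;            // never reuses, only grows
    std::pmr::unsynchronized_pool_resource pool(&upstream);  // recycles small blocks

    std::pmr::unordered_set<int> set(&pool);
    for (int i = 0; i < 10000; ++i)
        set.insert(i);
    // Node-sized blocks and small bucket lists can be recycled by the pool;
    // requests above the pool's largest block size fall through to upstream.
}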
If you decide you want to create a custom resource for this purpose, which might be fruitful and might even be fun, your approach of choosing some lower threshold for passing to the monotonic allocator would probably work and would not actually be a lot of effort. You could also consider making it adaptive: Keep a list of the last, say, 4, allocation sizes. If any size gets more than two hits, then assume it is your node size and allocate those nodes from the monotonic resource while other requests get passed directly to the upstream resource. Be careful, however, that you use such a custom allocator only when you know what's going on. If you have a std::pmr::unordered_set<std::pmr::string>, then both approaches may result in many of the strings getting allocated from the upstream resource and thus losing the benefit of the monotonic buffer. As you can see, being too stingy with memory for your bucket list can backfire in a generic context. You're likely to discover that the unmodified monotonic_buffer_resource was a better bet.
Good luck, and please report your findings back here. Also, if you have an idea for a general-purpose resource that could address your problem (or any other common allocation problem), I'd love to hear about it. There's certainly room in the standard for a few more useful resource types.
The definition of std::pmr::unordered_set
template<class Key,
         class Hash = std::hash<Key>,
         class Pred = std::equal_to<Key>>
using unordered_set = std::unordered_set<Key, Hash, Pred,
                                         std::pmr::polymorphic_allocator<Key>>;
Which indicates that it is only used for the nodes.
But a short test shows it is used for both, which makes me sad. So the only way will be to test on the size of the allocation.
This part indicates that the allocator only considers Key:
std::pmr::polymorphic_allocator<Key>
In the old allocators there was a rebind mechanism to convert to another allocator type, which I didn't see in pmr (it is in std::allocator_traits, see comments).
I was starting to write something like this
void* do_allocate(std::size_t bytes, std::size_t alignment) override {
    if (bytes == first_size)    // guessed node size goes to the first resource
        return first_alloc->allocate(bytes, alignment);
    return second_alloc->allocate(bytes, alignment);
}
Then I saw that there already exists a pre-pmr scoped_allocator_adaptor. I don't know if it's compatible with pmr; that will have to be investigated further.
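For what it's worth, a fuller sketch of the size-threshold idea as a complete memory_resource; the threshold heuristic is an assumption (there is no portable way to know the node size), and the names are made up:

#include <cstddef>
#include <memory_resource>

// Requests at or below the threshold (guessed to be node-sized) go to the
// "small" resource (e.g. a monotonic buffer); everything else, such as the
// growing bucket list, goes to the "large" resource (e.g. the default one).
class switch_memory_resource : public std::pmr::memory_resource {
public:
    switch_memory_resource(std::size_t threshold,
                           std::pmr::memory_resource* small_res,
                           std::pmr::memory_resource* large_res)
        : threshold_(threshold), small_(small_res), large_(large_res) {}

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        return (bytes <= threshold_ ? small_ : large_)->allocate(bytes, alignment);
    }
    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        (bytes <= threshold_ ? small_ : large_)->deallocate(p, bytes, alignment);
    }
    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::size_t threshold_;
    std::pmr::memory_resource* small_;
    std::pmr::memory_resource* large_;
};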

store strings in stable memory in c++

A little bit of background first (skip ahead to the boldface if you're bored by this).
I'm trying to glue two pieces of code together. One is a JSON/YML library that makes heavy use of a custom string view object, the other is a piece of code from the early 2000s.
I've been seeing weird behavior for a long time, until I traced it down to a memory issue, namely that the string views I construct in the JSON/YML library take a const char* in their constructor and assume that the memory location of that char array stays constant over the lifetime of the string view. However, some of the std::string objects on which I construct these views are temporary, so that's just not true and the string view ends up pointing at garbage.
Now, I thought I was being smart and constructed a cache in the form of a std::vector that would hold all the temporary strings; I would construct the string views on these and only clear the cache at the end - easy.
However, I was still seeing garbled strings every now and then, until I found the reason: sometimes, when pushing things to the vector beyond the preallocated size, the vector would be moved to a different memory location, invalidating all the string views. For now, I've settled on preallocating a cache size that is large enough to avoid any conceivable moving of the vector, but I can see this causing severe and untraceable problems in the future for very large runs. So here's my question:
How can I construct a std::vector<std::string> or any other string container that either avoids being moved in memory altogether, or at least throws an error message if that happens?
Of course, if you feel that I'm going about this whole issue in the wrong way fundamentally, please also let me know how I should deal with this issue instead.
If you're interested, the two pieces of code in question are RapidYAML and the CERN Statistics Library ROOT.
My answer from a similar question: Any way to update pointer/reference value when vector changes capability?
If you store objects in your vector as std::unique_ptr or std::shared_ptr, you can get an observing pointer to the underlying object with std::unique_ptr::get() (or a reference if you dereference the smart pointer). This way, even though the memory location of the smart pointer changes upon resizing, the observing pointer points to the same object and thus the same memory location.
[...] sometimes, when pushing things to the vector beyond the preallocated size, the vector would be moved to a different memory location, invalidating all the string views.
The reason is that std::vector is required to store its data contiguously in memory. So, if you exceed the current capacity of the vector when adding an element, it will allocate a new space in memory (big enough this time) and move all the data there.
What you are subject to is called iterator invalidation.
How can I construct a std::vector or any other string container that either avoids being moved in memory alltogether, or at least throws an error message if that happens?
You have at least 3 easy solutions:
If your cache size is supposed to be fixed and is known at compile-time, I would advise you to use std::array instead.
If your cache size is supposed to be fixed but not necessarily known at compile-time, I would advise you to reserve() the required capacity of your std::vector so that you will have the guarantee that it will be big enough to not need to be reallocated.
If your cache size may change, I would advise you to use std::list instead. It is implemented as a (usually doubly) linked list. It will guarantee that the elements will not be relocated in memory.
But since they are not stored contiguously in memory, you'll lose the ability to have direct access to any element (i.e. you'll need to iterate over the list in order to find an element).
Of course there probably are other solutions (I do not claim this answer to be exhaustive) but these solutions will allow you to almost not change your code (only the container) and protect your string views from being invalidated. The std::list option is sketched just below.
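A minimal sketch of the std::list option, assuming a string_view-style type on the consuming side (the class and method names here are made up for illustration):

#include <list>
#include <string>
#include <string_view>

// Pushing new strings onto a std::list never relocates existing ones, so
// views built on cached strings stay valid for the cache's lifetime.
class StringCache {
public:
    std::string_view intern(std::string s) {
        strings_.push_back(std::move(s));        // earlier entries never move
        return std::string_view(strings_.back());
    }
private:
    std::list<std::string> strings_;
};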
Perhaps use an std::list. Its accessing method is slower (at least when iterating) but memory location is constant. Reason for both is that it does not use contiguous memory.
Alternatively create a wrapper that wraps a pointer to a string that has been created with "new". That address will also be constant. EDIT: Somehow I managed to miss that what I've just described is pretty much a smart pointer minus automated deletion ;)
Sadly, it is impossible to grow a vector while being sure its contents will stay at the same place, at least on a classical OS.
There is the function realloc that tries to keep the same place, but as you can read in the documentation, there is no guarantee of that; only the OS decides.
To solve your problem, you need the concept of a pool - a pool of strings here - that handles the lifetime of your strings.
You may get away with a simple std::list of strings, but it will lead to poor data locality and a lot of small independent allocations, which is bad for performance. The same problems apply to smart pointers.
So if you care about performance, the implementation in your case may not be far from your current one, in my opinion. Because you cannot resize the vector, you should prefer a std::array of a fixed size that you decide at compile time. Then, whenever you need more room, you can create a new one to expand your capacity. This is typically easy to implement as a std::list of std::arrays (a sketch follows below).
I don't know if it applies here, but you must be careful if your application can create an unbounded number of strings during its execution, as that may lead to an ever-growing memory pool and, eventually, memory problems. To address that, you may need to ensure that strings you no longer use can be reused or freed. Sadly I cannot help you much here, as those rules will depend on your application.
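Here is a rough sketch of that chunked pool; the chunk size and all names are illustrative assumptions, and unused-string recycling is left out:

#include <array>
#include <cstddef>
#include <list>
#include <string>

// Fixed-size std::arrays of strings chained in a std::list. A chunk is never
// moved once created, so pointers into pooled strings remain stable.
class StringPool {
    static constexpr std::size_t CHUNK_SIZE = 64;
    using Chunk = std::array<std::string, CHUNK_SIZE>;
public:
    const char* store(std::string s) {
        if (chunks_.empty() || used_ == CHUNK_SIZE) {
            chunks_.emplace_back();              // new chunk; existing ones stay put
            used_ = 0;
        }
        std::string& slot = chunks_.back()[used_++];
        slot = std::move(s);
        return slot.c_str();                     // stable for the pool's lifetime
    }
private:
    std::list<Chunk> chunks_;
    std::size_t used_ = 0;
};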

Is shrink_to_fit the proper way of reducing the capacity a `std::vector` to its size?

In C++11 shrink_to_fit was introduced to complement certain STL containers (e.g., std::vector, std::deque, std::string).
Synopsizing, its main functionality is to request that the container it is associated with reduce its capacity to fit its size. However, this request is non-binding, and the container implementation is free to optimize otherwise and leave the vector with a capacity greater than its size.
Furthermore, in a previous SO question the OP was discouraged from using shrink_to_fit to reduce the capacity of his std::vector to its size. The reasons not to do so are quoted below:
shrink_to_fit does nothing or it gives you cache locality issues and it's O(n) to execute (since you have to copy each item to their new, smaller home). Usually it's cheaper to leave the slack in memory. @Massa
Could someone be so kind as to address the following questions:
Do the arguments in the quotation hold?
If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).
And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?
Do the arguments in the quotation hold?
Measure and you will know. Are you constrained in memory? Can you figure out the correct size up front? It will be more efficient to reserve than it will be to shrink after the fact. In general I am inclined to agree on the premise that most uses are probably fine with the slack.
If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).
The comment does not only apply to shrink_to_fit, but to any other way of shrinking. Given that you cannot realloc in place, it involves acquiring a different chunk of memory and copying over there regardless of what mechanism you use for shrinking.
And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?
The request is non-binding, but the alternatives don't have better guarantees. The question is whether shrinking makes sense: if it does, then it makes sense to provide a shrink_to_fit operation that can take advantage of the fact that the objects are being moved to a new location. I.e., if the type T has a noexcept(true) move constructor, it will allocate the new memory and move the elements.
While you can achieve the same externally, this interface simplifies the operation. The equivalent to shrink_to_fit in C++03 would have been:
std::vector<T>(current).swap(current);
But the problem with this approach is that when the copy is done to the temporary it does not know that current is going to be replaced, there is nothing that tells the library that it can move the held objects. Note that using std::move(current) would not achieve the desired effect as it would move the whole buffer, maintaining the same capacity().
Implementing this externally would be a bit more cumbersome:
{
    std::vector<T> copy;
    if (noexcept(T(std::move(std::declval<T>())))) {    // T is nothrow move-constructible
        copy.assign(std::make_move_iterator(current.begin()),
                    std::make_move_iterator(current.end()));
    } else {
        copy.assign(current.begin(), current.end());
    }
    copy.swap(current);
}
Assuming that I got the if condition right... which is probably not what you want to write every time that you want this operation.
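For comparison, a sketch of how the same operation might be written with a standard type trait and if constexpr in C++17 (the function name and the name current are assumptions; this is not a drop-in replacement for shrink_to_fit):

#include <iterator>
#include <type_traits>
#include <vector>

template <typename T>
void shrink(std::vector<T>& current) {
    std::vector<T> copy;
    copy.reserve(current.size());                        // exact-fit allocation up front
    if constexpr (std::is_nothrow_move_constructible_v<T>) {
        copy.assign(std::make_move_iterator(current.begin()),
                    std::make_move_iterator(current.end()));
    } else {
        copy.assign(current.begin(), current.end());     // fall back to copying
    }
    copy.swap(current);
}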
Will the arguments hold?
As the arguments are originally mine, don't mind if I defend them, one by one:
Either shrink_to_fit does nothing (...)
As it was mentioned, the standard says (many times, but in the case of vector it's section 23.3.7.3...) that the request is non-binding to allow an implementation latitude for optimizations. This means that the implementation can define shrink_to_fit as a no-op.
(...) or it gives you cache locality issues
In the case that shrink_to_fit is not implemented as a no-op, you have to allocate a new underlying container with capacity size(), copy (or, in the best case, move) construct all your N = size() new items from the old ones, destruct all the old ones (in the move case this should be optimized, but it's possible that this involves a loop again over the old container) and then destroy the old container itself. This is done, in libstdc++-4.9, exactly as David Rodriguez has described, by
_Tp(__make_move_if_noexcept_iterator(__c.begin()),
    __make_move_if_noexcept_iterator(__c.end()),
    __c.get_allocator()).swap(__c);
and in libc++-3.5, by a function in __alloc_traits that does approximately the same.
Oh, and an implementation absolutely cannot rely on realloc (even if it uses malloc inside ::operator new for its memory allocations) because realloc, if it cannot shrink in-place, will either leave the memory alone (no-op case) or make a bitwise copy (and miss the opportunity for readjusting pointers, etc. that the proper C++ copying/moving constructors would give).
Sure, one can write a shrinkable memory allocator, and use it in the constructor of its vectors.
In the easy case where the vectors are larger than the cache lines, all that movement puts pressure on the cache.
and it's O(n)
If n = size(), I think it was established above that, at the very least, you have to do one n sized allocation, n copy or move constructions, n destructions, and one old_capacity sized deallocation.
usually it's cheaper just to leave the slack in memory
Obviously, unless you are really pressed for free memory (in which case it might be wiser to save your data to the disk and re-load it later on demand...)
If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).
The proper way is still shrink_to_fit... you just have to either not rely on it or know very well your implementation!
And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?
There is no better way, but the reason for the existence of shrink_to_fit is, AFAICT, that sometimes your program might feel memory pressure and it's one way of treating it. Not a very good way, but still.
HTH!
If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).
The 'swap trick' will trim a vector to the exact size required (from More Effective STL):
vector<Person>(persons).swap(persons);
Particularly useful when the vector is empty, to release all memory:
vector<Person>().swap(persons);
Vectors were constantly tripping my unit tester's memory leak detection code because of retained allocations of unused space, and this sorted them out perfectly.
This is the kind of example where I really don't care about runtime efficiency (size or speed), but I do care about exact memory usage.
And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?
I really don't know what the point of providing a function that can legally do absolutely nothing is.
I cheered when I saw it had been introduced, then despaired when I found it couldn't be relied upon.
Perhaps we'll see maybe_sort() in the next version.

Why would I prefer using vector to deque

Since
they are both contiguous memory containers;
feature wise, deque has almost everything vector has but more, since it is more efficient to insert in the front.
Why would anyone prefer std::vector to std::deque?
Elements in a deque are not contiguous in memory; vector elements are guaranteed to be. So if you need to interact with a plain C library that needs contiguous arrays, or if you care (a lot) about spatial locality, then you might prefer vector. In addition, since there is some extra bookkeeping, other ops are probably (slightly) more expensive than their equivalent vector operations. On the other hand, using many/large instances of vector may lead to unnecessary heap fragmentation (slowing down calls to new).
Also, as pointed out elsewhere on StackOverflow, there is more good discussion here: http://www.gotw.ca/gotw/054.htm .
To know the difference one should know how deque is generally implemented. Memory is allocated in blocks of equal sizes, and they are chained together (as an array or possibly a vector).
So to find the nth element, you find the appropriate block then access the element within it. This is constant time, because it is always exactly 2 lookups, but that is still more than the vector.
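Purely for illustration (real implementations differ in block size and bookkeeping), the "two lookups" typically boil down to something like this:

#include <cstddef>
#include <vector>

// Toy model of deque indexing: a map of pointers to fixed-size blocks.
template <typename T>
struct ToyDeque {
    static constexpr std::size_t BLOCK = 64;   // elements per block (assumed)
    std::vector<T*> blocks;                    // the block map
    std::size_t     first = 0;                 // offset of element 0 in blocks[0]

    T& operator[](std::size_t i) {
        std::size_t pos = first + i;
        return blocks[pos / BLOCK][pos % BLOCK];   // lookup 1: the block; lookup 2: the element
    }
};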
vector also works well with APIs that want a contiguous buffer because they are either C APIs or are more versatile in being able to take a pointer and a length. (Thus you can have a vector underneath or a regular array and call the API from your memory block).
Where deque has its biggest advantages are:
When growing or shrinking the collection from either end
When you are dealing with very large collection sizes.
When dealing with bools and you really want bools rather than a bitset.
The second of these is lesser known, but for very large collection sizes:
The cost of reallocation is large
The overhead of having to find a contiguous memory block is restrictive, so you can run out of memory faster.
When I was dealing with large collections in the past and moved from a contiguous model to a block model, we were able to store about 5 times as large a collection before we ran out of memory in a 32-bit system. This is partly because, when re-allocating, it actually needed to store the old block as well as the new one before it copied the elements over.
Having said all this, you can get into trouble with std::deque on systems that use "optimistic" memory allocation. Whilst a vector's attempt to request one large buffer for a reallocation will probably get rejected at some point with a bad_alloc, the optimistic nature of the allocator is likely to always grant the requests for the smaller buffers requested by a deque, and that is likely to cause the operating system to kill a process to try to acquire some memory. Whichever one it picks might not be too pleasant.
The workarounds in such a case are either setting system-level flags to override optimistic allocation (not always feasible) or managing the memory somewhat more manually, e.g. using your own allocator that checks for memory usage or similar. Obviously not ideal. (Which may answer your question as to prefer vector...)
I've implemented both vector and deque multiple times. deque is hugely more complicated from an implementation point of view. This complication translates to more code and more complex code. So you'll typically see a code size hit when you choose deque over vector. You may also experience a small speed hit if your code uses only the things the vector excels at (i.e. push_back).
If you need a double ended queue, deque is the clear winner. But if you're doing most of your inserts and erases at the back, vector is going to be the clear winner. When you're unsure, declare your container with a typedef (so it is easy to switch back and forth), and measure.
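For instance, a hypothetical alias makes the container choice a one-line switch while you measure (Widget and the other names here are placeholders):

#include <cstddef>
#include <deque>
#include <vector>

struct Widget { int payload; };

using WidgetContainer = std::vector<Widget>;   // flip to std::deque<Widget> and re-measure
// using WidgetContainer = std::deque<Widget>;

WidgetContainer make_widgets(std::size_t n) {
    WidgetContainer c;
    for (std::size_t i = 0; i < n; ++i)
        c.push_back(Widget{static_cast<int>(i)});
    return c;
}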
std::deque doesn't have guaranteed continuous memory - and it's often somewhat slower for indexed access. A deque is typically implemented as a "list of vector".
According to http://www.cplusplus.com/reference/stl/deque/, "unlike vectors, deques are not guaranteed to have all its elements in contiguous storage locations, eliminating thus the possibility of safe access through pointer arithmetics."
Deques are a bit more complicated, in part because they don't necessarily have a contiguous memory layout. If you need that feature, you should not use a deque.
(Previously, my answer brought up a lack of standardization (from the same source as above, "deques may be implemented by specific libraries in different ways"), but that actually applies to just about any standard library data type.)
A deque is a sequence container which allows random access to its elements but it is not guaranteed to have contiguous storage.
I think it is a good idea to make a performance test for each case, and to base the decision on those tests.
I'd prefer std::deque over std::vector in most cases.
You wouldn't prefer vector to deque according to these test results (with source).
Of course, you should test in your app/environment, but in summary:
push_back is basically the same for all
insert, erase in deque are much faster than list and marginally faster than vector
Some more musings, and a note to consider circular_buffer.
On the one hand, vector is quite frequently just plain faster than deque. If you don't actually need all of the features of deque, use a vector.
On the other hand, sometimes you do need features which vector does not give you, in which case you must use a deque. For example, I challenge anyone to attempt to rewrite this code, without using a deque, and without enormously altering the algorithm.
Note that vector memory is re-allocated as the array grows. If you have pointers to vector elements, they will become invalid.
Also, if you erase an element, iterators become invalid (but not "for(auto...)").
Edit: changed 'deque' to 'vector'

Which STL container should I use for a FIFO?

Which STL container would fit my needs best? I basically have a 10-element-wide container in which I continually push_back new elements while pop_front'ing the oldest element (about a million times).
I am currently using a std::deque for the task but was wondering if a std::list would be more efficient since it wouldn't need to reallocate itself (or maybe I'm mistaking a std::deque for a std::vector?). Or is there an even more efficient container for my need?
P.S. I don't need random access
Since there are a myriad of answers, you might be confused, but to summarize:
Use a std::queue. The reason for this is simple: it is a FIFO structure. You want FIFO, you use a std::queue.
It makes your intent clear to anybody else, and even yourself. A std::list or std::deque does not. A list can insert and remove anywhere, which is not what a FIFO structure is supposed to do, and a deque can add and remove from either end, which is also something a FIFO structure cannot do.
This is why you should use a queue.
Now, you asked about performance. Firstly, always remember this important rule of thumb: Good code first, performance last.
The reason for this is simple: people who strive for performance before cleanliness and elegance almost always finish last. Their code becomes a slop of mush, because they've abandoned all that is good in order to really get nothing out of it.
By writing good, readable code first, most of your performance problems will solve themselves. And if later you find your performance is lacking, it's now easy to add a profiler to your nice, clean code, and find out where the problem is.
That all said, std::queue is only an adapter. It provides the safe interface, but uses a different container on the inside. You can choose this underlying container, and this allows a good deal of flexibility.
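For example (a tiny sketch; the element type is arbitrary), the second template parameter of std::queue is how you select that underlying container:

#include <deque>
#include <list>
#include <queue>

std::queue<int>                 q_default;   // std::queue<int, std::deque<int>>
std::queue<int, std::list<int>> q_list;      // same interface, std::list underneath

void demo() {
    q_default.push(1);                // push_back on the underlying container
    int front = q_default.front();    // oldest element
    q_default.pop();                  // pop_front on the underlying container
    (void)front;
}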
So, which underlying container should you use? We know that std::list and std::deque both provide the necessary functions (push_back(), pop_front(), and front()), so how do we decide?
First, understand that allocating (and deallocating) memory is not a quick thing to do, generally, because it involves going out to the OS and asking it to do something. A list has to allocate memory every single time something is added, and deallocate it when it goes away.
A deque, on the other hand, allocates in chunks. It will allocate less often than a list. Think of it as a list, but each memory chunk can hold multiple nodes. (Of course, I'd suggest that you really learn how it works.)
So, with that alone a deque should perform better, because it doesn't deal with memory as often. Mixed with the fact you're handling data of constant size, it probably won't have to allocate after the first pass through the data, whereas a list will be constantly allocating and deallocating.
A second thing to understand is cache performance. Going out to RAM is slow, so when the CPU really needs to, it makes the best out of this time by taking a chunk of memory back with it, into cache. Because a deque allocates in memory chunks, it's likely that accessing an element in this container will cause the CPU to bring back the rest of that chunk as well. Now any further accesses to nearby elements of the deque will be speedy, because the data is in cache.
This is unlike a list, where the data is allocated one at a time. This means that data could be spread out all over the place in memory, and cache performance will be bad.
So, considering that, a deque should be a better choice. This is why it is the default container when using a queue. That all said, this is still only a (very) educated guess: you'll have to profile this code, using a deque in one test and list in the other to really know for certain.
But remember: get the code working with a clean interface, then worry about performance.
John raises the concern that wrapping a list or deque will cause a performance decrease. Once again, neither he nor I can say for certain without profiling it ourselves, but chances are that the compiler will inline the calls that the queue makes. That is, when you say queue.push(), it will really just say queue.container.push_back(), skipping the function call completely.
Once again, this is only an educated guess, but using a queue will not degrade performance, when compared to using the underlying container raw. Like I've said before, use the queue, because it's clean, easy to use, and safe, and if it really becomes a problem profile and test.
Check out std::queue. It wraps an underlying container type, and the default container is std::deque.
Where performance really matters, check out the Boost circular buffer library.
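If you go that route, usage looks roughly like this (boost::circular_buffer is the library named above; the sizes and names here are just illustrative):

#include <boost/circular_buffer.hpp>

int main() {
    boost::circular_buffer<int> fifo(10);   // capacity allocated once, never reallocated
    long long consumed = 0;
    for (int i = 0; i < 1000000; ++i) {
        if (fifo.full()) {                  // make room before pushing the new element
            consumed += fifo.front();
            fifo.pop_front();
        }
        fifo.push_back(i);
    }
    return consumed != 0 ? 0 : 1;           // keep the loop from being optimized away
}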
I continually push_back new elements while pop_front'ing the oldest element (about a million times).
A million is really not a big number in computing. As others have suggested, use a std::queue as your first solution. In the unlikely event of it being too slow, identify the bottleneck using a profiler (do not guess!) and re-implement using a different container with the same interface.
Why not std::queue? All it has is push (to the back) and pop (from the front).
A queue is probably a simpler interface than a deque but for such a small list, the difference in performance is probably negligible.
Same goes for list. It's just down to a choice of what API you want.
Use a std::queue, but be cognizant of the performance tradeoffs of the two standard Container classes.
By default, std::queue is an adapter on top of std::deque. Typically, that'll give good performance where you have a small number of queues containing a large number of entries, which is arguably the common case.
However, don't be blind to the implementation of std::deque. Specifically:
"...deques typically have large minimal memory cost; a deque holding just one element has to allocate its full internal array (e.g. 8 times the object size on 64-bit libstdc++; 16 times the object size or 4096 bytes, whichever is larger, on 64-bit libc++)."
To net that out, presuming that a queue entry is something that you'd want to queue, i.e., reasonably small in size, then if you have 4 queues, each containing 30,000 entries, the std::deque implementation will be the option of choice. Conversely, if you have 30,000 queues, each containing 4 entries, then more than likely the std::list implementation will be optimal, as you'll never amortize the std::deque overhead in that scenario.
You'll read a lot of opinions about how cache is king, how Stroustrup hates linked lists, etc., and all of that is true, under certain conditions. Just don't accept it on blind faith, because in our second scenario there, it's quite unlikely that the default std::deque implementation will perform well. Evaluate your usage and measure.
This case is simple enough that you can just write your own. Here is something that works well for micro-controller situations where STL use takes too much space. It is a nice way to pass data and signals from an interrupt handler to your main loop.
// FIFO with circular buffer
#include <cstdint>

#define fifo_size 4

class Fifo {
    uint8_t buff[fifo_size];
    int writePtr = 0;
    int readPtr = 0;
public:
    // Note: does not guard against overwriting unread data when the FIFO is full.
    void put(uint8_t val) {
        buff[writePtr % fifo_size] = val;
        writePtr++;
    }
    uint8_t get() {
        uint8_t val = 0;                       // returned when the FIFO is empty
        if (readPtr < writePtr) {
            val = buff[readPtr % fifo_size];
            readPtr++;
            // rebase both pointers to avoid integer overflow,
            // keeping their difference (the element count) intact
            if (readPtr >= fifo_size) {
                writePtr -= fifo_size;
                readPtr  -= fifo_size;
            }
        }
        return val;
    }
    int count() { return (writePtr - readPtr); }
};