Which C++ STL container should I use?

Imagine the following requirements:
measurement data should be logged and the user should be able to iterate through the data.
uint32_t timestamp;
uint16_t place;
struct SomeData someData;
have a timestamp (uint32_t), a place (uint16_t) and some data in a struct
have a constant number of datasets. If a new one arrives, the oldest is thrown away.
the number of "place" is dynamic, the user can insert new ones during runtime
it should be possible to iterate through the data to the next newer or older dataset but only if the place is the same
need to insert at the end only
memory should be allocated once at program start
insertion need not be fast, but it should not block other threads (which might be iterating through the container) for a long time
memory requirement should be low
EDIT: The container should use all the memory which is not used otherwise; therefore it can be large.
I am not sure which container I should use. It is an embedded system and should not use boost etc.
I see the following possibilities:
std::vector - drawbacks: insertion at the end requires that all objects are copied, and during this time another thread cannot access the vector. Edit: this can be avoided by implementing it as a circular buffer - see comments below. When iterating through the vector, I have to test the place ID. It might also be a problem to allocate that much memory as one block - because the memory could be segmented?
std::deque - compared to std::vector, insertion (and pop_back) is faster, but what about the memory requirement? Iterators do not become invalid if the insertion is at the end. But I still have to iterate and test the second ID ("place"). I think it does not need to allocate all the memory in one big block, as is the case with vector or array. If an element is added in front and another one is removed at the end (or removed first and added after), I guess no memory allocation takes place?
std::queue - instead of deque, should I rather use a queue? Is it true that in many implementations a queue is implemented just as a deque?
std::map - Like deque any iterators to existing elements will not become invalid. If I make the key a combination of place and timestamp, then iteration through the map is maybe faster because it is already sorted? Memory requirements of a map?
std::multimap - as the number of places is not constant I cannot make a multimap with "place" as the index.
std::list - has no advantage over deque here?
Some suggested the use of a circular buffer. If I do not want the memory to be allocated as one big block, I still have to use a container, and most questions above stay valid.
Update:
I will use a ring buffer as suggested here, but using a deque as the underlying container. In order to be able to scroll quickly through the datasets with the preselected "place", I may introduce two additional indices into the data struct which will point to the previous and the next index with the same place (a sketch follows below).
How much memory will be used? In my special case the size of the struct is 56 bytes. The GNU library uses 512 bytes as the minimum deque block size, the IAR compiler 16 bytes. Hence the block size used will be 512 or 56 bytes, respectively. Besides two iterators (using 4 pointers each) and the size, a pointer is stored for each block. Therefore in the IAR implementation (block size 56 bytes) there will be about 7 % overhead (on a 32-bit system) compared to the use of a std::vector or array. In the gcc implementation 9 objects fit in a block (504 bytes), while 512 + 4 bytes are needed per block, which is about 2 % overhead.
The block size is not large, but the contiguous memory needed for the pointer array is already relatively large, especially for the implementation where one block is one struct.
A std::list would need 2 pointers per struct, which is 14 % overhead in my case on 32-bit systems.
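A minimal sketch of that plan follows. The class and member names, and the use of a pre-allocated std::vector for the one-time allocation, are my own illustration (the OP settled on std::deque); maintenance of the per-place links is left out.

#include <cstddef>
#include <cstdint>
#include <vector>

struct SomeData { /* payload as in the question */ };

struct Dataset {
    uint32_t timestamp;
    uint16_t place;
    SomeData someData;
    int32_t prevSamePlace = -1;   // index of the previous dataset with the same place, -1 if none
    int32_t nextSamePlace = -1;   // index of the next dataset with the same place, -1 if none
};

class DatasetRing {
public:
    explicit DatasetRing(std::size_t capacity) : buf_(capacity) {}   // allocate once at program start

    void push(const Dataset& d) {                 // overwrites the oldest entry once the buffer is full
        buf_[head_] = d;
        head_ = (head_ + 1) % buf_.size();
        if (size_ < buf_.size()) ++size_;
        // Updating prevSamePlace/nextSamePlace of the new and the overwritten entry would go here.
    }

    std::size_t size() const { return size_; }

private:
    std::vector<Dataset> buf_;    // one pre-allocated block
    std::size_t head_ = 0;        // next slot to overwrite
    std::size_t size_ = 0;
};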

std::vector
... the memory could be segmented?
No, std::vector allocates contiguous memory, as is documented in that link. Arrays are also contiguous, but you might just as well use vector for this.
std::deque is segmented, which you said you didn't want. Or do you want to avoid a single large allocated block? It's not clear.
Anyway, it has no benefit over vector if you really want a circular buffer (because you'll never be adding/removing elements from the front/back anyway), and you can't control the block size.
std::queue
... Is it true that in many implementations a queue is implemented just as a deque?
Yes, that's the default in all implementations. See the linked documentation or any decent book.
It doesn't sound like you want a FIFO queue, so I don't know why you're considering this one - the interface doesn't match your stated requirement.
std::map
... iteration through the map is maybe faster because it is already sorted?
On most modern server/desktop architectures, map will be slower because advancing an iterator involves a pointer chase (which impairs pipelining) and a likely cache miss. Your anonymous embedded architecture may be less sensitive to these effects, so map may be faster for you.
... Memory requirements of a map?
Higher. You have the node size (at least a couple of pointers) added to each element.
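For what it's worth, the (place, timestamp) key idea from the question would look roughly like this (a sketch only; the map and function names are illustrative):

#include <cstdint>
#include <map>
#include <utility>

struct SomeData { /* payload */ };

// The map keeps entries ordered by place first, then timestamp, so all datasets
// of one place are adjacent and can be walked with a plain iterator.
std::map<std::pair<uint16_t, uint32_t>, SomeData> byPlaceAndTime;

void insert(uint16_t place, uint32_t timestamp, const SomeData& d) {
    byPlaceAndTime[{place, timestamp}] = d;
}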

Related

std::vector increasing peak memory

This is in continuation of my last question. I have failed to understand the memory taken up by a vector. Problem skeleton:
Consider a vector which is a collection of lists, and each list is a collection of pointers. Exactly like:
std::vector<std::list<ABC*> > vec;
where ABC is my class. We work on 64-bit machines, so the size of a pointer is 8 bytes.
At the start of my flow in the project, I resize this vector to a number so that I can store lists at the respective indexes.
vec.resize(613284686);
At this point, capacity and size of the vector would be 613284686. Right. After resizing, I am inserting the lists at corresponding indexes as:
// Somewhere down in the program, make these lists. Simple push for now.
std::list<ABC*> l1;
l1.push_back(<pointer_to_class_ABC>);
l1.push_back(<pointer_to_class_ABC>);
// Copy the list at location
setInfo(613284686, l1);
void setInfo(uint64_t index, std::list<ABC*> list) {
    std::copy(list.begin(), list.end(), std::back_inserter(vec.at(index)));
}
Alright. So inserting is done. Notable things are:
Size of the vector is: 613284686
Entries in the vector: 3638243731 // Calculated this by going over the vector indexes and adding up the size of the std::list at each index.
Now, since there are 3638243731 entries of pointers, I would expect memory taken by this vector is ~30Gb. 3638243731 * 8(bytes) = ~30Gb.
BUT, when I have this data in memory, memory peaks to 400G.
And then I clear up this vector with:
std::vector<std::list<nl_net> >& ccInfo = getVec(); // getVec defined somewhere and returns the original vec
std::vector<std::list<nl_net> >::iterator it = ccInfo.begin();
for(; it != ccInfo.end(); ++it) {
    (*it).clear();
}
ccInfo.clear(); // Since it is a reference
std::vector<std::list<nl_net> >().swap(ccInfo); // This makes the capacity of the vector 0.
Well, after clearing up this vector, memory drops down to 100G. That is too much for a vector to hold on to.
Could you correct me on what I am failing to understand here?
P.S. I cannot reproduce it on smaller cases; it only shows up in my project.
vec.resize(613284686);
At this point, capacity and size of the vector would be 613284686
It would be at least 613284686. It could be more.
std::vector<std::list<nl_net> >().swap(ccInfo); // This makes the capacity of the vector 0.
Technically, there is no guarantee by the standard that a default constructed vector wouldn't have capacity other than 0... But in practice, this is probably true.
Now, since there are 3638243731 entries of pointers, I would expect memory taken by this vector is ~30Gb. 3638243731 * 8(bytes)
But the vector doesn't contain pointers. It contains std::list<ABC*> objects. So, you should expect vec.capacity() * sizeof(std::list<ABC*>) bytes used by the buffer of the vector itself. Each list has at least a pointer to the beginning and the end.
Furthermore, you should expect each element in each of the lists to use memory as well. Since the list is doubly linked, you should expect about two pointers plus the data (a third pointer) worth of memory for each element.
Also, each pointer in the lists apparently points to an ABC object, and each of those use sizeof(ABC) memory as well.
Furthermore, since each element of the linked lists are allocated separately, and each dynamic allocation requires book-keeping so that they can be individually de-allocated, and each allocation must be aligned to the maximum native alignment, and the free store may have fragmented during the execution, there will be much overhead associated with each dynamic allocation.
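A tiny sizeof-based check (illustrative only; exact node and allocator bookkeeping sizes are implementation-defined) makes the arithmetic above concrete:

#include <cstdio>
#include <list>

struct ABC { /* stand-in for the real class */ };

int main() {
    // Per vector slot: one std::list object (typically two pointers plus a size).
    std::printf("sizeof(std::list<ABC*>) = %zu\n", sizeof(std::list<ABC*>));
    // Per list node, before allocator rounding and bookkeeping:
    // prev pointer + next pointer + the stored ABC*.
    std::printf("per-node payload estimate = %zu bytes\n", 2 * sizeof(void*) + sizeof(ABC*));
}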
Well, after clearing up this vector, memory drops down to 100G.
It is quite typical for the language implementation to retain (some) memory it has allocated from the OS. If your target system documents an implementation specific function for explicitly requesting release of such memory, then you could attempt using that.
However, if the vector buffer wasn't the latest dynamic allocation, then its deallocation may have left a massive reusable area in the free store, but if there exists later allocations, then all that memory might not be releasable back to the OS.
Even if the language implementation has released the memory to the OS, it is quite typical for the OS to keep the memory mapped for the process until another process actually needs the memory for something else. So, depending on how you're measuring memory use, the results might not necessarily be meaningful.
General rules of thumb that may be useful:
Don't use a vector unless you use all (or most) of the indices. In cases where you don't, consider a sparse array instead (there is no standard container for such a data structure, though; see the sketch after this list).
When using vector, reserve before resize if you know the upper bound of allocation.
Don't use linked lists without a good reason.
Don't rely on getting all memory back from peak usage (back to the OS that is; The memory is still usable for further dynamic allocations).
Don't stress about virtual memory usage.
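A sketch of the sparse-array idea from the first rule of thumb (my own illustration, reusing the setInfo name from the question): if only a fraction of the indices are ever populated, a hash map keyed by the index avoids paying sizeof(std::list<ABC*>) for every slot.

#include <cstdint>
#include <list>
#include <unordered_map>

struct ABC {};

std::unordered_map<uint64_t, std::list<ABC*>> sparse;   // a slot exists only once it is used

void setInfo(uint64_t index, const std::list<ABC*>& list) {
    std::list<ABC*>& dst = sparse[index];                // created on first use
    dst.insert(dst.end(), list.begin(), list.end());     // copy the pointers
}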
std::list is a fragmented memory container. Typically each node MUST hold the data it is storing, plus the 2 prev/next pointers, and then you have to add in the space required within the OS allocation table (typically 16 or 32 bytes per allocation, depending on the OS). You then have to account for the fact that all allocations must be returned on a 16-byte boundary (on Intel/AMD based 64-bit machines anyway).
So, using the example of std::list<ABC*>, the size of a pointer is 8 bytes; however, you will need at least 48 bytes to store each element.
So memory usage for ONLY the list entries is going to be around: 3638243731 * 48(bytes) = ~162Gb.
This is of course assuming that there is no memory fragmentation (where there may be a block of 62 bytes free, and the OS returns the entire block of 62 rather than the 48 requested). We are also assuming here that the OS has a minimum allocation size of 48 bytes (and not, say, 64 bytes, which would not be overly silly, but would push the usage up far higher).
The size of the std::lists themselves within the vector comes to around 18GB. So in total we are looking at 180Gb at least to store that vector. It would not be beyond the realm of possibility that the extra allocations are additional OS book keeping info, for all of those individual memory allocations (e.g. lists of loaded memory pages, lists of swapped out memory pages, the read/write/mmap permissions, etc, etc).
As a final note, instead of using swap on a newly constructed vector, you can just use shrink_to_fit:
ccInfo.clear();
ccInfo.shrink_to_fit();
The main vector needs some more consideration. I get the impression it will always be a fixed size. So why not use a std::array instead? A std::vector always allocates more memory than it needs, to allow for growth. The bigger your vector, the bigger the reservation of memory to allow for even more growth. The reasoning behind this is to keep reallocations to a minimum. Reallocations on really big vectors take huge amounts of time, so a lot of extra memory is reserved to prevent this.
No vector function that can delete elements (such as vector::clear and ::erase) also deallocates memory (e.g. lowers the capacity). The size will decrease but the capacity doesn't. Again, this is meant to prevent reallocations; if you delete, you are also very likely to add again. ::shrink_to_fit also doesn't guarantee that all of the used memory is released.*
Next is the choice of a list to store the elements. Is a list really applicable? Lists are strong at insertion/removal operations at arbitrary positions. Are you really constantly adding and removing ABC objects to the list in random locations? Or is another container type with different properties, but with contiguous memory, more suitable? Another std::vector or std::array perhaps. If the answer is yes, then you're pretty much stuck with a list and its scattered memory allocations. If no, then you could win back a lot of memory by using a different container type.
So, what is it you really want to do? Do you really need dynamic growth on both the main container and its elements? Do you really need random manipulation? Or can you use fixed-size arrays for both container and ABC objects and use iteration instead? When contemplating this you might want to read up on the available containers and their properties on en.cppreference.com. It will help you decide what is most appropriate.
*For the fun of it I dug around in VS2017's implementation and it creates an entirely new vector without the growth segment, copies the old elements and then reassigns the internal pointers of the old vector to the new one while deleting the old memory. So at least with that compiler you can count on memory being released.

Memory usage in C++ program

I wrote a program that needs to handle very large data with the following containers:
vector
boost::unordered_map
boost::unordered_multimap
So, I'm having memory problems (the program uses a LOT) and I was thinking maybe I can replace these containers (with something that already exists, or my own implementations):
So, three questions:
How much memory would I save if I replaced vector with a C array? Is it worth it?
Can someone explain how the memory is used in boost::unordered_map and boost::unordered_multimap in the current implementation? Like, what is stored in order to achieve their performance?
Can you recommend me some libraries that outperform boost::unordered_map and boost::unordered_multimap in memory usage (but not something too slow)?
std::vector is memory efficient. I don't know about the boost maps, but the Boost people usually know what they're doing, I doubt you'll save a lot of memory by creating your own variants.
You can do a few other things to help with memory issues:
Compile in 64 bit. Running out of memory in a 64 bit process is very hard.
You won't run out of memory, but memory might get swapped out. You should instead see if you need to load everything into memory at once, perhaps you can work on chunks of the data at a time.
As a side benefit, working on a chunk of the data at a time allows you to run your code in parallel.
With memory so cheap nowadays that allocating 10GB of RAM is very simple, I guess your bottleneck will be the processing you do of the data, not the allocation of the data.
These two articles explain the data structures underlying some common implementations of unordered associative containers:
Implementation of C++ unordered associative containers
Implementation of C++ unordered associative containers with duplicate elements
Even though there are some differences between implementations, they are modest -- one word per element at most. If you go with minimum-overhead solutions such as sorted vectors, this would gain you 2-3 words per element, not even a 2x improvement if your objects are large. So, you'd probably be better off resorting to an environment with more memory or radically changing your approach by using a database or something.
If you have only one set of data and multiple ways of accessing it, you can try to use boost::multi_index; here is the documentation.
std::vector is basically a contiguous array, plus a few bytes of overhead. About the only way you'll improve with vector is by using a smaller element type. Can you store a short int instead of a regular int? If so, you can cut the vector memory down by half.
Are you perhaps using these containers to hold pointers to many objects on the heap? If so, you may have a lot of wasted space in the heap that could be saved by writing custom allocators, or by doing away with a pointer to a heap element altogether, and by storing a value type within the container.
Look within your class types. Consider all pointer types, and whether they need to be dynamic storage or not. The typical class often has pointer members hanging off a base object, which means a single object is a graph of memory chunks in itself. The more you can inline your class members, the more efficient your use of the heap will be.
RAM is cheap in 2014. Easy enough to build x86-64 Intel boxes with 64-256GB of RAM and Solid State Disk as fast swap if your current box isn't cutting it for the project. Hopefully this isn't a commercial desktop app we are discussing. :)
I ended up changing boost::unordered_multimap to std::unordered_map of vector.
boost::unordered_multimap consumes more than two times the memory consumed by std::unordered_map of vector, due to the extra pointers it keeps (at least one extra pointer per element) and the fact that it stores the key and the value of each element, while unordered_map of vector only stores the key once for a vector that contains all the colliding elements.
In my particular case, I was trying to store about 4000 million integers, consuming about 15 GB of memory in the ideal case. Using the multimap I got to consume more than 40 GB, while using the map I use about 15 GB (a little bit more due to the pointers and other structures, but their size is negligible).
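The layout change boils down to something like this (types and names are illustrative; the point is that the key is stored once per group and the colliding values sit in one contiguous vector instead of one node each):

#include <cstdint>
#include <unordered_map>
#include <vector>

std::unordered_map<uint32_t, std::vector<uint32_t>> grouped;

void add(uint32_t key, uint32_t value) {
    grouped[key].push_back(value);   // no per-entry node, no repeated key
}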

Why is deque using so much more RAM than vector in C++?

I have a problem I am working on where I need to use some sort of 2 dimensional array. The array is fixed width (four columns), but I need to create extra rows on the fly.
To do this, I have been using vectors of vectors, and I have been using some nested loops that contain this:
array.push_back(vector<float>(4));
array[n][0] = a;
array[n][1] = b;
array[n][2] = c;
array[n][3] = d;
n++;
to add the rows and their contents. The trouble is that I appear to be running out of memory with the number of elements I was trying to create, so I reduced the number that I was using. But then I started reading about deque, and thought it would allow me to use more memory because it doesn't have to be contiguous. I changed all mentions of "vector" to "deque" in this loop, as well as all declarations. But then it appeared that I ran out of memory again, this time even with the reduced number of rows.
I looked at how much memory my code is using, and when I am using deque, the memory rises steadily to above 2GB, and the program closes soon after, even when using the smaller number of rows. I'm not sure exactly where in this loop it is when it runs out of memory.
When I use vectors, the memory usage (for the same number of rows) is still under 1GB, even when the loop exits. It then goes on to a similar loop where more rows are added, still only reaching about 1.4GB.
So my question is: is it normal for deque to use more than twice the memory of vector, or am I making an erroneous assumption in thinking I can just replace the word "vector" with "deque" in the declarations/initializations and the above code?
Thanks in advance.
I'm using:
MS Visual C++ 2010 (32-bit)
Windows 7 (64-bit)
The real answer here has little to do with the core data structure. The answer is that MSVC's implementation of std::deque is especially awful and degenerates to an array of pointers to individual elements, rather than the array of arrays it should be. Frankly, only twice the memory use of vector is surprising. If you had a better implementation of deque you'd get better results.
It all depends on the internal implementation of deque (I won't speak about vector since it is relatively straightforward).
Fact is, deque has completely different guarantees than vector (the most important one being that it supports O(1) insertion at both ends while vector only supports O(1) insertion at the back). This in turn means the internal structures managed by deque have to be more complex than vector.
To allow that, a typical deque implementation will split its memory in several non-contiguous blocks. But each individual memory block has a fixed overhead to allow the memory management to work (eg. whatever the size of the block, the system may need another 16 or 32 bytes or whatever in addition, just for bookkeeping). Since, contrary to a vector, a deque requires many small, independent blocks, the overhead stacks up which can explain the difference you see. Also note that those individual memory blocks need to be managed (maybe in separate structures?), which probably means some (or a lot of) additional overhead too.
As for a way to solve your problem, you could try what @BasileStarynkevitch suggested in the comments; this will indeed reduce your memory usage, but it will only get you so far, because at some point you'll still run out of memory. And what if you try to run your program on a machine that only has 256MB of RAM? Any other solution whose goal is to reduce your memory footprint while still trying to keep all your data in memory will suffer from the same problems.
A proper solution when handling large datasets like yours would be to adapt your algorithms and data structures in order to be able to handle small partitions at a time of your whole dataset, and load/save those partitions as needed in order to make room for the other partitions. Unfortunately since it probably means disk access, it also means a big drop in performance but hey, you can't eat the cake and have it too.
Theory
There are two common ways to efficiently implement a deque: either with a modified dynamic array or with a doubly linked list.
The modified dynamic array is basically a dynamic array that can grow from both ends, sometimes called an array deque. These array deques have all the properties of a dynamic array, such as constant-time random access, good locality of reference, and inefficient insertion/removal in the middle, with the addition of amortized constant-time insertion/removal at both ends, instead of just one end.
There are several implementations of the modified dynamic array:
1. Allocating deque contents from the center of the underlying array, and resizing the underlying array when either end is reached. This approach may require more frequent resizings and waste more space, particularly when elements are only inserted at one end.
2. Storing deque contents in a circular buffer, and only resizing when the buffer becomes full. This decreases the frequency of resizings.
3. Storing contents in multiple smaller arrays, allocating additional arrays at the beginning or end as needed. Indexing is implemented by keeping a dynamic array containing pointers to each of the smaller arrays.
Conclusion
Different libraries may implement deques in different ways, but generally as a modified dynamic array. Most likely your standard library uses approach #1 to implement std::deque, and since you append elements only from one end, you ultimately waste a lot of space. For that reason it creates the illusion that std::deque takes up more space than a plain std::vector.
Furthermore, if std::deque were implemented as a doubly-linked list, that would result in a waste of space too, since each element would need to accommodate 2 pointers in addition to your custom data.
An implementation with approach #3 (also a modified dynamic array) would again result in a waste of space to accommodate additional metadata such as pointers to all those small arrays.
In any case, std::deque is less efficient in terms of storage than plain old std::vector. Without knowing what you want to achieve I cannot confidently suggest which data structure you need. However, it seems like you don't even know what deques are for; therefore, what you really want in your situation is std::vector. Deques, in general, have a different application.
Deque can have additional memory overhead over vector because it's made of a few blocks instead of one contiguous block.
From en.cppreference.com/w/cpp/container/deque:
As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays.
The primary issue is running out of memory.
So, do you need all the data in memory at once?
You may never be able to accomplish this.
Partial Processing
You may want to consider processing the data in "chunks" or smaller sub-matrices. For example, using the standard rectangular grid (a sketch follows after these steps):
Read the data of the first quadrant.
Process the data of the first quadrant.
Store the results (in a file) of the first quadrant.
Repeat for the remaining quadrants.
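An illustrative sketch of the chunked approach (the file format, the row width, and the doubling "processing" step are assumptions made up for the example):

#include <cstddef>
#include <fstream>
#include <vector>

void processInChunks(const char* inPath, const char* outPath,
                     std::size_t cols, std::size_t rowsPerChunk) {
    std::ifstream in(inPath, std::ios::binary);
    std::ofstream out(outPath, std::ios::binary);
    std::vector<float> chunk(cols * rowsPerChunk);   // reused buffer, one allocation

    while (in.read(reinterpret_cast<char*>(chunk.data()),
                   static_cast<std::streamsize>(chunk.size() * sizeof(float)))
           || in.gcount() > 0) {
        std::size_t valuesRead = static_cast<std::size_t>(in.gcount()) / sizeof(float);
        for (std::size_t i = 0; i < valuesRead; ++i)
            chunk[i] *= 2.0f;                        // stand-in for the real processing
        out.write(reinterpret_cast<const char*>(chunk.data()),
                  static_cast<std::streamsize>(valuesRead * sizeof(float)));
        if (!in) break;                              // partial final chunk handled, stop
    }
}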
Searching
If you are searching for a particular datum or a set of data, you can do that without reading the entire data set into memory.
Allocate a block (array) of memory.
Read a portion of the data into this block of memory.
Search the block of data.
Repeat steps 2 and 3 until the data is found.
Streaming Data
If your application is receiving the raw data from an input source (other than a file), you will want to store the data for later processing.
This will require more than one buffer and is more efficient using at least two threads of execution.
The Reading Thread will be reading data into a buffer until the buffer is full. When the buffer is full, it will read data into another empty one.
The Writing Thread will initially wait until either the first read buffer is full or the read operation is finished. Next, the Writing Thread takes data out of the read buffer and writes it to a file. The Writing Thread then starts writing from the next read buffer.
This technique is called Double Buffering or Multiple Buffering.
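A hedged sketch of that scheme, using a single hand-off slot guarded by a mutex (the file names, the 1 MiB buffer size, and all identifiers are illustrative assumptions, not a production design):

#include <condition_variable>
#include <cstddef>
#include <fstream>
#include <istream>
#include <mutex>
#include <ostream>
#include <thread>
#include <vector>

std::mutex m;
std::condition_variable cv;
std::vector<char> fullBuffer;                          // hand-off slot between the two threads
bool hasFull = false;
bool readingDone = false;

void readingThread(std::istream& in) {
    std::vector<char> buf(1 << 20);                    // fill this while the other buffer is being written
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) || in.gcount() > 0) {
        buf.resize(static_cast<std::size_t>(in.gcount()));
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return !hasFull; });          // wait until the writer took the previous buffer
        fullBuffer.swap(buf);                          // hand over the filled buffer, take back an empty one
        hasFull = true;
        cv.notify_one();
        lk.unlock();
        buf.resize(1 << 20);
    }
    std::lock_guard<std::mutex> lk(m);
    readingDone = true;
    cv.notify_one();
}

void writingThread(std::ostream& out) {
    std::vector<char> buf;
    for (;;) {
        {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [] { return hasFull || readingDone; });
            if (!hasFull) break;                       // nothing pending and the reader is finished
            buf.swap(fullBuffer);
            hasFull = false;
            cv.notify_one();
        }
        out.write(buf.data(), static_cast<std::streamsize>(buf.size()));
    }
}

int main() {
    std::ifstream in("input.dat", std::ios::binary);   // illustrative file names
    std::ofstream out("copy.dat", std::ios::binary);
    std::thread writer([&out] { writingThread(out); });
    readingThread(in);                                 // the reader runs on the main thread
    writer.join();
}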
Sparse Data
If there is a lot of zero or unused data in the matrix, you should try using Sparse Matrices. Essentially, this is a list of structures that hold the data's coordinates and the value. This also works when most of the data is a common value other than zero. This saves a lot of memory space; but costs a little bit more execution time.
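A minimal version of that idea (names are illustrative):

#include <cstdint>
#include <vector>

struct Cell {
    uint32_t row;
    uint32_t col;
    float    value;                       // only non-default cells are stored
};

std::vector<Cell> sparseCells;            // typically far smaller than rows * cols floats

void set(uint32_t row, uint32_t col, float value) {
    sparseCells.push_back({row, col, value});
}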
Data Compression
You could also change your algorithms to use data compression. The idea here is to store the data location, the value, and the number of contiguous equal values (a.k.a. runs). So instead of storing 100 consecutive data points of the same value, you would store the starting position (of the run), the value, and 100 as the quantity. This saves a lot of space, but requires more processing time when accessing the data.
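A bare-bones run-length encoder along those lines (illustrative only):

#include <cstddef>
#include <vector>

struct Run {
    float       value;
    std::size_t count;                    // how many consecutive equal values
};

std::vector<Run> encode(const std::vector<float>& data) {
    std::vector<Run> runs;
    for (float v : data) {
        if (!runs.empty() && runs.back().value == v)
            ++runs.back().count;          // extend the current run
        else
            runs.push_back({v, 1});       // start a new run
    }
    return runs;
}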
Memory Mapped File
There are libraries that can treat a file as memory. Essentially, they read in a "page" of the file into memory. When the requests go out of the "page", they read in another page. All this is performed "behind the scenes". All you need to do is treat the file like memory.
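On POSIX systems this can be done directly with mmap (a platform-specific sketch; on Windows the equivalent is CreateFileMapping/MapViewOfFile, or a portable wrapper such as Boost.Interprocess):

#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Maps a file of floats read-only; the OS pages it in on demand.
const float* mapFile(const char* path, std::size_t* count) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return nullptr; }
    void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                   PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                                        // the mapping stays valid after close
    if (p == MAP_FAILED) return nullptr;
    *count = static_cast<std::size_t>(st.st_size) / sizeof(float);
    return static_cast<const float*>(p);              // release later with munmap
}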
Summary
Arrays and deques are not your primary issue, quantity of data is. Your primary issue can be resolved by processing small pieces of data at a time, compressing the data storage, or treating the data in the file as memory. If you are trying to process streaming data, don't. Ideally, streaming data should be placed into a file and then processed later.
A historical purpose of a file is to contain data that doesn't fit into memory.

Linked list vs dynamic array for implementing a stack using vector class

I was reading up on the two different ways of implementing a stack: linked list and dynamic array. The main advantage of a linked list over a dynamic array was that the linked list did not have to be resized, while a dynamic array had to be resized if too many elements were inserted, hence wasting a lot of time and memory.
That got me wondering if this is true for C++ (as there is a vector class which automatically resizes whenever new elements are inserted)?
It's difficult to compare the two, because the patterns of their memory usage are quite different.
Vector resizing
A vector resizes itself dynamically as needed. It does that by allocating a new chunk of memory, moving (or copying) data from the old chunk to the new chunk, then releasing the old one. In a typical case, the new chunk is 1.5x the size of the old (contrary to popular belief, 2x seems to be quite unusual in practice). That means that for a short time while reallocating, it needs memory equal to roughly 2.5x as much as the data you're actually storing. The rest of the time, the "chunk" that's in use is a minimum of 2/3rds full, and a maximum of completely full. If all sizes are equally likely, we can expect it to average about 5/6ths full. Looking at it from the other direction, we can expect about 1/6th, or about 17%, of the space to be "wasted" at any given time.
When we do resize by a constant factor like that (rather than, for example, always adding a specific size of chunk, such as growing in 4Kb increments) we get what's called amortized constant time addition. In other words, as the array grows, resizing happens exponentially less often. The average number of times items in the array have been copied tends to a constant (usually around 3, but depends on the growth factor you use).
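A small demonstration of that geometric growth (the exact factor and the resulting capacities are implementation-defined):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t lastCapacity = 0;
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != lastCapacity) {           // a reallocation just happened
            std::cout << "size " << v.size() << " -> capacity " << v.capacity() << '\n';
            lastCapacity = v.capacity();
        }
    }
}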
linked list allocations
Using a linked list, the situation is rather different. We never see resizing, so we don't see extra time or memory usage for some insertions. At the same time, we do see extra time and memory used essentially all the time. In particular, each node in the linked list needs to contain a pointer to the next node. Depending on the size of the data in the node compared to the size of a pointer, this can lead to significant overhead. For example, let's assume you need a stack of ints. In a typical case where an int is the same size as a pointer, that's going to mean 50% overhead -- all the time. It's increasingly common for a pointer to be larger than an int; twice the size is fairly common (64-bit pointer, 32-bit int). In such a case, you have ~67% overhead -- i.e., obviously enough, each node devoting twice as much space to the pointer as the data being stored.
Unfortunately, that's often just the tip of the iceberg. In a typical linked list, each node is dynamically allocated individually. At least if you're storing small data items (such as int) the memory allocated for a node may be (usually will be) even larger than the amount you actually request. So -- you ask for 12 bytes of memory to hold an int and a pointer -- but the chunk of memory you get is likely to be rounded up to 16 or 32 bytes instead. Now you're looking at overhead of at least 75% and quite possibly ~88%.
As far as speed goes, the situation is rather similar: allocating and freeing memory dynamically is often quite slow. The heap manager typically has blocks of free memory, and has to spend time searching through them to find the block that's most suited to the size you're asking for. Then it (typically) has to split that block into two pieces, one to satisfy your allocation, and another of the remaining memory it can use to satisfy other allocations. Likewise, when you free memory, it typically goes back to that same list of free blocks and checks whether there's an adjoining block of memory already free, so it can join the two back together.
Allocating and managing lots of blocks of memory is expensive.
cache usage
Finally, with recent processors we run into another important factor: cache usage. In the case of a vector, we have all the data right next to each other. Then, after the end of the part of the vector that's in use, we have some empty memory. This leads to excellent cache usage -- the data we're using gets cached; the data we're not using has little or no effect on the cache at all.
With a linked list, the pointers (and probable overhead in each node) are distributed throughout our list. I.e., each piece of data we care about has, right next to it, the overhead of the pointer and the empty space allocated to the node that we're not using. In short, the effective size of the cache is reduced by about the same factor as the overall overhead of each node in the list -- i.e., we might easily see only 1/8th of the cache storing the data we care about, and 7/8ths devoted to storing pointers and/or pure garbage.
Summary
A linked list can work well when you have a relatively small number of nodes, each of which is individually quite large. If (as is more typical for a stack) you're dealing with a relatively large number of items, each of which is individually quite small, you're much less likely to see a savings in time or memory usage. Quite the contrary, for such cases, a linked list is much more likely to basically waste a great deal of both time and memory.
Yes, what you say is true for C++. For this reason, the default container inside std::stack, which is the standard stack class in C++, is neither a vector nor a linked list, but a double ended queue (a deque). This has nearly all the advantages of a vector, but it resizes much better.
Basically, an std::deque is a linked list of arrays of sorts internally. This way, when it needs to resize, it just adds another array.
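Since std::stack is just an adaptor, the backing container is a template parameter, so you can compare the options directly (a small sketch):

#include <deque>
#include <list>
#include <stack>
#include <vector>

std::stack<int> s1;                        // default: std::stack<int, std::deque<int>>
std::stack<int, std::vector<int>> s2;      // contiguous storage, amortized growth
std::stack<int, std::list<int>> s3;        // one allocation per element

void demo() {
    s2.push(42);
    s2.push(7);
    s2.pop();                              // the adaptor only exposes push/pop/top
}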
First, the performance trade-offs between linked-lists and dynamic arrays are a lot more subtle than that.
The vector class in C++ is, by requirement, implemented as a "dynamic array", meaning that it must have an amortized-constant cost for inserting elements into it. How this is done is usually by increasing the "capacity" of the array in a geometric manner, that is, you double the capacity whenever you run out (or come close to running out). In the end, this means that a reallocation operation (allocating a new chunk of memory and copying the current content into it) is only going to happen on a few occasions. In practice, this means that the overhead for the reallocations only shows up on performance graphs as little spikes at logarithmic intervals. This is what it means to have "amortized-constant" cost, because once you neglect those little spikes, the cost of the insert operations is essentially constant (and trivial, in this case).
In a linked-list implementation, you don't have the overhead of reallocations; however, you do have the overhead of allocating each new element on the free store (dynamic memory). So, the overhead is a bit more regular (not spiky, which can sometimes be desirable), but could be more significant than using a dynamic array, especially if the elements are rather inexpensive to copy (small in size, and simple objects). In my opinion, linked-lists are only recommended for objects that are really expensive to copy (or move). But at the end of the day, this is something you need to test in any given situation.
Finally, it is important to point out that locality of reference is often the determining factor for any application that makes extensive use and traversal of the elements. When using a dynamic array, the elements are packed together in memory one after the other and doing an in-order traversal is very efficient as the CPU can preemptively cache the memory ahead of the reading / writing operations. In a vanilla linked-list implementation, the jumps from one element to the next generally involves a rather erratic jumps between wildly different memory locations, which effectively disables this "pre-fetching" behavior. So, unless the individual elements of the list are very big and operations on them are typically very long to execute, this lack of pre-fetching when using a linked-list will be the dominant performance problem.
As you can guess, I rarely use a linked-list (std::list), as the advantageous applications are few and far between. Very often, for large and expensive-to-copy objects, it is preferable to simply use a vector of pointers (you get basically the same performance advantages (and disadvantages) as a linked list, but with less memory usage (for the linking pointers), and you get random-access capability if you need it).
The main case that I can think of, where a linked-list wins over a dynamic array (or a segmented dynamic array like std::deque) is when you need to frequently insert elements in the middle (not at either ends). However, such situations usually arise when you are keeping a sorted (or ordered, in some way) set of elements, in which case, you would use a tree structure to store the elements (e.g., a binary search tree (BST)), not a linked-list. And often, such trees store their nodes (elements) using a semi-contiguous memory layout (e.g., a breadth-first layout) within a dynamic array or segmented dynamic array (e.g., a cache-oblivious dynamic array).
Yes, it's true for C++ or any other language. Dynamic array is a concept. The fact that C++ has vector doesn't change the theory. The vector in C++ actually does the resizing internally so this task isn't the developers' responsibility. The actual cost doesn't magically disappear when using vector, it's simply offloaded to the standard library implementation.
std::vector is implemented using a dynamic array, whereas std::list is implemented as a linked list. There are trade-offs for using both data structures. Pick the one that best suits your needs.
As you indicated, a dynamic array can take a larger amount of time adding an item if it gets full, as it has to expand itself. However, it is faster to access since all of its members are grouped together in memory. This tight grouping also usually makes it more cache-friendly.
Linked lists don't need to resize ever, but traversing them takes longer as the CPU must jump around in memory.
That got me wondering if this is true for c++ as there is a vector class which automatically resizes whenever new elements are inserted.
Yes, it still holds, because a vector resize is a potentially expensive operation. Internally, if the pre-allocated size for the vector is reached and you attempt to add new elements, a new allocation takes place and the old data is moved to the new memory location.
From the C++ documentation:
vector::push_back - Add element at the end
Adds a new element at the end of the vector, after its current last element. The content of val is copied (or moved) to the new element.
This effectively increases the container size by one, which causes an automatic reallocation of the allocated storage space if -and only if- the new vector size surpasses the current vector capacity.
http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style
Skip to 44:40. You should prefer std::vector whenever possible over std::list, as explained in the video by Bjarne himself. Since std::vector stores all of its elements next to each other in memory, it has the advantage of being cache-friendly. This is true for adding and removing elements from std::vector, and also for searching. He states that std::list is 50-100x slower than std::vector.
If you really want a stack, you should really use std::stack instead of making your own.

Why would I prefer using vector to deque

Since
they are both contiguous memory containers;
feature-wise, deque has almost everything vector has, and more, since it is more efficient to insert at the front.
Why would anyone prefer std::vector to std::deque?
Elements in a deque are not contiguous in memory; vector elements are guaranteed to be. So if you need to interact with a plain C library that needs contiguous arrays, or if you care (a lot) about spatial locality, then you might prefer vector. In addition, since there is some extra bookkeeping, other ops are probably (slightly) more expensive than their equivalent vector operations. On the other hand, using many/large instances of vector may lead to unnecessary heap fragmentation (slowing down calls to new).
Also, as pointed out elsewhere on StackOverflow, there is more good discussion here: http://www.gotw.ca/gotw/054.htm .
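To make the contiguity point concrete, here is a sketch of handing a vector to a C-style API (some_c_api is a stand-in, not a real function):

#include <cstddef>
#include <cstdio>
#include <vector>

void some_c_api(const double* data, std::size_t count) {   // pretend this is an external C function
    std::printf("got %zu values starting at %p\n", count, static_cast<const void*>(data));
}

void pass_to_c(const std::vector<double>& v) {
    some_c_api(v.data(), v.size());   // safe: vector storage is contiguous; a deque would need a copy first
}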
To know the difference one should know how deque is generally implemented. Memory is allocated in blocks of equal sizes, and they are chained together (as an array or possibly a vector).
So to find the nth element, you find the appropriate block then access the element within it. This is constant time, because it is always exactly 2 lookups, but that is still more than the vector.
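Schematically, the two lookups look like this (assuming a block-table implementation; the block size and names are illustrative):

#include <cstddef>

constexpr std::size_t kBlockSize = 128;        // elements per block

template <typename T>
T& deque_index(T** blockTable, std::size_t firstOffset, std::size_t i) {
    std::size_t pos = firstOffset + i;         // position relative to the start of block 0
    return blockTable[pos / kBlockSize]        // lookup 1: find the block
                     [pos % kBlockSize];       // lookup 2: the element within the block
}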
vector also works well with APIs that want a contiguous buffer because they are either C APIs or are more versatile in being able to take a pointer and a length. (Thus you can have a vector underneath or a regular array and call the API from your memory block).
Where deque has its biggest advantages are:
When growing or shrinking the collection from either end
When you are dealing with very large collection sizes.
When dealing with bools and you really want bools rather than a bitset.
The second of these is lesser known, but for very large collection sizes:
The cost of reallocation is large
The overhead of having to find a contiguous memory block is restrictive, so you can run out of memory faster.
When I was dealing with large collections in the past and moved from a contiguous model to a block model, we were able to store about 5 times as large a collection before we ran out of memory in a 32-bit system. This is partly because, when re-allocating, it actually needed to store the old block as well as the new one before it copied the elements over.
Having said all this, you can get into trouble with std::deque on systems that use "optimistic" memory allocation. Whilst its attempts to request a large buffer size for a reallocation of a vector will probably get rejected at some point with a bad_alloc, the optimistic nature of the allocator is likely to always grant the request for the smaller buffer requested by a deque and that is likely to cause the operating system to kill a process to try to acquire some memory. Whichever one it picks might not be too pleasant.
The workarounds in such a case are either setting system-level flags to override optimistic allocation (not always feasible) or managing the memory somewhat more manually, e.g. using your own allocator that checks for memory usage or similar. Obviously not ideal. (Which may answer your question as to prefer vector...)
I've implemented both vector and deque multiple times. deque is hugely more complicated from an implementation point of view. This complication translates to more code and more complex code. So you'll typically see a code size hit when you choose deque over vector. You may also experience a small speed hit if your code uses only the things the vector excels at (i.e. push_back).
If you need a double ended queue, deque is the clear winner. But if you're doing most of your inserts and erases at the back, vector is going to be the clear winner. When you're unsure, declare your container with a typedef (so it is easy to switch back and forth), and measure.
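The typedef advice in practice (a sketch; the alias name is illustrative):

#include <deque>
#include <vector>

using EventContainer = std::vector<int>;   // switch to std::deque<int> here and re-measure

EventContainer events;

void record(int e) {
    events.push_back(e);                   // the rest of the code is unchanged either way
}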
std::deque doesn't have guaranteed continuous memory - and it's often somewhat slower for indexed access. A deque is typically implemented as a "list of vector".
According to http://www.cplusplus.com/reference/stl/deque/, "unlike vectors, deques are not guaranteed to have all its elements in contiguous storage locations, eliminating thus the possibility of safe access through pointer arithmetics."
Deques are a bit more complicated, in part because they don't necessarily have a contiguous memory layout. If you need that feature, you should not use a deque.
(Previously, my answer brought up a lack of standardization (from the same source as above, "deques may be implemented by specific libraries in different ways"), but that actually applies to just about any standard library data type.)
A deque is a sequence container which allows random access to its elements, but it is not guaranteed to have contiguous storage.
I think it is a good idea to make a performance test for each case, and make the decision relying on these tests.
I'd prefer std::deque over std::vector in most cases.
You wouldn't prefer vector to deque according to these test results (with source).
Of course, you should test in your app/environment, but in summary:
push_back is basically the same for all
insert, erase in deque are much faster than list and marginally faster than vector
Some more musings, and a note to consider circular_buffer.
On the one hand, vector is quite frequently just plain faster than deque. If you don't actually need all of the features of deque, use a vector.
On the other hand, sometimes you do need features which vector does not give you, in which case you must use a deque. For example, I challenge anyone to attempt to rewrite this code, without using a deque, and without enormously altering the algorithm.
Note that vector memory is re-allocated as the array grows. If you have pointers to vector elements, they will become invalid.
Also, if you erase an element, iterators to elements at or after the point of erasure become invalid.
Edit: changed 'deque' to 'vector'
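A tiny check of that reallocation point (the old address is kept as an integer so no dangling pointer is compared):

#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    v.push_back(1);
    std::uintptr_t before = reinterpret_cast<std::uintptr_t>(v.data());
    for (int i = 0; i < 1000; ++i) v.push_back(i);   // growth forces at least one reallocation
    std::uintptr_t after = reinterpret_cast<std::uintptr_t>(v.data());
    std::cout << "buffer moved: " << std::boolalpha << (before != after) << '\n';
}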