std::list vs std::vector iteration - c++

It is said that iterating through a vector (as in reading all it's element) is faster than iterating through a list, because of optimized cache.
Is there any ressource on the web that would quantify how much it impacts the performances ?
Also, would it be better to use a custom linked list, whom elements would be prealocated so that they are consecutive in memory?
The idea behind that is that I want to store elements in a certain order that won't change. I still need to be able to insert some at run time in the midle quickly, but most of them will still be consecutive, because the order won't change.
Does the fact that the elements are consecutive have an impact in the cache, or because I'll still call list_element->next instead of ++list_element it does not improve anything ?

The main difference between vector and lists is that in vector elements are constructed subsequently inside a preallocated buffer, while in a list elements are constructed one by one.
As a consequence, elements in a vector are granted to occupy a contiguous memory space, while list elements (unless some specific situations, like a custom allocator working that way) aren't granted to be so, and can be "sparse" around the memory.
Now, since the processor operates on a cache (that can be up to 1000 times faster than the main RAM) that remaps entire pages of the main memory, if elements are consecutive it is higly probable that they fits a same memory page and hence are moved all together in the cache when iteration begins. While proceeding, everything happens in the cache without further moving of data or further access to the slower RAM.
With list-s, since elements are sparse everywhere, "going to the next" means refer to an address that may not be in the same memory page of its previous, and hence, the cache needs to be updated upon every iteration step, accessing the slower RAM on each iteration.
The performance difference greatly depends on the processor and on the type of memory used for both the main RAM and the cache, and on the way the std::allocator (and ultimately operator new and malloc) are implemented, so a general number is impossible to be given.
(Note: great difference means bad RAM respect to to the cache, but may also means bad implementation on list-s)

The efficiency gains from cache coherency due to compact representation of data structures can be rather dramatic. In the case of vectors compared to lists, compact representation can be better not just for read but even for insertion (shifting in vectors) of elements up to the order of 500K elements for some particular architecture as demonstrated in Figure 3 of this article by Bjarne Stroustrup:
http://www2.research.att.com/~bs/Computer-Jan12.pdf
(Publisher site: http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2011.353)
I think that if this is a critical factor for your program, you should profile it on your architecture.

Not sure if I can explain it right but here's my view(i'm thinking along the lines of translated machine instruction below:),
Vector iterator(contiguous memory):
When you increment a vector iterator, the iterator value is simply added the size of the object(known at compile time) to point to the next object. In most CPUs this is anything from one to three instructions at most.
List iterator(linked list http://www.sgi.com/tech/stl/List.html):
When you increment a list iterator(the pointed object), the location of the forward link is located by adding some number to the base of the object pointed and then loaded up as the new value of the iterator. There is more than one memory access for this and is slower than the vector iteration operation.

Related

Is every element access in std::vector a cache miss?

It's known that std::vector hold its data on the heap so the instance of the vector itself and the first element have different addresses. On the other hand, std::array is a lightweight wrapper around a raw array and its address is equal to the first element's address.
Let's assume that the sizes of collections is big enough to hold one cache line of int32. On my machine with 384kB L1 cache it's 98304 numbers.
If I iterate the std::vector it turns out that I always access first the address of the vector itself and next access element's address. And probably this addresses are not in the same cache line. So every element access is a cache miss.
But if I iterate std::array addresses are in the same cache line so it should be faster.
I tested with VS2013 with full optimization and std::array is approx 20% faster.
Am I right in my assumptions?
Update: in order to not create the second similar topic. In this code I have an array and some local variable:
void test(array<int, 10>& arr)
{
int m{ 42 };
for (int i{ 0 }; i < arr.size(); ++i)
{
arr[i] = i * m;
}
}
In the loop I'm accessing both an array and a stack variable which are placed far from each other in memory. Does that mean that every iteration I'll access different memory and miss the cache?
Many of the things you've said are correct, but I do not believe that you're seeing cache misses at the rate that you believe you are. Rather, I think you're seeing other effects of compiler optimizations.
You are right that when you look up an element in a std::vector, that there are two memory reads: first, a memory read for the pointer to the elements; second, a memory read for the element itself. However, if you do multiple sequential reads on the std::vector, then chances are that the very first read you do will have a cache miss on the elements, but all successive reads will either be in cache or be unavoidable. Memory caches are optimized for locality of reference, so whenever a single address is pulled into cache a large number of adjacent memory addresses are pulled into the cache as well. As a result, if you iterate over the elements of a std::vector, most of the time you won't have any cache misses at all. The performance should look quite similar to that for a regular array. It's also worth remembering that the cache stores multiple different memory locations, not just one, so the fact that you're reading both something on the stack (the std::vector internal pointer) and something in the heap (the elements), or two different elements on the stack, won't immediately cause a cache miss.
Something to keep in mind is that cache misses are extremely expensive compared to cache hits - often 10x slower - so if you were indeed seeing a cache miss on each element of the std::vector you wouldn't see a gap of only 20% in performance. You'd see something a lot closer to a 2x or greater performance gap.
So why, then, are you seeing a difference in performance? One big factor that you haven't yet accounted for is that if you use a std::array<int, 10>, then the compiler can tell at compile-time that the array has exactly ten elements in it and can unroll or otherwise optimize the loop you have to eliminate unnecessary checks. In fact, the compiler could in principle replace the loop with 10 sequential blocks of code that all write to a specific array element, which might be a lot faster than repeatedly branching backwards in the loop. On the other hand, with equivalent code that uses std::vector, the compiler can't always know in advance how many times the loop will run, so chances are it can't generate code that's as good as the code it generated for the array.
Then there's the fact that the code you've written here is so small that any attempt to time it is going to have a ton of noise. It would be difficult to assess how fast this is reliably, since something as simple as just putting it into a for loop would mess up the cache behavior compared to a "cold" run of the method.
Overall, I wouldn't attribute this to cache misses, since I doubt there's any appreciably different number of them. Rather, I think this is compiler optimization on arrays whose sizes are known statically compared with optimization on std::vectors whose sizes can only be known dynamically.
I think it has nothing to do with cache miss.
You can take std::array as a wrapper of raw array, i.e. int arr[10], while vector as a wrapper of dynamic array, i.e. new int[10]. They should have the same performance. However, when you access vector, you operate on the dynamic array through pointers. Normally the compiler might optimize code with array better than code with pointers. And that might be the reason you get the test result: std::array is faster.
You can have a test that replacing std::array with int arr[10]. Although std::array is just a wrapper of int arr[10], you might get even better performance (in some case, the compiler can do better optimization with raw array). You can also have another test that replacing vector with new int[10], they should have equal performance.
For your second question, the local variable, i.e. m, will be saved in register (if optimized properly), and there will be no access to the memory location of m during the for loop. So it won't be a problem of cache miss either.

Bjarne Stroustrup says we must avoid linked lists

I saw this video on YouTube: https://www.youtube.com/watch?v=YQs6IC-vgmo
in which Bjarne says it is better to use vectors, rather than linked lists. I am unable to grasp the entire thing, so could anyone explain what he is saying in layman's terms?
P.S: I am an high school student and can easily handle linked lists, but I am struggling to learn vectors on my own. Could you suggest any sources to learn vectors?
Advantages of vector vs. linked list
The main advantage of vector versus linked lists is memory locality.
Usually, each element is allocated seperately in a linked list. As a consequence those elements probably are not next to each other in memory.
(Gaps between the elements in memory.)
A vector is guaranteed to store all contained elements contiguously. (Items next to each other, no gaps;)
Note: Oversimplifications may occur... ;)
Imo, the simplified key points about the superior performance of a contiguously stored data storage pattern versus non-contiguous storage are
1. cache misses
Modern CPUs do not fetch single bytes from memory but slightly larger chunks. Thus, if your data objects size is less than the size of those chunks and if storage is contiguous, you can get more than one element at a time because multiple elements may be in a single chunk.
Example:
A 64byte block (usual cache line size) fits sixteen 32bit integers at a time. Therefore, a cache miss (data not in cache yet -> load from main memory required) occurs at the earliest after processing 16 elements from the moment first one has been brought to cache. If a linked list is used, the first element might well be the only one within a 64byte block. It might in theory happen that a cache miss occurs for every single element of the list.
Concrete example:
std::vector<std::uint32_t> v;
// somehow fill 64 values into v
std::uint32_t s{};
for(std::size_t i{0}; i<v.size(); ++i)
{
s += v[i];
}
Imagine the contents of v not being cached yet.
What happens during the processing of the data in the for loop?
1)Check whether element v[0] is in cache. --> Nope
2)Fetch 64 bytes starting at the address of v[0] from main memory into a cache line
3)Load v[0] from cache and process by adding its value to s
4)Is element v1 in cache? --> Yes loaded with previous fetch because neighbouring v[0]
5)Load v1 from cache and process by adding its value to s
6)Is element v[2] in cache? --> Yes ...
7) Load v[2] from cache and process by adding its value to s
... etc...
34)Is element v[16] in cache? --> Nope
35) Fetch 64 bytes starting at the address of v[16] from main memory into a cache line
36)Load v[16] from cache and process by adding its value to s
37)Is element v[17] in cache? --> Yes loaded with previous fetch because neighbouring v[16]
etc...
Fetching data from main memory into the cache takes much more time than loading data from cache into processor registers and perform simple operations. Therefore, the fact that multiple values may reside on a single cache line can boost performance significantly.
Linked lists do not provide a contiguous storage guarantee and you cannot hope to get this performance boost. This is also the reason why random iteration (accessing elements randomly) performs worse than forward iteration (accessing elements in order) for contiguous containers.
2. prefetching
The above effect is amplified by a CPU feature called "prefetcher".
If a chunk has been loaded from main memory, the prefetcher prepares loading the next chunk / already puts it into cache, significantly reducing the penality of loading stuff from that part of the main memory.
This is of course effective if and only if you in fact need data from the next prepared chunk.
How do vectors usually work (internally)?
See: c++ Vector, what happens whenever it expands/reallocate on stack?
Stroustrup has written an article https://isocpp.org/blog/2014/06/stroustrup-lists that says he has been misunderstood and isn't saying to avoid linked lists.
I can't comment on C++ implementations of vectors and linked lists.. But, you say you understand linked lists and not vectors. I can say that in Java, people understood vectors but not necessarily linked lists. C# has a List data type, and most people don't really look into whether that's implemented as a linked list or as a vector. Here is a good discussion on the differences in terms of usage. https://www.reddit.com/r/learnprogramming/comments/15mxrt/what_are_the_real_world_applications_of_linked/ One comment says "Data stored in a Linked List, once allocated in memory, will stay in the same spot. That means as your linked list changes in size, the data of any elements does not move (in memory), so it can be safely pointed at."
Straustrup's article says "And, yes, my recomendation is to use std::vector by default. More generally, use a contiguous representation unless there is a good reason not to. Like C, C++ is designed to do that by default. Also, please don’t make statements about performance without measurements. "
I have seen Stroustrup in an interview say that deleting an element and shifting them all up is really fast, and faster than deleting an element from a linked list. But I suppose one shouldn't conclude from that that he's saying linked lists have no use case.

Helps a vector for cache locality? (C++)

Last week I have read about great concepts as cache locality and pipelining in a cpu. Although these concepts are easy to understand I have two questions. Suppose one can choose between a vector of objects or a vector of pointers to objects (as in this question).
Then an argument for using pointers is that shufling larger objects may be expensive. However, I'm not able to find when I should call an object large. Is an object of several bytes already large?
An argument against the pointers is the loss of cache locality. Will it help if one uses two vectors where the first one contains the objects and will not be reordered and the second one contains pointers to these objects? Say that we have a vector of 200 objects and create a vector with pointers to these objects and then randomly shuffle the last vector. Is the cache locality then lost if we loop over the vector with pointers?
This last scenario happens a lot in my programs where I have City objects and then have around 200 vectors of pointers to these Cities. To avoid having 200 instances of each City I use a vector of pointers instead of a vector of Cities.
There is no simple answer to this question. You need to understand how your system interacts with regards to memory, what operations you do on the container, and which of those operations are "important". But by understanding the concepts and what affects what, you can get a better understanding of how things work. So here's some "discussion" on the subject.
"Cache locality" is largely about "keeping things in the cache". In other words, if you look at A, then B, and A is located close to B, they are probably getting loaded into the cache together.
If objects are large enough that they fill one or more cache-lines (modern CPU's have cache-lines of 64-128 bytes, mobile ones are sometimes smaller), the "next object in line" will not be in the cache anyways [1], so the cache-locality of the "next element in the vector" is less important. The smaller the object is, the more effect of this you get - assuming you are accessing objects in the order they are stored. If you pick a random number, then other factors start to become important [2], and the cache locality is much less important.
On the other other hand, as objects get larger, moving them within the vector (including growing, removing, inserting, as well as "random shuffle") will be more time consuming, as copying more data gets more extensive.
Of course, one further step is always needed to read from a pointer vs. reading an element directly in a vector, since the pointer itself needs to be "read" before we can get to the actual data in the pointee object. Again, this becomes more important when random-accessing things.
I always start with "whatever is simplest" (which depends on the overall construct of the code, e.g. sometimes it's easier to create a vector of pointers because you have to dynamically create the objects in the first place). Most of the code in a system is not performance critical anyway, so why worry about it's performance - just get it working and leave it be if it doesn't turn up in your performance measurements.
Of course, also, if you are doing a lot of movement of objects in a container, maybe vector isn't the best container. That's why there are multiple container variants - vector, list, map, tree, deque - as they have different characteristics with regards to their access and insert/remove as well as characteristics for linearly walking the data.
Oh, and in your example, you talk of 200 city objects - well, they are probably going to all fit in the cache of any modern CPU anyways. So stick them in a vector. Unless a city contains a list of every individual living in the city... But that probably should be a vector (or other container object) in itself.
As an experiment, make a program that does the same operations on a std::vector<int> and std::vector<int*> [such as filling with random numbers, then sorting the elements], then make an object that is large [stick some array of integers in there, or some such], with one integer so that you can do the very same operations on that. Vary the size of the object stored, and see how it behaves. On YOUR system, where is the benefit of having pointers, over having plain objects. Of course, also vary the number of elements, to see what effect that has.
[1] Well, modern processors use cache-prefetching, which MAY load "next data" into the cache speculatively, but we certainly can't rely on this.
[2] An extreme case of this is a telephone exchange with a large number of subscribers (millions). When placing a call, the caller and callee are looked up in a table. But the chance of either caller or callee being in the cache is nearly zero, because (assuming we're dealing with a large city, say London) the number of calls placed and received every second is quite large. So caches become useless, and it gets worse, because the processor also caches the page-table entries, and they are also, most likely, out of date. For these sort of applications, the CPU designers have "huge pages", which means that the memory is split into 1GB pages instead of the usual 4K or 2MB pages that have been around for a while. This reduces the amount of memory reading needed before "we get to the right place". Of course, the same applies to various other "large database, unpredictable pattern" - airlines, facebook, stackoverflow all have these sort of problems.

Why is deque using so much more RAM than vector in C++?

I have a problem I am working on where I need to use some sort of 2 dimensional array. The array is fixed width (four columns), but I need to create extra rows on the fly.
To do this, I have been using vectors of vectors, and I have been using some nested loops that contain this:
array.push_back(vector<float>(4));
array[n][0] = a;
array[n][1] = b;
array[n][2] = c;
array[n][3] = d;
n++
to add the rows and their contents. The trouble is that I appear to be running out of memory with the number of elements I was trying to create, so I reduced the number that I was using. But then I started reading about deque, and thought it would allow me to use more memory because it doesn't have to be contiguous. I changed all mentions of "vector" to "deque", in this loop, as well as all declarations. But then it appeared that I ran out of memory again, this time with even with the reduced number of rows.
I looked at how much memory my code is using, and when I am using deque, the memory rises steadily to above 2GB, and the program closes soon after, even when using the smaller number of rows. I'm not sure exactly where in this loop it is when it runs out of memory.
When I use vectors, the memory usage (for the same number of rows) is still under 1GB, even when the loop exits. It then goes on to a similar loop where more rows are added, still only reaching about 1.4GB.
So my question is. Is this normal for deque to use more than twice the memory of vector, or am I making an erroneous assumption in thinking I can just replace the word "vector" with "deque" in the declarations/initializations and the above code?
Thanks in advance.
I'm using:
MS Visual C++ 2010 (32-bit)
Windows 7 (64-bit)
The real answer here has little to do with the core data structure. The answer is that MSVC's implementation of std::deque is especially awful and degenerates to an array of pointers to individual elements, rather than the array of arrays it should be. Frankly, only twice the memory use of vector is surprising. If you had a better implementation of deque you'd get better results.
It all depends on the internal implementation of deque (I won't speak about vector since it is relatively straightforward).
Fact is, deque has completely different guarantees than vector (the most important one being that it supports O(1) insertion at both ends while vector only supports O(1) insertion at the back). This in turn means the internal structures managed by deque have to be more complex than vector.
To allow that, a typical deque implementation will split its memory in several non-contiguous blocks. But each individual memory block has a fixed overhead to allow the memory management to work (eg. whatever the size of the block, the system may need another 16 or 32 bytes or whatever in addition, just for bookkeeping). Since, contrary to a vector, a deque requires many small, independent blocks, the overhead stacks up which can explain the difference you see. Also note that those individual memory blocks need to be managed (maybe in separate structures?), which probably means some (or a lot of) additional overhead too.
As for a way to solve your problem, you could try what #BasileStarynkevitch suggested in the comments, this will indeed reduce your memory usage but it will get you only so far because at some point you'll still run out of memory. And what if you try to run your program on a machine that only has 256MB RAM? Any other solution which goal is to reduce your memory footprint while still trying to keep all your data in memory will suffer from the same problems.
A proper solution when handling large datasets like yours would be to adapt your algorithms and data structures in order to be able to handle small partitions at a time of your whole dataset, and load/save those partitions as needed in order to make room for the other partitions. Unfortunately since it probably means disk access, it also means a big drop in performance but hey, you can't eat the cake and have it too.
Theory
There two common ways to efficiently implement a deque: either with a modified dynamic array or with a doubly linked list.
The modified dynamic array uses is basically a dynamic array that can grow from both ends, sometimes called array deques. These array deques have all the properties of a dynamic array, such as constant-time random access, good locality of reference, and inefficient insertion/removal in the middle, with the addition of amortized constant-time insertion/removal at both ends, instead of just one end.
There are several implementations of modified dynamic array:
Allocating deque contents from the center of the underlying array,
and resizing the underlying array when either end is reached. This
approach may require more frequent resizings and waste more space,
particularly when elements are only inserted at one end.
Storing deque contents in a circular buffer, and only resizing when
the buffer becomes full. This decreases the frequency of resizings.
Storing contents in multiple smaller arrays, allocating additional
arrays at the beginning or end as needed. Indexing is implemented by
keeping a dynamic array containing pointers to each of the smaller
arrays.
Conclusion
Different libraries may implement deques in different ways, but generally as a modified dynamic array. Most likely your standard library uses the approach #1 to implement std::deque, and since you append elements only from one end, you ultimately waste a lot of space. For that reason, it makes an illusion that std::deque takes up more space than usual std::vector.
Furthermore, if std::deque would be implemented as doubly-linked list, that would result in a waste of space too since each element would need to accommodate 2 pointers in addition to your custom data.
Implementation with approach #3 (modified dynamic array approach too) would again result in a waste of space to accommodate additional metadata such as pointers to all those small arrays.
In any case, std::deque is less efficient in terms of storage than plain old std::vector. Without knowing what do you want to achieve I cannot confidently suggest which data structure do you need. However, it seems like you don't even know what deques are for, therefore, what you really want in your situation is std::vector. Deques, in general, have different application.
Deque can have additional memory overhead over vector because it's made of a few blocks instead of contiguous one.
From en.cppreference.com/w/cpp/container/deque:
As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays.
The primary issue is running out of memory.
So, do you need all the data in memory at once?
You may never be able to accomplish this.
Partial Processing
You may want to consider processing the data into "chunks" or smaller sub-matrices. For example, using the standard rectangular grid:
Read data of first quadrant.
Process data of first quandrant.
Store results (in a file) of first quandrant.
Repeat for remaining quandrants.
Searching
If you are searching for a particle or a set of datum, you can do that without reading in the entire data set into memory.
Allocate a block (array) of memory.
Read a portion of the data into this block of memory.
Search the block of data.
Repeat steps 2 and 3 until the data is found.
Streaming Data
If your application is receiving the raw data from an input source (other than a file), you will want to store the data for later processing.
This will require more than one buffer and is more efficient using at least two threads of execution.
The Reading Thread will be reading data into a buffer until the buffer is full. When the buffer is full, it will read data into another empty one.
The Writing Thread will initially wait until either the first read buffer is full or the read operation is finished. Next, the Writing Thread takes data out of the read buffer and writes to a file. The Write Thread then starts writing from the next read buffer.
This technique is called Double Buffering or Multiple Buffering.
Sparse Data
If there is a lot of zero or unused data in the matrix, you should try using Sparse Matrices. Essentially, this is a list of structures that hold the data's coordinates and the value. This also works when most of the data is a common value other than zero. This saves a lot of memory space; but costs a little bit more execution time.
Data Compression
You could also change your algorithms to use data compression. The idea here is to store the data location, value and the number or contiguous equal values (a.k.a. runs). So instead of storing 100 consecutive data points of the same value, you would store the starting position (of the run), the value, and 100 as the quantity. This saves a lot of space, but requires more processing time when accessing the data.
Memory Mapped File
There are libraries that can treat a file as memory. Essentially, they read in a "page" of the file into memory. When the requests go out of the "page", they read in another page. All this is performed "behind the scenes". All you need to do is treat the file like memory.
Summary
Arrays and deques are not your primary issue, quantity of data is. Your primary issue can be resolved by processing small pieces of data at a time, compressing the data storage, or treating the data in the file as memory. If you are trying to process streaming data, don't. Ideally, streaming data should be placed into a file and then processed later.
A historical purpose of a file is to contain data that doesn't fit into memory.

Linked list vs dynamic array for implementing a stack using vector class

I was reading up on the two different ways of implementing a stack: linked list and dynamic arrays. The main advantage of a linked list over a dynamic array was that the linked list did not have to be resized while a dynamic array had to be resized if too many elements were inserted hence wasting alot of time and memory.
That got me wondering if this is true for C++ (as there is a vector class which automatically resizes whenever new elements are inserted)?
It's difficult to compare the two, because the patterns of their memory usage are quite different.
Vector resizing
A vector resizes itself dynamically as needed. It does that by allocating a new chunk of memory, moving (or copying) data from the old chunk to the new chunk, the releasing the old one. In a typical case, the new chunk is 1.5x the size of the old (contrary to popular belief, 2x seems to be quite unusual in practice). That means for a short time while reallocating, it needs memory equal to roughly 2.5x as much as the data you're actually storing. The rest of the time, the "chunk" that's in use is a minimum of 2/3rds full, and a maximum of completely full. If all sizes are equally likely, we can expect it to average about 5/6ths full. Looking at it from the other direction, we can expect about 1/6th, or about 17% of the space to be "wasted" at any given time.
When we do resize by a constant factor like that (rather than, for example, always adding a specific size of chunk, such as growing in 4Kb increments) we get what's called amortized constant time addition. In other words, as the array grows, resizing happens exponentially less often. The average number of times items in the array have been copied tends to a constant (usually around 3, but depends on the growth factor you use).
linked list allocations
Using a linked list, the situation is rather different. We never see resizing, so we don't see extra time or memory usage for some insertions. At the same time, we do see extra time and memory used essentially all the time. In particular, each node in the linked list needs to contain a pointer to the next node. Depending on the size of the data in the node compared to the size of a pointer, this can lead to significant overhead. For example, let's assume you need a stack of ints. In a typical case where an int is the same size as a pointer, that's going to mean 50% overhead -- all the time. It's increasingly common for a pointer to be larger than an int; twice the size is fairly common (64-bit pointer, 32-bit int). In such a case, you have ~67% overhead -- i.e., obviously enough, each node devoting twice as much space to the pointer as the data being stored.
Unfortunately, that's often just the tip of the iceberg. In a typical linked list, each node is dynamically allocated individually. At least if you're storing small data items (such as int) the memory allocated for a node may be (usually will be) even larger than the amount you actually request. So -- you ask for 12 bytes of memory to hold an int and a pointer -- but the chunk of memory you get is likely to be rounded up to 16 or 32 bytes instead. Now you're looking at overhead of at least 75% and quite possibly ~88%.
As far as speed goes, the situation is rather similar: allocating and freeing memory dynamically is often quite slow. The heap manager typically has blocks of free memory, and has to spend time searching through them to find the block that's most suited to the size you're asking for. Then it (typically) has to split that block into two pieces, one to satisfy your allocation, and another of the remaining memory it can use to satisfy other allocations. Likewise, when you free memory, it typically goes back to that same list of free blocks and checks whether there's an adjoining block of memory already free, so it can join the two back together.
Allocating and managing lots of blocks of memory is expensive.
cache usage
Finally, with recent processors we run into another important factor: cache usage. In the case of a vector, we have all the data right next to each other. Then, after the end of the part of the vector that's in use, we have some empty memory. This leads to excellent cache usage -- the data we're using gets cached; the data we're not using has little or no effect on the cache at all.
With a linked list, the pointers (and probable overhead in each node) are distributed throughout our list. I.e., each piece of data we care about has, right next to it, the overhead of the pointer, and the empty space allocated to the node that we're not using. In short, the effective size of the cache is reduced by about the same factor as the overall overhead of each node in the list -- i.e., we might easily see only 1/8th of the cache storing the date we care about, and 7/8ths devoted to storing pointers and/or pure garbage.
Summary
A linked list can work well when you have a relatively small number of nodes, each of which is individually quite large. If (as is more typical for a stack) you're dealing with a relatively large number of items, each of which is individually quite small, you're much less likely to see a savings in time or memory usage. Quite the contrary, for such cases, a linked list is much more likely to basically waste a great deal of both time and memory.
Yes, what you say is true for C++. For this reason, the default container inside std::stack, which is the standard stack class in C++, is neither a vector nor a linked list, but a double ended queue (a deque). This has nearly all the advantages of a vector, but it resizes much better.
Basically, an std::deque is a linked list of arrays of sorts internally. This way, when it needs to resize, it just adds another array.
First, the performance trade-offs between linked-lists and dynamic arrays are a lot more subtle than that.
The vector class in C++ is, by requirement, implemented as a "dynamic array", meaning that it must have an amortized-constant cost for inserting elements into it. How this is done is usually by increasing the "capacity" of the array in a geometric manner, that is, you double the capacity whenever you run out (or come close to running out). In the end, this means that a reallocation operation (allocating a new chunk of memory and copying the current content into it) is only going to happen on a few occasions. In practice, this means that the overhead for the reallocations only shows up on performance graphs as little spikes at logarithmic intervals. This is what it means to have "amortized-constant" cost, because once you neglect those little spikes, the cost of the insert operations is essentially constant (and trivial, in this case).
In a linked-list implementation, you don't have the overhead of reallocations, however, you do have the overhead of allocating each new element on freestore (dynamic memory). So, the overhead is a bit more regular (not spiked, which can be needed sometimes), but could be more significant than using a dynamic array, especially if the elements are rather inexpensive to copy (small in size, and simple object). In my opinion, linked-lists are only recommended for objects that are really expensive to copy (or move). But at the end of the day, this is something you need to test in any given situation.
Finally, it is important to point out that locality of reference is often the determining factor for any application that makes extensive use and traversal of the elements. When using a dynamic array, the elements are packed together in memory one after the other and doing an in-order traversal is very efficient as the CPU can preemptively cache the memory ahead of the reading / writing operations. In a vanilla linked-list implementation, the jumps from one element to the next generally involves a rather erratic jumps between wildly different memory locations, which effectively disables this "pre-fetching" behavior. So, unless the individual elements of the list are very big and operations on them are typically very long to execute, this lack of pre-fetching when using a linked-list will be the dominant performance problem.
As you can guess, I rarely use a linked-list (std::list), as the number of advantageous applications are few and far between. Very often, for large and expensive-to-copy objects, it is often preferable to simply use a vector of pointers (you get basically the same performance advantages (and disadvantages) as a linked list, but with less memory usage (for linking pointers) and you get random-access capabilities if you need it).
The main case that I can think of, where a linked-list wins over a dynamic array (or a segmented dynamic array like std::deque) is when you need to frequently insert elements in the middle (not at either ends). However, such situations usually arise when you are keeping a sorted (or ordered, in some way) set of elements, in which case, you would use a tree structure to store the elements (e.g., a binary search tree (BST)), not a linked-list. And often, such trees store their nodes (elements) using a semi-contiguous memory layout (e.g., a breadth-first layout) within a dynamic array or segmented dynamic array (e.g., a cache-oblivious dynamic array).
Yes, it's true for C++ or any other language. Dynamic array is a concept. The fact that C++ has vector doesn't change the theory. The vector in C++ actually does the resizing internally so this task isn't the developers' responsibility. The actual cost doesn't magically disappear when using vector, it's simply offloaded to the standard library implementation.
std::vector is implemented using a dynamic array, whereas std::list is implemented as a linked list. There are trade-offs for using both data structures. Pick the one that best suits your needs.
As you indicated, a dynamic array can take a larger amount of time adding an item if it gets full, as it has to expand itself. However, it is faster to access since all of its members are grouped together in memory. This tight grouping also usually makes it more cache-friendly.
Linked lists don't need to resize ever, but traversing them takes longer as the CPU must jump around in memory.
That got me wondering if this is true for c++ as there is a vector class which automatically resizes whenever new elements are inserted.
Yes, it still holds, because a vector resize is a potentially expensive operation. Internally, if the pre-allocated size for the vector is reached and you attempt to add new elements, a new allocation takes place and the old data is moved to the new memory location.
From the C++ documentation:
vector::push_back - Add element at the end
Adds a new element at the end of the vector, after its current last element. The content of val is copied (or moved) to the new element.
This effectively increases the container size by one, which causes an automatic reallocation of the allocated storage space if -and only if- the new vector size surpasses the current vector capacity.
http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style
Skip to 44:40. You should prefer std::vector whenever possible over a std::list, as explained in the video, by Bjarne himself. Since std::vector stores all of it's elements next to each other, in memory, and because of that it will have the advantage of being cached in memory. And this is true for adding and removing elements from std::vector and also searching. He states that std::list is 50-100x slower than a std::vector.
If you really want a stack, you should really use std::stack instead of making your own.