Complexity of Array Operations - c++

One of the topics in my comp sci class is time complexity, using arrays and linked lists to compare certain operations and see which container handles each one better, so you can choose the appropriate data structure.
I understand the reasoning behind most of the operations but I'm unsure about one and that is inserting and appending in an array.
The worst case scenario for both of these is O(n). I believe I understand why inserting is O(n): in the worst case you insert at the front, which forces you to shift all elements one position to the right, so it's linear in the number of elements in the array.
For appending, I was curious why it was not O(1) since it takes one operation no matter the size to add an element at the end, given that there is space.
Is that the issue, if there isn't enough space you have to copy the array to a larger one for its worst case scenario?

[...] if there isn't enough space you have to copy the array to a larger one
for its worst case scenario?
Bingo.
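To make the worst case concrete, here is a minimal sketch of a growable array. The class name IntBuffer and the doubling growth policy are assumptions made for illustration, not how any particular library does it. append reports how many existing elements had to be copied, which is 0 whenever spare space is available and equal to the current size when the buffer is full.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical minimal growable array of int, for illustration only.
class IntBuffer {
public:
    // Appends a value and returns how many existing elements were copied.
    std::size_t append(int value) {
        std::size_t copied = 0;
        if (size_ == capacity_) {
            // No room left: allocate a larger block and copy everything over.
            // Doubling is an assumed policy; other growth factors also work.
            std::size_t new_capacity = capacity_ == 0 ? 1 : capacity_ * 2;
            std::unique_ptr<int[]> bigger(new int[new_capacity]);
            std::copy(data_.get(), data_.get() + size_, bigger.get());
            copied = size_;
            data_ = std::move(bigger);
            capacity_ = new_capacity;
        }
        data_[size_++] = value;  // the O(1) part: one write into spare space
        return copied;
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    int operator[](std::size_t i) const { return data_[i]; }

private:
    std::unique_ptr<int[]> data_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```

Appending five values to an empty IntBuffer copies 0, 1, 2, 0 and 4 elements respectively: most appends are a single write, and the occasional full copy is the O(n) worst case.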

A typical array is a chunk of contiguous memory with a definite size, which is determined either at compile time or at run time. There is no such thing as removing or inserting elements in an array; you can only write into the already-allocated memory.
A linked list is a non-contiguous collection of memory chunks, which are connected by means of their addresses. There is such a thing as removing and inserting elements into a linked list.
The benefits of an array over a linked list are easier traversal and compactness (extra memory to store the address of the next [or previous] element is unnecessary). However, unlike a linked list, this cannot be extended as easily.
Nevertheless, in order for us to more precisely talk about the time complexities of the algorithms inherent to a data structure, we need to first define the data structure.
Doubly linked lists? Do we store the addresses of the first and last elements (like a queue)? Binary trees (which are a type of linked list)?

Related

c++ is "Getting element at index from list" or "Getting element at index of vector", faster [duplicate]

For a simple linked list in which random access to list elements is not a requirement, are there any significant advantages (performance or otherwise) to using std::list instead of std::vector? If backwards traversal is required, would it be more efficient to use std::slist and reverse() the list prior to iterating over its elements?
As usual the best answer to performance questions is to profile both implementations for your use case and see which is faster.
In general, if you insert into the data structure anywhere other than at the end, vector may be slower. Otherwise, in most cases vector is expected to perform better than list, if only because of data locality: when two elements that are adjacent in the data set are also adjacent in memory, the next element will likely already be in the processor's cache and will not have to be fetched from main memory after a cache miss.
Also keep in mind that the space overhead for a vector is constant (3 pointers) while the space overhead for a list is paid for each element, this also reduces the number of full elements (data plus overhead) that can reside in the cache at any one time.
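As a starting point for the "profile both" advice, here is a small timing sketch using std::chrono::steady_clock. The helper name timed_sum is hypothetical, and a real measurement would need optimizations enabled, repeated runs, and data sizes that match your use case.

```cpp
#include <chrono>
#include <list>
#include <numeric>
#include <vector>

// Sums any container of integers and stores the elapsed time, in
// microseconds, in elapsed_us. Hypothetical helper for illustration.
template <class Container>
long long timed_sum(const Container& c, long long& elapsed_us) {
    auto start = std::chrono::steady_clock::now();
    long long total = std::accumulate(c.begin(), c.end(), 0LL);
    auto stop = std::chrono::steady_clock::now();
    elapsed_us = std::chrono::duration_cast<std::chrono::microseconds>(
                     stop - start).count();
    return total;
}
```

Calling timed_sum on a std::vector<int> and on a std::list<int> holding the same values gives two timings to compare. Print or otherwise use the returned sum so the compiler cannot discard the loop.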
Default data structure to think of in C++ is the Vector.
Consider the following points...
1] Traversal:
List nodes are scattered everywhere in memory and hence list traversal leads to cache misses. But traversal of vectors is smooth.
2] Insertion and Deletion:
Average 50% of elements must be shifted when you do that to a Vector but caches are very good at that! But, with lists, you need to traverse to the point of insertion/deletion... so again... cache misses!
And surprisingly vectors win this case as well!
3] Storage:
When you use lists, there are 2 pointers per element (forward & backward), so a List is much bigger than a Vector!
Vectors need just a little more memory than the actual elements need.
You should have a reason not to use a vector.
Reference:
I learned this in a talk of The Lord Bjarne Stroustrup: https://youtu.be/0iWb_qi2-uI?t=2680
Simply no. List has advantages over Vector, but sequential access is not one of them - if that's all you're doing, then a vector is better.
However.. a vector is more expensive to add additional elements than a list, especially if you're inserting in the middle.
Understand how these collections are implemented: a vector is a sequential array of data, a list is an element that contains the data and pointers to the next elements. Once you understand that, you'll understand why lists are good for inserts, and bad for random access.
(so, reverse iteration of a vector is exactly the same as for forward iteration - the iterator just subtracts the size of the data items each time, the list still has to jump to the next item via the pointer)
If you need backwards traversal an slist is unlikely to be the datastructure for you.
A conventional (doubly) linked list gives you constant insertion and deletion time anywhere in the list, provided you already hold an iterator to the position; a vector only gives amortised constant time insertion and deletion at the end. For a vector, insertion and deletion time is linear anywhere other than the end. This isn't the whole story; there are also constant factors. A vector is a simpler data structure that has advantages and disadvantages depending on the context.
The best way to understand this is to understand how they are implemented. A linked list has a next and a previous pointer for each element. A vector has an array of elements addressed by index. From this you can see that both can do efficient forwards and backwards traversal, while only a vector can provide efficient random access. You can also see that the memory overhead of a linked list is per element while for the vector it is constant. And you can also see why insertion time is different between the two structures.
Some rigorous benchmarks on the topic:
http://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html
As has been noted by others, contiguous memory storage means std::vector is better for most things. There is virtually no good reason to use std::list except for small amounts of data (where it can all fit in the cache) and/or where erasure and reinsertion are frequent.
Complexity guarantees do Not relate to real-world performance because of the difference between cache and main memory speeds (200x) and how contiguous memory access affects cache usage. See Chandler Carruth (google) talk about the issue here:
https://www.youtube.com/watch?v=fHNmRkzxHWs
And Mike Acton's Data Oriented Design talk here:
https://www.youtube.com/watch?v=rX0ItVEVjHc
See this question for details about the costs:
What are the complexity Guarantees of the standard containers
If you have an slist and you now want to traverse it in reverse order why not change the type to list everywhere?
std::vector is insanely faster than std::list to find an element
std::vector always performs faster than std::list with very small data
std::vector is always faster to push elements at the back than std::list
std::list handles large elements very well, especially for sorting or inserting in the front
Note: If you want to learn more about performance, I would recommend to see this

What advantages do vectors have over linked lists

I had the following exchange with my professor which wasn't very satisfying. I included my parts of the exchange which should be enough to get my point across.
"For vectors, does the C++ implementation traverse through each element of the old dynamically allocated array and free it?
(Edit: I mean, when resizing and adding elements, either by pushback or resize)
I am especially curious because the book tries to make the case that linked lists are troublesome because of having to traverse each time. Does not seem to me that vectors have a huge advantage in that regard.
The main benefit of vectors that I can see is convenience and fast access, but not much more. As in, every time you try to do something other than accessing, you will be traversing through everything to move and free memory. Is that correct?"
After his reply, I added.
"Professor xxxx,
I went out to test, and in fact, the addresses change if you either resize or push_back, so my assumption that the old addresses are freed is correct. I can only assume that program will have to go to each element to free it, and if that's correct, won't the insertion of new things be costly in terms of time, even more so than traversing linked lists?
Can you kindly correct the following statement If it states any incorrect facts or assumptions. Using vectors in any other way than using arrays (for any other purpose than accessing already stored data), means that linked lists will almost always be faster because unlike linked lists, in vectors not only would you traverse through the elements, you will traverse through them, free them, and then create a whole new array to accommodate new space. That is because the next address after the last element of the current vector could have a pointer variable pointing to it, and using that address will cause an extremely strange behavior that I cannot imagine the poor soul's misery that tries to figure out what had gone wrong."
TL;DR:
Disadvantage of linked lists is traversing, but vector uses (push_back, resize(), etc) most often require the traversing anyway, so how are vectors exactly faster?
There are several things that are faster than you expect:
When a vector reallocates, the original elements are destroyed, not freed, one by one. Their storage is then freed all at once. This is as opposed to a linked list, where each node is allocated and freed individually. But this is somewhat moot because:
A vector batches reallocations. std::vector is specified to have an amortized constant insertion cost, which implies that it avoids reallocating every time you push_back, to the extent that this cost becomes negligible when considering complexity. Typical implementations multiply the vector's capacity by a fixed factor every time it is exceeded, so when performing the costly reallocation it provides room for the next several push_backs. These then do not need to traverse the vector or allocate anything.
A vector is extremely cache-friendly. This makes all sequential operations on a vector blazing fast, and can counter-intuitively outperform a linked list in many cases, especially in long-running applications where memory might become fragmented.
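The batching of reallocations can be observed directly. The sketch below (count_reallocations is a hypothetical helper) counts how often capacity() changes while appending n elements, with and without an up-front reserve(). The exact growth factor is implementation-defined, so the count without reserve varies between standard libraries.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times push_back changed capacity() while appending
// n ints. Hypothetical helper for illustration.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // one allocation up front, none during the loop
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set to true the count is 0. Without it, typical implementations reallocate only a small number of times relative to n, which is why the per-element cost averages out to a constant.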
As a complement of the already given answer and comments...
The std::vector has its elements stored contiguously in memory while it is not the case of a linked list.
The direct consequence is that element access is trivial for a std::vector while it's not for a linked list.
For example, if I want to access the nth element of a linked list, I have to iterate through the list until reaching the desired element.
But on the other hand, the linked list will perform better if we want to insert a new element inside.
Indeed, for a linked list, we have to iterate until we reach the desired position, then we just have to change the connections between the previous and the next node so that the new element is inserted in-between.
For a std::vector, you have to relocate every element after the desired position (and reallocate if needed, i.e. if adding a new element exceeds the reserved available space).
So the std::vector is better for element access but is less efficient when inserting an element inside (same thing for removal).
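The difference in element access looks like this in code. The function names are made up for the example; std::next is the standard way to step an iterator forward, and for a list it walks node by node.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// O(1): the address is computed directly from the index.
int nth_of_vector(const std::vector<int>& v, std::size_t n) {
    return v[n];
}

// O(n): std::next follows n links from the first node.
int nth_of_list(const std::list<int>& l, std::size_t n) {
    return *std::next(l.begin(), static_cast<std::ptrdiff_t>(n));
}
```

Both functions assume n is a valid index; neither performs a bounds check.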

Dynamic array VS linked list in C++ [duplicate]

Why do we need a linked list when we have a dynamic array list?
I have studied static lists and linked lists, and I know about dynamic array lists, but I couldn't work out the exact difference between them.
Can anyone please help me answer this?
A dynamic array is an array that resizes itself up or down depending on the number of elements it holds.
Advantage:
accessing and assignment by index is very fast O(1) process, since internally index access is just [address of first member] + [offset].
appending object (inserting at the end of array) is relatively fast amortized O(1). Same performance characteristic as removing objects at the end of the array. Note: appending and removing objects near the end of array is also known as push and pop.
Disadvantage:
inserting or removing objects at a random position in a dynamic array is slow, O(n), as it must shift (on average) half of the array every time. Insertion and removal near the start of the array are especially poor, as they must move the whole array.
Unpredictable performance when insertion or removal requires resizing
There is a bit of unused space, since dynamic array implementation usually allocates more memory than necessary (since resize is a very slow operation)
A Linked List is an object that has the general structure [head, [tail]], where head is the data and tail is another Linked List. There are many versions of linked list: singly linked, doubly linked, circular, etc.
Advantage:
fast O(1) insertion and removal at any position in the list, as insertion in linked list is only breaking the list, inserting, and repairing it back together (no need to copy the tails)
Linked list is a persistent data structure, rather hard to explain in short sentence, see: wiki-link . This advantage allow tail sharing between two linked list. Tail sharing makes it easy to use linked list as copy-on-write data structure.
Disadvantage:
Slow O(n) index access (random access), since accessing linked list by index means you have to recursively loop over the list.
poor locality: the memory used by a linked list is scattered around in a mess, in contrast with arrays, which use contiguous addresses in memory. Arrays benefit from processor caching since their elements are all near each other
Others:
Due to the nature of linked list, you have to think recursively. Programmers that are not used to recursive functions may have some difficulties in writing algorithms for linked list (or worse they may try to use indexing).
Simply put, when you want to use algorithms that requires random access, forget linked list. When you want to use algorithms that requires heavy insertion and removal, forget arrays.
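The [head, [tail]] structure and the tail sharing mentioned above can be sketched in C++ with std::shared_ptr. The names Node, List, cons and length are assumptions chosen for this example and are not part of any standard container.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// A minimal immutable singly linked list: head is the data,
// tail is another list (or null for the empty list).
struct Node {
    int head;
    std::shared_ptr<const Node> tail;
};
using List = std::shared_ptr<const Node>;

// Builds a new list by putting head in front of an existing tail.
// The tail is shared, not copied.
List cons(int head, List tail) {
    return std::make_shared<Node>(Node{head, std::move(tail)});
}

// O(n): walks the links to count the nodes.
std::size_t length(const List& l) {
    std::size_t n = 0;
    for (const Node* p = l.get(); p != nullptr; p = p->tail.get()) {
        ++n;
    }
    return n;
}
```

Two lists built as cons(1, shared) and cons(9, shared) point at the same tail nodes, so creating each one costs a single node regardless of how long shared is.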
This is taken from the best answer of this question
I am convinced by this answer.
Vector aka Dynamic Array: like a regular array, a contiguous block of memory is used for storing the vector. Whenever more memory is needed and none is available at the current location, the entire array is copied to another location where the extra memory is allocated.
List: Maintain a pointer in each element and that pointer points to the next element.
What are the complexity guarantees of the standard containers?
Look at this link for more information.

C++ vector and list insertion

Does anybody know why inserting an element into the middle of a list is faster
than inserting an element into the middle of a vector?
I prefer to use vector but am told to use list if I can.
Can anybody explain why?
And is it always recommended to use list over vector?
If I take the question verbatim, finding the middle of an array (std::vector) is a simple operation, you divide the length by two and then round up or down to get the index. Finding the middle of a doubly linked list (std::list) requires walking through all elements. Even if you know its size, you still need to walk over half of the elements. Therefore std::vector is faster than std::list, in other words one is O(1) while the other is O(n).
Inserting at a known position requires shuffling the adjacent elements for an array and just linking in another node for a doubly linked list, as others explained here. Therefore, std::list with O(1) is faster than std::vector with O(n).
Together, to insert in the exact middle, we have O(1) + O(n) for the array and O(n) + O(1) for the doubly linked list, making inserting in the middle O(n) for both container types. All this leaves out things like CPU caches and allocator speed though, it just compares the number of "simple" operations. In summary, you need to find out how you use the container. If you really insert/delete at random positions a lot, std::list might be better. If you only do so rarely and then only read the container, a std::vector might be better. If you only have ten elements, all the O(x) is probably worthless anyway and you should go with the one you like best.
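The two halves of the cost (finding the middle, then inserting) can be seen side by side in a short sketch. The overloaded helper insert_middle is hypothetical; the container calls are the standard insert members.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Vector: finding the middle is O(1) arithmetic, but insert() shifts
// every element after the insertion point, which is O(n).
void insert_middle(std::vector<int>& v, int value) {
    auto pos = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    v.insert(pos, value);
}

// List: finding the middle walks half the nodes, which is O(n), but
// insert() only relinks the neighbouring nodes, which is O(1).
void insert_middle(std::list<int>& l, int value) {
    auto pos = std::next(l.begin(), static_cast<std::ptrdiff_t>(l.size() / 2));
    l.insert(pos, value);
}
```

Both overloads produce the same sequence, so which one is faster in practice comes down to element size, container size, and cache behaviour rather than the big-O notation.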
Inserting into the middle of the vector requires all the elements after the insertion point to be shuffled along to make space, potentially involving lots of copying.
The list is implemented as a linked list with each node occupying its own space in memory with references to neighboring nodes, so adding a new node just requires changing 2 references to point to the new node.
Depending on the data type you use, a vector may well perform much faster than a list. But the more complex the object is to copy, the worse a vector gets.
In simple terms, a vector is an array. So, its elements are stored in consecutive memory locations (i.e., one next to the other). The only exception is that a vector allows resizing during run-time, without causing data loss.
Now, to insert to a list, you identify the node, then create the new element (any where in memory), store the value and connect the pointers.
But in the case of the vector (array), you must physically move the elements from one cell to the next in order to create space for the new element. That physical movement is what causes the delay, particularly if many elements (i.e., data) need to be moved. Strictly speaking, you are not physically moving the array cells themselves but rather their contents.
Ulrich Eckhardt's answer is pretty good. I don't have enough reputation to add a comment so I will write an answer myself. Like Ulrich said, the speed of insertion in the middle for both the list and the vector is O(n) in theory. In practice, modern CPUs have a thing called a "prefetcher". It's pretty good at getting contiguous data. Since the vector is contiguous in memory, moving lots of elements is pretty fast because of the prefetcher. You need to be manipulating really, really big vectors in order for them to be slower at inserting than the list. For more details check this awesome blog post:
http://gameprogrammingpatterns.com/data-locality.html

array vs vector vs list

I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:
array - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item using [] is faster.
stl vector - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item is slower than an array since it is a call to operator[].
stl list - insert and delete takes linear time since you need to iterate to a specific position before applying the insert/delete; additional space is needed for pointers; accessing an item is slower than an array since it is a linked list linear traversal.
Right now, my choice is to use an array. Is it justifiable? Or did I miss something?
Which is faster: traversing a list, then inserting a node or shifting items in an array to produce an empty position then inserting the item in that position?
What is the best way to measure this performance? Can I just display the timestamp before and after the operations?
Use STL vector. It provides an equally rich interface as list and removes the pain of managing memory that arrays require.
You will have to try very hard to expose the performance cost of operator[] - it usually gets inlined.
I do not have any number to give you, but I remember reading performance analysis that described how vector<int> was faster than list<int> even for inserts and deletes (under a certain size of course). The truth of the matter is that these processors we use are very fast - and if your vector fits in L2 cache, then it's going to go really really fast. Lists on the other hand have to manage heap objects that will kill your L2.
Premature optimization is the root of all evil.
Based on your post, I'd say there's no reason to make your choice of data structure here a performance based one. Pick whatever is most convenient and return to change it if and only if performance testing demonstrates it's a problem.
It is really worth investing some time in understanding the fundamental differences between lists and vectors.
The most significant difference between the two is the way they store elements and keep track of them.
- Lists -
List contains elements which have the address of a previous and next element stored in them. This means that you can INSERT or DELETE an element anywhere in the list with constant speed O(1) regardless of the list size. You also splice (insert another list) into the existing list anywhere with constant speed as well. The reason is that list only needs to change two pointers (the previous and next) for the element we are inserting into the list.
Lists are not good if you need random access. So if one plans to access nth element in the list - one has to traverse the list one by one - O(n) speed
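The splice operation mentioned above is available as std::list::splice. In this sketch the helper name splice_at is an assumption. Note that reaching the position still costs a linear walk; only the relinking itself is constant time.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Moves every node of extra into target, just before position index.
// No elements are copied; extra is left empty.
void splice_at(std::list<int>& target, std::size_t index,
               std::list<int>& extra) {
    // O(index): walk to the insertion point.
    auto pos = std::next(target.begin(), static_cast<std::ptrdiff_t>(index));
    // O(1): relink the nodes of extra in front of pos.
    target.splice(pos, extra);
}
```

Splicing {3, 4} into {1, 2, 5} at index 2 yields {1, 2, 3, 4, 5}.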
- Vectors -
Vector contains elements in sequence, just like an array. This is very convenient for random access. Accessing the "nth" element in a vector is a simple pointer calculation (O(1) speed). Adding elements to a vector is, however, different. If one wants to add an element in the middle of a vector - all the elements that come after that position have to be shifted down to make room for the new entry. The speed will depend on the vector size and on the position of the new element. The worst case scenario is inserting an element at the front of a vector, the best one is appending a new element. Therefore - insert works with speed O(n), where "n" is the number of elements that need to be moved - not necessarily the size of the vector.
There are other differences that involve memory requirements etc., but understanding these basic principles of how lists and vectors actually work is really worth spending some time on.
As always ... "Premature optimization is the root of all evil" so first consider what is more convenient and make things work exactly the way you want them, then optimize. For 10 entries that you mention - it really does not matter what you use - you will never be able to see any kind of performance difference whatever method you use.
Prefer a std::vector over an array. Some advantages of vector are:
They allocate memory from the free space when increasing in size.
They are NOT a pointer in disguise.
They can increase/decrease in size run-time.
They can do range checking using at().
A vector knows its size, so you don't have to count elements.
The most compelling reason to use a vector is that it frees you from explicit memory management, and it does not leak memory. A vector keeps track of the memory it uses to store its elements. When a vector needs more memory for elements, it allocates more; when a vector goes out of scope, it frees that memory. Therefore, the user need not be concerned with the allocation and deallocation of memory for vector elements.
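Two of the points above, range checking with at() and the vector knowing its own size, can be shown in a few lines. The wrapper try_get is a hypothetical helper; at() throwing std::out_of_range for an invalid index is standard behaviour.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true and writes the element to out if index is valid;
// returns false if at() rejects the index.
bool try_get(const std::vector<int>& v, std::size_t index, int& out) {
    try {
        out = v.at(index);  // checked access, unlike operator[]
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

For a vector of three elements, try_get succeeds for indices 0 to 2 and returns false for index 3, whereas the same out-of-range access on a raw array would be undefined behaviour.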
You're making assumptions you shouldn't be making, such as "accessing an item is slower than an array since it is a call to operator[]." I can understand the logic behind it, but neither you nor I can know until we profile it.
If you do, you'll find there is no overhead at all when optimizations are turned on. The compiler inlines the function calls. There is a difference in memory performance. An array is statically allocated, while a vector dynamically allocates. A list allocates per node, which can thrash the cache if you're not careful.
Some solutions are to have the vector allocate from the stack, and have a pool allocator for a list, so that the nodes can fit into cache.
So rather than worry about unsupported claims, you should worry about making your design as clean as possible. So, which makes more sense? An array, vector, or list? I don't know what you're trying to do so I can't answer you.
The "default" container tends to be a vector. Sometimes an array is perfectly acceptable too.
First a couple of notes:
A good rule of thumb about selecting data structures: Generally, if you examined all the possibilities and determined that an array is your best choice, start over. You did something very wrong.
STL lists don't support operator[], and if they did the reason that it would be slower than indexing an array has nothing to do with the overhead of a function call.
Those things being said, vector is the clear winner here. The call to operator[] is essentially negligible since the contents of a vector are guaranteed to be contiguous in memory. It supports insert() and erase() operations which you would essentially have to write yourself if you used an array. Basically it boils down to the fact that a vector is essentially an upgraded array which already supports all the operations you need.
I am maintaining a fixed-length table of 10 entries. Each item is a
structure of like 4 fields. There will be insert, update and delete
operations, specified by numeric position. I am wondering which is the
best data structure to use to maintain this table of information:
Based on this description it seems like list might be the better choice since it's O(1) when inserting and deleting in the middle of the data structure. Unfortunately you cannot use numeric positions when using lists to do inserts and deletes like you can for arrays/vectors. This dilemma leads to a slew of questions which can be used to make an initial decision about which structure may be best to use. This structure can later be changed if testing clearly shows it's the wrong choice.
The questions you need to ask are threefold. The first is how often you plan to do deletes/inserts in the middle relative to random reads. The second is how important using a numeric position is compared to an iterator. Finally, is order in your structure important?
If the answer to the first question is that random reads will be more prevalent, then a vector/array will probably work well. Note that iterating through a data structure is not considered a random read even if the operator[] notation is used. For the second question, if you absolutely require numeric positions then a vector/array will be required even though this may lead to a performance hit. Later testing may show this performance hit is negligible relative to the easier coding with numerical positions. Finally, if order is unimportant you can insert and delete in a vector/array with an O(1) algorithm. A sample algorithm is shown below.
#include <vector>

// O(1) erase that does not preserve element order.
// Requires 0 <= index < vect.size().
template <class T>
void erase(std::vector<T>& vect, int index) // note: vector cannot be const since you are changing it
{
    vect[index] = vect.back(); // copy the item in the back over the item at index
    vect.pop_back();           // delete the item in the back
}

// O(1) amortized insert that does not preserve element order.
// Requires 0 <= index < vect.size().
template <class T>
void insert(std::vector<T>& vect, int index, T value) // note: vector cannot be const since you are changing it
{
    vect.push_back(vect[index]); // append a copy of the item at index to the back of the vector
    vect[index] = value;         // replace the item at index with value
}
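To see what "order is unimportant" means in practice, here is a self-contained variant of the swap-and-pop idea. The name unordered_erase, the use of std::move, and the guard for the last element are choices made for this sketch and are not part of the answer's code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// O(1) erase that fills the hole with the last element.
// Requires index < v.size(). Element order is not preserved.
template <class T>
void unordered_erase(std::vector<T>& v, std::size_t index) {
    if (index + 1 != v.size()) {
        v[index] = std::move(v.back());  // overwrite the erased slot
    }
    v.pop_back();  // drop the now-redundant last element
}
```

Erasing index 1 from {10, 20, 30, 40} leaves {10, 40, 30}: the element count is right, but 40 has jumped ahead of 30, so this only suits tables where position carries no meaning.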
I believe it depends on your needs. If you need many inserts or deletes at the start or in the middle, use a list (doubly linked internally). If you need random access and mostly add elements at the end, use an array (a vector allocates dynamically, so if you need further operations such as sort, resize, etc., use a vector).