List of arrays vs list [closed] - c++

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
A list uses a lot of memory because it adds a pointer to each node, and because its nodes are not contiguous, the memory ends up fragmented. In my opinion a list of arrays is much better. For example, if I am managing 100 objects, a list of 5 arrays of 20 is much better than a list of 100: only 5 pointers are added instead of 100, we gain locality while working within the same array, and we have less fragmentation.
I did some research but couldn't find any interesting article on this, so I suspect I am missing something.
What is the benefit of using a list over a list of arrays?
EDIT: This is definitely not array vs list. It is more like: why put only one element in each list node if it's possible to put more?

I think this is a valid question, as memory locality can affect the performance of your program. You can try std::deque, as some suggested. However, the claim that a typical implementation stores its elements in chunks of memory describes common implementations; the standard does not require it, so it is not guaranteed.
In C++, you can improve the locality of your data through custom allocators and memory pools. Almost every standard container (std::array being the exception) takes an allocator as a template parameter. The default allocator is typically a thin wrapper around operator new and operator delete, but you can supply your own allocator that draws from a memory pool. You can find more about allocators here and here is a link to the C++ default allocator. Here is a Dr. Dobb's article about this topic. A quote from the article:
If you're using node-based containers (such as maps, sets, or lists), allocators optimized for smaller objects are probably a better way to go.
Only profiling will tell you what works best for your case. Maybe the combination of std::deque and a custom allocator will be the best option.
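As a rough illustration of the memory-pool idea, here is a minimal sketch using the C++17 polymorphic memory resources. It assumes a C++17 compiler; the function name sum_with_pool and the 4096-byte buffer size are made up for the example. The list nodes are carved out of one stack buffer instead of being allocated one by one from the heap:

```cpp
#include <cstddef>
#include <list>
#include <memory_resource>
#include <numeric>

// Hypothetical helper: fills a list whose nodes come from a stack buffer
// and returns the sum 1 + 2 + ... + n.
int sum_with_pool(int n) {
    std::byte buffer[4096];  // arbitrary size chosen for this sketch
    // Hands out memory from `buffer` sequentially; if the buffer runs out,
    // it falls back to the default (heap) resource.
    std::pmr::monotonic_buffer_resource pool(buffer, sizeof buffer);
    std::pmr::list<int> items(&pool);  // node allocations go through `pool`
    for (int i = 1; i <= n; ++i) items.push_back(i);
    return std::accumulate(items.begin(), items.end(), 0);
}
```

Whether this actually beats the default allocator for your workload is something only profiling can show.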

Some operations have guaranteed constant-time complexity with std::list and not with std::vector or std::deque. For example, with std::list you can move a subsequence ("splice") of 5000 consecutive elements from the middle of a list with a million items to the beginning (of the same list only!) in constant time. No way to do that with either std::vector or std::deque, and even for your list-of-arrays, it will be a fair bit more complicated.
This is a good read: Complexity of std::list::splice and other list containers
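To make the splice point concrete, here is a small sketch (move_range_to_front is a hypothetical helper, not a standard function). Locating the iterators is linear, but the splice call itself only relinks nodes:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical helper: moves the elements at positions [first_index, last_index)
// to the front of `items`, keeping their relative order.
void move_range_to_front(std::list<int>& items,
                         std::size_t first_index, std::size_t last_index) {
    assert(first_index <= last_index && last_index <= items.size());
    // A range that already starts at the front needs no work, and splicing
    // it before begin() would put the target position inside the range.
    if (first_index == 0 || first_index == last_index) return;
    auto first = std::next(items.begin(), static_cast<std::ptrdiff_t>(first_index));
    auto last = std::next(first, static_cast<std::ptrdiff_t>(last_index - first_index));
    // Same-list range splice: no element is copied or moved, only links change.
    items.splice(items.begin(), items, first, last);
}
```

For example, calling it with the range 2..4 on the list 0 1 2 3 4 5 yields 2 3 0 1 4 5.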

Related

Should I use linked lists or arrays when sorting 100 million elements? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I want to implement algorithms like quicksort, mergesort, etc., while working with big data files of around 100 million elements. I can't use std::vector, std::sort, or anything like that, since this is a school assignment meant to teach these specific algorithms. I can only use things that I write myself.
Should I implement the sorting algorithms using linked lists or arrays? Which of the two is more efficient when working with big data? What are the advantages of each?
If the number of elements is large, the better option would be an array (or any type that has contiguous storage, e.g. a std::vector, a buffer allocated with new[], etc.). A linked list usually does not store its nodes in contiguous memory. Contiguous storage leads to better cache friendliness.
In addition to this, a doubly-linked list needs to store a next and a previous pointer for each data item, thus requiring more memory per item. Even a singly-linked list needs a next pointer, so although it has less overhead than a doubly-linked list, it still has more overhead than an array.
Another reason to use an array, unrelated to efficiency, is ease of implementation of the sorting algorithm. It is more difficult to implement a sorting algorithm for a linked list than for an array, especially an algorithm that accesses non-adjacent elements.
Also, please note that std::sort is an algorithm, it is not a container. Thus it can work with regular arrays, std::array, std::vector, std::deque, etc. So comparing std::sort to an array is not a correct comparison.
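A short sketch of that last point, outside the constraints of the assignment (sort_ascending is a made-up name for the example): the same algorithm call works on any range that offers random-access iterators, whatever the container is.

```cpp
#include <algorithm>
#include <array>
#include <deque>
#include <iterator>
#include <vector>

// Hypothetical helper: sorts any range with random-access iterators
// (built-in array, std::array, std::vector, std::deque) in ascending order.
template <typename Range>
void sort_ascending(Range& values) {
    std::sort(std::begin(values), std::end(values));
}
```

It would not compile for std::list, whose iterators are only bidirectional; that container provides its own sort() member instead.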

sparse vector in C++? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
I have some code using the vector class, which I want to reimplement with a class for sparse vectors (i.e. instead of recording every element in an array of the vector's length, including the zeros, it would store only the non-zero elements in a look-up table).
Is there any sparse vector class in C++ that makes use of the same interface that vector does? (that will make refactoring much easier.)
Brendan is right to observe that logically a vector provides a map from index to value. A std::vector accomplishes this mapping with a simple array. But there are other options.
std::unordered_map has average-case O(1) operations and seems like a natural alternative.
std::map has O(log n) operations, but the constants are smaller and there are more years of optimization behind it. In practice it may be faster, depending on your application.
SparseHash has an STL-compatible hashtable implementation that claims to be better than a standard unordered_map.
C++ BTree again offers an STL-compatible map, but one that uses btrees instead of binary trees. They claim significantly improved memory (50-80%) and time.
BitMagic offers an implementation of a sparse bit vector. Think a sparse std::bitset. If this fits your needs it offers really significant improvements over all the other approaches.
Finally the classical approach to a sparse vector is to use two vectors, one for the index and one for the values. So you have an std::vector<uint> ind; and a std::vector<value_type> val;.
None of these have exactly the same interface as a std::vector, but the differences are pretty small and you could easily write a small wrapper. For example, for the map classes you would want to keep track of the size and overload size() to return that number instead of the number of non-empty elements. Indeed, Boost's mapped_vector that Brendan links to does exactly this: it wraps a map-like class in a vector-like interface.
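A minimal sketch of such a wrapper, assuming std::unordered_map as the backing store. The class name SparseVector and its members are invented for illustration; this is not Boost's mapped_vector interface:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical wrapper: an index-to-value map that reports a logical size.
template <typename T>
class SparseVector {
public:
    explicit SparseVector(std::size_t size) : size_(size) {}

    // Logical length, counting the implicit zeros.
    std::size_t size() const { return size_; }

    // Number of entries that are actually stored.
    std::size_t stored() const { return data_.size(); }

    // Absent indices read as a value-initialized T (0 for arithmetic types).
    T get(std::size_t i) const {
        assert(i < size_);
        auto it = data_.find(i);
        return it == data_.end() ? T{} : it->second;
    }

    // Writing a zero erases the entry so the map stays sparse.
    void set(std::size_t i, const T& value) {
        assert(i < size_);
        if (value == T{}) data_.erase(i);
        else data_[i] = value;
    }

private:
    std::size_t size_;
    std::unordered_map<std::size_t, T> data_;
};
```

With this, a SparseVector<double> of logical size 1000000 holding a single non-zero value stores one map entry, while size() still returns 1000000.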
A drop-in replacement that works in all cases is impossible, because a std::vector is in nearly all cases assumed to degenerate into an array (e.g. &vector[0]), and this is often relied upon. Also, most users who are interested in the sparse case want to take advantage of the sparsity, and hence need it exposed. For example, a drop-in sparse vector's iterator would have to iterate over all elements, including the empty ones, which is simply wasteful. The whole point of a sparse structure is to skip all that. If your algorithms can't handle that, then you leave a lot of potential gains on the table.
Boost has a sparse vector. I don't think one exists in the standard library.
std::unordered_map is probably a better choice in the long run though, unless you're already using Boost. The main annoyance in refactoring will be that size() means something different in a map vs. a sparse array. Range-based for loops should make that easier to deal with.

Why would I want to implement my own doubly linked list in C++? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Apart from the academic aspect of learning something from implementing my own doubly linked list in C++, is there any actual real-world advantage of implementing my own doubly linked list when there is already std::list? Can I make things more efficient on my own for a certain task, or has std::list been refined so much over the years that it is the optimal implementation of a doubly linked list in most cases?
is there any actual real-world advantage of implementing my own doubly linked list when there is already std::list?
Probably not.
Can I make things more efficient on my own for a certain task,
Maybe - depends on the task. You might only need a singly linked list, which might end up being faster, for example.
or has std::list been refined so much over the years that it is the optimal implementation of a doubly linked list in most cases?
Probably.
The best answer to all of this stuff is probably "use the standard implementation until it doesn't work, and then figure out what you're going to do about it."
"Why would I want to implement a doubly linked list in C++?"
You would not. Even if you needed a doubly linked list, you would avoid reinventing the wheel and use an existing implementation.
If you have to ask that question, the answer is no.
You would almost never want or need to implement your own list. For that matter, you would almost never want or need to reimplement your own anything if it is covered by the Standard Library.
There are exceptions. Those exceptions are, well... exceptional. Exceptions are made for performance reasons, typically when the performance requirements are extreme. But these exceptions can only be responsibly made when you have already proved that the Standard Library-provided functionality is a measurable problem for your specific use.
Yes, it is possible for a doubly-linked list to be more efficient than std::list for a certain task. There's an algorithmic complexity trade-off in std::list, and your task might benefit from the opposite decision.
Specifically, std::list is required in C++11, and recommended in C++03, to have constant-time size(). That is, the number of elements must be stored in the object.
This means that the 4-argument version of splice() necessarily has linear complexity, because it needs to count the number of elements moved from one list to another in order to update both sizes (except in some special cases, like the source list being empty after the splice, or the two lists being the same object).
However, if you're doing a lot of splicing then you might prefer to have these the other way around -- constant-time splice() and a size() function that is linear-time in the worst case (when a splice has been done on the list more recently than the last time the size was updated).
As it happens, GNU's implementation of std::list made this choice, and had to change for C++11 conformance. So even if this is what you want, you don't necessarily have to completely implement it yourself. Just dust off the old GNU code and change the names so that it's valid user code.
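To show which overload is affected, here is a sketch of a range splice between two different lists (take_front is a hypothetical helper). The nodes are only relinked, but a conforming C++11 implementation also has to keep both stored sizes correct, which in general means counting the spliced range:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical helper: moves the first `count` elements of `source`
// to the end of `target`.
void take_front(std::list<int>& target, std::list<int>& source, std::size_t count) {
    assert(&target != &source && count <= source.size());
    auto last = std::next(source.begin(), static_cast<std::ptrdiff_t>(count));
    // 4-argument splice across two lists: no element is copied, but the
    // library must know how many nodes moved to update both size() values.
    target.splice(target.end(), source, source.begin(), last);
}
```

After the call, size() on both lists is immediately correct, which is exactly the bookkeeping a list with linear-time size() could skip.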

Why prefer std::vector over std::deque? [duplicate]

This question already has answers here:
Why would I prefer using vector to deque
(10 answers)
Closed 3 years ago.
They both have access complexity of O(1) and random insertion/removal complexity of O(n). But vector costs more when expanding because of reallocation and copy, while deque does not have this issue.
It seems deque has better performance, so why do most people use vector instead of deque?
why do most people use vector instead of deque?
Because this is what they have been taught.
vector and deque serve slightly different purposes. They can both be used as a simple container of objects, if that's all you need. When learning to program C++, that is all most people need -- a bucket to drop stuff into, get stuff out of, and walk over.
When StackOverflow is asked a question like "which container should I use by default," the answer is almost invariably vector. The question is generally asked from the context of learning to program in C++, and at the point where a programmer is asking such a question, they don't yet know what they don't know. And there's a lot they don't yet know. So, we (StackOverflow) need a container that fits almost every need for better or worse, can be used in almost any context, and doesn't require that the programmer has asked all the right questions before landing on something approximating the correct answer. Furthermore, the Standard specifically recommends using vector. vector isn't best for all uses, and in fact deque is better than vector for many common uses -- but it's not so much better for a learning programmer that we should vary from the Standard's advice to newbie C++ programmers, so StackOverflow lands on vector.
After having learned the basics of the syntax and, shall we say, the strategies behind programming in C++, programmers split into two branches: those who care to learn more and write better programs, and those who don't. Those who don't will stick with vector forever. I think many programmers fall into this camp.
The rarer programmers who try to move beyond this phase start asking other questions -- questions like you've asked here. They know there is lots they don't yet know, and they want to start discovering what those things are. They will quickly (or less quickly) discover that when choosing between vector and deque, some questions they didn't think to ask before are:
Do I need the memory to be contiguous?
Do I need to avoid lots of reallocations?
Do I need to keep valid iterators after insertions?
Do I need my collection to be compatible with some ancient C-like function?
Then they really start thinking about the code they are writing, discover yet more stuff they don't know, and the beat goes on...
From C++ standard section 23.1.1:
vector is the type of sequence that should be used by default... deque is
the data structure of choice when most insertions and deletions take place
at the beginning or at the end of the sequence.
However there are some arguments in the opposite direction.
In theory vector is at least as efficient as deque, as it provides a subset of its functionality. If your task only needs what vector's interface provides, prefer vector - it cannot be worse than a deque.
But vector costs more when expanding because of reallocation and copy
While it's true that vector sometimes has to reallocate its array as it grows, it will grow exponentially so that the amortised complexity is still O(1). Often, you can avoid reallocations by judicious use of reserve().
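A small sketch to observe this (count_reallocations is a made-up helper): it counts how often the capacity changes while appending n elements, with and without a reserve() call up front.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns how many times the vector's capacity changed
// while pushing n elements, optionally reserving space first.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) v.reserve(n);
    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != capacity) {  // the buffer was replaced by a larger one
            ++reallocations;
            capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set, the count is zero. Without it, the count depends on the implementation's growth factor, but it grows roughly logarithmically with n rather than linearly.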
It seems deque has a better performance
There are many aspects to performance; the time taken by push_back is just one. In some applications, a container might be modified rarely, or populated on start-up and then never modified. In cases like that, iteration and access speed might be more important.
vector is the simplest possible container: a contiguous array. That means that iteration and random access can be achieved by simple pointer arithmetic, and accessing an element can be as fast as dereferencing a pointer.
deque has a further requirement: inserting at either end must not move the existing elements. This means that a typical implementation requires an extra level of indirection - it is generally implemented as something like an array of pointers to arrays. This means that element access requires dereferencing two pointers, which will be slower than one.
Of course, often speed is not a critical issue, and you choose containers according to their behavioural properties rather than their performance. You might choose vector if you need the elements to be contiguous, perhaps to work with an API based on pointers and arrays. You might choose deque or list if you want a guarantee that the elements won't move, so you can store pointers to them.
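Both behavioural properties can be checked directly. The sketch below uses two hypothetical helper functions: one confirms that a pointer to a deque element survives growth at the end, the other that vector elements sit at consecutive addresses.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper: true if a pointer to the first element still refers
// to it after `extra` push_back calls. Insertion at either end of a deque
// invalidates iterators but not pointers or references to elements.
bool deque_front_address_is_stable(std::size_t extra) {
    std::deque<int> d{1};
    const int* first = &d.front();
    for (std::size_t i = 0; i < extra; ++i) d.push_back(static_cast<int>(i));
    return first == &d.front();
}

// Hypothetical helper: true if element i lives exactly i slots after data(),
// which is what makes a vector usable with pointer-and-length APIs.
bool vector_is_contiguous(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v.data() + i != &v[i]) return false;
    return true;
}
```

The same pointer test on a vector may fail, since any push_back that triggers a reallocation moves every element.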
From cplusplus.com:
Therefore they provide a similar functionality as vectors, but with
efficient insertion and deletion of elements also at the beginning of
the sequence, and not only at its end. But, unlike vectors, deques are
not guaranteed to store all its elements in contiguous storage
locations, thus not allowing direct access by offsetting pointers to
elements.
Personally, I prefer using deque (I always end up spoiling myself and having to use push_front for some reason or other), but vector does have its uses/differences, with the main one being:
vector has contiguous memory, while a deque allocates via pages/chunks.
Note, the pages/chunks are fairly useful: they allow constant-time insert/erase at the front of the container. It is also typical that a large block of memory broken up into a series of smaller blocks is more efficient than a single block of memory.
You could also argue that, because deque is 'missing' size reservation methods (capacity/reserve), you have less to worry about.
I highly suggest you read Sutter's GotW on the topic.

Thoughts on how to implement? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 months ago.
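I am porting some very old C code to C++ and I've run across a linked list implemented within an array. The element is a simple structure:
struct element
{
    void *m_ptrData;    // pointer to the payload
    short m_nextEntry;  // presumably the array index of the next element
    short m_prevEntry;  // presumably the array index of the previous element
};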
As an array, there is quick access to the data, if you know the index. The linked list aspect lets the elements be moved around, and "deleted" from the list. Elements can be moved in the list, based on frequency of use (up for MRU and down for LRU).
I would like to find a better way to implement this than using another array. I'd like to use the STL, but I'm not certain which container is best to use.
Does anyone have any thoughts?
Since this is a linked list, you should probably use std::list...
The rule of thumb is that you want to use a linked list when you need to insert elements into random positions in the list, or delete random elements from the list. If you mainly need to add/delete elements to/from the end of the list, then you should use std::vector. If you need to add/delete elements to/from either beginning or the end of the list, then you should use std::deque.
Keep in mind, we are talking about probabilities here. If you need to insert an element into the middle of an std::vector once in a blue moon, that will probably be ok. But if you need to do this all the time, it will have a major impact on performance, because the vector will need to constantly move its elements, and probably reallocate its memory too.
On the other hand, the advantage of using a vector is that its elements are contiguous in memory, which, thanks to caching, greatly improves performance if you simply need to traverse them in order.
Since the data in this list is pointers, why bother with a linked list at all? For small PODs, std::vector is usually the best first bet, and due to the better locality of its data playing nicely with processor caches it often out-performs a linked list even where, in theory, a linked list should be better. I'd pick std::vector until some profiling would show that there is a performance problem and std::list performs better.
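As a sketch of how the "most recently used goes to the front" behaviour from the question could look with a plain std::vector of pointers (touch is a made-up name, and this is only one possible approach): std::rotate shifts the earlier entries back by one slot and places the used entry first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: moves the entry at `index` to the front and shifts
// the entries that were before it back by one position.
void touch(std::vector<void*>& entries, std::size_t index) {
    assert(index < entries.size());
    auto first = entries.begin();
    auto used = first + static_cast<std::ptrdiff_t>(index);
    // Left-rotate [first, used] so that *used becomes the first element.
    std::rotate(first, used, used + 1);
}
```

This is linear in index, but it only moves pointers within one contiguous block, so for short lists it may well beat relinking list nodes. Profiling would have to confirm that for the real workload.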
See here:
http://linuxsoftware.co.nz/cppcontainers.html
There's a flow chart to help you choose the right container at the bottom.