I am porting some very old C code into C++ and I've run across a linked list implemented within an array. The element is a simple structure:
struct element
{
    void *m_ptrData;
    short m_nextEntry;
    short m_prevEntry;
};
As an array, there is quick access to the data, if you know the index. The linked list aspect lets the elements be moved around, and "deleted" from the list. Elements can be moved in the list, based on frequency of use (up for MRU and down for LRU).
I'd like to find a better way to implement this than using another array, preferably with the STL, but I'm not certain which container fits best. Anyone have any thoughts?
Since this is a linked list, you should probably use std::list...
The rule of thumb is that you want to use a linked list when you need to insert elements into random positions in the list, or delete random elements from the list. If you mainly need to add/delete elements to/from the end of the list, then you should use std::vector. If you need to add/delete elements to/from either beginning or the end of the list, then you should use std::deque.
Keep in mind, we are talking about probabilities here. If you need to insert an element into the middle of an std::vector once in a blue moon, that will probably be ok. But if you need to do this all the time, it will have a major impact on performance, because the vector will need to constantly move its elements, and probably reallocate its memory too.
On the other hand, the advantage of using a vector is that its elements are contiguous in memory, which greatly improves performance if you simply need to traverse them in order because of caching.
Since the data in this list is pointers, why bother with a linked list at all? For small PODs, std::vector is usually the best first bet, and because the better locality of its data plays nicely with processor caches, it often out-performs a linked list even where, in theory, a linked list should be better. I'd pick std::vector until profiling shows that there is a performance problem and std::list performs better.
See here:
http://linuxsoftware.co.nz/cppcontainers.html
There's a flow chart to help you choose the right container at the bottom.
I have a list of unsigned shorts that act as local IDs for a database. I was wondering what the most memory-efficient way to store the allowed IDs is. For the lifetime of my project the allowed-ID list will be dynamic, so IDs may become allowed or disallowed as time goes on, ranging from none allowed to all allowed.
What would be the best method to store these? I've considered the following:
1. A list of the allowed IDs
2. A bool vector/array of true/false for each ID
3. A byte array that can be iterated through, similar to 2
Let me know which of these would be best, or if another, better method exists.
Thanks
EDIT: If possible, can a vector have a value put at, say, index 1234 without storing all the previous values, or would this suit a map or similar type more?
I'm looking at using an Arduino with 2 KB of total RAM, using external storage to help manage a large block of data, but I'm exploring what my options are.
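For what it's worth, options 2/3 above can be sketched as a packed bit array; one bit per ID is as dense as a direct-indexed structure gets. The MAX_ID bound here is an assumption for illustration; the real bound would have to come from your database (the full unsigned short range would need 8 KB, which exceeds 2 KB of RAM):

```cpp
#include <cstdint>

// Sketch of the bit-array option: one bit per possible ID.
// MAX_ID is an assumed upper bound; 4096 IDs fit in 512 bytes.
const uint16_t MAX_ID = 4096;
uint8_t allowed[MAX_ID / 8];   // zero-initialized: nothing allowed yet

void setAllowed(uint16_t id, bool on) {
    if (on)
        allowed[id / 8] |= (1u << (id % 8));    // set the ID's bit
    else
        allowed[id / 8] &= ~(1u << (id % 8));   // clear the ID's bit
}

bool isAllowed(uint16_t id) {
    return (allowed[id / 8] >> (id % 8)) & 1u;  // test the ID's bit
}
```

This answers the EDIT as well: the bitmap reserves space for the whole range up front, whereas a std::map would only pay for the IDs actually present, at roughly tens of bytes per node.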
"Best" is opinion-based, unless you are aiming for memory efficiency at the expense of all other considerations. Is that really what you want?
First of all, I hope we're talking <vector> here, not <list> -- because a std::list< short > would be quite wasteful already.
What is the possible value range of those IDs? Do they use the full range of 0..USHRT_MAX, or is there, e.g., a high bit you could use to mark the allowed ones?
If that doesn't work, or you are willing to sacrifice a bit of space (no pun intended) for a somewhat cleaner implementation, go for a vector partitioned into allowed ones first, disallowed second. To check whether a given ID is allowed, find it in the vector and compare its position against the cut-off iterator (which you got from the partitioning). That would be the most memory-efficient standard container solution, and quite close to a memory-optimum solution either way. You would need to re-shuffle and update the cut-off iterator whenever the "allowedness" of an entry changes, though.
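A rough sketch of that partitioned-vector idea (the IdSet name and the cutoff bookkeeping are illustrative, not a standard facility):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// All known IDs live in one vector; ids[0..cutoff) are the allowed ones.
struct IdSet {
    std::vector<uint16_t> ids;
    std::size_t cutoff = 0;

    bool isAllowed(uint16_t id) const {
        auto pos = std::find(ids.begin(), ids.end(), id);
        return pos != ids.end() &&
               static_cast<std::size_t>(pos - ids.begin()) < cutoff;
    }

    // Re-shuffle on an "allowedness" change: swap the ID across the cutoff.
    void setAllowed(uint16_t id, bool allowed) {
        auto pos = std::find(ids.begin(), ids.end(), id);
        if (pos == ids.end()) return;
        std::size_t i = pos - ids.begin();
        if (allowed && i >= cutoff) {
            std::swap(ids[i], ids[cutoff]);   // move into the allowed region
            ++cutoff;
        } else if (!allowed && i < cutoff) {
            --cutoff;
            std::swap(ids[i], ids[cutoff]);   // move out of the allowed region
        }
    }
};
```

Note that the lookup here is a linear scan; keeping each half sorted would allow binary search at the cost of more work per update.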
One suitable data structure for this problem is a trie (prefix tree) that holds your allowed or disallowed IDs.
You can treat the ID's binary representation as the string. A trie is a compact way to store the IDs memory-wise, and lookup time is bounded by the longest ID length (which in your case is a constant 16 bits).
I'm not familiar with a standard-library C++ implementation, but if efficiency is crucial you can find an existing implementation or implement one yourself.
I want to implement algorithms like quicksort, mergesort, etc., while working with big data files, like 100 million elements. I can't use std::vector, std::sort, or anything like that, since this is a school assignment meant to teach these specific algorithms; I can only use what I write myself.
Should I implement the sorting algorithms using linked lists or arrays? Which of the two is more efficient in terms of working with big data? What are the advantages of each?
If the number of elements is large, the better option would be an array (or any type that has contiguous memory storage, i.e. a std::vector, allocating with new [], etc.). A linked list usually does not store its nodes in contiguous memory. The contiguous memory aspect leads to better cache friendliness.
In addition to this, a linked list (assuming a doubly-linked list) needs to store next and previous pointers for each data item, thus requiring more memory per item. Even a singly-linked list needs a next pointer, so although it has less overhead than a doubly-linked list, it is still more overhead than an array.
Another reason that isn't related to efficiency why you want to use an array is ease of implementation of the sorting algorithm. It is more difficult to implement a sorting algorithm for a linked list than it is for an array, and especially an algorithm that works with non-adjacent elements.
Also, please note that std::sort is an algorithm, it is not a container. Thus it can work with regular arrays, std::array, std::vector, std::deque, etc. So comparing std::sort to an array is not a correct comparison.
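To illustrate the point about ease of implementation, here is a minimal in-place quicksort over a plain array: the partition step swaps elements by index, which a linked list cannot do without walking the chain node by node.

```cpp
#include <cstddef>
#include <utility>

// In-place quicksort on a plain array, sorting a[lo..hi] inclusive.
// Hoare-style partitioning around a middle pivot.
void quicksort(int* a, std::ptrdiff_t lo, std::ptrdiff_t hi) {
    if (lo >= hi) return;
    int pivot = a[(lo + hi) / 2];
    std::ptrdiff_t i = lo, j = hi;
    while (i <= j) {
        while (a[i] < pivot) ++i;   // skip elements already on the left
        while (a[j] > pivot) --j;   // skip elements already on the right
        if (i <= j) std::swap(a[i++], a[j--]);  // O(1) by index
    }
    quicksort(a, lo, j);            // recurse into both halves
    quicksort(a, i, hi);
}
```

For 100 million elements you would also want to pick the pivot more carefully (e.g. median-of-three) and fall back to an iterative loop for the larger half to bound recursion depth, but the index-based structure of the algorithm stays the same.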
Fact: my code is in C++
Question:
I have a list of unsigned long long values that I use as a representation for some objects, where the position in the list is the object's ID.
Each unsigned long long is a sort of bitmask that marks whether an object has component j. So, say I have
OBJ = 1 | (1 << 3)
which means bits 0 and 3 are set, i.e. the object has components 0 and 3.
I want insert, erase, and retrieval of objects from the list to be fast.
Retrieval is usually performed by components, so I will look for objects with the same components.
I was using std::vector, but as soon as I started thinking about performance it seemed not to be the best choice, since each time I delete an object from the list it relocates all the objects after it (erase can be really frequent), and as soon as the underlying array is full the vector creates a new, bigger array and copies all the elements into it.
I was thinking of an "efficientArray", which is a simple array, but each time an element is removed I "mark" the position as free and store that position in a list of available positions; any time I add a new element I store it in the first available position.
This way all the elements are stored in a contiguous area, as with vector, but I avoid the erase problem.
I'm still not avoiding the resize problem (maybe I can't), and objects with the same components will not necessarily be close together.
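For reference, the "efficientArray" described above is essentially a slot array with a free list; a minimal sketch (the names are illustrative) might look like this:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Slot array with a free list: erased slots are remembered and reused, so
// elements stay in one contiguous allocation and erase never shifts anything.
class SlotArray {
public:
    std::size_t insert(std::uint64_t mask) {
        if (!free_.empty()) {             // reuse a hole if one exists
            std::size_t i = free_.back();
            free_.pop_back();
            slots_[i] = mask;
            return i;
        }
        slots_.push_back(mask);           // otherwise grow at the end
        return slots_.size() - 1;
    }

    void erase(std::size_t i) {
        slots_[i] = 0;                    // mark the slot empty
        free_.push_back(i);               // remember it for reuse
    }

    std::uint64_t get(std::size_t i) const { return slots_[i]; }

private:
    std::vector<std::uint64_t> slots_;
    std::vector<std::size_t> free_;
};
```

One consequence of this design is that indices (object IDs) stay stable across erases, which a plain std::vector erase does not give you.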
Are there other ideas/structures which I could use to get better performance?
Am I wrong to want "similar" objects to be closer together?
Thanks!
EDIT
Sorry, maybe the title and the question were not written well. I know vector is efficient and I don't want to write a better vector. Since I'm learning, I would like to understand whether vector IN THIS CASE is good or bad and why; whether what I was thinking is bad and why; and whether there are better solutions and data structures (a tree? a map?), and if so, why. I also asked whether it is worthwhile to keep "similar" objects closer together and whether that might influence things like branch prediction or something else (no answers about that), or whether it is just nonsense. I just want to learn; even "wrong" answers can help me and others learn something. But it seems it came across badly, as if I had asked "my compiler works even if I write char ** which is wrong" and didn't understand why.
I recommend using either std::set or std::map. You want to know if an item exists in a container and both std::set and std::map have good performance for searches and "lookups".
You could use std::bitset and assign each object an ID that is a power of 2.
In any case, you need to profile. Profile your code without changes. Profile your code using different data structures. Choose the data structure with the best performance.
Some timing for different structures can be read here.
The problem with lists is that you're always chasing pointers, and each link traversal is potentially a cache miss (and maybe a TLB miss on top of that).
The vector, on the other hand, will suffer few cache misses, and the hardware prefetcher works optimally for this data structure.
If the data were much larger, the results would not be so clear-cut.
A list uses a lot of memory, since it adds a pointer to each node, and it is not contiguous, so the memory ends up fragmented. A list of arrays, in my opinion, is a lot better. For example, if I am managing 100 objects, a list of 5 arrays of 20 is a lot better than a list of 100 nodes: only 5 extra pointers instead of 100, we gain locality within each array, and we have less fragmentation.
I did some research on this, but I can't find any interesting article about it, so I thought I might be missing something.
What can the benefit be of using a plain list over a list of arrays?
EDIT: This is definitely not array vs. list. It is more like: why put only one element per list node if it's possible to put more?
I think this is a valid question, as memory locality can affect the performance of your program. You can try std::deque, as some have suggested. However, the statement that a typical implementation uses chunks of memory is a general statement about implementations, not the standard; it is therefore not guaranteed to be so.
In C++, you can improve the locality of your data through custom allocators and memory pools. Every STL container takes an allocator as a parameter. The default allocator is probably a simple wrapper around new and delete, but you can supply your own allocator that uses a memory pool. You can find more about allocators here, and here is a link to the C++ default allocator. Here is a Dr. Dobb's article about this topic. A quote from the article:
If you're using node-based containers (such as maps, sets, or lists), allocators optimized for smaller objects are probably a better way to go.
Only profiling will tell you what works best for your case. Maybe the combination of std::deque and a custom allocator will be the best option.
Some operations have guaranteed constant-time complexity with std::list and not with std::vector or std::deque. For example, with std::list you can move a subsequence ("splice") of 5000 consecutive elements from the middle of a list with a million items to the beginning (of the same list only!) in constant time. No way to do that with either std::vector or std::deque, and even for your list-of-arrays, it will be a fair bit more complicated.
This is a good read: Complexity of std::list::splice and other list containers
Apart from the academic aspect of learning something from implementing my own doubly linked list in C++, is there any actual real-world advantage of implementing my own doubly linked list when there is already std::list? Can I make things more efficient on my own for a certain task, or has std::list been refined so much over the years that it is the optimal implementation of a doubly linked list in most cases?
is there any actual real-world advantage of implementing my own doubly linked list when there is already std::list?
Probably not.
Can I make things more efficient on my own for a certain task,
Maybe - depends on the task. You might only need a singly linked list, which might end up being faster, for example.
or has std::list been refined so much over the years that it is the optimal implementation of a doubly linked list in most cases?
Probably.
The best answer to all of this stuff is probably "use the standard implementation until it doesn't work, and then figure out what you're going to do about it."
"Why would I want to implement a doubly linked list in C++?"
You would not. Even if you needed a doubly linked list, you would avoid reinventing the wheel and use an existing implementation.
If you have to ask that question, the answer is no.
You would almost never want or need to implement your own list. For that matter, you would almost never want or need to reimplement your own anything if it is covered by the Standard Library.
There are exceptions. Those exceptions are, well... exceptional. Exceptions are made for performance reasons, typically when the performance requirements are extreme. But these exceptions can only be responsibly made when you have already proved that the Standard Library-provided functionality is a measurable problem for your specific use.
Yes, it is possible for a doubly-linked list to be more efficient than std::list for a certain task. There's an algorithmic complexity trade-off in std::list, and your task might benefit from the opposite decision.
Specifically, std::list is required in C++11, and recommended in C++03, to have constant-time size(). That is, the number of elements must be stored in the object.
This means that the 4-argument version of splice() necessarily has linear complexity, because it needs to count the number of elements moved from one list to another in order to update both sizes (except in some special cases, like the source list being empty after the splice, or the two lists being the same object).
However, if you're doing a lot of splicing then you might prefer to have these the other way around -- constant-time splice() and a size() function that is linear-time in the worst case (when a splice has been done on the list more recently than the last time the size was updated).
As it happens, GNU's implementation of std::list made this choice, and had to change for C++11 conformance. So even if this is what you want, you don't necessarily have to completely implement it yourself. Just dust off the old GNU code and change the names so that it's valid user code.