Appropriate data structure for returning all subsets of a set

Appropriate data structure for returning all subsets of a set - c++

I understand the algorithm for doing this but I don't know what data structure (array, linked list, vector, other?) would be best for returning the final set of sets since every example I see just asks to print the sets.
Can someone explain the thought process for deciding between the 3 data structures I mentioned?
Also, are vectors even used anymore? I heard they were obsolete but still see many recent examples.
To be clear, I mean ALL subsets, so they have different sizes.

The decision of which data structure to use depends on:
Type of data to be stored
Operations that you intend to perform on the data
A normal array, would give you contiguous block of memory and random access to the elements, however you need to know the exact number of elements before hand so that you can allocate an array of appropriate size.
With std::vector you get random access to the data, and contiguous just like arrays but vector is a dynamic array,it grows as you add new elements, and a amortized constant complexity, however insertion/deletion of elements is faster only at the ends since all elements need to be moved.
With std::list you don't get the random access but insertion and deletion of elements is faster because it involves just moving around of the pointer links.
Also, are vectors even used anymore?
That is not true at all.
They are very much in use and one of the most widely used data structures provided by the Standard Library.

Once i used bit fields to determine the subsets. Where if the i th bit is 1 then the i the element in the set is selected in the subset and 0 otherwise. In this case you need to store the ordering of the elements. The same can be done with bool vectors i think.

Related

Containing single data in multiple containers

I've lots of data to acquire and process (near a million) and i don't wanna copy or move it along the whole program.
Let me describe the situation with an example. I have a Vector with 100.000 elements. And i want to keep the track of the time when these elements were inserted into the vector. So, it's a good idea to keep both time and data in a Map. However, i still want to use the Vector.
Is there any way to achieve that second elements of Map shows the Vector but not waste any resource unnecessarily?
The first thing that comes to my mind is containing the adress of datas in Vector. However, pointers use 4 bytes(not sure) and for example, if we wanna contain the address of char, it is 4 times bigger than the data itself.
Any ideas ?

I'd say it's not solely a question of memory consumption, but of consistency also. Depends on how you're going to use various views on your original input data. In general I'd advise using std::unique_ptr for the original data and std::weak_ptr for reference in views.
But you're right that might have a certain memory usage overhead because of pointer sizes exceed size of the pointee objects.
For the latter case having a kind of FlyWeight pattern implementation might be more appropriate.

Containing single data in multiple containers
Yes, you can use the Boost Multi-index Containers Library to index the same data in more than one way with no duplication of content. Unlike homebrew XXX_ptr solutions, the multi-index containers also takes care of keeping everything coherent (removing a data unit from the container automatically elides it from all indexes.)
Lighter weight, more specialized solutions (and possibly more efficient than either muklti-index containers and/or homebrew XXX_ptr solutions) may also be possible depending on your application's requirements, particularities and data insertion/lifecycle patterns:
Do you need the memory layout of your original vector to remain unchanged, or can you accommodate a vector of some derived type?
Will your vector contents change? can it grow?
Will the elements therein be (inserted & kept) in chronological order anyway?
Do you need to look up a vector element by insertion time in addition to vector position?

Fastest data structure for inserting data, not for searching

Can anyone please tell me which is the fastest data structure for inserting data. My requirements are to load names of people, and then retrieve them at super fast speed. There is no question of sorting, searching a specific name etc, not even memory, because total people may not be more than 20. The single requirement is load people and retrieve the names at a later stage.
Does anyone have any idea?

Tongue-in-cheek answer: if you are only doing insertions and nothing else, the easiest data structure is nothing at all - just don't store anything. That makes insertions instantaneous, since you do absolutely nothing to do an insertion.
More realistic answer: if you are just trying to store a bunch of data as quickly as possible and you have an upper bound on the number of total elements, just use an array and keep track of the next free index. If the array stores pointers to elements, then each insertion is a pointer assignment plus an increment of the next free index. If you're storing copies, then each insertion makes a copy (which you'd have to do anyway) and an increment. Since any structure that stores elements has to either store a pointer or copy to it, the overhead is a single increment, which I'm fairly confident is about as cheap as it gets.
Hope this helps!

Apart from the tongue in check answer (it was my first thought) and you have at most 20 people, just use an array.

If you are sure that total objects / nodes will always be less than 20 than array is the best. Retreival happens using index in array and is the fastest.
If you are not sure of the size of datastructure, then I will suggest a list because insertion in a list is the fastest provided you are not looking for an order of insertion. If order of insertion is needed then go for linked list.
Hashtable might not be an option because it has an extra overhead of synchronozation which is not needed here.
A Set interface could be used if you want to avoid duplicates and you have no worries about the order of retrieval.

Dynamic size of array in c++?

I am confused. I don't know what containers should I use. I tell you what I need first. Basically I need a container that can stored X number of Object (and the number of objects is unknown, it could be 1 - 50k).
I read a lot, over here array vs list its says: array need to be resized if the number of objects is unknown (I am not sure how to resize an array in C++), and it also stated that if using a linked list, if you want to search certain item, it will loop through (iterate) from first to end (or vice versa) while an array can specify "array object at index".
Then I went for an other solution, map, vector, etc. Like this one: array vs vector. Some responder says never use array.
I am new to C++, I only used array, vector, list and map before. Now, for my case, what kind of container you will recommend me to use? Let me rephrase my requirements:
Need to be a container
The number of objects stored is unknown but is huge (1 - 40k maybe)
I need to loop through the containers to find specific object

std::vector is what you need.
You have to consider 2 things when selecting a stl container.
Data you want to store
Operations you want to perform on the stored data
There wasa good diagram in a question here on SO, which depitcs this, I cannot find the link to it but I had it saved long time ago, here it is:

You cannot resize an array in C++, not sure where you got that one from. The container you need is std::vector.

The general rule is: use std::vector until it doesn't work, then shift to something that does. There are all sorts of theoretical rules about which one is better, depending on the operations, but I've regularly found that std::vector outperforms the others, even when the most frequent operations are things where std::vector is supposedly worse. Locality seems more important than most of the theoretical considerations on a modern machine.
The one reason you might shift from std::vector is because of iterator validity. Inserting into an std::vector may invalidate iterators; inserting into a std::list never.

Do you need to loop through the container, or you have a key or ID for your objects?
If you have a key or ID - you can use map to be able to quickly access the object by it, if the id is the simple index - then you can use vector.
Otherwise you can iterate through any container (they all have iterators) but list would be the best if you want to be memory efficient, and vector if you want to be performance oriented.

You can use vector. But if you need to find objects in the container, then consider using set, multiset or map.

Best sorting algorithm for sorting structs with byte comparisons?

I have an array of 64 structs that hold a decent amount of data (struct is around 128 bytes, so that's 8192 bytes that need to be re-arragned). The array needs to be sorted based off a single unsigned byte in each struct. An interesting property of my data is that it is likely that there will be many duplicates of sorted value - meaning that if you got rid of all duplicates, the array may only be 10 unique elements long, but this is not a given.
Once sorted, I need to create a stack that stores the size and type that each unique byte-run starts:
so if I ended up with sorted values:
4,4,4,9,9,9,9,9,14,14
the stack would be:
(4,3), (9,5), (14,2)
I figured that there would be some nice optimizations I could perform under these conditions. If I do heapsort, I can create the stack while I'm sorting, but would this be faster than a qsort and then build the stack afterwords? Would any sorting algorithm run slower because of the large structs I'm using? Any optimizations I can make because I'm only comparing bytes?
BTW: language is c++
Thanks.

I would imagine that STL will do what you want nicely. Re-writing your own sort routines and containers is likely to be error-prone, and slow. So only worry about if you find it's a bottleneck.

In general with large objects it can be faster to sort an array of pointers/indices of the objects rather than the objects. Or sort an array of nodes, where each node contains a pointer/index of the object and the object's sort key (in this case the key is one byte). To do this in C++ you can just supply a suitable comparator to std::sort or std::stable_sort. Then if you need the original objects in order, as opposed to just needing to know the correct order, finally copy the objects into a new array.
Copying 128 bytes is almost certainly much slower than performing a byte comparison, even with an extra indirection. So for optimal performance it's the moves you need to look at, not the comparisons, and dealing in pointers is one way to avoid most of the moving.
You could build your run-length encoding as you perform the copy at the end.
Of course it might be possible to go even faster with some custom sorting algorithm which makes special use of the numbers in your case (64, "around 128", and 1). But even simple questions like "which is fastest - introsort, heap sort or merge sort" are generally impossible to answer without writing and running the code.

The sort would not be slower because you will be sorting pointer or references to the structs and not the actual struct in memory.

The fact that your keys are integers, and there really aren't a lot of them,
odds are the Bucket Sort, with a bucket size of 1, would be very applicable.

array vs vector vs list

I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:
array - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item using [] is faster.
stl vector - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item is slower than an array since it is a call to operator[] and a linked list .
stl list - insert and delete takes linear time since you need to iterate to a specific position before applying the insert/delete; additional space is needed for pointers; accessing an item is slower than an array since it is a linked list linear traversal.
Right now, my choice is to use an array. Is it justifiable? Or did I miss something?
Which is faster: traversing a list, then inserting a node or shifting items in an array to produce an empty position then inserting the item in that position?
What is the best way to measure this performance? Can I just display the timestamp before and after the operations?

Use STL vector. It provides an equally rich interface as list and removes the pain of managing memory that arrays require.
You will have to try very hard to expose the performance cost of operator[] - it usually gets inlined.
I do not have any number to give you, but I remember reading performance analysis that described how vector<int> was faster than list<int> even for inserts and deletes (under a certain size of course). The truth of the matter is that these processors we use are very fast - and if your vector fits in L2 cache, then it's going to go really really fast. Lists on the other hand have to manage heap objects that will kill your L2.

Premature optimization is the root of all evil.
Based on your post, I'd say there's no reason to make your choice of data structure here a performance based one. Pick whatever is most convenient and return to change it if and only if performance testing demonstrates it's a problem.

It is really worth investing some time in understanding the fundamental differences between lists and vectors.
The most significant difference between the two is the way they store elements and keep track of them.
- Lists -
List contains elements which have the address of a previous and next element stored in them. This means that you can INSERT or DELETE an element anywhere in the list with constant speed O(1) regardless of the list size. You also splice (insert another list) into the existing list anywhere with constant speed as well. The reason is that list only needs to change two pointers (the previous and next) for the element we are inserting into the list.
Lists are not good if you need random access. So if one plans to access nth element in the list - one has to traverse the list one by one - O(n) speed
- Vectors -
Vector contains elements in sequence, just like an array. This is very convenient for random access. Accessing the "nth" element in a vector is a simple pointer calculation (O(1) speed). Adding elements to a vector is, however, different. If one wants to add an element in the middle of a vector - all the elements that come after that element will have to be re allocated down to make room for the new entry. The speed will depend on the vector size and on the position of the new element. The worst case scenario is inserting an element at position 2 in a vector, the best one is appending a new element. Therefore - insert works with speed O(n), where "n" is the number of elements that need to be moved - not necessarily the size of a vector.
There are other differences that involve memory requirements etc., but understanding these basic principles of how lists and vectors actually work is really worth spending some time on.
As always ... "Premature optimization is the root of all evil" so first consider what is more convenient and make things work exactly the way you want them, then optimize. For 10 entries that you mention - it really does not matter what you use - you will never be able to see any kind of performance difference whatever method you use.

Prefer an std::vector over and array. Some advantages of vector are:
They allocate memory from the free space when increasing in size.
They are NOT a pointer in disguise.
They can increase/decrease in size run-time.
They can do range checking using at().
A vector knows its size, so you don't have to count elements.
The most compelling reason to use a vector is that it frees you from explicit memory management, and it does not leak memory. A vector keeps track of the memory it uses to store its elements. When a vector needs more memory for elements, it allocates more; when a vector goes out of scope, it frees that memory. Therefore, the user need not be concerned with the allocation and deallocation of memory for vector elements.

You're making assumptions you shouldn't be making, such as "accessing an item is slower than an array since it is a call to operator[]." I can understand the logic behind it, but you nor I can know until we profile it.
If you do, you'll find there is no overhead at all, when optimizations are turned on. The compiler inlines the function calls. There is a difference in memory performance. An array is statically allocated, while a vector dynamically allocates. A list allocates per node, which can throttle cache if you're not careful.
Some solutions are to have the vector allocate from the stack, and have a pool allocator for a list, so that the nodes can fit into cache.
So rather than worry about unsupported claims, you should worry about making your design as clean as possible. So, which makes more sense? An array, vector, or list? I don't know what you're trying to do so I can't answer you.
The "default" container tends to be a vector. Sometimes an array is perfectly acceptable too.

First a couple of notes:
A good rule of thumb about selecting data structures: Generally, if you examined all the possibilities and determined that an array is your best choice, start over. You did something very wrong.
STL lists don't support operator[], and if they did the reason that it would be slower than indexing an array has nothing to do with the overhead of a function call.
Those things being said, vector is the clear winner here. The call to operator[] is essentially negligible since the contents of a vector are guaranteed to be contiguous in memory. It supports insert() and erase() operations which you would essntially have to write yourself if you used an array. Basically it boils down to the fact that a vector is essentially an upgraded array which already supports all the operations you need.

I am maintaining a fixed-length table of 10 entries. Each item is a
structure of like 4 fields. There will be insert, update and delete
operations, specified by numeric position. I am wondering which is the
best data structure to use to maintain this table of information:
Based on this description it seems like list might be the better choice since its O(1) when inserting and deleting in the middle of the data structure. Unfortunately you cannot use numeric positions when using lists to do inserts and deletes like you can for arrays/vectors. This dilemma leads to a slew of questions which can be used to make an initial decision of which structure may be best to use. This structure can later be changed if testing clearly shows its the wrong choice.
The questions you need to ask are three fold. The first is how often are you planning on doing deletes/inserts in the middle relative to random reads. The second is how important is using a numeric position compared to an iterator. Finally, is order in your structure important.
If the answer to the first question is random reads will be more prevalent than a vector/array will probably work well. Note iterating through a data structure is not considered a random read even if the operator[] notation is used. For the second question, if you absolutely require numeric position than a vector/array will be required even though this may lead to a performance hit. Later testing may show this performance hit is negligible relative to the easier coding with numerical positions. Finally if order is unimportant you can insert and delete in a vector/array with an O(1) algorithm. A sample algorithm is shown below.
template <class T>
void erase(vector<T> & vect, int index) //note: vector cannot be const since you are changing vector
{
vect[index]= vect.back();//move the item in the back to the index
vect.pop_back(); //delete the item in the back
}
template <class T>
void insert(vector<T> & vect, int index, T value) //note: vector cannot be const since you are changing vector
{
vect.push_back(vect[index]);//insert the item at index to the back of the vector
vect[index] = value; //replace the item at index with value
}

I Believe it's as per your need if one needs more insert/to delete in starting or middle use list(doubly-linked internally) if one needs to access data randomly and addition to last element use array ( vector have dynamic allocation but if you require more operation as a sort, resize, etc use vector)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js