Which datastructure to use for finding an element [duplicate] - c++

This question already has answers here:
How can I efficiently select a Standard Library container in C++11?
(4 answers)
Closed 4 years ago.
I have several items saved in a list. I would like to add items that have already been processed to a data structure (this makes sense in my case even though you might wonder why). When processing the next item from the list, I first want to check whether it has been processed before, so let's say something like this:
if(element_is_in_datastructure(current_element)) {
do this
}
else
{
do that
add_element_to_datastructure(current_element)
}
My question is: what is the ideal data structure for which checking whether an element is in it won't take too long? At the moment I don't have too many elements (max 30) which will be added to the data structure, but this number might increase and I don't want to lose performance.

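You can use a hash-based container, e.g. std::unordered_map, and store your elements as keys (std::unordered_set is enough if there are no values to associate with them).
Then just check for their presence, e.g.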
if(!yourMap.count(element))
{
// your element is not in the structure
}
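This lookup takes constant time on average, regardless of the container's size (linear in the worst case, if many keys collide). Logarithmic lookup is what the tree-based std::map and std::set give you.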
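As a minimal sketch of the "already processed" check from the question, assuming the elements are strings: the helper name process_unique and its skipped counter are hypothetical, chosen only for illustration. It relies on std::unordered_set::insert, which reports whether the element was newly inserted.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the items that were seen for the first time,
// in order, and counts how many were skipped as already processed.
std::vector<std::string> process_unique(const std::vector<std::string>& items,
                                        int& skipped) {
    std::unordered_set<std::string> processed;
    std::vector<std::string> first_seen;
    skipped = 0;
    for (const auto& item : items) {
        // insert() returns {iterator, bool}; the bool is false when the
        // element was already present, so one call does both check and add.
        if (processed.insert(item).second) {
            first_seen.push_back(item);  // the "do that" branch
        } else {
            ++skipped;                   // the "do this" branch
        }
    }
    return first_seen;
}
```

With only around 30 elements a plain std::vector and std::find would likely be just as fast; the hash set starts to pay off as the count grows.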

Related

Fastest way for getting last index of item in vector in C++? [duplicate]

This question already has answers here:
C++ How to find position of last occurrence of element in vector
(7 answers)
Closed 11 months ago.
Let's say you have a vector of ints, unsorted and with multiple repeating items, like so:
vector<int> myVec{1, 0, 0, 0, 1, 1, 0, 1, 0, 0};
What is the fastest way to get the last index of 1, which is 7 (zero-based) for this example, other than looping through it from its end?
Would this be different if the vector would contain other items than 0 and 1?
What is the fastest way to do this in C++?
Later edit: I have seen the duplicate topic suggestions, but even if they partially solve what I am looking for, this has nothing to do with the minimum element in a vector, so I am keeping the question; maybe it will help someone else too.
Depends on if you are stuck with vector<int>. If you could store the bits with bitset or unsigned int, then you can find the right most set bit through bitwise operations: Efficient bitwise operations for counting bits or find the right|left most ones
The only faster way I can think of would be to save the last index as you populate the vector. It would add extra time to insertion, but it would be faster to access.
If that is acceptable for your use case you might also want to consider the number of unique values in your vector, in your example this is feasible, if most values are unique you would quickly increase your memory usage.
You might want to inherit std::vector and implement your own insert as well as constructor if you want to go this way.
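Use std::max_element and reverse iterators. That works here only because 1 is the largest value in the vector; for an arbitrary target value, use std::find on the reverse range instead. Either way, that is still looping through the vector. If it is unsorted, there is no faster way.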
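A minimal sketch of the reverse-iterator search, using std::find so that it works for any target value. The function name last_index_of and the -1 "not found" convention are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Returns the zero-based index of the last element equal to target,
// or -1 if target does not occur. Still O(n): it scans from the back.
std::ptrdiff_t last_index_of(const std::vector<int>& v, int target) {
    auto rit = std::find(v.rbegin(), v.rend(), target);
    if (rit == v.rend()) {
        return -1;
    }
    // rit.base() is the forward iterator one past the matched element.
    return std::distance(v.begin(), rit.base()) - 1;
}
```

The scan stops at the first match from the back, so it is cheap whenever the target occurs near the end of the vector.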

Use of multisets in C++ [duplicate]

This question already has answers here:
Give me a practical use-case of Multi-set
(4 answers)
"multiset" & "multimap" - What's the point?
(7 answers)
Closed 2 years ago.
I understand the usage on sets in C++, but why do multisets exist?
What are some real world applications where multisets are useful?
This argument can be extended to unordered multisets as well: what differentiates them from a vector, and what advantages and disadvantages do they provide?
Because you don't have to store single-element objects in a multi-set. You're thinking of storing something like a string in a multi-set. But that's not what it's made for. You can have any struct you want, and make the comparison be with a single element in the struct.
For example:
struct PhoneBookEntry
{
std::string name;
std::string phoneNumber;
};
In this naive "phone book" entry, there's no reason to have a single entry per name in a phone book. There might be many. So you make a multiset of PhoneBookEntry, and you make the comparator be by name. This way, you can have multiple phone numbers with the same name.
Now you might think that a map is more suitable for this, sure. But this is just an example. If you have a structure where you don't need a key/value but you need the search properties of a set with multiple elements per key, you use a multiset.
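To make the phone book example concrete, here is a sketch of a multiset ordered by name only. The comparator ByName, the alias PhoneBook, and the helper count_numbers are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <set>
#include <string>

struct PhoneBookEntry {
    std::string name;
    std::string phoneNumber;
};

// Orders entries by name only, so entries with equal names are "equivalent"
// as far as the container is concerned and can coexist in a multiset.
struct ByName {
    bool operator()(const PhoneBookEntry& a, const PhoneBookEntry& b) const {
        return a.name < b.name;
    }
};

using PhoneBook = std::multiset<PhoneBookEntry, ByName>;

// Counts how many entries share the given name.
std::size_t count_numbers(const PhoneBook& book, const std::string& name) {
    // Only the name takes part in the comparison; the number is left empty.
    return book.count(PhoneBookEntry{name, ""});
}
```

A std::set with the same comparator would silently drop the second entry for a name, which is exactly the difference the answer describes.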

How exactly do lookup tables work and how to implement them? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
The community reviewed whether to reopen this question 6 months ago and left it closed:
Original close reason(s) were not resolved
I made a program recently that dealt with a lot of if/else statements to return particular values. Someone recommended using lookup tables instead. My question is,
How do they work and how do you implement them?
What is the difference between a map, hash table, and lookup table.
What you're looking for is an efficient mechanism by which you can look up the value that corresponds to a given key.
Your current mechanism (a long list of if/else-if commands) is rather inefficient, in that if you have N possible values to choose from, you will (on average) have to compare your candidate key against (N/2) other keys before you find the one that matches and you can stop looking. (This is known as O(N) complexity)
So what are the other choices?
The simplest one is literally just an array of values, e.g.
const char* const myLookupTable[1000] = {
"zero",
"one",
"two",
[...]
"nine hundred and ninety-nine"
};
... with a lookup table like that, you take a key (which in this case is a number between 0 and 999, inclusive), and look up the corresponding value with a single array-lookup:
const char* val = myLookupTable[myKeyIndex];
That's super-efficient (O(1) complexity -- it always finishes in constant time, regardless of how big the array is!), but it only works in cases where your keys are unsigned integers in a continuous (and relatively small) range of values. For example, if your keys were strings, this approach wouldn't apply.
For more flexibility, the next option would be STL's std::map. std::map gives you fast key->value lookups from any key-type to any value-type. Internally it is implemented as a tree: each key-value pair is inserted into the tree in such a way that the tree remains sorted with the smallest keys at the left of the tree and the largest keys at the right. Because of that, looking up a key (and its associated value) in a std::map is just a matter of starting at the tree's root node and comparing the key at that node to the key you are looking up: is it less than your key? Then move to the right-hand child. Or is it greater than your key? Then move to the left-hand child. Repeat that until you get to the bottom of the tree, at which point you'll either find the key-value pair you were looking for or you'll find that it's not present. This is an algorithm of O(log(N)) complexity, because for a tree with N values in it, it takes about log(N) comparisons for the lookup to complete. O(log(N)) is considered pretty good efficiency.
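A small sketch of replacing an if/else chain with a std::map lookup. The table contents and the names digit_table and lookup_digit are made up for illustration; std::map::find is the standard way to look up a key without inserting it.

```cpp
#include <map>
#include <string>

// Hypothetical table mapping digit names to their values.
const std::map<std::string, int>& digit_table() {
    static const std::map<std::string, int> table = {
        {"zero", 0}, {"one", 1}, {"two", 2}, {"three", 3},
    };
    return table;
}

// Returns the mapped value, or fallback if the key is absent.
int lookup_digit(const std::string& name, int fallback = -1) {
    const auto& table = digit_table();
    auto it = table.find(name);  // O(log N) descent through the tree
    return it != table.end() ? it->second : fallback;
}
```

Using find rather than operator[] matters here: operator[] would insert a default value for a missing key and cannot be called on a const map.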
The final data structure you mentioned is a hash table (as seen in std::unordered_map). A hash table does things a bit differently -- internally it is an array, but in order to avoid the limitations of the lookup-table approach, it also comes with an algorithm for figuring out where in its array a given key/value pair is to be stored. It does this by calculating a hash code for the key-object you pass in -- and then using that code to compute an offset into the array (e.g. int array_offset = hash_code % array_size) and looking at that slot in the array to see if the requested key-value pair is there. If it is, then it's done (O(1) performance again!); or if the slot is empty, then it knows that your key isn't in the table and can return failure immediately (O(1) again). If the slot is occupied by some other key/value pair, then the hash table will need to fall back to another algorithm to sort out the hash collision; different hash tables handle that in different ways, but it's generally still fairly efficient.
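The slot computation described above can be sketched with a toy table. This is not how std::unordered_map is implemented; it is a fixed-size illustration that resolves collisions by linear probing (one of several possible strategies), and the class name ToyHashTable and its 16-slot capacity are arbitrary assumptions.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <string>

// Toy fixed-capacity hash table with linear probing. It never grows and
// does not support removal, so it is for illustration only.
class ToyHashTable {
public:
    // Inserts or overwrites; returns false only if the table is full.
    bool insert(const std::string& key, int value) {
        const std::size_t start = std::hash<std::string>{}(key) % kSlots;
        for (std::size_t i = 0; i < kSlots; ++i) {
            Slot& s = slots_[(start + i) % kSlots];
            if (!s.used || s.key == key) {
                s.used = true;
                s.key = key;
                s.value = value;
                return true;
            }
        }
        return false;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        const std::size_t start = std::hash<std::string>{}(key) % kSlots;
        for (std::size_t i = 0; i < kSlots; ++i) {
            const Slot& s = slots_[(start + i) % kSlots];
            if (!s.used) return nullptr;        // empty slot: key not present
            if (s.key == key) return &s.value;  // found, possibly after probing
        }
        return nullptr;
    }

private:
    static constexpr std::size_t kSlots = 16;
    struct Slot {
        bool used = false;
        std::string key;
        int value = 0;
    };
    std::array<Slot, kSlots> slots_{};
};
```

When there is no collision, both operations touch exactly one slot, which is the O(1) case from the paragraph above; the probing loop only does extra work when several keys land on the same offset.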
Your question is really too broad, since StackOverflow is not a tutorial site, but I feel kind this morning...
A "lookup table" is simply a container (any kind of container) that contains values you look up, and usually map to some other value.
In its simplest form, consider the following:
struct MapIntToString
{
int value;
const char* string;
};
MapIntToString my_map[] = {
{ 1, "one" },
{ 2, "two" },
{ 3, "three" },
// ...
};
The above could be considered a lookup table. You can iterate (loop) over my_map to find (look up) the integer 2 and then pick the string "two" from it.
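The iteration described above might look like the following sketch. The function name lookup and the nullptr "not found" convention are assumptions for this example; the table is repeated so the snippet stands on its own.

```cpp
struct MapIntToString {
    int value;
    const char* string;
};

const MapIntToString my_map[] = {
    {1, "one"},
    {2, "two"},
    {3, "three"},
};

// Linear search over the table: O(N), which is fine for small tables.
// Returns nullptr when the value has no entry.
const char* lookup(int value) {
    for (const auto& entry : my_map) {
        if (entry.value == value) {
            return entry.string;
        }
    }
    return nullptr;
}
```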
Depending on your need and use-case the above example might not be enough. The code above is basically how it is commonly done in plain C, not C++. For C++ there are better containers for mapping values, like std::map and std::unordered_map.
However sometimes the standard types might not be enough, and there are many other data-structures that could be implemented for looking up things.

Doesn't std::unordered_map preserve insertion order? [duplicate]

This question already has answers here:
Keep the order of unordered_map as we insert a new key
(3 answers)
C++11: does unordered_map/set guarantees traversing order as insert order?
(1 answer)
std::unordered_map not behaving as expected
(3 answers)
Closed 6 months ago.
My understanding of what unordered_map means is that it stores one value per key without ordering them. But is it expected that insertion order is not preserved?
When I compile and run:
std::unordered_map<std::string,int> temp;
temp["Start"] = 0;
temp["Read"] = 0;
for ( const auto& entry : temp )
{
    std::cout << entry.first << '\n';
}
With VS2015, it outputs
Start
Read
With GCC 4.9 for Android, it outputs:
Read
Start
Is it a bug, or expected?
From here:
Internally, the elements in the unordered_map are not sorted in any particular order with respect to either their key or mapped values, but organized into buckets depending on their hash values to allow for fast access to individual elements directly by their key values (with a constant average time complexity on average).
I think that pretty much sums it up.
This is expected. The standard gives no guarantees regarding the order of elements in std::unordered_map.
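If insertion order is actually needed, one common workaround is to keep the keys in a separate sequence next to the hash map. The following is only a sketch of that idea; the class name InsertionOrderedMap and its interface are hypothetical and not part of the standard library.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hash map plus a vector that remembers the order in which keys first appeared.
class InsertionOrderedMap {
public:
    void set(const std::string& key, int value) {
        auto result = values_.emplace(key, value);
        if (result.second) {
            order_.push_back(key);          // new key: remember its position
        } else {
            result.first->second = value;   // existing key: overwrite the value
        }
    }

    // Keys in the order they were first inserted.
    const std::vector<std::string>& keys() const { return order_; }

    // Throws std::out_of_range if the key is absent.
    int at(const std::string& key) const { return values_.at(key); }

private:
    std::unordered_map<std::string, int> values_;
    std::vector<std::string> order_;
};
```

Iterating over keys() and calling at() for each key then yields "Start" before "Read" on every compiler, at the cost of storing each key twice.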

Implement the stack that pops the most frequently added item [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I was asked to implement a stack that pops the most frequently added item in an interview. I gave him the below answer but he was not happy with the solution.
class stack
{
// Map of value and count
map<int,int> Cntmap;
public:
void push(int val)
{
// Find val in map
// if found then increment map count
// else insert a pair (val,1)
}
int pop( )
{
// Find the Key in Cntmap with max value
// using std::max_element
// Decrement the Cntmap count for the popped val
}
};
Can anyone help me with the correct approach?
It's an interesting question, because in push, you look up
using the key, and in pop, using the mapped value. std::map
supports the first immediately: all you have to do is:
++Cntmap[val];
The [] operator will insert an entry if the key isn't
present, initializing the mapped type with its default
constructor, which for an int results in 0. You don't even
need the if.
The second is more difficult. The comments, however, give the
solution: all you need is a custom Compare, which takes two
std::pair<int, int>, and compares the second element.
std::max_element will return an iterator to the entry you're
interested in, so you can use it directly. So far so good (and
very simple), but you have to consider error conditions: what
happens if Cntmap is empty? And you might want to remove the
element if the count goes down to 0 (again, simple, since you
have an iterator designating the entry, with both the key and
the value).
Finally, if this is an interview question, I would definitely
point out that the pop operation is O(n), and that it might
be worthwhile (although significantly more complicated) to
maintain a secondary index, so that I could find the maximum
element more quickly. (If I were interviewing, that would be my
next question. Clearly for advanced programmers, however.)
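Putting the pieces of this answer together, a minimal sketch could look as follows. The class name FrequencyStack, the empty() helper, and the choice to throw std::out_of_range on an empty pop are assumptions; the question does not say how ties between equally frequent values should be broken, so this version simply returns the smallest such key.

```cpp
#include <algorithm>
#include <map>
#include <stdexcept>
#include <utility>

class FrequencyStack {
public:
    // operator[] default-initializes a missing count to 0, so no lookup is needed.
    void push(int val) { ++counts_[val]; }

    // O(n): scans every entry for the highest count.
    int pop() {
        if (counts_.empty()) {
            throw std::out_of_range("pop on empty FrequencyStack");
        }
        auto it = std::max_element(
            counts_.begin(), counts_.end(),
            [](const std::pair<const int, int>& a,
               const std::pair<const int, int>& b) {
                return a.second < b.second;  // compare counts, not keys
            });
        const int val = it->first;
        if (--it->second == 0) {
            counts_.erase(it);  // drop values whose count reached zero
        }
        return val;
    }

    bool empty() const { return counts_.empty(); }

private:
    std::map<int, int> counts_;  // value -> number of times it is on the stack
};
```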
The problem with only using a single (simple) data structure is that one of the operations will have to be linear (it has to search through all the elements), which is not good enough. In your case, I believe the linear-time operation is pop.
My attempt:
Have a linked-list (which will be ordered by frequency).
Have a map of values to nodes in the linked-list.
To push, look up the value in the map to get the linked-list node.
If found, increment the frequency and move the node appropriately to keep the linked-list sorted.
If not found, set the frequency to one and insert into the linked-list in the appropriate place.
To pop, decrement the frequency of first node of the linked-list and move it appropriately to keep the linked-list sorted, and return the applicable value.
You could have some pretty bad worst-case behaviour if there are many nodes with the same frequency. It should be possible to get constant time add / increment / decrement by having some sort of linked-list of linked-lists, with each node in the large linked-list representing a specific frequency and each linked-list from there representing all nodes having that frequency.
With the above optimization, pop can be O(1) and push can be O(log n). If you use an unordered_map (C++11), push can be O(1).
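As a simpler stand-in for the list-of-lists layout, the same "one bucket per frequency" idea can be sketched with standard containers. Note that this swaps the linked lists for a std::map of std::set buckets, so both operations are O(log n) rather than the O(1) described above. The class name IndexedFrequencyStack and the smallest-value tie-break are assumptions.

```cpp
#include <iterator>
#include <map>
#include <set>
#include <stdexcept>
#include <unordered_map>

class IndexedFrequencyStack {
public:
    void push(int val) {
        int& count = count_of_[val];  // 0 for a value not seen before
        if (count > 0) {
            remove_from_bucket(count, val);
        }
        ++count;
        by_count_[count].insert(val);
    }

    int pop() {
        if (by_count_.empty()) {
            throw std::out_of_range("pop on empty IndexedFrequencyStack");
        }
        auto top = std::prev(by_count_.end());  // bucket with the highest count
        const int count = top->first;
        const int val = *top->second.begin();   // tie-break: smallest value
        remove_from_bucket(count, val);         // may invalidate 'top'
        if (count > 1) {
            by_count_[count - 1].insert(val);
            count_of_[val] = count - 1;
        } else {
            count_of_.erase(val);
        }
        return val;
    }

    bool empty() const { return by_count_.empty(); }

private:
    // Removes val from its bucket and drops the bucket if it becomes empty.
    void remove_from_bucket(int count, int val) {
        auto it = by_count_.find(count);
        it->second.erase(val);
        if (it->second.empty()) {
            by_count_.erase(it);
        }
    }

    std::unordered_map<int, int> count_of_;  // value -> current count
    std::map<int, std::set<int>> by_count_;  // count -> values with that count
};
```

The secondary index means pop no longer scans every element; replacing the map of sets with a doubly linked list of buckets is what would bring the cost down to constant time.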
Another (probably slightly simpler) option is to do something similar to the above, but with a heap instead of a linked-list.
I think instead of a Map, a Max-Heap will be better in your case. You can maintain a counter in a similar way. Note that the key of the heap will be the count rather than the actual value itself. When you have to insert a value, search for that value; if found, increment its key, else insert the value with a key of 1.
Hope this helps.
A solution could be to wrap Boost.Bimap (does the organisation use Boost?). With this you could create a container which gives ordered access in one direction and hashed access in the other. Your implementation of push and pop would use the replace function of bimap.