Use std::set with input order preserved - c++

I would love to use std::set to store integers that have to be unique, but I don't want them to be sorted (i.e. I need the order in which they were inserted to be preserved).
For example:
set<int> exampleSet;
exampleSet.insert(5);
exampleSet.insert(2);
exampleSet.insert(10);
exampleSet.insert(0);
The set will now contain
{0,2,5,10}
I would like it to be in original order so
{5,2,10,0}
How do I achieve this?

Probably the easiest and most obvious way to do this is to use a set in conjunction with a vector:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>
#include <vector>
int main()
{
    // We'll use this solely to keep track of whether we've already seen a number
    std::set<int> seen;
    // and this to store numbers that weren't repeats in order
    std::vector<int> result;
    // some inputs to work with
    std::vector<int> inputs{ 1, 10, 1, 19, 10, 5, 2, 1, 19, 5, 1 };
    for (int i : inputs)
        if (seen.insert(i).second) // true only if i was not already in the set
            result.push_back(i); // first occurrence, so save it
    // show the results:
    std::copy(result.begin(), result.end(), std::ostream_iterator<int>(std::cout, "\t"));
}
Result:
1 10 19 5 2
If you might have a lot of unique numbers, an std::unordered_set may have better performance than an std::set.
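As a minimal sketch of that variant, the same idea can be wrapped in a function that uses std::unordered_set for the membership test. The function name unique_in_order is made up for this illustration and is not part of any library.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the values of `inputs` with duplicates dropped,
// keeping each value at the position of its first occurrence.
std::vector<int> unique_in_order(const std::vector<int>& inputs) {
    std::unordered_set<int> seen;  // membership test only; its order is irrelevant
    std::vector<int> result;       // carries the insertion order
    for (int value : inputs) {
        // insert() returns {iterator, bool}; the bool is true only for new values
        if (seen.insert(value).second) {
            result.push_back(value);
        }
    }
    return result;
}
```

Lookups in the hash set are O(1) on average, so the whole pass is O(n) on average, compared with O(n log n) for the std::set version.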

You need an insertion-ordered set -- you can find one here. This is more or less a "drop in" replacement for std::set that maintains the insertion order.

Related

What is the fastest way to see if an array has two common elements?

Suppose that we have a very long array, of, say, int to make the problem simpler.
What is the fastest way (or just a fast way, if it's not the fastest) in C++ to see if an array contains any duplicate elements?
To clarify, this function should return this:
[2, 5, 4, 3] => false
[2, 8, 2, 5, 7, 3, 4] => true
[8, 8, 5] => true
[1, 2, 3, 4, 1, 7, 1, 1, 7, 1, 2, 2, 3, 4] => true
[9, 1, 12] => false
One strategy is to loop through the array and, for each element, loop through the array again to check for a match. However, this is expensive: O(n^2). Is there a better way?
(✠Update Below) Insert the array elements to a std::unordered_set and if the insertion fails, it means you have duplicates.
Something like as follows:
#include <iostream>
#include <vector>
#include <unordered_set>
bool has_duplicates(const std::vector<int>& vec)
{
    std::unordered_set<int> set;
    for (int ele : vec)
        if (const auto [iter, inserted] = set.emplace(ele); !inserted)
            return true; // has duplicates!
    return false;
}
int main()
{
    std::vector<int> vec1{ 1, 2, 3 };
    std::cout << std::boolalpha << has_duplicates(vec1) << '\n'; // false
    std::vector<int> vec2{ 12, 3, 2, 3 };
    std::cout << std::boolalpha << has_duplicates(vec2) << '\n'; // true
}
✠Update: As discussed in the comments, this may or may not be the fastest solution. In OP's case, as explained in Marcus Müller's answer, an O(N·log(N)) method would be better, which we can achieve by sorting the array and then checking for duplicates.
Here is a quick benchmark that I made for the two cases, "UnorderedSetInsertion" and "ArraySort". The following are the results for GCC 10.3, C++20, O3:
This is nearly just a sorting problem, just that you can abort the sorting once you've hit a single equality and return true.
So, if you're memory-limited (that's often the case, rather than being time-limited), an in-place sorting algorithm that aborts when it encounters two identical elements will do; for example, std::sort with a comparator function that throws an exception when it encounters equality. Complexity would be O(N·log(N)), but let's be honest here: the fact that this is probably less indirect in memory addressing than the creation of a tree-like bucket structure might help. In that sense, I can only recommend you actually compare this to JeJo's solution. That looks pretty reasonable, too!
The thing here is that there's very likely not a one-size-fits-all solution: what is fastest will depend on the number of integers we're talking about. Even quadratic complexity might be better than any of our "clever" answers if that keeps memory access nice and linear. I'm almost certain your speed here is not bounded by your CPU, but by the amount of data you need to shuffle to and from RAM.
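A simpler variant of the sorting idea, shown here only as a sketch: sort completely, then look for two equal neighbours with std::adjacent_find. Unlike the comparator-that-throws approach described above, this does not abort the sort early. The function name has_duplicates_sorted is an assumption for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper. Takes the vector by value so the caller's data is left
// untouched; sort the original in place instead if memory is the constraint.
bool has_duplicates_sorted(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    // After sorting, equal values sit next to each other, so one linear scan
    // for an equal adjacent pair is enough.
    return std::adjacent_find(values.begin(), values.end()) != values.end();
}
```

Whether this beats the hash-set version depends on the input size and the hardware, so it is worth measuring both on realistic data.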
How about binning the data (i.e. creating a histogram) and checking the mode of the result? A mode count greater than 1 indicates a repeated value.
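One way to read the histogram suggestion, as a rough sketch: count occurrences in a std::unordered_map and track the highest count seen. The name mode_count is hypothetical, and a plain array of counters could replace the map if the value range is known to be small.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: returns how often the most frequent value occurs
// (0 for an empty input). A result greater than 1 means there is a repeat.
std::size_t mode_count(const std::vector<int>& values) {
    std::unordered_map<int, std::size_t> histogram;  // value -> occurrences
    std::size_t highest = 0;
    for (int v : values) {
        // operator[] value-initialises a missing counter to 0 before the increment
        highest = std::max(highest, ++histogram[v]);
    }
    return highest;
}
```

If only a yes/no answer is needed, the loop can return as soon as a counter reaches 2, at which point it behaves like the std::unordered_set approach.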

There is a given element, say N. How to modify binary search to find the greatest element in a sorted vector which is smaller than N

For example:
Let us have a sorted vector with elements: [1, 3, 4, 6, 7, 10, 11, 13]
And we have an element N = 5
I want output as:
4
Since 4 is the greatest element smaller than N.
I want to modify Binary Search to get the answer
What would you want to happen if there is an element that equals N in the vector?
I would use std::lower_bound (or std::upper_bound depending on the answer to the above question). It runs in logarithmic time which means it's probably using binary search under the hood.
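#include <algorithm>
#include <iterator>
#include <optional>
#include <vector>
std::optional<int> find_first_less_than(int n, std::vector<int> data) {
    // std::lower_bound requires sorted input; this sorts the local copy (O(n log n)).
    // If the caller guarantees a sorted vector, drop the sort and take data by const reference.
    std::sort(data.begin(), data.end());
    // first element that is >= n
    auto it = std::lower_bound(data.begin(), data.end(), n);
    // if no element is smaller than n, we'll return nullopt
    if (it == data.begin()) return std::nullopt;
    return *std::prev(it);
}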
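Since the question asks how to modify binary search itself, here is a hand-written sketch of what std::lower_bound does in this case. It assumes the vector is already sorted in ascending order; the function name greatest_less_than is made up for this example.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper. Precondition: `sorted` is in ascending order.
// Returns the greatest element strictly smaller than n, or nullopt if none exists.
std::optional<int> greatest_less_than(const std::vector<int>& sorted, int n) {
    std::size_t lo = 0;
    std::size_t hi = sorted.size();  // search the half-open range [lo, hi)
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] < n) {
            lo = mid + 1;  // sorted[mid] qualifies; a larger candidate may lie to the right
        } else {
            hi = mid;      // sorted[mid] >= n; the answer lies to the left
        }
    }
    // lo is now the number of elements strictly smaller than n
    if (lo == 0) return std::nullopt;
    return sorted[lo - 1];
}
```

An element equal to N is skipped here, so for the example vector and N = 6 the result is still 4. Changing the comparison to `sorted[mid] <= n` would return the greatest element not exceeding N instead.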

Removing multiple elements from stl list while iterating

This is not the same as "Can you remove elements from a std::list while iterating through it?". Mine is a different scenario.
Let's say I have a list like this.
1 2 3 1 2 2 1 3
I want to iterate over this STL list in such a way that, when I first encounter an element X, I do some activity and then remove all the elements X from the list and continue iterating. What's an efficient way of doing this in C++?
I am worried that when I do a remove or an erase I will be invalidating the iterators. If it was only one element, then I could potentially increment the iterator and then erase. But in my scenario I would need to delete/erase all the occurrences.
I was thinking of something like this:
while (!list.empty()) {
    int num = list.front();
    // Do some activity and, if successful:
    list.remove(num);
}
I don't know if this is the best approach.
Save a set of seen numbers and if you encounter a number in the set ignore it. You can do as follows:
list<int> old_list = {1, 2, 3, 1, 2, 2, 1, 3};
list<int> new_list;
set<int> seen_elements;
for (int el : old_list) {
    if (seen_elements.find(el) == seen_elements.end()) {
        seen_elements.insert(el);
        new_list.push_back(el);
    }
}
return new_list;
This will process each value only once and the new_list will only contain the first copy of each element in the old_list. This runs in O(n*log(n)) because each iteration performs a set lookup (you can make this O(n) by using a hashset). This is significantly better than the O(n^2) that your approach runs in.
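The hash-set variant mentioned above could look roughly like this. The sketch edits the list in place rather than building a second one, relying on the fact that std::list::erase returns the iterator following the erased node and leaves all other iterators valid. The function name keep_first_occurrences is an assumption for illustration.

```cpp
#include <list>
#include <unordered_set>

// Hypothetical helper: keeps only the first occurrence of each value,
// modifying `values` in place.
void keep_first_occurrences(std::list<int>& values) {
    std::unordered_set<int> seen;
    for (auto it = values.begin(); it != values.end(); ) {
        if (seen.insert(*it).second) {
            // First time this value is met: the "activity" would go here.
            ++it;
        } else {
            // Repeat: erase() unlinks the node and hands back the next iterator,
            // so the loop never touches an invalidated iterator.
            it = values.erase(it);
        }
    }
}
```

Each element is visited once and each set operation is O(1) on average, which gives O(n) on average overall.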

Determine unique values across multiple sets

In this project, there are multiple sets that hold values from 1 to 9. I need to efficiently determine whether there are values that are unique to one set, i.e. that appear in no other set.
For Example:
std::set<int> s_1 = { 1, 2, 3, 4, 5 };
std::set<int> s_2 = { 2, 3, 4 };
std::set<int> s_3 = { 2, 3, 4, 6 };
Note: The number of sets is unknown until runtime.
As you can see, s_1 contains the unique values 1 and 5, and s_3 contains the unique value 6.
After determining the unique values, the aforementioned sets should then just contain the unique values like:
// s_1 { 1, 5 }
// s_2 { 2, 3, 4 }
// s_3 { 6 }
What I've tried so far is to loop through all the sets and record the count of the numbers that have appeared. However I wanted to know if there is a more efficient solution out there.
The C++ standard library has algorithms for intersection, difference and union operations on two sets (std::set_intersection, std::set_difference and std::set_union).
If I understood your problem correctly, you could do this:
intersect all the sets (in a loop) to determine a base, and then take the difference between each set and the base.
You could benchmark this against your current implementation. It should be faster.
Check out this answer.
Getting Union, Intersection, or Difference of Sets in C++
EDIT: cf. Tony D.'s comment: you can basically do the same operation using a std::bitset<> and bitwise operators (&, |, etc.), which should be faster.
Depending on the actual size of your input, might be well worth a try.
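A rough sketch of the std::bitset idea, assuming every value lies in the range 1 to 9 as stated in the question. Instead of intersecting all sets, it tracks which values have been seen in one set and which in more than one; the names Mask and unique_values are made up for this example.

```cpp
#include <bitset>
#include <cstddef>
#include <set>
#include <vector>

using Mask = std::bitset<10>;  // bit v is set when value v (1..9) is present

// Hypothetical helper: for each input set, returns the values that occur in
// that set and in no other. Assumes all values are within 1..9.
std::vector<std::set<int>> unique_values(const std::vector<std::set<int>>& sets) {
    Mask seen_once;   // values found in at least one set
    Mask seen_twice;  // values found in at least two sets
    for (const auto& s : sets) {
        Mask m;
        for (int v : s) m.set(static_cast<std::size_t>(v));
        seen_twice |= seen_once & m;  // already present in an earlier set
        seen_once |= m;
    }
    const Mask only_once = seen_once & ~seen_twice;

    std::vector<std::set<int>> result;
    for (const auto& s : sets) {
        std::set<int> unique;
        for (int v : s) {
            if (only_once.test(static_cast<std::size_t>(v))) unique.insert(v);
        }
        result.push_back(unique);
    }
    return result;
}
```

For the three example sets this gives { 1, 5 }, an empty set, and { 6 }. A set without any unique value comes back empty in this sketch, which differs from the question's example for s_2.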
I would suggest something in C# like this:
Dictionary<int, int> result = new Dictionary<int, int>();
// count how many sets each number appears in
foreach (var set in sets) {
    foreach (int i in set) {
        if (!result.ContainsKey(i))
            result.Add(i, 1);
        else
            result[i] = result[i] + 1;
    }
}
Now the numbers with a count of exactly 1 are unique; then find the sets that contain these numbers.
I would suggest the following:
Insert all the elements of all the sets into a multimap. Here each element is a key and the set name is the value.
Once your multimap is filled with all the elements of all the sets, loop through the multimap and count the occurrences of each key.
If the count is 1 for any key, that element is unique, and its value is the name of the set that contains it.
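The multimap steps above can be sketched as follows. Since the number of sets is only known at runtime, the sketch identifies each set by its index in a vector rather than by a name; the function name unique_value_owners is an assumption for illustration.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <vector>

// Hypothetical helper: maps every value that occurs in exactly one set
// to the index of the set that contains it.
std::map<int, std::size_t> unique_value_owners(const std::vector<std::set<int>>& sets) {
    std::multimap<int, std::size_t> owners;  // value -> index of a set containing it
    for (std::size_t i = 0; i < sets.size(); ++i) {
        for (int v : sets[i]) owners.emplace(v, i);
    }

    std::map<int, std::size_t> result;
    for (auto it = owners.begin(); it != owners.end(); ) {
        // upper_bound gives the end of the run of entries sharing this key
        const auto run_end = owners.upper_bound(it->first);
        if (std::next(it) == run_end) {
            result.emplace(it->first, it->second);  // the run has exactly one entry
        }
        it = run_end;
    }
    return result;
}
```

For the example sets the result maps 1 and 5 to index 0 (s_1) and 6 to index 2 (s_3).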

Sort-related algorithm (replace each item by its index in the sorted collection)

I need to do the following:
Given a std::vector of int, I need to replace each int by the index it would have if the vector were sorted.
I will try to explain it better with an example.
Input: {22, 149, 31}
Output: {2, 0, 1}
(Note that in the sorted vector {149, 31, 22} the 22 is in the index 2 of the sorted vector, the 149 is in index 0, and the 31 is in index 1)
I hope I make the algorithm clear.
Is this implemented somehow in the C++11 standard library? Does this algorithm have a name? Can you offer any ideas to implement it elegantly?
I don't think it has a name, but it's pretty easy to accomplish.
First, you create a target vector and fill it with the indices 0...n-1.
std::vector<int> indices(input.size());
std::iota(indices.begin(), indices.end(), 0);
Second, you sort that vector, but instead of comparing the numbers in the vector, you compare the numbers at the relevant index in the input vector.
std::sort(indices.begin(), indices.end(),
[&input](int l, int r) { return input[l] < input[r]; });
Edit: Note that I'm sorting in ascending order, whereas you're looking for descending order. Just flip the comparison in the lambda.
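One detail worth spelling out: after the sort, indices[k] holds the original position of the k-th element in sorted order, which for the example input is {1, 2, 0} rather than the requested {2, 0, 1}. To get the position each element would take, that permutation has to be inverted. A minimal sketch with the descending comparison, where the function name ranks_descending is made up for this example:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: rank[i] is the index input[i] would have if the
// vector were sorted in descending order.
std::vector<std::size_t> ranks_descending(const std::vector<int>& input) {
    // order[k] = original index of the k-th largest element
    std::vector<std::size_t> order(input.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&input](std::size_t l, std::size_t r) { return input[l] > input[r]; });

    // Invert the permutation: the element that came from position order[k]
    // lands at position k.
    std::vector<std::size_t> rank(input.size());
    for (std::size_t k = 0; k < order.size(); ++k) {
        rank[order[k]] = k;
    }
    return rank;
}
```

Equal values still receive distinct ranks, and because std::sort is not stable, which of them gets the lower rank is unspecified; std::stable_sort would make ties follow the input order.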