What's the FASTEST way to compare vectors in C++?

What is the fastest way to see if two vectors are equal in c++?
I'm trying to find the fastest way to see if any row is equal to any column of a matrix, so element by element comparison and exiting the loop when not equal is not good enough.

Do not reinvent the wheel. You can use std::equal from <algorithm>.
It has the following complexity:
No applications of the corresponding predicate if InputIterator1 and InputIterator2 meet the requirements of random access iterators and last1 - first1 != last2 - first2. Otherwise, at most min(last1 - first1, last2 - first2) applications of the corresponding predicate.
That's what you were looking for.
See the documentation for further details.
As mentioned in the comments, there is a subtle difference between operator== and std::equal: the former doesn't work if types are different (as an example, std::vector<int> and std::vector<double>), the latter does work instead.
I tried to give the most generic solution.
If types are the same, of course operator== works like a charm, as mentioned by @Jarod42.
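As a minimal sketch of the mixed-type case described above: the helper name same_values is made up for illustration, and the four-iterator overload of std::equal assumes C++14 or later.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: true when both ranges have the same length and
// every pair of elements compares equal with operator==.
bool same_values(const std::vector<int>& a, const std::vector<double>& b)
{
    // With random-access iterators the four-iterator overload can compare
    // the lengths first, so a size mismatch needs no element comparisons.
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
}
```

Writing a == b for these two vectors would not compile, because operator== is only defined for vectors of the same type.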

Simply use operator == of vector:
std::vector<int> v1{1, 2, 3, 4}, v2{1, 2, 3, 4};
bool are_equal = (v1 == v2);

std::vector overloads the equality operator (==), so you can compare two vectors directly, just as you would compare two integers.
To compare the rows and columns of a matrix, as you described, loop over them and compare each row with each column directly using ==.
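One way this loop might look, assuming the matrix is square and stored as a vector of rows. The Matrix alias and both function names are invented for this sketch, not taken from the question.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;  // assumed layout: vector of rows

// Copies column c of the matrix into its own vector.
std::vector<int> column(const Matrix& m, std::size_t c)
{
    std::vector<int> col;
    col.reserve(m.size());
    for (const auto& row : m)
        col.push_back(row[c]);
    return col;
}

// True if at least one row equals at least one column (square matrix assumed).
bool any_row_equals_any_column(const Matrix& m)
{
    for (std::size_t c = 0; c < m.size(); ++c) {
        const std::vector<int> col = column(m, c);
        for (const auto& row : m)
            if (row == col)  // vector's operator== checks size, then elements
                return true;
    }
    return false;
}
```

Each column is copied once and then compared against every row, so the copy cost is shared across all comparisons for that column.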

Related

C++ library method for intersection of two unordered_set

I have two unordered_set and want the intersection of those. I can't find a library function to do that.
Essentially, what I want is this:
unordered_set<int> a = {1, 2, 3};
unordered_set<int> b = {2, 4, 1};
unordered_set<int> c = a.intersect(b); // Should be {1, 2}
I can do something like
unordered_set<int> c;
for (int element : a) {
if (b.count(element) > 0) {
c.insert(element);
}
}
but I think there should be a more convenient way to do that? If there's not, can someone explain why? I know there is set_intersection, but that seems to operate on vectors only?
Thanks
In fact, a loop-based solution is the best thing you can use with std::unordered_set.
There is an algorithm called std::set_intersection which finds the intersection of two sorted ranges:
Constructs a sorted range beginning at d_first consisting of elements
that are found in both sorted ranges [first1, last1) and [first2,
last2).
As you deal with std::unordered_set, you cannot apply this algorithm because there is no guaranteed order for the elements in std::unordered_set.
My advice is to stick with loops, as a loop says explicitly what you want to achieve and has linear complexity on average (O(N), where N is the number of elements in the unordered set you traverse with the for loop), which is the best complexity you can achieve.
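The loop can be wrapped in a small helper. The sketch below uses a made-up function name, intersect, and adds one optional refinement that is not in the question: it iterates over the smaller set so that fewer lookups are needed.

```cpp
#include <unordered_set>

// Hypothetical helper: intersection of two unordered sets.
std::unordered_set<int> intersect(const std::unordered_set<int>& a,
                                  const std::unordered_set<int>& b)
{
    // Traverse the smaller set and look each element up in the larger one.
    const auto& small = a.size() <= b.size() ? a : b;
    const auto& large = a.size() <= b.size() ? b : a;

    std::unordered_set<int> result;
    for (int element : small)
        if (large.count(element) > 0)
            result.insert(element);
    return result;
}
```

For the sets from the question, intersect(a, b) yields a set containing 1 and 2.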
There is a standard library function called set_intersection. However, it requires sorted input ranges, so it cannot be applied to the unordered sets directly. An alternative is to create two vectors from those sets, sort them, and call set_intersection with the vectors as input.
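A sketch of the vector-based route, with a made-up function name. Note that std::set_intersection requires sorted input, so the copies are sorted first; that sorting step is what makes this route O(N log N) rather than linear.

```cpp
#include <algorithm>
#include <iterator>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the intersection as a sorted vector.
std::vector<int> sorted_intersection(const std::unordered_set<int>& a,
                                     const std::unordered_set<int>& b)
{
    std::vector<int> va(a.begin(), a.end());
    std::vector<int> vb(b.begin(), b.end());
    std::sort(va.begin(), va.end());  // set_intersection needs sorted ranges
    std::sort(vb.begin(), vb.end());

    std::vector<int> out;
    std::set_intersection(va.begin(), va.end(), vb.begin(), vb.end(),
                          std::back_inserter(out));
    return out;
}
```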

Using for loop and find instead of set_intersection?

This question suggests using std::set_intersection to find the intersection of two arrays. Wouldn't using std::find work just as well?
int a[5] = {1, 2, 3, 4, 5};
int b[5] = {3, 4, 5, 6, 7};
for (int i=0; i<5; i++)
{
if (std::find(b, b+5, a[i])!=b+5)
std::cout << a[i] << std::endl;
}
Does std::set_intersection basically do the same thing? Or maybe it uses a more efficient algorithm? I think the complexity of above is O(n^2) if std::find takes O(n) time.
For all (or at least almost all) of your standard-library-complexity questions, a good reference can answer this for you.
In particular we get that std::find performs at most last - first applications of the predicate (operator== in this case), where first and last define the range to be searched.
We also get that std::set_intersection performs at most 2·(N1+N2-1) comparisons, where N1 = std::distance(first1, last1) and N2 = std::distance(first2, last2).
This means that your loop performs at most N1 * N2 applications of operator==, which in this case is 25. std::set_intersection would use at most 18.
So, the two methods would "work just as well" in the sense that they give the same answer, but std::set_intersection would need fewer comparisons to do it. Keep in mind that std::set_intersection requires both ranges to be sorted.
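To make the comparison concrete, here is a sketch of both approaches side by side. The function names are invented for illustration, and the second function assumes both inputs are already sorted, as the arrays in the question are.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Quadratic approach: one linear std::find per element of a.
std::vector<int> intersect_with_find(const std::vector<int>& a,
                                     const std::vector<int>& b)
{
    std::vector<int> out;
    for (int x : a)
        if (std::find(b.begin(), b.end(), x) != b.end())
            out.push_back(x);
    return out;
}

// Linear approach: a single merge-like pass over two SORTED ranges.
std::vector<int> intersect_sorted(const std::vector<int>& a,
                                  const std::vector<int>& b)
{
    std::vector<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

For the data in the question both functions return 3, 4, 5. Only the amount of work differs.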
std::set is an ordered collection. There are faster methods (linear) for such collections (think mergesort).
std::set::find takes O(lg n), it's a binary search. So using a for loop together with find takes O(n lg n). std::set_intersection takes linear time: align the two sets to find their intersection (similar to the merge operation of mergesort).
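For std::set the elements are already ordered, so the linear merge can be applied directly. A minimal sketch, with a made-up function name:

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Hypothetical helper: linear-time intersection of two ordered sets.
std::set<int> intersect_sets(const std::set<int>& a, const std::set<int>& b)
{
    std::set<int> out;
    // Results arrive in ascending order, so hinting at out.end() keeps
    // each insertion cheap.
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.end()));
    return out;
}
```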

STL remove first element that matches a predicate from a vector

What is an efficient way to erase the first element in a vector that matches a predicate? I am storing unique values in a vector so I wouldn't want the algorithm to search the whole container.
Currently I am doing:
auto it = std::find_if(container.begin(), container.end(),
                       [&value](const Type& elem) { return elem == value; });
if (it != container.end())
{
    container.erase(it);
}
Thanks, in advance.
Only a minor improvement:
container.erase(
std::remove(container.begin(), container.end(), value),
container.end()
);
or, if you want to use a unary predicate my_predicate:
container.erase(
std::remove_if(container.begin(), container.end(), my_predicate),
container.end()
);
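This has the same performance characteristics (find+erase together will touch all elements as well), but elegantly avoids special cases (!= container.end()) because it is range-based. Note that it removes every matching element, which is equivalent to removing the first one only because your values are unique.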
If you don't care about keeping the vector stable (or sorted!) you can also swap the found element to the back() and pop_back() which will slightly improve the average (but not asymptotic!) runtime.
Not to forget this is also a commonly accepted C++ idiom, so it can be more easily recognized.
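The swap-and-pop variant mentioned above might be sketched as follows. The function name is invented, and the element type is fixed to int only to keep the example short.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: erases the first element equal to value WITHOUT
// preserving the order of the remaining elements.
// Returns true if an element was removed.
bool unstable_erase_first(std::vector<int>& v, int value)
{
    auto it = std::find(v.begin(), v.end(), value);  // stops at first match
    if (it == v.end())
        return false;
    std::iter_swap(it, v.end() - 1);  // harmless if it is already the back
    v.pop_back();                     // O(1), no shifting of later elements
    return true;
}
```

Starting from 1, 2, 3, 4 and erasing 2 leaves 1, 4, 3: the former last element takes the erased slot.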

How to remove almost duplicates from a vector in C++

I have an std::vector of floats that I want to not contain duplicates but the math that populates the vector isn't 100% precise. The vector has values that differ by a few hundredths but should be treated as the same point. For example here's some values in one of them:
...
X: -43.094505
X: -43.094501
X: -43.094498
...
What would be the best/most efficient way to remove duplicates from a vector like this?
First sort your vector using std::sort. Then use std::unique with a custom predicate to remove the duplicates.
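std::sort(v.begin(), v.end());
// std::unique only moves the kept elements to the front; erase drops the tail
v.erase(std::unique(v.begin(), v.end(),
                    [](double l, double r) { return std::abs(l - r) < 0.01; }),
        v.end());
// treats any numbers that differ by less than 0.01 as equal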
Sorting is always a good first step. Use std::sort().
Remove not sufficiently unique elements: std::unique().
Last step, call resize() and maybe also shrink_to_fit().
If you want to preserve the order, do the previous 3 steps on a copy (omit shrinking though).
Then use std::remove_if with a lambda, checking for existence of the element in the copy (binary search) (don't forget to remove it if found), and only retain elements if found in the copy.
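The first three steps of this answer (sort, unique, shrink) could be sketched as below. The function name and the tolerance parameter are assumptions for illustration, and erase is used in place of resize() to drop the tail. Strictly speaking std::unique expects its predicate to be an equivalence relation, which a tolerance check is not in general, so treat this as a pragmatic sketch for data that forms well-separated clusters.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: sorts v and collapses runs of values that lie
// within `tolerance` of the first value of each run.
void remove_near_duplicates(std::vector<double>& v, double tolerance)
{
    std::sort(v.begin(), v.end());
    auto new_end = std::unique(
        v.begin(), v.end(),
        [tolerance](double l, double r) { return std::abs(l - r) < tolerance; });
    v.erase(new_end, v.end());  // drop the leftover tail
    v.shrink_to_fit();          // optional, non-binding request to free memory
}
```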
I say std::sort() it, then go through it one by one and remove the values that lie within a certain margin.
For better performance and lower memory usage, you can keep a separate write iterator into the same vector and do one resize operation at the end, instead of calling erase() for each removed element or copying into another destination vector.
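A sketch of that single-pass compaction, using indices rather than iterators for brevity. The function name and the margin parameter are made up. Each value is compared with the last value that was kept, which is one possible policy among several.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts v, then keeps a value only if it is more
// than `margin` above the last value that was kept.
void compact_sorted(std::vector<double>& v, double margin)
{
    if (v.empty())
        return;
    std::sort(v.begin(), v.end());

    std::size_t write = 0;  // position of the last kept value
    for (std::size_t read = 1; read < v.size(); ++read) {
        if (v[read] - v[write] > margin) {
            ++write;
            v[write] = v[read];  // overwrite in place, no erase() calls
        }
    }
    v.resize(write + 1);  // one resize at the end
}
```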
If your vector cannot contain duplicates, it may be more appropriate to use an std::set. You can then use a custom comparison object to consider small changes as being inconsequential.
Hi, you could compare like this:
bool isAlmostEquals(const double &f1, const double &f2)
{
double allowedDif = xxxx;
return std::abs(f1 - f2) <= allowedDif; // std::abs from <cmath>; plain abs may pick the int overload
}
but it depends on your comparison range, and double precision is not on your side.
If your vector is sorted, you could use std::unique with this function as the predicate.
I would do the following:
Create a set<double>
go through your vector in a loop or using a functor
Round each element and insert into the set
Then you can swap your vector with an empty vector
Copy all elements from the set to the empty vector
The complexity of this approach will be n * log(n) but it's simpler and can be done in a few lines of code. The memory consumption will double from just storing the vector. In addition set consumes slightly more memory per each element than vector. However, you will destroy it after using.
std::vector<double> v;
v.push_back(-43.094505);
v.push_back(-43.094501);
v.push_back(-43.094498);
v.push_back(-45.093435);
std::set<double> s;
std::vector<double>::const_iterator it = v.begin();
for(;it != v.end(); ++it)
s.insert(floor(*it));
std::vector<double>().swap(v); // swap with an empty temporary; v.swap(temporary) does not compile in standard C++
v.resize(s.size());
std::copy(s.begin(), s.end(), v.begin());
The problem with most answers so far is that you have an unusual "equality". If A and B are similar but not identical, you want to treat them as equal. Basically, A and A+epsilon still compare as equal, but A+2*epsilon does not (for some unspecified epsilon). Or, depending on your algorithm, A*(1+epsilon) does and A*(1+2*epsilon) does not.
That does mean that A+epsilon compares equal to A+2*epsilon. Thus A = B and B = C does not imply A = C. This breaks common assumptions in <algorithm>.
You can still sort the values, that is a sane thing to do. But you have to consider what to do with a long range of similar values in the result. If the range is long enough, the difference between the first and last can still be large. There's no simple answer.
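The loss of transitivity is easy to show with a tolerance check. The function name and the numbers in the note below are chosen purely for illustration.

```cpp
#include <cmath>

// Hypothetical tolerance-based "equality".
bool almost_equal(double a, double b, double epsilon)
{
    return std::abs(a - b) <= epsilon;
}
```

With epsilon = 0.015, almost_equal(0.0, 0.01, epsilon) and almost_equal(0.01, 0.02, epsilon) are both true, yet almost_equal(0.0, 0.02, epsilon) is false. A chain of such neighbours can therefore span an arbitrarily large distance from end to end.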

Difference between irange and counting_range in Boost

What's the difference between irange and counting_range?
I needed irange to quickly generate a range of integers like this:
auto example = boost::irange(0, 5); /// result is {0, 1, 2, 3, 4}
But noticed an example somewhere (lost the link) that talks instead about counting_range to accomplish the same task. Is there a simple explanation of the difference between these two?
The main difference is that irange is a random-access range while counting_range isn't. counting_range is based on Boost.Iterator's counting_iterator, which uses the underlying integer operations directly. Integers in C++ almost fit the iterator concept: the only thing missing is an operator*. counting_iterator provides an operator* as an identity operation and forwards everything else to the underlying type.
Another difference is that irange also supports increments different than 1.
None of them ever materialises the entire range of integers that they iterate over, so they both use O(1) memory.
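To illustrate the idea of an integer with an identity operator* bolted on, here is a deliberately minimal sketch. It is not Boost's counting_iterator: CountingIter, CountingRange and to_vector are invented names, and the type supports only what a range-based for loop needs.

```cpp
#include <vector>

// Hypothetical, minimal illustration; NOT Boost's implementation.
struct CountingIter {
    int value;
    int operator*() const { return value; }  // identity dereference
    CountingIter& operator++() { ++value; return *this; }
    bool operator!=(const CountingIter& other) const
    {
        return value != other.value;
    }
};

// Half-open range [first, last) that stores only its two bounds.
struct CountingRange {
    int first;
    int last;
    CountingIter begin() const { return {first}; }
    CountingIter end() const { return {last}; }
};

// Materialises the values only so that the result is easy to inspect.
std::vector<int> to_vector(const CountingRange& r)
{
    std::vector<int> out;
    for (int i : r)
        out.push_back(i);
    return out;
}
```

Iterating over CountingRange{0, 5} visits 0, 1, 2, 3, 4 while the range object itself holds just two integers, which is the O(1) memory property mentioned above.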
Both irange and counting_range model a random access range for integer types. As counting_range's documentation points out, its iterator category is determined according to the following algorithm:
if (CategoryOrTraversal is not use_default)
return CategoryOrTraversal
else if (numeric_limits<Incrementable>::is_specialized)
return iterator-category(random_access_traversal_tag, Incrementable, const Incrementable&)
else
return iterator-category(iterator_traversal<Incrementable>::type, Incrementable, const Incrementable&)
Therefore, for simple ranges such as boost::irange(0, 10) and boost::counting_range(0, 10) there is effectively no difference (aside from the types of each range, of course!).
However, irange also supports iteration with a different step size, e.g., boost::irange(0, 10, 2), and counting_range also supports types that are only incrementable and do not fully model an integer.