checking for difference between two vector<T> - c++

Suppose you have 2 vectors say v1 and v2 with the following values:
v1 = {8,4,9,9,1,3};
v2 = {9,4,3,8,1,9};
What is the most STL approach to check if they are "equal"? I am defining "equal" to mean the contents are the same regardless of the order. I would prefer to do this without sorting.
I was leaning towards building two std::map<double, int> to count up each of the vector's elements.
All I need is a boolean yes/no from the algorithm.
What say you?
Other conversations on Stack Overflow resort to sorting the vectors; I'd prefer to avoid that, hence this new thread.

I was leaning towards building two std::map to count up each of the vector's elements.
This will be far slower than just creating sorted vectors. (Note also that std::map is powered by sorting; it just does so using red-black trees or AVL trees) Maps are data structures optimized for an even mix of inserts and lookups; but your use case is a whole bunch of inserts followed by a whole bunch of lookups with no overlap.
I would just sort the vectors (or make copies and sort those, if you are not allowed to destroy the source copies) and then use vector's built in operator ==.

Sorting the vectors and then calling set_difference is still the best way.
If copying is too expensive for you, comparing two unsorted arrays will be even worse.
If you want the original arrays left untouched, make copies and sort those:
std::vector<int> v1 = {8,4,9,9,1,3};
std::vector<int> v2 = {9,4,3,8,1,9};
// cheap early exit before the heavy copy/sort work
if (v1.size() != v2.size()) {
    return false;
}
std::vector<int> v3(v1);
std::vector<int> v4(v2);
std::sort(v3.begin(), v3.end());   // std::sort is declared in <algorithm>
std::sort(v4.begin(), v4.end());
return v3 == v4;

I assume for some reason you can't sort the vectors, most likely because you still need them in their original order or they're expensive to copy. Otherwise, just sort them.
Create a "view" into each vector that allows you to see the vector in any order. You can do this with a vector of pointers that starts out pointing to the elements in order. Then sort the two views, producing a sorted view into each vector. Then compare the two views, comparing the two vectors in their view order. This avoids sorting the vectors themselves.
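A minimal sketch of the "sorted view" idea, assuming the elements support < and ==. The helper names sorted_view and same_elements_via_views are made up for illustration; they are not part of the standard library.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: collect pointers to the elements of v, then sort the
// pointers by the values they point to. The vector v itself is not modified.
template <typename T>
std::vector<const T*> sorted_view(const std::vector<T>& v) {
    std::vector<const T*> view;
    view.reserve(v.size());
    for (const T& x : v) view.push_back(&x);
    std::sort(view.begin(), view.end(),
              [](const T* a, const T* b) { return *a < *b; });
    return view;
}

// True if a and b hold the same values with the same multiplicities.
template <typename T>
bool same_elements_via_views(const std::vector<T>& a, const std::vector<T>& b) {
    if (a.size() != b.size()) return false;  // cheap early exit
    const std::vector<const T*> va = sorted_view(a);
    const std::vector<const T*> vb = sorted_view(b);
    // Compare the pointed-to values in view order, not the pointers.
    return std::equal(va.begin(), va.end(), vb.begin(),
                      [](const T* x, const T* y) { return *x == *y; });
}
```

This still sorts (the pointers), so it is O(n log n); it only pays off when the elements themselves are expensive to copy or must keep their order.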

I was originally thinking of working in terms of sets, since that's what you're actually thinking in terms of, but that does necessitate sorting. This can be done in O(n) on average by converting both vectors to hash maps of element counts and checking those for equality.
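One way this could look, assuming int elements and using std::unordered_map as the hash map. The function names count_values and same_elements_hashed are hypothetical.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: map each distinct value to how often it occurs.
std::unordered_map<int, std::size_t> count_values(const std::vector<int>& v) {
    std::unordered_map<int, std::size_t> counts;
    for (int x : v) ++counts[x];
    return counts;
}

// True if both vectors contain the same values the same number of times.
// Average cost is linear in the total number of elements; no sorting involved.
bool same_elements_hashed(const std::vector<int>& a, const std::vector<int>& b) {
    if (a.size() != b.size()) return false;  // cheap early exit
    return count_values(a) == count_values(b);
}
```

Counting occurrences (rather than storing plain sets) is what keeps duplicates such as the two 9s in the question significant.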

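Just take each element of the first vector and look for it in the second vector.
If a value from the first vector cannot be found in the second, the vectors are different. Note that a plain membership check ignores duplicates, so each matched element of the second vector has to be marked as used for the test to be exact.
In the worst case this takes O(n*m) time, where n is the size of the first vector and m is the size of the second.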

This utility method (Java, using Guava's Ints.asList) will help you compare two int[] arrays; let me know in case of any issues.
public static boolean compareArray(int[] v1, int[] v2) {
    // different lengths can never match
    if (v1.length != v2.length) {
        return false;
    }
    // as in the original logic, empty arrays are reported as not equal
    if (v1.length == 0 || v2.length == 0) {
        return false;
    }
    List<Integer> intList = Ints.asList(v2);
    for (int element : v1) {
        // membership test only: duplicate counts are not checked
        if (!intList.contains(element)) {
            return false;
        }
    }
    return true;
}

Related

Stable sorting a vector using std::sort

So I have some code like this, I want to sort the vector based on id and put the last overridden element first:
struct Data {
int64_t id;
double value;
};
std::vector<Data> v;
// add some Datas to v
// add some 'override' Datas with duplicated `id`s
std::sort(v.begin(), v.end(),
[](const Data& a, const Data& b) {
if (a.id < b.id) {
return true;
} else if (b.id < a.id) {
return false;
}
return &a > &b;
});
Since vectors are contiguous, &a > &b should work to put the appended overrides first in the sorted vector, which should be equivalent to using std::stable_sort, but I am not sure if there is a state in the std::sort implementation where the equal values would be swapped such that the address of an element that appeared later in the original vector is earlier now. I don't want to use stable_sort because it is significantly slower for my use case. I have also considered adding a field to the struct that keeps track of the original index, but I will need to copy the vector for that.
It seems to work here: https://onlinegdb.com/Hk8z1giqX
std::sort gives no guarantees whatsoever on when elements are compared, and in practice, I strongly suspect most implementations will misbehave for your comparator.
The common std::sort implementation is either plain quicksort or a hybrid sort (quicksort switching to a different sort for small ranges), implemented in-place to avoid using extra memory. As such, the comparator will be invoked with the same element at different memory addresses as the sort progresses; you can't use memory addresses to implement a stable sort.
Either add the necessary info to make the sort innately stable (e.g. the suggested initial index value) or use std::stable_sort. Using memory addresses to stabilize the sort won't work.
For the record, having experimented a bit, I suspect your test case is too small to trigger the issue. At a guess, the hybrid sorting strategy works coincidentally for smallish vectors, but breaks down when the vector gets large enough for an actual quicksort to occur. Once I increase your vector size with some more filler, the stability disappears.
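A rough sketch of the index-based alternative mentioned above. It assumes you are free to add a field to the struct; original_index and sort_overrides_first are hypothetical names, not something from the question's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Data {
    std::int64_t id;
    double value;
    std::size_t original_index;  // hypothetical extra field, filled in below
};

// Sorts by id; among equal ids, the element that appeared later in the
// original vector comes first. Ties are broken by the recorded index, never
// by memory address, so the result does not depend on how std::sort moves
// elements around internally.
void sort_overrides_first(std::vector<Data>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) v[i].original_index = i;
    std::sort(v.begin(), v.end(), [](const Data& a, const Data& b) {
        if (a.id != b.id) return a.id < b.id;
        return a.original_index > b.original_index;  // later entries first
    });
}
```

Because every element has a distinct index, the comparator defines a total order, which is what makes plain std::sort safe here.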

How to add an element to the front of a vector in C++? [duplicate]

iterator insert ( iterator position, const T& x );
is the declaration of the insert member function of the std::vector class.
This function's return type is an iterator pointing to the inserted element. My question is, given this return type, what is the most efficient way (this is part of a larger program I am running where speed is of the essence, so I am looking for the most computationally efficient way) of inserting at the beginning. Is it the following?
//Code 1
vector<int> intvector;
vector<int>::iterator it;
it = intvector.begin();
for(int i = 1; i <= 100000; i++){
it = intvector.insert(it,i);
}
Or,
//Code 2
vector<int> intvector;
for(int i = 1; i <= 100000; i++){
intvector.insert(intvector.begin(),i);
}
Essentially, in Code 2, is the parameter,
intvector.begin()
"Costly" to evaluate computationally as compared to using the returned iterator in Code 1 or should both be equally cheap/costly?
If one of the critical needs of your program is to insert elements at the beginning of a container, then you should use a std::deque and not a std::vector. std::vector is only good at inserting elements at the end.
Other containers have been introduced in C++11. I should start to find an updated graph with these new containers and insert it here.
The efficiency of obtaining the insertion point won't matter in the least - it will be dwarfed by the inefficiency of constantly shuffling the existing data up every time you do an insertion.
Use std::deque for this, that's what it was designed for.
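For illustration, a small sketch of the std::deque version of the loop from the question (build_front_inserted is a made-up helper name):

```cpp
#include <deque>

// Inserts 1..n at the front one by one. Each push_front is a constant-time
// operation on a deque, so no existing elements are shifted.
std::deque<int> build_front_inserted(int n) {
    std::deque<int> d;
    for (int i = 1; i <= n; ++i) {
        d.push_front(i);
    }
    return d;
}
```

Calling build_front_inserted(100000) mirrors the 100000 insertions in the question; the last value inserted ends up at index 0.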
An old thread, but it showed up at a coworker's desk as the first search result for a Google query.
There is one alternative to using a deque that is worth considering:
std::vector<T> foo;
for (int i = 0; i < 100000; ++i)
foo.push_back(T());
std::reverse( foo.begin(), foo.end() );
You still use a vector which is significantly more engineered than deque for performance. Also, swaps (which is what reverse uses) are quite efficient. On the other hand, the complexity, while still linear, is increased by 50%.
As always, measure before you decide what to do.
If you're looking for a computationally efficient way of inserting at the front, then you probably want to use a deque instead of a vector.
Most likely deque is the appropriate solution as suggested by others. But just for completeness, suppose that you need to do this front-insertion just once, that elsewhere in the program you don't need to do other operations on the front, and that otherwise vector provides the interface you need. If all of those are true, you could add the items with the very efficient push_back and then reverse the vector to get everything in order. That would have linear complexity rather than the quadratic complexity of repeatedly inserting at the front.
When you use a vector, you usually know the actual number of elements it is going to have. In this case, sizing the vector up front with resize() (to 100000 in the case you show) and filling the elements with the [] operator is the fastest way; reserve() alone only allocates capacity, so with it you still have to add the elements via push_back. If you really need an efficient insert at the front, you can use deque or list, depending on your algorithms.
You may also consider inverting the logic of your algorithm and inserting at the end, that is usually faster for vectors.
I think you should change the type of your container if you really want to insert data at the beginning. It's the reason why vector does not have push_front() member function.
Intuitively, I agree with @Happy Green Kid Naps and ran a small test showing that for small sizes (1 << 10 elements of a primitive data type) it doesn't matter. For larger container sizes (1 << 20), however, std::deque seems to be of higher performance than reversing an std::vector. So, benchmark before you decide. Another factor might be the element type of the container.
Test 1: push_front (a) 1<<10 or (b) 1<<20 uint64_t into std::deque
Test 2: push_back (a) 1<<10 or (b) 1<<20 uint64_t into std::vector followed by std::reverse
Results:
Test 1 - deque (a) 19 µs
Test 2 - vector (a) 19 µs
Test 1 - deque (b) 6339 µs
Test 2 - vector (b) 10588 µs
With a fixed-capacity array you can support:
Insertion at the front.
Insertion at the end.
Changing the value at any position.
Accessing the value at any index.
All of the above operations run in O(1) time. (std::deque offers the same operations in constant time; the class below simply uses one flat array instead.)
Note: You just need to know an upper bound on how far the container can grow to the left and to the right.
#include <iostream>

class Vector{
public:
    int front,end;   // front: slot just before the first element; end: slot just past the last
    int arr[100100]; // you should set this according to 2*max_size
    Vector(int initialize){
        arr[100100/2] = initialize; // initializing value
        front = end = 100100/2;
        front--;end++;
    }
    void push_back(int val){
        if(end>=100100){return;} // no room left on the right
        arr[end] = val;
        end++;
    }
    void push_front(int val){
        if(front<0){return;} // no room left on the left; set the initial size accordingly
        arr[front] = val;
        front--;
    }
    int value(int idx){
        return arr[front+1+idx]; // front is one slot before the first element
    }
    // similarly, create a function to change the value at any index
};
int main(){
    Vector v(2);
    for(int i=1;i<100;i++){
        // O(1)
        v.push_front(i);
    }
    for(int i=0;i<20;i++){
        // to access the value in O(1)
        std::cout<<v.value(i)<<" ";
    }
    return 0;
}
This may draw the ire of some because it does not directly answer the question, but it may help to keep in mind that retrieving the items from a std::vector in reverse order is both easy and fast.
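As a small sketch of that idea: append with push_back as usual and walk the vector with reverse iterators when you need "newest first" order. The function name collect_newest_first is hypothetical, and the copy into a second vector is only there to make the traversal order visible.

```cpp
#include <vector>

// Visits the elements of v from last to first using rbegin()/rend().
// In real code you would typically use *it directly inside the loop
// instead of copying into a new vector.
std::vector<int> collect_newest_first(const std::vector<int>& v) {
    std::vector<int> out;
    out.reserve(v.size());
    for (auto it = v.rbegin(); it != v.rend(); ++it) {
        out.push_back(*it);
    }
    return out;
}
```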

How to remove almost duplicates from a vector in C++

I have an std::vector of floats that I want to not contain duplicates but the math that populates the vector isn't 100% precise. The vector has values that differ by a few hundredths but should be treated as the same point. For example here's some values in one of them:
...
X: -43.094505
X: -43.094501
X: -43.094498
...
What would be the best/most efficient way to remove duplicates from a vector like this?
First sort your vector using std::sort. Then use std::unique with a custom predicate to remove the duplicates.
std::unique(v.begin(), v.end(),
[](double l, double r) { return std::abs(l - r) < 0.01; });
// treats any numbers that differ by less than 0.01 as equal
Sorting is always a good first step. Use std::sort().
Remove not sufficiently unique elements: std::unique().
Last step, call resize() and maybe also shrink_to_fit().
If you want to preserve the order, do the previous 3 steps on a copy (omit shrinking though).
Then use std::remove_if on the original vector with a lambda that binary-searches the copy for each element: retain the element only if it is found in the copy, and remove it from the copy once found so that later near-duplicates are dropped.
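The sort, unique and shrink steps described above might be combined roughly as follows. This is a sketch, assuming the original order does not need to be preserved; remove_near_duplicates and the tolerance parameter are illustrative names. It uses erase() to drop the tail, which has the same effect as the resize() mentioned above.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sorts v, then keeps only the first value of each run of neighbours that
// differ by less than `tolerance`.
void remove_near_duplicates(std::vector<double>& v, double tolerance) {
    std::sort(v.begin(), v.end());
    // std::unique only moves the kept elements to the front and returns the
    // new logical end; the vector's size is unchanged until erase().
    auto new_end = std::unique(
        v.begin(), v.end(),
        [tolerance](double l, double r) { return std::abs(l - r) < tolerance; });
    v.erase(new_end, v.end());
}
```

With the sample values from the question and a tolerance of 0.01, the three values near -43.0945 collapse into one.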
I say std::sort() it, then go through it one by one and remove the values within certain margin.
You can have a separate write iterator to the same vector and one resize operation at the end - instead of calling erase() for each removed element or having another destination copy for increased performance and smaller memory usage.
If your vector cannot contain duplicates, it may be more appropriate to use an std::set. You can then use a custom comparison object to consider small changes as being inconsequential.
Hi, you could compare like this:
bool isAlmostEquals(const double &f1, const double &f2)
{
double allowedDif = xxxx; // placeholder: choose a tolerance that suits your data
return (std::fabs(f1 - f2) <= allowedDif); // std::fabs from <cmath>; a plain abs may resolve to the int overload
}
but it depends on your comparison range, and double precision is not on your side.
If your vector is sorted, you could use std::unique with this function as the predicate.
I would do the following:
Create a set<double>
go through your vector in a loop or using a functor
Round each element and insert into the set
Then you can swap your vector with an empty vector
Copy all elements from the set to the empty vector
The complexity of this approach will be n * log(n) but it's simpler and can be done in a few lines of code. The memory consumption will double from just storing the vector. In addition set consumes slightly more memory per each element than vector. However, you will destroy it after using.
std::vector<double> v;
v.push_back(-43.094505);
v.push_back(-43.094501);
v.push_back(-43.094498);
v.push_back(-45.093435);
std::set<double> s;
std::vector<double>::const_iterator it = v.begin();
for(;it != v.end(); ++it)
s.insert(floor(*it));
std::vector<double>().swap(v); // swap with an empty temporary; passing a temporary to v.swap() is not standard C++
v.resize(s.size());
std::copy(s.begin(), s.end(), v.begin());
The problem with most answers so far is that you have an unusual "equality". If A and B are similar but not identical, you want to treat them as equal. Basically, A and A+epsilon still compare as equal, but A+2*epsilon does not (for some unspecified epsilon). Or, depending on your algorithm, A*(1+epsilon) does and A*(1+2*epsilon) does not.
That does mean that A+epsilon compares equal to A+2*epsilon. Thus A = B and B = C does not imply A = C. This breaks common assumptions in <algorithm>.
You can still sort the values, that is a sane thing to do. But you have to consider what to do with a long range of similar values in the result. If the range is long enough, the difference between the first and last can still be large. There's no simple answer.
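To make the non-transitivity concrete, here is a tiny sketch with an absolute tolerance. The function name almost_equal and the numbers in the note below are chosen purely for illustration.

```cpp
#include <cmath>

// "Equal" here means: within epsilon of each other (absolute difference).
inline bool almost_equal(double a, double b, double epsilon) {
    return std::abs(a - b) <= epsilon;
}
```

Take epsilon = 1.0 and the values A = 0.0, B = 0.75, C = 1.5. Then almost_equal(A, B, 1.0) and almost_equal(B, C, 1.0) both hold, but almost_equal(A, C, 1.0) does not, so the relation is not an equivalence in the sense that std::set or std::unique normally assume.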

Adding object to vector with push_back working fine, but adding objects with accessor syntax [ ] , not working

I've implemented a merge function for vectors, which basically combines two sorted vectors into one sorted vector (yes, it is for a merge sort algorithm). I was trying to make my code faster and avoid overhead, so I decided not to use the push_back method on the vector, but to use the array syntax instead, which has less overhead. However, something is going terribly wrong, and the output is messed up when I do this. Here's the code:
while(size1<left.size() && size2 < right.size()) //left and right are the input vectors
{
//it1 and it2 are iterators on the two sorted input vectors
if(*it1 <= *it2)
{
final.push_back(*it1); //final is the final vector to output
//final[count] = *it1; // this does not work for some reason
it1++;
size1++;
//cout<<"count ="<<count<<" size1 ="<<size1<<endl;
}
else
{
final.push_back(*it2);
//final[count] = left[size2];
it2++;
size2++;
}
count++;
//cout<<"count ="<<count<<" size1 ="<<size1<<"size2 = "<<size2<<endl;
}
It seems to me that the two methods should be functionally equivalent.
PS: I have already reserved space for the final vector, so that shouldn't be a problem.
You can't add new objects to a vector using operator[]. .reserve() doesn't add them either. You have to use either .resize() or .push_back().
Also, you are not avoiding overhead at all; the call cost of operator[] isn't really much better than that of push_back(), so until you profile your code thoroughly, just use push_back. You can still use reserve to make sure unnecessary allocations won't be made.
In most of the cases, "optimizations" like this don't really help. If you want to make your code faster, profile it first and look for the hot paths.
There is a huge difference between
vector[i] = item;
and
vector.push_back(item);
Differences:
The first one modifies the element at index i and i must be valid index. That is,
0 <= i < vector.size() must be true
If i is an invalid index, the first one invokes undefined behavior, which means ANYTHING can happen. You could, however, use at() which throws exception if i is invalid:
vector.at(i) = item; //throws exception if i is invalid
The second one adds an element to the vector at the end, which means the size of the vector increases by one.
Since the two are semantically different operations, choose the one you need.
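The two valid combinations can be sketched side by side. The function names fill_by_index and fill_by_push_back are hypothetical; both are meant to produce the same contents.

```cpp
#include <cstddef>
#include <vector>

// resize() creates n elements, so indices 0..n-1 are valid for operator[].
std::vector<int> fill_by_index(std::size_t n) {
    std::vector<int> v;
    v.resize(n);  // size() == n, every element value-initialized to 0
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = static_cast<int>(i);
    }
    return v;
}

// reserve() only allocates capacity; size() stays 0, so elements must be
// appended with push_back (which then never needs to reallocate).
std::vector<int> fill_by_push_back(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);  // capacity() >= n, size() == 0
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
    }
    return v;
}
```

Writing v[i] after only calling reserve(), as in the question, touches elements that do not exist yet, which is the undefined behavior described above.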

Sorting a vector alongside another vector in C++

I am writing a function in C++ which will take in 2 vectors of doubles called xvalues and yvalues. My aim is to create an interpolation with these inputs. However, it would be really convenient if the (x,y) pairs were sorted so that the x-values were in increasing order and the y-values still corresponded to the correct x-value.
Does anyone know how I can do this efficiently?
I would probably create a vector of pairs and sort that by whatever means necessary.
It sounds like your data abstraction (2 separate collections for values that are actually "linked") is wrong.
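A possible sketch of the vector-of-pairs approach, assuming both input vectors have the same length. The function name sort_by_x is made up; xvalues and yvalues follow the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts the (x, y) pairs by x and writes the result back into both vectors.
void sort_by_x(std::vector<double>& xvalues, std::vector<double>& yvalues) {
    assert(xvalues.size() == yvalues.size());

    std::vector<std::pair<double, double>> points;
    points.reserve(xvalues.size());
    for (std::size_t i = 0; i < xvalues.size(); ++i) {
        points.emplace_back(xvalues[i], yvalues[i]);
    }

    // std::pair compares by .first, then by .second, so this orders by x
    // and breaks ties on y.
    std::sort(points.begin(), points.end());

    for (std::size_t i = 0; i < points.size(); ++i) {
        xvalues[i] = points[i].first;
        yvalues[i] = points[i].second;
    }
}
```

If the rest of the program can work with the pairs directly, keeping the data as a single std::vector of pairs avoids the copying back and forth.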
As an alternative, you could write some kind of iterator adaptor that internally holds two iterators and increments/decrements/assigns them simultaneously. It dereferences to a special type that, on swap, swaps the values in both vectors, but on comparison only compares one of them. This might be some work (an extra swap, operator<, and a class), but when done as a template it could pay off if you need this more often.
Alternatively, use a vector of pairs, which you can then sort easily with the STL sort algorithm, or write your own sort method. So you have several options.
Within your own sorting algorithm you can take care of sorting not only your x-vector but also the y-vector accordingly.
Here is an example using bubble sort for your two vectors (vec1 and vec2):
bool done = false;
while (!done) {
    done = true;
    // i + 1 < size() keeps at(i+1) in range and is safe for empty vectors
    for (std::size_t i = 0; i + 1 < vec1.size(); ++i) {
        if ( vec1.at(i) > vec1.at(i+1) ) {
            // swap the x-values ...
            double tmp = vec1.at(i);
            vec1.at(i) = vec1.at(i+1);
            vec1.at(i+1) = tmp;
            // ... and the matching y-values
            tmp = vec2.at(i);
            vec2.at(i) = vec2.at(i+1);
            vec2.at(i+1) = tmp;
            done = false;
        }
    }
}
But again, as others pointed out here, you should definitely use std::vector< std::pair<double, double> > and then just sort it.
The idea is easy: implement a sort algorithm (e.g. quicksort is easy, short and OK for most use cases; there are a lot of implementations available: http://www.java-samples.com/showtutorial.php?tutorialid=445 ).
Do the compare on your x-vector and
do the swap on both vectors.
The sort method has to take both vectors as input, but that should be a minor issue.
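Instead of hand-writing a quicksort, the same "compare on x, move both" idea can be sketched with std::sort over an index vector. Note that this swaps in a different technique from the one described above (an index permutation rather than in-place swaps); sort_parallel is a hypothetical name and the vectors are assumed to have equal length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sorts x ascending and reorders y so that y[i] still belongs to x[i].
void sort_parallel(std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());

    // order[k] will be the index of the k-th smallest x-value.
    std::vector<std::size_t> order(x.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&x](std::size_t a, std::size_t b) { return x[a] < x[b]; });

    // Apply the same permutation to both vectors.
    std::vector<double> sorted_x, sorted_y;
    sorted_x.reserve(x.size());
    sorted_y.reserve(y.size());
    for (std::size_t idx : order) {
        sorted_x.push_back(x[idx]);
        sorted_y.push_back(y[idx]);
    }
    x.swap(sorted_x);
    y.swap(sorted_y);
}
```

Only the x-values are ever compared; the y-values just follow the permutation.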