find the most similar value between two vectors in C++


I have two sorted vectors and I want to find the index of the value in vector1 that has the smallest difference (distance) to any value in vector2. The following code does the job; however, because the vectors I use are always sorted, I feel there must be a more efficient way to do the same thing. Any pointers? Thanks in advance.
#include<iostream>
#include<cmath>
#include<vector>
#include<limits>
std::vector<float> v1{2,3,6,7,9};
std::vector<float> v2{4,6.2,10};
int main(int argc, const char * argv[])
{
float mn=std::numeric_limits<float>::infinity();
float difference;
int index = -1; // stays -1 if either vector is empty
for(int i=0; i<v1.size(); i++){
for(int j=0; j<v2.size(); j++){
difference = std::abs(v1[i]-v2[j]); // std::abs picks the floating-point overload
if(difference < mn){
mn= difference;
index = i;
}
}
}
std::cout<< index;
// 2 is the wanted index because |6-6.2| is the smallest distance between the 2 vectors
return 0;
}

Indeed, there is a faster way. You only need to compare elements in v1 to those in v2 that are smaller or equal, or the first that is greater. Basically, the idea is to have two iterators, i and j, and advance j if v2[j] < v1[i], otherwise advance i. Here is a possible implementation:
for (int i = 0, j = 0; i < v1.size(); i++) {
while (true) {
difference = std::abs(v1[i] - v2[j]);
if (difference < mn) {
mn = difference;
index = i;
}
// Try the next item in v1 if the current item in v2 is bigger.
if (v2[j] > v1[i])
break;
// Otherwise, try the next item in v2, unless we are at the last item.
if (j + 1 < v2.size())
j++;
else
break;
}
}
While it still looks like a double loop, it only computes differences at most v1.size() + v2.size() times, instead of v1.size() * v2.size() times.
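For reference, here is a minimal self-contained sketch of the same two-index idea, written as a single merge-style loop. The function name closestIndex and the non-empty precondition are assumptions made for this illustration; they are not part of the answer above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index in v1 of the element that is closest to any element of v2.
// Assumes both vectors are sorted in ascending order and non-empty.
std::size_t closestIndex(const std::vector<float>& v1, const std::vector<float>& v2) {
    assert(!v1.empty() && !v2.empty());
    std::size_t best = 0;
    float bestDiff = std::abs(v1[0] - v2[0]);
    std::size_t i = 0, j = 0;
    while (i < v1.size() && j < v2.size()) {
        const float diff = std::abs(v1[i] - v2[j]);
        if (diff < bestDiff) {
            bestDiff = diff;
            best = i;
        }
        // Advance whichever side currently holds the smaller value.
        if (v1[i] < v2[j]) {
            ++i;
        } else {
            ++j;
        }
    }
    return best;
}
```

Each iteration advances exactly one of the two indices, so the loop body runs at most v1.size() + v2.size() times.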

Related

C++ sort ascending non-zero values

I'm a bit rusty with C++ and after a day of thinking I couldn't come up with an efficient way of solving this problem.
Suppose I have an array of 5 float values
lints[5]={0, 0.5, 3, 0, 0.6};
I would like to introduce a new array:
ranks[5] that contains the ascending rank of the non-0 values of the array lints.
in this case the answer would read
ranks[1]=0;
ranks[2]=1;
ranks[3]=3;
ranks[4]=0;
ranks[5]=2;
In this example the 0 values get rank 0, but they are not relevant since I only need the rank of the positive values.
Thanks in advance
edit:
Thanks to everybody for help, this is what I found suiting my needs in case you have the same task :)
double lengths[5], ranks[5];
double temp;
int i,j;
lengths[0] = 2,lengths[1] = 0,lengths[2] = 1,lengths[3] = 0,lengths[4] = 4;
ranks[0] = 1, ranks[1] = 2, ranks[2] = 3, ranks[3] = 4, ranks[4] = 5;
for(i=0;i<4;i++){
for(j=0;j<4-i;j++){
if((lengths[j]>lengths[j+1] && lengths[j+1]) || lengths[j]==0){
// swap lengths
temp=lengths[j];
lengths[j]=lengths[j+1];
lengths[j+1]=temp;
// swap ranks
temp=ranks[j];
ranks[j]=ranks[j+1];
ranks[j+1]=temp;
}
}
}
cheers.

You can use any sorting algorithm with a simple addition. When swapping 2 values you can swap index values too.
Create index values for initial indexes
ranks[5] = {1,2,3,4,5}; //or 0,1,2,3,4
for (int i = 0 ; i < 5 ; i++){
for(int j = 0 ; j < 5 ; j++){
//if array[i] < array[j]
//swap array[i] - array[j]
//swap ranks[i] - ranks[j]
}
}

As @cokceken said (I know answers shouldn't refer to other answers but I'm not a high enough Stack Overflow rank to comment on answers :/ ), use any simple sorting algorithm, and simply add in your own functionality for any special cases, such as values of 0 or negative values in your example.
For example, assuming you don't actually want to sort the original array and just create a new array that links indices in the array to their sorted rank,
float array[arraySize] = { /* insert values here */ };
int ranks[arraySize];
for (int i = 0; i < arraySize; i++){
int indexRank = 0;
for (int j = 0; j < arraySize; j++){
if (array[j] < array[i]){
indexRank++;
}
}
if (array[i] <= 0) {
ranks[i] = -1; // or whatever implementation you want here
} else {
ranks[i] = indexRank;
}
}
(note that arraySize must be a compile-time constant, since standard C++ does not allow a plain array to be declared with a run-time size)
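The "sort the indices along with the values" idea from these answers can also be sketched with std::vector and std::sort. This is only an illustration: the function name positiveRanks is hypothetical, and giving rank 0 to non-positive values follows the question's example rather than the answer above.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns, for each element, its 1-based ascending rank among the positive
// values; elements that are zero or negative get rank 0.
std::vector<int> positiveRanks(const std::vector<double>& values) {
    std::vector<std::size_t> order(values.size());
    std::iota(order.begin(), order.end(), 0);  // 0, 1, 2, ...
    // Sort the indices by the values they refer to; the input stays untouched.
    std::sort(order.begin(), order.end(),
              [&values](std::size_t a, std::size_t b) { return values[a] < values[b]; });

    std::vector<int> ranks(values.size(), 0);
    int nextRank = 1;
    for (std::size_t idx : order) {
        if (values[idx] > 0) {
            ranks[idx] = nextRank++;
        }
    }
    return ranks;
}
```

For the array {0, 0.5, 3, 0, 0.6} from the question this yields the ranks 0, 1, 3, 0, 2, using a single O(n log n) sort instead of a nested loop.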

I found this was easier if you keep separate values for the value, original position and the rank in a class:
#include <vector>
#include <iostream>
#include <algorithm>
struct Item {
float value;
int original_position;
int rank;
};
int main() {
float lints[5] = {0, 0.5, 3, 0, 0.6};
std::vector<Item> items{};
int index{};
for(auto i : lints)
items.push_back(Item{i,index++,0}); // assign index to original_position
std::sort(items.begin(), items.end(), [](auto& l, auto& r) {return l.value < r.value; }); // sort by float value
auto it = std::find_if(items.begin(), items.end(), [](auto& i) {return i.value > 0; }); // find first non-zero position (as iterator)
int new_rank_value{1}; // start numbering non-zero numbers from 1
std::for_each(it, items.end(), [&new_rank_value](auto& i) {i.rank = new_rank_value++; }); // assign non-zero numbers a rank value
std::sort(items.begin(), items.end(), [](auto& l, auto& r) {return l.original_position < r.original_position ; }); // sort by original position again
for(auto i : items)
std::cout << "ranks[" << i.original_position << "]=" << i.rank << ";\n";
}
Output:
ranks[0]=0;
ranks[1]=1;
ranks[2]=3;
ranks[3]=0;
ranks[4]=2;

Find similar distances between all values in vector and subset them

Given a vector of double values, I want to find which elements are separated by similar distances. Ideally, the result is a vector of subsets of the original values, where each subset has at least n members.
//given
vector<double> values = {1,2,3,4,8,10,12}; //with simple values as example
//some algorithm
//desired result as:
vector<vector<double> > subset;
//in case of above example I would expect some result like:
//subset[0] = {1,2,3,4}; //distance 1
//subset[1] = {8,10,12}; //distance 2
//subset[2] = {4,8,12}; // distance 4
//subset[3] = {2,4}; //also distance 2 but not connected with subset[1]
//subset[4] = {1,3}; //also distance 2 but not connected with subset[1] or subset[3]
//many others if n is just 2. If n is 3 (normally the minimum) these small subsets should be excluded.
This example is simplified: with integers, the candidate distances could simply be enumerated and tested against the vector, which is not possible for double or float values.
My idea so far
I thought of something like calculating the distances and storing them in a vector. Creating a difference distance matrix and thresholding this matrix for some tolerance for similar distances.
//Calculate distances: result is a vector
vector<double> distances;
for (int i = 0; i < values.size(); i++)
for (int j = 0; j < values.size(); j++)
{
if (i >= j)
continue;
distances.push_back(abs(values[i] - values[j]));
}
//Calculate difference of these distances: result is a matrix
Mat DiffDistances = Mat::zeros(Size(distances.size(), distances.size()), CV_32FC1);
for (int i = 0; i < distances.size(); i++)
for (int j = 0; j < distances.size(); j++)
{
if (i >= j)
continue;
DiffDistances.at<float>(i,j) = abs(distances[i] - distances[j]);
}
//threshold this matrix with some tolerance in difference distances
threshold(DiffDistances, DiffDistances, maxDistTol, 255, CV_THRESH_BINARY_INV);
//get points with similar distances
vector<Point> DiffDistancePoints;
findNonZero(DiffDistances, DiffDistancePoints);
At this point I get stuck with finding the original values corresponding to my similar distances. It should be possible to find them, but it seems very complicated to trace back the indices and I wonder if there isn't an easier way to solve the problem.
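One way to sidestep the index tracing, sketched here as an assumption rather than as part of the original question, is to store the two source indices next to every distance. The struct PairDistance and the function sortedPairDistances are hypothetical names for this illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct PairDistance {
    std::size_t i;    // index of the first value
    std::size_t j;    // index of the second value (j > i)
    double distance;  // |values[i] - values[j]|
};

// Computes all pairwise distances together with the indices that produced
// them, sorted by distance so that similar distances end up next to each other.
std::vector<PairDistance> sortedPairDistances(const std::vector<double>& values) {
    std::vector<PairDistance> pairs;
    for (std::size_t i = 0; i < values.size(); ++i) {
        for (std::size_t j = i + 1; j < values.size(); ++j) {
            pairs.push_back({i, j, std::abs(values[i] - values[j])});
        }
    }
    std::sort(pairs.begin(), pairs.end(),
              [](const PairDistance& a, const PairDistance& b) {
                  return a.distance < b.distance;
              });
    return pairs;
}
```

After sorting, a run of neighboring entries whose distances differ by less than the tolerance is a candidate group, and each entry still carries the indices of the original values.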

Here is a solution that works as long as there are no branches, meaning that there are no values closer together than 2*threshold. That is the valid neighbor region, because neighboring bonds should differ by less than the threshold, if I understood @Phann correctly.
The solution is definitely neither the fastest nor the nicest possible solution. But you might use it as a starting point:
#include <iostream>
#include <vector>
#include <algorithm>
int main(){
std::vector< double > values = {1,2,3,4,8,10,12};
const unsigned int nValues = values.size();
std::vector< std::vector< double > > distanceMatrix(nValues - 1);
// The distanceMatrix has a triangular shape
// First vector contains all distances to value zero
// Second row all distances to value one for larger values
// nth row all distances to value n-1 except those already covered
std::vector< std::vector< double > > similarDistanceSubsets;
double threshold = 0.05;
std::sort(values.begin(), values.end());
for (unsigned int i = 0; i < nValues-1; ++i) {
distanceMatrix.at(i).resize(nValues-i-1);
for (unsigned j = i+1; j < nValues; ++j){
distanceMatrix.at(i).at(j-i-1) = values.at(j) - values.at(i);
}
}
for (unsigned int i = 0; i < nValues-1; ++i) {
for (unsigned int j = i+1; j < nValues; ++j) {
std::vector< double > thisSubset;
double thisDist = distanceMatrix.at(i).at(j-i-1);
// This distance already belongs to another cluster
if (thisDist < 0) continue;
double minDist = thisDist - threshold;
double maxDist = thisDist + threshold;
thisSubset.push_back(values.at(i));
thisSubset.push_back(values.at(j));
//Indicate that this is already clustered
distanceMatrix.at(i).at(j-i-1) = -1;
unsigned int lastIndex = j;
for (unsigned int k = j+1; k < nValues; ++k) {
thisDist = distanceMatrix.at(lastIndex).at(k-lastIndex-1);
// This distance already belongs to another cluster
if (thisDist < 0) continue;
// Check if you found a new valid pair
if ((thisDist > minDist) && (thisDist < maxDist)){
// Update the valid distance interval
minDist = thisDist - threshold;
maxDist = thisDist + threshold;
// Add the newly found point
thisSubset.push_back(values.at(k));
// Indicate that this is already clustered
distanceMatrix.at(lastIndex).at(k-lastIndex-1) = -1;
// Continue the search from here
lastIndex = k;
}
}
if (thisSubset.size() > 2) {
similarDistanceSubsets.push_back(thisSubset);
}
}
}
for (unsigned int i = 0; i < similarDistanceSubsets.size(); ++i) {
for (unsigned int j = 0; j < similarDistanceSubsets.at(i).size(); ++j) {
std::cout << similarDistanceSubsets.at(i).at(j);
if (j != similarDistanceSubsets.at(i).size()-1) {
std::cout << " ";
}
else {
std::cout << std::endl;
}
}
}
}
The idea is to precompute the distances and then look for every pair of particles, starting from the smallest and its larger neighbors, if there is another valid pair above it. If so these are all collected in a subset and this is added to the subset vector. For every new value the valid neighbor region has to be updated to ensure that neighboring distances differ by less than the threshold. Afterwards, the program continues with the next smallest value and its larger neighbors and so on.

Here is an algorithm which is slightly different from yours, which is O(n^3) in the length n of the vector - not very efficient.
It is based on the premise that you want to have subsets of at least size 2. So what you can do is consider all the two-element subsets of the vector, then find all other elements that also match.
So given a function
std::vector<int> findSubset(std::vector<int> v, int baseValue, int distance) {
// Find the subset of all elements in v that differ by a multiple of
// distance from the base value
}
you can do
std::vector<std::vector<int>> findSubsets(std::vector<int> v) {
std::vector<std::vector<int>> subsets; // one entry per (base value, distance) pair
for(int i = 0; i < v.size(); i++) {
for(int j = i + 1; j < v.size(); j++) {
subsets.push_back(findSubset(v, v[i], abs(v[i] - v[j])));
}
}
return subsets;
}
Only remaining problem is keeping track of the duplicates, maybe you can keep a hashed list of (baseValue % distance, distance) pairs for all the subsets you have already found.
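The body of findSubset is left open above. A minimal sketch for integer input might look like the following; treating a distance of 0 as "equal values only" is an assumption added here to avoid a division by zero when the vector contains duplicates.

```cpp
#include <vector>

// Collects every element of v whose offset from baseValue is a whole multiple
// of distance. A distance of 0 matches only values equal to baseValue.
std::vector<int> findSubset(const std::vector<int>& v, int baseValue, int distance) {
    std::vector<int> subset;
    for (int x : v) {
        const int offset = x - baseValue;
        const bool matches = (distance == 0) ? (offset == 0)
                                             : (offset % distance == 0);
        if (matches) {
            subset.push_back(x);
        }
    }
    return subset;
}
```

Note that this accepts any multiple of the distance, so it does not check that the matching elements are consecutive: for the example vector, base value 8 with distance 2 also picks up 2 and 4. For double values the modulo test would have to be replaced by a tolerance check.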

c++ iterate through all neighbor permutations

I have a vector of N objects, and I would like to iterate through all neighbor permutations of this vector. What I call a neighbor permutation is a permutation where only two elements of the original vector would be changed :
if I have a vector with 'a','b','c','d' then :
'b','a','c','d' //is good
'a','c','b','d' //is good
'b','a','d','c' //is not good (2 permutations)
If I use std::next_permutation(myVector.begin(), myVector.end()) then I will get all the possible permutations, not only the "neighbor" ones...
Do you have any idea how that could be achieved ?

Initially, I thought I would filter out the permutations that have a Hamming distance greater than 2.
However, if you really only need to generate all the vectors resulting by swapping one pair, it would be more efficient if you do like this:
for(int i = 0; i < n; i++)
for(int j = i + 1; j < n; j++)
// swap i and j
Depending on whether you need to collect all the results or not, you should either make a copy of the vector before the swap, or swap i and j back after you have processed the current permutation.
Collect all the results:
std::vector< std::vector<T> > neighbor_permutations;
for(int i = 0; i < n; i++) {
for(int j = i + 1; j < n; j++) {
std::vector<T> perm(v);
std::swap(perm[i], perm[j]);
neighbor_permutations.push_back(perm);
}
}
Faster version - do not collect results:
for(int i = 0; i < n; i++) {
for(int j = i + 1; j < n; j++) {
std::swap(v[i], v[j]);
process_permutation(v);
std::swap(v[i], v[j]);
}
}

Perhaps it's a good idea to divide this into two parts:
How to generate the "neighbor permutations"
How to iterate over them
Regarding the first, it's easy to write a function:
std::vector<T> make_neighbor_permutation(
const std::vector<T> &orig, std::size_t i, std::size_t j);
which swaps i and j. I did not understand from your question if there's an additional constraint that j = i + 1, in which case you could drop a parameter.
Armed with this function, you now need an iterator that iterates over all legal combinations of i and j (again, I'm not sure of the interpretation of your question. It might be that there are n - 1 values).
This is very easy to do using boost::iterator_facade. You simply need to define an iterator that takes in the constructor your original iterator, and sets i (and possibly j) to initial values. As it is incremented, it needs to update the index (or indices). The dereference method needs to call the above function.
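As a rough sketch of this two-part split without Boost, the function declared above can be given a body, and a plain list of index pairs can stand in for the iterator. The helper neighbor_index_pairs is a hypothetical name, and the sketch assumes that any pair i < j is allowed, not only adjacent positions.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Part 1: returns a copy of orig with the elements at positions i and j exchanged.
template <typename T>
std::vector<T> make_neighbor_permutation(const std::vector<T>& orig,
                                         std::size_t i, std::size_t j) {
    std::vector<T> result(orig);
    std::swap(result[i], result[j]);
    return result;
}

// Part 2: lists every index pair (i, j) with i < j for a vector of length n.
std::vector<std::pair<std::size_t, std::size_t>> neighbor_index_pairs(std::size_t n) {
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            pairs.emplace_back(i, j);
        }
    }
    return pairs;
}
```

Iterating over neighbor_index_pairs(v.size()) and calling make_neighbor_permutation(v, p.first, p.second) for each pair p visits all n*(n-1)/2 neighbor permutations. If only adjacent swaps are wanted, the inner loop reduces to j = i + 1.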

Another way to get it, just a try.
int main()
{
std::vector<char> vec={'b','a','c','d'};
std::vector<int> vec_in={1,1,0,0};
do{
auto it =std::find(vec_in.begin(),vec_in.end(),1);
if( *(it++) ==1)
{
for(auto &x : vec)
{
std::cout<<x<<" ";
}
std::cout<<"\n";
}
} while(std::next_permutation(vec_in.begin(),vec_in.end()),
std::next_permutation(vec.begin(),vec.end()) );
}

Extract the n lowest sums from combinations of elements from m arrays for huge datasets

Let's say you have a number of unsorted arrays containing integers. Your job is to make sums of the arrays. The sums have to contain exactly one value from each array, i.e. (for 3 arrays)
sum = array1[2]+array2[12]+array3[4];
Goal: You should output the 20 combinations that generate the lowest possible sums.
The solution below is off-limits as the algorithm needs to be able to handle 10 arrays that can contain a huge number of integers. The following solution is way too slow for larger number of arrays:
//You already have int array1, array2 and array3
int top[20];
for(int i=0; i<20; i++)
top[i] = std::numeric_limits<int>::max(); // 1e99 does not fit in an int; needs <limits>
int sum = 0;
for(int i=0; i<array1.size(); i++) //One for loop per array is trouble for
for(int j=0; j<array2.size(); j++) //increasing numbers of arrays
for(int k=0; k<array3.size(); k++)
{
sum = array1[i] + array2[j] + array3[k];
if (sum < top[19])
swapFunction(sum, top); //Function that adds sum to top
//and sorts top in increasing order
}
printResults(top); // Outputs top 20 lowest sums in increasing order
What would you do to achieve correct results more efficiently (with a lower Big O notation)?

The answer can be found by considering how to find the absolute lowest sum, and how to find the 2nd lowest sum and so on.
As you only need 20 sums at most, you only need the lowest 20 values from each array at most. I would recommend using std::partial_sort for this.
The rest can be accomplished with a priority_queue in which each element contains the current sum and the indices into the arrays for this sum. Simply take each index in indices and increase it by one, calculate the new sum and add that to the priority queue. The topmost item of the queue should always be the one with the lowest sum. Remove the lowest sum, generate the next possibilities, and then repeat until you have enough answers.
Assuming that the number of answers needed (k) is much smaller than the array size (N), the running time should be dominated by partial_sort, which needs roughly N*log(k) comparisons per array, times the number of arrays.
Here's some basic code to demonstrate the idea. There are very likely ways of improving on this. For example, I'm sure that with some work, you could avoid adding the same set of indices multiple times, and thereby eliminate the need for the do-while pop.
for (size_t i = 0; i < arrays.size(); i++)
{
auto b = arrays[i].begin();
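// Never sort past the end of an array that has fewer than numAnswers elements.
partial_sort(b, b + min<size_t>(numAnswers, arrays[i].size()), arrays[i].end());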
}
struct answer
{
answer(int s, vector<int> i)
: sum(s), indices(i)
{
}
int sum;
vector<int> indices;
bool operator <(const answer &o) const
{
return sum > o.sum;
}
};
auto getSum =[&arrays](const vector<int> &indices) {
auto retval = 0;
for (size_t i = 0; i < arrays.size(); i++)
{
retval += arrays[i][indices[i]];
}
return retval;
};
vector<int> initalIndices(arrays.size());
priority_queue<answer> q;
q.emplace(getSum(initalIndices), initalIndices );
for (auto i = 0; i < numAnswers; i++)
{
auto ans = q.top();
cout << ans.sum << endl;
do
{
q.pop();
} while (!q.empty() && q.top().indices == ans.indices);
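for (size_t k = 0; k < ans.indices.size(); k++)
{
auto nextIndices = ans.indices;
nextIndices[k]++;
// Skip moves that would step past the end of array k.
if (static_cast<size_t>(nextIndices[k]) >= arrays[k].size())
continue;
q.emplace(getSum(nextIndices), nextIndices);
}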
}
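The improvement hinted at above, never queueing the same index set twice, could be sketched as follows. This is an illustrative variant and not the answer's code: the function name smallestSums, the std::set of visited index vectors, and the use of long long for the sums are all assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Returns up to k of the smallest sums formed by picking one element per array.
std::vector<long long> smallestSums(std::vector<std::vector<int>> arrays, std::size_t k) {
    std::vector<long long> result;
    if (k == 0 || arrays.empty()) return result;
    for (auto& a : arrays) {
        if (a.empty()) return result;  // no valid combination exists
        // Only the k smallest values of each array can contribute.
        const std::size_t keep = std::min(k, a.size());
        std::partial_sort(a.begin(), a.begin() + static_cast<std::ptrdiff_t>(keep), a.end());
        a.resize(keep);
    }

    using State = std::pair<long long, std::vector<std::size_t>>;  // (sum, indices)
    std::priority_queue<State, std::vector<State>, std::greater<State>> heap;  // min-heap
    std::set<std::vector<std::size_t>> seen;  // index sets already queued

    std::vector<std::size_t> start(arrays.size(), 0);
    long long startSum = 0;
    for (const auto& a : arrays) startSum += a[0];
    heap.emplace(startSum, start);
    seen.insert(start);

    while (!heap.empty() && result.size() < k) {
        const State current = heap.top();
        heap.pop();
        result.push_back(current.first);
        for (std::size_t d = 0; d < arrays.size(); ++d) {
            std::vector<std::size_t> next = current.second;
            if (++next[d] >= arrays[d].size()) continue;  // past the end of array d
            if (!seen.insert(next).second) continue;      // already queued
            // Update the sum incrementally instead of recomputing it.
            const long long nextSum =
                current.first - arrays[d][next[d] - 1] + arrays[d][next[d]];
            heap.emplace(nextSum, std::move(next));
        }
    }
    return result;
}
```

Because every index set enters the heap at most once, each pop yields a new answer and the do-while loop that discards duplicates is no longer needed.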

sum of squares matrices

I want to write a function that, given two matrices, returns their sum. I think the problem is in how I initialize the Matrix 't'.
#include <iostream>
#include <vector>
using namespace std;
typedef vector< vector<int> > Matrix;
Matrix sum(const Matrix&a,const Matrix&b){
Matrix t;
for(int i=0;i<a.size();i++)
for(int j=0;j<a.size();j++)
t[i][j] = a[i][j] + b[i][j];
return t;
}

You'll need to initialize the rows and columns of t with something like:
Matrix t = vector< vector<int> >(row_count, vector<int>(col_count, 0));
That will make a row_count by col_count matrix filled with zeroes.
On a side note about performance: the loop condition calls .size() before every iteration. For std::vector this is a constant-time call that compilers typically inline, so the gain is usually small, but you can hoist it out of the loop like so:
for (int row = 0, row_ct = mat.size(); row < row_ct; ++row)

You don't have a rectangular data set in general: each a[i] is a vector of a possibly different length. Supposing you do in fact take care to have a rectangular grid, your for loop is still off; it should be like this:
for (int i = 0; i < a.size(); i++)
{
assert(a.size() <= b.size() && a.size() <= t.size());
for (int j = 0; j < a[i].size(); j++) // !!
{
assert(a[i].size() <= b[i].size() && a[i].size() <= t[i].size());
t[i][j] = a[i][j] + b[i][j];
}
}
I added some assertions to indicate which preconditions you have to satisfy.
To initialize a rectangular array, you can do something like this:
std::vector<std::vector<int>> v(n_rows, std::vector<int>(n_cols, 0));
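Putting both answers together, a complete version of the function from the question might look like this sketch. It assumes that both inputs have the same shape and sizes each row of t from the matching row of a, so ragged input is handled as well.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Returns the element-wise sum of two matrices with identical shapes.
Matrix sum(const Matrix& a, const Matrix& b) {
    assert(a.size() == b.size());
    Matrix t(a.size());  // one (still empty) row per row of a
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(a[i].size() == b[i].size());
        t[i].resize(a[i].size());  // allocate the row before writing to it
        for (std::size_t j = 0; j < a[i].size(); ++j) {
            t[i][j] = a[i][j] + b[i][j];
        }
    }
    return t;
}
```

The original code failed because t was empty, so t[i][j] accessed elements that did not exist; here every row is allocated before it is written.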
