how to correctly call std::lower_bound()? - c++

I'm plotting a data file with two columns (frequency and voltage) and I need to find the value closest to a given value val. The thing is that my data behaves like a Gaussian, so there are two values that satisfy that, one below and one above the max value. First I read each column of the data file into a vector; with that, I defined this function for finding those values:
#include<iostream>
#include<vector>
#include<cmath>
typedef std::vector <double> vector;
vector posi(vector vec, int ref, double val);
int main(void){
//define a custom volt vector here e.g vector volt{...};
auto auxvmax = std::max_element(volt.begin(), volt.end());
int posvmax = auxvmax - volt.begin();//this is what I take as ref value
double val = 0.7;
vector fpos(2, 0.0);
fpos = posi(volt, posvmax, val);
double auxf_1 = fpos[0];
double auxf_2 = fpos[1];
std::cout << "closest value to " << val << " are " << volt[auxf_1] << " below and " << volt[auxf_2] << " above\n";
return 0;
}
vector posi(vector vec, int ref, double val){
vector posvec(2, 0.0);
auto pos1 = std::lower_bound(vec.begin(), vec.begin() + ref, val);
auto pos2 = std::lower_bound(vec.begin() + ref, vec.end(), val);
double val1a = *(pos1 - 1.0);
double val1b = *pos1;
double val2a = *(pos2 - 1.0);
double val2b = *pos2;
if(fabs(val - val1a) < fabs(val - val1b)){
posvec[0] = pos1 - vec.begin() - 1;
}
if(fabs(val - val1a) > fabs(val1b)){
posvec[0] = pos1 - vec.begin();
}
if(fabs(val - val2a) < fabs(val - val2b)){
posvec[1] = pos2 - vec.begin() - 1;
}
if(fabs(val - val2a) > fabs(val - val2b)){
posvec[1] = pos2 - vec.begin();
}
return posvec;
}
Let me explain how and why I constructed the function like this, so you can tell me where am I wrong.
Basically, I'm trying to use std::lower_bound() in two "regions" of the vector, so that the program looks for only one closest value in each region. I know where the max value is (in the main function, via std::max_element()), so I can easily make the split. ref is the position (index) of the max value in the vec vector, so the function is supposed to get the closest value to val below and above that ref position.
Next, it only makes sure that the value is the closest, considering the first value not less than val (the one returned by std::lower_bound()) and the previous one, and taking the closer of the two in each region. Finally, the position in the vector of each value is stored in the posvec vector (I have to get the frequencies where the voltage is closest to val, so I don't need the voltage value but its position, since freq and volt make pairs).
When I compile, it gives no errors and no warnings, but the closest values are not the closest. I found (cppreference) that the range std::lower_bound() gets must be sorted from lowest to highest. My data is sorted from lowest to highest below the max value, and from highest to lowest above the max value, so there should be a problem with the data above the max. But, and here is my question, why am I not getting the closest value even with the data below the max? Am I missing something about the behavior of std::lower_bound()? Is it maybe something in the if statements? The output, together with a printout of an example voltage vector, is below.
Here you can see the "closest" values are indeed the furthest.
Thanks for your help in advance.
EDIT: asked in comments, the output is
closest values to 0.7 are 0.485437 below, and 0.320388 above
0.485437
0.500971
0.524272
0.543689
0.563107
0.594175
0.617476
0.648544
0.679612
0.71068
0.741748
0.786408
0.825243
0.864078
0.893204
0.932039
0.961165
0.980583
0.990291
1
0.990291
0.961165
0.941748
0.893204
0.854369
0.805825
0.757282
0.708738
0.669903
0.621359
0.582524
0.547573
0.512621
0.481553
0.454369
0.427184
0.403883
0.384466
0.361165
0.341748
0.320388
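One possible reading of this output, offered as a sketch rather than a definitive diagnosis: in the function above, the second `if` compares against `fabs(val1b)` instead of `fabs(val - val1b)`, so for this data neither of the first two branches fires and `posvec[0]` keeps its initial value 0, which is the first element. For the falling half, std::lower_bound() needs to be told the ordering, for example by passing std::greater<double>() as the comparator. The helper names `closestIndex` and `closestOnBothSides` below are hypothetical, and the code assumes the data rises monotonically up to the maximum and falls monotonically after it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper (not from the question): index of the element in
// v[first, last) that is closest to val. The sub-range must be non-empty and
// ordered according to comp (std::less for rising data, std::greater for
// falling data).
template <class Comp>
std::size_t closestIndex(const std::vector<double>& v, std::size_t first,
                         std::size_t last, double val, Comp comp) {
    assert(first < last && last <= v.size());
    auto b = v.begin() + static_cast<std::ptrdiff_t>(first);
    auto e = v.begin() + static_cast<std::ptrdiff_t>(last);
    auto it = std::lower_bound(b, e, val, comp);
    if (it == e) return last - 1;  // val lies past the end of the sub-range
    if (it == b) return first;     // val lies before the start of the sub-range
    auto prev = std::prev(it);
    // Pick whichever neighbour of the insertion point is nearer to val.
    auto best = (std::fabs(val - *prev) <= std::fabs(val - *it)) ? prev : it;
    return static_cast<std::size_t>(best - v.begin());
}

// Returns {index on the rising side, index on the falling side}.
// Assumption: volt rises up to its maximum and falls afterwards.
std::pair<std::size_t, std::size_t>
closestOnBothSides(const std::vector<double>& volt, double val) {
    assert(!volt.empty());
    const std::size_t peak = static_cast<std::size_t>(
        std::max_element(volt.begin(), volt.end()) - volt.begin());
    const std::size_t rising =
        closestIndex(volt, 0, peak + 1, val, std::less<double>());
    const std::size_t falling =
        closestIndex(volt, peak, volt.size(), val, std::greater<double>());
    return {rising, falling};
}
```

The returned indices can be used directly to look up the matching frequencies, since freq and volt are paired by position.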

Related

How to optimize the sorting of a vector with coordinates based on distance?

I need to sort a vector of coordinates (x, y >= 1) in a way that every next point from the vector is the closest one to the previous by calculating the distance with the formula from getDistance().
My current solution is too slow as I need the program to be able to finish in 5 seconds or less with vector length (N) equal to 100 000.
struct Point {
int ind;
int x;
int y;
double dist;
};
double getDist(int x1, int y1, int x2, int y2) {
return sqrt((x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2));
}
vector<Point> cordSort(vector<Point> vect) {
vector<Point> finalDistVect;
finalDistVect.push_back(vect[0]);
Point firstPoint = vect[0];
vect.erase(vect.begin());
for (i = 0; i < pcVect.size() - 1; i++) {
sort(vect.begin(), vect.end(), [firstPoint](const Point & a, const Point & b) {
return getDist(firstPoint.x, firstPoint.y, a.x, a.y) < getDist(firstPoint.x, firstPoint.y, b.x, b.y);
});
finalDistVect.push_back(vect[0]);
finalDistVect[i].dist = getDist(firstPoint.x, firstPoint.y, vect[0].x, vect[0].y);
firstPoint = vect[0];
vect.erase(vect.begin());
}
return finalDistVect;
}
vect is the initial vector with coordinates sorted by:
sort(vect.begin(), vect.end(), [](const Point & a, const Point & b) {
if (a.x + a.y != b.x + b.y) {
return a.x + a.y < b.x + b.y;
}
return a.x < b.x;
});
I am thinking about implementing bucket sort but I don't know if it will work for my problem.
Your implementation is indeed inefficient because it repeatedly erases the first element of a vector. std::vector is not designed for frequently erasing elements anywhere other than the back.
I'm not sure I read your algorithm correctly. The first element is predetermined as the first element of the input; then your program repeatedly finds the point with the shortest distance from the last element (point) in the output vector.
If that's the case, there is no benefit in sorting at all.
A naive algorithm is like bubbling:
1. add first point to outputVector, add all rest to openSet;
2. if openSet is empty, we are done.
3. take the last point from output Vector, check with all points in `openSet`, to find the one with shortest distance from it, add it to outputVector, remove it from openSet;
4. goto 2
Basically, I am recommending that you use a std::set to keep track of openSet, or maybe even better, a std::unordered_set.
Another way is to do it in place, just swap the chosen points with the one who is taking its place.
e.g. we have P0, P1, P2, P3, P4 as input in a vector
1. int pos = 1; // P0 is in right place, we are looking for
// the point that shall go to index 1;
2. check all points from index `pos` to `4` (which is the max index) and find the one with the shortest distance from the point at index `pos - 1` (`P0` on the first pass); let's say we get `P4`;
3. swap `P1` (the one at index `pos`) and `P4` (the chosen one);
4. ++pos;
5. if pos != 4 (max index), goto 2.
We are using pos to keep track of sorted and open and do it in place.
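The in-place steps above could be sketched roughly as follows. `Pt`, `distSq` and `orderByNearest` are illustrative names, not taken from the question, and the sketch compares squared distances so that no sqrt is needed. Note that it is still O(N^2), so it only removes the cost of the repeated sort and erase; it may not be fast enough on its own for N = 100 000.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified stand-in for the question's Point (assumption: only x and y matter here).
struct Pt {
    int x;
    int y;
};

// Squared distance is enough for comparing distances and avoids sqrt.
long long distSq(const Pt& a, const Pt& b) {
    const long long dx = static_cast<long long>(a.x) - b.x;
    const long long dy = static_cast<long long>(a.y) - b.y;
    return dx * dx + dy * dy;
}

// Greedy nearest-neighbour ordering done in place: pts[0] stays where it is,
// and every later slot receives the remaining point closest to the previous slot.
void orderByNearest(std::vector<Pt>& pts) {
    for (std::size_t pos = 1; pos < pts.size(); ++pos) {
        std::size_t best = pos;
        for (std::size_t k = pos + 1; k < pts.size(); ++k) {
            if (distSq(pts[pos - 1], pts[k]) < distSq(pts[pos - 1], pts[best])) {
                best = k;
            }
        }
        std::swap(pts[pos], pts[best]);  // everything before pos + 1 is now final
    }
}
```

Swapping instead of erasing keeps each step free of element shifting, which is the point of tracking the boundary with pos.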
This is not a sorting problem, you have to come up with a different algorithm. For example, there may be no solution at all, which would not be the case for a sorting problem.
Consider four points on a single line: A=(1, 1), B=(100, 1), C=(101, 1), D=(1000, 1). Point D is not the closest point for any other point, so it should come first. Then we should put C, followed by B, and now we cannot put A because the closest point to B is actually C, not A.
And even if it was, you should come up with a faster algorithm. You have N iterations of the for loop, each iteration looks for the smallest element among N other elements, which is already at least O(N^2). Using sort instead of min_element makes it even worse: O(N^2 log N). This won't fly for N ~ 100'000 at all in competitive programming/home assignments.

idiom for iterating a range of angles in a container?

Suppose I have data of 2D lines in the form
struct Point { int x, y; };
struct Line {
Point p1, p2;
double angle() const { return atan2(p2.y-p1.y, p2.x-p1.x); }
};
And I want to store these sorted by angle, which must be in the interval (-PI, PI].
My problem: I want to iterate a range in this container, but allow it to wrap around the ends of the interval. For example "all lines between angles PI*3/4 to -PI*3/4".
To clarify, if I use standard containers like multimap, I can't simply do the usual:
std::multimap<double, Line> lm;
//insert elements...
auto begin = lm.lower_bound(PI*3/4);
auto end = lm.upper_bound(-PI*3/4);
for(auto & i = begin; i != end; ++i) { //infinite loop: end is before begin!
//do stuff with i
}
I could hack up a "circularly iterate i" function to take the place of the ++i in the loop, I guess. But this seems like it should be a common problem, so I'm wondering if there's already an existing idiom to address it?
There is a trigonometric approach to problems with circular ranges. For the range, normalize its ends, then compute the middle angle and the half-angle:
if range_end < range_start then
range_end = range_end + 2 * Pi
half = (range_end - range_start) / 2
mid = (range_end + range_start) / 2
coshalf = Cos(half)
Now check whether the difference between the angle and the range middle is less than the half-angle. The cosine takes care of potential trouble with periodicity, negative values, etc.
if Cos(angle - mid) >= coshalf then
angle lies in range
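The pseudocode above translates to C++ more or less directly. This is a minimal sketch: the function name `angleInRange` is made up for illustration, and it assumes both range ends are already normalized to [-PI, PI] and that the range runs counter-clockwise from rangeStart to rangeEnd.

```cpp
#include <cassert>
#include <cmath>

// True if angle lies on the arc that runs counter-clockwise from rangeStart
// to rangeEnd. All angles are in radians.
// Assumption: rangeStart and rangeEnd are already normalized to [-PI, PI].
bool angleInRange(double angle, double rangeStart, double rangeEnd) {
    const double pi = std::acos(-1.0);
    assert(std::fabs(rangeStart) <= pi && std::fabs(rangeEnd) <= pi);
    if (rangeEnd < rangeStart) {
        rangeEnd += 2.0 * pi;  // unwrap a range that crosses the +PI/-PI seam
    }
    const double half = (rangeEnd - rangeStart) / 2.0;
    const double mid = (rangeEnd + rangeStart) / 2.0;
    // cos is periodic and even, so no further normalization of angle is needed.
    return std::cos(angle - mid) >= std::cos(half);
}
```

With the multimap from the question, one option would be to iterate over all entries and keep those for which this test passes; that is linear in the container size, so it trades speed for simplicity.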

c++ armadillo cast/convert to integer type vector or matrix

How can I convert a double/float-typed vector or matrix to a word/uword-typed vector or matrix?
I need to create an indexing array indices.
vec t = linspace(0, 100);
double freq = 0.25;
indices = floor(t / freq);
I'm having trouble on the last line.
If you are just dealing with positive values, then the conv_to function of the armadillo package will do exactly the same as the method you are trying to use.
vec t = linspace(0, 100);
double freq = 0.25;
ivec indices = conv_to<ivec>::from(t / freq);
If you want the results to be the same as the use of the floor function for negative values of t, you could replace the last line with
ivec indices = conv_to<ivec>::from(floor(t / freq));
Your best bet would be to use an iterator to walk through your t vector, and then push back the results of floor( *it_to_t / freq ) onto your indices vector.

Improve minimum distance filter for pointset

I created a minimum distance filter for points.
The function takes a stream of points (x1,y1,x2,y2...) and removes the points that are closer than the given distance to another point.
void minDistanceFilter(vector<float> &points, float distance = 0.0)
{
float p0x, p0y;
float dx, dy, dsq;
float mdsq = distance*distance; // minimum distance square
unsigned i, j, n = points.size();
for(i=0; i<n; ++i)
{
p0x = points[i];
p0y = points[i+1];
for(j=0; j<n; j+=2)
{
//if (i == j) continue; // discard itself (seems like it slows down the algorithm)
dx = p0x - points[j]; // delta x (p0x - p1x)
dy = p0y - points[j+1]; // delta y (p0y - p1y)
dsq = dx*dx + dy*dy; // distance square
if (dsq < mdsq)
{
auto del = points.begin() + j;
points.erase(del,del+3);
n = points.size(); // update n
j -= 2; // decrement j
}
}
}
}
The only problem is that it is very slow, because it tests all points against all points (n^2).
How could it be improved?
kd-trees or range trees could be used for your problem. However, if you want to code from scratch and want something simpler, then you can use a hash table structure. For each point (a,b), hash using the key (round(a/d),round(b/d)) and store all the points that have the same key in a list. Then, for each key (m,n) in your hash table, compare all points in the list to the list of points that have key (m',n') for all 9 choices of (m',n') where m' = m + (-1 or 0 or 1) and n' = n + (-1 or 0 or 1). These are the only points that can be within distance d of your points that have key (m,n). The downside compared to a kd-tree or range tree is that for a given point, you are effectively searching within a square of side length 3*d for points that might have distance d or less, instead of searching within a square of side length 2*d which is what you would get if you used a kd-tree or range tree. But if you are coding from scratch, this is easier to code; also kd-trees and range trees are kinda overkill if you only have one universal distance d that you care about for all points.
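A rough sketch of the hash-table idea, under a few assumptions that are not in the answer: cells are keyed with floor(a/d) rather than round(a/d) (either gives cells of side d), points are processed in input order and a point is kept only if no already kept point is within d of it, and the result is returned as a new vector instead of erasing in place. The names `cellKey` and `minDistanceFilterGrid` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Pack two 32-bit cell coordinates into one 64-bit hash key.
static std::uint64_t cellKey(std::int32_t cx, std::int32_t cy) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32) |
           static_cast<std::uint32_t>(cy);
}

// points is a flat x1,y1,x2,y2,... array. A point is kept only if no
// previously kept point lies closer than d to it.
std::vector<float> minDistanceFilterGrid(const std::vector<float>& points, float d) {
    assert(points.size() % 2 == 0);
    if (d <= 0.0f) return points;  // nothing can be closer than zero
    const float minDistSq = d * d;
    // cell key -> offsets (of the x coordinate) into kept
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> grid;
    std::vector<float> kept;
    for (std::size_t i = 0; i + 1 < points.size(); i += 2) {
        const float x = points[i], y = points[i + 1];
        const auto cx = static_cast<std::int32_t>(std::floor(x / d));
        const auto cy = static_cast<std::int32_t>(std::floor(y / d));
        bool tooClose = false;
        // Only the 3x3 block of cells around (cx, cy) can hold a point within d.
        for (std::int32_t ox = -1; ox <= 1 && !tooClose; ++ox) {
            for (std::int32_t oy = -1; oy <= 1 && !tooClose; ++oy) {
                const auto cell = grid.find(cellKey(cx + ox, cy + oy));
                if (cell == grid.end()) continue;
                for (std::size_t k : cell->second) {
                    const float dx = x - kept[k], dy = y - kept[k + 1];
                    if (dx * dx + dy * dy < minDistSq) { tooClose = true; break; }
                }
            }
        }
        if (!tooClose) {
            grid[cellKey(cx, cy)].push_back(kept.size());
            kept.push_back(x);
            kept.push_back(y);
        }
    }
    return kept;
}
```

Because kept points are at least d apart, each cell can hold only a handful of them, so every input point is checked against a small, bounded number of candidates.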
Look up range tree, e.g. en.wikipedia.org/wiki/Range_tree . You can use this structure to store 2-dimensional points and very quickly find all the points that lie inside a query rectangle. Since you want to find points within a certain distance d of a point (a,b), your query rectangle will need to be [a-d,a+d]x[b-d,b+d] and then you test any points found inside the rectangle to make sure they are actually within distance d of (a,b). Range tree can be built in O(n log n) time and space, and range queries take O(log n + k) time where k is the number of points found in the rectangle. Seems optimal for your problem.

Better way than if else if else... for linear interpolation

question is easy.
Lets say you have function
double interpolate (double x);
and you have a table that has map of known x-> y
for example
5 15
7 18
10 22
note: real tables are bigger ofc, this is just example.
so for 8 you would return 18+((8-7)/(10-7))*(22-18)=19.3333333
One cool way I found is
http://www.bnikolic.co.uk/blog/cpp-map-interp.html
(long story short it uses std::map, key= x, value = y for x->y data pairs).
If somebody asks what is the if else if else way in title
it is basically:
if ((x>=5) && (x<=7))
{
//interpolate
}
else
if((x>=7) && x<=10)
{
//interpolate
}
So is there a more clever way to do it or map way is the state of the art? :)
Btw I prefer solutions in C++, but obviously a solution in any language that has a 1:1 mapping to C++ is nice.
Well, the easiest way I can think of would be to use a binary search to find the interval in which x lies. Try to avoid maps if you can, as they tend to be slow in practice.
This is a simple way:
#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>
using namespace std;
const double INF = 1.e100;
vector<pair<double, double> > table;
double interpolate(double x) {
// Assumes that "table" is sorted by .first
// Check if x is out of bound
if (x > table.back().first) return INF;
if (x < table[0].first) return -INF;
vector<pair<double, double> >::iterator it, it2;
// INFINITY is defined in math.h in the glibc implementation
it = lower_bound(table.begin(), table.end(), make_pair(x, -INF));
// Corner case
if (it == table.begin()) return it->second;
it2 = it;
--it2;
return it2->second + (it->second - it2->second)*(x - it2->first)/(it->first - it2->first);
}
int main() {
table.push_back(make_pair(5., 15.));
table.push_back(make_pair(7., 18.));
table.push_back(make_pair(10., 22.));
// If you are not sure if table is sorted:
sort(table.begin(), table.end());
printf("%f\n", interpolate(8.));
printf("%f\n", interpolate(10.));
printf("%f\n", interpolate(10.1));
}
You can use a binary search tree to store the interpolation data. This is beneficial when you have a large set of N interpolation points, as interpolation can then be performed in O(log N) time. However, in your example, this does not seem to be the case, and the linear search suggested by RedX is more appropriate.
#include <stdio.h>
#include <assert.h>
#include <map>
static double interpolate (double x, const std::map<double, double> &table)
{
assert(table.size() > 0);
std::map<double, double>::const_iterator it = table.lower_bound(x);
if (it == table.end()) {
return table.rbegin()->second;
} else {
if (it == table.begin()) {
return it->second;
} else {
double x2 = it->first;
double y2 = it->second;
--it;
double x1 = it->first;
double y1 = it->second;
double p = (x - x1) / (x2 - x1);
return (1 - p) * y1 + p * y2;
}
}
}
int main ()
{
std::map<double, double> table;
table.insert(std::pair<double, double>(5, 6));
table.insert(std::pair<double, double>(8, 4));
table.insert(std::pair<double, double>(9, 5));
double y = interpolate(5.1, table);
printf("%f\n", y);
}
Store your points sorted:
index X Y
1 1 -> 3
2 3 -> 7
3 10-> 8
Then loop from max to min; as soon as you reach an x that is at or below your number, you know it is the lower end of the range you want.
You want let's say 6 so:
// pseudo
for i = 3 to 1
if x[i] <= 6
// you found your range!
// interpolate between x[i] and x[i + 1]
break; // Do not look any further
end
end
Yes, I guess you should think of a mapping between those intervals and the natural numbers. I mean, just label the intervals and use a switch:
switch(I) {
case Int1: //whatever
break;
...
default:
}
I don't know, it's the first thing that I thought of.
EDIT Switch is more efficient than if-else if your numbers are within a relative small interval (that's something to take into account when doing the mapping)
If your x-coordinates must be irregularly spaced, then store the x-coordinates in sorted order, and use a binary search to find the nearest coordinate, for example using Daniel Fleischman's answer.
However, if your problem permits it, consider pre-interpolating to regularly spaced data. So
5 15
7 18
10 22
becomes
5 15
6 16.5
7 18
8 19.3333333
9 20.6666667
10 22
Then at run-time you can interpolate with O(1) using something like this:
double interp1( double x0, double dx, double* y, int n, double xi )
{
double f = ( xi - x0 ) / dx;
if (f<0) return y[0];
if (f>=(n-1)) return y[n-1];
int i = (int) f;
double w = f-(double)i;
return y[i]*(1.0-w) + y[i+1]*w;
}
using
double y[6] = {15, 16.5, 18, 19.3333333, 20.6666667, 22};
double yi = interp1( 5.0, 1.0, y, 6, xi );
This isn't necessarily suitable for every problem -- you could end up losing accuracy (if there's no nice grid that contains all your x-samples), and it could have a bad cache penalty if it would make your table much much bigger. But it's a good option for cases where you have some control over the x-coordinates to begin with.
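The pre-interpolation step itself could look something like the sketch below. `interpolateSorted` and `resample` are hypothetical helper names, the table is assumed to be sorted by x, and values outside the table are clamped to the end points (an assumption made here for simplicity).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Table = std::vector<std::pair<double, double>>;  // (x, y) pairs sorted by x

// Linear interpolation in an irregularly spaced table; clamps outside the table.
double interpolateSorted(const Table& table, double x) {
    assert(!table.empty());
    if (x <= table.front().first) return table.front().second;
    if (x >= table.back().first) return table.back().second;
    // First entry whose x is not less than the query; it cannot be begin() here.
    auto hi = std::lower_bound(
        table.begin(), table.end(), x,
        [](const std::pair<double, double>& p, double v) { return p.first < v; });
    auto lo = hi - 1;
    const double t = (x - lo->first) / (hi->first - lo->first);
    return lo->second + t * (hi->second - lo->second);
}

// Samples the table at x0, x0 + dx, ..., x0 + (n - 1) * dx.
std::vector<double> resample(const Table& table, double x0, double dx, std::size_t n) {
    std::vector<double> y(n);
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = interpolateSorted(table, x0 + static_cast<double>(i) * dx);
    }
    return y;
}
```

For the example table, resampling from x0 = 5 with dx = 1 and n = 6 gives the six regularly spaced y values listed above, which can then be passed to interp1.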
How you've already got it is fairly readable and understandable, and there's a lot to be said for that over a "clever" solution. You can however do away with the lower bounds check and clumsy && because the sequence is ordered:
if (x < 5)
return 0;
else if (x <= 7)
// interpolate
else if (x <= 10)
// interpolate
...