Vector recycling with Rcpp - c++

I'm trying to get vector recycling to work in Rcpp.
> recycle_and_add <- Rcpp::cppFunction("
+ NumericVector recycle_and_add(NumericVector x, NumericVector y) {
+ return x + y;
+ }")
> recycle_and_add(42, 1:5)
[1] 43
I'm expecting it to return something like
> 42 + 1:5
[1] 43 44 45 46 47
After some analysis, I found out that x.size() is 1 and y.size() is 5 within the Rcpp function, so clearly vector recycling doesn't work out-of-the-box.
While I can manually find the longest of x and y and recycle the shorter one, in the actual application there are 3 or 4 arguments requiring recycling, so I can imagine manual unrolling would result in a lot of variables pointing to different vectors and turn the code into a pile of spaghetti.
Does Rcpp have any built-in support for vector recycling, like, with some sugar?

Strategy-wise, it's almost always easier to recycle in R and then move into C++.
If it must be done in C++, then the following design pattern should work:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericVector recycle_vector(Rcpp::NumericVector x,
Rcpp::NumericVector y) {
// Obtain vector sizes
int n_x = x.size();
int n_y = y.size();
// Check both vectors have elements
if(n_x <= 0 || n_y <= 0) {
Rcpp::stop("Both `x` and `y` vectors must have at least 1 element.");
}
// Compare the three cases that lead to recycling...
if(n_x == n_y) {
return x + y;
} else if (n_x > n_y) {
return Rcpp::rep_len(y, n_x) + x;
}
return Rcpp::rep_len(x, n_y) + y;
}
Test Cases:
recycle_vector(1:3, 1:3)
# [1] 2 4 6
recycle_vector(4, 1:3)
# [1] 5 6 7
recycle_vector(10:12, -2:-1)
# [1] 8 10 10
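With three or four arguments, the pairwise branching above grows quickly. One alternative is to index every input modulo its own length, which recycles all of them in a single loop. The following is only a sketch in plain C++ (std::vector instead of Rcpp types, so it compiles on its own); the name recycle_sum is made up for illustration, and unlike R it does not warn when the longest length is not a multiple of a shorter one.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise sum of any number of vectors,
// recycling shorter inputs by indexing them modulo their own length.
std::vector<double> recycle_sum(const std::vector<std::vector<double>>& args) {
    std::size_t n = 0;
    for (const auto& a : args) {
        // As in R, a zero-length argument gives a zero-length result.
        if (a.empty()) return {};
        n = std::max(n, a.size());
    }
    std::vector<double> out(n, 0.0);
    for (const auto& a : args) {
        for (std::size_t i = 0; i < n; ++i) {
            out[i] += a[i % a.size()];  // i % size wraps around short inputs
        }
    }
    return out;
}
```

Called with {42} and {1, 2, 3, 4, 5} this gives 43 44 45 46 47, the same as 42 + 1:5 in R. The same modulo trick should carry over to Rcpp vectors, since they also support size() and operator[].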

Related

Detecting and omitting na values from a std vector in Rcpp

I have a std::vector whose elements need to be summed after checking for NA values (and removing any that are found). I have to do it in Rcpp. For an Rcpp numeric vector (NumericVector) it is very easy, as the code below shows:
cppFunction("
double res ( NumericVector x){
NumericVector v = x[! is_na(x)];
return sum(v);
}
")
So for a vector "x", it easily gives the sum as follows:
x<- c(NaN,1,2)
res(x)
[1] 3
Now for a std::vector x; how can I do the same?
You should be able to use RcppHoney (also on CRAN), which brings the vectorised idioms of Rcpp Sugar (which has vectorised NA tests just like R has) to any iterable container, hence also STL ones.
See e.g. the intro vignette for this example of combining different vector types and classes into a single scalar expression:
// [[Rcpp::export]]
Rcpp::NumericVector example_manually_hooked() {
// We manually hooked std::list in to RcppHoney so we'll create one
std::list< int > l;
l.push_back(1); l.push_back(2); l.push_back(3); l.push_back(4); l.push_back(5);
// std::vector is already hooked in to RcppHoney in default_hooks.hpp so
// we'll create one of those too
std::vector< int > v(l.begin(), l.end());
// And for good measure, let's create an Rcpp::NumericVector which is
// also hooked by default
Rcpp::NumericVector v2(v.begin(), v.end());
// Now do some weird operations incorporating std::vector, std::list,
// Rcpp::NumericVector and some RcppHoney functions and return it. The
// return value will be equal to the following R snippet:
// v <- 1:5
// result <- 42 + v + v + log(v) - v - v + sqrt(v) + -v + 42
// We can store our result in any of RcppHoney::LogicalVector,
// RcppHoney::IntegerVector, or RcppHoney::NumericVector and simply return
// it to R. These classes inherit from their Rcpp counterparts and add a
// new constructor. The only copy of the data, in this case, is when we
// assign our expression to retval. Since it is then a "native" R type,
// returning it is a shallow copy. Alternatively we could write this as:
// return Rcpp::wrap(1 + v + RcppHoney::log(v) - v - 1
// + RcppHoney::sqrt(v) + -v2);
RcppHoney::NumericVector retval
= 42 + l + v + RcppHoney::log(v) - v - l + RcppHoney::sqrt(v) + -v2
+ 42;
return retval;
}
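If an extra package is not wanted, a plain loop over the std::vector also does the job. A minimal sketch, assuming the vector holds doubles: R's NA_real_ is itself a NaN value, so std::isnan catches both NA and NaN. The name sum_skip_nan is made up for illustration. Note that this does not apply to integer vectors, where R encodes NA as INT_MIN.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: sum of all elements that are not NaN.
// For doubles coming from R this skips both NA and NaN.
double sum_skip_nan(const std::vector<double>& x) {
    double total = 0.0;
    for (double v : x) {
        if (!std::isnan(v)) {
            total += v;
        }
    }
    return total;
}
```

For the input c(NaN, 1, 2) from the question this returns 3.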

Rcpp version of base-R seq drops values

I wrote an Rcpp version of the base-R seq function.
library(Rcpp)
cppFunction('NumericVector seqC(double x, double y, double by) {
// length of result vector
int nRatio = (y - x) / by;
NumericVector anOut(nRatio + 1);
// compute sequence
int n = 0;
for (double i = x; i <= y; i = i + by) {
anOut[n] = i;
n += 1;
}
return anOut;
}')
For the following tests, it works just fine.
seqC(1, 11, 2)
[1] 1 3 5 7 9 11
seqC(1, 10, 2)
[1] 1 3 5 7 9
Also, it works (sometimes) when passing values with decimal digits rather than
integers.
seqC(0.43, 0.45, 0.001)
[1] 0.430 0.431 0.432 0.433 0.434 0.435 0.436 0.437 0.438 0.439 0.440 0.441 0.442 0.443 0.444 0.445 0.446 0.447 0.448 0.449 0.450
However, sometimes the function does not seem to work as expected since the last
entry of the sequence is being dropped (or rather, the output vector anOut
does not have the proper size), which - according to my rather scarce C++ skills,
may be attributed to some kind of rounding errors.
seqC(0.53, 0.59, 0.001)
[1] 0.530 0.531 0.532 0.533 0.534 0.535 0.536 0.537 0.538 0.539 0.540 0.541 0.542 0.543 0.544 0.545 0.546 0.547 0.548 0.549 0.550 0.551
[23] 0.552 0.553 0.554 0.555 0.556 0.557 0.558 0.559 0.560 0.561 0.562 0.563 0.564 0.565 0.566 0.567 0.568 0.569 0.570 0.571 0.572 0.573
[45] 0.574 0.575 0.576 0.577 0.578 0.579 0.580 0.581 0.582 0.583 0.584 0.585 0.586 0.587 0.588 0.589
In the last example, for instance, the last value (0.590) is missing. Does
anyone know how to fix this?
As noted by others, the problem you are experiencing is fundamentally a floating point arithmetic error. A common workaround is to scale your doubles up to sufficiently large integers, perform the task, and then adjust the result to the original scale of your inputs. I took a slightly different approach than @RHertel by letting the amount of scaling (adjust) be determined by the precision of the increment rather than using a fixed amount, but the idea is essentially the same.
#include <Rcpp.h>
struct add_multiple {
int incr;
int count;
add_multiple(int incr)
: incr(incr), count(0)
{}
inline int operator()(int d) {
return d + incr * count++;
}
};
// [[Rcpp::export]]
Rcpp::NumericVector rcpp_seq(double from_, double to_, double by_ = 1.0) {
int adjust = std::pow(10, std::ceil(std::log10(10 / by_)) - 1);
int from = adjust * from_;
int to = adjust * to_;
int by = adjust * by_;
std::size_t n = ((to - from) / by) + 1;
Rcpp::IntegerVector res = Rcpp::rep(from, n);
add_multiple ftor(by);
std::transform(res.begin(), res.end(), res.begin(), ftor);
return Rcpp::NumericVector(res) / adjust;
}
/*** R
all.equal(seq(.53, .59, .001), seqC(.53, .59, .001)) &&
all.equal(seq(.53, .59, .001), rcpp_seq(.53, .59, .001))
# [1] TRUE
all.equal(seq(.53, .54, .000001), seqC(.53, .54, .000001)) &&
all.equal(seq(.53, .54, .000001), rcpp_seq(.53, .54, .000001))
# [1] TRUE
microbenchmark::microbenchmark(
"seq" = seq(.53, .54, .000001),
"seqC" = seqC(0.53, 0.54, 0.000001),
"rcpp_seq" = rcpp_seq(0.53, 0.54, 0.000001),
times = 100L)
Unit: microseconds
expr min lq mean median uq max neval
seq 896.190 1015.7940 1167.4708 1132.466 1221.624 1651.571 100
seqC 212293.307 219527.6590 226933.4329 223384.592 227860.410 398462.561 100
rcpp_seq 182.848 194.1665 225.4338 227.396 244.942 320.436 100
*/
Where seqC was @RHertel's revised implementation that produced the correct result. FWIW I think the slow performance of this function is mainly due to the use of push_back on the NumericVector type, which the Rcpp developers strongly advise against.
The "<=" can create difficulties with floating point numbers. This is a variant of the famous question "Why are these numbers not equal?". Moreover, there is a similar issue with the vector length, which in the case of your last example should be 60, but it is actually calculated to be 59. This is most likely due to the conversion to an integer (by casting, i.e., truncation) of a value like 59.999999 or something similar.
It seems to be very difficult to fix these problems, so I have rewritten a considerable part of the code, hoping that now the function operates as required.
The following code should provide correct results for essentially any kind of increasing series (i.e., y > x, by > 0).
cppFunction('NumericVector seqC(double x, double y, double by) {
NumericVector anOut(1);
// compute sequence
double min_by = 1.e-8;
if (by < min_by) min_by = by/100;
double i = x + by;
anOut(0) = x;
while(i/min_by < y/min_by + 1) {
anOut.push_back(i);
i += by;
}
return anOut;
}')
Hope this helps. And thanks a lot to @Konrad Rudolph for pointing out mistakes in my previous attempts!
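Another way to sidestep the accumulated rounding error is to compute the element count once and derive each value from its index, instead of adding the increment repeatedly. The following is a sketch in plain C++ (std::vector rather than Rcpp types, so it compiles standalone). The name seq_by and the 1e-10 tolerance are assumptions chosen for illustration, and only increasing sequences (by > 0, to >= from) are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: from, from + by, ..., up to (and including) to.
std::vector<double> seq_by(double from, double to, double by) {
    assert(by > 0.0 && to >= from);
    // A small tolerance keeps a ratio such as 59.99999999999994
    // from being truncated to 59 instead of 60.
    const double fuzz = 1e-10;
    const std::size_t n =
        static_cast<std::size_t>(std::floor((to - from) / by + fuzz)) + 1;
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        // from + i * by avoids accumulating error over many additions;
        // std::min keeps the last value from overshooting `to`.
        out[i] = std::min(from + static_cast<double>(i) * by, to);
    }
    return out;
}
```

With this, seq_by(0.53, 0.59, 0.001) has 61 elements, so the final 0.590 is no longer dropped.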

Trying to write a setdiff() function using RcppArmadillo gives compilation error

I'm trying to write a sort of analogue of R's setdiff() function in C++ using RcppArmadillo. My rather crude approach:
// [[Rcpp::export]]
arma::uvec my_setdiff(arma::uvec x, arma::uvec y){
// Coefficients of unsigned integer vector y form a subset of the coefficients of unsigned integer vector x.
// Returns set difference between the coefficients of x and those of y
int n2 = y.n_elem;
uword q1;
for (int j=0 ; j<n2 ; j++){
q1 = find(x==y[j]);
x.shed_row(q1);
}
return x;
}
fails at compilation time. The error reads:
fnsauxarma.cpp:622:29: error: no matching function for call to ‘arma::Col<double>::shed_row(const arma::mtOp<unsigned int, arma::mtOp<unsigned int, arma::Col<double>, arma::op_rel_eq>, arma::op_find>)’
I really have no idea what's going on, any help or comments would be greatly appreciated.
The problem is that arma::find returns a uvec, and doesn't know how to make the implicit conversion to arma::uword, as pointed out by @mtall. You can help the compiler out by using the templated arma::conv_to<T>::from() function. Also, I included another version of my_setdiff that returns an Rcpp::NumericVector because although the first version returns the correct values, it's technically a matrix (i.e. it has dimensions), and I assume you would want this to be as compatible with R's setdiff as possible. This is accomplished by setting the dim attribute of the return vector to NULL, using R_NilValue and the Rcpp::attr member function.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::uvec my_setdiff(arma::uvec& x, const arma::uvec& y){
for (size_t j = 0; j < y.n_elem; j++) {
arma::uword q1 = arma::conv_to<arma::uword>::from(arma::find(x == y[j]));
x.shed_row(q1);
}
return x;
}
// [[Rcpp::export]]
Rcpp::NumericVector my_setdiff2(arma::uvec& x, const arma::uvec& y){
for (size_t j = 0; j < y.n_elem; j++) {
arma::uword q1 = arma::conv_to<arma::uword>::from(arma::find(x == y[j]));
x.shed_row(q1);
}
Rcpp::NumericVector x2 = Rcpp::wrap(x);
x2.attr("dim") = R_NilValue;
return x2;
}
/*** R
x <- 1:8
y <- 2:6
R> all.equal(setdiff(x,y), my_setdiff(x,y))
#[1] "Attributes: < target is NULL, current is list >" "target is numeric, current is matrix"
R> all.equal(setdiff(x,y), my_setdiff2(x,y))
#[1] TRUE
R> setdiff(x,y)
#[1] 1 7 8
R> my_setdiff(x,y)
# [,1]
# [1,] 1
# [2,] 7
# [3,] 8
R> my_setdiff2(x,y)
#[1] 1 7 8
*/
Edit:
For the sake of completeness, here is a more robust version of setdiff than the two implementations presented above:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
Rcpp::NumericVector arma_setdiff(arma::uvec& x, arma::uvec& y){
x = arma::unique(x);
y = arma::unique(y);
for (size_t j = 0; j < y.n_elem; j++) {
arma::uvec q1 = arma::find(x == y[j]);
if (!q1.empty()) {
x.shed_row(q1(0));
}
}
Rcpp::NumericVector x2 = Rcpp::wrap(x);
x2.attr("dim") = R_NilValue;
return x2;
}
/*** R
x <- 1:10
y <- 2:8
R> all.equal(setdiff(x,y), arma_setdiff(x,y))
#[1] TRUE
X <- 1:6
Y <- c(2,2,3)
R> all.equal(setdiff(X,Y), arma_setdiff(X,Y))
#[1] TRUE
*/
The previous versions would throw an error if you passed them vectors with non-unique elements, e.g.
R> my_setdiff2(X,Y)
error: conv_to(): given object doesn't have exactly one element
To solve the problem and more closely mirror R's setdiff, we just make x and y unique. Additionally, I switched out the arma::conv_to<>::from with q1(0) (where q1 is now a uvec instead of a uword), because uvec's are just a vector of uwords, and the explicit cast seemed a little inelegant.
I've used std::set_difference from the STL instead, converting back and forth from arma::uvec.
#include <RcppArmadillo.h>
#include <algorithm>
#include <iterator>
#include <vector>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::uvec std_setdiff(arma::uvec& x, arma::uvec& y) {
std::vector<int> a = arma::conv_to< std::vector<int> >::from(arma::sort(x));
std::vector<int> b = arma::conv_to< std::vector<int> >::from(arma::sort(y));
std::vector<int> out;
std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
std::inserter(out, out.end()));
return arma::conv_to<arma::uvec>::from(out);
}
Edit: I thought a performance comparison might be in order. The difference becomes smaller when the relative sizes of the sets are in the opposite order.
a <- sample.int(350)
b <- sample.int(150)
microbenchmark::microbenchmark(std_setdiff(a, b), arma_setdiff(a, b))
> Unit: microseconds
> expr min lq mean median uq max neval cld
> std_setdiff(a, b) 11.548 14.7545 17.29930 17.107 19.245 36.779 100 a
> arma_setdiff(a, b) 60.727 65.0040 71.77804 66.714 72.702 138.133 100 b
The questioner might have already got the answer. However, the following template version may be more general. It is meant to be equivalent to the setdiff function in MATLAB.
If P and Q are two sets, then their difference is given by P - Q or Q - P. If P = {1, 2, 3, 4} and Q = {4, 5, 6}, P - Q means elements of P which are not in Q. i.e., in the above example P - Q = {1, 2, 3}.
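/* setdiff(t1, t2) is similar to the setdiff() function in MATLAB. It removes the common elements and
gives the uncommon elements in the vectors t1 and t2.
Assumes `using namespace arma;`. Elements are "removed" by zeroing them, so genuine zeros
in the input are dropped as well. */
template <typename T>
T setdiff(T t1, T t2)
{
// n_elem gives the element count; size() returns a SizeMat, not an integer
uword size_of_t1 = t1.n_elem;
uword size_of_t2 = t2.n_elem;
T Intersection_Elements;
uvec iA, iB;
// iA and iB receive the positions of the common elements in t1 and t2
intersect(Intersection_Elements, iA, iB, t1, t2);
for (uword i = 0; i < iA.n_elem; i++)
{
t1(iA(i)) = 0;
}
for (uword i = 0; i < iB.n_elem; i++)
{
t2(iB(i)) = 0;
}
T t1_t2_vec(size_of_t1 + size_of_t2);
t1_t2_vec = join_vert(t1, t2);
T DiffVec = nonzeros(t1_t2_vec);
return DiffVec;
}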
Any suggestions for improving the performance of the algorithm are welcome.
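On the performance question: R's setdiff keeps the elements of x in their original order and drops duplicates, which neither repeated shed_row calls nor a sort-based approach gives directly. A hash set can do both in one pass. This is only a sketch in plain C++ with std::vector<unsigned> standing in for arma::uvec; the name setdiff_ordered is made up for illustration.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper: elements of x that are not in y, de-duplicated,
// in order of first appearance in x.
std::vector<unsigned> setdiff_ordered(const std::vector<unsigned>& x,
                                      const std::vector<unsigned>& y) {
    // Start with everything that must be excluded.
    std::unordered_set<unsigned> seen(y.begin(), y.end());
    std::vector<unsigned> out;
    for (unsigned v : x) {
        // insert().second is true only if v is neither in y
        // nor already emitted, so duplicates in x are skipped too.
        if (seen.insert(v).second) {
            out.push_back(v);
        }
    }
    return out;
}
```

For x = 1..8 and y = 2..6 this returns 1 7 8. Expected cost is linear in the total number of elements, at the price of the extra memory for the hash set.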

Neighbor index computation for diagonally flattened matrix

I have a 2D matrix stored in a flat buffer along diagonals. For example a 4x4 matrix would have its indexes scattered like so:
0 2 5 9
1 4 8 12
3 7 11 14
6 10 13 15
With this representation, what is the most efficient way to calculate the index of a neighboring element given the original index and a X/Y offset? For example:
// return the index of a neighbor given an offset
int getNGonalNeighbor(const size_t index,
const int x_offset,
const int y_offset){
//...
}
// for the array above:
getNGonalNeighbor(15,-1,-1); // should return 11
getNGonalNeighbor(15, 0,-1); // should return 14
getNGonalNeighbor(15,-1, 0); // should return 13
getNGonalNeighbor(11,-2,-1); // should return 1
We assume here that overflow never occurs and there is no wrap-around.
I have a solution involving a lot of triangular number and triangular root calculations. It also contains a lot of branches, which I would prefer to replace with algebra if possible (this will run on GPUs where diverging control flow is expensive). My solution is working but very lengthy. I feel like there must be a much simpler and less compute intensive way of doing it.
Maybe it would help me if someone can put a name on this particular problem/representation.
I can post my full solution if anyone is interested, but as I said it is very long and relatively complicated for such a simple task. In a nutshell, my solution does:
translate the original index into a larger triangular matrix to avoid dealing with 2 triangles (for example 13 would become 17)
For the 4x4 matrix this would be:
0 2 5 9 14 20 27
1 4 8 13 19 26
3 7 12 18 25
6 11 17 24
10 16 23
15 22
21
calculate the index of the diagonal of the neighbor in this representation using the manhattan distance of the offset and the triangular root of the index.
calculate the position of the neighbor in this diagonal using the offset
translate back to the original representation by removing the padding.
For some reason this is the simplest solution I could come up with.
Edit:
having loop to accumulate the offset:
I realize that given the properties of the triangle numbers, it would be easier to split up the matrix in two triangles (let's call 0 to 9 'upper triangle' and 10 to 15 'lower triangle') and have a loop with a test inside to accumulate the offset by adding one while in the upper triangle and subtracting one in the lower (if that makes sense). But for my solution loops must be avoided at all cost, especially loops with unbalanced trip counts (again, very bad for GPUs).
So I am looking more for an algebraic solution rather than an algorithmic one.
Building a lookup table:
Again, because of the GPU, it is preferable to avoid building a lookup table and have random accesses in it (very expensive). An algebraic solution is preferable.
Properties of the matrix:
The size of the matrix is known.
For now I only consider square matrix, but a solution for rectangular ones as well would be nice.
as the name of the function in my example suggests, extending the solution to N-dimensional volumes (hence N-gonal flattening) would be a big plus too.
Table lookup
#include <stdio.h>
#define SIZE 16
#define SIDE 4 //sqrt(SIZE)
int table[SIZE];
int rtable[100];// indexed by x*10+y, so x<=9 and y<=9
void setup(){
int i, x, y, xy, index;//xy = x + y
x=y=xy=0;
for(i=0;i<SIZE;++i){
table[i]= index= x*10 + y;
rtable[x*10+y]=i;
x = x + 1; y = y - 1;//right up
if(y < 0 || x >= SIDE){
++xy;
x = 0;
y = xy;;
while(y>=SIDE){
++x;
--y;
}
}
}
}
int getNGonalNeighbor(int index, int offsetX, int offsetY){
int x,y;
x=table[index] / 10 + offsetX;
y=table[index] % 10 + offsetY;
if(x < 0 || x >= SIDE || y < 0 || y >= SIDE) return -1; //ERROR
return rtable[ x*10+y ];
}
int main() {
int i;
setup();
printf("%d\n", getNGonalNeighbor(15,-1,-1));
printf("%d\n", getNGonalNeighbor(15, 0,-1));
printf("%d\n", getNGonalNeighbor(15,-1, 0));
printf("%d\n", getNGonalNeighbor(11,-2,-1));
printf("%d\n", getNGonalNeighbor(0, -1,-1));
return 0;
}
Version that does not use a table:
#include <stdio.h>
#define SIZE 16
#define SIDE 4
void num2xy(int index, int *offsetX, int *offsetY){
int i, x, y, xy;//xy = x + y
x=y=xy=0;
for(i=0;i<SIZE;++i){
if(i == index){
*offsetX = x;
*offsetY = y;
return;
}
x = x + 1; y = y - 1;//right up
if(y < 0 || x >= SIDE){
++xy;
x = 0;
y = xy;;
while(y>=SIDE){
++x;
--y;
}
}
}
}
int xy2num(int offsetX, int offsetY){
int i, x, y, xy, index;//xy = x + y
x=y=xy=0;
for(i=0;i<SIZE;++i){
if(offsetX == x && offsetY == y) return i;
x = x + 1; y = y - 1;//right up
if(y < 0 || x >= SIDE){
++xy;
x = 0;
y = xy;;
while(y>=SIDE){
++x;
--y;
}
}
}
return -1;
}
int getNGonalNeighbor(int index, int offsetX, int offsetY){
int x,y;
num2xy(index, &x, &y);
return xy2num(x + offsetX, y + offsetY);
}
int main() {
printf("%d\n", getNGonalNeighbor(15,-1,-1));
printf("%d\n", getNGonalNeighbor(15, 0,-1));
printf("%d\n", getNGonalNeighbor(15,-1, 0));
printf("%d\n", getNGonalNeighbor(11,-2,-1));
printf("%d\n", getNGonalNeighbor(0, -1,-1));
return 0;
}
I actually already had the elements to solve it somewhere else in my code. As BLUEPIXY's solution hinted, I am using scatter/gather operations, which I had already implemented for layout transformation.
This solution basically rebuilds the original (x,y) index of the given element in the matrix, applies the index offset and translates the result back to the transformed layout. It splits the square into 2 triangles and adjusts the computation depending on which triangle the element belongs to.
It is an almost entirely algebraic transformation: it uses no loop and no table lookup, has a small memory footprint and little branching. The code can probably be optimized further.
Here is the draft of the code:
#include <stdio.h>
#include <math.h>
// size of the matrix
#define SIZE 4
// triangle number of X
#define TRIG(X) (((X) * ((X) + 1)) >> 1)
// triangle root of X
#define TRIROOT(X) ((int)(sqrt(8*(X)+1)-1)>>1)
// return the index of a neighbor given an offset
int getNGonalNeighbor(const size_t index,
const int x_offset,
const int y_offset){
// compute largest upper triangle index
const size_t upper_triangle = TRIG(SIZE);
// position of the actual element of index
unsigned int x = 0,y = 0;
// adjust the index depending of upper/lower triangle.
const size_t adjusted_index = index < upper_triangle ?
index :
SIZE * SIZE - index - 1;
// compute triangular root
const size_t triroot = TRIROOT(adjusted_index);
const size_t trig = TRIG(triroot);
const size_t offset = adjusted_index - trig;
// upper triangle
if(index < upper_triangle){
x = offset;
y = triroot-offset;
}
// lower triangle
else {
x = SIZE - offset - 1;
y = SIZE - (trig + triroot + 1 - adjusted_index);
}
// adjust the offset
x += x_offset;
y += y_offset;
// manhattan distance
const size_t man_dist = x+y;
// calculate index using triangular number
return TRIG(man_dist) +
(man_dist >= SIZE ? x - (man_dist - SIZE + 1) : x) -
(man_dist > SIZE ? 2* TRIG(man_dist - SIZE) : 0);
}
int main(){
printf("%d\n", getNGonalNeighbor(15,-1,-1)); // should return 11
printf("%d\n", getNGonalNeighbor(15, 0,-1)); // should return 14
printf("%d\n", getNGonalNeighbor(15,-1, 0)); // should return 13
printf("%d\n", getNGonalNeighbor(11,-2,-1)); // should return 1
}
And the output is indeed:
11
14
13
1
If you think this solution looks overcomplicated and inefficient, I remind you that the target here is GPU, where computation costs virtually nothing compared to memory accesses, and all index computations are computed at the same time using massively parallel architectures.
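The same idea can be split into two small conversions, index to (x, y) and back, which makes each half easier to check. The sketch below is not the draft above but a restatement under the same layout (x is the column, y is the row, diagonals run from the left edge up to the right). It relies on the layout being point-symmetric: the cell holding index i mirrors the cell holding n*n - 1 - i. All names (Cell, tri, tri_root, to_cell, to_index, neighbor) are made up for illustration, and bounds are only checked with assert.

```cpp
#include <cassert>
#include <cmath>

struct Cell { int x; int y; };  // x = column, y = row

// n-th triangular number
inline int tri(int n) { return n * (n + 1) / 2; }

// Largest d with tri(d) <= t; the two checks guard against sqrt rounding.
inline int tri_root(int t) {
    int d = static_cast<int>((std::sqrt(8.0 * t + 1.0) - 1.0) / 2.0);
    if (tri(d + 1) <= t) ++d;
    if (tri(d) > t) --d;
    return d;
}

// Flat index -> cell for an n x n matrix.
Cell to_cell(int n, int index) {
    assert(index >= 0 && index < n * n);
    if (index < tri(n)) {              // upper-left triangle, main diagonal included
        const int d = tri_root(index); // diagonal number, d = x + y
        const int x = index - tri(d);  // position along the diagonal
        return {x, d - x};
    }
    // Lower-right triangle: mirror the index, decode, mirror the cell back.
    const Cell m = to_cell(n, n * n - 1 - index);
    return {n - 1 - m.x, n - 1 - m.y};
}

// Cell -> flat index for an n x n matrix.
int to_index(int n, int x, int y) {
    assert(x >= 0 && x < n && y >= 0 && y < n);
    const int d = x + y;
    if (d < n) return tri(d) + x;
    // tri(2n - 1 - d) cells lie on diagonal d and beyond;
    // diagonal d starts at column d - n + 1.
    return n * n - tri(2 * n - 1 - d) + (x - (d - n + 1));
}

int neighbor(int n, int index, int dx, int dy) {
    const Cell c = to_cell(n, index);
    return to_index(n, c.x + dx, c.y + dy);
}
```

For the 4x4 example, neighbor(4, 15, -1, -1), neighbor(4, 15, 0, -1), neighbor(4, 15, -1, 0) and neighbor(4, 11, -2, -1) give 11, 14, 13 and 1. The single level of recursion in to_cell could be replaced by a select on GPU targets; that is left out here to keep the sketch readable.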

Better way than if else if else... for linear interpolation

The question is simple.
Let's say you have a function
double interpolate (double x);
and you have a table that maps known x -> y values,
for example
5 15
7 18
10 22
Note: real tables are bigger, of course; this is just an example.
so for 8 you would return 18+((8-7)/(10-7))*(22-18)=19.3333333
One cool way I found is
http://www.bnikolic.co.uk/blog/cpp-map-interp.html
(long story short it uses std::map, key= x, value = y for x->y data pairs).
In case somebody asks what the "if else if else" way in the title is,
it is basically:
if ((x>=5) && (x<=7))
{
//interpolate
}
else
if((x>=7) && x<=10)
{
//interpolate
}
So is there a cleverer way to do it, or is the map way the state of the art? :)
Btw, I prefer solutions in C++, but obviously a solution in any language that maps 1:1 to C++ is nice.
Well, the easiest way I can think of would be using a binary search to find the point where your point lies. Try to avoid maps if you can, as they are very slow in practice.
This is a simple way:
#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>
using namespace std;
const double INF = 1.e100;
vector<pair<double, double> > table;
double interpolate(double x) {
// Assumes that "table" is sorted by .first
// Check if x is out of bound
if (x > table.back().first) return INF;
if (x < table[0].first) return -INF;
vector<pair<double, double> >::iterator it, it2;
// INFINITY is defined in math.h in the glibc implementation
it = lower_bound(table.begin(), table.end(), make_pair(x, -INF));
// Corner case
if (it == table.begin()) return it->second;
it2 = it;
--it2;
return it2->second + (it->second - it2->second)*(x - it2->first)/(it->first - it2->first);
}
int main() {
table.push_back(make_pair(5., 15.));
table.push_back(make_pair(7., 18.));
table.push_back(make_pair(10., 22.));
// If you are not sure if table is sorted:
sort(table.begin(), table.end());
printf("%f\n", interpolate(8.));
printf("%f\n", interpolate(10.));
printf("%f\n", interpolate(10.1));
}
You can use a binary search tree to store the interpolation data. This is beneficial when you have a large set of N interpolation points, as interpolation can then be performed in O(log N) time. However, in your example, this does not seem to be the case, and the linear search suggested by RedX is more appropriate.
#include <stdio.h>
#include <assert.h>
#include <map>
static double interpolate (double x, const std::map<double, double> &table)
{
assert(table.size() > 0);
std::map<double, double>::const_iterator it = table.lower_bound(x);
if (it == table.end()) {
return table.rbegin()->second;
} else {
if (it == table.begin()) {
return it->second;
} else {
double x2 = it->first;
double y2 = it->second;
--it;
double x1 = it->first;
double y1 = it->second;
double p = (x - x1) / (x2 - x1);
return (1 - p) * y1 + p * y2;
}
}
}
int main ()
{
std::map<double, double> table;
table.insert(std::pair<double, double>(5, 6));
table.insert(std::pair<double, double>(8, 4));
table.insert(std::pair<double, double>(9, 5));
double y = interpolate(5.1, table);
printf("%f\n", y);
}
Store your points sorted:
index X Y
1 1 -> 3
2 3 -> 7
3 10-> 8
Then loop from max to min; as soon as you reach an x that is at or below your value, you know it is the one you want.
You want let's say 6 so:
// pseudo
for i = 3 to 1
if x[i] <= 6
// you found your range!
// interpolate between x[i] and x[i + 1]
break; // Do not look any further
end
end
Yes, I guess that you should think of a mapping between those intervals and the natural numbers. I mean, just label the intervals and use a switch:
switch(I) {
case Int1: //whatever
break;
...
default:
}
I don't know, it's the first thing that I thought of.
EDIT: A switch is more efficient than if-else if your numbers are within a relatively small interval (that's something to take into account when doing the mapping).
If your x-coordinates must be irregularly spaced, then store the x-coordinates in sorted order, and use a binary search to find the nearest coordinate, for example using Daniel Fleischman's answer.
However, if your problem permits it, consider pre-interpolating to regularly spaced data. So
5 15
7 18
10 22
becomes
5 15
6 16.5
7 18
8 19.3333333
9 20.6666667
10 22
Then at run-time you can interpolate with O(1) using something like this:
double interp1( double x0, double dx, double* y, int n, double xi )
{
double f = ( xi - x0 ) / dx;
if (f<0) return y[0];
if (f>=(n-1)) return y[n-1];
int i = (int) f;
double w = f-(double)i;
return y[i]*(1.0-w) + y[i+1]*w;
}
using
double y[6] = {15,16.5,18,19.3333333, 20.6666667, 22 };
double yi = interp1( 5.0 , 1.0 , y, 6, xi );
This isn't necessarily suitable for every problem -- you could end up losing accuracy (if there's no nice grid that contains all your x-samples), and it could have a bad cache penalty if it would make your table much much bigger. But it's a good option for cases where you have some control over the x-coordinates to begin with.
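The pre-interpolation step itself can be sketched as follows: a binary-search lookup on the irregular table is evaluated once per grid point to fill the regular table. The names lerp_table and resample are made up for illustration, and clamping outside the table range is an assumption that matches what interp1 does at its ends.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: piecewise-linear lookup on sorted, strictly
// increasing xs; values outside the range are clamped to the end points.
double lerp_table(const std::vector<double>& xs,
                  const std::vector<double>& ys, double x) {
    assert(!xs.empty() && xs.size() == ys.size());
    if (x <= xs.front()) return ys.front();
    if (x >= xs.back()) return ys.back();
    // First knot strictly greater than x; hi >= 1 because x > xs.front().
    const std::size_t hi =
        static_cast<std::size_t>(std::upper_bound(xs.begin(), xs.end(), x) - xs.begin());
    const std::size_t lo = hi - 1;
    const double w = (x - xs[lo]) / (xs[hi] - xs[lo]);
    return ys[lo] * (1.0 - w) + ys[hi] * w;
}

// Hypothetical helper: evaluate the table at x0, x0 + dx, ..., n points in total.
std::vector<double> resample(const std::vector<double>& xs,
                             const std::vector<double>& ys,
                             double x0, double dx, std::size_t n) {
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = lerp_table(xs, ys, x0 + static_cast<double>(i) * dx);
    }
    return out;
}
```

Calling resample with xs = {5, 7, 10}, ys = {15, 18, 22}, x0 = 5, dx = 1 and n = 6 reproduces the six-entry table shown above, which can then be passed to interp1.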
How you've already got it is fairly readable and understandable, and there's a lot to be said for that over a "clever" solution. You can however do away with the lower bounds check and clumsy && because the sequence is ordered:
if (x < 5)
return 0;
else if (x <= 7)
// interpolate
else if (x <= 10)
// interpolate
...