Interpolation search? - c++

I have a uniform 1D grid with the values {0.1, 0.22, 0.35, 0.5, 0.78, 0.92}. These values are equally spaced from position 0 to 5, as follows:
value    0.1       0.22      0.35      0.5       0.78      0.92
          |_________|_________|_________|_________|_________|
position  0         1         2         3         4         5
Now I'd like to extract an interpolated value at, say, position 2.3, which should be
val(2.3) = val(2)*(3-2.3) + val(3)*(2.3-2)
         = 0.35*0.7 + 0.5*0.3
         = 0.3950
So how should I do this in an optimized way in C++? I am on Visual Studio 2017.
I can think of a binary search, but are there any std methods or a better way to do the job? Thanks.

You can take the integer part of the interpolation position and use it to index the two values you need to interpolate between. There is no need for a binary search, as you always know between which two values you interpolate. You only need to watch out for indices outside the valid range, if that can ever happen.
This only works if the values are always mapped to integer indices starting with zero.
#include <cmath>
#include <vector>

float get( const std::vector<float>& val, float p )
{
    // let's assume p is always valid, so it is usable as an index
    const int a = static_cast<int>(p); // round down
    const float t = p - a;
    return std::lerp(val[a], val[a + 1], t);
}
Edit:
std::lerp is a C++20 feature. If you use an earlier standard, you can use the following implementation, which should be good enough:
float lerp(float a, float b, float t)
{
    return a + (b - a) * t;
}

Related

How do you properly "snap" to a value?

Let's say I've set a step of 0.1 in my application.
So, whatever fp value I get, I just need 1 digit after the decimal point.
So, 47.93434 must become 47.9 (or at least the nearest representable fp value).
If I write this:
double value = 47.9;
It correctly "snaps" to the nearest fp value it can get, which is:
47.89999999999999857891452847979962825775146484375 // fp value
101111.11100110011001100110011001100110011001100110011 // binary
Now, suppose "I don't write those values", but I get them from a software.
And then I need to snap them. I wrote this function:
inline double SnapValue(double value, double step) {
    return round(value / step) * step;
}
But it returns these values:
47.900000000000005684341886080801486968994140625 // fp value
101111.11100110011001100110011001100110011001100110100 // binary
which is formally a little farther off than the first example (it's the "next" fp value, 011 + 1).
How would you get the first value (which is "more correct") for each input value?
Here's the testing code.
NOTE: the step can be different - e.g. step = 0.25 needs to snap values to the nearest multiple of 0.25. Example: a step of 0.25 will return values such as 0, 0.25, 0.50, 0.75, 1.0, 1.25 and so on. Thus, given an input of 1.30, it needs to snap to the nearest snapped value - i.e. 1.25.
You could try to use rational values instead of floating point. The latter are often inaccurate already, so not really an ideal match for a step.
inline double snap(double original, int numerator, int denominator)
{
    return round(original * denominator / numerator) * numerator / denominator;
}
Say you want steps of 0.4, then use 2 / 5:
snap(1.7435, 2, 5) = round(4.35875) * 2 / 5 = 4 * 2 / 5 = 1.6 (or what comes closest to it)

"Normalize" a 2D Vector in C++ using lambda

I am implementing a lambda to row normalize a 2D vector in C++. Consider the simple case of a 3x3 matrix.
1 0 1
0 1 0
0 1 1
My normalization factor is the sum of non-zero entries in the row. Each entry is then divided by this normalization factor. For instance, row 1 has 2 non-zero entries summing up to 2. Therefore, I divide each entry by 2. The row-normalized vector is defined as follows:
1/2 0 1/2
0 1 0
0 1/2 1/2
The relevant normalization code is shown here (note MAX_SIZE = 3). There is a syntactical error in the lambda capture list.
for(int i = 0; i < MAX_SIZE; i++)
{
    transform(matrix[i].begin(), matrix[i].end(), matrix.begin(), [matrix[i].begin()](int x){
        return distance(matrix[i].begin(), lower_bound(matrix[i].begin(), matrix[i].end(), x))});
}
Am I missing anything here?
A lambda capture list in C++ can only specify the names of values to capture, and matrix[i].begin() is not a name, it is a temporary value. You can either give it a name or you can make a variable for it in the enclosing scope. Much of the surrounding code is missing, so I invented a working version of the code for you to dissect:
#include <algorithm>
#include <cstdio>
#include <numeric>

template<int N>
void normalize(double (&mat)[N][N]) {
    std::for_each(std::begin(mat), std::end(mat),
        [](double (&row)[N]) {
            double sum = std::accumulate(std::begin(row), std::end(row), 0.0);
            std::transform(std::begin(row), std::end(row), std::begin(row),
                [sum](double x) { return x / sum; });
        });
}

template<int N>
void print(const double (&mat)[N][N]) {
    std::for_each(std::begin(mat), std::end(mat),
        [](const double (&row)[N]) {
            std::for_each(std::begin(row), std::end(row),
                [](double x) { std::printf(" %3.1f", x); });
            std::putchar('\n');
        });
}

int main() {
    double mat[3][3] = {
        { 1, 0, 1 },
        { 0, 1, 0 },
        { 0, 1, 1 },
    };
    std::puts("Matrix:");
    print(mat);
    normalize(mat);
    std::puts("Normalized:");
    print(mat);
    return 0;
}
Here is the output:
Matrix:
1.0 0.0 1.0
0.0 1.0 0.0
0.0 1.0 1.0
Normalized:
0.5 0.0 0.5
0.0 1.0 0.0
0.0 0.5 0.5
This code is a bit weird, as far as C++ code goes, because it uses lambdas for everything instead of plain loops (rather than mixing for loops with higher-order functions). But you can see that by having a variable for each row (named row) we make it very easy to loop over that row instead of writing matrix[i] everywhere.
The weird syntax for array parameters double (&mat)[N][N] is to avoid pointer decay, which allows us to use begin() and end() in the function body (which don't work if the parameters decay to pointers).

Kernel Density Estimator ( with Gauss Kernel ) Sum f(x) = 1?

I want to use KDE with the Gaussian kernel. If I'm correct, the sum of all f(x) must be 1 (up to rounding)?
My Implementation looks like this:
#include <cmath>
#include <utility>
#include <vector>

float K( float const& val )
{
    const float p = 1.0 / std::sqrt( 2.0 * M_PI );
    float result = 0.5 * (val * val);
    result = p * std::exp( -result );
    return result;
}

std::vector< std::pair<float, float> > kde( float *val, int len, float h )
{
    std::vector< std::pair<float, float> > density( len );
    const float p = 1.0 / (h * len);
    for( int r = 0; r < len; r++ )
    {
        float sum = 0;
        for( int i = 0; i < len; i++ )
            sum += K( (val[r] - val[i]) / h );
        density[r] = std::make_pair( val[r], p * sum );
    }
    return density;
}
I chose h > 0. Am I right that p*sum is the probability for the value val[r]? The sum over all probabilities is > 1 (but that looks OK to me).
You misinterpreted the assumptions on the probability density here. The density integrates to one, whereas its values at certain points are definitely not 1.
Let's discuss it using the following formula from the linked Wikipedia article, which you seem to use:
f_h(x) = 1/(n*h) * Σ_{i=1}^{n} K( (x - x_i)/h )
This formula provides the density f_h(x) evaluated at point x.
From my review, your code correctly evaluates this quantity. Yet, you misinterpreted the quantity which should be one. As a density, the integral over the complete space should yield one, i.e.
∫ f_h(x) dx = 1
This property is called normalization of the density.
Moreover, being a density itself, each summand of f_h(x) should yield 1/n when integrated over the whole space, when one also includes the normalization constant. Again, there's no guarantee on the values of the summands.
In one dimension, you can easily confirm the normalization by using the trapezoidal rule or another quadrature scheme (if you provide a working example, I can try to do that).

How to let -1==-1.0000000000001

Here is a part of my code:
double tmp = OP.innerProduct(OQ);
double tmp2 = -1;
and the values of tmp and tmp2 are (in binary):
tmp = 0b1011111111110000000000000000000000000000000000000000000000000001
tmp2= 0b1011111111110000000000000000000000000000000000000000000000000000
If I use acos(tmp), it returns nan.
I don't want the nan value, and I would like to ignore the small error to keep tmp in the range [-1,1].
How to do so?
EDIT:
I have two points given in spherical coordinates (for example, (r,45,45) and (r,225,-45)).
Then I need to convert them to cartesian coordinates. (A small error occurs here!)
Then I want to compute the angle between the two points.
The analytical solution differs from the computed solution (because of the small error).
I would like to make the two solutions the same.
Are you trying to prevent branching? I usually make a little helper when I'm doing anything like this:
template<typename T>
inline T Clamp( T val, T low, T high ) {
    return val < low ? low : (val > high ? high : val);
}
And then:
double result = acos( Clamp(tmp, -1.0, 1.0) );
If you're trying to write highly optimized code without branching, this won't help. Depending on the accuracy you require, you might consider making an acos lookup table and just put an extra value at each end to handle error-induced overflow.
[edit] I've just had a play around with a [-1,1] clamp without branching. Of course, this only cures inaccuracies. If you call it with a number that is grossly outside the range, it will bomb:
inline double safer_acos (double val)
{
    double vals[] = {-1.0, val, val, 1.0};
    return acos( vals[int(2.0 + val)] );
}

Code for normal distribution returns unexpected values [duplicate]

From this question: Random number generator which gravitates numbers to any given number in range? I did some research since I've come across such a random number generator before. All I remember was the name "Mueller", so I guess I found it, here:
Box-Mueller transform
I can find numerous implementations of it in other languages, but I can't seem to implement it correctly in C#.
This page, for instance, The Box-Muller Method for Generating Gaussian Random Numbers says that the code should look like this (this is not C#):
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

double gaussian(void)
{
    static double v, fac;
    static int phase = 0;
    double S, Z, U1, U2, u;

    if (phase)
        Z = v * fac;
    else
    {
        do
        {
            U1 = (double)rand() / RAND_MAX;
            U2 = (double)rand() / RAND_MAX;
            u = 2. * U1 - 1.;
            v = 2. * U2 - 1.;
            S = u * u + v * v;
        } while (S >= 1);

        fac = sqrt(-2. * log(S) / S);
        Z = u * fac;
    }

    phase = 1 - phase;
    return Z;
}
Now, here's my implementation of the above in C#. Note that the transform produces 2 numbers, hence the trick with the "phase" above. I simply discard the second value and return the first.
public static double NextGaussianDouble(this Random r)
{
    double u, v, S;

    do
    {
        u = 2.0 * r.NextDouble() - 1.0;
        v = 2.0 * r.NextDouble() - 1.0;
        S = u * u + v * v;
    }
    while (S >= 1.0);

    double fac = Math.Sqrt(-2.0 * Math.Log(S) / S);
    return u * fac;
}
My question is with the following specific scenario, where my code doesn't return a value in the range of 0-1, and I can't understand how the original code can either.
u = 0.5, v = 0.1
S becomes 0.5*0.5 + 0.1*0.1 = 0.26
fac becomes ~3.22
the return value is thus ~0.5 * 3.22 or ~1.6
That's not within 0 .. 1.
What am I doing wrong/not understanding?
If I modify my code so that instead of multiplying fac with u, I multiply by S, I get a value that ranges from 0 to 1, but it has the wrong distribution (seems to have a maximum distribution around 0.7-0.8 and then tapers off in both directions.)
Your code is fine. Your mistake is thinking that it should return values exclusively within [0, 1]. The (standard) normal distribution is a distribution with nonzero weight on the entire real line. That is, values outside of [0, 1] are possible. In fact, values within [-1, 0] are just as likely as values within [0, 1], and moreover, the complement of [0, 1] has about 66% of the weight of the normal distribution. Therefore, 66% of the time we expect a value outside of [0, 1].
Also, I think this is not the Box-Mueller transform, but is actually the Marsaglia polar method.
I am no mathematician or statistician, but if I think about this I would not expect a Gaussian distribution to return numbers in an exact range. Given your implementation, the mean is 0 and the standard deviation is 1, so I would expect values distributed on the bell curve, centered on 0 and tapering off on either side. So the sequence would definitely cover both positive and negative numbers.
Then since it is statistical, why would it be hard limited to -1..1 just because the std.dev is 1? There can statistically be some play on either side and still fulfill the statistical requirement.
The uniform random variate is indeed within 0..1, but the gaussian random variate (which is what Box-Muller algorithm generates) can be anywhere on the real line. See wiki/NormalDistribution for details.
I think the function returns polar coordinates. So you need both values to get correct results.
Also, Gaussian distribution is not between 0 .. 1. It can easily end up as 1000, but probability of such occurrence is extremely low.
This is a Monte Carlo method, so you can't clamp the result, but what you can do is ignore samples.
// return a random value in the range [0,1].
double gaussian_random()
{
    double sigma = 1.0 / 8.0; // or whatever works.
    while (1) {
        double z = gaussian() * sigma + 0.5;
        if (z >= 0.0 && z <= 1.0)
            return z;
    }
}