How to convert extremely small exp values to 0? - c++

I'm implementing the Gram-Schmidt orthogonalization process. At some point I get output 3D vectors with extremely small values that are basically zeros. How should I deal with values such as -3.5527136788005009 * 10^-15?
How can I convert them to zero, or check whether a value is almost zero?

You asked "How to convert them to zero?" If you want to convert extremely small values to zero you can use a simple if statement:
const double delta = 0.000000001;
if (x < delta && x > -delta) { x = 0; }

I dug through my old code and found this little function:
#include <cmath>
static const double eps = 1e-10;
bool isZero(double value) {
    return std::abs(value) <= eps;
}
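Applied to the Gram-Schmidt output, the same test can clean every component of a 3D vector. This is a minimal sketch, assuming vectors are stored as std::array<double, 3> and that an absolute threshold is acceptable because the orthonormalized vectors have unit length; the name snapToZero and the default threshold are hypothetical.

```cpp
#include <array>
#include <cmath>

// Hypothetical helper: replace components whose magnitude is at most eps with
// exactly 0.0. An absolute threshold is reasonable here because Gram-Schmidt
// output vectors are normalized, so their components lie in [-1, 1].
void snapToZero(std::array<double, 3>& v, double eps = 1e-10) {
    for (double& c : v) {
        if (std::abs(c) <= eps) {
            c = 0.0;
        }
    }
}
```

If your vectors are not normalized, scale eps by the vector's norm instead of using a fixed value.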


C++ float comparison

I want to compare floats. I had a problem when comparing for equality, so I used an epsilon, and that solved it:
inline bool isEqual(float x, float y) {
    const float epsilon = 1e-5f;
    return std::abs(x - y) <= epsilon * std::abs(x);
}
But I also want to do other comparisons, such as '>' and '<='.
I have two floats that are both 49, but f1 > f2 returns true.
I have tried this function:
inline bool isSmallerOrEqual(float x, float y) {
    const float epsilon = 0.01f;
    return epsilon * std::abs(x) <= epsilon * std::abs(y);
}
It worked, but not for all values.
Any ideas?
First, your isEqual function may be wrong: you're using the relative-epsilon version. This version works better with extremely large numbers but breaks down with extremely small numbers. The alternative is an absolute epsilon, which works well with small numbers (including extremely small ones) but breaks down with extremely large numbers.
inline bool epsilonEquals(const float x, const float y, const float epsilon = 1E-5f) {
    return std::abs(x - y) <= epsilon;
}
Second, calling it isEqual is inaccurate. The function is generally known as epsilon equals, because it doesn't verify that two numbers are equal; it verifies that they are reasonably close to each other, within a small margin of error (epsilon).
Third, if you want to check whether one number is less than or equal to another, you generally don't need, or even want, an epsilon-equals function involved; doing so only increases the likelihood of false positives. If you do want to use an epsilon comparison, you can take advantage of paulmckenzie's method:
inline bool epsilonLessThanOrEqualTo(const float x, const float y, const float epsilon = 1E-5f) {
    return x <= y || epsilonEquals(x, y, epsilon);
}
If you want an epsilon not-equals method, you can simply check that the absolute difference between the two numbers is larger than epsilon.
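A common way to get the strengths of both the absolute and the relative versions is to accept two values if they are within either tolerance; Python's math.isclose uses this same rule. This is a sketch of that combined technique, not the method from the answer itself; the names nearlyEqual and nearlyNotEqual and the default tolerances are assumptions to tune for your data.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical combined test: the absolute tolerance handles values near zero,
// the relative tolerance handles large magnitudes.
inline bool nearlyEqual(float x, float y,
                        float absTol = 1e-6f, float relTol = 1e-5f) {
    const float diff = std::fabs(x - y);
    const float scale = std::max(std::fabs(x), std::fabs(y));
    return diff <= std::max(absTol, relTol * scale);
}

// Epsilon not-equals: simply the negation of the combined equality test.
inline bool nearlyNotEqual(float x, float y,
                           float absTol = 1e-6f, float relTol = 1e-5f) {
    return !nearlyEqual(x, y, absTol, relTol);
}
```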
I was adding float values (each with one decimal place) in a for loop and then comparing the result with a directly declared float, but I was not getting the right answers.
I solved it by converting the floats to int: multiplying each by 10, then comparing the sum with value*10.
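Converting to integers works when every value has a known, fixed number of decimal places. One caveat: if value * 10 lands just below the intended integer, a plain (int) cast truncates it to the integer below, so rounding with std::lround is safer. A minimal sketch, assuming exactly one decimal place; sumTenths and sumEquals are hypothetical names.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: convert each one-decimal value to an integer count of
// tenths (rounding, not truncating), so the sum is exact integer arithmetic.
long sumTenths(const std::vector<float>& values) {
    long total = 0;
    for (float v : values) {
        total += std::lround(v * 10.0f);
    }
    return total;
}

// Compare the exact integer sum with the expected value, also in tenths.
bool sumEquals(const std::vector<float>& values, float expected) {
    return sumTenths(values) == std::lround(expected * 10.0f);
}
```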

Make sure c++ decimal comparison is correct

I have two double variables: double a = 0.10000 and double b = 0.1. How can I make sure the comparison (a == b) is always true?
If you are being paranoid about using == on doubles or floats (which you should be), you can always check that they are close within a small tolerance.
bool same = fabs(a-b) < 0.000001;
The other answers here require you to scale the tolerance factor manually, which I wouldn't advise. For instance, if you are comparing two numbers smaller than one millionth, one answer will always say the two numbers are "close enough." The other answer instead leaves it to the caller to specify the tolerance, which is equally error-prone.
I would instead suggest something like the following function. It returns 0 if the two doubles are within a relative range of each other (dEpsilon times the larger magnitude), otherwise -1 (if d1 is smaller) or +1. Using fabs() may require you to link with the math library, for example with -lm.
#include <algorithm> // for std::max()
#include <cmath> // for std::fabs()
int double_compare( double d1, double d2 ) {
    double dEpsilon = .00000001;
    double dLarger = std::max( std::fabs(d1), std::fabs(d2) );
    double dRange = dLarger * dEpsilon;
    // <= rather than < so that two exact zeros (where dRange == 0) compare equal
    if ( std::fabs( d1 - d2 ) <= dRange )
        return 0;
    return d1 < d2 ? -1 : 1;
}
New answer to an old question: using epsilons is the way to go. Check this example:
#include <cmath>
#include <limits>
bool equals(const double a, const double b, const double maxRelativDiff = std::numeric_limits<double>::epsilon()) {
    double difference = std::fabs(a - b);
    const auto absoluteA = std::fabs(a);
    const auto absoluteB = std::fabs(b);
    double biggerBoi = (absoluteB > absoluteA) ? absoluteB : absoluteA; // Get the bigger number
    return difference <= (biggerBoi * maxRelativDiff);
}
In this case you're checking whether a and b are equal up to a relative difference of maxRelativDiff (scaled by the larger magnitude), so, e.g., 0.0001 == 0.0001.
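The classic case where == fails but a relative test succeeds is 0.1 + 0.2 compared with 0.3. The sketch below restates the relative check under a hypothetical name so the snippet is self-contained. It also shows the known weakness of a purely relative test: a tiny nonzero value never compares equal to exactly 0.0, because the allowed difference scales with the larger magnitude.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Relative comparison, same rule as equals() above: the allowed difference is
// maxRelativeDiff times the larger of the two magnitudes.
bool relativeEquals(double a, double b,
                    double maxRelativeDiff = std::numeric_limits<double>::epsilon()) {
    const double larger = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= larger * maxRelativeDiff;
}
```

With the default tolerance on a typical IEEE 754 platform, relativeEquals(0.1 + 0.2, 0.3) is true, because the two doubles differ by one unit in the last place, while relativeEquals(1e-20, 0.0) is false. If values near zero matter, combine this with an absolute tolerance.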

Efficient way to compute geometric mean of many numbers

I need to compute the geometric mean of a large set of numbers, whose values are not a priori limited. The naive way would be
double geometric_mean(std::vector<double> const&data) // failure
{
    auto product = 1.0;
    for(auto x:data) product *= x;
    return std::pow(product,1.0/data.size());
}
However, this may well fail because of underflow or overflow in the accumulated product (note: long double doesn't really avoid this problem). So, the next option is to sum-up the logarithms:
double geometric_mean(std::vector<double> const&data)
{
    auto sum_log = 0.0;
    for(auto x:data) sum_log += std::log(x);
    return std::exp(sum_log/data.size());
}
This works, but calls std::log() for every element, which is potentially slow. Can I avoid that? For example by keeping track of (the equivalent of) the exponent and the mantissa of the accumulated product separately?
The "split exponent and mantissa" solution:
double geometric_mean(std::vector<double> const & data)
{
    double m = 1.0;
    long long ex = 0;
    double invN = 1.0 / data.size();
    for (double x : data)
    {
        int i;
        double f1 = std::frexp(x,&i); // x == f1 * 2^i, with 0.5 <= |f1| < 1
        m *= f1;
        ex += i;
    }
    return std::pow( std::numeric_limits<double>::radix,ex * invN) * std::pow(m,invN);
}
If you are concerned that ex might overflow you can define it as a double instead of a long long, and multiply by invN at every step, but you might lose a lot of precision with this approach.
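EDIT: Every mantissa f1 lies in [0.5, 1), so m itself can underflow once a product of more than 1022 such factors drops below the smallest normal double, 2^-1022. For large inputs, we can therefore split the computation into several buckets: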
double geometric_mean(std::vector<double> const & data)
{
    long long ex = 0;
    auto do_bucket = [&data,&ex](int first,int last) -> double
    {
        double ans = 1.0;
        for ( ;first != last;++first)
        {
            int i;
            ans *= std::frexp(data[first],&i);
            ex += i;
        }
        return ans;
    };
    const int bucket_size = -std::log2( std::numeric_limits<double>::min() );
    std::size_t buckets = data.size() / bucket_size;
    double invN = 1.0 / data.size();
    double m = 1.0;
    for (std::size_t i = 0;i < buckets;++i)
        m *= std::pow( do_bucket(i * bucket_size,(i+1) * bucket_size),invN );
    m*= std::pow( do_bucket( buckets * bucket_size, data.size() ),invN );
    return std::pow( std::numeric_limits<double>::radix,ex * invN ) * m;
}
I think I figured out a way to do it. It combines the two routines in the question, similar to Peter's idea. Here is some example code.
double geometric_mean(std::vector<double> const&data)
{
    const double too_large = 1.e64;
    const double too_small = 1.e-64;
    double sum_log = 0.0;
    double product = 1.0;
    for(auto x:data) {
        product *= x;
        if(product > too_large || product < too_small) {
            sum_log+= std::log(product);
            product = 1;
        }
    }
    return std::exp((sum_log + std::log(product))/data.size());
}
The bad news is: this comes with a branch. The good news: the branch predictor is likely to get this almost always right (the branch should only rarely be triggered).
The branch could be avoided using Peter's idea of a constant number of terms in the product. The problem with that is that overflow/underflow may still occur within only a few terms, depending on the values.
You may be able to accelerate this by multiplying numbers as in your original solution and only converting to logarithms every certain number of multiplications (depending on the size of your initial numbers).
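A sketch of that idea, assuming the inputs are positive and bounded so that blockSize consecutive factors cannot overflow or underflow a double: for example, if every value lies in [1e-30, 1e30], ten factors stay within [1e-300, 1e300]. The function name and the blockSize parameter are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Multiply plainly within a block, then fold the block's product into a
// running sum of logarithms: one std::log call per blockSize elements.
double geometricMeanBlocked(const std::vector<double>& data, std::size_t blockSize = 10) {
    double sumLog = 0.0;
    double product = 1.0;
    std::size_t count = 0;
    for (double x : data) {
        product *= x;
        if (++count == blockSize) {
            sumLog += std::log(product);
            product = 1.0;
            count = 0;
        }
    }
    sumLog += std::log(product);  // leftover partial block (log(1) == 0 if empty)
    return std::exp(sumLog / data.size());
}
```

Unlike the branch-based version above, this trades safety for simplicity: it only works if you can bound the inputs in advance.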
A different approach which would give better accuracy and performance than the logarithm method would be to compensate out-of-range exponents by a fixed amount, maintaining an exact logarithm of the cancelled excess. Like so:
const int EXP = 64; // maximal/minimal exponent
const double BIG = std::pow(2, EXP); // overflow threshold
const double SMALL = std::pow(2, -EXP); // underflow threshold
double product = 1;
int excess = 0; // number of times BIG has been divided out of product
for(int i=0; i<n; i++) {
    product *= A[i];
    while(product > BIG) {
        product *= SMALL;
        excess++;
    }
    while(product < SMALL) {
        product *= BIG;
        excess--;
    }
}
double mean = std::pow(product, 1.0/n) * std::pow(BIG, double(excess)/n);
All multiplications by BIG and SMALL are exact, and there are no calls to log (a transcendental, and therefore particularly imprecise, function).
There is a simple idea to reduce computation and also prevent overflow: group the numbers, say at least two at a time, take the log of each group's product, and then sum those logs. For five numbers a, b, c, d, e with geometric mean K:
log(abcde) = 5*log(K)
log(ab) + log(cde) = 5*log(K)
Summing logs to compute products stably is perfectly fine, and rather efficient (if this is not enough, there are ways to get vectorized logarithms with a few SSE operations; Intel MKL's vector operations are another option).
To avoid overflow, a common technique is to divide every number by the maximum- or minimum-magnitude entry beforehand (or to sum the log differences relative to the log max or log min). You can also use buckets if the numbers vary a lot (e.g., sum the logs of small numbers and of large numbers separately). Note that neither of these is typically needed except for very large sets, since the log of a double is never huge (roughly between -745 and 710).
Also, you need to keep track of the signs separately.
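A minimal sketch of tracking the sign alongside the log-magnitude; the struct and function names are hypothetical. A zero anywhere makes the whole product zero, which the sketch reports as sign 0.

```cpp
#include <cmath>
#include <vector>

struct SignedLog {
    int sign;       // +1, -1, or 0 if any factor was zero
    double logAbs;  // sum of log|x| over the factors (not meaningful when sign == 0)
};

SignedLog signedLogProduct(const std::vector<double>& data) {
    SignedLog r{1, 0.0};
    for (double x : data) {
        if (x == 0.0) {
            return {0, 0.0};
        }
        if (x < 0.0) {
            r.sign = -r.sign;  // each negative factor flips the sign
        }
        r.logAbs += std::log(std::fabs(x));
    }
    return r;
}
```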
Computing log x typically keeps the same number of significant digits as x, except when x is close to 1: use std::log1p if you need to compute prod(1 + x_n) with small x_n.
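For example, the average growth factor of small rates x_n (the geometric mean of 1 + x_n) can be computed with std::log1p and converted back with std::expm1, which keeps precision when the rates are tiny. A sketch under the assumption that every x_n > -1; the function name is hypothetical.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of (1 + x_n), minus 1, computed without ever forming 1 + x_n
// explicitly, so small rates such as 1e-12 are not rounded away.
double meanGrowthRate(const std::vector<double>& rates) {
    double sumLog = 0.0;
    for (double x : rates) {
        sumLog += std::log1p(x);  // accurate log(1 + x) for small x
    }
    return std::expm1(sumLog / rates.size());  // accurate exp(y) - 1 for small y
}
```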
Finally, if you have roundoff error problems when summing, you can use Kahan summation or variants.
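A sketch of Kahan (compensated) summation applied to the sum of logarithms; the function name is hypothetical. Note that aggressive compiler flags such as -ffast-math may reassociate the arithmetic and optimize the compensation away.

```cpp
#include <cmath>
#include <vector>

// Geometric mean with compensated (Kahan) summation of the logarithms.
// `c` carries the low-order bits lost when adding a small term to a large sum.
double geometricMeanKahan(const std::vector<double>& data) {
    double sum = 0.0;
    double c = 0.0;
    for (double x : data) {
        const double y = std::log(x) - c;  // apply the previous correction
        const double t = sum + y;          // low-order bits of y may be lost here
        c = (t - sum) - y;                 // recover what was lost (algebraically 0)
        sum = t;
    }
    return std::exp(sum / data.size());
}
```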
Instead of using logarithms, which are very expensive, you can directly scale the results by powers of two.
#include <cmath>
#include <vector>
double geometric_mean(std::vector<double> const&data) {
    double huge = std::scalbn(1.0,512);
    double tiny = std::scalbn(1.0,-512);
    int scale = 0; // net number of 2^512 factors divided out
    double product = 1.0;
    for(auto x:data) {
        if (x >= huge) {
            x = std::scalbn(x, -512);
            scale++;
        } else if (x <= tiny) {
            x = std::scalbn(x, 512);
            scale--;
        }
        product *= x;
        if (product >= huge) {
            product = std::scalbn(product, -512);
            scale++;
        } else if (product <= tiny) {
            product = std::scalbn(product, 512);
            scale--;
        }
    }
    return std::exp2((512.0*scale + std::log2(product)) / data.size());
}

I have const float M = 0.000001; and float input;. I want to check whether they are not equal, but I know a direct check such as M != input has side effects (it is unreliable for floats). So my question is: how can I compare two float values without such side effects?
const double epsilon = 1e-12;
if(fabs(input - M) < epsilon) //input == M
if(fabs(input - M) >= epsilon) // input != M
The smaller epsilon is, the stricter the comparison, and therefore the more likely it is to report two values as unequal when you wanted them treated as equal. The larger epsilon is, the more likely it is to report two values as equal when you wanted them treated as unequal. Choose epsilon according to the specifics of the task at hand.
When comparing floats, you have to compare them for being "close" instead of "equal." There are multiple ways to define "close," depending on what you need. A typical approach looks something like this:
#include <cmath>
namespace FloatCmp {
const float Eps = 1e-6f;
bool eq(float a, float b, float eps = Eps) {
    return std::fabs(a - b) < eps;
}
//etc. for neq, lt, gt, ...
}
Then, use FloatCmp::eq() instead of == to compare floats.
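The "//etc." can be filled in so that every relation agrees with eq(): values within eps count as equal, and the strict relations require a gap of at least eps. A sketch under that assumption, keeping the absolute tolerance of eq(); the additional function names are illustrative.

```cpp
#include <cmath>

namespace FloatCmp {
const float Eps = 1e-6f;

bool eq(float a, float b, float eps = Eps) { return std::fabs(a - b) < eps; }
bool neq(float a, float b, float eps = Eps) { return !eq(a, b, eps); }
// Strictly less: a is below b by at least eps.
bool lt(float a, float b, float eps = Eps) { return a < b && !eq(a, b, eps); }
bool gt(float a, float b, float eps = Eps) { return lt(b, a, eps); }
// Less or equal: below b, or close enough to count as equal.
bool le(float a, float b, float eps = Eps) { return a < b || eq(a, b, eps); }
bool ge(float a, float b, float eps = Eps) { return le(b, a, eps); }
}  // namespace FloatCmp
```

Defining gt, le and ge in terms of lt and eq keeps all six relations mutually consistent, so, for instance, two nearly-equal values can never be both gt and eq.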

How can I make MATLAB precision the same as in C++?

I have a problem with precision. I have to make my C++ code have the same precision as MATLAB. In MATLAB I have a script that does some calculations with numbers, and I have C++ code that does the same as that script. The output for the same input is different :( I found that in my script, 104 >= 104 returns false. I tried format long, but it did not help me find out why it's false. Both numbers are of type double. I thought that maybe MATLAB stores the real value of 104 somewhere and it is really something like 103.9999..., so I increased the precision in C++. That didn't help either, because where MATLAB returns 50.000, in C++ I get 50.050 with high precision. Both values come from a few calculations such as + or *. Is there any way to make my C++ code and MATLAB script have the same precision?
for i = 1:neighbors
y = spoints(i,1)+origy;
x = spoints(i,2)+origx;
% Calculate floors, ceils and rounds for the x and y.
fy = floor(y); cy = ceil(y); ry = round(y);
fx = floor(x); cx = ceil(x); rx = round(x);
% Check if interpolation is needed.
if (abs(x - rx) < 1e-6) && (abs(y - ry) < 1e-6)
% Interpolation is not needed, use original datatypes
N = image(ry:ry+dy,rx:rx+dx);
D = N >= C;
else
% Interpolation needed, use double type images
ty = y - fy;
tx = x - fx;
% Calculate the interpolation weights.
w1 = (1 - tx) * (1 - ty);
w2 = tx * (1 - ty);
w3 = (1 - tx) * ty ;
w4 = tx * ty ;
%Compute interpolated pixel values
N = w1*d_image(fy:fy+dy,fx:fx+dx) + w2*d_image(fy:fy+dy,cx:cx+dx) + ...
w3*d_image(cy:cy+dy,fx:fx+dx) + w4*d_image(cy:cy+dy,cx:cx+dx);
D = N >= d_C;
end
end
I have problems in the else branch, which is on line 12. tx and ty equal 0.707106781186547 or 1 - 0.707106781186547. Values from d_image are in the range 0 to 255. N holds values in 0..255, interpolated from 4 pixels of the image, and d_C holds values in 0..255. I still don't know why MATLAB behaves like this: when N contains values like x x x 140.0000 140.0000 and d_C contains x x x 140 x, D gives me 0 at the 4th position, so 140.0000 >= 140 evaluates to false. I debugged it with more precision, but it still shows 140.00000000000000, and it is still not treated as 140.
int Codes::Interpolation( Point_<int> point, Point_<int> center , Mat *mat)
{
    int x = center.x-point.x;
    int y = center.y-point.y;
    Point_<double> my;
    int a=my.x;
    int b=my.y;
    double tx = my.x - a;
    double ty = my.y - b;
    double wage[4];
    wage[0] = (1 - tx) * (1 - ty);
    wage[1] = tx * (1 - ty);
    wage[2] = (1 - tx) * ty ;
    wage[3] = tx * ty ;
    int values[4];
    // write the 4 pixels used in the interpolation into the array
    for(int i=0;i<4;i++)
    {
        int val = mat->at<uchar>(Point_<int>(a+help[i].x,a+help[i].y));
        values[i] = val;
    }
    double moze = (wage[0]) * (values[0]) + (wage[1]) * (values[1]) + (wage[2]) * (values[2]) + (wage[3]) * (values[3]);
    return moze;
}
LEN = 0.707106781186547. The values in the array values are exactly the same as the MATLAB values.
Matlab uses double precision. You can use C++'s double type. That should make most things similar, but not 100%.
As someone else noted, this is probably not the source of your problem. Either there is a difference in the algorithms, or it might be something like a library function defined differently in Matlab and in C++. For example, Matlab's std() divides by (n-1) and your code may divide by n.
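To check whether this particular difference applies, you can compute both conventions and compare them against MATLAB's output. A sketch; the function name and the sample flag are hypothetical, and MATLAB's default std(x) corresponds to sample = true.

```cpp
#include <cmath>
#include <vector>

// Standard deviation with either normalization:
// sample = true divides by (n - 1), like MATLAB's default std(x);
// sample = false divides by n (population standard deviation).
double stddev(const std::vector<double>& v, bool sample = true) {
    double mean = 0.0;
    for (double x : v) mean += x;
    mean /= v.size();
    double sq = 0.0;
    for (double x : v) sq += (x - mean) * (x - mean);
    return std::sqrt(sq / (sample ? v.size() - 1 : v.size()));
}
```

For {1, 2, 3, 4} the two results are sqrt(5/3) and sqrt(1.25), a difference far larger than any floating-point rounding.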
First, as a rule of thumb, it is never a good idea to compare floating-point variables directly. For example, instead of if (nr >= 104) you should use if (nr >= 104-e), where e is a small number, like 0.00001.
However, there must be some serious undersampling or rounding error somewhere in your script, because getting 50.050 instead of 50.000 is far beyond ordinary floating-point imprecision: a double, which MATLAB uses by default, carries about 15 to 16 significant decimal digits.
I suspect there are some type-conversion problems in your code, for example:
int i;
double d;
// ...
d = i/3 * d;
will give a very inaccurate result, because it performs integer division. d = (double)i/3 * d or d = i/3. * d would give a much more accurate result.
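A small sketch that makes the difference visible; the function names and the sample values 7 and 1.5 are illustrative only.

```cpp
// i / 3 is integer division (7 / 3 == 2), so the fraction is lost before
// the multiplication by d; converting one operand to double first keeps it.
double truncatedVersion(int i, double d) { return i / 3 * d; }
double correctedVersion(int i, double d) { return static_cast<double>(i) / 3 * d; }
```

With i = 7 and d = 1.5, truncatedVersion returns exactly 3.0, while correctedVersion returns approximately 3.5.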
The above example would NOT cause any problems in Matlab, because there everything is already a floating-point number by default, so a similar problem might be behind the differences in the results of the c++ and Matlab code.
Seeing your calculations would help a lot in finding what went wrong.
In C and C++, if you compare a double that is the result of a computation with an integer of the same nominal value, there is a good chance that they will not be equal. It's the same with two doubles, although you might get lucky if you perform exactly the same computations on both. Even in MATLAB it's dangerous, and maybe you were just lucky that, since both are doubles, both got rounded the same way.
From your recent edit, it seems that the problem is where you evaluate your array. You should never use == or != when comparing floats or doubles in C++ (or in any language, when you use floating-point variables). The proper way to compare them is to check whether they are within a small distance of each other.
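Applied to the thresholding step from the question, the comparison N >= d_C becomes N >= d_C - tol. A sketch; the names approxGreaterEqual and interpolate and the tolerance 1e-9 are assumptions, and the tolerance should be chosen relative to the 0..255 pixel scale.

```cpp
#include <array>

// Tolerant version of N >= C: values that fall short of C only by rounding
// error (less than tol) still count as "greater or equal".
bool approxGreaterEqual(double n, double c, double tol = 1e-9) {
    return n >= c - tol;
}

// Bilinear interpolation of four pixel values with weights built from tx, ty,
// mirroring the w1..w4 formulas in the question.
double interpolate(const std::array<double, 4>& p, double tx, double ty) {
    return (1 - tx) * (1 - ty) * p[0] + tx * (1 - ty) * p[1]
         + (1 - tx) * ty * p[2] + tx * ty * p[3];
}
```

When all four pixels are 140, the interpolated value can land a rounding error away from 140 in either direction, depending on how the weights round; the tolerant comparison gives the same answer either way.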
An analogy: using == or != to compare two doubles is like comparing the weights of two objects by counting the atoms in them, and deciding that they are not equal even if they differ by a single atom.
MATLAB uses double precision unless you say otherwise. Any differences you see with an identical implementation in C++ will be due to floating-point errors.