How to let -1==-1.0000000000001 - c++

Here is a part of my code:
double tmp = OP.innerProduct(OQ);
double tmp2 = -1;
and the values of tmp and tmp2 are (in binary):
tmp = 0b1011111111110000000000000000000000000000000000000000000000000001
tmp2= 0b1011111111110000000000000000000000000000000000000000000000000000
If I call acos(tmp), it returns NaN.
I don't want the NaN value, and I would like to ignore the small error so that tmp stays in the range [-1,1].
How can I do this?
I have two points given in spherical coordinates (for example, (r,45,45) and (r,225,-45)).
Then I need to convert them to Cartesian coordinates (a small error occurs here!).
Then I want to compute the angle between the two points.
The analytical solution differs from the computed solution (because of the small error).
I would like to make the two solutions the same.

Are you trying to prevent branching? I usually make a little helper when I'm doing anything like this:
template<typename T>
inline T Clamp( T val, T low, T high ) {
    return val < low ? low : (val > high ? high : val);
}
And then:
double result = acos( Clamp(tmp, -1.0, 1.0) );
If you're trying to write highly optimized code without branching, this won't help. Depending on the accuracy you require, you might consider making an acos lookup table and just put an extra value at each end to handle error-induced overflow.
[edit] I've just had a play around with a [-1,1] clamp without branching. Of course, this only cures inaccuracies. If you call it with a number that is grossly outside the range, it will bomb:
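inline double safer_acos (double val)
{
    // int(2.0 + val) is 0 for val slightly below -1, 1 or 2 for val in [-1, 1),
    // and 3 for val from 1 up to just below 2.
    double vals[] = {-1.0, val, val, 1.0};
    return acos( vals[int(2.0 + val)] );
}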
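Since C++17 the standard library ships std::clamp in <algorithm>, which does the same job as the hand-written Clamp helper. Here is a minimal sketch, assuming a C++17 compiler; the name clamped_acos is made up for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: clamp the argument into [-1, 1] before calling acos,
// so that tiny rounding overshoots no longer produce NaN.
inline double clamped_acos(double val)
{
    assert(!std::isnan(val));  // clamping cannot repair a NaN input
    return std::acos(std::clamp(val, -1.0, 1.0));
}
```

Like the Clamp version, this only hides small rounding errors. A value that is far outside [-1, 1] is silently clamped too, which may mask a real bug upstream.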


Kernel Density Estimator ( with Gauss Kernel ) Sum f(x) = 1?

I want to use KDE with the Gaussian Kernel. If I'm correct, the sum of all f(x) must be 1 ( ~ rounding ) ?
My Implementation looks like this:
float K( float const& val)
{
    const float p=1.0 / std::sqrt( 2.0 * M_PI);
    float result = 0.5 * (val*val);
    result = p * std::exp(- result);
    return result;
}
std::vector< std::pair<float, float> > kde( float *val, int len, float h)
{
    std::vector< std::pair<float, float> > density( len );
    const float p = 1.0 / (h * len );
    for(int r=0;r<len;r++)
    {
        float sum = 0;
        for(int i=0;i<len;i++)
            sum += K( (val[r] - val[i]) / h );
        density[r] = std::make_pair( val[r], p*sum );
    }
    return density;
}
And I chose h > 0. Am I right that p*sum is the probability for the value val[r]? The sum over all probabilities is > 1 (but it looks OK to me).
You misinterpreted the assumptions on the probability density here. The density integrates to one, whereas its values at certain points are definitely not 1.
Let's discuss it using the following formula from the linked Wikipedia article, which you seem to use:
f_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\left( \frac{x - x_i}{h} \right)
This formula provides the density f_h(x) evaluated at point x.
From my review, your code correctly evaluates this quantity. Yet you misinterpreted which quantity should be one. As a density, its integral over the complete space should yield one, i.e.
\int_{-\infty}^{\infty} f_h(x) \, dx = 1
This property is called normalization of the density.
Moreover, being a density itself, each summand of f_h(x) should yield 1/n when integrated over the whole space, when one also includes the normalization constant. Again, there's no guarantee on the values of the summands.
In one dimension, you can easily confirm the normalization by using the trapezoidal rule or another quadrature scheme (if you provide a working example, I can try to do that).
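As a rough sketch of such a check, the code below integrates a Gaussian kernel density estimate with the trapezoidal rule. The function names, the use of double instead of float, and the integration bounds are all assumptions made for illustration, not part of the question's code:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Standard Gaussian kernel.
double gauss_kernel(double u)
{
    const double inv_sqrt_2pi = 0.3989422804014327;
    return inv_sqrt_2pi * std::exp(-0.5 * u * u);
}

// Kernel density estimate f_h(x) for the given samples and bandwidth h.
double kde_at(double x, const std::vector<double>& samples, double h)
{
    assert(h > 0.0 && !samples.empty());
    double sum = 0.0;
    for (double xi : samples)
        sum += gauss_kernel((x - xi) / h);
    return sum / (h * samples.size());
}

// Trapezoidal rule for the integral of f_h over [lo, hi] with `steps` intervals.
double integrate_kde(const std::vector<double>& samples, double h,
                     double lo, double hi, int steps)
{
    assert(steps > 0 && lo < hi);
    const double dx = (hi - lo) / steps;
    double total = 0.5 * (kde_at(lo, samples, h) + kde_at(hi, samples, h));
    for (int i = 1; i < steps; ++i)
        total += kde_at(lo + i * dx, samples, h);
    return total * dx;
}
```

For samples such as {1.0, 1.5, 2.0, 4.0} with h = 0.25, integrating from -5 to 10 should give a value very close to 1, while the density values at the four sample points add up to more than 1. The bounds only need to extend several bandwidths beyond the smallest and largest sample.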

Code for normal distribution returns unexpected values [duplicate]

From this question: Random number generator which gravitates numbers to any given number in range? I did some research since I've come across such a random number generator before. All I remember was the name "Mueller", so I guess I found it, here:
Box-Mueller transform
I can find numerous implementations of it in other languages, but I can't seem to implement it correctly in C#.
This page, for instance, The Box-Muller Method for Generating Gaussian Random Numbers says that the code should look like this (this is not C#):
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
double gaussian(void)
{
    static double v, fac;
    static int phase = 0;
    double S, Z, U1, U2, u;

    if (phase)
        Z = v * fac;
    else
    {
        do
        {
            U1 = (double)rand() / RAND_MAX;
            U2 = (double)rand() / RAND_MAX;
            u = 2. * U1 - 1.;
            v = 2. * U2 - 1.;
            S = u * u + v * v;
        } while (S >= 1);

        fac = sqrt (-2. * log(S) / S);
        Z = u * fac;
    }

    phase = 1 - phase;
    return Z;
}
Now, here's my implementation of the above in C#. Note that the transform produces 2 numbers, hence the trick with the "phase" above. I simply discard the second value and return the first.
public static double NextGaussianDouble(this Random r)
{
    double u, v, S;

    do
    {
        u = 2.0 * r.NextDouble() - 1.0;
        v = 2.0 * r.NextDouble() - 1.0;
        S = u * u + v * v;
    }
    while (S >= 1.0);

    double fac = Math.Sqrt(-2.0 * Math.Log(S) / S);
    return u * fac;
}
My question is with the following specific scenario, where my code doesn't return a value in the range of 0-1, and I can't understand how the original code can either.
u = 0.5, v = 0.1
S becomes 0.5*0.5 + 0.1*0.1 = 0.26
fac becomes ~3.22
the return value is thus ~0.5 * 3.22 or ~1.6
That's not within 0 .. 1.
What am I doing wrong/not understanding?
If I modify my code so that instead of multiplying fac with u, I multiply by S, I get a value that ranges from 0 to 1, but it has the wrong distribution (seems to have a maximum distribution around 0.7-0.8 and then tapers off in both directions.)
Your code is fine. Your mistake is thinking that it should return values exclusively within [0, 1]. The (standard) normal distribution is a distribution with nonzero weight on the entire real line. That is, values outside of [0, 1] are possible. In fact, values within [-1, 0] are just as likely as values within [0, 1], and moreover, the complement of [0, 1] has about 66% of the weight of the normal distribution. Therefore, 66% of the time we expect a value outside of [0, 1].
Also, I think this is not the Box-Mueller transform, but is actually the Marsaglia polar method.
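The 66% figure can be checked with the standard normal CDF, which can be written in terms of std::erf from <cmath>. This is only a sketch; both function names are made up for illustration:

```cpp
#include <cassert>
#include <cmath>

// Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
double standard_normal_cdf(double x)
{
    return 0.5 * (1.0 + std::erf(x / std::sqrt(2.0)));
}

// Probability that a standard normal variate falls outside [lo, hi].
double probability_outside(double lo, double hi)
{
    assert(lo <= hi);
    return 1.0 - (standard_normal_cdf(hi) - standard_normal_cdf(lo));
}
```

Calling probability_outside(0.0, 1.0) gives roughly 0.66, in line with the figure quoted above.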
I am no mathematician, or statistician, but if I think about this I would not expect a Gaussian distribution to return numbers in an exact range. Given your implementation the mean is 0 and the standard deviation is 1 so I would expect values distributed on the bell curve with 0 at the center and then reducing as the numbers deviate from 0 on either side. So the sequence would definitely cover both +/- numbers.
Then, since it is statistical, why would it be hard limited to -1..1 just because the standard deviation is 1? There can statistically be some play on either side and still fulfill the statistical requirement.
The uniform random variate is indeed within 0..1, but the gaussian random variate (which is what Box-Muller algorithm generates) can be anywhere on the real line. See wiki/NormalDistribution for details.
I think the function returns polar coordinates. So you need both values to get correct results.
Also, Gaussian distribution is not between 0 .. 1. It can easily end up as 1000, but probability of such occurrence is extremely low.
This is a monte carlo method so you can't clamp the result, but what you can do is ignore samples.
// return random value in the range [0,1].
double gaussian_random()
{
    double sigma = 1.0/8.0; // or whatever works.
    while ( 1 ) {
        double z = gaussian() * sigma + 0.5;
        if (z >= 0.0 && z <= 1.0)
            return z;
    }
}

Using the gaussian probability density function in C++

First, is this the correct C++ representation of the pdf gaussian function ?
float pdf_gaussian = ( 1 / ( s * sqrt(2*M_PI) ) ) * exp( -0.5 * pow( (x-m)/s, 2.0 ) );
Second, does it make sense if we do something like this?
if(pdf_gaussian < uniform_random())
    do something
else
    do other thing
EDIT: An example of what exactly are you trying to achieve:
Say I have a data item called Y1. Then a new data item called Xi arrives. I want to decide whether I should associate Xi with Y1 or keep Xi as a new data item that will be called Y2. This is based on the distance between the new data item Xi and the existing data item Y1. If Xi is "far" from Y1, then Xi will not be associated with Y1; otherwise, if it is "not far", it will be associated with Y1. Now I want to model this "far" or "not far" using a Gaussian probability based on the mean and standard deviation of the distances between Y and the data that were already associated with Y in the past.
float pdf_gaussian = ( 1 / ( s * sqrt(2*M_PI) ) ) * exp( -0.5 * pow( (x-m)/s, 2.0 ) );
is not incorrect, but can be improved.
First, 1 / sqrt(2 Pi) can be precomputed, and using pow with integers is not a good idea: it may use exp(2 * log x) or a routine specialized for floating point exponents instead of simply x * x.
Example better code:
float normal_pdf(float x, float m, float s)
{
    static const float inv_sqrt_2pi = 0.3989422804014327;
    float a = (x - m) / s;

    return inv_sqrt_2pi / s * std::exp(-0.5f * a * a);
}
You may want to make this a template instead of using float:
template <typename T>
T normal_pdf(T x, T m, T s)
{
    static const T inv_sqrt_2pi = 0.3989422804014327;
    T a = (x - m) / s;

    return inv_sqrt_2pi / s * std::exp(-T(0.5) * a * a);
}
This allows you to use normal_pdf on double arguments as well (it is not that much more generic, though). There are caveats with the last version, namely that you must take care not to use it with integers (there are workarounds, but they make the routine more verbose).
yes. boost::random has gaussian distribution.
See, for example, this question: How to use boost normal distribution classes?
As an alternative, there's a standard way of converting two uniformly distributed random numbers into two normally distributed numbers.
See, e.g. this question: Generate random numbers following a normal distribution in C/C++
In response to your last edit (note that the question is completely different as edited, hence my answer to an original one is irrelevant). I think you'd better off first formulating for yourself what exactly do you mean to mean by "modelling far or not far using a gaussian distribution". Then reformulate that understanding in math terms and only then start programming. As it stands, I think the problem is underspecified.
Use Box-Muller transform. This creates values with a normal/gaussian distribution.
It's not very complex to code using math libraries.
Generate 2 uniform numbers, use them to get two normally distributed numbers. Then Return one and save the other so that you have it for your 'next' request of a random number.
Since C++ 11, std::normal_distribution defined in the standard header random can be used to generate Gaussian random samples. More information can be found herein.
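A minimal sketch of the std::normal_distribution approach follows. The helper name sample_normal, the choice of std::mt19937 as the engine, and the fixed seed are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw n samples from a normal distribution with the given mean and
// standard deviation, using a fixed seed so that runs are reproducible.
std::vector<double> sample_normal(std::size_t n, double mean, double stddev,
                                  unsigned seed)
{
    assert(stddev > 0.0);
    std::mt19937 engine(seed);
    std::normal_distribution<double> dist(mean, stddev);

    std::vector<double> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(dist(engine));
    return out;
}
```

In real code the engine would normally be created once and reused across calls, and seeded from std::random_device if reproducibility is not needed.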

Fast equivalent to sin() for DSP referenced in STK

I'm using bits of Perry Cook's Synthesis Toolkit (STK) to generate saw and square waves. STK includes this BLIT-based sawtooth oscillator:
inline StkFloat BlitSaw::tick( void ) {
    StkFloat tmp, denominator = sin( phase_ );
    if ( fabs(denominator) <= std::numeric_limits<StkFloat>::epsilon() )
        tmp = a_;
    else {
        tmp = sin( m_ * phase_ );
        tmp /= p_ * denominator;
    }

    tmp += state_ - C2_;
    state_ = tmp * 0.995;

    phase_ += rate_;
    if ( phase_ >= PI )
        phase_ -= PI;

    lastFrame_[0] = tmp;
    return lastFrame_[0];
}
The square wave oscillator is broadly similar. At the top, there's this comment:
// A fully optimized version of this code would replace the two sin
// calls with a pair of fast sin oscillators, for which stable fast
// two-multiply algorithms are well known.
I don't know where to start looking for these "fast two-multiply algorithms" and I'd appreciate some pointers. I could use a lookup table instead, but I'm keen to learn what these 'fast sin oscillators' are. I could also use an abbreviated Taylor series, but that's way more than two multiplies. Searching hasn't turned up anything much, although I did find this approximation:
#define AD_SIN(n) (n*(2.f- fabs(n)))
Plotting it out shows that it's not really a close approximation outside the range of -1 to 1, so I don't think I can use it when phase_ is in the range -pi to pi:
Here, Sine is the blue line and the purple line is the approximation.
Profiling my code reveals that the calls to sin() are far and away the most time-consuming calls, so I really would like to optimise this piece.
EDIT Thanks for the detailed and varied answers. I will explore these and accept one at the weekend.
EDIT 2 Would the anonymous close voter please kindly explain their vote in the comments? Thank you.
Essentially, the sinusoidal oscillator is one (or more) variables that change with each DSP step, rather than getting recalculated from scratch.
The simplest are based on the following trig identities: (where d is constant, and thus so is cos(d) and sin(d) )
sin(x+d) = sin(x) cos(d) + cos(x) sin(d)
cos(x+d) = cos(x) cos(d) - sin(x) sin(d)
However, this requires two variables (one for sin and one for cos) and 4 multiplications per update. It will still be far faster than calculating a full sine at each step.
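The two identities translate almost directly into code. The sketch below is an illustration only: the struct name and its members are made up, and it does nothing about the slow amplitude drift that rounding errors cause over very long runs:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical oscillator that advances sin(x) and cos(x) by a fixed step d
// using the angle-addition identities (four multiplies per tick).
struct RotationOscillator
{
    double s, c;          // current sin(x) and cos(x)
    double sin_d, cos_d;  // constants sin(d) and cos(d)

    RotationOscillator(double phase, double d)
        : s(std::sin(phase)), c(std::cos(phase)),
          sin_d(std::sin(d)), cos_d(std::cos(d))
    {
        assert(std::isfinite(phase) && std::isfinite(d));
    }

    // Returns sin(x) for the current x, then steps x to x + d.
    double tick()
    {
        const double out = s;
        const double next_s = s * cos_d + c * sin_d;
        const double next_c = c * cos_d - s * sin_d;
        s = next_s;
        c = next_c;
        return out;
    }
};
```

The only calls to sin and cos happen in the constructor; each tick costs four multiplies and two additions.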
The solution by Oli Charlesworth is based on solutions to this general equation
A_{n+1} = a A_{n} + b A_{n-1}
Looking for a solution of the form A_n = k e^(i theta n) gives an equation for theta:
e^(i theta (n+1) ) = a e^(i theta n ) + b e^(i theta (n-1) )
With b = -1, dividing through by e^(i theta n) simplifies this to
e^(i theta) + e^(-i theta ) = a
2 cos(theta) = a
A_{n+1} = 2 cos(theta) A_{n} - A_{n-1}
Whichever approach you use you'll either need to use one or two of these oscillators for each frequency, or use another trig identity to derive the higher or lower frequencies.
How accurate do you need this?
This function, f(x)=0.398x*(3.1076-|x|), does a reasonably good job for x between -pi and pi.
An even better approximation is f(x)=0.38981969947653056*x*(pi-|x|), which keeps the absolute error to 0.038158444604 or less for x between -pi and pi.
A least squares minimization will yield a slightly different function.
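The quoted error bound can be checked numerically. The sketch below reads the second approximation as k*x*(pi - |x|), which is an assumption about the intended formula, and scans an evenly spaced grid over [-pi, pi]; the function names are made up for illustration:

```cpp
#include <cassert>
#include <cmath>

// Parabolic approximation k * x * (pi - |x|) of sin(x) on [-pi, pi].
double parabolic_sin(double x, double k)
{
    const double pi = 3.14159265358979323846;
    return k * x * (pi - std::fabs(x));
}

// Largest absolute error against std::sin on an evenly spaced grid
// of steps + 1 points over [-pi, pi].
double max_abs_error(double k, int steps)
{
    assert(steps > 0);
    const double pi = 3.14159265358979323846;
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i)
    {
        const double x = -pi + 2.0 * pi * i / steps;
        const double err = std::fabs(std::sin(x) - parabolic_sin(x, k));
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

With k = 0.38981969947653056 and a fine grid, the reported maximum should come out just under 0.0382, consistent with the bound stated above.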
It's not possible to generate one-off sin calls with just two multiplies (well, not a useful approximation, at any rate). But it is possible to generate an oscillator with low complexity, i.e. where each value is calculated in terms of the preceding ones.
For instance, consider that the following difference equation will give you a sinusoid:
y[n] = 2*cos(phi)*y[n-1] - y[n-2]
(where cos(phi) is a constant)
(From the original author of the VST BLT code).
As a matter of fact, I was porting the VST BLT oscillators to C#, so I was googling for good sin oscillators. Here's what I came up with. Translation to C++ is straightforward. See the notes at the end about accumulated round-off errors.
public class FastOscillator
{
    private double b1;
    private double y1, y2;
    private double fScale;

    public void Initialize(int sampleRate)
    {
        fScale = AudioMath.TwoPi / sampleRate;
    }

    // frequency in Hz. phase in radians.
    public void Start(float frequency, double phase)
    {
        double w = frequency * fScale;
        b1 = 2.0 * Math.Cos(w);
        y1 = Math.Sin(phase - w);
        y2 = Math.Sin(phase - w * 2);
    }

    public double Tick()
    {
        double y0 = b1 * y1 - y2;
        y2 = y1;
        y1 = y0;
        return y0;
    }
}
Note that this particular oscillator implementation will drift over time, so it needs to be re-initialized periodically. In this particular implementation, the magnitude of the sin wave decays over time. The original comments in the STK code suggested a two-multiply oscillator. There are, in fact, two-multiply oscillators that are reasonably stable over time. But in retrospect, the need to keep the sin(phase) and sin(m*phase) oscillators tightly in sync probably means that they have to be resynced anyway. Round-off errors between phase and m*phase mean that even if the oscillators were stable, they would drift eventually, running a significant risk of producing large spikes in values near the zeros of the BLT functions. May as well use a one-multiply oscillator.
These particular oscillators should probably be re-initialized every 30 to 100 cycles (or so). My C# implementation is frame based, i.e. it calculates a float[] array of results in a void Tick(int count, float[] result) method. The oscillators are resynced at the end of each Tick call. Something like this:
void Tick(int count, float[] result)
{
    for (int i = 0; i < count; ++i)
    {
        result[i] = bltResult;
    }
    // re-initialize the oscillators to avoid accumulated drift.
    this.phase = (this.phase + this.dPhase*count) % AudioMath.TwoPi;
}
Probably missing from the STK code. You might want to investigate this. The original code provided to the STK did this. Gary Scavone tweaked the code a bit, and I think the optimization was lost. I do know that the STK implementations suffer from DC drift, which can be almost entirely eliminated when implemented properly.
There's a peculiar hack that prevents DC drift of the oscillators, even when sweeping the frequency of the oscillators. The trick is that the oscillators should be started with an initial phase adjustment of dPhase/2. That just so happens to start the oscillators off with zero DC drift, without having to figure out what the correct initial state is for the various integrators in each of the BLT oscillators.
Strangely, if the adjustment is re-applied whenever the frequency of the oscillator changes, then this also prevents wild DC drift of the output when sweeping the frequency of the oscillator. Whenever the frequency changes, subtract dPhase/2 from the previous phase value, recalculate dPhase for the new frequency, and then add dPhase/2. I rather suspect this could be formally proven, but I have not been able to do so. All I know is that It Just Works.
For a block implementation, the oscillators should actually be initialized as follows, instead of carrying the phase adjustment in the current this.phase value.
You might want to take a look here:
There's some sample code that calculates a very good approximation of sin/cos using only multiplies, additions and the abs() function. Quite fast too. The comments are also a good read.
It essentially boils down to this:
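float sine(float x)
{
    const float pi = 3.14159265f;
    const float B = 4/pi;
    const float C = -4/(pi*pi);
    const float P = 0.225;

    // fabsf rather than abs: the C abs() takes an int and would truncate x.
    float y = B * x + C * x * fabsf(x);
    return P * (y * fabsf(y) - y) + y;
}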
and works for a range of -PI to PI
If you can, you should consider memoization-based techniques. Essentially, store sin(x) and cos(x) values for a bunch of values. To calculate sin(y), find a and b for which precomputed values exist such that a<=y<=b. Now, using sin(a), sin(b), cos(a), cos(b), y-a and y-b, approximately calculate sin(y).
The general idea of getting periodically sampled results from the sine or cosine function is to use a trig recursion or an initialized (barely) stable IIR filter (which can end up being pretty much the same computations). There are bunches of these in the DSP literature, of varying accuracy and stability. Choose carefully.

Is there a C/C++ function to safely handle division by zero?

We have a situation we want to do a sort of weighted average of two values w1 & w2, based on how far two other values v1 & v2 are away from zero... for example:
If v1 is zero, it doesn't get weighted at all so we return w2
If v2 is zero, it doesn't get weighted at all so we return w1
If both values are equally far from zero, we do a mean average and return (w1 + w2 )/2
I've inherited code like:
float calcWeightedAverage(float v1, float v2, float w1, float w2)
{
    return (v1/(v1+v2))*w1 + (v2/(v1+v2)*w2);
}
For a bit of background, v1 & v2 represent how far two different knobs are turned, the weighting of their individual resultant effects only depends how much they are turned, not in which direction.
Clearly, this has a problem when v1==v2==0, since we end up with return (0/0)*w1 + (0/0)*w2 and you can't do 0/0. Putting a special test in for v1==v2==0 sounds horrible mathematically, even if it wasn't bad practice with floating-point numbers.
So I wondered if
there was a standard library function to handle this
there's a neater mathematical representation
You're trying to implement this mathematical function:
F(x, y) = (W1 * |x| + W2 * |y|) / (|x| + |y|)
This function is discontinuous at the point x = 0, y = 0. Unfortunately, as R. stated in a comment, the discontinuity is not removable - there is no sensible value to use at this point.
This is because the "sensible value" changes depending on the path you take to get to x = 0, y = 0. For example, consider following the path F(0, r) from r = R1 to r = 0 (this is equivalent to having the X knob at zero, and smoothly adjusting the Y knob down from R1 to 0). The value of F(x, y) will be constant at W2 until you get to the discontinuity.
Now consider following F(r, 0) (keeping the Y knob at zero and adjusting the X knob smoothly down to zero) - the output will be constant at W1 until you get to the discontinuity.
Now consider following F(r, r) (keeping both knobs at the same value, and adjusting them down simultaneously to zero). The output here will be constant at (W1 + W2) / 2 until you get to the discontinuity.
This implies that any value between W1 and W2 is equally valid as the output at x = 0, y = 0. There's no sensible way to choose between them. (And further, always choosing 0 as the output is completely wrong - the output is otherwise bounded to be on the interval W1..W2 (ie, for any path you approach the discontinuity along, the limit of F() is always within that interval), and 0 might not even lie in this interval!)
You can "fix" the problem by adjusting the function slightly - add a constant (eg 1.0) to both v1 and v2 after the fabs(). This will make it so that the minimum contribution of each knob can't be zero - just "close to zero" (the constant defines how close).
It may be tempting to define this constant as "a very small number", but that will just cause the output to change wildly as the knobs are manipulated close to their zero points, which is probably undesirable.
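A minimal sketch of that adjustment is shown below. The function name and the default offset of 1.0 are assumptions for illustration; the right offset depends on the range of the knob values:

```cpp
#include <cassert>
#include <cmath>

// Weighted average where each knob contributes |v| + offset, so the
// denominator can never be zero. `offset` is an assumed tuning constant.
float calcWeightedAverageOffset(float v1, float v2, float w1, float w2,
                                float offset = 1.0f)
{
    assert(offset > 0.0f);
    const float a1 = std::fabs(v1) + offset;
    const float a2 = std::fabs(v2) + offset;
    return (a1 * w1 + a2 * w2) / (a1 + a2);
}
```

With both knobs at zero this returns the plain mean of w1 and w2. The trade-off is that a knob at exactly zero still contributes a little, so v1 = 0 no longer returns exactly w2.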
This is the best I could come up with quickly
float calcWeightedAverage(float v1,float v2,float w1,float w2)
{
    float a1 = 0.0;
    float a2 = 0.0;

    if (v1 != 0)
    {
        a1 = v1/(v1+v2) * w1;
    }

    if (v2 != 0)
    {
        a2 = v2/(v1+v2) * w2;
    }

    return a1 + a2;
}
I don't see what would be wrong with just doing this:
float calcWeightedAverage( float v1, float v2, float w1, float w2 ) {
    static const float eps = FLT_MIN; //Or some other suitably small value.
    v1 = fabs( v1 );
    v2 = fabs( v2 );

    if( v1 + v2 < eps )
        return (w1+w2)/2.0f;
    else
        return (v1/(v1+v2))*w1 + (v2/(v1+v2)*w2);
}
Sure, no "fancy" stuff to figure out your division, but why make it harder than it has to be?
Personally I don't see anything wrong with an explicit check for divide by zero. We all do them, so it could be argued that not having it is uglier.
However, it is possible to turn off the IEEE divide by zero exceptions. How you do this depends on your platform. I know on Windows it has to be done process-wide, so you can inadvertently mess with other threads (and they with you) by doing it if you aren't careful.
However, if you do that your result value will be NaN, not 0. I highly doubt that's what you want. If you are going to have to put a special check in there anyway with different logic when you get NaN, you might as well just check for 0 in the denominator up front.
So with a weighted average, you need to look at the special case where both are zero. In that case you want to treat it as 0.5 * w1 + 0.5 * w2, right? How about this?
float calcWeightedAverage(float v1,float v2,float w1,float w2)
{
    if (v1 == v2) {
        v1 = 0.5;
    } else {
        v1 = v1 / (v1 + v2); // v1 is between 0 and 1
    }
    v2 = 1 - v1; // avoid addition and division because they should add to 1

    return v1 * w1 + v2 * w2;
}
You could test for fabs(v1)+fabs(v2)==0 (this seems to be the fastest given that you've already computed them), and return whatever value makes sense in this case ((w1+w2)/2?). Otherwise, keep the code as-is.
However, I suspect the algorithm itself is broken if v1==v2==0 is possible. This kind of numerical instability when the knobs are "near 0" hardly seems desirable.
If the behavior actually is right and you want to avoid special-cases, you could add the minimum positive floating point value of the given type to v1 and v2 after taking their absolute values. (Note that DBL_MIN and friends are not the correct value because they're the minimum normalized values; you need the minimum of all positive values, including subnormals.) This will have no effect unless they're already extremely small; the additions will just yield v1 and v2 in the usual case.
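In C++, the smallest positive value including subnormals is available as std::numeric_limits<T>::denorm_min(). The sketch below uses it for float; the function name is made up, and the code assumes that subnormals are not flushed to zero (flush-to-zero modes or fast-math compiler options would turn the added value into 0 and bring the 0/0 back):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Weighted average that adds the smallest positive float (a subnormal) to
// each absolute value, so that 0/0 cannot occur. Assumes subnormal
// arithmetic is enabled on the target platform.
float calcWeightedAverageTiny(float v1, float v2, float w1, float w2)
{
    const float tiny = std::numeric_limits<float>::denorm_min();
    assert(tiny > 0.0f);
    const float a1 = std::fabs(v1) + tiny;
    const float a2 = std::fabs(v2) + tiny;
    return (a1 / (a1 + a2)) * w1 + (a2 / (a1 + a2)) * w2;
}
```

When both inputs are zero, each ratio becomes tiny / (2 * tiny) = 0.5, so the result is the mean of w1 and w2. For ordinary inputs the addition is absorbed by rounding and the result is unchanged.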
The problem with using an explicit check for zero is that you can end up with discontinuities in behaviour unless you are careful, as outlined in caf's response (and if it's in the core of your algorithm the if can be expensive, but don't care about that until you measure...).
I tend to use something that just smooths out the weighting near zero instead.
float calcWeightedAverage(float v1, float v2, float w1, float w2)
{
    const float eps = 1e-7; // Or whatever you like...
    v1 = fabs(v1) + eps;
    v2 = fabs(v2) + eps;
    return (v1/(v1+v2))*w1 + (v2/(v1+v2)*w2);
}
Your function is now smooth, with no asymptotes or division by zero, and so long as one of v1 or v2 is above 1e-7 by a significant amount it will be indistinguishable from a "real" weighted average.
If the denominator is zero, how do you want it to default? You can do something like this:
// "default" is a reserved word in C and C++, so the fallback parameter needs another name.
static inline float divide_default(float numerator, float denominator, float fallback) {
    return (denominator == 0) ? fallback : (numerator / denominator);
}

float calcWeightedAverage(float v1, float v2, float w1, float w2)
{
    v1 = fabs(v1);
    v2 = fabs(v2);
    return w1 * divide_default(v1, v1 + v2, 0.0) + w2 * divide_default(v2, v1 + v2, 0.0);
}
Note that the function definition and use of static inline should really let the compiler know that it can inline.
This should work
#include <float.h>
float calcWeightedAverage(float v1, float v2, float w1, float w2)
{
    return (v1/(v1+v2+FLT_EPSILON))*w1 + (v2/(v1+v2+FLT_EPSILON)*w2);
}
I saw there may be problems with some precision so instead of using FLT_EPSILON use DBL_EPSILON for accurate results (I guess you will return a float value).
I'd do like this:
float calcWeightedAverage(double v1, double v2, double w1, double w2)
{
    v1 = fabs(v1);
    v2 = fabs(v2);
    /* if both values are equally far from 0 */
    if (fabs(v1 - v2) < 0.000000001) return (w1 + w2) / 2;
    return (v1*w1 + v2*w2) / (v1 + v2);
}