Gaussian integral in C++ is not working. Why?

Here is my code:
const double kEps(0.00000001);

double gaussianIntegral(double x)
{
    double res(x), numerator(x);
    for (unsigned int i(1), k(3); (abs(numerator / k) > kEps) || (res < 0); ++i, k += 2)
    {
        numerator *= (-x * x / i);
        res += numerator / k;
    }
    return res;
}
Here is what I am trying to compute (the Taylor series the code implements):
integral_0^x e^(-t^2) dt = x - x^3/3 + x^5/(2!*5) - x^7/(3!*7) + ...
When I pass 30 as an argument, the computation runs forever. What is wrong? I am very stuck; it seems to me that there is no error and everything should work just fine, but it does not.

Although the Taylor series formally converges, in practice you will run into machine precision limits even for arguments as small as 0.1 (and that is using kEps = 0).
It's better to use (scaled appropriately) std::erf (C++11), or, if it's homework, look up an algorithm for computing the erf function, e.g. here: https://math.stackexchange.com/questions/97/how-to-accurately-calculate-the-error-function-erfx-with-a-computer
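For reference, the rescaling is simple: erf(x) = (2/sqrt(pi)) * integral_0^x e^(-t^2) dt, so the integral in the question is just (sqrt(pi)/2) * erf(x). A minimal sketch (the function name matches the question's; the constant is sqrt(pi)/2):

```cpp
#include <cmath>

// erf(x) = (2 / sqrt(pi)) * integral_0^x e^(-t^2) dt, so the integral
// in the question is a rescaled std::erf (C++11):
double gaussianIntegral(double x)
{
    const double kSqrtPiOver2 = 0.8862269254527580; // sqrt(pi) / 2
    return kSqrtPiOver2 * std::erf(x);
}
```

Unlike the Taylor sum, this works fine even for x = 30, where the series terms first grow astronomically before shrinking.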

Related

(Why) is the std::binomial_distribution biased for large probabilities p and slow for small n?

I want to generate binomially distributed random numbers in C++. Speed is a major concern. Not knowing a lot about random number generators, I use the standard library's tools. My code looks something like this:
#include <random>

static std::random_device random_dev;
static std::mt19937 random_generator{random_dev()};
std::binomial_distribution<int> binomial_generator;

void RandomInit(int s) {
    // I create the generator object here to save time. Does this make sense?
    binomial_generator = std::binomial_distribution<int>(1, 0.5);
    random_generator.seed(s);
}

int binomrand(int n, double p) {
    binomial_generator.param(std::binomial_distribution<int>::param_type(n, p));
    return binomial_generator(random_generator);
}
To test my implementation, I built a Cython wrapper and then executed and timed the function from within Python. For reference, I also implemented a "stupid" binomial distribution, which just returns the sum of Bernoulli trials.
int binomrand2(int n, double p) {
    int result = 0;
    for (int i = 0; i < n; i++) {
        if (_Random() < p) // _Random is a thoroughly tested custom random number generator on U[0,1)
            result++;
    }
    return result;
}
Timing showed that the latter implementation is about 50% faster than the former if n < 25. Furthermore, for p = 0.95, the former yielded significantly biased results (the mean over 1000000 trials for n = 40 was 38.23037, where the standard deviation of that mean is 0.0014; the result was reproducible with different seeds).
Is this a (known) issue with the standard library's functions or is my implementation wrong? What could I do to achieve my goal of obtaining accurate results with high efficiency?
The parameter n will mostly be below 100 and smaller values will occur more frequently.
I am open to suggestions outside the realm of the standard library, but I may not be able to use external software libraries.
I am using the VC 2019 compiler on 64bit Windows.
Edit
I have also tested the bias without using python:
double binomrandTest(int n, double p, long long N) {
    long long result = 0;
    for (long long i = 0; i < N; i++) {
        result += binomrand(n, p);
    }
    return ((double) result) / ((double) N);
}
The result remained biased (38.228045 for the parameters above, where something like 38.000507 would be expected).
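If the library distribution turns out to be biased on your platform, the Bernoulli-sum fallback can be written with standard components only, no custom _Random needed. A sketch, assuming a fixed-seed std::mt19937 (binomrandBernoulliSum and bernoulli_rng are names invented here); for the small n in question (mostly below 100) this is simple and unbiased, at the cost of n draws per variate:

```cpp
#include <random>

// Fixed seed for reproducibility; seed from std::random_device in production.
static std::mt19937 bernoulli_rng{12345u};

// Binomial(n, p) as the sum of n Bernoulli(p) trials.
int binomrandBernoulliSum(int n, double p) {
    std::bernoulli_distribution coin(p);
    int result = 0;
    for (int i = 0; i < n; i++) {
        if (coin(bernoulli_rng))
            result++;
    }
    return result;
}
```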

How does the cout statement affect the O/P of the code written?

#include <iostream>
#include <iomanip>
#include <math.h>
using namespace std;

int main() {
    int t;
    double n;
    cin >> t;
    while (t--)
    {
        cin >> n;
        double x;
        for (int i = 1; i <= 10000; i++)
        {
            x = n * i;
            if (x == ceilf(x))
            {
                cout << i << endl;
                break;
            }
        }
    }
    return 0;
}
For I/P:
3
5
2.98
3.16
O/P:
1
If my code is:
#include <iostream>
#include <iomanip>
#include <math.h>
using namespace std;

int main() {
    int t;
    double n;
    cin >> t;
    while (t--)
    {
        cin >> n;
        double x;
        for (int i = 1; i <= 10000; i++)
        {
            x = n * i;
            cout << ""; // only this statement is added
            if (x == ceilf(x))
            {
                cout << i << endl;
                break;
            }
        }
    }
    return 0;
}
For the same input O/P is:
1
50
25
The only extra line added in 2nd code is: cout<<"";
Can anyone please help me find why there is such a difference in output, just because of the cout statement added in the 2nd code?
Well this is a veritable Heisenbug. I've tried to strip your code down to a minimal replicating example, and ended up with this (http://ideone.com/mFgs0S):
#include <iostream>
#include <math.h>
using namespace std;

int main()
{
    float n;
    cin >> n;   // this input is needed to reproduce, but the value doesn't matter
    n = 2.98;   // overwrite the input value

    cout << ""; // comment this out => y = z = 149

    float x = n * 50;   // 149
    float y = ceilf(x); // 150

    cout << ""; // comment this out => y = z = 150

    float z = ceilf(x); // 149

    cout << "x:" << x << " y:" << y << " z:" << z << endl;
}
The behaviour of ceilf appears to depend on the particular sequence of iostream operations that occur around it. Unfortunately I don't have the means to debug in any more detail at the moment, but maybe this will help someone else to figure out what's going on. Regardless, it seems almost certain that it's a bug in gcc-4.9.2 and gcc-5.1. (You can check on ideone that you don't get this behaviour in gcc-4.3.2.)
You're probably getting an issue with floating point representations - which is to say that computers cannot perfectly represent all fractions. So while you see 50, the result is probably something closer to 50.00000000001. This is a pretty common problem you'll run across when dealing with doubles and floats.
A common way to deal with it is to define a very small constant (in mathematical terms this is Epsilon, a number which is simply "small enough")
const double EPSILON = 0.000000001;
And then your comparison will change from
if (x==ceilf(x))
to something like
double difference = fabs(x - ceilf(x));
if (difference < EPSILON)
This will smooth out those tiny inaccuracies in your doubles.
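Applied to the loop in the question, the whole test collapses to measuring the distance to the nearest integer. A sketch, reusing the 1..10000 search range from the question (smallestIntegerMultiplier is a name invented here):

```cpp
#include <cmath>

const double EPSILON = 1e-9;

// Smallest i in 1..10000 such that n * i is (within EPSILON of) an
// integer. std::round gives the nearest integer, so the distance to it
// is what we compare against EPSILON instead of testing x == ceilf(x).
int smallestIntegerMultiplier(double n) {
    for (int i = 1; i <= 10000; i++) {
        double x = n * i;
        if (std::fabs(x - std::round(x)) < EPSILON)
            return i;
    }
    return -1; // no multiple of n in range is (nearly) an integer
}
```

With the inputs from the question this yields 1 for 5, 50 for 2.98 and 25 for 3.16, regardless of any surrounding cout statements.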
"Comparing for equality
Floating point math is not exact. Simple values like 0.2 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Different compilers and CPU architectures store temporary results at different precisions, so results will differ depending on the details of your environment. If you do a calculation and then compare the results against some expected value it is highly unlikely that you will get exactly the result you intended.
In other words, if you do a calculation and then do this comparison:
if (result == expectedResult)
then it is unlikely that the comparison will be true. If the comparison is true then it is probably unstable – tiny changes in the input values, compiler, or CPU may change the result and make the comparison be false."
From http://www.cygnus-software.com/papers/comparingfloats/Comparing%20floating%20point%20numbers.htm
Hope this answers your question.
Also, you had a problem with
if(x==ceilf(x))
ceilf() returns a float value, while x is declared as a double.
Refer to problems with floating point comparison as to why that won't work.
Change x to float and the program runs fine.
I made a plain try on my laptop and on online compilers.
g++ (4.9.2-10) gave the desired output (3 outputs), as did the online compiler at geeksforgeeks.org. However, ideone and codechef did not give the right output.
All I can infer is that the online compilers that name their compiler "C++ (gcc)" give the wrong output, while geeksforgeeks.org, which names its compiler "C++", runs perfectly, as does g++ (as tested on Linux).
So we could arrive at the hypothesis that they use gcc to compile C++ code, a method suggested at this link. :)

Brute-force equation solving

I'm writing a program that uses brute-force to solve an equation. Unfortunately, I seem to have an error in my code somewhere, as my program stops at search = 0.19999. Here is the code:
#include <iostream>
#include <cmath>
#include <vector>

#define min -4.0
#define max 6.5

using namespace std;

double fx(double x){
    long double result = cos(2 * x) - 0.4 * x;
    double scale = 0.00001;
    double value = (int)(result / scale) * scale;
    return value;
}

int sign(double a){
    if (a < 0) return -1;
    if (a == 0) return 0;
    else return 1;
}

int main(){
    vector<double> results;
    double step, interval, start, end, search;
    interval = (fabs(min) + fabs(max)) / 50;
    step = 0.00001;
    start = min;
    end = min + interval;
    search = start;
    while (end <= max){
        if (sign(start) != sign(end)){
            search = start;
            while (search < end){
                if (fx(search) == 0) results.push_back(search);
                search = search + step;
            }
        }
        start = end;
        end = start + interval;
    }
    for (int i = 0; i < results.size(); i++){
        cout << results[i] << endl;
    }
}
I've been looking at it for quite some time now and I still can't find the error in the code.
The program should check if there is a root in each given interval and, if yes, check every possibility in that interval. If it finds a root, it should push it into the results vector.
I know you already found the answer, but I just spotted another problem while trying to find the bug. In your inner loop you make the following comparison:
if(fx(search) == 0)
Since your fx function returns a double, it's generally not advisable to test with the equality operator when dealing with double-precision floating point numbers. Your result will probably never be exactly 0, so this test will never return true. I think you should compare against a maximum error margin, like this:
double maximum_error = 0.005;
if(abs(fx(search)) < maximum_error)
I think that would do the trick in your case. You may find more information at this link.
Even if it's working right now, micro changes in your input numbers, CPU architecture or even compiler flags may break your program. It's highly dangerous to compare doubles in C++ like that, even though it's legal to do so.
I've just made another run through the code and found the error.
if(sign(start) != sign(end))
was the culprit. There will be a root if the values of f(x) at start and end have different signs; instead, I tested whether start and end themselves have different signs. Sorry for the fuss.
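For completeness, a sketch of how the corrected test might look, with bisection replacing the fixed-step scan (findRoots and signOf are names invented here; f is the question's cos(2x) - 0.4x without the truncation step):

```cpp
#include <cmath>
#include <vector>

double f(double x) { return std::cos(2 * x) - 0.4 * x; }

int signOf(double a) { return (a > 0) - (a < 0); }

// Bracket roots by comparing the signs of f at the interval ends
// (not the signs of the ends themselves), then refine by bisection
// instead of stepping and testing f(search) == 0.
std::vector<double> findRoots(double lo, double hi, int intervals) {
    std::vector<double> roots;
    double h = (hi - lo) / intervals;
    for (int k = 0; k < intervals; k++) {
        double a = lo + k * h, b = a + h;
        if (signOf(f(a)) != signOf(f(b))) {
            for (int it = 0; it < 60; it++) { // bisect to full precision
                double m = 0.5 * (a + b);
                if (signOf(f(m)) == signOf(f(a)))
                    a = m;
                else
                    b = m;
            }
            roots.push_back(0.5 * (a + b));
        }
    }
    return roots;
}
```

On [-4, 6.5] with 50 intervals this finds the three roots of cos(2x) = 0.4x without any exact floating-point comparison.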

Bad optimization of std::fabs()?

Recently I was working on an application that had code similar to:
for (auto x = 0; x < width - 1 - left; ++x)
{
    // store / reset points
    temp = hPoint = 0;

    for (int channel = 0; channel < audioData.size(); channel++)
    {
        if (peakmode) /* fir rms of window size */
        {
            for (int z = 0; z < sizeFactor; z++)
            {
                temp += audioData[channel][x * sizeFactor + z + offset];
            }
            hPoint += temp / sizeFactor;
        }
        else /* highest sample in window */
        {
            for (int z = 0; z < sizeFactor; z++)
            {
                temp = audioData[channel][x * sizeFactor + z + offset];
                if (std::fabs(temp) > std::fabs(hPoint))
                    hPoint = temp;
            }
        }
        // .. some other code
    }
    // ... some more code
}
This is inside a graphical render loop, called some 50-100 times / sec with buffers up to 192kHz in multiple channels. So it's a lot of data running through the innermost loops, and profiling showed this was a hotspot.
It occurred to me that one could cast the float to an integer and erase the sign bit, and cast it back using only temporaries. It looked something like this:
if ((const float &&)(*((int *)&temp) & ~0x80000000) > (const float &&)(*((int *)&hPoint) & ~0x80000000))
hPoint = temp;
This gave a 12x reduction in render time, while still producing the same, valid output. Note that everything in audioData is sanitized beforehand to contain no NaNs/infs/denormals, and to lie only in the range [-1, 1].
Are there any corner cases where this optimization will give wrong results - or, why is the standard library function not implemented like this? I presume it has to do with handling of non-normal values?
Edit: the layout of the floating point model conforms to IEEE, and sizeof(float) == sizeof(int) == 4.
Well, you set the floating-point mode to IEEE conforming. Typically, with switches like --fast-math the compiler can ignore IEEE corner cases like NaN, INF and denormals. If the compiler also uses intrinsics, it can probably emit the same code.
BTW, if you're going to assume IEEE format, there's no need for the cast back to float prior to the comparison. The IEEE format is nifty: for all positive finite values, a<b if and only if reinterpret_cast<int_type>(a) < reinterpret_cast<int_type>(b)
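That ordering property is easy to demonstrate. A sketch using std::memcpy to read the bits without violating strict aliasing (floatBits and lessViaBits are illustrative names; assumes 32-bit IEEE-754 float, and C++20's std::bit_cast does the same job):

```cpp
#include <cstdint>
#include <cstring>

// Aliasing-safe bit read: compilers turn this memcpy into a register move.
std::uint32_t floatBits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// For finite, non-negative IEEE-754 floats, the raw bit patterns sort
// in the same order as the values, so integer comparison suffices.
bool lessViaBits(float a, float b) {
    return floatBits(a) < floatBits(b);
}
```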
It occurred to me that one could cast the float to an integer and erase the sign bit, and cast it back using only temporaries.
No, you can't, because this violates the strict aliasing rule.
Are there any corner cases where this optimization will give wrong results
Technically, this code results in undefined behavior, so it always gives wrong "results". Not in the sense that the result of the absolute value will always be unexpected or incorrect, but in the sense that you can't possibly reason about what a program does if it has undefined behavior.
or, why is the standard library function not implemented like this?
Your suspicion is justified: handling denormals and other exceptional values is tricky, and the stdlib function needs to take those into account; the other reason is, again, the undefined behavior.
One (non-)solution if you care about performance:
Instead of casting and pointers, you can use a union. Unfortunately, that only works in C, not in C++, though. That won't result in UB, but it's still not portable (although it will likely work with most, if not all, platforms with IEEE-754).
union {
    float f;
    unsigned u;
} pun = { .f = -3.14 };

pun.u &= ~0x80000000;
printf("abs(-pi) = %f\n", pun.f);
But, granted, this may or may not be faster than calling fabs(). Only one thing is sure: it won't be always correct.
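In C++, the aliasing-safe counterpart of the union pun is std::memcpy (or std::bit_cast since C++20). A sketch, assuming 32-bit IEEE-754 float (absViaBits is an illustrative name):

```cpp
#include <cstdint>
#include <cstring>

// The sign-bit trick without strict-aliasing UB: round-trip the bits
// through std::memcpy, which compilers reduce to a plain register move.
float absViaBits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    u &= 0x7fffffffu; // clear the sign bit
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```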
You would expect fabs() to be implemented in hardware; there was an 8087 instruction for it in 1980, after all. You're not going to beat the hardware.
How the standard library implements it is, well, implementation-dependent, so you may find different implementations of the standard library with different performance.
I imagine that you could have problems on platforms where int is not 32 bits; you'd better use int32_t (<cstdint>).
Out of curiosity, was std::abs previously inlined, or is the optimisation you observed mainly due to the suppression of the function call?
Some observations on how refactoring may improve performance:
- As mentioned, x * sizeFactor + offset can be factored out of the inner loops.
- peakmode is actually a switch changing the function's behaviour - make two functions rather than test the switch mid-loop. This has 2 benefits: it is easier to maintain, and there are fewer local variables and code paths to get in the way of optimisation.
- The division of temp by sizeFactor can be deferred until outside the channel loop in the peakmode version.
- abs(hPoint) can be pre-computed whenever hPoint is updated.
- If audioData is a vector of vectors, you may get some performance benefit by taking a reference to audioData[channel] at the start of the body of the channel loop, reducing the array indexing within the z loop to one dimension.
- Finally, apply whatever specific optimisations for the calculation of fabs you deem fit. Anything you do here will hurt portability, so it's a last resort.
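Several of these suggestions combined might look like the following sketch of the "highest sample in window" branch (peakOfWindow is an invented name; it assumes audioData is a vector of vectors of float):

```cpp
#include <cmath>
#include <vector>

// Peak (largest-magnitude) sample across all channels of one window,
// with the base index hoisted, one reference taken per channel row,
// and the running maximum tracked in absolute form so std::fabs(hPoint)
// is never recomputed.
float peakOfWindow(const std::vector<std::vector<float>>& audioData,
                   int x, int sizeFactor, int offset)
{
    float hAbs = 0.0f, hPoint = 0.0f;
    const int base = x * sizeFactor + offset;       // hoisted out of both loops
    for (const std::vector<float>& row : audioData) // one reference per channel
    {
        for (int z = 0; z < sizeFactor; z++)
        {
            const float temp = row[base + z];
            const float tempAbs = std::fabs(temp);  // tracked, not recomputed
            if (tempAbs > hAbs)
            {
                hAbs = tempAbs;
                hPoint = temp;
            }
        }
    }
    return hPoint;
}
```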
In VS2008, using the following to track the absolute value of hpoint and hIsNeg to remember whether it is positive or negative is about twice as fast as using fabs():
int hIsNeg = 0;
...
// Inside loop, replacing
//     if (std::fabs(temp) > std::fabs(hPoint))
//         hPoint = temp;
if (temp < 0)
{
    if (-temp > hpoint)
    {
        hpoint = -temp;
        hIsNeg = 1;
    }
}
else
{
    if (temp > hpoint)
    {
        hpoint = temp;
        hIsNeg = 0;
    }
}
...
// After loop
if (hIsNeg)
    hpoint = -hpoint;

C++ Exp vs. Log: Which is faster?

I have a C++ application in which I need to compare two values and decide which is greater. The only complication is that one number is represented in log-space, the other is not. For example:
double log_num_1 = log(1.23);
double num_2 = 1.24;
If I want to compare num_1 and num_2, I have to use either log() or exp(), and I'm wondering if one is easier to compute than the other (i.e. runs in less time, in general). You can assume I'm using the standard cmath library.
In other words, the following are semantically equivalent, so which is faster:
if(exp(log_num_1) > num_2) cout << "num_1 is greater";
or
if(log_num_1 > log(num_2)) cout << "num_1 is greater";
AFAIK the complexity of the algorithms is the same; the difference should be only a (hopefully negligible) constant factor.
Given that, I'd use exp(a) > b, simply because it doesn't break on invalid input.
Do you really need to know? Is this going to occupy a large fraction of your running time? How do you know?
Worse, it may be platform dependent. Then what?
So sure, test it if you care, but spending much time agonizing over micro-optimization is usually a bad idea.
Edit: Modified the code to avoid exp() overflow. This caused the margin between the two functions to shrink considerably. Thanks, fredrikj.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        return 0;
    }

    int type = atoi(argv[1]);
    int count = atoi(argv[2]);
    double (*func)(double) = type == 1 ? exp : log;

    int i;
    for (i = 0; i < count; i++) {
        func(i % 100);
    }
    return 0;
}
(Compile using:)
emil@lanfear /home/emil/dev $ gcc -o test_log test_log.c -lm
The results seems rather conclusive:
emil@lanfear /home/emil/dev $ time ./test_log 0 10000000
real 0m2.307s
user 0m2.040s
sys 0m0.000s
emil@lanfear /home/emil/dev $ time ./test_log 1 10000000
real 0m2.639s
user 0m2.632s
sys 0m0.004s
A bit surprisingly, log seems to be the faster one.
Pure speculation: perhaps the underlying Taylor series converges faster for log, or something? It actually seems to me like the natural logarithm is easier to calculate than the exponential function:
ln(1+x) = x - x^2/2 + x^3/3 ...
e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! ...
Not sure that the C library functions even do it like this, however. But it doesn't seem totally unlikely.
Since you're working with values << 1, note that x-1 > log(x) for x<1,
which means that x-1 < log(y) implies log(x) < log(y), which already
takes care of 1/e ~ 37% of the cases without having to use log or exp.
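That shortcut can be wrapped up in a couple of lines. A sketch, assuming num_2 > 0 (num1IsGreater is an invented name):

```cpp
#include <cmath>

// log(y) <= y - 1 for every y > 0, so whenever num_2 - 1 < log_num_1
// we already know log(num_2) < log_num_1 and can answer without
// evaluating log() or exp() at all. Assumes num_2 > 0.
bool num1IsGreater(double log_num_1, double num_2) {
    if (num_2 - 1.0 < log_num_1)
        return true;                     // decided by the inequality alone
    return log_num_1 > std::log(num_2);  // fall back to the full comparison
}
```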
Some quick tests in Python (which uses C for math):
$ time python -c "from math import log, exp;[exp(100) for i in xrange(1000000)]"
real 0m0.590s
user 0m0.520s
sys 0m0.042s
$ time python -c "from math import log, exp;[log(100) for i in xrange(1000000)]"
real 0m0.685s
user 0m0.558s
sys 0m0.044s
would indicate that log is slightly slower
Edit: it seems the C functions are being optimized out by the compiler, so the loop is what takes up the time.
Interestingly, in C they seem to be the same speed (possibly for reasons Mark mentioned in a comment):
#include <math.h>

void runExp(int n) {
    int i;
    for (i = 0; i < n; i++) {
        exp(100);
    }
}

void runLog(int n) {
    int i;
    for (i = 0; i < n; i++) {
        log(100);
    }
}

int main(int argc, char **argv) {
    if (argc <= 1) {
        return 0;
    }
    if (argv[1][0] == 'e') {
        runExp(1000000000);
    } else if (argv[1][0] == 'l') {
        runLog(1000000000);
    }
    return 0;
}
giving times:
$ time ./exp l
real 0m2.987s
user 0m2.940s
sys 0m0.015s
$ time ./exp e
real 0m2.988s
user 0m2.942s
sys 0m0.012s
It can depend on your libm, platform and processor. You're best off writing some code that calls exp/log a large number of times, then timing a few runs to see if there's a noticeable difference.
Both take basically the same time on my computer (Windows), so I'd use exp, since it's defined for all values (assuming you check for ERANGE). But if it's more natural to use log, you should use that instead of trying to optimise without good reason.
It would make sense for log to be faster: exp has to perform several multiplications to arrive at its answer, whereas log only has to convert the mantissa and exponent from base-2 to base-e.
Just be sure to boundary-check (as many others have said) if you're using log.
If you are sure that this is a hotspot, compiler intrinsics are your friends, although they're platform-dependent (if you go for performance in places like this, you cannot be platform-agnostic).
So the question really is: which of the two maps to a single asm instruction on your target architecture, and what are its latency and cycle counts? Without that, it is pure speculation.