Bug in the C++ standard library in std::poisson_distribution?

I think I have encountered an incorrect behaviour of std::poisson_distribution from C++ standard library.
Questions:
Could you confirm it is indeed a bug and not my error?
What exactly is wrong in the standard library's implementation of poisson_distribution, assuming it is indeed a bug?
Details:
The following C++ code (file poisson_test.cc) is used to generate Poisson-distributed numbers:
#include <array>
#include <cmath>
#include <iostream>
#include <random>
int main() {
  // The problem turned out to be independent of the engine
  std::mt19937_64 engine;
  // Set a fixed seed for easy reproducibility;
  // the problem turned out to be independent of the seed
  engine.seed(1);
  std::poisson_distribution<int> distribution(157.17);
  for (int i = 0; i < 1E8; i++) {
    const int number = distribution(engine);
    std::cout << number << std::endl;
  }
}
I compile and run this code as follows:
clang++ -o poisson_test -std=c++11 poisson_test.cc
./poisson_test > mypoisson.txt
The following Python script was used to analyze the sequence of random numbers from the file mypoisson.txt:
import numpy as np
import matplotlib.pyplot as plt

def expectation(x, m):
    " Poisson pdf "
    # Use Ramanujan's formula to get ln x!
    lnx = x * np.log(x) - x + 1./6. * np.log(x * (1 + 4*x*(1 + 2*x))) + 1./2. * np.log(np.pi)
    return np.exp(x*np.log(m) - m - lnx)

expected_mean = 157.17  # the mean passed to std::poisson_distribution above

data = np.loadtxt('mypoisson.txt', dtype = 'int')
unique, counts = np.unique(data, return_counts = True)
hist = counts.astype(float) / counts.sum()
stat_err = np.sqrt(counts) / counts.sum()

plt.errorbar(unique, hist, yerr = stat_err, fmt = '.',
             label = 'Poisson generated \n by std::poisson_distribution')
plt.plot(unique, expectation(unique, expected_mean),
         label = 'expected probability \n density function')
plt.legend()
plt.show()

# Determine bins with a statistically significant deviation of more than 3 sigma
deviation_in_sigma = (hist - expectation(unique, expected_mean)) / stat_err
d = dict((k, v) for k, v in zip(unique, deviation_in_sigma) if np.abs(v) > 3.0)
print(d)
The script produces the following plot:
You can see the problem with the naked eye: the deviation at n = 158 is statistically significant; it is in fact a 22σ deviation!
Close-up of the previous plot.
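As a cross-check (not part of the original post): one might suspect the hand-rolled Ramanujan approximation of ln n! rather than the generator, so it is worth recomputing the deviations against scipy.stats.poisson, which evaluates the exact pmf. A minimal sketch, assuming the mypoisson.txt produced above:
import numpy as np
from scipy.stats import poisson

expected_mean = 157.17
data = np.loadtxt('mypoisson.txt', dtype='int')
unique, counts = np.unique(data, return_counts=True)
hist = counts.astype(float) / counts.sum()
stat_err = np.sqrt(counts) / counts.sum()

# Exact Poisson pmf instead of the approximate expectation() above
deviation_in_sigma = (hist - poisson.pmf(unique, expected_mean)) / stat_err
print({k: v for k, v in zip(unique, deviation_in_sigma) if abs(v) > 3.0})
If the spike at n = 158 survives this check, the approximation of ln n! is not the culprit.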

My system is set up as follows (Debian testing):
libstdc++-7-dev:
Installed: 7.2.0-16
libc++-dev:
Installed: 3.5-2
clang:
Installed: 1:3.8-37
g++:
Installed: 4:7.2.0-1d1
I can confirm the bug when using libstdc++:
g++ -o pois_gcc -std=c++11 pois.cpp
clang++ -o pois_clang -std=c++11 -stdlib=libstdc++ pois.cpp
clang++ -o pois_clang_libc -std=c++11 -stdlib=libc++ pois.cpp
Result:

Related

C++ Eigen reductions are slower than simple loop

Given two 3xK matrices, I wish to compute the average of the squared column-by-column dot products.
This can be accomplished with a simple loop:
Eigen::Matrix<float,3,Eigen::Dynamic> L(3,K);
Eigen::Matrix<float,3,Eigen::Dynamic> P(3,K);
float distance = 0;
for (int q = 0; q < K; q++) {
  const Eigen::Vector3f& line = L.col(q);
  const Eigen::Vector3f& point = P.col(q);
  const float d = line.dot(point);
  distance += d*d;
}
const float residual2 = distance / K;
which outperforms (g++ -O3 -DNDEBUG) the fancier reduction techniques, e.g.:
const float residual2 = (L.array() * P.array()).colwise().sum().square().mean();
const float residual2 = L.cwiseProduct(P).array().colwise().sum().array().square().mean();
const float residual2 = (L.transpose() * P).diagonal().array().square().mean();
Perhaps there is something I am missing here. Shouldn't the reductions be faster?
Edit: Using K = 20.
I perform each of the above 100*632*631 times; the loop version takes about 1200 msec while the others take around 2000 msec (3.2 GHz Intel Core i5, macOS, clang++ -O3).
Edit2: Created a small test program. Adding -DNDEBUG when compiling made a huge difference (I thought you got this for free with -O3). The loop version is significantly faster than
the reductions:
./eigentest
CASE 1: 12 milliseconds
solution = 1482.5
CASE 2: 835 milliseconds
solution = 1482.5
CASE 3: 849 milliseconds
solution = 1482.5
CASE 4: 843 milliseconds
solution = 1482.5
Edit3: I think the test above is crap since the compiler unrolled the loop.... sigh... I'll get back to this soon...

How to generate a different set of random numbers in each iteration of a parallelized For loop?

The following problem arose directly from applying the answer to this question.
In the minimal working example (MWE) there is a place in the myscript definition where I generate some random numbers, then perform some operations on them, and finally write the output to a file. When this code is not parallelized, it works correctly. However, when it is parallel (I'm testing it on a 2-core machine and have two threads at a time), and I want to perform 4 iterations (boot), I get the same output twice (i.e., among the four outputs there are only two distinct results, not four as expected). How can this be fixed?
MWE:
import random
import math
import numpy as np
import multiprocessing as mp
from multiprocessing import Pool

boot = 4
RRpoints = 278

def myscript(iteration_number):
    RRfile_name = "outputRR%d.txt" % iteration_number
    with open(RRfile_name, "w") as RRf:
        col1 = np.random.uniform(0, 1, RRpoints)
        col2 = np.random.uniform(0, 1, RRpoints)
        sph1 = [i * 2 * math.pi for i in col1]
        sph2 = [math.asin(2 * i - 1) for i in col2]
        for k in xrange(0, RRpoints):
            h = 0
            mltp = sph1[k] * sph2[k]
            h += mltp
            RRf.write("%s\n" % h)

x = xrange(boot)
p = mp.Pool()
y = p.imap(myscript, x)
list(y)
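The usual cause of this behaviour is that a forked worker process inherits the parent's NumPy RNG state, so two tasks that start from the same state produce identical draws. A minimal sketch of one possible fix, seeding a dedicated RandomState per task from the iteration number (names follow the MWE above; this is an illustration, not the only way to do it):
import math
import numpy as np
import multiprocessing as mp

boot = 4
RRpoints = 278

def myscript(iteration_number):
    # A per-task generator: each task gets its own reproducible stream
    # instead of the RNG state inherited from the parent on fork.
    rng = np.random.RandomState(iteration_number)
    RRfile_name = "outputRR%d.txt" % iteration_number
    with open(RRfile_name, "w") as RRf:
        col1 = rng.uniform(0, 1, RRpoints)
        col2 = rng.uniform(0, 1, RRpoints)
        sph1 = [i * 2 * math.pi for i in col1]
        sph2 = [math.asin(2 * i - 1) for i in col2]
        for k in range(RRpoints):
            RRf.write("%s\n" % (sph1[k] * sph2[k]))

if __name__ == "__main__":
    p = mp.Pool()
    list(p.imap(myscript, range(boot)))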

Fastest way to compute the cdf of the Normal distribution over vectors - R::pnorm vs erfc vs?

I hope my reworded question now fits the criteria of Stack Overflow. Please consider the example below. I am writing a log-likelihood function in which computing the cdf over vectors is the most time-consuming part. Example 1 uses R::pnorm, Example 2 approximates the normal cdf with erfc. As you can see, the results are sufficiently similar, and the erfc version is a bit faster.
In practice (within an MLE), however, it turns out that the erfc version is not as precise, which lets the algorithm run into inf areas unless one sets the constraints accurately. My questions:
1) Am I missing something? Is it necessary to implement some error handling (for the erfc)?
2) Do you have any other suggestions to speed up the code, or alternatives? Does it pay off to look into parallelizing the for-loop?
require(Rcpp)
require(RcppArmadillo)
require(microbenchmark)
#Example 1: standard R::pnorm
src1 <- '
NumericVector ppnorm(const arma::vec& x, const arma::vec& mu, const arma::vec& sigma, int lt, int lg) {
  int n = x.size();
  arma::vec res(n);
  for (int i = 0; i < n; i++) {
    res(i) = R::pnorm(x(i), mu(i), sigma(i), lt, lg);
  }
  return wrap(res);
}
'
#Example 2: approximation with erfc
src2 <- '
NumericVector ppnorm(const arma::vec& x, const arma::vec& mu, const arma::vec& sigma, int lt, int lg) {
  int n = x.size();
  arma::vec res(n);
  for (int i = 0; i < n; i++) {
    res(i) = 0.5 * erfc(-(x(i) - mu(i)) / sigma(i) * M_SQRT1_2);
  }
  if (lt == 0 & lg == 0) {
    return wrap(1 - res);
  }
  if (lt == 1 & lg == 0) {
    return wrap(res);
  }
  if (lt == 0 & lg == 1) {
    return wrap(log(1 - res));
  }
  if (lt == 1 & lg == 1) {
    return wrap(log(res));
  }
}
'
#some random numbers
xex = rnorm(100,5,4)
muex = rnorm(100,3,1)
siex = rnorm(100,0.8,0.3)
#compile the C++ functions
func1 = cppFunction(depends = "RcppArmadillo", code = src1) #R::pnorm
func2 = cppFunction(depends = "RcppArmadillo", code = src2) #erfc
#run with example data
res1 = func1(xex,muex,siex,1,0)
res2 = func2(xex,muex,siex,1,0)
# sum of squared errors
sum((res1 - res2)^2,na.rm=T)
# 6.474419e-32 ... very small
#benchmarking
microbenchmark(func1(xex,muex,siex,1,0),func2(xex,muex,siex,1,0),times=10000)
#Unit: microseconds
#expr min lq mean median uq max neval
#func1(xex, muex, siex, 1, 0) 11.225 11.9725 13.72518 12.460 13.617 103.654 10000
#func2(xex, muex, siex, 1, 0) 8.360 9.1410 10.62114 9.669 10.769 205.784 10000
#my machine: Ubuntu 14.04 LTS, i7 2640M 2.8 Ghz x 4, 8GB memory, RRO 3.2.0 based on version R 3.2.0
1) Well, you really should use R's pnorm() as your 0-th example.
You don't; you use the Rcpp interface to it. R's pnorm() is already nicely vectorized internally (i.e. at the C level), so it may well be comparable to or even faster than Rcpp. It also has the advantage of covering the cases of NA, NaN, Inf, etc.
2) If you are talking about MLE, and you are concerned about speed and accuracy, you almost surely should work with logarithms, and maybe not with pnorm() but rather with dnorm()?
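The numerical point about working in log space is easy to demonstrate; the illustration below is in Python/SciPy rather than R, purely to show the effect. The tail probability itself underflows to zero in double precision long before its logarithm becomes unusable, so an MLE should evaluate the log-CDF (or log-density) directly:
import numpy as np
from scipy.stats import norm

x = -40.0                   # deep in the lower tail
print(norm.cdf(x))          # 0.0 -- underflows in double precision
print(np.log(norm.cdf(x)))  # -inf -- breaks a log-likelihood
print(norm.logcdf(x))       # about -804.6, finite and usable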

Concatenate and Lists in Rcpp

I am just getting started with Rcpp, so this might be a very stupid question. Here is the specific question (context is provided below):
What is the Rcpp equivalent of
odes <- c(A = 1.0, B = 2.0, C = 3.0, D = 4.0, E = 5.0, F = 6.0, G = 7.0)
list(odes)
Context - I am trying to solve a system of ordinary differential equations (ODEs) using the deSolve package's vode solver, but using the Rcpp package to write the right-hand side of the ODEs in compiled code. The solver expects the function that forms the RHS of the ODEs to return a list; specifically, in this case, the RHS from a .R function (which the solver was able to integrate successfully) was of the form
> X
[[1]]
9000000.00 -9000000.00 0.00 19993.04 -19993.04 -19993.04 -9000000.00
and I want my .cpp file to spit out odes as a list of similar form.
Any help here would be much appreciated!!
As suggested below, I am pasting the code to show exactly what I am doing
#include <Rcpp.h>
using namespace Rcpp;

// This is a simple example of exporting a C++ function to R. You can
// source this function into an R session using the Rcpp::sourceCpp
// function (or via the Source button on the editor toolbar). Learn
// more about Rcpp at:
//
//   http://www.rcpp.org/
//   http://adv-r.had.co.nz/Rcpp.html
//   http://gallery.rcpp.org/
//

// [[Rcpp::export]]
List odes_gprotein(double t, NumericVector A, NumericVector p) {
  NumericVector odes_vec(A.length());
  List odes(1);
  double Flux1 = p[1] * A[4] * A[5] - p[0] * A[3];
  double Flux2 = p[2] * A[5] - p[3];
  double Flux3 = p[4] * A[3];
  double Flux4 = p[5] * A[1] * A[6];
  double Flux5 = p[6] * A[0] * A[3];
  double Flux6 = p[7] * A[2];
  odes_vec[0] = (Flux4 - Flux5);
  odes_vec[1] = (-Flux4 + Flux6);
  odes_vec[2] = (Flux5 - Flux6);
  odes_vec[3] = (Flux1 - Flux3);
  odes_vec[4] = (-Flux1);
  odes_vec[5] = (-Flux1 - Flux2);
  odes_vec[6] = (-Flux4 + Flux5);
  odes = List(odes_vec);
  return odes;
}
This function returns the following (when I supply some values of t, p, and A):
> Rcpp::sourceCpp('odes_gprotein.cpp')
> X <- odes_gprotein(0,IC,p)
> str(X)
List of 7
$ : num 9e+06
$ : num -9e+06
$ : num 0
$ : num 19993
$ : num -19993
$ : num -19993
$ : num -9e+06
Whereas what I need is X as mentioned above:
> X
[[1]]
9000000.00 -9000000.00 0.00 19993.04 -19993.04 -19993.04 -9000000.00
where
str(X)
List of 1
$ : num [1:7] 9e+06 -9e+06 0e+00 2e+04 -2e+04 ...
Thank you for your suggestions!
We still do not really know what you want or tried, but here is a minimal existence proof for you:
R> cppFunction('List mylist(IntegerVector x) { return List(x); }')
R> mylist(c(2:4))
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 4
R>
In goes a vector, out comes a list. Have a look at the Rcpp examples and e.g. the Rcpp Gallery site.
Turns out I was creating the List in the wrong way. I had to remove the following: List odes(1);, odes = List(odes_vec);, and return odes;, and had to add the following statement at the end:
return Rcpp::List::create(odes_vec);
A better explanation can be found here.

Convergence criteria for scipy.eigvalsh

I am using Python (SciPy) to compute the eigenvalues of a symmetric real matrix. I am currently using the
scipy.linalg.eigvalsh
function to compute the eigenvalues (http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.eigvalsh.html#scipy.linalg.eigvalsh). Looking at the source code for eigvalsh, it appears that Python makes a call to a Fortran package. The documentation also mentions that an error will be thrown if the computation does not converge.
My question is: what is the convergence criterion, and can I change it (relatively easily)?
In my specific application I compute the eigenvalues of a sequence of matrices, and I am noticing a strong correlation between several of the eigenvalues. I want to know whether the correlation is imperfect purely for numerical reasons. If I can tighten the convergence criterion, then I can see whether the dependence increases.
If I read the source code right, the LAPACK function dsyevr() is used. If I understand it correctly, fiddling with its parameters will not necessarily get you higher accuracy. If you need high accuracy, you could try mpmath:
import numpy as np
from scipy.linalg import eigh
from mpmath import mp

print("*** Scipy calculations: ***")
# Generate matrix:
n = 25
AA = np.random.randn(n, n)
HH = np.dot(AA, AA.T)
# Calculate eigenvalues and -vectors:
w, VV = eigh(HH)  # eigvalsh() also calls eigh()
# Check result:
HH2 = np.dot(VV, np.diag(w).dot(VV.T))
dHH = HH - HH2
elem_diff_max = np.abs(HH - HH2).max()
print("Elements differ by maximally {}".format(np.abs(dHH).max()))
print("Frobenius norm: {}".format(np.linalg.norm(HH - HH2, 'fro')))
print("")

print("*** Mpmath calculations (very slow): *** ")
mp.dps = 40  # number of precision digits for mpmath
mHH = mp.matrix(HH)     # take the previous matrix
mw, mVV = mp.eigh(mHH)  # and do the eigendecomposition
# Check results:
mHH2 = mVV * mp.diag(mw) * mVV.T
mdHH = mHH - mHH2
# Curiously I could not figure out how to determine abs(mdHH).max(),
# so compute it element by element:
hmax = mp.mpf(0)
for r in mdHH.tolist():
    for c in r:
        mc = c if c >= 0 else -c
        hmax = mc if mc > hmax else hmax
print("Elements differ by maximally {}".format(hmax))
print("Frobenius norm: {}".format(mp.norm(mdHH)))

# Sample output (differs because of randn()):
#
# *** Scipy calculations: ***
# Elements differ by maximally 6.48370246381e-14
# Frobenius norm: 4.90996840307e-13
# *** Mpmath calculations (very slow): ***
# Elements differ by maximally 5.510129769479472693603452518229276614775e-39
# Frobenius norm: 3.772588954060141733111171961647528674136e-38
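A cheaper, rougher check than recomputing in mpmath, sketched here as an aside (it assumes the solver is backward stable, which LAPACK's symmetric eigensolvers are): for a real symmetric matrix the eigenvalues are perfectly conditioned, so each computed eigenvalue is accurate to roughly machine epsilon times the spectral norm of the matrix. Estimating that bound tells you how much of the observed correlation structure could plausibly be numerical noise:
import numpy as np
from scipy.linalg import eigvalsh

# Stand-in matrix; replace HH with one of the matrices from the sequence.
n = 25
AA = np.random.randn(n, n)
HH = AA.dot(AA.T)

w = eigvalsh(HH)
eps = np.finfo(float).eps
err_bound = eps * np.linalg.norm(HH, 2)   # rough per-eigenvalue accuracy
print("eigenvalue error bound: {:.3e}".format(err_bound))
print("largest eigenvalue:     {:.3e}".format(w[-1]))
If the variations you see between eigenvalues of successive matrices are much larger than this bound, they are not a numerical artifact of the solver.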