Using Boost.Units and Boost.Multiprecision - c++

I am attempting to write a molecular dynamics program, and I thought that Boost.Units was a logical choice for the variables. I also decided that Boost.Multiprecision offered a better option than double or long double with respect to round-off errors. Combining the two seems fairly straightforward until I attempt to use a constant; then it breaks down.
#include <boost/multiprecision/gmp.hpp>
#include <boost/units/io.hpp>
#include <boost/units/pow.hpp>
#include <boost/units/quantity.hpp>
#include <boost/units/systems/si.hpp>
#include <boost/units/systems/si/codata/physico-chemical_constants.hpp>
namespace units = boost::units;
namespace si = boost::units::si;
namespace mp = boost::multiprecision;
units::quantity<si::mass, mp::mpf_float_50> mass = 1.0 * si::kilogram;
units::quantity<si::temperature, mp::mpf_float_50> temperature = 300. * si::kelvin;
auto k_B = si::constants::codata::k_B; // Boltzmann constant
units::quantity<si::velocity, mp::mpf_float_50> velocity = units::root<2>(temperature * k_B / mass);
std::cout << velocity << std::endl;
The output will be 1 M S^-1. If I use long double in lieu of mp::mpf_float_50, then the result is 2.87818e-11 m s^-1. I know that the problem lies in the conversion between the constant and the other data, because the constant defaults to a double. I have thought about creating my own Boltzmann constant, but I prefer to use the predefined value if possible.
My question, therefore, is how do I go about using Boost.Multiprecision when I have predefined constants from Boost.Units? If I must concede to using double or long double, then I will, but I suspect that a way exists to convert or utilize the other on the constants.
I am working with Mac OS X 10.7, Xcode 4.6.2, Clang 3.2, Boost 1.53.0 and the C++11 extensions.
I appreciate any help that can be offered.

I'd advise you not to use multiple precision arithmetic for molecular dynamics simulations because the time-step integration will be painfully slow. If the goal is to preserve total energy as much as possible, then just use Verlet or any other symplectic integrator. Multiple precision arithmetic (or long double, or compensated summation with plain double) may be useful for aggregating ensemble averages, though.
Besides, if you write your simulation code using dimensionless (reduced) units you will also get rid of the dependency on Boost.Units.

Related

A limitation in the eval mechanism of gmpxx in GMP?

The GNU MP library provides a C++ interface in gmpxx.h which overloads arithmetic operators (among other functions), to make it easier for developers to write mathematical expressions using arbitrary-precision types, just like expressions that use native numerical types (see this answer).
While running code to write this answer, I came across an unexpected behaviour in how the interface handles expressions containing GMP types: passing a subtraction of mpf_t variables directly as an argument to gmp_printf gives the wrong answer, but saving the subtraction result into an intermediary variable works fine.
C++ code:
#include <iostream>
#include <gmpxx.h>
using namespace std;
int main (void) {
mpf_class actual = 1. / 6;
mpf_class expected("0.166666666666666666666666666666667");
gmp_printf("diff %.50Ff\n", expected - actual);
mpf_class diff = expected - actual;
gmp_printf("diff %.50Ff\n", diff);
}
The output:
diff 0.16666666666666666666700000000000000000000000000000
diff 0.00000000000000000925185853854297150353000000000000
And, assuming the name testgmpxx.cpp, compile with:
g++ testgmpxx.cpp -o testgmpxx -lgmpxx -lgmp
Is this a limitation in the gmpxx.h eval mechanism? Are there instances where you can't use GMP types in mathematical expressions?

C++ Eigen for solving linear systems fast

So I wanted to test the speed of C++ vs Matlab for solving a linear system of equations. For this purpose I create a random system and measure the time required to solve it using Eigen on Visual Studio:
#include <Eigen/Core>
#include <Eigen/Dense>
#include <chrono>
using namespace Eigen;
using namespace std;
int main()
{
chrono::steady_clock sc; // create an object of `steady_clock` class
int n;
n = 5000;
MatrixXf m = MatrixXf::Random(n, n);
VectorXf b = VectorXf::Random(n);
auto start = sc.now(); // start timer
VectorXf x = m.lu().solve(b);
auto end = sc.now();
// measure time span between start & end
auto time_span = static_cast<chrono::duration<double>>(end - start);
cout << "Operation took: " << time_span.count() << " seconds !!!";
}
Solving this 5000 x 5000 system takes 6.4 seconds on average. Doing the same in Matlab takes 0.9 seconds. The matlab code is as follows:
a = rand(5000); b = rand(5000,1);
tic
x = a\b;
toc
According to this flowchart of the backslash operator:
given that a random matrix is not triangular, permuted triangular, Hermitian or upper Hessenberg, the backslash operator in Matlab uses an LU solver, which I believe is the same solver that I'm using in the C++ code, that is, lu().solve
Probably there is something that I'm missing, because I thought C++ was faster.
I am running it with release mode active on the Configuration Manager
Project Properties - C/C++ - Optimization - /O2 is active
Tried using Enhanced Instructions (SSE and SSE2). SSE actually made it slower and SSE2 barely made any difference.
I am using Community version of Visual Studio, if that makes any difference
First of all, for this kind of operation Eigen is very unlikely to beat MatLab, because the latter directly calls Intel's MKL, which is heavily optimized and multi-threaded. Note that you can also configure Eigen to fall back to MKL. If you do so, you'll end up with similar performance.
Nonetheless, 6.4s is way too much. Eigen's documentation reports 0.7s for factorizing a 4k x 4k matrix. Running your example on my computer (Haswell laptop @2.6GHz) I got 1.6s (clang 7, -O3 -march=native), and 1s with multithreading enabled (-fopenmp). So make sure you enable all your CPU's features (AVX, FMA) and OpenMP. With OpenMP you might need to explicitly reduce the number of OpenMP threads to the number of physical cores.

Mixed units with boost::units

In my program, I would like to take advantage of boost::units for type safe computations and automatic conversions. As a novice user of the library, I have a basic understanding of how it works, and why implicit type conversion is forbidden.
Right now I can write code like this
using namespace boost::units;
using namespace boost::units::si;
quantity<mass> my_mass(50 * kilogram);
quantity<force> my_force = my_mass * 9.81 * meter / pow<2>(second);
Where my_mass will be expressed in kilograms and my_force in Newton. But for convenience when interacting with other libraries which accept only double, I would prefer forces to be in kiloNewton (and, similarly, pressures in MegaPascal). So I do this:
typedef make_scaled_unit<force, scale<10, static_rational<3>>>::type kiloforce;
quantity<kiloforce> scaled_force(my_mass * 9.81 * meter / pow<2>(second));
Which works, but forces an explicit conversion. The following code, rightfully, does not compile:
quantity<kiloforce> scaled_force = my_mass * 9.81 * meter / pow<2>(second);
Because it represents an implicit conversion. My question is then: is there a way to configure the library so that quantities are expressed in the scaled unit of choice?
After all, this is the case for "kilogram", so I looked into scaled units, but I cannot seem to find a way to make it work. The idea would have been to define a custom system, but since mass, force and pressure are related to each other, this is not possible, as explained here.
The basic problem seems to be that you're looking for a dimensionless number: the force expressed in kilonewtons. That's not hard:
quantity<force> my_force = my_mass * 9.81 * meter / pow<2>(second);
double F = my_force / (kilo*newtons); // my_force in kN
The physics problem seems to be "kiloforce". That's not how physics works. Forces do not have prefixes; units do. E.g. length vs. meter/kilometer and mass vs. gram/kilogram. Similarly, force vs. newton/kilonewton.
That said, may I suggest
make_scaled_unit<acceleration, scale<10, static_rational<3>>>::type g(0.00981);
Don't ask me why you'd express the earth's acceleration in kilometers per second squared, but a kiloNewton is a kilometer kilogram per second squared.

C++11 round off error using pow() and std::complex

Running the following
#include <iostream>
#include <complex>
int main()
{
std::complex<double> i (0,1);
std::complex<double> comp =pow(i, 2 );
std::cout<<comp<<std::endl;
return 0;
}
gives me the expected result (-1,0) without c++11. However, compiling with c++11 gives the highly annoying (-1,1.22461e-016).
What to do, and what is best practice?
Of course this can be fixed manually by flooring etc., but I would appreciate to know the proper way of addressing the problem.
SYSTEM: Win8.1, using Desktop Qt 5.1.1 (Qt Creator) with MinGW 4.8 32 bit. Using c++11 by adding the flag QMAKE_CXXFLAGS += -std=c++11 in the Qt Creator .pro file.
In C++11 we have a few new overloads of pow(std::complex). GCC has two nonstandard overloads on top of that, one for raising to an int and one for raising to an unsigned int.
One of the new standard overloads (namely std::complex</*Promoted*/> pow(const std::complex<T> &, const U &)) causes an ambiguity when calling pow(i, 2) with the non-standard ones. Their solution is to #ifdef the non-standard ones out in the presence of C++11 and you go from calling the specialized function (which uses successive squaring) to the generic method (which uses pow(double,double) and std::polar).
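For illustration, an integer power by successive squaring can be sketched as below. This is not GCC's internal code; `ipow` is a hypothetical helper name, and the sketch only handles non-negative exponents.

```cpp
#include <complex>

// Integer power by successive squaring (hypothetical helper, n >= 0).
// Only complex multiplications are used, so no exp/log/polar round-off
// is introduced.
template <typename T>
std::complex<T> ipow(std::complex<T> base, unsigned n) {
    std::complex<T> result(1, 0);
    while (n > 0) {
        if (n & 1u) {
            result *= base;   // this bit of the exponent is set
        }
        base *= base;         // square for the next bit
        n >>= 1;
    }
    return result;
}
```

With `std::complex<double> i(0, 1);`, calling `ipow(i, 2u)` multiplies i by itself once, so the imaginary part stays exactly zero.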
You need to get into a different mode when you are using floating point numbers. Floating points are APPROXIMATIONS of real numbers.
1.22461e-016 is
0.0000000000000000122461
An engineer would say that IS zero. You will always get such variations (unless you stick to operations on sums of powers of 2 within the same general range).
A value as simple as 0.1 cannot be represented exactly with floating point numbers.
The general problem you present has two parts:
1. Dealing with floating point numbers in processing
2. Displaying floating point numbers.
For the processing, I would wager that doing:
comp = i * i ;
would give you what you want.
pow(x, y) is going to compute
exp (log (x) * y)
For output, switch to using an F format.
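A minimal sketch of the output suggestion, assuming six decimals are enough for display; `format_fixed` is a hypothetical helper name. It relies on the standard stream inserter for std::complex, which prints the value as (real,imag) using the stream's current format flags and precision.

```cpp
#include <complex>
#include <iomanip>
#include <sstream>
#include <string>

// Format a complex number in fixed notation with the given number of
// decimals (hypothetical helper for illustration).
std::string format_fixed(const std::complex<double>& z, int decimals) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(decimals) << z;
    return out.str();
}
```

Formatted this way, a residue such as 1.22461e-016 in the imaginary part is displayed as 0.000000, while the stored value is left untouched.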

Generate random numbers following a normal distribution in C/C++

How can I easily generate random numbers following a normal distribution in C or C++?
I don't want any use of Boost.
I know that Knuth talks about this at length but I don't have his books at hand right now.
There are many methods to generate Gaussian-distributed numbers from a regular RNG.
The Box-Muller transform is commonly used. It correctly produces values with a normal distribution. The math is easy. You generate two (uniform) random numbers, and by applying a formula to them, you get two normally distributed random numbers. Return one, and save the other for the next request for a random number.
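The "return one, save the other" idea can be sketched as follows. The class name `BoxMuller` is made up for this example, and the choice of std::mt19937 as the uniform source is an assumption; any uniform generator could be substituted.

```cpp
#include <cmath>
#include <random>

// Box-Muller transform that caches the second value of each pair.
// Hypothetical class for illustration only.
class BoxMuller {
public:
    explicit BoxMuller(unsigned seed) : engine_(seed), uniform_(0.0, 1.0) {}

    // Returns one standard normal sample, N(0, 1).
    double operator()() {
        if (has_spare_) {             // hand out the value saved last time
            has_spare_ = false;
            return spare_;
        }
        const double two_pi = 6.283185307179586;
        const double u1 = 1.0 - uniform_(engine_);  // in (0, 1], keeps log finite
        const double u2 = uniform_(engine_);
        const double r = std::sqrt(-2.0 * std::log(u1));
        spare_ = r * std::sin(two_pi * u2);         // save for the next call
        has_spare_ = true;
        return r * std::cos(two_pi * u2);
    }

private:
    std::mt19937 engine_;
    std::uniform_real_distribution<double> uniform_;
    bool has_spare_ = false;
    double spare_ = 0.0;
};
```

A sample with mean mu and standard deviation sigma is then obtained as mu + sigma * gen(), where gen is a BoxMuller object.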
C++11
C++11 offers std::normal_distribution, which is the way I would go today.
C or older C++
Here are some solutions in order of ascending complexity:
Add 12 uniform random numbers from 0 to 1 and subtract 6. This will match mean and standard deviation of a normal variable. An obvious drawback is that the range is limited to ±6 – unlike a true normal distribution.
The Box-Muller transform. This is listed above, and is relatively simple to implement. If you need very precise samples, however, be aware that the Box-Muller transform combined with some uniform generators suffers from an anomaly called the Neave effect [1].
For best precision, I suggest drawing uniforms and applying the inverse cumulative normal distribution to arrive at normally distributed variates. Here is a very good algorithm for inverse cumulative normal distributions.
[1] H. R. Neave, “On using the Box-Muller transformation with multiplicative congruential pseudorandom number generators,” Applied Statistics, 22, 92-97, 1973
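The simplest of the options above, adding 12 uniform numbers and subtracting 6, can be sketched like this. The function name `approx_normal12` is hypothetical, and the engine type is left as a template parameter so that any standard uniform random bit generator can be passed in.

```cpp
#include <random>

// Approximate N(0, 1) sample: the sum of 12 draws from U(0, 1) has mean 6
// and variance 12 * (1/12) = 1, so subtracting 6 centres it at zero.
// The result is always confined to [-6, 6]. Hypothetical helper name.
template <typename Engine>
double approx_normal12(Engine& engine) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    double sum = 0.0;
    for (int k = 0; k < 12; ++k) {
        sum += uniform(engine);
    }
    return sum - 6.0;
}
```

This matches the mean and standard deviation of a standard normal variable but not its tails, which is the drawback noted above.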
A quick and easy method is just to sum a number of evenly distributed random numbers and take their average. See the Central Limit Theorem for a full explanation of why this works.
I created an open-source C++ project that benchmarks normally distributed random number generation.
It compares several algorithms, including
Central limit theorem method
Box-Muller transform
Marsaglia polar method
Ziggurat algorithm
Inverse transform sampling method.
cpp11random uses C++11 std::normal_distribution with std::minstd_rand (it is actually Box-Muller transform in clang).
The results of the single-precision (float) version on an iMac Core i5-3330S @ 2.70GHz, clang 6.1, 64-bit:
For correctness, the program verifies the mean, standard deviation, skewness and kurtosis of the samples. It was found that the CLT method, summing 4, 8 or 16 uniform numbers, does not have kurtosis as good as that of the other methods.
The Ziggurat algorithm has better performance than the others. However, it is not suitable for SIMD parallelism, as it needs table lookups and branches. Box-Muller with the SSE2/AVX instruction sets is much faster (x1.79, x2.99) than the non-SIMD version of the ziggurat algorithm.
Therefore, I suggest using Box-Muller on architectures with SIMD instruction sets, and maybe ziggurat otherwise.
P.S. The benchmark uses the simplest LCG PRNG to generate uniformly distributed random numbers, so it may not be sufficient for some applications. But the performance comparison should be fair because all implementations use the same PRNG, so the benchmark mainly tests the performance of the transformation.
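The kind of correctness check described above (mean, standard deviation, skewness, kurtosis) could look roughly like the sketch below. This is not the benchmark's actual code; the names `Moments` and `sample_moments` are hypothetical, and the population (biased) definitions of the moments are assumed. For a standard normal sample one would expect values close to 0, 1, 0 and 3 respectively.

```cpp
#include <cmath>
#include <vector>

// First four standardized sample moments (hypothetical struct).
struct Moments {
    double mean;
    double stddev;
    double skewness;
    double kurtosis;   // non-excess kurtosis: 3 for a normal distribution
};

// Computes population moments of a non-empty sample.
Moments sample_moments(const std::vector<double>& xs) {
    const double n = static_cast<double>(xs.size());
    double mean = 0.0;
    for (double x : xs) {
        mean += x;
    }
    mean /= n;

    double m2 = 0.0, m3 = 0.0, m4 = 0.0;   // central moments
    for (double x : xs) {
        const double d = x - mean;
        m2 += d * d;
        m3 += d * d * d;
        m4 += d * d * d * d;
    }
    m2 /= n;
    m3 /= n;
    m4 /= n;

    return {mean, std::sqrt(m2), m3 / std::pow(m2, 1.5), m4 / (m2 * m2)};
}
```

A kurtosis noticeably below 3 is what one would expect from a sum of only a few uniform numbers, since such a sum has lighter tails than a true normal distribution.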
Here's a C++ example, based on some of the references. This is quick and dirty; you are better off not reinventing the wheel and using the Boost library instead.
#include <cmath>   // for sqrt and log
#include <cstdlib> // for rand and RAND_MAX
double sampleNormal() {
double u = ((double) rand() / (RAND_MAX)) * 2 - 1;
double v = ((double) rand() / (RAND_MAX)) * 2 - 1;
double r = u * u + v * v;
if (r == 0 || r > 1) return sampleNormal();
double c = sqrt(-2 * log(r) / r);
return u * c;
}
You can use a Q-Q plot to examine the results and see how well it approximates a real normal distribution (rank your samples 1..x, turn the ranks into proportions of total count of x ie. how many samples, get the z-values and plot them. An upwards straight line is the desired result).
Use std::tr1::normal_distribution.
The std::tr1 namespace is not a part of boost. It's the namespace that contains the library additions from the C++ Technical Report 1 and is available in up to date Microsoft compilers and gcc, independently of boost.
This is how you generate the samples on a modern C++ compiler.
#include <random>
...
std::mt19937 generator;
double mean = 0.0;
double stddev = 1.0;
std::normal_distribution<double> normal(mean, stddev);
std::cerr << "Normal: " << normal(generator) << std::endl; // requires <iostream>
You can use the GSL. Some complete examples are given to demonstrate how to use it.
Have a look at: http://www.cplusplus.com/reference/random/normal_distribution/. It's the simplest way to produce normal distributions.
If you're using C++11, you can use std::normal_distribution:
#include <random>
std::default_random_engine generator;
std::normal_distribution<double> distribution(/*mean=*/0.0, /*stddev=*/1.0);
double randomNumber = distribution(generator);
There are many other distributions you can use to transform the output of the random number engine.
I've followed the definition of the PDF given in http://www.mathworks.com/help/stats/normal-distribution.html and came up with this:
const double DBL_EPS_COMP = 1 - DBL_EPSILON; // DBL_EPSILON is defined in <float.h>.
inline double RandU() {
return DBL_EPSILON + ((double) rand()/RAND_MAX);
}
inline double RandN2(double mu, double sigma) {
return mu + (rand()%2 ? -1.0 : 1.0)*sigma*pow(-log(DBL_EPS_COMP*RandU()), 0.5);
}
inline double RandN() {
return RandN2(0, 1.0);
}
It is maybe not the best approach, but it's quite simple.
The comp.lang.c FAQ list shares three different ways to easily generate random numbers with a Gaussian distribution.
You can take a look at it: http://c-faq.com/lib/gaussian.html
There exist various algorithms for the inverse cumulative normal distribution. The most popular in quantitative finance are tested on http://chasethedevil.github.io/post/monte-carlo-inverse-cumulative-normal-distribution/
In my opinion, there is not much incentive to use anything other than algorithm AS241 from Wichura: it is accurate to machine precision, reliable and fast. Bottlenecks are rarely in the Gaussian random number generation.
The top answer here advocates Box-Müller; you should be aware that it has known deficiencies. I quote https://www.sciencedirect.com/science/article/pii/S0895717710005935:
"in the literature, Box–Muller is sometimes regarded as slightly inferior, mainly for two reasons. First, if one applies the Box–Muller method to numbers from a bad linear congruential generator, the transformed numbers provide an extremely poor coverage of the space. Plots of transformed numbers with spiraling tails can be found in many books, most notably in the classic book of Ripley, who was probably the first to make this observation"
Box-Muller implementation:
#include <cstdlib>
#include <cmath>
#include <ctime>
#include <iostream>
using namespace std;
// return a uniformly distributed random number
double RandomGenerator()
{
return ( (double)(rand()) + 1. )/( (double)(RAND_MAX) + 1. );
}
// return a normally distributed random number
double normalRandom()
{
double y1=RandomGenerator();
double y2=RandomGenerator();
return cos(2*3.14159265358979323846*y2)*sqrt(-2.*log(y1)); // full-precision pi instead of 3.14
}
int main(){
double sigma = 82.;
double Mi = 40.;
for(int i=0;i<100;i++){
double x = normalRandom()*sigma+Mi;
cout << " x = " << x << endl;
}
return 0;
}
1) A graphically intuitive way to generate Gaussian random numbers is to use something similar to the Monte Carlo method. You generate a random point in a box around the Gaussian curve using your pseudo-random number generator in C. You can check whether that point lies underneath the Gaussian curve using the equation of the distribution. If it does, then you have your Gaussian random number: it is the x value of the point.
This method isn't perfect because technically the Gaussian curve extends to infinity, and you can't create a box that extends to infinity in the x dimension. But the Gaussian curve approaches 0 in the y dimension pretty fast, so I wouldn't worry about that. The limited size of your variables in C may be more of a limiting factor for your accuracy.
2) Another way would be to use the Central Limit Theorem, which states that the suitably normalized sum of many independent random variables tends toward a normal distribution. Keeping this theorem in mind, you can approximate a Gaussian random number by adding a large number of independent random variables.
These methods aren't the most practical, but that is to be expected when you don't want to use a preexisting library. Keep in mind this answer is coming from someone with little or no calculus or statistics experience.
Monte Carlo method
The most intuitive way to do this would be to use a Monte Carlo method. Take a suitable range -X, +X. Larger values of X will result in a more accurate normal distribution, but take longer to converge.
a. Choose a random number z between -X and X.
b. Keep z with a probability of N(z, mean, variance), where N is the Gaussian distribution. Otherwise drop it and go back to step (a).
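Steps (a) and (b) can be sketched as below for a standard normal (mean 0, variance 1). One assumption differs slightly from the description: the acceptance probability used here is the Gaussian density divided by its peak value, exp(-z^2 / 2), which wastes fewer draws but yields the same distribution. The name `rejection_normal` and the default half-width of 6 are choices made for this example.

```cpp
#include <cmath>
#include <random>

// Rejection sampling of a standard normal truncated to [-X, X].
// Hypothetical helper; half_width plays the role of X.
template <typename Engine>
double rejection_normal(Engine& engine, double half_width = 6.0) {
    std::uniform_real_distribution<double> pick_z(-half_width, half_width);
    std::uniform_real_distribution<double> pick_u(0.0, 1.0);
    for (;;) {
        const double z = pick_z(engine);                // step (a)
        const double accept = std::exp(-0.5 * z * z);   // density relative to its peak
        if (pick_u(engine) < accept) {                  // step (b): keep z ...
            return z;
        }                                               // ... or drop it and retry
    }
}
```

With X = 6 only about one candidate in five is accepted, which illustrates why larger ranges take longer.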
Take a look at what I found.
This library uses the Ziggurat algorithm.
A computer is a deterministic device; there is no randomness in its calculations.
Moreover, the arithmetic unit in a CPU can only evaluate sums over a finite set of integers (performing the evaluation in a finite field) and over a finite set of rational numbers, along with bitwise operations. Mathematics deals with much larger sets, such as [0.0, 1.0], which contains infinitely many points.
You could listen to some wire inside the computer with a controller, but would the signal have a uniform distribution? I don't know. But if we assume that the signal is the result of accumulating a huge number of independent random variables, then you will get an approximately normally distributed random variable (this is proved in probability theory).
There are algorithms called pseudo-random generators. As I understand it, the purpose of a pseudo-random generator is to emulate randomness. The criteria of goodness are:
- the empirical distribution converges (in some sense: pointwise, uniform, L2) to the theoretical one
- the values that you receive from the random generator appear to be independent. Of course this is not true from a 'real point of view', but we assume it is.
One popular method is to sum 12 i.r.v. with uniform distributions. But to be honest, the derivation of the Central Limit Theorem with the help of the Fourier transform and Taylor series needs the assumption n->+inf several times. So, theoretically speaking, I personally don't understand how people justify summing just 12 i.r.v. with uniform distribution.
I studied probability theory at university, and for me in particular this is just a math question. At university I saw the following model:
double generateUniform(double a, double b)
{
return uniformGen.generateReal(a, b);
}
double generateRelei(double sigma)
{
return sigma * sqrt(-2 * log(1.0 - uniformGen.generateReal(0.0, 1.0 -kEps)));
}
double generateNorm(double m, double sigma)
{
double y2 = generateUniform(0.0, 2 * kPi);
double y1 = generateRelei(1.0);
double x1 = y1 * cos(y2);
return sigma*x1 + m;
}
This is just one example of how to do it; I guess other ways of implementing it exist.
A proof that it is correct can be found in this book:
"Moscow, BMSTU, 2004: XVI Probability Theory, Example 6.12, p.246-247" of Krishchenko Alexander Petrovich ISBN 5-7038-2485-0
Unfortunately, I don't know of any English translation of this book.