Running std::normal_distribution with user-defined random generator


I am about to generate an array of normally distributed pseudo-random numbers. As far as I know, the standard library offers the following code for that:
std::random_device rd;
std::mt19937 gen(rd());
std::normal_distribution<> d(mean, stddev);
...
double number = d(gen);
The problem is that I want to use a Sobol' quasi-random sequence instead of Mersenne
Twister pseudo-random generator. So, my question is:
Is it possible to run the std::normal_distribution with a user-defined random generator (with a Sobol' quasi-random sequence generator in my case)?
More details: I have a class called RandomGenerator, which is used to generate Sobol' quasi-random numbers:
RandomGenerator randgen;
double number = randgen.sobol(0,1);

Yes, it is possible. Just make it comply with the requirements of a uniform random number generator (§26.5.1.3, paragraphs 2 and 3):
2 A class G satisfies the requirements of a uniform random number generator if the expressions shown in Table 116 are valid and have the indicated semantics, and if G also satisfies all other requirements of this section. In that Table and throughout this section:
a) T is the type named by G’s associated result_type, and
b) g is a value of G.
Table 116 — Uniform random number generator requirements
Expression | Return type | Pre/post-condition | Complexity
----------------------------------------------------------------------
G::result_type | T | T is an unsigned integer type (§3.9.1). | compile-time
g() | T | Returns a value in the closed interval [G::min(), G::max()]. | amortized constant
G::min() | T | Denotes the least value potentially returned by operator(). | compile-time
G::max() | T | Denotes the greatest value potentially returned by operator(). | compile-time
3 The following relation shall hold: G::min() < G::max().
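As a minimal sketch of such a class: the hypothetical `VanDerCorput` below is not a Sobol' generator but the base-2 van der Corput sequence (a bit-reversed counter), a one-dimensional low-discrepancy sequence used here as a stand-in for the question's `RandomGenerator::sobol`. It provides `result_type`, `min()`, `max()` and `operator()` as Table 116 requires, and uses a full 64-bit range for the reason explained in the next answer. How many uniforms a distribution consumes per variate is implementation-defined; libstdc++'s std::normal_distribution, for example, uses a rejection method, so the low-discrepancy structure is not guaranteed to survive the transformation.

```cpp
#include <cstdint>
#include <limits>
#include <random>

// Hypothetical stand-in for a Sobol' generator: the base-2 van der Corput
// sequence 0.5, 0.25, 0.75, 0.125, ... scaled to 64-bit unsigned integers.
// It meets the uniform random number generator requirements, so it can be
// passed to the <random> distributions.
class VanDerCorput {
public:
    using result_type = std::uint64_t;  // must be an unsigned integer type

    static constexpr result_type min() { return 0; }
    static constexpr result_type max() {
        return std::numeric_limits<result_type>::max();
    }

    // Returns the bit-reversed value of an incrementing counter (1, 2, 3, ...).
    result_type operator()() {
        result_type n = ++index_;
        result_type reversed = 0;
        for (int bit = 0; bit < 64; ++bit) {
            reversed = (reversed << 1) | (n & 1u);
            n >>= 1;
        }
        return reversed;
    }

private:
    result_type index_ = 0;
};
```

It then plugs into the question's code unchanged: `VanDerCorput gen; std::normal_distribution<> d(0.0, 1.0); double number = d(gen);`.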

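A word of caution here: I came across a big gotcha when I implemented this. If the generator's range [min(), max()] is only 32 bits wide, the distribution will resample. My (unsigned) 32-bit Sobol implementation was getting sampled twice per deviate, thus destroying the properties of the numbers. The reason is that a double needs 53 random bits: std::generate_canonical, which libstdc++'s real-valued distributions use to obtain their uniform numbers, calls the generator max(1, ceil(b / log2 R)) times, with b = 53 and R = max() - min() + 1, i.e. twice when R = 2^32. This code reproduces the problem: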
#include <random>
#include <limits>
#include <iostream>
#include <cstdint>
typedef uint32_t rng_int_t;
int requested = 0;
int sampled = 0;
struct Quasi
{
rng_int_t operator()()
{
++sampled;
return 0;
}
rng_int_t min() const
{
return 0;
}
rng_int_t max() const
{
return std::numeric_limits<rng_int_t>::max();
}
};
int main()
{
std::uniform_real_distribution<double> dist(0.0,1.0);
Quasi q;
double total = 0.0;
for (size_t i = 0; i < 10; ++i)
{
dist(q);
++requested;
}
std::cout << "requested: " << requested << std::endl;
std::cout << "sampled: " << sampled << std::endl;
}
Output (using g++ 5.4):
requested: 10
sampled: 20
The same happens when compiled with -m32. If you change rng_int_t to a 64-bit type, the problem goes away. My workaround is to put the 32-bit value into the most significant bits of a 64-bit return value, e.g.:
return uint64_t(val) << 32;
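The workaround can be packaged as an adapter. This is a sketch: `Widen32To64` and the counting source `Counting32` are hypothetical names, and `Source` is assumed to be any callable returning `std::uint32_t`. Because the adapter's range spans 64 bits, std::generate_canonical needs only one call per double. `max()` is set to the largest value the adapter can actually return, as Table 116 requires.

```cpp
#include <cstdint>
#include <limits>
#include <random>

// Lifts a 32-bit source into the top 32 bits of a 64-bit result so that
// real-valued distributions draw one source value per variate, not two.
// Assumption: Source has `std::uint32_t operator()()`.
template <class Source>
class Widen32To64 {
public:
    using result_type = std::uint64_t;

    explicit Widen32To64(Source& src) : src_(src) {}

    static constexpr result_type min() { return 0; }
    // Greatest value this adapter can return: all 32 source bits set, shifted up.
    static constexpr result_type max() {
        return static_cast<result_type>(std::numeric_limits<std::uint32_t>::max()) << 32;
    }

    result_type operator()() { return static_cast<result_type>(src_()) << 32; }

private:
    Source& src_;
};

// Hypothetical 32-bit source that only counts how often it is called.
struct Counting32 {
    int calls = 0;
    std::uint32_t operator()() {
        ++calls;
        return 0;
    }
};
```

With the reproduction above, wrapping `Quasi` as `Widen32To64<Quasi> g(q);` and calling `dist(g)` should then report `sampled` equal to `requested`.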

You can now generate Sobol sequences directly with Boost. See boost/random/sobol.hpp.

Related

In C++ and range-v3, how to convert a string of space-separated numbers to a vector of integers?

Using C++ and range-v3 library, what's the optimal approach to converting a string with space-separated numbers to a vector of integers?
I tried the following code:
#include <iostream>
#include <range/v3/all.hpp>
using namespace std::literals;
int main() {
auto r = "1 1 2 3 5 8 13"sv
| ranges::views::split(" "sv)
| ranges::views::transform([](auto &&i){ return std::stoi(std::string{i}); })
| ranges::to<std::vector<int>>();
for (auto i: r)
std::cout << "Value: " << i << std::endl;
}
It doesn't compile however. In clang, the error is as follows:
repro-range.cpp:10:60: error: no matching constructor for initialization of 'std::string' (aka 'basic_string<char>')
| ranges::view::transform([](auto &&i){ return std::stoi(std::string{i}); })
^ ~~~
It seems that the type of i is ranges::detail::split_outer_iterator and it's not convertible to string. Actually, I don't understand how to use i: I can't dereference it or convert it to anything useful. Replacing string_views with strings doesn't improve the situation either.
What's weird is that the code below works fine:
auto r = "1 1 2 3 5 8 13"sv
| ranges::views::split(" "sv)
| ranges::to<std::vector<std::string>>();
which suggests to me that the problem is neither split nor to, but the transform itself.
How can I make the first piece of code work?

If you have a string containing space separated numbers you can first create an std::istringstream over the string and then use ranges::istream to parse the numbers (assuming ints here):
auto s = "1 1 2 3 5 8 13";
auto ss = std::istringstream{s};
auto r = ranges::istream<int>(ss)
| ranges::to<std::vector<int>>();
Here's a demo.
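For comparison, the same parsing can be done without range-v3 at all. This is a sketch using only the standard library's std::istream_iterator in place of ranges::istream; `parse_ints` is a hypothetical helper name.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// operator>> skips whitespace and parses each int; the iterator range ends
// at end of input (or at the first token that is not an int).
std::vector<int> parse_ints(const std::string& text) {
    std::istringstream in(text);
    std::istream_iterator<int> first(in), last;
    return std::vector<int>(first, last);
}
```

For example, `parse_ints("1 1 2 3 5 8 13")` yields the vector {1, 1, 2, 3, 5, 8, 13}.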

Digging deeper, I found out that i in my example is neither an iterator nor a wrapper over string_view (as I expected) but a range of characters (a special type with begin and end iterators).
This means my code works if I first convert i to a string the range way:
auto r = "1 1 2 3 5 8 13"sv
| ranges::views::split(" "sv)
| ranges::views::transform([](auto &&i){
return std::stoi(i | ranges::to<std::string>());
})
| ranges::to<std::vector<int>>();
That said, I'd be delighted if somebody posted a nicer (or at least less verbose) way to do it.

Why is (n += 2 * i * i) faster than (n+= i) in C++?

This C++11 program takes on average between 7.42s and 7.79s to run.
#include <iostream>
#include <chrono>
using namespace std;
using c = chrono::system_clock;
using s = chrono::duration<double>;
void func(){
int n=0;
const auto before = c::now();
for(int i=0; i<2000000000; i++){
n += i;
}
const s duration = c::now() - before;
cout << duration.count();
}
If I replace n += i with n += 2 * i * i, it takes between 5.80s and 5.96s. How come?
I ran each version of the program 20 times, alternating between the two. Here are the results:
n += i | n += 2 * i * i
---------+----------------
7.77047 | 5.87978
7.69226 | 5.83551
7.77375 | 5.84888
7.73748 | 5.84629
7.72988 | 5.84356
7.69736 | 5.83784
7.72597 | 5.84246
7.72722 | 5.81678
7.73291 | 5.81237
7.71871 | 5.81016
7.7478 | 5.80119
7.64906 | 5.80058
7.7253 | 5.9078
7.42734 | 5.96399
7.72573 | 5.84733
7.65591 | 5.81793
7.76619 | 5.83116
7.76963 | 5.84424
7.79928 | 5.87078
7.79274 | 5.84689
I compiled it with GCC 9.1.1 20190503 (Red Hat 9.1.1-1), with no optimization level specified:
g++ -std=c++11
We know that the maximum int is about 2 billion, so 2 * i * i exceeds it once i reaches around 32,768. Can we say that the compiler predicts that the calculation will overflow?

https://godbolt.org/z/B3zIsv
You'll notice that with -O2, the code used to calculate 'n' is removed completely. So the real questions should be:
Why are you profiling code without -O2?
Why are you profiling code that has no observable side effects? ('n' can be removed completely - e.g. printing the value of 'n' at the end would be more useful here)
Why are you not profiling code in a profiler?
The timing results you have result from a deeply flawed methodology.
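A sketch of a less flawed setup along the lines of this answer (the function names are hypothetical). It accumulates into an unsigned 64-bit integer, since signed overflow of `int n`, which the question's loop triggers long before 2 billion iterations, is undefined behaviour; it prints the result so the work is observable; and it uses std::chrono::steady_clock, which is monotonic, unlike system_clock. Compile with -O2, and keep in mind that an optimizer may still replace a loop with a closed-form expression, so check the generated assembly before trusting any timing.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>

// The two loop bodies from the question. Unsigned arithmetic wraps
// modulo 2^64 instead of invoking undefined behaviour on overflow.
std::uint64_t sum_linear(std::uint64_t count) {
    std::uint64_t n = 0;
    for (std::uint64_t i = 0; i < count; ++i) n += i;
    return n;
}

std::uint64_t sum_quadratic(std::uint64_t count) {
    std::uint64_t n = 0;
    for (std::uint64_t i = 0; i < count; ++i) n += 2 * i * i;
    return n;
}

// Times one call with a monotonic clock and prints the result, so the loop
// has an observable effect and cannot simply be deleted.
template <class F>
void time_it(const char* label, F f, std::uint64_t count) {
    const auto before = std::chrono::steady_clock::now();
    const std::uint64_t result = f(count);
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - before;
    std::cout << label << ": n = " << result << ", " << elapsed.count() << " s\n";
}
```

Usage, e.g. in main(): `time_it("n += i", sum_linear, 2000000000u);` followed by the same call with `sum_quadratic`.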

Method for generating a random bitset of uniform distribution

How can I generate a bitset whose length is a multiple of 8 (corresponding to a standard data type) wherein each bit is 0 or 1 with equal probability?

The following works.
Choose a PRNG with good statistical properties
Seed it well
Generate integers over an inclusive range including the minimum and maximum of the integer type.
Since the integers are uniformly distributed across their entire range, each bit representation must be equally probable. Since all bit representations are present, each bit is equally likely to be on or off.
The following code accomplishes this:
#include <cstdint>
#include <iostream>
#include <random>
#include <algorithm>
#include <functional>
#include <bitset>
#include <iterator>
#include <limits>
//Generate the goodness
template<class T>
T uniform_bits(std::mt19937& g){
std::uniform_int_distribution<T> dist(std::numeric_limits<T>::lowest(),std::numeric_limits<T>::max());
return dist( g );
}
int main(){
//std::default_random_engine can be anything, including an engine with short
//periods and bad statistical properties. Rather than cross my fingers and pray
//that it'll somehow be okay, I'm going to rely on an engine whose strengths
//and weaknesses I know.
std::mt19937 engine;
//You'll see a lot of people write `engine.seed(std::random_device{}())`. This
//is bad. The Mersenne Twister has an internal state of 624 32-bit words
//(19937 bits, about 2.5 KB). A single call to std::random_device() will give
//us 4 bytes: woefully inadequate. The following method should be slightly
//better, though, sadly, std::random_device may still return deterministic,
//poorly distributed numbers.
std::uint_fast32_t seed_data[std::mt19937::state_size];
std::random_device r;
std::generate_n(seed_data, std::mt19937::state_size, std::ref(r));
std::seed_seq q(std::begin(seed_data), std::end(seed_data));
engine.seed(q);
//Use bitset to print the numbers for analysis
for(int i=0;i<50000;i++)
std::cout<<std::bitset<64>(uniform_bits<std::uint64_t>(engine))<<std::endl;
return 0;
}
We can test the output by compiling (g++ -O3 test.cpp) and doing some stats with:
./a.out | sed -E 's/(.)/ \1/g' | sed 's/^ //' | numsum -c | tr " " "\n" | awk '{print $1/25000}' | tr "\n" " "
The result is:
1.00368 1.00788 1.00416 1.0036 0.99224 1.00632 1.00532 0.99336 0.99768 0.99952 0.99424 1.00276 1.00272 0.99636 0.99728 0.99524 0.99464 0.99424 0.99644 1.0076 0.99548 0.99732 1.00348 1.00268 1.00656 0.99748 0.99404 0.99888 0.99832 0.99204 0.99832 1.00196 1.005 0.99796 1.00612 1.00112 0.997 0.99988 0.99396 0.9946 1.00032 0.99824 1.00196 1.00612 0.99372 1.00064 0.99848 1.00008 0.99848 0.9914 1.00008 1.00416 0.99716 1.00868 0.993 1.00468 0.99908 1.003 1.00384 1.00296 1.0034 0.99264 1 1.00036
Since all of the values are "close" to one, we conclude that our mission is accomplished.

Here is a nice function to achieve this:
#include <bitset>
#include <climits>
#include <cstddef>
#include <random>
#include <string>
template<typename T, std::size_t N = sizeof(T) * CHAR_BIT> // CHAR_BIT is 8 on most architectures
auto randomBitset() {
std::uniform_int_distribution<int> dis(0, 1);
std::mt19937 mt{ std::random_device{}() };
std::string values;
for (std::size_t i = 0; i < N; ++i)
values += static_cast<char>('0' + dis(mt));
return std::bitset<N>{ values };
}
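An alternative sketch that avoids both the string round-trip and re-seeding a fresh engine on every call: take an engine by reference and copy its output bits directly. `random_bitset` is a hypothetical name; it relies on each std::mt19937 call producing 32 uniform bits, which holds because that engine's range is exactly [0, 2^32 - 1].

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <random>

// Fills a std::bitset<N> from raw engine output, 32 bits per engine call.
template <std::size_t N>
std::bitset<N> random_bitset(std::mt19937& engine) {
    std::bitset<N> bits;
    std::uint32_t word = 0;
    for (std::size_t i = 0; i < N; ++i) {
        if (i % 32 == 0) word = static_cast<std::uint32_t>(engine());  // refill every 32 bits
        bits[i] = ((word >> (i % 32)) & 1u) != 0;
    }
    return bits;
}
```

Seed the engine once (for example with the std::seed_seq approach from the previous answer) and reuse it for every call.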

Fibonacci series in C++ : Control reaches end of non-void function

While practicing recursive functions, I have written this code for the Fibonacci series (and factorial). The program does not run and shows the error "Control reaches end of non-void function". I suspect this is about the last iteration reaching zero and not knowing what to do with negative integers. I have tried return 0 and return 1, but no good. Any suggestions?
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <ctime>
using namespace std;
int fib(int n) {
int x;
if(n<=1) {
cout << "zero reached \n";
x= 1;
} else {
x= fib(n-1)+fib(n-2);
return x;
}
}
int factorial(int n){
int x;
if (n==0){
x=1;
}
else {
x=(n*factorial(n-1));
return x;
}
}

Change
else if (n==1)
x=1;
to
else if (n==1)
return 1;
Then fib() should work for all non-negative numbers. If you want to simplify it and have it work for all numbers, go with something like:
int fib(int n) {
if(n<=1) {
cout << "zero reached \n";
return 1;
} else {
return fib(n-1)+fib(n-2);
}
}

"Control reaches end of non-void function"
This is a compile-time warning (which can be treated as an error with appropriate compiler flags). It means that you have declared your function as non-void (in this case, int) and yet there is a path through your function for which there is no return (in this case if (n == 1)).
One of the reasons that some programmers prefer to have exactly one return statement per function, at the very last line of the function...
return x;
}
...is that it is easy to see that their functions return appropriately. This can also be achieved by keeping functions very short.
You should also check your logic in your factorial() implementation, you have infinite recursion therein.
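For reference, a sketch of both functions from the question with a return statement on every path (a rewrite, not code taken from the answers here). The question's convention fib(0) == fib(1) == 1 is kept, the "zero reached" trace is dropped, and factorial's base case is widened to n <= 0 so negative input cannot recurse forever.

```cpp
// Every path returns a value, so the warning disappears.
int fib(int n) {
    if (n <= 1) {
        return 1;
    }
    return fib(n - 1) + fib(n - 2);
}

int factorial(int n) {
    if (n <= 0) {  // base case; also stops the recursion for negative n
        return 1;
    }
    return n * factorial(n - 1);
}
```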

Presumably the factorial function should be returning n * factorial(n-1) for n > 0.
x=(factorial(n)*factorial(n-1)) should read x = n * factorial(n-1)

In your second base case (n == 1), you never return x; (or return 1;).

The else section of your factorial() function starts:
x=(factorial(n)*factorial(n-1));
This leads to infinite recursion. It should be:
x=(n*factorial(n-1));

Sometimes, your compiler is not able to deduce that your function actually has no missing return. In such cases, several solutions exist:
Assume
if (foo == 0) {
return bar;
} else {
return frob;
}
Restructure your code
if (foo == 0) {
return bar;
}
return frob;
This works well if you can interpret the if-statement as a kind of firewall or precondition.
abort()
(see David Rodríguez's comment.)
if (foo == 0) {
return bar;
} else {
return frob;
}
abort(); return -1; // unreachable
Return something else accordingly. The comment tells fellow programmers and yourself why this is there.
throw
#include <stdexcept>
if (foo == 0) {
return bar;
} else {
return frob;
}
throw std::runtime_error ("impossible");
Not a counter measure: Single Function Exit Point
Do not fall back to one-return-per-function a.k.a. single-function-exit-point as a workaround. This is obsolete in C++ because you almost never know where the function will exit:
void foo(int&);
int bar () {
int ret = -1;
foo (ret);
return ret;
}
Looks nice and looks like SFEP, but reverse engineering the 3rd party proprietary libfoo reveals:
void foo (int &) {
if (rand()%2) throw ":P";
}
Also, this can imply an unnecessary performance loss:
Frob bar ()
{
Frob ret;
if (...) ret = ...;
...
if (...) ret = ...;
else if (...) ret = ...;
return ret;
}
because:
class Frob { char c[1024]; }; // copy lots of data upon copy
And every mutable variable increases the cyclomatic complexity of your code. It means more code and more state to test and verify, which in turn means more state the maintainer has to hold in their head, which in turn means lower-quality maintenance.
Last but not least: some classes have no default constructor, and you would have to write really bogus code, if it is possible at all:
File mogrify() {
File f ("/dev/random"); // need bogus init
...
}
Do not do this.

There is nothing wrong with if-else statements. C++ code using them looks similar to other languages. To emphasize the expressiveness of C++, one could write factorial (as an example) like this:
int factorial(int n){return (n > 1) ? n * factorial(n - 1) : 1;}
This illustration, using the "truly" C/C++ conditional operator ?:, and the other suggestions above lack production strength. Measures would be needed against overflowing the type holding the result (int or unsigned int) and, for recursive solutions, against overflowing the call stack. The maximum n for which the factorial fits can be computed in advance and used to guard against bad inputs; however, this check could also be done at another level, wherever n is passed to factorial. The version above returns 1 for any negative n. Using unsigned int would rule out negative inputs, but it would not prevent a caller from converting a negative value to unsigned. Thus, measures against negative inputs might be desirable too.
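A sketch of the protections described above, under assumptions of my own: the result type is std::uint64_t, the hypothetical `factorial_checked` loops instead of recursing (so the call stack cannot overflow), and bad inputs raise exceptions rather than returning 1. The bound 20 follows from 20! = 2432902008176640000 being the largest factorial that fits in 64 unsigned bits.

```cpp
#include <cstdint>
#include <stdexcept>

// Iterative factorial with explicit input checks.
std::uint64_t factorial_checked(int n) {
    if (n < 0) {
        throw std::domain_error("factorial of a negative number");
    }
    if (n > 20) {
        throw std::overflow_error("n! does not fit in 64 bits for n > 20");
    }
    std::uint64_t result = 1;
    for (int k = 2; k <= n; ++k) {
        result *= static_cast<std::uint64_t>(k);
    }
    return result;
}
```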

While the author is asking what is technically wrong with the recursive function computing the Fibonacci numbers, I would like to point out that the recursive function outlined above solves the task in time that grows exponentially with n. I do not want to discourage perfecting the C++ of it. However, it is known that the task can be computed faster. This matters less for small n. You will need to refresh your knowledge of matrix multiplication to follow the explanation. Consider evaluating powers of the matrix:
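$$M^1 = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$$
$$M^2 = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$$
$$M^3 = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 2 \\ 2 & 1 \end{pmatrix}$$
$$M^4 = \begin{pmatrix} 3 & 2 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 5 & 3 \\ 3 & 2 \end{pmatrix}$$
Do you see that the matrix elements of the result resemble the Fibonacci numbers? Continue:
$$M^5 = \begin{pmatrix} 5 & 3 \\ 3 & 2 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 8 & 5 \\ 5 & 3 \end{pmatrix}$$
Your guess is right (it can be proved by mathematical induction; try it, or just use it):
$$M^n = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n = \begin{pmatrix} F(n+1) & F(n) \\ F(n) & F(n-1) \end{pmatrix}$$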
When multiplying the matrices, use at least the so-called "exponentiation by squaring". As a reminder:
$$M^n = \begin{cases} M \, (M^2)^{(n-1)/2} & \text{if } n \text{ is odd } (n \bmod 2 \neq 0) \\ (M^2)^{n/2} & \text{if } n \text{ is even } (n \bmod 2 = 0) \end{cases}$$
Without this, your implementation will have the time properties of an iterative procedure, O(n) matrix multiplications (which is still better than exponential growth); with it, O(log n) multiplications suffice. Add your lovely C++ and it will give a decent result. However, since there is no limit to perfection, this too can be improved. In particular, there is a so-called "fast doubling" method for the Fibonacci numbers. It has the same asymptotic complexity but a better constant factor. When one says O(N) or O(N^2), the actual constant coefficients determine further differences: one O(N) algorithm can still be faster than another O(N) algorithm.
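A sketch of the matrix method in C++, assuming 64-bit unsigned arithmetic. `Mat2`, `multiply` and `fib_matrix` are illustrative names, and the loop is the iterative (binary) form of exponentiation by squaring rather than the recursive formula. It uses the standard indexing F(0) = 0, F(1) = 1, which differs from the question's fib(0) = 1.

```cpp
#include <cstdint>

// 2x2 matrix [[a, b], [c, d]].
struct Mat2 {
    std::uint64_t a, b, c, d;
};

Mat2 multiply(const Mat2& x, const Mat2& y) {
    return {x.a * y.a + x.b * y.c, x.a * y.b + x.b * y.d,
            x.c * y.a + x.d * y.c, x.c * y.b + x.d * y.d};
}

// M^n = [[F(n+1), F(n)], [F(n), F(n-1)]] with M = [[1, 1], [1, 0]], so F(n)
// is the top-right entry. Uses O(log n) multiplications; values beyond F(93)
// wrap modulo 2^64.
std::uint64_t fib_matrix(unsigned n) {
    Mat2 result{1, 0, 0, 1};  // identity
    Mat2 base{1, 1, 1, 0};    // M
    while (n > 0) {
        if (n % 2 != 0) result = multiply(result, base);  // odd: take one factor
        base = multiply(base, base);                       // square
        n /= 2;
    }
    return result.b;
}
```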

How to let Boost::random and Matlab produce the same random numbers

To check my C++ code, I would like to be able to let Boost::Random and Matlab produce the same random numbers.
So for Boost I use the code:
boost::mt19937 var(static_cast<unsigned> (std::time(0)));
boost::uniform_int<> dist(1, 6);
boost::variate_generator<boost::mt19937&, boost::uniform_int<> > die(var, dist);
die.engine().seed(0);
for(int i = 0; i < 10; ++i) {
std::cout << die() << " ";
}
std::cout << std::endl;
Which produces (every run of the program):
4 4 5 6 4 6 4 6 3 4
And for matlab I use:
RandStream.setDefaultStream(RandStream('mt19937ar','seed',0));
randi(6,1,10)
Which produces (every run of the program):
5 6 1 6 4 1 2 4 6 6
This is bizarre, since both use the same algorithm and the same seed.
What am I missing?
It seems that Python (using numpy) and Matlab agree on the uniform random numbers:
Matlab
RandStream.setDefaultStream(RandStream('mt19937ar','seed',203));rand(1,10)
0.8479 0.1889 0.4506 0.6253 0.9697 0.2078 0.5944 0.9115 0.2457 0.7743
Python:
random.seed(203);random.random(10)
array([ 0.84790006, 0.18893843, 0.45060688, 0.62534723, 0.96974765,
0.20780668, 0.59444858, 0.91145688, 0.24568615, 0.77430378])
C++Boost
0.8479 0.667228 0.188938 0.715892 0.450607 0.0790326 0.625347 0.972369 0.969748 0.858771
Every other value here is identical to the Python and Matlab values...

I have to agree with the other answers stating that these generators are not "absolute": they may produce different results depending on the implementation. I think the simplest solution would be to implement your own generator. It might look daunting (the Mersenne Twister sure is, by the way), but take a look at Xorshift, an extremely simple yet powerful one. Here is the C implementation given on Wikipedia:
uint32_t xor128(void) {
static uint32_t x = 123456789;
static uint32_t y = 362436069;
static uint32_t z = 521288629;
static uint32_t w = 88675123;
uint32_t t;
t = x ^ (x << 11);
x = y; y = z; z = w;
return w = w ^ (w >> 19) ^ (t ^ (t >> 8));
}
To use a different seed, just put any values you want in x, y, z, w, except (0,0,0,0): an all-zero state only ever produces zeros. You just need to be sure that both Matlab and C++ use 32-bit unsigned integers for these values.
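A C++ sketch of the same generator with its state in a struct instead of function-local statics, so the four seed words can be set explicitly and matched on the Matlab side. `Xor128` is a hypothetical name; the algorithm and default constants are those of the C code above.

```cpp
#include <cstdint>

// xor128 with explicit, copyable state. The state must not be all zeros.
struct Xor128 {
    std::uint32_t x = 123456789;
    std::uint32_t y = 362436069;
    std::uint32_t z = 521288629;
    std::uint32_t w = 88675123;

    std::uint32_t next() {
        std::uint32_t t = x ^ (x << 11);
        x = y;
        y = z;
        z = w;
        w = w ^ (w >> 19) ^ (t ^ (t >> 8));
        return w;
    }
};
```

With the default state, the first value returned by `next()` is 3701687786.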

Using the interface like
randi(6,1,10)
will apply some kind of transformation on the raw result of the random generator. This transformation is not trivial in general and Matlab will almost certainly do a different selection step than Boost.
Try comparing the raw data streams from the RNGs; chances are they are the same.

In case this helps anyone interested in the question:
In order to get the same behavior for the Twister algorithm:
Download the file
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c
Try the following:
#include <stdint.h>
// mt19937ar.c content..
int main(void)
{
int i;
uint32_t seed = 100;
init_genrand(seed);
for (i = 0; i < 100; ++i)
printf("%.20f\n",genrand_res53());
return 0;
}
Make sure the same values are generated within matlab:
RandStream.setGlobalStream( RandStream.create('mt19937ar','seed',100) );
rand(100,1)
randi() seems to be simply ceil( rand()*maxval )
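In C++11 the same stream can be obtained without the C file, as a sketch: std::mt19937 is seeded by the same recurrence as init_genrand(), so combining two of its outputs exactly as genrand_res53() does should give the same doubles as the C program for the same seed. `randi_like` is hypothetical and merely encodes the ceil(rand()*maxval) observation above; it is not a documented MathWorks algorithm.

```cpp
#include <cmath>
#include <cstdint>
#include <random>

// Same construction as genrand_res53() in mt19937ar.c: two 32-bit outputs
// combined into one double with 53 random bits in [0, 1).
double genrand_res53(std::mt19937& gen) {
    const std::uint32_t a = static_cast<std::uint32_t>(gen()) >> 5;  // top 27 bits
    const std::uint32_t b = static_cast<std::uint32_t>(gen()) >> 6;  // top 26 bits
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0);
}

// Assumption based on the observation above: randi(imax) ~ ceil(rand() * imax).
// Returns 0 in the (probability 2^-53) case that genrand_res53 yields exactly 0.
int randi_like(std::mt19937& gen, int imax) {
    return static_cast<int>(std::ceil(genrand_res53(gen) * imax));
}
```

With a default-constructed std::mt19937 (seed 5489), the first value of `genrand_res53` is about 0.8147.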

Thanks to Fezvez's answer I've written xor128 in matlab:
function [ w, state ] = xor128( state )
%XOR128 implementation of Xorshift
% https://en.wikipedia.org/wiki/Xorshift
% A starting state might be [123456789, 362436069, 521288629, 88675123]
x = state(1);
y = state(2);
z = state(3);
w = state(4);
% t1 = (x << 11)
t1 = bitand(bitshift(x,11),hex2dec('ffffffff'));
% t = x ^ (x << 11)
t = bitxor(x,t1);
x = y;
y = z;
z = w;
% t2 = (t ^ (t >> 8))
t2 = bitxor(t, bitshift(t,-8));
% t3 = w ^ (w >> 19)
t3 = bitxor(w, bitshift(w,-19));
% w = w ^ (w >> 19) ^ (t ^ (t >> 8))
w = bitxor(t3, t2);
state = [x y z w];
end
You need to pass state in to xor128 every time you use it. I've written a "tester" function which simply returns a vector of random numbers. I tested 1000 numbers output by this function against values output by the C++ version compiled with gcc, and they match perfectly.
function [ v ] = txor( iterations )
%TXOR test xor128, returns vector v of length iterations with random number
% output from xor128
% output
v = zeros(iterations,1);
state = [123456789, 362436069, 521288629, 88675123];
i = 1;
while i <= iterations
disp(i);
[t,state] = xor128(state);
v(i) = t;
i = i + 1;
end

I would be very careful about assuming that two different implementations of pseudo-random generators (even though based on the same algorithm) produce the same results. One of the implementations may use some sort of tweak and hence produce different results. If you need two equal "random" distributions, I suggest you either precalculate a sequence, store it, and read it from both C++ and Matlab, or create your own generator. It should be fairly easy to implement MT19937 if you use the pseudocode on Wikipedia.
Take care to ensure that both your Matlab and C++ code run on the same architecture (that is, both on either 32-bit or 64-bit): using a 64-bit integer in one implementation and a 32-bit integer in the other will lead to different results.
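Following the suggestion to implement MT19937 yourself, here is a compact sketch of the reference algorithm (seeding as in init_genrand(), the twist, and tempering), using fixed-width 32-bit unsigned arithmetic throughout. The class name `MT19937` is hypothetical. The C++ standard specifies that the 10000th output of a default-seeded (5489) mt19937 is 4123659995, which gives a convenient check for a port in any language.

```cpp
#include <array>
#include <cstdint>

// Minimal 32-bit Mersenne Twister following the published reference algorithm.
class MT19937 {
public:
    explicit MT19937(std::uint32_t seed = 5489u) {
        mt_[0] = seed;
        for (std::uint32_t i = 1; i < N; ++i)  // same recurrence as init_genrand()
            mt_[i] = 1812433253u * (mt_[i - 1] ^ (mt_[i - 1] >> 30)) + i;
        index_ = N;  // force a twist on the first call
    }

    std::uint32_t operator()() {
        if (index_ >= N) twist();
        std::uint32_t y = mt_[index_++];
        // Tempering
        y ^= y >> 11;
        y ^= (y << 7) & 0x9D2C5680u;
        y ^= (y << 15) & 0xEFC60000u;
        y ^= y >> 18;
        return y;
    }

private:
    static constexpr std::uint32_t N = 624;
    static constexpr std::uint32_t M = 397;
    std::array<std::uint32_t, N> mt_{};
    std::uint32_t index_ = N;

    // Regenerates all 624 state words in place.
    void twist() {
        for (std::uint32_t i = 0; i < N; ++i) {
            std::uint32_t y = (mt_[i] & 0x80000000u) | (mt_[(i + 1) % N] & 0x7FFFFFFFu);
            mt_[i] = mt_[(i + M) % N] ^ (y >> 1) ^ ((y & 1u) ? 0x9908B0DFu : 0u);
        }
        index_ = 0;
    }
};
```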

We Keep Coding
