Method for generating a random bitset of uniform distribution - c++

How can I generate a bitset whose length is a multiple of 8 (corresponding to a standard data type) wherein each bit is 0 or 1 with equal probability?

The following works:
Choose a PRNG with good statistical properties.
Seed it well.
Generate integers over the inclusive range from the minimum to the maximum of the integer type.
Since the integers are uniformly distributed across their entire range, every bit pattern must be equally probable, and since all bit patterns are present, each bit is equally likely to be on or off.
The following code accomplishes this:
#include <algorithm>
#include <bitset>
#include <cstdint>
#include <functional>
#include <iostream>
#include <limits>    //for std::numeric_limits
#include <random>

//Generate the goodness
template<class T>
T uniform_bits(std::mt19937& g){
  std::uniform_int_distribution<T> dist(std::numeric_limits<T>::lowest(), std::numeric_limits<T>::max());
  return dist(g);
}

int main(){
  //std::default_random_engine can be anything, including an engine with short
  //periods and bad statistical properties. Rather than cross my fingers and pray
  //that it'll somehow be okay, I'm going to rely on an engine whose strengths
  //and weaknesses I know.
  std::mt19937 engine;

  //You'll see a lot of people write `engine.seed(std::random_device{}())`. This
  //is bad. The Mersenne Twister has an internal state of 624 32-bit words. A
  //single call to std::random_device will give us 4 bytes: woefully inadequate.
  //The following method should be better, though, sadly, std::random_device may
  //still return deterministic, poorly distributed numbers.
  std::uint_fast32_t seed_data[std::mt19937::state_size];
  std::random_device r;
  std::generate_n(seed_data, std::mt19937::state_size, std::ref(r));
  std::seed_seq q(std::begin(seed_data), std::end(seed_data));
  engine.seed(q);

  //Use bitset to print the numbers for analysis
  for(int i=0;i<50000;i++)
    std::cout<<std::bitset<64>(uniform_bits<std::uint64_t>(engine))<<std::endl;

  return 0;
}
We can test the output by compiling (g++ -O3 test.cpp) and doing some stats with:
./a.out | sed -E 's/(.)/ \1/g' | sed 's/^ //' | numsum -c | tr " " "\n" | awk '{print $1/25000}' | tr "\n" " "
The result is:
1.00368 1.00788 1.00416 1.0036 0.99224 1.00632 1.00532 0.99336 0.99768 0.99952 0.99424 1.00276 1.00272 0.99636 0.99728 0.99524 0.99464 0.99424 0.99644 1.0076 0.99548 0.99732 1.00348 1.00268 1.00656 0.99748 0.99404 0.99888 0.99832 0.99204 0.99832 1.00196 1.005 0.99796 1.00612 1.00112 0.997 0.99988 0.99396 0.9946 1.00032 0.99824 1.00196 1.00612 0.99372 1.00064 0.99848 1.00008 0.99848 0.9914 1.00008 1.00416 0.99716 1.00868 0.993 1.00468 0.99908 1.003 1.00384 1.00296 1.0034 0.99264 1 1.00036
Each value is a bit position's observed count of ones divided by the expected 25,000. Since all of the values are close to one, we conclude that our mission is accomplished.
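If numsum isn't available, a rough in-process equivalent of the same per-bit frequency check (a sketch, with the seeding abbreviated for brevity):

#include <array>
#include <cstdint>
#include <iostream>
#include <limits>
#include <random>

int main(){
  std::mt19937 engine{std::random_device{}()};
  std::uniform_int_distribution<std::uint64_t> dist(0, std::numeric_limits<std::uint64_t>::max());
  std::array<long, 64> counts{};
  const long N = 50000;
  for (long i = 0; i < N; i++) {
    const std::uint64_t v = dist(engine);
    for (int b = 0; b < 64; b++)
      counts[b] += (v >> b) & 1;    //tally how often each bit position is set
  }
  for (int b = 0; b < 64; b++)
    std::cout << counts[b] / (N / 2.0) << ' ';  //each value should be close to 1
  std::cout << '\n';
  return 0;
}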

Here is a nice function to achieve this:

#include <bitset>
#include <climits>  //for CHAR_BIT, which is 8 on most architectures
#include <cstddef>
#include <random>
#include <string>

template<typename T, std::size_t N = sizeof(T) * CHAR_BIT>
auto randomBitset() {
  //static so the engine is seeded from std::random_device only once
  static std::mt19937 mt{ std::random_device{}() };
  std::uniform_int_distribution<int> dis(0, 1);
  std::string values;
  for (std::size_t i = 0; i < N; ++i)
    values += static_cast<char>('0' + dis(mt));
  return std::bitset<N>{ values };
}
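A minimal usage sketch (assuming randomBitset from above is in scope):

#include <cstdint>
#include <iostream>

int main() {
  //Print a few random 64-bit bitsets
  for (int i = 0; i < 3; ++i)
    std::cout << randomBitset<std::uint64_t>() << '\n';
  return 0;
}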

Related

Change value of randomly generated number upon second compilation

I applied a random number generator to my code, but the first number generated doesn't change when I run the code a second or third time. The other numbers change; the issue is only with the first value. I'm using Code::Blocks with the Cygwin GCC compiler (C++17), seeding using time.
#include <iostream>
#include <random>
#include <ctime>

int main()
{
    std::default_random_engine randomGenerator(time(0));
    std::uniform_int_distribution randomNumber(1, 20);

    int a, b, c;
    a = randomNumber(randomGenerator);
    b = randomNumber(randomGenerator);
    c = randomNumber(randomGenerator);

    std::cout<<a<<std::endl;
    std::cout<<b<<std::endl;
    std::cout<<c<<std::endl;
    return 0;
}
In such a case, when I run the code the first time it may produce a result like a = 4, b = 5, c = 9. On the second and further runs, a remains 4 but b and c keep changing.
Per my comment, std::mt19937 is the main PRNG you should consider; it's the best one provided in <random>. You should also seed it better. Here I use std::random_device. (The symptom you saw is likely because std::default_random_engine is typically a linear congruential generator, and seeds that differ only slightly, such as consecutive time(0) values, yield first outputs so close together that the (1, 20) distribution maps them to the same value.)
Some people will moan about how std::random_device falls back to a deterministic seed when a source of true random data can't be found, but that's pretty rare outside of low-level embedded contexts.
#include <iostream>
#include <random>

int main() {
    std::mt19937 randomGenerator(std::random_device{}());
    std::uniform_int_distribution randomNumber(1, 20);

    for (int i = 0; i < 3; ++i) {
        std::cout << randomNumber(randomGenerator) << ' ';
    }
    std::cout << '\n';
    return 0;
}
Output:
~/tmp
❯ ./a.out
8 2 16
~/tmp
❯ ./a.out
7 12 14
~/tmp
❯ ./a.out
8 12 4
~/tmp
❯ ./a.out
18 8 7
Here we see four runs that are pretty distinct. Because your range is small, you'll see patterns pop up every now and again. There are other areas of improvement, notably providing a more robust seed to the PRNG, but for a toy program, this suffices.
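A more robust seed, along the lines of the seed_seq approach from the first section (a sketch):

#include <algorithm>
#include <cstdint>
#include <functional>
#include <iterator>
#include <random>

//Fill the Mersenne Twister's entire state from std::random_device
//instead of a single 32-bit word.
std::mt19937 make_seeded_engine() {
    std::random_device rd;
    std::uint_fast32_t data[std::mt19937::state_size];
    std::generate_n(data, std::mt19937::state_size, std::ref(rd));
    std::seed_seq seq(std::begin(data), std::end(data));
    return std::mt19937(seq);
}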

My vectorized xorshift+ is not very random

I have the following code (the xorshift128+ code from Wikipedia modified to use vector types):
#include <immintrin.h>
#include <climits>

__v8si rand_si() {
    static auto s0 = __v4du{4, 8, 15, 16},
                s1 = __v4du{23, 34, 42, 69};
    auto x = s0, y = s1;
    s0 = y;
    x ^= x << 23;
    s1 = x ^ y ^ (x >> 17) ^ (y >> 26);
    return (__v8si)(s1 + y);
}
#include <iostream>
#include <iomanip>

void foo() {
    //Shuffle a bit. The result is much worse without this.
    rand_si(); rand_si(); rand_si(); rand_si();
    auto val = rand_si();
    for (auto it = reinterpret_cast<int*>(&val);
         it != reinterpret_cast<int*>(&val + 1);
         ++it)
        std::cout << std::hex << std::setfill('0') << std::setw(8) << *it << ' ';
    std::cout << '\n';
}
which outputs
09e2a657 000b8020 1504cc3b 00110040 1360ff2b 00150078 2a9998b7 00228080
Every other number is very small, and none have the leading bit set. On the other hand, using xorshift* instead:
__v8si rand_si() {
    static auto x = __v4du{4, 8, 15, 16};
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    return x * (__v4du)_mm256_set1_epi64x(0x2545F4914F6CDD1D);
}
I get the much better output
0889632e a938b990 1e8b2f79 832e26bd 11280868 2a22d676 275ca4b8 10954ef9
But according to Wikipedia, xorshift+ is a good PRNG, and produces better pseudo-randomness than xorshift*. So, do I have a bug in my RNG code, or am I using it wrong?
I think you should not judge a random generator by looking at just 8 numbers it generated. Furthermore, generators usually need good seeding. Your seeding can be considered bad: your seeds start with almost all bits zero, and calling rand_si() just a few times is not enough for the bits to "spread".
So I recommend using proper seeding (for example, a simple solution is to call rand_si() even more times).
xorshift* appears to behave better because of its final multiplication, which masks the easily spotted artifacts of inadequate seeding.
A tip: compare the numbers your code generates with the original implementation. This way you can be sure that your implementation is correct.
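For reference, here is the scalar xorshift128+ step, using the same shift constants (23, 17, 26) as the vectorized code above; a minimal sketch to compare one lane's output against:

#include <cstdint>

static uint64_t s[2] = {4, 23};  //one lane of the question's seeds

uint64_t xorshift128plus() {
    uint64_t x = s[0];
    const uint64_t y = s[1];
    s[0] = y;
    x ^= x << 23;
    s[1] = x ^ y ^ (x >> 17) ^ (y >> 26);
    return s[1] + y;
}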
geza's answer was exactly right: the seeding was the culprit. It worked much better to seed the generator using a standard 64-bit PRNG:

void seed(uint64_t s) {
    std::mt19937_64 e(s);
    s0 = __v4du{e(), e(), e(), e()};
    s1 = __v4du{e(), e(), e(), e()};
}
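Putting the pieces together, a sketch of the fixed generator (s0 and s1 move to file scope so seed() can reach them; assumes GCC vector extensions and compilation with -mavx2):

#include <immintrin.h>
#include <cstdint>
#include <random>

static __v4du s0, s1;  //file-scope state shared by seed() and rand_si()

void seed(std::uint64_t s) {
    std::mt19937_64 e(s);
    s0 = __v4du{e(), e(), e(), e()};
    s1 = __v4du{e(), e(), e(), e()};
}

__v8si rand_si() {
    auto x = s0, y = s1;
    s0 = y;
    x ^= x << 23;
    s1 = x ^ y ^ (x >> 17) ^ (y >> 26);
    return (__v8si)(s1 + y);
}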
The two answers above are mistaken. The xorshift+ generator works fine even when the initial seed is 1, 2, 3, or other simple values; it fails only on an all-zero seed. Check the representation of your 64-bit variables and that the binary operators are working correctly.

Why is dividing slower than bitshifting in C++?

I wrote two pieces of code, one that divides a random number by two, and one that bitshifts the same random number right once. As I understand it, this should produce the same result. However, when I time both pieces of code, I consistently get data saying that the shifting is faster. Why is that?
Shifting code:
double iterations = atoi(argv[1]) * 1000;
int result = 0;
cout << "Doing " << iterations << " iterations." << endl;
srand(31459);
for (int i = 0; i < iterations; i++) {
    if (i % 2 == 0) {
        result = result + (rand() >> 1);
    } else {
        result = result - (rand() >> 1);
    }
}
Dividing code:
double iterations = atoi(argv[1]) * 1000;
int result = 0;
cout << "Doing " << iterations << " iterations." << endl;
srand(31459);
for(int i=0;i<iterations;i++){
if(i % 2 == 0){
result = result + (rand() / 2);
}else{
result = result - (rand() / 2);
}
}
Timing and results:
$ time ./divide 1000000; time ./shift 1000000
Doing 1e+09 iterations.
real 0m12.291s
user 0m12.260s
sys 0m0.021s
Doing 1e+09 iterations.
real 0m12.091s
user 0m12.056s
sys 0m0.019s
$ time ./shift 1000000; time ./divide 1000000
Doing 1e+09 iterations.
real 0m12.083s
user 0m12.028s
sys 0m0.035s
Doing 1e+09 iterations.
real 0m12.198s
user 0m12.158s
sys 0m0.028s
Additional information:
I am not using any optimizations when compiling.
I am running this on a virtualized install of Fedora 20, kernel 3.12.10-300.fc20.x86_64.
It's not the language; division is slower on the architecture you're running on, and it's almost always slower, because the hardware behind bit shifting is trivial while division is a bit of a nightmare. In base 10, what's easier for you, 78358582354 >> 3 or 78358582354 / 85? Instructions generally take the same time to execute regardless of input, and in your case it's the compiler's job to convert /2 to >>1; the CPU just does as it's told.
It isn't actually slower. I've run your benchmark using nonius like so:
#define NONIUS_RUNNER
#include "Nonius.h++"

#include <type_traits>
#include <random>
#include <vector>

NONIUS_BENCHMARK("Divide", [](nonius::chronometer meter)
{
    std::random_device rd;
    std::uniform_int_distribution<int> dist(0, 9);
    std::vector<int> storage(meter.runs());
    meter.measure([&](int i) { storage[i] = storage[i] % 2 == 0 ? storage[i] - (dist(rd) >> 1) : storage[i] + (dist(rd) >> 1); });
})

NONIUS_BENCHMARK("std::string destruction", [](nonius::chronometer meter)
{
    std::random_device rd;
    std::uniform_int_distribution<int> dist(0, 9);
    std::vector<int> storage(meter.runs());
    meter.measure([&](int i) { storage[i] = storage[i] % 2 == 0 ? storage[i] - (dist(rd) / 2) : storage[i] + (dist(rd) / 2); });
})
The results show both of them neck and neck.
(You can find the html output here)
P.S: It seems I forgot to rename the second test. My bad.
It seems that the difference in results is below the spread of the results, so you can't really tell whether they differ. But in general, division can't be done in a single operation while a bit shift can, so a bit shift should usually be faster.
But as you have the literal 2 in your code, I would guess that the compiler, even without optimizations, produces identical code.
Note that rand returns int, and dividing an int (signed by default) by 2 is not the same as shifting it right by 1: division truncates toward zero while an arithmetic shift rounds toward negative infinity, so the compiler must emit extra instructions for the signed division. You can easily check the generated asm and see the difference, or simply check the resulting binary size:
> g++ -O3 boo.cpp -c -o boo # divide
> g++ -O3 foo.cpp -c -o foo # shift
> ls -la foo boo
... 4016 ... boo # divide
... 3984 ... foo # shift
Now add the static_cast patch:

if (i % 2 == 0) {
    result = result + (static_cast<unsigned>(rand()) / 2);
}
else {
    result = result - (static_cast<unsigned>(rand()) / 2);
}
and check the size again:
> g++ -O3 boo.cpp -c -o boo # divide
> g++ -O3 foo.cpp -c -o foo # shift
> ls -la foo boo
... 3984 ... boo # divide
... 3984 ... foo # shift
To be sure, you can verify that the generated asm in both binaries is the same.
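The semantic difference that forces those extra instructions is easy to demonstrate; a minimal sketch:

#include <iostream>

int main() {
    int x = -3;
    //Signed division truncates toward zero...
    std::cout << (x / 2) << '\n';   //prints -1
    //...while an arithmetic right shift rounds toward negative infinity.
    //(Right-shifting a negative value is implementation-defined before
    //C++20, but is arithmetic on mainstream compilers.)
    std::cout << (x >> 1) << '\n';  //prints -2
    return 0;
}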

Running std::normal_distribution with user-defined random generator

I am about to generate an array of normally distributed pseudo-random numbers. As far as I know, the standard library offers the following code for that:
std::random_device rd;
std::mt19937 gen(rd());
std::normal_distribution<> d(mean,std);
...
double number = d(gen);
The problem is that I want to use a Sobol' quasi-random sequence instead of the Mersenne Twister pseudo-random generator. So, my question is:
Is it possible to run the std::normal_distribution with a user-defined random generator (with a Sobol' quasi-random sequence generator in my case)?
More details: I have a class called RandomGenerators, which is used to generate a Sobol' quasi-random numbers:
RandomGenerator randgen;
double number = randgen.sobol(0,1);
Yes, it is possible. Just make it comply with the requirements of a uniform random number generator (§26.5.1.3 paragraphs 2 and 3):
2 A class G satisfies the requirements of a uniform random number
generator if the expressions shown in Table 116 are valid and have the
indicated semantics, and if G also satisfies all other requirements
of this section. In that Table and throughout this section:
a) T is the type named by G's associated `result_type`, and
b) g is a value of G.
Table 116 — Uniform random number generator requirements
Expression | Return type | Pre/post-condition | Complexity
----------------------------------------------------------------------
G::result_type | T | T is an unsigned integer | compile-time
| | type (§3.9.1). |
----------------------------------------------------------------------
g() | T | Returns a value in the | amortized constant
| | closed interval |
| | [G::min(), G::max()]. |
----------------------------------------------------------------------
G::min() | T | Denotes the least value | compile-time
| | potentially returned by |
| | operator(). |
----------------------------------------------------------------------
G::max() | T | Denotes the greatest value | compile-time
| | potentially returned by |
| | operator(). |
3 The following relation shall hold: G::min() < G::max().
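A minimal sketch of a generator class satisfying these requirements (the internal source here is just a placeholder counter; in practice operator() would return the raw 64-bit output of your Sobol' sequence, e.g. from the RandomGenerator class in the question):

#include <cstdint>
#include <iostream>
#include <limits>
#include <random>

struct QuasiURNG {
    using result_type = std::uint64_t;
    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return std::numeric_limits<result_type>::max(); }
    result_type operator()() {
        //Placeholder source; substitute a raw 64-bit Sobol' draw here.
        return state_ += 0x9e3779b97f4a7c15ULL;
    }
    result_type state_ = 0;
};

int main() {
    QuasiURNG q;
    std::normal_distribution<double> d(0.0, 1.0);
    for (int i = 0; i < 5; ++i)
        std::cout << d(q) << '\n';  //the distribution draws from our generator
    return 0;
}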
A word of caution here: I came across a big gotcha when I implemented this. It seems that if the return types of max()/min()/operator() are not 64-bit, then the distribution will resample. My (unsigned) 32-bit Sobol implementation was getting sampled twice per deviate, destroying the properties of the numbers. This code reproduces the issue:
#include <random>
#include <limits>
#include <iostream>
#include <cstdint>

typedef uint32_t rng_int_t;

int requested = 0;
int sampled = 0;

struct Quasi
{
    rng_int_t operator()()
    {
        ++sampled;
        return 0;
    }
    rng_int_t min() const
    {
        return 0;
    }
    rng_int_t max() const
    {
        return std::numeric_limits<rng_int_t>::max();
    }
};

int main()
{
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    Quasi q;
    double total = 0.0;
    for (size_t i = 0; i < 10; ++i)
    {
        dist(q);
        ++requested;
    }
    std::cout << "requested: " << requested << std::endl;
    std::cout << "sampled:   " << sampled << std::endl;
}
Output (using g++ 5.4):
requested: 10
sampled: 20
and even when compiled with -m32. If you change rng_int_t to a 64-bit type, the problem goes away. My workaround is to stick the 32-bit value into the most significant bits of the return value, e.g.
return uint64_t(val) << 32;
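Applied to the Quasi struct above, the workaround might look like this (a sketch; next32() is a hypothetical stand-in for the real 32-bit Sobol' draw):

#include <cstdint>
#include <limits>

struct Quasi64
{
    uint32_t next32() { return 0; }  //stub standing in for the 32-bit Sobol' source
    uint64_t operator()()
    {
        //Put the 32-bit value in the high bits of a 64-bit result so
        //uniform_real_distribution draws only once per deviate.
        return uint64_t(next32()) << 32;
    }
    uint64_t min() const { return 0; }
    uint64_t max() const { return std::numeric_limits<uint64_t>::max(); }
};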
You can now generate Sobol sequences directly with Boost. See boost/random/sobol.hpp.

How to let Boost::random and Matlab produce the same random numbers

To check my C++ code, I would like to be able to let Boost::Random and Matlab produce the same random numbers.
So for Boost I use the code:
boost::mt19937 var(static_cast<unsigned>(std::time(0)));
boost::uniform_int<> dist(1, 6);
boost::variate_generator<boost::mt19937&, boost::uniform_int<> > die(var, dist);
die.engine().seed(0);
for (int i = 0; i < 10; ++i) {
    std::cout << die() << " ";
}
std::cout << std::endl;
Which produces (every run of the program):
4 4 5 6 4 6 4 6 3 4
And for matlab I use:
RandStream.setDefaultStream(RandStream('mt19937ar','seed',0));
randi(6,1,10)
Which produces (every run of the program):
5 6 1 6 4 1 2 4 6 6
This is bizarre, since both use the same algorithm and the same seed.
What am I missing?
It seems that Python (using numpy) and Matlab produce comparable uniform random numbers:
Matlab
RandStream.setDefaultStream(RandStream('mt19937ar','seed',203));rand(1,10)
0.8479 0.1889 0.4506 0.6253 0.9697 0.2078 0.5944 0.9115 0.2457 0.7743
Python:
random.seed(203);random.random(10)
array([ 0.84790006, 0.18893843, 0.45060688, 0.62534723, 0.96974765,
0.20780668, 0.59444858, 0.91145688, 0.24568615, 0.77430378])
C++ Boost:
0.8479 0.667228 0.188938 0.715892 0.450607 0.0790326 0.625347 0.972369 0.969748 0.858771
which matches every other Python and Matlab value (likely because Matlab's mt19937ar consumes two 32-bit draws per double in genrand_res53, while Boost consumes one, so the shared values appear at every other position in the Boost stream).
I have to agree with the other answers stating that these generators are not "absolute": they may produce different results depending on the implementation. I think the simplest solution would be to implement your own generator. It might look daunting (the Mersenne Twister sure is, by the way), but take a look at Xorshift, an extremely simple yet powerful generator. I copy the C implementation given in the Wikipedia link:
#include <stdint.h>

uint32_t xor128(void) {
    static uint32_t x = 123456789;
    static uint32_t y = 362436069;
    static uint32_t z = 521288629;
    static uint32_t w = 88675123;
    uint32_t t;
    t = x ^ (x << 11);
    x = y; y = z; z = w;
    return w = w ^ (w >> 19) ^ (t ^ (t >> 8));
}
To have the same seed, just put any values you want into x, y, z, w (except (0,0,0,0), I believe). You just need to be sure that Matlab and C++ both use 32-bit unsigned integers for these variables.
Using the interface like
randi(6,1,10)
will apply some kind of transformation on the raw result of the random generator. This transformation is not trivial in general and Matlab will almost certainly do a different selection step than Boost.
Try comparing the raw data streams from the RNGs; chances are they are the same.
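For example, a sketch that dumps Boost's raw 32-bit stream for such a comparison:

#include <boost/random/mersenne_twister.hpp>
#include <iostream>

int main() {
    boost::mt19937 gen(0);  //same seed you would give the Matlab stream
    for (int i = 0; i < 10; ++i)
        std::cout << gen() << ' ';  //raw 32-bit outputs, no distribution applied
    std::cout << std::endl;
    return 0;
}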
In case this helps anyone interested in the question:
In order to get the same behavior for the Twister algorithm:
Download the file
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c
Try the following:
#include <stdint.h>
// mt19937ar.c content..

int main(void)
{
    int i;
    uint32_t seed = 100;
    init_genrand(seed);
    for (i = 0; i < 100; ++i)
        printf("%.20f\n", genrand_res53());
    return 0;
}
Make sure the same values are generated within matlab:
RandStream.setGlobalStream( RandStream.create('mt19937ar','seed',100) );
rand(100,1)
randi() seems to be simply ceil( rand()*maxval )
Thanks to Fezvez's answer I've written xor128 in matlab:
function [ w, state ] = xor128( state )
%XOR128 implementation of Xorshift
%   https://en.wikipedia.org/wiki/Xorshift
%   A starting state might be [123456789, 362436069, 521288629, 88675123]
    x = state(1);
    y = state(2);
    z = state(3);
    w = state(4);

    % t1 = (x << 11), truncated to 32 bits
    t1 = bitand(bitshift(x, 11), hex2dec('ffffffff'));
    % t = x ^ (x << 11)
    t = bitxor(x, t1);

    x = y;
    y = z;
    z = w;

    % t2 = (t ^ (t >> 8))
    t2 = bitxor(t, bitshift(t, -8));
    % t3 = w ^ (w >> 19)
    t3 = bitxor(w, bitshift(w, -19));
    % w = w ^ (w >> 19) ^ (t ^ (t >> 8))
    w = bitxor(t3, t2);

    state = [x y z w];
end
You need to pass the state into xor128 every time you use it. I've written a "tester" function which simply returns a vector of random numbers. I tested 1000 numbers output by this function against the values output by the C++ code compiled with gcc, and they match exactly.
function [ v ] = txor( iterations )
%TXOR test xor128; returns a vector v of length iterations with random
%   number output from xor128
    v = zeros(iterations, 1);
    state = [123456789, 362436069, 521288629, 88675123];
    i = 1;
    while i <= iterations
        disp(i);
        [t, state] = xor128(state);
        v(i) = t;
        i = i + 1;
    end
end
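For reference, the C++ side of such a comparison might look like this (a sketch reusing the xor128 function from Fezvez's answer above, with its default starting state):

#include <stdint.h>
#include <iostream>

uint32_t xor128(void) {
    static uint32_t x = 123456789;
    static uint32_t y = 362436069;
    static uint32_t z = 521288629;
    static uint32_t w = 88675123;
    uint32_t t = x ^ (x << 11);
    x = y; y = z; z = w;
    return w = w ^ (w >> 19) ^ (t ^ (t >> 8));
}

int main() {
    //Compare this output against txor(1000) in Matlab
    for (int i = 0; i < 1000; ++i)
        std::cout << xor128() << '\n';
    return 0;
}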
I would be very careful about assuming that two different implementations of pseudo-random generators, even though based on the same algorithm, produce the same result. It could be that one of the implementations uses some sort of tweak, hence producing different results. If you need two equal "random" distributions, I suggest you either precalculate a sequence and store it for access from both C++ and Matlab, or create your own generator. It should be fairly easy to implement MT19937 if you use the pseudocode on Wikipedia.
Take care to ensure that both your Matlab and C++ code run on the same architecture (that is, both run as either 32- or 64-bit); using a 64-bit integer in one implementation and a 32-bit integer in the other will lead to different results.