Fast way to avoid modulo bias - c++

I'm doing a shuffle very often on a small array, anything from 1 to 10 elements.
I've tried the accepted answer in this question:
Is this C implementation of Fisher-Yates shuffle correct?
Unfortunately it's extremely slow.
I need a faster way of doing this and avoiding modulo bias which I'm seeing. Any suggestions?
EDIT:
Sorry, I should point out that it's not the shuffle that's slow; it's the method used to generate a random int in a range, i.e. rand_int(). I'm using a Mersenne Twister algorithm, and RAND_MAX in my case is UINT_MAX to help out. This of course makes it slower when n is much smaller than RAND_MAX.
I've also found two implementations of a rand_int-style function:
static int rand_int(int n) {
int limit = RAND_MAX - RAND_MAX % n;
int rnd;
do {
rnd = rand();
} while (rnd >= limit);
return rnd % n;
}
The following is much, much faster. But does it avoid the modulo bias problem?
int rand_int(int limit) {
int divisor = RAND_MAX/(limit);
int retval;
do {
retval = rand() / divisor;
} while (retval > limit);
return retval;
}
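One way to answer that for the second function is to count, for a small stand-in RAND_MAX, how many raw values land in each output bucket. The sketch below walks every raw value exactly once; `bucket_counts` and `fake_rand_max` are hypothetical names introduced only for this check, and it assumes 0 < limit <= fake_rand_max. With the original test `retval > limit`, the value `limit` itself is also accepted, so the function can return limit + 1 different values, and the raw values that map to `limit` are generally fewer than RAND_MAX / limit. Rejecting with `retval >= limit` instead keeps only buckets 0 .. limit-1, each holding exactly RAND_MAX / limit raw values, which is unbiased over [0, limit).

```cpp
#include <vector>

// Exhaustively maps every raw value 0..fake_rand_max through the division
// method and counts how often each accepted result occurs.
// strict == true  -> rejects retval >= limit (results in [0, limit))
// strict == false -> rejects retval >  limit (the question's original test)
std::vector<int> bucket_counts(int fake_rand_max, int limit, bool strict) {
    const int divisor = fake_rand_max / limit;  // assumes 0 < limit <= fake_rand_max
    std::vector<int> counts(limit + 1, 0);
    for (int raw = 0; raw <= fake_rand_max; ++raw) {
        const int retval = raw / divisor;
        const bool rejected = strict ? (retval >= limit) : (retval > limit);
        if (!rejected) {
            ++counts[retval];
        }
    }
    if (strict) {
        counts.pop_back();  // slot 'limit' is never filled in strict mode
    }
    return counts;
}
```

For example, with fake_rand_max = 99 and limit = 7 the divisor is 14: the strict version yields seven buckets of 14 raw values each, while the original test also accepts 7, which only the raw values 98 and 99 map to.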

Edit
To address the basic question of avoiding modulo bias with rand(), see http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx.
In short, you can't get a truly uniform distribution other than by skipping out-of-range random numbers [1]; the article lists some formulae that give a smaller bias (e.g. int r = rand() / ( RAND_MAX / N + 1 )) without sacrificing more performance.
[1] See Java's implementation of Random.nextInt(int):
http://download.oracle.com/javase/1.4.2/docs/api/java/util/Random.html#nextInt(int)
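Skipping out-of-range values is straightforward with a 32-bit engine such as std::mt19937. The sketch below (`uniform_below` is a hypothetical helper name) rejects the 2^32 mod n smallest raw outputs, so the number of accepted raw values is an exact multiple of n, and then reduces with `%`. The rejection probability is below n / 2^32, so for the 1 to 10 element arrays in the question the loop almost never repeats. It follows the same idea as the footnoted Java method, but it is not a port of Java's code.

```cpp
#include <cstdint>
#include <random>

// Uniform integer in [0, n) by rejection sampling; requires n > 0.
// 2^32 mod n is computed in 32-bit unsigned arithmetic as (0 - n) % n.
// Accepting only r >= threshold leaves 2^32 - threshold raw values,
// a multiple of n, so every residue r % n is equally likely.
std::uint32_t uniform_below(std::mt19937& eng, std::uint32_t n) {
    const std::uint32_t threshold = static_cast<std::uint32_t>(0u - n) % n;
    std::uint32_t r;
    do {
        r = static_cast<std::uint32_t>(eng());
    } while (r < threshold);
    return r % n;
}
```

Keep one engine alive for the whole program and pass it by reference; constructing or reseeding it per call would dominate the cost of such a small shuffle.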
Using C++
You should be able to use std::random_shuffle (from the <algorithm> header).
If you must roll your own shuffle implementation, I suggest using the <random> facilities (TR1, C++11 or Boost.Random). They come with a number of generators and distributions, with varying performance characteristics.
#include <random>
std::mt19937 rng(seed); // seed: any unsigned value, e.g. from std::random_device
std::uniform_int_distribution<int> gen(0, N); // uniform, unbiased; closed range [0, N]
int r = gen(rng);
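For the shuffle itself, std::shuffle (C++11, <algorithm>) takes a uniform random bit generator directly and replaces std::random_shuffle, which was deprecated in C++14 and removed in C++17. A minimal sketch, assuming a single-threaded program in which one engine is created once and reused for every shuffle; `shared_engine` and `shuffle_small` are hypothetical helper names.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// One engine for the whole program, seeded once on first use.
// Not safe to share across threads without external locking.
std::mt19937& shared_engine() {
    static std::mt19937 eng(std::random_device{}());
    return eng;
}

// Shuffles v in place; std::shuffle produces each permutation
// with equal probability given a uniform bit generator.
void shuffle_small(std::vector<int>& v) {
    std::shuffle(v.begin(), v.end(), shared_engine());
}
```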
Refer to the Boost documentation for a good overview of the characteristics of the Boost Random generators and distributions:
http://www.boost.org/doc/libs/1_47_0/doc/html/boost_random/reference.html#boost_random.reference.generators
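Here is a sample of doing std::random_shuffle using Boost Random directly (std::random_shuffle was removed in C++17, so compile this as C++14 or earlier):
#include <algorithm>
#include <cstddef>
#include <vector>
#include <boost/random.hpp>
// Adapts a Boost Mersenne Twister to std::random_shuffle's RandomFunc:
// called with i, it must return a value in [0, i).
struct Rng
{
explicit Rng(boost::mt19937 &rng) : _rng(rng) {}
std::ptrdiff_t operator()(std::ptrdiff_t i)
{
boost::random::uniform_int_distribution<std::ptrdiff_t> dist(0, i - 1);
return dist(_rng);
}
private:
boost::mt19937 &_rng;
};
int main()
{
std::vector<int> v = {1, 2, 3, 4, 5};
boost::mt19937 state; // default-seeded; call state.seed(...) for different runs
std::random_shuffle(v.begin(), v.end(), Rng(state));
}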


When is it preferable to use rand() vs a generator + a distribution? (e.g. mt19937 + uniform_real_distribution)

After going down the rabbit hole of learning about rand() and why it's not very good at generating uniform pseudorandom data (starting from this post: Random float number generation), I am stuck trying to figure out which strategy gives the better balance of performance and accuracy when iterated a large number of times, e.g. 128*10^6 in my use case.
This link is what led me to make this post, otherwise I would have just used rand(): rand() considered harmful
Anyway, my main goal is to understand whether rand() is ever preferable to use over the generator + distribution method. There doesn't seem to be very good info even on cppreference.com or cplusplus.com for performance or time complexity for either of the two strategies.
For example, between the following two random number generation strategies, is it always preferable to use the second?
rand()
std::mt19937 and uniform_real_distribution
Here is an example of what my code would be doing:
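#include <cstdlib>
#include <vector>
int main(){
int numIterations = 128000000; // 128E6 as an integer literal
std::vector<float> randomData;
randomData.resize(numIterations);
for(int i = 0; i < numIterations; i++){
randomData[i] = float(std::rand())/float(RAND_MAX); // values in [0, 1], both ends included
}
}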
vs.
#include <random>
#include <vector>
int main(){
std::mt19937 mt(1729);
std::uniform_real_distribution<float> dist(0.0, 1.0); // values nominally in [0, 1)
int numIterations = 128000000;
std::vector<float> randomData;
randomData.resize(numIterations);
for(int i = 0; i < numIterations; i++){
randomData[i] = dist(mt);
}
}
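One practical way to settle the performance half of the question is to time both loops on the target machine with the target compiler flags. A minimal sketch with std::chrono follows; `fill_with_rand`, `fill_with_mt19937` and `time_ms` are hypothetical helper names, and no timings are claimed here because they depend on the compiler, the C library's rand() and the optimisation level. Note also that the two loops do not cover the same range: float(rand())/float(RAND_MAX) can return exactly 1.0, while uniform_real_distribution is specified on the half-open range [a, b).

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <vector>

// Approach 1: C rand() scaled to [0, 1] (both ends reachable).
std::vector<float> fill_with_rand(std::size_t n) {
    std::vector<float> out(n);
    for (auto& x : out) {
        x = float(std::rand()) / float(RAND_MAX);
    }
    return out;
}

// Approach 2: Mersenne Twister + uniform_real_distribution (nominally [0, 1)).
std::vector<float> fill_with_mt19937(std::size_t n, unsigned seed) {
    std::mt19937 mt(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> out(n);
    for (auto& x : out) {
        x = dist(mt);
    }
    return out;
}

// Returns the wall-clock time of one call to f, in milliseconds.
template <class F>
double time_ms(F f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A usage along the lines of `std::vector<float> a; double t = time_ms([&] { a = fill_with_rand(128000000); });` keeps the result alive, so the optimiser cannot simply drop the work being measured.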

Draw random numbers from Boost binomial distribution

Here is an example of drawing random numbers from a binomial distribution with std::binomial_distribution:
#include <random>
int main ()
{
std::mt19937 eng(14);
std::binomial_distribution<size_t> dist(28,0.2);
size_t randomNumber = dist(eng);
return 0;
}
I can't find a similar example for Boost. I went through this documentation, which explains how to compute the PDF, CDF and other quantities from a boost::math::binomial object, but it does not cover sampling random numbers.
Should I write a binary search myself based on the CDF that boost::math::binomial will compute for me or can boost directly return random numbers?
Thanks to this link from @Bob__, here is a simple working example:
#include <random>
#include <boost/random.hpp>
int main ()
{
std::mt19937 eng;
boost::random::binomial_distribution<int> dist(28,0.2);
int randomNumber = dist(eng);
return 0;
}
For some reason it would not compile with size_t, so I used int (see @Bob__'s comment below for more information).
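For the CDF-based route the question considers, a binary search is not strictly needed for small n: walking the CDF upward from k = 0 finds the smallest k with P(X <= k) >= u (inversion sampling). A sketch using only <random>, assuming 0 < p < 1 and an n small enough that (1 - p)^n does not underflow; `binomial_from_uniform` and `draw_binomial` are hypothetical names. In practice the library distributions above are simpler to use; this only shows how the CDF approach works.

```cpp
#include <cmath>
#include <random>

// Inversion sampling for Binomial(n, p), assuming 0 < p < 1:
// returns the smallest k in [0, n] with P(X <= k) >= u.
int binomial_from_uniform(int n, double p, double u) {
    const double q = 1.0 - p;
    double pmf = std::pow(q, n);  // P(X = 0)
    double cdf = pmf;
    int k = 0;
    while (cdf < u && k < n) {
        // P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / q
        pmf *= (static_cast<double>(n - k) / (k + 1)) * (p / q);
        ++k;
        cdf += pmf;
    }
    return k;
}

// Draws one Binomial(n, p) value by feeding a uniform [0, 1) draw
// through the CDF walk above.
template <class Engine>
int draw_binomial(int n, double p, Engine& eng) {
    std::uniform_real_distribution<double> u01(0.0, 1.0);
    return binomial_from_uniform(n, p, u01(eng));
}
```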

Adding Gaussian noise

I have a .arff file which contains a list of float numbers. I need to add Gaussian noise to every number, which in MATLAB would be:
m = m + k*randn(size(m))
where m is one of the numbers in the list and k is a standard deviation and has value 0.1. What is the C++ equivalent to randn()?
Could you please provide an example?
Use std::normal_distribution with an appropriate generator (std::default_random_engine will usually work). See http://en.cppreference.com/w/cpp/numeric/random for details on all of the random number generation facilities of the C++ standard library.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <random>
#include <vector>
int main() {
// Example data
std::vector<double> data = {1., 2., 3., 4., 5., 6.};
// Define random generator with Gaussian distribution
const double mean = 0.0;
const double stddev = 0.1;
std::default_random_engine generator;
std::normal_distribution<double> dist(mean, stddev);
// Add Gaussian noise
for (auto& x : data) {
x = x + dist(generator);
}
// Output the result, for demonstration purposes
std::copy(begin(data), end(data), std::ostream_iterator<double>(std::cout, " "));
std::cout << "\n";
return 0;
}
Output (the exact values depend on the implementation's std::default_random_engine):
0.987803 1.89132 3.06843 3.89248 5.00333 6.07448
Further considerations
For decent statistical properties, you'll probably want to choose the std::mersenne_twister_engine generator (or, for convenience, the std::mt19937 predefined version), and seed it using std::random_device:
std::mt19937 generator(std::random_device{}());
[Note: Seeding from std::random_device is a good practice to get into; if you use the current time as a seed, you can end up with the same seed value across multiple generators (e.g. when initialising several generators in a very short period of time). std::random_device will obtain entropy from the system, if available.]
In order to avoid passing the generator to the distribution every time, you can do something like:
auto dist = std::bind(std::normal_distribution<double>{mean, stddev},
std::mt19937(std::random_device{}()));
which you can then use as follows:
double val = dist();
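To mirror the MATLAB line m = m + k*randn(size(m)) more literally, the noise can be drawn from a standard normal and scaled by k, exactly as randn is scaled there; for k > 0 this has the same distribution as normal_distribution(0.0, k). A sketch, where `add_gaussian_noise` is a hypothetical helper and the engine is passed in so that the caller seeds it once:

```cpp
#include <random>
#include <vector>

// Equivalent of MATLAB's m = m + k*randn(size(m)) for a vector m.
void add_gaussian_noise(std::vector<double>& m, double k, std::mt19937& gen) {
    std::normal_distribution<double> randn(0.0, 1.0);  // standard normal, like randn
    for (auto& x : m) {
        x += k * randn(gen);
    }
}
```

With the question's values this would be called as `add_gaussian_noise(data, 0.1, generator);` where `generator` is a std::mt19937 seeded as shown above.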
The C++ standard library now includes several distributions for random numbers.
You are looking for std::normal_distribution.
The documentation also includes a code sample:
#include <chrono>
#include <iostream>
#include <random>
int main() {
// construct a trivial random generator engine from a time-based seed:
unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
std::default_random_engine generator (seed);
std::normal_distribution<double> distribution (0.0,1.0);
std::cout << "some Normal-distributed(0.0,1.0) results:" << std::endl;
for (int i=0; i<10; ++i)
std::cout << distribution(generator) << std::endl;
}
The constructor parameters of std::normal_distribution are, in order, the mean (0.0) and the standard deviation (1.0).

C++ with OpenMP thread safe random numbers

I am trying to draw some random points and then calculate something with them. I am using a few threads, but my random numbers are not as random as they are supposed to be. When I use rand() I get the correct answer, but it is very slow (because of rand()'s shared static state), so I switched to rand_r with a seed, but the answer of my program is always weird.
double randomNumber(unsigned int seed, double a, double b) {
return a + ((float)rand_r(&seed))/(float)(RAND_MAX) * (b-a);
}
my program:
#pragma omp parallel
for(int i = 0; i < points; i++){
seedX = (i+1) * time(NULL);
seedY = (points - i) * time(NULL);
punkt.x = randomNumber(seedX, minX, maxX);
punkt.y = randomNumber(seedY, minY, maxY);
...
}
I found some solutions in other topics (mt19937 generators etc.), but I can't get any of them to compile.
I am using g++ -fopenmp for compiling (g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2).
edit:
seed = rand();
#pragma omp parallel
for(int i = 0; i < points; i++){
punkt.x = randomNumber(seed, minX, maxX);
punkt.y = randomNumber(seed, minY, maxY);
...
}
Re-seeding your generators within each iteration of the for loop is going to ruin their statistical properties.
Also, it's likely that you'll introduce correlation between your x and y values if you extract them using two linear congruential generators.
Keep it simple; use one generator, and one seed.
Going forward, I'd recommend you use mt19937, as it has better properties still. Linear congruential generators can fail a chi-squared test for autocorrelation, which is particularly important if you are using them for an x, y plot.
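With several OpenMP threads, a single shared generator needs a lock around every call, which brings back the contention the question was trying to avoid. A common alternative, swapped in here rather than taken from the answer above, is one std::mt19937 per thread, seeded once before the loop. The sketch also splits the iterations with `omp for` inside the parallel region; a bare `#pragma omp parallel` in front of the loop, as in the question, makes every thread run all iterations. `Point`, `draw_points` and `base_seed` are hypothetical names, and seeding each thread through std::seed_seq with its thread number is a simple choice, not a guarantee of independent streams. Results depend on the thread count. Without -fopenmp the pragmas are ignored and the code runs serially; with GCC 4.8, <random> also needs -std=c++11, since that compiler defaults to C++98.

```cpp
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

struct Point { double x, y; };  // hypothetical stand-in for the question's punkt

std::vector<Point> draw_points(int points, double minX, double maxX,
                               double minY, double maxY, unsigned base_seed) {
    std::vector<Point> result(points);
    #pragma omp parallel
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        // One engine per thread, seeded once, outside the loop.
        std::seed_seq seq{base_seed, static_cast<unsigned>(tid)};
        std::mt19937 eng(seq);
        std::uniform_real_distribution<double> dx(minX, maxX);
        std::uniform_real_distribution<double> dy(minY, maxY);
        #pragma omp for
        for (int i = 0; i < points; ++i) {
            result[i].x = dx(eng);  // each i is written by exactly one thread
            result[i].y = dy(eng);
        }
    }
    return result;
}
```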
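I believe what the others are trying to say is: seed once, in a constructor, with srand(some number), and then do not seed again.
class someRandomNumber
{
public:
explicit someRandomNumber(unsigned seed) { std::srand(seed); } // seed once here; create a single instance
double between(double a, double b) { return a + (b - a) * std::rand() / RAND_MAX; } // value in [a, b]
};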

Generate Random Number Based on Beta Distribution using Boost

I am trying to use Boost to generate random numbers according to the beta distribution in C++. I have seen many examples online for generating random numbers according to the distributions in random.hpp (e.g. this book). However, I cannot seem to translate them to use the beta distribution found in beta.hpp.
Thanks.
You'll first want to draw a random number uniformly from the range (0,1). Given any distribution, you can then plug that number into the distribution's "quantile function," and the result is as if a random value was drawn from the distribution. From here:
A general method to generate random numbers from an arbitrary distribution which has a cdf without jumps is to use the inverse function to the cdf: G(y)=F^{-1}(y). If u(1), ..., u(n) are random numbers from the uniform on (0,1) distribution then G(u(1)), ..., G(u(n)) is a random sample from the distribution with cdf F(x).
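As a concrete instance of this method with a closed-form inverse, take the exponential distribution with rate lambda: F(x) = 1 - exp(-lambda x), so G(u) = -ln(1 - u) / lambda. A sketch in standard C++ (`exponential_quantile` and `draw_exponential` are hypothetical names); std::exponential_distribution already exists, so this only illustrates the method.

```cpp
#include <cmath>
#include <random>

// Quantile (inverse CDF) of the exponential distribution with rate lambda.
double exponential_quantile(double lambda, double u) {
    return -std::log1p(-u) / lambda;  // -ln(1 - u) / lambda
}

// Inversion sampling: feed a uniform draw on [0, 1) through the quantile.
double draw_exponential(double lambda, std::mt19937& eng) {
    std::uniform_real_distribution<double> u01(0.0, 1.0);
    return exponential_quantile(lambda, u01(eng));
}
```

For lambda = 1, the median G(0.5) is ln 2, and the sample mean of many draws should approach 1 / lambda.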
So how do we get a quantile function for a beta distribution? The documentation for beta.hpp is here. You should be able to use something like this:
#include <boost/math/distributions/beta.hpp>
// alpha, beta: the shape parameters; randFromUnif: the random value on (0,1) you drew
double drawBeta(double alpha, double beta, double randFromUnif)
{
boost::math::beta_distribution<> dist(alpha, beta);
return boost::math::quantile(dist, randFromUnif);
}
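If Boost is not available, a Beta(alpha, beta) variate can also be drawn with the standard library alone, using a different technique than the quantile approach: if X ~ Gamma(alpha, 1) and Y ~ Gamma(beta, 1) are independent, then X / (X + Y) ~ Beta(alpha, beta). A sketch (`draw_beta` is a hypothetical name):

```cpp
#include <random>

// Beta(alpha, beta) via the ratio of two independent Gamma(shape, 1) variates.
double draw_beta(double alpha, double beta, std::mt19937& eng) {
    std::gamma_distribution<double> gx(alpha, 1.0);
    std::gamma_distribution<double> gy(beta, 1.0);
    const double x = gx(eng);
    const double y = gy(eng);
    return x / (x + y);
}
```

For Beta(2, 5), as in the Boost examples, the sample mean should approach 2 / 7.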
According to Boost's demo for the random number library, Random_demo.cpp, and the "Generating integers with different probabilities" example, you should use the variate_generator class to bind your random number generator and distribution.
An example may look like this:
#include <iostream>
#include "boost/random.hpp"
int main(int argc, char *argv[])
{
int seed = 2018;
typedef boost::random::mt19937 RandomNumberGenerator;
typedef boost::random::beta_distribution<> BetaDistribution;
typedef boost::variate_generator<RandomNumberGenerator&, BetaDistribution>
Generator;
RandomNumberGenerator Rng(seed);
BetaDistribution distribution(2,5);
Generator getRandomNumber(Rng,distribution);
for (int idx = 0 ; idx < 1000 ; ++idx)
{
std::cout << getRandomNumber() << std::endl;
}
return 0;
}
However, the more recent documentation seems to recommend passing the generator directly to the distribution object. The result from the code below is identical.
#include <iostream>
#include "boost/random.hpp"
int main(int argc, char *argv[])
{
int seed = 2018;
typedef boost::random::mt19937 RandomNumberGenerator;
typedef boost::random::beta_distribution<> BetaDistribution;
RandomNumberGenerator Rng(seed);
BetaDistribution distribution(2,5);
for (int idx = 0 ; idx < 1000 ; ++idx)
{
std::cout << distribution(Rng) << std::endl;
}
return 0;
}