Random function generator C++

I want to generate a random number in C++ based on a known distribution.
Here is the problem. Say I rolled a die 6 times and recorded a four 3 times, a one once, and a two twice.
So four = 3/6, one = 1/6, two = 2/6.
Is there a library function that generates a random number according to the above distribution?
If not, do you think it is valid for me to simply do
int i= ran()%5;
if (i is in the range of 0 to 2)
{
//PICK FOUR
}
else if (i is in the range of 3 to 4)
{
// PICK ONE
}
else
{
// PICK TWO
}

int pick()
{
static const int val[6] = { 4,4,4,1,2,2 };
return val[ran()%6]; // <---- note %6 not %5
}
Edit: Note that ran() % 6 may or may not be uniformly distributed, even if ran() is: it is only uniform when the number of distinct values ran() can return is a multiple of 6. You probably want something that is guaranteed to be uniformly distributed, e.g.
std::random_device device;
std::default_random_engine engine(device());
std::uniform_int_distribution<int> dist(0, 5);
Now dist(engine) is a good replacement for ran()%6.
Edit 2: Following a suggestion in the comments, here's a version based on std::discrete_distribution:
std::random_device device;
std::default_random_engine engine(device());
std::discrete_distribution<> dist ({1, 2, 0, 3, 0, 0});
int pick()
{
return dist(engine) + 1;
}
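As a rough sketch of how the weights could be built straight from the observed counts, and how one might check that the draws follow them: the helper names pick_face and tally_faces, and the choice of std::mt19937, are assumptions made for illustration and are not part of the answer above.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: counts[k] is how often face k+1 was observed.
// Returns one face in 1..counts.size(), drawn in proportion to the counts.
int pick_face(const std::vector<int>& counts, std::mt19937& engine)
{
    assert(!counts.empty());
    // The iterator-range constructor uses the counts as unnormalised weights.
    std::discrete_distribution<int> dist(counts.begin(), counts.end());
    return dist(engine) + 1;
}

// Hypothetical helper: draws `draws` samples and tallies how often each
// face index (0-based) came up, to compare against the input counts.
std::vector<int> tally_faces(const std::vector<int>& counts, int draws, unsigned seed)
{
    assert(!counts.empty() && draws >= 0);
    std::mt19937 engine(seed);
    std::discrete_distribution<int> dist(counts.begin(), counts.end());
    std::vector<int> tally(counts.size(), 0);
    for (int i = 0; i < draws; ++i)
        ++tally[dist(engine)];
    return tally;
}
```

Calling tally_faces({1, 2, 0, 3, 0, 0}, 60000, 42u) should give tallies roughly in the ratio 1 : 2 : 0 : 3 : 0 : 0; the exact numbers depend on the seed and the standard library.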

Related

When is it preferable to use rand() vs a generator + a distribution? (e.g. mt19937 + uniform_real_distribution)

After going down the rabbit hole of learning about rand() and why it is not very good at generating uniform pseudorandom data, starting from this post:
Random float number generation, I am stuck trying to figure out which strategy gives the better balance of performance and accuracy when iterated a large number of times, e.g. 128*10^6 in my use case.
This link is what led me to make this post, otherwise I would have just used rand(): rand() considered harmful
Anyway, my main goal is to understand whether rand() is ever preferable to the generator + distribution method. Neither cppreference.com nor cplusplus.com seems to have very good information on the performance or time complexity of either strategy.
For example, between the following two random number generation strategies is it always preferable to use the 2nd approach?
rand()
std::mt19937 and uniform_real_distribution
Here is an example of what my code would be doing:
#include <cstdlib>
#include <vector>
int main(){
int numIterations = 128E6;
std::vector<float> randomData;
randomData.resize(numIterations);
for(int i = 0; i < numIterations; i++){
randomData[i] = float(rand())/float(RAND_MAX);
}
}
vs.
#include <random>
#include <vector>
int main(){
std::mt19937 mt(1729);
std::uniform_real_distribution<float> dist(0.0, 1.0);
int numIterations = 128E6;
std::vector<float> randomData;
randomData.resize(numIterations);
for(int i = 0; i < numIterations; i++){
randomData[i] = dist(mt);
}
}
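One way to compare the two strategies is simply to time them on the target machine. The following is a minimal sketch of such a harness, not a definitive benchmark: the function names fill_with_rand, fill_with_mt19937, mean_of and time_ms are made up for illustration, and the measured times will depend on compiler, optimisation flags and standard library.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <random>
#include <vector>

// Strategy 1 from the question: scale rand() into [0, 1].
std::vector<float> fill_with_rand(std::size_t n)
{
    std::vector<float> out(n);
    for (auto& x : out)
        x = float(std::rand()) / float(RAND_MAX);
    return out;
}

// Strategy 2 from the question: std::mt19937 + uniform_real_distribution.
std::vector<float> fill_with_mt19937(std::size_t n, unsigned seed)
{
    std::mt19937 mt(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> out(n);
    for (auto& x : out)
        x = dist(mt);
    return out;
}

// Crude sanity check on the data: the sample mean should be close to 0.5.
double mean_of(const std::vector<float>& v)
{
    assert(!v.empty());
    double sum = 0.0;
    for (float x : v)
        sum += x;
    return sum / static_cast<double>(v.size());
}

// Wall-clock time of a callable, in milliseconds.
template <class F>
double time_ms(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, time_ms([] { fill_with_mt19937(128000000, 1729u); }) and time_ms([] { fill_with_rand(128000000); }) can be compared directly, ideally over several runs of an optimised build.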

Generating a random connected graph in c++

So I have to write a generate function for an undirected graph. It has to produce a random connected graph. First I have to create a spanning tree, then add more edges.
So let's say I have to make a 100-vertex graph with a density (I guess that is how you say it in English) of, say, 25%. So I will have to make 1237.5 edges (let's say 1237).
Now I would do this like this:
1. Make a list of all unused vertices.
2. Randomly choose 2 of them, connect them, and remove the first one from the list.
3. Connect the next vertex to the one connected before (the second one, which was not erased from the list).
4. Repeat until all vertices are used (the list is empty).
5. Then calculate how many edges I still have to add (1237-100).
6. Now randomly choose 2 vertices and check whether they are connected. If not, connect them. Repeat this 1137 times.
Now I have a connected graph. I wrote C++ code for this and it works as intended, but I have to measure the running times of some operations (not the generation itself, but e.g. Prim's algorithm) for densities = {25, 50, 75, 99} and for 5 different vertex counts (e.g. from 500 to 5000). I started a program that does this, and it has been running for about half an hour now. Is there a way to optimize my code (by changing the steps above) to make it faster? I would especially like step 6 to be more efficient: picking 2 random values and checking that they are not already connected means iterating through the whole vector of vectors, and I can randomly pick the same values again, so it could go on forever.
The code:
std::vector<std::vector<int>> undirected_graph::generate(int vertices, int d)
{
float density = d / 100.0f; // d is a percentage; floating-point division, so 25 becomes 0.25
std::vector<std::vector<int>> data;
int edges = floor(density * vertices * (vertices - 1) / 2);
std::vector<int> v_vertices(vertices);
std::iota(std::begin(v_vertices), std::end(v_vertices), 0);
std::random_device r;
std::default_random_engine e1(r());
std::uniform_int_distribution<int> uniform_dist2(1, 100);
int random = 0, random1;
v_vertices.erase(v_vertices.begin());
for (int i = 0; i < vertices - 1; i++) // spanning tree
{
std::uniform_int_distribution<int> uniform_dist1(0, v_vertices.size() - 1);
random1 = uniform_dist1(e1);
data.push_back({ random , v_vertices[random1], uniform_dist2(e1) }); // random edge
random = v_vertices[random1];
v_vertices.erase(v_vertices.begin() + random1);
}
// now the additional edges
int needed = edges - (vertices - 1); // the spanning tree already contributed vertices - 1 edges
std::uniform_int_distribution<int> uniform_dist1(0, vertices-1);
std::uniform_int_distribution<int> uniform_dist3(0, vertices-1);
for (int i = 0; i < needed;)
{
random = uniform_dist1(e1);
random1 = uniform_dist3(e1);
if (random != random1)
{
if (!multipleEdgeGenerate(data, random, random1)) // checking if edge like this exists
{
data.push_back({ random , random1, uniform_dist2(e1) }); // v_vertices is empty by now, so use the vertex ids directly
i++;
}
}
}
return data;
}
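The expensive part of the steps described above is the duplicate check in step 6. A sketch of one possible alternative keeps a hash set of the edges already present, so that the check takes constant time on average instead of scanning every stored edge. The names Edge, edge_key and generate_connected are invented for this illustration, edges are stored as std::array<int, 3> rather than std::vector<int>, and the spanning tree is the same random chain the steps describe. Random retries still become more frequent as the density approaches 100%.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <unordered_set>
#include <vector>

// Edge stored as {from, to, weight}, the same layout the question uses.
using Edge = std::array<int, 3>;

// Packs an undirected pair into one 64-bit key, so (a,b) and (b,a) match.
static std::uint64_t edge_key(int a, int b)
{
    if (a > b) std::swap(a, b);
    return (static_cast<std::uint64_t>(a) << 32) | static_cast<std::uint32_t>(b);
}

std::vector<Edge> generate_connected(int vertices, int density_percent, std::mt19937& rng)
{
    assert(vertices >= 2 && density_percent >= 0 && density_percent <= 100);
    const long long max_edges = 1LL * vertices * (vertices - 1) / 2;
    // At least a spanning tree, at most the complete graph.
    const long long target =
        std::max<long long>(vertices - 1, max_edges * density_percent / 100);

    std::vector<int> order(vertices);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), rng);

    std::uniform_int_distribution<int> weight(1, 100);
    std::uniform_int_distribution<int> vertex(0, vertices - 1);
    std::vector<Edge> edges;
    std::unordered_set<std::uint64_t> seen;
    edges.reserve(static_cast<std::size_t>(target));
    seen.reserve(static_cast<std::size_t>(target));

    // Steps 1-4: chain the shuffled vertices into a random path (a spanning tree).
    for (int i = 1; i < vertices; ++i) {
        edges.push_back(Edge{order[i - 1], order[i], weight(rng)});
        seen.insert(edge_key(order[i - 1], order[i]));
    }
    // Steps 5-6: add random extra edges; insert() reports whether the pair was new.
    while (static_cast<long long>(edges.size()) < target) {
        const int a = vertex(rng);
        const int b = vertex(rng);
        if (a != b && seen.insert(edge_key(a, b)).second)
            edges.push_back(Edge{a, b, weight(rng)});
    }
    return edges;
}
```

With 100 vertices and a density of 25 this returns 1237 distinct edges, including the 99 edges of the initial path.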

Adding Gaussian noise

I have a .arff file which contains a list of float numbers. I need to add to every number a gaussian noise, which in MATLAB would be:
m = m + k*randn(size(m))
where m is one of the numbers in the list and k is a standard deviation and has value 0.1. What is the C++ equivalent to randn()?
Could you please provide an example?
Use std::normal_distribution with an appropriate generator (std::default_random_engine will usually work). See http://en.cppreference.com/w/cpp/numeric/random for details on all of the random number generation facilities of the C++ standard library.
#include <iostream>
#include <iterator>
#include <random>
#include <vector>
int main() {
// Example data
std::vector<double> data = {1., 2., 3., 4., 5., 6.};
// Define random generator with Gaussian distribution
const double mean = 0.0;
const double stddev = 0.1;
std::default_random_engine generator;
std::normal_distribution<double> dist(mean, stddev);
// Add Gaussian noise
for (auto& x : data) {
x = x + dist(generator);
}
// Output the result, for demonstration purposes
std::copy(begin(data), end(data), std::ostream_iterator<double>(std::cout, " "));
std::cout << "\n";
return 0;
}
Output:
0.987803 1.89132 3.06843 3.89248 5.00333 6.07448
Further considerations
For decent statistical properties, you'll probably want to choose the std::mersenne_twister_engine generator (or, for convenience, the std::mt19937 predefined version), and seed it using std::random_device:
std::mt19937 generator(std::random_device{}());
[Note: Seeding from std::random_device is a good practice to get into; if you use the current time as a seed, you can end up with the same seed value across multiple generators (e.g. when initialising several generators in a very short period of time). std::random_device will obtain entropy from the system, if available.]
In order to avoid passing the generator to the distribution every time, you can do something like:
auto dist = std::bind(std::normal_distribution<double>{mean, stddev},
std::mt19937(std::random_device{}()));
which you can then use as follows:
double val = dist();
The C++ standard now includes several distributions for random numbers.
You are looking for std::normal_distribution.
In the documentation you can also find a code sample:
// construct a trivial random generator engine from a time-based seed:
unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
std::default_random_engine generator (seed);
std::normal_distribution<double> distribution (0.0,1.0);
std::cout << "some Normal-distributed(0.0,1.0) results:" << std::endl;
for (int i=0; i<10; ++i)
std::cout << distribution(generator) << std::endl;
The parameters passed to the std::normal_distribution constructor are the mean (0.0) first, then the standard deviation (1.0).

Default_random_engine passed into a function gives repeatable results

I have a class Permutation that inherits from std::vector<int>. I created a constructor that makes the object filled with non-repeating numbers. Randomness is meant to be guaranteed by <random> stuff, so the declaration goes like this:
/* Creates a random permutation of a given length
* Input: n - length of permutation
* generator - engine that does the randomizing work */
Permutation(int n, default_random_engine generator);
The function itself looks like this (irrelevant details skipped):
Permutation::Permutation(int n, default_random_engine generator):
vector<int>(n, 0)
{
vector<int> someIntermediateStep(n, 0);
iota(someIntermediateStep.begin(), someIntermediateStep.end(), 0); //0, 1, 2...
shuffle(someIntermediateStep.begin(), someIntermediateStep.end(),
generator);
// etc.
}
And is called in the following context:
auto seed = std::chrono::system_clock::now().time_since_epoch().count();
static std::default_random_engine generator(seed);
for (int i = 0; i < n; i++)
Permutation test(length, generator);
The code compiles perfectly fine, but all instances of Permutation are the same. How can I force regular generation of random numbers? I know that default_random_engine should be bound to a distribution object, but hey, I don't have any – I use the engine only in shuffle() (at least at the moment).
Is there any solution or a workaround that still uses the goodness of <random>?
Your Permutation constructor takes the engine in by value. So, in this loop:
for (int i = 0; i < n; i++)
Permutation test(length, generator);
You are passing a copy of the same engine, in the same state, over and over. So you are of course getting the same results. Pass the engine by reference instead:
Permutation::Permutation(int n, default_random_engine& generator)
That way its state will be modified by the call to std::shuffle.
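The difference can be seen in isolation with two small free functions. This is only an illustrative sketch: permutation_by_value and permutation_by_reference are hypothetical stand-ins for the Permutation constructor, not code from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Engine taken BY VALUE: the shuffle advances a private copy, so the
// caller's engine stays in exactly the state it had before the call.
std::vector<int> permutation_by_value(int n, std::default_random_engine engine)
{
    assert(n >= 0);
    std::vector<int> p(n);
    std::iota(p.begin(), p.end(), 0); // 0, 1, 2, ...
    std::shuffle(p.begin(), p.end(), engine);
    return p;
}

// Engine taken BY REFERENCE: the shuffle advances the caller's engine,
// so the next call starts from a different state.
std::vector<int> permutation_by_reference(int n, std::default_random_engine& engine)
{
    assert(n >= 0);
    std::vector<int> p(n);
    std::iota(p.begin(), p.end(), 0);
    std::shuffle(p.begin(), p.end(), engine);
    return p;
}
```

Calling permutation_by_value twice with the same engine returns the same permutation both times, whereas each call to permutation_by_reference leaves the engine in a new state for the next call.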
So a childish mistake, just as I supposed – I mixed various solutions to similar problems in a wrong way.
As Benjamin pointed out, I mustn't copy the same engine over and over again, because it remains, well, the same. But this alone doesn't solve the issue, since the engine is pointlessly declared static (thanks, Zereges).
For the sake of clarity, corrected code looks like this:
Permutation(int n, default_random_engine &generator);
// [...]
Permutation::Permutation(int n, default_random_engine &generator):
vector<int>(n, 0)
{
vector<int> someIntermediateStep(n, 0);
iota(someIntermediateStep.begin(), someIntermediateStep.end(), 0); //0, 1, 2...
shuffle(someIntermediateStep.begin(), someIntermediateStep.end(),
generator);
// etc.
}
// [...]
// some function
auto seed = chrono::system_clock::now().time_since_epoch().count();
default_random_engine generator(seed);
for (int i = 0; i < n; i++)
Permutation test(length, generator);

Fast way to avoid modulo bias

I'm doing a shuffle and it gets done very often on a small array. Could be anything from 1 - 10 elements.
I've tried the accepted answer in this question:
Is this C implementation of Fisher-Yates shuffle correct?
Unfortunately it's extremely slow.
I need a faster way of doing this and avoiding modulo bias which I'm seeing. Any suggestions?
EDIT:
Sorry I should point out that it's not the shuffle that's slow, it's the method used to generate a random int range. i.e. rand_int(). I'm using a Mersenne twister algorithm and RAND_MAX in my case is UINT_MAX to help out. This of course makes it slower when n is much smaller than RAND_MAX
I've also found 2 implementations of a rand_int type function.
static int rand_int(int n) {
int limit = RAND_MAX - RAND_MAX % n;
int rnd;
do {
rnd = rand();
} while (rnd >= limit);
return rnd % n;
}
The following is much much faster. But, does it avoid the modulo bias problem?
int rand_int(int limit) {
int divisor = RAND_MAX/(limit);
int retval;
do {
retval = rand() / divisor;
} while (retval > limit);
return retval;
}
Edit
To address the basic question on avoiding the modulo bias with rand() see http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx.
In short, you can't get a truly uniform result other than by skipping non-domain random numbers [1]. The article lists some formulae that give a smaller bias (e.g. int r = rand() / ( RAND_MAX / N + 1 )) without sacrificing more performance.
[1] See Java's implementation of Random.nextInt(int):
http://download.oracle.com/javase/1.4.2/docs/api/java/util/Random.html#nextInt(int)
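To make the bias concrete, the following sketch enumerates every raw value of a hypothetical tiny generator that is uniform on [0, max_value] and counts how often each result appears under the three reductions discussed here: plain modulo, the rejection loop from the first rand_int(), and the division loop from the second rand_int(). The tally_* function names are invented for this illustration.

```cpp
#include <cassert>
#include <vector>

// Plain modulo: counts[k] is the number of raw values that map to k.
std::vector<int> tally_modulo(int max_value, int n)
{
    assert(n > 0 && max_value >= n);
    std::vector<int> counts(n, 0);
    for (int r = 0; r <= max_value; ++r)
        ++counts[r % n];
    return counts;
}

// Rejection, as in the first rand_int(): raw values >= limit are skipped.
std::vector<int> tally_rejection(int max_value, int n)
{
    assert(n > 0 && max_value >= n);
    const int limit = max_value - max_value % n;
    std::vector<int> counts(n, 0);
    for (int r = 0; r < limit; ++r)
        ++counts[r % n];
    return counts;
}

// Division, as in the second rand_int(): results above `limit` are skipped,
// so `limit` itself can be returned and there are limit + 1 possible results.
std::vector<int> tally_division(int max_value, int limit)
{
    assert(limit > 0 && max_value >= limit);
    const int divisor = max_value / limit;
    std::vector<int> counts(limit + 1, 0);
    for (int r = 0; r <= max_value; ++r) {
        const int v = r / divisor;
        if (v <= limit)
            ++counts[v];
    }
    return counts;
}
```

For max_value = 8 and n = 4, tally_modulo gives {3, 2, 2, 2} (biased towards 0), tally_rejection gives {2, 2, 2, 2}, and tally_division gives {2, 2, 2, 2, 1}. In this example the division variant returns five distinct values rather than four, and the last one is under-represented.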
Using C++
You should be able to use std::random_shuffle (from the <algorithm> header).
If you must roll your own shuffle implementation, I suggest using the <random> facilities (available in TR1, C++0x, or Boost). They come with a number of generators and distributions, with varying performance characteristics.
#include <random>
std::mt19937 rng(seed);
std::uniform_int_distribution<int> gen(0, N); // uniform, unbiased over the closed range [0, N]
int r = gen(rng);
Refer to the boost documentation for a good overview of Boost Random generator and distribution characteristics:
http://www.boost.org/doc/libs/1_47_0/doc/html/boost_random/reference.html#boost_random.reference.generators
Here is a sample of doing std::random_shuffle using Boost Random, directly:
#include <algorithm>
#include <functional>
#include <vector>
#include <boost/random.hpp>
struct Rng
{
Rng(boost::mt19937 &rng) : _rng(rng) {}
unsigned operator()(unsigned i)
{
boost::uniform_int<> dist(0, i - 1);
return dist(_rng);
}
private:
boost::mt19937 &_rng;
};
boost::mt19937 state;
std::random_shuffle(v.begin(), v.end(), Rng(state));
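Note that std::random_shuffle was deprecated in C++14 and removed in C++17; std::shuffle, which takes a generator directly, is its replacement. As a sketch under that assumption (the function names shuffle_in_place and fisher_yates are made up for illustration), the same shuffle can be written with the standard library alone, either by calling std::shuffle or with a hand-rolled Fisher-Yates loop that picks each index through std::uniform_int_distribution:

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Standard-library shuffle driven by a Mersenne Twister.
void shuffle_in_place(std::vector<int>& v, std::mt19937& rng)
{
    std::shuffle(v.begin(), v.end(), rng);
}

// Hand-rolled Fisher-Yates: position i-1 is swapped with a uniformly chosen
// index in [0, i-1]; the distribution handles the bias-free range reduction.
void fisher_yates(std::vector<int>& v, std::mt19937& rng)
{
    for (std::size_t i = v.size(); i > 1; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        const std::size_t j = pick(rng);
        assert(j < i); // the chosen index always lies in the unshuffled prefix
        std::swap(v[i - 1], v[j]);
    }
}
```

A single std::mt19937 seeded once (for example from std::random_device) can be reused across all shuffles, which avoids paying the seeding cost for every small array.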