Random output different between implementations - c++

I've tried this program with libstdc++, libc++ and dinkumware:
#include <iostream>
#include <algorithm>
#include <vector>
#include <random>
#include <functional>
#include <limits>
int main()
{
std::vector<int> v(10);
std::mt19937 rand{0};
std::uniform_int_distribution<> dist(
1, 10
);
std::generate_n(v.begin(), v.size(),
std::bind(dist, rand));
for (auto i : v)
std::cout << i << " ";
}
Output respectively is:
6 6 8 9 7 9 6 9 5 7
6 1 4 4 8 10 4 6 3 5
5 10 4 1 4 10 8 4 8 4
The output is consistent for each run but as you can see, they're different. Explain?

There is no required implementation for uniform_int_distribution<>. [rand.dist.general] specifies that:
The algorithms for producing each of the specified distributions are implementation-defined.
All that [rand.dist.uni.int] states is:
A uniform_int_distribution random number distribution produces random integers i, a <= i <= b, distributed
according to the constant discrete probability function
P(i | a, b) = 1/(b − a + 1) .
Each implementation is free to achieve this distribution how it wishes. What you are seeing is apparently three different implementations.

To be clear: the random number generators themselves are specified quite tightly--including the input parameters and results. To be technical, what's specified is the 10000th result from a default-constructed generator, but for any practical purpose a match on this result from a generator that's at least reasonably close to correct otherwise essentially guarantees that the generator is working correctly, and its outputs will match every other similar generator for a given seed.
For example, a quick test:
#include <random>
#include <iostream>
int main() {
std::mt19937 r;
for (int i=0; i<10000-2; i++)
r();
for (int i=0; i<3; i++)
std::cout << r() << "\n";
}
...shows identical results with every (recent) compiler I have handy:
1211010839
4123659995
725333953
The second of those three is the value required by the standard.
More leeway is given, however, in the distribution templates. A uniform_int_distribution has to map inputs to outputs uniformly, but there are different ways of doing that, and no requirement about which of those ways to use.
If you really need to produce a sequence of integers within a range that's not only uniformly distributed, but consistent between implementations, you'll probably have to implement your own distribution code. Doing this well isn't quite as trivial as most people initially think. You might want to look at one of my previous answers for a working implementation along with some explanation and a bit of test code.
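A minimal sketch of such a distribution, assuming std::mt19937 as the engine. The helper name portable_uniform is made up for illustration, and this is not the implementation from the linked answer. It relies only on the engine's raw output, which is specified, and rejects the biased tail instead of calling uniform_int_distribution:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: uniform integer in [lo, hi] built only from the raw
// output of std::mt19937, so the sequence depends on the seed alone and not
// on the standard library in use.
std::uint32_t portable_uniform(std::mt19937& gen, std::uint32_t lo, std::uint32_t hi) {
    assert(lo <= hi);
    const std::uint64_t range = std::uint64_t{hi} - lo + 1;  // number of possible results
    const std::uint64_t span = std::uint64_t{1} << 32;       // number of raw engine values
    const std::uint64_t limit = span - span % range;         // largest multiple of range <= span
    std::uint64_t raw;
    do {
        raw = gen();                                          // raw value in [0, 2^32)
    } while (raw >= limit);                                   // reject the tail that would add bias
    return lo + static_cast<std::uint32_t>(raw % range);
}
```

Calling portable_uniform(gen, 1, 10) in place of dist(rand) in the question's program should then give the same sequence wherever the engine itself conforms.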

Related

Random ints with different likelihoods

I was wondering if there was a way to have a random number between A and B where, if a number meets a certain requirement, it is more likely to appear than all the other numbers between A and B. For example: lower numbers are more likely to appear, so if A = 1 and B = 10 then 1 would be the likeliest and 10 would be the unlikeliest.
All help is appreciated :) (sorry for bad English/grammar/question)
C++11 (which you should absolutely be using by now) added the <random> header to the C++ standard library. This header provides much higher quality random number generators to C++. Using srand() and rand() has never been a very good idea because there's no guarantee of quality, but now it's truly inexcusable.
In your example, it sounds like you want what would probably be called a 'discrete triangular distribution': the probability mass function looks like a triangle. The easiest (but perhaps not the most efficient) way to implement this in C++ would be the discrete distribution included in <random>:
#include <iostream>
#include <map>
#include <numeric>
#include <random>
#include <vector>
// Weights are 0, 1, ..., max-1: value i is drawn with probability proportional
// to i, so 0 never appears and larger values are more likely.
auto discrete_triangular_distribution(int max) {
    std::vector<int> weights(max);
    std::iota(weights.begin(), weights.end(), 0);
    std::discrete_distribution<> dist(weights.begin(), weights.end());
    return dist;
}
int main() {
    std::random_device rd;
    std::mt19937 gen(rd());
    auto&& dist = discrete_triangular_distribution(10);
    std::map<int, int> counts;
    for (int i = 0; i < 10000; i++)
        ++counts[dist(gen)];
    for (auto count: counts) {
        std::cout << count.first << " generated ";
        std::cout << count.second << " times.\n";
    }
}
which for me gives the following output:
1 generated 233 times.
2 generated 425 times.
3 generated 677 times.
4 generated 854 times.
5 generated 1130 times.
6 generated 1334 times.
7 generated 1565 times.
8 generated 1804 times.
9 generated 1978 times.
Things more complex than this would be better served either by using one of the existing distributions (I have been told that all commonly used statistical distributions are included) or by writing your own distribution, which isn't too hard: it just has to be an object with a function call operator that takes a random bit generator and uses those bits to produce (in this case) random numbers. You could also create one that made random strings, or any arbitrary random objects, perhaps for testing purposes.
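To favour the low end instead, as the question asks, the weights can simply run the other way. A sketch assuming integer results in [1, max]; draw_low_biased is a hypothetical name:

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: returns a value in [1, max]. Value k gets weight
// (max - k + 1), so 1 is the most likely result and max the least likely.
int draw_low_biased(std::mt19937& gen, int max) {
    assert(max > 0);
    std::vector<double> weights;
    for (int k = 1; k <= max; ++k)
        weights.push_back(max - k + 1);
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return dist(gen) + 1;  // discrete_distribution yields an index starting at 0
}
```

Rebuilding the distribution on every call keeps the sketch short; code that draws many values would construct it once and reuse it.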
Your question doesn't specify which distribution to use. One option (of many) is to use the (negative) exponential distribution. This distribution is parameterized by a parameter λ. For each value of λ, the maximum result is unbounded (which needs to be handled in order to return results only in the range specified), so any λ could theoretically work; however, the properties of the CDF imply that it pays to choose something on the order of 1 / (to - from + 1).
(Plots of the exponential distribution omitted; from Wikipedia, by Skbkekas, CC BY 3.0.)
The following class works like a standard library distribution. Internally, it generates numbers in a loop, until a result in [from, to] is obtained.
#include <cstddef>
#include <iostream>
#include <iomanip>
#include <string>
#include <map>
#include <random>
#include <vector>
class bounded_discrete_exponential_dist {
public:
explicit bounded_discrete_exponential_dist(std::size_t from, std::size_t to) :
m_from{from}, m_to{to}, m_d{0.5 / (to - from + 1)} {}
explicit bounded_discrete_exponential_dist(std::size_t from, std::size_t to, double factor) :
m_from{from}, m_to{to}, m_d{factor} {}
template<class Gen>
std::size_t operator()(Gen &gen) {
while(true) {
const auto r = m_from + static_cast<std::size_t>(m_d(gen));
if(r <= m_to)
return r;
}
}
private:
std::size_t m_from, m_to;
std::exponential_distribution<> m_d;
};
Here is an example of using it:
int main()
{
std::random_device rd;
std::mt19937 gen(rd());
bounded_discrete_exponential_dist d{1, 10};
std::vector<std::size_t> hist(10, 0);
for(std::size_t i = 0; i < 99999; ++i)
++hist[d(gen) - 1];
for(auto h: hist)
std::cout << std::string(static_cast<std::size_t>(80 * h / 99999.), '+') << std::endl;
}
When run, it outputs a histogram like this:
$ ./a.out
++++++++++
+++++++++
+++++++++
++++++++
+++++++
+++++++
+++++++
+++++++
++++++
++++++
Your basic random number generator should produce high-quality, uniform random numbers p on 0 to 1 - epsilon. You then transform them to get the distribution you want. The simplest transform is of course (int)(p * N) in the common case of needing an integer on 0 to N - 1.
But there are many many other transforms you can try. Take the square root, for example, to bias it to 1.0, then 1 - p to set the bias towards zero. Or you can look up the Poisson distribution, which might be what you are after. You can also use a half-Gaussian distribution (statistical bell curve with the zero entries cut off, and presumably also the extreme tail of the distribution as it goes out of range).
There can be no right answer. Try various things, plot out ten thousand or so values, and pick the one that gives results you like.
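As a sketch of the square-root transform described above, assuming std::mt19937 as the uniform source; biased_low is a hypothetical name and the choice of transform is only one of many:

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical helper: returns an integer in [0, n - 1], with small values
// more likely. sqrt(u) is biased toward 1, so 1 - sqrt(u) is biased toward 0.
int biased_low(std::mt19937& gen, int n) {
    assert(n > 0);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    const double p = 1.0 - std::sqrt(unit(gen));  // in (0, 1], mostly near 0
    const int r = static_cast<int>(p * n);
    return r < n ? r : n - 1;                      // guard the p == 1.0 edge case
}
```

With n = 10, roughly 19% of the results should land on 0 and about 1% on 9, since P(1 - sqrt(u) < 0.1) = 1 - 0.81.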
You can make an array of values, the more likely value has more indexes and then choose a random index.
example:
int random[55];
int result;
int index = 0;
for (int i = 1 ; i <= 10 ; ++i)
for (int j = i ; j <= 10 ; ++j)
random[index++] = i;
result = random[rand() % 55];
Alternatively, you can draw a random number twice: first choose the maximum, then choose the result at or below it:
int max = rand() % 10 + 1; // This is your max value
int random = rand() % max + 1; // This is your result
Both ways will make 1 more likely than 2 , 2 more likely than 3 ... 9 more likely than 10.

Trying to produce a unique sequence of random numbers per iteration

As the title states, I'm trying to create a unique sequence of random numbers every time I run this little program.
However, sometimes I get results like:
102
201
102
The code
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;
int main() {
for (int i = 0; i < 3; i++) {
srand (time(NULL)+i);
cout << rand() % 3;
cout << rand() % 3;
cout << rand() % 3 << '\n' << endl;
}
}
Clearly srand doesn't have quite the magical functionality I wanted it to. I'm hoping that there's a logical hack around this though?
Edit1: To clarify, this is just a simple test program for what will be implemented on a larger scale. So instead of 3 iterations of rand%3, I might run 1000, or more of rand%50.
If I see 102 at some point in its operation, I'd want it so that I never see 102 again.
First of all, if you were going to use srand/rand, you'd want to seed it once (and only once) at the beginning of each execution of the program:
int main() {
    srand(time(NULL));
    for (int i = 0; i < 3; i++) {
        cout << rand() % 3;
        cout << rand() % 3;
        cout << rand() % 3 << '\n' << endl;
    }
}
Second, time typically only produces a result with a resolution of one second, so even with this correction, if you run the program twice in the same second, you can expect it to produce identical results in the two runs.
Third, you don't really want to use srand/rand anyway. The random number generators in <random> are generally considerably better (and, perhaps more importantly, are specified precisely enough that they represent a much better-known quantity).
#include <random>
#include <iostream>
int main() {
std::mt19937_64 gen { std::random_device()() };
std::uniform_int_distribution<int> d(0, 2);
for (int i = 0; i < 3; i++) {
for (int j=0; j<3; j++)
std::cout << d(gen);
std::cout << "\n";
}
}
Based on the edit, however, this still isn't adequate. What you really want is a random sample without duplication. To get that, you need to do more than just generate numbers. Randomly generated numbers not only can repeat, but inevitably will repeat if you generate enough of them (but the likelihood of repetition becomes quite high even when it's not yet inevitable).
As long as the number of results you're producing is small compared to the number of possible results, you can pretty easily just store results in a set as you produce them, and only treat a result as actual output if it wasn't previously present in the set:
#include <random>
#include <iostream>
#include <set>
#include <iomanip>
int main() {
std::mt19937_64 gen { std::random_device()() };
std::uniform_int_distribution<int> d(0, 999);
std::set<int> results;
for (int i = 0; i < 50;) {
int result = d(gen);
if (results.insert(result).second) {
std::cout << std::setw(5) << result;
++i;
if (i % 10 == 0)
std::cout << "\n";
}
}
}
This becomes quite inefficient if the number of results approaches the number of possible results. For example, let's assume you're producing numbers from 1 to 1000 (so 1000 possible results). Consider what happens if you decide to produce 1000 results (i.e., all possible results). In this case, when you're producing the last result, there's really only one possibility left--but rather than just producing that one possibility, you produce one random number after another after another, until you stumble across the one possibility that remains.
For such a case, there are better ways to do the job. For example, you can start with a container holding all the possible numbers. To generate an output, you generate a random index into that container. You output that number, and remove that number from the container, then repeat (but this time, the container is one smaller, so you reduce the range of your random index by one). This way, each random number you produce gives one output.
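The container approach just described could be sketched as follows. The function name and the swap-with-last removal are assumptions made for illustration; erasing from the middle would work too, only more slowly:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: draw `count` distinct values from [lo, hi].
std::vector<int> sample_without_replacement(std::mt19937& gen, int lo, int hi,
                                            std::size_t count) {
    assert(lo <= hi);
    std::vector<int> pool(static_cast<std::size_t>(hi - lo) + 1);
    std::iota(pool.begin(), pool.end(), lo);   // pool holds every possible result
    assert(count <= pool.size());
    std::vector<int> out;
    out.reserve(count);
    while (out.size() < count) {
        // The index range shrinks by one after every draw.
        std::uniform_int_distribution<std::size_t> pick(0, pool.size() - 1);
        const std::size_t i = pick(gen);
        out.push_back(pool[i]);
        pool[i] = pool.back();                  // remove in O(1): overwrite with the last element
        pool.pop_back();
    }
    return out;
}
```

Every random index yields exactly one output, so asking for all 1000 values from [1, 1000] takes exactly 1000 draws.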
It is possible to do the same by just shuffling an array of numbers. This has two shortcomings though. First, you need to shuffle them correctly--a Fisher-Yates shuffle works nicely, but otherwise it's easy to produce bias. Second, unless you actually do use all (or very close to all) the numbers in the array, this is inefficient.
For an extreme case, consider wanting a few (10, for example) 64-bit numbers. In this case, you start by filling an array with the numbers from 0 to 2^64-1. You then do 2^64-2 swaps. So, you're doing roughly 2^65 operations just to produce 10 numbers. In this extreme of a case, the problem should be quite obvious. Although it's less obvious if you produce (say) 1000 numbers of 32 bits apiece, you still have the same basic problem, just to a somewhat lesser degree. So, while this is a valid way to do things for a few specific cases, its applicability is fairly narrow.
Generate an array containing the 27 three digit numbers whose digits are less than 3. Shuffle it. Iterate through the shuffled array as needed, values will be unique until you've exhausted them all.
As other people have pointed out, don't keep reseeding your random number generator. Also, rand is a terrible generator, you should use one of the better choices available in C++'s standard libraries.
You are effectively generating a three digit base 3 number. Use your RNG of choice to generate a base 10 number in the range 0 .. 26 and convert it to base 3. That gives 000 .. 222.
If you absolutely must avoid repeats, then shuffle an array as pjs suggests. That will result in later numbers being 'less random' than the earlier numbers because they are taken from a smaller pool.
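A sketch combining the two suggestions above: shuffle the values 0 .. 26 and print each as a three-digit base 3 number. The function names are hypothetical, and std::mt19937 is just one possible choice of generator:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <string>
#include <vector>

// Convert a value in [0, 26] to its three-digit base 3 form, e.g. 11 -> "102".
std::string to_base3(int value) {
    assert(value >= 0 && value < 27);
    std::string digits(3, '0');
    for (int pos = 2; pos >= 0; --pos) {
        digits[pos] = static_cast<char>('0' + value % 3);
        value /= 3;
    }
    return digits;
}

// Return all 27 sequences "000" .. "222" in random order, each exactly once.
std::vector<std::string> shuffled_base3(std::mt19937& gen) {
    std::vector<int> values(27);
    std::iota(values.begin(), values.end(), 0);
    std::shuffle(values.begin(), values.end(), gen);
    std::vector<std::string> out;
    for (int v : values)
        out.push_back(to_base3(v));
    return out;
}
```

Iterating over the returned vector gives unique sequences until all 27 are used up; after that a repeat is unavoidable.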

Random() efficiency in C++

I am writing a function where I need a random number between 1 and 10. One of the easiest ways is to use the random() libc call. I am going to call this function a lot, but I don't know how efficient it will be. If anyone has an idea about the efficiency of random(), that would be a help.
Also I notice that random() give the same pattern in 2 runs.
int main()
{
for(int i=0;i<10;i++)
{
cout << random() % 10 << endl;
}
}
Output 1st time :- 3 6 7 5 3 5 6 2 9 1
Second time also I got same output.
Then how come it's random ?
Others have explained why it's the same sequence every time, but this is how you generate a random number with C++:
#include <random>
int main() {
std::random_device rd{}; //(hopefully) truly random device
std::mt19937 engine{rd()}; //seed a pseudo rng with random_device
std::uniform_int_distribution<int> d(1,10); //1 to 10, inclusive
int RandNum = d(engine); //generate
return 0;
}
http://en.cppreference.com/w/cpp/numeric/random
The actual execution time depends on your platform of course, but it is pretty much straight forward, couple multiplication and divisions or shifts:
What common algorithms are used for C's rand()?
I don't think you should be worried. If you need a lot of random numbers, then another random source probably would be a better choice for you.
If you are looking for tweaks, how about splitting the result from rand() into individual digits to get several results per call.
This way is very simple and effective, you only need to set the seed:
#include <iostream>
#include <stdlib.h>
#include <time.h>
using namespace std;
int main(){
srand(time(NULL));
for(int i=0;i<10;i++)
cout << rand() % 10 << endl;
}
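To fix the problem of getting the same pattern in 2 runs, seed the generator once at startup: srand() for rand(), as above, or srandom() for random(). (randomize() exists only as a non-standard Borland extension.)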

Range of random numbers as applied to array of words

I am very new to C++. I am basically self teaching. I came across a Hangman game project that I am using for practice. My problem is to do with the random word generation.
I know that for example int n=rand()% 10 means generate random numbers from range 0 to 10.
Now in the game there is an array with 10 elements for the ten words. What I am confused about is that if numbers from 0 to 10 is randomly generated, that would be a selection from 11 random numbers. However the array only has 10 elements (0-9).
What happens when the random generator chooses 10? Element 10 does not exist in the array, right?
So should this code not have been int n=rand()% 9 instead?
Also, could the same word be repeated before all words have been selected in the game? That would obviously not be ideal. If it could, then how do I prevent this?
I know that for example int n=rand()% 10 means generate random numbers
from range 0 to 10.
Not exactly. Generated range is then [0,9].
Side note: since C++11 you should prefer the <random> facilities, for example std::mt19937 with std::uniform_int_distribution:
#include <random>
#include <iostream>
int main()
{
std::random_device rd;
std::mt19937 gen( rd());
// here (0,9) means endpoints included (this is a call to constructor)
std::uniform_int_distribution<> dis(0, 9);
std::cout << dis(gen) << std::endl; // std::endl forces std::cout to
// flush its content; you may use '\n'
// instead to keep the output buffered
return 0;
}
If you try to subscript array with out-of-range index then it is a disaster named Undefined Behavior:
Undefined behavior and sequence points
What are all the common undefined behaviours that a C++ programmer should know about?
You misunderstand ranges and modulus in C/C++. Ranges include the first element but (usually) not the last, so the range [0, 10) is 0,1,2,3,...,9. For a non-negative x, the expression x % 10 yields the remainder after division by 10, which always falls in [0, 10), i.e. 0,1,2,3,...,9. rand() never returns a negative value, so rand() % 10 is a valid index into a 10-element array.
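On the second part of the question (avoiding repeated words), one common approach is to shuffle the word list once and hand the words out in order. A sketch; WordDeck and its seed parameter are hypothetical names, not part of the Hangman project:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: deals every word once before any word can repeat.
class WordDeck {
public:
    WordDeck(std::vector<std::string> words, unsigned seed)
        : m_words(std::move(words)), m_gen(seed), m_next(0) {
        assert(!m_words.empty());
        std::shuffle(m_words.begin(), m_words.end(), m_gen);
    }

    // Returns the next word; reshuffles only after all words have been dealt.
    const std::string& next() {
        if (m_next == m_words.size()) {
            std::shuffle(m_words.begin(), m_words.end(), m_gen);
            m_next = 0;
        }
        return m_words[m_next++];
    }

private:
    std::vector<std::string> m_words;
    std::mt19937 m_gen;
    std::size_t m_next;
};
```

In a real game the seed would typically come from std::random_device; a fixed seed is mainly useful for testing.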

rand() gives still the same value

I noticed that while practicing by doing a simple console-based quiz app. When I'm using rand() it gives me the same value several times in a row. The smaller number range, the bigger the problem is.
For example
for (i=0; i<10; i++) {
x = rand() % 20 + 1;
cout << x << ", ";
}
Will give me 1, 1, 1, 2, 1, 1, 1, 1, 14, - there are definitely too many ones, right? I usually got from none to 4 odd numbers (rest is just the same, it can also be 11, 11, 11, 4, 11 ...)
Am I doing something wrong? Or is rand() not as random as I thought it was?
(Or is it just some habit from C#/Java that I'm not aware of? It happens a lot to me, too...)
If I run that code a couple of times, I get different output. Sure, not as varied as I'd like, but seemingly not deterministic (although of course it is, since rand() only gives pseudo-random numbers...).
However, the way you treat your numbers isn't going to give you a uniform distribution over [1,20], which I guess is what you expect. To achieve that is rather more complicated, but in no way impossible. For an example, take a look at the documentation for <random> at cplusplus.com - at the bottom there's a showcase program that generates a uniform distribution over [0,1). To get that to [1,20), you simply change the input parameters to the generator - it can give you a uniform distribution over any range you like.
I did a quick test, and called rand() one million times. As you can see in the output below, even at very large sample sizes, there are some nonuniformities in the distribution. As the number of samples goes to infinity, the line will (probably) flatten out, but something like rand() % 20 + 1 gives you a distribution that takes a very long time to do so. If you use something else (like the example above), your chances of achieving a uniform distribution are better even for quite small sample sizes.
Edit:
I see several others posting about using srand() to seed the random number generator before using it. This is good advice, but it won't solve your problem in this case. I repeat: seeding is not the problem in this case.
Seeds are mainly used to control the reproducibility of the output of your program. If you seed your random number with a constant value (e.g. 0), the program will give the same output every time, which is useful for testing that everything works the way it should. By seeding with something non-constant (the current time is a popular choice) you ensure that the results vary between different runs of the program.
Not calling srand() at all is the same as calling srand(1), by the C++ standard. Thus, you'll get the same results every time you run the program, but you'll have a perfectly valid series of pseudo-random numbers within each run.
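A small sketch of the reproducibility point, assuming nothing beyond the standard srand/rand pair; first_values is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical helper: reseed rand() and return the next n values it produces.
std::vector<int> first_values(unsigned seed, int n) {
    assert(n >= 0);
    std::srand(seed);
    std::vector<int> out;
    for (int i = 0; i < n; ++i)
        out.push_back(std::rand());
    return out;
}
```

Two calls with the same seed, such as first_values(1, 5) and first_values(1, 5), return identical vectors on a given implementation. The actual numbers can differ between standard libraries.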
Sounds like you're hitting modulo bias.
Scaling your random numbers to a range by using % is not a good idea. It's just about passable if you're reducing it to a range that is a power of 2, but still pretty poor. It is primarily influenced by the low-order bits, which are frequently less random with many algorithms (and rand() in particular), and it contracts to the smaller range in a non-uniform fashion because the range you're reducing to will not equally divide the range of your random number generator. To reduce the range you should be using a division and loop, like so:
// generate a number from 0 to range-1
int divisor = RAND_MAX/(range+1);
int result;
do
{
    result = rand()/divisor;
} while (result >= range);
This is not as inefficient as it looks because the loop is nearly always passed through only once. Also if you're ever going to use your generator for numbers that approach RAND_MAX you'll need a more complex equation for divisor which I can't remember off-hand.
Also, rand() is a very poor random number generator, consider using something like a Mersenne Twister if you care about the quality of your results.
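The division-and-loop idea from this answer, wrapped into a function as a sketch. The name bounded_rand and the precondition on range are assumptions added here:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: returns a value in [0, range) without modulo bias.
// Assumes 0 < range < RAND_MAX so that the divisor is at least 1.
int bounded_rand(int range) {
    assert(range > 0 && range < RAND_MAX);
    // Each result k corresponds to exactly `divisor` consecutive rand() values.
    const int divisor = RAND_MAX / (range + 1);
    int result;
    do {
        result = std::rand() / divisor;
    } while (result >= range);  // discard values that fall past the last full bucket
    return result;
}
```

For the question's case, bounded_rand(20) + 1 gives a value in [1, 20]. The quality still depends on rand() itself, so this only removes the bias added by the range reduction.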
You need to call srand() first, passing it the current time as the seed, to get different pseudorandom values on each run.
Example:
#include <iostream>
#include <string>
#include <vector>
#include "stdlib.h"
#include "time.h"
using namespace std;
int main()
{
srand(time(0));
int x,i;
for (i=0; i<10; i++) {
x = rand() % 20 + 1;
cout << x << ", ";
}
system("pause");
return 0;
}
If you don't want any of the generated numbers to repeat and memory isn't a concern you can use a vector of ints, shuffle it randomly and then get the values of the first N ints.
Example:
#include <iostream>
#include <vector>
#include <algorithm>
#include <random>
#include <cstdlib>
using namespace std;
int main()
{
    //Get 5 random numbers between 1 and 20
    vector<int> v;
    for(int i=1; i<=20; i++)
        v.push_back(i);
    // random_shuffle was removed in C++17; shuffle takes an explicit generator
    mt19937 gen(random_device{}());
    shuffle(v.begin(),v.end(),gen);
    for(int i=0; i<5; i++)
        cout << v[i] << endl;
    system("pause");
    return 0;
}
The likely problems are that you are using the same "random" numbers each time and that any int mod 1 is zero. In other words (myInt % 1 == 0) is always true. Instead of %1, use % theBiggestNumberDesired.
Also, seed your random numbers with srand. Use a constant seed to verify that you are getting good results. Then change the seed to make sure you are still getting good results. Then use a more random seed like the clock to test further. Release with the random seed.