Weird question I guess. It's out of curiosity.
Using the rand() function with the range set to 1-10, I ran a test a few times on my machine's UNIX operating system, more specifically Ubuntu. My results always showed higher numbers (greater than 5) being returned more often. It didn't seem random at all.
I also read that using the modulus operation introduces some kind of bias:
Notice though that this modulo operation does not generate uniformly distributed random numbers in the span (since in most cases this operation makes lower numbers slightly more likely).
Why is that? Also, it said lower numbers become more likely; however, I get more high numbers.
How to test bias
The rand() generator on your system (the one in glibc) has problems, but excessive bias is not among them. Let's assume that you use the following code to generate random numbers within a given range.
#include <stdio.h>
#include <stdlib.h>
int random_int(int min, int max)
{
    return min + rand() % (max - min + 1);
}
Let's also assume that you do not seed the generator.
int main(int argc, char **argv)
{
    int histo[10];
    for (int i = 0; i < 10; i++)
        histo[i] = 0;
    for (int i = 0; i < 10000; i++)
        histo[random_int(1, 10) - 1]++;
    for (int i = 0; i < 10; i++)
        printf("%d\n", histo[i]);
    return 0;
}
This will give us 10,000 samples, which is small but workable. I get the following results. If you are using the same version of glibc, you will get exactly the same results.
We expect bins to follow the binomial distribution, given an unbiased generator. For 10,000 samples, we expect the per-bin variance to be Np(1-p) or 900, which gives a standard deviation of exactly 30. Our sample variance is 1105. Now, I'm not going to do anything rigorous here... I'm going to pretend that the binomial distributions are normal... and I'm just going to do a simple chi-square test. The results are p=0.2. Not exactly damning.
So if you want to test your random number generator, remember to do the math afterwards to interpret the results of your test.
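As a rough sketch of "doing the math afterwards", the chi-square statistic for a histogram of equally likely bins can be computed as below. The name chi_square_uniform is made up for this illustration, and the sketch stops at the statistic: turning it into a p-value still needs a chi-square table or a statistics library, using (number of bins - 1) degrees of freedom.

```cpp
#include <cstddef>
#include <vector>

// Pearson chi-square statistic for observed bin counts, assuming every
// bin is equally likely under the null hypothesis (an unbiased generator).
// chi_square_uniform is a hypothetical helper name, not a library function.
double chi_square_uniform(const std::vector<long>& counts)
{
    if (counts.empty()) return 0.0;

    long total = 0;
    for (long c : counts) total += c;

    const double expected = static_cast<double>(total) / counts.size();
    if (expected == 0.0) return 0.0;

    double chi2 = 0.0;
    for (long c : counts) {
        const double diff = c - expected;
        chi2 += diff * diff / expected;   // (observed - expected)^2 / expected
    }
    return chi2;
}
```

Passing the ten histogram counts from a test run gives a single number that can be compared against the chi-square distribution with 9 degrees of freedom.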
Modulo bias
The modulo bias actually increases the probability of lower numbers, not higher numbers. The bias is very small for such ranges (1..10) because RAND_MAX is 2^31-1 for glibc, and this increases the probability of small numbers by something like 1 in 200 million. You would need to perform a much larger number of tests to expose modulo bias.
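To make the size of that bias concrete, here is a small sketch that counts how many raw generator outputs land on each residue. The helper name count_for_residue is hypothetical, and it assumes the raw outputs are the integers 0..raw_max with raw_max below the 64-bit maximum.

```cpp
#include <cstdint>

// Number of raw generator outputs in [0, raw_max] that map to `residue`
// under `raw % n`. Assumes raw_max < UINT64_MAX and residue < n.
// count_for_residue is a hypothetical helper name.
std::uint64_t count_for_residue(std::uint64_t raw_max, std::uint64_t n,
                                std::uint64_t residue)
{
    const std::uint64_t total = raw_max + 1;  // number of possible raw outputs
    const std::uint64_t base  = total / n;    // every residue gets at least this many
    const std::uint64_t extra = total % n;    // the first `extra` residues get one more
    return base + (residue < extra ? 1 : 0);
}
```

With raw_max = 2147483647 and n = 10, residues 0 through 7 each receive 214748365 raw values while 8 and 9 receive 214748364, so the low residues are favoured by roughly one part in 215 million.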
The main reason that modulo is discouraged is that the low bits of common rand() implementations show poor independence. Of course, you also shouldn't use this technique to generate large ranges.
If you really want to test your random number generator, I suggest looking at the late Marsaglia's "Diehard" tests. If you just want a good random number generator, you can use arc4random, Mersenne Twister, or /dev/urandom. Your choice will differ depending on whether you are developing a cryptographic application or using the results for Monte Carlo simulation.
What is the difference between a truly random number and a pseudo random number in C++? (I guess, that it wants me to make my question longer, although I've put my problem as basically as I can. I thought that was the reason my last question wasn't received so well, I could be wrong. -_-. I did google this, and I've not been successful.)
Pseudo random numbers aren't truly random numbers because they are generated using a deterministic process. For example, if I wanted a random number between 0 and 4 and I had access to the current time in seconds, I could generate pseudo random numbers by applying modulus 5 to the time value and this would return a different number between 0 and 4. The reason the number isn't truly random is that it is possible to predict that number before generation by knowing the time. A truly random number could never be predicted.
You could Google "determinism" for more.
This question could lead into a very deep philosophical rabbit hole. It is an open question if there really is a process in nature that can be truly non-deterministic. Quantum mechanics suggests that the universe is non-deterministic, but with deterministic probabilistic laws on some level. Nevertheless, this does not rule out the possibility that at some more fundamental level, a deterministic process resides, which simply appears random, at a larger scale. Randomness and chaos are intimately related, but they do not necessarily mean the same thing.
As a result, the word random could mean several different things. For practical purposes, we can consider randomness as a certain type of mathematical pattern, with certain characteristics. A random process typically follows some probabilistic distribution (i.e. white noise, pink noise, Gaussian, Poisson, ...), but it is not possible to practically predict individual results of when you sample the output of the random process.
A pseudo random number generator or PRNG uses an algorithm that employs deterministic chaos to create a pattern that appears statistically similar to a true random process. PRNGs typically use a starting point or seed. It is common to use the output of a time function to seed a PRNG. There are a multitude of different algorithms for PRNGs, which differ in performance and quality.
One significant weakness of a PRNG is that the output eventually repeats, that is, it is periodic. It is also possible, if one knows the algorithm and the starting point (seed), to recreate the exact same sequence, because digital computers are necessarily deterministic devices. Security mechanisms which rely on PRNGs often have hidden vulnerabilities, due to the PRNGs that they employ.
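The reproducibility described above is easy to see with a standard engine. The following is a minimal sketch using std::mt19937; the helper name draw_sequence is invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw `count` values from a Mersenne Twister started from `seed`.
// draw_sequence is a hypothetical helper name used only for this sketch.
std::vector<std::uint32_t> draw_sequence(std::uint32_t seed, std::size_t count)
{
    std::mt19937 gen(seed);                 // the seed fully determines the sequence
    std::vector<std::uint32_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        out.push_back(static_cast<std::uint32_t>(gen()));
    return out;
}
```

Calling draw_sequence twice with the same seed returns identical vectors, which is exactly what makes a PRNG predictable to anyone who knows the algorithm and the seed.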
Certain applications, especially in security, or for example a lottery, require what is, for practical purposes considered a true random number generator. Hardware devices are available, which exploit physical phenomena. Thermal noise (i.e. random motion of electrons), for example, is governed by Quantum Mechanical behaviour and is often used in RNG hardware implementations. Certain quantum mechanical phenomena are generally considered to be random. According to most standard interpretations of Quantum Mechanics, even if you had a complete model of the system (i.e. time evolution of Schrödinger Equation) and knowledge of all the initial variables, to infinite precision, you still would not be able to predict exactly the next outcome of a measurement.
In theory, you could also use radio background noise (think television static or radio noise between the stations). Any of these signal sources can be filtered or used as input themselves for a PRNG to create high quality random sequences, without any noticeable statistical predictability.
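Here is a very simple toy PRNG algorithm for you to play around with to help understand the concept.
// linear congruential generator
// a, c and m (multiplier, increment, modulus) are constants you must choose; they are not defined here
int seed = 123456789;
int rand() {
    seed = ( a * seed + c ) % m;
    return seed;
}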
The above is taken from Special simple random number generator. I hope that sheds some light on your question.
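For anyone who wants something that compiles as-is, here is a sketch of the same recurrence with concrete constants. The values of a, c and m below are an assumption on my part (a commonly quoted set), since the snippet above does not say which ones it intends, and the ToyLcg name is made up. It uses 64-bit unsigned arithmetic so the multiplication cannot overflow a signed int.

```cpp
#include <cstdint>

// Toy linear congruential generator: state = (a * state + c) mod m.
// The constants are assumptions chosen for illustration; this is a toy,
// not a generator to rely on for anything serious.
struct ToyLcg {
    std::uint64_t state;

    explicit ToyLcg(std::uint64_t seed) : state(seed) {}

    std::uint32_t next()
    {
        const std::uint64_t a = 1103515245ULL;  // multiplier (assumed)
        const std::uint64_t c = 12345ULL;       // increment (assumed)
        const std::uint64_t m = 1ULL << 31;     // modulus (assumed)
        state = (a * state + c) % m;            // unsigned 64-bit math, no signed overflow
        return static_cast<std::uint32_t>(state);
    }
};
```

Two ToyLcg objects built from the same seed produce the same values in the same order, and every value is below 2^31.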
What is the difference between a truly random number and a pseudo
random number in C++?
In C++, it is possible to repeat a pseudo random number sequence. A 'truly random' number (I think not supported by C++ nor C, without custom hardware) can not be repeated.
Marked as C++, so here is one C++ approach.
Lately I have been using what C++ provides, but I still control the range of my numbers by shuffling a simple vector of values. This also supports applying the same random shuffle to more than one test, or to repeat a sequence.
typedef std::vector<int> IntVec_t; // a convenient type
IntVec_t iVec; // a std vector of int
for (int i=1; i<=32; ++i)
iVec.push_back(i); // add 32 ints, range 1..32
std::random_device rd; // the generator uses this
std::mt19937_64 gen(rd()); // there are several generators available
std::shuffle (iVec.begin(), iVec.end(), gen); // shuffle the vector
initialized to: ( 16 elements to a line)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
Shuffled contents (possible) :
29 26 12 16 20 7 31 28 30 18 4 23 17 14 32 1
11 13 5 6 9 2 19 10 21 24 3 8 25 27 22 15
In short:
pseudo-random numbers repeat, while
truly random numbers do not repeat; they have a seed, such as the time or anything else,
which will obviously be different each and every time.
For a monte carlo integration process, I need to pull a lot of random samples from
a histogram that has N buckets, and where N is arbitrary (i.e. not a power of two) but
doesn't change at all during the course of the computation.
By a lot, I mean something on the order of 10^10, 10 billion, so pretty much any
kind of lengthy precomputation is likely worth it in the face of the sheer number of samples.
I have at my disposal a very fast uniform pseudo random number generator that
typically produces unsigned 64 bits integers (all the ints in the discussion
below are unsigned).
The naive way to pull a sample : histogram[ prng() % histogram.size() ]
The naive way is very slow: the modulo operation uses an integer division (IDIV),
which is terribly expensive, and the compiler, not knowing the value of histogram.size()
at compile time, can't work its usual magic.
As a matter of fact, the bulk of my computation time is spent extracting that darn modulo.
The slightly less naive way: I use libdivide, which is capable
of pulling off a very fast "divide by a constant not known at compile time".
That gives me a very nice win (25% or so), but I have a nagging feeling that I can do
better, here's why:
First intuition: libdivide computes a division. What I need is a modulo, and to get there
I have to do an additional mult and a sub : mod = dividend - divisor*(uint64_t)(dividend/divisor). I suspect there might be a small win there, using libdivide-type
techniques that produce the modulo directly.
Second intuition: I am actually not interested in the modulo itself. What I truly want is
to efficiently produce a uniformly distributed integer value that is guaranteed to be strictly smaller than N.
The modulo is a fairly standard way of getting there, because of two of its properties:
A) mod(prng(), N) is guaranteed to be uniformly distributed if prng() is
B) mod(prng(), N) is guaranteed to belong to [0,N[
But the modulo is/does much more than just satisfy the two constraints above, and in fact
it probably does too much work.
All I need is a function, any function, that obeys constraints A) and B) and is fast.
So, long intro, but here come my two questions:
Is there something out there equivalent to libdivide that computes integer modulos directly?
Is there some function F(X, N) of integers X and N which obeys the following two constraints:
If X is a random variable uniformly distributed then F(X,N) is also uniformly distributed
F(X, N) is guaranteed to be in [0, N[
(PS: I know that if N is small, I do not need to consume all the 64 bits coming out of
the PRNG. As a matter of fact, I already do that. But like I said, even that optimization
is a minor win when compared to the big fat loss of having to compute a modulo).
Edit: prng() % N is indeed not exactly uniformly distributed. But for N large enough, I don't think it's much of a problem (or is it?)
Edit 2 : prng() % N is indeed potentially very badly distributed. I had never realized how bad it could get. Ouch. I found a good article on this :
Under the circumstances, the simplest approach may work the best. One extremely simple approach that might work out if your PRNG is fast enough would be to pre-compute one less than the next larger power of 2 than your N to use as a mask. I.e., given some number that looks like 0001xxxxxxxx in binary (where x means we don't care if it's a 1 or a 0) we want a mask like 000111111111.
From there, we generate numbers as follows:
1. Generate a number.
2. AND it with your mask.
3. If result > n, go to 1.
The exact effectiveness of this will depend on how close N is to a power of 2. Each successive power of 2 is (obviously enough) double its predecessor. So, in the best case N is exactly one less than a power of 2, and our test in step 3 always passes. We've added only a mask and a comparison to the time taken for the PRNG itself.
In the worst case, N is exactly equal to a power of 2. In this case, we expect to throw away roughly half the numbers we generated.
On average, N ends up roughly halfway between powers of 2. That means, on average, we throw away about one out of four inputs. We can nearly ignore the mask and comparison themselves, so our speed loss compared to the "raw" generator is basically equal to the number of its outputs that we discard, or 25% on average.
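The mask-and-retry loop can be sketched as follows. Both function names are hypothetical, the generator is assumed to return uniformly distributed 64-bit values (std::mt19937_64 does), and the result is in [0, bound] inclusive to match the "result > n" test in step 3.

```cpp
#include <cstdint>
#include <random>

// Smallest value of the form 2^k - 1 that is >= bound (all low bits set).
std::uint64_t mask_for(std::uint64_t bound)
{
    std::uint64_t mask = bound;
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    mask |= mask >> 32;
    return mask;
}

// Uniform integer in [0, bound] by masking and retrying.
// Assumes `rng()` yields uniformly distributed 64-bit values.
template <class Rng>
std::uint64_t masked_uniform(Rng& rng, std::uint64_t bound, std::uint64_t mask)
{
    std::uint64_t x;
    do {
        x = rng() & mask;   // steps 1 and 2: generate, then mask
    } while (x > bound);    // step 3: retry when out of range
    return x;
}
```

For a histogram of N buckets one would compute mask_for(N - 1) once up front and call masked_uniform(rng, N - 1, mask) per sample, so the per-sample cost is the generator plus a mask and a compare.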
If you have fast access to the needed instruction, you could 64-bit multiply prng() by N and return the high 64 bits of the 128-bit result. This is sort of like multiplying a uniform real in [0, 1) by N and truncating, with bias on the order of the modulo version (i.e., practically negligible; a 32-bit version of this answer would have small but perhaps noticeable bias).
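A minimal sketch of the multiply-and-take-the-high-half idea is shown below. It relies on unsigned __int128, which is a GCC/Clang extension rather than standard C++ (an assumption about the toolchain); other compilers would need an intrinsic instead. The function name mul_high_range is made up.

```cpp
#include <cstdint>

// Map a uniform 64-bit value x to [0, n) by taking the high 64 bits of
// the 128-bit product x * n. Uses unsigned __int128 (GCC/Clang extension).
// mul_high_range is a hypothetical helper name.
std::uint64_t mul_high_range(std::uint64_t x, std::uint64_t n)
{
    const unsigned __int128 product = static_cast<unsigned __int128>(x) * n;
    return static_cast<std::uint64_t>(product >> 64);
}
```

Used as histogram[mul_high_range(prng(), histogram.size())], this replaces the division with a single widening multiply.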
Another possibility to explore would be use word parallelism on a branchless modulo algorithm operating on single bits, to get random numbers in batches.
Libdivide, or any other complex way to optimize that modulo, is simply overkill. In a situation like yours, the only sensible approach is to
ensure that your table size is a power of two (add padding if you must!)
replace the modulo operation with a bitmask operation. Like this:
size_t tableSize = 1 << 16;
size_t tableMask = tableSize - 1;
histogram[prng() & tableMask]
A bitmask operation is a single cycle on any CPU that is worth its money; you can't beat its speed.
I don't know about the quality of your random number generator, but it may not be a good idea to use the last bits of the random number. Some RNGs produce poor randomness in the last bits and better randomness in the upper bits. If that is the case with your RNG, use a bitshift to get the most significant bits:
size_t bitCount = 16;
histogram[prng() >> (64 - bitCount)]
This is just as fast as the bitmask, but it uses different bits.
You could extend your histogram to a "large" power of two by cycling it, filling in the trailing spaces with some dummy value (guaranteed to never occur in the real data). E.g. given a histogram
[10, 5, 6]
extend it to length 16 like so (assuming -1 is an appropriate sentinel):
[10, 5, 6, 10, 5, 6, 10, 5, 6, 10, 5, 6, 10, 5, 6, -1]
Then sampling can be done via a binary mask histogram[prng() & mask] where mask = new_length - 1 (new_length being a power of two, 16 in the example above), with a check for the sentinel value to retry, that is,
int value;
do {
value = histogram[prng() & mask];
} while (value == SENTINEL);
// use `value` here
The extension is longer than necessary to make retries unlikely by ensuring that the vast majority of the elements are valid (e.g. in the example above only 1/16 lookups will "fail", and this rate can be reduced further by extending it to e.g. 64). You could even use a "branch prediction" hint (e.g. __builtin_expect in GCC) on the check so that the compiler orders code to be optimal for the case when value != SENTINEL, which is hopefully the common case.
This is very much a memory vs. speed trade-off.
Just a few ideas to complement the other good answers:
What percent of time is spent in the modulo operation, and how do you know what that percent is? I only ask because sometimes people say something is terribly slow when in fact it is less than 10% of the time and they only think it's big because they're using a silly self-time-only profiler. (I have a hard time envisioning a modulo operation taking a lot of time compared to a random number generator.)
When does the number of buckets become known? If it doesn't change too frequently, you can write a program-generator. When the number of buckets changes, automatically print out a new program, compile, link, and use it for your massive execution.
That way, the compiler will know the number of buckets.
Have you considered using a quasi-random number generator, as opposed to a pseudo-random generator? It can give you higher precision of integration in much fewer samples.
Could the number of buckets be reduced without hurting the accuracy of the integration too much?
The non-uniformity dbaupp cautions about can be side-stepped by rejecting and redrawing values no less than M*(2^64/M) (before taking the modulus).
If M can be represented in no more than 32 bits, you can get more than one value less than M by repeated multiplication (see David Eisenstat's answer) or divmod; alternatively, you can use bit operations to single out bit patterns long enough for M, again rejecting values no less than M.
(I'd be surprised at modulus not being dwarfed in time/cycle/energy consumption by random number generation.)
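Here is a sketch of reject-and-redraw before taking the modulus. Note that it is a variation on the rule stated above: rather than rejecting the top 2^64 mod M values, it rejects the bottom 2^64 mod M values, which leaves the same number of accepted values but avoids having to represent 2^64. The name unbiased_mod is hypothetical and the generator is assumed to return uniform 64-bit values.

```cpp
#include <cstdint>
#include <random>

// Unbiased integer in [0, m), m > 0, from a uniform 64-bit generator.
// Rejects the lowest (2^64 mod m) raw values so that the accepted range
// has a size that is an exact multiple of m.
template <class Rng>
std::uint64_t unbiased_mod(Rng& rng, std::uint64_t m)
{
    // (2^64 - m) mod m equals 2^64 mod m; unsigned wrap-around is well defined.
    const std::uint64_t threshold = (std::uint64_t{0} - m) % m;
    std::uint64_t x;
    do {
        x = rng();
    } while (x < threshold);   // redraw the few values that would cause bias
    return x % m;
}
```

For small m the loop almost never repeats, since at most m - 1 out of 2^64 raw values are rejected; the cost is one extra modulus, which can be hoisted out when m is fixed.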
To fill the buckets, you can use std::binomial_distribution to fill each bucket directly instead of adding samples to the buckets one at a time.
The following may help:
int nrolls = 60; // number of experiments
const std::size_t N = 6;
unsigned int bucket[N] = {};
std::mt19937 generator(time(nullptr));
for (int i = 0; i != N; ++i) {
    double proba = 1. / static_cast<double>(N - i);
    std::binomial_distribution<int> distribution (nrolls, proba);
    bucket[i] = distribution(generator);
    nrolls -= bucket[i];
}
Instead of integer division you can use fixed point math, i.e. integer multiplication and a bitshift. Say your prng() returns values in the range 0-65535 and you want this quantized to the range 0-99; then you do (prng()*100)>>16. Just make sure that the multiplication doesn't overflow your integer type, so you may have to shift the result of prng() right first. Note that this mapping is better than modulo since it retains the uniform distribution.
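A small sketch of that fixed-point scaling for a 16-bit input is given below. The helper name scale16 is invented for the example; the product is computed in 32 bits so it cannot overflow as long as n is at most 65536.

```cpp
#include <cstdint>

// Scale a 16-bit value x in [0, 65535] to [0, n) with a multiply and a shift.
// The 32-bit product cannot overflow for n <= 65536.
// scale16 is a hypothetical helper name.
std::uint32_t scale16(std::uint16_t x, std::uint32_t n)
{
    return (static_cast<std::uint32_t>(x) * n) >> 16;
}
```

With n = 100, each output value is hit by either 655 or 656 of the 65536 possible inputs, so the buckets differ by at most one input.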
Thanks everyone for your suggestions.
First, I am now thoroughly convinced that modulo is really evil.
It is both very slow and yields incorrect results in most cases.
After implementing and testing quite a few of the suggestions, what
seems to be the best speed/quality compromise is the solution proposed
by @Gene:
pre-compute normalizer as:
auto normalizer = histogram.size() / (1.0+urng.max());
draw samples with:
return histogram[ (uint32_t)floor(urng() * normalizer) ];
It is the fastest of all methods I've tried so far, and as far as I can tell,
it yields a distribution that's much better, even if it may not be as perfect
as the rejection method.
Edit: I implemented David Eisenstat's method, which is more or less the same as Jarkkol's suggestion : index = (rng() * N) >> 32. It works as well as the floating point normalization and it is a little faster (9% faster in fact). So it is my preferred way now.
I'm writing in C/C++ and I want to create a lot of random numbers which are bigger than 100,000. How would I do that? With rand()?
You wouldn't do that with rand, but with a proper random number generator which comes with newer C++, see e.g.
const int min = 100000;
const int max = 1000000;
std::default_random_engine generator;
std::uniform_int_distribution<int> distribution(min,max);
int random_int = distribution(generator); // generate random int flat in [min, max]
Don't forget to properly seed your generator.
Above I imply that rand is not a "proper" pseudo-RNG since it typically comes with a number of shortcomings. In the best case, it lacks abstraction, so picking from a different distribution becomes hard and error-prone (search the web for e.g. "random range modulus"). Also, replacing the underlying engine used to generate the random numbers is AFAIK impossible by design. In less optimal cases rand as a pseudo-RNG doesn't provide long enough sequence lengths for many/most use cases. With TR1/C++11, generating high-quality random numbers is easy enough to always use the proper solution, so that one doesn't need to first worry about the quality of the pseudo-RNG in use when obscure bugs show up. Microsoft's STL gave a nice summary talk on the topic at GoingNative 2013.
// Initialize rand()'s sequence. A typical seed value is the return value of time()
srand(time(NULL));
long range = 150000; // 100000 + range is the maximum value you allow
long number = 100000 + (rand() * range) / RAND_MAX;
You may need to use something larger than a long int for range and number if (100000 + range) will exceed its max value.
In general you can use a random number generator that goes between 0 and 1, and get any range you want by doing the following transformation:
x' = r x + b
So if you want random numbers between, say, 100,000 and 300,000, and x is your random number between 0 and 1, then you'd set r to be 200,000 and b to be 100,000 and x' will be within the range you want.
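The transformation x' = r x + b can be sketched like this, using std::uniform_real_distribution as the source of x. Both helper names are hypothetical, and any standard engine can be passed in.

```cpp
#include <random>

// Map x in [0, 1) to [b, b + r) using x' = r * x + b.
double rescale(double x, double r, double b)
{
    return r * x + b;
}

// Draw one value between lo and hi from any standard engine.
// uniform_between is a hypothetical helper name.
template <class Rng>
double uniform_between(Rng& rng, double lo, double hi)
{
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    return rescale(unit(rng), hi - lo, lo);   // r = hi - lo, b = lo
}
```

For the example in the text, uniform_between(gen, 100000.0, 300000.0) uses r = 200,000 and b = 100,000.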
If you don't have access to the C++ builtins yet, Boost has a bunch of real randomizers in Boost.Random, including specific solutions for your apparent problem space.
I'd echo the comments that clarifying edits in your question would improve the accuracy of answers, e.g. "I need uniformly-distributed integers from 100,001 through 1,000,000".
I'd like to generate a random number of reasonably arbitrary length in C++. By "reasonably arbitrary" I mean limited by speed and memory of the host computer.
Let's assume:
I want to sample a decimal number (base 10) of length ceil(log10(MY_CUSTOM_RAND_MAX)) from 0 to 10^(ceil(log10(MY_CUSTOM_RAND_MAX))+1)-1
I have a vector<char>
The length of vector<char> is ceil(log10(MY_CUSTOM_RAND_MAX))
Each char is really an integer, a random number between 0 and 9, picked with rand() or similar methods
If I use std::random_shuffle to shuffle the vector, I could iterate through each element from the end, multiplying by incremented powers of ten to convert it to unsigned long long or whatever that gets mapped to my final range.
I don't know if there are problems with std::random_shuffle in terms of how random it is or isn't, particularly when also picking a sequence of rand() results to populate the vector<char>.
How sketchy is std::random_shuffle for generating a random number of arbitrary length in this manner, in a quantifiable sense?
(I realize that there is a library in Boost for making random int numbers. It's not clear what the range limitations are, but it looks like MAX_INT. That said, I realize that said library exists. This is more of a general question about this part of the STL in the generation of an arbitrarily large random number. Thanks in advance for focusing your answers on this part.)
I'm slightly unclear as to the focus of this question, but I'll try to answer it from a few different angles:
The quality of the standard library rand() function is typically poor. However, it is very easy to find replacement random number generators which are of a higher quality (you mentioned Boost.Random yourself, so clearly you're aware of other RNGs). It is also possible to boost (no pun intended) the quality of rand() output by combining the results of multiple calls, as long as you're careful about it:
If you don't want the decimal representation in the end, there's little to no point in generating it and then converting to binary. You can just as easily stick multiple 32-bit random numbers (from rand() or elsewhere) together to make an arbitrary bit-width random number.
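Sticking 32-bit words together might look like the sketch below. It uses std::mt19937 rather than rand() because that engine is known to return 32 uniform bits per call (rand() only guarantees 15), and the helper name random_limbs is made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Build a random number of `words` * 32 bits as a vector of 32-bit limbs,
// least significant limb first. Assumes the engine returns 32 uniform bits
// per call, as std::mt19937 does. random_limbs is a hypothetical helper.
template <class Rng>
std::vector<std::uint32_t> random_limbs(Rng& rng, std::size_t words)
{
    std::vector<std::uint32_t> limbs(words);
    for (std::uint32_t& limb : limbs)
        limb = static_cast<std::uint32_t>(rng());
    return limbs;
}
```

The resulting vector can be handed to whatever big-integer representation you use; no decimal digits or shuffling are involved.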
If you're generating the individual digits (binary or decimal) randomly, there is little to no point in shuffling them afterwards.
I'm making a game in C++ and it involves filling tiles with random booleans (either yes or no). Whether it is yes or no is decided by rand() % 1. It doesn't feel very random.
I'm using srand with ctime at startup, but it seems like the same patterns are coming up.
Are there any algorithms that will create very random numbers? Or any suggestions on how I could improve rand()?
True randomness often doesn't seem very random. Do expect to see odd runs.
But at least one immediate thing you can do to help is to avoid using just the lowest-order bit. To quote Numerical Recipes in C:
If you want to generate a random integer between 1 and 10, you should always do it by using high-order bits, as in
j = 1 + (int) (10.0 * (rand() / (RAND_MAX + 1.0)));
and never by anything resembling
j = 1 + (rand() % 10);
(which uses lower-order bits).
Also, you might consider using a different RNG with better properties instead. The Xorshift algorithm is a nice alternative. It's speedy and compact at just a few lines of C, and should be good enough statistically for nearly any game.
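To show how compact it is, here is a sketch of the 32-bit xorshift step with the shift triple (13, 17, 5). The struct name and the next_bool helper are my own additions for illustration; the only requirement of the algorithm itself is a nonzero seed.

```cpp
#include <cstdint>

// 32-bit xorshift generator using the shift triple (13, 17, 5).
// The state must be nonzero, because a zero state stays zero forever.
struct Xorshift32 {
    std::uint32_t state;

    explicit Xorshift32(std::uint32_t seed) : state(seed ? seed : 1u) {}

    std::uint32_t next()
    {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }

    // One pseudo-random boolean taken from the top bit (hypothetical helper).
    bool next_bool() { return (next() >> 31) != 0; }
};
```

For tile filling, one Xorshift32 seeded once at startup and a call to next_bool() per tile would replace the rand() based test.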
The low order bits are not very random.
By using %2 you are only checking the bottom bit of the random number.
Assuming you are not needing crypto strength randomness.
Then the following should be OK.
bool tile = rand() > (RAND_MAX / 2);
The easiest thing you can do, short of writing another PRNG or using a library, would be to just use all bits that a single call to rand() gives you. Most random number generators can be broken down to a stream of bits which has certain randomness and statistical properties. Individual bits, spaced evenly on that stream, need not have the same properties. Essentially you're throwing away between 14 and 31 bits of pseudo-randomness here.
You can just cache the number generated by a call to rand() and use each bit of it (depending on the number of bits rand() gives you, of course, which will depend on RAND_MAX). So if your RAND_MAX is 32767 you can use the lowest-order 15 bits of that number in sequence. Especially if RAND_MAX is that small you are not dealing with the low-order bits of the generator, so taking bits from the high end doesn't gain you much. For example the Microsoft CRT generates random numbers with the equation
x_{n+1} = x_n · 214013 + 2531011
and then shifts away the lowest-order 16 bits of that result and restricts it to 15 bits. So no low-order bits from the generator there. This largely holds true for generators where RAND_MAX is as high as 2^31, but you can't always count on that (so maybe restrict yourself to 16 or 24 bits there, taken from the high-order end).
So, generally, just cache the result of a call to rand() and use the bits of that number in sequence for your application, instead of rand() % 2.
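A minimal sketch of such a bit cache follows. RandBitCache is a hypothetical class name, and the sketch assumes RAND_MAX has the form 2^k - 1 (true of the common implementations mentioned above), so that all k bits of a result are usable.

```cpp
#include <cstdlib>

// Hands out the bits of each rand() result one at a time, lowest bit first,
// and calls rand() again only when the cached bits are used up.
// Assumes RAND_MAX has the form 2^k - 1.
class RandBitCache {
public:
    bool next_bit()
    {
        if (remaining_ == 0) {
            cache_ = static_cast<unsigned long>(std::rand());
            remaining_ = bits_per_call();
        }
        const bool bit = (cache_ & 1UL) != 0;
        cache_ >>= 1;
        --remaining_;
        return bit;
    }

    // Number of bits in one rand() result:
    // 15 for RAND_MAX == 32767, 31 for RAND_MAX == 2^31 - 1.
    static int bits_per_call()
    {
        int bits = 0;
        for (unsigned long v = RAND_MAX; v != 0; v >>= 1)
            ++bits;
        return bits;
    }

private:
    unsigned long cache_ = 0;
    int remaining_ = 0;
};
```

Each tile then takes one next_bit() call, so a single rand() call serves 15 or more tiles instead of one.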
Many pseudo-random number generators suffer from cyclical lower bits, especially linear congruential algorithms, which are typically the most common implementations. Some people suggest shifting out the least significant bits to solve this.
C++11 provides the following way of using the Mersenne Twister algorithm:
#include <random>
#include <iostream>
int main()
{
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<> dis(1, 6);
    for (int n=0; n<10; ++n)
        std::cout << dis(gen) << ' ';
    std::cout << '\n';
}
This produces random numbers suitable for simulations without the disadvantages of many other random number generators. It is not suitable for cryptography; but cryptographic random number generators are more computationally intensive.
There is also the Well equidistributed long-period linear algorithm; with many example implementations.
Boost Random Number Library
I have used the Mersenne Twister random number generator successfully for many years. Its source code is available from the maths department of Hiroshima Uni here. (Direct link so you don't have to read Japanese!)
What is great about this algorithm is that:
Its 'randomness' is very good
Its state vector is a vector of unsigned ints and an index, so it is very easy to save its state, reload its state, and resume a pseudo-random process from where it left off.
I'd recommend giving it a look for your game.
The perfect way to get Yes or No at random is to toggle between them. You may not need a random function.
The lowest bits of standard random number generators aren't very random; this is a well-known problem.
I'd look into the boost random number library.
A quick thing that might make your numbers feel a bit more random would be to re-seed the generator each time the condition if(rand() % 50==0) is true.
Knuth suggests random number generation by the subtractive method. It is believed to be quite random. For a sample implementation in the Scheme language see here
People say lower-order bits are not random. So try something from the middle. This will get you the 14th bit:
(rand() >> 13) % 2
To get good results with random numbers, you really need a generator that combines several generators' results. Just discarding the bottom bit is a pretty silly answer.
Multiply-with-carry is simple to implement and gives good results on its own, and if you have several of them and combine the results you will get extremely good results. It also doesn't require much memory and is very fast.
Also if you reseed too fast then you will get the exact same number. Personally I use a class that updates the seed only when the time has changed.