More calls to mersenne_twister than there are supposed to be - c++

I have a peculiar problem with my current code. I'm writing a program that needs to generate random real numbers from two distributions (a normal distribution and a uniform real one). The code that generates these values lives inside a for loop:
char* buffer = new char[config.number_of_value * config.sizeof_line()];
//...
//Loop over how much values we want
for(std::size_t i = 0; i < config.number_of_value; ++i)
{
//Calculates the offset where the current line begins (0, sizeof_line * 1, sizeof_line * 2, etc.)
std::size_t line_offset = config.sizeof_line() * i;
//The actual numbers we want to output to the file
double x = next_uniform_real();
double y = config.y_intercept + config.slope * x + next_normal_real();
//Res is the number of character written. The character at buffer[res] is '\0', so we need
//To get rid of it
int res = sprintf((buffer + line_offset), "%f", x);
buffer[line_offset + res] = '0';
//Since we written double_rep_size character, we put the delimiter at double_rep_size index
res = sprintf((buffer + line_offset + config.data_point_character_size() + sizeof(char)), "%f", y);
buffer[line_offset + config.data_point_character_size() + sizeof(char) + res] = '0';
}
return buffer;
When running the program, the usual value of "number_of_value" is 100'000, so there should be 100'000 calls to next_uniform_real() and 100'000 calls to next_normal_real(). The strange part is that when I profile this code with VSPerf in Visual Studio 2017, I get 227'242 calls to the mersenne_twister generator, which is 113'621 calls for each function. As you can see, that is 13'621 more calls per function than there are supposed to be.
Can anyone help me figure this out?
For reference, the functions look like this :
double generator::next_uniform_real()
{
return uniform_real_dist(eng);
}
double generator::next_normal_real()
{
return normal_dist(eng);
}
Where eng is std::mt19937, seeded with a random_device or time(0) when random_device has no entropy.
normal_dist is of type std::normal_distribution<>
and uniform_real_dist is of type std::uniform_real_distribution<>
For those wondering, I'm filling up a char* buffer so that I can make one single write to an ostream rather than one for each iteration of the loop.
(As an aside, if someone knows a faster way to write float/double values to char* or a faster way to generate real numbers than this method, that'd be really helpful!)

All major standard library implementations of std::normal_distribution use the Marsaglia polar method. As noted in the Wikipedia article,
this procedure requires about 27% more evaluations of the underlying random number generator (only π/4 ≈ 79% of generated points lie inside of unit circle).
Your number sounds about right (100000 uniform reals at 1 RNG call per number plus 100000 normal reals at 1.27 RNG calls per number is 227000).
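To see where the roughly 1.27 figure comes from, here is a minimal sketch of the polar method that counts how many candidate points it has to draw. It is not the code any standard library actually ships: PolarStats and run_polar are hypothetical names, and the mapping from a raw 32-bit engine output to [-1, 1) is a simplification (a real std::uniform_real_distribution may consume more than one engine output per double).

```cpp
#include <cmath>
#include <cstddef>
#include <random>

// Bookkeeping for the experiment (hypothetical helper type).
struct PolarStats {
    std::size_t pairs_drawn = 0;   // candidate (u, v) points tried
    std::size_t normals_made = 0;  // normal values produced
    double sum = 0.0;              // sum of the values, so the work is actually used
};

// Produces `samples` normal values with the Marsaglia polar method and
// records how many candidate points were needed.
PolarStats run_polar(std::size_t samples, unsigned seed) {
    std::mt19937 eng(seed);
    // Map one raw 32-bit output to [-1, 1).
    auto unit = [&eng] { return eng() / 2147483648.0 - 1.0; };
    PolarStats stats;
    while (stats.normals_made < samples) {
        double u, v, s;
        do {
            u = unit();
            v = unit();
            s = u * u + v * v;
            ++stats.pairs_drawn;
        } while (s >= 1.0 || s == 0.0);  // reject points outside the unit circle
        const double scale = std::sqrt(-2.0 * std::log(s) / s);
        stats.sum += u * scale + v * scale;  // each accepted point yields two normals
        stats.normals_made += 2;
    }
    return stats;
}
```

Each candidate point costs two uniform values and each accepted point yields two normals, so 2.0 * pairs_drawn / normals_made is the number of uniform values consumed per normal value. For 100'000 samples it should come out close to 4/π ≈ 1.27.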

Imagine if you're trying to generate a random integer between 1 and 10 inclusive and your input source provides a random number between 1 and 12 inclusive. If you get a number between 1 and 10, you can just output it. But if you get an 11, you must get another number between 1 and 12. So extra calls may be needed when matching a random source to a random output with a different distribution.

Related

Simulate random iteration of array

I have an array of given size. I want to traverse it in pseudorandom order, keeping array intact and visiting each element once. It will be best if current state can be stored in a few integers.
I know you can't have full randomness without storing full array, but I don't need the order to be really random. I need it to be perceived as random by user. The solution should use sub-linear space.
One possible suggestion - using a large prime number - is given here. The problem with this solution is that there is an obvious fixed step (taken modulo array size). I would prefer a solution which is not so obviously non-random. Is there a better solution?
How about this algorithm?
To pseudo-pseudo randomly traverse an array of size n:
1. Create a small array of size k.
2. Use the large prime number method to fill the small array; set i = 0.
3. Randomly remove a position from the small array using an RNG; i += 1.
4. If i < n - k, add a new position using the large prime number method.
5. If i < n, go to 3.
The higher k is, the more randomness you get. This approach allows you to delay generating numbers from the prime number method.
A similar approach can be done to generate a number earlier than expected in the sequence by creating another array, "skip-list". Randomly pick items later in the sequence, use them to traverse the next position, and then add them to the skip-list. When they naturally arrive they are searched for in the skip-list and suppressed and then removed from the skip-list at which point you can randomly add another item to the skip-list.
The idea of a random generator that simulates a shuffle is good if you can get one whose maximum period you can control.
A Linear Congruential Generator calculates a random number with the formula:
x[i + 1] = (a * x[i] + c) % m;
The maximum period is m and it is achieved when the following properties hold:
The parameters c and m are relatively prime.
For every prime number r dividing m, a - 1 is a multiple of r.
If m is a multiple of 4 then also a - 1 is multiple of 4.
My first draft involved making m the next multiple of 4 after the array length and then finding suitable a and c values. This was (a) a lot of work and (b) yielded very obvious results sometimes.
I've rethought this approach. We can make m the smallest power of two that the array length will fit in. The only prime factor of m is then 2, which will make every odd number relatively prime to it. With the exception of 1 and 2, m will be divisible by 4, which means that we must make a - 1 a multiple of 4.
Having a greater m than the array length means that we must discard all values that are illegal array indices. This will happen at most every other turn and should be negligible.
The following code yields pseudo random numbers with a period of exactly m. I've avoided trivial values for a and c and on my (not too numerous) spot checks, the results looked okay. At least there was no obvious cycling pattern.
So:
class RandomIndexer
{
public:
RandomIndexer(size_t length) : len(length)
{
m = 8;
while (m < length) m <<= 1;
c = m / 6 + uniform(5 * m / 6);
c |= 1;
a = m / 12 * uniform(m / 6);
a = 4*a + 1;
x = uniform(m);
}
size_t next()
{
do { x = (a*x + c) % m; } while (x >= len);
return x;
}
private:
static size_t uniform(size_t m)
{
double p = std::rand() / (1.0 + RAND_MAX);
return static_cast<size_t>(m * p);
}
size_t len;
size_t x;
size_t a;
size_t c;
size_t m;
};
You can then use the generator like this:
std::vector<int> list;
for (size_t i = 0; i < 3; i++) list.push_back(i);
RandomIndexer ix(list.size());
for (size_t i = 0; i < list.size(); i++) {
std::cout << list[ix.next()]<< std::endl;
}
I am aware that this still isn't a great random number generator, but it is reasonably fast, doesn't require a copy of the array and seems to work okay.
If the approach of picking a and c randomly yields bad results, it might be a good idea to restrict the generator to some powers of two and to hard-code literature values that have proven to be good.
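As a rough sketch of the hard-coded variant, the class below uses the widely quoted multiplier 1664525 and increment 1013904223 (the same constants that appear in the Java answer further down). FixedLcgIndexer is a hypothetical name, and the sketch assumes 0 < length <= 2^63. Since a - 1 is a multiple of 4 and c is odd, the full-period conditions listed above hold for any power-of-two m.

```cpp
#include <cstddef>
#include <cstdint>

// Visits every index in [0, length) exactly once per cycle of the generator.
class FixedLcgIndexer {
public:
    FixedLcgIndexer(std::size_t length, std::uint64_t start)
        : len_(length), m_(4) {
        while (m_ < length) m_ <<= 1;  // smallest power of two >= length (at least 4)
        x_ = start % m_;
    }

    std::size_t next() {
        // Unsigned overflow wraps modulo 2^64, which m_ divides, so the
        // result modulo m_ is still correct.
        do {
            x_ = (1664525u * x_ + 1013904223u) % m_;
        } while (x_ >= len_);  // discard values that are not valid indices
        return static_cast<std::size_t>(x_);
    }

private:
    std::uint64_t len_;
    std::uint64_t m_;
    std::uint64_t x_;
};
```

The order is fixed once the start value is chosen, so the only state to store is x_. The low bits of a power-of-two LCG are known to be weak, so small arrays may still show visible patterns.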
As pointed out by others, you can create a sort of "flight plan" upfront by shuffling an array of array indices and then follow it. This violates the "it will be best if current state can be stored in a few integers" constraint but does it really matter? Are there tight performance constraints? After all, I believe that if you don't accept repetitions, then you need to store the items you already visited somewhere or somehow.
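A minimal sketch of such a flight plan, assuming C++11 and using std::shuffle on a vector of indices (make_flight_plan is a hypothetical name). It needs linear extra space, which is exactly the trade-off discussed here.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns the indices 0 .. n-1 in a shuffled order; the original array is untouched.
std::vector<std::size_t> make_flight_plan(std::size_t n, unsigned seed) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});  // fill with 0, 1, ..., n-1
    std::mt19937 eng(seed);
    std::shuffle(order.begin(), order.end(), eng);
    return order;
}
```

Iterating over the returned vector and using each entry as an index visits every element of the original array exactly once.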
Alternatively, you can opt for an intrusive solution and store a bool inside each element of the array, telling you whether the element was already selected or not. This can be done in an almost clean way by employing inheritance (multiple as needed).
Many problems come with this solution, e.g. thread safety, and of course it violates the "keep the array intact" constraint.
Quadratic residues which you have mentioned ("using a large prime") are well-known, will work, and guarantee iterating each and every element exactly once (if that is required, but it seems that's not strictly the case?). Unluckily they are not "very random looking", and there are a few other requirements to the modulo in addition to being prime for it to work.
There is a page on Jeff Preshing's site which describes the technique in detail and suggests to feed the output of the residue generator into the generator again with a fixed offset.
However, since you said that you merely need "perceived as random by user", it seems that you might be able to do with feeding a hash function (say, cityhash or siphash) with consecutive integers. The output will be a "random" integer, and at least so far there will be a strict 1:1 mapping (since there are a lot more possible hash values than there are inputs).
Now the problem is that your array is most likely not that large, so you need to somehow reduce the range of these generated indices without generating duplicates (which is tough).
The obvious solution (taking the modulo) will not work, as it pretty much guarantees that you get a lot of duplicates.
Using a bitmask to limit the range to the next greater power of two should work without introducing bias, and discarding indices that are out of bounds (generating a new index) should work as well. Note that this needs non-deterministic time -- but the combination of these two should work reasonably well (a couple of tries at most) on the average.
Otherwise, the only solution that "really works" is shuffling an array of indices as pointed out by Kamil Kilolajczyk (though you don't want that).
Here is a Java solution, which can be easily converted to C++ and is similar to M Oehm's solution above, albeit with a different way of choosing LCG parameters.
import java.util.Enumeration;
import java.util.Random;
public class RandomPermuteIterator implements Enumeration<Long> {
int c = 1013904223, a = 1664525;
long seed, N, m, next;
boolean hasNext = true;
public RandomPermuteIterator(long N) throws Exception {
if (N <= 0 || N > Math.pow(2, 62)) throw new Exception("Unsupported size: " + N);
this.N = N;
m = (long) Math.pow(2, Math.ceil(Math.log(N) / Math.log(2)));
next = seed = new Random().nextInt((int) Math.min(N, Integer.MAX_VALUE));
}
public static void main(String[] args) throws Exception {
RandomPermuteIterator r = new RandomPermuteIterator(100);
while (r.hasMoreElements()) System.out.print(r.nextElement() + " ");
//output:50 52 3 6 45 40 26 49 92 11 80 2 4 19 86 61 65 44 27 62 5 32 82 9 84 35 38 77 72 7 ...
}
@Override
public boolean hasMoreElements() {
return hasNext;
}
@Override
public Long nextElement() {
next = (a * next + c) % m;
while (next >= N) next = (a * next + c) % m;
if (next == seed) hasNext = false;
return next;
}
}
Maybe you could use std::random_shuffle: http://www.cplusplus.com/reference/algorithm/random_shuffle/ ? (Note that it was deprecated in C++14 and removed in C++17 in favor of std::shuffle.)

Using Boost PRNG to make a huge lookup table of random numbers

I'm trying to use Boost's normal distribution to generate random numbers given different seeds. In other words, I need the same random numbers produced for seed1, seed2, etc.; thousands of seeds will be passed to the function over the course of the simulation. The random number generator will never be used unseeded. [Edit: "Key" is a better word than "seed"--see final description block below.] I'm not sure whether it makes the most sense to generate a single RNG and reseed it (and if so, how) or if it's easier to generate a new one each time. Here's what I have so far, which involves the construction of a new, seeded rng at each request for a random normal number:
double rnorm( int thisSeed ) {
boost::mt19937 rng( thisSeed );
boost::normal_distribution<> nd( 0.0, 1.0 ); // (mean, sd)
boost::variate_generator<boost::mt19937&, boost::normal_distribution<> > var_nor( rng, nd );
return var_nor();
}
Is this dumb? I'm new to PRNGs and especially Boost's implementation.
A more thorough description of why I'm doing this:
I am creating a huge random energy landscape to simulate protein interactions: each sequence has a particular energy that's calculated as the sum of quenched Gaussian random numbers that depend on the values of particular amino acids at particular positions (and a few other sequence attributes). I want to use the PRNG to calculate what these pseudorandom values are: these values must be consistent (the same sequence should yield the same values), but there are way too many to store. As a simple example, I might have a sequence ARNDAMR and compute its total energy based on two subenergies: one is a random normal number that depends on having A in position 1 and D at position 4, and the other subenergy is a random number that depends on the last three amino acids. I'm converting the configurations into keys for use as seeds (arguments) for my PRNG. Many thousands of sequences will be constructed and mutated, so I need a way to compute energies quickly--so I need to know how best to seed and call my RNG. I will not be using the Boost RNG for anything other than these energy value "lookups."
Further (tl;dr) explanation:
I am going to have "key" values that are integers between 1 and 10^6 or 10^7. I want each to map to a Gaussian random number. There should not be any cross-correlation between the key values and their numbers (e.g., keys 145-148 should not map to autocorrelated "random" numbers).
I need a given key to return the same random number each time it (the key) is called in the simulation. I do not want to store the key-random number pairs in a lookup table.
Your approach fundamentally misunderstands how PRNGs work. If you reseed on every use, then you won't get random numbers at all, you'll just get a bad hash function of the seed. In particular, your numbers won't be normally distributed even if you're calling the PRNG's normal distribution function, because the PRNG only guarantees that the random numbers generated from a particular seed will be normal.
If you need a large set of random numbers to be repeatable for a specific set of inputs, then generate a single number which is a function of those inputs, seed the PRNG with that, then get numbers from the PRNG in a predictable sequence; it will produce the same sequence for the same inputs, and the numbers will be properly distributed by the PRNG.
If the set of inputs you use to determine the random sequence is large (and in particular, larger than the size of the seed for your PRNG), then you won't have a unique sequence for every set of inputs. That might be OK for your application, or you might want to use a PRNG with larger seeds.
Take a look at my public domain ojrandlib. It uses big seeds, and generates normally distributed numbers with the fast Ziggurat algorithm.
Edit after seeing your clarification:
Ah, now I see. There's no such thing as "a" Gaussian random. Distribution only makes sense with regard to the whole sequence from one seed, so what you need to do is create and seed a single generator, then fetch the Nth random value from that generator for each of your keys N. If you're not doing this in order (that is, if you're fetching from keys totally at random and not as part of a sequence) this will be very slow, but still possible. You may want to see if you can force a sequence, say by sorting the keys before you fetch them.
ojrandlib has a function discard() for this too, so that if you need to find the 1,000,000th number in a sequence, you can seed the PRNG and discard 999,999 of them, which is faster than actually generating them, but will still be pretty slow.
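The same idea can be sketched with the standard library, whose engines also have a discard() member. This is only a stand-in for the ojrandlib call mentioned above; nth_raw_value is a hypothetical helper, and for std::mt19937 discarding n values typically still costs time proportional to n.

```cpp
#include <random>

// Returns the raw engine output that follows the first `n` outputs of a
// freshly seeded generator, i.e. the (n + 1)-th value of the sequence.
std::mt19937::result_type nth_raw_value(unsigned seed, unsigned long long n) {
    std::mt19937 eng(seed);
    eng.discard(n);  // skip n outputs without returning them
    return eng();
}
```

Calling nth_raw_value(seed, key) with the same arguments always gives the same result, which is the repeatability asked for, but each lookup restarts from the seed.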
Probably better: instead of using your key to seed a Gaussian generator, compute a good hash function of the key + fixed seed (which will result in uniformly distributed random bits), then interpret those hash bits as two uniform floats, then do the Box-Muller or Ziggurat with those to transform the distribution. That way, the numbers you get will be all from the same "seed" (which is the input to the hash), but normally distributed. You don't need a cryptographically secure hash, so something like MurmurHash might work well, though you would probably be better off rolling your own for such a special purpose.
Thought users of my library might have similar problems to yours, so I investigated some possibilities. Here's some code that might work for you:
/* Thomas Wang's 32-bit integer hash */
uint32_t nth_rand32(uint32_t a) {
a -= a << 6;
a ^= a >> 17;
a -= a << 9;
a ^= a << 4;
a -= a << 3;
a ^= a << 10;
a ^= a >> 15;
return a;
}
/* Marsaglia polar method */
double nth_normal(int index) {
double f, g, w;
int skip = 0;
uint64_t x, y;
do {
x = (uint64_t)nth_rand32((index & ~1) + skip);
y = (uint64_t)nth_rand32((index | 1) + skip);
skip += 0x40000001;
x = (x << 20) | 0x3ff0000000000000ull;
f = *(double *)(&x) * 2.0 - 3.0;
y = (y << 20) | 0x3ff0000000000000ull;
g = *(double *)(&y) * 2.0 - 3.0;
w = f * f + g * g;
} while (w >= 1.0 || w == 0.0);
w = sqrt((-2.0 * log(w)) / w);
if (index & 1) w *= f;
else w *= g;
return w;
}
The hash doesn't pass diehard, but it's pretty good. I generated 10,000,000 random normals, and got this distribution (if this image upload works):
Not perfect, but not too bad. It would be a lot better with a more expensive hash, but I'll let you decide where the speed/accuracy tradeoff is for you.

Random selection of dictionary words

I want to select a number of random words from an array to make a total amount of 36 letters.
At first I tried to select a random word and add it after checking that it's not longer than the amount of free space we have. That was not efficient since the list would fill up and there would only be empty space left for a 2-3 letter word and it takes a long time to find such a short word.
So I decided to choose only six 6-letter words, and I'm doing that by generating a random number and then incrementing it by 1 until we find a 6-letter word. It's pretty fast, but the words aren't really that random: often I get words that start with the same letter, or only words that start with letters in sequence like a,b,c or x,y,z.
srand ( time(NULL) );
for(int i=0;i<6;i++)
{
randNumb = rand()%dictionary.size();
while(dictionary.at(randNumb).length() != 6)
{
randNumb++;
}
a << "/" << dictionary.at(randNumb) << "/";
}
I would like to choose words with different lengths, but in favor of performance I'll settle for just the 6-letter words; I would at least want them to be more randomly selected.
You should get a new random number instead of increasing the index. The way you do it, all the strings not matching your criteria "attract" more random numbers, and possibly lead to the following string to have a higher probability of being chosen.
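A minimal sketch of the "draw again" approach, assuming C++11 and the <random> header instead of rand(). pick_word is a hypothetical helper, and it assumes the dictionary contains at least one word of the wanted length (otherwise the loop never ends).

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Draws uniformly distributed indices until one lands on a word of the wanted
// length, so every matching word is equally likely to be returned.
const std::string& pick_word(const std::vector<std::string>& dictionary,
                             std::size_t wanted_length, std::mt19937& eng) {
    std::uniform_int_distribution<std::size_t> index(0, dictionary.size() - 1);
    for (;;) {
        const std::string& candidate = dictionary[index(eng)];
        if (candidate.length() == wanted_length) return candidate;
    }
}
```

Because a rejected draw is thrown away instead of being pushed to the next matching word, a long run of non-matching words no longer makes the word after it more likely.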
The rand() function generates a number between 0 and RAND_MAX.
If RAND_MAX is defined as 32767, then you will not access elements in your dictionary (array?) with indices greater than that.
If you need to generate a random number greater than RAND_MAX, then think about summing the result of n calls of rand(), such that n * RAND_MAX >= dictionary.size(). The modulus of this result is then guaranteed to give an index that falls somewhere in the bounds of the entire dictionary.
Even if RAND_MAX is greater than dictionary.size(), using the % operator to select the index leads to a non-uniform distribution. The modulus will cause the early words to be selected more often than the later words (unless RAND_MAX + 1 is an integer multiple of dictionary.size()).
Consider a simple example: Assume your dictionary has 10 words, and RAND_MAX is 14. When rand() returns a value from 0 to 9, the corresponding word is chosen directly. But when rand() is 10 through 14, then one of the first five words will be chosen. So the first five words have twice the chance of being selected as the last five words.
A better way to map [0..RAND_MAX] to [0..dictionary.size()) is to use division:
assert(RAND_MAX + 1 >= dictionary.size());
randNumb = rand() * dictionary.size() / (RAND_MAX + 1);
But you have to be careful of integer overflow. If RAND_MAX * dictionary.size() is larger than you can represent in an integer, you'll need to use a larger data type. Some systems have a function like MulDiv for just this purpose. If you don't have something like MulDiv, you can convert to a floating point type and then truncate the result back to an integer:
double temp = static_cast<double>(rand()) * dictionary.size() / (RAND_MAX + 1.0);
randNumb = static_cast<int>(temp);
This is still an imperfect distribution, but the "hot" words will now be evenly distributed across the dictionary instead of clumping at the beginning.
The closer RAND_MAX + 1 is to an integer multiple of dictionary.size(), the better off you'll be. And if you can't be sure that it's close to an integer multiple, then you want RAND_MAX to be as large as possible relative to dictionary.size().
Since you don't have much control over RAND_MAX, you could consider tweaking dictionary.size(). For example, if you only want six-letter words, then why not strip all the others out of the dictionary?
std::vector<std::string> six_letter_words;
std::copy_if(dictionary.begin(), dictionary.end(),
std::back_inserter(six_letter_words),
[](const std::string &word){ return word.size() == 6; });
With the reduced set, we can use a more generic algorithm to select the words:
typedef std::vector<std::string> WordList;
// Returns true with the given probability, which should be 0.0 to 1.0.
bool Probably(double probability) {
return (static_cast<double>(std::rand()) / (RAND_MAX + 1.0)) < probability;
}
// Selects n words from the dictionary using a uniform distribution and
// copies them to target.
template <typename OutputIt>
OutputIt Select(int n, const WordList &dictionary, OutputIt target) {
double count = static_cast<double>(n);
for (std::size_t i = 0; count > 0.0 && i < dictionary.size(); ++i) {
if (Probably(count / (dictionary.size() - i))) {
*target++ = dictionary[i];
count -= 1.0;
}
}
return target;
}
The idea is to step through each word in the dictionary and select it with a probability of the number of words you need to pick divided by the number of words left to pick from. This works well, even if RAND_MAX is relatively small. Overall, though, it's much more computation than trying to randomly select indexes. Also note that this technique will never choose the same word more than once, where the index mapping technique could.
You call Select like this:
// Select six words from six_letter_words using a uniform distribution.
WordList selected;
Select(6, six_letter_words, std::back_inserter(selected));
Also note that most implementations of rand() are pretty simplistic and may not give a good uniform distribution to begin with.

What is the most efficient way to generate unique pseudo-random numbers? [duplicate]

Duplicate:
Unique random numbers in O(1)?
I want a pseudo random number generator that can generate numbers with no repeats in a random order.
For example:
random(10)
might return
5, 9, 1, 4, 2, 8, 3, 7, 6, 10
Is there a better way to do it other than making the range of numbers and shuffling them about, or checking the generated list for repeats?
Edit:
Also I want it to be efficient in generating big numbers without the entire range.
Edit:
I see everyone suggesting shuffle algorithms. But if I want to generate large random numbers (1024 bytes+), then that method would take a lot more memory than if I just used a regular RNG and inserted into a Set until it was a specified length, right? Is there no better mathematical algorithm for this?
You may be interested in a linear feedback shift register.
We used to build these out of hardware, but I've also done them in software. It uses a shift register with some of the bits xor'ed and fed back to the input, and if you pick just the right "taps" you can get a sequence that's as long as the register size. That is, a 16-bit lfsr can produce a sequence 65535 long with no repeats. It's statistically random but of course eminently repeatable. Also, if it's done wrong, you can get some embarrassingly short sequences. If you look up the lfsr, you will find examples of how to construct them properly (which is to say, "maximal length").
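For illustration, here is a small sketch of a 16-bit LFSR in Galois form. The feedback mask 0xB400 (taps at bits 16, 14, 13 and 11) is a commonly cited maximal-length choice; the function names are hypothetical, and the start state must be nonzero.

```cpp
#include <cstdint>

// One step of a 16-bit Galois LFSR with feedback mask 0xB400.
inline std::uint16_t lfsr16_step(std::uint16_t state) {
    const bool lsb = state & 1u;
    state >>= 1;
    if (lsb) state ^= 0xB400u;  // apply the taps when a 1 is shifted out
    return state;
}

// Counts the steps until the register returns to its starting state.
inline unsigned lfsr16_period(std::uint16_t start) {
    unsigned period = 0;
    std::uint16_t state = start;
    do {
        state = lfsr16_step(state);
        ++period;
    } while (state != start);
    return period;
}
```

With these taps, lfsr16_period should report 65535 for any nonzero start state, meaning every nonzero 16-bit value is visited once before the sequence repeats.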
A shuffle is a perfectly good way to do this (provided you do not introduce a bias using the naive algorithm). See Fisher-Yates shuffle.
If a random number is guaranteed to never repeat it is no longer random and the amount of randomness decreases as the numbers are generated (after nine numbers random(10) is rather predictable and even after only eight you have a 50-50 chance).
I understand you don't want a shuffle for large ranges, since you'd have to store the whole list to do so.
Instead, use a reversible pseudo-random hash. Then feed in the values 0 1 2 3 4 5 6 etc in turn.
There are infinite numbers of hashes like this. They're not too hard to generate if they're restricted to a power of 2, but any base can be used.
Here's one that would work for example if you wanted to go through all 2^32 32 bit values. It's easiest to write because the implicit mod 2^32 of integer math works to your advantage in this case.
unsigned int reversableHash(unsigned int x)
{
x*=0xDEADBEEF;
x=x^(x>>17);
x*=0x01234567;
x+=0x88776655;
x=x^(x>>4);
x=x^(x>>9);
x*=0x91827363;
x=x^(x>>7);
x=x^(x>>11);
x=x^(x>>20);
x*=0x77773333;
return x;
}
If you don't mind mediocre randomness properties and if the number of elements allows it then you could use a linear congruential random number generator.
A shuffle is the best you can do for random numbers in a specific range with no repeats. The reason that the method you describe (randomly generate numbers and put them in a Set until you reach a specified length) is less efficient is because of duplicates. Theoretically, that algorithm might never finish. At best it will finish in an indeterminable amount of time, as compared to a shuffle, which will always run in a highly predictable amount of time.
Response to edits and comments:
If, as you indicate in the comments, the range of numbers is very large and you want to select relatively few of them at random with no repeats, then the likelihood of repeats diminishes rapidly. The bigger the difference in size between the range and the number of selections, the smaller the likelihood of repeat selections, and the better the performance will be for the select-and-check algorithm you describe in the question.
What about using a GUID generator (like the one in .NET)? Granted, it is not guaranteed that there will be no duplicates, but the chance of getting one is pretty low.
This has been asked before - see my answer to the previous question. In a nutshell: You can use a block cipher to generate a secure (random) permutation over any range you want, without having to store the entire permutation at any point.
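A toy sketch of the idea, assuming a small Feistel network in place of a real block cipher. The round function below is a hypothetical mixer and is not cryptographically secure; it only illustrates how a keyed permutation plus "cycle walking" yields each value in [0, n) exactly once without storing the permutation.

```cpp
#include <cstdint>

// Hypothetical round function. A Feistel network is invertible no matter
// what this returns, so any deterministic mixing will do for a sketch.
inline std::uint32_t round_fn(std::uint32_t half, std::uint32_t key) {
    std::uint32_t v = half * 0x9E3779B1u + key;
    v ^= v >> 15;
    v *= 0x85EBCA6Bu;
    v ^= v >> 13;
    return v;
}

// Maps index in [0, n) to a unique value in [0, n). Requires index < n.
std::uint32_t permute(std::uint32_t index, std::uint32_t n, std::uint32_t key) {
    // Choose an even block width 2 * half_bits such that 2^(2 * half_bits) >= n.
    unsigned half_bits = 1;
    while (half_bits < 16 && (std::uint64_t{1} << (2 * half_bits)) < n) ++half_bits;
    const std::uint32_t mask = (std::uint32_t{1} << half_bits) - 1;

    std::uint32_t x = index;
    do {
        std::uint32_t left = (x >> half_bits) & mask;
        std::uint32_t right = x & mask;
        for (std::uint32_t r = 0; r < 4; ++r) {  // four Feistel rounds
            const std::uint32_t next = left ^ (round_fn(right, key + r) & mask);
            left = right;
            right = next;
        }
        x = (left << half_bits) | right;
    } while (x >= n);  // cycle walking: re-encrypt until the value is in range
    return x;
}
```

Calling permute(i, n, key) for i = 0, 1, 2, ... gives the sequence, and the only state to keep is the counter i and the key. For a permutation that is actually secure, the toy rounds would have to be replaced by a real cipher.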
If you want to create large (say, 64 bits or greater) random numbers with no repeats, then just create them. If you're using a good random number generator that actually has enough entropy, then the odds of generating repeats are so minuscule as to not be worth worrying about.
For instance, when generating cryptographic keys, no one actually bothers checking to see if they've generated the same key before; since you're trusting your random number generator that a dedicated attacker won't be able to get the same key out, then why would you expect that you would come up with the same key accidentally?
Of course, if you have a bad random number generator (like the Debian SSL random number generator vulnerability), or are generating small enough numbers that the birthday paradox gives you a high chance of collision, then you will need to actually do something to ensure you don't get repeats. But for large random numbers with a good generator, just trust probability not to give you any repeats.
As you generate your numbers, use a Bloom filter to detect duplicates. This would use a minimal amount of memory. There would be no need to store earlier numbers in the series at all.
The trade off is that your list could not be exhaustive in your range. If your numbers are truly on the order of 256^1024, that's hardly any trade off at all.
(Of course if they are actually random on that scale, even bothering to detect duplicates is a waste of time. If every computer on earth generated a trillion random numbers that size every second for trillions of years, the chance of a collision is still absolutely negligible.)
I second gbarry's answer about using an LFSR. They are very efficient and simple to implement even in software and are guaranteed not to repeat in (2^N - 1) uses for an LFSR with an N-bit shift-register.
There are some drawbacks, however: by observing a small number of outputs from the RNG, one can reconstruct the LFSR and predict all values it will generate, making them unusable for cryptography and anywhere else where a good RNG is needed. The second problem is that either the all zero word or the all one (in terms of bits) word is invalid depending on the LFSR implementation. The third issue which is relevant to your question is that the maximum number generated by the LFSR is always a power of 2 - 1 (or power of 2 - 2).
The first drawback might not be an issue depending on your application. From the example you gave, it seems that you are not expecting zero to be among the answers; so, the second issue does not seem relevant to your case.
The maximum value (and thus range) problem can be solved by calling the LFSR again until you get a number within your range. Here's an example:
Say you want to have numbers between 1 and 10 (as in your example). You would use a 4-bit LFSR which has a range [1, 15] inclusive. Here's a pseudo code as to how to get number in the range [1,10]:
x = LFSR.getRandomNumber();
while (x > 10) {
x = LFSR.getRandomNumber();
}
You should embed the previous code in your RNG; so that the caller wouldn't care about implementation.
Note that this would slow down your RNG if you use a large shift-register and the maximum number you want is not a power of 2 - 1.
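The pseudo code above can be fleshed out roughly as follows for the 1 to 10 example. Lfsr4 is a hypothetical class; it uses a 4-bit Galois LFSR with feedback mask 0xC, which steps through all 15 nonzero states, and hides the retry loop from the caller. The limit passed to next_in_range must be at least 1.

```cpp
#include <cstdint>

// 4-bit Galois LFSR (feedback mask 0xC) with a range-limited wrapper.
class Lfsr4 {
public:
    explicit Lfsr4(std::uint8_t seed) : state_(seed & 0xFu) {
        if (state_ == 0) state_ = 1;  // the all-zero state would never change
    }

    // Next raw value in [1, 15].
    std::uint8_t raw() {
        const bool lsb = state_ & 1u;
        state_ >>= 1;
        if (lsb) state_ ^= 0xCu;
        return state_;
    }

    // Next value in [1, limit], skipping raw values above the limit.
    std::uint8_t next_in_range(std::uint8_t limit) {
        std::uint8_t x = raw();
        while (x > limit) x = raw();
        return x;
    }

private:
    std::uint8_t state_;
};
```

Ten consecutive calls to next_in_range(10) return each of the numbers 1 to 10 exactly once; at most five raw values are skipped per cycle.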
This answer suggests some strategies for getting what you want and ensuring they are in a random order using some already well-known algorithms.
There is an inside out version of the Fisher-Yates shuffle algorithm, called the Durstenfeld version, that randomly distributes sequentially acquired items into arrays and collections while loading the array or collection.
One thing to remember is that the Fisher-Yates (AKA Knuth) shuffle or the Durstenfeld version used at load time is highly efficient with arrays of objects because only the reference pointer to the object is being moved and the object itself doesn't have to be examined or compared with any other object as part of the algorithm.
I will give both algorithms further below.
If you want really huge random numbers, on the order of 1024 bytes or more, a really good random generator that can generate unsigned bytes or words at a time will suffice. Randomly generate as many bytes or words as you need to construct the number, make it into an object with a reference pointer to it and, hey presto, you have a really huge random integer. If you need a specific really huge range, you can add a base value of zero bytes to the low-order end of the byte sequence to shift the value up. This may be your best option.
If you need to eliminate duplicates of really huge random numbers, then that is trickier. Even with really huge random numbers, removing duplicates also makes them significantly biased and not random at all. If you have a really large set of unduplicated really huge random numbers and you randomly select from the ones not yet selected, then the bias is only the bias in creating the huge values for the really huge set of numbers from which to choose. A reverse version of Durstenfeld's version of the Fisher-Yates shuffle could be used to randomly choose values from a really huge set of them, remove them from the remaining values from which to choose, and insert them into a new array that is a subset; it could do this with just the source and target arrays in situ. This would be very efficient.
This may be a good strategy for getting a small number of random numbers with enormous values from a really large set of them in which they are not duplicated. Just pick a random location in the source set, obtain its value, swap its value with the top element in the source set, reduce the size of the source set by one and repeat with the reduced size source set until you have chosen enough values. This is essentially the Durstenfeld version of Fisher-Yates in reverse. You can then use the Durstenfeld version of the Fisher-Yates algorithm to insert the acquired values into the destination set. However, that is overkill since they should be randomly chosen and randomly ordered as given here.
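The pick, swap with the top element, shrink loop described above could look roughly like this in C++. This is only a sketch: `pick_without_repeats` is a hypothetical name, the values are plain `long`s rather than huge numbers, and the source is copied so the caller's data is left alone.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Choose `k` distinct elements from `source`. The vector is taken by value,
// so the swaps below never touch the caller's copy.
std::vector<long> pick_without_repeats(std::vector<long> source,
                                       std::size_t k,
                                       std::mt19937& engine)
{
    assert(k <= source.size());
    std::vector<long> chosen;
    chosen.reserve(k);
    std::size_t remaining = source.size();
    for (std::size_t n = 0; n < k; ++n) {
        // random location anywhere in the part not yet chosen
        std::uniform_int_distribution<std::size_t> dist(0, remaining - 1);
        const std::size_t location = dist(engine);
        chosen.push_back(source[location]);
        // move the chosen element to the top, then shrink the source by one
        std::swap(source[location], source[remaining - 1]);
        --remaining;
    }
    return chosen;
}
```

Each pick costs O(1), so choosing k values from a set of any size costs O(k) once the set exists.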
Both algorithms assume you have some random number instance method, nextInt(int setSize), that generates a random integer from zero (inclusive) to setSize (exclusive), meaning there are setSize possible values. In this case, the largest argument will be the size of the array, since the last index of the array is size-1.
The first algorithm is the Durstenfeld version of the Fisher-Yates (aka Knuth) shuffle algorithm as applied to an array of arbitrary length, one that simply randomly positions the integers from 0 to the length of the array minus one into the array. The array need not be an array of integers, but can be an array of any objects that are acquired sequentially which, effectively, makes it an array of reference pointers. It is simple, short and very effective.
int size = someNumber;
int[] array = new int[size]; // here is the array to load
// i will also conveniently be the value to load, but any sequentially acquired
// object will work
for (int i = 0; i < size; i++) { // conveniently, i is also the value to load
// you can instance or acquire any object at this place in the algorithm to load
// by reference, into the array and use a pointer to it in place of j
int j = i; // in this example, j is trivially i
if (i == 0) { // first integer goes into first location
array[i] = j; // this may get swapped from here later
} else { // subsequent integers go into random locations
// the next random location will be somewhere in the locations
// already used or a new one at the end
// here we get the next random location
// to preserve true randomness without a significant bias
// it is REALLY IMPORTANT that the newest value could be
// stored in the newest location, that is,
// location has to be able to randomly have the value i
int location = nextInt(i + 1); // a random value between 0 and i
// move the random location's value to the new location
array[i] = array[location];
array[location] = j; // put the new value into the random location
} // end if...else
} // end for
Voila, you now have an already randomized array.
If you want to randomly shuffle an array you already have, here is the standard Fisher-Yates algorithm.
type[] array = new type[size];
// some code that loads array...
// randomly pick an item anywhere in the current array segment,
// swap it with the top element in the current array segment,
// then shorten the array segment by 1
// just as with the Durstenfeld version above,
// it is REALLY IMPORTANT that an element could get
// swapped with itself to avoid any bias in the randomization
type temp; // this will get assigned a value before used
for (int i = array.length - 1; i > 0; i--) {
int location = nextInt(i + 1); // a random value between 0 and i
temp = array[i];
array[i] = array[location];
array[location] = temp;
} // end for
For sequenced collections and sets, i.e. some type of list object, you could just use adds/or inserts with an index value that allows you to insert items anywhere, but it has to allow adding or appending after the current last item to avoid creating bias in the randomization.
Shuffling N elements doesn't take up excessive memory...think about it. You only swap one element at a time, so the maximum memory used is that of N+1 elements.
Assuming you have a random or pseudo-random number generator, even if it's not guaranteed to return unique values, you can implement one that returns unique values each time using this code, assuming that the upper limit remains constant (i.e. you always call it with random(10), and don't call it with random(10); random(11)).
The code doesn't check for errors. You can add that yourself if you want to.
It also requires a lot of memory if you want a large range of numbers.
/* the function returns a random number between 0 and max -1
* not necessarily unique
* I assume it's written
*/
int random(int max);
/* the function returns a unique random number between 0 and max - 1 */
int unique_random(int max)
{
static int *list = NULL; /* contains a list of numbers we haven't returned */
static int in_progress = 0; /* 0 --> we haven't started randomizing numbers
* 1 --> we have started randomizing numbers
*/
static int count;
static int prev_max = 0;
// initialize the list
if (!in_progress || (prev_max != max)) {
if (list != NULL) {
free(list);
}
list = malloc(sizeof(int) * max);
prev_max = max;
in_progress = 1;
count = max - 1;
int i;
for (i = max - 1; i >= 0; --i) {
list[i] = i;
}
}
/* now choose one from the list */
int index = random(count + 1); /* random(n) excludes n, so this picks from list[0..count] */
int retval = list[index];
/* now we throw away the returned value.
* we do this by shortening the list by 1
* and replacing the element we returned with
* the highest remaining number
*/
swap(&list[index], &list[count]);
/* when the count reaches 0 we start over */
if (count == 0) {
in_progress = 0;
free(list);
list = 0;
} else { /* reduce the counter by 1 */
count--;
}
return retval;
}
/* swap two numbers */
void swap(int *x, int *y)
{
int temp = *x;
*x = *y;
*y = temp;
}
Actually, there's a minor point to make here; a random number generator which is not permitted to repeat is not random.
Suppose you wanted to generate a series of 256 random numbers without repeats.
Create a 256-bit (32-byte) memory block initialized with zeros, let's call it b
Your looping variable will be n, the number of numbers yet to be generated
Loop from n = 256 to n = 1
Generate a random number r in the range [0, n)
Find the r-th zero bit in your memory block b, let's call it p
Put p in your list of results, an array called q
Flip the p-th bit in memory block b to 1
After the n = 1 pass, you are done generating your list of numbers
Here's a short example of what I am talking about, using n = 4 initially:
**Setup**
b = 0000
q = []
**First loop pass, where n = 4**
r = 2
p = 2
b = 0010
q = [2]
**Second loop pass, where n = 3**
r = 2
p = 3
b = 0011
q = [2, 3]
**Third loop pass, where n = 2**
r = 0
p = 0
b = 1011
q = [2, 3, 0]
**Fourth and final loop pass, where n = 1**
r = 0
p = 1
b = 1111
q = [2, 3, 0, 1]
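Here is a minimal C++ sketch of the bit-block procedure above, assuming `std::vector<bool>` as the block b and a `std::mt19937` as the source of r. The function name `unique_sequence` is made up. Note that finding the r-th zero bit is a linear scan here, so the whole thing is O(N^2) in time even though it only needs N bits of bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Produce the numbers 0..count-1 in random order using a bit block.
std::vector<std::size_t> unique_sequence(std::size_t count, std::mt19937& engine)
{
    std::vector<bool> used(count, false);  // the memory block b, all zeros
    std::vector<std::size_t> result;       // the list of results q
    result.reserve(count);
    for (std::size_t n = count; n >= 1; --n) {
        std::uniform_int_distribution<std::size_t> dist(0, n - 1);
        std::size_t r = dist(engine);      // r in [0, n)
        // Walk the block until the r-th zero bit (counting from 0) is found.
        std::size_t p = 0;
        for (;; ++p) {
            if (!used[p]) {
                if (r == 0) {
                    break;
                }
                --r;
            }
        }
        used[p] = true;                    // flip the p-th bit to 1
        result.push_back(p);
    }
    return result;
}
```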
Please check the answers at
Generate sequence of integers in random order without constructing the whole list upfront
My answer there is as follows:
A very simple generator is 1+((power(r,x)-1) mod p). For values of x from 1 to p-1 it produces each value from 1 to p-1 exactly once, in a scrambled order, provided that p is prime and r is a primitive root modulo p (r and p both being prime with r <> p is not enough by itself; for example r = 2, p = 7 only yields 2, 4, 1 repeatedly).
I asked a similar question before, but mine was for the whole range of an int; see Looking for a Hash Function /Ordered Int/ to /Shuffled Int/
static std::unordered_set<long> s;
long l;
do {
    l = generator(); // roll again while the value is already in the set
} while (s.find(l) != s.end());
s.insert(l);
generator() being your random number generator. You keep rolling numbers as long as the entry is already in your set, then you add the new one to it. You get the idea.
I did it with long for the example, but you should make that a template if your PRNG is templatized.
An alternative is to use a cryptographically secure PRNG, which has a very low probability of generating the same number twice.
If you don't mind poor statistical properties in the generated sequence, there is one method:
Let's say you want to generate N numbers of 1024 bits each. You can sacrifice some bits of each generated number to act as a "counter".
So you generate each random number, but into some bits you have chosen you put a binary-encoded counter (from a variable that you increase each time the next random number is generated).
You can split that counter into single bits and put them in some of the less significant bits of the generated number.
That way you are sure to get a unique number each time.
For example, each generated number looks like this:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyxxxxyxyyyyxxyxx
where the xs are taken directly from the generator, and the ys are taken from the counter variable.
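A scaled-down C++ sketch of the counter trick, under some simplifying assumptions of mine: 64-bit values instead of 1024-bit ones, and the counter kept in the 16 lowest bits as one contiguous field rather than scattered across chosen positions. `CounterTaggedRandom` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

class CounterTaggedRandom {
public:
    explicit CounterTaggedRandom(std::uint64_t seed) : engine_(seed) {}

    // High 48 bits come from the generator, low 16 bits from the counter.
    // Values are guaranteed distinct for the first 65536 calls; after that
    // the counter wraps around and the guarantee is gone.
    std::uint64_t next()
    {
        const std::uint64_t random_part = engine_() & ~std::uint64_t{0xFFFF};
        const std::uint64_t value = random_part | counter_;
        counter_ = (counter_ + 1) & 0xFFFF;
        return value;
    }

private:
    std::mt19937_64 engine_;
    std::uint64_t counter_ = 0;
};
```

The price is exactly the one mentioned above: the counter bits are completely predictable, so the statistical quality of those bits is poor.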
Mersenne twister
Description of which can be found here on Wikipedia: Mersenne twister
Look at the bottom of the page for implementations in various languages.
The problem is to select a "random" sequence of N unique numbers from the range 1..M where there is no constraint on the relationship between N and M (M could be much bigger, about the same, or even smaller than N; they may not be relatively prime).
Expanding on the linear feedback shift register answer: for a given M, construct a maximal LFSR for the smallest power of two that is larger than M. Then just grab your numbers from the LFSR, throwing out numbers larger than M. On average, you will throw out at most half the generated numbers (since by construction more than half the range of the LFSR is less than M), so the expected running time of getting a number is O(1). You are not storing previously generated numbers, so space consumption is O(1) too. If you cycle before getting N numbers, then M is less than N (or the LFSR is constructed incorrectly).
You can find the parameters for maximum length LFSRs up to 168 bits here (from wikipedia): http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf
Here's some java code:
/**
* Generate a sequence of unique "random" numbers in [0,M)
* @author dkoes
*
*/
public class UniqueRandom
{
long lfsr;
long mask;
long max;
private static long seed = 1;
//indexed by number of bits
private static int [][] taps = {
null, // 0
null, // 1
null, // 2
{3,2}, //3
{4,3},
{5,3},
{6,5},
{7,6},
{8,6,5,4},
{9,5},
{10,7},
{11,9},
{12,6,4,1},
{13,4,3,1},
{14,5,3,1},
{15,14},
{16,15,13,4},
{17,14},
{18,11},
{19,6,2,1},
{20,17},
{21,19},
{22,21},
{23,18},
{24,23,22,17},
{25,22},
{26,6,2,1},
{27,5,2,1},
{28,25},
{29,27},
{30,6,4,1},
{31,28},
{32,22,2,1},
{33,20},
{34,27,2,1},
{35,33},
{36,25},
{37,5,4,3,2,1},
{38,6,5,1},
{39,35},
{40,38,21,19},
{41,38},
{42,41,20,19},
{43,42,38,37},
{44,43,18,17},
{45,44,42,41},
{46,45,26,25},
{47,42},
{48,47,21,20},
{49,40},
{50,49,24,23},
{51,50,36,35},
{52,49},
{53,52,38,37},
{54,53,18,17},
{55,31},
{56,55,35,34},
{57,50},
{58,39},
{59,58,38,37},
{60,59},
{61,60,46,45},
{62,61,6,5},
{63,62},
};
//m is upperbound; things break if it isn't positive
UniqueRandom(long m)
{
max = m;
lfsr = seed; //could easily pass a starting point instead
//figure out number of bits
int bits = 0;
long b = m;
while((b >>>= 1) != 0)
{
bits++;
}
bits++;
if(bits < 3)
bits = 3;
mask = 0;
for(int i = 0; i < taps[bits].length; i++)
{
mask |= (1L << (taps[bits][i]-1));
}
}
//return -1 if we've cycled
long next()
{
long ret = -1;
if(lfsr == 0)
return -1;
do {
ret = lfsr;
//update lfsr - from wikipedia
long lsb = lfsr & 1;
lfsr >>>= 1;
if(lsb == 1)
lfsr ^= mask;
if(lfsr == seed)
lfsr = 0; //cycled, stick
ret--; //zero is stuck state, never generated so sub 1 to get it
} while(ret >= max);
return ret;
}
}
Here is a way to generate random results without repeats. It also works for strings. It's in C#, but the logic should work in many places. Put the random results in a list and check whether the new random element is in that list. If not, then you have a new random element. If it is in that list, repeat the random draw until you get an element that is not in that list.
List<string> Erledigte = new List<string>();
private void Form1_Load(object sender, EventArgs e)
{
label1.Text = "";
listBox1.Items.Add("a");
listBox1.Items.Add("b");
listBox1.Items.Add("c");
listBox1.Items.Add("d");
listBox1.Items.Add("e");
}
private void button1_Click(object sender, EventArgs e)
{
Random rand = new Random();
int index=rand.Next(0, listBox1.Items.Count);
string rndString = listBox1.Items[index].ToString();
if (listBox1.Items.Count <= Erledigte.Count)
{
return;
}
else
{
if (Erledigte.Contains(rndString))
{
//MessageBox.Show("vorhanden");
while (Erledigte.Contains(rndString))
{
index = rand.Next(0, listBox1.Items.Count);
rndString = listBox1.Items[index].ToString();
}
}
Erledigte.Add(rndString);
label1.Text += rndString;
}
}
For a sequence to be random there should not be any auto correlation. The restriction that the numbers should not repeat means the next number should depend on all the previous numbers which means it is not random anymore....
If you can generate 'small' random numbers, you can generate 'large' random numbers by integrating them: add a small random increment to each 'previous'.
const size_t amount = 100; // a limited amount of random numbers
vector<long int> numbers;
numbers.reserve( amount );
const short int spread = 250; // about 250 between each random number
numbers.push_back( myrandom( spread ) );
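for( size_t n = 1; n != amount; ++n ) { // the first number was pushed above
const short int increment = 1 + myrandom( spread ); // at least 1, so no value repeats (assumes myrandom never returns a negative value)
numbers.push_back( numbers.back() + increment );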
}
myshuffle( numbers );
The myrandom and myshuffle functions I hereby generously delegate to others :)
To get non-repeated random numbers and avoid wasting time checking for duplicates and drawing new numbers over and over, use the method below, which keeps the number of calls to Rand to a minimum.
For example, if you want to get 100 non-repeated random numbers:
1. fill an array with the numbers from 1 to 100
2. get a random number using the Rand function in the range (1-100)
3. use the generated random number as an index to get the value from the array (Numbers[IndexGeneratedFromRandFunction])
4. shift the numbers in the array after that index to the left
5. repeat from step 2, but now the range should be (1-99), and so on
Now we have an array with different numbers!
int main() {
int b[(the number of them)];
for (int i = 0; i < (the number of them); i++) {
int a = rand() % (the number of them + 1) + 1;
int j = 0;
while (j < i) {
if (a == b[j]) {
a = rand() % (the number of them + 1) + 1;
j = -1;
}
j++;
}
b[i] = a;
}
}

Create Random Number Sequence with No Repeats

Duplicate:
Unique random numbers in O(1)?
I want a pseudo-random number generator that can generate numbers with no repeats in a random order.
For example:
random(10)
might return
5, 9, 1, 4, 2, 8, 3, 7, 6, 10
Is there a better way to do it other than making the range of numbers and shuffling them about, or checking the generated list for repeats?
Edit:
Also I want it to be efficient in generating big numbers without the entire range.
Edit:
I see everyone suggesting shuffle algorithms. But if I want to generate large random numbers (1024 bytes+), then that method would take a lot more memory than if I just used a regular RNG and inserted into a Set until it was a specified length, right? Is there no better mathematical algorithm for this?
You may be interested in a linear feedback shift register.
We used to build these out of hardware, but I've also done them in software. It uses a shift register with some of the bits xor'ed and fed back to the input, and if you pick just the right "taps" you can get a sequence that's as long as the register size. That is, a 16-bit lfsr can produce a sequence 65535 long with no repeats. It's statistically random but of course eminently repeatable. Also, if it's done wrong, you can get some embarrassingly short sequences. If you look up the lfsr, you will find examples of how to construct them properly (which is to say, "maximal length").
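For a concrete picture of the 16-bit case, here is a small C++ sketch of a Galois-style LFSR. The feedback mask 0xB400 corresponds to taps at bits 16, 14, 13 and 11, a commonly quoted maximal-length choice; the function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// One step of a 16-bit Galois LFSR with taps 16, 14, 13, 11 (mask 0xB400).
std::uint16_t lfsr16_step(std::uint16_t state)
{
    const std::uint16_t lsb = state & 1u;  // the bit shifted out
    state >>= 1;
    if (lsb) {
        state ^= 0xB400u;                  // feed it back into the tap positions
    }
    return state;
}

// Count how many steps it takes to come back to the starting state.
std::uint32_t lfsr16_period(std::uint16_t start)
{
    assert(start != 0);                    // all zeros is the stuck state
    std::uint16_t state = start;
    std::uint32_t steps = 0;
    do {
        state = lfsr16_step(state);
        ++steps;
    } while (state != start);
    return steps;
}
```

With these taps `lfsr16_period` reports 65535 for any nonzero start, i.e. every nonzero 16-bit value shows up exactly once per cycle.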
A shuffle is a perfectly good way to do this (provided you do not introduce a bias using the naive algorithm). See Fisher-Yates shuffle.
If a random number is guaranteed to never repeat it is no longer random and the amount of randomness decreases as the numbers are generated (after nine numbers random(10) is rather predictable and even after only eight you have a 50-50 chance).
I understand you don't want a shuffle for large ranges, since you'd have to store the whole list to do so.
Instead, use a reversible pseudo-random hash. Then feed in the values 0 1 2 3 4 5 6 etc in turn.
There are infinite numbers of hashes like this. They're not too hard to generate if they're restricted to a power of 2, but any base can be used.
Here's one that would work for example if you wanted to go through all 2^32 32 bit values. It's easiest to write because the implicit mod 2^32 of integer math works to your advantage in this case.
unsigned int reversableHash(unsigned int x)
{
x*=0xDEADBEEF;
x=x^(x>>17);
x*=0x01234567;
x+=0x88776655;
x=x^(x>>4);
x=x^(x>>9);
x*=0x91827363;
x=x^(x>>7);
x=x^(x>>11);
x=x^(x>>20);
x*=0x77773333;
return x;
}
If you don't mind mediocre randomness properties and if the number of elements allows it then you could use a linear congruential random number generator.
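A possible C++ sketch of that idea, assuming the number of elements is a power of two. With a power-of-two modulus, an odd increment and a multiplier that is 1 more than a multiple of 4, the generator visits every value once per cycle (the Hull-Dobell conditions). The constants 1664525 and 1013904223 are one well-known pair that satisfies them; `FullPeriodLcg` is a made-up name.

```cpp
#include <cassert>
#include <cstdint>

class FullPeriodLcg {
public:
    // Generates values in [0, 2^bits) without repeats for 2^bits calls.
    FullPeriodLcg(unsigned bits, std::uint32_t seed)
        : mask_(bits >= 32 ? 0xFFFFFFFFu : ((std::uint32_t{1} << bits) - 1u)),
          state_(seed & mask_)
    {
        assert(bits >= 1 && bits <= 32);
    }

    std::uint32_t next()
    {
        // multiplier % 4 == 1 and the increment is odd, so the period is 2^bits
        state_ = (state_ * 1664525u + 1013904223u) & mask_;
        return state_;
    }

private:
    std::uint32_t mask_;
    std::uint32_t state_;
};
```

The "mediocre randomness" caveat is real: with a power-of-two modulus the lowest bit simply alternates, and the other low bits have short periods too.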
A shuffle is the best you can do for random numbers in a specific range with no repeats. The reason that the method you describe (randomly generate numbers and put them in a Set until you reach a specified length) is less efficient is because of duplicates. Theoretically, that algorithm might never finish. At best it will finish in an indeterminable amount of time, as compared to a shuffle, which will always run in a highly predictable amount of time.
Response to edits and comments:
If, as you indicate in the comments, the range of numbers is very large and you want to select relatively few of them at random with no repeats, then the likelihood of repeats diminishes rapidly. The bigger the difference in size between the range and the number of selections, the smaller the likelihood of repeat selections, and the better the performance will be for the select-and-check algorithm you describe in the question.
What about using a GUID generator (like the one in .NET)? Granted, it is not guaranteed that there will be no duplicates, however the chance of getting one is pretty low.
This has been asked before - see my answer to the previous question. In a nutshell: You can use a block cipher to generate a secure (random) permutation over any range you want, without having to store the entire permutation at any point.
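To illustrate the shape of that approach without pulling in a crypto library, here is a C++ sketch that uses a tiny 4-round Feistel network as a stand-in for a real block cipher, plus "cycle walking" to restrict it to an arbitrary range. The round function is an ad hoc mixer I made up and is NOT secure; the class name `FeistelPermutation` is hypothetical too. Only the structure (a keyed bijection on a slightly larger domain, re-applied until the result lands in range) is the point.

```cpp
#include <cassert>
#include <cstdint>

class FeistelPermutation {
public:
    FeistelPermutation(std::uint32_t range, std::uint32_t key)
        : range_(range), key_(key), half_bits_(1), half_mask_(0)
    {
        assert(range >= 1);
        // Smallest domain of 2^(2 * half_bits_) values that covers the range.
        while (half_bits_ < 16 &&
               (std::uint64_t{1} << (2 * half_bits_)) < range) {
            ++half_bits_;
        }
        half_mask_ = (std::uint32_t{1} << half_bits_) - 1u;
    }

    // Maps index in [0, range) to a unique value in [0, range).
    std::uint32_t operator()(std::uint32_t index) const
    {
        assert(index < range_);
        std::uint32_t value = index;
        do {
            value = encrypt(value);   // cycle walking: retry until in range
        } while (value >= range_);
        return value;
    }

private:
    std::uint32_t encrypt(std::uint32_t value) const
    {
        std::uint32_t left = value >> half_bits_;
        std::uint32_t right = value & half_mask_;
        for (std::uint32_t round = 0; round < 4; ++round) {
            const std::uint32_t mixed = round_function(right, round);
            const std::uint32_t new_right = left ^ mixed;
            left = right;
            right = new_right;
        }
        return (left << half_bits_) | right;
    }

    // Toy round function (assumption: any function works for bijectivity).
    std::uint32_t round_function(std::uint32_t x, std::uint32_t round) const
    {
        std::uint32_t h = (x + key_ + round) * 0x9E3779B1u;
        h ^= h >> 15;
        return h & half_mask_;
    }

    std::uint32_t range_;
    std::uint32_t key_;
    unsigned half_bits_;
    std::uint32_t half_mask_;
};
```

Feeding in 0, 1, 2, ... then yields each value of the range exactly once, and nothing but the key has to be stored.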
If you want to create large (say, 64 bits or greater) random numbers with no repeats, then just create them. If you're using a good random number generator, that actually has enough entropy, then the odds of generating repeats are so minuscule as to not be worth worrying about.
For instance, when generating cryptographic keys, no one actually bothers checking to see if they've generated the same key before; since you're trusting your random number generator that a dedicated attacker won't be able to get the same key out, then why would you expect that you would come up with the same key accidentally?
Of course, if you have a bad random number generator (like the Debian SSL random number generator vulnerability), or are generating small enough numbers that the birthday paradox gives you a high chance of collision, then you will need to actually do something to ensure you don't get repeats. But for large random numbers with a good generator, just trust probability not to give you any repeats.
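To put rough numbers on "trust probability", the usual birthday approximation says that drawing n values uniformly from N possibilities gives a repeat with probability about 1 - exp(-n(n-1)/(2N)). A small C++ helper for playing with it (the name `collision_probability` is made up, and the formula is an approximation, not an exact count):

```cpp
#include <cassert>
#include <cmath>

// Approximate chance of at least one repeat among `draws` uniform picks
// from `space` equally likely values: 1 - exp(-n(n-1) / (2N)).
double collision_probability(double draws, double space)
{
    assert(draws >= 0.0 && space > 0.0);
    // expm1 keeps precision when the probability is extremely small
    return -std::expm1(-draws * (draws - 1.0) / (2.0 * space));
}
```

For the classic 23 people and 365 days this comes out at roughly one half, while a million 64-bit values give a chance well below one in a million, which is why checking is only worth it for small numbers or bad generators.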
As you generate your numbers, use a Bloom filter to detect duplicates. This would use a minimal amount of memory. There would be no need to store earlier numbers in the series at all.
The trade off is that your list could not be exhaustive in your range. If your numbers are truly on the order of 256^1024, that's hardly any trade off at all.
(Of course if they are actually random on that scale, even bothering to detect duplicates is a waste of time. If every computer on earth generated a trillion random numbers that size every second for trillions of years, the chance of a collision is still absolutely negligible.)
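A bare-bones C++ sketch of such a filter, assuming 64-bit values and four probe positions derived from one mixing function (the mixer follows the SplitMix64 finalizer; the class and its sizes are made up for illustration). A "possibly seen" answer can be a false positive, which is the non-exhaustive trade-off mentioned above: a number that was never generated may still get rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class BloomFilter {
public:
    explicit BloomFilter(std::size_t bit_count) : bits_(bit_count, false)
    {
        assert(bit_count > 0);
    }

    // Returns true if `value` was possibly seen before (may be a false
    // positive), false if it is definitely new. The value is recorded
    // either way, so a second call with the same value returns true.
    bool test_and_insert(std::uint64_t value)
    {
        bool possibly_seen = true;
        for (std::uint64_t i = 0; i < kHashCount; ++i) {
            const std::uint64_t h = mix(value + i * 0x9E3779B97F4A7C15ull);
            const std::size_t index = static_cast<std::size_t>(h % bits_.size());
            if (!bits_[index]) {
                possibly_seen = false;  // at least one probe bit was unset
                bits_[index] = true;
            }
        }
        return possibly_seen;
    }

private:
    static constexpr std::uint64_t kHashCount = 4;

    static std::uint64_t mix(std::uint64_t x)
    {
        x ^= x >> 30;
        x *= 0xBF58476D1CE4E5B9ull;
        x ^= x >> 27;
        x *= 0x94D049BB133111EBull;
        x ^= x >> 31;
        return x;
    }

    std::vector<bool> bits_;
};
```

In the generate loop you would simply redraw whenever `test_and_insert` returns true.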
I second gbarry's answer about using an LFSR. They are very efficient and simple to implement even in software and are guaranteed not to repeat in (2^N - 1) uses for an LFSR with an N-bit shift-register.
There are some drawbacks however: by observing a small number of outputs from the RNG, one can reconstruct the LFSR and predict all values it will generate, making them unusable for cryptography and anywhere else where a good RNG is needed. The second problem is that either the all-zero word or the all-one (in terms of bits) word is invalid, depending on the LFSR implementation. The third issue, which is relevant to your question, is that the maximum number generated by the LFSR is always a power of 2, minus 1 (or a power of 2, minus 2).
The first drawback might not be an issue depending on your application. From the example you gave, it seems that you are not expecting zero to be among the answers; so, the second issue does not seem relevant to your case.
The maximum value (and thus range) problem can be solved by calling the LFSR repeatedly until you get a number within your range. Here's an example:
Say you want to have numbers between 1 and 10 (as in your example). You would use a 4-bit LFSR, which has a range of [1, 15] inclusive. Here's pseudocode showing how to get a number in the range [1,10]:
x = LFSR.getRandomNumber();
while (x > 10) {
x = LFSR.getRandomNumber();
}
You should embed the previous code in your RNG; so that the caller wouldn't care about implementation.
Note that this would slow down your RNG if you use a large shift-register and the maximum number you want is not a power of 2 - 1.
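Putting the pieces together for the 1 to 10 example, a small C++ sketch with the retry loop embedded in the generator. It assumes a Galois-style 4-bit LFSR with taps at bits 4 and 3 (feedback mask 0xC); the class name `Lfsr4` and its methods are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

class Lfsr4 {
public:
    explicit Lfsr4(std::uint8_t seed) : state_(seed & 0x0F)
    {
        assert(state_ != 0);  // all zeros is the stuck state
    }

    // Next raw state, somewhere in [1, 15].
    std::uint8_t raw()
    {
        const std::uint8_t lsb = state_ & 1u;
        state_ >>= 1;
        if (lsb) {
            state_ ^= 0x0C;   // taps at bits 4 and 3
        }
        return state_;
    }

    // Next value in [1, max]; out-of-range states are skipped, so the
    // caller never sees the retry loop.
    std::uint8_t next_in_range(std::uint8_t max)
    {
        assert(max >= 1 && max <= 15);
        std::uint8_t x = raw();
        while (x > max) {
            x = raw();
        }
        return x;
    }

private:
    std::uint8_t state_;
};
```

Seeded with 1 and asked for values up to 10, this walks through 6, 3, 10, 5, 7, 9, 8, 4, 2, 1 and then starts over, so ten consecutive calls never repeat.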
This answer suggests some strategies for getting what you want and ensuring they are in a random order using some already well-known algorithms.
There is an inside out version of the Fisher-Yates shuffle algorithm, called the Durstenfeld version, that randomly distributes sequentially acquired items into arrays and collections while loading the array or collection.
One thing to remember is that the Fisher-Yates (AKA Knuth) shuffle or the Durstenfeld version used at load time is highly efficient with arrays of objects because only the reference pointer to the object is being moved and the object itself doesn't have to be examined or compared with any other object as part of the algorithm.
I will give both algorithms further below.
If you want really huge random numbers, on the order of 1024 bytes or more, a really good random generator that can generate unsigned bytes or words at a time will suffice. Randomly generate as many bytes or words as you need to construct the number, make it into an object with a reference pointer to it and, hey presto, you have a really huge random integer. If you need a specific really huge range, you can add a base value of zero bytes to the low-order end of the byte sequence to shift the value up. This may be your best option.
If you need to eliminate duplicates of really huge random numbers, then that is trickier. Even with really huge random numbers, removing duplicates also makes them significantly biased and not random at all. If you have a really large set of unduplicated really huge random numbers and you randomly select from the ones not yet selected, then the bias is only the bias in creating the huge values for the really huge set of numbers from which to choose. A reverse version of Durstenfeld's version of the Fisher-Yates shuffle could be used to randomly choose values from a really huge set of them, remove them from the remaining values from which to choose, and insert them into a new array that is a subset; it could do this with just the source and target arrays in situ. This would be very efficient.
This may be a good strategy for getting a small number of random numbers with enormous values from a really large set of them in which they are not duplicated. Just pick a random location in the source set, obtain its value, swap its value with the top element in the source set, reduce the size of the source set by one and repeat with the reduced size source set until you have chosen enough values. This is essentially the Durstenfeld version of Fisher-Yates in reverse. You can then use the Durstenfeld version of the Fisher-Yates algorithm to insert the acquired values into the destination set. However, that is overkill since they should be randomly chosen and randomly ordered as given here.
Both algorithms assume you have some random number instance method, nextInt(int setSize), that generates a random integer from zero (inclusive) to setSize (exclusive), meaning there are setSize possible values. In this case, the largest argument will be the size of the array, since the last index of the array is size-1.
The first algorithm is the Durstenfeld version of the Fisher-Yates (aka Knuth) shuffle algorithm as applied to an array of arbitrary length, one that simply randomly positions the integers from 0 to the length of the array minus one into the array. The array need not be an array of integers, but can be an array of any objects that are acquired sequentially which, effectively, makes it an array of reference pointers. It is simple, short and very effective.
int size = someNumber;
int[] array = new int[size]; // here is the array to load
// i will also conveniently be the value to load, but any sequentially acquired
// object will work
for (int i = 0; i < size; i++) { // conveniently, i is also the value to load
// you can instance or acquire any object at this place in the algorithm to load
// by reference, into the array and use a pointer to it in place of j
int j = i; // in this example, j is trivially i
if (i == 0) { // first integer goes into first location
array[i] = j; // this may get swapped from here later
} else { // subsequent integers go into random locations
// the next random location will be somewhere in the locations
// already used or a new one at the end
// here we get the next random location
// to preserve true randomness without a significant bias
// it is REALLY IMPORTANT that the newest value could be
// stored in the newest location, that is,
// location has to be able to randomly have the value i
int location = nextInt(i + 1); // a random value between 0 and i
// move the random location's value to the new location
array[i] = array[location];
array[location] = j; // put the new value into the random location
} // end if...else
} // end for
Voila, you now have an already randomized array.
If you want to randomly shuffle an array you already have, here is the standard Fisher-Yates algorithm.
type[] array = new type[size];
// some code that loads array...
// randomly pick an item anywhere in the current array segment,
// swap it with the top element in the current array segment,
// then shorten the array segment by 1
// just as with the Durstenfeld version above,
// it is REALLY IMPORTANT that an element could get
// swapped with itself to avoid any bias in the randomization
type temp; // this will get assigned a value before used
for (int i = array.length - 1; i > 0; i--) {
int location = nextInt(i + 1); // a random value between 0 and i
temp = array[i];
array[i] = array[location];
array[location] = temp;
} // end for
For sequenced collections and sets, i.e. some type of list object, you could just use adds/or inserts with an index value that allows you to insert items anywhere, but it has to allow adding or appending after the current last item to avoid creating bias in the randomization.
Shuffling N elements doesn't take up excessive memory...think about it. You only swap one element at a time, so the maximum memory used is that of N+1 elements.
Assuming you have a random or pseudo-random number generator, even if it's not guaranteed to return unique values, you can implement one that returns unique values each time using this code, assuming that the upper limit remains constant (i.e. you always call it with random(10), and don't call it with random(10); random(11)).
The code doesn't check for errors. You can add that yourself if you want to.
It also requires a lot of memory if you want a large range of numbers.
/* the function returns a random number between 0 and max -1
* not necessarily unique
* I assume it's written
*/
int random(int max);
/* the function returns a unique random number between 0 and max - 1 */
int unique_random(int max)
{
static int *list = NULL; /* contains a list of numbers we haven't returned */
static int in_progress = 0; /* 0 --> we haven't started randomizing numbers
* 1 --> we have started randomizing numbers
*/
static int count;
static int prev_max = 0;
// initialize the list
if (!in_progress || (prev_max != max)) {
if (list != NULL) {
free(list);
}
list = malloc(sizeof(int) * max);
prev_max = max;
in_progress = 1;
count = max - 1;
int i;
for (i = max - 1; i >= 0; --i) {
list[i] = i;
}
}
/* now choose one from the list */
int index = random(count + 1); /* random(n) excludes n, so this picks from list[0..count] */
int retval = list[index];
/* now we throw away the returned value.
* we do this by shortening the list by 1
* and replacing the element we returned with
* the highest remaining number
*/
swap(&list[index], &list[count]);
/* when the count reaches 0 we start over */
if (count == 0) {
in_progress = 0;
free(list);
list = 0;
} else { /* reduce the counter by 1 */
count--;
}
return retval;
}
/* swap two numbers */
void swap(int *x, int *y)
{
int temp = *x;
*x = *y;
*y = temp;
}
Actually, there's a minor point to make here; a random number generator which is not permitted to repeat is not random.
Suppose you wanted to generate a series of 256 random numbers without repeats.
Create a 256-bit (32-byte) memory block initialized with zeros, let's call it b
Your looping variable will be n, the number of numbers yet to be generated
Loop from n = 256 to n = 1
Generate a random number r in the range [0, n)
Find the r-th zero bit in your memory block b, let's call it p
Put p in your list of results, an array called q
Flip the p-th bit in memory block b to 1
After the n = 1 pass, you are done generating your list of numbers
Here's a short example of what I am talking about, using n = 4 initially:
**Setup**
b = 0000
q = []
**First loop pass, where n = 4**
r = 2
p = 2
b = 0010
q = [2]
**Second loop pass, where n = 3**
r = 2
p = 3
b = 0011
q = [2, 3]
**Third loop pass, where n = 2**
r = 0
p = 0
b = 1011
q = [2, 3, 0]
**Fourth and final loop pass, where n = 1**
r = 0
p = 1
b = 1111
q = [2, 3, 0, 1]
Please check the answers at
Generate sequence of integers in random order without constructing the whole list upfront
My answer there is as follows:
A very simple generator is 1+((power(r,x)-1) mod p). For values of x from 1 to p-1 it produces each value from 1 to p-1 exactly once, in a scrambled order, provided that p is prime and r is a primitive root modulo p (r and p both being prime with r <> p is not enough by itself; for example r = 2, p = 7 only yields 2, 4, 1 repeatedly).
I asked a similar question before, but mine was for the whole range of an int; see Looking for a Hash Function /Ordered Int/ to /Shuffled Int/
static std::unordered_set<long> s;
long l;
do {
    l = generator(); // roll again while the value is already in the set
} while (s.find(l) != s.end());
s.insert(l);
generator() being your random number generator. You keep rolling numbers as long as the entry is already in your set, then you add the new one to it. You get the idea.
I did it with long for the example, but you should make that a template if your PRNG is templatized.
An alternative is to use a cryptographically secure PRNG, which has a very low probability of generating the same number twice.
If you don't mind poor statistical properties in the generated sequence, there is one method:
Let's say you want to generate N numbers of 1024 bits each. You can sacrifice some bits of each generated number to act as a "counter".
So you generate each random number, but into some bits you have chosen you put a binary-encoded counter (from a variable that you increase each time the next random number is generated).
You can split that counter into single bits and put them in some of the less significant bits of the generated number.
That way you are sure to get a unique number each time.
For example, each generated number looks like this:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyxxxxyxyyyyxxyxx
where the xs are taken directly from the generator, and the ys are taken from the counter variable.
Mersenne twister
Description of which can be found here on Wikipedia: Mersenne twister
Look at the bottom of the page for implementations in various languages.
The problem is to select a "random" sequence of N unique numbers from the range 1..M where there is no constraint on the relationship between N and M (M could be much bigger, about the same, or even smaller than N; they may not be relatively prime).
Expanding on the linear feedback shift register answer: for a given M, construct a maximal LFSR for the smallest power of two that is larger than M. Then just grab your numbers from the LFSR, throwing out numbers larger than M. On average, you will throw out at most half the generated numbers (since by construction more than half the range of the LFSR is less than M), so the expected running time of getting a number is O(1). You are not storing previously generated numbers, so space consumption is O(1) too. If you cycle before getting N numbers, then M is less than N (or the LFSR is constructed incorrectly).
You can find the parameters for maximum length LFSRs up to 168 bits here (from wikipedia): http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf
Here's some Java code:
/**
* Generate a sequence of unique "random" numbers in [0,M)
* @author dkoes
*
*/
public class UniqueRandom
{
long lfsr;
long mask;
long max;
private static long seed = 1;
//indexed by number of bits
private static int [][] taps = {
null, // 0
null, // 1
null, // 2
{3,2}, //3
{4,3},
{5,3},
{6,5},
{7,6},
{8,6,5,4},
{9,5},
{10,7},
{11,9},
{12,6,4,1},
{13,4,3,1},
{14,5,3,1},
{15,14},
{16,15,13,4},
{17,14},
{18,11},
{19,6,2,1},
{20,17},
{21,19},
{22,21},
{23,18},
{24,23,22,17},
{25,22},
{26,6,2,1},
{27,5,2,1},
{28,25},
{29,27},
{30,6,4,1},
{31,28},
{32,22,2,1},
{33,20},
{34,27,2,1},
{35,33},
{36,25},
{37,5,4,3,2,1},
{38,6,5,1},
{39,35},
{40,38,21,19},
{41,38},
{42,41,20,19},
{43,42,38,37},
{44,43,18,17},
{45,44,42,41},
{46,45,26,25},
{47,42},
{48,47,21,20},
{49,40},
{50,49,24,23},
{51,50,36,35},
{52,49},
{53,52,38,37},
{54,53,18,17},
{55,31},
{56,55,35,34},
{57,50},
{58,39},
{59,58,38,37},
{60,59},
{61,60,46,45},
{62,61,6,5},
{63,62},
};
//m is upperbound; things break if it isn't positive
UniqueRandom(long m)
{
max = m;
lfsr = seed; //could easily pass a starting point instead
//figure out number of bits
int bits = 0;
long b = m;
while((b >>>= 1) != 0)
{
bits++;
}
bits++;
if(bits < 3)
bits = 3;
mask = 0;
for(int i = 0; i < taps[bits].length; i++)
{
mask |= (1L << (taps[bits][i]-1));
}
}
//return -1 if we've cycled
long next()
{
long ret = -1;
if(lfsr == 0)
return -1;
do {
ret = lfsr;
//update lfsr - from wikipedia
long lsb = lfsr & 1;
lfsr >>>= 1;
if(lsb == 1)
lfsr ^= mask;
if(lfsr == seed)
lfsr = 0; //cycled, stick
ret--; //zero is stuck state, never generated so sub 1 to get it
} while(ret >= max);
return ret;
}
}
Here is a way to pick random elements without repeating results. It also works for strings. It's in C#, but the logic should carry over to most languages. Put the random results in a list and check whether each new random element is already in that list. If it is not, you have a new random element. If it is, draw again until you get an element that is not in the list.
List<string> Erledigte = new List<string>();
private void Form1_Load(object sender, EventArgs e)
{
label1.Text = "";
listBox1.Items.Add("a");
listBox1.Items.Add("b");
listBox1.Items.Add("c");
listBox1.Items.Add("d");
listBox1.Items.Add("e");
}
private void button1_Click(object sender, EventArgs e)
{
Random rand = new Random();
int index=rand.Next(0, listBox1.Items.Count);
string rndString = listBox1.Items[index].ToString();
if (listBox1.Items.Count <= Erledigte.Count)
{
return;
}
else
{
if (Erledigte.Contains(rndString))
{
//MessageBox.Show("vorhanden");
while (Erledigte.Contains(rndString))
{
index = rand.Next(0, listBox1.Items.Count);
rndString = listBox1.Items[index].ToString();
}
}
Erledigte.Add(rndString);
label1.Text += rndString;
}
}
For a sequence to be random, there should not be any autocorrelation. The restriction that the numbers must not repeat means the next number depends on all the previous numbers, which means the sequence is not random anymore.
If you can generate 'small' random numbers, you can generate 'large' random numbers by integrating them: add a small random increment to each 'previous'.
const size_t amount = 100; // a limited amount of random numbers
vector<long int> numbers;
numbers.reserve( amount );
const short int spread = 250; // about 250 between each random number
numbers.push_back( myrandom( spread ) );
for( size_t n = 1; n != amount; ++n ) { // start at 1: the first number is already stored
const short int increment = myrandom( spread ); // must be at least 1, or numbers repeat
numbers.push_back( numbers.back() + increment );
}
myshuffle( numbers );
The myrandom and myshuffle functions I hereby generously delegate to others :)
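One possible way to fill in the two delegated helpers, using the standard <random> and <algorithm> headers. This is a sketch under assumptions, not the answerer's intent: the shared engine() function and the fixed seed are made up for illustration, and myrandom is assumed to return a value in [1, spread] so that the running sum strictly increases and no number can repeat.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Shared engine for both helpers. The fixed seed only keeps this sketch
// reproducible; seed from std::random_device for real use.
inline std::mt19937& engine() {
    static std::mt19937 eng(12345);
    return eng;
}

// Returns an increment in [1, spread]; never 0, so sums never repeat.
long myrandom(long spread) {
    std::uniform_int_distribution<long> dist(1, spread);
    return dist(engine());
}

// Randomly reorders the increasing sequence in place.
void myshuffle(std::vector<long>& numbers) {
    std::shuffle(numbers.begin(), numbers.end(), engine());
}
```

Before the shuffle the numbers are strictly increasing, which is what guarantees uniqueness; the shuffle only hides that ordering.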
To get non-repeating random numbers without wasting time checking for duplicates and drawing new numbers over and over, use the method below, which keeps the number of calls to Rand to a minimum.
For example, if you want 100 non-repeating random numbers:
1. Fill an array with the numbers from 1 to 100.
2. Get a random number in the range 1-100 using the Rand function.
3. Use the generated random number as an index to get the value from the array (Numbers[IndexGeneratedFromRandFunction]).
4. Shift the numbers in the array after that index to the left.
5. Repeat from step 2, but now the range should be 1-99, and so on.
Now we have an array with different numbers!
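The steps above can be sketched in C++ as follows. The function name draw_without_repeats is made up for illustration, and a std::vector with erase stands in for the array and the manual left shift; indices here run from 0 rather than 1.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draws every value from 1..count exactly once, in random order.
std::vector<int> draw_without_repeats(int count, std::mt19937& eng) {
    // Step 1: fill the pool with 1..count.
    std::vector<int> pool(count);
    for (int i = 0; i < count; ++i) pool[i] = i + 1;

    std::vector<int> result;
    result.reserve(count);
    while (!pool.empty()) {
        // Step 2: random index into what is left of the pool.
        std::uniform_int_distribution<std::size_t> dist(0, pool.size() - 1);
        const std::size_t index = dist(eng);
        // Step 3: take the value stored at that index.
        result.push_back(pool[index]);
        // Step 4: erase shifts the later elements left by one.
        pool.erase(pool.begin() + static_cast<std::ptrdiff_t>(index));
        // Step 5: the loop repeats with a pool that is one element smaller.
    }
    return result;
}
```

Each draw costs exactly one random number, but the shift in step 4 is linear in the pool size, so the whole run is quadratic. Swapping the chosen element with the last one and shrinking the pool (the Fisher-Yates shuffle, which std::shuffle implements) avoids the shift.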
#include <cstdlib>

int main() {
const int count = 10; // how many unique numbers you want
int b[count];
for (int i = 0; i < count; i++) {
int a = rand() % (count + 1) + 1; // candidate in [1, count + 1]
int j = 0;
while (j < i) {
if (a == b[j]) {
// Duplicate found: draw again and restart the scan from the beginning
a = rand() % (count + 1) + 1;
j = -1;
}
j++;
}
b[i] = a;
}
}