Create Random Number Sequence with No Repeats - c++

Duplicate:
Unique random numbers in O(1)?
I want a pseudo-random number generator that can generate numbers in a random order with no repeats.
For example:
random(10)
might return
5, 9, 1, 4, 2, 8, 3, 7, 6, 10
Is there a better way to do it other than making the range of numbers and shuffling them about, or checking the generated list for repeats?
Edit:
Also I want it to be efficient in generating big numbers without the entire range.
Edit:
I see everyone suggesting shuffle algorithms. But if I want to generate large random numbers (1024 bytes or more), then that method would take a lot more memory than if I just used a regular RNG and inserted into a Set until it reached a specified size, right? Is there no better mathematical algorithm for this?

You may be interested in a linear feedback shift register.
We used to build these out of hardware, but I've also done them in software. It uses a shift register with some of the bits xor'ed and fed back to the input, and if you pick just the right "taps" you can get a sequence that's as long as the register size. That is, a 16-bit lfsr can produce a sequence 65535 long with no repeats. It's statistically random but of course eminently repeatable. Also, if it's done wrong, you can get some embarrassingly short sequences. If you look up the lfsr, you will find examples of how to construct them properly (which is to say, "maximal length").
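As a minimal sketch of the idea in C++, here is a 16-bit Galois LFSR. The tap mask 0xB400 (taps 16, 14, 13, 11) is a commonly cited maximal-length choice and is an assumption of this example, as are the function names lfsrStep and lfsrPeriod.

```cpp
#include <cassert>
#include <cstdint>

// One step of a 16-bit Galois LFSR. The mask 0xB400 encodes taps
// 16, 14, 13 and 11 (assumed here to be a maximal-length tap set).
std::uint16_t lfsrStep(std::uint16_t state) {
    assert(state != 0);  // zero is the stuck state and never appears
    const bool lsb = (state & 1u) != 0;
    state = static_cast<std::uint16_t>(state >> 1);
    if (lsb) {
        state = static_cast<std::uint16_t>(state ^ 0xB400u);
    }
    return state;
}

// Number of steps until the register returns to its seed value.
std::uint32_t lfsrPeriod(std::uint16_t seed) {
    std::uint32_t steps = 0;
    std::uint16_t state = seed;
    do {
        state = lfsrStep(state);
        ++steps;
    } while (state != seed);
    return steps;
}
```

Calling lfsrPeriod with any non-zero seed should report 65535 steps, which is the "sequence as long as the register size" property described above.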

A shuffle is a perfectly good way to do this (provided you do not introduce a bias using the naive algorithm). See Fisher-Yates shuffle.

If a random number is guaranteed to never repeat it is no longer random and the amount of randomness decreases as the numbers are generated (after nine numbers random(10) is rather predictable and even after only eight you have a 50-50 chance).

I understand you don't want a shuffle for large ranges, since you'd have to store the whole list to do so.
Instead, use a reversible pseudo-random hash. Then feed in the values 0 1 2 3 4 5 6 etc in turn.
There are infinite numbers of hashes like this. They're not too hard to generate if they're restricted to a power of 2, but any base can be used.
Here's one that would work for example if you wanted to go through all 2^32 32 bit values. It's easiest to write because the implicit mod 2^32 of integer math works to your advantage in this case.
unsigned int reversableHash(unsigned int x)
{
    x *= 0xDEADBEEF;
    x = x ^ (x >> 17);
    x *= 0x01234567;
    x += 0x88776655;
    x = x ^ (x >> 4);
    x = x ^ (x >> 9);
    x *= 0x91827363;
    x = x ^ (x >> 7);
    x = x ^ (x >> 11);
    x = x ^ (x >> 20);
    x *= 0x77773333;
    return x;
}
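A hedged usage sketch: the same steps written with a fixed-width type, fed with 0, 1, 2, ... and checked for repeats. Every step (multiplying by an odd constant, adding a constant, xoring with a right-shifted copy) is invertible modulo 2^32, so no two inputs should collide. The names reversibleHash, firstValues and allDistinct are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Same sequence of steps as above, using std::uint32_t so that the
// arithmetic wraps modulo 2^32 regardless of the platform's int size.
std::uint32_t reversibleHash(std::uint32_t x) {
    x *= 0xDEADBEEFu;
    x ^= x >> 17;
    x *= 0x01234567u;
    x += 0x88776655u;
    x ^= x >> 4;
    x ^= x >> 9;
    x *= 0x91827363u;
    x ^= x >> 7;
    x ^= x >> 11;
    x ^= x >> 20;
    x *= 0x77773333u;
    return x;
}

// Hash the counter values 0, 1, ..., count - 1 in turn.
std::vector<std::uint32_t> firstValues(std::uint32_t count) {
    std::vector<std::uint32_t> out;
    out.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i) {
        out.push_back(reversibleHash(i));
    }
    return out;
}

// True when no value occurs twice.
bool allDistinct(const std::vector<std::uint32_t>& values) {
    std::unordered_set<std::uint32_t> seen(values.begin(), values.end());
    return seen.size() == values.size();
}
```

Only the counter has to be remembered between calls, so the memory cost stays constant no matter how many values are drawn.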

If you don't mind mediocre randomness properties and if the number of elements allows it then you could use a linear congruential random number generator.
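A small sketch of that idea, assuming a modulus of 2^16. With a power-of-two modulus, the generator visits every state exactly once per period when the increment is odd and the multiplier minus one is divisible by 4 (the Hull-Dobell conditions). The constants and the names lcgStep and lcgNextBelow are illustrative choices, not a recommendation.

```cpp
#include <cassert>
#include <cstdint>

// One linear congruential step modulo 2^16.
std::uint16_t lcgStep(std::uint16_t state) {
    const std::uint32_t a = 1664525u;     // a % 4 == 1
    const std::uint32_t c = 1013904223u;  // odd
    // Unsigned 32-bit arithmetic wraps; truncating to 16 bits then
    // gives the result modulo 2^16.
    return static_cast<std::uint16_t>(a * state + c);
}

// Next value in [0, limit): advance the generator and skip any state
// that falls outside the wanted range.
std::uint16_t lcgNextBelow(std::uint16_t& state, std::uint16_t limit) {
    assert(limit > 0);
    do {
        state = lcgStep(state);
    } while (state >= limit);
    return state;
}
```

Because each 16-bit state occurs once per period, the first `limit` values returned by lcgNextBelow are all different. The low-order bits of such a generator follow short cycles, which is one reason the randomness is only mediocre.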

A shuffle is the best you can do for random numbers in a specific range with no repeats. The reason that the method you describe (randomly generate numbers and put them in a Set until you reach a specified length) is less efficient is because of duplicates. Theoretically, that algorithm might never finish. At best it will finish in an indeterminable amount of time, as compared to a shuffle, which will always run in a highly predictable amount of time.
Response to edits and comments:
If, as you indicate in the comments, the range of numbers is very large and you want to select relatively few of them at random with no repeats, then the likelihood of repeats diminishes rapidly. The bigger the difference in size between the range and the number of selections, the smaller the likelihood of repeat selections, and the better the performance will be for the select-and-check algorithm you describe in the question.

What about using a GUID generator (like the one in .NET)? Granted, it is not guaranteed that there will be no duplicates, but the chance of getting one is very low.

This has been asked before - see my answer to the previous question. In a nutshell: You can use a block cipher to generate a secure (random) permutation over any range you want, without having to store the entire permutation at any point.

If you want to create large (say, 64 bits or greater) random numbers with no repeats, then just create them. If you're using a good random number generator that actually has enough entropy, then the odds of generating repeats are so minuscule as to not be worth worrying about.
For instance, when generating cryptographic keys, no one actually bothers checking to see if they've generated the same key before; since you're trusting your random number generator that a dedicated attacker won't be able to get the same key out, then why would you expect that you would come up with the same key accidentally?
Of course, if you have a bad random number generator (like the Debian SSL random number generator vulnerability), or are generating small enough numbers that the birthday paradox gives you a high chance of collision, then you will need to actually do something to ensure you don't get repeats. But for large random numbers with a good generator, just trust probability not to give you any repeats.
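To put rough numbers on the birthday paradox, the usual approximation for the chance of at least one repeat among n uniform draws from 2^b possible values is 1 - exp(-n(n-1) / 2^(b+1)). The sketch below evaluates it; the function name collisionProbability is made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability of at least one repeat among `draws` uniform
// values of `bits` bits each, using 1 - exp(-draws*(draws-1) / 2^(bits+1)).
double collisionProbability(double draws, int bits) {
    assert(draws >= 0.0 && bits > 0);
    // std::ldexp(2.0, bits) is 2 * 2^bits, i.e. 2^(bits+1).
    const double exponent = draws * (draws - 1.0) / std::ldexp(2.0, bits);
    // expm1 keeps precision when the probability is tiny.
    return -std::expm1(-exponent);
}
```

With 64-bit values, a million draws give a probability of a few in a hundred million, while about 2^32 draws push it to roughly 39%, which is where "small enough numbers" starts to matter.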

As you generate your numbers, use a Bloom filter to detect duplicates. This would use a minimal amount of memory. There would be no need to store earlier numbers in the series at all.
The trade off is that your list could not be exhaustive in your range. If your numbers are truly on the order of 256^1024, that's hardly any trade off at all.
(Of course if they are actually random on that scale, even bothering to detect duplicates is a waste of time. If every computer on earth generated a trillion random numbers that size every second for trillions of years, the chance of a collision is still absolutely negligible.)

I second gbarry's answer about using an LFSR. They are very efficient and simple to implement even in software and are guaranteed not to repeat in (2^N - 1) uses for an LFSR with an N-bit shift-register.
There are some drawbacks, however. First, by observing a small number of outputs from the RNG, one can reconstruct the LFSR and predict all values it will generate, making it unusable for cryptography or anywhere else a good RNG is needed. The second problem is that either the all-zero word or the all-one (in terms of bits) word is invalid, depending on the LFSR implementation. The third issue, which is relevant to your question, is that the maximum number generated by the LFSR is always 2^N - 1 (or 2^N - 2).
The first drawback might not be an issue depending on your application. From the example you gave, it seems that you are not expecting zero to be among the answers; so, the second issue does not seem relevant to your case.
The maximum value (and thus range) problem can be solved by reusing the LFSR until you get a number within your range. Here's an example:
Say you want to have numbers between 1 and 10 (as in your example). You would use a 4-bit LFSR, which has a range of [1, 15] inclusive. Here's pseudocode showing how to get a number in the range [1, 10]:
x = LFSR.getRandomNumber();
while (x > 10) {
    x = LFSR.getRandomNumber();
}
You should embed the previous code in your RNG; so that the caller wouldn't care about implementation.
Note that this would slow down your RNG if you use a large shift-register and the maximum number you want is not a power of 2 - 1.
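A runnable sketch of the pseudocode above, with the rejection loop embedded in the generator. It assumes a 4-bit Galois LFSR with taps 4 and 3 (mask 0xC); the class name SmallLfsr and the method nextInRange are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

class SmallLfsr {
public:
    explicit SmallLfsr(std::uint8_t seed = 1)
        : state_(static_cast<std::uint8_t>(seed & 0xFu)) {
        assert(state_ != 0);  // zero would lock the register
    }

    // Next raw value in [1, 15].
    std::uint8_t getRandomNumber() {
        const bool lsb = (state_ & 1u) != 0;
        state_ = static_cast<std::uint8_t>(state_ >> 1);
        if (lsb) {
            state_ = static_cast<std::uint8_t>(state_ ^ 0xCu);  // taps 4 and 3
        }
        return state_;
    }

    // Next value in [1, maxValue]; larger raw values are discarded.
    std::uint8_t nextInRange(std::uint8_t maxValue) {
        assert(maxValue >= 1 && maxValue <= 15);
        std::uint8_t x = getRandomNumber();
        while (x > maxValue) {
            x = getRandomNumber();
        }
        return x;
    }

private:
    std::uint8_t state_;
};
```

Ten consecutive calls to nextInRange(10) return each of 1 to 10 exactly once; after that the sequence starts over in the same order.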

This answer suggests some strategies for getting what you want and ensuring they are in a random order using some already well-known algorithms.
There is an inside out version of the Fisher-Yates shuffle algorithm, called the Durstenfeld version, that randomly distributes sequentially acquired items into arrays and collections while loading the array or collection.
One thing to remember is that the Fisher-Yates (AKA Knuth) shuffle or the Durstenfeld version used at load time is highly efficient with arrays of objects because only the reference pointer to the object is being moved and the object itself doesn't have to be examined or compared with any other object as part of the algorithm.
I will give both algorithms further below.
If you want really huge random numbers, on the order of 1024 bytes or more, a really good random generator that can generate unsigned bytes or words at a time will suffice. Randomly generate as many bytes or words as you need to construct the number, make it into an object with a reference pointer to it and, hey presto, you have a really huge random integer. If you need a specific really huge range, you can add a base value of zero bytes to the low-order end of the byte sequence to shift the value up. This may be your best option.
If you need to eliminate duplicates of really huge random numbers, then that is trickier. Even with really huge random numbers, removing duplicates also makes them significantly biased and not random at all. If you have a really large set of unduplicated really huge random numbers and you randomly select from the ones not yet selected, then the bias is only the bias in creating the huge values for the really huge set of numbers from which to choose. A reverse version of Durstenfeld's version of Fisher-Yates could be used to randomly choose values from a really huge set of them, remove them from the remaining values from which to choose and insert them into a new array that is a subset, and it could do this with just the source and target arrays in situ. This would be very efficient.
This may be a good strategy for getting a small number of random numbers with enormous values from a really large set of them in which they are not duplicated. Just pick a random location in the source set, obtain its value, swap its value with the top element in the source set, reduce the size of the source set by one and repeat with the reduced-size source set until you have chosen enough values. This is essentially the Durstenfeld version of Fisher-Yates in reverse. You can then use the Durstenfeld version of the Fisher-Yates algorithm to insert the acquired values into the destination set. However, that is overkill since they should be randomly chosen and randomly ordered as given here.
Both algorithms assume you have some random number instance method, nextInt(int setSize), that generates a random integer from zero to setSize - 1, meaning there are setSize possible values. In this case, setSize will be the size of the array, since the last index of the array is size - 1.
The first algorithm is the Durstenfeld version of the Fisher-Yates (aka Knuth) shuffle algorithm as applied to an array of arbitrary length, one that simply randomly positions the integers from 0 to the length of the array minus one into the array. The array need not be an array of integers, but can be an array of any objects that are acquired sequentially which, effectively, makes it an array of reference pointers. It is simple, short and very effective.
int size = someNumber;
int[] array = new int[size]; // here is the array to load
// i will also conveniently be the value to load, but any sequentially acquired
// object will work
for (int i = 0; i < size; i++) { // conveniently, i is also the value to load
    // you can instance or acquire any object at this place in the algorithm to load
    // by reference, into the array and use a pointer to it in place of j
    int j = i; // in this example, j is trivially i
    if (i == 0) { // first integer goes into first location
        array[i] = j; // this may get swapped from here later
    } else { // subsequent integers go into random locations
        // the next random location will be somewhere in the locations
        // already used or a new one at the end
        // here we get the next random location
        // to preserve true randomness without a significant bias
        // it is REALLY IMPORTANT that the newest value could be
        // stored in the newest location, that is,
        // location has to be able to randomly have the value i
        int location = nextInt(i + 1); // a random value between 0 and i
        // move the random location's value to the new location
        array[i] = array[location];
        array[location] = j; // put the new value into the random location
    } // end if...else
} // end for
Voila, you now have an already randomized array.
If you want to randomly shuffle an array you already have, here is the standard Fisher-Yates algorithm.
type[] array = new type[size];
// some code that loads array...
// randomly pick an item anywhere in the current array segment,
// swap it with the top element in the current array segment,
// then shorten the array segment by 1
// just as with the Durstenfeld version above,
// it is REALLY IMPORTANT that an element could get
// swapped with itself to avoid any bias in the randomization
type temp; // this will get assigned a value before used
for (int i = array.length - 1; i > 0; i--) {
    int location = nextInt(i + 1); // a random value between 0 and i
    temp = array[i];
    array[i] = array[location];
    array[location] = temp;
} // end for
For sequenced collections and sets, i.e. some type of list object, you could just use adds/or inserts with an index value that allows you to insert items anywhere, but it has to allow adding or appending after the current last item to avoid creating bias in the randomization.

Shuffling N elements doesn't take up excessive memory...think about it. You only swap one element at a time, so the maximum memory used is that of N+1 elements.

Assuming you have a random or pseudo-random number generator, even if it's not guaranteed to return unique values, you can implement one that returns unique values each time using this code, assuming that the upper limit remains constant (i.e. you always call it with random(10), and don't call it with random(10); random(11)).
The code doesn't check for errors. You can add that yourself if you want to.
It also requires a lot of memory if you want a large range of numbers.
/* the function returns a random number between 0 and max - 1
 * not necessarily unique
 * I assume it's written
 */
int random(int max);

/* swap two numbers (defined below) */
void swap(int *x, int *y);

/* the function returns a unique random number between 0 and max - 1 */
int unique_random(int max)
{
    static int *list = NULL; /* contains a list of numbers we haven't returned */
    static int in_progress = 0; /* 0 --> we haven't started randomizing numbers
                                 * 1 --> we have started randomizing numbers
                                 */
    static int count; /* index of the last number we haven't returned */
    static int prev_max = 0;

    /* initialize the list */
    if (!in_progress || (prev_max != max)) {
        if (list != NULL) {
            free(list);
        }
        list = malloc(sizeof(int) * max);
        prev_max = max;
        in_progress = 1;
        count = max - 1;
        int i;
        for (i = max - 1; i >= 0; --i) {
            list[i] = i;
        }
    }
    /* now choose one of the count + 1 numbers left in the list */
    int index = random(count + 1);
    int retval = list[index];
    /* now we throw away the returned value.
     * we do this by shortening the list by 1
     * and replacing the element we returned with
     * the last remaining number
     */
    swap(&list[index], &list[count]);
    /* when the count reaches 0 we start over */
    if (count == 0) {
        in_progress = 0;
        free(list);
        list = NULL;
    } else { /* reduce the counter by 1 */
        count--;
    }
    return retval;
}

/* swap two numbers */
void swap(int *x, int *y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}

Actually, there's a minor point to make here; a random number generator which is not permitted to repeat is not random.

Suppose you wanted to generate a series of 256 random numbers without repeats.
Create a 256-bit (32-byte) memory block initialized with zeros, let's call it b
Your looping variable will be n, the number of numbers yet to be generated
Loop from n = 256 to n = 1
Generate a random number r in the range [0, n)
Find the r-th zero bit in your memory block b, let's call it p
Put p in your list of results, an array called q
Flip the p-th bit in memory block b to 1
After the n = 1 pass, you are done generating your list of numbers
Here's a short example of what I am talking about, using n = 4 initially:
**Setup**
b = 0000
q = []
**First loop pass, where n = 4**
r = 2
p = 2
b = 0010
q = [2]
**Second loop pass, where n = 3**
r = 2
p = 3
b = 0011
q = [2, 3]
**Third loop pass, where n = 2**
r = 0
p = 0
b = 1011
q = [2, 3, 0]
** Fourth and final loop pass, where n = 1**
r = 0
p = 1
b = 1111
q = [2, 3, 0, 1]
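The steps above can be sketched in C++ as follows. The bit block b is modelled with std::vector<bool>, positions are counted from 0 starting at the left as in the worked example, and the names nthZeroBit and uniqueSequence are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Position of the r-th (0-based) bit that is still zero, scanning from
// position 0 upwards.
std::size_t nthZeroBit(const std::vector<bool>& b, std::size_t r) {
    for (std::size_t p = 0; p < b.size(); ++p) {
        if (!b[p]) {
            if (r == 0) {
                return p;
            }
            --r;
        }
    }
    assert(false && "r must be smaller than the number of zero bits");
    return b.size();
}

// Produce the numbers 0 .. count - 1 in random order without repeats.
std::vector<std::size_t> uniqueSequence(std::size_t count, std::mt19937& rng) {
    std::vector<bool> b(count, false);  // one bit per candidate value
    std::vector<std::size_t> q;         // results
    q.reserve(count);
    for (std::size_t n = count; n >= 1; --n) {
        std::uniform_int_distribution<std::size_t> dist(0, n - 1);  // r in [0, n)
        const std::size_t p = nthZeroBit(b, dist(rng));
        q.push_back(p);
        b[p] = true;
    }
    return q;
}
```

The memory cost is one bit per value, but finding the r-th zero bit is a linear scan, so generating the whole sequence takes time quadratic in its length.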

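Please check the answers at
Generate sequence of integers in random order without constructing the whole list upfront
where my own answer is the following:
A very simple generator is 1 + ((power(r, x) - 1) mod p). It takes each value from 1 to p - 1 exactly once, in a scrambled order, as x runs from 1 to p - 1, provided p is prime and r is a primitive root modulo p. Choosing r as just any prime different from p is not enough: with r = 2 and p = 7 the values repeat as 2, 4, 1, 2, 4, 1.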
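A small sketch of this generator, computing r^x mod p incrementally (for results in 1 to p - 1 this equals 1 + ((r^x - 1) mod p)). One caveat worth stating: the values only form a permutation of 1 to p - 1 when r is a primitive root modulo the prime p. The function name powerSequence is made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns r^1, r^2, ..., r^(p-1), each reduced modulo p.
// The products stay below 2^64 as long as p is below 2^32.
std::vector<std::uint64_t> powerSequence(std::uint64_t r, std::uint64_t p) {
    assert(p > 2 && r % p != 0);
    std::vector<std::uint64_t> out;
    std::uint64_t value = 1;
    for (std::uint64_t x = 1; x < p; ++x) {
        value = (value * (r % p)) % p;
        out.push_back(value);
    }
    return out;
}
```

For example, powerSequence(2, 11) yields 2, 4, 8, 5, 10, 9, 7, 3, 6, 1, which covers 1 to 10 once each, whereas powerSequence(2, 7) yields 2, 4, 1, 2, 4, 1 because 2 is not a primitive root modulo 7.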

I asked a similar question before, but mine was for the whole range of an int; see Looking for a Hash Function /Ordered Int/ to /Shuffled Int/

static std::unordered_set<long> s;
long l = generator();
while (s.find(l) != s.end()) // already seen: roll again
    l = generator();
s.insert(l);
generator() being your random number generator. You keep rolling numbers as long as the entry is already in your set, then you add the new one to it. You get the idea.
I did it with long for the example, but you should make that a template if your PRNG is templatized.
An alternative is to use a cryptographically secure PRNG, which will have a very low probability of generating the same number twice.

If you don't mind poor statistical properties of the generated sequence, there is one method:
Let's say you want to generate N numbers of 1024 bits each. You can sacrifice some bits of each generated number to act as a "counter".
So you generate each random number, but into some bits you have chosen you put a binary-encoded counter (from a variable that you increase each time the next random number is generated).
You can split that counter into single bits and put them in some of the less significant bits of the generated number.
That way you are sure you get a unique number each time.
For example, each generated number looks like this:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyxxxxyxyyyyxxyxx
where each x is taken directly from the generator, and the ys are taken from the counter variable.
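A scaled-down sketch of the counter trick, using 64-bit values instead of 1024-bit ones. As a simplification, the counter occupies the 16 lowest bits in one contiguous field rather than being scattered, and the class name CounterTaggedRandom is made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

constexpr unsigned kCounterBits = 16;  // bits sacrificed to the counter

// Produces 64-bit values whose upper 48 bits are random and whose
// lower 16 bits hold a running counter, so no value can repeat.
class CounterTaggedRandom {
public:
    explicit CounterTaggedRandom(std::uint64_t seed) : rng_(seed) {}

    std::uint64_t next() {
        // Uniqueness only holds while the counter fits in its field.
        assert(counter_ < (std::uint64_t{1} << kCounterBits));
        const std::uint64_t randomPart = rng_() << kCounterBits;
        return randomPart | counter_++;
    }

private:
    std::mt19937_64 rng_;
    std::uint64_t counter_ = 0;
};
```

Two outputs always differ in their counter field, so they can never be equal, but the field width caps how many values can be drawn (65536 here) and those bits are fully predictable.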

Mersenne twister
Description of which can be found here on Wikipedia: Mersenne twister
Look at the bottom of the page for implementations in various languages.

The problem is to select a "random" sequence of N unique numbers from the range 1..M where there is no constraint on the relationship between N and M (M could be much bigger, about the same, or even smaller than N; they may not be relatively prime).
Expanding on the linear feedback shift register answer: for a given M, construct a maximal LFSR for the smallest power of two that is larger than M. Then just grab your numbers from the LFSR, throwing out numbers larger than M. On average, you will throw out at most half the generated numbers (since by construction more than half the range of the LFSR is less than M), so the expected running time of getting a number is O(1). You are not storing previously generated numbers, so space consumption is O(1) too. If you cycle before getting N numbers, then M is less than N (or the LFSR is constructed incorrectly).
You can find the parameters for maximum length LFSRs up to 168 bits here (from wikipedia): http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf
Here's some java code:
/**
* Generate a sequence of unique "random" numbers in [0,M)
* #author dkoes
*
*/
public class UniqueRandom
{
long lfsr;
long mask;
long max;
private static long seed = 1;
//indexed by number of bits
private static int [][] taps = {
null, // 0
null, // 1
null, // 2
{3,2}, //3
{4,3},
{5,3},
{6,5},
{7,6},
{8,6,5,4},
{9,5},
{10,7},
{11,9},
{12,6,4,1},
{13,4,3,1},
{14,5,3,1},
{15,14},
{16,15,13,4},
{17,14},
{18,11},
{19,6,2,1},
{20,17},
{21,19},
{22,21},
{23,18},
{24,23,22,17},
{25,22},
{26,6,2,1},
{27,5,2,1},
{28,25},
{29,27},
{30,6,4,1},
{31,28},
{32,22,2,1},
{33,20},
{34,27,2,1},
{35,33},
{36,25},
{37,5,4,3,2,1},
{38,6,5,1},
{39,35},
{40,38,21,19},
{41,38},
{42,41,20,19},
{43,42,38,37},
{44,43,18,17},
{45,44,42,41},
{46,45,26,25},
{47,42},
{48,47,21,20},
{49,40},
{50,49,24,23},
{51,50,36,35},
{52,49},
{53,52,38,37},
{54,53,18,17},
{55,31},
{56,55,35,34},
{57,50},
{58,39},
{59,58,38,37},
{60,59},
{61,60,46,45},
{62,61,6,5},
{63,62},
};
//m is upperbound; things break if it isn't positive
UniqueRandom(long m)
{
max = m;
lfsr = seed; //could easily pass a starting point instead
//figure out number of bits
int bits = 0;
long b = m;
while((b >>>= 1) != 0)
{
bits++;
}
bits++;
if(bits < 3)
bits = 3;
mask = 0;
for(int i = 0; i < taps[bits].length; i++)
{
mask |= (1L << (taps[bits][i]-1));
}
}
//return -1 if we've cycled
long next()
{
long ret = -1;
if(lfsr == 0)
return -1;
do {
ret = lfsr;
//update lfsr - from wikipedia
long lsb = lfsr & 1;
lfsr >>>= 1;
if(lsb == 1)
lfsr ^= mask;
if(lfsr == seed)
lfsr = 0; //cycled, stick
ret--; //zero is stuck state, never generated so sub 1 to get it
} while(ret >= max);
return ret;
}
}

Here is a way to pick random elements without repeating results. It also works for strings. It's in C#, but the logic should work in many places. Put the random results in a list and check whether each new random element is in that list. If it is not, then you have a new random element. If it is in the list, repeat the random pick until you get an element that is not in the list.
List<string> Erledigte = new List<string>();
private void Form1_Load(object sender, EventArgs e)
{
label1.Text = "";
listBox1.Items.Add("a");
listBox1.Items.Add("b");
listBox1.Items.Add("c");
listBox1.Items.Add("d");
listBox1.Items.Add("e");
}
private void button1_Click(object sender, EventArgs e)
{
Random rand = new Random();
int index=rand.Next(0, listBox1.Items.Count);
string rndString = listBox1.Items[index].ToString();
if (listBox1.Items.Count <= Erledigte.Count)
{
return;
}
else
{
if (Erledigte.Contains(rndString))
{
//MessageBox.Show("vorhanden");
while (Erledigte.Contains(rndString))
{
index = rand.Next(0, listBox1.Items.Count);
rndString = listBox1.Items[index].ToString();
}
}
Erledigte.Add(rndString);
label1.Text += rndString;
}
}

For a sequence to be random there should not be any auto correlation. The restriction that the numbers should not repeat means the next number should depend on all the previous numbers which means it is not random anymore....

If you can generate 'small' random numbers, you can generate 'large' random numbers by integrating them: add a small random increment to each 'previous'.
const size_t amount = 100; // a limited amount of random numbers
vector<long int> numbers;
numbers.reserve( amount );
const short int spread = 250; // about 250 between each random number
numbers.push_back( myrandom( spread ) );
for( size_t n = 1; n != amount; ++n ) {
    // add at least 1 so two consecutive values can never be equal
    // (assumes myrandom never returns a negative value)
    const short int increment = 1 + myrandom( spread );
    numbers.push_back( numbers.back() + increment );
}
myshuffle( numbers );
The myrandom and myshuffle functions I hereby generously delegate to others :)

To get non-repeating random numbers without wasting time checking for duplicates and drawing new numbers over and over, use the method below, which keeps calls to Rand to a minimum.
For example, if you want 100 non-repeating random numbers:
1. Fill an array with the numbers from 1 to 100.
2. Get a random number in the range 1-100 using the Rand function.
3. Use the generated random number as an index to get the value from the array (Numbers[IndexGeneratedFromRandFunction]).
4. Shift the numbers in the array after that index to the left.
5. Repeat from step 2, but now the range should be 1-99, and so on.
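The five steps above can be sketched in C++ like this. It uses 0-based indices and std::mt19937 in place of a Rand function; the name drawWithoutRepeats is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Returns the numbers 1..count in random order, drawing each one from a
// shrinking pool so that no duplicate check is ever needed.
std::vector<int> drawWithoutRepeats(int count, std::mt19937& rng) {
    assert(count >= 0);
    std::vector<int> pool;
    for (int i = 1; i <= count; ++i) {
        pool.push_back(i);  // step 1: fill with 1..count
    }
    std::vector<int> result;
    result.reserve(pool.size());
    while (!pool.empty()) {
        // steps 2 and 5: random index into whatever is left
        std::uniform_int_distribution<std::size_t> pick(0, pool.size() - 1);
        const std::size_t index = pick(rng);
        result.push_back(pool[index]);  // step 3: take the value
        // step 4: erase shifts every later element one place left
        pool.erase(pool.begin() + static_cast<std::ptrdiff_t>(index));
    }
    return result;
}
```

The shift in step 4 makes each draw linear in the pool size. Swapping the chosen element with the last one and shrinking the pool, as a Fisher-Yates shuffle does, avoids that cost.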

Now we have an array with different numbers!
#include <cstdlib>

int main() {
    const int count = 10; // the number of them (example value)
    int b[count];
    for (int i = 0; i < count; i++) {
        int a = rand() % (count + 1) + 1;
        int j = 0;
        while (j < i) {
            if (a == b[j]) {
                // duplicate found: draw again and restart the scan
                a = rand() % (count + 1) + 1;
                j = -1;
            }
            j++;
        }
        b[i] = a;
    }
}

Related

C++ How to generate 10,000 UNIQUE random integers to store in a BST?

I am trying to generate 10,000 unique random integers in the range of 1 to 20,000 to store in a BST, but not sure the best way to do this.
I saw some good suggestions on how to do it with an array or a vector, but not for a BST. I have a contains method but I don't believe it will work in this scenario as it is used to search and return results on how many tries it took to find the desired number. Below is the closest I've gotten but it doesn't like my == operator. Would it be better to use an array and just store the array in the BST? Or is there a better way to use the below code so that while it's generating the numbers it's just storing them right in the tree?
for (int i = 0; i < 10000; i++)
{
int random = rand() % 20000;
tree1Ptr->add(random);
for (int j = 0; j < i; j++) {
if (tree1Ptr[j]==random) i--;
}
}
There are a couple of problems in your code. But let's go straight to the hurting point.
What's the main problem ?
From your code, it is obvious that tree1Ptr is a pointer. In principle, it should point to a node of the tree, which has two pointers, one to the left node and one to the right node.
So somewhere in your code, you should have:
tree1Ptr = new Node; // or whatever the type of your node is called
However, in your inner loop, you are just using it as if it was an array:
for (int i = 0; i < 10000; i++)
{
int random = rand() % 20000;
tree1Ptr->add(random);
for (int j = 0; j < i; j++) {
if (tree1Ptr[j]==random) //<============ OUCH !!
i--;
}
}
The compiler won't complain, because it's valid syntax: you can use array indexing on a pointer. But it's up to you to ensure that you do not go out of bounds (so here, that j remains < 1).
Other remarks
By the way, in the inner loop, you just want to say that you have to retry if the number is found. You can break the inner loop if the number is already found, in order not to continue.
You should also seed your random number generator, to avoid running the program always with the same sequence.
How to solve it ?
You really need to deepen your understanding of BSTs. Navigating through the nodes requires comparing with the value in the current node and, depending on the result, continuing with either the left or the right pointer, not using indexing. But it would take too long to explain here, so maybe you should look for a tutorial, like this one
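As a hedged illustration of that navigation, here is a minimal BST whose insert reports whether the value was already present, so the tree itself performs the duplicate check. The Node type and the functions insertUnique, fillWithUniqueRandoms and countNodes are hypothetical stand-ins, not the asker's actual tree class.

```cpp
#include <cassert>
#include <memory>
#include <random>

struct Node {
    int value;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
    explicit Node(int v) : value(v) {}
};

// Walks left or right by comparison; returns false if value is already there.
bool insertUnique(std::unique_ptr<Node>& root, int value) {
    std::unique_ptr<Node>* slot = &root;
    while (*slot) {
        if (value == (*slot)->value) {
            return false;  // duplicate, tree unchanged
        }
        slot = (value < (*slot)->value) ? &(*slot)->left : &(*slot)->right;
    }
    slot->reset(new Node(value));
    return true;
}

// Keeps drawing from [1, maxValue] until `count` distinct values are stored.
void fillWithUniqueRandoms(std::unique_ptr<Node>& root, int count,
                           int maxValue, std::mt19937& rng) {
    assert(count >= 0 && count <= maxValue);
    std::uniform_int_distribution<int> dist(1, maxValue);
    int inserted = 0;
    while (inserted < count) {
        if (insertUnique(root, dist(rng))) {
            ++inserted;
        }
    }
}

// Number of nodes in the tree, used here only to check the result.
int countNodes(const Node* node) {
    return node ? 1 + countNodes(node->left.get()) + countNodes(node->right.get()) : 0;
}
```

Seeding the std::mt19937 from std::random_device would give a different sequence on each run.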
For a lot of unique 'random' numbers I usually use a Format Preserving Encryption. Since encryption is one-to-one, you are guaranteed unique outputs as long as the inputs are unique. A different encryption key will generate a different set of outputs, i.e. a different permutation of the inputs. Simply encrypt 0, 1, 2, 3, 4, ... and the outputs are guaranteed unique.
You want numbers in the range [1 .. 20,000]. Unfortunately 20,000 needs 15 bits and most encryption schemes have an even number of bits: 16 bits in your case. That means you will need to cycle walk: re-encrypt the output if the number is too big, until you get a number in the desired range. Since your inputs only go up to 10,000 and you will only be cycle walking on values above 20,000, you will still avoid duplicates.
The only standard cipher I know of which allows a 16 bit block size is the Hasty Pudding cipher. Alternatively it is easy enough to write your own simple Feistel cipher. Four rounds are enough if you do not want cryptographic security. For crypto level security you will need to use AES/FFX, which is NIST approved.
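A toy sketch of the Feistel-plus-cycle-walking idea, with no cryptographic strength intended. Since 20,000 fits in 15 bits, it uses a 16-bit block split into two 8-bit halves. The round function, the key handling and all names (roundFunction, feistel16, permuteBelow, uniqueNumbers) are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy round function; a Feistel network is a permutation for any choice.
std::uint8_t roundFunction(std::uint8_t half, std::uint32_t roundKey) {
    const std::uint32_t mixed = (half + roundKey) * 2654435761u;
    return static_cast<std::uint8_t>(mixed >> 24);
}

// Four-round balanced Feistel permutation of the 16-bit values.
std::uint16_t feistel16(std::uint16_t block, std::uint32_t key) {
    std::uint8_t left = static_cast<std::uint8_t>(block >> 8);
    std::uint8_t right = static_cast<std::uint8_t>(block & 0xFFu);
    for (std::uint32_t round = 0; round < 4; ++round) {
        const std::uint8_t next =
            static_cast<std::uint8_t>(left ^ roundFunction(right, key + round));
        left = right;
        right = next;
    }
    return static_cast<std::uint16_t>((left << 8) | right);
}

// Cycle walking: re-encrypt until the result lands in [0, limit).
std::uint16_t permuteBelow(std::uint16_t index, std::uint16_t limit, std::uint32_t key) {
    assert(index < limit);
    std::uint16_t value = feistel16(index, key);
    while (value >= limit) {
        value = feistel16(value, key);
    }
    return value;
}

// Encrypt 0, 1, 2, ... to get `count` distinct numbers in [1, limit].
std::vector<int> uniqueNumbers(std::uint16_t count, std::uint16_t limit, std::uint32_t key) {
    assert(count <= limit);
    std::vector<int> out;
    for (std::uint16_t i = 0; i < count; ++i) {
        out.push_back(permuteBelow(i, limit, key) + 1);
    }
    return out;
}
```

uniqueNumbers(10000, 20000, key) gives 10,000 distinct values in [1, 20,000] without storing a table, and a different key gives a different permutation.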
There are two ways where you can pick random unique numbers out of a sequence without checking against the numbers previously picked (i.e. already in your BST).
Use std::shuffle
A simple way is to shuffle a sorted array of 1 ... 20,000 and simply pick the first 10,000 items (std::random_shuffle was deprecated in C++14 and removed in C++17, so std::shuffle is used here):
#include <algorithm>
#include <random>
#include <vector>
std::vector<int> values(20000);
for (int i = 0; i < 20000; ++i) {
    values[i] = i+1;
}
std::mt19937 reng(std::random_device{}());
std::shuffle(values.begin(), values.end(), reng);
for (int i = 0; i < 10000; ++i) {
    // Insert values[i] into your BST
}
This method works well if the size of random numbers (10,000) to pick is comparable to the size of total numbers (20,000), because the complexity of random shuffling is amortized over a larger result set.
Use uniform_int_distribution
If the size of random numbers to pick is much smaller than the size of total numbers, then an alternative way can be used:
#include <chrono>
#include <random>
#include <vector>
// Use timed seed so every run produces different random picks.
std::default_random_engine reng(
std::chrono::steady_clock::now().time_since_epoch().count());
int num_pick = 1000; // # of random numbers remained to pick
int num_total = 20000; // Total # of numbers to pick from
int cur_value = 1; // Current prospective number to be picked
while (num_pick > 0) {
// Probability to pick `cur_value` is num_pick / (num_total-cur_value+1)
std::uniform_int_distribution<int> distrib(0, num_total-cur_value);
if (distrib(reng) < num_pick) {
bst.insert(cur_value); // insert `cur_value` to your BST
--num_pick;
}
++cur_value;
}

efficiently mask-out exactly 30% of array with 1M entries

My question's header is similar to this link, however that one wasn't answered to my expectations.
I have an array of integers (1 000 000 entries), and need to mask exactly 30% of elements.
My approach is to loop over the elements and roll a die for each one. Doing it in a non-interrupted manner is good for cache locality.
As soon as I notice that exactly 300 000 elements have been masked, I need to stop. However, I might reach the end of the array with only 200 000 elements masked, forcing me to loop a second time, maybe even a third, etc.
What's the most efficient way to ensure I won't have to loop a second time, without being biased towards picking some elements?
Edit:
//I need to preserve the order of elements.
//For instance, I might have:
[12, 14, 1, 24, 5, 8]
//Masking away 30% might give me:
[0, 14, 1, 24, 0, 8]
The result of masking must be the original array, with some elements set to zero
Just do a fisher-yates shuffle but stop at only 300000 iterations. The last 300000 elements will be the randomly chosen ones.
std::size_t size = 1000000;
for(std::size_t i = 0; i < 300000; ++i)
{
std::size_t r = std::rand() % size;
std::swap(array[r], array[size-1]);
--size;
}
I'm using std::rand for brevity. Obviously you want to use something better.
The other way is this:
for(std::size_t i = 0; i < 300000;)
{
std::size_t r = rand() % 1000000;
if(array[r] != 0)
{
array[r] = 0;
++i;
}
}
Which has no bias and does not reorder elements, but is inferior to fisher yates, especially for high percentages.
When I see a massive list, my mind always goes first to divide-and-conquer.
I won't be writing out a fully-fleshed algorithm here, just a skeleton. You seem like you have enough of a clue to take decent idea and run with it. I think I only need to point you in the right direction. With that said...
We'd need an RNG that can return a suitably-distributed value for how many masked values could potentially be below a given cut point in the list. I'll use the halfway point of the list for said cut. Some statistician can probably set you up with the right RNG function. (Anyone?) I don't want to assume it's just uniformly random [0..mask_count), but it might be.
Given that, you might do something like this:
// the magic RNG your stats homework will provide
int random_split_sub_count_lo( int count, int sub_count, int split_point );

void mask_random_sublist( int *list, int list_count, int sub_count )
{
    if (list_count > SOME_SMALL_THRESHOLD)
    {
        int list_count_lo = list_count / 2; // arbitrary
        int list_count_hi = list_count - list_count_lo;
        // how many of the sub_count masked entries fall in the lower part
        int sub_count_lo = random_split_sub_count_lo( list_count, sub_count, list_count_lo );
        int sub_count_hi = sub_count - sub_count_lo;
        mask_random_sublist( list, list_count_lo, sub_count_lo );
        mask_random_sublist( list + list_count_lo, list_count_hi, sub_count_hi );
    }
    else
    {
        // insert here some simple/obvious/naive implementation that
        // would be ludicrous to use on a massive list due to complexity,
        // but which works great on very small lists. I'm assuming you
        // can do this part yourself.
    }
}
Assuming you can find someone more informed on statistical distributions than I to provide you with a lead on the randomizer you need to split the sublist count, this should give you O(n) performance, with 'n' being the number of masked entries. Also, since the recursion is set up to traverse the actual physical array in constantly-ascending-index order, cache usage should be as optimal as it's gonna get.
Caveat: There may be minor distribution issues due to the discrete nature of the list versus the 30% fraction as you recurse down and down to smaller list sizes. In practice, I suspect this may not matter much, but whatever person this solution is meant for may not be satisfied that the random distribution is truly uniform when viewed under the microscope. YMMV, I guess.
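For what it's worth, the split count asked for above follows a hypergeometric distribution: place sub_count masked slots without replacement among list_count positions and count how many land below the split point. Here is a minimal Python sketch of the whole recursion under that assumption. It uses a list of booleans as the mask rather than the int array above, and the names hypergeometric_split and threshold are made up for illustration. The sampler is the naive one-draw-at-a-time version, so this sketch does not reach the O(n) bound; a faster hypergeometric sampler would be needed for that.

```python
import random

def hypergeometric_split(total, marked, lo_size, rng=random):
    """Number of the `marked` slots that fall in the first `lo_size` of `total` positions."""
    in_lo = 0
    lo_left, all_left = lo_size, total
    for _ in range(marked):
        # The next marked slot lands in the low part with probability lo_left / all_left.
        if rng.randrange(all_left) < lo_left:
            in_lo += 1
            lo_left -= 1
        all_left -= 1
    return in_lo

def mask_random_sublist(mask, start, count, sub_count, threshold=8, rng=random):
    """Set exactly `sub_count` entries of mask[start:start+count] to True."""
    if count > threshold:
        count_lo = count // 2  # arbitrary split point, as in the skeleton above
        sub_lo = hypergeometric_split(count, sub_count, count_lo, rng)
        mask_random_sublist(mask, start, count_lo, sub_lo, threshold, rng)
        mask_random_sublist(mask, start + count_lo, count - count_lo,
                            sub_count - sub_lo, threshold, rng)
    else:
        # Small case: pick the positions directly.
        for i in rng.sample(range(start, start + count), sub_count):
            mask[i] = True
```

Calling mask_random_sublist(mask, 0, len(mask), k) on an all-False mask should leave exactly k entries set.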
Here's one suggestion. One million bits is only about 125 KB, which is not an onerous amount.
So create a bit array with all items initialised to zero. Then randomly select 300,000 of them (accounting for duplicates, of course) and mark those bits as one.
Then you can run through the bit array and, any that are set to one (or zero, if your idea of masking means you want to process the other 700,000), do whatever action you wish to the corresponding entry in the original array.
If you want to ensure there's no possibility of duplicates when randomly selecting them, just trade off space for time by using a Fisher-Yates shuffle.
Construct a collection of all the indices and, for each of the 700,000 you want removed (or 300,000 if, as mentioned, masking means you want to process the other ones):
pick one at random from the remaining set.
copy the final element over the one selected.
reduce the set size.
This will leave you with a random subset of indices that you can use to process the integers in the main array.
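The three steps above can be sketched in Python roughly like this. It is only an illustration: pick_indices is a made-up name, and the full index list is still held in memory, which is the space-for-time trade mentioned above.

```python
import random

def pick_indices(n, k, rng=random):
    """Return k distinct indices from range(n) using a partial Fisher-Yates pass."""
    pool = list(range(n))
    size = n  # number of indices still available
    chosen = []
    for _ in range(k):
        j = rng.randrange(size)      # pick one at random from the remaining set
        chosen.append(pool[j])
        pool[j] = pool[size - 1]     # copy the final element over the one selected
        size -= 1                    # reduce the set size
    return chosen
```

For the case in the question this would be something like pick_indices(1_000_000, 300_000).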
You want reservoir sampling. Sample code courtesy of Wikipedia:
(*
S has items to sample, R will contain the result
*)
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i = 1 to k
R[i] := S[i]
// replace elements with gradually decreasing probability
for i = k+1 to n
j := random(1, i) // important: inclusive range
if j <= k
R[j] := S[i]
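A rough Python translation of that pseudocode, using 0-based indexing, might look like the following. The function name and the rng parameter are my own additions; the point is that it works on any iterable without knowing its length up front.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir
        else:
            j = rng.randint(0, i)    # inclusive on both ends, so i + 1 possible values
            if j < k:
                reservoir[j] = item  # replace with gradually decreasing probability
    return reservoir
```

If the stream has fewer than k items, the sketch simply returns all of them.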

Simulate random iteration of array

I have an array of given size. I want to traverse it in pseudorandom order, keeping array intact and visiting each element once. It will be best if current state can be stored in a few integers.
I know you can't have full randomness without storing full array, but I don't need the order to be really random. I need it to be perceived as random by user. The solution should use sub-linear space.
One possible suggestion - using a large prime number - is given here. The problem with this solution is that there is an obvious fixed step (taken modulo the array size). I would prefer a solution which is not so obviously non-random. Is there a better solution?
How about this algorithm?
To pseudo-pseudo randomly traverse an array of size n.
1. Create a small array of size k.
2. Use the large prime number method to fill the small array; set i = 0.
3. Randomly remove a position from the small array using an RNG; i += 1.
4. If i < n - k, add a new position using the large prime number method.
5. If i < n, go to step 3.
The higher k is, the more randomness you get. This approach will allow you to delay generating numbers from the prime number method.
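Here is a minimal Python sketch of that buffered traversal. I am assuming the "large prime number method" means stepping through the indices with a fixed step that is coprime to n; the names buffered_traversal and step are just for illustration.

```python
import random
from math import gcd

def buffered_traversal(n, k, step, rng=random):
    """Yield every index in range(n) exactly once, scrambled through a buffer of size k."""
    if k < 1 or gcd(step, n) != 1:
        raise ValueError("need k >= 1 and a step that is coprime to n")

    def fixed_step_walk():
        pos = rng.randrange(n)
        for _ in range(n):
            yield pos
            pos = (pos + step) % n  # the fixed-step (prime) method

    walk = fixed_step_walk()
    buffer = [next(walk) for _ in range(min(k, n))]
    while buffer:
        j = rng.randrange(len(buffer))   # randomly remove a position from the buffer
        yield buffer[j]
        upcoming = next(walk, None)
        if upcoming is None:             # the walk is exhausted: shrink the buffer
            buffer[j] = buffer[-1]
            buffer.pop()
        else:                            # otherwise refill the freed slot
            buffer[j] = upcoming
```

The state is the k buffered positions plus the walk position, so space stays O(k) rather than O(n).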
A similar approach can be done to generate a number earlier than expected in the sequence by creating another array, "skip-list". Randomly pick items later in the sequence, use them to traverse the next position, and then add them to the skip-list. When they naturally arrive they are searched for in the skip-list and suppressed and then removed from the skip-list at which point you can randomly add another item to the skip-list.
The idea of a random generator that simulates a shuffle is good if you can get one whose maximum period you can control.
A Linear Congruential Generator calculates a random number with the formula:
x[i + 1] = (a * x[i] + c) % m;
The maximum period is m and it is achieved when the following properties hold:
The parameters c and m are relatively prime.
For every prime number r dividing m, a - 1 is a multiple of r.
If m is a multiple of 4 then also a - 1 is multiple of 4.
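To make those three conditions concrete, here is a small Python sketch that checks them and, for small m, counts how many distinct values the generator actually visits. The function names are made up for this example.

```python
from math import gcd

def prime_factors(m):
    """Set of prime factors of m, by trial division (fine for small m)."""
    factors = set()
    d = 2
    while d * d <= m:
        while m % d == 0:
            factors.add(d)
            m //= d
        d += 1
    if m > 1:
        factors.add(m)
    return factors

def has_full_period(a, c, m):
    """Check the three conditions listed above for x -> (a*x + c) % m."""
    if gcd(c, m) != 1:
        return False
    if any((a - 1) % r != 0 for r in prime_factors(m)):
        return False
    if m % 4 == 0 and (a - 1) % 4 != 0:
        return False
    return True

def values_visited(a, c, m, x0=0):
    """Brute-force count of distinct values seen before the sequence repeats."""
    seen = set()
    x = x0
    while x not in seen:
        seen.add(x)
        x = (a * x + c) % m
    return len(seen)
```

For example, a = 5, c = 3, m = 16 passes the check and visits all 16 values, while a = 3, c = 3, m = 16 fails it and visits fewer.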
My first draft involved making m the next multiple of 4 after the array length and then finding suitable a and c values. This was (a) a lot of work and (b) yielded very obvious results sometimes.
I've rethought this approach. We can make m the smallest power of two that the array length will fit in. The only prime factor of m is then 2, which will make every odd number relatively prime to it. With the exception of 1 and 2, m will be divisible by 4, which means that we must make a - 1 a multiple of 4.
Having a greater m than the array length means that we must discard all values that are illegal array indices. This will happen at most every other turn and should be negligible.
The following code yields pseudo random numbers with a period of exactly m. I've avoided trivial values for a and c and on my (not too numerous) spot checks, the results looked okay. At least there was no obvious cycling pattern.
So:
#include <cstddef> // size_t
#include <cstdlib> // std::rand, RAND_MAX
class RandomIndexer
{
public:
RandomIndexer(size_t length) : len(length)
{
m = 8;
while (m < length) m <<= 1;
c = m / 6 + uniform(5 * m / 6);
c |= 1;
a = m / 12 * uniform(m / 6);
a = 4*a + 1;
x = uniform(m);
}
size_t next()
{
do { x = (a*x + c) % m; } while (x >= len);
return x;
}
private:
static size_t uniform(size_t m)
{
double p = std::rand() / (1.0 + RAND_MAX);
return static_cast<size_t>(m * p); // size_t rather than int, so a large m is not truncated
}
size_t len;
size_t x;
size_t a;
size_t c;
size_t m;
};
You can then use the generator like this:
std::vector<int> list;
for (size_t i = 0; i < 3; i++) list.push_back(i);
RandomIndexer ix(list.size());
for (size_t i = 0; i < list.size(); i++) {
std::cout << list[ix.next()]<< std::endl;
}
I am aware that this still isn't a great random number generator, but it is reasonably fast, doesn't require a copy of the array and seems to work okay.
If the approach of picking a and c randomly yields bad results, it might be a good idea to restrict the generator to some powers of two and to hard-code literature values that have proven to be good.
As pointed out by others, you can create a sort of "flight plan" upfront by shuffling an array of array indices and then follow it. This violates the "it will be best if current state can be stored in a few integers" constraint but does it really matter? Are there tight performance constraints? After all, I believe that if you don't accept repetitions, then you need to store the items you already visited somewhere or somehow.
Alternatively, you can opt for an intrusive solution and store a bool inside each element of the array, telling you whether the element was already selected or not. This can be done in an almost clean way by employing inheritance (multiple as needed).
Many problems come with this solution, e.g. thread safety, and of course it violates the "keep the array intact" constraint.
Quadratic residues, which you have mentioned ("using a large prime"), are well-known, will work, and guarantee iterating each and every element exactly once (if that is required, but it seems that's not strictly the case?). Unfortunately they are not "very random looking", and there are a few other requirements on the modulus in addition to being prime for it to work.
There is a page on Jeff Preshing's site which describes the technique in detail and suggests to feed the output of the residue generator into the generator again with a fixed offset.
However, since you said that you merely need "perceived as random by user", it seems that you might be able to do with feeding a hash function (say, cityhash or siphash) with consecutive integers. The output will be a "random" integer, and at least so far there will be a strict 1:1 mapping (since there are a lot more possible hash values than there are inputs).
Now the problem is that your array is most likely not that large, so you need to somehow reduce the range of these generated indices without generating duplicates (which is tough).
The obvious solution (taking the modulo) will not work, as it pretty much guarantees that you get a lot of duplicates.
Using a bitmask to limit the range to the next greater power of two should work without introducing bias, and discarding indices that are out of bounds (generating a new index) should work as well. Note that this needs non-deterministic time -- but the combination of these two should work reasonably well (a couple of tries at most) on the average.
Otherwise, the only solution that "really works" is shuffling an array of indices as pointed out by Kamil Kilolajczyk (though you don't want that).
Here is a Java solution, which can easily be converted to C++ and is similar to M Oehm's solution above, albeit with a different way of choosing the LCG parameters.
import java.util.Enumeration;
import java.util.Random;
public class RandomPermuteIterator implements Enumeration<Long> {
int c = 1013904223, a = 1664525;
long seed, N, m, next;
boolean hasNext = true;
public RandomPermuteIterator(long N) throws Exception {
if (N <= 0 || N > Math.pow(2, 62)) throw new Exception("Unsupported size: " + N);
this.N = N;
m = (long) Math.pow(2, Math.ceil(Math.log(N) / Math.log(2)));
next = seed = new Random().nextInt((int) Math.min(N, Integer.MAX_VALUE));
}
public static void main(String[] args) throws Exception {
RandomPermuteIterator r = new RandomPermuteIterator(100);
while (r.hasMoreElements()) System.out.print(r.nextElement() + " ");
//output:50 52 3 6 45 40 26 49 92 11 80 2 4 19 86 61 65 44 27 62 5 32 82 9 84 35 38 77 72 7 ...
}
@Override
public boolean hasMoreElements() {
return hasNext;
}
@Override
public Long nextElement() {
next = (a * next + c) % m;
while (next >= N) next = (a * next + c) % m;
if (next == seed) hasNext = false;
return next;
}
}
maybe you could use this one: http://www.cplusplus.com/reference/algorithm/random_shuffle/ ?

Random pairs of different bits

I have the following problem. I have a number represented in binary representation. I need a way to randomly select two bits of them that are different (i.e. find a 1 and a 0). Besides this I run other operations on that number (reversing sequences, permute sequences,...) These are the approaches I already used:
Keep track of all the ones and the zeros. When I create the binary representation of the binary number I store the places of the 0's and 1's. So that I can choose an index for one list and one index from the other one. I then have two different bits. To run my other operations I created those from an elementary swap operations which updates the indices of the 1's and 0's when manipulating. Therefore I have a third list that stores the list index for each bit. If a bit is 1 I know where to find in the list with all the indices of the ones (same goes for zeros).
The method above yields some overhead when operations are done that do not require the bits to be different. So another way would be to create the lists whenever different bits are needed.
Does anyone have a better idea to do this? I need these operations to be really fast (I am working with popcount, clz, and other binary operations)
I don't feel as though I have enough information to assess the tradeoffs properly, but perhaps you'll find this idea useful. To find a random 1 in a word (find a 1 over multiple words by popcount and reservoir sampling; find a 0 by complementing), first test the popcount. If the popcount is high, then generate indexes uniformly at random and test them until a one is found. If the popcount is medium, then take bitwise ANDs with uniform random masks (but keep the original if the AND is zero) to reduce the popcount. When the popcount is low, use clz to compile the (small) list of candidates efficiently and then sample uniformly at random.
I think the following might be a rather efficient algorithm to do what you are asking. You only iterate over each bit in the number once, and for each element, you have to generate a random number (not exactly sure how costly that is, but I believe there are some optimized CPU instructions for getting random numbers).
Idea is to iterate over all the bits, and with the right probability, update the index to the current index you are visiting.
Generic pseudocode for getting an element from a stream/array:
p = 1
e = null
for s in stream:
with probability 1/p:
replace e with s
p++
return e
Java version:
int[] getIdx(int n){
int oneIdx = 0;
int zeroIdx = 0;
int ones = 1;
int zeros = 1;
// this loop depends on whether you want to select all the prepended zeros
// in a 32/64 bit representation. Alter to your liking...
for(int i = n, j = 0; i > 0; i = i >>> 1, j++){
if((i & 1) == 1){ // current element is 1
if(Math.random() < 1/(float)ones){
oneIdx = j;
}
ones++;
} else{ // element is 0
if(Math.random() < 1/(float)zeros){
zeroIdx = j;
}
zeros++;
}
}
return new int[]{zeroIdx,oneIdx};
}
An optimization you might look into is to do the probability selection using ints instead of floats, might be slightly faster. Here is a short proof I did some time ago regarding that this works: here . I believe the algorithm is attributed to Knuth but can't remember exactly.
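As a sketch of the integer-only variant mentioned above, here is a Python version that replaces the float comparison with an integer draw: randrange(count) == 0 is true with probability 1/count. The fixed width parameter is my own assumption for deciding how many leading zeros take part; adjust it to your representation.

```python
import random

def random_zero_and_one(n, width=32, rng=random):
    """Return (zero_idx, one_idx): a uniformly chosen 0 bit and 1 bit among the low `width` bits of n."""
    zero_idx = one_idx = None
    zeros = ones = 0
    for j in range(width):
        if (n >> j) & 1:
            ones += 1
            if rng.randrange(ones) == 0:   # keep this 1 bit with probability 1/ones
                one_idx = j
        else:
            zeros += 1
            if rng.randrange(zeros) == 0:  # keep this 0 bit with probability 1/zeros
                zero_idx = j
    return zero_idx, one_idx
```

Either index comes back as None when the word has no bit of that kind.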

What is the most efficient way to generate unique pseudo-random numbers? [duplicate]

Duplicate:
Unique random numbers in O(1)?
I want a pseudo-random number generator that can generate numbers with no repeats in a random order.
For example:
random(10)
might return
5, 9, 1, 4, 2, 8, 3, 7, 6, 10
Is there a better way to do it other than making the range of numbers and shuffling them about, or checking the generated list for repeats?
Edit:
Also I want it to be efficient in generating big numbers without the entire range.
Edit:
I see everyone suggesting shuffle algorithms. But if I want to generate large random numbers (1024 bytes+) then that method would take a lot more memory than if I just used a regular RNG and inserted into a Set until it was a specified length, right? Is there no better mathematical algorithm for this?
You may be interested in a linear feedback shift register.
We used to build these out of hardware, but I've also done them in software. It uses a shift register with some of the bits xor'ed and fed back to the input, and if you pick just the right "taps" you can get a sequence that's as long as the register size. That is, a 16-bit lfsr can produce a sequence 65535 long with no repeats. It's statistically random but of course eminently repeatable. Also, if it's done wrong, you can get some embarrassingly short sequences. If you look up the lfsr, you will find examples of how to construct them properly (which is to say, "maximal length").
A shuffle is a perfectly good way to do this (provided you do not introduce a bias using the naive algorithm). See Fisher-Yates shuffle.
If a random number is guaranteed to never repeat it is no longer random and the amount of randomness decreases as the numbers are generated (after nine numbers random(10) is rather predictable and even after only eight you have a 50-50 chance).
I understand you don't want a shuffle for large ranges, since you'd have to store the whole list to do so.
Instead, use a reversible pseudo-random hash. Then feed in the values 0 1 2 3 4 5 6 etc in turn.
There are infinitely many hashes like this. They're not too hard to generate if they're restricted to a power of 2, but any base can be used.
Here's one that would work for example if you wanted to go through all 2^32 32 bit values. It's easiest to write because the implicit mod 2^32 of integer math works to your advantage in this case.
unsigned int reversableHash(unsigned int x)
{
x*=0xDEADBEEF;
x=x^(x>>17);
x*=0x01234567;
x+=0x88776655;
x=x^(x>>4);
x=x^(x>>9);
x*=0x91827363;
x=x^(x>>7);
x=x^(x>>11);
x=x^(x>>20);
x*=0x77773333;
return x;
}
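To see why a function like this never repeats, it helps to write out its inverse: every step (multiplying by an odd constant mod 2^32, adding a constant, xoring with a right shift) can be undone, so two different inputs cannot collide. Below is a Python sketch that mirrors the function above and inverts it. I am assuming unsigned int is 32 bits; the helper names are mine, and pow(c, -1, m) needs Python 3.8 or later.

```python
MASK = 0xFFFFFFFF  # emulate 32-bit unsigned wrap-around

def reversible_hash(x):
    x = (x * 0xDEADBEEF) & MASK
    x ^= x >> 17
    x = (x * 0x01234567) & MASK
    x = (x + 0x88776655) & MASK
    x ^= x >> 4
    x ^= x >> 9
    x = (x * 0x91827363) & MASK
    x ^= x >> 7
    x ^= x >> 11
    x ^= x >> 20
    x = (x * 0x77773333) & MASK
    return x

def undo_xorshift(y, shift):
    """Solve y = x ^ (x >> shift) for x; each pass fixes `shift` more high bits."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x

def undo_multiply(y, constant):
    """Odd constants have a multiplicative inverse modulo 2**32."""
    return (y * pow(constant, -1, 1 << 32)) & MASK

def inverse_hash(x):
    # Undo the steps of reversible_hash in reverse order.
    x = undo_multiply(x, 0x77773333)
    x = undo_xorshift(x, 20)
    x = undo_xorshift(x, 11)
    x = undo_xorshift(x, 7)
    x = undo_multiply(x, 0x91827363)
    x = undo_xorshift(x, 9)
    x = undo_xorshift(x, 4)
    x = (x - 0x88776655) & MASK
    x = undo_multiply(x, 0x01234567)
    x = undo_xorshift(x, 17)
    x = undo_multiply(x, 0xDEADBEEF)
    return x
```

Feeding 0, 1, 2, ... into reversible_hash then gives distinct 32-bit outputs, and inverse_hash maps each one back to its counter value.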
If you don't mind mediocre randomness properties and if the number of elements allows it then you could use a linear congruential random number generator.
A shuffle is the best you can do for random numbers in a specific range with no repeats. The reason that the method you describe (randomly generate numbers and put them in a Set until you reach a specified length) is less efficient is because of duplicates. Theoretically, that algorithm might never finish. At best it will finish in an indeterminable amount of time, as compared to a shuffle, which will always run in a highly predictable amount of time.
Response to edits and comments:
If, as you indicate in the comments, the range of numbers is very large and you want to select relatively few of them at random with no repeats, then the likelihood of repeats diminishes rapidly. The bigger the difference in size between the range and the number of selections, the smaller the likelihood of repeat selections, and the better the performance will be for the select-and-check algorithm you describe in the question.
What about using a GUID generator (like the one in .NET)? Granted, it is not guaranteed that there will be no duplicates; however, the chance of getting one is pretty low.
This has been asked before - see my answer to the previous question. In a nutshell: You can use a block cipher to generate a secure (random) permutation over any range you want, without having to store the entire permutation at any point.
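To illustrate the idea (not the exact construction from the linked answer), here is a toy Python sketch: a small balanced Feistel network acts as the "block cipher" over the next power-of-four-sized domain, and out-of-range outputs are re-encrypted until they fall inside [0, n) (cycle walking). The SHA-256 round function, the key string and the number of rounds are my own assumptions, and nothing here has been vetted for real security.

```python
import hashlib

def make_permutation(n, key, rounds=4):
    """Return a function mapping each i in range(n) to a distinct value in range(n)."""
    half_bits = max(1, (max(n - 1, 1).bit_length() + 1) // 2)
    half_mask = (1 << half_bits) - 1

    def round_fn(value, r):
        digest = hashlib.sha256(f"{key}:{r}:{value}".encode()).digest()
        return int.from_bytes(digest[:8], "big") & half_mask

    def encrypt(x):
        # A Feistel network is a bijection on 2 * half_bits bit values
        # no matter what round_fn does.
        left, right = x >> half_bits, x & half_mask
        for r in range(rounds):
            left, right = right, left ^ round_fn(right, r)
        return (left << half_bits) | right

    def permute(i):
        x = encrypt(i)
        while x >= n:   # cycle walking: stay on the cipher's cycle until back in range
            x = encrypt(x)
        return x

    return permute
```

Calling permute(0), permute(1), ... then yields each value in the range exactly once, with only the key and a counter as state.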
If you want to create large (say, 64 bits or greater) random numbers with no repeats, then just create them. If you're using a good random number generator that actually has enough entropy, then the odds of generating repeats are so minuscule as to not be worth worrying about.
For instance, when generating cryptographic keys, no one actually bothers checking to see if they've generated the same key before; since you're trusting your random number generator that a dedicated attacker won't be able to get the same key out, then why would you expect that you would come up with the same key accidentally?
Of course, if you have a bad random number generator (like the Debian SSL random number generator vulnerability), or are generating small enough numbers that the birthday paradox gives you a high chance of collision, then you will need to actually do something to ensure you don't get repeats. But for large random numbers with a good generator, just trust probability not to give you any repeats.
As you generate your numbers, use a Bloom filter to detect duplicates. This would use a minimal amount of memory. There would be no need to store earlier numbers in the series at all.
The trade off is that your list could not be exhaustive in your range. If your numbers are truly on the order of 256^1024, that's hardly any trade off at all.
(Of course if they are actually random on that scale, even bothering to detect duplicates is a waste of time. If every computer on earth generated a trillion random numbers that size every second for trillions of years, the chance of a collision is still absolutely negligible.)
I second gbarry's answer about using an LFSR. They are very efficient and simple to implement even in software and are guaranteed not to repeat in (2^N - 1) uses for an LFSR with an N-bit shift-register.
There are some drawbacks however: by observing a small number of outputs from the RNG, one can reconstruct the LFSR and predict all values it will generate, making them unusable for cryptography and anywhere a good RNG is needed. The second problem is that either the all zero word or the all one (in terms of bits) word is invalid depending on the LFSR implementation. The third issue which is relevant to your question is that the maximum number generated by the LFSR is always a power of 2 - 1 (or power of 2 - 2).
The first drawback might not be an issue depending on your application. From the example you gave, it seems that you are not expecting zero to be among the answers; so, the second issue does not seem relevant to your case.
The maximum value (and thus range) problem can be solved by reusing the LFSR until you get a number within your range. Here's an example:
Say you want to have numbers between 1 and 10 (as in your example). You would use a 4-bit LFSR which has a range [1, 15] inclusive. Here's pseudocode showing how to get a number in the range [1,10]:
x = LFSR.getRandomNumber();
while (x > 10) {
x = LFSR.getRandomNumber();
}
You should embed the previous code in your RNG; so that the caller wouldn't care about implementation.
Note that this would slow down your RNG if you use a large shift-register and the maximum number you want is not a power of 2 - 1.
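For the 1 to 10 example, a concrete Python sketch could look like this. It uses a 4-bit Galois LFSR with taps (4, 3), which steps through all 15 non-zero states; the function names and the seed default are assumptions of this example.

```python
def lfsr4_states(seed=1):
    """Yield all 15 non-zero states of a 4-bit Galois LFSR, starting at seed (1..15)."""
    state = seed
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0b1100   # feedback mask for taps 4 and 3
        if state == seed:     # back at the start: the full cycle has been produced
            return

def unique_in_range(limit, seed=1):
    """Numbers 1..limit (limit <= 15), each once, skipping out-of-range LFSR states."""
    return [x for x in lfsr4_states(seed) if x <= limit]
```

unique_in_range(10) returns the numbers 1 to 10 in a scrambled order; a different seed only rotates the starting point of the same cycle.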
This answer suggests some strategies for getting what you want and ensuring they are in a random order using some already well-known algorithms.
There is an inside out version of the Fisher-Yates shuffle algorithm, called the Durstenfeld version, that randomly distributes sequentially acquired items into arrays and collections while loading the array or collection.
One thing to remember is that the Fisher-Yates (AKA Knuth) shuffle or the Durstenfeld version used at load time is highly efficient with arrays of objects because only the reference pointer to the object is being moved and the object itself doesn't have to be examined or compared with any other object as part of the algorithm.
I will give both algorithms further below.
If you want really huge random numbers, on the order of 1024 bytes or more, a really good random generator that can generate unsigned bytes or words at a time will suffice. Randomly generate as many bytes or words as you need to construct the number, make it into an object with a reference pointer to it and, hey presto, you have a really huge random integer. If you need a specific really huge range, you can add a base value of zero bytes to the low-order end of the byte sequence to shift the value up. This may be your best option.
If you need to eliminate duplicates of really huge random numbers, then that is trickier. Even with really huge random numbers, removing duplicates also makes them significantly biased and not random at all. If you have a really large set of unduplicated really huge random numbers and you randomly select from the ones not yet selected, then the bias is only the bias in creating the huge values for the really huge set of numbers from which to choose. A reverse version of Durstenfeld's version of the Fisher-Yates could be used to randomly choose values from a really huge set of them, remove them from the remaining values from which to choose and insert them into a new array that is a subset and could do this with just the source and target arrays in situ. This would be very efficient.
This may be a good strategy for getting a small number of random numbers with enormous values from a really large set of them in which they are not duplicated. Just pick a random location in the source set, obtain its value, swap its value with the top element in the source set, reduce the size of the source set by one and repeat with the reduced size source set until you have chosen enough values. This is essentially the Durstenfeld version of Fisher-Yates in reverse. You can then use the Durstenfeld version of the Fisher-Yates algorithm to insert the acquired values into the destination set. However, that is overkill since they should be randomly chosen and randomly ordered as given here.
Both algorithms assume you have some random number instance method, nextInt(int setSize), that generates a random integer from zero up to but not including setSize, meaning there are setSize possible values. In this case, it will be the size of the array, since the last index of the array is size-1.
The first algorithm is the Durstenfeld version of Fisher-Yates (aka Knuth) shuffle algorithm as applied to an array of arbitrary length, one that simply randomly positions the integers from 0 to length-1 into the array. The array need not be an array of integers, but can be an array of any objects that are acquired sequentially which, effectively, makes it an array of reference pointers. It is simple, short and very effective.
int size = someNumber;
int[] array = new int[size]; // here is the array to load
int location; // this will get assigned a value before used
// i will also conveniently be the value to load, but any sequentially acquired
// object will work
for (int i = 0; i < size; i++) { // conveniently, i is also the value to load
    // you can instance or acquire any object at this place in the algorithm to load
    // by reference, into the array and use a pointer to it in place of j
    int j = i; // in this example, j is trivially i
    if (i == 0) { // first integer goes into first location
        array[i] = j; // this may get swapped from here later
    } else { // subsequent integers go into random locations
        // the next random location will be somewhere in the locations
        // already used or a new one at the end
        // here we get the next random location
        // to preserve true randomness without a significant bias
        // it is REALLY IMPORTANT that the newest value could be
        // stored in the newest location, that is,
        // location has to be able to randomly have the value i
        location = nextInt(i + 1); // a random value between 0 and i
        // move the random location's value to the new location
        array[i] = array[location];
        array[location] = j; // put the new value into the random location
    } // end if...else
} // end for
Voila, you now have an already randomized array.
If you want to randomly shuffle an array you already have, here is the standard Fisher-Yates algorithm.
type[] array = new type[size];
// some code that loads array...
// randomly pick an item anywhere in the current array segment,
// swap it with the top element in the current array segment,
// then shorten the array segment by 1
// just as with the Durstenfeld version above,
// it is REALLY IMPORTANT that an element could get
// swapped with itself to avoid any bias in the randomization
type temp; // this will get assigned a value before used
int location; // this will get assigned a value before used
for (int i = array.length - 1; i > 0; i--) {
    location = nextInt(i + 1); // a random value between 0 and i
    temp = array[i];
    array[i] = array[location];
    array[location] = temp;
} // end for
For sequenced collections and sets, i.e. some type of list object, you could just use adds/or inserts with an index value that allows you to insert items anywhere, but it has to allow adding or appending after the current last item to avoid creating bias in the randomization.
Shuffling N elements doesn't take up excessive memory...think about it. You only swap one element at a time, so the maximum memory used is that of N+1 elements.
Assuming you have a random or pseudo-random number generator, even if it's not guaranteed to return unique values, you can implement one that returns unique values each time using this code, assuming that the upper limit remains constant (i.e. you always call it with random(10), and don't call it with random(10); random(11)).
The code doesn't check for errors. You can add that yourself if you want to.
It also requires a lot of memory if you want a large range of numbers.
/* the function returns a random number between 0 and max -1
 * not necessarily unique
 * I assume it's written
 */
int random(int max);
void swap(int *x, int *y);
/* the function returns a unique random number between 0 and max - 1 */
int unique_random(int max)
{
    static int *list = NULL; /* contains a list of numbers we haven't returned */
    static int in_progress = 0; /* 0 --> we haven't started randomizing numbers
                                 * 1 --> we have started randomizing numbers
                                 */
    static int count; /* index of the last number we haven't returned */
    static int prev_max = 0;
    /* initialize the list */
    if (!in_progress || (prev_max != max)) {
        if (list != NULL) {
            free(list);
        }
        list = malloc(sizeof(int) * max);
        prev_max = max;
        in_progress = 1;
        count = max - 1;
        int i;
        for (i = max - 1; i >= 0; --i) {
            list[i] = i;
        }
    }
    /* now choose one from the list; count + 1 numbers are still left */
    int index = random(count + 1);
    int retval = list[index];
    /* now we throw away the returned value.
     * we do this by shortening the list by 1
     * and replacing the element we returned with
     * the last remaining number
     */
    swap(&list[index], &list[count]);
    /* when the count reaches 0 we start over */
    if (count == 0) {
        in_progress = 0;
        free(list);
        list = NULL;
    } else { /* reduce the counter by 1 */
        count--;
    }
    return retval;
}
/* swap two numbers */
void swap(int *x, int *y)
{
int temp = *x;
*x = *y;
*y = temp;
}
Actually, there's a minor point to make here; a random number generator which is not permitted to repeat is not random.
Suppose you wanted to generate a series of 256 random numbers without repeats.
Create a 256-bit (32-byte) memory block initialized with zeros, let's call it b
Your looping variable will be n, the number of numbers yet to be generated
Loop from n = 256 to n = 1
Generate a random number r in the range [0, n)
Find the r-th zero bit in your memory block b, let's call it p
Put p in your list of results, an array called q
Flip the p-th bit in memory block b to 1
After the n = 1 pass, you are done generating your list of numbers
Here's a short example of what I am talking about, using n = 4 initially:
**Setup**
b = 0000
q = []
**First loop pass, where n = 4**
r = 2
p = 2
b = 0010
q = [2]
**Second loop pass, where n = 3**
r = 2
p = 3
b = 0011
q = [2, 3]
**Third loop pass, where n = 2**
r = 0
p = 0
b = 1011
q = [2, 3, 0]
** Fourth and final loop pass, where n = 1**
r = 0
p = 1
b = 1111
q = [2, 3, 0, 1]
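The same procedure as a minimal Python sketch. It keeps a list of booleans instead of a packed bit block to stay readable, and the pick parameter (a function returning a number in [0, n)) is my own addition so the walk-through above can be replayed with fixed values of r.

```python
import random

def unique_sequence(count, pick=random.randrange):
    """Return the numbers 0..count-1 in random order using a 'used' bitmap."""
    used = [False] * count            # the memory block b, all zeros
    result = []                       # the result list q
    for n in range(count, 0, -1):     # n numbers are still to be generated
        r = pick(n)                   # random number in [0, n)
        for p, taken in enumerate(used):
            if not taken:
                if r == 0:            # p is now the r-th zero bit
                    break
                r -= 1
        used[p] = True                # flip the p-th bit to 1
        result.append(p)
    return result
```

Finding the r-th zero bit is a linear scan here, so generating the whole sequence costs O(count^2) time in exchange for one bit of state per number.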
Please check answers at
Generate sequence of integers in random order without constructing the whole list upfront
and also my answer lies there as
A very simple random sequence is 1+((power(r,x)-1) mod p), which will be from 1 to p for values of x from 1 to p and will be random, where r and p are prime numbers and r <> p.
I asked a similar question before, but mine was for the whole range of an int; see Looking for a Hash Function /Ordered Int/ to /Shuffled Int/
static std::unordered_set<long> s;
long l;
do { l = generator(); } while (s.find(l) != s.end()); // re-roll while l has already been handed out
s.insert(l);
generator() is your random number generator. You keep rolling while the number is already in your set, then you add the new one to it. You get the idea.
I did it with long for the example, but you should make that a template if your PRNG is templatized.
An alternative is to use a cryptographically secure PRNG, which will have a very low probability of generating the same number twice.
If you don't mind poor statistical properties in the generated sequence, there is one method:
Let's say you want to generate N numbers of 1024 bits each. You can sacrifice some bits of each generated number to act as a "counter".
So you generate each random number, but into some bits you have chosen you put a binary-encoded counter (from a variable that you increase each time the next random number is generated).
You can split that counter into single bits and put them in some of the less significant bits of the generated number.
That way you are sure to get a unique number each time.
I mean for example each generated number looks like that:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyxxxxyxyyyyxxyxx
where each x is taken directly from the generator, and the ys are taken from the counter variable.
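A minimal Python sketch of the counter idea follows. For simplicity it places the counter in the lowest bits as one contiguous field rather than scattering it as in the picture above; the function name and the default sizes are assumptions for illustration.

```python
import secrets

def unique_random_numbers(count, total_bits=1024, counter_bits=32):
    """Yield `count` distinct numbers of `total_bits` bits: random high bits plus a counter."""
    if count > (1 << counter_bits):
        raise ValueError("counter field too small for that many numbers")
    for counter in range(count):
        random_part = secrets.randbits(total_bits - counter_bits)  # the x bits
        yield (random_part << counter_bits) | counter              # the y bits
```

Uniqueness comes entirely from the counter field, so the low bits are predictable; that is the statistical price mentioned above.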
Mersenne twister
Description of which can be found here on Wikipedia: Mersenne twister
Look at the bottom of the page for implementations in various languages.
The problem is to select a "random" sequence of N unique numbers from the range 1..M where there is no constraint on the relationship between N and M (M could be much bigger, about the same, or even smaller than N; they may not be relatively prime).
Expanding on the linear feedback shift register answer: for a given M, construct a maximal LFSR for the smallest power of two that is larger than M. Then just grab your numbers from the LFSR, throwing out numbers larger than M. On average, you will throw out at most half the generated numbers (since by construction more than half the range of the LFSR is less than M), so the expected running time of getting a number is O(1). You are not storing previously generated numbers so space consumption is O(1) too. If you cycle before getting N numbers then M is less than N (or the LFSR is constructed incorrectly).
You can find the parameters for maximum length LFSRs up to 168 bits here (from wikipedia): http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf
Here's some java code:
/**
* Generate a sequence of unique "random" numbers in [0,M)
* @author dkoes
*
*/
public class UniqueRandom
{
long lfsr;
long mask;
long max;
private static long seed = 1;
//indexed by number of bits
private static int [][] taps = {
null, // 0
null, // 1
null, // 2
{3,2}, //3
{4,3},
{5,3},
{6,5},
{7,6},
{8,6,5,4},
{9,5},
{10,7},
{11,9},
{12,6,4,1},
{13,4,3,1},
{14,5,3,1},
{15,14},
{16,15,13,4},
{17,14},
{18,11},
{19,6,2,1},
{20,17},
{21,19},
{22,21},
{23,18},
{24,23,22,17},
{25,22},
{26,6,2,1},
{27,5,2,1},
{28,25},
{29,27},
{30,6,4,1},
{31,28},
{32,22,2,1},
{33,20},
{34,27,2,1},
{35,33},
{36,25},
{37,5,4,3,2,1},
{38,6,5,1},
{39,35},
{40,38,21,19},
{41,38},
{42,41,20,19},
{43,42,38,37},
{44,43,18,17},
{45,44,42,41},
{46,45,26,25},
{47,42},
{48,47,21,20},
{49,40},
{50,49,24,23},
{51,50,36,35},
{52,49},
{53,52,38,37},
{54,53,18,17},
{55,31},
{56,55,35,34},
{57,50},
{58,39},
{59,58,38,37},
{60,59},
{61,60,46,45},
{62,61,6,5},
{63,62},
};
//m is upperbound; things break if it isn't positive
UniqueRandom(long m)
{
max = m;
lfsr = seed; //could easily pass a starting point instead
//figure out number of bits
int bits = 0;
long b = m;
while((b >>>= 1) != 0)
{
bits++;
}
bits++;
if(bits < 3)
bits = 3;
mask = 0;
for(int i = 0; i < taps[bits].length; i++)
{
mask |= (1L << (taps[bits][i]-1));
}
}
//return -1 if we've cycled
long next()
{
long ret = -1;
if(lfsr == 0)
return -1;
do {
ret = lfsr;
//update lfsr - from wikipedia
long lsb = lfsr & 1;
lfsr >>>= 1;
if(lsb == 1)
lfsr ^= mask;
if(lfsr == seed)
lfsr = 0; //cycled, stick
ret--; //zero is stuck state, never generated so sub 1 to get it
} while(ret >= max);
return ret;
}
}
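Since the question is tagged C++, here is a rough C++ sketch of the same approach, simplified to a fixed 16-bit register so no tap table is needed. It is not a port of the class above: the name UniqueRandom16 is made up, and the feedback mask 0xB400 (taps 16, 14, 13, 11) is the one used in the Wikipedia LFSR example rather than the 16-bit entry from the table.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: visit every value in [0, max) exactly once, in scrambled order,
// using a 16-bit Galois LFSR and skipping out-of-range states.
class UniqueRandom16 {
public:
    // max must be in 1..65535; seed must be nonzero (zero is the stuck state).
    explicit UniqueRandom16(std::uint32_t max, std::uint32_t seed = 1)
        : max_(max), seed_(seed), lfsr_(seed) {
        assert(max >= 1 && max <= 65535);
        assert(seed >= 1 && seed <= 65535);
    }

    // Returns the next value, or -1 once the full cycle has been consumed.
    long next() {
        while (!done_) {
            const std::uint32_t value = lfsr_ - 1u;  // states 1..65535 map to 0..65534
            step();
            if (lfsr_ == seed_) done_ = true;        // back at the start: cycle complete
            if (value < max_) return static_cast<long>(value);
        }
        return -1;
    }

private:
    void step() {
        const bool lsb = (lfsr_ & 1u) != 0;
        lfsr_ >>= 1;
        if (lsb) lfsr_ ^= 0xB400u;  // taps 16,14,13,11 (maximal length)
    }

    std::uint32_t max_;
    std::uint32_t seed_;
    std::uint32_t lfsr_;
    bool done_ = false;
};
```

Passing a different nonzero seed starts the walk at a different point of the same cycle, so the order is shifted rather than reshuffled.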
Here is a way to draw random results without repeats. It also works for strings. It's in C#, but the logic should carry over to many languages. Put the random results in a list and check whether each new random element is already in that list. If not, you have a new random element. If it is in the list, draw again until you get an element that is not in the list.
List<string> Erledigte = new List<string>(); // "Erledigte" = items already drawn
private void Form1_Load(object sender, EventArgs e)
{
label1.Text = "";
listBox1.Items.Add("a");
listBox1.Items.Add("b");
listBox1.Items.Add("c");
listBox1.Items.Add("d");
listBox1.Items.Add("e");
}
private void button1_Click(object sender, EventArgs e)
{
Random rand = new Random();
int index=rand.Next(0, listBox1.Items.Count);
string rndString = listBox1.Items[index].ToString();
if (listBox1.Items.Count <= Erledigte.Count)
{
return;
}
else
{
if (Erledigte.Contains(rndString))
{
//MessageBox.Show("vorhanden");
while (Erledigte.Contains(rndString))
{
index = rand.Next(0, listBox1.Items.Count);
rndString = listBox1.Items[index].ToString();
}
}
Erledigte.Add(rndString);
label1.Text += rndString;
}
}
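The same draw-and-check logic can be sketched in C++ with a hash set, so that each membership check takes expected constant time instead of a scan of the list. The function name draw_unique and its parameters are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_set>
#include <vector>

// Sketch: draw `count` distinct values from [1, range] by rejecting repeats.
std::vector<std::uint64_t> draw_unique(std::size_t count, std::uint64_t range,
                                       std::mt19937_64& engine) {
    assert(count <= range);  // otherwise the loop could never finish
    std::uniform_int_distribution<std::uint64_t> dist(1, range);
    std::unordered_set<std::uint64_t> seen;
    std::vector<std::uint64_t> result;
    result.reserve(count);
    while (result.size() < count) {
        const std::uint64_t candidate = dist(engine);
        // insert() reports whether the value was new; keep it only if so.
        if (seen.insert(candidate).second) result.push_back(candidate);
    }
    return result;
}
```

This needs memory proportional to count, not to range, which suits the "few numbers from a huge range" case. It slows down as count approaches range, because more and more draws are rejected.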
For a sequence to be random there should not be any autocorrelation. The restriction that the numbers must not repeat means the next number depends on all the previous numbers, which means the sequence is not random anymore.
If you can generate 'small' random numbers, you can generate 'large' random numbers by integrating them: add a small random increment to the previous number each time.
const size_t amount = 100; // a limited amount of random numbers
vector<long int> numbers;
numbers.reserve( amount );
const short int spread = 250; // increments are drawn from a range of about 250
numbers.push_back( myrandom( spread ) );
for( size_t n = 1; n != amount; ++n ) { // start at 1: the first number is already stored
    const short int increment = myrandom( spread ); // must be >= 1, or two numbers coincide
    numbers.push_back( numbers.back() + increment );
}
myshuffle( numbers );
The myrandom and myshuffle functions I hereby generously delegate to others :)
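One possible way to fill in the two delegated helpers with the standard <random> facilities is sketched below. The contract is an assumption, not something the answer specifies: here myrandom(spread) returns a value in [1, spread], because a zero increment would produce a repeated number. The helper shared_engine is also made up for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// One engine shared by both helpers (an assumption of this sketch).
inline std::mt19937& shared_engine() {
    static std::mt19937 instance(std::random_device{}());
    return instance;
}

// Assumed contract: returns a value in [1, spread], never zero.
inline long myrandom(long spread) {
    assert(spread >= 1);
    std::uniform_int_distribution<long> dist(1, spread);
    return dist(shared_engine());
}

// Shuffles the numbers in place so they are no longer in increasing order.
inline void myshuffle(std::vector<long>& numbers) {
    std::shuffle(numbers.begin(), numbers.end(), shared_engine());
}
```

With these definitions the numbers are strictly increasing before the shuffle, so no two of them can be equal.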
To get non-repeating random numbers without wasting time checking for duplicates and redrawing over and over, use the method below, which keeps calls to Rand to a minimum.
For example, if you want 100 non-repeating random numbers:
1. Fill an array with the numbers from 1 to 100.
2. Get a random number in the range 1-100 using the Rand function.
3. Use the generated random number as an index to get the value from the array (Numbers[IndexGeneratedFromRandFunction]).
4. Shift the numbers in the array after that index to the left.
5. Repeat from step 2, but now the range should be 1-99, and so on.
Now we have an array of distinct numbers!
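The steps above can be sketched in C++ roughly as follows. The function name pick_without_repeats is made up for illustration, std::mt19937 replaces the Rand function, and indices are zero-based, so the range shrinks from 0..n-1 instead of 1..100.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Sketch: pick a random index, take that element, close the gap,
// and repeat with a pool that is one element shorter.
std::vector<int> pick_without_repeats(int n, std::mt19937& engine) {
    assert(n >= 0);
    std::vector<int> pool(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) pool[static_cast<std::size_t>(i)] = i + 1;  // step 1

    std::vector<int> result;
    result.reserve(pool.size());
    while (!pool.empty()) {
        // Step 2: random index into what is left of the pool.
        std::uniform_int_distribution<std::size_t> dist(0, pool.size() - 1);
        const std::size_t index = dist(engine);
        result.push_back(pool[index]);  // step 3: take the value at that index
        // Step 4: erase shifts everything after the index one place left.
        pool.erase(pool.begin() + static_cast<std::ptrdiff_t>(index));
    }  // step 5: loop again with a pool that is one shorter
    return result;
}
```

Each erase shifts up to n elements, so the whole run costs O(n^2) moves. Swapping the chosen element with the last one instead of shifting gives the Fisher-Yates shuffle mentioned earlier in the thread.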
#include <cstdlib>

int main() {
    const int count = 10; // the number of values wanted
    int b[count];
    for (int i = 0; i < count; i++) {
        int a = rand() % count + 1; // candidate in 1..count
        int j = 0;
        while (j < i) {
            if (a == b[j]) {
                a = rand() % count + 1; // repeat found: redraw and rescan from the start
                j = -1;
            }
            j++;
        }
        b[i] = a;
    }
}