Random algorithm with priority to one end - c++

I am writing a little program with a GUI in C++ and Qt.
It is supposed to be similar to a vocabulary trainer. I will use it for my own studying.
I have a QList of objects (name and description as string for example).
Then I have a second QList with ints in it. For every object in my other list, an int is in this list. Start value is 50 for every object; if user clicks correct, it gets decremented, vice versa.
So an object with value 70 should be shown to the user more often than an object with value 30. In the correct-answer method I increase or decrease the value, sort the QList, and use my random algorithm:
if(packList.count()==0) // the QList with objects
return;
int Min = 0;
int Max = packList.count()-1; // -1 because i need the index
qsrand(QTime::currentTime().msec());
if (Min > Max)
{
int Temp = Min;
Min = Max;
Max = Temp;
}
int randNum = ((rand()%(Max-Min+1))+Min);
setPage(randNum); // randNum will be used as index in this method
Now what I need is a way to implement my priority in this random algorithm. I don't want the ones with a higher value to appear 90% of the time, but just more often, just like a vocabulary trainer.

First a remark: You should use qsrand only once at the beginning of the program.
Now to your algorithm: First get the sum of all your values, let us call it sumValues, then compute a random number between 0 and sumValues-1. Go through your list and add the values into a variable currentSum until it is greater than your random number, and use the index of that entry. This will be more efficient if you sort your list by decreasing values.
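A minimal sketch of the cumulative-sum selection described above, written with the standard `<random>` facilities rather than qrand. The names `indexForRoll` and `pickWeighted` are made up for this illustration, and the weights are assumed to be non-negative with at least one positive entry. With a roll in 0..sumValues-1, the loop stops as soon as the running total exceeds the roll, which gives each entry exactly as many winning rolls as its value.

```cpp
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Map a roll r in [0, sum of weights) to an index: walk the list and stop
// at the first entry whose running total exceeds r.
int indexForRoll(const std::vector<int>& weights, int r) {
    int currentSum = 0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        currentSum += weights[i];
        if (currentSum > r) return static_cast<int>(i);
    }
    return static_cast<int>(weights.size()) - 1;  // only reached if r is out of range
}

// Pick an index with probability proportional to its weight.
int pickWeighted(const std::vector<int>& weights, std::mt19937& rng) {
    const int sumValues = std::accumulate(weights.begin(), weights.end(), 0);
    assert(sumValues > 0);
    std::uniform_int_distribution<int> roll(0, sumValues - 1);
    return indexForRoll(weights, roll(rng));
}
```

With weights {70, 30}, rolls 0..69 select index 0 and rolls 70..99 select index 1. The standard library also offers std::discrete_distribution, which performs the same kind of weighted pick.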

Related

Randomly select index from a STL vector from truth value

I have a vector that looks like:
vector<int> A = {0, 1, 1, 0, 0, 1, 0, 1};
I'd like to select a random index from the non-zero values of A. Using this example A, I want to randomly select an element from the array {1,2,5,7}.
Currently I do this by creating another array
vector<int> b;
for(int i=0;i<A.size();i++)
if(A[i])
b.push_back(i);
Once b is created, I find the index by using this answer:
get random element from container
Is there a more STL-like (or C++11) way of doing this, perhaps one that does not create an intermediate array? In this example A is small, but in my production code this selection process is in an inner-loop and A is non-static and thousands of elements long.
A great way to do this is Reservoir Sampling.
In short, you walk your array until you find the first non-zero value, and record that index as the first possible answer you might return.
Then, you continue to walk the array. Every time you find a non-zero value, you randomly might change which new index is your possible answer, with decreasing probability.
This algorithm also works great if you need M random index values from your array.
What's great about this, is that you walk each element only one time, and you don't need a separate memory structure to record the non-zero elements. It's O(N) in speed, and O(M) in memory, in your case it's O(1) in memory, since you only want 1 random value.
On the flip side, random number generators are traditionally quite slow. So, you might want to performance test this against any other ideas people come up with here, to see if the trade-off of speed-vs-memory is worth it for you.
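As a rough sketch of reservoir sampling for a single index: the k-th non-zero element seen replaces the current candidate with probability 1/k, which leaves every non-zero index equally likely at the end. The function name `randomNonZeroIndex` and the use of std::mt19937 are assumptions for this example.

```cpp
#include <random>
#include <vector>

// Return the index of a uniformly chosen non-zero element of A, or -1 if
// there is none. One pass, no intermediate array.
int randomNonZeroIndex(const std::vector<int>& A, std::mt19937& rng) {
    int chosen = -1;
    int seen = 0;  // non-zero elements encountered so far
    for (std::size_t i = 0; i < A.size(); ++i) {
        if (A[i] == 0) continue;
        ++seen;
        // Replace the current candidate with probability 1/seen.
        std::uniform_int_distribution<int> roll(0, seen - 1);
        if (roll(rng) == 0) chosen = static_cast<int>(i);
    }
    return chosen;
}
```

Note that this draws one random number per non-zero element, which is the cost the paragraph above warns about.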
With a single pass through the array, you can determine how many false (or true) values there are. If you are doing this kind of thing often, you can even write a class to keep track of this for you.
Regardless, you can then pick a random number i between 0 and num_false - 1 (or num_true - 1). Then, with another pass through the array, you can return the index of the ith false (or true) value, counting from zero.
We can loop through each non-zero value and assign it a random number. The index with the largest random number is the one we select.
int value = -1; // rand() never returns a negative number
int index = -1; // stays -1 if A has no non-zero element
for(int i = 0; i < A.size(); i++) {
if(!A[i]) continue;
auto j = rand();
if(j > value) {
index = i;
value = j;
}
}
vector<int> A = {0,1,1,0,0,1,0,1};
random_shuffle(A.begin(),A.end());
auto it = find_if(A.begin(),A.end(),[](const int elem){return elem;});

Generate a new element different from 1000 elements of an array

I was asked this question in an interview. Consider the scenario of punched cards, where each punched card has a 64-bit pattern. It was suggested that I treat each card as an int, since each int is a collection of bits.
Also consider that I have an array which already contains 1000 such cards. I have to generate a new element every time which is different from the previous 1000 cards. The integers (aka cards) in the array are not necessarily sorted.
What's more, the question was for C++: where does the 64-bit int come from, and how can I generate a new card that is different from all the elements already present in the array?
There are 2^64 64-bit integers, a number that is so much
larger than 1000 that the simplest solution would be to just generate a
random 64 bit number, and then verify that it isn't in the table of
already generated numbers. (The probability that it is is
infinitesimal, but you might as well be sure.)
Since most random number generators do not generate 64 bit values, you
are left with either writing your own, or (much simpler), combining the
values, say by generating 8 random bytes, and memcpying them into a
uint64_t.
As for verifying that the number isn't already present, std::find is
just fine for one or two new numbers; if you have to do a lot of
lookups, sorting the table and using a binary search would be
worthwhile. Or some sort of a hash table.
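A small sketch of the generate-and-verify idea. It swaps in std::mt19937_64 from C++11, which yields 64-bit values directly, instead of combining eight random bytes as suggested above; the function name `newCard` is invented for the example, and the lookup is the plain std::find mentioned in the answer.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Draw random 64-bit values until one is not already in `cards`.
// With about 1000 cards out of 2^64 values the loop almost always runs once.
std::uint64_t newCard(const std::vector<std::uint64_t>& cards, std::mt19937_64& rng) {
    for (;;) {
        const std::uint64_t candidate = rng();
        if (std::find(cards.begin(), cards.end(), candidate) == cards.end())
            return candidate;
    }
}
```

If many new cards are needed, the caller would push each result into `cards` (or into a sorted table or hash set) before asking for the next one.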
I may be missing something, but most of the other answers appear to me as overly complicated.
Just sort the original array and then start counting from zero: if the current count is in the array skip it, otherwise you have your next number. This algorithm is O(n), where n is the number of newly generated numbers: both sorting the array and skipping existing numbers are constants. Here's an example:
#include <algorithm>
#include <iostream>
int main() {
    unsigned array[] = { 98, 1, 24, 66, 20, 70, 6, 33, 5, 41 };
    const unsigned size = 10;
    unsigned count = 0;
    unsigned index = 0;
    std::sort(array, array + size);
    while ( count < 100 ) {
        // guard the index so we never read past the end of the array
        if ( index < size && count > array[index] )
            ++index;
        else {
            if ( index == size || count < array[index] )
                std::cout << count << std::endl;
            ++count;
        }
    }
}
Here's an O(n) algorithm:
int64 generateNewValue(list_of_cards)
{
return find_max(list_of_cards)+1;
}
Note: As @amit points out below, this will fail if INT64_MAX is already in the list.
As far as I'm aware, this is the only way you're going to get O(n). If you want to deal with that (fairly important) edge case, then you're going to have to do some kind of proper sort or search, which will take you to O(n log n).
@arne is almost there. What you need is a self-balancing interval tree, which can be built in O(n lg n) time.
Then take the top node, which will store some interval [i, j]. By the properties of an interval tree, both i-1 and j+1 are valid candidates for a new key, unless i = UINT64_MIN or j = UINT64_MAX. If both are true, then you've stored 2^64 elements and you can't possibly generate a new element. Store the new element, which takes O(lg n) worst-case time.
I.e.: init takes O(n lg n), generate takes O(lg n). Both are worst-case figures. The greatest thing about this approach is that the top node will keep "growing" (storing larger intervals) and merging with its successor or predecessor, so the tree will actually shrink in terms of memory use and eventually the time per operation decays to O(1). You also won't waste any numbers, so you can keep generating until you've got 2^64 of them.
This algorithm has O(N lg N) initialisation, O(1) query and O(N) memory usage. I assume you have some integer type which I will refer to as int64 and that it can represent the integers [0, int64_max].
Sort the numbers
Create a linked list containing intervals [u, v]
Insert [0, first number - 1] (skip it if the first number is 0)
For each of the remaining numbers, insert [prev number + 1, current number - 1]
Insert [last number + 1, int64_max]
You now have a list representing the numbers which are not used. You can simply iterate over them to generate new numbers.
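The steps above could be sketched as follows. This is only an illustration under a few assumptions: it uses std::uint64_t in place of the unnamed int64 type, a std::vector instead of a linked list, and starts the first gap at 0, the lower end of the range the answer assumes. `Interval` and `freeIntervals` are names invented here.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

using Interval = std::pair<std::uint64_t, std::uint64_t>;  // inclusive [first, second]

// Build the sorted list of gaps between the used numbers.
std::vector<Interval> freeIntervals(std::vector<std::uint64_t> used) {
    const std::uint64_t maxValue = std::numeric_limits<std::uint64_t>::max();
    std::sort(used.begin(), used.end());
    std::vector<Interval> gaps;
    std::uint64_t next = 0;   // smallest value not yet accounted for
    bool exhausted = false;   // true once maxValue itself has been seen
    for (std::uint64_t u : used) {
        if (exhausted) break;
        if (u < next) continue;  // duplicate of an earlier value
        if (u > next) gaps.push_back({next, u - 1});
        if (u == maxValue) exhausted = true; else next = u + 1;
    }
    if (!exhausted) gaps.push_back({next, maxValue});
    return gaps;
}
```

For the used numbers {5, 1, 2, 9} this produces the gaps [0, 0], [3, 4], [6, 8] and [10, max]; new numbers can then be handed out by walking those intervals in order.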
I think the way to go is to use some kind of hashing. So you store your cards in some buckets based on, let's say, a MOD operation. Until you create some sort of indexing you are stuck with looping over the whole array.
If you have a look at the HashSet implementation in Java you might get a clue.
Edit: I assume you wanted them to be random numbers, if you don't mind sequence MAX+1 below is good solution :)
You could build a binary tree of the already existing elements and traverse it until you find a node whose depth is not 64 and which has less than two child nodes. You can then construct a "missing" child node and have a new element. This should be fairly quick, in the order of about O(n) if I'm not mistaken.
bool seen[1001] = { false };
for each element of the original array
if the element is in the range 0..1000
seen[element] = true
find the index for the first false value in seen
Initialization:
Don't sort the list.
Create a new array 1000 long containing 0..999.
Iterate the list and, if any number is in the range 0..999, invalidate it in the new array by replacing the value in the new array with the value of the first item in the list.
Insertion:
Use an incrementing index to the new array. If the value in the new array at this index is not the value of the first element in the list, add it to the list, else check the value from the next position in the new array.
When the new array is used up, refill it using 1000..1999 and invalidating existing values as above. Yes, this is looping over the list, but it doesn't have to be done for each insertion.
Near O(1) until the list gets so large that occasionally iterating it for invalidation of the 'new' new array becomes significant. Maybe you could mitigate this by using a new array that grows, maybe always the size of the list?
Rgds,
Martin
Put them all into a hash table of size > 1000, and find the empty cell (this is the parking problem). Generate a key for that. This will of course work better for bigger table size. The table needs only 1-bit entries.
EDIT: this is the pigeonhole principle.
This needs "modulo tablesize" (or some other "semi-invertible" function) for a hash function.
unsigned hashtab[1001] = {0,};
unsigned long long numbers[1000] = { ... };
void init (void)
{
unsigned idx;
for (idx=0; idx < 1000; idx++) {
hashtab [ numbers[idx] % 1001 ] += 1; }
}
unsigned long long generate(void)
{
unsigned idx;
for (idx = 0; idx < 1001; idx++) {
if ( !hashtab [ idx] ) break; }
return idx + (unsigned long long)rand() * 1001; /* widen before multiplying to avoid int overflow */
}
Based on the solution here: question on array and number
Since there are 1000 numbers, if we consider their remainders with 1001, at least one remainder will be missing. We can pick that as our missing number.
So we maintain an array of counts: C[1001], which will maintain the number of integers with remainder r (upon dividing by 1001) in C[r].
We also maintain a set of numbers for which C[j] is 0 (say using a linked list).
When we move the window over, we decrement the count of the first element (say remainder i), i.e. decrement C[i]. If C[i] becomes zero we add i to the set of numbers. We update the C array with the new number we add.
If we need one number, we just pick a random element from the set of j for which C[j] is 0.
This is O(1) for new numbers and O(n) initially.
This is similar to other solutions but not quite.
How about something simple like this:
1) Partition the array into numbers equal and below 1000 and above
2) If all the numbers fit within the lower partition then choose 1001 (or any number greater than 1000) and we're done.
3) Otherwise we know that there must exist a number between 1 and 1000 that doesn't exist within the lower partition.
4) Create a 1000 element array of bools, or a 1000-element long bitfield, or whatnot and initialize the array to all 0's
5) For each integer in the lower partition, use its value as an index into the array/bitfield and set the corresponding bool to true (ie: do a radix sort)
6) Go over the array/bitfield and pick any unset value's index as the solution
This works in O(n) time, or since we've bounded everything by 1000, technically it's O(1), but O(n) time and space in general. There are three passes over the data, which isn't necessarily the most elegant approach, but the complexity remains O(n).
you can create a new array with the numbers that are not in the original array, then just pick one from this new array.
¿O(1)?

What is the most efficient way to generate unique pseudo-random numbers? [duplicate]

Duplicate:
Unique random numbers in O(1)?
I want a pseudo-random number generator that can generate numbers with no repeats in a random order.
For example:
random(10)
might return
5, 9, 1, 4, 2, 8, 3, 7, 6, 10
Is there a better way to do it other than making the range of numbers and shuffling them about, or checking the generated list for repeats?
Edit:
Also I want it to be efficient in generating big numbers without the entire range.
Edit:
I see everyone suggesting shuffle algorithms. But if I want to generate large random numbers (1024 bytes or more), then that method would take a lot more memory than if I just used a regular RNG and inserted into a Set until it was a specified length, right? Is there no better mathematical algorithm for this?
You may be interested in a linear feedback shift register.
We used to build these out of hardware, but I've also done them in software. It uses a shift register with some of the bits xor'ed and fed back to the input, and if you pick just the right "taps" you can get a sequence that's as long as the register size. That is, a 16-bit lfsr can produce a sequence 65535 long with no repeats. It's statistically random but of course eminently repeatable. Also, if it's done wrong, you can get some embarrassingly short sequences. If you look up the lfsr, you will find examples of how to construct them properly (which is to say, "maximal length").
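For illustration, here is a sketch of a 16-bit LFSR in software. It uses the Galois form with the commonly cited maximal-length tap set 16, 14, 13, 11 (feedback mask 0xB400); the function names are invented for this example, and other tap sets from published tables would work the same way.

```cpp
#include <cstdint>

// One step of a 16-bit Galois LFSR with taps at bits 16, 14, 13 and 11
// (feedback mask 0xB400). The state must never be 0, which is a fixed point.
std::uint16_t lfsrNext(std::uint16_t state) {
    const bool lsb = state & 1u;
    state >>= 1;
    if (lsb) state ^= 0xB400u;
    return state;
}

// Count how many steps it takes to return to the starting state.
unsigned lfsrPeriod(std::uint16_t start) {
    std::uint16_t s = start;
    unsigned period = 0;
    do {
        s = lfsrNext(s);
        ++period;
    } while (s != start);
    return period;
}
```

Starting from any non-zero state, `lfsrPeriod` should report 65535 for this tap set, which is the "no repeats until the whole sequence has been produced" property described above.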
A shuffle is a perfectly good way to do this (provided you do not introduce a bias using the naive algorithm). See Fisher-Yates shuffle.
If a random number is guaranteed to never repeat it is no longer random and the amount of randomness decreases as the numbers are generated (after nine numbers random(10) is rather predictable and even after only eight you have a 50-50 chance).
I understand you don't want a shuffle for large ranges, since you'd have to store the whole list to do so.
Instead, use a reversible pseudo-random hash. Then feed in the values 0 1 2 3 4 5 6 etc in turn.
There are infinite numbers of hashes like this. They're not too hard to generate if they're restricted to a power of 2, but any base can be used.
Here's one that would work for example if you wanted to go through all 2^32 32 bit values. It's easiest to write because the implicit mod 2^32 of integer math works to your advantage in this case.
unsigned int reversableHash(unsigned int x)
{
x*=0xDEADBEEF;
x=x^(x>>17);
x*=0x01234567;
x+=0x88776655;
x=x^(x>>4);
x=x^(x>>9);
x*=0x91827363;
x=x^(x>>7);
x=x^(x>>11);
x=x^(x>>20);
x*=0x77773333;
return x;
}
If you don't mind mediocre randomness properties and if the number of elements allows it then you could use a linear congruential random number generator.
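A hedged sketch of what that could look like when the number of elements is a power of two. For a modulus of 2^bits, a linear congruential step x -> (a*x + c) has full period when c is odd and a leaves remainder 1 when divided by 4 (the Hull-Dobell conditions); the constants 5 and 3 and the name `lcgNext` are arbitrary choices for this example.

```cpp
#include <cstdint>

// Linear congruential step modulo 2^bits: x -> (a*x + c) mod 2^bits.
// With c odd and a % 4 == 1 the sequence visits every value in
// [0, 2^bits) exactly once before it repeats.
std::uint32_t lcgNext(std::uint32_t x, unsigned bits) {
    const std::uint32_t a = 5;  // a % 4 == 1
    const std::uint32_t c = 3;  // odd
    const std::uint32_t mask = (bits >= 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    return (a * x + c) & mask;
}
```

With bits = 4 and a start value of 0, sixteen steps visit all of 0..15 once and return to 0. The order is quite regular, which is the mediocre randomness the answer mentions.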
A shuffle is the best you can do for random numbers in a specific range with no repeats. The reason that the method you describe (randomly generate numbers and put them in a Set until you reach a specified length) is less efficient is because of duplicates. Theoretically, that algorithm might never finish. At best it will finish in an indeterminable amount of time, as compared to a shuffle, which will always run in a highly predictable amount of time.
Response to edits and comments:
If, as you indicate in the comments, the range of numbers is very large and you want to select relatively few of them at random with no repeats, then the likelihood of repeats diminishes rapidly. The bigger the difference in size between the range and the number of selections, the smaller the likelihood of repeat selections, and the better the performance will be for the select-and-check algorithm you describe in the question.
What about using a GUID generator (like the one in .NET)? Granted, it is not guaranteed that there will be no duplicates; however, the chance of getting one is pretty low.
This has been asked before - see my answer to the previous question. In a nutshell: You can use a block cipher to generate a secure (random) permutation over any range you want, without having to store the entire permutation at any point.
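To make the idea concrete, here is a toy sketch, not the answer's own code: a small balanced Feistel network serves as the "block cipher" over the next power-of-four-sized block, and values that fall outside the wanted range are fed through again (cycle-walking). The round function, the constants and the names `roundFn`, `feistel` and `permute` are assumptions made for illustration, and the construction as written is not cryptographically secure.

```cpp
#include <cassert>
#include <cstdint>

// Toy round function (an assumption for illustration, NOT secure).
static std::uint32_t roundFn(std::uint32_t half, std::uint32_t key) {
    std::uint32_t v = half * 0x9E3779B1u + key;
    v ^= v >> 15;
    return v * 0x85EBCA6Bu;
}

// Balanced Feistel network on 2*halfBits-bit values; a bijection for any roundFn.
static std::uint32_t feistel(std::uint32_t x, unsigned halfBits, std::uint32_t key) {
    const std::uint32_t mask = (1u << halfBits) - 1u;
    std::uint32_t left = (x >> halfBits) & mask;
    std::uint32_t right = x & mask;
    for (std::uint32_t round = 0; round < 4; ++round) {
        const std::uint32_t next = left ^ (roundFn(right, key + round) & mask);
        left = right;
        right = next;
    }
    return (left << halfBits) | right;
}

// Map i in [0, n) to a unique value in [0, n) without storing the permutation.
// Assumes 1 <= n <= 2^30 so that the block fits in 32 bits.
std::uint32_t permute(std::uint32_t i, std::uint32_t n, std::uint32_t key) {
    assert(n >= 1 && n <= (1u << 30) && i < n);
    unsigned halfBits = 1;
    while ((1ull << (2 * halfBits)) < n) ++halfBits;
    std::uint32_t x = i;
    do {
        x = feistel(x, halfBits, key);  // cycle-walk until we land inside [0, n)
    } while (x >= n);
    return x;
}
```

Calling `permute(0, n, key)`, `permute(1, n, key)`, and so on yields each value in [0, n) exactly once, in an order determined by the key. A real block cipher would replace the toy round function if unpredictability matters.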
If you want to create large (say, 64 bits or greater) random numbers with no repeats, then just create them. If you're using a good random number generator, that actually has enough entropy, then the odds of generating repeats are so minuscule as to not be worth worrying about.
For instance, when generating cryptographic keys, no one actually bothers checking to see if they've generated the same key before; since you're trusting your random number generator that a dedicated attacker won't be able to get the same key out, then why would you expect that you would come up with the same key accidentally?
Of course, if you have a bad random number generator (like the Debian SSL random number generator vulnerability), or are generating small enough numbers that the birthday paradox gives you a high chance of collision, then you will need to actually do something to ensure you don't get repeats. But for large random numbers with a good generator, just trust probability not to give you any repeats.
As you generate your numbers, use a Bloom filter to detect duplicates. This would use a minimal amount of memory. There would be no need to store earlier numbers in the series at all.
The trade off is that your list could not be exhaustive in your range. If your numbers are truly on the order of 256^1024, that's hardly any trade off at all.
(Of course if they are actually random on that scale, even bothering to detect duplicates is a waste of time. If every computer on earth generated a trillion random numbers that size every second for trillions of years, the chance of a collision is still absolutely negligible.)
I second gbarry's answer about using an LFSR. They are very efficient and simple to implement even in software and are guaranteed not to repeat in (2^N - 1) uses for an LFSR with an N-bit shift-register.
There are some drawbacks however: by observing a small number of outputs from the RNG, one can reconstruct the LFSR and predict all values it will generate, making them not usable for cryptography and anywhere where a good RNG is needed. The second problem is that either the all zero word or the all one (in terms of bits) word is invalid depending on the LFSR implementation. The third issue which is relevant to your question is that the maximum number generated by the LFSR is always a power of 2 - 1 (or power of 2 - 2).
The first drawback might not be an issue depending on your application. From the example you gave, it seems that you are not expecting zero to be among the answers; so, the second issue does not seem relevant to your case.
The maximum value (and thus range) problem can be solved by reusing the LFSR until you get a number within your range. Here's an example:
Say you want to have numbers between 1 and 10 (as in your example). You would use a 4-bit LFSR which has a range [1, 15] inclusive. Here's a pseudo code as to how to get number in the range [1,10]:
x = LFSR.getRandomNumber();
while (x > 10) {
x = LFSR.getRandomNumber();
}
You should embed the previous code in your RNG; so that the caller wouldn't care about implementation.
Note that this would slow down your RNG if you use a large shift-register and the maximum number you want is not a power of 2 - 1.
This answer suggests some strategies for getting what you want and ensuring they are in a random order using some already well-known algorithms.
There is an inside out version of the Fisher-Yates shuffle algorithm, called the Durstenfeld version, that randomly distributes sequentially acquired items into arrays and collections while loading the array or collection.
One thing to remember is that the Fisher-Yates (AKA Knuth) shuffle or the Durstenfeld version used at load time is highly efficient with arrays of objects because only the reference pointer to the object is being moved and the object itself doesn't have to be examined or compared with any other object as part of the algorithm.
I will give both algorithms further below.
If you want really huge random numbers, on the order of 1024 bytes or more, a really good random generator that can generate unsigned bytes or words at a time will suffice. Randomly generate as many bytes or words as you need to construct the number, make it into an object with a reference pointer to it and, hey presto, you have a really huge random integer. If you need a specific really huge range, you can add a base value of zero bytes to the low-order end of the byte sequence to shift the value up. This may be your best option.
If you need to eliminate duplicates of really huge random numbers, then that is trickier. Even with really huge random numbers, removing duplicates also makes them significantly biased and not random at all. If you have a really large set of unduplicated really huge random numbers and you randomly select from the ones not yet selected, then the bias is only the bias in creating the huge values for the really huge set of numbers from which to choose. A reverse version of Durstenfeld's version of the Fisher-Yates shuffle could be used to randomly choose values from a really huge set of them, remove them from the remaining values from which to choose and insert them into a new array that is a subset and could do this with just the source and target arrays in situ. This would be very efficient.
This may be a good strategy for getting a small number of random numbers with enormous values from a really large set of them in which they are not duplicated. Just pick a random location in the source set, obtain its value, swap its value with the top element in the source set, reduce the size of the source set by one and repeat with the reduced size source set until you have chosen enough values. This is essentially the Durstenfeld version of Fisher-Yates in reverse. You can then use the Durstenfeld version of the Fisher-Yates algorithm to insert the acquired values into the destination set. However, that is overkill since they should be randomly chosen and randomly ordered as given here.
Both algorithms assume you have some random number instance method, nextInt(int setSize), that generates a random integer from zero to setSize - 1, meaning there are setSize possible values. In this case, it will be at most the size of the array, since the last index of the array is size-1.
The first algorithm is the Durstenfeld version of Fisher-Yates (aka Knuth) shuffle algorithm as applied to an array of arbitrary length, one that simply randomly positions integers from 0 to the length of the array into the array. The array need not be an array of integers, but can be an array of any objects that are acquired sequentially which, effectively, makes it an array of reference pointers. It is simple, short and very effective
int size = someNumber;
int[] array = new int[size]; // here is the array to load
int location; // this will get assigned a value before used
// i will also conveniently be the value to load, but any sequentially acquired
// object will work
for (int i = 0; i < size; i++) { // conveniently, i is also the value to load
// you can instance or acquire any object at this place in the algorithm to load
// by reference, into the array and use a pointer to it in place of j
int j = i; // in this example, j is trivially i
if (i == 0) { // first integer goes into first location
array[i] = j; // this may get swapped from here later
} else { // subsequent integers go into random locations
// the next random location will be somewhere in the locations
// already used or a new one at the end
// here we get the next random location
// to preserve true randomness without a significant bias
// it is REALLY IMPORTANT that the newest value could be
// stored in the newest location, that is,
// location has to be able to randomly have the value i
location = nextInt(i + 1); // a random value between 0 and i
// move the random location's value to the new location
array[i] = array[location];
array[location] = j; // put the new value into the random location
} // end if...else
} // end for
Voila, you now have an already randomized array.
If you want to randomly shuffle an array you already have, here is the standard Fisher-Yates algorithm.
type[] array = new type[size];
// some code that loads array...
// randomly pick an item anywhere in the current array segment,
// swap it with the top element in the current array segment,
// then shorten the array segment by 1
// just as with the Durstenfeld version above,
// it is REALLY IMPORTANT that an element could get
// swapped with itself to avoid any bias in the randomization
type temp; // this will get assigned a value before used
int location; // this will get assigned a value before used
for (int i = array.length - 1; i > 0; i--) {
location = nextInt(i + 1);
temp = array[i];
array[i] = array[location];
array[location] = temp;
} // end for
For sequenced collections and sets, i.e. some type of list object, you could just use adds/or inserts with an index value that allows you to insert items anywhere, but it has to allow adding or appending after the current last item to avoid creating bias in the randomization.
Shuffling N elements doesn't take up excessive memory...think about it. You only swap one element at a time, so the maximum memory used is that of N+1 elements.
Assuming you have a random or pseudo-random number generator, even if it's not guaranteed to return unique values, you can implement one that returns unique values each time using this code, assuming that the upper limit remains constant (i.e. you always call it with random(10), and don't call it with random(10); random(11)).
The code doesn't check for errors. You can add that yourself if you want to.
It also requires a lot of memory if you want a large range of numbers.
/* the function returns a random number between 0 and max -1
* not necessarily unique
* I assume it's written
*/
int random(int max);
void swap(int *x, int *y); /* defined below */
/* the function returns a unique random number between 0 and max - 1 */
int unique_random(int max)
{
static int *list = NULL; /* contains a list of numbers we haven't returned */
static int in_progress = 0; /* 0 --> we haven't started randomizing numbers
* 1 --> we have started randomizing numbers
*/
static int count;
static int prev_max = 0;
// initialize the list
if (!in_progress || (prev_max != max)) {
if (list != NULL) {
free(list);
}
list = malloc(sizeof(int) * max);
prev_max = max;
in_progress = 1;
count = max - 1;
int i;
for (i = max - 1; i >= 0; --i) {
list[i] = i;
}
}
/* now choose one from the list */
int index = random(count + 1); /* count is the last valid index, so pick from 0..count */
int retval = list[index];
/* now we throw away the returned value.
* we do this by shortening the list by 1
* and replacing the element we returned with
* the highest remaining number
*/
swap(&list[index], &list[count]);
/* when the count reaches 0 we start over */
if (count == 0) {
in_progress = 0;
free(list);
list = 0;
} else { /* reduce the counter by 1 */
count--;
}
return retval;
}
/* swap two numbers */
void swap(int *x, int *y)
{
int temp = *x;
*x = *y;
*y = temp;
}
Actually, there's a minor point to make here; a random number generator which is not permitted to repeat is not random.
Suppose you wanted to generate a series of 256 random numbers without repeats.
Create a 256-bit (32-byte) memory block initialized with zeros, let's call it b
Your looping variable will be n, the number of numbers yet to be generated
Loop from n = 256 to n = 1
Generate a random number r in the range [0, n)
Find the r-th zero bit in your memory block b, let's call it p
Put p in your list of results, an array called q
Flip the p-th bit in memory block b to 1
After the n = 1 pass, you are done generating your list of numbers
Here's a short example of what I am talking about, using n = 4 initially:
**Setup**
b = 0000
q = []
**First loop pass, where n = 4**
r = 2
p = 2
b = 0010
q = [2]
**Second loop pass, where n = 3**
r = 2
p = 3
b = 0011
q = [2, 3]
**Third loop pass, where n = 2**
r = 0
p = 0
b = 1011
q = [2, 3, 0]
**Fourth and final loop pass, where n = 1**
r = 0
p = 1
b = 1111
q = [2, 3, 0, 1]
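The procedure and worked example above could be sketched like this. std::vector<bool> stands in for the bit block b, and the names `nthUnused` and `uniqueSequence` are invented for the illustration; r is counted from zero, as in the example.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Position of the r-th (0-based) unused slot in `used`, or -1 if there
// are not enough unused slots.
int nthUnused(const std::vector<bool>& used, int r) {
    for (std::size_t p = 0; p < used.size(); ++p) {
        if (!used[p] && r-- == 0) return static_cast<int>(p);
    }
    return -1;
}

// Produce 0..count-1 in random order using one bit per number.
std::vector<int> uniqueSequence(int count, std::mt19937& rng) {
    std::vector<bool> b(count, false);  // the bit block
    std::vector<int> q;                 // the results
    for (int n = count; n >= 1; --n) {
        std::uniform_int_distribution<int> roll(0, n - 1);
        const int p = nthUnused(b, roll(rng));
        assert(p >= 0);
        b[p] = true;
        q.push_back(p);
    }
    return q;
}
```

Finding the r-th zero bit is a linear scan here, so generating the whole sequence costs time quadratic in `count`; the memory saving is the point of the approach.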
Please check answers at
Generate sequence of integers in random order without constructing the whole list upfront
and also my answer lies there as
A very simple generator is 1+((power(r,x)-1) mod p): it yields values from 1 to p for values of x from 1 to p, in a random-looking order, where r and p are prime numbers and r <> p.
I asked a similar question before but mine was for the whole range of a int see Looking for a Hash Function /Ordered Int/ to /Shuffled Int/
static std::unordered_set<long> s;
long l = generator();
while (s.find(l) != s.end()) l = generator(); // re-roll while l has already been returned
s.insert(l);
generator() being your random number generator. You keep rolling numbers as long as the result is already in your set, then you add the new one to it. You get the idea.
I did it with long for the example, but you should make that a template if your PRNG is templatized.
Alternative is to use a cryptographically secure PRNG that will have a very low probability to generate twice the same number.
If you don't mind poor statistical properties of the generated sequence, there is one method:
Let's say you want to generate N numbers of 1024 bits each. You can sacrifice some bits of each generated number to act as a "counter".
So you generate each random number, but into some bits you have chosen you put a binary-encoded counter (from a variable that you increase each time the next random number is generated).
You can split that counter into single bits and put them in some of the less significant bits of the generated number.
That way you are sure you get a unique number each time.
I mean for example each generated number looks like that:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyxxxxyxyyyyxxyxx
where x is taken directly from the generator, and the ys are taken from the counter variable.
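A simplified sketch of the counter idea, using 64-bit values instead of 1024-bit ones and keeping the counter in the contiguous low bits rather than scattering it as in the pattern above. The class name `CounterStampedRandom` and the choice of std::mt19937_64 are assumptions made for the example.

```cpp
#include <cstdint>
#include <random>

// Returns values whose low `counterBits` bits hold a running counter and whose
// remaining high bits are random. Unique for the first 2^counterBits calls.
class CounterStampedRandom {
public:
    explicit CounterStampedRandom(unsigned counterBits, std::uint64_t seed)
        : counterBits_(counterBits), rng_(seed) {}

    std::uint64_t next() {
        const std::uint64_t mask = (std::uint64_t{1} << counterBits_) - 1;
        const std::uint64_t value = (rng_() & ~mask) | (counter_ & mask);
        ++counter_;
        return value;
    }

private:
    unsigned counterBits_;       // assumed to be in 1..63
    std::mt19937_64 rng_;
    std::uint64_t counter_ = 0;
};
```

Two outputs can only collide if their counter bits match, which cannot happen until the counter wraps around, so the number of counter bits bounds how many unique values can be drawn.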
Mersenne twister
Description of which can be found here on Wikipedia: Mersenne twister
Look at the bottom of the page for implementations in various languages.
The problem is to select a "random" sequence of N unique numbers from the range 1..M where there is no constraint on the relationship between N and M (M could be much bigger, about the same, or even smaller than N; they may not be relatively prime).
Expanding on the linear feedback shift register answer: for a given M, construct a maximal LFSR for the smallest power of two that is larger than M. Then just grab your numbers from the LFSR throwing out numbers larger than M. On average, you will throw out at most half the generated numbers (since by construction more than half the range of the LFSR is less than M), so the expected running time of getting a number is O(1). You are not storing previously generated numbers so space consumption is O(1) too. If you cycle before getting N numbers then M is less than N (or the LFSR is constructed incorrectly).
You can find the parameters for maximum length LFSRs up to 168 bits here (from wikipedia): http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf
Here's some java code:
/**
* Generate a sequence of unique "random" numbers in [0,M)
* @author dkoes
*
*/
public class UniqueRandom
{
long lfsr;
long mask;
long max;
private static long seed = 1;
//indexed by number of bits
private static int [][] taps = {
null, // 0
null, // 1
null, // 2
{3,2}, //3
{4,3},
{5,3},
{6,5},
{7,6},
{8,6,5,4},
{9,5},
{10,7},
{11,9},
{12,6,4,1},
{13,4,3,1},
{14,5,3,1},
{15,14},
{16,15,13,4},
{17,14},
{18,11},
{19,6,2,1},
{20,17},
{21,19},
{22,21},
{23,18},
{24,23,22,17},
{25,22},
{26,6,2,1},
{27,5,2,1},
{28,25},
{29,27},
{30,6,4,1},
{31,28},
{32,22,2,1},
{33,20},
{34,27,2,1},
{35,33},
{36,25},
{37,5,4,3,2,1},
{38,6,5,1},
{39,35},
{40,38,21,19},
{41,38},
{42,41,20,19},
{43,42,38,37},
{44,43,18,17},
{45,44,42,41},
{46,45,26,25},
{47,42},
{48,47,21,20},
{49,40},
{50,49,24,23},
{51,50,36,35},
{52,49},
{53,52,38,37},
{54,53,18,17},
{55,31},
{56,55,35,34},
{57,50},
{58,39},
{59,58,38,37},
{60,59},
{61,60,46,45},
{62,61,6,5},
{63,62},
};
//m is upperbound; things break if it isn't positive
UniqueRandom(long m)
{
max = m;
lfsr = seed; //could easily pass a starting point instead
//figure out number of bits
int bits = 0;
long b = m;
while((b >>>= 1) != 0)
{
bits++;
}
bits++;
if(bits < 3)
bits = 3;
mask = 0;
for(int i = 0; i < taps[bits].length; i++)
{
mask |= (1L << (taps[bits][i]-1));
}
}
//return -1 if we've cycled
long next()
{
long ret = -1;
if(lfsr == 0)
return -1;
do {
ret = lfsr;
//update lfsr - from wikipedia
long lsb = lfsr & 1;
lfsr >>>= 1;
if(lsb == 1)
lfsr ^= mask;
if(lfsr == seed)
lfsr = 0; //cycled, stick
ret--; //zero is stuck state, never generated so sub 1 to get it
} while(ret >= max);
return ret;
}
}
Here is a way to draw random elements without repeating results. It also works for strings. It's in C#, but the logic should work in many languages. Put the random results in a list and check whether each new random element is already in that list. If not, you have a new random element. If it is in the list, repeat the draw until you get an element that is not in the list.
List<string> Erledigte = new List<string>();
private void Form1_Load(object sender, EventArgs e)
{
label1.Text = "";
listBox1.Items.Add("a");
listBox1.Items.Add("b");
listBox1.Items.Add("c");
listBox1.Items.Add("d");
listBox1.Items.Add("e");
}
private void button1_Click(object sender, EventArgs e)
{
Random rand = new Random();
int index=rand.Next(0, listBox1.Items.Count);
string rndString = listBox1.Items[index].ToString();
if (listBox1.Items.Count <= Erledigte.Count)
{
return;
}
else
{
if (Erledigte.Contains(rndString))
{
//MessageBox.Show("already present");
while (Erledigte.Contains(rndString))
{
index = rand.Next(0, listBox1.Items.Count);
rndString = listBox1.Items[index].ToString();
}
}
Erledigte.Add(rndString);
label1.Text += rndString;
}
}
For a sequence to be random there should not be any auto correlation. The restriction that the numbers should not repeat means the next number should depend on all the previous numbers which means it is not random anymore....
If you can generate 'small' random numbers, you can generate 'large' random numbers by integrating them: add a small random increment to each 'previous'.
const size_t amount = 100; // a limited amount of random numbers
vector<long int> numbers;
numbers.reserve( amount );
const short int spread = 250; // about 250 between each random number
numbers.push_back( myrandom( spread ) );
for( size_t n = 1; n != amount; ++n ) { // the first number is already stored
const short int increment = myrandom( spread ); // must be at least 1, or two numbers could be equal
numbers.push_back( numbers.back() + increment );
}
myshuffle( numbers );
The myrandom and myshuffle functions I hereby generously delegate to others :)
To get non-repeating random numbers without wasting time checking for duplicates and drawing new numbers over and over, use the method below, which keeps the number of calls to Rand to a minimum.
For example, if you want 100 non-repeating random numbers:
1. Fill an array with the numbers from 1 to 100.
2. Get a random number in the range 1-100 using the Rand function.
3. Use the generated random number as an index to get the value from the array (Numbers[IndexGeneratedFromRandFunction]).
4. Shift the numbers in the array after that index to the left.
5. Repeat from step 2, but now the range should be 1-99, and so on.
Now we have an array with different numbers!
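The five steps above can be sketched in Python roughly as follows. The function name `draw_without_repeats` is an assumption for illustration, and `list.pop(index)` plays the role of "take the value and shift the rest to the left".

```python
import random

def draw_without_repeats(count, rng=random):
    """Return the numbers 1..count in random order, each exactly once."""
    numbers = list(range(1, count + 1))       # step 1: fill the array
    drawn = []
    while numbers:
        index = rng.randrange(len(numbers))   # step 2: index into what is left
        drawn.append(numbers.pop(index))      # steps 3-4: take value, shift left
    return drawn                              # step 5 is the loop itself

result = draw_without_repeats(100)
```

Each `pop` shifts the tail of the list, so this is quadratic in `count`; swapping the chosen element with the last one instead of shifting avoids that and is what a Fisher-Yates shuffle does.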
#include <cstdlib>

int main() {
    const int N = 100; // the number of them (example value)
    int b[N];
    for (int i = 0; i < N; i++) {
        int a = rand() % (N + 1) + 1; // candidate in [1, N + 1]
        int j = 0;
        while (j < i) {
            if (a == b[j]) {
                // already used: draw again and restart the scan
                a = rand() % (N + 1) + 1;
                j = -1;
            }
            j++;
        }
        b[i] = a;
    }
}

Fastest way to obtain the largest X numbers from a very large unsorted list?

I'm trying to obtain the top, say, 100 scores from a list of scores being generated by my program. Unfortunately the list is huge (on the order of millions to billions), so sorting is a time-intensive portion of the program.
What's the best way of doing the sorting to get the top 100 scores?
The only two methods I can think of so far are either first generating all the scores into a massive array and then sorting it and taking the top 100, or generating X scores, sorting them and truncating to the top 100, then continuing to generate more scores, adding them to the truncated list and sorting it again.
Either way I do it, it still takes more time than I would like. Any ideas on how to do it in an even more efficient way? (I've never taken programming courses before; maybe those of you with comp sci degrees know about efficient algorithms to do this, at least that's what I'm hoping.)
Lastly, what's the sorting algorithm used by the standard sort() function in C++?
Thanks,
-Faken
Edit: Just for anyone who is curious...
I did a few time trials on the before and after and here are the results:
Old program (performs sorting after each outer loop iteration):
top 100 scores: 147 seconds
top 10 scores: 147 seconds
top 1 scores: 146 seconds
Sorting disabled: 55 seconds
new program (implementing tracking of only top scores and using default sorting function):
top 100 scores: 350 seconds <-- hmm...worse than before
top 10 scores: 103 seconds
top 1 scores: 69 seconds
Sorting disabled: 51 seconds
new rewrite (optimizations in data stored, hand written sorting algorithm):
top 100 scores: 71 seconds <-- Very nice!
top 10 scores: 52 seconds
top 1 scores: 51 seconds
Sorting disabled: 50 seconds
Done on a Core 2, 1.6 GHz... I can't wait till my Core i7 860 arrives...
There are a lot of other, even more aggressive optimizations for me to work out (mainly in the area of reducing the number of iterations I run), but as it stands right now the speed is more than good enough; I might not even bother to work out those algorithm optimizations.
Thanks to everyone for their input!
1. Take the first 100 scores and sort them in an array.
2. Take the next score and insertion-sort it into the array (starting at the "small" end).
3. Drop the 101st value.
4. Continue with the next value, at step 2, until done.
Over time, the list will resemble the 100 largest values more and more, so more and more often you will find that the insertion sort aborts immediately, because the new value is smaller than the smallest of the candidates for the top 100.
You can do this in O(n) time for a fixed number of top scores (O(n log k) in general, for the top k), without sorting the whole list, using a heap:
#!/usr/bin/python
import heapq
import random

def top_n(l, n):
    heap = []          # min-heap holding the n largest elements seen so far
    smallest = None
    for elem in l:
        if len(heap) < n:
            heap.append(elem)
            if len(heap) == n:
                heapq.heapify(heap)
                smallest = heap[0]
        elif elem > smallest:
            # replace the current minimum with the larger element
            heapq.heapreplace(heap, elem)
            smallest = heap[0]
    return sorted(heap)

def random_ints(n):
    for i in range(0, n):
        yield random.randint(0, 10000)

print(top_n(random_ints(1000000), 100))
Times on my machine (Core2 Q6600, Linux, Python 2.6, measured with bash time builtin):
100000 elements: .29 seconds
1000000 elements: 2.8 seconds
10000000 elements: 25.2 seconds
Edit/addition: In C++, you can use std::priority_queue in much the same way as Python's heapq module is used here. You'll want to use the std::greater ordering instead of the default std::less, so that the top() member function returns the smallest element instead of the largest one. C++'s priority queue doesn't have the equivalent of heapreplace, which replaces the top element with a new one, so instead you'll want to pop the top (smallest) element and then push the newly seen value. Other than that the algorithm translates quite cleanly from Python to C++.
Here's the 'natural' C++ way to do this:
std::vector<Score> v;
// fill in v
std::partial_sort(v.begin(), v.begin() + 100, v.end(), std::greater<Score>());
std::sort(v.begin(), v.begin() + 100);
For a fixed cutoff such as 100, this is linear in the number of scores (std::partial_sort is O(n log k) for the top k of n elements).
The algorithm used by std::sort isn't specified by the standard, but libstdc++ (used by g++) uses an "adaptive introsort", which is essentially a median-of-3 quicksort down to a certain level, followed by an insertion sort.
Declare an array where you can put the 100 best scores. Loop through the huge list and check for each item if it qualifies to be inserted in the top 100. Use a simple insert sort to add an item to the top list.
Something like this (C# code, but you get the idea):
Score[] toplist = new Score[100];
int size = 0;
foreach (Score score in hugeList) {
int pos = size;
while (pos > 0 && toplist[pos - 1] < score) {
pos--;
if (pos < 99) toplist[pos + 1] = toplist[pos];
}
if (size < 100) size++;
if (pos < size) toplist[pos] = score;
}
I tested it on my computer (Core 2 Duo 2.54 GHz, Win 7 x64) and I can process 100.000.000 items in 369 ms.
Since speed is of the essence here, and 40.000 possible highscore values is totally maintainable by any of today's computers, I'd resort to bucket sort for simplicity. My guess is that it would outperform any of the algorithms proposed thus far. The downside is that you'd have to determine some upper limit for the highscore values.
So, let's assume your max highscore value is 40.000:
Make an array of 40.000 entries. Loop through your highscore values. Each time you encounter highscore x, increase your array[x] by one. After this, all you have to do is count the top entries in your array until you have reached 100 counted highscores.
You can do it in Haskell like this:
largest100 xs = take 100 $ sortBy (flip compare) xs
This looks like it sorts all the numbers into descending order (the "flip compare" bit reverses the arguments to the standard comparison function) and then returns the first 100 entries from the list. But Haskell is lazily evaluated, so the sortBy function does just enough sorting to find the first 100 numbers in the list, and then stops.
Purists will note that you could also write the function as
largest100 = take 100 . sortBy (flip compare)
This means just the same thing, but illustrates the Haskell style of composing a new function out of the building blocks of other functions rather than handing variables around the place.
You want the absolute largest X numbers, so I'm guessing you don't want some sort of heuristic. How unsorted is the list? If it's pretty random, your best bet really is just to do a quick sort on the whole list and grab the top X results.
If you can filter scores during the list generation, that's way way better. Only ever store X values, and every time you get a new value, compare it to those X values. If it's less than all of them, throw it out. If it's bigger than one of them, throw out the new smallest value.
If X is small enough you can even keep your list of X values sorted so that you are comparing your new number to a sorted list of values, you can make an O(1) check to see if the new value is smaller than all of the rest and thus throw it out. Otherwise, a quick binary search can find where the new value goes in the list and then you can throw away the first value of the array (assuming the first element is the smallest element).
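As a rough Python sketch of this "keep only X values, sorted" idea: the name `keep_top` is hypothetical, and `bisect.insort` stands in for the binary search plus insertion described above.

```python
import bisect

def keep_top(scores, x):
    """Return the x largest scores in ascending order, storing only x values."""
    best = []                        # kept sorted; best[0] is the smallest
    for score in scores:
        if len(best) < x:
            bisect.insort(best, score)
        elif score > best[0]:        # O(1) check against the current minimum
            bisect.insort(best, score)
            del best[0]              # throw out the new smallest value
    return best

top_three = keep_top([5, 1, 9, 3, 7, 9, 2], 3)
```

Most scores fail the O(1) check once the list has warmed up, so the more expensive insertion runs only rarely on random input.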
Place the data into a balanced tree structure (probably a red-black tree) that keeps the scores sorted as they are inserted. Insertions should be O(lg n). Grabbing the highest x scores should be O(lg n + x).
You can prune the tree every once in awhile if you find you need optimizations at some point.
If you only need to report the value of top 100 scores (and not any associated data), and if you know that the scores will all be in a finite range such as [0,100], then an easy way to do it is with "counting sort"...
Basically, create an array representing all possible values (e.g. an array of size 101 if scores can range from 0 to 100 inclusive), and initialize all the elements of the array with a value of 0. Then, iterate through the list of scores, incrementing the corresponding entry in the list of achieved scores. That is, compile the number of times each score in the range has been achieved. Then, working from the end of the array to the beginning of the array, you can pick out the top X score. Here is some pseudo-code:
let type Score be an integer ranging from 0 to 100, inclusive.
let scores be an array of Score objects
let scorerange be an array of integers of size 101.
for i in [0,100]
set scorerange[i] = 0
for each score in scores
set scorerange[score] = scorerange[score] + 1
let top be the number of top scores to report
let idx be an integer initialized to the end of scorerange (i.e. 100)
while (top > 0) and (idx>=0):
if scorerange[idx] > 0:
report "There are " scorerange[idx] " scores with value " idx
top = top - scorerange[idx]
idx = idx - 1;
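The pseudo-code above might look like this in Python. This is only a sketch: the function name `top_scores_by_count` and the `max_score` parameter are assumptions, and unlike the pseudo-code it trims the last bucket so that exactly `top` scores are reported.

```python
def top_scores_by_count(scores, top, max_score=100):
    """Return (value, how_many) pairs covering the `top` highest scores."""
    counts = [0] * (max_score + 1)
    for score in scores:              # one pass to count each score value
        counts[score] += 1
    report = []
    value = max_score
    while top > 0 and value >= 0:     # walk down from the highest value
        if counts[value] > 0:
            taken = min(counts[value], top)
            report.append((value, taken))
            top -= taken
        value -= 1
    return report

summary = top_scores_by_count([100, 98, 98, 97, 50], 3)
```

The work is one pass over the scores plus one pass over the possible values, so it stays linear as long as the score range is bounded.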
I answered this question in response to an interview question in 2008. I implemented a templatized priority queue in C#.
using System;
using System.Collections.Generic;
using System.Text;
namespace CompanyTest
{
// Based on pre-generics C# implementation at
// http://www.boyet.com/Articles/WritingapriorityqueueinC.html
// and wikipedia article
// http://en.wikipedia.org/wiki/Binary_heap
class PriorityQueue<T>
{
struct Pair
{
T val;
int priority;
public Pair(T v, int p)
{
this.val = v;
this.priority = p;
}
public T Val { get { return this.val; } }
public int Priority { get { return this.priority; } }
}
#region Private members
private System.Collections.Generic.List<Pair> array = new System.Collections.Generic.List<Pair>();
#endregion
#region Constructor
public PriorityQueue()
{
}
#endregion
#region Public methods
public void Enqueue(T val, int priority)
{
Pair p = new Pair(val, priority);
array.Add(p);
bubbleUp(array.Count - 1);
}
public T Dequeue()
{
if (array.Count <= 0)
throw new System.InvalidOperationException("Queue is empty");
else
{
Pair result = array[0];
array[0] = array[array.Count - 1];
array.RemoveAt(array.Count - 1);
if (array.Count > 0)
trickleDown(0);
return result.Val;
}
}
#endregion
#region Private methods
private static int ParentOf(int index)
{
return (index - 1) / 2;
}
private static int LeftChildOf(int index)
{
return (index * 2) + 1;
}
private static bool ParentIsLowerPriority(Pair parent, Pair item)
{
return (parent.Priority < item.Priority);
}
// Move high priority items from bottom up the heap
private void bubbleUp(int index)
{
Pair item = array[index];
int parent = ParentOf(index);
while ((index > 0) && ParentIsLowerPriority(array[parent], item))
{
// Parent is lower priority -- move it down
array[index] = array[parent];
index = parent;
parent = ParentOf(index);
}
// Write the item once in its correct place
array[index] = item;
}
// Push low priority items from the top of the heap down
private void trickleDown(int index)
{
Pair item = array[index];
int child = LeftChildOf(index);
while (child < array.Count)
{
bool rightChildExists = ((child + 1) < array.Count);
if (rightChildExists)
{
bool rightChildIsHigherPriority = (array[child].Priority < array[child + 1].Priority);
if (rightChildIsHigherPriority)
child++;
}
// array[child] points at higher priority sibling -- move it up
array[index] = array[child];
index = child;
child = LeftChildOf(index);
}
// Put the former root in its correct place
array[index] = item;
bubbleUp(index);
}
#endregion
}
}
Median of medians algorithm.

Create Random Number Sequence with No Repeats

Duplicate:
Unique random numbers in O(1)?
I want a pseudo-random number generator that can generate numbers with no repeats in a random order.
For example:
random(10)
might return
5, 9, 1, 4, 2, 8, 3, 7, 6, 10
Is there a better way to do it other than making the range of numbers and shuffling them about, or checking the generated list for repeats?
Edit:
Also, I want it to be efficient at generating big numbers without materializing the entire range.
Edit:
I see everyone suggesting shuffle algorithms. But if I want to generate large random numbers (1024 bytes or more), then that method would take a lot more memory than if I just used a regular RNG and inserted into a Set until it reached a specified length, right? Is there no better mathematical algorithm for this?
You may be interested in a linear feedback shift register.
We used to build these out of hardware, but I've also done them in software. It uses a shift register with some of the bits xor'ed and fed back to the input, and if you pick just the right "taps" you can get a sequence that's as long as the register size. That is, a 16-bit lfsr can produce a sequence 65535 long with no repeats. It's statistically random but of course eminently repeatable. Also, if it's done wrong, you can get some embarrassingly short sequences. If you look up the lfsr, you will find examples of how to construct them properly (which is to say, "maximal length").
A shuffle is a perfectly good way to do this (provided you do not introduce a bias using the naive algorithm). See Fisher-Yates shuffle.
If a random number is guaranteed to never repeat it is no longer random and the amount of randomness decreases as the numbers are generated (after nine numbers random(10) is rather predictable and even after only eight you have a 50-50 chance).
I understand you don't want a shuffle for large ranges, since you'd have to store the whole list to do so.
Instead, use a reversible pseudo-random hash. Then feed in the values 0 1 2 3 4 5 6 etc in turn.
There are infinite numbers of hashes like this. They're not too hard to generate if they're restricted to a power of 2, but any base can be used.
Here's one that would work for example if you wanted to go through all 2^32 32 bit values. It's easiest to write because the implicit mod 2^32 of integer math works to your advantage in this case.
unsigned int reversableHash(unsigned int x)
{
x*=0xDEADBEEF;
x=x^(x>>17);
x*=0x01234567;
x+=0x88776655;
x=x^(x>>4);
x=x^(x>>9);
x*=0x91827363;
x=x^(x>>7);
x=x^(x>>11);
x=x^(x>>20);
x*=0x77773333;
return x;
}
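To see why such a hash never repeats, here is a small Python sketch on 16-bit values so that the whole domain can be checked. The function `mix16` and its constants are made up for illustration and are not the constants from the C function above; what matters is that every step (multiplication by an odd constant, addition, and xor with a right shift) is invertible modulo 2^16.

```python
MASK = 0xFFFF  # work modulo 2**16

def mix16(x):
    """Invertible mixing function on 16-bit values (illustrative constants)."""
    x = (x * 0xBEEF) & MASK   # odd multiplier: a bijection modulo 2**16
    x ^= x >> 7               # xor with a right shift can be undone
    x = (x + 0x6655) & MASK   # adding a constant is a bijection
    x = (x * 0x4567) & MASK   # another odd multiplier
    x ^= x >> 9
    return x

# Feeding in 0, 1, 2, ... yields every 16-bit value exactly once.
outputs = [mix16(i) for i in range(1 << 16)]
```

Since a composition of bijections is a bijection, counting 0, 1, 2, ... through the function visits the whole range in scrambled order without storing anything.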
If you don't mind mediocre randomness properties and if the number of elements allows it then you could use a linear congruential random number generator.
A shuffle is the best you can do for random numbers in a specific range with no repeats. The reason that the method you describe (randomly generate numbers and put them in a Set until you reach a specified length) is less efficient is because of duplicates. Theoretically, that algorithm might never finish. At best it will finish in an indeterminable amount of time, as compared to a shuffle, which will always run in a highly predictable amount of time.
Response to edits and comments:
If, as you indicate in the comments, the range of numbers is very large and you want to select relatively few of them at random with no repeats, then the likelihood of repeats diminishes rapidly. The bigger the difference in size between the range and the number of selections, the smaller the likelihood of repeat selections, and the better the performance will be for the select-and-check algorithm you describe in the question.
What about using a GUID generator (like the one in .NET)? Granted, it is not guaranteed that there will be no duplicates, but the chance of getting one is pretty low.
This has been asked before - see my answer to the previous question. In a nutshell: You can use a block cipher to generate a secure (random) permutation over any range you want, without having to store the entire permutation at any point.
If you want to create large (say, 64 bits or greater) random numbers with no repeats, then just create them. If you're using a good random number generator that actually has enough entropy, then the odds of generating repeats are so minuscule as to not be worth worrying about.
For instance, when generating cryptographic keys, no one actually bothers checking to see if they've generated the same key before; since you're trusting your random number generator that a dedicated attacker won't be able to get the same key out, then why would you expect that you would come up with the same key accidentally?
Of course, if you have a bad random number generator (like the Debian SSL random number generator vulnerability), or are generating small enough numbers that the birthday paradox gives you a high chance of collision, then you will need to actually do something to ensure you don't get repeats. But for large random numbers with a good generator, just trust probability not to give you any repeats.
As you generate your numbers, use a Bloom filter to detect duplicates. This would use a minimal amount of memory. There would be no need to store earlier numbers in the series at all.
The trade off is that your list could not be exhaustive in your range. If your numbers are truly on the order of 256^1024, that's hardly any trade off at all.
(Of course if they are actually random on that scale, even bothering to detect duplicates is a waste of time. If every computer on earth generated a trillion random numbers that size every second for trillions of years, the chance of a collision is still absolutely negligible.)
I second gbarry's answer about using an LFSR. They are very efficient and simple to implement even in software and are guaranteed not to repeat in (2^N - 1) uses for an LFSR with an N-bit shift-register.
There are some drawbacks, however. First, by observing a small number of outputs from the RNG, one can reconstruct the LFSR and predict all the values it will generate, which makes LFSRs unusable for cryptography and anywhere else a good RNG is needed. The second problem is that either the all-zero word or the all-one (in terms of bits) word is invalid, depending on the LFSR implementation. The third issue, which is relevant to your question, is that the maximum number generated by the LFSR is always a power of 2 - 1 (or a power of 2 - 2).
The first drawback might not be an issue depending on your application. From the example you gave, it seems that you are not expecting zero to be among the answers, so the second issue does not seem relevant to your case.
The maximum value (and thus range) problem can be solved by stepping the LFSR again until you get a number within your range. Here's an example:
Say you want to have numbers between 1 and 10 (as in your example). You would use a 4-bit LFSR, which has a range of [1, 15] inclusive. Here's some pseudo-code showing how to get a number in the range [1, 10]:
x = LFSR.getRandomNumber();
while (x > 10) {
x = LFSR.getRandomNumber();
}
You should embed the previous code in your RNG; so that the caller wouldn't care about implementation.
Note that this would slow down your RNG if you use a large shift-register and the maximum number you want is not a power of 2 - 1.
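A small Python sketch of this 4-bit case, under the assumption of a Galois-style LFSR with taps {4, 3}; the names `lfsr4_states` and `unique_in_range` are hypothetical, not part of any library.

```python
def lfsr4_states(seed=1):
    """Yield each state of a 4-bit maximal-length Galois LFSR exactly once.

    seed must be in 1..15; the all-zero state is never produced.
    """
    mask = 0b1100              # taps {4, 3} as bit positions 3 and 2
    state = seed
    while True:
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= mask
        if state == seed:      # back at the start: the cycle is complete
            return

def unique_in_range(limit):
    """Skip LFSR outputs above limit, as in the loop shown above."""
    return [x for x in lfsr4_states() if x <= limit]

numbers = unique_in_range(10)
```

The LFSR visits all 15 nonzero states before repeating, so filtering to values of at most 10 leaves each of 1..10 exactly once, in a fixed scrambled order that depends on the seed.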
This answer suggests some strategies for getting what you want and ensuring they are in a random order using some already well-known algorithms.
There is an inside out version of the Fisher-Yates shuffle algorithm, called the Durstenfeld version, that randomly distributes sequentially acquired items into arrays and collections while loading the array or collection.
One thing to remember is that the Fisher-Yates (AKA Knuth) shuffle or the Durstenfeld version used at load time is highly efficient with arrays of objects because only the reference pointer to the object is being moved and the object itself doesn't have to be examined or compared with any other object as part of the algorithm.
I will give both algorithms further below.
If you want really huge random numbers, on the order of 1024 bytes or more, a really good random generator that can generate unsigned bytes or words at a time will suffice. Randomly generate as many bytes or words as you need to construct the number, make it into an object with a reference pointer to it and, hey presto, you have a really huge random integer. If you need a specific really huge range, you can add a base value of zero bytes to the low-order end of the byte sequence to shift the value up. This may be your best option.
If you need to eliminate duplicates of really huge random numbers, then that is trickier. Even with really huge random numbers, removing duplicates also makes them significantly biased and not random at all. If you have a really large set of unduplicated really huge random numbers and you randomly select from the ones not yet selected, then the bias is only the bias in creating the huge values for the really huge set of numbers from which to choose. A reverse version of Durstenfeld's version of the Fisher-Yates shuffle could be used to randomly choose values from a really huge set of them, remove them from the remaining values from which to choose and insert them into a new array that is a subset, and could do this with just the source and target arrays in situ. This would be very efficient.
This may be a good strategy for getting a small number of random numbers with enormous values from a really large set of them in which they are not duplicated. Just pick a random location in the source set, obtain its value, swap its value with the top element in the source set, reduce the size of the source set by one and repeat with the reduced size source set until you have chosen enough values. This is essentially the Durstenfeld version of Fisher-Yates in reverse. You can then use the Durstenfeld version of the Fisher-Yates algorithm to insert the acquired values into the destination set. However, that is overkill since they should be randomly chosen and randomly ordered as given here.
Both algorithms assume you have some random number instance method, nextInt(int setSize), that generates a random integer from zero up to but not including setSize, meaning there are setSize possible values. In this case, setSize will be the size of the array, since the last index of the array is size-1.
The first algorithm is the Durstenfeld version of Fisher-Yates (aka Knuth) shuffle algorithm as applied to an array of arbitrary length, one that simply randomly positions integers from 0 to the length of the array into the array. The array need not be an array of integers, but can be an array of any objects that are acquired sequentially which, effectively, makes it an array of reference pointers. It is simple, short and very effective
int size = someNumber;
int[] array = new int[size]; // here is the array to load
int location; // this will get assigned a value before used
// i will also conveniently be the value to load, but any sequentially acquired
// object will work
for (int i = 0; i < size; i++) { // conveniently, i is also the value to load
    // you can instance or acquire any object at this place in the algorithm to load
    // by reference, into the array and use a pointer to it in place of j
    int j = i; // in this example, j is trivially i
    if (i == 0) { // first integer goes into first location
        array[i] = j; // this may get swapped from here later
    } else { // subsequent integers go into random locations
        // the next random location will be somewhere in the locations
        // already used or a new one at the end
        // here we get the next random location
        // to preserve true randomness without a significant bias
        // it is REALLY IMPORTANT that the newest value could be
        // stored in the newest location, that is,
        // location has to be able to randomly have the value i
        location = nextInt(i + 1); // a random value between 0 and i
        // move the random location's value to the new location
        array[i] = array[location];
        array[location] = j; // put the new value into the random location
    } // end if...else
} // end for
Voila, you now have an already randomized array.
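For comparison, a short Python sketch of the same load-time shuffle. The function name `inside_out_shuffle` is an assumption, and `rng.randrange(i + 1)` plays the role of `nextInt(i + 1)`.

```python
import random

def inside_out_shuffle(items, rng=random):
    """Build a shuffled list while consuming items one at a time."""
    result = []
    for i, item in enumerate(items):
        j = rng.randrange(i + 1)       # 0..i inclusive, so j == i is possible
        if j == i:
            result.append(item)        # the new item stays in the newest slot
        else:
            result.append(result[j])   # move the old occupant to the end
            result[j] = item           # and put the new item in its place
    return result

shuffled = inside_out_shuffle(range(50))
```

Because `items` is only iterated once, it can be a generator or any other sequentially acquired source rather than a list that already exists.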
If you want to randomly shuffle an array you already have, here is the standard Fisher-Yates algorithm.
type[] array = new type[size];
// some code that loads array...
// randomly pick an item anywhere in the current array segment,
// swap it with the top element in the current array segment,
// then shorten the array segment by 1
// just as with the Durstenfeld version above,
// it is REALLY IMPORTANT that an element could get
// swapped with itself to avoid any bias in the randomization
type temp; // this will get assigned a value before used
int location; // this will get assigned a value before used
for (int i = size - 1; i > 0; i--) {
location = nextInt(i + 1);
temp = array[i];
array[i] = array[location];
array[location] = temp;
} // end for
For sequenced collections and sets, i.e. some type of list object, you could just use adds/or inserts with an index value that allows you to insert items anywhere, but it has to allow adding or appending after the current last item to avoid creating bias in the randomization.
Shuffling N elements doesn't take up excessive memory...think about it. You only swap one element at a time, so the maximum memory used is that of N+1 elements.
Assuming you have a random or pseudo-random number generator, even if it's not guaranteed to return unique values, you can implement one that returns unique values each time using this code, assuming that the upper limit remains constant (i.e. you always call it with unique_random(10), and don't call unique_random(10) followed by unique_random(11)).
The code doesn't check for errors. You can add that yourself if you want to.
It also requires a lot of memory if you want a large range of numbers.
/* the function returns a random number between 0 and max -1
* not necessarily unique
* I assume it's written
*/
int random(int max);
/* the function returns a unique random number between 0 and max - 1 */
int unique_random(int max)
{
static int *list = NULL; /* contains a list of numbers we haven't returned */
static int in_progress = 0; /* 0 --> we haven't started randomizing numbers
* 1 --> we have started randomizing numbers
*/
static int count;
static int prev_max = 0;
// initialize the list
if (!in_progress || (prev_max != max)) {
if (list != NULL) {
free(list);
}
list = malloc(sizeof(int) * max);
prev_max = max;
in_progress = 1;
count = max - 1;
int i;
for (i = max - 1; i >= 0; --i) {
list[i] = i;
}
}
/* now choose one from the list */
int index = random(count + 1); /* indices 0..count are still unreturned */
int retval = list[index];
/* now we throw away the returned value.
* we do this by shortening the list by 1
* and replacing the element we returned with
* the highest remaining number
*/
swap(&list[index], &list[count]);
/* when the count reaches 0 we start over */
if (count == 0) {
in_progress = 0;
free(list);
list = 0;
} else { /* reduce the counter by 1 */
count--;
}
return retval;
}
/* swap two numbers */
void swap(int *x, int *y)
{
int temp = *x;
*x = *y;
*y = temp;
}
Actually, there's a minor point to make here; a random number generator which is not permitted to repeat is not random.
Suppose you wanted to generate a series of 256 random numbers without repeats.
Create a 256-bit (32-byte) memory block initialized with zeros, let's call it b
Your looping variable will be n, the number of numbers yet to be generated
Loop from n = 256 to n = 1
Generate a random number r in the range [0, n)
Find the r-th zero bit in your memory block b, let's call it p
Put p in your list of results, an array called q
Flip the p-th bit in memory block b to 1
After the n = 1 pass, you are done generating your list of numbers
Here's a short example of what I am talking about, using n = 4 initially:
**Setup**
b = 0000
q = []
**First loop pass, where n = 4**
r = 2
p = 2
b = 0010
q = [2]
**Second loop pass, where n = 3**
r = 2
p = 3
b = 0011
q = [2, 3]
**Third loop pass, where n = 2**
r = 0
p = 0
b = 1011
q = [2, 3, 0]
**Fourth and final loop pass, where n = 1**
r = 0
p = 1
b = 1111
q = [2, 3, 0, 1]
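The walkthrough above can be reproduced with a short Python sketch. The function name `unique_sequence` and the `pick` parameter are assumptions for illustration; a list of booleans stands in for the bit block b.

```python
import random

def unique_sequence(count, pick=random.randrange):
    """Return 0..count-1 in random order using the r-th-zero-bit method."""
    used = [False] * count            # the memory block b, one flag per number
    result = []                       # the result list q
    for n in range(count, 0, -1):     # n = numbers still to be generated
        r = pick(n)                   # random r in [0, n)
        zeros_seen = -1
        for p in range(count):        # find the r-th flag that is still zero
            if not used[p]:
                zeros_seen += 1
                if zeros_seen == r:
                    break
        used[p] = True                # flip the p-th bit to 1
        result.append(p)
    return result

# Replaying the r values from the example above: 2, 2, 0, 0.
fixed = iter([2, 2, 0, 0])
example = unique_sequence(4, pick=lambda n: next(fixed))
```

With those r values the sketch gives the same q as the walkthrough. The scan for the r-th zero bit is linear, so generating all the numbers costs quadratic time in exchange for only one bit of state per number.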
Please check the answers at
Generate sequence of integers in random order without constructing the whole list upfront
where my own answer is:
A very simple generator is 1+((power(r,x)-1) mod p), which produces values from 1 to p for x from 1 to p in a random-looking order, where r and p are prime numbers and r <> p.
I asked a similar question before, but mine was for the whole range of an int; see Looking for a Hash Function /Ordered Int/ to /Shuffled Int/
#include <unordered_set>
static std::unordered_set<long> s; // every number handed out so far
long l;
do {
l = generator(); // roll a candidate
} while (s.find(l) != s.end()); // re-roll while it was already used
s.insert(l);
generator() is your random number generator. You keep rolling as long as the number is already in your set, then you add the new one to the set. You get the idea.
I did it with long for the example, but you should make that a template if your PRNG is templatized.
An alternative is to use a cryptographically secure PRNG with a wide output, which has a very low probability of generating the same number twice.
If you don't mind the poor statistical properties of the generated sequence, there is one method:
Let's say you want to generate N numbers of 1024 bits each. You can sacrifice some bits of each generated number to act as a "counter".
So you generate each random number, but into some bits of your choosing you put a binary-encoded counter (taken from a variable that you increase each time the next random number is generated).
You can split the counter into single bits and put them in some of the less significant bits of the generated number.
That way you are sure to get a unique number each time.
For example, each generated number looks like this:
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyxxxxyxyyyyxxyxx
where each x is taken directly from the generator and each y is taken from the counter variable.
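A minimal sketch of the counter idea, with simplifying assumptions: the values are 64 bits wide rather than 1024, the counter occupies the 16 lowest bits as one contiguous field instead of being scattered, and the class name `CounterStampedRandom` is made up for this example.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: 64-bit values whose low 16 bits carry a counter.
class CounterStampedRandom {
public:
    explicit CounterStampedRandom(std::uint64_t seed) : rng_(seed) {}

    // Values are guaranteed distinct for the first 2^16 calls;
    // after that the counter field wraps around.
    std::uint64_t next() {
        std::uint64_t random_bits = rng_() & ~kCounterMask;   // upper 48 bits
        std::uint64_t value = random_bits | (counter_ & kCounterMask);
        ++counter_;
        return value;
    }

private:
    static constexpr std::uint64_t kCounterMask = 0xFFFFu;
    std::mt19937_64 rng_;
    std::uint64_t counter_ = 0;
};
```

The width of the counter field bounds how many unique numbers you can draw, so it has to be chosen for the N you need.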
Mersenne twister
Description of which can be found here on Wikipedia: Mersenne twister
Look at the bottom of the page for implementations in various languages.
The problem is to select a "random" sequence of N unique numbers from the range 1..M where there is no constraint on the relationship between N and M (M could be much bigger, about the same, or even smaller than N; they may not be relatively prime).
Expanding on the linear feedback shift register answer: for a given M, construct a maximal-length LFSR for the smallest power of two that is larger than M. Then just grab your numbers from the LFSR, throwing out numbers larger than M. On average, you will throw out at most half the generated numbers (since by construction more than half the range of the LFSR is less than M), so the expected running time of getting a number is O(1). You are not storing previously generated numbers, so space consumption is O(1) too. If you cycle before getting N numbers, then M is less than N (or the LFSR is constructed incorrectly).
You can find the parameters for maximum length LFSRs up to 168 bits here (from wikipedia): http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf
Here's some java code:
/**
* Generate a sequence of unique "random" numbers in [0,M)
* @author dkoes
*
*/
public class UniqueRandom
{
long lfsr;
long mask;
long max;
private static long seed = 1;
//indexed by number of bits
private static int [][] taps = {
null, // 0
null, // 1
null, // 2
{3,2}, //3
{4,3},
{5,3},
{6,5},
{7,6},
{8,6,5,4},
{9,5},
{10,7},
{11,9},
{12,6,4,1},
{13,4,3,1},
{14,5,3,1},
{15,14},
{16,15,13,4},
{17,14},
{18,11},
{19,6,2,1},
{20,17},
{21,19},
{22,21},
{23,18},
{24,23,22,17},
{25,22},
{26,6,2,1},
{27,5,2,1},
{28,25},
{29,27},
{30,6,4,1},
{31,28},
{32,22,2,1},
{33,20},
{34,27,2,1},
{35,33},
{36,25},
{37,5,4,3,2,1},
{38,6,5,1},
{39,35},
{40,38,21,19},
{41,38},
{42,41,20,19},
{43,42,38,37},
{44,43,18,17},
{45,44,42,41},
{46,45,26,25},
{47,42},
{48,47,21,20},
{49,40},
{50,49,24,23},
{51,50,36,35},
{52,49},
{53,52,38,37},
{54,53,18,17},
{55,31},
{56,55,35,34},
{57,50},
{58,39},
{59,58,38,37},
{60,59},
{61,60,46,45},
{62,61,6,5},
{63,62},
};
//m is upperbound; things break if it isn't positive
UniqueRandom(long m)
{
max = m;
lfsr = seed; //could easily pass a starting point instead
//figure out number of bits
int bits = 0;
long b = m;
while((b >>>= 1) != 0)
{
bits++;
}
bits++;
if(bits < 3)
bits = 3;
mask = 0;
for(int i = 0; i < taps[bits].length; i++)
{
mask |= (1L << (taps[bits][i]-1));
}
}
//return -1 if we've cycled
long next()
{
long ret = -1;
if(lfsr == 0)
return -1;
do {
ret = lfsr;
//update lfsr - from wikipedia
long lsb = lfsr & 1;
lfsr >>>= 1;
if(lsb == 1)
lfsr ^= mask;
if(lfsr == seed)
lfsr = 0; //cycled, stick
ret--; //zero is stuck state, never generated so sub 1 to get it
} while(ret >= max);
return ret;
}
}
Here is a way to pick random elements without repeating results. It also works for strings. It's in C#, but the logic should carry over to many languages. Put the random results in a list and check whether each new random element is already in that list. If it is not, you have a new random element. If it is, repeat the random pick until you get an element that is not in the list.
List<string> Erledigte = new List<string>();
private void Form1_Load(object sender, EventArgs e)
{
label1.Text = "";
listBox1.Items.Add("a");
listBox1.Items.Add("b");
listBox1.Items.Add("c");
listBox1.Items.Add("d");
listBox1.Items.Add("e");
}
private void button1_Click(object sender, EventArgs e)
{
Random rand = new Random();
int index=rand.Next(0, listBox1.Items.Count);
string rndString = listBox1.Items[index].ToString();
if (listBox1.Items.Count <= Erledigte.Count)
{
return;
}
else
{
if (Erledigte.Contains(rndString))
{
//MessageBox.Show("vorhanden");
while (Erledigte.Contains(rndString))
{
index = rand.Next(0, listBox1.Items.Count);
rndString = listBox1.Items[index].ToString();
}
}
Erledigte.Add(rndString);
label1.Text += rndString;
}
}
For a sequence to be random there should not be any auto correlation. The restriction that the numbers should not repeat means the next number should depend on all the previous numbers which means it is not random anymore....
If you can generate 'small' random numbers, you can generate 'large' random numbers by integrating them: add a small random increment to each 'previous'.
const size_t amount = 100; // a limited amount of random numbers
vector<long int> numbers;
numbers.reserve( amount );
const short int spread = 250; // gaps of up to about 250 between consecutive numbers
numbers.push_back( myrandom( spread ) );
for( size_t n = 1; n != amount; ++n ) { // start at 1: the first number is already stored
const short int increment = myrandom( spread ) + 1; // +1 keeps the step non-zero (assuming myrandom can return 0), so no value repeats
numbers.push_back( numbers.back() + increment );
}
myshuffle( numbers );
The myrandom and myshuffle functions I hereby generously delegate to others :)
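One possible way to fill in the two delegated functions, offered as a sketch rather than the author's intent: `std::uniform_int_distribution` plays the role of myrandom (with steps of at least 1 so values cannot repeat) and `std::shuffle` plays the role of myshuffle. The function name `distinct_by_increments` is invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Builds `amount` distinct numbers by summing random positive increments,
// then shuffles them. `spread` is the largest possible gap (must be >= 1).
std::vector<long> distinct_by_increments(std::size_t amount, int spread,
                                         std::mt19937& rng) {
    std::uniform_int_distribution<int> step(1, spread);  // never 0, so no repeats
    std::vector<long> numbers;
    numbers.reserve(amount);
    long current = 0;
    for (std::size_t n = 0; n < amount; ++n) {
        current += step(rng);            // strictly increasing running sum
        numbers.push_back(current);
    }
    std::shuffle(numbers.begin(), numbers.end(), rng);
    return numbers;
}
```

Because the running sum only ever grows, uniqueness comes for free and no lookup of earlier values is needed.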
To get non-repeated random numbers without wasting time checking for duplicates and drawing new numbers over and over, use the method below, which keeps the number of calls to Rand to a minimum.
For example, if you want to get 100 non-repeated random numbers:
1. fill an array with the numbers from 1 to 100
2. get a random number in the range 1-100 using the Rand function
3. use the generated random number as an index to get the value from the array (Numbers[IndexGeneratedFromRandFunction])
4. shift the numbers after that index one place to the left
5. repeat from step 2, but now the range should be 1-99, and so on
Now we have an array with different numbers!
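The five steps above could look like this in C++. It is a sketch under assumptions: `pick_all_once` is a made-up name, indices are 0-based instead of 1-based, and `std::vector::erase` performs the "shift left" of step 4.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Returns the numbers 1..count in random order, each exactly once.
std::vector<int> pick_all_once(int count, std::mt19937& rng) {
    std::vector<int> pool(count);
    for (int i = 0; i < count; ++i) pool[i] = i + 1;      // step 1: fill 1..count

    std::vector<int> picked;
    picked.reserve(count);
    while (!pool.empty()) {
        // step 2: random index into what is left (the range shrinks each round)
        std::uniform_int_distribution<std::size_t> dist(0, pool.size() - 1);
        std::size_t index = dist(rng);
        picked.push_back(pool[index]);                    // step 3: take the value
        // step 4: erase shifts every later element one place to the left
        pool.erase(pool.begin() + static_cast<std::ptrdiff_t>(index));
    }
    return picked;
}
```

The shift makes each pick linear in the remaining size. Swapping the chosen element with the last one and shrinking the pool by one avoids that cost if the order of the leftover pool does not matter.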
#include <stdlib.h>
#define N 10 /* the number of them; 10 is only an example value */
int main() {
int b[N];
for (int i = 0; i < N; i++) {
int a = rand() % (N + 1) + 1; /* candidate in 1..N+1 */
int j = 0;
while (j < i) {
if (a == b[j]) { /* already used: re-roll and rescan from the start */
a = rand() % (N + 1) + 1;
j = -1;
}
j++;
}
b[i] = a;
}
return 0;
}