C++ random number generator without repeating numbers

I have searched high and low for a type of function that turns this code
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
using namespace std;
void ran(int array[], int max);
int main() {
printf("Today's lottery numbers are:\n");
for (int i = 0; i < 6; i++)
srand((unsigned)(NULL));
}
into a random number generator that ensures no repeating numbers. Can someone help me with it? After the check I plan to print each number with printf("%d\n", rand()%50);
I just need a routine that makes sure the numbers don't repeat. If you can give me one, I would be greatly relieved and will be sure to pay it forward.
Thanks. The includes don't seem to be displaying right on this screen, but they are stdio, stdlib and time, and I'm using namespace std.

Why not just use what's already in the STL? Looking at your example code, and assuming it's somewhat representative of what you want to do, everything should be in there. (I assume you need a relatively small range of numbers, so memory wouldn't be a constraint)
Using std::random_shuffle, and an std::vector containing the integers in the range you wish your numbers to be in, should give you the sequence of unique random numbers that you need in your example code. (std::random_shuffle was deprecated in C++14 and removed in C++17; std::shuffle with an engine from <random> is its replacement.)
You will still have to call srand once, and once only, before using std::random_shuffle, not multiple times like you're doing in your current code example.
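A minimal sketch of the shuffle approach described above, written with std::shuffle and std::mt19937 rather than std::random_shuffle and srand. The function name draw_by_shuffle and its seed parameter are hypothetical, chosen only for illustration, and the sketch assumes count is no larger than max_value.

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Returns `count` distinct values drawn from 1..max_value.
// Assumes 0 <= count <= max_value. `seed` is a parameter only so the
// sketch is reproducible; pass std::random_device{}() for a fresh draw.
std::vector<int> draw_by_shuffle(int count, int max_value, unsigned seed) {
    std::vector<int> pool(max_value);
    std::iota(pool.begin(), pool.end(), 1);       // 1, 2, ..., max_value
    std::mt19937 gen(seed);
    std::shuffle(pool.begin(), pool.end(), gen);  // random permutation of the pool
    pool.resize(count);                           // keep only the first `count`
    return pool;
}
```

Calling draw_by_shuffle(6, 50, std::random_device{}()) would give six distinct lottery numbers between 1 and 50.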

If your range of random numbers is finite and small, say you have X different numbers:
Create an array with every single number
Select a random index I between 0 and X-1, and get its value
Move the value at position X-1 into position I
Decrease X and repeat

You should only call srand once in your code and you should call it with a "random" seed like time(NULL).
By calling srand within the loop, and calling it with a 0 seed each time, you'll get six numbers exactly the same.
However, even with those fixes, rand()%50 may give you the same number twice. What you should be using is a shuffle algorithm like this one since it works exactly the same as the lottery machines.
Here's a complete program showing that in action:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
static void getSix (int *dst) {
int sz, pos, i, src[50];
for (i = 0; i < sizeof(src)/sizeof(*src); i++)
src[i] = i + 1;
sz = 50;
for (i = 0; i < 6; i++) {
pos = rand() % sz;
dst[i] = src[pos];
src[pos] = src[sz-1];
sz--;
}
}
int main (void) {
srand (time (NULL));
int i, numbers[6];
getSix (numbers);
printf ("Numbers are:\n");
for (i = 0; i < sizeof(numbers)/sizeof(*numbers); i++)
printf (" %d\n", numbers[i]);
return 0;
}
Sample runs:
Numbers are:
25
10
26
4
18
1
Numbers are:
39
45
8
18
17
22
Numbers are:
8
6
49
21
40
28
Numbers are:
37
49
45
43
6
40

I would recommend using a better random number generation algorithm that can offer that internally, rather than using rand.
The problem with rand() and trying to prevent repeats is that finding an unused number will slow down with every number added to the used list, eventually becoming a very long process of finding and discarding numbers.
If you were to use a more complex pseudo-random number generator (and there are many, many available, check Boost for a few), you'll have an easier time and may be able to avoid the repeats altogether. It depends on the algorithm, so you'll need to check the documentation.
To do it without using any additional libraries, you could prefill a vector or list with sequential (or even random) numbers, making sure each number is present once in the list. Then, to generate a number, generate a random number and select (and remove) that item from the list. By removing each item as it's used, so long as every item was present once to begin with, you'll never run into a duplicate.

And if you have access to C++0x you can use the new random generator facilities that wrap all of this junk for you!
http://www2.research.att.com/~bs/C++0xFAQ.html#std-random

Related

C++ How to generate 10,000 UNIQUE random integers to store in a BST?

I am trying to generate 10,000 unique random integers in the range of 1 to 20,000 to store in a BST, but not sure the best way to do this.
I saw some good suggestions on how to do it with an array or a vector, but not for a BST. I have a contains method but I don't believe it will work in this scenario as it is used to search and return results on how many tries it took to find the desired number. Below is the closest I've gotten but it doesn't like my == operator. Would it be better to use an array and just store the array in the BST? Or is there a better way to use the below code so that while it's generating the numbers it's just storing them right in the tree?
for (int i = 0; i < 10000; i++)
{
int random = rand() % 20000;
tree1Ptr->add(random);
for (int j = 0; j < i; j++) {
if (tree1Ptr[j]==random) i--;
}
}
There are a couple of problems in your code. But let's go straight to the hurting point.
What's the main problem ?
From your code, it is obvious that tree1Ptr is a pointer. In principle, it should point to a node of the tree, which has two pointers, one to the left node and one to the right node.
So somewhere in your code, you should have:
tree1Ptr = new Node; // or whatever the type of your node is called
However, in your inner loop, you are just using it as if it was an array:
for (int i = 0; i < 10000; i++)
{
int random = rand() % 20000;
tree1Ptr->add(random);
for (int j = 0; j < i; j++) {
if (tree1Ptr[j]==random) //<============ OUCH !!
i--;
}
}
The compiler won't complain about the indexing, because it's valid syntax: you can use array indexing on a pointer. But it's up to you to ensure that you do not go out of bounds (so here, that j remains <1).
Other remarks
By the way, in the inner loop, you just want to say that you have to retry if the number is found. You can break the inner loop if the number is already found, in order not to continue.
You should also seed your random number generator, to avoid running the program always with the same sequence.
How to solve it ?
You really need to deepen your understanding of BSTs. Navigating through the nodes requires comparing against the value in the current node and, depending on the result, continuing with either the left or the right pointer, not using indexing. But it would take too long to explain here, so maybe you should look for a tutorial.
For a lot of unique 'random' numbers I usually use a Format Preserving Encryption. Since encryption is one-to-one, you are guaranteed unique outputs as long as the inputs are unique. A different encryption key will generate a different set of outputs, i.e. a different permutation of the inputs. Simply encrypt 0, 1, 2, 3, 4, ... and the outputs are guaranteed unique.
You want numbers in the range [1 .. 20,000]. Unfortunately 20,000 needs 15 bits and most encryption schemes have an even number of bits: 16 bits in your case. That means you will need to cycle walk: re-encrypt the output if the number is too big, until you get a number in the desired range. Since your inputs only go up to 10,000 and you only cycle walk on outputs of 20,000 and above, you will still avoid duplicates.
The only standard cipher I know of which allows a 16 bit block size is Hasty Pudding cipher. Alternatively it is easy enough to write your own simple Feistel cipher. Four rounds are enough if you do not want cryptographic security. For crypto level security you will need to use AES/FFX, which is NIST approved.
There are two ways to pick random unique numbers out of a sequence without checking against the numbers previously picked (i.e. already in your BST).
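Use std::shuffle
A simple way is to shuffle a sorted array of 1 ... 20,000 and simply pick the first 10,000 items (std::random_shuffle used to do this job, but it was removed in C++17):
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>
std::vector<int> values(20000);
std::iota(values.begin(), values.end(), 1); // fill with 1 ... 20,000
std::mt19937 gen(std::random_device{}()); // seeded differently on each run
std::shuffle(values.begin(), values.end(), gen);
for (int i = 0; i < 10000; ++i) {
// Insert values[i] into your BST
}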
This method works well if the size of random numbers (10,000) to pick is comparable to the size of total numbers (20,000), because the complexity of random shuffling is amortized over a larger result set.
Use uniform_int_distribution
If the size of random numbers to pick is much smaller than the size of total numbers, then an alternative way can be used:
#include <chrono>
#include <random>
#include <vector>
// Use timed seed so every run produces different random picks.
std::default_random_engine reng(
std::chrono::steady_clock::now().time_since_epoch().count());
int num_pick = 1000; // # of random numbers remaining to pick
int num_total = 20000; // Total # of numbers to pick from
int cur_value = 1; // Current prospective number to be picked
while (num_pick > 0) {
// Probability to pick `cur_value` is num_pick / (num_total-cur_value+1)
std::uniform_int_distribution<int> distrib(0, num_total-cur_value);
if (distrib(reng) < num_pick) {
bst.insert(cur_value); // insert `cur_value` to your BST
--num_pick;
}
++cur_value;
}
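The loop above can also be packaged as a self-contained function, sketched below. The name sample_in_order is hypothetical, and the values are collected in a std::vector rather than inserted into a BST, since the tree type is not shown in the question.

```cpp
#include <random>
#include <vector>

// Selection sampling: visit 1..total in order and keep each value with
// probability (still needed) / (not yet visited, including this one).
// Assumes 0 <= count <= total.
std::vector<int> sample_in_order(int count, int total, std::mt19937& gen) {
    std::vector<int> picked;
    picked.reserve(count);
    for (int value = 1; value <= total && count > 0; ++value) {
        // total - value + 1 candidates remain, including `value` itself.
        std::uniform_int_distribution<int> d(0, total - value);
        if (d(gen) < count) {
            picked.push_back(value);
            --count;
        }
    }
    return picked;
}
```

Note that the picks come out in increasing order. Inserting them in that order into an unbalanced BST would produce a degenerate, list-like tree, so it may be worth shuffling the result first.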

Trying to produce a unique sequence of random numbers per iteration

As the title states, I'm trying to create a unique sequence of random numbers every time I run this little program.
However, sometimes I get results like:
102
201
102
The code
#include <cstdlib>
#include <ctime>
#include <iostream>
using namespace std;
int main() {
for (int i = 0; i < 3; i++) {
srand (time(NULL)+i);
cout << rand() % 3;
cout << rand() % 3;
cout << rand() % 3 << '\n' << endl;
}
}
Clearly srand doesn't have quite the magical functionality I wanted it to. I'm hoping that there's a logical hack around this though?
Edit1: To clarify, this is just a simple test program for what will be implemented on a larger scale. So instead of 3 iterations of rand%3, I might run 1000, or more of rand%50.
If I see 102 at some point in its operation, I'd want it so that I never see 102 again.
First of all, if you were going to use srand/rand, you'd want to seed it once (and only once) at the beginning of each execution of the program:
int main() {
srand(time(NULL));
for (int i = 0; i < 3; i++) {
cout << rand() % 3;
cout << rand() % 3;
cout << rand() % 3 << '\n' << endl;
}
}
Second, time typically only produces a result with a resolution of one second, so even with this correction, if you run the program twice in the same second, you can expect it to produce identical results in the two runs.
Third, you don't really want to use srand/rand anyway. The random number generators in <random> are generally considerably better (and, perhaps more importantly, are enough better defined that they represent a much better-known quantity).
#include <random>
#include <iostream>
int main() {
std::mt19937_64 gen { std::random_device()() };
std::uniform_int_distribution<int> d(0, 2);
for (int i = 0; i < 3; i++) {
for (int j=0; j<3; j++)
std::cout << d(gen);
std::cout << "\n";
}
}
Based on the edit, however, this still isn't adequate. What you really want is a random sample without duplication. To get that, you need to do more than just generate numbers. Randomly generated numbers not only can repeat, but inevitably will repeat if you generate enough of them (but the likelihood of repetition becomes quite high even when it's not yet inevitable).
As long as the number of results you're producing is small compared to the number of possible results, you can pretty easily just store results in a set as you produce them, and only treat a result as actual output if it wasn't previously present in the set:
#include <random>
#include <iostream>
#include <set>
#include <iomanip>
int main() {
std::mt19937_64 gen { std::random_device()() };
std::uniform_int_distribution<int> d(0, 999);
std::set<int> results;
for (int i = 0; i < 50;) {
int result = d(gen);
if (results.insert(result).second) {
std::cout << std::setw(5) << result;
++i;
if (i % 10 == 0)
std::cout << "\n";
}
}
}
This becomes quite inefficient if the number of results approaches the number of possible results. For example, let's assume you're producing numbers from 1 to 1000 (so 1000 possible results). Consider what happens if you decide to produce 1000 results (i.e., all possible results). In this case, when you're producing the last result, there's really only one possibility left, but rather than just producing that one possibility, you produce one random number after another after another, until you stumble across the one possibility that remains.
For such a case, there are better ways to do the job. For example, you can start with a container holding all the possible numbers. To generate an output, you generate a random index into that container. You output that number, and remove that number from the container, then repeat (but this time, the container is one smaller, so you reduce the range of your random index by one). This way, each random number you produce gives one output.
It is possible to do the same by just shuffling an array of numbers. This has two shortcomings though. First, you need to shuffle them correctly: a Fisher-Yates shuffle works nicely, but otherwise it's easy to produce bias. Second, unless you actually do use all (or very close to all) the numbers in the array, this is inefficient.
For an extreme case, consider wanting a few (10, for example) 64-bit numbers. In this, you start by filling an array with numbers from 0 to 2^64-1. You then do 2^64-2 swaps. So, you're doing roughly 2^65 operations just to produce 10 numbers. In this extreme of a case, the problem should be quite obvious. Although it's less obvious if you produce (say) 1000 numbers of 32 bits apiece, you still have the same basic problem, just to a somewhat lesser degree. So, while this is a valid way to do things for a few specific cases, its applicability is fairly narrow.
Generate an array containing the 27 three digit numbers whose digits are less than 3. Shuffle it. Iterate through the shuffled array as needed, values will be unique until you've exhausted them all.
As other people have pointed out, don't keep reseeding your random number generator. Also, rand is a terrible generator, you should use one of the better choices available in C++'s standard libraries.
You are effectively generating a three digit base 3 number. Use your RNG of choice to generate a base 10 number in the range 0 .. 26 and convert it to base 3. That gives 000 .. 222.
If you absolutely must avoid repeats, then shuffle an array as pjs suggests. That will result in later numbers being 'less random' than the earlier numbers because they are taken from a smaller pool.
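The base-3 conversion and the shuffled array of 27 codes mentioned above could look roughly like this. The function names to_base3 and shuffled_codes are hypothetical, and std::mt19937 stands in for "your RNG of choice".

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <vector>

// Converts n in 0..26 to its three-digit base-3 form, "000".."222".
std::string to_base3(int n) {
    std::string digits(3, '0');
    for (int pos = 2; pos >= 0; --pos) {
        digits[pos] = static_cast<char>('0' + n % 3);  // least significant digit last
        n /= 3;
    }
    return digits;
}

// Returns all 27 codes in random order; reading them front to back
// never repeats one until the list is exhausted.
std::vector<std::string> shuffled_codes(std::mt19937& gen) {
    std::vector<std::string> codes;
    for (int n = 0; n < 27; ++n) codes.push_back(to_base3(n));
    std::shuffle(codes.begin(), codes.end(), gen);
    return codes;
}
```

For instance, to_base3(11) gives "102", the value from the question, and it appears exactly once in the shuffled list.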

Random list of numbers

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <math.h>
int main()
{
int i;
int diceRoll;
for(i=0; i < 20; i++)
{
printf("%d \n", rand());
}
return 0;
}
This is the code I wrote in c (codeblocks) to get random numbers, the problem is I always get the same sequence: 41,18467,6334,26500 etc...
I'm still learning so please try to explain like you're talking with a 8 year old D:
You get the same sequence each time because the seed for the random number generator isn't set. You need to call srand(time(NULL)) like this:
int main()
{
srand(time(NULL));
....
Random number generators are pseudorandom. What this means is that they use some "algorithm" to come up with the next "random" number. In other words, if you start with the same seed to this algorithm, you get the same sequence of random numbers each time. To solve this, you have to make sure to seed your random number generator. Sometimes, it is desirable to use the same seed so that you can check whether the logic of your program is working correctly. Either way, one common way that folks seed their programs is through the use of time(NULL). time gives the time elapsed (in seconds) since the epoch, so its return value changes every second. Thus, if you seed your random number generator with srand(time(NULL)) at the beginning of the program, you'll get a different random number sequence every different second that you run your program. Be sure not to seed for every random number that you request. Just do this once at the very beginning of your code and then leave it alone.
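To see the "same seed, same sequence" behaviour in isolation, here is a small sketch. It uses std::mt19937 from C++'s <random> rather than rand(), and the helper name first_outputs is made up for this example.

```cpp
#include <random>
#include <vector>

// Returns the first `count` outputs of a Mersenne Twister seeded with `seed`.
std::vector<unsigned> first_outputs(unsigned seed, int count) {
    std::mt19937 gen(seed);                 // the seed fixes the whole sequence
    std::vector<unsigned> out;
    for (int i = 0; i < count; ++i)
        out.push_back(static_cast<unsigned>(gen()));
    return out;
}
```

first_outputs(1, 5) returns the same five numbers every time it is called, while first_outputs(2, 5) returns a different five. That is why an unseeded program keeps printing the same list.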
Your title says C# but I've answered with C++. You'll want to include ctime for this. It may also be beneficial to look at the new style of random number generation, as rand() isn't very random these days. Look into #include <random> and make yourself an engine and distribution to pull random numbers through. Don't forget to seed there as well!
First of all, seed your random function by including <ctime> and calling srand(time(NULL));.
Secondly, you need a modulo if you're going to call rand(), for example: rand() % x will return a random number from 0 to x-1. Since you're simulating dice rolls, do rand() % 6 + 1.
The line srand((unsigned)time(NULL)); must be outside the loop, and it must appear just once in your code.
The modulo rand()%10 means you get any number starting from 0 going up to what you are modulo by -1. So in this case 0-9, if you want 1-10 you do: rand()%10 + 1
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main()
{
int i;
srand((unsigned)time(NULL)); // seed once, before the loop
for(i=0; i < 20; i++)
{
printf("%d \n", rand() % 10); //Gets you numbers 0-9
}
return 0;
}

rand() gives still the same value

I noticed that while practicing by doing a simple console-based quiz app. When I'm using rand() it gives me the same value several times in a row. The smaller number range, the bigger the problem is.
For example
for (i=0; i<10; i++) {
x = rand() % 20 + 1;
cout << x << ", ";
}
Will give me 1, 1, 1, 2, 1, 1, 1, 1, 14, - there are definitely too many ones, right? I usually get between zero and 4 values that differ (the rest are all the same; it can also be 11, 11, 11, 4, 11 ...)
Am I doing something wrong? Or is rand() not as random as I thought it was?
(Or is it just some habit from C#/Java that I'm not aware of? It happens a lot to me, too...)
If I run that code a couple of times, I get different output. Sure, not as varied as I'd like, but seemingly not deterministic (although of course it is, since rand() only gives pseudo-random numbers...).
However, the way you treat your numbers isn't going to give you a uniform distribution over [1,20], which I guess is what you expect. To achieve that is rather more complicated, but in no way impossible. For an example, take a look at the documentation for <random> at cplusplus.com - at the bottom there's a showcase program that generates a uniform distribution over [0,1). To get that to [1,20), you simply change the input parameters to the generator - it can give you a uniform distribution over any range you like.
I did a quick test, and called rand() one million times. As you can see in the output below, even at very large sample sizes, there are some nonuniformities in the distribution. As the number of samples goes to infinity, the line will (probably) flatten out, but using something like rand() % 20 + 1 gives you a distribution that takes a very long time to do so. If you use something else (like the example above) your chances of achieving a uniform distribution are better, even for quite small sample sizes.
Edit:
I see several others posting about using srand() to seed the random number generator before using it. This is good advice, but it won't solve your problem in this case. I repeat: seeding is not the problem in this case.
Seeds are mainly used to control the reproducibility of the output of your program. If you seed your random number with a constant value (e.g. 0), the program will give the same output every time, which is useful for testing that everything works the way it should. By seeding with something non-constant (the current time is a popular choice) you ensure that the results vary between different runs of the program.
Not calling srand() at all is the same as calling srand(1), by the C++ standard. Thus, you'll get the same results every time you run the program, but you'll have a perfectly valid series of pseudo-random numbers within each run.
Sounds like you're hitting modulo bias.
Scaling your random numbers to a range by using % is not a good idea. It's just about passable if you're reducing it to a range that is a power of 2, but still pretty poor. The result is determined by the low-order bits, which are frequently less random with many algorithms (and rand() in particular), and it contracts to the smaller range in a non-uniform fashion because the range you're reducing to will not equally divide the range of your random number generator. To reduce the range you should be using a division and loop, like so:
// generate a number from 0 to range-1
int divisor = RAND_MAX/(range+1);
int result;
do
{
result = rand()/divisor;
} while (result >= range);
This is not as inefficient as it looks because the loop is nearly always passed through only once. Also if you're ever going to use your generator for numbers that approach RAND_MAX you'll need a more complex equation for divisor which I can't remember off-hand.
Also, rand() is a very poor random number generator, consider using something like a Mersenne Twister if you care about the quality of your results.
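To put a number on the non-uniform contraction described above, the following sketch counts how many generator outputs land on each residue. The function name residue_counts is hypothetical, and the figures below assume RAND_MAX is 32767, the smallest value the standard permits; other implementations use larger values.

```cpp
#include <vector>

// Counts how many generator outputs in [0, max_output] map to each residue
// under `% range`. A perfectly uniform mapping would give identical counts.
std::vector<long> residue_counts(long max_output, int range) {
    std::vector<long> counts(range, 0);
    for (long v = 0; v <= max_output; ++v)
        ++counts[v % range];
    return counts;
}
```

With residue_counts(32767, 20), residues 0 through 7 each receive 1639 of the 32768 possible outputs and residues 8 through 19 receive 1638, because 32768 = 20 * 1638 + 8.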
You need to call srand() first and pass it the current time as its argument, so that you get different pseudorandom values on each run.
Example:
#include <iostream>
#include <string>
#include <vector>
#include <stdlib.h>
#include <time.h>
using namespace std;
int main()
{
srand(time(0));
int x,i;
for (i=0; i<10; i++) {
x = rand() % 20 + 1;
cout << x << ", ";
}
system("pause");
return 0;
}
If you don't want any of the generated numbers to repeat and memory isn't a concern you can use a vector of ints, shuffle it randomly and then get the values of the first N ints.
Example:
#include <iostream>
#include <vector>
#include <algorithm>
#include <random>
#include <cstdlib>
using namespace std;
int main()
{
//Get 5 random numbers between 1 and 20
vector<int> v;
for(int i=1; i<=20; i++)
v.push_back(i);
mt19937 gen(random_device{}()); // random_shuffle was removed in C++17
shuffle(v.begin(),v.end(),gen);
for(int i=0; i<5; i++)
cout << v[i] << endl;
system("pause");
return 0;
}
The likely problems are that you are using the same "random" numbers each time and that any int mod 1 is zero. In other words (myInt % 1 == 0) is always true. Instead of %1, use % theBiggestNumberDesired.
Also, seed your random numbers with srand. Use a constant seed to verify that you are getting good results. Then change the seed to make sure you are still getting good results. Then use a more random seed like the clock to test further. Release with the random seed.

unordered_map maxes out unique keys only with 16 character strings

This code generates a random 16-character string using only A,C,T,G. It then checks whether this sequence is in the hash (unordered_map), and if not, inserts it and points to a dummy placeholder.
In its current form, it hangs at datact=16384 when the 'for i loop' requires 20000 iterations, despite the fact that there are 4^16 strings with ACTG.
But.. if the string length is changed to 8, 9, 10, 11.. to 15, or 17, 18.. it correctly iterates to 20000. Why does unordered_map refuse to hash new sequences, but only when those sequences are 16 characters long?
#include <string>
#include <vector>
#include <unordered_map>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <iostream>
using namespace std;
int main(int argc, char* argv[])
{
string funnelstring;
srand ( time(NULL) );
const int buffersize=10000;
int currentsize=buffersize;
int datact=0;
vector <unsigned int> ctarr(buffersize);
vector <char> nuc(4);
nuc[0]='A';
nuc[1]='C';
nuc[2]='T';
nuc[3]='G';
unordered_map <string,unsigned int*> location;
unsigned int sct;
sct=1;
for (int i=0;i<20000; i++)
{
do
{
funnelstring="";
for (int i=0; i<16; i++)
{ // generate random 16 nucleotide sequence
funnelstring+=nuc[(rand() % 4)];
}
} while (location.find(funnelstring) != location.end()); //asks whether this key has been assigned
ctarr[datact]=sct;
location[funnelstring]=&ctarr[datact]; //assign current key to point to data count
datact++;
cout << datact << endl;
if (datact>=currentsize)
{
ctarr.resize(currentsize+buffersize);
currentsize+=buffersize;
}
}
return 0;
}
As @us2012 said, the problem is your PRNG, and the poor randomness in the lower order bits. Here's a relevant quote:
In Numerical Recipes in C: The Art of Scientific Computing (William H. Press, Brian P. Flannery, Saul A. Teukolsky, William T. Vetterling; New York: Cambridge University Press, 1992 (2nd ed., p. 277)), the following comments are made:
"If you want to generate a random integer between 1 and 10, you should always do it by using high-order bits, as in
j = 1 + (int) (10.0 * (rand() / (RAND_MAX + 1.0)));
and never by anything resembling
j = 1 + (rand() % 10);
(which uses lower-order bits)."
Also, as others have pointed out, you can also use a better, more modern RNG.
The culprit is very likely your random number generator, i.e. the sequence of random numbers from the PRNG became periodic (mod 4) too quickly (most random number generators really produce pseudo-random numbers, hence the name PRNG). Therefore, your do...while loop never quits as it is unable to find a new nucleotide sequence with the random numbers provided.
Two fixes I can think of:
Instead of generating random numbers mod 4, generate them mod 4^length and extract the bit pairs, 00 -> A, 01 -> G, ...
Use a better PRNG, like std::mersenne_twister_engine.
(Disclaimer: I'm not an expert on random numbers. Don't rely on this advice for mission-critical systems, cryptographic requirements, etc.)
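A sketch of the second fix above, using std::mt19937 (a predefined std::mersenne_twister_engine) in place of rand(). The function names random_sequence and unique_sequences are hypothetical, and a std::unordered_set stands in for the question's unordered_map since only the keys matter here.

```cpp
#include <random>
#include <string>
#include <unordered_set>

// Builds one random sequence of `length` nucleotides.
std::string random_sequence(int length, std::mt19937& gen) {
    static const char nuc[] = {'A', 'C', 'T', 'G'};
    std::uniform_int_distribution<int> pick(0, 3);
    std::string s;
    s.reserve(length);
    for (int i = 0; i < length; ++i) s += nuc[pick(gen)];
    return s;
}

// Collects `count` distinct sequences; inserting a duplicate key leaves
// the set unchanged, so the loop simply tries again.
std::unordered_set<std::string> unique_sequences(int count, int length,
                                                 std::mt19937& gen) {
    std::unordered_set<std::string> seen;
    while (static_cast<int>(seen.size()) < count)
        seen.insert(random_sequence(length, gen));
    return seen;
}
```

With an engine seeded from std::random_device, unique_sequences(20000, 16, gen) should fill the set without stalling, because the generator does not cycle after a few thousand draws.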