How to improve this random number generator code in c++? - c++

I am C++ student and I am working on creating a random number generator.
Infact I should say my algorithm selects a number within a defined range.
I am writing this just because of my curiosity.
I am not challenging existing library functions.
I always use library functions when writing applications based on randomness but I am again stating that I just want to make it because of my curiosity.
I would also like to know if there is something wrong with my algorithm or with my approach.
Because i googled how PRNGs work and on some sites they said that a mathematical algorithm is there and a predefined series of numbers and a seed just sets the pointer in a different point in the series and after some intervals the sequence repeats itself.
My algorithm just starts moving to and fro in the array of possible values and the seed breaks the loop with different values each time. I don't i this approach is wrong. I got answers suggesting a different algorithm but they didn't explain What's wrong with my current algorithm?
Yes,there was a problem with my seed as it was not precise and made results little predictable as here:-
cout<
<
rn(50,100);
The results in running four times are 74,93,56,79.
See the pattern of "increasing order".
And for large ranges patterns could be seen easily.I got an answer on getting good seeds but that too recommended a new algorithm(but didn't say why?).
An alternative way could be to shuffle my array randomly generating a new sequence every time.And the pattern of increasing order will go off.Any help with that rearranging too will also be good.Here is the code below.And if my function is not possible please notify me.
Thanking you in anticipation.
int rn(int lowerlt, int upperlt)
{
/* Over short ranges, results are satisfactory.
* I want to make it effective for big ranges.
*/
const int size = upperlt - lowerlt; // Constant size of the integer array.
int ar[size]; // Array to store all possible values within defined range.
int i, x, ret; // Variables to control loops and return value.
long pointer = 0; //pointer variable. The one which breaks the main loop.
// Loop to initialize the array with possible values..
for (i=0, x=lowerlt; x <= upperlt; i++, x++)
ar[i]=x;
long seed = time(0);
//Main loop . To find the random number.
for (i=0; pointer <= seed; i++, pointer++)
{
ret = ar[i];
if (i == size-1)
{
// Reverse loop.
for (; i >= 0; i--)
{
ret=ar[i];
}
}
}
return ret;
}

Caveat: From your post, aside from your random generator algorithm, one of your problems is getting a good seed value, so I'll address that part of it.
You could use /dev/random to get a seed value. That would be a great place to start [and would be sufficient on its own], but might be considered "cheating" from some perspective.
So, here are some other sources of "entropy":
Use a higher resolution time of day clock source: gettimeofday or clock_gettime(CLOCK_REALTIME,...) call it "cur_time". Use only the microsecond or nanosecond portion respectively, call it "cur_nano". Note that cur_nano is usually pretty random all by itself.
Do a getpid(2). This has a few unpredictable bits because between invocations other programs are starting and we don't know how many.
Create a new temp file and get the file's inode number [then delete it]. This varies slowly over time. It may be the same on each invocation [or not]
Get the high resolution value for the system's time of day clock when the system was booted, call it "sysboot".
Get the high resolution value for the start time of your "session": When your program's parent shell was started, call it "shell_start".
If you were using Linux, you could compute a checksum of /proc/interrupts as that's always changing. For other systems, get some hash of the number of interrupts of various types [should be available from some type of syscall].
Now, create some hash of all of the above (e.g.):
dev_random * cur_nano * (cur_time - sysboot) * (cur_time - shell_start) *
getpid * inode_number * interrupt_count
That's a simple equation. You could enhance it with some XOR and/or sum operations. Experiment until you get one that works for you.
Note: This only gives you the seed value for your PRNG. You'll have to create your PRNG from something else (e.g. earl's linear algorithm)

unsigned int Random::next() {
s = (1664525 * s + 1013904223);
return s;
}
's' is growing with every call of that function.
Correct is
unsigned int Random::next() {
s = (1664525 * s + 1013904223) % xxxxxx;
return s;
}
Maybe use this function
long long Factor = 279470273LL, Divisor = 4294967291LL;
long long seed;
next()
{
seed = (seed * Factor) % Divisor;
}

Related

Wrong outcome while summing big arrays

I keep getting wrong outcome, while I try to sum big arrays. I have isolated problem to the following code sample (its not an sum of a big array, but I believe this is The problem)
Compilable Sample:
template <typename T>
void cpu_sum(const unsigned int size, T & od_value) {
od_value = 0;
for (unsigned int i = 0; i < size; i++) {
od_value += 1;
}
}
int main() {
typedef float Data;
const unsigned int size = 800000000;
Data sum;
cpu_sum(size, sum);
cout << setprecision(35) << sum << endl; // prints: 16777216 // ERROR !!!
getchar();
}
Environment:
OS: Windows 8.1 x64 home
IDE: Microsoft Visual Studio 2015
Error Description:
While my outcome should obviously be sum == 800000000 I keep getting sum == 16777216.
That is very weird for me, because float max value is far above this one, and yet it looks like sum variable reach its limit.
What did I miss??
It is a well known problem. Gradually your sum becomes so big that next summand becomes comparable with an epsilon (about 10^-14) of it. At that moment you start loosing precision.
Standard solution is to change summation tactics: when array larger than, say, 100 elements, split it in halves and sum each half separately. It goes on recursively and tend to keep precision much better.
As pointed out by Michael Simbursky, this is a well-known problem when trying to sum a large number of limited-precision numbers. The solution he provides works well, and is quite efficient.
For the curious reader, a slower method is to sort your array before computing the sum, assuring that the values are added in order of increasing absolute value. This ensures that the least-significant values make their contributions to the overall sum before being overwhelmed by the other values. This technique is used more as an example/illustration than in any serious programming ventures.
For serious scientific programming where the overall size of the data can vary a great deal, another algorithm to be aware of is the Kahan Summation Algorithm, and the referenced wikipedia page provides a nice description of it.

Program works only if I keep displaying something until solution is found

I have made a program that generates a random array of numbers. I am aware that if I make a for structure to place random numbers, there might be a chance that I will have values that will repeat. For that I made a separate recursive function that keeps looking for duplicates and replace them with other values until there will be only distinct numbers:
#include <iostream>
#include <stdlib.h>
#include <time.h>
using namespace std;
int n, m, k, row[100];
int verif(int v[100], int k)
{
for(int i=0; i<k-1; i++)
for(int j=i+1; j<k; j++)
if(v[i]==v[j])
{
cout<<".";
srand(time(NULL));
v[i]=rand()%100+1;
return verif(v, k);
}
return 1;
}
int main()
{
k=10, n=10;
srand(time(NULL));
row[0]=rand()%n+1;
srand(row[0]);
for(int i=1; i<k; i++)
{
srand(row[i-1]);
row[i]=rand()%n+1;
}
verif(row, k);
for(int i=0; i<k; i++)
cout<<row[i]<<" ";
return 0;
}
I hope you can explain me why does a simple cout inside verif makes my program to work and why without it nothing works.
This is a result of calling a function recursively for an extended period of time, likely triggering a stack overflow on your machine.
This expression
srand(time(NULL));
seeds the random number generator for rand() to the value of the present number of seconds since the epoch. Naturally this number changes only once per second. Reseeding the rand generator will get you the same sequence as last time, so the result of rand()%100 + 1 on the following line will remain the same until the result of time(NULL) changes.
Now that you've ensured that v[i]==v[j] will be true until at least a second has passed since the initial assignment, you recursively call this function again, (and again (and again...)) until you get a new value from your "random" number generating procedure (at least a second has passed) or you run out of space on the stack. (you crash)
You can clearly see this happening much more clearly by using a function which holds up execution for longer than is taken by the cout statement, for instance
std::this_thread::sleep_for(std::chrono::milliseconds{500});
You can see such an effect here on coliru
With the sleeping function (which holds the execution up for 500 milliseconds every recursion) you get ~13 recursions.
Without it you get ~2846000 recursions. That's quite a difference.
There are better ways to get a set of non-repeating random numbers in a range, as shown in the answers to this question. I'm partial to Darklighter's answer myself. For that answer if you want, say, 10 elements out of 100, generate the vector with 100 elements, shuffle and then resize it to include only the first 10.
The cout solves the problem because it takes time to print something on the console. All the rest of the code takes roughly 0 time, thus each time you do
srand(time(NULL));
v[i]=rand()%100+1;
You will get exactly the same number, as you always use the same seed. Then I am not 100% sure what happens, it might be a stackoverflow because you keep calling the function recursively forever, or it might be something else. Actually it doesnt really matter, because the real problem is your seed.
Only with the cout you will eventually get a differnt seed and also a different random number.
You should seed the rng only once. And btw if you want random numbers you should use the rngs from <random> and not rand (actually it should be deprecated because of its low quality).

The fastest way to retrieve 16k Key-Value pairs?

OK, here's my situation :
I have a function - let's say U64 calc (U64 x) - which takes a 64-bit integer parameter, performs some CPU-intensive operation, and returns a 64-bit value
Now, given that I know ALL possible inputs (the xs) of that function beforehand (there are some 16000 though), I thought it might be better to pre-calculate them and then fetch them on demand (from an array-like structure).
The ideal situation would be to store them all in some array U64 CALC[] and retrieve them by index (the x again)
And here's the issue : I may know what the possible inputs for my calc function are, but they are most definitely NOT consecutive (e.g. not from 1 to 16000, but values that may go as low as 0 and as high as some trillions - always withing a 64-bit range)
E.G.
X CALC[X]
-----------------------
123123 123123123
12312 12312312
897523 986123
etc.
And here comes my question :
How would you store them?
What workaround would you prefer?
Now, given that these values (from CALC) will have to be accessed some thousands-to-millions of times, per sec, which would be the best solution performance-wise?
Note : I'm no mentioning anything I've thought of or tried so as not to turn the answers into some debate of A vs B type-of-thing, and mostly not influence anyone...
I would use some sort of hash function that creates an index to a u64 pair where one is the value the key was created from and the other the replacement value. Technically the index could be three bytes long (assuming 16 million -"16000 thousand" - pairs) if you need to conserve space but I'd use u32s. If the stored value does not match the value computed on (hash collision) you'd enter an overflow handler.
You need to determine a custom hashing algorithm to fit your data
Since you know the size of the data you don't need algorithms that allow the data to grow.
I'd be wary of using some standard algorithm because they seldom fit specific data
I'd be wary of using a C++ method unless you are sure the code is WYSIWYG (doesn't generate a lot of invisible calls)
Your index should be 25% larger than the number of pairs
Run through all possible inputs and determine min, max, average and standard deviation for the number of collisions and use these to determine the acceptable performance level. Then profile to achieve the best possible code.
The required memory space (using a u32 index) comes out to (4*1.25)+8+8 = 21 bytes per pair or 336 MeB, no problem on a typical PC.
________ EDIT________
I have been challenged by "RocketRoy" to put my money where my mouth is. Here goes:
The problem has to do with collision handling in a (fixed size) hash index. To set the stage:
I have a list of n entries where a field in the entry contains the value v that the hash is computed from
I have a vector of n*1.25 (approximately) indeces such that the number of indeces x is a prime number
A prime number y is computed which is a fraction of x
The vector is initialized to say -1 to denote unoccupied
Pretty standard stuff I think you'll agree.
The entries in the list are processed and the hash value h computed and modulo'd and used as an index into the vector and the index to the entry is placed there.
Eventually I encounter the situation where the vector entry pointed to by the index is occupied (doesn't contain -1) - voilĂ , a collision.
So what do I do? I keep the original h as ho, add y to h and take modulo x and get a new index into the vector. If the entry is unoccupied I use it, otherwise I continue with add y modulo x until I reach ho. In theory, this will happen because both x and y are prime numbers. In practice x is larger than n so it won't.
So the "re-hash" that RocketRoy claims is very costly is no such thing.
The tricky part with this method - as with all hashing methods - is to:
Determine a suitable value for x (becomes less tricky the larger x finally used)
Determine the algorithm a for h=a(v)%x such that a the h's index reasonably evenly ("randomly") into the index vector with as few collisions as possible
Determine a suitable value for y such that collisions index reasonably evenly ("randomly") into the index vector
________ EDIT________
I'm sorry I've taken so long to produce this code ... other things have had higher priorities.
Anyway, here is the code which proves that hashing has better prospects for quick lookups than a binary tree. It runs through a bunch of hashing index sizes and algorithms to aid in finding the most suitable combo for the specific data. For every algorithm the code will print the first index size such that no lookup takes longer than fourteen searches (worst case for binary searching) and an average lookup takes less than 1.5 searches.
I have a fondness for prime numbers in these types of applications, in case you haven't noticed.
There are many ways of creating a hashing algorithm with its mandatory overflow handling. I opted for simplicity assuming it will translate into speed ... and it does. On my laptop with an i5 M 480 # 2.67 GHz an average lookup requires between 55 and 60 clock cycles (comes out to around 45 million lookups per second). I implemented a special get operation with a constant number of indeces and ditto rehash value and the cycle count dropped to 40 (65 million lookups per second). If you look at the line calling getOpSpec the index i is xor'ed with 0x444 to exercise the caches to achieve more "real world"-like results.
I must again point out that the program suggests suitable combinations for the specific data. Other data may require a different combo.
The source code contains both the code for generating the 16000 unsigned long long pairs and for testing different constants (index sizes and rehash values):
#include <windows.h>
#define i8 signed char
#define i16 short
#define i32 long
#define i64 long long
#define id i64
#define u8 char
#define u16 unsigned short
#define u32 unsigned long
#define u64 unsigned long long
#define ud u64
#include <string.h>
#include <stdio.h>
u64 prime_find_next (const u64 value);
u64 prime_find_previous (const u64 value);
static inline volatile unsigned long long rdtsc_to_rax (void)
{
unsigned long long lower,upper;
asm volatile( "rdtsc\n"
: "=a"(lower), "=d"(upper));
return lower|(upper<<32);
}
static u16 index[65536];
static u64 nindeces,rehshFactor;
static struct PAIRS {u64 oval,rval;} pairs[16000] = {
#include "pairs.h"
};
struct HASH_STATS
{
u64 ninvocs,nrhshs,nworst;
} getOpStats,putOpStats;
i8 putOp (u16 index[], const struct PAIRS data[], const u64 oval, const u64 ci)
{
u64 nworst=1,ho,h,i;
i8 success=1;
++putOpStats.ninvocs;
ho=oval%nindeces;
h=ho;
do
{
i=index[h];
if (i==0xffff) /* unused position */
{
index[h]=(u16)ci;
goto added;
}
if (oval==data[i].oval) goto duplicate;
++putOpStats.nrhshs;
++nworst;
h+=rehshFactor;
if (h>=nindeces) h-=nindeces;
} while (h!=ho);
exhausted: /* should not happen */
duplicate:
success=0;
added:
if (nworst>putOpStats.nworst) putOpStats.nworst=nworst;
return success;
}
i8 getOp (u16 index[], const struct PAIRS data[], const u64 oval, u64 *rval)
{
u64 ho,h,i;
i8 success=1;
ho=oval%nindeces;
h=ho;
do
{
i=index[h];
if (i==0xffffu) goto not_found; /* unused position */
if (oval==data[i].oval)
{
*rval=data[i].rval; /* fetch the replacement value */
goto found;
}
h+=rehshFactor;
if (h>=nindeces) h-=nindeces;
} while (h!=ho);
exhausted:
not_found: /* should not happen */
success=0;
found:
return success;
}
volatile i8 stop = 0;
int main (int argc, char *argv[])
{
u64 i,rval,mulup,divdown,start;
double ave;
SetThreadAffinityMask (GetCurrentThread(), 0x00000004ull);
divdown=5; //5
while (divdown<=100)
{
mulup=3; // 3
while (mulup<divdown)
{
nindeces=16000;
while (nindeces<65500)
{
nindeces= prime_find_next (nindeces);
rehshFactor=nindeces*mulup/divdown;
rehshFactor=prime_find_previous (rehshFactor);
memset (index, 0xff, sizeof(index));
memset (&putOpStats, 0, sizeof(struct HASH_STATS));
i=0;
while (i<16000)
{
if (!putOp (index, pairs, pairs[i].oval, (u16) i)) stop=1;
++i;
}
ave=(double)(putOpStats.ninvocs+putOpStats.nrhshs)/(double)putOpStats.ninvocs;
if (ave<1.5 && putOpStats.nworst<15)
{
start=rdtsc_to_rax ();
i=0;
while (i<16000)
{
if (!getOp (index, pairs, pairs[i^0x0444]. oval, &rval)) stop=1;
++i;
}
start=rdtsc_to_rax ()-start+8000; /* 8000 is half of 16000 (pairs), for rounding */
printf ("%u;%u;%u;%u;%1.3f;%u;%u\n", (u32)mulup, (u32)divdown, (u32)nindeces, (u32)rehshFactor, ave, (u32) putOpStats.nworst, (u32) (start/16000ull));
goto found;
}
nindeces+=2;
}
printf ("%u;%u\n", (u32)mulup, (u32)divdown);
found:
mulup=prime_find_next (mulup);
}
divdown=prime_find_next (divdown);
}
SetThreadAffinityMask (GetCurrentThread(), 0x0000000fu);
return 0;
}
It was not possible to include the generated pairs file (an answer is apparently limited to 30000 characters). But send a message to my inbox and I'll mail it.
And these are the results:
3;5;35569;21323;1.390;14;73
3;7;33577;14389;1.435;14;60
5;7;32069;22901;1.474;14;61
3;11;35107;9551;1.412;14;59
5;11;33967;15427;1.446;14;61
7;11;34583;22003;1.422;14;59
3;13;34253;7901;1.439;14;61
5;13;34039;13063;1.443;14;60
7;13;32801;17659;1.456;14;60
11;13;33791;28591;1.436;14;59
3;17;34337;6053;1.413;14;59
5;17;32341;9511;1.470;14;61
7;17;32507;13381;1.474;14;62
11;17;33301;21529;1.454;14;60
13;17;34981;26737;1.403;13;59
3;19;33791;5333;1.437;14;60
5;19;35149;9241;1.403;14;59
7;19;33377;12289;1.439;14;97
11;19;34337;19867;1.417;14;59
13;19;34403;23537;1.430;14;61
17;19;33923;30347;1.467;14;61
3;23;33857;4409;1.425;14;60
5;23;34729;7547;1.429;14;60
7;23;32801;9973;1.456;14;61
11;23;33911;16127;1.445;14;60
13;23;33637;19009;1.435;13;60
17;23;34439;25453;1.426;13;60
19;23;33329;27529;1.468;14;62
3;29;32939;3391;1.474;14;62
5;29;34543;5953;1.437;13;60
7;29;34259;8263;1.414;13;59
11;29;34367;13033;1.409;14;60
13;29;33049;14813;1.444;14;60
17;29;34511;20219;1.422;14;60
19;29;33893;22193;1.445;13;61
23;29;34693;27509;1.412;13;92
3;31;34019;3271;1.441;14;60
5;31;33923;5449;1.460;14;61
7;31;33049;7459;1.442;14;60
11;31;35897;12721;1.389;14;59
13;31;35393;14831;1.397;14;59
17;31;33773;18517;1.425;14;60
19;31;33997;20809;1.442;14;60
23;31;34841;25847;1.417;14;59
29;31;33857;31667;1.426;14;60
3;37;32569;2633;1.476;14;61
5;37;34729;4691;1.419;14;59
7;37;34141;6451;1.439;14;60
11;37;34549;10267;1.410;13;60
13;37;35117;12329;1.423;14;60
17;37;34631;15907;1.429;14;63
19;37;34253;17581;1.435;14;60
23;37;32909;20443;1.453;14;61
29;37;33403;26177;1.445;14;60
31;37;34361;28771;1.413;14;59
3;41;34297;2503;1.424;14;60
5;41;33587;4093;1.430;14;60
7;41;34583;5903;1.404;13;59
11;41;32687;8761;1.440;14;60
13;41;34457;10909;1.439;14;60
17;41;34337;14221;1.425;14;59
19;41;32843;15217;1.476;14;62
23;41;35339;19819;1.423;14;59
29;41;34273;24239;1.436;14;60
31;41;34703;26237;1.414;14;60
37;41;33343;30089;1.456;14;61
3;43;34807;2423;1.417;14;59
5;43;35527;4129;1.413;14;60
7;43;33287;5417;1.467;14;61
11;43;33863;8647;1.436;14;60
13;43;34499;10427;1.418;14;78
17;43;34549;13649;1.431;14;60
19;43;33749;14897;1.429;13;60
23;43;34361;18371;1.409;14;59
29;43;33149;22349;1.452;14;61
31;43;34457;24821;1.428;14;60
37;43;32377;27851;1.482;14;81
41;43;33623;32057;1.424;13;59
3;47;33757;2153;1.459;14;61
5;47;33353;3547;1.445;14;61
7;47;34687;5153;1.414;13;59
11;47;34519;8069;1.417;14;60
13;47;34549;9551;1.412;13;59
17;47;33613;12149;1.461;14;61
19;47;33863;13687;1.443;14;60
23;47;35393;17317;1.402;14;59
29;47;34747;21433;1.432;13;60
31;47;34871;22993;1.409;14;59
37;47;34729;27337;1.425;14;59
41;47;33773;29453;1.438;14;60
43;47;31253;28591;1.487;14;62
3;53;33623;1901;1.430;14;59
5;53;34469;3229;1.430;13;60
7;53;34883;4603;1.408;14;59
11;53;34511;7159;1.412;13;59
13;53;32587;7963;1.453;14;60
17;53;34297;10993;1.432;13;80
19;53;33599;12043;1.443;14;64
23;53;34337;14897;1.415;14;59
29;53;34877;19081;1.424;14;61
31;53;34913;20411;1.406;13;59
37;53;34429;24029;1.417;13;60
41;53;34499;26683;1.418;14;59
43;53;32261;26171;1.488;14;62
47;53;34253;30367;1.437;14;79
3;59;33503;1699;1.432;14;61
5;59;34781;2939;1.424;14;60
7;59;35531;4211;1.403;14;59
11;59;34487;6427;1.420;14;59
13;59;33563;7393;1.453;14;61
17;59;34019;9791;1.440;14;60
19;59;33967;10937;1.447;14;60
23;59;33637;13109;1.438;14;60
29;59;34487;16943;1.424;14;59
31;59;32687;17167;1.480;14;61
37;59;35353;22159;1.404;14;59
41;59;34499;23971;1.431;14;60
43;59;34039;24799;1.445;14;60
47;59;32027;25471;1.499;14;62
53;59;34019;30557;1.449;14;61
3;61;35059;1723;1.418;14;60
5;61;34351;2803;1.416;13;60
7;61;35099;4021;1.412;14;59
11;61;34019;6133;1.442;14;60
13;61;35023;7459;1.406;14;88
17;61;35201;9803;1.414;14;61
19;61;34679;10799;1.425;14;101
23;61;34039;12829;1.441;13;60
29;61;33871;16097;1.446;14;60
31;61;34147;17351;1.427;14;61
37;61;34583;20963;1.412;14;59
41;61;32999;22171;1.452;14;62
43;61;33857;23857;1.431;14;98
47;61;34897;26881;1.431;14;60
53;61;33647;29231;1.434;14;60
59;61;32999;31907;1.454;14;60
3;67;32999;1471;1.455;14;61
5;67;35171;2621;1.403;14;59
7;67;33851;3533;1.463;14;61
11;67;34607;5669;1.437;14;60
13;67;35081;6803;1.416;14;61
17;67;33941;8609;1.417;14;60
19;67;34673;9829;1.427;14;60
23;67;35099;12043;1.415;14;60
29;67;33679;14563;1.452;14;61
31;67;34283;15859;1.437;14;60
37;67;32917;18169;1.460;13;61
41;67;33461;20443;1.441;14;61
43;67;34313;22013;1.426;14;60
47;67;33347;23371;1.452;14;61
53;67;33773;26713;1.434;14;60
59;67;35911;31607;1.395;14;58
61;67;34157;31091;1.431;14;63
3;71;34483;1453;1.423;14;59
5;71;34537;2423;1.428;14;59
7;71;33637;3313;1.428;13;60
11;71;32507;5023;1.465;14;79
13;71;35753;6529;1.403;14;59
17;71;33347;7963;1.444;14;61
19;71;35141;9397;1.410;14;59
23;71;32621;10559;1.475;14;61
29;71;33637;13729;1.429;14;60
31;71;33599;14657;1.443;14;60
37;71;34361;17903;1.396;14;59
41;71;33757;19489;1.435;14;61
43;71;34583;20939;1.413;14;59
47;71;34589;22877;1.441;14;60
53;71;35353;26387;1.418;14;59
59;71;35323;29347;1.406;14;59
61;71;35597;30577;1.401;14;59
67;71;34537;32587;1.425;14;59
3;73;34613;1409;1.418;14;59
5;73;32969;2251;1.453;14;62
7;73;33049;3167;1.448;14;61
11;73;33863;5101;1.435;14;60
13;73;34439;6131;1.456;14;60
17;73;33629;7829;1.455;14;61
19;73;34739;9029;1.421;14;60
23;73;33071;10399;1.469;14;61
29;73;33359;13249;1.460;14;61
31;73;33767;14327;1.422;14;59
37;73;32939;16693;1.490;14;62
41;73;33739;18947;1.438;14;60
43;73;33937;19979;1.432;14;61
47;73;33767;21739;1.422;14;59
53;73;33359;24203;1.435;14;60
59;73;34361;27767;1.401;13;59
61;73;33827;28229;1.443;14;60
67;73;34421;31583;1.423;14;71
71;73;33053;32143;1.447;14;60
3;79;35027;1327;1.410;14;60
5;79;34283;2161;1.432;14;60
7;79;34439;3049;1.432;14;60
11;79;34679;4817;1.416;14;59
13;79;34667;5701;1.405;14;59
17;79;33637;7237;1.428;14;60
19;79;34469;8287;1.417;14;60
23;79;34439;10009;1.433;14;60
29;79;33427;12269;1.448;13;61
31;79;33893;13297;1.445;14;61
37;79;33863;15823;1.439;14;60
41;79;32983;17107;1.450;14;60
43;79;34613;18803;1.431;14;60
47;79;33457;19891;1.457;14;61
53;79;33961;22777;1.435;14;61
59;79;32983;24631;1.465;14;60
61;79;34337;26501;1.428;14;60
67;79;33547;28447;1.458;14;61
71;79;32653;29339;1.473;14;61
73;79;34679;32029;1.429;14;64
3;83;35407;1277;1.405;14;59
5;83;32797;1973;1.451;14;60
7;83;33049;2777;1.443;14;61
11;83;33889;4483;1.431;14;60
13;83;35159;5503;1.409;14;59
17;83;34949;7151;1.412;14;59
19;83;32957;7541;1.467;14;61
23;83;32569;9013;1.470;14;61
29;83;33287;11621;1.474;14;61
31;83;33911;12659;1.448;13;60
37;83;33487;14923;1.456;14;62
41;83;33587;16573;1.438;13;60
43;83;34019;17623;1.435;14;60
47;83;31769;17987;1.483;14;62
53;83;33049;21101;1.451;14;61
59;83;32369;23003;1.465;14;61
61;83;32653;23993;1.469;14;61
67;83;33599;27109;1.437;14;61
71;83;33713;28837;1.452;14;61
73;83;33703;29641;1.454;14;61
79;83;34583;32911;1.417;14;59
3;89;34147;1129;1.415;13;60
5;89;32797;1831;1.461;14;61
7;89;33679;2647;1.443;14;73
11;89;34543;4261;1.427;13;60
13;89;34603;5051;1.419;14;60
17;89;34061;6491;1.444;14;60
19;89;34457;7351;1.422;14;79
23;89;33529;8663;1.450;14;61
29;89;34283;11161;1.431;14;60
31;89;35027;12197;1.411;13;59
37;89;34259;14221;1.403;14;59
41;89;33997;15649;1.434;14;60
43;89;33911;16127;1.445;14;60
47;89;34949;18451;1.419;14;59
53;89;34367;20443;1.434;14;60
59;89;33791;22397;1.430;14;59
61;89;34961;23957;1.404;14;59
67;89;33863;25471;1.433;13;60
71;89;35149;28031;1.414;14;79
73;89;33113;27143;1.447;14;60
79;89;32909;29209;1.458;14;61
83;89;33617;31337;1.400;14;59
3;97;34211;1051;1.448;14;60
5;97;34807;1789;1.430;14;60
7;97;33547;2417;1.446;14;60
11;97;35171;3967;1.407;14;89
13;97;32479;4349;1.474;14;61
17;97;34319;6011;1.444;14;60
19;97;32381;6337;1.491;14;64
23;97;33617;7963;1.421;14;59
29;97;33767;10093;1.423;14;59
31;97;33641;10739;1.447;14;60
37;97;34589;13187;1.425;13;60
41;97;34171;14437;1.451;14;60
43;97;31973;14159;1.484;14;62
47;97;33911;16127;1.445;14;61
53;97;34031;18593;1.448;14;80
59;97;32579;19813;1.457;14;61
61;97;34421;21617;1.417;13;60
67;97;33739;23297;1.448;14;60
71;97;33739;24691;1.435;14;60
73;97;33863;25471;1.433;13;60
79;97;34381;27997;1.419;14;59
83;97;33967;29063;1.446;14;60
89;97;33521;30727;1.441;14;60
Cols 1 and 2 are used to calculate a rough relationship between the rehash value and the index size. The next two are the first index size/rehash factor combination which averages less than 1.5 searches for a lookup with a worst case of 14 searches. Then average and worst case. Finally, the last column is the average number of clock cycles per lookup. It does not take into account the time required to read the time stamp register.
The actual memory space for the best constants (# of indeces = 31253 and rehash factor = 28591) comes out to more than I initially indicated (16000*2*8 + 1,25*16000*2 => 296000 bytes). The actual size is 16000*2*8+31253*2 => 318506.
The fastest combination is an approximate ratio of 11/31 with an index size of 35897 and rehash value of 12721. This will average 1.389 (1 initial hash + 0.389 rehashes) with a maximum of 14 (1+13).
________ EDIT________
I removed the "goto found;" in main () to show all combinations and it shows that much better performance is possible, of course at the expense of a larger index size. For example the combination 57667 and 33797 yields and average of 1.192 and a maximum rehash of 6. The combination 44543 and 23399 yields a 1.249 average and 10 maximum rehashes (it saves (57667-44543)*2=26468 bytes of index table compared to 57667/33797).
Specialized functions with hard-coded hash index size and rehash factor will execute in 60-70% of the time compared to variables. This is probably due to the compiler (gcc 64-bit) substituting the modulo with multiplications and not having to fetch the values from memory locations as they will be coded as immediate values.
________ EDIT________
On the subject of caches I see two issues.
The first is data cacheing which I don't think will be possible because the lookup will just be a small step in some larger process and you run the risk of the table data's cache lines begin invalidated to a lesser or (probably) greater degree - if not entirely - by other data accesses in other steps of the larger process. I e the more code executed and data accessed in the process as a whole the less likely it will be that any pertinent lookup data will remain in the caches (this may or may not be pertinent to the OP's situation). To find an entry using (my) hashing you will encounter two cache misses (one to load the correct part of the index, and the other to load the area containg the entry itself) for every comparison that needs to be performed. Finding an entry on the first try will have cost two misses, the second try four etc. In my example the 60 clock cycle average cost per lookup implies that the table probably resided entirely in the L2 cache and with L1 not having to go there in a majority of the cases. My x86-64 CPU has L1-3, RAM wait states of approximately 4, 10, 40 and 100 which to me shows that RAM was completely kept out and L3 mostly.
The second is code cacheing which will have a more significant impact if it is small, tight, in-lined and with few control transfers (jumps and calls). My hash routine probably resides entirely in the L1 code cache. For more normal cases, the fewer the number of code cache line loads the faster it will be.
Make an array of structures of key val pairs.
Sort the array by key, put this in your program as static array, would only be 128kbyte.
Then in your program a simple binary look up by key will need on average only 14 key comparisons to find the right value. Should be able to approach speeds of 300 million look ups per second on modern pc.
You can sort with qsort and search with bsearch, both std lib functions.
Perform memonization, or in simple terms, cache the values you've computed already and calculate the new ones. You should hash the input and check the cache for that result. You can even start off with a set of cache values that you think the function would get called more often for. Besides that, I don't think you need to go to any extreme as the other answer suggest. Do things simple and when you are done with your application you can use a profiling tool to find bottle necks.
EDIT: Some code
#include <iostream>
#include <ctime>
using namespace std;
const int MAX_SIZE = 16000;
int preCalcData[MAX_SIZE] = {};
int getPrecalculatedResult(int x){
return preCalcData[x];
}
void setupPreCalcDataCache(){
for(int i = 0; i < MAX_SIZE; ++i){
preCalcData[i] = i*i; //or whatever calculation
}
}
int main(){
setupPreCalcDataCache();
cout << getPrecalculatedResult(0) << endl;
cout << getPrecalculatedResult(15999) << endl;
return 0;
}
I wouldn't worry about performance too much. This simple example, using an array and binary search lower_bound
#include <stdint.h>
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <memory>
const int N = 16000;
typedef std::pair<uint64_t, uint64_t> CALC;
CALC calc[N];
static inline bool cmp_calcs(const CALC &c1, const CALC &c2)
{
return c1.first < c2.first;
}
int main(int argc, char **argv)
{
std::iostream::sync_with_stdio(false);
for (int i = 0; i < N; ++i)
calc[i] = std::make_pair(i, i);
std::sort(&calc[0], &calc[N], cmp_calcs);
for (long i = 0; i < 10000000; ++i) {
int r = rand() % 16000;
CALC *p = std::lower_bound(&calc[0], &calc[N], std::make_pair(r, 0), cmp_calcs);
if (p->first == r)
std::cout << "found\n";
}
return 0;
}
and compiled with
g++ -O2 example.cpp
does, including setup, 10,000,000 searches in about 2 seconds on my 5 year old PC.
You need to store 16 thousand values efficiently, preferably in memory. We are assuming that the computation of these values is more time consuming than accessing them from storage.
You have at your disposal many different data structures to get the job done, including databases. If you access these values in queriable chunks, then the DB overhead may very well be absorbed and spread accross your processing.
You mentioned map and hashmap (or hashtable) already in your question tags, but these are probably not the best possible answers for your problem, although they could do a fair job, provided that the hashing function isn't more expensive than the direct computation of the target UINT64 value, which has to be your reference benchmark.
Van Emde Boas Trees
Many variants of B-Trees (used extensively in database engines, high performance filesystems),
Tries
Are probably much better suited. Having some experience with it, I would probably go for a B-tree: they support fairly well serialization. That should let you prepare your dataset in advance in a different program. VEB trees have a very good access time (O(log log(n)), but I don't know how easily they may be serialized.
Later on, if you need even more performance, it would also be interesting to know usage patterns of your "database" to figure out what caching techniques you could implement on top of the store.
Using std::pair is better than any of map for speed.
but if I were you, I firstly use a std::list to store the data, after I got them all, I move them into a simple vector, then retrieving goes very fast if you implement a simple binary tree search by yourself.

Python's random.uniform() in C++?

I've been ripping my hair out for a while now with these random numbers in C++.
In Python, I had the awesome:
random.uniform(0, 1)
Which churned out a new random number each time I called it.
C++ has to have something like this. I Googled for a long time, and found erand48(), which I plan to implement into my raytracer (I'm translating it from Python to C++).
I tried a simple test case, but I was hoping to create a random_uniform() function which always spits out a new random number (using time() isn't going to work AFAICT, as this will be running really quickly)
unsigned short Xi[3] = {0, 1, 27};
std::cout << erand48(Xi);
And the output was (and will be every time I call the program):
0.174529
I tried using the previous output as the new Xi, like this (Xis initial value was defined):
float random_uniform() {
long generated = erand48(Xi);
int temp = generated * 1000000;
unsigned short Xi[3] = {temp - 16, temp - 7, temp - 18};
return generated;
}
But that doesn't seem like it would generate random enough numbers (and it only spits out 0. I', not sure why...).
Is there any way that I could make a function which spits out a new random number each time?
Not being familiar with python, I'm going to assume that random.uniform(0, 1) spits out a random number from a uniform distribution between 0 and 1?
If so, try the following:
srand(time(NULL));
float myRand = ((float) rand( )) / RAND_MAX;
Note that your mileage may vary though - rand() is not guaranteed to be a very high quality random number source. You may get visual artifacts depending on the implementation and how you're using it. I use a Mersenne Twister random number generator in my raytracer.
EDIT: Just to be a little clearer, that's:
...
void init_program ( )
{
...
srand(time(NULL));
...
}
...
float random_uniform ( )
{
return ((float) rand( )) / RAND_MAX;
}
...
If you call both srand and rand at the same time, you're likely to get the same "random" number each time, because unless the time has changed in the meantime you'll seed the generator with exactly the same seed, and hence get exactly the same first random number.
Is there a reason you're not just using rand(), which is part of the C++ standard library? The function you refer to is part of the Single UNIX Specification, but is probably not portable...
The reason you're getting the same value over and over is because you're not seeding the random number generator, so it behaves deterministically. Use srand() or srand48() if you're OK with the UNIX-specific functions and need the 48-bit precision. Only call it once at the beginning of your program, and pass it time(NULL). Don't worry about using time(), since you're only calling this function once, you don't need to worry about always getting the same time value.
You may also want to refer to this page about how to take the return value from rand() and scale it to the range you want effectively:
USING THE C OR C++ rand() FUNCTION
EDIT: Go back and read the docs for erand48 again. You'll notice it returns a double in the range [0, 1). You're putting that in a long, which is probably why you're always seeing zero (the value is truncated to an integer). I also think that it's probably updating the values of the unsigned long array for you with the current number in the sequence -- you provide the storage, it uses it to "keep track" of the sequence.
Try:
unsigned short xsubi[3] = { 0, 1, 27 };
for (int i = 0; i < 10; ++i)
{
double value = erand48(xsubi);
printf("%f\n", value);
}
Yeah- it's called std::rand(), comes as Standard and ships in <cstdlib>.
http://msdn.microsoft.com/en-us/library/398ax69y(v=VS.100).aspx
Might want to look into Boost.Random.

C++ To Cuda Conversion/String Generation And Comparison

So I am in a basic High School coding class. We had to think up one
of our semester projects. I chose to
base mine on ideas and applications
that arn't used in traditional code.
This brought up the idea for use of
CUDA. One of the best ways I would
know to compare speed of traditional
methods versus unconventional is
string generation and comparison. One
could demonstrate the generation and
matching speed of traditional CPU
generation with timers and output. And
then you could show the increase(or
decrease) in speed and output of GPU
Processing.
I wrote this C++ code to generate random characters that are input into
a character array and then match that
array to a predetermined string.
However like most CPU programming it
is incredibly slow comparatively to
GPU programming. I've looked over CUDA
API and could not find something that
would possibly lead me in the right
direction for what I'm looking to do.
Below is the code I have written in C++, if anyone could point me in
the direction of such things as a
random number generator that I can
convert to chars using ASCII codes,
that would be excellent.
#include <iostream>
#include <string>
#include <cstdlib>
using namespace std;
int sLength = 0;
int count = 0;
int stop = 0;
int maxValue = 0;
string inString = "aB1#";
static const char alphanum[] =
"0123456789"
"!##$%^&*"
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz";
int stringLength = sizeof(alphanum) - 1;
char genRandom()
{
return alphanum[rand() % stringLength];
}
int main()
{
cout << "Length of string to match?" << endl;
cin >> sLength;
string sMatch(sLength, ' ');
while(true)
{
for (int x = 0; x < sLength; x++)
{
sMatch[x] = genRandom();
//cout << sMatch[x];
count++;
if (count == 2147000000)
{
count == 0;
maxValue++;
}
}
if (sMatch == inString)
{
cout << "It took " << count + (maxValue*2147000000) << " randomly generated characters to match the strings." << endl;
cin >> stop;
}
//cout << endl;
}
}
If you want to implement a pseudorandom number generator using CUDA, have a look over here. If you want to generate chars from a predetermined set of characters, you can just put all possible chars into that array and create a random index (just as you are doing it right now).
But I think it might be more valuable comparison might be one that uses brute force. Therefore, you could adapt your program to try not random strings, but try one string after another in any meaningful order.
Then, on the other hand, you could implement the brute-force stuff on the GPU using CUDA. This can be tricky since you might want to stop all CUDA threads as soon as one of them finds a solution. I could imagine the brute force process using CUDA the following way: One thread tries aa as first two letters and brute-forces all following digits, the next thread tries ab as first two letters and brute-forces all following digits, the next thread tries ac as first two letters and brute-forces all following digits, and so on. All these threads run in parallel. Of course, you could vary the number of predetermined chars such that e.g. the first thread tries aaaa, the second aaab. Then, you could compare different input values.
Any way, if you have never dealt with CUDA, I recommend the vector addition sample, a very basic CUDA example, that serves very well for getting a basic understanding of what's going on with CUDA. Moreover, you should read the CUDA programming guide to make yourself familiar with CUDAs concept of a grid of thread-blocks containing a grid of threads. Once you understand this, I think it becomes clearer how CUDA organizes stuff. To be short, in CUDA, you should replace loops with a kernel, that is executed multiple times at once.
First off, I am not sure what your actual question is? Do you need a faster random number generator or one with a greater period? In that case I would recommend boost::random, the "Mersenne Twister" is generally considered state of the art. It is a little hard to get started, but boost is a great library so worth the effort.
I think the method you arer using should be fairly efficient. Be aware that it could take up to (#characters)^(length of string) draws to get to the target string (here 70^4 = 24010000). GPU should be at an advantage here since this process is a Monte Carlo simulation and trivially parallelizable.
Have you compiled the code with optimizations?