I am reading the book The C Programming Language by Brian W. Kernighan and Dennis M. Ritchie. I cannot understand the function that is given in the book for generating pseudo-random numbers. It looks like this:
unsigned long int next = 1;

/* rand: return pseudo-random integer on 0..32767 */
int rand(void)
{
    next = next * 1103515245 + 12345;
    return (unsigned int)(next / 65536) % 32768;
}

/* srand: set seed for rand() */
void srand(unsigned int seed)
{
    next = seed;
}
I also tried it myself, but I only came up with the following observations:
65536 is one more than the largest 16-bit unsigned value (2^16), and
32768 is one more than the largest 16-bit signed value (2^15),
but I am not able to figure out the whole process.
This is a book written by legends and I want to understand it.
Please, if anybody can help me figure out this problem, I will feel very fortunate.
Pseudo-random number generators are a very complex subject; you could study them for years and get a PhD on the topic. As commented, read also about linear congruential generators (it looks like your code is the example given in the C standard).
In C on POSIX systems, you have random(3) (and also lrand48(3), which is sort of obsolete); in C++11 you have <random>.
The /65536 operation might be compiled as >>16, a right shift by 16 bits.
The %32768 operation could be optimized into a bit mask (the same as &0x7fff), keeping the 15 least significant bits.
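For unsigned operands the two forms really are interchangeable; here is a tiny check (my illustration, not part of the original answer):

#include <assert.h>
#include <stdio.h>

int main(void)
{
    unsigned long next = 1103527590UL;                       /* arbitrary example state */
    unsigned int a = (unsigned int)(next / 65536) % 32768;   /* division and modulo */
    unsigned int b = (unsigned int)(next >> 16) & 0x7fff;    /* shift and mask */

    assert(a == b);   /* identical results for unsigned operands */
    printf("%u %u\n", a, b);
    return 0;
}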
This doesn't have an accepted answer yet, so let's try one.
As noted by Basile Starynkevitch, what is implemented here is a pseudo-random number generator (RNG) from the class of linear congruential generators (LCGs). These in general take the form of a sequence
X := (a * X + c) mod m
The starting value for X is called the seed (the same as in the code). Of course c < m and a < m, and usually also c << a. The constants a and m are usually chosen so that the whole sequence does reasonably well in the spectral test, but you probably don't have to care about that to understand the basic mode of operation. If you are a little bit into number theory, you will also see that the sequence repeats after a while (it is periodic).
Random numbers are generated by first seeding X with a starting value. For each generated number, the sequence is cycled once and a subset of the bits of X is returned.
In the code from the question, a = 1103515245, c = 12345, and
m is implicitly pow(2, 8 * sizeof(unsigned long)) by virtue of unsigned integer overflow. These are also the values that ISO/IEC 9899 (the C language standard) suggests.
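To see the implicit modulus at work, here is a small illustration of a few steps of the recurrence (my sketch, with the state pinned to 32 bits for clarity; the question's code simply uses whatever width unsigned long has):

#include <stdint.h>
#include <stdio.h>
#include <assert.h>

int main(void)
{
    const uint64_t a = 1103515245u, c = 12345u, m = (uint64_t)1 << 32;
    uint32_t x32 = 1;   /* state kept in a 32-bit unsigned type */
    uint64_t x64 = 1;   /* the same state, reduced explicitly after each step */

    for (int i = 0; i < 5; i++) {
        x32 = (uint32_t)(x32 * a + c);   /* unsigned arithmetic simply wraps: an implicit mod 2^32 */
        x64 = (x64 * a + c) % m;         /* the explicit reduction gives exactly the same value */
        assert(x32 == (uint32_t)x64);
    }
    printf("%u\n", (unsigned)x32);
    return 0;
}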
With this known, the first pitfall is probably this statement:
return (unsigned int)(next / 65536) % 32768;
Kernighan and Ritchie probably thought that using only simple arithmetic is more readable and more portable than using bit masks and bit shifts. The above is equivalent to
return (unsigned int)(next >> 16) & 0x7fff
which selects bits 16-30 from next. You get back a pseudo-random number in the range [0;32767]. The bit range is also the one suggested in the C standard.
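If you want to watch it run, here is a tiny test harness (my own sketch; the generator is copied from the question but renamed krand/ksrand so it doesn't collide with the library's rand). With a seed of 1, the arithmetic above gives 16838 as the first value:

#include <stdio.h>

/* The generator from the question, renamed so it does not collide with the
   C library's own rand()/srand(). */
static unsigned long next = 1;

static int krand(void)
{
    next = next * 1103515245 + 12345;
    return (unsigned int)(next / 65536) % 32768;
}

static void ksrand(unsigned int seed)
{
    next = seed;
}

int main(void)
{
    ksrand(1);
    for (int i = 0; i < 5; i++)
        printf("%d\n", krand());   /* every value lies in [0, 32767] */
    return 0;
}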
WARNING: It is well known that this LCG, while widely deployed because it is noted in the standard, does not produce very good pseudo-random numbers (the version in glibc is even worse). In particular, it is absolutely unsafe to use for cryptographic applications. With so few random bits, I would not even use it for any Monte Carlo method, because the results may be severely skewed by the quality of the RNG.
So in short: Try to understand it: yes, you are welcome. Use it for anything: no.
I hope some guru here can help me out.
I am writing C/C++ code to implement consistent hashing, using SHA-1 as the hashing algorithm.
I need to implement a modulo operation on bit strings, as follows:
0100 0011....0110 100 mod 0010 1001 = ?
If the divisor is a power of two, pow(2, n), then it would be easy: the result is just the last n bits of the dividend (i.e. a bitwise AND with a mask like 0000 1111 for n = 4).
A SHA-1 digest is 160 bits (or 40 hex digits) long. My question is: how do I implement a modulo operation of a long bit string by another, arbitrary bit string?
Thank you.
As user stark points out (more rudely than necessary, I think) in comments, this is just long division in base 2^32. Or in base 2, with more digits. And you can use the algorithm for long division that you learned in school for base 10.
There are fancier algorithms for doing division on much bigger numbers, but for your application I suspect you can afford to do it very simply and inefficiently, along the following lines (this is the base-2 version, which just amounts to subtracting off left-shifted versions of the denominator until you can no longer do so):
// Treat x,y as 160-bit numbers, where [0] is least significant and
// [4] is most significant. Compute x mod y, and put the result in out.
void mod160(unsigned int out[5], const unsigned int x[5], const unsigned int y[5]) {
  unsigned int temp[5];
  copy160(out, x);
  copy160(temp, y);
  int n = 0;
  // Find first 1-bit in quotient.
  while (lessthanhalf160(temp, x)) { lshift160(temp); ++n; }
  while (n >= 0) {
    if (!less160(out, temp)) sub160(out, temp);  // quotient bit is 1
    rshift160(temp); --n;                        // proceed to next bit of quotient
  }
}
For the avoidance of doubt, the above is only a crude sketch:
It may be full of bugs.
I haven't written the implementations of the building-block functions like less160 (a rough sketch of a couple of them is given after this list).
Actually you'd probably just put the code for those inline rather than having separate functions. E.g., copy160 is just five assignments, or a short loop, or a memcpy.
It could surely be made more efficient; e.g., that first step might do better to count leading 0-bits and then do a single shift, instead of shifting one place at a time. (The right-shifting probably doesn't want to do that, though, because half the time you will be doing only a single shift.)
The "base-2^32" version may well be faster, but the implementation will be a bit more complicated.
I want to know the time complexity of this piece of code, which is a Russian peasant multiplication implementation:
unsigned long long int russian(unsigned long long int a, unsigned long long int b) {
    unsigned long long int res = 0;
    while (b > 0) {
        if (b & 1)
            res = res + a;
        a <<= 1;
        b >>= 1;
    }
    return res % mod;   /* `mod` is a global defined elsewhere in the program */
}
As far as I know, its time complexity is either lg2(b) or lg2(a), depending on our choice of a or b. Any expert comments?
The time complexity of the piece of code you supplied is, of course, O(1), because there is an upper bound on how long it can take and will never exceed that upper bound on any inputs.
Presumably, that's not the answer to the question you actually mean to ask. There are several different things you might be interested in, and they all have different answers.
(Also, since you seem to be trying to do a modular multiply, you really should be reducing all relevant quantities inside the loop, so that you don't overflow and so that you can use - instead of %; a sketch of that is below.)
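Something along these lines, for example (my sketch, assuming mod stays below 2^63 so the additions can never wrap an unsigned long long):

/* Modular multiplication by the Russian peasant method, reducing inside the
   loop.  Assumes mod < 2^63, so res + a and a + a never overflow. */
unsigned long long mulmod(unsigned long long a, unsigned long long b,
                          unsigned long long mod)
{
    unsigned long long res = 0;
    a %= mod;
    while (b > 0) {
        if (b & 1) {
            res = res + a;
            if (res >= mod) res -= mod;   /* subtraction instead of % */
        }
        a = a + a;                        /* double a, modulo mod */
        if (a >= mod) a -= mod;
        b >>= 1;
    }
    return res;
}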
You might be interested in having a precise estimate of the wall-clock time. Obtaining this will actually require gathering some empirical data, but it will probably look something like
A + B bitlength(b) + C popcount(b)
(popcount is the number of 1s in the binary expansion) for some constants A, B, and C. However, CPU hardware is rather complicated, and it might be quite involved to get a good estimate for the third term above, since branch prediction hardware might do some odd things.
And A, B, and C probably aren't even constants; they will depend to some extent on whether this function gets inlined, and the sort of code surrounding the places where it's used.
Now, you might want a more abstract answer where b can be of arbitrary size, rather than constrained to be the size of an unsigned long long, and want to count the number of arithmetic operations. This is very clearly just the bit length of b, or, as the comments indicate, O(lg(b)), where lg is the log base 2.
Now, you might actually be interested not just in the arithmetic operations, but their cost. And might be interested in a being of arbitrary size rather than constrained to be an unsigned long long. A useful unit of measure would be bit operations. e.g. doing a left-shift by 1 on an N-bit number ought to cost O(N) bit operations.
I'm pretty sure the loop works out to O(lg(a)lg(b)+lg(b)^2) bit operations. (this doesn't include the % operation you do afterwards)
I'm new to programming.
I want to know exactly what rand() does.
Searching only yields examples of its usage. None explain each step of how the function generates a random number. They treat rand() as a black box.
I want to know what rand() is doing; each step.
Is there a resource that will allow me to see exactly what rand() does?
This is all open-source stuff, isn't it? I'll settle for the disassembly if there's no source.
I know it returns a random number, but how does it generate that number? I want to see each step.
Thank you.
Here is the current glibc implementation:
/* Return a random integer between 0 and RAND_MAX. */
int
rand (void)
{
  return (int) __random ();
}
That's not much help, but __random eventually calls __random_r:
/* If we are using the trivial TYPE_0 R.N.G., just do the old linear
congruential bit. Otherwise, we do our fancy trinomial stuff, which is the
same in all the other cases due to all the global variables that have been
set up. The basic operation is to add the number at the rear pointer into
the one at the front pointer. Then both pointers are advanced to the next
location cyclically in the table. The value returned is the sum generated,
reduced to 31 bits by throwing away the "least random" low bit.
Note: The code takes advantage of the fact that both the front and
rear pointers can't wrap on the same call by not testing the rear
pointer if the front one has wrapped. Returns a 31-bit random number. */
int
__random_r (buf, result)
     struct random_data *buf;
     int32_t *result;
{
  int32_t *state;

  if (buf == NULL || result == NULL)
    goto fail;

  state = buf->state;
  if (buf->rand_type == TYPE_0)
    {
      int32_t val = state[0];
      val = ((state[0] * 1103515245) + 12345) & 0x7fffffff;
      state[0] = val;
      *result = val;
    }
  else
    {
      int32_t *fptr = buf->fptr;
      int32_t *rptr = buf->rptr;
      int32_t *end_ptr = buf->end_ptr;
      int32_t val;

      val = *fptr += *rptr;
      /* Chucking least random bit.  */
      *result = (val >> 1) & 0x7fffffff;
      ++fptr;
      if (fptr >= end_ptr)
        {
          fptr = state;
          ++rptr;
        }
      else
        {
          ++rptr;
          if (rptr >= end_ptr)
            rptr = state;
        }
      buf->fptr = fptr;
      buf->rptr = rptr;
    }
  return 0;

 fail:
  __set_errno (EINVAL);
  return -1;
}
This was 10 seconds of googling:
gcc implementation of rand()
How is the rand()/srand() function implemented in C
implementation of rand()
...
http://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a01206.html
http://www.gnu.org/software/libc/manual/html_node/Pseudo_002dRandom-Numbers.html
I was gonna list the actual search, but seeing this is clearly a dupe, I'll just vote as dupe
You can browse the source code for different implementations of the C standard.
The question has been answered before, you might find what you're looking for at What common algorithms are used for C's rand()?
That answer provides code for glibc's implementation of rand()
The simplest reasonably good pseudo-random number generators are Linear Congruential Generators (LCGs). These are iterations of a formula such as
X_{n+1} = (a * X_n + c) modulo m
The constants a, c, and m are chosen to give unpredictable sequences. X_0 is the random seed value. Many other algorithms exist, but this is probably enough to get you going.
Really good pseudo-random number generators are more complex, such as the Mersenne Twister.
I guess THIS is what you are looking for. It contains a detailed explanation of the random function, and a simple C program to understand the algorithm.
Edit:
You should check THIS as well. A possible duplicate.
Well, I believe rand is from the C standard library, not the C++ standard library. There is no one implementation of either library, there are several.
You could go somewhere like this page to view the source code for glibc, the c library used on most Linux distributions. For glibc you'd find it in source files under stdlib such as rand.c and random.c.
A different implementation, such as uClibc might be easier to read. Try here under the libc/stdlib folder.
Correct me if I'm wrong, but although this answer points to part of the implementation, I found that there is more to the rand() used in stdlib, which comes from glibc. In the 2.32 version obtained from here, the stdlib folder contains a random.c file which explains that a simple linear congruential algorithm is used. The folder also has rand.c and rand_r.c, which show more of the source code. stdlib.h, contained in the same folder, shows the values used for macros like RAND_MAX.
/* An improved random number generation package. In addition to the
standard rand()/srand() like interface, this package also has a
special state info interface. The initstate() routine is called
with a seed, an array of bytes, and a count of how many bytes are
being passed in; this array is then initialized to contain
information for random number generation with that much state
information. Good sizes for the amount of state information are
32, 64, 128, and 256 bytes. The state can be switched by calling
the setstate() function with the same array as was initialized with
initstate(). By default, the package runs with 128 bytes of state
information and generates far better random numbers than a linear
congruential generator. If the amount of state information is less
than 32 bytes, a simple linear congruential R.N.G. is used.
Internally, the state information is treated as an array of longs;
the zeroth element of the array is the type of R.N.G. being used
(small integer); the remainder of the array is the state
information for the R.N.G. Thus, 32 bytes of state information
will give 7 longs worth of state information, which will allow a
degree seven polynomial. (Note: The zeroth word of state
information also has some other information stored in it; see setstate
for details). The random number generation technique is a linear
feedback shift register approach, employing trinomials (since there
are fewer terms to sum up that way). In this approach, the least
significant bit of all the numbers in the state table will act as a
linear feedback shift register, and will have period 2^deg - 1
(where deg is the degree of the polynomial being used, assuming
that the polynomial is irreducible and primitive). The higher order
bits will have longer periods, since their values are also
influenced by pseudo-random carries out of the lower bits. The
total period of the generator is approximately deg*(2**deg - 1); thus
doubling the amount of state information has a vast influence on the
period of the generator. Note: The deg*(2**deg - 1) is an
approximation only good for large deg, when the period of the shift
register is the dominant factor. With deg equal to seven, the
period is actually much longer than the 7*(2**7 - 1) predicted by
this formula. */
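As a rough illustration of the additive feedback idea described in that comment (my own simplified sketch, not glibc's code): the default TYPE_3 setup corresponds to the trinomial x^31 + x^3 + 1, i.e. each new value is the sum of the values 31 and 3 steps back, and the low bit is discarded on output. Glibc's real seeding and warm-up are far more careful than the naive fill below.

#include <stdint.h>
#include <stdio.h>

/* Simplified additive feedback generator (degree 31, separation 3). */
static uint32_t r[31];
static int idx;

static void toy_srandom(uint32_t seed)
{
    r[0] = seed;
    for (int i = 1; i < 31; i++)          /* crude LCG fill, for illustration only */
        r[i] = 1103515245u * r[i - 1] + 12345u;
    idx = 0;
}

static uint32_t toy_random(void)
{
    /* r[i] = r[i-31] + r[i-3], indices taken modulo the table size. */
    uint32_t val = r[idx] += r[(idx + 31 - 3) % 31];
    idx = (idx + 1) % 31;
    return (val >> 1) & 0x7fffffff;       /* throw away the "least random" low bit */
}

int main(void)
{
    toy_srandom(1);
    for (int i = 0; i < 5; i++)
        printf("%u\n", (unsigned)toy_random());
    return 0;
}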
I am reading a program which contains the following function:
int f(int n) {
    int c;
    for (c = 0; n != 0; ++c)
        n = n & (n - 1);
    return c;
}
I don't quite understand what this function is intended to do.
It counts the number of 1s in the binary representation of n.
The function is INTENDED to return the number of 1-bits in the representation of n. What is missed in the other answers is that the function invokes undefined behaviour for arguments n < 0. This is because the function peels the number away one bit at a time, starting from the lowest set bit and working towards the highest. For a negative number this means that the last value of n before the loop terminates (for 32-bit integers in two's complement) is 0x80000000. This number is INT_MIN, and it is now used in the loop for the last time:
n = n & (n - 1)
Unfortunately, INT_MIN - 1 is an overflow, and signed overflow invokes undefined behaviour. A conforming implementation is not required to "wrap around" integers; it may, for example, issue an overflow trap instead, or produce all kinds of weird results.
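One way to sidestep that (my suggestion, not part of the original answer) is to do the bit-twiddling on the value converted to unsigned, where the wrap-around is well defined:

/* Count the 1-bits of n, including the sign bit, without signed overflow:
   the conversion to unsigned and the wrap-around of u - 1 are well defined. */
int popcount_safe(int n) {
    unsigned int u = (unsigned int)n;
    int c;
    for (c = 0; u != 0; ++c)
        u = u & (u - 1);
    return c;
}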
It is a (now obsolete) workaround for the lack of the POPCNT instruction in non-military CPUs.
This counts the number of iterations it takes to reduce n to 0 by using a bitwise AND.
The expression n = n & (n - 1) is a bitwise operation which clears the rightmost '1' bit in n.
For example, take the integer 5 (0101). Then n & (n - 1) → (0101) & (0100) → 0100 (the first '1' bit from the right is removed).
So the above code returns the number of 1s in the binary form of the given integer.
It shows a way how not to program for the x86 instruction set: using an intrinsic or inline assembler instruction is faster and easier to read for something as simple as this. (But this is only true for the x86 architecture as far as I know; I don't know how it is on ARM, SPARC, or anything else.)
Could it be that it tries to return the number of significant bits in n? (I haven't thought it through completely...)
In an app I'm profiling, I found that in some scenarios this function is able to take over 10% of total execution time.
I've seen discussion over the years of faster sqrt implementations using sneaky floating-point trickery, but I don't know if such things are outdated on modern CPUs.
The MSVC++ 2008 compiler is being used, for reference... though I'd assume sqrt is not going to add much overhead.
See also here for similar discussion on modf function.
EDIT: for reference, this is one widely-used method, but is it actually much quicker? How many cycles is SQRT anyway these days?
Yes, it is possible even without trickery:
sacrifice accuracy for speed: the sqrt algorithm is iterative; re-implement it with fewer iterations (see the sketch after this list).
lookup tables: either just for the start point of the iteration, or combined with interpolation to get you all the way there.
caching: are you always sqrting the same limited set of values? if so, caching can work well. I've found this useful in graphics applications where the same thing is being calculated for lots of shapes the same size, so results can be usefully cached.
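For instance, a cut-down Newton (Heron) iteration might look like the sketch below; the exponent-based starting guess and the fixed two iterations are my own choices, and accuracy drops accordingly:

#include <stdint.h>

/* Rough sqrt: crude initial guess from the float's exponent bits, then a
   fixed, small number of Newton iterations.  Trades accuracy for speed. */
static float rough_sqrt(float x)
{
    if (x <= 0.0f)
        return 0.0f;

    /* Halve the (biased) exponent to get a guess in the right ballpark. */
    union { float f; uint32_t i; } u = { x };
    u.i = (u.i + 0x3f800000u) >> 1;      /* well-known exponent-halving trick */
    float g = u.f;

    /* Two Newton steps: g <- (g + x/g) / 2.  Add more for more accuracy. */
    g = 0.5f * (g + x / g);
    g = 0.5f * (g + x / g);
    return g;
}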
Hello from 11 years in the future.
Considering this still gets occasional votes, I thought I'd add a note about performance, which now even more than then is dramatically limited by memory accesses. You absolutely must use a realistic benchmark (ideally, your whole application) when optimising something like this - the memory access patterns of your application will have a dramatic effect on solutions like lookup tables and caches, and just comparing 'cycles' for your optimised version will lead you wildly astray: it is also very difficult to assign program time to individual instructions, and your profiling tool may mislead you here.
On a related note, consider using simd/vectorised instructions for calculating square roots, like _mm512_sqrt_ps or similar, if they suit your use case.
Take a look at section 15.12.3 of intel's optimisation reference manual, which describes approximation methods, with vectorised instructions, which would probably translate pretty well to other architectures too.
There's a great comparison table here:
http://assemblyrequired.crashworks.org/timing-square-root/
Long story short, SSE2's sqrtss is about 2x faster than the FPU's fsqrt, and an approximation + iteration is about 4x faster than that (8x overall).
Also, if you're trying to take a single-precision sqrt, make sure that's actually what you're getting. I've heard of at least one compiler that would convert the float argument to a double, call double-precision sqrt, then convert back to float.
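To illustrate the "approximation + iteration" route on x86 (my sketch, using the SSE intrinsics _mm_set_ss, _mm_rsqrt_ss and _mm_cvtss_f32 from <xmmintrin.h>): take the hardware's fast reciprocal square root estimate, refine it with one Newton-Raphson step, and multiply by x to get sqrt(x):

#include <xmmintrin.h>

/* Approximate single-precision sqrt: rsqrtss gives roughly 12 bits of
   precision; one Newton-Raphson step on the reciprocal square root improves
   that considerably.  sqrt(x) = x * rsqrt(x). */
static float sse_fast_sqrt(float x)
{
    if (x <= 0.0f)
        return 0.0f;

    float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));  /* estimate of 1/sqrt(x) */
    y = y * (1.5f - 0.5f * x * y * y);                     /* one Newton-Raphson refinement */
    return x * y;
}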
You're very likely to gain more speed improvements by changing your algorithms than by changing their implementations: Try to call sqrt() less instead of making calls faster. (And if you think this isn't possible - the improvements for sqrt() you mention are just that: improvements of the algorithm used to calculate a square root.)
Since it is used very often, it is likely that your standard library's implementation of sqrt() is nearly optimal for the general case. Unless you have a restricted domain (e.g., if you need less precision) where the algorithm can take some shortcuts, it's very unlikely someone comes up with an implementation that's faster.
Note that, since that function uses 10% of your execution time, even if you manage to come up with an implementation that only takes 75% of the time of std::sqrt(), this will still only bring your execution time down by 2.5%. For most applications, users wouldn't even notice this, unless they use a stopwatch to measure.
How accurate do you need your sqrt to be? You can get reasonable approximations very quickly: see Quake3's excellent inverse square root function for inspiration (note that the code is GPL'ed, so you may not want to integrate it directly).
Don't know if you fixed this, but I've read about it before, and it seems that the fastest thing to do is replace the sqrt function with an inline assembly version;
you can see a description of a load of alternatives here.
The best is this snippet of magic:
double inline __declspec (naked) __fastcall sqrt(double n)
{
    _asm fld qword ptr [esp+4]   ; load the 8-byte argument onto the x87 stack
    _asm fsqrt                   ; square root in hardware
    _asm ret 8                   ; return, popping the argument off the stack
}
It's about 4.7x faster than the standard sqrt call with the same precision.
Here is a fast way with a lookup table of only 8 KB. The error is ~0.5% of the result. You can easily enlarge the table, thereby reducing the error. It runs about 5 times faster than the regular sqrt().
// LUT for fast sqrt of floats. The table consists of 2 parts, half for sqrt(X) and half for sqrt(2X).
// Requires <math.h> for sqrtf(); UINT32, TRUE and FALSE are assumed to come from the platform headers.
const int nBitsForSQRTprecision = 11;                   // Use only the 11 most significant bits of the float's 23. We could use 15 bits instead: less error, but more memory.
const int nUnusedBits = 23 - nBitsForSQRTprecision;     // Amount of bits we will disregard
const int tableSize = (1 << (nBitsForSQRTprecision+1)); // 2^nBits * 2 because we have 2 halves of the table.
static short sqrtTab[tableSize];
static unsigned char is_sqrttab_initialized = FALSE;    // Once initialized will be true

// Table of precalculated sqrt() for future fast calculation. Approximates the exact value with an error of about 0.5%.
// Note: To access the bits of a float in C quickly we must misuse pointers.
// More info in: http://en.wikipedia.org/wiki/Single_precision
void build_fsqrt_table(void){
    unsigned short i;
    float f;
    UINT32 *fi = (UINT32*)&f;

    if (is_sqrttab_initialized)
        return;

    const int halfTableSize = (tableSize>>1);
    for (i=0; i < halfTableSize; i++){
        *fi = 0;
        *fi = (i << nUnusedBits) | (127 << 23); // Build a float with the bit pattern i as mantissa, and an exponent of 0, stored as 127

        // Take the square root then strip the first 'nBitsForSQRTprecision' bits of the mantissa into the table
        f = sqrtf(f);
        sqrtTab[i] = (short)((*fi & 0x7fffff) >> nUnusedBits);

        // Repeat the process, this time with an exponent of 1, stored as 128
        *fi = 0;
        *fi = (i << nUnusedBits) | (128 << 23);
        f = sqrtf(f);
        sqrtTab[i+halfTableSize] = (short)((*fi & 0x7fffff) >> nUnusedBits);
    }
    is_sqrttab_initialized = TRUE;
}
// Calculation of a square root. Divide the exponent of float by 2 and sqrt() its mantissa using the precalculated table.
float fast_float_sqrt(float n){
    if (n <= 0.f)
        return 0.f;                              // On 0 or negative return 0.

    UINT32 *num = (UINT32*)&n;
    short e;                                     // Exponent
    e = (*num >> 23) - 127;                      // In 'float' the exponent is stored with 127 added.
    *num &= 0x7fffff;                            // Leave only the mantissa

    // If the exponent is odd we have to look it up in the second half of the lookup table, so we set the high bit.
    const int halfTableSize = (tableSize>>1);
    const int secondHalphTableIdBit = halfTableSize << nUnusedBits;
    if (e & 0x01)
        *num |= secondHalphTableIdBit;
    e >>= 1;                                     // Divide the exponent by two (note that in C the shift operators are sign preserving for signed operands).

    // Do the table lookup based on the truncated mantissa, then reconstruct the result back into a float.
    *num = ((sqrtTab[*num >> nUnusedBits]) << nUnusedBits) | ((e + 127) << 23);
    return n;
}