Question 7-9 of Accelerated C++ by Andrew Koenig asks:
7-9. (difficult) The implementation of nrand in §7.4.4/135 will not
work for arguments greater than RAND_MAX. Usually, this restriction is
no problem, because RAND_MAX is often the largest possible integer
anyway. Nevertheless, there are implementations under which RAND_MAX
is much smaller than the largest possible integer. For example, it is
not uncommon for RAND_MAX to be 32767 (2^15 -1) and the largest
possible integer to be 2147483647 (2^31 -1). Reimplement nrand so that
it works well for all values of n.
If n > RAND_MAX, my thoughts are to take
double temp = n / (double)RAND_MAX + .5;
int mult = temp;
int randomNum = 0;
for (int i = 0; i != mult; i++)
    randomNum += rand();
then test to see if randomNum < n. Would this work to generate a random number > RAND_MAX? I don't know how to use larger integers than my computer can handle, so I don't think there is any real way to tell.
If you're truly mucking with integers larger than your computer can handle, that's, well, complicated.
But you do have several options for integers bigger than int; these include unsigned int, long, unsigned long, long long, and unsigned long long, in increasing order of size. Just how big the numbers can get varies depending on your architecture.
For instance, on my machine I have the following:
Data Type: Bytes Minimum Maximum
Short SInt: 2 -32768 32767
Short UInt: 2 0 65535
UInt: 4 0 4294967295
SInt: 4 -2147483648 2147483647
ULong: 8 0 18446744073709551615
SLong: 8 -9223372036854775808 9223372036854775807
ULong Long: 8 0 18446744073709551615
SLong Long: 8 -9223372036854775808 9223372036854775807
So, as you can see, you can make numbers much larger than int and 32767.
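If you want to check the exact limits on your own machine, std::numeric_limits will report them; a quick sketch:

#include <iostream>
#include <limits>

int main() {
    // Print the maximum of a few integer types on this implementation.
    std::cout << "int:                " << std::numeric_limits<int>::max() << '\n';
    std::cout << "unsigned int:       " << std::numeric_limits<unsigned int>::max() << '\n';
    std::cout << "long long:          " << std::numeric_limits<long long>::max() << '\n';
    std::cout << "unsigned long long: " << std::numeric_limits<unsigned long long>::max() << '\n';
}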
One way to generate a random number larger than RAND_MAX is as follows:
double a=rand()/(double)RAND_MAX;
unsigned long long random_n=(unsigned long long)(BIG_MAXIMUM_NUMBER*a);
However, due to the discrete nature of floating-point numbers, this may mean that some values will just never show up in your output stream.
C++11 has a library, <random>, which solves both this problem and the problem you mention. An example of its usage is:
#include <random>

const int min = 100000;
const int max = 1000000;

std::default_random_engine generator;
std::uniform_int_distribution<int> distribution(min, max);
int random_int = distribution(generator);
Just change the data types to suit your big needs.
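For instance, a sketch of a 64-bit variant (one possible way to set it up, not part of the original snippet):

#include <cstdint>
#include <random>

// Sketch: produce uniformly distributed 64-bit values. The engine and the
// seeding choice here are illustrative, not prescriptive.
std::mt19937_64 generator(std::random_device{}());
std::uniform_int_distribution<std::uint64_t> distribution(0, UINT64_MAX);
std::uint64_t random_u64 = distribution(generator);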
Another way to look at this is that we can interpret rand() as returning a bit-field and that, since it is a uniform PRNG, all bit-fields are equally likely. We can then just make multiple calls to rand() to get multiple equally-likely bit-fields and merge these to make big numbers. Here's how we would do this to make a 16-bit random number from two 8-bit random numbers:
#include <cstdint>

uint16_t a = (uint16_t)(rand() & 255);
uint16_t b = (uint16_t)(rand() & 255);
uint16_t random_int = (uint16_t)(b << 8 | a);
The rand()&255 keeps only the 8 least significant bits of whatever number rand() returns; that is, it keeps only the last byte of rand().
The (uint16_t) casts this byte into an unsigned 16-bit number.
b<<8 shifts the bits of b 8 bits to the left, which makes room to safely OR in a.
But what if rand() returns a signed value, so that its most significant bit is always 0? We can then do the following:
uint16_t a = (uint16_t)(rand() & 255);
uint16_t b = (uint16_t)(rand() & 255);
uint16_t c = (uint16_t)(rand() & 1);
uint16_t random_int = (uint16_t)(c << 14 | b << 7 | a);
In that case rand()&255 only yields 7 random bits per call (the byte's most significant bit is always 0), so we left-shift b by only 7 bits, which makes bit 7 of the result random as well. That still leaves bits 14 and 15 non-random. Since we want to mimic the behaviour of rand(), we leave bit 15 (the top bit) non-random and grab a single extra random bit c to shift into bit 14.
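Putting the pieces together, here is one possible sketch of an nrand that works for n larger than RAND_MAX. This is not the book's solution; it assumes rand() yields at least 15 random bits (i.e. RAND_MAX >= 32767) and that n <= 2^60:

#include <cstdlib>

// Compose ~60 random bits from four 15-bit pieces of rand().
unsigned long long rand_bits()
{
    unsigned long long r = 0;
    for (int i = 0; i < 4; ++i)
        r = (r << 15) | (std::rand() & 0x7FFF);
    return r;
}

// Uniform value in [0, n), using the same bucket-rejection idea as nrand.
unsigned long long nrand(unsigned long long n)
{
    const unsigned long long range = 1ULL << 60;
    const unsigned long long bucket_size = range / n;   // assumes 0 < n <= 2^60
    unsigned long long r;
    do {
        r = rand_bits();
    } while (r >= bucket_size * n);                      // reject the uneven tail
    return r / bucket_size;
}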
Given an integer n (1 ≤ n ≤ 10^18), I need to set all the unset bits in this number (i.e. only the bits meaningful for the number, not the padding bits required to fit in an unsigned long long).
My approach: let the most significant bit be at position p; then n with all bits set will be 2^(p+1) - 1.
All my test cases matched except the one shown below.
Input
288230376151711743
My output
576460752303423487
Expected output
288230376151711743
Code
#include<bits/stdc++.h>
using namespace std;
typedef long long int ll;
int main() {
ll n;
cin >> n;
ll x = log2(n) + 1;
cout << (1ULL << x) - 1;
return 0;
}
The precision of typical double is only about 15 decimal digits.
The value of log2(288230376151711743) is 57.999999999999999994994646087789191106964114967902921472132432244... (calculated using Wolfram Alpha)
Therefore, this value is rounded to 58, and this results in setting a 1 bit one position higher than expected.
As a general advice, you should avoid using floating-point values as much as possible when dealing with integer values.
You can solve this with shift and or.
#include <cstdint>

uint64_t n = 36757654654;
int i = 1;
// Loop until n + 1 is a power of two, i.e. all meaningful bits of n are set.
while ((n & (n + 1)) != 0) {
    n |= n >> i;
    i *= 2;
}
Any set bit will be duplicated into the next lower bit, then pairs of bits will be duplicated 2 bits lower, then groups of 4, 8, 16 and 32 bits, until all meaningful bits are set and (n + 1) becomes the next power of 2.
Just hardcoding the maximum of 6 shifts and ors might be faster than the loop.
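For example, a sketch of the unrolled version for a 64-bit value (the function name is mine):

#include <cstdint>

uint64_t set_all_lower_bits(uint64_t n)
{
    // Each step doubles the width of the block of set bits copied downward.
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    n |= n >> 32;
    return n;
}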
If you need to do integer arithmetic and count bits, you'd better count them properly and avoid introducing floating-point uncertainty:
unsigned x = 0;
for (; n; x++)
    n >>= 1;
...
The good news is that for n <= 1E18, x will never reach the number of bits in an unsigned long long. So the rest of your code is not at risk of being UB, and you can stick to your minus-1 approach (although it might in theory not be portable for C++ before C++20) ;-)
Btw, there are more efficient ways to find the most significant bit, and the simple log2() is not among them.
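For instance, if C++20 is available, std::bit_width from <bit> gives the bit count directly without any floating point (a sketch):

#include <bit>
#include <cstdint>
#include <iostream>

int main()
{
    uint64_t n = 288230376151711743ULL;               // 2^58 - 1
    int width = std::bit_width(n);                    // 58: index of the MSB plus one
    uint64_t all_set = (width >= 64) ? ~0ULL : (1ULL << width) - 1;
    std::cout << all_set << '\n';                     // prints 288230376151711743
}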
I'm new to C++ and am trying to learn the concept of arrays. I saw this code snippet online. For the sample code below, does it make any difference to declare:
unsigned scores[11] = {};
unsigned grade;
as:
int scores[11] = {};
int grade;
I guess there must be a reason why scores[11] = {}; and grade are declared as unsigned, but what is the reason behind it?
int main() {
    unsigned scores[11] = {};
    unsigned grade;

    while (cin >> grade) {
        if (0 <= grade <= 100) {
            ++scores[grade / 10];
        }
    }

    for (int i = 0; i < 11; i++) {
        cout << scores[i] << endl;
    }
}
unsigned means that the variable will not hold negative values (or, more accurately, it does not care about the sign). It seems obvious that scores and grades are non-negative values (no one scores -25). So it is natural to use unsigned.
But note that: if (0 <= grade <= 100) is redundant. if (grade <= 100) is enough since no negative values are allowed.
As Blastfurnace commented, if (0 <= grade <= 100) is not even right; if you want to spell out both bounds, you should write it as:
if (0 <= grade && grade <= 100)
Unsigned variables
Declaring a variable as unsigned int instead of int has 2 consequences:
It can't be negative. It provides you a guarantee that it never will be and therefore you don't need to check for it and handle special cases when writing code that only works with positive integers
As you have a limited size, it allows you to represent bigger numbers. On 32 bits, the biggest unsigned int is 4294967295 (2^32-1) whereas the biggest int is 2147483647 (2^31-1)
One consequence of using unsigned int is that arithmetic will be done in the set of unsigned int. So 9 - 10 = 4294967295 instead of -1, as no negative number can be encoded in the unsigned int type. You will also have issues if you compare them to negative ints.
More info on how negative integers are encoded.
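A minimal sketch of that wrap-around (assuming a 32-bit unsigned int):

#include <iostream>

int main()
{
    unsigned int a = 9, b = 10;
    std::cout << a - b << '\n';   // prints 4294967295 with 32-bit unsigned int, not -1
}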
Array initialization
For the array definition, if you just write:
unsigned int scores[11];
Then you have 11 uninitialized unsigned ints that potentially have values different from 0.
If you write:
unsigned int scores[11] = {};
Then all the elements are initialized to their default value, which is 0.
Note that if you write:
unsigned int scores[11] = { 1, 2 };
You will have the first int initialized to 1, the second to 2, and all the others to 0.
You can easily play around with these different forms to gain a better understanding of them.
Comparison
About the code:
if(0 <= grade <= 100)
as stated in the comments, this does not do what you expect. In fact, it will always evaluate to true and therefore execute the code in the if. This means that if you enter a grade of, say, 20000, you index far beyond the end of scores, which is undefined behaviour (quite possibly a core dump). The reason is that this:
0 <= grade <= 100
is equivalent to:
(0 <= grade) <= 100
And the first part is either true (implicitly converted to 1) or false (implicitly converted to 0). As both values are lower than 100, the second comparison is always true.
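A short sketch of both forms (the chained comparison compiles, but is always true):

#include <iostream>

int main()
{
    unsigned grade = 20000;
    std::cout << std::boolalpha;
    std::cout << (0 <= grade <= 100) << '\n';               // true: (0 <= grade) is 1, and 1 <= 100
    std::cout << ((0 <= grade) && (grade <= 100)) << '\n';  // false: the intended check
}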
unsigned integers have some strange properties and you should avoid them unless you have a good reason. Gaining 1 extra bit of positive size, or expressing a constraint that a value may not be negative, are not good reasons.
unsigned integers implement arithmetic modulo UINT_MAX+1. By contrast, operations on signed integers represent the natural arithmetic that we are familiar with from school.
Overflow semantics
unsigned has well defined overflow; signed does not:
unsigned u = UINT_MAX;
u++; // u becomes 0
int i = INT_MAX;
i++; // undefined behaviour
This has the consequence that signed integer overflow can be caught during testing, while an unsigned overflow may silently do the wrong thing. So use unsigned only if you are sure you want to legalize overflow.
If you have a constraint that a value may not be negative, then you need a way to detect and reject negative values; int is perfect for this. An unsigned will accept a negative value and silently overflow it into a positive value.
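A sketch of what that looks like in practice (the function name is mine): with an int parameter the bad input is still visible and can be rejected, whereas an unsigned parameter would already have wrapped -5 into a huge positive value.

#include <stdexcept>

void set_grade(int grade)
{
    // With int we can still see the negative input and reject it.
    if (grade < 0 || grade > 100)
        throw std::invalid_argument("grade out of range");
    // ... use grade ...
}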
Bit shift semantics
Bit shift of an unsigned value by an amount less than the number of bits in the type is always well defined. Until C++20, a left shift of a signed value was undefined if it shifted a 1 into (or out of) the sign bit, and a right shift of a negative signed value was implementation-defined. Since C++20, signed right shift always preserves the sign, but signed left shift does not. So use unsigned for some kinds of bit twiddling operations.
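A quick sketch of the difference (assuming 32-bit int and unsigned):

unsigned u = 1u << 31;   // well defined for unsigned
u <<= 1;                 // well defined: wraps to 0

int i = -1;
i >>= 1;                 // implementation-defined before C++20; sign-preserving (-1) since C++20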
Mixed sign operations
The built-in arithmetic operations always operate on operands of the same type. If they are supplied operands of different types, the "usual arithmetic conversions" coerce them into the same type, sometimes with surprising results:
unsigned u = 42;
std::cout << (u * -1); // 4294967254
std::cout << std::boolalpha << (u >= -1); // false
What's the difference?
Subtracting an unsigned from another unsigned yields an unsigned result, which means that the "difference" between 1 and 2, computed as 1u - 2u, is 4294967295.
Double the max value
int uses one bit to represent the sign of the value. unsigned uses this bit as just another numerical bit. So typically, int has 31 numerical bits and unsigned has 32. This extra bit is often cited as a reason to use unsigned. But if 31 bits are insufficient for a particular purpose, then most likely 32 bits will also be insufficient, and you should be considering 64 bits or more.
Function overloading
The implicit conversion from int to unsigned has the same rank as the conversion from int to double, so the following example is ill formed:
void f(unsigned);
void f(double);
f(42); // error: ambiguous call to overloaded function
Interoperability
Many APIs (including the standard library) use unsigned types, often for misguided reasons. It is sensible to use unsigned to avoid mixed-sign operations when interacting with these APIs.
Appendix
The quoted snippet includes the expression 0 <= grade <= 100. This will first evaluate 0 <= grade, which is always true, because grade can't be negative. Then it will evaluate true <= 100, which is always true, because true is converted to the integer 1, and 1 <= 100 is true.
Yes, it does make a difference. In the first case you declare an array of 11 elements and a variable, both of type "unsigned int". In the second case you declare them as ints.
When int is 32 bits wide, you can have values in the following ranges:
–2,147,483,648 to 2,147,483,647 for plain int
0 to 4,294,967,295 for unsigned int
You normally declare something unsigned when you don't need negative numbers and you want the extra range that unsigned gives you. In your case I assume that, by declaring the variables unsigned, the developer doesn't accept negative scores and grades. You are basically building a tally of how many grades fall into each band of ten entered at the command line. So it looks like something simulating a school grading system, and therefore you don't have negative grades. But this is my opinion after reading the code.
Take a look at this post which explains what unsigned is:
what is the unsigned datatype?
As the name suggests, signed integers can be negative and unsigned ones cannot. If we represent an integer with N bits, then for unsigned the minimum value is 0 and the maximum value is 2^N - 1. If it is a signed integer of N bits, then it can take values from -2^(N-1) to 2^(N-1) - 1. This is because we need 1 bit to represent the sign +/-.
Ex: signed 3-bit integer (yes there are such things)
000 = 0
001 = 1
010 = 2
011 = 3
100 = -4
101 = -3
110 = -2
111 = -1
But for unsigned, the same bit patterns just represent the values [0,7]. In the signed case the most significant bit (MSB) signifies a negative value; that is, all values where the MSB is set are negative. Hence the apparent loss of a bit in the absolute values that can be represented.
It also behaves as one might expect. If you increment -1 (111) we get (1 000) but since we don't have a fourth bit it simply "falls off the end" and we are left with 000.
The same applies to subtracting 1 from 0. First take the two's complement
111 = twos_complement(001)
and add it to 000, which yields 111 = -1 (from the table), which is what one would expect. What happens when you increment 011 (=3), yielding 100 (=-4), is perhaps less intuitive. These overflows are troublesome with fixed-point arithmetic and have to be dealt with.
One other thing worth pointing out is that a signed integer can take one more negative value than positive, which has consequences for rounding (when using integers to represent fixed-point numbers, for example), but I am sure that's better covered in the DSP or signal processing forums.
I am currently trying to convert a string of length 18, which represents a long integer, into a long integer so that I can then do a multiplication with it. When I use stol to try to do the conversion, it throws an out_of_range exception any time the string length exceeds 10. It works fine when the string length is less than 10. Below are the relevant parts of my code:
for (unsigned int i = 0; i < numbers.size(); i++)
    numbers[i] = numbers[i].substr(0, 10);

for (unsigned int i = 0; i < numbers.size(); i++)
    n = stol(numbers[i].c_str());
This depends on the implementation. For example, with GCC targeting a 32-bit platform (and with most Windows compilers), long and int are the same: signed and 32 bits. This means that the largest number you can put there is 2147483647, which has 10 digits.
You should probably use stoll as its return type is long long -- 64 bits.
Are you sure long is more than 32 bits on your implementation, in particular can represent that number?
While the standard does not mandate the exact range of the type, it only guarantees numbers in the range +/- 2147483647 will fit.
Anything more is at the discretion of the implementation.
A 32-bit long only supports values between -2147483648 and 2147483647 inclusive. If you want to use a long long instead then you'll need to use stoll().
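A minimal sketch of the stoll approach (the 18-digit string is just an example value):

#include <iostream>
#include <string>

int main()
{
    std::string s = "123456789012345678";   // 18 digits: too big for a 32-bit long
    long long n = std::stoll(s);            // long long is at least 64 bits
    std::cout << n * 2 << '\n';
}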
This question is regarding the modulo operator %. We know in general a % b returns the remainder when a is divided by b and the remainder is greater than or equal to zero and strictly less than b. But does the above hold when a and b are of magnitude 10^9 ?
I seem to be getting a negative output for the following code for input:
74 41 28
However, changing the final output statement does the trick and the result becomes correct!
#include <iostream>
using namespace std;

#define m 1000000007

int main() {
    int n, k, d;
    cin >> n >> k >> d;
    if (d > n)
        cout << 0 << endl;
    else
    {
        long long *dp1 = new long long[n+1], *dp2 = new long long[n+1];

        // build dp1:
        dp1[0] = 1;
        dp1[1] = 1;
        for (int r = 2; r <= n; r++)
        {
            dp1[r] = (2 * dp1[r-1]) % m;
            if (r >= k+1) dp1[r] -= dp1[r-k-1];
            dp1[r] %= m;
        }

        // build dp2:
        for (int r = 0; r < d; r++) dp2[r] = 0;
        dp2[d] = 1;
        for (int r = d+1; r <= n; r++)
        {
            dp2[r] = ((2 * dp2[r-1]) - dp2[r-d] + dp1[r-d]) % m;
            if (r >= k+1) dp2[r] -= dp1[r-k-1];
            dp2[r] %= m;
        }

        cout << dp2[n] << endl;
    }
}
changing the final output statement to:
if(dp2[n]<0) cout<<dp2[n]+m<<endl;
else cout<<dp2[n]<<endl;
does the trick, but why was it required?
By the way, the code is actually my solution to this question
This is a limit imposed by the range of int.
int can only hold values between -2,147,483,648 and 2,147,483,647.
Consider using long long for your m, n, k, d & r variables. If possible use unsigned long long if your calculations should never have a negative value.
long long can hold values from –9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
while unsigned long long can hold values from 0 to 18,446,744,073,709,551,615. (2^64)
The range of positive values is approximately halved in signed types compared to unsigned types, because the most significant bit is used for the sign; when you try to store a positive value greater than the range of the data type, the most significant bit ends up set and the value gets interpreted as negative.
Well, no, modulo with positive operands does not produce negative results.
However .....
The int type is only guaranteed by the C standards to support values in the range -32767 to 32767, which means your macro m is not necessarily expanding to a literal of type int. It will fit in a long though (which is guaranteed to have a large enough range).
If that's happening (e.g. a compiler that has a 16-bit int type and a 32-bit long type) the results of your modulo operations will be computed as long, and may have values that exceed what an int can represent. Converting that value to an int (as will be required with statements like dp1[r] %= m since dp1 is a pointer to int) gives undefined behaviour.
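Whatever the integer widths, the sign fix in the question is needed because C++'s % takes the sign of its dividend, so a negative intermediate such as dp1[r] - dp1[r-k-1] stays negative after %= m. A sketch of the usual normalisation (the helper name is mine):

long long pos_mod(long long a, long long mod)
{
    long long r = a % mod;          // r has the sign of a
    return r < 0 ? r + mod : r;     // shift negative results into [0, mod)
}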
Mathematically, there is nothing special about big numbers, but computers only have a limited width to write down numbers in, so when things get too big you get "overflow" errors. A common analogy is the counter of miles traveled on a car dashboard - eventually it will show as all 9s and roll round to 0. Because of the way negative numbers are handled, standard signed integers don't roll round to zero, but to a very large negative number.
You need to switch to larger variable types so that they overflow less quickly - "long int" or "long long int" instead of just "int", the range doubling with each extra bit of width. You can also use unsigned types for a further doubling, since no range is used for negatives.
I tried to solve this problem: given N, K and M, find the maximum integer T such that N*(K^T) <= M. N, K and M can have values up to 10^18, so long long is sufficient.
I tried to solve it using iteration on T
int T = 0;
long long Kpow = 1;
while (1)
{
    long long prod = N*Kpow;
    if (prod > M)
        break;
    T++;
    Kpow = Kpow*K;
}
But since N*Kpow may go out of the range of long long, there is a need to handle the product using some big-integer type. But I found some other code which handles this case cleverly:
long long prod = N*Kpow;
if(prod < 0)
break;
I have always seen that on overflow the value of the variable becomes negative. Is that always the case, or can overflow sometimes also produce positive values?
From the point of view of the language, the behaviour of signed integer overflow is undefined. Which means anything could happen - it can be negative, it can be unchanged, the program can crash or it can order pizza online.
What will most likely happen in practice depends on the processor architecture on which you're running - so you'd have to consult the platform specs to know.
But I'd guess you can't guarantee overflow to be negative. As a contrived example:
signed char c = 127;
c += 255;
std::cout << (int)c << '\n';
This happens to print 126 on x86. But again, it could actually do anything.
No. The value of variable is not always negative in case of overflow.
With signed integers, overflow is undefined behavior (C11dr §3.7.1 3: "An example of undefined behavior is the behavior on integer overflow."), so there is no test you can do after the overflow that is certain to work across compilers and platforms.
Detect potential overflow before it can happen.
int T = 0;
long long Kpow = 1;
long long Kpow_Max = LLONG_MAX/K;
long long prod_Max = LLONG_MAX/N;
while (1)
{
    if (Kpow > prod_Max) Handle_Overflow();
    long long prod = N*Kpow;
    if (prod > M)
        break;
    T++;
    if (Kpow > Kpow_Max) Handle_Overflow();
    Kpow = Kpow*K;
}
Couldn't this problem be converted to K^T <= M / N (using integer division)?
As far as overflow detection goes, addition and subtraction are normally performed as if the numbers were unsigned, with the overflow flag set based on signed math and the carry/borrow flag set based on unsigned math.

For multiplication, the low-order half of the result is the same for signed and unsigned multiplies (this is why the ARM CPU only needs separate signed/unsigned multiplies for 64-bit results from 32-bit operands). Overflow occurs if the product is too large to fit in the register that receives it, for example a 32-bit multiply that results in a 39-bit product which is supposed to go into a 32-bit register.

For division, overflow can occur if the divisor is zero or if the quotient is too large to fit in the register that receives it, for example a 64-bit dividend divided by a 32-bit divisor resulting in a 40-bit quotient.

For multiply and divide, it doesn't matter whether the operands are signed or not, only whether the size of the result will fit in the register that receives it.
Just as in any other situation with signed integers of any length: overflow makes the number negative if and only if the particular bit that ends up in the sign bit is set.
Meaning that if the result of your arithmetic would, had your variable been twice as wide, leave the current sign bit clear, you can quite possibly end up with an erroneous result that is positive.
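A contrived sketch of both outcomes (signed overflow is undefined behaviour, so this only illustrates what commonly happens on a two's-complement machine with 32-bit int):

#include <iostream>

int main()
{
    int a = 65536, b = 65537, c = 32768;
    int positive_wrap = a * b;   // mathematically 2^32 + 65536; commonly wraps to 65536 (positive)
    int negative_wrap = a * c;   // mathematically 2^31; commonly wraps to -2147483648 (negative)
    std::cout << positive_wrap << ' ' << negative_wrap << '\n';
}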