Why BKDFHash doesn't care about out of range issue? - c++

unsigned int BKDRHash(const std::string& str){
unsigned int seed = 131; // 31 131 1313 13131 131313 etc..
unsigned int hash = 0;
for(std::size_t i = 0; i < str.length(); i++)
{
hash = (hash * seed) + str[i];
}
return hash;}
Why we don't need to care about if hash goes out of the range of unsigned int in the above code? I've seen several example code that did nothing about the overflow problem. Why it still works? What will happen when the value hash go out of the range of unsigned int?

It works because overflow doesn't actually happen for unsigned integer types per the standard:
3.9.1 Fundamental types [basic.fundamental]
Unsigned integers shall obey the laws of arithmetic modulo 2n where n is the
number of bits in the value representation of that particular size of
integer.48
This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting
unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
Ex: if an unsigned int arithmetic result x would otherwise exceed UINT_MAX, the result is exactly:
x % (UINT_MAX+1)
thereby leaving you a result within 0...UINT_MAX

Related

Why are these two numbers comparing equal?

I'm wondering why it's the case that these two numbers are comparing equal. I had a (maybe false) realisation that I messed up my enums, because in my enums I often do:
enum class SomeFlags : unsigned long long { ALL = ~0, FLAG1 = 1, FLAG2 };
And I thought that 0 being an int and being assigned to an unsigned long long, that there was a mistake.
unsigned long long number1 = ~0;
/* I expect the 0 ( which is an int, to be flipped, which would make it
the max value of an unsigned int. It would then get promoted and be
assigned that value to an unsigned long long. So number1 should
have the value of max value of unsigned int*/
int main()
{
unsigned long long number2 = unsigned long long(0);
number2 = number2 - 1;
/* Here I expect number2 to have the max value of unsigned long long*/
if (number1 == number2)
{
std::cout << "Is same\n";
// They are the same, but how???
}
}
This is really confusing. To emphasise the point:
unsigned long long number1 = int(~0);
/* I feel like I'm saying:
- initialize an int with the value of zero with its bits flipped
- assign it to the unsigned long long
And this somehow ends up as the max value of unsigned long long???*/
cppreference: "The result of operator~ is the bitwise NOT (one's complement) value of the argument (after promotion."
No promotion (to int or unsigned int) is however needed here - but flipping all bits in a signed type with the value 0 will make it -1 (all bits set) which when converted to an unsigned type will be the largest value that type can hold.
To avoid this, make 0 unsigned. The result of ~0U will be an unsigned int with all bits set, which can be converted to any larger unsigned type perfectly.
unsigned long long number1 = ~0U;
You seems to be thinking that the conversion of int to long long is done just by prepending 0 in front of the int. While this is true for the conversion of positive integer, it's not true for negative integers.
unsigned long long number1 = ~0
is identical as
unsigned long long number1 = -1
So when the conversion takes place, it is not simply copying the bits into number1
Since C++11 unsigned arithmetic must use modulo wrapping.
Which means that if you have N bits then any operation OP works like this:
result = (a OP b) % (2^N)
Say you have 8 bits and substract as operation.
unsigned char result1 = (0 - 1) % 256 = -1 % 256 = 255 = 0b11111111
unsigned char result2 = ~(0b00000000) = 0b11111111 = 255
The same holds for larger N and thus your result
In C/C++ integer promotion is governed by the type of the SOURCE argument, not the DESTINATION.
So - the promotion from signed integer types will be sign extended - regardless the type of the destination, be it signed or unsigned.
The promotion from unsigned integer types will be zero extended - again, regardless the type of destination.
In your case 0 is signed integer and ~0 is also signed integer. This will be sign extended to unsigned long long.
The conversion from int to unsigned long long is specified based on the value of the int (aka "the source integer") not the bit pattern. The value is -1 in this case:
4.7 Integral conversions [conv.integral]
A prvalue of an integer type can be converted to a prvalue of another integer type. A prvalue of an unscoped enumeration type can be converted to a prvalue of an integer type.
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]
For an integer with a value of -1 the "least unsigned integer congruent to the source integer" modulo 2^64 is (-1 + 2^64).
The simple-to-understand effect of this rule is that int is sign-extended when being converted to an unsigned type.

Assign a negative number to an unsigned int

This code gives the meaningful output
#include <iostream>
int main() {
unsigned int ui = 100;
unsigned int negative_ui = -22u;
std::cout << ui + negative_ui << std::endl;
}
Output:
78
The variable negative_ui stores -22, but is an unsigned int.
My question is why does unsigned int negative_ui = -22u; work.
How can an unsigned int store a negative number? Is it save to be used or does this yield undefined behaviour?
I use the intel compiler 18.0.3. With the option -Wall no warnings occurred.
Ps. I have read What happens if I assign a negative value to an unsigned variable? and Why unsigned int contained negative number
How can an unsigned int store a negative number?
It doesn't. Instead, it stores a representable number that is congruent with that negative number modulo the number of all representable values. The same is also true with results that are larger than the largest representable value.
Is it save to be used or does this yield undefined behaviour?
There is no UB. Unsigned arithmetic overflow is well defined.
It is safe to rely on the result. However, it can be brittle. For example, if you add -22u and 100ull, then you get UINT_MAX + 79 (i.e. a large value assuming unsigned long long is a larger type than unsigned) which is congruent with 78 modulo UINT_MAX + 1 that is representable in unsigned long long but not representable in unsigned.
Note that signed arithmetic overflow is undefined.
Signed/Unsigned is a convention. It uses the last bit of the variable (in case of x86 int, the last 31th bit). What you store in the variable takes the full bit length.
It's the calculations that follow that take the upper bit as a sign indicator or ignore it. Therefore, any "unsigned" variable can contain a signed value which will be converted to the unsigned form when the unsigned variable participates in a calculation.
unsigned int x = -1; // x is now 0xFFFFFFFF.
x -= 1; // x is now 0xFFFFFFFE.
if (x < 0) // false. x is compared as 0xFFFFFFFE.
int x = -1; // x stored as 0xFFFFFFFF
x -= 1; // x stored as 0xFFFFFFFE
if (x < 0) // true, x is compared as -2.
Technically valid, bad programming.

Squaring number in c++, Kaprekar numbers [duplicate]

This question already has answers here:
Multiplication of two integers in C++
(3 answers)
Closed 6 years ago.
Found this issue in C++ while detecting Kaprekar numbers in a range. For number 77778 -
unsigned long long sq = pow(n, 2);
returns 6,049,417,284 while
unsigned long long sq = n * n;
returns 1,754,449,988
Any ideas why? Is this some sort of overflow which pow avoids but normal n*n does not.
Assuming your n to be typical int or unsigned int, the reason for this is because
this line
unsigned long long sq = n * n;
is equivalent to
unsigned long long sq = (int)(n * n);
as the n * n will be first processed (both as integers) before assigning the result to sq. So, this is an overflow problem (And welcome to Stack Overflow too!).
You also probably want to understand these terms overflow and casting more by searching around (since they are very common issues in Computing, understanding them early will be of great help!).
This has nothing to do with Kaprekar numbers. In most of nowadays machine int is 32-bit. Thus it can only handle value -2,147,483,648 to 2,147,483,647 (or 0 to 4,294,967,295 for unsigned integer counter part).
Thus processing n * n will give you:
n * n = 6,049,417,284 - 4,294,967,296 = 1,754,449,988 //overflow at (4,294,967,295 + 1)!
If you do casting before hand:
unsigned int n = 77778;
unsigned long long sq = pow(n, 2);
unsigned long long sq2 = (unsigned long long)n * n; //note the casting here.
std::cout << sq << std::endl;
std::cout << sq2 << std::endl;
Then the results will be identical, since there won't be overflow.
Your n is declared as a 32 bit int. You need to either change it to long long or just typecast the operation into long long.
unsigned long long sq=(unsigned long long)n*n;
this will give the right answer
I suspect that n is declared as unsigned int and you've compiler with a data model that assumes int to be 32 bits wide. The maximum value that can be represented with this type would be 232 - 1 = 4294967295. Anything beyond this value would wrap around. So assigning 4294967296 would become 0, 4294967297 would become 1, and so on.
You have an overflow; since both operands are unsigned int the resulting type would be the same too. The true result of the operation would be 6049417284. Assigning it to an unsigned int would (wrap) and become 1754449988 = 6049417284 - 4294967296. This unsigned int result is assigned to a wider type unsigned long long, which doesn't change the value. It's necessary to understand the difference between the result's type (the type of the expression) and destination type (the type of the variable that is going to hold the result).
Wrap around behaviour (more formally modulo n) in unsigned types is well-defined in C++, so the compiler might not warn you.
Quote from Unsigned Arithmetic:
If an unsigned integer overflows, the result is defined modulo 2w, where w is the number of bits in that particular unsigned integer. By implication, an unsigned integer is never negative.

Can n %= m ever return negative value for very large nonnegative n and m?

This question is regarding the modulo operator %. We know in general a % b returns the remainder when a is divided by b and the remainder is greater than or equal to zero and strictly less than b. But does the above hold when a and b are of magnitude 10^9 ?
I seem to be getting a negative output for the following code for input:
74 41 28
However changing the final output statement does the work and the result becomes correct!
#include<iostream>
using namespace std;
#define m 1000000007
int main(){
int n,k,d;
cin>>n>>k>>d;
if(d>n)
cout<<0<<endl;
else
{
long long *dp1 = new long long[n+1], *dp2 = new long long[n+1];
//build dp1:
dp1[0] = 1;
dp1[1] = 1;
for(int r=2;r<=n;r++)
{
dp1[r] = (2 * dp1[r-1]) % m;
if(r>=k+1) dp1[r] -= dp1[r-k-1];
dp1[r] %= m;
}
//build dp2:
for(int r=0;r<d;r++) dp2[r] = 0;
dp2[d] = 1;
for(int r = d+1;r<=n;r++)
{
dp2[r] = ((2*dp2[r-1]) - dp2[r-d] + dp1[r-d]) % m;
if(r>=k+1) dp2[r] -= dp1[r-k-1];
dp2[r] %= m;
}
cout<<dp2[n]<<endl;
}
}
changing the final output statement to:
if(dp2[n]<0) cout<<dp2[n]+m<<endl;
else cout<<dp2[n]<<endl;
does the work, but why was it required?
By the way, the code is actually my solution to this question
This is a limit imposed by the range of int.
int can only hold values between –2,147,483,648 to 2,147,483,647.
Consider using long long for your m, n, k, d & r variables. If possible use unsigned long long if your calculations should never have a negative value.
long long can hold values from –9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
while unsigned long long can hold values from 0 to 18,446,744,073,709,551,615. (2^64)
The range of positive values is approximately halved in signed types compared to unsigned types, due to the fact that the most significant bit is used for the sign; When you try to assign a positive value greater than the range imposed by the specified Data Type the most significant bit is raised and it gets interpreted as a negative value.
Well, no, modulo with positive operands does not produce negative results.
However .....
The int type is only guaranteed by the C standards to support values in the range -32767 to 32767, which means your macro m is not necessarily expanding to a literal of type int. It will fit in a long though (which is guaranteed to have a large enough range).
If that's happening (e.g. a compiler that has a 16-bit int type and a 32-bit long type) the results of your modulo operations will be computed as long, and may have values that exceed what an int can represent. Converting that value to an int (as will be required with statements like dp1[r] %= m since dp1 is a pointer to int) gives undefined behaviour.
Mathematically, there is nothing special about big numbers, but computers only have a limited width to write down numbers in, so when things get too big you get "overflow" errors. A common analogy is the counter of miles traveled on a car dashboard - eventually it will show as all 9s and roll round to 0. Because of the way negative numbers are handled, standard signed integers don't roll round to zero, but to a very large negative number.
You need to switch to larger variable types so that they overflow less quickly - "long int" or "long long int" instead of just "int", the range doubling with each extra bit of width. You can also use unsigned types for a further doubling, since no range is used for negatives.

What if I try to assign values greater than pow(2,64)-1 to unsigned long long in c++?

If I have two unsigned long long values say pow(10,18) and pow (10,19) and I multiply them and store the output in another variable of type unsigned long long...the value which we get is obviously not the answer but does it have any logic? We get a junk type of value each time we try to this with arbitrarily large numbers, but do the outputs have any logic with the input values?
Unsigned integral types in C++ obey the rules of modular arithmetic, i.e. they represent the integers modulo 2N, where N is the number of value bits of the integral type (possibly less than its sizeof times CHAR_BIT); specifically, the type holds the values [0, 2N).
So when you multiply two numbers, the result is the remainder of the mathematical result divided by 2N.
The number N is obtainable programmatically via std::numeric_limits<T>::digits.
Yes, there's a logic.
As KerreK wrote, integers are "wrapped around" the 2N bits that constitute the width of their datatype.
To make it easy, let's consider the following:
#include <iostream>
#include <cmath>
using namespace std;
int main() {
unsigned char tot;
unsigned char ca = 200;
unsigned char ca2 = 200;
tot = ca * ca2;
cout << (int)tot;
return 0;
}
(try it: http://ideone.com/nWDYjO)
In the above example an unsigned char is 1 byte wide (max 255 decimal value), when multiplying 200 * 200 we get 40000. If we try to store it into the unsigned char it won't obviously fit.
The value is then "wrapped around", that is, tot gets the result of the following
(ca*ca2) % 256
where 256 are the bit of the unsigned char (1 byte), 28 bits
In your case you would get
(pow(10,18) * pow (10,19)) %
2number_of_bits_of_unsigned_long_long(architecture_dependent)