Hash Function Clarification - c++

Went over this in class today:
const int tabsize = 100000;
int hash(string s) {
const int init = 21512712, mult = 96169, emergency = 876127;
int v = init;
for (int i=0; i<s.length(); i+=1)
v = v * mult + s[i];
if (v < 0) v = -v;
if (v < 0) v = emergency;
return v % tabsize;
}
Having some trouble figuring out what the last 2 if-statements are supposed to do.
Any ideas?
Thanks

The first if statement deals with signed integer overflow. If v grows so large that it wraps around and becomes negative, this statement negates it so that a non-negative value is returned.
The second if statement takes care of the rare case where v is -2147483648.
Note that positive signed 32-bit integers only go up to 2^31 - 1, or 2147483647, while negative ones can go down to -2^31, or -2147483648. Negating that number still gives a negative number, because the positive value is not representable. So that is what the emergency number is for:
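#include <iostream>
int main() {
    int t = -2147483648;
    // Negating INT_MIN overflows (undefined behavior); typically prints -2147483648 again.
    std::cout << (-t) << std::endl;
}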

They ensure that v is non-negative, because applying the % operator to a negative number can give a negative result, which is not desirable for a hash value.
However, this relies on signed integer overflow, which is undefined behavior, so it might not work everywhere.
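One way to sidestep the undefined behavior is to do the arithmetic on unsigned values, where wraparound is well defined, so neither if statement is needed. A minimal sketch, assuming unsigned int is at least 32 bits wide; the name hash_unsigned is made up here, and the hash values will generally differ from the classroom version:

```cpp
#include <cassert>
#include <string>

const unsigned tabsize = 100000;

// Same recurrence as the classroom hash, but computed on unsigned values.
// Unsigned overflow wraps modulo 2^N, so v can never become negative.
unsigned hash_unsigned(const std::string& s) {
    const unsigned init = 21512712, mult = 96169;
    unsigned v = init;
    for (std::string::size_type i = 0; i < s.length(); ++i) {
        // Go through unsigned char so that characters with the high bit set
        // do not contribute a negative value.
        v = v * mult + static_cast<unsigned char>(s[i]);
    }
    return v % tabsize;
}
```

For the empty string this returns init % tabsize, which is 12712.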

Related

How to check for INT_MAX without overflowing int

I created a BigInt class which allows for huge numbers (in any base from 2 to 36) far beyond the integer max by storing each digit in a vector. I need to be able to convert this back to an integer but return the int max/min instead if the max is reached, otherwise there will of course be an integer overflow.
My question is how can I check if I have exceeded the max without overflowing the integer I am building. I have tried moving the if statements at the bottom into the for loop but my integer still overflows. I feel like the solution is simple but I just can't grasp it.
// Convert BigInt to integer base 10 and return that int
// If BigInt > INT_MAX, return INT_MAX.
// If BigInt < INT_MIN, return INT_MIN.
int BigInt::to_int() const{
int number = 0;
for(size_t i = 0; i < vec.size(); i++) {
number += vec[i] * pow(base, i);
}
if (!isPositive) { number *= -1; }
if (number > INT_MAX) { return INT_MAX; }
if (number < INT_MIN) { return INT_MIN; }
return number;
}
Comparing an int value to INT_MAX is pointless except for equality because all values are less than or equal.
Performing overflow check after the overflowing signed operations is pointless because either they show that there was no overflow or the behaviour of the program is undefined. Always do the check before attempting operations that would overflow the result.
In this case, convert INT_MAX to your BigInt type and compare that with *this.
Preliminary info: you need to check for overflow before calculating something that overflows. If the overflow already happened, it's too late.
Adding an overflow check to your version of to_int() is tricky because you build up your value starting from the one's place. Because of this approach, you try to add pow(base, i), which could overflow an int by itself and that is not easy to detect in advance. Possible, but let's consider something else.
If you were to build up your value ending at the one's place (i.e. repeatedly calculate number*base + digit), you could check for overflow before multiplying. Here is some math, using shorter names for an easier read. Let x and M be integers, base a positive integer, and d some non-negative integer less than base. (M short for "max" and d short for "digit".) Division will mean real-valued division, as I can trunc() the result to get integer division. We want to know how x*base + d compares to M.
If x*base + d <= M then dividing by base gives x + d/base <= M/base, hence x <= trunc(M/base).
By the contrapositive, if x > trunc(M/base) then x*base + d > M.
If x*base + d >= M then dividing by base gives x + d/base >= M/base, hence x >= trunc(M/base).
By the contrapositive, if x < trunc(M/base) then x*base + d < M.
If x == trunc(M/base) then x*base == trunc(M/base)*base. Add M%base to both sides to get x*base + M%base == M. Well, I hope you'll accept the observation that trunc(M/base)*base + M%base == M. If you can accept that much, then the comparison between x*base + d and M is the same as the comparison between d and M%base.
Done with the math. Let's put this into code. You might note a performance increase as well, depending on how your compiler optimizes.
// Requires <climits> for INT_MAX and INT_MIN.
// Tests if number * base + next_digit will overflow an int.
// Assumes number >= 0, base > 0, and 0 <= next_digit < base.
bool will_overflow(int number, int base, int next_digit )
{
    if ( number > INT_MAX/base )
        return true;
    if ( number < INT_MAX/base )
        return false;
    // It's close enough that the next digit decides it.
    return next_digit > INT_MAX % base;
}
// Convert BigInt to integer base 10 and return that int
// If BigInt > INT_MAX, return INT_MAX.
// If BigInt < INT_MIN, return INT_MIN.
int BigInt::to_int() const {
    int number = 0;
    // Loop in the reverse direction. Be careful with unsigned values!
    for(size_t i = vec.size(); i > 0; --i) {
        if ( will_overflow(number, base, vec[i-1]) )
            return isPositive ? INT_MAX : INT_MIN;
        number = number * base + vec[i-1];
    }
    // number holds the magnitude; apply the sign last.
    return isPositive ? number : -number;
}
I will point out one small cheat in this. There is a single negative value that fits in an int, but whose absolute value is greater than INT_MAX. If that singular value comes up, this function will incorrectly detect it as an overflow and return INT_MIN. Fortunately, that works out fine since the singular value is INT_MIN. :)
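To try the idea without the rest of the class, here is a standalone sketch. The free function digits_to_int and its parameters are hypothetical stand-ins for BigInt::to_int(), vec, base and isPositive, and it assumes digits are stored least significant first, as in the question:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for BigInt::to_int().
// digits[0] is the ones place; every digit satisfies 0 <= digit < base.
int digits_to_int(const std::vector<int>& digits, int base, bool is_positive) {
    int number = 0;
    // Walk from the most significant digit down to the ones place.
    for (std::size_t i = digits.size(); i > 0; --i) {
        const int d = digits[i - 1];
        // Check BEFORE computing number * base + d, so nothing overflows.
        if (number > INT_MAX / base ||
            (number == INT_MAX / base && d > INT_MAX % base)) {
            return is_positive ? INT_MAX : INT_MIN;
        }
        number = number * base + d;
    }
    // number holds the magnitude; apply the sign last.
    return is_positive ? number : -number;
}
```

With a 32-bit int, the digits of 123 give 123 or -123 depending on the sign flag, and eleven 9s in base 10 clamp to INT_MAX or INT_MIN.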

How to get a right shifted number without actual calculation?

I have the following snippet:
int n = 10;
int k = n>>1;
std::cout<<k;
This prints 5.
I want k to be the last digit in binary representation of n.
Like bin(n) = 1010
So, I want k to be 0.
I understand long methods are possible. Please suggest a one liner if possible.
Edit:
After going through the comments and answers, I discovered that there are various ways of doing that.
Some of them are:
k = n%2
k = n&1
Thanks to all those who answered the question. :)
#include <iostream>
int main( )
{
    unsigned int val= 0x1010;
    //so you just want the least significant bit?
    //and assign it to another int?
    unsigned int assign= val & 0x1;
    std::cout << assign << std::endl;
    val= 0x1001;
    assign= val & 0x1;
    std::cout << assign << std::endl;
    return 0;
}
UPDATE:
I would add that bit masking is not uncommon in C. I often use ints to hold state flags:
#define STATE_MOTOR_RUNNING 0x0001
#define STATE_UPDATE_DISPLAY 0x0002
#define STATE_COUNTER_READY 0x0004
Then:
unsigned int state= STATE_COUNTER_READY;
if( state & STATE_COUNTER_READY )
{
start_motor( );
state|= STATE_MOTOR_RUNNING;
}
//etc...
You aren't going to be able to avoid some calculation.
int k = n % 10;
will get you the last decimal digit, as that assignment gives k the remainder of division by 10.
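For the binary digit that the question actually asks about, the two one-liners from the edit agree for non-negative n but differ for negative odd values, because % keeps the sign of the dividend. A small sketch; the function names are made up for illustration, and the n & 1 result for negative n assumes two's complement, which is what essentially all current platforms use:

```cpp
#include <cassert>

// Lowest binary digit via a bit mask: always 0 or 1.
int last_bit_mask(int n) {
    return n & 1;
}

// Lowest binary digit via remainder: 0 or 1 for n >= 0,
// but -1 for negative odd n, since the result takes the sign of n.
int last_bit_mod(int n) {
    return n % 2;
}
```

So for n = 10 both return 0, while for n = -3 the mask gives 1 and the remainder gives -1.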

Floating point error in C++ code

I am trying to solve a problem in which I need to find the number of possible ways to make a team of two members (note: a team can have at most two people).
My code works properly, but in some test cases it reports a floating point error and I can't figure out what exactly is wrong.
Input: 1st line: number of test cases
2nd line: total number of people
Thank you
#include<iostream>
using namespace std;
long C(long n, long r)
{
long f[n + 1];
f[0] = 1;
for (long i = 1; i <= n; i++)
{
f[i] = i * f[i - 1];
}
return f[n] / f[r] / f[n - r];
}
int main()
{
long n, r, m,t;
cin>>t;
while(t--)
{
cin>>n;
r=1;
cout<<C(n, min(r, n - r))+1<<endl;
}
return 0;
}
You aren't hitting a floating-point problem. You are getting an integer divide by zero, which many systems report as a "floating point exception" (SIGFPE on Linux, for example). Your code is attempting to divide by 0, which is undefined for integers.
When you invoke C(100, 1), the loop inside C fills the f array with factorials, which grow extremely fast. Eventually, two values are multiplied such that i * f[i-1] is zero due to overflow. That leads to all the subsequent f[i] values being zero as well. And then the division that follows the loop is a division by zero.
Although purists on these forums will say this is undefined, here's what's really happening on most 2's complement architectures. Or at least on my computer....
At i==21:
f[20] is already equal to 2432902008176640000.
21 * 2432902008176640000 overflows a 64-bit signed integer and will typically become -4249290049419214848. So at this point, your program is bugged and is now in undefined behavior.
At i==66:
f[65] is equal to 0x8000000000000000 (that is, 2^63, because 65! contains exactly 63 factors of 2). 66 is even, so 66 * f[65] is a multiple of 2^64 and wraps to zero. That is the typical result, but it should still be understood as undefined behavior.
With f[66] assigned to 0, all subsequent assignments of f[i] become zero as well. After the main loop inside C is over, the f[n-r] is zero. Hence, divide by zero error.
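The wraparound described above can be reproduced without undefined behavior by using an unsigned 64-bit type, where overflow is defined to wrap modulo 2^64. A minimal sketch; the function name factorial_mod_2_64 is made up for this illustration:

```cpp
#include <cassert>
#include <cstdint>

// Returns n! reduced modulo 2^64. Unsigned overflow wraps, so this is
// well defined and mirrors what typically ends up in the f array.
std::uint64_t factorial_mod_2_64(unsigned n) {
    std::uint64_t f = 1;
    for (unsigned i = 1; i <= n; ++i) {
        f *= i;
    }
    return f;
}
```

20! is still exact, 65! leaves only the top bit set, and from 66! onwards every value is zero, which is what the division in the original C then trips over.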
Update
I went back and reverse engineered your problem. It seems like your C function is just trying to compute this expression:
N!
-------------
R! * (N-R)!
Which is the number of combinations of R items chosen from N (the binomial coefficient).
In which case, instead of computing the large factorial N!, we can reduce that expression to this:
(N-R+1) * (N-R+2) * ... * N
---------------------------
R!
This won't eliminate overflow, but will allow your C function to be able to take on larger values of N and R to compute the number of combinations without error.
But we can also take advantage of simple reduction before trying to do a big long factorial expression
For example, let's say we were trying to compute C(15,5). Mathematically that is:
15!
--------
10! 5!
Or as we expressed above:
1*2*3*4*5*6*7*8*9*10*11*12*13*14*15
-----------------------------------
1*2*3*4*5*6*7*8*9*10 * 1*2*3*4*5
The first 10 factors of the numerator and denominator cancel each other out:
11*12*13*14*15
-----------------------------------
1*2*3*4*5
But intuitively, you can see that "12" in the numerator is already evenly divisible by denominators 2 and 3. And that 15 in the numerator is evenly divisible by 5 in the denominator. So simple reduction can be applied:
11*2*13*14*3
-----------------------------------
1 * 4
There's even more room for greatest common divisor reduction, but this is a great start.
Let's start with a helper function that computes the product of all the values in a list.
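// Requires <climits>, <iostream> and <vector>.
// Returns the product of the values (all must be positive),
// or 0 if the product would overflow a long long.
long long multiply_vector(const std::vector<int>& values)
{
    long long result = 1;
    for (long long v : values)
    {
        // Check before multiplying: signed overflow is undefined behavior,
        // so testing for a negative result afterwards is not reliable.
        if (result > LLONG_MAX / v)
        {
            std::cout << "ERROR - multiply_vector hit overflow" << std::endl;
            return 0;
        }
        result = result * v;
    }
    return result;
}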
Now let's implement C using the above function after doing the reduction operation
long long C(int n, int r)
{
    if ((r > n) || (n < 0) || (r < 0))
    {
        std::cout << "invalid parameters passed to C" << std::endl;
        return 0;
    }
    // compute
    //        n!
    //   -------------
    //   r! * (n-r)!
    //
    // assume (0 <= r <= n)
    // Which maps to
    //   (n-r+1) * (n-r+2) * ... * n
    //   ---------------------------
    //               r!
    int end = n;
    int start = n - r + 1;
    std::vector<int> numerators;
    std::vector<int> denominators;
    for (int i = start; i <= end; i++)
    {
        numerators.push_back(i);
    }
    for (int i = 2; i <= r; i++)
    {
        denominators.push_back(i);
    }
    // Cancel every denominator that evenly divides a numerator.
    for (size_t ni = 0; ni < numerators.size(); ni++)
    {
        for (size_t di = 0; di < denominators.size(); di++)
        {
            int dval = denominators[di];
            // Re-read numerators[ni] each time: it shrinks as factors cancel.
            if ((numerators[ni] % dval) == 0)
            {
                denominators[di] = 1;
                numerators[ni] /= dval;
            }
        }
    }
    long long numerator = multiply_vector(numerators);
    long long denominator = multiply_vector(denominators);
    if ((numerator == 0) || (denominator == 0))
    {
        std::cout << "Giving up. Can't resolve overflow" << std::endl;
        return 0;
    }
    return numerator / denominator;
}
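As an alternative to cancelling factors in two lists, a different and shorter technique is the multiplicative formula, which multiplies and divides in alternation so the running value stays small. This is a sketch of that swapped-in approach, not the code above; the name binomial is made up here:

```cpp
#include <cassert>

// C(n, r) by the multiplicative formula.
// After step i the running value equals C(n - r + i, i), which is an
// integer, so every division below is exact.
long long binomial(int n, int r) {
    if (n < 0 || r < 0 || r > n) {
        return 0;  // invalid input
    }
    if (r > n - r) {
        r = n - r;  // C(n, r) == C(n, n - r); use the smaller one
    }
    long long result = 1;
    for (int i = 1; i <= r; ++i) {
        // The intermediate product can still overflow for very large
        // results; this sketch does not guard against that.
        result = result * (n - r + i) / i;
    }
    return result;
}
```

For example, binomial(15, 5) goes through 11, 66, 286, 1001 and ends at 3003, and a test case like (1000000000, 999999999) reduces to a single step.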
You are not using floating-point. And you seem to be using variable-length arrays, which are a C99 feature and available in C++ only as a compiler extension (GCC, for example), not as part of the standard.
Anyway, you will get overflow and therefore undefined behaviour even for rather small values of n.
In practice the overflow will lead to array elements becoming zero for not much larger values of n.
Your code will then divide by zero and crash.
They also might have a test case like (1000000000, 999999999) which is trivial to solve, but not for your code which I bet will crash.
You don't specify what you mean by "floating point error" - I reckon you are referring to the fact that you are doing an integer division rather than a floating point one so that you will always get integers rather than floats.
int a, b;
a = 7;
b = 2;
std::cout << a / b << std::endl;
This will result in 3, not 3.5! If you want a floating point result you should use floats instead, like this:
float a, b;
a = 7;
b = 2;
std::cout << a / b << std::endl;
So the solution to your problem would simply be to use float instead of long long int.
Note also that you are using variable-sized arrays, which are not standard C++. Why not use std::vector instead?
Array syntax as:
type name[size]
Note: size must be a constant, not a variable
Example #1:
int name[10];
Example #2:
const int asize = 10;
int name[asize];
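When the size is only known at run time, as with f[n + 1] in the question, std::vector is the standard replacement. A minimal sketch; the function name factorial_table is made up, and the values still wrap around for n above 20 on a 64-bit type:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a table of factorials 0! .. n! with a run-time size.
// std::vector replaces the non-standard `long f[n + 1]`.
std::vector<unsigned long long> factorial_table(std::size_t n) {
    std::vector<unsigned long long> f(n + 1);
    f[0] = 1;
    for (std::size_t i = 1; i <= n; ++i) {
        f[i] = i * f[i - 1];  // unsigned, so overflow wraps instead of being undefined
    }
    return f;
}
```

The table for n = 5 has six entries and ends in 120.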

My RSA encryption produces 2^64 every time (C++)

I have written an attempt at my own RSA algorithm, but the encryption portion isn't quite working when I use fairly large numbers (nothing like the size which should be used for RSA) and I'm not sure why.
It works in the following way:
The input is a list of characters, for this example "abc"
This is converted to an array: [10,11,12]. (I have chosen 10 - 35 for lower case letters so that they are all 2 digit numbers just to make it easier)
The numbers are combined to form 121110 (using 12*100^2 + 11*100^1 + 10*100^0)
Apply the algorithm: m^e (mod n)
This is simplified using a^b (mod n) = a^c (mod n) * a^d (mod n)
This works for small values in that it can be deciphered using the decryption program which I have written.
When using larger values the output is always 1844674407188030241, with a little bit of research I found that this is roughly 2^64 (to 10 significant figures, it has been pointed out that odd numbers can't be powers of two, oops). I am sure that there is something that I have overlooked and I apologise for what (I really hope) will be a trivial question with an easy answer. Why is the output value always 2^64 and what can I change to fix it? Thank you very much for any help, here is my code:
#include <iostream>
#include <string>
#include <math.h>
int returnVal (char x)
{
return (int) x;
}
unsigned long long modExp(unsigned long long b, unsigned long long e, unsigned long long m)
{
unsigned long long remainder;
int x = 1;
while (e != 0)
{
remainder = e % 2;
e= e/2;
if (remainder == 1)
x = (x * b) % m;
b= (b * b) % m;
}
return x;
}
unsigned mysteryFunction(const std::string& input)
{
unsigned result = 0;
unsigned factor = 1;
for (size_t i = 0; i < input.size(); ++i)
{
result += factor * (input[i] - 87);
factor *= 100;
}
return result;
}
int main()
{
unsigned long long p = 70021;
unsigned long long q = 80001;
int e = 7;
unsigned long long n = p * q;
std::string foo = "ab";
for (int i = 0; i < foo.length(); i++);
{
std::cout << modExp (mysteryFunction(foo), e, n);
}
}
Your code has several problems.
Problem 1: Inconsistent use of unsigned long long.
int x = 1;
Changing this declaration in modExp to unsigned long long causes the program to give a more reasonable-looking result. I don't know whether it's the correct result, but it's less than n, at least. I'm still not sure what the exact mechanism of the error was. I can see ways it would have screwed things up, but none that could have caused an output of 1844674407188030241.
Problem 2: Composite "primes".
For RSA, p and q both need to be prime. Neither p nor q is prime in your code.
70021 = 7^2 * 1429
80001 = 3^3 * 2963
In mysteryFunction, you subtract 87, which corresponds to 'W', from the input characters. That maps 'a' to 10, as described in the question. If you want 'a' to map to 0, subtract 97 instead, which corresponds to 'a'.
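Beyond the type of x, note that with n around 5.6 * 10^9 the product b * b can exceed 2^64, so even an all unsigned long long modExp can wrap. One way around that is to do the multiplication itself modulo m. A minimal sketch, assuming 0 < m <= 2^63; the names mul_mod and mod_exp are made up here and are not from the question:

```cpp
#include <cassert>
#include <cstdint>

// (a * b) % m without overflow, by double-and-add.
// Requires 0 < m <= 2^63 so that the sums below stay under 2^64.
std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b, std::uint64_t m) {
    std::uint64_t result = 0;
    a %= m;
    while (b != 0) {
        if (b & 1) {
            result = (result + a) % m;
        }
        a = (a * 2) % m;
        b >>= 1;
    }
    return result;
}

// (base ^ exp) % m by square-and-multiply, same structure as modExp
// in the question but with every product taken through mul_mod.
std::uint64_t mod_exp(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp != 0) {
        if (exp & 1) {
            result = mul_mod(result, base, m);
        }
        base = mul_mod(base, base, m);
        exp >>= 1;
    }
    return result;
}
```

As a quick check, mod_exp(4, 13, 497) is 445.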

Data type problems

Can you explain to me why this doesn't work:
#include <iostream>
using namespace std;
double data_convert(int n);
int main(void) {
cout << data_convert(sizeof(int));
}
double data_convert(int n) {
int i;
double x;
x = 8 * n;
for(i = 0; i < 32; i++)
x = x * 32;
return x;
}
I tried using pow from cmath, but I got the same results. Apparently, this outputs "4.67681e+049", whereas it should output (using Windows Calculator) "4294967296".
The for loop is my own hardcoded pow() function for this specific task. All I want to do is make a program that can show how big a data type is, along with its range (bit range or something, yeah?)
If you want 2^32, you should be multiplying by 2 each time. Your code multiplies by 32 each time, so you'll end up with a much larger value.
Also, your x value should start from 1. 8 * n is actually the number of bits in the integer, so that should be your upper limit for the loop:
x = 1;
for (i = 0; i < 8 * n; i++)
x = x * 2;
return x;
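A simpler method would be to bitwise negate an unsigned 0, which will give you the largest possible unsigned integer:
return ~0u;
will give you 2^32 - 1 = 4294967295 (when unsigned int is 32 bits wide). Note that a plain ~0 is a signed int with the value -1, so the function would return -1 instead.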
Basically you are multiplying the input by 8 and are then multiplying that by 32, 32 times.
I don't understand what that is supposed to get you.
If you want the range of an unsigned integer for x amount of bytes you should use this calculation:
max number = 2^(bytes*8) - 1
So the loop should start from 1 and multiply by 2 while i goes from 0 up to, but not including, bytes*8.
I'm not sure what you're doing, but don't you mean x = x*2?
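Putting the answers together, the loop can be avoided entirely. A minimal sketch using std::ldexp for the power of two and std::numeric_limits for the maximum; the function names value_count and max_unsigned_int are made up here, and the expected numbers assume 8-bit bytes and a 32-bit unsigned int:

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <limits>

// Number of distinct values an n-byte unsigned type can hold:
// 2^(n * CHAR_BIT), computed exactly as a double.
double value_count(int n_bytes) {
    return std::ldexp(1.0, n_bytes * CHAR_BIT);
}

// Largest unsigned int, taken from the standard library instead of
// being computed by hand.
double max_unsigned_int() {
    return static_cast<double>(std::numeric_limits<unsigned int>::max());
}
```

With those assumptions, value_count(4) is 4294967296 and max_unsigned_int() is one less.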