I have this function which generates a specified number of so-called 'triangle numbers'. If I print out the deque afterwards, the numbers increase for a while, jump down, then increase again. Triangle numbers should never decrease as i rises, so there must be some kind of overflow happening. I tried to fix it by adding the line if(toPush > INT_MAX) return i - 1; to stop the function from generating more numbers (and return the count it did generate) if the result is overflowing. That isn't working, however: the output is still wrong (it increases for a while, jumps down to a lower number, then increases again). The line I added doesn't seem to be doing anything at all; the return is never reached. Does anyone know what's going on here?
#include <iostream>
#include <deque>
#include <climits>
int generateTriangleNumbers(std::deque<unsigned int> &triangleNumbers, unsigned int generateCount) {
    for (unsigned int i = 1; i <= generateCount; i++) {
        unsigned int toPush = (i * (i + 1)) / 2;
        if (toPush > INT_MAX) return i - 1;
        triangleNumbers.push_back(toPush);
    }
    return generateCount;
}
INT_MAX is the maximum value of signed int. It's about half the maximum value of unsigned int (UINT_MAX). Your calculation of toPush effectively squares i, so it exceeds UINT_MAX long before i itself gets anywhere near the limit: i * (i + 1) is already above UINT_MAX at i = 65536. When that happens, toPush wraps around and comes out smaller than the previous value.
First of all, your comparison to INT_MAX is flawed, since your type is unsigned int, not signed int. Secondly, even a comparison to UINT_MAX would be incorrect, since it implies that toPush (the left operand of the comparison) can hold a value above its own maximum, and that's not possible: by the time you test, the wraparound has already happened. The correct way is to compare each generated number with the previous one. If it's lower, you know you have an overflow and you should stop.
Additionally, you may want to use a type that can hold a larger range of values (such as unsigned long long).
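A minimal sketch that combines those two suggestions (one possible fix, not the only way): widen the arithmetic to unsigned long long so i * (i + 1) cannot wrap, at which point a range check against UINT_MAX becomes meaningful.

#include <climits>
#include <deque>

int generateTriangleNumbers(std::deque<unsigned int> &triangleNumbers, unsigned int generateCount) {
    for (unsigned int i = 1; i <= generateCount; i++) {
        // 1ULL forces the multiplication to happen in 64 bits.
        unsigned long long toPush = 1ULL * i * (i + 1) / 2;
        // Now the comparison means something: stop before storing a value
        // that cannot fit in the deque's 32-bit elements.
        if (toPush > UINT_MAX) return i - 1;
        triangleNumbers.push_back(static_cast<unsigned int>(toPush));
    }
    return generateCount;
}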
The 92682nd triangle number is already greater than UINT32_MAX. But the culprit here is much earlier, in the computation of i * (i + 1). There, the calculation overflows for the 65536th triangular number. If we ask Python with its native bignum support:
>>> 2**16 * (2**16+1) > 0xffffffff
True
Oops. Then if you inspect your stored numbers, you will see the sequence dropping back to low values. To emulate what the Standard says happens here (unsigned arithmetic is reduced modulo 2^32), in Python:
>>> (2**16 * (2**16+1) % 2**32) >> 1
32768
and that is the value you will see for the 65536th triangular number, which is incorrect.
One way to detect overflow here is to check that the sequence of numbers you generate stays monotonic; that is, that the Nth triangle number generated is strictly greater than the (N-1)th. If it isn't, the value wrapped around.
To avoid overflow, you can use 64-bit variables to both generate and store them, or use a big-number library if you need a large quantity of triangle numbers.
In Visual C++ int (and of course unsigned int) is 32 bits even on 64-bit computers.
Either use unsigned long long or uint64_t to get a 64-bit value.
Related
I was stuck when I was trying to use a for loop to solve a problem.
Here's my simplified code:
#include <iostream>
#include <vector>

int main(int argc, const char * argv[])
{
    std::vector<int> a;
    a.push_back(2333);
    int n = 10, m = 10;
    for (int i = 0; i < -1; i++)
        m--;
    std::cout << m << std::endl;
    for (int j = 0; j < a.size() - 2; j++)
        n--;
    std::cout << n << std::endl;
    return 0;
}
Apparently a.size() = 1, so these two end conditions should be the same (a.size() - 2 = -1). However, when I ran my code on Xcode 9.4.1 I got an unexpected result: it turned out that m = 10 and n = 11. I also found that computing n takes much longer than computing m.
Why would I get such a result? Any help will be appreciated.
The value returned by size() is std::size_t, which is an unsigned integral type. This means that it can only represent non-negative numbers, and if you do an operation that results in a negative number, it will wrap around to the largest possible value like in modular arithmetic.
Here, 1 - 2 is -1, which wraps to 2^32 - 1 when size_t is 32 bits (the assumption used below). Decrementing n that many times amounts to subtracting 2^32 - 1 from 10, which causes a signed integer underflow, since the minimum value of a 32-bit integer is -2^31. Signed integer overflow/underflow is undefined behavior, so anything can happen.
In this case, it seems like the underflow wrapped around to the maximum value. So the result would be 10 - (2^32 - 1) + 2^32, which is 11. We add 2^32 to simulate the underflow wrapping around. In other words, after the 2^31 + 10th iteration of the loop, n is the minimum possible value in a 32-bit integer. The next iteration causes the wrap around, so n is now 2^31 - 1. Then, the remaining 2^31 - 12 iterations decrease n to 11.
Again, signed integer overflow/underflow is undefined behavior, so don't be surprised when something weird happens because of that, especially with modern compiler optimizations. For example, your entire program can be "optimized" to do absolutely nothing since it will always invoke UB. You're not even guaranteed to see the output from std::cout<<m<<endl;, even though the UB is invoked after that line executes.
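One way to sidestep the trap entirely is to do the size arithmetic in a signed type before the comparison. A minimal sketch (in C++20, std::ssize(a) performs this conversion for you):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> a;
    a.push_back(2333);
    int n = 10;
    // Cast to a signed type *before* subtracting, so 1 - 2 really is -1
    // and the loop body never runs.
    for (long long j = 0; j < static_cast<long long>(a.size()) - 2; j++)
        n--;
    std::cout << n << std::endl; // prints 10
    return 0;
}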
The value returned by a.size() is of type size_t, which is an unsigned integer type, because there wouldn't be any reason for a size to be negative. If you compute 1 - 2 with unsigned numbers, it rolls over to a value near the maximum of the type, so the loop takes quite a while to run. In the comparison j < a.size() - 2, the signed j is converted to unsigned (that's the usual arithmetic-conversion rule when mixing signed and unsigned operands), which is why the loop does eventually terminate, after roughly 2^32 iterations with a 32-bit size_t.
Using a debugger, and making sure the types are correct, helps in these cases (your compiler should warn about the signed/unsigned mismatch here).
I'm working on a relatively simple problem based around adding together all the primes under a certain value. I've written a program that should accomplish this task, using long variables. As I get up into higher numbers (~200/300k), the variable I am using to track the sum becomes negative, despite the fact that no negative values are being added to it (based on my knowledge and some testing I've done). Is there some issue with the data type, or am I missing something?
My code is below (in C++). (A vector is basically a dynamic array, in case people are wondering.)
#include <vector>
using namespace std;

bool checkPrime(int number, vector<long> &primes, int numberOfPrimes) {
    for (int i = 0; i < numberOfPrimes - 1; i++) {
        if (number % primes[i] == 0) return false;
    }
    return true;
}

long solveProblem10(int maxNumber) {
    long sumOfPrimes = 0;
    vector<long> primes;
    primes.resize(1);
    int numberOfPrimes = 0;
    for (int i = 2; i < maxNumber; i++) {
        if (checkPrime(i, primes, numberOfPrimes)) {
            sumOfPrimes = sumOfPrimes + i;
            primes[numberOfPrimes] = long(i);
            numberOfPrimes++;
            primes.resize(numberOfPrimes + 1);
        }
    }
    return sumOfPrimes;
}
Integers are represented using two's complement, which means that the highest-order bit represents the sign. When you add the number up high enough, the highest bit gets set (an integer overflow) and the number becomes negative.
You can resolve this by using an unsigned long (often only 32 bits, so it may still overflow with the values you're summing) or an unsigned long long (which is at least 64 bits).
the variable I am using to track the sum becomes negative despite the fact that no negative values are being added to it (based on my knowledge and some testing I've done)
longs are signed integers. In C++ and other lower-level languages, integer types have a fixed size; when you add past their maximum, they overflow and wrap around to negative numbers. This is a consequence of how two's complement works.
Check the valid integer ranges here: Variables. Data Types.
You're using signed long, which is usually 32 bits, giving a range of roughly -2 billion to +2 billion. You can either use unsigned long, which is roughly 0 to 4 billion, or a 64-bit (un)signed long long.
If you need values bigger than 2^64 - 1 (the unsigned long long maximum), you will need to use bignum math.
long is probably only 32 bits on your system; use uint64_t for the sum, which gives you a guaranteed 64-bit unsigned integer.
#include <cstdint>
uint64_t sumOfPrimes=0;
You can include header <cstdint> and use type std::uintmax_t instead of long.
In assembly languages, there is usually an instruction that adds two operands and a carry. If you want to implement big integer additions, you simply add the lowest integers without a carry and the next integers with a carry. How would I do that efficiently in C or C++ where I don't have access to the carry flag? It should work on several compilers and architectures, so I cannot simply use inline assembly or such.
You can use "nails" (a term from GMP): rather than using all 64 bits of a uint64_t when representing a number, use only 63 of them, with the top bit zero. That way you can detect overflow with a simple bit-shift. You may even want less than 63.
Or, you can do half-word arithmetic. If you can do 64-bit arithmetic, represent your number as an array of uint32_ts (or equivalently, split 64-bit words into upper and lower 32-bit chunks). Then, when doing arithmetic operations on these 32-bit integers, you can first promote to 64 bits, do the arithmetic there, then convert back. This lets you detect carry, and it's also good for multiplication if you don't have a "multiply hi" instruction.
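For instance, a sketch of that half-word scheme (my own illustration, not a library API): numbers are vectors of 32-bit limbs, least significant first, and each column is added in 64 bits so the carry falls out of the high half.

#include <algorithm>
#include <cstdint>
#include <vector>

std::vector<uint32_t> addWords(const std::vector<uint32_t> &a, const std::vector<uint32_t> &b) {
    std::vector<uint32_t> result;
    uint64_t carry = 0;
    size_t n = std::max(a.size(), b.size());
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = carry;                         // 64-bit accumulator
        if (i < a.size()) sum += a[i];
        if (i < b.size()) sum += b[i];
        result.push_back(static_cast<uint32_t>(sum)); // low 32 bits: the digit
        carry = sum >> 32;                            // high bits: the carry
    }
    if (carry) result.push_back(static_cast<uint32_t>(carry));
    return result;
}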
As the other answer indicates, you can detect overflow in an unsigned addition by:
uint64_t sum = a + b;
uint64_t carry = sum < a;
As an aside, while in practice this will also work in signed arithmetic, you have two issues:
It's more complex
Technically, overflowing a signed integer is undefined behavior
so you're usually better off sticking to unsigned numbers.
You can figure out the carry by virtue of the fact that, if you overflow by adding two numbers, the result will always be less than either of those other two values.
In other words, if a + b is less than a, it overflowed. That's for unsigned values of a and b, of course, but that's what you'd almost certainly be using for a bignum library.
Unfortunately, a carry introduces an extra complication in that adding the largest possible value plus a carry of one will give you the same value you started with. Hence, you have to handle that as a special case.
Something like:
carry = 0
for i = 7 to 0:
    if a[i] > b[i]:
        small = b[i], large = a[i]
    else:
        small = a[i], large = b[i]
    if carry is 1 and large is maxvalue:
        c[i] = small
        carry = 1
    else:
        c[i] = large + small + carry
        if c[i] < large:
            carry = 1
        else:
            carry = 0
In reality, you may also want to consider not using all the bits in your array elements.
I've implemented libraries in the past where the maximum "digit" is no more than the square root of the highest value the type can hold. So for 8-bit (octet) digits, you store values from 0 through 15; that way, multiplying two digits and adding the maximum carry will always fit within an octet, making overflow detection moot, though at the cost of some storage.
Similarly, 16-bit digits would have the range 0 through 255, so the result won't overflow at 65536.
In fact, I've sometimes limited it further, ensuring the artificial wrap value is a power of ten (so an octet would hold 0 through 9, 16-bit digits 0 through 99, 32-bit digits 0 through 9999, and so on).
That's a bit more wasteful of space, but it makes conversion to and from text (such as printing your numbers) incredibly easy, as the sketch below shows.
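As an illustration of why printing becomes easy (a sketch with made-up names, assuming 32-bit limbs holding 0 through 9999, most significant first): every limb is exactly four decimal digits, so text conversion is just zero-padded printing.

#include <cstdio>
#include <vector>

// digits must be non-empty; each element is in [0, 9999].
void printBig(const std::vector<unsigned> &digits) {
    std::printf("%u", digits[0]);           // leading limb: no zero padding
    for (size_t i = 1; i < digits.size(); i++)
        std::printf("%04u", digits[i]);     // inner limbs: always 4 digits
    std::printf("\n");
}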
You can check for carry with unsigned types by testing whether the result is less than an operand (either operand will do).
Just start the whole thing with a carry of 0.
If I understand you correctly, you want to write your own addition for your own big-integer type.
You can do this with a simple function; no need to worry about the carry flag. Just go from right to left, adding digit by digit together with an internal carry, starting with a carry of 0: the result digit is (a + b + carry) % 10 and the new carry is (a + b + carry) / 10.
This SO question could be relevant:
how to implement big int in c
This is a very basic question. Please don't mind, but I need to ask it: adding two integers.
#include <iostream>
using namespace std;

int main()
{
    cout << "Enter a string: ";
    int a, b, c;
    cout << "Enter a";
    cin >> a;
    cout << "\nEnter b";
    cin >> b;
    cout << a << "\n" << b << "\n";
    c = a + b;
    cout << "\n" << c;
    return 0;
}
If I give a = 2147483648, then b automatically takes the value 4046724 (note that cin never prompts for b), and the result c is 7433860.
If int is 2^32 and the first bit is the MSB, then the limit becomes 2^31:
c = 2^31 + 2^31
c = 2^(31+31)
Is this correct?
So how do I implement c = a + b for a = 2147483648 and b = 2147483648, and should c be an integer or a double integer?
When you perform any sort of input operation, you must always include an error check! For the stream operator, that might look like this:
int n;
if (!(std::cin >> n)) { std::cerr << "Error!\n"; std::exit(-1); }
// ... rest of program
If you do this, you'll see that your initial extraction of a already fails, so whatever values are read afterwards are not well defined.
The reason the extraction fails is that the literal token "2147483648" does not represent a value of type int on your platform (it is too large), no different from, say, "1z" or "Hello".
The real danger in programming is to assume silently that an input operation succeeds when often it doesn't. Fail as early and as noisily as possible.
The int type is signed, and therefore its maximum value is 2^31 - 1 = 2147483648 - 1 = 2147483647.
Even if you used an unsigned integer, its maximum value is 2^32 - 1, which is a + b - 1 for the values of a and b you give.
For this arithmetic you are better off using long long, which is signed with a maximum value of 2^63 - 1, or unsigned long long, which has a maximum value of 2^64 - 1 but is unsigned.
c = 2^31 + 2^31
c = 2^(31+31)
Is this correct?
No, but you're right that the result takes more than 31 bits. In this case the result takes 32 bits (whereas 2^(31+31) would take 62 bits). You're confusing multiplication with addition: 2^31 * 2^31 = 2^(31+31).
Anyway, the basic problem you're asking about dealing with is called overflow. There are a few options. You can detect it and report it as an error, detect it and redo the calculation in such a way as to get the answer, or just use data types that allow you to do the calculation correctly no matter what the input types are.
Signed overflow in C and C++ is technically undefined behavior, so detection consists of figuring out what input values will cause it (because if you do the operation and then look at the result to see if overflow occurred, you may have already triggered undefined behavior and you can't count on anything). Here's a question that goes into some detail on the issue: Detecting signed overflow in C/C++
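A typical pre-check looks like this (a sketch; it tests against the limits before adding, so the undefined operation is never executed):

#include <climits>

// Store a + b in out and return true only if the sum fits in an int.
bool safeAdd(int a, int b, int &out) {
    if (b > 0 && a > INT_MAX - b) return false; // would overflow upward
    if (b < 0 && a < INT_MIN - b) return false; // would overflow downward
    out = a + b;
    return true;
}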
Alternatively, you can just perform the operation using a data type that won't overflow for any of the input values. For example, if the inputs are ints then the correct result for any pair of ints can be stored in a wider type such as (depending on your implementation) long or long long.
int a, b;
...
long c = (long)a + (long)b;
If int is 32 bits then it can hold any value in the range [-2^31, 2^31-1]. So the smallest value obtainable would be -2^31 + -2^31 which is -2^32. And the largest value obtainable is 2^31 - 1 + 2^31 - 1 which is 2^32 - 2. So you need a type that can hold these values and every value in between. A single extra bit would be sufficient to hold any possible result of addition (a 33-bit integer would hold any integer from [-2^32,2^32-1]).
Or, since double can probably represent every integer you need (a 64-bit IEEE 754 floating point data type can represent integers up to 53 bits exactly) you could do the addition using doubles as well (though adding doubles may be slower than adding longs).
If you have a library that offers arbitrary precision arithmetic you could use that as well.
So, simple procedure: calculate a factorial number. The code is as follows.
int calcFactorial(int num)
{
    int total = 1;
    if (num == 0)
    {
        return 0;
    }
    for (num; num > 0; num--)
    {
        total *= num;
    }
    return total;
}
Now, this works fine and dandy (there are certainly quicker and more elegant solutions, but this works for me) for most numbers. However, when inputting larger numbers such as 250 it, to put it bluntly, craps out. For reference, the first few partial products for 250 are { 250, 62250, 15126750, 15438000, 3813186000 }.
My code spits out { 250, 62250, 15126750, 15438000, -481781296 }, which is obviously off. My first suspicion was that I had breached the limit of a 32-bit integer, but given that 2^32 is 4294967296 I don't think so. The only thing I can think of is that it breaches the signed 32-bit limit, but shouldn't it be able to handle this sort of thing? If being signed is the problem, I can solve it by making the integer unsigned, but that would only be a temporary solution, as the next partial product is 938043756000, far above the 4294967296 limit.
So, is my problem the signed limit? If so, what can I do to calculate large numbers (I have a "LargeInteger" class I made a while ago that may be suited!) without running into this problem again?
2^32 doesn't give you the limit for signed integers.
The signed integer limit is actually 2147483647 (if you're developing on Windows using the MS tools; other toolchains/platforms have their own limits, which are probably similar).
You'll need a C++ large number library like this one.
In addition to the other comments, I'd like to point out two serious bugs in your code.
You have no guard against negative numbers.
The factorial of zero is one, not zero.
Yes, you hit the limit. An int in C++ is, by definition, signed. And, uh, no, C++ does not think, ever. If you tell it to do a thing, it will do it, even if it is obviously wrong.
Consider using a large number library. There are many of them around for C++.
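For instance, with GMP's C++ interface (a sketch assuming gmpxx is installed; link with -lgmpxx -lgmp):

#include <gmpxx.h>
#include <iostream>

int main() {
    mpz_class total = 1;
    for (int num = 250; num > 0; num--)
        total *= num;           // never overflows: mpz_class grows as needed
    std::cout << total << "\n"; // all 493 digits of 250!
    return 0;
}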
If you don't specify signed or unsigned, the default is signed (for char, some compilers let you change that default with a command-line switch, but a plain int is always signed).
Just remember, C (or C++) is a very low-level language and does precisely what you tell it to do. If you tell it to store this value in a signed int, that's what it will do; you as the programmer have to figure out when that's a problem. It's not the language's job.
My Windows calculator (Start-Run-Calc) tells me that
hex (3813186000) = E34899D0
hex (-481781296) = FFFFFFFFE34899D0
So yes, the cause is the signed limit. Since factorials are by definition positive and are only defined for non-negative numbers, both the argument and the return value should be unsigned anyway. (I know everybody uses int i = 0 in for loops, and so do I, but that aside, we should always use unsigned variables when the value cannot be negative; it's good practice IMO.)
The general problem with factorials is that they easily generate very large numbers. You could use a float, sacrificing precision but avoiding the integer overflow problem.
Oh wait, according to what I wrote above, you should make that an unsigned float ;-)
If I remember correctly (these are the usual values; the snippet after this list prints the real limits for your platform):
unsigned short int = max 65535
unsigned int = max 4294967295
unsigned long = max 4294967295
unsigned long long (Int64) = max 18446744073709551615
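Rather than memorizing these, you can ask the compiler; the standard only guarantees minimum ranges, and the actual limits vary by platform:

#include <iostream>
#include <limits>

int main() {
    std::cout << std::numeric_limits<unsigned short>::max() << "\n"
              << std::numeric_limits<unsigned int>::max() << "\n"
              << std::numeric_limits<unsigned long>::max() << "\n"
              << std::numeric_limits<unsigned long long>::max() << "\n";
    return 0;
}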
Edit, sources:
Int/Long Max values
Modern Compiler Variable