Large Number Issues in C++ - c++

I'm working on a relatively simple problem based around adding all the primes under a certain value together. I've written a program that should accomplish this task. I am using long type variables. As I get up into higher numbers (~200/300k), the variable I am using to track the sum becomes negative despite the fact that no negative values are being added to it (based on my knowledge and some testing I've done). Is there some issue with the data type, or am I missing something?
My code is below (in C++) [Vector is basically a dynamic array in case people are wondering]:
bool checkPrime(int number, vector<long> & primes, int numberOfPrimes) {
    for (int i=0; i<numberOfPrimes-1; i++) {
        if(number%primes[i]==0) return false;
    }
    return true;
}
long solveProblem10(int maxNumber) {
    long sumOfPrimes=0;
    vector<long> primes;
    primes.resize(1);
    int numberOfPrimes=0;
    for (int i=2; i<maxNumber; i++) {
        if(checkPrime(i, primes, numberOfPrimes)) {
            sumOfPrimes=sumOfPrimes+i;
            primes[numberOfPrimes]=long(i);
            numberOfPrimes++;
            primes.resize(numberOfPrimes+1);
        }
    }
    return sumOfPrimes;
}

Signed integers are represented in two's complement, which means that the highest-order bit represents the sign. When the sum grows large enough, the highest bit gets set (an integer overflow) and the number becomes negative.
You can resolve this by using an unsigned long (which is 32 bits on some platforms and may still overflow with the values you're summing) or by using an unsigned long long (which is at least 64 bits).

the variable I am using to track the sum becomes negative despite the fact that no negative values are being added to it (based on my knowledge and some testing I've done)
longs are signed integers. In C++ and other lower-level languages, integer types have a fixed size. When you add past their maximum, they overflow and, in practice, wrap around to negative numbers (formally, signed overflow is undefined behavior in C++). The wrap-around is a consequence of how two's complement works.
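One way to avoid relying on wrap-around at all is to test whether an addition would leave the representable range before performing it. The following is a minimal sketch; checkedAdd is a hypothetical helper name, not something from the question.

```cpp
#include <limits>

// Hypothetical helper (not from the question): stores a + b in `out` only if
// the result fits in a long. Returns false and leaves `out` untouched otherwise.
bool checkedAdd(long a, long b, long& out) {
    const long maxVal = std::numeric_limits<long>::max();
    const long minVal = std::numeric_limits<long>::min();
    if (b > 0 && a > maxVal - b) return false;  // a + b would exceed the maximum
    if (b < 0 && a < minVal - b) return false;  // a + b would fall below the minimum
    out = a + b;                                // safe: no overflow can occur here
    return true;
}
```

In the question's loop, a failed check on sumOfPrimes would signal the exact point at which long becomes too small.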

check valid integer values: Variables. Data Types.
You're using signed long, which is often 32 bits, giving a range of roughly -2 billion to 2 billion. You can either use unsigned long, which then covers 0 to about 4 billion, or use a 64-bit (unsigned) long long.
If you need values bigger than 2^64 - 1 (the maximum of a 64-bit unsigned long long), you will need to use bignum math.

long is probably only 32 bits on your system - use uint64_t for the sum - this gives you a guaranteed 64 bit unsigned integer.
#include <cstdint>
uint64_t sumOfPrimes=0;

You can include header <cstdint> and use type std::uintmax_t instead of long.
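As a rough sketch of how the 64-bit suggestions above could look in practice: the version below keeps the sum in std::uint64_t. Note that it swaps the question's trial division for a sieve of Eratosthenes, and sumPrimesBelow is a made-up name used only for illustration.

```cpp
#include <cstdint>
#include <vector>

// Illustrative replacement for solveProblem10 (name and sieve approach are
// assumptions, not the asker's code). Sums all primes strictly below maxNumber.
std::uint64_t sumPrimesBelow(std::uint32_t maxNumber) {
    if (maxNumber < 3) return 0;  // there are no primes below 2
    std::vector<bool> composite(maxNumber, false);
    std::uint64_t sum = 0;        // 64-bit accumulator, so the sum cannot wrap here
    for (std::uint64_t i = 2; i < maxNumber; ++i) {
        if (composite[i]) continue;
        sum += i;
        // Mark multiples of i starting at i*i; 64-bit i keeps i*i from wrapping.
        for (std::uint64_t j = i * i; j < maxNumber; j += i) composite[j] = true;
    }
    return sum;
}
```

For example, sumPrimesBelow(10) yields 17 (2 + 3 + 5 + 7), and sumPrimesBelow(2000000) yields 142913828922, which is far beyond the range of a 32-bit long.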

Related

Is there a fundamental type for natural numbers in C/C++?

I have a problem in which I need to declare some variables as natural numbers. Which is the proper fundamental type that I should use for variables that should be natural numbers? Like for integers it is int ...
The following types resemble natural numbers set with 0 included in C++:
unsigned char
unsigned short int
unsigned int
unsigned long int
unsigned long long int, since C++11.
Each one differs with the other in the range of values it can represent.
Notice that a computer (and perhaps even the entire universe) is a finite machine; it has a finite (but very large) number of bits (my laptop probably has less than 10^15 bits).
Of course ints are not the mathematical integers. On my machine int is a 32-bit signed integer (and long is a 64-bit signed integer), so ints have only 2^32 possible values (and that is much less than the infinite cardinal of mathematical integers).
So a computer can only represent a finite set of numbers, but quite a large one. That is smaller than the infinite set of natural numbers (remember, some of them are not representable on the entire Earth; read about Richard's paradox).
You might want to use unsigned (same as unsigned int; on my machine it represents natural numbers up to 2^32-1), unsigned long, unsigned long long, or (from <stdint.h>) types like uint32_t, uint64_t ... you would get unsigned binary numbers of 32 or 64 bits. Some compilers and implementations might know about uint128_t or something similar.
If that is not enough, consider using big ints. You could use a library like GMPlib (but even a big computer is not able to represent extremely large natural numbers with all their bits, and your own brain cannot comprehend them either).
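Since the ranges differ between machines, it can help to ask the implementation directly rather than memorize limits. A small sketch using std::numeric_limits follows; maxOf is a hypothetical helper name introduced here.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: largest value representable in the unsigned integer type T.
template <typename T>
constexpr T maxOf() {
    static_assert(std::numeric_limits<T>::is_integer &&
                      !std::numeric_limits<T>::is_signed,
                  "maxOf expects an unsigned integer type");
    return std::numeric_limits<T>::max();
}
```

maxOf<std::uint32_t>() is 4294967295 (2^32-1) and maxOf<std::uint64_t>() is 18446744073709551615 (2^64-1) everywhere those types exist, whereas maxOf<unsigned long>() depends on the platform.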
If you need numbers that can't be negative, your best bet would be unsigned int. If you want to learn more about data types, you can check this site
There's not any particular data type representing natural numbers. But you can use data types for whole numbers and then make some appropriate edits. Here are a few ways to declare whole numbers:
- unsigned short int
- unsigned int
- unsigned long int
- unsigned long long int

Why doesn't this snippet of code print the number normally?

Why do I get two different results? Unsigned long is big enough to handle such number, and it can't be an overflow of some kind, right?
I am deliberately trying to make it show in decimal form, but it just doesn't work.
What could be the reason?
#include <iostream>
using namespace std;
void Print(unsigned long num)
{
    cout<<dec<<num<<endl;
}
int main()
{
    Print(9110865112);
    cout<<dec<<9110865112;
    return 0;
}
Edit
It outputs:
520930520
9110865112
unsigned long is not always sufficiently large. With 32 bits it can hold integers from 0 up to and including 2^32-1, which is about four billion. 9'110'865'112 is about nine billion and thus does not fit into a 32-bit unsigned long.
Try outputting sizeof(unsigned long) and see what you get.
Also, consider your output: 9110865112 mod 2^32 is 520930520, which basically proves that unsigned long is 32 bits wide on your machine.
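The modulo arithmetic above can be checked with a fixed-width type, so the result does not depend on how wide unsigned long happens to be. This is only a sketch; truncateTo32 is a name made up for the illustration.

```cpp
#include <cstdint>

// Converting to an unsigned type keeps the value modulo 2^N, where N is the
// width of the target type (here N = 32).
constexpr std::uint32_t truncateTo32(std::uint64_t value) {
    return static_cast<std::uint32_t>(value);
}
```

truncateTo32(9110865112ULL) gives 520930520, which matches the first line of the question's output.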
The problem is that the numeric literal that you specify is too large to fit in an unsigned long.
When you use the literal directly, the compiler treats it as long long, and chooses the proper overload for operator <<.
To fix this problem, use unsigned long long in the signature of the Print function:
void Print(unsigned long long num)
{
    cout<<dec<<num<<endl;
}
Because 9,110,865,112 needs more than 32 bits, the function only receives the low 32 bits even though you're trying to pass it more.
To fix this, you should use an unsigned long long data type for your num parameter. When you print the constant directly, the code prints fine because the compiler gives the constant a type wide enough to hold it (such as long long), but when you pass it to the function, the value is converted to unsigned long. Because it doesn't fit, some of the bits are dropped. (I'm surprised your compiler didn't print a warning.)
As a reference, a 32-bit unsigned long can hold values between 0 and 4,294,967,295 (inclusive). Any value greater than this needs a larger data type. A 64-bit unsigned long long can hold values between 0 and 18,446,744,073,709,551,615 (inclusive).
It is worth noting that the data types uint32_t and uint64_t are frequently used in place of unsigned long and unsigned long long respectively. The u denotes that the number is unsigned (if the u is left out, the number is signed). The number (64 and 32 in this case) states how many bits the number should have. And _t at the end just indicates that this is a data type. So (u)int#_t is a common way to write numeric data types; # can be 8, 16, 32, or 64 in standard C++ depending on the number of bits you need.
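To make the "number means bits" point concrete, the widths can be checked at compile time. A minimal sketch, where bitWidth is a hypothetical helper name:

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: storage size of T in bits (bytes times bits per byte).
template <typename T>
constexpr std::size_t bitWidth() {
    return sizeof(T) * CHAR_BIT;
}

// The fixed-width types have exactly the advertised number of bits.
static_assert(bitWidth<std::uint32_t>() == 32, "uint32_t is 32 bits wide");
static_assert(bitWidth<std::uint64_t>() == 64, "uint64_t is 64 bits wide");
```

By contrast, bitWidth<unsigned long>() is 32 on some platforms and 64 on others, which is the root of the problem in the question.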
To summarize: You're throwing a number that's too large at the function. You need to change your function's parameters to support this number:
void Print(uint64_t num){
    cout << dec << num << endl;
}

Acting like unsigned int overflow. What is causing it?

I have this function which generates a specified number of so-called 'triangle numbers'. If I print out the deque afterwards, the numbers increase, jump down, then increase again. Triangle numbers should never get lower as i rises, so there must be some kind of overflow happening. I tried to fix it by adding the line if(toPush > INT_MAX) return i - 1; to try to stop the function from generating more numbers (and return the number it generated) if the result is overflowing. That is not working, however: the output continues to be incorrect (increases for a while, jumps down to a lower number, then increases again). The line I added doesn't actually seem to be doing anything at all. The return is not being reached. Does anyone know what's going on here?
#include <iostream>
#include <deque>
#include <climits>
int generateTriangleNumbers(std::deque<unsigned int> &triangleNumbers, unsigned int generateCount) {
    for(unsigned int i = 1; i <= generateCount; i++) {
        unsigned int toPush = (i * (i + 1)) / 2;
        if(toPush > INT_MAX) return i - 1;
        triangleNumbers.push_back(toPush);
    }
    return generateCount;
}
INT_MAX is the maximum value of signed int. It's about half the maximum value of unsigned int (UINT_MAX). Your calculation of toPush may well get much higher than UINT_MAX because you square the value (if it's near INT_MAX, the result will be much larger than the UINT_MAX that your toPush can hold). In this case toPush wraps around and ends up smaller than the previous value.
First of all, your comparison to INT_MAX is flawed since your type is unsigned int, not signed int. Secondly, even a comparison to UINT_MAX would be incorrect since it implies that toPush (the left operand of the comparison expression) can hold a value above its maximum, and that's not possible. The correct way would be to compare your generated number with the previous one. If it's lower, you know you have got an overflow and you should stop.
Additionally, you may want to use types that can hold a larger range of values (such as unsigned long long).
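Here is one possible sketch of the question's function that combines both suggestions: it computes each value in a 64-bit intermediate, so the range test can actually fail, and stores 32-bit results. It uses std::uint32_t instead of unsigned int to make the width explicit, which is an assumption about the asker's platform.

```cpp
#include <cstdint>
#include <deque>
#include <limits>

// Appends triangle numbers T(1), T(2), ... to `out` until `count` numbers were
// added or the next one would not fit in 32 bits. Returns how many were added.
std::uint32_t generateTriangleNumbers(std::deque<std::uint32_t>& out,
                                      std::uint32_t count) {
    for (std::uint32_t i = 1; i <= count; ++i) {
        // Widen before multiplying so i * (i + 1) cannot wrap.
        const std::uint64_t wide =
            static_cast<std::uint64_t>(i) * (static_cast<std::uint64_t>(i) + 1) / 2;
        if (wide > std::numeric_limits<std::uint32_t>::max()) return i - 1;
        out.push_back(static_cast<std::uint32_t>(wide));
    }
    return count;
}
```

Asking for 100000 numbers stops after 92681 of them, because T(92682) is the first triangle number that needs more than 32 bits.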
The 92682nd triangle number is already greater than UINT32_MAX. But the culprit here is much earlier, in the computation of i * (i + 1). There, the calculation overflows for the 65536th triangular number. If we ask Python with its native bignum support:
>>> 2**16 * (2**16+1) > 0xffffffff
True
Oops. Then if you inspect your stored numbers, you will see your sequence dropping back to low values. To attempt to emulate what the Standard says about the behaviour of this case, in Python:
>>> (int(2**16 * (2**16+1)) % 2**32) >> 1
32768
and that is the value you will see for the 65536th triangular number, which is incorrect.
One way to detect overflow here is to ensure that the sequence of numbers you generate is monotonic; that is, check that the Nth triangle number generated is strictly greater than the (N-1)th triangle number.
To avoid overflow, you can use 64-bit variables to both generate and store them, or use a big number library if you need a large number of triangle numbers.
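The drop at the 65536th number can also be reproduced directly in C++. The sketch below assumes a platform where unsigned int is 32 bits (so std::uint32_t arithmetic wraps modulo 2^32 without integer promotion); triangle32 is a name invented for the illustration.

```cpp
#include <cstdint>

// 32-bit version of the question's formula. The product i * (i + 1) is reduced
// modulo 2^32 before the division, which is where the sequence goes wrong.
constexpr std::uint32_t triangle32(std::uint32_t i) {
    return static_cast<std::uint32_t>(i * (i + 1u)) / 2;
}
```

triangle32(65535) is still correct at 2147450880, while triangle32(65536) collapses to 32768.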
In Visual C++ int (and of course unsigned int) is 32 bits even on 64-bit computers.
Either use unsigned long long or uint64_t to use a 64-bit value.

Overflowing of Unsigned Int

What will the unsigned int contain when I overflow it? To be specific, I want to do a multiplication with two unsigned ints: what will be in the unsigned int after the multiplication is finished?
unsigned int someint = 253473829*13482018273;
unsigned numbers can't overflow, but instead wrap around using the properties of modulo.
For instance, when unsigned int is 32 bits, the result would be: (a * b) mod 2^32.
As CharlesBailey pointed out, 253473829*13482018273 may use signed multiplication before being converted, and so you should be explicit about unsigned before the multiplication:
unsigned int someint = 253473829U * 13482018273U;
Unsigned integer overflow, unlike its signed counterpart, exhibits well-defined behaviour.
Values basically "wrap" around. It's safe and commonly used for counting down, or hashing/mod functions.
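A small sketch of that well-defined wrapping, assuming a platform where unsigned int is 32 bits. Both function names are made up; referenceMul computes the same result explicitly as (a * b) mod 2^32 using 64-bit arithmetic.

```cpp
#include <cstdint>

// Direct 32-bit multiplication: the result is reduced modulo 2^32.
constexpr std::uint32_t wrappingMul(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(a * b);
}

// Reference: full 64-bit product, then keep only the low 32 bits.
constexpr std::uint32_t referenceMul(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(a) * b) & 0xFFFFFFFFull);
}
```

The two functions agree for every pair of inputs; for instance, wrappingMul(65536u, 65536u) is 0 because 2^32 mod 2^32 is 0.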
It probably depends a bit on your compiler. I had errors like this years ago: sometimes you would get a runtime error, other times it would basically "wrap" back to a really small number that results from chopping off the highest-order bits and leaving the remainder. For example, if it's a 32-bit unsigned int and the result of your multiplication would be a 34-bit number, it would chop off the high-order 2 bits and give you the remainder. You would probably have to try it on your compiler to see exactly what you get, which may not be the same thing you would get with a different compiler, especially if the overflow happens in the middle of an expression where the end result is within the range of an unsigned int.

When I calculate a large factorial, why do I get a negative number?

So, simple procedure, calculate a factorial number. Code is as follows.
int calcFactorial(int num)
{
    int total = 1;
    if (num == 0)
    {
        return 0;
    }
    for (num; num > 0; num--)
    {
        total *= num;
    }
    return total;
}
Now, this works fine and dandy (There are certainly quicker and more elegant solutions, but this works for me) for most numbers. However when inputting larger numbers such as 250 it, to put it bluntly, craps out. Now, the first couple factorial "bits" for 250 are { 250, 62250, 15126750, 15438000, 3813186000 } for reference.
My code spits out { 250, 62250, 15126750, 15438000, -481781296 } which is obviously off. My first suspicion was perhaps that I had breached the limit of a 32 bit integer, but given that 2^32 is 4294967296 I don't think so. The only thing I can think of is perhaps that it breaches a signed 32-bit limit, but shouldn't it be able to think about this sort of thing? If being signed is the problem I can solve this by making the integer unsigned but this would only be a temporary solution, as the next iteration yields 938043756000 which is far above the 4294967296 limit.
So, is my problem the signed limit? If so, what can I do to calculate large numbers (Though I've a "LargeInteger" class I made a while ago that may be suited!) without coming across this problem again?
2^32 doesn't give you the limit for signed integers.
The signed integer limit is actually 2147483647 (if you're developing on Windows using the MS tools, other toolsuites/platforms would have their own limits that are probably similar).
You'll need a C++ large number library like this one.
In addition to the other comments, I'd like to point out two serious bugs in your code.
You have no guard against negative numbers.
The factorial of zero is one, not zero.
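A possible sketch that addresses both points and also reports overflow instead of silently wrapping. The name checkedFactorial and the bool-plus-output-parameter interface are choices made for this illustration, not part of the question.

```cpp
#include <cstdint>
#include <limits>

// Stores n! in `out` and returns true if the result fits in 64 unsigned bits;
// returns false (leaving `out` untouched) if it would overflow.
// The unsigned parameter rules out negative input by construction.
bool checkedFactorial(unsigned int n, std::uint64_t& out) {
    std::uint64_t total = 1;  // 0! and 1! are both 1
    for (std::uint64_t k = 2; k <= n; ++k) {
        // total * k would exceed the maximum exactly when total > max / k.
        if (total > std::numeric_limits<std::uint64_t>::max() / k) return false;
        total *= k;
    }
    out = total;
    return true;
}
```

With 64 bits the largest factorial that fits is 20! = 2432902008176640000; from 21! onward the function returns false, so something like 250! needs a big-number library.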
Yes, you hit the limit. An int in C++ is, by definition, signed. And, uh, no, C++ does not think, ever. If you tell it to do a thing, it will do it, even if it is obviously wrong.
Consider using a large number library. There are many of them around for C++.
If you don't specify signed or unsigned, int is signed by default. (Compilers offer command-line switches to change the signedness of plain char, but not of int.)
Just remember, C (or C++) is a very low-level language and does precisely what you tell it to do. If you tell it to store this value in a signed int, that's what it will do. You as the programmer have to figure out when that's a problem. It's not the language's job.
My Windows calculator (Start-Run-Calc) tells me that
hex (3813186000) = E34899D0
hex (-481781296) = FFFFFFFFE34899D0
So yes, the cause is the signed limit. Since factorials can by definition only be positive, and can only be calculated for non-negative numbers, both the argument and the return value should be unsigned numbers anyway. (I know that everybody uses int i = 0 in for loops, and so do I. But that aside, we should always use unsigned variables if the value cannot be negative; it's good practice IMO.)
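The calculator comparison can be reproduced in code. The sketch below reinterprets the low 32 bits of a non-negative value as a two's complement signed number, using only 64-bit arithmetic; asSigned32 is a hypothetical name.

```cpp
#include <cstdint>

// Treats the low 32 bits of a non-negative `value` as a 32-bit two's complement
// number: flipping bit 31 and then subtracting 2^31 performs the sign extension.
constexpr std::int64_t asSigned32(std::int64_t value) {
    return ((value & 0xFFFFFFFFLL) ^ 0x80000000LL) - 0x80000000LL;
}
```

asSigned32(3813186000LL) yields -481781296, the value the question's code printed; both share the low 32 bits E34899D0.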
The general problem with factorials is, that they can easily generate very large numbers. You could use a float, thus sacrificing precision but avoiding the integer overflow problem.
Oh wait, according to what I wrote above, you should make that an unsigned float ;-)
If I remember correctly (for typical platforms; the exact sizes are implementation-defined):
unsigned short int = max 65535
unsigned int = max 4294967295
unsigned long = max 4294967295 (when long is 32 bits)
unsigned long long (Int64) = max 18446744073709551615