Integer overflow in boolean expressions - c++

I have the following C++ code:
#include <iostream>
using namespace std;

int main()
{
    long long int currentDt = 467510400*1000000;
    long long int addedDt = 467510400*1000000;
    if(currentDt-addedDt >= 0 && currentDt-addedDt <= 30*24*3600*1000000)
    {
        cout << "1" << endl;
        cout << currentDt-addedDt << endl;
    }
    if(currentDt-addedDt > 30*24*3600*1000000 && currentDt-addedDt <= 60*24*3600*1000000)
    {
        cout << "2" << endl;
        cout << currentDt-addedDt << endl;
    }
    if(currentDt-addedDt > 60*24*3600*1000000 && currentDt-addedDt <= 90*24*3600*1000000)
    {
        cout << "3" << endl;
        cout << currentDt-addedDt << endl;
    }
    return 0;
}
Firstly, I get a warning for integer overflow, which strikes me as odd because the number 467510400*1000000 falls well within the range of a long long int, does it not? Secondly, I get the following output:
1
0
3
0
If in both cases currentDt-addedDt evaluates to 0, how could the third if statement possibly evaluate to true?

467510400*1000000 is within the range of long long, but it's not within the range of int. Since both literals are of type int, the product is also of type int, and that product overflows. Assigning the result to a long long doesn't change the value that gets assigned, for the same reason that in:
double d = 1 / 2;
d will hold 0.0 and not 0.5.
You need to make one of the literals a larger integral type, for example with an LL suffix:
long long int addedDt = 467510400LL * 1000000;
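A complete little program showing the difference (a sketch; the suffixed initialization is the fix, while the unsuffixed one overflows in int, so the value it produces is whatever your implementation happens to give you):

#include <iostream>

int main()
{
    // Both literals are int, so the multiplication is done in int and overflows
    // (typically with a compiler warning); the value assigned is not 467510400000000.
    long long broken = 467510400 * 1000000;
    // Suffixing one literal makes the whole product long long: 467510400000000.
    long long fixed = 467510400LL * 1000000;

    std::cout << broken << '\n' << fixed << '\n';
    return 0;
}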

long long int currentDt = 467510400ll*1000000ll;
long long int addedDt = 467510400ll*1000000ll;
Note the two lowercase letter "l"s (the ll suffix) following the digits; they make your constants long long. C++ normally interprets an unsuffixed string of digits in source as a plain int.

The problem you are having is that all of your integer literals are of type int. When you multiply them, the product overflows, giving you the unexpected behavior. To correct this, you can make them long long literals: 467510400ll * 1000000ll

It's because
60*24*3600*1000000 overflows int and evaluates to -25526272, so the third condition becomes 0 > -25526272, which is true (and 90*24*3600*1000000 wraps to a positive value, so the second half of that condition is true as well).
Use
60LL*24LL*3600LL*1000000LL
instead (note the 'LL' suffix).
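A quick sketch that prints the wrapped constants directly; the exact values assume the usual 32-bit two's-complement int (formally the signed overflow is undefined behaviour, so your compiler will warn):

#include <iostream>

int main()
{
    // Each product overflows int; these are the values a typical
    // 32-bit two's-complement implementation ends up with.
    std::cout << 30 * 24 * 3600 * 1000000 << '\n';    // e.g.  2134720512
    std::cout << 60 * 24 * 3600 * 1000000 << '\n';    // e.g.   -25526272
    std::cout << 90 * 24 * 3600 * 1000000 << '\n';    // e.g.  2109194240
    std::cout << 60LL * 24 * 3600 * 1000000 << '\n';  // 5184000000000, no overflow
    return 0;
}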

You have tagged this with C++.
My minimal change to your code would use the C++ static_cast to promote at least one of the literals (in any overflow-generating expression) to an int64_t (declared in the header cstdint).
Example:
// 0 true
if(currentDt-addedDt >= 0
   && // true because vvvv
   // 0 true
   currentDt-addedDt <= 30*24*3600*static_cast<int64_t>(1000000))
   //                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(For test 1 the result of the if clause is true; for tests 2 and 3 it is false.)
Upon finding the static_cast, the compiler carries out the final multiplication and the comparison in int64_t (the leading factors 30*24*3600 still multiply as int, but their product fits comfortably), and thus generates no warnings about overflow.
Yes, it adds a lot of chars for being, in some sense, 'minimal'.
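Applied to the whole program, the same idea might look like this (a sketch; the initializers are cast as well, since they overflow for the same reason):

#include <cstdint>
#include <iostream>
using namespace std;

int main()
{
    long long int currentDt = static_cast<int64_t>(467510400) * 1000000;
    long long int addedDt = static_cast<int64_t>(467510400) * 1000000;
    long long int diff = currentDt - addedDt;

    // One cast per bound is enough: the final multiplication and the
    // comparison are then carried out in int64_t.
    if (diff >= 0 && diff <= 30 * 24 * 3600 * static_cast<int64_t>(1000000))
        cout << "1" << endl;
    if (diff > 30 * 24 * 3600 * static_cast<int64_t>(1000000)
        && diff <= 60 * 24 * 3600 * static_cast<int64_t>(1000000))
        cout << "2" << endl;
    if (diff > 60 * 24 * 3600 * static_cast<int64_t>(1000000)
        && diff <= 90 * 24 * 3600 * static_cast<int64_t>(1000000))
        cout << "3" << endl;
    return 0;
}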


Is it Normal For Chars to be Between -128 and 127? [duplicate]

To find out the range of integer values for a standard 8-bit char, I ran the following code:
int counter = 0;
for (int i = -300; i <= 300; i++)
{
    char c = static_cast<char>(i);
    int num = static_cast<int>(c);
    if (num == i)
    {
        counter++;
    }
    else
    {
        cout << "Bad: " << i << "\n";
    }
}
cout << "\n" << counter;
I ended up seeing a value of 256 for counter, which makes sense. However, on the list of "Bad" numbers (i.e., numbers that chars don't store), I found that the greatest Bad negative number was -129, while the smallest Bad positive number was 128.
From this test, it seems like chars only store integer values from -128 to 127. Is this conclusion correct, or am I missing something? Because I always figured chars stored integer values from 0 to 255.
Although it is implementation-defined, for the most part - yes, it is normal, as your implementation defines char as a signed char. You can use the CHAR_MIN and CHAR_MAX macros (from <climits>) to print out the minimum and maximum values of type char:
#include <iostream>
#include <climits>

int main() {
    std::cout << CHAR_MIN << '\n';
    std::cout << CHAR_MAX << '\n';
}
Or using the std::numeric_limits class template:
#include <iostream>
#include <limits>

int main() {
    std::cout << static_cast<int>(std::numeric_limits<char>::min()) << '\n';
    std::cout << static_cast<int>(std::numeric_limits<char>::max()) << '\n';
}
As for the 0..255 range, that is the unsigned char type: its minimum value is 0 and its maximum is 255 on an 8-bit-char platform. The maximum can be printed out using:
std::cout << UCHAR_MAX;
Whether the type is signed or not can be checked via:
std::numeric_limits<char>::is_signed;
Excerpt from the char type reference:
char - type for character representation which can be most
efficiently processed on the target system (has the same
representation and alignment as either signed char or unsigned
char, but is always a distinct type).
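For the 0..255 behaviour the question expected, the same loop can be rerun with unsigned char; a sketch reusing the question's test (counter is the asker's variable):

#include <iostream>

int main() {
    int counter = 0;
    for (int i = -300; i <= 300; i++) {
        unsigned char c = static_cast<unsigned char>(i);
        int num = static_cast<int>(c);
        if (num == i)
            counter++;
    }
    // On an 8-bit-char platform this also prints 256, but the representable
    // values now run from 0 to 255: every negative i is "Bad".
    std::cout << counter << '\n';
}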

C++ literal integer type

Do literal expressions have types too?
long long int a = 2147483647 + 1;
long long int b = 2147483648 + 1;
std::cout << a << ',' << b; // -2147483648,2147483649
Yes, literal numbers have types. The type of an unsuffixed decimal integer literal is the first of int, long, long long in which the integer can be represented. The type of binary, hex and octal literals is selected similarly but with unsigned types in the list as well.
You can force the use of unsigned types by using a U suffix. If you use a single L in the suffix then the type will be at least long but it might be long long if it cannot be represented as a long. If you use LL, then the type must be long long (unless the implementation has extended types wider than long long).
The consequence is that if int is a 32-bit type and long is 64 bits, then 2147483647 has type int while 2147483648 has type long. That means that 2147483647+1 will overflow (which is undefined behaviour), while 2147483648+1 is simply 2147483649L.
This is defined by §2.3.12 ([lex.icon]) paragraph 2 of the C++ standard, and the above description is a summary of Table 7 from that section.
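A quick sketch to confirm the literal types with static_assert, assuming a platform where int is 32 bits and long is 64 bits (these widths are my assumption, e.g. LP64 Linux; on an LLP64 or 32-bit target the second assertion would name long long instead):

#include <type_traits>

int main() {
    // 2147483647 fits in a 32-bit int, so its type is int.
    static_assert(std::is_same<decltype(2147483647), int>::value, "int");
    // 2147483648 does not fit in int, so the type moves on to long
    // (assuming long is 64 bits wide).
    static_assert(std::is_same<decltype(2147483648), long>::value, "long");
    // An LL suffix forces long long regardless of the value.
    static_assert(std::is_same<decltype(1LL), long long>::value, "long long");
}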
It's important to remember that the type of the destination of the assignment does not influence in any way the value of the expression on the right-hand side of the assignment. If you want to force a computation to have a long long result you need to force some argument of the computation to be long long; just assigning to a long long variable isn't enough:
long long a = 2147483647 + 1LL;
std::cout << a << '\n';
produces
2147483648
int a = INT_MAX;
long long int b = a + 1; // adds 1 to a (as an int, which overflows), then converts the result to long long int
long long int c = a; ++c; // converts a to long long int, then increments the result by 1
cout << a << std::endl; // 2147483647
cout << b << std::endl; // -2147483648
cout << c << std::endl; // 2147483648
cout << 2147483647 + 1 << std::endl;    // -2147483648 (by default an integer literal is assumed to be int)
cout << 2147483647LL + 1 << std::endl;  // 2147483648 (forces the integer literal to be interpreted as a long long int)
You can find more information about integer literals here.

why does subtraction overflow with static_cast?

I understand that s1.size() - s2.size() underflows when s2 is bigger because it's subtraction of unsigned.
Why does casting one of them to int not result in signed integer subtraction?
Why does casting the whole thing give me the correct result? I expected it to evaluate what is inside the parentheses, then underflow, which would give a big number, and then the cast to int would make no difference. What am I missing?
#include <iostream>
#include <string>

using std::cout;
using std::cin;
using std::endl;
using std::string;

bool isShorter(const string &s1, const string &s2) {
    return (static_cast<int>(s1.size()) - s2.size() < 0) ? true : false; // underflows
    //return (static_cast<int>(s1.size() - s2.size()) < 0) ? true : false; // this works
}

int main() {
    string s, t;
    getline(cin, s);
    getline(cin, t);
    cout << "s: " << s << endl;
    cout << "t: " << t << endl;
    cout << "printing shorter string of the two..." << endl;
    cout << ((isShorter(s, t)) ? s : t) << endl;
}
When you do
static_cast<int>(s1.size()) - s2.size()
You convert s1.size() to an int, and then when you subtract s2.size() from it, that int is converted back to the same unsigned type as s2.size() before the subtraction is done. This means you still have unsigned integer subtraction, and since that can't ever be negative it will wrap around to a larger number. It is no different from doing s1.size() - s2.size().
You have the same thing with
static_cast<int>(s1.size() - s2.size())
With the added caveat that converting a result that doesn't fit into int is implementation-defined (it only happens to give the 'correct' negative value on typical two's-complement implementations). You are still doing unsigned integer subtraction, so if s1 is smaller than s2 you wrap around to a large number before the cast.
What you need to do is convert both s1.size() and s2.size() to a signed integer type to get signed integer subtraction. That could look like
static_cast<ptrdiff_t>(s1.size()) - static_cast<ptrdiff_t>(s2.size())
And now you will actually get a negative number if s1.size() is less than s2.size().
It should be noted that all of this can be avoided by using the less-than operator. Your function can be rewritten as
bool isShorter(const string &s1, const string &s2)
{
    return s1.size() < s2.size();
}
which, IMHO, is much easier to read and understand.
Casting "one of them" to int leaves you with arithmetic operation that mixes string::size_type and int. In this mix the unsigned type has the same rank as int or higher, which means that the unsigned type still "wins": your int is implicitly converted back to string::size_type and the calculations are performed in the domain of string::size_type. Your conversion to int is effectively ignored.
Meanwhile, casting the result to int means that you are attempting to convert a value that does not fit into int's range. The behavior in such cases is implementation defined. In real-life 2's-complement implementations it is not unusual to see a simple truncation of the representation, which produces the "correct" result. This is not a good approach though.
If you want to perform this subtraction as a signed one, you have to convert both operands to signed types, making sure that the target signed type can represent both values.
(Theoretically, you can get away with converting just one operand to signed type, but for that you'd need to choose a type that can represent the entire range of string::size_type.)
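A small sketch of the two cases described above (the sizes 3 and 5 are hypothetical stand-ins for s1.size() and s2.size()):

#include <cstddef>
#include <iostream>

int main() {
    std::size_t a = 3, b = 5;  // e.g. s1.size() and s2.size()

    // Cast only one operand: the int is converted back to the unsigned type,
    // so the subtraction is still unsigned and wraps around.
    std::cout << static_cast<int>(a) - b << '\n';  // huge positive value

    // Cast both operands: genuine signed subtraction.
    std::cout << static_cast<std::ptrdiff_t>(a)
                 - static_cast<std::ptrdiff_t>(b) << '\n';  // -2
}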

Is it more portable to use ~0 or -1 to represent a type with all bits flipped to 1?

I saw a code example today which used the following form to check against -1 for an unsigned 64-bit integer:
if (a == (uint64_t)~0)
Is there any use case where you would WANT to compare against ~0 instead of something like std::numeric_limits<uint64_t>::max() or straight up -1? The original intent was unclear to me as I'd not seen a comparison like this before.
To clarify, the comparison is checking for an error condition where the unsigned integer type will have all of its bits set to 1.
UPDATE
According to https://stackoverflow.com/a/809341/1762276, -1 does not always represent all bits flipped to 1 but ~0 does. Is this correct?
I recommend you to do it exactly as you have shown, since it is the most straight forward one. Initialize to -1 which will work always, independent of the actual sign representation, while ~ will sometimes have surprising behavior because you will have to have the right operand type. Only then you will get the most high value of an unsigned type.
I believe this error case is handled so long as ~0 is always cast to the correct type (as indicated). So this would suggest that (uint64_t)~0 is indeed a more accurate and portable representation of an unsigned type with all bits flipped?
All of the following seem to be true (GCC x86_64):
#include <iostream>
#include <cstdint>
#include <limits>

using namespace std;

int main() {
    uint64_t a = 0xFFFFFFFFFFFFFFFF;
    cout << (int)(a == -1) << endl;
    cout << (int)(a == ~0) << endl;
    cout << (int)(a == (uint64_t)-1) << endl;
    cout << (int)(a == (uint64_t)~0) << endl;
    cout << (int)(a == static_cast<uint64_t>(-1)) << endl;
    cout << (int)(a == static_cast<uint64_t>(~0)) << endl;
    cout << (int)(a == std::numeric_limits<uint64_t>::max()) << endl;
    return 0;
}
Result:
1
1
1
1
1
1
1
In general you should be casting before applying the operator, because casting to a wider unsigned type may or may not cause sign extension depending on whether the source type is signed.
If you want a value of primitive type T with all bits set, the most portable approach is ~T(0). It should work on any number-like classes as well.
As Mr. Bingley said, the types from stdint.h are guaranteed to be two's-complement, so that -T(1) will also give a value with all bits set.
The source you reference has the right thought but misses some of the details: for example, neither (T)~0u nor (T)-1u is necessarily the same as ~T(0u) or -T(1u). (To be fair, litb wasn't talking about widening in that answer you linked.)
Note that if there are no variables, just an unsuffixed literal 0 or -1, then the source type is guaranteed to be signed and none of the above concerns apply. But why write different code when dealing with literals, when the universally correct code is no more complex?
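A minimal sketch of the ~T(0) pattern described above (the helper name all_bits is mine, not from the original answer):

#include <cstdint>

// Returns a value of T with every bit set, without depending on the
// signedness or width of any intermediate type.
template <typename T>
constexpr T all_bits() {
    return ~T(0);
}

static_assert(all_bits<std::uint64_t>() == 0xFFFFFFFFFFFFFFFFull, "all ones, 64-bit");
static_assert(all_bits<std::uint8_t>() == 0xFFu, "all ones, 8-bit");

int main() {}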
std::numeric_limits<uint64_t>::max() is the same as (uint64_t)~0, which is the same as (uint64_t)-1.
Look at this example of code:
#include <iostream>
#include <stdint.h>
#include <limits>

using namespace std;

int main()
{
    bool x = false;
    cout << x << endl;
    x = std::numeric_limits<uint64_t>::max() == (uint64_t)~0;
    cout << x << endl;
    x = false;
    cout << x << endl;
    x = std::numeric_limits<uint64_t>::max() == (uint64_t)-1;
    cout << x;
}
Result:
0
1
0
1
So it's simpler to write (uint64_t)~0 or (uint64_t)-1 than std::numeric_limits<uint64_t>::max() in the code.
The fixed-width integer types like uint64_t are guaranteed to be represented in two's complement, so for those -1 and ~0 are equivalent. For the normal integer types (like int or long) this is not necessarily the case, since the C++ standard does not specify their bit representations.
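That equivalence can be checked at compile time; a sketch (note that for any unsigned destination type the conversion of -1 is modular, so the assertion holds on any conforming compiler):

#include <cstdint>

static_assert(static_cast<std::uint64_t>(-1) == ~static_cast<std::uint64_t>(0),
              "-1 and ~0 give the same all-ones uint64_t value");

int main() {}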

Reading 'unsigned int' using 'cin'

I am trying to read an unsigned int using cin as follows:
#include <limits.h>
#include <iostream>

using namespace std;

int main(int argc, char* argv[])
{
    unsigned int number;

    // UINT_MAX = 4294967295
    cout << "Please enter a number between 0 and " << UINT_MAX << ":" << endl;
    cin >> number;

    // Check if the number is a valid unsigned integer
    if ((number < 0) || ((unsigned int)number > UINT_MAX))
    {
        cout << "Invalid number." << endl;
        return -1;
    }

    return 0;
}
However, whenever I enter a value greater than the upper limit of unsigned integer (UINT_MAX), the program displays 3435973836. How do I check if the input given by user falls between 0 to UINT_MAX?
Two things:
Checking whether an unsigned integer is < 0 or > UINT_MAX is pointless, since it can never be outside that range! Your compiler probably already complains with a warning like "comparison is always false due to limited range of type".
The only solution I can think of is catching the input in a string, then using the old-fashioned strtoul(), which sets errno in case of overflow.
I.e.:
#include <stdlib.h>   // strtoul
#include <cerrno>     // errno, ERANGE
#include <climits>    // ULONG_MAX
#include <string>
#include <iostream>

unsigned long number;
std::string numbuf;
std::cin >> numbuf;
errno = 0;  // clear any stale error before calling strtoul
number = strtoul(numbuf.c_str(), 0, 10);
if (ULONG_MAX == number && ERANGE == errno)
{
    std::cerr << "Number too big!" << std::endl;
}
Note: strtoul returns an unsigned long; there's no function strtou(), returning an unsigned int.
Your check makes no sense (as a compiler with properly enabled warnings would tell you), since your value can never be below 0 and never above UINT_MAX: those are the smallest and largest values a variable of type unsigned int (which number is) can hold.
Use the stream state to determine if reading into the integer worked properly.
You could read into an unsigned long long and test that against the unsigned int limit.
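A sketch of that approach: read into unsigned long long, check the stream state, then range-check against UINT_MAX before narrowing (variable names are mine):

#include <climits>
#include <iostream>

int main() {
    unsigned long long big;
    std::cout << "Please enter a number between 0 and " << UINT_MAX << ":\n";

    if (!(std::cin >> big)) {
        std::cerr << "Not a number (or outside the unsigned long long range)." << std::endl;
        return -1;
    }
    if (big > UINT_MAX) {
        std::cerr << "Number too big for unsigned int." << std::endl;
        return -1;
    }
    unsigned int number = static_cast<unsigned int>(big);
    std::cout << "Read: " << number << std::endl;
    return 0;
}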
When users enter a number higher than UINT_MAX, cin caps it at UINT_MAX anyway. The value cannot be negative, either.
If you need to extend the range, use unsigned long long for input, and cast to unsigned int after the check. This will not guard against numbers that are outside of range of unsigned long long, though.
For a general-purpose solution, you can read a string, and do a conversion yourself using unsigned long long as your result.
If you try to read it into an unsigned int you are going to have to limit yourself to the constraints of an unsigned int.
The most general way to do what you're asking is to read the input as a string and parse it to make sure it's in the proper range. Once you have validated it, you can convert it to an unsigned int.
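A sketch of that read-as-string approach using std::stoull (names are mine; stoull throws std::invalid_argument for non-numeric input and std::out_of_range for values beyond unsigned long long, but it accepts a leading '-' and wraps, so negative input is rejected explicitly here):

#include <climits>
#include <iostream>
#include <stdexcept>
#include <string>

int main() {
    std::string line;
    std::getline(std::cin, line);

    if (!line.empty() && line[0] == '-') {
        std::cerr << "Invalid number." << std::endl;
        return -1;
    }
    try {
        unsigned long long value = std::stoull(line);
        if (value > UINT_MAX) {
            std::cerr << "Number out of unsigned int range." << std::endl;
            return -1;
        }
        unsigned int number = static_cast<unsigned int>(value);
        std::cout << "Read: " << number << std::endl;
    } catch (const std::exception&) {
        std::cerr << "Invalid number." << std::endl;
        return -1;
    }
    return 0;
}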