Consider this C++ code in VS13:
long long Number;
Number = 10000000000*900;
Number = 10000000*214;
Number = 10000000*215;
Number = (long long) 10000000*215;
When you compile it you get warning C4307: '*' : integral constant overflow for line 4. When you run the code there is effectively an integral overflow. The other lines are OK. Am I overlooking something?
The 10000000 constant (literal) is by default treated as a int constant as it fits the int type:
The type of the integer literal is the first type in which the value
can fit, from the list of types which depends on which numeric base
and which integer-suffix was used
(http://en.cppreference.com/w/cpp/language/integer_literal)
Therefore the multiplication is done within int type (no integer promotion happens). You can consider the line
Number = 10000000*215;
as if it was
const int tmp1 = 10000000;
const int tmp2 = 215;
Number = tmp1*tmp2;
Obviously this will produce overflow, and similarly the third and the last line will not produce it. For the second line, the compiler understands that 10000000000 does not fit neither int neither long, and therefore uses long long for it.
You can use the ll suffix to force a constant to be long long:
Number = 10000000ll*215;
Literal numbers are by default, understood to be of type int. In Visual Studio, that's a 32-bit integer. Multiplying two ints together results in an int.
Therefore this expression:
10000000*215 // (hex value is 0x80266580.)
is already going to be an overflowed since the expected value can't be expressed as a 32-bit positive int. The compiler will interpret it as -2144967296, which is completely different than 2150000000.
Hence, for force the expression as a 64-bit, at least one of those operands has to be 64-bit. Hence, this works nicely:
Number = 10000000LL * 215; // LL qualifies the literal number as a "long long"
It forces the whole expression (long long multiplied by int) to be treated as a long long. Hence, no overflow.
Related
I want to check if the reverse of an signed int value x lies inside INT_MAX and INT_MIN. For this, I have reversed x twice and checked if it is equal to the original x, if so then it lies inside INT_MAX and INT_MIN, else it does not.
But online compilers are giving a runtime error, but my g++ compiler is working fine and giving the correct output. Can anybody tell me the reason?
int reverse(int x) {
int tx=x,rx=0,ans;
while(tx!=0){
rx = rx+rx+rx+rx+rx+rx+rx+rx+rx+rx+tx%10;
tx/=10;
}
ans = tx = rx;
rx=0;
while(tx!=0){
rx = rx*10 + tx%10;
tx/=10;
}
while(x%10==0&&x!=0)x/=10;
//triming trailing zeros
if(rx!=x){
return 0;
}else{
return ans;
}
}
ERROR:
Line 6: Char 23: runtime error: signed integer overflow: 1929264870 + 964632435 cannot be represented in type 'int' (solution.cpp)
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior prog_joined.cpp:15:23
According to the cppreference:
Overflows
...
When signed integer arithmetic operation overflows (the result does not fit in the result type), the behavior is undefined: it may wrap around according to the rules of the representation (typically 2's complement), it may trap on some platforms or due to compiler options (e.g. -ftrapv in GCC and Clang), or may be completely optimized out by the compiler.
So the online compiler may comes with more strict checking, while your local GCC choosed to "wrap" on overflow.
Instead, if you want to wrap on overflow for sure, you may promote your operands to 64 bit width, perform the addition and then convert the result to 32 bit width again. According to cppreference this seems to be well defined after C++20.
Numeric conversions
Unlike the promotions, numeric conversions may change the values, with potential loss of precision.
Integral conversions
...
if the destination type is signed, the value does not change if the source integer can be represented in the destination type. Otherwise the result is { implementation-defined (until C++20) } / { the unique value of the destination type equal to the source value modulo 2n
where n is the number of bits used to represent the destination type. (since C++20) }
Example code on godbolt
I'm not sure what's wrong with your algorithm, but let me suggest an alternative that avoids doing math and the possibility of integer overflows.
If you want to find of if say, 2147483641 is a valid integer when reversed (e.g. 1463847412), you can do it entirely as string comparisons after converting the initial value to a string and reversing that string.
Basic algorithm for a non-negative value it this:
convert integer to string (let's call this string s)
convert INT_MAX to a string (let's call this string max_s)
reverse s. Handle leading zero's and negative chars appropriately. That is, 120 reversed is "21" and -456 reversed is "-654". The result of this conversion is a string called rev.
If rev.length() < s_max.length() Then rev is valid as an integer.
If rev.length() > s_max.length(), then rev is not valid as a integer.
If rev.size() == s_max.length(), then the reversed string is the same length as INT_MAX as a string. A lexical comparison will suffice to see if it's valid. That is isValid = (rev <= s_max) or isValid = strcmp(rev, s_max) < 1
The algorithm for a negative value is identical except replace INT_MAX with INT_MIN.
We know that -2*4^31 + 1 = -9.223.372.036.854.775.807, the lowest value you can store in long long, as being said here: What range of values can integer types store in C++.
So I have this operation:
#include <iostream>
unsigned long long pow(unsigned a, unsigned b) {
unsigned long long p = 1;
for (unsigned i = 0; i < b; i++)
p *= a;
return p;
}
int main()
{
long long nr = -pow(4, 31) + 5 -pow(4,31);
std::cout << nr << std::endl;
}
Why does it show -9.223.372.036.854.775.808 instead of -9.223.372.036.854.775.803? I'm using Visual Studio 2015.
This is a really nasty little problem which has three(!) causes.
Firstly there is a problem that floating point arithmetic is approximate. If the compiler picks a pow function returning float or double, then 4**31 is so large that 5 is less than 1ULP (unit of least precision), so adding it will do nothing (in other words, 4.0**31+5 == 4.0**31). Multiplying by -2 can be done without loss, and the result can be stored in a long long without loss as the wrong answer: -9.223.372.036.854.775.808.
Secondly, a standard header may include other standard headers, but is not required to. Evidently, Visual Studio's version of <iostream> includes <math.h> (which declares pow in the global namespace), but Code::Blocks' version doesn't.
Thirdly, the OP's pow function is not selected because he passes arguments 4, and 31, which are both of type int, and the declared function has arguments of type unsigned. Since C++11, there are lots of overloads (or a function template) of std::pow. These all return float or double (unless one of the arguments is of type long double - which doesn't apply here).
Thus an overload of std::pow will be a better match ... with a double return values, and we get floating point rounding.
Moral of the story: Don't write functions with the same name as standard library functions, unless you really know what you are doing!
Visual Studio has defined pow(double, int), which only requires a conversion of one argument, whereas your pow(unsigned, unsigned) requires conversion of both arguments unless you use pow(4U, 31U). Overloading resolution in C++ is based on the inputs - not the result type.
The lowest long long value can be obtained through numeric_limits. For long long it is:
auto lowest_ll = std::numeric_limits<long long>::lowest();
which results in:
-9223372036854775808
The pow() function that gets called is not yours hence the observed results. Change the name of the function.
The only possible explaination for the -9.223.372.036.854.775.808 result is the use of the pow function from the standard library returning a double value. In that case, the 5 will be below the precision of the double computation, and the result will be exactly -263 and converted to a long long will give 0x8000000000000000 or -9.223.372.036.854.775.808.
If you use you function returning an unsigned long long, you get a warning saying that you apply unary minus to an unsigned type and still get an ULL. So the whole operation should be executed as unsigned long long and should give without overflow 0x8000000000000005 as unsigned value. When you cast it to a signed value, the result is undefined, but all compilers I know simply use the signed integer with same representation which is -9.223.372.036.854.775.803.
But it would be simple to make the computation as signed long long without any warning by just using:
long long nr = -1 * pow(4, 31) + 5 - pow(4,31);
As a addition, you have neither undefined cast nor overflow here so the result is perfectly defined per standard provided unsigned long long is at least 64 bits.
Your first call to pow is using the C standard library's function, which operates on floating points. Try giving your pow function a unique name:
unsigned long long my_pow(unsigned a, unsigned b) {
unsigned long long p = 1;
for (unsigned i = 0; i < b; i++)
p *= a;
return p;
}
int main()
{
long long nr = -my_pow(4, 31) + 5 - my_pow(4, 31);
std::cout << nr << std::endl;
}
This code reports an error: "unary minus operator applied to unsigned type, result still unsigned". So, essentially, your original code called a floating point function, negated the value, applied some integer arithmetic to it, for which it did not have enough precision to give the answer you were looking for (at 19 digits of presicion!). To get the answer you're looking for, change the signature to:
long long my_pow(unsigned a, unsigned b);
This worked for me in MSVC++ 2013. As stated in other answers, you're getting the floating-point pow because your function expects unsigned, and receives signed integer constants. Adding U to your integers invokes your version of pow.
With regard to to those definitions found in stdint.h, I wish to test a function for converting vectors of int8_t or vectors of int64_t to vectors of std::string.
Here are my tests:
TEST(TestAlgorithms, toStringForInt8)
{
std::vector<int8_t> input = boost::assign::list_of(-128)(0)(127);
Container container(input);
EXPECT_TRUE(boost::apply_visitor(ToString(),container) == boost::assign::list_of("-128")("0")("127"));
}
TEST(TestAlgorithms, toStringForInt64)
{
std::vector<int64_t> input = boost::assign::list_of(-9223372036854775808)(0)(9223372036854775807);
Container container(input);
EXPECT_TRUE(boost::apply_visitor(ToString(),container) == boost::assign::list_of("-9223372036854775808")("0")("9223372036854775807"));
}
However, I am getting a warning in visual studio for the line:
std::vector<int64_t> input = boost::assign::list_of(-9223372036854775808)(0)(9223372036854775807);
as follows:
warning C4146: unary minus operator applied to unsigned type, result still unsigned
If I change -9223372036854775808 to -9223372036854775807, the warning disappears.
What is the issue here? With regard to my original code, the test is passing.
It's like the compiler says, -9223372036854775808 is not a valid number because the - and the digits are treated separately.
You could try -9223372036854775807 - 1 or use std::numeric_limits<int64_t>::min() instead.
The issue is that integer literals are not negative; so -42 is not a literal with a negative value, but rather the - operator applied to the literal 42.
In this case, 9223372036854775808 is out of the range of int64_t, so it will be given an unsigned type. Due to the magic of modular arithmetic, you can still negate it, assign it to int64_t, and end up with the result you expect; but the compiler will warn you about the unsigned negation (if you tell it to) since that can often be the result of an error.
You could avoid the warning (and make the code more obviously correct) by using std::numeric_limits<int64_t>::min() instead.
Change -9223372036854775808 to -9223372036854775807-1.
The issue is that -9223372036854775808 isn't -9223372036854775808 but rather -(9223372036854775808) and 9223372036854775808 cannot fit into a signed 64-bit type (decimal integer constants by default are a signed type), so it instead becomes unsigned. Applying negation with - to an unsigned type is suspicious, hence the warning.
In the code below, why 1-byte anUChar is automatically converted into 4 bytes to produce the desired result 0x300 (instead of 0x0 if anUChar would remain 1 byte in size):
unsigned char anUChar = 0xc0; // only the two most significant bits are set
int anInt = anUChar << 2; // 0x300 (correct)
But in this code, aimed at a 64-bit result, no automatic conversion into 8 bytes happens:
unsigned int anUInt = 0xc0000000; // only the two most significant bits are set
long long aLongLong = anUInt << 2; // 0x0 (wrong, means no conversion occurred)
And only placing an explicit type cast works:
unsigned int anUInt = 0xc0000000;
long long aLongLong = (long long)anUInt << 2; // 0x300000000 (correct)
And most importantly, would this behavior be the same in a program that targets 64-bit machines?
By the way, which of the two is most right and portable: (type)var << 1 or ((type)var) << 1?
char always gets promoted to int during arithmetic. I think this is specified behavior in the C standard.
However, int is not automatically promoted to long long.
Under some situations, some compilers (Visual Studio) will actually warn you about this if you try to left-shift a smaller integer and store it into a larger one.
By the way, which of the two is most right and portable: (type)var <<
1 or ((type)var) << 1?
Both are fine and portable. Though I prefer the first one since it's shorter. Casting has higher precedence than shift.
Conversion does happen. The problem is the result of the expression anUInt << 2 is an unsigned int because anUInt is an unsigned int.
Casting anUInt to a long long (actually, this is conversion in this particular case) is the correct thing to do.
Neither (type)var << 1 or ((type)var) << 1 is more correct or portable because operator precedence is strictly defined by the Standard. However, the latter is probably better because it's easier to understand to humans looking at the code casually. Others may disagree with this assertion.
EDIT:
Note that in your first example:
unsigned char anUChar = 0xc0;
int anInt = anUChar << 2;
...the result of the expression anUChar << 2 is not an unsigned char as you might expect, but an int because of Integral Promotion.
The operands of operator<< are integral or enumeration type (See Standard 5.8/1). When a binary operator that expects operands of arithmetic or enumeration type is called, the compiler attempts to convert both operands to the same type, so that the expression may yield a common type. In this case, integral promotion is performed on both operands (5/9). When an unsigned char takes part in integral promotion, it will be converted to an int if your platform can accomodate all possible values of unsigned char in an int, else it will be converted to an unsigned int (4.5/1).
Shorter integral types are promoted to an int type for bitshift operations. This has nothing to do with the type to which you assign the result of the shift.
On 64-bit machines, your second piece of code would be equally problematic since the int types are usually also 32 bit wide. (Between x86 and x64, long long int is typically always 64 and int 32 bits, only long int depends on the platform.)
(In the spirit of C++, I would write the conversion as (unsigned long long int)(anUInt) << 2, evocative of the conversion-constructor syntax. The first set of parentheses is purely because the type name consists of several tokens.)
I would also prefer to do bitshifting exclusively on unsigned types, because only unsigned types can be considered equivalent (in terms of values) to their own bit pattern value.
Because of integer promotions. For most operators (e.g. <<), char operands are promoted to int first.
This has nothing to do with where the result of the calculation is going. In other words, the fact that your second example assigns to a long long does not affect the promotion of the operands.
In case of integer overflows what is the result of (unsigned int) * (int) ? unsigned or int? What type does the array index operator (operator[]) take for char*: int, unsigned int or something else?
I was auditing the following function, and suddenly this question arose. The function has a vulnerability at line 17.
// Create a character array and initialize it with init[]
// repeatedly. The size of this character array is specified by
// w*h.
char *function4(unsigned int w, unsigned int h, char *init)
{
char *buf;
int i;
if (w*h > 4096)
return (NULL);
buf = (char *)malloc(4096+1);
if (!buf)
return (NULL);
for (i=0; i<h; i++)
memcpy(&buf[i*w], init, w); // line 17
buf[4096] = '\0';
return buf;
}
Consider both w and h are very large unsigned integers. The multiplication at line 9 have a chance to pass the validation.
Now the problem is at line 17. Multiply int i with unsigned int w: if the result is int, it is possible that the product is negative, resulting in accessing a position that is before buf. If the result is unsigned int, the product will always be positive, resulting in accessing a position that is after buf.
It's hard to write code to justify this: int is too large. Does anyone has ideas on this?
Is there any documentation that specifies the type of the product? I have searched for it, but so far haven't found anything.
I suppose that as far as the vulnerability is concerned, whether (unsigned int) * (int) produces unsigned int or int doesn't matter, because in the compiled object file, they are just bytes. The following code works the same no matter the type of the product:
unsigned int x = 10;
int y = -10;
printf("%d\n", x * y); // print x * y in signed integer
printf("%u\n", x * y); // print x * y in unsigned integer
Therefore, it does not matter what type the multiplication returns. It matters that whether the consumer function takes int or unsigned.
The question here is not how bad the function is, or how to improve the function to make it better. The function undoubtedly has a vulnerability. The question is about the exact behavior of the function, based on the prescribed behavior from the standards.
do the w*h calculation in long long, check if bigger than MAX_UINT
EDIT : alternative : if overflown (w*h)/h != w (is this always the case ?! should be, right ?)
To answer your question: the type of an expression multiplying an int and an unsigned int will be an unsigned int in C/C++.
To answer your implied question, one decent way to deal with possible overflow in integer arithmetic is to use the "IntSafe" set of routines from Microsoft:
http://blogs.msdn.com/michael_howard/archive/2006/02/02/523392.aspx
It's available in the SDK and contains inline implementations so you can study what they're doing if you're on another platform.
Ensure that w * h doesn't overflow by limiting w and h.
The type of w*i is unsigned in your case. If I read the standard correctly, the rule is that the operands are converted to the larger type (with its signedness), or unsigned type corresponding to the signed type (which is unsigned int in your case).
However, even if it's unsigned, it doesn't prevent the wraparound (writing to memory before buf), because it might be the case (on i386 platform, it is), that p[-1] is the same as p[-1u]. Anyway, in your case, both buf[-1] and buf[big unsigned number] would be undefined behavior, so the signed/unsigned question is not that important.
Note that signed/unsigned matters in other contexts - eg. (int)(x*y/2) gives different results depending on the types of x and y, even in the absence of undefined behaviour.
I would solve your problem by checking for overflow on line 9; since 4096 is a pretty small constant and 4096*4096 doesn't overflow on most architectures (you need to check), I'd do
if (w>4096 || h>4096 || w*h > 4096)
return (NULL);
This leaves out the case when w or h are 0, you might want to check for it if needed.
In general, you could check for overflow like this:
if(w*h > 4096 || (w*h)/w!=h || (w*h)%w!=0)
In C/C++ the p[n] notation is really a shortcut to writting *(p+n), and this pointer arithmetic takes into account the sign. So p[-1] is valid and refers to the value immediately before *p.
So the sign really matters here, the result of arithmetic operator with integer follow a set of rules defined by the standard, and this is called integer promotions.
Check out this page: INT02-C. Understand integer conversion rules
2 changes make it safer:
if (w >= 4096 || h >= 4096 || w*h > 4096) return NULL;
...
unsigned i;
Note also that it's not less a bad idea to write to or read from past the buffer end. So the question is not whether iw may become negative, but whether 0 <= ih +w <= 4096 holds.
So it's not the type that matters, but the result of h*i.
For example, it doesn't make a difference whether this is (unsigned)0x80000000 or (int)0x80000000, the program will seg-fault anyway.
For C, refer to "Usual arithmetic conversions" (C99: Section 6.3.1.8, ANSI C K&R A6.5) for details on how the operands of the mathematical operators are treated.
In your example the following rules apply:
C99:
Otherwise, if the type of the operand
with signed integer type can represent
all of the values of the type of the
operand with unsigned integer type,
then the operand with unsigned integer
type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted
to the unsigned integer type
corresponding to the type of the
operand with signed integer type.
ANSI C:
Otherwise, if either operand is unsigned int, the other is converted to unsigned int.
Why not just declare i as unsigned int? Then the problem goes away.
In any case, i*w is guaranteed to be <= 4096, as the code tests for this, so it's never going to overflow.
memcpy(&buf[iw > -1 ? iw < 4097? iw : 0 : 0], init, w);
I don't think the triple calculation of iw does degrade the perfomance)
w*h could overflow if w and/or h are sufficiently large and the following validation could pass.
9. if (w*h > 4096)
10. return (NULL);
On int , unsigned int mixed operations, int is elevated to unsigned int, in which case, a negative value of 'i' would become a large positive value. In that case
&buf[i*w]
would be accessing a out of bound value.
Unsigned arithmetic is done as modular (or wrap-around), so the product of two large unsigned ints can easily be less than 4096. The multiplication of int and unsigned int will result in an unsigned int (see section 4.5 of the C++ standard).
Therefore, given large w and a suitable value of h, you can indeed get into trouble.
Making sure integer arithmetic doesn't overflow is difficult. One easy way is to convert to floating-point and doing a floating-point multiplication, and seeing if the result is at all reasonable. As qwerty suggested, long long would be usable, if available on your implementation. (It's a common extension in C90 and C++, does exist in C99, and will be in C++0x.)
There are 3 paragraphs in the current C1X draft on calculating (UNSIGNED TYPE1) X (SIGNED TYPE2) in 6.3.1.8 Usual arithmetic coversions, N1494,
WG 14: C - Project status and milestones
Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
Otherwise, if the type of the operand with signed integer type can represent
all of the values of the type of the operand with unsigned integer type, then
the operand with unsigned integer type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
So if a is unsigned int and b is int, parsing of (a * b) should generate code (a * (unsigned int)b). Will overflow if b < 0 or a * b > UINT_MAX.
If a is unsigned int and b is long of greater size, (a * b) should generate ((long)a * (long)b). Will overflow if a * b > LONG_MAX or a * b < LONG_MIN.
If a is unsigned int and b is long of the same size, (a * b) should generate ((unsigned long)a * (unsigned long)b). Will overflow if b < 0 or a * b > ULONG_MAX.
On your second question about the type expected by "indexer", the answer appears "integer type" which allows for any (signed) integer index.
6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type ‘‘pointer to complete object type’’, the other
expression shall have integer type, and the result has type ‘‘type’’.
Semantics
2 A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
It is up to the compiler to perform static analysis and warn the developer about possibility of buffer overrun when the pointer expression is an array variable and the index may be negative. Same goes about warning on possible array size overruns even when the index is positive or unsigned.
To actually answer your question, without specifying the hardware you're running on, you don't know, and in code intended to be portable, you shouldn't depend on any particular behavior.