(-2147483648 > 0) returns true in C++?

-2147483648 is the smallest value of a 32-bit signed integer type, but it seems to overflow in this if (...) statement:
if (-2147483648 > 0)
    std::cout << "true";
else
    std::cout << "false";
This will print true in my testing. However, if we cast -2147483648 to int, the result is different:
if (int(-2147483648) > 0)
    std::cout << "true";
else
    std::cout << "false";
This will print false.
I'm confused. Can anyone give an explanation on this?
Update 02-05-2012:
Thanks for your comments. In my compiler the size of int is 4 bytes; I'm using VC for some simple testing. I've updated the description in my question.
There are a lot of very good replies in this post. AndreyT gave a very detailed explanation of how the compiler behaves on such input and how this minimum integer was implemented. qPCR4vir, on the other hand, gave some related "curiosities" and how integers are represented. So impressive!

-2147483648 is not a "number". C++ language does not support negative literal values.
-2147483648 is actually an expression: a positive literal value 2147483648 with unary - operator in front of it. Value 2147483648 is apparently too large for the positive side of int range on your platform. If type long int had greater range on your platform, the compiler would have to automatically assume that 2147483648 has long int type. (In C++11 the compiler would also have to consider long long int type.) This would make the compiler to evaluate -2147483648 in the domain of larger type and the result would be negative, as one would expect.
However, apparently in your case the range of long int is the same as range of int, and in general there's no integer type with greater range than int on your platform. This formally means that positive constant 2147483648 overflows all available signed integer types, which in turn means that the behavior of your program is undefined. (It is a bit strange that the language specification opts for undefined behavior in such cases, instead of requiring a diagnostic message, but that's the way it is.)
In practice, taking into account that the behavior is undefined, 2147483648 might get interpreted as some implementation-dependent negative value which happens to turn positive after having unary - applied to it. Alternatively, some implementations might decide to attempt using unsigned types to represent the value (for example, in C89/90 compilers were required to use unsigned long int, but not in C99 or C++). Implementations are allowed to do anything, since the behavior is undefined anyway.
As a side note, this is the reason why constants like INT_MIN are typically defined as
#define INT_MIN (-2147483647 - 1)
instead of the seemingly more straightforward
#define INT_MIN -2147483648
The latter would not work as intended.
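To make this concrete, here is a minimal sketch, assuming a C++11 compiler with a 32-bit int and a wider long or long long (the typical modern case, unlike the older VC behavior described above): the unsuffixed literal 2147483648 never has type int, while (-2147483647 - 1) stays entirely within int.
#include <climits>
#include <type_traits>

static_assert(INT_MAX == 2147483647, "this sketch assumes a 32-bit int");

// 2147483647 fits in int, so negating it and subtracting 1 is an int expression.
static_assert(std::is_same<decltype(-2147483647 - 1), int>::value,
              "(-2147483647 - 1) has type int");

// 2147483648 does not fit in int; per [lex.icon] it takes the first wider type
// that can represent it (long or long long), so -2147483648 is not an int expression.
static_assert(!std::is_same<decltype(-2147483648), int>::value,
              "-2147483648 is not an int expression");

int main() { return 0; }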

The compiler (VC2012) promotes the literal to the "minimum" integer type that can hold the value. In the first case, signed int (and long int) cannot hold 2147483648 (before the sign is applied), but unsigned int can: 2147483648 gets type unsigned int.
In the second case you force a conversion to int from that unsigned value.
const bool i= (-2147483648 > 0) ; // --> true
warning C4146: unary minus operator applied to unsigned type, result still unsigned
Here are related "curiosities":
const bool b= (-2147483647 > 0) ; // false
const bool i= (-2147483648 > 0) ; // true : result still unsigned
const bool c= ( INT_MIN-1 > 0) ; // true :'-' int constant overflow
const bool f= ( 2147483647 > 0) ; // true
const bool g= ( 2147483648 > 0) ; // true
const bool d= ( INT_MAX+1 > 0) ; // false:'+' int constant overflow
const bool j= ( int(-2147483648)> 0) ; // false :
const bool h= ( int(2147483648) > 0) ; // false
const bool m= (-2147483648L > 0) ; // true
const bool o= (-2147483648LL > 0) ; // false
C++11 standard:
2.14.2 Integer literals [lex.icon]
…
An integer literal is a sequence of digits that has no period or
exponent part. An integer literal may have a prefix that specifies its
base and a suffix that specifies its type.
…
The type of an integer literal is the first of the corresponding list
in which its value can be represented.
If an integer literal cannot be represented by any type in its list
and an extended integer type (3.9.1) can represent its value, it may
have that extended integer type. If all of the types in the list for
the literal are signed, the extended integer type shall be signed. If
all of the types in the list for the literal are unsigned, the
extended integer type shall be unsigned. If the list contains both
signed and unsigned types, the extended integer type may be signed or
unsigned. A program is ill-formed if one of its translation units
contains an integer literal that cannot be represented by any of the
allowed types.
And these are the promotions rules for integers in the standard.
4.5 Integral promotions [conv.prom]
A prvalue of an integer type other than bool, char16_t, char32_t, or
wchar_t whose integer conversion rank (4.13) is less than the rank of
int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be
converted to a prvalue of type unsigned int.
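A small compile-time illustration of the [conv.prom] rule just quoted, assuming the usual case where int is wider than short (so int can represent every unsigned short value):
#include <type_traits>

unsigned short us = 1;
unsigned int ui = 1;

// Unary + applies the integral promotions: unsigned short promotes to int...
static_assert(std::is_same<decltype(+us), int>::value,
              "unsigned short promotes to int when int can represent all its values");

// ...but unsigned int has the same rank as int, so it is not promoted and stays unsigned,
// which is why mixed int / unsigned int expressions end up unsigned.
static_assert(std::is_same<decltype(+ui), unsigned int>::value,
              "unsigned int stays unsigned int");

int main() { return 0; }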

In short, 2147483648 overflows to -2147483648, and (-(-2147483648) > 0) is true.
In binary, 2147483648 is 1000 0000 0000 0000 0000 0000 0000 0000 (only the most significant bit set).
In addition, in the case of signed binary calculations, the most significant bit ("MSB") is the sign bit. This question may help explain why.

Because -2147483648 is actually 2147483648 with negation (-) applied to it, the number isn't what you'd expect. It is actually the equivalent of this pseudocode: operator -(2147483648)
Now, assuming your compiler has sizeof(int) equal to 4 and CHAR_BIT defined as 8, 2147483648 overflows the maximum signed value of an integer (2147483647). So what is the maximum plus one? Let's work that out with a 4-bit, two's complement integer: the maximum is 7 (0111), and 7 + 1 is 8 (1000).
Wait! 8 overflows our 4-bit signed integer! What do we do? Keep the unsigned bit pattern 1000 and interpret it as a signed integer, which gives -8. Applying two's complement negation to -8 wraps back to the same pattern, which read as an unsigned value is 8 and, as we all know, is greater than 0.
This is why <limits.h> (and <climits>) commonly define INT_MIN as ((-2147483647) - 1) - so that the maximum signed integer (0x7FFFFFFF) is negated (0x80000001), then decremented (0x80000000).
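The 4-bit walk-through above can be reproduced in code; this is just a sketch of the arithmetic, done with a full-width unsigned type and masked to 4 bits so every step is well defined:
#include <cstdio>

int main() {
    // 4-bit two's complement: 0111 is +7 (the maximum), 1000 is -8 (the minimum).
    unsigned int pattern = 0x8u;                   // the bit pattern 1000, i.e. -8 as a signed 4-bit value
    unsigned int negated = (~pattern + 1u) & 0xFu; // two's complement negation, kept to 4 bits

    // Negating 1000 gives 1000 again; read as an unsigned value that is 8,
    // which is how "-(-8)" ends up looking like a positive number.
    std::printf("%u\n", negated);                  // prints 8
    return 0;
}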

Related

Vector size comparison with integer for non-zero vector size [duplicate]

See this code snippet:
#include <stdio.h>

int main()
{
    unsigned int a = 1000;
    int b = -1;
    if (a > b) printf("A is BIG! %d\n", a - b);
    else printf("a is SMALL! %d\n", a - b);
    return 0;
}
This gives the output: a is SMALL! 1001
I don't understand what's happening here. How does the > operator work here? Why is "a" smaller than "b"? If it is indeed smaller, why do I get a positive number (1001) as the difference?
Binary operations between different integral types are performed within a "common" type defined by so called usual arithmetic conversions (see the language specification, 6.3.1.8). In your case the "common" type is unsigned int. This means that int operand (your b) will get converted to unsigned int before the comparison, as well as for the purpose of performing subtraction.
When -1 is converted to unsigned int the result is the maximal possible unsigned int value (same as UINT_MAX). Needless to say, it is going to be greater than your unsigned 1000 value, meaning that a > b is indeed false and a is indeed small compared to (unsigned) b. The if in your code should resolve to else branch, which is what you observed in your experiment.
The same conversion rules apply to subtraction. Your a-b is really interpreted as a - (unsigned) b and the result has type unsigned int. Such a value cannot be printed with the %d format specifier, since %d only works with signed values. Your attempt to print it with %d results in undefined behavior, so the value that you see printed (even though it has a logical deterministic explanation in practice) is completely meaningless from the point of view of the C language.
Edit: Actually, I could be wrong about the undefined behavior part. According to C language specification, the common part of the range of the corresponding signed and unsigned integer type shall have identical representation (implying, according to the footnote 31, "interchangeability as arguments to functions"). So, the result of a - b expression is unsigned 1001 as described above, and unless I'm missing something, it is legal to print this specific unsigned value with %d specifier, since it falls within the positive range of int. Printing (unsigned) INT_MAX + 1 with %d would be undefined, but 1001u is fine.
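A sketch of that explanation, with the subtraction result printed as unsigned so the output matches the types actually involved:
#include <cstdio>

int main() {
    unsigned int a = 1000;
    int b = -1;

    // b is converted to unsigned int (UINT_MAX) for both the comparison and the subtraction.
    if (a > b)
        std::printf("A is BIG! %u\n", a - b);
    else
        std::printf("a is SMALL! %u\n", a - b);   // prints: a is SMALL! 1001
    return 0;
}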
On a typical implementation where int is 32-bit, -1 when converted to an unsigned int is 4,294,967,295 which is indeed ≥ 1000.
Even if you carry out the subtraction in the unsigned world, 1000 - 4,294,967,295 = -4,294,966,295, which wraps modulo 2^32 to 1,001, which is what you get.
That's why gcc will spit a warning when you compare unsigned with signed. (If you don't see a warning, pass the -Wsign-compare flag.)
You are doing unsigned comparison, i.e. comparing 1000 to 2^32 - 1.
The output is signed because of %d in printf.
N.B. sometimes the behavior when you mix signed and unsigned operands is compiler-specific. I think it's best to avoid them and do casts when in doubt.
#include <stdio.h>

int main()
{
    int a = 1000;
    signed int b = -1, c = -2;
    printf("%u\n", (unsigned int)b);
    printf("%u\n", (unsigned int)c);
    printf("%u\n", (unsigned int)a);
    if (1000 > -1) {
        printf("\ntrue");
    }
    else
        printf("\nfalse");
    return 0;
}
For this you need to understand the usual arithmetic conversions rather than operator precedence.
In the question's comparison a > b, one operand is unsigned int and the other is int, so the int operand is converted to unsigned int before the comparison.
The unsigned type cannot represent -1, so -1 changes into a very big number (UINT_MAX).
Note that a comparison between two plain int constants, such as if (1000 > -1), involves no such conversion and is simply true.
An easy way to compare, maybe useful when you cannot get rid of the unsigned declaration (for example, [NSArray count]): just force the "unsigned int" to an "int".
Please correct me if I am wrong.
if (((int)a)>b) {
....
}
The hardware is designed to compare signed to signed and unsigned to unsigned.
If you want the arithmetic result, convert the unsigned value to a larger signed type first. Otherwise the compiler will assume that the comparison is really between unsigned values.
And -1 is represented as 1111...1111, so it is a very big quantity when interpreted as unsigned: the biggest.
When comparing a > b, where a is an unsigned int and b is an int, b is converted to unsigned int, so the signed value -1 becomes the maximum unsigned value (range: 0 to 2^32 - 1).
Thus a > b, i.e. (1000 > 4294967295), is false, and the else branch printf("a is SMALL! %d\n", a-b); is executed.

C++ can not calculate a formula with a vector's size in it?

#include <cstdio>
#include <vector>
using std::vector;

int main() {
    vector<int> v;
    if (0 < v.size() - 1) {
        printf("true");
    } else {
        printf("false");
    }
}
It prints true, which suggests that 0 < -1.
std::vector::size() returns an unsigned integer. If it is 0 and you subtract 1, it underflows and becomes a huge value (specifically std::numeric_limits<std::vector<int>::size_type>::max()). The comparison works fine, but the subtraction produces a value you did not expect.
For more about unsigned underflow (and overflow), see: C++ underflow and overflow
The simplest fix for your code is probably if (1 < v.size()).
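For example, a sketch of the rewritten test alongside the original one:
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;                        // empty, so v.size() == 0

    if (0 < v.size() - 1)                      // size() - 1 wraps to a huge unsigned value
        std::printf("buggy test says true\n");

    if (1 < v.size())                          // no subtraction, so no wrap-around
        std::printf("fixed test says true\n");
    else
        std::printf("fixed test says false\n"); // this is what prints for an empty vector
    return 0;
}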
v.size() returns a result of size_t, which is an unsigned type. An unsigned value minus 1 is still unsigned. And all non-zero unsigned values are greater than zero.
std::vector<int>::size() returns type size_t which is an unsigned type whose rank is usually at least that of int.
When, in a math operation, you put together a signed type with an unsigned type and the unsigned type doesn't have a lower rank, the signed type will get converted to the unsigned type (see 6.3.1.8 Usual arithmetic conversions (I'm linking to the C standard, but rules for integer arithmetic are foundational and need to be common to both languages)).
In other words, assuming that size_t isn't unsigned char or unsigned short
(it's usually unsigned long and the C standard recommends it shouldn't be unsigned long long unless necessary)
(size_t)0 - 1
gets implicitly translated to
(size_t)0 - (size_t)1
which is a positive number equal to SIZE_MAX (-1 cannot be represented in an unsigned type, so it gets converted, formally by "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type" (6.3.1.3p2)).
0 is always less than SIZE_MAX.
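A compile-time check of that statement, assuming only that size_t is not narrower than int (true on any common platform):
#include <cstddef>
#include <cstdint>

static_assert(static_cast<std::size_t>(0) - 1 == SIZE_MAX,
              "(size_t)0 - 1 wraps to SIZE_MAX");
static_assert(static_cast<std::size_t>(-1) == SIZE_MAX,
              "-1 converted to size_t is SIZE_MAX");

int main() { return 0; }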

c++ illogical >= comparison when dealing with vector.size() most likely due to size_type being unsigned

I could use a little help clarifying this strange comparison when dealing with vector.size() aka size_type
vector<cv::Mat> rebuiltFaces;
int rebuildIndex = 1;
cout << "rebuiltFaces size is " << rebuiltFaces.size() << endl;
while( rebuildIndex >= rebuiltFaces.size() ) {
    cout << (rebuildIndex >= rebuiltFaces.size()) << " , " << rebuildIndex
         << " >= " << rebuiltFaces.size() << endl;
    --rebuildIndex;
}
And what I get out of the console is
rebuiltFaces size is 0
1 , 1 >= 0
1 , 0 >= 0
1 , -1 >= 0
1 , -2 >= 0
1 , -3 >= 0
If I had to guess I would say the compiler is blindly casting rebuildIndex to unsigned and the +/- is causing things to behave oddly, but I'm really not sure. Does anyone know?
As others have pointed out, this is due to the somewhat
counter-intuitive rules C++ applies when comparing values with different
signedness; the standard requires the compiler to convert both values to
unsigned. For this reason, it's generally considered best practice to
avoid unsigned unless you're doing bit manipulations (where the actual
numeric value is irrelevant). Regretfully, the standard containers
don't follow this best practice.
If you somehow know that the size of the vector can never overflow
int, then you can just cast the results of std::vector<>::size() to
int and be done with it. This is not without danger, however; as Mark
Twain said: "It's not what you don't know that kills you, it's what you
know for sure that ain't true." If there are no validations when
inserting into the vector, then a safer test would be:
while ( rebuiltFaces.size() <= INT_MAX
        && rebuildIndex >= (int)rebuiltFaces.size() )
Or if you really don't expect the case, and are prepared to abort if it
occurs, design (or find) a checked_cast function, and use it.
On any modern computer that I can think of, signed integers are represented as two's complement. 32-bit int max is 0x7fffffff, and int min is 0x80000000, this makes adding easy when the value is negative. The system works so that 0xffffffff is -1, and adding one to that causes the bits to all roll over and equal zero. It's a very efficient thing to implement in hardware.
When the number is cast from a signed value to an unsigned value the bits stored in the register don't change. This makes a barely negative value like -1 into a huge unsigned number (unsigned max), and this would make that loop run for a long time if the code inside didn't do something that would crash the program by accessing memory it shouldn't.
It's all perfectly logical, just not necessarily the logic you expected.
Example...
$ cat foo.c
#include <stdio.h>
int main (int a, char** v) {
unsigned int foo = 1;
int bar = -1;
if(foo < bar) printf("wat\n");
return 0;
}
$ gcc -o foo foo.c
$ ./foo
wat
$
In the C and C++ languages, when the unsigned type has the same or greater width than the signed type, mixed signed/unsigned comparisons are performed in the domain of the unsigned type. The signed value is implicitly converted to the unsigned type. There's nothing about the "compiler" doing anything "blindly" here. It has been like that in C and C++ since the beginning of time.
This is what happens in your example. Your rebuildIndex is implicitly converted to vector<cv::Mat>::size_type. I.e. this
rebuildIndex >= rebuiltFaces.size()
is actually interpreted as
(vector<cv::Mat>::size_type) rebuildIndex >= rebuiltFaces.size()
When signed value are converted to unsigned type, the conversion is performed in accordance with the rules of modulo arithmetic, which is a well-known fundamental principle behind unsigned arithmetic in C and C++.
Again, all this is required by the language, it has absolutely nothing to do with how numbers are represented in the machine etc and which bits are stored where.
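A compile-time restatement of that modulo rule, independent of how negative numbers happen to be represented in hardware:
#include <climits>

static_assert(static_cast<unsigned int>(-1) == UINT_MAX,
              "-1 converts to UINT_MAX (i.e. -1 reduced modulo 2^N)");
static_assert(static_cast<unsigned int>(-2) == UINT_MAX - 1,
              "-2 converts to UINT_MAX - 1");

int main() { return 0; }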
Regardless of the underlying representation (two's complement being the most popular, but one's complement and sign magnitude are others), if you cast -1 to an unsigned type, you will get the largest number that can be represented in that type.
The reason is that unsigned 'overflow' behavior is strictly defined as converting the value to a number between 0 and the maximum value of that type by way of modulo arithmetic. Essentially, if the value is larger than the largest value, you repeatedly subtract one more than the maximum value (i.e. 2^N) until your value is in range. If your value is smaller than the smallest value (0), you repeatedly add one more than the maximum value until it's in range. So if we assume a 32-bit size_t, you start with -1, which is less than 0. Therefore, you add 2^32, giving you 2^32 - 1, which is in range, so that's your final value.
Roughly speaking, C++ defines promotion rules like this: any type of char or short is first promoted to int, regardless of signedness. Smaller types in a comparison are promoted up to the larger type in the comparison. If two types are the same size, but one is signed and one is unsigned, then the signed type is converted to unsigned. What is happening here is that your rebuildIndex is being converted up to the unsigned size_t. 1 is converted to 1u, 0 is converted to 0u, and -1 is converted to -1u, which when cast to an unsigned type is the largest value of type size_t.

In case of integer overflows what is the result of (unsigned int) * (int) ? unsigned or int?

In case of integer overflows what is the result of (unsigned int) * (int) ? unsigned or int? What type does the array index operator (operator[]) take for char*: int, unsigned int or something else?
I was auditing the following function, and suddenly this question arose. The function has a vulnerability at line 17.
#include <stdlib.h>   // for malloc (added so the snippet is self-contained)
#include <string.h>   // for memcpy

// Create a character array and initialize it with init[]
// repeatedly. The size of this character array is specified by
// w*h.
char *function4(unsigned int w, unsigned int h, char *init)
{
    char *buf;
    int i;
    if (w*h > 4096)                  // line 9
        return (NULL);
    buf = (char *)malloc(4096+1);
    if (!buf)
        return (NULL);
    for (i=0; i<h; i++)
        memcpy(&buf[i*w], init, w);  // line 17
    buf[4096] = '\0';
    return buf;
}
Suppose both w and h are very large unsigned integers. The multiplication at line 9 then has a chance to wrap around and pass the validation.
Now the problem is at line 17. Multiply int i with unsigned int w: if the result is int, it is possible that the product is negative, resulting in accessing a position that is before buf. If the result is unsigned int, the product will always be positive, resulting in accessing a position that is after buf.
It's hard to write code to demonstrate this, because int is so large. Does anyone have ideas on this?
Is there any documentation that specifies the type of the product? I have searched for it, but so far haven't found anything.
I suppose that as far as the vulnerability is concerned, whether (unsigned int) * (int) produces unsigned int or int doesn't matter, because in the compiled object file, they are just bytes. The following code works the same no matter the type of the product:
unsigned int x = 10;
int y = -10;
printf("%d\n", x * y); // print x * y in signed integer
printf("%u\n", x * y); // print x * y in unsigned integer
Therefore, it does not matter what type the multiplication returns. What matters is whether the consumer function takes int or unsigned.
The question here is not how bad the function is, or how to improve the function to make it better. The function undoubtedly has a vulnerability. The question is about the exact behavior of the function, based on the prescribed behavior from the standards.
Do the w*h calculation in long long and check whether it is bigger than UINT_MAX.
EDIT: alternative: if it overflowed, (w*h)/h != w (is this always the case?! It should be, right?)
To answer your question: the type of an expression multiplying an int and an unsigned int will be an unsigned int in C/C++.
To answer your implied question, one decent way to deal with possible overflow in integer arithmetic is to use the "IntSafe" set of routines from Microsoft:
http://blogs.msdn.com/michael_howard/archive/2006/02/02/523392.aspx
It's available in the SDK and contains inline implementations so you can study what they're doing if you're on another platform.
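If IntSafe is not available, the same idea can be sketched portably; this is just one way to do it, not the IntSafe API:
#include <limits>

// Returns false instead of letting a*b wrap around modulo 2^N.
bool checked_mul(unsigned int a, unsigned int b, unsigned int* out) {
    if (a != 0 && b > std::numeric_limits<unsigned int>::max() / a)
        return false;
    *out = a * b;
    return true;
}

// In function4 this could replace the unchecked test:
//   unsigned int total;
//   if (!checked_mul(w, h, &total) || total > 4096)
//       return NULL;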
Ensure that w * h doesn't overflow by limiting w and h.
The type of w*i is unsigned in your case. If I read the standard correctly, the rule is that the operands are converted to the larger type (with its signedness), or unsigned type corresponding to the signed type (which is unsigned int in your case).
However, even if it's unsigned, it doesn't prevent the wraparound (writing to memory before buf), because it might be the case (on i386 platform, it is), that p[-1] is the same as p[-1u]. Anyway, in your case, both buf[-1] and buf[big unsigned number] would be undefined behavior, so the signed/unsigned question is not that important.
Note that signed/unsigned matters in other contexts - eg. (int)(x*y/2) gives different results depending on the types of x and y, even in the absence of undefined behaviour.
I would solve your problem by checking for overflow on line 9; since 4096 is a pretty small constant and 4096*4096 doesn't overflow on most architectures (you need to check), I'd do
if (w>4096 || h>4096 || w*h > 4096)
return (NULL);
This leaves out the case when w or h are 0, you might want to check for it if needed.
In general, you could check for overflow like this:
if(w*h > 4096 || (w*h)/w!=h || (w*h)%w!=0)
In C/C++ the p[n] notation is really a shortcut for writing *(p+n), and this pointer arithmetic takes the sign into account. So p[-1] is valid and refers to the value immediately before *p.
So the sign really matters here; the results of arithmetic operators on integers follow a set of rules defined by the standard: the integer promotions and conversions.
Check out this page: INT02-C. Understand integer conversion rules
2 changes make it safer:
if (w >= 4096 || h >= 4096 || w*h > 4096) return NULL;
...
unsigned i;
Note also that it's no less a bad idea to write to or read from past the buffer's end. So the question is not whether i*w may become negative, but whether 0 <= i*w and i*w + w <= 4096 hold.
So it's not the type that matters, but the value of i*w.
For example, it doesn't make a difference whether this is (unsigned)0x80000000 or (int)0x80000000, the program will seg-fault anyway.
For C, refer to "Usual arithmetic conversions" (C99: Section 6.3.1.8, ANSI C K&R A6.5) for details on how the operands of the mathematical operators are treated.
In your example the following rules apply:
C99:
Otherwise, if the type of the operand
with signed integer type can represent
all of the values of the type of the
operand with unsigned integer type,
then the operand with unsigned integer
type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted
to the unsigned integer type
corresponding to the type of the
operand with signed integer type.
ANSI C:
Otherwise, if either operand is unsigned int, the other is converted to unsigned int.
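The same rule can be checked mechanically; the check below is written in C++ (decltype) for brevity, but the conversion it verifies is the one quoted above:
#include <type_traits>

// int multiplied by unsigned int is performed in, and yields, unsigned int.
static_assert(std::is_same<decltype(1u * 1), unsigned int>::value,
              "unsigned int * int has type unsigned int");
static_assert(std::is_same<decltype(-1 * 1u), unsigned int>::value,
              "the signed operand is converted even when its value is negative");

int main() { return 0; }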
Why not just declare i as unsigned int? Then the problem goes away.
In any case, i*w is guaranteed to be <= 4096, as the code tests for this, so it's never going to overflow.
memcpy(&buf[i*w > -1 ? i*w < 4097 ? i*w : 0 : 0], init, w);
(I don't think the triple calculation of i*w degrades the performance.)
w*h could overflow if w and/or h are sufficiently large and the following validation could pass.
9. if (w*h > 4096)
10. return (NULL);
In mixed int / unsigned int operations, the int is converted to unsigned int, in which case a negative value of 'i' would become a large positive value. In that case
&buf[i*w]
would access an out-of-bounds location.
Unsigned arithmetic is done as modular (or wrap-around), so the product of two large unsigned ints can easily be less than 4096. The multiplication of int and unsigned int will result in an unsigned int (see section 4.5 of the C++ standard).
Therefore, given large w and a suitable value of h, you can indeed get into trouble.
Making sure integer arithmetic doesn't overflow is difficult. One easy way is to convert to floating-point and doing a floating-point multiplication, and seeing if the result is at all reasonable. As qwerty suggested, long long would be usable, if available on your implementation. (It's a common extension in C90 and C++, does exist in C99, and will be in C++0x.)
There are 3 paragraphs in the current C1X draft (N1494) on calculating (UNSIGNED TYPE1) x (SIGNED TYPE2) in 6.3.1.8 Usual arithmetic conversions:
WG 14: C - Project status and milestones
Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
Otherwise, if the type of the operand with signed integer type can represent
all of the values of the type of the operand with unsigned integer type, then
the operand with unsigned integer type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
So if a is unsigned int and b is int, parsing of (a * b) should generate code (a * (unsigned int)b). Will overflow if b < 0 or a * b > UINT_MAX.
If a is unsigned int and b is long of greater size, (a * b) should generate ((long)a * (long)b). Will overflow if a * b > LONG_MAX or a * b < LONG_MIN.
If a is unsigned int and b is long of the same size, (a * b) should generate ((unsigned long)a * (unsigned long)b). Will overflow if b < 0 or a * b > ULONG_MAX.
On your second question, about the type expected by the "indexer", the answer appears to be "integer type", which allows for any (signed) integer index.
6.5.2.1 Array subscripting
Constraints
1 One of the expressions shall have type ‘‘pointer to complete object type’’, the other
expression shall have integer type, and the result has type ‘‘type’’.
Semantics
2 A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
It is up to the compiler to perform static analysis and warn the developer about possibility of buffer overrun when the pointer expression is an array variable and the index may be negative. Same goes about warning on possible array size overruns even when the index is positive or unsigned.
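A small illustration of that subscripting rule: a negative signed index is perfectly valid as long as the resulting pointer stays inside the array, which is exactly why a negative i*w at line 17 reaches memory before buf:
#include <cstdio>

int main() {
    int arr[5] = {10, 20, 30, 40, 50};
    int* p = &arr[2];                              // p points at the middle element, 30

    // p[-1] is *(p + (-1)), one element before p; still inside arr, so well defined here.
    std::printf("%d %d %d\n", p[-1], p[0], p[1]);  // prints: 20 30 40
    return 0;
}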
To actually answer your question, without specifying the hardware you're running on, you don't know, and in code intended to be portable, you shouldn't depend on any particular behavior.