Consider the following two C programs. My question: in the first program, the unsigned keyword prints -12, but I think it should print 4294967284; it does not print that for the %d specifier, only for the %u specifier. But if we look at the second program, the output is 144 where it should be -112. Something is fishy about the unsigned keyword which I am not getting. Any help, friends!
#include <stdio.h>
int main()
{
    unsigned int i = -12;
    printf(" i = %d\n", i);
    printf(" i = %u\n", i);
    return 0;
}
I got the above program from this link: Assigning negative numbers to an unsigned int?
#include <stdio.h>
int main(void)
{
    unsigned char a = 200, b = 200, c;
    c = a + b;
    printf("result=%d\n", c);
    return 0;
}
Each printf format specifier requires an argument of some particular type. "%d" requires an argument of type int; "%u" requires an argument of type unsigned int. It is entirely your responsibility to pass arguments of the correct type.
unsigned int i = -12;
-12 is of type int. The initialization implicitly converts that value from int to unsigned int. The converted value (which is positive and very large) is stored in i. If int and unsigned int are 32 bits, the stored value will be 4294967284 (2^32 - 12).
printf(" i = %d\n",i);
i is of type unsigned int, but "%d" requires an int argument. The behavior is not defined by the C standard. Typically the value stored in i will be interpreted as if it had been stored in an int object. On most systems, the output will be i = -12 -- but you shouldn't depend on that.
printf(" i = %u\n",i);
This will correctly print the value of i (assuming the undefined behavior of the previous statement didn't mess things up).
For ordinary functions, assuming you call them correctly, arguments will often be implicitly converted to the declared type of the parameter, if such a conversion is available. For a variadic function like printf, which can take a varying number and type(s) of arguments, no such conversion can be done, because the compiler doesn't know what type is expected. Instead, arguments undergo the default argument promotions. An argument of a type narrower than int is promoted to int if int can hold all values of the type, or to unsigned int otherwise. An argument of type float is promoted to double (which is why "%f" works for both float and double arguments).
The rules are such that an argument of a narrow unsigned type will often (but not always) be promoted to (signed) int.
unsigned char a=200, b=200, c;
Assuming 8-bit bytes, a and b are set to 200.
c = a+b;
The sum 400 is too big to fit in an unsigned char. For unsigned arithmetic and conversion, out-of-range results are reduced to the range of the type. c is set to 144.
printf("result=%d\n",c);
The value of c is promoted to int; even though the argument is of an unsigned type, int is big enough to hold all possible values of the type. The output is result=144.
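Putting those pieces together, here is a minimal sketch of both programs with matching format specifiers (the concrete output values assume a 32-bit int and an 8-bit char):

#include <stdio.h>

int main(void)
{
    unsigned int i = -12;          /* converted: stores UINT_MAX + 1 - 12          */
    printf("i = %u\n", i);         /* matching specifier: prints 4294967284        */

    unsigned char a = 200, b = 200, c;
    c = a + b;                     /* 400 reduced modulo 256 -> 144                */
    printf("result = %d\n", c);    /* fine: c is promoted to int before the call   */
    return 0;
}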
In the first program the behaviour is undefined. It's your responsibility to make sure that the format specifier matches the data type of the argument. The compiler emits code that assumes you got it right; at runtime it does not have to do any checks (and often, cannot do any checks even if it wanted to).
(For example, the library's printf implementation does not know what arguments you gave it; it only sees some bytes and has to assume those are the bytes for the type that you specified using %d.)
You appear to be trying to infer what unsigned means based on the output of a program with undefined behaviour. That won't work. Stick to well-defined programs (and preferably just read the definition of unsigned).
In a comment you say:
could you give me any reference for the unsigned keyword? The concept is still not clear to me. Where is unsigned defined in the C/C++ standard?
In the C99 standard read section 6.2.5, from part 6 onwards.
The definition of unsigned int is an integer type that can hold values from 0 up to a positive number UINT_MAX (which should be one less than a power of two), which must be at least 65535, and typically is 4294967295.
When you write unsigned int i = -12;, the compiler sees that -12 is outside of the range of permitted values for unsigned int, and it performs a conversion. The definition of that conversion is to add or subtract UINT_MAX+1 until the value is in range.
The second part of your question is unrelated to all this. There are no unsigned int in that program; only unsigned char.
In that program, 200 + 200 gives 400. As mentioned above, since this is out of range the compiler converts it by subtracting UCHAR_MAX+1 (i.e. 256) until it is in range. 400 - 256 = 144.
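A quick sketch of that reduction rule using the limits from <limits.h> (the printed values assume a 32-bit unsigned int and an 8-bit char):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* -12 is out of range for unsigned int: add UINT_MAX + 1 once */
    unsigned int i = -12;
    printf("%u\n", i);                      /* 4294967284 when UINT_MAX is 4294967295 */
    printf("%u\n", UINT_MAX - 12u + 1u);    /* the same value, computed explicitly    */

    /* 400 is out of range for unsigned char: subtract UCHAR_MAX + 1 (256) once */
    printf("%d\n", 400 - (UCHAR_MAX + 1));  /* 144 with 8-bit char */
    return 0;
}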
The %d and %u specifiers of printf tell it to interpret the corresponding argument as an int or an unsigned int, respectively.
In fact printf (in general, any variadic function) and the arithmetic operators can accept only a few argument types (besides the format string): roughly int, long long and double (warning: a very inaccurate description!). Any integral argument whose rank is less than int is extended to int, and any float argument is extended to double. These rules improve the uniformity of the input parameters of printf and of the arithmetic operators.
Regarding your 2nd example: the following steps take place
The + operator requires the unsigned char operands to be promoted to int (4-byte integers in your case, I assume; int can represent every unsigned char value).
The resulting sum is 400, held in a 4-byte int.
Only the least significant byte of that sum fits into unsigned char c, so c gets the value 400 % 256 == 144.
printf requires all narrower integral arguments to be expanded to int, thus what printf receives is 144 as a 4-byte int.
The %d specifier prints that argument as "144".
Google for "default argument promotion" for more details.
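To make "default argument promotion" concrete, here is a small sketch (not from the question; show is a hypothetical helper) of a variadic function: narrow integer arguments arrive as int, so va_arg must read them back as int.

#include <stdio.h>
#include <stdarg.h>

/* Callers promote char and short arguments to int, so the callee
   must fetch them with va_arg(ap, int). */
static void show(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    for (int k = 0; k < count; ++k)
        printf("%d\n", va_arg(ap, int));
    va_end(ap);
}

int main(void)
{
    unsigned char c = 144;
    short s = -7;
    show(2, c, s);   /* prints 144 and -7 */
    return 0;
}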
Related
long long int n = 2000*2000*2000*2000; // overflow
long long int n = pow(2000,4); // works
long long int n = 16000000000000; // works
Why does the first one overflow (multiplying integer literal constants to assign to a long long)?
What's different about it vs. the second or third ones?
Because 2000 is an int which is usually 32-bit. Just use 2000LL.
Using the LL suffix instead of ll was suggested by @AdrianMole in a now-deleted comment. Please check his answer.
By default, integer literals are of the smallest type that can hold their value but not smaller than int. 2000 can easily be stored in an int since the Standard guarantees it is effectively at least a 16-bit type.
Arithmetic operators are always called with the larger of the types present but not smaller than int:
char*char will be promoted to operator*(int,int)->int
char*int calls operator*(int,int)->int
long*int calls operator*(long,long)->long
int*int still calls operator*(int,int)->int.
Crucially, the type is not dependent on whether the result can be stored in the inferred type. Which is exactly the problem happening in your case - multiplication is done with ints but the result overflows as it is still stored as int.
C++ does not support inferring types based on their destination like Haskell does so the assignment is irrelevant.
The constants (literals) on the RHS of your first line of code are int values (not long long int). Thus, the multiplications are performed using int arithmetic, which will overflow.
To fix this, make the constants long long using the LL suffix:
long long int n = 2000LL * 2000LL * 2000LL * 2000LL;
cppreference
In fact, as noted in the comment by Peter Cordes, the LL suffix is only actually needed on either the first (leftmost) or second constant. This is because, when multiplying types of two different ranks, the operand of lower rank is promoted to the type of the higher rank, as described here: Implicit type conversion rules in C++ operators. Furthermore, as the * (multiplication) operator has left-to-right associativity, the 'promoted' result of the first multiplication propagates that promotion to the second and third.
Thus, either of the following lines will also work without overflow:
long long int n1 = 2000LL * 2000 * 2000 * 2000;
long long int n2 = 2000 * 2000LL * 2000 * 2000;
Note: Although lowercase suffixes (as in 2000ll) are valid C++, and entirely unambiguous to the compiler, there is a general consensus that the lowercase letter, 'ell', should be avoided in long and long long integer literals, as it can easily be mistaken, by human readers, for the digit, 1. Thus, you will notice that 2000LL (uppercase suffix) has been used throughout the answers here presented.
2000*2000*2000*2000 is a multiplication of 4 int values, which yields an int value. By the time you assign this int value to long long int n, the overflow has already happened (if int is 32 bit the resulting value won't fit).
You need to make sure that the overflow does not occur, so when you write
long long int n = static_cast<long long int>(2000)*2000*2000*2000;
you make sure that you are doing a long long int multiplication (long long int multiplied with int returns a long long int, so no overflow in your case).
A shorter (and better way) is to write 2000LL or 2000ll instead of the static_cast. That gives the integer literal the right type. This is not needed for 2000 which fits into an int but it would be needed for higher values that don't fit into an int.
long long int n = 2000LL*2000*2000*2000;
long long int n = 2000LL*2000LL*2000LL*2000LL;
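For example, a quick check (a C sketch, also valid C++, assuming a 64-bit long long) that the suffixed version really avoids the overflow:

#include <stdio.h>

int main(void)
{
    long long n = 2000LL * 2000 * 2000 * 2000;  /* whole chain evaluated in long long */
    printf("%lld\n", n);                        /* 16000000000000                     */
    return 0;
}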
The other answers (as of this writing) appear to not have been explicit enough to answer the question as stated. I'll try to fill this gap.
Why does the first one overflow (multiplying integer literal constants to assign to a long long)?
The expression
long long int n = 2000*2000*2000*2000;
is evaluated as follows:
long long int n = ((2000*2000)*2000)*2000;
where the steps are (assuming 32-bit int):
(2000*2000) is a multiplication of two int values that yields 4000000, another int value.
((2000*2000)*2000) is a multiplication of the above yielded int value 4000000 with an int value 2000. This would yield 8000000000 if the value could fit into an int. But our assumed 32-bit int can store a maximum value of 2^31 - 1 = 2147483647. So we get overflow right at this point.
The next multiplication would happen if there hadn't been overflow above.
The assignment of the resulting int product to the long long variable would happen (if not for the overflow), and that assignment would preserve the value.
Since we did have overflow, the statement has undefined behavior, so steps 3 and 4 can't be guaranteed.
What's different about it vs. the second or third ones?
long long int n = pow(2000,4);
The pow(2000,4) converts 2000 and 4 into double (see some docs on pow), and then the function implementation does its best to produce a good approximation of the result, as a double. Then the assignment converts this double value to long long.
long long int n = 16000000000000;
The literal 16000000000000 is too large to fit into an int, so its type is instead the next signed type that can fit the value. It could be long or long long, depending on the platform. See Integer literal#The type of the literal for details. Then the assignment converts this value to long long (or just writes it, if the literal's type was long long already).
The first is a multiplication using integers (typically 32 bit). It overflows because those integers cannot store 2000^4. The result is then cast to long long int.
The second calls the pow function which casts the first argument to double and returns a double. The result is then cast to long long int. There is no overflow in this case because the math is done on a double value.
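A side-by-side sketch of both approaches (whether pow is bit-exact depends on the math library, though 16000000000000 is exactly representable in a double, so the two values typically agree here):

#include <stdio.h>
#include <math.h>

int main(void)
{
    long long exact   = 2000LL * 2000 * 2000 * 2000;
    long long via_pow = (long long)pow(2000, 4);   /* double result converted back */
    printf("exact   = %lld\n", exact);             /* 16000000000000                */
    printf("via pow = %lld\n", via_pow);           /* typically the same value here */
    return 0;
}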
You might want to use the following in C++ to understand this:
#include <iostream>
#include <cxxabi.h>

using namespace std;
using namespace abi;

int main() {
    int status;
    cout << __cxa_demangle(typeid(2000*2000*2000*2000).name(), 0, 0, &status);
}
As you can see, the type is int.
In C, you can use the following:
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
#define typename(x) _Generic((x), /* Get the name of a type */ \
\
_Bool: "_Bool", unsigned char: "unsigned char", \
char: "char", signed char: "signed char", \
short int: "short int", unsigned short int: "unsigned short int", \
int: "int", unsigned int: "unsigned int", \
long int: "long int", unsigned long int: "unsigned long int", \
long long int: "long long int", unsigned long long int: "unsigned long long int", \
float: "float", double: "double", \
long double: "long double", char *: "pointer to char", \
void *: "pointer to void", int *: "pointer to int", \
char(*)[]: "pointer to char array", default: "other")
unsigned int a = 3;

int main() {
    printf("%s", typename(a-10));
    return 0;
}
Here the type of the expression is unsigned int: when int and unsigned int (which have the same rank) meet in an arithmetic expression, the int operand is converted to unsigned int. The unsigned result then wraps around to a large positive value, which will appear as the expected negative number when assigned to or interpreted as an int. The result of the calculation will always be unsigned int regardless of the values involved.
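For example, with the a above (a sketch; the exact values assume a 32-bit int, and converting the out-of-range result back to int is implementation-defined):

#include <stdio.h>

int main(void)
{
    unsigned int a = 3;
    printf("%u\n", a - 10);         /* 4294967289: the int 10 is converted to unsigned */
    printf("%d\n", (int)(a - 10));  /* typically -7 on two's-complement machines        */
    return 0;
}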
C
The minimum default type of an integer literal without a suffix is int; only if the value does not fit does the type grow (for a decimal literal: int, then long int, then long long int; octal and hexadecimal literals may also take the unsigned variants), therefore the 2000s are all ints. The type of an expression built from literals with unary or binary operators, however, is decided by the implicit type hierarchy of the operands, not by the value of the result (unlike the literal itself, whose value does decide its type); this is because C uses type coercion and not type synthesis. To solve this, you would have to put a suffix such as LL on (at least the first of) the 2000s to explicitly specify the type of the literal.
Similarly, the default type of a floating-point literal is double, but this can be changed with an f suffix. Prefixes do not change the type of floating-point or integer literals.
The type of a string literal is char [], although it is effectively read-only, and it denotes the actual array of characters for that literal, typically placed in .rodata. Its address can be taken like that of any array with the unary ampersand, &"string", which is the same value (address) as "string", just a different type (char (*)[7] vs. char[7]); "string", i.e. char[], is not just a pointer to the array at the compiler level, it is the array, whereas the unary ampersand yields just the pointer to the array. The u prefix changes the element type to char16_t (an unsigned 16-bit type); the U prefix changes it to char32_t (an unsigned 32-bit type); and the L prefix changes it to wchar_t (often int). u8 gives plain char, and an unprefixed string uses an implementation-specific encoding, which is typically the same as u8, i.e. UTF-8, of which ASCII is a subset. A raw (R) prefix, available only for string literals (and, in C, only as a GNU extension from std=gnu99 onwards), can be combined with the others, i.e. uR or u8R, but it does not influence the type.
The type of a character literal is int unless prefixed with u (u'a' is unsigned short int) or U (U'a' is unsigned int). u8 and L are both int when used on a character literal. An escape sequence in a string or character literal does not influence the encoding and hence the type; it's just a way of actually presenting the character to be encoded to the compiler.
The type of a complex literal 10i+1 or 10j+1 is complex int, where both the real and the imaginary part can have a suffix, like 10Li+1, which in this case makes the imaginary part long and the overall type is complex long int, and upgrades the type of both the real and the imaginary part, so it doesn't matter where you put the suffix or whether you put it on both. A mismatch will always use the largest of the two suffixes as the overall type.
Using an explicit cast instead of a literal suffix always results in the correct behaviour if you use it correctly and are aware of the semantic difference: the cast truncates or extends (sign-extends for signed; zero-extends for unsigned – this is based on the type of the literal or expression being cast, not the type being cast to, so a signed int is sign-extended into an unsigned long int) the value to an expression of that type, rather than the literal inherently having that type.
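A rough way to check the claims above about unsuffixed literal types in C is a _Generic probe (C11), similar to the macro shown earlier (a sketch; KIND is a hypothetical helper, and which branch fires for the large literal depends on the platform's data model):

#include <stdio.h>

#define KIND(x) _Generic((x),                             \
    int: "int",                                           \
    long int: "long int",                                 \
    long long int: "long long int",                       \
    unsigned long long int: "unsigned long long int",     \
    default: "other")

int main(void)
{
    printf("%s\n", KIND(2000));            /* int                                        */
    printf("%s\n", KIND(16000000000000));  /* long int on LP64, long long int on LLP64   */
    printf("%s\n", KIND(2000LL * 2000));   /* long long int: the operand types decide,   */
                                           /* not the size of the result                 */
    return 0;
}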
C++
Again, the minimum default type is int for the smallest literal value. The value of the literal and the suffix together determine the final type: for each suffix, the final type of the literal can only be equal to or larger than the suffix's type, and it grows with the size of the literal value. C exhibits the same behaviour. When the value is larger than a long long int can hold, depending on the compiler, __int128 is used. I think you could also create your own literal suffix operator i128 and return a value of that type.
The default type of a decimal literal is the same as C.
In C++ the type of a string literal is const char []. The type of &"string" is const char (*) [7] and the type of +"string" is const char * (in C you can only decay using "string"+0). C++ differs in that these forms carry const, while in C they don't. The string prefixes behave the same as in C.
Character and complex literals behave the same as C.
If I have this code:
int A;
unsigned int B;
if (A==B) foo();
the compiler will complain about mixed types in comparison. If I cast A like this:
if ((unsigned int) A==B) foo();
does this instruct the compiler to insert code to convert A from int to unsigned int? Or does it just tell the compiler not to worry about it and ignore the type mismatch?
UPDATE: If this is unsafe (as pointed out below), how should I handle this comparison? (Wouldn't assigning the contents of an int to an unsigned int for later comparison also be unsafe)
UPDATE: Wow are there some different answers (from people with thousands of posts). I've accepted what seems like the best, but anyone reading this question should read ALL answers carefully.
When casting, at least at the conceptual level, the compiler will create a temporary variable of the type specified in the cast expression.
You may test that this expression:
(unsigned int) A = B; // This time assignment is intended
will generate an error complaining about modification of a temporary (non-assignable) value.
Of course the compiler is free to optimize away any temporary variables created through a cast. Nevertheless a valid method to build a temporary must exist.
The cast implies a conversion, if necessary. But this is problematic for negative values. They are mapped to positive values on the unsigned type. Thus you have to make sure a negative value never compares equal any (positive) unsigned value:
int A;
unsigned int B;
...
if ( (A >= 0) && (static_cast<unsigned int>(A) == B) )
foo();
This works because the unsigned variant of an integer type is guaranteed to hold all positive values (including 0) of the corresponding signed type.
Notice the usage of a static_cast instead of the "classic" C-style cast.
With plain types, in C and C++, == is always done with both operands converted to the same type. In OP's code, A is converted to unsigned first.
If I cast ... does this instruct the compiler to insert code to convert A from int to unsigned int?
Yes, but that code would have occurred anyway. Without the cast, the compiler is simply warning that it is going to do something that the programmer may not have intended.
Or (if I cast) does it just tell the compiler not to worry about it and ignore the type mismatch?
The type mismatch is not ignored. By supplying the cast, there is no type mismatch to warn about.
How should I handle this comparison?
Ensure A is not negative, then convert to unsigned with a cast.
int A;
unsigned int B;
// if (A==B) foo();
if (A >= 0 && (unsigned)A == B) foo();
Every non-negative int can be converted to an unsigned with no value change.
The range of nonnegative values of a signed integer type is a subrange of the
corresponding unsigned integer type C11dr §6.2.5 9
So your question is just about a signed/unsigned comparison.
The C++ standard says in clause 5, Expressions [expr], §10:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:...
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type.
and in 4.7 Integral conversions [conv.integral] §2
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s
complement representation, this conversion is conceptual and there is no change in the bit pattern (if there
is no truncation). —end note ]
That means that on a common system using two's complement for negative numbers and 32 bits for an int or unsigned int, (unsigned int) -1 will end up as 4294967295.
It may or may not be what you want; the compiler just warns you that it will consider them as equal.
If it is not what you want, first test whether the signed value is negative. If it is, say that they are not equal and skip the equality comparison.
It depends on the type of cast and what you are casting. In your particular case nothing extra needs to happen at run time, but in other cases actual conversion code will be generated. The simplest example:
void foo(double d) {}
...
int x = 42;
foo(static_cast<double>(x));
In this example there would be code generated.
How does C/C++ deal with it if you pass an int as a parameter into a method that takes a byte (a char)? Does the int get truncated? Or something else?
For example:
void method1()
{
int i = //some int;
method2(i);
}
void method2(byte b)
{
//Do something
}
How does the int get "cast" to a byte (a char)? Does it get truncated?
If byte stands for char type, the behavior will depend on whether char is signed or unsigned on your platform.
If char is unsigned, the original int value is reduced to the unsigned char range modulo UCHAR_MAX+1. Values in [0, UCHAR_MAX] range are preserved. C language specification describes this process as
... the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
If char type is signed, then values within [SCHAR_MIN, SCHAR_MAX] range are preserved, while any values outside this range are converted in some implementation-defined way. (C language additionally explicitly allows an implementation-defined signal to be raised in such situations.) I.e. there's no universal answer. Consult your platform's documentation. Or, better, write code that does not rely on any specific conversion behavior.
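A small sketch of both cases (the unsigned result is fully defined; the signed result shown in the comment is merely what a typical two's-complement implementation produces):

#include <stdio.h>

int main(void)
{
    int i = 1000;            /* does not fit in an 8-bit char         */
    unsigned char u = i;     /* well defined: 1000 % 256 == 232       */
    signed char   s = i;     /* implementation-defined; commonly -24  */
    printf("u = %d, s = %d\n", u, s);
    return 0;
}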
It is just truncated as a bit pattern (byte is in general unsigned char; however, you have to check):
int i = -1;
becomes
byte b = 255; when byte = unsigned char
byte b = -1; when byte = signed char
i = 0; b = 0;
i = 1024; b = 0;
i = 1040; b = 16;
Quoting the C++ 2003 standard:
Clause 5.2.2 paragraph 4: When a function is called, each parameter (8.3.5) shall be initialized (8.5, 12.8, 12.1) with its corresponding
argument.
So, b is initialized with i. What does that mean?
8.5/14 the initial value of the object being initialized is the (possibly converted) value of the initializer
expression. Standard conversions (clause 4) will be used, if necessary, to convert the initializer
expression to the … destination type; no user-defined conversions are considered
Oh, i is converted, using the standard conversions. What does that mean? Among many other standard conversions are these:
4.7/2 If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
4.7/3 If the destination type is signed, the value is unchanged if it can be represented in the destination type (and
bit-field width); otherwise, the value is implementation-defined.
Oh, so if char is unsigned, the value is truncated to the number of bits in a char (or computed modulo UCHAR_MAX+1, whichever way you want to think about it.)
And if char is signed, then the value is unchanged, if it fits; implementation-defined otherwise.
In practice, on the computers and compilers you care about, the value is always truncated to fit in 8 bits, regardless of whether chars are signed or unsigned.
You don't say what a byte is, but if you pass an argument that is convertible to the parameter type, the value will be converted.
If the types have different value ranges there is a risk that the value is outside the range of the parameter type, and then it will not work. If it is within the range, it will be safe.
Here's an example:
1) Code:
#include <stdio.h>
void
method1 (unsigned char b)
{
int a = 10;
printf ("a=%d, b=%d...\n", a, b);
}
void
method2 (unsigned char * b)
{
int a = 10;
printf ("a=%d, b=%d...\n", a, *b);
}
int
main (int argc, char *argv[])
{
int i=3;
method1 (i);
method2 (i);
return 0;
}
2) Compile (with warning):
$ gcc -o x -Wall -pedantic x.c
x.c: In function `main':
x.c:22: warning: passing arg 1 of `method2' makes pointer from integer without a cast
3) Execute (with crash):
$ ./x
a=10, b=3...
Segmentation fault (core dumped)
'Hope that helps - both with your original question, and with related issues.
There are two cases to worry about:
// Your input "int i" gets truncated
void method2(byte b)
{
...
// Your "method2()" stack gets overwritten
void method2(byte * b)
{
...
It will be cast to a byte the same as if you casted it explicitly as (byte)i.
Your sample code above might be a different case though, unless you have a forward declaration for method2 that is not shown. Because method2 is not yet declared at the time it is called, the compiler doesn't know the type of its first parameter. In C, functions should be declared (or defined) before they are called. What happens in this case is that the compiler assumes (as an implicit declaration) that method2's first parameter is an int and method2 receives an int. Officially that results in undefined behaviour, but on most architectures, both int and byte would be passed in the same size register anyway and it will happen to work.
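A sketch of the declared-in-advance case (assuming byte means unsigned char): with a prototype visible, the compiler converts the int argument at the call site instead of falling back to an implicit int declaration.

#include <stdio.h>

void method2(unsigned char b);   /* prototype: the compiler now knows the parameter type */

void method1(void)
{
    int i = 400;
    method2(i);                  /* i is converted to unsigned char: 400 % 256 == 144 */
}

void method2(unsigned char b)
{
    printf("b = %d\n", b);       /* prints 144 */
}

int main(void)
{
    method1();
    return 0;
}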
This question already has answers here: Implicit type conversion rules in C++ operators.
Consider the following programs:
// http://ideone.com/4I0dT
#include <limits>
#include <iostream>
int main()
{
    int max = std::numeric_limits<int>::max();
    unsigned int one = 1;
    unsigned int result = max + one;
    std::cout << result;
}
and
// http://ideone.com/UBuFZ
#include <limits>
#include <iostream>
int main()
{
unsigned int us = 42;
int neg = -43;
int result = us + neg;
std::cout << result;
}
How does the + operator "know" which is the correct type to return? The general rule is to convert all of the arguments to the widest type, but here there's no clear "winner" between int and unsigned int. In the first case, unsigned int must be being chosen as the result of operator+, because I get a result of 2147483648. In the second case, it must be choosing int, because I get a result of -1. Yet I don't see in the general case how this is decidable. Is this undefined behavior I'm seeing or something else?
This is outlined explicitly in §5/9:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
If either operand is of type long double, the other shall be converted to long double.
Otherwise, if either operand is double, the other shall be converted to double.
Otherwise, if either operand is float, the other shall be converted to float.
Otherwise, the integral promotions shall be performed on both operands.
Then, if either operand is unsigned long the other shall be converted to unsigned long.
Otherwise, if one operand is a long int and the other unsigned int, then if a long int can represent all the values of an unsigned int, the unsigned int shall be converted to a long int; otherwise both operands shall be converted to unsigned long int.
Otherwise, if either operand is long, the other shall be converted to long.
Otherwise, if either operand is unsigned, the other shall be converted to unsigned.
[Note: otherwise, the only remaining case is that both operands are int]
In both of your scenarios, the result of operator+ is unsigned. Consequently, the second scenario is effectively:
int result = static_cast<int>(us + static_cast<unsigned>(neg));
Because in this case the value of us + neg is not representable by int, the value of result is implementation-defined – §4.7/3:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
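The same usual arithmetic conversions exist in C, so a plain C sketch of the second scenario makes the intermediate unsigned value visible (values assume a 32-bit int; the final conversion back to int is implementation-defined, typically -1):

#include <stdio.h>

int main(void)
{
    unsigned int us = 42;
    int neg = -43;

    unsigned int sum = us + neg;   /* neg is converted to unsigned: the result wraps */
    printf("%u\n", sum);           /* 4294967295                                     */
    printf("%d\n", (int)sum);      /* typically -1                                   */
    return 0;
}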
Before C was standardized, there were differences between compilers -- some followed "value preserving" rules, and others "sign preserving" rules. Sign preserving meant that if either operand was unsigned, the result was unsigned. This was simple, but at times gave rather surprising results (especially when a negative number was converted to an unsigned).
C standardized on the rather more complex "value preserving" rules. Under the value preserving rules, promotion can/does depend on the actual ranges of the types, so you can get different results on different compilers. For example, on most MS-DOS compilers, int is the same size as short and long is different from either. On many current systems int is the same size as long, and short is different from either. With value preserving rules, these can lead to the promoted type being different between the two.
The basic idea of value preserving rules is that it'll promote to a larger signed type if that can represent all the values of the smaller type. For example, a 16-bit unsigned short can be promoted to a 32-bit signed int, because every possible value of unsigned short can be represented as a signed int. The types will be promoted to an unsigned type if and only if that's necessary to preserve the values of the smaller type (e.g., if unsigned short and signed int are both 16 bits, then a signed int can't represent all possible values of unsigned short, so an unsigned short will be promoted to unsigned int).
When you assign the result as you have, the result will get converted to the destination type anyway, so most of this makes relatively little difference -- at least in most typical cases, where it'll just copy the bits into the result, and it's up to you to decide whether to interpret that as signed or unsigned.
When you don't assign the result such as in a comparison, things can get pretty ugly though. For example:
unsigned int a = 5;
signed int b = -5;

if (a > b)
    printf("Of course");
else
    printf("What!");
Under sign preserving rules, b would be promoted to unsigned, and in the process become equal to UINT_MAX - 4, so the "What!" leg of the if would be taken. With value preserving rules, you can manage to produce some strange results a bit like this as well, but 1) primarily on the DOS-like systems where int is the same size as short, and 2) it's generally harder to do it anyway.
It's choosing whatever type you put your result into, or at least cout is honoring that type during output.
I don't remember for sure, but I think C++ compilers generate the same arithmetic code for both; it's only comparisons and output that care about sign.