I am writing some code to test for a bit in a bit-field and I wrote this:
return (unsigned)(m_category | category) > 0;
I thought this wouldn't compile, since I didn't specify the integer type, but to my surprise it did. So I'm wondering: what does this do? Is it undefined?
unsigned is the same as unsigned int.
The cast will convert the signed integer to unsigned as follows:
https://en.cppreference.com/w/cpp/language/implicit_conversion#Integral_conversions
If the destination type is unsigned, the resulting value is the smallest unsigned value equal to the source value modulo 2^n, where n is the number of bits used to represent the destination type.
That is, depending on whether the destination type is wider or narrower, signed integers are sign-extended or truncated and unsigned integers are zero-extended or truncated, respectively.
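To see this concretely, here is a minimal sketch (the bit-mask values for m_category and category are made up for illustration) showing that the cast is well-defined even for negative inputs:

#include <cassert>
#include <climits>

int main() {
    int m_category = 0x04;  // hypothetical bit-mask values
    int category   = 0x04;

    // Well-defined: if the int were negative, the cast would wrap it
    // modulo 2^n rather than invoke undefined behavior.
    assert((unsigned)(m_category | category) > 0);  // true: a bit is set
    assert((unsigned)-1 == UINT_MAX);               // -1 wraps to 2^n - 1
    return 0;
}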
Suppose I assign an eleven-digit number to an int, what will happen? I played around with it a little and I know it gives me some other number within the int range. How is this new number created?
It is implementation-defined behaviour. This means that your compiler must provide documentation saying what happens in this scenario.
So, consult that documentation to get your answer.
A common way that implementations define it is to truncate the input integer to the number of bits of int (after reinterpreting unsigned as signed if necessary).
C++14 Standard references: [expr.ass]/3, [conv.integral]/3
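For illustration, a hedged sketch of what the common behaviour looks like (this assumes a 32-bit, two's complement int and the usual truncating conversion; the variable names are made up):

#include <cstdio>

int main() {
    long long big = 12345678901LL;  // an eleven-digit number
    int n = big;                    // implementation-defined conversion
    // Typical implementations keep the low 32 bits:
    // 12345678901 mod 2^32 = 3755744309, reinterpreted as -539222987.
    printf("%d\n", n);
    return 0;
}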
In C++20, this behavior will still be implementation-defined[1], but the requirements are much stronger than before. This is a side effect of C++20's requirement that signed integer types use two's complement representation.
This kind of conversion is specified by [conv.integral]:
A prvalue of an integer type can be converted to a prvalue of another integer type. [...]
[...]
Otherwise, the result is the unique value of the destination type that is congruent to the source integer modulo 2^N, where N is the width of the destination type.
[...]
This behaviour is the same as truncating the representation of the number to the width of the integer type you are assigning to, e.g.:
int32_t u = 0x6881736df7939752;
...will keep the 32 right-most bits, i.e., 0xf7939752, and "copy" these bits to u. In two's complement, this corresponds to -141322414.
[1] This will still be implementation-defined because the size of int is implementation-defined. If you assign to, e.g., int32_t, then the behaviour is fully defined.
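For reference, a runnable version of the example above (the exact value is guaranteed only from C++20 on; before that it is the usual implementation-defined result):

#include <cstdint>
#include <iostream>

int main() {
    // Keeps the low 32 bits, 0xf7939752, which is -141322414
    // in two's complement.
    int32_t u = static_cast<int32_t>(0x6881736df7939752LL);
    std::cout << u << '\n';  // prints -141322414
    return 0;
}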
Prior to C++20, there were two different rules for unsigned and signed types:
A prvalue of an integer type can be converted to a prvalue of another integer type. [...]
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2n where n is the number of bits used to represent the unsigned type). [...]
If the destination type is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.
INT_MAX on a 32-bit system is 2,147,483,647 (2^31 − 1); UINT_MAX is 4,294,967,295 (2^32 − 1).
int thing = 21474836470;
What happens is implementation-defined; it's up to the compiler. Mine appears to truncate the higher bits: 21474836470 is 0x4fffffff6, and keeping the low 32 bits leaves 0xfffffff6, which is -10 in two's complement:
warning: implicit conversion from 'long' to 'int' changes
value from 21474836470 to -10 [-Wconstant-conversion]
int thing = 21474836470;
I know that the following
unsigned short b=-5u;
evaluates to b being 65531 due to an underflow, but I don't understand whether 5u is converted to a signed int, transformed into -5, and then converted back to unsigned to be stored in b, or whether -5u is equal to 0 - 5u (this should not be the case, since - here is a unary operator)
5u is an unsigned integer literal, and -5u is its negation. Negation of an unsigned integer is defined as subtraction from 2^n, which gives the same result as wrapping the result of subtraction from zero.
5u is a single token, an rvalue expression which has type unsigned int.
The unary - operator is applied to it according to the rules of unsigned
arithmetic (arithmetic modulo 2^n, where n is the number of bits in the
unsigned type). The results are converted to unsigned short; if they don't
fit (and they won't if sizeof(int) > sizeof(short)), the conversion will
be done using modulo arithmetic as well (modulo 2^n, where n is the number of
bits in the target type).
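A short check of both steps, assuming the usual 32-bit unsigned int and 16-bit unsigned short:

#include <cassert>

int main() {
    // Step 1: unary minus on unsigned int wraps modulo 2^32.
    assert(-5u == 4294967291u);  // 2^32 - 5

    // Step 2: conversion to unsigned short reduces modulo 2^16.
    unsigned short b = -5u;
    assert(b == 65531);          // 2^16 - 5
    return 0;
}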
It's probably worth noting that if the original argument has type unsigned
short, the actual steps are different (although the results will always be
the same). Thus, if you'd have written:
unsigned short k = 5;
unsigned short b = -k;
the first operation would depend on the size of short. If shorts are smaller
than ints (often, but not always the case), the first step would be to promote
k to an int. (If the sizes of short and int are identical, then the first
step would be to promote k to unsigned int; from then on, everything
happens as above.) The unary - would be applied to this int, according to
the rules of signed integer arithmetic (thus, resulting in a value of -5). The
resulting -5 will be implicitly converted to unsigned short, using modulo
arithmetic, as above.
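Despite the different intermediate steps, a quick test shows the same final value (again assuming a 16-bit unsigned short and a wider int):

#include <cassert>

int main() {
    unsigned short k = 5;
    unsigned short b = -k;  // k promotes to int, -k yields the int -5,
                            // which then converts to unsigned short
                            // modulo 2^16
    assert(b == 65531);     // same value as unsigned short b = -5u;
    return 0;
}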
In general, these distinctions don't make a difference, but in cases where you
might have an integral value of INT_MIN, they could; on 2's complement machines, -i, where i has type int and value INT_MIN, overflows, which is undefined behavior and may result in strange values when later converted to unsigned.
ISO/IEC 14882-2003 section 4.7 says:
"If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]
Technically it's not an underflow, just the representation of the signed value -5 when shown as an unsigned number. Note that signed and unsigned numbers have the same "bits"; they just display differently. If you were to print the value as a signed value (assuming it's sign-extended to fill the remaining bits), it would show -5.

This assumes a typical machine using 2's complement. The C standard requires neither that signed and unsigned types have the same number of bits, nor that the computer uses 2's complement to represent signed numbers. Obviously, if it's not using 2's complement, the bits won't match up to the value you've shown, so I made the assumption that yours IS a 2's complement machine, which covers all common processors such as x86, 68K, 6502, Z80, PDP-11, VAX, 29K, 8051, ARM, and MIPS. But technically, 2's complement is not necessary for C to function correctly.
And when you use the unary operator -x, it has the same effect as 0 - x; this holds on computers just as it does in math, producing the same result.
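For unsigned types that equivalence is easy to verify; a tiny sketch:

#include <cassert>

int main() {
    unsigned int x = 5;
    assert(-x == 0u - x);   // unary minus wraps exactly like 0 - x
    assert(-x == ~x + 1u);  // the familiar two's-complement identity,
                            // always valid in unsigned (modulo) arithmetic
    return 0;
}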
Reading the C++ Primer 5th edition book, I noticed that a signed char with a value of 256 is undefined.
I decided to try that, and I saw that std::cout printed nothing for that char variable.
But in C, the same thing
signed char c = 256;
gives the value 0 for the char c.
I tried searching but didn't find anything.
Can someone explain to me why is this the case in C++?
Edit: I understand that 256 doesn't fit in a single byte, but why doesn't the same thing as in C happen in C++?
The book is wildly incorrect. There's no undefined behavior in
signed char c = 256;
256 is an integer literal of type int. To initialize a signed char with it, it is converted to signed char (§8.5 [dcl.init]/17.8; all references are to N4140). This conversion is governed by §4.7 [conv.integral]:
1 A prvalue of an integer type can be converted to a prvalue of another integer type. A prvalue of an unscoped enumeration type can be converted to a prvalue of an integer type.
2 If the destination type is unsigned, [...]
3 If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
If signed char cannot represent 256, then conversion yields an implementation-defined value of type signed char, which is then used to initialize c. There is nothing undefined here.
When people say "signed overflow is UB", they are usually referring to the rule in §5 [expr]/p4:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
This makes expressions like INT_MAX + 1 undefined: the operands are both ints, so the result's type is also int, but the value is outside the range of representable values. This rule does not apply here, as the only expression is 256, whose type is int, and 256 is obviously in the range of representable values for int.
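A small demonstration of the conversion (printing 0 is what typical two's complement implementations with an 8-bit signed char produce, and 256 mod 2^8 = 0 is required from C++20 on):

#include <iostream>

int main() {
    signed char c = 256;  // implementation-defined result pre-C++20;
                          // 256 mod 2^8 = 0 on typical targets
    std::cout << static_cast<int>(c) << '\n';  // prints 0
    return 0;
}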
Edit: See T.C.'s answer above. It's better.
Signed integer overflow is undefined in C++ and C. In most implementations, the maximum value of signed char, SCHAR_MAX, is 127 and so putting 256 into it will overflow it. Most of the time you will see the number simply wrap around (to 0), but this is still undefined behavior.
You're seeing the difference between cout and printf. When you output a character with cout you don't get the numeric representation, you get a single character. In this case the character was NUL which doesn't appear on-screen.
See the example at http://ideone.com/7n6Lqc
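A minimal sketch of the difference (assuming c ended up holding 0, as above):

#include <cstdio>
#include <iostream>

int main() {
    signed char c = 0;
    std::cout << c << '\n';                    // streams the NUL character:
                                               // nothing visible is printed
    std::cout << static_cast<int>(c) << '\n';  // prints 0
    printf("%d\n", c);                         // %d also prints 0
    return 0;
}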
A char is generally 8 bits, i.e. one byte, and can therefore hold 2^8 = 256 different values: from 0 to 255 if it is unsigned, or from -128 to 127 if it is signed.
An unsigned char thus (to be pedantic: usually) ranges from 0 to 255; those are the 256 values one byte may hold.
If you get overflow, values are (usually) reduced modulo 256, just as other integer types are reduced modulo MAX + 1.
What happens if I do something like this:
unsigned int u;
int s;
...
s -= u;
What's the expected behavior of this:
1) Assuming that the unsigned integer isn't too big to fit in the signed integer?
2) Assuming that the unsigned integer would overflow the signed integer?
Thanks.
In general, consult 5/9 in the standard.
In your example, the signed value is converted to unsigned (by taking it mod UINT_MAX+1), then the subtraction is done modulo UINT_MAX+1 to give an unsigned result.
Storing this result back as a signed value to s involves a standard integral conversion - this is in 4.7/3. If the value is in the range of signed int then it is preserved, otherwise the value is implementation-defined. All the implementations I've ever looked at have used modulo arithmetic to push it into the range INT_MIN to INT_MAX, although as Krit says you might get a warning for doing this implicitly.
"Stunt" implementations that you'll probably never deal with might have different rules for unsigned->signed conversion. For example if the implementation has sign-magnitude representation of signed integers, then it's not possible to always convert by taking modulus, since there's no way to represent +/- (UNIT_MAX+1)/2 as an int.
Also relevant is 5.17/7, "The behavior of an expression of the form E1 op= E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once". This means that in order to say that the subtraction is done in the unsigned int type, all we need to know is that s - u is done in unsigned int: there's no special rule for -= that arithmetic should be done in the type of the LHS.
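A small example of those rules in action (the final value -2 relies on the near-universal modulo behaviour of the unsigned-to-signed conversion, which is implementation-defined before C++20):

#include <cassert>

int main() {
    int s = 3;
    unsigned int u = 5;
    s -= u;           // evaluated as s = s - u: 3u - 5u wraps to 2^32 - 2,
                      // which converts back to int as -2 on typical targets
    assert(s == -2);
    return 0;
}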
Effectively, s is converted to unsigned and u is subtracted from it, and the result is converted back. On two's complement machines the conversions don't change any bits, so ultimately one set of bits is subtracted from the other and the result goes into s.
When a compiler finds a signed / unsigned mismatch, what action does it take? Is the signed number cast to an unsigned or vice versa? and why?
If the operands are integral and one of them is unsigned, then conversion to unsigned is done. For example:
-1 > (unsigned int)1 // true, as -1 will be converted to 2^n - 1
Conversion int -> unsigned int: a value v >= 0 maps to v; v < 0 maps to v mod 2^n, where n is the number of bits (for example, -1 goes to 2^n - 1).
Conversion unsigned int -> int: v <= INT_MAX maps to v; v > INT_MAX gives an implementation-defined value.
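A quick sketch of that rule; most compilers will warn about the comparison, but it compiles and evaluates as described:

#include <cassert>

int main() {
    // -1 is converted to unsigned int, becoming 2^n - 1 (UINT_MAX),
    // so the comparison is true.
    assert(-1 > (unsigned int)1);
    return 0;
}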
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
I don't think C++ deviates from the way C handles signed/unsigned conversions:
Conversion rules are more complicated when unsigned operands are involved. The problem is that comparisons between signed and unsigned values are machine-dependent, because they depend on the sizes of the various integer types. (K&R)
One important factor to consider is whether one of the types is a long integer or not, because that will affect integer promotions. For example, if a long int is compared to an unsigned int, and a long int can represent all the values of an unsigned int, then the unsigned int will be converted to a long int. (Otherwise, they're both just converted to an unsigned long int.)
In most cases, though, the compiler should convert signed integers to unsigned integers if it finds a mismatch.
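A sketch of how the long/unsigned int case plays out; the outcome depends on whether long is wider than unsigned int (LP64 versus a 32-bit long), so the printed value is platform-dependent:

#include <iostream>

int main() {
    unsigned int u = 4000000000u;
    long l = -1;
    // With a 64-bit long, long can represent every unsigned int value,
    // so u converts to long and -1 < 4000000000 holds (prints 1).
    // With a 32-bit long, both operands convert to unsigned long, -1
    // wraps to ULONG_MAX, and the comparison is false (prints 0).
    std::cout << (l < u) << '\n';
    return 0;
}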
It could be compiler-specific.
If you look at the question "Should I disable the C compiler signed/unsigned mismatch warning?", you can see that in litb's case the signed variable got "promoted" to an unsigned value.
In any case there is no "correct" way for a compiler to handle this situation once one of the variables reaches a value where the most significant bit is set.
So, if you have such a warning, be sure to get rid of it ;)