Reading the C++ Primer, 5th edition, I noticed that it says giving a signed char a value of 256 is undefined.
I decided to try that, and I saw that std::cout printed nothing for that char variable.
But in C, the same thing,
signed char c = 256;
gives the char c a value of 0.
I tried searching but didn't find anything.
Can someone explain to me why this is the case in C++?
Edit: I understand that 256 needs 2 bytes, but why doesn't the same thing happen in C++ as in C?
The book is wildly incorrect. There's no undefined behavior in
signed char c = 256;
256 is an integer literal of type int. To initialize a signed char with it, it is converted to signed char (§8.5 [dcl.init]/17.8; all references are to N4140). This conversion is governed by §4.7 [conv.integral]:
1 A prvalue of an integer type can be converted to a prvalue of
another integer type. A prvalue of an unscoped enumeration type can be
converted to a prvalue of an integer type.
2 If the destination type is unsigned, [...]
3 If the destination type is signed, the value is unchanged if it can
be represented in the destination type (and bit-field width);
otherwise, the value is implementation-defined.
If signed char cannot represent 256, then conversion yields an implementation-defined value of type signed char, which is then used to initialize c. There is nothing undefined here.
When people say "signed overflow is UB", they are usually referring to the rule in §5 [expr]/p4:
If during the evaluation of an expression, the result is not
mathematically defined or not in the range of representable values for
its type, the behavior is undefined.
This renders UB expressions like INT_MAX + 1 - the operands are both ints, so the result's type is also int, but the value is outside the range of representable values. This rule does not apply here, as the only expression is 256, whose type is int, and 256 is obviously in the range of representable values for int.
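To make the distinction concrete, a minimal sketch (the results noted in the comments are common outcomes, not guarantees):

#include <climits>

int main()
{
    signed char c = 256;    // conversion to signed char: implementation-defined value (commonly 0), not UB
    // int x = INT_MAX + 1; // by contrast, this evaluation would overflow int: undefined behaviour
    return 0;
}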
Edit: See T.C.'s answer above (the one quoting [conv.integral]). It's better.
Signed integer overflow is undefined in C++ and C. In most implementations, the maximum value of signed char, SCHAR_MAX, is 127 and so putting 256 into it will overflow it. Most of the time you will see the number simply wrap around (to 0), but this is still undefined behavior.
You're seeing the difference between cout and printf. When you output a character with cout you don't get the numeric representation, you get a single character. In this case the character was NUL which doesn't appear on-screen.
See the example at http://ideone.com/7n6Lqc
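For example, a minimal sketch (assuming the common implementation-defined result of 0 for the conversion):

#include <iostream>

int main()
{
    signed char c = 256;                       // implementation-defined value, commonly 0
    std::cout << c << '\n';                    // streamed as a character: NUL shows nothing
    std::cout << static_cast<int>(c) << '\n';  // streamed as a number: commonly prints 0
}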
A char is generally 8 bits (one byte) and can therefore hold 2^8 different values: 0 to 255 if it is unsigned, -128 to 127 if it is signed.
The range of unsigned char is (to be pedantic: usually) 0 to 255. That is 256 values, which is what one byte can hold.
If you overflow it, the value is (usually) reduced modulo 256, just as other integer types are reduced modulo MAX + 1.
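For example, a minimal sketch (assuming CHAR_BIT == 8, i.e. an 8-bit unsigned char):

unsigned char uc = 300;   // 300 mod 256 == 44: conversion to an unsigned type is well defined
unsigned char top = 255;
++top;                    // wraps to 0: the value is reduced modulo 256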
Related
I am writing some code to test for a bit in a bit-field and I wrote this:
return (unsigned)(m_category | category) > 0
I thought this wouldn't compile, since I didn't give the integer type, but to my surprise it did. So I'm wondering, what does this do? Is it undefined?
unsigned is the same as unsigned int.
The cast will convert the signed integer to unsigned as follows:
https://en.cppreference.com/w/cpp/language/implicit_conversion#Integral_conversions
If the destination type is unsigned, the resulting value is the smallest
unsigned value equal to the source value modulo 2^n,
where n is the number of bits used to represent the destination type.
That is, depending on whether the destination type is wider or narrower,
signed integers are sign-extended or truncated
and unsigned integers are zero-extended or truncated, respectively.
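A minimal sketch of what the cast does to a negative operand (the value -5 and the variable name are just an illustration, not taken from the question):

#include <cstdio>

int main()
{
    int category_bits = -5;                // hypothetical value with the sign bit set
    unsigned u = (unsigned)category_bits;  // converted modulo 2^n: equals UINT_MAX - 4
    std::printf("%u\n", u);                // prints a large positive number, so u > 0 holds
    return 0;
}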
Suppose I assign an eleven-digit number to an int; what will happen? I played around with it a little bit and I can see it gives me some other number within the int range. How is this new number created?
It is implementation-defined behaviour. This means that your compiler must provide documentation saying what happens in this scenario.
So, consult that documentation to get your answer.
A common way that implementations define it is to truncate the input integer to the number of bits of int (after reinterpreting unsigned as signed if necessary).
C++14 Standard references: [expr.ass]/3, [conv.integral]/3
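A sketch of the common outcome, assuming a 32-bit two's-complement int (the exact value is whatever your implementation documents):

long long big = 10000000000LL;  // an eleven-digit number
int thing = (int)big;           // implementation-defined result
// Implementations typically keep the low 32 bits:
// 10000000000 mod 2^32 == 1410065408, so thing == 1410065408.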
In C++20, this behavior will still be implementation-defined [1], but the requirements are much stronger than before. This is a side-consequence of the requirement to use two's complement representation for signed integer types coming in C++20.
This kind of conversion is specified by [conv.integral]:
A prvalue of an integer type can be converted to a prvalue of another integer type. [...]
[...]
Otherwise, the result is the unique value of the destination type that is congruent to the source integer modulo 2^N, where N is the width of the destination type.
[...]
This behaviour is the same as truncating the representation of the number to the width of the integer type you are assigning to, e.g.:
int32_t u = 0x6881736df7939752;
...will keep the 32 right-most bits, i.e., 0xf7939752, and "copy" these bits to u. In two's complement, this corresponds to -141322414.
[1] This will still be implementation-defined because the size of int is implementation-defined. If you assign to, e.g., int32_t, then the behaviour is fully defined.
Prior to C++20, there were two different rules for unsigned and signed types:
A prvalue of an integer type can be converted to a prvalue of another integer type. [...]
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [...]
If the destination type is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.
INT_MAX on a 32-bit system is 2,147,483,647 (2^31 − 1), UINT_MAX is 4,294,967,295 (2^32 − 1).
int thing = 21474836470;
What happens is implementation-defined; it's up to the compiler. Mine appears to truncate the higher bits. 21474836470 is 0x4fffffff6:
warning: implicit conversion from 'long' to 'int' changes
value from 21474836470 to -10 [-Wconstant-conversion]
int thing = 21474836470;
TCPL 3rd Edition, C.6.2.1 Integral Conversions gives the following example:
signed char c = 1023; // implementation defined
Plausible results are 255 and -1 (see C.3.4).
The -1 result is obtained if the target machine uses two's complement.
What implementation of 'signedness' would result in 255?
A compiler could trivially replace your static initialiser with = 255 in this case, on a whim.
But I'd just like to say that looking for an answer to this question is the wrong approach to programming C++. The book is teaching you about C++, not about specific computers. Code to standards and you won't have any problems.
Honestly, I know of no implementation that would result in 255. It is theoretically possible, but for a signed char to hold the value 255, the char type would need at least 9 bits: 8 for the value and one for the sign.
But as you are converting a value that cannot be represented into a signed type, the result is implementation-defined (*), so it could legitimately be SCHAR_MAX (127) for any integer value greater than that, or an implementation could choose to convert all unrepresentable values to 0.
Either way, it has nothing to do with how negative values are represented, except that common implementations use two's complement and simply keep the low-order bits, which yields -1 (a worked sketch follows the reference below).
What is true on common implementations is that:
char c = 1023; // plausible values are -1 (if plain char is signed) or 255 (if unsigned)
(*) Refs:
From n4296 draft for C++14
4.7 Integral conversions [conv.integral]
...
If the destination type is signed, the value is unchanged if it can be represented in the destination type;
otherwise, the value is implementation-defined.
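A worked sketch of the common two's-complement case (an assumption; the standard only says implementation-defined):

//   1023 == 0x3FF == 0b1111111111 (10 bits)
//   an 8-bit signed char keeps the low 8 bits: 0xFF == 0b11111111
//   read as two's complement: 0xFF - 0x100 == -1
signed char c = 1023;   // commonly -1; 255 would require a wider or unsigned char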
The C and C++ standards both allow signed and unsigned variants of the same integer type to alias each other. For example, unsigned int* and int* may alias. But that's not the whole story because they clearly have a different range of representable values. I have the following assumptions:
If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?
If an int is read through an unsigned int*, negative values wrap around as if they were cast to unsigned int. Is this correct?
If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?
Additionally, what about compatible but not equivalent integer types?
On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)
Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which is compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.
If an unsigned int is read through an int*, the value must be
within the range of int or an integer overflow occurs and the
behaviour is undefined. Is this correct?
Why would it be undefined? There is no integer overflow, since no conversion or computation is done. We take the object representation of an unsigned int object and look at it through an int. How the value of the unsigned int object translates to the value of an int is completely implementation-defined.
If an int is read through an unsigned int*, negative values wrap
around as if they were cast to unsigned int. Is this correct?
Depends on the representation. With two's complement and equivalent padding, yes. Not with signed magnitude though - a cast from int to unsigned is always defined through a congruence:
If the destination type is unsigned, the resulting value is the
least unsigned integer congruent to the source integer (modulo
2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this
conversion is conceptual and there is no change in the bit pattern (if
there is no truncation). — end note ]
And now consider
10000000 00000001 // -1 in signed magnitude for 16-bit int
This would certainly be 2^15 + 1 if interpreted as an unsigned. A cast would yield 2^16 - 1 though.
If the value is within the range of both int and unsigned int,
accessing it through a pointer of either type is fully defined and
gives the same value. Is this correct?
Again, with two's complement and equivalent padding, yes. With signed magnitude we might have -0.
On systems where int and long have the same range, alignment,
etc., can int* and long* alias? (I assume not.)
No. They are independent types.
Can char16_t* and uint_least16_t* alias?
Technically not, but that seems to be an unnecessary restriction of the standard.
Types char16_t and char32_t denote distinct types with the same
size, signedness, and alignment as uint_least16_t and
uint_least32_t, respectively, in <cstdint>, called the underlying
types.
So it should be practically possible without any risks (since there shouldn't be any padding).
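If you want to stay clearly within the rules, a sketch like the following (my own illustration, not from the quoted passage) converts element by element instead of aliasing the same storage through both pointer types:

#include <algorithm>
#include <cstddef>
#include <cstdint>

// Copy a char16_t buffer into a uint_least16_t buffer by value; every char16_t
// value fits because uint_least16_t is its underlying type.
void to_uint_least16(const char16_t* src, std::size_t n, std::uint_least16_t* dst)
{
    std::copy(src, src + n, dst);   // ordinary value conversions, no pointer aliasing
}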
If an int is read through an unsigned int*, negative values wrap around as if they were cast to unsigned int. Is this correct?
For a system using two's complement, type-punning and signed-to-unsigned conversion are equivalent, for example:
int n = ...;
unsigned u1 = (unsigned)n;
unsigned u2 = *(unsigned *)&n;
Here, both u1 and u2 have the same value. This is by far the most common setup (e.g. GCC documents this behaviour for all its targets). However, the C standard also addresses machines using ones' complement or sign-magnitude to represent signed integers. On such an implementation (assuming no padding bits and no trap representations), converting an integer value and type-punning it can yield different results. As an example, assume sign-magnitude and n initialized to -1:
int n = -1; /* 10000000 00000001 assuming 16-bit integers*/
unsigned u1 = (unsigned)n; /* 11111111 11111111
effectively 2's complement, UINT_MAX */
unsigned u2 = *(unsigned *)&n; /* 10000000 00000001
only reinterpreted, the value is now INT_MAX + 2u */
Conversion to an unsigned type means adding/subtracting one more than the maximum value of that type until the value is in range. Dereferencing a converted pointer simply reinterprets the bit pattern. In other words, the conversion in the initialization of u1 is a no-op on 2's complement machines, but requires some calculations on other machines.
If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?
Not exactly. The bit pattern must represent a valid value in the new type; it doesn't matter whether the old value is representable. From C11 (n1570) [omitted footnotes]:
6.2.6.2 Integer types
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M≤N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value -(2^M) (two's complement);
the sign bit has the value -(2^M - 1) (ones' complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
E.g., an unsigned int could have a value bit where the corresponding signed type (int) has a padding bit, so something like unsigned u = ...; int n = *(int *)&u; may result in a trap representation on such a system (reading it is undefined behaviour), but not the other way round.
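For concreteness, here is how the three permitted schemes decode the 16-bit pattern 1000000000000001 (assuming no padding bits, so M = 15 value bits):

sign and magnitude: -(1) = -1
two's complement: -2^15 + 1 = -32767
ones' complement: -(2^15 - 1) + 1 = -32766

Read through an unsigned 16-bit type, the same pattern is simply 2^15 + 1 = 32769.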
If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?
I think the standard would allow one of the types to have a padding bit which is always ignored (thus, two different bit patterns can represent the same value and that bit may be set on initialization), but which is an always-trap-if-set bit for the other type. This leeway, however, is limited at least by ibid. p5:
The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value. For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.
On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)
Sure they can, if you don't use them ;) But no, the following is invalid on such platforms:
int n = 42;
long l = *(long *)&n; // UB
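A sketch of defined alternatives (an addition for illustration): copy the value, or the object representation, instead of aliasing the storage:

#include <cstring>

void copy_instead_of_alias()
{
    int n = 42;
    long by_value = n;                      // ordinary value conversion: always fine
    long by_bytes = 0;
    std::memcpy(&by_bytes, &n, sizeof n);   // copies n's bytes into the first sizeof(int) bytes
                                            // of by_bytes; the access is defined, but the value
                                            // depends on endianness and representation
    (void)by_value;
    (void)by_bytes;
}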
Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.
I'm not sure about C++, but at least for C, char16_t is a typedef, though not necessarily for uint_least16_t; it could very well be a typedef of some implementation-specific __char16_t, a type incompatible with uint_least16_t (or any other type).
It is not defined what happens, since the C standard does not exactly define how signed integers must be stored, so you cannot rely on the internal representation. Also, no overflow occurs: if you just cast a pointer, nothing happens other than a different interpretation of the binary data in the subsequent calculations.
Edit
Oh, I misread the phrase "but not equivalent integer types", but I'll keep the paragraph for your interest:
Your second question has much more trouble in it. Many machines can only read from correctly aligned addresses, where the data has to lie at a multiple of the type's width. If you read an int32 from an address that is not divisible by 4 (because you cast a 2-byte int pointer), your CPU may crash.
You should not rely on the sizes of types. If you choose another compiler or platform, your long and int may not match anymore.
Conclusion:
Do not do this. You wrote highly platform-dependent (compiler, target machine, architecture) code that hides its errors behind casts that suppress any warnings.
Concerning your questions regarding unsigned int* and int*: if the
value in the actual type doesn't fit in the type you're reading, the
behavior is undefined, simply because the standard neglects to define
any behavior in this case, and any time the standard fails to define
behavior, the behavior is undefined. In practice, you'll almost always
obtain a value (no signals or anything), but the value will vary
depending on the machine: a machine with signed magnitude or 1's
complement, for example, will result in different values (both ways)
from the usual 2's complement.
For the rest, int and long are different types, regardless of their
representations, and int* and long* cannot alias. Similarly, as you
say, in C++, char16_t is a distinct type in C++, but a typedef in
C (so the rules concerning aliasing are different).
I have done some tests in VC++ 2010 mixing operands of different sizes that cause overflow in an add operation:
#include <stdio.h>   // printf
#include <tchar.h>   // _tmain / _TCHAR (MSVC-specific)

int _tmain(int argc, _TCHAR* argv[])
{
    __int8 a = 127;
    __int8 b = 1;
    __int16 c = b + a;
    __int8 d = b + a;
    printf("c=%d,d=%d\n", c, d);
    return 0;
}
//result is: c=128, d=-128
I don't understand why c == 128! My understanding is that in both additions, b+a is still considered an addition of two signed 8-bit variables. So the result is an overflow, i.e. -128. After that, the result is promoted to a 16-bit signed int for the first assignment, and c should still get a 16-bit -128 value. Is my understanding correct? The C++ standard is a bit difficult to read. Chapter 4 seems to talk about integral promotion, but I can't find anything related to this specific example.
My understanding is that in both additions, b+a is still considered an addition of two signed 8-bit variables. So the result is an overflow, i.e. -128.
No, the promotion happens before the + is evaluated, not after it. Both numbers are promoted to int for the addition, added as two positive numbers, and the result is then converted to a 16-bit short. At no point in the process does the result become negative due to an overflow, hence the end result of 128.
Arguably, this makes sense: the behavior of a and b matches that of two numbers in mathematics, making it more intuitive to language practitioners.
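A sketch of the same steps using portable types instead of the MSVC-specific __int8/__int16 (the value of d is implementation-defined before C++20; -128 is the common result):

#include <cstdio>

int main()
{
    signed char a = 127, b = 1;
    int sum = a + b;                   // both operands are promoted to int first, so sum == 128
    short c = (short)sum;              // 128 fits in a 16-bit (or wider) short: value preserved
    signed char d = (signed char)sum;  // 128 does not fit: implementation-defined, commonly -128
    std::printf("sum=%d c=%d d=%d\n", sum, c, d);
    return 0;
}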
It's integral promotion.
1 A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int. [§ 4.5]
In this statement
__int16 c=b+a;
First, all char and short int values are automatically elevated to int. This process is called integral promotion. Next, all operands are converted up to the type of the largest operand, which is called type promotion. [Herbert Schildt]
The values of variables b and a will be promoted to int and then the operation applies on them.
In two's-complement integer representation, a negative value is indicated by a set highest bit. This allows the machine to add and subtract binary integers with the same instructions, regardless of whether the integers are signed.
a = 127 == 0x7F == 0b01111111
+ b = 1 == 0x01 == 0b00000001
-------------------------------
c = 128 == 0x80 == 0b10000000
d =-128 == 0x80 == 0b10000000
The variables c and d may have different types, but different types of integers are merely different interpretations of a single binary value. As you can see above, the binary value fits in 8 bits just fine. Since the standard requires the terms of an arithmetic expression to be zero- or sign-extended (promoted) to at least the width of int before any math is done, and neither operand is sign-extended here, the result is always 0b10000000 no matter what type the operands are.
In summary, the difference between the results is that, for a 16-bit integer, the sign bit is 0b1000000000000000 (which a+b does not have set), while for an 8-bit integer the sign bit is 0b10000000 (which a+b does have set).