I ran this C++ code in Visual Studio 2019 and got the degree sign as the output:
#include <iostream>

int main()
{
    char ch1 = 760;
    std::cout << ch1;
    return 0;
}
The result is °, but I expected an integer. Since I know the binary representation, I expected to see -8, but I see the degree sign instead.
This depends on what the underlying type of char is. The initialization rule we have to consider is [dcl.init]/17.8 which states:
Otherwise, the initial value of the object being initialized is the (possibly converted) value of the initializer expression. Standard conversions will be used, if necessary, to convert the initializer expression to the cv-unqualified version of the destination type; no user-defined conversions are considered. If the conversion cannot be done, the initialization is ill-formed. When initializing a bit-field with a value that it cannot represent, the resulting value of the bit-field is implementation-defined.
So, from that we see it will perform a standard conversion. Following that to the integral conversions section ([conv.integral]), we have
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
and
If the destination type is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.
So, if char is unsigned, you'll get 760 mod 2^CHAR_BIT as the value of ch1. If it is signed, you'll need to check how your implementation handles the out-of-range conversion. In no case, though, is this undefined behavior: it is either well-defined or implementation-defined behavior.
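For the unsigned case, the arithmetic can be checked at compile time. This is a minimal sketch; the helper names are hypothetical, and the expected value 248 assumes the usual 8-bit char (760 - 2 * 256 = 248).

```cpp
#include <climits>  // CHAR_BIT

// Hypothetical helper: the value an int becomes when converted to unsigned char.
constexpr int to_unsigned_char(int v) {
    return static_cast<unsigned char>(v);
}

// Hypothetical helper: the same value computed by hand as v mod 2^CHAR_BIT.
// Only valid for non-negative v, since % keeps the sign of the dividend.
constexpr int mod_pow2_char_bit(int v) {
    return v % (1 << CHAR_BIT);
}

// The conversion and the hand computation agree (248 when CHAR_BIT == 8).
static_assert(to_unsigned_char(760) == mod_pow2_char_bit(760),
              "conversion to unsigned char reduces modulo 2^CHAR_BIT");
```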
The reason you see a symbol after all of this is that ch1 is a char. std::ostream's operator<< is type-aware and has a dedicated overload for char that prints the character from the character set whose code is the integer value ch1 holds.
If you want to see an integer, you need to cast it:
std::cout << static_cast<int>(ch1);
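The effect of overload selection can be seen by streaming into a std::ostringstream instead of std::cout. A small sketch; the two helper names are hypothetical, and the expected strings assume an ASCII execution character set.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: streams a char. operator<< selects the char overload,
// which writes the character itself, not its numeric value.
std::string stream_as_char(char c) {
    std::ostringstream os;
    os << c;
    return os.str();
}

// Hypothetical helper: streams the numeric value. After the cast the int
// overload is selected. Writing `os << +c;` would also work, since unary +
// promotes c to int.
std::string stream_as_int(char c) {
    std::ostringstream os;
    os << static_cast<int>(c);
    return os.str();
}
```

With 'A', the first helper yields "A" and the second yields "65".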
With the proper warning enabled, clang reports the following for your code:
warning: implicit conversion from 'int' to 'char' changes value from 760 to -8
It is just a warning, not a compilation error, so the code compiles and runs. From there, the program prints the character represented by this value, which comes down to how std::ostream prints chars.
If you want to see -8 as the output, you can do:
std::cout << (int)ch1;
However, this depends on the representation of char, which is platform-dependent: char can be signed or unsigned. If it is unsigned (and, as usual, 8 bits wide), you will see 248 instead.
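Which of the two outputs appears can be predicted from std::numeric_limits<char>::is_signed. A sketch, assuming an 8-bit char and a compiler that wraps out-of-range signed conversions modulo 2^CHAR_BIT (as mainstream compilers do, and as C++20 requires); the helper name is hypothetical.

```cpp
#include <limits>

// Hypothetical helper: converts v to plain char and reads the result back as int.
// If char is signed and v is out of range, the result is implementation-defined
// before C++20 (mainstream compilers wrap modulo 2^CHAR_BIT) and modular since C++20.
int via_plain_char(int v) {
    char c = static_cast<char>(v);
    return static_cast<int>(c);
}

// True when plain char is a signed type on this implementation.
constexpr bool char_is_signed = std::numeric_limits<char>::is_signed;
```

On a target where char is signed (typical for x86), via_plain_char(760) is -8; where it is unsigned (typical for ARM Linux), it is 248.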
Related
I am confused by the C++ conversion rules regarding unsigned-to-signed and vice versa.
I'm reading data from a socket and saving it in a std::vector<uint8_t>. I then need to read part of it (assuming it is ASCII data) and save it in a std::string. This is what I'm doing:
for (std::vector<uint8_t>::const_iterator it = payload.begin() + start; it < payload.begin() + end; ++it) {
store_name.push_back(*it);
}
So, as you can see, *it yields a uint8_t, which is passed to the push_back member function of std::string, which takes a char, so an implicit conversion occurs. char may be either signed or unsigned, and I'm not sure what happens if it is signed.
I cannot wrap (no pun intended) my head around what is happening here, and whether or not it is safe.
Does store_name.push_back(*it) change the bit-pattern of *it before storing it in the std::string?
What rules exactly govern this?
I've gone through many places online explaining type-conversion rules, but it still doesn't really stick with me. Explanations will be appreciated.
EDIT: As a different way to put it - in general, what happens when we cast unsigned to signed and vice versa?
unsigned char a = 50; // Inside the range of signed char
signed char b = (signed char) a;
Is the bit pattern in b required to be the same as the bit pattern in a? Or may the bit pattern change?
Also, what about the opposite direction:
a = (unsigned char) b;
Again, does a change to the bit pattern occur? Or is it guaranteed that the underlying bit pattern stays the same, no matter how many signed-unsigned conversions we do, as long as the value is in the correct range?
And does it matter whether it's an explicit cast using a C-style cast or static_cast<>, or an implicit conversion by assignment?
From implicit conversions - Numeric Conversion/Integral conversions:
To unsigned
If the destination type is unsigned, the resulting value is the smallest unsigned value equal to the source value modulo 2^n where n is the number of bits used to represent the destination type. That is, depending on whether the destination type is wider or narrower, signed integers are sign-extended or truncated and unsigned integers are zero-extended or truncated respectively.
To signed
If the destination type is signed, the value does not change if the source integer can be represented in the destination type. Otherwise the result is: until C++20, implementation-defined; since C++20, the unique value of the destination type equal to the source value modulo 2^n where n is the number of bits used to represent the destination type. (Note that this is different from signed integer arithmetic overflow, which is undefined.)
So for values in range, the value does not change. Otherwise, as I interpret it, if your machine represents values in two's complement, conversion to unsigned does not change the bits, and since C++20 neither does conversion to signed; before C++20 the signed case is implementation-defined. (I am not sure why, but I assume most compilers do not change the bits, even though they are allowed to.)
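The round trip asked about in the question can be checked exhaustively for all byte values. This is a sketch assuming an 8-bit char; the function name is hypothetical. The unsigned-to-signed step is the implementation-defined one before C++20, so a pre-C++20 compiler that did not wrap could fail it.

```cpp
// Hypothetical check, assuming an 8-bit char: does every unsigned char value
// survive a trip through signed char and back?
bool all_bytes_round_trip() {
    for (int i = 0; i <= 255; ++i) {
        unsigned char a = static_cast<unsigned char>(i);
        // 128..255 do not fit in signed char: implementation-defined before C++20
        // (GCC and Clang wrap, e.g. 200 -> -56), modulo 256 since C++20.
        signed char b = static_cast<signed char>(a);
        // Back to unsigned is always defined: reduction modulo 256 (-56 -> 200).
        unsigned char back = static_cast<unsigned char>(b);
        if (back != a) {
            return false;
        }
    }
    return true;
}
```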
Regarding C-style cast vs static_cast, a C-style cast does the following (from cppreference's explicit type conversion page):
When the C-style cast expression is encountered, the compiler attempts to interpret it as the following cast expressions, in this order:
a) const_cast<new_type>(expression);
b) static_cast<new_type>(expression), with extensions: pointer or reference to a derived class is additionally allowed to be cast to pointer or reference to unambiguous base class (and vice versa) even if the base class is inaccessible (that is, this cast ignores the private inheritance specifier). Same applies to casting pointer to member to pointer to member of unambiguous non-virtual base;
c) static_cast (with extensions) followed by const_cast;
d) reinterpret_cast<new_type>(expression);
e) reinterpret_cast followed by const_cast.
The first choice that satisfies the requirements of the respective cast operator is selected, even if it cannot be compiled.
So for signed<->unsigned conversions, a C-style cast should behave the same as static_cast.
For implicit conversion (implicit conversions - Order of the conversions)
Implicit conversion sequence consists of the following, in this order:
zero or one standard conversion sequence;
zero or one user-defined conversion;
zero or one standard conversion sequence.
where
A standard conversion sequence consists of the following, in this order:
1) zero or one conversion from the following set: lvalue-to-rvalue conversion, array-to-pointer conversion, and function-to-pointer conversion;
2) zero or one numeric promotion or numeric conversion;
3) zero or one function pointer conversion; (since C++17)
4) zero or one qualification adjustment.
and the numeric conversion there is again the integral conversion quoted at the top.
static_cast itself converts between types using a combination of implicit and user-defined conversions (see cppreference's static_cast page). So there should be no difference between implicit and explicit conversion.
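A compile-time sketch of that equivalence: the same out-of-range value converted implicitly, with static_cast, and with a C-style cast. The helper names are hypothetical, and the constexpr local variable needs C++14 or later.

```cpp
// Hypothetical helpers: the same value converted to signed char by an implicit
// conversion, by static_cast, and by a C-style cast. All three apply the same
// integral conversion, so they must agree.
constexpr signed char implicit_form(int v) {
    signed char r = v;  // implicit; GCC may warn here with -Wconversion
    return r;
}
constexpr signed char static_form(int v) { return static_cast<signed char>(v); }
constexpr signed char c_style_form(int v) { return (signed char)v; }
```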
Does the C++ standard guarantee whether integer conversion that both widens and casts away the sign will sign-extend or zero-extend?
The quick test:
int32_t s = -1;
uint64_t u = s;
produces 0xFFFFFFFFFFFFFFFF under Xcode, but is that defined behavior in the first place?
When you do
uint64_t u = s;
[dcl.init]/17.9 applies which states:
the initial value of the object being initialized is the (possibly converted) value of the initializer expression. A standard conversion sequence ([conv]) will be used, if necessary, to convert the initializer expression to the cv-unqualified version of the destination type; no user-defined conversions are considered.
and if we look in [conv], under integral conversions, we have
Otherwise, the result is the unique value of the destination type that is congruent to the source integer modulo 2^N, where N is the width of the destination type.
So what you are guaranteed is that -1 becomes the largest representable value, -2 is one less than that, -3 is one less than -2, and so on; basically, it "wraps around".
In fact,
unsigned_type some_name = -1;
is the canonical way to create a variable holding the maximum value of that unsigned integer type.
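Both guarantees can be checked at compile time. A short sketch; to_u64 and all_ones are hypothetical names, and the template uses an explicit cast where the original uses implicit initialization, to avoid sign-conversion warnings.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: int32_t -> uint64_t yields the source value modulo 2^64,
// so -1 maps to the maximum, -2 to one less, and so on.
constexpr std::uint64_t to_u64(std::int32_t v) {
    return static_cast<std::uint64_t>(v);
}

// Hypothetical helper: -1 converted to any unsigned type is 2^N - 1, its maximum.
template <typename Unsigned>
constexpr Unsigned all_ones() {
    static_assert(!std::numeric_limits<Unsigned>::is_signed, "unsigned types only");
    return static_cast<Unsigned>(-1);
}
```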
You can find the standard verbiage in other answers.
But to help you form an intuitive mental model of widening conversions, it is helpful to think of these as a 2-step process:
Sign- or zero-extension of the value. If the value is of a signed type, sign extension is used. Here int32_t is sign-extended to int64_t. On x86, the signedness of the type determines whether a MOVSX or MOVZX instruction is used.
Converting the extended value to the destination type (change of signedness). Here int64_t is converted to uint64_t. This involves no assembly instructions, since registers are untyped: the compiler just treats the register holding the sign-extended int32_t as a uint64_t.
Note that the standard doesn't specify these steps, it just specifies the required result.
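The two-step model can be checked against the direct conversion, and it also shows why the order of the steps matters. A sketch with hypothetical variable names; the last constant changes signedness first and therefore zero-extends.

```cpp
#include <cstdint>

constexpr std::int32_t sample = -1;

// Direct conversion, as in `uint64_t u = s;`.
constexpr std::uint64_t direct = static_cast<std::uint64_t>(sample);

// The two-step model: sign-extend to int64_t, then reinterpret as uint64_t.
constexpr std::uint64_t widen_then_unsign =
    static_cast<std::uint64_t>(static_cast<std::int64_t>(sample));

// The opposite order zero-extends instead: uint32_t(-1) is 0xFFFFFFFF,
// and widening an unsigned value pads with zeros.
constexpr std::uint64_t unsign_then_widen =
    static_cast<std::uint64_t>(static_cast<std::uint32_t>(sample));
```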
From the section on Integral conversions:
[conv.integral/3]: Otherwise, the result is the unique value of the destination type that is congruent to the source integer modulo 2^N, where N is the width of the destination type.
In other words, the wrap-around "happens last".
Compiling some test code in avr-gcc for an 8-bit micro-controller, the line
const uint32_t N = 65537;
uint8_t values[N];
I got the following compilation warning (which, really, should be an error by default):
warning: conversion from 'long unsigned int' to 'unsigned int' changes value from '65537' to '1' [-Woverflow]
uint8_t values[N];
Note that when compiling for this target, sizeof(int) is 2.
So it seems that an array size cannot exceed the size of an unsigned int.
Am I correct? Is this GCC-specific or is it part of some C or C++ standard?
Before somebody remarks that an 8-bit microcontroller generally does not have enough memory for an array that large, let me say in advance that this is beside the point.
size_t is considered the type to use, despite not being formally mandated by either the C or C++ standard.
The rationale is that sizeof(values) will have that type (this is mandated by the C and C++ standards), and the number of elements cannot be greater than that, since sizeof of an object is at least 1.
So it seems that an array size cannot exceed the size of an unsigned int.
That seems to be the case in your particular C[++] implementation.
Am I correct? Is this gcc-specific or is it part of some C or C++ standard?
It is not a characteristic of GCC in general, nor is it specified by either the C or C++ standard. It is a characteristic of your particular implementation: a version of GCC for your specific computing platform.
The C standard requires the expression designating the number of elements of an array to have an integer type, but it does not specify a particular one. I do think it's strange that your GCC seems to claim it's giving you an array with a different number of elements than you specified. I don't think that conforms to the standard, and I don't think it makes much sense as an extension. I would prefer to see it reject the code instead.
I'll dissect the issue with the rules in the "incorrekt and incomplet" ISO CPP standard draft n4659. Emphasis is added by me.
11.3.4 defines array declarations. Paragraph one contains
If the constant-expression [between the square brackets] (8.20) is present, it shall be a converted constant expression of type std::size_t [...].
std::size_t is from <cstddef> and defined as
[...] an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.
Since it is imported via the C standard library headers the C standard is relevant for the properties of size_t. The ISO C draft N2176 prescribes in 7.20.3 the "minimal maximums", if you want, of integer types. For size_t that maximum is 65535. In other words, a 16 bit size_t is entirely conformant.
A "converted constant expression" is defined in 8.20/4:
A converted constant expression of type T is an expression, implicitly converted to type T, where the converted expression is a constant expression and the implicit conversion sequence contains only [any of 10 distinct conversions, one of which concerns integers (par. 4.7):]
— integral conversions (7.8) other than narrowing conversions (11.6.4)
An integral conversion (as opposed to a promotion which changes the type to equivalent or larger types) is defined as follows (7.8/3):
A prvalue of an integer type can be converted to a prvalue of another integer type.
7.8/5 then excludes the integral promotions from the integral conversions. This means that the conversions are usually narrowing type changes.
Narrowing conversions (which, as you'll remember, are excluded from the list of allowed conversions in converted constant expressions used for array sizes) are defined in the context of list-initialization, 11.6.4, par. 7
A narrowing conversion is an implicit conversion
[...]
(7.3)[1]: from an integer type [...] to an integer type that cannot represent all the values of the original type, except where the source is a constant expression whose value after integral promotions will fit into the target type.
In effect, this says that the array size must be exactly the constant value as written, which is an entirely reasonable requirement for avoiding surprises.
Now let's put it all together. The working hypothesis is that std::size_t is a 16-bit unsigned integer type with a value range of 0..65535. The integer literal 65537 is not representable in the system's 16-bit unsigned int and thus has type long. Therefore it will undergo an integral conversion. This will be a narrowing conversion because the value is not representable in the 16-bit size_t[2], so the exception condition in 11.6.4/7.3, "value fits anyway", does not apply.
So what does this mean?
11.6.4/3.11 is the catch-all rule for failing to produce an initializer value from an item in an initializer list. Because the initializer-list rules are used for array sizes, we can assume that the catch-all for conversion failure applies to the array size constant:
(3.11) — Otherwise, the program is ill-formed.
A conformant compiler is required to produce a diagnostic, which it does. Case closed.
[1] Yes, they subdivide paragraphs.
[2] Converting an integer value of 65537 (in whatever type can hold the number, here probably a long) to a 16-bit unsigned integer is a defined operation. 7.8/2 details:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s
complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is
no truncation). —end note ]
The binary representation of 65537 is 1_0000_0000_0000_0001, i.e. only the least significant bit of the lower 16 bits is set. The conversion to a 16 bit unsigned value (which circumstantial evidence indicates size_t is) computes the [expression value] modulo 2^16, i.e. simply takes the lower 16 bits. This results in the value of 1 mentioned in the compiler diagnostics.
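The same reduction can be reproduced with std::uint16_t standing in for the 16-bit size_t the diagnostic suggests. A sketch; the constant names are hypothetical.

```cpp
#include <cstdint>

// std::uint16_t stands in for the 16-bit size_t the diagnostic suggests.
constexpr std::uint32_t requested = 65537;  // binary 1_0000_0000_0000_0001
constexpr std::uint16_t converted = static_cast<std::uint16_t>(requested);
// Only the lower 16 bits survive: 65537 mod 2^16 == 1.
```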
In your implementation size_t is defined as unsigned int and uint32_t is defined as a long unsigned int. When you create a C array the argument for the array size gets implicitly converted to size_t by the compiler.
This is why you're getting a warning: you're specifying the array size with a uint32_t, which gets converted to size_t, and these types don't match.
This is probably not what you want. Use size_t instead.
The value returned by sizeof will be of type size_t.
It is generally used for the number of elements in an array, because it is guaranteed to be large enough. size_t is always unsigned, but which unsigned type it is is implementation-defined. Lastly, it is implementation-defined whether the implementation can support objects of SIZE_MAX bytes, or even close to it.
[This answer was written when the question was tagged with C and C++. I have not yet re-examined it in light of OP’s revelation they are using C++ rather than C.]
size_t is the type the C standard designates for working with object sizes. However, it is not a cure-all for getting sizes correct.
size_t is defined in the <stddef.h> header (and in other headers as well).
The C standard does not require that expressions for array sizes, when specified in declarations, have the type size_t, nor does it require that they fit in a size_t. It is not specified what a C implementation ought to do when it cannot satisfy a request for an array size, especially for variable length arrays.
In your code:
const uint32_t N = 65537;
uint8_t values[N];
values is declared as a variable length array. (Although we can see the value of N could easily be known at compile time, it does not fit C’s definition of a constant expression, so uint8_t values[N]; qualifies as a declaration of a variable length array.) As you observed, GCC warns you that the 32-bit unsigned integer N is narrowed to a 16-bit unsigned integer. This warning is not required by the C standard; it is a courtesy provided by the compiler. More than that, the conversion is not required at all—since the C standard does not specify the type for an array dimension, the compiler could accept any integer expression here. So the fact that it has inserted an implicit conversion to the type it needs for array dimensions and warned you about it is a feature of the compiler, not of the C standard.
Consider what would happen if you wrote:
size_t N = 65537;
uint8_t values[N];
Now there would be no warning in uint8_t values[N];, as a 16-bit integer (the width of size_t in your C implementation) is being used where a 16-bit integer is needed. However, in this case, your compiler likely warns in size_t N = 65537;, since 65537 will have a 32-bit integer type, and a narrowing conversion is performed during the initialization of N.
However, the fact that you are using a variable length array suggests you may be computing array sizes at run-time, and this is only a simplified example. Possibly your actual code does not use constant sizes like this; it may calculate sizes during execution. For example, you might use:
size_t N = NumberOfGroups * ElementsPerGroup + Header;
In this case, there is a possibility that the wrong result will be calculated. If the variables all have type size_t, the result may easily wrap (effectively overflow the limits of the size_t type). In this case, the compiler will not give you any warning, because the values are all the same width; there is no narrowing conversion, just overflow.
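A small illustration of such a silent wrap. Since size_t on a desktop host is usually 64 bits, this sketch uses std::uint32_t as a same-width stand-in and assumes a 32-bit int (so the operands are not promoted to int); element_count is a hypothetical name mirroring the expression above.

```cpp
#include <cstdint>

// Hypothetical helper mirroring N = NumberOfGroups * ElementsPerGroup + Header,
// with std::uint32_t standing in for a size_t of the same width. Assumes a
// 32-bit int, so the arithmetic stays unsigned: overflow silently wraps
// modulo 2^32, and with no narrowing conversion there is nothing to warn about.
constexpr std::uint32_t element_count(std::uint32_t groups,
                                      std::uint32_t per_group,
                                      std::uint32_t header) {
    return groups * per_group + header;
}
```

For example, 70000 * 70000 is 4,900,000,000, but element_count(70000, 70000, 0) yields 605,032,704 (that value minus 2^32).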
Therefore, using size_t is insufficient to guard against errors in array dimensions.
An alternative is to use a type you expect to be wide enough for your calculations, perhaps uint32_t. Given NumberOfGroups and the other operands as uint32_t values:
const uint32_t N = NumberOfGroups * ElementsPerGroup + Header;
will produce a correct value for N. Then you can test it at run-time to guard against errors:
if ((size_t) N != N)
{
    /* Report error… */
}
uint8_t values[(size_t) N];
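The check can be exercised on a desktop host by making the target type a parameter, so that a 16-bit size_t can be simulated with std::uint16_t. A sketch in C++ (the rest of this thread's code is C++); survives_conversion is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: the `(size_t) N != N` test with the target type as a
// template parameter, so a 16-bit size_t can be simulated with std::uint16_t.
template <typename Target>
constexpr bool survives_conversion(std::uint32_t n) {
    return static_cast<std::uint32_t>(static_cast<Target>(n)) == n;
}
```

With Target = std::uint16_t, 65535 passes and 65537 is rejected; with a 64-bit target, both pass.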
If I have this code:
int A;
unsigned int B;
if (A==B) foo();
the compiler will complain about mixed types in comparison. If I cast A like this:
if ((unsigned int) A==B) foo();
Does this instruct the compiler to insert code to convert A from int to unsigned int? Or does it just tell the compiler: don't worry about it, ignore the type mismatch?
UPDATE: If this is unsafe (as pointed out below), how should I handle this comparison? (Wouldn't assigning the contents of an int to an unsigned int for later comparison also be unsafe)
UPDATE: Wow, there are some very different answers here (from people with thousands of posts). I've accepted what seems like the best one, but anyone reading this question should read ALL the answers carefully.
When casting, at least at the conceptual level, the compiler will create a temporary variable of the type specified in the cast expression.
You may test that this expression:
(unsigned int) A = B; // This time assignment is intended
will generate an error about modifying a temporary, which cannot be assigned to.
Of course, the compiler is free to optimize away any temporary created through a cast. Nevertheless, a valid way to build the temporary must exist.
The cast implies a conversion, if necessary. But this is problematic for negative values: they are mapped to positive values in the unsigned type. Thus you have to make sure a negative value never compares equal to any (positive) unsigned value:
int A;
unsigned int B;
...
if ( (A >= 0) && (static_cast<unsigned int>(A) == B) )
foo();
This works because the unsigned variant of an integer type is guaranteed to hold all non-negative values (including 0) of the corresponding signed type.
Notice the usage of a static_cast instead of the "classic" C-style cast.
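The difference between the unguarded and guarded comparisons shows up with -1. A sketch with hypothetical helper names; `~0u` is the maximum unsigned int value.

```cpp
// Hypothetical helper: the comparison as written in the question, with the
// conversion made explicit. A negative a wraps to a large unsigned value.
bool naive_equal(int a, unsigned int b) {
    return static_cast<unsigned int>(a) == b;
}

// Hypothetical helper: the guarded comparison from this answer. A negative a
// never equals any unsigned value; a non-negative a converts without change.
bool safe_equal(int a, unsigned int b) {
    return a >= 0 && static_cast<unsigned int>(a) == b;
}
```

naive_equal(-1, ~0u) is true, while safe_equal(-1, ~0u) is false. C++20 also provides std::cmp_equal in <utility>, which performs this kind of value-correct comparison between mixed signed and unsigned integers.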
With plain types, in C and C++, == is always done with both operands converted to the same type. In OP's code, A is converted to unsigned first.
If I cast ... does this instruct the compiler to insert code to convert A from int to unsigned int?
Yes, but that code would have occurred anyway. Without the cast, the compiler is simply warning that it is going to do something the programmer may not have intended.
Or (if I cast) does it just tell the compiler: don't worry about it, ignore the type mismatch?
The type mismatch is not ignored. By supplying the cast, there is no type mismatch to warn about.
How should I handle this comparison?
Ensure A is not negative, then convert it to unsigned with a cast.
int A;
unsigned int B;
// if (A==B) foo();
if (A >= 0 && (unsigned)A == B) foo();
Every non-negative int can be converted to an unsigned with no value change.
The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type. C11dr §6.2.5 9
So your question is just about a signed/unsigned comparison.
C++ standard says in clause 5 Expressions [expr] § 10:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows: [...]
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
and in 4.7 Integral conversions [conv.integral] §2
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s
complement representation, this conversion is conceptual and there is no change in the bit pattern (if there
is no truncation). —end note ]
That means that on a common system using two's complement for negative numbers and 32 bits for int and unsigned int, (unsigned int) -1 yields 4294967295.
This may or may not be what you want; the compiler just warns you that it will consider -1 and 4294967295 equal.
If it is not what you want, first test whether the signed value is negative. If it is, treat the values as not equal and skip the equality comparison.
It depends on the type of cast and what you are casting. In your particular case nothing is going to happen, but in other cases actual conversion code will be generated. The simplest example:
void foo(double d) {};
...
int x;
foo(static_cast<double>(x));
In this example there would be code generated.
Reading the C++ Primer 5th edition book, I noticed that a signed char with a value of 256 is undefined.
I decided to try that, and std::cout didn't print anything for that char variable.
But in C, the same code
signed char c = 256;
would give a value 0 for the char c.
I tried searching but didn't find anything.
Can someone explain to me why is this the case in C++?
Edit: I understand that 256 needs 2 bytes, but why doesn't the same thing happen in C++ as in C?
The book is wildly incorrect. There's no undefined behavior in
signed char c = 256;
256 is an integer literal of type int. To initialize a signed char with it, it is converted to signed char (§8.5 [dcl.init]/17.8; all references are to N4140). This conversion is governed by §4.7 [conv.integral]:
1 A prvalue of an integer type can be converted to a prvalue of another integer type. A prvalue of an unscoped enumeration type can be converted to a prvalue of an integer type.
2 If the destination type is unsigned, [...]
3 If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
If signed char cannot represent 256, then conversion yields an implementation-defined value of type signed char, which is then used to initialize c. There is nothing undefined here.
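A compile-time sketch of the conversion from the question, assuming an 8-bit signed char. The constant names are hypothetical; the expected values rely on modular reduction, which GCC documents for this implementation-defined case and which C++20 requires.

```cpp
// 256 does not fit in an 8-bit signed char. This is a conversion, not an
// arithmetic overflow, so it is not UB: before C++20 the result is
// implementation-defined (GCC documents reduction modulo 2^N), and C++20
// requires that modular result.
constexpr signed char from_256 = static_cast<signed char>(256);  // 256 - 256 == 0
constexpr signed char from_200 = static_cast<signed char>(200);  // 200 - 256 == -56
```

Streaming from_256 with std::cout writes a NUL character, which typically shows up as nothing, matching what the question observed; static_cast<int>(from_256) prints 0.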
When people say "signed overflow is UB", they are usually referring to the rule in §5 [expr]/p4:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
This is what makes expressions like INT_MAX + 1 UB: the operands are both ints, so the result's type is also int, but the value is outside the range of representable values. This rule does not apply here, as the only expression is 256, whose type is int, and 256 is obviously in the range of representable values for int.
Edit: See T.C.'s answer below. It's better.
Signed integer overflow is undefined in C++ and C. In most implementations, the maximum value of signed char, SCHAR_MAX, is 127 and so putting 256 into it will overflow it. Most of the time you will see the number simply wrap around (to 0), but this is still undefined behavior.
You're seeing the difference between cout and printf. When you output a character with cout you don't get the numeric representation, you get a single character. In this case the character was NUL which doesn't appear on-screen.
See the example at http://ideone.com/7n6Lqc
A char is generally 8 bits, or one byte, and can therefore hold 2^8 different values: from 0 to 255 if it is unsigned, or from -128 to 127 if it is signed.
unsigned char values range (to be pedantic, usually) from 0 to 255. Those are the 256 values that 1 byte can hold.
If you get overflow, values are (usually) reduced modulo 256, just as other integer types wrap modulo MAX + 1.