How does C treat char sums? - c++

When I'm in C++, and I call an overloaded function foo, like so:
foo('e' - (char) 5)
it can output "output is a char" or "output is an int" based on the type of the result. I get "output is an int" from my program, like this:
#include <iostream>
void foo(char x)
{
std::cout << "output is a char" << std::endl;
}
void foo(int x)
{
std::cout << "output is an int" << std::endl;
}
int main()
{
foo('a' + (char) 5);
}
My instructor says that in C, the expression above, ('a' + (char) 5), evaluates as a char. I see in the C99 standard that chars are promoted to ints to find the sum, but does C recast them back to chars when it's done? I can't find any references that seem credible saying one way or another what C actually does after the promotion is completed, and the sum is found.
Is the sum left as an int, or given as a char? How can I prove this in C, or is there a reference I'm not understanding/finding?

From the C Standard, 6.3.1.8 Usual arithmetic conversions, emphasis mine:
Many operators that expect operands of arithmetic type cause conversions and yield result
types in a similar way. The purpose is to determine a common real type for the operands
and result. For the specified operands, each operand is converted, without change of type
domain, to a type whose corresponding real type is the common real type. Unless
explicitly stated otherwise, the common real type is also the corresponding real type of
the result, whose type domain is the type domain of the operands if they are the same,
and complex otherwise. This pattern is called the usual arithmetic conversions:
First, if the corresponding real type of either operand is long double...
Otherwise, if the corresponding real type of either operand is double...
Otherwise, if the corresponding real type of either operand is float...
Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
If both operands have the same type, then no further conversion is needed.
So you are exactly correct. The type of the expression 'a' + (char) 5 is int. There is no recasting back to char, unless explicitly asked for by the user. Note that 'a' here has type int, so it's only the (char)5 that needs to be promoted. This is stipulated in 6.4.4.4 Character Constants:
An integer character constant is a sequence of one or more multibyte characters enclosed
in single-quotes, as in 'x'.
...
An integer character constant has type int.
There is an example demonstrating the explicit recasting to char:
In executing the fragment
char c1, c2;
/* ... */
c1 = c1 + c2
the ‘‘integer promotions’’ require that the abstract machine promote the value of each variable to int size and then add the two ints and truncate the sum. Provided the addition of two chars can be done without overflow, or with overflow wrapping silently to produce the correct result, the actual execution need only produce the same result, possibly omitting the promotions.
The truncation here only happens because we assign back to a char.

No, C does not recast them back to chars.
The standard (ISO/IEC 9899:1999) says (6.3.1.8 Usual arithmetic conversions):
Many operators that expect operands of arithmetic type cause conversions and yield result
types in a similar way. The purpose is to determine a common real type for the operands
and result. For the specified operands, each operand is converted, without change of type
domain, to a type whose corresponding real type is the common real type. Unless
explicitly stated otherwise, the common real type is also the corresponding real type of
the result, whose type domain is determined by the operator.

Your instructor seems to be wrong. In addition to what you found in the standard (that the arithmetic promotes to int), we can use a simple test program to show the behavior. This is not a proof from the standard, of course, but it is the same level of proof as your C++ test:
#include <stdio.h>
int main () {
printf("%g",'c' - (char)5);
}
produces
Warning: format specifies type 'double' but argument has type 'int'
with gcc and clang.

You can't determine the type of an expression as easily in C, but you can easily determine the size of an expression:
#include <stdio.h>
int main(void) {
printf("sizeof(char)==1\n");
printf("sizeof(int)==%zu\n", sizeof(int));
printf("sizeof('a' + (char) 5)==%zu\n", sizeof('a' + (char) 5));
return 0;
}
This gives me:
sizeof(char)==1
sizeof(int)==4
sizeof('a' + (char) 5)==4
which at least proves that 'a' + (char) 5 is not of type char.

It's promoted to an int, and there's nothing to tell the compiler it should use anything else. You can convert back to a char like this:
foo((char)('a' + 5));
This tells the compiler to treat the result of the calculation as a char; otherwise it leaves it as an int.

Section 6.5.2.2/6
If the expression that denotes the called function has a type that
does not include a prototype, the integer promotions are performed on
each argument...
So the answer to your question depends on the function prototype. If the function is declared as
void foo(int x)
or
void foo()
then the function argument will be passed as an int.
OTOH, if the function is declared as
void foo( char x )
then the result of the expression will be implicitly converted to char.

In C (unlike C++), the character literal 'a' has type int (§6.4.4.4¶10: "An integer character constant has type int.")
Even if that were not the case, the C standard clearly states that prior to the evaluation of the operator +, "[i]f both operands have arithmetic type, the usual arithmetic conversions are performed on them." (C11, §6.5.6 ¶4) In this respect, C and C++ have identical semantics. (See [expr.add] §5.7¶1 of C++)

From the C++ Standard (C++ Working Draft N3797, 5.7 Additive operators)
1 The additive operators + and - group left-to-right. The usual
arithmetic conversions are performed for operands of arithmetic or
enumeration type.
and (5 Expressions)
10 Many binary operators that expect operands of arithmetic or
enumeration type cause conversions and yield result types in a similar
way. The purpose is to yield a common type, which is also the type of
the result. This pattern is called the usual arithmetic conversions,
which are defined as follows:
...
— Otherwise, the integral promotions (4.5) shall be performed on
both operands. Then the following rules shall be applied to the
promoted operands:
Thus the expression in the function call
foo('a' + (char) 5);
has type int.
To call the overloaded function with parameter of type char you have to write for example
foo( char( 'a' + 5 ) );
or
foo( ( char )( 'a' + 5 ) );
or you can use C++ casting like
foo( static_cast<char>( 'a' + 5 ) );
The above quotes from the C++ Standard also are valid for C Standard. The visible difference is that in C++ character literals have type char while in C they have type int.
So in C++ the output of the statement
std::cout << sizeof( 'a' ) << std::endl;
will be equal to 1.
While in C the output of the statement
printf( "%zu\n", sizeof( 'a' ) );
will be equal to sizeof( int ) that is usually equal to 4.

Related

Is it safe to implicitly convert a `uint8_t` (read from a socket) to a `char`?

I am confused by the C++ conversion rules regarding unsigned-to-signed and vice versa.
I'm reading data from a socket and saving it in a std::vector<uint8_t>. I then need to read a part of it
(assuming it is ASCII data) and save it in a std::string. This is what I'm doing:
for (std::vector<uint8_t>::const_iterator it = payload.begin() + start; it < payload.begin() + end; ++it) {
store_name.push_back(*it);
}
So as you can see, *it returns a uint8_t and passes it into the push_back member function of std::string, which takes a char - thus an implicit conversion occurs. char may in fact be either signed or unsigned. I'm not sure what happens if it is signed.
I cannot wrap (no pun intended) my head around what is happening here, and whether or not it is safe.
Does store_name.push_back(*it) change the bit-pattern of *it before storing it in the std::string?
What rules exactly govern this?
I've gone through many places online explaining type-conversion rules, but it still doesn't really stick with me. Explanations will be appreciated.
EDIT: As a different way to put it - in general, what happens when we cast unsigned to signed and vice versa?
unsigned char a = 50; // Inside the range of signed char
signed char b = (signed char) a;
Is the bit pattern in b required to be the same as the bit pattern in a? Or may the bit pattern change?
Also, what about the opposite direction:
a = (unsigned char) b;
Again - does a change to the bit pattern occur? Or is it guaranteed that the underlying bit pattern stays the same, no matter how many signed-unsigned conversion we do, as long as the value is in the correct range?
And does it matter if it's an explicit cast using (cstyle cast) or static_cast<>, or if it's an implicit cast by assignment?
From implicit conversions - Numeric Conversion/Integral conversions:
To unsigned
If the destination type is unsigned, the resulting value is the
smallest unsigned value equal to the source value modulo 2^n where n
is the number of bits used to represent the destination type. That is,
depending on whether the destination type is wider or narrower, signed
integers are sign-extended[footnote 1] or truncated and unsigned
integers are zero-extended or truncated respectively.
To signed
If the destination type is signed, the value does not change if the
source integer can be represented in the destination type. Otherwise
the result is implementation-defined (until C++20), or the unique value of
the destination type equal to the source value modulo 2^n where n is
the number of bits used to represent the destination type. (since
C++20). (Note that this is different from signed integer arithmetic
overflow, which is undefined).
So for values in range, the value does not change. For out-of-range values, my reading is that on a two's complement machine the bits do not change on conversion to unsigned (and, from C++20, on conversion to signed either), while conversion to signed is implementation-defined until C++20. (I am not sure why, but I assume most compilers leave the bits unchanged in that case too, even though they are allowed to do otherwise.)
Regarding cstyle-cast vs static-cast: cstyle-cast performs (link)
When the C-style cast expression is encountered, the compiler
attempts to interpret it as the following cast expressions, in this
order:
a) const_cast<new_type>(expression);
b) static_cast<new_type>(expression), with extensions: pointer or
reference to a derived class is additionally allowed to be cast to
pointer or reference to unambiguous base class (and vice versa) even
if the base class is inaccessible (that is, this cast ignores the
private inheritance specifier). Same applies to casting pointer to
member to pointer to member of unambiguous non-virtual base;
c) static_cast (with extensions) followed by const_cast;
d) reinterpret_cast<new_type>(expression);
e) reinterpret_cast followed by const_cast.
The first choice that satisfies the requirements of the respective cast operator is selected, even if it cannot be compiled.
So for signed<->unsigned conversions, a C-style cast should be the same as static_cast.
For implicit conversion (implicit conversions - Order of the conversions)
Implicit conversion sequence consists of the following, in this order:
zero or one standard conversion sequence;
zero or one user-defined conversion;
zero or one standard conversion sequence.
, where
A standard conversion sequence consists of the following, in this
order:
zero or one conversion from the following set: lvalue-to-rvalue
conversion, array-to-pointer conversion, and function-to-pointer
conversion;
zero or one numeric promotion or numeric conversion;
zero or one function pointer conversion (since C++17);
zero or one qualification adjustment.
and numeric conversion is yet again the conversion quoted on the top.
static_cast itself converts between types using a combination of implicit and user-defined conversions (link). So there should not be any difference between implicit or explicit.

Can I check built-in type at runtime?

For example,
If I write:
char c = CHAR_MAX;
c++;
Can I know if 'c++' results in int or char so I know for sure if its not an overflow?
I don't know what you mean by "check at runtime", but I can tell you for sure that c++ results in a prvalue of type char, and c is always char. c is never converted to int.
Per [expr.post.incr]/1:
The value of a postfix ++ expression is the value of its operand.
[ Note: The value obtained is a copy of the original value
— end note ] The operand shall be a modifiable lvalue. The
type of the operand shall be an arithmetic type other than cv
bool, or a pointer to a complete object type. The value of the
operand object is modified by adding 1 to it. The value computation
of the ++ expression is sequenced before the modification of the
operand object. With respect to an indeterminately-sequenced function
call, the operation of postfix ++ is a single evaluation. [ Note:
Therefore, a function call shall not intervene between the
lvalue-to-rvalue conversion and the side effect associated with any
single postfix ++ operator. — end note ] The result is a
prvalue. The type of the result is the cv-unqualified version of the
type of the operand. If the operand is a bit-field that cannot
represent the incremented value, the resulting value of the bit-field
is implementation-defined. See also [expr.add] and [expr.ass].
As mentioned by Nikos C. in comment, you should check if c == CHAR_MAX prior to incrementing. For more about checking for signed overflow, see Detecting signed overflow in C/C++.
Can I know if 'c++' results in int or char
As per standard quote in L.F.'s answer, you can know that it results in char.
so I know for sure if its not an overflow?
You can know for sure that it is an overflow. On systems where char is a signed type, the behaviour of the program will be undefined as far as I can tell.
Can I check built-in type at runtime?
You cannot check built-in types at runtime, but you can check them already at compiletime. For example:
static_assert(std::is_same_v<decltype(c++), char>);
When I say signed char c = CHAR_MAX + 1, then CHAR_MAX + 1 becomes an int result and then it is assigned to c, which is implementation-defined.
Indeed. Except on exotic systems where sizeof(signed char) == sizeof(int) in which case there is no promotion, and the arithmetic causes overflow which is undefined behaviour.
And only until C++20. Since C++20, signed initialisation with unrepresentable value is defined by the standard.
Can I ever make signed char overflow?
Yes. Using the increment operator. As far as I can tell, the standard says nothing about promotion within the increment operator. However, this may be open to interpretation.

What is the reason for the existent difference between C and C++ relative to the unary arithmetic operator +

In C the unary plus operator is called unary arithmetic operator and may not be applied to pointers (the C Standard, 6.5.3.3 Unary arithmetic operators).
1 The operand of the unary + or - operator shall have arithmetic
type; of the ~ operator, integer type; of the ! operator, scalar
type.
Thus this program will not compile
#include <stdio.h>
int main(void)
{
int a = 10;
int *pa = &a;
printf( "%d\n", *+pa );
return 0;
}
However in C++ the unary plus operator may be applied to pointers (the C++ Standard, 5.3.1 Unary operators)
7 The operand of the unary + operator shall have arithmetic, unscoped
enumeration, or pointer type and the result is the value of the
argument. Integral promotion is performed on integral or enumeration
operands. The type of the result is the type of the promoted operand.
And this program compiles successfully.
#include <iostream>
int main()
{
int a = 10;
int *pa = &a;
std::cout << *+pa << std::endl;
return 0;
}
What is the reason for maintaining this difference between C and C++?
The question arose when I was answering the question Why size of int pointer is different of size of int array?. I was going to show how to convert an array to a pointer in the sizeof operator.
At first I wanted to write
sizeof( +array )
However this expression is invalid in C. So I had to write
sizeof( array + 0 )
and I found that there is such a difference between C and C++.:)
Different languages may attach different semantics to the same syntax.
C and C++ are different languages with a common ancestor. C++ semantics look deceptively similar but are subtly different for some parts of the common syntax. Another curious case is this:
if (sizeof(char) == sizeof(int)) {
printf("Hello embedded world\n");
} else {
if (sizeof('a') == sizeof(char))
printf("This is C++ code\n");
if (sizeof('a') == sizeof(int))
printf("This is C code\n");
}
The reason for C++ to have extended the C syntax in the case of unary + might be to allow for some extended numeric types to be implemented as pointers, or simply for reasons of symmetry.
As Jaa-c mentions in a comment, +p is a computed expression whereas p is a reference to p. You provided another example where + can be used to force expression context. The question is why did the original authors of the C language disallow unary + on non numeric types? Maybe a side effect of the original implementation of pcc.
Note that in Javascript, the unary + operator can be applied to non number types and operates as a conversion to number.
In my considerations:
C++ is a type of Object-Oriented Language. So every data type can be treated as a "Class".
In C int is one of "the basic data type of C". But in C++ we can consider int as a Class. Thus, In C++ int pointer and int array belong to the different classes. In C a int pointer variable stored another int variable's address. int array's name instead of the first element's address of that int array. So in C they have kind of the same meaning.
As for the unary operator "+", I understand the C++ language like this: every class in C++ represents a set of things. Everything in the set has the same properties, and there are operations that can be performed on each thing. These operations are member functions of the class. Another feature of C++ is that users can overload an operator. Overloading means we can perform the same operation on different classes. For example: a man is eating a burger. We can overload the action "Eat" for a cat and a rat: a cat is eating a rat.
So as the C++ standard say:"The operand of the unary + operator shall have arithmetic, unscoped enumeration, or pointer type and the result is the value of the argument." That's just a overload for unary operator + in Class unscoped enumeration and pointer type. "And The Result Is The Value Of The Argument"-> I guess that's the point.

How does the compiler choose which value to give auto?

First off, I am aware that this is an extremely simple question. I'm just looking for a technical explanation as to why the compiler decides to make the following variable with an auto type specifier the type double over int:
int value1 = 5;
double value2 = 2.2;
auto value3 = value1 * value2;
I know that the compiler will derive the double type for value3 from the initialized value, but why exactly is that?
auto variable types are defined in terms of template type deduction. Like this:
template<typename T>
void f(T t);
f(value1 * value2); // will call f<double>()
The reason value1 * value2 gives double rather than int is because the arithmetic conversion rules allow turning int into double (the reverse is an implicit conversion also but not an arithmetic conversion). When you use operators on built-in types, "the usual arithmetic conversions are applied".
Here's the rule found in section 5 (Expressions) of the Standard:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
If either operand is of scoped enumeration type, no conversions are performed; if the other operand does not have the same type, the expression is ill-formed.
If either operand is of type long double, the other shall be converted to long double.
Otherwise, if either operand is double, the other shall be converted to double.
Otherwise, if either operand is float, the other shall be converted to float.
Otherwise, the integral promotions shall be performed on both operands.
Because when multiplying an int by a double you get a double.
C and C++ compilers always promote basic numeric types to the most general type included in an expression. So, any expression involving two int values yields an int, but if either of the operands is double, then the expression value will also be double.

Why does C/C++ automatically convert char/wchar_t/short/bool/enum types to int?

So, if I understood it well, integral promotion provides that: char, wchar_t, bool, enum, short types ALWAYS are converted to int (or unsigned int). Then, if there are different types in an expression, further conversion will be applied.
Am I understanding this well?
And if so, my question is: why is this good? Doesn't it make char/wchar_t/bool/enum/short unnecessary? I mean, for example:
char c1;
char c2;
c1 = c2;
As I described before, char ALWAYS is converted to int, so in this case after automatic converting this looks like this:
int c1;
int c2;
c1 = c2;
But I can't understand why is this good, if I know that char type will be enough for my needs.
Storage types are never automatically converted. You only get automatic integer promotion as soon as you start doing integer arithmetics (+, -, bitshifts, ...) on those variables.
char c1, c2; // stores them as char
char c3 = c1 + c2; // equivalent to
char c3 = (char)((int)c1 + (int)c2);
The conversions you're asking about are the usual arithmetic conversions and the integer promotions, defined in section 6.3.1.8 of the latest ISO C standard. They're applied to the operands of most binary operators ("binary" meaning that they take two operands, such as +, *, etc.). (The rules are similar for C++. In this answer, I'll just refer to the C standard.)
Briefly the usual arithmetic conversions are:
If either operand is long double, the other operand is converted to long double.
Otherwise, if either operand is double, the other operand is converted to double.
Otherwise, if either operand is float, the other operand is converted to float.
Otherwise, the integer promotions are performed on both operands, and then some other rules are applied to bring the two operands to a common type.
The integer promotions are defined in section 6.3.1.1 of the C standard. For a type narrower than int, if the type int can hold all the values of the type, then an expression of that type is converted to int; otherwise it's converted to unsigned int. (Note that this means that an expression of type unsigned short may be converted either to int or to unsigned int, depending on the relative ranges of the types.)
The integer promotions are also applied to function arguments when the declaration doesn't specify the type of the parameter. For example:
short s = 2;
printf("%d\n", s);
promotes the short value to int. This promotion does not occur for non-variadic functions.
The quick answer for why this is done is that the standard says so.
The underlying reason for all this complexity is to allow for the restricted set of arithmetic operations available on most CPUs. With this set of rules, all arithmetic operators (other than the shift operators, which are a special case) are only required to work on operands of the same type. There is no short + long addition operator; instead, the short operand is implicitly converted to long. And there are no arithmetic operators for types narrower than int; if you add two short values, both arguments are promoted to int, yielding an int result (which might then be converted back to short).
Some CPUs can perform arithmetic on narrow operands, but not all can do so. Without this uniform set of rules, either compilers would have to emulate narrow arithmetic on CPUs that don't support it directly, or the behavior of arithmetic expressions would vary depending on what operations the target CPU supports. The current rules are a good compromise between consistency across platforms and making good use of CPU operations.
if I understood it well, integral promotion provides that: char, wchar_t, bool, enum, short types ALWAYS converted to int (or unsigned int).
Your understanding is only partially correct: short types are indeed promoted to int, but only when you use them in expressions. The conversion is done immediately before the use. It is also "undone" when the result is stored back.
The way the values are stored remains consistent with the properties of the type, letting you control the way you use your memory for the variables that you store. For example,
struct Test {
char c1;
char c2;
};
will be a quarter of the size of
struct Test {
int c1;
int c2;
};
on systems with 32-bit ints.
The conversion is not performed when you store the value in a variable. It is done if you cast the value or explicitly perform an operation, such as an arithmetic operation, on it.
It really depends on your underlying microprocessor architecture. For example, if your processor is 32-bit, that is its native integer size. Using its native integer size in integer computations is better optimized.
Type conversion takes place when arithmetic operations, shift operations, unary operations are performed. See what standard says about it:
C11, 6.3.1.1 Boolean, characters, and integers:
If an int can represent all values of the original type (as restricted by the width, for a
bit-field), the value is converted to an int; otherwise, it is converted to an unsigned
int. These are called the integer promotions.58) All other types are unchanged by the
integer promotions.
58) The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators,[1] as specified by their respective subclauses.
1. Emphasis is mine.