Unary plus (+) against literal string - c++

Today I wrote an expression:
"<" + message_id + "#" + + ">"
                         ^
                         |
                         \____ see that extra '+' here!
and was surprised that it actually compiled. (PS: message_id is a QString; it would also work with a std::string.)
I often do things like that, leave out a variable as I'm working and I expect the compiler to tell me where I'm still missing entries. The final would look something like this:
"<" + message_id + "#" + network_domain + ">"
Now I'd like to know why the + unary operator is valid against a string literal!?

Unary + can be applied to arithmetic type values, unscoped enumeration values and pointer values because ...
the C++ standard defines it that way, in C++11 §5.3.1/7.
In this case the string literal, which is of type array of char const, decays to pointer to char const.
It's always a good idea to look at the documentation when one wonders about the functionality of something.
“The operand of the unary + operator shall have arithmetic, unscoped enumeration, or pointer type and the
result is the value of the argument. Integral promotion is performed on integral or enumeration operands.
The type of the result is the type of the promoted operand.”

Related

What makes this string addition starting with a '+' a valid statement?

I had a copy/paste error in my code and ended up with a line that looked like:
myString = otherString; + "?" + anotherString;
The code after the first semicolon wasn't issuing any errors or warnings. Using an online compiler to double check my environment, I created this quick example that also compiles and runs:
#include <iostream>
#include <string>

int main()
{
    std::string sText("Hello World");
    std::string sMore(" again");
    + "???" + sText + sMore; //No warning, no error
    std::cout << sText; //output "Hello World" as expected
    + 4; //Warning has no effect
    + sMore; //error: no match for ‘operator+’ (operand type is ‘std::string {aka std::basic_string<char>}’)
    return 0;
}
So what is the beginning + doing?
String literals (e.g. "???") are really arrays of characters, and like all other arrays they decay to a pointer to their first element. That is what happens here: the expression + "???" applies the unary + operator to the pointer to the first element of the string.
This results in another pointer (to a character) that is equal to the first, and which can then be used to add to std::string objects.
The same thing happens for other literals, like numbers, which is why +4 is valid as well.
But there's no unary + operator defined for std::string which is why you get an error for +sMore.
First, a string literal is an array of characters. When you pass an array as the operand of unary operator +, the array is implicitly converted to a pointer to its first element (the elements are of type const char). This implicit conversion is called decaying.
The result of unary operator + is the operand after the conversion i.e. the pointer to the first element of the string literal in this case.
The following binary operator + invokes the overloaded operator that takes a pointer to a character as one operand, and a std::string object as the other.
For integers, operator + behaves the same, except instead of array-to-pointer decay, there is integral promotion. int is not promoted, but all types smaller than int are. For std::string, there is no overload for unary +, hence the error.
And I assume that there is no warning on that line because calling operator+ is "having an effect" even though the value isn't stored.
Lack of effect is only a reason to warn about if the result of the operation is discarded. In the string case, the result is used as an operand of the binary operator, so there is no reason to warn about lack of effect.
Now, the result of the binary operation is discarded and has no effect either, but it is practically impossible for the compiler to analyse all possible code paths for "effects", and it doesn't attempt to do so. The compiler is kind enough to check for primitive operations on pointers, but it probably won't bother analysing function calls (operator overloads for classes are functions).

C11 and constant expression evaluation in switch-case labels

Following this question: Why doesn't gcc allow a const int as a case expression?, which is basically the same as What promoted types are used for switch-case expression comparison? or Is there any way to use a constant array with constant index as switch case label in C?.
From the first link, I tried to replace :
case FOO: // aka 'const int FOO = 10'
with :
case ((int) "toto"[0]): // can't be anything *but* constant
Which gives :
https://ideone.com/n1bmIb -> https://ideone.com/4aOSXR = works in C++
https://ideone.com/n1bmIb -> https://ideone.com/RrnO2R = fails in C
I don't quite understand since the "toto" string can't be anything but a constant one, it isn't even a variable, it lies in the void of the compiler memory. I'm not even playing with the 'const' fuzzy logic of the C language (that really stands for "read-only, not constant, what did you expect?"), the problem is either "array access" or "pointer referencing" into a constant expression that do not evaluate in C, but do quite well in C++.
I expected to use this "trick" to use a HASH_MACRO(str) to generate unique case labels values from a key identifier, leaving eventually the compiler to raise an error in case of collision because of similar label values found.
OK, ok, I was told these restrictions were made to simplify language tooling (preproc, compiler, linker) and C ain't no LISP, but you can have full featured LISP interpreter/compilers for a fraction of the size of a C equivalent, so that's no excuse.
Question is : is there an "extension" to C11 that just allows this "toto" thingy to work in GCC, CLANG and... MSVC ? I don't want to go the C++ path (typedef's forward declarations don't work anymore) and because embedded stuff (hence the compile-time hash computation for space-time distortion).
Is there an intermediary "C+" language that is more 'permissive' and 'understand' embedded a little better, like -Praise the Lords- "enums as bitfield members", among nice other things we cannot have (because of out-of-reality standards evolving like snails under a desert sun) ?
#provemewrong, #changemymind, #norustplease
It doesn't matter whether or not it could be known to the compiler at compile time. The case label needs to have a value that is an integer constant expression (C11 6.8.4.2p3).
The expression of each case label shall be an integer constant expression and no two of the case constant expressions in the same switch statement shall have the same value after conversion. There may be at most one default label in a switch statement. (Any enclosed switch statement may have a default label or case constant expressions with values that duplicate case constant expressions in the enclosing switch statement.)
And the definition of an integer constant expression is in C11 6.6p6:
An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof or _Alignof operator.
Since "toto" is none of integer constants, enumeration constants, character constants, constant sizeof, _Alignof expressions or floating point constant cast to an integer; and that list was specified in the constraints section of the standard, the compiler must not pass this silently. (Even a conforming compiler may still successfully compile the program, but it must diagnose this as a constraint violation.)
What you can use is a chained ? : to resolve the index to a character constant, i.e.
x == 0 ? 't'
: x == 1 ? 'o'
: x == 2 ? 't'
: x == 3 ? 'o'
: '\0'
This can be written into a macro.
"toto"[0] is not an integer constant expression as C defines the term:
6.6 Constant expressions
...
6 An integer constant expression[117] shall have integer type and shall only have operands
that are integer constants, enumeration constants, character constants, sizeof
expressions whose results are integer constants, _Alignof expressions, and floating
constants that are the immediate operands of casts. Cast operators in an integer constant
expression shall only convert arithmetic types to integer types, except as part of an
operand to the sizeof or _Alignof operator.
[117] An integer constant expression is required in a number of contexts such as the size of a bit-field
member of a structure, the value of an enumeration constant, and the size of a non-variable length
array. Further constraints that apply to the integer constant expressions used in conditional-inclusion
preprocessing directives are discussed in 6.10.1.
C 2011 online draft
The issue that you're running into is that, in C, "toto" is an array of chars. Sure, it's constant in memory, but it's still just an array. The [] operator indexes in an array (from a pointer). If you wanted, you would be able to edit a compiled binary and change the string "toto" to something else. In a sense, it is not compile-time known. It's equivalent to doing:
char * const ___string1 = "toto";
...
case ((int) ___string1[0]):
(This is a little forced and redundant, but it's just for demonstration)
Note that the type of the elements of a string literal is char, not const char.
The case label, however, must be a constant, as it is built into the compiled program's control flow.

What is the reason for the existent difference between C and C++ relative to the unary arithmetic operator +

In C the unary plus operator is called unary arithmetic operator and may not be applied to pointers (the C Standard, 6.5.3.3 Unary arithmetic operators).
1 The operand of the unary + or - operator shall have arithmetic
type; of the ~ operator, integer type; of the ! operator, scalar
type.
Thus this program will not compile
#include <stdio.h>
int main(void)
{
    int a = 10;
    int *pa = &a;
    printf( "%d\n", *+pa );
    return 0;
}
However in C++ the unary plus operator may be applied to pointers (the C++ Standard, 5.3.1 Unary operators)
7 The operand of the unary + operator shall have arithmetic, unscoped
enumeration, or pointer type and the result is the value of the
argument. Integral promotion is performed on integral or enumeration
operands. The type of the result is the type of the promoted operand.
And this program compiles successfully.
#include <iostream>
int main()
{
    int a = 10;
    int *pa = &a;
    std::cout << *+pa << std::endl;
    return 0;
}
What is the reason for maintaining this difference between C and C++?
The question arose when I was answering the question Why size of int pointer is different of size of int array?. I was going to show how to convert an array to a pointer in the sizeof operator.
At first I wanted to write
sizeof( +array )
However this expression is invalid in C. So I had to write
sizeof( array + 0 )
and that is how I found out about this difference between C and C++. :)
Different languages may attach different semantics to the same syntax.
C and C++ are different languages with a common ancestor. C++ semantics look deceptively similar but are subtly different for some parts of the common syntax. Another curious case is this:
if (sizeof(char) == sizeof(int)) {
    printf("Hello embedded world\n");
} else {
    if (sizeof('a') == sizeof(char))
        printf("This is C++ code\n");
    if (sizeof('a') == sizeof(int))
        printf("This is C code\n");
}
The reason for C++ to have extended the C syntax in the case of unary + might be to allow for some extended numeric types to be implemented as pointers, or simply for reasons of symmetry.
As Jaa-c mentions in a comment, +p is a computed expression whereas p is a reference to p. You provided another example where + can be used to force expression context. The question is why did the original authors of the C language disallow unary + on non numeric types? Maybe a side effect of the original implementation of pcc.
Note that in JavaScript, the unary + operator can be applied to non-number types and operates as a conversion to number.
My view:
C++ is an object-oriented language, so every data type can be treated as a "class".
In C, int is one of the basic data types, but in C++ we can think of int as a class. Thus, in C++ an int pointer and an int array belong to different classes. In C, an int pointer variable stores the address of another int variable, and the name of an int array stands for the address of its first element, so in C they mean roughly the same thing.
As for the unary operator "+", I understand C++ like this: every class represents a set of things that share the same properties, and certain operations can be performed on each of them. These operations are member functions of the class. Another feature of C++ is that users can overload an operator, which means the same operation can be applied to different classes. For example, "a man is eating a burger": we can overload the action "eat" for cats and rats, giving "a cat is eating a rat".
So when the C++ standard says "The operand of the unary + operator shall have arithmetic, unscoped enumeration, or pointer type and the result is the value of the argument", that is just an overload of unary operator + for the unscoped enumeration and pointer types. "And the result is the value of the argument": I guess that's the point.

How does C treat char sums?

When I'm in C++, and I call an overloaded function foo, like so:
foo('e' - (char) 5)
it can output "output is a char" or "output is an int" depending on the type of the result. I get "output is an int" from my program, like this:
#include <iostream>
void foo(char x)
{
    std::cout << "output is a char" << std::endl;
}
void foo(int x)
{
    std::cout << "output is an int" << std::endl;
}
int main()
{
    foo('a' + (char) 5);
}
My instructor says that in C, the expression above, ('a' + (char) 5), evaluates as a char. I see in the C99 standard that chars are promoted to ints to find the sum, but does C recast them back to chars when it's done? I can't find any references that seem credible saying one way or another what C actually does after the promotion is completed, and the sum is found.
Is the sum left as an int, or given as a char? How can I prove this in C, or is there a reference I'm not understanding/finding?
From the C Standard, 6.3.1.8 Usual arithmetic conversions, emphasis mine:
Many operators that expect operands of arithmetic type cause conversions and yield result
types in a similar way. The purpose is to determine a common real type for the operands
and result. For the specified operands, each operand is converted, without change of type
domain, to a type whose corresponding real type is the common real type. Unless
explicitly stated otherwise, the common real type is also the corresponding real type of
the result, whose type domain is the type domain of the operands if they are the same,
and complex otherwise. This pattern is called the usual arithmetic conversions:
First, if the corresponding real type of either operand is long double...
Otherwise, if the corresponding real type of either operand is double...
Otherwise, if the corresponding real type of either operand is float...
Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
If both operands have the same type, then no further conversion is needed.
So you are exactly correct. The type of the expression 'a' + (char) 5 is int. There is no recasting back to char, unless explicitly asked for by the user. Note that 'a' here has type int, so it's only the (char)5 that needs to be promoted. This is stipulated in 6.4.4.4 Character Constants:
An integer character constant is a sequence of one or more multibyte characters enclosed
in single-quotes, as in 'x'.
...
An integer character constant has type int.
There is an example demonstrating the explicit recasting to char:
In executing the fragment
char c1, c2;
/* ... */
c1 = c1 + c2;
the ‘‘integer promotions’’ require that the abstract machine promote the value of each variable to int size and then add the two ints and truncate the sum. Provided the addition of two chars can be done without overflow, or with overflow wrapping silently to produce the correct result, the actual execution need only produce the same result, possibly omitting the promotions.
The truncation here only happens because we assign back to a char.
No, C does not recast them back to chars.
The standard (ISO/IEC 9899:1999) says (6.3.1.8 Usual arithmetic conversions):
Many operators that expect operands of arithmetic type cause conversions and yield result
types in a similar way. The purpose is to determine a common real type for the operands
and result. For the specified operands, each operand is converted, without change of type
domain, to a type whose corresponding real type is the common real type. Unless
explicitly stated otherwise, the common real type is also the corresponding real type of
the result, whose type domain is determined by the operator.
Your instructor seems to be wrong. In addition to what you found in the standard, namely that the arithmetic promotes to int, we can use a simple test program to show the behavior (this is not proof from the standard, of course, but it is the same level of proof as your C++ test):
#include <stdio.h>
int main () {
    /* deliberately wrong format specifier, so the compiler reports the argument type */
    printf("%g",'c' - (char)5);
}
produces
Warning: format specifies type 'double' but argument has type 'int'
with gcc and clang.
You can't determine the type of an expression as easily in C, but you can easily determine the size of an expression:
#include <stdio.h>
int main(void) {
    printf("sizeof(char)==1\n");
    printf("sizeof(int)==%zu\n", sizeof(int));
    printf("sizeof('a' + (char) 5)==%zu\n", sizeof('a' + (char) 5));
    return 0;
}
This gives me:
sizeof(char)==1
sizeof(int)==4
sizeof('a' + (char) 5)==4
which at least proves that 'a' + (char) 5 is not of type char.
It's promoted to an int, and there's nothing to tell the compiler it should use anything else. You can convert back to a char like this:
foo((char)('a' + 5));
This tells the compiler to treat the result of the calculation as a char, otherwise it leaves it as an int.
Section 6.5.2.2/6
If the expression that denotes the called function has a type that
does not include a prototype, the integer promotions are performed on
each argument...
So the answer to your question depends on the function prototype. If the function is declared as
void foo(int x)
or
void foo()
then the function argument will be passed as an int.
OTOH, if the function is declared as
void foo( char x )
then the result of the expression will be implicitly converted to char.
In C (unlike C++), the character literal 'a' has type int (§6.4.4.4¶10: "An integer character constant has type int.")
Even if that were not the case, the C standard clearly states that prior to the evaluation of the operator +, "[i]f both operands have arithmetic type, the usual arithmetic conversions are performed on them." (C11, §6.5.6 ¶4) In this respect, C and C++ have identical semantics. (See [expr.add] §5.7¶1 of C++)
From the C++ Standard (C++ Working Draft N3797, 5.7 Additive operators)
1 The additive operators + and - group left-to-right. The usual
arithmetic conversions are performed for operands of arithmetic or
enumeration type.
and (5 Expressions)
10 Many binary operators that expect operands of arithmetic or
enumeration type cause conversions and yield result types in a similar
way. The purpose is to yield a common type, which is also the type of
the result. This pattern is called the usual arithmetic conversions,
which are defined as follows:
...
— Otherwise, the integral promotions (4.5) shall be performed on
both operands. Then the following rules shall be applied to the
promoted operands:
Thus the expression in the function call
foo('a' + (char) 5);
has type int.
To call the overloaded function with parameter of type char you have to write for example
foo( char( 'a' + 5 ) );
or
foo( ( char )( 'a' + 5 ) );
or you can use C++ casting like
foo( static_cast<char>( 'a' + 5 ) );
The above quotes from the C++ Standard are also valid for the C Standard. The visible difference is that in C++ character literals have type char while in C they have type int.
So in C++ the output of the statement
std::cout << sizeof( 'a' ) << std::endl;
will be equal to 1.
While in C the output of the statement
printf( "%zu\n", sizeof( 'a' ) );
will be equal to sizeof( int ) that is usually equal to 4.
