I had a copy/paste error in my code and ended up with a line that looked like:
myString = otherString; + "?" + anotherString;
The code after the first semicolon wasn't issuing any errors or warnings. Using an online compiler to double check my environment, I created this quick example that also compiles and runs:
#include <iostream>
#include <string>

int main()
{
    std::string sText("Hello World");
    std::string sMore(" again");
    + "???" + sText + sMore; //No warning, no error
    std::cout << sText; //output "Hello World" as expected
    + 4; //Warning has no effect
    + sMore; //error: no match for ‘operator+’ (operand type is ‘std::string {aka std::basic_string<char>}’)
    return 0;
}
So what is the beginning + doing?
String literals (e.g. "???") are really arrays of characters, and like all other arrays they decay to pointers to their first element. That is what happens here: the expression + "???" applies the unary + operator to the pointer to the first element of the string.
This results in another pointer (to a character) that is equal to the first, and which can then be used to add to std::string objects.
The same thing happens for other literals, like numbers, which is why +4 is valid as well.
But there's no unary + operator defined for std::string which is why you get an error for +sMore.
First, a string literal is an array of characters. When you pass an array as the operand of the unary operator +, the array is implicitly converted to a pointer to its first element (the elements are of type const char, so the pointer is a const char*). This implicit conversion is called decaying.
The result of unary operator + is the operand after the conversion i.e. the pointer to the first element of the string literal in this case.
The following binary operator + invokes the overloaded operator that takes a pointer to a character as one operand, and a std::string object as the other.
For integers, operator + behaves the same, except instead of array-to-pointer decay, there is integral promotion. int is not promoted, but all types smaller than int are. For std::string, there is no overload for unary +, hence the error.
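The types involved can be checked at compile time. The following is a minimal sketch, not part of the original question: `build` is a hypothetical helper that repeats the statement from the question and returns the result instead of discarding it.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Without the unary +, the literal is an lvalue of array type.
static_assert(std::is_same<decltype("???"), const char (&)[4]>::value,
              "a string literal is an lvalue of type const char[4]");

// Unary + forces array-to-pointer decay; the result is a const char*.
static_assert(std::is_same<decltype(+"???"), const char*>::value,
              "unary + on a string literal yields const char*");

// Unary + on a type narrower than int applies integral promotion.
static_assert(std::is_same<decltype(+'a'), int>::value,
              "unary + promotes char to int");

// Hypothetical helper mirroring the statement from the question.
std::string build(const std::string& sText, const std::string& sMore)
{
    // (+"???") is a const char*; operator+(const char*, const std::string&)
    // then produces a temporary std::string, which is returned here.
    return +"???" + sText + sMore;
}
```

Calling `build("Hello World", " again")` should give "???Hello World again", which is the value the original statement computed and threw away.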
And I assume that there is no warning on that line because calling operator+ is "having an effect" even though the value isn't stored.
Lack of effect is only a reason to warn about if the result of the operation is discarded. In the string case, the result is used as an operand of the binary operator, so there is no reason to warn about lack of effect.
Now, the result of the binary operation is discarded, and has no effect either, but it is practically impossible for the compiler to analyse all possible code paths for "effects", and it doesn't attempt to do so. The compiler is kind enough to check for primitive operations on pointers, but it probably won't bother analysing function calls (operator overloads for classes are functions).
Related
I've been having really freaky stuff happening in my code. I believe I have tracked it down to the part labeled "here" (code is simplified, of course):
std::string func() {
    char c;
    // Do stuff that will assign to c
    return "" + c; // Here
}
All sorts of stuff will happen when I try to cout the result of this function. I think I've even managed to get pieces of underlying C++ documentation, and many a segmentation fault. It's clear to me that this doesn't work in C++ (I've resorted to using stringstream to do conversions to string now), but I would like to know why. After using lots of C# for quite a while and no C++, this has caused me a lot of pain.
"" is a string literal. Those have the type array of N const char. This particular string literal is an array of 1 const char, the one element being the null terminator.
Arrays easily decay into pointers to their first element, e.g. in expressions where a pointer is required.
lhs + rhs is not defined for arrays as lhs and integers as rhs. But it is defined for pointers as the lhs and integers as the rhs, with the usual pointer arithmetic.
char is an integral data type in (i.e., treated as an integer by) the C++ core language.
==> string literal + character therefore is interpreted as pointer + integer.
The expression "" + c is roughly equivalent to:
static char const lit[1] = {'\0'};
char const* p = &lit[0];
p + c // "" + c is roughly equivalent to this expression
You return a std::string. The expression "" + c yields a pointer to const char. The constructor of std::string that expects a const char* expects it to be a pointer to a null-terminated character array.
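To see that this really is pointer arithmetic and not concatenation, here is a small sketch that stays inside the array, so the behaviour is defined. `tail_of_hello` is a hypothetical name, and the literal is first bound to a named pointer only to make the decay explicit; `"hello" + offset` means the same thing.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: skips the first `offset` characters of "hello".
// Only offsets 0..5 keep the pointer on a character of the literal
// (5 is the terminating '\0'), so anything else is rejected up front.
std::string tail_of_hello(int offset)
{
    assert(offset >= 0 && offset <= 5);
    const char* base = "hello";      // array-to-pointer decay
    const char* p = base + offset;   // pointer + integer, no concatenation
    return std::string(p);           // reads up to the terminating '\0'
}
```

With this, `tail_of_hello(2)` is "llo" and `tail_of_hello(5)` is the empty string. The literal "" from the question has only one element, so the only offset that keeps the pointer on a character of the array is 0.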
If c != 0, then the expression "" + c leads to Undefined Behaviour:
For c > 1, the pointer arithmetic produces Undefined Behaviour. Pointer arithmetic is only defined within an array: the result has to point to an element of the same array, or one past its last element.
If char is signed, then c < 0 produces Undefined Behaviour for the same reason.
For c == 1, the pointer arithmetic does not produce Undefined Behaviour. That's a special case; pointing to one element past the last element of an array is allowed (it is not allowed to use what it points to, though). It still leads to Undefined Behaviour since the std::string constructor called here requires its argument to be a pointer to a valid array (and a null-terminated string). The one-past-the-last element is not part of the array itself. Violating this requirement also leads to UB.
What probably now happens is that the constructor of std::string tries to determine the size of the null-terminated string you passed it, by searching the (first) character in the array that is equal to '\0':
string(char const* p)
{
    // simplified
    char const* end = p;
    while(*end != '\0') ++end;
    //...
}
This will either produce an access violation, or the string it creates contains "garbage".
It is also possible that the compiler assumes this Undefined Behaviour will never happen, and does some funny optimizations that will result in weird behaviour.
By the way, clang++3.5 emits a nice warning for this snippet:
warning: adding 'char' to a string does not append to the string
[-Wstring-plus-int]
return "" + c; // Here
~~~^~~
note: use array indexing to silence this warning
There are a lot of explanations of how the compiler interprets this code, but what you probably wanted to know is what you did wrong.
You appear to be expecting the + behavior from std::string. The problem is that neither of the operands actually is a std::string. C++ looks at the types of the operands, not the final type of the expression (here the return type, std::string) to resolve overloading. It won't pick std::string's version of + if it doesn't see a std::string.
If you have special behavior for an operator (either you wrote it, or got a library that provides it), that behavior only applies when at least one of the operands has class type (or reference to class type, and user-defined enumerations count too).
If you wrote
std::string("") + c
or
std::string() + c
or
""s + c // requires C++14 and using namespace std::string_literals
then you would get the std::string behavior of operator +.
(Note that none of these are actually good solutions, because they all make short-lived std::string instances that can be avoided with std::string(1, c))
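As a sketch of the alternatives above (the function names are made up for illustration):

```cpp
#include <cassert>
#include <string>

// Fill constructor: a string consisting of one copy of c.
// No pointer arithmetic is involved.
std::string from_char(char c)
{
    return std::string(1, c);
}

// Also works: the left operand is a std::string, so overload
// resolution finds std::string's operator+ for a char.
std::string from_char_concat(char c)
{
    return std::string() + c;
}
```

Both return a one-character string for any value of c, including '\0'.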
The same thing goes for functions. Here's an example:
std::complex<double> ipi = std::log(-1.0);
You'll get a domain error (typically a NaN result), instead of the expected imaginary number. That's because the compiler has no clue that it should be using the complex logarithm here. Overloading looks only at the arguments, and the argument is a real number (type double, actually).
Operator overloads ARE functions and obey the same rules.
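A sketch of the difference, assuming an IEEE 754 implementation (where the real logarithm of a negative number is NaN); the function names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// The argument is a double, so overload resolution selects std::log(double).
// The type of the variable the result is later stored in plays no role.
double real_log_of_minus_one()
{
    return std::log(-1.0);
}

// The argument is a std::complex<double>, so the complex overload is
// selected. The principal value of log(-1) is i*pi.
std::complex<double> complex_log_of_minus_one()
{
    return std::log(std::complex<double>(-1.0, 0.0));
}
```

The fix is the same as for the string case: make the argument have the type whose overload you want.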
This return statement
return "" + c;
is valid. It uses so-called pointer arithmetic. The string literal "" is converted to a pointer to its first character (in this case its terminating zero), and the integer value stored in c is added to the pointer.
So the result of expression
"" + c
has type const char *
Class std::string has a converting constructor that accepts an argument of type const char *. The problem is that this pointer can point beyond the string literal, so the function has undefined behaviour.
I do not see any sense in using this expression. If you want to build a string based on one character you could write for example
return std::string( 1, c );
The difference between C++ and C# is that in C# string literals have type System.String, which has an overloaded operator + for strings and characters (which are Unicode characters in C#). In C++ string literals are constant character arrays, and the semantics of operator + for arrays and integers are different: arrays are converted to pointers to their first elements, and pointer arithmetic is used.
It is the standard class std::string that has an overloaded operator + for characters. String literals in C++ are not objects of this class, that is, they are not of type std::string.
Today I wrote an expression:
"<" + message_id + "#" + + ">"
                         ^
                         |
                         \____ see that extra '+' here!
and got surprised that it actually compiled. (PS message_id is a QString, it would also work with an std::string)
I often do things like that, leave out a variable as I'm working and I expect the compiler to tell me where I'm still missing entries. The final would look something like this:
"<" + message_id + "#" + network_domain + ">"
Now I'd like to know why the + unary operator is valid against a string literal!?
Unary + can be applied to arithmetic type values, unscoped enumeration values and pointer values because ...
the C++ standard defines it that way, in C++11 §5.3.1/7.
In this case the string literal, which is of type array of char const, decays to pointer to char const.
It's always a good idea to look at the documentation when one wonders about the functionality of something.
“The operand of the unary + operator shall have arithmetic, unscoped enumeration, or pointer type and the
result is the value of the argument. Integral promotion is performed on integral or enumeration operands.
The type of the result is the type of the promoted operand.”
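The three operand categories in that paragraph can be checked with a few static assertions. This is only a sketch: `Color` is a made-up enumeration, and `wrap` uses std::string in place of QString, which the question says behaves the same way here.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

enum Color { Red, Green };  // hypothetical unscoped enumeration

// Arithmetic operand: short is promoted to int.
static_assert(std::is_same<decltype(+static_cast<short>(1)), int>::value, "");
// Unscoped enumeration operand: promoted to an integer type (int here).
static_assert(std::is_same<decltype(+Green), int>::value, "");
// Pointer operand: the type is unchanged.
static_assert(std::is_same<decltype(+static_cast<const char*>(nullptr)),
                           const char*>::value, "");

// The expression from the question; the second + of "+ +" is unary.
std::string wrap(const std::string& message_id)
{
    return "<" + message_id + "#" + + ">";  // parsed as ... + (+">")
}
```

So `wrap("abc")` is simply "<abc#>": the stray + changes nothing, which is why the compiler has no reason to reject it.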