I was under the impression that comparison operators are not defined for C-style strings, which is why we use things like strcmp(). Therefore the following code would be illegal in C and C++:
if("foo" == "foo"){
printf("The C-style comparison worked.\n");
}
if("foo" == "bob"){
printf("The C-style comparison produced the incorrect answer.\n");
} else {
printf("The C-style comparison worked, strings were not equal.\n");
}
But I tested it in both Code::Blocks (using GCC) and VS 2015, compiling both as C and as C++. Both compilers accepted the code and produced the correct output.
Is it legal to compare C-style strings? Or is it a non-standard compiler extension that allows this code to work?
If this is legal, then why do people use strcmp() in C?
The compiler is free to use string interning, i.e. to save memory by not duplicating identical data. In your case, the two "foo" literals that compare equal must be stored at the same memory location.
However, you should not take this as the rule. The strcmp function works under all circumstances, whereas whether your observation holds is unspecified: it can change with another compiler, compiler version, compilation flags, etc.
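A minimal sketch of the portable check, written in C++ (in C the same call is strcmp from <string.h>). The helper name same_text is hypothetical; it just wraps std::strcmp, which compares the characters up to the terminating null instead of the addresses:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: true when both null-terminated strings hold the same characters.
// Unlike ==, the result does not depend on where the compiler stored the literals.
bool same_text(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

// Usage: both checks hold with every conforming compiler.
void check_same_text() {
    assert(same_text("foo", "foo"));   // same characters
    assert(!same_text("foo", "bob"));  // different characters
}
```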
The code is legal in C. It just may not produce the result you expected.
The type of a string literal is char[N] in C and const char[N] in C++, where N is the number of characters in the literal plus one for the terminating null character.
"foo" has type char[4] in C and const char[4] in C++. Basically, it's an array. In most expressions, an array is converted into a pointer to its first element. So in the comparison if("foo" == "foo"), the string literals are converted into pointers. Hence the "address comparison".
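The type claims above can be checked at compile time. This is a small C++ sketch (the function name check_decay is hypothetical); note that decltype reports a reference because a string literal is an lvalue:

```cpp
#include <cassert>
#include <type_traits>

// Three characters plus the terminating '\0' give const char[4].
static_assert(std::is_same<decltype("foo"), const char (&)[4]>::value,
              "\"foo\" is an lvalue of type const char[4]");
static_assert(sizeof("foo") == 4, "sizeof counts the terminating null");

void check_decay() {
    const char* p = "foo";  // array-to-pointer conversion: p points at the 'f'
    assert(p[0] == 'f' && p[3] == '\0');
}
```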
In the comparison,
if("foo" == "foo"){
the addresses of the string literals are compared, which may or may not be equal.
It is equivalent to:
const char *p = "foo";
const char *q = "foo";
if ( p == q) {
...
}
The C standard doesn't guarantee that two string literals with the same content (the two "foo"s here) are placed at the same location. In practice, many compilers merge identical literals, so the comparison appears to work. But you can't rely on this behaviour.
6.4.5, String literals (C11, draft)
It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.
Similarly, this comparison
if("foo" == "bob"){
...
}
is equivalent to:
const char *x = "foo";
const char *y = "bob";
if (x == y) {
...
}
In this case, the string literals are necessarily at different locations (their contents differ), so the pointer comparison yields false. So in both cases it looks as if the == operator actually works for comparing C strings.
If you compare arrays instead, it will not work:
char s1[] ="foo";
char s2[] = "foo";
if (s1 == s2) {
/* always false */
}
The difference is that when an array is initialized with a string literal, the literal is copied into the array. The arrays s1 and s2 have distinct addresses and will never compare equal. In the case of string literals, both p and q point to the same address (assuming the compiler places them that way; as noted above, this is not guaranteed).
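A short C++ sketch of the array case (check_arrays is a hypothetical name). The decay to pointers is spelled out because comparing two arrays directly with == is deprecated since C++20, although it performs the same address comparison:

```cpp
#include <cassert>
#include <cstring>

void check_arrays() {
    char s1[] = "foo";  // s1 holds its own copy of the literal
    char s2[] = "foo";  // s2 holds a second, separate copy

    // Two distinct objects that are alive at the same time never share an address.
    const char* a = s1;
    const char* b = s2;
    assert(a != b);

    // The contents, however, are identical.
    assert(std::strcmp(s1, s2) == 0);
}
```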
It compares the addresses of the strings, not their contents.
Comparing addresses is a valid operation.
I've been having really freaky stuff happening in my code. I believe I have tracked it down to the part labeled "here" (code is simplified, of course):
std::string func() {
char c;
// Do stuff that will assign to c
return "" + c; // Here
}
All sorts of stuff will happen when I try to cout the result of this function. I think I've even managed to get pieces of underlying C++ documentation, and many a segmentation fault. It's clear to me that this doesn't work in C++ (I've resorted to using stringstream to do conversions to string now), but I would like to know why. After using lots of C# for quite a while and no C++, this has caused me a lot of pain.
"" is a string literal. Those have the type array of N const char. This particular string literal is an array of 1 const char, the one element being the null terminator.
Arrays easily decay into pointers to their first element, e.g. in expressions where a pointer is required.
lhs + rhs is not defined for arrays as lhs and integers as rhs. But it is defined for pointers as the lhs and integers as the rhs, with the usual pointer arithmetic.
char is an integral data type in (i.e., treated as an integer by) the C++ core language.
==> string literal + character therefore is interpreted as pointer + integer.
The expression "" + c is roughly equivalent to:
static char const lit[1] = {'\0'};
char const* p = &lit[0];
p + c // "" + c is roughly equivalent to this expression
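To see the pointer + integer interpretation with an offset that stays inside the array, here is a C++ sketch using a longer literal. The helper suffix_from is hypothetical; it mirrors the p + c form above and shows that a char operand is just an integer offset:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: skips `offset` characters of "abc", exactly like p + c above.
// Offsets 0..3 point at an element of "abc" (a const char[4] whose last element is '\0'),
// so std::string receives a valid null-terminated string. Other offsets would be UB.
std::string suffix_from(char offset) {
    assert(offset >= 0 && offset <= 3);
    const char* p = "abc";
    return std::string(p + offset);  // pointer arithmetic, not concatenation
}
```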
You return a std::string. The expression "" + c yields a pointer to const char. The constructor of std::string that expects a const char* expects it to be a pointer to a null-terminated character array.
If c != 0, then the expression "" + c leads to Undefined Behaviour:
For c > 1, the pointer arithmetic produces Undefined Behaviour. Pointer arithmetic is only defined on arrays, and only if the result points to an element of the same array or one past its last element.
If char is signed, then c < 0 produces Undefined Behaviour for the same reason.
For c == 1, the pointer arithmetic does not produce Undefined Behaviour. That's a special case; pointing to one element past the last element of an array is allowed (it is not allowed to use what it points to, though). It still leads to Undefined Behaviour since the std::string constructor called here requires its argument to be a pointer to a valid array (and a null-terminated string). The one-past-the-last element is not part of the array itself. Violating this requirement also leads to UB.
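The c == 1 special case can be sketched as follows, with lit standing in for the literal "" as in the equivalence above (check_one_past_end is a hypothetical name). Forming and comparing the one-past-the-end pointer is fine; reading through it is not:

```cpp
#include <cassert>
#include <iterator>

void check_one_past_end() {
    static const char lit[1] = {'\0'};  // stands in for the literal ""
    const char* p = lit;

    const char* end = p + 1;       // one past the last element: forming it is allowed
    assert(end == std::end(lit));  // comparing it is allowed too
    // *end would be undefined behaviour, and so would std::string(end),
    // because the constructor reads characters starting at *end.
    // p + 2 is already undefined behaviour, before any read happens.
}
```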
What probably happens now is that the constructor of std::string tries to determine the size of the null-terminated string you passed it, by searching for the first character in the array that is equal to '\0':
string(char const* p)
{
// simplified
char const* end = p;
while(*end != '\0') ++end;
//...
}
This will either produce an access violation, or the string it creates will contain "garbage".
It is also possible that the compiler assumes this Undefined Behaviour will never happen, and does some funny optimizations that will result in weird behaviour.
By the way, clang++ 3.5 emits a nice warning for this snippet:
warning: adding 'char' to a string does not append to the string
[-Wstring-plus-int]
return "" + c; // Here
~~~^~~
note: use array indexing to silence this warning
There are a lot of explanations of how the compiler interprets this code, but what you probably wanted to know is what you did wrong.
You appear to be expecting the + behavior from std::string. The problem is that neither of the operands actually is a std::string. C++ looks at the types of the operands, not the final type of the expression (here the return type, std::string) to resolve overloading. It won't pick std::string's version of + if it doesn't see a std::string.
If you have special behavior for an operator (either you wrote it, or got a library that provides it), that behavior only applies when at least one of the operands has class type (or reference to class type, and user-defined enumerations count too).
If you wrote
std::string("") + c
or
std::string() + c
or
""s + c // requires C++14
then you would get the std::string behavior of operator +.
(Note that none of these are actually good solutions, because they all make short-lived std::string instances that can be avoided with std::string(1, c))
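A C++14 sketch comparing these spellings; the four function names are hypothetical. The ""s suffix additionally needs using namespace std::string_literals (or std::literals) to be in scope, and std::string(1, c) uses the (count, character) constructor:

```cpp
#include <cassert>
#include <string>

using namespace std::string_literals;  // makes the ""s suffix available (C++14)

// Each of the first three has a std::string operand, so std::string's operator+ is chosen.
std::string via_temporary(char c) { return std::string("") + c; }
std::string via_empty(char c)     { return std::string() + c; }
std::string via_literal(char c)   { return ""s + c; }
// Builds the one-character string directly: count = 1, fill character = c.
std::string direct(char c)        { return std::string(1, c); }

void check_all() {
    assert(via_temporary('x') == "x");
    assert(via_empty('x') == "x");
    assert(via_literal('x') == "x");
    assert(direct('x') == "x");
}
```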
The same thing goes for functions. Here's an example:
std::complex<double> ipi = std::log(-1.0);
You won't get the expected imaginary number. The compiler has no clue that it should use the complex logarithm here: overloading looks only at the arguments, and the argument is a real number (type double, actually). So the real logarithm is called, which reports a domain error (returning NaN on IEEE 754 platforms).
Operator overloads ARE functions and obey the same rules.
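A sketch of the fix, assuming C++11 or later and IEEE 754 doubles (for the NaN check): wrapping the argument in std::complex<double> makes overload resolution pick the complex std::log. The function names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// The argument's type, not the result's, selects the overload.
std::complex<double> log_of_minus_one() {
    return std::log(std::complex<double>(-1.0, 0.0));
}

void check_log() {
    const double pi = std::acos(-1.0);
    std::complex<double> z = log_of_minus_one();
    assert(std::abs(z.real()) < 1e-12);       // log|-1| = 0
    assert(std::abs(z.imag() - pi) < 1e-12);  // principal argument of -1 is pi
    // The double overload is the real logarithm: NaN for a negative argument.
    assert(std::isnan(std::log(-1.0)));
}
```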
This return statement
return "" + c;
is valid. It uses so-called pointer arithmetic: the string literal "" is converted to a pointer to its first character (in this case its terminating zero), and the integer value stored in c is added to the pointer.
So the result of expression
"" + c
has type const char *
Class std::string has a converting constructor that accepts an argument of type const char *. The problem is that this pointer can point beyond the string literal, so the function has undefined behaviour.
There is no point in using this expression. If you want to build a string from one character, you could write, for example,
return std::string( 1, c );
The difference between C++ and C# is that in C#, string literals have type System.String, which overloads operator + for strings and characters (which are Unicode characters in C#). In C++, string literals are constant character arrays, and operator + for an array and an integer means something different: the array is converted to a pointer to its first element, and pointer arithmetic is used.
It is the standard class std::string that overloads operator + for characters. String literals in C++ are not objects of type std::string.
Here is the code example (compiled and run in VS 2015):
#include<cassert>
using namespace std;
int main() {
const char*p = "ohoh";
const char*p1 = "ohoh";
char p3[] = "ohoh";
char p4[] = "ohoh";
assert(p == p1);//OK,success,is this always true?
assert(p3 == p4);//failed
return 0;
}
As far as I know, string literals are stored in the read-only segment of the address space, and const char*p = "ohoh"; just generates a pointer to that location. However, it seems that the compiler generates only one copy of that string literal, so p == p1 is true.
Is it an optimization, or something guaranteed by the standard?
No, it is not guaranteed by the standard. According to cppreference:
The compiler is allowed, but not required, to combine storage for equal or overlapping string literals. That means that identical string literals may or may not compare equal when compared by pointer.
The behavior is unspecified, you can't rely on it. From the standard, [lex.string]/16
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified.
For p3 and p4, they're different things. Note that p and p1 are pointers (to string literal) but p3 and p4 are arrays initialized from string literals.
String literals can be used to initialize character arrays. If an array is initialized like char str[] = "foo";, str will contain a copy of the string "foo".
That means p3 and p4 are independent arrays. When they decay to pointers, the pointers differ (because they point to different arrays), so p3 == p4 is false.
It is unspecified whether equal string literals are stored by the compiler as one string literal. So this comparison
p == p1
can yield either true or false, depending on the compiler and its options.
Arrays, for their part, have no built-in operator that compares their contents: p3 == p4 only compares the addresses of their first elements.
Instead of
assert(p == p1);
assert(p3 == p4);
you could write
assert( strcmp( p, p1 ) == 0 );
assert( strcmp( p3, p4 ) == 0 );
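The strcmp calls need <cstring> (or <string.h> in C). In C++17 and later, another option is std::string_view, whose == compares characters rather than addresses. A sketch under that C++17 assumption; same_contents and check_ohoh are hypothetical names:

```cpp
#include <cassert>
#include <string_view>

// Compares the characters of two null-terminated strings (requires C++17).
bool same_contents(std::string_view a, std::string_view b) {
    return a == b;
}

void check_ohoh() {
    const char* p = "ohoh";
    const char* p1 = "ohoh";
    char p3[] = "ohoh";
    char p4[] = "ohoh";
    assert(same_contents(p, p1));   // holds whether or not the literals were merged
    assert(same_contents(p3, p4));  // holds even though p3 and p4 are distinct arrays
}
```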
String literals may share storage, and may be in read-only memory.
Neither is guaranteed though.
What is guaranteed is that two different arrays won't share space unless their lifetimes do not overlap. In the latter case there's no conforming way to prove it anyway, so who cares?
So from my understanding pointer variables point to an address. So, how is the following code valid in C++?
char* b= "abcd"; //valid
int *c= 1; //invalid
The first line
char* b= "abcd";
is valid in C because a string literal, when used as an initializer, decays to the address of its first element, which is a pointer (to char).
Related, C11, chapter §6.4.5, string literals,
[...] The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence. [...]
and then, chapter §6.3.2.1 (emphasis mine)
Except when it is the operand of the sizeof operator, the _Alignof operator, or the
unary & operator, or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points
to the initial element of the array object and is not an lvalue.
However, as mentioned in the comments, this is not valid in C++11 onwards: in C++, string literals have type const char[N], and C++11 removed the deprecated conversion from a string literal to char*, so the left-hand side needs the const qualifier.
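A sketch of the two declarations that remain valid in C++11 and later (the names read_only, writable and check_mutable_copy are hypothetical): point at the literal through a const char*, or copy it into an array if you need to modify it:

```cpp
#include <cassert>

const char* read_only = "abcd";  // valid in every C++ version: points at the literal

void check_mutable_copy() {
    char writable[] = "abcd";  // copies the literal into a modifiable char[5]
    writable[0] = 'A';         // modifies the copy, never the literal itself
    assert(writable[0] == 'A' && read_only[0] == 'a');
}
```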
OTOH,
int *c= 1;
is invalid (illegal) because 1 is an integer constant of type int, which cannot be implicitly converted to int *.
In C and very old versions of C++, a string literal "abcd" is of type char[], a character array. Such an array can naturally get pointed at by a char*, but not by a int* since that's not a compatible type.
However, C and C++ are different, often incompatible programming languages. They dropped compatibility with each other some 20 years ago.
In standard C++, a string literal is of type const char[] and therefore none of your posted code is valid in C++. This won't compile:
char* b = "abcd"; //invalid, discards const qualifier
This will:
const char* c = "abcd"; // valid
"abcd" is actually of type const char[5], and the language permits it to be assigned to a const char* (and, regrettably, to a char* before C++11; C++11 onwards disallows that).
int *c = 1; is not allowed by the C++ or C standards, since you can't implicitly convert an int to an int* (with the exception of 0, and in that case your intent is expressed more clearly by assigning nullptr instead).
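For comparison, a C++ sketch of what an int* can legitimately be initialized with (check_int_pointers is a hypothetical name): the address of an int object, or a null pointer:

```cpp
#include <cassert>

void check_int_pointers() {
    int value = 1;
    int* c = &value;  // an int* points at an int object...
    assert(*c == 1);

    int* none = nullptr;  // ...or is null; nullptr states that intent directly
    assert(none == nullptr);
}
```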
"abcd" is the address of a sequence of five bytes, 97 98 99 100 0 in ASCII. You cannot see what the address is in the source code, but the compiler will still assign it an address.
1 is also an address, near the bottom of your [virtual] memory. This may not seem useful to you, but it is useful to other people, so even though the standard does not allow the implicit conversion, compilers let you express it with an explicit cast, e.g. (int *)1.
While all the other answers correctly explain why your code doesn't work, using a compound literal to initialize c is one way to make your code work, e.g.
int *c= (int[]){ 1 };
printf ("int pointer c : %d\n", *c);
Note that compound literals are a C feature (since C99); standard C++ does not have them, although some C++ compilers accept them as an extension.