While reading C++ Primer, I came across this line:
const char *cp = "Hello World";
To my understanding, "Hello World" is a string literal, which is an array of constant characters. Since an array decays to a pointer to the first element in an array. Does that mean cp points to H which is a literal? Isn't it impossible to have a pointer to a literal, since a pointer has to point to the address of an object in memory?
The storage type of the literals types: boolean, integer, floating, character and nullptr is not specified and therefore they do not need to have a storage location in memory.
The storage type of literal string type is specified:
"...String literals have static storage duration, and thus exist in memory for the life of the program..." source: https://en.cppreference.com/w/cpp/language/string_literal
Therefore the address of a literal string can be taken and stored in a const char *.
As suggested by #MichaelKenzel :
From the C++17 draft standard (n4659) https://timsong-cpp.github.io/cppwp/n4659/lex.string#16
Evaluating a string-literal results in a string literal object with
static storage duration, initialized from the given characters as
specified above. Whether all string literals are distinct (that is,
are stored in nonoverlapping objects) and whether successive
evaluations of a string-literal yield the same or a different object
is unspecified. [ Note: The effect of attempting to modify a string
literal is undefined. — end note ]
Related
So from my understanding pointer variables point to an address. So, how is the following code valid in C++?
char* b= "abcd"; //valid
int *c= 1; //invalid
The first line
char* b= "abcd";
is valid in C, because "string literals", while used as initializer, boils down to the address of the first element in the literal, which is a pointer (to char).
Related, C11, chapter §6.4.5, string literals,
[...] The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence. [...]
and then, chapter §6.3.2.1 (emphasis mine)
Except when it is the operand of the sizeof operator, the _Alignof operator, or the
unary & operator, or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points
to the initial element of the array object and is not an lvalue.
However, as mentioned in comments, in C++11 onwards, this is not valid anymore as string literals are of type const char[] there and in your case, LHS lacks the const specifier.
OTOH,
int *c= 1;
is invalid (illegal) because, 1 is an integer constant, which is not the same type as int *.
In C and very old versions of C++, a string literal "abcd" is of type char[], a character array. Such an array can naturally get pointed at by a char*, but not by a int* since that's not a compatible type.
However, C and C++ are different, often incompatible programming languages. They dropped compatibility with each other some 20 years ago.
In standard C++, a string literal is of type const char[] and therefore none of your posted code is valid in C++. This won't compile:
char* b = "abcd"; //invalid, discards const qualifier
This will:
const char* c = "abcd"; // valid
"abcd" is actually a const char[5] type, and the language permits this to be assigned to a const char* (and, regrettably, a char* although C++11 onwards disallows it.).
int *c = 1; is not allowed by the C++ or C standards since you can't assign an int to an int* pointer (with the exception of 0, and in that case your intent will be expressed clearer by assigning nullptr instead).
"abcd" is the address that contains the sequence of five bytes 97 98 99 100 0 -- you cannot see what the address is in the source code, but the compiler will still assign it an address.
1 is also an address near the bottom of your [virtual] memory. This may not seem to be useful to you, but it is useful to other people, so even though the "standard" might not want to permit this, every compiler you are ever likely to run into will support this.
While all other answers give the correct answer of why you code doesn't work, using a compound literal to initialize c, is one way you can make your code work, e.g.
int *c= (int[]){ 1 };
printf ("int pointer c : %d\n", *c);
Note, there are differences between C and C++ in the use of compound literals, they are only available in C.
Is the pointer returned by the following function valid?
const char * bool2str( bool flg )
{
return flg ? "Yes" : "No";
}
It works well in Visual C++ and g++. What does C++ standard say about this?
On storage duration:
2.13.4
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow
string literal has type “array of n const char”, where n is the size of the string as defined below, and has
static storage duration
read in conjunction with 3.7.1
3.7.1.
All objects which do not have dynamic storage duration, do not have thread storage duration, and are
not local have static storage duration. The storage for these objects shall last for the duration of the
program (3.6.2, 3.6.3).
On type:
Annex C
Subclause 2.13.4:
Change: String literals made const
The type of a string literal is changed from “array of char ” to “array of const char.” The type of a
char16_t string literal is changed from “array of some-integer-type ” to “array of const char16_t.” The
type of a char32_t string literal is changed from “array of some-integer-type ” to “array of const char32_-
t.” The type of a wide string literal is changed from “array of wchar_t ” to “array of const wchar_t.”
Rationale: This avoids calling an inappropriate overloaded function, which might expect to be able to
modify its argument.
Effect on original feature: Change to semantics of well-defined feature.
Difficulty of converting: Simple syntactic transformation, because string literals can be converted to
char*; (4.2). The most common cases are handled by a new but deprecated standard conversion:
char* p = "abc"; // valid in C, deprecated in C++
char* q = expr ? "abc" : "de"; // valid in C, invalid in C++
How widely used: Programs that have a legitimate reason to treat string literals as pointers to potentially
modifiable memory are probably rare.
Dynamically allocated (the word 'heap' is never used in context of an area of memory AFAIK in the standard) memory requires a function call that can happen as early as main much after the static memory is allocated.
This code is perfectly valid and conformant. The only "gotcha" would be to ensure that the caller doesn't try to free the string.
This code is valid and standard compliant.
String literals are stored in read-only memory, and the function just gets the address of the chosen string.
C++ standard (2.13.4) says :
An ordinary string literal has type
“array of n const char” and static
storage duration
They key to understand your problem here, is the static storage duration : string literals are allocated when your program launch, and last for the duration of the program. Your function just gets the address and returns it.
Technically Yes it is valid.
The strings have static storage durataion.
But that is not the whole story.
These are C-Strings. The convention in C-Libraries and funcctions is to return a dynamically allocated string that should be freed. ie A pointer returned is implicitly passing ownership back to tha caller (As usuall in C there are also exceptions).
If you do not follow these conventions you will confuse a lot of experienced C-Developers that would expect this convention. If you do not follow this standard expectation then it should be well documented in the code.
Also this is C++ (as per your tags). So it is more conventional to return a std::string. The reason for this is that the passing of ownership via pointers is only implied (and this lead to a lot of errors in C code were the above expectation was broken but documented, unfortunately the documentaiton was never read by the user of the code). By using a std::string you are passing an object and their is no longer any question of ownership (the result is passed back as a value and thus yours), but because it is an object there is no questions or issues with resource allocation.
If you are worried about effeciency I think that is a false concern.
If you want this for printing via streams there is already a standard convention to do that:
std::cout << std::boolalpha << false << std::endl;
std::cout << std::boolalpha << true << std::endl;
Out of curiosity, I'm wondering what the real underlying type of a C++ string literal is.
Depending on what I observe, I get different results.
A typeid test like the following:
std::cout << typeid("test").name() << std::endl;
shows me char const[5].
Trying to assign a string literal to an incompatible type like so (to see the given error):
wchar_t* s = "hello";
I get a value of type "const char *" cannot be used to initialize an entity of type "wchar_t *" from VS12's IntelliSense.
But I don't see how it could be const char * as the following line is accepted by VS12:
char* s = "Hello";
I have read that this was allowed in pre-C++11 standards as it was for retro-compatibility with C, although modification of s would result in Undefined Behavior. I assume that this is simply VS12 having not yet implemented all of the C++11 standard and that this line would normally result in an error.
Reading the C99 standard (from here, 6.4.5.5) suggests that it should be an array:
The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence.
So, what is the type underneath a C++ string literal?
Thank you very much for your precious time.
The type of a string literal is indeed const char[SIZE] where SIZE is the length of the string plus the null terminating character.
The fact that you're sometimes seeing const char* is because of the usual array-to-pointer decay.
But I don't see how it could be const char * as the following line is accepted by VS12:
char* s = "Hello";
This was correct behaviour in C++03 (as an exception to the usual const-correctness rules) but it has been deprecated since. A C++11 compliant compiler should not accept that code.
The type of a string literal is char const[N] where N is the number of characters including the terminating null character. Although this type does not convert to char*, the C++ standard includes a clause allowing assignments of string literal to char*. This clause was added to support compatibility especially for C code which didn't have const back then.
The relevant clause for the type in the standard is 2.14.5 [lex.string] paragraph 8:
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration (3.7).
First off, the type of a C++ string literal is an array of n const char. Secondly, if you want to initialise a wchar_t with a string literal you have to code:
wchar_t* s = L"hello"
I know that for example "hello" is of type const char*. So my questions are:
How can we assign a literal string like "hello" to a non-const char* like this:
char* s = "hello"; // "hello" is type of const char* and s is char*
// and we know that conversion from const char* to
// char* is invalid
Is a literal string like "hello", which will take memory in all my program, or it's just like temporary variable that will get destroyed when the statement ends?
In fact, "hello" is of type char const[6].
But the gist of the question is still right – why does C++ allow us to assign a read-only memory location to a non-const type?
The only reason for this is backwards compatibility to old C code, which didn’t know const. If C++ had been strict here it would have broken a lot of existing code.
That said, most compilers can be configured to warn about such code as deprecated, or even do so by default. Furthermore, C++11 disallows this altogether but compilers may not enforce it yet.
For Standerdese Fans:
[Ref 1]C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 simply removes the above quotation which implies that it is illegal code in C++11.
[Ref 2]C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
is literal string like "hello" will take memory in all my program all it's just like a temporary variable that will get destroyed when the statement ends.
It is kept in programm data, so it is awaiable within lifetime of the programm. You can return pointers and references to this data from the current scope.
The only reason why const char* is being cast to char* is comatiblity with c, like winapi system calls. And this cast is made unexplicit unlike any other const casting.
Just use a string:
std::string s("hello");
That would be the C++ way. If you really must use char, you'll need to create an array and copy the contents over it.
The answer to your second question is that the variable s is stored in RAM as type pointer-to-char. If it's global or static, it's allocated on the heap and remains there for the life of the running program. If it's a local ("auto") variable, it's allocated on the stack and remains there until the current function returns. In either case, it occupies the amount of memory required to hold a pointer.
The string "Hello" is a constant, and it's stored as part of the program itself, along with all the other constants and initializers. If you built your program to run on an appliance, the string would be stored in ROM.
Note that, because the string is constant and s is a pointer, no copying is necessary. The pointer s simply points to wherever the string is stored.
In your example, you are not assigning, but constructing. std::string, for example, does have a std::string(const char *) constructor (actually it's more complicated, but it doesn't matter). And, similarly, char * (if it was a type rather than a pointer to a type) could have a const char * constructor, which is copying the memory.
I don't actually know how the compiler really works here, but I think it could be similar to what I've described above: a copy of "Hello" is constructed in stack and s is initialized with this copy's address.
Can you do this?
char* func()
{
char * c = "String";
return c;
}
is "String" here a globally allocated data by compiler?
You can do that. But it would be even more correct to say:
const char* func(){
return "String";
}
The c++ spec says that string literals are given static storage duration. I can't link to it because there are precious few versions of the c++ spec online.
This page on const correctness is the best reference I can find.
Section 2.13.4 of ISO/IEC 14882 (Programming languages - C++) says:
A string literal is a sequence of characters (as defined in 2.13.2) surrounded by double quotes, optionally
beginning with the letter L, as in "..." or L"...". A string literal that does not begin with L is an ordinary
string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n
const char” and static storage duration (3.7), where n is the size of the string as defined below, and is
initialized with the given characters. ...
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined.
The effect of attempting to modify a string literal is undefined.
You can do this currently (there is no reason to, though). But you cannot do this anymore with C++0x. They removed the deprecated conversion of a string literal (which has the type const char[N]) to a char *.
Note that this conversion is only for string literals. Thus the following two things are illegal, the first of which specifies an array and the second of which specifies a pointer for initialization
char *x = (0 ? "123" : "345"); // illegal: const char[N] -> char*
char *x = +"123"; // illegal: const char * -> char*
GCC incorrectly accepts both, Clang correctly rejects both.
The constant is not allocated on the heap but it is a constant. You don't need to destroy it.
Not in a modern compiler. In modern compilers, the type of "String" is const char *, which you can't assign to a char * due to the const mismatch.
If you made c a const char * (and changed the return type of the function), the code would be legal. Typically the string literal "String" would be placed in the executable's data section by the linker, and in many cases, in a special section for read-only data.