I know that for example "hello" is of type const char*. So my questions are:
How can we assign a literal string like "hello" to a non-const char* like this:
char* s = "hello"; // "hello" is type of const char* and s is char*
// and we know that conversion from const char* to
// char* is invalid
Does a string literal like "hello" occupy memory for the whole run of my program, or is it like a temporary variable that gets destroyed when the statement ends?
In fact, "hello" is of type char const[6].
But the gist of the question is still right – why does C++ allow us to assign a read-only memory location to a non-const type?
The only reason for this is backwards compatibility to old C code, which didn’t know const. If C++ had been strict here it would have broken a lot of existing code.
That said, most compilers can be configured to warn about such code as deprecated, or even do so by default. Furthermore, C++11 disallows this altogether but compilers may not enforce it yet.
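As a minimal sketch of the const-correct alternatives (assuming a C++11 or later compiler; the function names `greeting` and `capitalized_greeting` are made up for illustration), read-only access goes through a pointer to const, while writable access works on a copy:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Read-only use: bind the literal to a pointer to const char.
const char* greeting() {
    const char* s = "hello";  // array of 6 const char decays to const char*
    return s;
}

// Writable use: initialize an array from the literal, which copies it.
std::string capitalized_greeting() {
    char buf[] = "hello";     // buf is a distinct, modifiable char[6]
    assert(sizeof(buf) == 6); // five letters plus the terminating '\0'
    buf[0] = 'H';             // modifies the copy, never the literal
    return std::string(buf);
}
```

The array form costs a copy, but it is the only one of the two where writing to the characters has defined behaviour.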
For Standardese fans:
[Ref 1]C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 simply removes the above paragraph, which makes this code ill-formed in C++11.
[Ref 2]C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
Will a string literal like "hello" take memory for my whole program, or is it just like a temporary variable that gets destroyed when the statement ends?
It is kept in the program's data, so it is available for the lifetime of the program. You can return pointers and references to this data from the current scope.
The only reason a const char* is allowed to convert to char* here is compatibility with C, for example WinAPI system calls. This conversion is implicit, unlike any other conversion that removes const.
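The point about returning references to a literal's storage can be sketched like this. The names `program_name` and `program_name_size` are hypothetical; the only assumption is that the literal has static storage duration, as described above:

```cpp
#include <cassert>
#include <cstddef>

// Returns a reference to the literal's array. The array outlives the call,
// so the reference stays valid; the return type also preserves the length.
const char (&program_name())[5] {
    return "demo";
}

// Uses the reference after program_name() has already returned.
std::size_t program_name_size() {
    const char (&name)[5] = program_name();
    assert(name[4] == '\0');  // the terminator is part of the array
    return sizeof(name);
}
```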
Just use a string:
std::string s("hello");
That would be the C++ way. If you really must use char, you'll need to create an array and copy the contents over it.
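A short sketch of the std::string route, with illustrative function names (`make_greeting` and `c_length` are not from the original answer):

```cpp
#include <cassert>
#include <cstring>
#include <string>

// std::string copies the literal's characters into storage it owns,
// so modifying the string never touches the literal itself.
std::string make_greeting() {
    std::string s("hello");
    assert(s.size() == 5);  // the terminating '\0' is not counted
    s[0] = 'j';
    s += '!';
    return s;
}

// When a C-style API needs a const char*, c_str() provides one that
// stays valid as long as the string is alive and unmodified.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```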
The answer to your second question is that the variable s is stored in RAM as type pointer-to-char. If it's global or static, it has static storage duration (it is not allocated on the heap) and remains there for the life of the running program. If it's a local ("auto") variable, it's allocated on the stack and remains there until the current function returns. In either case, it occupies the amount of memory required to hold a pointer.
The string "Hello" is a constant, and it's stored as part of the program itself, along with all the other constants and initializers. If you built your program to run on an appliance, the string could be stored in ROM.
Note that, because the string is constant and s is a pointer, no copying is necessary. The pointer s simply points to wherever the string is stored.
In your example, you are not assigning, but constructing. std::string, for example, does have a std::string(const char *) constructor (actually it's more complicated, but it doesn't matter). And, similarly, char * (if it was a type rather than a pointer to a type) could have a const char * constructor, which is copying the memory.
That is not what actually happens for a plain char*, though: no copy of "Hello" is made on the stack. The literal is an array with static storage duration, and s is initialized with the address of its first element.
While reading C++ Primer, I came across this line:
const char *cp = "Hello World";
To my understanding, "Hello World" is a string literal, which is an array of constant characters. Since an array decays to a pointer to its first element, does that mean cp points to H, which is a literal? Isn't it impossible to have a pointer to a literal, since a pointer has to point to the address of an object in memory?
The storage of boolean, integer, floating-point, character and nullptr literals is not specified, so they do not need to have a storage location in memory.
The storage duration of string literals is specified:
"...String literals have static storage duration, and thus exist in memory for the life of the program..." source: https://en.cppreference.com/w/cpp/language/string_literal
Therefore the address of a literal string can be taken and stored in a const char *.
As suggested by @MichaelKenzel:
From the C++17 draft standard (n4659) https://timsong-cpp.github.io/cppwp/n4659/lex.string#16
Evaluating a string-literal results in a string literal object with
static storage duration, initialized from the given characters as
specified above. Whether all string literals are distinct (that is,
are stored in nonoverlapping objects) and whether successive
evaluations of a string-literal yield the same or a different object
is unspecified. [ Note: The effect of attempting to modify a string
literal is undefined. — end note ]
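To illustrate the "unspecified" part of that quote, here is a small sketch (function names are illustrative). It does not state what `same_object` returns, because that depends on the compiler and its settings:

```cpp
#include <cassert>
#include <cstring>

// Portable comparison: compare the characters, not the addresses.
bool same_text(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    return std::strcmp(a, b) == 0;
}

// Unspecified result: identical literals may or may not share storage,
// so code should never rely on this returning true (or false).
bool same_object() {
    const char* a = "hello";
    const char* b = "hello";
    return a == b;
}
```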
So from my understanding pointer variables point to an address. So, how is the following code valid in C++?
char* b= "abcd"; //valid
int *c= 1; //invalid
The first line
char* b= "abcd";
is valid in C because a string literal, when used as an initializer, decays to the address of its first element, which is a pointer (to char).
Related, C11, chapter §6.4.5, string literals,
[...] The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence. [...]
and then, chapter §6.3.2.1 (emphasis mine)
Except when it is the operand of the sizeof operator, the _Alignof operator, or the
unary & operator, or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points
to the initial element of the array object and is not an lvalue.
However, as mentioned in comments, in C++11 onwards, this is not valid anymore as string literals are of type const char[] there and in your case, LHS lacks the const specifier.
OTOH,
int *c= 1;
is invalid (illegal) because 1 is an integer constant, which is not the same type as int *.
In C and very old versions of C++, a string literal "abcd" is of type char[], a character array. Such an array can naturally get pointed at by a char*, but not by an int* since that's not a compatible type.
However, C and C++ are different, often incompatible programming languages. They dropped compatibility with each other some 20 years ago.
In standard C++, a string literal is of type const char[] and therefore none of your posted code is valid in C++. This won't compile:
char* b = "abcd"; //invalid, discards const qualifier
This will:
const char* c = "abcd"; // valid
"abcd" is actually a const char[5] type, and the language permits this to be assigned to a const char* (and, regrettably, a char* although C++11 onwards disallows it.).
int *c = 1; is not allowed by the C++ or C standards since you can't assign an int to an int* pointer (with the exception of 0, and in that case your intent will be expressed more clearly by assigning nullptr instead).
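A minimal sketch of the valid ways to initialize an int pointer in C++11 and later (the function names are illustrative):

```cpp
#include <cassert>

// A pointer must be initialized from an address (or a null pointer),
// never from an arbitrary integer value.
int read_through_pointer() {
    int value = 1;
    int* c = &value;   // OK: address of an int object
    assert(c != nullptr);
    return *c;
}

bool null_pointer_is_null() {
    int* c = nullptr;  // clearer than `int* c = 0;`
    return c == nullptr;
}
```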
"abcd" is the address that contains the sequence of five bytes 97 98 99 100 0 -- you cannot see what the address is in the source code, but the compiler will still assign it an address.
1 can also be read as an address near the bottom of your [virtual] memory. This may not seem useful to you, but it is useful to other people (for example in low-level code). The standard does not permit the implicit conversion, but compilers let you write it with an explicit cast, and some C compilers have historically accepted the original line with only a warning.
While all the other answers correctly explain why your code doesn't work, using a compound literal to initialize c is one way you can make your code work, e.g.
int *c= (int[]){ 1 };
printf ("int pointer c : %d\n", *c);
Note that there are differences between C and C++ here: compound literals are only available in C.
Is the pointer returned by the following function valid?
const char * bool2str( bool flg )
{
return flg ? "Yes" : "No";
}
It works well in Visual C++ and g++. What does C++ standard say about this?
On storage duration:
2.13.4
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow
string literal has type “array of n const char”, where n is the size of the string as defined below, and has
static storage duration
read in conjunction with 3.7.1
3.7.1.
All objects which do not have dynamic storage duration, do not have thread storage duration, and are
not local have static storage duration. The storage for these objects shall last for the duration of the
program (3.6.2, 3.6.3).
On type:
Annex C
Subclause 2.13.4:
Change: String literals made const
The type of a string literal is changed from “array of char ” to “array of const char.” The type of a
char16_t string literal is changed from “array of some-integer-type ” to “array of const char16_t.” The
type of a char32_t string literal is changed from “array of some-integer-type ” to “array of const char32_t.”
The type of a wide string literal is changed from “array of wchar_t ” to “array of const wchar_t.”
Rationale: This avoids calling an inappropriate overloaded function, which might expect to be able to
modify its argument.
Effect on original feature: Change to semantics of well-defined feature.
Difficulty of converting: Simple syntactic transformation, because string literals can be converted to
char*; (4.2). The most common cases are handled by a new but deprecated standard conversion:
char* p = "abc"; // valid in C, deprecated in C++
char* q = expr ? "abc" : "de"; // valid in C, invalid in C++
How widely used: Programs that have a legitimate reason to treat string literals as pointers to potentially
modifiable memory are probably rare.
Dynamically allocated memory (AFAIK the standard never uses the word 'heap' for an area of memory) requires a function call, which can happen as early as main but still well after the static memory is allocated.
This code is perfectly valid and conformant. The only "gotcha" would be to ensure that the caller doesn't try to free the string.
This code is valid and standard compliant.
String literals are typically stored in read-only memory, and the function just gets the address of the chosen string.
C++ standard (2.13.4) says :
An ordinary string literal has type
“array of n const char” and static
storage duration
The key to understanding your problem here is the static storage duration: string literals are allocated when your program launches and last for the duration of the program. Your function just gets the address and returns it.
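A small usage sketch built around the function from the question. `describe` is a hypothetical caller added for illustration; it uses the pointer after bool2str has returned, which is fine because the literals have static storage duration:

```cpp
#include <cassert>
#include <string>

// Same shape as the function in the question.
const char* bool2str(bool flg) {
    return flg ? "Yes" : "No";
}

// The caller can keep using the pointer after bool2str has returned.
// It must not free or modify the pointed-to characters.
std::string describe(bool flg) {
    const char* text = bool2str(flg);
    assert(text != nullptr);
    return std::string("Answer: ") + text;
}
```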
Technically yes, it is valid.
The strings have static storage duration.
But that is not the whole story.
These are C strings. The convention in C libraries and functions is to return a dynamically allocated string that should be freed, i.e. a returned pointer implicitly passes ownership back to the caller (as usual in C, there are also exceptions).
If you do not follow this convention you will confuse a lot of experienced C developers who expect it, so any deviation should be well documented in the code.
Also this is C++ (as per your tags), so it is more conventional to return a std::string. The reason is that passing ownership via pointers is only implied, and this led to a lot of errors in C code where the above expectation was broken but documented (unfortunately the documentation was never read by the user of the code). By using a std::string you are passing an object and there is no longer any question of ownership (the result is passed back as a value and is thus yours), and because it is an object there are no questions or issues with resource allocation.
If you are worried about efficiency, I think that is a false concern.
If you want this for printing via streams there is already a standard convention to do that:
std::cout << std::boolalpha << false << std::endl;
std::cout << std::boolalpha << true << std::endl;
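The same manipulator works on any output stream. As a sketch, here it is applied to a std::ostringstream so the text can be captured in a string (the helper name `bool_to_text` is an assumption, not a standard function; the output shown in the comments assumes the default "C" locale):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Formats a bool as text using the stream's own facility. std::boolalpha
// is sticky, so it stays in effect for later insertions on the same stream.
std::string bool_to_text(bool value) {
    std::ostringstream out;
    out << std::boolalpha << value;  // "true" or "false" in the "C" locale
    assert(out.good());
    return out.str();
}
```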
Out of curiosity, I'm wondering what the real underlying type of a C++ string literal is.
Depending on what I observe, I get different results.
A typeid test like the following:
std::cout << typeid("test").name() << std::endl;
shows me char const[5].
Trying to assign a string literal to an incompatible type like so (to see the given error):
wchar_t* s = "hello";
I get a value of type "const char *" cannot be used to initialize an entity of type "wchar_t *" from VS12's IntelliSense.
But I don't see how it could be const char * as the following line is accepted by VS12:
char* s = "Hello";
I have read that this was allowed in pre-C++11 standards as it was for retro-compatibility with C, although modification of s would result in Undefined Behavior. I assume that this is simply VS12 having not yet implemented all of the C++11 standard and that this line would normally result in an error.
Reading the C99 standard (from here, 6.4.5.5) suggests that it should be an array:
The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence.
So, what is the type underneath a C++ string literal?
Thank you very much for your precious time.
The type of a string literal is indeed const char[SIZE] where SIZE is the length of the string plus the null terminating character.
The fact that you're sometimes seeing const char* is because of the usual array-to-pointer decay.
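Both observations can be checked at compile time. The following is a sketch assuming a C++11 compiler; `decays_to_const_char_pointer` and `literal_size` are illustrative names:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to the array.
static_assert(std::is_same<decltype("test"), const char (&)[5]>::value,
              "\"test\" is an array of 5 const char");

// sizeof sees the array type: four letters plus the terminating '\0'.
static_assert(sizeof("test") == 5, "size includes the null terminator");

// Initializing with auto triggers array-to-pointer decay.
bool decays_to_const_char_pointer() {
    auto p = "test";
    assert(p[4] == '\0');
    return std::is_same<decltype(p), const char*>::value;
}

// Binding a reference to the array preserves its length.
template <std::size_t N>
std::size_t literal_size(const char (&)[N]) {
    return N;
}
```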
But I don't see how it could be const char * as the following line is accepted by VS12:
char* s = "Hello";
This was accepted in C++03 (as an exception to the usual const-correctness rules), but the conversion was already deprecated there and was removed in C++11. A C++11-compliant compiler should not accept that code.
The type of a string literal is char const[N] where N is the number of characters including the terminating null character. Although this type does not convert to char*, the C++ standard includes a clause allowing assignments of string literal to char*. This clause was added to support compatibility especially for C code which didn't have const back then.
The relevant clause for the type in the standard is 2.14.5 [lex.string] paragraph 8:
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration (3.7).
First off, the type of a C++ string literal is an array of n const char. Secondly, if you want to initialise a wchar_t pointer with a string literal you have to write:
const wchar_t* s = L"hello";
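A brief sketch of the wide-literal case, using std::wcslen from the standard header cwchar (the function names are made up for illustration):

```cpp
#include <cassert>
#include <cwchar>

// A wide literal has type "array of n const wchar_t", so the pointer
// should be a pointer to const wchar_t.
const wchar_t* wide_greeting() {
    const wchar_t* s = L"hello";
    return s;
}

// Counts wide characters, not bytes, up to the terminating L'\0'.
std::size_t wide_length() {
    const wchar_t* s = wide_greeting();
    assert(s != nullptr);
    return std::wcslen(s);
}
```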
char* foo = "fpp"; //compile in vs 2010 with no problem
I thought a string literal is of type const char*.
And const type cannot be assigned to non-const type.
So I expect the code above to fail or am I missing something?
Edit: Sorry guys, I totally forgot that the compiler emits warnings too.
I was looking at the error list all this time.
I forgot to check that.
Edit2: I set my project Warning Level to EnableAllWarnings (/Wall) and there's no warning about this.
So my question is still valid.
C++03 deprecates[Ref 1] use of string literal without the const keyword.
[Ref 1]C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 simply removes the above paragraph, which makes this code ill-formed in C++11.
C++ inherited the declaration of string literals without the const keyword from C, where the same is still perfectly valid.
As I understand it, in C, before const was added, this was the way to assign a string to a pointer.
In C++ this is deprecated behavior, but still allowed to keep backwards compatibility. So don't use it.
In fact, I believe in C++11 it's completely invalid.
Not quite. A string literal is assignable to a char* type. A string literal should never be modified.
This strange situation is for backwards compatibility with programs before const existed.
gcc -std=c++0x warns about this:
a.cpp:5:14: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
So, this is still allowed, but deprecated, because literal strings are const.
There is no such thing as a const type. The const keyword is a so-called type qualifier. Applied to the pointed-to type, it just means that the value pointed at by the pointer should not be modified.
You could also apply the const qualifier to the pointer itself this way:
const char* const p = "aaa";
This will protect the pointer variable from pointing to another string.
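The two placements of const can be contrasted in a small sketch. The function names are illustrative, and the writable case deliberately uses a local array rather than a literal:

```cpp
#include <cassert>
#include <cstring>

// Pointer to const char: the characters are read-only through p,
// but p itself may be re-pointed.
const char* pick(bool first) {
    const char* p = "aaa";
    if (!first) {
        p = "bbb";            // OK: the pointer is not const
    }
    return p;
}

// Const pointer to char: p always points at buf, but buf can be changed.
char overwrite_first(char replacement) {
    char buf[] = "aaa";
    char* const p = buf;
    // p = nullptr;           // would not compile: p itself is const
    assert(p == buf);
    p[0] = replacement;       // OK: the pointed-to chars are not const
    return buf[0];
}
```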
There's a special implicit conversion to support this, since it was a common idiom in legacy code (often written before const existed). The type of your string literal is char const[], and you should only use it as such. A good compiler will warn at the above, since the conversion was deprecated from the moment it was introduced.
Note that this is different from C, where the type of a string literal is char[] (but trying to modify it is still undefined behavior).
You are talking about C strings, which are actually arrays of char. In C++, the class std::string is used instead, and a constant string is created as const std::string.
Anyway, compilers reserve a piece of memory in the resulting program to store the string literals that show up in the source code. This part of the memory is considered read-only, so you should point to it with a const char *. Its size is exactly the size of the string plus one extra position for the trailing zero, marking the end of the string.
Compilers need to keep backwards compatibility, so they still accept literals being pointed to by char *. However, this is misleading, since you are not supposed to be able to modify that memory, which could be stored in ROM in an embedded system.
In my system, I use clang:
$ clang --version
Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
Target: i386-pc-linux-gnu
Thread model: posix
In the clang C compiler, this code compiles without errors:
#include <stdio.h>
#include <stdlib.h>
int main()
{
char * str = "Hello, World!";
printf( "%s", str );
return EXIT_SUCCESS;
}
However, the very same code (with minor modifications, such as the header names) produces the following warning when compiled as a C++ program:
kk.cpp:6:15: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
char * str = "Hello, World!";
^
1 warning generated.
Hope this helps.