Type of a C++ string literal - c++

Out of curiosity, I'm wondering what the real underlying type of a C++ string literal is.
Depending on what I observe, I get different results.
A typeid test like the following:
std::cout << typeid("test").name() << std::endl;
shows me char const[5].
Trying to assign a string literal to an incompatible type like so (to see the given error):
wchar_t* s = "hello";
I get a value of type "const char *" cannot be used to initialize an entity of type "wchar_t *" from VS12's IntelliSense.
But I don't see how it could be const char * as the following line is accepted by VS12:
char* s = "Hello";
I have read that this was allowed in pre-C++11 standards for backward compatibility with C, although modifying the string through s would result in undefined behavior. I assume that VS12 simply has not yet implemented all of the C++11 standard, and that this line should normally result in an error.
Reading the C99 standard (from here, 6.4.5.5) suggests that it should be an array:
The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence.
So, what is the type underneath a C++ string literal?
Thank you very much for your precious time.

The type of a string literal is indeed const char[SIZE] where SIZE is the length of the string plus the null terminating character.
The fact that you're sometimes seeing const char* is because of the usual array-to-pointer decay.
But I don't see how it could be const char * as the following line is accepted by VS12:
char* s = "Hello";
This was accepted in C++03 (as an exception to the usual const-correctness rules), but the conversion was already deprecated there and was removed in C++11. A C++11-compliant compiler should not accept that code.
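As a minimal sketch of the point above, the array type and the decay to a pointer can be checked at compile time. The helper names literal_size and check_literal_type are hypothetical and exist only for this illustration; the code assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to the array type.
static_assert(std::is_same<decltype("test"), const char (&)[5]>::value,
              "\"test\" has type const char[5]");

// sizeof sees the whole array: four characters plus the terminating '\0'.
static_assert(sizeof("test") == 5, "size includes the null terminator");

// After array-to-pointer decay the type is const char*.
static_assert(std::is_same<std::decay<decltype("test")>::type, const char*>::value,
              "the array decays to a pointer to const char");

// Hypothetical helper: binds to the array by reference, so N is deduced before any decay.
template <std::size_t N>
constexpr std::size_t literal_size(const char (&)[N]) { return N; }

// Hypothetical demo function; call it from main() to run the runtime checks.
inline void check_literal_type() {
    assert(literal_size("test") == 5);
    const char* p = "test";  // array-to-pointer decay happens here
    assert(p[4] == '\0');    // the terminator is part of the array
}
```

The error message quoted in the question mentions const char * because the compiler reports the type after this decay, not the type of the literal itself.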

The type of a string literal is char const[N] where N is the number of characters including the terminating null character. Although this type does not normally convert to char*, C++ standards before C++11 included a clause allowing a string literal to be assigned to char*. This clause was added for compatibility, especially with C code written before const existed.
The relevant clause for the type in the standard is 2.14.5 [lex.string] paragraph 8:
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration (3.7).

First off, the type of a C++ string literal is an array of n const char. Secondly, if you want to initialise a wchar_t pointer with a string literal, you have to use a wide literal and a pointer to const:
const wchar_t* s = L"hello";

Related

how is char * to string literal valid?

So from my understanding pointer variables point to an address. So, how is the following code valid in C++?
char* b= "abcd"; //valid
int *c= 1; //invalid
The first line
char* b= "abcd";
is valid in C because a string literal, when used as an initializer like this, decays to the address of its first element, which is a pointer (to char).
Related, C11, chapter §6.4.5, string literals,
[...] The multibyte character
sequence is then used to initialize an array of static storage duration and length just
sufficient to contain the sequence. For character string literals, the array elements have
type char, and are initialized with the individual bytes of the multibyte character
sequence. [...]
and then, chapter §6.3.2.1 (emphasis mine)
Except when it is the operand of the sizeof operator, the _Alignof operator, or the
unary & operator, or is a string literal used to initialize an array, an expression that has
type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points
to the initial element of the array object and is not an lvalue.
However, as mentioned in comments, in C++11 onwards, this is not valid anymore as string literals are of type const char[] there and in your case, LHS lacks the const specifier.
OTOH,
int *c= 1;
is invalid (illegal) because, 1 is an integer constant, which is not the same type as int *.
In C and very old versions of C++, a string literal "abcd" is of type char[], a character array. Such an array can naturally be pointed at by a char*, but not by an int*, since that's not a compatible type.
However, C and C++ are different, often incompatible programming languages. They dropped compatibility with each other some 20 years ago.
In standard C++, a string literal is of type const char[] and therefore none of your posted code is valid in C++. This won't compile:
char* b = "abcd"; //invalid, discards const qualifier
This will:
const char* c = "abcd"; // valid
"abcd" is actually a const char[5] type, and the language permits this to be assigned to a const char* (and, regrettably, a char* although C++11 onwards disallows it.).
int *c = 1; is not allowed by the C++ or C standards since you can't assign an int to an int* pointer (with the exception of 0, and in that case your intent is expressed more clearly by assigning nullptr instead).
"abcd" is the address that contains the sequence of five bytes 97 98 99 100 0 -- you cannot see what the address is in the source code, but the compiler will still assign it an address.
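1 can also be treated as an address near the bottom of your [virtual] memory. This may not seem useful to you, but it is useful to other people (for example, for memory-mapped hardware). The standards do not permit the implicit conversion, however: a C++ compiler rejects int *c = 1; and a C compiler must issue a diagnostic for it, so an explicit cast such as (int *)1 is needed to express that intent.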
While all the other answers correctly explain why your code doesn't work, using a compound literal to initialize c is one way you can make your code work, e.g.
int *c= (int[]){ 1 };
printf ("int pointer c : %d\n", *c);
Note that C and C++ differ here: compound literals are a C99 feature and are not part of standard C++, although some C++ compilers accept them as an extension.

Is this overload resolution correct?

From: Is it safe to overload char* and std::string?
#include <string>
#include <iostream>
void foo(std::string str) {
std::cout << "std::string\n";
}
void foo(char* str) {
std::cout << "char*\n";
}
int main(int argc, char *argv[]) {
foo("Hello");
}
The above code prints "char*" when compiled with g++-4.9.0 -ansi -pedantic -std=c++11.
I feel that this is incorrect, because the type of a string literal is "array of n const char", and it shouldn't be possible to initialize a non-const char* with it, so the std::string overload should be selected instead. Is gcc violating the standard here?
First, the type of string literals: They are all constant arrays of their character type.
2.14.5 String literals [lex.string]
7 A string literal that begins with u8, such as u8"asdf", is a UTF-8 string literal and is initialized with the given characters as encoded in UTF-8.
8 Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration (3.7).
9 A string literal that begins with u, such as u"asdf", is a char16_t string literal. A char16_t string literal has type “array of n const char16_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters. A single c-char may produce more than one char16_t character in the form of surrogate pairs.
10 A string literal that begins with U, such as U"asdf", is a char32_t string literal. A char32_t string literal has type “array of n const char32_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters.
11 A string literal that begins with L, such as L"asdf", is a wide string literal. A wide string literal has type “array of n const wchar_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters.
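The element types listed in these paragraphs can be checked with a short sketch like the following. The names literal_length and check_prefixed_literals are hypothetical. The u8 prefix is left out on purpose, because C++20 changed the type of u8 literals to arrays of const char8_t, so a check written against the C++11 wording would not hold on every compiler mode.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Each prefix selects the element type; the array is const in every case.
static_assert(std::is_same<decltype(L"asdf"), const wchar_t (&)[5]>::value,
              "wide string literal");
static_assert(std::is_same<decltype(u"asdf"), const char16_t (&)[5]>::value,
              "char16_t string literal");
static_assert(std::is_same<decltype(U"asdf"), const char32_t (&)[5]>::value,
              "char32_t string literal");

// Hypothetical helper: number of elements excluding the terminating null.
template <typename Char, std::size_t N>
constexpr std::size_t literal_length(const Char (&)[N]) { return N - 1; }

// Hypothetical demo function; call it from main() to run the runtime checks.
inline void check_prefixed_literals() {
    const wchar_t* s = L"hello";  // pointer to const, matching the literal's element type
    assert(s[0] == L'h');
    assert(literal_length(L"hello") == 5);
    assert(literal_length(u"asdf") == 4);
}
```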
Next, let's see that we only have standard array decay, so from T[#] to T*:
4.2 Array-to-pointer conversion [conv.array]
1 An lvalue or rvalue of type “array of N T” or “array of unknown bound of T” can be converted to a prvalue of type “pointer to T”. The result is a pointer to the first element of the array.
And last, let's see that any conforming extension must not change the meaning of a correct program:
1.4 Implementation compliance [intro.compliance]
1 The set of diagnosable rules consists of all syntactic and semantic rules in this International Standard except for those rules containing an explicit notation that “no diagnostic is required” or which are described as resulting in “undefined behavior.”
2 Although this International Standard states only requirements on C++ implementations, those requirements are often easier to understand if they are phrased as requirements on programs, parts of programs, or execution of programs. Such requirements have the following meaning:
If a program contains no violations of the rules in this International Standard, a conforming implementation shall, within its resource limits, accept and correctly execute that program.
If a program contains a violation of any diagnosable rule or an occurrence of a construct described in this Standard as “conditionally-supported” when the implementation does not support that construct, a conforming implementation shall issue at least one diagnostic message.
If a program contains a violation of a rule for which no diagnostic is required, this International
Standard places no requirement on implementations with respect to that program.
So, in summary, it's a compiler bug.
(Before C++11 (C++03) the conversion was allowed but deprecated, so it would have been correct. A diagnostic in case it happened would not have been required but provided as a quality of implementation issue.)
It's a GCC bug (bug-report not found yet), and also a clang bug (found by T.C.).
The test-case from the clang bug-report, which is much shorter:
void f(char*);
int &f(...);
int &r = f("foo");
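For comparison, here is a sketch of a const-correct variant of the overload set from the question. The function name which and the returned labels are assumptions made for this illustration, not part of the original code. With a const char* overload, the literal needs only an array-to-pointer conversion, which ranks better than the user-defined conversion to std::string, so the result does not depend on the deprecated conversion at all.

```cpp
#include <cassert>
#include <string>

// Hypothetical overload set: the pointer overload takes const char*.
std::string which(const char*) { return "const char*"; }
std::string which(const std::string&) { return "std::string"; }

// Hypothetical demo function; call it from main() to run the runtime checks.
inline void check_overloads() {
    // Array-to-pointer conversion beats the converting constructor of std::string.
    assert(which("Hello") == "const char*");
    // An argument that already is a std::string selects the other overload.
    assert(which(std::string("Hello")) == "std::string");
}
```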

"hello world" string literal can be assigned to char * type?

char* foo = "fpp"; //compile in vs 2010 with no problem
I thought a string literal is of type const char*.
And const type cannot be assigned to non-const type.
So I expect the code above to fail or am I missing something?
Edit: Sorry guys, I had totally forgotten that the compiler emits warnings too.
I was looking at the error list all this time.
I forgot to check that.
Edit2: I set my project Warning Level to EnableAllWarnings (/Wall) and there's no warning about this.
So my question is still valid.
C++03 deprecates[Ref 1] use of string literal without the const keyword.
[Ref 1]C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 simply removes the above quotation which implies that it is illegal code in C++11.
Prior to C++03, C++ inherited from C the practice of declaring pointers to string literals without the const keyword. Note that the same is perfectly valid in C.
As I understand it, in C, before const was added, this was the way to assign a string to a pointer.
In C++ this is deprecated behavior, but still allowed to keep backwards compatibility. So don't use it.
In fact, I believe in C++11 it's completely invalid.
Not quite. A string literal is assignable to a char* type. A string literal should never be modified.
This strange situation is for backwards compatibility with programs before const existed.
gcc -std=c++0x warns about this:
a.cpp:5:14: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
So, this is still allowed, but deprecated, because literal strings are const.
Strictly speaking, const is not a type in itself but a type qualifier. Applied to the pointed-to type, it means that the value pointed at should not be modified through that pointer.
You could also apply the const qualifier to the pointer itself, like this:
char* const p = "aaa";
This protects the pointer variable from being pointed at another string.
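The two positions of const can be told apart with a small sketch. The alias and function names are hypothetical, and the example uses const char so that it does not depend on the deprecated conversion to char*.

```cpp
#include <cassert>
#include <type_traits>

using PtrToConst = const char*;             // the pointee is const, the pointer is not
using ConstPtrToConst = const char* const;  // both the pointee and the pointer are const

static_assert(!std::is_const<PtrToConst>::value, "the pointer itself may be reseated");
static_assert(std::is_const<ConstPtrToConst>::value, "the pointer itself is const");
static_assert(std::is_const<std::remove_pointer<PtrToConst>::type>::value,
              "the pointed-to char is const");

// Hypothetical demo function: reseats a non-const pointer and returns what it points at.
inline char first_after_reseat() {
    const char* p = "aaa";
    p = "ccc";                    // allowed: only the pointee is const
    const char* const q = "bbb";
    // q = p;                     // would not compile: q itself is const
    assert(*q == 'b');
    return *p;
}
```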
There's a special implicit conversion to support this, since it was a common idiom in legacy code (often written before const existed). The type of your string literal is char const[], and you should only use it as such. A good compiler will warn at the above, since the conversion was deprecated from the moment it was introduced.
Note that this is different from C, where the type of a string literal is char[] (but trying to modify it is still undefined behavior).
You are talking about C strings, which are actually arrays of char. In C++, the class std::string is normally used, and a constant string can be created as a const std::string.
Anyway, compilers reserve a region of memory in the compiled program to store the string literals that appear in the source code. This region is treated as read-only, so you should point to it with a const char *. Its size is exactly the length of the string plus one extra position for the trailing zero that marks the end of the string.
Compilers need to keep backwards compatibility, so they still accept literals to be pointed by char *. However, this is misleading, since you are not supposed to be able to modify that memory which could be stored in ROM in an embedded system.
In my system, I use clang:
$ clang --version
Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
Target: i386-pc-linux-gnu
Thread model: posix
In the clang C compiler, this code compiles without errors:
#include <stdio.h>
#include <stdlib.h>
int main()
{
char * str = "Hello, World!";
printf( "%s", str );
return EXIT_SUCCESS;
}
However, the very same code (with minor modifications, such as the header's names) throws the following warning when compiled as a C++ program:
kk.cpp:6:15: warning: conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
char * str = "Hello, World!";
^
1 warning generated.
Hope this helps.

Why is it possible to assign a const char* to a char*?

I know that for example "hello" is of type const char*. So my questions are:
How can we assign a literal string like "hello" to a non-const char* like this:
char* s = "hello"; // "hello" is type of const char* and s is char*
// and we know that conversion from const char* to
// char* is invalid
Does a literal string like "hello" take up memory for the whole run of my program, or is it just like a temporary variable that gets destroyed when the statement ends?
In fact, "hello" is of type char const[6].
But the gist of the question is still right – why does C++ allow us to assign a read-only memory location to a non-const type?
The only reason for this is backwards compatibility to old C code, which didn’t know const. If C++ had been strict here it would have broken a lot of existing code.
That said, most compilers can be configured to warn about such code as deprecated, or even do so by default. Furthermore, C++11 disallows this altogether but compilers may not enforce it yet.
For Standardese Fans:
[Ref 1]C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 simply removes the above quotation which implies that it is illegal code in C++11.
[Ref 2]C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
Does a literal string like "hello" take up memory for the whole run of my program, or is it just like a temporary variable that gets destroyed when the statement ends?
It is kept in the program's data, so it is available for the lifetime of the program. You can return pointers and references to this data from the current scope.
The only reason why const char* is converted to char* is compatibility with C, such as WinAPI system calls. And this conversion is implicit, unlike any other removal of const.
Just use a string:
std::string s("hello");
That would be the C++ way. If you really must use char, you'll need to create an array and copy the contents over it.
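Both options can be sketched briefly. The function names are hypothetical. Initializing a char array from a literal copies the characters into the array, so the copy can be modified without touching the literal.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: a char array initialized from a literal is a writable copy.
inline std::string capitalized_array_copy() {
    char buf[] = "hello";  // array of 6 char, including the terminating '\0'
    static_assert(sizeof(buf) == 6, "the array holds the characters plus the terminator");
    buf[0] = 'H';          // modifies the copy, not the literal
    return std::string(buf);
}

// Hypothetical helper: std::string also owns its own copy of the characters.
inline std::string capitalized_string() {
    std::string s("hello");
    s[0] = 'H';
    return s;
}

// Hypothetical demo function; call it from main() to run the runtime checks.
inline void check_copies() {
    assert(capitalized_array_copy() == "Hello");
    assert(capitalized_string() == "Hello");
}
```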
The answer to your second question is that the variable s is stored in RAM as type pointer-to-char. If it's global or static, it has static storage duration: it is placed in the program's data area (not on the heap) and remains there for the life of the running program. If it's a local ("auto") variable, it's typically allocated on the stack and remains there until the current function returns. In either case, it occupies the amount of memory required to hold a pointer.
The string "Hello" is a constant, and it's stored as part of the program itself, along with all the other constants and initializers. If you built your program to run on an appliance, the string would be stored in ROM.
Note that, because the string is constant and s is a pointer, no copying is necessary. The pointer s simply points to wherever the string is stored.
In your example, you are not assigning, but constructing. std::string, for example, does have a std::string(const char *) constructor (actually it's more complicated, but it doesn't matter). And, similarly, char * (if it was a type rather than a pointer to a type) could have a const char * constructor, which is copying the memory.
I don't actually know how the compiler really works here, but as far as I can tell no copy of "Hello" is made on the stack: the literal has static storage duration, and s is simply initialized with the address of its first character.

Return char* to string literal

Can you do this?
char* func()
{
char * c = "String";
return c;
}
is "String" here a globally allocated data by compiler?
You can do that. But it would be even more correct to say:
const char* func(){
return "String";
}
The c++ spec says that string literals are given static storage duration. I can't link to it because there are precious few versions of the c++ spec online.
This page on const correctness is the best reference I can find.
Section 2.13.4 of ISO/IEC 14882 (Programming languages - C++) says:
A string literal is a sequence of characters (as defined in 2.13.2) surrounded by double quotes, optionally
beginning with the letter L, as in "..." or L"...". A string literal that does not begin with L is an ordinary
string literal, also referred to as a narrow string literal. An ordinary string literal has type “array of n
const char” and static storage duration (3.7), where n is the size of the string as defined below, and is
initialized with the given characters. ...
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined.
The effect of attempting to modify a string literal is undefined.
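Because of the static storage duration stated in the quoted paragraph, a pointer to a string literal stays valid after the function that produced it has returned. A minimal sketch, reusing func from the answer above and adding a hypothetical helper length_after_return:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns a pointer to a literal; the array it points at outlives the call.
const char* func() { return "String"; }

// Hypothetical helper: reads the characters after func() has already returned.
inline std::size_t length_after_return() {
    const char* s = func();
    assert(std::strcmp(s, "String") == 0);  // still readable here
    return std::strlen(s);
}
```

This would not be safe for a pointer to a local char array, which stops existing when its function returns.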
You can do this currently (there is no reason to, though). But you cannot do this anymore with C++0x. They removed the deprecated conversion of a string literal (which has the type const char[N]) to a char *.
Note that this conversion is only for string literals. Thus the following two things are illegal, the first of which specifies an array and the second of which specifies a pointer for initialization
char *x = (0 ? "123" : "345"); // illegal: const char[N] -> char*
char *x = +"123"; // illegal: const char * -> char*
GCC incorrectly accepts both, Clang correctly rejects both.
The constant is not allocated on the heap but it is a constant. You don't need to destroy it.
Not in a modern compiler. In modern C++, the type of "String" is const char[7], which decays to const char *, and that can't be assigned to a char * due to the const mismatch.
If you made c a const char * (and changed the return type of the function), the code would be legal. Typically the string literal "String" would be placed in the executable's data section by the linker, and in many cases, in a special section for read-only data.