This is valid, because a constexpr expression is allowed to take the value of "a glvalue of literal type that refers to a non-volatile object defined with constexpr, or that refers to a sub-object of such an object" (§5.19/2):
constexpr char str[] = "hello, world";
constexpr char e = str[1];
However, it would seem that string literals do not fit this description:
constexpr char e = "hello, world"[1]; // error: literal is not constexpr
2.14.5/8 describes the type of string literals:
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration.
It would seem that an object of this type could be indexed, if only it were temporary and not of static storage duration (5.19/2, right after the above snippet):
[constexpr allows lvalue-to-rvalue conversion of] … a glvalue of literal type that refers to a non-volatile temporary object whose lifetime has not ended, initialized with a constant expression
This is particularly odd since taking the lvalue of a temporary object is usually "cheating." I suppose this rule applies to function arguments of reference type, such as in
constexpr char get_1( char const (&str)[ 6 ] )
{ return str[ 1 ]; }
constexpr char i = get_1( { 'y', 'i', 'k', 'e', 's', '\0' } ); // OK
constexpr char e = get_1( "hello" ); // error: string literal not temporary
For what it's worth, GCC 4.7 accepts get_1( "hello" ), but rejects "hello"[1] because "the value of ‘._0’ is not usable in a constant expression"… yet "hello"[1] is acceptable as a case label or an array bound.
I'm splitting some Standardese hairs here… is the analysis correct, and was there some design intent for this feature?
EDIT: Oh… there is some motivation for this. It seems that this sort of expression is the only way to use a lookup table in the preprocessor. For example, this introduces a block of code which is ignored unless SOME_INTEGER_FLAG is 1 or 5, and causes a diagnostic if greater than 6:
#if "\0\1\0\0\0\1"[ SOME_INTEGER_FLAG ]
This construct would be new to C++11.
The intent is that this works and the paragraphs that state when an lvalue to rvalue conversion is valid will be amended with a note that states that an lvalue that refers to a subobject of a string literal is a constant integer object initialized with a constant expression (which is described as one of the allowed cases) in a post-C++11 draft.
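As a hedged illustration of that intent, here is a minimal sketch that reads an element of a string literal in constant expressions. It assumes a compiler that implements the clarified rule; the variable names (from_array, from_literal, via_reference) are made up for this example and do not come from the standard.

```cpp
// Sketch: reading an element of a string literal in a constant expression.
// All variable names are illustrative.

constexpr char str[] = "hello, world";
constexpr char from_array = str[1];               // element of a constexpr array

constexpr char from_literal = "hello, world"[1];  // element of the literal itself

// Binding the literal to a reference parameter, as in the question.
constexpr char get_1(char const (&s)[6]) { return s[1]; }
constexpr char via_reference = get_1("hello");

static_assert(from_array == 'e', "second character of the array");
static_assert(from_literal == 'e', "second character of the literal");
static_assert(via_reference == 'e', "second character seen through the reference");
```

If a compiler still rejects the second or third initialization, it is applying the stricter reading discussed in the question.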
Your comment about the use within the preprocessor looks interesting, but I'm unsure whether that is intended to work. This is the first I have heard of it.
Regarding your question about #if, it was not the intent of the standards committee to increase the set of expressions which can be used in the preprocessor, and the current wording is considered to be a defect. This will be listed as core issue 1436 in the post-Kona WG21 mailing. Thanks for bringing this to our attention!
Related
Is the pointer returned by the following function valid?
const char * bool2str( bool flg )
{
return flg ? "Yes" : "No";
}
It works well in Visual C++ and g++. What does C++ standard say about this?
On storage duration:
2.13.4
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow
string literal has type “array of n const char”, where n is the size of the string as defined below, and has
static storage duration
read in conjunction with 3.7.1
3.7.1.
All objects which do not have dynamic storage duration, do not have thread storage duration, and are
not local have static storage duration. The storage for these objects shall last for the duration of the
program (3.6.2, 3.6.3).
On type:
Annex C
Subclause 2.13.4:
Change: String literals made const
The type of a string literal is changed from “array of char ” to “array of const char.” The type of a
char16_t string literal is changed from “array of some-integer-type ” to “array of const char16_t.” The
type of a char32_t string literal is changed from “array of some-integer-type ” to “array of const char32_t.”
The type of a wide string literal is changed from “array of wchar_t ” to “array of const wchar_t.”
Rationale: This avoids calling an inappropriate overloaded function, which might expect to be able to
modify its argument.
Effect on original feature: Change to semantics of well-defined feature.
Difficulty of converting: Simple syntactic transformation, because string literals can be converted to
char*; (4.2). The most common cases are handled by a new but deprecated standard conversion:
char* p = "abc"; // valid in C, deprecated in C++
char* q = expr ? "abc" : "de"; // valid in C, invalid in C++
How widely used: Programs that have a legitimate reason to treat string literals as pointers to potentially
modifiable memory are probably rare.
Dynamically allocated memory (AFAIK the standard never uses the word 'heap' for an area of memory) requires a function call, which can happen as early as main, well after the static memory has been allocated.
This code is perfectly valid and conformant. The only "gotcha" would be to ensure that the caller doesn't try to free the string.
This code is valid and standard compliant.
String literals are typically stored in read-only memory, and the function just returns the address of the chosen string.
C++ standard (2.13.4) says :
An ordinary string literal has type
“array of n const char” and static
storage duration
The key to understanding your problem here is the static storage duration: string literals are allocated when your program launches and last for the duration of the program. Your function just gets the address and returns it.
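To make the static storage duration argument concrete, here is a small sketch. The constexpr qualifier on bool2str and the helper length_of are my additions (assumptions for illustration), so that the check can run at compile time; the original function works the same way without them.

```cpp
// Sketch: the pointer returned by bool2str stays valid after the call,
// because it points into a string literal with static storage duration.
// The caller must not free or delete it.

constexpr const char* bool2str(bool flg) { return flg ? "Yes" : "No"; }

// Hypothetical helper: counts characters up to the terminating '\0'.
constexpr int length_of(const char* s) { return *s ? 1 + length_of(s + 1) : 0; }

static_assert(bool2str(true)[0] == 'Y', "first character of the chosen literal");
static_assert(length_of(bool2str(false)) == 2, "\"No\" has two characters");
```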
Technically, yes, it is valid.
The strings have static storage duration.
But that is not the whole story.
These are C-strings. The convention in C libraries and functions is to return a dynamically allocated string that should be freed, i.e. a returned pointer implicitly passes ownership back to the caller (as usual in C, there are also exceptions).
If you do not follow these conventions you will confuse a lot of experienced C developers who expect them. If you deviate from this expectation, it should be well documented in the code.
Also, this is C++ (as per your tags), so it is more conventional to return a std::string. The reason is that passing ownership via pointers is only implied (and this led to a lot of errors in C code where the above expectation was broken but documented; unfortunately the documentation was never read by the user of the code). By returning a std::string you are passing an object, and there is no longer any question of ownership (the result is passed back as a value and is thus yours); and because it is an object, there are no questions or issues with resource allocation.
If you are worried about efficiency, I think that is a false concern.
If you want this for printing via streams there is already a standard convention to do that:
std::cout << std::boolalpha << false << std::endl;
std::cout << std::boolalpha << true << std::endl;
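A rough sketch of both suggestions follows: returning a std::string by value, and formatting through a stream with std::boolalpha. The function names are hypothetical, and the stream output assumes the default "C" locale.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returning by value: the caller owns the result and has nothing to free.
inline std::string bool2string(bool flg) { return flg ? "Yes" : "No"; }

// Formatting a bool through a stream with std::boolalpha, captured in a string.
inline std::string bool_via_stream(bool flg) {
    std::ostringstream out;
    out << std::boolalpha << flg;
    return out.str();
}

// Illustrative self-check of both helpers.
inline void check_bool_formatting() {
    assert(bool2string(true) == "Yes");
    assert(bool_via_stream(false) == "false");
}
```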
I know it's perfectly possible to initialise a char array with a string literal:
char arr[] = "foo";
C++11 8.5.2/1 says so:
A char array (whether plain char, signed char, or unsigned char), char16_t array, char32_t array, or
wchar_t array can be initialized by a narrow character literal, char16_t string literal, char32_t string
literal, or wide string literal, respectively, or by an appropriately-typed string literal enclosed in braces.
Successive characters of the value of the string literal initialize the elements of the array. ...
However, can you do the same with two string literals in a conditional expression? For example like this:
char arr[] = MY_BOOLEAN_MACRO() ? "foo" : "bar";
(Where MY_BOOLEAN_MACRO() expands to a 1 or 0).
The relevant parts of C++11 5.16 (Conditional operator) are as follows:
1 ... The first expression is contextually converted to bool (Clause 4).
It is evaluated and if it is true, the result of the conditional expression is the value of the second expression,
otherwise that of the third expression. ...
4 If the second and third operands are glvalues of the same value category and have the same type, the result
is of that type and value category and it is a bit-field if the second or the third operand is a bit-field, or if
both are bit-fields.
Notice that the literals are of the same length and thus they're both lvalues of type const char[4].
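The claim about the value category and type can be checked with decltype. This is only a sketch: the alias names are mine, and std::declval<bool>() stands in for an arbitrary condition so that nothing depends on constant folding.

```cpp
#include <type_traits>
#include <utility>

// Equal-length literals: both operands are lvalues of type const char[4],
// so the conditional expression is an lvalue of that same array type.
using same_length = decltype(std::declval<bool>() ? "foo" : "bar");

// Different lengths: the operand types differ, the arrays decay,
// and the result is a prvalue of type const char*.
using different_length = decltype(std::declval<bool>() ? "foo" : "quux");

static_assert(std::is_same<same_length, const char (&)[4]>::value,
              "lvalue of type const char[4]");
static_assert(std::is_same<different_length, const char*>::value,
              "prvalue of type const char*");
```

Note that this only establishes the type of the expression; whether such an expression may initialize a char array is the separate question asked here.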
GCC on ideone accepts the construct. But from reading the standard, I am simply not sure whether it's legal or not. Does anyone have better insight?
On the other hand, clang does not accept such code (see it live), and I believe clang is correct on this (MSVC also rejects this code).
A string literal is defined by the grammar in section 2.14.5:
string-literal:
encoding-prefix(opt) " s-char-sequence(opt) "
encoding-prefix(opt) R raw-string
and the first paragraph from this section says (emphasis mine):
A string literal is a sequence of characters (as defined in 2.14.3)
surrounded by double quotes, optionally prefixed by R, u8, u8R, u, uR,
U, UR, L, or LR, as in "...", R"(...)", u8"...", u8R"**(...)**",
u"...", uR"*~(...)*~", U"...", UR"zzz(...)zzz", L"...", or LR"(...)",
respectively
and it further says that the type of a narrow string literal is:
“array of n const char”,
as well as:
has static storage duration
but the result of the conditional expression, an “array of n const char” lvalue with static storage duration, is not itself a string literal, since it does not fit the grammar, nor does it fit paragraph 1.
We can make this fail on gcc if we use a non-constant expression (see it live):
bool x = true ;
char arr[] = x ? "foo" : "bar";
which means it is probably an extension, but it is non-conforming since it does not produce a warning in strict conformance mode i.e. using -std=c++11 -pedantic. From section 1.4 [intro.compliance]:
[...]Implementations are required to diagnose programs that use such
extensions that are ill-formed according to this International
Standard. Having done so, however, they can compile and execute such
programs.
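If the condition is only known at run time, there are alternatives that do not rely on the extension. The following is a sketch under my own naming (choose_literal, fill_array and demo_fill are hypothetical), not something taken from the standard or from either compiler's documentation.

```cpp
#include <cassert>
#include <cstring>

// Option 1: keep a pointer to whichever literal was chosen (no copy is made).
inline const char* choose_literal(bool x) { return x ? "foo" : "bar"; }

// Option 2: copy the chosen literal into a caller-provided array.
// Both literals occupy exactly 4 bytes including the terminator.
inline void fill_array(bool x, char (&arr)[4]) {
    std::memcpy(arr, x ? "foo" : "bar", sizeof arr);
}

// Illustrative use of option 2.
inline void demo_fill() {
    char arr[4];
    fill_array(false, arr);
    assert(std::strcmp(arr, "bar") == 0);
}
```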
This works in GCC in C++11 or newer because the condition and the literals you're providing are known at compile time (i.e., they are constant expressions). Since the compiler can figure out which branch is taken, it can work out which literal to use.
To remove the constexpr ability, try something like this:
#include <iostream>
#include <cstdlib>
int main() {
bool _bool = rand();
char arr[] = (_bool) ? "asdf" : "ffff";
std::cout << arr << std::endl;
}
GCC then errors out with:
g++ test.cpp -std=c++11
test.cpp: In function ‘int main()’:
test.cpp:6:34: error: initializer fails to determine size of ‘arr’
char arr[] = (_bool) ? "asdf" : "ffff";
^
test.cpp:6:34: error: array must be initialized with a brace-enclosed initializer
I don't know the standard's text definition well enough to know where or why this is valid, but I feel that it is valid.
For further reading on constexpr and how it can impact compilability, see the answer by @ShafikYaghmour in another question.
What is the difference between char* and int*? Sure, they are of different types, but how is it that I can write
char* s1="hello world";
as
"hello world"
is not a single character but an array of characters, and yet I cannot write
*s1
as
char* s1 = {'h','e','l','l','o',' ','w','o','r','l','d'};
and
int* a = {2,3,1,45,6};
What is the difference?
It is quite simple: A string literal, i.e., "foobar" is compiled to an array of chars which is stored in the static section of your program (i.e., where all constants are stored) and null terminated. Then, assigning this to a variable simply assigns a pointer to this memory to the variable. E.g., const char* a = "foo"; will assign the address where "foo" is stored to a.
In short, a string constant already brings the memory where it is to be stored with it.
In contrast, initializing a pointer with an initializer list, (i.e., a list of elements inside curly braces) is not defined for pointers. Informally, the problem with an initializer list -- in contrast to a string literal -- is that it does not "bring its own memory". Therefore, we must provide memory where the initializer list can store its chars. This is done by declaring an array instead of a pointer. This compiles fine:
char s1[11]={'h','e','l','l','o',' ','w','o','r','l','d'};
Now, we provided the space where the chars are to be stored by declaring s1 as an array.
Note that you can use brace initialization of pointers, though, e.g.:
char* c2 = {nullptr};
However, while the syntax looks the same, this is something completely different, called uniform initialization, and it will simply initialize c2 with nullptr.
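The three cases from this answer can be put side by side in a short sketch. The variable names are illustrative, and constexpr is added only so that the checks run at compile time.

```cpp
// A pointer initialized from a literal points at the literal's own static array.
constexpr const char* literal_ptr = "hello world";

// An array provides its own storage, so a brace list can fill it.
// Eleven characters and no terminator, exactly as in the answer above.
constexpr char letters[11] = {'h','e','l','l','o',' ','w','o','r','l','d'};

// A single-element brace list initializes the pointer from that element.
constexpr const char* null_ptr = {nullptr};

static_assert(sizeof letters == 11, "the array holds the characters itself");
static_assert(literal_ptr[0] == letters[0], "both start with 'h'");
static_assert(null_ptr == nullptr, "brace-initialized from nullptr");
```

Because letters has no terminating '\0', it must not be passed to functions that expect a C-string.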
In your first case, the string literal decays to a pointer to const char. Although s1 really should be const char *, several compilers allow the other form as an extension:
const char* s1 = "hello world" ;
A string literal is an array of const char; we can see this from the draft C++ standard section 2.14.5 String literals, which says (emphasis mine going forward):
Ordinary string literals and UTF-8 string literals are also referred
to as narrow string literals. A narrow string literal has type “array
of n const char”, where n is the size of the string as defined below,
and has static storage duration (3.7).
The conversion of an array to pointer is covered in section 4.2 Array-to-pointer conversion which says:
[...] an expression that has type ‘‘array of type’’ is converted to an
expression with type ‘‘pointer to type’’ that points to the initial
element of the array object and is not an lvalue.[...]
Your other cases do not work because a scalar (an arithmetic type, an enumeration type, or a pointer type) can only be initialized with a single element inside braces. This is covered in the draft C++ standard section 8.5.4 List-initialization, paragraph 3, which says:
List-initialization of an object or reference of type T is defined as
follows:
and then enumerates the different cases. The only one that applies to the right-hand side in this case is the following bullet:
Otherwise, if the initializer list has a single element of type E and
either T is not a reference type or its referenced type is
reference-related to E, the object or reference is initialized from
that element; if a narrowing conversion (see below) is required to
convert the element to T, the program is ill-formed.
which requires the list to have a single element, otherwise the final bullet applies:
Otherwise, the program is ill-formed.
In your two cases, even if you reduced the initializer to one element, the types are incorrect:
'h' is a char and 2 is an int, neither of which will convert to a pointer.
The assignment could be made to work by assigning the results to an array such as the following:
char s1[] = { 'h', 'e', 'l', 'l', 'o',' ', 'w', 'o', 'r', 'l', 'd' } ;
int a[] = { 2, 3, 1, 45, 6 } ;
This would be covered in section 8.5.1 Aggregates which says:
An array of unknown size initialized with a brace-enclosed
initializer-list containing n initializer-clauses, where n shall be
greater than zero, is defined as having n elements (8.3.4). [ Example:
int x[] = { 1, 3, 5 };
declares and initializes x as a one-dimensional array that has three
elements since no size was specified and there are three initializers.
—end example ] An empty initializer list {} shall not be used as the
initializer-clause for an array of unknown bound.
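The rule quoted above can be observed directly. In this sketch, count_of is a hypothetical helper that reports the deduced bound; it is not a standard library function.

```cpp
#include <cstddef>

// Hypothetical helper: reports the bound the compiler deduced for an array.
template <typename T, std::size_t N>
constexpr std::size_t count_of(const T (&)[N]) { return N; }

constexpr int  x[]  = {1, 3, 5};         // the standard's example: three elements
constexpr int  a[]  = {2, 3, 1, 45, 6};  // five initializers, five elements
constexpr char s1[] = {'h','e','l','l','o',' ','w','o','r','l','d'};  // eleven, no '\0'
constexpr char s2[] = "hello world";     // twelve: the literal adds the '\0'

static_assert(count_of(x) == 3, "bound deduced from three initializers");
static_assert(count_of(a) == 5, "bound deduced from five initializers");
static_assert(count_of(s1) + 1 == count_of(s2), "the literal is one element longer");
```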
Note:
It is incorrect to say that a brace-init-list is not defined for pointers, it is perfectly usable for pointers:
int x = 10 ;
int *ip = &x ;
int *a = {nullptr} ;
int *b = {ip} ;
From: Is it safe to overload char* and std::string?
#include <string>
#include <iostream>
void foo(std::string str) {
std::cout << "std::string\n";
}
void foo(char* str) {
std::cout << "char*\n";
}
int main(int argc, char *argv[]) {
foo("Hello");
}
The above code prints "char*" when compiled with g++-4.9.0 -ansi -pedantic -std=c++11.
I feel that this is incorrect, because the type of a string literal is "array of n const char", and it shouldn't be possible to initialize a non-const char* with it, so the std::string overload should be selected instead. Is gcc violating the standard here?
First, the type of string literals: They are all constant arrays of their character type.
2.14.5 String literals [lex.string]
7 A string literal that begins with u8, such as u8"asdf", is a UTF-8 string literal and is initialized with the given characters as encoded in UTF-8.
8 Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration (3.7).
9 A string literal that begins with u, such as u"asdf", is a char16_t string literal. A char16_t string literal has type “array of n const char16_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters. A single c-char may produce more than one char16_t character in the form of surrogate pairs.
10 A string literal that begins with U, such as U"asdf", is a char32_t string literal. A char32_t string literal has type “array of n const char32_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters.
11 A string literal that begins with L, such as L"asdf", is a wide string literal. A wide string literal has type “array of n const wchar_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters.
Next, let's see that we only have standard array decay, so from T[#] to T*:
4.2 Array-to-pointer conversion [conv.array]
1 An lvalue or rvalue of type “array of N T” or “array of unknown bound of T” can be converted to a prvalue of type “pointer to T”. The result is a pointer to the first element of the array.
And last, let's see that any conforming extension must not change the meaning of a correct program:
1.4 Implementation compliance [intro.compliance]
1 The set of diagnosable rules consists of all syntactic and semantic rules in this International Standard except for those rules containing an explicit notation that “no diagnostic is required” or which are described as resulting in “undefined behavior.”
2 Although this International Standard states only requirements on C++ implementations, those requirements are often easier to understand if they are phrased as requirements on programs, parts of programs, or execution of programs. Such requirements have the following meaning:
If a program contains no violations of the rules in this International Standard, a conforming implementation shall, within its resource limits, accept and correctly execute that program.
If a program contains a violation of any diagnosable rule or an occurrence of a construct described in this Standard as “conditionally-supported” when the implementation does not support that construct, a conforming implementation shall issue at least one diagnostic message.
If a program contains a violation of a rule for which no diagnostic is required, this International
Standard places no requirement on implementations with respect to that program.
So, in summary, it's a compiler bug.
(Before C++11 (C++03) the conversion was allowed but deprecated, so it would have been correct. A diagnostic in case it happened would not have been required but provided as a quality of implementation issue.)
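One way to avoid depending on how a compiler treats the removed conversion is to overload on const char* instead of char*. The sketch below uses tag types and a function name (took_string, took_pointer, pick) that are purely hypothetical; they exist only so the selected overload can be inspected with decltype.

```cpp
#include <string>
#include <type_traits>

// Tag types that identify which overload was selected.
struct took_string {};
struct took_pointer {};

took_string  pick(const std::string&);  // a literal needs a user-defined conversion
took_pointer pick(const char*);         // a literal needs only array-to-pointer decay

// The literal prefers the const char* overload in every C++ version.
static_assert(std::is_same<decltype(pick("Hello")), took_pointer>::value,
              "string literal selects const char*");
static_assert(std::is_same<decltype(pick(std::string("Hello"))), took_string>::value,
              "std::string argument selects std::string");
```

The functions are only declared, not defined, because decltype does not evaluate the calls.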
It's a GCC bug (bug-report not found yet), and also a clang bug (found by T.C.).
The test-case from the clang bug-report, which is much shorter:
void f(char*);
int &f(...);
int &r = f("foo");
I know that for example "hello" is of type const char*. So my questions are:
How can we assign a literal string like "hello" to a non-const char* like this:
char* s = "hello"; // "hello" is type of const char* and s is char*
// and we know that conversion from const char* to
// char* is invalid
Is a literal string like "hello", which will take memory in all my program, or it's just like temporary variable that will get destroyed when the statement ends?
In fact, "hello" is of type char const[6].
But the gist of the question is still right – why does C++ allow us to assign a read-only memory location to a non-const type?
The only reason for this is backwards compatibility to old C code, which didn’t know const. If C++ had been strict here it would have broken a lot of existing code.
That said, most compilers can be configured to warn about such code as deprecated, or even do so by default. Furthermore, C++11 disallows this altogether but compilers may not enforce it yet.
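As a sketch of what stays valid regardless of the deprecated conversion: the literal's type can be checked with decltype, a const char* can always point at it, and a modifiable copy can be made by initializing an array from it. The names read_only, capitalized and check_capitalized are made up for this example.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// The literal is an lvalue of type const char[6] (five letters plus '\0').
static_assert(std::is_same<decltype("hello"), const char (&)[6]>::value,
              "type of the literal");

// Conforming in every C++ version: point at the literal through const char*.
constexpr const char* read_only = "hello";

// When the text must be modified, copy the literal into an array first.
inline std::string capitalized() {
    char buffer[] = "hello";  // a writable copy with 6 elements
    buffer[0] = 'H';          // modifies the copy, never the literal
    return buffer;
}

// Illustrative self-check.
inline void check_capitalized() { assert(capitalized() == "Hello"); }
```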
For Standardese fans:
[Ref 1]C++03 Standard: §4.2/2
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ] For the purpose of ranking in overload resolution (13.3.3.1.1), this conversion is considered an array-to-pointer conversion followed by a qualification conversion (4.4). [Example: "abc" is converted to “pointer to const char” as an array-to-pointer conversion, and then to “pointer to char” as a qualification conversion. ]
C++11 simply removes the above quotation which implies that it is illegal code in C++11.
[Ref 2]C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
Is a literal string like "hello" going to take memory for my whole program, or is it just like a temporary variable that will get destroyed when the statement ends?
It is kept in program data, so it is available for the lifetime of the program. You can return pointers and references to this data from the current scope.
The only reason why const char* is converted to char* is compatibility with C, for example with WinAPI system calls. And this conversion is implicit, unlike any other const cast.
Just use a string:
std::string s("hello");
That would be the C++ way. If you really must use char, you'll need to create an array and copy the contents over it.
The answer to your second question is that the variable s is stored in RAM as type pointer-to-char. If it's global or static, it has static storage duration (it is not on the heap) and remains there for the life of the running program. If it's a local ("auto") variable, it's allocated on the stack and remains there until the current function returns. In either case, it occupies the amount of memory required to hold a pointer.
The string "Hello" is a constant, and it's stored as part of the program itself, along with all the other constants and initializers. If you built your program to run on an appliance, the string would be stored in ROM.
Note that, because the string is constant and s is a pointer, no copying is necessary. The pointer s simply points to wherever the string is stored.
In your example, you are not assigning, but initializing. std::string, for example, does have a std::string(const char *) constructor (actually it's more complicated, but it doesn't matter), and that constructor copies the characters. A char *, however, is a plain pointer rather than a class type, so it has no constructor that could copy anything.
I don't actually know how every compiler implements this, but no copy of "Hello" is needed for a pointer: the literal has static storage duration, and s is simply initialized with the address of its first character.