Why is a hard-coded string constant an lvalue? [duplicate] - c++

C++03 5.1 Primary expressions §2 says:
A literal is a primary expression. Its type depends on its form (2.13). A string literal is an lvalue; all other literals are rvalues.
Similarly, C99 6.5.1 §4 says:
A string literal is a primary expression. It is an lvalue with type as detailed in 6.4.5.
What is the rationale behind this?
As I understand, string literals are objects, while all other literals are not. And an l-value always refers to an object.
But the question then is why are string literals objects while all other literals are not? This rationale seems to me more like an egg or chicken problem.
I understand the answer to this may be related to hardware architecture rather than C/C++ as programming languages, nevertheless I would like to hear the same.

A string literal is a literal with array type, and in C there is no way for an array type to exist in an expression except as an lvalue. String literals could have been specified to have pointer type (rather than array type that usually decays to a pointer) pointing to the string "contents", but this would make them rather less useful; in particular, the sizeof operator could not be applied to them.
Note that C99 introduced compound literals, which are also lvalues, so having a literal be an lvalue is no longer a special exception; it's closer to being the norm.

String literals are arrays - objects of inherently unpredictable size (i.e. of user-defined and possibly large size). In the general case, there's simply no other way to represent such literals except as objects in memory, i.e. as lvalues. In C99 this also applies to compound literals, which are also lvalues.
Any attempts to artificially hide the fact that string literals are lvalues at the language level would produce a considerable number of completely unnecessary difficulties, since the ability to point to a string literal with a pointer as well as the ability to access it as an array relies critically on its lvalue-ness being visible at the language level.
Meanwhile, literals of scalar types have fixed compile-time size. At the same time, such literals are very likely to be embedded directly into the machine instructions on the given hardware architecture. For example, when you write something like i = i * 5 + 2, the literal values 5 and 2 become explicit (or even implicit) parts of the generated machine code. They don't exist and don't need to exist as standalone locations in data storage. There's simply no point in storing values 5 and 2 in the data memory.
It is also worth noting that on many (if not most, or all) hardware architectures floating-point literals are actually implemented as "hidden" lvalues (even though the language does not expose them as such). On platforms like x86, machine instructions from the floating-point group do not support embedded immediate operands. This means that virtually every floating-point literal has to be stored in (and read from) data memory by the compiler. E.g. when you write something like i = i * 5.5 + 2.1, it is translated into something like
const double unnamed_double_5_5 = 5.5;
const double unnamed_double_2_1 = 2.1;
i = i * unnamed_double_5_5 + unnamed_double_2_1;
In other words, floating-point literals often end up becoming "unofficial" lvalues internally. However, it makes perfect sense that the language specification did not make any attempt to expose this implementation detail. At the language level, arithmetic literals make more sense as rvalues.

I'd guess that the original motive was mainly a pragmatic one: a string
literal must reside in memory and have an address. The type of a string
literal is an array type (char[] in C, char const[] in C++), and
array types convert to pointers in most contexts. The language could
have found other ways to define this (e.g. a string literal could have
pointer type to begin with, with special rules concerning what it
pointed to), but just making the literal an lvalue is probably the
easiest way of defining what is concretely needed.

An lvalue in C++ does not always refer to an object. It can refer to a function too. Moreover, objects do not have to be referred to by lvalues. They may be referred to by rvalues, including for arrays (in C++ and C). However, in old C89, the array-to-pointer conversion did not apply to rvalue arrays.
Now, an rvalue denotes something with no lifetime, a limited lifetime, or a lifetime that is about to expire. A string literal, however, lives for the entire program.
So string literals being lvalues is exactly right.

Related

Confusion between constants and literals?

I am currently reading about constants in the C++ tutorial from TutorialsPoint, where it says:
Constants refer to fixed values that the program may not alter and they are called literals.
(Source)
I do not really get this. If constants are called literals and literals are data represented directly in the code, how can constants be considered as literals? I mean variables preceded with the const keyword are constants, but they are not literals, so how can you say that constants are literals?
Here:
const int MEANING = 42;
the value MEANING is a constant, 42 is a literal. There is no real relationship between the two terms, as can be seen here:
int n = 42;
where n is not a constant, but 42 is still a literal.
The major difference is that a constant may have an address in memory (if you write some code that needs such an address), whereas a literal, with the exception of a string literal, never has an address.
I disagree with the claim "...There wasn't a thing called const in C originally so this was fine." const is actually one of the 32 C keywords. Google to see.
With that rested, I think the man missed something at TP. To be fair to them at Tutorials Point, they had an article that explained the difference thus (full quote, verbatim):
https://www.tutorialspoint.com/questions/category/Cplusplus
A literal is a value that is expressed as itself. For example, the number 25 or the string "Hello World" are both literals.
A constant is a data type that substitutes a literal. Constants are used when a specific, unchanging value is used various times during the program. For example, if you have a constant named PI that you'll be using at various places in your program to find the area, circumference, etc of a circle, this is a constant as you'll be reusing its value. But when you'll be declaring it as:
const float PI = 3.141;
The 3.141 is a literal that you're using. It doesn't have any memory address of its own and just sits in the source code.
Please don't disparage those fellows doing what you call "random tutorials". Kids from poorer homes and the less developed world can't afford your "good C++ textbooks", e.g. Scott Meyers' Effective C++. It is these free online tutorials they can have, and most of these tutorials do better at explaining than the "good books".
By all means read them, guys. Get confused some, then come over here to Stack Overflow or Quora to have your confusion cleared. Happy coding, guys.
The author of the article is confused, and spreading that confusion to others (including you).
In C, literals are "constants". There wasn't a thing called const in C originally so this was fine.
C++ is a different language. In C++, literals are called "literals", and "constant" has a few meanings but generally is a const thing. The two concepts are different (although both kinds of things cannot be mutated after initial creation). We also have compile-time constants via constexpr which is yet another thing.
In general, read a good book rather than random tutorials written by randomers on the internet!
While the first part of the statement makes sense
Constants refer to fixed values that the program may not alter
the continuation
and they are called literals
is not really true.
Neil has already explained the semantic difference between the literal and the constant in his answer. But I would also like to add that the values of constant variables in C++ are not necessarily known at compile time.
#include <iostream>

// x might be obtained at runtime,
// for instance, from user input
void print_square(int x)
{
    const int square = x * x;  // const, but not known at compile time
    std::cout << square << '\n';
}
Literals are values that are known at compile time, which allows the compiler to put them in a separate read-only address space in the resulting binaries.
You can also require your variables to be known at compile time by applying the constexpr keyword (C++11).
constexpr int meaning = 42;
P.S. And I also do agree with a comment suggesting to use a good book instead of tutorialspoint.
If constants are called literals and literals are data represented directly in the code, how can constants be considered as literals?
The article from which you drew the quote is defining the word "constant" to be a synonym of "literal". The latter is the C++ standard's term for what it is describing. The former is what the C standard uses for the same concept.
I mean variables preceded with the const keyword are constants, but they are not literals, so how can you say that constants are literals?
And there you are providing an alternative definition for the term "constant", which, you are right, is inconsistent with the other. That's all. TP is using a different definition of the term than the one you are used to.
In truth, although the noun usage of "constant" appears in a couple of places in the C++ standard outside the defined term "null pointer constant", apparently with the meaning you propose here, I do not find an actual definition of that term, and especially not one matching yours. In truth, your definition is less plausible than TutorialsPoint's, because an expression having const-qualified type can nevertheless designate an object that is modifiable (via a different expression).
Constant is simply a variable declared constant by keyword 'const' whose value after being declared shouldn't be altered during the course of the program (and if tried to alter it will result in an error).
On the other hand, literal is simply what is used and represented as it is typed in. For example, 25 when used in an expression (x+4*y+25) will be termed as literal.
Whenever we use String values or directly supply it in double quotes ("hello"), then that value in double quotes is called literal.
For example, printf("This is literal");
And if you are assigning a string value to a variable, then thereafter you refer to the variable (which could be declared constant if desired) and not to the value stored in it. That is, the value (of string type or any other type) is called a literal only at the point where you supply it to the variable; after that, the variable is what is talked about when referring to that value.
Once again, the value(25) in expression : x+4*y+25 is literal.
The value(4) in the term 4*y is also a literal (since it is exactly as we see it and is known to compiler beforehand).
--> The value(4) in the term 4*y is called numerical coefficient in algebraic terms and y is called literal coefficient in algebraic terms.
All the above explanation I have given is in computer terms only. The meanings of literals and constants in algebra are somewhat different from those used in computer terms.
"Constants refer to fixed values that the program may not alter and they are called literals. (Source)"
The sentence construction is weird which is leading to the confusion.
Here, the "they" refers to the fixed values and not the constants. I would phrase it as "Constants refer to fixed values, that the program may not alter, called literals.", which is less confusing, I hope.
Constants are variables that can't vary, whereas Literals are literally numbers/letters that indicate the value of a variable or constant.
I can explain it this way.
Basically, constants are variables whose value cannot change.
Literals are notations that represent fixed values. These values can be strings, numbers, etc.
Literals can be assigned to variables
Code :
var a = 10;
var name = "Simba";
const pi = 3.14;
Here a and name are variables. pi is a constant. ( Constants are those variables whose value doesn't change. )
Here 10, "Simba" and 3.14 are literals.

Is there any C++ style guide that talks about numeric literal suffixes?

In all of the C++ style guides I have read, I never have seen any information about numerical literal suffixes (i.e. 3.14f, 0L, etc.).
Questions
Is there any style guide out there that talks about their usage, or is there a general convention?
I occasionally encounter the f suffix in graphics programming. Is there any trend in their usage by programming domain?
The only established convention (somewhat established, anyway) of which I'm aware is to always use L rather than l, to avoid its being mistaken for a 1. Beyond that, it's pretty much a matter of using what you need when you need it.
Also note that C++ 11 allows user-defined literals with user-defined suffixes.
There is no general style guide that I've found. I use capital letters and I'm picky about using F for float literals and L for long double. I also use the appropriate suffixes for integral literals.
I assume you know what these suffixes mean: 3.14F is a float literal, 12.345 is a double literal, 6.6666L is a long double literal.
For integers: U is unsigned, L is long, LL is long long. Order between U and the Ls doesn't matter but I always put UL because I declare such variables unsigned long for example.
If you assign a variable of one type a literal of another type, or supply a numeric literal of one type for function argument of another type a cast must happen. Using the proper suffix avoids this and is useful along the same lines as static_cast is useful for calling out casts. Consistent usage of numeric literal suffixes is good style and avoids numeric surprises.
People differ on whether lower or upper case is best. Pick a style that looks good to you and be consistent.
The CERT C Coding Standard recommends using uppercase letters:
DCL16-C. Use "L," not "l," to indicate a long value
Lowercase letter l (ell) can easily be confused with the digit 1 (one). This can be particularly confusing when indicating that an integer literal constant is a long value. This recommendation is similar to DCL02-C. Use visually distinct identifiers.
Likewise, you should use uppercase LL rather than lowercase ll when indicating that an integer literal constant is a long long value.
MISRA C++ 2008 for the C++03 language states in rule M2-13-3 (at least, as cited by this Autosar document) that
A “U” suffix shall be applied to all octal or hexadecimal integer literals of unsigned type.
The linked document also compares to JSF-AV 2005 and HIC++v4.0; all four of these standards require the suffixes to be uppercase.
Nevertheless I can't find a rule (but I don't have a hardcopy of MISRA C++ at hand) that states that the suffixes shall be used whenever needed. However, IIRC there is one in MISRA C++ (or maybe was it just my former company coding guidelines…)
Web search for "c++ numeric suffixes" returns:
http://cpp.comsci.us/etymology/literals.html
http://www.cplusplus.com/forum/general/27226/
http://bytes.com/topic/c/answers/758563-numeric-constants
Are these what you're looking for?

Why doesn't this character conversion work?

Visual Studio 2008
Project compiled as multibyte character set
LPWSTR lpName[1] = {(WCHAR*)_T("Setup")};
After this conversion, lpName[0] contains garbage (at least when previewed in VS)
LPWSTR is typedef'd as follows:
typedef __nullterminated WCHAR *NWPSTR, *LPWSTR, *PWSTR;
It's an expanded version of my comment above.
The code shown casts a pointer of type A to a pointer of type B. This is a low-level, machine-dependent operation. It almost never works as a conversion of an object of type A to an object of type B, especially if one type is a regular character type and the other is wide characters.
Imagine that you take a French book, and read it aloud as if it was written in English.
FRENCH* book;
readaloud ((ENGLISH*) book);
You will mostly hear gibberish. The letters used in the two languages are the same (or similar, at any rate), but the rules of the two languages are totally different. The representation is the same for both languages, but the meaning is not.
This is very similar to what we have here. Whatever type you have, bits and bytes are the same, but the rules are totally different. You take bits laid out according to regular character rules, and try to interpret them according to wide character rules. It doesn't work. The representation is the same in both cases, but the meaning is not.
To convert one character flavor to another, you in general need a lookup table or some other means to convert each character from one type to the other — change representation, but keep the meaning. Likewise, to convert a French book into an English book, you need to use a big lookup table a.k.a. dictionary... well, the analogy breaks here, because there's no formal set of conversion rules, you need to be creative! But you get the idea.
The rules of C++ actually prohibit such casts. You can only cast an object type pointer to void*, and only use the result to cast it back to the original object type. Everything else is a no-no (unless you are willing to venture into the realm of undefined behavior).
So what should you do?
Pick one character variant and stick to it.
If you must convert between flavors, do so with a library function.
Try to avoid pointer casts, they almost always signal trouble.
I think what you're looking for is
LPTSTR lpName[1] = {_T("Setup")};
The various typedefs with a T in them (e.g. TSTR, LPTSTR) are dependent on whether you use Unicode or multi-byte or whatever else. By using these, you should be able to write code that works in whatever encoding you are using (i.e., tomorrow you could switch to ASCII, and a large portion of your code should still work).
Edit
If you are in a situation where you really must convert between encodings, then there are various conversion functions available, such as wcstombs (or Microsoft's documentation) and mbstowcs. These are declared in <cstdlib>.

What is the type of string literals in C and C++?

What is the type of string literal in C? Is it char * or const char * or const char * const?
What about C++?
In C the type of a string literal is a char[] - it's not const according to the type, but it is undefined behavior to modify the contents. Also, 2 different string literals that have the same content (or enough of the same content) might or might not share the same array elements.
From the C99 standard 6.4.5/5 "String Literals - Semantics":
In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; for wide string literals, the array elements have type wchar_t, and are initialized with the sequence of wide characters...
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
In C++, "An ordinary string literal has type 'array of n const char'" (from 2.13.4/1 "String literals"). But there's a special case in the C++ standard that makes pointer to string literals convert easily to non-const-qualified pointers (4.2/2 "Array-to-pointer conversion"):
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; a wide string literal can be converted to an rvalue of type “pointer to wchar_t”.
As a side note - because arrays in C/C++ convert so readily to pointers, a string literal can often be used in a pointer context, much as any array in C/C++.
Additional editorializing: what follows is really mostly speculation on my part about the rationale for the choices the C and C++ standards made regarding string literal types. So take it with a grain of salt (but please comment if you have corrections or additional details):
I think that the C standard chose to make string literals non-const types because there was (and is) so much code that expects to be able to use non-const-qualified char pointers that point to literals. When the const qualifier got added (which if I'm not mistaken was done around ANSI standardization time, but long after K&R C had been around to accumulate a ton of existing code) if they made pointers to string literals only able to be assigned to char const* types without a cast nearly every program in existence would have required changing. Not a good way to get a standard accepted...
I believe the change to C++ that string literals are const qualified was done mainly to support allowing a literal string to more appropriately match an overload that takes a "char const*" argument. I think that there was also a desire to close a perceived hole in the type system, but the hole was largely opened back up by the special case in array-to-pointer conversions.
Annex D of the standard indicates that the "implicit conversion from const to non-const qualification for string literals (4.2) is deprecated", but I think so much code would still break that it'll be a long time before compiler implementers or the standards committee are willing to actually pull the plug (unless some other clever technique can be devised - but then the hole would be back, wouldn't it?).
A C string literal has type char [n] where n equals number of characters + 1 to account for the implicit zero at the end of the string.
The array will be statically allocated; it is not const, but modifying it is undefined behaviour.
If it had pointer type char * or incomplete type char [], sizeof could not work as expected.
Making string literals const is a C++ idiom and not part of any C standard.
They used to be of type char[]. Now they are of type const char[].
For various historical reasons, string literals were always of type char[] in C.
Early on (in C90), it was stated that modifying a string literal invokes undefined behavior.
They didn't ban such modifications though, nor did they make string literals const char[] which would have made more sense. This was for backwards-compatibility reasons with old code. Some old OS (most notably DOS) didn't protest if you modified string literals, so there was plenty of such code around.
C still has this defect today, even in the most recent C standard.
C++ inherited the very same defect from C, but in later C++ standards, they have finally made string literals const (flagged obsolete in C++03, finally fixed in C++11).

(c/c++) do copies of string literals share memory in TEXT section?

If I call a function like
myObj.setType("fluid");
many times in a program, how many copies of the literal "fluid" are saved in memory? Can the compiler recognize that this literal is already defined and just reference it again?
This has nothing to do with C++ (the language). Instead, it is an "optimization" that a compiler can do. So, the answer is yes and no, depending on the compiler/platform you are using.
@David This is from the latest draft of the language:
§ 2.14.6 (page 28)
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined. The effect of attempting to modify a string literal is undefined.
The emphasis is mine.
In other words, string literals in C++ are immutable because modifying a string literal is undefined behavior. So, the compiler is free to eliminate redundant copies.
BTW, I am talking about C++ only ;)
Yes, it can. Of course, it depends on the compiler. For VC++, it's even configurable:
http://msdn.microsoft.com/en-us/library/s0s0asdt(VS.80).aspx
Yes it can, but there's no guarantee that it will. Define a constant if you want to be sure.
This is a compiler implementation issue. Many compilers that I have used have an option to share or merge duplicate string literals. Allowing duplicate string literals speeds up the compilation process but produces larger executables.
I believe that in C/C++ there is no specified handling for that case, but in most cases the compiler would use multiple definitions of that string.
2.13.4/2: "whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined".
This permits the optimisation you're asking about.
As an aside, there may be a slight ambiguity, at least locally within that section of the standard. The definition of string literal doesn't quite make clear to me whether the following code uses one string literal twice, or two string literals once each:
const char *a = "";
const char *b = "";
But the next paragraph says "In translation phase 6 adjacent narrow string literals are concatenated". Unless it means to say that something can be adjacent to itself, I think the intention is pretty clear that this code uses two string literals, which are concatenated in phase 6. So it's not one string literal twice:
const char *c = "a" "a";
Still, if you did read that "a" and "a" are the same string literal, then the standard requires the optimisation you're talking about. But I don't think they are the same literal, I think they're different literals that happen to consist of the same characters. This is perhaps made clear elsewhere in the standard, for instance in the general information on grammar and parsing.
Whether it's made clear or not, many compiler-writers have interpreted the standard the way I think it is, so I might as well be right ;-)