Static constant character pointer and why it's used in this fashion - c++

static const char* const test_script = "test_script";
When and why would you use a statement such as the one above?
Does it have any benefits?
Why use a char* instead of a constant character?
A "constant Character pointer" (const char*) is a constant already and cannot be changed; Then why use the word static in the front of it? What good does it do?

A const char *p is not a constant pointer. It's a modifiable pointer to a const char, that is, to a constant character. You can make the pointer point to something else, but you cannot change the character(s) to which it points. In other words, p = x is allowed, but *p = y is not.
A char * const is the opposite: a constant pointer to a mutable character. *p = y is allowed, p = x is not.
A const char * const is both: a constant pointer to a constant character.
Regarding static: this gives the declared variable internal linkage (cannot be accessed by name from outside the source file). Since you're asking about both C and C++, note that this is where they differ.
In C++, a variable declared const and not explicitly declared extern has internal linkage by default. Since the pointer in question is const (I'm talking about the second const), the static is superfluous in C++ and doesn't do anything.
That's not the case in C, where const variables cannot be used as constant expressions and do not have internal linkage by default. So in C, the static is necessary to give test_script internal linkage.
The above discussion of static assumes the declaration is at file scope (C) or namespace scope (C++). If it's inside a function, the meaning of static changes. Without static, it would a normal local variable in the function—each invocation would have its own copy. With static, it receives static storage duration and thus persists between function calls—all invocations share that one copy. Since you're asking about both C and C++, I am not going to discuss class scope.
Finally, you asked "why pointer instead of characters." This way, the pointer points to the actual string literal (probably somewhere in the read-only part of the process's memory). One reason to do that would be if you even need to pass test_script somewhere where a const char * const * (a pointer to a constant pointer to constant character(s)) is expected. Also, if the same string literal appears multiple times in source code, it can be shared.
An alternative would be to declare an array:
const char test_script[] = "test_script";
This would copy the string literal into test_script, guaranteeing that it has its own copy of the data. Then you can learn the length at compile-time from sizeof test_script (which includes the terminating NUL). It will also consume slightly less memory (no need for the pointer) if it's the only occurence of that string literal. However, as it will have its own copy of the data, it cannot share the storage of the string literal (if that is used elsewhere in the code as well).

Related

May a compiler store function-scoped, non-static, const arrays in constant data and avoid per-call initialization?

In reading How are char arrays / strings stored in binary files (C/C++)?, I was thinking about the various ways in which the raw string involved, "Nancy", would appear intact in the resulting binary. That post's case was:
int main()
{
char temp[6] = "Nancy";
printf("%s", temp);
return 0;
}
and obviously, in the general case (where the compiler can't confirm if temp is unmutated), it must actually initialize a stack local array to allow for mutations in the future; the array itself must have space allocated (on the stack, or maybe using registers for truly weird architectures), and it must be populated on each call to the function (let's pretend this isn't main which is called only once in C++ and typically only once in C), to avoid reentrancy issues and the like. Whether it hardcodes the initialization into the assembly, or does a memcpy from the program's constant data section is irrelevant; there is definitely something that must be initialized per-call.
By contrast, if char temp[6] = "Nancy"; was replaced with any of:
const char *temp = "Nancy";
char *temp = "Nancy"; (C only; in C++ the literals are const char[], though in practice they're not mutable in C either)
static const char temp[6] = "Nancy";
static char temp[6] = "Nancy";
then the program need not allocate any array-length-based resources per call (just a pointer variable in cases #1 & #2), and in all but case #4, it can put the data in read-only memory baked into the binary's data constants (#4 would put it in the section for read-write memory, but it could still be baked into the binary and loaded copy-on-write).
My question: Does the standard provided leeway for const char temp[6] = "Nancy"; to behave equivalently to static const char temp[6] = "Nancy";? Both are immutable, and modifying them is against the rules. The only differences I'm aware of would be:
Without static, you'd expect the array's address to be colocated with other locals, not in some other part of program memory (could have affects on cache performance)
Without static, you're technically saying the variable is created and destroyed on each call
I don't see anything obviously broken in terms of observable behavior by the standard:
You can't watch the array exist and cease to exist except in terms of undefined behavior, e.g. returning a pointer to temp, where there are no guarantees
You can't legally compute ptrdiff_t for unrelated variables (only within a given array, plus the one-past-the-end virtual element of said array)
so I'd think the compiler could safely "treat as static" for this case by as-if rules; there's no way to observe the difference, so it can do whatever it feels best.
Is there anything I'm missing where either the C or C++ standard would require some sort of per-call initialization of the const but non-static function scoped array? If the C and C++ standards disagree, I'd like to know that too.
Edit: As Barmar points out in the constants, there are standards-legal ways to detect this behavior in a particular compiler, e.g.:
int myfunc() {
const char temp[6] = "Nancy";
const char temp2[6] = "Nancy";
return temp == temp2; // true if compiler implicitly made them static or combined them, false if not
}
or:
int otherfunc(const char *s) {
const char temp[6] = "Nancy";
return s == temp;
}
int myfunc() {
const char temp[6] = "Nancy";
return otherfunc(temp); // true if compiler implicitly made them shared statics, false if not
}
The standard does not prescribe how local variables are implemented. A stack is a common choice, because it makes recursive functions easy. But leaf functions are easy to detect, and the example is almost a leaf function exact for the side-effect carrying printf.
For such leaf functions, a compiler might choose to implement local variables using statically allocated memory. As the question correctly states, the local variables still need to be constructed and destructed, since they're not static.
In this question, however, char temp[6] has no constructors or destructors. So a compiler which implements local variables in leaf functions as described would have a memcpy to initialize temp.
This memcpy would be visible to the optimizer - it would see the global address, the only use of the same address in printf, and it could then deduce that each memcpy can be moved to program startup. Repeated calls of that same memcpy are idempotent and can be optimized out.
This would cause the generated assembly to be identical to the static case. So the answer to the question is yes. A compiler can indeed generate the same code, and there's even a somewhat plausible way in which it could end up doing so.
Per C11, 6.2.2/6 temp has no linkage, because it is:
a block scope identifier for an object declared without the storage-class specifier extern
and per C11, 6.2.2/2:
each declaration of an identifier with no linkage denotes a unique entity
The "unique entity" implies (I guess) "unique address". Hence, the compiler is required to provide the uniqueness property.
However (speculating), if an optimizer proved that the uniqueness property is not used AND estimated that reading from memory is faster than writing & reading registers (generated code for = "Nancy"), then (I guess) it can make temp to have static storage duration. Note that usually writing & reading registers is much faster than reading from memory.
Extra: temp has block scope, not function scope.
Below the initial answer (which is "out of scope").
C11, 6.8 Statements and blocks, Semantics, 3 (emphasis added):
The initializers of objects that have automatic storage duration, and the variable length array declarators of ordinary identifiers with block scope, are evaluated and the values are stored in the objects (including storing an indeterminate value in objects without an initializer) each time the declaration is reached in the order of execution, as if it were a statement, and within each declaration in the order that declarators appear.
For C++, although I would expect the answer for C to be equivalent:
If the function with the declaration
const char temp[6] = "Nancy";
is entered recursively, then, in contrast to the variant with static, the declaration will cause multiple complete const char[6] objects with overlapping lifetimes to exist.
Applying [intro.object]/9, these objects may then not have overlapping memory and their addresses, as well as the addresses of their array elements, must be distinct. On the other hand with static, there would only be one instance of the array and so taking its address in multiple recursions must yield the same value. This is an observable difference between the version with and without static.
So, if the address of the array or one of its elements is taken or a reference to either formed and escapes the function body, and there are function calls which may potentially be recursive, then the compiler cannot generally treat the declaration with an additional static modifier.
If the compiler can be sure that either e.g. no pointer/reference to the array or its elements escapes the function or that the function cannot possibly be called recursively or that the behavior of the function doesn't depend on the addresses of the array copies, then it could under the as-if rule treat the array as static.
Because the array is a const-qualified automatic storage duration variable, it is impossible to modify values in it or to place new objects into its storage. As long as the addresses are not relevant to the behavior, there is therefore nothing else that could cause an observable difference in behavior.
I don't think anything here is specific to const char arrays. This applies to all const automatic storage duration constant-initialized variables with trivial destruction. constexpr instead of const would not change anything here either, since that doesn't affect the object identity.
Because of [intro.object]/9, both functions myfunc in your edit are also guaranteed to return 0. The two arrays have overlapping lifetimes and therefore may not share the same address. This is therefore not a method to "detect" this optimization. It causes it to become impossible.

Constant Pointers vs Pointers to Constant Strings

I know that if I use a global pointer to a constant string:
const char *h = "hello";
the variable h is stored in the writable data segment. But what if I use
constant pointer to a string
char * const h = "hello";
or a constant pointer to a constant string
const char * const h = "hello";
where would h be stored then?
The c++ language doesn't specify distinction between different areas of storage, other than these both variables have static storage duration. On one system, they might be stored in same area, on another in separate areas.
Given that the variable is const in the latter case, a language implementation might choose to use an area of memory that is protected from being overwritten.
First of all, you keep saying pointer to string, which isn't accurate. They're all pointers to a char, and this char is the beginning of the string.
For your question, where h will be stored when it's a constant pointer ?, It will be stored in a readonly part of your RAM. just like any constant variable like const int.

Lambda: Is capture of const char * from function scope in lambda undefined behaviour [duplicate]

This question already has answers here:
Do pointers to string literals remain valid after a function returns?
(4 answers)
Closed 3 years ago.
I have a lambda which uses a const char * defined in function scope. The lambda is executed after the function returns.
void function() {
const char * UNLOCK_RESULT = "unlockResult";
..
..
TestStep::verifyDictionary(test, [UNLOCK_RESULT](TestStep::Dictionary& dict){
return 1 == dict[UNLOCK_RESULT].toInt();
});
...
}
If I look at the assembly that GCC generates, the actual char array holding the "unlockResult" is statically allocated. But I was wondering if this is guaranteed by the standard as in theory the const char * is of function scope. Is the capture of const char * undefined behaviour or is this allowed by the standard because of some exception around C++ handling of const char * strings.
I know I can change:
const char * UNLOCK_RESULT=""
to
constexpr const char * UNLOCK_RESULT=""
And then the discussion falls away because I don't even need to capture it, but I am curious about the const char * case.
You are mixing 2 concepts. In function scope is variable UNLOCK_RESULT itself, so if you try to use reference or pointer to the variable then you would get UB. But as you use a value of that variable which is pointer to static data you are totally fine as such pointer is not invalidated after function terminates.
If your question is if a pointer to string literal is valid after function terminates, then yes it is valid and guaranteed by C++ standard to have static duration:
Evaluating a string-literal results in a string literal object with
static storage duration, initialized from the given characters as
specified above. Whether all string literals are distinct (that is,
are stored in nonoverlapping objects) and whether successive
evaluations of a string-literal yield the same or a different object
is unspecified.
(emphasis is mine)
Btw it is an old habit to use identifiers for constants with all uppercase as they used to be defined in preprocessor. When you have compile time identifiers for constants uppercase you would hit exactly the same problem this habit tried to minimize.
The behavior is well-defined.
The lambda [UNLOCK_RESULT](TestStep::Dictionary& dict){...} captures the pointer by-value. So UNLOCK_RESULT inside the lambda is a copy of the original UNLOCK_RESULT, the latter does not have to exist for the duration of the lambda.
Now the string literal "unlockResult" has static storage duration, which means it's allocated at program start and remains available for the duration of the program:
All variables which do not have dynamic storage duration, do not have thread storage duration, and are not local have static storage duration. The storage for these entities shall last for the duration of the program.

Pointers in CPP, confusion regarding an example program

Working my way through accelerated c++. There's an example where there are multiple things that I do not understand.
double grade = 88;
static const double numbers[] = { 97,94,90,87,84,80,77,74,70,60,0 };
static const char* const letters[] = { "A+","A","A-","B+","B","B-","C+","C","C-","D","F" };
static const size_t ngrades = sizeof(numbers) / sizeof(*numbers);
for (size_t i = 0;i < ngrades;++i) {
if (grade >= numbers[i]) {
cout << letters[i];
break;
}
}
I don't understand what's going on with static const char* const letters[] = (...). First of all, I always thought a char was a single character delimited by '. Single or more characters delimited by " are for me a string.
The way I've understood pointers is that they're a value that represents the address of an object, although this would be initialized as int* p=&x;. They have the advantage of being able to be used like an iterator (kind of). But I really do not get what is going on here, we declare a pointer letters that gets assigned to it an array of values (not addresses), what does that mean? What would be a reason for doing this?
I know what static is in java, is the meaning similar in CPP? The author writes it means that the compiler will initialize the static values only once. But isn't that done with every variable within a certain scope? I've noticed in debug that I seem to skip over the values after having executed it the first time. But that would imply that even after my program is finished running these static values are still saved? That doesn't seem logical to me.
Regarding your first question, a string literal (like e.g. "A+") is an (read-only) array of characters, and as all arrays they can decay to pointers to their first element, i.e. a pointer to char. The variable letters is an array of constant pointers (the pointer in the array can't be changed) to characters that are constant.
For the third questions, what static means is different depending on which scope you declare the variable in. It's a linkage specifier when used in the global scope, and means that the variable (or function) will not be exported from the translation unit. If use for a variable in local scope (i.e. inside a function) then variable will be shared between invocations of the function, i.e. all calls to the function will have the same variable with the the value of it being kept between calls. Declaring a class-member as static means that it's shared between all object instances of the class.
1) Here
static const char* const letters[] = (...)
letters is actually array of const pointers to const characters. Hence the "".
2) Like said, above variable is array of pointers. So each
element in the array holds address to the string literal
defined in the array. So letters[0] holds address of memory
where "A+" is stored.
3) static has various uses in C++. In your case if its declared inside
function, its value is preserved between successive calls to that function.. More details.
Those are not characters but strings, so your understanding is correct. Letters here are not stored as char but in the form of const char*.
we declare a pointer letters that gets assigned to it an array of values (not addresses)
those are not pointer letters but literal strings, "a" is a literal, its type is const char[], which decays to const char* - which means its a poitner.
3.
I know what static is in java, is the meaning similar in CPP
in general yes, but there are differences - like you cant use static inside functions in java while you can in c++, also you can have global static variables in c++.
The author writes it means that the compiler will initialize the static values only once. But isn't that done with every variable within a certain scope?
non static variables will be created on stack and default initialized (if no explicit initialization is done) on each function run. static variables on the other hand will be initialized on the first run only.
But that would imply that even after my program is finished running these static values are still saved?
thats not true, after program is done ,they are freed - your process is dead then
static const char* const letters[]
This is an array of pointers to characters. In this case the initializer list sets each pointer to the first character of each of the strings specified in the initializer list.
const ... const
The pointers and the characters the pointers point to are constants.
static
If declared inside a function, similar to a global, but with local scope.
Links:
http://msdn.microsoft.com/en-us/library/s1sb61xd.aspx
http://en.wikipedia.org/wiki/Static_(keyword)

c++ const pointer with operator[]

char* const p = "world";
p[2] = 'l';
The first statement creates a string pointed by a const pointer p, and the second statement tries to modify the string, and it is accepted by the compiler, while in the running time, an access violation exception is poped, and could anyone explain why?
So your question is two-fold:
Why does it give an access violation: character literal strings are stored as literals in the CODE pages of your executable program; most modern operating system do not allow changes to the these pages (including MS-windows) thus the protection fault.
Why does the compiler allow it: the const keyword in this context refers to the pointer and not the thing it points at. Code such as p="Hello"; will cause a compiler error as you have declared p as constant (not *p). If you wanted to declare the thing it points to as constant then your declaration should be const char *p.
In the
char* const p = "World";
P points to const character array, which resides in the .rodata memory area. So, one can not modify the data pointed by the p variable and one cannot change the p to point to some other string as well.
char* const p = "world";
This is illegal in the current C++ standard (C++11). Most compilers still accept it because they use the previous C++ standard (C++03) by default, but even there the code is deprecated, and a good compiler with the right warning level should warn about this.
The reason is that the type of the literal "world" is char const[6]. In other words, a literal is always constant and cannot be changed. When you say …
char* const p = "world";
… then the compiler converts the literal into a pointer. This is done implicitly by an operation called “array decay”: a C array can be implicitly converted into a pointer that points to its beginning.
So "world" is converted into a value of type char const*. Notice the const – we still are not allowed to change the literal, even when accessed through the pointer.
Alas, C++03 also allows that literals be assigned to a non-const pointer to provide backwards compatibility with C.
Since this is an interview question, the correct answer is thus: the code is illegal and the compiler shouldn’t allow it. Here’s the corrected code:
char const* const p = "world";
//p[2] = 'l'; // Not allowed!
We’ve used two consts here: the first is required for the literal. The second makes the pointer itself (rather than the pointed-to value) const.
The thing is if you define a string literal like this:
char * p = "some text"
the compiler will prepare memory for pointer only, and the text location
will be by default set to .text section
in the other hand if you define it as:
char p[] = "some text"
the compiler will know that you want the memory for entire character array.
In the first case you can't read the value as the MMU is set to readonly access for the .text section of memory, in the 2nd case you can freely access and modify memory.
Another think (to prevent the runtime error) would be correct describing of the memory the pointer points to.
For the constant pointer it would be:
const char * const p = "blaaaah"
and for the normal one:
const char * p = "blaah"
char* const p = "world";
Here p is a pointer with constant memory address. So this will compile fine as there is no syntax error as per compiler and it throws exception in runtime bcoz as its a constant pointer you can change its value.
C++ reference constant pointers
Const correctness