Pointers in CPP, confusion regarding an example program - c++

Working my way through accelerated c++. There's an example where there are multiple things that I do not understand.
double grade = 88;
static const double numbers[] = { 97,94,90,87,84,80,77,74,70,60,0 };
static const char* const letters[] = { "A+","A","A-","B+","B","B-","C+","C","C-","D","F" };
static const size_t ngrades = sizeof(numbers) / sizeof(*numbers);
for (size_t i = 0;i < ngrades;++i) {
if (grade >= numbers[i]) {
cout << letters[i];
break;
}
}
I don't understand what's going on with static const char* const letters[] = (...). First of all, I always thought a char was a single character delimited by '. Single or more characters delimited by " are for me a string.
The way I've understood pointers is that they're a value that represents the address of an object, although this would be initialized as int* p=&x;. They have the advantage of being able to be used like an iterator (kind of). But I really do not get what is going on here, we declare a pointer letters that gets assigned to it an array of values (not addresses), what does that mean? What would be a reason for doing this?
I know what static is in java, is the meaning similar in CPP? The author writes it means that the compiler will initialize the static values only once. But isn't that done with every variable within a certain scope? I've noticed in debug that I seem to skip over the values after having executed it the first time. But that would imply that even after my program is finished running these static values are still saved? That doesn't seem logical to me.

Regarding your first question, a string literal (like e.g. "A+") is an (read-only) array of characters, and as all arrays they can decay to pointers to their first element, i.e. a pointer to char. The variable letters is an array of constant pointers (the pointer in the array can't be changed) to characters that are constant.
For the third questions, what static means is different depending on which scope you declare the variable in. It's a linkage specifier when used in the global scope, and means that the variable (or function) will not be exported from the translation unit. If use for a variable in local scope (i.e. inside a function) then variable will be shared between invocations of the function, i.e. all calls to the function will have the same variable with the the value of it being kept between calls. Declaring a class-member as static means that it's shared between all object instances of the class.

1) Here
static const char* const letters[] = (...)
letters is actually array of const pointers to const characters. Hence the "".
2) Like said, above variable is array of pointers. So each
element in the array holds address to the string literal
defined in the array. So letters[0] holds address of memory
where "A+" is stored.
3) static has various uses in C++. In your case if its declared inside
function, its value is preserved between successive calls to that function.. More details.

Those are not characters but strings, so your understanding is correct. Letters here are not stored as char but in the form of const char*.
we declare a pointer letters that gets assigned to it an array of values (not addresses)
those are not pointer letters but literal strings, "a" is a literal, its type is const char[], which decays to const char* - which means its a poitner.
3.
I know what static is in java, is the meaning similar in CPP
in general yes, but there are differences - like you cant use static inside functions in java while you can in c++, also you can have global static variables in c++.
The author writes it means that the compiler will initialize the static values only once. But isn't that done with every variable within a certain scope?
non static variables will be created on stack and default initialized (if no explicit initialization is done) on each function run. static variables on the other hand will be initialized on the first run only.
But that would imply that even after my program is finished running these static values are still saved?
thats not true, after program is done ,they are freed - your process is dead then

static const char* const letters[]
This is an array of pointers to characters. In this case the initializer list sets each pointer to the first character of each of the strings specified in the initializer list.
const ... const
The pointers and the characters the pointers point to are constants.
static
If declared inside a function, similar to a global, but with local scope.
Links:
http://msdn.microsoft.com/en-us/library/s1sb61xd.aspx
http://en.wikipedia.org/wiki/Static_(keyword)

Related

May a compiler store function-scoped, non-static, const arrays in constant data and avoid per-call initialization?

In reading How are char arrays / strings stored in binary files (C/C++)?, I was thinking about the various ways in which the raw string involved, "Nancy", would appear intact in the resulting binary. That post's case was:
int main()
{
char temp[6] = "Nancy";
printf("%s", temp);
return 0;
}
and obviously, in the general case (where the compiler can't confirm if temp is unmutated), it must actually initialize a stack local array to allow for mutations in the future; the array itself must have space allocated (on the stack, or maybe using registers for truly weird architectures), and it must be populated on each call to the function (let's pretend this isn't main which is called only once in C++ and typically only once in C), to avoid reentrancy issues and the like. Whether it hardcodes the initialization into the assembly, or does a memcpy from the program's constant data section is irrelevant; there is definitely something that must be initialized per-call.
By contrast, if char temp[6] = "Nancy"; was replaced with any of:
const char *temp = "Nancy";
char *temp = "Nancy"; (C only; in C++ the literals are const char[], though in practice they're not mutable in C either)
static const char temp[6] = "Nancy";
static char temp[6] = "Nancy";
then the program need not allocate any array-length-based resources per call (just a pointer variable in cases #1 & #2), and in all but case #4, it can put the data in read-only memory baked into the binary's data constants (#4 would put it in the section for read-write memory, but it could still be baked into the binary and loaded copy-on-write).
My question: Does the standard provided leeway for const char temp[6] = "Nancy"; to behave equivalently to static const char temp[6] = "Nancy";? Both are immutable, and modifying them is against the rules. The only differences I'm aware of would be:
Without static, you'd expect the array's address to be colocated with other locals, not in some other part of program memory (could have affects on cache performance)
Without static, you're technically saying the variable is created and destroyed on each call
I don't see anything obviously broken in terms of observable behavior by the standard:
You can't watch the array exist and cease to exist except in terms of undefined behavior, e.g. returning a pointer to temp, where there are no guarantees
You can't legally compute ptrdiff_t for unrelated variables (only within a given array, plus the one-past-the-end virtual element of said array)
so I'd think the compiler could safely "treat as static" for this case by as-if rules; there's no way to observe the difference, so it can do whatever it feels best.
Is there anything I'm missing where either the C or C++ standard would require some sort of per-call initialization of the const but non-static function scoped array? If the C and C++ standards disagree, I'd like to know that too.
Edit: As Barmar points out in the constants, there are standards-legal ways to detect this behavior in a particular compiler, e.g.:
int myfunc() {
const char temp[6] = "Nancy";
const char temp2[6] = "Nancy";
return temp == temp2; // true if compiler implicitly made them static or combined them, false if not
}
or:
int otherfunc(const char *s) {
const char temp[6] = "Nancy";
return s == temp;
}
int myfunc() {
const char temp[6] = "Nancy";
return otherfunc(temp); // true if compiler implicitly made them shared statics, false if not
}
The standard does not prescribe how local variables are implemented. A stack is a common choice, because it makes recursive functions easy. But leaf functions are easy to detect, and the example is almost a leaf function exact for the side-effect carrying printf.
For such leaf functions, a compiler might choose to implement local variables using statically allocated memory. As the question correctly states, the local variables still need to be constructed and destructed, since they're not static.
In this question, however, char temp[6] has no constructors or destructors. So a compiler which implements local variables in leaf functions as described would have a memcpy to initialize temp.
This memcpy would be visible to the optimizer - it would see the global address, the only use of the same address in printf, and it could then deduce that each memcpy can be moved to program startup. Repeated calls of that same memcpy are idempotent and can be optimized out.
This would cause the generated assembly to be identical to the static case. So the answer to the question is yes. A compiler can indeed generate the same code, and there's even a somewhat plausible way in which it could end up doing so.
Per C11, 6.2.2/6 temp has no linkage, because it is:
a block scope identifier for an object declared without the storage-class specifier extern
and per C11, 6.2.2/2:
each declaration of an identifier with no linkage denotes a unique entity
The "unique entity" implies (I guess) "unique address". Hence, the compiler is required to provide the uniqueness property.
However (speculating), if an optimizer proved that the uniqueness property is not used AND estimated that reading from memory is faster than writing & reading registers (generated code for = "Nancy"), then (I guess) it can make temp to have static storage duration. Note that usually writing & reading registers is much faster than reading from memory.
Extra: temp has block scope, not function scope.
Below the initial answer (which is "out of scope").
C11, 6.8 Statements and blocks, Semantics, 3 (emphasis added):
The initializers of objects that have automatic storage duration, and the variable length array declarators of ordinary identifiers with block scope, are evaluated and the values are stored in the objects (including storing an indeterminate value in objects without an initializer) each time the declaration is reached in the order of execution, as if it were a statement, and within each declaration in the order that declarators appear.
For C++, although I would expect the answer for C to be equivalent:
If the function with the declaration
const char temp[6] = "Nancy";
is entered recursively, then, in contrast to the variant with static, the declaration will cause multiple complete const char[6] objects with overlapping lifetimes to exist.
Applying [intro.object]/9, these objects may then not have overlapping memory and their addresses, as well as the addresses of their array elements, must be distinct. On the other hand with static, there would only be one instance of the array and so taking its address in multiple recursions must yield the same value. This is an observable difference between the version with and without static.
So, if the address of the array or one of its elements is taken or a reference to either formed and escapes the function body, and there are function calls which may potentially be recursive, then the compiler cannot generally treat the declaration with an additional static modifier.
If the compiler can be sure that either e.g. no pointer/reference to the array or its elements escapes the function or that the function cannot possibly be called recursively or that the behavior of the function doesn't depend on the addresses of the array copies, then it could under the as-if rule treat the array as static.
Because the array is a const-qualified automatic storage duration variable, it is impossible to modify values in it or to place new objects into its storage. As long as the addresses are not relevant to the behavior, there is therefore nothing else that could cause an observable difference in behavior.
I don't think anything here is specific to const char arrays. This applies to all const automatic storage duration constant-initialized variables with trivial destruction. constexpr instead of const would not change anything here either, since that doesn't affect the object identity.
Because of [intro.object]/9, both functions myfunc in your edit are also guaranteed to return 0. The two arrays have overlapping lifetimes and therefore may not share the same address. This is therefore not a method to "detect" this optimization. It causes it to become impossible.

Constant Pointers vs Pointers to Constant Strings

I know that if I use a global pointer to a constant string:
const char *h = "hello";
the variable h is stored in the writable data segment. But what if I use
constant pointer to a string
char * const h = "hello";
or a constant pointer to a constant string
const char * const h = "hello";
where would h be stored then?
The c++ language doesn't specify distinction between different areas of storage, other than these both variables have static storage duration. On one system, they might be stored in same area, on another in separate areas.
Given that the variable is const in the latter case, a language implementation might choose to use an area of memory that is protected from being overwritten.
First of all, you keep saying pointer to string, which isn't accurate. They're all pointers to a char, and this char is the beginning of the string.
For your question, where h will be stored when it's a constant pointer ?, It will be stored in a readonly part of your RAM. just like any constant variable like const int.

Static constant character pointer and why it's used in this fashion

static const char* const test_script = "test_script";
When and why would you use a statement such as the one above?
Does it have any benefits?
Why use a char* instead of a constant character?
A "constant Character pointer" (const char*) is a constant already and cannot be changed; Then why use the word static in the front of it? What good does it do?
A const char *p is not a constant pointer. It's a modifiable pointer to a const char, that is, to a constant character. You can make the pointer point to something else, but you cannot change the character(s) to which it points. In other words, p = x is allowed, but *p = y is not.
A char * const is the opposite: a constant pointer to a mutable character. *p = y is allowed, p = x is not.
A const char * const is both: a constant pointer to a constant character.
Regarding static: this gives the declared variable internal linkage (cannot be accessed by name from outside the source file). Since you're asking about both C and C++, note that this is where they differ.
In C++, a variable declared const and not explicitly declared extern has internal linkage by default. Since the pointer in question is const (I'm talking about the second const), the static is superfluous in C++ and doesn't do anything.
That's not the case in C, where const variables cannot be used as constant expressions and do not have internal linkage by default. So in C, the static is necessary to give test_script internal linkage.
The above discussion of static assumes the declaration is at file scope (C) or namespace scope (C++). If it's inside a function, the meaning of static changes. Without static, it would a normal local variable in the function—each invocation would have its own copy. With static, it receives static storage duration and thus persists between function calls—all invocations share that one copy. Since you're asking about both C and C++, I am not going to discuss class scope.
Finally, you asked "why pointer instead of characters." This way, the pointer points to the actual string literal (probably somewhere in the read-only part of the process's memory). One reason to do that would be if you even need to pass test_script somewhere where a const char * const * (a pointer to a constant pointer to constant character(s)) is expected. Also, if the same string literal appears multiple times in source code, it can be shared.
An alternative would be to declare an array:
const char test_script[] = "test_script";
This would copy the string literal into test_script, guaranteeing that it has its own copy of the data. Then you can learn the length at compile-time from sizeof test_script (which includes the terminating NUL). It will also consume slightly less memory (no need for the pointer) if it's the only occurence of that string literal. However, as it will have its own copy of the data, it cannot share the storage of the string literal (if that is used elsewhere in the code as well).

Declaring arrays: name of the array with spaces

I'm trying to understand a part of my professor's code. He gave an example for a hw assignment but I'm not sure how to understand this part of the code..
Here is the code:
void addTask(TaskOrAssignment tasks[], int& taskCount, char *course, char *description, char *dueDate)
{
tasks[taskCount].course = course;
tasks[taskCount].description = description;
tasks[taskCount].dueDate = dueDate;
taskCount++;
}
Question: Is "tasks[taskCount].course = course;" accessing or declaring a location for char course?
I hope I could get this answered and I'm pretty new to this site too.
Thank you.
tasks[taskCount].course = course;
Let's break this down a piece at a time. First of all, we are using the assignment operator (=) to assign a value of one variable to another variable.
The right hand side is pretty simple, just the variable named course which is declared as a char*.
It is assigned to the variable tasks[taskCount].course. If you look at the parameters of the method, you can see that tasks is declared as an array of TaskOrAssignment objects. So tasks[taskCount] refers to one of the elements of this array. The .course at the end refers to a field named course in that object. Assuming that this code compiles, that field is declared as a char* in the TaskOrAssignment class.
Most likely, both course variables represent a string of characters. (This originates from C.) When all is said and done, both course and tasks[taskCount].course point to the same string buffer.
course is a char pointer, it points to the block of memory in stack or heap storing the '\0' terminated string.
tasks[taskCount].course is also a char pointer, the assignment just lets tasks[taskCount].course point to same memory address that course does.
course is a char* (a pointer to char). Presumably, the course member of TaskOrAssignment is also a char*. All that line does is assign the value of course to the member course of the taskCountth element in the array.
Presumably, the course argument is intended to be a pointer to the first character in a null-terminated C-style string. So after this assignment, the course member of the array element will point at that same C-style string. However, of course, the pointer could really be pointing anywhere.

String initializer and read only section

Suppose that I have an array(local to a function) and a pointer
char a[]="aesdf" and char *b="asdf"
My question is whether in the former case the string literal "aesdf" is stored in read only section and then copied on to the local array or is it similar to
char a[]={'a','e','s','d','f','\0'}; ?
I think that in this case the characters are directly created on the stack but in the earlier case (char a[]="aesdf") the characters are copied from the read only section to the local array.
Will `"aesdf" exist for the entire life of the executable?
From the abstract and formal point of view, each string literal is an independent nameless object with static storage duration. This means, that the initialization char a[] = "aesdf" formally creates literal object "aesdf" and then uses it to initialize the independent array a, i.e. it is not equivalent to char *a = "aesdf", where a pointer is made to point to the string literal directly.
However, since string literals are nameless objects, in the char a[] = "aesdf" variant there's no way to access the independent "aesdf" object before or after the initialization. This means that there's no way for you to "detect" whether this object actually existed. The existence (or non-existence) of that object cannot affect the observable behavior of the program. For this reason, the implementation has all the freedom to eliminate the independent "aesdf" object and initialize the a array in any other way that leads to the expected correct result, i.e. as char a[] = { 'a', 'e', 's', 'd', 'f', '\0' } or as char a[] = { 'a', 'e', "sdf" } or as something else.
First:
char a[]="aesdf";
Assuming this is an automatic local variable, it will allocate 6 bytes on the stack and initialize them with the given characters. How it does this (whether by memcpy from a string literal or loading a byte at a time with inline store instructions, or some other way) is completely implementation-defined. Note that initialization must happen every time the variable comes into scope, so if it's not going to change, this is a very wasteful construct to use.
If this is a static/global variable, it will produce a 6-byte char array with a unique address/storage whose initial contents are the given characters, and which is writable.
Next:
char *b="asdf";
This initializes the pointer b to point to a string literal "asdf", which might or might not share storage with other string literals, and which produces undefined behavior if you write to it.
Both a[] ="aesdf" and char a[]={'a','e','s','d','f','\0'} will be stored in function's run time stack and memory will be released when function returns. but for char* b= "asdf" asdf is stored in readonly section and is referred from there.
char a[] = "aesdf";
char a[] = {'a','e','s','d','f','\0'};
These two lines of code have the same effect. A compiler may choose to implement them the same way, or it may choose to implement them differently.
From the standpoint of a programmer writing code in C, it shouldn't really matter. You can use either and be sure that you will end up with a six element array of char initialized to the specified contents.
Will "aesdf" exist for the entire life of the executable?
Semantically, yes. String literals are char arrays that have static storage duration. An object that has static storage duration has a lifetime of the execution of the program: it is initialized before the program starts, and exists until the program terminates.
However, this doesn't matter at all in your program. That string literal is used to initialize array a. Since you do not obtain a pointer to the string literal itself, it doesn't matter what its actual lifetime of that string literal is or how it is actually stored. The compiler may do whatever it sees fit to do, so long as the array a is correctly initialized.