Scope of variably sized array - c++

Is this always going to run as expected?
char *x;
if (...) {
    int len = dynamic_function();
    char x2[len];
    sprintf(x2, "hello %s", ...);
    x = x2;
}
printf("%s\n", x);
// prints hello
How does the compiler (GCC in my case) implement variably sized arrays, in each of C and C++?

No. x2 is local to the if statement's scope and you access it outside of it using a pointer. This results in undefined behaviour.
By the way, VLAs were made optional in C11 and have never been part of C++, so it's better to avoid them.

The scope is explained here:
Jumping or breaking out of the scope of the array name deallocates the
storage. Jumping into the scope is not allowed; you get an error
message for it.
In your case the array is out of scope.

No, for two separate reasons:
C++: The code isn't valid C++. Arrays in C++ must have a compile-time constant size.
C: No, because the array only lives until the end of the block in which it was declared, and thus dereferencing x is undefined behaviour.
From C11, 6.2.4/2:
If an object is referred to outside of its lifetime, the behavior is undefined.
And 6.2.4/7 says that the variable-length array lives from its declaration until the end of its enclosing scope:
For such an object that does have a variable length array type, its lifetime extends from
the declaration of the object until execution of the program leaves the scope of the
declaration.
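One way to avoid the dangling pointer, sketched here in C++ on the assumption that the goal is just to build the string inside the block and use it afterwards: keep an owning std::string in the outer scope and assign to it inside the if. The name make_greeting and its parameters are hypothetical; they stand in for the condition and arguments elided in the question.

```cpp
#include <string>

// Hypothetical stand-in for the question's elided condition and arguments.
std::string make_greeting(bool condition, const std::string& name) {
    std::string x;               // owns its storage and outlives the if block
    if (condition) {
        x = "hello " + name;     // length decided at run time, no VLA involved
    }
    return x;                    // returned by value, so nothing dangles
}
```

Because the string owns its buffer, the size can still be decided at run time, but the lifetime is no longer tied to the inner block.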


May a compiler store function-scoped, non-static, const arrays in constant data and avoid per-call initialization?

In reading How are char arrays / strings stored in binary files (C/C++)?, I was thinking about the various ways in which the raw string involved, "Nancy", would appear intact in the resulting binary. That post's case was:
#include <stdio.h>

int main()
{
    char temp[6] = "Nancy";
    printf("%s", temp);
    return 0;
}
and obviously, in the general case (where the compiler can't confirm if temp is unmutated), it must actually initialize a stack local array to allow for mutations in the future; the array itself must have space allocated (on the stack, or maybe using registers for truly weird architectures), and it must be populated on each call to the function (let's pretend this isn't main which is called only once in C++ and typically only once in C), to avoid reentrancy issues and the like. Whether it hardcodes the initialization into the assembly, or does a memcpy from the program's constant data section is irrelevant; there is definitely something that must be initialized per-call.
By contrast, if char temp[6] = "Nancy"; was replaced with any of:
const char *temp = "Nancy";
char *temp = "Nancy"; (C only; in C++ the literals are const char[], though in practice they're not mutable in C either)
static const char temp[6] = "Nancy";
static char temp[6] = "Nancy";
then the program need not allocate any array-length-based resources per call (just a pointer variable in cases #1 & #2), and in all but case #4, it can put the data in read-only memory baked into the binary's data constants (#4 would put it in the section for read-write memory, but it could still be baked into the binary and loaded copy-on-write).
My question: Does the standard provide leeway for const char temp[6] = "Nancy"; to behave equivalently to static const char temp[6] = "Nancy";? Both are immutable, and modifying them is against the rules. The only differences I'm aware of would be:
Without static, you'd expect the array's address to be colocated with other locals, not in some other part of program memory (could have effects on cache performance)
Without static, you're technically saying the variable is created and destroyed on each call
I don't see anything obviously broken in terms of observable behavior by the standard:
You can't watch the array exist and cease to exist except in terms of undefined behavior, e.g. returning a pointer to temp, where there are no guarantees
You can't legally compute ptrdiff_t for unrelated variables (only within a given array, plus the one-past-the-end virtual element of said array)
so I'd think the compiler could safely "treat as static" for this case by as-if rules; there's no way to observe the difference, so it can do whatever it feels best.
Is there anything I'm missing where either the C or C++ standard would require some sort of per-call initialization of the const but non-static function scoped array? If the C and C++ standards disagree, I'd like to know that too.
Edit: As Barmar points out in the comments, there are standards-legal ways to detect this behavior in a particular compiler, e.g.:
int myfunc() {
    const char temp[6] = "Nancy";
    const char temp2[6] = "Nancy";
    return temp == temp2; // true if compiler implicitly made them static or combined them, false if not
}
or:
int otherfunc(const char *s) {
    const char temp[6] = "Nancy";
    return s == temp;
}
int myfunc() {
    const char temp[6] = "Nancy";
    return otherfunc(temp); // true if compiler implicitly made them shared statics, false if not
}
The standard does not prescribe how local variables are implemented. A stack is a common choice, because it makes recursive functions easy. But leaf functions are easy to detect, and the example is almost a leaf function, except for the side-effect-carrying printf.
For such leaf functions, a compiler might choose to implement local variables using statically allocated memory. As the question correctly states, the local variables still need to be constructed and destructed, since they're not static.
In this question, however, char temp[6] has no constructors or destructors. So a compiler which implements local variables in leaf functions as described would have a memcpy to initialize temp.
This memcpy would be visible to the optimizer - it would see the global address, the only use of the same address in printf, and it could then deduce that each memcpy can be moved to program startup. Repeated calls of that same memcpy are idempotent and can be optimized out.
This would cause the generated assembly to be identical to the static case. So the answer to the question is yes. A compiler can indeed generate the same code, and there's even a somewhat plausible way in which it could end up doing so.
Per C11, 6.2.2/6 temp has no linkage, because it is:
a block scope identifier for an object declared without the storage-class specifier extern
and per C11, 6.2.2/2:
each declaration of an identifier with no linkage denotes a unique entity
The "unique entity" implies (I guess) "unique address". Hence, the compiler is required to provide the uniqueness property.
However (speculating), if an optimizer proved that the uniqueness property is not used AND estimated that reading from memory is faster than writing & reading registers (generated code for = "Nancy"), then (I guess) it can give temp static storage duration. Note that usually writing & reading registers is much faster than reading from memory.
Extra: temp has block scope, not function scope.
Below the initial answer (which is "out of scope").
C11, 6.8 Statements and blocks, Semantics, 3 (emphasis added):
The initializers of objects that have automatic storage duration, and the variable length array declarators of ordinary identifiers with block scope, are evaluated and the values are stored in the objects (including storing an indeterminate value in objects without an initializer) each time the declaration is reached in the order of execution, as if it were a statement, and within each declaration in the order that declarators appear.
For C++, although I would expect the answer for C to be equivalent:
If the function with the declaration
const char temp[6] = "Nancy";
is entered recursively, then, in contrast to the variant with static, the declaration will cause multiple complete const char[6] objects with overlapping lifetimes to exist.
Applying [intro.object]/9, these objects may then not have overlapping memory and their addresses, as well as the addresses of their array elements, must be distinct. On the other hand with static, there would only be one instance of the array and so taking its address in multiple recursions must yield the same value. This is an observable difference between the version with and without static.
So, if the address of the array or one of its elements is taken or a reference to either formed and escapes the function body, and there are function calls which may potentially be recursive, then the compiler cannot generally treat the declaration with an additional static modifier.
If the compiler can be sure that either e.g. no pointer/reference to the array or its elements escapes the function or that the function cannot possibly be called recursively or that the behavior of the function doesn't depend on the addresses of the array copies, then it could under the as-if rule treat the array as static.
Because the array is a const-qualified automatic storage duration variable, it is impossible to modify values in it or to place new objects into its storage. As long as the addresses are not relevant to the behavior, there is therefore nothing else that could cause an observable difference in behavior.
I don't think anything here is specific to const char arrays. This applies to all const automatic storage duration constant-initialized variables with trivial destruction. constexpr instead of const would not change anything here either, since that doesn't affect the object identity.
Because of [intro.object]/9, both functions myfunc in your edit are also guaranteed to return 0. The two arrays have overlapping lifetimes and therefore may not share the same address. This is therefore not a method to "detect" this optimization. It causes it to become impossible.
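The recursion argument above can be checked with a small sketch. The function names are hypothetical. Each function passes the address of its array into a nested call of itself and compares it with the nested call's own array while both are alive, so the automatic version has two distinct complete objects and the static version has only one.

```cpp
// Returns true if the array in the outer call and the array in the nested
// call have the same address. Both arrays are alive at the comparison.
bool same_address_automatic(const char* outer = nullptr) {
    const char temp[6] = "Nancy";
    if (outer == nullptr) {
        return same_address_automatic(temp);  // address escapes into a nested call
    }
    return outer == temp;                     // two live objects: must differ
}

bool same_address_static(const char* outer = nullptr) {
    static const char temp[6] = "Nancy";
    if (outer == nullptr) {
        return same_address_static(temp);
    }
    return outer == temp;                     // one object: same address
}
```

Calling same_address_automatic() should yield false and same_address_static() should yield true, which is the observable difference that stops the compiler from treating the array as static once its address escapes.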

Does std::construct_at make an array member of a union active?

Look at this example (godbolt):
#include <memory>
union U {
    int i[1];
};
constexpr int foo() {
    U u;
    std::construct_at(u.i, 1);
    return u.i[0];
}
constexpr int f = foo();
gcc and msvc successfully compile this, but clang complains:
construction of subobject of member 'i' of union with no active member is not allowed in a constant expression
Which compiler is right? I think that clang is wrong here, because C++20's implicit creation of objects (P0593) should make this program valid (because the array should be implicitly created, which should make u.i active), but I'm not sure.
U u;
does not begin the lifetime of the i subobject. Beginning the lifetime of a variable other than an array of type char, unsigned char or std::byte is also not one of the operations specifically qualified to be implicitly creating objects. [basic.intro.object]/13
Therefore at this point the i member is definitively not active and the array object is not alive.
As mentioned by @Sebastian in the question comments, calling std::construct_at on u.i is then not allowed in a constant expression since [expr.const]/6.1 specifically requires the provided pointer to point to an object whose lifetime began during the evaluation of the constant expression (or be storage returned from std::allocator).
Therefore Clang seems correct to me. There is an open GCC bug for exactly this issue here.
I am not sure that this is the intended interpretation though, since Clang does accept the program if a non-array type is used for the member, which by my reasoning would equally not be allowed.
The relevant wording is a consequence of this comment.
In any case, it is not intended that implicit object creation happens in constant expressions although it currently seems to (question), see CWG issue 2469.
Without implicit object creation as explained below, the use in a context requiring a constant expression should then be ill-formed independently of the std::construct_at restriction and the following considerations.
Whether the construction has defined behavior if used outside a constant expression context, I am not entirely sure.
But I think that std::construct_at being specified to be equivalent to a new-expression means that it will call operator new, which is specified to implicitly create objects in the storage it returns. [basic.intro.object]/13
Whether operator new must be an allocating operator new call for this to be true is not fully clear to me. I think the wording "in the returned region of storage" does not require it.
i is of type int[1], which is an implicit-lifetime type, which are implicitly created if necessary by operations qualified to implicitly create objects. [basic.types.general]/9
Therefore I think that construct_at will implicitly create an array object at u.i and begin its lifetime. I also think that [basic.intro.object]/2 will guarantee that this object becomes a subobject of the union, so that u.i will refer to it.
However, given that the storage operated on is only the size of a single int and assuming that this is also the storage meant in [basic.intro.object]/13, only an array of length 1 can be implicitly created in it. Therefore if i was of length larger than 1, the implicitly created array could not overlap exactly with the member and can therefore not become a subobject of the union.
In this case implicit object creation could not make return u.i[0]; defined behavior.
There is a discussion of this issue here which seems to indicate that already forming the pointer to the first element of u.i outside its lifetime is UB, in which case the construct_at version with array would more directly have UB, but at least compilers accept both auto x = u.i; and auto x = &u.i[0]; in a constant expression without complaining. As mentioned in the comments to this answer, this also seems wrong.
All in all I think that std::construct_at can generally not be used to activate an array member of a union.
But, suppose you replace the std::construct_at call with
u.i[0] = 1;
Then this assignment will begin the lifetime of the array object, as described in [class.union.general]/6. This is not disqualified for constant expressions since C++20 either. Therefore the code will not be ill-formed if used in a context requiring a constant expression, nor will it have undefined behavior outside of that.
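A minimal sketch of the assignment-based variant, reusing the union from the question. It is written as an ordinary runtime function so that it does not depend on C++20 constant-evaluation support; the function name activate_by_assignment is hypothetical.

```cpp
union U {
    int i[1];
};

int activate_by_assignment() {
    U u;          // no member is active at this point
    u.i[0] = 1;   // assignment through the subscript starts the array's lifetime
    return u.i[0];
}
```

Per the reasoning above, marking this function constexpr should also be acceptable from C++20 on, since the assignment form is the one covered by [class.union.general]/6.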
Deferred initialization of an array in a constexpr environment can be achieved with std::allocator and std::construct_at():
#include <memory>
constexpr int foo() {
    std::allocator<int> alloc;
    int* i; // pointer to first element of array
    i = alloc.allocate(100); // allocate memory for 100 elements
    std::construct_at(&i[0], 1); // initialize first element (first call of constructor)
    int r = i[0];
    alloc.deallocate(i, 100); // deallocate before leaving
    return r;
}
constexpr int f = foo();
Pointers to the relevant standard clauses:
allocate:
[utilities.memory.default.allocator.members]/5: std::allocator<>::allocate() obtains storage by calling ::operator new and starts the lifetime of the array object, but not the lifetime of the array elements themselves.
[expr.const]/5.19: Explicitly allows std::allocator<>::allocate() in constant expressions, if the memory is deallocated again within the constant expression
construct_at:
[algorithms.specialized.construct]/2: std::construct_at() effectively calls placement new.
[expr.const]/6: Explicitly allows std::construct_at in constant expressions, if the memory is allocated by std::allocator
deallocate:
[expr.const]/5.19: Explicitly allows std::allocator<>::deallocate() in constant expressions, if the memory was allocated before within the constant expression

Local Variables Being Passed ( C++)

I have encountered a problem in my learning of C++, where a local variable in a function is being passed to the local variable with the same name in another function, both of these functions run in main().
When this is run,
#include <iostream>
using namespace std;
void next();
void again();
int main()
{
    int a = 2;
    cout << a << endl;
    next();
    again();
    return 0;
}
void next()
{
    int a = 5;
    cout << a << endl;
}
void again()
{
    int a;
    cout << a << endl;
}
it outputs:
2
5
5
I expected that again() would say null or 0 since 'a' is declared again there, and yet it seems to use the value that 'a' was assigned in next().
Why does next() pass the value of local variable 'a' to again() if 'a' is declared another time in again()?
http://en.cppreference.com/w/cpp/language/ub
You're correct, an uninitialized variable is a no-no. However, you are allowed to declare a variable and not initialize it until later. Memory is set aside to hold the integer, but what value happens to be in that memory until you do so can be anything at all. Some compilers will auto-initialize variables to junk values (to help you catch bugs), some will auto-initialize to default values, and some do nothing at all. C++ itself promises nothing, hence it's undefined behavior. In your case, with your simple program, it's easy enough to imagine how the compiler created assembly code that reused that exact same piece of memory without altering it. However, that's blind luck, and even in your simple program isn't guaranteed to happen. These types of bugs can actually be fairly insidious, so make it a rule: Be vigilant about uninitialized variables.
An uninitialized non-static local variable of *built-in type (phew! that was a mouthful) has an indeterminate value. Except for the char types, using that value yields formally Undefined Behavior, a.k.a. UB. Anything can happen, including the behavior that you see.
Apparently with your compiler and options, the stack area that was used for a in the call of next, was not used for something else until the call of again, where it was reused for the a in again, now with the same value as before.
But you cannot rely on that. With UB anything, or nothing, can happen.
* Or more generally of POD type, Plain Old Data. The standard's specification of this is somewhat complicated. In C++11 it starts with §8.5/11, “If no initializer is specified for an object, the object is default-initialized; if no initialization is performed, an object with automatic or dynamic storage duration has indeterminate value.”. Where “automatic … storage duration” includes the case of local non-static variable. And where the “no initialization” can occur in two ways via §8.5/6 that defines default initialization, namely either via a do-nothing default constructor, or via the object not being of class or array type.
This is completely coincidental and undefined behavior.
What's happened is that you have two functions called immediately after one another. Both will have more or less identical function prologs and both reserve a variable of exactly the same size on the stack.
Since there are no other variables in play and the stack is not modified between the calls, you just happen to end up with the local variable in the second function "landing" in the same place as the previous function's local variable.
Clearly, this is not good to rely upon. In fact, it's a perfect example of why you should always initialize variables!
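As a small illustration of that rule, here is a sketch of again() with the variable initialized. The name again_fixed is hypothetical, and it returns the value instead of printing it so the result is easy to check.

```cpp
int again_fixed() {
    int a{};    // value-initialized, so a is guaranteed to be 0
    return a;   // reading a is now well-defined
}
```

Writing int a = 0; has the same effect here. Either way the value no longer depends on whatever the previous call happened to leave on the stack.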

Is it safe to return a VLA?

The following code uses the heap:
char* getResult(int length) {
    char* result = new char[length];
    // Fill result...
    return result;
}
int main(void) {
    char* result = getResult(100);
    // Do something...
    delete[] result; // memory from new[] must be released with delete[]
}
So result has to be deleted somewhere, preferably by the owner.
The code below, from what I understand, uses an extension called VLA, which is part of C99, and not part of the C++ standard (but supported by GCC, and other compilers):
char* getResult(int length) {
    char result[length];
    // Fill result...
    return result;
}
int main(void) {
    char* result = getResult(100);
    // Do something...
}
Am I correct in assuming that result is still allocated on the stack in this case?
Is result a copy, or is it a reference to garbage memory? Is the above code safe?
Am I correct in assuming that result is still allocated on the stack in this case?
Correct. VLAs have automatic storage duration.
Is result a copy, or is it a reference to garbage memory? Is the above code safe?
The code is not safe. The address returned by getResult is an invalid address. Dereferencing the pointer invokes undefined behavior.
You cannot return it. In C it has automatic storage duration (the object is no longer valid once you leave the scope), and returning it will invoke undefined behavior. From the C99 draft standard, section 6.2.4 Storage durations of objects, paragraph 6:
For such an object that does have a variable length array type, its lifetime extends from the declaration of the object until execution of the program leaves the scope of the declaration. If the scope is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate.
In C++ we have to rely on the compiler documentation, since VLAs are an extension there, and the GCC docs on VLAs say that the storage is deallocated when the scope ends:
These arrays are declared like any other automatic arrays, but with a length that is not a constant expression. The storage is allocated at the point of declaration and deallocated when the block scope containing the declaration exits.
When you return from getResult(), the char array result will go out of scope and be deallocated along with the stack frame for the function call. If you want to preserve the function structure, you'll have to call malloc and later free the memory.
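In C++ a different technique from the malloc/free suggestion is also available: return an owning container by value. The sketch below assumes the caller only needs the bytes and not a raw char*. It swaps the int parameter for std::size_t, and the 'x' fill is a placeholder for the "Fill result..." step.

```cpp
#include <cstddef>
#include <vector>

// Returns an owning buffer; the storage lives as long as the returned vector.
std::vector<char> getResult(std::size_t length) {
    std::vector<char> result(length, 'x');  // placeholder fill
    return result;                          // moved or elided, never dangling
}
```

The caller can write auto result = getResult(100); and use result.data() where a char* is required. The storage is released automatically when result goes out of scope, so no explicit delete or free is needed.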

Hiding name of int variable in c++

Out of curiosity, I've tried this code, resulting from an interview question[*]
int main(int argc, char *argv[])
{
    int a = 1234;
    printf("Outer: %d\n", a);
    {
        int a(a);
        printf("Inner: %d\n", a);
    }
}
When compiled on Linux (both g++ 4.6.3 and clang++ 3.0) it outputs:
Outer: 1234
Inner: -1217375632
However on Windows (VS2010) it prints:
Outer: 1234
Inner: 1234
The rationale would be that, until the copy-constructor of the second 'a' variable has finished, the first 'a' variable is still accessible. However I'm not sure if this is standard behaviour, or just a(nother) Microsoft quirk.
Any idea?
[*] The actual question was:
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
{
    // Not at global scope here
    int a = 1234;
    {
        int a;
        // how do you set this a to the value of the containing scope a ?
    }
}
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
Unless the outer scope can be explicitly named you cannot do this. You can explicitly name the global scope, namespace scopes, and class scopes, but not function or block statement scopes.
C++11 [basic.scope.pdecl] 3.3.2 p1 states:
The point of declaration for a name is immediately after its complete declarator (Clause 8) and before its initializer (if any), except as noted below. [ Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value. —end example ]
MSVC correctly implements this example; however, it does not correctly implement this when the initializer uses parentheses instead of assignment syntax. There's a bug filed about this on Microsoft Connect.
Here's an example program with incorrect behavior in VS as a result of this bug.
#include <iostream>
int foo(char) { return 0; }
int foo(int) { return 1; }
int main()
{
    char x = 'a';
    {
        int x = foo(static_cast<decltype(x)>(0));
        std::cout << "'=' initialization has correct behavior? " << (x?"Yes":"No") << ".\n";
    }
    {
        int x(foo(static_cast<decltype(x)>(0)));
        std::cout << "'()' initialization has correct behavior? " << (x?"Yes":"No") << ".\n";
    }
}
C++ includes the following note.
[ Note: Operations involving indeterminate values may cause undefined behavior. —end note ]
However, this note indicates that operations may cause undefined behavior, not that they necessarily do. The above linked bug report includes an acknowledgement from Microsoft that this is a bug and not that the program triggers undefined behavior.
Edit: And now I've changed the example so that the object with indeterminate value is only 'used' in an unevaluated context, and I believe that this absolutely rules out the possibility of undefined behavior on any platform, while still demonstrating the bug in Visual Studio.
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
If you want to get technical about the wording, it's pretty easy. A "temporary" has a specific meaning in C++ (see §12.2); any named variable you create is not a temporary. As such, you can just create a local variable (which is not a temporary) initialized with the correct value:
int a = 1234;
{
    int b = a;
    int a = b;
}
An even more defensible possibility would be to use a reference to the variable in the outer scope:
int a = 1234;
{
    int &ref_a = a;
    int a = ref_a;
}
This doesn't create an extra variable at all -- it just creates an alias to the variable at the outer scope. Since the alias has a different name, we retain access to the variable at the outer scope, without defining a variable (temporary or otherwise) to do so. Many references are implemented as pointers internally, but in this case (at least with a modern compiler and optimization turned on) I'd expect it not to be -- that the alias really would just be treated as a different name referring to the variable at the outer scope (and a quick test with VC++ shows that it works this way -- the generated assembly language doesn't use ref_a at all).
Another possibility along the same lines would be like this:
const int a = 10;
{
    enum { a_val = a };
    int a = a_val;
}
This is somewhat similar to the reference, except that in this case there's not even room for argument about whether a_val could be called a variable -- it absolutely is not a variable. The problem is that an enumeration can only be initialized with a constant expression, so we have to define the outer variable as const for it to work.
I doubt any of these is what the interviewer really intended, but all of them answer the question as stated. The first is (admittedly) a pure technicality about definitions of terms. The second might still be open to some argument (many people think of references as variables). Though it restricts the scope, there's no room for question or argument about the third.
What you are doing, initializing a variable with itself, is undefined behavior. All your test cases got it right, this is not a quirk. An implementation could also initialize a to 123456789 and it would still be standard.
Update: The comments on this answer point that initializing a variable with itself is not undefined behavior, but trying to read such variable is.
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
You can't. As soon as the identical name is declared, the outer name is inaccessible for the rest of the scope. You'd need a copy or an alias of the outer variable, which means you'd need a temporary variable.
I'm surprised that, even with the warning level cranked up, VC++ doesn't complain on this line:
int a(a);
Visual C++ will sometimes warn you about hiding a variable (maybe that's only for members of derived classes). It's also usually pretty good about telling you you're using a value before it has been initialized, which is the case here.
Looking at the code generated, it happens to initialize the inner a to the same value of the outer a because that's what's left behind in a register.
I had a look at the standard, it's actually a grey area but here's my 2 cents...
3.1 Declarations and definitions [basic.def]
A declaration introduces names into a translation unit or redeclares names introduced by previous declarations.
A declaration is a definition unless... [non relevant cases follow]
3.3.1 Point of declaration
The point of declaration for a name is immediately after its complete declarator and before its initializer (if any), except as noted below [self-assignment example].
A nonlocal name remains visible up to the point of declaration of the local name that hides it.
Now, if we assume that this is the point of declaration of the inner 'a' (3.3.1/1)
int a (a);
^
then the outer 'a' should be visible up to that point (3.3.1/2), where the inner 'a' is defined.
Problem is that in this case, according to 3.1/2, a declaration IS a definition. This means the inner 'a' should be created. Until then, I can't understand from the standard whether the outer 'a' is still visible or not. VS2010 assumes that it is, and all that falls within the parentheses refers to the outer scope. However clang++ and g++ treat that line as a case of self-assignment, which results in undefined behaviour.
I'm not sure which approach is correct, but I find VS2010 to be more consistent: the outer scope is still visible until the inner 'a' is fully created.