I have a rather idiosyncratic C++14 initialization issue. I'm linking against a C library which provides main(). That library makes use of a global array that I'm meant to define, something like this:
extern int array[];
int main(void)
{
for (int i = 0; array[i] != -1; i++) {
printf("%d\n", i);
}
}
The expected use is to initialize the array, e.g. int array[] = {1, 2, 3, -1}. But I want to be able to dynamically initialize it. I'm using C++14, so my thought was to create a global object with a constructor that writes to the array, like this:
int array[2];
struct Init {
Init() {
array[0] = 1;
array[1] = -1;
}
};
Init init;
But the C++14 standard says this:
It is implementation-defined whether the dynamic initialization of a non-local variable with static storage duration is done before the first statement of main. If the initialization is deferred to some point in time after the first statement of main, it shall occur before the first odr-use (3.2) of any function or variable defined in the same translation unit as the variable to be initialized.
Am I reading this correctly that it's possible that when main() runs, my object won't yet have been constructed, meaning that my array won't be initialized (or rather, will be default initialized, not by my class)?
If so, is there any way around this? I have no control over the library which provides main(). Am I out of luck in wanting to set the array's value at startup time, before main() runs?
it shall occur before the first odr-use (3.2) of any function or variable defined in the same translation unit
Emphasis "any function or variable". The "variable" part includes, of course, the global variable in question.
In other words, the initialization is guaranteed to occur before odr-use of the variable. Problem solved.
So, as soon as anything in your C++ program sneezes in the direction of "any function or variable" in the translation unit where the global array is defined, they will materialize into existence.
You are guaranteed that the initialization of init takes place before the array is odr-used, since both are defined in the same translation unit, and that initialization takes care of whipping array into shape before anything reads it.
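For what it's worth, here is a minimal sketch of the pattern with the array and the initializer object kept in one translation unit. count_entries and check_startup_state are hypothetical stand-ins for the library's loop and for a caller; they are not anything the library provides.

```cpp
#include <cassert>

// The array the library declares as `extern int array[];`.
// It has static storage duration, so it is zero-initialized first.
int array[2];

struct Init {
    Init() {
        array[0] = 1;
        array[1] = -1;  // sentinel the library's loop stops at
    }
};

// Defined in the same translation unit as array, so its dynamic
// initialization has to happen before array is first odr-used.
Init init;

// Hypothetical stand-in for the loop in the library's main().
int count_entries() {
    int n = 0;
    for (int i = 0; array[i] != -1; i++) {
        ++n;
    }
    return n;
}

// Hypothetical usage: by the time anything reads array, Init has run.
void check_startup_state() {
    assert(array[0] == 1);
    assert(array[1] == -1);
    assert(count_entries() == 1);
}
```

The important design choice is keeping init and array in the same source file, because the guarantee quoted above is stated per translation unit.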
When a range based for loop is used to iterate over an array, without binding a reference to each element, does this constitute an ODR-use of the array?
Example:
struct foo {
static constexpr int xs[] = { 1, 2, 3 };
};
int test(void) {
int sum = 0;
for (int x : foo::xs) // x is not a reference!
sum += x;
return sum;
}
// Definition, if needed
// constexpr int foo::xs[];
Is the definition of foo::xs necessary?
While this code, and variations of it, appear to work fine, that doesn't mean the definition is never necessary. Lack of a definition of an ODR-used variable rarely produces a diagnostic, since the variable could be defined in another translation unit. A linker error is the usual result, but it's quite possible to not get the error if the compiler is able to optimize away every use, which is what happens to the above code. The compiler effectively reduces test() to return 6;.
Binding a reference to an element would be an ODR-use, but that isn't done.
I was under the impression that subscripting an array was not ODR-use in C++14 or later. But the range based for is not exactly subscripting.
In C++17, I believe this example avoids the problem because constexpr class data members are implicitly inline. And thus the declaration in the class also serves to define xs and an additional namespace scope definition isn't needed to satisfy ODR.
Some additional versions of the same question:
What if we use std::array?
constexpr std::array<int, 3> xs = { 1, 2, 3 };
What if we avoid the range based for?
for (int i = 0; i < foo::xs.size(); i++) sum += foo::xs[i];
Is the definition of foo::xs necessary?
Yes, because as NathanOliver points out in the comments, a reference is implicitly bound to foo::xs by the range-based for loop. When you bind a reference to an object, the object is odr-used. The same would occur if a std::array were used rather than a raw array.
What if we avoid the range based for?
Well, if you use a raw array and get its size using a technique that doesn't require binding a reference to it, then you can avoid providing a definition:
for (int i = 0; i < sizeof(foo::xs)/sizeof(foo::xs[0]); i++) {
sum += foo::xs[i];
}
In this case, the references inside sizeof are not odr-uses because they are unevaluated, and foo::xs is an element of the set of potential results of foo::xs[i]; this latter expression is of non-class type and immediately undergoes an lvalue-to-rvalue conversion, so it does not odr-use foo::xs.
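If you would rather keep the range-based for, a sketch of the portable form is to supply the namespace-scope definition. It is needed before C++17 when foo::xs is odr-used; since C++17 the member is implicitly inline, so the extra definition is redundant (and deprecated) but still accepted. The function names below are hypothetical.

```cpp
#include <cassert>

struct foo {
    static constexpr int xs[] = {1, 2, 3};
};

// Out-of-class definition. Required before C++17 if foo::xs is odr-used;
// redundant but still permitted from C++17 on.
constexpr int foo::xs[];

int sum_range_for() {
    int sum = 0;
    for (int x : foo::xs)  // binds a reference to foo::xs: an odr-use
        sum += x;
    return sum;
}

// Hypothetical usage check.
void check_sum() {
    assert(sum_range_for() == 6);
}
```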
I accidentally created a bug in a program by self-referencing in an array. Here's a very simplified demo program similar in concept:
#include <iostream>
using namespace std;
int kTest[] = {
kTest[0]
};
int main() {
cout << kTest[0] << endl;
}
I was surprised that I received neither a compiler error nor even a warning on this code! In my case it ended up producing unpredictable output. Is it accessing garbage memory?
I was curious about under what circumstances this would have well-defined output (if ever!).
Edit: Does it make a difference if kTest is static? What about const? Both?
int kTest[] = {
kTest[0]
};
is similar to, if not exactly the same as
int x = x;
It will be undefined behavior, if declared locally in a function.
It seems to be well defined when kTest is a global variable. See the other answer for additional details.
I'm not so sure this is undefined. Quote from the current draft:
[basic.start.static]/3
If constant initialization is not performed, a variable with static
storage duration ([basic.stc.static]) or thread storage duration
([basic.stc.thread]) is zero-initialized ([dcl.init]). Together,
zero-initialization and constant initialization are called static
initialization; all other initialization is dynamic initialization.
Static initialization shall be performed before any dynamic initialization takes place.
To me it looks like kTest is already zero-initialized when the dynamic initialization starts, so it may be defined to initialize to 0.
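A small sketch of that reading, for the global case only. The assumption is that the interpretation above is right: kTest is zero-initialized during static initialization, and the dynamic initializer then reads that zero. first_element and check_global_case are hypothetical helpers.

```cpp
#include <cassert>

// Static storage duration: zero-initialized before the dynamic
// initializer below runs, so kTest[0] is read as 0 on this reading.
int kTest[] = { kTest[0] };

// Hypothetical helper so the value can be inspected from elsewhere.
int first_element() {
    return kTest[0];
}

// Hypothetical usage check for the global case.
void check_global_case() {
    assert(first_element() == 0);
}
```

None of this carries over to a local kTest, which has no zero-initialization step to fall back on.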
I have encountered a problem in my learning of C++, where a local variable in a function is being passed to the local variable with the same name in another function, both of these functions run in main().
When this is run,
#include <iostream>
using namespace std;
void next();
void again();
int main()
{
int a = 2;
cout << a << endl;
next();
again();
return 0;
}
void next()
{
int a = 5;
cout << a << endl;
}
void again()
{
int a;
cout << a << endl;
}
it outputs:
2
5
5
I expected that again() would say null or 0 since 'a' is declared again there, and yet it seems to use the value that 'a' was assigned in next().
Why does next() pass the value of local variable 'a' to again() if 'a' is declared another time in again()?
http://en.cppreference.com/w/cpp/language/ub
You're correct, an uninitialized variable is a no-no. However, you are allowed to declare a variable and not initialize it until later. Memory is set aside to hold the integer, but what value happens to be in that memory until you do so can be anything at all. Some compilers will auto-initialize variables to junk values (to help you catch bugs), some will auto-initialize to default values, and some do nothing at all. C++ itself promises nothing, hence it's undefined behavior. In your case, with your simple program, it's easy enough to imagine how the compiler created assembly code that reused that exact same piece of memory without altering it. However, that's blind luck, and even in your simple program isn't guaranteed to happen. These types of bugs can actually be fairly insidious, so make it a rule: Be vigilant about uninitialized variables.
An uninitialized non-static local variable of *built-in type (phew! that was a mouthful) has an indeterminate value. Except for the char types, using that value yields formally Undefined Behavior, a.k.a. UB. Anything can happen, including the behavior that you see.
Apparently with your compiler and options, the stack area that was used for a in the call of next, was not used for something else until the call of again, where it was reused for the a in again, now with the same value as before.
But you cannot rely on that. With UB anything, or nothing, can happen.
* Or more generally of POD type, Plain Old Data. The standard's specification of this is somewhat complicated. In C++11 it starts with §8.5/11, “If no initializer is specified for an object, the object is default-initialized; if no initialization is performed, an object with automatic or dynamic storage duration has indeterminate value.”. Where “automatic … storage duration” includes the case of local non-static variable. And where the “no initialization” can occur in two ways via §8.5/6 that defines default initialization, namely either via a do-nothing default constructor, or via the object not being of class or array type.
This is completely coincidental and undefined behavior.
What's happened is that you have two functions called immediately after one another. Both will have more or less identical function prologs and both reserve a variable of exactly the same size on the stack.
Since there are no other variables in play and the stack is not modified between the calls, you just happen to end up with the local variable in the second function "landing" in the same place as the previous function's local variable.
Clearly, this is not good to rely upon. In fact, it's a perfect example of why you should always initialize variables!
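A minimal sketch of the fix for again(): give the local an initializer so its value no longer depends on whatever was left on the stack. again_fixed is a hypothetical rewrite that returns the value instead of printing it.

```cpp
#include <cassert>

// Hypothetical rewrite of again(): the local is value-initialized,
// so it is 0 no matter which function ran before it.
int again_fixed() {
    int a{};  // same effect as int a = 0;
    return a;
}

// Hypothetical usage check.
void check_again_fixed() {
    assert(again_fixed() == 0);
}
```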
Is this always going to run as expected?
char *x;
if (...) {
int len = dynamic_function();
char x2[len];
sprintf(x2, "hello %s", ...);
x = x2;
}
printf("%s\n", x);
// prints hello
How does the compiler (GCC in my case) implement variably sized arrays, in each of C and C++?
No. x2 is local to the if statement's scope and you access it outside of it using a pointer. This results in undefined behaviour.
By the way, VLAs were made optional in C11 and have never been part of C++. So it's better to avoid them.
The scope is explained here:
Jumping or breaking out of the scope of the array name deallocates the
storage. Jumping into the scope is not allowed; you get an error
message for it.
In your case the array is out of scope.
No, for two separate reasons:
C++: The code isn't valid C++. Arrays in C++ must have a compile-time constant size.
C: No, because the array only lives until the end of the block in which it was declared, and thus dereferencing x is undefined behaviour.
From C11, 6.2.4/2:
If an object is referred to outside of its lifetime, the behavior is undefined.
And 6.2.4/7 says that the variable-length array lives from its declaration until the end of its enclosing scope:
For such an object that does have a variable length array type, its lifetime extends from
the declaration of the object until execution of the program leaves the scope of the
declaration.
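In C++, one way to sidestep both problems is to let a std::string declared in the outer scope own the buffer, so nothing points into a block that has already ended. This is only a sketch: greet, condition and name are hypothetical and stand in for the parts elided in the question.

```cpp
#include <cassert>
#include <string>

// Hypothetical replacement for the snippet in the question.
// x is declared outside the if block, so it outlives the block.
std::string greet(bool condition, const std::string& name) {
    std::string x;
    if (condition) {
        x = "hello " + name;  // x owns its storage; no dangling pointer
    }
    return x;
}

// Hypothetical usage check.
void check_greet() {
    assert(greet(true, "world") == "hello world");
    assert(greet(false, "world").empty());
}
```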
Out of curiosity, I've tried this code, resulting from an interview question[*]
#include <cstdio>
int main(int argc, char *argv[])
{
int a = 1234;
printf("Outer: %d\n", a);
{
int a(a);
printf("Inner: %d\n", a);
}
}
When compiled on Linux (both g++ 4.6.3 and clang++ 3.0) it outputs:
Outer: 1234
Inner: -1217375632
However on Windows (VS2010) it prints:
Outer: 1234
Inner: 1234
The rationale would be that, until the copy-constructor of the second 'a' variable has finished, the first 'a' variable is still accessible. However I'm not sure if this is standard behaviour, or just a(nother) Microsoft quirk.
Any idea?
[*] The actual question was:
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
{
// Not at global scope here
int a = 1234;
{
int a;
// how do you set this a to the value of the containing scope a ?
}
}
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
Unless the outer scope can be explicitly named you cannot do this. You can explicitly name the global scope, namespace scopes, and class scopes, but not function or block statement scopes.
C++11 [basic.scope.pdecl] 3.3.2 p1 states:
The point of declaration for a name is immediately after its complete declarator (Clause 8) and before its initializer (if any), except as noted below. [ Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value. —end example ]
MSVC implements this example correctly, but it does not implement the rule correctly when the initializer uses parentheses instead of assignment syntax. There's a bug filed about this on Microsoft Connect.
Here's an example program with incorrect behavior in VS as a result of this bug.
#include <iostream>
int foo(char) { return 0; }
int foo(int) { return 1; }
int main()
{
char x = 'a';
{
int x = foo(static_cast<decltype(x)>(0));
std::cout << "'=' initialization has correct behavior? " << (x?"Yes":"No") << ".\n";
}
{
int x(foo(static_cast<decltype(x)>(0)));
std::cout << "'()' initialization has correct behavior? " << (x?"Yes":"No") << ".\n";
}
}
C++ includes the following note.
[ Note: Operations involving indeterminate values may cause undefined behavior. —end note ]
However, this note indicates that operations may cause undefined behavior, not that they necessarily do. The above linked bug report includes an acknowledgement from Microsoft that this is a bug and not that the program triggers undefined behavior.
Edit: And now I've changed the example so that the object with indeterminate value is only 'used' in an unevaluated context, and I believe that this absolutely rules out the possibility of undefined behavior on any platform, while still demonstrating the bug in Visual Studio.
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
If you want to get technical about the wording, it's pretty easy. A "temporary" has a specific meaning in C++ (see §12.2); any named variable you create is not a temporary. As such, you can just create a local variable (which is not a temporary) initialized with the correct value:
int a = 1234;
{
int b = a;
int a = b;
}
An even more defensible possibility would be to use a reference to the variable in the outer scope:
int a = 1234;
{
int &ref_a = a;
int a = ref_a;
}
This doesn't create an extra variable at all -- it just creates an alias to the variable at the outer scope. Since the alias has a different name, we retain access to the variable at the outer scope, without defining a variable (temporary or otherwise) to do so. Many references are implemented as pointers internally, but in this case (at least with a modern compiler and optimization turned on) I'd expect it not to be -- that the alias really would just be treated as a different name referring to the variable at the outer scope (and a quick test with VC++ shows that it works this way -- the generated assembly language doesn't use ref_a at all).
Another possibility along the same lines would be like this:
const int a = 10;
{
enum { a_val = a };
int a = a_val;
}
This is somewhat similar to the reference, except that in this case there's not even room for argument about whether a_val could be called a variable -- it absolutely is not a variable. The problem is that an enumeration can only be initialized with a constant expression, so we have to define the outer variable as const for it to work.
I doubt any of these is what the interviewer really intended, but all of them answer the question as stated. The first is (admittedly) a pure technicality about definitions of terms. The second might still be open to some argument (many people think of references as variables). Though it restricts the scope, there's no room for question or argument about the third.
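For anyone who wants to try them, here are the three approaches wrapped in functions so they can be compiled and checked together. The function names are hypothetical; the bodies are the snippets from this answer.

```cpp
#include <cassert>

// 1. Named local copy (a named variable is not a "temporary").
int via_copy() {
    int a = 1234;
    {
        int b = a;
        int a = b;
        return a;
    }
}

// 2. Reference alias to the outer variable.
int via_reference() {
    int a = 1234;
    {
        int &ref_a = a;
        int a = ref_a;
        return a;
    }
}

// 3. Enumerator; the outer variable has to be a constant expression.
int via_enum() {
    const int a = 10;
    {
        enum { a_val = a };
        int a = a_val;
        return a;
    }
}

// Hypothetical usage check.
void check_all() {
    assert(via_copy() == 1234);
    assert(via_reference() == 1234);
    assert(via_enum() == 10);
}
```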
What you are doing, initializing a variable with itself, is undefined behavior. All your test cases got it right; this is not a quirk. An implementation could also initialize a to 123456789 and it would still be standard.
Update: The comments on this answer point that initializing a variable with itself is not undefined behavior, but trying to read such variable is.
How you'd initialise a variable within a scope with the value of an identically named variable in the containing scope without using a temporary or global variable?
You can't. As soon as the identical name is declared, the outer name is inaccessible for the rest of the scope. You'd need a copy or an alias of the outer variable, which means you'd need a temporary variable.
I'm surprised that, even with the warning level cranked up, VC++ doesn't complain on this line:
int a(a);
Visual C++ will sometimes warn you about hiding a variable (maybe that's only for members of derived classes). It's also usually pretty good about telling you you're using a value before it has been initialized, which is the case here.
Looking at the code generated, it happens to initialize the inner a to the same value of the outer a because that's what's left behind in a register.
I had a look at the standard, it's actually a grey area but here's my 2 cents...
3.1 Declarations and definitions [basic.def]
A declaration introduces names into a translation unit or redeclares names introduced by previous declarations.
A declaration is a definition unless... [non relevant cases follow]
3.3.1 Point of declaration
The point of declaration for a name is immediately after its complete declarator and before its initializer (if any), except as noted below [self-assignment example].
A nonlocal name remains visible up to the point of declaration of the local name that hides it.
Now, if we assume that this is the point of declaration of the inner 'a' (3.3.1/1)
int a (a);
^
then the outer 'a' should be visible up to that point (3.3.1/2), where the inner 'a' is defined.
Problem is that in this case, according to 3.1/2, a declaration IS a definition. This means the inner 'a' should be created. Until then, I can't understand from the standard whether the outer 'a' is still visible or not. VS2010 assumes that it is, and all that falls within the parentheses refers to the outer scope. However clang++ and g++ treat that line as a case of self-assignment, which results in undefined behaviour.
I'm not sure which approach is correct, but I find VS2010 to be more consistent: the outer scope is still visible until the inner 'a' is fully created.