Is it legal to have code (which compiles to assembly instructions) in the global scope of a C++ source file? Previously, I was under the impression that, except in the Ch programming language (an interpreter for C/C++), you cannot have code in the global scope of a C++ program. Code/instructions can only be inside the body of a function [period]!
However, I found out that you can call functions before the main function in C++ by assigning their results to a global variable! This would involve a call instruction in the assembly code. You can also assign the sum of two variables to another global variable outside any function. That would almost certainly involve add and mov instructions. And if that code is in the global scope, outside of any function, when would it execute? If the + were an overloaded operator of a class type with a print statement inside it, when would that execute?
Also can you have loops and control structures in the global scope of a C++ program, and if so when are they executed? What about for other program constructs, are they allowed in the global scope, and under what circumstances, and when are they executed?
This question is in response to a previous question that I posted:
Why can't I assign values to global variables outside a function in C?
The answerer to the original question asserts that you cannot have code outside of the scope of a function. I think that I do not fully understand the rules for this, and what exactly is considered to be "code" or not.
#include <iostream>
using namespace std;

int foo() {
    cout << "Inside foo()" << endl;
    return 5;
}
// is this not code?
int global_variable = foo();
// How does this statement work without generating code?
int a = 4;
int b = 5;
int c = a + b;
int main() {
    // The program behaves as if the statements above were executed from
    // top to bottom before entering the main() function.
    cout << "Inside main()" << endl;
    cout << "int c = " << c << endl;
    return 0;
}
The answer on the question you linked to was talking in a simple way, not using strict C++ naming for constructs.
Being more pedantic, C++ does not have "code". C++ has declarations, definitions, and statements. Statements are what you probably think of as "code": if, for, expressions, etc.
Only declarations and definitions can appear at global scope. Of course, definitions can include expressions. int a = 5; defines a global variable, initialized by an expression.
But you can't just have a random statement/expression at global scope, like a = 5;. That is, expressions can be part of definitions, but an expression is not a definition.
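As a rough sketch of that rule (the names `sum_up_to`, `sum_to_ten` and `doubled` are made up for illustration), a loop cannot stand alone at global scope, but it can still run as part of initializing a global, because the initializer of a definition may call a function:

```cpp
// A bare statement at global scope is ill-formed, so these stay commented out:
// a = 5;
// for (int i = 0; i < 10; ++i) {}

// Ordinary function: statements such as loops are allowed inside its body.
int sum_up_to(int n) {
    int sum = 0;
    for (int i = 1; i <= n; ++i) {
        sum += i;
    }
    return sum;
}

// Definitions with initializers are allowed at global scope, and the
// initializer expression may call the function above.
int sum_to_ten = sum_up_to(10);

// Within one translation unit, dynamic initializers run in order of
// definition, so sum_to_ten already holds its value here.
int doubled = sum_to_ten * 2;
```

A `main` that prints `sum_to_ten` and `doubled` would see both values already computed. A compiler is free to fold such an initializer into a constant, so no call instruction is guaranteed to appear in the generated assembly.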
You can call functions before main of course. Global variable constructors and initializers which are too complex to be executed at compile time have to run before main. For example:
int b = []()
{
    std::cout << "Enter a number.\n";
    int temp;
    std::cin >> temp;
    return temp;
}();
The compiler can't do that at compile time; it's interactive. Global variables with dynamic initializers like this one are, in practice, initialized before main begins, so the implementation has to invoke code pre-main, which is perfectly legal.
Every C++ compilation and execution system has some mechanism for invoking code before and after main. Globals have to be initialized, and object constructors may need to be called to do that initialization. After main completes, global variables have to be destroyed, which means destructors need to be called.
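To make the question about an overloaded + concrete, here is a minimal sketch. The `Number` class and the `events()` log are hypothetical and only serve to record what runs and in which order. The log is a function-local static so that it exists before the first global needs it:

```cpp
#include <string>
#include <vector>

// Event log created on first use, so it is ready before any global below.
std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical class whose constructor records each construction.
struct Number {
    int value;
    explicit Number(int v) : value(v) {
        events().push_back("ctor " + std::to_string(v));
    }
};

// Overloaded operator with a visible side effect.
Number operator+(const Number& lhs, const Number& rhs) {
    events().push_back("operator+");
    return Number(lhs.value + rhs.value);
}

// Dynamic initialization of globals: performed in order of definition
// within this translation unit, in practice before main starts.
Number a(4);
Number b(5);
Number c = a + b;
```

By the time `main` inspects `events()`, the log should read "ctor 4", "ctor 5", "operator+", "ctor 9". So the overloaded operator runs during initialization of `c`, not at some point inside `main`. Destructors are left out here to keep the sketch short.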
int main()
{
int x;
int x;
return 0;
}
This snippet will give an error:
error: redeclaration of 'int x'
But this one works just fine:
int main()
{
while(true)
{
int x;
{...}
}
return 0;
}
Why does declaring x inside the loop in the second example not redeclare it on every iteration? I was expecting the same error as in the first case.
You're conflating two related but different concepts, hence your confusion. It's not your fault, though, as most teaching material on the matter doesn't make the distinction between the two concepts clear.
Variable scope: this is the region of the source code where a symbol (a variable) is visible.
Object lifetime: this is the time during the runtime of the program that an object exists.
This brings us to two other concepts we need to understand and differentiate between:
A variable is a compile-time concept: it is a name (a symbol) that refers to objects.
An object is an "entity" at runtime, an instance of a type.
Let's go back to your examples:
int main()
{
int x{};
int x{};
}
Here you try to declare two different variables inside the same scope. Both would have the same name inside the function scope, so when you write the symbol x there would be no way to tell which variable you mean. So it is not allowed.
int main()
{
while(true)
{
int x{};
}
}
Here you declare one variable inside the while body scope. When you write x inside this scope you refer to this variable. No ambiguity. No problems. Valid code. Note that this discussion about declarations and variable scope applies at compile time, i.e. we are discussing what the code you write means.
When we discuss object lifetime, however, we are talking about runtime, i.e. the moment when your compiled binary runs. Yes, at runtime, multiple objects will be created and destroyed in succession. All of these objects are referred to by the symbol x inside the while body scope. But the lifetimes of these objects don't overlap. That is, when you run your program the first object is created. In the source code it is named x inside the while body scope. Then the object is destroyed, the loop is re-entered and a new object is created. It is also named x in the source code inside the while body scope. Then it is destroyed, the while is re-entered, a new object is created, and so on.
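The non-overlapping lifetimes can be observed with a small sketch. The `Counters` and `Tracked` types and the `run_loop` function are hypothetical helpers, not part of the question. Each object bumps a counter when it is constructed and when it is destroyed:

```cpp
#include <algorithm>

// Hypothetical counters for observing object lifetimes.
struct Counters {
    int constructed = 0;
    int destroyed = 0;
    int alive = 0;
    int max_alive = 0;
};

// Each Tracked object updates the counters in its constructor and destructor.
class Tracked {
public:
    explicit Tracked(Counters& c) : counters_(c) {
        ++counters_.constructed;
        ++counters_.alive;
        counters_.max_alive = std::max(counters_.max_alive, counters_.alive);
    }
    ~Tracked() {
        ++counters_.destroyed;
        --counters_.alive;
    }
private:
    Counters& counters_;
};

// One declaration of `x` in the source, one new object per iteration.
Counters run_loop(int iterations) {
    Counters counters;
    for (int i = 0; i < iterations; ++i) {
        Tracked x(counters);  // a fresh object is created here each time
    }                         // and destroyed here, before the next iteration
    return counters;
}
```

Calling `run_loop(3)` should report three constructions, three destructions, and never more than one object alive at a time, even though the declaration of `x` appears only once in the source.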
To give you an expanded view on the matter, consider you can have:
A variable which never refers to an object
{ // not global scope
int a; // <-- not initialized
}
The variable a is not initialized, so an object will never be created at runtime.
An object without a name:
int get_int();
{
int sum = get_int() + get_int();
}
There are two objects returned by the two calls to the function get_int(). Those objects are temporaries. They are never named.
Multiple objects instantiated inside the scope of a variable.
This is an advanced, contrived example, at the fringe of C++. Just showing that it is technically possible:
{   // placement new requires #include <new>
    using Int = int;    // `x.~int()` does not compile; a type alias is needed
    int x;
    // no object
    new (&x) int{11};   // <-- 1st object created. It is named `x`. Start of its lifetime
    // 1st object is alive. Named x
    x.~Int();           // <-- 1st object destructed. End of its lifetime
    // no object
    new (&x) int{24};   // <-- 2nd object created. Also named `x`
    // 2nd object alive. Named x
} // <-- implicit end of the lifetime of 2nd object.
The scope of x is the whole block delimited by the curly brackets. However, there are two objects with different non-overlapping lifetimes inside this scope.
Declarations don't happen at runtime, they happen at compile-time.
In your code int x; is declared once, because it appears in the code once. It doesn't matter if it's in a loop or not.
If the loop runs more than once, x will be created and then destroyed more than once. It's allowed, of course.
In C++, the curly braces mark the beginning, {, and the end, }, of a scope. If you have a scope nested inside another scope, for example a while loop inside a function, then the variables previously declared in the outer scope are available inside the new loop scope.
You are not allowed to declare a variable with the same name twice inside the same scope. That's why the compiler reports the first error
error: redeclaration of 'int x'
But in the case of the loop, the variable is only declared once. It doesn't matter that the loop will reuse that declaration multiple times. Just like a function being called multiple times doesn't create a redeclaration error for the variables it declares.
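A short sketch of both points, using made-up function names: a nested block is a new scope, so declaring the same name there is not a redeclaration, and a function called repeatedly reuses its single declaration while creating a new object on each call.

```cpp
// Returns the outer `x` after an inner block has declared its own `x`.
int outer_after_inner_block() {
    int x = 1;          // outer variable
    {
        int x = 2;      // a different variable: new scope, so no redeclaration
        x += 10;        // modifies the inner x only
    }
    return x;           // the outer x is untouched
}

// Calling this repeatedly never triggers a redeclaration error: `local` is
// declared once in the source and a new object is created on every call.
int call_counter() {
    int local = 0;
    ++local;
    return local;
}
```

Here `outer_after_inner_block()` should return 1, and `call_counter()` should return 1 on every call, since each call starts with a fresh `local`.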
Variables in loops stay in loops, and are not redeclared. This is because, to the best of my knowledge, loops are just sets of instructions with jump points, and not the same code written over and over again in the .exe file.
If you try to make a for loop:
for(int x = 0; x < 10000; ++x);
The loop just reuses the same variable, then removes the variable after use. This is helpful because it lets loops, including do{}while(condition) loops, hold values across iterations instead of having to redeclare and reset each variable.
Back to the original question, I am going to ask my own: Why are you trying to redeclare a variable? You could just do this:
int main(void){
int variable = 0;
...
variable = 2;
}
Instead of this:
int main(void){
int variable = 0;
...
int variable = 2;
}
Curly brackets in C/C++ represent blocks of code. These blocks of code do not transfer information to other blocks. I recommend looking further into resources on blocks of code, but one resource has been linked here.
Unlike code that is written in "interpreted languages", variables require declaration. Moreover, your code is read sequentially in compiled languages.
Block example:
while (true) {
int i = 0;
}
This declaration belongs to the block, and storage for it is reserved somewhere in memory.
The storage is sized for an int, which has a fixed capacity. Redeclaring the name in the same block would, in essence, try to claim that name a second time, which the compiler rejects. The layout of these blocks is decided at compilation time.
#include <iostream>
using namespace std;

class TestClass
{
public:
    int x, y;
    TestClass();
};
TestClass::TestClass()
{
    cout << "TestClass ctor" << endl;
}
TestClass GlobalTestClass;
int main()
{
    cout << "main " << endl;
    return 0;
}
In this code, as is well known, the first output will be "TestClass ctor".
My question: does the constructor call run before main() (that is, does the entry point change?), or right after main() is entered and before its executable statements, or is there a different mechanism? (Sorry for my English.)
The question as stated is not very meaningful, because
main is not the machine code level entry point to the program (main is called by the same code that e.g. executes constructors of non-local static class type variables), and
the notion of “right after main() and before the executable statements” isn't very meaningful: the executable statements are in main.
Generally, in practice you can count on the static variable being initialized before main in your concrete example, but the standard does not guarantee that.
C++11 §3.6.2/4:
"It is implementation-defined whether the dynamic initialization of a non-local variable with static storage duration is done before the first statement of main. If the initialization is deferred to some point in time after the first statement of main, it shall occur before the first odr-use (3.2) of any function or variable defined in the same translation unit as the variable to be initialized."
It's a fine point whether the automatic call of main qualifies as odr-use. I would think not, because one special property of main is that it cannot be called (in valid code), and its address cannot be taken.
Apparently the above wording is in support of dynamically loaded libraries, and constitutes the only support of such libraries.
In particular, I would be wary of using thread local storage with dynamically loaded libraries, at least until I learned more about the guarantees offered by the standard in that respect.
Yes, objects with static storage duration are initialized before main(), so indeed the "entry point" is before main(). See e.g. http://en.cppreference.com/w/cpp/language/storage_duration
In fact (although not recommended), you can run a whole program with a trivial main(){}, by putting everything in global instances.
Why does the following code print "xxY"? Shouldn't local variables live for the scope of the whole function? Can I rely on this behavior, or will it change in a future C++ standard?
I thought that according to C++ Standard 3.3.2 "A name declared in a block is local to that block. Its potential scope begins at its point of declaration and ends at the end of its declarative region."
#include <iostream>
using namespace std;
class MyClass
{
public:
    MyClass( int ) { cout << "x" << endl; }
    ~MyClass() { cout << "x" << endl; }
};
int main(int argc, char* argv[])
{
    MyClass (12345);
    // changing it to the following will change the behavior
    //MyClass m(12345);
    cout << "Y" << endl;
    return 0;
}
Based on the responses, I assume that MyClass(12345); is an expression (with its own scope). That makes sense. So I expect that the following code will always print "xYx":
MyClass (12345), cout << "Y" << endl;
And it is allowed to make such replacement:
// this many lines with an explicit scope
{
    boost::scoped_lock lock(my_mutex);
    int x = some_func(); // should be protected in multi-threaded program
}
// mutex released here
//
// I can replace them with the following single line:
int x = boost::scoped_lock (my_mutex), some_func(); // still multi-thread safe
// mutex released here
The object created in your
MyClass(12345);
is a temporary object which is only alive in that expression;
MyClass m(12345);
is an object which is alive for the entire block.
You're actually creating an object without keeping it in scope, so it is destroyed right after it is created. Hence the behavior you're experiencing.
You can't access the created object so why would the compiler keep it around?
To answer your other questions: the following is an invocation of the comma operator. It creates a MyClass temporary, which includes calling its constructor. It then evaluates the second expression, cout << "Y" << endl, which prints the Y. Then, at the end of the full expression, it destroys the temporary, which calls its destructor. So your expectations were right.
MyClass (12345), cout << "Y" << endl;
For the following to work, you should add parentheses, because the comma has a predefined meaning in declarations. Without them, the line would declare x initialized from the scoped_lock object, and then start declaring a function some_func returning an int and taking no parameters. Using parentheses, you say that the whole thing is a single comma operator expression instead.
int x = (boost::scoped_lock (my_mutex), some_func()); // still multi-thread safe
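The effect of the parenthesized form can be checked with a small sketch. Note the assumptions: `ScopedFlag` is a hypothetical stand-in for the lock, which sets a flag while it is alive instead of locking a real mutex. `g_locked` and this version of `some_func` are likewise made up for illustration.

```cpp
// Hypothetical stand-in for a lock: the flag is true while the object lives.
bool g_locked = false;

struct ScopedFlag {
    explicit ScopedFlag(bool& flag) : flag_(flag) { flag_ = true; }
    ~ScopedFlag() { flag_ = false; }
    bool& flag_;
};

// Returns 1 if called while the flag is held, 0 otherwise.
int some_func() { return g_locked ? 1 : 0; }

int guarded_call() {
    // Parenthesized comma expression: the ScopedFlag temporary lives until
    // the end of the full expression, so some_func() runs while it is held.
    int x = (ScopedFlag(g_locked), some_func());
    // The temporary has been destroyed by now, so g_locked is false again.
    return x;
}
```

`guarded_call()` should return 1, and `g_locked` should be false again afterwards. Some compilers may warn that the left operand of the comma has no effect; the warning is harmless here.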
It should be noted that the following two lines are equivalent. The first does not create a temporary unnamed object using my_mutex as the constructor argument, but instead the parentheses around the name are redundant. Don't let the syntax confuse you.
boost::scoped_lock(my_mutex);
boost::scoped_lock my_mutex;
I've seen misuse of the terms scope and lifetime.
Scope is where you can refer to a name without qualifying its name. Names have scopes, and objects inherit the scope of the name used to define them (thus sometimes the Standard says "local object"). A temporary object has no scope, because it's got no name. Likewise, an object created by new has no scope. Scope is a compile time property. This term is frequently misused in the Standard, see this defect report, so it's quite confusing to find a real meaning.
Lifetime is a runtime property. It means when the object is set up and ready for use. For a class type object, the lifetime begins when the constructor ends execution, and it ends when the destructor begins execution. Lifetime is often confused with scope, although these two things are completely different.
The lifetime of temporaries is precisely defined. Most of them end lifetime after evaluation of the full expression they are contained in (like, the comma operator of above, or an assignment expression). Temporaries can be bound to const references which will lengthen their lifetime. Objects being thrown in exceptions are temporaries too, and their lifetime ends when there is no handler for them anymore.
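Two of these rules can be observed with a minimal sketch. The `Noisy` type and the `trace()` log are hypothetical helpers that record constructor and destructor calls in order:

```cpp
#include <string>
#include <vector>

// Log of events, created on first use.
std::vector<std::string>& trace() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical type that records its own construction and destruction.
struct Noisy {
    explicit Noisy(const char* n) : name(n) { trace().push_back(name + " ctor"); }
    ~Noisy() { trace().push_back(name + " dtor"); }
    std::string name;
};

void demo() {
    trace().clear();
    Noisy("temp");                          // temporary: destroyed at the end of this statement
    trace().push_back("after temp");
    {
        const Noisy& ref = Noisy("bound");  // lifetime extended to that of ref
        trace().push_back("using " + ref.name);
    }                                       // "bound" is destroyed here
    trace().push_back("end");
}
```

After `demo()` runs, the log should read: "temp ctor", "temp dtor", "after temp", "bound ctor", "using bound", "bound dtor", "end". The unnamed temporary dies before the next statement, while the one bound to a const reference survives until the reference goes out of scope.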
You quoted the standard correctly. Let me emphasize:
A name declared in a block is local to that block. Its potential scope begins at its point of declaration and ends at the end of its declarative region.
You didn't declare any name, actually. Your line
MyClass (12345);
does not even contain a declaration! What it contains is an expression that creates an instance of MyClass, computes the expression (however, in this particular case there's nothing to compute), and casts its result to void, and destroys the objects created there.
A less confusing example would look like
call_a_function(MyClass(12345));
You have seen this many times and know how it works, don't you?