Related
In the question "Will a static variable always use up memory?" it is stated that compilers are allowed to optimize away a static variable if the address is never taken, e.g. as in the following:
void f() {
static int i = 3;
printf( "%d", i );
}
If there exists a function which takes its arguments by reference, is the compiler still allowed to optimize away the variable, e.g. as in
void ref( int & i ) {
printf( "%d", i );
}
void f() {
static int i = 3;
ref( i );
}
Is the situation different for the "perfect forwarding" case? Here the function body is empty on purpose:
template< typename T >
void fwd( T && i ) {
}
void f() {
static int i = 3;
fwd( i );
}
Furthermore, would the compiler be allowed to optimize away the call in the following case? (The function body is empty on purpose again.)
void ptr( int * i ) {
}
void f() {
static int i = 3;
ptr( &i );
}
My questions arise from the fact that, according to the standard, a reference is not a pointer, although it is usually implemented as one.
Apart from "is the compiler allowed to?", I am actually more interested in whether compilers really perform this kind of optimization.
that compilers are allowed to optimize away a static variable if the address is never taken
You seem to have concentrated on the wrong part of the answer. The answer states:
the compiler can do anything it wants to your code so long as the observable behavior is the same
The end. You can take the address, don't take it, calculate the meaning of life and calculate how to heal cancer, the only thing that matters is observable effect. As long as you don't actually heal cancer (or output the results of calculations...), all calculations are just no-op.
If there exists a function which takes its arguments by reference, is the compiler still allowed to optimize away the variable
Yes. The code is just putchar('3').
Is the situation different for the "perfect forwarding" case
No. The code is still just putchar('3').
would the compiler be allowed to optimize the call in the following case
Yes. This code has no observable effect, contrary to the previous ones. The call to f() can just be removed.
in whether compilers do this kind of optimization?
Copy your code to https://godbolt.org/ and inspect the assembly code. Even with no experience in assembly code, you will see differences with different code and compilers.
Choose x86 gcc (trunk) and remember to enable optimizations -O. Copy code with static, then remove static - did the code change? Repeat for all code snippets.
Compilers are allowed to optimize out variables under the "as-if" rule, meaning that the compiler is allowed to do any optimization that doesn't alter the observable behaviour of the program. Whether the optimization actually occurs depends on how good the compiler's optimizer is, what optimization level you request, and whether the optimization belongs to a class of optimizations that actually improve performance (humans are not very good at predicting this).
In all of the examples you gave, the as-if rule gives the compiler latitude to eliminate the static variable.
In example 1, the definition of f is equivalent to void f() { printf("%d", 3); }. Since this has the exact same observable behaviour as the f you wrote, the compiler is allowed to replace one by the other, optimizing out the variable.
In example 2, since fwd does nothing, the definition of f is equivalent to void f() {}. Again, the as-if rule allows the compiler to replace the f you wrote with this empty function.
Example 3 is very similar to example 2 in terms of the implications of the as-if rule.
If you want to see whether a compiler will actually perform these optimizations, Godbolt is very useful. For example, if you look here, you'll see that at -O2, both GCC and Clang will perform the optimization described for example 1. They probably do this by first inlining ref into f.
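For contrast, here is a minimal sketch of a case where the as-if rule gives the compiler much less room: the address of the static escapes and is then used in an observable way, so a real object has to exist. The names counter_slot, bump and demo are made up for this illustration and do not come from the question.

```cpp
#include <cassert>

// Hypothetical helper: the address of the static escapes to the caller,
// so every call has to yield the same object.
int* counter_slot() {
    static int i = 3;
    return &i;
}

// Observable use of the escaped address: the write through the pointer
// must be visible to later calls.
int bump() {
    int* p = counter_slot();
    *p += 1;
    return *p;
}

// Usage: call once, e.g. from main().
void demo() {
    assert(counter_slot() == counter_slot());  // same object every time
    assert(bump() == 4);
    assert(bump() == 5);
}
```

Whether a particular compiler still folds some of this away after inlining is best checked on Godbolt, as suggested above.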
How does an inline function differ from a preprocessor macro?
Preprocessor macros are just substitution patterns applied to your code. They can be used almost anywhere in your code because they are replaced with their expansions before any compilation starts.
Inline functions are actual functions whose body is directly injected into their call site. They can only be used where a function call is appropriate.
Now, as far as using macros vs. inline functions in a function-like context, be advised that:
Macros are not type safe, and can be expanded regardless of whether they are syntactically correct - the compile phase will report errors resulting from macro expansion problems.
Macros can be used in contexts where you don't expect them, resulting in problems.
Macros are more flexible, in that they can expand other macros - whereas inline functions don't necessarily do this.
Macros can result in side effects because of their expansion, since the input expressions are copied wherever they appear in the pattern.
Inline functions are not always guaranteed to be inlined - some compilers only do this in release builds, or when they are specifically configured to do so. Also, in some cases inlining may not be possible.
Inline functions can provide scope for variables (particularly static ones), preprocessor macros can only do this in code blocks {...}, and static variables will not behave exactly the same way.
First, preprocessor macros are just "copy paste" into the code before compilation. So there is no type checking, and some side effects can appear.
For example, if you want to compare 2 values:
#define max(a,b) ((a<b)?b:a)
The side effects appear if you use max(a++,b++) for example (a or b will be incremented twice).
Instead, use (for example)
inline int max( int a, int b) { return ((a<b)?b:a); }
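A small sketch of the double evaluation, using a counting helper instead of a++ so that the behaviour stays well defined. MAX_MACRO, max_inline, next_value and calls are names invented for this example.

```cpp
#include <cassert>

#define MAX_MACRO(a, b) (((a) < (b)) ? (b) : (a))

inline int max_inline(int a, int b) { return (a < b) ? b : a; }

// Hypothetical helper with a visible side effect: counts how often it runs.
int calls = 0;
int next_value() { ++calls; return 10; }

// Returns how many times next_value() ran for one macro "call".
int evaluations_with_macro() {
    calls = 0;
    // Expands to (((next_value()) < (5)) ? (5) : (next_value()))
    int m = MAX_MACRO(next_value(), 5);
    assert(m == 10);
    return calls;
}

// Returns how many times next_value() ran for one function call.
int evaluations_with_inline() {
    calls = 0;
    int m = max_inline(next_value(), 5);
    assert(m == 10);
    return calls;
}
```

With these definitions the macro version runs next_value() twice and the inline function runs it once.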
Inline functions are expanded by the compiler, whereas macros are expanded by the preprocessor, which performs mere textual substitution.
Hence,
There is no type checking during macro invocation while type checking is done during function call.
Undesired results and inefficiency may occur during macro expansion due to reevaluation of arguments and order of operations. For example:
#define MAX(a,b) ((a)>(b) ? (a) : (b))
int i = 5, j = MAX(i++, 0);
would result in
int i = 5, j = ((i++)>(0) ? (i++) : (0));
The macro arguments are not evaluated before macro expansion
#include <stdio.h>
#define MUL(a, b) a*b
int main()
{
// The macro is expanded as 2 + 3 * 3 + 5, not as 5*8
printf("%d", MUL(2+3, 3+5));
return 0;
}
// Output: 16
The return keyword cannot be used in macros to return values as in the case of functions.
Inline functions can be overloaded.
The tokens passed to macros can be concatenated using operator ## called Token-Pasting operator.
Macros are generally used for code reuse, whereas inline functions are used to eliminate the time overhead of a function call (avoiding a jump to a subroutine).
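To illustrate the token-pasting point from the list above, here is a hedged sketch. DEFINE_GETTER and the generated get_width / get_height functions are hypothetical names chosen for the example.

```cpp
#include <cassert>

// Hypothetical macro: ## pastes "get_" and the argument into one identifier.
#define DEFINE_GETTER(name, value) \
    inline int get_##name() { return value; }

DEFINE_GETTER(width, 640)   // defines get_width()
DEFINE_GETTER(height, 480)  // defines get_height()

// Usage: call once, e.g. from main().
void demo() {
    assert(get_width() == 640);
    assert(get_height() == 480);
    assert(get_width() * get_height() == 307200);
}
```

This is one of the few things an inline function cannot do, since a function has no way to build new identifiers.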
The key difference is type checking. The compiler will check whether what you pass as input values is of types that can be passed into the function. That's not true with preprocessor macros - they are expanded prior to any type checking and that can cause severe and hard to detect bugs.
Here are several other less obvious points outlined.
To add another difference to those already given: you can't step through a #define in the debugger, but you can step through an inline function.
Macros are ignoring namespaces. And that makes them evil.
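A minimal sketch of what "ignoring namespaces" means in practice. The namespaces lib and app and the names LIMIT and limit are assumptions made up for this illustration.

```cpp
#include <cassert>

namespace lib {
#define LIMIT 10            // looks scoped to lib, but the preprocessor knows nothing about namespaces
constexpr int limit = 10;   // a real member of namespace lib
}

namespace app {
constexpr int limit = 99;   // no clash: lib::limit and app::limit are distinct names
// constexpr int LIMIT = 99;  // would expand to "constexpr int 10 = 99;" and fail to compile
}

// Usage: call once, e.g. from main().
void demo() {
    assert(LIMIT == 10);        // visible everywhere after its definition
    assert(lib::limit == 10);
    assert(app::limit == 99);
}
```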
Although inline functions are similar to macros (because the function code is expanded at the point of the call at compile time), inline functions are parsed by the compiler, whereas macros are expanded by the preprocessor. As a result, there are several important differences:
Inline functions follow all the protocols of type safety enforced on normal functions.
Inline functions are specified using the same syntax as any other function except that they include the inline keyword in the function declaration.
Expressions passed as arguments to inline functions are evaluated once.
In some cases, expressions passed as arguments to macros can be evaluated more than once.
http://msdn.microsoft.com/en-us/library/bf6bf4cf.aspx
Macros are expanded before compilation, so you cannot step through them in a debugger, but you can step through inline functions.
-- good article:
http://www.codeguru.com/forum/showpost.php?p=1093923&postcount=1
To know the difference between macros and inline functions, first we should know what exactly they are and when we should use them.
FUNCTIONS:
int Square(int x)
{
return(x*x);
}
int main()
{
int value = 5;
int result = Square(value);
cout << result << endl;
}
Function calls have overhead associated with them. After the function finishes executing it needs to know where to return to, so it stores the return address on the stack before calling the function. For small applications this might not be a problem, but in, say, a financial application, where thousands of transactions are happening every second, a function call might be too expensive.
MACROS:
#define Square(x) x*x
int main()
{
int value = 5;
int result = Square(value);
cout << result << endl;
}
Macros are applied in the preprocessing stage. During this stage, uses of names defined with #define are replaced by their expansions, so the call above becomes:
int result = value*value;
But macros can cause unexpected behavior.
#define Square(x) x*x
int main()
{
int val = 5;
int result = Square(val + 1);
cout << result << endl;
}
Here the output is 11, not 36.
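The usual workaround, if a macro has to be used, is to parenthesize every parameter and the whole body. A small sketch follows, with SQUARE_UNSAFE and SQUARE_SAFE as made-up names for the two variants.

```cpp
#include <cassert>

#define SQUARE_UNSAFE(x) x*x          // as in the example above
#define SQUARE_SAFE(x) ((x) * (x))    // parameter and whole body parenthesized

// Usage: call once, e.g. from main().
void demo() {
    int val = 5;
    assert(SQUARE_UNSAFE(val + 1) == 11);  // val + 1*val + 1
    assert(SQUARE_SAFE(val + 1) == 36);    // (val + 1) * (val + 1)
}
```

Note that the parenthesized version still evaluates its argument twice, so it only fixes the precedence problem, not the side-effect problem.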
INLINE FUNCTIONS:
inline int Square(int x)
{
return x * x;
}
int main()
{
int val = 5;
int result = Square(val + 1);
cout << result << endl;
}
Output: 36
The inline keyword requests that the compiler replace the function call with the body of the function. Here the output is correct because it first evaluates the expression and then uses the result to perform the body of the function. Inline functions reduce the function call overhead as there is no need to store a return address or function arguments to the stack.
Comparison Between Macros and Inline Functions:
Macros work through text substitution, whereas inline functions duplicate the logic of a function.
Macros are error prone due to substitution while inline functions are safe to use.
Macros can't be assigned to function pointers; inline functions can.
Macros are difficult to use with multiple lines of code, whereas inline functions are not.
In C++ macros cannot be used with member functions whereas inline function could be.
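A short sketch of the function-pointer, overloading and member-function points from the comparison above. The names SQUARE_MACRO, square, Counter and apply are hypothetical.

```cpp
#include <cassert>

#define SQUARE_MACRO(x) ((x) * (x))

inline int square(int x) { return x * x; }
inline double square(double x) { return x * x; }  // overloads are fine for functions

struct Counter {
    int n = 0;
    // Member functions defined in the class body are implicitly inline.
    void add(int k) { n += k; }
};

// Applies a function through a pointer; a macro name cannot be passed here.
int apply(int (*fn)(int), int v) { return fn(v); }

// Usage: call once, e.g. from main().
void demo() {
    int (*fp)(int) = &square;        // selects the int overload
    assert(apply(fp, 7) == 49);
    assert(square(1.5) == 2.25);
    Counter c;
    c.add(3);
    assert(c.n == 3);
    // int (*bad)(int) = &SQUARE_MACRO;  // does not compile: a macro has no address
}
```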
CONCLUSION:
Inline functions are sometimes more useful than macros, as they are safe to use, but can also reduce function call overhead.
The inline keyword is only a request to the compiler; certain functions won't be inlined, such as:
large functions
functions having too many conditional arguments
recursive code and code with loops etc.
which is a good thing, because it allows the compiler to determine whether it would be better to do things another way.
An inline function will maintain value semantics, whereas a preprocessor macro just copies the syntax. You can get very subtle bugs with a preprocessor macro if you use the argument multiple times - for example if the argument contains mutation like "i++" having that execute twice is quite a surprise. An inline function will not have this problem.
An inline function behaves syntactically just like a normal function, providing type safety, a scope for function-local variables, and access to class members if it is a method.
Also when calling inline methods you must adhere to private/protected restrictions.
In GCC (I'm not sure about others), declaring a function inline, is just a hint to the compiler. It is still up to the compiler at the end of the day to decide whether or not it includes the body of the function whenever it is called.
The difference between inline functions and preprocessor macros is relatively large. Preprocessor macros are just text replacement at the end of the day. You give up a lot of the compiler's ability to perform type checking on the arguments and return type. Evaluation of the arguments is much different (if the expressions you pass into the functions have side effects you'll have a very fun time debugging). There are subtle differences about where functions and macros can be used. For example if I had:
#define MACRO_FUNC(X) ...
Where MACRO_FUNC obviously defines the body of the function. Special care needs to be taken so it runs correctly in all cases a function can be used, for example a poorly written MACRO_FUNC would cause an error in
if(MACRO_FUNC(y)) {
...body
}
A normal function could be used with no problem there.
From the perspective of coding, an inline function is like a function. Thus, the differences between an inline function and a macro are the same as the differences between a function and a macro.
From the perspective of compiling, an inline function is similar to a macro. It is injected directly into the code, not called.
In general, you should consider inline functions to be regular functions with some minor optimization mixed in. And like most optimizations, it is up to the compiler to decide if it actually cares to apply it. Often the compiler will happily ignore any attempts by the programmer to inline a function, for various reasons.
Inline functions may be compiled as ordinary function calls if they contain loops or recursion, which avoids duplicating their instructions at every call site. It's quite helpful for keeping down the overall size of your program.
#include<iostream>
using namespace std;
#define NUMBER 10 //macros are preprocessed while functions are not.
int number()
{
return 10;
}
/* With macros, no type checking (incompatible operands, etc.) is done, so using macros can lead to errors or side effects in some cases.
However, this is not the case with functions.
Also, a macro definition itself is not checked for errors; problems only surface where it is expanded. Consider: */
#define CUBE(b) b*b*b
int cube(int a)
{
return a*a*a;
}
int main()
{
cout<<NUMBER<<endl<<number()<<endl;
cout<<CUBE(1+3); //Unexpected output 10
cout<<endl<<cube(1+3);// As expected 64
return 0;
}
Macros avoid function call overhead, although an optimizing compiler will usually inline small functions as well.
Some Disadvantages of macros:
There is no type checking.
They are difficult to debug because they cause simple textual replacement.
Macros don't have namespaces, so a macro in one section of code can affect another section.
Macros can cause side effects, as shown in the CUBE() example above.
Macros are usually one-liners, although they can span more than one line. Functions have no such constraint.
void f() means that f returns nothing. If void returns nothing, then why do we use it? What is the main purpose of void?
When C was invented the convention was that, if you didn't specify the return type, the compiler automatically inferred that you wanted to return an int (and the same holds for parameters).
But often you write functions that do stuff and don't need to return anything (think e.g. about a function that just prints something on the screen); for this reason, it was decided that, to specify that you don't want to return anything at all, you have to use the void keyword as "return type".
Keep in mind that void serves also other purposes; in particular:
if you specify it as the list of parameters to a function, it means that the function takes no parameters; this was needed in C, because a function declaration without parameters meant to the compiler that the parameter list was simply left unspecified. In C++ this is no longer needed, since an empty parameter list means that no parameter is allowed for the function;
void also has an important role in pointers; void * (and its variations) means "pointer to something left unspecified". This is useful if you have to write functions that must store/pass pointers around without actually using them (only at the end, to actually use the pointer, a cast to the appropriate type is needed).
also, a cast to (void) is often used to mark a value as deliberately unused, suppressing compiler warnings.
int somefunction(int a, int b, int c)
{
(void)c; // c is reserved for future usage, kill the "unused parameter" warning
return a+b;
}
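As a sketch of the void * point above, here is the classic case of passing untyped pointers around and casting them back only where they are actually used. std::qsort is the standard library function; compare_ints and sort_ints are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <cstdlib>

// Comparator with the signature std::qsort requires: it receives untyped
// pointers and must cast them back to the real element type before use.
int compare_ints(const void* lhs, const void* rhs) {
    const int a = *static_cast<const int*>(lhs);
    const int b = *static_cast<const int*>(rhs);
    return (a > b) - (a < b);
}

void sort_ints(int* data, std::size_t count) {
    std::qsort(data, count, sizeof(int), compare_ints);
}

// Usage: call once, e.g. from main().
void demo() {
    int values[] = {3, 1, 2};
    sort_ints(values, 3);
    assert(values[0] == 1 && values[1] == 2 && values[2] == 3);
}
```

std::qsort itself never looks at the elements; it only stores and passes the pointers, which is exactly the role void * plays here.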
This question has to do with the history of the language: C++ borrowed from C, and C used to implicitly type everything untyped as int (as it turned out, it was a horrible idea). This included functions that were intended as procedures (recall that the difference between functions and procedures is that function invocations are expressions, while procedure invocations are statements). If I recall it correctly from reading the early C books, programmers used to patch this shortcoming with a #define:
#define void int
This convention has later been adopted in the C standard, and the void keyword has been introduced to denote functions that are intended as procedures. This was very helpful, because the compiler could now check if your code is using a return value from a function that wasn't intended to return anything, and to warn you about functions that should return but let the control run off the end instead.
In imperative programming languages such as C, C++, Java, etc., functions and methods of type void are used for their side effects. They do not produce a meaningful value to return, but they influence the program state in one of many possible ways. E.g., the exit function in C returns no value, but it has the side effect of terminating the application. As another example, a C++ class may have a void method that changes the value of its instance variables.
void() means return nothing.
void doesn't mean nothing. void is a type to represent nothing. That is a subtle difference : the representation is still required, even though it represents nothing.
This type is used as function's return type which returns nothing. This is also used to represent generic data, when it is used as void*. So it sounds amusing that while void represents nothing, void* represents everything!
Because sometimes you don't need a return value. That's why we use it.
If you didn't have void, how would you tell the compiler that a function doesn't return a value?
Consider situations where you have to do some calculation on global variables and store the results in a global variable, or where you want to print something depending on the arguments. In these situations you can use a function which doesn't return a value, i.e. void.
Here's an example function:
struct SVeryBigStruct
{
// a lot of data here
};
SVeryBigStruct foo()
{
SVeryBigStruct bar;
// calculate something here
return bar;
}
And now here's another function:
void foo2(SVeryBigStruct& bar) // or SVeryBigStruct* pBar
{
bar.member1 = ...
bar.member2 = ...
}
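The second function can be faster because it doesn't have to copy the whole struct, although compilers often elide that copy in the first version through return value optimization.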
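Whether the copy really happens can be checked with a small instrumented type. The sketch below uses Tracked as a hypothetical stand-in for SVeryBigStruct, with make_tracked and fill_tracked mirroring foo and foo2. It assumes C++11 or later, where a returned local is moved or elided rather than copied.

```cpp
#include <cassert>

// Hypothetical stand-in for SVeryBigStruct that counts copy constructions.
struct Tracked {
    static int copies;
    int member1 = 0;
    Tracked() = default;
    Tracked(const Tracked& other) : member1(other.member1) { ++copies; }
    Tracked(Tracked&& other) noexcept : member1(other.member1) {}
};
int Tracked::copies = 0;

// Return by value: the local is elided or moved into the result, never copied.
Tracked make_tracked() {
    Tracked bar;
    bar.member1 = 42;
    return bar;
}

// Out-parameter version, as in foo2 above.
void fill_tracked(Tracked& bar) { bar.member1 = 42; }

// Usage: call once, e.g. from main().
void demo() {
    Tracked a = make_tracked();
    Tracked b;
    fill_tracked(b);
    assert(a.member1 == 42 && b.member1 == 42);
    assert(Tracked::copies == 0);  // neither version ran the copy constructor
}
```

For a struct that is expensive to move as well (for example one holding large arrays by value), the cost depends on whether the compiler performs the elision, so measuring is still worthwhile.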
Probably to tell the compiler "you don't need to push and pop all CPU registers!"
Sometimes it can be used to print something, rather than to return it. See http://en.wikipedia.org/wiki/Mutator_method#C_example for examples
Functions are not required to return a value. To tell the compiler that a function does not return a value, a return type of void is used.
I have some short defines in one of my headers like this:
#define ROUND_DOWN(a,b) (a)-(a)%(b)
e.g.
ROUND_DOWN(178,32) = 160
But if I pass this to it:
ROUND_DOWN(160*2, 32);
then it gets compiled like this?
(160*2)-(160*2)%(32),
which is just more processing as it does 160*2 twice..
I'm wondering if inline functions behave in the same way? e.g.
inline int RoundDown(int a, int b)
{
return (a)-(a)%(b);
}
Would 160*2 get stored in "int a" as 320 and then the calculation would work, or would it behave the same as the define?
A better example is calling:
RoundDown((x+x2)*zoom, tile_width);
Do “#define” and inline behave the same?
No, they don't!
There are a number of differences between a macro and an inline function.
- Number of evaluations
Expressions passed as arguments to inline functions are evaluated once.
In some cases, expressions passed as arguments to macros can be evaluated more than once.
Every time you use an argument in a macro, that argument is evaluated.
A Code sample:
#define max(a,b) (a>b?a:b)
int main()
{
int a = 0;
int b = 1;
int c = max(a++, b++);
cout << a << endl << b << endl;
return 0;
}
The intention probably was to print 1 and 2, but macro expands to:
int c = a++ > b++ ? a++ : b++;
b gets incremented twice, and the program prints 1 and 3.
- Who evaluates them
Inline functions are processed by the compiler, while macros are expanded before compilation by the preprocessor.
- Type checking
Inline functions follow all the protocols of type safety enforced on normal functions.
Argument types are checked, and necessary conversions are performed correctly.
The compiler checks the return type and function signature before putting an inline function into the symbol table.
They can be overloaded to perform the right kind of operation for the right kind of data.
Macros are more error-prone than inline functions. The parameters are not typed (the macro works for any objects of arithmetic type).
No error checking is done during compilation.
A Code Sample:
#define MAX(a, b) ((a < b) ? b : a)
int main( void)
{
cout << "Maximum of 10 and 20 is " << MAX("20", "10") << endl;
return 0;
}
One can pass strings to a macro that does some integer arithmetic and a macro won't complain!
- Suggestion or Command?
Inline is just a suggestion to the compiler. It is the compiler’s decision whether to expand the function inline or not.
Macros will always be expanded.
- How about Debugging?
Inline functions can be debugged easily because you can put a break point at the inline function definition and step into the method for debugging step by step.
Macros cannot be stepped through in a debugger because they are expanded before compilation.
First, you should pretty much assume that all constant expressions are evaluated at compile-time, so that multiplication never survives to be executed when you run the program.
Second, you can't depend on inline having any effect at all, it's just a hint to the compiler, not a requirement.
But even if the function is not inlined, the expression would not be evaluated twice since argument passing requires it to be evaluated before the body of the function runs.
#defines are simple textual substitutions, so (as you noticed) you may need to be careful with parentheses, etc. inline parameters are parsed normally.
There's a related issue with respect to conditions.
Nominally, the function argument 160*2 is evaluated exactly once, and the result is then used in the body of the function, whereas the macro evaluates 160*2 twice. If the argument expression has side-effects, then you can see this[*]: ROUND_DOWN(printf("hi!\n"), 1); vs RoundDown(printf("hi!\n"), 1);
In practice, whether the function is inlined or the macro expanded, it's just integer arithmetic in the expression, with no side-effects. An optimizing compiler can work out the result of the whole macro/function call, and just stick the answer in the emitted code. So you might find that your macro and your inline function result in exactly the same code being executed, and so int a = ROUND_DOWN(160*2, 32); and int a = RoundDown(160*2, 32); might both be the same as int a = 320;.
Where there are no side-effects, optimization can also store and re-use intermediate results. So int c = ROUND_DOWN(a*2, b); might end up emitting code that looks as though you've written:
int tmp = a*2;
int c = tmp - tmp % b;
Note that whether to actually inline a function is a decision made by the compiler based on its own optimization rules. Those rules might take account of whether the function is marked inline or not, but quite likely don't unless you're using compiler options to force inlining or whatever.
So, assuming a decent compiler there's no reason to use a macro for this - for your macro in particular you're just begging for someone to come along and write:
int a = ROUND_DOWN(321, 32) * 2;
and then waste a few minutes wondering why the result is 319.
[*] Although don't get carried away - for some expressions with side-effects, for example i++ where i is an integer, the macro has undefined behavior due to lack of sequence points.
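The precedence trap mentioned above can be checked directly. This is only a sketch: ROUND_DOWN is the macro from the question, and round_down is a hypothetical function version (constexpr here, which assumes C++11 or later).

```cpp
#include <cassert>

#define ROUND_DOWN(a, b) (a)-(a)%(b)   // the macro as written in the question

// Function version: the body is a single expression with its own precedence.
constexpr int round_down(int a, int b) { return a - a % b; }

// Usage: call once, e.g. from main().
void demo() {
    assert(ROUND_DOWN(178, 32) == 160);
    assert(round_down(178, 32) == 160);
    assert(ROUND_DOWN(321, 32) * 2 == 319);   // parsed as 321 - ((321 % 32) * 2)
    assert(round_down(321, 32) * 2 == 640);   // (321 - 1) * 2
}
```

Wrapping the whole macro body in parentheses would fix this particular case, but the function version avoids the problem without anyone having to remember that.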
Well in the example you've given with constants, on any reasonable compiler both versions will compute the constant at compile time.
Assuming you're actually asking about cases where variables are passed in, I would expect the compiler's optimizer to generate the same code in both cases (it wouldn't do the multiplication twice if it's more efficient to save off the result). Also, the inline function does give the compiler the option to make an actual function call if it would improve performance.
Finally, note that I wouldn't worry about micro-optimizations like this because 99% of the time it's just going to have no effect on the performance of your program - I/O will be your bottleneck.
EDITED and refined my question after Johannes's valuable answer
bool b = true;
volatile bool vb = true;
void f1() { }
void f2() { b = false; }
void(* volatile pf)() = &f1; //a volatile pointer to function
int main()
{
//different threads start here, some of which may change pf
while(b && vb)
{
pf();
}
}
So, let's forget synchronization for a while. The question is whether b has to be declared volatile. I have read the standard and sort-of know the formal definition of volatile semantics (I even almost understand them, the word almost being the key). But let's be a bit informal here. If the compiler sees that in the loop there is no way for b to change then unless b is volatile, it can optimize it away and assume it is equivalent to while(vb). The question is, in this case pf is itself volatile, so is the compiler allowed to assume that b won't change in the loop even if b is not volatile?
Please refrain from comments and answers which address the style of this piece of code, this is not a real-world example, this is an experimental theoretical question.
Comments and answers which, apart from answering my question, also address the semantics of volatile in greater detail which you think I have misunderstood are very much welcome.
I hope my question is clear. TIA
Editing once more:
what about this?
bool b = true;
volatile bool vb = true;
void f1() {}
void f2() {b = false;}
void (*pf) () = &f1;
#include <iostream>
int main()
{
//threads here
while(b && vb)
{
int x;
std::cin >> x;
if(x == 0)
pf = &f1;
else
pf = &f2;
pf();
}
}
Is there a principal difference between the two programs? If yes, what is the difference?
The question is, in this case pf is itself volatile, so is the compiler allowed to assume that b won't change in the loop even if b is not volatile?
It can't, because you say that pf might be changed by the other threads, and this indirectly changes b if pf is then called by the while loop. So while it is theoretically not required to read b normally, in practice it must read it to determine whether it should short circuit (when b becomes false it must not read vb another time).
Answer to the second part
In this case pf is not volatile anymore, so the compiler can get rid of it and see that f1 has an empty body and f2 sets b to false. It could optimize main as follows
int main()
{
// threads here (which you say can only change "vb")
while(vb)
{
int x;
std::cin >> x;
if(x != 0)
break;
}
}
Answer to older revision
One condition for the compiler to be allowed to optimize the loop away is that the loop does not access or modify any volatile object (See [stmt.iter]p5 in n3126). You do that here, so it can't optimize the loop away. In C++03 a compiler wasn't allowed to optimize even the non-volatile version of that loop away (but compilers did it anyway).
Note that another condition for being able to optimize it away is that the loop contains no synchronization or atomic operations. In a multithreaded program, such should be present anyway though. So even if you get rid of that volatile, if your program is properly coded I don't think the compiler can optimize it away entirely.
The exact requirements on volatile in the current C++ standard in a case like this are, as I understand it, not entirely well-defined by the standard, since the standard doesn't really deal with multi-threading. It's basically a compiler hint. So, instead, I'll address what happens in a typical compiler.
First, suppose the compiler is compiling your functions independently, and then linking them together. In either example, you have a loop in which you're checking a variable, and calling a function pointer. Within the context of that function, the compiler has no idea what the function behind that function pointer will do, and thus it must always re-load b from memory after calling it. Thus, volatile is irrelevant there.
Expanding that to your first actual case, and allowing the compiler to make whole-program optimizations, because pf is volatile the compiler still has no idea what it's going to be pointing at (it can't even assume it's either f1 or f2!), and thus likewise cannot make any assumptions about what will be unmodified across the function-pointer call -- and so volatile on b is still irrelevant.
Your second case is actually simpler -- vb in it is a red herring. If you eliminate that, you can see that even in completely single-threaded semantics, the function-pointer call may modify b. You're not doing anything with undefined behavior, and so the program must operate correctly without volatile -- remember that, if you aren't considering a situation with external thread tweaks, volatile is a no-op. Therefore, without vb in the picture, you cannot possibly need volatile, and it's pretty clear that adding vb changes nothing.
Thus, in summary: You don't need volatile in either case. The difference, insofar as there is one, is that in the first case if pf were not volatile, a sufficiently-advanced compiler could possibly optimize b away, whereas it cannot even without volatile anywhere in the program in the second case. In practice, I do not expect any compilers would actually make that optimization.
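The single-threaded argument about the second program can be sketched as follows. This is an illustration under assumptions, not the asker's code: vb is dropped, the inputs come from an array instead of std::cin, and run_loop is a hypothetical wrapper around the loop.

```cpp
#include <cassert>

bool b = true;
void f1() {}
void f2() { b = false; }
void (*pf)() = &f1;

// Single-threaded variant of the second program. Returns how many times
// the loop body ran before b became false or the inputs ran out.
int run_loop(const int* inputs, int count) {
    b = true;
    int iterations = 0;
    for (int i = 0; i < count && b; ++i) {
        pf = (inputs[i] == 0) ? &f1 : &f2;
        pf();            // may modify b, so b is re-read by the loop condition
        ++iterations;
    }
    return iterations;
}

// Usage: call once, e.g. from main().
void demo() {
    const int inputs[] = {0, 0, 1, 0};
    assert(run_loop(inputs, 4) == 3);  // the third call is f2, which clears b
}
```

No volatile appears anywhere, and the loop still has to stop after f2 runs, because removing the read of b would change the observable behaviour.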
volatile only hurts you if you think you could have benefited from an optimization that can't be done or if it communicates something that isn't true.
In your case, you said that these variables can be changed by other threads. Reading code, that's my assumption when I see volatile, so from a maintainer's perspective, that's good -- it's giving me extra information (which is true).
I don't know whether the optimizations are worth trying to salvage since you said this isn't the real code, but if they aren't then there aren't any reasons to not use volatile.
Not using volatile when you are supposed to results in incorrect behavior, since the optimizations are changing the meaning of the code.
I worry about coding to the minutiae of the standard and the behavior of your compilers because things like this can change and even if they don't, your code changes (which could affect the compiler) -- so, unless you are looking for micro-optimization improvements on this specific code, I'd just leave it volatile.