Optimizing g++ when indexing an array with an injective function - c++

I have a for loop where each step i, it processes an array element p[f(i)], where f(i) is an injective (one-to-one) map from 1...n to 1...m (m > n). So there is no data coupling in the loop and all compiler optimization techniques such as pipelining can be used. But how can I inform g++ of the injectivity of f(i)? Or do I even need to (can g++ figure that out)?

Assuming that f doesn't rely on any global state and produces no side effects, you can tag it with the const attribute:
int f(int i) __attribute__((const));
If f does rely on global state but still has the property that it's a pure function of its inputs and global state (and produces no side effects), you can use the the slightly weaker pure attribute.
These attributes let gcc make more optimizations than it otherwise could, although I don't know if these will be helpful in your case. Take a look at the generated assembly code and see if they help.

You could also try processing the loop with a temporary storage array, i.e.:
temp[i]= process(p[f(i)]);
then copy the results back:
p[f(i)]= temp[i];
Assuming you declared p and temp to be restricted pointers, the compiler has enough information to optimize a little more aggressively.

If the definition of f() is in scope and is inlineable most any good compiler should first inline it into the function, then the next optimization passes should be able to rewrite the code as if the function call wasn't there.

Related

How to instruct VC++ compiler to not inline a constant?

I have the following global constant in my C++ program:
const int K = 123456 ;
When I compile the program, the resulting executable contains the literal value 123456 in all the places where the value is used (dozens of times).
But, if I remove the const qualifier, the value 123456 appears only once in the entire executable (in the .data section).
This is the result I'm looking for. I want the value 123456 to appear only once so that it can be changed simply by editing the .exe file with a HEX editor.
However, I don't want to remove the const qualifier because I want the compiler to prevent me from accidentally modifying the constant in the source code.
Is it possible to instruct the compiler somehow to not inline the value of said constant?
The reason I need to do this is so that the executable is easily modifiable by students who will be tasked with "cracking" an example program to alter its behavior. The exercise must be simple enough for inexperienced people.
If you don't want K to be inlined then put this in a header file:
extern const int K;
This means "K is defined somewhere else". Then put this in a cpp file:
const int K = 123456;
In all the places where K is used, the compiler only knows that K is a const int declared externally. The compiler doesn't know the value of K so it cannot be inlined. The linker will find the definition of K in the cpp file put it in the .data section of the executable.
Alternatively, you could define K like this:
const volatile int K = 123456;
This means "K might magically change so you better not assume its value". It has a similar effect to the previous approach as the compiler won't inline K because it can't assume that K will always be 123456. The previous approach would fail if LTO was enabled but using volatile should work in that case.
I must say, this is a really weird thing to do. If you want to make your program configurable, you should put the value of K into a text file and then read the file at startup.
The simplest option is probably to declare it as global without const, so the compiler can't assume that it still has the value of the static initializer.
int K = 123456;
Even link-time optimization can't know that a library function doesn't access this global, assuming you call any in your program.
If your used static int K = 123456;, the compiler could notice that no functions in the compilation unit write the value, and none of them pass or return its address, so escape analysis for the whole compilation unit could discover that it was effectively a constant and could be optimized away.
(If you really wanted it to be static int K;, include a global function like void setK(int x){K=x;} that you never actually call. Without Link-Time Optimization, the compiler will have to assume that something outside this compilation unit could have called this function and changed K, and that any call to a function whose definition isn't visible might result in such a call.)
Beware that volatile const int K = 123456; can hurt optimization significantly more than making it non-const, especially if you have expressions that use K multiple times.
(But either of these can hurt a lot, depending on what optimizations were possible. Constant-propagation can be a huge win.)
The compiler is required to emit asm that loads exactly K once for each time the C abstract machine reads it. (e.g. reading K is considered a visible side-effect, like a read from an MMIO port or a location you have a hardware watchpoint on.)
If you want to let a compiler load it once per loop, and assume K is a loop invariant, then code that uses it should do int local_k = K;. It's up to you how often you want to re-read K, i.e. what scope you do / redo local_k = K at.
On x86, using a memory source operand that stays hot in L1d cache is probably not much of a performance problem, but it will prevent auto-vectorization.
The reason I need to do this is so that the executable is easily modifiable by students who will be tasked with "cracking" an example program to alter its behavior. The exercise must be simple enough for inexperienced people.
For this use-case, yes volatile is exactly what you want. Having all uses re-read from memory on the spot makes it slightly simpler than following the value cached in a register.
And performance is essentially irrelevant, and you won't want auto-vectorization. Probably just light optimization so the students don't have to wade through store/reload of everything after every C++ statement. Like gcc's -Og would be ideal.
With MSVC, maybe try -O1 or -O2 and see if it does anything confusing. I don't think it has options for some but not too aggressive optimization, it might be either debug build (nice for single-stepping the C++ source, bad for reading asm), or fully optimized for size or speed.
Try declaring the constant as volatile. That should result in a single and changeable value that won't be inlined.

Does c optimize the check portion of for loops?

With the following code how many times would the min function actually be called
for (int i = 0; i < min(size, max_size); i++) {
//Do something cool that does not involve changing the value of size or max size
}
Would the compiler notice that they could just calculate the minimum and register it or should I explicitly create a variable to hold the value before entering the loop? What kinds of languages would be able to optimize this?
As an extension if I were in an object oriented language with a similar loop except it looked more like this
for (int i = 0; i < object.coolFunc(); i++) {
//Code that may change parameters and state of object but does not change the return value of coolFunc()
}
What would be optimized?
Any good compiler will optimize the controlling expression of a for loop by evaluating visibly invariant subexpressions in it just once, provided optimization is enabled. Here, “invariant” means the value of the subexpression does not change while the loop is executing. “Visibly” means the compiler can see that the expression is invariant. There are things that can interfere with this:
Suppose, inside the loop, some function is called and the address of size is passed as an argument. Since the function has the address of size, it could change the contents of size. Maybe the function does not do this, but the compiler might not be able to see the contents of the function. Its source code could be in another file. Or the function could be so complicated the compiler cannot analyze it. Then the compiler cannot see that size does not change.
min is not a standard C function, so your program must define it somewhere. As above, if the compiler does not know what min does or if it is too complicated for the compiler to analyze (not likely in this particular case, but in general), the compiler might not be able to see that it is a pure function.
Of course, the C standard does not guarantee this optimization. However, as you become experienced in programming, your knowledge of compilers and other tools should grow, and you will become familiar with what is expected of good tools, and you will also learn to beware of issues such as those above. For simple expressions, you can expect the compiler to optimize. But you need to remain alert to things that can interfere with optimization.

Why gcc and clang both don't emit any warning?

Suppose we have code like this:
int check(){
int x = 5;
++x; /* line 1.*/
return 0;
}
int main(){
return check();
}
If line 1 is commented out and the compiler is started with all warnings enabled, it emits:
warning: unused variable ‘x’ [-Wunused-variable]
However if we un-comment line 1, i.e. increase x, then no warning is emitted.
Why is that? Increasing the variable is not really using it.
This happen in both GCC and Clang for both c and c++.
Yes.
x++ is the same as x = x+1;, the assignment. When you are assigning to something, you possibly can not skip using it. The result is not discarded.
Also, from the online gcc manual, regarding -Wunused-variable option
Warn whenever a local or static variable is unused aside from its declaration.
So, when you comment the x++;, it satisfies the condition to generate and emit the warning message. When you uncomment, the usage is visible to the compiler (the "usefulness" of this particular "usage" is questionable, but, it's an usage, nonetheless) and no warning.
With the preincrement you are incrementing and assigning the value to the variable again. It is like:
x=x+1
As the gcc documentation says:
-Wunused-variable:
Warn whenever a local or static variable is unused aside from its declaration.
If you comment that line you are not using the variable aside of the line in which you declare it
increasing variable not really using it.
Sure this is using it. It's doing a read and a write access on the stored object. This operation doesn't have any effect in your simple toy code, and the optimizer might notice that and remove the variable altogether. But the logic behind the warning is much simpler: warn iff the variable is never used.
This has actually the benefit that you can silence that warning in cases where it makes sense:
void someCallback(void *data)
{
(void)data; // <- this "uses" data
// [...] handler code that doesn't need data
}
Why is that? increasing variable not really using it.
Yes, it is really using it. At least from the language point of view. I would hope that an optimizer removes all trace of the variable.
Sure, that particular use has no effect on the rest of the program, so the variable is indeed redundant. I would agree that warning in this case would be helpful. But that is not the purpose of the warning about being unused, that you mention.
However, consider that analyzing whether a particular variable has any effect on the execution of the program in general is quite difficult. There has to be a point where the compiler stops checking whether a variable is actually useful. It appears that the stages that generate warnings of the compilers that you tested only check whether the variable is used at least once. That once was the increment operation.
I think there is a misconception about the word 'using' and what the compiler means with that. When you have a ++i you are not only accessing the variable, you are even modifying it, and AFAIK this counts as 'use'.
There are limitations to what the compiler can identify as 'how' variables are being used, and if the statements make any sense. In fact both clang and gcc will try to remove unnecessary statements, depending on the -O-flag (sometimes too aggressively). But these optimizations happen without warnings.
Detecting a variable that is never ever accessed or used though (there is no further statement mentioning that variable) is rather easy.
I agree with you, it could generate a warning about this. I think it doesn't generate a warning, because developers of the compilers just didn't bothered handling this case (yet). Maybe it is because it is too complicated to do. But maybe they will do this in the future (hint: you can suggest them this warning).
Compilers getting more and more warnings. For example, there is -Wunused-but-set-variable in GCC (which is a "new" warning, introduced in GCC 4.6 in 2011), which warns about this:
void fn() {
int a;
a = 2;
}
So it is completely fine to expect that this emits a warning too (there is nothing different here, neither codes do anything useful):
void fn() {
int a = 1;
a++;
}
Maybe they could add a new warning, like -Wmeaningless-variable
As per C standard ISO/IEC 9899:201x, expressions evaluation are always executed to allow for expression's side effects to be produced unless the compiler can't be sufficiently sure that removing it the program execution is not altered.
5.1.2.3 Program execution
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
When removing the line
++x;
The compiler can deduce that the local variable x is defined and initialized, but not used.
When you add it, the expression itself can be considered a void expression, that must be evaluated for side effects, as stated in:
6.8.3 Expression and null statements
The expression in an expression statement is evaluated as a void expression for its side effects.
On the other hand to remove compiler warnings relative to unused variable is very common to cast the expression to void. I.e. for an unused parameter in a function you can write:
int MyFunc(int unused)
{
(void)unused;
...
return a;
}
In this case we have a void expression that reference the symbol unused.

Compiler Optimization with Parameters

Lets say you have some functions in some classes are called together like this
myclass::render(int offset_x, int offset_y)
{
otherClass.render(offset_x, offset_y)
}
This pattern will repeat for a while possibly through 10+ classes, so my question is:
Are modern C++ compilers smart enough to recognise that wherever the program stores function parameters - From what wikipedia tells me it seems to vary depending on parameter size, but that for a 2 parameter function the processor register seems likely - doesn't need to be overridden with new values?
If not I might need to look at implementing my own methods
I think it's more likely that the compiler will make a larger-scale optimization. You'd have to examine the actual machine code produced, but for example the following trivial attempt:
#include <iostream>
class B {
public:
void F( int x, int y ) {
std::cout << x << ", " << y << std::endl;
}
};
class A {
B b;
public:
void F( int x, int y ) {
b.F( x, y );
}
};
int main() {
A a;
a.F( 32, 64 );
}
causes the compiler (cl.exe from VS 2010, empty project, vanilla 'Release' configuration) to produce assembly that completely inlines the call tree; you basically get "push 40h, push 20h, call std::operator<<."
Abusing __declspec(noinline) causes cl.exe to realize that A::F just forwards to B::F and the definition of A::F is nothing but "call A::F" without stack or register manipulation at all (so in that case, it has performed the optimization you're asking about). But do note that my example is extremely contrived and so says nothing about the compiler's ability to do this well in general, only that it can be done.
In your real-world scenario, you'll have to examine the disassembly yourself. In particular, the 'this' parameter needs to be accounted for (cl.exe usually passes it via the ECX register) -- if you do any manipulation of the class member variables that may impact the results.
Yes, it is. The compiler performs dataflow analysis before register allocation, keeping track of which data is where at which time. And it will see that the arg0 location contains the value that needs to be in the arg0 location in order to call the next function, and so it doesn't need to move the data around.
I'm not a specialist, but it looks a lot like the perfect forwarding problem that will be solved in the next standard (C++0x) by using rvalue-references.
Currently I'd say it depend on the compiler, but I guess if the function and the parametters are simple enough then yes the function will serve as a shortcut.
If this function is imlpemented directly in the class definition (and then becoming implicitely candidate for inlining) it might be inligned, making the call directly call the wanted function instead of having two runtime calls.
In spite of your comment, I think that inlining is germane to this discussion. I don't believe that C++ compilers will do what you're asking (reuse parameters on the stack) UNLESS it also inlines the method completely.
The reason is that if it's making a real function call it still has to put the return address onto the stack, thus making the previous call's parameters no longer at the expected place on the stack. Thus in turn is has to put the parameters back on the stack a second time.
However I really wouldn't worry about that. Unless you're making a ridiculous number of function calls like this AND profiling shows that it's spending a large proportion of its time on these calls they're probably extremely minimal overhead and you shouldn't worry about it. For a function that small however, mark it inline and let the compiler decide if it can inline it away completely.
If I understand the question correctly, you are asking "Are most compilers smart enough to inline a simple function like this", and the answer to that question is yes. Note however the implicit this paremeter which is part of your function (because your function is part of a class), so it might not be completely inlineable if the call level is deep enough.
The problem with inlining is that the compiler will probably only be able to do this for a given compilation unit. The linker is probably less likely to be clever enough to inline from one compilation unit to another.
But given the total trivial nature of the function and that both functions have exactly the same arguments in the same order, the cost of the function call will probably be only one machine instruction viz. an additional branch (or jump) to the true implementation. There is no need to even push the return address onto the stack.

Two questions about inline functions in C++

I have question when I compile an inline function in C++.
Can a recursive function work with inline. If yes then please describe how.
I am sure about loop can't work with it but I have read somewhere recursive would work, If we pass constant values.
My friend send me some inline recursive function as constant parameter and told me that would be work but that not work on my laptop, no error at compile time but at run time display nothing and I have to terminate it by force break.
inline f(int n) {
if(n<=1)
return 1;
else {
n=n*f(n-1);
return n;
}
}
how does this work?
I am using turbo 3.2
Also, if an inline function code is too large then, can the compiler change it automatically in normal function?
thanks
This particular function definitely can be inlined. That is because the compiler can figure out that this particular form of recursion (tail-recursion) can be trivially turned into a normal loop. And with a normal loop it has no problem inlining it at all.
Not only can the compiler inline it, it can even calculate the result for a compile-time constant without generating any code for the function.
With GCC 4.4
int fac = f(10);
produced this instruction:
movl $3628800, 4(%esp)
You can easily verify when checking assembly output, that the function is indeed inlined for input that is not known at compile-time.
I suppose your friend was trying to say that if given a constant, the compiler could calculate the result entirely at compile time and just inline the answer at the call site. c++0x actually has a mechanism for this called constexpr, but there are limits to how complex the code is allowed to be. But even with the current version of c++, it is possible. It depends entirely on the compiler.
This function may be a good candidate given that it clearly only references the parameter to calculate the result. Some compilers even have non-portable attributes to help the compiler decide this. For example, gcc has pure and const attributes (listed on that page I just linked) that inform the compiler that this code only operates on the parameters and has no side effects, making it more likely to be calculated at compile time.
Even without this, it will still compile! The reason why is that the compiler is allowed to not inline a function if it decides. Think of the inline keyword more of a suggestion than an instruction.
Assuming that the compiler doesn't calculate the whole thing at compile time, inlining is not completely possible without other optimizations applied (see EDIT below) since it must have an actual function to call. However, it may get partially inlined. In that case the compiler will inline the initial call, but also emit a regular version of the function which will get called during recursion.
As for your second question, yes, size is one of the factors that compilers use to decide if it is appropriate to inline something.
If running this code on your laptop takes a very long time, then it is possible that you just gave it very large values and it is simply taking a long time to calculate the answer... The code look ok, but keep in mind that values above 13! are going to overflow a 32-bit int. What value did you attempt to pass?
The only way to know what actually happens is to compile it an look at the assembly generated.
PS: you may want to look into a more modern compiler if you are concerned with optimizations. For windows there is MingW and free versions of Visual C++. For *NIX there is of course g++.
EDIT: There is also a thing called Tail Recursion Optimization which allows compilers to convert certain types of recursive algorithms to iterative, making them better candidates for inlining. (In addition to making them more stack space efficient).
Recursive function can be inlined to certain limited depth of recursion. Some compilers have an option that lets you to specify how deep you want to go when inlining recursive functions. Basically, the compiler "flattens" several nested levels of recursion. If the execution reaches the end of "flattened" code, the code calls itself in usual recursive fashion and so on. Of course, if the depth of recursion is a run-time value, the compiler has to check the corresponding condition every time before executing each original recursive step inside the "flattened" code. In other words, there's nothing too unusual about inlining a recursive function. It is like unrolling a loop. There's no requirement for the parameters to be constant.
What you mean by "I am sure about loop can't work" is not clear. It doesn't seem to make much sense. Functions with a loop can be easily inlined and there's nothing strange about it.
What are you trying to say about your example that "displays nothing" is not clear either. There is nothing in the code that would "display" anything. No wonder it "displays nothing". On top of that, you posted invalid code. C++ language does not allow function declarations without an explicit return type.
As for your last question, yes, the compiler is completely free to implement an inline function as "normal" function. It has nothing to do with function being "too large" though. It has everything to do with more-or-less complex heuristic criteria used by that specific compiler to make the decision about inlining a function. It can take the size into account. It can take other things into account.
You can inline recursive functions. The compiler normally unrolls them to a certain depth- in VS you can even have a pragma for this, and the compiler can also do partial inlining. It essentially converts it into loops. Also, as #Evan Teran said, the compiler is not forced to inline a function that you suggest at all. It might totally ignore you and that's perfectly valid.
The problem with the code is not in that inline function. The constantness or not of the argument is pretty irrelevant, I'm sure.
Also, seriously, get a new compiler. There's modern free compilers for whatever OS your laptop runs.
One thing to keep in mind - according to the standard, inline is a suggestion, not an absolute guarantee. In the case of a recursive function, the compiler would not always be able to compute the recursion limit - modern compilers are getting extremely smart, a previous response shows the compiler evaluating a constant inline and simply generating the result, but consider
bigint fac = factorialOf(userInput)
there's no way the compiler can figure that one out........
As a side note, most compilers tend to ignore inlines in debug builds unless specifically instructed not to do so - makes debugging easier
Tail recursions can be converted to loops as long as the compiler can satisfactorily rearrange the internal representation to get the recursion conditional test at the end. In this case it can do the code generation to re-express the recursive function as a simple loop
As far as issues like tail recursion rewrites, partial expansions of recursive functions, etc, these are usually controlled by the optimization switches - all modern compilers are capable of pretty signficant optimization, but sometimes things do go wrong.
Remember that the inline key word merely sends a request, not a command to the compiler. The compliler may ignore yhis request if the function definition is too long or too complicated and compile the function as normal function.
in some of the cases where inline functions may not work are
For functions returning values, if a loop, a switch or a goto exists.
For functions not returning values, if a return statement exists.
If function contains static variables.
If in line functions are recursive.
hence in C++ inline recursive functions may not work.