I was recently reading about operator overloading in C++, and I was wondering whether the built-in operators are replaced by function calls behind the scenes.
For example, is a + b (where a and b are of type int) replaced by a.operator+(b)? Or does the compiler do something different?
There is no int::operator+. Whether the compiler chooses to compile a + b directly to assembly (likely) or replace it with some internal function like int __add_ints(int, int) (unlikely) is an implementation detail.
The internals of the compiler are complex. On a conceptual level, the answer is YES. Whenever a compiler sees a + b, it does have to check for known functions with the name operator+ and replace it with a call to the right function.
In practice, there are two important nuances:
The compiler knows about the fundamental types (whose operators you can't overload), so it doesn't need to insert a function call; it can immediately emit the right instructions.
Inlining is an important optimization, which removes the function call wherever that is worthwhile.
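To make the contrast concrete, here is a minimal sketch (Meters is a made-up type for illustration): for a class type, a + b really is a call to operator+, while for ints the compiler emits the add directly.
#include <iostream>

struct Meters {
    double value;
    Meters operator+(Meters rhs) const { return Meters{value + rhs.value}; }
};

int main() {
    Meters x{1.5}, y{2.5};
    Meters sum1 = x + y;          // syntactic sugar for the line below
    Meters sum2 = x.operator+(y); // explicit call, identical meaning

    int a = 1, b = 2;
    int c = a + b;                // no function call: the compiler emits
                                  // an add instruction directly
    std::cout << sum1.value << ' ' << sum2.value << ' ' << c << '\n';
}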
Maybe. Many arithmetic operations map directly onto CPU instructions, and the compiler will just generate the appropriate code in place. If that’s not possible, the compiler will generate a call to an appropriate function, and the runtime library will have a definition of that function. Back in the olden days, floating-point math was usually done with function calls. These days, CPUs for desktop systems have floating-point hardware, and floating-point math is compiled to direct CPU instructions. But embedded systems often don’t have hardware for that, so the compiler generates function calls instead.
Back in the really early days, even integer math was sometimes done with function calls. Because of this, the IBM 1620 was sometimes referred to as the CADET: Can’t Add, Doesn’t Even Try.
#include <iostream>

int foo() {
    std::cout << "foo() is called\n";
    return 9;
}

int bar() {
    std::cout << "bar() is called\n";
    return 18;
}

int main() {
    std::cout << foo() << ' ' << bar() << ' ' << '\n';
}
// Above program's behaviour is unspecified
// clang++ evaluates function arguments from left to right: http://melpon.org/wandbox/permlink/STnvMm1YVrrSRSsB
// g++ & MSVC++ evaluate function arguments from right to left
// so either foo() or bar() can be called first, depending upon the compiler.
The output of the above program is compiler dependent: the order in which function arguments are evaluated is unspecified. The reason I've read for this is that it can result in highly optimized code. How does not specifying an exact order of evaluation of function arguments help the compiler generate optimized code?
AFAIK, the order of evaluation is strictly specified in languages such as Java, C#, D, etc.
I think the whole premise of the question is wrong:
How does not specifying an exact order of evaluation of function arguments help C and C++ compilers generate optimized code?
It is not about optimizing code (though it does allow that). It is about not penalizing compilers because the underlying hardware has certain ABI constraints.
Some systems depend on parameters being pushed onto the stack in reverse order, while others depend on forward order. C++ runs on all kinds of systems with all kinds of constraints. If you enforce an order at the language level, you require some systems to pay a penalty to enforce that order.
The first rule of C++ is "If you don't use it then you should not have to pay for it". So enforcing an order would be a violation of the prime directive of C++.
It doesn't. At least, it doesn't today. Maybe it did in the past.
A proposal for C++17 suggests defining left-to-right evaluation order for function calls, operator<< and so on.
As described in Section 7 of that paper, this proposal was tested by compiling the Windows NT kernel, and it actually led to a speed increase in some benchmarks. The authors' comment:
It is worth noting that these results are for the worst case scenario where the optimizers have not yet been updated to be aware of, and take advantage of the new evaluation rules and they are blindly forced to evaluate function calls from left to right.
suggests that there is further room for speed improvement.
The order of evaluation is related to the way arguments are passed. If the stack is used to pass arguments, evaluating right to left helps performance, since this is the order in which arguments are pushed onto the stack.
For example, with the following code:
foo(bar(), baz());
Assuming the calling convention passes arguments on the stack, the C calling convention requires arguments to be pushed onto the stack starting from the last one, so that when the callee reads them it pops the first argument first and can support variadic functions. If the order of evaluation were left to right, the result of bar() would have to be saved in a temporary, then baz() called and its result pushed, followed by a push of the temporary. Right-to-left evaluation allows the compiler to avoid the temporary.
If arguments are passed through registers, the order of evaluation is not overly important.
The original reason that the C and C++ standards didn't specify an order of evaluation for function arguments was to provide more optimization opportunities for the compiler. Unfortunately, this rationale was not backed up by extensive experimentation at the time these languages were initially designed. But it made sense.
This issue has been raised in the past few years. See this blog post by Herb Sutter and don't forget to go through the comments.
Proposal P0145R1 suggests that it's better to specify an order of evaluation for function arguments and for other operators. It says:
The order of expression evaluation, as it is currently specified in the standard, undermines advices, popular programming idioms, or the relative safety of standard library facilities. The traps aren’t just for novices or the careless programmer. They affect all of us indiscriminately, even when we know the rules.
You can find more information about how this affects optimization opportunities in that document.
In the past few months, there has been a very extensive discussion about how this change in the language affects optimization, compatibility and portability. The thread begins here and continues here. You can find numerous examples there.
I have limited knowledge of assembly, but I can at least read through it and match it with the corresponding C or C++ code. I can see that function arguments are passed either by pushing them onto the stack or in registers, and that the function body uses some registers for its operations. But it also seems to use the same registers that were used in the caller. Does this mean the caller has no guarantee that the state of the registers will be the same after a function call? What if the whole body of the function is unknown during compilation? How does the compiler deal with this?
The compiler-generated assembly code follows some calling convention. A calling convention typically specifies:
how arguments are passed to the function
how return values are passed from the called function to the caller
which registers must be preserved across a function call and which may be modified
If all functions being called follow the same calling convention, no problems with using the same registers should occur.
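As a rough sketch of how this plays out (assuming the x86-64 System V ABI, where registers such as rax and rdx are caller-saved while rbx and r12-r15 are callee-saved), consider compiling the following with g++ -O2 -S and inspecting the output:
long g(long x);              // body unknown at this call site

long f(long a, long b) {
    long t = a * b;          // t is computed into some register
    long r = g(t);           // by convention, g may clobber every
                             // caller-saved register, so if t is needed
                             // after the call the compiler moves it into
                             // a callee-saved register or spills it
    return r + t;
}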
As the comments allude to, the fact is that there is no standard for this. It is left entirely to the implementors of the particular C++ compiler you are using.
A more explicit question would be: "When compiling with version N of compiler A, with compiler options B, calling a function with signature C, for target CPU D, using ABI E, what are the guarantees vis-a-vis register preservation?"
In that case, an expert on (or the manual for) that particular toolset can answer.
As you can probably infer, for any kind of industrial-strength project, it's the wrong question to ask, because as your compiler evolves the answer will change, and you don't want that fact to impact the reliability of your program.
It's a good question, because it's nice to know what the compiler is doing under the hood - it aids learning.
But on the whole, the golden rule is to express clear uncomplicated logic to the compiler in your program, and allow the compiler to handle the details of turning that logic into optimised machine code, at which modern compilers are excellent.
I am debugging a transaction-processing system which is performance sensitive.
I found code which uses __builtin_memcpy and __builtin_memset instead of memcpy and memset.
What are __builtin_ functions for?
Is it to prevent dependency problems on the architecture or compiler?
Or is there a performance reason why __builtin_ functions are preferred?
thank you :D
Like traditional library functions, the standard memcpy is just a call to a function. Unfortunately, memcpy is often called for very small copies, and the overhead of calling a function, shuffling a few bytes and returning is quite large relative to the work done (especially since memcpy adds extra code at the beginning of the function to deal with unaligned memory, loop unrolling, etc., to do well on LARGE copies).
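For example, with optimization enabled, mainstream compilers typically expand a small constant-size memcpy inline instead of emitting a call (a sketch; the exact code generated varies by compiler and target):
#include <cstring>

struct Packet { char header[8]; char payload[1024]; };

void copy_header(Packet& dst, const Packet& src) {
    // The size is a small compile-time constant, so this usually becomes
    // a single 8-byte load/store pair rather than a call to memcpy.
    std::memcpy(dst.header, src.header, sizeof dst.header);
}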
So, for the compiler to optimise those calls, it needs to "know" how to do, for example, memcpy. The solution for this is to have a function "built in" to the compiler, which then contains code such as this:
int generate_builtin_memcpy(expr arg1, expr arg2, expr size)
{
    if (is_constant(size) && eval(size) < SOME_NUMBER)
    {
        ... do magic inline memory copy ...
    }
    else
    {
        ... call "real" memcpy ...
    }
}
[For retargetable compilers, there is typically one of these functions for each CPU architecture, that has different configurations as to what conditions the "real" memcpy gets called, or when an inline memcpy is used.]
The key here is that you MAY actually write your own memcpy function that ISN'T based on __builtin_memcpy(); your replacement is ALWAYS a real function, and doesn't have to do the same thing as the normal memcpy [you'd be in a bit of trouble if you changed its behaviour a lot, since the C standard library probably calls memcpy in a few thousand places - but, for example, gathering statistics on how many times memcpy is called, and with what sizes, could be one such use-case].
Another big reason for using __builtin_* is that they provide code that would otherwise have to be written in inline assembler, or possibly not available at all to the programmer. Setting/getting special registers would be such a thing.
There are other techniques to solve this problem; for example, clang has a library-call optimization pass that replaces calls to common library functions with cheaper alternatives: since printf is much "heavier" than puts, it replaces suitable printf("constant string with no formatting\n") calls with puts("constant string with no formatting"), and many trigonometric and other math functions are folded to simple constant values when called with constants, etc.
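As a concrete illustration, the first call below may end up compiled as if it were the second (whether the substitution actually fires depends on the compiler and optimization level):
#include <cstdio>

int main() {
    std::printf("constant string with no formatting\n"); // as written
    std::puts("constant string with no formatting");     // what the optimizer
                                                          // may substitute
}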
Calling __builtin_* directly for functions like memcpy or sin or some such is probably the WRONG thing to do - it just makes your code less portable and not at all certain to be faster. Calling __builtin_special_function when there is no other way is typically the solution in some tricky situations - but you should probably wrap it in your own function, e.g.
int get_magic_property()
{
    return __builtin_get_magic_property();
}
That way, when you port to Windows, you can easily do:
int get_magic_property()
{
#if WIN32
    return Win32GetMagicPropertyEx();
#else
    return __builtin_get_magic_property();
#endif
}
__builtin_* functions are optimised functions provided by the compiler libraries. These might be builtin versions of standard library functions, such as memcpy, and perhaps more typically some of the maths functions.
Alternatively, they might be highly optimised functions for typical tasks on that particular target - e.g. a DSP compiler might provide built-in FFT functions.
Which functions are provided as __builtin_ functions is determined by the developers of the compiler, and will be documented in the compiler's manuals.
Different CPU types and compilers are designed for different use cases, and this will be reflected in the range of built-in functions provided.
Built-in functions might make use of specialised instructions in the target processor, or might trade off accuracy for speed by using lookup tables rather than calculating values directly, or any other reasonable optimisation, all of which should be documented.
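A familiar example on GCC and Clang is __builtin_popcount: on CPUs with a population-count instruction it compiles to that single instruction, and elsewhere to a small library routine.
#include <cstdio>

int main() {
    unsigned x = 0xF0F0u;
    // Counts the set bits; maps to a single popcnt instruction
    // on targets that have one.
    std::printf("%d\n", __builtin_popcount(x)); // prints 8
}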
These are definitely not for reducing dependency on a particular compiler or CPU; in fact, quite the opposite: they add a dependency, and so they might be wrapped up in preprocessor checks, e.g.
#ifdef SOME_CPU_FLAG
#define MEMCPY __builtin_memcpy
#else
#define MEMCPY memcpy
#endif
On a compiler note: __builtin_memcpy can fall back to emitting a memcpy function call. It also gives less-capable compilers the ability to simplify, by choosing the slow path of unconditionally emitting a memcpy call.
http://lwn.net/Articles/29183/
Is there a difference if it is the first use of the variable or not? For example, are a and b treated differently?
void f(bool& a, bool& b)
{
    ...
    a = false;
    boost::this_thread::sleep... // 1 sec sleep
    a = true;
    b = true;
    ...
}
EDIT: People asked why I want to know this:
1. I would like some way to tell the compiler not to optimize (swap the order of execution of the instructions) in some function, and using atomics and/or mutexes is much more complicated than using sleep (and in my case sleeping is not a performance problem).
2. Like I said, this is generally important to know.
We can't really tell. One scenario could be that the compiler has full visibility into your function at the calling site (and possibly inlines it), in which case it can merge your function with the caller and then optimize accordingly.
It could then, e.g., completely optimize away a and b because no code depends on them. Or it might see that a and b refer to the same entity, and then merge them according to your program flow.
But it could also be that you tell the compiler to not optimize at all, e.g. with g++'s -O0 flag, in which case not much will happen.
The only proof for your particular platform * can be obtained by looking at the generated assembly, or by telling the compiler to output a log of what it optimizes (g++ has many flags for that).
* compiler and flags used, compiler version and add-ons, hardware, operating system; even the weather might be relevant if your compiler omits some optimizations when they take too long [which would actually be a cool feature for debug builds, imho]
They are not local (because they are references), so the compiler can't: it can't tell whether the called function sees them, and has to assume that it does. If they were local variables, it could, because local variables are not visible to the called function unless a pointer or reference to them has been created.
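A minimal sketch of that distinction (g is a hypothetical function whose body the compiler cannot see):
void g();                    // defined in some other translation unit

void f(bool& a) {
    a = false;               // a may refer to an object that g() can
    g();                     // reach, so this store must be kept
    a = true;
}

void h() {
    bool local = false;      // never escapes h, so the compiler is free
    g();                     // to delete both of these dead stores
    local = true;
}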
I have a question about compiling an inline function in C++.
Can a recursive function be inlined? If yes, please describe how.
I am sure a loop can't work with it, but I have read somewhere that recursion would work if we pass constant values.
My friend sent me an inline recursive function with a constant argument and told me it would work, but it does not work on my laptop: there is no error at compile time, but at run time it displays nothing and I have to terminate it by force.
inline f(int n) {
    if (n <= 1)
        return 1;
    else {
        n = n * f(n - 1);
        return n;
    }
}
How does this work?
I am using Turbo C++ 3.2.
Also, if an inline function's code is too large, can the compiler automatically change it into a normal function?
thanks
This particular function definitely can be inlined. That is because the compiler can figure out that this particular form of recursion (which it can rewrite into tail-recursive form with an accumulator) can be trivially turned into a normal loop. And with a normal loop it has no problem inlining it at all.
Not only can the compiler inline it, it can even calculate the result for a compile-time constant without generating any code for the function.
With GCC 4.4
int fac = f(10);
produced this instruction:
movl $3628800, 4(%esp)
You can easily verify, by checking the assembly output, that the function is indeed inlined for input that is not known at compile time.
I suppose your friend was trying to say that, given a constant argument, the compiler could calculate the result entirely at compile time and just inline the answer at the call site. C++0x actually has a mechanism for this called constexpr, but there are limits to how complex the code is allowed to be. Even with the current version of C++, it is sometimes possible; it depends entirely on the compiler.
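A sketch of what that looks like with the C++0x constexpr mechanism (the recursive form happens to fit its single-return restriction):
constexpr int f(int n) { return n <= 1 ? 1 : n * f(n - 1); }

// Forces compile-time evaluation; no runtime code is generated for this.
static_assert(f(10) == 3628800, "computed at compile time");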
This function may be a good candidate, given that it clearly only references its parameter to calculate the result. Some compilers even have non-portable attributes to help the compiler decide this. For example, gcc has pure and const attributes that inform the compiler that the code only operates on its parameters and has no side effects, making it more likely to be calculated at compile time.
Even without this, it will still compile! The reason is that the compiler is allowed to decline to inline a function. Think of the inline keyword as more of a suggestion than an instruction.
Assuming that the compiler doesn't calculate the whole thing at compile time, inlining is not completely possible without other optimizations applied (see EDIT below) since it must have an actual function to call. However, it may get partially inlined. In that case the compiler will inline the initial call, but also emit a regular version of the function which will get called during recursion.
As for your second question, yes, size is one of the factors that compilers use to decide if it is appropriate to inline something.
If running this code on your laptop takes a very long time, then it is possible that you just gave it very large values and it is simply taking a long time to calculate the answer... The code looks OK, but keep in mind that factorials of inputs above 12 will overflow a 32-bit int (13! = 6,227,020,800, while a 32-bit int tops out at 2,147,483,647). What value did you attempt to pass?
The only way to know what actually happens is to compile it and look at the assembly generated.
PS: you may want to look into a more modern compiler if you are concerned with optimizations. For Windows there is MinGW, and there are free versions of Visual C++. For *NIX there is of course g++.
EDIT: There is also a thing called Tail Recursion Optimization which allows compilers to convert certain types of recursive algorithms to iterative, making them better candidates for inlining. (In addition to making them more stack space efficient).
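For reference, here is a sketch of the iterative form the optimizer can derive from the recursive factorial above:
int f_iterative(int n) {
    int result = 1;
    while (n > 1) {          // the loop a compiler may produce after
        result *= n;         // converting the recursion to iteration
        --n;
    }
    return result;
}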
A recursive function can be inlined to a certain limited depth of recursion. Some compilers have an option that lets you specify how deep you want to go when inlining recursive functions. Basically, the compiler "flattens" several nested levels of recursion. If the execution reaches the end of the "flattened" code, the code calls itself in the usual recursive fashion, and so on. Of course, if the depth of recursion is a run-time value, the compiler has to check the corresponding condition before executing each original recursive step inside the "flattened" code. In other words, there's nothing too unusual about inlining a recursive function; it is like unrolling a loop. There's no requirement for the parameters to be constant.
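Conceptually, flattening the factorial above by one level of recursion gives something like this hypothetical sketch (the real transformation happens on the compiler's intermediate representation):
int f_flattened(int n) {
    if (n <= 1) return 1;
    int m = n - 1;                    // one recursive level expanded inline
    int inner = (m <= 1)
        ? 1
        : m * f_flattened(m - 1);     // depth budget exhausted:
                                      // fall back to a real call
    return n * inner;
}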
What you mean by "I am sure about loop can't work" is not clear. It doesn't seem to make much sense. Functions with loops can easily be inlined, and there's nothing strange about that.
What you are trying to say about your example that "displays nothing" is not clear either. There is nothing in the code that would "display" anything, so it is no wonder it "displays nothing". On top of that, you posted invalid code: the C++ language does not allow function declarations without an explicit return type.
As for your last question, yes, the compiler is completely free to implement an inline function as a "normal" function. It has nothing to do with the function being "too large", though; it has everything to do with the more-or-less complex heuristic criteria used by that specific compiler to make its inlining decisions. It can take the size into account. It can take other things into account.
You can inline recursive functions. The compiler normally unrolls them to a certain depth - in VS you can even set a pragma for this - and the compiler can also do partial inlining. It essentially converts the recursion into loops. Also, as @Evan Teran said, the compiler is not forced to inline a function that you suggest at all. It might totally ignore you, and that's perfectly valid.
The problem with the code is not in that inline function. The constness or not of the argument is pretty irrelevant, I'm sure.
Also, seriously, get a new compiler. There are modern free compilers for whatever OS your laptop runs.
One thing to keep in mind: according to the standard, inline is a suggestion, not an absolute guarantee. In the case of a recursive function, the compiler will not always be able to compute the recursion limit. Modern compilers are getting extremely smart - a previous response shows the compiler evaluating a constant call and simply generating the result - but consider
bigint fac = factorialOf(userInput);
there's no way the compiler can figure that one out.
As a side note, most compilers tend to ignore inline in debug builds unless specifically instructed not to do so - it makes debugging easier.
Tail recursion can be converted to a loop as long as the compiler can satisfactorily rearrange the internal representation to get the recursion's conditional test at the end. In that case it can generate code that re-expresses the recursive function as a simple loop.
As for issues like tail-recursion rewrites, partial expansion of recursive functions, etc., these are usually controlled by the optimization switches. All modern compilers are capable of pretty significant optimization, but sometimes things do go wrong.
Remember that the inline keyword merely sends a request, not a command, to the compiler. The compiler may ignore this request if the function definition is too long or too complicated, and compile the function as a normal function.
Some of the cases in which inline functions may not work are:
For functions returning values: if a loop, a switch or a goto exists.
For functions not returning values: if a return statement exists.
If the function contains static variables.
If the inline function is recursive.
Hence, inline recursive functions may not work in C++.