While searching for the difference in new and malloc, I came across this statement (source):
new is faster than malloc() because an operator is always faster than a function.
Are operators always faster than functions? If so, why? I would really appreciate low-level explanations (you can assume basic compiler, SASS, and hardware knowledge).
new is faster than malloc() because an operator is always faster than a function.
This is completely untrue. In fact, the default behaviour of a new expression is typically to call malloc internally, in which case it cannot possibly be faster.
There is no reason to expect different performance from one over the other as long as the compared programs do the same thing. The reasons to use new instead of malloc are not related to performance.
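To illustrate the point, here is a minimal sketch of a replacement global operator new that simply forwards to malloc, much as many default implementations do (a production replacement would also loop on the installed new-handler before giving up):

#include <cstdlib>
#include <new>

// Minimal sketch: a replacement global operator new that forwards to malloc.
void* operator new(std::size_t size) {
    if (void* p = std::malloc(size ? size : 1))
        return p;
    throw std::bad_alloc{};
}

void operator delete(void* p) noexcept {
    std::free(p);
}

int main() {
    int* x = new int(42);   // the new expression ends up calling malloc above
    delete x;
}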
Are operators faster than functions?
Calling a function at runtime is potentially slower than not calling a function.
But, as we've found out, an operator can actually call a function internally. Besides, a function call in the abstract machine doesn't necessarily mean that a function will be called at runtime. As long as the compiler is able to produce the result of the function at compile time, or if it is able to expand the call inline, then there is no need for any function call overhead.
So, it depends on what function calls we are discussing. As far as a C++ function call is concerned: It is not necessarily slower than the use of an operator.
Also, do note that all overloaded operators that operate on class types are actually function calls to the operator overload function.
Related
As per my understanding, when we call a non-inlined function like foo(), program control jumps to the called function's address, the return address is stored, and control later returns to the caller at the statement after the call. But when I implement a class with an operator definition, will the same process occur, or does something different happen in favour of the operator function?
An operator overload is just a function with a peculiar name.
The compiler translates use of the operator into a function call.
That is, a + b becomes a.operator+(b) or operator+(a, b), depending on how the overload is defined.
(You can also write those out yourself, and it will behave exactly the same but miss the point.)
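A small illustration of that rewrite, using a made-up Vec type:

struct Vec {
    double x, y;
    Vec operator+(const Vec& rhs) const {   // just a function with a peculiar name
        return {x + rhs.x, y + rhs.y};
    }
};

int main() {
    Vec a{1, 2}, b{3, 4};
    Vec c = a + b;            // the compiler translates this into...
    Vec d = a.operator+(b);   // ...exactly this call; same behaviour
    (void)c; (void)d;
}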
Note that function call overhead is something I haven't seen anyone worry about during this millennium. It only takes nanoseconds on a reasonably modern machine, unless you make very expensive argument copies – but then you get rid of the copying, not the function.
You will very likely never encounter a situation where getting rid of function calls is your top-priority speed optimisation.
Virtual function calls can matter in very time-sensitive situations, for instance in a tight loop, but those instances are rare.
(And the overhead for that is not the function call per se, but is caused by the late binding.)
I have seen many instances where people have advised against using std::function<> because it is a heavyweight mechanism. Could someone please explain why that is so?
std::function is a type erasure class.
It takes whatever it is constructed from, and erases everything except:
Invoke with the signature in question (with possible implicit casting)
Destroy
Copy
Cast back to exact original type
and possibly
Move
This involves some overhead. A typical decent-quality std::function will have small object optimization (like small string optimization), avoiding a heap allocation when the amount of memory used is small.
A function pointer will fit in there.
However, there is still overhead. If you initialize a std::function with a compatible function pointer, instead of directly calling the function pointer in question, you do a virtual function table lookup, or invoke some other function, which then invokes the function pointer.
With a vtable implementation, that is a possible cache miss, an instruction cache miss, then another instruction cache miss. With a function pointer, the pointer is probably stored locally, and it is called directly, resulting in one possible instruction cache miss.
On top of this, in practice compilers understand function pointers better than std::functions: a number of compilers can figure out that the pointer is a constant value during inlining or whole-program optimization. I have never seen one that pulls that off with std::function.
For larger objects (say larger than sizeof(std::string) in one implementation), a heap allocation is also done by the std::function. This is another cost. For function pointers and reference wrappers, SOO is guaranteed by the standard.
Directly storing the lambda, rather than wrapping it in a std::function, is even better than a function pointer: in that case, the code being run is implicit in the type of the lambda. This makes it trivial for the compiler to work out what is going to happen when the lambda is called, and makes inlining easy.
Only do type erasure when you need to.
Under the hood, std::function typically uses type erasure (one simplified explanation for how it may be implemented is here). The cost of storing your function object inside the std::function object may involve a heap allocation. The cost of invoking your function object is typically an indirection through a pointer plus a virtual function call. Also, while compilers are getting better at this, the virtual function call usually inhibits inlining of your function.
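To make that concrete, here is a heavily simplified sketch of the kind of type erasure involved. The names (my_function, callable_base, callable_impl) are made up, and real implementations add small-object optimisation and much more; this only shows where the heap allocation and the virtual call come from:

#include <memory>
#include <utility>

// Heavily simplified type erasure for callables of signature int(int).
class my_function {
    struct callable_base {
        virtual int invoke(int) = 0;
        virtual ~callable_base() = default;
    };
    template <class F>
    struct callable_impl : callable_base {
        F f;
        explicit callable_impl(F fn) : f(std::move(fn)) {}
        int invoke(int x) override { return f(x); }     // virtual call at invocation
    };
    std::unique_ptr<callable_base> target_;             // heap allocation, no SOO here
public:
    template <class F>
    my_function(F f) : target_(std::make_unique<callable_impl<F>>(std::move(f))) {}
    int operator()(int x) { return target_->invoke(x); }
};

int main() {
    my_function f = [](int x) { return x * 2; };
    return f(21);   // goes through a pointer indirection plus a virtual call
}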
That being said, I recommend using std::function unless you know via measurements that the cost is too high (typically when you cannot afford heap allocations, your function will be called many times in a place that requires very low latency, etc.), as it is better to write straightforward code than to prematurely optimize.
Depending on the implementation, std::function will add some overhead due to the use of type erasure. There have been other implementations, such as Don Clugston's fast delegate, with a C++11 implementation here. Please note that it uses UB to make the fastest possible delegate, but is still extremely portable.
If you want type erasure, it's the right tool for the job; it is almost certainly not your bottleneck, and not something you could write faster yourself anyway.
However, sometimes it can be all too tempting to use type erasure when it really isn't required. That's where to draw the line. For example, if all you want to do is keep hold of a lambda locally, then it's probably not the right tool and you should just use:
auto l = [](){};
Likewise for function pointers you don't plan to type erase - just use a function pointer type.
You also don't need type erasure for templates from <algorithm> or your own equivalents, because there's simply no need for heterogeneous functor types to coexist.
It's not so.
To put it simply, it's not too heavyweight unless you profiled your program and showed that it is too heavyweight. Since evidently you did not (otherwise you would know the answer to this question), we can safely conclude that it is in fact not too heavyweight at all.
You should always profile, before concluding that it's too slow.
In Stroustrup's The C++ Programming Language, page 431, when discussing the design of the standard library, he said:
For example, building the comparison criteria into a sort function is unacceptable because the same data can be sorted according to different criteria. This is why the C standard library qsort() takes a comparison function as an argument rather than relying on something fixed, say, the < operator. On the other hand, the overhead imposed by a function call for each comparison compromises qsort() as a building block for further library building.
That much makes sense to me. But in the second paragraph, he said:
Is that overhead serious? In most cases, probably not. However, the function call overhead can dominate the execution time for some algorithms and cause users to seek alternatives. The technique of supplying comparison criteria through a template argument described in §13.4 solves that problem.
In §13.4, the comparison criteria are defined as classes with static member functions (which do the comparison). When these classes are used as template parameters, the comparison is still done by their static member functions. It seems to me there would still be overhead for calling the static member function.
What did Stroustrup mean by saying that?
std::sort is a function template. A separate sort instance is created for each element type and comparator during compilation. And because for each sort instantiation the type and the comparator are known at compile time, the comparator can be inlined, avoiding the cost of a function call.
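An illustrative example of the difference: with std::sort the comparator's type is part of the instantiation, so the compiler sees its body at the call site, whereas qsort only ever receives a pointer and makes an indirect call per comparison:

#include <algorithm>
#include <cstdlib>
#include <vector>

// Comparator for qsort: called through a function pointer.
static int cmp_int(const void* a, const void* b) {
    int lhs = *static_cast<const int*>(a);
    int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);
}

int main() {
    std::vector<int> v{3, 1, 2};
    std::sort(v.begin(), v.end(),
              [](int a, int b) { return a < b; });   // comparator easily inlined

    int arr[] = {3, 1, 2};
    std::qsort(arr, 3, sizeof(int), cmp_int);        // each comparison is an indirect call
}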
There is no theoretical reason why sort need be faster than qsort. Some compilers will even inline the function pointer passed to functions 'like' qsort: I believe I have seen gcc or clang do this (not to qsort), and even do so where the function definition was in a different cpp file.
The important part is that sort gets passed the function object as a type as well as an instance. A different function for each such type is generated: templates are factories for functions. At the point where it is called, the exact function called is really easy to determine for each such function instance, so inlining is trivial.
Doing the same with a function pointer is possible, but requires inlining from the point where qsort is invoked, and tracking the immutability of the function pointer carefully, and knowing which function it was to start with. This is far more fragile than the above mechanism in practice.
Similar issues appear with element stride (clearly static when sorting an array, harder to deal with when using qsort) and the like.
Calling a function via a pointer has two overheads: the pointer dereference and the function call itself. Both happen at runtime.
Template instantiation is done by the compiler. The pointer dereference is eliminated, as there is obviously no pointer, and the function call overhead is optimised away by the compiler inlining the call.
Considering that a pointer to a function (possibly returning a pointer to yet another function) is the mechanism used in C to introduce runtime polymorphism/callbacks, what is the equivalent way to implement this in C++ while improving locality and lowering the cost of pointers and indirections?
For example, this syntactic sugar can help, but I'm not really interested in that; although it's a nice way to do things in a C++ way instead of a more C-ish typedef, I'm more interested in improving locality while trying to reduce the use of explicit pointers at runtime.
The real reason as to why people use function pointers in C to emulate polymorphism is not performance but the fact that C neither supports real polymorphism nor templates. These are two alternatives you have in C++. All three approaches are compared in this thread.
Note that even though calling a function pointer does not require the additional vtable lookup that virtual function calls do, calling virtual functions and function pointers both suffer from the same major performance problem: Branch prediction in both cases is not as reliable and you tend to end up with more pipeline flushes.
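A sketch of the template alternative (the function names here are illustrative, not from any library): the callable's type becomes a template parameter, so nothing is dereferenced at the call site and the call can be inlined, while the C-style version always goes through a pointer:

#include <cstdio>

// Template style: the callable's type is a template parameter.
template <class Callback>
void for_each_index(int n, Callback cb) {
    for (int i = 0; i < n; ++i)
        cb(i);                            // resolved at compile time
}

// C-style equivalent: an opaque function pointer, one indirection per call.
void for_each_index_c(int n, void (*cb)(int)) {
    for (int i = 0; i < n; ++i)
        cb(i);
}

int main() {
    for_each_index(3, [](int i) { std::printf("%d\n", i); });
    for_each_index_c(3, [](int i) { std::printf("%d\n", i); });   // captureless lambda decays to a pointer
}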
I think you can use virtual functions to meet part of the requirement.
In the C++ Standard Template Library, there's a 'functional' part (the <functional> header), in which many classes have overloaded their () operator.
Does it bring any convenience to use functions as objects in C++?
Why can't we just use function pointer instead? Any examples?
Of course, one can always use function pointers instead of function objects. However, there are certain advantages that function objects provide over function pointers, namely:
Better Performance:
One of the most distinct and important advantages is that they are more likely to yield better performance. In the case of function objects, more details are available at compile time, so the compiler can accurately determine, and hence inline, the function to be called, unlike function pointers, where dereferencing the pointer makes it difficult for the compiler to determine the actual function that will be called.
Function objects are Smart functions:
Function objects may have other member functions and attributes. This means that function objects have a state; in fact, the same function, represented by a function object, may have different states at the same time. This is not possible for ordinary functions. Another advantage of function objects is that you can initialize them at runtime before you use/call them. (A short sketch follows this list.)
Power of Generic programming:
Ordinary functions can have different types only when their signatures differ. However, function objects can have different types even when their signatures are the same. In fact, each functional behavior defined by a function object has its own type. This is a significant improvement for generic programming using templates because one can pass functional behavior as a template parameter.
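As noted above, a small sketch of a stateful function object (the CountAbove name is made up): it carries both a parameter and an accumulating result, which a plain function pointer cannot do without resorting to global state:

#include <algorithm>
#include <vector>

// Hypothetical stateful function object: counts elements above a threshold.
struct CountAbove {
    int threshold;
    int count;
    void operator()(int x) {
        if (x > threshold)
            ++count;
    }
};

int main() {
    std::vector<int> v{1, 5, 7, 2};
    // std::for_each returns the functor, including the state it accumulated.
    CountAbove result = std::for_each(v.begin(), v.end(), CountAbove{3, 0});
    return result.count;   // 2
}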
Why can't we just use function pointer instead? Any examples?
Using a C-style function pointer cannot leverage the advantage of inlining. A function pointer typically requires an additional indirection for the lookup.
However, if operator () is overloaded, then it's very easy for the compiler to inline the code and save an extra call, which increases performance.
The other advantage of an overloaded operator () is that one can design a function which implicitly takes the function object as an argument; there is no need to pass it separately. The less hand-written code there is, the fewer the bugs and the better the readability.
This question from Bjarne Stroustrup's (the C++ inventor) webpage explains that aspect nicely.
The C++ Standard (Template) Library uses functional programming with overloaded operator () where it's needed.
> Does it bring any convenience to use functions as objects in C++?
Yes: The C++ template mechanism allows all other C/C++ programming styles (C style and OOP style, see below).
> Why can't we just use function pointer instead? Any examples?
But we can: A simple C function pointer is an object with a well defined operator(), too.
If we design a library, we do not want to force anyone to use that C pointer style if not desired. It is usually as undesired as forcing everything/everyone to be in/use OOP style; see below.
From C programmers' and functional programmers' points of view, OOP not only tends to be slower but also more verbose, and in most cases it is the wrong direction of abstraction ("information" is not and should not be an "object"). Because of that, people tend to be confused whenever the word "object" is used in other contexts.
In C++, anything with the desired properties can be seen as an object. In this case, a simple C function pointer is an object, too. This does not imply that OOP paradigms are used when not desired; it is just a proper way to use the template mechanism.
To understand the performance differences, compare the programming(-language) styles/paradigms and their possible optimisations (a compact code sketch follows this comparison):
C style:
Function pointer with its closure ("this" in OOP, pointer to some structure) as first parameter.
To call the function, the address of the function needs to be accessed first.
That is 1 indirection; no inlining possible.
C++ (and Java) OOP style:
Reference to an object derived from a class with virtual functions.
Reference is 1st pointer.
Pointer to virtual-table is 2nd pointer.
Function pointer in virtual-table is 3rd pointer.
That makes 3 indirections; no inlining possible.
C++ template style:
Copy of an object with () function.
No virtual-table since the type of that object is known at compile time.
The address of the function is known at compile time.
That makes 0 indirections; inlining possible.
The C++ templates are versatile enough to allow the other two styles above, and in the case of inlining they can even outperform…
compiled functional languages: (excluding JVM and Javascript as target platforms because of missing "proper tail calls")
Function pointer and reference to its closure in machine registers.
It is usually not a function "call" but a GOTO-like jump.
Functions do not need the stack: no return address, no parameters, and no local variables on the stack.
Functions have their garbage collectable closure(s) containing parameters and a pointer to the next function to be called.
For the CPU to predict the jump, the address of the function needs to be loaded to a register as early as possible.
That is 1 indirection with possible jump prediction; everything is nearly as fast as inlined.
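As mentioned above, here is a compact sketch of the first three styles side by side (all names are made up for illustration); the indirection counts match the comparison:

#include <cstdio>

// C style: explicit closure, call through a function pointer (1 indirection).
struct c_closure { int offset; };
int c_callback(void* self, int x) { return static_cast<c_closure*>(self)->offset + x; }

// OOP style: virtual function, call goes object -> vtable -> function.
struct Adder {
    virtual int apply(int x) const = 0;
    virtual ~Adder() = default;
};
struct OffsetAdder : Adder {
    int offset;
    explicit OffsetAdder(int o) : offset(o) {}
    int apply(int x) const override { return offset + x; }
};

// Template style: the functor's type is known at compile time, call can inline.
template <class F>
int apply_twice(F f, int x) { return f(f(x)); }

int main() {
    c_closure c{1};
    int (*fp)(void*, int) = &c_callback;
    std::printf("%d\n", fp(&c, 10));        // C style

    OffsetAdder a{1};
    const Adder& ref = a;
    std::printf("%d\n", ref.apply(10));     // OOP style, vtable dispatch

    struct AddOne { int operator()(int x) const { return x + 1; } };
    std::printf("%d\n", apply_twice(AddOne{}, 10));   // template style, inlinable
}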
The main difference is that function objects are more powerful than plain function pointers, as they can hold state. Most algorithms take templated functor parameters rather than plain function pointers, which enables the use of powerful constructs such as binders, which call functions with different signatures by filling extra arguments with values stored on the functor, or the newer lambdas in C++11. Once the algorithms are designed to take functors, it just makes sense to provide a set of predefined generic function objects in the library.
Aside from that, there are potential advantages in that, in most cases, those functors are simple classes for which the compiler has the full definition, so it can inline the function calls and improve performance. This is the reason why std::sort can be much faster than qsort from the C library.
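A sketch of the binder idea mentioned above (greater_than is a made-up helper): a functor built with std::bind, or an equivalent capturing lambda, stores an extra argument so a two-argument function fits where a unary predicate is expected:

#include <algorithm>
#include <functional>
#include <vector>

// A two-argument function we want to use as a one-argument predicate.
bool greater_than(int value, int threshold) { return value > threshold; }

int main() {
    std::vector<int> v{1, 5, 7, 2};

    auto n1 = std::count_if(v.begin(), v.end(),
                            std::bind(greater_than, std::placeholders::_1, 3));

    auto n2 = std::count_if(v.begin(), v.end(),
                            [](int x) { return greater_than(x, 3); });

    return static_cast<int>(n1 + n2);   // 2 + 2
}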