Is there a reason some functions don't take a void*? - c++

Many functions accept a function pointer as an argument. atexit and call_once are excellent examples. If these higher level functions accepted a void* argument, such as atexit(&myFunction, &argumentForMyFunction), then I could easily wrap any functor I pleased by passing a function pointer and a block of data to provide statefulness.
As is, there are many cases where I wish I could register a callback with arguments, but the registration function does not allow me to pass any arguments through. atexit only accepts one argument: a function taking 0 arguments. I cannot register a function to clean up after my object; I must instead register a function which cleans up after all objects of a class, and force my class to maintain a list of all objects needing cleanup.
I always viewed this as an oversight: there seemed to be no valid reason not to allow a measly 4- or 8-byte pointer to be passed along, unless you were on an extremely limited microcontroller. I always assumed they simply didn't realize how important that extra argument could be until it was too late to redefine the spec. In the case of call_once, the POSIX version (pthread_once) takes a function that accepts no arguments, but the C++11 version accepts a functor (which is virtually equivalent to passing a function and an argument, only the compiler does some of the work for you).
Is there any reason why one would choose not to allow that extra argument? Is there an advantage to accepting only "void functions with 0 arguments"?

I think atexit is just a special case, because whatever function you pass to it is supposed to be called only once. Therefore whatever state it needs to do its job can just be kept in global variables. If atexit were being designed today, it would probably take a void* in order to enable you to avoid using global variables, but that wouldn't actually give it any new functionality; it would just make the code slightly cleaner in some cases.
For many APIs, though, callbacks are allowed to take additional arguments, and not allowing them to do so would be a severe design flaw. For example, pthread_create does let you pass a void*, which makes sense because otherwise you'd need a separate function for each thread, and it would be totally impossible to write a program that spawns a variable number of threads.

Quite a number of the interfaces that take function pointers without a pass-through argument simply come from a different time. However, their signatures can't be changed without breaking existing code. It is sort of a misdesign, but that's easy to say in hindsight. The overall programming style has moved on to have limited uses of functional programming within generally non-functional programming languages. Also, at the time many of these interfaces were created, storing any extra data even on "normal" computers implied an observable extra cost: aside from the extra storage used, the extra argument also needs to be passed even when it isn't used. Sure, atexit() is hardly bound to be a performance bottleneck, seeing that it is called just once, but if you passed an extra pointer everywhere you'd surely also have one on qsort()'s comparison function, which is called many times per sort.
Specifically for something like atexit() it is reasonably straightforward to use a custom global object with which function objects to be invoked upon exit are registered: just register a function with atexit() that calls all of the functions registered with said global object. Also note that atexit() is only guaranteed to support registering 32 functions, although implementations may support more. It seems ill-advised to use it directly as a registry for object clean-up functions, rather than registering a single function which calls the object clean-up functions, as other libraries may need to register functions, too.
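A minimal sketch of the global-registry idea described above, assuming C++11. The names exit_actions, run_exit_actions and at_exit are made up for illustration and are not part of any standard library; only std::atexit is the real facility.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical registry of stateful clean-up actions (not a standard facility).
std::vector<std::function<void()>>& exit_actions() {
    static std::vector<std::function<void()>> actions;
    return actions;
}

// The single function handed to atexit(): runs every registered action in
// reverse order of registration and empties the registry as it goes.
void run_exit_actions() {
    auto& actions = exit_actions();
    while (!actions.empty()) {
        std::function<void()> action = std::move(actions.back());
        actions.pop_back();
        action();
    }
}

// Registers a callable carrying arbitrary state while using one atexit() slot.
void at_exit(std::function<void()> action) {
    assert(static_cast<bool>(action));   // an empty std::function would throw when run
    auto& actions = exit_actions();      // construct the registry before registering
    static const int registered = std::atexit(run_exit_actions);
    (void)registered;                    // std::atexit is called exactly once
    actions.push_back(std::move(action));
}
```

Each object can then call at_exit([this] { cleanup(); }) or similar, provided the object outlives program exit or the action is otherwise safe to run at that point.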
That said, I can't imagine why atexit() is particularly useful in C++, where objects are automatically destroyed upon program termination anyway. Of course, this approach assumes that all objects are somehow held, but that's normally necessary anyway in some form or other, and is typically done using appropriate RAII objects.

Related

Does using a method always imply an indirect access to member fields through "this" pointer?

I was wondering if that type of thing falls under compiler's optimizations purview. From what I've gathered from this talk even std::unique_ptr is not truly "zero-cost" in part due to the implicit indirection of this pointer which plays a role in passing the actual underlying pointer to the unique pointer's member function.
Is there always a reference involved in passing member fields to methods of a class or can the compiler see that, say this method does not use certain fields and doesn't modify the rest so it will just pass them by value?
There seems to be some misunderstanding of what the talk discusses. There is a discussion of why unique_ptr is not "zero-cost". However, that discussion focuses on a specific case, the transferal of ownership. On the one hand, since there is a situation where there are costs, it is true that unique_ptr is not zero-cost. On the other hand, that conclusion is misleading, as it sounds just as all-encompassing as saying that it is zero-cost. A more accurate description would combine the two views: unique_ptr can be a zero-cost replacement for a raw pointer, but not always.
A unique_ptr can be zero-cost. The first question from the Q&A session at the end of the talk addresses this (starting at 36:36). Most member functions of unique_ptr are simple enough to be inlined by any C++ compiler that understands template syntax. Once they are inlined, there is no overhead associated with the this pointer and member functions. If you never transfer ownership, go on thinking of unique_ptr as zero-cost.
The extra cost comes when ownership is transferred. The talk specifically focused on passing a unique_ptr as a parameter to a function. This unambiguously gives the called function ownership of whatever the pointer points to. It also entails an additional run-time cost (two additional costs if the raw pointer version lacked exception safety).
The extra cost is not intrinsic to the C++ language, but rather comes from a commonly used ABI (application binary interface). The ABI defines at a low level (think assembly) how parameters are passed to functions. According to this convention, there is an important difference between T* and unique_ptr<T>: the former is a primitive type, while the latter is an instance of a class with a non-trivial destructor. If I understood this part of the talk correctly, the ABI calls for primitive types to be placed directly in the call stack (potentially simply stored in a register), whereas such class instances must exist in main memory, and a pointer to the instance is what gets placed in the stack (or potentially in a register). Yes, even if the object is passed by value. Why? Because that's what the convention calls for. (There are better reasons, but they are tangential to the current subject.) In order for things like dynamic libraries to work, there needs to be a convention, and this ABI is what we have.
The upshot is that primitives receive preferential treatment, making them faster. When you switch a function's parameter from a pointer to a class instance, there is a runtime cost (of unknown size – it might be insignificant). This is the cost that prevents unique_ptr from being zero-cost in all cases. Zero-cost in many common cases, but not when a function takes a unique_ptr argument.
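To make the two cases concrete, here is a small sketch of the signatures being compared, assuming C++14 for std::make_unique. Widget, read_raw, consume and transfer_demo are hypothetical names; the code only shows the ownership transfer, not the generated machine code or its cost.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Widget { int value; };

// Raw-pointer parameter: the argument is a primitive and no ownership moves.
int read_raw(const Widget* w) { return w->value; }

// unique_ptr parameter: the callee takes ownership, and the Widget is
// destroyed when `w` goes out of scope at the end of the call.
int consume(std::unique_ptr<Widget> w) { return w->value; }

int transfer_demo() {
    auto owner = std::make_unique<Widget>(Widget{42});
    int a = read_raw(owner.get());       // the caller still owns the Widget
    int b = consume(std::move(owner));   // ownership is transferred to the callee
    assert(owner == nullptr);            // the caller's pointer is now empty
    return a + b;
}
```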

In how many ways can a function be invoked(called) in C++?

I know of one way to call a function:
func(x, y);
Are there more ways to call a function?
Functions can be invoked
explicitly, by providing an argument parenthesis after a designation of the function (in the case of constructors this is decidedly not formally correct wording, since they don't have names, but anyway),
implicitly, in particular destructors and default constructors, but also implicit type conversion,
via operators other than the function call operator (), in particular the copy assignment operator = and the dereferencing operator ->,
in a placement new expression, invocation of a specified allocation function by placing an argument parenthesis right after new (not sure if this counts as a separate way).
In addition library facilities can of course invoke functions for you.
I think the above list is exhaustive, but I'm not sure. I remember Andrei Alexandrescu enumerated the constructs that yielded callable thingies, in his Modern C++ Design book, and there was a surprise for me. So there is a possibility that the above is not exhaustive.
Arbitrary functions can be invoked:
using f(arguments...) notation
via a pointer to the function (whether member or non-)
via a std::function (how it dispatches is left unspecified by the standard, though I'd expect it to use a pointer to function or pointer to member function under the covers, so no new language features)
Class-specific functions are also invoked in certain situations:
constructors are invoked when objects are created on the stack, and when static/global or thread-specific objects or dynamically-allocated objects are dynamically initialised, or with placement new, and as expressions are evaluated
destructors are invoked when objects leave scope, are deleted, threads exit, temporaries are destroyed, and when the destructor is explicitly called ala x.~X()
all manner of operators ([], +=, ==, < etc.) may be invoked during expression evaluation
Arbitrary non-member functions may be run by:
functions may be run due to earlier std::atexit() or std::at_quick_exit() calls, and if they throw, std::terminate may run
thread creation and asynchronous signals (again the interfaces accept pointer to functions, and there's no reason to think any implementation has or would use any other technique to achieve dispatch)
Specific functions are triggered in very specific situations:
main() is executed by the runtime
std::unexpected (which calls the installed std::unexpected_handler, by default ending in std::terminate) is invoked when a dynamic exception specification is violated; note that these facilities were removed in C++17
It's also possible to use setjmp and longjmp to "jump" back into a function... not quite the same thing as calling it though.
Though not truly "C++", it's also possible to arrange function execution using inline assembly language / linked assembler, writing to executable memory.
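As a small illustration of the first group in that list (direct call, pointer to function, pointer to member function, std::function), here is a sketch assuming C++11. The names add, Calculator and call_every_way are invented for the example.

```cpp
#include <cassert>
#include <functional>

int add(int x, int y) { return x + y; }

struct Calculator {
    int offset;
    int add_offset(int x) const { return x + offset; }
};

int call_every_way() {
    int direct = add(1, 2);                           // f(arguments...) notation

    int (*fp)(int, int) = &add;                       // pointer to non-member function
    int via_pointer = fp(3, 4);

    int (Calculator::*mfp)(int) const = &Calculator::add_offset;
    Calculator calc{10};
    int via_member_pointer = (calc.*mfp)(5);          // pointer to member function

    std::function<int(int, int)> fn = add;            // std::function wrapper
    int via_std_function = fn(6, 7);

    return direct + via_pointer + via_member_pointer + via_std_function;
}
```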
C++ is a fairly flexible language, and therefore this is a very vague question: there can be a hundred different ways of "calling a function" if there are no limitations on what is allowed.
Remember a function is only really a block of code sitting somewhere in memory. The act of "calling" a function is to some extent the following:
Putting the parameters required in the correct registers/stack locations
Moving the PC(Program Counter) to the location of the function in memory (this is usually done with a "call" type machine instruction)
Technically afterwards there might be some "clean-up" code depending on how the compiler implements functions.
In the end all methods come down to this happening in some way or another.
Perhaps not 100% relevant here but remember that in C++ functions can be members of a class.
class MyClass {
public:
    void myFunction(int A);
};
Usually what happens in this case is that a pointer to the class object (this) is passed as a hidden first parameter.
So the function call:
myObject.myFunction(A)
is in a way equivalent to calling:
myFunction(myObject,A)
if you look at function object you will see this kind of behavior.
Function objects reference
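One way to see the "object as first parameter" view in real code is std::mem_fn, which turns a member function into a callable that takes the object explicitly. This is a sketch assuming C++11; myFunction is changed here to return a value (the declaration above returns void) purely so the result can be observed, and member_call_equivalence is a hypothetical helper.

```cpp
#include <cassert>
#include <functional>

class MyClass {
public:
    explicit MyClass(int base) : base_(base) {}
    int myFunction(int a) const { return base_ + a; }
private:
    int base_;
};

int member_call_equivalence() {
    MyClass myObject(100);

    int usual = myObject.myFunction(5);                 // ordinary member call

    auto asFreeFunction = std::mem_fn(&MyClass::myFunction);
    int explicitObject = asFreeFunction(myObject, 5);   // object passed as first argument

    assert(usual == explicitObject);
    return explicitObject;
}
```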
Ok so here is a short list:
call the function normally myFunc(a,b);
Function pointers. typedef int(*funcP)(int,int);
Function objects. Overloading the () operator makes your object callable.
C++11 std::function largely replaces function pointers, and I suggest you look into how it works.
lambda functions are also a type of function in a way.
Delegates can have a variety of implementations.
Things like function pointers and delegates are many times used with the concept of a callback
You can use multi-cast delegates. (e.g. boost.signals2 or Qt Signals & slots)
You can bind to a function in a DLL and call it.
There are various ways to call functions between processes and over the network. These are usually referred to as RPC implementations.
In a threaded environment things might also get more interesting as you might want to call functions in a different thread.
See Qt Signals & Slots threaded connections
Also thread pools can be used.
Lastly I suppose it's a good idea to mention meta-programming and the idea of RTTI. This is not as strongly supported as in languages like C#.
If this is to be manually implemented, one would be able to search the list of available functions at run-time and call one. By this method it would be possible to match a function against a string name at run-time. This is to some extent implemented by Qt's MOC system.
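A hand-rolled version of that last idea might look like the following sketch, assuming C++11. The registry and call_by_name names are hypothetical, and this is only a lookup table keyed by string; it is not how Qt's MOC works internally.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

using BinaryOp = std::function<int(int, int)>;

// Hypothetical table mapping names to callables, filled in by hand.
const std::map<std::string, BinaryOp>& registry() {
    static const std::map<std::string, BinaryOp> table = {
        {"add", [](int a, int b) { return a + b; }},
        {"mul", [](int a, int b) { return a * b; }},
    };
    return table;
}

// Looks the function up by its string name at run-time and calls it.
int call_by_name(const std::string& name, int a, int b) {
    auto it = registry().find(name);
    if (it == registry().end()) {
        throw std::out_of_range("unknown function: " + name);
    }
    return it->second(a, b);
}
```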
What are we counting as a different way? If I have a function that is a member of a class foo, then I might call it like this:
foo.func(x, y);
If I have a pointer to foo, I would do this
foo->func(x, y);
If I had a class bar that was derived from foo, I might call foo's constructor with an initialization list
bar::bar(const int x, const int y) : foo(x, y) {}
A constructor is just a function, after all.

Should I use std::function or a function pointer in C++?

When implementing a callback function in C++, should I still use the C-style function pointer:
void (*callbackFunc)(int);
Or should I make use of std::function:
std::function< void(int) > callbackFunc;
In short, use std::function unless you have a reason not to.
Function pointers have the disadvantage of not being able to capture some context. You won't be able to, for example, pass a lambda function as a callback which captures some context variables (but it will work if it doesn't capture any). Calling a member function of an object (i.e. a non-static one) is thus also not possible, since the object (this-pointer) needs to be captured.(1)
std::function (since C++11) is primarily to store a function (passing it around doesn't require it to be stored). Hence if you want to store the callback for example in a member variable, it's probably your best choice. But also if you don't store it, it's a good "first choice" although it has the disadvantage of introducing some (very small) overhead when being called (so in a very performance-critical situation it might be a problem but in most it should not). It is very "universal": if you care a lot about consistent and readable code as well as don't want to think about every choice you make (i.e. want to keep it simple), use std::function for every function you pass around.
Think about a third option: If you're about to implement a small function which then reports something via the provided callback function, consider a template parameter, which can then be any callable object, i.e. a function pointer, a functor, a lambda, a std::function, ... The drawback here is that your (outer) function becomes a template and hence needs to be implemented in the header. On the other hand, you get the advantage that the call to the callback can be inlined, as the client code of your (outer) function "sees" the call to the callback with the exact type information being available.
Example for the version with the template parameter (write & instead of && for pre-C++11):
template <typename CallbackFunction>
void myFunction(..., CallbackFunction && callback) {
...
callback(...);
...
}
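For a concrete instance of the template-parameter version, here is a sketch assuming C++11. forEachEven and sumOfEvens are made-up names standing in for the elided myFunction above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical outer function: reports every even element via the callback.
// CallbackFunction can be a function pointer, functor, lambda or std::function.
template <typename CallbackFunction>
void forEachEven(const std::vector<int>& values, CallbackFunction&& callback) {
    for (int v : values) {
        if (v % 2 == 0) {
            callback(v);
        }
    }
}

int sumOfEvens(const std::vector<int>& values) {
    int sum = 0;
    // A capturing lambda works here, which a plain function pointer
    // parameter would not accept.
    forEachEven(values, [&sum](int v) { sum += v; });
    return sum;
}
```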
As you can see in the following table, all of them have their advantages and disadvantages:

                                     | function ptr | std::function | template param
-------------------------------------+--------------+---------------+---------------
can capture context variables        | no (1)       | yes           | yes
no call overhead (see comments)      | yes          | no            | yes
can be inlined (see comments)        | no           | no            | yes
can be stored in a class member      | yes          | yes           | no (2)
can be implemented outside of header | yes          | yes           | no
supported without C++11 standard     | yes          | no (3)        | yes
nicely readable (my opinion)         | no           | yes           | (yes)

(1) Workarounds exist to overcome this limitation, for example passing the additional data as further parameters to your (outer) function: myFunction(..., callback, data) will call callback(data). That's the C-style "callback with arguments", which is possible in C++ (and by the way heavily used in the WIN32 API) but should be avoided because we have better options in C++.
(2) Unless we're talking about a class template, i.e. the class in which you store the function is a template. But that would mean that on the client side the type of the function decides the type of the object which stores the callback, which is almost never an option for actual use cases.
(3) For pre-C++11, use boost::function
void (*callbackFunc)(int); may be a C style callback function, but it is a horribly unusable one of poor design.
A well designed C style callback looks like void (*callbackFunc)(void*, int); -- it has a void* to allow the code that does the callback to maintain state beyond the function. Not doing this forces the caller to store state globally, which is impolite.
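A short sketch of that void* style, with hypothetical names (countUpTo, Accumulator, addToAccumulator, sumUpTo) chosen only for the example:

```cpp
#include <cassert>

// Callback type in the style described above: the first parameter carries
// caller-owned state through the API untouched.
typedef void (*callbackFunc)(void* userData, int value);

// Hypothetical API that reports the numbers 1..count to the callback.
void countUpTo(int count, callbackFunc callback, void* userData) {
    for (int i = 1; i <= count; ++i) {
        callback(userData, i);
    }
}

struct Accumulator { int total; };

// The callback recovers its state from the void* instead of using a global.
void addToAccumulator(void* userData, int value) {
    static_cast<Accumulator*>(userData)->total += value;
}

int sumUpTo(int count) {
    Accumulator acc{0};
    countUpTo(count, &addToAccumulator, &acc);
    return acc.total;
}
```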
std::function< int(int) > ends up being slightly more expensive than int(*)(void*, int) invocation in most implementations. It is also harder for some compilers to inline. There are std::function clone implementations that rival function pointer invocation overheads (see 'fastest possible delegates' etc) that may make their way into libraries.
Now, clients of a callback system often need to set up resources and dispose of them when the callback is created and removed, and to be aware of the lifetime of the callback. void(*callback)(void*, int) does not provide this.
Sometimes this is available via code structure (the callback has limited lifetime) or through other mechanisms (unregister callbacks and the like).
std::function provides a means for limited lifetime management (the stored callable and its captured state are destroyed along with the std::function object that holds them).
In general, I'd use a std::function unless performance concerns manifest. If they did, I'd first look for structural changes (instead of a per-pixel callback, how about generating a scanline processor based off of the lambda you pass me? which should be enough to reduce function-call overhead to trivial levels.). Then, if it persists, I'd write a delegate based off fastest possible delegates, and see if the performance problem goes away.
I would mostly only use function pointers for legacy APIs, or for creating C interfaces for communicating between code generated by different compilers. I have also used them as internal implementation details when I am implementing jump tables, type erasure, etc: when I am both producing and consuming it, and am not exposing it externally for any client code to use, and function pointers do all I need.
Note that you can write wrappers that turn a std::function<int(int)> into an int(void*,int) style callback, assuming there is proper callback lifetime management infrastructure. So as a smoke test for any C-style callback lifetime management system, I'd make sure that wrapping a std::function works reasonably well.
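Such a wrapper can be as small as a captureless "trampoline" function. The sketch below assumes C++11 and a hypothetical C-style API (c_callback, invoke_c_callback); lifetime management is reduced to keeping the std::function alive on the caller's stack for the duration of the call.

```cpp
#include <cassert>
#include <functional>

// Hypothetical C-style API taking a callback plus a context pointer.
typedef int (*c_callback)(void* context, int value);
int invoke_c_callback(c_callback cb, void* context, int value) {
    return cb(context, value);
}

// Trampoline: an ordinary function matching c_callback that forwards to the
// std::function whose address was passed as the context.
int call_std_function(void* context, int value) {
    auto* fn = static_cast<std::function<int(int)>*>(context);
    return (*fn)(value);
}

int scale_through_c_api(int factor, int value) {
    std::function<int(int)> fn = [factor](int x) { return factor * x; };
    // fn must outlive every invocation of the callback; here it trivially does.
    return invoke_c_callback(&call_std_function, &fn, value);
}
```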
Use std::function to store arbitrary callable objects. It allows the user to provide whatever context is needed for the callback; a plain function pointer does not.
If you do need to use plain function pointers for some reason (perhaps because you want a C-compatible API), then you should add a void * user_context argument so it's at least possible (albeit inconvenient) for it to access state that's not directly passed to the function.
The only reason to avoid std::function is support of legacy compilers that lack support for this template, which was introduced in C++11.
If supporting pre-C++11 language is not a requirement, using std::function gives your callers more choice in implementing the callback, making it a better option compared to "plain" function pointers. It offers the users of your API more choice, while abstracting out the specifics of their implementation for your code that performs the callback.
std::function may introduce virtual dispatch (a virtual method table, VMT) into the code in some cases, which has some impact on performance.
The other answers answer based on technical merits. I'll give you an answer based on experience.
As a very heavy X-Windows developer who always worked with function pointer callbacks with void* pvUserData arguments, I started using std::function with some trepidation.
But I found that, combined with the power of lambdas and the like, it has freed up my work considerably to be able to, at a whim, throw multiple arguments in, re-order them, ignore parameters the caller wants to supply but I don't need, etc. It really makes development feel looser and more responsive, saves me time, and adds clarity.
On this basis I'd recommend anyone to try using std::function any time they'd normally have a callback. Try it everywhere, for like six months, and you may find you hate the idea of going back.
Yes there's some slight performance penalty, but I write high-performance code and I'm willing to pay the price. As an exercise, time it yourself and try to figure out whether the performance difference would ever matter, with your computers, compilers and application space.

STL Functional -- Why?

In C++ Standard Template Library, there's a 'functional' part, in which many classes have overloaded their () operator.
Does it bring any convenience to use functions as objects in C++?
Why can't we just use function pointer instead? Any examples?
Of course, one can always use function pointers instead of function objects. However, there are certain advantages which function objects provide over function pointers, namely:
Better Performance:
One of the most distinct and important advantages is that they are more likely to yield better performance. With function objects, more details are available at compile time, so the compiler can accurately determine, and hence inline, the function to be called. With function pointers, the dereferencing of the pointer makes it difficult for the compiler to determine the actual function that will be called.
Function objects are Smart functions:
Function objects may have other member functions and attributes. This means that function objects have a state. In fact, the same function, represented by a function object, may have different states at the same time. This is not possible for ordinary functions. Another advantage of function objects is that you can initialize them at runtime before you use/call them.
Power of Generic programming:
Ordinary functions can have different types only when their signatures differ. However, function objects can have different types even when their signatures are the same. In fact, each functional behavior defined by a function object has its own type. This is a significant improvement for generic programming using templates because one can pass functional behavior as a template parameter.
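Both points (state, and behavior selected by type) can be seen in a few lines. This is a sketch assuming C++11; GreaterThan, countAbove and sortedWith are hypothetical names, while std::count_if, std::sort and std::greater are the real library facilities.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// A function object with state: the threshold is fixed at runtime,
// before the object is ever called.
class GreaterThan {
public:
    explicit GreaterThan(int threshold) : threshold_(threshold) {}
    bool operator()(int value) const { return value > threshold_; }
private:
    int threshold_;
};

int countAbove(const std::vector<int>& values, int threshold) {
    return static_cast<int>(
        std::count_if(values.begin(), values.end(), GreaterThan(threshold)));
}

// Functional behavior passed as a template parameter: std::less<int> and
// std::greater<int> have the same call signature but different types.
template <typename Compare>
std::vector<int> sortedWith(std::vector<int> values) {
    std::sort(values.begin(), values.end(), Compare());
    return values;
}
```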
Why can't we just use function pointer instead? Any examples?
Using a C-style function pointer usually cannot leverage the advantage of inlining. A function pointer typically requires an additional indirection for lookup.
However, if operator () is overloaded, then it's very easy for the compiler to inline the code and save an extra call, which increases performance.
The other advantage of an overloaded operator () is that one can design a function which implicitly takes the function object as an argument; there is no need to pass it separately. The less hand-written code there is, the fewer the bugs and the better the readability.
This question on the web page of Bjarne Stroustrup (the inventor of C++) explains that aspect nicely.
C++ Standard (Template) Library uses functional programming with overloaded operator (), if it's needed.
> Does it bring any convenience to use functions as objects in C++?
Yes: The C++ template mechanism allows all other C/C++ programming styles (C style and OOP style, see below).
> Why can't we just use function pointer instead? Any examples?
But we can: A simple C function pointer is an object with a well defined operator(), too.
If we design a library, we do not want to force anyone to use that C pointer style if not desired. It is usually as undesired as forcing everything/everyone to be in/use OOP style; see below.
From C-programmers and functional programmers views, OOP not only tends to be slower but more verbose and in most cases to be the wrong direction of abstraction ("information" is not and should not be an "object"). Because of that, people tend to be confused whenever the word "object" is used in other contexts.
In C++, anything with the desired properties can be seen as an object. In this case, a simple C function pointer is an object, too. This does not imply that OOP paradigms are used when not desired; it is just a proper way to use the template mechanism.
To understand the performance differences, compare the programming(-language) styles/paradigms and their possible optimisations:
C style:
Function pointer with its closure ("this" in OOP, pointer to some structure) as first parameter.
To call the function, the address of the function needs to be accessed first.
That is 1 indirection; no inlining possible.
C++ (and Java) OOP style:
Reference to an object derived from a class with virtual functions.
Reference is 1st pointer.
Pointer to virtual-table is 2nd pointer.
Function pointer in virtual-table is 3rd pointer.
That is 3 indirections; no inlining possible.
C++ template style:
Copy of an object with () function.
No virtual-table since the type of that object is known at compile time.
The address of the function is known at compile time.
That is 0 indirections; inlining possible.
The C++ templates are versatile enough to allow the other two styles above, and in the case of inlining they can even outperform…
compiled functional languages: (excluding JVM and Javascript as target platforms because of missing "proper tail calls")
Function pointer and reference to its closure in machine registers.
It is usually no function "call" but a GOTO like jump.
Functions do not need the stack, no address to jump back, no parameters nor local variables on the stack.
Functions have their garbage collectable closure(s) containing parameters and a pointer to the next function to be called.
For the CPU to predict the jump, the address of the function needs to be loaded to a register as early as possible.
That is 1 indirection with possible jump prediction; everything is nearly as fast as inlined.
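For reference, the three C++-expressible styles compared above (C style, OOP style, template style) can be written side by side. This is only a sketch assuming C++11, with invented names throughout; it shows the shape of each call and makes no claim about what a particular compiler inlines.

```cpp
#include <cassert>

// C style: function pointer with its closure passed as the first parameter.
typedef int (*UnaryFn)(void* closure, int x);
int addFromClosure(void* closure, int x) { return x + *static_cast<int*>(closure); }
int applyC(UnaryFn fn, void* closure, int x) { return fn(closure, x); }

// OOP style: reference to an object of a class with virtual functions.
struct Transformer {
    virtual ~Transformer() = default;
    virtual int apply(int x) const = 0;
};
struct AddN : Transformer {
    explicit AddN(int amount) : amount_(amount) {}
    int apply(int x) const override { return x + amount_; }
    int amount_;
};
int applyVirtual(const Transformer& t, int x) { return t.apply(x); }

// Template style: a copy of an object with operator(), whose exact type
// (and therefore the function to call) is known at compile time.
struct AddFunctor {
    int amount;
    int operator()(int x) const { return x + amount; }
};
template <typename F>
int applyTemplate(F f, int x) { return f(x); }
```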
The main difference is that function objects are more powerful than plain function pointers, as they can hold state. Most algorithms are templates that accept any functor rather than only plain function pointers, which enables the use of powerful constructs such as binders (which call functions with different signatures by filling the extra arguments with values stored in the functor) or the newer lambdas in C++11. Once the algorithms are designed to take functors, it just makes sense to provide a set of predefined generic function objects in the library.
Aside from that there are potential advantages in that in most cases those functors are simple classes for which the compiler has the full definition and can perform inlining of the function calls improving performance. This is the reason why std::sort can be much faster than qsort from the C library.
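The difference in how the comparison reaches the two sort routines looks roughly like this. It is a sketch assuming C++11, with hypothetical helper names (compareInts, IntLess, sortWithQsort, sortWithStdSort); it shows the two call shapes and does not measure anything.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// qsort comparator: called through a function pointer, with the elements
// arriving as untyped const void* pointers.
int compareInts(const void* lhs, const void* rhs) {
    int a = *static_cast<const int*>(lhs);
    int b = *static_cast<const int*>(rhs);
    return (a > b) - (a < b);
}

std::vector<int> sortWithQsort(std::vector<int> values) {
    std::qsort(values.data(), values.size(), sizeof(int), compareInts);
    return values;
}

// std::sort comparator: a functor whose type, and thus its operator(),
// is fully visible to the compiler at the point of instantiation.
struct IntLess {
    bool operator()(int a, int b) const { return a < b; }
};

std::vector<int> sortWithStdSort(std::vector<int> values) {
    std::sort(values.begin(), values.end(), IntLess());
    return values;
}
```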

Should I stop using abstract base classes/interfaces and instead use boost::function/std::function?

I've just learned about what std::function really is about and what it is used for and I have a question: now that we essentially have delegates, where and when should we use Abstract Base Classes and when, instead, we should implement polymorphism via std::function objects fed to a generic class? Did ABC receive a fatal blow in C++11?
Personally, my experience so far is that switching delegates is much simpler to code than creating multiple inherited classes, each for a particular behaviour... so I am a little confused about how useful abstract bases will be from now on.
Prefer well defined interfaces over callbacks
The problem with std::function (previously boost::function) is that most of the time you need to have a callback to a class method, and therefore need to bind this to the function object. However in the calling code, you have no way to know if this is still around. In fact, you have no idea that there even is a this because bind has molded the signature of the calling function into what the caller requires.
This can naturally cause weird crashes as the callback attempts to fire into methods for classes that no longer exist.
You can, of course use shared_from_this and bind a shared_ptr to a callback, but then your instance may never go away. The person that has a callback to you now participates in your ownership without them even knowing about it. You probably want more predictable ownership and destruction.
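The ownership effect is easy to demonstrate. The sketch below assumes C++14 and uses a lambda capture in place of bind (the effect on ownership is the same); Listener, makeCallback and the two helper functions are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <memory>

class Listener : public std::enable_shared_from_this<Listener> {
public:
    void onEvent(int value) { last_ = value; }
    int last() const { return last_; }

    // The returned callback holds a shared_ptr to this object.
    std::function<void(int)> makeCallback() {
        std::shared_ptr<Listener> self = shared_from_this();
        return [self](int value) { self->onEvent(value); };
    }
private:
    int last_ = 0;
};

// Number of owners while the callback exists: the original shared_ptr
// plus the copy captured inside the callback.
long ownersWhileCallbackAlive() {
    auto listener = std::make_shared<Listener>();
    std::function<void(int)> callback = listener->makeCallback();
    callback(7);
    assert(listener->last() == 7);
    return listener.use_count();
}

// Even after the "real" owner lets go, the callback keeps the instance alive.
bool callbackKeepsListenerAlive() {
    auto listener = std::make_shared<Listener>();
    std::weak_ptr<Listener> observer = listener;
    std::function<void(int)> callback = listener->makeCallback();
    listener.reset();
    return !observer.expired();
}
```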
Another problem, even if you can get the callback to work fine, is that with callbacks the code can be too decoupled. The relationships between objects can be so difficult to ascertain that readability suffers. Interfaces, however, provide a good compromise between an appropriate level of decoupling and a clearly specified relationship as defined by the interface's contract. You can also more clearly specify, in this relationship, issues like who owns whom, destruction order, etc.
An additional problem with std::function is that many debuggers do not support them well. In VS2008 and boost functions, you have to step through about 7 layers to get to your function. Even if, all other things being equal, a callback were the best choice, the sheer annoyance and time wasted accidentally stepping over the target of std::function is reason enough to avoid it. Inheritance is a core feature of the language, and stepping into an overridden method of an interface is instantaneous.
Lastly I'll just add that we don't have delegates in C++. Delegates in C# are a core part of the language, just like inheritance is in C++ and C#. We have a std library feature which IMO is one layer removed from core language functionality. So it's not going to be as tightly integrated with other core features of the language. It instead helps formalize the idea of function objects, which have been a C++ idiom for a good while now.
I do not see how one can come to the conclusion that function pointers make abstract base classes obsolete.
A class encapsulates methods and the data that pertain to it.
A function pointer is a function pointer. It has no notion of an encapsulating object; it merely knows about the parameters passed to it.
When we write classes, we are describing well-defined objects.
Function pointers are great for a good number of things, but changing the behaviour of an object isn't necessarily the target-audience (though I admit there may be times when you would want to do so, e.g. callbacks, as Doug.T mentions).
Please don't confuse the two.