How to get a meaningful function signature from anything callable - c++

Consider such a beast:
template<typename Func>
void register_function(Func func) {
// type-erase Func and pass it on to some other function
}
Assume that this can be passed anything callable.
I know how to get at the function's signature if Func is a plain function type. Given that func could be a plain function, a std::function<F>, or a function object (a std::bind() expression), how can I get at the function's arguments?
Note:
in this case, the functions only ever have either zero, one, or two arguments
if it's a function object, it's the result of std::bind()
the signature is needed in order to get at the argument's types, which need to be usable in the type-erased thing passed on
this is strictly C++03 (embedded platform), so no variadic templates etc.

Impossible. A function object can have overloaded or templated operator(). Thus the idea of it having "a signature" simply doesn't apply, because it can have an unbounded number of signatures.
If you restrict it to only having one signature, then you can take the address of operator() and then get the arguments from the member function pointer type using regular template specialization.
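A minimal C++03-compatible sketch of that approach follows. It assumes the functor has exactly one non-template, non-overloaded `operator() const`. Because C++03 has no `decltype`, it passes `&F::operator()` to an overloaded function template and lets template argument deduction recover `R`, `A1` and `A2`. The names `arity_of_fn`, `functor_arity`, `twice` and `Adder` are hypothetical. Results of `std::bind`/`boost::bind` usually have a templated `operator()`, so taking `&F::operator()` on them will not compile, which is exactly the limitation described above.

```cpp
#include <cassert>  // used by the usage example below

// Plain function pointers: R, A1 and A2 are deduced here, so a real
// register_function could use them to build its type-erased wrapper.
template <typename R>
int arity_of_fn(R (*)()) { return 0; }
template <typename R, typename A1>
int arity_of_fn(R (*)(A1)) { return 1; }
template <typename R, typename A1, typename A2>
int arity_of_fn(R (*)(A1, A2)) { return 2; }

// Pointers to const member functions, i.e. the type of &F::operator().
template <typename R, typename C>
int arity_of_fn(R (C::*)() const) { return 0; }
template <typename R, typename C, typename A1>
int arity_of_fn(R (C::*)(A1) const) { return 1; }
template <typename R, typename C, typename A1, typename A2>
int arity_of_fn(R (C::*)(A1, A2) const) { return 2; }

// For a class-type functor, take the address of its single operator()
// and let deduction pick the matching overload above.
template <typename F>
int functor_arity(const F&) { return arity_of_fn(&F::operator()); }

// Example callables (hypothetical).
int twice(int x) { return 2 * x; }
struct Adder {
    int operator()(int a, int b) const { return a + b; }
};
```

Here `arity_of_fn(&twice)` yields 1 and `functor_arity(Adder())` yields 2. Non-const `operator()` overloads would be handled by adding the analogous specializations without `const`.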

If you know the signature of some called (plain) function at runtime, you could use libffi (notably on Linux) to call it.
If you don't even know at runtime the signature of the function to call, it is impossible, since in general the ABI conventions will dictate different ways of passing function arguments according to their type.
For instance, the x86-64 ABI (followed on most 64-bit x86-64 Linux systems) requires that floating-point values and integral values be passed in different sets of registers.
See, for example, the Wikipedia page on x86 calling conventions.

Related

Is it possible to dynamically create and store pointers for functions for which we don't know the number and type of arguments?

I've been given a bunch of dummy functions, each one with its own return type, number (and types) of arguments and I'm trying to figure out a way to create function pointers of the correct type to them automatically, then store them inside a map to be retrieved at will. In a nutshell, I'm stuck at creating the actual function pointers. The way of storing them in a map is a separate, follow-up question, due to their variable types.
I think that templates are the way to go, and I've tried creating a templated function that returns the appropriately-typed pointer given the address and types of a function. I think it could not be possible though, so any input is appreciated.
Code for the aforementioned function:
template <typename retType, typename ... argTypes> retType makeFuncPtr(void* funcAddr) {
retType (*ptr)(argTypes) = funcAddr;
return ptr;
}
I'm getting an error "Declaration type contains unexpanded parameter pack 'argTypes'". What am I doing wrong and also which is the appropriate return type for this function, as I'm not actually sure about it?
The error you ask about is because in the line:
retType (*ptr)(argTypes) = funcAddr;
there is no ... after argTypes. Note that adding it would not actually fix the situation, because a void pointer cannot be converted to another kind of pointer without a cast. You also could not convert the function pointer to retType.
If the functions have different signatures, this is a fairly tricky problem, and I suggest you take a look at libffi. The tricky part here is not storing the function pointers. As long as they are not non-static member functions, you can cast them to a common function-pointer type such as void (*)() (or, on POSIX-style platforms, to void *) and store that. The tricky part is using the stored pointer value to make a call.
libffi gives you the ability to describe a function's calling convention, return type and expected arguments. You could then write code that compares the arguments you actually received against that description and either converts them or produces an error as appropriate. With C++ it would even be possible to produce that description programmatically (your template function would take a function pointer as a parameter, then use the parameter pack to map to the libffi argument type values).
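The last idea, expanding a function pointer's parameter pack into a runtime description, might look roughly like the sketch below. To keep it self-contained, it uses a hypothetical `ArgKind` enum in place of libffi's type objects. With libffi you would map each C++ type to the corresponding `ffi_type` (for example `ffi_type_sint` or `ffi_type_double`) instead. `kind_of`, `Signature`, `describe`, `scale` and `noop` are all illustrative names.

```cpp
#include <cassert>  // used by the usage example
#include <vector>

// Hypothetical type tags standing in for libffi's ffi_type objects.
enum ArgKind { KIND_VOID, KIND_INT, KIND_DOUBLE, KIND_POINTER };

// Map one C++ type to its tag. Unsupported types have no definition,
// so using them is a compile-time error.
template <typename T> struct kind_of;
template <> struct kind_of<void>   { static ArgKind get() { return KIND_VOID; } };
template <> struct kind_of<int>    { static ArgKind get() { return KIND_INT; } };
template <> struct kind_of<double> { static ArgKind get() { return KIND_DOUBLE; } };
template <typename T> struct kind_of<T*> { static ArgKind get() { return KIND_POINTER; } };

// Runtime description of a free function's return and argument types.
struct Signature {
    ArgKind result;
    std::vector<ArgKind> args;
};

// Deduce R and Args... from the pointer, then expand the pack into tags.
template <typename R, typename... Args>
Signature describe(R (*)(Args...))
{
    Signature sig;
    sig.result = kind_of<R>::get();
    sig.args = std::vector<ArgKind>{ kind_of<Args>::get()... };
    return sig;
}

// Example functions (hypothetical).
double scale(int n, double x) { return n * x; }
void noop() {}
```

For `describe(&scale)` the result tag is `KIND_DOUBLE` and the argument tags are `KIND_INT, KIND_DOUBLE`. Such a description is what you would compare incoming arguments against before making the call.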

In C++, what do we call everything that executes a function with a valid X(Args...) syntax?

In C++, there is a Callable concept as well as the std::is_function type trait. I was wondering what the standard calls X when the exact expression
X(Args...)
is valid and corresponds to a call (excluding constructors and C macros). For now, I can think of the following that fit in that category:
functions: R(Args...) (with optional const, volatile, &, && qualifiers)
function pointers: R(*)(Args...)
function references: R(&)(Args...)
functors (struct/class with overloaded operator())
lambda
My questions:
What does the standard call X? (e.g., it is not an is_function type, because if X is a function pointer, X(Args...) is valid but is_function is false)
Did I forget something in my list?
Your question seems a bit confused. std::is_function only identifies actual function types. It doesn't even include pointers to functions. The FunctionObject concept includes any object type that you can apply the function call operator to. This basically covers your list.
The Callable concept includes FunctionObjects, but it includes other things. Callable adds member pointers (both data and functions) to the rest of the FunctionObject family.
The INVOKE feature of C++ is what gets applied to all Callable objects. INVOKE is not actually a function in C++; it's simply the standard's name for the algorithm for calling Callable objects with an argument list (C++17 corrects this absurd oversight, giving us std::invoke). Section 20.9.2 [func.require] of the standard covers exactly how this algorithm is implemented.
The general gist is what you would expect. If the callable is a pointer-to-member-data, you take the first argument and apply the pointer-to-member-data to it (dereferencing that argument first if it is a pointer rather than an object or reference). The same goes for pointers-to-member-functions, except that you also pass the rest of the arguments as parameters.
For FunctionObject types, you just use () to call it, passing it the arguments.
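To make those rules concrete, here is a small sketch using C++17's `std::invoke`, which implements INVOKE. Each call exercises a different branch: a function, a function pointer, a function object, a lambda, and member pointers applied to an object and through a pointer. `Point`, `square`, `Doubler` and `demo_invoke` are hypothetical names.

```cpp
#include <cassert>     // used by the usage example
#include <functional>  // std::invoke (C++17)

struct Point {
    int x;
    int scaled(int k) const { return x * k; }
};

int square(int n) { return n * n; }

struct Doubler {
    int operator()(int n) const { return 2 * n; }
};

// Sums the results of invoking several kinds of Callable.
int demo_invoke()
{
    Point p{3};
    Point* pp = &p;
    int total = 0;
    total += std::invoke(square, 4);                        // function: square(4) == 16
    total += std::invoke(&square, 4);                       // function pointer: 16
    total += std::invoke(Doubler{}, 5);                     // function object: 10
    total += std::invoke([](int n) { return n + 1; }, 6);   // lambda: 7
    total += std::invoke(&Point::scaled, p, 2);             // member fn on object: p.scaled(2) == 6
    total += std::invoke(&Point::scaled, pp, 2);            // member fn via pointer: (*pp).scaled(2) == 6
    total += std::invoke(&Point::x, p);                     // member data on object: p.x == 3
    total += std::invoke(&Point::x, pp);                    // member data via pointer: (*pp).x == 3
    return total;
}
```

`demo_invoke()` returns 67. Only the first four lines would also compile as a plain `X(args...)` call; the member-pointer lines are what Callable adds on top of FunctionObject.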

Is the *only* purpose of a *function signature* (as opp. to type) to define duplicates in a potential overload set - or are there other purposes?

Related to Why does casting a function to a function type that is identical except for return type fail?, I would like to understand, in a fuller way, the distinction between a function's type and a function's signature.
For example, the type of a function must typically be considered when dealing with function pointers, and the type of the function includes the return type of that function.
However, as noted in Mike Seymour's answer to the above-linked question, the signature of a function is different from the type of a function. The signature is certainly used to disambiguate from among potential overloaded functions (noting that the return type of functions does not play a role in identifying unique functions). But, I would now like to understand the relevance and importance of function signatures vs. function types. It occurs to me that the only purpose of function signatures in C++ is to identify overload candidates and/or unique functions in an overload set, during overload resolution.
Am I correct? Is overload resolution the only purpose of function signatures in C++? Or are there any other uses/applications of function signatures, besides (or only indirectly related to) overload resolution?
ADDENDUM For clarity, please note that I am specifically seeking to understand the distinction between the purpose of a function signature and a function type. I.e., I know that a function type is required both for the use of function pointers, and for a compiler/linker's implementation of a calling convention. However, the calling convention is relevant only after overload resolution is complete. I am here asking, specifically, if the only purpose of the function signature (as opposed to type) is for overload resolution.
Am I correct?
As far as I'm concerned, there are other purposes too. Consider that C also has function signatures but doesn't have overloading.
Apart from overloading, the fundamental purpose of function signatures is conforming to the calling convention of a particular platform.
When a function accepts arguments and returns values, the compiler needs to know the type and the size of the arguments in order to pass them correctly to a function. In general, function arguments are pushed onto the stack (this is not a universal rule though, especially on 64-bit architecture systems). Consider the following situation. If you call a function like
foo(42);
how does the compiler know the size of the integer value it has to pass to the function? The number 42 can be represented using various bit widths, for example as a 1-, 2-, 4- (or even 8-)byte integer:
00101010
0000000000101010
00000000000000000000000000101010
Now if the function didn't have a signature telling the compiler that, for instance, the argument is a char (which is 1 byte), a short (which may be 2 bytes) or an int (which may be 4 bytes), the compiler would have no way of determining the correct size. If it pushed an arbitrary number of bytes onto the stack but the function expected a different size, the stack would be corrupted.
Another good example is returning structures (struct). Usually, integer return values are returned in a register, generally the EAX register on x86 (floating-point values typically use ST(0) or XMM0 instead). But what if one wants to write a function returning a struct? If the struct is too large to fit into registers, the compiler must generate code that returns the value through memory instead, typically a caller-allocated buffer whose address is passed as a hidden argument. So if a function is defined as
int foo()
{
return 1337;
}
or as
#include <string.h> /* memcpy */

struct bar {
int a;
char b[16];
float x;
};
struct bar foo()
{
struct bar ret;
ret.a = 0;
memcpy(&ret.b, "abcdefghijklmno", sizeof(ret.b));
ret.x = 3.1415927;
return ret;
}
different assembly (and machine code) will be generated: the first function, which returns an integer, will use the EAX register for the return value, but the second will have to return its value through memory (typically on the stack).
The standard mentions that signatures are used for name mangling and linking.
That being said, name mangling is not standardized. The return type is redundant in a function symbol (since there is only one possible return type for a function with a given name and arguments in a valid program, it is not required to differentiate two different symbols), but even then some ABIs do include the return type of a function in the mangled name, probably as a way of double-checking that there is no violation of the rule above.

calling a function without knowing the number of parameters in advance

Suppose I have a DLL named "dll1" with two functions:
f1(int a, int b, int c);
f2(int a);
My program would take the function name, the DLL name and a "list" of parameters as input.
How would I call the appropriate function with its appropriate parameters?
i.e.,
if the input is
dll1
f1
list(5,8,9)
this would require me to call f1 with 3 parameters;
if the input was
dll1
f2
list(8)
it would require me to call f2 with one parameter.
How would I call the function without knowing the number of parameters in advance?
Further clarification: how do I write code that will call any function with all its arguments by building the argument list dynamically, using some other source of information?
Since the generated code differs based on the number of parameters, you have two choices: you can write some code in assembly language to do the job (basically walk through the parameter list and push each on the stack before calling the function), or you can create something like an array of pointers to functions, one for each number of parameters you care about (e.g., 0 through 10). Most people find the latter a lot simpler to deal with (if only because it avoids using assembly language at all).
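A minimal sketch of the "array of pointers to functions, one per number of parameters" option follows. It assumes every target function takes only int parameters and returns int. It also assumes the address was obtained elsewhere (e.g. from GetProcAddress or dlsym) and stored as the hypothetical generic type `GenericFn`. `call0` to `call3`, `callWithArgs`, and the stand-in `f1`/`f2` bodies are all illustrative.

```cpp
#include <cassert>    // used by the usage example
#include <stdexcept>
#include <vector>

// Hypothetical generic storage type for a loaded function's address.
typedef void (*GenericFn)();

// One "caller" per supported arity: cast back to the real type, then call.
int call0(GenericFn f, const std::vector<int>&)   { return reinterpret_cast<int (*)()>(f)(); }
int call1(GenericFn f, const std::vector<int>& a) { return reinterpret_cast<int (*)(int)>(f)(a[0]); }
int call2(GenericFn f, const std::vector<int>& a) { return reinterpret_cast<int (*)(int, int)>(f)(a[0], a[1]); }
int call3(GenericFn f, const std::vector<int>& a) { return reinterpret_cast<int (*)(int, int, int)>(f)(a[0], a[1], a[2]); }

typedef int (*Caller)(GenericFn, const std::vector<int>&);
const Caller callers[] = { call0, call1, call2, call3 };  // index = number of parameters

// Picks the caller by args.size(). If that does not match f's real arity,
// the behaviour is undefined: nothing here can check it.
int callWithArgs(GenericFn f, const std::vector<int>& args)
{
    if (args.size() >= sizeof(callers) / sizeof(callers[0]))
        throw std::out_of_range("too many arguments");
    return callers[args.size()](f, args);
}

// Stand-ins for the DLL functions from the question (int return type assumed).
int f1(int a, int b, int c) { return a + b + c; }
int f2(int a) { return a * 10; }
```

With the question's inputs, calling `f1` with the list (5, 8, 9) gives 22, and `f2` with (8) gives 80. Supporting more arities or other parameter types means adding more table entries, which is why this only scales to the cases you actually need.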
To solve the problem in general you need to know:
The calling conventions (stdcall, cdecl, fastcall, thiscall and so on; by the way, the latter two can be combined in MSVC++) that govern how the functions receive their parameters (e.g. in special registers, on the stack, or both), how they return values (same) and what they are allowed to trash (e.g. some registers).
Exact function prototypes.
You can find all this only in the symbol/debug information produced by the compiler and (likely to a lesser extent) the header file containing the prototypes for the functions in the DLL. There's one problem with the header file. If it doesn't specify the calling convention and the functions have been compiled with non-default calling conventions (via a compiler option), you have ambiguity to deal with. In either case you'll need to parse something.
If you don't have this information, the only option left is reverse engineering of the DLL and/or its user(s).
In order to correctly invoke an arbitrary function when you only know its prototype and calling convention at run time, you need to construct code analogous to what the compiler produces when the function is known at compile time. If you're solving the general problem, you'll need some assembly code here, though not necessarily hand-written: run-time generated machine code is a good option.
Last but not least, you need some code to generate parameter values. This is most trivial with numeric types (ints, floats and the like) and arrays of them and most difficult with structures, unions and classes. Creating the latter on the fly may be at least as difficult as properly invoking functions. Don't forget that they may refer to other objects using pointers and references.
The general problem is solvable, but not cheaply. It's far easier to solve a few simple specific cases and maybe avoid the entire problem altogether by rewriting the functions to have less-variable parameters and only one calling convention OR by writing wrapper functions to do that.
You might want to check out the Named Parameter Idiom.
It uses method chaining to basically accomplish what you want.
It solves the problem where you know what a default set of arguments look like, but you only need to customize a few of them and not necessarily in the order they are declared.
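A small sketch of the idiom: the constructor takes the mandatory argument and sets defaults. Each "named parameter" is a setter that returns `*this`, so callers chain only the options they care about, in any order. The class `OpenFile` and its options are hypothetical. Note that this addresses defaulted or optional arguments known at compile time; it does not by itself call a function whose arity is only known at run time.

```cpp
#include <cassert>  // used by the usage example
#include <string>

// Hypothetical parameter object for a function with several defaulted options.
class OpenFile {
public:
    explicit OpenFile(const std::string& path)
        : path_(path), readOnly_(false), createIfMissing_(false), bufferSize_(4096) {}

    // Setters return *this so they can be chained in any order.
    OpenFile& readOnly()        { readOnly_ = true; return *this; }
    OpenFile& createIfMissing() { createIfMissing_ = true; return *this; }
    OpenFile& bufferSize(int n) { bufferSize_ = n; return *this; }

    const std::string& path() const { return path_; }
    bool isReadOnly() const         { return readOnly_; }
    bool createsIfMissing() const   { return createIfMissing_; }
    int getBufferSize() const       { return bufferSize_; }

private:
    std::string path_;
    bool readOnly_;
    bool createIfMissing_;
    int bufferSize_;
};
```

For example, `OpenFile("log.txt").bufferSize(1024).readOnly()` overrides two defaults and leaves `createIfMissing` at false.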
If your clients know the signature at compile time, they can wrap the call this way:
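#include <type_traits>  // std::decay
#include <utility>      // std::forward

template<class... Args>
void CallFunctionPointer(void* pf, Args&&... args)
{
    // Use the decayed argument types, so an lvalue int gives void(*)(int), not void(*)(int&).
    typedef void (*FunctionType)(typename std::decay<Args>::type...);
    // void* -> function pointer is conditionally-supported, but mainstream compilers accept it (dlsym relies on it).
    FunctionType pf2 = reinterpret_cast<FunctionType>(pf);
    pf2(std::forward<Args>(args)...);
}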
Note: if you pass the wrong number of parameters or the wrong type(s) of parameters, the behaviour is undefined.
Background:
In C/C++ you can cast a function pointer to any signature you want; however, if you get it wrong and call through it, the behavior is undefined.
In your case there are two signatures you have mentioned:
void (*)(int)
and
void (*)(int, int, int)
When you load the function from the DLL it is your responsibility to make sure you cast it to the correct signature, with the correct number and types of parameters before you call it.
If you have control over the design of these functions, I would modify them to take a variable number of arguments. If the base type is always int, then just change the signature of all the functions to:
void (*)(int* begin, size_t n);
// begin points to an array of int of n elements
so that you can safely bind any of the functions to any number of arguments.
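Here is a sketch of that design with a by-name lookup. It deviates slightly from the signature above: it uses `const int*` and returns `int` so the result can be observed, both of which are assumptions. Each function checks the count it received, so a wrong argument list becomes a reported error instead of undefined behaviour. `UniformFn`, `callByName` and the rewritten `f1`/`f2` bodies are hypothetical.

```cpp
#include <cassert>    // used by the usage example
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Every function shares this signature, so one pointer type covers them all.
typedef int (*UniformFn)(const int* begin, std::size_t n);

// Rewritten versions of f1/f2 that validate the argument count themselves.
int f1(const int* begin, std::size_t n)
{
    if (n != 3) throw std::invalid_argument("f1 expects 3 arguments");
    return begin[0] + begin[1] + begin[2];
}
int f2(const int* begin, std::size_t n)
{
    if (n != 1) throw std::invalid_argument("f2 expects 1 argument");
    return begin[0] * 10;
}

// Look up a function by name and call it with a dynamically built list.
int callByName(const std::string& name, const std::vector<int>& args)
{
    static std::map<std::string, UniformFn> table;
    if (table.empty()) {
        table["f1"] = &f1;
        table["f2"] = &f2;
    }
    std::map<std::string, UniformFn>::const_iterator it = table.find(name);
    if (it == table.end()) throw std::out_of_range("unknown function: " + name);
    return it->second(args.data(), args.size());
}
```

With this, `callByName("f1", {5, 8, 9})` returns 22. Passing three values to `f2` throws `std::invalid_argument` rather than corrupting the stack.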

Detailed difference between functor's call and function call?

The key reason this works is that for_each () doesn’t actually assume its third argument to be a function. It simply assumes that its third argument is something that can be called with an appropriate argument. A suitably defined object serves as well as – and often better than – a function. For example, it is easier to inline the application operator of a class than to inline a function passed as a pointer to function. Consequently, function objects often execute faster than do ordinary functions. An object of a class with an application operator (§11.9) is called a functionlike object, a functor, or simply a function object.
[Stroustrup, C++ 3rd edition, 18.4-last paragraph]
I always thought that the operator ( ) call is just like a function call at runtime. How does it differ from a normal function call?
Why is it easier to inline the application operator than a normal function?
How are they faster than function calls?
Generally, functors are passed to templated functions - if you're doing so, then it doesn't matter if you pass a "real" function (i.e. a function pointer) or a functor (i.e. a class with an overloaded operator()). Essentially, both have a function call operator and are thus valid template parameters for which the compiler can instantiate the for_each template. That means for_each is either instantiated with the specific type of the functor passed, or with the specific type of function pointer passed. And it's in that specialization that it is possible for functors to outperform function pointers.
After all, if you're passing a function pointer, then the compile-time type of the argument is just that - a function pointer. If for_each itself is not inlined, then this particular for_each instance is compiled to call an opaque function pointer - after all, how could the compiler inline a function pointer? It just knows its type, not which function of that type is actually passed - at least, unless it can use non-local information when optimizing, which is harder to do.
However, if you pass a functor, then the compile-time type of that functor is used to instantiate the for_each template. In doing so, you're probably passing a simple, non-virtual class with only one implementation of the appropriate operator(). So, when the compiler encounters a call to operator() it knows exactly which implementation is meant - the unique implementation for that functor - and now it can inline that.
If your functor uses virtual methods, the potential advantage disappears. And, of course, a functor is a class with which you can do all kinds of other inefficient things. But for the basic case, this is why it's easier for the compiler to optimize & inline a functor call than a function pointer call.
Summary
Function pointers can't be inlined since while compiling for_each the compiler has only the type of the function and not the identity of the function. By contrast, functors can be inlined since even though the compiler only has the type of functor, the type generally suffices to uniquely identify the functor's operator() method.
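The two cases can be put side by side in a short sketch. Both calls produce the same result; the difference is only in which type std::for_each is instantiated with. Whether the call is actually inlined depends on the compiler and optimization level, so inspecting the generated assembly (e.g. at -O2) is the way to check. `addOne`, `AddOne` and `incrementBoth` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>  // used by the usage example
#include <vector>

// Passed by pointer: for_each is instantiated with void(*)(int&),
// a type shared by every function with that signature.
void addOne(int& x) { x += 1; }

// Functor: for_each is instantiated with the distinct type AddOne, and its
// operator() is defined inline in the class, so the target is known statically.
struct AddOne {
    void operator()(int& x) const { x += 1; }
};

void incrementBoth(std::vector<int>& a, std::vector<int>& b)
{
    std::for_each(a.begin(), a.end(), &addOne);   // for_each<iterator, void(*)(int&)>
    std::for_each(b.begin(), b.end(), AddOne());  // for_each<iterator, AddOne>
}
```

After `incrementBoth` on two vectors of ones, both vectors contain only twos.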
Consider the two following template instantiations:
std::for_each<class std::vector<int>::const_iterator, class Functor>(...)
and
std::for_each<class std::vector<int>::const_iterator, void(*)(int)>(...)
Because the 1st is customised for each type of function object, and because operator() is often defined inline, then the compiler may, at its discretion, choose to inline the call.
In the 2nd scenario, the compiler will instantiate the template once for all functions of the same signature; therefore, it cannot easily inline the call.
Now, smart compilers may be able to figure out which function to call at compile time, especially in scenarios like this:
std::for_each(v.begin(), v.end(), &foo);
and still inline the function by generating custom instantiations instead of the single generic one mentioned earlier.
I always thought that the operator ( ) call is just like a function call at runtime. How does it differ from a normal function call?
My guess is not very much. For evidence of this, look at your compiler's assembly output for each. Assuming the same level of optimization, it's likely to be nearly identical. (With the additional detail that the this pointer will have to get passed.)
Why is it easier to inline the application operator than a normal function?
To quote the blurb you quoted:
For example, it is easier to inline the application operator of a class than to inline a function passed as a pointer to function.
I am not a compiler person, but I read this as: If the function is being called through a function pointer, it's a hard problem for the compiler to guess whether the address stored in that function pointer will ever change at runtime, therefore it's not safe to replace the call instruction with the body of the function; come to think of it, the body of the function itself wouldn't necessarily be known at compile time.
How are they faster than function calls?
In many circumstances I'd expect you wouldn't notice any difference. But, given your quotation's argument that the compiler is free to do more inlining, this could produce better code locality and fewer branches. If the code is called frequently this would produce notable speedup.