Serializing function objects - c++

Is it possible to serialize and deserialize a std::function, a function object, or a closure in general in C++? How? Does C++11 facilitate this? Is there any library support available for such a task (e.g., in Boost)?
For example, suppose a C++ program has a std::function which is needed to be communicated (say via a TCP/IP socket) to another C++ program residing on another machine. What do you suggest in such a scenario?
Edit:
To clarify, the functions which are to be moved are supposed to be pure and side-effect-free. So I do not have security or state-mismatch problems.
A solution to the problem is to build a small embedded domain specific language and serialize its abstract syntax tree.
I was hoping that I could find some language/library support for moving a machine-independent representation of functions instead.

Yes for function pointers and closures. Not for std::function.
A function pointer is the simplest — it is just a pointer like any other so you can just read it as bytes:
template <typename _Res, typename... _Args>
std::string serialize(_Res (*fn_ptr)(_Args...)) {
return std::string(reinterpret_cast<const char*>(&fn_ptr), sizeof(fn_ptr));
}
template <typename _Res, typename... _Args>
_Res (*deserialize(std::string str))(_Args...) {
return *reinterpret_cast<_Res (**)(_Args...)>(const_cast<char*>(str.c_str()));
}
But I was surprised to find that even without recompilation the address of a function will change on every invocation of the program. Not very useful if you want to transmit the address. This is due to ASLR, which you can turn off on Linux by starting your_program with setarch $(uname -m) -LR your_program.
Now you can send the function pointer to a different machine running the same program, and call it! (This does not involve transmitting executable code. But unless you are generating executable code at run-time, I don't think you are looking for that.)
A lambda function is quite different.
std::function<int(int)> addN(int N) {
auto f = [=](int x){ return x + N; };
return f;
}
The value of f will be the captured int N. Its representation in memory is the same as an int! The compiler generates an unnamed class for the lambda, of which f is an instance. This class has operator() overloaded with our code.
The class being unnamed presents a problem for serialization. It also presents a problem for returning lambda functions from functions. The latter problem is solved by std::function.
std::function as far as I understand is implemented by creating a templated wrapper class which effectively holds a reference to the unnamed class behind the lambda function through the template type parameter. (This is _Function_handler in functional.) std::function takes a function pointer to a static method (_M_invoke) of this wrapper class and stores that plus the closure value.
Unfortunately, everything is buried in private members and the size of the closure value is not stored. (It does not need to, because the lambda function knows its size.)
So std::function does not lend itself to serialization, but works well as a blueprint. I followed what it does, simplified it a lot (I only wanted to serialize lambdas, not the myriad other callable things), saved the size of the closure value in a size_t, and added methods for (de)serialization. It works!

No.
C++ has no built-in support for serialization and was never conceived with the idea of transmitting code from one process to another, lest one machine to another. Languages that may do so generally feature both an IR (intermediate representation of the code that is machine independent) and reflection.
So you are left with writing yourself a protocol for transmitting the actions you want, and the DSL approach is certainly workable... depending on the variety of tasks you wish to perform and the need for performance.
Another solution would be to go with an existing language. For example the Redis NoSQL database embeds a LUA engine and may execute LUA scripts, you could do the same and transmit LUA scripts on the network.

No, but there are some restricted solutions.
The most you can hope for is to register functions in some sort of global map (e.g. with key strings) that is common to the sending code and the receiving code (either in different computers or before and after serialization).
You can then serialize the string associated with the function and get it on the other side.
As a concrete example the library HPX implements something like this, in something called HPX_ACTION.
This requires a lot of protocol and it is fragile with respect to changes in code.
But after all this is no different from something that tries to serialize a class with private data. In some sense the code of the function is its private part (the arguments and return interface is the public part).
What leaves you a slip of hope is that depending on how you organize the code these "objects" can be global or common and if all goes right they are available during serialization and deserialization through some kind predefined runtime indirection.
This is a crude example:
serializer code:
// common:
class C{
double d;
public:
C(double d) : d(d){}
operator(double x) const{return d*x;}
};
C c1{1.};
C c2{2.};
std::map<std::string, C*> const m{{"c1", &c1}, {"c2", &c2}};
// :common
main(int argc, char** argv){
C* f = (argc == 2)?&c1:&c2;
(*f)(5.); // print 5 or 10 depending on the runtime args
serialize(f); // somehow write "c1" or "c2" to a file
}
deserializer code:
// common:
class C{
double d;
public:
operator(double x){return d*x;}
};
C c1;
C c2;
std::map<std::string, C*> const m{{"c1", &c1}, {"c2", &c2}};
// :common
main(){
C* f;
deserialize(f); // somehow read "c1" or "c2" and assign the pointer from the translation "map"
(*f)(3.); // print 3 or 6 depending on the code of the **other** run
}
(code not tested).
Note that this forces a lot of common and consistent code, but depending on the environment you might be able to guarantee this.
The slightest change in the code can produce a hard to detect logical bug.
Also, I played here with global objects (which can be used on free functions) but the same can be done with scoped objects, what becomes trickier is how to establish the map locally (#include common code inside a local scope?)

Related

What is the advantage of a single-function-class over a function?

I have often encountered classes with the following structure:
class FooAlgorithm {
public:
FooAlgorithm(A a, B b);
void run();
private:
A _a;
B _b;
};
So classes with only one public member function that are also only run once.
Is there any advantage in any case over a single free function foo(A a, B b)?
The latter option is easier to call, potentially has less header dependencies and also has much less boilerplate in the header.
An object has state that can be set up at object construction time. Then the single member function can be called at a later time, without knowing anything about how the object was set up, and the function can refer to the state that was set up earlier. A standalone function cannot do that. Any state must be passed to it as arguments, or be global/static (global/static data is best avoided for a variety of reasons).
An hands-on example is worth a thousand abstract explanations, so here is an exercise. Consider a simple object:
struct Obj {
std::array<std::string, 42> attributes;
};
How would you sort a vector of such objects, comparing only the attribute number K (K being a run-time parameter in the range 0..41)? Use std::sort and do not use any global or static data. Note how std::sort compares two objects: it calls a user-provided comparator, and passes it two objects to be compared, but it knows nothing about the parameter K and cannot pass it along to the comparator.
For bigger projects it may be easier to structure your programm using OOP.
For a simple program like yours it is easier to use a single function.
Classes introduce a new dynamic to your program -> Better readability, security, moudularity, re-usability....
functions are used to organize code into blocks.

Is it idiomatically ok to put algorithm into class?

I have a complex algorithm. This uses many variables, calculates helper arrays at initialization and also calculates arrays along the way. Since the algorithm is complex, I break it down into several functions.
Now, I actually do not see how this might be a class from an idiomatic way; I mean, I am just used to have algorithms as functions. The usage would simply be:
Calculation calc(/* several parameters */);
calc.calculate();
// get the heterogenous results via getters
On the other hand, putting this into a class has the following advantages:
I do not have to pass all the variables to the other functions/methods
arrays initialized at the beginning of the algorithm are accessible throughout the class in each function
my code is shorter and (imo) clearer
A hybrid way would be to put the algorithm class into a source file and access it via a function that uses it. The user of the algorithm would not see the class.
Does anyone have valuable thoughts that might help me out?
Thank you very much in advance!
I have a complex algorithm. This uses many variables, calculates helper arrays at initialization and also calculates arrays along the way.[...]
Now, I actually do not see how this might be a class from an idiomatic way
It is not, but many people do the same thing you do (so did I a few times).
Instead of creating a class for your algorithm, consider transforming your inputs and outputs into classes/structures.
That is, instead of:
Calculation calc(a, b, c, d, e, f, g);
calc.calculate();
// use getters on calc from here on
you could write:
CalcInputs inputs(a, b, c, d, e, f, g);
CalcResult output = calculate(inputs); // calculate is now free function
// use getters on output from here on
This doesn't create any problems and performs the same (actually better) grouping of data.
I'd say it is very idiomatic to represent an algorithm (or perhaps better, a computation) as a class. One of the definitions of object class from OOP is "data and functions to operate on that data." A compex algorithm with its inputs, outputs and intermediary data matches this definition perfectly.
I've done this myself several times, and it simplifies (human) code flow analysis significantly, making the whole thing easier to reason about, to debug and to test.
If the abstraction for the client code is an algorithm, you
probably want to keep a pure functional interface, and not
introduce additional types there. It's quite common, on the
other hand, for such a function to be implemented in a source
file which defines a common data structure or class for its
internal use, so you might have:
double calculation( /* input parameters */ )
{
SupportClass calc( /* input parameters */ );
calc.part1();
calc.part2();
// etc...
return calc.results();
}
Depending on how your code is organized, SupportClass will be
in an unnamed namespace in the source file (probably the most
common case), or in a "private" header, included only by the
sources involved in the algorith.
It really depends of what kind of algorithm you want to encapsulate. Generally I agree with John Carmack : "Sometimes, the elegant implementation is just a function. Not a method. Not a class. Not a framework. Just a function."
It really boils down to: do the algorithm need access to the private area of the class that is not supposed to be public? If the answer is yes (unless you are willing to refactor your class interface, depending on the specific cases) you should go with a member function, if not, then a free function is good enough.
Take for example the standard library. Most of the algorithms are provided as free functions because they only access the public interface of the class (with iterators for standard containers, for example).
Do you need to call the exact same functions in the exact same order each time? Then you shouldn't be requiring calling code to do this. Splitting your algorithm into multiple functions is fine, but I'd still have one call the next and then the next and so on, with a struct of results/parameters being passed along the way. A class doesn't feel right for a one-off invocation of some procedure.
The only way I'd do this with a class is if the class encapsulates all the input data itself, and you then call myClass.nameOfMyAlgorithm() on it, among other potential operations. Then you have data+manipulators. But just manipulators? Yeah, I'm not so sure.
In modern C++ the distinction has been eroded quite a bit. Even from the operator overloading of the pre-ANSI language, you could create a class whose instances are syntactically like functions:
struct Multiplier
{
int factor_;
Multiplier(int f) : factor_(f) { }
int operator()(int v) const
{
return v * _factor;
}
};
Multipler doubler(2);
std::cout << doubler(3) << std::endl; // prints 6
Such a class/struct is called a functor, and can capture "contextual" values in its constructor. This allows you to effectively pass the parameters to a function in two stages: some in the constructor call, some later each time you call it for real. This is called partial function application.
To relate this to your example, your calculate member function could be turned into operator(), and then the Calculation instance would be a function! (or near enough.)
To unify these ideas, you can try thinking of a plain function as a functor of which there is only one instance (and hence no need for a constructor - although this is no guarantee that the function only depends on its formal parameters: it might depend on global variables...)
Rather than asking "Should I put this algorithm in a function or a class?" instead ask yourself "Would it be useful to be able to pass the parameters to this algorithm in two or more stages?" In your example, all the parameters go into the constructor, and none in the later call to calculate, so it makes little sense to ask users of your class make two calls.
In C++11 the distinction breaks down further (and things get a lot more convenient), in recognition of the fluidity of these ideas:
auto doubler = [] (int val) { return val * 2; };
std::cout << doubler(3) << std::endl; // prints 6
Here, doubler is a lambda, which is essentially a nifty way to declare an instance of a compiler-generated class that implements the () operator.
Reproducing the original example more exactly, we would want a function-like thing called multiplier that accepts a factor, and returns another function-like thing that accepts a value v and returns v * factor.
auto multiplier = [] (int factor)
{
return [=] (int v) { return v * factor; };
};
auto doubler = multiplier(2);
std::cout << doubler(3) << std::endl; // prints 6
Note the pattern: ultimately we're multiplying two numbers, but we specify the numbers in two steps. The functor we get back from calling multiplier acts like a "package" containing the first number.
Although lambdas are relatively new, they are likely to become a very common part of C++ style (as they have in every other language they've been added to).
But sadly at this point we've reached the "cutting edge" as the above example works in GCC but not in MSVC 12 (I haven't tried it in MSVC 13). It does pass the intellisense checking of MSVC 12 though (they use two completely different compilers)! And you can fix it by wrapping the inner lambda with std::function<int(int)>( ... ).
Even so, you can use these ideas in old-school C++ when writing functors by hand.
Looking further ahead, resumable functions may make it into some future version of the language (Microsoft is pushing hard for them as they are practically identical to async/await in C#) and that is yet another blurring of the distinction between functions and classes (a resumable function acts like a constructor for a state machine class).

Mimicing C# 'new' (hiding a virtual method) in a C++ code generator

I'm developing a system which takes a set of compiled .NET assemblies and emits C++ code which can then be compiled to any platform having a C++ compiler. Of course, this involves some extensive trickery due to various things .NET does that C++ doesn't.
One such situation is the ability to hide virtual methods, such as the following in C#:
class A
{
virtual void MyMethod()
{ ... }
}
class B : A
{
override void MyMethod()
{ ... }
}
class C : B
{
new virtual void MyMethod()
{ ... }
}
class D : C
{
override void MyMethod()
{ ... }
}
I came up with a solution to this that seemed clever and did work, as in the following example:
namespace impdetails
{
template<class by_type>
struct redef {};
}
struct A
{
virtual void MyMethod( void );
};
struct B : A
{
virtual void MyMethod( void );
};
struct C : B
{
virtual void MyMethod( impdetails::redef<C> );
};
struct D : C
{
virtual void MyMethod( impdetails::redef<D> );
};
This does of course require that all the call sites for C::MyMethod and D::MyMethod construct and pass the dummy object, as in this example:
C *c_d = &d;
c_d->MyMethod( impdetails::redef<C>() );
I'm not worried about this extra source code overhead; the output of this system is mainly not intended for human consumption.
Unfortunately, it turns out this actually causes runtime overhead. Intuitively, one would expect that because impdetails::redef<> is empty, it would take no space and passing it would involve no code.
However, the C++ standard, for reasons I understand but don't totally agree with, mandates that objects cannot have zero size. This leaves us with a situation where the compiler actually emits code to create and pass the object.
In fact, at least on VC2008, I found that it even went to the trouble of zeroing the dummy byte, even in release builds! I'm not sure why that was necessary, but it makes me even more not want to do it this way.
If all else fails I could always change the actual name of the function, such as perhaps having MyMethod, MyMethod$1, and MyMethod$2. However, this causes more problems. For instance, $ is actually not legal in C++ identifiers (although compilers I've tested will allow it.) A totally acceptable identifier in the output program could also be an identifier in the input program, which suggests a more complex approach would be needed, making this a less attractive option.
It also so turns out that there are other situations in this project where it would be nice to be able to modify method signatures using arbitrary type arguments similar to how I'm passing a type to impdetails::redef<>.
Is there any other clever way to get around this, or am I stuck between adding overhead at every call site or mangling names?
After considering some other aspects of the system as well such as interfaces in .NET, I am starting to think maybe it's better - perhaps even more-or-less necessary - to not even use the C++ virtual calling mechanism at all. The more I consider, the messier using that mechanism is getting.
In this approach, each user object class would have a separate struct for the vtable (perhaps kept in a separate namespace like vtabletype::. The generated class would have a pointer member that would be initialized through some trickery to point to a static instance of the vtable. Virtual calls would explicitly use a member pointer from that vtable.
If done properly this should have the same performance as the compiler's own implementation would. I've confirmed it does on VC2008. (By contrast, just using straight C, which is what I was planning on earlier, would likely not perform as well, since compilers often optimize this into a register.)
It would be hellish to write code like this manually, but of course this isn't a concern for a generator. This approach does have some advantages in this application:
Because it's a much more explicit approach, one can be more sure that it's doing exactly what .NET specifies it should be doing with respect to newslot as well as selection of interface implementations.
It might be more efficient (depending on some internal details) than a more traditional C++ approach to interfaces, which would tend to invoke multiple inheritance.
In .NET, objects are considered to be fully constructed when their .ctor runs. This impacts how virtual functions behave. With explicit knowledge of the vtables, this could be achieved by writing it in during allocation. (Although putting the .ctor code into a normal member function is another option.)
It might avoid redundant data when implementing reflection.
It provides better control and knowledge of object layout, which could be useful for the garbage collector.
On the downside, it totally loses the C++ compiler's overloading feature with regard to the vtable entries: those entries are data members, not functions, so there is no overloading. In this case it would be tempting to just number the members (say _0, _1...) This may not be so bad when debugging, since once the pointer is followed, you'll see an actual, properly-named member function anyway.
I think I may end up doing it this way but by all means I'd like to hear if there are better options, as this is admittedly a rather complex approach (and problem.)

How to perform type scanning in C++?

I have an ESB. Any serialized message transports its own fully qualified name (that is, namespace + class name). I have a concrete type for each message that encapsulates a specific logic to be executed.
Every time I receive a message, I need to deserialize it at first, so I can perform its operations --once more, depending on its concrete type--.
I need a way to register every single class at compile time or during my application initialization.
With .net I would use reflection to scan assemblies and discover the message types during initialization, but how would you do it in C++?
C++ has no reflection capability. I suppose you could try to scan object files, etc., but there's no reliable way to do this (AFAIK); the compiler may entirely eliminate or mangle certain things.
Essentially, for serialization, you will have to do the registration (semi-)manually. But you may be interested in a serialization library that will help out with the chores, such as Boost Serialization.
Since there is no reflection in C++, I would suggest using an external script to scan your source code for all relevant classes (which is easy if you use empty dummy #defines to annotate them in the source code) and have it generate the registration code.
I personally use the manual registration road. If you forget to register... then the test don't work anyway.
You just have to use a factory, and implement some tag dispatching. For example:
typedef void (*ActOnMessageType)(Message const&);
typedef std::map<std::string, ActOnMessageType> MessageDispatcherType;
static MessageDispatcherType& GetDispatcher() {
static MessageDispatcherType D; return D;
}
static bool RegisterMessageHandler(std::string name, ActOnMessageType func) {
return GetDispatcher().insert(std::make_pair(name, func)).second;
}
Then you just prepare your functions:
void ActOnFoo(Message const& m);
void ActOnBar(Message const& m);
And register them:
bool const gRegisteredFoo = RegisterMessageHandler("Foo", ActOnFoo);
bool const gRegisteredBar = RegsiterMessageHandler("Bar", ActOnBar);
Note: I effectively use a lazily initialized Singleton, in order to allow decoupling. That is the registration is done during the library load and thus each Register... call is placed in the file where the function is defined. The one difference with a global variable is that here the dispatching map is actually constant once the initialization ends.

Closures in C++

I've found myself in a strange place, mentally. In a C++ project, I long for closures.
Background. There's a Document-type class with a public Render method which spawns a deep call tree. There's some transient state that only makes sense during rendering. Right now it resides in the class like regular member variables. However, this is not satisfactory on some levels - this data only makes sense during a Render call, why store it all the time? Passing it around in arguments would be ugly - there are around 15 variables there. Passing around a structure would add a lot of "RenderState->..." in the lower-level methods.
So what do I want? I want the world, like we all do. Specifically, a set of variables that are:
available to some methods in a class (not all of them)
accessible by name alone (no pState->... stuff - so that refactoring is easy)
not copied around on every method call
only live during a method call and up its call tree (assuming trees grow up)
live on a stack
I know I can have some of those properties with C++ - but not all of them. Tell me I'm not turning weird.
Heck, in Pascal, of all places, nested functions give you all that...
So what is a good workaround to emulate closures in C++, getting as many of the above benefits as possible?
Standard C++ since C++11 provides native lambda expressions and several compilers (VC10+ GCC and clang at least) implements it.
With GCC and Clang you can activate it with "--std=c++11" (or use a higher version of C++ if available). VC10 and later versions have it activated without need for flags.
By the way, you can also use boost::lambda (that is not perfect but works with C++03) also provide lambda in C++.
You don't have nested functions, but you have local classes:
void Document::Render(Param)
{
class RenderState
{
public:
RenderState(Document&)
{
//...
}
void Go(Param);
private:
// "Nested" functions
// ....
// Data that nested functions operate on
// ...
};
RenderState s(*this);
s.Go(Param);
}
See this GotW article for more information
Personally, I'd go with the RenderState approach.
Alternatively, if there's a well-defined set of Render-only functions that all require access to the same data, I'd seriously investigate pulling those into their own DocumentRenderer class that contains both the appropriate methods and the appropriate member variables. (This is similar to Fowler's "method object" refactoring.)
C++ doesn't have nested functions, but local classes can serve as an imperfect solution. (Imperfect because local classes' methods cannot access variables of the enclosing class and because they can't be used to instantiate templates.) A local class is simply a class that's declared, along with its methods, within the body of a function. Herb Sutter discusses local classes in more detail here.
Local classes are used to implement Boost's ScopeExit library. ScopeExit's reviewers noted that ScopeExit "suggests a method for creating a general closure mechanism as a library," so if you aren't happy with a RenderState or DocumentRenderer approach, ScopeExit's implementation may give you some ideas for closures in C++.
Currently there are no closures in C++ that would generate "orinary" first-class functions (whether member or non-member). Moreover, there's no standard way to implement such closures.
Closure semantics is available for functors in template metaprogramming at compile time, but that's a completely different kind of beast. In order to obtain a true run-time closure functionality for first-class functions you haver no other choice but to use a non-standard low-level implementation like this one, for example.
A functor is basically a closure.
Why the downvotes? Take Érics comment, change void Go(Param); to void operator () (Param); and there you have it.
There is no way to keep the stack in a native application after the function has exited. But this would be neccesary to make closures like the ones in Javascript. And there is no way to reference a function's stack without doing anything evil. A class that acts like a function (=a functor) would have to get all the relevant information passed somehow, but this is as close as you get in C++. It has state, it has code, and you can pass it around.
Please explain, where am I wrong?
As long as the local variables you want to bind are in scope, you can try something like the following to bind them to your inner class. Though, if you have read the above posted GotW article, it is a fragile solution.
#include <iostream>
using namespace std;
int main() {
int x = 1;
cout << x << endl; // 1
class Inner {
public:
Inner(int& x) : bound_x(x) {}
void do_sth() { ++bound_x; }
private:
int& bound_x;
};
Inner i(x);
i.do_sth();
cout << x << endl; // 2
x = 5;
i.do_sth();
cout << x << endl; // 6
return 0;
}