Pre-serialisation message objects - implementation?


I have a TCP client-server setup where I need to be able to pass messages of different formats at different times, using the same transmit/receive infrastructure.
Two different types of messages sent from client to server might be:
TIME_SYNC_REQUEST: Requesting server's game time. Contains no information other than the message type.
UPDATE: Describes all changes to game state that happened since the last update that was posted (if this is not the very first one after connecting), so that the server may update its data model where it sees fit.
(The message type to be included in the header, and any data to be included in the body of the message.)
In dynamic languages, I'd create an AbstractMessage type, and derive two different message types from it, with TimeSyncRequestMessage accommodating no extra data members, and UpdateMessage containing all necessary members (player position etc.), and use reflection to see what I need to actually serialise for socket send(). Since the class name describes the type, I would not even need an additional member for that.
In C++: I do not wish to use dynamic_cast to mirror the approach described above, for performance reasons. Should I use a compositional approach, with dummy members filling in for any possible data, and a char messageType? I guess another possibility is to keep different message types in differently-typed lists. Is this the only choice? Otherwise, what else could I do to store the message info until it's time to serialise it?

Maybe you can let the message class do the serialization: define a serialize interface and have each message implement it. Then, when you want to serialize and send, you call AbstractMessage::Serialize() to get the serialized data.
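A minimal sketch of that idea follows. It assumes a wire format of one type byte followed by a message-specific payload. The class names follow the question, but the enum values, the two 16-bit x/y fields standing in for "player position etc." and the big-endian layout are illustrative assumptions, not a prescribed format.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <memory>
#include <vector>

// Hypothetical type ids; only the names come from the question.
enum class MessageType : std::uint8_t { TimeSyncRequest = 1, Update = 2 };

class AbstractMessage {
public:
    virtual ~AbstractMessage() = default;
    virtual MessageType type() const = 0;

    // Common framing: one type byte, then whatever the subclass appends.
    std::vector<std::uint8_t> Serialize() const {
        std::vector<std::uint8_t> out;
        out.push_back(static_cast<std::uint8_t>(type()));
        SerializeBody(out);
        return out;
    }

protected:
    virtual void SerializeBody(std::vector<std::uint8_t>& out) const = 0;
};

class TimeSyncRequestMessage : public AbstractMessage {
public:
    MessageType type() const override { return MessageType::TimeSyncRequest; }

protected:
    void SerializeBody(std::vector<std::uint8_t>&) const override {} // no payload
};

class UpdateMessage : public AbstractMessage {
public:
    UpdateMessage(std::int16_t x, std::int16_t y) : x_(x), y_(y) {}
    MessageType type() const override { return MessageType::Update; }

protected:
    // Two big-endian 16-bit coordinates (an assumed stand-in for the real update data).
    void SerializeBody(std::vector<std::uint8_t>& out) const override {
        for (std::int16_t v : {x_, y_}) {
            const auto u = static_cast<std::uint16_t>(v);
            out.push_back(static_cast<std::uint8_t>(u >> 8));
            out.push_back(static_cast<std::uint8_t>(u & 0xFF));
        }
    }

private:
    std::int16_t x_, y_;
};
```

Callers can hold any message as a std::unique_ptr<AbstractMessage> and call Serialize() without knowing its concrete type, so the sending side needs no dynamic_cast.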

Unless you have some very high performance requirements, I would use a self-describing message format. This typically uses a common format (say key=value) but no specific structure; instead, known attributes describe the type of the message, and any other attributes can be extracted from the message using logic specific to that message type.
I find this type of messaging retains better backward compatibility: if you have new attributes you want to add, you can add away, and older clients will simply not see them. Messaging that uses fixed structures tends to fare less well.
EDIT: More information on self-describing message formats. Basically, the idea is that you define a dictionary of fields; this is the universe of fields that your generic message can contain. A message must by default contain some mandatory fields, and then it's up to you what other fields are added to it. The serialization/deserialization is pretty straightforward: you construct a blob which has all the fields you want to add, and at the other end you construct a container which has all the attributes (imagine a map). The mandatory fields can describe the type; for example, you can have a field in your dictionary which is the message type, and this is set for all messages. You interrogate this field to determine how to handle the message. Once you are in the handling logic, you simply extract the other attributes the logic needs from the container (map) and process them.
This approach affords the best flexibility and allows you to do things like only transmitting fields that have actually changed. How you keep this state on either side is up to you, but given that you have a one-to-one mapping between message and handling logic, you need neither inheritance nor composition. The smartness in this type of system stems from how you serialize the fields (and deserialize them so that you know which attribute in the dictionary each field is). For an example of such a format, look at the FIX protocol; I wouldn't advocate it for gaming, but it should demonstrate what a self-describing message is.
EDIT2: I cannot provide a full implementation, but here is a sketch.
Firstly let me define a value type - this is the typical type of values which can exist for a field:
typedef boost::variant<std::int32_t, std::int64_t, double, std::string> value_type;
Now I describe a field
struct field
{
int field_key;
value_type field_value;
};
Now here is my message container
struct Message
{
field type;
field size;
container<field> fields; // I use a generic "container", you can use whatever you want (map/vector etc. depending on how you want to handle repeating fields etc.)
};
Now let's say that I want to construct a message which is the TIME_SYNC update, use a factory to generate me an appropriate skeleton
std::unique_ptr<Message> getTimeSyncMessage()
{
std::unique_ptr<Message> msg(new Message);
msg->type = { dict::field_type, TIME_SYNC }; // set the type
// set other default attributes for this message type
return msg;
}
Now, I want to set more attributes, and this is where I need a dictionary of supported fields for example...
namespace dict
{
static const int field_type = 1; // message type field id
// fields that you want
static const int field_time = 2;
:
}
So now I can say,
std::unique_ptr<Message> msg = getTimeSyncMessage();
msg->setField(field_time, some_value);
msg->setField(field_other, some_other_value);
: // etc.
Now the serialization of this message, when you are ready to send it, is simply a matter of stepping through the container and adding each field to the blob. You can use ASCII encoding or binary encoding (I would start with the former and then move to the latter, depending on requirements). So an ASCII-encoded version of the above could be something like:
1=1|2=10:00:00.000|3=foo
Here, for argument's sake, I use | to separate the fields; you can use something else that you can guarantee doesn't occur in your values. With a binary format this is not relevant, since the size of each field can be embedded in the data.
The deserialization would step through the blob, extract each field (for example, by splitting on |), use the factory methods to generate a skeleton (once you've got the type, field 1), and then fill in all the attributes in the container. Later, when you want to get a specific attribute, you can do something like:
msg->getField(field_time); // this will return the variant - and you can use boost::get for the specific type.
I know this is only a sketch, but hopefully it conveys the idea behind a self-describing format. Once you've got the basic idea, there are lots of optimizations that can be done, but that's a whole other topic...
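The ASCII round trip described above could look roughly like this. To keep it short, the sketch swaps the boost::variant container for a std::map<int, std::string> (all values travel as text), so it can hold only one value per field id; repeating fields would need a different container. The dict ids mirror the sketch, while encode and decode are hypothetical helper names. It needs C++17 and assumes values never contain |.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Field ids from the (hypothetical) dictionary; 1 is the mandatory message-type field.
namespace dict {
constexpr int field_type = 1;
constexpr int field_time = 2;
}

// A message is just field id -> value; values are kept as text in this ASCII encoding.
using Fields = std::map<int, std::string>;

// Produces "key=value|key=value|..." in ascending key order.
std::string encode(const Fields& fields) {
    std::string out;
    for (const auto& [key, value] : fields) {
        if (!out.empty()) out += '|';
        out += std::to_string(key) + '=' + value;
    }
    return out;
}

// Splits on '|', then on the first '='; std::stoi throws if a key is not a number.
Fields decode(const std::string& blob) {
    Fields fields;
    std::istringstream in(blob);
    std::string item;
    while (std::getline(in, item, '|')) {
        const auto eq = item.find('=');
        if (eq == std::string::npos) continue; // malformed field: skip it
        fields[std::stoi(item.substr(0, eq))] = item.substr(eq + 1);
    }
    return fields;
}
```

After decode, the receiver looks up dict::field_type first to pick the handling logic, which then pulls whatever other fields it needs from the map.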

A common approach is to simply have a header on all of your messages. For example, you might have a header structure that looks like this:
struct header
{
int msgid;
int len;
};
Then the stream would contain both the header and the message data. You could use the information in the header to read the correct amount of data from the stream and to determine which type it is.
How the rest of the data is encoded, and how the class structure is set up, depends greatly on your architecture. If you are using a private network where each host is the same and runs identical code, you can use a binary dump of a structure. Otherwise (the more likely case), you'll have a variable-length data structure for each type, serialized perhaps using Google Protobuf or Boost serialization.
In pseudo-code, the receiving end of a message looks like:
read_header( header );
switch( header.msgid )
{
case TIME_SYNC:
read_time_sync( ts );
process_time_sync( ts );
break;
case UPDATE:
read_update( up );
process_update( up );
break;
default:
emit error
skip header.len;
break;
}
What the "read" functions look like depends on your serialization. Google protobuf is pretty decent if you have basic data structures and need to work in a variety of languages. Boost serialization is good if you use only C++ and all code can share the same data structure headers.
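Here is a runnable take on the receiving-end pseudo-code, under some assumptions of my own: the header is two 32-bit unsigned integers in big-endian order, len counts only the payload bytes, and the "read"/"process" steps are reduced to recording what was seen. The message ids and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical message ids matching the pseudo-code.
enum : std::uint32_t { TIME_SYNC = 1, UPDATE = 2 };

struct Header {
    std::uint32_t msgid;
    std::uint32_t len; // payload length in bytes, not counting the 8-byte header
};

// Reads a 32-bit big-endian integer starting at p.
std::uint32_t read_u32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Handles every complete message in buf and returns a log of what was dispatched.
std::vector<std::string> dispatch_all(const std::vector<std::uint8_t>& buf) {
    std::vector<std::string> log;
    std::size_t pos = 0;
    while (buf.size() - pos >= 8) {
        const Header h{read_u32(buf.data() + pos), read_u32(buf.data() + pos + 4)};
        if (buf.size() - pos - 8 < h.len) break; // payload not fully received yet
        const std::uint8_t* body = buf.data() + pos + 8;
        switch (h.msgid) {
        case TIME_SYNC:
            log.push_back("time_sync");
            break;
        case UPDATE:
            log.push_back("update:" + std::string(body, body + h.len));
            break;
        default:
            log.push_back("error: unknown id " + std::to_string(h.msgid));
            break; // the payload is skipped by the advance below
        }
        pos += 8 + h.len; // header.len lets us skip to the next message
    }
    return log;
}
```

A real receiver would keep any trailing partial message and append the next recv() result to it; the early break is where that would happen.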

A normal approach is to send the message type and then send the serialized data.
On the receiving side, you receive the message type and based on that type, you instantiate the class via a factory method (using a map or a switch-case), and then let the object deserialize the data.
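A sketch of that factory, using a std::map from a one-byte type id to a creator function. The ids, class names and the plain-string payload are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Base class: each message knows how to deserialize its own payload.
struct Message {
    virtual ~Message() = default;
    virtual void deserialize(const std::string& payload) = 0;
    virtual std::string name() const = 0;
};

struct TimeSync : Message {
    void deserialize(const std::string&) override {} // no payload
    std::string name() const override { return "TimeSync"; }
};

struct Update : Message {
    std::string data;
    void deserialize(const std::string& payload) override { data = payload; }
    std::string name() const override { return "Update"; }
};

// Factory table: message type id -> function that creates an empty message of that type.
using Creator = std::function<std::unique_ptr<Message>()>;

const std::map<std::uint8_t, Creator>& factory() {
    static const std::map<std::uint8_t, Creator> table{
        {1, [] { return std::make_unique<TimeSync>(); }},
        {2, [] { return std::make_unique<Update>(); }},
    };
    return table;
}

// Instantiates the right class for the received type and lets it deserialize itself.
// Returns nullptr for unknown type ids.
std::unique_ptr<Message> receive(std::uint8_t type, const std::string& payload) {
    const auto it = factory().find(type);
    if (it == factory().end()) return nullptr;
    std::unique_ptr<Message> msg = it->second();
    msg->deserialize(payload);
    return msg;
}
```

Adding a message type then means adding one class and one table entry; a switch-case inside receive() would work just as well.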

Your performance requirements are strong enough to rule out dynamic_cast? I do not see how testing a field on a general structure can possibly be faster than that, which leaves only the different lists for different messages: you have to know the type of your object by some other means in every case. But then you can have pointers to an abstract class and static_cast those pointers.
I recommend that you reassess the use of dynamic_cast; I do not think it will be deadly slow for network applications.
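A sketch of the separate-lists idea: since each list only ever receives one concrete type, the list itself records the dynamic type, so static_cast is safe without an RTTI check. The type and member names are hypothetical. The correctness of the cast rests entirely on never pushing the wrong type into a list, since a static_cast to the wrong type is undefined behaviour.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct AbstractMessage { virtual ~AbstractMessage() = default; };
struct TimeSyncRequest : AbstractMessage {};
struct UpdateMessage : AbstractMessage { int x = 0, y = 0; };

// One list per message type: which list a pointer is in tells us its dynamic type.
struct Outbox {
    std::vector<std::unique_ptr<AbstractMessage>> time_syncs; // only TimeSyncRequest
    std::vector<std::unique_ptr<AbstractMessage>> updates;    // only UpdateMessage
};

// Example consumer: reads UpdateMessage fields through base-class pointers.
int sum_update_x(const Outbox& box) {
    int total = 0;
    for (const auto& p : box.updates) {
        // Safe only because 'updates' never holds anything but UpdateMessage.
        total += static_cast<const UpdateMessage&>(*p).x;
    }
    return total;
}
```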

On the sending end of the connection, in order to construct our message, we keep the message ID and header separate from the message data:
Message is a type that holds only the messageCategory and messageID.
Each such Message is pushed onto a unified messageQueue.
Separate hashes are kept for the data pertaining to each messageCategory. In each of these there is a record of data for every message of that category, keyed by messageID. The value type depends on the message category, so for a TIME_SYNC message we'd have a struct TimeSyncMessageData, for instance.
Serialisation:
Pop the message from the messageQueue, reference the appropriate hash for that message type, by messageID, to retrieve the data we want to serialise & send.
Serialise & send the data.
Advantages:
No potentially unused data members in a single, generic Message object.
An intuitive setup for data retrieval when the time comes for serialisation.
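A sketch of that sending-side layout: a unified queue of (messageCategory, messageID) entries plus one hash table per category. The category values, the UpdateMessageData fields and the text serialisation in popAndSerialise are assumptions made to keep the example small.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <unordered_map>

enum class Category : std::uint8_t { TimeSync = 1, Update = 2 };

// Queue entry: only the category and an id.
struct Message {
    Category messageCategory;
    std::uint32_t messageID;
};

// Per-category payloads (fields are illustrative).
struct TimeSyncMessageData {};
struct UpdateMessageData { int x; int y; };

struct Outgoing {
    std::queue<Message> messageQueue;
    std::unordered_map<std::uint32_t, TimeSyncMessageData> timeSyncData;
    std::unordered_map<std::uint32_t, UpdateMessageData> updateData;
    std::uint32_t nextID = 0;

    void postTimeSync() {
        timeSyncData[nextID] = {};
        messageQueue.push({Category::TimeSync, nextID++});
    }
    void postUpdate(int x, int y) {
        updateData[nextID] = {x, y};
        messageQueue.push({Category::Update, nextID++});
    }

    // Pops one message, fetches its data from the category's hash by messageID,
    // serialises it (as text here) and drops the stored record.
    // Returns an empty string when the queue is empty.
    std::string popAndSerialise() {
        if (messageQueue.empty()) return {};
        const Message m = messageQueue.front();
        messageQueue.pop();
        std::string out = std::to_string(static_cast<int>(m.messageCategory));
        if (m.messageCategory == Category::Update) {
            const UpdateMessageData& d = updateData.at(m.messageID);
            out += ":" + std::to_string(d.x) + "," + std::to_string(d.y);
            updateData.erase(m.messageID);
        } else {
            timeSyncData.erase(m.messageID);
        }
        return out;
    }
};
```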

Related

Using different message types in one gRPC request

I need to get some data from a server using gRPC. The data can have different types, but they are semantically related; you can think of it as data which can have type A, B, or C. I was wondering what the proper method of transferring this data to the client is. I can think of three different methods:
Using single message with oneof:
In this method, I just define a single message as the following:
message MyData {
oneof test_oneof {
DataA a = 1;
DataB b = 2;
DataC c = 3;
}
}
Now I just add a single rpc method which can return different types of data. The problem with this method is extensibility: I will not be able to update my message by adding a new message type such as DataD to this oneof, as stated here (or at least that is how I understood it, though I think oneof would be rather useless with this limitation).
Using single message with Any:
In this method, I just define a single message as the following:
import "google/protobuf/any.proto";
message MyData {
google.protobuf.Any data = 1;
}
Here, I need a single rpc method. This method is also extensible, but the drawback is that Any types are still under development. In addition, the code will be more complex because of the required reflection code.
Using multiple rpc calls:
In this method, I define a separate rpc call for each data type. This method is extensible too, but it creates a huge amount of similar code, and I personally don't like it at all. Since these data are semantically related, I think they should be transferred using a single rpc call.
What do you think as the best method for transferring this data using gRPC and protobuf?

It's hard to tell which one is best because each has its own pros and cons. In general, however, having multiple rpc calls in this case is recommended, because it's clear to clients how to use them and they are easy to extend later. You can still combine some of them into one rpc with optional parameters, though. Once you have a separate RPC for each message, you might be able to write a common handler to avoid repeating code.

Best way of serializing/deserializing a simple protocol in C++

I want to build a simple application protocol using Berkeley sockets in C++ on Linux. The transport layer should be UDP, and the protocol messages will contain the following two parts:
The first part:
It is a fixed part which is the protocol Header with the following fields:
1. int HeaderType
2. int TransactionID
3. unsigned char Source[4]
4. unsigned char Destination[4]
5. int numberoftlvs
The second part
It will contain variable number of TLVs, each TLV will contain the following fields:
1. int type
2. int length
3. unsigned char *data "Variable length"
My question: when preparing the message to be sent over the wire, what's the best way to do serialization and deserialization so that it is portable across all systems, both little-endian and big-endian?
Should I prepare a big buffer of "unsigned char", and start copying the fields one by one to it? And after that, just call send command?
If I follow that approach, how can I keep track of the pointer to where to copy my fields? My guess would be to build a function for each data type that knows how many bytes to move the pointer. Is that correct?
If someone can provide a well-explained example, it would be much appreciated.

some ideas... in no particular order... and probably not making sense all together
You can have a Buffer class. This class contains a raw memory pointer where you compose your message, and it can contain counters or pointers to keep track of how much you have written, where you're writing and how far you can go.
Probably you would like to have one instance of the Buffer class for each thread reading/writing. No more, because you don't want to have expensive buffers like this around. Bound to a specific thread because you don't want to share them without locking (and locking is expensive)
Probably you would like to reuse a Buffer from one message to the next, avoiding the cost of creating and destroying it.
You might want to explore the idea of a Decorator: a class that inherits from or contains each of your data classes. In this case, the idea is for the decorator to contain the methods to serialize and deserialize each of your data types.
One option for this is to make the Decorator a template and use class template specialization to provide the different formats.
Combining the Decorator methods and Buffer methods you should have all the control you need.
You can have magical templated methods in the Buffer class that take an arbitrary object as a parameter, automatically create a Decorator for it, and serialize it.
Conversely, deserializing should get you a Decorator that can be converted to the decorated type.
I'm sorry I don't have the time right now to give you a full blown example, but I hope the above ideas can get you started.
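Below is a sketch of the Buffer idea applied to the header-plus-TLV layout from the question. Integers are written as 32-bit big-endian values using shifts, so the bytes on the wire are the same regardless of host endianness, and the read cursor tracks its own position and throws instead of reading past the end. The class and function names, and the use of unsigned 32-bit fields for the question's int fields, are assumptions; numberoftlvs and each length are derived from the containers rather than stored separately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Write side: integers go out as 32-bit big-endian via shifts, independent of host byte order.
class WriteBuffer {
public:
    void put_u32(std::uint32_t v) {
        for (int shift = 24; shift >= 0; shift -= 8)
            bytes_.push_back(static_cast<std::uint8_t>(v >> shift));
    }
    void put_bytes(const std::uint8_t* p, std::size_t n) { bytes_.insert(bytes_.end(), p, p + n); }
    const std::vector<std::uint8_t>& bytes() const { return bytes_; }
private:
    std::vector<std::uint8_t> bytes_;
};

// Read side: a cursor that tracks its position and refuses to read past the end.
class ReadBuffer {
public:
    ReadBuffer(const std::uint8_t* p, std::size_t n) : p_(p), n_(n) {}
    std::uint32_t get_u32() {
        need(4);
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) v = (v << 8) | p_[pos_++];
        return v;
    }
    void get_bytes(std::uint8_t* out, std::size_t n) {
        need(n);
        for (std::size_t i = 0; i < n; ++i) out[i] = p_[pos_++];
    }
    std::vector<std::uint8_t> get_vec(std::size_t n) {
        need(n); // checked before allocating, so a bogus length cannot force a huge allocation
        std::vector<std::uint8_t> v(p_ + pos_, p_ + pos_ + n);
        pos_ += n;
        return v;
    }
private:
    void need(std::size_t k) const {
        if (n_ - pos_ < k) throw std::runtime_error("truncated message");
    }
    const std::uint8_t* p_;
    std::size_t n_, pos_ = 0;
};

struct Tlv { std::uint32_t type; std::vector<std::uint8_t> data; };

struct Packet {
    std::uint32_t header_type, transaction_id;
    std::uint8_t source[4], destination[4];
    std::vector<Tlv> tlvs; // sent as numberoftlvs followed by the TLVs
};

std::vector<std::uint8_t> serialize(const Packet& p) {
    WriteBuffer w;
    w.put_u32(p.header_type);
    w.put_u32(p.transaction_id);
    w.put_bytes(p.source, 4);
    w.put_bytes(p.destination, 4);
    w.put_u32(static_cast<std::uint32_t>(p.tlvs.size()));
    for (const Tlv& t : p.tlvs) {
        w.put_u32(t.type);
        w.put_u32(static_cast<std::uint32_t>(t.data.size())); // length
        w.put_bytes(t.data.data(), t.data.size());             // value
    }
    return w.bytes();
}

Packet deserialize(const std::vector<std::uint8_t>& buf) {
    ReadBuffer r(buf.data(), buf.size());
    Packet p;
    p.header_type = r.get_u32();
    p.transaction_id = r.get_u32();
    r.get_bytes(p.source, 4);
    r.get_bytes(p.destination, 4);
    const std::uint32_t count = r.get_u32();
    for (std::uint32_t i = 0; i < count; ++i) {
        Tlv t;
        t.type = r.get_u32();
        t.data = r.get_vec(r.get_u32());
        p.tlvs.push_back(std::move(t));
    }
    return p;
}
```

The resulting byte vector can be handed straight to sendto(), and the received datagram passed to deserialize().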

As an example, I shamelessly plug my own (de)serialization library, which packs into the msgpack v5 format:
flurry

c++ serialize std::error_code for transporting over network or saving to disk?

I want to serialize a std::error_code, transport it over the network and deserialize it again. Is there any way to do this, or will I need a translation table (switch-case) which maps integer values to/from a std::error_code?
int encode_error(const std::error_code& ec);
std::error_code decode_error(int value);
Thanks.

You need to use std::error_condition, which is a portable error code.

A std::error_code is a combination of two things:
An integer value (usually mapped from an enum of some sort); and
A pointer to the category that generated them.
To serialize them in the general case is very difficult, as you would need to know all of the possible categories that could be used, and then transmit which category is appropriate along with the value. The receiver would then need to look up the matching category locally and use that to create the error code from that.
A simpler case would be to get the message, category name, and error value, and send that. It would be impractical to translate that back into an error_code, but depending on your use case, it might be sufficient.
Serializing error_codes from a custom category that you control is much simpler, since you eliminate the need to determine which category the error came from, so it should be trivial to serialize them.
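A sketch of the category-plus-value approach, assuming both ends share a list of known categories and identify them by name(). The standard specifies the names "generic" and "system" for std::generic_category() and std::system_category(); a custom category you control would simply be added to the list. The "name:value" encoding and these function signatures are my own choice, not the ones from the question. Note that the integer value only round-trips meaningfully if both ends use the same numbering (errno values, for instance, can differ between platforms).

```cpp
#include <cassert>
#include <string>
#include <system_error>
#include <vector>

// Categories both ends agree on; add any custom category you control here.
const std::vector<const std::error_category*>& known_categories() {
    static const std::vector<const std::error_category*> cats{
        &std::generic_category(), &std::system_category()};
    return cats;
}

// Wire form: "<category name>:<value>".
std::string encode_error(const std::error_code& ec) {
    return std::string(ec.category().name()) + ":" + std::to_string(ec.value());
}

// Looks the category up by name; returns false if it is unknown or the text is malformed.
// std::stoi throws if the value part is not a number.
bool decode_error(const std::string& s, std::error_code& out) {
    const auto colon = s.rfind(':');
    if (colon == std::string::npos) return false;
    const std::string name = s.substr(0, colon);
    const int value = std::stoi(s.substr(colon + 1));
    for (const std::error_category* cat : known_categories()) {
        if (name == cat->name()) {
            out = std::error_code(value, *cat);
            return true;
        }
    }
    return false;
}
```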

Object-oriented networking

I've written a number of networking systems and have a good idea of how networking works. However I always end up having a packet receive function which is a giant switch statement. This is beginning to get to me. I'd far rather a nice elegant object-oriented way to handle receiving packets but every time I try to come up with a good solution I always end up coming up short.
For example, let's say you have a network server. It is simply waiting there for responses. A packet comes in, and the server needs to validate the packet and then decide how to handle it.
At the moment I have been doing this by switching on the packet id in the header and then having a huge bunch of function calls that handle each packet type. With complicated networking systems this results in a monolithic switch statement and I really don't like handling it this way. One way I've considered is to use a map of handler classes. I can then pass the packet to the relevant class and handle the incoming data. The problem I have with this is that I need some way to "register" each packet handler with the map. This means, generally, I need to create a static copy of the class and then in the constructor register it with the central packet handler. While this works it really seems like an inelegant and fiddly way of handling it.
Edit: Equally, it would be ideal to have a nice system that works both ways, i.e. a class structure that easily handles sending the same packet types as receiving them (through different functions, obviously).
Can anyone point me towards a better way to handle incoming packets? Links and useful information are much appreciated!
Apologies if I haven't described my problem well as my inability to describe it well is also the reason I've never managed to come up with a solution.

About the way to handle the packet type: for me the map is the best. However I'd use a plain array (or a vector) instead of a map. It would make access time constant if you enumerate your packet types sequentially from 0.
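A sketch of the plain-vector variant, assuming packet ids are numbered sequentially from 0 and handlers are registered explicitly in one place at startup rather than through static instances. The names Dispatcher, on and dispatch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Packet ids numbered sequentially from 0 (names are illustrative).
enum PacketId : std::uint16_t { PING = 0, CHAT = 1, MOVE = 2, PACKET_ID_COUNT };

struct Packet {
    std::uint16_t id;
    std::string payload;
};

class Dispatcher {
public:
    using Handler = std::function<void(const Packet&)>;
    Dispatcher() : handlers_(PACKET_ID_COUNT) {}

    // Explicit registration, e.g. from a single setup function.
    void on(PacketId id, Handler h) { handlers_[id] = std::move(h); }

    // Constant-time lookup by index; false for out-of-range or unregistered ids.
    bool dispatch(const Packet& p) const {
        if (p.id >= handlers_.size() || !handlers_[p.id]) return false;
        handlers_[p.id](p);
        return true;
    }

private:
    std::vector<Handler> handlers_;
};
```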
As to the class structure: there are libraries that already do this job (see "Available Game network protocol definition languages and code generation"). E.g. Google's Protocol Buffers look promising. It generates a storage class with getters, setters, serialization and deserialization routines for every message in the protocol description. The protocol description language looks reasonably rich.

A map of handler instances is pretty much the best way to handle it. Nothing inelegant about it.

In my experience, table driven parsing is the most efficient method.
Although std::map is nice, I end up using static tables. A std::map cannot be statically initialized as a constant table; it must be loaded at run time. Tables (arrays of structures) can be declared as data and initialized at compile time. I have not encountered tables big enough for a linear search to be a bottleneck. Usually the table is small enough that the overhead of a binary search makes it slower than a linear search.
For high performance, I'll use the message data as an index into the table.
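A sketch of such a compile-time table with a linear search. The entries, ids and handler functions are illustrative; because the table is a constexpr array, it is built at compile time rather than loaded at run time as a std::map would be.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using HandlerFn = std::string (*)(const std::string& payload);

std::string handle_ping(const std::string&) { return "pong"; }
std::string handle_echo(const std::string& payload) { return payload; }

struct HandlerEntry {
    std::uint8_t id;
    const char* name;
    HandlerFn fn;
};

// Constant table, initialized at compile time.
constexpr HandlerEntry kHandlers[] = {
    {0x01, "ping", &handle_ping},
    {0x02, "echo", &handle_echo},
};

// Plain linear search over the small table; nullptr if the id is unknown.
const HandlerEntry* find_handler(std::uint8_t id) {
    for (const HandlerEntry& e : kHandlers)
        if (e.id == id) return &e;
    return nullptr;
}
```

If the ids are dense and start at 0, the search can be replaced by indexing kHandlers directly with the id taken from the message, after a bounds check.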

When you are doing OOP, you try to represent everything as an object, right? So your protocol messages become objects too; you'll probably have a base class YourProtocolMessageBase which encapsulates any message's behavior and from which you inherit your polymorphically specialized messages. Then you just need a way to turn every message (i.e. every YourProtocolMessageBase instance) into a string of bytes, and a way to do the reverse. Such methods are called serialization techniques; some metaprogramming-based implementations exist.
Quick example in Python:
from socket import *
sock = socket(AF_INET6, SOCK_STREAM)
sock.bind(("localhost", 1234))
sock.listen(1)
rsock, addr = sock.accept()
The server blocks in accept(); fire up another instance for the client:
from socket import *
clientsock = socket(AF_INET6, SOCK_STREAM)
clientsock.connect(("localhost", 1234))
Now use Python's built-in serialization module, pickle; client:
import pickle
obj = {1: "test", 2: 138, 3: ("foo", "bar")}
clientsock.send(pickle.dumps(obj))
Server:
>>> import pickle
>>> r = pickle.loads(rsock.recv(1000))
>>> r
{1: 'test', 2: 138, 3: ('foo', 'bar')}
So, as you can see, I just sent a Python object over the loopback interface. Isn't this OOP?
I think the only possible alternative to serializing is maintaining the bimap IDs ⇔ classes. This looks really inevitable.

You want to keep using the same packet network protocol, but translate it into an object in your program, right?
There are several protocols that allow you to treat data as programming objects, but it seems you don't want to change the protocol, just the way it's treated in your application.
Do the packets come with something like a "tag", metadata, an "id" or a "data type" that lets you map them to a specific object class? If they do, you can create an array that stores each id and the matching class, and use it to generate the object.

A more OO way to handle this is to build a state machine using the state pattern.
Handling incoming raw data is parsing, where state machines provide an elegant solution (you will have to choose between elegance and performance).
You have a data buffer to process; each state has a handle-buffer method that parses and processes its part of the buffer (if already possible) and sets the next state based on the content.
If you want to go for performance, you still can use a state machine, but leave out the OO part.
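A sketch of the state pattern applied to a byte stream, assuming a simple framing chosen only for illustration: a 2-byte big-endian length followed by that many body bytes. Each state consumes what it can from the current chunk; the parser swaps in the next state only after handle() returns, so a state never destroys itself mid-call. All class names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Parser;

// Abstract state: consumes bytes from the current chunk and may request a transition.
class State {
public:
    virtual ~State() = default;
    // Returns how many of the n bytes were consumed (at least 1 when n > 0).
    virtual std::size_t handle(Parser& parser, const std::uint8_t* data, std::size_t n) = 0;
};

class Parser {
public:
    Parser();
    // Feed any chunk (e.g. one recv() result); messages may span several chunks.
    void feed(const std::uint8_t* data, std::size_t n) {
        while (n > 0) {
            const std::size_t used = state_->handle(*this, data, n);
            data += used;
            n -= used;
            if (next_) state_ = std::move(next_); // switch only after handle() returns
        }
    }
    void transition(std::unique_ptr<State> s) { next_ = std::move(s); }
    void emit(std::string body) { messages.push_back(std::move(body)); }
    std::vector<std::string> messages; // completed message bodies

private:
    std::unique_ptr<State> state_, next_;
};

// Collects the 2-byte big-endian length prefix.
class ReadLength : public State {
public:
    std::size_t handle(Parser& parser, const std::uint8_t* data, std::size_t n) override;

private:
    std::uint8_t buf_[2] = {0, 0};
    std::size_t have_ = 0;
};

// Collects exactly len_ body bytes, then emits the message and goes back to ReadLength.
class ReadBody : public State {
public:
    explicit ReadBody(std::size_t len) : len_(len) {}
    std::size_t handle(Parser& parser, const std::uint8_t* data, std::size_t n) override {
        const std::size_t take = std::min(n, len_ - body_.size());
        body_.append(reinterpret_cast<const char*>(data), take);
        if (body_.size() == len_) {
            parser.emit(body_);
            parser.transition(std::make_unique<ReadLength>());
        }
        return take;
    }

private:
    std::size_t len_;
    std::string body_;
};

std::size_t ReadLength::handle(Parser& parser, const std::uint8_t* data, std::size_t n) {
    const std::size_t take = std::min(n, std::size_t{2} - have_);
    for (std::size_t i = 0; i < take; ++i) buf_[have_++] = data[i];
    if (have_ == 2) {
        const std::size_t len = (std::size_t{buf_[0]} << 8) | buf_[1];
        if (len == 0) {
            parser.emit(""); // empty message: stay in this state for the next prefix
            have_ = 0;
        } else {
            parser.transition(std::make_unique<ReadBody>(len));
        }
    }
    return take;
}

Parser::Parser() : state_(std::make_unique<ReadLength>()) {}
```

Because each state keeps its own partial data, a message split across several recv() calls is reassembled without extra bookkeeping in the caller.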

I would use Flatbuffers and/or Cap’n Proto code generators.

I solved this problem as part of my BTech in network security and network programming, and I can assure you it's not one giant packet switch statement. The library is called cross platform networking, and I modeled it around the OSI model and how to output it as a simple object serialization. The repository is here: https://bitbucket.org/ptroen/crossplatformnetwork/src/master/
There are countless protocols, like NACK, HTTP, TCP, UDP, RTP and multicast, and they are all invoked via C++ metatemplates. OK, that is the summarized answer; now let me dive a bit deeper and explain how you solve this problem and why this library can help you, whether you design it yourself or use the library.
First, let's talk about design patterns in general. To keep things nicely organized, you first need some design patterns to frame your problem. For my C++ templates I framed it initially around the OSI model (https://en.wikipedia.org/wiki/OSI_model#Layer_7:_Application_layer) down to the transport level (which becomes sockets at that point). To recap OSI:
Application Layer: What it means to the end user, i.e. signals getting deserialized or serialized and passed down or up from the networking stack
Presentation: Data independence from application and network stack
Session: dialogues between sessions
Transport: transporting the packets
But here's the kicker: when you look at these closely, they aren't design patterns but more like namespaces around transporting from A to B. So for the end user, I designed cross platform network with the following standardized C++ metatemplate:
template <class TFlyWeightServerIncoming, // a class representing the servers incoming payload. Note a flyweight is a design pattern that's a union of types ie putting things together. This is where you pack your incoming objects
class TFlyWeightServerOutgoing, // a class representing the servers outgoing payload of different types
class TServerSession, // a hook class that represent how to translate the payload in the form of a session layer translation. Key is to stay true to separation of concerns(https://en.wikipedia.org/wiki/Separation_of_concerns)
class TInitializationParameters> // a class representing initialization of the server(ie ports ,etc..)
two examples: https://bitbucket.org/ptroen/crossplatformnetwork/src/master/OSI/Transport/TCP/TCPTransport.h
https://bitbucket.org/ptroen/crossplatformnetwork/src/master/OSI/Transport/HTTP/HTTPTransport.h
And each protocol can be invoked like this:
OSI::Transport::Interface::ITransportInitializationParameters init_parameters;
const size_t defaultTCPPort = 80;
init_parameters.ParseServerArgs(&(*argv), argc, defaultTCPPort, defaultTCPPort);
OSI::Transport::TCP::TCP_ServerTransport<SampleProtocol::IncomingPayload<OSI::Transport::Interface::ITransportInitializationParameters>, SampleProtocol::OutgoingPayload<OSI::Transport::Interface::ITransportInitializationParameters>, SampleProtocol::SampleProtocolServerSession<OSI::Transport::Interface::ITransportInitializationParameters>, OSI::Transport::Interface::ITransportInitializationParameters> tcpTransport(init_parameters);
tcpTransport.RunServer();
citation:
https://bitbucket.org/ptroen/crossplatformnetwork/src/master/OSI/Application/Stub/TCPServer/main.cc
I also have in the code base under MVC a full MVC implementation that builds on top of this but let's get back to your question. You mentioned:
"At the moment I have been doing this by switching on the packet id in the header and then having a huge bunch of function calls that handle each packet type."
" With complicated networking systems this results in a monolithic switch statement and I really don't like handling it this way. One way I've considered is to use a map of handler classes. I can then pass the packet to the relevant class and handle the incoming data. The problem I have with this is that I need some way to "register" each packet handler with the map. This means, generally, I need to create a static copy of the class and then in the constructor register it with the central packet handler. While this works it really seems like an inelegant and fiddly way of handling it."
In cross platform network the approach to adding new types is as follows:
After you've defined the server type, you just need to make the incoming and outgoing types. The actual mechanism for handling them is embedded within the incoming object type. The methods within it are ToString(), FromString(), size() and max_size(). These deal with the security concerns of keeping the layers below the application layer secure. But since you're now defining object handlers, you need to write the translation code for the different object types. At minimum, you'll need within this object:
1. A list of enumerated object types for the application layer. This could be as simple as numbering them. For things like the session layer, have a look at session-layer concerns (for instance, RTP has to deal with jitter and imperfect connections). You could also switch from an enumeration to a hash/map, but that's just another way of solving the problem of how to look up the type.
2. Serialization and deserialization of the object (for both incoming and outgoing types).
3. After you serialize or deserialize, logic to dispatch the object to the appropriate internal design pattern that handles the application layer. This could be a builder, command or strategy; it really depends on the use case. In cross platform network, some concerns are delegated to the TServerSession layer and others to the incoming and outgoing classes. It just depends on the separation of concerns.
4. Handling of performance concerns, i.e. making sure it doesn't block (which becomes a bigger concern as you scale up concurrent users).
5. Handling of security concerns (pen testing).
If you're curious, you can review my API implementation: it's a single-threaded async Boost reactor, and when you combine it with something like mimalloc (to override new/delete) you can get very good performance. I measured around 50k connections on a single thread easily.
But yeah, it's all about framing your server in good design patterns, separating the concerns and selecting a good model to represent the server design. I believe the OSI model is appropriate for that, which is why I put it into cross platform network to provide superior object-oriented networking.

Google protocol buffers compare

I want to compare two Messages or (two sub parameters) of Google protocol buffers.
I don't find an API to achieve it.
Any ideas?

You can use the class google::protobuf::util::MessageDifferencer for this. I think it's only available since v3.0.2:
Introduced new utility functions/classes in the google/protobuf/util directory:
MessageDifferencer: compare two proto messages and report their differences.
#include <google/protobuf/util/message_differencer.h>
google::protobuf::util::MessageDifferencer::Equals(msg1, msg2);

You can rely on the fact that all of your protobuf messages inherit from the google::protobuf::MessageLite type, which in turn has everything you need to compare any two protobuf messages, regardless of whether they are even of the same derived type:
bool operator==(const google::protobuf::MessageLite& msg_a,
const google::protobuf::MessageLite& msg_b) {
return (msg_a.GetTypeName() == msg_b.GetTypeName()) &&
(msg_a.SerializeAsString() == msg_b.SerializeAsString());
}
EDIT
As was pointed out in the comments below, this answer is incorrect, especially for map fields: map elements have non-deterministic ordering. Use MessageDifferencer if map fields might be present in your messages.

Instead of using message.DebugString you could also do
std::string strMsg;
message.SerializeToString(&strMsg);
with both messages and then compare the two (binary) strings. I didn't test the performance, but I assume it is faster than comparing the human-readable strings returned by .DebugString(). Also, you can do this with the protobuf-lite library (whereas message.DebugString requires the full version).

Well, a protocol buffer is just a serialization format for some object type. Why not use the protocol buffer to reconstruct the original objects, and then allow those objects to compare themselves, using whatever comparison logic you've built into the class?

This might not be the ideal solution, but I think it could be done by:
messageA.DebugString() == messageB.DebugString();
Other than that, I think the only solution would be to create your own Message child class and implement a bool operator==(const Message&).

You can compare the descriptor's pointer (super fast):
if (mMessages[i]->body()->GetDescriptor() == T::descriptor())
mMessages is a pool of network messages with a header and crypto, which creates a packet with the protobuf body (google::protobuf::Message*).
So, to get the right kind of message, I compare the descriptor's constant pointer, which is the same for every message of a given type (not 100% sure, but I haven't had any problems so far).
That would be the fastest way to compare protobuf message types without having to use string comparison (by the way, you can also get the type name from the descriptor). :-)

