Object serialization in C++

I would like to serialize/deserialize some structured data in order to send it over the network via a char* buffer.
More precisely, suppose I have a message of type struct Message.
struct Message {
    Header header;
    Address address;
    size_t size; // size of data part
    char* data;
} message;
In C, I would use something such as:
size = sizeof(Header) + sizeof(Address) + sizeof(size_t) + message.size;
memcpy(buffer, (char *) message, size);
to serialize, and
Message m = (Message) buffer;
to deserialize.
What would be the "right" way to do it in C++? Is it better to define a class rather than a struct? Should I overload some operators? Are there alignment issues to consider?
EDIT: Thanks for pointing out the "char *" problem. The provided C version is incorrect: the data section pointed to by the data field should be copied separately.

Actually there are many flavors:
You can let Boost do it for you: http://www.boost.org/doc/libs/1_52_0/libs/serialization/doc/tutorial.html
Overloading the stream operators << for serialization and >> for deserialization works well with file and string streams.
You could specify a constructor Message(const char*) for constructing from a char*.
I am a fan of static methods for deserialization, like:
class Message {
    ...
    static bool deserialize(Message& dest, char* source);
};
since you can catch errors directly when deserializing.
The version you proposed is also OK, provided the modifications suggested in the comments are applied.
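A minimal sketch of the static-method variant, copying each field separately as the EDIT in the question suggests. Header and Address here are hypothetical stand-ins for the question's types, the payload is held in a std::string rather than a raw char*, and the length prefix is assumed to be a 32-bit integer. The fields are written in host byte order with the compiler's padding, so this only round-trips between programs built for the same ABI.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the question's Header and Address types.
struct Header  { std::uint16_t type; std::uint16_t flags; };
struct Address { std::uint32_t host; std::uint16_t port; };

struct Message {
    Header header{};
    Address address{};
    std::string data;  // owns the payload instead of a raw char*

    // Append the wire representation of this message to `out`.
    void serialize(std::vector<char>& out) const {
        const std::uint32_t size = static_cast<std::uint32_t>(data.size());
        append(out, &header, sizeof header);
        append(out, &address, sizeof address);
        append(out, &size, sizeof size);
        append(out, data.data(), data.size());
    }

    // Returns false, leaving `dest` untouched, if the buffer is too short.
    static bool deserialize(Message& dest, const char* source, std::size_t length) {
        constexpr std::size_t fixed =
            sizeof(Header) + sizeof(Address) + sizeof(std::uint32_t);
        if (length < fixed) return false;
        Message tmp;
        std::uint32_t size = 0;
        std::memcpy(&tmp.header, source, sizeof tmp.header);
        source += sizeof tmp.header;
        std::memcpy(&tmp.address, source, sizeof tmp.address);
        source += sizeof tmp.address;
        std::memcpy(&size, source, sizeof size);
        source += sizeof size;
        if (length - fixed < size) return false;  // payload is truncated
        tmp.data.assign(source, size);
        dest = std::move(tmp);
        return true;
    }

private:
    static void append(std::vector<char>& out, const void* p, std::size_t n) {
        const char* c = static_cast<const char*>(p);
        out.insert(out.end(), c, c + n);
    }
};
```

Using memcpy per field avoids casting the buffer to a Message*, so the alignment of the char buffer does not matter. A portable wire format would additionally fix the byte order and field widths.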

Why not insert a virtual 'NetworkSerializable' class into your inheritance tree? A 'void NetSend(fd socket)' method would send stuff (without exposing any private data), and 'int(bufferClass buffer)' could return -1 if no complete, valid message was deserialized, or, if a valid message has been assembled, the number of unused chars in 'buffer'.
That encapsulates all the assembly/disassembly protocol state vars and other gunge inside the class, where it belongs. It also allows messages to be assembled from multiple stream input buffers.
I'm not a fan of static methods. Protocol state data associated with deserialization should be per-instance, for thread safety.

Related

How to access serialized data of Cap'n'Proto?

I'm working with Cap'n'Proto and my understanding is that there is no need to do serialization, as it's already being done. So my question is: how would I access the serialized data and get its size, so that I can pass it as a byte array to another library?
// person.capnp
struct Person {
    name @0 :Text;
    age @1 :Int16;
}
// ...
::capnp::MallocMessageBuilder message;
Person::Builder person = message.initRoot<Person>();
person.setName("me");
person.setAge(20);
// at this point, how do I get some sort of handle to
// the serialized data of 'person' as well as its size?
I've seen the writePackedMessageToFd(fd, message); call, but didn't quite understand what was being passed and couldn't find any API docs on it. I also wasn't trying to write to a file descriptor as I need the serialized data returned as const void*.
Looking in Cap'n Proto's message.h file, I found this function in the base class of MallocMessageBuilder; its comment says it gets the raw data making up the message.
kj::ArrayPtr<const kj::ArrayPtr<const word>> getSegmentsForOutput();
// Get the raw data that makes up the message.
But even then, I'm not sure how to get it as const void*.
Thoughts?
::capnp::MallocMessageBuilder message;
is your binary message, and its size is
message.sizeInWords()
(size in bytes divided by 8).
This appears to be what's needed.
// ...
::capnp::MallocMessageBuilder message;
Person::Builder person = message.initRoot<Person>();
person.setName("me");
person.setAge(20);
kj::Array<capnp::word> dataArr = capnp::messageToFlatArray(message);
kj::ArrayPtr<kj::byte> bytes = dataArr.asBytes();
std::string data(bytes.begin(), bytes.end());
const void* dataPtr = data.c_str();
At this point, I have a const void* dataPtr and size using data.size().

Using stream to treat received data

I am receiving messages from a socket.
Each message is framed by a header (basically the size of the message) and a footer containing a CRC (a checksum used to detect whether the message was corrupted).
So, the layout is something like:
size (2 bytes) | message (240 bytes) | crc (4 bytes)
I wrote an operator>> for it, as follows:
std::istream &operator>>(std::istream &stream, Message &msg) {
    // Read the 2-byte size header as raw bytes (formatted >> would parse text)
    std::int16_t size;
    stream.read(reinterpret_cast<char*>(&size), sizeof size);
    // Not enough data received yet for the payload and the 4-byte crc
    if (!stream || stream.rdbuf()->in_avail() < size + 4) {
        stream.setstate(std::ios_base::failbit);
        return stream;
    }
    stream.read(reinterpret_cast<char*>(&msg), size);
    std::int32_t gotCrc;
    stream.read(reinterpret_cast<char*>(&gotCrc), sizeof gotCrc);
    // Data not received correctly
    if (gotCrc != computeCrc(msg)) {
        stream.setstate(std::ios_base::failbit);
    }
    return stream;
}
The message can arrive byte by byte, or all at once. We can even receive several messages at once.
Basically, what I did is something like this :
struct MessageReceiver {
    std::string totalDataReceived;
    void messageArrived(std::string data) {
        // Append the new data to totalDataReceived
        totalDataReceived += data;
        std::stringbuf buf(totalDataReceived);
        std::istream stream(&buf);
        std::vector<Message> messages(
            std::istream_iterator<Message>(stream),
            std::istream_iterator<Message>{});
        std::for_each(begin(messages), end(messages), processMessage);
        // +4 for the crc and +2 for the size header
        auto sizeToRemove = [](auto init, auto message) { return init + message.size + 4 + 2; };
        // Remove the processed messages
        totalDataReceived.erase(0, std::accumulate(begin(messages), end(messages), 0, sizeToRemove));
    }
};
So basically, we receive data and append it to the buffer of all data received. We stream it, and if we got at least one message, we remove it from totalDataReceived.
However, I am not sure this is a good way to go. In particular, this code does not work when I compute a bad CRC (the message is not created, so we don't iterate over it). So each time, I am going to try to read the message with the bad CRC again...
How can I do this? I cannot keep all the data in totalDataReceived, because I can receive a lot of messages during the program's lifetime.
Should I implement my own streambuf?
What you want to create is a class that acts like a std::istream. You could of course write your own class, but I prefer to implement std::streambuf, for two reasons.
First, people using your class already know how to use it, since it behaves like std::istream if you derive from std::streambuf and std::istream.
Second, you don't need to add extra methods or overload operators; they are already provided at the std::istream level.
To implement std::streambuf, derive from it, override underflow(), and set the get pointers using setg().
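A minimal sketch of such a stream buffer, under the assumption that received bytes are handed over through a hypothetical feed() method (the class name SocketBuffer and feed() are illustrative, not part of any library). underflow() swaps the pending bytes into the get area once the previous chunk has been fully read, so consumed data is discarded instead of accumulating.

```cpp
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical input buffer: the receiver calls feed() with bytes from the
// socket, and a std::istream attached to this buffer reads them back.
class SocketBuffer : public std::streambuf {
public:
    // Queue newly received bytes; they become readable on the next underflow().
    void feed(const std::string& bytes) { pending_ += bytes; }

protected:
    // Called by the stream when the get area is exhausted.
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        if (pending_.empty()) return traits_type::eof();
        // The old chunk is fully consumed: replace it with the pending bytes.
        current_.swap(pending_);
        pending_.clear();
        char* base = &current_[0];
        setg(base, base, base + current_.size());
        return traits_type::to_int_type(*gptr());
    }

private:
    std::string current_;  // chunk currently exposed through the get area
    std::string pending_;  // bytes received but not yet exposed
};
```

When the buffer runs dry the attached std::istream sets eofbit, so the caller has to call clear() on the stream after feeding more data and before reading again.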

C++ NTOH conversion with dispatcher - event queue

We are rewriting our legacy C code in C++. At the core of our system, we have a TCP client, which is connected to a master. The master streams messages continuously. Each socket read yields some number N of messages of the format {type, size, data[0]}.
We don't copy these messages into individual buffers; we just pass a pointer to the beginning of the message, the length, and a shared_ptr to the underlying buffer to the workers.
The legacy C version was single-threaded and would do an in-place NTOH conversion like below:
struct Message {
    uint32_t something1;
    uint16_t something2;
};
void process(char *message)
{
    Message *m = (Message *)message;
    m->something1 = ntohl(m->something1);
    m->something2 = ntohs(m->something2);
}
And then use the Message.
There are a couple of issues with following the same logic in the new code.
1. Since we are dispatching the messages to different workers, each worker doing an ntoh conversion will cause cache-miss issues, as the messages are not cache aligned, i.e. there is no padding between the messages.
2. The same message can be handled by different workers. This is the case when the message needs to be processed locally and also relayed to another process. Here the relay worker needs the message in the original network order and the local worker needs to convert it to host order. Obviously, as the message is not duplicated, both cannot be satisfied.
The solutions that come to my mind are:
1. Duplicate the message and send one copy to all relay workers, if any. Do the ntoh conversion of all messages belonging to the same buffer in the dispatcher itself before dispatching, say by calling handler->ntoh(message);, so that the cache-miss issue is solved.
2. Send each worker the original copy. Each worker copies the message to a local buffer, then does the ntoh conversion and uses it. Here each worker can use a thread-specific (thread_local) static buffer as a scratch pad to copy the message into.
Now my questions are:
1. Is the option 1 way of doing the ntoh conversion idiomatic C++? I mean, the alignment requirement of the structure will be different from that of the char buffer (we haven't had any issue with this yet). Using scheme 2 should be fine in this case, as the scratch buffer can have the alignment of max_align_t and hence should be castable to any structure. But this incurs copying the entire message, which can be quite big (say, a few KB).
2. Is there a better way to handle the situation?
Your primary issue seems to be how to handle messages that come in misaligned. That is, if each message structure doesn't have enough padding on the end of it so that the following message is properly aligned, you can trigger misaligned reads by reinterpreting a pointer to the beginning of a message as an object.
We can get around this in a number of ways; perhaps the simplest would be to ntoh based on a single-byte pointer, which is effectively always aligned.
We can hide the nasty details behind wrapper classes, which will take a pointer to the start of a message and have accessors that will ntoh the appropriate field.
As indicated in the comments, it's a requirement that offsets be determined by a C++ struct, since that's how the message is initially created, and it may not be packed.
First, our ntoh implementation, templated so we can select one by type:
template <typename R>
struct ntoh_impl;

template <>
struct ntoh_impl<uint16_t>
{
    static uint16_t ntoh(uint8_t const *d)
    {
        return (static_cast<uint16_t>(d[0]) << 8) |
               d[1];
    }
};

template <>
struct ntoh_impl<uint32_t>
{
    static uint32_t ntoh(uint8_t const *d)
    {
        return (static_cast<uint32_t>(d[0]) << 24) |
               (static_cast<uint32_t>(d[1]) << 16) |
               (static_cast<uint32_t>(d[2]) << 8) |
               d[3];
    }
};

template <>
struct ntoh_impl<uint64_t>
{
    static uint64_t ntoh(uint8_t const *d)
    {
        return (static_cast<uint64_t>(d[0]) << 56) |
               (static_cast<uint64_t>(d[1]) << 48) |
               (static_cast<uint64_t>(d[2]) << 40) |
               (static_cast<uint64_t>(d[3]) << 32) |
               (static_cast<uint64_t>(d[4]) << 24) |
               (static_cast<uint64_t>(d[5]) << 16) |
               (static_cast<uint64_t>(d[6]) << 8) |
               d[7];
    }
};
Now we'll define a set of nasty macros that will automatically implement accessors for a given name by looking up the member with the matching name in the struct proto (a private struct to each class):
#define MEMBER_TYPE(MEMBER) typename std::decay<decltype(std::declval<proto>().MEMBER)>::type
#define IMPL_GETTER(MEMBER) MEMBER_TYPE(MEMBER) MEMBER() const { return ntoh_impl<MEMBER_TYPE(MEMBER)>::ntoh(data + offsetof(proto, MEMBER)); }
Finally, we have an example implementation of the message structure you have given:
class Message
{
private:
    struct proto
    {
        uint32_t something1;
        uint16_t something2;
    };

public:
    explicit Message(uint8_t const *p) : data(p) {}
    explicit Message(char const *p) : data(reinterpret_cast<uint8_t const *>(p)) {}

    IMPL_GETTER(something1)
    IMPL_GETTER(something2)

private:
    uint8_t const *data;
};
Now Message::something1() and Message::something2() are implemented and will read from the data pointer at the same offsets they wind up being in Message::proto.
Providing the implementation in the header (effectively inline) has the potential to inline the entire ntoh sequence at the call site of each accessor!
This class does not own the data allocation it is constructed from. Presumably you could write a base class if there are ownership-maintaining details here.
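The three specializations above follow one pattern, which can also be written as a single loop. The sketch below uses a hypothetical helper named load_be (not part of the answer's code) to show that byte-wise assembly works at any offset in a buffer, including odd ones, because only single bytes are ever read.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: assemble an unsigned big-endian (network order) value
// one byte at a time, so the source pointer needs no particular alignment.
template <typename T>
T load_be(const unsigned char* p) {
    static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        // Shift in the next most significant byte.
        value = static_cast<T>((value << 8) | p[i]);
    }
    return value;
}
```

For example, with the bytes FF 12 34 56 78 in a buffer, load_be<std::uint32_t>(buffer + 1) yields 0x12345678 on any host, regardless of its endianness. The source buffer is never modified, so a relay worker can still forward the original network-order bytes.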

Accessing the member of a class that is part of a WiFi listener callback member function

I have a WiFi listener registered as a callback (function pointer) with a fixed third-party interface. I used a static member function of my class to register the callback, and that static function then calls a non-static member through a static_cast. The main problem is that I cannot touch the resulting char * buffer with any members of my class, nor can I even change an int flag that is also a member of my class. All of these result in runtime access violations. What can I do? Please see some of my code below. Other problems are described after the code.
void *pt2Object;

TextWiFiCommunication::TextWiFiCommunication()
{
    networkDeviceListen.rawCallback = ReceiveMessage_thunkB;
    /* some other initializing */
}

int TextWiFiCommunication::ReceiveMessage_thunkB(int eventType, NETWORK_DEVICE *networkDevice)
{
    if (eventType == TCP_CLIENT_DATA_READY)
        static_cast<TextWiFiCommunication *>(pt2Object)->ReceiveMessageB(eventType,networkDevice);
    return 1;
}

int TextWiFiCommunication::ReceiveMessageB(int eventType, NETWORK_DEVICE *networkDevice)
{
    unsigned char outputBuffer[8];
    // function from an API that reads the WiFi socket for incoming data
    TCP_readData(networkDevice, (char *)outputBuffer, 0, 8);
    std::string tempString((char *)outputBuffer);
    tempString.erase(tempString.size()-8,8); //funny thing happens the outputBuffer is double in size and have no idea why
    if (tempString.compare("facereco") == 0)
        cmdflag = 1;
    return 1;
}
So I can't change the variable cmdflag without an access violation during runtime. I can't declare outputBuffer as a class member because nothing gets written to it so I have to do it within the function. I can't copy the outputBuffer to a string type member of my class. The debugger shows me strlen.asm code. No idea why. How can I get around this? I seem to be imprisoned in this function ReceiveMessageB.
Thanks in advance!
Some other bizarre issues include: even though I request a buffer size of 8, when I take outputBuffer and initialize a string with it, the string has a size of 16.
You are likely getting an access violation because pt2Object does not point to a valid object but to garbage. When is pt2Object initialized? What does it point to?
For this to work, your code should look something like this:
...
TextWiFiCommunication twc;
pt2Object = reinterpret_cast<void*>(&twc);
...
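One way to make sure the pointer is always valid when the thunk runs is to let the object register itself. This is only a sketch under simplifying assumptions: RawCallback and Listener are hypothetical names, the callback signature is reduced to a single int rather than the vendor's real one, and only one live instance is supported.

```cpp
// Hypothetical C-style callback type, standing in for the vendor's rawCallback.
using RawCallback = int (*)(int eventType);

class Listener {
public:
    // Register this object before any callback can fire.
    Listener() { instance_ = this; }
    // Unregister on destruction so the thunk never sees a dangling pointer.
    ~Listener() { if (instance_ == this) instance_ = nullptr; }

    // Static function with the C signature; forwards to the live object.
    static int thunk(int eventType) {
        if (instance_ == nullptr) return 0;  // no live object: ignore the event
        return instance_->onEvent(eventType);
    }

    int lastEvent() const { return lastEvent_; }

private:
    // Non-static handler: free to read and modify members.
    int onEvent(int eventType) {
        lastEvent_ = eventType;
        return 1;
    }

    static Listener* instance_;
    int lastEvent_ = -1;
};

Listener* Listener::instance_ = nullptr;
```

Because the static pointer is typed as Listener*, no cast from void* is needed, and the null check turns a would-be access violation into an ignored event.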
Regarding the string error, TCP_readData is not likely to null-terminate the character array you give it. A C-string ends at the first '\0' (null) character. When you convert the C-string to a std::string, the std::string copies bytes from the C-string pointer until it finds the null terminator. In your case, it happens to find it after 16 characters.
To read up to 8 characters from a TCP byte stream, the buffer should be 9 characters long and all the bytes of the buffer should be initialized to '\0':
...
unsigned char outputBuffer[9] = { 0 };
// function from an API that reads the WiFi socket for incoming data
TCP_readData(networkDevice, (char *)outputBuffer, 0, 8);
std::string tempString((char *)outputBuffer);
...
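An alternative that avoids relying on a terminator at all is to construct the std::string from an explicit length. The sketch below assumes the read call reports how many bytes it wrote; fakeReadData is a hypothetical stand-in for TCP_readData, whose real return value is not shown in the question.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the vendor's TCP_readData: copies up to `maxLen`
// bytes into `out` WITHOUT a null terminator and returns the byte count.
inline int fakeReadData(char* out, int maxLen) {
    static const char payload[] = {'f', 'a', 'c', 'e', 'r', 'e', 'c', 'o'};
    const int n = maxLen < 8 ? maxLen : 8;
    std::memcpy(out, payload, static_cast<std::size_t>(n));
    return n;
}

// Build the string from pointer + length, so no terminator is required.
inline std::string readCommand() {
    char buffer[8];
    const int received = fakeReadData(buffer, static_cast<int>(sizeof buffer));
    if (received <= 0) return std::string();
    return std::string(buffer, static_cast<std::size_t>(received));
}
```

With the length passed explicitly, the string holds exactly the bytes that were received, so the erase() workaround from the question is no longer needed.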

How can I create C-style structs in Clojure?

I am trying to create C-style structs in Clojure, so I can call a poorly documented C++ API from Clojure.
The API is designed to send and receive serialized protobuf messages (the good) preceded by a C Header struct (the bad). The initial handshake is an RPCHandshakeHeader struct and the process is roughly described in the code below:
struct RPCHandshakeHeader {
    char magic[8];
    int version;
    static const char REQUEST_MAGIC[9];
    static const char RESPONSE_MAGIC[9];
};
[...snip...]
const char RPCHandshakeHeader::REQUEST_MAGIC[9] = "FooBar?\n";
[...snip...]
RPCHandshakeHeader header;
memcpy(header.magic, RPCHandshakeHeader::REQUEST_MAGIC, sizeof(header.magic));
header.version = 1;
socket = new CActiveSocket();
socket->Initialize();
socket->Open((const uint8 *)"localhost", 5000);
socket->Send((uint8*)&header, sizeof(header));
[...code to read response...]
How can I do this in Clojure? Do I need to use JNA/JNI?
Is there a way to create a C struct, turn it into binary and send it over a socket? (I think this is what I need to do)
Sounds like a job for gloss! I don't know the details of this part of the API, but you want to look particularly at compile-frame, and repeated for the character strings.