Serialization/deserialization of a struct to a char* in C/C++

I have a struct:
struct Packet {
    int senderId;
    int sequenceNumber;
    char data[MaxDataSize];

    char* Serialize() {
        char *message = new char[MaxMailSize];
        message[0] = senderId;
        message[1] = sequenceNumber;
        for (unsigned i = 0; i < MaxDataSize; i++)
            message[i+2] = data[i];
        return message;
    }

    void Deserialize(char *message) {
        senderId = message[0];
        sequenceNumber = message[1];
        for (unsigned i = 0; i < MaxDataSize; i++)
            data[i] = message[i+2];
    }
};
I need to convert this to a char* (maximum length MaxMailSize > MaxDataSize) for sending over a network, and then deserialize it at the other end.
I can't use tpl or any other library.
Is there any way to make this better? I am not that comfortable with this. Or is this the best we can do?

Since this is to be sent over a network, I strongly advise you to convert those data into network byte order before transmitting, and back into host byte order when receiving. This is because byte ordering is not the same everywhere, and once your bytes are out of order it may become very difficult to reverse them (depending on the programming language used on the receiving side). The byte-ordering functions are defined along with sockets, and are named htons(), htonl(), ntohs() and ntohl() (in those names, h means 'host', i.e. your computer; n means 'network'; s means 'short', a 16-bit value; l means 'long', a 32-bit value).
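For example, a minimal sketch of the round trip for one 32-bit value (the variable names are illustrative):
#include <arpa/inet.h>  /* htonl, ntohl */
#include <stdint.h>     /* uint32_t */

uint32_t host_value = 1024;
uint32_t wire_value = htonl(host_value);  /* host -> network byte order before sending */
/* ... transmit wire_value over the socket ... */
uint32_t received = ntohl(wire_value);    /* network -> host byte order after receiving */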
Beyond that, you are on your own with serialization; C and C++ have no automatic way to perform it. Some software can generate the code for you, such as the ASN.1 implementation asn1c, but such tools are difficult to use because they involve much more than just copying data over the network.

Depending on whether you have enough space or not... you might simply use streams :)
std::string Serialize() {
    std::ostringstream out;
    char version = '1';
    out << version << senderId << '|' << sequenceNumber << '|' << data;
    return out.str();
}

void Deserialize(const std::string& iString)
{
    std::istringstream in(iString);
    char version = 0, check1 = 0, check2 = 0;
    in >> version;
    switch (version)
    {
    case '1':
        in >> senderId >> check1 >> sequenceNumber >> check2 >> data;
        break;
    default:
        // Handle unknown versions
        break;
    }
    // You can check here that 'check1' and 'check2' both equal '|'
}
I readily admit it takes more space... or that it might.
Actually, on a 32-bit architecture an int usually covers 4 bytes (4 chars). Serializing it via streams only takes more than 4 chars if the value exceeds 9999, which usually gives some room.
Also note that you should probably include some guards in your stream, just to check when you get it back that it is intact.
Versioning is probably a good idea; it does not cost much and allows for unplanned later development.

You can have a class representing the object you use in your software, with all the niceties, member functions and whatever else you need. Then you have a 'serialized' struct that is more of a description of what will end up on the network.
To ensure the compiler does exactly what you tell it to, you need to instruct it to 'pack' the structure. The directive used here is for gcc; see your compiler's documentation if you are not using gcc.
The serialize and deserialize routines then just convert between the two, taking care of byte order and details like that.
#include <arpa/inet.h> /* ntohl htonl */
#include <string.h>    /* memcpy */

class Packet {
    int senderId;
    int sequenceNumber;
    char data[MaxDataSize];
public:
    void* Serialize();
    void Deserialize(void *message);
};

/* Wire-format twin of Packet: packed, fields in network byte order. */
struct SerializedPacket {
    int senderId;
    int sequenceNumber;
    char data[MaxDataSize];
} __attribute__((packed));

void* Packet::Serialize() {
    struct SerializedPacket *s = new SerializedPacket();
    s->senderId = htonl(this->senderId);
    s->sequenceNumber = htonl(this->sequenceNumber);
    memcpy(s->data, this->data, MaxDataSize);
    return s;
}

void Packet::Deserialize(void *message) {
    struct SerializedPacket *s = (struct SerializedPacket*)message;
    this->senderId = ntohl(s->senderId);
    this->sequenceNumber = ntohl(s->sequenceNumber);
    memcpy(this->data, s->data, MaxDataSize);
}
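A hedged round-trip sketch of how a caller might use this (the socket calls are illustrative, not part of the answer):
Packet p;
/* ... fill p ... */
void *wire = p.Serialize();
/* send(sock, wire, sizeof(SerializedPacket), 0); */
delete (SerializedPacket*)wire;  /* Serialize() allocates; the caller must free */

char recvBuf[sizeof(SerializedPacket)];
/* recv(sock, recvBuf, sizeof recvBuf, 0); */
Packet q;
q.Deserialize(recvBuf);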

int senderId;
int sequenceNumber;
...
char *message = new char[MaxMailSize];
message[0] = senderId;
message[1] = sequenceNumber;
You're truncating values here. senderId and sequenceNumber are both ints and will take up more than one byte on most architectures. Try something more like this:
char * message = new char[MaxMailSize];
int offset = 0;
memcpy(message + offset, &senderId, sizeof(senderId));
offset += sizeof(senderId);
memcpy(message + offset, &sequenceNumber, sizeof(sequenceNumber));
offset += sizeof(sequenceNumber);
memcpy(message + offset, data, MaxDataSize);
EDIT:
Fixed code written in a stupor. Also, as noted in the comments, any such packet is not portable due to endianness differences.
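For completeness, a hedged sketch of the matching deserialization, reversing the same offsets (not part of the original answer):
int offset = 0;
memcpy(&senderId, message + offset, sizeof(senderId));
offset += sizeof(senderId);
memcpy(&sequenceNumber, message + offset, sizeof(sequenceNumber));
offset += sizeof(sequenceNumber);
memcpy(data, message + offset, MaxDataSize);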

To answer your question generally, C++ has no reflection mechanism, and so manual serialize and unserialize functions defined on a per-class basis are the best you can do. That being said, the serialization function you wrote will mangle your data. Here is a correct implementation:
char * message = new char[MaxMailSize];
int net_senderId = htonl(senderId);
int net_sequenceNumber = htonl(sequenceNumber);
memcpy(message, &net_senderId, sizeof(net_senderId));
memcpy(message + sizeof(net_senderId), &net_sequenceNumber, sizeof(net_sequenceNumber));
memcpy(message + sizeof(net_senderId) + sizeof(net_sequenceNumber), data, MaxDataSize); /* copy the payload too */

As mentioned in other posts, senderId and sequenceNumber are both of type int, which is likely to be larger than char, so these values will be truncated.
If that's acceptable, then the code is OK. If not, then you need to split them into their constituent bytes. Given that the protocol you are using will specify the byte order of multi-byte fields, the most portable, and least ambiguous, way of doing this is through shifting.
For example, let's say that senderId and sequenceNumber are both 2 bytes long, and the protocol requires that the higher byte goes first:
char* Serialize() {
    char *message = new char[MaxMailSize];
    message[0] = senderId >> 8;
    message[1] = senderId;
    message[2] = sequenceNumber >> 8;
    message[3] = sequenceNumber;
    memcpy(&message[4], data, MaxDataSize);
    return message;
}
I'd also recommend replacing the for loop with memcpy (if available), as it's unlikely to be less efficient, and it makes the code shorter.
Finally, this all assumes that char is 8 bits wide. If it isn't, then all the data will need to be masked, e.g.:
message[0] = (senderId >> 8) & 0xFF;
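Under the same two-byte, high-byte-first assumptions, a hedged sketch of the matching Deserialize (not part of the original answer; the & 0xFF masks guard against sign extension when char is signed):
void Deserialize(const char *message) {
    senderId = ((message[0] & 0xFF) << 8) | (message[1] & 0xFF);
    sequenceNumber = ((message[2] & 0xFF) << 8) | (message[3] & 0xFF);
    memcpy(data, &message[4], MaxDataSize);
}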

You can use Protocol Buffers for defining and serializing structs and classes. This is what Google uses internally, and it produces a very compact wire format.
http://code.google.com/apis/protocolbuffers/

Related

Subsetting char array without copying it in C++

I have a long array of char (coming from a raster file via GDAL), all composed of 0s and 1s. To compact the data, I want to convert it to an array of bits (thus dividing the size by 8), 4 bytes at a time, writing the result to a different file. This is what I have come up with so far:
uint32_t bytes2bits(char b[33]) {
    b[32] = 0;
    return strtoul(b, 0, 2);
}

const char data[36] = "00000000000000000000000010000000101"; // 101 is to be ignored
char word[33];
strncpy(word, data, 32);
uint32_t byte = bytes2bits(word);
printf("Data: %u\n", byte); // 128
The code is working, and the result is going to be written to a separate file. What I'd like to know is: can I do that without copying the characters to a new array?
EDIT: I'm using a const variable here just to make a minimal, reproducible example. In my program it's a char *, which is continually changing value inside a loop.
Yes, you can, as long as you can modify the source string (in your example code you can't because it is a constant, but I assume in reality you have the string in writable memory):
uint32_t bytes2bits(const char* b) {
    return strtoul(b, 0, 2);
}

void compress(char* data) {
    // You would need to make sure that the `data` argument always has
    // at least 33 characters in length (the null terminator at the end
    // of the original string counts)
    char temp = data[32];
    data[32] = 0;                     // temporarily terminate after 32 digits
    uint32_t byte = bytes2bits(data);
    data[32] = temp;                  // restore the saved character
    printf("Data: %u\n", byte); // 128
}
In this example, because the char* buffer already stores the long data, it is not necessary to copy each part into a temporary buffer before converting it to a long.
Just step through the buffer in 32-byte periods, temporarily placing a 0 terminator after each 32nd byte.
So your code would look like:
uint32_t bytes2bits(const char* b) {
    return strtoul(b, 0, 2);
}

void compress(char* data) {
    int dataLen = strlen(data);
    int periodLen = 32;
    char* periodStr = data;       // start of the current 32-digit period
    char tmp;
    int periodPos = periodLen;    // index of the byte right after the period
    uint32_t byte;
    while (periodPos < dataLen)
    {
        tmp = data[periodPos];    // save the byte that follows the period
        data[periodPos] = 0;      // temporarily terminate the period
        byte = bytes2bits(periodStr);
        printf("Data: %u\n", byte); // 128
        data[periodPos] = tmp;    // restore it
        periodStr = &data[periodPos];
        periodPos += periodLen;
    }
    if (periodPos - periodLen <= dataLen)
    {
        // the string's own terminator ends the last (possibly short) period
        byte = bytes2bits(periodStr);
        printf("Data: %u\n", byte); // 128
    }
}
Be careful with the last period, which could be shorter than 32 bytes.
const char data[36]
You are in violation of your contract with the compiler if you declare something as const and then modify it.
Generally speaking, the compiler won't let you modify it... so to even try to do so with a const declaration you'd have to cast it (but don't):
char *sneaky_ptr = (char*)data;
sneaky_ptr[0] = 'U'; /* the U is for "undefined behavior" */
See: Can we change the value of an object defined with const through pointers?
So if you wanted to do this, you'd have to be sure the data was legitimately non-const.
The right way to do this in modern C++ is to use std::string to hold your string and std::string_view to process parts of that string without copying them.
You can use string_view with the char array you have, though. It is commonly used to modernize the classical null-terminated const char* string.
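A minimal sketch of that approach, assuming C++17 for std::string_view and std::from_chars (note that strtoul cannot be used here, because a string_view is not null-terminated):
#include <charconv>
#include <cstdint>
#include <cstdio>
#include <string_view>

int main() {
    const char data[36] = "00000000000000000000000010000000101";
    std::string_view word(data, 32);   // views the first 32 digits without copying
    uint32_t byte = 0;
    // from_chars parses a [first, last) range, so no terminator is required
    std::from_chars(word.data(), word.data() + word.size(), byte, 2);
    printf("Data: %u\n", byte);        // 128
}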

memcpy unsigned char to int

I'm trying to get an int value from a file I read. The trick is that I don't know how many bytes the value occupies, so I first read the length octet, then try to read as many data bytes as the length octet tells me. The issue comes when I try to put the data octets into an int variable and eventually print it: if the first data octet is 0, only the one that comes after is copied, so the int I try to read is wrong, as 0x00A2 is not the same as 0xA200. If I use ntohs or ntohl, then 0xA200 is decoded wrongly as 0x00A2, so that does not solve the whole problem. I am using memcpy like this:
memcpy(&dst, (const void *)src, bytes2read)
where dst is an int, src is an unsigned char * and bytes2read is a size_t.
So what am I doing wrong? Thank you!
You cannot use memcpy to portably store bytes in an integer, because the order of bytes is not specified by the standard, not to mention possible padding bits. The portable way is to use bitwise operations and shifts:
unsigned char b, len;
unsigned int val = 0;
fdin >> len;                // read the length octet
if (len > sizeof(val)) {    // ensure the value will fit into val
    // process error: cannot fit in an int variable
    ...
}
while (len-- > 0) {         // store and shift one byte at a time
    val <<= 8;              // shift the previous value to leave room for the new byte
    fdin >> b;              // read it
    val |= b;               // and store it
}
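The same idea as a self-contained function over a byte buffer instead of a stream (a hedged sketch; buf and len are illustrative names, not from the original answer):
#include <stddef.h>  /* size_t */

unsigned int decode_be(const unsigned char *buf, size_t len)
{
    unsigned int val = 0;
    while (len-- > 0)
        val = (val << 8) | *buf++;  /* big-endian: most significant byte first */
    return val;
}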

Ways to interpret groups of bytes in a curl result as other data types

I'm trying to write a program which will query a URL using curl and retrieve a string of bytes. The returned data then needs to be interpreted as various data types: an int followed by a sequence of structures.
The curl write back function must have a prototype of:
size_t write_callback(char *ptr, size_t size, size_t nmemb, void *userdata);
I've seen various examples where the returned data is stored in a buffer either as characters directly in memory or as a string object.
If I have a character array, then I know that I can interpret a portion of it as a structure with code like this:
struct mystruct {
    // define struct
};
char *buffer;
// push some data into the buffer
char *read_position = buffer + 5;
mystruct *test = (mystruct *)read_position;
I have two related questions. Firstly, is there a better way of using curl to retrieve binary data and push it into structures, rather than reading it directly into memory as characters? Secondly, if reading into memory as a character buffer is the way to go, is my code above a sensible way to interpret the chunks of memory as different data types?
Things you need to consider when interpreting raw structures, especially over a network:
The size of your data types;
The endianness of your data types;
Struct padding.
You should only use data types in your structure that are the correct size regardless of what compiler is used. That means for integers, you should use types from <cstdint>.
As for the endianness, you need to know if the data will arrive as big-endian or little-endian. I like to be explicit about it:
template< class T >
const char * ReadLittleEndian32( const char *buf, T & val )
{
    static_assert( sizeof(T) == 4 );
    // Cast each byte through unsigned char to avoid sign extension
    // when plain char is signed.
    val = T((unsigned char)buf[0])
        | T((unsigned char)buf[1]) << 8
        | T((unsigned char)buf[2]) << 16
        | T((unsigned char)buf[3]) << 24;
    return buf + sizeof(T);
}

template< class T >
const char * ReadBigEndian32( const char *buf, T & val )
{
    static_assert( sizeof(T) == 4 );
    val = T((unsigned char)buf[0]) << 24
        | T((unsigned char)buf[1]) << 16
        | T((unsigned char)buf[2]) << 8
        | T((unsigned char)buf[3]);
    return buf + sizeof(T);
}
//etc...
//etc...
Finally, dealing with potential padding differences... I've already been naturally tending towards a 'deserialise' approach where each value is read and translated explicitly. The structure is no different:
struct Foo
{
    uint16_t a;
    int16_t b;
    int32_t c;

    const char * Read( const char * buf );
};

const char * Foo::Read( const char * buf )
{
    buf = ReadLittleEndian16( buf, a );
    buf = ReadLittleEndian16( buf, b );
    buf = ReadLittleEndian32( buf, c );
    return buf;
}
Notice the templating handles sign and other quirks of the data type, so that all we care about in the end is size. Also remember that data types such as float and double have machine-dependent representations; the approach below reads them verbatim, which assumes both ends use the same floating-point format and byte order:
const char * ReadDouble( const char * buf, double & val )
{
    // memcpy avoids the alignment and strict-aliasing problems of
    // dereferencing a cast pointer
    memcpy( &val, buf, sizeof(double) );
    return buf + sizeof(double);
}
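A hedged sketch of how the pieces might fit the question's format, an int count followed by a sequence of structures (parse is a hypothetical name; ReadLittleEndian16 is among the helpers elided by "//etc..." above):
void parse( const char * buf )
{
    uint32_t count;
    buf = ReadLittleEndian32( buf, count );  // the leading int
    for ( uint32_t i = 0; i < count; ++i )
    {
        Foo foo;
        buf = foo.Read( buf );               // each structure advances the cursor
        // ... use foo ...
    }
}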

QDataStream won’t work with custom crafted char array

I have an application which consists of two primary modules. One is written in C and uses the standard C runtime library; the other is written in Qt C++. They communicate with each other via IPC. The C module creates a char array, fills it with data and sends it to the module written in Qt. I want to deserialize the received data using QDataStream, but my efforts haven't yielded any results yet. Here's a simple example of what I'm trying to achieve:
unsigned int pointer = 0;
const int IPC_MSG_LEN = 500;
const int IPC_MSG_HEADER = 200;
const int SOMETHING = 1443;
char api = 55;
char msg[IPC_MSG_LEN] = {0};
memcpy_s(msg, IPC_MSG_LEN, &IPC_MSG_HEADER, sizeof(int));
pointer = sizeof(unsigned int);
memcpy_s(&msg[pointer], IPC_MSG_LEN - pointer, &api, sizeof(char));
++pointer;
memcpy_s(&msg[pointer], IPC_MSG_LEN - pointer, &SOMETHING, sizeof(int));
QByteArray arr(msg, IPC_MSG_LEN);
QDataStream ds(&arr, QIODevice::ReadOnly);
qint32 header = 0, aa = 0;
qint8 t_api = 0;
ds >> header; //Doesn't work
ds >> t_api; //Works
ds >> aa; //Doesn't work
As you can see, the code is pretty simple, but the header and aa variables are deserialized to random numbers. However, t_api (a one-byte variable) has the correct value assigned.
So what's the problem with this code? Does QDataStream use a private data format which is not compatible with the one I'm using? Should I write my own QIODevice implementation, or is there a quick fix I'm not aware of? :)
Thanks, I appreciate your help.
UPDATE
Thank you very much guys, your solution worked perfectly with those primitive data types, but the problem is that I also need to be able to serialize/deserialize char* strings too.
wchar_t* name1 = L"something";
memcpy_s(&msg[pointer], IPC_MSG_LEN - pointer, name1, (wcslen(name1) + 1) * 2);
char* ai = new char[500];
ds >> ai; //ai becomes NULL here :|
Is there a way to achieve that? Thanks again
QDataStream::setByteOrder(QDataStream::LittleEndian);
#include <QDebug>
#include <QByteArray>
#include <QDataStream>
#include <QString>
#include <vector>

template<typename T> void writePtr(char*& dst, T data){
    *reinterpret_cast<T*>(dst) = data;
    dst += sizeof(T);
}

int main(int argc, char** argv){
    const size_t ipcSize = 512;
    std::vector<char> buffer(ipcSize, 0);
    quint32 sendVal1 = 0x12345678, recvVal1 = 0;
    quint8 sendVal2 = 0xee, recvVal2 = 0;
    quint32 sendVal3 = 0x9999abcd, recvVal3 = 0;
    char* dst = &buffer[0];
    writePtr(dst, sendVal1);
    writePtr(dst, sendVal2);
    writePtr(dst, sendVal3);
    // Pass the size explicitly -- the single-argument QByteArray
    // constructor would stop at the first zero byte.
    QByteArray byteArray(&buffer[0], int(buffer.size()));
    QDataStream stream(&byteArray, QIODevice::ReadOnly);
    stream.setByteOrder(QDataStream::LittleEndian);
    stream >> recvVal1 >> recvVal2 >> recvVal3;
    qDebug() << QString(QObject::tr("sent: %1, received: %2")).arg(sendVal1, 8, 16).arg(recvVal1, 8, 16);
    qDebug() << QString(QObject::tr("sent: %1, received: %2")).arg(sendVal2, 2, 16).arg(recvVal2, 2, 16);
    qDebug() << QString(QObject::tr("sent: %1, received: %2")).arg(sendVal3, 8, 16).arg(recvVal3, 8, 16);
    return 0;
}
but the problem is that I also need to be able to serialize/deserialize char* strings too.
Qt data serialization format is explained (in detail) here. You MUST read that document if you want to use QDataStream for IPC. Qt has nice documentation, so use it.
Also, this is not a char* string:
wchar_t* name1 = L"something";
It is a wchar_t* string.
wchar_t has a different size depending on the compiler (either 4 or 2 bytes per wchar_t), which means problems for IPC. Unlike wchar_t, char is guaranteed to be 1 byte big.
So either encode the entire string as UTF-8 (or use an 8-bit encoding with a known codepage) and write it as raw data in a QByteArray-compatible format:
void writeDataPtr(char*& ptr, const char* data, quint32 size){
    if (!data){
        size = 0xffffffff;    // QDataStream encodes a null QByteArray this way
        writePtr(ptr, size);
        return;
    }
    writePtr(ptr, size);      // length prefix first, then the raw bytes
    memcpy(ptr, data, size);
    ptr += size;
}
Then use QString::fromUtf8 to decode it (or QTextCodec, if you decided to use another 8-bit encoding instead of UTF-8). Or, if you can ensure that your wchar_t* string is UTF-16-compliant and sizeof(wchar_t) == 2, dump it in a QString-compatible format.
By the way, if I were you I'd avoid memcpy_s. It is not part of the C++ standard, which is a very good reason to avoid it.
I want is to read wchar_t*/char* from QDataStream until stream position gets to null terminating character.
If this is homework, tag your post accordingly.
One of these should work:
QString readWcharString(QDataStream& stream){
    QVector<ushort> rawData;
    ushort tmp;
    do{
        stream >> tmp;
        rawData.push_back(tmp);
    }while(tmp);
    return QString::fromUtf16(rawData.data());
}
or
QString readWcharString(QDataStream& stream){
    QVector<wchar_t> rawData;
    ushort tmp;
    do{
        stream >> tmp;
        rawData.push_back(tmp);
    }while(tmp);
    return QString::fromWCharArray(rawData.data());
}
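A hedged usage sketch, mirroring the stream setup from the question (arr is the QByteArray from the example above):
QDataStream ds(&arr, QIODevice::ReadOnly);
ds.setByteOrder(QDataStream::LittleEndian);
QString name = readWcharString(ds);  // consumes up to and including the 16-bit terminator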
QDataStream stores numbers in big-endian format by default.
You can change that with:
ds.setByteOrder(QDataStream::ByteOrder(QSysInfo::ByteOrder));
which will use the detected host endianness instead.

Deciphering unsigned char*

I have a process that listens to a UDP multicast broadcast and reads in the data as an unsigned char*.
I have a specification that indicates fields within this unsigned char*.
Fields are defined in the specification with a type and size.
Types are: uInt32, uInt64, unsigned int, and single byte string.
For the single byte string I can merely access the offset of the field in the unsigned char* and cast to a char, such as:
char character = (char)(data[1]);
For a single-byte uint32 I've been doing the following, which also seems to work:
uint32_t integer = (uint32_t)(data[20]);
However, for multi-byte conversions I seem to be stuck.
How would I convert several bytes in a row (a substring of data) to the corresponding data type?
Also, is it safe to wrap data in a string (to use its substring functionality)? I am worried about losing information, since I'd have to cast unsigned char* to char*, like:
std::string wrapper((char*)(data),length); //Is this safe?
I tried something like this:
std::string wrapper((char*)(data),length); //Is this safe?
uint32_t integer = (uint32_t)(wrapper.substr(20,4).c_str()); //4 byte int
But it doesn't work.
Thoughts?
Update
I've tried the suggested bit shift:
void function(const unsigned char* data, size_t data_len)
{
    // From the specification: Field type: uInt32  Byte length: 4
    // All integer fields are big endian.
    uint32_t integer = (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | (data[3]);
}
This sadly gives me garbage (the same number for every call, from a callback).
I think you should be very explicit, and not just do "clever" tricks with casts and pointers. Instead, write a function like this:
uint32_t read_uint32_t(unsigned char **data)
{
    const unsigned char *get = *data;
    *data += 4;
    return (get[0] << 24) | (get[1] << 16) | (get[2] << 8) | get[3];
}
This extracts a single uint32_t value from a buffer of unsigned char, and increases the buffer pointer to point at the next byte of data in the buffer.
This assumes big-endian data; you need a well-defined idea of the buffer's endianness in order to interpret it.
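A hedged usage sketch (the two consecutive fields are illustrative):
unsigned char *cursor = data;               /* 'data' as in the question */
uint32_t first  = read_uint32_t(&cursor);   /* consumes bytes 0..3 */
uint32_t second = read_uint32_t(&cursor);   /* consumes bytes 4..7; cursor now at byte 8 */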
Depends on the byte ordering of the protocol; for big-endian, or so-called network byte order, do:
uint32_t i = data[0] << 24 | data[1] << 16 | data[2] << 8 | data[3];
Without commenting on whether it's a good idea or not, the reason it doesn't work for you is that the result of wrapper.substr(20,4).c_str() is a pointer (a uint32_t * after your cast), not a uint32_t. So if you do:
uint32_t * integer = (uint32_t *)(wrapper.substr(20,4).c_str());
it should work.
uint32_t integer = ntohl(*reinterpret_cast<const uint32_t*>(data + 20));
or (handles alignment issues):
uint32_t integer;
memcpy(&integer, data+20, sizeof integer);
integer = ntohl(integer);
The pointer way:
uint32_t n = *(uint32_t*)&data[20];
You will run into problems on different endian architectures though. The solution with bit shifts is better and consistent.
std::string wrapper((char*)(data),length); //Is this safe?
This should be safe since you specified the length of the data.
On the other hand if you did this:
std::string wrapper((char*)data);
The string length would be determined by wherever the first 0 byte occurs, and you would more than likely chop off some data.