The Standard Way To Encode/Decode To/From a Binary Object In C++

I want to encode and decode some basic types to and from binary.
The test code might look like this:
#include <cstdint>
#include <cstring>

int main()
{
    int iter = 0;
    char* binary = new char[100];
    int32_t version = 1;
    memcpy(binary, &version, sizeof(int32_t));
    iter += sizeof(int32_t);
    const char* value1 = "myvalue";
    memcpy(binary + iter, value1, strlen(value1));
    iter += strlen(value1);
    double value2 = 0.1;
    memcpy(binary + iter, &value2, sizeof(double));
    #warning TODO - big/little endian - fixed type length
    delete[] binary;
    return 0;
}
But I still need to solve several problems, such as endianness and fixed type lengths.
So I want to know whether there is a standard way to implement this.
At the same time, I don't want to use any third-party implementation such as Boost, because I need to keep my code simple and independent.
Something like NSCoding in Objective-C would be ideal; I wonder if there is an equivalent in the C++ standard library.

No, there are no serialization functions in the standard library. Use a library or implement it yourself.
Note that raw new and delete are bad practice in C++.

The most standard thing you have in every OS base library is ntohs/ntohl and htons/htonl, which convert between 'host' and 'network' byte order, the latter being the conventional order for serializing integers.
The problem is that there is no standard API for 64-bit types yet, and you have to serialize strings yourself anyway (the most common method is to prepend the string data with an int16/int32 containing the string length in bytes).
Again, C/C++ offers no standard way to serialize data to/from a binary buffer, XML, or JSON, but there are tons of libraries that implement this. One of the most widely used, though it comes with a lot of dependencies, is:
Boost serialize
Other widely used libraries that require a precompilation step are:
Google protocol buffers
FlatBuffers

Related

Passing data between DLL boundaries in C++ safely

Let there be a structure
#include <string>

struct MyDataStructure
{
    int a;
    int b;
    std::string c;
};
Let there be a function in the interface exposed by a DLL.
class IDllInterface
{
public:
    virtual void getData(MyDataStructure&) = 0;
};
From a client exe which loads the DLL, would the following code be safe?
...
IDllInterface* dll = DllFactory::getInterface(); // Imagine this exists
MyDataStructure data;
dll->getData(data);
...
Assume, of course, that MyDataStructure is known to both the client and the DLL. Also, as I understand it, because the code is compiled separately for the DLL and the exe, the layout of MyDataStructure could differ between compilers/compiler versions. Is my understanding correct?
If so, how can you pass data across DLL boundaries safely when working with different compilers/compiler versions?
You could use a "protocol" approach. For this, you could use a memory buffer to transfer the data and both sides just have to agree on the buffer layout.
The protocol agreement could be something like:
We don't use a struct, we just use a memory buffer (pass a pointer, or whatever means your toolkit allows for sharing a memory buffer).
We clear the buffer to 0s before setting any data in it.
All ints use 4 bytes in the buffer. This means each side uses whatever int type under their compiler is 4 bytes e.g. int/long.
For the particular case of two ints, the first 8 bytes has the ints and after that it's the string data.
#define MAX_STRING_SIZE_I_NEED 128
// 8 bytes for ints.
#define DATA_SIZE (MAX_STRING_SIZE_I_NEED + 8)
char xferBuf[DATA_SIZE];
So the DLL sets the ints etc., e.g.
void GetData(void* p);
// "int" is whatever type is known to use 4 bytes
*(int*) p = intA_ValueImSending;
*(int*) ((char*) p + 4) = intB_ValueImSending;
strcpy((char*) p + 8, stringBuf_ImSending);
On the receiving end it's easy enough to place the buffered values in the struct:
char buf[DATA_SIZE];
void* p = (void*) buf;
theDll.GetData(p);
theStructInstance.intA = *(int*) p;
theStructInstance.intB = *(int*) ((char*) p + 4);
...
If you want you could even agree on the endianness of the bytes per integer and set each of the 4 bytes of each integer in the buffer - but you probably wouldn't need to go to that extent.
For more general purpose both sides could agree on "markers" in the buffer. The buffer would look like this:
<marker>
<data>
<marker>
<data>
<marker>
<data>
...
Marker: the 1st byte indicates the data type, the 2nd byte indicates the length (very much like a network protocol).
If you want to pass a string in COM, you normally want to use a COM BSTR. You can create one with SysAllocString. This is defined to be neutral between compilers, versions, languages, etc. Contrary to popular belief, COM does directly support the int type, but from its perspective int is always a 32-bit type. If you want a 64-bit integer, that's a Hyper in COM-speak.
Of course you could use some other format that both sides of your connection know/understand/agree upon. Unless you have an extremely good reason to do this, it's almost certain to be a poor idea. One of the major strengths of COM is exactly the sort of interoperation you seem to want, and inventing your own string format would limit that substantially.
Using JSON for communication.
I think I have found an easier way to do it, hence answering my own question. As suggested in the answer by @Greg, one has to make sure the data representation follows a protocol, e.g. a network protocol. This ensures that the object representation between different binary components (exe and dll here) becomes irrelevant. If we think about it again, this is the same problem JSON solves by defining a simple object-representation protocol.
So a simple yet powerful solution, in my view, is to construct a JSON object from your object in the exe, serialize it, pass it across the dll boundary as bytes, and deserialize it in the dll. The only agreement between the dll and exe is that both use the same string encoding (e.g. UTF-8).
https://en.wikibooks.org/wiki/JsonCpp
One can use the above JsonCpp library. Strings are encoded as UTF-8 by default in JsonCpp, which is convenient as well :-)

What is the C++ equivalent of binary.Write in Golang?

I am working on a C++ project that adopts many ideas from a Go project.
I don't properly understand from the documentation how binary.Write works, or how I can replicate it in C++. I am stuck at this line in my project:
binary.Write(e.offsets, nativeEndian, e.offset)
The type of e.offsets is *bytes.Buffer and e.offset is a uint64.
In the C++ standard library, it is generally up to you to deal with endian concerns, so let's skip that for the time being. If you just want to write binary data to a stream such as a file, you can do something like this:
#include <cstdint>
#include <fstream>

uint64_t value = 0xfeedfacedeadbeef;
std::ofstream file("output.bin", std::ios::binary);
file.write(reinterpret_cast<char*>(&value), sizeof(value));
The cast is necessary because the file stream deals in char*, but you can write whatever byte streams to it you like.
You can write entire structures this way as well, so long as they are "Plain Old Data" (POD). For example:
struct T {
    uint32_t a;
    uint16_t b;
};
T value2 = { 123, 45 };
std::ofstream file("output.bin", std::ios::binary);
file.write(reinterpret_cast<char*>(&value2), sizeof(value2));
Reading these things back is similar using file.read, but as mentioned, if you REALLY do care about endianness, then you need to take care of that yourself.
If you are dealing with non-POD types (such as std::string), then you will need a more involved data-serialization system. There are numerous options to deal with this if needed.

Is there a proper formatter for boost::uint64_t to use with snprintf?

I am using boost/cstdint.hpp in a C++ project because I am compiling in C++03 mode (-std=c++03) and I want fixed-width integers (they are transmitted over the network and stored to files). I am also using snprintf because it is a simple and fast way to format strings.
Is there a proper formatter to use boost::uint64_t with snprintf(...), or should I switch to another solution (boost::format, std::ostringstream)?
I am currently using %lu, but I am not fully happy with it, as it may not work on another architecture (where boost::uint64_t is not defined as unsigned long), defeating the purpose of using fixed-width integers.
boost::uint64_t id;
id = get_file_id(...);
const char* ENCODED_FILENAME_FORMAT = "encoded%lu.dat";
//...
char encoded_filename[34];
snprintf(encoded_filename, 34, ENCODED_FILENAME_FORMAT, id);
snprintf isn't a Boost function. It knows how to print the fundamental types only; if none of those coincides with boost::uint64_t, then it isn't even possible to print it.
In general, as you note, the formatter has to match the underlying type. So even where it is possible, the formatter will be platform-dependent. There is no extension mechanism by which Boost can add new formatters to snprintf.

pack/unpack functions for C++

NOTE: I know that this has been asked many times before, but none of the questions has had a link to a concrete, portable, maintained library for this.
I need a C or C++ library that implements Python/Ruby/Perl-like pack/unpack functions. Does such a library exist?
EDIT: Because the data I am sending is simple, I have decided to just use memcpy, pointers, and the hton* functions. Do I need to manipulate a char in any way to send it over the network in a platform-agnostic manner? (The char is only used as a byte, not as a character.)
In C/C++ you would usually just write a struct with the various members in the correct order (correct packing may require compiler-specific pragmas) and dump/read it to/from a file with a raw fwrite/fread (or read/write when dealing with C++ streams). Actually, pack and unpack were born to read stuff generated with this method.
If you instead need the result in a buffer rather than a file, it's even easier: just copy the structure to your buffer with a memcpy.
If the representation must be portable, your main concerns are byte ordering and field packing; the first problem can be solved with the various hton* functions, the second with compiler-specific directives.
In particular, many compilers support the #pragma pack directive (see the VC++ and gcc documentation), which lets you manage the (unwanted) padding the compiler may insert in the struct to align its fields on convenient boundaries.
Keep in mind, however, that some architectures do not allow access to fields of particular types unless they are aligned on their natural boundaries, so in those cases you may need some manual memcpys to copy the raw bytes into properly aligned variables.
Why not boost serialization or protocol buffers?
Yes: use std::copy from <algorithm> to operate on the byte representation of a variable. Every variable T x; can be accessed as a byte array via char* p = reinterpret_cast<char*>(&x);, and p can be treated like a pointer to the first element of an array char[sizeof(T)]. For example:
char buf[100];
double q = get_value();
char const * const p = reinterpret_cast<char const *>(&q);
std::copy(p, p + sizeof(double), buf);
// more stuff like that
some_stream.write(buf, sizeof(double)); // ... etc.
And to go back:
double r;
std::copy(data, data + sizeof(double), reinterpret_cast<char *>(&r));
In short, you don't need a dedicated pack/unpack in C++, because the language already allows you access to its variables' binary representation as a standard part of the language.

Porting data serialization code from C++ linux/mac to C++ windows

I have a software framework compiled and running successfully on both Mac and Linux. I am now trying to port it to Windows (using MinGW). So far, I have the software compiling and running under Windows, but it's inevitably buggy. In particular, I have an issue reading data that was serialized on macOS (or Linux) into the Windows version of the program (segfaults).
The serialization process serializes values of primitive variables (longs, ints, doubles etc.) to disk.
This is the code I am using:
#include <iostream>
#include <fstream>

template <class T>
void serializeVariable(T var, std::ofstream &outFile)
{
    outFile.write(reinterpret_cast<char*>(&var), sizeof(var));
}

template <class T>
void readSerializedVariable(T &var, std::ifstream &inFile)
{
    inFile.read(reinterpret_cast<char*>(&var), sizeof(var));
}
So to save the state of a bunch of variables, I call serializeVariable for each variable in turn. Then to read the data back in, calls are made to readSerializedVariable in the same order in which they were saved. For example to save:
::serializeVariable<float>(spreadx,outFile);
::serializeVariable<int>(objectDensity,outFile);
::serializeVariable<int>(popSize,outFile);
And to read:
::readSerializedVariable<float>(spreadx,inFile);
::readSerializedVariable<int>(objectDensity,inFile);
::readSerializedVariable<int>(popSize,inFile);
But on Windows, this reading of serialized data fails. I am guessing that Windows serializes data a little differently. Is there a way to modify the above code so that data saved on any platform can be read on any other platform? Any ideas?
Cheers,
Ben.
Binary serialization like this should work fine across those platforms. You do have to honor endianness, but that is trivial; I don't think these three platforms have any conflicts in this respect.
You really can't use such loosely specified types when you do this, though: the sizes of int, float, and size_t can all change across platforms.
For integer types, use the strictly sized types found in the <cstdint> header: uint32_t, int32_t, etc. Windows doesn't always have that header available, iirc, but you can use boost/cstdint.hpp instead.
Floating point should work, as most compilers follow the same IEEE specs.
C - Serialization of the floating point numbers (floats, doubles)
Binary serialization really needs thorough unit testing. I would strongly recommend investing the time.
This is just a wild guess, sorry I can't help you more. My idea is that the byte order is different: big-endian vs little-endian, so anything larger than one byte will be messed up when loaded on a machine with the reverse order.
For example, I found this piece of code on MSDN:
int isLittleEndian() {
    long int testInt = 0x12345678;
    char *pMem;
    pMem = (char *) &testInt;
    if (pMem[0] == 0x78)
        return(1);
    else
        return(0);
}
I guess you will get different results on Linux vs Windows. The best case would be if there is a flag option for your compiler(s) to use one format or the other; just set it to be the same on all machines.
Hope this helps,
Alex
Just one more wild guess:
you forgot to open the file in binary mode, and on Windows file streams
convert the byte sequence 13,10 to 10.
Did you consider using serialization libraries or formats, e.g.:
XDR (supported by libc) or ASN.1
s11n (a C++ serialization library)
JSON, a very simple textual format with many libraries for it, e.g. JsonCpp, Jansson, Jaula, ...
YAML, a more powerful textual format, with many libraries
or even XML, which is often used for serialization purposes...
(And for serialization of scalars, the htonl and companion routines should help)