I am working on project in C++ that adopts many ideas from a golang project.
I don't properly understand how this binary.write works from the documentation and how I can replicate it in C++. I am stuck at this line in my project.
binary.Write(e.offsets, nativeEndian, e.offset)
The type of e.offsets is *bytes.Buffer and e.offset is uint64
In C++ standard libs, it is generally up to you to deal with endian concerns. So let's skip that for the time being. If you just want to write binary data to a stream such as a file, you can do something like this:
uint64_t value = 0xfeedfacedeadbeef;
std::ofstream file("output.bin", ios::binary);
file.write(reinterpret_cast<char*>(&value), sizeof(value));
The cast is necessary because the file stream deals with char*, but you can write whatever byte streams to it you like.
You can write entire structures this way as well so long as they are "Plain Old Data" (POD). For example:
struct T {
uint32_t a;
uint16_t b;
};
T value2 = { 123, 45 };
std::ofstream file("output.bin", ios::binary);
file.write(reinterpret_cast<char*>(&value2), sizeof(value2));
Reading these things back is similar using file.read, but as mentioned, if you REALLY do care about endian, then you need to take care of that yourself.
If you are dealing with non-POD types (such as std::string), then you will need to deal with a more involved data serialization system. There are numerous options to deal with this if needed.
Related
I want to encode/decode some basic type into/from binary.
The test code may looks like this.
int main()
{
int iter = 0;
char* binary = new char[100];
int32_t version = 1;
memcpy(binary, &version, sizeof(int32_t));
iter+=sizeof(int32_t);
const char* value1 = "myvalue";
memcpy(binary+iter, value1, strlen(value1));
iter+=strlen(value1);
double value2 = 0.1;
memcpy(binary+iter, &value2, sizeof(double));
#warning TODO - big/small endian - fixed type length
return 0;
}
But I still need to solve a lot of problems, such as the endian and fixed type length.
So I want to know if there is a standard way to implement this.
Simultaneously, I don't want to use any third-party implementation, such as Boost and so on. Because I need to keep my code simple and Independent.
If there is a function/class like NSCoding in Objc, it will be best. I wonder if there is same thing in C++ standard library.
No, there are no serialization functions within the standard library. Use a library or implement it by yourself.
Note that raw new and delete is a bad practice in C++.
The most standard thing you have in every OS base library is ntohs/ntohl and htons/htonl that you can use to go from 'host' to 'network' byte order that is considered the standard for serializing integers.
The problem is that there is not yet a standard API for 64bit types and you should anyway serialize strings by yourself (the most common method is to prepend string data with an int16/32 containing the string length in bytes).
Again C/C++ do not offer a standard way to serialize data from/to a binary buffer or an XML or a JSON, but there are tons of libraries that implement this, for example, one of the most used, also if it comes with a lot of dependencies is:
Boost serialize
Other libraries widely used but that require a precompilation step are:
Google procol buffers
FlatBuffers
i wonder if nobody is using an union to serialize structs for boost::asio sender/receiver. i have searched for something but all i found (yet) were been examples like this or this.
so i have it done like this:
struct ST {
short a;
long b;
float c;
char c[256];
}
...
void sender::send(const ST &_packet) {
union {
const ST &s;
char (&c)[sizeof(ST)];
}usc = {_packet};
socket.send_to(boost::asio::buffer(usc.c, sizeof(ST)), endpoint);
}
...
ST var = {1234, -1234, 1.4567, "some text"};
sercer.send(var);
so my question now is, is this bad practise to do serialization of the fundamental data types?
i know i can't send var-sized strings directly, and therefor i can use boost::serialization.
It is indeed bad practise. What you are sending is (supposed to be) a sequence of bytes containing the exact binary representation of whatever is in the union on your system. The problems are:
You fill the usc union with a ST, but access it as a char[], which produces undefined behavior (but probably works on all common systems).
You probably want to do the same on the receiver side, in reversed order. UB again, but probably works.
Here comes trouble: You send the data in a system specific format, that need not be the same on the receiving system, for the same struct definition. This includes
byte order of integral types (little/big endian)
size of types, not only integral (e.g. sizeof(long) sometimes differs from 32bit to 64bit systems, or between different compilers on 64bit systems)
padding bytes
sizeof(ST) itself may differ significantly
Simply said: just don't do this. If you are using boost::serialization anyways, use it for the whole ST struct, not only for the strings inside it. If your data structure becomes just a little more complicated (e.g. contains pointers, has nontrivial constructors etc.) you have to do that anyways.
code looks like this:
struct Dog {
string name;
unsigned int age;
};
int main()
{
Dog d = {.age = 3, .name = "Lion"};
FILE *fp = fopen("dog.txt", "wb");
fwrite(&d, sizeof(d), 1, fp); //write d into dog.txt
}
My problem is what's the point of write a data object or structure into a binary file? I assume it is for making the data generated in a running program persistent, right? If yes, then how can I get the data back? Using fread?
This makes me think of database-like stuff, dose database write data into disk the same way?
You can do it but you will have a lot of issues to care about:
structure types: all your data needs really be into struct or you can just writing a pointer to some other place.
structure changes: if you need change your structure you will need write a converter to read old struct and write the new.
language interoperability: will be hard to access the data using other language
It was a common practice in the early days before relational databases popularization. You can make index files pointing to a record number.
However nowadays I will advice you to make serialization and write strings instead binaries.
NOTE:
if string is something like char[40] your code maybe will survive... but if your question is about C++ and string is a class then kill you child before it grows up! The string object characters are not into your struct but in the heap.
Writing data in binary is extremely useful and much faster then reading/writing in text, take for instance video games (Although not every video game does this), when the game is saved all of the nescessary structures/classes and other data are written into a save file in binary.
It is just one use for using binary, but the major reason for doing this is speed.
And to read the data back, you will need to know the format that you saved it in, for instance as a simple example, if I saved an integer, char array of n size, and a boolean, I would need to read the binary file in as an integer, char array of n size, and a boolean. Otherwise the data is read improperly and will not be very useful at all
Be careful. The type of field 'name' in your structure is 'string'. This class contains data allocated dynamically. So writing 'string' data into file this way only pointers will be writed, not data itself.
The C++ Middleware Writer supports binary serialization to/from files.
From a marshalling perspective the "unsigned int age" member of your struct is a potential problem. I'd consider changing the type to uint32_t.
I have a software framework compiled and running successfully on both mac and linux. I am now trying to port it to windows (using mingw). So far, I have the software compiling and running under windows but its inevitably buggy. In particular, I have an issue with reading data that was serialized in macos (or linux) into the windows version of the program (segfaults).
The serialization process serializes values of primitive variables (longs, ints, doubles etc.) to disk.
This is the code I am using:
#include <iostream>
#include <fstream>
template <class T>
void serializeVariable(T var, std::ofstream &outFile)
{
outFile.write (reinterpret_cast < char *>(&var),sizeof (var));
}
template <class T>
void readSerializedVariable(T &var, std::ifstream &inFile)
{
inFile.read (reinterpret_cast < char *>(&var),sizeof (var));
}
So to save the state of a bunch of variables, I call serializeVariable for each variable in turn. Then to read the data back in, calls are made to readSerializedVariable in the same order in which they were saved. For example to save:
::serializeVariable<float>(spreadx,outFile);
::serializeVariable<int>(objectDensity,outFile);
::serializeVariable<int>(popSize,outFile);
And to read:
::readSerializedVariable<float>(spreadx,inFile);
::readSerializedVariable<int>(objectDensity,inFile);
::readSerializedVariable<int>(popSize,inFile);
But in windows, this reading of serialized data is failing. I am guessing that windows serializes data a little differently. I wonder if there is a way in which I could modify the above code so that data saved on any platform can be read on any other platform...any ideas?
Cheers,
Ben.
Binary serialization like this should work fine across those platforms. You do have to honor endianness, but that is trivial. I don't think these three platforms have any conflicts in this respect.
You really can't use as loose of type specifications when you do, though. int, float, size_t sizes can all change across platforms.
For integer types, use the strict sized types found in the cstdint header. uint32_t, int32_t, etc. Windows doesn't have the header available iirc, but you can use boost/cstdint.hpp instead.
Floating point should work as most compilers follow the same IEEE specs.
C - Serialization of the floating point numbers (floats, doubles)
Binary serialization really needs thorough unit testing. I would strongly recommend investing the time.
this is just a wild guess sry I can't help you more. My idea is that the byte order is different: big endian vs little endian. So anything larger than one byte will be messed up when loaded on a machine that has the order reversed.
For example I found this peace of code in msdn:
int isLittleEndian() {
long int testInt = 0x12345678;
char *pMem;
pMem = (char *) testInt;
if (pMem[0] == 0x78)
return(1);
else
return(0);
}
I guess you will have different results on linux vs windows. Best case would be if there is a flag option for your compiler(s) to use one format or the other. Just set it to be the same on all machines.
Hope this helps,
Alex
Just one more wild guess:
you forget open file in binary reading mode, and on windows file streams
convert sequence 13,10 to 10.
Did you consider using serialization libraries or formats, like e.g.:
XDR (supported by libc) or ASN1
s11n (a C++ serialization library)
Json, a very simple textual format with many libraries for it, e.g. JsonCpp, Jansson, Jaula, ....)
YAML, a more powerful textual format, with many libraries
or even XML, which is often used for serialization purposes...
(And for serialization of scalars, the htonl and companion routines should help)
Is it evil to serialize struct objects using memcpy?
In one of my projects I am doing the following: I memcpy a struct object, base64 encode it, and write it to file. I do the inverse when parsing the data. It seems to work OK, but in certain situations (for example when using the WINDOWPLACEMENT for the HWND of Windows Media Player) it turns out that the decoded data does not match sizeof(WINDOWPLACEMENT).
Here are some code fragments:
// Using WINDOWPLACEMENT from Windows API headers:
typedef struct tagWINDOWPLACEMENT {
UINT length;
UINT flags;
UINT showCmd;
POINT ptMinPosition;
POINT ptMaxPosition;
RECT rcNormalPosition;
#ifdef _MAC
RECT rcDevice;
#endif
} WINDOWPLACEMENT;
static std::string EncodeWindowPlacement(const WINDOWPLACEMENT & inWindowPlacement)
{
std::stringstream ss;
{
Poco::Base64Encoder encoder(ss); // From the Poco C++ libraries
const char * offset = reinterpret_cast<const char*>(&inWindowPlacement);
std::vector<char> buffer(offset, offset + sizeof(inWindowPlacement));
for (size_t idx = 0; idx != buffer.size(); ++idx)
{
encoder << buffer[idx];
}
encoder.close();
}
return ss.str();
}
static WINDOWPLACEMENT DecodeWindowPlacement(const std::string & inEncoded)
{
std::string decodedString;
{
std::istringstream istr(inEncoded);
Poco::Base64Decoder decoder(istr); // From the Poco C++ libraries
decoder >> decodedString;
assert(decoder.eof());
if (decoder.fail())
{
throw std::runtime_error("Failed to parse Window placement data from the configuration file.");
}
}
if (decodedString.size() != sizeof(WINDOWPLACEMENT))
{
// !! Occurs frequently !!
throw std::runtime_error("Errors occured during parsing of the Window placement.");
}
WINDOWPLACEMENT windowPlacement;
memcpy(&windowPlacement, &decodedString[0], decodedString.size());
return windowPlacement;
}
I'm aware that copying classes in C++ using memcpy is likely to cause trouble because the copy constructors are not properly executed. I'm not sure if this also applies to C-style structs. Or is serialization by memory dumping simply not done?
Update:
A bug in Poco's Base64Encoder/Decoder is not impossible, but unlikely. Its test cases seem pretty thorough: Base64Test.cpp.
You will run into problems if you need to transfer these files between machines that do not all share the same endianness and word size, or if you add/remove slots from the structs in future versions and need to retain binary compatibility.
I'm not sure how operator>>() is implemented in Poco::Base64Decoder. If it is same as istream's operator>>(), then after decoder >> decodedString; decodedString may not contain all characters from the input. For example, if there is any whitespace character in encoded string then decoder >> decodedString; will read upto that whitespace.
Doing a memcpy of classes/structs is okay if they're just Plain Old Data (POD), but if that's the case, then you could rely on C++ doing the copying for you via copy constructors (which exist for both struct and class types in C++).
Certainly you can do it the way you have been doing it - one of the products I've worked on serializes data using memcpy, sends the data over the wire, and client applications decode the bytestream to get the data back.
But if you have a choice, you might want something higher level like boost.serialization, which offers more flexibility and deep-pointer copying. The aforementioned Google ProtoBuffers would work nicely too.
Here are some threads discussing serialization methods in C++:
boost serialization vs google protocol buffers?
C++ Serialization Performance
I wouldn't go as far as to say that it's evil, but I think it is asking for trouble and weird problems in many cases.
I know it has been done and it can work (I've seen people serialize structs like that to send over a network connection), but it has a number of drawbacks that have been pointed out already (inflexibility, endianness problems, structs containing pointers, packing, etc).
I'd recommend a more robust way of serializing and deserializing your data. I've heard lots of good things about Google protocol buffers, something like that will be a lot more flexible and will probably save you headaches in the end.
Serializing data in the manner you've done it is not particularly evil, if you know you're staying on a machine with the same byte size, word size, endian-ness, etc. Since you're serializing the window placement information, you probably don't care about portability between two different machines, and only want to save this information between sessions on the same machine. I'd hazard a guess that you're storing this into the Registry. If you want portability for other data that is actually useful when it's ported to other architectures, then you can look at many of the other suggestions posted here already, such as Google protocol buffers, etc. Whitespace is a red-herring, as all WS is irrelevant in a base64 encoded data stream and all decoders should ignore it (PoCo does).I am curious to know what are the sizes of the string and the structure when it fails.Knowing this might give you some insight into the problem.