C++: How to save platform independent binary files?

I have a 3D-volume, represented as a vector of vector of vector of float, that I want to save to a binary file. (It's a density volume reconstructed from X-ray images that come from a CT scanner.)
Now, I could do this in the following way:
//iterate through the volume
for (int x = 0; x < _xSize; ++x){
    for (int y = 0; y < _ySize; ++y){
        for (int z = 0; z < _zSize; ++z){
            //save one float of data
            stream.write((char*)&_volume[x][y][z], sizeof(float));
        }
    }
}
This basically works. However, I'm asking myself to what extent this is platform-independent. I would like to produce a file which is identical regardless of the system it was created on. So there might be machines running Windows, Linux or Mac; they might have 32-bit or 64-bit word length and little-endian or big-endian byte order.
I suppose that if I did it the way shown above, this wouldn't be the case. Now how could I achieve this? I've heard about serialisation, but I haven't found a concrete solution for this case.

Google Protocol Buffers: free, encodes to binary, available in several languages, and works across most platforms. For your requirements I would seriously consider GPB. Be careful though: Google have released several versions that have not always been backward compatible, i.e. old data is not necessarily readable by new versions of GPB code. I feel that it's still evolving and that further changes will happen, which could be a nuisance if your project is also going to evolve over many years.
ASN.1, the grandpa of them all, has a very good schema language (value and size constraints can be set, which is a terrific way of avoiding buffer overruns and gives automatic validation of data streams, provided the auto-generated code is correct) and some free tools, see this page (mostly, though, they cost money). GPB's schema language is kind of a poor imitation of ASN.1's.
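For illustration, a GPB schema for the float volume described in the question might look something like this; a minimal sketch, with message and field names of my own invention:

syntax = "proto3";

// Hypothetical schema for the reconstructed density volume.
// Voxels are stored flat in x, y, z order.
message Volume {
  uint32 x_size = 1;
  uint32 y_size = 2;
  uint32 z_size = 3;
  repeated float voxels = 4; // packed encoding by default in proto3
}

The generated code then takes care of byte order and encoding on every platform.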

I solved the problem using the Qt QDataStream class. Qt is part of my project anyway, so the additional effort is minimal. I can tell the QDataStream object exactly whether I want to save my floats in single precision (32-bit) or double precision (64-bit), and whether I want little-endian or big-endian byte order. This is totally sufficient for what I need; I don't need to serialize objects. The files I save now have exactly the same format on all platforms (at least they should), and this is all I need. They will afterwards be read by 3rd-party applications, to which this information (byte order, precision) will be supplied. That is to say, it is not important exactly how my floats are saved, but that I know how they are saved and that this is consistent no matter which platform the program runs on.
Here is how the code looks now:
QDataStream out(&file);
out.setFloatingPointPrecision(QDataStream::SinglePrecision);
out.setByteOrder(QDataStream::LittleEndian);
for (int x = 0; x < _xSize; ++x){
    for (int y = 0; y < _ySize; ++y){
        for (int z = 0; z < _zSize; ++z){
            //save one float of data
            out << _volume[x][y][z];
        }
    }
}
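Reading such a file back is symmetrical; a sketch, assuming the reading side configures the stream identically:

QDataStream in(&file);
in.setFloatingPointPrecision(QDataStream::SinglePrecision);
in.setByteOrder(QDataStream::LittleEndian);
float value;
in >> value; //reads one 32-bit little-endian float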

I'm surprised there is no mention of the <rpc/xdr.h> header, for External Data Representation. I believe it is on all unixes, and may even work on Windows: https://github.com/ralight/oncrpc-windows/blob/master/win32/include/rpc/xdr.h
XDR stores all primitive data types in big endian, and takes care of the conversions for you.
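A minimal encode sketch using the classic Sun RPC interface (the file name is made up):

#include <rpc/xdr.h>
#include <cstdio>

int main() {
    FILE* fp = std::fopen("volume.xdr", "wb");
    XDR xdrs;
    xdrstdio_create(&xdrs, fp, XDR_ENCODE); //attach an XDR encoder to the FILE*
    float f = 1.5f;
    xdr_float(&xdrs, &f);                   //written as big-endian IEEE 754
    xdr_destroy(&xdrs);
    std::fclose(fp);
    return 0;
}

Decoding uses the same xdr_float call with XDR_DECODE, which is what makes the format symmetric.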

Related

Is there a portable Binary-serialisation schema in FlatBuffers/Protobuf that supports arbitrary 24bit signed integer definitions?

We are sending data over UART serial at a high data rate, so data size is important. The optimal format for our data is Int24, which may be expressed as a C bit-field struct (GCC compiler) in C/C++:
#include <array>
#include <cstdint>

#pragma pack(push, 1)
struct Int24
{
    int32_t value : 24;
};
#pragma pack(pop)

typedef std::array<Int24, 32> ArrayOfInt24;
This data is packaged with other data and shared among devices and cloud infrastructures. Basically we need a binary serialisation that can be sent between devices of different architectures and programming languages. We would like to use a schema-based binary serialisation such as ProtoBuf or FlatBuffers, so that client code does not need to handle the bit-shifting and two's-complement sign-bit recovery itself. I.e., reading the 24-bit value in a non-C language requires the following:
bool isSigned = (_b2 & (byte)0x80) != 0; // Sign extend negative quantities
int32_t value = _b0 | (_b1 << 8) | (_b2 << 16) | (isSigned ? 0xFF : 0x00) << 24;
If this doesn't already exist, which (if any) existing binary serialisation library could easily be extended to support this? We would be willing to contribute to an open-source project in this respect.
Depending on various things, you might like to look at ASN.1 and the unaligned Packed Encoding Rules (uPER). This is a binary serialisation that is widely used in telephony to easily minimise the number of transmitted bits. Tools are available for C, C++, C#, Java, Python (I think they cover uPER). A good starting point is Useful Old Technologies.
One of the reasons you might choose to use it is that uPER likely ends up doing better than anything else out there. Other benefits are constraints (on values and array sizes). You can express these in your schema, and the generated code will check data against them. This is something that can make a real difference to a project - automatic sanitisation of incoming data is a great way of resisting attacks - and is something that GPB doesn't do.
Reasons not to use it are that the very best tools are commercial and quite pricey, though there are some open-source tools that are quite good, if not necessarily implementing the entire ASN.1 standard (which is vast). It's also a learning curve, though (at a basic level) not so very different from Google Protocol Buffers. In fact, at the conference where Google announced GPB, someone asked "why not use ASN.1?". The Google bod hadn't heard of it; somewhat ironic that a search company didn't search the web for binary serialisation technologies and went right ahead and invented its own...
Protocol Buffers use a dynamically sized integer encoding called varint, so you can just use uint32 or sint32: the encoded value will take at most five bytes for any 32-bit value, and three bytes or less for any encoded value below 2^21 (for sint32 the value is first zigzag-encoded, so small magnitudes stay small). The actual size of an encoded integer is ⌈HB/7⌉ bytes, where HB is the 1-based position of the highest set bit in the encoded value.
Make sure not to use int32: it is varint-encoded too, but negative values are sign-extended to 64 bits, so they always take 10 bytes. For repeated values, just mark the field as repeated, so multiple values will be sent efficiently packed.
syntax = "proto3";

message Test {
    repeated sint32 data = 1;
}
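If you want to sanity-check those size claims, here is a small standalone sketch (my own code, not part of the protobuf library) of the zigzag and varint size rules:

#include <cstdint>

//zigzag: maps 0,-1,1,-2,2,... to 0,1,2,3,4,... so small magnitudes stay small
uint32_t zigzag32(int32_t n) {
    return (static_cast<uint32_t>(n) << 1) ^ static_cast<uint32_t>(n >> 31);
}

//varint: 7 payload bits per byte, so the size is ceil(HB / 7)
int varintSize(uint32_t v) {
    int bytes = 1;
    while (v >= 0x80) { v >>= 7; ++bytes; }
    return bytes;
}

//e.g. varintSize(zigzag32(-27)) == 1, varintSize(zigzag32(100001)) == 3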
FlatBuffers doesn't support 24-bit ints. The only way to represent it would be something like:
struct Int24 { a:ubyte; b:ubyte; c:ubyte; }
which obviously doesn't do the bit-shifting for you, but would still allow you to pack multiple Int24 together in a parent vector or struct efficiently. It would also save a byte when stored in a table, though there you'd probably be better off with just a 32-bit int, since the overhead is higher.
One particularly efficient use of protobuf's varint format is to use it as a sort of compression scheme, by writing the deltas between values.
In your case, if there is any correlation between consecutive values, you could have a repeated sint32 values field. Then as the first entry in the array, write the first value. For all further entries, write the difference from the previous value.
This way e.g. [100001, 100050, 100023, 95000] would get encoded as [100001, 49, -27, -5023]. As a packed varint array, the deltas would take 3, 1, 1 and 2 bytes, total of 7 bytes. Compared with a fixed 24-bit encoding taking 12 bytes or non-delta varint taking also 12 bytes.
Of course this also needs a bit of code on the receiving side to process. But adding up the previous value is easy enough to implement in any language.
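A sketch of both directions (plain C++, independent of any serialisation library):

#include <cstdint>
#include <vector>

//encode: replace each value with its difference from the previous one
std::vector<int32_t> toDeltas(const std::vector<int32_t>& raw) {
    std::vector<int32_t> out;
    out.reserve(raw.size());
    int32_t prev = 0;
    for (int32_t v : raw) { out.push_back(v - prev); prev = v; }
    return out;
}

//decode: a running sum restores the original values
std::vector<int32_t> fromDeltas(const std::vector<int32_t>& deltas) {
    std::vector<int32_t> out;
    out.reserve(deltas.size());
    int32_t prev = 0;
    for (int32_t d : deltas) { prev += d; out.push_back(prev); }
    return out;
}

//toDeltas({100001, 100050, 100023, 95000}) yields {100001, 49, -27, -5023}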

Detect endianness of binary file data

Recently I was (again) reading about endianness. I know how to identify the endianness of the host, as there are lots of posts on SO, and I have also seen this, which I think is a pretty good resource.
However, one thing I'd like to know is how to detect the endianness of an input binary file. For example, I am reading a binary file (using C++) like the following:
ifstream mydata("mydata.raw", ios::binary);
short value;
char buf[sizeof(short)];
int dataCount = 0;
short myDataMat[DATA_DIMENSION][DATA_DIMENSION];
while (mydata.read(reinterpret_cast<char*>(&buf), sizeof(buf)))
{
    memcpy(&value, buf, sizeof(value));
    myDataMat[dataCount / DATA_DIMENSION][dataCount % DATA_DIMENSION] = value;
    dataCount++;
}
I'd like to know how I can detect the endianness of the data in mydata.raw, and whether endianness affects this program in any way.
Additional Information:
I am only manipulating the data in myDataMat using mathematical operations; no pointer or bitwise operations are done on the data.
My machine (host) is little endian.
It is impossible to "detect" the endianness of data in general. Just like it is impossible to detect whether the data is an array of 4-byte integers, or twice as many 2-byte integers. Without any knowledge about the representation, raw data is just a mass of meaningless bits.
However, with some extra knowledge about the data representation, it becomes possible. Some examples:
Most file formats mandate a particular endianness, in which case this is never a problem.
Unicode text files may optionally start with a byte order mark. The same idea can be implemented by other data representations.
Some file formats contain a checksum. You can guess one endianness, and if the checksum does not match, try again with the other. It is unlikely that the checksum matches under the wrong interpretation of the data.
Sometimes you can make guesses based on the data, as sketched below. Is the temperature outside 33'554'432 degrees, or maybe 2? You can pick the endianness that represents sane data. Of course, this type of guesswork fails miserably when the aliens invade and start melting our planet.
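To make the guessing concrete, here is a sketch that builds both interpretations of the same two bytes and leaves the sanity check to you (helper names are mine):

#include <cstdint>

//construct both readings of a 2-byte sample; domain knowledge decides which is sane
uint16_t asLittleEndian(const unsigned char b[2]) {
    return static_cast<uint16_t>(b[0] | (b[1] << 8));
}
uint16_t asBigEndian(const unsigned char b[2]) {
    return static_cast<uint16_t>(b[1] | (b[0] << 8));
}

If one interpretation of a handful of samples yields plausible values and the other yields nonsense, you have your guess.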
You can't tell.
The endianness transformation is essentially an operator E(x) on a number x such that x = E(E(x)). So you don't know "which way round" the x elements are in your file.

efficient check for value change in array of floats in c++

I want to optimize an OpenGL application. One hotspot is the expensive handling (uploading to the graphics card) of relatively small arrays (8-64 values), where sometimes the values change but most of the time they stay constant. So the most efficient solution would be to upload the array only when it has changed.
Of course the simplest way would be setting flags whenever the data is changed, but this would need many code changes, and for a quick test I would like to know the possible performance gains before too much work has to be done.
So I thought of a quick check (like a murmur hash etc.) in memory to see whether the data has changed from frame to frame, and decide about uploading after this check. So the question is: how could I, e.g., XOR an array of values like
float vptr[] = { box.x1,box.y1, box.x1,box.y2, box.x2,box.y2, box.x2,box.y1 };
together to reliably detect value changes?
Best & thanks,
Heiner
If you're using Intel, you could look into Intel intrinsics.
http://software.intel.com/en-us/articles/intel-intrinsics-guide gives you an interactive reference where you can explore. There are a bunch of instructions for comparing multiple integers or doubles in one instruction, which is a nice speed-up.
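As a sketch of that idea with SSE (my own code; assumes the array length is a multiple of 4 floats, and note that NaN values always compare unequal):

#include <xmmintrin.h>

//compare 'n' floats against a cached copy, four lanes at a time
bool changed(const float* cur, const float* prev, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 a = _mm_loadu_ps(cur + i);
        __m128 b = _mm_loadu_ps(prev + i);
        //movemask is 0xF only when all four lanes compare equal
        if (_mm_movemask_ps(_mm_cmpeq_ps(a, b)) != 0xF)
            return true;
    }
    return false;
}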
@Ming, thank you for the intrinsics speed-up, I will have a look into this.
float vptr[] = { box.x1,box.y1, box.x1,box.y2, box.x2,box.y2, box.x2,box.y1 };
unsigned h = 0;
for (size_t i = 0; i < sizeof(vptr) / sizeof(vptr[0]); ++i)
{
    h ^= (unsigned&) vptr[i];
}
Dead simple, and it worked for the really tiny arrays. The compiler should be able to auto-vectorize, since the size of the array is known. I still have to test larger arrays.
origin: Hash function for floats

Correct way to serialize binary data in C++

After having read the following 1 and 2 Q/As, and having used the technique discussed below for many years on x86 architectures with GCC and MSVC without seeing any problems, I'm now very confused as to what is supposed to be the correct, and just as importantly the "most efficient", way to serialize and then deserialize binary data using C++.
Given the following "wrong" code:
#include <fstream>

int main()
{
    std::ifstream strm("file.bin", std::ios::binary);
    char buffer[sizeof(int)] = {0};
    strm.read(buffer, sizeof(int));
    int i = 0;
    // Experts seem to think doing the following is bad and
    // could crash entirely when run on ARM processors:
    i = *reinterpret_cast<int*>(buffer);
    return 0;
}
Now, as I understand things, the reinterpret_cast indicates to the compiler that it can treat the memory at buffer as an integer, and subsequently it is free to issue integer-compatible instructions which require/assume certain alignments for the data in question, with the only overhead being the extra reads and shifts when the CPU detects that the address on which it is trying to execute alignment-oriented instructions is actually not aligned.
That said, the answers provided above seem to indicate that, as far as C++ is concerned, this is all undefined behavior.
Assuming that the alignment of the location in buffer from which the cast will occur is not conforming, is it true that the only solution to this problem is to copy the bytes 1 by 1? Is there perhaps a more efficient technique?
Furthermore, I've seen over the years many situations where a struct made up entirely of PODs (using compiler-specific pragmas to remove padding) is cast to a char* and subsequently written to a file or socket, then later read back into a buffer, and the buffer cast back to a pointer of the original struct (ignoring potential endian and float/double format issues between machines). Is this kind of code also considered undefined behaviour?
The following is a more complex example:
#include <cstddef>
#include <fstream>

int main()
{
    std::ifstream strm("file.bin", std::ios::binary);
    char buffer[1000] = {0};
    const std::size_t size = sizeof(int) + sizeof(short) + sizeof(float) + sizeof(double);
    const std::size_t weird_offset = 3;
    char* ptr = buffer + weird_offset;
    strm.read(ptr, size);
    int i = 0;
    short s = 0;
    float f = 0.0f;
    double d = 0.0;
    // Experts seem to think doing the following is bad and
    // could crash entirely when run on ARM processors:
    i = *reinterpret_cast<int*>(ptr);
    ptr += sizeof(int);
    s = *reinterpret_cast<short*>(ptr);
    ptr += sizeof(short);
    f = *reinterpret_cast<float*>(ptr);
    ptr += sizeof(float);
    d = *reinterpret_cast<double*>(ptr);
    ptr += sizeof(double);
    return 0;
}
First, you can correctly, portably, and efficiently solve the alignment problem using, e.g., std::aligned_storage<sizeof(int), std::alignment_of<int>::value>::type instead of char[sizeof(int)] (or, if you don't have C++11, there may be similar compiler-specific functionality).
Even if you're dealing with a complex POD, aligned_storage and alignment_of will give you a buffer that you can memcpy the POD into and out of, construct it into, etc.
In some more complex cases, you need to write more complex code, potentially using compile-time arithmetic and template-based static switches and so on, but so far as I know, nobody came up with a case during the C++11 deliberations that wasn't possible to handle with the new features.
However, just using reinterpret_cast on a random char-aligned buffer is not enough. Let's look at why:
the reinterpret_cast indicates to the compiler that it can treat the memory at buffer as an integer
Yes, but you're also indicating that it can assume that the buffer is aligned properly for an integer. If you're lying about that, it's free to generate broken code.
and subsequently it is free to issue integer-compatible instructions which require/assume certain alignments for the data in question
Yes, it's free to issue instructions that either require those alignments, or that assume they're already taken care of.
with the only overhead being the extra reads and shifts when the CPU detects that the address on which it is trying to execute alignment-oriented instructions is actually not aligned
Yes, it may issue instructions with the extra reads and shifts. But it may also issue instructions that don't do them, because you've told it that it doesn't have to. So, it could issue a "read aligned word" instruction which raises an interrupt when used on non-aligned addresses.
Some processors don't have a "read aligned word" instruction, and just "read word" faster with alignment than without. Others can be configured to suppress the trap and instead fall back to a slower "read word". But others—like ARM—will just fail.
Assuming that the alignment of the location in buffer from which the cast will occur is not conforming, is it true that the only solution to this problem is to copy the bytes 1 by 1? Is there perhaps a more efficient technique?
You don't need to copy the bytes 1 by 1. You could, for example, memcpy each variable one by one into properly-aligned storage. (That would only be copying bytes 1 by 1 if all of your variables were 1-byte long, in which case you wouldn't be worried about alignment in the first place…)
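For instance, a sketch of that memcpy approach applied to the question's second example (the buffer and offset are stand-ins from the question):

#include <cstddef>
#include <cstring>

void readValues()
{
    char buffer[1000] = {0};              //stand-in for the question's buffer
    const std::size_t weird_offset = 3;
    int i; short s; float f; double d;
    const char* p = buffer + weird_offset;
    std::memcpy(&i, p, sizeof(i)); p += sizeof(i);  //each copy is alignment-safe
    std::memcpy(&s, p, sizeof(s)); p += sizeof(s);
    std::memcpy(&f, p, sizeof(f)); p += sizeof(f);
    std::memcpy(&d, p, sizeof(d)); p += sizeof(d);
}

Compilers routinely turn fixed-size memcpy calls like these into single loads (unaligned loads where the hardware supports them), so there is usually no penalty compared to the cast.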
As for casting a POD to char* and back using compiler-specific pragmas… well, any code that relies on compiler-specific pragmas for correctness (rather than for, say, efficiency) is obviously not correct, portable C++. Sometimes "correct with g++ 3.4 or later on any 64-bit little-endian platform with IEEE 64-bit doubles" is good enough for your use cases, but that's not the same thing as actually being valid C++. And you certainly can't expect it to work with, say, Sun cc on a 32-bit big-endian platform with 80-bit doubles and then complain that it doesn't.
For the example you added later:
// Experts seem to think doing the following is bad and
// could crash entirely when run on ARM processors:
char* ptr = buffer + weird_offset;
i = *reinterpret_cast<int*>(ptr);
ptr += sizeof(int);
Experts are right. Here's a simple example of the same thing:
int i[2];
char *c = reinterpret_cast<char *>(i) + 1;
int *j = reinterpret_cast<int *>(c);
int k = *j;
The variable i will be aligned at some address divisible by 4, say, 0x01000000. So, j will be at 0x01000001. So the line int k = *j will issue an instruction to read a 4-byte-aligned 4-byte value from 0x01000001. On, say, PPC64, that will just take about 8x as long as int k = *i, but on, say, ARM, it will crash.
So, if you have this:
int i = 0;
short s = 0;
float f = 0.0f;
double d = 0.0;
And you want to write it to a stream, how do you do it?
writeToStream(&i);
writeToStream(&s);
writeToStream(&f);
writeToStream(&d);
How do you read back from a stream?
readFromStream(&i);
readFromStream(&s);
readFromStream(&f);
readFromStream(&d);
Presumably whatever kind of stream you're using (whether ifstream, FILE*, whatever) has a buffer in it, so readFromStream(&f) is going to check whether there are sizeof(float) bytes available, read the next buffer if not, then copy the first sizeof(float) bytes from the buffer to the address of f. (In fact, it may even be smarter—it's allowed to, e.g., check whether you're just near the end of the buffer, and if so issue an asynchronous read-ahead, if the library implementer thought that would be a good idea.) The standard doesn't say how it has to do the copy. Standard libraries don't have to run anywhere but on the implementation they're part of, so your platform's ifstream could use memcpy, or *(float*), or a compiler intrinsic, or inline assembly—and it will probably use whatever's fastest on your platform.
So, how exactly would unaligned access help you optimize this or simplify it?
In nearly every case, picking the right kind of stream, and using its read and write methods, is the most efficient way of reading and writing. And, if you've picked a stream out of the standard library, it's guaranteed to be correct, too. So, you've got the best of both worlds.
If there's something peculiar about your application that makes something different more efficient—or if you're the guy writing the standard library—then of course you should go ahead and do that. As long as you (and any potential users of your code) are aware of where you're violating the standard and why (and you actually are optimizing things, rather than just doing something because it "seems like it should be faster"), this is perfectly reasonable.
You seem to think that it would help to be able to put them into some kind of "packed struct" and just write that, but the C++ standard does not have any such thing as a "packed struct". Some implementations have non-standard features that you can use for that. For example, both MSVC and gcc will let you pack the above into 18 bytes on i386, and you can take that packed struct and memcpy it, reinterpret_cast it to char * to send over the network, whatever. But it won't be compatible with the exact same code compiled by a different compiler that doesn't understand your compiler's special pragmas. It won't even be compatible with a related compiler, like gcc for ARM, which will pack the same thing into 20 bytes. When you use non-portable extensions to the standard, the result is not portable.
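If portability is the goal, the reliable alternative is to define the byte layout yourself rather than borrow the compiler's. A sketch for one field (my own helper, little-endian by choice):

#include <cstdint>
#include <ostream>

//emit a 32-bit value as four explicit little-endian bytes,
//independent of the host's endianness and struct layout
void writeU32LE(std::ostream& out, uint32_t v) {
    char b[4] = {
        static_cast<char>(v & 0xFF),
        static_cast<char>((v >> 8) & 0xFF),
        static_cast<char>((v >> 16) & 0xFF),
        static_cast<char>((v >> 24) & 0xFF)
    };
    out.write(b, 4);
}

Write one such helper per field type and the stream format stops depending on any compiler at all.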

C++, using one byte to store two variables

I am working on a representation of the chess board, and I am planning to store it in a 32-byte array, where each byte will be used to store two pieces. (That way only 4 bits are needed per piece.)
Doing it that way results in an overhead for accessing a particular index of the board.
Do you think this code can be optimised, or could a completely different method of accessing indexes be used?
#include <cstdlib>
#include <iostream>
using namespace std;

char getPosition(unsigned char* c, int index){
    //moving pointer
    c += (index >> 1);
    //odd number
    if (index & 1){
        //taking right part
        return *c & 0xF;
    } else {
        //taking left part
        return *c >> 4;
    }
}

void setValue(unsigned char* board, char value, int index){
    //moving pointer
    board += (index >> 1);
    //odd number
    if (index & 1){
        //replace right part; keep the left 4 bits
        *board = (*board & 0xF0) + value;
    } else {
        //replacing left part
        *board = (*board & 0xF) + (value << 4);
    }
}

int main() {
    //zero-initialise: setValue reads the existing byte before masking
    char* c = (char*)calloc(32, 1);
    for (int i = 0; i < 64; i++){
        setValue((unsigned char*)c, i % 8, i);
    }
    for (int i = 0; i < 64; i++){
        cout << (int)getPosition((unsigned char*)c, i) << " ";
        if (((i + 1) % 8 == 0) && (i > 0)){
            cout << endl;
        }
    }
    return 0;
}
I am equally interested in your opinions regarding chess representations and the optimisation of the method above as a standalone problem.
Thanks a lot
EDIT
Thanks for your replies. A while ago I created a checkers game, where I was using a 64-byte board representation. This time I am trying some different methods, just to see what I like. Memory is not such a big problem. Bitboards are definitely on my list to try. Thanks
That's the problem with premature optimization. Where your chess board would have taken 64 bytes to store, now it takes 32. What has this really bought you? Did you actually analyze the situation to see if you needed to save that memory?
Assuming that you used one of the least optimal search methods, straight AB search to depth D with no heuristics, and you generate all possible moves in a position before searching, then the absolute maximum memory required for your boards is going to be sizeof(board) * W * D. If we assume a rather large W = 100 and large D = 30 then you're going to have 3000 boards in memory at depth D. 192 KB vs 96 KB... is it really worth it?
On the other hand, you've increased the number of operations necessary to access board[location], and this will be called many millions of times per search.
When building chess AIs, the main thing you'll end up looking for is CPU cycles, not memory. This may vary a little bit if you're targeting a cell phone or something, but even then you're going to worry more about speed before you'll ever reach enough depth to cause any memory issues.
As to which representation I prefer...I like bitboards. Haven't done a lot of serious measurements but I did compare two engines I made, one bitboard and one array, and the bitboard one was faster and could reach much greater depths than the other.
Let me be the first to point out a potential bug (depending on compilers and compiler settings), bugs being why premature optimization is evil:
//taking left part
return *c>>4;
If *c is negative, then >> may replicate the negative high bit. I.e. in binary:
0b10100000 >> 4 == 0b11111010
for some compilers (i.e. the C++ standard leaves it to the compiler to decide both whether to carry the high bit, and whether a char is signed or unsigned).
If you do want to go forward with your packed bits (and let me say that you probably shouldn't bother, but it is up to you), I would suggest wrapping the packed bits into a class and overriding operator[] such that
board[x][y]
gives you the unpacked bits. Then you can turn the packing on and off easily, keeping the same syntax in either case. If you inline the operator overloads, it should be as efficient as the code you have now.
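A sketch of that wrapper (class and member names are mine); the inner proxy is what makes board[x][y] both readable and assignable:

#include <cstdint>

class PackedBoard {
    uint8_t data[32] = {};

    struct Cell {
        uint8_t* byte;
        bool high; //even indexes live in the high nibble, as in the question
        operator int() const { return high ? (*byte >> 4) : (*byte & 0xF); }
        Cell& operator=(int v) {
            if (high) *byte = (*byte & 0x0F) | ((v & 0xF) << 4);
            else      *byte = (*byte & 0xF0) | (v & 0xF);
            return *this;
        }
    };

    struct Row {
        uint8_t* data;
        int x;
        Cell operator[](int y) {
            int index = x * 8 + y;
            return Cell{ data + index / 2, (index & 1) == 0 };
        }
    };

public:
    Row operator[](int x) { return Row{ data, x }; }
};

//usage: PackedBoard b; b[3][4] = 5; int piece = b[3][4];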
Well, 64 bytes is a very small amount of RAM. You're better off just using a char[8][8]. That is, unless you plan on storing a ton of chess boards. Doing char[8][8] makes it easier (and faster) to access the board and do more complex operations on it.
If you're still interested in storing the board in packed representation (either for practice or to store a lot of boards), I say you're "doing it right" regarding the bit operations. You may want to consider inlining your accessors if you're going for speed using the inline keyword.
Is space enough of a consideration that you can't just use a full byte to represent a square? That would make accesses easier to follow in the program and additionally most likely faster, as the bit manipulations would not be required.
Otherwise, to make sure everything goes smoothly, I would make sure all your types are unsigned: have getPosition return unsigned char, and qualify all your numeric literals with "U" (0xF0U, for example) to make sure they're always interpreted as unsigned. Most likely you won't have any problems with signedness, but why take chances on some architecture that behaves unexpectedly?
Nice code, but if you are really that deep into performance optimization, you should probably learn more about your particular CPU architecture.
You may well find that storing a chess piece in as much as 8 bytes will be more efficient. Even if you recurse 15 moves deep, L2 cache size would hardly be a constraint, but RAM misalignment may be. I would guess that proper handling of a chess board would include Expand() and Reduce() functions to translate between board representations during different parts of the algorithm: some may be faster on a compact representation, and some vice versa. For example, caching, and algorithms involving hashing by composition of two adjacent cells, might be good for the compact structure; everything else probably not.
I would also consider developing some helper hardware, like an FPGA board, or some GPU code, if performance is that important.
As a chess player, I can tell you: there's more to a position than the mere placement of each piece. You have to take into consideration some other things:
Which side has to move next?
Can white and/or black castle king and/or queenside?
Can a pawn be taken en passant?
How many moves have passed since the last pawn move and/or capturing move?
If the data structure you use to represent a position doesn't reflect this information, then you're in big trouble.