Serializing/deserializing a bitfield structure in C++

I have a
typedef struct {
    uint32_t Thread : HTHREAD_BUS_WIDTH;
    uint32_t Member : 3;
    uint32_t Proxy : 3;
    // Other members, fill out 32 bits
} MyStruct;
that I must transfer from one system to another as an item of a buffer comprising 32-bit words.
What is the best way to serialize the struct and, on the other side, to deserialize it? "Best" here means safe casting and no unneeded copying.
For one direction of casting, I have found (as a member function):
int &ToInt() {
    return *reinterpret_cast<int *>(this);
}
Is there a similar valid cast in the other direction, i.e. from integer to MyStruct, ideally as a member function?
How can I define which bit means which field? (It may even be the case that the deserialization happens in another program, in another language, or on a system with different endianness.)

How can I define which bit means which field?
You cannot. You have no control over the layout of bitfields.
"best" means here safe casting, and no unneeded copying.
There is no portable safe cast that could avoid copying.
A portable way to serialise bitfields is to manually shift into an integer, in the desired order. For example:
MyStruct value = something;
uint32_t out = 0;
out |= value.Thread;
out <<= 3;           // make room for the 3-bit Member field
out |= value.Member;
out <<= 3;           // make room for the 3-bit Proxy field
out |= value.Proxy;
In the shown example, the least significant bits contain the field Proxy while the other fields are adjacent in more significant bits.
Of course, in order to serialise this generated integer correctly, just like serialising any integer, you must take endianness into consideration. Serialisation of an integer can be portably implemented by repeatedly shifting the integer, and copying the bytes in order of significance into an array.
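As a minimal sketch of that last point (the function names are illustrative, not from the original answer), the packed integer can be written byte by byte, least significant byte first, so the wire format is fixed regardless of host byte order:
#include <cstdint>

// Write 'value' into 'dest' in a fixed (little-endian) wire order,
// independent of the host CPU's byte order.
void serialize_u32(uint32_t value, unsigned char dest[4]) {
    for (int i = 0; i < 4; ++i) {
        dest[i] = static_cast<unsigned char>(value & 0xFFu);
        value >>= 8;
    }
}

// Reassemble the integer from the same fixed wire order.
uint32_t deserialize_u32(const unsigned char src[4]) {
    uint32_t value = 0;
    for (int i = 3; i >= 0; --i) {
        value = (value << 8) | src[i];
    }
    return value;
}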

If you need to read the data on another system, which might have a different endianness, you cannot rely on a portable bitfield. A solution is to "expand" your structure so that each field is serialized as a 32-bit value in the "transport" buffer. A safe implementation could be something like:
typedef struct {
    uint32_t Thread : HTHREAD_BUS_WIDTH;
    uint32_t Member : 3;
    uint32_t Proxy : 3;
    // Other members, fill out 32 bits
    std::vector<uint32_t> to_buffer() const;
} MyStruct;
Implementation of to_buffer():
std::vector<uint32_t> MyStruct::to_buffer() const
{
    std::vector<uint32_t> buffer;
    buffer.push_back((uint32_t)Thread);
    buffer.push_back((uint32_t)Member);
    buffer.push_back((uint32_t)Proxy);
    // push other members
    return buffer;
}
Then, on the receiving side, you can rebuild the struct from the buffer, as sketched below.
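A hedged sketch of that receiving side (the function name from_buffer is illustrative, not part of the original answer):
// Counterpart to to_buffer(): rebuilds the struct from the expanded
// 32-bit values, in the same order they were pushed.
MyStruct from_buffer(const std::vector<uint32_t>& buffer)
{
    MyStruct s{};          // zero-initialize all fields
    s.Thread = buffer[0];  // value is truncated to HTHREAD_BUS_WIDTH bits
    s.Member = buffer[1];  // truncated to 3 bits
    s.Proxy  = buffer[2];  // truncated to 3 bits
    // read other members
    return s;
}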
If you do not want to expand the fields that do not use all 32 bits, you can always implement your own packing function by shifting and masking bits, e.g.:
uint32_t member_and_proxy = (Member << 3) | Proxy; // and so on for the other members
This is much more error prone.
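The matching unpack step is the mirror image, masking each field back out (assuming the 3-bit widths above):
uint32_t member = (member_and_proxy >> 3) & 0x7; // upper 3 bits
uint32_t proxy  =  member_and_proxy       & 0x7; // lower 3 bits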
From my own experience, if communication bandwidth is not an issue, relying on text-based content is a better choice (no endianness issues, and it is very easy to debug).

Related

Converting uint8_t* buffer to uint16_t and changing endianness

I'd like to process data provided by an external library.
The lib holds the data and provides access to it like this:
const uint8_t* data;
std::pair<const uint8_t*, const uint8_t*> getvalue() const {
    return std::make_pair(data + offset, data + length);
}
I know that the current data contains two uint16_t numbers, but I need to change their endianness.
Altogether the data is 4 bytes long and contains these bytes:
66 4 0 0
So I'd like to get two uint16_t numbers with the values 1090 and 0, respectively.
I can do basic arithmetic and change the endianness in one place:
pair<const uint8_t*, const uint8_t*> dataPtrs = library.value();
vector<uint8_t> data(dataPtrs.first, dataPtrs.second);
uint16_t first  = (data[1] << 8) | data[0];
uint16_t second = (data[3] << 8) | data[2];
However I'd like to do something more elegant (the vector is replaceable if there is a better way to get the uint16_ts).
How can I better create a uint16_t from a uint8_t*? I'd avoid memcpy if possible, and use something more modern/safe.
Boost has a nice header-only endian library which could work, but it needs a uint16_t input.
For going further, Boost also provides data types for changing endianness, so I could create a struct:
struct datatype {
    big_int16_buf_t data1;
    big_int16_buf_t data2;
};
Is it possible to safely (considering padding, platform dependency, etc.) cast a valid, 4-byte-long uint8_t* to datatype? Maybe with something like this union?
typedef union {
    uint8_t u8[4];
    datatype correct_data;
} mydata;
Maybe with something like this union?
No. Type punning with unions is not well defined in C++.
This would work, assuming big_int16_buf_t (and therefore datatype) is trivially copyable:
datatype d{};
std::memcpy(&d, data, sizeof d);
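If datatype uses Boost.Endian's buffer types as above, the numbers can then be read through their value() accessor, which performs the byte-order conversion. A short sketch (note that the 66 4 0 0 example data is little-endian, so little_int16_buf_t would be the matching buffer type for it):
int16_t first  = d.data1.value(); // converts from the declared storage order to native
int16_t second = d.data2.value();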
uint16_t first  = (data[1] << 8) | data[0];
uint16_t second = (data[3] << 8) | data[2];
However I'd like to do something more elegant
This is actually (subjectively, in my opinion) quite an elegant way, because it works the same way on all systems. It reads the data as little-endian whether the CPU is little-endian, big-endian, or something else. It is fully portable.
However I'd like to do something more elegant (the vector is replaceable if there is a better way to get the uint16_ts).
The vector seems entirely pointless. You could just as well use:
const std::uint8_t* data = dataPtrs.first;
How can I better create a uint16_t from a uint8_t*?
If you are certain that the data sitting behind the uint8_t pointer truly is a uint16_t object, C++ allows: auto u16 = *reinterpret_cast<uint16_t const*>(data); Otherwise, this is UB.
Given a big-endian value, transforming it into little-endian can be done with the ntohs function (on Linux; other OSes have similar functions).
But beware: if the pointer you hold points to two individual uint8_t values, you must not convert them by pointer cast. In that case, you have to specify manually which byte goes where (conceivably with a function template, as sketched below). This will be the most portable solution, and in all likelihood the compiler will create efficient code out of the shifts and ORs.
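A minimal sketch of such a function template (the name load_le16 and the little-endian byte order are assumptions chosen to match the question's data):
#include <cstdint>

// Assembles a uint16_t from two bytes stored least-significant-first,
// independent of the host CPU's byte order. Templated on the byte type
// so it accepts uint8_t, unsigned char, etc.
template <typename Byte>
uint16_t load_le16(const Byte* p) {
    return static_cast<uint16_t>(static_cast<uint8_t>(p[0]) |
                                 (static_cast<uint8_t>(p[1]) << 8));
}

// Usage for the question's buffer:
// uint16_t first  = load_le16(data);      // 1090
// uint16_t second = load_le16(data + 2);  // 0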

Memory Conservation with Manual Bit Fields vs. std::bitset

I'm learning about bit flags and creating bit fields manually using bit-wise operators. I then came across bitsets, seemingly an easier and cleaner way of storing a field of bits. I understand the value of using bit fields as far as minimizing memory usage. After testing the sizeof(bitset), though, I'm having a hard time understanding how this is a better approach.
Consider:
#include <bitset>
#include <iostream>
int main()
{
    // Using std::bitset, size = 8 bytes
    const unsigned int i1 = 0;
    const unsigned int i2 = 1;
    std::bitset<8> mySet(0);
    mySet.set(i1);
    mySet.set(i2);
    std::cout << sizeof(mySet) << std::endl;
    // Manually managing bit flags, size = 1 byte
    const unsigned char t1 = 1 << 0;
    const unsigned char t2 = 1 << 1;
    unsigned char bitField = 0;
    bitField |= t1 | t2;
    std::cout << sizeof(bitField) << std::endl;
    return 0;
}
The output shows that mySet is 8 bytes, while bitField is 1 byte.
Should I not use std::bitset if minimal memory usage is desired?
For the lowest possible memory footprint you shouldn't use std::bitset. It will likely require more memory than a plain built-in type like char or int of equivalent effective size. Thus it probably has memory overhead, though how much depends on the implementation.
One major advantage of std::bitset is that it frees you from hardware-dependent implementations of various types. In theory, the hardware can use any representation for any type, as long as it fulfills some requirements in the C++ standard. Thus, when you rely on unsigned char t1 = 1 being 00000001 in memory, that is not actually guaranteed. But if you create a bitset and initialize it properly, it won't give you any nasty surprises.
A sidenote on bitfiddling: considering the pitfalls to fiddling with bits in this way, can you really justify this error-prone method instead of using std::bitset or even types like int and bool? Unless you're extremely resource constrained (e.g. MCU / DSP programming), I don't think you can.
Those who play with bits will be bitten, and those who play with bytes will be bytten.
By the way, the char bitField you declare and manipulate using bitwise operators is a bit field in the informal sense, but it's not the C++ language notion of a bit field, which looks like this:
struct BitField {
    unsigned char flag1 : 1, flag2 : 1, flag3 : 1;
};
Loosely speaking, it is a data structure whose data member(s) are subdivided into separate variables. In this case the unsigned char of (presumably) 8 bits is used to create three 1-bit variables (flag1, flag2 and flag3). It is explicitly subdivided, but at the end of the day this is just compiler-/language-assisted bit fiddling similar to what you did above.
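A quick usage sketch of that language-level bit field:
BitField bf{};   // all three flags start at 0
bf.flag1 = 1;    // set individual 1-bit members by name
bf.flag3 = 1;
// sizeof(BitField) is typically 1, since all three flags
// share a single unsigned char.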

Portable bit fields for Handles

I want to use and store "Handles" to data in an object buffer to reduce allocation overhead. A handle is simply an index into an array holding the objects. However, I need to detect use-after-reallocation, as this could slip in quite easily. The common approach seems to be using bit fields.
Bit fields are implementation defined
Bit shifting is not portable across big/little endian machines.
What I need:
Store handle to file (file handler can manage either integer types (byte swapping) or byte arrays)
Store 2 values in the handle with minimum space
What I got:
template<class T_HandleDef, typename T_Storage = uint32_t>
struct Handle
{
    typedef T_HandleDef HandleDef;
    typedef T_Storage Storage;
    Handle(): handle_(0){}
private:
    const T_Storage handle_;
};

template<unsigned T_numIndexBits = 16, typename T_Tag = void>
struct HandleDef{
    static const unsigned numIndexBits = T_numIndexBits;
};

template<class T_Handle>
struct HandleAccessor{
    typedef typename T_Handle::Storage Storage;
    typedef typename T_Handle::HandleDef HandleDef;
    static const unsigned numIndexBits = HandleDef::numIndexBits;
    static const unsigned numMagicBits = sizeof(Storage) * 8 - numIndexBits;
    /// "Magic" struct that splits the handle into values
    union HandleData{
        struct
        {
            Storage index : numIndexBits;
            Storage magic : numMagicBits;
        };
        T_Handle handle;
    };
};
A usage would be, for example:
typedef Handle<HandleDef<24> > FooHandle;
FooHandle Create(unsigned idx, unsigned m){
    HandleAccessor<FooHandle>::HandleData data;
    data.index = idx;
    data.magic = m;
    return data.handle;
}
My goal was to keep the handle as opaque as possible: add a bool check, but nothing else. Users of the handle should not be able to do anything with it but pass it around.
So the problems I run into:
The union is UB -> replace its T_Handle by Storage and add a ctor to Handle from Storage.
How does the compiler lay out the bit field? I fill the whole union/type, so there should be no padding. So probably the only thing that can differ is which field comes first, depending on endianness, correct?
How can I store handle_ to a file and load it on a machine with possibly different endianness and still have index and magic be correct? I think I can store the containing Storage "endian-correct" and get correct values IF both members occupy exactly half the space (2 shorts in a uint). But I always want more space for the index than for the magic value.
Note: There are already questions about bitfields and unions. Summary:
Bitfields may have unexpected padding (impossible here, as the whole type is occupied)
The order of "members" depends on the compiler (only 2 possible ways here; it should be safe to assume the order depends entirely on endianness, so this may or may not actually help here)
A specific binary layout of bits can be achieved by manual shifting (or e.g. wrappers: http://blog.codef00.com/2014/12/06/portable-bitfields-using-c11/) -> Not an answer here, as I also need a specific layout of the values IN the bitfield. So I'm not sure what I get if I e.g. create a handle as handle = (magic << numIndexBits) | index and save/load it as binary (no endianness conversion). I'm missing a big-endian machine for testing.
Note: No C++11, but boost is allowed.
The answer is pretty simple (based on another question I forgot the link to, and comments by @Jeremy Friesner):
As "numbers" are already an abstraction in C++, one can be sure to always have the same bit representation while the variable is in a CPU register (i.e. whenever it is used for anything calculation-like). Bit shifts in C++ are also defined in an endian-independent way: x << 1 always equals x * 2, moving bits toward more significance regardless of the byte order in memory.
The only time one gets endianness problems is when saving to a file, sending/receiving over a network, or accessing the memory differently (e.g. via pointers...).
One cannot use C++ bitfields here, as one cannot be 100% sure about the order of the "entries". Bitfield containers might be OK, if they allow access to the data as a "number".
Safest is (still) using bitshifts, which are very simple in this case (only 2 values). During storing/serialization the number must then be stored in an endian-agnostic way.
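A hedged sketch of that bitshift approach for the 24-bit-index handle above (the function names are illustrative, not from the original post; assumes numIndexBits = 24 and a 32-bit Storage):
// Layout is defined by the shifts alone: magic in the top 8 bits,
// index in the low 24 bits.
uint32_t pack_handle(uint32_t index, uint32_t magic) {
    return (magic << 24) | (index & 0xFFFFFFu);
}

void unpack_handle(uint32_t handle, uint32_t& index, uint32_t& magic) {
    index = handle & 0xFFFFFFu;
    magic = handle >> 24;
}
For storing, the packed value can then be written byte-wise in a fixed order, exactly as in the integer-serialisation sketch in the first answer above.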

Concise bit-manipulation for 64bit integer handle type

I have a 64-bit integer that is used as a handle. The 64 bits must be sliced into the following fields, to be accessed individually:
size : 30 bits
offset : 30 bits
invalid flag : 1 bit
immutable flag : 1 bit
type flag : 1 bit
mapped flag : 1 bit
The two ways I can think of to achieve this are:
1) Traditional bit operations (& | << >>), etc. But I find this a bit cryptic.
2) Use a bitfield struct:
#pragma pack(push, 1)
struct Handle {
    uint32_t size : 30;
    uint32_t offset : 30;
    uint8_t invalid : 1;
    uint8_t immutable : 1;
    uint8_t type : 1;
    uint8_t mapped : 1;
};
#pragma pack(pop)
Then accessing a field becomes very clear:
handle.invalid = 1;
But I understand bitfields are quite problematic and non-portable.
I'm looking for ways to implement this bit manipulation with the object of maximizing code clarity and readability. Which approach should I take?
Side notes:
The handle size must not exceed 64 bits;
The order in which these fields are laid out in memory is irrelevant, as long as each field's size is respected;
The handles are not saved to or loaded from a file, so I don't have to worry about endianness.
I would go for the bitfields solution.
Bitfields are only "non-portable" if you want to store them in binary form and later read them back using a different compiler or, more commonly, on a different machine architecture. This is mainly because the field order is not defined by the standard.
Using bitfields within your application is fine, and as long as you have no requirement for "binary portability" (storing your Handle in a file and reading it on a different system with code compiled by a different compiler or for a different processor type), it will work just fine.
Obviously, you need to do some checking; e.g. sizeof(Handle) == 8 should be verified somewhere, to ensure that you get the size right and the compiler hasn't decided to put your two 30-bit values in separate 32-bit words. To improve the chances of success on multiple architectures, I'd probably define the type as:
struct Handle {
    uint64_t size : 30;
    uint64_t offset : 30;
    uint64_t invalid : 1;
    uint64_t immutable : 1;
    uint64_t type : 1;
    uint64_t mapped : 1;
};
There is a rule that the compiler should not "split elements": if you declare a field as uint32_t and there are only two bits left in the current 32-bit unit, the whole 30-bit field moves to the next 32-bit unit. [It probably works in most compilers anyway, but just in case, using the same 64-bit type throughout is the better choice.]
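That size check can be turned into a compile-time guarantee, e.g. (static_assert needs C++11; older code would use a compile-time trick such as a negative-size array instead):
static_assert(sizeof(Handle) == 8,
              "Handle must pack into exactly 64 bits");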
I recommend bit operations. Of course you should hide all those operations inside a class. Provide member functions to perform set/get operations. Judicious use of constants inside the class will make most of the operations fairly transparent. For example:
bool Handle::isMutable() const {
    return bits & MUTABLE;
}
void Handle::setMutable(bool f) {
    if (f)
        bits |= MUTABLE;
    else
        bits &= ~MUTABLE;
}
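The answer only shows these two member functions; a hedged sketch of how the surrounding class might look (the member bits, the constants, and the size/offset accessors are illustrative assumptions):
#include <cstdint>

class Handle {
    uint64_t bits;

    // The layout is defined entirely by these constants:
    // size in bits 0..29, offset in bits 30..59, flags in bits 60..63.
    static const uint64_t FIELD_MASK   = (1ull << 30) - 1;
    static const unsigned OFFSET_SHIFT = 30;
    static const uint64_t MUTABLE      = 1ull << 60;
    static const uint64_t INVALID      = 1ull << 61;

public:
    explicit Handle(uint64_t b = 0) : bits(b) {}

    uint64_t size() const   { return bits & FIELD_MASK; }
    uint64_t offset() const { return (bits >> OFFSET_SHIFT) & FIELD_MASK; }

    bool isMutable() const  { return bits & MUTABLE; }
    void setMutable(bool f) {
        if (f) bits |= MUTABLE;
        else   bits &= ~MUTABLE;
    }
};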

Custom byte size?

So, you know how the primitive type char has a size of 1 byte? How would I make a primitive with a custom size? So, instead of an int with a size of 4 bytes, I'd make one with a size of, let's say, 16.
Is there a way to do this? Is there a way around it?
It depends on why you are doing this. Usually, you can't use types of less than 8 bits, because that is the addressable unit for the architecture. You can use structs, however, to define different lengths:
struct s {
    unsigned int a : 4;  // a is 4 bits
    unsigned int b : 4;  // b is 4 bits
    unsigned int c : 16; // c is 16 bits
};
However, there is no guarantee that the struct will be 24 bits long. Also, this can cause endianness issues. Where you can, it's best to use system-independent types such as uint16_t, etc. You can also use bitwise operators and bit shifts to twiddle things very specifically.
Normally you'd just make a struct that represents the data in which you're interested. If it's 16 bytes of data, either it's an aggregate of a number of smaller types or you're working on a processor that has a native 16-byte integral type.
If you're trying to represent extremely large numbers, you may need to find a special library that handles arbitrarily-sized numbers.
In C++11, there is an excellent solution for this: std::aligned_storage.
#include <iostream>
#include <type_traits>
int main()
{
    typedef std::aligned_storage<sizeof(int)>::type memory_type;
    memory_type i;
    reinterpret_cast<int&>(i) = 5;
    std::cout << reinterpret_cast<int&>(i) << std::endl;
    return 0;
}
It allows you to declare a block of uninitialized storage on the stack.
If you want to make a new type, typedef it. If you want it to be 16 bytes in size, typedef a struct that has 16 bytes of member data within it. Just beware that quite often compilers will pad things to match your system's alignment needs. A 1-byte struct rarely remains 1 byte without care.
You could just static cast to and from std::string. I don't know enough C++ to give an example, but I think this would be pretty intuitive.